Logistic Regression - Hyperparameter Tuning
Using GridSearch Algorithm
Index
- Introduction
- What are hyperparameters?
- Why is hyperparameter tuning necessary?
- Code Example
- Conclusion
Introduction
Before we dive into hyperparameter tuning of a Logistic Regression model, it's essential to know what Logistic Regression is. Logistic Regression is a supervised machine learning algorithm used primarily for classification: it predicts the probability that an input belongs to a particular class.
For example:
- Classify an email as spam or not
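Under the hood, Logistic Regression computes a weighted sum of the input features and squashes it through the sigmoid function to turn that score into a probability. Here is a minimal sketch of that idea; the weights, bias, and input values below are made-up numbers purely for illustration:
import numpy as np

def sigmoid(z):
    # maps any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

# hypothetical learned weights and bias for a 3-feature input
weights = np.array([0.8, -1.2, 0.5])
bias = -0.1
x = np.array([1.0, 0.3, 2.0])

probability = sigmoid(np.dot(weights, x) + bias)
prediction = int(probability >= 0.5)  # threshold at 0.5 for the class label
print(probability, prediction)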
What are hyperparameters?
When we train a machine learning or deep learning model, the model learns its parameters (such as weights) from the data. Hyperparameters, by contrast, are the settings we choose before training begins, such as the regularization strength or the number of training iterations. They control how the learning happens and can strongly affect the model's performance.
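For example, scikit-learn's LogisticRegression exposes its hyperparameters as constructor arguments, and you can list them all with get_params():
from sklearn.linear_model import LogisticRegression

# every key in this dictionary is a hyperparameter we could tune,
# e.g. C (inverse regularization strength), penalty, solver, max_iter
print(LogisticRegression().get_params())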
Why is hyperparameter tuning necessary?
Let's say you trained your model and its accuracy is very low, below 50%. Such a model is not very useful, since it is wrong more than half the time.
In situations like this, we tune the hyperparameters to increase the accuracy of our models.
Code Example
Here is a code example showing how to perform hyperparameter tuning on a Logistic Regression model.
Importing libraries and functions
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')  # hide convergence and failed-fit warnings during the grid search
Importing dataset
data = pd.read_csv('diabetes.csv')
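A quick sanity check after loading; this assumes the standard Pima Indians Diabetes CSV, which has eight numeric feature columns plus a binary Outcome column:
print(data.shape)   # expected (768, 9) for the standard Pima dataset
print(data.head())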
Data Preprocessing
X = data.drop(['Outcome'], axis=1)
y = data['Outcome']
# label-encode each numeric column (cast to string first),
# mapping every distinct value to an integer code
numeric_cols = X.select_dtypes(include=['int64', 'float64']).columns
le = LabelEncoder()
for feature in numeric_cols:
    X[feature] = le.fit_transform(X[feature].astype(str))
X.info()
Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
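Passing stratify=y keeps the proportion of positive and negative outcomes the same in both splits, which you can verify:
# both splits should show roughly the same class ratio
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))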
Model Training and Evaluation
# train a baseline model with default hyperparameters
logi = LogisticRegression()
logi.fit(X_train, y_train)
y_predict = logi.predict(X_test)
acc = accuracy_score(y_test, y_predict)
print(acc)
Accuracy before hyperparameter tuning: 69.4%
Hyperparameter Tuning
# note: not every penalty works with every solver (e.g. 'lbfgs' supports
# only 'l2' and 'none'); invalid combinations fail their fits and score NaN,
# so they are never selected
param_grid = [
    {'penalty': ['l1', 'l2', 'elasticnet', 'none'],
     'C': np.logspace(-10, 10, 20),
     'solver': ['lbfgs', 'newton-cg', 'liblinear', 'sag', 'saga'],
     'max_iter': [100, 1000, 2500, 5000]}
]
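GridSearchCV exhaustively tries every combination in the grid, so it is worth knowing how big the search is before launching it. With this grid and 3-fold cross-validation:
# penalty (4) x C (20) x solver (5) x max_iter (4) = 1600 candidates
n_candidates = 4 * 20 * 5 * 4
print(n_candidates)      # 1600
print(n_candidates * 3)  # 4800 individual fits with cv=3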
logi_hypertuned = GridSearchCV(logi, param_grid=param_grid, cv=3, verbose=True, n_jobs=-1)
res_hypertuned = logi_hypertuned.fit(X_train, y_train)  # search on the training split only
print(res_hypertuned.best_estimator_)
# evaluate the refit best model on the held-out test set
y_predict_tuned = res_hypertuned.predict(X_test)
print(accuracy_score(y_test, y_predict_tuned))
Best parameters: LogisticRegression(C=0.2976351441631313, solver='liblinear')
Accuracy after hyperparameter tuning: 71.5%
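Besides best_estimator_, the fitted GridSearchCV object also exposes the winning parameter combination and its mean cross-validated score directly:
print(res_hypertuned.best_params_)  # the winning penalty/C/solver/max_iter values
print(res_hypertuned.best_score_)   # mean cross-validated accuracy of the best candidate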
Conclusion
As we can see, tuning the hyperparameters improved the model's test accuracy from 69.4% to 71.5%. Even a simple grid search over a handful of settings can deliver measurably better results from the same model and data.