Logistic Regression - Hyperparameter Tuning

Using the Grid Search Algorithm

Index

  • Introduction

  • What are hyperparameters?

  • Why is hyperparameter tuning necessary?

  • Code Example

  • Conclusion

Introduction

Before we dive into hyperparameter tuning of a Logistic Regression model, it's essential to know what Logistic Regression is. Logistic Regression is a machine learning algorithm used primarily for classification: it estimates the probability that an input belongs to a class and assigns the class based on that probability.

For example:

  • Classifying an email as spam or not spam
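To make the idea concrete, here is a minimal NumPy sketch of how a fitted Logistic Regression model turns features into a prediction: a weighted sum of the inputs is passed through the sigmoid function, giving a probability between 0 and 1, which is then thresholded at 0.5. The weights, bias, and feature values below are made up for illustration; in a real model they are learned from data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, weights, bias):
    """Probability of the positive class: sigmoid of a linear combination."""
    return sigmoid(X @ weights + bias)

def predict(X, weights, bias, threshold=0.5):
    """Turn probabilities into 0/1 class labels."""
    return (predict_proba(X, weights, bias) >= threshold).astype(int)

# Hypothetical weights for two features (e.g. counts of two spam-like words).
weights = np.array([1.5, -0.5])
bias = -1.0
X = np.array([[0.0, 0.0],
              [2.0, 1.0],
              [0.0, 4.0]])

print(predict_proba(X, weights, bias))
print(predict(X, weights, bias))  # [0 1 0]
```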

What are hyperparameters?

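When we train a machine learning or deep learning model, most of its parameters (such as the weights of a Logistic Regression model) are learned from the data. Hyperparameters are the settings we choose before training that control how the model learns, for example the regularization strength C, the type of penalty, and the solver. Because they are not learned automatically, choosing good values for them can have a large effect on the model's performance.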
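The difference is easy to see in scikit-learn: hyperparameters are the arguments passed to the `LogisticRegression` constructor, while parameters such as `coef_` and `intercept_` only exist after `fit()` has learned them. This sketch uses synthetic data from `make_classification` as a stand-in for a real dataset; the specific hyperparameter values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hyperparameters: chosen by us before training and passed to the constructor.
model = LogisticRegression(C=0.5, penalty="l2", solver="liblinear", max_iter=200)
print(model.get_params()["C"])  # 0.5

# Synthetic stand-in data (not the diabetes dataset): 100 samples, 4 features.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Parameters: learned from the data during fit().
model.fit(X, y)
print(model.coef_.shape)       # (1, 4): one weight per feature
print(model.intercept_.shape)  # (1,)
```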

Why is hyperparameter tuning necessary?

Let's say you trained your model and its accuracy is very low, for example less than 50%. Such a model is clearly not very useful, since it is wrong more than half the time.

In situations like this, we tune the hyperparameters: we try different combinations of values and keep the one that gives the best accuracy.

Code Example

Here is a code example showing how to tune the hyperparameters of a Logistic Regression model with scikit-learn's GridSearchCV.

Importing libraries and functions

import pandas as pd 
import numpy as np 
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

Importing the dataset

Dataset Link

data = pd.read_csv('diabetes.csv')

Data Preprocessing

X = data.drop(['Outcome'], axis=1)
y = data['Outcome']
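# Label-encode only categorical (object) columns; numeric columns are left as-is.
# If every column is already numeric, this loop does nothing.
objectList = X.select_dtypes(include=['object']).columns

le = LabelEncoder()

for feature in objectList:
    X[feature] = le.fit_transform(X[feature].astype(str))
X.info()  # info() prints the summary itself and returns None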
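Label encoding replaces each distinct value with an integer code assigned in sorted order. That works for categorical text columns, but if numeric columns are cast to strings first, the sort is alphabetical, so the codes no longer follow the numbers' real order. A small sketch with made-up values:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

# A genuinely categorical column: codes are assigned in sorted order.
colors = le.fit_transform(["red", "green", "blue", "green"])
print(le.classes_.tolist(), colors.tolist())
# ['blue', 'green', 'red'] [2, 1, 0, 1]

# A numeric column cast to strings: sorting is lexicographic, so "10" < "2" < "9"
# and the encoded values no longer respect the original magnitudes.
ages = le.fit_transform([str(a) for a in [9, 10, 2]])
print(le.classes_.tolist(), ages.tolist())
# ['10', '2', '9'] [2, 0, 1]
```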

Splitting the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
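The `stratify=y` argument makes both splits keep the same class proportions as the full dataset, which matters when one class is rarer than the other. A quick check on toy labels (80 negatives and 20 positives, invented for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 80 negatives and 20 positives (a 20% positive rate).
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)  # one dummy feature

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# With stratify=y both splits keep the 20% positive rate.
print(y_train.mean(), y_test.mean())  # 0.2 0.2
```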

Model Training and Evaluation

logi = LogisticRegression()

logi.fit(X_train, y_train)
y_predict = logi.predict(X_test)
acc = accuracy_score(y_test, y_predict)
print(f"Accuracy: {acc:.1%}")

Accuracy before hyperparameter tuning: 69.4%

Hyperparameter Tuning

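# Not every solver supports every penalty, so the grid is split into
# compatible penalty/solver combinations instead of one full cross product.
C_values = np.logspace(-10, 10, 20)
max_iters = [100, 1000, 2500, 5000]
param_grid = [
    {'penalty': ['l2'], 'solver': ['lbfgs', 'newton-cg', 'liblinear', 'sag', 'saga'],
     'C': C_values, 'max_iter': max_iters},
    {'penalty': ['l1'], 'solver': ['liblinear', 'saga'],
     'C': C_values, 'max_iter': max_iters},
    # elasticnet is only supported by saga and also needs an l1_ratio
    {'penalty': ['elasticnet'], 'solver': ['saga'], 'l1_ratio': [0.5],
     'C': C_values, 'max_iter': max_iters},
]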
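Grid search trains one model for every combination of values in the grid, once per cross-validation fold, so the cost multiplies quickly. The sketch below uses scikit-learn's `ParameterGrid` to count the candidates for a full cross product of the penalties, solvers, C values, and `max_iter` values used in this tuning step, and compares it with a grid restricted to penalty/solver pairs that `LogisticRegression` accepts. The single `l1_ratio` value of 0.5 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

C_values = np.logspace(-10, 10, 20)
max_iters = [100, 1000, 2500, 5000]

# Full cross product: every penalty paired with every solver.
full_grid = {
    "penalty": ["l1", "l2", "elasticnet", "none"],
    "C": C_values,
    "solver": ["lbfgs", "newton-cg", "liblinear", "sag", "saga"],
    "max_iter": max_iters,
}
n_full = len(ParameterGrid(full_grid))
print(n_full, n_full * 3)  # 1600 4800  (candidates, fits with cv=3)

# Only penalty/solver pairs that LogisticRegression accepts
# (elasticnet additionally needs an l1_ratio).
compatible_grid = [
    {"penalty": ["l2"], "solver": ["lbfgs", "newton-cg", "liblinear", "sag", "saga"],
     "C": C_values, "max_iter": max_iters},
    {"penalty": ["l1"], "solver": ["liblinear", "saga"],
     "C": C_values, "max_iter": max_iters},
    {"penalty": ["elasticnet"], "solver": ["saga"], "l1_ratio": [0.5],
     "C": C_values, "max_iter": max_iters},
]
n_ok = len(ParameterGrid(compatible_grid))
print(n_ok, n_ok * 3)  # 640 1920
```

With `cv=3`, the full cross product means 4,800 fits, and combinations such as `penalty='l1'` with `solver='lbfgs'` fail instead of producing a usable model.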

logi_hypertuned = GridSearchCV(logi, param_grid = param_grid, cv = 3, verbose=True, n_jobs=-1)

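# Fit on the training split only, so the test set stays unseen during tuning.
res_hypertuned = logi_hypertuned.fit(X_train, y_train)

print(res_hypertuned.best_estimator_)
acc_tuned = accuracy_score(y_test, res_hypertuned.predict(X_test))
print(f"Accuracy: {acc_tuned:.1%}")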

Best parameters: LogisticRegression(C=0.2976351441631313, solver='liblinear')

Accuracy after hyperparameter tuning: 71.5%
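Conceptually, `GridSearchCV` loops over every candidate, scores each one with cross-validation on the training data, keeps the best, and refits it. The following is a simplified hand-written version of that loop, not scikit-learn's actual implementation, run on synthetic data from `make_classification` (a stand-in for the diabetes data) with a deliberately small grid so it finishes quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid, cross_val_score, train_test_split

# Synthetic stand-in for the diabetes data (assumption: any binary dataset works).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# A small grid of compatible settings, kept tiny so it runs quickly.
grid = ParameterGrid({"C": [0.01, 0.1, 1.0, 10.0], "solver": ["liblinear", "lbfgs"]})

best_score, best_params = -np.inf, None
for params in grid:
    model = LogisticRegression(max_iter=1000, **params)
    # Mean accuracy over 3 cross-validation folds of the training data only.
    score = cross_val_score(model, X_train, y_train, cv=3, scoring="accuracy").mean()
    if score > best_score:
        best_score, best_params = score, params

# Refit the winning configuration on the full training split, then test once.
best_model = LogisticRegression(max_iter=1000, **best_params).fit(X_train, y_train)
test_acc = best_model.score(X_test, y_test)
print(best_params, round(best_score, 3), round(test_acc, 3))
```

In practice `GridSearchCV` does this bookkeeping for you and can evaluate candidates in parallel with `n_jobs=-1`.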

Conclusion

As we can see, tuning the hyperparameters gave better results here: test accuracy rose from 69.4% to 71.5%.