Index

Introduction
What are hyperparameters?
Why is hyperparameter tuning necessary?
Code Example
Conclusion

`Introduction`

Before we dive into hyperparameter tuning of a Logistic Regression model, it helps to know what Logistic Regression is. Despite its name, Logistic Regression is used mainly for classification: it estimates the probability that an input belongs to a class, most commonly in binary (yes/no) problems.

For example:

Classifying an email as spam or not spam.
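Under the hood, logistic regression computes a weighted sum of the input features and passes it through the logistic (sigmoid) function, which maps any real number to a probability between 0 and 1. A threshold (commonly 0.5) then turns that probability into a class label. The sketch below is a minimal illustration; the two features and the values of `weights` and `bias` are hypothetical, not learned from any data.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class (e.g. 'spam') for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return int(predict_proba(features, weights, bias) >= threshold)

# Hypothetical weights for a two-feature spam detector
# (e.g. number of links, number of words in ALL CAPS).
weights = [0.8, 0.5]
bias = -3.0
print(predict_proba([2, 1], weights, bias))  # z = -3 + 1.6 + 0.5 = -0.9
print(predict([2, 1], weights, bias))
```

In a real model, training chooses `weights` and `bias` so that the predicted probabilities fit the labelled examples.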

`What are hyperparameters?`

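When we train a machine learning or deep learning model, it learns certain values from the data; these are its parameters (for Logistic Regression, one weight per feature plus a bias term). Hyperparameters, by contrast, are settings we choose before training that control how the model learns, such as the regularization strength C, the penalty type, the solver, and the maximum number of iterations. They are not learned from the data, yet they can strongly affect the model's performance.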
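In scikit-learn the distinction is visible directly: hyperparameters are the arguments passed to the constructor before training, while learned parameters appear after `fit()` as attributes ending in an underscore (`coef_`, `intercept_`). A small sketch, assuming synthetic data from `make_classification` and arbitrarily chosen hyperparameter values:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (a stand-in for a real dataset).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hyperparameters: chosen by us, before training.
model = LogisticRegression(C=0.5, penalty="l2", solver="liblinear", max_iter=200)
print(model.get_params()["C"])

# Parameters: learned from the data during fit().
model.fit(X, y)
print(model.coef_)       # one weight per feature
print(model.intercept_)  # bias term
```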

`Why is hyperparameter tuning necessary?`

Suppose you train a model with the default hyperparameters and its accuracy is very low, say below 50%. For a two-class problem, that is worse than random guessing: the model is wrong more than half the time, so it is of little practical use.

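Default hyperparameter values are generic starting points rather than choices made for your dataset, so in situations like this we tune the hyperparameters: we try different combinations of values, evaluate each one (usually with cross-validation on the training data), and keep the combination that gives the best accuracy.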
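At its simplest, tuning means trying several candidate values for a hyperparameter, scoring each with cross-validation, and keeping the best. The sketch below does this by hand for `C` alone, on synthetic data; the candidate values are arbitrary choices for illustration. `GridSearchCV` automates the same loop over every combination in a grid of several hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=300, n_features=6, random_state=1)

best_C, best_score = None, -1.0
for C in [0.001, 0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000)
    # Mean accuracy over 3 cross-validation folds.
    score = cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    print(f"C={C:<6} mean CV accuracy={score:.3f}")
    if score > best_score:
        best_C, best_score = C, score

print("Best C:", best_C)
```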

`Code Example`

Here is an example of tuning the hyperparameters of a Logistic Regression model with scikit-learn's GridSearchCV, using a diabetes dataset.

`Importing libraries and functions`

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
import warnings
# Silence warnings (e.g. ConvergenceWarning) raised while fitting many models
warnings.filterwarnings('ignore')

`Importing the dataset`

Dataset Link

data = pd.read_csv('diabetes.csv')

`Data Preprocessing`

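# Separate the features from the target column
X = data.drop(['Outcome'], axis=1)
y = data['Outcome']

# Label-encode only text (categorical) columns; numeric columns are left
# unchanged, since casting them to strings would scramble their order
objectList = X.select_dtypes(include=['object']).columns

le = LabelEncoder()

for feature in objectList:
    X[feature] = le.fit_transform(X[feature].astype(str))

# DataFrame.info() prints its summary itself (and returns None)
X.info()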
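Label encoding is meant for text (categorical) columns. If numeric values are cast to strings first, the resulting codes follow alphabetical rather than numeric order, so the encoded feature no longer preserves the ranking of the original values. A small sketch with three made-up glucose readings:

```python
from sklearn.preprocessing import LabelEncoder

glucose = [99, 100, 120]  # hypothetical readings, already in numeric order

le = LabelEncoder()
codes = le.fit_transform([str(v) for v in glucose])

print(list(le.classes_))  # sorted as strings, so '99' comes last
print(list(codes))        # 99 receives the largest code
```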

`Splitting the data into training and testing sets`

# Hold out 20% of the rows for testing, keeping the class proportions of y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
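Passing `stratify=y` to `train_test_split` keeps the same class proportions in the training and test sets, which matters when one class is rarer than the other. A quick check on a made-up label vector with an 80/20 class ratio:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 negatives, 20 positives.
X = list(range(100))
y = [0] * 80 + [1] * 20

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(Counter(y_train))  # 64 negatives, 16 positives
print(Counter(y_test))   # 16 negatives, 4 positives
```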

`Model Training and Evaluation`

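# Baseline: Logistic Regression with default hyperparameters
logi = LogisticRegression()

logi.fit(X_train, y_train)
y_predict = logi.predict(X_test)
acc = accuracy_score(y_test, y_predict)
print(f"Accuracy before tuning: {acc:.1%}")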

Accuracy before hyperparameter tuning: 69.4%
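The accuracy figure is simply the fraction of test predictions that match the true labels. `accuracy_score` returns the same value one gets by counting by hand, as this sketch with made-up labels shows:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]  # hypothetical ground truth
y_pred = [1, 0, 0, 1, 1]  # hypothetical predictions

manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                          # 3 correct out of 5
print(accuracy_score(y_true, y_pred))  # same value
```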

`Hyperparameter Tuning`

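# Not every solver supports every penalty, so the grid lists only valid groups.
# 'elasticnet' is left out because it also needs l1_ratio; penalty=None
# (scikit-learn 1.2+) replaces the old 'none' string.
C_values = np.logspace(-10, 10, 20)
iterations = [100, 1000, 2500, 5000]
param_grid = [
    {'solver': ['liblinear'], 'penalty': ['l1', 'l2'],
     'C': C_values, 'max_iter': iterations},
    {'solver': ['lbfgs', 'newton-cg', 'sag'], 'penalty': ['l2', None],
     'C': C_values, 'max_iter': iterations},
    {'solver': ['saga'], 'penalty': ['l1', 'l2', None],
     'C': C_values, 'max_iter': iterations},
]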

logi_hypertuned = GridSearchCV(logi, param_grid = param_grid, cv = 3, verbose=True, n_jobs=-1)
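Grid search fits one model per combination of values per cross-validation fold, so its cost grows multiplicatively with every value added to the grid. scikit-learn's `ParameterGrid` enumerates the combinations that `GridSearchCV` will try; when the grid is a list of dictionaries, the combinations from each dictionary are added together. The grid in this sketch is a small illustrative one, separate from the tutorial's `param_grid`:

```python
import math
import numpy as np
from sklearn.model_selection import ParameterGrid

# Illustrative grid: a list of dicts, each dict expanded separately.
grid = [
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": np.logspace(-10, 10, 20)},
    {"solver": ["lbfgs"], "penalty": ["l2"], "C": np.logspace(-10, 10, 20)},
]

n_candidates = len(ParameterGrid(grid))
# Same count by hand: product of list lengths per dict, summed over dicts.
by_hand = sum(math.prod(len(v) for v in d.values()) for d in grid)

cv = 3
print(n_candidates, by_hand)      # 40 + 20 = 60 candidates
print(n_candidates * cv, "fits")  # plus one final refit of the best candidate
```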

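# Tune on the training split only; the test set stays unseen until the end
res_hypertuned = logi_hypertuned.fit(X_train, y_train)
print(res_hypertuned.best_estimator_)
# predict() uses the best model, refit on the whole training split
acc_tuned = accuracy_score(y_test, res_hypertuned.predict(X_test))
print(f"Accuracy after tuning: {acc_tuned:.1%}")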

Best parameters: LogisticRegression(C=0.2976351441631313, solver='liblinear')

Accuracy after hyperparameter tuning: 71.5%
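After fitting, a `GridSearchCV` object exposes the winning combination (`best_params_`), its mean cross-validation score (`best_score_`) and, because `refit=True` by default, a model retrained on all the data passed to `fit` (`best_estimator_`), which `predict` uses directly. The sketch below runs the whole workflow end to end under stated assumptions (synthetic data and a reduced grid chosen for illustration), fitting the search on the training split only and measuring accuracy on the held-out test split:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

param_grid = {"C": np.logspace(-3, 3, 7), "penalty": ["l1", "l2"], "solver": ["liblinear"]}
search = GridSearchCV(LogisticRegression(), param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)  # the test split is never seen during tuning

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", round(search.best_score_, 3))

# predict() delegates to best_estimator_, refit on the full training split.
test_acc = accuracy_score(y_test, search.predict(X_test))
print("Test accuracy:", round(test_acc, 3))
```

Keeping the test split out of the search means the final test accuracy is not inflated by the tuning itself.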

`Conclusion`

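In this example, tuning the hyperparameters with grid search raised the accuracy from 69.4% to 71.5%. The gain is modest, but it shows that hyperparameter values selected for the data at hand can outperform the library defaults.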

Logistic Regression - Hyperparameter Tuning

Using GridSearch Algorithm
