How to evaluate the performance of your machine learning model?
Evaluating your machine learning model using Scikit-learn
Index
What is model evaluation?
Different methods of evaluating your classification model
How to evaluate your model using Scikit-learn metrics?
Conclusion
What is model evaluation?
Let's say you are learning a new subject and have spent countless hours on it, but you have never tested yourself. How can you know for sure that you actually know the subject, or whether you are headed for embarrassment the first time you have to use it?
So, without evaluating your knowledge, you can't even trust yourself.
The same logic applies to machine learning models. You may have spent hours training a model, but if you don't know how well it has learned from the training data, you can't trust it in real-world situations.
So, to answer the question, "What is model evaluation?": it is the process of using different evaluation techniques to understand the performance of our model and to get an idea of its efficiency, strengths and weaknesses.
Different methods of evaluating your classification model
There are several methods of evaluating your classification model, but in this article, we will discuss the most commonly used metrics.
Before we start with the metrics, we will make some assumptions to understand the concepts better, both theoretically and mathematically.
Assumptions:-
P -> Total number of Positive Samples
N -> Total number of Negative Samples
PP -> Total number of Predicted Positive Samples
PN -> Total number of Predicted Negative Samples
TP -> Total number of Predicted Positive Samples that were Positive Samples
TN -> Total number of Predicted Negative Samples that were Negative Samples
FP -> Total number of Predicted Positive Samples that were Negative Samples
FN -> Total number of Predicted Negative Samples that were Positive Samples
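In code, these quantities can be read straight off a confusion matrix. Here is a minimal sketch using made-up toy labels (the label values below are illustrative, not from the article's dataset); for binary labels, Scikit-learn's confusion_matrix returns a 2x2 matrix [[TN, FP], [FN, TP]]:
from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 0]  # actual labels (toy data)
y_pred = [1, 0, 0, 1, 1, 0]  # model predictions (toy data)
# .ravel() flattens the 2x2 matrix [[TN, FP], [FN, TP]]
TN, FP, FN, TP = confusion_matrix(y_true, y_pred).ravel()
print(TP, TN, FP, FN)        # 2 2 1 1
P, N = TP + FN, TN + FP      # total positive / negative samples
PP, PN = TP + FP, TN + FN    # total predicted positive / negative samples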
1. Accuracy
Accuracy can be defined as the total number of correct predictions divided by the total number of predictions.
Formula:-
$$\frac{TP + TN}{P + N}$$
For example, if your model made 85 correct predictions out of 100 predictions, then the accuracy is simply 85%.
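As a quick sanity check, here is a minimal sketch (with made-up toy labels) that computes accuracy both by hand and with Scikit-learn's accuracy_score:
from sklearn.metrics import accuracy_score
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual labels (toy data)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # model predictions (toy data)
# count the predictions that match the actual labels
correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))             # 0.8, computed by hand
print(accuracy_score(y_true, y_pred))    # 0.8, the same result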
2. Precision
The precision score tells us about the ability of our model not to classify a negative sample as positive.
Formula:-
$$\frac{TP}{TP + FP}$$
For example, let us say you have a binary classification model that classifies a person as diabetic or non-diabetic, and this model has a precision score of 80%. This means that 20% of the people the model classifies as diabetic are actually non-diabetic.
It won't be nice to stop a non-diabetic person from eating chocolate while he still can, or to put him through treatment he doesn't need. (The opposite error, a diabetic person celebrating a false "non-diabetic" result and thus reducing their lifespan, is measured by recall, which we cover next.)
So, in this case, high precision is necessary. There might be cases where you can trade off precision for better recall, but this is not one of them.
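To make this concrete, here is a minimal sketch with made-up toy labels (1 = diabetic, 0 = non-diabetic) that computes precision by hand and with Scikit-learn's precision_score:
from sklearn.metrics import precision_score
y_true = [1, 0, 1, 0, 0, 1]  # actual labels (toy data)
y_pred = [1, 1, 1, 0, 0, 1]  # the model predicts 4 people as diabetic
# of the 4 predicted positives, 3 are true positives and 1 is a false positive
print(3 / (3 + 1))                      # 0.75, TP / (TP + FP) by hand
print(precision_score(y_true, y_pred))  # 0.75, the same result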
3. Recall
The recall score can be defined as the proportion of the total positive samples that the model correctly predicts as positive.
Formula:-
$$\frac{TP}{TP + FN}$$
For example, if there were 80 diabetic patients in total but our diabetes model correctly identified only 60 of them, the recall is 60/80 = 75%, which means that our model falsely classified 20 diabetic people as non-diabetic.
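Here is a minimal sketch of the same idea with made-up toy labels, computing recall by hand and with Scikit-learn's recall_score:
from sklearn.metrics import recall_score
y_true = [1, 1, 1, 1, 0, 0]  # 4 diabetic patients in total (toy data)
y_pred = [1, 1, 1, 0, 0, 1]  # the model catches only 3 of them
# 3 true positives, 1 false negative (a diabetic classified as non-diabetic)
print(3 / (3 + 1))                   # 0.75, TP / (TP + FN) by hand
print(recall_score(y_true, y_pred))  # 0.75, the same result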
4. F1 Score
Now that we have learned about precision and recall, a few scenarios arise:
- High Precision and Low Recall
In this case, the model identifies the positive values correctly, but it identifies very few of them out of the total positive values.
- Low Precision and High Recall
In this case, the model identifies a lot of positive values, but most of them are not correct.
- Low Precision and Low Recall
In this case, the model identifies the positive values incorrectly, and it identifies very few of them out of the total positive values. (Worst Case)
- High Precision and High Recall
In this case, the model identifies the positive values correctly but also identifies most of them out of the total positive values. (Best Case)
The F1 score addresses this: it is the harmonic mean of precision and recall, so only the last case, high precision together with high recall, gives a high F1 score, which is what we desire.
Formula:-
$$\frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
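Because the F1 score is a harmonic mean, it stays low unless both precision and recall are high. A minimal sketch with made-up toy labels (a model that predicts everyone as positive, so recall is perfect but precision is poor):
from sklearn.metrics import f1_score, precision_score, recall_score
y_true = [1, 1, 0, 0]  # toy data: half the samples are positive
y_pred = [1, 1, 1, 1]  # a model that predicts everyone as positive
p = precision_score(y_true, y_pred)  # 0.5
r = recall_score(y_true, y_pred)     # 1.0
print(2 * p * r / (p + r))           # 0.666..., by the formula
print(f1_score(y_true, y_pred))      # the same result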
How to evaluate your model using Scikit-learn metrics?
Here is a code example showing how to evaluate your model using Scikit-learn.
Importing libraries and functions
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,f1_score, precision_score, recall_score
import warnings
warnings.filterwarnings('ignore')
Importing dataset
data = pd.read_csv('diabetes.csv')
Data Preprocessing
X = data.drop(['Outcome'], axis=1)
y = data['Outcome']
# the feature columns are numeric, so select them by dtype
numeric_cols = X.select_dtypes(include=['int64', 'float64']).columns
le = LabelEncoder()
# label-encode each column after casting its values to strings
for feature in numeric_cols:
    X[feature] = le.fit_transform(X[feature].astype(str))
print(X.info())
Splitting the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
Model Training and Evaluation
logi = LogisticRegression()
logi.fit(X_train, y_train)
y_predict = logi.predict(X_test)
acc = accuracy_score(y_test, y_predict)
pre = precision_score(y_test, y_predict)
rec = recall_score(y_test, y_predict)
f1 = f1_score(y_test, y_predict)
print(f"Accuracy Score {acc}")
print(f"Precision Score {pre}")
print(f"Recall Score {rec}")
print(f"F1 Score {f1}")
Output:-
Accuracy Score 0.6948051948051948
Precision Score 0.6060606060606061
Recall Score 0.37037037037037035
F1 Score 0.45977011494252873
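If you want all of these metrics in one place, Scikit-learn also provides classification_report, which prints precision, recall and F1 per class. A minimal follow-up sketch, reusing the y_test and y_predict variables from above (this step is an addition, not part of the original pipeline):
from sklearn.metrics import classification_report
# prints precision, recall, F1 and support for each class
print(classification_report(y_test, y_predict))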
Conclusion
Evaluation metrics are essential and can give us a lot of insight into our model's performance. As the output above shows, a model with decent accuracy can still have a poor recall score, a weakness that accuracy alone would have hidden.