Regression and Classification

Regression:

Definition: Regression is a type of supervised learning task used to predict continuous numerical values. It models the relationship between the independent variables (x) and a dependent variable (y) as a mathematical model or function.
Example: Predicting house prices based on features like square feet, number of bedrooms, and location.
Mathematical Representation: The basic forms of a linear regression model are:
- Simple Linear Regression (one independent variable): $y = m x + c$, where $m$ is the slope and $c$ is the intercept.
- Multiple Linear Regression ($n$ independent variables): $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
Objective: The goal of regression is to minimize the error, i.e. the difference between the predicted and actual values, which is often measured using metrics such as Mean Squared Error (MSE) or Root Mean Squared Error (RMSE).
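
As a minimal sketch of how the simple model $y = m x + c$ can be fitted, the Python below uses ordinary least squares, which picks $m$ and $c$ so that the RSS (and therefore the MSE) defined below is as small as possible. The function name `fit_simple_linear_regression` is a hypothetical helper introduced for illustration; these notes do not prescribe a particular fitting method.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (m, c) for y = m*x + c using ordinary least squares.

    Hypothetical helper for illustration; uses the closed-form solution
    m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2),
    c = mean_y - m * mean_x.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept: the fitted line passes through the means
    return m, c
```

For example, the points (1, 3), (2, 5), (3, 7) and (4, 9) lie exactly on a line, so `fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])` returns `(2.0, 1.0)`.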

Formulae:

Residual Sum of Squares (RSS):

$\text{RSS} = \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}$

RSS is the sum of the squared differences between the actual and predicted values.

Mean Squared Error (MSE):

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}$

MSE calculates the average squared difference between the actual and predicted values.

Root Mean Squared Error (RMSE):

$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}$

RMSE is the square root of the MSE and provides the error in the same units as the response variable.

Mean Absolute Error (MAE):

$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}|$

MAE measures the average absolute difference between the actual and predicted values.

In these formulas:

$y_{i}$ - represents the actual value.
${\hat{y}}_{i}$ - represents the predicted value.
$n$ - is the number of samples.
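
The four regression metrics can be computed directly from paired actual and predicted values. This is a plain-Python sketch; `regression_metrics` is a hypothetical helper name, and it assumes both inputs are equal-length, non-empty sequences of numbers.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RSS, MSE, RMSE and MAE for paired actual/predicted values.

    Hypothetical helper for illustration.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual for sample i: actual value minus predicted value
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    rss = sum(r ** 2 for r in residuals)        # sum of squared residuals
    mse = rss / n                               # average squared residual
    rmse = math.sqrt(mse)                       # back in the units of y
    mae = sum(abs(r) for r in residuals) / n    # average absolute residual
    return {"RSS": rss, "MSE": mse, "RMSE": rmse, "MAE": mae}
```

With `y_true = [3.0, 5.0, 2.0, 7.0]` and `y_pred = [2.5, 5.0, 4.0, 8.0]`, the residuals are 0.5, 0.0, -2.0 and -1.0, giving RSS = 5.25, MSE = 1.3125, RMSE ≈ 1.146 and MAE = 0.875. The single residual of -2.0 contributes 4.0 of the 5.25 RSS, which shows how squared-error metrics weigh large errors more heavily than MAE does.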

Classification:

Definition: Classification is a type of supervised learning task used to categorize data into discrete classes or categories based on past (labeled) observations. It predicts the class label for a given input.
Example: Email spam classification, where emails are classified as spam or non-spam.
Mathematical Representation: Classification problems can be solved using various algorithms such as logistic regression, support vector machines, decision trees, or neural networks. The basic logistic regression model can be represented as:
- Logistic Regression:

$h(x) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n)}}$
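
The logistic model passes the linear combination $b_0 + b_1 x_1 + \dots + b_n x_n$ through the sigmoid function, so $h(x)$ always lies between 0 and 1 and is commonly read as the probability of the positive class. The sketch below evaluates $h(x)$ for coefficients that are assumed to be already learned (training is not shown); the helper names and the 0.5 decision threshold are illustrative assumptions, 0.5 being a common default rather than a requirement.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, e^(-z) could overflow; use the equivalent form e^z / (1 + e^z)
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_hypothesis(b, x):
    """h(x) for coefficients b = [b0, b1, ..., bn] and features x = [x1, ..., xn].

    Hypothetical helper for illustration.
    """
    if len(b) != len(x) + 1:
        raise ValueError("expected len(b) == len(x) + 1")
    z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    return sigmoid(z)


def predict_class(b, x, threshold=0.5):
    """Assign class 1 when h(x) >= threshold, otherwise class 0 (assumed convention)."""
    return 1 if logistic_hypothesis(b, x) >= threshold else 0
```

With `b = [0.0, 1.0]` and `x = [0.0]`, the linear part is 0 and $h(x) = 0.5$, so the input sits exactly on the decision boundary. Splitting `sigmoid` into two branches keeps `math.exp` from overflowing when the linear part is a large negative number.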

Objective: In classification, the algorithm aims to minimize misclassification by adjusting the decision boundary that separates the classes. Common evaluation metrics for classification include accuracy, precision, recall, F1 score, the ROC curve, and the area under the ROC curve (AUC).

Formulae:

Accuracy:

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

Accuracy measures the proportion of correct predictions out of the total predictions made.

Precision:

$\text{Precision} = \frac{TP}{TP + FP}$

Precision measures the proportion of positive predictions that are actually positive.

Recall (Sensitivity):

$\text{Recall} = \frac{TP}{TP + FN}$

Recall measures the proportion of actual positives that were correctly identified.

F1 Score:

$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

F1 score is the harmonic mean of precision and recall, providing a balance between the two metrics.

In these formulas:

$T P$ - is the number of true positives.
$T N$ - is the number of true negatives.
$F P$ - is the number of false positives.
$F N$ - is the number of false negatives.
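
The four classification metrics follow directly from the confusion-matrix counts. The sketch below assumes binary labels with 1 as the positive class; `confusion_counts` and `classification_metrics` are hypothetical helper names, and returning 0.0 when a denominator is zero is a convention added here, since the formulas are undefined in that case.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for binary labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Assumed convention: report 0.0 when a denominator is zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "F1": f1}
```

For `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 1, 1, 0]`, the counts are TP = 3, TN = 2, FP = 2 and FN = 1, giving accuracy 0.625, precision 0.6, recall 0.75 and F1 ≈ 0.667.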

Differentiation between Regression and Classification:

Output Type: Regression predicts continuous values, while classification predicts discrete class labels.
Evaluation: Regression is evaluated using metrics such as RMSE and MAE, while classification typically uses accuracy, precision, recall, and F1 score.
Models: Regression can use models like linear regression, polynomial regression, and neural networks. Classification can use models like logistic regression, decision trees, and SVMs.