Regression and Classification


Regression:

  1. Definition: Regression is a supervised learning task that predicts continuous numerical values. It models the relationship between one or more independent variables (x) and a dependent variable (y) as a mathematical function.

  2. Example: Predicting house prices based on features like square feet, number of bedrooms, and location.

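  3. Mathematical Representation: A linear regression model expresses y as a linear function of one or more independent variables:

    • Simple Linear Regression (one independent variable): $y = mx + c$, where m is the slope and c is the intercept.
    • Multiple Linear Regression (n independent variables): $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$, where b_0 is the intercept and b_1, ..., b_n are the coefficients.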
  4. Objective: The goal of regression is to fit the model so that the error, i.e. the difference between the predicted and actual values, is as small as possible. This error is often measured using metrics like Mean Squared Error (MSE) or Root Mean Squared Error (RMSE).
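
As a rough illustration of the simple linear regression form y = mx + c, the sketch below fits m and c by ordinary least squares using the closed-form solution. The function name `fit_simple_linear_regression` and the house-size data are hypothetical, chosen only to mirror the house-price example above.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (m, c) that minimize the sum of squared errors for y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = y_mean - m * x_mean
    return m, c


# Hypothetical data: size in square feet and price in thousands,
# generated exactly from price = 0.15 * size + 50 (no noise).
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 275, 350, 425, 500]

m, c = fit_simple_linear_regression(sizes, prices)
predicted = m * 1800 + c  # predicted price for an 1800 sq ft house
print(f"m = {m:.3f}, c = {c:.1f}, prediction = {predicted:.1f}")
```

Because the toy data lie exactly on a line, the fit recovers the slope and intercept used to generate them. Real data would contain noise, and the fitted line would only approximate it.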

Formulae:

$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

RSS (residual sum of squares) is the sum of the squared differences between the actual and predicted values.

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

MSE calculates the average squared difference between the actual and predicted values.

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

RMSE is the square root of the MSE and provides the error in the same units as the response variable.

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

MAE measures the average absolute differences between actual and predicted values.

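In these formulas, n is the number of observations, y_i is the actual value of the i-th observation, and \hat{y}_i is the corresponding predicted value.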
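
The four error metrics above can be computed directly from paired lists of actual and predicted values. The following is a minimal sketch; the helper name `regression_metrics` and the sample numbers are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Compute RSS, MSE, RMSE and MAE for paired actual/predicted values."""
    n = len(actual)
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    rss = sum(r ** 2 for r in residuals)       # sum of squared residuals
    mse = rss / n                              # average squared residual
    rmse = math.sqrt(mse)                      # same units as y
    mae = sum(abs(r) for r in residuals) / n   # average absolute residual
    return {"RSS": rss, "MSE": mse, "RMSE": rmse, "MAE": mae}


# Hypothetical values; the residuals are 1, 0, -1 and -2.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

metrics = regression_metrics(actual, predicted)
print(metrics)  # RSS = 6.0, MSE = 1.5, MAE = 1.0, RMSE is about 1.225
```

Note that the single residual of -2 contributes 4 of the 6 units of RSS, which shows how the squared metrics weight large errors more heavily than MAE does.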


Classification:

  1. Definition: Classification is a supervised learning task that assigns data to discrete classes or categories based on labeled past observations. It predicts the class label for a given input.

  2. Example: Email spam classification, where emails are classified as spam or non-spam.

  3. Mathematical Representation: Classification problems can be solved using various algorithms such as logistic regression, support vector machines, decision trees, or neural networks. The basic logistic regression model can be represented as:

    • Logistic Regression:

$$h(x) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n)}}$$
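
To make the logistic regression model concrete, here is a small sketch that evaluates h(x) for given coefficients and turns the resulting probability into a class label. The function names, the coefficient values, and the 0.5 threshold are illustrative assumptions; in practice the coefficients would be learned from training data.

```python
import math


def logistic_predict(features, coefficients, intercept):
    """Return h(x) = 1 / (1 + e^(-z)) with z = b0 + b1*x1 + ... + bn*xn."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def classify(features, coefficients, intercept, threshold=0.5):
    """Label the input 1 if the predicted probability reaches the threshold, else 0."""
    probability = logistic_predict(features, coefficients, intercept)
    return 1 if probability >= threshold else 0


# Hypothetical coefficients b1 = 2.0, b2 = -1.0 and intercept b0 = -0.5.
coefficients = [2.0, -1.0]
intercept = -0.5

p = logistic_predict([1.0, 0.5], coefficients, intercept)  # z = 1.0
label = classify([1.0, 0.5], coefficients, intercept)
print(f"probability = {p:.3f}, label = {label}")
```

The output of h(x) always lies between 0 and 1, so it can be read as the estimated probability of the positive class. The threshold controls where the decision boundary falls.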

  4. Objective: In classification, the algorithm aims to minimize misclassification by adjusting the decision boundary that separates the classes. Common evaluation metrics include accuracy, precision, recall, F1 score, and the ROC curve with its area under the curve (AUC).

Formulae:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

Accuracy measures the proportion of correct predictions out of the total predictions made.

$$Precision = \frac{TP}{TP + FP}$$

Precision measures the proportion of predicted positives that are actually positive.

$$Recall = \frac{TP}{TP + FN}$$

Recall measures the proportion of actual positives that were correctly identified.

$$F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$

F1 score is the harmonic mean of precision and recall, providing a balance between the two metrics.

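In these formulas, TP (true positives) and TN (true negatives) are the numbers of positive and negative instances predicted correctly, FP (false positives) is the number of negative instances wrongly predicted as positive, and FN (false negatives) is the number of positive instances wrongly predicted as negative.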
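
The classification metrics above follow directly from the four confusion-matrix counts. The sketch below assumes a hypothetical helper `classification_metrics` and made-up counts. Returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical spam-filter results on 100 emails:
# 40 spam caught, 45 non-spam passed, 5 non-spam flagged, 10 spam missed.
scores = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this example accuracy is 0.85 while recall is 0.80, which illustrates why a single metric rarely tells the whole story: the filter still misses one in five spam emails.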


Differences between Regression and Classification:

  1. Output Type: Regression predicts continuous values, while classification predicts discrete class labels.
  2. Evaluation: Regression is evaluated using metrics such as RMSE and MAE, while classification typically uses accuracy, precision, recall, and F1 score.
  3. Models: Regression can use models like linear regression, polynomial regression, and neural networks. Classification can use models like logistic regression, decision trees, and SVMs.
