linear_regression


"/home/yossef/notes/personal/ml/linear_regression.md"

path: personal/ml/linear_regression.md

- **fileName**: linear_regression
- **Created on**: 2025-04-30 02:43:06

Linear regression is a statistical method used to model the relationship
between a dependent variable and one or more independent variables.

In other words, it builds a model by representing the relation between some
variables or characteristics, finding a pattern that connects them, and
using that relation to make predictions.

Linear regression is a type of supervised learning (learning from labeled data):
it fits the dataset's points with the most optimized linear function.

Example:

If we want to predict house price, we consider various factors such as house age,
distance from the main road, location, area, and number of rooms.
Linear regression uses all these parameters to predict house price, as it assumes
a linear relation between these features and the price of the house.

It works based on the best-fit line.

Best-fit line: the error between the predicted values and the actual values
should be minimal, so the relation between the variables is represented by
the line that fits the data best.

[figure: linear regression, fitted line showing the relation between X and Y]

Here Y is called the dependent or target variable and X is called the
independent variable, also known as the predictor of Y. Many types of
functions or models can be used for regression; a linear function is the
simplest. Here, X may be a single feature or multiple features representing
the problem.

Example: Predicting Salary Based on Experience

Data:

| Experience (Years) (X) | Salary ($1000s) (Y) |
| ---------------------- | ------------------- |
| 1                      | 30                  |
| 2                      | 35                  |
| 3                      | 40                  |
| 4                      | 45                  |
| 5                      | 50                  |

You can see there's a linear relationship:
as experience increases, salary increases.

Linear Regression Goal:

We want to find a line (function) that best predicts salary (Y) given years
of experience (X).

The line has the form:

\( Y = mX + b \)

Where:

- m = slope (how much Y changes when X increases by one unit)
- b = intercept (the value of Y when X = 0)
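
Before using any library, m and b can be computed directly from the
least-squares formulas; a minimal NumPy sketch using the salary table above:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])       # experience (years)
y = np.array([30, 35, 40, 45, 50])  # salary ($1000s)

# Least-squares estimates: m = cov(x, y) / var(x), b = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

print(f"Y = {m:.1f}X + {b:.1f}")  # Y = 5.0X + 25.0
```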

scikit-learn example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Years of experience
y = np.array([30, 35, 40, 45, 50])  # Salary in $1000s

# Train the model
model = LinearRegression()
model.fit(X, y)

# Predict
predictions = model.predict(X)

# Plot
plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X, predictions, color='red', label='Regression Line')
plt.xlabel('Experience (Years)')
plt.ylabel('Salary ($1000s)')
plt.title('Linear Regression Example')
plt.legend()
plt.show()
```
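
Since the toy data lies exactly on the line \( Y = 5X + 25 \), the fitted
model recovers these parameters; `coef_` and `intercept_` are the standard
`LinearRegression` attributes:

```python
print(model.coef_[0])    # 5.0  -> slope m
print(model.intercept_)  # 25.0 -> intercept b
```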

[figure: output plot of the regression example, fitted line over the data points]

Some conditions (assumptions) must hold for linear regression to give accurate results:

Case 1: Homoscedasticity

| Experience (X) | Actual Salary (Y) | Predicted Salary (Y') | Error (Residual) |
| -------------- | ----------------- | --------------------- | ---------------- |
| 1 year         | 30,000            | 31,000                | -1,000           |
| 2 years        | 35,000            | 34,000                | +1,000           |
| 3 years        | 40,000            | 41,000                | -1,000           |
| 4 years        | 45,000            | 44,000                | +1,000           |
| 5 years        | 50,000            | 49,000                | +1,000           |

The errors are small and stay roughly the same → the model is valid.

Case 2: Heteroscedasticity

| Experience (X) | Actual Salary (Y) | Predicted Salary (Y') | Error (Residual) |
| -------------- | ----------------- | --------------------- | ---------------- |
| 1 year         | 30,000            | 29,000                | +1,000           |
| 2 years        | 35,000            | 33,000                | +2,000           |
| 3 years        | 40,000            | 37,000                | +3,000           |
| 4 years        | 45,000            | 40,000                | +5,000           |
| 5 years        | 50,000            | 42,000                | +8,000           |

The error increases as experience increases → the model is not reliable.
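
One way to check which case you are in is to plot the residuals against the
predicted values: a roughly constant band suggests homoscedasticity, while a
widening funnel suggests heteroscedasticity. A minimal sketch, reusing
`model`, `X`, and `y` from the scikit-learn example above:

```python
residuals = y - model.predict(X)

plt.scatter(model.predict(X), residuals, color='blue')
plt.axhline(0, color='red', linestyle='--')  # zero-error reference line
plt.xlabel('Predicted Salary ($1000s)')
plt.ylabel('Residual (Actual - Predicted)')
plt.title('Residuals vs Predicted Values')
plt.show()
```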

2. Multiple Linear Regression

Multiple linear regression involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:

\( y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \)

Where:

- \( y \) = the dependent variable
- \( \beta_0 \) = the intercept
- \( \beta_1, \dots, \beta_n \) = the coefficients of the independent variables
- \( X_1, \dots, X_n \) = the independent variables (features)

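As a quick sketch of how this looks in code, scikit-learn's `LinearRegression`
accepts a 2-D feature matrix, so the same API covers multiple predictors
(the house data below is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: columns are [size (m^2), number of rooms]
X = np.array([[50, 2], [80, 3], [120, 4], [150, 5]])
y = np.array([100, 160, 240, 300])  # price ($1000s)

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)  # beta_0
print(model.coef_)       # [beta_1, beta_2]
```
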
No Multicollinearity (in Multiple Linear Regression)

Definition:
There should be little or no correlation between
independent variables in a multiple linear regression model.

Why it matters:

If two or more independent variables are highly
correlated
(i.e., they provide overlapping information), the
model can't accurately determine which variable is actually
affecting the dependent variable.

What is Multicollinearity?

Example:

Suppose you're trying to predict house prices based on:

- X₁ = the size of the house
- X₂ = the number of rooms

But usually, bigger houses have more rooms, so X₁ and X₂ are highly correlated.

As a result:

- The model can't tell how much of the price comes from the size and how much
from the number of rooms, so the estimated coefficients become unstable and
hard to interpret.

Problem:

If there's multicollinearity, multiple linear regression
becomes inaccurate and less interpretable.

Tip for Detecting Multicollinearity:

Check the pairwise correlations between the independent variables, or compute
the Variance Inflation Factor (VIF) for each predictor; a VIF well above
~5-10 is a common warning sign.
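
A minimal sketch using statsmodels' `variance_inflation_factor` (the
size/rooms matrix below is hypothetical and deliberately correlated):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical, deliberately correlated predictors: size (m^2) and rooms
X = np.array([[50.0, 2], [80, 3], [120, 4], [150, 5], [200, 7]])
X = sm.add_constant(X)  # VIF is computed against a model with an intercept

# Column 0 is the constant, so start from 1; a high VIF means the
# predictor is strongly correlated with the other predictors
for i in range(1, X.shape[1]):
    print(f"VIF for feature {i}: {variance_inflation_factor(X, i):.1f}")
```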

Additivity (in Multiple Linear Regression)

Definition:
The model assumes that the effect of changes in a predictor variable
on the response variable is consistent regardless of the values of
the other variables. This assumption implies that there is no
interaction
between variables in their effects on the dependent
variable.

Why it matters:

If this assumption is violated (i.e., there are interactions
between variables), the model will not be able to account for the
combined effects properly, leading to inaccurate predictions and
interpretations.

Example:

Suppose we are predicting house prices using two variables:

- X₁ = the size of the house
- X₂ = the number of rooms
The assumption of additivity implies that the effect of increasing
the size (X₁) on the price will be the same regardless of how many
rooms (X₂) the house has.
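
If this assumption is violated, a common extension (not pursued further in
this note) is to add an interaction term, which lets the effect of size
depend on the number of rooms:

\( y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \cdot X_2) \)

Here \( \beta_3 \) captures how the effect of \( X_1 \) changes with \( X_2 \).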

Feature Selection in Multiple Linear Regression

Feature selection is the process of choosing the most relevant
predictor variables to include in the model. This helps to:

- reduce overfitting
- improve interpretability
- avoid noise from irrelevant or redundant predictors

Selecting the best features ensures that the model doesn't use
irrelevant or redundant predictors that might introduce noise and
increase variance.

Overfitting: overfitting occurs when the model fits the training data too closely,
capturing noise rather than the underlying pattern, so it generalizes poorly to new data.

Real-World Applications of Multiple Linear Regression (MLR)

Real Estate Pricing

MLR is used to predict property prices based on multiple factors such as:

- location
- area
- number of rooms
- age of the house

Financial Forecasting

MLR helps predict stock prices or economic indicators
based on multiple influencing factors, including:

- interest rates
- inflation
- market trends

Agricultural Yield Prediction

Farmers can apply MLR to estimate crop yields using variables like:

- rainfall
- temperature
- fertilizer use

E-commerce Sales Analysis

An e-commerce company can use MLR to analyze how various factors
impact sales, such as:

- advertising spend
- discounts
- seasonality

For the cost function used in linear regression, see:

./cost_function_linear_regression.md

Gradient Descent for Linear Regression

A linear regression model can be trained using the optimization
algorithm gradient descent. This method iteratively updates
the model’s parameters to reduce the Mean Squared Error (MSE)
on a training dataset.

Goal: Minimize the Cost Function

The goal is to minimize the cost function (typically the MSE),
which results in a best-fit line for the data. To do this,
we update the parameters \( \theta_1 \) and \( \theta_2 \)
to steadily reduce the error.

We start with random initial values for \( \theta_1 \) and
\( \theta_2 \), then update them iteratively using the gradient
descent algorithm:

\( \theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j} \)

Where:

- \( \theta_j \) = the j-th model parameter
- \( \alpha \) = the learning rate (the size of each update step)
- \( \frac{\partial J}{\partial \theta_j} \) = the partial derivative of the cost function \( J \) with respect to \( \theta_j \)

Iterative Updates

Each update step moves \( \theta_1 \) and \( \theta_2 \) slightly
closer to the values that minimize the cost function.

Over many iterations, gradient descent converges toward the
optimal parameters for the best linear regression line.
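
A minimal NumPy sketch of this loop on the salary data, treating
\( \theta_1 \) as the intercept and \( \theta_2 \) as the slope (the
learning rate and iteration count are arbitrary choices for this example):

```python
import numpy as np

x = np.array([1.0, 2, 3, 4, 5])       # experience (years)
y = np.array([30.0, 35, 40, 45, 50])  # salary ($1000s)

theta1, theta2 = 0.0, 0.0  # intercept and slope, initialized at zero
alpha = 0.01               # learning rate
n = len(x)

for _ in range(10_000):
    y_pred = theta1 + theta2 * x
    error = y_pred - y
    # Partial derivatives of the MSE cost J with respect to each parameter
    grad1 = (2 / n) * np.sum(error)
    grad2 = (2 / n) * np.sum(error * x)
    # Gradient descent update: theta_j := theta_j - alpha * dJ/dtheta_j
    theta1 -= alpha * grad1
    theta2 -= alpha * grad2

print(f"intercept = {theta1:.2f}, slope = {theta2:.2f}")  # converges to 25 and 5
```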

[figure: gradient descent converging to the best-fit regression line]

Example using PyTorch:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# 1. Generate synthetic data
# Experience (in years)
X = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])
# Salary in thousands
Y = torch.tensor([[30.0], [35.0], [40.0], [45.0], [50.0]])

# 2. Define a simple linear regression model
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)  # input and output are both 1D

    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel()

# 3. Loss function: Mean Squared Error
criterion = nn.MSELoss()

# 4. Optimizer: Gradient Descent (SGD)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 5. Training the model
epochs = 200
loss_values = []

for epoch in range(epochs):
    model.train()
    
    # Forward pass: compute predicted y
    y_pred = model(X)

    # Compute loss
    loss = criterion(y_pred, Y)
    loss_values.append(loss.item())

    # Zero gradients, backward pass, update weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Plot every 50 epochs
    if epoch % 50 == 0:
        plt.figure()
        plt.scatter(X.numpy(), Y.numpy(), color='blue', label='Actual')
        plt.plot(X.numpy(), y_pred.detach().numpy(), color='red', label='Predicted')
        plt.title(f'Epoch {epoch} | Loss: {loss.item():.2f}')
        plt.xlabel('Experience (Years)')
        plt.ylabel('Salary (Thousands)')
        plt.legend()
        plt.grid(True)
        plt.show()

# 6. Final prediction
final_pred = model(X).detach().numpy()

# 7. Confusion matrix (for demonstration only: confusion matrices are
# classification metrics; regression is usually judged with MSE or R^2)
# Convert salaries to discrete bins to simulate classification
actual_classes = (Y // 5).numpy().flatten().astype(int)
predicted_classes = (final_pred // 5).flatten().astype(int)

# Compute and show confusion matrix
cm = confusion_matrix(actual_classes, predicted_classes)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.title("Confusion Matrix (Binned Salaries)")
plt.show()

# 8. Plot loss over epochs
plt.figure()
plt.plot(loss_values)
plt.title("Loss over Epochs")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.grid(True)
plt.show()
```
