Understanding Linear and Logistic Regression: Core Machine Learning Concepts
Linear Regression and Logistic Regression are fundamental concepts in machine learning that serve as baseline solutions for regression and classification problems, respectively. These concepts, along with techniques like GDR (Gradient Descent Rule) and non-linear weight optimization, now form the foundation of deep learning.
Linear Regression
Linear regression models the relationship between explanatory variables and a scalar response using a linear approach called a "linear model". A key restriction applies: the conditional mean of the response must be expressible as an affine function of the explanatory variables. The most common fitting algorithms for linear regression are least squares and Newton's method.
Formulation
In matrix form the model is $y = Xw + \varepsilon$, where $y$ is the vector of scalar responses:
$X$ represents a vector of observations, which can be a multi-dimensional matrix.
$w$ represents the model parameters, whose dimension equals the number of explanatory variables.
$\varepsilon$ represents possible error.
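As an illustration, here is a minimal NumPy sketch that fits this model with ordinary least squares; the data, seed, and variable names are synthetic and chosen only for the example.

```python
import numpy as np

# Synthetic data for y = Xw + eps (purely illustrative).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # intercept + 2 features
w_true = np.array([1.0, 2.0, -0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)                # observations with error

# Closed-form least-squares solution: w = argmin ||Xw - y||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to [1.0, 2.0, -0.5]
```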
Key Concepts and Limitations
While deep learning and other advanced machine learning methods have largely superseded linear regression, it remains more cost-effective in certain cases.
Exogeneity is the property that the explanatory variables are not related to the model's error.
Strict Exogeneity: The model maintains exogeneity over an extended period.
Weak Exogeneity: The model only maintains exogeneity over the current period.
Deterministic: The model maintains exogeneity for past periods but not for current and future periods.
Linearity means the response can be expressed as a linear combination of the parameters and explanatory variables.
Constant Variance means the model's error spread remains independent of the predicted value. For example, if the model predicts an individual's income as 1000, their actual income might range from 800 to 1200, and that spread should stay the same at every prediction level (see the sketch after this list).
Independence of Errors means that errors are not correlated with each other. This is one of the major limitations of linear regression, though it can be addressed through data regularization or Bayesian linear regression.
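A quick way to see the constant-variance assumption is to compare the residual spread at low and high predictions; the sketch below uses synthetic data and hypothetical scale values.

```python
import numpy as np

rng = np.random.default_rng(1)
pred = np.linspace(100, 1000, 500)          # hypothetical predicted incomes

# Constant variance: the error spread is the same at every prediction level.
resid_const = rng.normal(scale=100, size=500)
# Non-constant variance: the spread grows with the prediction, violating the assumption.
resid_grow = rng.normal(scale=0.2 * pred)

for name, r in [("constant", resid_const), ("growing", resid_grow)]:
    lo, hi = r[pred < 400], r[pred > 700]   # spread at low vs high predictions
    print(name, round(lo.std(), 1), round(hi.std(), 1))
```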
Understanding the Learning Rule: Fitting the Model with GDR, the Gradient Descent Rule
GDR (Gradient Descent Rule) is a learning rule and optimization technique for linear regression that helps fit the model to the problem. It minimizes the Cost Function by updating weights. This approach has become the fundamental workflow for optimization in modern machine learning and deep learning.
1. Initialize the weight $w$ to $0$ or a random number.
2. Calculate the relationship between the model and real-world observations using the cost function $J(w)$.
3. Until $J(w)$ is fully minimized, the algorithm continues calculating $w' = w - \alpha \nabla J(w)$, where $w'$ is the newly updated weight, $w$ is the previous weight, and $\alpha$ is the learning rate.
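A minimal NumPy sketch of this loop for linear regression, assuming a mean-squared-error cost, a fixed learning rate alpha, and synthetic data:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, epochs=1000):
    """Fit w for y ~ Xw by repeatedly applying w' = w - alpha * dJ/dw."""
    m, n = X.shape
    w = np.zeros(n)                          # step 1: initialize weights to 0
    for _ in range(epochs):
        y_hat = X @ w
        grad = (X.T @ (y_hat - y)) / m       # gradient of J(w) = (1/2m) * ||Xw - y||^2
        w = w - alpha * grad                 # step 3: update until J(w) stops decreasing
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([0.5, 3.0]) + rng.normal(scale=0.1, size=200)
print(gradient_descent(X, y))                # close to [0.5, 3.0]
```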
Logistic Regression
The Logistic Model (or Logit Model) is a statistical method that predicts the log-odds of an event using a linear combination of variables. Its parameters are most commonly fit by minimizing Cross-Entropy Loss (or Log Loss), which differs from linear least squares, although the fitting procedure can still be expressed as iteratively reweighted least squares.
Formulation
The input $x$ is called the feature vector, while the output $y$ is called the label.
$z = w^\top x + b$ represents the linear combination of inputs and weights;
while $z$ can be any real number, $\sigma(z) = \frac{1}{1 + e^{-z}}$ (called the Sigmoid Function) maps it to a probability space between $0$ and $1$.
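Put together, the forward pass is just the linear combination followed by the sigmoid; a small sketch with made-up feature values and weights:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # z = Xw + b is the linear combination; sigmoid(z) is the predicted probability.
    return sigmoid(X @ w + b)

X = np.array([[0.5, 1.2], [-1.0, 0.3]])    # two hypothetical feature vectors
w, b = np.array([2.0, -1.0]), 0.1
print(predict_proba(X, w, b))              # probabilities strictly between 0 and 1
```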
The Sigmoid/Logistic Function as an Activation Function
An Activation Function is a mathematical function applied to the output. Its main purposes are adding non-linearity to the model and constraining the output range to help make better decisions; most image recognition and NLP models cannot work without one.
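The need for non-linearity can be seen directly: without an activation, stacked linear layers collapse into a single linear map. A sketch with arbitrary weight matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=3)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

# Two linear layers with no activation equal one linear layer (W2 @ W1).
no_activation = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(no_activation, collapsed))    # True

# Inserting a sigmoid between the layers breaks this equivalence,
# which is what lets a network model non-linear relationships.
with_activation = W2 @ (1 / (1 + np.exp(-(W1 @ x))))
print(np.allclose(with_activation, collapsed))  # False
```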
Cross-entropy/Log Loss
Cross-Entropy Loss fits or evaluates the parameters via the log-likelihood, which differs slightly from least squares. It keeps the cost function convex during gradient descent and penalizes wrong predictions more heavily when the model is "confident but wrong".
To minimize the loss $J(w)$, update the weights using its gradient, where
$$J(w) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log\big(1 - \hat{y}_i\big) \Big], \qquad \hat{y}_i = \sigma(w^\top x_i + b).$$
Where the gradient is:
$$\frac{\partial J(w)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \big( \hat{y}_i - y_i \big)\, x_{ij}$$
Vectorized update rule from the above:
$$w' = w - \frac{\alpha}{m}\, X^\top \big( \hat{y} - y \big)$$
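The vectorized rule translates directly into code; a minimal sketch assuming a fixed learning rate and synthetic binary labels:

```python
import numpy as np

def train_logistic(X, y, alpha=0.5, epochs=2000):
    """Minimize cross-entropy via the vectorized update w' = w - (alpha/m) X^T (y_hat - y)."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        y_hat = 1 / (1 + np.exp(-(X @ w)))      # sigmoid of the linear combination
        w -= (alpha / m) * (X.T @ (y_hat - y))  # gradient of the cross-entropy loss
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = (X @ np.array([-0.5, 2.0]) + rng.normal(scale=0.5, size=300) > 0).astype(float)
print(train_logistic(X, y))  # weights roughly aligned with the true decision boundary
```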