Introduction#
These notes are written as a quick reference on ordinary least squares (OLS) regression. I have more detailed notes in PDF form here.
Suppose we have a dataset $\{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}$ is a scalar covariate and $y_i \in \mathbb{R}$ is the response. We assume this data is drawn i.i.d. from some unknown distribution $p(x, y)$. Since we're interested in predicting $y$ from $x$, it's natural to factor the joint distribution as:

$$p(x, y) = p(y \mid x)\, p(x)$$
In linear regression, we posit a parametric model for the conditional distribution:

$$p(y \mid x) = \mathcal{N}\left(y \mid \beta_0 + \beta_1 x,\ \sigma^2\right)$$
where $\beta_0, \beta_1$ are the model parameters and $\sigma^2$ is a fixed noise variance. This is equivalent to the model:

$$y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$
The goal is to find parameters $\beta_0$ and $\beta_1$ that minimize the squared residuals. Define the loss function:

$$L(\beta_0, \beta_1) = \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$
Minimizing $L$ with respect to $\beta_0$ and $\beta_1$ yields the closed-form solution:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$.
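To make this concrete, here is a minimal NumPy sketch of the closed-form formulas above (the simulated data and variable names are just for illustration):

```python
import numpy as np

# Illustrative data; in practice x and y come from your dataset.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

x_bar, y_bar = x.mean(), y.mean()

# Closed-form OLS estimates for the univariate model y = beta0 + beta1 * x + eps.
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # should be close to 2.0 and 3.0
```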
Multivariate Case#
Now consider the general multivariate case where each input is a vector $x_i \in \mathbb{R}^d$. To include the intercept $\beta_0$, we prepend a 1 to each input vector. That is, we define the augmented input vector $\tilde{x}_i = (1, x_{i1}, \dots, x_{id})^\top \in \mathbb{R}^{d+1}$.
Stack the data into matrices:
- Let $X \in \mathbb{R}^{n \times (d+1)}$ be the design matrix with rows $\tilde{x}_i^\top$,
- Let $y \in \mathbb{R}^n$ be the response vector.
The model is:

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)$$
where $\beta \in \mathbb{R}^{d+1}$ includes the intercept term $\beta_0$ as its first entry.
Maximizing the likelihood (or equivalently minimizing squared error) yields the OLS estimator:

$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$
This formula generalizes the scalar case and provides a fast, closed-form solution when $X^\top X$ is invertible. We discuss when $X^\top X$ is invertible in the section on multicollinearity.
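Here is a minimal NumPy sketch of this estimator, assuming the design matrix is built by prepending a column of ones; it uses np.linalg.lstsq, which solves the same least-squares problem but is numerically more stable than explicitly inverting $X^\top X$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X_raw = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])    # intercept first
X = np.column_stack([np.ones(n), X_raw])       # prepend the column of ones
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS estimate: minimizes ||y - X beta||^2; equals (X^T X)^{-1} X^T y
# when X^T X is invertible, but avoids forming the explicit inverse.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to beta_true
```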
Assumptions#
Before we apply the OLS estimator, it’s worth reviewing the key assumptions that underpin it. These assumptions aren’t always strictly satisfied in practice — in fact, they’re often violated to some degree. But knowing what assumptions we’re making helps us understand when and why the OLS results might be misleading.
OLS Regression Assumptions#
- Linearity
- Random sampling
- No perfect multicollinearity
- Weak exogeneity
- Homoscedasticity
- Uncorrelated errors
- Errors follow a normal distribution
- Model specification
Linearity#
This means that the response $y$ is linear with respect to the parameters $\beta$. For example, if we had two covariates $x_1$ and $x_2$, we are assuming that $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$. The covariates may themselves be nonlinear functions of an underlying variable, e.g. we could model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$.
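As an illustrative sketch (simulated data, assumed variable names), the quadratic model above is nonlinear in $x$ but still linear in the parameters; the design matrix just gains an $x^2$ column:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.2, size=100)

# Linear in (beta0, beta1, beta2), even though the feature x^2 is nonlinear in x.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [1.0, 0.5, -2.0]
```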
Random sampling#
The data samples $(x_i, y_i)$ are assumed to be i.i.d.
Homoscedasticity#
Residuals have constant variance: $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for all $i$.
Uncorrelated errors#
Residuals are not autocorrelated. That is, they satisfy

$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$$

for $i \neq j$.
No perfect multicollinearity#
If you have multiple covariates, perfect multicollinearity means that a subset of the covariates is linearly dependent. The simplest example would be $x_2 = c\, x_1$ for some constant $c$. More generally, it means there exists a relationship like $x_j = \sum_{k \in S} c_k x_k$, where $S$ is the set of indices of the multicollinear covariates. Although the assumption only rules out perfect multicollinearity, OLS regression already becomes unreliable under strong (but imperfect) multicollinearity, even though the asymptotic properties still hold.
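One simple way to see the difference numerically is to inspect the rank and condition number of $X^\top X$; the data below is simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 2.0 * x1                        # perfectly collinear with x1
x3 = x1 + 0.01 * rng.normal(size=n)  # strongly (but not perfectly) collinear

X_perfect = np.column_stack([np.ones(n), x1, x2])
X_strong = np.column_stack([np.ones(n), x1, x3])

# Perfect multicollinearity: X^T X is singular (rank-deficient), so it cannot be inverted.
print(np.linalg.matrix_rank(X_perfect.T @ X_perfect))  # 2 < 3
# Strong multicollinearity: invertible but ill-conditioned,
# so the coefficient estimates become unstable.
print(np.linalg.cond(X_strong.T @ X_strong))           # very large
```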
Errors follow a normal distribution#
This is optional, but we typically assume that the residuals are i.i.d. Gaussian:

$$\varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2).$$
Model specification#
This is a subtle and easy-to-forget assumption, which essentially says that we assume the model is true. In other words, not only do we assume that the true relationship is linear, but also that the only covariates that matter are precisely the ones we included in our formula. This means that we're assuming there are no other omitted covariates that cause the response (I'm using the word "cause" very loosely here).
Significance Testing#
After fitting the OLS model, we often want to assess whether each coefficient is statistically different from zero. This is typically done using a t-test, which tests the null hypothesis $H_0: \beta_j = 0$ against the alternative $H_1: \beta_j \neq 0$.
General form#
The test statistic for coefficient $\hat{\beta}_j$ is given by:

$$t_j = \frac{\hat{\beta}_j}{\widehat{\mathrm{SE}}(\hat{\beta}_j)}$$
where $\widehat{\mathrm{SE}}(\hat{\beta}_j)$ is the estimated standard error of $\hat{\beta}_j$. Under the null hypothesis, and assuming the OLS assumptions hold (particularly homoscedasticity and Gaussian errors), this statistic approximately follows a Student's t-distribution with $n - p - 1$ degrees of freedom, where $p$ is the number of covariates. (For example, if our model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, then $p = 2$ and there are $n - 3$ degrees of freedom.)
Univariate case#
For the simple regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, we define the residual variance:

$$\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^n \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$
Then the standard error of the slope estimator is:

$$\widehat{\mathrm{SE}}(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}$$
and the t-statistic becomes:

$$t = \frac{\hat{\beta}_1}{\widehat{\mathrm{SE}}(\hat{\beta}_1)}$$
The corresponding p-value is computed from the t-distribution with $n - 2$ degrees of freedom.
The standard error of the intercept is:

$$\widehat{\mathrm{SE}}(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right)}$$
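Here is a sketch of the univariate formulas, with the p-value taken from scipy's t-distribution (the simulated data and names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx
beta0_hat = y_bar - beta1_hat * x_bar

residuals = y - beta0_hat - beta1_hat * x
sigma2_hat = np.sum(residuals ** 2) / (n - 2)              # residual variance

se_beta1 = np.sqrt(sigma2_hat / sxx)                       # slope standard error
se_beta0 = np.sqrt(sigma2_hat * (1 / n + x_bar**2 / sxx))  # intercept standard error

t_beta1 = beta1_hat / se_beta1
# Two-sided p-value from the t-distribution with n - 2 degrees of freedom.
p_beta1 = 2 * stats.t.sf(np.abs(t_beta1), df=n - 2)
print(t_beta1, p_beta1)
```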
Multivariate case#
In the multivariate case, the estimated covariance matrix of $\hat{\beta}$ is:

$$\widehat{\mathrm{Cov}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$$
where $\hat{\sigma}^2 = \frac{1}{n - p - 1} \sum_{i=1}^n (y_i - \hat{y}_i)^2$ is the residual variance.
Then for the $j$-th coefficient, the standard error is:

$$\widehat{\mathrm{SE}}(\hat{\beta}_j) = \sqrt{\left[\hat{\sigma}^2 (X^\top X)^{-1}\right]_{jj}}$$
and the t-statistic is:

$$t_j = \frac{\hat{\beta}_j}{\widehat{\mathrm{SE}}(\hat{\beta}_j)}$$
The derivation of these formulas is straightforward in the multivariate case: $\hat{\beta} = (X^\top X)^{-1} X^\top y$ is linear in $y$ and $\mathrm{Cov}(\varepsilon) = \sigma^2 I$, so $\mathrm{Cov}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}$, and plugging in $\hat{\sigma}^2$ gives the estimates above.
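Putting the multivariate pieces together, here is a sketch that computes the standard errors, t-statistics, and p-values from $\hat{\sigma}^2 (X^\top X)^{-1}$ (again with simulated, illustrative data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column first
beta_true = np.array([1.0, -2.0, 0.5, 0.0])                 # last covariate is irrelevant
y = X @ beta_true + rng.normal(scale=1.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

residuals = y - X @ beta_hat
dof = n - p - 1                      # n minus number of parameters (p covariates + intercept)
sigma2_hat = residuals @ residuals / dof

cov_beta = sigma2_hat * XtX_inv      # estimated covariance matrix of beta_hat
se = np.sqrt(np.diag(cov_beta))      # standard errors
t_stats = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t_stats), df=dof)
print(np.column_stack([beta_hat, se, t_stats, p_values]))
```

Statistical packages such as statsmodels report the same coefficient table (estimate, standard error, t-statistic, p-value) in their OLS output, which makes for a useful cross-check.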
Practical use and limitations#
In practice, these t-tests are used to assess the individual relevance of each covariate. A small p-value (e.g., < 0.05) suggests evidence against the null $H_0: \beta_j = 0$, implying the corresponding feature may have predictive value.
However, there are important caveats:
- Multiple comparisons: If many covariates are tested, some will appear significant by chance. Adjustments (e.g. Bonferroni correction) may be needed.
- Model misspecification: If the model omits relevant variables or includes irrelevant ones, the significance test can be misleading.
- Multicollinearity: High correlation between features inflates standard errors, making it harder to detect significance even when the variable matters.
- Non-Gaussian errors: The test assumes normally distributed residuals. When this fails, especially in small samples, the p-values may be unreliable.
For these reasons, t-tests are often viewed as a rough diagnostic rather than a definitive decision tool. In applied settings (including finance), it is common to combine statistical significance with out-of-sample validation and domain knowledge.
Violating the assumptions#
What happens when the assumptions are violated?
Heteroscedasticity#
When the error variance is not constant, the OLS estimator becomes inefficient.