1001Ferramentas
📈Calculators

Linear Regression (Least Squares)

Fit y = a·x + b by least squares; reports slope, intercept, R².

Simple linear regression: fitting y = a·x + b

Simple linear regression fits a straight line y = a·x + b to a set of (x, y) pairs by minimizing the sum of squared residuals (ordinary least squares, OLS). Closed-form solution: a = Σ((xᵢ − x̄)(yᵢ − ȳ)) / Σ(xᵢ − x̄)² and b = ȳ − a·x̄. The coefficient of determination R² = 1 − SS_res / SS_tot measures fit on a 0-to-1 scale. The Gauss-Markov theorem guarantees OLS is the best linear unbiased estimator when residuals are normal, homoscedastic and independent — the four classical assumptions. Method introduced by Legendre (1805) and rigorously analyzed by Gauss (1809). Example: pairs (1,2), (2,3), (3,5), (4,6), (5,8) yield a ≈ 1.5, b ≈ 0.3, R² ≈ 0.987.

Applications

Sales forecasting from marketing spend, classical machine learning (sklearn.linear_model.LinearRegression), experimental physics (Hooke's law F = k·x estimated by fitting measurement points), econometric models (the Phillips curve, demand and supply estimation), and any quick exploratory analysis where you want a baseline relationship between two variables.

FAQ

What does R² mean exactly? The fraction of variance in y explained by x. R² = 0.85 means the line accounts for 85% of the variation; the remaining 15% is residual.

Does high R² imply causation? No. Regression captures association, not causation. Spurious correlations can produce excellent fits without any causal link.

What if the relationship is nonlinear? OLS will underfit. Apply a transformation (log, square root), use polynomial regression, or move to nonlinear models like splines and decision trees.

Multiple predictors? Extend to multiple linear regression: y = a₁x₁ + a₂x₂ + … + b. The matrix form β̂ = (XᵀX)⁻¹Xᵀy generalizes the closed-form solution.

Related Tools