Regression

A regression is a method to calculate the relationships between a dependent variable (Y) and independent variables (Xi).

Linear Regression

You may use linear regression when there is a linear relationship between the independent variable (X) and the dependent variable (Y). When X increases by one unit, Y changes by a constant value, the coefficient b1.

H0: Y = b0
H1: Y = b0 + b1X

Regression calculation

The least squares method is used to calculate the coefficients b0 and b1. The method chooses the line that minimizes the sum of the squared distances between the real values (Yi) and the regression line.
$$Min(\sum_{i=1}^{n}(\hat y_i-y_i)^2)$$
$$b_1=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) }{\sum_{i=1}^{n}(x_i-\bar{x})^2}\\ b_0=\bar{y}-b_1\bar{x}$$
R2 is the proportion of the total (T) variance of Y that is explained by X through the regression.
R is the correlation between X and Y.
$$R=b_1\sqrt{\frac{var(x)}{var(y)}}$$
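The least squares formulas for a single independent variable can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's code: the function name and the sample data are made up for the example, and R is computed as b1 times the ratio of the standard deviations of x and y.

```python
def simple_linear_regression(x, y):
    """Return (b0, b1, r) for the least squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross products around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Correlation: slope rescaled by the ratio of standard deviations
    r = b1 * (sxx / syy) ** 0.5
    return b0, b1, r


# Hypothetical sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r = simple_linear_regression(x, y)
print(b0, b1, r)
```

Squaring r gives the R2 of the simple regression.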

Multiple Regression

When there is more than one independent variable, multiple regression compares the following hypotheses, using the F statistic:
H0: Y = b0
H1: Y = b0+b1X1+...+bpXp

This is an iterative process, as you should also check each coefficient independently against the following hypotheses:
H0: bi = 0
H1: bi ≠ 0

Each time, you should remove only the single most insignificant variable (p-value > α) by changing its include value to χ.
After removing one insignificant variable, other insignificant variables may become significant in the new model.

Assumptions

• Linearity - a linear relationship between the dependent variable, Y and the independent variables, Xi
• Residual normality - the tool runs the Shapiro-Wilk test on each variable, but for the regression the only normality assumption concerns the residuals.
• Homoscedasticity (homogeneity of variance) - the variance of the residuals is constant and doesn't depend on the independent variables Xi
• Variables - the dependent variable, Y, should be a continuous variable, while the independent variables, Xi, should be continuous or ordinal variables (ordinal example: low, medium, high)
• No perfect correlation (multicollinearity) between two or more independent variables, Xi.
• Independent observations

Overfitting

It is tempting to increase the number of independent variables to improve the model fit, but you should be aware that any additional independent variable may improve the fit to the current data without improving the prediction of future data.

Can't calculate the model

The tool will not be able to calculate the model when one of the following problems occurs. Technically, it cannot calculate the inverse of the following matrix product: X'X
• Too many independent variables (Xi) or too small sample size.
Solution: Reduce the number of the independent variables or increase the sample size.
• Multicollinearity: two independent variables (Xi) have a perfect correlation (1).
Solution: Remove one of the variables.
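Both problems show up as a zero determinant of X'X, which means the inverse does not exist. The sketch below checks this with exact fractions; the helper names `gram` and `determinant` and the small matrices are assumptions made for the illustration, not part of the tool.

```python
from fractions import Fraction


def gram(X):
    """Return X'X for a matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(Fraction(a) * Fraction(b) for a, b in zip(ci, cj))
             for cj in cols] for ci in cols]


def determinant(M):
    """Determinant by Gaussian elimination, exact when entries are Fractions."""
    A = [list(row) for row in M]
    n = len(A)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # no usable pivot: the matrix is singular
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det  # a row swap flips the sign
        det *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return det


# Each row is (1, X1, X2); the leading 1 is the intercept column.
collinear = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]]   # X2 = 2 * X1
too_small = [[1, 1, 5], [1, 2, 3]]                         # n = 2 rows, 3 coefficients
usable = [[1, 1, 1], [1, 2, 2], [1, 3, 3], [1, 4, 1], [1, 5, 2], [1, 6, 3]]

for name, X in [("collinear", collinear), ("too small", too_small), ("usable", usable)]:
    print(name, determinant(gram(X)))
```

Only the last matrix yields a nonzero determinant, so only that model can be calculated.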

White test

Tests for homoscedasticity (homogeneity of variance) using the following hypotheses:
$$H_0: \hat\varepsilon_i^2=b_0\\ H_1: \hat\varepsilon_i^2=b_0+b_1\hat Y_i+b_2\hat Y_i^2$$
Here ε is the residual and Ŷ is the predicted Y. The test runs a second regression with the following variables:
Dependent variable: Y' = ε².
Independent variables: X'1 = Ŷ, X'2 = Ŷ².

The tool uses the F statistic from the second regression. Another option is to use the statistic χ² = nR'², where n is the sample size and R'² is the R² of the second regression.
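As a rough sketch of the second regression, the code below regresses the squared residuals on Ŷ and Ŷ² and reports R'², the F statistic, and nR'². The function names and the small linear solver are assumptions made for this illustration, and the residuals and predicted values are taken from the numeric example later on this page. Turning the statistics into p-values would need F or chi-square distribution functions, which are left out here.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(X, y):
    """R^2 of the least squares fit of y on X (X already holds the column of 1s)."""
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    B = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(B, row)) for row in X]
    y_bar = sum(y) / len(y)
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    sst = sum((v - y_bar) ** 2 for v in y)
    return 1 - sse / sst


def white_statistics(residuals, predicted):
    """Return (R'^2, F, chi2) of the regression of e^2 on Y_hat and Y_hat^2."""
    n = len(residuals)
    y2 = [e * e for e in residuals]
    X2 = [[1.0, yh, yh * yh] for yh in predicted]
    r2 = r_squared(X2, y2)
    f_stat = (r2 / 2) / ((1 - r2) / (n - 3))  # p = 2, so DFR = 2 and DFE = n - 3
    chi2 = n * r2
    return r2, f_stat, chi2


residuals = [-0.0625, -0.2, 0.2625, 0.0375, 0.25, -0.2875]
predicted = [2.1625, 4.1, 6.0375, 4.9125, 6.85, 8.7875]
r2, f_stat, chi2 = white_statistics(residuals, predicted)
print(r2, f_stat, chi2)
```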

The regression is robust to violations of the homoscedasticity assumption, but you can try one of the following recommendations to meet it:
• Try to transform the independent variables Xi, for example a square root for a count variable or a log for a skewed variable
• You may be missing an independent variable or a combination of variables (xi, xixj, or xi²)
• Weighted regression

Regression calculation

Calculating the regression's parameters without matrices is very complex, but it is very easy with matrix calculation.
p - number of independent variables.
n - sample size.

Y - dependent variable vector (n x 1).
Ŷ - predicted Y vector (n x 1).
X - independent variables matrix (n x p+1).
Ε - residuals vector (n x 1).
B - coefficients vector (p+1 x 1).
$$Y=\begin{bmatrix} Y_1\\ Y_2\\ :\\ Y_n \end{bmatrix} \quad \hat Y=\begin{bmatrix} \hat Y_1\\ \hat Y_2\\ :\\ \hat Y_n \end{bmatrix} \quad X=\begin{bmatrix} 1 &X_{11} &X_{12} & .. &X_{1p} \\ 1 &X_{21} &X_{22} & .. &X_{2p} \\ : & : & : & : & : \\ 1 &X_{n1} &X_{n2} & .. &X_{np} \end{bmatrix} \quad Ε=\begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ :\\ \varepsilon_n \end{bmatrix} \quad B=\begin{bmatrix} b_0\\ b_1\\ b_2\\ :\\ b_p \end{bmatrix}$$
Y = XB + Ε is equivalent to the following equation for each observation: Y = b0 + b1X1 + b2X2+...+bpXp + ε
$$B = (X'X)^{-1}X'Y\\ \hat Y=XB\\ Ε=Y-\hat Y$$
Calculate the sums of squares, the degrees of freedom, and the mean squares:
$$Total: \space SST=\sum_{i=1}^{n}(Y_i-\bar{Y})^2, \quad DFT=n-1\\ Residual: \space SSE=Ε'Ε, \quad DFE=n-p-1, \quad MSE=\frac{SSE}{DFE} \\ Regression: \space SSR=SST-SSE, \quad DFR=p, \quad MSR=\frac{SSR}{DFR}\\ R \space Squared: \space R^2=1-\frac{SSE}{SST}\\ Regression \space statistic: \space F=\frac{MSR}{MSE} \quad(DFR,DFE)\\ Covariance(B)=MSE(X'X)^{-1}\\ Var(B)=diagonal(Covariance(B))$$
The standard error (SE) vector is the standard deviation of the B vector.
$$SE(B)=\sqrt{Var(B)}$$
The T vector contains the t statistic for the significance of each coefficient, with DFE degrees of freedom.
$$T_i=\frac{B_i}{SE_i} \quad (DFE)$$
Coefficient confidence intervals:
$$Lower=B_i+SE_i \cdot t_{\alpha/2}(DFE)\\ Upper=B_i+SE_i \cdot t_{1-\alpha/2}(DFE)$$

Numeric Example

$$\begin{array}{ccc} X_1 & X_2 & Y\\ \hline 1 & 1 & 2.1\\ 2 & 2 & 3.9\\ 3 & 3 & 6.3\\ 4 & 1 & 4.95\\ 5 & 2 & 7.1\\ 6 & 3 & 8.5 \end{array}$$

The data in matrix form:
$$Y=\begin{bmatrix} 2.1\\ 3.9\\ 6.3\\ 4.95\\ 7.1\\ 8.5 \end{bmatrix} \quad X=\begin{bmatrix} 1 &1 &1 \\ 1 &2 &2 \\ 1 &3 &3 \\ 1 &4 &1 \\ 1 &5 &2 \\ 1 &6 &3 \end{bmatrix}$$
The first column of the X matrix contains only the value 1, for the intercept b0.
$$B = (X'X)^{-1}X'Y\\\\ X'=\begin{bmatrix} 1 &1 &1 &1 &1 &1\\ 1 &2 &3 &4 &5 &6\\ 1 &2 &3 &1 &2 &3 \end{bmatrix} \quad X'X=\begin{bmatrix} 6 &21 &12\\ 21 &91 &46\\ 12 &46 &28 \end{bmatrix} \quad (X'X)^{-1}=\begin{bmatrix} 4/3 & -1/9 & -7/18\\ -1/9 & 2/27 & -2/27\\ -7/18 & -2/27 &35/108 \end{bmatrix}$$
$$H=(X'X)^{-1}X'=\begin{bmatrix} 5/6 & 1/3 & -1/6 & 1/2 & 0 & -1/2\\ -1/9 & -1/9 & -1/9 &1/9 &1/9 &1/9\\ -5/36 & 1/9 & 13/36 & -13/36 & -1/9 & 5/36 \end{bmatrix}$$
$$B=HY=\begin{bmatrix} 0.2250\\ 0.9167\\ 1.0208 \end{bmatrix} \quad \hat Y=XB=\begin{bmatrix} 2.1625\\ 4.1\\ 6.0375\\ 4.9125\\ 6.85\\ 8.7875 \end{bmatrix} \quad Ε=Y-\hat Y=\begin{bmatrix} -0.0625\\ -0.2\\ 0.2625\\ 0.0375\\ 0.25\\ -0.2875 \end{bmatrix}$$

Y = 0.2250 + 0.9167X1 + 1.0208X2
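$$\begin{array}{lccc} & SS & DF & MS\\ \hline \text{Total (T)} & 26.6187 & 5 & 5.3237\\ \text{Residual (E)} & 0.2594 & 3 & 0.08646\\ \text{Regression (R)} & 26.3594 & 2 & 13.1797 \end{array}$$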

R2 = 0.9903
F = 152.4398
$$Covariance(B)=\begin{bmatrix} 0.1153 & -0.0096 & -0.0336\\ -0.0096 & 0.0064 & -0.0064\\ -0.0336 & -0.0064 & 0.0280 \end{bmatrix} \quad Var(B)=\begin{bmatrix} 0.1153\\ 0.0064\\ 0.0280 \end{bmatrix} \quad T=\begin{bmatrix} 0.6627\\ 11.4545\\ 6.0986 \end{bmatrix}$$
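The numeric example can be reproduced with a short script. The sketch below uses exact fractions from the Python standard library so that the intermediate matrices can be compared with the fractions shown above; the helper names (`transpose`, `matmul`, `inverse`) are illustrative assumptions and not the calculator's implementation. Confidence intervals are omitted because they need t-distribution quantiles.

```python
from fractions import Fraction as F
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(M)
    A = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)  # fails if M is singular
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


# (X1, X2, Y) rows of the numeric example; Y is given as text to keep it exact
data = [(1, 1, "2.1"), (2, 2, "3.9"), (3, 3, "6.3"),
        (4, 1, "4.95"), (5, 2, "7.1"), (6, 3, "8.5")]
X = [[F(1), F(x1), F(x2)] for x1, x2, _ in data]
Y = [[F(y)] for _, _, y in data]
n, p = len(X), len(X[0]) - 1

Xt = transpose(X)
XtX_inv = inverse(matmul(Xt, X))
B = matmul(matmul(XtX_inv, Xt), Y)            # B = (X'X)^-1 X'Y
Y_hat = matmul(X, B)
E = [y[0] - yh[0] for y, yh in zip(Y, Y_hat)]

y_bar = sum(y[0] for y in Y) / n
SST = sum((y[0] - y_bar) ** 2 for y in Y)
SSE = sum(e * e for e in E)
SSR = SST - SSE
MSE = SSE / (n - p - 1)
MSR = SSR / p
r_squared = 1 - SSE / SST
f_stat = MSR / MSE
var_b = [MSE * XtX_inv[i][i] for i in range(p + 1)]   # diagonal of Covariance(B)
se_b = [math.sqrt(v) for v in var_b]
t_stats = [float(B[i][0]) / se_b[i] for i in range(p + 1)]

print("B  =", [float(b[0]) for b in B])
print("R2 =", float(r_squared), " F =", float(f_stat))
print("T  =", t_stats)
```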

Logistic Regression

When the dependent variable is a binary variable, also called a dichotomous variable, you should use logistic regression. The model calculates the probability that the category occurs, based on the independent variables, Xj.
The dependent variable Y may have only two values, 1 or 0, for example win or lose, success or failure. The following model requires accumulated input data, grouped by each combination i of X values (x1..xp). A similar, commonly used model is based on single events: each data row is a single event, and Yi may be only 1 or 0.

• y(1)i: the total number of 1 occurrences for combination i of X.
• y(0)i: the total number of 0 occurrences for combination i of X.
• ti: the total number of events for combination i of X, ti=y(1)i+y(0)i
• pi: the observed probability of event=1, pi=y(1)i/ti.
• p̂i: the predicted probability of event=1, based on the model.

Odds is the ratio between the probability that the event will happen and the probability that it won't happen.
The odds carry the same information as the probability, but from a different angle.
$$odds=\frac{p}{1-p}$$
Examples
When P = 1/3 the odds are 1:2 (odds = 0.5).
When P = 1/2 the odds are 1:1 (odds = 1).
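The conversion between probability and odds can be checked with a couple of one-line functions. The function names are chosen for this sketch only; fractions are used so that the results match the examples exactly.

```python
from fractions import Fraction


def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)


def probability(odds_value):
    """Inverse conversion: probability of an event with the given odds."""
    return odds_value / (1 + odds_value)


print(odds(Fraction(1, 3)))         # 1/2
print(odds(Fraction(1, 2)))         # 1
print(probability(Fraction(1, 2)))  # 1/3
```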

H0: ln(odds) = b0
H1: ln(odds) = b0+b1X1+...+bpXp

$$odds(x_1..x_p)=\frac{p(x_1..x_p)}{1-p(x_1..x_p)}=e^{b_0+b_1x_1+...+b_px_p}\quad\Rightarrow\quad p(x_1..x_p)=\frac{1}{1+e^{-(b_0+b_1x_1+...+b_px_p)}}$$
The maximum log-likelihood method is used instead of the least squares method that is used in linear regression. The method is based on the binomial distribution.

Likelihood is the probability that the sample data will occur, and the maximum log-likelihood method finds the p̂i that maximize this probability.
n: the number of X combinations (x1..xp)
$$L=\prod_{i=1}^n \hat p_i^{y_i(1)}(1- \hat p_i)^{y_i(0)}$$
$$LL=ln(L)=\sum_{i=1}^n{(y_i(1) ln(\hat p_i) + y_i(0)ln(1- \hat p_i))}$$
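To make the log-likelihood concrete, the sketch below evaluates LL for accumulated data under two candidate coefficient vectors. The function names, the row layout `(x, y1, y0)`, and the counts are hypothetical and serve only as an illustration.

```python
import math


def predicted_probability(b, x):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bp*xp)))."""
    z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    return 1 / (1 + math.exp(-z))


def log_likelihood(b, rows):
    """rows holds (x, y1, y0): the X combination, the count of 1s, the count of 0s."""
    ll = 0.0
    for x, y1, y0 in rows:
        p_hat = predicted_probability(b, x)
        ll += y1 * math.log(p_hat) + y0 * math.log(1 - p_hat)
    return ll


# Hypothetical accumulated data: one independent variable, 10 events per value
rows = [((1,), 2, 8), ((2,), 5, 5), ((3,), 8, 2)]

ll_null = log_likelihood([0.0, 0.0], rows)                         # every p_hat is 0.5
ll_fit = log_likelihood([-2 * math.log(4), math.log(4)], rows)     # p_hat = 0.2, 0.5, 0.8
print(ll_null, ll_fit)
```

The second coefficient vector reproduces the observed proportions (0.2, 0.5, 0.8), so its log-likelihood is higher than that of the all-zero vector.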

Newton's Method

We use Newton's method to find the vector of parameters B that maximizes the log-likelihood function, based on the following iteration formula. The iteration loop runs until the difference between BR+1 and BR approaches zero for each coefficient. The initial vector is B0=[0,0,..0].
$$V=\begin{bmatrix} t_1 \hat p_1(1- \hat p_1) & 0 & 0 & 0\\ 0 & t_2 \hat p_2(1- \hat p_2) & 0 & 0 \\ 0 & 0 & ...& 0\\ 0 & 0 & 0 & t_n \hat p_n(1- \hat p_n) \end{bmatrix}$$
R: iteration.
T: ti vector.
P: pi vector (observed probabilities).
P̂: p̂i vector (predicted probabilities).
B: bi vector (coefficients).
$$B_{R+1}=B_R+(X'V_RX)^{-1}X'(T \odot (P- \hat P_R))$$
All the multiplications are matrix multiplications, except for ⊙, which is an element-wise multiplication. Example: [2,3] ⊙ [4,5] = [8,15].
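The iteration can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: `fit_logistic`, `solve`, the tolerance, the iteration cap, and the data are all made up for the example, and the step (X'VX)^-1 X'(T ⊙ (P − P̂)) is obtained by solving a linear system instead of forming the inverse explicitly.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logistic(X, T, P, tol=1e-10, max_iter=50):
    """Newton iterations; X rows start with 1, T holds totals, P observed probabilities."""
    k = len(X[0])
    B = [0.0] * k                                  # B0 = [0, 0, .., 0]
    for _ in range(max_iter):
        P_hat = [1 / (1 + math.exp(-sum(b * x for b, x in zip(B, row)))) for row in X]
        # X'VX, where V is diagonal with entries t_i * p_hat_i * (1 - p_hat_i)
        XtVX = [[sum(t * ph * (1 - ph) * row[a] * row[c]
                     for row, t, ph in zip(X, T, P_hat))
                 for c in range(k)] for a in range(k)]
        # X'(T ⊙ (P - P_hat))
        grad = [sum(row[a] * t * (p - ph) for row, t, p, ph in zip(X, T, P, P_hat))
                for a in range(k)]
        step = solve(XtVX, grad)
        B = [b + s for b, s in zip(B, step)]
        if max(abs(s) for s in step) < tol:        # stop when B barely changes
            break
    return B


# Hypothetical accumulated data: x = 1, 2, 3 with 10 events each
X = [[1, 1], [1, 2], [1, 3]]
T = [10, 10, 10]
P = [0.2, 0.5, 0.8]
B = fit_logistic(X, T, P)
print(B)
```

For this data the observed log odds are exactly linear in x, so the iterations should settle at b0 = -2·ln(4) and b1 = ln(4).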