The one-way ANOVA test checks the null assumption that the means (averages) of two or more groups are equal. The test tries to determine whether the differences between the sample averages reflect a real difference between the groups, or are due to the random noise inside each group.

When the ANOVA test rejects the null assumption, it only tells us that not all the means are equal. For more detail, the tool also runs the Tukey HSD test, which compares each pair of groups separately. The **one-way ANOVA model** is identical to the **linear regression model** with one categorical variable: the group. Running the linear regression produces the same ANOVA table and the same p-value.

- **Independence** - The observations are independent and represent the population.
- **Normal distribution** - The population is normally distributed. This assumption is important mainly for small sample sizes (n < 30). The ANOVA calculator runs the Shapiro-Wilk test as part of the test run.
- **Equality of variances** - The variances of all the groups are equal. The ANOVA test is considered robust to violations of this assumption when the group sizes are similar (largest sample size / smallest sample size < 1.5). The ANOVA calculator runs Levene's test as part of the test run.
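The group-size rule of thumb above is easy to check in code. The following is a minimal sketch. The helper name `sizes_are_balanced` and its `max_ratio` parameter are hypothetical and are not part of the calculator.

```python
def sizes_are_balanced(sizes: list[int], max_ratio: float = 1.5) -> bool:
    """Hypothetical helper: True when largest/smallest group size < max_ratio.

    Under this rule of thumb the one-way ANOVA is considered robust
    to unequal variances.
    """
    if not sizes or min(sizes) <= 0:
        raise ValueError("group sizes must be positive")
    return max(sizes) / min(sizes) < max_ratio


print(sizes_are_balanced([10, 12, 14]))  # True  (14 / 10 = 1.4)
print(sizes_are_balanced([10, 20]))      # False (20 / 10 = 2.0)
```

The comparison is strict, so a ratio of exactly 1.5 does not count as balanced.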

The model analyzes the differences between all the observations and the overall average. It tries to determine whether these differences are only random, or are also partially explained by the group (similar to linear regression).

As in the standard deviation calculation, we use squared differences instead of absolute differences.

- **SST** - the total sum of squares: the squared differences between each observation and the overall average.
- **SSG/SSB** - the sum of squares caused by the group (between groups). The calculation is similar to SST. Instead of the entire difference between each observation and the overall average, it takes only the difference between the observation's group average and the overall average.
- **SSE/SSW** - the sum of squares within the groups. The calculation is similar to SST, but it takes only the differences between the observations and their group averages.
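The three sums of squares can be computed directly from their definitions. This is a minimal sketch, assuming the data is given as one list of numbers per group. The function name `sums_of_squares` and the sample groups are made up for illustration.

```python
def sums_of_squares(groups: list[list[float]]) -> tuple[float, float, float]:
    """Return (SST, SSG, SSE) for a one-way layout, straight from the definitions."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]

    # SST: every observation vs. the overall average
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    # SSG: each observation is replaced by its group's average
    ssg = sum((m - grand_mean) ** 2 for g, m in zip(groups, group_means) for _ in g)
    # SSE: every observation vs. its own group's average
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return sst, ssg, sse


sst, ssg, sse = sums_of_squares([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(round(sst, 6), round(ssg, 6), round(sse, 6))  # 44.0 38.0 6.0
```

The output shows the decomposition SST = SSG + SSE on this small example.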

Source | Degrees of Freedom | Sum of Squares | Mean Square | F statistic | p-value |
---|---|---|---|---|---|
Groups (between groups) | k - 1 | $$SSG=\sum_{i=1}^k\sum_{j=1}^{n_i}(\bar{x}_i-\bar{x})^2=\sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2$$ | $$MSG=\frac{SSG}{k-1}$$ | $$F=\frac{MSG}{MSE}$$ | $$P(X>F)$$ |
Error (within groups) | n - k | $$SSE=\sum_{i=1}^k\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2=\sum_{i=1}^k(n_i-1)S_i^2$$ | $$MSE=\frac{SSE}{n-k}$$ | | |
Total | n - 1 | $$SST=\sum_{i=1}^k\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2=SSG+SSE$$ | $$\text{Sample Variance}=\frac{SST}{n-1}$$ | | |

In the table:

- k is the number of groups and n is the total number of observations.
- $$n_i$$, $$\bar{x}_i$$ and $$S_i$$ are the size, average and standard deviation of group i.
- $$\bar{x}$$ is the overall average.
- X follows the F distribution with k - 1 and n - k degrees of freedom.
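The table can be filled in with the shortcut formulas from its last column: $$n_i(\bar{x}_i-\bar{x})^2$$ and $$(n_i-1)S_i^2$$. This is a minimal sketch. The function name `anova_table` and the data are made up, and each group is assumed to have at least two observations.

The p-value P(X > F) needs the survival function of the F distribution, which the Python standard library does not provide. If SciPy is available, `scipy.stats.f.sf(F, k - 1, n - k)` should compute it. That part is left out here.

```python
from statistics import mean, variance


def anova_table(groups: list[list[float]]) -> dict[str, float]:
    """One-way ANOVA table values using the shortcut formulas from the table."""
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # SSG = sum of n_i * (group mean - overall mean)^2
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSE = sum of (n_i - 1) * S_i^2, with S_i^2 the sample variance of group i
    sse = sum((len(g) - 1) * variance(g) for g in groups)

    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return {
        "df_groups": k - 1, "df_error": n - k, "df_total": n - 1,
        "SSG": ssg, "SSE": sse, "SST": ssg + sse,
        "MSG": msg, "MSE": mse, "F": msg / mse,
        "sample_variance": (ssg + sse) / (n - 1),
    }


table = anova_table([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(table["df_groups"], table["df_error"], round(table["F"], 6))  # 2 6 19.0
```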

If you are not sure which expected effect size type and value to choose, just choose the "Medium" effect size. The tool will then use the 'f' type with the corresponding value. There are several methods to calculate the effect size.

__Eta-squared__

$$\eta^2=\frac{SSG}{SST} \qquad \eta^2=\frac{f^2}{1+f^2} \qquad f^2=\frac{\eta^2}{1-\eta^2}$$

η² is the ratio of the explained sum of squares to the total sum of squares. It is equivalent to R² in linear regression.

__Cohen's f - Method 1__

The tool uses this method.

$$f=\sqrt{\frac{SSG}{SSE}}$$

f² is the ratio of the explained sum of squares to the unexplained sum of squares (random noise).

__Cohen's f - Method 2__

$$f=\sqrt{\frac{\sum_{i=1}^k(\bar{x}_i-\bar{x})^2}{k\,\sigma^2}}$$

Here σ is the common standard deviation within the groups.
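The following minimal sketch computes η² and Cohen's f (method 1) from SSG and SSE, and checks the conversion formulas between them. The function name `effect_sizes` and the input values SSG = 38 and SSE = 6 are made up for illustration.

```python
import math


def effect_sizes(ssg: float, sse: float) -> tuple[float, float]:
    """Return (eta_squared, cohens_f) from the between and within sums of squares."""
    sst = ssg + sse
    eta_sq = ssg / sst            # eta^2 = SSG / SST
    f = math.sqrt(ssg / sse)      # Cohen's f, method 1: sqrt(SSG / SSE)
    return eta_sq, f


eta_sq, f = effect_sizes(ssg=38.0, sse=6.0)   # illustrative values only
print(round(eta_sq, 4), round(f, 4))
# The two measures are linked: eta^2 = f^2 / (1 + f^2)
print(math.isclose(eta_sq, f**2 / (1 + f**2)))  # True
```

Because SST = SSG + SSE, η² = f² / (1 + f²) holds exactly for method 1.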

When running n comparisons, each with significance level α, the probability that at least one of the tests rejects a correct null assumption is much bigger than α. Assuming the tests are independent, this overall significance level α' is:

$$\alpha'=1-(1-\alpha)^n$$

For example, with 6 comparisons (n = 6) and α = 0.05, the overall probability of a type I error is:

$$\alpha'=1-(1-0.05)^6\approx0.265$$

So, to keep α' = 0.05, we need to use a much smaller significance level in each single test.
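The arithmetic of the example can be reproduced directly. This is a minimal sketch: `overall_alpha` is a hypothetical helper name, and the tests are assumed to be independent, as the formula requires.

```python
def overall_alpha(alpha: float, n: int) -> float:
    """alpha' = 1 - (1 - alpha)^n for n independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** n


print(round(overall_alpha(0.05, 6), 4))  # 0.2649
```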

- **n** - the number of tests / pairs.
- **Overall α'** - the overall significance level.
- **Corrected α** - one pair's significance level.

Changing any field recalculates the other fields:

- A change in **n** or in the **overall α'** recalculates the **corrected α**.
- A change in the **corrected α** recalculates the **overall α'**.


For example, with n = 2 tests, using a corrected significance level of **α = 0.025321** in each single test gives an overall significance level of **α' = 0.05**.

The overall significance level α' is the probability of getting a type I error in at least one of the tests when the null assumptions of all the tests are correct.

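The calculator's fields are linked by the formula α' = 1 - (1 - α)^n. Solving it for α gives the corrected single-test level α = 1 - (1 - α')^(1/n), often called the Šidák correction.

The following minimal sketch implements both directions. The helper names are hypothetical. It reproduces α = 0.025321 for n = 2 and α' = 0.05.

```python
def overall_alpha(alpha: float, n: int) -> float:
    """Overall alpha' for n independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** n


def corrected_alpha(overall: float, n: int) -> float:
    """Single-test alpha that keeps the overall level at `overall` (Sidak)."""
    return 1 - (1 - overall) ** (1 / n)


a = corrected_alpha(0.05, 2)
print(round(a, 6))                    # 0.025321
print(round(overall_alpha(a, 2), 6))  # 0.05
```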

The Tukey HSD (Honestly Significant Difference) test is a multiple comparison test that compares the means of each pair of groups. The test uses the Studentized range distribution instead of the regular t-test. It is only a two-tailed test, as the null assumption is equal means. The Tukey HSD test assumes **equal group sizes**, while the **Tukey-Kramer test** also handles **unequal group sizes**. The Tukey HSD test is therefore a special case of the Tukey-Kramer test.

The ANOVA calculator executes the Tukey-Kramer test. There is no dedicated Tukey-Kramer calculator.

- **Independence** - The observations are independent and represent the population.
- **Normal distribution** - The population is normally distributed.
- **Equality of variances** - The variances of all the groups are equal.

For each pair of groups, Group i and Group j, calculate:

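$$Difference=|\bar{x}_i-\bar{x}_j|$$

$$SE=\sqrt{\frac{MSE}{2}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$$

MSE (also called MSW) is the within-group mean square from the ANOVA table.

__The test statistic__

$$Q=\frac{Difference}{SE}$$

The p-value of Q is calculated from the Studentized range distribution with k groups and n - k degrees of freedom.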
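The following minimal sketch computes the Q statistic for every pair of groups, following the formulas above. The function name `tukey_kramer_q` and the data are made up for illustration.

It stops at Q. If SciPy 1.7 or later is installed, `scipy.stats.studentized_range.sf(Q, k, n - k)` should return the p-value, but that is an assumption about your environment and is not shown. When all groups have the same size, the same code gives the Tukey HSD statistic.

```python
import math
from itertools import combinations
from statistics import mean, variance


def tukey_kramer_q(groups: list[list[float]]) -> dict[tuple[int, int], float]:
    """Q statistic for every pair of groups (i, j)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    # MSE: within-group mean square, SSE / (n - k), as in the ANOVA table
    mse = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)

    q = {}
    for i, j in combinations(range(k), 2):
        difference = abs(mean(groups[i]) - mean(groups[j]))
        se = math.sqrt(mse / 2 * (1 / len(groups[i]) + 1 / len(groups[j])))
        q[(i, j)] = difference / se
    return q


for pair, value in tukey_kramer_q([[4, 5, 6], [6, 7, 8], [9, 10, 11]]).items():
    print(pair, round(value, 4))
```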

Levene's test checks the null assumption that the standard deviations (and hence the variances) of two or more groups are equal. The test tries to determine whether the differences between the sample variances reflect a real difference between the groups, or are due to the random noise inside each group.

Levene's test runs the ANOVA model on the absolute differences between each observation and its **group's center**, using the mean or the median as the group center.

- **Independence** - The observations are independent and represent the population.
- **Normal distribution** - The population is normally distributed. This assumption is important mainly for small sample sizes (n < 30). The ANOVA calculator runs the Shapiro-Wilk test as part of the test run.

The general recommendation is to use the **mean** for a symmetrical distribution and the **median** for an asymmetrical distribution.

Since the median and the mean are almost the same in a symmetrical distribution, you may simply use the median.

- Mean: $$X'_{ij}=|X_{ij}-\bar{X}_i|,\quad \bar{X}_i\text{ is the mean of group } i$$
- Median: $$X'_{ij}=|X_{ij}-\tilde{X}_i|,\quad \tilde{X}_i\text{ is the median of group } i$$
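The transformation above can be sketched in a few lines. The function name `levene_transform`, its `center` parameter and the sample group are hypothetical and only illustrate the formulas.

```python
from statistics import mean, median


def levene_transform(groups: list[list[float]], center: str = "median") -> list[list[float]]:
    """Absolute differences |X_ij - center_i| used by Levene's test."""
    if center not in ("mean", "median"):
        raise ValueError("center must be 'mean' or 'median'")
    center_fn = mean if center == "mean" else median
    transformed = []
    for g in groups:
        c = center_fn(g)                      # the group's center
        transformed.append([abs(x - c) for x in g])
    return transformed


print(levene_transform([[2, 4, 7, 9, 15]]))          # [[5, 3, 0, 2, 8]]
print(levene_transform([[2, 4, 7, 9, 15]], "mean"))
```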

Group medians:

Group1 | Group2 | Group3 |
---|---|---|
3 | 5.5 | 16 |

In this example, we use differences from the medians.

Group1 | Group2 | Group3 |
---|---|---|
2.0 | 2.5 | 3.0 |
1.0 | 1.5 | 1.0 |
1.0 | 0.5 | 0 |
0 | 0.5 | 0 |
1.0 | 2.5 | 3.0 |
2.0 | 5.5 | 5.0 |
3.0 | | 6.0 |

Now you can run a regular ANOVA test over the
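The final step is an ordinary one-way ANOVA on the absolute differences, and its F statistic is Levene's test statistic. This is a minimal sketch using the absolute differences from the example. It uses only the standard library, and `one_way_f` is a hypothetical name.

With the raw data and SciPy available, `scipy.stats.levene(*groups, center='median')` should perform the transformation and the test in one call.

```python
from statistics import mean

# Absolute differences from each group's median, taken from the example
abs_diffs = [
    [2.0, 1.0, 1.0, 0.0, 1.0, 2.0, 3.0],  # Group1
    [2.5, 1.5, 0.5, 0.5, 2.5, 5.5],       # Group2
    [3.0, 1.0, 0.0, 0.0, 3.0, 5.0, 6.0],  # Group3
]


def one_way_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_groups, df_error) of a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssg / (k - 1)) / (sse / (n - k)), k - 1, n - k


f_stat, df1, df2 = one_way_f(abs_diffs)
print(round(f_stat, 4), df1, df2)  # 0.702 2 17
```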