The Wilcoxon Signed-Rank test is a nonparametric test that compares interval scale data for a significant difference between two dependent groups. The test computes the difference for each pair, then ranks the pairs by the absolute value of these differences.
When to use?
The test usually serves as a fallback for the paired t-test when the data does not meet the normality assumption or contains many outliers.
A paired t-test compares the means of the two groups, while a Wilcoxon Signed-Rank test compares the entire distributions. If the two groups have a similarly shaped distribution, the test also checks whether the median of the differences equals the expected value (usually zero).
When the two groups have a similar symmetrical distribution, the test also checks whether the mean of the differences equals the expected value.
A paired t-test is slightly more powerful than a Wilcoxon Signed-Rank test; the Wilcoxon Signed-Rank test has about 95% efficiency relative to the paired t-test. If the population is close to a normal distribution, or reasonably symmetric with a sample size of at least 30, it is better to use the paired t-test. Use the Wilcoxon Signed-Rank test in the following cases:
- Not normal - the data distribution is not normal and is not reasonably symmetric with at least 30 observations.
- Many outliers - the test is more robust to outliers than the t-test, but the handling of outliers is similar: if you are sure that a value is an outlier, exclude it; otherwise, keep it.
Assumptions
- Dependent pairs - the two observations in each pair are dependent, but the pairs themselves are independent of each other.
- Interval scale variables - if the data is not at least on an interval scale, you may not be able to rank the differences.
Strictly speaking, you cannot use ordinal data; in this case you may use the less powerful sign test instead.
Some researchers make the strong assumption that the difference between any two consecutive values is identical.
For example, in the following Likert scale: strongly agree, agree, neutral, disagree, strongly disagree, it is a strong assumption to say that the difference (strongly agree - agree) equals the difference (neutral - disagree).
- Shape - the data is not necessarily normally distributed, but the two groups should have a similar shape. If not, you can compare the ranks but not the median of the differences.
Calculate W
- Calculate the difference between the pairs, for example the after-treatment value minus the before-treatment value.
- Exclude pairs with a zero difference.
- Sort the data from the lowest absolute value to the highest absolute value.
- Rank the list: the lowest absolute value gets rank 1, the second lowest gets rank 2, etc.
When there is a ties group (an identical value for several observations), each observation in the group gets the average of the group's ranks.
Give each rank a sign: positive if the difference is positive, negative if the difference is negative.
- Accumulate the ranks:
R_i - the rank of the i-th difference (ranked by absolute value).
n - the number of pairs where the difference is not zero. $$W_+=\sum_{i=1}^{n}{R_{i}} \quad \text{where the sign is positive}$$ $$W_-=\sum_{i=1}^{n}{R_{i}} \quad \text{where the sign is negative}$$ $$W_+ + W_- = \frac{n(n+1)}{2} \quad \text{(sum of an arithmetic progression)}$$ Since the distribution is symmetrical, W is usually taken as the minimum of W+ and W-: $$W=\min(W_+ , W_-)$$ This works well for a two-tailed test, but for a one-tailed test it always assumes that H1 states that the sample with the larger values is greater than the sample with the smaller values. On this website we chose statistic = W-; this way we can calculate the left-tailed test or the right-tailed test like any other test.
- Check the critical W in the table; reject H0 if W < Wcritical. (A short code sketch of these steps follows below.)
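The following Python sketch illustrates these steps (an illustration only, not the website's implementation; the sample data is hypothetical):

```python
import numpy as np
from scipy.stats import rankdata

def signed_rank_w(before, after):
    """Return (W+, W-, W) following the steps above."""
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    diff = diff[diff != 0]                 # exclude pairs with a zero difference
    ranks = rankdata(np.abs(diff))         # average ranks handle ties groups
    w_plus = ranks[diff > 0].sum()         # sum of ranks with a positive sign
    w_minus = ranks[diff < 0].sum()        # sum of ranks with a negative sign
    return w_plus, w_minus, min(w_plus, w_minus)

# hypothetical before/after data, for illustration only
before = [10, 12, 9, 14, 11]
after = [12, 11, 13, 14, 15]
print(signed_rank_w(before, after))
```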
Critical Value
When n is small, the tool uses the exact value from tables. The exact critical value is accurate (for the common significance levels), while the p-value is usually interpolated between two values in the table. The tool uses log interpolation, which is more accurate for small p-values and less accurate for larger p-values (the common significance levels are small: 0.05, 0.01).
There is no consensus about what counts as a small n. When Method="automatic" the tool uses the exact method when n ≤ 40, and the normal approximation when n > 40.
When Method="z approximation" the tool uses only the normal approximation.
Statistical tables
Calculate the critical W from a statistical table.
Corrected normal approximation
To get more accurate results, the tool uses a continuity correction and a ties correction (a ties group is a group of observations with the same value). $$ z = \frac { W_- - \mu_w + C_{continuity}} {\sigma_w}\\ \mu_w= \frac {n(n+1)} {4}\\ \sigma_w^2= \frac {n(n+1)(2n+1)} {24} - C_{ties}$$
Continuity correction
When using a continuous distribution to approximate discrete data, it is better to use a continuity correction.
P(X < a) => P(X < a - 0.5)
P(X > a) => P(X > a + 0.5)
As a result:
- Right tail, or two tails with positive Z (W- > μ): Ccontinuity = -0.5.
- Left tail, or two tails with negative Z (W- < μ): Ccontinuity = 0.5.
- When we don't correct the data: Ccontinuity = 0.
Ties correction
$$C_{ties} = \sum_{i=1}^{t}{\frac{f_i^3-f_i}{48}}$$ t - the number of tie groups
f_i - the number of values in group i
example: [1,2,2,2,3,4,5,5,5,5,6,7,8,8,8,8,8,8,9,10]
$$group_1: [2,2,2], \quad f_1=3\\ group_2: [5,5,5,5], \quad f_2=4\\ group_3: [8,8,8,8,8,8], \quad f_3=6$$
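Putting the continuity and ties corrections together, a minimal Python sketch of the corrected approximation might look as follows (an illustration of the formulas above, assuming W- is used as the statistic; the function name is made up for this sketch):

```python
import math

def corrected_z(w_minus, n, tie_group_sizes):
    """Normal approximation of W- with continuity and ties corrections."""
    mu = n * (n + 1) / 4
    c_ties = sum((f ** 3 - f) / 48 for f in tie_group_sizes)
    var = n * (n + 1) * (2 * n + 1) / 24 - c_ties
    c_cont = 0.5 if w_minus < mu else -0.5   # left-tail / right-tail correction
    return (w_minus - mu + c_cont) / math.sqrt(var)

# Ties correction for the list above (tie groups of 3, 4 and 6 identical values):
print(sum((f ** 3 - f) / 48 for f in (3, 4, 6)))   # 0.5 + 1.25 + 4.375 = 6.125

# Values from the worked example below (W- = 11, n = 12, one tie group of 3):
print(round(corrected_z(11, 12, [3]), 2))          # -2.16
```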
Effect Size
The standardized effect size: $$r=\frac{Z}{\sqrt{n}}$$
The common language effect size is the probability that a random value from Group 1 (e.g., before) is greater than its paired value from Group 2 (e.g., after): $$ f=\frac{2W}{n(1+n)}$$
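A brief worked instance of the standardized effect size, using the Z and n from the example below (Z = -2.16, n = 12):
$$r=\frac{Z}{\sqrt{n}}=\frac{-2.16}{\sqrt{12}}\approx -0.62$$
The magnitude, about 0.62, is what is usually reported.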
Example
In the following example, we check the number of questions answered correctly by the same subject, before and after performing training.
The significance level (α) is 0.05.
- Calculate Difference, Absolute and Sign:
Difference = After - Before, Absolute = Absolute(Difference), Sign = sign(Difference).
Subject | Before | After | Difference | Absolute | Sign |
---|---|---|---|---|---|
A | 1 | 5 | 4 | 4 | + |
B | 6 | 1 | -5 | 5 | - |
C | 3 | 6 | 3 | 3 | + |
D | 4 | 4 | 0 | 0 | |
E | 10 | 13 | 3 | 3 | + |
F | 6 | 3 | -3 | 3 | - |
G | 2 | 8 | 6 | 6 | + |
H | 3 | 16 | 13 | 13 | + |
I | 5 | 12 | 7 | 7 | + |
J | 2 | 10 | 8 | 8 | + |
K | 13 | 15 | 2 | 2 | + |
L | 6 | 7 | 1 | 1 | + |
M | 5 | 14 | 9 | 9 | + |
- The sample size contains 13 pairs. Exclude pair D, which has a zero difference:
n = 13 - 1 = 12
- Sort by Absolute value:
Subject | Absolute | Sign | Simple Rank | Rank | W- | W+ |
---|---|---|---|---|---|---|
L | 1 | + | 1 | 1 | | 1 |
K | 2 | + | 2 | 2 | | 2 |
C | 3 | + | 3 | 4 | | 4 |
E | 3 | + | 4 | 4 | | 4 |
F | 3 | - | 5 | 4 | 4 | |
A | 4 | + | 6 | 6 | | 6 |
B | 5 | - | 7 | 7 | 7 | |
G | 6 | + | 8 | 8 | | 8 |
I | 7 | + | 9 | 9 | | 9 |
J | 8 | + | 10 | 10 | | 10 |
M | 9 | + | 11 | 11 | | 11 |
H | 13 | + | 12 | 12 | | 12 |
Total | | | | | 11 | 67 |
- Simple Rank - rank by the Absolute value: the lowest Absolute value gets rank 1, the second lowest rank 2, etc.
- Rank - usually the same as the Simple Rank. Subjects C, E, F have an identical Absolute value of 3, so their rank is the average of their simple ranks.
For subjects C, E, F:
$$\frac{3+4+5}{3}=4$$
W- = 4 + 7 = 11
W+ = 1 + 2 + 4 + 4 + 6 + 8 + 9 + 10 + 11 + 12 = 67
W = min(11, 67) = 11
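For comparison, SciPy's built-in routine should reproduce this statistic (a sketch only; the reported statistic and p-value may differ slightly between SciPy versions and method defaults):

```python
from scipy.stats import wilcoxon

before = [1, 6, 3, 4, 10, 6, 2, 3, 5, 2, 13, 6, 5]
after = [5, 1, 6, 4, 13, 3, 8, 16, 12, 10, 15, 7, 14]

# The zero difference (pair D) is dropped by the default zero_method="wilcox";
# correction=True applies the continuity correction of the normal approximation.
res = wilcoxon(after, before, correction=True)
print(res.statistic, res.pvalue)
# With current SciPy defaults, the two-sided statistic should be min(W+, W-) = 11
# and the p-value should come out near 0.031 (corrected normal approximation).
```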
Statistical tables
Two-tailed (H0: Before = After)
- Critical Value
Check the two-tailed statistic table for α = 0.05, n = 12.
The critical W is 13.
- P-value
For α = 0.02, the critical W is 9.
For α = 0.05, the critical W is 13.
Since 11 is between 9 and 13, the p-value is between 0.02 and 0.05.
The tool does a logarithmic interpolation: p-value = 0.032.
- Decision
Since p-value < α (0.032 < 0.05), or alternatively since W < Wcritical (11 < 13), we reject H0.
- Website
The website uses W- instead of W.
Left critical W- = 13.
Right critical W- = n(1 + n)/2 - left critical W- = 12 * 13 / 2 - 13 = 65.
Since W- (11) is outside the range [13, 65], reject H0. (If W- were exactly 13 or 65, you would still accept H0.)
Left tail (H0: Before ≥ After)
- Critical Value
Check the two-tailed statistic table for α = 2 * 0.05 = 0.1, n = 12.
The critical W is 17.
- P-value
P-value = p-value(Two-tailed) / 2 = 0.032 / 2 = 0.016
- Decision
Since p-value < α (0.016 < 0.05), or alternatively since W- < Wcritical (11 < 17), reject H0.
Right tail (H0: Before ≤ After)
- Critical Value
Check the two-tailed statistic table for α = 2 * 0.05 = 0.1, n = 12.
The value in the table is 17.
The critical W is n(1 + n)/2 - the value from the table = 12 * 13 / 2 - 17 = 61.
- P-value
P-value = 1 - p-value(Two-tailed) / 2 = 1 - 0.016 = 0.984
- Decision
Since p-value > α (0.984 > 0.05), or alternatively since W- < Wcritical (11 < 61), accept H0.
Corrected normal approximation
- There is only one ties group (t = 1), which contains 3 values: f1 = 3.
$$C_{ties} = \sum_{i=1}^{1}{\frac{3^3-3}{48}}=0.5$$
- Since W- < μw, Ccontinuity = 0.5.
$$ \mu_w= \frac {n(n+1)}{4}=\frac {12(12+1)}{4}=39 $$ $$ \sigma^2_w= \frac {n(n+1)(2n+1)} {24} - C_{ties}= \frac {12(12+1)(2\cdot 12+1)} {24} - 0.5=162 \quad \Rightarrow \quad \sigma_w=12.7279 $$ $$ Z = \frac { W_- - \mu_w + C_{continuity}} {\sigma_w} = \frac { 11 - 39 + 0.5} {12.7279}=-2.16$$
- P(z≤Z) = P(z≤-2.16) = 0.01539.
Two-tailed (H0: Before = After)
- p-value = 2 * 0.01539 = 0.03078.
- Since 0.03078 < 0.05, reject H0.
Left tail (H0: Before ≥ After)
- p-value = P(z≤-2.16) = 0.01539.
- Since 0.01539 < 0.05, reject H0.
Right tail (H0: Before < After)
- p-value =1 - P( z≤ -2.16) = 1 - 0.01539 = 0.98461.
- Since 0.98461 > 0.05, accept H0.
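These p-values can be verified directly from the standard normal CDF, for example:

```python
from scipy.stats import norm

z = -2.16
p_left = norm.cdf(z)     # P(Z <= -2.16) ~ 0.0154
print(p_left)            # left-tailed p-value
print(2 * p_left)        # two-tailed p-value ~ 0.0308
print(1 - p_left)        # right-tailed p-value ~ 0.9846
```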