Variance Analysis of Correlated Samples

An overview of the logic and procedure of analysis of variance (ANOVA) for correlated samples, how it differs from the analysis for independent samples, and why the F-ratio is used instead of the t-test for inferential statistical purposes.

In this article I discuss the value of analyzing the variance of correlated samples, as opposed to independent samples, when subjects are tested under three or more conditions, either as repeated measures or in randomized blocks. The logic of ANOVA is the same as for independent samples, but with correlated samples more assumptions have to be satisfied, and the researcher has to remove the variability arising from individual differences before calculating the F-ratio. Only then does the F-ratio give a reliable test of the null hypothesis, that is, of whether the differences between conditions are due to chance or reflect a significant effect of the independent variable.

I will demonstrate the application with a practical example that illustrates the logic and the process, as well as the assumptions and robustness of the ANOVA test. In addition, this article highlights the influence of individual differences and how it is removed in the F-test.

The logic and the process of variance analysis of correlated samples as opposed to independent samples

In correlated samples, the same subjects are measured on the dependent variable under three or more conditions of the independent variable. That is, repeated measures are taken, so the samples under the different conditions are correlated. In independent samples the measures are not repeated: subjects are chosen randomly and independently for each condition. In the analysis of variance for independent samples, the total sum of squared deviates equals the sum of squared deviates between groups plus the sum of squared deviates within groups, and the part of the within-groups variability that is due to individual differences is not separated out. In correlated samples, however, the sum of squared deviates within groups is further divided into a part due to individual differences and a part due to random error. This separation is important because the variability within groups reflects both random error and individual differences; if the individual differences are not removed, one overestimates the sum of squared deviates due to random error.

The ANOVA for correlated samples follows the same steps as the ANOVA for independent samples; the only added complexity is the calculation of the sum of squared deviates due to individual differences.

The first step is to calculate, for each condition and for all conditions combined, the mean, the sum of the measures, the sum of the squared measures, and the sum of squared deviates. The second step is to calculate the sum of squared deviates within groups and the sum of squared deviates between conditions. The third step is to calculate the sum of squared deviates due to individual differences and then the sum of squared deviates due to random error, using the fact that the sum of squared deviates within groups equals the sum due to individual differences plus the sum due to random error.
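The three steps can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function names `sum_sq_dev` and `partition_ss` and the small 3 x 3 table are hypothetical and were chosen to show how the sums of squared deviates split, not taken from any study.

```python
def sum_sq_dev(values):
    """Sum of squared deviates: sum(x^2) - (sum(x))^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n


def partition_ss(rows):
    """Partition the sums of squared deviates of a subjects x conditions table.

    `rows` holds one list per subject with one measure per condition.
    """
    k = len(rows[0])                                   # number of conditions
    all_values = [x for row in rows for x in row]
    columns = [[row[j] for row in rows] for j in range(k)]

    # Steps 1 and 2: total, within-groups and between-groups.
    ss_total = sum_sq_dev(all_values)
    ss_within = sum(sum_sq_dev(col) for col in columns)
    ss_between = ss_total - ss_within

    # Step 3: individual differences first, then random error as the remainder.
    correction = sum(all_values) ** 2 / len(all_values)
    ss_subjects = sum(sum(row) ** 2 for row in rows) / k - correction
    ss_error = ss_within - ss_subjects

    return {
        "total": ss_total,
        "between": ss_between,
        "within": ss_within,
        "subjects": ss_subjects,
        "error": ss_error,
    }


# Hypothetical toy data: 3 subjects measured under 3 conditions.
toy_rows = [[10, 12, 9], [8, 9, 7], [6, 8, 5]]
result = partition_ss(toy_rows)
for name, value in result.items():
    print(f"SS {name}: {value:.3f}")
```

In the toy table most of the within-groups variability comes from the subjects themselves, so the error term that remains after removing individual differences is small.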

The process of ANOVA for correlated samples can be demonstrated by the following example. Say a researcher wants to study the motor skills of 18 subjects performing a task under three different rhythmic conditions, and takes a repeated measure for each subject under each condition. He wants to know whether the variation between these conditions is due to mere chance or reflects a significant effect of the independent variable, the rhythmic condition. Say he has the following data set for the correlated samples:

Subject   Condition A   Condition B   Condition C   Mean
1         35            39            32            35.3
2         32            35            31            32.7
3         33            32            28            31.0
4         32            32            29            31.0
5         31            33            26            30.0
6         29            30            29            29.3
7         29            31            27            29.0
8         27            29            27            27.7
9         27            31            24            27.3
10        28            27            24            26.3
11        27            27            23            25.7
12        27            26            23            25.3
13        24            29            19            24.0
14        24            25            19            22.7
15        17            16            18            17.0
16        17            15            17            16.3
17        14            15            12            13.7
18        13            13            13            13.0

Group mean: Condition A 25.9, Condition B 26.9, Condition C 23.4

Sum of measures: Condition A 466, Condition B 485, Condition C 421, all groups 1372

Sum of squared measures: Condition A 12800, Condition B 14021, Condition C 10443, all groups 37264

Sum of squared deviates: Condition A 735.8, Condition B 952.9, Condition C 596.3, all groups (SS t) 2405.0

Therefore the sum of squared deviates within groups is SS w-g = 735.8 + 952.9 + 596.3 = 2285.0.

The sum of squared deviates between groups is SS b-g = SS t - SS w-g, where SS t is the total sum of squared deviates.

Therefore SS b-g = 2405.0 - 2285.0 = 120.0
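These figures can be reproduced from the raw measures. The following is a minimal sketch: the measures are copied from the data set above, while the variable names and the helper `sum_sq_dev` are illustrative assumptions rather than part of the original analysis.

```python
# Measures copied from the data set above (18 subjects, 3 conditions).
cond_a = [35, 32, 33, 32, 31, 29, 29, 27, 27, 28, 27, 27, 24, 24, 17, 17, 14, 13]
cond_b = [39, 35, 32, 32, 33, 30, 31, 29, 31, 27, 27, 26, 29, 25, 16, 15, 15, 13]
cond_c = [32, 31, 28, 29, 26, 29, 27, 27, 24, 24, 23, 23, 19, 19, 18, 17, 12, 13]


def sum_sq_dev(values):
    """Sum of squared deviates: sum(x^2) - (sum(x))^2 / n."""
    return sum(x * x for x in values) - sum(values) ** 2 / len(values)


# Sum of squared deviates for each condition and for all measures together.
ss_a = sum_sq_dev(cond_a)
ss_b = sum_sq_dev(cond_b)
ss_c = sum_sq_dev(cond_c)
ss_total = sum_sq_dev(cond_a + cond_b + cond_c)

# Within-groups is the sum over conditions; between-groups is the remainder.
ss_within = ss_a + ss_b + ss_c
ss_between = ss_total - ss_within

print(f"SS per condition: {ss_a:.1f}, {ss_b:.1f}, {ss_c:.1f}")
print(f"SS t = {ss_total:.1f}, SS w-g = {ss_within:.1f}, SS b-g = {ss_between:.1f}")
```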

Then, as mentioned above, calculate the sum of squared deviates due to individual differences, SS subjects, for the data set as follows:

SS subjects = (sum over subjects of (subject total across conditions) squared) / number of conditions - (sum of all measures) squared / total number of measures.

In this instance the subject totals (for example, 35 + 39 + 32 = 106 for Subject 1), squared and summed over the 18 subjects, give 111122. Applying the above formula, SS subjects can be calculated as follows:

SS subjects = 111122 / 3 - (1372) squared / (18 + 18 + 18) = 2181.7

As explained above, SS w-g = SS subjects + SS error.

That is, SS error = 2285.0 - 2181.7 = 103.3
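The subject term can be checked the same way. In this sketch the rows are copied from the data set above and SS w-g is taken from the earlier calculation; the variable names are illustrative assumptions.

```python
# One tuple per subject: (Condition A, Condition B, Condition C).
rows = [
    (35, 39, 32), (32, 35, 31), (33, 32, 28), (32, 32, 29), (31, 33, 26),
    (29, 30, 29), (29, 31, 27), (27, 29, 27), (27, 31, 24), (28, 27, 24),
    (27, 27, 23), (27, 26, 23), (24, 29, 19), (24, 25, 19), (17, 16, 18),
    (17, 15, 17), (14, 15, 12), (13, 13, 13),
]

k = 3                                   # number of conditions
n_total = len(rows) * k                 # total number of measures (54)

grand_total = sum(sum(row) for row in rows)
sum_sq_subject_totals = sum(sum(row) ** 2 for row in rows)

# SS subjects = sum(subject total^2) / k - (grand total)^2 / N
ss_subjects = sum_sq_subject_totals / k - grand_total ** 2 / n_total

# SS w-g as calculated in the text; the error term is what remains.
ss_within = 2285.0
ss_error = ss_within - ss_subjects

print(f"Sum of squared subject totals = {sum_sq_subject_totals}")
print(f"SS subjects = {ss_subjects:.1f}, SS error = {ss_error:.1f}")
```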

Then calculate the degrees of freedom for the total, for the conditions (between groups), within groups, for the subjects, and for the random error as follows:

Degrees of freedom total = (3*18) - 1 = 53

Degrees of freedom between groups = 3 - 1 = 2

Degrees of freedom within groups = 54 - 3 = 51, since degrees of freedom total = degrees of freedom between groups + degrees of freedom within groups.

Degrees of freedom subjects = 18 - 1 = 17

Degrees of freedom error = 51 - 17 = 34, since the degrees of freedom within groups equal the degrees of freedom of error plus the degrees of freedom of subjects.

Using SS b-g, SS error, and their degrees of freedom, one can calculate the F-ratio as follows:

Mean square between groups, MS b-g = SS b-g / degrees of freedom between groups = 120.0 / 2 = 60.0

Mean square error, MS error = SS error / degrees of freedom error = 103.3 / 34 = 3.0

Therefore F-ratio = MS b-g / MS error = 60.0 / 3.0 = 20.0, with degrees of freedom 2 and 34. (If MS error is not rounded to one decimal place first, the ratio is about 19.75, which does not change the conclusion.)
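The degrees of freedom and the F-ratio follow mechanically from the number of conditions, the number of subjects, and the two sums of squared deviates. A minimal sketch, using the SS values from the text as inputs (the variable names are illustrative assumptions):

```python
k = 3              # number of conditions
n_subjects = 18    # subjects measured under every condition

# Sums of squared deviates as calculated in the text.
ss_between = 120.0
ss_error = 103.3

# Degrees of freedom, partitioned the same way as the sums of squares.
df_total = k * n_subjects - 1
df_between = k - 1
df_within = df_total - df_between
df_subjects = n_subjects - 1
df_error = df_within - df_subjects

# Mean squares and the F-ratio for correlated samples.
ms_between = ss_between / df_between
ms_error = ss_error / df_error
f_ratio = ms_between / ms_error

print(f"df: total={df_total}, between={df_between}, within={df_within}, "
      f"subjects={df_subjects}, error={df_error}")
print(f"MS b-g = {ms_between:.2f}, MS error = {ms_error:.2f}, F = {f_ratio:.2f}")
```

Because the mean squares are not rounded before dividing, the printed ratio comes out slightly below the 20.0 obtained from 60.0 / 3.0.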

For the F sampling distribution with 2 and 34 degrees of freedom, the critical values of F at the 0.05 and 0.01 levels are 3.28 and 5.29 respectively. As the calculated F-ratio is far larger than both critical values, the investigator can reject the null hypothesis and conclude with confidence that the differences between the conditions are due to the influence of the independent variable rather than to mere chance.
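The critical values can be checked without a table in this particular case. When the numerator has exactly 2 degrees of freedom, the upper-tail probability of the F distribution has the closed form P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below relies on that special case only; the function names are illustrative assumptions, and for any other numerator degrees of freedom a statistical table or library would be needed.

```python
def f_upper_tail_df1_2(f_value, df_error):
    """P(F > f_value) for an F distribution with 2 and df_error degrees of freedom."""
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)


def f_critical_df1_2(alpha, df_error):
    """Critical value of F(2, df_error) at level alpha (inverse of the tail formula)."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)


df_error = 34
crit_05 = f_critical_df1_2(0.05, df_error)
crit_01 = f_critical_df1_2(0.01, df_error)
p_value = f_upper_tail_df1_2(20.0, df_error)   # F-ratio from the text

print(f"Critical F at 0.05: {crit_05:.2f}")
print(f"Critical F at 0.01: {crit_01:.2f}")
print(f"P(F > 20.0) = {p_value:.2e}")
```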

It must be noted that ANOVA rests on several assumptions. The analysis is robust, however, and can therefore still be applied in situations where some of those assumptions are violated.

This robustness has a condition: although ANOVA assumes that the variances of the k groups are equal, it tolerates unequal variances only when the sample size is the same in every group.

One must also be careful in reaching this conclusion for correlated samples, because the correlation coefficients between the conditions can differ. The analysis assumes that the correlations between the measures under the different conditions are similar, that is, not significantly different, and if they differ significantly the conclusions can be misleading. The correlation coefficients must also be positive. In this example the correlation coefficients between A and B, A and C, and B and C are approximately the same and positive. The analysis therefore satisfies this assumption, and the researcher can confidently conclude that the differences are due to more than mere chance.
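The correlation check can be sketched as follows. The measures are copied from the data set above; the helper `pearson_r` is an illustrative plain-Python implementation of the Pearson correlation coefficient, and the sketch only inspects the coefficients rather than testing formally whether they differ significantly.

```python
from math import sqrt

# Measures copied from the data set above (18 subjects, 3 conditions).
cond_a = [35, 32, 33, 32, 31, 29, 29, 27, 27, 28, 27, 27, 24, 24, 17, 17, 14, 13]
cond_b = [39, 35, 32, 32, 33, 30, 31, 29, 31, 27, 27, 26, 29, 25, 16, 15, 15, 13]
cond_c = [32, 31, 28, 29, 26, 29, 27, 27, 24, 24, 23, 23, 19, 19, 18, 17, 12, 13]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of co-deviates and the two sums of squared deviates.
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / sqrt(ss_x * ss_y)


correlations = {
    "A-B": pearson_r(cond_a, cond_b),
    "A-C": pearson_r(cond_a, cond_c),
    "B-C": pearson_r(cond_b, cond_c),
}
for pair, r in correlations.items():
    print(f"r({pair}) = {r:.2f}")
```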

Summary

As discussed above, ANOVA is a useful statistical technique, compared to the t-test, when there are three or more samples. In addition, ANOVA is a robust technique for inferential statistics even when some of its assumptions are not met. However, the analysis is more complex when the samples are correlated than when they are independent. It is important to note that the sample size must be the same in each of the k groups, and that one must check whether the correlation coefficients between conditions are approximately equal and positive, in order to reject the null hypothesis reliably. The above example demonstrates the usefulness of this statistical test in complex situations and research designs.
