Statistical Analysis of Variance

This article discusses the logic behind the analysis of variance. It also highlights the value of this analysis for making correct statistical inferences about a population and for choosing the correct method to test statistical significance in complex situations and studies.

 

In certain circumstances the analysis of variance is an important part of the statistical process of testing, at a particular level of significance, whether observed differences are real or due to mere chance. The t-test cannot be applied reliably to more than two samples, even when the samples are independent and drawn from a normally distributed population. This is a consequence of disjunctive probability: the more comparisons are made, the greater the probability that at least one of them appears significant by chance. In this article I will discuss why the t-test is not applicable to more than two independent samples, and how the analysis of variance compares a calculated F-ratio with the critical value of the F-distribution at the relevant degrees of freedom and level of significance. The article also explains how the F-ratio is calculated and what it means statistically, in simple terms rather than in mathematical form. The analysis of variance is normally called ANOVA in statistical terminology.

The limitation of the t-test for more than two independent samples with one independent variable

Suppose a researcher wants to study the effect of three types of music on task performance, using a different independent sample for each type. Call the types of music A, B and C. The researcher chooses the 0.05 level of significance and uses the t-test to check whether the differences between the means are significant at that level. With three samples there are three pairs of means to compare (A with B, A with C, and B with C), and each comparison carries its own 0.05 chance of appearing significant when no real difference exists. Because of disjunctive and conditional probabilities, the chance that at least one of the three comparisons appears significant by chance alone approaches 0.15, which is too high for the requisite level of significance. Therefore, even if a t-test appears significant, the researcher may conclude erroneously that the means are different when they are not.

In other words, the t-test cannot be applied reliably to more than two independent samples. A different statistical measure is needed, one that compares the variance between the sample means with the variance due to random error. The ratio of these two variances is called the F-ratio. The sampling distribution of this ratio, at particular degrees of freedom and at various levels of significance, gives the critical values of F. The calculated F-ratio is compared with the critical value to infer whether the means differ at the level of significance chosen by the researcher, taking into account the field of research and the level of significance accepted for the phenomenon under study.
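The inflation of the significance level described above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: the three pairwise comparisons are treated as if they were independent, which is not strictly true when they share the same samples. The function name is hypothetical.

```python
# Chance of at least one false positive across several tests.
# Assumption (for illustration only): the tests are independent.
def familywise_error_rate(alpha: float, number_of_tests: int) -> float:
    return 1.0 - (1.0 - alpha) ** number_of_tests

alpha = 0.05
pairwise_tests = 3  # A vs B, A vs C, B vs C

fwer = familywise_error_rate(alpha, pairwise_tests)
upper_bound = alpha * pairwise_tests  # simple upper limit: sum of the individual levels

print(f"Approximate chance of at least one false positive: {fwer:.4f}")
print(f"Upper bound: {upper_bound:.2f}")
```

Under this assumption the combined rate is a little over 0.14, with 0.15 as its upper limit, which is the figure the article refers to.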

The statistical logic of the ANOVA method of analysis

The logic of the analysis of variance is analogous to that of the t-test. The difference is that the variability between groups is not measured by a simple difference between two means. Instead, it is measured by the sum of the squared deviations of the three sample means from the total mean, each weighted by the number of items in the sample. This is compared with the variability within the samples, measured by the sum of square deviates within each sample. As in the t-test, each sum of squares is divided by its degrees of freedom. This can be demonstrated by an example as follows.

Say a researcher wants to study the effect of three types of music on task performance and has chosen three independent samples from a normally distributed population. Each sample is chosen at random and has a sample size of 5. The data set and the basic statistics are as follows:

Sample (A) Measures and data:

16, 15, 17, 15, 20

The number of items in sample (N a) = 5

The sum of sample = 83

The sum of squares of sample = 1395

Mean Ma = 16.6

Sum of square deviates SS a = 1395 - (83)^2/5 = 17.2

Sample (B) Measures and data:

20, 19, 21, 16, 18

The number of items in sample (N b) = 5

The sum of sample = 94

The sum of squares of sample = 1782

Mean Mb = 18.8

Sum of square deviates SS b = 1782 - (94)^2/5 = 14.8

Sample (C) Measures and data:

18, 19, 18, 23, 18

The number of items in sample (N c) = 5

The sum of sample = 96

The sum of squares of sample = 1862

Mean Mc = 19.2

Sum of square deviates SS c = 1862 - (96)^2/5 = 18.8

Total array Measures and data

16, 15, 17, 15, 20 sample (A)

20, 19, 21, 16, 18 sample (B)

18, 19, 18, 23, 18 sample (C)

Total number of items in the array (N t) = 15

The sum of array = 273

The sum of squares of array = 5039

Mean Mt = 273/15 = 18.2

The sum of square deviates SS t = 5039 - (273)^2/15 = 70.4
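The basic statistics listed above can be reproduced with a short Python sketch. It uses the same computational formula as the text (sum of squares minus the squared sum divided by the number of items). The function and variable names are chosen for illustration only.

```python
# Data from the example: task performance under three types of music.
samples = {
    "A": [16, 15, 17, 15, 20],
    "B": [20, 19, 21, 16, 18],
    "C": [18, 19, 18, 23, 18],
}

def summarize(values):
    """Return count, sum, sum of squares, mean and sum of square deviates."""
    n = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    # Computational formula: SS = sum(x^2) - (sum(x))^2 / n
    ss = sum_of_squares - total ** 2 / n
    return {"n": n, "sum": total, "sum_sq": sum_of_squares,
            "mean": total / n, "ss": ss}

summary = {name: summarize(values) for name, values in samples.items()}

# The total array pools all 15 items.
pooled = [v for values in samples.values() for v in values]
summary["total"] = summarize(pooled)

for name, s in summary.items():
    print(f"{name}: n={s['n']} sum={s['sum']} sum_sq={s['sum_sq']} "
          f"mean={s['mean']:.1f} SS={s['ss']:.1f}")
```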

The first step is to calculate the aggregate difference between the sample means. This is the squared difference of each sample mean from the mean of the total array (273/15 = 18.2), multiplied by the number of items in the sample and summed over samples A, B and C. In this case the between-groups sum of square deviates is SS b-g = 5*(16.6 - 18.2)^2 + 5*(18.8 - 18.2)^2 + 5*(19.2 - 18.2)^2 = 19.6.

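Then calculate the aggregate variability within groups, which is the total of the sum of square deviates of the three samples. That is, SS w-g = SS a + SS b + SS c = 17.2 + 14.8 + 18.8 = 50.8. As a check, SS b-g + SS w-g = 19.6 + 50.8 = 70.4, which equals the sum of square deviates of the total array.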
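The two sums of square deviates can also be computed directly from their definitions, as in the sketch below. It confirms that the between-groups and within-groups parts add up to the sum of square deviates of the total array. The variable names are illustrative.

```python
samples = {
    "A": [16, 15, 17, 15, 20],
    "B": [20, 19, 21, 16, 18],
    "C": [18, 19, 18, 23, 18],
}

def mean(values):
    return sum(values) / len(values)

pooled = [x for values in samples.values() for x in values]
grand_mean = mean(pooled)

# Between groups: squared distance of each sample mean from the grand mean,
# weighted by the number of items in the sample.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples.values())

# Within groups: squared distance of each item from its own sample mean.
ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in samples.values())

# Total: squared distance of each item from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in pooled)

print(f"SS between = {ss_between:.1f}")
print(f"SS within  = {ss_within:.1f}")
print(f"SS total   = {ss_total:.1f}")
```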

Then refine each sum of square deviates into an estimate of variance by dividing it by its degrees of freedom. Between groups, the degrees of freedom are the number of samples minus 1, that is 3 - 1 = 2. Within groups, they are the total number of items minus the number of samples, that is 15 - 3 = 12. In this case the mean square deviates, or refined estimates of the population variance, between groups and within groups are as follows:

Mean square deviates between groups: MS b-g = SS b-g / degrees of freedom between groups = 19.6 / 2 = 9.8. Mean square deviates within groups: MS w-g = SS w-g / degrees of freedom within groups = 50.8 / (15 - 3) = 4.23.

Then, as with the t-test, calculate the F-ratio = MS b-g / MS w-g = 9.8 / 4.23 = 2.32.
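The step from sums of square deviates to the F-ratio can be sketched as follows, starting from the two sums computed in the example. Carrying full precision gives a ratio of about 2.315. The figure of 2.32 in the text comes from rounding the within-groups mean square to 4.23 before dividing.

```python
# Sums of square deviates from the worked example.
ss_between = 19.6
ss_within = 50.8

number_of_groups = 3
total_items = 15

# Degrees of freedom.
df_between = number_of_groups - 1          # 3 - 1 = 2
df_within = total_items - number_of_groups  # 15 - 3 = 12

# Mean square deviates: each sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

f_ratio = ms_between / ms_within

print(f"MS between = {ms_between:.2f}")
print(f"MS within  = {ms_within:.2f}")
print(f"F-ratio    = {f_ratio:.3f}")
```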

The F-ratio must first be greater than 1 to be worth considering, because if the variation between the sample means is smaller than the random variability within the samples, the difference will not be significant at any conventional level of significance. However, even if it is greater than 1 it may or may not be significant, depending on the particular sampling distribution of F given the degrees of freedom between groups and within groups. In this instance the relevant F-distribution is the one for 2 and 12 degrees of freedom, as explained above. From the F-distribution table, the critical values for this distribution at the 0.05 and 0.01 levels of significance are 3.89 and 6.93 respectively. The calculated F-ratio of the three samples is 2.32, which is less than the critical values at both levels. That is, in this example the one-way analysis of variance, with type of music as the single independent variable, is not significant. The differences between the sample means are no larger than could be expected from random variation alone, so an influence of the type of music has not been demonstrated.
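The comparison with the critical values can be checked without a printed table. The sketch below relies on a closed form that holds only when the between-groups degrees of freedom equal 2, as they do here: the upper-tail probability of F with 2 and n degrees of freedom is (1 + 2F/n)^(-n/2). For other degrees of freedom a statistical table or library would be needed. The function names are hypothetical.

```python
# Valid ONLY when the numerator (between-groups) degrees of freedom are 2.
def f_upper_tail_df1_is_2(f_value, df_within):
    """P(F > f_value) for an F-distribution with (2, df_within) degrees of freedom."""
    return (1.0 + 2.0 * f_value / df_within) ** (-df_within / 2.0)

def f_critical_df1_is_2(alpha, df_within):
    """Value of F exceeded with probability alpha, for (2, df_within) degrees of freedom."""
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)

df_within = 12
f_ratio = (19.6 / 2) / (50.8 / 12)  # calculated F-ratio from the example

crit_05 = f_critical_df1_is_2(0.05, df_within)
crit_01 = f_critical_df1_is_2(0.01, df_within)
p_value = f_upper_tail_df1_is_2(f_ratio, df_within)
significant_at_05 = f_ratio > crit_05

print(f"Critical F at 0.05: {crit_05:.2f}")
print(f"Critical F at 0.01: {crit_01:.2f}")
print(f"Calculated F: {f_ratio:.3f}, p-value: {p_value:.3f}")
print(f"Significant at 0.05: {significant_at_05}")
```

The p-value of roughly 0.14 is well above 0.05, which agrees with the conclusion that the result is not significant.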

Conclusion

As explained above, the analysis of variance and its F-test are important when there are more than two independent samples.

The analysis of variance can also be applied to more than one independent variable. It can be used to study the interaction of two independent variables and to separate the effect of the interaction from the effect of each variable. This makes it possible to judge whether the observed variability is due to chance or to the influence of a particular independent variable, with the effect of the interaction separated out. This matters in science generally and in behavioral and social sciences such as psychology, because interaction between variables is often the reality.

As explained above, the t-test is not applicable when more than two samples are tested for statistical significance. Because of disjunctive and conditional probabilities, repeated t-tests raise the effective level of significance above the accepted level. The analysis of variance, or ANOVA, is therefore necessary for making statistical inferences and general statements about the population.

In general, ANOVA is an important statistical method for analyzing and making inferences about complex situations, including studies of more than one independent variable at the same time.
