Biological Sciences

Chi-Square Test

The Chi-Square Test is a statistical method used to determine if there is a significant association between categorical variables. In biological sciences, it can be applied to analyze data such as genetic ratios, allele frequencies, or the distribution of traits among different groups. The test compares observed data with expected data to assess whether any differences are due to chance or actual relationships.

Written by Perlego with AI-assistance

8 Key excerpts on "Chi-Square Test"

  • Sensory Evaluation of Food

    Statistical Methods and Procedures

    • Michael O'Mahony (Author)
    • 2017 (Publication Date)
    • Routledge
      (Publisher)
    6 Chi-Square

    6.1 What is Chi-Square?

    We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the “men” and “women” categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods (“like” vs. “indifferent” vs. “dislike” or “too sweet” vs. “correct sweetness” vs. “not sweet enough”).
    Just as there is a normal and a binomial distribution, there is also a chi-square distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H0 is rejected. This procedure will become clearer with examples.
    In general, chi-square is given by the formula
    χ² = Σ [ (O − E)² / E ]
    where
    O = observed frequency
    E = expected frequency
    We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used.

    6.2 Chi-Square: Single-Sample Test (One-Way Classification)

    In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female. We use the same logic as with a binomial test. We calculate the probability of getting our result on H0, and if it is small, we reject H0. From Table G.4.b, the two-tailed binomial probability associated with this is 0.052, so we would not reject H0 at p < 0.05. However, we can also set up a Chi-Square Test. If H0 is true, there is no difference in the numbers of men and women; the expected number of males and females from a sample of 22 is 11 each. Thus we have our observed frequencies (O = 16 and 6) and our expected frequencies (E
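Completing the arithmetic for this example: χ² = (16 − 11)²/11 + (6 − 11)²/11 = 50/11 ≈ 4.55. A minimal sketch of the same calculation in Python (assuming scipy is available; the library is not part of the text):

```python
from scipy.stats import chisquare

# Observed counts: 16 men and 6 women; under H0 we expect 11 of each.
observed = [16, 6]
stat, p = chisquare(observed)   # equal expected frequencies by default

print(stat)   # 50/11 ≈ 4.545
print(p)      # below 0.05: without a continuity correction chi-square is
              # slightly more liberal than the exact binomial p = 0.052 above
```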
  • An Introduction to Statistical Analysis in Research, Optimized Edition

    With Applications in the Biological and Life Sciences

    • Kathleen F. Weaver, Vanessa C. Morales, Sarah L. Dunn, Kanya Godde, Pablo F. Weaver (Authors)
    • 2017 (Publication Date)
    • Wiley
      (Publisher)
    9 Chi-Square Test

    Learning Outcomes

    By the end of this chapter, you should be able to:
    • Determine the observed and expected chi-square values as well as the degrees of freedom associated with each scenario.
    • Use statistical programs to perform a Chi-Square Test and determine significance.
    • Evaluate the relationship between the observed and expected and construct a logical conclusion for each scenario.
    • Use the skills acquired to perform, analyze, and evaluate your own dataset from independent research.

    9.1 Chi-Square Background

    The last test we will cover involving two or more samples is known as the Pearson's Chi-Square Test (χ² test). The Chi-Square Test (also known as chi-squared test) examines the difference between expected and observed distributions. Specifically, we will look at a goodness-of-fit test, comparing the expected frequency (the value that we expect to see based on the literature, background material, or a hypothesis generated as part of an experiment) to the observed frequency (the value actually observed as part of an experiment or study). The goodness-of-fit test compares the distribution of your data to a specified distribution. The default in some programs is a uniform distribution (all expected values are the same), but researchers can define and test other distributions. Often researchers use the Chi-Square Test in genetics for tests of Hardy–Weinberg equilibrium and for comparing expected and observed offspring phenotypes. The Chi-Square Test is used on categorical variables.

    9.2 Case Study 1

    The tiger leech, Haemadipsa picta, is known to rely on sensory mechanisms to locate prey. Field observations have recorded cases where leeches moved toward warm or moving objects, in addition to detecting sound and vibrations from incoming prey. Kmiecik and colleagues (2008) conducted a comparative study to further examine leech behavior when introduced to a moving or warm object
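To illustrate the genetics use of the goodness-of-fit test mentioned above, the sketch below (scipy assumed available) tests Mendel's classic dihybrid phenotype counts, 315:101:108:32, against the expected 9:3:3:1 ratio:

```python
from scipy.stats import chisquare

observed = [315, 101, 108, 32]                  # dihybrid phenotype counts
n = sum(observed)                               # 556 offspring in total
expected = [n * r / 16 for r in (9, 3, 3, 1)]   # 9:3:3:1 Mendelian ratio

stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 3), round(p, 3))  # χ² ≈ 0.47 with 3 df, p ≈ 0.925: a good fit
```

Because p is far above 0.05, the observed frequencies are consistent with the hypothesized ratio; the null hypothesis of a 9:3:3:1 distribution is not rejected.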
  • Practical Statistics for Field Biology
    • Jim Fowler, Lou Cohen, Phil Jarvis (Authors)
    • 2013 (Publication Date)
    • Wiley
      (Publisher)

    13 ANALYSING FREQUENCIES

    13.1 The Chi-Square Test

    Field biologists spend a good deal of their time counting and classifying things on nominal scales such as species, colour and habitat. Statistical techniques which analyse frequencies are therefore especially useful. The classical method of analysing frequencies is the Chi-Square Test. This involves computing a test statistic which is compared with a chi-square (χ²) distribution that we outlined in Section 11.11. Because there is a different distribution for every possible number of degrees of freedom (df), tables in Appendix 3 showing the distribution of χ² are restricted to the critical values at the significance levels we are interested in. There we give critical values at P = 0.05 and P = 0.01 (the 5% and 1% levels) for 1 to 30 df. Between 30 and 100 df, the critical values are estimated by interpolation, but the need to do this arises infrequently.
    Chi-Square Tests are variously referred to as tests for homogeneity, randomness, association, independence and goodness of fit. This array is not as alarming as it might seem at first sight. The precise applications will become clear as you study the examples. In each application the underlying principle is the same. The frequencies we observe are compared with those we expect on the basis of some Null Hypothesis. If the discrepancy between observed and expected frequencies is great, then the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom. We are then obliged to reject the Null Hypothesis in favour of some alternative.
    The mastery of the method lies not so much in the computation of the test statistic itself but in the calculation of the expected frequencies. We have already shown some examples of how expected frequencies are generated. They can be derived from sample data (Example 7.5) or according to a mathematical model (Section 7.4). The versions of the test which compare observed frequencies with those expected from a model are called goodness of fit tests. All versions of the Chi-Square Test assume that samples are random and observations are independent.
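The tabulated critical values described above can also be generated directly; a sketch using scipy's chi-square quantile function (scipy is an assumption here, not something the text uses):

```python
from scipy.stats import chi2

# Critical values of the chi-square distribution at P = 0.05 and P = 0.01,
# as tabulated for 1 to 30 df in printed appendices.
for df in (1, 2, 5, 10, 30):
    crit_05 = chi2.ppf(0.95, df)   # upper 5% point
    crit_01 = chi2.ppf(0.99, df)   # upper 1% point
    print(df, round(crit_05, 3), round(crit_01, 3))
# df = 1 gives 3.841 (5%) and 6.635 (1%)
```

A calculated test statistic exceeding the relevant critical value leads to rejection of the null hypothesis, exactly as in the table-lookup procedure.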
  • Interpreting Quantitative Data with IBM SPSS Statistics
    The curve representing the distribution of the chi-squared statistics is known. It depends on the number of degrees of freedom. For every number of degrees of freedom, we can determine the corresponding chi-square curve. We will refer to the properties of these curves to determine whether two categorical variables (i.e. measured by a nominal scale) are statistically associated.

    Chi-squared as a Test of Association between Two Nominal Variables

    The most important use of χ² is to test whether the association observed on a sample reflects a statistical association between the independent variable and the dependent variable at the level of the whole population. To do that, we will rely on the method of hypothesis testing, formulated as follows:
    H0: There is no statistical association between the independent and dependent variables.
    H1: There is a statistical association between them.
    The method is based on the following reasoning.
    Given an independent variable with i categories, and a dependent variable with j categories, a situation of no statistical association will mean that χ2 = 0, since the observed frequencies of the cross-tabulation will be equal to the expected frequencies.
    A sample of n individuals drawn at random from this population is likely to differ a little from the expected situation of no association, and for that sample, χ² will have a non-zero value. The mathematical properties of the distribution of all such values and its diagram are known, and they both depend on the number of degrees of freedom of the statistic. Given a specific value of χ², with df degrees of freedom, we can determine with precision the probability of getting such a value or larger. This probability is equal to the area under the χ² curve, to the right of the given value. Given a sample of data values for the independent and dependent variables, SPSS will compute such a probability. If this probability is less than 0.05 (i.e. if we have less than a 5% chance of getting it), we conclude that such a value of χ² is unlikely under the null hypothesis. We therefore reject the null hypothesis in favor of the alternative hypothesis. This is why the area to the right of the curve that consists of 5% of the whole area is called the rejection zone (see Figure 11.1
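The same hypothesis-testing procedure can be reproduced outside SPSS; the sketch below uses scipy and an invented 2 × 2 crosstab (both the library and the counts are assumptions, not taken from the text):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 crosstab: rows are categories of the independent
# variable, columns are categories of the dependent variable.
table = [[30, 20],
         [15, 35]]

stat, p, df, expected = chi2_contingency(table)  # Yates correction by default
print(round(stat, 3), round(p, 4), df)
if p < 0.05:
    print("Reject H0: the variables appear to be associated.")
```

The returned p is exactly the right-tail area under the χ² curve that the excerpt describes; comparing it to 0.05 implements the rejection-zone rule.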
  • Social Statistics

    Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2021 (Publication Date)
    • Routledge
      (Publisher)
    ropractor. Although I like to drink chai, that’s not what we’re doing here. Although I appreciate tai chi, that’s not what we’re doing here. In the world of statistical tests, the Chi-Square Test is a relatively easy one to use. It contrasts the frequencies you observed in the crosstab with the frequencies you would expect if there were no relationship among the variables in your crosstab. It makes this contrast with each cell in the crosstab. We’ll use the third sex/gun crosstab from earlier, the one where your gut wasn’t completely sure if there was a generalizable relationship. Here it is, with its frequencies expected crosstab next to it:
    Exhibit 4.12 Frequencies Observed and Frequencies Expected
    Let’s first find the difference between the frequencies observed (hereafter referred to as fo) and the frequencies we would expect (hereafter referred to as fe):
    Exhibit 4.13 Differences Between Observed and Expected Frequencies
    Cell            fo    fe    fo – fe
    Top left        56    49    7
    Top right       91    98    –7
    Bottom left     44    51    –7
    Bottom right    109   102   7
    Then we’re going to square each of these and divide it by its corresponding fe:
    Exhibit 4.14 Calculating the Chi-Square Value
    Cell            fo    fe    fo – fe    (fo – fe)²    (fo – fe)²/fe
    Top left        56    49    7          49            1.00
    Top right       91    98    –7         49            0.50
    Bottom left     44    51    –7         49            0.96
    Bottom right    109   102   7          49            0.48
    The sum of the last column of numbers is our value for chi-square:
    • 1.00 + 0.50 + 0.96 + 0.48 = 2.94
    Here is the formula for what we just did:
    χ² = Σ [ (fo – fe)² / fe ]
    Notice that the symbol for chi-square is χ². It looks like an x
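The hand calculation above can be checked in a few lines (scipy assumed available); passing correction=False disables the Yates continuity correction so the statistic matches the uncorrected cell-by-cell sum used here:

```python
from scipy.stats import chi2_contingency

# The observed sex/gun crosstab from Exhibit 4.13.
observed = [[56, 91],
            [44, 109]]

stat, p, df, expected = chi2_contingency(observed, correction=False)
print(round(stat, 2))   # 2.94, matching the sum of the last column
print(expected[0][0])   # 49.0, the frequency expected for the top-left cell
```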
  • Compassionate Statistics

    Applied Quantitative Analysis for Social Services (With exercises and instructions in SPSS)

    1. The absolute chi-square value, appearing as a number taken to two decimal places (e.g., 2.34, 4.67, 5.29)
    2. The number of degrees of freedom, appearing as a whole number (e.g., 3, 5, 8)
    3. The probability factor, appearing as a number less than 1.00 (e.g., .05, .001, .34), usually preceded by the symbol for less than (<) or more than (>)
    There are two standard ways of writing up the results of a Chi-Square Test: one traditional, the other a more modern alternative. You can choose either format as long as you are consistent throughout the entire research report.

    Measuring the Strength of the Relationship

    Many researchers, after computing a chi-square analysis, will choose to stop at that point and go on to another analysis involving some other variables. However, if you want to learn just a bit more about the relationship you just discovered between the two variables in the chi-square analysis, you can take one more step and determine how strong that relationship is. This is also referred to as determining a statistical test’s effect size. Reporting such an effect size is becoming increasingly common today in many books, monographs, and journals published throughout the field of social services. Suppose you analyzed a sample of college students and discovered a statistically significant relationship between gender and academic major, as categorized into education, psychology, or social work. Furthermore, assume that your data suggest that males in your sample tended to choose social work over education or psychology, while females showed no particular preference for one over the other two choices of major
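A common choice of effect-size measure for a chi-square result is Cramér's V (the excerpt does not name a specific measure; V is a standard option). A sketch using an invented gender-by-major crosstab like the one described, with scipy assumed available:

```python
import math
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 crosstab: gender (rows) by academic major
# (columns: education, psychology, social work).
table = [[20, 25, 55],
         [40, 35, 25]]

stat, p, df, expected = chi2_contingency(table)

n = sum(sum(row) for row in table)
k = min(len(table), len(table[0]))       # smaller dimension of the table
v = math.sqrt(stat / (n * (k - 1)))      # Cramér's V: 0 = none, 1 = perfect
print(round(v, 2), round(p, 4))
```

Values of V near 0 indicate a negligible association and values near 1 a very strong one, giving the "how strong" answer the excerpt asks for.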
  • Statistics

    The Essentials for Research

    10 Chi Square
    The binomial distribution and its normal approximation provide a test of significance for hypotheses about dichotomous data. It is an appropriate test to use when observations can be classified into one of two possible categories such as “yes-no,” or “male-female,” or “correct-incorrect.” When data can be classified in more than two categories, the binomial no longer provides a test of significance. For example, a response might be classified as occurring “always,” “often,” “sometimes,” or “never.” In that situation, when the population is composed of more than two classes of events, it would appear reasonable to employ the multinomial distribution to determine the probability of obtaining particular kinds of samples. This is theoretically possible but the calculations required quickly become prohibitive. Consequently, we use a distribution that approximates the multinomial (and the binomial) distribution. This distribution, and the test of significance named for it, is called Chi Square (χ²).
    10.1 The Chi Square Distribution
    The chi square distribution differs in some important ways from the binomial and normal distributions we have already discussed, but no new principles are involved in chi square’s use as a test of significance. Before discussing chi square, let us briefly review the normal approximation of the binomial as a test of significance. We begin with some hypothesis about the proportion of events in a binomial population which, given the sample size, allows us to calculate the mean and standard error of a theoretical sampling distribution. With this information we can calculate the z equivalent of any obtained sample proportion. When it can be assumed that z is normally distributed, Table N gives the probability of obtaining a value of z as large or larger than that yielded by the sample. If this z
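The link between the binomial z test reviewed here and chi-square can be verified numerically: with two categories and 1 df, χ² equals z² when no continuity correction is used. A sketch (scipy assumed; the 16-vs-6 counts are borrowed from the earlier excerpt):

```python
import math
from scipy.stats import chisquare

# Dichotomous sample: 16 of one kind, 6 of the other; H0 proportion 0.5.
n, x = 22, 16
z = (x - n * 0.5) / math.sqrt(n * 0.5 * 0.5)   # normal-approximation z
stat, p = chisquare([x, n - x])                # chi-square with 1 df

print(round(z ** 2, 4), round(stat, 4))        # both 4.5455: z² = χ²
```

This identity is why chi-square introduces no new principles for the two-category case; it simply generalizes the z test to more than two categories.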
  • Goodness-of-Fit-Techniques
    • Ralph B. D'Agostino (Author)
    • 2017 (Publication Date)
    • Routledge
      (Publisher)
    3 Tests of Chi-Squared Type
    David S. Moore Purdue University, West Lafayette, Indiana

    3.1 Introduction

    In the course of his Mathematical Contributions to the Theory of Evolution, Karl Pearson abandoned the assumption that biological populations are normally distributed, introducing the Pearson system of distributions to provide other models. The need to test fit arose naturally in this context, and in 1900 Pearson invented his chi-squared test. This statistic and others related to it remain among the most used statistical procedures.
    Pearson’s idea was to reduce the general problem of testing fit to a multinomial setting by basing a test on a comparison of observed cell counts with their expected values under the hypothesis to be tested. This reduction in general discards some information, so that tests of chi-squared type are often less powerful than other classes of tests of fit. But chi-squared tests apply to discrete or continuous, univariate or multivariate data. They are therefore the most generally applicable tests of fit.
    Modern developments have increased the flexibility of chi-squared tests, especially when unknown parameters must be estimated in the hypothesized family. This chapter considers two classes of chi-squared procedures. One, called “classical” because it contains such familiar statistics as the log likelihood ratio, Neyman modified chi-squared, and Freeman-Tukey, is discussed in Section 3.2. The second, consisting of nonnegative definite quadratic forms in the standardized cell frequencies, is the main subject of Section 3.3. Other newer developments relevant to both classes of statistics, especially the use of data-dependent cells, are also treated primarily in Section 3.3, while such practical considerations as choice of cells and accuracy of asymptotic approximate distributions appear in Section 3.2. Both sections contain a number of examples.
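Several members of the "classical" family named here (Pearson, log likelihood ratio, Freeman-Tukey) happen to be exposed in scipy through power_divergence, whose lambda_ argument selects the member; a brief sketch with invented counts (both the library choice and the data are assumptions):

```python
from scipy.stats import power_divergence

# Hypothetical cell counts; H0 is a uniform distribution over four cells.
observed = [18, 22, 31, 29]

results = {}
for lam in ("pearson", "log-likelihood", "freeman-tukey"):
    stat, p = power_divergence(observed, lambda_=lam)
    results[lam] = (stat, p)
    print(lam, round(stat, 3), round(p, 3))
```

The three statistics differ slightly in value but share the same asymptotic chi-squared distribution, which is why they are grouped as one class.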
    Tests of the types considered here are also used in assessing the fit of models for categorical data. The scope of this volume forbids venturing into this closely related territory. Bishop, Fienberg, and Holland (1975) discuss the methods of categorical data analysis most closely related to the contents of this chapter.
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.