Department of Psychology

Theory: Why does this work?

Central Limit Theorem

When several samples are drawn from the same population and the mean is computed for each sample, chances are that the means will not all be the same; instead, they will form a distribution. Populations are made up of people (or variables) that vary in their values, so no sample will ever completely match its population. Because of this, we know that all samples have some error: a sample mean may not look exactly like the population mean because of error due to chance. If we draw many samples from a population, their means will vary somewhat around the population mean.

The distribution of these sample means is known as a sampling distribution of means. According to the central limit theorem, if the samples are large enough (about 30 or more is a common rule of thumb), this distribution will be approximately normal, so a histogram of the sample means will look roughly like a normal curve, even if the true population distribution is not normally distributed! Like all distributions, this one has a standard deviation. It is known as the Standard Error of the Mean, and it describes the variation of sample means around the true population mean. (Means have a standard error because of chance variation resulting from random sampling.)
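A small simulation can make the central limit theorem concrete. The sketch below is only an illustration: the skewed (exponential) population, the sample size of 30, and the number of samples are all assumptions chosen for the example, not values from this page.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def draw_sample(n):
    """Draw n values from a hypothetical skewed population (exponential, mean 1, SD 1)."""
    return [random.expovariate(1.0) for _ in range(n)]

n = 30              # size of each sample
num_samples = 5000  # how many samples we draw

# Compute the mean of each sample: together these form the sampling distribution of means.
sample_means = [statistics.mean(draw_sample(n)) for _ in range(num_samples)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f}")  # should be close to the population mean, 1
print(f"SD of sample means:   {sd_of_means:.3f}")    # should be close to 1 / sqrt(30), about 0.183
```

Even though the population here is far from normal, the sample means cluster symmetrically around the population mean, and their standard deviation is what the next section calls the standard error of the mean.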

 


 

Standard Error of the Mean:

It is impractical, of course, to test and re-test every member of the population for one study. That’s why we use sampling. When we sample, though, we don’t have direct information about the variability of the population. However, you can use the standard deviation of your sample to estimate the variability within the population. Based on this, you can calculate the standard error of the mean from the sample data:

SEm = s / √n

Where:
s: standard deviation of the sample
n: number in sample
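As a rough sketch of this calculation, the standard error of the mean (SEm = s / √n) can be computed from a sample in a few lines. The function name and the scores below are hypothetical, made up for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    n = len(sample)
    return s / math.sqrt(n)

scores = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical data: mean 6, s = 2
sem = standard_error(scores)
print(round(sem, 3))  # 2 / sqrt(8), about 0.707
```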

Because the sampling distribution of means is approximately normal, about 68% of all cases fall within one standard deviation of its mean. This means that about 68% of sample means will lie within one SEm of the true (population) mean, so we can be 68% confident that the true mean lies within one SEm of our sample mean.

 


 

 

Confidence Interval

Most people will laugh at you if you say you’re 68% sure of something. There’s still 32% to worry about. But there’s no inherent rule about acceptable levels of confidence or uncertainty. So, in general, psychologists have established a convention about certainty by agreeing that 95% confidence is an “acceptable” level of certainty.

To obtain a 95% confidence interval all you need to do is multiply the SEm by 1.96 and then add it to and subtract it from the mean. This will give you a lower limit and an upper limit.

CI(95) = M ± (1.96)(SEm)

Similarly, to obtain a 99% confidence interval, just multiply the SEm by 2.58, and then add it to and subtract it from the mean.

CI(99) = M ± (2.58)(SEm)
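The two formulas above can be sketched in code as follows. The helper name and the scores are hypothetical; note also that the 1.96 and 2.58 multipliers assume a reasonably large sample, so the eight made-up scores here serve only to show the arithmetic.

```python
import math
import statistics

def confidence_interval(sample, z):
    """Return (lower limit, upper limit) as M - z * SEm and M + z * SEm."""
    m = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return (m - z * sem, m + z * sem)

scores = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical data: M = 6, SEm about 0.707

ci95 = confidence_interval(scores, 1.96)
ci99 = confidence_interval(scores, 2.58)

print([round(x, 2) for x in ci95])  # [4.61, 7.39]
print([round(x, 2) for x in ci99])  # [4.18, 7.82]
```

The 99% interval is wider than the 95% interval: being more confident that the interval contains the true mean costs some precision.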

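Each of these confidence intervals has advantages and disadvantages: the 99% interval is more likely to contain the true mean, but it is wider, and so less precise, than the 95% interval. In general, a larger sample and a smaller standard error of the mean yield a narrower, more precise confidence interval.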

If you look at the formula:

SEm = s / √n

you’ll notice that s is in the numerator and n (under a square root) is in the denominator. Understanding how these two relate to the standard error can help you generalize from your data with more confidence. Standard deviation and standard error of the mean are directly related: the smaller your standard deviation, the smaller your standard error. Sample size and standard error of the mean are inversely related: the larger your sample size, the smaller your standard error.
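To see both relationships with numbers, here is a minimal sketch that plugs a few hypothetical values of s and n into SEm = s / √n. The specific values are assumptions picked to keep the arithmetic clean.

```python
import math

def sem(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Hold s at 10 and vary the sample size.
by_n = {n: sem(10, n) for n in (25, 100, 400)}
print(by_n)  # {25: 2.0, 100: 1.0, 400: 0.5}

# Hold n at 100 and vary the standard deviation.
by_s = {s: sem(s, 100) for s in (5, 10, 20)}
print(by_s)  # {5: 0.5, 10: 1.0, 20: 2.0}
```

Because n sits under a square root, you have to quadruple the sample size to cut the standard error in half, whereas halving the standard deviation halves the standard error directly.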

 


 

 

Null and alternative hypothesis

Data can be misleading. Apparent differences between 2 groups, or samples, could be because of the treatment or experimental condition. Or they could be “false differences” that are just due to the error associated with sampling. Inferential statistics help determine which is more likely. They help you decide how confident you are that the apparent differences are “true” differences, rather than “false” differences due to sampling error. We use inferential statistics with two “versions” of our hypothesis: the null hypothesis and the alternative or research hypothesis.

The null hypothesis assumes nothing: no relationship, no difference, no effects. In an experimental design, the null hypothesis states that the treatment had no effect on the experimental group. If we are testing whether there are differences between groups, the null states that there is no difference; said another way, that the true difference between the means in the population is zero: all groups are equal, and any observed difference is due to sampling error.

The alternative hypothesis assumes something: some relationship, some difference, some effect. It states that there is a difference between the two groups. It’s usually the hypothesis that we’re really interested in. It can either be directional (group one’s average is higher than group two’s) or non-directional (the groups’ averages are different). Non-directional hypotheses are not frequently used.

For the purposes of statistics and hypothesis testing, researchers assume that the null hypothesis is true, that the treatment had no effect whatsoever, and that there is no difference (other than sampling error) between groups. This seems counterintuitive because we often really believe there is a difference, but here’s how it works. If we assume that there is no relationship or no effect, it only takes one piece of sufficiently contradictory evidence to reject the null hypothesis: the result of an inferential test. That is, we examine the sample data to see how likely such data would be if the null hypothesis (Ho or H0) were true. So, hypothesis testing asks whether our data (the evidence) are consistent with the assumption of the null hypothesis: that there is no difference or no relationship.

After using an inferential test, a researcher will arrive at a p-value: the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. This p-value is compared with a cutoff chosen in advance, called alpha, which psychologists conventionally set at .05. A p of .05 or less means that data like these would occur less than 5 times out of 100, or 5% of the time, in a population where the null hypothesis is true. Psychologists typically interpret a p of .05 or less to mean that the results do differ significantly, and that this difference is not very likely due to sampling error. If the researcher obtains a p of .05 or less, she can reject the null hypothesis and instead accept the alternative hypothesis (HA or H1) as consistent with the data. The alternative hypothesis generally states that there is a difference between the two groups, and this difference could be attributable to the treatment or experimental condition, despite a small chance of error due to sampling.

A p-value greater than .05 means that results like these would occur more than 5% of the time in a population where the null hypothesis is actually true. In this case, the evidence does not meet the conventional 95% standard for concluding that the groups are different, so you fail to reject the null hypothesis. Your research hypothesis is not supported, and any differences you see are not significant; they could instead be due to error or chance.
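The decision rule can be sketched with a simple large-sample z test on the difference between two group means. This page does not name a specific inferential test, so the choice of test, the function name, and the summary statistics below are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def two_sample_z_test(m1, s1, n1, m2, s2, n2):
    """Return (z, two-tailed p) for the difference between two independent means."""
    se_diff = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    z = (m1 - m2) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))            # area in both tails beyond |z|
    return z, p

# Hypothetical summary data: treatment group vs. control group.
z, p = two_sample_z_test(m1=105.0, s1=15.0, n1=50, m2=98.0, s2=15.0, n2=50)

alpha = 0.05
decision = "reject the null" if p <= alpha else "fail to reject the null"
print(f"z = {z:.2f}, p = {p:.3f}: {decision}")  # z is 7/3, p is roughly .02
```

With these made-up numbers the p-value falls below .05, so the null hypothesis would be rejected; with a smaller difference or smaller groups, the same code would report a failure to reject.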

Pay attention to the language you use when making decisions about the null and alternative hypotheses. We never accept the null hypothesis; we only fail to reject it. Why? Because when we fail to find significant results it could be for one of two reasons. It could be that your research hypothesis is wrong and there is no relationship between the variables, or no effect of the treatment/experiment. If we were absolutely sure that this explanation was correct, we could “accept” the null and claim there is no relationship or effect. But we could also fail to find significant results for methodological reasons, such as a sample size that was too small. In this situation, there truly is a relationship between the variables, but we just couldn’t find it (see Type II errors below). So under these circumstances, we shouldn’t accept the null because it’s not really accurate. Because we never really know whether a lack of significant results is due to a true lack of relationship or to faulty methodology, we never accept the null; we just fail to reject it (and perhaps design another study).

 


 

Type-I and Type-II errors

No matter how accurate the test or how large the sample, there is always some small chance that the null hypothesis is true and that an observed difference between groups is due to chance. This is the chance of making a type-I error: a p<.05 is obtained even though the null hypothesis really is true. For example, a researcher might be trying to determine which month gets more rain, April or August. She collects data every day for one year and discovers that, based on her data, it rains more in August than April. However, in all other years, it rains more in April than August. It just happened by chance that she got a particularly dry April or rainy August that year. She didn’t do anything wrong; she just happened upon that small random chance variation in the weather patterns.
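A simulation can show how often a type-I error turns up when the null hypothesis really is true. The sketch below is built on assumptions: both groups come from the same hypothetical population (mean 100, SD 15), each group has 50 members, and the test is a simple large-sample z test, which this page does not specifically prescribe.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is repeatable

def p_value(group_a, group_b):
    """Two-tailed p from a large-sample z test on the difference between means."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

trials = 2000
false_positives = 0
for _ in range(trials):
    # Both groups come from the SAME population, so the null hypothesis is true.
    a = [random.gauss(100, 15) for _ in range(50)]
    b = [random.gauss(100, 15) for _ in range(50)]
    if p_value(a, b) < 0.05:
        false_positives += 1  # "significant" result that is really just chance

type_i_rate = false_positives / trials
print(f"type-I error rate: {type_i_rate:.3f}")  # should land near 0.05
```

Roughly one simulated study in twenty comes out "significant" even though nothing is going on, which is exactly the risk a .05 cutoff accepts.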

A type-II error occurs when the null hypothesis is not rejected even though it is false and should be rejected. There are two reasons why a null hypothesis is not rejected: either there truly is no relationship between the variables (i.e., your research hypothesis was wrong), or your test does not have enough power to find a difference that is truly there (i.e., your research hypothesis is correct but you can’t demonstrate it). Only the second case is a type-II error. For example, a researcher might be trying to find out if there is a significant difference between Democrats’ and Republicans’ opinions on the role of the government. However, she only had two Democrats and two Republicans in her sample, and all four happened to be fairly moderate. So while the two groups vary in their opinions in the population, this researcher by chance happened upon Democrats and Republicans who agree with each other. Therefore, she did not reject the null hypothesis. Some ways to prevent a type-II error are to have a large sample size and low variability. See “central limit theorem” for more information on this.
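The effect of sample size on type-II errors can also be sketched by simulation. Everything specific here is an assumption for illustration: a true difference of 5 points between two hypothetical populations (SD 15), group sizes of 10 versus 200, and a simple z test (which is itself only approximate for groups as small as 10).

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(2)  # fixed seed so the sketch is repeatable

def rejects_null(group_a, group_b, alpha=0.05):
    """True if a two-tailed z test on the means gives p below alpha."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha

def type_ii_rate(n, trials=1000):
    """Share of simulated studies that miss a difference that is really there."""
    misses = 0
    for _ in range(trials):
        a = [random.gauss(105, 15) for _ in range(n)]  # population mean 105
        b = [random.gauss(100, 15) for _ in range(n)]  # population mean 100
        if not rejects_null(a, b):
            misses += 1  # the null is false, but we failed to reject it
    return misses / trials

miss_rate_small = type_ii_rate(n=10)
miss_rate_large = type_ii_rate(n=200)
print(f"type-II rate, n = 10 per group:  {miss_rate_small:.2f}")
print(f"type-II rate, n = 200 per group: {miss_rate_large:.2f}")
```

With small groups the real difference is missed most of the time; with large groups it is usually detected, which is why a larger sample gives a test more power.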

 


 

 

One-tailed vs. two-tailed

Inferential tests that use a cutoff of p<.05 allow for the possibility of a type-I error 5% of the time when the null hypothesis is true.


   

Picture a normal curve with the areas beyond +/- 1.96 standard deviations shaded. 2.5% of the curve’s area lies in each of these shaded regions, for a total of 5%. This is known as a two-tailed test, because there are two “tails,” or possibilities for error. It also means that there is a possibility of a difference in either direction: you believe that group A could be higher or lower than group B. However, if you’re testing directionally (that is, if you specify initially in your hypothesis that group A will be significantly higher than group B and you are uninterested in whether group A might be lower than group B), there is no need to test in the other direction (that group A will be significantly lower than group B). This means that you move all 5% of your probability for error into one “tail” of the curve.
Because the cutoff for this single tail is closer to the mean than 1.96 standard deviations, you don’t use 1.96 as the z. You use 1.65 (more precisely, 1.645). This gives you a probability of .05 (p<.05).
You should only use a one-tailed test if you have a directional hypothesis and are confident that there is no significant difference in the other direction.
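The two cutoffs can be recovered from the standard normal curve, as in this minimal sketch using Python’s standard library. The observed z of 1.80 is a hypothetical value chosen because it falls between the two cutoffs.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

# Two-tailed: 2.5% in each tail. One-tailed: all 5% in one tail.
two_tailed_cutoff = z.inv_cdf(1 - 0.05 / 2)
one_tailed_cutoff = z.inv_cdf(1 - 0.05)
print(round(two_tailed_cutoff, 2))   # 1.96
print(round(one_tailed_cutoff, 3))   # 1.645

observed_z = 1.80  # hypothetical test result in the predicted direction
p_two_tailed = 2 * (1 - z.cdf(abs(observed_z)))  # about .072, not significant
p_one_tailed = 1 - z.cdf(observed_z)             # about .036, significant
print(round(p_two_tailed, 3), round(p_one_tailed, 3))
```

The same result is significant one-tailed but not two-tailed, which is why the choice has to be justified by a directional hypothesis made before looking at the data.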



 


 

 
