Inferential statistics are methods used to make generalizations about a population based on a sample. They rely on a random sampling technique designed to make the sample representative. A simple example can be found on the front page of almost any newspaper, in any article claiming that “X% of Y population thinks/does/feels/believes Z.” A statement such as “33% of 24- to 30-year-olds prefer cake to pie” relies on inferential statistics. It would be impractical to question every single 24- to 30-year-old about their dessert preferences, so instead a representative sample of the population is surveyed with the goal of making an inference about the population as a whole.
Inferential and Descriptive Statistics
Another way of using survey data takes the form of descriptive statistics. In this case, statements are made that simply describe the data collected. The same set of data can be used in a descriptive or an inferential way. For example, in the run-up to a US election, 1,000 people in a town might be questioned about their voting intentions, with the result that 430 said they would vote Democrat, 410 said they would vote Republican, and 160 were undecided or unwilling to say. Using this data in a descriptive way would be to state simply that 43% of the 1,000 people interviewed in this town intend to vote Democrat. An inferential statement would be “Democrats hold 2-point lead”: here an inference about voting intentions in the town as a whole has been drawn from a sample.
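How far the inferential statement can be trusted depends on the sampling error attached to the poll figures. The sketch below estimates a 95% margin of error for the Democrat share and for the lead, using the counts from the example. It assumes a simple random sample and the usual normal approximation; the function names are illustrative and not taken from any particular library.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level (normal approximation)

def share_margin(p, n, z=Z_95):
    """Margin of error for a single sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def lead_margin(p1, p2, n, z=Z_95):
    """Margin of error for the difference p1 - p2 when both shares
    come from the same sample of n respondents."""
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

n = 1000                      # people interviewed in the example
dem, rep = 430 / n, 410 / n   # observed shares

dem_margin = share_margin(dem, n)
lead = dem - rep
lead_moe = lead_margin(dem, rep, n)

print(f"Democrat share: {dem:.1%} +/- {dem_margin:.1%}")
print(f"Lead: {lead:.1%} +/- {lead_moe:.1%}")
```

Under these assumptions the margin on each share is about 3 percentage points, and the margin on the lead is larger than the 2-point lead itself, so the headline is a much weaker claim than the descriptive statement.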
Methods
Before drawing any general conclusions from a sample, it is important to employ the correct methods; otherwise, the conclusions may not be valid. Errors commonly arise from the way the sample is put together, and a number of factors can influence how well it represents the population. Size is critical, because the smaller the sample, the greater the risk that it will not be representative of the population as a whole. Care must also be taken to eliminate sources of bias. In the above example, factors such as age, gender, and income may have considerable influence over voting intentions, so if the sample was not composed in such a way as to reflect the general population, the conclusion may not be valid.
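The effect of sample size can be illustrated with a small simulation. The sketch below repeatedly draws samples of different sizes from a population in which an assumed 43% hold some view, then measures how much the estimated share varies from one sample to the next. The population share, the number of repeats, and the function names are all assumptions made for illustration.

```python
import random
import statistics

TRUE_SHARE = 0.43  # assumed population share, chosen only for illustration

def sample_share(n, rng):
    """Share of 'yes' answers in one simple random sample of size n."""
    return sum(rng.random() < TRUE_SHARE for _ in range(n)) / n

def spread_of_estimates(n, repeats=500, seed=0):
    """Standard deviation of the estimated share across repeated samples."""
    rng = random.Random(seed)
    estimates = [sample_share(n, rng) for _ in range(repeats)]
    return statistics.stdev(estimates)

spreads = {n: spread_of_estimates(n) for n in (10, 100, 1000)}
for n, s in spreads.items():
    print(f"n={n:>4}: estimates typically vary by about {s:.3f}")
```

The spread shrinks as the sample grows, which is why an estimate from 10 respondents is far less reliable than one from 1,000, even when both samples are drawn at random.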
Sampling methods must be chosen carefully. For example, if someone took every 10th name in the phone book, or stopped every 10th passer-by at a mall, the sample might not be representative, because people who are not listed in the phone book or who do not visit that mall have no chance of being included. Drawing a sample from a self-selected group introduces a similar bias. For example, 24- to 30-year-olds attending a pie lovers' convention are probably more likely to prefer pie to cake, so a survey on dessert preferences that used convention attendees as a sample would not be very representative.
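The convention example can be sketched as a simulation that compares a random sample of the whole population with a sample drawn only from attendees. All figures here are invented for illustration: the population size, the share of people who attend, and the cake preference rates in each group are assumptions, not survey results.

```python
import random

rng = random.Random(42)

# Hypothetical population: (attends_convention, prefers_cake) for each person.
population = []
for _ in range(20_000):
    attends = rng.random() < 0.05                              # assumed 5% attend
    prefers_cake = rng.random() < (0.10 if attends else 0.35)  # assumed rates
    population.append((attends, prefers_cake))

def cake_share(people):
    """Fraction of the given people who prefer cake."""
    return sum(prefers_cake for _, prefers_cake in people) / len(people)

true_share = cake_share(population)

# Simple random sample: every person has the same chance of selection.
random_sample = rng.sample(population, 300)

# Biased sample: only convention attendees can be selected.
attendees = [person for person in population if person[0]]
convention_sample = rng.sample(attendees, 300)

print(f"True share preferring cake: {true_share:.2f}")
print(f"Random sample estimate:     {cake_share(random_sample):.2f}")
print(f"Convention sample estimate: {cake_share(convention_sample):.2f}")
```

Both samples contain 300 people, so the difference between the two estimates comes from who could be selected, not from sample size.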
Uses
The use of inferential statistics is a cornerstone of research on populations and events, because it is usually difficult, and often impossible, to survey every member of a population or to observe every event. Instead, researchers attempt to get a representative sample and use it as a basis for more general conclusions. For example, it would not have been possible to check the medical records of every single smoker in order to establish a link between smoking and lung cancer, but numerous studies comparing samples of smokers with non-smokers, and controlling for other risk factors, have firmly established this link.
Researchers who work with inferential statistics try to keep their methods and practices transparent, and as rigorous as possible, to ensure the integrity of their results. Statements based on informal polls and quick surveys may not be very useful, but in areas such as medical research and clinical trials, standards are much tighter, and inferential statistics have provided vast amounts of valuable information. In other areas, they are used every day to make broad generalizations about populations that may shape public policy, product design, marketing, and political campaigns.