School of Communication, Information and Library Studies

SAMPLING THEORY

Methods of Inquiry Syllabus:514

Gustav W. Friedrich

HISTORY.
Sampling in social research has developed hand in hand with political polling.
1. President Alf Landon. The Literary Digest poll of 1936 predicted that Alf Landon would defeat Franklin D. Roosevelt. The sampling frame was telephone subscribers and automobile owners, and the return rate was 22%.
2. President Thomas E. Dewey. George Gallup used quota sampling in 1936 and correctly predicted FDR would beat Landon. In 1948, however, he picked Thomas Dewey over Harry Truman. Part of the problem was that his quotas were based on the outdated 1940 census, which led to over-representation of rural voters. He also stopped polling in early October, with many voters still undecided.

KEY TERMS.
Sampling:
the selection of some part of the whole in such a way that we can use the part to inform us about the whole. We sample when it is less useful, impractical, or impossible to deal with the whole of something.
Our first step, therefore, is to define the population in terms of Units of Analysis: People; Groups; Organizations; Social Artifacts. Then we decide between a sample and a census (the whole population). A summary characteristic of a sample is a statistic.
Population: the total (complete) collection of elements. A study population is that aggregation of elements from which the sample is actually collected. A summary characteristic of a population is a parameter.
Element: a single member of a population.
Sampling Unit. A sampling unit is that element or set of elements considered for selection in some stage of sampling. In a simple single-stage sample, the sampling units are the same as the elements. In a three-stage sampling approach (e.g., census blocks, households, adults), however, all three are sampling units while only the last is an element. The terms primary sampling unit, secondary sampling unit, and final sampling unit designate these successive stages.
Observation Unit. An element or aggregation of elements from which information is collected. It is typically the same as the unit of analysis, although we could interview heads of households (observation unit) about family income (unit of analysis).
Criterion Measures: those characteristics of elements which are of interest in our study and thus for which we want the sample to be representative of the population. They, and demographic variables such as gender, are used to argue the representativeness of the sample.
Sampling Frame: a list of all the individual sampling units (elements) in the population.
Gap: those elements of the population that are missing from the sampling frame (either because we missed them in the first place or they choose to drop out of our study).
In-tab Sample: the elements that get analyzed in the study.

KINDS OF SAMPLING.
1. Convenience: using readily available elements of the population (students in Comm 1113). Methods of reducing sampling error: (1) to reduce researcher/interviewer preferences (going for the quickest/easiest sample or for people like themselves), (a) define all available samples, (b) provide specific instructions, and (c) use quota sampling; (2) to reduce respondent self-selection, make the choice of participation as attractive as ethically possible.
2. Judgmental: elements are chosen deliberately rather than randomly
a. dimension sampling: starts deductively with exclusive and exhaustive categories and selects elements to fit those categories.
b. theory sampling: starts inductively by looking for elements that are different from previous cases, e.g., coding responses to open-ended questions; or developing categories for content analysis. When no "different" elements are found, the sample is considered complete.
Potential uses:
a. identification of "pure types" (alcoholics)
b. to ensure representation of diversity (family schooling families)
c. to eliminate sources of variation
3. Probability: randomness enters into the selection process. Goal: EPSEM (equal probability of selection method) samples.
a. simple random (SRS): every element has an equal chance of selection. Three requirements: (1) every member of the population must be available for selection; (2) the method of selection must not discriminate among the members; (3) once selected, the element must be entered into the sample without regard to the elements already in the sample or those left out of the sample. It is seldom used in pure fashion and is often replaced with systematic sampling: every kth element after a random start. Two terms: (1) sampling interval: the standard distance between selected elements (population size divided by sample size); and (2) sampling ratio: the proportion of elements in the population that are selected (sample size divided by population size). A potential danger is periodicity: a list arranged in a cyclical pattern that coincides with the sampling interval (e.g., military rosters arranged by rank; apartment numbers arranged by floor).
b. stratified random: random sampling within subgroups/strata of the elements. The stratum samples may be either proportionate to the population or equal to one another. Three advantages: (1) ensures that at least some members are in each stratum; (2) allows different collection methods (phone; in person) for each stratum; (3) reduces sampling error by restricting the range of scores within strata.
c. cluster: random selection of heterogeneous units of the elements (e.g., classrooms, schools, towns, etc.). It takes advantage of the fact that in some populations, elements assemble in easily identifiable units.
d. multistage: successive random selection of units starting with the largest unit (primary unit) and moving to the smallest (secondary units). It involves the repetition of two basic steps: listing and sampling. Probability Proportionate to Size (PPS) Sampling: each cluster is given a chance of selection proportionate to its size; within each selected cluster, a fixed number of elements (e.g., 5) is selected, so that every element ends up with the same overall chance of selection.
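The first two probability designs above, and systematic sampling, can be sketched in a few lines of Python. This is a minimal illustration, not a production procedure: the sampling frame of 1,000 numbered elements, the two strata names, the seed, and the function names are all assumptions made up for the example.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS without replacement: every element has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Every kth element after a random start within the first interval."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start among the first k positions
    return frame[start::k][:n]

def proportionate_stratified_sample(strata, ratio, rng):
    """Draw an SRS from each stratum using the same sampling ratio."""
    sample = []
    for elements in strata.values():
        n_h = max(1, round(len(elements) * ratio))  # at least one per stratum
        sample.extend(rng.sample(elements, n_h))
    return sample

rng = random.Random(514)         # fixed seed so the sketch is repeatable
frame = list(range(1, 1001))     # hypothetical frame of 1,000 element IDs

srs = simple_random_sample(frame, 100, rng)
sys_sample = systematic_sample(frame, 100, rng)

# Hypothetical strata: the first 800 IDs and the last 200 IDs.
strata = {"undergraduate": frame[:800], "graduate": frame[800:]}
strat = proportionate_stratified_sample(strata, 0.10, rng)

print("sampling interval:", len(frame) // 100)
print("sampling ratio:", 100 / len(frame))
print("stratified sample size:", len(strat))
```

Note that the systematic sample is only as good as the ordering of the frame: if the list cycles with a period equal to the interval, every selected element falls at the same point in the cycle.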

POTENTIAL ERRORS.
1. Sampling Error: estimate of the likelihood that the sample deviates from the population on the criterion measures.
With convenience and judgment samples, we can only live with sampling error. With probability samples, we can estimate its size using the central limit theorem (CLT), which states: as many successive samples are drawn from a population, the sampling distribution of an estimate of some population characteristic will form around the true population value; and the sampling distribution itself will approach a known distribution called the normal density distribution.
Theoretically, we have a sampling distribution: the distribution of a statistic (e.g., the mean) over the set of all possible samples of a given size. The mean of this distribution is the mean of the population, and its standard deviation (the standard error) is a measure of the sampling error. Thus, using CLT we can specify a confidence interval with a known confidence level: +/- 1 standard error = 68%; +/- 2 standard errors = 95%.
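In practice, we use the sample mean as an unbiased estimate of the population mean; the sample standard deviation as the best estimate of the population standard deviation; and the sample standard deviation divided by the square root of the sample size (the standard error of the mean) as the estimate of the standard deviation of the sampling distribution.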
To reduce sampling error: (1) we increase sample size (Because of the square root formula, the standard error is reduced by half if the sample size is quadrupled. Thus, if samples of 100 produce a standard error of 5%, the sample size must be 400 for 2.5%.); or (2) we partition variability through stratification.
2. Sampling bias: systematic exclusion of elements of the population from the sample because of the sampling procedure used.
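The square-root relationship and the 68%/95% rule can be checked with a short sketch. It assumes the quantity being estimated is a proportion with a true value of 0.5, which is the case that yields the 5% and 2.5% standard errors cited above; the function names, the seed, and the simulated population are illustrative assumptions, not part of the handout.

```python
import math
import random
import statistics

def standard_error(p, n):
    """Standard error of a sample proportion p based on n cases."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval(p_hat, n, z=2.0):
    """Interval of +/- z standard errors around the sample estimate."""
    se = standard_error(p_hat, n)
    return p_hat - z * se, p_hat + z * se

se_100 = standard_error(0.5, 100)
se_400 = standard_error(0.5, 400)
print(f"n=100: SE = {se_100:.3f}; n=400: SE = {se_400:.3f}")

low, high = confidence_interval(0.5, 400)
print(f"+/- 2 SE interval for n=400: {low:.2f} to {high:.2f}")

# Simulated sampling distribution: 2,000 samples of size 100 drawn from a
# hypothetical population in which half the elements have the trait.
rng = random.Random(1936)
true_p = 0.5
proportions = [
    sum(rng.random() < true_p for _ in range(100)) / 100
    for _ in range(2000)
]
mean_of_samples = statistics.mean(proportions)
sd_of_samples = statistics.pstdev(proportions)
within_2se = sum(abs(p - true_p) <= 2 * se_100 for p in proportions) / len(proportions)

print(f"mean of sample proportions: {mean_of_samples:.3f}")
print(f"SD of sample proportions:   {sd_of_samples:.3f}")
print(f"share within +/- 2 SE:      {within_2se:.2f}")
```

The simulated standard deviation should land close to the formula value for n=100, and roughly 95% of the simulated samples should fall within two standard errors of the true value.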

SAMPLE SIZE.
It is dependent on three properties of the study: (a) the complexity of the characteristics under study (the number of categories used to measure them), (b) the precision with which these characteristics must be estimated, and (c) the resources available. Characteristics which are very complex and must be estimated with high precision require large samples and considerable resources.
The logic of confidence levels and confidence intervals provides the basis for determining the appropriate sample size for a study. Once you have decided on the degree of sampling error you can tolerate, you will be able to calculate the number of cases needed in your sample. Thus, for example, if you want to be 95% confident that your study findings are accurate within plus or minus 5% of the population parameters, you should select a sample of at least 400.
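The figure of 400 can be reproduced by inverting the standard error formula for a proportion. The sketch below is one common way to do the calculation, under assumptions not stated in the handout: the worst-case proportion of 0.5, and a multiplier of 2 standard errors for 95% confidence (the more exact multiplier of 1.96 gives a slightly smaller answer). The function name is hypothetical.

```python
import math

def required_sample_size(margin, z=2.0, p=0.5):
    """Smallest n for which z standard errors of a proportion fit within margin."""
    n = z ** 2 * p * (1 - p) / margin ** 2
    # Round first to absorb floating-point noise, then round up to a whole case.
    return math.ceil(round(n, 6))

print(required_sample_size(0.05))           # z = 2
print(required_sample_size(0.05, z=1.96))   # z = 1.96
print(required_sample_size(0.03))           # tighter margin, z = 2
```

Because the margin enters the formula squared, tightening the tolerable error from 5% to 3% nearly triples the number of cases needed.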
A very rough rule of thumb for the novice researcher is that samples of fewer than 30 are generally considered inadequate except for pretesting; samples in the 100 to 200 range are rarely brought into question; and few research questions require samples of more than 500.

UNIT OF ANALYSIS:

                    Characteristics     Orientations             Actions

Individual          sex                 religious                voting
  all               age                 liberal                  investing
  students          marital status      superstitious            striking
  voters

Group               mean income         purposes                 praying together
  family            size                policies                 going on picnic
  street gang       location            regulations              fighting
  city

Organization        number of people    procedures               mergers
  churches          gross assets        orientation              fixing prices
  corporations      location            motives                  donating to causes
  colleges

Social Artifacts    price               political orientation    hostile strike
  books             location            goals                    marital success
  weddings          people involved     rules                    conviction/acquittal
  court cases
