In research, weighting is used to adjust the results of a study to bring them more in line with what is known about a population. It is also a frequent topic of conversation.
There is often confusion around when to use it, how to use it, and for what purpose.
So what is it?
Weighting is an applied correction technique. Each response collected from a consumer is multiplied by a ratio in order to bring the total results more in line with what is known about a population.
For example, suppose only 20% of the consumers who responded to your survey were male, while census data tells us that males make up roughly 50% of the general population. The male responses need to be weighted up so that the results better reflect the overall characteristics of the population.
In general, responses that belong to an underrepresented category are assigned a weight greater than 1, and responses that belong to an overrepresented category are assigned a weight less than 1.
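To make the arithmetic concrete, here is a minimal sketch of how such a ratio could be computed: the share of a category in the population divided by its share in the sample. The function name and the two-category breakdown are assumptions made for illustration; in particular, the 80% female sample share and 50% female population share are simply filled in to complete the example above.

```python
def category_weights(sample_share, population_share):
    """Weight per category = population share / sample share."""
    return {
        category: population_share[category] / sample_share[category]
        for category in sample_share
    }

# Illustrative shares: 20% of respondents are male versus roughly 50% of the
# population. The female shares are assumed so the two categories sum to 100%.
sample_share = {"male": 0.20, "female": 0.80}
population_share = {"male": 0.50, "female": 0.50}

weights = category_weights(sample_share, population_share)

# After weighting, each category's share of the sample matches the population.
weighted_share = {c: sample_share[c] * weights[c] for c in sample_share}
```

Here the underrepresented group (males) gets a weight of 2.5 and the overrepresented group (females) gets 0.625, so each male response counts for two and a half people and each female response for a little under two thirds of one.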
But why weight?
Ideally, you collect your survey data from a representative sample from the start, and no weighting is needed (stay tuned for future information on best practices for sampling).
If you’ve done the work upfront and sampled your target population in a representative way that meets the goals of your research, then analysis is the very next step. No weighting, cleaning, re-coding, or wasted time.
Weighting techniques become important when there are discrepancies between the actual population you are trying to understand and the breakdown of the consumers you engaged. When this happens, conclusions drawn from the unadjusted responses may not be reliable, so we weight.
To apply weighting properly, consumer insights teams and market researchers need to rely on what are called “auxiliary variables”. These are variables that have been measured in the survey itself and whose distribution in the population is already known, like sex in the example above.
Typically, these are demographic variables such as age, sex, gender, and marital status, with population figures obtained from credible national statistical institutions such as the Census Bureau. They can help consumer insights and research teams make estimates about the larger population.
Populations can be anything from all voting-age citizens of the United States to craft beer enthusiasts in Portland, Oregon, or cat owners in Pennsylvania. Each is unique, and each has its own representative sample.
Now for the tough question. To weight or not to weight?
In published research, top-notch empirical scholars make conflicting choices about whether and how to weight, and often provide little or no rationale for their choices.
Additionally, in private discussions among experts, accomplished researchers often express confusion or give faulty reasons for their weighting choices. The main arguments for and against weighting are captured below:
- For: weighting lets researchers argue that their results more accurately represent the larger population, because over- and underrepresented segments have been adjusted for
- Against: it increases the standard errors of the statistical analysis, making the overall findings less precise and more variable
- Against: when you up-weight respondents, counting each as more than one person, their answers are exaggerated in proportion to the weight
- Against: all analysis is affected, including reported descriptive statistics (means, percentiles, medians, modes) and inferential statistics such as regression coefficients
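One common way to put a number on that loss of precision is Kish's approximation of the effective sample size, the squared sum of the weights divided by the sum of the squared weights. The sketch below is only an illustration of that formula; the 100 respondents and their weights are hypothetical, chosen to match a sample that is 20% male and 80% female weighted to a 50/50 population.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares

# Hypothetical sample of 100 respondents: 20 up-weighted to 2.5 each and
# 80 down-weighted to 0.625 each.
respondent_weights = [2.5] * 20 + [0.625] * 80

n_effective = kish_effective_sample_size(respondent_weights)

# Ratio of actual to effective sample size, often called the design effect
# due to weighting.
design_effect = len(respondent_weights) / n_effective
```

Under these assumptions the 100 weighted responses carry roughly the precision of 64 unweighted ones. The more uneven the weights, the smaller the effective sample becomes.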
Again, this is a correction technique. So by that definition, you are fundamentally altering the data you’ve collected from your consumers.
There is general consensus in the statistics literature that weights can be useful for descriptive statistics, stuff like mean, median, mode, and standard deviation.
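As a rough illustration of what a weighted descriptive statistic looks like, the sketch below compares an unweighted and a weighted mean. The scores are made up, and the weights of 2.5 and 0.625 are the ones a sample of 20% males and 80% females would need in order to match a 50/50 population; none of this comes from real survey data.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical satisfaction scores: 20 male respondents who each answered 8
# and 80 female respondents who each answered 6.
scores = [8] * 20 + [6] * 80
weights = [2.5] * 20 + [0.625] * 80

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
```

With these made-up numbers the unweighted mean is 6.4 and the weighted mean is 7.0, because the up-weighted male responses now pull the average toward their score.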
However, there is less consensus on whether weights should be routinely used in more advanced statistics such as regressions and significance testing.
Our view is that insights professionals and researchers should avoid relying on weights as much as possible. Spend a little more time upfront identifying and targeting a representative, balanced sample before you get into the muddy area of weighting. It will not only save you time on the back end, but generally make your life a little easier.