- 16/05/2024
- Posted by: Charlie King
- Category: Blog
As a young statistician it was important for me to understand when a project needed a probability design and when it was less important. There are several ways to draw the distinction, but possibly the most useful was whether the research was required for strategic planning or for tactical marketing purposes. The former would normally require a high level of accuracy, covering the market as a whole, whereas the latter would often call for more targeted research, with the objective often achieved through a small-sample qualitative study – focus groups or one-on-ones.
So whenever the research was intended to cover ‘the market for x’ the need was to identify the users of product x and a means of selecting a probability sample of those users for the research. This enabled estimation of the size of the market, in terms of numbers of users and their purchase behaviour, using a weighting procedure based upon inverse probability methods. Often there was no means of identifying a suitable list of users, and the only accurate way of proceeding was to take a random sample of the total population and screen it for incidence of usage of the product, taking careful note of any variation in response by characteristics such as age, gender and location. A random selection of households for face-to-face contact, or of telephone numbers for a telephone project, ensured that we started with a sample selected with equal probability; it then only required calculation of the differential response rates for the achieved interviews to obtain the different probabilities needed for accurate estimation of the market characteristics. It was only the different response rates that had distorted the original equal-probability random sample selected for the research.
Note that both face-to-face and telephone research use interviewers to interpret the questions in a standard manner and the methods also ensure that the respondent is selected from a random sample. This is in stark contrast to today’s online methods, where the respondent interprets the question for themselves and, for the majority of cases, the sample is self-selected.
Note also the words ‘inverse probability’: the principle underlying the estimation of any market or characteristic from a sample is to determine the probability of selection of each element in the sample and to use the inverse of that probability as its weight in the calculation of the size of the market. Since there are often variations in response across any achieved sample, there will normally be a number of different weights, and it often surprises me that many of today’s researchers, who apply weighting to make their data more representative, are unaware that they are adjusting the probabilities to take account of differential response rates on the assumption of an underlying random sample.
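The mechanics described above can be sketched in a few lines of Python. All the figures below (population size, issued sample, achieved interviews and users by age group) are invented purely for illustration; the point is only the arithmetic: each respondent's overall probability is the equal selection probability at the draw multiplied by the response rate for their group, and the weight applied is the inverse of that product.

```python
# A minimal sketch of inverse-probability estimation from a screened random
# sample. All numbers are hypothetical, not taken from any real survey.

POPULATION = 1_000_000          # total population the sample was drawn from
SAMPLE_DRAWN = 2_000            # equal-probability random sample issued

# Hypothetical fieldwork outcomes by age group:
issued = {"18-34": 700, "35-54": 650, "55+": 650}       # sample issued
achieved = {                    # group: (interviews achieved, users of x found)
    "18-34": (300, 90),
    "35-54": (450, 90),
    "55+":   (550, 55),
}

base_prob = SAMPLE_DRAWN / POPULATION   # equal selection probability at the draw

users_estimate = 0.0
for group, (n_achieved, n_users) in achieved.items():
    response_rate = n_achieved / issued[group]
    # Overall probability = selection probability * response rate,
    # so each respondent in this group carries the inverse as a weight.
    weight = 1.0 / (base_prob * response_rate)
    users_estimate += n_users * weight

print(f"Estimated users of product x: {users_estimate:,.0f}")
```

Note how the three groups end up with different weights purely because their response rates differ; this is exactly the adjustment that routine "weighting to be representative" performs, whether or not the researcher thinks of it in probability terms.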
Unfortunately, such weighting rarely accounts for the real differences in response rates, because it ignores many of the biases inherent in the method used for the data collection.
A classic example goes back to the 1992 general election, when the opinion polls completely failed to predict the Conservative win. In those days there was no online research, and telephone research was the principal method for opinion polling, using random telephone numbers for the sample and then quota controls to ‘ensure’ all important elements of the population were included in proportion to their share of it. Not only was there no online method available, there were also no mobile phones, so the samples were landline numbers only. Given the demand for quick completion of such surveys, the quotas were filled disproportionately by the ‘stay-at-homes’, whose behaviour is not the same as that of people who travel outside the home and who often hold different opinions. Filling quotas early with responses from people who stay at home will bias the results of many research projects, and this is particularly the case for projects based upon online research. Regardless of how representative the underlying panel may be, or how comprehensive a river-sampling process, quotas are always filled by the early responders, and they may have very different characteristics from those who have not yet responded, whatever quota group they belong to.
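The early-responder effect can be demonstrated with a toy simulation. The population shares and opinion rates below are entirely invented, and the simulation exaggerates by having every quick responder arrive before every slow one; the point is simply that cutting off fieldwork once a quota fills, when quick responders differ systematically from slow ones, biases the estimate in a way that completing the fieldwork would not.

```python
# Toy simulation (all numbers invented) of quota-cutoff bias when early
# responders differ from late responders.
import random

random.seed(1)

# Hypothetical population: 30% "stay-at-homes" who respond quickly, of whom
# 60% hold opinion A; the remaining 70% respond slowly and only 40% hold A.
TRUE_SHARE_A = 0.3 * 0.60 + 0.7 * 0.40   # = 0.46

def draw_respondents(n):
    """Simulate responses arriving over time: stay-at-homes come in first."""
    early = [1 if random.random() < 0.60 else 0 for _ in range(int(n * 0.3))]
    late = [1 if random.random() < 0.40 else 0 for _ in range(n - len(early))]
    return early + late                  # ordered by arrival time

arrivals = draw_respondents(10_000)
quota_cutoff = arrivals[:2_000]          # quota filled early, fieldwork stopped
full_fieldwork = arrivals                # everyone allowed to respond

print(f"true share holding opinion A: {TRUE_SHARE_A:.2f}")
print(f"quota-cutoff estimate:        {sum(quota_cutoff) / len(quota_cutoff):.2f}")
print(f"full-fieldwork estimate:      {sum(full_fieldwork) / len(full_fieldwork):.2f}")
```

The quota-cutoff sample consists entirely of quick responders and so lands near 0.60 rather than the true 0.46, while letting the fieldwork run recovers the true figure. No amount of weighting within the quota sample can fix this, because the slow responders were never interviewed at all.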
The failure to understand the implications of this can distort important results. A recent example is the GP Access Survey conducted by the Office for National Statistics, which indicated that more people experience long waiting times than the figures published by the NHS suggest. Unfortunately, this is not necessarily the case, because there are flaws in the methods used. This is partially recognised by the ONS themselves in the statement “they are based upon an online survey, which implies a more ‘digitally literate’ population.” However, the ONS considered that this had influenced only one aspect of the survey results, whereas it is likely to have caused more fundamental errors.
Another recent example was provided by The Critic Magazine, drawing attention to an online survey by the Gambling Commission that had overstated the incidence of problem gambling within the population, pointing out:
“Firstly, online surveys appeal to people who are very online — and that includes a lot of problem gamblers. Older people, who are less likely to be problem gamblers, are under-represented… Secondly, people who gamble a lot are attracted to surveys about gambling”.
In this case the Gambling Commission acknowledged that an “online methodology means that the sample responding to the survey are more likely to be engaged online, thus skewing the data”. Nevertheless, they headlined the incidence of problem gambling found in the survey.
It is almost impossible to extrapolate the results of any online survey to statements about the population as a whole, and therefore strategic research requires the expense of an interviewer and a genuine random sample, either face-to-face or by telephone. Yes, telephone research remains a viable method using a dual-frame RDD sample, provided all the response details are carefully recorded, a long time is allowed for the data collection (a minimum of three weeks) and quota controls are not used to cut off responses.
But what about marketing research? Well, if you are a company offering some new online gaming product, an online survey is your way forward. But if you are thinking of making some change to your betting shops, possibly a focus group or two in a room at the local pub might be your best approach.
As a famous actor once said – not many people know that.