Sample Selection Bias and Inductive Reasoning
What is sample selection bias, and how is it related to inductive reasoning? I briefly discussed sample selection bias in the diversity principle article. The topic is closely related to my own research field, so I will come back to it many times. However, I would like this to be the main article on selection bias.
What is a “Nonprobability Sample”?
A nonprobability sample is a sample drawn from a population through some selection mechanism other than random sampling, so the chance of each member being included is not known in advance. For example, the whole world has been struggling with the COVID-19 pandemic since 2020, and there have been many discussions about the impact of COVID-19 on the human body, for example, its mortality rate. However, the mortality rate is difficult to estimate because there is a selection bias that must be controlled. Our records mostly contain people who show symptoms of the disease. People without symptoms are less likely to get tested because, for instance, they do not even realize they have the disease. Therefore, when you estimate mortality with this sample, you overestimate it, because the sample contains people who are less healthy than the population average, e.g., older people. Such a sample is called a nonprobability sample because of this selection bias.
Consider the following diagram. Blue dots are people with no symptoms, and red dots are people with symptoms of the disease. Our records, however, are more likely to contain red dots than blue ones. If you average mortality over this sample, the estimate will be biased as a result of the sample selection mechanism.
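To make the size of this bias concrete, here is a minimal sketch in Python. All numbers are hypothetical and chosen only for illustration (they are not real COVID-19 figures), and I assume that within each group, dying is unrelated to being recorded. The function names are my own.

```python
# Hypothetical numbers for illustration only; none come from real data.
# Each group: (population size, mortality rate, probability of being recorded)
GROUPS = {
    "no_symptoms": (8000, 0.00125, 0.10),  # blue dots: rarely tested
    "symptoms": (2000, 0.05, 0.80),        # red dots: usually tested
}


def true_mortality(groups):
    """Mortality rate over the whole population."""
    deaths = sum(n * m for n, m, _ in groups.values())
    total = sum(n for n, _, _ in groups.values())
    return deaths / total


def naive_sample_mortality(groups):
    """Expected mortality rate among recorded people only.

    Assumes that, within a group, being recorded does not depend on dying.
    """
    recorded_deaths = sum(n * p * m for n, m, p in groups.values())
    recorded_total = sum(n * p for n, _, p in groups.values())
    return recorded_deaths / recorded_total


if __name__ == "__main__":
    print(f"true mortality:  {true_mortality(GROUPS):.5f}")
    print(f"naive estimate:  {naive_sample_mortality(GROUPS):.5f}")
```

With these made-up numbers, the true mortality is 1.1%, while the recorded sample suggests about 3.4%, roughly three times too high, simply because red dots are over-represented.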
There are many methods for estimating the mortality rate accurately, which we will not discuss now. But it is clear that our sample must be diverse enough to support the estimate; for instance, it should include more people with no symptoms. Assume that our sample contains all kinds of people. How can we estimate the “true mortality rate”?
The obvious solution is to give different weights to the blue and red dots when calculating the average. If blue ones are less likely to be in our sample, we weight them more heavily, because we know that many more of them out there are not included in our sample. How the weights should be estimated, however, is a difficult problem that many researchers are working on.
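One common way to turn this weighting idea into a calculation is inverse probability weighting: each recorded person gets a weight equal to one over their chance of being recorded. The sketch below is only an illustration, not a full method. It assumes the inclusion probabilities are known exactly, which is precisely the hard part in practice, and it uses hypothetical records and names of my own.

```python
def weighted_mortality(records, inclusion_prob):
    """Inverse-probability-weighted mean of a 0/1 death indicator.

    records: iterable of (group, died) pairs, with died equal to 0 or 1.
    inclusion_prob: dict mapping group -> probability of being recorded.
                    Assumed known here; in practice it must be estimated.
    """
    weighted_deaths = 0.0
    weighted_total = 0.0
    for group, died in records:
        weight = 1.0 / inclusion_prob[group]  # rarer in the sample, heavier weight
        weighted_deaths += weight * died
        weighted_total += weight
    return weighted_deaths / weighted_total


# Hypothetical recorded sample: 800 people without symptoms (1 death)
# and 1600 people with symptoms (80 deaths).
records = (
    [("no_symptoms", 1)] * 1
    + [("no_symptoms", 0)] * 799
    + [("symptoms", 1)] * 80
    + [("symptoms", 0)] * 1520
)
inclusion_prob = {"no_symptoms": 0.10, "symptoms": 0.80}  # assumed, not estimated

naive = sum(died for _, died in records) / len(records)
weighted = weighted_mortality(records, inclusion_prob)
```

In this made-up sample, the unweighted average is about 3.4%, while the weighted average is 1.1%, because each blue dot now stands in for the ten blue dots that a 10% recording chance implies.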
So far, I have explained a statistical problem closely related to inductive reasoning. But it is also strongly reflected in our everyday reasoning, and it is one of the most common reasons why our reasoning fails. We are surrounded by people like ourselves. For instance, when we think about the problems of this world, we tend to think first about our own problems, our family’s, and our country’s. We give less thought to the problems of people in other parts of the world. Why would we do otherwise? It is not useful for our survival, although it could make the world a better place. (I discussed the “merits of bias” in this article, if you are interested.)
Sample Selection Bias Used as Propaganda
We might explain this kind of sample selection bias by its merits. But the truth is that it is not useful in a globalized world. It limits your imagination, causes social conflicts, and impairs your reasoning by making you a lazy thinker.
Let’s discuss one example of how human reasoning is deliberately abused through sample selection bias: propaganda. (Propaganda is a broad term; sample selection bias is one of the tools it includes.) People and media present examples that support their ideology. We can describe this using the term “confirmation bias.” If the media keep showing bad news that shares a common pattern, say X, you will start to think, “X is bad,” and vice versa. Today we know that this is very easy to do; look up “Joseph Goebbels,” for example. Propaganda has grown stronger with newspapers, radio, television, the internet, and, today, social media.
How can one have a good understanding of the world when exposed to biased samples? That is difficult to answer, but one thing is certain: we need to be aware of this problem. It is overwhelming, but we cannot understand the world without filtering information. This makes it remarkably difficult to follow social media, read newspapers, or even listen to what a friend says. We truly live in the post-truth era.