You’re interested in whether having a Star on its belly helps a Sneetch to perfectly roast a frankfurter:
Frankfurter-roast quality is a normally distributed random variable across the whole population of 10,000 Sneetches:
You’ve read in all the leading Journals of Frankfurterology that a Star on one’s belly is a great aid to frankfurter-roasting, so you devise a clever experiment. You will randomly assign the Sneetches to one of two conditions:
Half get Stars on their Bellies, half do not.
You’ll then compare the frankfurter quality of the two groups: the difference in average frankfurter quality between them is an estimate of the true causal effect of belly stars on frankfurter quality.
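For the skeptics in the audience, a single run of this experiment might be sketched as follows. This is only an illustration under stated assumptions: roast quality is taken to be standard normal, stars are given no true effect at all, and the name `run_experiment` is made up for this example.

```python
import random
import statistics

rng = random.Random(1)

# Assumption: roast quality ~ Normal(0, 1) across all 10,000 Sneetches,
# and a belly star has NO true effect on it.
population = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

def run_experiment(population, n, rng):
    """Sample n Sneetches, star a random half, return the estimated effect."""
    sample = rng.sample(population, n)      # returned in random order
    half = n // 2
    starred, plain = sample[:half], sample[half:]
    # The "treatment" changes nothing, so any difference is pure noise.
    return statistics.fmean(starred) - statistics.fmean(plain)

estimate = run_experiment(population, 50, rng)
print(f"estimated star effect with 50 Sneetches: {estimate:+.3f}")
```

Because assignment is random, the estimate is centered on the true effect (here, zero), but any single run will miss it by some amount.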
As a good practitioner of Frankfurterology, you know that stars are key to frankfurter quality. Therefore, when the estimated treatment effect is zero, you try, try again, with different groups of Sneetches and various Advanced Sneetchesian Machine Learning Data Analysis techniques, until you get a statistically significant effect.
First you do the experiment 1000 times with 50 Sneetches.
Then you do the experiment 1000 times with 150 Sneetches.
Then 1000 times with 250.
Then with 350, and finally with 450.
In each run of 1000 experiments, about 50 show statistically significant effects by random chance. This number (which some doubting Thomases might call false positives) does not depend on sample size, as long as you keep the statistical significance filter the same.
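A rough simulation of the claim about false positives, for anyone who wants to check it at home. The assumptions here are mine, not the Journals': stars have zero true effect, the filter is a two-sided z-test at the 0.05 level, and the population standard deviation is treated as known. `is_significant` is a hypothetical helper.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(42)
ALPHA = 0.05                                   # assumed significance level
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-sided critical value

# Assumption: normal roast quality, no true star effect.
population = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
SIGMA = statistics.stdev(population)           # treated as known

def is_significant(n):
    """Run one null experiment on n Sneetches and apply the z-test filter."""
    sample = rng.sample(population, n)
    half = n // 2
    starred, plain = sample[:half], sample[half:]
    diff = statistics.fmean(starred) - statistics.fmean(plain)
    se = SIGMA * (1 / len(starred) + 1 / len(plain)) ** 0.5
    return abs(diff / se) > Z_CRIT

false_positives = {}
for n in (50, 150, 250, 350, 450):
    false_positives[n] = sum(is_significant(n) for _ in range(1000))
    print(f"n={n:3d}: {false_positives[n]} significant out of 1000")
```

The counts wobble from run to run, but they should hover around 50 for every sample size, since the filter's false positive rate is set by the threshold rather than by the number of Sneetches.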
However, here’s what the magnitude of the average statistically significant effect looks like, depending on the number of Sneetches in each sample:
Our experiments with 50 Sneetches produced Important, Substantively Meaningful Effects. Our experiments with 450 Sneetches were much less impressive.
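Why the shrinkage? Under the same assumptions as a textbook z-test (zero true effect, known standard deviation, equal group sizes, a 0.05 two-sided threshold, all of which are assumptions of this sketch rather than details of the experiments above), the expected size of an estimate that survives the filter can be worked out directly, with no simulation needed:

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05   # assumed significance level
SIGMA = 1.0    # assumed SD of roast quality, so effects are in SD units

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)

# Mean of |Z| for a standard normal Z, given that |Z| exceeds z_crit.
mean_abs_z = std_normal.pdf(z_crit) / (1 - std_normal.cdf(z_crit))

expected_magnitude = {}
for n in (50, 150, 250, 350, 450):
    half = n // 2
    # Standard error of the difference between the two group means.
    se = SIGMA * sqrt(1 / half + 1 / (n - half))
    expected_magnitude[n] = mean_abs_z * se
    print(f"n={n:3d}: smallest significant effect {z_crit * se:.2f} SD, "
          f"average significant effect {expected_magnitude[n]:.2f} SD")
```

An estimate only passes the filter if it exceeds the critical value times the standard error, and the standard error shrinks like one over the square root of the sample size. With nine times as many Sneetches (450 versus 50), the average "significant" effect is therefore one third the size, even though the true effect is zero in both cases.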
This is why proper practitioners of Frankfurterology know that small samples are often more convenient than large ones.