2016-03-23

Yosemite-Fucking Sam

What's this about?  

Statistics, of course!  Something of a YANSS post.

Lately, YANSS has been devoting a season's worth of podcasts to logical fallacies.  They're pretty good, and it's very worthwhile to listen to them, but they just don't have that "wow, holy crap" factor that the more self-delusion-focused episodes have for me.  That's probably because I'm generally much more familiar with formal logic, at least from a mathematical standpoint.

So as far as Yosemite-Fucking Sam goes, episode 71 covers the Texas Sharpshooter Fallacy, which is somewhat similar to confirmation bias, but is a little more nuanced (this episode doesn't appear on the website yet, but is available wherever Overcast syncs from).

And yes, I'm aware that Yosemite is in California!

Now, the reason I'm writing this isn't necessarily to call attention to this particular podcast or fallacy (but listen to it anyway), but to take issue with how one of the guest experts almost completely dismissed statistics, treating it as a cause of the fallacy rather than, when properly applied, the best solution to it.  I'm by no means an expert in statistics; I rarely need to calculate anything more meaningful than variance on the data sets that I work with regularly (parallel file system performance data, mainly).

None of this is to say that statistics is immune to the Texas Sharpshooter fallacy, but understanding a little bit about basic statistics can help one avoid it, and recognize when it is happening.

Inferential statistics (the kind used to model and draw conclusions about a population) generally begins with the null hypothesis, that is, a default position that no relationship exists between the phenomena being studied.  The null hypothesis is assumed to be true, and the analysis of the population data (sampled or not) generally attempts to reject or disprove it.
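For the curious, here's a minimal sketch of what that looks like in practice.  The scenario (two sets of file system throughput numbers) and all the values are made up for illustration, and it assumes NumPy/SciPy:

```python
# A minimal sketch of null-hypothesis testing (illustrative only).
# Hypothetical scenario: two sets of file system throughput measurements;
# the null hypothesis is "there is no difference between the two groups".
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
baseline = rng.normal(loc=500.0, scale=25.0, size=30)  # MB/s, made-up numbers
tuned = rng.normal(loc=520.0, scale=25.0, size=30)

# Welch's t-test: does the data let us reject "no difference in means"?
t_stat, p_value = stats.ttest_ind(baseline, tuned, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Small p-value -> evidence against the null; otherwise we fail to reject it.
alpha = 0.05
print("reject null" if p_value < alpha else "fail to reject null")
```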

The most obvious way that the TS fallacy can creep in is when an inappropriate null hypothesis is chosen, for example one that assumes a relationship already exists.  It can also crop up if the wrong statistical model is applied to a valid null hypothesis.
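To make that concrete, here's a toy simulation (my own illustration, not from the episode, again assuming NumPy/SciPy) of what drawing the target after shooting looks like numerically: test enough slices of random data and some of them will look "significant" purely by chance.

```python
# Toy simulation of the Texas Sharpshooter fallacy (illustrative only):
# run many tests on data where the null hypothesis is actually TRUE,
# then "discover" whichever comparisons happen to look significant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests = 100
hits = []
for i in range(n_tests):
    # Both groups come from the SAME distribution: no real effect exists.
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        hits.append((i, p))

# With alpha = 0.05, roughly 5 of 100 comparisons will "hit" by chance alone.
print(f"{len(hits)} of {n_tests} comparisons look 'significant' despite no real effect")
```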

What models, selection criteria, sampling methods, and confidence tests are available, and how they should be applied to various data sets and situations, is well beyond my specific realm of knowledge, and most people's as well.  So how do we trust that the statistics presented in support of an argument are valid and not suffering from the TS fallacy?  That's a difficult question.

One of the experts suggests questioning the motivations of the person/entity that proffers the statistics.  I suggest questioning the source as well. 

If CERN tells us that they've reached 5 sigma on the discovery of the Higgs boson (go CERN!), one can be fairly certain that they've done the appropriate tests to compute that significance level; 5 sigma corresponds to roughly a 1-in-3.5-million chance that the signal is just a statistical fluke.
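For context, converting a sigma level into a probability is just reading the tail of a normal distribution.  A quick sketch (assuming SciPy):

```python
# Convert a sigma level into a one-sided tail probability: the chance of a
# fluctuation at least this large if there were no real signal (the null).
from scipy import stats

for sigma in (3, 5):
    p = stats.norm.sf(sigma)  # survival function = 1 - CDF
    print(f"{sigma} sigma: p = {p:.2e}  (about 1 in {1 / p:,.0f})")
# 5 sigma works out to p ~ 2.9e-07, roughly 1 in 3.5 million.
```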

If a blatantly biased organization is pushing numbers, then we'd be better served by questioning the validity of the analysis that produced those results.

It isn't important that people understand every last detail of statistical analysis, but it is important to understand the basics, and the basics include knowing that various analysis methods exist.  Right now, to most people, statistics is a black box (to say nothing of the overall innumeracy of Americans), and that makes it particularly easy to lie with numbers.
