How to Avoid Misleading Your Audience with Statistics
I recently picked up a copy of Darrell Huff’s 1954 classic How to Lie with Statistics. Although Huff’s background was not in statistics, his experience as a writer exposed him to the use (and misuse) of statistics. This book is a small volume that packs quite a punch, pointing out numerous examples of how readers can be misled by shady statistics.
All the classics are here. Switching between different measures of central tendency (the mean, median, or mode) without specifying which is being used, or using one of these measures when you should be using another. Changing the scale of your charts to make a small percentage change look very dramatic, or vice versa. Bolstering an argument with “semi-attached figures” that are very precise but that are only tangentially related to the actual issue at hand.
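To see how much the choice of "average" can matter, here is a minimal Python sketch. The salary figures are invented purely for illustration and do not come from Huff's book; the point is only that one small dataset can produce three quite different "averages."

```python
from statistics import mean, median, mode

# Hypothetical annual salaries (in $1,000s), invented for illustration only.
# One very large value pulls the mean well above the typical salary.
salaries = [30, 30, 30, 35, 40, 45, 50, 60, 220]

averages = {
    "mean": mean(salaries),      # arithmetic average, sensitive to outliers
    "median": median(salaries),  # middle value when sorted
    "mode": mode(salaries),      # most frequent value
}

for name, value in averages.items():
    print(f"{name}: {value}")
```

Any of the three could honestly be reported as "the average salary," which is why it pays to say which one you mean.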
Some of Huff’s considerations do not even require a technical grasp of statistics. For instance, who conducted this research? Did they have a financial stake in the outcome? Were the results self-reported, and thus prone to exaggeration or subject to poor memories?
Huff’s intention is admirable. He wants to arm his readers with the tools necessary to be responsible and informed consumers of statistics. And he does that by pointing out the numerous methods that the unscrupulous use to lie with statistics. And they are numerous. After reading Huff, it would be only natural for a reader to just throw their hands up in exasperation and swear off the use or consumption of statistics with its tricksy numbers and vacuous conclusions.
However, we live in a world of big data, and that is not going away anytime soon. If anything, the amount of data that our society generates is projected to increase with each passing year. Statistics is the study of data: collecting it, aggregating it, analyzing it, and interpreting it. If we hope to make any sense of our big data world, we have to use statistics.
In Huff’s own words: “But arbitrarily rejecting statistical methods makes no sense either. That is like refusing to read because writers sometimes use words to hide facts and relationships rather than to reveal them.”[1]
I appreciate Huff’s passion for illuminating potential abuses of statistics. In fact, it got me thinking about how I could apply his wisdom to my own work. While I do not wish to outright lie with statistics (I prefer my insights to be honest), how can I avoid misleading my audience with statistics?
In some ways, you’ll be doing well by just doing the opposite of all the bad practices collected in How to Lie with Statistics. Always clearly label the axes on your charts. Be precise when describing a percent change versus a percentage point change. Always note the sample size. If a figure has substantial variation, show a range with the mean (or represent it visually with a boxplot).
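The percent versus percentage point distinction is easy to blur, so here is a small sketch of the arithmetic. The function names and the 4% and 5% rates are my own hypothetical choices, used only to illustrate the difference.

```python
def percentage_point_change(old_rate, new_rate):
    """Absolute difference between two rates that are already in percent."""
    return new_rate - old_rate


def percent_change(old_rate, new_rate):
    """Relative change of the new rate compared with the old rate, in percent."""
    return (new_rate - old_rate) / old_rate * 100


# Hypothetical example: a rate that moves from 4% to 5%.
old_rate, new_rate = 4.0, 5.0

print(f"Percentage point change: {percentage_point_change(old_rate, new_rate):.1f}")
print(f"Percent change: {percent_change(old_rate, new_rate):.1f}%")
```

The same move from 4% to 5% is a 1 percentage point change and a 25 percent change, so the label you choose decides whether the change sounds small or dramatic.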
In addition, I have noticed in my pricing career that it is of the utmost importance to communicate your statistics to your audience at the appropriate level of detail. Depending on your audience, this could involve going deeper into the realm of statistics or backing out of it a little bit. Keep in mind that unintentionally confusing your audience by talking over their heads can be just as detrimental to their understanding as intentionally fudging your numbers.
For instance, I recently had a meeting with a pricing working group in which I was trying to communicate the variation that was present in their dataset. Unsurprisingly, I chose to convey this variation by using the mean plus and minus one standard deviation, which every statistics student knows is a range that covers about 68% of a normally distributed dataset. Unfortunately, my audience did not consist of statistics students.
I thought that since I was going to use the standard deviation to communicate their pricing reality, it only made sense for me to take the time to explain what a standard deviation was and why anybody should care. I went too far into the weeds. My audience, who was not used to dealing with statistics, was confused about what I was trying to communicate. After several failed attempts at explaining the concept, I finally found the words to share my insight. It was obvious to me that I would need to leave explicit explanations of the standard deviation out of future conversations with this audience.
In hindsight, I could have gotten to the same result (i.e., describing the dataset variation) without taking a scenic tour through the wide world of statistics. It would have been much more effective for everybody if I had left out the textbook definitions and jumped straight to the conclusion: “About 68% of your transactions in FY2018 for this product were between $34 and $43.” That would have delivered the most important information that I wanted my audience to walk away with.
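One way to produce that kind of plain-language sentence is sketched below. The prices are hypothetical (not my client's data), and `describe_spread` is a helper name I made up for this example. Rather than assuming the data is normally distributed and quoting 68%, the sketch counts the share of transactions that actually fall inside the range.

```python
from statistics import mean, stdev

# Hypothetical transaction prices in dollars, invented for illustration only.
prices = [31.0, 34.5, 36.0, 37.5, 38.0, 38.5, 39.0, 40.0, 41.5, 43.0, 44.0, 47.0]


def describe_spread(values):
    """Return (low, high, share) for the mean plus and minus one standard deviation.

    share is the observed fraction of values inside [low, high], so it does
    not rely on the data being normally distributed.
    """
    center = mean(values)
    spread = stdev(values)  # sample standard deviation
    low, high = center - spread, center + spread
    share = sum(low <= v <= high for v in values) / len(values)
    return low, high, share


low, high, share = describe_spread(prices)
print(f"{share:.0%} of these transactions were between ${low:.0f} and ${high:.0f}.")
```

The audience never sees the words "standard deviation"; they only see a range and the share of transactions inside it.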
On the other hand, if my audience were made up exclusively of statisticians, then I could have simply listed the mean and standard deviation, and they would have filled in the additional details. Even so, it probably would have helped to include a summary table with the mean plus or minus one, two, and three standard deviations (which would cover 68.3%, 95.4%, and 99.7% of normally distributed transactions, respectively).
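A summary table like that takes only a few lines to generate. In this sketch, the mean of $38.50 and standard deviation of $4.50 are assumed values, chosen only because they reproduce the $34 to $43 range quoted earlier. The coverage column assumes a normal distribution and uses the standard identity that the share within k standard deviations is erf(k / √2). The function names are my own.

```python
import math


def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def summary_rows(center, spread, ks=(1, 2, 3)):
    """Build (k, low, high, coverage) rows for mean plus and minus k SDs."""
    return [
        (k, center - k * spread, center + k * spread, normal_coverage(k))
        for k in ks
    ]


# Assumed values for illustration, not figures from a real dataset.
center, spread = 38.5, 4.5

print(f"{'Range':<14}{'Low':>8}{'High':>8}{'Coverage':>10}")
for k, low, high, coverage in summary_rows(center, spread):
    label = f"mean ± {k} SD"
    print(f"{label:<14}{low:>8.2f}{high:>8.2f}{coverage:>10.1%}")
```

If the data is noticeably skewed, the coverage column will not match reality, and counting the observed share in each range is the safer choice.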
And for those brave readers in the mood for a challenge: your communication task becomes harder still when your audience mixes very technical and very non-technical people. You will have to commit to clarity to pull that off. All I can advise is to try your best to create an environment in which everyone is comfortable asking questions. Additionally, regularly check in with your audience: “Before I move on, does anybody have any questions? Does that concept make sense?” And don’t for a second believe that silence means that everybody understands you. Ideally, you want to provide everybody in the room with a baseline understanding and enough information to actually engage in the discussion at hand.
So, if you wish to avoid misleading your audience with statistics, by all means start with Huff’s guidelines for not lying with statistics. But also take the time to tailor your message and communicate with your audience at the appropriate level of detail.