Believing Numbers Are True When Numbers Are Lies, Damn Lies, Statistics: Protect Yourself from Falsehood, Distortion…

“What comes full of virtue from the statistician’s desk may find itself twisted, exaggerated, oversimplified, and distorted-through-selection by salesman, public relations expert, journalist, or advertising copywriter.” ~Darrell Huff

Statistics (numbers) can be very helpful in providing a powerful interpretation of reality, but they can also be used to distort true understanding. How often do we read in newspapers, or hear people say, ‘according to statistics…’ and ‘statistics show that…’? What is statistics? It is the branch of mathematics that deals with the collection, organization, analysis, and interpretation of numerical data.

Although statistics can be very valuable in interpreting reality, they can also be distorted to mislead people into believing information that is not true. Statistics can be seen as a paradox: their simplicity, directness and completeness are their strengths, but these qualities are also their weaknesses. The phrase ‘Lies, Damn Lies, and Statistics’ describes the persuasive power of numbers, particularly the use of statistics to bolster arguments.

The phrase was popularized by Mark Twain, who attributed it to the 19th-century British Prime Minister Benjamin Disraeli. However, there are earlier references, such as one by Eliza Gutch, who wrote in 1891: “Sir,–It has been wittily remarked that there are three kinds of falsehood: the first is a ‘fib’, the second is a downright ‘lie’, and the third and most aggravated is ‘statistics’…”

In the article “How to Explain Lies, Damn Lies, and Statistics”, Tim Berry writes: Don’t get me wrong: I like research… I just say don’t bet the store on it. Use it to educate your guesses, but only as long as you stay skeptical. Read it, consider it, but don’t believe it. Mark Twain said: “There are lies, damn lies, and statistics”. Blogger and business researcher Steve King, a sometimes-Twain-like research analyst, gives a great example in his post ‘Why Surveys Show Wide Differences in Small Business Social Media Use’.

Steve pulls up two surveys with starkly different results. The Wall Street Journal reported that 70% of small business owners think social media is important.  But a Citibank survey said only 36% of small businesses use social media and a mere 24% have found social media useful for finding leads or generating revenue. What’s up with that? Methodology and sampling techniques, Steve explains. The survey that was big on social media was taken from people it found online using Twitter, Facebook, and other social media.

The other one was a telephone survey. So, as they say, ‘no duh’. Most of the business people who use social media think it’s important. Most of the ones caught on the phone don’t. The point is that both surveys are valid in their specific context, both were done professionally, and both can help you understand what a defined group of people thought – or told survey takers they thought. But they contradict each other. So if you’re using research, use it well, explore the assumptions, look for the built-in slant, and take all of that into account. Humans make decisions. Statistics don’t.
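The sampling effect described above can be sketched in a few lines of Python. This is only an illustration: the population, the `Business` record, and every count below are invented for the example and are not taken from either survey. The sketch assumes, for simplicity, that the phone survey reaches every business while the online survey reaches only those already using social media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    uses_social_media: bool
    thinks_important: bool


def build_population():
    """Build 1,000 made-up small businesses (all counts are hypothetical)."""
    population = []
    # 400 social media users, 360 of whom say it is important
    population += [Business(True, i < 360) for i in range(400)]
    # 600 non-users, 120 of whom say it is important
    population += [Business(False, i < 120) for i in range(600)]
    return population


def share_important(sample):
    """Fraction of a sample that says social media is important."""
    return sum(b.thinks_important for b in sample) / len(sample)


population = build_population()

# Online survey: only reaches businesses that already use social media.
online_sample = [b for b in population if b.uses_social_media]
# Phone survey: idealised here as reaching the whole population.
phone_sample = population

print(f"online survey: {share_important(online_sample):.0%}")
print(f"phone survey:  {share_important(phone_sample):.0%}")
```

Both percentages are computed correctly from the same made-up population. They differ only because of who was asked.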

The article “Lying With Statistics” says: Statistics are islands of certainty in a sea of unknowns. ‘Islands of certainty’, that is, unless they are biased, which is often the case. Statistics are commonly used to support a biased position or an outright fabrication for two reasons. The first reason arises from the fact that few people understand statistics well enough to question them.

The second and more sinister reason is that lying with statistics requires no actual lying. If the most favorable data is highlighted and the most unfavorable data is suppressed, statistics can be manipulated to illustrate just about any point of view, allowing the manipulator’s hands to remain unsullied. One common way that statistics can be misleading is when they refer to so-called ‘averages’.

When you hear the word ‘average’, your first question should be ‘what kind?’, because an average can come in three flavors: a ‘mean’, a ‘median’ or a ‘mode’. Sometimes the mean, the median and the mode are so arithmetically close to one another that the difference does not really matter to a layman. This occurs when the data is characterized by what is known as a ‘normal distribution’. Human attributes, such as the height of men and women, are characterized by a normal distribution. If you read that the average height of American men is 5′ 10″, it makes little difference whether the average refers to the mean, median or mode because they will all be very close to one another.
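As a rough illustration of the three flavors, the sketch below uses Python's standard `statistics` module on two small, made-up data sets: one symmetric set of heights centered on 70 inches (5′ 10″) and one lopsided set of salaries with a single very large value. The numbers and the helper name `three_averages` are assumptions chosen for the example, not real measurements.

```python
from statistics import mean, median, mode

# Hypothetical heights in inches, symmetric around 70 (5'10").
heights = [67, 68, 69, 70, 70, 70, 71, 72, 73]

# Hypothetical salaries in thousands of dollars, with one large outlier.
salaries = [30, 30, 30, 35, 40, 45, 50, 60, 400]


def three_averages(values):
    """Return the mean, median and mode of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }


print("heights: ", three_averages(heights))
print("salaries:", three_averages(salaries))
```

For the symmetric heights all three averages come out at 70. For the lopsided salaries the mean is 80, the median is 40 and the mode is 30, so someone quoting ‘the average salary’ could honestly pick whichever figure suits the argument.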

However, not every set of data will be ‘normal’. Another method of statistical prevarication is to use deceptive visual graphics. Because many business concepts and ideas are technical and complex, a common and useful way of conveying such information is through the use of graphs, charts and pictures. Although illustrations can reflect the true facts under consideration, such visual information is easily massaged, edited or distorted to manipulate how the information will be interpreted.

In the article “Damned Lies and Statistics: Helping Numbers Make Sense”, Enrico Giovannini writes: Should you believe what you read? What would your reaction have been if you had read the following sentence in a 1995 newspaper article: ‘Every year since 1950, the number of American children gunned down has doubled’?

You probably would have been shocked, but would you have spent much time thinking about the reliability of such an impressive ‘statistic’? If you had believed it, as one often does when reading such headlines, you would have been making a big mistake. Suppose that in 1950 only one child in America was shot dead. Doubling this number 45 times, you would reach 35 trillion children gunned down in 1995.
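The arithmetic is easy to check. The short Python sketch below starts from the single hypothetical death assumed in the text and doubles it once for each year from 1950 up to 1995. The variable names are illustrative only.

```python
START_YEAR, END_YEAR = 1950, 1995

deaths = 1  # the text's assumption: one child shot dead in 1950
doublings = 0
for _ in range(START_YEAR, END_YEAR):  # one doubling per year, 45 in total
    deaths *= 2
    doublings += 1

print(f"{doublings} doublings -> {deaths:,} deaths in {END_YEAR}")
# 45 doublings -> 35,184,372,088,832 deaths in 1995
```

That is 2 to the power of 45, or roughly 35 trillion.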

Fortunately this is an implausible figure. This widely quoted example is recalled by Joel Best in his most interesting book, ‘More Damned Lies and Statistics’. The example is even more pertinent insofar as it is based on a syntactic mistake. In fact, the correct wording in the original source was ‘The number of American children killed each year by guns has doubled since 1950’ and it was the misquote that disseminated the ‘false statistic’. The example points to several other lessons, too.

For a start, data (i.e., a numeric value) without appropriate metadata (i.e., information about the meaning of the data) do not give any meaningful information. This means a ‘trained’ brain may indeed be necessary to assess the reliability of a statistical value. Unfortunately, this creates a general attitude of leaving to the media and other experts the role of selecting the ‘correct statistical information’, with the user/public left to judge the credibility of the source.

In other words, the layman’s capacity to evaluate the correctness of a statistic is quite limited, and as a result they (user/public) become even more convinced that all statistics, whatever their source, are beneath even the damnedest of lies.

Statistics do have a sort of magical appeal. They appear to the untrained eye to be based on complex math that is difficult to understand. This is rubbish; statistics are easy to create, whereas accurate statistics are much more difficult to calculate. Statistics are governed by a term used to describe computer problems, namely ‘GIGO’, or ‘garbage in, garbage out’: if the survey asked the wrong question, asked the wrong group of people, or is subject to any other major issue, there is no statistical analysis method in the world that can create meaningful information from the raw data.

There are some techniques that can correct small errors, but the more small errors corrected, the less accurate the results… Startling statistics shape our thinking about many issues (business, social, political…), and all too often these numbers are wrong. When it comes to statistics, knowing how to measure and how to interpret results is at least as important as knowing what to measure. Ultimately, people who rely on statistics without understanding them naively accept them. In the words of Andrew Lang, “some people using statistics are no better than a drunken man (who) uses lamp-posts…for support rather than illumination.”

An alternative is to be cynical and assume that all numbers are meaningless; but despite this temptation, we should be cautious and seek to understand the real story behind the numbers. In a New York Times article, Justin Wolfers, Wharton Professor, writes: ‘Today, consumers of information are drowning in data. Terabytes of data are being generated from constant measurement of businesses, workers, government and other activity, and there are many ways to draw inferences from the raw data… unfortunately, many of them lead in the wrong direction.’

“I abhor averages.  I like the individual case.  A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live”.  ~Louis D. Brandeis

“Statistics: The only science that enables different experts using the same figures to draw different conclusions”. ~Evan Esar