A picture is worth a thousand words

Everything should be made as simple as possible, but no simpler.

C’est la Bérézina – French phrase meaning ‘it’s a complete disaster’

We’ve all heard it: ‘A picture is worth a thousand words’. What a preposterous lie! Let’s analyze this phrase.

‘Picture’ can be any visual image; for this blog, I’m thinking of a statistical graph. What can I say? I’m a statistician.

‘Worth’: well, beauty is in the eye of the beholder. This is no doubt true. With regard to statistical graphs, what is worth? I can’t judge ‘worth’ by my own eye, nor by my client’s. Worth should ultimately be judged by my client’s target audience, be it the Agency or a practicing MD scanning a periodical. What is worth from my perspective? As a statistician, truth comes foremost to my mind, followed closely by elegance.

Truth? We all want to focus the reader’s attention on the treatment effect. Let’s say we see a final active treatment mean of 72 and a placebo treatment mean of 70, on a scale which goes from 1 to 100. It would be misleading to present a graphic which shows the results with an axis which goes from 71 to 73, on this 100 point scale. There are various ways to mislead. One can shorten the height of the graphic, minimizing differences, or exaggerate them by cutting off part of the scale. Any way around it? Yes, we can plot the difference between the active and placebo (e.g., +2.0), embellished by its 95% confidence interval on the mean. I love plotting that with a line indicating a zero difference (the null hypothesis). Even better would be the mean changes from baseline (‘improvement’) for active, placebo, and the difference in ‘improvement’, each with its 95% CI of the mean. Sometimes it helps to have two axes (on both the left and right sides), if the magnitudes of the scales differ. For example, one can put the difference on the right axis and scale the individual means by the left axis.
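As a rough sketch of the difference-with-confidence-interval idea above, here is how the numbers behind such a plot might be computed in Python. The scores are invented for illustration (chosen only so that the means come out to 72 and 70), the function name `mean_difference_ci` is hypothetical, and the interval uses a large-sample normal approximation with unpooled variances. A real analysis plan might well specify a t-based or model-based interval instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_ci(active, placebo, level=0.95):
    """Return (difference, lower, upper) for mean(active) - mean(placebo).

    Assumption: large-sample normal approximation with unpooled variances.
    A t-based interval would be somewhat wider for samples this small.
    """
    diff = mean(active) - mean(placebo)
    # Standard error of the difference between two independent means
    se = sqrt(stdev(active) ** 2 / len(active) + stdev(placebo) ** 2 / len(placebo))
    # Two-sided critical value (about 1.96 when level is 0.95)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical final scores, invented for illustration only
active = [74, 69, 77, 71, 70, 73, 68, 75, 72, 71]
placebo = [71, 66, 74, 70, 69, 72, 67, 73, 68, 70]

diff, lower, upper = mean_difference_ci(active, placebo)
print(f"difference = {diff:+.1f}, 95% CI of the mean difference = ({lower:.2f}, {upper:.2f})")
```

Plotting `diff` with its interval against a horizontal line at zero shows at a glance whether the interval includes the null hypothesis, with no need to choose where to cut the 1 to 100 scale.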

As a side note, you might wonder why I stated ‘on the mean’ twice in the last paragraph. In my first job, my boss’s boss (Lou Gura) told me that graphs should be entirely self-contained. The reader shouldn’t have to read the text of the paper to figure out what is presented. There are many kinds of confidence intervals, such as the 95% CI of the raw data. One should, at a glance, be able to deduce what each tick on a graph is. One expert suggested including a text box summarizing the conclusion the reader should reach.

Elegance? As a statistician, I favor simplicity. “Everything should be made as simple as possible, but no simpler.” “There are some easy figures the simplest must understand, and the astutest cannot wriggle out of” (for the full quote, see 6. ‘Lies, Damned Lies, and Statistics’ part 1, and Analysis Plans, an essential tool).

Combining truth and elegance, I want to present a graphic which conveys the information clearly and completely. More on this in Graphics II.

‘Thousand words’ is quantifiable. I pulled up a work of fiction and timed how long it took me to leisurely read 1,000 words. It took almost 5 minutes to read those two pages. Obviously, skimming would be faster and reading technical works takes longer. How long do you look at any picture? When I go to a museum, I seldom spend 5 minutes on any picture. On most pictures I spend less than 10 seconds (< 33 words?). When was the last time you read a medical journal and spent 5 minutes on a statistical graph? Nevertheless, our objective with a statistical graph is not to get the reader to linger, but to have the reader understand the graph immediately, especially what the graph’s originator is trying to say.
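The ‘< 33 words’ figure is just the reading rate applied to a 10-second glance. A minimal sketch of that arithmetic, assuming the ‘almost 5 minutes’ is rounded to exactly 5 (the function name `words_read` is hypothetical):

```python
WORDS = 1000          # length of the passage that was timed
READING_MINUTES = 5   # assumption: "almost 5 minutes" rounded to 5


def words_read(seconds, words=WORDS, minutes=READING_MINUTES):
    """Words covered in `seconds` at the leisurely reading rate."""
    rate_per_second = words / (minutes * 60)
    return rate_per_second * seconds


print(round(words_read(10)))  # prints 33
```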

Let me give you a counter-example, clearly worth many thousands of words. I took a graphics course by Edward Tufte. Along with three of his books, he gave each attendee the following graph by Charles Joseph Minard. It presents Napoleon’s March to Moscow – The War of 1812. Tufte claimed, and I agree, that it “may well be the best statistical graphic ever drawn”. I’ve spent many hours staring at this graphic. It hangs on my office wall above my monitor, for inspiration.

The graph presents a map from the Polish-Russian border to Moscow; it presents the size of the army going to (gold) and returning from (black) Moscow, including various troop diversions; and the temperatures experienced by the returning army at various dates. The Russians successfully used a scorched-earth policy to devastate the invading army. It is the most expressive anti-war picture I’ve ever seen. One can’t fail to see the astonishing loss of Napoleon’s troops. One can’t fail to see that 422,000 soldiers entered Russia, with the band narrowing by one millimeter for every 10,000 men lost. Only 10,000 returned (6,000 of whom rejoined from the north). The army lost about 98% of its soldiers in the invasion. One can also see the staggering loss at the Berezina River (“*C’est la Bérézina*”) and from cold spells. Two years later Napoleon fell.

With all the audiences a graph serves, and with all its various elements, let me quip – A picture is worth a thousand dollars. By this I mean that, in contrast to computing a set of means or tests, a statistical graph takes many, many iterations (time) to get just right. The elements of a graph include the title and subtitle, font and letter sizes, style of graph, the left (and right) and bottom axes, legends, embedded notes, colors, type of ticks, etc. All require discussion and many, many iterations. Multiply this by each dependent variable and costs rise. In my early days, I never gave my clients graphs – too much arguing about cost. Then I realized that most of my clients were visualizers, whose primary way of assimilating information is neither conceptual nor verbal, but visual.

**A great graph can, on its own, win an Agency’s approval.** We should replace the summary of a submission with a simple graph; the rest of the submission would be ‘filler’ and ‘boiler-plate’.

More about how to achieve this in Graphics II.