Which Should You Trust: Scientific Literature or Anecdote?

In a comment on a BMJ paper critical of alternative medicine (the author submitted a fictional abstract to a conference, then criticized the program committee for not rejecting it), a retired chemist named Joe Magrath said:

The scientific literature tells us that acupuncture, cupping and reflexology are all nonsense.

I haven’t looked into it but I’ll take his word for it.

Around the time Magrath said that, James Fallows said this:

During our years in Malaysia in the 1980s, and more recently in China, my wife and I became unlikely converts to a lot of Asian medical practices. I had serious back pain cured by an acupuncturist (who used needles the size of aluminum baseball bats) in Kuala Lumpur. In her book, my wife describes how the gruesome-seeming therapy of fire-cupping, applied in an all-night massage parlor in the city of Yueyang, snapped her out of a serious bout of the flu. Sure, she had big red welts on her back for the next ten days, but her fever was gone!

Which do you believe?

  1. My understanding: the Chinese wouldn’t have wasted money on this stuff for thousands of years. The benefits of these treatments must outweigh the costs.

  2. After reading your blog for the past months, I have learned the only true science is the science that works for you. Research papers are only suggestions for you to try.

  3. I’m with Jake, especially when it comes to medical science. Because of the way most medical research is done, with an emphasis on group studies, individual differences and their consequences are not well understood. Doctors don’t seem equipped to take advantage of evidence that doesn’t fit an established pattern, especially if it’s weak evidence.

    That’s one of the reasons I think self-measurement and -experimentation seem to be so effective at improving health outcomes: medical science has a big hole in it, and if you fall into it, there’s a good chance it’s because of how you don’t fit established patterns. Studying your differences from the norm, then, is likely to be profitable.

  4. This stuff about ‘deciding for yourself’ is downright silly. In case you haven’t noticed, you (yes, YOU) suffer from innumerable cognitive biases and illusions. More importantly – as I point out in the post I linked to above – determining causality when n=1 (as it is in ‘your own case’) is basically impossible.

    ‘What works for me’ is usually ‘what I think works for me’ which often simply doesn’t.

  5. Michael, almost all experiments are n=1 in several ways (e.g., one classroom, one strain of rats), as John Tukey pointed out long ago. Yet science progresses. So there is a problem with your idea that “determining causality when n=1 is basically impossible”.

  6. @Michael,

    “determining causality when n=1 (as it is in ‘your own case’) is basically impossible”

    How do detectives determine causality? How do computer programmers? Cooks?

  7. Folks frequently complain that “it’s merely anecdotal evidence.” Actually all evidence is anecdotal. When you write a journal article, you are basically saying “I did this, and I observed that, and this is what I think it means.” Publication in an ‘official’ journal only says that an editor has decided to publish it, sometimes after consulting with one or more other people.

    If I hear a story from someone I have no reason to believe is lying to me, that can be as convincing as (frequently more so than) a published account by someone whose competence I must infer from the journal and institutional reputation.

    In fact, I am the most competent observer of my own experiences. Aren’t you of yours?

  8. @Seth: Well, no. Actually, yes but only in the short run. Science is a deeply social and collaborative enterprise that takes time. Scientist 1 investigates some topic by doing experiment A. Scientist 2 thinks this is dodgy, so does experiments B-C. Scientist 1 responds with experiments D-F, etc. etc. Over time, the same topic (if it’s important enough to warrant sustained attention) gets investigated from all sorts of different angles, disagreements are sorted out and (ideally) a consensus is reached. This is how we got consensus on evolution by natural selection, the efficacy of vaccines, general and special relativity, cold fusion, etc.

    This process takes time, is difficult and is almost invisible to non-specialists because it’s either conducted in recondite spaces (journals, etc.) or in private (conferences, bars, seminars, etc.). [Luckily, blogs are changing this somewhat. Now post-publication peer-review happens rapidly, in public, online].

    That is why science progresses. If scientists forever focused on only one line of evidence, we’d never make much progress.

  9. @Anthony: re detectives, programmers and cooks. Well… As far as I’m aware, cooks make many dishes, and the same dish repeatedly, so n=/1 for them. (Besides, they often have false beliefs.) Similarly, it’s not as if programmers can’t do the equivalent of experiments on their programs. If I’m coding program X, I can change a bunch of code experimentally, and see what happens. Functionally, n=/1 even when working on only one program. As for detectives, they (1) are notoriously bad at their jobs, but, anyway, (2) are not concerned with subtle effects, unlike medicine.

    But can’t a self-experimenter do the same? Like a programmer? Well, no. A programmer has exact control over his program. He can start with v. 0.0.123 of the program, make changes, see what happens; then go back to v.0.0.123 and make different changes. No person, however committed, has anything close to as much control as this.

    Similarly, medical effects are often subtle: that is, effect sizes are small. No one person could ever have discovered, say, that taking aspirin (slightly) reduces the risk of developing heart disease. Exactly because n=1.

  10. @Michael,

    Interesting comments and interesting piece you linked to above. An issue that has received short shrift in the scientific methodology world is that “n” frequently equals one on the object or task side of the ledger in behavioral science experiments. Experimental psychologists, for instance, never seem to tire of taking a large subject sample size, testing these subjects on one task, and then concluding that their results generalize beyond that single task. No such generalization is justified.

    I think it is possible to learn from single subject experiments if done well, for instance by taking one person and testing this one person across a variety of tasks. Egon Brunswik did an interesting perceptual experiment (1944, I think) of this “n=1” kind when he followed a single subject around Berkeley for several days and randomly asked the subject to estimate the size of objects in her environment. Did Brunswik’s results generalize beyond the single person? Yes, to the degree that other people shared the relevant perceptual framework — two eyes, good vision, not color-blind, an environment offering the same cues that the subject encountered in her environment, etc. Brunswik had a theory (constancy) that he was able to test using a subject “n” of one.

  11. @Michael,

    “when n=1 (as it is in ‘your own case’)”

    “cooks make many dishes, and the same dish repeatedly, so n=/1 for them”

    Perhaps I am not understanding what you mean by n = 1 here, then. If I’m doing an experiment involving myself, I can do the same thing many days in a row. If I get the same result every day, is n > 1 according to how you are using this concept?

    “Besides, they often have false beliefs”

    Everyone has mostly false beliefs. The question is how to improve on our beliefs.

  12. Bob, you write “no such generalization [to other tests] is justified.” It is justified by the history of psychology, which shows that the results with one task turn out to predict the results with other tasks.

