Assorted Links

Thanks to Peter Spero and VeganKitten.

Andrew Gelman’s Top Statistical Tip

Andrew Gelman writes:

If I had to come up with one statistical tip that would be most useful to you–that is, good advice that’s easy to apply and which you might not already know–it would be to use transformations. Log, square-root, etc.–yes, all that, but more! I’m talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something [more] continuous (those “total scores” that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don’t care if the threshold is “clinically relevant” or whatever–just don’t do it. If you gotta discretize, for Christ’s sake break the variable into 3 categories.

I agree (and wrote an article about it). Transforming data is so important that intro stats texts should have a whole chapter on it, but instead they barely mention it. A good discussion of transformation would also include use of principal components to boil down many variables into a much smaller number. (You should do this twice: once with your independent variables, once with your dependent variables.) Many researchers measure many things (e.g., a questionnaire with 50 questions, a blood test that measures 10 components) and then foolishly correlate all independent variables with all dependent variables. They end up testing dozens of likely-to-be-zero correlations for significance, thereby effectively throwing all their data away: when you do dozens of such tests, none can be trusted.
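Here is a rough sketch of the principal-components idea, assuming a hypothetical study with 200 subjects, 50 questionnaire items, and 10 blood measures. The data are simulated, not real, and the helper name `first_components` is made up for illustration. Each block of variables is boiled down separately, leaving one correlation to test instead of 500.

```python
import numpy as np

def first_components(data, k):
    """Scores on the first k principal components (rows of data = subjects)."""
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the component directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Simulated (hypothetical) data: one underlying factor plus noise.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
questionnaire = latent[:, None] + rng.normal(size=(n, 50))  # 50 noisy items
blood = 0.5 * latent[:, None] + rng.normal(size=(n, 10))    # 10 noisy measures

x = first_components(questionnaire, 1)[:, 0]
y = first_components(blood, 1)[:, 0]

# One correlation to test instead of 50 * 10 = 500.
r = np.corrcoef(x, y)[0, 1]
print(f"correlation between first components: {r:.2f}")
```

The sign of a principal component is arbitrary, so only the size of `r` means anything here.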

My explanation why this isn’t taught differs from Andrew’s. I think it’s pure Veblen: professors dislike appearing useful and like showing off. Statistics professors, like engineering professors, do less useful research than you might expect, so they are less aware than you might expect of how useful transformations are. And because most transformations don’t involve esoteric math, writing about them doesn’t allow you to show off.

In my experience, not transforming your data is at least as bad as throwing half of it away, in the sense that your tests will be that much less sensitive.

Obesity and Your Commute

In the 1950s — before the invention of BMI (Body Mass Index) — Jean Mayer and others did a study of obesity at a factory in India. They divided workers by how much exertion their job required. Almost everyone, even desk clerks, was thin, with the exception of the most sedentary. It appeared that walking one hour per day (to and from work) was enough to get almost all the weight loss possible with exercise. Doing more had greatly diminished returns. A study with rats suggested the same thing. Bottom line: If you’re sedentary, you can easily lose weight via exercise, which can be as simple as walking to work. If not, it’s hard.

This month GOOD has a kind of update of that ancient study: a scatterplot, each point a different country, that shows percentage of obesity and fraction of commutes that are active (bike or walk). It supports what Mayer and others found, that how you get to work makes a difference. If you fitted a line to the data it would have a negative slope (more obesity, fewer active commutes). America has the most obesity and relatively few active commutes; Switzerland has the most active commutes and relatively little obesity. The graph also suggests that other factors matter a lot. Although Australia has fewer active commutes than America, it also has less obesity.

John Tukey and GPS

In this amusing article Emily Yoffe tells about her troubles with GPS. She fails, unfortunately, to look on the bright side — to say how flawed GPS is better than no GPS. After a talk by John Tukey, the statistician, at Berkeley, I told him that I had found the tools he wrote about in Exploratory Data Analysis to be really helpful. (For example, smoothing my data led me to discover that eating breakfast made me wake up too early.) Tukey replied that if the tools are helpful half the time, that’s good. It isn’t easy to make an interesting response to a compliment!

Something is better than nothing.

Self-Tracking: What I’ve Learned

I want to measure, day by day, how well my brain is working. After I saw big fast effects of flaxseed oil, I realized that how well my brain works (a) depends on what I eat and (b) can change quickly. Maybe other things besides dietary omega-3 matter. Maybe large amounts of omega-6 make my brain work worse, for example. Another reason for this project is that I’m interested in how to generate ideas, a neglected part of scientific methodology. Maybe this sort of long-term monitoring can generate new ideas about what affects our brains.

So I needed a brain task that I’ll do daily. When I set out to devise a good task, here’s what I already knew:

1. Many numbers, not one. A task that provides many numbers per test (e.g., many latencies) is better than a task that provides only one number (e.g., percent correct). Gathering many numbers per test allows me to look at their distribution and choose an efficient method of combining (i.e., averaging) them into one number. (E.g., harmonic mean, geometric mean, trimmed mean.) Gathering many numbers also allows me to calculate a standard error, which helps identify unusual scores.

2. Graded, not binary. Graded measures (e.g., latencies) are better than binary ones (e.g., right/wrong).
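To make the first point concrete, here is a minimal sketch using Python's standard `statistics` module. The latencies are made-up numbers for one hypothetical test session, and `trimmed_mean` is an illustrative helper, not a library function.

```python
import math
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    return statistics.mean(ordered[cut:len(ordered) - cut])

# Hypothetical latencies (ms) from one session, including one slow outlier.
latencies = [412, 398, 455, 430, 1210, 402, 441, 389, 420, 436]

summaries = {
    "arithmetic mean": statistics.mean(latencies),
    "geometric mean": statistics.geometric_mean(latencies),
    "harmonic mean": statistics.harmonic_mean(latencies),
    "trimmed mean (10%)": trimmed_mean(latencies, 0.1),
}
for name, value in summaries.items():
    print(f"{name}: {value:.1f}")

# Standard error of the arithmetic mean, for flagging unusual sessions.
std_error = statistics.stdev(latencies) / math.sqrt(len(latencies))
print(f"standard error: {std_error:.1f}")
```

With a single percent-correct score per session, none of these comparisons would be possible. With many latencies, the outlier's pull on each summary can be seen directly.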

Every experimental psychologist knows this. What none of them know is how to make the task fun. If I’m going to do something every day, it matters a great deal whether I enjoy it or not. It might be the difference between possible and impossible. People enjoy video games, which is a kind of existence proof. Video games have dozens of elements; which matter? Here’s what I figured out by trial and error:

3. Hand-eye coordination. Making difficult movements that involve hand-eye coordination is fun. My bilboquet taught me this. Presumably this tendency originated during the tool-making hobbyist stage of human evolution; it caused people to become better and better at making tools. Ordinary typing involves skilled movement but not hand-eye coordination. This idea has worked. It led me to try one-finger typing (where I look at the keyboard while I type) instead of regular typing. And, indeed, I enjoy the one-finger typing task, whereas I didn’t enjoy the ordinary typing tasks I’ve tried.

4. Detailed problem-by-problem feedback. Right/wrong is the crudest form of feedback; it doesn’t do much. What I find much more motivating is more graded feedback based on performance on the same problem.

5. Less than 5 minutes. The longer the task the more data, sure, but also the more reluctant I am to do it. Three minutes seems close to ideal: long enough for the task to be a pleasant break but not so long that it seems like a burden.

Experimental psychology is a hundred years old. Small daily tests are an unexplored ecology that might have practical benefits.

Unfortunate Obituaries: The Case of David Freedman

One of my colleagues at Berkeley didn’t return library books. He kept them in his office, as if he owned them. He didn’t pay bills, either: He stuck them in his desk drawer. He was smart and interesting but after he failed to show up at a lunch date — no explanation, no apology — I stopped having lunch with him. He died several years ago. At his memorial service, at the Berkeley Faculty Club, one of the speakers mentioned his non-return of library books and non-payment of bills as if they were amusing eccentricities! I’m sure they were signs of a bigger problem. He did no research, no scholarly work of any sort. When talking about science with him — a Berkeley professor in a science department — it was like talking to a non-scientist.

David Freedman, a Berkeley statistics professor who died recently, was more influential. He is best known for a popular introductory textbook. The work of his I found most interesting was his comments on census adjustment: He was against adjusting the census to remove bias caused by undercount. This was only slightly less ridiculous than not returning library books, and far more harmful, because his arguments were used by Republicans to block census adjustment. The undercounted tended to vote Democrat. The similarity with my delinquent colleague is the very first line in Freedman’s obituary: He “fought for three decades to keep the United States census on a firm statistical foundation.” Please. A Berkeley statistics professor, I have no idea who, must have written or approved that statement!

The obituary elaborates on this supposed contribution:

“The census turns out to be remarkably good, despite the generally bad press reviews,” Freedman and Wachter wrote in a 2001 paper published in the journal Society. “Statistical adjustment is unlikely to improve the accuracy, because adjustment can easily put in more error than it takes out.”

There are two kinds of error: variance and bias. The adjustment would surely increase variance and almost surely decrease bias. The quoted comments ignore this. They are a modern Let Them Eat Cake.
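The trade-off can be made concrete with the standard decomposition of mean squared error into squared bias plus variance. The numbers below are purely illustrative assumptions in arbitrary units, not estimates for any actual census. The point is only that judging an adjustment means weighing both terms.

```python
def mean_squared_error(bias, variance):
    """Mean squared error = bias squared + variance."""
    return bias ** 2 + variance

# Hypothetical values, chosen only for illustration.
unadjusted = mean_squared_error(bias=3.0, variance=1.0)  # large bias, small variance
adjusted = mean_squared_error(bias=1.0, variance=4.0)    # smaller bias, larger variance

print(f"unadjusted MSE: {unadjusted}, adjusted MSE: {adjusted}")
```

In this made-up case the adjustment quadruples the variance yet still halves the total error, because the drop in squared bias is larger than the rise in variance.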

Few people hoard library books, but Freedman’s misbehavior is common. I blogged earlier about a blue-ribbon nutrition committee that ignored evidence that didn’t come from a double-blind trial. Late in his career, Freedman spent a great deal of time criticizing other people’s work. Maybe his critiques did some good but I thought they were obvious (the assumptions of the statistical method weren’t clearly satisfied — who knew?) and that it was lazy the way he would merely show that the criticized work (e.g., earthquake prediction) fell short of perfection and fail to show how it related to other work in its field — whether it was an improvement or not. As they say, he could see the cost of everything and the value of nothing. That he felt comfortable spending most of his time doing this, and his obituary would praise it (“the skeptical conscience of statistics”), says something highly unflattering about modern scientific culture.

For reasonable comments about census adjustment, see Ericksen, Eugene P., Kadane, Joseph B., and Tukey, John W. (1989). Adjusting the 1980 census of population and housing. JASA, 84, 927-943.

Is Your Milk Safe? A Statistical Fable

This recently happened in a class at the Beijing Language and Culture University:

TEACHER Your milk is safe if you buy it at a supermarket.

STUDENT What do you mean, “supermarket”? Where else could you buy it?

TEACHER That’s a good question, I don’t know the answer. They told us to say that.

When analyzing their data, a vast number of scientists more or less blindly do what a statistics book told them to do, just as this teacher said what she’d been told to say. Even worse, a vast number of statistics textbook writers simply copy other textbooks (not word for word, just the ideas and recommendations). The scientists and the textbook writers take refuge in false certainty. They fail to grasp that although the recommendations are black and white, the world is not, just as it isn’t black and white which milk is safe. Unlike in this particular classroom, no one questions this.

Thanks to Sally McGregor.

How to Spot Incompetence

Nassim Taleb says, “When someone says he’s busy, he means that he’s incompetent.” I think he also distrusts anyone wearing a tie. In college, I wrote an essay called “The Scientific _______” in which I argued that any writer who uses the term scientific without explaining what it means is incompetent and you should stop reading immediately.

I still believe that. Now, for the first time, I am going to update my list of incompetence giveaways: Plotting something on a raw scale that should be on a log scale. Size-versus-time data should usually have the size axis on a log scale.

This presentation by someone at Sequoia Capital, the Silicon Valley venture capital firm, is full of examples. The Dow Jones Industrial Average (from the 1960s to now) is on a raw scale (where the distance from 5 to 10 equals the distance from 10 to 15) but should be on a log scale (where the distance from 5 to 10 equals the distance from 10 to 20). Same for an index of housing prices. Same for the Nikkei. Many other examples. You can still believe the data, of course; just don’t trust what’s concluded from the data. Given the ubiquity of this practice (plotting on a raw scale what should be on a log scale), especially among financial supposed-experts, Taleb and I are not far apart.
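The difference between the two scales can be checked in a few lines. This is a minimal sketch; the function names are just illustrative.

```python
import math

def raw_distance(a, b):
    """Distance between two values on a raw (linear) axis."""
    return abs(b - a)

def log_distance(a, b):
    """Distance between two values on a log axis."""
    return abs(math.log(b) - math.log(a))

# Raw scale: equal differences are equal distances.
print(raw_distance(5, 10) == raw_distance(10, 15))

# Log scale: equal ratios (here, doubling) are equal distances.
print(math.isclose(log_distance(5, 10), log_distance(10, 20)))
```

On a log axis a doubling takes up the same space whether it happens early or late, which is why it suits size-versus-time data.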

More: Taleb makes a similar point in his online notebook, writing about a debate with Charles Murray:

Finally I showed a graph of the rise of the US stock market since 1900, on a regular (non-log) plot. Without logarithmic scaling we see a huge move in the period after 1982 – the bulk of the variation comes from that segment, which dwarfs the previous rises. It resembles Murray’s graph about the timeline of the quantitative contributions of civilization, which exhibits a marked jump in 1500. Geometric (i.e. multiplicative) growth overestimates the contribution of the ending portion of a graph.
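A small sketch of the effect Taleb describes, assuming a hypothetical series that grows at a constant 5% a year for 100 years. These are illustrative numbers, not stock-market data.

```python
import math

rate = 0.05   # hypothetical constant annual growth rate
years = 100
values = [(1 + rate) ** t for t in range(years + 1)]

def share_of_rise_in_last(series, n):
    """Fraction of the total first-to-last rise that occurs in the final n steps."""
    return (series[-1] - series[-1 - n]) / (series[-1] - series[0])

raw_share = share_of_rise_in_last(values, 20)
log_share = share_of_rise_in_last([math.log(v) for v in values], 20)

print(f"last 20 years, share of raw rise: {raw_share:.2f}")
print(f"last 20 years, share of log rise: {log_share:.2f}")
```

Although growth is identical every year, the last fifth of the period accounts for well over half of the rise on the raw scale but only one fifth of it on the log scale.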