I haven’t been interested in the work of John Ioannidis because it seems unrelated to discovery. Ioannidis says too many papers are “wrong”. I don’t know how the fraction of “wrong” papers is related to the rate of discovery. For example, what percentage of “wrong” papers produces the most discovery? Ioannidis doesn’t seem to think about this. Yet that is the goal of science — better understanding. Not “right” papers.
Almost all important health discoveries are discoveries of new cause-effect relationships. If you do X, Y happens. My view of the problem with modern health science is nothing like what Ioannidis and other critics (such as the “couldn’t replicate Finding X” critics) say. The problem is lack of progress on major health questions (e.g., what causes depression?), underscored every year by the awarding of the Nobel Prize in Medicine to research of little or no practical value. Almost every year, the Nobel Prize press office says the honored research will be useful in the future. The lack of progress shows no sign of ending.
The best that can be said about recent critics of science, such as Ioannidis and Danny Kahneman, my former colleague, is they see there’s a problem. The worst that can be said about them is they fail to understand the cause of the problem. This is why their proposed solutions could easily make the problem worse.
Whenever you do an experiment — psychology and the health sciences are almost all experimental — you “use up” the effect you are studying (X causes Y). You can do an experiment to learn if X causes Y only so many times. After that, you know the answer and a new experiment is pointless. Professional scientists are only able to test ideas (cause-effect statements) that are fairly plausible. With such ideas, a publishable outcome is likely enough to be worth the cost of testing. They are unable to test implausible ideas, because such experiments are not likely enough to produce a publishable outcome. With limited resources, they must generate a certain number of published papers per year, at least if they want a career.
To have a viable system, you need to generate new plausible ideas at least as fast as you use them up. Otherwise you will run out. You must design your experiments so that they accomplish this. Not necessarily every experiment, but your experiments in aggregate. It isn’t easy to find new plausible ideas. If you think “I’ll just get on with my career, generating papers as fast as possible, and leave it to someone else to come up with new ideas worth testing,” then your field will run downhill as plausible ideas are used up and not replaced. This is what has happened in several fields, including mine (animal learning). In psychology, much greater concern about fraud and about lack of replicability arose at about the same time. I believe both (more fraud, more lack of replicability) stem from the increasing difficulty of honest (or more honest) research.
A friend who is a psychology professor agreed with me that psychologists, or at least he himself, didn’t know how to generate new ideas worth testing. “Do you?” he asked. I said I did:
1. They [= psychologists] should modify their data collection. In my experience, new ideas almost always come from carefully collected data. They don’t come from introspection, talking to friends, reading the newspaper, watching TV, going to talks, etc.
2. Finding new ideas worth testing means finding new ideas that are plausible enough to be worth the cost of testing. To find new ideas with sufficient plausibility to test you need to test implausible ideas. A small fraction will pass the test, gaining plausibility. They will become sufficiently plausible to be worth testing.
3. To test implausible ideas in a career-consistent manner, you need to be able to test them very cheaply. Few if any psychologists have thought about this. They don’t realize how important it is.
When you have very cheap tests, you can test far more ideas than you can if you only have expensive tests. You need a “test set”: very cheap tests, cheap tests, almost-cheap tests, and so on. Ideas that pass a very cheap test become worth testing with a cheap test, those that pass a cheap test become worth testing with an almost-cheap test, and so on. With current methods (all tests are expensive), perhaps social psychology professors who want to publish have a set of 50 ideas that are plausible enough to be worth the cost of testing. Those ideas get tested over and over, using them up. Were cheap tests available, perhaps the same professors could choose which ideas to test from a set of 1000. Of those 1000 ideas, 950 are too implausible to test with expensive tests. Among those 950, I believe, would be some ideas that, when tested, would turn out to seem true.
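To make the arithmetic of a “test set” concrete, here is a minimal sketch in Python. It is an illustration only. The 950 implausible ideas come from the example above; everything else (the fraction of ideas that are true, the cost of each tier, how often true and false ideas pass each test) is a made-up assumption, and the names `Tier` and `run_funnel` are hypothetical. The sketch works with expected values rather than simulating random outcomes.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One kind of test. All values are illustrative assumptions."""
    name: str
    cost: float           # cost of testing one idea, in arbitrary units
    pass_if_true: float   # chance a true idea passes this test
    pass_if_false: float  # chance a false idea passes this test


def run_funnel(n_true, n_false, tiers):
    """Send ideas through the tiers in order; only survivors move on.

    Returns (expected true survivors, expected false survivors,
    expected total cost).
    """
    total_cost = 0.0
    for tier in tiers:
        # Every idea that reached this tier gets tested once.
        total_cost += (n_true + n_false) * tier.cost
        n_true *= tier.pass_if_true
        n_false *= tier.pass_if_false
    return n_true, n_false, total_cost


# 950 implausible ideas (from the text). Assume 2% of them are actually true.
N_IDEAS = 950
FRACTION_TRUE = 0.02  # assumption, not from the text
n_true = N_IDEAS * FRACTION_TRUE
n_false = N_IDEAS - n_true

# Hypothetical tiers: each step up costs more and is a stricter filter.
VERY_CHEAP = Tier("very cheap", cost=1, pass_if_true=0.8, pass_if_false=0.1)
CHEAP = Tier("cheap", cost=10, pass_if_true=0.8, pass_if_false=0.1)
EXPENSIVE = Tier("expensive", cost=1000, pass_if_true=0.8, pass_if_false=0.05)

funnel = run_funnel(n_true, n_false, [VERY_CHEAP, CHEAP, EXPENSIVE])
expensive_only = run_funnel(n_true, n_false, [EXPENSIVE])

for label, (t, f, c) in [("funnel", funnel), ("expensive only", expensive_only)]:
    print(
        f"{label}: {t:.1f} true and {f:.1f} false ideas survive, "
        f"total cost {c:.0f}, cost per surviving true idea {c / t:.0f}"
    )
```

Under these assumed numbers, the funnel reaches the expensive test with only a small fraction of the original ideas, so the total cost and the cost per surviving true idea are both far lower than running the expensive test on all 950. The funnel also loses some true ideas at each cheap step, so it finds fewer of them in one pass. Whether that trade is worth it depends entirely on the assumed costs and pass rates.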
I came to these beliefs trying to understand why my self-experimentation did a good job of finding new ideas worth testing. I concluded that the secret was this: I was able to test implausible ideas very cheaply — thousands of times more cheaply than professional scientists. Self-tracking — keeping track of my sleep, for example, and looking for outliers — was a very cheap way of getting new ideas about what controls sleep. Self-experimentation was a slightly more expensive (but still very cheap) way to test ideas that self-tracking came up with.
Many people have complained about a replicability problem in psychology, including my friend and co-author Hal Pashler. An obvious solution is to raise the bar for publication: require better (= stronger) evidence. Sure, this will improve the quality of testing, but how will it affect the rate of production of plausible new ideas? My cost-of-test proposal suggests it will reduce that rate of production. I am saying that cheap tests are all-important. Raising the publication bar will make the only test you have more expensive. What if the replication problem is a response to a lack of plausible new ideas? Then this solution to the problem would make the problem worse.