I don’t often go to empirical talks, but when I do, I fall asleep. Recently, while so engaged, I dreamt of the `replicability crisis’ in Economics (see Chang and Li (2015)). The penultimate line of their abstract is the following bleak assessment:
`Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable.’
Eager to help my empirical colleagues snatch victory from the jaws of defeat, I did what all theorists do. Build a model. Here it is.
The journal editor is the principal and the agent is an author. Agent has a paper characterized by two numbers $(v, p)$. The first is the value of the findings in the paper assuming they are replicable. The second is the probability that the findings are indeed replicable. The expected benefit of the paper is $vp$. Assume that $v$ is common knowledge but $p$ is the private information of agent. The probability that agent is of type $p$ is $f_p$.
Given a paper, the principal can, at a cost $K$, inspect the paper. With probability $p$ the inspection process will replicate the findings of the paper. Principal proposes an incentive compatible direct mechanism. Agent reports their type, $p$. Let $a_p$ denote the interim probability that agent’s paper is provisionally accepted. Let $q_p$ be the interim probability of agent’s paper not being inspected given it has been provisionally accepted. If a provisionally accepted paper is not inspected, it is published. If a paper subject to inspection is successfully replicated, the paper is published. Otherwise it is rejected and, per custom, the outcome is kept private. Agent cares only about the paper being accepted. Hence, agent cares only about

$a_p[q_p + (1-q_p)p].$
The principal cares about replicability of papers and suffers a penalty of $R$ for publishing a paper that is not replicable. Principal also cares about the cost of inspection. Therefore she maximizes

$\sum_p f_p a_p \left\{ q_p\,[vp - (1-p)R] + (1-q_p)(vp - K) \right\}.$

The incentive compatibility constraint is

$a_p[q_p + (1-q_p)p] \geq a_{p'}[q_{p'} + (1-q_{p'})p] \quad \text{for all } p, p'.$

Recall, an agent cannot lie about the value component of the type.
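To see the pieces in one place, here is a small numerical sketch; the type set, the prior $f_p$, the values of $v$, $R$, $K$ and the two candidate mechanisms are all made up for illustration. It computes the principal’s objective and checks incentive compatibility; the mechanism that tries to separate types fails the check, which previews the observation below that we cannot screen on $p$.

```python
# Illustrative sketch only: discrete types p, made-up v, R, K, prior f_p and
# candidate mechanisms. A mechanism maps a reported type to (a_p, q_p): the
# probability of provisional acceptance and the probability of skipping
# inspection given acceptance.

v, R, K = 1.0, 5.0, 0.1
types = [0.2, 0.5, 0.9]
f = {0.2: 0.3, 0.5: 0.4, 0.9: 0.3}            # prior over types, f_p

def publication_prob(mech, report, true_p):
    a, q = mech[report]
    return a * (q + (1 - q) * true_p)          # agent's payoff from reporting `report`

def incentive_compatible(mech):
    # truth-telling must maximize the probability of publication for every type
    return all(publication_prob(mech, p, p) >= publication_prob(mech, r, p) - 1e-12
               for p in types for r in types)

def principal_payoff(mech):
    # sum_p f_p a_p { q_p [v p - (1-p) R] + (1-q_p) (v p - K) }
    return sum(f[p] * mech[p][0] * (mech[p][1] * (v * p - (1 - p) * R)
                                    + (1 - mech[p][1]) * (v * p - K))
               for p in types)

pooling = {p: (1.0, 0.0) for p in types}       # accept everyone, inspect everything
screening = {0.2: (0.0, 0.0), 0.5: (1.0, 0.0), 0.9: (1.0, 1.0)}  # tries to separate types

print(incentive_compatible(pooling), round(principal_payoff(pooling), 3))  # True 0.43
print(incentive_compatible(screening))                                     # False: low types mimic 0.9
```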
We cannot screen on $p$, so all that matters is the distribution of $p$ conditional on $v$. Let $p_v = E[p \mid v]$. For a given $v$ there are only 3 possibilities: accept always, reject always, inspect and accept. The first possibility has an expected payoff of $vp_v - (1-p_v)R$ for the principal. The second possibility has value zero. The third has value $vp_v - K$.
The principal prefers to accept immediately over inspection if $(1-p_v)R \leq K$. The principal prefers inspection to rejection if $vp_v \geq K$. The principal prefers acceptance to rejection if $vp_v \geq (1-p_v)R$.
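As an illustration with made-up numbers, take $v = 5$, $R = 10$ and $K = 1$, and compare the three options at two values of $p_v$:

\[
p_v = 0.95:\quad vp_v - (1-p_v)R = 4.75 - 0.5 = 4.25 \;>\; vp_v - K = 3.75 \quad\Rightarrow\quad \text{accept without inspection};
\]
\[
p_v = 0.60:\quad vp_v - (1-p_v)R = 3 - 4 = -1, \qquad vp_v - K = 3 - 1 = 2 > 0 \quad\Rightarrow\quad \text{inspect}.
\]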
Under a suitable condition on $p_v$ as a function of $v$, the optimal mechanism can be characterized by two cutoffs $v_1 \geq v_2$. Choose $v_1$ to be the smallest $v$ such that $(1-p_v)R \leq K$. Choose $v_2$ to be the largest $v$ such that $vp_v \leq K$.

A paper with $v \geq v_1$ will be accepted without inspection. A paper with $v \leq v_2$ will be rejected. A paper with $v_2 < v < v_1$ will be provisionally accepted and then inspected.
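A small computational sketch of the cutoff rule may help; the parameters and the increasing $p_v$ curve below are invented for illustration, not taken from anywhere. For each $v$ on a grid it picks the best of the three options and reports where the reject and accept regions end and begin.

```python
# Illustrative sketch: made-up R, K and an increasing p_v curve. For each v the
# principal picks the best of accept / inspect / reject; the resulting regions
# give the two cutoffs v_1 (accept outright above) and v_2 (reject below).

R, K = 10.0, 1.0

def p_v(v):
    return v / (v + 1.0)          # assumed: replicability rising in the value of the findings

def decision(v):
    accept = v * p_v(v) - (1 - p_v(v)) * R    # publish without inspection
    inspect = v * p_v(v) - K                  # inspect; publish only if replicated
    best = max(accept, inspect, 0.0)          # 0 is the payoff from rejecting
    if best == accept:
        return "accept"
    return "inspect" if best == inspect else "reject"

grid = [round(0.1 * i, 1) for i in range(1, 201)]          # v from 0.1 to 20.0
labels = {v: decision(v) for v in grid}
v1 = min(v for v in grid if labels[v] == "accept")         # smallest v accepted outright
v2 = max(v for v in grid if labels[v] == "reject")         # largest v rejected
print("reject for v <= %.1f, inspect in between, accept outright for v >= %.1f" % (v2, v1))
```

With these invented numbers the grid search recovers the cutoffs implied by the two conditions above: $(1-p_v)R$ falls to $K$ at $v_1 = 9$, and $vp_v$ reaches $K$ just above $v_2 = 1.6$.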
For empiricists, the advice would be to shoot for a high $v$ and damn the $p$!
More seriously, the model points out that even a journal that cares about replicability and bears the cost of verifying this will publish papers that have a low probability of being replicable. Hence, the presence of published papers that are not replicable is not, by itself, a sign of something rotten in Denmark.
One could improve outcomes by making authors bear the costs of a paper not being replicated. This points to a larger question. Replication is costly. How should the cost of replication be apportioned? In my model, the journal bore the entire cost. One could pass it on to the authors but this may have the effect of discouraging empirical research. One could rely on third parties (voluntary, like civic associations, or professionals supported by subscription). Or, one could rely on competing partisan groups pursuing their agendas to keep the claims of each side in check. The last seems at odds with the romantic ideal of disinterested scientists but could be efficient. The risk is partisan capture of journals which would shut down cross-checking.
11 comments
January 17, 2016 at 11:53 am
e abrams
I fall asleep…
uh, the point of your idiot model is …
isn’t snark fun? nothing like dissing people to make yourself feel better
but seriously, the situation is even worse with theory, no?
January 17, 2016 at 1:23 pm
rvohra
Dear e abrams
If the post or the model sent you into the arms of Morpheus, you may have missed the following:
`More seriously, the model points out that even a journal that cares about replicability and bears the cost of verifying this will publish papers that have a low probability of being replicable. Hence, the presence of published papers that are not replicable is not, by itself, a sign of something rotten in Denmark.’
Re your question: isn’t the situation even worse in theory? There are two situations you could be referring to. First, that theory talks have an even greater soporific quality than empirical talks. A poll will probably support this claim. Second, reproducibility is even more of an issue with theory papers. It depends on the goal of the theory paper. One that aspires to make sharp predictions should be judged by those predictions. However, many theory papers have other concerns. For example, trying to understand the implications of a set of assumptions or checking the internal consistency of various stories and intuitions.
January 17, 2016 at 11:24 pm
blink
Very nice model. In these terms, the replicability zealots’ real complaint may be, “R is too low!”
Regarding solutions, the competing factions approach leverages the incentive to undercut opponents. But as you point out, capture is a real concern. Perhaps we can reward replication directly, effectively lowering K.
January 18, 2016 at 9:09 am
rvohra
Dear Blink
Thank you.
Rewarding replication directly raises a collection of interesting questions. First, what is the optimal balance between production of new research and replication? Rewards for replication would change how scholars allocate their time. Second, how should one match `replicators’ to papers? Third, it’s not obvious that replicator effort is observable, so what signal would one use to base rewards on? As you might imagine, I consider this a feature not a bug of the proposal, as it is an excuse to build even more models!
January 18, 2016 at 11:58 am
Nick
‘Replication’ in this context means ‘were we able to run the code they ran to get their results and, if we were, did we get roughly something like the same numbers they reported?’ It’s an absolute minimum standard of competency. They presumably ran their code to get their numbers. Why can’t we?
We may be currently in an optimal paper inspection regime, and an eventual replication failure rate of 50% among published papers may be the optimal outcome — given current research practices. But with a bare minimum of good research practice, we should have K = $1.99, and every published paper replicable.
Also, replace the concept of ‘replicate’ with ‘verify proof’ and apply the same analysis to theory work. Should we shoot for the most interesting possible theorems, and not bother with any of this difficult ‘proof’ business? (I’m really hoping the answer is ‘yes’ btw.)
January 18, 2016 at 3:26 pm
rvohra
Dear Nick
The answer to the question of what constitutes replication does not seem to be obvious. For experiments in social psychology this has been the subject of debate. For example, if the original experiment was run in English and the replication was done in Mandarin, does this count? In the life sciences it has long been known that certain techniques are not explicitly communicated. It is not unusual for one lab to send a delegation to another lab to `learn’ this tacit knowledge. When we move to data, I would have thought, as you do, that this should be cut and dried. I submit the algo and the data file and you run it and get the same outcome. Apparently not. See the Chang and Li paper. The difficulties in this case may be endogenous. If the authors were expecting their findings to be replicated, they may maintain their files and code in a way that lends itself to replication.
The same logic applies also to the verification of proofs. However, I think, in most cases the cost of verification is relatively small. There is also a benefit to verification in that the correct proof of a high `v’ theorem usually involves the introduction of a new idea/technique that will be useful elsewhere.
Finally, I recall someone, Courant perhaps, saying that the difference between pure and applied maths is that the first cares about an interesting proof while the second cares about an interesting theorem. I think something like this applies to Economic theory. We want interesting theorems (that are true) and it’s even better when the proof is simple.
January 25, 2016 at 11:28 am
Joshua Gans (@joshgans)
I think that you may get a better outcome if R were a function of v; I suspect that the bigger the result, the harder it falls. If this were apportioned to the researcher, the selection effect you posit may go away.
At present it seems that this doesn’t quite happen for retractions, and junior researchers suffer more than senior ones.
January 25, 2016 at 7:58 pm
rvohra
Dear Joshua
Agreed. However, as you push more of the `R’ onto the author, one worries about discouraging authors from seeking high `v’. A `fuller’ model would incorporate a non-contractible project selection stage by the author. The author can expend costly effort to get a draw of a high `v’, as well as expend effort to push p up.
January 29, 2016 at 7:26 am
nageebali
Consider a researcher who is selecting among projects within a set, and is uncertain about the value of p and v that shall be realized. Different researchers have different sets, and also differ in the cost they incur from their work not being replicable. With the thresholds that you have characterized, researchers may find it optimal to go for the projects that are likely to realize high v’s, and sacrifice the p’s, in which case a journal may be better off publishing a lower-value publication (that has a higher chance of replicability), violating the “suitable condition on {p_v} as a function of {v}….”
And, as in much of life, the rule that is ex post optimal may not be ex ante optimal given how it affects incentives.
February 1, 2016 at 9:01 pm
Rakesh Vohra
Dear Nageebali
Agreed, which suggests that a `fuller’ model should include a non-contractible investment action by authors before submission. The investment would determine v and p.