Of course, this is a rant. You can tell by the title. But, for the record, let me be upfront: I have no idea how to solve the problem that I will be lamenting, nor do I think that I am doing a better job than anyone else at assessing mathematics done by other people. I am writing this little rant as the beginning of a thought process about how to improve things. Now that this little apology is out of the way, let me lament away.
Once in a while I compare the rigor and care that I exercise when checking whether a piece of mathematics is correct with the methodology that I employ when evaluating the quality of mathematical output: when I referee a paper for a journal, write a report for a grant funding agency, consider job applicants, or even judge the worth of my own work. The difference is like earth and sky. I believe that I am not alone in this.
In a mathematical proof, every single claim has to follow with fool-proof logic from previously established propositions, and every reference needs to be precise. How careful we are when crafting these arguments! But when assessing a paper submitted to a journal? Well, we do our best.
I read a paper carefully and convince myself that it is correct and new. But is it good enough? I squint at the paper and write: “The paper under review does not meet the standard of the Journal of A” and recommend rejection. But if this paper had been submitted to another journal, I might have written “I recommend publishing this paper in the Journal of B”. Because JOB is somewhat less prestigious than JOA, you see. It’s not that I have read more than a couple of papers published in either of these two journals in the past five years; I just know that JOA is better, and I can just tell whether this paper clears the bar. As a member of the community I seem to have some idea of how good a paper has to be in order to be published in this or that journal. And that’s that.
Recommending that their paper be rejected is not the worst thing that you can do to someone. Big deal! Submit somewhere else. But we also need to consider job applicants. And here, to our horror, we need to assess the quality of work done in a field in which we are not experts. Our chief tool – at least for screening and forming a short list – is the list of publications. We read the list very carefully, we check the timeline against the output, but most importantly, we look at journal names. If there are Top Journals in the publication list, then this applicant is considered to be promising.
Applicants are compared one against the other. In a way it is like a card game: every list of publications is like a hand that the applicant holds. There are rules. Some rules are agreed upon universally, some are “house rules” that come in different variants. All else being equal, having more papers is better. But the quality of the venue counts much more. In the imaginary example above JOA beats JOB. In math, there are four big journals – Annals of Mathematics, Acta Mathematica, Journal of the American Mathematical Society, and Inventiones mathematicae – and they play the role of joker cards – they beat anything. I think that these rules do make some sense, because if you use the criterion “published in the top four” for deciding if someone is an excellent mathematician, you are going to have a very low false positive rate. Even if this method has a high false negative rate, from the point of view of top institutions doing the hiring it makes sense to use this proxy.
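To make the false positive versus false negative point concrete, here is a toy calculation. It is only a sketch: the four counts below are invented for illustration and describe no real applicant pool, and the variable names and the `rate` helper are hypothetical.

```python
# Toy confusion matrix for the proxy "has a top-four paper".
# All four counts are invented for illustration; they are not data.
excellent_with = 18      # excellent, flagged by the proxy (true positives)
excellent_without = 42   # excellent, missed by the proxy (false negatives)
other_with = 2           # not excellent, flagged anyway (false positives)
other_without = 938      # not excellent, not flagged (true negatives)


def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


# Share of non-excellent applicants that the proxy wrongly flags.
false_positive_rate = rate(other_with, other_with + other_without)
# Share of excellent applicants that the proxy fails to flag.
false_negative_rate = rate(excellent_without, excellent_without + excellent_with)
# Share of flagged applicants who really are excellent.
precision = rate(excellent_with, excellent_with + other_with)

print(f"false positive rate: {false_positive_rate:.3f}")
print(f"false negative rate: {false_negative_rate:.3f}")
print(f"share of flagged applicants who are excellent: {precision:.3f}")
```

With these made-up counts the proxy almost never flags a weak applicant, yet it overlooks most of the excellent ones. That is the trade-off a top institution may be willing to accept.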
Most mathematicians are wary of using bibliometrics. We are not so stupid as to make judgements based on quantitative measures that can be gamed. Most of us have our own personal journal ranking system. We might all agree on what the big four journals are, but then there are dozens of very selective journals and we have different opinions about how they compare with one another. I recently decided that I wanted a paper in a top journal, so I submitted to Journal X. Happily the paper was accepted, but then a friend told me that Journal X is not as selective as it used to be. Besides being pissed at this, I thought: how did you reach this conclusion? Surely he hasn’t evaluated a significant number of papers over the years, but is forming an opinion based on a small sample. It’s ironic: we so often use the prestige of a journal to assess the quality of a paper that assessing the quality of a journal by a small sample of the papers it published almost seems like circular reasoning.
To complicate things even more, it is also the case that different people will employ different strategies when choosing a journal to which they submit their work. So even if we try to apply uniform standards for evaluating job applicants, the job applicants will not be lending themselves to a uniform evaluation since they approach our ranking machine from different directions.
It has happened to me that an excellent paper that I submitted to a good journal came back with two contradictory referee reports. A reasonable conclusion from this experience is that whether or not a paper of given quality gets accepted to a certain journal is a random variable. Note that I am not claiming that it is a 50:50 toss-up, but rather that when one submits to a journal one is making an educated bet. Sure, most papers have a 0% chance of being published in Acta. But some excellent papers were rejected from Acta, meaning that their authors could not have predicted with certainty that they would be rejected. In some cases it could be the negative opinion of a single referee out of a handful that leads to rejection – things might have worked out differently with a little bit of luck.
Suppose you are a PhD student, and you have written what you believe to be an excellent paper. Where should you submit it? The higher you send it, the bigger the prize: you will be able to apply for jobs in more prestigious places. But the higher you send it, the greater the chances that it will eventually get rejected, and that lowers the chances of having an actual publication when you apply for a postdoc. The same dilemma arises at every step of one’s career. Different people choose different strategies, based on their personalities, connections, responsibilities and safety nets.
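The dilemma can be sketched as a small bet calculation. Everything here is an assumption made for illustration: the `Journal` class, the `evaluate` function, the prestige scores, the acceptance probabilities and the review times are all hypothetical, and JOA and JOB are the imaginary journals from earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    name: str
    prestige: float          # invented "prize" for being published there
    accept_prob: float       # invented chance that this paper is accepted
    months_to_decision: int  # invented time until a decision arrives


def evaluate(strategy, deadline_months):
    """Try the journals in order, resubmitting after each rejection.

    Returns (probability of an acceptance by the deadline,
             expected prestige collected by the deadline).
    """
    p_unpublished = 1.0  # probability that every earlier attempt failed
    elapsed = 0
    p_accepted = 0.0
    expected_prestige = 0.0
    for journal in strategy:
        elapsed += journal.months_to_decision
        if elapsed > deadline_months:
            break  # the decision would arrive after the application deadline
        p_here = p_unpublished * journal.accept_prob
        p_accepted += p_here
        expected_prestige += p_here * journal.prestige
        p_unpublished *= 1 - journal.accept_prob
    return p_accepted, expected_prestige


joa = Journal("JOA", prestige=10.0, accept_prob=0.2, months_to_decision=12)
job = Journal("JOB", prestige=4.0, accept_prob=0.7, months_to_decision=8)

for deadline in (18, 24):
    for label, strategy in (("aim high first", [joa, job]), ("play safe", [job])):
        p, prize = evaluate(strategy, deadline)
        print(f"deadline {deadline}m, {label}: P(accepted)={p:.2f}, E[prestige]={prize:.2f}")
```

With these invented numbers, playing safe wins on both counts when only 18 months are left, while aiming high first wins on both counts when 24 months are available. The ranking of strategies depends on how much time, that is, how much of a safety net, the author has.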
To be clear: when evaluating a job applicant for a position, nobody relies blindly on a list of publications. We look into the research, applicants are often invited to give a talk, and people in the department who work in related fields might be able to weigh in. Most importantly, letters of recommendation are solicited. The letters of recommendation are written by experts in the field who can explain the novelty and difficulty of the applicant’s work, and sometimes give very delicate and valuable information, such as comparing the applicant to researchers at a similar stage.
Letters make us more informed, but the job of taking a set of applicants, each with their own set of letters of recommendation, and ranking them according to the letters is not a well-defined task with a rigorous methodology. Different applicants typically have disjoint sets of letter writers, different writers have different styles, and different readers come up with different interpretations of the same letters!
Perhaps it is unavoidable that a mathematician operating outside of mathematics will feel the tension between the rigorous standards of our profession and the subjective task of assessing quality and making a value judgement. Maybe the question to ask is not whether we are being rigorous or consistent, but whether we end up making good decisions.