The current NIH application scoring system gives reviewers nine score choices (1-9). In practice, many reviewers use only a portion of this range, both in their initial scoring and during meeting discussions, and the majority of applications are scored in the 2-4 range. In other words, reviewers frequently use only three of their scoring choices to evaluate the best applications, and most panels use the best score (1) sparingly.
This can contribute to score compression, in which a high percentage of applications are scored in a fairly narrow range. For example, a moderately compressed study section could give more than 25% of its applications a final overall impact score between 10 and 30 (final scores are reported on a 10-90 scale). This problem may be exacerbated by low funding levels. In study sections with severe compression, competitive applications are often scored within only two score choices (1 and 2).
In addition, a review of scoring patterns across NIH has revealed peaks in the distribution of final overall impact scores at 20 and 30, indicating many tied scores. Score compression and ties make it difficult for program officers to distinguish between the very best applications reviewed in study sections, particularly when several applications receive identical scores and/or percentile ranks within the same study section.
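The two problems described above, compression and ties, can each be expressed as a simple count over a study section's final scores. The following is a minimal sketch, not CSR's analysis method; the function names and the sample scores are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List


def share_in_range(scores: List[int], low: int, high: int) -> float:
    """Fraction of final scores falling within [low, high], inclusive."""
    return sum(low <= s <= high for s in scores) / len(scores)


def tied_scores(scores: List[int]) -> Dict[int, int]:
    """Map each final score shared by two or more applications to its count."""
    counts = Counter(scores)
    return {score: n for score, n in counts.items() if n > 1}


# Made-up final overall impact scores (10-90 scale) for one study section.
example_scores = [20, 20, 30, 30, 30, 25, 41, 50]

compression = share_in_range(example_scores, 10, 30)  # 6 of 8 applications
ties = tied_scores(example_scores)                    # scores 20 and 30 repeat
```

In this invented example, six of the eight applications fall between 10 and 30, and the scores 20 and 30 are each shared by several applications, which is the pattern that makes the best applications hard to tell apart.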
CSR designed a pilot study to test whether allowing reviewers to vote in 0.5 point increments, based upon the score range set by the assigned reviewers, can reduce score compression and ties by giving reviewers more flexibility in score choices following discussion of applications.
Pilot Design
The pilot was run alongside the normal review process at study section meetings in the 2016/05 and 2016/10 council rounds. Thirty-three study sections participated in the 2016/05 council round and 11 in the 2016/10 council round.
Following the discussion of applications, the assigned reviewers set the range with integer scores following standard guidance. All reviewers entered integer scores within that range for the official scoring of applications, unless they indicated that they intended to vote outside the range.
Reviewers were provided with separate score sheets that included a column for unofficial half point scores ranging from 0.5 points below their official score to 0.5 points above it. For example, if the score range for an application was 2-3, the allowable range for final scoring became 1.5-3.5. If a reviewer had scored an application with a 2 officially, the reviewer could enter 1.5, 2.0 or 2.5 in the half point column; for a score of 3, the reviewer could enter 2.5, 3.0 or 3.5.
A score of 1 remained the best possible score; a score of 0.5 was not allowed. Half point scores 0.5 lower or 0.5 higher than the score range did not require reviewers to identify themselves as voting outside the range. If reviewers had already identified themselves as officially voting out of the range, they could enter a score either 0.5 lower or 0.5 higher than the official score they entered.
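The half point rules above can be sketched in a few lines of code. This is an illustrative sketch only; the function names are hypothetical. It enforces the one limit the pilot description states (no score better than 1) and does not cap the upper end, because the description does not say whether 9.5 was permitted.

```python
from typing import List, Tuple

BEST_SCORE = 1.0  # a score of 0.5 was not allowed in the pilot


def half_point_choices(official_score: int) -> List[float]:
    """Half point entries available to a reviewer, given their official score."""
    if not 1 <= official_score <= 9:
        raise ValueError("official scores are integers from 1 to 9")
    candidates = [official_score - 0.5, float(official_score), official_score + 0.5]
    # Drop anything better than the best possible score of 1.
    # Assumption: no upper cap is applied, since the source does not state one.
    return [c for c in candidates if c >= BEST_SCORE]


def half_point_range(range_low: int, range_high: int) -> Tuple[float, float]:
    """Widen the assigned reviewers' integer range by 0.5 at each end."""
    return (max(BEST_SCORE, range_low - 0.5), range_high + 0.5)
```

For an official score of 2 this yields 1.5, 2.0 and 2.5, and for a score range of 2-3 it yields 1.5 to 3.5, matching the examples in the pilot design. An official score of 1 yields only 1.0 and 1.5.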
Results
Aggregated analysis of the data revealed that, when reviewers used the half point option, they were more likely to raise the score of an application, making it less favorable (for example, a move from 2 to 2.5), than to reduce it (a move from 2 to 1.5).
Scores calculated from half point data were compared to the official scores for 1,371 discussed applications from 39 study sections. The percentage of scores at 20, 30 and 40 was reduced when reviewers were given the half point option, as shown in the figure below. Score compression was also reduced for a few study sections that were highly compressed based on the official scores.
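To illustrate why half point votes can separate applications that tie under integer voting, the sketch below computes a final score as the mean of the reviewers' votes multiplied by 10. The rounding rule (round half up to a whole number), the function name and the vote lists are assumptions made for illustration; this is not the calculation CSR used in its analysis.

```python
import math
from fractions import Fraction
from typing import Sequence


def final_impact_score(votes: Sequence[float]) -> int:
    """Mean of reviewer votes times 10, rounded half up (assumed rounding rule)."""
    # Fractions keep the arithmetic exact; multiples of 0.5 convert without error.
    mean = sum(Fraction(v) for v in votes) / len(votes)
    return math.floor(mean * 10 + Fraction(1, 2))


# Two hypothetical applications that tie at 20 under integer voting.
official_a = [2, 2, 2, 2]
official_b = [2, 2, 2, 2]

# The same panels using the unofficial half point column.
half_point_a = [2, 2.5, 2, 2.5]   # two reviewers move 0.5 higher (less favorable)
half_point_b = [1.5, 2, 2, 2]     # one reviewer moves 0.5 lower (more favorable)

tied_officially = final_impact_score(official_a) == final_impact_score(official_b)
separated = final_impact_score(half_point_a) != final_impact_score(half_point_b)
```

With these invented votes, both applications score 20 officially, while the half point votes give 23 and 19, so the tie at 20 disappears.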
Reviewer Survey Found Strong Support for a 0.5 Scale
Surveys were sent via email to 311 reviewers in the 11 study sections participating in the 2016/10 council round to gain feedback on the half point pilot. Reviewers were asked to complete the survey near the end of the meeting. They were assured that their response was voluntary, identities would not be disclosed, and only aggregated responses would be used in analysis.
Completed surveys were received from 138 reviewers, 118 of whom used the half-point increments during final scoring. Overall, a majority of reviewers indicated that the option of providing half-point increments improved their ability to prioritize applications based on impact and that the resulting scores more accurately reflected scientific merit. Two-thirds of surveyed reviewers agreed or strongly agreed that NIH policy should be changed to permit half-point increments in scoring.