Strategic Priority I of the Office of Disease Prevention (ODP) focuses on developing new tools to characterize the prevention research portfolio of the National Institutes of Health (NIH). The ODP is working closely with colleagues in the NIH Office of Portfolio Analysis (OPA) to develop these tools. The ODP will use them to examine the NIH prevention research portfolio and to produce reports that can be shared with collaborators inside and outside the NIH to monitor the progress of NIH prevention research.
This work is proceeding in five phases.
Phase 1: Develop a taxonomy, or classification system, to characterize the abstracts of NIH-funded prevention research awards across six categories.
The ODP’s Strategic Priority I Team has developed a prevention research taxonomy along with a detailed protocol that serves as a set of rules for coding abstracts. The protocol provides teams of coders with instructions, definitions, and examples to facilitate the accurate, standardized classification of awards.
Phase 2: Manually code Type 1 (new) R01 grant abstracts to establish a set of “gold standard” examples of how abstracts should be coded across a wide range of prevention research projects.
The ODP developed custom software called the Prevention Abstract Classification Tool (PACT) to support staff who code awards using the taxonomy. This web-based platform records and assigns abstracts for coding, provides quick access to the coding protocol, and captures both individual and consensus coding. PACT is also integrated with SAS to calculate inter-rater reliability. To date, the Strategic Priority I Team has completed manual coding of 8,159 Type 1 (new) R01 grants awarded in Fiscal Years 2010–2016.
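The text does not say which inter-rater reliability statistic the SAS integration computes. As one common option, the minimal Python sketch below assumes two coders who each assign a single categorical label per abstract and computes Cohen's kappa, which discounts the agreement expected by chance. The function name and labels are hypothetical and are not part of PACT.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same items (hypothetical helper)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders chose the same label.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, estimated from each coder's marginal label frequencies.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1:
        return 1.0  # degenerate case: both coders used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Illustrative labels only; the real taxonomy has six categories not modeled here.
coder_a = ["prevention", "prevention", "not", "not", "prevention", "not"]
coder_b = ["prevention", "not", "not", "not", "prevention", "not"]
print(round(cohens_kappa(coder_a, coder_b), 3))
```

Here the two coders agree on five of six abstracts, but because their label frequencies imply 50% agreement by chance, kappa is about 0.67 rather than 0.83.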
Phase 3: Use these examples to develop a machine-learning algorithm.
Using these manually coded grants, the ODP collaborated with the OPA to develop an automated coding process that uses machine learning to identify prevention research. Early results indicate that the machine-learning tool achieved 89% accuracy in identifying prevention research grants.
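The update does not define how the 89% figure was measured. Assuming accuracy means the share of grants whose predicted label matches the manually coded gold-standard label, a minimal Python sketch of that comparison might look like the following. It also keeps the confusion-matrix counts, which show whether errors are missed prevention grants or false alarms. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    true_pos: int   # predicted prevention, gold standard says prevention
    false_pos: int  # predicted prevention, gold standard says not prevention
    true_neg: int   # predicted not prevention, gold standard agrees
    false_neg: int  # predicted not prevention, gold standard says prevention

    @property
    def accuracy(self) -> float:
        total = self.true_pos + self.false_pos + self.true_neg + self.false_neg
        return (self.true_pos + self.true_neg) / total


def evaluate(predicted, gold):
    """Compare binary predictions (True = prevention research) with gold-standard labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and the same length")
    pairs = list(zip(predicted, gold))
    return Evaluation(
        true_pos=sum(p and g for p, g in pairs),
        false_pos=sum(p and not g for p, g in pairs),
        true_neg=sum(not p and not g for p, g in pairs),
        false_neg=sum(not p and g for p, g in pairs),
    )


predicted = [True, True, False, False, True, False, False, False, False, False]
gold = [True, False, False, False, True, False, False, True, False, False]
result = evaluate(predicted, gold)
print(result, result.accuracy)
```

In this made-up example, 8 of 10 predictions match the gold standard (accuracy 0.8), with one false positive and one missed prevention grant.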
Phase 4: Validate the machine-learning algorithms with manual coding.
The OPA has also developed a user-friendly interface, the Prevention Research Output Validation Engine (iPROVE), which combines PACT data with data from IMPAC II, the NIH’s internal grant database. iPROVE allows the ODP to easily query both data sources and then review the machine-learning tool’s predictions, accepting or rejecting each one. This review lets the ODP further train and refine the algorithms used by the tool and thereby improve the accuracy with which it identifies prevention research grants.
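iPROVE's internals are not described in this update. The hypothetical Python sketch below only illustrates the general accept/reject idea, assuming a binary prediction (prevention research or not) so that rejecting a prediction implies the opposite label. Reviewed grants become labeled examples that could be fed back into retraining; unreviewed grants are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Prediction:
    grant_id: str        # hypothetical grant identifier
    is_prevention: bool  # label predicted by the machine-learning tool


def apply_reviews(
    predictions: List[Prediction], decisions: Dict[str, bool]
) -> Tuple[List[Tuple[str, bool]], float]:
    """Turn reviewer accept/reject decisions into corrected training labels.

    decisions maps grant_id -> True (accept the prediction) or False (reject it).
    Returns (new labeled examples, share of reviewed predictions that were accepted).
    """
    new_examples: List[Tuple[str, bool]] = []
    accepted = 0
    for pred in predictions:
        if pred.grant_id not in decisions:
            continue  # not yet reviewed; exclude from the training data
        if decisions[pred.grant_id]:
            accepted += 1
            label = pred.is_prevention
        else:
            # Binary-label assumption: a rejected prediction flips to the other class.
            label = not pred.is_prevention
        new_examples.append((pred.grant_id, label))
    reviewed = len(new_examples)
    acceptance_rate = accepted / reviewed if reviewed else 0.0
    return new_examples, acceptance_rate


preds = [Prediction("G1", True), Prediction("G2", True), Prediction("G3", False)]
examples, rate = apply_reviews(preds, {"G1": True, "G2": False})
print(examples, rate)
```

Tracking the acceptance rate alongside the corrected labels gives a running check on how often reviewers agree with the tool as it is retrained.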
Phase 5: Apply the taxonomy to other NIH activity codes.
Having classified the Type 1 R01 grants from Fiscal Years 2010–2016, the Strategic Priority I Team has begun to apply the machine-learning process to identify prevention research grants funded under other NIH activity codes, beginning with Type 2 (renewal) R01s, R03s, and R21s. A random sample of grants under each of these activity codes will be manually coded to provide an accurate view of the entire NIH prevention research portfolio.
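The sampling design is not specified beyond a random sample for each activity code. A minimal sketch, assuming a fixed number of grants drawn by simple random sampling within each activity code (that is, a stratified sample), could look like this in Python. The per-code sample size, the seed, and the grant identifiers are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_per_activity_code(
    grants: List[Tuple[str, str]], per_code: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Draw a simple random sample of grant IDs within each activity code.

    grants: (grant_id, activity_code) pairs.
    per_code: hypothetical number of grants to sample per code; codes with
              fewer grants than this are included in full.
    """
    by_code: Dict[str, List[str]] = defaultdict(list)
    for grant_id, code in grants:
        by_code[code].append(grant_id)
    rng = random.Random(seed)  # fixed seed so the same draw can be reproduced
    return {
        code: rng.sample(ids, min(per_code, len(ids)))
        for code, ids in sorted(by_code.items())
    }


grants = [(f"A{i}", "R03") for i in range(5)] + [("B1", "R21"), ("B2", "R21")]
sample = sample_per_activity_code(grants, per_code=3)
print(sample)
```

Sampling within each activity code, rather than from the pooled list, ensures that smaller codes are still represented in the manual review.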
The Strategic Priority I Team has given several presentations to NIH audiences about its progress. Many Institutes, Centers, and Offices have expressed interest in adopting or adapting the taxonomy protocol, PACT, the team-coding process, and/or the machine-learning process to assess and monitor their own portfolios.
Rigorous analysis of the NIH prevention research portfolio depends on accurate classification of NIH grants. Such analysis will enable the identification of funding patterns and trends, as well as of research areas that may benefit from targeted investments. Those investments may help us address important modifiable risk factors and thereby reduce the burden of disease.
David M. Murray, Ph.D.
Associate Director for Prevention
Director of the Office of Disease Prevention