New Papers from the ODP: Characterizing the NIH Prevention Research Portfolio Using a Novel Machine Learning Approach



Guest message from Sheri D. Schully, Ph.D. Dr. Schully leads the ODP team that analyzes and monitors NIH investments in prevention research, including developing new methods and tools for characterizing NIH research.


For the past several years, Dr. Schully and her team have been working to develop better approaches for identifying the characteristics of NIH-funded prevention research studies, and to summarize their findings in a meaningful way. In October 2018, the first two papers from the ODP’s analysis of the NIH prevention research portfolio were published in the American Journal of Preventive Medicine (AJPM): “NIH Primary and Secondary Prevention Research in Humans During 2012–2017” and “A Machine Learning Approach to Identify NIH-Funded Applied Prevention Research.”

Briefly, why is the project that led to these papers so important, and what did it entail?

Analyzing the NIH prevention research portfolio makes it possible to identify funding patterns and trends, as well as research areas that may benefit from targeted investments by the NIH. Those investments may help us address important modifiable risk factors and, therefore, reduce the burden of preventable disease.

To address this need, our team developed novel and comprehensive methods and tools to better characterize the NIH prevention research portfolio. These included a prevention research taxonomy, a team-coding approach, and an accompanying protocol to ensure consistent and standardized classification of NIH-funded prevention research projects.

Our team also collaborated with the NIH Office of Portfolio Analysis (OPA) to develop novel machine learning (ML) algorithms that identify prevention research projects. Using ML to characterize the NIH prevention research portfolio is an efficient way for the ODP to describe trends in NIH-funded prevention research, identify gaps in the NIH prevention research portfolio, and ultimately help inform the agency’s funding priorities.

Our AJPM paper on the ML approach describes how we, together with our OPA colleagues, developed and validated this new method to more accurately identify applied prevention research projects funded by the NIH.

Overall process of characterizing the NIH prevention research portfolio using a novel machine learning algorithm. #1: Database of funded NIH grants (FY2012-2017). #2: Feed into machine learning program. #3: Machine learning program identifies prevention and non-prevention projects; staff manually code 50% of prevention projects and 5% of non-prevention projects. #4: Quality control checks of project coding. #5: Extrapolation of data. #6: Database of well-annotated NIH-funded prevention research.
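The sampling and extrapolation steps in the process above can be illustrated with a minimal sketch. It assumes that extrapolation works by weighting each manually coded project by the inverse of its sampling rate, which is one common approach and not necessarily the one the ODP used. The field name `ml_label`, the function names, and the toy project counts are all hypothetical.

```python
import random


def stratified_sample(projects, rates, seed=0):
    """Sample each ML-predicted stratum at its own rate.

    Each sampled project carries a weight equal to the number of
    projects it stands in for (stratum size / sample size).
    """
    rng = random.Random(seed)
    sample = []
    for label, rate in rates.items():
        stratum = [p for p in projects if p["ml_label"] == label]
        k = round(len(stratum) * rate)
        if k == 0:
            continue  # nothing sampled from this stratum
        weight = len(stratum) / k
        for project in rng.sample(stratum, k):
            sample.append({**project, "weight": weight})
    return sample


def weighted_share(coded_sample, has_feature):
    """Estimate the portfolio-wide share of projects with a feature."""
    total = sum(p["weight"] for p in coded_sample)
    matching = sum(p["weight"] for p in coded_sample if has_feature(p))
    return matching / total


# Toy portfolio (hypothetical sizes): 1,000 projects the ML program labels
# as prevention and 4,000 it labels as non-prevention.
projects = [
    {"id": i, "ml_label": "prevention" if i < 1000 else "non_prevention"}
    for i in range(5000)
]

# Sampling rates taken from the process description: 50% and 5%.
sample = stratified_sample(projects, {"prevention": 0.50, "non_prevention": 0.05})

# Unweighted, 500 of the 700 sampled projects are ML-labeled prevention;
# the weights restore the portfolio-level share of 1,000 out of 5,000.
share = weighted_share(sample, lambda p: p["ml_label"] == "prevention")
print(f"sampled: {len(sample)}, weighted prevention share: {share:.3f}")
```

In practice the `has_feature` check would look at a manually assigned code (for example a study design) and not at the ML label; the ML label is used here only because it makes the effect of the weights easy to verify.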

What were some of the most significant findings?

Our team coded more than 11,000 research projects across 12 activity codes awarded in FY 2012–2017, leading to the first-ever detailed analysis of the NIH prevention research portfolio. For these activity codes, primary and secondary prevention research represented 16.7% of research projects and 22.6% of research funding.

Study designs of NIH-funded prevention research, FY2012-2017. Observational study = 63.3% (confidence interval = 61.1-65.5%). Analysis of existing data = 43.4% (confidence interval = 41.3-45.6%). Methods research = 23.9% (confidence interval = 21.9-26.1%). Randomized intervention = 18.2% (confidence interval = 16.7-19.7%). Pilot/feasibility study = 11.3% (confidence interval = 9.9-12.9%). Nonrandomized intervention = 6.2% (confidence interval = 5.4-7.3%). Unclear = 3.5% (confidence interval = 2.7-4.4%).
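For readers unfamiliar with how a confidence interval around a percentage like those above can be obtained, here is a minimal sketch using the Wilson score interval for a simple random sample. This is an assumption for illustration only: the article does not state which interval method was used, and the published intervals presumably account for the weighted sampling design. The counts in the example are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 633 of 1,000 coded projects include an observational study.
low, high = wilson_interval(633, 1000)
print(f"{633 / 1000:.1%} (confidence interval = {low:.1%}-{high:.1%})")
```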

The large proportion of prevention research projects that included observational studies (63.3%), analysis of existing data (43.4%), or methods research (23.9%) was surprising when compared with the significantly lower proportion of projects that included a randomized intervention (18.2%).

Primary prevention was by far the most common type of prevention research in the portfolio, at 62.3%. Methods research and secondary prevention represented smaller but still substantial fractions, at 23.3% and 19.2%, respectively. Studies focused on screening were much less common, particularly screening for risk factors. This finding is consistent with U.S. Preventive Services Task Force reports, which regularly call for more evidence on screening for preventable conditions.

A question for the NIH and the research community is whether these proportions are appropriate. There is no agreed-upon answer to this question, and the need for prevention research in humans varies depending on how much and what kind of research has already been conducted in a given area. The NIH must also balance the needs of many areas of science as it fulfills its mission to uncover new knowledge that can lead to better health for everyone.

Our AJPM paper about NIH-funded primary and secondary prevention research between 2012 and 2017 describes and discusses our findings in full, including the study rationales and populations covered in the NIH prevention research portfolio. The paper provides the most detailed and carefully validated analysis ever conducted of the NIH’s prevention research portfolio, and the data helps inform the conversation about the appropriate level of NIH support for prevention research.

What’s next?

From the outset, our team recognized that manually coding the entire NIH portfolio for all the categories in our taxonomy would be impractical. For that reason, we’re continuing our collaboration with the OPA to refine and use our ML approach to evaluate the sensitivity and specificity for each topic in the taxonomy. The ODP will continue to use these new methods to identify and analyze new prevention research grants and to monitor the NIH prevention research portfolio as a whole. We will also explore the value of expanding our efforts to examine submitted applications, not just funded projects.
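As a rough illustration of what evaluating sensitivity and specificity for a topic involves, the sketch below compares ML predictions against manual codes, treating the manual codes as the reference standard. The function name and the example labels are hypothetical and do not reflect the ODP's actual tooling or data.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for one topic.

    reference: manual codes (True if the project has the topic)
    predicted: ML predictions for the same projects, in the same order
    A value is None when it is undefined (no positives or no negatives).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    fn = sum(1 for r, p in pairs if r and not p)      # false negatives
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical manual codes and ML predictions for a single taxonomy topic.
manual = [True, True, True, False, False]
ml = [True, True, False, False, True]
sens, spec = sensitivity_specificity(manual, ml)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Running the same calculation once per topic in the taxonomy would show where the ML approach can be trusted on its own and where manual coding is still needed.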

