Forecasting social welfare improvements from today’s scientific advances is a difficult task, but the first step towards that evaluation is understanding and quantifying the measurable outputs that NIH funding produces. If the last dollar spent on one topic is less likely to generate valuable scientific advances than a dollar spent on another topic, reallocation of NIH’s limited resources may be socially desirable.
Our analysis begins with data extracted from NIH RePORTER for the universe of research funding awarded by NCI from FY1992 through FY2012, including the project title and abstract, and related metadata. Our Labeled Latent Dirichlet Allocation (LLDA) topic modeling algorithms then combine metadata labels, such as the study section that reviewed the proposal and, for more recent years, NIH’s own Research Categorization and Disease Classification (RCDC) labels, with investigator-generated text to generate a parsimonious set of topics representing each NCI-funded research project. We use the probabilities generated by the LLDA procedure to partition each grant’s total funding, per year, across topics. Unique PubMed ID numbers and grant serial numbers allow us to link NCI funding to its resulting research publications. Next, we identify the 1000 most common journals in which NCI-funded research was published over this period, and extract our second major data corpus: article abstracts and metadata for all research articles published in those journals over the same 20-year period. Using the topic set derived from NCI awards, we infer topic labels for the PubMed articles, including NCI-funded, other NIH-funded, and non-NIH-funded publications. Then, using article-level citation data from Thomson Reuters’ Web of Science, we generate topic-normalized citation weights, allowing us to evaluate the relative quality and usefulness of NCI-funded publications compared to non-federally-funded research publications covering the same topics.
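To make two of these steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each grant-year carries a dictionary of LLDA topic probabilities, that funding is split in proportion to those probabilities, and that a "topic-normalized citation weight" is an article's citation count divided by the mean citation count of articles in the same topic and publication year. The abstract does not specify the normalization, so that definition is an assumption, as are the function names, the field names, and the simplification that each article has a single topic.

```python
from collections import defaultdict


def partition_funding(total_funding, topic_probs):
    """Split one grant-year's funding across topics in proportion to
    its topic probabilities (hypothetical helper, assumed scheme)."""
    total_p = sum(topic_probs.values())
    if total_p <= 0:
        raise ValueError("topic probabilities must sum to a positive value")
    # Renormalize so the topic shares always add up to the full award.
    return {t: total_funding * p / total_p for t, p in topic_probs.items()}


def topic_normalized_citations(articles):
    """Return {pmid: citations / mean citations of the same topic-year}.

    `articles` is a list of dicts with keys 'pmid', 'topic', 'year',
    and 'citations'. Dividing by the topic-year mean is an assumed
    normalization, chosen only for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a in articles:
        key = (a["topic"], a["year"])
        sums[key] += a["citations"]
        counts[key] += 1

    weights = {}
    for a in articles:
        key = (a["topic"], a["year"])
        mean = sums[key] / counts[key]
        # A topic-year with no citations at all gets weight 0.0.
        weights[a["pmid"]] = a["citations"] / mean if mean > 0 else 0.0
    return weights
```

Under these assumptions, a weight above 1 marks an article cited more often than the average article on the same topic in the same year, which permits comparison across topics with different citation norms.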
Combining and organizing these data at the level of the topic-year, we investigate how the proportion of articles citing NIH funding for any given topic changes over time, and whether a temporary or permanent increase in NIH funding for a topic results in: (a) a higher proportion of papers on that topic citing NIH funding; (b) an increase in total publication output on that topic; and (c) long-run lagged effects on the direction of published research, even if NIH’s funding for a specific topic declines. Then, using relative distribution methods, we evaluate differences across topics in the distribution of projects’ output quantity and quality. These additional investigations will allow us to assess potential crowd-out by public funds, and also whether the NIH tends to invest more funds in more (or less) “risky” projects and topics (i.e., those with higher variance in project outcomes). Finally, we explore possible heterogeneous effects across grantee organization types and funding mechanisms.
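The topic-year measures described above could be computed along the following lines. This is a hedged sketch under stated assumptions: the input layouts and function names are hypothetical, and the per-topic variance of project outcomes is used only as a simple proxy for "risk". It is not the relative distribution method named in the abstract, which compares whole distributions rather than a single moment.

```python
from collections import defaultdict
from statistics import pvariance


def nih_share_by_topic_year(articles):
    """Return {(topic, year): share of articles citing NIH funding}.

    `articles` is an iterable of (topic, year, cites_nih) tuples,
    where cites_nih is a bool (assumed input layout).
    """
    totals = defaultdict(int)
    nih = defaultdict(int)
    for topic, year, cites_nih in articles:
        totals[(topic, year)] += 1
        if cites_nih:
            nih[(topic, year)] += 1
    return {key: nih[key] / totals[key] for key in totals}


def outcome_variance_by_topic(projects):
    """Return {topic: population variance of project outcomes}.

    `projects` is an iterable of (topic, outcome) pairs, where outcome
    is any numeric output measure (for example, citation-weighted
    publication counts). Variance is a crude stand-in for riskiness.
    """
    by_topic = defaultdict(list)
    for topic, outcome in projects:
        by_topic[topic].append(outcome)
    return {topic: pvariance(values) for topic, values in by_topic.items()}
```

Tracking the first measure over time for a topic, alongside that topic's funding, corresponds to question (a); the second gives a first-pass ranking of topics by outcome dispersion.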