This paper describes a new database available to the research community. Using a Bayesian supervised learning approach, we disambiguated all inventor names from the U.S. utility patent database, from 1975 to the end of 2010. We provide an overview of the disambiguation methods, assess their accuracy, characterize the resulting dataset, calculate network measures based on co-authorship, and provide illustrative examples. The dataset is available at the Patent Network Dataverse (http://dvn.iq.harvard.edu/dvn/dv/patent).
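As a rough illustration of the kind of network measure that can be derived once inventor names are disambiguated, the sketch below computes each inventor's co-inventor degree (the number of distinct collaborators) from patent-inventor pairs. The record layout, the identifiers, and the function name `coinventor_degree` are assumptions made for this example; they do not reflect the actual schema of the dataset or the measures reported in the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (patent_id, disambiguated_inventor_id).
# The real dataset's field names and identifiers may differ.
records = [
    ("P1", "inv_a"), ("P1", "inv_b"),
    ("P2", "inv_b"), ("P2", "inv_c"), ("P2", "inv_d"),
    ("P3", "inv_a"),
]

def coinventor_degree(records):
    """Return the number of distinct co-inventors for each inventor."""
    # Group inventors by the patent they appear on.
    by_patent = defaultdict(set)
    for patent_id, inventor_id in records:
        by_patent[patent_id].add(inventor_id)

    # Link every pair of inventors who share a patent.
    neighbors = defaultdict(set)
    for inventors in by_patent.values():
        for inventor in inventors:
            neighbors[inventor]  # make sure solo inventors are included
        for a, b in combinations(sorted(inventors), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    return {inventor: len(others) for inventor, others in neighbors.items()}

degree = coinventor_degree(records)
print(degree)
```

The degree counts are only meaningful after disambiguation: if one person appeared under two spellings, their collaborators would be split across two nodes and both degrees would be understated.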
The paper also describes a data platform that aims to improve access to federal agency grant data and their associated outcomes, which include patents. The initial approach has been applied successfully to the National Science Foundation and is supplemented with programmatic access through a public-facing Application Programming Interface (API). We also experimented with algorithmic approaches such as topic modeling to create further internal data linkages and expose patterns in the data. The platform could substantially lower the barriers to entry for a larger community of researchers and enable new data tools built on top of automatically updated data sources.