Panel Paper: Understanding Growth Change In the US Scientific Workforce

Friday, November 9, 2012 : 10:05 AM
Schaefer (Sheraton Baltimore City Center Hotel)


Antonio Sanfilippo, Pacific Northwest National Laboratory

Economic studies have long shown that as much as 85% of measured growth in US per capita income can be attributed to technological change. A talented, diverse, innovative, stable, and strong scientific workforce is therefore a prerequisite for social and economic prosperity. Building such a workforce requires a clear understanding of the factors that affect its growth. For example, the participation of underrepresented minority groups (URM) in the US scientific workforce is a growing concern. In 2006, URM made up 28.5% of the national population but only 9.1% of the scientific workforce, and their share of the population is expected to grow to about 45% by 2050. Because the URM share of the population is roughly three times their share of the scientific workforce, a threefold increase in URM involvement in STEM education is needed to sustain a strong US scientific workforce in the face of ongoing demographic change. The first step toward designing policies that increase URM participation in the scientific workforce is to understand which factors currently hinder higher educational attainment in STEM among URM. The goal of this paper is to show how such an understanding can be achieved by investigating demographic changes in the population of science and engineering graduates, who form the basis of the US scientific workforce.
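The "threefold" figure can be checked directly from the two 2006 shares quoted above. The sketch below assumes that the target is parity, i.e. a workforce share equal to the population share; the abstract does not state this basis explicitly, and the function name `representation_ratio` is hypothetical.

```python
def representation_ratio(population_share, workforce_share):
    """How many times larger a group's population share is than its workforce share.

    Assumes parity (workforce share == population share) is the target.
    """
    return population_share / workforce_share


# URM shares for 2006 as quoted in the abstract, in percent.
ratio = representation_ratio(28.5, 9.1)
print(f"{ratio:.2f}")  # 3.13
```

A ratio of about 3.1 means URM participation would have to roughly triple to match their 2006 population share.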

Our approach consists of (1) collecting evidence relevant to the demographic dynamics of the scientific workforce, and (2) using that evidence to develop, calibrate, and evaluate computational models of population change in the scientific workforce. We describe an implementation of this approach in which we use the Survey of Doctorate Recipients (SDR), compiled by the National Science Foundation, to develop classification models that identify the survey factors correlated with changes in three race cohorts of doctoral recipients: Underrepresented Minorities (URM), non-Hispanic Whites (Whites), and non-Hispanic Asians (Asians).

First, the survey data are transformed into vector structures suitable for machine learning. Each vector represents one PhD recipient who participated in the survey and gives that respondent's values for the survey parameters on employment, education, and demographics. In one classification task, the resulting vectors (about 30,000 per survey year) are partitioned into three classes using the race parameter (URM, Whites, Asians). We use the Weka data mining platform to learn decision-tree classification models that identify PhD recipient cohorts by race in terms of the survey parameters. The resulting model indicates which employment, education, and demographic factors are correlated with each race cohort, and in what way. In another classification task, we build decision-tree models that characterize changes in each cohort across survey years (e.g., 2001, 2003, and 2006) in terms of survey factors. The correlations of survey factors with race cohorts, and the interdependencies between survey factors in the models, offer useful insights for the design of policies that target growth changes in the US scientific workforce.
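The pipeline described above (encode each respondent as a fixed-order vector, use the cohort as the class label, and learn a decision tree over the survey parameters) can be illustrated with a small, dependency-free sketch. This is not Weka: Weka's best-known tree learner, J48, implements C4.5 in Java. The code below instead swaps in a minimal ID3-style tree that splits on information gain. The feature names (`sector`, `field`, `degree_decade`), the abstract cohort labels `"A"`, `"B"`, `"C"`, and the toy records are all hypothetical stand-ins, not SDR variables or data.

```python
from collections import Counter
from math import log2

# Hypothetical survey parameters: illustrative stand-ins, not actual SDR variable names.
FEATURES = ["sector", "field", "degree_decade"]


def to_vector(response):
    """Encode one respondent's answers (a dict) as a fixed-order feature tuple."""
    return tuple(response[name] for name in FEATURES)


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, i):
    """Entropy reduction obtained by splitting the rows on feature index i."""
    remainder = 0.0
    for value in set(row[i] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[i] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, candidates):
    """ID3-style tree: leaves are labels; inner nodes are (index, branches, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not candidates:
        return majority
    best = max(candidates, key=lambda i: information_gain(rows, labels, i))
    rest = [i for i in candidates if i != best]
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [k for k, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[k] for k in idx], [labels[k] for k in idx], rest)
    return (best, branches, majority)


def classify(tree, vector):
    """Follow the branches for `vector`; fall back to the node majority for unseen values."""
    while isinstance(tree, tuple):
        index, branches, majority = tree
        if vector[index] not in branches:
            return majority
        tree = branches[vector[index]]
    return tree


# Invented toy records with abstract cohort labels (NOT SDR data, no real findings).
TOY = [
    ({"sector": "academia", "field": "life_sci", "degree_decade": "1990s"}, "A"),
    ({"sector": "academia", "field": "engineering", "degree_decade": "2000s"}, "B"),
    ({"sector": "industry", "field": "engineering", "degree_decade": "2000s"}, "C"),
    ({"sector": "industry", "field": "life_sci", "degree_decade": "1990s"}, "C"),
    ({"sector": "government", "field": "life_sci", "degree_decade": "2000s"}, "A"),
    ({"sector": "academia", "field": "life_sci", "degree_decade": "2000s"}, "A"),
]

rows = [to_vector(resp) for resp, _ in TOY]
labels = [lab for _, lab in TOY]
tree = build_tree(rows, labels, list(range(len(FEATURES))))
print("root split:", FEATURES[tree[0]])  # root split: sector
print(classify(tree, ("industry", "life_sci", "2000s")))  # C
```

The feature chosen at the root, and the path to each leaf, play the role of the "correlated factors" the abstract reads off its models. For the second task, the same code applies with the survey year (e.g., 2001, 2003, 2006) as the label within a single cohort. Unlike C4.5, this sketch handles only categorical values, uses plain information gain rather than gain ratio, and does no pruning, so it will overfit real survey data.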