Our approach consists of (1) collecting evidence relevant to the demographic dynamics of the scientific workforce, and (2) using that evidence to develop, calibrate, and evaluate computational models of population change in the scientific workforce. We describe an implementation of this approach in which we use the Survey of Doctorate Recipients (SDR), compiled by the National Science Foundation, to develop classification models that identify survey factors correlated with changes in three race cohorts of doctoral recipients: Underrepresented Minorities (URM), non-Hispanic Whites (Whites), and non-Hispanic Asians (Asians).
First, the survey data are transformed into vector structures suitable for machine learning. Each vector holds the values of the survey parameters on employment, education, and demographics for one PhD recipient who participated in the survey. In one classification task, the resulting vectors (about 30,000 per survey year) are partitioned into three classes according to the race parameter (URM, Whites, Asians). We use the Weka data mining platform (http://www.cs.waikato.ac.nz/ml/weka/) to learn decision-tree classification models that characterize each race cohort of PhD recipients in terms of the survey parameters. The resulting model indicates the extent to which, and how, different employment, education, and demographic factors are correlated with each race cohort. In another classification task, we build decision-tree models that characterize changes in each cohort across survey years (e.g., 2001, 2003, and 2006) in terms of survey factors. The correlations between survey factors and race cohorts, together with the interdependencies among survey factors in the models, offer useful insights for the design of policies that target changes in the growth of the US scientific workforce.
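The vectorization and factor-ranking steps described above can be illustrated with a minimal sketch. It is not the Weka workflow itself: it uses only the Python standard library, the records and field names (`sector`, `field`, `race`) are hypothetical toy values rather than actual SDR variables or data, and information gain is assumed here as the ranking criterion because it is commonly used to choose splits in decision-tree learners; the abstract does not state which criterion or tree learner was used.

```python
from collections import Counter
from math import log2

# Hypothetical toy records; field names and values are illustrative only,
# not actual SDR variables or data.
RECORDS = [
    {"sector": "academia", "field": "biology", "race": "URM"},
    {"sector": "academia", "field": "physics", "race": "Whites"},
    {"sector": "industry", "field": "physics", "race": "Asians"},
    {"sector": "industry", "field": "biology", "race": "Asians"},
    {"sector": "academia", "field": "biology", "race": "Whites"},
    {"sector": "industry", "field": "physics", "race": "Asians"},
]
FACTORS = ["sector", "field"]  # assumed survey factors
LABEL = "race"                 # class attribute: URM, Whites, Asians


def to_vectors(records, factors, label):
    """Turn each survey record into a (feature tuple, class label) pair."""
    return [(tuple(r[f] for f in factors), r[label]) for r in records]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(vectors, index):
    """Reduction in class entropy obtained by splitting on one factor."""
    labels = [y for _, y in vectors]
    groups = {}
    for x, y in vectors:
        groups.setdefault(x[index], []).append(y)
    remainder = sum(len(g) / len(vectors) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_factors(records, factors, label):
    """Rank factors by information gain with respect to the class label."""
    vectors = to_vectors(records, factors, label)
    gains = {f: information_gain(vectors, i) for i, f in enumerate(factors)}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for factor, gain in rank_factors(RECORDS, FACTORS, LABEL):
        print(f"{factor}: {gain:.3f} bits")
```

A decision-tree learner would place the highest-ranked factor at the root and repeat the ranking within each branch; the sketch stops at the first ranking step to show how a factor's association with the race cohorts can be quantified.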