Poster Paper: Machine Learning Algorithms in Educational Interventions: Application of Longitudinal Clustering

Friday, April 6, 2018
Mary Graydon Center - Room 2-5 (American University)


Alberto Guzman-Alvarez and Lindsay C. Page, University of Pittsburgh


Machine learning (ML) in the age of big data in education is still in its infancy but has already borne fruit, especially through unsupervised ML algorithms such as cluster analysis. Most early research has focused on clustering high-dimensional education data, such as administrative records. Less attention has been paid to clustering techniques for longitudinal data sets. In longitudinal research, an important question concerns the estimation of homogeneous trajectories (Genolini, Alacoque, Sentenac & Arnaud 2015). A standard way to analyze such trajectories is to cluster individuals into distinct groups with homogeneous characteristics. One advantage of this data reduction technique is that it reduces several continuous, correlated variables to a single categorical variable. This study focused on applying ML clustering algorithms to a behavioral nudge intervention, with the aim of using the resulting clusters to estimate heterogeneous treatment effects.
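To make the data reduction concrete: if each student's engagement is recorded at T time points, the trajectory is a vector of T correlated values, and a clustering algorithm replaces that vector with a single group label. The sketch below is plain k-means (Lloyd's algorithm) on those vectors, written with the Python standard library. It is similar in spirit to the k-means approach to longitudinal data described by Genolini et al. (2015), but the abstract does not say which algorithm or settings the authors used, and the function name `kmeans_trajectories` is hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_trajectories(trajectories, k, n_iter=100, seed=0):
    """Cluster equal-length trajectories with Lloyd's k-means (illustrative sketch).

    Each trajectory is a list with one value per time point, so a student's
    whole path is treated as one point in T-dimensional space.
    Returns (labels, centers): one integer label per trajectory and the
    mean trajectory of each cluster.
    """
    rng = random.Random(seed)
    # initialize centers with k distinct trajectories chosen at random
    centers = [list(t) for t in rng.sample(trajectories, k)]
    labels = [None] * len(trajectories)
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance;
        # ties go to the lower-numbered center
        new_labels = [
            min(range(k), key=lambda c: squared_distance(t, centers[c]))
            for t in trajectories
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # update step: each center becomes the mean trajectory of its members;
        # an empty cluster keeps its previous center
        for c in range(k):
            members = [t for t, lab in zip(trajectories, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For example, `kmeans_trajectories([[0, 0, 1, 0], [0, 1, 0, 0], [3, 4, 5, 4], [4, 4, 4, 5]], k=2)` places the two low-engagement students in one cluster and the two high-engagement students in the other. Distance over raw time points weights every time point equally; in practice one would likely also compare several values of k and consider rescaling the inputs.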

Data for this analysis came from a randomized controlled trial (RCT) that used an automated, personalized text message intervention to remind college-going students of required college enrollment tasks and to connect them with counselor-based support via text message. Low-cost behavioral nudges of this kind are increasingly popular in policy research and are therefore worthy candidates for ML algorithms, particularly the data sets that capture students' interactions with an intervention across several time points. The study included 20 time points and more than 20,000 students. Treatment uptake varied: we observed variation in the intensity of engagement, both in the number of messages students sent to counselors at each time point and in message character count.
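The two engagement measures named above can be turned into trajectories with a simple aggregation. The sketch assumes a hypothetical message log of `(student_id, time_point, text)` tuples, a layout the abstract does not describe, and produces for each student a messages-per-time-point trajectory and a characters-per-time-point trajectory. The name `build_trajectories` is likewise illustrative.

```python
def build_trajectories(messages, student_ids, n_time_points):
    """Aggregate a message log into per-student engagement trajectories.

    messages: iterable of (student_id, time_point, text) tuples, with
        time_point in 0..n_time_points-1 (hypothetical log layout).
    student_ids: full roster, so students who never text are included.
    Returns (counts, chars): dicts mapping each student to a list of length
    n_time_points with the number of messages and the total characters
    sent at each time point.
    """
    counts = {s: [0] * n_time_points for s in student_ids}
    chars = {s: [0] * n_time_points for s in student_ids}
    for student, t, text in messages:
        counts[student][t] += 1        # one more message at this time point
        chars[student][t] += len(text)  # character count as an intensity measure
    return counts, chars
```

Students are initialized from an explicit roster rather than from the log, so a student who never replies still gets an all-zero trajectory; building the roster from the log alone would silently drop the least-engaged students.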

We apply longitudinal clustering algorithms, using several constructed student engagement trajectories as model inputs. These algorithms classify students into engagement groups that reveal common patterns of interaction with the intervention. This work will be extended to estimate treatment effects on various outcomes using the constructed trajectory clusters. These clustering methods could offer insight into different patterns of treatment uptake among students, which could inform targeted interventions for students engaging at different levels.
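Once each student has a cluster label, the engagement groups can be described by their size and mean trajectory and ordered from least to most engaged, which is one simple way to read off common patterns of interaction. The helper `summarize_clusters` below is a hypothetical illustration, not the authors' procedure, and it works with labels produced by any clustering method.

```python
from statistics import mean


def summarize_clusters(trajectories, labels):
    """Describe each cluster by its size and mean trajectory.

    Returns a list of (label, size, mean_trajectory) tuples sorted from the
    least to the most engaged cluster, where engagement is measured as the
    sum of the mean trajectory over all time points.
    """
    groups = {}
    for traj, lab in zip(trajectories, labels):
        groups.setdefault(lab, []).append(traj)
    rows = []
    for lab, members in groups.items():
        # average each time point over the cluster's members
        mean_traj = [mean(col) for col in zip(*members)]
        rows.append((lab, len(members), mean_traj))
    rows.sort(key=lambda row: sum(row[2]))
    return rows
```

Ordering by total mean engagement gives the groups a natural reading such as "low" to "high" uptake, though clusters with similar totals but different shapes (for example, early versus late engagement) would need to be distinguished by inspecting their mean trajectories.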