A Classy Affair: Modeling Course Enrollment Prediction
Citation: Lee, Dianne. 2020. A Classy Affair: Modeling Course Enrollment Prediction. Bachelor's thesis, Harvard College.
Abstract: Course enrollment prediction has direct implications for university policy. In particular, logistical concerns around course planning lead many universities, Harvard among them, to consider moving away from allowing students a "shopping period" before they finalize their courses. This thesis addresses and evaluates these concerns by developing several predictive models to forecast course enrollment in the context of section allocation.
Much of the prior research in enrollment prediction does not evaluate models richly enough for the problem of section allocation. Even earlier work on predicting Harvard section allocation does not model the problem comprehensively: it does not account for significant features of the system, such as variance in departmental section sizes. Moreover, the existing literature compares only one type of machine learning model and baseline on specific test sets. Previous research does not address the divide between new courses, which inherently have a smaller feature set, and continued courses, other than to note differences in accuracy. Furthermore, existing models in the literature use only quantitative features to model course attributes and do not consider the qualitative aspects that students and human baseline predictors take into account.
This thesis addresses these gaps in the literature and performs experiments on updated data to develop and evaluate a comprehensive approach to predicting enrollment in the context of Harvard's section allocation problem. Four evaluation metrics are developed based on previous work and on qualitative interviews. Multiple machine learning approaches, automatic baselines, and human baselines are implemented and compared. New and continued courses are differentiated within the modeling process in order to analyze and retain their unique attributes. The qualitative feature of course topic relevance is approximated through natural language processing.
We found that for existing courses, both ML models and one automatic baseline outperformed the human baseline. The Random Forest model displayed the best performance across nearly all evaluation metrics, with past enrollment being the most significant feature. For new courses, no model showed significant improvement over the human baselines across a sufficient number of metrics. However, in predicting raw enrollment, all models outperformed the human and automatic baselines despite the limited feature set available for new courses.
The findings from this thesis support the inclusion of predictive learning models in Harvard's course enrollment prediction and section allocation process. Although the results do not indicate that a machine learning model should replace human predictions entirely, particularly in the case of newly offered courses, predictive models offer insights and advantages over human baselines. Future work should consider further optimization of these models and the incorporation of a more complete feature set.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364768
- FAS Theses and Dissertations