Comparative Analysis of Multiple Linear Regression and Random Forest Regression in Predicting Academic Performance of Students in Higher Education

Rachelle P. Tapio *

La Salle University, Ozamiz City, Philippines.

*Author to whom correspondence should be addressed.


Abstract

Aims: This study aimed to compare the predictive accuracy of Multiple Linear Regression (MLR) and Random Forest Regression (RFR) models in forecasting academic performance among Social Work students. Specifically, it sought to identify which of the considered variables (study habits, learning styles, stress, anxiety, coping mechanisms, study motivation, exam preparation, age, and parental income) most strongly influenced students’ academic outcomes.

Study Design: Quantitative predictive-correlational research methodology.

Place and Duration of Study: The research was conducted with 45 Social Work students enrolled in a Statistics course at a higher education institution in the Philippines during the second semester of the 2024–2025 academic year.

Methodology: Validated questionnaires were distributed via Google Forms to assess students’ behaviors and emotional states. Instrument reliability was confirmed using JAMOVI (version 2.6.23). Descriptive statistics were used to summarize mean scores, and Spearman’s rho was used for correlation analysis because the data violated the normality assumption. Predictive modeling was conducted in Python (Jupyter Notebook), using both MLR and RFR to evaluate predictive power and identify significant factors. Model performance was assessed using the Mean Absolute Percentage Error (MAPE) and the coefficient of determination (R²).
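The Methodology names Spearman’s rho as the correlation measure, chosen because normality was violated. As an illustration only, the standard-library Python sketch below computes Spearman’s rho as the Pearson correlation of average ranks, which also handles tied Likert-style scores. It is not the author’s code (the abstract does not state which routine produced the correlations), and the function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last sorted position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two variables' ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores for five students, NOT data from the study.
    anxiety = [2.0, 3.5, 3.0, 4.5, 1.5]
    performance = [88, 80, 84, 75, 90]
    print(spearman_rho(anxiety, performance))  # -1.0 (ranks are perfectly inverse)
```

Because rho depends only on ranks, it measures monotonic rather than strictly linear association and makes no normality assumption, which is why it suits the non-normal data described above.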

Results: Descriptive analysis showed moderate levels of stress, anxiety, and coping mechanisms, with exam preparation yielding the lowest mean. Anxiety level showed a moderate negative correlation with academic performance. Both the MLR and RFR models identified Learning Style, Exam Preparation, and General Stress as the strongest predictors. The RFR model outperformed MLR, achieving a lower MAPE (2.9481 vs. 3.3690) and a higher R² (0.865 vs. 0.216), indicating stronger predictive accuracy and better handling of nonlinear relationships.
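The Results compare the two models on MAPE and R². As a minimal sketch of how these metrics are commonly defined, the standard-library Python below computes both from actual and predicted scores. Expressing MAPE in percent is an assumption, since the abstract reports MAPE without units. The function names and sample values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent: (100 / n) * sum(|y - y_hat| / |y|)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares about the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical actual scores and predictions, NOT results from the study.
    actual = [80, 90, 100]
    predicted = [84, 90, 95]
    print(round(mape(actual, predicted), 4))       # 3.3333
    print(round(r_squared(actual, predicted), 4))  # 0.795
```

A lower MAPE and a higher R² both indicate a closer fit, which is the direction of the comparison reported above. R² equals 1 for perfect predictions and can fall below 0 when predictions are worse than simply predicting the mean.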

Conclusion: The Random Forest model outperformed the Multiple Linear Regression model in predicting academic performance, demonstrating higher accuracy and a stronger capacity to model nonlinear relationships. Learning Style, Exam Preparation, and General Stress emerged as the most influential predictors. These findings support the integration of machine learning approaches in educational research and suggest that tailored learning strategies, structured exam preparation, and stress management interventions may contribute to improved academic outcomes.

Keywords: Academic performance, higher education, multiple regression, random forest regression, machine learning, study habits, academic stress, resilience, teaching style, mean absolute percentage error


How to Cite

Tapio, Rachelle P. 2025. “Comparative Analysis of Multiple Linear Regression and Random Forest Regression in Predicting Academic Performance of Students in Higher Education”. Asian Research Journal of Mathematics 21 (4):170-81. https://doi.org/10.9734/arjom/2025/v21i4919.
