JMIR Mhealth Uhealth. 2025 Nov 6;13:e70987. doi: 10.2196/70987.
ABSTRACT
BACKGROUND: The expansion of mobile health apps has created a growing need for structured, predictive tools to evaluate app quality before deployment. The Mobile App Rating Scale (MARS) offers a standardized, expert-driven assessment across 4 key dimensions (engagement, functionality, aesthetics, and information), but its use in forecasting user satisfaction through predictive modeling remains limited.
OBJECTIVE: This study aimed to investigate how k-means clustering, combined with machine learning models, can predict user ratings for physical activity apps based on MARS dimensions, with the goal of forecasting ratings before production and uncovering insights into user satisfaction drivers.
METHODS: We analyzed a dataset of 155 MARS-rated physical activity apps with user ratings. The dataset was split into training (n=111) and testing (n=44) subsets. K-means clustering was applied to the training data, identifying 2 clusters. Exploratory data analysis included box plots, summary statistics, and component+residual plots to visualize linearity and distribution patterns across MARS dimensions. Correlation analysis was performed to quantify relationships between each MARS dimension and user ratings. In total, 5 machine learning models (generalized additive models, k-nearest neighbors, random forest, extreme gradient boosting, and support vector regression) were trained with and without clustering. In the clustered setting, models were hyperparameter-tuned and trained separately on each cluster, and the best-performing model for each cluster was selected. The predictions of the selected models were combined to compute final performance metrics for the test set. Performance was evaluated using correct prediction percentage (0.5 range), mean absolute error, and R². Validation was performed on 2 additional datasets: apps for mindfulness (n=85) and apps for older adults (n=55).
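The cluster-then-predict procedure described in the methods can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the toy MARS scores and ratings are invented, the function names (`kmeans`, `fit_cluster_models`, `predict_rating`) are hypothetical, and a k-nearest neighbors regressor is used in both clusters for brevity, whereas the study selected the best of 5 tuned models per cluster.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k=2, iters=50):
    """Plain Lloyd's algorithm. Centroids start at evenly spaced rows,
    a simplification chosen here to keep the sketch deterministic."""
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the column-wise mean of its group.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def fit_cluster_models(train_X, train_y, k_clusters=2):
    """Cluster the training apps and keep a separate training set per cluster."""
    centroids = kmeans(train_X, k_clusters)
    buckets = {c: ([], []) for c in range(k_clusters)}
    for x, y in zip(train_X, train_y):
        c = nearest(x, centroids)
        buckets[c][0].append(x)
        buckets[c][1].append(y)
    return centroids, buckets

def predict_rating(x, centroids, buckets, k=3):
    """Route `x` to its nearest cluster, then average the ratings of the
    k nearest training apps inside that cluster (kNN regression)."""
    X_c, y_c = buckets[nearest(x, centroids)]
    order = sorted(range(len(X_c)), key=lambda i: math.dist(x, X_c[i]))[:k]
    return sum(y_c[i] for i in order) / len(order)

# Invented toy data: columns are engagement, functionality, aesthetics,
# information (MARS dimension scores); targets are user ratings.
train_X = [
    [4.2, 4.5, 4.3, 4.0], [4.0, 4.4, 4.1, 3.9], [4.4, 4.6, 4.5, 4.2],
    [2.5, 3.8, 2.9, 2.4], [2.3, 3.6, 2.7, 2.2], [2.7, 3.9, 3.0, 2.6],
]
train_y = [4.6, 4.4, 4.7, 3.9, 3.7, 4.0]

centroids, buckets = fit_cluster_models(train_X, train_y)
high_pred = predict_rating([4.1, 4.5, 4.2, 4.0], centroids, buckets)
low_pred = predict_rating([2.4, 3.7, 2.8, 2.3], centroids, buckets)
print(round(high_pred, 2), round(low_pred, 2))
```

Routing each test app to its nearest training centroid before prediction is one plausible way to combine per-cluster models into a single set of test predictions; the abstract does not state how cluster membership was assigned for the test set.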
RESULTS: Exploratory data analysis revealed that apps in cluster 1 were feature-rich and scored higher across all MARS dimensions, reflecting comprehensive and engagement-oriented designs. In contrast, cluster 2 comprised simpler, utilitarian apps focused on basic functionality. Component+residual plots showed nonlinear relationships, which became more interpretable within clusters. Correlation analysis indicated stronger associations of user ratings with engagement and functionality, but weaker or negative correlations with aesthetics and information, particularly in cluster 2. In the unclustered dataset, k-nearest neighbors achieved 79.55% accuracy, mean absolute error=0.26, and R²=0.06. The combined support vector regression (cluster 1)+k-nearest neighbors (cluster 2) model achieved the highest accuracy: 88.64%, with mean absolute error=0.27 and R²=0.04. Clustering improved prediction accuracy and enhanced alignment between predicted and actual user ratings. Models also generalized well to the external datasets.
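The three reported metrics can be computed as in the sketch below. It assumes that "correct prediction percentage (0.5 range)" means the share of predictions whose absolute error is at most 0.5 rating points; the abstract does not spell out the definition, so the tolerance rule, the function names, and the toy ratings are all assumptions made for illustration.

```python
def within_tolerance_pct(actual, predicted, tol=0.5):
    """Percentage of predictions with absolute error <= tol (assumed definition)."""
    hits = sum(abs(a - p) <= tol for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Invented ratings: most apps sit in a narrow band, and the model
# predicts the same value for every app.
actual = [4.5, 4.2, 4.4, 4.3, 3.4]
predicted = [4.3, 4.3, 4.3, 4.3, 4.3]

pct = within_tolerance_pct(actual, predicted)
mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(pct, round(mae, 2), round(r2, 2))
```

In this toy case a constant prediction lands within 0.5 points for 4 of 5 apps yet explains none of the variance, which shows how a high within-tolerance percentage can coexist with an R² near zero when ratings are tightly clustered. If the percentages above are computed over the 44 test apps, 88.64% corresponds to 39 of 44 and 79.55% to 35 of 44.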
CONCLUSIONS: The combined clustering and modeling approach enhances prediction accuracy and reveals how user satisfaction drivers vary across app types. By transforming MARS from a descriptive tool into a predictive framework, this study offers a scalable, transparent method for forecasting user ratings during app development, particularly in early-stage or low-data settings.
PMID:41213075 | DOI:10.2196/70987