Transl Vis Sci Technol. 2025 May 1;14(5):2. doi: 10.1167/tvst.14.5.2.
ABSTRACT
PURPOSE: To investigate whether cataract surgical skill performance metrics automatically generated by artificial intelligence (AI) models can differentiate between trainee and faculty surgeons, and to assess the correlation between AI metrics and expert-rated skill.
METHODS: Routine cataract surgical videos from residents (N = 28) and attendings (N = 29) were collected. Three video-level metrics were generated by deep learning models: phacoemulsification probe decentration, eye decentration, and zoom level change. Three types of instrument- and landmark-specific metrics were generated for the limbus, pupil, and various surgical instruments: total path length, maximum velocity, and area. Expert human judges assessed the surgical videos using the Objective Structured Assessment of Cataract Surgical Skill (OSACSS). Differences in AI metrics and human-rated scores between attending surgeons and trainees were assessed using t-tests, and correlations between AI metrics and OSACSS scores were examined with Pearson correlation coefficients.
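The abstract does not specify how the trajectory metrics or statistics were implemented; the following is a minimal illustrative sketch, not the authors' code. It assumes per-frame (x, y) tip coordinates from an instrument tracker, a hypothetical frame rate, and synthetic data for the group comparison (t-test) and metric-vs-rating correlation (Pearson r) described above.

```python
import numpy as np
from scipy import stats

def path_metrics(xy, fps=30.0):
    """Total path length and maximum velocity from tracked tip coordinates.

    xy  : (N, 2) array of per-frame (x, y) positions (units hypothetical)
    fps : assumed video frame rate
    """
    steps = np.linalg.norm(np.diff(xy, axis=0), axis=1)  # per-frame displacement
    total_path_length = steps.sum()
    max_velocity = steps.max() * fps                      # distance per second
    return total_path_length, max_velocity

# Toy trajectories: a smooth (attending-like) vs. jittery (trainee-like) path.
t = np.linspace(0.0, 1.0, 300)
smooth = np.column_stack([t, np.sin(2 * np.pi * t)])
rng = np.random.default_rng(0)
jittery = smooth + rng.normal(scale=0.05, size=smooth.shape)

len_s, vel_s = path_metrics(smooth)
len_j, vel_j = path_metrics(jittery)

# Synthetic group comparison, mirroring the attending-vs-trainee t-test.
attending = rng.normal(10.0, 1.0, 20)   # hypothetical path lengths
trainee = rng.normal(14.0, 1.0, 20)
t_stat, p_val = stats.ttest_ind(attending, trainee)

# Synthetic negative correlation between an AI metric and an OSACSS-like score.
ratings = -trainee + rng.normal(0.0, 0.5, 20)
r, p_r = stats.pearsonr(trainee, ratings)
```

Under this toy setup, the jittery trajectory yields a longer path and higher peak velocity, and the metric correlates negatively with the rating, analogous to the directions of effect reported in the results.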
RESULTS: Attending videos showed significantly lower total path length, maximum velocity, and area metrics for the phacoemulsification probe. Attending surgeons also demonstrated better phacoemulsification centration and eye centration. Most AI metrics correlated negatively with OSACSS scores, including phacoemulsification decentration (r = -0.369) and eye decentration (r = -0.394). OSACSS subitems related to eye centration and different steps of surgery also exhibited significant negative correlations with corresponding AI metrics (r ranging from -0.77 to -0.49).
CONCLUSIONS: Automatically generated AI metrics can differentiate between attending and trainee surgeries and correlate with expert human evaluations of surgical performance.
TRANSLATIONAL RELEVANCE: AI-generated metrics that correlate with surgical skill may be useful for improving cataract surgical education.
PMID:40310637 | DOI:10.1167/tvst.14.5.2