
Data Preprocessing Techniques for Artificial Intelligence (AI)/Machine Learning (ML)-Readiness: Systematic Review of Wearable Sensor Data in Cancer Care

JMIR Mhealth Uhealth. 2024 Apr 16. doi: 10.2196/59587. Online ahead of print.

ABSTRACT

BACKGROUND: Wearable sensors are increasingly being explored in healthcare, including in cancer care, for their potential in continuously monitoring patients. Despite their growing adoption, significant challenges remain in the quality and consistency of data collected from wearable sensors. In particular, preprocessing pipelines to clean and standardize raw data have not been fully optimized.

OBJECTIVE: The aim of this study was to conduct a systematic review of preprocessing techniques employed on wearable sensor data to ensure their readiness for artificial intelligence/machine learning (“AI/ML-ready”) applications. Specifically, we sought to understand the landscape of current approaches applied in cleaning, normalizing, and transforming raw datasets into usable formats for subsequent AI/ML analysis.

METHODS: We systematically searched IEEE Xplore, PubMed, Embase (including Embase, Embase Classic, MEDLINE, PubMed-not-MEDLINE), and Scopus to identify potentially relevant studies for this review. The eligibility criteria were: (1) mHealth and wearable sensor studies in cancer; (2) written and published in English; (3) published between January 2018 and December 2023; (4) full text available, not abstracts only; (5) original studies published in peer-reviewed journals or conference proceedings. The Covidence app was used to manage the screening stage. Statistical learning and image processing techniques were considered irrelevant to this review.

RESULTS: The initial search identified 2,147 papers published between January 2018 and December 2023. After evaluating these papers against our predefined eligibility criteria, a total of 20 papers were included. Three categories of preprocessing techniques were identified: (1) Data Transformation, (2) Data Scaling, and (3) Data Cleaning.
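As an illustration of the three categories named in the results, the following is a minimal Python sketch and not the method of any reviewed study. The function names, the forward-fill rule, the z-score scaling, the window size, and the sample heart-rate readings are all hypothetical choices made for this example; the abstract does not specify which concrete techniques fall under each category.

```python
from statistics import mean, stdev


def clean_readings(samples):
    """Data cleaning (assumed example): forward-fill missing readings (None).

    Leading missing values with no earlier reading are dropped.
    """
    cleaned = []
    last_valid = None
    for value in samples:
        if value is None:
            if last_valid is None:
                continue  # nothing to fill from yet
            value = last_valid
        cleaned.append(value)
        last_valid = value
    return cleaned


def window_means(samples, window):
    """Data transformation (assumed example): means of non-overlapping windows.

    A trailing partial window is discarded.
    """
    return [
        mean(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]


def scale_zscore(samples):
    """Data scaling (assumed example): z-score standardization."""
    mu = mean(samples)
    sd = stdev(samples)
    if sd == 0:
        return [0.0 for _ in samples]
    return [(x - mu) / sd for x in samples]


# Hypothetical heart-rate readings (beats per minute) with two dropouts.
raw = [70, None, 74, 76, None, 80]
cleaned = clean_readings(raw)
features = window_means(cleaned, window=2)
scaled = scale_zscore(features)
```

In this sketch the steps run in the order cleaning, transformation, scaling, so that gaps are filled before any statistic is computed; other orderings are possible depending on the sensor and the downstream model.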

CONCLUSIONS: While wearable sensors are gaining traction in cancer care, applying standard AI/ML techniques remains challenging because the raw data captured are often of low quality and appropriate preprocessing pipelines are not applied to improve them. At present, AI/ML methodologies remain individually tailored to specific studies or types of data, which limits the generalizability of research findings. This work proposes a general framework for these multiple types of data. Our findings suggest a pressing need to develop and adopt uniform data quality and preprocessing workflows for wearable sensor data that can support the breadth of cancer research and its diverse patient populations.

PMID:38626290 | DOI:10.2196/59587

By Nevin Manimala
