Evaluating Effects of Resting-State Electroencephalography Data Pre-Processing on a Machine Learning Task for Parkinson’s Disease

Abstract

Resting-state electroencephalography (RSEEG) is under consideration as a potential biomarker that could support early and accurate diagnosis of Parkinson’s disease (PD). RSEEG data is often contaminated by signals arising from other electrophysiological sources and from the environment, so the data must be pre-processed before machine learning methods are applied for classification. Importantly, differing degrees of pre-processing will lead to different classification results. This study examined that effect by comparing the classification results obtained with re-referenced data, data that had undergone filtering and artefact rejection, and data without muscle artefact. Using a Random Forest Classifier for feature selection and a Support Vector Machine for disease classification, the results demonstrated that different levels of pre-processing led to markedly different classification results. In particular, the presence of muscle artefact was associated with inflated classification accuracy, emphasising the importance of its removal as part of pre-processing.
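
The classification stage described above, Random Forest feature selection followed by a Support Vector Machine, could be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the use of scikit-learn, the number of features kept, the kernel, the five-fold cross-validation, and the synthetic stand-in data are all assumptions, since the abstract does not specify them. In practice the same pipeline would be run once per pre-processing level on features extracted from the RSEEG recordings, and the resulting accuracies compared.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pipeline(n_features_kept=10, seed=0):
    """Random Forest feature selection followed by an SVM classifier.

    All hyperparameters here are illustrative assumptions.
    """
    # Rank features by Random Forest importance and keep the top k.
    # threshold=-inf makes max_features the only selection criterion.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=seed),
        max_features=n_features_kept,
        threshold=-np.inf,
    )
    return make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))


def evaluate(X, y, seed=0):
    """Return per-fold accuracy; selection is refit inside every fold."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(
        build_pipeline(seed=seed), X, y, cv=cv, scoring="accuracy"
    )


# Synthetic stand-in for an EEG feature matrix (hypothetical data):
# 60 participants x 40 features, with a small group difference
# injected into the first three features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 40))
y_demo = np.repeat([0, 1], 30)  # 0 = control, 1 = PD (labels are illustrative)
X_demo[y_demo == 1, :3] += 1.0

scores = evaluate(X_demo, y_demo)
print("Mean cross-validated accuracy:", scores.mean())
```

Placing the feature selector inside the pipeline means it is refit on the training portion of each fold, which avoids leaking test-fold information into the selected features.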

Publication
medRxiv