Commit 84e6e387 authored by Boris Baldassari

Update aice aura demonstrator following fza review.

parent f8f4e6be
Pipeline #3065 passed with stage
The AICE OpenLab has been initiated to provide a common shared platform to test, evaluate and demonstrate AI workflows developed by partners. This enables an open collaboration and discussion on AI solutions, and fosters portability and standardisation. The AICE OpenLab is currently working on two use cases: AURA, as described in this document, and Eclipse Graphene, a general-purpose scheduler for AI workflows.
More information:
* AICE Working Group wiki: https://wiki.eclipse.org/AICE_WG/
* Eclipse Graphene / AI4EU Experiments: https://ai4europe.eu
AURA is a non-profit French organisation that designs and develops a patch that detects epileptic seizures before they happen and warns patients ahead of time for safety purposes. For this, AURA is creating a multidisciplinary community integrating open source and open hardware philosophies with the health and research worlds. The various partners of the initiative (patients, neurologists, data scientists, designers) each bring their experience and expertise to build an open, science-backed workflow that can actually help the end users. In the end, this device could be a life-changer for the 10 million people with drug-resistant epilepsy worldwide.
More information:
* AURA Healthcare official website: https://en.aura.healthcare
* GitHub: https://github.com/Aura-healthcare/
These annotations are used as a reference dataset for training various ML models. Available datasets are usually split so that one part is used for training and another to validate the trained model. A typical workflow then tries to predict epileptic seizures from the ECG signal and checks the predictions against the human annotations.
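The split described above can be sketched in a few lines of Python. This is illustrative only: the file names and the 80/20 ratio are assumptions, not the actual AURA tooling.

```python
import random

def split_dataset(records, train_ratio=0.8, seed=42):
    """Shuffle and split a list of (signal, annotation) records into
    a training set and a held-out validation set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical recording/annotation file pairs.
records = [(f"rec_{i}.edf", f"rec_{i}.tse_bi") for i in range(10)]
train, val = split_dataset(records)
print(len(train), len(val))  # 8 2
```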
More information:
* Methods for seizure detection: https://en.aura.healthcare/analyse-des-données
* Seizure dogs: https://www.epilepsy.com/living-epilepsy/seizure-first-aid-and-safety/seizure-dogs
### Existing workflow
We started from the workflow already developed by the AURA data scientists. As usual, the first step is to prepare the data before using it (cleaning, selection and extraction of features, and formatting). The resulting dataset is then fed to a Random Forest model to predict future seizures.
The data inputs of the workflow come from two different file types:
* The raw ECG signal data, stored in European Data Format (EDF).
* The annotations, which describe whether the signal pattern is an actual epileptic seizure or normal activity, stored in `.tse_bi` files with a 1-to-1 association with the EDF signal files.
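As an illustration, a minimal parser for such annotation files might look as follows. The exact layout shown here (`start stop label confidence` lines, with `bckg` and `seiz` labels) is an assumption based on the TUH distribution, not a specification of the AURA scripts.

```python
def parse_tse_bi(text):
    """Parse annotation lines of the form 'start stop label confidence'.
    Returns a list of (start_s, stop_s, label) tuples; in the binary
    ('bi') files the label is either 'bckg' (background) or 'seiz' (seizure)."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[2] in ("bckg", "seiz"):
            start, stop = float(parts[0]), float(parts[1])
            events.append((start, stop, parts[2]))
    return events

sample = """version = tse_v1.0.0

0.0000 36.8868 bckg 1.0000
36.8868 183.3055 seiz 1.0000
"""
print(parse_tse_bi(sample))
```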
The data preparation step is carried out by a series of Python scripts developed by the AURA scientists, which extract the rr-intervals (i.e. the time elapsed between two heartbeats), cardiac features and annotations, then build a simplified dataset that can be used to train the Random Forest algorithm:
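The rr-interval extraction at the heart of this step can be sketched as follows. This is a simplified stand-in for the actual AURA scripts, which work on detected R-peaks from the raw EDF signal.

```python
def rr_intervals(beat_times_s):
    """Compute rr-intervals (time between consecutive heartbeats, in ms)
    from a sorted list of R-peak timestamps in seconds."""
    return [round((b - a) * 1000.0, 1)
            for a, b in zip(beat_times_s, beat_times_s[1:])]

# Hypothetical R-peak timestamps (seconds).
beats = [0.00, 0.80, 1.62, 2.41, 3.25]
print(rr_intervals(beats))  # [800.0, 820.0, 790.0, 840.0]
```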
{{< grid/div isMarkdown="false" >}}
<img src="/images/articles/aice_aura_demonstrator/ecg_workflow.png" alt="The AURA AI process - before" class="img-responsive">
{{</ grid/div >}}
There are many parameters involved in the process, some of which have a huge impact on performance, such as the time window for the rr-intervals. Fine-tuning these parameters requires running many combinations, which is impractical, and even prohibitive, when executions are long.
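Such a sweep over parameter combinations can be sketched with a simple grid. The parameter names below are hypothetical, not the actual options of the AURA scripts.

```python
from itertools import product

# Hypothetical parameter grid for illustration.
grid = {
    "window_s": [1, 3, 9],       # sliding-window length for rr-features
    "n_estimators": [50, 100],   # size of the Random Forest
}

# Cartesian product of all parameter values: 3 * 2 = 6 combinations.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combos))  # 6

for combo in combos:
    # Each combination would trigger one full (and costly) workflow run.
    pass
```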
More information:
* European Data Format (EDF): https://www.edfplus.info
* AURA GitHub repository for seizure detection: https://github.com/Aura-healthcare/seizure_detection_pipeline
## Objectives of the project
In this context, our first practical goal was to train the model on a large dataset of ECGs from the Temple University Hospital (TUH). The TUH dataset is composed of EDF files recording the electrocardiogram signal, along with their annotation files that classify the time ranges as either noise or as an epileptic seizure. The full dataset has 5600+ EDF files and as many annotations, representing 692 patients, 1074 hours of recordings and 3500+ seizures. Its size on disk is 67 GB.
AI-related research and development activities, even if they rely on smaller datasets in the early stages of the setup, require a more complete dataset for the fine-tuning and exploitation of the model. The TUH database was not used often with the previous AURA workflow, as its full execution would take more than 20 hours on the developers' computers. Executions often failed because of wrong input data, and switching to more powerful computers was difficult because of the complex setup.
The established objectives of the project were to:
More information:
* Temple university dataset homepage: https://isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml#c_tusz
* Temple university dataset reference: Obeid I., Picone J. (2016). The Temple University Hospital EEG data corpus. Front. Neurosci. 10:196. doi: 10.3389/fnins.2016.00196.
## Areas of improvement
Considering the above situation and objectives, we identified four areas of improvement:
* **Portability**: make the workflow executable on any machine.
* **Performance**: optimise execution time and resources.
* **Visualisation**: help with the massive import of ECG files into a database.
* **Industrialisation**: make sure that future developments can seamlessly reuse our work.
### Portability: Building the AURA Containers
The performance gain enabled us to run more precise, resource-consuming operations to refine the training. For example, we reduced the length of the sliding window used when computing the rr-intervals from 9 seconds to 1 second, which generates substantially more computation but markedly improves the predictions of the ML training.
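The effect of the window length can be illustrated with a toy sliding-window feature. This is a simplified stand-in for the actual feature computation, under the assumption that features are averaged over a trailing time window.

```python
def mean_rr_in_window(rr_ms, times_s, t, window_s=1.0):
    """Average rr-interval (ms) over the trailing window [t - window_s, t].
    Shrinking the window from 9 s to 1 s yields many more, finer-grained
    feature values, at the cost of more computation."""
    in_win = [rr for rr, ts in zip(rr_ms, times_s) if t - window_s <= ts <= t]
    return sum(in_win) / len(in_win) if in_win else None

# Hypothetical rr-intervals and their end timestamps.
rr = [800.0, 820.0, 790.0, 840.0]
ts = [0.80, 1.62, 2.41, 3.25]
print(mean_rr_in_window(rr, ts, t=3.25, window_s=1.0))  # 815.0
```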
We identified atomic steps that could be executed independently and built them as parallel execution jobs. For example, the cleaning and preparation of data files can be executed simultaneously on different directories to accelerate the overall step. By partitioning the dataset into subsets of roughly 10 GB and running 6 data preparation containers concurrently, we went down from almost 17 hours to 4 hours on the same reference host.
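The parallel scheme can be sketched with Python's standard `concurrent.futures`. The worker below is a thread-based stand-in for the actual data preparation containers, and the directory names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def prepare_directory(path):
    """Stand-in for one data-preparation container: it would clean and
    format every EDF file below `path`; here it just tags the path."""
    return f"prepared:{path}"

# Partition the dataset into subsets (one directory each) and process
# them concurrently, mirroring the 6 parallel containers described above.
subsets = [f"tuh/part_{i}" for i in range(6)]

with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(prepare_directory, subsets))
print(results)
```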
Also, by being able to run the process anywhere, we could execute it on several hardware platforms with different capabilities. This allowed us to check (and fix) portability while getting a better understanding of the resource requirements of each step. The following plot shows the evolution of performance in various situations:
The major performance gain was achieved by setting up dedicated containers to run atomic tasks (e.g. data preparation, visualisation imports) in parallel. Most computers, both in the lab and for high-end execution platforms, have multiple threads and enough memory to manage several containers simultaneously, and we need to take advantage of the full computing power we have. Another major gain was obviously to run the process on a more powerful system, with enough memory, CPUs and disk throughput.
All things considered, we were able to scale down the full execution time on the TUH dataset from 20 hours on the lab's laptop to roughly 4 hours in our cluster.
### Visualisation