Commit 7e34bda1 authored by Boris Baldassari's avatar Boris Baldassari

Merge branch 'dev' into 'main'

new website

See merge request !5
parents 70796793 f319106d
@@ -67,7 +67,7 @@ More information:
## Objectives of the project
In this context, our first practical goal was to train the model on a large dataset of ECGs from the Temple University Hospital (TUH). The TUH dataset is composed of EDF files recording the electrocardiogram signal, along with their annotation files that classify the time ranges as either noise or as an epileptic seizure. The full dataset has 5600+ EDF files and as many annotations, representing 692 patients, 1074 hours of recordings and 3500+ seizures. Its size on disk is 67 GB.
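As an illustration of the input format, the sketch below loads one recording and its annotations. It is a minimal example only: it assumes the `pyedflib` library is available and that the annotations have been converted to a simple CSV file with `start`, `stop` and `label` columns, which is a simplification of the actual TUH annotation format.

```python
# Minimal sketch: load one EDF recording and its (simplified) annotations.
# The CSV annotation layout used here is an assumption, not the TUH format.
import csv
import pyedflib

def load_recording(edf_path, annotation_path):
    """Return the ECG signal, its sampling frequency, and the annotated time ranges."""
    edf = pyedflib.EdfReader(edf_path)
    try:
        labels = edf.getSignalLabels()
        # Pick the first channel whose label mentions ECG/EKG (illustrative heuristic).
        channel = next(i for i, l in enumerate(labels)
                       if "ECG" in l.upper() or "EKG" in l.upper())
        signal = edf.readSignal(channel)
        sampling_frequency = edf.getSampleFrequency(channel)
    finally:
        edf.close()

    annotations = []
    with open(annotation_path, newline="") as f:
        for row in csv.DictReader(f):
            # Each row marks a time range (in seconds) as background noise or seizure.
            annotations.append((float(row["start"]), float(row["stop"]), row["label"]))
    return signal, sampling_frequency, annotations
```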
AI-related research and development activities, even if they rely on smaller datasets for the early stages of the setup, require a more complete dataset when it comes to the fine-tuning and exploitation of the model. The TUH database was rarely used with the previous AURA workflow, as a full execution would take more than 20 hours on the developers' computers. Executions often failed because of invalid input data, and switching to more powerful computers was difficult because of the complex setup.
@@ -110,7 +110,7 @@ We developed three Docker images to easily execute the full workflow or specific
We decomposed the workflow into a sequence of steps that could be encapsulated and executed sequentially, i.e. where each step needs the output of the previous step to start. For steps that allow it (e.g. data processing), we can also run several containers in parallel. The resulting architecture is shown below:
{{< grid/div isMarkdown="false" >}}
<img src="/images/articles/aice_aura_demonstrator/aura_process_big.png" alt="The AURA AI process" class="img-responsive">
<img src="/images/articles/aice_aura_demonstrator/aura_process.png" alt="The AURA AI process" class="img-responsive">
{{</ grid/div >}}
In this diagram, the following containers are defined:
@@ -130,20 +130,28 @@ Another step was to refactor the scripts to identify and remove performance bott
The performance gain enabled us to run more precise and resource-consuming operations in order to refine the training. For example, we reduced the length of the sliding window used when computing the rr-intervals from 9 seconds to 1 second, which generates substantially more computation while significantly improving the predictions of the ML training.
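The sketch below illustrates the sliding-window idea on a sequence of R-peak timestamps. The window length, step and feature set (mean and standard deviation of the rr-intervals) are illustrative choices, not the actual AURA implementation.

```python
# Illustrative sketch of sliding-window rr-interval features.
# Sliding the window by 1 s instead of 9 s yields roughly 9x more windows to compute.
import numpy as np

def rr_features(r_peak_times, window_s=10.0, step_s=1.0):
    """Yield (window_start, mean_rr, std_rr) over a window of window_s seconds,
    slid forward by step_s seconds.

    r_peak_times: R-peak timestamps in seconds, as detected on the ECG signal.
    """
    r_peak_times = np.asarray(r_peak_times, dtype=float)
    t, end = r_peak_times[0], r_peak_times[-1]
    while t + window_s <= end:
        in_window = r_peak_times[(r_peak_times >= t) & (r_peak_times < t + window_s)]
        rr = np.diff(in_window)  # rr-intervals (seconds) inside the window
        if rr.size:
            yield t, rr.mean(), rr.std()
        t += step_s
```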
We identified atomic steps that could be executed independently and built them as parallel execution jobs. As an example, the cleaning and preparation of data files can be executed simultaneously on different directories to accelerate the overall step. By partitioning the dataset into subsets of roughly 10 GB and running 6 data preparation containers concurrently, we went down from almost 17 hours to 4 hours on the same reference host.
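A minimal sketch of this parallelisation strategy is shown below. The Docker image name, the mount points and the dataset layout (one directory per subset) are assumptions used for illustration; the actual AURA images and paths may differ.

```python
# Illustrative sketch: run one data preparation container per dataset subset,
# with at most 6 containers executing in parallel. Image name and paths are assumptions.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def prepare_subset(subset_dir: Path) -> int:
    """Run one data preparation container on a single ~10 GB subset."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{subset_dir}:/data",       # mount one subset into the container
        "aura/data-preparation:latest",     # hypothetical image name
    ]
    return subprocess.run(cmd, check=False).returncode

subsets = [p for p in sorted(Path("/datasets/tuh").iterdir()) if p.is_dir()]
with ThreadPoolExecutor(max_workers=6) as pool:  # 6 concurrent containers
    return_codes = list(pool.map(prepare_subset, subsets))

print("failed subsets:", sum(1 for rc in return_codes if rc != 0))
```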
{{< grid/div isMarkdown="false" >}}
<img src="/images/articles/aice_aura_demonstrator/aura_process_multi.png" alt="The AURA AI process" class="img-responsive">
{{</ grid/div >}}
Since the containerised process can run anywhere, we could also execute it on several hardware configurations with different capabilities. This allowed us to check (and fix) portability while getting a better understanding of the resource requirements of each step. We targeted three different hosts for our performance benchmark:
* A mid-range laptop (label: Laptop), with HDD storage and an i7 CPU.
* A high-end workstation (label: Station), with SSD storage and a (better) i7 CPU.
* A high-end server (label: SDIA), with HDD storage and 2 x Xeon CPUs (48 threads).

Each host was benchmarked with a single container for data preparation vs. multiple containers executed in parallel (label: Mono / Multi).
The following plot shows the evolution of performance in various situations:
{{< grid/div isMarkdown="false" >}}
<img src="/images/articles/aice_aura_demonstrator/benchmark_perf.png" alt="Execution time benchmark" class="img-responsive">
<br />
{{</ grid/div >}}
We could identify different behaviours regarding performance. The data preparation step relies heavily on I/O, and improving the disk throughput (e.g. an NVMe SSD instead of a classic HDD) shows a 30% gain. The ML training, on the other hand, is very CPU- and memory-intensive, and running it on a node with a large number of threads (e.g. 48 in our case) brings a stunning 10x performance improvement compared to a laptop equipped with an Intel i7.
### Visualisation process
AURA uses Grafana to display the ECG signals and the associated annotations, both for the creation of annotated datasets and for their exploitation. In order to build this workflow we need to import the rr-interval files and their associated annotations into a PostgreSQL database, and configure Grafana to read and display the corresponding time series.
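As a rough sketch of the import step, the snippet below pushes rr-intervals into PostgreSQL so that Grafana can query them as a time series. The connection settings, input file format and table schema are assumptions made for illustration, not the actual AURA schema.

```python
# Minimal sketch: import rr-intervals into PostgreSQL for display in Grafana.
# Connection settings, CSV layout and table schema are assumptions.
import csv
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="aura", user="aura", password="aura")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS rr_intervals (
            time  TIMESTAMPTZ NOT NULL,   -- timestamp of the R peak
            rr_ms DOUBLE PRECISION,       -- rr-interval in milliseconds
            label TEXT                    -- 'noise' or 'seizure' from the annotations
        )
    """)
    with open("rr_intervals.csv", newline="") as f:
        for row in csv.DictReader(f):
            cur.execute(
                "INSERT INTO rr_intervals (time, rr_ms, label) VALUES (%s, %s, %s)",
                (row["time"], row["rr_ms"], row["label"]),
            )
conn.close()
```

Grafana can then be configured with a PostgreSQL data source and a query such as `SELECT time, rr_ms FROM rr_intervals ORDER BY time` to plot the signal alongside its annotations.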
@@ -57,7 +57,7 @@ The meeting was held in Brussels, Belgium, at the Huawei office, 180 Chaussée d
[Download slides](https://drive.google.com/file/d/1oZOIi6YNaxIF9jQwl_AHOiBP-RsbnuE-/view?usp=sharing) - [See video](https://youtu.be/hrsmnPEBTL8)
* Open Euler -- Mauro Carvalho Chehab, Operating Systems Senior Engineer, Roberto Sassu, Senior Security Engineer Trusted Computing (Huawei) \
[Download slides](https://drive.google.com/file/d/1Ox16B-ScMX8-2N6yZo9dr_H_Qsc8Cuvd/view?usp=sharing) - [See video](https://www.youtube.com/watch?v=5M0SZYJ7t-g)
* The brain needs a nervous system - Supporting Cloud to Thing AI -- Luca Cominardi, PhD, Senior Technologist (AdLink) \
[Download slides](https://drive.google.com/file/d/17WQza9ryVjEm3tmcwjN0JcmhyBEEru8s/view?usp=sharing) - [See video](https://youtu.be/CkoC_KfdGqM)