Commit dcbf5366 authored by Boris Baldassari

Fix images in aura article, last updates.

parent a543666a
Pipeline #3551 passed with stage
@@ -110,7 +110,7 @@ We developed three Docker images to easily execute the full workflow or specific
We decomposed the workflow into a sequence of steps that could be encapsulated and executed sequentially, i.e. each step needs the output of the previous step to start. For steps that allow it (e.g. data processing), we can also run several containers in parallel. The resulting architecture is shown below:
{{< grid/div isMarkdown="false" >}}
<img src="/images/articles/aice_aura_demonstrator/aura_process_big.png" alt="The AURA AI process" class="img-responsive">
<img src="/images/articles/aice_aura_demonstrator/aura_process.png" alt="The AURA AI process" class="img-responsive">
{{</ grid/div >}}
In this diagram, the following containers are defined:
@@ -132,17 +132,24 @@ The performance gain enabled us to run more precise and resource-consuming opera
We identified atomic steps that could be executed independently and built them as parallel execution jobs. As an example, the cleaning and preparation of data files can be executed simultaneously on different directories to accelerate the overall step. By partitioning the dataset into subsets of roughly 10GB and running 6 data preparation containers concurrently, we went down from almost 17h to 4h on the same reference host, as illustrated below.
{{< grid/div isMarkdown="false" >}}
<img src="/images/articles/aice_aura_demonstrator/aura_process_multi.png" alt="The AURA AI process" class="img-responsive">
{{</ grid/div >}}
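As a rough illustration of how this fan-out could be driven, here is a minimal sketch in Python. The image name (`aura/data-preparation`), the host paths and the one-directory-per-partition layout are assumptions for the example, not the actual AURA setup; each worker thread simply waits on its own `docker run`, so six preparation containers execute concurrently.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical image name and data layout: one sub-directory per ~10GB partition.
IMAGE = "aura/data-preparation:latest"    # assumed tag, not the actual AURA image
DATA_ROOT = Path("/data/aura/partitions") # assumed host path
MAX_CONTAINERS = 6                        # number of concurrent data-preparation jobs

def prepare_partition(partition: Path) -> int:
    """Run one data-preparation container on a single partition directory."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{partition}:/input:ro",
        "-v", f"{partition.parent / 'prepared' / partition.name}:/output",
        IMAGE,
    ]
    return subprocess.run(cmd, check=True).returncode

if __name__ == "__main__":
    partitions = sorted(p for p in DATA_ROOT.iterdir() if p.is_dir())
    with ThreadPoolExecutor(max_workers=MAX_CONTAINERS) as pool:
        # Each thread only blocks on its own `docker run`, so up to
        # MAX_CONTAINERS preparation containers execute at the same time.
        list(pool.map(prepare_partition, partitions))
```

Threads (rather than processes) are enough here because the heavy lifting happens inside the containers; the Python side only waits on the Docker CLI.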
Also, being able to run the process anywhere meant we could execute it on several hardware configurations with different capabilities. This allowed us to check (and fix) portability while getting a better understanding of the resource requirements of each step. We targeted three different hosts for our performance benchmark:
* A middle-range laptop (label: Laptop), HDD disks and i7 CPU.
* A high-range station (label: Station), SSD disks and (a better) i7 CPU.
* A high-range server (label: SDIA), HDD disks and 2 x Xeon (48 threads).
* Each host was benchmarked with a single container for data preparation vs. multiple containers executed in parallel (label: Mono / Multi); run times were collected as sketched below.
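As a sketch of how these runs could be timed consistently across hosts, the snippet below wraps each step and appends the elapsed time to a CSV file. The image names, step list and labels are placeholders chosen for the example, not the actual AURA benchmark harness.

```python
import csv
import subprocess
import time

# Placeholder step commands; in practice each entry wraps the corresponding `docker run`.
STEPS = {
    "data_preparation": ["docker", "run", "--rm", "aura/data-preparation:latest"],
    "ml_training": ["docker", "run", "--rm", "aura/ml-training:latest"],
}

HOST_LABEL = "Laptop"   # "Laptop", "Station" or "SDIA"
MODE = "Mono"           # "Mono" or "Multi"

with open("benchmark_perf.csv", "a", newline="") as out:
    writer = csv.writer(out)
    for step, cmd in STEPS.items():
        start = time.monotonic()
        subprocess.run(cmd, check=True)
        # One row per (host, mode, step) with the elapsed wall-clock time in seconds.
        writer.writerow([HOST_LABEL, MODE, step, round(time.monotonic() - start, 1)])
```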
We could identify different behaviours regarding performance. The data preparation step relies heavily on I/O, and improving the disk throughput (e.g. an NVMe SSD instead of a classic HDD) shows a 30% gain. The ML training, on the other hand, is very CPU- and memory-intensive, and running it on a node with a large number of threads (48 in our case) brings a stunning 10x performance improvement compared to a laptop equipped with an Intel i7. The following plot shows the evolution of performance in various situations:
{{< grid/div isMarkdown="false" >}}
<img src="/images/articles/aice_aura_demonstrator/benchmark_perf.png" alt="Execution time benchmark" class="img-responsive">
{{</ grid/div >}}
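These observations can directly guide how resources are assigned to each container: keep the I/O-bound preparation step on a few cores but point it at the fastest disk, and give the CPU-bound training step every available thread. The sketch below is one possible way to express that split with the Docker CLI; the image names and mount paths are assumptions, not the actual AURA configuration.

```python
import multiprocessing
import subprocess

# Assumed image names; the real AURA images may differ.
PREP_IMAGE = "aura/data-preparation:latest"
TRAIN_IMAGE = "aura/ml-training:latest"

n_cores = multiprocessing.cpu_count()

# Data preparation is I/O-bound: a couple of cores are enough,
# but the bind mounts should live on the fastest disk available (e.g. NVMe).
subprocess.run([
    "docker", "run", "--rm", "--cpus", "2",
    "-v", "/nvme/aura/raw:/input:ro",
    "-v", "/nvme/aura/prepared:/output",
    PREP_IMAGE,
], check=True)

# ML training is CPU- and memory-bound: let it use every available thread.
subprocess.run([
    "docker", "run", "--rm", "--cpus", str(n_cores),
    "-v", "/nvme/aura/prepared:/input:ro",
    "-v", "/nvme/aura/model:/output",
    TRAIN_IMAGE,
], check=True)
```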
### Visualisation process