[Infrastructure] support additional requirements for Spark Jobs in Quality Evaluator
Due to a dependency problem in the Spark Jobs, changes are needed in both the Quality Evaluator project and the infrastructure project.
The Spark Jobs run on an image built during the deployment step (infrastructure project), because the image must contain the KPI library as a dependency. With the improved Parquet file processing, this image must now also be built with the additional dependencies that the Spark Jobs need.
At the moment this works thanks to the transitive (second-level) dependencies of the KPI library (pandas, etc.), but pyarrow and fastparquet are not among them.
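Until the image ships those packages, a defensive check at job start-up can make the missing-dependency failure explicit instead of surfacing later inside pandas. A minimal sketch; the function name and the idea of a start-up check are assumptions for illustration, not part of the current Quality Evaluator code:

```python
import importlib.util

def missing_parquet_engines(candidates=("pyarrow", "fastparquet")):
    """Return the Parquet engine packages that are not installed
    in the current image, without importing them."""
    return [name for name in candidates if importlib.util.find_spec(name) is None]

# Example start-up guard (hypothetical):
# missing = missing_parquet_engines()
# if missing:
#     raise RuntimeError(f"Spark Jobs image is missing Parquet engines: {missing}")
```

pandas picks one of these engines when reading or writing Parquet files, so failing fast here points directly at the image build rather than at the job logic.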
Solution
Project infrastructure (spark directory):
- Modify the Spark Jobs Dockerfile to add the pyarrow and fastparquet dependencies
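A sketch of what the Dockerfile change could look like, assuming the image installs its Python dependencies with pip; the base image, the KPI library install step, and any version pins are assumptions, not the real file:

```dockerfile
# Hypothetical excerpt of the Spark Jobs Dockerfile (infrastructure project,
# spark directory). Base image and existing steps are illustrative only.
FROM python:3.10-slim

# Existing step: install the KPI library (pulls in pandas, etc. transitively)
# RUN pip install kpi-library

# New step: install the Parquet engines explicitly instead of relying on
# the KPI library's transitive dependencies.
RUN pip install pyarrow fastparquet
```

Declaring the engines explicitly means a future KPI library update that drops or changes its own dependencies cannot silently break Parquet processing in the Spark Jobs.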
Edited by Javier Belenguer Faguás