[UDRG] Data sampling recommender service (prototype)
The Data sampling recommender service recommends sampling methods for a dataset (e.g. cluster sampling, stratified sampling), based on the following metadata provided about the dataset:
- timeseries: True if the dataset is a time series. False otherwise.
- category_column: If the dataset is a time series, the timestamp column name. Otherwise, the category column name (e.g. region, gender, postal_code).
- high-dimensionability: True if the dataset has many columns (features), some of which may be correlated, so that a new dataset with fewer columns could be generated. False otherwise.
- image: True if the dataset contains images. False otherwise.
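As an illustration, the metadata fields listed above could be collected as follows. This is only a minimal sketch: the field names come from the description, but the helper name `build_metadata`, the JSON encoding, and the type checks are assumptions, since the service's actual request format is not documented here.

```python
import json


def build_metadata(timeseries, category_column, high_dimensionability, image):
    """Hypothetical helper: collect the four metadata fields in one dict.

    The key names follow the field list above; everything else
    (function name, validation, JSON encoding) is an assumption.
    """
    flags = {
        "timeseries": timeseries,
        "high-dimensionability": high_dimensionability,
        "image": image,
    }
    # The three flags are described as True/False, so reject anything else.
    for name, value in flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool, got {type(value).__name__}")

    return {
        "timeseries": timeseries,
        "category_column": category_column,
        "high-dimensionability": high_dimensionability,
        "image": image,
    }


# Example: a tabular, non-time-series dataset categorized by region.
metadata = build_metadata(
    timeseries=False,
    category_column="region",
    high_dimensionability=True,
    image=False,
)
payload = json.dumps(metadata)  # assumed wire format, not confirmed by the source
```

For a time series, `category_column` would instead hold the timestamp column name, as described in the list above.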
The service output is a list of recommended sampling methods, as in the following example:
[
"cluster_sampling",
"stratified_sampling"
]
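A client might consume this output along the following lines. This is a sketch that assumes the list arrives as JSON text; the function name `parse_recommendations` is hypothetical and not part of the service.

```python
import json

# Example output copied from the description above.
response_text = '["cluster_sampling", "stratified_sampling"]'


def parse_recommendations(text):
    """Hypothetical helper: decode the output and check it is a list of names."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(m, str) for m in data):
        raise ValueError("expected a JSON list of sampling method names")
    return data


methods = parse_recommendations(response_text)
for method in methods:
    print(method)
```

Checking the shape before use keeps a malformed or unexpected response from propagating further into the caller's code.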
Edited by Idoia Murua Belakortu