[UDRG] Data sampling recommender service (prototype)

The Data sampling recommender service recommends sampling methods for a dataset (e.g., cluster sampling or stratified sampling) based on the following metadata provided about the dataset:

  • timeseries: True if the dataset is a time series. False otherwise.
  • category_column: If the dataset is a time series, the timestamp column name. Otherwise, the category column name (e.g., region, gender, postal_code).
  • high-dimensionability: True if the dataset has many columns (features), some of which may be correlated, so that a new dataset with fewer columns could be generated. False otherwise.
  • image: True if the dataset contains images. False otherwise.
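
As an illustration, the four metadata fields above could be assembled into a JSON object before being passed to the service. This is only a minimal sketch: the `build_metadata` helper, the example values, and the choice of a flat JSON object are assumptions, not the service's documented request format.

```python
import json


def build_metadata(timeseries, category_column, high_dimensionability, image):
    """Assemble the four metadata fields into a dict.

    Hypothetical helper: the real service may expect a different request shape.
    """
    return {
        "timeseries": bool(timeseries),
        "category_column": category_column,
        # Key spelled exactly as in the field list above (it contains a hyphen).
        "high-dimensionability": bool(high_dimensionability),
        "image": bool(image),
    }


# Example values are made up: a non-time-series, wide, tabular dataset
# whose category column is "region".
metadata = build_metadata(
    timeseries=False,
    category_column="region",
    high_dimensionability=True,
    image=False,
)
print(json.dumps(metadata, indent=2))
```

The hyphenated key is kept as a string because it is not a valid Python identifier.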

The service returns a list of recommended sampling methods, as in the following example:

[
  "cluster_sampling",
  "stratified_sampling"
]
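
A client might check the returned list before using it. The sketch below assumes the response body is exactly the JSON array shown above; `parse_recommendations` is a hypothetical helper and not part of the service.

```python
import json
from typing import List


def parse_recommendations(body: str) -> List[str]:
    """Parse a response body and check that it is a JSON list of strings."""
    methods = json.loads(body)
    if not isinstance(methods, list) or not all(isinstance(m, str) for m in methods):
        raise ValueError("expected a JSON list of sampling method names")
    return methods


# Response body copied from the example above.
example_body = '["cluster_sampling", "stratified_sampling"]'
recommended = parse_recommendations(example_body)
print(recommended)
```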

sampling_recommender

Edited by Idoia Murua Belakortu