
[UDRG] Data sampling recommender service (prototype)

The Data sampling recommender service recommends sampling methods for a dataset (e.g., cluster sampling or stratified sampling) based on the following metadata provided about the dataset.

timeseries: True if the dataset is a time series. False otherwise.
category_column: If the dataset is a time series, the name of the timestamp column. Otherwise, the name of the category column (e.g., region, gender, postal_code).
high-dimensionability: True if the dataset has many columns (features), some of which may be correlated, so that a new dataset with fewer columns could be generated. False otherwise.
image: True if the dataset contains images. False otherwise.
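
The four metadata fields above could be assembled on the client side as in the following minimal sketch. The class name `DatasetMetadata`, the method `to_payload`, and the idea that the service accepts a JSON-like mapping keyed by the field names exactly as listed are assumptions made for illustration; the prototype's actual request format is not specified here.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical container for the metadata fields listed above."""
    timeseries: bool
    category_column: str
    high_dimensionability: bool
    image: bool

    def to_payload(self) -> dict:
        # Assumption: keys match the field names as written in this page,
        # including the hyphenated "high-dimensionability".
        return {
            "timeseries": self.timeseries,
            "category_column": self.category_column,
            "high-dimensionability": self.high_dimensionability,
            "image": self.image,
        }


# Example: a tabular, non-time-series dataset categorized by region.
example = DatasetMetadata(
    timeseries=False,
    category_column="region",
    high_dimensionability=True,
    image=False,
)
payload = example.to_payload()
```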

The service returns a list of recommended sampling methods, as in the following example:

[
  "cluster_sampling",
  "stratified_sampling"
]
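
A client consuming this output might validate it before use. The sketch below assumes the response body is the JSON list shown above, delivered as a string; the function name `parse_recommendations` is hypothetical and not part of the service.

```python
import json


def parse_recommendations(raw: str) -> list[str]:
    """Parse the service output and check it is a list of method names."""
    data = json.loads(raw)
    # Assumption: the output is always a flat JSON list of strings.
    if not isinstance(data, list) or not all(isinstance(m, str) for m in data):
        raise ValueError("expected a JSON list of sampling method names")
    return data


# Uses the example output shown above.
methods = parse_recommendations('["cluster_sampling", "stratified_sampling"]')
```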

Edited May 20, 2025 by Idoia Murua Belakortu