
Add RobustScaler #650

@lars-reimann

Description

Is your feature request related to a problem?

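The two existing scalers, RangeScaler and StandardScaler, do not handle outliers well: the statistics they rely on (minimum and maximum, or mean and standard deviation) are strongly affected by a few extreme values.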
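
To illustrate the problem, the following plain-Python sketch scales one made-up column containing a single outlier in three ways. It uses the textbook formulas, not the actual Safe-DS code. Min-max scaling to [0, 1] and standardization with the population standard deviation (as scikit-learn's StandardScaler uses) both squeeze the four typical values into a band narrower than 0.01. Centering on the median and dividing by the interquartile range keeps them spread out.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 1000.0]  # four typical values and one outlier

# Min-max scaling to [0, 1]: the outlier defines the maximum
lo, hi = min(values), max(values)
min_max = [(x - lo) / (hi - lo) for x in values]

# Standardization: the outlier inflates both the mean and the standard deviation
mean, std = statistics.mean(values), statistics.pstdev(values)
standard = [(x - mean) / std for x in values]

# Robust scaling: median and quartiles barely notice the outlier
# (method="inclusive" interpolates linearly between order statistics)
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
robust = [(x - median) / (q3 - q1) for x in values]

print(min_max)   # inliers end up between 0 and about 0.003
print(standard)  # inliers end up within a band of about 0.0075
print(robust)    # [-1.0, -0.5, 0.0, 0.5, 498.5]
```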

Desired solution

Add a new class RobustScaler similar to the scikit-learn transformer with the same name. It should be implemented using polars directly to avoid unnecessary conversions.
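
For reference, with the defaults of scikit-learn's RobustScaler (centering on the median, scaling by the range between the 25th and 75th percentiles), each value $x$ of a column $X$ is mapped as follows by transform, and inverse_transform undoes it. Here $Q_1$ and $Q_3$ denote the first and third quartiles computed during fit:

```math
x' = \frac{x - \operatorname{median}(X)}{Q_3(X) - Q_1(X)}, \qquad x = x' \cdot \bigl(Q_3(X) - Q_1(X)\bigr) + \operatorname{median}(X)
```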

  • Superclass: InvertibleTableTransformer
  • Constructor parameters:
    • column_names: str | list[str] | None = None (keyword-only). Columns to transform. If None, all numeric columns of the table passed to fit are transformed.
  • Attributes:
    • self._data_median: pl.DataFrame | None = None
    • self._data_scale: pl.DataFrame | None = None
  • fit:
    • Call _check_columns_exist to ensure columns to transform exist
    • Call _check_columns_are_numeric to ensure that columns to transform are numeric
    • Raise a ValueError if row_count is 0
    • Create a new RobustScaler instance instead of mutating the existing one in place
    • Compute the median (second quartile) of each column to transform and store it in _data_median of the new transformer
    • Compute the first and third quartiles of each column to transform and store their difference (third - first, i.e. the interquartile range) in _data_scale of the new transformer
  • transform:
    • Raise a TransformerNotFittedError if the transformer is not fitted
    • Call _check_columns_exist to ensure columns to transform exist
    • Call _check_columns_are_numeric to ensure that columns to transform are numeric
    1. Subtract _data_median for each column to transform
    2. Divide result by _data_scale for each column to transform
  • inverse_transform:
    • Raise a TransformerNotFittedError if the transformer is not fitted
    • Call _check_columns_exist to ensure columns to transform exist
    • Call _check_columns_are_numeric to ensure that columns to transform are numeric
    1. Multiply by _data_scale for each column to transform
    2. Add _data_median for each column to transform

See the implementations of RangeScaler and StandardScaler for similar functionality.

Metadata

Labels

released (Included in a release)

Projects

Status

✔️ Done
