Closed
Labels
released (Included in a release)
Description
Is your feature request related to a problem?
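The two existing scalers, the range scaler and the standard scaler, do not handle outliers well, because the minimum, maximum, mean, and standard deviation they rely on are all sensitive to extreme values.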
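To make the problem concrete, here is a small sketch in plain Python (standard library only, not the library's scalers) that applies the three scaling formulas to a column with one outlier. The sample values, the use of the population standard deviation, and the inclusive quartile method are assumptions chosen for illustration.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is the outlier

# Range scaling: (x - min) / (max - min)
lo, hi = min(values), max(values)
range_scaled = [(x - lo) / (hi - lo) for x in values]

# Standard scaling: (x - mean) / std (population std assumed here)
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standard_scaled = [(x - mean) / std for x in values]

# Robust scaling: (x - median) / (Q3 - Q1)
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
robust_scaled = [(x - median) / (q3 - q1) for x in values]

# Spread of the four non-outlier values after each scaling
for name, scaled in [
    ("range", range_scaled),
    ("standard", standard_scaled),
    ("robust", robust_scaled),
]:
    inliers = scaled[:4]
    print(f"{name:>8}: inlier spread = {max(inliers) - min(inliers):.3f}")
```

With range and standard scaling the four ordinary values are squeezed into a narrow band because the outlier dominates the statistics, while median and interquartile range are unaffected by it.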
Desired solution
Add a new class RobustScaler similar to the scikit-learn transformer with the same name. It should be implemented using polars directly to avoid unnecessary conversions.
- Superclass: `InvertibleTableTransformer`
- Constructor parameters:
  - `column_names: str | list[str] | None = None` (keyword-only). List of columns to transform. If `None`, all numeric columns of the table passed to `fit` are transformed.
- Attributes:
  - `self._data_median: pl.DataFrame | None = None`
  - `self._data_scale: pl.DataFrame | None = None`
- `fit`:
  - Call `_check_columns_exist` to ensure the columns to transform exist.
  - Call `_check_columns_are_numeric` to ensure the columns to transform are numeric.
  - Raise a `ValueError` if `row_count` is 0.
  - Create a new instance of the `RobustScaler`; don't mutate the original in place.
  - Compute the median (second quartile) for each column to transform and store it in `_data_median` of the copied transformer.
  - Compute the first and third quartile for each column to transform and store the difference (third - first) in `_data_scale` of the copied transformer.
- `transform`:
  - Raise `TransformerNotFittedError` if the transformer is not fitted.
  - Call `_check_columns_exist` to ensure the columns to transform exist.
  - Call `_check_columns_are_numeric` to ensure the columns to transform are numeric.
  - Subtract `_data_median` for each column to transform.
  - Divide the result by `_data_scale` for each column to transform.
- `inverse_transform`:
  - Raise `TransformerNotFittedError` if the transformer is not fitted.
  - Call `_check_columns_exist` to ensure the columns to transform exist.
  - Call `_check_columns_are_numeric` to ensure the columns to transform are numeric.
  - Multiply by `_data_scale` for each column to transform.
  - Add `_data_median` for each column to transform.
See the implementations of RangeScaler and StandardScaler for similar functionality.
Possible alternatives (optional)
No response
Screenshots (optional)
No response
Additional Context (optional)
No response
Status
✔️ Done