Closed
Labels
performance 🏃 Speed things up
Description
Describe the bug
This warning is printed after performing some operations on a Table:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
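The warning text contrasts two ways of building a wide frame. The sketch below illustrates that contrast in plain pandas; it is not taken from the library's code, and the function names, column names, and sizes are hypothetical. Whether the warning actually fires depends on the installed pandas version and the number of columns added.

```python
import numpy as np
import pandas as pd


def build_column_by_column(n_cols: int = 200, n_rows: int = 10) -> pd.DataFrame:
    """Add columns one at a time, the pattern the warning describes.

    On pandas versions that emit it, many single-column additions
    can trigger the 'highly fragmented' PerformanceWarning.
    """
    frame = pd.DataFrame(index=range(n_rows))
    for i in range(n_cols):
        frame[f"pixel{i}"] = np.zeros(n_rows)
    return frame


def build_with_concat(n_cols: int = 200, n_rows: int = 10) -> pd.DataFrame:
    """Collect all columns first, then join them once with pd.concat(axis=1)."""
    columns = [pd.Series(np.zeros(n_rows), name=f"pixel{i}") for i in range(n_cols)]
    return pd.concat(columns, axis=1)


if __name__ == "__main__":
    fragmented = build_column_by_column()
    # As the warning suggests, copying yields a de-fragmented frame.
    defragmented = fragmented.copy()
    joined = build_with_concat()
    print(defragmented.shape, joined.shape)
```

Both functions produce frames with the same columns and values; only the way the columns are assembled differs.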
To Reproduce
- Load a Table with `Table.from_csv_file`
- Slice some rows out of the table with `Table.slice_rows`
- Split the Table with `Table.split_rows`
- Create a TaggedTable from the resulting table with `Table.tag_columns`
Expected behavior
These operations should not fragment the underlying DataFrame or slow down later steps, e.g. fitting a model.
Screenshots (optional)
No response
Additional Context (optional)
For easier reproduction, this dataset was used: https://www.kaggle.com/competitions/digit-recognizer/overview
This code was used to reproduce the performance warning:
from safeds.data.tabular.containers import Table, Column

def pipeline():
    labeled_images = Table.from_csv_file('beginner_classification/train.csv')
    subset_images = labeled_images.slice_rows(0, 5000)
    train, validate = subset_images.split_rows(0.8)
    train_tagged = train.tag_columns("label")
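To pin down which step triggers the warning, it may help to record warnings around a call instead of reading them from the console. The helper below is a minimal sketch using the standard `warnings` module and `pd.errors.PerformanceWarning`; the name `collect_performance_warnings` is hypothetical and not part of any library.

```python
import warnings

import pandas as pd


def collect_performance_warnings(func, *args, **kwargs):
    """Call func and return (result, list of recorded PerformanceWarnings)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be shown only once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    performance_warnings = [
        w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
    ]
    return result, performance_warnings
```

Under these assumptions, `_, found = collect_performance_warnings(pipeline)` followed by `print(len(found))` would show how many performance warnings the reproduction raised, and wrapping each step separately would narrow down the source.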