performance problem: fragmented DataFrame after some table operations #574

@WinPlay02

Describe the bug

The following warning is printed after performing some operations on a Table:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
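
For context, the pandas-level pattern that the warning describes can be sketched as below. This is only an illustration under stated assumptions: the function names, column names, and sizes are hypothetical, and it does not show what safeds does internally (the issue does not say which Table operation adds the columns). It contrasts adding columns one at a time with joining them in a single `pd.concat(axis=1)` call, and shows the `frame.copy()` step the warning suggests.

```python
import warnings

import numpy as np
import pandas as pd

N_ROWS, N_COLS = 100, 150  # arbitrary illustrative sizes (assumption)


def build_by_insert() -> pd.DataFrame:
    """Add columns one at a time, the pattern the warning complains about."""
    frame = pd.DataFrame(index=range(N_ROWS))
    with warnings.catch_warnings():
        # pandas may emit a PerformanceWarning inside this loop; silence it here.
        warnings.simplefilter("ignore", pd.errors.PerformanceWarning)
        for i in range(N_COLS):
            frame[f"pixel{i}"] = np.full(N_ROWS, i)
    return frame


def build_by_concat() -> pd.DataFrame:
    """Create every column first, then join them all in one pd.concat call."""
    columns = [
        pd.Series(np.full(N_ROWS, i), name=f"pixel{i}") for i in range(N_COLS)
    ]
    return pd.concat(columns, axis=1)


fragmented = build_by_insert()
defragmented = fragmented.copy()  # the copy suggested by the warning text
joined = build_by_concat()
```

All three frames hold the same data; they differ only in how they were assembled.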

To Reproduce

  1. Load a Table with Table.from_csv_file
  2. Slice some rows out of the table with Table.slice_rows
  3. Split the Table with Table.split_rows
  4. Create a TaggedTable from the resulting table with Table.tag_columns

Expected behavior

These operations should not fragment the underlying DataFrame or slow down later steps such as fitting a model.

Screenshots (optional)

No response

Additional Context (optional)

For easier reproduction, this dataset was used: https://www.kaggle.com/competitions/digit-recognizer/overview

This code was used to reproduce the performance warning:

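from safeds.data.tabular.containers import Table

def pipeline():
    # Load the digit-recognizer training data.
    labeled_images = Table.from_csv_file("beginner_classification/train.csv")
    # Slice rows 0 to 5000, then split the result 80/20.
    subset_images = labeled_images.slice_rows(0, 5000)
    train, validate = subset_images.split_rows(0.8)
    # Tag the target column to get a TaggedTable.
    train_tagged = train.tag_columns("label")

pipeline()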


Status

✔️ Done
