
perf: Faster plot_histograms and more reliable plots#659

Merged
lars-reimann merged 23 commits into Safe-DS:main from SamanHushi:main on May 1, 2024

@SamanHushi
Member

Fixes the slow histogram plotting: histograms are now plotted much faster.

Summary of Changes

  • Rewrote the entire function
  • The number of bins for numerical columns is now limited by a parameter (default 10) for better performance
  • Bins are now also sorted
  • Fixed the strange plots that some specific columns produced
  • Replaced the test snapshot images with new ones
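The binning idea from the summary above can be illustrated with a small plain-Python sketch. This is a hypothetical helper, not the Safe-DS implementation: the name `histogram_bins`, the parameter name `number_of_bins`, and the equal-width binning strategy are assumptions chosen for illustration. It shows why capping numerical columns at a fixed number of bins (10 by default) keeps the number of bars small no matter how many distinct values a column has, and it returns the bins in sorted order.

```python
from collections import Counter


def histogram_bins(values, number_of_bins=10):
    """Return (label, count) pairs for one column, in sorted bin order.

    Hypothetical helper (not the Safe-DS API): numeric columns are split into
    `number_of_bins` equal-width bins; any other column gets one bin per
    distinct value. Missing values (None) are ignored.
    """
    non_missing = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_missing
    )
    if non_missing and numeric:
        low, high = min(non_missing), max(non_missing)
        # Avoid a zero bin width for constant columns
        width = (high - low) / number_of_bins or 1
        counts = [0] * number_of_bins
        for v in non_missing:
            # Clamp the maximum value into the last bin instead of creating an extra one
            index = min(int((v - low) / width), number_of_bins - 1)
            counts[index] += 1
        # Bins are generated from lowest to highest edge, so they are already sorted
        return [
            (f"{low + i * width:.2f}-{low + (i + 1) * width:.2f}", counts[i])
            for i in range(number_of_bins)
        ]
    # Non-numeric column: one bin per distinct value, sorted by its string form
    counter = Counter(str(v) for v in non_missing)
    return sorted(counter.items())
```

For example, `histogram_bins(list(range(100)))` yields ten bins with ten values each (the first labelled `"0.00-9.90"`), while `histogram_bins(["b", "a", "b", None])` yields `[("a", 1), ("b", 2)]`. Equal-width bins are only one possible strategy; quantile-based bins would be an alternative.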

@SamanHushi SamanHushi requested a review from a team as a code owner April 30, 2024 11:12
@SamanHushi SamanHushi changed the title "Faster plot_histograms and more reliable plots" to "perf: Faster plot_histograms and more reliable plots" Apr 30, 2024
@codecov

codecov bot commented Apr 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (c1644b7) to head (4880bf9).
Report is 2 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff            @@
##              main      #659   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           66        66           
  Lines         4825      4843   +18     
=========================================
+ Hits          4825      4843   +18     
```


Member

@lars-reimann lars-reimann left a comment


It looks and performs much better than the previous implementation. Very nice!

I have a couple of suggestions and then this is good to be merged.

SamanHushi and others added 11 commits April 30, 2024 21:09
Shorten the internal selection of the number of bins

Co-authored-by: Lars Reimann <mail@larsreimann.com>
Use an internal function to get the number of columns

Co-authored-by: Lars Reimann <mail@larsreimann.com>
@lars-reimann
Member

Awesome, I'll have a last look tomorrow, thank you.

Member

@lars-reimann lars-reimann left a comment


Great work, thanks a lot!

@lars-reimann lars-reimann merged commit b5f0a12 into Safe-DS:main May 1, 2024
lars-reimann pushed a commit that referenced this pull request May 1, 2024
## [0.22.0](v0.21.0...v0.22.0) (2024-05-01)

### Features

* `is_fitted` is now always a property ([#662](#662)) ([b1db881](b1db881)), closes [#586](#586)
* add `Column.missing_value_count` ([#682](#682)) ([f084916](f084916)), closes [#642](#642)
* Add `InputConversion` & `OutputConversion` for nn interface ([#625](#625)) ([fd723f7](fd723f7)), closes [#621](#621)
* Add hash, eq and sizeof in ForwardLayer ([#634](#634)) ([72f7fde](72f7fde)), closes [#633](#633)
* allow using tables that already contain target for prediction ([#687](#687)) ([e9f1cfb](e9f1cfb)), closes [#636](#636)
* callback `Row.sort_columns` takes four parameters instead of two tuples ([#683](#683)) ([9c3e3de](9c3e3de)), closes [#584](#584)
* rename `group_rows_by` in `Table` to `group_rows` ([#661](#661)) ([c1644b7](c1644b7)), closes [#611](#611)
* rename `number_of_column` in `Row` to `number_of_columns` ([#660](#660)) ([0a08296](0a08296)), closes [#646](#646)
* rework `TaggedTable` ([#680](#680)) ([db2b613](db2b613)), closes [#647](#647)
* show missing value count/ratio in summarized statistics ([#684](#684)) ([74b8a35](74b8a35)), closes [#619](#619)
* specify `extras` instead of `features` in `to_tabular_dataset` ([#685](#685)) ([841657f](841657f)), closes [#623](#623)

### Bug Fixes

* actually use `kernel` of support vector machines for training ([#681](#681)) ([09c5082](09c5082)), closes [#602](#602)

### Performance Improvements

* Faster plot_histograms and more reliable plots ([#659](#659)) ([b5f0a12](b5f0a12))
@lars-reimann
Member

🎉 This PR is included in version 0.22.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label May 1, 2024