Skip to content

Conversation

@oshogbo
Copy link
Contributor

@oshogbo oshogbo commented Jan 16, 2026

Description

This change modifies the behavior of spa_sync_time_logger when flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always be generated. On solid-state devices, especially when the pool was otherwise idle, this caused disks to wake up solely to write RRD data. Since RRD is best-effort telemetry, this behavior is unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG that already contains data is being synced. The RRD update is appended to that TXG instead of forcing the creation of a new write-only TXG.

During pool export, flushing is forced regardless of whether the TXG contains user data. At that stage, data durability takes precedence and a write must be issued.

This fixes #18082
This change was inspired from @amotin in comments #18120.

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]

How Has This Been Tested?

I have added logs to check when the database is flushed and what is the size of database.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure dp_dirty_pertxg at this point reliably means there is nothing to be written in this TXG. It may need a deeper look. But yea, this might be the direction.

Comment on lines +2157 to +2159
if (force ||
(txg > spa->spa_last_noted_txg &&
curtime >= spa->spa_last_noted_txg_time + spa_note_txg_time)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition does not look right. We should not care about force if we already logged the txg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured it’s better to log during export, but we can omit it if you prefer so.

Copy link
Member

@amotin amotin Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this condition is wrong. I haven't looked whether on export txg can happen to be equal to spa->spa_last_noted_txg, but if it can, force will make it to insert a duplicate value. I suppose it should be:
txg > spa->spa_last_noted_txg && (force || curtime >= spa->spa_last_noted_txg_time + spa_note_txg_time).

This change modifies the behavior of spa_sync_time_logger when
flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always
be generated. On solid-state devices, especially when the pool was
otherwise idle, this caused disks to wake up solely to write RRD
data. Since RRD is best-effort telemetry, this behavior is
unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG
that already contains data is being synced. The RRD update is
appended to that TXG instead of forcing the creation of
a new write-only TXG.

During pool export, flushing is forced regardless of whether
the TXG contains user data. At that stage, data durability takes
precedence and a write must be issued.

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[2.4] TXG timestamp DB sync if idle causes unnecessary disk access/prevent spin down

2 participants