Skip to content

Conversation

@gaozhangmin
Copy link
Contributor

@gaozhangmin gaozhangmin commented Dec 19, 2025

Motivation

The current asyncFindPosition method has performance issues: it performs binary search by reading all entries from the beginning based on offset to find the position. This implementation causes several problems when dealing with large amounts of data:

  1. High Latency: Requires traversing and reading large amounts of unrelated ledgers and entries
  2. High Traffic: Reads unnecessary data from BookKeeper
  3. Resource Waste: CPU and network resources are consumed by ineffective search operations

This problem becomes particularly severe in Pulsar topics with large amounts of historical data, significantly impacting consumer startup speed and overall performance.

Modifications

This PR optimizes the asyncFindPosition method through the following approaches:

  1. Record index information during ledger creation:

    • Write the current ManagedLedgerInterceptor index into LedgerInfo
    • Store it as the relative position of firstEntryIndex
  2. Optimize search logic:

    • Pre-judge the size relationship between the target offset and firstEntryIndex in asyncFindPosition
    • Decide whether to search the ledger based on the comparison result
    • Skip ledgers that don't contain the target offset and proceed to the next one
  3. Reduce unnecessary data reads:

    • Avoid reading ledger data that obviously doesn't contain the target position
    • Significantly reduce data transfer from BookKeeper
    • Substantially decrease search latency

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added unit tests in ManagedLedgerInterceptorImplTest including:
    • testSetFirstEntryIndex: Verifies proper setting of firstEntryIndex during ledger creation
    • testFindPositionByOffsetWithMissingFirstEntryIndex: Tests backward compatibility when firstEntryIndex is not available
    • testFindPositionByOffset: Tests optimized position finding with various offset scenarios

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API - Added firstEntryIndex field to LedgerInfo metadata structure
  • The schema - Modified LedgerInfo protobuf schema to include firstEntryIndex
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints - Position lookup performance improvements may affect REST API response times
  • The admin CLI options
  • The metrics - Added new metrics for search optimization hit/miss rates
  • Anything that affects deployment - LedgerInfo schema changes require careful rollout strategy

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

gaozhangmin#13

@github-actions
Copy link

@gaozhangmin Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@gaozhangmin gaozhangmin self-assigned this Dec 19, 2025
@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Dec 19, 2025
@gaozhangmin gaozhangmin removed the doc-required Your PR changes impact docs and you will update later. label Dec 19, 2025
@github-actions github-actions bot added doc-label-missing doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Dec 19, 2025
@dao-jun
Copy link
Member

dao-jun commented Dec 22, 2025

this feat should completed in the downstream project, I added PIP-404 to introduce per-ledger properties, the downstream project could add the first entry index to the ledger-info

@gaozhangmin
Copy link
Contributor Author

gaozhangmin commented Dec 22, 2025

this feat should completed in the downstream project, I added PIP-404 to introduce per-ledger properties, the downstream project could add the first entry index to the ledger-info

No,the firstEntryIndex is recorded when new ledger created.

@dao-jun
Copy link
Member

dao-jun commented Dec 22, 2025

The feat is mostly for kop, as it is a protocol plugin, I don't think add the logic to the pulsar-core is a good idea.

@gaozhangmin
Copy link
Contributor Author

The feat is mostly for kop, as it is a protocol plugin, I don't think add the logic to the pulsar-core is a good idea.

I see your point, but since asyncFindPosition exists in Pulsar, perhaps we should focus on how to optimize it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/broker doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants