[pixels-cli, common] Implement Bucket-Based Data Classification for Distributed Loading #1218


1. Refactor PixelsConsumer:

Introduce an abstract base class (AbstractPixelsConsumer) to handle common initialization and cleanup.

Create a concrete subclass (IndexedPixelsConsumer) dedicated to handling loads where a Primary Index exists.

Create a simple subclass (SimplePixelsConsumer) for loads without an index, preserving the existing sequential logic.
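A minimal sketch of this hierarchy, assuming a template-method design: the base class owns initialization and cleanup, and subclasses only decide how each row is written. Rows are modeled as `String[]` and the writer is replaced by an in-memory list. The method names `consume`, `writeRow`, and `flush` are hypothetical and are not the existing PixelsConsumer API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: shared lifecycle lives here, per-row behaviour in subclasses.
abstract class AbstractPixelsConsumer implements AutoCloseable
{
    private boolean initialized = false;

    /** Common initialization (in real code: metadata and storage setup). */
    protected void init()
    {
        initialized = true;
    }

    /** Template method: initialize once, delegate each row, then flush. */
    public final void consume(List<String[]> rows)
    {
        if (!initialized)
        {
            init();
        }
        for (String[] row : rows)
        {
            writeRow(row);
        }
        flush();
    }

    protected abstract void writeRow(String[] row);

    protected abstract void flush();

    /** Common cleanup shared by all subclasses. */
    @Override
    public void close()
    {
        initialized = false;
    }
}

// Loads without an index keep sequential behaviour: one writer, rows in arrival order.
class SimplePixelsConsumer extends AbstractPixelsConsumer
{
    final List<String[]> written = new ArrayList<>();
    int flushes = 0;

    @Override
    protected void writeRow(String[] row)
    {
        written.add(row); // stands in for appending to a single PixelsWriter
    }

    @Override
    protected void flush()
    {
        flushes++;
    }
}
```

Under this design, IndexedPixelsConsumer would override the same two hooks but dispatch each row to a per-bucket writer instead of a single one.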

2. Bucket-Based Routing Logic:

In IndexedPixelsConsumer, maintain a map to track active writers: Map<Integer, PerBucketWriter>.

For every incoming data row:

- Compute the row's bucketId from the hash of its primary key.

- Use the bucketId to look up the corresponding PerBucketWriter state object.

- If no writer exists for that bucketId, lazily create a new PixelsWriter and a temporary file for it.
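The per-row routing steps can be sketched as follows. This assumes `hashCode()` modulo a fixed bucket count as the hash; the real loader must use the same hash the cluster uses to assign buckets to nodes, otherwise rows would be written for the wrong node. `BucketRouter`, `bucketOf`, the temporary-file naming, and the fields of `PerBucketWriter` are hypothetical placeholders; a real `PerBucketWriter` would wrap a `PixelsWriter` and its temporary file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-bucket state; real code would hold a PixelsWriter and its temp file.
final class PerBucketWriter
{
    final int bucketId;
    final String tmpFile;
    long rowCount = 0;

    PerBucketWriter(int bucketId, String tmpFile)
    {
        this.bucketId = bucketId;
        this.tmpFile = tmpFile;
    }

    void addRow(String[] row)
    {
        rowCount++; // placeholder for writing the row through the PixelsWriter
    }
}

final class BucketRouter
{
    private final int numBuckets;
    private final Map<Integer, PerBucketWriter> writers = new HashMap<>();

    BucketRouter(int numBuckets)
    {
        this.numBuckets = numBuckets;
    }

    /** Maps a primary key to a bucket in [0, numBuckets) using a placeholder hash. */
    static int bucketOf(Object primaryKey, int numBuckets)
    {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(primaryKey.hashCode(), numBuckets);
    }

    void route(String[] row, int pkColumn)
    {
        int bucketId = bucketOf(row[pkColumn], numBuckets);
        // Create the writer (and its temp file) only when a bucket sees its first row
        PerBucketWriter writer = writers.computeIfAbsent(bucketId,
                id -> new PerBucketWriter(id, "bucket-" + id + ".tmp"));
        writer.addRow(row);
    }

    Map<Integer, PerBucketWriter> writers()
    {
        return writers;
    }
}
```

The map is a plain `HashMap`, which assumes one consumer thread owns the router; a `ConcurrentHashMap` would be needed if several threads shared it.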

3. Core Dependency: Node Mapping Cache:

Implement BucketToNodeCache, a small, thread-safe, lazily initialized singleton that maps a bucketId to the RetinaNodeInfo responsible for it. Caching these mappings avoids repeatedly querying the NodeService for node assignments during high-throughput loading.
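One way to build such a cache, sketched under assumptions: the NodeService query is abstracted as a `Function<Integer, RetinaNodeInfo>`, and `RetinaNodeInfo` here is a hypothetical stand-in holding only a host and port. The singleton uses double-checked locking on a `volatile` field for lazy, thread-safe creation, and `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per absent key, so concurrent misses for the same bucket trigger a single lookup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the real RetinaNodeInfo.
final class RetinaNodeInfo
{
    final String host;
    final int port;

    RetinaNodeInfo(String host, int port)
    {
        this.host = host;
        this.port = port;
    }
}

final class BucketToNodeCache
{
    private static volatile BucketToNodeCache instance;

    private final ConcurrentHashMap<Integer, RetinaNodeInfo> cache = new ConcurrentHashMap<>();
    private final Function<Integer, RetinaNodeInfo> lookup; // e.g. a NodeService query

    private BucketToNodeCache(Function<Integer, RetinaNodeInfo> lookup)
    {
        this.lookup = lookup;
    }

    /** Lazily creates the singleton; only the lookup from the first call is used. */
    static BucketToNodeCache getInstance(Function<Integer, RetinaNodeInfo> lookup)
    {
        BucketToNodeCache local = instance;
        if (local == null)
        {
            synchronized (BucketToNodeCache.class)
            {
                local = instance;
                if (local == null)
                {
                    instance = local = new BucketToNodeCache(lookup);
                }
            }
        }
        return local;
    }

    /** Returns the cached node, querying the lookup only on a miss. */
    RetinaNodeInfo getNode(int bucketId)
    {
        return cache.computeIfAbsent(bucketId, lookup);
    }

    /** Hypothetical hook for when bucket ownership changes. */
    void invalidate(int bucketId)
    {
        cache.remove(bucketId);
    }
}
```

The issue does not say how stale mappings are detected, so `invalidate` is only a placeholder for whatever refresh policy the NodeService supports.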

4. Distributed Indexing:

Ensure that the index entries generated by IndexedPixelsConsumer are routed to the correct IndexService instance, for example the one identified by the RetinaNodeInfo obtained from BucketToNodeCache.
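A sketch of this routing step, assuming index entries are buffered and flushed in batches grouped by the node that owns their bucket. `IndexEntry`, `IndexClient`, and `IndexRouter` are hypothetical; the issue does not specify the IndexService API. Nodes are keyed by an address string rather than by `RetinaNodeInfo`, because this sketch cannot rely on that class's `equals`/`hashCode`. In practice `bucketToNode` could be backed by BucketToNodeCache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical index entry: the primary key and the row location it points to.
final class IndexEntry
{
    final int bucketId;
    final String key;
    final long rowId;

    IndexEntry(int bucketId, String key, long rowId)
    {
        this.bucketId = bucketId;
        this.key = key;
        this.rowId = rowId;
    }
}

// Hypothetical client for one IndexService instance.
interface IndexClient
{
    void putEntries(List<IndexEntry> entries);
}

final class IndexRouter
{
    private final Function<Integer, String> bucketToNode;     // bucketId -> node address
    private final Function<String, IndexClient> clientForNode; // node address -> client

    IndexRouter(Function<Integer, String> bucketToNode, Function<String, IndexClient> clientForNode)
    {
        this.bucketToNode = bucketToNode;
        this.clientForNode = clientForNode;
    }

    /** Groups entries by owning node and sends one batch to each node. */
    void flush(List<IndexEntry> entries)
    {
        Map<String, List<IndexEntry>> byNode = new HashMap<>();
        for (IndexEntry e : entries)
        {
            String node = bucketToNode.apply(e.bucketId);
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).add(e);
        }
        for (Map.Entry<String, List<IndexEntry>> batch : byNode.entrySet())
        {
            clientForNode.apply(batch.getKey()).putEntries(batch.getValue());
        }
    }
}
```

Batching per node keeps the number of remote calls proportional to the number of nodes rather than the number of rows.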
