feat(coordinator): add distributed coordination system for multi-instance Fess #3101
Merged
…ance Fess

Add CoordinatorHelper to enable inter-instance coordination via OpenSearch, preventing concurrent execution of maintenance operations (reindex, config rebuild, etc.) across multiple Fess instances connected to the same cluster.

Key features:
- Instance heartbeat registration with TTL-based liveness detection
- Operation state management using op_type=create for distributed mutex
- Event publishing/consumption for inter-instance notifications
- Periodic polling loop (60s) for heartbeat, event consumption, and cleanup

Also adds:
- SystemHelper.getInstanceId() with hostname+PID for process uniqueness
- fess_config.coordinator index with single-shard atomicity guarantee
- Lock control in AdminMaintenanceAction for all maintenance operations
- Ownership-verified lock release with optimistic concurrency control
- Error messages in 17 languages
- Comprehensive unit tests (134 tests)
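The mutex idea in the commit message above (a create-only write that fails if the document already exists, plus a release that checks ownership) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `OperationLockSketch`, `tryStart`, `complete`, and the instance ID strings are hypothetical names, and a `ConcurrentMap` stands in for the OpenSearch index. It is not the code from this pull request.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a create-only lock document (hypothetical, for illustration). */
final class OperationLockSketch {

    /** One lock document: records which instance holds the operation. */
    static final class LockDoc {
        final String ownerId;

        LockDoc(String ownerId) {
            this.ownerId = ownerId;
        }
    }

    // Assumption: the map plays the role of the coordinator index.
    private final ConcurrentMap<String, LockDoc> docs = new ConcurrentHashMap<>();

    /** Create-only write: succeeds only when no document exists for the operation. */
    boolean tryStart(String operation, String instanceId) {
        return docs.putIfAbsent(operation, new LockDoc(instanceId)) == null;
    }

    /**
     * Ownership-verified release. Returns true when the lock is gone afterwards
     * (including a repeated call), false when another instance holds it.
     */
    boolean complete(String operation, String instanceId) {
        LockDoc current = docs.get(operation);
        if (current == null) {
            return true; // already released: a double call is harmless
        }
        if (!current.ownerId.equals(instanceId)) {
            return false; // never delete a lock owned by someone else
        }
        // Compare-and-delete: removes only if the stored document is still the one we read.
        return docs.remove(operation, current);
    }
}
```

In this model `putIfAbsent` plays the part of the create-only write and `remove(key, value)` plays the part of the conditional delete: a release only succeeds if the stored document is unchanged since it was read.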
- Add retry limit (coordinator.operation.retry) to prevent infinite recursion in tryCleanupAndRetry, configurable via fess_config.properties
- Unify lock release to use completeOperation in all code paths for idempotent and safe double-call behavior
- Add refresh=true to sendHeartbeat for immediate search visibility
- Use typed FessConfig accessors instead of raw string key access
- Fix event time tracking to avoid same-millisecond event loss
- Remove unused createdBy field from coordinator index mapping
- Add Javadoc to all public/protected methods and data classes
- Add 23 unit tests covering retry logic, lock safety, poll loop, config accessors, and event time advancement
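The retry limit in the first bullet above can be pictured as follows. This is a hedged sketch rather than the actual tryCleanupAndRetry implementation: it uses a bounded loop instead of recursion, an in-memory map instead of OpenSearch, and hypothetical names (`BoundedRetrySketch`, `maxRetries`, `ttlMillis`). It shows how a stale lock can be cleaned up and the acquisition retried a limited number of times.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Bounded cleanup-and-retry around a create-only lock (hypothetical, for illustration). */
final class BoundedRetrySketch {

    /** Lock document with an expiry time derived from an operation TTL. */
    static final class Lock {
        final String ownerId;
        final long expiresAtMillis;

        Lock(String ownerId, long expiresAtMillis) {
            this.ownerId = ownerId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<String, Lock> locks = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final int maxRetries;

    BoundedRetrySketch(long ttlMillis, int maxRetries) {
        this.ttlMillis = ttlMillis;
        this.maxRetries = maxRetries;
    }

    /** Tries to acquire the lock, cleaning up an expired holder at most maxRetries times. */
    boolean tryStart(String operation, String ownerId, long nowMillis) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Lock fresh = new Lock(ownerId, nowMillis + ttlMillis);
            Lock existing = locks.putIfAbsent(operation, fresh);
            if (existing == null) {
                return true; // create-only write succeeded
            }
            if (existing.expiresAtMillis > nowMillis) {
                return false; // a live holder exists, so do not retry
            }
            // Holder expired: remove it only if it is still the document we saw,
            // then loop to retry. Another instance may win the race; the bound
            // guarantees termination either way.
            locks.remove(operation, existing);
        }
        return false; // retry budget exhausted
    }
}
```

With `maxRetries` set to 0 the method still removes an expired lock but does not retry, so the caller would have to try again later; any positive value lets a single call both clean up and acquire.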
Remove failOperation method that simply delegates to completeOperation, reorder app.xml components alphabetically, and replace 22 test cases in CoordinatorHelperTest that were not actually testing CoordinatorHelper (testing Java arithmetic, FessConfig constants, or duplicating logic) with proper tests using mocked CurlHelper to verify actual method behavior.
Summary
Add CoordinatorHelper to enable inter-instance coordination via OpenSearch, preventing concurrent execution of maintenance operations across multiple Fess instances connected to the same cluster.
Changes Made
- op_type=create for atomic lock acquisition
- if_seq_no/if_primary_term optimistic concurrency
- errors.operation_already_running in 17 languages
- coordinator.poll.interval, coordinator.heartbeat.ttl, coordinator.operation.ttl, coordinator.event.ttl
Testing
Breaking Changes
Additional Notes
number_of_shards: 1 to guarantee op_type=create atomicity on the same primary shard