[IGNORE] FLINK-5911: Full flink_tools refactor (reference — will be split into smaller PRs)#4275
Draft
[IGNORE] FLINK-5911: Full flink_tools refactor (reference — will be split into smaller PRs)#4275
Conversation
Extract ~300 lines of monolithic formatting logic from _print_flink_status_from_job_manager() into 10 independently testable functions in flink_tools.py with TypedDict data structures. Improvements over previous attempt (PR #4165): - Defensive .get() for pod_status, phase, reason (prevents KeyError) - Null check on metadata before accessing labels - Preserves checkpoint counts and restart timestamps (PRs #4270/#4271) - Overview null guard for crashlooping jobmanager - Specific error messages per API call Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Ticket: https://jira.yelpcorp.com/browse/FLINK-5911
Summary
This PR is for reference only — it will be split into 4 smaller, reviewable PRs.
Refactors the monolithic
_print_flink_status_from_job_manager()function (~335 lines) instatus.pyinto modular, independently testable functions inflink_tools.py. This is an improved version of PR #4165 that was never merged, now updated to work with the checkpoint counts (#4270) and restart timestamps (#4271) that have since been merged.Motivation
The
_print_flink_status_from_job_manager()function handles 10 distinct concerns in a single function:This makes it hard to test individual pieces, hard to review changes, and easy to introduce bugs.
What Changed
New TypedDicts in
flink_tools.pyPodCounts— running, evicted, other, total pod countsJobCounts— running, finished, failed, cancelled, total job countsFlinkJobDetailsDict— aggregated state, pod counts, job counts, taskmanagers, slots, jobs list, overview availability flagFlinkInstanceDetails— config SHA, version, revision, dashboard URL, pool, team, runbookNew functions in
flink_tools.py(10 total)get_flink_instance_details()format_flink_instance_header()format_flink_instance_metadata()format_flink_config_links()format_flink_log_commands()paasta logscommands for service, taskmanager, jobmanager, supervisorformat_flink_monitoring_links()collect_flink_job_details().get()), job counts (with overview null guard), taskmanager/slot infoformat_flink_state_and_pods()get_flink_job_name()status.py— extracts job name from full Flink job nameformat_flink_jobs_table()Changes to
status.py_print_flink_status_from_job_manager()reduced from ~335 lines to ~80 linesget_flink_job_name()delegates toflink_tools.get_flink_job_name()shutil,groupby,get_runbook,FlinkJobsImprovements over PR #4165
.get()forpod_status,phase,reason— preventsKeyErroron malformed pod datametadata is not Nonebefore accessing labelsformat_flink_jobs_table()getattr(overview, "jobs_running", None) is Nonefor crashlooping jobmanager — shows "Jobs: unknown (jobmanager is not responding)"Test coverage
test_flink_tools.pyacross 10 test classes covering all new functionsFiles Updated
paasta_tools/flink_tools.py: +390 lines — TypedDicts + 10 new functionspaasta_tools/cli/cmds/status.py: -299 lines — simplified orchestrator, removed inline logictests/test_flink_tools.py: +439 lines — comprehensive unit tests for all new functionstests/cli/test_cmds_status.py: Updated error message assertions, config SHA format, output orderingPlanned split into 4 PRs
extract-metadata-helpersextract-pod-state-displayextract-job-displayorchestrator-cleanupTest plan
pytest tests/test_flink_tools.py— 42 tests passpytest tests/cli/test_cmds_status.py -k flink— 13 tests passmypy paasta_tools/flink_tools.py paasta_tools/cli/cmds/status.py— no issuespre-commit run --all-files— all checks pass.tox/py310-linux/bin/paasta status -s beam_happyhour -c pnw-devc -i main -vvv