Fix duplicate field error when partition column already exists in schema #587
Open
simbadzina wants to merge 1 commit into linkedin:master from
…y exists in schema

When a Hive view projects a partition column as a regular field in its schema, addPartitionColsToSchema() would attempt to add it again, causing AvroRuntimeException: "Duplicate field X in record". The fix skips partition columns that already exist in the schema by name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- addPartitionColsToSchema() blindly appends partition columns without checking whether they already exist in the schema, causing AvroRuntimeException: Duplicate field X in record
- This happens when a Hive view projects a partition column as a regular field in its schema: addPartitionColsToSchema() tries to add it again
- The fix collects the existing field names into a Set and skips partition columns that are already present

Changes
- SchemaUtilities.addPartitionColsToSchema(): Added deduplication check before adding partition columns
- SchemaUtilitiesTests: Added two tests: one verifying duplicates are skipped, one verifying normal partition column addition still works

Test plan
- testAddPartitionColsToSchemaSkipsDuplicates fails without fix, passes with fix
- testAddPartitionColsToSchemaAddsNewPartitionCol confirms normal behavior unchanged
- Related partition tests (testBaseTableWithPartition, testSelectStarWithPartition, testSelectPartitionColumn, testUnionSelectStarFromPartitionTable)
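
For reviewers who want the idea without opening the diff, here is a minimal sketch of the name-based deduplication described in the summary. It is not the code from this PR. The class PartitionColDedupSketch, the Field stand-in, and the addPartitionCols and names helpers are hypothetical, and it uses plain Java collections instead of Avro's schema types so that it runs without dependencies. It also assumes exact, case-sensitive name matching, which may differ from what the actual change does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the SchemaUtilities code from the PR.
class PartitionColDedupSketch {

    // Stand-in for a schema field: only the name and a type label matter here.
    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Returns the schema fields followed by every partition column whose name
    // is not already present. Matching is exact and case-sensitive (assumption).
    static List<Field> addPartitionCols(List<Field> schemaFields, List<Field> partitionCols) {
        Set<String> existingNames = new HashSet<>();
        List<Field> result = new ArrayList<>();
        for (Field field : schemaFields) {
            existingNames.add(field.name);
            result.add(field);
        }
        for (Field partitionCol : partitionCols) {
            // Set.add returns false when the name is already present,
            // so a duplicate partition column is skipped instead of appended.
            if (existingNames.add(partitionCol.name)) {
                result.add(partitionCol);
            }
        }
        return result;
    }

    // Collects field names in order, for printing and checks.
    static List<String> names(List<Field> fields) {
        List<String> out = new ArrayList<>();
        for (Field field : fields) {
            out.add(field.name);
        }
        return out;
    }

    public static void main(String[] args) {
        // A view schema that already projects the partition column "datepartition".
        List<Field> schemaFields = Arrays.asList(
            new Field("id", "long"),
            new Field("datepartition", "string"));
        List<Field> partitionCols = Arrays.asList(
            new Field("datepartition", "string"),
            new Field("region", "string"));

        // prints [id, datepartition, region]
        System.out.println(names(addPartitionCols(schemaFields, partitionCols)));
    }
}
```

Without the existingNames check, the second loop would append datepartition a second time, which is the situation that makes Avro reject the record with the duplicate-field error quoted above.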