Fix duplicate field error when partition column already exists in schema #587

Open
simbadzina wants to merge 1 commit into linkedin:master from simbadzina:sdzinama/fix-datepartition-duplication

simbadzina commented Mar 25, 2026

Summary

  • addPartitionColsToSchema() blindly appends partition columns without checking if they already exist in the schema, causing AvroRuntimeException: Duplicate field X in record
  • This can happen when a Hive view projects a partition column as a regular field — the schema already contains the field, and addPartitionColsToSchema() tries to add it again
  • The fix collects existing field names into a Set and skips partition columns that are already present
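The deduplication idea described above can be sketched in plain Java. This is only an illustration, not the code in this PR: the class `PartitionColumnDedupSketch` and the method `appendMissingPartitionCols` are hypothetical names, fields are modeled as plain strings rather than Avro `Schema.Field` objects, and exact (case-sensitive) name matching is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the dedup step; it does not use Avro or Coral APIs.
final class PartitionColumnDedupSketch {
  private PartitionColumnDedupSketch() {
  }

  /**
   * Returns the schema field names followed by every partition column name
   * that is not already present among them.
   */
  static List<String> appendMissingPartitionCols(List<String> schemaFields, List<String> partitionCols) {
    // Collect the names already in the schema so membership checks are cheap.
    Set<String> seen = new HashSet<>(schemaFields);
    List<String> result = new ArrayList<>(schemaFields);
    for (String col : partitionCols) {
      // Set.add returns false when the name is already known, so the column is skipped.
      if (seen.add(col)) {
        result.add(col);
      }
    }
    return result;
  }
}
```

For example, calling `appendMissingPartitionCols(List.of("id", "datepartition"), List.of("datepartition"))` leaves the field list unchanged, whereas passing `List.of("id")` as the schema fields appends `datepartition` at the end. In the real method the same check would presumably guard the construction of each new Avro field, which is what avoids the `Duplicate field X in record` exception.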

Changes

  • SchemaUtilities.addPartitionColsToSchema(): Added deduplication check before adding partition columns
  • SchemaUtilitiesTests: Added two tests — one verifying duplicates are skipped, one verifying normal partition column addition still works

Test plan

  • New test testAddPartitionColsToSchemaSkipsDuplicates fails without fix, passes with fix
  • New test testAddPartitionColsToSchemaAddsNewPartitionCol confirms normal behavior unchanged
  • Existing partition tests pass (testBaseTableWithPartition, testSelectStarWithPartition, testSelectPartitionColumn, testUnionSelectStarFromPartitionTable)

simbadzina changed the title from "[Coral-Schema] Fix duplicate field error when partition column alread…" to "Fix duplicate field error when partition column already exists in schema" on Mar 25, 2026
simbadzina force-pushed the sdzinama/fix-datepartition-duplication branch from 1d0d6bd to 477c6b6 on March 25, 2026 01:13
…y exists in schema

When a Hive view projects a partition column as a regular field in its schema,
addPartitionColsToSchema() would attempt to add it again, causing
AvroRuntimeException: "Duplicate field X in record". The fix skips partition
columns that already exist in the schema by name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
simbadzina force-pushed the sdzinama/fix-datepartition-duplication branch from 477c6b6 to 614ad37 on March 25, 2026 01:16
simbadzina marked this pull request as ready for review on March 25, 2026 01:23