Upgrade Hive 1.2.2 → 2.3.9 #585

Open
YogeshKothari26 wants to merge 4 commits into linkedin:master from YogeshKothari26:pr2-hive-239

@YogeshKothari26 commented Mar 17, 2026

What changes are proposed in this pull request, and why are they necessary?

This PR upgrades Coral's Hive dependency from 1.2.2 to 2.3.9.

Part 2 of 3: #580 Gradle → this PR (Hive) → Java 17 (WIP). This PR stays on Java 8.

Dependencies

  • Hive 1.2.2 → 2.3.9
  • Added hive-serde (split from hive-metastore in 2.3.9)
  • DataNucleus/Derby/javax.jdo test deps consolidated in root build.gradle
  • Pentaho exclusion (not in Maven Central)
  • Hive CBO disabled via systemProperty (CalcitePlanner incompatible with Calcite 1.21.0.265)
  • Excluded problematic transitives from published POM to protect downstream consumers
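The PR does not name the system property it sets to disable CBO. Assuming it is Hive's hive.cbo.enable switch (HiveConf overlays JVM system properties whose names match its config variables), a test setup hook could fail fast when the flag is missing, for example when a test is launched from an IDE instead of through the Gradle test task that sets it. A minimal sketch; CboGuard is a hypothetical name, not a Coral class:

```java
// Hypothetical helper, not part of Coral. Assumes the Gradle test task sets
// systemProperty 'hive.cbo.enable', 'false' for the forked test JVM.
final class CboGuard {
  // Assumed property name: Hive's cost-based-optimizer switch.
  static final String CBO_PROPERTY = "hive.cbo.enable";

  private CboGuard() {}

  // True only when the property is explicitly set to "false" (case-insensitive).
  static boolean cboDisabled() {
    return "false".equalsIgnoreCase(System.getProperty(CBO_PROPERTY));
  }

  // Throws an actionable message instead of letting CalcitePlanner fail later.
  static void requireCboDisabled() {
    if (!cboDisabled()) {
      throw new IllegalStateException(
          CBO_PROPERTY + " must be 'false': Hive 2.3.9's CalcitePlanner is "
              + "incompatible with Calcite 1.21.0.265");
    }
  }
}
```

Calling CboGuard.requireCboDisabled() from a test setup hook turns an obscure planner failure into a one-line explanation.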

Source fixes

  • StaticHiveFunctionRegistry: No changes to function registrations. All existing functions (including TRANSLATE3, substr/substring) are compatible with Hive 2.3.9.
  • MetastoreProvider: getProxy(conf) → getProxy(conf, true) (single-arg overload removed in 2.3.9)

coral-spark-catalog compatibility

After merging master (which added coral-spark-catalog via #584), the new module's tests needed Hive 2.3.9 compatibility fixes:

  • Added DataNucleus/Derby/hive-serde test deps (same as other modules)
  • Excluded Jackson from hive-metastore and hive-exec-core (Hive 2.3.9 brings in Jackson 2.6.x, which conflicts with Spark 3.5's Jackson 2.15)
  • Disabled Hive CBO (same Calcite conflict as other modules)
  • Test-only changes — no impact on published artifacts

Tests disabled

  • testEnumUnionString — Hive 2.3.9 SemanticAnalyzer throws AssertionError in UnparseTranslator.addTranslation during CREATE VIEW with UNION ALL between Avro enum and string columns

HMS API Changelog: 1.2.2 → 2.3.9

Core read APIs — UNCHANGED:

getTable, getDatabase, getTables, getAllDatabases, getAllTables, getFields, getSchema, listPartitions, getPartitionsByNames, listPartitionNames, listPartitionsByFilter, tableExists — all identical signatures.

Breaking changes (4, all handled internally by Coral):

  • RetryingMetaStoreClient.getProxy(HiveConf) — removed, replaced by getProxy(HiveConf, boolean)
  • getProxy(): the Map parameter of the multi-argument overload is now typed ConcurrentHashMap
  • dropPartitions() ignoreProtection param removed
  • getPartition() param order swapped (tblName, dbName → dbName, tblName)
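Coral now calls the two-argument getProxy directly. A project that calls the metastore client itself and must run against either Hive line without recompiling (outside the scope of this PR) could instead probe for the overload reflectively. A minimal sketch; OverloadProbe is a hypothetical name:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper: reports whether a public static overload exists.
final class OverloadProbe {
  private OverloadProbe() {}

  // Returns the public static method name(params...) declared on or inherited
  // by owner, or null when that exact overload is absent or is not static.
  static Method findStatic(Class<?> owner, String name, Class<?>... params) {
    try {
      Method m = owner.getMethod(name, params);
      return Modifier.isStatic(m.getModifiers()) ? m : null;
    } catch (NoSuchMethodException e) {
      return null;
    }
  }
}
```

Per the changelog above, findStatic(RetryingMetaStoreClient.class, "getProxy", HiveConf.class, boolean.class) should return a method on Hive 2.3.9 (invoked as m.invoke(null, conf, true)), while the single-argument lookup returns null. A pure parameter reorder such as getPartition's cannot be detected this way when the reordered parameters share a type (as table and database names typically do); that case needs a source review instead.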

New methods: 25 additive (constraints, bulk metadata, partition values, etc.)

Thrift: 0.9.2 → 0.9.3 (wire compatible)

Full source comparison

Transitive dependency analysis

Compared coral-common compile classpath: master (146 artifacts) → this PR (221 artifacts).

+102 new (from Hive 2.3.9 dependency tree), -27 removed (from Hive 1.2.2).
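The counts are consistent: 146 + 102 - 27 = 221. A diff like this can be reproduced from two dependency listings. A minimal sketch, assuming each classpath has already been reduced to a collection of group:artifact strings (the input format and the ClasspathDiff name are assumptions, not from the PR):

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: set difference between two resolved classpaths.
// Inputs are "group:artifact" coordinates without versions, so a plain
// version bump counts as unchanged rather than as one add plus one remove.
final class ClasspathDiff {
  final SortedSet<String> added;
  final SortedSet<String> removed;

  ClasspathDiff(Collection<String> before, Collection<String> after) {
    added = new TreeSet<>(after);
    added.removeAll(before);   // present only in the new classpath
    removed = new TreeSet<>(before);
    removed.removeAll(after);  // present only in the old classpath
  }

  // For duplicate-free inputs: |after| == |before| + |added| - |removed|.
  static int expectedAfterSize(int beforeSize, int addedCount, int removedCount) {
    return beforeSize + addedCount - removedCount;
  }
}
```

Keeping versions out of the coordinates is the design choice that matters here: with versions included, every upgraded artifact would inflate both the "new" and "removed" counts.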

Excluded transitives — verified NOT leaking in published POM:

| Artifact | Why excluded |
|---|---|
| org.apache.logging.log4j:log4j-core | CVE-2021-44228 (Log4Shell) |
| org.eclipse.jetty.orbit:javax.servlet | Conflicts with consumer servlet APIs |
| org.slf4j:slf4j-log4j12 | Conflicts with consumer SLF4J bindings |
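One way to spot-check that the exclusions reach the published POM is to parse the generated POM and collect every exclusion coordinate. A minimal sketch using only the JDK's DOM parser; it assumes the exclusions are declared under the Hive dependency entries of coral-common's POM, and PomExclusions is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists groupId:artifactId of every <exclusion> in a POM.
final class PomExclusions {
  private PomExclusions() {}

  static SortedSet<String> exclusions(String pomXml) throws Exception {
    // The factory is not namespace-aware by default, so unprefixed tag names
    // match the POM 4.0.0 elements directly.
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
    NodeList nodes = doc.getElementsByTagName("exclusion");
    SortedSet<String> out = new TreeSet<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      Element exclusion = (Element) nodes.item(i);
      out.add(childText(exclusion, "groupId") + ":" + childText(exclusion, "artifactId"));
    }
    return out;
  }

  // Coordinates from required that no <exclusion> in the POM covers.
  static SortedSet<String> missing(String pomXml, String... required) throws Exception {
    SortedSet<String> result = new TreeSet<>(Arrays.asList(required));
    result.removeAll(exclusions(pomXml));
    return result;
  }

  private static String childText(Element parent, String tag) {
    NodeList list = parent.getElementsByTagName(tag);
    return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
  }
}
```

This only proves the exclusions are declared. An artifact can still arrive through another dependency path that lacks the exclusion, so resolving a throwaway consumer project's classpath is the stronger check; wildcard (*) exclusions would also need extra handling.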

Shadow JAR: coral-trino-parser verified identical to master — 3,450 entries, zero differences.
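Comparing entry names alone would miss a class whose bytes changed; comparing each entry's CRC-32 as well catches that. A minimal sketch of such a comparison using java.util.zip (the PR does not say how its comparison was performed; JarDiff is a hypothetical name):

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: compares two JARs entry by entry.
final class JarDiff {
  private JarDiff() {}

  // Maps each entry name to the CRC-32 recorded in the archive's central directory.
  static Map<String, Long> crcByEntry(String jarPath) throws IOException {
    Map<String, Long> crcs = new TreeMap<>();
    try (ZipFile zip = new ZipFile(jarPath)) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        crcs.put(entry.getName(), entry.getCrc());
      }
    }
    return crcs;
  }

  // Entries present in only one JAR, or present in both with different CRCs.
  static SortedSet<String> differingEntries(String jarA, String jarB) throws IOException {
    Map<String, Long> a = crcByEntry(jarA);
    Map<String, Long> b = crcByEntry(jarB);
    Set<String> names = new HashSet<>(a.keySet());
    names.addAll(b.keySet());
    SortedSet<String> diff = new TreeSet<>();
    for (String name : names) {
      if (!Objects.equals(a.get(name), b.get(name))) {
        diff.add(name);
      }
    }
    return diff;
  }
}
```

An empty result from differingEntries(masterJar, prJar) corresponds to "zero differences". Entry timestamps are ignored, which is usually what you want when comparing two independent builds.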

How was this patch tested?

PR2 branch (pr2-hive-239 — Java 8, Hive 2.3.9)

| Phase | Details |
|---|---|
| Phase 1: Local build | ./gradlew clean build; all tests pass |
| Phase 3: Translation regression | Zero regressions (Trino SQL + Spark SQL) |
| Phase 4: Trino i-test | No new enforcer violations |
| Phase 5A: SDK/DSM integration | Verified via dali-data-sdk snapshot chain |
| Phase 5B: HadoopJavaJob | 9/10 pass, zero regressions vs baseline |
| Phase 6: Trino compat | Shadow JAR identical, excluded deps not leaking |

Combined branch (java17-hive239-migration — Java 17, Hive 2.3.9, Gradle 8)

| Phase | Details |
|---|---|
| Phase 1: Local build | All tests pass |
| Phase 2: Translation regression | All 4 languages |
| Phase 3: Spark/Trino DAG regression | Zero regressions across 4028 views |
| Phase 4: Trino i-test | No new incompatibilities |
| Phase 5A: SDK/DSM integration | Verified via dali-data-sdk snapshot chain |
| Phase 5B: HadoopJavaJob | 9/10 pass, zero regressions |
| Phase 6: Trino compat | No new incompatibilities |

Phase 3 — Full Regression (4028 production views, same-day comparison)

| Metric | Baseline (master) | Migration (combined) |
|---|---|---|
| Success | 3192 | 3199 |
| Failed | 836 | 829 |
| Nullability diffs | | 0 |
| New failures | | 0 |
| Trino SQL regressions | | 0 |
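The table is internally consistent: both runs cover all 4028 views (3192 + 836 and 3199 + 829). Because success_after = success_before - new_failures + fixed, zero new failures means the net gain of 7 successes is exactly 7 views that failed on master and now translate. A small sketch of that bookkeeping (RegressionRun is a hypothetical name):

```java
// Hypothetical bookkeeping for one regression run over a fixed set of views.
final class RegressionRun {
  final int success;
  final int failed;

  RegressionRun(int success, int failed) {
    this.success = success;
    this.failed = failed;
  }

  int total() {
    return success + failed;
  }

  // Views that failed on baseline but succeed on migration, derived from
  // success_after = success_before - newFailures + fixed.
  static int fixedViews(RegressionRun baseline, RegressionRun migration, int newFailures) {
    if (baseline.total() != migration.total()) {
      throw new IllegalArgumentException("runs must cover the same views");
    }
    return migration.success - baseline.success + newFailures;
  }
}
```

With the numbers above, fixedViews(new RegressionRun(3192, 836), new RegressionRun(3199, 829), 0) returns 7.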

Test Plan (LinkedIn Internal) | Test Evidence (LinkedIn Internal)

- Hive version 1.2.2 → 2.3.9 in dependencies.gradle
- Added hive-serde (split from hive-metastore in 2.3.9)
- Added DataNucleus/Derby/javax.jdo test deps (consolidated in root build.gradle)
- Added calcite-druid test dep (Hive 2.3.9 references DruidQuery at runtime)
- Pentaho exclusion (not in Maven Central)
- Hive CBO disabled via systemProperty (CalcitePlanner incompatible with Calcite 1.21.0.265)
- Excluded problematic Hive 2.3.9 transitives from coral-common (log4j-core, javax.servlet
  from Jetty orbit, slf4j-log4j12) to prevent conflicts in downstream consumers like Trino
- coral-service: use deps map instead of hardcoded Hive/Hadoop versions
- StaticHiveFunctionRegistry: disabled TRANSLATE3, split substr/substring
- MetastoreProvider: getProxy(conf) → getProxy(conf, true) (API change in 2.3.9)
- Disabled 3 tests: testEnumUnionString, testTranslateFunction (x2)

@YogeshKothari26 marked this pull request as ready for review March 17, 2026 12:27

Resolve conflict in gradle/dependencies.gradle: keep both spark3.5
entries from master and Hive 2.3.9 DataNucleus/Derby deps from PR2.
coral-spark-catalog (PR linkedin#584) tests fail on the Hive 2.3.9 branch
because:

1. avatica-1.8.0.jar (via calcite-druid) bundles jackson-databind
   2.6.3 un-relocated, shadowing Spark 3.5's jackson-databind 2.15.2
   and causing NoSuchMethodError on JsonMappingException(Closeable,
   String) which was added in jackson-databind 2.7.0.

2. Hive 2.3.9 embedded metastore requires DataNucleus + Derby +
   hive-serde (separate from hive-metastore in 2.3.x).

3. Hive 2.3.9 CalcitePlanner is incompatible with Calcite 1.21.0.265
   (requires CBO disable).

All changes are test-only (testImplementation / test block).
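The NoSuchMethodError in point 1 can be confirmed from inside the failing test JVM by asking where JsonMappingException was actually loaded from and whether the (Closeable, String) constructor exists. A minimal stdlib-only sketch; ClassOrigin is a hypothetical name:

```java
import java.security.CodeSource;

// Hypothetical diagnostic: where did a class come from, and does it have a
// given public constructor? Useful when a bundled copy shadows a library class.
final class ClassOrigin {
  private ClassOrigin() {}

  // URL of the JAR or directory the class was loaded from; null for JDK
  // bootstrap classes. Loads without running static initializers.
  static String location(String className) throws ClassNotFoundException {
    Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
    CodeSource source = c.getProtectionDomain().getCodeSource();
    return source == null || source.getLocation() == null ? null : source.getLocation().toString();
  }

  // True if the class has a public constructor with exactly these parameter types.
  static boolean hasPublicConstructor(String className, Class<?>... params)
      throws ClassNotFoundException {
    try {
      Class.forName(className, false, ClassOrigin.class.getClassLoader()).getConstructor(params);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```

In the failing setup, location("com.fasterxml.jackson.databind.JsonMappingException") would be expected to name the avatica JAR rather than jackson-databind 2.15.2, and hasPublicConstructor on that class with Closeable.class, String.class to return false, since per the commit message that constructor only exists from jackson-databind 2.7.0.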
- Re-enable addFunctionEntry("translate", TRANSLATE3) and
  addFunctionEntry("translate3", TRANSLATE3) — these were disabled
  preemptively during the Hive 2.3.9 upgrade but are not affected
  by the Calcite version mismatch (verified by unit tests and
  production DAG regression against 4027 views on Holdem)
- Re-enable testTranslateFunction in CoralSparkTest and
  HiveToTrinoConverterTest (both pass)
- Revert substr/substring from split 2-arg/3-arg registrations back
  to original optionalOrd(2) — the split was unnecessary as other
  functions using the same pattern (round, bround, rand) were not
  affected
- Update testEnumUnionString comment to reflect actual Hive 2.3.9
  error (AssertionError in UnparseTranslator, not "stricter type
  checking")

@YogeshKothari26 marked this pull request as draft April 2, 2026 04:36
@YogeshKothari26 marked this pull request as ready for review April 3, 2026 12:41