Conversation

@xhochy xhochy commented May 16, 2017

This mainly uses the same logic we already use for arrow-cpp

wesm commented May 16, 2017

Looks like the manylinux1 build failed.

@xhochy xhochy force-pushed the parquet-abi-version-bundling branch from 3158c4c to 4aa17f8 Compare May 17, 2017 11:43
xhochy commented May 17, 2017

Fixed the build.

@wesm wesm left a comment

+1. I will have to rebase #700

@asfgit asfgit closed this in a4f3259 May 17, 2017
jeffknupp pushed a commit to jeffknupp/arrow that referenced this pull request Jun 3, 2017
This mainly uses the same logic we already use for arrow-cpp

Author: Uwe L. Korn <uwelk@xhochy.com>

Closes apache#698 from xhochy/parquet-abi-version-bundling and squashes the following commits:

4aa17f8 [Uwe L. Korn] ARROW-1030: Python: Account for library versioning in parquet-cpp
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
## What's Changed

This PR relates to apache#698 and is the second in a series intended to
provide full Avro read / write support in native Java. It adds
round-trip tests for both schemas (Arrow schema -> Avro -> Arrow) and
data (Arrow VSR -> Avro block -> Arrow VSR). It also adds a number of
fixes and improvements to the Avro Consumers so that data arrives back
in its original form after a round trip. The main changes are:

* Added a top level method in AvroToArrow to convert Avro schema
directly to Arrow schema (this may exist elsewhere, but is needed to
provide an API that matches the logic of this implementation)
* Avro unions of [ type, null ] or [ null, type ] now have special
handling: they are interpreted as a single nullable type rather than a
union. Setting legacyMode = false in the AvroToArrowConfig object is
required to enable this behaviour; otherwise unions are interpreted
literally. Unions with more than 2 elements are always interpreted
literally (though, per apache#108, Java's current Union implementation
is probably not usable with Avro at the moment).
* Added support for new logical types (decimal 256, timestamp nano and 3
local timestamp types)
* Existing timestamp-millis and timestamp-micros values are now
interpreted as zone-aware (previously they were interpreted as local,
but now the local timestamp types are interpreted as local - I think
this is correct per the [Avro
spec](https://avro.apache.org/docs/1.12.0/specification/#timestamps)).
Requires setting legacyMode = false.
* Removed namespaces from generated Arrow field names in complex types.
E.g. the Avro field myNamespace.outerRecord.structField.intField should
be called just "intField" inside the Arrow struct. This doesn't affect
the skip field logic, which still works using the qualified names. This
requires setting legacyMode = false.
* Remove unexpected metadata in generated Arrow fields (empty alias
lists and attributes interpreted as part of the field schema). This
requires setting legacyMode = false.
* Use the expected child vector names for Arrow LIST and MAP types when
reading. For LIST, the default child vector is called "$data$" which is
illegal in Avro, so the child field name is also changed to "item" in
the producers. This requires setting legacyMode = false.
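The nullable-union rule above can be sketched in isolation: a two-element union whose members include "null" collapses to a single nullable type, while any other union (including those with more than 2 elements) is kept as a literal union. A minimal illustration in plain Java, where type-name strings stand in for real Avro Schema objects and `collapseNullableUnion` is a hypothetical helper, not part of the Arrow adapter API:

```java
import java.util.List;
import java.util.Optional;

public class UnionRule {
    // Present iff the union collapses to a single nullable type
    // (the legacyMode = false interpretation described above).
    static Optional<String> collapseNullableUnion(List<String> unionTypes) {
        if (unionTypes.size() == 2 && unionTypes.contains("null")) {
            // [type, null] or [null, type] -> nullable type
            String first = unionTypes.get(0);
            return Optional.of("null".equals(first) ? unionTypes.get(1) : first);
        }
        // Any other union is interpreted literally.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(collapseNullableUnion(List.of("long", "null")));          // Optional[long]
        System.out.println(collapseNullableUnion(List.of("null", "string")));        // Optional[string]
        System.out.println(collapseNullableUnion(List.of("int", "string", "null"))); // Optional.empty
    }
}
```

In the real adapter the collapsed case would map to a single nullable Arrow field; the literal case would map to an Arrow Union.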

Breaking changes have been removed from this PR.

Per discussion below, all breaking changes are now behind a "legacyMode"
flag in the AvroToArrowConfig object, which is enabled by default in all
the original code paths.

Closes apache#698 .

This change is meant to allow round-tripping of schemas and individual
Avro data blocks (one Avro data block -> one VSR). File-level
capabilities are not included. I have not included anything to recycle
the VSR as part of the read API; that feels like it belongs with the
file-level piece. I have also not done anything specific for enums /
dictionary encoding yet.