[Java] Dataset: JVM crash when reading Parquet from S3 (Apache Arrow 12.0.0) #35632

@REASY

Description

Hello, team,

I upgraded Apache Arrow from 10.0.0 to 12.0.0 and started getting a JVM crash. Here is the error I got:

# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x00007f35c6ff4ae7, pid=1, tid=132
# JRE version: OpenJDK Runtime Environment 18.9 (11.0.16+8) (build 11.0.16+8)
# Java VM: OpenJDK 64-Bit Server VM 18.9 (11.0.16+8, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [jnilib-6496609145223316687.tmp+0x1decae7] Aws::Utils::Crypto::CreateSecureRandomBytesImplementation()+0x17
#
# Core dump will be written. Default location: /app/core.1
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007f3721d45602]

I searched the C/C++ codebase for usages of the method CreateSecureRandomBytesImplementation, but could not find any.

My understanding is that the Java library loads the platform-specific native library libarrow_dataset_jni in order to use the C++ Dataset API.
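To make the crash header above easier to read, here is a minimal sketch that splits the "Problematic frame" line into its parts. The class name ProblematicFrame and its fields are hypothetical and are not part of Arrow or the JDK. It assumes the usual HotSpot layout of that line: frame type, then library name plus hex offset in brackets, then the symbol. A frame type of C marks native code, which matches the understanding that the crash happens inside the loaded JNI library rather than in Java code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading an hs_err "Problematic frame" line. */
public class ProblematicFrame {
    // Assumed shape: "# <type> [<library>+<hex offset>] <symbol>"
    private static final Pattern FRAME = Pattern.compile(
            "^#\\s*(\\S)\\s+\\[(.+?)\\+(0x[0-9a-fA-F]+)\\]\\s*(.*)$");

    final char type;       // C = native code, V = VM code, J/j = Java code
    final String library;  // file the faulting address belongs to
    final String offset;   // offset of the faulting address inside that file
    final String symbol;   // nearest symbol, if the JVM could resolve one

    private ProblematicFrame(char type, String library, String offset, String symbol) {
        this.type = type;
        this.library = library;
        this.offset = offset;
        this.symbol = symbol;
    }

    /** Returns an empty Optional when the line does not have the assumed shape. */
    static Optional<ProblematicFrame> parse(String line) {
        Matcher m = FRAME.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ProblematicFrame(
                m.group(1).charAt(0), m.group(2), m.group(3), m.group(4)));
    }

    boolean isNative() {
        return type == 'C';
    }

    public static void main(String[] args) {
        // The frame line copied from the crash report in this issue.
        String line = "# C [jnilib-6496609145223316687.tmp+0x1decae7] "
                + "Aws::Utils::Crypto::CreateSecureRandomBytesImplementation()+0x17";
        parse(line).ifPresent(f -> System.out.println(
                "native=" + f.isNative() + " library=" + f.library
                        + " offset=" + f.offset + " symbol=" + f.symbol));
    }
}
```

Applied to the frame in this report, the symbol carries the Aws:: namespace, so the faulting code appears to belong to the AWS SDK for C++ bundled into the native library. That would explain why the method name does not show up when searching the Arrow sources.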

Versions

Apache Arrow

12.0.0

Java

openjdk 11.0.16 2022-07-19
OpenJDK Runtime Environment 18.9 (build 11.0.16+8)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing)

OS

Debian GNU/Linux 11, Linux d90729487d5a 5.14.0-1059-oem #67-Ubuntu SMP Mon Mar 13 14:22:10 UTC 2023 x86_64 GNU/Linux

Component(s)

Java
