pyarrow crash on conversion of Pandas dataframe -> arrow with Decimal column #1888

@robambalu

Description

I was attempting to grab some data from a db -> dataframe -> arrow table. Some columns came in from the db as Decimal types, and some elements were None.
This appears to cause a crash in the release build of pyarrow, and a proper error + std::abort in the debug build. The numpy -> arrow conversion deduces that the column is a decimal type, but then ConvertDecimals barfs on the None value... Needless to say, if possible, I would prefer to get a Python exception rather than a crash / abort.
Relevant Stack:
#0 0x00007ffff712b1d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff712c8c8 in abort () from /lib64/libc.so.6
#2 0x00007fffeb521fda in arrow::internal::CerrLog::~CerrLog (this=0x7fffffffaaa0, __in_chrg=) at /home/ra7293/arrow/cpp/src/arrow/util/logging.h:112
#3 0x00007fffeb1256d0 in arrow::py::internal::DecimalMetadata::Update (this=0x7fffffffaf00, object=0x88e760 <_Py_NoneStruct>) at /home/ra7293/arrow/cpp/src/arrow/python/helpers.cc:270
#4 0x00007fffeb131a36 in arrow::py::NumPyConverter::ConvertDecimals (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:789
#5 0x00007fffeb1376d2 in arrow::py::NumPyConverter::ConvertObjectsInfer (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1090
#6 0x00007fffeb138aae in arrow::py::NumPyConverter::ConvertObjects (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1176
#7 0x00007fffeb1313a7 in arrow::py::NumPyConverter::Convert (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:547
#8 0x00007fffeb13cbf0 in arrow::py::NdarrayToArrow (pool=0x7fffebbfa240 arrow::default_memory_pool()::default_memory_pool_, ao=0x7ffff0968030, mo=0x88e760 <_Py_NoneStruct>,
use_pandas_null_sentinels=true, type=..., out=0x7fffffffb9e0) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1725
#9 0x00007fffebcca52b in __pyx_f_7pyarrow_3lib__ndarray_to_array (__pyx_v_values=0x7ffff0968030, __pyx_v_mask=0x88e760 <_Py_NoneStruct>, __pyx_v_type=0x88e760 <_Py_NoneStruct>,

Quick repro:
import pyarrow as pa
import pandas as pd
from decimal import Decimal

# Object column mixing Decimal values with None
df = pd.DataFrame({"test": [None, Decimal(1.0), Decimal(2.0), None]})
print(df, df["test"])
pa.Table.from_pandas(df)  # crashes (release build) or aborts (debug build)
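
For illustration only, here is a rough pure-Python sketch of the behaviour I would hope for from the decimal inference step: scan the column, skip None entries, and raise a Python exception on anything else that is not a Decimal. The function name infer_decimal_metadata is made up for this example and this is not pyarrow's actual code (the real logic lives in the C++ DecimalMetadata::Update shown in the stack above).

```python
from decimal import Decimal


def infer_decimal_metadata(values):
    """Return (precision, scale) wide enough for every non-null Decimal.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    max_int_digits = 0
    max_scale = 0
    for value in values:
        if value is None:
            # Nulls carry no precision/scale information, so skip them
            # instead of aborting.
            continue
        if not isinstance(value, Decimal):
            raise TypeError(
                f"expected Decimal or None, got {type(value).__name__}"
            )
        _sign, digits, exponent = value.as_tuple()
        if not isinstance(exponent, int):
            # NaN and Infinity report a string exponent ('n', 'N', 'F').
            raise ValueError(f"non-finite Decimal not supported: {value!r}")
        # Digits after the decimal point.
        scale = max(-exponent, 0)
        # Digits before the decimal point (trailing zeros implied by a
        # positive exponent count as integer digits).
        int_digits = max(len(digits) + exponent, 0)
        max_int_digits = max(max_int_digits, int_digits)
        max_scale = max(max_scale, scale)
    return max_int_digits + max_scale, max_scale


column = [None, Decimal(1.0), Decimal(2.0), None]
print(infer_decimal_metadata(column))
```

With something along these lines, a bad element in the column would surface as a TypeError in Python rather than taking down the interpreter.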
