Description
I was attempting to grab some data from a db -> dataframe -> arrow table. Some columns came in from the db as Decimal types, and some elements were None.
This appears to cause a crash in the release build of pyarrow, and a proper error + std::abort in the debug build. The numpy -> arrow conversion deduces that the column is a decimal type, but then ConvertDecimals fails on the None elements. Needless to say, if possible, I would prefer to get a Python exception rather than a crash / abort.
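To make the expected behavior concrete, here is a rough pure-Python sketch of decimal precision/scale inference that skips None and raises an ordinary exception on anything unexpected. This is not pyarrow's implementation; the function name `infer_decimal_precision_scale` is hypothetical, and only the standard library `decimal` module is used.

```python
from decimal import Decimal


def infer_decimal_precision_scale(values):
    """Return (precision, scale) wide enough for every Decimal in `values`.

    None entries are treated as nulls and skipped. Returns None if the
    input contains no Decimal at all. Hypothetical helper, for illustration.
    """
    max_int_digits = 0
    max_scale = 0
    seen_decimal = False
    for value in values:
        if value is None:
            continue  # a null element: skip it instead of aborting
        if not isinstance(value, Decimal):
            raise TypeError(
                f"expected Decimal or None, got {type(value).__name__}"
            )
        _sign, digits, exponent = value.as_tuple()
        if not isinstance(exponent, int):
            # NaN and Infinity report a string exponent ('n', 'N', 'F')
            raise ValueError("NaN/Infinity are not handled in this sketch")
        if exponent < 0:
            scale = -exponent
            int_digits = max(len(digits) - scale, 0)
        else:
            scale = 0
            int_digits = len(digits) + exponent
        max_int_digits = max(max_int_digits, int_digits)
        max_scale = max(max_scale, scale)
        seen_decimal = True
    if not seen_decimal:
        return None
    return (max_int_digits + max_scale, max_scale)
```

The point is only that a None in the column can be skipped, or at worst reported through a regular Python exception, without ever reaching an abort.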
Relevant Stack:
#0 0x00007ffff712b1d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff712c8c8 in abort () from /lib64/libc.so.6
#2 0x00007fffeb521fda in arrow::internal::CerrLog::~CerrLog (this=0x7fffffffaaa0, __in_chrg=) at /home/ra7293/arrow/cpp/src/arrow/util/logging.h:112
#3 0x00007fffeb1256d0 in arrow::py::internal::DecimalMetadata::Update (this=0x7fffffffaf00, object=0x88e760 <_Py_NoneStruct>) at /home/ra7293/arrow/cpp/src/arrow/python/helpers.cc:270
#4 0x00007fffeb131a36 in arrow::py::NumPyConverter::ConvertDecimals (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:789
#5 0x00007fffeb1376d2 in arrow::py::NumPyConverter::ConvertObjectsInfer (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1090
#6 0x00007fffeb138aae in arrow::py::NumPyConverter::ConvertObjects (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1176
#7 0x00007fffeb1313a7 in arrow::py::NumPyConverter::Convert (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:547
#8 0x00007fffeb13cbf0 in arrow::py::NdarrayToArrow (pool=0x7fffebbfa240 arrow::default_memory_pool()::default_memory_pool_, ao=0x7ffff0968030, mo=0x88e760 <_Py_NoneStruct>,
use_pandas_null_sentinels=true, type=..., out=0x7fffffffb9e0) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1725
#9 0x00007fffebcca52b in __pyx_f_7pyarrow_3lib__ndarray_to_array (__pyx_v_values=0x7ffff0968030, __pyx_v_mask=0x88e760 <_Py_NoneStruct>, __pyx_v_type=0x88e760 <_Py_NoneStruct>,
Quick repro:
import pyarrow as pa
import pandas as pd
from decimal import Decimal

# Object column mixing Decimal values with None
df = pd.DataFrame({"test": [None, Decimal(1.0), Decimal(2.0), None]})
print(df, df["test"])

# Crashes in the release build; aborts in the debug build
pa.Table.from_pandas(df)
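When checking whether a given pyarrow build is affected, it may help to run the repro in a child process so that a hard crash does not take down the interactive session. A minimal sketch using only the standard library; `run_isolated` and `REPRO` are names made up for this example.

```python
import subprocess
import sys

# The repro from this report, as a string to be executed in a child interpreter.
REPRO = """
import pyarrow as pa
import pandas as pd
from decimal import Decimal
df = pd.DataFrame({"test": [None, Decimal(1.0), Decimal(2.0), None]})
pa.Table.from_pandas(df)
"""


def run_isolated(snippet):
    """Run `snippet` in a fresh Python interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,  # keep the child's output out of this session
        text=True,
    )
    return result.returncode
```

`run_isolated(REPRO)` returns 0 if the conversion succeeds. A return code of 1 typically means an uncaught Python exception, and on POSIX a negative value is the number of the signal that killed the child (for example SIGABRT from std::abort).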