pandas.api.extensions.ExtensionArray
- class pandas.api.extensions.ExtensionArray
Abstract base class for custom 1-D array types.

pandas will recognize instances of this class as proper arrays with a custom type and will not attempt to coerce them to objects. They may be stored directly inside a DataFrame or Series.

Attributes
- dtype : An instance of ExtensionDtype.
- nbytes : The number of bytes needed to store this object in memory.
- ndim : Extension Arrays are only allowed to be 1-dimensional.
- shape : Return a tuple of the array dimensions.

Methods
- argsort(*[, ascending, kind, na_position]) : Return the indices that would sort this array.
- astype(dtype[, copy]) : Cast to a NumPy array or ExtensionArray with 'dtype'.
- copy() : Return a copy of the array.
- dropna() : Return ExtensionArray without NA values.
- duplicated([keep]) : Return boolean ndarray denoting duplicate values.
- factorize([use_na_sentinel]) : Encode the extension array as an enumerated type.
- fillna(value[, limit, copy]) : Fill NA/NaN values using the specified method.
- equals(other) : Return if another array is equivalent to this array.
- insert(loc, item) : Insert an item at the given position.
- interpolate(*, method, axis, index, limit, ...) : Fill NaN values using an interpolation method.
- isin(values) : Pointwise comparison for set containment in the given values.
- isna() : A 1-D array indicating if each value is missing.
- ravel([order]) : Return a flattened view on this array.
- repeat(repeats[, axis]) : Repeat elements of an ExtensionArray.
- searchsorted(value[, side, sorter]) : Find indices where elements should be inserted to maintain order.
- shift([periods, fill_value]) : Shift values by desired number.
- take(indices, *[, allow_fill, fill_value]) : Take elements from an array.
- tolist() : Return a list of the values.
- unique() : Compute the ExtensionArray of unique values.
- view([dtype]) : Return a view on the array.
- _accumulate(name, *[, skipna]) : Return an ExtensionArray performing an accumulation operation.
- _concat_same_type(to_concat) : Concatenate multiple arrays of this dtype.
- _explode() : Transform each element of list-like to a row.
- _formatter([boxed]) : Formatting function for scalar values.
- _from_factorized(values, original) : Reconstruct an ExtensionArray after factorization.
- _from_sequence(scalars, *[, dtype, copy]) : Construct a new ExtensionArray from a sequence of scalars.
- _from_sequence_of_strings(strings, *, dtype) : Construct a new ExtensionArray from a sequence of strings.
- _hash_pandas_object(*, encoding, hash_key, ...) : Hook for hash_pandas_object.
- _pad_or_backfill(*, method[, limit, ...]) : Pad or backfill values, used by Series/DataFrame ffill and bfill.
- _reduce(name, *[, skipna, keepdims]) : Return a scalar result of performing the reduction operation.
- _values_for_argsort() : Return values for sorting.
- _values_for_factorize() : Return an array and missing value suitable for factorization.

See also
- api.extensions.ExtensionDtype : A custom data type, to be paired with an ExtensionArray.
- api.extensions.ExtensionArray.dtype : An instance of ExtensionDtype.

Notes
The interface includes the following abstract methods that must be implemented by subclasses:
- _from_sequence
- _from_factorized 
- __getitem__ 
- __len__ 
- __eq__ 
- dtype 
- nbytes 
- isna 
- take 
- copy 
- _concat_same_type 
- interpolate 
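
The list above can be made concrete with a small sketch. The names `AngleDtype` and `AngleArray` are hypothetical and exist only for illustration; the array is assumed to be backed by a single float64 ndarray with NaN as the missing value. `interpolate` is left out for brevity, and `__eq__` does not handle Series operands, so this should be read as a starting point under those assumptions rather than a complete implementation.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype, take
from pandas.api.indexers import check_array_indexer


class AngleDtype(ExtensionDtype):
    """Hypothetical dtype for angles in degrees (illustration only)."""

    name = "angle"
    type = float

    @classmethod
    def construct_array_type(cls):
        return AngleArray


class AngleArray(ExtensionArray):
    """Hypothetical array of angles backed by one float64 ndarray."""

    def __init__(self, data):
        # Stored under ._data, not .values or ._values.
        self._data = np.asarray(data, dtype="float64")

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        data = np.asarray(scalars, dtype="float64")
        return cls(data.copy() if copy else data)

    @classmethod
    def _from_factorized(cls, values, original):
        return cls(values)

    def __getitem__(self, item):
        item = check_array_indexer(self, item)
        result = self._data[item]
        # Scalars are returned as-is; slices and masks are re-wrapped.
        return result if np.ndim(result) == 0 else type(self)(result)

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        other = other._data if isinstance(other, AngleArray) else other
        return self._data == other

    @property
    def dtype(self):
        return AngleDtype()

    @property
    def nbytes(self):
        return self._data.nbytes

    def isna(self):
        return np.isnan(self._data)

    def take(self, indices, *, allow_fill=False, fill_value=None):
        if allow_fill and fill_value is None:
            fill_value = self.dtype.na_value
        data = take(self._data, indices, allow_fill=allow_fill, fill_value=fill_value)
        return type(self)(data)

    def copy(self):
        return type(self)(self._data.copy())

    @classmethod
    def _concat_same_type(cls, to_concat):
        return cls(np.concatenate([arr._data for arr in to_concat]))


arr = AngleArray([10.0, np.nan, 350.0])
ser = pd.Series(arr)  # stored with the custom dtype, not coerced to object
both = AngleArray._concat_same_type([arr, arr])
picked = arr.take([0, -1], allow_fill=True)  # -1 marks a missing slot here
```

With `allow_fill=True`, `take` treats `-1` as "insert a missing value" instead of "last element", which is why the sketch substitutes the dtype's `na_value` when no `fill_value` is given.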

A default repr displaying the type, (truncated) data, length, and dtype is provided. It can be customized or replaced by overriding:
- __repr__ : A default repr for the ExtensionArray.
- _formatter : Print scalars inside a Series or DataFrame. 

Some methods require casting the ExtensionArray to an ndarray of Python objects with self.astype(object), which may be expensive. When performance is a concern, we highly recommend overriding the following methods:
- fillna
- _pad_or_backfill 
- dropna 
- unique 
- factorize / _values_for_factorize 
- argsort, argmax, argmin / _values_for_argsort 
- searchsorted 
- map 

The remaining methods implemented on this class should be performant, as they only compose abstract methods. Still, a more efficient implementation may be available, and these methods can be overridden.

One can implement methods to handle array accumulations or reductions.
- _accumulate
- _reduce 
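
To see what these two hooks are expected to return, it can help to call them on an array that already implements them. The sketch below uses the built-in nullable integer array and assumes pandas 2.0 or later (where `_accumulate` is available). Calling the underscore-prefixed methods directly is done here only for illustration; in normal use they are reached through `Series.sum()`, `Series.cumsum()`, and similar methods.

```python
import pandas as pd

# A built-in ExtensionArray that implements both hooks.
arr = pd.array([1, 2, 3], dtype="Int64")
ser = pd.Series(arr)

# Public entry points on the Series.
total_public = ser.sum()
running_public = ser.cumsum()

# The array-level hooks: _reduce returns a scalar,
# _accumulate returns an array of the same length.
total_hook = arr._reduce("sum", skipna=True)
running_hook = arr._accumulate("cumsum", skipna=True)
```

A custom subclass would receive the operation name as a string (for example "sum" or "cumsum") and is free to raise for operations it does not support.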

One can implement methods to handle parsing from strings that will be used in methods such as pandas.io.parsers.read_csv.
- _from_sequence_of_strings
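
As a rough illustration of where string parsing comes into play, the sketch below reads a small CSV with an extension dtype requested for one column, then calls the hook directly on the built-in nullable integer array. The CSV content and column names are made up for the example, and passing `dtype` explicitly in the direct call is a choice made here to keep the call unambiguous.

```python
import io

import pandas as pd

csv_text = "x,label\n1,a\n,b\n3,c\n"

# Requesting an extension dtype for column "x"; the empty field
# is read as a missing value.
df = pd.read_csv(io.StringIO(csv_text), dtype={"x": "Int64"})

# The same kind of conversion, invoked directly on the array class.
parsed = pd.arrays.IntegerArray._from_sequence_of_strings(
    ["1", "2", "3"], dtype=pd.Int64Dtype()
)
```

A custom array that does not implement `_from_sequence_of_strings` cannot be built from text this way, so the hook is worth adding when the type has a natural string form.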

This class does not inherit from 'abc.ABCMeta' for performance reasons. Methods and properties required by the interface raise pandas.errors.AbstractMethodError, and no register method is provided for registering virtual subclasses.

ExtensionArrays are limited to 1 dimension.

They may be backed by none, one, or many NumPy arrays. For example, pandas.Categorical is an extension array backed by two arrays, one for codes and one for categories. An array of IPv6 addresses may be backed by a NumPy structured array with two fields, one for the lower 64 bits and one for the upper 64 bits. Or they may be backed by some other storage type, like Python lists. Pandas makes no assumptions on how the data are stored, just that it can be converted to a NumPy array.

The ExtensionArray interface does not impose any rules on how this data is stored. However, currently, the backing data cannot be stored in attributes called .values or ._values to ensure full compatibility with pandas internals. Other names, such as .data, ._data, ._items, …, can be freely used.

If implementing NumPy's __array_ufunc__ interface, pandas expects that
- You defer by returning NotImplemented when any Series are present in inputs. Pandas will extract the arrays and call the ufunc again.
- You define a _HANDLED_TYPES tuple as an attribute on the class. Pandas inspects this to determine whether the ufunc is valid for the types present.
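
The two expectations above might look like the following in practice. `FloatBox` is a hypothetical and deliberately incomplete subclass: it defines only what this demonstration needs, which is possible because the base class is not an ABC, but a real array would also implement the abstract methods listed earlier. The way `_HANDLED_TYPES` is checked inside `__array_ufunc__` is an assumption modeled on the pattern in NumPy's own documentation, and `out=` arguments are not handled.

```python
import numbers

import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


class FloatBox(ExtensionArray):
    """Hypothetical, incomplete array used only to show the ufunc protocol."""

    # Operand types this array is willing to combine with.
    _HANDLED_TYPES = (np.ndarray, numbers.Number)

    def __init__(self, data):
        self._data = np.asarray(data, dtype="float64")

    def __len__(self):
        return len(self._data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # 1. Defer when a Series is present; pandas unwraps it and retries.
        if any(isinstance(x, pd.Series) for x in inputs):
            return NotImplemented
        # 2. Refuse operand types this array does not know about.
        if not all(isinstance(x, self._HANDLED_TYPES + (FloatBox,)) for x in inputs):
            return NotImplemented
        # 3. Unwrap to the backing ndarrays, apply the ufunc, re-wrap floats.
        raw = tuple(x._data if isinstance(x, FloatBox) else x for x in inputs)
        result = getattr(ufunc, method)(*raw, **kwargs)
        if isinstance(result, np.ndarray) and result.dtype.kind == "f":
            return FloatBox(result)
        return result


box = FloatBox([4.0, 9.0])
roots = np.sqrt(box)        # routed through FloatBox.__array_ufunc__
shifted = np.add(box, 1.0)  # a plain number is in _HANDLED_TYPES

# Calling the method directly shows the deferral when a Series is involved.
deferred = box.__array_ufunc__(np.add, "__call__", box, pd.Series([1.0, 2.0]))
```

Returning `NotImplemented` rather than raising lets NumPy try the other operand's `__array_ufunc__`, which is how the Series gets a chance to unwrap itself.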

See NumPy universal functions for more.

By default, ExtensionArrays are not hashable. Immutable subclasses may override this behavior.

Examples
Please see the following: