pandas.DataFrame.interpolate
- DataFrame.interpolate(method='linear', *, axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, **kwargs)
Fill NaN values using an interpolation method.
Please note that only method='linear' is supported for DataFrame/Series with a MultiIndex.
- Parameters:
- method : str, default ‘linear’
Interpolation technique to use. One of:
‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
‘time’: Interpolates values based on the time interval between observations, so unevenly spaced timestamps are weighted by their actual distance. Works on daily and higher resolution data and requires a DatetimeIndex.
‘index’: The interpolation uses the numerical values of the DataFrame’s index to linearly calculate missing values.
‘values’: Same as ‘index’; uses the actual numerical values of the index.
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d, whereas ‘spline’ is passed to scipy.interpolate.UnivariateSpline. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5). Note that ‘slinear’ refers to SciPy’s first-order spline, not to pandas’ own ‘linear’ method.
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives.
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Axis to interpolate along. For Series this parameter is unused and defaults to 0.
- limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than 0.
- inplace : bool, default False
Update the data in place if possible.
- limit_direction : {‘forward’, ‘backward’, ‘both’}, optional
Consecutive NaNs will be filled in this direction. When not given, NaNs are filled in the ‘forward’ direction for the methods listed above.
- limit_area : {None, ‘inside’, ‘outside’}, default None
Restricts which consecutive NaNs are filled; the restriction applies whether or not limit is set.
None: No fill restriction.
‘inside’: Only fill NaNs surrounded by valid values (interpolate).
‘outside’: Only fill NaNs outside valid values (extrapolate).
- **kwargs : optional
Keyword arguments to pass on to the interpolating function.
- Returns:
- Series or DataFrame or None
Returns the same object type as the caller, interpolated at some or all NaN values, or None if inplace=True.
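A minimal sketch of the inplace=True case, assuming pandas and NumPy are installed: the Series is modified directly and the method returns None, so assigning the return value back to the variable would discard the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, 2.0])

# With inplace=True, s itself is updated and the call returns None.
result = s.interpolate(inplace=True)

print(result)      # None
print(s.tolist())  # [0.0, 1.0, 2.0]
```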
See also
fillna : Fill missing values using different methods.
scipy.interpolate.Akima1DInterpolator : Piecewise cubic polynomials (Akima interpolator).
scipy.interpolate.BPoly.from_derivatives : Piecewise polynomial in the Bernstein basis.
scipy.interpolate.interp1d : Interpolate a 1-D function.
scipy.interpolate.KroghInterpolator : Interpolate polynomial (Krogh interpolator).
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic interpolation.
scipy.interpolate.CubicSpline : Cubic spline data interpolator.
Notes
The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index. For more information on their behavior, see the SciPy documentation.
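Using the index values matters most when the index is unevenly spaced. The sketch below, which assumes pandas and NumPy are installed, compares ‘linear’ (index ignored) with ‘index’ and ‘time’ (index values used as x-coordinates). It sticks to these two index-aware methods so that it does not need SciPy; the SciPy-backed methods likewise take their x-coordinates from the index. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Numeric index with uneven spacing: labels 0, 1, 10.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

by_position = s.interpolate(method="linear")  # rows treated as equally spaced
by_index = s.interpolate(method="index")      # x-coordinates taken from the index

print(by_position[1], by_index[1])  # 5.0 1.0

# 'time' needs a DatetimeIndex; Jan 2 lies 1 day into a 3-day gap.
ts = pd.Series(
    [0.0, np.nan, 3.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)
by_time = ts.interpolate(method="time")
print(round(by_time.iloc[1], 6))  # 1.0
```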
Examples
Filling in NaN in a Series via linear interpolation.
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64
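The linear example fills a single interior gap. A sketch of how limit, limit_direction and limit_area narrow what gets filled, assuming pandas and NumPy are installed, on a Series with a leading NaN, an interior run of three NaNs and a trailing NaN:

```python
import numpy as np
import pandas as pd

# Leading NaN, an interior run of three NaNs, and a trailing NaN.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0, np.nan])

# limit=1: fill at most one NaN per run; the default forward direction
# leaves the leading NaN alone.
limited = s.interpolate(limit=1)
print(limited.tolist())  # [nan, 1.0, 2.0, nan, nan, 5.0, 5.0]

# limit_area="inside": only fill NaNs with valid values on both sides.
inside = s.interpolate(limit_area="inside")
print(inside.tolist())  # [nan, 1.0, 2.0, 3.0, 4.0, 5.0, nan]

# limit_area="outside" with limit_direction="both": only fill NaNs beyond
# the first and last valid values (linear repeats the nearest valid value).
outside = s.interpolate(limit_direction="both", limit_area="outside")
print(outside.tolist())  # [1.0, 1.0, nan, nan, nan, 5.0, 5.0]
```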
Filling in NaN in a Series via polynomial interpolation or splines: both ‘polynomial’ and ‘spline’ methods require that you also specify an order (int).
>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method="polynomial", order=2)
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64
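The value 4.666667 can be checked by hand: using the index as x, the only parabola through the known points (0, 0), (1, 2) and (3, 8) is y = x²/3 + 5x/3, which gives 14/3 at x = 2. The NumPy sketch below reproduces that arithmetic as an independent check; it is not how pandas computes the result, since ‘polynomial’ is passed to scipy.interpolate.interp1d.

```python
import numpy as np

# Known (index, value) pairs from the Series above; position 2 is the NaN.
x_known = np.array([0.0, 1.0, 3.0])
y_known = np.array([0.0, 2.0, 8.0])

# Three points determine a degree-2 polynomial exactly.
coeffs = np.polyfit(x_known, y_known, deg=2)
estimate = np.polyval(coeffs, 2.0)
print(round(estimate, 6))  # 4.666667
```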
Fill the DataFrame forward (that is, going down) along each column using linear interpolation.
Note how the last entry in column ‘a’ is filled differently: there is no valid entry after it to interpolate toward, so it repeats the last valid value, 2.0. Note also how the first entry in column ‘b’ remains NaN, because there is no valid entry before it and the fill direction is forward only.
>>> df = pd.DataFrame(
...     [
...         (0.0, np.nan, -1.0, 1.0),
...         (np.nan, 2.0, np.nan, np.nan),
...         (2.0, 3.0, np.nan, 9.0),
...         (np.nan, 4.0, -4.0, 16.0),
...     ],
...     columns=list("abcd"),
... )
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
>>> df.interpolate(method="linear", limit_direction="forward", axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0
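Two variations on the frame above, sketched under the assumption that pandas and NumPy are installed: limit_direction='both' lets the leading NaN in column ‘b’ be filled, and axis=1 switches interpolation from down the columns to across the rows.

```python
import numpy as np
import pandas as pd

# Same frame as in the example above.
df = pd.DataFrame(
    [
        (0.0, np.nan, -1.0, 1.0),
        (np.nan, 2.0, np.nan, np.nan),
        (2.0, 3.0, np.nan, 9.0),
        (np.nan, 4.0, -4.0, 16.0),
    ],
    columns=list("abcd"),
)

# limit_direction="both" also fills the leading NaN in column 'b',
# using the nearest valid value below it.
both = df.interpolate(method="linear", limit_direction="both", axis=0)
print(both["b"].tolist())  # [2.0, 2.0, 3.0, 4.0]

# axis=1 interpolates across each row instead of down each column.
by_row = df.interpolate(method="linear", axis=1)
print(by_row.loc[0].tolist())  # [0.0, -0.5, -1.0, 1.0]
print(by_row.loc[2].tolist())  # [2.0, 3.0, 6.0, 9.0]
```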
Using polynomial interpolation.
>>> df["d"].interpolate(method="polynomial", order=2)
0     1.0
1     4.0
2     9.0
3    16.0
Name: d, dtype: float64