DataFrameGroupBy(
    block: bigframes.core.blocks.Block,
    by_col_ids: typing.Sequence[str],
    *,
    selected_cols: typing.Optional[typing.Sequence[str]] = None,
    dropna: bool = True,
    as_index: bool = True
)
Class for grouping and aggregating relational data.
Methods
agg
agg(
    func=None, **kwargs
) -> typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
Aggregate using one or more operations.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> data = {"A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860]}
>>> df = bpd.DataFrame(data)
The aggregation is for each column.
>>> df.groupby('A').agg('min')
    B         C
A
1  1  0.227877
2  3  -0.56286
<BLANKLINE>
[2 rows x 2 columns]
Multiple aggregations
>>> df.groupby('A').agg(['min', 'max'])
    B             C
       min max       min       max
A
1        1   2  0.227877  0.362838
2        3   4  -0.56286  1.267767
<BLANKLINE>
[2 rows x 4 columns]
| Parameter | |
|---|---|
| Name | Description | 
| func | function, str, list, dict or None. Function to use for aggregating the data. Accepted combinations are: - string function name - list of function names, e.g.  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame | A BigQuery DataFrame. | 
aggregate
aggregate(
    func=None, **kwargs
) -> typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
Aggregate using one or more operations.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> data = {"A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860]}
>>> df = bpd.DataFrame(data)
The aggregation is for each column.
>>> df.groupby('A').aggregate('min')
    B         C
A
1  1  0.227877
2  3  -0.56286
<BLANKLINE>
[2 rows x 2 columns]
Multiple aggregations
>>> df.groupby('A').agg(['min', 'max'])
    B             C
       min max       min       max
A
1        1   2  0.227877  0.362838
2        3   4  -0.56286  1.267767
<BLANKLINE>
[2 rows x 4 columns]
| Parameter | |
|---|---|
| Name | Description | 
| func | function, str, list, dict or None. Function to use for aggregating the data. Accepted combinations are: - string function name - list of function names, e.g.  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame | A BigQuery DataFrame. | 
all
all() -> bigframes.dataframe.DataFrame
Return True if all values in the group are true, else False.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, 0], index=lst)
>>> ser.groupby(level=0).all()
a     True
b    False
dtype: boolean
For DataFrameGroupBy:
>>> data = [[1, 0, 3], [1, 5, 6], [7, 8, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["ostrich", "penguin", "parrot"])
>>> df.groupby(by=["a"]).all()
        b       c
a
1   False    True
7   True    True
<BLANKLINE>
[2 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | DataFrame or Series of boolean values, where a value is True if all elements are True within its respective group; otherwise False. | 
any
any() -> bigframes.dataframe.DataFrame
Return True if any value in the group is true, else False.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, 0], index=lst)
>>> ser.groupby(level=0).any()
a     True
b    False
dtype: boolean
For DataFrameGroupBy:
>>> data = [[1, 0, 3], [1, 0, 6], [7, 1, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["ostrich", "penguin", "parrot"])
>>> df.groupby(by=["a"]).any()
        b       c
a
1   False    True
7   True    True
<BLANKLINE>
[2 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | DataFrame or Series of boolean values, where a value is True if any element is True within its respective group; otherwise False. | 
count
count() -> bigframes.dataframe.DataFrame
Compute count of group, excluding missing values.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, np.nan], index=lst)
>>> ser.groupby(level=0).count()
a     2
b     0
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, np.nan, 3], [1, np.nan, 6], [7, 8, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["cow", "horse", "bull"])
>>> df.groupby(by=["a"]).count()
   b  c
a
1  0  2
7  1  1
<BLANKLINE>
[2 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Count of values within each group. | 
cumcount
cumcount(ascending: bool = True)
Number each item in each group from 0 to the length of that group - 1. (DataFrameGroupBy functionality is not yet available.)
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b', 'b', 'c']
>>> ser = bpd.Series([5, 1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).cumcount()
a    0
a    1
b    0
b    1
c    0
dtype: Int64
>>> ser.groupby(level=0).cumcount(ascending=False)
a    1
a    0
b    1
b    0
c    0
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| ascending | bool, default True. If False, number in reverse, from length of group - 1 to 0. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.Series | Sequence number of each element within each group. | 
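The numbering rule behind `ascending` can be sketched in plain Python. This is only an illustration of the rule in the parameter description and does not call BigQuery DataFrames; `group_cumcount` is a hypothetical helper, not part of the library.

```python
from collections import Counter, defaultdict

def group_cumcount(keys, ascending=True):
    """Number each item within its group (hypothetical helper, not a bigframes API)."""
    sizes = Counter(keys)      # total length of each group
    seen = defaultdict(int)    # how many items of each group were seen so far
    out = []
    for key in keys:
        position = seen[key]   # 0-based position inside the group
        seen[key] += 1
        # Reverse numbering counts down from (group length - 1) to 0.
        out.append(position if ascending else sizes[key] - 1 - position)
    return out

keys = ['a', 'a', 'b', 'b', 'c']
print(group_cumcount(keys))                   # [0, 1, 0, 1, 0]
print(group_cumcount(keys, ascending=False))  # [1, 0, 1, 0, 0]
```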
cummax
cummax(
    *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame
Cumulative max for each group.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cummax()
a    6
a    6
b    0
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["fox", "gorilla", "lion"])
>>> df.groupby("a").cummax()
         b  c
fox      8  2
gorilla  8  5
lion     6  9
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Cumulative max for each group. | 
cummin
cummin(
    *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame
Cumulative min for each group.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cummin()
a    6
a    2
b    0
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["fox", "gorilla", "lion"])
>>> df.groupby("a").cummin()
         b  c
fox      8  2
gorilla  2  2
lion     6  9
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Cumulative min for each group. | 
cumprod
cumprod(*args, **kwargs) -> bigframes.dataframe.DataFrame
Cumulative product for each group.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cumprod()
a     6.0
a    12.0
b     0.0
dtype: Float64
For DataFrameGroupBy:
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["cow", "horse", "bull"])
>>> df.groupby("a").cumprod()
          b     c
cow     8.0   2.0
horse  16.0  10.0
bull    6.0   9.0
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Cumulative product for each group. | 
cumsum
cumsum(
    *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame
Cumulative sum for each group.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cumsum()
a    6
a    8
b    0
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["fox", "gorilla", "lion"])
>>> df.groupby("a").cumsum()
          b  c
fox       8  2
gorilla  10  7
lion      6  9
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Cumulative sum for each group. | 
diff
diff(periods=1) -> bigframes.series.Series
First discrete difference of element. Calculates the difference of each element compared with another element in the group (default is element in previous row).
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).diff()
a    <NA>
a      -5
a       6
b    <NA>
b      -1
b       0
dtype: Int64
For DataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).diff()
          a     b
dog    <NA>  <NA>
dog       2     3
dog       2     4
mouse  <NA>  <NA>
mouse     0     0
mouse     1    -2
mouse    -5    -1
<BLANKLINE>
[7 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | First differences. | 
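To illustrate how `periods` selects the element to compare against, here is a plain-Python sketch of a per-group difference. It assumes `periods` is a positive integer and does not call BigQuery DataFrames; `group_diff` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

def group_diff(keys, values, periods=1):
    """Difference between each value and the value `periods` rows earlier in its group.

    Hypothetical helper, not a bigframes API. Assumes periods >= 1.
    """
    history = defaultdict(list)  # values seen so far, per group
    out = []
    for key, value in zip(keys, values):
        seen = history[key]
        # No earlier element to compare with -> missing value (None here).
        out.append(value - seen[-periods] if len(seen) >= periods else None)
        seen.append(value)
    return out

keys = ['a', 'a', 'a', 'b', 'b', 'b']
values = [7, 2, 8, 4, 3, 3]
print(group_diff(keys, values))             # [None, -5, 6, None, -1, 0]
print(group_diff(keys, values, periods=2))  # [None, None, 1, None, None, -1]
```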
expanding
expanding(min_periods: int = 1) -> bigframes.core.window.rolling.Window
Provides expanding functionality.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'c', 'c', 'e']
>>> ser = bpd.Series([1, 0, -2, -1, 2], index=lst)
>>> ser.groupby(level=0).expanding().min()
index  index
a      a         1
       a         0
c      c        -2
       c        -2
e      e         2
dtype: Int64
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | An expanding grouper, providing expanding functionality per group. | 
head
head(n: int = 5) -> bigframes.dataframe.DataFrame
Return the first n rows of each group.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[1, 2], [1, 4], [5, 6]],
...                   columns=['A', 'B'])
>>> df.groupby('A').head(1)
   A  B
0  1  2
2  5  6
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| n | int. If positive: number of entries to include from start of each group. If negative: number of entries to exclude from end of each group. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | First n rows of the original DataFrame or Series | 
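The positive and negative cases of `n` can be sketched in plain Python. This follows the parameter description above rather than the library's implementation; `group_head` is a hypothetical helper and does not call BigQuery DataFrames.

```python
from collections import defaultdict

def group_head(rows, key, n=5):
    """Keep the leading rows of each group, preserving the original row order.

    Hypothetical helper, not a bigframes API.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[key]].append(i)  # row positions, per group
    keep = set()
    for positions in groups.values():
        # Slicing covers both cases: [:1] keeps the first row,
        # [:-1] drops the last row of the group.
        keep.update(positions[:n])
    return [rows[i] for i in sorted(keep)]

rows = [{'A': 1, 'B': 2}, {'A': 1, 'B': 4}, {'A': 5, 'B': 6}]
print(group_head(rows, 'A', 1))   # [{'A': 1, 'B': 2}, {'A': 5, 'B': 6}]
print(group_head(rows, 'A', -1))  # [{'A': 1, 'B': 2}]
```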
kurt
kurt(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']
>>> ser = bpd.Series([0, 1, 1, 0, 0, 1, 2, 4, 5], index=lst)
>>> ser.groupby(level=0).kurt()
a        -6.0
b   -1.963223
dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Kurtosis of values within each group. | 
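As a rough check on what "unbiased kurtosis using Fisher's definition" means, the sketch below applies the common bias-corrected sample excess kurtosis formula in plain Python. Whether the library computes it exactly this way is an assumption; `unbiased_kurtosis` is a hypothetical helper, not a bigframes API.

```python
def unbiased_kurtosis(values):
    """Bias-corrected excess kurtosis (Fisher's definition, normal == 0.0).

    Hypothetical helper. Returns None when fewer than 4 values are given
    or when the values have zero variance.
    """
    n = len(values)
    if n < 4:
        return None
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        return None
    g2 = m4 / m2 ** 2 - 3                          # biased excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(unbiased_kurtosis([0, 1, 1, 0]))     # -6.0
print(unbiased_kurtosis([0, 1, 2, 4, 5]))  # approximately -1.963223
```

Under that assumption, the two results match the values for groups `a` and `b` in the example above.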
kurtosis
kurtosis(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']
>>> ser = bpd.Series([0, 1, 1, 0, 0, 1, 2, 4, 5], index=lst)
>>> ser.groupby(level=0).kurtosis()
a        -6.0
b   -1.963223
dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Kurtosis of values within each group. | 
max
max(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame
Compute max of group values.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).max()
a     2
b     4
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 8, 2], [1, 2, 5], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["tiger", "leopard", "cheetah", "lion"])
>>> df.groupby(by=["a"]).max()
   b  c
a
1  8  5
2  6  9
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. | 
| min_count | int, default 0. The required number of valid values to perform the operation. If fewer than  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Computed max of values within each group. | 
mean
mean(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame
Compute mean of groups, excluding missing values.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 1, 2, 1, 2],
...                    'B': [np.nan, 2, 3, 4, 5],
...                    'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])
Group by one column and return the mean of the remaining columns in each group.
>>> df.groupby('A').mean()
    B         C
A
1  3.0  1.333333
2  4.0       1.5
<BLANKLINE>
[2 rows x 2 columns]
Group by two columns and return the mean of the remaining column.
>>> df.groupby(['A', 'B']).mean()
         C
A B
1 2.0  2.0
  4.0  1.0
2 3.0  1.0
  5.0  2.0
<BLANKLINE>
[4 rows x 1 columns]
Group by one column and return the mean of only one particular column in each group.
>>> df.groupby('A')['B'].mean()
A
1    3.0
2    4.0
Name: B, dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Mean of groups. | 
median
median(
    numeric_only: bool = False, *, exact: bool = True
) -> bigframes.dataframe.DataFrame
Compute median of groups, excluding missing values.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).median()
a    7.0
b    3.0
dtype: Float64
For DataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                    'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).median()
        a    b
dog    3.0  4.0
mouse  7.0  3.0
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. | 
| exact | bool, default True. Calculate the exact median instead of an approximation. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Median of groups. | 
min
min(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame
Compute min of group values.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).min()
a     1
b     3
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 8, 2], [1, 2, 5], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["tiger", "leopard", "cheetah", "lion"])
>>> df.groupby(by=["a"]).min()
   b  c
a
1  2  2
2  5  8
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. | 
| min_count | int, default 0. The required number of valid values to perform the operation. If fewer than  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Computed min of values within each group. | 
nunique
nunique() -> bigframes.dataframe.DataFrame
Return DataFrame with counts of unique elements in each position.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'id': ['spam', 'egg', 'egg', 'spam',
...                           'ham', 'ham'],
...                    'value1': [1, 5, 5, 2, 5, 5],
...                    'value2': list('abbaxy')})
>>> df.groupby('id').nunique()
      value1  value2
id
egg        1       1
ham        1       2
spam       2       1
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame | Number of unique values within a BigQuery DataFrame. | 
prod
prod(numeric_only: bool = False, min_count: int = 0)
Compute prod of group values. (DataFrameGroupBy functionality is not yet available.)
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).prod()
a     2.0
b    12.0
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. | 
| min_count | int, default 0. The required number of valid values to perform the operation. If fewer than  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Computed prod of values within each group. | 
quantile
quantile(
    q: typing.Union[float, typing.Sequence[float]] = 0.5, *, numeric_only: bool = False
) -> bigframes.dataframe.DataFrame
Return group values at the given quantile, a la numpy.percentile.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([
...     ['a', 1], ['a', 2], ['a', 3],
...     ['b', 1], ['b', 3], ['b', 5]
... ], columns=['key', 'val'])
>>> df.groupby('key').quantile()
     val
key
a    2.0
b    3.0
<BLANKLINE>
[2 rows x 1 columns]
| Parameters | |
|---|---|
| Name | Description | 
| q | float or array-like, default 0.5 (50% quantile). Value(s) between 0 and 1 providing the quantile(s) to compute. | 
| numeric_only | bool, default False. Include only  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Return type determined by caller of GroupBy object. | 
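To show how a value of `q` between 0 and 1 maps to a result, the sketch below computes a per-group quantile in plain Python. It assumes linear interpolation between neighboring sorted values, which is the default behavior of `numpy.percentile`; whether the library interpolates the same way is an assumption. `linear_quantile` is a hypothetical helper, not a bigframes API.

```python
from collections import defaultdict

def linear_quantile(values, q=0.5):
    """Quantile with linear interpolation between sorted neighbors (hypothetical helper)."""
    data = sorted(values)
    pos = q * (len(data) - 1)              # fractional position in the sorted data
    lower = int(pos)                       # index at or below that position
    upper = min(lower + 1, len(data) - 1)  # next index, clamped to the last element
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)

rows = [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 3), ('b', 5)]
groups = defaultdict(list)
for key, val in rows:
    groups[key].append(val)

medians = {key: linear_quantile(vals) for key, vals in groups.items()}
print(medians)  # {'a': 2.0, 'b': 3.0}
```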
rank
rank(
    method="average", ascending: bool = True, na_option: str = "keep"
) -> bigframes.dataframe.DataFrame
Provide the rank of values within each group.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(
...     {
...         "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"],
...         "value": [2, 4, 2, 3, 5, 1, 2, 4, 1, 5],
...     }
... )
>>> df
  group  value
0     a      2
1     a      4
2     a      2
3     a      3
4     a      5
5     b      1
6     b      2
7     b      4
8     b      1
9     b      5
<BLANKLINE>
[10 rows x 2 columns]
>>> for method in ['average', 'min', 'max', 'dense', 'first']:
...     df[f'{method}_rank'] = df.groupby('group')['value'].rank(method)
>>> df
  group  value  average_rank  min_rank  max_rank  dense_rank  first_rank
0     a      2           1.5       1.0       2.0         1.0         1.0
1     a      4           4.0       4.0       4.0         3.0         4.0
2     a      2           1.5       1.0       2.0         1.0         2.0
3     a      3           3.0       3.0       3.0         2.0         3.0
4     a      5           5.0       5.0       5.0         4.0         5.0
5     b      1           1.5       1.0       2.0         1.0         1.0
6     b      2           3.0       3.0       3.0         2.0         3.0
7     b      4           4.0       4.0       4.0         3.0         4.0
8     b      1           1.5       1.0       2.0         1.0         2.0
9     b      5           5.0       5.0       5.0         4.0         5.0
<BLANKLINE>
[10 rows x 7 columns]
| Parameters | |
|---|---|
| Name | Description | 
| method | {'average', 'min', 'max', 'first', 'dense'}, default 'average' | 
| ascending | bool, default True. False for ranks by high (1) to low (N). | 
| na_option | {'keep', 'top', 'bottom'}, default 'keep' | 
rolling
rolling(
    window: (
        int
        | pandas._libs.tslibs.timedeltas.Timedelta
        | numpy.timedelta64
        | datetime.timedelta
        | str
    ),
    min_periods=None,
    on: str | None = None,
    closed: typing.Literal["right", "left", "both", "neither"] = "right",
) -> bigframes.core.window.rolling.Window
Returns a rolling grouper, providing rolling functionality per group.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'a', 'a', 'e']
>>> ser = bpd.Series([1, 0, -2, -1, 2], index=lst)
>>> ser.groupby(level=0).rolling(2).min()
index  index
a      a        <NA>
       a           0
       a          -2
       a          -2
e      e        <NA>
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| window | int, pandas.Timedelta, numpy.timedelta64, datetime.timedelta, str. Size of the moving window. If an integer, the fixed number of observations used for each window. If a string, a string representation of a timedelta, which must be parsable by pandas.Timedelta(). Otherwise, the time range for each window. | 
| min_periods | int, default None. Minimum number of observations in window required to have a value; otherwise, result is  | 
| on | str, optional. For a DataFrame, a column label on which to calculate the rolling window, rather than the DataFrame’s index. | 
| closed | str, default 'right'. If 'right', the first point in the window is excluded from calculations. If 'left', the last point in the window is excluded from calculations. If 'both', no points in the window are excluded from calculations. If 'neither', the first and last points in the window are excluded from calculations. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Return a new grouper with our rolling appended. | 
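The effect of `closed` on an integer window can be sketched in plain Python. The sketch makes two assumptions that the text above does not state outright: the window ending at row `i` spans row `i - window` through row `i`, with `closed` deciding which endpoints count (the pandas convention for fixed windows), and `min_periods` defaults to the window size. `window_positions` and `rolling_min` are hypothetical helpers, not bigframes APIs.

```python
def window_positions(i, window, closed="right"):
    """Row positions inside the window that ends at position i (hypothetical helper)."""
    start, end = i - window, i  # window endpoints
    lo = start if closed in ("left", "both") else start + 1  # keep the first point?
    hi = end if closed in ("right", "both") else end - 1     # keep the last point?
    return [p for p in range(lo, hi + 1) if p >= 0]          # drop rows before the group starts

def rolling_min(values, window, closed="right"):
    """Rolling minimum over one group; None where fewer than `window` rows are available."""
    out = []
    for i in range(len(values)):
        rows = [values[p] for p in window_positions(i, window, closed)]
        out.append(min(rows) if len(rows) >= window else None)
    return out

group_a = [1, 0, -2, -1]
print(rolling_min(group_a, 2))            # [None, 0, -2, -2]
print(window_positions(3, 2, "both"))     # [1, 2, 3]
print(window_positions(3, 2, "neither"))  # [2]
```

With the default `closed="right"`, the sketch gives the same values for group `a` as the example above.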
shift
shift(periods=1) -> bigframes.series.Series
Shift each group by periods observations.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).shift(1)
a    <NA>
a       1
b    <NA>
b       3
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["tuna", "salmon", "catfish", "goldfish"])
>>> df.groupby("a").shift(1)
             b     c
tuna      <NA>  <NA>
salmon       2     3
catfish   <NA>  <NA>
goldfish     5     8
<BLANKLINE>
[4 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| periods | int, default 1. Number of periods to shift. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Object shifted within each group. | 
size
size() -> typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
Compute group sizes.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, 3], index=lst)
>>> ser
a     1
a     2
b     3
dtype: Int64
>>> ser.groupby(level=0).size()
a    2
b    1
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["owl", "toucan", "eagle"])
>>> df
        a  b  c
owl     1  2  3
toucan  1  5  6
eagle   7  8  9
[3 rows x 3 columns]
>>> df.groupby("a").size()
a
1    2
7    1
dtype: Int64
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Number of rows in each group as a Series if as_index is True or a DataFrame if as_index is False. | 
skew
skew(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame
Return unbiased skew within groups.
Normalized by N-1.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> ser = bpd.Series([390., 350., 357., np.nan, 22., 20., 30.],
...                  index=['Falcon', 'Falcon', 'Falcon', 'Falcon',
...                         'Parrot', 'Parrot', 'Parrot'],
...                  name="Max Speed")
>>> ser.groupby(level=0).skew()
Falcon    1.525174
Parrot    1.457863
Name: Max Speed, dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Skew of values within each group. | 
std
std(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame
Compute standard deviation of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).std()
a     3.21455
b     0.57735
dtype: Float64
For DataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                    'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).std()
              a         b
dog         2.0  3.511885
mouse  2.217356       1.5
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Standard deviation of values within each group. | 
sum
sum(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame
Compute sum of group values.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).sum()
a     3
b     7
dtype: Int64
For DataFrameGroupBy:
>>> data = [[1, 8, 2], [1, 2, 5], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["tiger", "leopard", "cheetah", "lion"])
>>> df.groupby("a").sum()
    b   c
a
1  10   7
2  11  17
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. | 
| min_count | int, default 0. The required number of valid values to perform the operation. If fewer than  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Computed sum of values within each group. | 
var
var(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame
Compute variance of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
Examples:
For SeriesGroupBy:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).var()
a   10.333333
b    0.333333
dtype: Float64
For DataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                    'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).var()
              a          b
dog         4.0  12.333333
mouse  4.916667       2.25
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.pandas.DataFrame or bigframes.pandas.Series | Variance of values within each group. |