Session(
    context: typing.Optional[bigframes._config.bigquery_options.BigQueryOptions] = None,
    clients_provider: typing.Optional[bigframes.session.clients.ClientsProvider] = None,
)
Establishes a BigQuery connection to capture a group of job activities related to DataFrames.
| Parameters | |
|---|---|
| Name | Description | 
| context | bigframes._config.bigquery_options.BigQueryOptions: Configuration adjusting how to connect to BigQuery and related APIs. Note that some options are ignored if  | 
| clients_provider | bigframes.session.clients.ClientsProvider: An object providing client library objects. | 
Properties
bqclient
API documentation for bqclient property.
bqconnectionclient
API documentation for bqconnectionclient property.
bqconnectionmanager
API documentation for bqconnectionmanager property.
bqstoragereadclient
API documentation for bqstoragereadclient property.
cloudfunctionsclient
API documentation for cloudfunctionsclient property.
resourcemanagerclient
API documentation for resourcemanagerclient property.
Methods
close
close()
No-op. Temporary resources are deleted after 7 days.
read_csv
read_csv(
    filepath_or_buffer: str | IO["bytes"],
    *,
    sep: Optional[str] = ",",
    header: Optional[int] = 0,
    names: Optional[
        Union[MutableSequence[Any], np.ndarray[Any, Any], Tuple[Any, ...], range]
    ] = None,
    index_col: Optional[
        Union[int, str, Sequence[Union[str, int]], Literal[False]]
    ] = None,
    usecols: Optional[
        Union[
            MutableSequence[str],
            Tuple[str, ...],
            Sequence[int],
            pandas.Series,
            pandas.Index,
            np.ndarray[Any, Any],
            Callable[[Any], bool],
        ]
    ] = None,
    dtype: Optional[Dict] = None,
    engine: Optional[
        Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]
    ] = None,
    encoding: Optional[str] = None,
    **kwargs
) -> dataframe.DataFrame
Loads a DataFrame from a comma-separated values (CSV) file, either local or in Cloud Storage.
The CSV file data will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
>>> df = bpd.read_csv(filepath_or_buffer=gcs_path)
>>> df.head(2)
      name post_abbr
0  Alabama        AL
1   Alaska        AK
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| filepath_or_buffer | str: A local or Google Cloud Storage ( | 
| sep | Optional[str], default ",": The separator for fields in a CSV file. For the BigQuery engine, the separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF-8. Both engines support  | 
| header | Optional[int], default 0: Row number to use as the column names. -  | 
| names | default None: A list of column names to use. If the file contains a header row and you want to pass this parameter, then  | 
| index_col | default None: Column(s) to use as the row labels of the DataFrame, either given as string name or column index.  | 
| usecols | default None: List of column names to use. The BigQuery engine only supports having a list of string column names. Column indices and callable functions are only supported with the default engine. Using the default engine, the column names in  | 
| dtype | Optional[Dict], default None: Data type for data or columns. Only to be used with default engine. | 
| engine | Optional[Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]], default None: Type of engine to use. If  | 
| encoding | Optional[str], default None: The character encoding of the data. The default encoding is  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | A BigQuery DataFrame. | 
read_gbq
read_gbq(
    query_or_table: str,
    *,
    index_col: Iterable[str] | str = (),
    columns: Iterable[str] = (),
    configuration: Optional[Dict] = None,
    max_results: Optional[int] = None,
    filters: third_party_pandas_gbq.FiltersType = (),
    use_cache: Optional[bool] = None,
    col_order: Iterable[str] = ()
) -> dataframe.DataFrame
Loads a DataFrame from BigQuery.
BigQuery tables are an unordered, unindexed data source. To support pandas compatibility, the following indexing options are supported:
- (Default behavior) Add an arbitrary sequential index and ordering using an analytic windowed operation that prevents filtering push down.
- (Recommended) Set the index_col argument to one or more columns. Unique values for the row labels are recommended. Duplicate labels are possible, but note that joins on a non-unique index can duplicate rows and operations like cumsum() that window across a non-unique index can have some non-determinism.
If your query doesn't have an ordering, select GENERATE_UUID() AS rowindex in your SQL and set index_col='rowindex' for the best performance.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
If the input is a table ID:
>>> df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
Read table path with wildcard suffix and filters:
>>> df = bpd.read_gbq_table("bigquery-public-data.noaa_gsod.gsod19*", filters=[("_table_suffix", ">=", "30"), ("_table_suffix", "<=", "39")])
Preserve ordering in a query input.
>>> df = bpd.read_gbq('''
...    SELECT
...       -- Instead of an ORDER BY clause on the query, use
...       -- ROW_NUMBER() to create an ordered DataFrame.
...       ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
...         AS rowindex,
...
...       pitcherFirstName,
...       pitcherLastName,
...       AVG(pitchSpeed) AS averagePitchSpeed
...     FROM `bigquery-public-data.baseball.games_wide`
...     WHERE year = 2016
...     GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
         pitcherFirstName pitcherLastName  averagePitchSpeed
rowindex
1                Albertin         Chapman          96.514113
2                 Zachary         Britton          94.591039
<BLANKLINE>
[2 rows x 3 columns]
Reading data with columns and filters parameters:
>>> columns = ['pitcherFirstName', 'pitcherLastName', 'year', 'pitchSpeed']
>>> filters = [('year', '==', 2016), ('pitcherFirstName', 'in', ['John', 'Doe']), ('pitcherLastName', 'in', ['Gant'])]
>>> df = bpd.read_gbq(
...             "bigquery-public-data.baseball.games_wide",
...             columns=columns,
...             filters=filters,
...         )
>>> df.head(1)
         pitcherFirstName   pitcherLastName     year        pitchSpeed
0                    John              Gant     2016            82
<BLANKLINE>
[1 rows x 4 columns]
| Parameters | |
|---|---|
| Name | Description | 
| query_or_table | str: A SQL string to be executed or a BigQuery table to be read. The table must be specified in the format of  | 
| index_col | Iterable[str] or str: Name of result column(s) to use for index in results DataFrame. New in bigframes version 1.3.0: If  | 
| columns | Iterable[str]: List of BigQuery column names in the desired order for results DataFrame. | 
| configuration | dict, optional: Query config parameters for job processing. For example: configuration = {'query': {'useQueryCache': False}}. For more information see  | 
| max_results | Optional[int], default None: If set, limit the maximum number of rows to fetch from the query results. | 
| filters | Union[Iterable[FilterType], Iterable[Iterable[FilterType]]], default (): Filters selecting the rows to read. Filter syntax: [[(column, op, val), …],…] where op is one of [==, >, >=, <, <=, !=, in, not in, LIKE]. The innermost tuples are combined into a set of filters applied through an AND operation. The outer Iterable combines these sets of filters through an OR operation. A single Iterable of tuples can also be used, meaning that no OR operation between sets of filters is conducted. If using a wildcard table suffix in query_or_table, you can specify the '_table_suffix' pseudo column to filter the tables to be read into the DataFrame. | 
| use_cache | Optional[bool], default None: Caches query results if set to  | 
| col_order | Iterable[str]: Alias for columns, retained for backwards compatibility. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | A DataFrame representing results of the query or table. | 
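The filters syntax described in the parameters is a disjunctive normal form: tuples inside a group are ANDed, and groups are ORed. The following pure-Python sketch shows one way such a structure could be rendered as a SQL WHERE clause. The helper names filters_to_where and _literal are hypothetical and written only for illustration; they are not part of bigframes, and the library's own translation may differ.

```python
from typing import Any, Iterable, List

# Maps the filter operators listed in the parameter table to SQL spellings.
# This mapping is an assumption made for illustration.
SQL_OPS = {
    "==": "=", ">": ">", ">=": ">=", "<": "<", "<=": "<=",
    "!=": "!=", "in": "IN", "not in": "NOT IN", "LIKE": "LIKE",
}


def _literal(value: Any) -> str:
    """Render a Python value as a SQL literal (strings, numbers, lists)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "\\'") + "'"
    if isinstance(value, (list, tuple)):
        return "(" + ", ".join(_literal(v) for v in value) + ")"
    return str(value)


def filters_to_where(filters: Iterable) -> str:
    """Hypothetical helper: turn a filters structure into a WHERE expression."""
    groups: List = list(filters)
    if not groups:
        return ""
    # A flat iterable of (column, op, value) tuples is a single AND group.
    if isinstance(groups[0][0], str):
        groups = [groups]
    or_terms = []
    for group in groups:
        and_terms = []
        for column, op, value in group:
            if op not in SQL_OPS:
                raise ValueError(f"unsupported operator: {op!r}")
            and_terms.append(f"`{column}` {SQL_OPS[op]} {_literal(value)}")
        or_terms.append(" AND ".join(and_terms))
    if len(or_terms) == 1:
        return or_terms[0]
    return " OR ".join(f"({term})" for term in or_terms)


# One AND group versus two groups joined by OR.
flat = [("year", "==", 2016), ("pitcherLastName", "in", ["Gant"])]
nested = [[("year", "==", 2016)], [("pitcherFirstName", "LIKE", "Jo%")]]
where_flat = filters_to_where(flat)
where_nested = filters_to_where(nested)
```

Here where_flat joins both conditions with AND, while where_nested wraps each group in parentheses and joins them with OR.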
read_gbq_function
read_gbq_function(function_name: str)
Loads a BigQuery function from BigQuery.
Then it can be applied to a DataFrame or Series.
BigQuery Utils provides many public functions under the bqutil project on Google Cloud Platform
(See: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs#using-the-udfs).
You can check out Community UDFs to use community-contributed functions.
(See: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs/community#community-udfs).
Examples:
Use the cw_lower_case_ascii_only function from Community UDFs.
(https://github.com/GoogleCloudPlatform/bigquery-utils/blob/master/udfs/community/cw_lower_case_ascii_only.sqlx)
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'id': [1, 2, 3], 'name': ['AURÉLIE', 'CÉLESTINE', 'DAPHNÉ']})
>>> df
   id       name
0   1    AURÉLIE
1   2  CÉLESTINE
2   3     DAPHNÉ
<BLANKLINE>
[3 rows x 2 columns]
>>> func = bpd.read_gbq_function("bqutil.fn.cw_lower_case_ascii_only")
>>> df1 = df.assign(new_name=df['name'].apply(func))
>>> df1
   id       name   new_name
0   1    AURÉLIE    aurÉlie
1   2  CÉLESTINE  cÉlestine
2   3     DAPHNÉ     daphnÉ
<BLANKLINE>
[3 rows x 3 columns]
| Parameter | |
|---|---|
| Name | Description | 
| function_name | str: The function's name in BigQuery in the format  | 
| Returns | |
|---|---|
| Type | Description | 
| callable | A function object pointing to the BigQuery function read from BigQuery. The object is similar to the one created by the remote_function decorator, including the bigframes_remote_function property, but not including the bigframes_cloud_function property. | 
read_gbq_model
read_gbq_model(model_name: str)
Loads a BigQuery ML model from BigQuery.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Read an existing BigQuery ML model.
>>> model_name = "bigframes-dev.bqml_tutorial.penguins_model"
>>> model = bpd.read_gbq_model(model_name)
| Parameter | |
|---|---|
| Name | Description | 
| model_name | str: The model's name in BigQuery in the format  | 
read_gbq_query
read_gbq_query(
    query: str,
    *,
    index_col: Iterable[str] | str = (),
    columns: Iterable[str] = (),
    configuration: Optional[Dict] = None,
    max_results: Optional[int] = None,
    use_cache: Optional[bool] = None,
    col_order: Iterable[str] = ()
) -> dataframe.DataFrame
Turn a SQL query into a DataFrame.
Note: Because the results are written to a temporary table, ordering by
ORDER BY is not preserved. A unique index_col is recommended. Use
row_number() over () if there is no natural unique index or you
want to preserve ordering.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Simple query input:
>>> df = bpd.read_gbq_query('''
...    SELECT
...       pitcherFirstName,
...       pitcherLastName,
...       pitchSpeed,
...    FROM `bigquery-public-data.baseball.games_wide`
... ''')
Preserve ordering in a query input.
>>> df = bpd.read_gbq_query('''
...    SELECT
...       -- Instead of an ORDER BY clause on the query, use
...       -- ROW_NUMBER() to create an ordered DataFrame.
...       ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
...         AS rowindex,
...
...       pitcherFirstName,
...       pitcherLastName,
...       AVG(pitchSpeed) AS averagePitchSpeed
...     FROM `bigquery-public-data.baseball.games_wide`
...     WHERE year = 2016
...     GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
         pitcherFirstName pitcherLastName  averagePitchSpeed
rowindex
1                Albertin         Chapman          96.514113
2                 Zachary         Britton          94.591039
<BLANKLINE>
[2 rows x 3 columns]
See also: Session.read_gbq.
read_gbq_table
read_gbq_table(
    query: str,
    *,
    index_col: Iterable[str] | str = (),
    columns: Iterable[str] = (),
    max_results: Optional[int] = None,
    filters: third_party_pandas_gbq.FiltersType = (),
    use_cache: bool = True,
    col_order: Iterable[str] = ()
) -> dataframe.DataFrame
Turn a BigQuery table into a DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Read a whole table, with arbitrary ordering or ordering corresponding to the primary key(s).
>>> df = bpd.read_gbq_table("bigquery-public-data.ml_datasets.penguins")
See also: Session.read_gbq.
read_json
read_json(
    path_or_buf: str | IO["bytes"],
    *,
    orient: Literal[
        "split", "records", "index", "columns", "values", "table"
    ] = "columns",
    dtype: Optional[Dict] = None,
    encoding: Optional[str] = None,
    lines: bool = False,
    engine: Literal["ujson", "pyarrow", "bigquery"] = "ujson",
    **kwargs
) -> dataframe.DataFrame
Convert a JSON string to DataFrame object.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://bigframes-dev-testing/sample1.json"
>>> df = bpd.read_json(path_or_buf=gcs_path, lines=True, orient="records")
>>> df.head(2)
   id   name
0   1  Alice
1   2    Bob
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| path_or_buf | a valid JSON str, path object or file-like object: A local or Google Cloud Storage ( | 
| orient | str, optional: If  | 
| dtype | bool or dict, default None: If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all, applies only to the data. For all  | 
| encoding | str, default 'utf-8': The encoding to use to decode py3 bytes. | 
| lines | bool, default False: Read the file as one JSON object per line. If using  | 
| engine | {"ujson", "pyarrow", "bigquery"}, default "ujson": Type of engine to use. If  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | The DataFrame representing JSON contents. | 
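With lines=True, the input is treated as one JSON object per line. As a rough illustration of that layout using only the standard library (read_json_lines is a hypothetical helper, not a bigframes function, and the sample data is made up to mirror the example output above):

```python
import io
import json
from typing import Any, Dict, IO, List


def read_json_lines(buffer: IO[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: parse a JSON Lines stream into a list of records."""
    records = []
    for line in buffer:
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records


# Two records, one JSON object per line.
sample = io.StringIO('{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}\n')
records = read_json_lines(sample)
```

Each parsed record corresponds to one row of the resulting DataFrame when orient="records" is used.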
read_pandas
Loads DataFrame from a pandas DataFrame.
The pandas DataFrame will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.
Examples:
>>> import bigframes.pandas as bpd
>>> import pandas as pd
>>> bpd.options.display.progress_bar = None
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> pandas_df = pd.DataFrame(data=d)
>>> df = bpd.read_pandas(pandas_df)
>>> df
   col1  col2
0     1     3
1     2     4
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| pandas_dataframe | pandas.DataFrame, pandas.Series, or pandas.Index: A pandas DataFrame/Series/Index object to be loaded. | 
read_parquet
read_parquet(
    path: str | IO["bytes"], *, engine: str = "auto"
) -> dataframe.DataFrame
Load a Parquet object from the file path (local or Cloud Storage), returning a DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.parquet"
>>> df = bpd.read_parquet(path=gcs_path, engine="bigquery")
| Parameters | |
|---|---|
| Name | Description | 
| path | str: Local or Cloud Storage path to Parquet file. | 
| engine | str: One of  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | A BigQuery DataFrame. | 
read_pickle
read_pickle(
    filepath_or_buffer: FilePath | ReadPickleBuffer,
    compression: CompressionOptions = "infer",
    storage_options: StorageOptions = None,
)
Load pickled BigFrames object (or any object) from file.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://bigframes-dev-testing/test_pickle.pkl"
>>> df = bpd.read_pickle(filepath_or_buffer=gcs_path)
| Parameters | |
|---|---|
| Name | Description | 
| filepath_or_buffer | str, path object, or file-like object: String, path object (implementing os.PathLike[str]), or file-like object implementing a binary readlines() function. Also accepts URL. URL is not limited to S3 and GCS. | 
| compression | str or dict, default 'infer': For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). If using 'zip' or 'tar', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdDecompressor or tarfile.TarFile, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}. | 
| storage_options | dict, default None: Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame or bigframes.series.Series | same type as object stored in file. | 
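The compression='infer' behaviour described in the parameters amounts to a suffix lookup on the path. A minimal sketch of that idea follows; infer_compression is a hypothetical helper and the method labels it returns are chosen for illustration, so this is not the code the library actually runs.

```python
from typing import Optional

# Longer suffixes come first so ".tar.gz" is not mistaken for plain ".gz".
# The method labels on the right are illustrative assumptions.
_SUFFIX_TO_METHOD = [
    (".tar.gz", "tar"),
    (".tar.xz", "tar"),
    (".tar.bz2", "tar"),
    (".tar", "tar"),
    (".gz", "gzip"),
    (".bz2", "bz2"),
    (".zip", "zip"),
    (".xz", "xz"),
    (".zst", "zstd"),
]


def infer_compression(path: str) -> Optional[str]:
    """Hypothetical helper: guess a compression method from a file extension."""
    lowered = path.lower()
    for suffix, method in _SUFFIX_TO_METHOD:
        if lowered.endswith(suffix):
            return method
    return None  # no known extension: treat as uncompressed


gzip_method = infer_compression("gs://my-bucket/frame.pkl.gz")  # made-up path
tar_method = infer_compression("frame.tar.gz")
plain_method = infer_compression("gs://bigframes-dev-testing/test_pickle.pkl")
```

Checking the compound '.tar.*' suffixes before the single ones is the one design point that matters here; otherwise a tar archive would be treated as a bare gzip stream.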
remote_function
remote_function(
    input_types: typing.List[type],
    output_type: type,
    dataset: typing.Optional[str] = None,
    bigquery_connection: typing.Optional[str] = None,
    reuse: bool = True,
    name: typing.Optional[str] = None,
    packages: typing.Optional[typing.Sequence[str]] = None,
    cloud_function_service_account: typing.Optional[str] = None,
    cloud_function_kms_key_name: typing.Optional[str] = None,
    cloud_function_docker_repository: typing.Optional[str] = None,
    max_batching_rows: typing.Optional[int] = 1000,
)
Decorator to turn a user defined function into a BigQuery remote function. Check out the code samples at: https://cloud.google.com/bigquery/docs/remote-functions#bigquery-dataframes.
- Have the below APIs enabled for your project:
  - BigQuery Connection API
  - Cloud Functions API
  - Cloud Run API
  - Cloud Build API
  - Artifact Registry API
  - Cloud Resource Manager API
  This can be done from the cloud console (change PROJECT_ID to yours): https://console.cloud.google.com/apis/enableflow?apiid=bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,cloudbuild.googleapis.com,artifactregistry.googleapis.com,cloudresourcemanager.googleapis.com&project=PROJECT_ID
  Or from the gcloud CLI:
  $ gcloud services enable bigqueryconnection.googleapis.com cloudfunctions.googleapis.com run.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com cloudresourcemanager.googleapis.com
- Have the following IAM roles enabled for you:
  - BigQuery Data Editor (roles/bigquery.dataEditor)
  - BigQuery Connection Admin (roles/bigquery.connectionAdmin)
  - Cloud Functions Developer (roles/cloudfunctions.developer)
  - Service Account User (roles/iam.serviceAccountUser) on the service account PROJECT_NUMBER-compute@developer.gserviceaccount.com
  - Storage Object Viewer (roles/storage.objectViewer)
  - Project IAM Admin (roles/resourcemanager.projectIamAdmin) (Only required if the BigQuery connection being used is not pre-created and is created dynamically with user credentials.)
- Either the user has setIamPolicy privilege on the project, or a BigQuery connection is pre-created with the necessary IAM role set:
  - To create a connection, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#create_a_connection
  - To set up IAM, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#grant_permission_on_function
    Alternatively, the IAM could also be set up via the gcloud CLI:
    $ gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:CONNECTION_SERVICE_ACCOUNT_ID" --role="roles/run.invoker"
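The console URL and the gcloud command in the prerequisites are both built from the same six service names. A small sketch, assuming you want to generate them for your own project ID; enable_flow_url and gcloud_enable_command are hypothetical helper names, not part of bigframes or the Google Cloud SDK:

```python
from urllib.parse import urlencode

# Service names copied from the prerequisites list.
REQUIRED_APIS = [
    "bigqueryconnection.googleapis.com",
    "cloudfunctions.googleapis.com",
    "run.googleapis.com",
    "cloudbuild.googleapis.com",
    "artifactregistry.googleapis.com",
    "cloudresourcemanager.googleapis.com",
]


def enable_flow_url(project_id: str) -> str:
    """Hypothetical helper: build the console link that enables all APIs."""
    # safe="," keeps the comma-separated list readable instead of %2C.
    query = urlencode(
        {"apiid": ",".join(REQUIRED_APIS), "project": project_id}, safe=","
    )
    return "https://console.cloud.google.com/apis/enableflow?" + query


def gcloud_enable_command() -> str:
    """Hypothetical helper: build the equivalent gcloud CLI invocation."""
    return "gcloud services enable " + " ".join(REQUIRED_APIS)


url = enable_flow_url("my-project")  # "my-project" is a placeholder ID
command = gcloud_enable_command()
```

The helpers only format strings; running the command or opening the URL still requires the permissions described in the list.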
 
| Parameters | |
|---|---|
| Name | Description | 
| input_types | list(type): List of input data types in the user defined function. | 
| output_type | type: Data type of the output in the user defined function. | 
| dataset | str, Optional: Dataset in which to create a BigQuery remote function. It should be in  | 
| bigquery_connection | str, Optional: Name of the BigQuery connection. You should either have the connection already created in the  | 
| reuse | bool, Optional: Reuse the remote function if it already exists.  | 
| name | str, Optional: Explicit name of the persisted BigQuery remote function. Use it with caution, because two users working in the same project and dataset could overwrite each other's remote functions if they use the same persistent name. | 
| packages | str[], Optional: Explicit name of the external package dependencies. Each dependency is added to the  | 
| cloud_function_service_account | str, Optional: Service account to use for the cloud functions. If not provided then the default service account would be used. See https://cloud.google.com/functions/docs/securing/function-identity for more details. Please make sure the service account has the necessary IAM permissions configured as described in https://cloud.google.com/functions/docs/reference/iam/roles#additional-configuration. | 
| cloud_function_kms_key_name | str, Optional: Customer managed encryption key to protect cloud functions and related data at rest. This is of the format projects/PROJECT_ID/locations/LOCATION/keyRings/KEYRING/cryptoKeys/KEY. Read https://cloud.google.com/functions/docs/securing/cmek for more details including granting necessary service accounts access to the key. | 
| cloud_function_docker_repository | str, Optional: Docker repository created with the same encryption key as  | 
| max_batching_rows | int, Optional: The maximum number of rows to be batched for processing in the BQ remote function. Default value is 1000. A lower number can be passed to avoid timeouts in case the user code is too complex to process a large number of rows fast enough. A higher number can be used to increase throughput in case the user code is fast enough.  | 
| Returns | |
|---|---|
| Type | Description | 
| callable | A remote function object pointing to the cloud assets created in the background to support the remote execution. The cloud assets can be located through the following properties set in the object: bigframes_cloud_function - The Google Cloud function deployed for the user defined code. bigframes_remote_function - The BigQuery remote function capable of calling into bigframes_cloud_function. |
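To make the max_batching_rows trade-off more concrete, the sketch below splits a list of rows into batches no larger than a given limit. It is a conceptual illustration only: batch_rows is a hypothetical helper, and how BigQuery actually groups rows for a remote function call is not specified here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_rows(rows: Sequence[T], max_batching_rows: int = 1000) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive batches of at most the given size."""
    if max_batching_rows < 1:
        raise ValueError("max_batching_rows must be a positive integer")
    for start in range(0, len(rows), max_batching_rows):
        yield list(rows[start:start + max_batching_rows])


rows = list(range(2500))
default_batches = [len(b) for b in batch_rows(rows)]
small_batches = [len(b) for b in batch_rows(rows, max_batching_rows=100)]
```

With the default limit, 2500 rows fall into three batches, so the user code is called fewer times with more rows each; a limit of 100 gives 25 smaller calls, each with less work to finish before a timeout.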