- 2.25.0 (latest)
- 2.24.0
- 2.23.0
- 2.22.0
- 2.21.0
- 2.20.0
- 2.19.0
- 2.18.0
- 2.17.0
- 2.16.0
- 2.15.0
- 2.14.0
- 2.13.0
- 2.12.0
- 2.11.0
- 2.10.0
- 2.9.0
- 2.8.0
- 2.7.0
- 2.6.0
- 2.5.0
- 2.4.0
- 2.3.0
- 2.2.0
- 1.36.0
- 1.35.0
- 1.34.0
- 1.33.0
- 1.32.0
- 1.31.0
- 1.30.0
- 1.29.0
- 1.28.0
- 1.27.0
- 1.26.0
- 1.25.0
- 1.24.0
- 1.22.0
- 1.21.0
- 1.20.0
- 1.19.0
- 1.18.0
- 1.17.0
- 1.16.0
- 1.15.0
- 1.14.0
- 1.13.0
- 1.12.0
- 1.11.1
- 1.10.0
- 1.9.0
- 1.8.0
- 1.7.0
- 1.6.0
- 1.5.0
- 1.4.0
- 1.3.0
- 1.2.0
- 1.1.0
- 1.0.0
- 0.26.0
- 0.25.0
- 0.24.0
- 0.23.0
- 0.22.0
- 0.21.0
- 0.20.1
- 0.19.2
- 0.18.0
- 0.17.0
- 0.16.0
- 0.15.0
- 0.14.1
- 0.13.0
- 0.12.0
- 0.11.0
- 0.10.0
- 0.9.0
- 0.8.0
- 0.7.0
- 0.6.0
- 0.5.0
- 0.4.0
- 0.3.0
- 0.2.0
API documentation for bigquery package.
Packages Functions
approx_top_count
approx_top_count(series: series.Series, number: int) -> series.SeriesReturns the approximate top elements of expression as an array of STRUCTs.
The number parameter specifies the number of elements returned.
Each STRUCT contains two fields. The first field (named value) contains an input
value. The second field (named count) contains an INT64 specifying the number
of times the value was returned.
Returns NULL if there are zero input rows.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(["apple", "apple", "pear", "pear", "pear", "banana"])
>>> bbq.approx_top_count(s, number=2)
[{'value': 'pear', 'count': 3}, {'value': 'apple', 'count': 2}]
| Parameters | |
|---|---|
| Name | Description | 
| series | The Series with any data type that the  | 
| number | An integer specifying the number of times the value was returned. | 
array_agg
array_agg(
    obj: groupby.SeriesGroupBy | groupby.DataFrameGroupBy,
) -> series.Series | dataframe.DataFrameGroup data and create arrays from selected columns, omitting NULLs to avoid BigQuery errors (NULLs not allowed in arrays).
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
For a SeriesGroupBy object:
>>> lst = ['a', 'a', 'b', 'b', 'a']
>>> s = bpd.Series([1, 2, 3, 4, np.nan], index=lst)
>>> bbq.array_agg(s.groupby(level=0))
a    [1. 2.]
b    [3. 4.]
dtype: list<item: double>[pyarrow]
For a DataFrameGroupBy object:
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = bpd.DataFrame(l, columns=["a", "b", "c"])
>>> bbq.array_agg(df.groupby(by=["b"]))
         a      c
b
1.0    [2]    [3]
2.0  [1 1]  [3 2]
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| obj | A GroupBy object to be applied the function. | 
array_length
array_length(series: series.Series) -> series.SeriesCompute the length of each array element in the Series.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([[1, 2, 8, 3], [], [3, 4]])
>>> bbq.array_length(s)
0    4
1    0
2    2
dtype: Int64
You can also apply this function directly to Series.
>>> s.apply(bbq.array_length, by_row=False)
0    4
1    0
2    2
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| series | A Series with array columns. | 
array_to_string
array_to_string(series: series.Series, delimiter: str) -> series.SeriesConverts array elements within a Series into delimited strings.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([["H", "i", "!"], ["Hello", "World"], np.nan, [], ["Hi"]])
>>> bbq.array_to_string(s, delimiter=", ")
    0         H, i, !
    1    Hello, World
    2
    3
    4              Hi
    dtype: string
| Parameters | |
|---|---|
| Name | Description | 
| series | A Series containing arrays. | 
| delimiter | The string used to separate array elements. | 
json_extract
json_extract(series: series.Series, json_path: str) -> series.SeriesExtracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON
value. This function uses single quotes and brackets to escape invalid JSONPath
characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> bbq.json_extract(s, json_path="$.class")
0    {"students":[{"id":5},{"id":12}]}
dtype: string
| Parameters | |
|---|---|
| Name | Description | 
| series | The Series containing JSON data (as native JSON objects or JSON-formatted strings). | 
| json_path | The JSON path identifying the data that you want to obtain from the input. | 
json_extract_array
json_extract_array(series: series.Series, json_path: str = "$") -> series.SeriesExtracts a JSON array and converts it to a SQL array of JSON-formatted STRING or JSON
values. This function uses single quotes and brackets to escape invalid JSONPath
characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]
| Parameters | |
|---|---|
| Name | Description | 
| series | The Series containing JSON data (as native JSON objects or JSON-formatted strings). | 
| json_path | The JSON path identifying the data that you want to obtain from the input. | 
json_set
json_set(
    series: series.Series,
    json_path_value_pairs: typing.Sequence[typing.Tuple[str, typing.Any]],
) -> series.SeriesProduces a new JSON value within a Series by inserting or replacing values at specified paths.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.read_gbq("SELECT JSON '{\"a\": 1}' AS data")["data"]
>>> bbq.json_set(s, json_path_value_pairs=[("$.a", 100), ("$.b", "hi")])
    0    {"a":100,"b":"hi"}
    Name: data, dtype: string
| Parameters | |
|---|---|
| Name | Description | 
| series | The Series containing JSON data (as native JSON objects or JSON-formatted strings). | 
| json_path_value_pairs | Pairs of JSON path and the new value to insert/replace. | 
struct
struct(value: dataframe.DataFrame) -> series.SeriesTakes a DataFrame and converts it into a Series of structs with each struct entry corresponding to a DataFrame row and each struct field corresponding to a DataFrame column
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.series as series
>>> bpd.options.display.progress_bar = None
>>> srs = series.Series([{"version": 1, "project": "pandas"}, {"version": 2, "project": "numpy"},])
>>> df = srs.struct.explode()
>>> bbq.struct(df)
0    {'project': 'pandas', 'version': 1}
1     {'project': 'numpy', 'version': 2}
dtype: struct<project: string, version: int64>[pyarrow]
Args:
    value (bigframes.dataframe.DataFrame):
        The DataFrame to be converted to a Series of structs
Returns:
    bigframes.series.Series: A new Series with struct entries representing rows of the original DataFrame
vector_search
vector_search(
    base_table: str,
    column_to_search: str,
    query: Union[dataframe.DataFrame, series.Series],
    *,
    query_column_to_search: Optional[str] = None,
    top_k: Optional[int] = 10,
    distance_type: Literal["euclidean", "cosine"] = "euclidean",
    fraction_lists_to_search: Optional[float] = None,
    use_brute_force: bool = False
) -> dataframe.DataFrameConduct vector search which searches embeddings to find semantically similar entities.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
DataFrame embeddings for which to find nearest neighbors. The ARRAY<FLOAT64> column
is used as the search query:
>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2)
  query_id  embedding  id my_embedding  distance
1      cat  [3.  5.2]   5    [5.  5.4]  2.009975
0      dog    [1. 2.]   1      [1. 2.]       0.0
0      dog    [1. 2.]   4    [1.  3.2]       1.2
1      cat  [3.  5.2]   2      [2. 4.]   1.56205
<BLANKLINE>
[4 rows x 5 columns]
Series embeddings for which to find nearest neighbors:
>>> search_query = bpd.Series([[1.0, 2.0], [3.0, 5.2]],
...                            index=["dog", "cat"],
...                            name="embedding")
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2)
     embedding  id my_embedding  distance
dog    [1. 2.]   1      [1. 2.]       0.0
cat  [3.  5.2]   5    [5.  5.4]  2.009975
dog    [1. 2.]   4    [1.  3.2]       1.2
cat  [3.  5.2]   2      [2. 4.]   1.56205
<BLANKLINE>
[4 rows x 4 columns]
You can specify the name of the column in the query DataFrame embeddings and distance type. If you specify query_column_to_search_value, it will use the provided column which contains the embeddings for which to find nearest neighbors. Otherwiese, it uses the column_to_search value.
>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]],
...                               "another_embedding": [[0.7, 2.2], [3.3, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             distance_type="cosine",
...             query_column_to_search="another_embedding",
...             top_k=2)
  query_id  embedding another_embedding  id my_embedding  distance
1      cat  [3.  5.2]         [3.3 5.2]   2      [2. 4.]  0.005181
0      dog    [1. 2.]         [0.7 2.2]   4    [1.  3.2]  0.000013
1      cat  [3.  5.2]         [3.3 5.2]   1      [1. 2.]  0.005181
0      dog    [1. 2.]         [0.7 2.2]   3    [1.5 7. ]  0.004697
<BLANKLINE>
[4 rows x 6 columns]
| Parameters | |
|---|---|
| Name | Description | 
| base_table | The table to search for nearest neighbor embeddings. | 
| column_to_search | The name of the base table column to search for nearest neighbor embeddings. The column must have a type of  | 
| query | A Series or DataFrame that provides the embeddings for which to find nearest neighbors. | 
| query_column_to_search | Specifies the name of the column in the query that contains the embeddings for which to find nearest neighbors. The column must have a type of  | 
| top_k | Sepecifies the number of nearest neighbors to return. Default to 10. | 
| distance_type | Specifies the type of metric to use to compute the distance between two vectors. Possible values are "euclidean" and "cosine". Default to "euclidean". | 
| fraction_lists_to_search | Specifies the percentage of lists to search. Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage. It is only used when a vector index is also used. You can only specify  | 
| use_brute_force | Determines whether to use brute force search by skipping the vector index if one is available. Default to False. |