Module cluster (2.17.0)
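Clustering models. This module is styled after scikit-learn's cluster module: <https://scikit-learn.org/stable/modules/clustering.html>.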
Classes
KMeans(
    n_clusters: int = 8,
    *,
    init: typing.Literal["kmeans++", "random", "custom"] = "kmeans++",
    init_col: typing.Optional[str] = None,
    distance_type: typing.Literal["euclidean", "cosine"] = "euclidean",
    max_iter: int = 20,
    tol: float = 0.01,
    warm_start: bool = False
)
K-Means clustering.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> from bigframes.ml.cluster import KMeans
>>> X = bpd.DataFrame({"feat0": [1, 1, 1, 10, 10, 10], "feat1": [2, 4, 0, 2, 4, 0]})
>>> kmeans = KMeans(n_clusters=2).fit(X)
>>> kmeans.predict(bpd.DataFrame({"feat0": [0, 12], "feat1": [0, 3]}))["CENTROID_ID"] # doctest:+SKIP
0    1
1    2
Name: CENTROID_ID, dtype: Int64
>>> kmeans.cluster_centers_ # doctest:+SKIP
   centroid_id feature  numerical_value categorical_value
0            1   feat0              5.5                []
1            1   feat1              1.0                []
2            2   feat0              5.5                []
3            2   feat1              4.0                []

[4 rows x 4 columns]
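To relate the predicted CENTROID_ID values to the centroids, the sketch below assigns each of the two predicted rows to its nearest centroid using plain Euclidean distance on the raw feature values. It assumes the illustrative centroid values printed above (that doctest is skipped, so real values may differ), and BigQuery ML may preprocess features (for example, by standardizing them) before computing distances, so treat this as an illustration rather than the library's prediction logic. `euclidean` and `nearest_centroid` are hypothetical helper names.

```python
import math

# Centroids taken from the illustrative cluster_centers_ output:
# centroid_id -> (feat0, feat1).
CENTROIDS = {1: (5.5, 1.0), 2: (5.5, 4.0)}


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(point, centroids=CENTROIDS):
    """Return the id of the centroid closest to `point`."""
    return min(centroids, key=lambda cid: euclidean(point, centroids[cid]))


# The two rows passed to kmeans.predict() in the example.
rows = [(0, 0), (12, 3)]
print([nearest_centroid(row) for row in rows])  # [1, 2]
```

The result matches the CENTROID_ID values shown for the predict call: (0, 0) is closer to centroid 1, and (12, 3) is closer to centroid 2.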
Parameters
Name
Description
n_clusters
int, default 8
The number of clusters to form, which is also the number of centroids to generate. Defaults to 8.
init
"kmeans++", "random" or "custom", default "kmeans++"
The method used to initialize the centroids. Defaults to "kmeans++".
kmeans++: Initializes n_clusters centroids by using the k-means++ algorithm. This approach usually trains a better model than random initialization.
random: Initializes the centroids by randomly selecting n_clusters data points from the input data.
custom: Initializes the centroids from a bool column that you specify with the init_col option. Rows with a value of True are used as the initial centroids.
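The following is a minimal pure-Python sketch of the textbook k-means++ seeding procedure alongside "random"-style seeding. It illustrates the idea behind the two options only; it is not BigQuery ML's implementation, and `kmeans_plus_plus_init` and `random_init` are hypothetical names.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_init(points, n_clusters, rng=None):
    """Choose n_clusters initial centroids with textbook k-means++ seeding."""
    if not 0 < n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    rng = rng or random.Random()
    # First centroid: a data point chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < n_clusters:
        # D(x)^2: squared distance from each point to its nearest chosen centroid.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        # Next centroid: sampled with probability proportional to D(x)^2.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def random_init(points, n_clusters, rng=None):
    """'random'-style seeding: n_clusters distinct data points chosen uniformly."""
    rng = rng or random.Random()
    return rng.sample(points, n_clusters)


points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
print(kmeans_plus_plus_init(points, 2, random.Random(0)))
print(random_init(points, 2, random.Random(0)))
```

Passing a seeded `random.Random` makes the choice reproducible. Because k-means++ favors points far from the centroids already chosen, the initial centroids tend to be spread out, which is the usual reason it trains a better model than purely random seeding.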
init_col
str or None, default None
The name of the column to use to initialize the centroids. This column must be of type bool. Each row with a value of True is used as an initial centroid, so the number of True rows must equal the value of n_clusters. Only used when init is "custom". Defaults to None.
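As an illustration of the "custom" rule, this sketch takes the rows whose bool column is True as the initial centroids and checks that their count equals n_clusters. Rows are modeled as a list of dicts rather than a DataFrame, and `custom_init`, `feature_cols`, and the `is_seed` column are hypothetical names, not part of the bigframes API.

```python
def custom_init(rows, feature_cols, init_col, n_clusters):
    """Return the feature values of the rows flagged True in init_col."""
    # The init column must hold bool values.
    if any(not isinstance(row[init_col], bool) for row in rows):
        raise TypeError(f"column {init_col!r} must contain bool values")
    seeds = [row for row in rows if row[init_col]]
    # The number of True rows must match n_clusters.
    if len(seeds) != n_clusters:
        raise ValueError(
            f"{init_col!r} has {len(seeds)} True rows, "
            f"but n_clusters is {n_clusters}"
        )
    return [tuple(row[col] for col in feature_cols) for row in seeds]


# Hypothetical data: the example's rows plus a bool seed column.
rows = [
    {"feat0": 1, "feat1": 2, "is_seed": True},
    {"feat0": 1, "feat1": 4, "is_seed": False},
    {"feat0": 1, "feat1": 0, "is_seed": False},
    {"feat0": 10, "feat1": 2, "is_seed": True},
    {"feat0": 10, "feat1": 4, "is_seed": False},
    {"feat0": 10, "feat1": 0, "is_seed": False},
]
print(custom_init(rows, ["feat0", "feat1"], "is_seed", n_clusters=2))
# [(1, 2), (10, 2)]
```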
distance_type
"euclidean" or "cosine", default "euclidean"
The distance metric used to compute the distance between two points. Defaults to "euclidean".
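A short sketch of the two metrics using their standard definitions, assuming cosine distance is defined as 1 minus cosine similarity. It shows that cosine distance ignores vector length: two points in the same direction have a cosine distance of about 0 even when their Euclidean distance is large. The function names are illustrative only.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# (1, 2) and (10, 20) point in the same direction but are far apart.
print(euclidean_distance((1, 2), (10, 20)))  # about 20.12
print(cosine_distance((1, 2), (10, 20)))     # about 0.0
```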
max_iter
int, default 20
The maximum number of training iterations, where one iteration is a single pass over the entire training data. Defaults to 20.
tol
float, default 0.01
The minimum relative loss improvement that is necessary to continue training. For example, a value of 0.01 means that each iteration must reduce the loss by at least 1% for training to continue. Defaults to 0.01.
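To show how max_iter and tol interact, here is a minimal sketch of Lloyd's algorithm (the standard k-means iteration) that stops after max_iter iterations or as soon as the relative loss improvement (previous_loss - current_loss) / previous_loss falls below tol. For example, with tol=0.01, a drop in loss from 100 to 99.5 is a 0.5% improvement, so training would stop. The loss is assumed to be the sum of squared Euclidean distances to the assigned centroids; BigQuery ML's exact loss and stopping rule may differ, and the function names are hypothetical.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean) for each point."""
    return [
        min(range(len(centroids)),
            key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
        for p in points
    ]


def loss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[k]))
        for p, k in zip(points, labels)
    )


def lloyd(points, centroids, max_iter=20, tol=0.01):
    """Iterate until max_iter or until relative improvement drops below tol."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    prev = loss(points, centroids, labels)
    for iteration in range(1, max_iter + 1):
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid as is
        centroids = new_centroids
        # Assignment step: move each point to its nearest centroid.
        labels = assign(points, centroids)
        current = loss(points, centroids, labels)
        # Stop once the relative loss improvement falls below tol.
        if prev == 0 or (prev - current) / prev < tol:
            break
        prev = current
    return centroids, labels, iteration


points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
print(lloyd(points, [(1, 0), (1, 4)]))
# ([(5.5, 1.0), (5.5, 4.0)], [0, 1, 0, 0, 1, 0], 2)
print(lloyd(points, [(1, 2), (10, 2)]))
# ([(1.0, 2.0), (10.0, 2.0)], [0, 0, 0, 1, 1, 1], 1)
```

Starting from (1, 0) and (1, 4), this sketch stops after two iterations at centroids (5.5, 1.0) and (5.5, 4.0) with a loss of 125.5. Starting from (1, 2) and (10, 2), it stops after one iteration with a loss of 16. Different initializations can therefore converge to different local optima.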
warm_start
bool, default False
Whether to retrain the model with new training data, new model options, or both. Unless you explicitly override them, the options used for the initial training are reused for the warm start run. Defaults to False.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-28 UTC.