# mediapipe_model_maker.text_classifier.Dataset

[View source on GitHub](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/text/text_classifier/dataset.py#L50-L133)

Dataset library for text classifier.

Inherits From: [`ClassificationDataset`](../../mediapipe_model_maker/face_stylizer/dataset/classification_dataset/ClassificationDataset), [`Dataset`](../../mediapipe_model_maker/model_util/dataset/Dataset)

    mediapipe_model_maker.text_classifier.Dataset(
        dataset: tf.data.Dataset,
        label_names: List[str],
        tfrecord_cache_files: Optional[cache_files_lib.TFRecordCacheFiles] = None,
        size: Optional[int] = None
    )

## Args

| Argument | Description |
|---|---|
| `tf_dataset` | A `tf.data.Dataset` object that contains a potentially large set of elements, where each element is a pair `(input_data, target)`. The `input_data` is the raw input, such as an image or a piece of text, while the `target` is the ground truth for that input, e.g. the classification label of an image. |
| `size` | The size of the dataset. `tf.data.Dataset` doesn't support getting the length directly, since it is lazily loaded and may be infinite. |
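For illustration, a minimal sketch of wrapping an in-memory `tf.data.Dataset` directly; the example texts, labels, and label names are made up:

    import tensorflow as tf
    from mediapipe_model_maker import text_classifier

    # Hypothetical in-memory data: each element is an (input_data, target) pair.
    texts = tf.constant(["loved it", "terrible plot", "great movie"])
    labels = tf.constant([1, 0, 1])
    tf_dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

    # size is passed explicitly because a tf.data.Dataset cannot always
    # report its own length.
    data = text_classifier.Dataset(
        dataset=tf_dataset,
        label_names=["negative", "positive"],
        size=3,
    )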
## Attributes

| Attribute | Description |
|---|---|
| `label_names` | |
| `num_classes` | |
| `size` | Returns the size of the dataset. Same functionality as calling `__len__`; see the `__len__` method definition for more information. |
## Methods

### `from_csv`

[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/text/text_classifier/dataset.py#L67-L133)

    @classmethod
    from_csv(
        filename: str,
        csv_params: CSVParams,
        shuffle: bool = True,
        cache_dir: Optional[str] = None,
        num_shards: int = 1
    ) -> 'Dataset'

Loads text with labels from a CSV file.

#### Args

| Argument | Description |
|---|---|
| `filename` | Name of the CSV file. |
| `csv_params` | Parameters used for reading the CSV file. See [`CSVParams`](../../mediapipe_model_maker/text_classifier/CSVParams). |
| `shuffle` | If True, randomly shuffles the data. |
| `cache_dir` | Optional parameter to specify where to store the preprocessed dataset. Only used for BERT models. |
| `num_shards` | Optional parameter for the number of shards of the preprocessed dataset. Note that using more than one shard will reorder the dataset. Only used for BERT models. |

#### Returns

Dataset containing (text, label) pairs and other related info.
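A usage sketch; the CSV filename and column names are hypothetical, and `CSVParams` is assumed to take the text and label column names as shown:

    from mediapipe_model_maker import text_classifier

    # Hypothetical CSV with "review" (text) and "sentiment" (label) columns.
    csv_params = text_classifier.CSVParams(
        text_column="review", label_column="sentiment"
    )
    data = text_classifier.Dataset.from_csv(
        filename="reviews.csv",
        csv_params=csv_params,
        shuffle=True,
    )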
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-05-07 UTC."],[],[],null,["# mediapipe_model_maker.text_classifier.Dataset\n\n\u003cbr /\u003e\n\n|------------------------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/text/text_classifier/dataset.py#L50-L133) |\n\nDataset library for text classifier.\n\nInherits From: [`ClassificationDataset`](../../mediapipe_model_maker/face_stylizer/dataset/classification_dataset/ClassificationDataset), [`Dataset`](../../mediapipe_model_maker/model_util/dataset/Dataset) \n\n mediapipe_model_maker.text_classifier.Dataset(\n dataset: tf.data.Dataset,\n label_names: List[str],\n tfrecord_cache_files: Optional[cache_files_lib.TFRecordCacheFiles] = None,\n size: Optional[int] = None\n )\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `tf_dataset` | A tf.data.Dataset object that contains a potentially large set of elements, where each element is a pair of (input_data, target). The `input_data` means the raw input data, like an image, a text etc., while the `target` means the ground truth of the raw input data, e.g. the classification label of the image etc. |\n| `size` | The size of the dataset. tf.data.Dataset donesn't support a function to get the length directly since it's lazy-loaded and may be infinite. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Attributes ---------- ||\n|---------------|-----------------------------------------------------------------------------------------------------------------------------------------|\n| `label_names` | \u003cbr /\u003e \u003cbr /\u003e |\n| `num_classes` | \u003cbr /\u003e \u003cbr /\u003e |\n| `size` | Returns the size of the dataset. \u003cbr /\u003e Same functionality as calling **len** . See the **len** method definition for more information. |\n\n\u003cbr /\u003e\n\nMethods\n-------\n\n### `from_csv`\n\n[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/text/text_classifier/dataset.py#L67-L133) \n\n @classmethod\n from_csv(\n filename: str,\n csv_params: ../../mediapipe_model_maker/text_classifier/CSVParams,\n shuffle: bool = True,\n cache_dir: Optional[str] = None,\n num_shards: int = 1\n ) -\u003e 'Dataset'\n\nLoads text with labels from a CSV file.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `filename` | Name of the CSV file. |\n| `csv_params` | Parameters used for reading the CSV file. 
### `split`

[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/core/data/classification_dataset.py#L43-L56)

    split(
        fraction: float
    ) -> Tuple[ds._DatasetT, ds._DatasetT]

Splits the dataset into two sub-datasets with the given fraction.

Primarily used for splitting the dataset into training and testing sets.

#### Args

| Argument | Description |
|---|---|
| `fraction` | A float; the fraction of the original data that goes into the first returned sub-dataset. |

#### Returns

The two split sub-datasets.
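A sketch of a typical train/validation split, again assuming `data` from the earlier examples:

    # 80% of the elements go to train_data, the remaining 20% to validation_data.
    train_data, validation_data = data.split(0.8)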
### `__len__`

[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/core/data/dataset.py#L118-L137)

    __len__() -> int

Returns the number of elements in the dataset.

If size is not set, this method falls back to calling `__len__` on the
tf.data.Dataset in `self._dataset`. Calling `len` on a tf.data.Dataset
instance may throw a TypeError, because the dataset may be lazily loaded
with an unknown size or may be infinite.

In most cases, however, when an instance of this class is created by helper
functions like `from_folder`, the size of the dataset is computed during
preprocessing and the `_size` instance variable is already set.

#### Raises

| Exception | Condition |
|---|---|
| `TypeError` | If `self._size` is not set and the cardinality of `self._dataset` is INFINITE_CARDINALITY or UNKNOWN_CARDINALITY. |
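A sketch of both outcomes, assuming `data` from the earlier examples; the infinite dataset is contrived purely to trigger the TypeError path:

    import tensorflow as tf
    from mediapipe_model_maker import text_classifier

    # _size was set at construction, so len() returns it directly.
    print(len(data))

    # An infinite dataset has no defined length; wrapping it without an
    # explicit size makes len() fall back to the tf.data.Dataset, which
    # raises TypeError for infinite cardinality.
    infinite = tf.data.Dataset.from_tensor_slices((["x"], [0])).repeat()
    unsized = text_classifier.Dataset(dataset=infinite, label_names=["a"])
    try:
        len(unsized)
    except TypeError:
        print("size unavailable: infinite or unknown cardinality")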