Faster ingestion into BQ by converting the chunk into pd.Dataframe #414

@DarshanSP19

Description

In weather-mv we divide the Dataset into small chunks, which gives the pipeline appropriate parallelism. If the next step converts each of those small chunks into a pandas DataFrame, generating the flat rows should become cheaper, because extracting rows from a DataFrame is very fast.

df = ds.to_dataframe().reset_index()

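Here ds is a small chunk of the dataset. to_dataframe() returns a DataFrame indexed by the chunk's dimension coordinates, and reset_index() turns those index levels into ordinary columns, so each row of the flattened DataFrame holds the coordinate values together with the variable values at that point.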
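
A minimal sketch of the idea, to make the row extraction concrete. The helper name chunk_to_rows, the variable and coordinate names, and the sample values are all hypothetical, and to_dict(orient="records") is just one way to pull rows out of the DataFrame; it is not necessarily what weather-mv would end up using.

```python
import numpy as np
import xarray as xr


def chunk_to_rows(ds: xr.Dataset) -> list:
    """Flatten a small Dataset chunk into a list of row dicts (hypothetical helper)."""
    # to_dataframe() indexes the frame by the dimension coordinates;
    # reset_index() moves those index levels into regular columns.
    df = ds.to_dataframe().reset_index()
    # One dict per row: coordinate values plus data variable values.
    return df.to_dict(orient="records")


# Hypothetical 2 x 3 chunk with a single data variable.
ds = xr.Dataset(
    {
        "temperature": (
            ("latitude", "longitude"),
            np.arange(6, dtype="float64").reshape(2, 3),
        )
    },
    coords={"latitude": [10.0, 20.0], "longitude": [0.0, 1.0, 2.0]},
)

rows = chunk_to_rows(ds)
print(len(rows))  # one row per (latitude, longitude) pair
print(rows[0])
```

Each dict in rows maps column names to values, which is the shape a BigQuery row writer typically expects. Whether this is actually faster than the current row generation would need to be measured on real chunks.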
