
Conversation

@TrevorBergeron (Contributor) commented:

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Ensure code coverage does not decrease (if any source code was changed)
  • Update the appropriate docs (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Mar 28, 2025
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Mar 31, 2025
@TrevorBergeron TrevorBergeron marked this pull request as ready for review March 31, 2025 21:03
@TrevorBergeron TrevorBergeron requested review from a team as code owners March 31, 2025 21:03
@TrevorBergeron TrevorBergeron requested review from tswast and removed request for drylks-work March 31, 2025 21:03
@@ -57,8 +58,13 @@ def test_bq_session_create_temp_table_clustered(bigquery_client: bigquery.Client
assert result_table.clustering_fields == cluster_cols

session_resource_manager.close()
with pytest.raises(google.api_core.exceptions.NotFound):
@tswast (Collaborator) commented on Apr 1, 2025:

FWIW: if you sync to main I made a similar fix in https://github.com/googleapis/python-bigquery-dataframes/pull/1572/files. I think we can keep the pytest.raises and just do the loop.
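The pattern being suggested, retrying until the deletion becomes visible while still asserting `NotFound`, can be sketched without any GCP dependency. Note this is an illustrative stand-in, not bigframes code: `NotFound` mimics `google.api_core.exceptions.NotFound`, and `fake_get_table` simulates a session table that lags before disappearing.

```python
import time

class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound."""

def poll_until_not_found(get_table, attempts=10, delay=0.0):
    """Call get_table() repeatedly; return True once it raises NotFound,
    False if the table still exists after all attempts."""
    for _ in range(attempts):
        try:
            get_table()
        except NotFound:
            return True
        time.sleep(delay)
    return False

# Simulate a session table that disappears after two successful fetches.
calls = {"n": 0}
def fake_get_table():
    calls["n"] += 1
    if calls["n"] > 2:
        raise NotFound("table deleted")

assert poll_until_not_found(fake_get_table, delay=0)
```

In the real test, the loop body would call `bigquery_client.get_table(...)` inside `pytest.raises(google.api_core.exceptions.NotFound)`, so the assertion still fires if the table never goes away.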

@TrevorBergeron (Contributor, Author) replied:

OK, just merged in and used your version.

@TrevorBergeron TrevorBergeron requested a review from tswast April 1, 2025 17:59
@tswast (Collaborator) left a comment:

Thanks! Just a few nits.

Comment on lines 404 to 410
    anon_dataset_manager = getattr(self, "_anon_dataset_manager", None)
    if anon_dataset_manager:
        self._anon_dataset_manager.close()

    if getattr(self, "_session_resource_manager", None):
        if self._session_resource_manager is not None:
            self._session_resource_manager.close()
@tswast (Collaborator):

Nit: We can make this a little more internally consistent. If you'd like to use getattr in the if statement, let's use the := (walrus operator) so that we aren't fetching from self more than once.

Suggested change — before:

    anon_dataset_manager = getattr(self, "_anon_dataset_manager", None)
    if anon_dataset_manager:
        self._anon_dataset_manager.close()

    if getattr(self, "_session_resource_manager", None):
        if self._session_resource_manager is not None:
            self._session_resource_manager.close()

After:

    if anon_dataset_manager := getattr(self, "_anon_dataset_manager", None):
        anon_dataset_manager.close()
    if session_resource_manager := getattr(self, "_session_resource_manager", None):
        session_resource_manager.close()

@tswast (Collaborator):

Committed my own suggestion. 🤞 hopefully I didn't break anything.

@@ -906,8 +923,6 @@ def read_csv(
engine=engine,
write_engine=write_engine,
)
table = self._temp_storage_manager.allocate_temp_table()
@tswast (Collaborator):

Does this mean the table isn't created right away? If so, I think we might need to supply a session ID in the load job, if available.

Edit: I see this was moved to read_bigquery_load_job, which makes sense to me. Aside: with more hybrid engine stuff in the future, I can imagine some cases where to_gbq() would be doing a load job to a user-managed table, but I suppose that would probably use a very different job config, anyway.

@@ -166,8 +171,13 @@ def read_pandas_load_job(
job_config.clustering_fields = cluster_cols

job_config.labels = {"bigframes-api": api_name}
job_config.schema_update_options = [
google.cloud.bigquery.job.SchemaUpdateOption.ALLOW_FIELD_ADDITION
@tswast (Collaborator):

I'm curious: why ALLOW_FIELD_ADDITION here, but using WRITE_TRUNCATE for read_gbq_load_job? Might be worthwhile to add some comments.

@TrevorBergeron (Contributor, Author) replied:

I just want to make sure the ordering_col does not get overridden, since that is the column we cluster on.
It might still work with TRUNCATE as well?
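For context, the two behaviors being compared map onto `LoadJobConfig` roughly as follows. This is a hedged sketch of the job-config surface, not the PR's actual code; the variable names are illustrative.

```python
from google.cloud import bigquery

# Appending into an existing clustered temp table: allow new columns
# (such as the ordering column) to be added without clobbering the
# schema that was set up at table creation.
append_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema_update_options=[
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
    ],
)

# Replacing the destination wholesale: WRITE_TRUNCATE rewrites both data
# and schema, so no schema update option is needed.
truncate_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
```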

)

self._start_generic_job(load_job)
table_id = f"{table.project}.{table.dataset_id}.{table.table_id}"

# Update the table expiration so we aren't limited to the default 24
@tswast (Collaborator):

I assume _storage_manager.create_temp_table handles this in the anonymous dataset case now?

Edit: Yes, I see we do set an expiration above:

        expiration = (
            datetime.datetime.now(datetime.timezone.utc) + constants.DEFAULT_EXPIRATION
        )
        table = bf_io_bigquery.create_temp_table(
            self.bqclient,
            self.allocate_temp_table(),
            expiration,
            schema=schema,
            cluster_columns=list(cluster_cols),
            kms_key=self._kms_key,
        )

@TrevorBergeron TrevorBergeron enabled auto-merge (squash) April 2, 2025 18:14
@TrevorBergeron TrevorBergeron merged commit 9711b83 into main Apr 2, 2025
17 of 24 checks passed
@TrevorBergeron TrevorBergeron deleted the use_temp_session_tables branch April 2, 2025 21:54