[SPARK-54854][SQL] Add a UUIDv7 queryId to SQLExecution Events #53625

asl3 · 2025-12-27T01:26:04Z

What changes were proposed in this pull request?

Add a new UUIDv7 queryId object to SparkListenerSQLExecutionStart and propagate it through the SQL execution lifecycle via SparkContext local properties.

Currently, Spark uses executionId to connect jobs, stages, and tasks with SQL executions. However, this field is not globally unique, because different Spark applications can produce the same executionId values. A UUIDv7 provides a time-ordered, globally unique identifier, which improves telemetry across systems.
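To illustrate why a UUIDv7 is both unique and time-ordered, here is a minimal sketch of the bit layout described in RFC 9562. It is not the PR's UUIDv7Generator; the object name `UuidV7Layout` and its method names are hypothetical and chosen only for illustration.

```scala
import java.util.UUID

// Hypothetical helper, not part of the PR: shows the RFC 9562 version 7 layout.
object UuidV7Layout {
  // Field order, most significant first:
  // 48-bit Unix timestamp (ms) | 4-bit version (7) | 12 bits rand_a |
  // 2-bit variant (binary 10) | 62 bits rand_b
  def pack(epochMilli: Long, randA: Int, randB: Long): UUID = {
    val msb = ((epochMilli & 0xFFFFFFFFFFFFL) << 16) | 0x7000L | (randA & 0x0FFF).toLong
    val lsb = (randB & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L
    new UUID(msb, lsb)
  }

  // The timestamp occupies the top 48 bits, so it can be read back with a shift.
  def timestampMillis(id: UUID): Long = id.getMostSignificantBits >>> 16
}
```

Because the timestamp sits in the most significant bits, the canonical string forms of two such IDs created in different milliseconds sort in creation order, regardless of the random bits.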

A separate PR is planned to add queryId as a new field in the Spark UI.

Why are the changes needed?

Add a globally unique, time-ordered identifier for Spark SQL query execution events.

Does this PR introduce any user-facing change?

No. This PR only adds the internal queryId, which is not yet surfaced.

How was this patch tested?

Added tests for UUIDv7 generator and SQLExecution queryId propagation.

Was this patch authored or co-authored using generative AI tooling?

UUIDv7Generator was written with the help of claude-4.5-sonnet, following the specification in https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#section-5.2

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

cloud-fan · 2025-12-27T15:20:41Z

...e/src/main/scala/org/apache/spark/sql/execution/streaming/runtime/IncrementalExecution.scala

    val outputMode: OutputMode,
    val checkpointLocation: String,
-    val queryId: UUID,
+    override val queryId: UUID,


does this use UUIDv7?

sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala

Copilot

Pull request overview

This PR adds a UUIDv7-based queryId to SQL execution events to provide a globally unique, time-ordered identifier for tracking SQL queries across systems. The key changes include:

Introduction of a UUIDv7 generator for creating time-ordered unique identifiers
Addition of queryId field to QueryExecution and propagation through the SQL execution lifecycle
Modification of SparkListenerSQLExecutionStart to include the queryId

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| sql/core/src/main/scala/org/apache/spark/sql/util/UUIDv7Generator.scala | New UUIDv7 generator implementation following RFC draft specification |
| sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala | Adds queryId field and executionCount tracking to QueryExecution |
| sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala | Implements queryId propagation via SparkContext local properties and generation logic |
| sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala | Adds queryId parameter to SparkListenerSQLExecutionStart event |
| sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala | Updates event handler to extract queryId from events |
| sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/runtime/IncrementalExecution.scala | Declares queryId as override since it's now inherited from QueryExecution |
| sql/core/src/test/scala/org/apache/spark/sql/util/UUIDv7GeneratorSuite.scala | Comprehensive test suite for UUIDv7 generator covering format, uniqueness, monotonicity, and timestamp accuracy |
| sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala | Tests for queryId propagation in concurrent and sequential execution scenarios |
| sql/core/src/test/scala/org/apache/spark/sql/execution/ui/*.scala | Updates test event constructors to include None for queryId parameter |
| sql/core/src/test/scala/org/apache/spark/sql/execution/history/*.scala | Updates test event constructors to include None for queryId parameter |
| sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListenerSuite.scala | Updates test event constructors to include None for queryId parameter |

Copilot · 2025-12-28T16:22:45Z

sql/core/src/main/scala/org/apache/spark/sql/util/UUIDv7Generator.scala

+   * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#section-5.2
+   */
+
+  private val random = new Random()


The shared Random instance in the object is not thread-safe. Since generateFrom can be called from multiple threads concurrently (via SQLExecution.withNewExecutionId0), the shared Random instance may produce non-unique UUIDs due to race conditions in Random.nextLong().

Consider using ThreadLocalRandom.current() instead of a shared Random instance to ensure thread-safety. ThreadLocalRandom is the standard approach for concurrent random number generation in Java/Scala and is used elsewhere in the Spark codebase.
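A minimal sketch of what the suggested change could look like, assuming the generator only needs random bits and the current wall-clock millisecond. The object name `ThreadLocalUuidV7` is hypothetical and this is not the PR's code. As background, the Javadoc for java.util.Random describes instances as thread-safe but warns that sharing one instance across threads can cause contention; ThreadLocalRandom avoids shared state altogether.

```scala
import java.util.UUID
import java.util.concurrent.ThreadLocalRandom

// Hypothetical sketch: each calling thread draws from its own generator,
// so no random state is shared between concurrent callers.
object ThreadLocalUuidV7 {
  def generate(): UUID = {
    val rnd = ThreadLocalRandom.current() // must be fetched on the calling thread
    val epochMilli = System.currentTimeMillis()
    // 48-bit timestamp, version nibble 7, 12 random bits
    val msb = ((epochMilli & 0xFFFFFFFFFFFFL) << 16) | 0x7000L | rnd.nextInt(0x1000).toLong
    // variant bits (binary 10) followed by 62 random bits
    val lsb = (rnd.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L
    new UUID(msb, lsb)
  }
}
```

Note that `ThreadLocalRandom.current()` should not be cached in a field of the object, since that would again share one thread's generator with all callers.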

sql/core/src/main/scala/org/apache/spark/sql/util/UUIDv7Generator.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

pan3793 · 2025-12-29T03:07:59Z

sql/core/src/main/scala/org/apache/spark/sql/util/UUIDv7Generator.scala

+   * Deterministic UUIDv7 generation from epochMilli and nanos.
+   * Called by generate() and used for testing.
+   */
+  def generateFrom(epochMilli: Long, nano: Int): UUID = {


nano is not mentioned in the RFC. Does this implementation have a reference version?

BTW, starting with OpenJDK 26 there is a built-in UUIDv7 implementation (openjdk/jdk@642ba4c), which could be a good reference.

asl3 (reply, Dec 29, 2025): This implementation follows the spec here.
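For context on how a sub-millisecond value can enter a UUIDv7 at all: RFC 9562 (section 6.2, "Replace Leftmost Random Bits with Increased Clock Precision") allows the 12 rand_a bits to carry a fraction of the millisecond, scaled by 4096. The sketch below shows that scaling only. Whether the PR's generateFrom uses its nano argument this way is not shown in this thread, and the names `SubMilliPrecision`, `subMilliField`, and `nanoOfMilli` are hypothetical.

```scala
// Hypothetical sketch of the RFC 9562 "increased clock precision" option.
object SubMilliPrecision {
  // Scales nanoseconds within the current millisecond (0 to 999999)
  // onto the 12-bit rand_a field (0 to 4095).
  def subMilliField(nanoOfMilli: Int): Int = {
    require(nanoOfMilli >= 0 && nanoOfMilli < 1000000, "expected nanoseconds within one millisecond")
    ((nanoOfMilli.toLong * 4096L) / 1000000L).toInt
  }

  // Upper 64 bits of the UUID: timestamp, version 7, then the sub-millisecond fraction.
  def mostSignificantBits(epochMilli: Long, nanoOfMilli: Int): Long =
    ((epochMilli & 0xFFFFFFFFFFFFL) << 16) | 0x7000L | subMilliField(nanoOfMilli).toLong
}
```

With this layout, two IDs generated in the same millisecond still sort by their sub-millisecond fraction, at a granularity of roughly 244 nanoseconds.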

pan3793 · 2025-12-29T03:14:06Z

Could UUIDv7Generator be defined in common/utils-java? UUIDv7 is likely to be used in more places, e.g.:

- Use UUIDv7 as the staging dir name, so that an admin can easily delete dangling staging dirs with a prefix pattern.
- Use UUIDv7 as the session id and operation id in Connect/Thrift Server, so that the creation time can be inferred from the id itself.
- Add a built-in function uuid_v7, like PostgreSQL.
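The first two use cases above rely on the timestamp being recoverable from the ID. A minimal sketch of how that could work, assuming standard RFC 9562 version 7 IDs; the object `UuidV7Time` and its methods are hypothetical and not part of Spark.

```scala
import java.time.Instant
import java.util.UUID

// Hypothetical helpers, not part of Spark.
object UuidV7Time {
  // Recovers the creation time from the top 48 bits; other UUID versions yield None.
  def creationTime(id: UUID): Option[Instant] =
    if (id.version() == 7) Some(Instant.ofEpochMilli(id.getMostSignificantBits >>> 16))
    else None

  // The timestamp renders as the first 12 hex digits of the canonical string
  // ("xxxxxxxx-xxxx"), so it can serve as a prefix when matching names by time.
  def timestampPrefix(epochMilli: Long): String = {
    val hex = f"${epochMilli & 0xFFFFFFFFFFFFL}%012x"
    s"${hex.substring(0, 8)}-${hex.substring(8)}"
  }
}
```

Truncating the prefix to fewer hex digits matches a coarser time window, since each dropped digit widens the window by a factor of 16.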

heyihong · 2025-12-29T14:40:39Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

+
+  // Tracks how many times this QueryExecution has been executed.
+  // Used by SQLExecution to determine whether to use the existing queryId or generate a new one.
+  val executionCount = new AtomicInteger(0)


Is there a need to expose executionCount? Should we make it private[sql]? Also, it seems that an AtomicBoolean is sufficient to check whether this is the first execution. Theoretically, executionCount can go back to 0 after many executions.
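A minimal sketch of the AtomicBoolean alternative, assuming the intended behavior is "the first execution reuses the existing queryId, later executions get a fresh one". The class `QueryIdHolder` and its parameters are hypothetical stand-ins, not the PR's QueryExecution or SQLExecution code.

```scala
import java.util.UUID
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch: nextId stands in for whatever generator produces new IDs.
final class QueryIdHolder(initialId: UUID, nextId: () => UUID) {
  // Flips from false to true exactly once, even under concurrent callers.
  private val firstExecutionClaimed = new AtomicBoolean(false)

  def idForExecution(): UUID =
    if (firstExecutionClaimed.compareAndSet(false, true)) initialId // first caller only
    else nextId()                                                    // every later caller
}
```

Unlike an incrementing counter, the flag has only two states, so it cannot wrap around and make a later execution look like the first one.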

sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala

github-actions bot added SQL STRUCTURED STREAMING WEB UI CONNECT labels Dec 27, 2025

asl3 added 2 commits December 26, 2025 17:27

Add query id

bdb90f9

uuidv7 generator

a075cfa

asl3 force-pushed the sparkqueryid branch from 69d3dcd to a075cfa Compare December 27, 2025 01:28

asl3 changed the title ~~[SPARK-54854][SQL] Add queryId (UUIDv7) to SQL Execution Events~~ [SPARK-54854][SQL] Add a UUIDv7 queryId to SQL Execution Events Dec 27, 2025

asl3 changed the title ~~[SPARK-54854][SQL] Add a UUIDv7 queryId to SQL Execution Events~~ [SPARK-54854][SQL] Add a UUIDv7 queryId to SQLExecution Events Dec 27, 2025

cloud-fan reviewed Dec 27, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala (outdated, resolved)

cloud-fan reviewed Dec 27, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala (outdated, resolved)

yaooqinn requested a review from Copilot December 28, 2025 16:19

Copilot started reviewing on behalf of yaooqinn December 28, 2025 16:19

Copilot AI reviewed Dec 28, 2025

queryid none for legacy spark versions

3d4fdaf

github-actions bot removed the CONNECT label Dec 28, 2025

pan3793 reviewed Dec 29, 2025

heyihong reviewed Dec 29, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala (resolved)

4 participants
