Skip to content

Generate computed aliases for columns without explicit aliases #41

@fupelaqu

Description

@fupelaqu

Description

When no alias is specified for a column in a SQL query, the column name in the result set is the raw expression (e.g., COUNT(*), SUM(quantity), unit_price * quantity). This can cause issues with downstream tools that expect simple, unique column names.

Example

SELECT COUNT(*), SUM(quantity), unit_price * quantity FROM ecommerce

Current result columns: COUNT(*), SUM(quantity), unit_price * quantity

Expected Behaviour

When no alias is provided, generate a positional alias based on the column's position in the result set:

__c1, __c2, __c3, ...

This ensures:

  • Column names are always valid identifiers
  • Column names are unique within the result set
  • Downstream tools (Superset, DuckDB, JDBC clients) can reference columns unambiguously

Notes

  • This should only apply when the user has not specified an explicit AS alias
  • The naming convention __c<N> (double underscore prefix) avoids collisions with user-defined column names
  • Named columns (e.g., SELECT name, age FROM ...) should retain their original names

Resolution

Added fieldsWithComputedAliases as a separate lazy val in Select that generates positional aliases __c1, __c2, ... for expression fields (aggregates, arithmetic, transforms, window functions) without explicit AS aliases. The original fields property is unchanged, preserving SQL round-trip safety.

Detection rule

A field gets a computed alias when fieldAlias.isEmpty && identifier.functions.nonEmpty.

Strategy: separate property, not mutation

fieldsWithComputedAliases is used only where output names matter (aggregation naming, script fields, output column names). Select.sql uses original fields — no AS __cN injected into SQL representation.

Additional fix: dotted field names in GROUP BY

Removed the historical replace(".", "_") in Bucket.path — ES allows dots in aggregation names. This ensures dotted fields like profile.city keep their original name as the bucket key, matching the output column name from extractOutputFieldNames.

Modified files

File Change
sql/.../query/Select.scala Added fieldsWithComputedAliases, updated fieldAliases to use it
sql/.../query/package.scala Use fieldsWithComputedAliases in aggregates, scriptFields, windowFields, nestedHitsMappings
sql/.../query/GroupBy.scala Removed replace(".", "_") from Bucket.path
core/.../client/SearchApi.scala Use fieldsWithComputedAliases in extractOutputFieldNames

Tests

  • Parser unit tests: ParserSpec — 4 new tests covering computed aliases for aggregates, mixed named/unnamed fields, explicit alias preservation, arithmetic expressions
  • Integration test: ReplGatewayIntegrationSpec — end-to-end test with profile.city, COUNT(*), AVG(age) without aliases, verifying __c2 and __c3 column names

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions