Skip to content

Support double-quoted identifiers in SQL parser #48

@fupelaqu

Description

@fupelaqu

Description

The SQL parser does not support double-quoted identifiers ("identifier"). This causes failures when SQL clients (Superset, DBeaver, etc.) generate standard SQL that quotes column aliases or table/column names.

Root Cause

The parser expects bare identifiers in ORDER BY, AS, and other clauses. When it encounters "Revenue" instead of Revenue, it fails with:

string matching regex '(?i)(FROM)\b' expected but 'A' found

Failing Queries (generated by Apache Superset)

-- Chart: Revenue by Country
SELECT country AS country, sum(total_price) AS "Revenue"
FROM ecommerce GROUP BY country ORDER BY "Revenue" DESC LIMIT 10

-- Chart: Orders by Category
SELECT category AS category, COUNT(*) AS "Orders"
FROM ecommerce GROUP BY category ORDER BY "Orders" DESC LIMIT 10

-- Chart: Avg Order Value by Payment Method
SELECT payment_method AS payment_method, AVG(total_price) AS "Avg Order Value"
FROM ecommerce GROUP BY payment_method ORDER BY "Avg Order Value" DESC LIMIT 10

-- Chart: Top Customers
SELECT customer_name AS customer_name, country AS country,
       sum(total_price) AS "Total Spend", COUNT(*) AS "Orders"
FROM ecommerce GROUP BY customer_name, country
ORDER BY "Total Spend" DESC LIMIT 10

All fail because:

  1. AS "Revenue" — quoted alias in SELECT
  2. ORDER BY "Revenue" — quoted identifier reference in ORDER BY

Expected Behaviour

Double-quoted identifiers should be treated as regular identifiers with the quotes stripped:

SELECT country, SUM(total_price) AS "Revenue"
FROM ecommerce GROUP BY country ORDER BY "Revenue" DESC

Should parse identically to:

SELECT country, SUM(total_price) AS Revenue
FROM ecommerce GROUP BY country ORDER BY Revenue DESC

For identifiers containing spaces or reserved words, the quotes are mandatory:

SELECT SUM(total_price) AS "Total Spend" FROM ecommerce

SQL Standard Reference

ANSI SQL uses double quotes for delimited identifiers (SQL-92, section 5.2). This is the standard quoting mechanism used by:

  • PostgreSQL
  • Apache Superset (SQLAlchemy default)
  • DBeaver
  • Most JDBC/ODBC tools

Scope

The parser should accept double-quoted identifiers in at least:

Context Example
Column alias (AS) SELECT col AS "My Alias"
ORDER BY ORDER BY "My Alias" DESC
GROUP BY GROUP BY "col name"
Column reference SELECT "col" FROM t
WHERE clause WHERE "col" > 10
HAVING clause HAVING "Total" > 100

Impact

  • Apache Superset charts fail entirely — all auto-generated SQL uses quoted identifiers
  • DBeaver and other JDBC clients may quote identifiers depending on settings
  • Blocks the Superset demo dashboard (all 6 charts return internal errors)

Resolution

Added two new parser combinators — quotedIdentifier and quotedAlias — that match "..." content and produce GenericIdentifier / Alias with the unquoted name. These are inserted as the first alternative in all identifier-expecting parser chains, before identifierWithArithmeticExpression (which would otherwise consume "..." via the literalStringValuefunctionAsIdentifier path).

Strategy: targeted insertion, not global replacement

The literal parser in type/package.scala was not modified — double-quoted strings remain StringValue in value positions (WHERE RHS, DDL OPTIONS). This preserves DDL round-trip safety.

Modified files

File Change
Parser.scala Added quotedIdentifier, quotedAlias; inserted quotedIdentifier in valueExpr
SelectParser.scala quotedIdentifier first in field chain; quotedAlias | alias for alias parsing
OrderByParser.scala quotedIdentifier first in fieldWithFunction
GroupByParser.scala quotedIdentifier first in bucketWithFunction
WhereParser.scala quotedIdentifier first in any_identifier; updated isNull/isNotNull

Tests

  • Parser unit tests: ParserSpec — 6 new tests covering Superset queries, quoted aliases with spaces, quoted SELECT/WHERE/GROUP BY, reserved words as identifiers
  • Integration test: ReplGatewayIntegrationSpec — end-to-end test with AS "City", AS "Total", AS "Avg Age" and ORDER BY "Total" DESC against real Elasticsearch

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions