-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Description
The SQL parser does not support double-quoted identifiers ("identifier"). This causes failures when SQL clients (Superset, DBeaver, etc.) generate standard SQL that quotes column aliases or table/column names.
Root Cause
The parser expects bare identifiers in ORDER BY, AS, and other clauses. When it encounters "Revenue" instead of Revenue, it fails with:
string matching regex '(?i)(FROM)\b' expected but 'A' found
Failing Queries (generated by Apache Superset)
-- Chart: Revenue by Country
SELECT country AS country, sum(total_price) AS "Revenue"
FROM ecommerce GROUP BY country ORDER BY "Revenue" DESC LIMIT 10
-- Chart: Orders by Category
SELECT category AS category, COUNT(*) AS "Orders"
FROM ecommerce GROUP BY category ORDER BY "Orders" DESC LIMIT 10
-- Chart: Avg Order Value by Payment Method
SELECT payment_method AS payment_method, AVG(total_price) AS "Avg Order Value"
FROM ecommerce GROUP BY payment_method ORDER BY "Avg Order Value" DESC LIMIT 10
-- Chart: Top Customers
SELECT customer_name AS customer_name, country AS country,
sum(total_price) AS "Total Spend", COUNT(*) AS "Orders"
FROM ecommerce GROUP BY customer_name, country
ORDER BY "Total Spend" DESC LIMIT 10All fail because:
AS "Revenue"— quoted alias in SELECTORDER BY "Revenue"— quoted identifier reference in ORDER BY
Expected Behaviour
Double-quoted identifiers should be treated as regular identifiers with the quotes stripped:
SELECT country, SUM(total_price) AS "Revenue"
FROM ecommerce GROUP BY country ORDER BY "Revenue" DESCShould parse identically to:
SELECT country, SUM(total_price) AS Revenue
FROM ecommerce GROUP BY country ORDER BY Revenue DESCFor identifiers containing spaces or reserved words, the quotes are mandatory:
SELECT SUM(total_price) AS "Total Spend" FROM ecommerceSQL Standard Reference
ANSI SQL uses double quotes for delimited identifiers (SQL-92, section 5.2). This is the standard quoting mechanism used by:
- PostgreSQL
- Apache Superset (SQLAlchemy default)
- DBeaver
- Most JDBC/ODBC tools
Scope
The parser should accept double-quoted identifiers in at least:
| Context | Example |
|---|---|
Column alias (AS) |
SELECT col AS "My Alias" |
ORDER BY |
ORDER BY "My Alias" DESC |
GROUP BY |
GROUP BY "col name" |
| Column reference | SELECT "col" FROM t |
WHERE clause |
WHERE "col" > 10 |
HAVING clause |
HAVING "Total" > 100 |
Impact
- Apache Superset charts fail entirely — all auto-generated SQL uses quoted identifiers
- DBeaver and other JDBC clients may quote identifiers depending on settings
- Blocks the Superset demo dashboard (all 6 charts return internal errors)
Resolution
Added two new parser combinators — quotedIdentifier and quotedAlias — that match "..." content and produce GenericIdentifier / Alias with the unquoted name. These are inserted as the first alternative in all identifier-expecting parser chains, before identifierWithArithmeticExpression (which would otherwise consume "..." via the literal → StringValue → functionAsIdentifier path).
Strategy: targeted insertion, not global replacement
The literal parser in type/package.scala was not modified — double-quoted strings remain StringValue in value positions (WHERE RHS, DDL OPTIONS). This preserves DDL round-trip safety.
Modified files
| File | Change |
|---|---|
Parser.scala |
Added quotedIdentifier, quotedAlias; inserted quotedIdentifier in valueExpr |
SelectParser.scala |
quotedIdentifier first in field chain; quotedAlias | alias for alias parsing |
OrderByParser.scala |
quotedIdentifier first in fieldWithFunction |
GroupByParser.scala |
quotedIdentifier first in bucketWithFunction |
WhereParser.scala |
quotedIdentifier first in any_identifier; updated isNull/isNotNull |
Tests
- Parser unit tests:
ParserSpec— 6 new tests covering Superset queries, quoted aliases with spaces, quoted SELECT/WHERE/GROUP BY, reserved words as identifiers - Integration test:
ReplGatewayIntegrationSpec— end-to-end test withAS "City",AS "Total",AS "Avg Age"andORDER BY "Total" DESCagainst real Elasticsearch