Transforming Data using SQL
Dozer allows users to perform SQL queries and operations on streaming data in real-time. It supports operations such as filtering, aggregating, and joining with other data streams. Upon startup, Dozer converts the provided query into an execution plan for real-time processing, which is then used to process incoming data, produce results, and send them to a caching layer. Dozer's SQL adheres to standard ANSI SQL as much as possible, however some features of the language are still under development.
Supported Constructs
DozerSQL is a specialized component within the Dozer platform that is designed specifically for preparing streaming data
for storage in a cache. This involves extracting, transforming and filtering the data using SELECT statements,
which are the only type of SQL statement supported by DozerSQL. The output of DozerSQL is a stream of data that
is ready to be inserted into a cache, in order to increase performance and scalability of the data pipeline.
DozerSQL works in real-time and only SELECT statement are supported, with the goal of providing optimized
data preparation step prior to be inserted into cache.
SELECT
DozerSQL supports a quasi ANSI compatible version of SELECT:
SELECT [connection.][table1.]field [AS alias] [, [connection.][table2.]field [AS alias]]
INTO store_name,
FROM [table or SELECT statement | CTE] [alias] [[INNER | LEFT | RIGHT] JOIN [table2] ON [fields equality condition]]
WHERE [filter condition]
[UNION | UNION ALL]
...
Breakdown:
SELECT Clause:
connection.: Optionally specify the connection name, ensuring you can pull data from tables across various connections.table1.: Optionally state the name of the table from which the field is being selected.field: The specific column or field you're targeting.AS alias: This gives you the option to rename the resulting field.
INTO Clause:
store_name: The destination where theSELECTquery's results will be stored.
FROM Clause:
table: This is the primary source of data.SELECT statement: A subquery that functions as a derived table, serving as an alternative to a direct table selection.CTE (Common Table Expression): This creates a temporary result set that can be referenced within the main SQL query.alias: Optionally assign an alias to the table, subquery, or CTE, making it easier to refer to in other parts of the query.
JOIN Clause:
INNER | LEFT | RIGHT: These dictate the kind of join:INNER JOIN: Fetches rows that match the join condition from both tables.LEFT JOIN: Retrieves all rows from the left table and those from the right table that match the join condition.RIGHT JOIN: Prioritizes all rows from the right table, along with matching rows from the left table.
ON: Establishes the joining condition.fields equality condition: This condition is for the equi-join between fields from the primary and secondary tables. The join condition must be an equi-join, and only theANDoperator can be used to combine conditions.
WHERE Clause:
filter condition: This sets the criteria to filter the rows resulting from theSELECTstatement.
UNION | UNION ALL Clause:
UNION: Combines the result sets of two or moreSELECTstatements, but does not include any duplicate rows.UNION ALL: Also combines the result sets of two or moreSELECTstatements, but it includes duplicates.