Skip to content

window functions: a prototype #18097

@akuzm

Description

@akuzm

Some thoughts for internal discussion. The main task is here: #1469

What we need for a meaningful prototype:

  • single-threaded, partitioning via sort
  • the only supported frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • only plain aggregate functions
  • standard grammar (at least I want to be able to write count() over (partition by x order by y)).
  • works properly across blocks
  • can be used as an argument to other SELECT expressions (this is our ClickHouse-specific non-standard part, normally SQL doesn't let you reference the SELECT list). Will require some moderate complications in ExpressionActions (a separate type of action, can't be calculated by ExpressionTransform so we have to split the pipeline and insert WindowTransform there). Can be postponed. An example: SUM(col) / SUM(SUM(col)) OVER (PARTITION BY col2) aggregate function over WINDOW is not supported. #19857

Rough edges to smooth out after that:

  • proper grammar for declaring windows

  • implement RANGE frame and make it default. It is default in other DBs, so people get confused when they try CH and don't get the expected results because we only support ROWS. The default range should suffice for now. This is going to be complicated, because a group can span an arbitrary number of past and future blocks, and we can't output the group until it ends. This requires adding a separate kind of processor + some convenient way to address an arbitrarily large sequence of rows spanning many blocks (tuplestore in Postgres terms). We're going to need it all anyway, to support a moving frame start, and a frame end that is farther than current row.

  • consider the source order not to perform redundant sorting -- this one might require massive refactoring window functions: use storage sorting key #19289

  • work efficiently w/multiple windows (choose an order of windows that allows to sort less) Try to find sort order that satisfies multiple windows #19779

Advanced features:

  • multithreaded calculation. This can be naturally implemented over hash-based partitioning.
  • frame start that moves forward
    • subtraction from aggregate function state (the inverse of IAggregateFunction::add)
  • frame end that is further than current row
    • this one is really bad, needs a separate "processor" and some tricky buffering, because each input block does not necessarily produce an output block
  • by combining the above, we get sliding window (ROWS BETWEEN <X> PRECEDING AND <X> FOLLOWING)
  • RANGE OFFSET frame, for sliding window based on time
  • window functions that are not aggregate functions, e.g. rank()
    • some of them might require full read-behind + read-ahead? -- scary

References:

https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW
https://www.postgresql.org/docs/devel/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
https://www.postgresql.org/docs/devel/functions-window.html
https://www.postgresql.org/docs/devel/tutorial-window.html

https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html

Some drafts:

#18222
#18455
#19022 WINDOW clause
#19299 GROUPS frame

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions