The optimizer was not generating correct distributed paths with Gather
Merge nodes, because those nodes always looked as if the data was not
distributed at all. There were two bugs causing this:
1) Gather Merge did not copy the distribution from its subpath, leaving
it NULL (as if running on the coordinator), so no Remote Subquery
appeared to be needed.
2) create_grouping_paths() did not check whether a Remote Subquery is
needed on top of the Gather Merge anyway.
After fixing these two issues, we now generate correct plans (at least
judging by the select_parallel regression suite).
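To illustrate bug 1 in isolation: the sketch below uses simplified stand-ins for the planner's Path and Distribution structs (the real ones live in the planner headers and carry far more fields) and a hypothetical copy_distribution() in place of copyObject(). It only demonstrates the invariant the fix restores: a Gather Merge path must inherit its subpath's distribution rather than leaving it NULL.

```c
/*
 * Standalone sketch of the Gather Merge fix (bug 1). All types and
 * helpers here are simplified stand-ins, not the real planner API.
 */
#include <stdlib.h>
#include <string.h>

typedef struct Distribution
{
	char		distributionType;	/* e.g. 'H' for hash */
	int			nExprs;				/* number of distribution expressions */
} Distribution;

typedef struct Path
{
	Distribution *distribution;		/* NULL means "runs on coordinator" */
} Path;

/* stand-in for the planner's copyObject() deep copy */
static Distribution *
copy_distribution(const Distribution *src)
{
	Distribution *dst;

	if (src == NULL)
		return NULL;
	dst = malloc(sizeof(Distribution));
	memcpy(dst, src, sizeof(Distribution));
	return dst;
}

/* buggy version: distribution stays NULL, as if on the coordinator */
static Path
make_gather_merge_buggy(const Path *subpath)
{
	Path		pathnode = {0};

	(void) subpath;				/* subpath's distribution is ignored */
	return pathnode;
}

/* fixed version: distribution is the same as in the subpath */
static Path
make_gather_merge_fixed(const Path *subpath)
{
	Path		pathnode = {0};

	pathnode.distribution = copy_distribution(subpath->distribution);
	return pathnode;
}
```

With the buggy constructor the resulting path looks coordinator-local, so the planner never considers wrapping it in a Remote Subquery; the fixed constructor preserves the datanode distribution so that check (bug 2's fix) can fire.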
NULL,
&total_groups);
+ /*
+ * If the grouping can't be fully pushed down, we'll push down the
+ * first phase of the aggregate, and redistribute only the partial
+ * results.
+ */
+ if (!can_push_down_grouping(root, parse, gmpath))
+ gmpath = create_remotesubplan_path(root, gmpath, NULL);
+
if (parse->hasAggs)
add_path(grouped_rel, (Path *)
create_agg_path(root,
required_outer);
pathnode->path.parallel_aware = false;
+ /* distribution is the same as in the subpath */
+ pathnode->path.distribution = (Distribution *) copyObject(subpath->distribution);
+
pathnode->subpath = subpath;
pathnode->num_workers = subpath->parallel_workers;
pathnode->path.pathkeys = pathkeys;
set enable_hashagg to off;
explain (costs off)
select string4, count((unique2)) from tenk1 group by string4 order by string4;
- QUERY PLAN
-----------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------
Finalize GroupAggregate
Group Key: string4
- -> Gather Merge
- Workers Planned: 4
- -> Partial GroupAggregate
- Group Key: string4
- -> Sort
- Sort Key: string4
- -> Parallel Seq Scan on tenk1
-(9 rows)
+ -> Remote Subquery Scan on all (datanode_1,datanode_2)
+ -> Gather Merge
+ Workers Planned: 4
+ -> Partial GroupAggregate
+ Group Key: string4
+ -> Sort
+ Sort Key: string4
+ -> Parallel Seq Scan on tenk1
+(10 rows)
select string4, count((unique2)) from tenk1 group by string4 order by string4;
string4 | count