Spatial Join Query Optimization on Large Data Set

2018-07-03 17:16:11

I have a use case where two sets of data are joined with an expensive spatial predicate. To parallelize the query I partitioned the spatial universe into tiles (in the order of thousands) such that only records belong to the same tile are need to be tested with the spatial predicate. The query is as follows:

SELECT ST_Area(ST_Intersection(A.polygon, B.polygon))
    / ST_Area(ST_Union( A.polygon, B.polygon))  AS a_ratio
FROM spatial_table_a A
JOIN spatial_table_b B ON ST_Intersects(A.polygon, B.polygon)
WHERE A.tilename = B.tilename;

Ideally the query plan should hash the records according to the tilename , then perform the spatial predicate checking ST_Intersects using either an index scan join or a nested loop join.

However, what I'm getting now is a sub-optimal plan which prematurely performs the spatial join. The plan is shown below:

->  Hash Join  (cost=759468.44..377874772.26 rows=2610 width=18)
         Hash Cond: "outer"."?column4?" = "inner"."?column4?"
             Join Filter: a.polygon && b.polygon AND _st_intersects(a.polygon, b.polygon)
             ->  Seq Scan on spatial_table_b b  (cost=0.00..409556.95 rows=288816 width=1034)
             ->  Hash  (cost=375827.86..375827.86 rows=283522 width=946)
                   ->  Seq Scan on spatial_table_a a  (cost=0.00..375827.86 rows=283522 width=946)

So, my question is: How can force the query optimizer to generate a better plan (which basically changes the join order )?

How about:

SELECT ST_Area(ST_Intersection(a, b))
    / ST_Area(ST_Union( a, b))  AS a_ratio
FROM
(
 SELECT a.polygon AS a, b.polygon AS b
 FROM  spatial_table_a A
 JOIN spatial_table_b B 
   ON A.tilename = B.tilename
 OFFSET 0
) AS q
WHERE ST_Intersects(a, b);

This should force the query planner to first join the two tables on tilename and only then check if the two polygons intersect. That will give you a diffrent query plan but I'm not sure if it will be the query plan you're looking for.

链接地址: http://www.djcxy.com/p/93936.html

上一篇: SQL Server 2012 CTE查找分层数据的根目录或顶级父目录

下一篇: 大数据集空间联合查询优化