Find points within polygon with POSTGIS really slow

by Pitrako Junior   Last Updated October 10, 2019 17:22 PM

From some millions of points I am trying to select those points that are within a polygon and save them into a new table. Both datasets have a spatial index, and both use projected coordinates.

The query I am trying with is:

SELECT t1.point_id, t1.geom
FROM points t1
INNER JOIN boundaries t2
ON ST_WITHIN(t1.geom, t2.geom)
WHERE t2.country LIKE 'England'

The query has been running for hours.

On the same machine I have done exactly the same thing using ArcMap in 25 minutes. I really don't understand this. I was under the impression that doing geoprocessing tasks directly in an RDBMS (Postgres or SQL Server) was the fastest option (provided the query is the right one).

Is there anything wrong with the query? Is there any way to optimise it? Am I missing something here?

PS: I tried ST_Intersects instead of ST_WITHIN, with the same result.

Answers 2

Looks like your query is a bit overcomplicated. To find all the points that are in "England" you can simply run the following query.

SELECT * FROM points
WHERE geom && (SELECT geom FROM boundaries WHERE country = 'England');

Then, if the indexes are created properly, you should get the following query plan: [image: query plan]
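To see the plan on your own data, you can prefix the query with EXPLAIN. This is a minimal sketch, assuming the points and boundaries tables and the country column used in this answer. Plain EXPLAIN only plans the query; EXPLAIN ANALYZE also executes it, which can take as long as the query itself.

```sql
-- Show the planned execution without running the query.
-- Look for an "Index Scan" or "Bitmap Index Scan" on the GiST index of points.geom;
-- a "Seq Scan" on points suggests the spatial index is not being used.
EXPLAIN
SELECT * FROM points
WHERE geom && (SELECT geom FROM boundaries WHERE country = 'England');
```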

Now the problems with your current solution:

  1. Use "LIKE" only when you want to check whether a column matches a pattern. "LIKE" is more expensive than a plain equality comparison.
  2. Don't use "ST_Within" or "ST_Contains" if you are only looking for an intersection. Distance searches are the job of "ST_DWithin", not "ST_Within"; "ST_Within" and "ST_Contains" test containment, which suits more complex geometry checks than a simple point intersection. Both can be more expensive than "ST_Intersects".
  3. For intersections use "ST_Intersects" or "&&". Both make good use of the spatial index and should give you fast results. Note that "&&" compares bounding boxes only, so it can also return points that lie outside the polygon itself.
  4. Don't use JOINs unless you need information from the other table. Joins are great, but use them wisely. Sometimes a simple "WHERE" condition is all you need.
  5. You are missing a b-tree index on boundaries. Add it using the following query:

    CREATE INDEX ON boundaries (country);
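Putting the points above together, an exact point-in-polygon query could look like the sketch below. It assumes the table and column names used in this thread (points.point_id, points.geom, boundaries.country, boundaries.geom) and that boundaries holds exactly one row for 'England'; with more than one matching row the scalar subquery raises an error.

```sql
-- Check which indexes exist; each geom column should have a "USING gist" entry.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('points', 'boundaries');

-- Exact test: ST_Intersects uses the spatial index for a bounding-box
-- prefilter and then rechecks the real geometry of each candidate.
-- The scalar subquery assumes exactly one boundaries row for 'England'.
SELECT p.point_id, p.geom
FROM points p
WHERE ST_Intersects(
        p.geom,
        (SELECT geom FROM boundaries WHERE country = 'England')
      );
```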

I hope it helps you a bit to sort your issue out.

October 11, 2019 10:45 AM

You should definitely try to subdivide your boundary polygons! The spatial index only selects candidate geometries based on their bounding boxes. The real geometry then has to be rechecked for each candidate to produce the actual results (as you can see in the execution plan). So if your boundary polygon is complex, that recheck is very expensive! You can avoid this by subdividing large polygons into smaller pieces and then performing the spatial operations on those smaller polygons.
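To judge whether subdividing is likely to pay off, you can first check how complex the boundary polygons are. This is a small sketch, assuming the boundaries table has the country and geom columns used in this thread:

```sql
-- Vertex count per boundary; polygons with very many vertices
-- make the per-candidate recheck expensive.
SELECT country, ST_NPoints(geom) AS vertex_count
FROM boundaries
ORDER BY vertex_count DESC;
```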

First create a new table with subdivided polygons:

create table boundaries_subdiv as select country, st_subdivide(geom, 50) geom from boundaries;

The second parameter of st_subdivide defines the maximum number of vertices that the resulting polygons can have. You can try other numbers, but 50 to 100 seems to be a good choice (in my experience).
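Once boundaries_subdiv exists, a quick sanity check is to count the pieces per country and look at the largest vertex count among them. This is a sketch using the table and column names from the statement above; the largest vertex count should sit at or near the limit you passed to st_subdivide.

```sql
-- Number of pieces and the largest vertex count per country
-- after subdividing.
SELECT country,
       count(*)              AS pieces,
       max(ST_NPoints(geom)) AS max_vertices
FROM boundaries_subdiv
GROUP BY country
ORDER BY pieces DESC;
```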

Then create indexes on that table:

create index boundaries_subdiv_geom on boundaries_subdiv using gist (geom);
create index boundaries_subdiv_country on boundaries_subdiv (country);

Analyze table:

analyze boundaries_subdiv;

Then perform your query, but with subdivided boundaries:

SELECT t1.point_id, t1.geom
FROM points t1
INNER JOIN boundaries_subdiv t2
ON ST_WITHIN(t1.geom, t2.geom)

Or a more refined one with st_intersects:

create table result AS
  select t1.point_id, t1.geom
    from points t1
    join boundaries_subdiv t2 ON st_intersects(t1.geom, t2.geom)
   where t2.country = 'England';
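One thing to watch for with subdivided polygons: a point lying exactly on an edge shared by two pieces intersects both, so the join above may return it twice. The sketch below returns each point at most once by using EXISTS instead of a join. Here result_dedup is a hypothetical table name, and the filter assumes the country column carried over into boundaries_subdiv.

```sql
-- Keep a point if it intersects at least one piece of the subdivided
-- boundary; EXISTS stops at the first match, so no duplicates appear.
CREATE TABLE result_dedup AS
SELECT p.point_id, p.geom
FROM points p
WHERE EXISTS (
        SELECT 1
        FROM boundaries_subdiv b
        WHERE b.country = 'England'
          AND ST_Intersects(p.geom, b.geom)
      );
```

Whether the planner handles this as efficiently as the plain join depends on your data and statistics, so it is worth comparing both with EXPLAIN.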

That should help a lot! Further reading here.

October 14, 2019 06:27 AM
