From some millions of points I am trying to select those points that are within a polygon and save them into a new table. Both datasets with spatial index and both on projected coordinates.
The query I am trying with is:
CREATE TABLE result AS SELECT t1.point_id, t1.geom FROM points t1 INNER JOIN boundaries t2 ON ST_WITHIN(t1.geom, t2.geom) WHERE t2.country LIKE "England"
The query has been running for hours.
On the same machine I have done exactly the same thing using ArcMap in 25 minutes. I really don't understand this. I was under the impression that doing geoprocessing tasks directly on a RDBS (Postgres or SQL Server) was the fastest option (provided the query is the right one).
Is there anything wrong with the query? Is there any way to optimise it? Am I missing something here?
PS: I tried with ST_Intersect instead of ST_WITHIN with the same result.
Looks like your query is a bit overcomplicated. To find all the points that are in "England you can simple run following query.
SELECT * FROM points WHERE geom && (SELECT geom FROM boundaries WHERE country = "England");
Now the problems with your current solution:
You are missing a b-tree index on boundaries. Add it using the following query:
CREATE INDEX ON boundaries (country);
I hope it helps you a bit to sort your issue out.
You should definitely try to subdivide your boundary polygons! The spatial index serves only to select candidate geometries based on their bounding boxes. Then real geometry has to be rechecked for each candidate to produce real results (as you can see in the execution plan). So... if your boundary polygon is complex, that recheck is very expensive! That can be avoided by subdividing large polygons to smaller pieces and then perform spatial operations on that smaller polygons.
Firs create a new table with subdivided polygons:
create table boundaries_subdiv as select country, st_subdivide(geom, 50) geom from boundaries;
The second parameter on st_subdivide defines the maximum number of vertices that resulting polygons can have. You can try other numbers, but 50 to 100 seems to be a good option (by my observations...)
Then create indexes on that table
create index boundaries_subdiv_geom on boundaries_subdiv using gist (geom); create index boundaries_subdiv_country on boundaries_subdiv (country);
Then perform your query, but with subdivided boundaries:
CREATE TABLE result AS SELECT t1.point_id, t1.geom FROM points t1 INNER JOIN boundaries_subdiv t2 ON ST_WITHIN(t1.geom, t2.geom) WHERE t2.country LIKE "ENGLAND"
Or more refined one with st_intersects:
create table result AS select t1.point_id, t1.geom from points t1 join boundaries_subdiv t2 ON st_intersects(t1.geom, t2.geom) where t2.country = "ENGLAND"
That should help a lot! Further reading here.