Regional osm2pgsql updated minutely as an alternative to Overpass.
Short version:
Recently, I wrote a script that can help with OSM DB replication using osm2pgsql by filtering .osc files with a specific .poly. It isn’t limited to a specific continent or country – it can be used with any custom PBF (e.g. a specific city), so it doesn’t need the .fr replication servers; it uses planet.osm.org directly (which could also be changed).
It’s inspired by the trim_osc.py script by Zverik, but rewritten from scratch with tests, because that script unfortunately didn’t work for me (maybe I did something wrong).
It’s not a very typical thing, so I’m not sure if it will be useful for anyone, but if someone would like to try self-hosting a regional OSM DB with replication, I recommend at least checking it out.
More details in the repo: osm-replication-osc-poly-filter
Longer version:
A few months ago, I was looking for an alternative to public Overpass instances (due to overloaded servers) for my projects.
I read SomeoneElse’s diary about self-hosting an Overpass instance, but it seemed over-complicated to me. I have also read many times that there are random reliability issues and that it’s time-consuming to maintain (I’m not sure how true that is).
I decided that I wanted to switch to something else for my projects: something more low-level, with greater control over the data, and preferably self-hosted to avoid such problems. Instead of OverpassQL, I switched to SQL with a PostGIS DB. There aren’t a lot of choices here, so I chose osm2pgsql.
Osm2pgsql is quite an advanced tool, and I really recommend at least reading about it. It may take some time to learn, but it’s worth it for features like --output=flex, which allows defining custom table schemas in Lua scripts, with tag/geometry columns, that work both when importing and when appending (replicating) data. It can be adjusted and optimized per project.
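For illustration, a minimal flex style file could look like the sketch below. This is not the schema from my repo – the table and column names are made up, and `insert()`/`as_point()` require a reasonably recent osm2pgsql version:

```lua
-- Hypothetical osm2pgsql flex style: one table for named amenity nodes.
local points = osm2pgsql.define_node_table('amenity_points', {
    { column = 'name',    type = 'text' },
    { column = 'amenity', type = 'text' },
    { column = 'tags',    type = 'jsonb' },  -- all remaining tags
    { column = 'geom',    type = 'point' },  -- geometry column
})

-- Called once per node, both on import and on append (replication).
function osm2pgsql.process_node(object)
    if object.tags.amenity then
        points:insert({
            name    = object.tags.name,
            amenity = object.tags.amenity,
            tags    = object.tags,
            geom    = object:as_point(),
        })
    end
end
```

The same style file is passed to both the initial `--create` import and later `--append` runs, which is what makes the schema survive replication.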
Unfortunately, there are some difficulties, like installing a specific osm2pgsql version (there are no binaries in the releases, no custom repos, and older OSes don’t get the latest version; Debian backports are also not a good solution for me – I would expect to be able to easily pin the version I want to use). Replication also has some limitations. For example, if we load data from a Geofabrik extract, it will replicate every ~24 hours, because Geofabrik doesn’t support minutely replication. We could always switch to the .fr planet servers, but they don’t support atypical PBFs, and their poly files are a little bit different, so I decided not to use them either.
I found this thread, which contains a solution to the above problem: import a custom PBF, then apply minutely replication. It relies on the trim_osc.py script by Zverik, but unfortunately it didn’t work for me. Maybe I did something wrong, but very quickly I saw data from another continent in my DB.
Finally, I decided to write my own script from scratch, which filters .osc files by a specific poly. There are really many edge cases to handle, and despite the rewrite, some limitations still exist due to complexity or to data missing from the .osc files (e.g. node versions referenced by nd or member tags).
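The core of such a filter is a point-in-polygon test of each node against the rings of an Osmosis .poly file. Here is a minimal sketch in Python – this is not the actual code from the repo (the function names are mine), and a real filter additionally has to handle ways, relations, and the edge cases mentioned above:

```python
def parse_poly(text):
    """Parse Osmosis .poly text into [(ring, is_hole), ...];
    sections prefixed with '!' are exclusion (hole) rings."""
    rings = []
    lines = iter(text.splitlines())
    next(lines)  # first line: area name
    for line in lines:
        line = line.strip()
        if line == 'END' or not line:
            continue  # end-of-file marker or blank line
        hole = line.startswith('!')  # section header
        ring = []
        for coord in lines:  # read coordinates until section END
            coord = coord.strip()
            if coord == 'END':
                break
            lon, lat = map(float, coord.split())
            ring.append((lon, lat))
        rings.append((ring, hole))
    return rings

def point_in_ring(lon, lat, ring):
    """Ray-casting point-in-polygon test for a single ring."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

def node_in_poly(lon, lat, rings):
    """Keep a node if it is inside any outer ring and in no hole."""
    inside = any(point_in_ring(lon, lat, r) for r, hole in rings if not hole)
    excluded = any(point_in_ring(lon, lat, r) for r, hole in rings if hole)
    return inside and not excluded
```

A node from the .osc file would then be kept or dropped based on `node_in_poly(lon, lat, rings)`; the hard part is doing the same consistently for ways and relations whose member coordinates are not present in the diff.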
I also dockerized everything and pushed it to Docker Hub, so it can be easily self-hosted.
I’m not sure if anyone will find it useful, but I’m publishing it anyway.
More details in the repo: osm-replication-osc-poly-filter
PS. It’s my first diary, maybe I should write more often about my other projects as well :)
Discussion
Comment from Nakaner on 29 April 2026 at 18:18
On GitHub, you instruct users to use the Geofabrik extracts with full metadata from https://osm-internal.download.geofabrik.de/europe/poland.html. I wonder why you need changesets, user IDs and user names. I grepped through your source code and the string “changeset” only exists in your import style file for Osm2pgsql and in test data.
The “cleaned” extracts contain version numbers and timestamps. That should be enough for replication.
Changesets are a way of grouping changes for human review, not for programmes consuming the data. That’s why we at Geofabrik decided in 2018 to remove changesets, user names and user IDs from the publicly available extracts in order to protect the privacy of OSM contributors.
That’s why I suggest instructing users to download the cleaned files from the public servers, without authentication.
Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️⚧️ on 30 April 2026 at 07:37
Yes, Overpass can be… sigh… complicated.
BTW, do you know about Postpass? That gives you an osm2pgsql database as an HTTP service.
Comment from NieWnen on 30 April 2026 at 12:58
Thank you for spending your time, @Nakaner,
I am using a version with metadata to make the experience more similar to using global Overpass instances. It makes it easier to create a project that needs such data. For example, it’s common in Overpass to filter objects by a specific user or changeset (e.g. for stats or QA), which I would like to be able to do using PostGIS.
Yes, in this and the linked example, I don’t use metadata for any specific purpose. I actually don’t use any OSM data (beyond filtering/replicating it) in these repositories; they are only there to simplify the setup and manage replication for a PostGIS DB.
I don’t want to argue about privacy in this diary, but in my opinion it doesn’t change anything here, because you can already get this data from other sources. You’ve done great work at Geofabrik by preparing these extracts, allowing people to save time and resources by getting the data faster. But even if you remove the metadata, people can still obtain it, so I don’t see a reason to restrict access or to avoid referring to it in a README, as long as it’s completely public (assuming no country-specific law requires this from you).
Comment from NieWnen on 30 April 2026 at 13:07
@amapanda
Yes, I’ve seen this project. It looks interesting, but it doesn’t cover all my needs.
Comment from Nakaner on 1 May 2026 at 11:16
Yes, thank you.