Beyond Web Feature Services Part 1
Well, it seems like my previous post on WFS 3.0 struck a nerve: it’s the most shared post that I’ve written, by far. Reading it again, I think I over-corrected from my first two writing attempts, where I dug into my history with Web Feature Services (WFS), as the results were far more negative than I desired. But the resulting post, and especially the title, came across as probably a bit more enthusiastic than I intended. Jeff Yutzler had perhaps my favorite response to what I wrote.
I admit I probably have a bit more optimism for WFS than Jeff, but I share the skepticism about the core, particularly a single standard API to access features. So I thought I would explore what I mean when I talk about ‘WFS’, though many would consider the things I’m most excited about to be something different, not WFS at all. But to me they are the proper way to get at the goals of WFS, and from what I’ve seen they are in line with the direction the WFS 3.0 working group is heading. The first piece I’m excited about is granular geospatial API patterns that others can easily adopt, and the second is crawlability: exposing feature data online, but letting others index and search it. In this post I’ll explore the first one, and continue on the second in its own post.
But first let’s dive into a bit more background. To me the main goal of the Web Feature Service specification is to make geospatial data accessible. The reason I first worked on WFS was that the funder of the non-profit I was working for had a vision. He really wanted to create traffic models that would prove that New York City would be better off without cars. But we had to start one level lower, as you need the actual road and traffic data to make a traffic model. You can’t create your own model without the real data, which Google Maps or (at the time) Mapquest don’t give you. They render a pretty picture:
But you don’t have access to the data: you can’t download the typical traffic patterns at each day and time and add them to your own model. Data in the geospatial world is very similar to source code in the software world. The vast majority of people use the compiled binary of the source code (the installer, the .exe or .dmg), but if one is a software developer then one needs the actual source code to do one’s job. Similarly, the vast majority of people use the visualization of geospatial data: the map, be it online or printed. But if one is a geospatial analyst then one needs the actual data to do one’s job. The geospatial expert can take the data and do their own analysis, make their own maps, and even improve the data. This is not just for ‘open data’ or open source (though that’s what I’m most excited about). It is still far more useful if there are consistent and interoperable ways to get at the source, even if the data or code is proprietary. It would just be internal to an organization, instead of open for all.
So when I express enthusiasm for WFS, it is far more basic than even a web service that responds to queries — it is the practice of making source geospatial data available. That could be an active API, it could be HTML features that can be crawled, it could even be a GeoPackage on S3. But I’m interested in increasing the interoperability of all forms of requesting source data — the actual geometry and the full attributes to do real analysis.
Geospatial API Components
I believe one of the most important things that can come out of the WFS 3.0 specification is not a single interface that everyone exposing geospatial data on the web should implement. Instead it’s a set of granular API components that any developer who is interested in adding some geospatial functionality to their app or API can easily make use of. You want to return JSON geospatial objects? Use GeoJSON. You want a pattern to page through the JSON results? Here is a mini ‘paging GeoJSON’ spec. You want to filter results spatially? Filter by geometry in addition to bounding box? Handle requests that cross the dateline? Deal with geometries in different projections? Each should be its own small mini-specification that is easily consumable by a developer who has never touched the geospatial world. The broader geospatial world has done a much better job at this than the OGC, with nice tight specifications like GeoJSON, UTFGrids, TMS, Vector Tiles, etc.
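To make the idea of small, composable pieces a bit more concrete, here is a minimal sketch of two of them working together: a bounding box filter that copes with the dateline, and a paging wrapper that returns GeoJSON. It is not taken from the WFS 3.0 draft or any other spec. The function names, the `limit` and `offset` parameters, and the `matched` and `next` members are all assumptions made up for illustration. The only borrowed convention is the GeoJSON one that a bounding box whose west edge is greater than its east edge crosses the antimeridian.

```python
from typing import Any, Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in lon/lat


def point_in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    """Return True if the point lies inside the box, edges included."""
    west, south, east, north = bbox
    if not (south <= lat <= north):
        return False
    if west <= east:
        return west <= lon <= east
    # west > east: the box wraps across the dateline (antimeridian).
    return lon >= west or lon <= east


def page_features(
    features: List[Dict[str, Any]],
    bbox: Optional[BBox] = None,
    limit: int = 10,
    offset: int = 0,
) -> Dict[str, Any]:
    """Filter Point features by bbox and return one page as a FeatureCollection.

    'matched' and 'next' are hypothetical extra members, not part of GeoJSON.
    """
    matched = []
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"][:2]
        if bbox is None or point_in_bbox(lon, lat, bbox):
            matched.append(feature)

    page = matched[offset:offset + limit]
    collection: Dict[str, Any] = {
        "type": "FeatureCollection",
        "features": page,
        "matched": len(matched),
    }
    # Only advertise a next page when there is something left to fetch.
    if offset + limit < len(matched):
        collection["next"] = {"limit": limit, "offset": offset + limit}
    return collection
```

A client that only understands GeoJSON can still read the response, since the extra members sit alongside the standard ones. A client that knows the paging pattern can follow `next` until it disappears.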
I believe what is most important is that the easy case must be super simple to make use of, but a really good specification should contain all the ‘hooks’ where additional functionality can be added. Those who have built geospatial systems have all experienced starting simple and then having users ask for a number of more difficult scenarios. What a set of good standards can do is help guide that path: an initial filtering implementation may assume LongLat and only BBOX filtering of points. But there is a spot for an EPSG code in the future, when a user wants a polar projection. And there are specified ways to use more geometries. The first implementation can safely ignore them, but the developer can find a small spec that explains how to extend their API to support both BBOX and polygon filters when a user has a particular polygon they want to query for results.
The goal for me is to increase the interoperability in data access, and to do that in small ways: provide some API patterns and open source implementations that make it easy to produce or consume components that follow those practices. The ‘win’ is that when Facebook, Twitter or new start-up X adds some geospatial queries, it doesn’t require a custom client library. Existing tools would be able to read it, because the new API would follow enough shared patterns to ‘fit in’.
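As a rough illustration of what such hooks might look like, here is a sketch of a first-pass filter parser. It implements only the easy case, but it reserves named slots for a coordinate reference system and a polygon geometry. Everything here is hypothetical: the `bbox`, `crs` and `geometry` parameter names, the `SpatialFilter` and `UnsupportedFilter` types, and the choice of `EPSG:4326` as the default label (with coordinates assumed to be in longitude, latitude order, as in GeoJSON).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed default: lon/lat coordinates, labelled with an EPSG code.
DEFAULT_CRS = "EPSG:4326"


class UnsupportedFilter(ValueError):
    """Raised when a request uses a hook this server has not implemented yet."""


@dataclass
class SpatialFilter:
    bbox: Tuple[float, float, float, float]  # (west, south, east, north)
    crs: str = DEFAULT_CRS                   # hook: other projections later
    geometry: Optional[dict] = None          # hook: polygon filters later


def parse_filter(params: Dict[str, str]) -> SpatialFilter:
    """Parse query parameters, supporting only bbox in the default CRS for now."""
    values = tuple(float(v) for v in params["bbox"].split(","))
    if len(values) != 4:
        raise ValueError("bbox must have exactly four comma-separated numbers")

    crs = params.get("crs", DEFAULT_CRS)
    if crs != DEFAULT_CRS:
        # The slot exists, so clients get a clear answer instead of wrong results.
        raise UnsupportedFilter(f"crs {crs!r} is not supported yet")
    if "geometry" in params:
        raise UnsupportedFilter("geometry filters are not supported yet")

    return SpatialFilter(bbox=values, crs=crs)
```

The sketch rejects the unimplemented options rather than silently dropping them, which is one possible reading of ‘safely ignore’. Either way the point is the same: the extension points are already named, so adding a polar projection or polygon support later does not change the shape of the API.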
I believe it is incumbent on those who have built numerous geospatial systems to create standards that give those new to the field a smooth ‘on ramp’. It is much harder to make a small specification that can be digested in a half hour than to just pile all the edge cases into a huge document. It takes careful editing, and understanding of the user and how to communicate the most important bits. And the key is to have that small specification not be simplistic, with problems emerging when one needs to add more complex functionality. It should solve the 80% use case, with an array of compatible extensions that solve the additional 20%.
What does this have to do with WFS?
So why am I conflating a focus on granular geo specifications with the Web Feature Service specification? If the goal is to create geospatial API components then shouldn’t we just do that? My experience has taught me that it’s best to start with more practical problems: build a real solution and then break things up, instead of starting with a big plan to create building blocks and hoping they fit together right. So the core ‘problem’ of accessing the source data behind maps is a sensible place to start.
The granular components one needs for WFS cover many of the core API pieces one likely needs for a decent geospatial data API. At the center there is filtering data, sending queries, paging through results, properly setting cache control headers, reporting capabilities, error codes, and various content types. And then a number of interesting potential extensions follow, like transactions, subscriptions / syncing, alternate interfaces like gRPC and more. So working through how WFS handles these should entail thinking through what each of those looks like in a modern, RESTful world — hopefully mostly just enshrining the current best practices in the broader web world with the ‘official’ OGC stamp. In my previous post I was excited about the use of OpenAPI to describe the interfaces. And I believe doing so will enable the API components to be reusable, so patterns made for WFS can be reused by other specifications.
The core group working on WFS seems to be embracing this direction, thinking about the minimal set of components to enable features to be accessed through an API. And so my enthusiasm is much more about the fact that a new style of approaching specifications is being utilized. The open collaboration with GitHub and OpenAPI, REST+JSON at the core, embracing the web (more on that in the next post) and a focus on ease of implementation are the foundations. And my hope is that all those together make it easy to extract out the core components and eventually put the emphasis on that nice geospatial API on-ramp, instead of particular monolithic Web ____ Services.
There is still a lot of work to do to get closer to a vision of granular geospatial API components. I started planning the SpatioTemporal Asset Catalog specification before I realized WFS 3.0 was going in similar directions, and my goal there has been to create a small set of granular API components that are useful for searching imagery. The overlap of those with the components needed for WFS is immense, as each image is represented by a geometry plus a number of metadata fields, which is essentially a ‘feature’. So we’ve been thinking about how to bring WFS and STAC together; indeed the STAC API used the sample WFS 3.0 swagger doc as its starting point. My hope is we can have a sprint or hackathon that lets STAC developers collaborate directly with WFS 3.0 developers, so that the two share as many common components as possible. And perhaps it may even make sense for STAC to be a WFS ‘implementation’ that makes a couple of opinionated choices and extensions on top of the WFS core.
I’m also excited to try to bring together many of the ideas related to static catalogs into the WFS world, but I’ll save that write-up for its own post. I believe it can help with the scalability problems of WFS, and help embrace the web even more fully.
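To show why an image record is ‘essentially a feature’, here is a small sketch that wraps an image footprint and its metadata in a plain GeoJSON Feature. It is not the STAC item definition: the function name and the metadata keys used in the example are assumptions for illustration only.

```python
from typing import Any, Dict, List


def image_to_feature(
    image_id: str,
    footprint: List[List[float]],
    metadata: Dict[str, Any],
) -> Dict[str, Any]:
    """Represent one image as a GeoJSON Feature.

    footprint is a closed ring of [lon, lat] positions (first equals last).
    """
    if len(footprint) < 4 or footprint[0] != footprint[-1]:
        raise ValueError("footprint must be a closed ring of at least 4 positions")
    return {
        "type": "Feature",
        "id": image_id,
        # A GeoJSON Polygon is a list of rings; the footprint is the outer ring.
        "geometry": {"type": "Polygon", "coordinates": [footprint]},
        # Metadata fields ride along as ordinary feature properties.
        "properties": dict(metadata),
    }


# Hypothetical example record; the keys are illustrative, not a schema.
example = image_to_feature(
    "scene-001",
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]],
    {"datetime": "2018-01-01T00:00:00Z", "cloud_cover": 12},
)
```

Once an image looks like this, the filtering and paging components that a feature API needs apply to an imagery catalog as well.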