An Exploration of ‘Cloud-Native Vector’

Diverse Requirements

  • Visualization: one of the key things to do with data resting on the cloud is to visualize it, especially in web browsers. Vector tiles are super optimized for this use case and can be stored natively on the cloud, so they provide a great answer, but they aren’t great for non-visualization uses.
  • Analysis: the other thing you want to do is process the data to get insight. Analysis with cloud data warehouses is the most interesting to me, but it can also be done by streaming subsets to desktop (QGIS & Esri) and browser-based (Unfolded, OpenLayers, etc.) tools.

Visualization Challenges

Further requirements

Potential Formats

Flatgeobuf

  • Based on a more modern serialization format. In this case, it’s FlatBuffers, which was created at Google and is now completely open source.
  • Focused on its use cases. It aims at serving large amounts of static data with streaming and random access. It isn’t trying to be everything to everyone, but for those use cases it does end up being much better than the alternatives.
  • Good implementation support. They prioritized making sure that GDAL, GeoTools, GeoServer, PostGIS, QGIS, OpenLayers & Leaflet all support it, even though it’s a relatively young project.
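
To make ‘streaming and random access’ a bit more concrete, here is a toy sketch in Python. It is not the Flatgeobuf layout (the real format stores FlatBuffers-encoded features behind a packed Hilbert R-tree); it only assumes a flat list of bounding boxes with byte offsets in front of the data, and it uses JSON payloads for readability. The names IndexEntry, build_file and read_window are made up for this illustration.

```python
import io
import json
from typing import NamedTuple, Tuple


class IndexEntry(NamedTuple):
    """One row of a hypothetical up-front spatial index."""
    bbox: Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
    offset: int  # byte position of the feature in the data section
    length: int  # byte length of the feature


def build_file(features):
    """Write features back to back and record where each one lives."""
    body = io.BytesIO()
    index = []
    for feature in features:
        payload = json.dumps(feature).encode("utf-8")
        index.append(IndexEntry(tuple(feature["bbox"]), body.tell(), len(payload)))
        body.write(payload)
    return index, body.getvalue()


def intersects(a, b):
    """True when two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def read_window(index, blob, query_bbox):
    """Seek to and decode only the features whose bbox hits the query."""
    stream = io.BytesIO(blob)  # stands in for a remote object read by byte range
    hits = []
    for entry in index:
        if intersects(entry.bbox, query_bbox):
            stream.seek(entry.offset)
            hits.append(json.loads(stream.read(entry.length)))
    return hits


# Assumed sample data: three point features with precomputed bboxes.
FEATURES = [
    {"type": "Feature", "bbox": [0, 0, 0, 0], "properties": {"name": "a"},
     "geometry": {"type": "Point", "coordinates": [0, 0]}},
    {"type": "Feature", "bbox": [10, 10, 10, 10], "properties": {"name": "b"},
     "geometry": {"type": "Point", "coordinates": [10, 10]}},
    {"type": "Feature", "bbox": [1, 1, 1, 1], "properties": {"name": "c"},
     "geometry": {"type": "Point", "coordinates": [1, 1]}},
]
```

On cloud storage the seek-and-read step would typically become an HTTP range request, so a client pays for the index plus the features it actually needs rather than the whole file. A linear scan of the index is fine for a toy; a tree-shaped index is what keeps that lookup cheap on large files.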

Cloud-optimized Shapefile

GeoParquet & GeoArrow

GeoPackage

GeoJSON?

  • Cloud-Optimized GeoJSON was an experiment from Boundless (now part of Planet as Planet Federal), with an explicit goal to make a vector equivalent to Cloud-Optimized GeoTIFF. It mostly focused on providing a spatially-oriented ‘table of contents’ upfront, for better seek access.
  • Newline-delimited GeoJSON is an upgrade to GeoJSON that enables more efficient streaming using JSON Text sequences. From the overview site: ‘Each line can be read independently, parsed, and processed, without the entire file ever having to be parsed in memory, which is great for huge files.’ You can read more about it in this great blog post.
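
The ‘each line can be read independently’ idea is easy to sketch in Python. The snippet below assumes the plain one-feature-per-line flavor, and the function names, the sample features and the ‘pop’ property are all invented for illustration.

```python
import io
import json


def iter_features(lines):
    """Yield one parsed feature per non-empty line, never holding the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_matching(lines, predicate):
    """Count the features for which predicate(feature) is true, one line at a time."""
    return sum(1 for feature in iter_features(lines) if predicate(feature))


# Assumed sample data: three point features, one JSON document per line.
SAMPLE = "\n".join(
    json.dumps(feature)
    for feature in [
        {"type": "Feature", "properties": {"pop": 50},
         "geometry": {"type": "Point", "coordinates": [0, 0]}},
        {"type": "Feature", "properties": {"pop": 500},
         "geometry": {"type": "Point", "coordinates": [1, 1]}},
        {"type": "Feature", "properties": {"pop": 75},
         "geometry": {"type": "Point", "coordinates": [2, 2]}},
    ]
) + "\n"
```

Any iterable of lines works here, so an open file handle or a streamed HTTP response body can be passed in the same way as the io.StringIO wrapper one might use in a quick test, for example count_matching(io.StringIO(SAMPLE), lambda f: f["properties"]["pop"] > 100).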

What’s next?

  • Formats for Cloud Data Warehouses. As a baseline, I believe we should be helping ensure interoperability between the cloud data warehouse vendors. They have adopted some new formats like Parquet, Avro & ORC, and as a geo-community we should help them get ‘geo’ right. But the real potential is to get to a format that they can use in a ‘cloud-native’ way, as an alternative to their internal formats that works ‘well enough’. This would enable data providers to publish in one format and have users of BigQuery, Snowflake, Redshift and more be able to use it directly in their workflows.
  • ‘Overviews included’. I’m far from sure this will work, but it feels like it’s worth experimenting with including some amount of ‘overviews’ directly in formats, to help meet the visualization and analysis use cases in a single file. Doing this will necessarily involve some duplication, but ideally there’s innovation on smart ways to link data within a format. The core ‘challenge’ would be to have a 1+ gigabyte file that can be visualized quickly in OpenLayers/Leaflet at the full layer extent while also providing full attribute data and indices (spatial and even others) for use in analysis.

Have thoughts & ideas on this? Share them!

Product Architect @ Planet, Board Member @ Open Geospatial Consortium, Technical Fellow @ Radiant.Earth

Chris Holmes
