“Cloud-native” or “cloud-optimized” geospatial data formats are specifically designed to be stored, managed, and retrieved from the cloud. They’re tuned for performant reading, offering functionality like filtering, parallel reading, lazy evaluation, and more. This technology has several advantages, including:
- Users can read only what they need. The ability to do partial reads instead of retrieving an entire dataset helps both the user and the provider. For providers, this means reduced costs and server load; for users, it means less time spent downloading data, loading it into memory, and managing local files.
- Efficient data access over HTTP. Both users and providers benefit from low-latency connections to the data, and it becomes straightforward to build apps and tools that interact with it directly (see the sketch after this list).
- Providers can make data more accessible. By hosting and managing it in the cloud, there’s no need to maintain local servers and databases.
- Scalable, parallelizable data processing workflows can run near the data. Keeping compute in the same cloud as the data saves time and money.
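To make the first two points concrete, here is a minimal sketch of a partial read over plain HTTP. It assumes the server supports range requests and that the (hypothetical) URL points to a Cloud Optimized GeoTIFF; only the requested byte range is transferred, not the whole file.

```python
import requests

url = "https://example.com/data/elevation.tif"  # hypothetical COG URL

# Ask the server for just the first 16 KiB, which for a COG is typically
# enough to read the header and the tile/overview index.
resp = requests.get(url, headers={"Range": "bytes=0-16383"})

print(resp.status_code)   # 206 Partial Content if range requests are supported
print(len(resp.content))  # at most 16384 bytes transferred
```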
For anyone who works with large amounts of data, these are compelling reasons to consider cloud-native formats. As cloud storage continues to improve, we expect its adoption to keep increasing across industries.
To dive into this topic in more detail, watch our webinar, Cloud Revolution: Exploring the New Wave of Serverless Spatial Data.
What cloud-optimized data format do I use?
There’s no one-size-fits-all way to package data, which is why so many geospatial formats exist, and the number of cloud-native options keeps growing. As a provider, choosing the right one depends on the data type: raster, vector, point cloud, or otherwise. Below are some of the leading cloud-native formats available at the time of writing, along with the types of data they support.
Rasters: Cloud Optimized GeoTIFF (COG)
Cloud Optimized GeoTIFF (COG) is a read-optimized format for storing rasters, such as Digital Elevation Models (DEMs). The file is internally tiled and carries overviews, and its metadata records where each tile and overview level sits within the file, allowing users to read only what they need.
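As a rough illustration, the sketch below uses rasterio (which reads COGs over HTTP via GDAL) to pull a low-resolution preview from a hypothetical remote file. The URL is an assumption; the key point is that requesting a smaller output shape lets the reader serve the data from an internal overview instead of downloading full-resolution pixels.

```python
import rasterio
from rasterio.enums import Resampling

url = "https://example.com/data/dem_cog.tif"  # hypothetical COG URL

with rasterio.open(url) as src:
    print(src.overviews(1))  # decimation factors of the internal overviews
    # A reduced-size read can be satisfied from an overview level,
    # so only a fraction of the file is fetched.
    preview = src.read(
        1,
        out_shape=(src.height // 16, src.width // 16),
        resampling=Resampling.average,
    )

print(preview.shape)
```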
Point clouds: Cloud Optimized Point Cloud (COPC)
Cloud Optimized Point Cloud (COPC) is based on the LAS/LAZ specification. Points are organized into an octree within the file, so readers can seek directly to the subsets they need, whether by spatial extent or level of detail.
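As a sketch of what this looks like in practice, laspy’s COPC reader can query a remote file by bounding box. The URL and coordinates below are placeholders, and the example assumes laspy is installed with its COPC and HTTP extras.

```python
import numpy as np
from laspy.copc import CopcReader, Bounds

url = "https://example.com/data/lidar.copc.laz"  # hypothetical COPC URL

with CopcReader.open(url) as reader:
    # Only the octree nodes overlapping this box are fetched and decompressed.
    bbox = Bounds(
        mins=np.array([635600.0, 848800.0]),
        maxs=np.array([635800.0, 849000.0]),
    )
    points = reader.query(bbox)

print(len(points), "points inside the query box")
```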
Vector data: FlatGeobuf
FlatGeobuf is built on Google’s high-performance FlatBuffers library. Its optional packed spatial index enables fast bounding-box filtering and supports efficient streaming and random access.
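For example, GeoPandas can read a FlatGeobuf file directly and push a bounding-box filter down to it, so only matching features are parsed; the URL and extent here are placeholders.

```python
import geopandas as gpd

url = "https://example.com/data/parcels.fgb"  # hypothetical FlatGeobuf URL

# The bbox filter uses the file's spatial index, so only features
# intersecting this extent are read.
subset = gpd.read_file(url, bbox=(-122.5, 45.4, -122.3, 45.6))

print(len(subset), "features in the bounding box")
```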
Data cubes or tensors: Zarr
Zarr is ideal for multi-dimensional arrays, such as weather data and climate models, and enables high-throughput parallel processing. It works with a variety of storage backends, including cloud object storage, and lets users chunk multi-dimensional arrays along any dimension.
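Here is a minimal sketch with xarray, assuming fsspec is available, the store is publicly readable, and the variable and dimension names (temperature, time, lat, lon) exist in the dataset: opening is lazy, and only the chunks overlapping the selected slice are downloaded.

```python
import xarray as xr

store = "https://example.com/data/climate.zarr"  # hypothetical Zarr store

# Opening only reads metadata; no array chunks are fetched yet.
ds = xr.open_zarr(store, consolidated=True)

# Selecting a slice is still lazy...
subset = ds["temperature"].sel(time="2023-01", lat=slice(40, 50), lon=slice(-10, 5))

# ...and only the chunks overlapping the selection are downloaded here.
values = subset.values
print(values.shape)
```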
GeoJSON data: SpatioTemporal Asset Catalogs (STAC)
SpatioTemporal Asset Catalogs (STAC) provide a performant way to index geospatial assets. A STAC Catalog is a JSON file that links to STAC Items (GeoJSON Features describing the assets), while a STAC Collection extends a Catalog with additional metadata such as spatial and temporal extents.
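To show what querying a STAC API looks like, the sketch below uses pystac-client against a public endpoint (Earth Search); the collection, bounding box, and date range are just example parameters.

```python
from pystac_client import Client

# A public STAC API; collection, bbox, and dates are example values.
catalog = Client.open("https://earth-search.aws.element84.com/v1")

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-122.5, 45.4, -122.3, 45.6],
    datetime="2023-06-01/2023-06-30",
    max_items=5,
)

for item in search.items():
    print(item.id, list(item.assets))
```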
How do you work with data in cloud-optimized formats?
All the above formats are supported in FME, so converting data into these formats is as straightforward as building a data integration workflow to extract the data from its source and load it into the new system.
To read cloud-native data, an FME workflow can stream just the portion of data that’s needed.
We’ll show these workflows in live demos in this week’s webinar, Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME, so be sure to register or watch the recording!