When Features and Geometry Are Not 1:1
It’s said that most data has a spatial component (location matters) and that asking “where” questions can lead to useful insight. Sometimes there is more than one way to frame something’s location, size, or shape, and different approaches have been taken to capture these relationships.
In particular, I’d like to highlight the different paths GIS and spatial databases have taken in response to this challenge. I’ll then outline some considerations you might face when working with data that spans systems which have taken different approaches.
Examples of Data Spatially Represented in Multiple Ways
- The outline of a geographic area could be stored with different levels of detail based on the scale of interest (e.g. as a polygon with many vertices, a polygon with few vertices, or as a single point).
- Similarly, you might choose between alternative representations (e.g. representing some lakes as areas and others as points, but none as both).
- A road might be represented both by its center line and by lines representing its edges.
- In some cases, you might want to redundantly store the same geometry in different coordinate systems or storage representations.
(This discussion is restricted to vector data, but in the broader sense you could have vector, raster, 3D models, LiDAR, etc. for the same entity or area.)
The Evolution of GIS and Spatial Databases
The traditional GIS model is that each individual entity (called a feature) consists of a single geometry and a set of attributes. These are organized into layers by common themes (e.g. roads, administrative areas), which are often constrained to a single geometric type (e.g. all points, lines, or areas). In this model, the scenarios above might be handled by multiple layers linked together by a common id.
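As a rough illustration of this model, the sketch below represents a road by its center line in one layer and by its edges in another, tied together by a shared id. The class names, the `entity_id` field, and the layer names are all hypothetical and chosen for this example; they are not taken from any particular GIS.

```python
from dataclasses import dataclass, field

Coordinate = tuple[float, float]


@dataclass
class Feature:
    """One entity: a single geometry plus a set of attributes."""
    entity_id: int                  # hypothetical common id shared across layers
    geometry_type: str              # "point", "line", or "area"
    coordinates: list[Coordinate]
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Layer:
    """A themed collection of features constrained to one geometry type."""
    name: str
    geometry_type: str
    features: list[Feature] = field(default_factory=list)

    def add(self, feature: Feature) -> None:
        # Enforce the single-geometry-type constraint of the layer.
        if feature.geometry_type != self.geometry_type:
            raise ValueError(
                f"layer {self.name!r} only accepts {self.geometry_type} features"
            )
        self.features.append(feature)


def representations(layers: list[Layer], entity_id: int) -> dict[str, list[Feature]]:
    """Collect every representation of one entity, keyed by layer name."""
    return {
        layer.name: [f for f in layer.features if f.entity_id == entity_id]
        for layer in layers
    }


centerlines = Layer("road_centerlines", "line")
edges = Layer("road_edges", "line")
centerlines.add(Feature(1, "line", [(0.0, 0.0), (10.0, 0.0)], {"name": "Main St"}))
edges.add(Feature(1, "line", [(0.0, 2.0), (10.0, 2.0)]))
edges.add(Feature(1, "line", [(0.0, -2.0), (10.0, -2.0)]))

road = representations([centerlines, edges], entity_id=1)
print({name: len(found) for name, found in road.items()})
```

Note that nothing in this structure forces the two layers to stay in step: the link exists only because both layers happen to use the same id.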
Some of the first approaches to storing spatial data in relational databases followed the same idea, using primitive types to model layers of features, each with a single geometry and a set of attributes. This strategy remains in common use with database systems that don’t otherwise support spatial data, typically following the first option standardized in OGC’s Simple Feature SQL specification.
Over time, many relational databases have introduced first-class spatial types. This means they treat geometry just like strings, numbers, or dates, which has made it much easier to work with and analyse spatial data in these systems. Another implication is that their tables may contain more than one spatial column – just like they might contain more than one numeric column – and that presents an alternative way to model the four scenarios discussed above.
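To make the multi-column idea concrete, here is a minimal sketch of a table that holds two spatial representations per lake. It uses Python's built-in `sqlite3` module, and because plain SQLite has no geometry type, WKT text columns stand in for what would be geometry-typed columns in a database with first-class spatial support. The table, column names, and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per lake, with two optional spatial columns.
# TEXT holding WKT is only a stand-in for a real geometry type.
conn.execute(
    """
    CREATE TABLE lakes (
        lake_id         INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        outline_wkt     TEXT,  -- area representation, may be NULL
        label_point_wkt TEXT   -- point representation, may be NULL
    )
    """
)
conn.executemany(
    "INSERT INTO lakes (lake_id, name, outline_wkt, label_point_wkt) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "Big Lake", "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))", None),
        (2, "Small Pond", None, "POINT (10 10)"),
    ],
)

# Report which representation each lake actually has (1 = present, 0 = NULL).
rows = conn.execute(
    "SELECT name, outline_wkt IS NOT NULL, label_point_wkt IS NOT NULL "
    "FROM lakes ORDER BY lake_id"
).fetchall()
print(rows)
```

Here each lake is a single row, so the question of how its representations are linked does not arise; the trade-off is that every query has to state which spatial column it means.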
The idea of modeling features as having more than one spatial representation isn’t unique to spatial databases. For example, the GE Smallworld GIS uses this concept, as does the GML interchange format.
Practical Implications
There are a few things to consider when integrating or converting data between formats or systems that take these different approaches:
- If the same entity is represented in different layers, how are these different representations tied together? If this isn’t managed well, data can become inconsistent and lose value.
- In the database, what does it mean for one of the spatial columns to be null (missing), and what is the equivalent representation in the other format or system?
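Both questions come up when converting rows with several spatial columns into one layer per representation. The sketch below shows one possible policy, not the only one: a null geometry produces no feature in the corresponding layer, and a follow-up check reports which ids are then absent from each layer. The function names, the `id` key, and the sample rows are assumptions made for this example.

```python
def split_into_layers(rows, geometry_columns):
    """Turn multi-geometry rows into one list of features per geometry column.

    Assumed policy: a None geometry yields no feature in that layer.
    """
    layers = {column: [] for column in geometry_columns}
    for row in rows:
        # Non-spatial attributes are copied onto every representation.
        attributes = {k: v for k, v in row.items() if k not in geometry_columns}
        for column in geometry_columns:
            geometry = row.get(column)
            if geometry is None:
                continue
            layers[column].append({**attributes, "geometry": geometry})
    return layers


def missing_ids(layers, id_column="id"):
    """For each layer, list the ids that appear in some other layer but not in it."""
    ids_by_layer = {
        name: {feature[id_column] for feature in features}
        for name, features in layers.items()
    }
    all_ids = set().union(*ids_by_layer.values())
    return {name: sorted(all_ids - ids) for name, ids in ids_by_layer.items()}


rows = [
    {"id": 1, "name": "Main St",
     "centerline": "LINESTRING (0 0, 10 0)",
     "edges": "MULTILINESTRING ((0 2, 10 2), (0 -2, 10 -2))"},
    {"id": 2, "name": "Side Rd",
     "centerline": "LINESTRING (0 0, 0 10)",
     "edges": None},
]

layers = split_into_layers(rows, ["centerline", "edges"])
gaps = missing_ids(layers)
print(gaps)
```

Whether a gap like this is an error or simply means "not captured yet" depends on the data, which is why it helps to decide what a null spatial column means before converting.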
When modeling spatial data which may be usefully represented in different ways, consider how your chosen software best handles this case, and determine whether the added value is worth the extra complexity. When using and integrating this data, keep the different representations and relationships in mind so that you make the most of your investment.