Impacts of STAC item footprint size on dynamic tiling query performance
TL;DR: If you have any control over the geographic footprint of the assets that you are cataloging with pgstac
and you want to serve visualizations with a dynamic tiling application, try to maximize the size of the assets!
Dynamic tiling applications like titiler-pgstac send many queries to a pgstac database, and clients are very sensitive to performance, so it is worth considering a few basic ideas when building collections and items that may be used in this way.
pgstac's query functions perform relatively expensive spatial intersection operations, so the fewer items there are in a collection x datetime partition, the faster the query will be. This is not a pgstac-specific problem (any application that needs to perform spatial intersections will take longer as the number of calculations increases), but it is worth demonstrating the influence of these factors in the dynamic tiling context.
Scenario
Imagine you have a continental-scale dataset of gridded data that will be stored as Cloud-Optimized GeoTIFFs (COGs) and you get to decide how the individual files will be spatially arranged and cataloged in a pgstac database. You could make items as small as 0.5 degree squares or as large as 10 degree squares. In this case the assets will be non-overlapping rectangular grids.
The assets will be publicly accessible, so smaller file sizes might be useful for some applications/users, but since the data will be stored as COGs and we also want to be able to serve raster tile visualizations in a web map with titiler-pgstac, smaller file sizes are not very important. However, the processing pipeline that generates the assets might have some resource constraints that push you to choose a smaller tile size.
Consider the following options for tile sizes:
tile width (degrees) | # items
---|---
0.5 | 10000
1.0 | 2500
2.0 | 625
4.0 | 169
6.0 | 81
8.0 | 49
10.0 | 25
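The number of items is inversely proportional to the square of the tile width, which means that small changes in tile size can have a large impact on the eventual number of items in your catalog! For example, halving the tile width from 1.0 to 0.5 degrees quadruples the item count from 2500 to 10000.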
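The inverse-square relationship can be sketched in a few lines of Python. The source does not state the size of the area being tiled; the 50 x 50 degree extent below is an assumption, chosen only because it reproduces the item counts in the table. The function name `item_count` is likewise illustrative.

```python
import math

# Assumption: a square 50 x 50 degree extent. The extent is not given in the
# text; this value is inferred because it reproduces the table's item counts.
ASSUMED_EXTENT_DEGREES = 50.0


def item_count(tile_width, extent=ASSUMED_EXTENT_DEGREES):
    """Number of square items of a given width needed to cover a square extent."""
    # Partial tiles at the edge of the extent still count as whole items.
    per_side = math.ceil(extent / tile_width)
    return per_side ** 2


widths = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
counts = {width: item_count(width) for width in widths}

for width, count in counts.items():
    print(f"{width:>5} degrees -> {count} items")
```

Under this assumed extent, the rounding up of partial tiles would explain why widths that do not divide the extent evenly (4, 6 and 8 degrees) give slightly more items than a pure inverse-square rule predicts.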
This map shows the spatial arrangement of the items for a range of tile sizes:
Performance comparison
To simulate the performance of queries made by a dynamic tiling application, we have prepared a benchmarking procedure that uses the pgstac function xyzsearch to run an item query for an XYZ tile. By iterating over many combinations of tile sizes and zoom levels, we can examine the response time with respect to item footprint size and tile zoom level.
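To build some intuition for why zoom level and item footprint interact, the sketch below estimates how many items a single XYZ tile would touch on a regular grid. It does not call pgstac; it only applies the standard Web Mercator tile arithmetic to a hypothetical grid. The grid origin at (0, 0), the 50 degree extent, and the helper names `tile_bounds`, `tile_for` and `items_touched` are all assumptions made for illustration.

```python
import math

# Assumptions for illustration only: the grid's lower-left corner and its size.
ASSUMED_ORIGIN = (0.0, 0.0)
ASSUMED_EXTENT = 50.0


def tile_bounds(x, y, z):
    """Return (west, south, east, north) in degrees for a Web Mercator XYZ tile."""
    n = 2 ** z

    def row_to_lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, row_to_lat(y + 1), east, row_to_lat(y)


def tile_for(lon, lat, z):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2 * n)
    return x, y


def items_touched(bounds, item_width, origin=ASSUMED_ORIGIN, extent=ASSUMED_EXTENT):
    """Count cells of a regular square grid that intersect a bounding box."""
    west, south, east, north = bounds
    ox, oy = origin
    # Clip the tile to the grid extent before counting rows and columns.
    w, e = max(west, ox), min(east, ox + extent)
    s, n = max(south, oy), min(north, oy + extent)
    if w >= e or s >= n:
        return 0
    cols = math.ceil((e - ox) / item_width) - math.floor((w - ox) / item_width)
    rows = math.ceil((n - oy) / item_width) - math.floor((s - oy) / item_width)
    return cols * rows


for zoom in range(0, 9, 2):
    x, y = tile_for(10.0, 10.0, zoom)
    bounds = tile_bounds(x, y, zoom)
    small = items_touched(bounds, 0.5)
    large = items_touched(bounds, 10.0)
    print(f"z={zoom}: 0.5-degree items={small}, 10-degree items={large}")
```

At low zoom levels a single tile covers the whole grid, so every item is a candidate for intersection; the number of candidates then depends entirely on the item footprint size.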
This figure shows average response time for xyzsearch to return a complete set of results for each zoom level for the range of item tile widths:
[Figure: average xyzsearch response time by zoom level (x-axis) and item tile width (y-axis)]
Without details about the resource configuration for a specific pgstac deployment, it is hard to say which zoom level becomes inoperable for a given tile size, but queries that take more than 0.5 seconds in this test would probably yield poor results in a deployed context.
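A timing loop along these lines could be used to repeat the comparison against your own deployment. This is a minimal sketch, not the benchmarking procedure used for the figure: `run_query` is a hypothetical callable that you would replace with code that issues an xyzsearch query against your database, and `fake_query` is a stand-in so the example runs without one.

```python
import statistics
import time

# Rule-of-thumb threshold taken from the text above.
SLOW_THRESHOLD_SECONDS = 0.5


def benchmark(run_query, item_widths, zoom_levels, repeats=3):
    """Return mean wall-clock seconds for each (item width, zoom level) pair."""
    zooms = list(zoom_levels)  # allow generators to be reused in the inner loop
    results = {}
    for width in item_widths:
        for zoom in zooms:
            timings = []
            for _ in range(repeats):
                start = time.perf_counter()
                run_query(width, zoom)
                timings.append(time.perf_counter() - start)
            results[(width, zoom)] = statistics.mean(timings)
    return results


def slow_combinations(results, threshold=SLOW_THRESHOLD_SECONDS):
    """List the (item width, zoom level) pairs whose mean time exceeds the threshold."""
    return sorted(key for key, seconds in results.items() if seconds > threshold)


def fake_query(width, zoom):
    """Hypothetical stand-in for a real database query; it only sleeps briefly."""
    time.sleep(0.001)


results = benchmark(fake_query, [0.5, 10.0], [0, 5], repeats=2)
print(slow_combinations(results))
```

Because response times depend heavily on database resources, the threshold is best treated as a starting point and adjusted for the deployment being tested.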