Direct GeoPandas conversion (Legacy)
The API listed here is the initial, non-Arrow-based STAC-GeoParquet implementation, which converts between JSON and GeoPandas directly. For large collections of STAC items, the newer Arrow-based functionality (under the stac_geoparquet.arrow namespace) will be more performant.
Note that stac_geoparquet lifts the keys in the item properties up to the top level of the DataFrame, similar to geopandas.GeoDataFrame.from_features.
>>> import requests
>>> import stac_geoparquet.arrow
>>> import pyarrow.parquet
>>> import pyarrow as pa
>>> items = requests.get(
... "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items"
... ).json()["features"]
>>> table = pa.Table.from_batches(stac_geoparquet.arrow.parse_stac_items_to_arrow(items))
>>> stac_geoparquet.arrow.to_parquet(table, "items.parquet")
>>> table2 = pyarrow.parquet.read_table("items.parquet")
>>> items2 = list(stac_geoparquet.arrow.stac_table_to_items(table2))
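The lifting of properties keys described above can be illustrated with a minimal plain-Python sketch. It does not call stac_geoparquet or geopandas; the helper name lift_properties and the sample item are hypothetical and only show the shape of the transformation, not the library's actual implementation.

```python
from typing import Any


def lift_properties(item: dict[str, Any]) -> dict[str, Any]:
    """Return a flat record with the keys of item["properties"] at the top level.

    Hypothetical helper for illustration; not part of stac_geoparquet.
    """
    # Copy every top-level key except the nested "properties" mapping.
    record = {key: value for key, value in item.items() if key != "properties"}
    # Merge the nested property keys into the flat record.
    record.update(item.get("properties", {}))
    return record


# A made-up STAC-like item used only for this sketch.
item = {
    "type": "Feature",
    "id": "example-item",
    "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
    "properties": {"datetime": "2021-01-01T00:00:00Z", "eo:cloud_cover": 12.5},
}

record = lift_properties(item)
```

After the call, record has datetime and eo:cloud_cover next to id and geometry, which is the layout each row of the resulting DataFrame is described as having.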
stac_geoparquet.to_geodataframe
to_geodataframe(
    items: Sequence[dict[str, Any]],
    add_self_link: bool = False,
    dtype_backend: DTYPE_BACKEND | None = None,
    datetime_precision: str = "ns",
) -> GeoDataFrame
Convert a sequence of STAC items to a geopandas.GeoDataFrame.
The objects under properties are moved up to the top level of the DataFrame, similar to geopandas.GeoDataFrame.from_features.
Parameters:
- items (Sequence[dict[str, Any]]) – A sequence of STAC items.
- add_self_link (bool, default: False) – Add the absolute link (if available) to the source STAC Item as a separate column named "self_link".
- dtype_backend (DTYPE_BACKEND | None, default: None) – {'pyarrow', 'numpy_nullable'}, optional. The dtype backend to use for storing arrays.
  By default, this will use 'numpy_nullable' and emit a FutureWarning that the default will change to 'pyarrow' in the next release.
  Set to 'numpy_nullable' to silence the warning and accept the old behavior.
  Set to 'pyarrow' to silence the warning and accept the new behavior.
  There are some differences in the output as well: with dtype_backend="pyarrow", struct-like fields will explicitly contain null values for fields that appear in only some of the records. For example, given an assets value like:
      {
          "a": {"href": "a.tif"},
          "b": {"href": "b.tif", "title": "B"}
      }
  The assets field of the output for the first row with dtype_backend="numpy_nullable" will be a Python dictionary with just {"href": "a.tif"}.
  With dtype_backend="pyarrow", this will be a pyarrow struct with fields {"href": "a.tif", "title": None}. pyarrow will infer that the struct field asset.title is nullable.
- datetime_precision (str, default: 'ns') – The precision to use for the datetime columns. For example, "us" is microsecond and "ns" is nanosecond.
Returns:
- GeoDataFrame – The converted GeoDataFrame.
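The difference between the two dtype_backend values for struct-like fields can be emulated in plain Python. This sketch does not call pyarrow or stac_geoparquet; fill_missing_fields is a hypothetical helper that mimics the described behavior of giving every struct the same set of fields, with None where a field is absent.

```python
from typing import Any


def fill_missing_fields(structs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Give every struct the union of all field names, using None where absent.

    Hypothetical helper that emulates the described "pyarrow" output shape.
    """
    # Collect field names in first-seen order so the output is deterministic.
    field_names: list[str] = []
    for struct in structs:
        for name in struct:
            if name not in field_names:
                field_names.append(name)
    # dict.get returns None for fields a struct does not define.
    return [{name: struct.get(name) for name in field_names} for struct in structs]


# The assets example from the dtype_backend description.
assets = {
    "a": {"href": "a.tif"},
    "b": {"href": "b.tif", "title": "B"},
}

sparse = list(assets.values())        # like "numpy_nullable": plain dicts, keys as given
filled = fill_missing_fields(sparse)  # like "pyarrow": shared fields, explicit nulls
```

Here sparse[0] keeps only href, while filled[0] also carries title set to None, which matches the two outputs described for the first row.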
stac_geoparquet.to_item_collection
to_item_collection(df: GeoDataFrame) -> ItemCollection
Convert a GeoDataFrame of STAC items to a pystac.ItemCollection.
Parameters:
- df (GeoDataFrame) – A GeoDataFrame with a schema similar to that exported by stac-geoparquet.
Returns:
- ItemCollection – The converted ItemCollection. There will be one record / feature per row in the GeoDataFrame.
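The one-feature-per-row relationship can be sketched as the inverse of lifting the properties keys. This is a plain-Python illustration only: it does not use pystac or geopandas, row_to_feature is a hypothetical helper, and TOP_LEVEL_KEYS is an assumed list of top-level STAC item fields that may differ from what the library uses.

```python
from typing import Any

# Assumption: columns with these names stay at the top level of the feature;
# every other column is treated as an item property.
TOP_LEVEL_KEYS = {
    "type", "stac_version", "stac_extensions", "id",
    "geometry", "bbox", "links", "assets", "collection",
}


def row_to_feature(row: dict[str, Any]) -> dict[str, Any]:
    """Rebuild a STAC-like feature dict from one flat row (hypothetical helper)."""
    feature: dict[str, Any] = {"type": "Feature", "properties": {}}
    for key, value in row.items():
        if key in TOP_LEVEL_KEYS:
            feature[key] = value
        else:
            # Anything that is not a known top-level field goes back under properties.
            feature["properties"][key] = value
    return feature


# Two made-up flat rows standing in for rows of a GeoDataFrame.
rows = [
    {"id": "item-1", "geometry": None, "datetime": "2021-01-01T00:00:00Z"},
    {"id": "item-2", "geometry": None, "datetime": "2021-01-02T00:00:00Z"},
]

features = [row_to_feature(row) for row in rows]
```

The list features has exactly as many entries as rows, with datetime nested back under properties in each one.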