DuckDB

stacrs.DuckdbClient

A client for querying stac-geoparquet with DuckDB.

__init__

__init__(
    use_s3_credential_chain: bool = True, use_hive_partitioning: bool = False
) -> None

Creates a new DuckDB client.

Parameters:

  • use_s3_credential_chain (bool, default: True ) –

    If true, configures DuckDB to correctly handle s3:// urls.

  • use_hive_partitioning (bool, default: False ) –

    If true, enables queries on hive partitioned geoparquet files.
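A minimal sketch of constructing the client with both options spelled out. The helper name `make_client` and its `hive_partitioned` argument are hypothetical; only `DuckdbClient` and its two keyword arguments come from the signature above. The import is deferred into the function so the snippet can be loaded even where stacrs is not installed.

```python
from typing import Any


def make_client(*, hive_partitioned: bool = False) -> Any:
    """Build a DuckdbClient (hypothetical helper, not part of stacrs)."""
    # Deferred import: stacrs is only needed when the helper is called.
    from stacrs import DuckdbClient

    return DuckdbClient(
        use_s3_credential_chain=True,  # the documented default
        use_hive_partitioning=hive_partitioned,
    )
```

Calling `make_client(hive_partitioned=True)` would be the variant to try for a hive partitioned dataset.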

get_collections

get_collections(href: str) -> list[dict[str, Any]]

Returns all collections in this stac-geoparquet file.

These collections will be auto-generated from the STAC items, one collection per id in the collections column.

Eventually, these collections might be stored in the stac-geoparquet metadata and retrieved from there, but that's not the case yet.

Parameters:

  • href (str) –

    The stac-geoparquet file to build the collections from.

Returns:

  • list[dict[str, Any]]

    A list of STAC Collections
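As a rough illustration, the returned list can be reduced to the collection ids. The helper `collection_ids` is hypothetical; it assumes only that each returned dictionary carries the `id` field that every STAC Collection has, and that `client` is a `DuckdbClient` or any object with the same `get_collections` method.

```python
from typing import Any


def collection_ids(client: Any, href: str) -> list[str]:
    """Return the sorted ids of the collections generated from the items in href."""
    return sorted(collection["id"] for collection in client.get_collections(href))
```

With a real client this might be called as `collection_ids(client, "data/100-sentinel-2-items.parquet")`.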

search

search(
    href: str,
    *,
    ids: Optional[str | list[str]] = None,
    collections: Optional[str | list[str]] = None,
    intersects: Optional[str | dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    bbox: Optional[list[float]] = None,
    datetime: Optional[str] = None,
    include: Optional[str | list[str]] = None,
    exclude: Optional[str | list[str]] = None,
    sortby: Optional[str | list[str]] = None,
    filter: Optional[str | dict[str, Any]] = None,
    query: Optional[dict[str, Any]] = None,
    **kwargs: str,
) -> dict[str, Any]

Search a stac-geoparquet file with DuckDB, returning an item collection.

Parameters:

  • href (str) –

    The stac-geoparquet file.

  • ids (Optional[str | list[str]], default: None ) –

    Array of Item ids to return.

  • collections (Optional[str | list[str]], default: None ) –

    Array of one or more Collection IDs that each matching Item must be in.

  • intersects (Optional[str | dict[str, Any]], default: None ) –

    Searches items by performing intersection between their geometry and provided GeoJSON geometry.

  • limit (Optional[int], default: None ) –

    The number of items to return.

  • offset (Optional[int], default: None ) –

    The number of items to skip before returning.

  • bbox (Optional[list[float]], default: None ) –

    Requested bounding box.

  • datetime (Optional[str], default: None ) –

    Single date+time, or a range (/ separator), formatted to RFC 3339, section 5.6. Use double dots .. for open date ranges.

  • include (Optional[str | list[str]], default: None ) –

    Fields to include in the response (see the extension docs for more on the semantics).

  • exclude (Optional[str | list[str]], default: None ) –

    Fields to exclude from the response (see the extension docs for more on the semantics).

  • sortby (Optional[str | list[str]], default: None ) –

    Fields by which to sort results (use -field to sort descending).

  • filter (Optional[str | dict[str, Any]], default: None ) –

    CQL2 filter expression. Strings will be interpreted as cql2-text, dictionaries as cql2-json.

  • query (Optional[dict[str, Any]], default: None ) –

    Additional filtering based on properties. It is recommended to use filter instead, if possible.

  • kwargs (str, default: {} ) –

    Additional parameters to pass in to the search.

Returns:

  • dict[str, Any]

    A feature collection of STAC items.
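The `limit` and `offset` parameters described above can be combined to page through results. The following is a sketch under assumptions, not the library's own paging mechanism: `iter_items` and `LOW_CLOUD_SUMMER` are hypothetical names, the search values are illustrative only, and the loop assumes the returned feature collection keeps its items under the GeoJSON `features` key. Passing a `sortby` is advisable so that pages are drawn from a stable ordering.

```python
from typing import Any, Iterator


def iter_items(
    client: Any, href: str, page_size: int = 100, **search: Any
) -> Iterator[dict[str, Any]]:
    """Yield items page by page using the limit and offset parameters."""
    offset = 0
    while True:
        page = client.search(href, limit=page_size, offset=offset, **search)
        features = page.get("features", [])
        yield from features
        # A short (or empty) page means there is nothing left to fetch.
        if len(features) < page_size:
            return
        offset += page_size


# Illustrative search arguments (values are made up for this example).
LOW_CLOUD_SUMMER: dict[str, Any] = {
    "bbox": [-105.5, 39.5, -104.5, 40.5],  # west, south, east, north
    "datetime": "2024-06-01T00:00:00Z/2024-08-31T23:59:59Z",
    # A dictionary is interpreted as cql2-json.
    "filter": {"op": "<", "args": [{"property": "eo:cloud_cover"}, 10]},
    "sortby": "-datetime",  # leading "-" sorts descending
}
```

With a real client, `for item in iter_items(client, href, **LOW_CLOUD_SUMMER): ...` would walk every matching item without loading them all in one call.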

search_to_arrow

search_to_arrow(
    href: str,
    *,
    ids: Optional[str | list[str]] = None,
    collections: Optional[str | list[str]] = None,
    intersects: Optional[str | dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    bbox: Optional[list[float]] = None,
    datetime: Optional[str] = None,
    include: Optional[str | list[str]] = None,
    exclude: Optional[str | list[str]] = None,
    sortby: Optional[str | list[str]] = None,
    filter: Optional[str | dict[str, Any]] = None,
    query: Optional[dict[str, Any]] = None,
    **kwargs: str,
) -> Table | None

Search a stac-geoparquet file with DuckDB, returning an Arrow table suitable for loading into, e.g., GeoPandas.

stacrs must be installed with the arrow extra, e.g. `python -m pip install 'stacrs[arrow]'`.

Because DuckDB has Arrow as a core output format, this can be more performant than going through a JSON dictionary.

Parameters:

  • href (str) –

    The stac-geoparquet file.

  • ids (Optional[str | list[str]], default: None ) –

    Array of Item ids to return.

  • collections (Optional[str | list[str]], default: None ) –

    Array of one or more Collection IDs that each matching Item must be in.

  • intersects (Optional[str | dict[str, Any]], default: None ) –

    Searches items by performing intersection between their geometry and provided GeoJSON geometry.

  • limit (Optional[int], default: None ) –

    The number of items to return.

  • offset (Optional[int], default: None ) –

    The number of items to skip before returning.

  • bbox (Optional[list[float]], default: None ) –

    Requested bounding box.

  • datetime (Optional[str], default: None ) –

    Single date+time, or a range (/ separator), formatted to RFC 3339, section 5.6. Use double dots .. for open date ranges.

  • include (Optional[str | list[str]], default: None ) –

    Fields to include in the response (see the extension docs for more on the semantics).

  • exclude (Optional[str | list[str]], default: None ) –

    Fields to exclude from the response (see the extension docs for more on the semantics).

  • sortby (Optional[str | list[str]], default: None ) –

    Fields by which to sort results (use -field to sort descending).

  • filter (Optional[str | dict[str, Any]], default: None ) –

    CQL2 filter expression. Strings will be interpreted as cql2-text, dictionaries as cql2-json.

  • query (Optional[dict[str, Any]], default: None ) –

    Additional filtering based on properties. It is recommended to use filter instead, if possible.

  • kwargs (str, default: {} ) –

    Additional parameters to pass in to the search.

Returns:

  • Table | None

    An Arrow table, or None if no records were returned.

Examples:

>>> from geopandas import GeoDataFrame
>>> from stacrs import DuckdbClient
>>> table = DuckdbClient().search_to_arrow("data/100-sentinel-2-items.parquet")
>>> data_frame = GeoDataFrame.from_arrow(table)
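
Because the method may return None when no records match, the conversion shown here is worth guarding. The following is one possible wrapper, offered as a sketch: `search_to_geodataframe` is a hypothetical helper, and it assumes a GeoPandas version that provides `GeoDataFrame.from_arrow`. The GeoPandas import is deferred so it is only needed when there is a table to convert.

```python
from typing import Any, Optional


def search_to_geodataframe(client: Any, href: str, **search: Any) -> Optional[Any]:
    """Run search_to_arrow and convert the result, passing None through."""
    table = client.search_to_arrow(href, **search)
    if table is None:
        # No records matched; there is nothing to convert.
        return None
    # Deferred import: geopandas is only required on this branch.
    from geopandas import GeoDataFrame

    return GeoDataFrame.from_arrow(table)
```

A caller can then check `if data_frame is None` instead of handling an exception raised by the conversion.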