
DuckDB

rustac.DuckdbClient

A client for querying stac-geoparquet with DuckDB.

__init__

__init__(
    *,
    extension_directory: Path | None = None,
    extensions: list[str] | None = None,
    install_extensions: bool = True,
    use_hive_partitioning: bool = False,
) -> None

Creates a new DuckDB client.

Parameters:

  • extension_directory (Path | None, default: None ) –

    A non-standard extension directory to use.

  • extensions (list[str] | None, default: None ) –

    A list of extensions to LOAD on client initialization.

  • install_extensions (bool, default: True ) –

    Whether to install the required extensions on client initialization.

  • use_hive_partitioning (bool, default: False ) –

    Whether to use hive partitioning for geoparquet queries.
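
When use_hive_partitioning is enabled, DuckDB treats `key=value` directory names in a file's path as extra columns. The sketch below builds and parses such a layout. The `year`/`month` partition keys, the `items.parquet` file name, and both helper names are hypothetical choices for illustration, not a layout rustac requires.

```python
def hive_partition_path(root: str, year: int, month: int) -> str:
    """Build a hypothetical hive-style path: <root>/year=YYYY/month=MM/items.parquet."""
    # Join with plain strings rather than pathlib, which would collapse "s3://" to "s3:/".
    return f"{root.rstrip('/')}/year={year}/month={month:02d}/items.parquet"


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Return the key=value directory segments of a path, roughly as hive partitioning reads them."""
    directories = path.split("/")[:-1]  # the last segment is the file name
    return dict(segment.split("=", 1) for segment in directories if "=" in segment)
```

Assuming the href passed to a `DuckdbClient(use_hive_partitioning=True)` covers files laid out this way, `year` and `month` would appear alongside the stac-geoparquet columns. DuckDB may also cast partition values (e.g. `03` to an integer), whereas this sketch keeps them as strings.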

execute

execute(sql: str, params: list[str] | None = None) -> int

Execute an SQL command.

This can be useful for configuring AWS credentials, for example.
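
A minimal sketch of that use: pointing DuckDB at the standard AWS credential chain through execute. It assumes the client can load DuckDB's `httpfs` and `aws` extensions (for instance via the extensions parameter), and `configure_s3` is a hypothetical helper name; the `CREATE SECRET` statement itself is plain DuckDB SQL.

```python
from __future__ import annotations

from typing import Protocol


class SupportsExecute(Protocol):
    # Structural type for anything with an execute method shaped like DuckdbClient.execute.
    def execute(self, sql: str, params: list[str] | None = None) -> int: ...


def configure_s3(client: SupportsExecute, region: str) -> None:
    """Create an unnamed DuckDB S3 secret that reads credentials from the AWS credential chain.

    Assumes the DuckDB httpfs and aws extensions are available to the client.
    """
    # The region is interpolated as a SQL string literal, so escape single quotes defensively.
    escaped = region.replace("'", "''")
    client.execute(f"CREATE SECRET (TYPE S3, PROVIDER CREDENTIAL_CHAIN, REGION '{escaped}')")
```

Usage would look like `configure_s3(rustac.DuckdbClient(), "us-west-2")`, after which s3:// hrefs can be searched with whatever credentials the chain resolves.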

Parameters:

  • sql (str) –

    The SQL to execute.

  • params (list[str] | None, default: None ) –

    The parameters to pass to the execution.

get_collections

get_collections(href: str) -> list[Collection]

Returns all collections in this stac-geoparquet file.

These collections will be auto-generated from the STAC items, one collection per distinct id in the collection column.

Eventually, these collections might be stored in the stac-geoparquet metadata and retrieved from there, but that's not the case yet.
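
To make the one-collection-per-id behavior concrete, here is a pure-Python approximation that groups items by their `collection` value and unions their bounding boxes into a spatial extent. It is an illustration, not rustac's implementation: the collections rustac generates may carry different or additional fields, `derive_collections` is a hypothetical name, and the sketch assumes every item has a 2D `bbox`.

```python
from typing import Any


def derive_collections(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Approximate get_collections: one stub collection per distinct `collection` id.

    Each stub's spatial extent is the union of its items' 2D bboxes.
    """
    extents: dict[str, list[float]] = {}  # insertion-ordered: first-seen collection comes first
    for item in items:
        collection_id = item.get("collection")
        if collection_id is None:
            continue  # assumption: items without a collection id are ignored
        xmin, ymin, xmax, ymax = item["bbox"]
        current = extents.get(collection_id)
        if current is None:
            extents[collection_id] = [xmin, ymin, xmax, ymax]
        else:
            extents[collection_id] = [
                min(current[0], xmin),
                min(current[1], ymin),
                max(current[2], xmax),
                max(current[3], ymax),
            ]
    return [
        {"type": "Collection", "id": collection_id, "extent": {"spatial": {"bbox": [bbox]}}}
        for collection_id, bbox in extents.items()
    ]
```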

Parameters:

  • href (str) –

    The stac-geoparquet file to build the collections from.

Returns:

  • list[Collection]

    The auto-generated collections, one per collection id.

search

search(
    href: str,
    *,
    ids: Optional[str | list[str]] = None,
    collections: Optional[str | list[str]] = None,
    intersects: Optional[str | dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    bbox: Optional[list[float]] = None,
    datetime: Optional[str] = None,
    include: Optional[str | list[str]] = None,
    exclude: Optional[str | list[str]] = None,
    sortby: Optional[str | list[str | dict[str, str]]] = None,
    filter: Optional[str | dict[str, Any]] = None,
    query: Optional[dict[str, Any]] = None,
    **kwargs: str,
) -> list[dict[str, Any]]

Search a stac-geoparquet file with DuckDB, returning a list of items.

Parameters:

  • href (str) –

    The stac-geoparquet file.

  • ids (Optional[str | list[str]], default: None ) –

    Array of Item ids to return.

  • collections (Optional[str | list[str]], default: None ) –

    Array of one or more Collection IDs that each matching Item must be in.

  • intersects (Optional[str | dict[str, Any]], default: None ) –

    Searches items by performing intersection between their geometry and provided GeoJSON geometry.

  • limit (Optional[int], default: None ) –

    The maximum number of items to return.

  • offset (Optional[int], default: None ) –

    The number of items to skip before returning.

  • bbox (Optional[list[float]], default: None ) –

    Requested bounding box.

  • datetime (Optional[str], default: None ) –

    Single date+time, or a range (/ separator), formatted to RFC 3339, section 5.6. Use double dots .. for open date ranges.

  • include (Optional[str | list[str]], default: None ) –

    Fields to include in the response (see the extension docs for more on the semantics).

  • exclude (Optional[str | list[str]], default: None ) –

    Fields to exclude from the response (see the extension docs for more on the semantics).

  • sortby (Optional[str | list[str | dict[str, str]]], default: None ) –

    Fields by which to sort results (use -field to sort descending).

  • filter (Optional[str | dict[str, Any]], default: None ) –

    CQL2 filter expression. Strings will be interpreted as cql2-text, dictionaries as cql2-json.

  • query (Optional[dict[str, Any]], default: None ) –

    Additional filtering based on properties. It is recommended to use filter instead, if possible.

  • kwargs (str, default: {} ) –

    Additional parameters to pass in to the search.
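
The datetime and filter parameters take strings or dictionaries, so small helpers can build them. This sketch formats an RFC 3339 interval with `..` for an open end and builds the cql2-json form of `eo:cloud_cover < N`. The helper names are hypothetical, and the `eo:cloud_cover` property is assumed to exist in your items.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def rfc3339(value: datetime) -> str:
    """Format an aware datetime as RFC 3339 in UTC (fractional seconds are dropped)."""
    if value.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is unambiguous")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def datetime_interval(start: datetime | None, end: datetime | None) -> str:
    """Build a `start/end` interval string, using `..` for an open end."""
    if start is None and end is None:
        raise ValueError("at least one end of the interval must be given")
    left = rfc3339(start) if start is not None else ".."
    right = rfc3339(end) if end is not None else ".."
    return f"{left}/{right}"


def cloud_cover_filter(max_cloud_cover: float) -> dict[str, Any]:
    """cql2-json equivalent of the cql2-text expression `eo:cloud_cover < max_cloud_cover`."""
    return {"op": "<", "args": [{"property": "eo:cloud_cover"}, max_cloud_cover]}
```

These would be passed as, e.g., `client.search(href, datetime=datetime_interval(start, None), filter=cloud_cover_filter(10))`; per the filter description, the cql2-text string `"eo:cloud_cover < 10"` should select the same items.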

Returns:

  • list[dict[str, Any]]

    A list of STAC items.
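
Because limit and offset are both available, large result sets can be fetched in pages. A minimal sketch, assuming offset is applied after sorting and that sorting on the unique `id` field gives stable page boundaries between calls. `iter_items` is a hypothetical helper, and its extra keyword arguments must not repeat limit, offset, or sortby.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any, Protocol


class SupportsSearch(Protocol):
    # Structural type for anything with a search method shaped like DuckdbClient.search.
    def search(self, href: str, **kwargs: Any) -> list[dict[str, Any]]: ...


def iter_items(
    client: SupportsSearch, href: str, page_size: int = 100, **kwargs: Any
) -> Iterator[dict[str, Any]]:
    """Yield every matching item, fetching at most `page_size` items per search call."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = client.search(href, limit=page_size, offset=offset, sortby="id", **kwargs)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left to fetch
        offset += page_size
```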

search_to_arrow

search_to_arrow(
    href: str,
    *,
    ids: Optional[str | list[str]] = None,
    collections: Optional[str | list[str]] = None,
    intersects: Optional[str | dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    bbox: Optional[list[float]] = None,
    datetime: Optional[str] = None,
    include: Optional[str | list[str]] = None,
    exclude: Optional[str | list[str]] = None,
    sortby: Optional[str | list[str | dict[str, str]]] = None,
    filter: Optional[str | dict[str, Any]] = None,
    query: Optional[dict[str, Any]] = None,
    **kwargs: str,
) -> Table | None

Search a stac-geoparquet file with DuckDB, returning an Arrow table suitable for loading into, e.g., GeoPandas.

rustac must be installed with the arrow extra, e.g. `python -m pip install 'rustac[arrow]'`.

Because Arrow is a core DuckDB output format, this can be faster than going through JSON-style dictionaries, as search does.

Parameters:

  • href (str) –

    The stac-geoparquet file.

  • ids (Optional[str | list[str]], default: None ) –

    Array of Item ids to return.

  • collections (Optional[str | list[str]], default: None ) –

    Array of one or more Collection IDs that each matching Item must be in.

  • intersects (Optional[str | dict[str, Any]], default: None ) –

    Searches items by performing intersection between their geometry and provided GeoJSON geometry.

  • limit (Optional[int], default: None ) –

    The maximum number of items to return.

  • offset (Optional[int], default: None ) –

    The number of items to skip before returning.

  • bbox (Optional[list[float]], default: None ) –

    Requested bounding box.

  • datetime (Optional[str], default: None ) –

    Single date+time, or a range (/ separator), formatted to RFC 3339, section 5.6. Use double dots .. for open date ranges.

  • include (Optional[str | list[str]], default: None ) –

    Fields to include in the response (see the extension docs for more on the semantics).

  • exclude (Optional[str | list[str]], default: None ) –

    Fields to exclude from the response (see the extension docs for more on the semantics).

  • sortby (Optional[str | list[str | dict[str, str]]], default: None ) –

    Fields by which to sort results (use -field to sort descending).

  • filter (Optional[str | dict[str, Any]], default: None ) –

    CQL2 filter expression. Strings will be interpreted as cql2-text, dictionaries as cql2-json.

  • query (Optional[dict[str, Any]], default: None ) –

    Additional filtering based on properties. It is recommended to use filter instead, if possible.

  • kwargs (str, default: {} ) –

    Additional parameters to pass in to the search.

Returns:

  • Table | None

    An Arrow table, or None if no records were returned.

Examples:

>>> table = client.search_to_arrow("data/100-sentinel-2-items.parquet")
>>> data_frame = GeoDataFrame.from_arrow(table)
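
Because search_to_arrow returns None rather than an empty table when nothing matches, a wrapper can handle that case before converting. A sketch with a hypothetical `search_to_geodataframe` helper; it assumes geopandas 1.0 or later, where `GeoDataFrame.from_arrow` is available, and only imports geopandas when there is a table to convert.

```python
from __future__ import annotations

from typing import Any


def search_to_geodataframe(client: Any, href: str, **kwargs: Any) -> Any | None:
    """Run search_to_arrow and convert the result to a GeoDataFrame, passing None through."""
    table = client.search_to_arrow(href, **kwargs)
    if table is None:
        # Nothing matched, so there is no table to hand to GeoDataFrame.from_arrow.
        return None
    # Imported lazily so geopandas is only needed once there is data to convert.
    from geopandas import GeoDataFrame

    return GeoDataFrame.from_arrow(table)
```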