DuckDB

rustac.DuckdbClient

A client for querying stac-geoparquet with DuckDB.

__init__

__init__(
    *,
    extension_directory: Path | None = None,
    extensions: list[str] | None = None,
    install_extensions: bool = True,
    use_hive_partitioning: bool = False,
) -> None

Creates a new DuckDB client.

Parameters:

extension_directory (Path | None, default: None) –

A non-standard extension directory to use.
extensions (list[str] | None, default: None) –

A list of extensions to LOAD on client initialization.
install_extensions (bool, default: True) –

Whether to install the required extensions on client initialization.
use_hive_partitioning (bool, default: False) –

Whether to use Hive partitioning for geoparquet queries.
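In Hive partitioning, directory names of the form `key=value` encode column values, and DuckDB can expose them as extra columns when reading Parquet files. The sketch below uses a hypothetical helper, `hive_partitions` (not part of rustac), to show which values such a path carries. One difference to keep in mind: DuckDB may also cast values such as `2024` to integers, whereas this sketch keeps everything as strings.

```python
from pathlib import PurePosixPath


def hive_partitions(path: str) -> dict[str, str]:
    """Hypothetical helper: extract Hive-style key=value pairs from a file path.

    Only directory components are considered; the file name itself is ignored.
    """
    directories = PurePosixPath(path).parent.parts
    # split("=", 1) keeps any further "=" characters inside the value
    return dict(part.split("=", 1) for part in directories if "=" in part)


layout = "items/collection=sentinel-2-l2a/year=2024/part-0.parquet"
print(hive_partitions(layout))  # {'collection': 'sentinel-2-l2a', 'year': '2024'}
```

A client intended for data laid out this way would be created with `rustac.DuckdbClient(use_hive_partitioning=True)`.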

execute

execute(sql: str, params: list[str] | None = None) -> int

Execute an SQL command.

This can be useful for configuring AWS credentials, for example.

Parameters:

sql (str) –

The SQL to execute.
params (list[str] | None, default: None) –

The parameters to pass along with the SQL statement.
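As an illustration of the AWS use case, here is a minimal sketch built around a hypothetical helper, `configure_s3`, that registers an S3 secret through `execute`. It assumes a DuckDB version with the secrets manager (`CREATE SECRET`, DuckDB 0.10 and later) and that DuckDB's `aws` extension is available, presumably loaded via the `extensions` constructor argument; the secret name `rustac_s3` is arbitrary. This has not been checked against rustac itself.

```python
from __future__ import annotations

import re
from typing import Protocol


class SupportsExecute(Protocol):
    """Anything with an execute(sql, params) method, e.g. rustac.DuckdbClient."""

    def execute(self, sql: str, params: list[str] | None = None) -> int: ...


def configure_s3(client: SupportsExecute, region: str) -> int:
    """Hypothetical helper: register an S3 secret using DuckDB's secrets manager."""
    # AWS region names look like "us-west-2"; rejecting anything else keeps the
    # value safe to embed in the SQL string literal below.
    if not re.fullmatch(r"[a-z0-9-]+", region):
        raise ValueError(f"unexpected region name: {region!r}")
    # credential_chain reads credentials from the usual AWS sources (environment
    # variables, ~/.aws config files, ...); it relies on DuckDB's aws extension.
    sql = (
        "CREATE OR REPLACE SECRET rustac_s3 "
        f"(TYPE s3, PROVIDER credential_chain, REGION '{region}')"
    )
    return client.execute(sql)
```

Usage would look like `configure_s3(rustac.DuckdbClient(), "us-west-2")`, after which S3 hrefs can be queried with the same client.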

get_collections

get_collections(href: str) -> list[Collection]

Returns all collections in this stac-geoparquet file.

These collections are generated automatically from the STAC items, one collection per distinct collection ID found in the items.

Eventually, these collections might be stored in the stac-geoparquet metadata and retrieved from there, but that's not the case yet.

Parameters:

href (str) –

The stac-geoparquet file to build the collections from.

Returns:

list[Collection] –

A list of STAC Collections.
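To make "one collection per collection ID" concrete, the sketch below groups item dictionaries by their STAC `collection` field and derives a spatial extent for each group. It is a hypothetical illustration (`sketch_collections` is not part of rustac) and is not how rustac builds its collections; real collections carry more than a spatial extent.

```python
from collections import defaultdict
from typing import Any


def sketch_collections(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical illustration: one collection per distinct collection ID.

    Only the spatial extent is derived here (union of 2D item bboxes, ignoring
    antimeridian crossings); items without a "collection" field are skipped.
    """
    bboxes: dict[str, list[list[float]]] = defaultdict(list)
    for item in items:
        if "collection" in item:
            bboxes[item["collection"]].append(item["bbox"])
    collections = []
    for collection_id, boxes in sorted(bboxes.items()):
        union = [
            min(b[0] for b in boxes),  # west
            min(b[1] for b in boxes),  # south
            max(b[2] for b in boxes),  # east
            max(b[3] for b in boxes),  # north
        ]
        collections.append(
            {
                "type": "Collection",
                "id": collection_id,
                "extent": {"spatial": {"bbox": [union]}},
            }
        )
    return collections
```

The real method is called directly on a file, e.g. `client.get_collections("data/100-sentinel-2-items.parquet")`.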

search

search(
    href: str,
    *,
    ids: Optional[str | list[str]] = None,
    collections: Optional[str | list[str]] = None,
    intersects: Optional[str | dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    bbox: Optional[list[float]] = None,
    datetime: Optional[str] = None,
    include: Optional[str | list[str]] = None,
    exclude: Optional[str | list[str]] = None,
    sortby: Optional[str | list[str | dict[str, str]]] = None,
    filter: Optional[str | dict[str, Any]] = None,
    query: Optional[dict[str, Any]] = None,
    **kwargs: str,
) -> list[dict[str, Any]]

Search a stac-geoparquet file with DuckDB, returning a list of items.

Parameters:

href (str) –

The stac-geoparquet file.
ids (Optional[str | list[str]], default: None) –

Array of Item IDs to return.
collections (Optional[str | list[str]], default: None) –

Array of one or more Collection IDs that each matching Item must be in.
intersects (Optional[str | dict[str, Any]], default: None) –

Return only items whose geometry intersects the provided GeoJSON geometry.
limit (Optional[int], default: None) –

The maximum number of items to return.
offset (Optional[int], default: None) –

The number of items to skip before returning results.
bbox (Optional[list[float]], default: None) –

Requested bounding box.
datetime (Optional[str], default: None) –

A single date-time, or a range separated by /, formatted according to RFC 3339, section 5.6. Use double dots (..) for open-ended ranges.
include (Optional[str | list[str]], default: None) –

Fields to include in the response (see the fields extension docs for more on the semantics).
exclude (Optional[str | list[str]], default: None) –

Fields to exclude from the response (see the fields extension docs for more on the semantics).
sortby (Optional[str | list[str | dict[str, str]]], default: None) –

Fields by which to sort results (use -field to sort descending).
filter (Optional[str | dict[str, Any]], default: None) –

CQL2 filter expression. Strings are interpreted as cql2-text, dictionaries as cql2-json.
query (Optional[dict[str, Any]], default: None) –

Additional filtering based on properties. Use filter instead where possible.
kwargs (str, default: {}) –

Additional parameters to pass to the search.
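Several of these parameters are usually combined. The sketch below uses a hypothetical helper, `recent_low_cloud_search`, to build keyword arguments with a bounding box, an open-ended datetime range, a cql2-json filter, and a descending sort. It assumes the items carry the EO extension's `eo:cloud_cover` property; the helper name and values are illustrative only.

```python
from typing import Any


def recent_low_cloud_search(
    bbox: list[float], start: str, max_cloud_cover: float
) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments for DuckdbClient.search."""
    return {
        # [west, south, east, north]
        "bbox": bbox,
        # Open-ended RFC 3339 range: everything from `start` onward
        "datetime": f"{start}/..",
        # cql2-json for: eo:cloud_cover < max_cloud_cover
        "filter": {
            "op": "<",
            "args": [{"property": "eo:cloud_cover"}, max_cloud_cover],
        },
        # Leading "-" sorts descending, so newest items come first
        "sortby": "-datetime",
        "limit": 10,
    }
```

These would be passed as `client.search(href, **recent_low_cloud_search(...))`. Passing the string `"eo:cloud_cover < 10"` as `filter` expresses the same condition in cql2-text.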

Returns:

list[dict[str, Any]] –

A list of STAC items.
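Because search returns a plain list, `limit` and `offset` can be combined to walk a large file page by page. The sketch below is a hypothetical helper, `iter_items`, not a rustac API. Without a `sortby`, row order may not be stable between queries, so pages could overlap or skip items; passing a `sortby` is the safer choice.

```python
from typing import Any, Iterator, Protocol


class SupportsSearch(Protocol):
    """Anything with a search(href, **kwargs) method, e.g. rustac.DuckdbClient."""

    def search(self, href: str, **kwargs: Any) -> list[dict[str, Any]]: ...


def iter_items(
    client: SupportsSearch, href: str, page_size: int = 100, **kwargs: Any
) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: yield every matching item, one page at a time."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = client.search(href, limit=page_size, offset=offset, **kwargs)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size
```

For example, `for item in iter_items(client, href, sortby="id"): ...` processes at most 100 items per query.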

search_to_arrow

search_to_arrow(
    href: str,
    *,
    ids: Optional[str | list[str]] = None,
    collections: Optional[str | list[str]] = None,
    intersects: Optional[str | dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    bbox: Optional[list[float]] = None,
    datetime: Optional[str] = None,
    include: Optional[str | list[str]] = None,
    exclude: Optional[str | list[str]] = None,
    sortby: Optional[str | list[str | dict[str, str]]] = None,
    filter: Optional[str | dict[str, Any]] = None,
    query: Optional[dict[str, Any]] = None,
    **kwargs: str,
) -> Table | None

Search a stac-geoparquet file with DuckDB, returning an Arrow table suitable for loading into, e.g., GeoPandas.

rustac must be installed with the arrow extra, e.g. `python -m pip install 'rustac[arrow]'`.

Because Arrow is one of DuckDB's native output formats, this can be faster than building a list of JSON-like dictionaries.

Parameters:

href (str) –

The stac-geoparquet file.
ids (Optional[str | list[str]], default: None) –

Array of Item IDs to return.
collections (Optional[str | list[str]], default: None) –

Array of one or more Collection IDs that each matching Item must be in.
intersects (Optional[str | dict[str, Any]], default: None) –

Return only items whose geometry intersects the provided GeoJSON geometry.
limit (Optional[int], default: None) –

The maximum number of items to return.
offset (Optional[int], default: None) –

The number of items to skip before returning results.
bbox (Optional[list[float]], default: None) –

Requested bounding box.
datetime (Optional[str], default: None) –

A single date-time, or a range separated by /, formatted according to RFC 3339, section 5.6. Use double dots (..) for open-ended ranges.
include (Optional[str | list[str]], default: None) –

Fields to include in the response (see the fields extension docs for more on the semantics).
exclude (Optional[str | list[str]], default: None) –

Fields to exclude from the response (see the fields extension docs for more on the semantics).
sortby (Optional[str | list[str | dict[str, str]]], default: None) –

Fields by which to sort results (use -field to sort descending).
filter (Optional[str | dict[str, Any]], default: None) –

CQL2 filter expression. Strings are interpreted as cql2-text, dictionaries as cql2-json.
query (Optional[dict[str, Any]], default: None) –

Additional filtering based on properties. Use filter instead where possible.
kwargs (str, default: {}) –

Additional parameters to pass to the search.

Returns:

Table | None –

An Arrow table, or None if no records were returned.
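Since an empty result comes back as None rather than as an empty table, callers should check for None before handing the result to something like GeoDataFrame.from_arrow. The sketch below shows that check in a hypothetical helper, `count_matches`; it assumes the returned table exposes a `num_rows` attribute, as pyarrow and arro3 tables do.

```python
from typing import Any


def count_matches(client: Any, href: str, **kwargs: Any) -> int:
    """Hypothetical helper: number of items matching a search_to_arrow query."""
    table = client.search_to_arrow(href, **kwargs)
    # search_to_arrow returns None, rather than an empty table, when nothing matches
    if table is None:
        return 0
    # Assumes the returned table exposes num_rows (true for pyarrow and arro3 tables)
    return table.num_rows
```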

Examples:

>>> import rustac
>>> from geopandas import GeoDataFrame
>>> table = rustac.DuckdbClient().search_to_arrow("data/100-sentinel-2-items.parquet")
>>> data_frame = GeoDataFrame.from_arrow(table)