DuckDB

Extensions

DuckDB requires the spatial, icu, and parquet extensions. By default, DuckdbClient downloads these at runtime via INSTALL.

To skip the runtime download, install the rustac-duckdb-extensions package, which ships pre-built extension binaries:

python -m pip install rustac-duckdb-extensions

Or as an extra:

python -m pip install 'rustac[duckdb-extensions]'

When rustac-duckdb-extensions is installed, DuckdbClient automatically detects and uses the bundled extensions, with no configuration needed:

import rustac

client = rustac.DuckdbClient()
items = client.search("data.parquet")

Tip

This is especially useful in environments where network access is restricted, or where you want reproducible, hermetic builds.

API

rustac.DuckdbClient

A client for querying stac-geoparquet with DuckDB.

__init__

__init__(
    *,
    extension_directory: Path | None = None,
    extensions: list[str] | None = None,
    install_extensions: bool = True,
    use_hive_partitioning: bool = False,
) -> None

Creates a new DuckDB client.

Parameters:

extension_directory (Path | None, default: None ) –

A non-standard extension directory to use.
extensions (list[str] | None, default: None ) –

A list of extensions to LOAD on client initialization.
install_extensions (bool, default: True ) –

Whether to install the required extensions on client initialization.
use_hive_partitioning (bool, default: False ) –

Whether to use hive partitioning for geoparquet queries.
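Hive partitioning is a directory-naming convention in which each path segment of the form key=value encodes a column value, e.g. items/year=2023/month=06/part-0.parquet. DuckDB's hive partitioning option reads such segments as extra columns, which is presumably what use_hive_partitioning turns on. The sketch below only illustrates the naming convention in plain Python; the helper name and the example layout are hypothetical, and this is not how rustac or DuckDB implement it.

```python
from pathlib import PurePosixPath


def hive_partition_values(path: str) -> dict[str, str]:
    """Return the key=value pairs encoded in a hive-partitioned path.

    Hypothetical helper, for illustration only.
    """
    values: dict[str, str] = {}
    # Every directory component (not the file name) of the form key=value
    # contributes one partition column; other components are ignored.
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


print(hive_partition_values("items/year=2023/month=06/part-0.parquet"))
# {'year': '2023', 'month': '06'}
```

With such a layout, a client would be created with rustac.DuckdbClient(use_hive_partitioning=True); note that partition values come from the path as strings.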

execute

execute(sql: str, params: list[str] | None = None) -> int

Execute an SQL command.

This can be useful for configuring AWS credentials, for example.

Parameters:

sql (str) –

The SQL to execute.
params (list[str] | None, default: None ) –

The parameters to pass to the execution.
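As a sketch of the AWS use case mentioned above, the snippet below builds a DuckDB CREATE SECRET statement for S3 from the standard AWS environment variables and would then hand it to execute. CREATE SECRET is DuckDB's own syntax (S3 secrets come from DuckDB's httpfs extension), not something specific to rustac; the helper names and the fallback region are hypothetical. Values are inlined as escaped SQL literals rather than passed via params, since binding parameters inside CREATE SECRET is not assumed here.

```python
import os


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def s3_secret_sql(key_id: str, secret: str, region: str) -> str:
    """Build a DuckDB CREATE SECRET statement for S3 (hypothetical helper)."""
    return (
        "CREATE SECRET ("
        f"TYPE s3, KEY_ID {sql_literal(key_id)}, "
        f"SECRET {sql_literal(secret)}, REGION {sql_literal(region)})"
    )


sql = s3_secret_sql(
    os.environ.get("AWS_ACCESS_KEY_ID", ""),
    os.environ.get("AWS_SECRET_ACCESS_KEY", ""),
    os.environ.get("AWS_REGION", "us-east-1"),  # fallback region is an assumption
)

# With rustac installed:
# client = rustac.DuckdbClient()
# client.execute(sql)
```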

get_collections

get_collections(href: str) -> list[dict[str, Any]]

Returns all collections in this stac-geoparquet file.

These collections are auto-generated from the STAC items, one collection per distinct ID in the collection column.

Eventually, these collections might be stored in the stac-geoparquet metadata and retrieved from there, but that's not the case yet.

Parameters:

href (str) –

The stac-geoparquet file to build the collections from.

Returns:

list[dict[str, Any]] –

A list of STAC Collections
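To make the auto-generation described above concrete, here is a plain-Python sketch that groups item dictionaries by their collection ID and derives a minimal collection for each, with a spatial extent unioned from the item bboxes. It is an illustration under stated assumptions (a hypothetical function name, 2D bboxes, and a deliberately minimal collection shape), not rustac's implementation; the dictionaries returned by get_collections may contain different or additional fields.

```python
from typing import Any


def collections_from_items(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build one minimal collection dict per distinct collection ID (hypothetical sketch)."""
    bboxes: dict[str, list[float]] = {}
    for item in items:
        collection_id = item.get("collection")
        if collection_id is None:
            continue  # items without a collection ID contribute to no collection
        # Assumes 2D bboxes; a 3D (6-value) bbox raises ValueError here.
        xmin, ymin, xmax, ymax = item["bbox"]
        if collection_id not in bboxes:
            bboxes[collection_id] = [xmin, ymin, xmax, ymax]
        else:
            current = bboxes[collection_id]
            # Union of the running extent and this item's bbox.
            bboxes[collection_id] = [
                min(current[0], xmin),
                min(current[1], ymin),
                max(current[2], xmax),
                max(current[3], ymax),
            ]
    return [
        {"type": "Collection", "id": collection_id, "extent": {"spatial": {"bbox": [bbox]}}}
        for collection_id, bbox in bboxes.items()
    ]
```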

search

search(
    href: str,
    *,
    ids: str | list[str] | None = None,
    collections: str | list[str] | None = None,
    intersects: str | dict[str, Any] | None = None,
    limit: int | None = None,
    max_items: int | None = None,
    offset: int | None = None,
    bbox: list[float] | None = None,
    datetime: str | None = None,
    include: str | list[str] | None = None,
    exclude: str | list[str] | None = None,
    sortby: str | list[str | dict[str, str]] | None = None,
    filter: str | dict[str, Any] | None = None,
    query: dict[str, Any] | None = None,
    **kwargs: str,
) -> list[dict[str, Any]]

Search a stac-geoparquet file with DuckDB, returning a list of items.

Parameters:

href (str) –

The stac-geoparquet file.
ids (str | list[str] | None, default: None ) –

Array of Item ids to return.
collections (str | list[str] | None, default: None ) –

Array of one or more Collection IDs that each matching Item must be in.
intersects (str | dict[str, Any] | None, default: None ) –

Searches items by performing intersection between their geometry and provided GeoJSON geometry.
limit (int | None, default: None ) –

The maximum number of items to return.
max_items (int | None, default: None ) –

The maximum number of items to return (included so that the call signature matches the regular search function).
offset (int | None, default: None ) –

The number of items to skip before returning.
bbox (list[float] | None, default: None ) –

Requested bounding box.
datetime (str | None, default: None ) –

Single date+time, or a range (/ separator), formatted according to RFC 3339, section 5.6. Use double dots (..) for open date ranges.

Partial dates are also supported and will be automatically expanded to full RFC 3339 datetime ranges:
- Year only (e.g., "2023") expands to 2023-01-01T00:00:00Z/2023-12-31T23:59:59Z
- Year-Month (e.g., "2023-06") expands to 2023-06-01T00:00:00Z/2023-06-30T23:59:59Z
- ISO 8601 date (e.g., "2023-06-15") expands to 2023-06-15T00:00:00Z/2023-06-15T23:59:59Z
- Ranges also support partial dates (e.g., "2017/2018", "2017-06/2017-07")
include (str | list[str] | None, default: None ) –

Fields to include in the response (see the extension docs for more on the semantics).
exclude (str | list[str] | None, default: None ) –

Fields to exclude from the response (see the extension docs for more on the semantics).
sortby (str | list[str | dict[str, str]] | None, default: None ) –

Fields by which to sort results (use -field to sort descending).
filter (str | dict[str, Any] | None, default: None ) –

CQL2 filter expression. Strings will be interpreted as cql2-text, dictionaries as cql2-json.
query (dict[str, Any] | None, default: None ) –

Additional filtering based on properties. It is recommended to use filter instead, if possible.
kwargs (str, default: {} ) –

Additional parameters to pass in to the search.
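The partial-date rules listed for datetime can be illustrated with a small standard-library function. This is a sketch of the documented expansion, not rustac's code: the function names are hypothetical, everything is treated as UTC, and it assumes that in a range the start expands to the beginning of its period and the end to the end of its period (so "2017/2018" covers both whole years). Full RFC 3339 datetimes and the open marker .. pass through unchanged.

```python
import calendar
import re

# Partial dates: YYYY, YYYY-MM, or YYYY-MM-DD, with nothing after the day.
PARTIAL_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def _bounds(value: str) -> tuple[str, str]:
    """Return (start, end) of the period covered by one partial date."""
    match = PARTIAL_DATE.match(value)
    if match is None:
        # A full datetime or the open-range marker "..": leave it untouched.
        return value, value
    year, month, day = match.groups()
    if month is None:
        return f"{year}-01-01T00:00:00Z", f"{year}-12-31T23:59:59Z"
    if day is None:
        # monthrange handles month lengths and leap years.
        last_day = calendar.monthrange(int(year), int(month))[1]
        return f"{year}-{month}-01T00:00:00Z", f"{year}-{month}-{last_day:02d}T23:59:59Z"
    return f"{year}-{month}-{day}T00:00:00Z", f"{year}-{month}-{day}T23:59:59Z"


def expand_datetime(value: str) -> str:
    """Expand partial dates in a datetime or datetime range (hypothetical helper)."""
    if "/" in value:
        start, end = value.split("/", 1)
        return f"{_bounds(start)[0]}/{_bounds(end)[1]}"
    start, end = _bounds(value)
    # A single full datetime stays a single instant rather than becoming a range.
    return value if start == end else f"{start}/{end}"


print(expand_datetime("2023-06"))
# 2023-06-01T00:00:00Z/2023-06-30T23:59:59Z
```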

Returns:

list[dict[str, Any]] –

A list of STAC items.
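limit and offset can be combined to page through a large file. The sketch below takes the search function as a parameter so it can be shown without a real stac-geoparquet file; the helper name and default page size are hypothetical, and it assumes each call returns at most limit items, so a short page signals the end. Offset-based paging is only meaningful when the result order is stable between calls, so passing a sortby is a sensible precaution.

```python
from typing import Any, Callable, Iterator

SearchFn = Callable[..., list[dict[str, Any]]]


def iter_pages(
    search: SearchFn,
    href: str,
    page_size: int = 100,
    **params: Any,
) -> Iterator[list[dict[str, Any]]]:
    """Yield successive pages of items using limit/offset (hypothetical helper)."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = search(href, limit=page_size, offset=offset, **params)
        if page:
            yield page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += page_size


# With rustac installed:
# client = rustac.DuckdbClient()
# for page in iter_pages(client.search, "data.parquet", page_size=500, sortby="id"):
#     print(len(page))
```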

search_to_arrow

search_to_arrow(
    href: str,
    *,
    ids: str | list[str] | None = None,
    collections: str | list[str] | None = None,
    intersects: str | dict[str, Any] | None = None,
    limit: int | None = None,
    offset: int | None = None,
    bbox: list[float] | None = None,
    datetime: str | None = None,
    include: str | list[str] | None = None,
    exclude: str | list[str] | None = None,
    sortby: str | list[str | dict[str, str]] | None = None,
    filter: str | dict[str, Any] | None = None,
    query: dict[str, Any] | None = None,
    **kwargs: str,
) -> Table | None

Search a stac-geoparquet file with DuckDB, returning an Arrow table suitable for loading into (e.g.) GeoPandas.

rustac must be installed with the arrow extra, e.g. `python -m pip install 'rustac[arrow]'`.

Because Arrow is one of DuckDB's native output formats, this can be faster than converting results into JSON dictionaries.

Parameters:

href (str) –

The stac-geoparquet file.
ids (str | list[str] | None, default: None ) –

Array of Item ids to return.
collections (str | list[str] | None, default: None ) –

Array of one or more Collection IDs that each matching Item must be in.
intersects (str | dict[str, Any] | None, default: None ) –

Searches items by performing intersection between their geometry and provided GeoJSON geometry.
limit (int | None, default: None ) –

The maximum number of items to return.
offset (int | None, default: None ) –

The number of items to skip before returning.
bbox (list[float] | None, default: None ) –

Requested bounding box.
datetime (str | None, default: None ) –

Single date+time, or a range (/ separator), formatted according to RFC 3339, section 5.6. Use double dots (..) for open date ranges.

Partial dates are also supported and will be automatically expanded to full RFC 3339 datetime ranges:
- Year only (e.g., "2023") expands to 2023-01-01T00:00:00Z/2023-12-31T23:59:59Z
- Year-Month (e.g., "2023-06") expands to 2023-06-01T00:00:00Z/2023-06-30T23:59:59Z
- ISO 8601 date (e.g., "2023-06-15") expands to 2023-06-15T00:00:00Z/2023-06-15T23:59:59Z
- Ranges also support partial dates (e.g., "2017/2018", "2017-06/2017-07")
include (str | list[str] | None, default: None ) –

Fields to include in the response (see the extension docs for more on the semantics).
exclude (str | list[str] | None, default: None ) –

Fields to exclude from the response (see the extension docs for more on the semantics).
sortby (str | list[str | dict[str, str]] | None, default: None ) –

Fields by which to sort results (use -field to sort descending).
filter (str | dict[str, Any] | None, default: None ) –

CQL2 filter expression. Strings will be interpreted as cql2-text, dictionaries as cql2-json.
query (dict[str, Any] | None, default: None ) –

Additional filtering based on properties. It is recommended to use filter instead, if possible.
kwargs (str, default: {} ) –

Additional parameters to pass in to the search.
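sortby accepts strings, where a leading - means descending, or dictionaries. Assuming the dictionaries follow the STAC API Sort extension's {"field": ..., "direction": "asc" | "desc"} shape, the hypothetical helper below shows how the shorthand maps onto that form. It also assumes that a single string names a single field and that a leading + means ascending; rustac's own parsing may differ.

```python
from typing import Union

SortSpec = Union[str, dict[str, str]]


def normalize_sortby(sortby: Union[SortSpec, list[SortSpec], None]) -> list[dict[str, str]]:
    """Convert sortby shorthand into {"field", "direction"} dicts (hypothetical helper)."""
    if sortby is None:
        return []
    specs = sortby if isinstance(sortby, list) else [sortby]
    normalized: list[dict[str, str]] = []
    for spec in specs:
        if isinstance(spec, dict):
            # Already in dictionary form; default to ascending if no direction given.
            normalized.append({"field": spec["field"], "direction": spec.get("direction", "asc")})
        elif spec.startswith("-"):
            normalized.append({"field": spec[1:], "direction": "desc"})
        else:
            # "+field" is treated as an explicit ascending sort (an assumption).
            normalized.append({"field": spec.removeprefix("+"), "direction": "asc"})
    return normalized


print(normalize_sortby("-datetime"))
# [{'field': 'datetime', 'direction': 'desc'}]
```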

Returns:

Table | None –

An Arrow table, or None if no records were returned.

Examples:

>>> import geopandas, rustac
>>> client = rustac.DuckdbClient()
>>> table = client.search_to_arrow("data/100-sentinel-2-items.parquet")
>>> data_frame = geopandas.GeoDataFrame.from_arrow(table)
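Both search and search_to_arrow accept either a bbox or an intersects geometry. Assuming a 2D bbox in the usual [minx, miny, maxx, maxy] order that does not cross the antimeridian, the hypothetical helper below builds the corresponding GeoJSON Polygon, which can be handy when a query is easier to express through intersects.

```python
from typing import Any


def bbox_to_polygon(bbox: list[float]) -> dict[str, Any]:
    """Return a GeoJSON Polygon covering a 2D bbox (hypothetical helper).

    Assumes [minx, miny, maxx, maxy] with minx <= maxx (no antimeridian crossing).
    """
    minx, miny, maxx, maxy = bbox
    # Exterior ring in counter-clockwise order, as RFC 7946 recommends.
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],  # GeoJSON rings are closed: the last position repeats the first
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# With rustac installed:
# client = rustac.DuckdbClient()
# table = client.search_to_arrow("data.parquet", intersects=bbox_to_polygon([10.0, 45.0, 11.0, 46.0]))
```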