stac_geoparquet.arrow
Arrow-based format conversions.
DEFAULT_JSON_CHUNK_SIZE (module attribute)
DEFAULT_JSON_CHUNK_SIZE = 65536
The default chunk size to use for reading JSON into memory.
DEFAULT_PARQUET_SCHEMA_VERSION (module attribute)
DEFAULT_PARQUET_SCHEMA_VERSION: SUPPORTED_PARQUET_SCHEMA_VERSIONS = '1.1.0'
The default GeoParquet schema version written to file.
SUPPORTED_PARQUET_SCHEMA_VERSIONS (module attribute)
SUPPORTED_PARQUET_SCHEMA_VERSIONS = Literal['1.0.0', '1.1.0']
A Literal type with the supported GeoParquet schema versions.
parse_stac_items_to_arrow
parse_stac_items_to_arrow(
    items: Iterable[Item | dict[str, Any]],
    *,
    chunk_size: int = 8192,
    schema: Schema | InferredSchema | None = None
) -> RecordBatchReader
Parse a collection of STAC Items to an iterable of pyarrow.RecordBatch.
The objects under properties are moved up to the top level of the Table, similar to geopandas.GeoDataFrame.from_features.
Parameters:
- items (Iterable[Item | dict[str, Any]]) – The STAC Items to convert.
- chunk_size (int, default: 8192) – The chunk size to use for Arrow record batches. This only takes effect if schema is not None. When schema is None, the input will be parsed into a single contiguous record batch. Defaults to 8192.
- schema (Schema | InferredSchema | None, default: None) – The schema of the input data. If provided, this can improve memory use; otherwise all items need to be parsed into a single array for schema inference. Defaults to None.
Returns:
- RecordBatchReader – pyarrow RecordBatchReader with a stream of STAC Arrow RecordBatches.
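Example: a minimal sketch, assuming items is an iterable of STAC Item dicts (or pystac.Item objects) that you have already loaded, for instance from a STAC API search:

from stac_geoparquet.arrow import parse_stac_items_to_arrow

items = [...]  # placeholder: STAC Item dicts or pystac.Item objects you have already loaded

# Parse the items into a stream of Arrow record batches.
reader = parse_stac_items_to_arrow(items)

# Materialize the whole stream as a single pyarrow.Table.
table = reader.read_all()
print(table.schema)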
parse_stac_ndjson_to_arrow
parse_stac_ndjson_to_arrow(
    path: str | Path | Iterable[str | Path],
    *,
    chunk_size: int = DEFAULT_JSON_CHUNK_SIZE,
    schema: Schema | None = None,
    limit: int | None = None
) -> RecordBatchReader
Convert one or more newline-delimited JSON STAC files to a generator of Arrow RecordBatches.
Each RecordBatch in the returned iterator is guaranteed to have an identical schema, and can be used to write to one or more Parquet files.
Parameters:
- path (str | Path | Iterable[str | Path]) – One or more paths to files with STAC items.
- chunk_size (int, default: DEFAULT_JSON_CHUNK_SIZE) – The chunk size. Defaults to 65536.
- schema (Schema | None, default: None) – The schema to represent the input STAC data. Defaults to None, in which case the schema will first be inferred via a full pass over the input data. In this case, there will be two full passes over the input data: one to infer a common schema across all data and another to read the data.
Other Parameters:
- limit (int | None) – The maximum number of JSON Items to use for schema inference.
Returns:
- RecordBatchReader – pyarrow RecordBatchReader with a stream of STAC Arrow RecordBatches.
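Example: a minimal sketch; items.ndjson is a hypothetical newline-delimited JSON file of STAC Items:

from stac_geoparquet.arrow import parse_stac_ndjson_to_arrow

reader = parse_stac_ndjson_to_arrow("items.ndjson")  # hypothetical input file
for batch in reader:
    # Every batch shares the same schema, so each one can be written out
    # (to Parquet, a database, etc.) as it is produced.
    print(batch.num_rows)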
parse_stac_ndjson_to_delta_lake
parse_stac_ndjson_to_delta_lake(
    input_path: str | Path | Iterable[str | Path],
    table_or_uri: str | Path | DeltaTable,
    *,
    chunk_size: int = DEFAULT_JSON_CHUNK_SIZE,
    schema: Schema | None = None,
    limit: int | None = None,
    schema_version: SUPPORTED_PARQUET_SCHEMA_VERSIONS = DEFAULT_PARQUET_SCHEMA_VERSION,
    **kwargs: Any
) -> None
Convert one or more newline-delimited JSON STAC files to Delta Lake.
Parameters:
- input_path (str | Path | Iterable[str | Path]) – One or more paths to files with STAC items.
- table_or_uri (str | Path | DeltaTable) – A path to the output Delta Lake table.
- chunk_size (int, default: DEFAULT_JSON_CHUNK_SIZE) – The chunk size to use for reading JSON into memory. Defaults to 65536.
- schema (Schema | None, default: None) – The schema to represent the input STAC data. Defaults to None, in which case the schema will first be inferred via a full pass over the input data. In this case, there will be two full passes over the input data: one to infer a common schema across all data and another to read the data and iteratively convert to GeoParquet.
- limit (int | None, default: None) – The maximum number of JSON records to convert.
- schema_version (SUPPORTED_PARQUET_SCHEMA_VERSIONS, default: DEFAULT_PARQUET_SCHEMA_VERSION) – GeoParquet specification version; if not provided, will default to the latest supported version.
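Example: a minimal sketch with hypothetical input files and output table path, assuming the optional Delta Lake dependency is installed:

from stac_geoparquet.arrow import parse_stac_ndjson_to_delta_lake

parse_stac_ndjson_to_delta_lake(
    ["items-part-1.ndjson", "items-part-2.ndjson"],  # hypothetical input files
    "stac-items-delta",  # output Delta Lake table directory
)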
parse_stac_ndjson_to_parquet
parse_stac_ndjson_to_parquet(
    input_path: str | Path | Iterable[str | Path],
    output_path: str | Path,
    *,
    chunk_size: int = DEFAULT_JSON_CHUNK_SIZE,
    schema: Schema | InferredSchema | None = None,
    limit: int | None = None,
    schema_version: SUPPORTED_PARQUET_SCHEMA_VERSIONS = DEFAULT_PARQUET_SCHEMA_VERSION,
    **kwargs: Any
) -> None
Convert one or more newline-delimited JSON STAC files to GeoParquet.
Parameters:
- input_path (str | Path | Iterable[str | Path]) – One or more paths to files with STAC items.
- output_path (str | Path) – A path to the output Parquet file.
Other Parameters:
- chunk_size (int) – The chunk size. Defaults to 65536.
- schema (Schema | InferredSchema | None) – The schema to represent the input STAC data. Defaults to None, in which case the schema will first be inferred via a full pass over the input data. In this case, there will be two full passes over the input data: one to infer a common schema across all data and another to read the data and iteratively convert to GeoParquet.
- limit (int | None) – The maximum number of JSON records to convert.
- schema_version (SUPPORTED_PARQUET_SCHEMA_VERSIONS) – GeoParquet specification version; if not provided, will default to the latest supported version.
All other keyword args are passed on to
pyarrow.parquet.ParquetWriter.
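Example: a minimal sketch with hypothetical paths; the compression keyword is one of the arguments forwarded to pyarrow.parquet.ParquetWriter:

from stac_geoparquet.arrow import parse_stac_ndjson_to_parquet

parse_stac_ndjson_to_parquet(
    "items.ndjson",    # hypothetical input file
    "items.parquet",   # output GeoParquet file
    schema_version="1.1.0",
    compression="zstd",  # forwarded to pyarrow.parquet.ParquetWriter
)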
stac_table_to_items
stac_table_to_items(
    table: Table | RecordBatchReader | ArrowStreamExportable,
) -> Iterable[dict]
Convert STAC Arrow to a generator of STAC Item dicts.
Parameters:
- table (Table | RecordBatchReader | ArrowStreamExportable) – STAC in Arrow form. This can be a pyarrow Table, a pyarrow RecordBatchReader, or any other Arrow stream object exposed through the Arrow PyCapsule Interface. A RecordBatchReader or stream object will not be materialized in memory.
Yields:
- dict – A STAC Item as a dict.
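Example: a minimal sketch that reads a hypothetical STAC GeoParquet file and iterates over its Items:

import pyarrow.parquet as pq

from stac_geoparquet.arrow import stac_table_to_items

table = pq.read_table("items.parquet")  # hypothetical STAC GeoParquet file
for item in stac_table_to_items(table):
    print(item["id"])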
stac_table_to_ndjson
stac_table_to_ndjson(
    table: Table | RecordBatchReader | ArrowStreamExportable,
    dest: str | Path | PathLike[bytes],
) -> None
Write STAC Arrow to a newline-delimited JSON file.
Note
This function appends to the JSON file at dest; it does not overwrite any
existing data.
Parameters:
- table (Table | RecordBatchReader | ArrowStreamExportable) – STAC in Arrow form. This can be a pyarrow Table, a pyarrow RecordBatchReader, or any other Arrow stream object exposed through the Arrow PyCapsule Interface. A RecordBatchReader or stream object will not be materialized in memory.
- dest (str | Path | PathLike[bytes]) – The destination where newline-delimited JSON should be written.
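Example: a minimal sketch with hypothetical paths; because the function appends, delete items.ndjson first if you need a fresh file:

import pyarrow.parquet as pq

from stac_geoparquet.arrow import stac_table_to_ndjson

table = pq.read_table("items.parquet")  # hypothetical STAC GeoParquet file
stac_table_to_ndjson(table, "items.ndjson")  # appends if items.ndjson already exists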
to_parquet
to_parquet(
    table: Table | RecordBatchReader | ArrowStreamExportable,
    output_path: str | Path,
    *,
    schema_version: SUPPORTED_PARQUET_SCHEMA_VERSIONS = DEFAULT_PARQUET_SCHEMA_VERSION,
    **kwargs: Any
) -> None
Write an Arrow table with STAC data to GeoParquet.
This writes metadata compliant with either GeoParquet 1.0 or 1.1.
Parameters:
- table (Table | RecordBatchReader | ArrowStreamExportable) – STAC in Arrow form. This can be a pyarrow Table, a pyarrow RecordBatchReader, or any other Arrow stream object exposed through the Arrow PyCapsule Interface. A RecordBatchReader or stream object will not be materialized in memory.
- output_path (str | Path) – The destination for saving.
Other Parameters:
- schema_version (SUPPORTED_PARQUET_SCHEMA_VERSIONS) – GeoParquet specification version; if not provided, will default to the latest supported version.
All other keyword args are passed on to
pyarrow.parquet.ParquetWriter.
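Example: a minimal sketch that streams Items into a GeoParquet 1.1 file; items is again a placeholder for STAC Items you have already loaded, and compression is forwarded to ParquetWriter:

from stac_geoparquet.arrow import parse_stac_items_to_arrow, to_parquet

items = [...]  # placeholder: STAC Item dicts you have already loaded
reader = parse_stac_items_to_arrow(items)

# The reader is consumed as a stream rather than materialized in memory.
to_parquet(reader, "items.parquet", schema_version="1.1.0", compression="zstd")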