Architecture¶
Comprehensive overview of stac-server's architecture, component interactions, and data flows.
Overview¶
Stac-server is a serverless implementation of the STAC API specification built entirely on AWS managed services. The architecture is designed for:
- Scalability: Auto-scaling Lambda functions handle any load
- Cost Efficiency: Pay-per-use pricing with minimal idle infrastructure (OpenSearch cluster runs continuously)
- Reliability: Managed services with built-in redundancy
- Event-Driven: Asynchronous ingest pipeline with guaranteed delivery
Core Principle: Separation of read (API) and write (Ingest) paths for optimal performance and reliability.
System Components¶
graph TB
subgraph "Client Layer"
client[API Clients]
publisher[Data Publishers]
end
subgraph "AWS Edge"
apigw[API Gateway]
end
subgraph "Compute Layer"
apiLambda[API Lambda]
ingestLambda[Ingest Lambda]
preHook[Pre-Hook Lambda<br/>optional]
postHook[Post-Hook Lambda<br/>optional]
end
subgraph "Queue Layer"
ingestSNS[Ingest SNS Topic]
postIngestSNS[Post-Ingest SNS Topic]
ingestSQS[Ingest SQS Queue]
dlq[Dead Letter Queue]
end
subgraph "Data Layer"
opensearch[(OpenSearch)]
s3[S3 Buckets<br/>Asset Storage]
secrets[Secrets Manager<br/>Credentials]
end
client -->|HTTP/HTTPS| apigw
apigw -->|Proxy| apiLambda
apiLambda -.->|Optional| preHook
apiLambda -.->|Optional| postHook
apiLambda -->|Query| opensearch
apiLambda -->|Generate URLs| s3
publisher -->|Publish Items| ingestSNS
ingestSNS -->|Fan-out| ingestSQS
ingestSQS -->|Batch| ingestLambda
ingestLambda -->|Write| opensearch
ingestLambda -->|Status| postIngestSNS
ingestSQS -->|Failed| dlq
apiLambda -.->|Credentials| secrets
ingestLambda -.->|Credentials| secrets
style opensearch fill:#ff9900
style apiLambda fill:#ff9900
style ingestLambda fill:#ff9900
style apigw fill:#ff4f8b
Component Descriptions¶
| Component | Type | Purpose | Scaling |
|---|---|---|---|
| API Gateway | Managed Service | HTTP endpoint, request routing, throttling | Auto-scales to millions of requests |
| API Lambda | Serverless Function | STAC API implementation, query processing | Concurrent execution up to account limits |
| Ingest Lambda | Serverless Function | Process and index Collections/Items | Batch processing with SQS triggers |
| Pre-Hook Lambda | Optional Function | Request authentication/modification | Synchronous, adds latency |
| Post-Hook Lambda | Optional Function | Response transformation/logging | Synchronous, adds latency |
| OpenSearch | Managed Database | Document store and search engine | Configurable instance types and counts |
| Ingest SNS Topic | Managed Queue | Entry point for data ingestion | Unlimited throughput |
| Ingest SQS Queue | Managed Queue | Buffering and batching | Unlimited retention |
| Post-Ingest SNS Topic | Managed Queue | Notification of ingest status | Unlimited throughput |
| Dead Letter Queue | Managed Queue | Failed ingest messages | Unlimited retention |
| Secrets Manager | Managed Service | OpenSearch credentials storage | N/A |
| S3 Buckets | Object Storage | Asset storage (external or managed) | Unlimited |
Request Flow¶
Standard API Request¶
sequenceDiagram
participant Client
participant API Gateway
participant Pre-Hook
participant API Lambda
participant OpenSearch
participant Post-Hook
Client->>API Gateway: GET /search?bbox=...
API Gateway->>API Lambda: Lambda Proxy Event
alt Pre-Hook Enabled
API Lambda->>Pre-Hook: Invoke (async)
Pre-Hook-->>API Lambda: Modified Event / Auth Result
end
API Lambda->>API Lambda: Parse & Validate Request
API Lambda->>OpenSearch: Query with filters
OpenSearch-->>API Lambda: Matching documents
API Lambda->>API Lambda: Transform to STAC response
alt Post-Hook Enabled
API Lambda->>Post-Hook: Invoke (async)
Post-Hook-->>API Lambda: Modified Response
end
API Lambda-->>API Gateway: HTTP Response
API Gateway-->>Client: STAC FeatureCollection
Key Points: - Pre-hook can modify request or reject with 401/403 - OpenSearch queries use JSON DSL for complex filtering - Response includes STAC-compliant links for pagination - Post-hook can add headers, modify body, or log
Search Query Processing¶
The API Lambda translates STAC API parameters to OpenSearch queries:
- Collection Filter:
termquery oncollectionfield - Bbox:
geo_bounding_boxquery ongeometryfield - Datetime:
rangequery onproperties.datetime - CQL2 Filter: Recursive translation to OpenSearch DSL
- Sort: Maps to OpenSearch
sortarray - Pagination: Uses
search_afterfor efficient deep pagination
Ingest Pipeline¶
Message Flow¶
flowchart TD
start[Publisher] -->|Publish JSON| ingestSNS[Ingest SNS Topic]
ingestSNS -->|Subscribe| ingestSQS[Ingest SQS Queue]
ingestSQS -->|Batch 1-10 msgs| ingestLambda[Ingest Lambda]
ingestLambda -->|Parse| decision{Message Type?}
decision -->|href reference| fetch[Fetch from S3/HTTP]
decision -->|inline JSON| validate[Validate STAC]
fetch --> validate
validate -->|Valid| type{Type?}
validate -->|Invalid| fail[Failed]
type -->|Collection| collIndex[Write to collections index]
type -->|Item| itemIndex[Write to collection-specific index]
type -->|Action| action[Execute action truncate]
collIndex --> transform[Apply Asset Proxy]
itemIndex --> transform
transform --> success[Success]
success --> notify[Publish to Post-Ingest SNS]
fail --> notify
fail -->|Max retries exceeded| dlq[Dead Letter Queue]
notify --> subscribers[Subscribed systems]
style success fill:#90EE90
style fail fill:#FFB6C1
style dlq fill:#FF6B6B
style ingestLambda fill:#ff9900
Ingest Message Formats¶
Inline Item/Collection:
{
"type": "Feature",
"stac_version": "1.0.0",
"id": "item-id",
"collection": "my-collection",
"geometry": {...},
"properties": {...}
}
Reference (for large items):
Action (delete all items from collection):
Note: The
truncatecommand deletes all items from the specified collection's index using OpenSearch'sdeleteByQueryAPI. This is a destructive operation that requiresENABLE_INGEST_ACTION_TRUNCATE=true. It cannot be used on the collections index or wildcard indices.
Ingest Processing Steps¶
- Receive: Lambda triggered by SQS with batch of 1-10 messages
- Parse: Extract STAC document from SNS wrapper and SQS record
- Fetch: If href reference, retrieve from S3 or HTTP
- Validate: Check STAC version, required fields, schema
- Route:
- Collections →
collectionsindex - Items →
<collection-id>index - Actions → Execute command
- Transform: Apply asset proxy transformations if enabled
- Index: Write to OpenSearch with upsert semantics
- Notify: Publish success/failure to Post-Ingest SNS
- Cleanup: Delete message from SQS or send to DLQ on failure
Error Handling¶
- Transient Errors (network, OpenSearch timeout): Automatic retry via SQS visibility timeout
- Permanent Errors (invalid JSON, missing collection): Send to Dead Letter Queue after max retries
- Partial Batch Failures: Successfully processed messages are deleted, failures return to queue
OpenSearch Index Structure¶
Collections Index¶
Index Name: collections (configurable via COLLECTIONS_INDEX)
Mapping Highlights:
{
"mappings": {
"properties": {
"id": {"type": "keyword"},
"type": {"type": "keyword"},
"stac_version": {"type": "keyword"},
"title": {"type": "text"},
"description": {"type": "text"},
"license": {"type": "keyword"},
"extent": {
"properties": {
"spatial": {
"properties": {
"bbox": {"type": "double"}
}
},
"temporal": {
"properties": {
"interval": {"type": "date"}
}
}
}
},
"queryables": {"enabled": false},
"aggregations": {"enabled": false}
}
}
}
Notes:
- queryables and aggregations fields are stored but not indexed (served from API)
- One document per collection
- Updated via upsert (partial updates not supported)
Items Indices¶
Index Name: <collection-id> (one index per collection)
Mapping Highlights:
{
"mappings": {
"properties": {
"id": {"type": "keyword"},
"type": {"type": "keyword"},
"stac_version": {"type": "keyword"},
"collection": {"type": "keyword"},
"geometry": {"type": "geo_shape"},
"bbox": {"type": "double"},
"properties": {
"properties": {
"datetime": {"type": "date"},
"created": {"type": "date"},
"updated": {"type": "date"}
}
},
"links": {"enabled": false},
"assets": {"enabled": false}
},
"dynamic_templates": [
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {"type": "keyword"}
}
}
]
}
}
Notes:
- Dynamic mapping for item properties (strings become keywords by default)
- geometry uses geo_shape for complex polygon queries
- links and assets stored but not indexed (metadata only)
- Index auto-created on first item ingest (can be disabled)
Index Strategy¶
Option 1: Collection Per Index (default) - Pros: Independent scaling, deletion = drop index, custom mappings per collection - Cons: More indices to manage, can't search across collections efficiently
Option 2: Shared Index (via COLLECTION_TO_INDEX_MAPPINGS)
- Pros: Fewer indices, efficient cross-collection search
- Cons: Shared mappings, delete requires query
Authentication & Authorization¶
OpenSearch Authentication¶
flowchart LR
subgraph "Fine-Grained Access Control"
lambda1[Lambda] -->|Username/Password| sm1[Secrets Manager]
sm1 --> os1[(OpenSearch<br/>Internal User DB)]
end
subgraph "IAM Authentication"
lambda2[Lambda] -->|IAM Role| sts[STS]
sts -->|SigV4| os2[(OpenSearch<br/>IAM Policy)]
end
style lambda1 fill:#ff9900
style lambda2 fill:#ff9900
Fine-Grained Access Control (Default):
- Lambda uses service account (e.g., stac_server) with read/write permissions
- Credentials stored in Secrets Manager
- Granular role-based permissions within OpenSearch
- Admin uses master account for dashboard/CLI access
IAM Authentication: - Lambda uses its execution role for OpenSearch access - IAM policies control index-level permissions - No credentials to manage - Requires SigV4 signing for all requests
API Authorization (Optional)¶
sequenceDiagram
participant Client
participant API Gateway
participant Authorizer
participant Pre-Hook
participant API Lambda
Client->>API Gateway: Request + Auth Token
alt API Gateway Authorizer
API Gateway->>Authorizer: Validate Token
Authorizer-->>API Gateway: Allow/Deny + Context
end
API Gateway->>API Lambda: Authorized Request
alt Pre-Hook Authorization
API Lambda->>Pre-Hook: Request + Headers
Pre-Hook->>Pre-Hook: Check API Key<br/>Build _collections/_filter
Pre-Hook-->>API Lambda: Modified Request
end
API Lambda->>API Lambda: Apply authx filters
API Lambda-->>Client: Filtered Results
Authorization Layers:
- API Gateway Authorizer (external):
- JWT validation, API key checking
- Adds context to Lambda event
-
Not included in stac-server
-
Pre-Hook Lambda (included example):
- Reads API key from header
- Validates against Secrets Manager
-
Injects
_collectionsor_filterparameters -
Lambda Authorization Parameters:
ENABLE_COLLECTIONS_AUTHX: Restrict by collection listENABLE_FILTER_AUTHX: Apply CQL2 filter to all queries
Asset Proxy Flow¶
sequenceDiagram
participant Client
participant API Lambda
participant S3
Note over API Lambda: Asset Proxy Enabled
Client->>API Lambda: GET /collections/col/items/item
API Lambda->>API Lambda: Transform asset hrefs
Note over API Lambda: s3://bucket/path/asset.tif<br/>→<br/>/collections/col/items/item/assets/data
API Lambda-->>Client: Item with proxy URLs
Client->>API Lambda: GET /collections/col/items/item/assets/data
API Lambda->>API Lambda: Check bucket allowlist
API Lambda->>S3: Generate pre-signed URL
S3-->>API Lambda: Signed URL (300s expiry)
API Lambda-->>Client: 302 Redirect to signed URL
Client->>S3: GET signed URL
S3-->>Client: Asset data
Asset Transformation:
Before (original):
After (proxied):
{
"assets": {
"data": {
"href": "https://api.example.com/collections/col/items/item/assets/data",
"type": "image/tiff",
"alternate": {
"s3": {
"href": "s3://my-bucket/path/to/file.tif"
}
}
}
},
"stac_extensions": [
"https://stac-extensions.github.io/alternate-assets/v1.2.0/schema.json"
]
}
Proxy Modes (via ASSET_PROXY_BUCKET_OPTION):
- NONE: Disabled (default)
- LIST: Specific buckets from ASSET_PROXY_BUCKET_LIST
- ALL: All S3 assets
- ALL_BUCKETS_IN_ACCOUNT: All accessible buckets in AWS account
Pre/Post Hook System¶
The Pre/Post Hook system enables custom logic injection at two key points in the request lifecycle: before request processing (Pre-Hook) and after response generation (Post-Hook). Both hooks are optional Lambda functions invoked synchronously by the API Lambda.
sequenceDiagram
participant Client
participant API Gateway
participant API Lambda
participant Pre-Hook
participant OpenSearch
participant Post-Hook
Client->>API Gateway: HTTP Request
API Gateway->>API Lambda: Forward Request
alt Pre-Hook Configured
API Lambda->>Pre-Hook: Invoke with event
Pre-Hook->>Pre-Hook: Validate/Authorize
alt Rejected
Pre-Hook-->>API Lambda: Return 401/403
API Lambda-->>Client: Error Response
else Authorized
Pre-Hook->>Pre-Hook: Modify Event
Note over Pre-Hook: Add _collections, _filter,<br/>headers, context
Pre-Hook-->>API Lambda: Modified Event
end
end
API Lambda->>OpenSearch: Query with filters
OpenSearch-->>API Lambda: Results
API Lambda->>API Lambda: Generate Response
alt Post-Hook Configured
API Lambda->>Post-Hook: Invoke with response
Post-Hook->>Post-Hook: Transform/Log
Note over Post-Hook: Modify body, add headers,<br/>send analytics
Post-Hook-->>API Lambda: Modified Response
end
API Lambda-->>API Gateway: Final Response
API Gateway-->>Client: HTTP Response
rect rgba(255, 182, 193, 0.3)
Note over Pre-Hook: Pre-Hook can reject requests
end
rect rgba(144, 238, 144, 0.3)
Note over Post-Hook: Post-Hook cannot change status
end
Pre-Hook¶
The Pre-Hook Lambda is invoked before the API Lambda processes the request, allowing request validation, authorization, and modification.
Capabilities:
- Read all request headers, query parameters, and body
- Reject request (return error response directly with 401/403)
- Modify request event (add/change query parameters, modify body)
- Add response headers (CORS headers, custom headers)
- Inject authorization parameters (_collections, _filter for user-specific filtering)
Example Use Cases: - API key validation and quota enforcement - JWT token verification and claims extraction - User-specific collection filtering based on permissions - Request logging and metrics collection - Custom authentication schemes (OAuth, SAML) - Geo-fencing or IP-based access control
Post-Hook¶
The Post-Hook Lambda is invoked after the API Lambda generates the response, allowing response transformation and analytics without affecting core logic.
Capabilities: - Read full response (status code, headers, body) - Modify response body (transform structure, add computed fields, redact data) - Add or modify response headers (caching, custom headers) - Cannot change HTTP status code - Perform async logging and analytics
Example Use Cases: - Response transformation (inject computed fields, normalize structure) - Redaction of sensitive data for specific users - Analytics and metrics collection (usage tracking, performance monitoring) - Custom STAC extension injection - Response caching headers based on content type - A/B testing headers
Configuration¶
Both hooks are configured via environment variables pointing to Lambda ARNs. The API Lambda's IAM role must have permission to invoke these functions.
# serverless.yml
environment:
PRE_HOOK: arn:aws:lambda:region:account:function:my-pre-hook
POST_HOOK: arn:aws:lambda:region:account:function:my-post-hook
iam:
role:
statements:
- Effect: Allow
Action: lambda:InvokeFunction
Resource:
- arn:aws:lambda:region:account:function:my-pre-hook
- arn:aws:lambda:region:account:function:my-post-hook
Hook Requirements:
- Synchronous invocation (RequestResponse type)
- Return proper event/response structure
- Handle errors gracefully (timeouts, exceptions)
- Pre-Hook: Must return modified event or error response
- Post-Hook: Must return modified response object
Deployment Architecture¶
Multi-Region Setup¶
graph TB
subgraph "Region 1 (us-west-2)"
r53[Route53]
cf1[CloudFront Distribution]
apigw1[API Gateway]
lambda1[Lambda Functions]
os1[(OpenSearch)]
end
subgraph "Region 2 (us-east-1)"
cf2[CloudFront Distribution]
apigw2[API Gateway]
lambda2[Lambda Functions]
os2[(OpenSearch)]
end
subgraph "Data Replication"
os1 -.->|Cross-Cluster<br/>Replication| os2
end
r53 -->|Geolocation/Latency| cf1
r53 -->|Geolocation/Latency| cf2
cf1 --> apigw1
cf2 --> apigw2
apigw1 --> lambda1
apigw2 --> lambda2
lambda1 --> os1
lambda2 --> os2
style os1 fill:#ff9900
style os2 fill:#ff9900
Multi-Region Considerations: - Use OpenSearch Cross-Cluster Replication for data sync - CloudFront for global edge caching - Route53 for geographic routing - Separate ingest per region or centralized
High Availability Setup¶
Components: - API Gateway: Multi-AZ by default - Lambda: Multi-AZ by default - OpenSearch: Multi-AZ with dedicated master nodes - SQS: Multi-AZ by default
Best Practices: - 3+ OpenSearch nodes across AZs - Enable OpenSearch automatic snapshots - Monitor Lambda concurrent execution limits - Set up CloudWatch alarms for DLQ depth
Performance Optimization¶
API Response Times: - Cold start: 1-3 seconds (first request) - Warm request: 50-200ms (depends on query complexity) - OpenSearch query: 10-100ms (most queries)
Optimization Strategies:
1. Provisioned Concurrency: Keep Lambda warm (costs more)
2. OpenSearch Tuning: Right-size instances, add nodes
3. CloudFront: Cache GET requests for static data
4. Connection Pooling: Reuse OpenSearch connections
5. Pagination: Use search_after instead of from/size
Ingest Throughput: - SQS: Unlimited buffering - Lambda: Up to 1000 concurrent executions (soft limit) - OpenSearch: Depends on instance size and cluster - Typical: 100-1000 items/second with proper tuning
Cost Optimization¶
Cost Drivers (typical production deployment): 1. OpenSearch: 70-80% (always running) 2. Lambda Invocations: 10-15% 3. Data Transfer: 5-10% 4. Other Services: 5%
Optimization Tips: - Use Reserved Instances for OpenSearch - Right-size OpenSearch cluster (don't over-provision) - Enable response compression - Use S3 for large assets (not OpenSearch) - Monitor and alert on Lambda throttles
Security Architecture¶
Network Security¶
graph LR
subgraph "Public"
client[Clients]
cf[CloudFront]
end
subgraph "AWS VPC (Optional)"
apigw[API Gateway<br/>Public]
lambda[Lambda<br/>VPC or Public]
os[(OpenSearch<br/>VPC)]
end
subgraph "Security Controls"
waf[WAF]
sg[Security Groups]
nacl[Network ACLs]
end
client -->|HTTPS| cf
cf -->|HTTPS| waf
waf --> apigw
apigw --> lambda
lambda -->|VPC Peering<br/>or VPC Lambda| os
sg -.->|Control| lambda
sg -.->|Control| os
nacl -.->|Control| os
style waf fill:#FF6B6B
style sg fill:#FFB6C1
Security Layers: 1. CloudFront: DDoS protection, SSL termination, WAF 2. WAF: Rate limiting, IP filtering, SQL injection protection 3. API Gateway: Throttling, API keys, resource policies 4. Lambda: Execution role with minimal permissions 5. OpenSearch: VPC isolation, security groups, encryption at rest/in transit 6. Secrets Manager: Encrypted credential storage
Data Security¶
Encryption: - At Rest: OpenSearch encryption enabled, S3 bucket encryption - In Transit: TLS 1.2+ for all connections - Secrets: AWS Secrets Manager with KMS encryption
Access Control: - Least Privilege: Lambda roles with specific permissions - Credential Rotation: Automated via Secrets Manager - Audit Logging: CloudWatch logs for all Lambda invocations - OpenSearch Audit: Fine-grained audit logs (when enabled)
Next Steps¶
- Guides > Deployment - Deploy STAC Server to AWS
- Guides > Configuration - Configure collections and environment variables
- Reference > API Overview - Complete endpoint specifications
- About > Contributing - Contribute to the project