Async iteration
In rustac v0.8.1 we added the ability to iterate over search results asynchronously. Let's compare this new capability with synchronous iteration in pystac-client.
The copernicus-dem collection at https://stac.eoapi.dev has 26,450 items, which makes it a good single-collection test case for iterating over many items.
import time
from tqdm.notebook import tqdm
import rustac
from pystac_client import Client
url = "https://stac.eoapi.dev"
collection = "copernicus-dem"
total = 26450
First, let's try pystac-client. In our testing, iterating over the entire collection takes almost six minutes, so we'll stop after the first 1,000 items.
client = Client.open(url)
items = []
progress = tqdm(total=1000)
start = time.time()
item_search = client.search(collections=[collection])
for item in item_search.items():
    items.append(item)
    progress.update()
    if len(items) >= 1000:
        break
print(f"Got {len(items)} items in {time.time() - start:.2f} seconds")
progress.close()
Got 1000 items in 14.28 seconds
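The count-and-break loop above can also be written with itertools.islice from the standard library, which stops pulling from an iterator once it has produced n elements. A minimal sketch: take is a hypothetical helper name, and a plain range iterator stands in for item_search.items() so the example runs offline.

```python
import itertools
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def take(iterable: Iterable[T], n: int) -> list[T]:
    """Return at most the first n elements of iterable (hypothetical helper)."""
    # islice stops requesting elements after the n-th one, so a lazily
    # paginated iterator is not asked for any pages beyond that point.
    return list(itertools.islice(iterable, n))


# Offline stand-in for item_search.items(): a lazy iterator over 10,000 "items".
first = take(iter(range(10_000)), 1000)
print(len(first), first[0], first[-1])
```

In the notebook, `items = take(item_search.items(), 1000)` would replace the loop, and the progress bar could be kept by wrapping the iterator as `tqdm(item_search.items(), total=1000)`. pystac-client can also cap the total itself through the `max_items` argument to `Client.search`.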
rustac prefetches pages asynchronously under the hood, so it might be faster. Let's find out.
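rustac does its prefetching in Rust, and this is not its code. As a rough pure-Python illustration of the general idea, the sketch below uses asyncio.create_task to request page i + 1 while the caller is still consuming page i. The page fetcher, fake_fetch_page, is a hypothetical stand-in that sleeps to simulate server latency and returns an empty page once results run out.

```python
import asyncio
from collections.abc import AsyncIterator, Callable, Coroutine
from typing import Any

# Hypothetical fetcher type: page index -> that page's items ([] when exhausted).
FetchPage = Callable[[int], Coroutine[Any, Any, list[int]]]


async def prefetching_items(fetch_page: FetchPage) -> AsyncIterator[int]:
    """Yield items page by page, requesting the next page before yielding."""
    index = 0
    pending = asyncio.create_task(fetch_page(index))
    try:
        while True:
            page = await pending
            if not page:
                return  # an empty page means there are no more results
            index += 1
            # Start the next request now, so its latency overlaps with
            # whatever the consumer does with the current page.
            pending = asyncio.create_task(fetch_page(index))
            for item in page:
                yield item
    finally:
        # No-op if the task already finished; otherwise requests cancellation
        # of the in-flight fetch when the consumer stops early.
        pending.cancel()


async def fake_fetch_page(index: int, page_size: int = 10, n_pages: int = 3) -> list[int]:
    """Simulated server (hypothetical): sleeps to mimic latency, then returns one page."""
    await asyncio.sleep(0.05)
    if index >= n_pages:
        return []
    start = index * page_size
    return list(range(start, start + page_size))


async def main() -> list[int]:
    return [item async for item in prefetching_items(fake_fetch_page)]


items = asyncio.run(main())
print(len(items))
```

Prefetching one page ahead only helps to the extent that the consumer has work to overlap with the next request; a loop that just appends items leaves little to overlap.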
progress = tqdm(total=1000)
items = []
start = time.time()
search = await rustac.iter_search(url, collections=[collection])
async for item in search:
    items.append(item)
    progress.update()
    if len(items) >= 1000:
        break
print(f"Got {len(items)} items in {time.time() - start:.2f} seconds")
progress.close()
Got 1000 items in 13.67 seconds
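The async loop has the same count-and-break shape. The standard library has no async counterpart to itertools.islice, so a small hand-written helper does the job. In this sketch, atake is a hypothetical name and a toy async generator stands in for the rustac search so it runs offline. Outside a notebook, where top-level await isn't available, the coroutine has to be driven with asyncio.run, as here.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from typing import TypeVar

T = TypeVar("T")


async def atake(aiterable: AsyncIterable[T], n: int) -> list[T]:
    """Collect at most the first n items from an async iterable (hypothetical helper)."""
    items: list[T] = []
    if n <= 0:
        return items  # nothing requested, so don't start iterating at all
    async for item in aiterable:
        items.append(item)
        if len(items) >= n:
            break  # stop pulling items, and therefore pages, once we have enough
    return items


async def numbers() -> AsyncIterator[int]:
    # Offline stand-in for the rustac search used in the notebook.
    for i in range(10_000):
        yield i


first = asyncio.run(atake(numbers(), 1000))
print(len(first), first[0], first[-1])
```

In the notebook itself, Jupyter allows top-level await, so `items = await atake(search, 1000)` would replace the loop directly.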
Okay, that's about the same, which suggests that server response time is the main bottleneck. If we increase the page size, does async iteration pull ahead?
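One way to see why the two clients land so close is an idealized cost model: each page costs one round trip of latency L, and the consumer spends C seconds on each page. Fetching sequentially costs roughly pages * (L + C); prefetching one page ahead overlaps the two and costs about L + (pages - 1) * max(L, C) + C. When C is tiny compared with L, both come out near pages * L, so the remaining lever is sending fewer requests, that is, bigger pages. The latency and consumer numbers below are made up for illustration, not measured, and the page counts assume the server honors the requested limit.

```python
import math


def pages_needed(n_items: int, page_size: int) -> int:
    """Number of requests needed to fetch n_items at page_size items per page."""
    return math.ceil(n_items / page_size)


def sequential_seconds(n_pages: int, latency: float, consume: float) -> float:
    # Fetch a page, consume it, then fetch the next: nothing overlaps.
    return n_pages * (latency + consume)


def prefetch_seconds(n_pages: int, latency: float, consume: float) -> float:
    # The first fetch can't be hidden; after that each step waits for the slower
    # of (next fetch, consuming the current page); the last page is still consumed.
    return latency + (n_pages - 1) * max(latency, consume) + consume


latency, consume = 1.0, 0.01  # assumed: 1 s per request, 10 ms of work per page
n_pages = pages_needed(5000, 500)
print(
    f"{n_pages} pages: sequential ~{sequential_seconds(n_pages, latency, consume):.2f} s, "
    f"prefetch ~{prefetch_seconds(n_pages, latency, consume):.2f} s"
)
```

At 500 items per page, the full 26,450-item collection would take 53 requests under the same assumption that the server doesn't cap the page size lower.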
client = Client.open(url)
items = []
progress = tqdm(total=5000)
start = time.time()
# limit requests a page size (items per response); it does not cap the total
item_search = client.search(collections=[collection], limit=500)
for item in item_search.items():
    items.append(item)
    progress.update()
    if len(items) >= 5000:
        break
print(f"Got {len(items)} items in {time.time() - start:.2f} seconds")
progress.close()
Got 5000 items in 11.09 seconds
progress = tqdm(total=5000)
items = []
start = time.time()
# Same requested page size as the pystac-client run above
search = await rustac.iter_search(url, collections=[collection], limit=500)
async for item in search:
    items.append(item)
    progress.update()
    if len(items) >= 5000:
        break
print(f"Got {len(items)} items in {time.time() - start:.2f} seconds")
progress.close()
Got 5000 items in 10.77 seconds
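Converting the four reported timings into throughput makes the comparison easier to read. These are single runs against a live server, so small differences between the clients may well be run-to-run noise; the labels are just descriptive names for the runs above.

```python
# Timings reported above: (label, items fetched, seconds).
runs = [
    ("pystac-client, default page size", 1000, 14.28),
    ("rustac, default page size", 1000, 13.67),
    ("pystac-client, limit=500", 5000, 11.09),
    ("rustac, limit=500", 5000, 10.77),
]

# Items per second for each run.
throughput = {label: items / seconds for label, items, seconds in runs}
for label, rate in throughput.items():
    print(f"{label}: {rate:.1f} items/s")
```

Going from the default page size to limit=500 raises throughput roughly sixfold for both clients (about 70 to 450 items/s for pystac-client), while rustac stays within about 5% of pystac-client in both configurations.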