Usage

Search for a dataset by title in a specific collection

from dtotools.search import search_on_title
results = search_on_title(title="koster", collection="emodnet-biology")
print(results)

This will return

[<Item id=bdbeb221-7656-52e5-9ade-4b3304db82cd>]

Each result is a pystac Item, which can be explored further with the PySTAC library.

Search for a dataset by title in all collections

from dtotools.search import search_on_title
results = search_on_title(title="koster")

Inspect a parquet file

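from dtotools.inspect_parquet import inspect_parquet

DATASET_URL = "https://s3.waw3-1.cloudferro.com/emodnet/emodnet_biology/12639/marine_biodiversity_observations_2026-02-26.parquet"

# Inspect the whole file
inspect_parquet(DATASET_URL)

# Inspect only the "parameter" column, apply a filter on parameter_imisdasid,
# and write the summary to a CSV file
inspect_parquet(
    dataset=DATASET_URL,
    columns=["parameter"],
    filters=[("parameter_imisdasid", [4687])],
    output_file="output/inspect_parquet_0.csv"
)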

The second call writes the following to output/inspect_parquet_0.csv:

column_name,column_type,unique_values
parameter,string,"[{""value"": ""Detritus (#/l)"", ""count"": 27594}, {""value"": ""Diameter_sample_collector_aperture (cm)"", ""count"": 25644}, {""value"": ""Fibres (#/l)"", ""count"": 27594}, {""value"": ""LifeStage"", ""count"": 27552}, {""value"": ""Mesh_size (um)"", ""count"": 25644}, {""value"": ""Samp_vol (l)"", ""count"": 27540}, {""value"": ""sampling_instrument_name"", ""count"": 26007}, {""value"": ""sampling_platform_name"", ""count"": 27927}, {""value"": ""SubSamplingCoefficient (Dmnless)"", ""count"": 27429}, {""value"": ""unidentified_biota (#/l)"", ""count"": 27594}, {""value"": ""WaterAbund (#/ml)"", ""count"": 27582}]"
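The `unique_values` field in this CSV is a JSON array stored as a quoted string, so it needs a second decoding step after the CSV is read. The sketch below shows one way to do that with the standard library only. It uses a shortened copy of the output above (two of the eleven values) so it runs on its own; `load_unique_values` is a hypothetical helper, not part of dtotools.

```python
import csv
import io
import json

# Shortened sample of the CSV shown above (two of the eleven values).
sample_csv = (
    "column_name,column_type,unique_values\n"
    'parameter,string,"[{""value"": ""LifeStage"", ""count"": 27552}, '
    '{""value"": ""Samp_vol (l)"", ""count"": 27540}]"\n'
)


def load_unique_values(handle):
    """Map each column name to a {value: count} dict."""
    summary = {}
    for row in csv.DictReader(handle):
        # The unique_values cell is itself a JSON array of {"value", "count"} objects
        entries = json.loads(row["unique_values"])
        summary[row["column_name"]] = {e["value"]: e["count"] for e in entries}
    return summary


summary = load_unique_values(io.StringIO(sample_csv))
print(summary["parameter"])
```

To read the real file instead of the sample, pass `open("output/inspect_parquet_0.csv", newline="")` to the helper.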

Read a parquet file

Read a parquet file without filtering:

DATASET_URL = "https://s3.waw3-1.cloudferro.com/emodnet/emodnet_biology/12639/marine_biodiversity_observations_2026-02-26.parquet"

result = read_parquet(parquet=DATASET_URL, max_rows=10)

Read a parquet file with filtering:

DATASET_URL = "https://s3.waw3-1.cloudferro.com/emodnet/emodnet_biology/12639/marine_biodiversity_observations_2026-02-26.parquet"

result = read_parquet(
    parquet=DATASET_URL,
    # columns=["datasetid"],  # uncomment to restrict the columns that are read
    filters={"datasetid": 4687},
    max_rows=50
)