Quick Start

Get started with py-soildb in 5 minutes

Choose your pattern based on your use case, then see Workflows for detailed examples.

Pattern: Simple Script or Jupyter Notebook

Use this for: Scripts, Jupyter notebooks, quick analysis, most one-off queries.

Why: It is the simplest option and handles client management automatically.

from soildb import get_mapunit_by_point

# Synchronous wrapper: soildb functions expose a .sync() method that
# handles async client creation and event loop management automatically
response = get_mapunit_by_point.sync(-93.6, 42.0)
df = response.to_pandas()
print(df)
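
To make the idea of a synchronous wrapper concrete, here is a minimal sketch of how a `.sync()` variant could be attached to an async function. This is not soildb's implementation; `with_sync` and `lookup` are hypothetical names used only for illustration.

```python
import asyncio
import functools


def with_sync(async_fn):
    """Attach a blocking `.sync()` variant to an async function (illustrative only)."""

    @functools.wraps(async_fn)
    def sync(*args, **kwargs):
        # asyncio.run creates an event loop, runs the coroutine, then closes the loop
        return asyncio.run(async_fn(*args, **kwargs))

    async_fn.sync = sync
    return async_fn


@with_sync
async def lookup(lon, lat):
    # Hypothetical stand-in for a network call
    await asyncio.sleep(0)
    return {"lon": lon, "lat": lat}


print(lookup.sync(-93.6, 42.0))
```

A wrapper built on `asyncio.run` like this one cannot be called from code that is already running inside an event loop; how the library itself deals with that case is not covered by this sketch.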


Pattern: Production Application or High-Performance Queries

Use this for: Web services, long-running applications, concurrent requests, bulk operations.

Why: Explicit client management and async/await allow concurrent processing and better resource control.

import asyncio
from soildb import SDAClient, query_templates

async def fetch_data():
    async with SDAClient() as client:
        query = query_templates.query_mapunits_by_legend("IA109")
        response = await client.execute(query)
        return response.to_pandas()

df = asyncio.run(fetch_data())
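
The concurrency benefit mentioned above can be sketched with the standard library alone. The example below caps the number of requests in flight with a semaphore; `gather_limited` and `fake_fetch` are hypothetical helpers, and `fake_fetch` merely stands in for something like `await client.execute(query)`.

```python
import asyncio


async def gather_limited(coro_factories, limit=3):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:
            return await factory()

    # gather returns results in the same order as its inputs
    return await asyncio.gather(*(run_one(f) for f in coro_factories))


async def fake_fetch(areasymbol):
    # Hypothetical stand-in for an awaited SDA request
    await asyncio.sleep(0.01)
    return f"rows for {areasymbol}"


async def run_all():
    symbols = ["IA109", "IA113"]
    # Bind each symbol as a default argument so every lambda keeps its own value
    factories = [lambda s=s: fake_fetch(s) for s in symbols]
    return await gather_limited(factories, limit=2)


results = asyncio.run(run_all())
print(results)
```

Keeping the limit small is a deliberate choice here: the services being queried are shared public resources, so bounded concurrency is usually kinder than firing every request at once.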


Pattern: Query by Location

Use this for: “What soil is at this point?” or “Show me soil in this area.”

from soildb import get_mapunit_by_point, spatial_query

# Single point
response = get_mapunit_by_point.sync(-93.6, 42.0)

# Bounding box (longitude/latitude in decimal degrees)
response = spatial_query.sync(
    {"xmin": -93.65, "ymin": 42.0, "xmax": -93.6, "ymax": 42.05},
    table="mupolygon"
)

# Polygon (WKT)
response = spatial_query.sync(
    "POLYGON((-93.7 42.0, -93.6 42.0, -93.6 42.1, -93.7 42.1, -93.7 42.0))",
    table="mupolygon"
)
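
The bounding-box dictionary and the WKT polygon describe the same kind of region, so one can be derived from the other. The helper below is a small sketch of that conversion; `bbox_to_wkt` is a hypothetical name, not a soildb function.

```python
def bbox_to_wkt(bbox):
    """Convert an xmin/ymin/xmax/ymax dict into a closed WKT polygon string."""
    xmin, ymin = bbox["xmin"], bbox["ymin"]
    xmax, ymax = bbox["xmax"], bbox["ymax"]
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bbox must satisfy xmin < xmax and ymin < ymax")
    # Counter-clockwise ring; the first vertex is repeated to close the ring
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"


wkt = bbox_to_wkt({"xmin": -93.7, "ymin": 42.0, "xmax": -93.6, "ymax": 42.1})
print(wkt)
```

WKT coordinates are written x then y, which for geographic data means longitude first and latitude second.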


Pattern: Query by Survey Area

Use this for: “Get all soil data for county XYZ.”

from soildb import get_mapunit_by_areasymbol

# Get all map units in Iowa survey area IA109
response = get_mapunit_by_areasymbol.sync("IA109")
df = response.to_pandas()


Pattern: Bulk Data Fetching

Use this for: “Fetch components/horizons for 1000+ map units.”

from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys first
mukeys = get_mukey_by_areasymbol.sync(["IA109", "IA113"])

# Fetch all components - automatic pagination/chunking
response = fetch_by_keys.sync(
    mukeys,
    "component",
    key_column="mukey",
    chunk_size=500  # keys per chunk; the chunks are processed internally
)
df = response.to_pandas()
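
To illustrate what chunking a key list means, the sketch below splits a list into batches of at most `chunk_size` items. It shows the general technique only and makes no claim about how `fetch_by_keys` is implemented; `chunked` is a hypothetical helper and the keys are placeholder integers.

```python
def chunked(keys, chunk_size):
    """Yield successive lists of at most `chunk_size` keys."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(keys), chunk_size):
        yield keys[start:start + chunk_size]


placeholder_keys = list(range(1, 1201))  # placeholder integers, not real mukeys
batches = list(chunked(placeholder_keys, 500))
print([len(batch) for batch in batches])  # [500, 500, 200]
```

Smaller chunks mean more requests with smaller payloads; larger chunks mean fewer, heavier requests.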


Pattern: Custom SQL Query

Use this for: Complex queries, joins, specific WHERE conditions.

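import asyncio
from soildb import Query, SDAClient

# Columns are qualified because mukey exists in both mapunit and component.
# In SSURGO, areasymbol lives on legend, so mapunit is joined to it via lkey.
query = (Query()
    .select("m.mukey", "m.muname", "c.compname")
    .from_("mapunit m")
    .inner_join("legend l", "m.lkey = l.lkey")
    .inner_join("component c", "m.mukey = c.mukey")
    .where("l.areasymbol = 'IA109'")
    .where("c.majcompflag = 'Yes'")
    .limit(100))

# Async (recommended)
async def main():
    async with SDAClient() as client:
        return await client.execute(query)

# In a script; in a notebook, use `response = await main()` instead
response = asyncio.run(main())

# Or synchronous (runs its own event loop)
response = SDAClient.execute.sync(query)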
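
Because the `.where()` calls above take raw SQL fragments, values that come from variables need to be quoted before they are interpolated. The helpers below are a minimal sketch of that step; `sql_literal` and `sql_in_list` are hypothetical names, and whether the library provides its own sanitization or parameter binding is not shown here.

```python
def sql_literal(value):
    """Quote a value as a SQL string literal by doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def sql_in_list(values):
    """Build a parenthesised, comma-separated list for an IN (...) clause."""
    if not values:
        raise ValueError("IN list must not be empty")
    return "(" + ", ".join(sql_literal(v) for v in values) + ")"


condition = f"areasymbol IN {sql_in_list(['IA109', 'IA113'])}"
print(condition)
```

The resulting string could then be passed to `.where(...)` in place of a hand-written condition.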


Pattern: AWDB Monitoring Data

Use this for: “Get soil water data from SCAN/SNOTEL stations.”

import asyncio
from soildb.awdb import discover_stations, get_soil_moisture_by_depth

async def main():
    # Find SCAN stations in Idaho
    stations = await discover_stations(
        state_codes=["ID"],
        network_codes=["SCAN"],
        active_only=True,
    )
    print(f"Found {len(stations)} SCAN stations in Idaho")

    # Get soil moisture data for a specific station
    data = await get_soil_moisture_by_depth(
        "2126:NV:SCAN",
        depths_inches=[-2, -8],
        start_date="2024-01-01",
        end_date="2024-01-31",
    )
    print(f"Data points: {data['depths'][-2]['n_data_points']}")

asyncio.run(main())

Note: AWDB is a free public resource. Space out requests and avoid hammering it with concurrent queries. If you encounter errors, reduce request frequency and retry with backoff.
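
The retry-with-backoff advice can be sketched with asyncio alone. `retry_with_backoff` is a hypothetical helper, not part of soildb, and `flaky_call` is a stand-in that simulates two transient failures before succeeding.

```python
import asyncio


async def retry_with_backoff(make_call, retries=4, base_delay=1.0):
    """Await make_call(), retrying on failure with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds between attempts
            await asyncio.sleep(base_delay * 2 ** attempt)


attempts = {"count": 0}


async def flaky_call():
    # Stand-in for an AWDB request: fails twice, then succeeds
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = asyncio.run(retry_with_backoff(flaky_call, base_delay=0.01))
print(result, attempts["count"])
```

In real use, `make_call` would be something like `lambda: get_soil_moisture_by_depth(...)`, and catching a narrower exception type than `Exception` would avoid retrying on programming errors.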


Pattern: Convert to SoilProfileCollection

Use this for: Specialized soil horizon analysis (requires the soilprofilecollection package).

from soildb import get_mapunit_by_point

# Fetch horizon data (includes component info)
response = get_mapunit_by_point.sync(-93.6, 42.0)

# Convert to SoilProfileCollection
spc = response.to_soilprofilecollection()

# Use SPC methods
print(f"Number of profiles: {len(spc)}")
print(f"Deepest profile: {spc.depth.max()} cm")


Export Formats

SDA queries return an SDAResponse, which supports multiple export formats:

# pandas DataFrame (most common)
df = response.to_pandas()

# Polars DataFrame (performant alternative)
df = response.to_polars()

# Dictionary list (JSON-like)
data = response.to_dict()

# SoilProfileCollection (soil science)
spc = response.to_soilprofilecollection()

# GeoDataFrame (spatial analysis, requires GeoPandas)
gdf = response.to_geodataframe()

Error Handling

from soildb import SDAConnectionError, SDAQueryError, get_mapunit_by_point

try:
    response = get_mapunit_by_point.sync(-93.6, 42.0)
except SDAConnectionError:
    print("Can't reach SDA service")
except SDAQueryError as e:
    print(f"Query error: {e}")

See Error Handling for exception types and recovery strategies.


Next Steps

  1. See Workflows for detailed examples with explanations
  2. Clone/run examples from docs/examples/
  3. Check Async Guide if building high-performance services
  4. Read API Reference for complete function documentation
  5. Review Troubleshooting for common issues