from soildb import Query
query = (Query()
.select("mukey", "muname", "musym")
.from_("mapunit")
.inner_join("legend", "mapunit.lkey = legend.lkey")
.where("areasymbol = 'IA109'")
.limit(5))
# inspect query
print(query.to_sql())
result = await soildb.SDAClient().execute(query)
df = result.to_pandas()
print(df.head())soildb
Python client for the USDA-NRCS Soil Data Access (SDA) web service, Air and Water Database (AWDB), and other National Cooperative Soil Survey data sources.
Overview
soildb provides Python access to the USDA Soil Data Access (SDA) web service https://sdmdataaccess.nrcs.usda.gov/.
Query soil survey data, export to pandas/polars DataFrames, and handle spatial queries.
Installation
pip install soildbFor spatial functionality:
pip install soildb[spatial]For all optional features support:
pip install soildb[all]Features
- Query soil survey data from SDA
- Export to pandas and polars DataFrames
- Build custom SQL queries with fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination
- Async I/O for high performance and concurrency
- Access to AWDB (Air and Water Database) for real-time monitoring data
- SCAN/SNOTEL station data with soil moisture, temperature, snow, and weather data
- Advanced station filtering with wildcards and sensor discovery
- Multi-depth soil data queries and forecast data retrieval
Quick Start
Query Builder
This is a basic example of building a custom query and getting the results:
Async Setup
You may have noticed that we need to await the query execution result.
All soildb functions are async. Here’s how to run them in different environments like Jupyter notebooks, VSCode, or regular Python scripts.
Basic Async Execution
import asyncio
import soildb
async def main():
# Your async code here
mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
df = mapunits.to_pandas()
return df
# Handle different environments
try:
# Check if there's already an event loop (Jupyter, etc.)
loop = asyncio.get_running_loop()
import nest_asyncio
nest_asyncio.apply()
result = loop.run_until_complete(main())
except RuntimeError:
# No existing loop, use asyncio.run()
result = asyncio.run(main())
resultFor comprehensive async usage, see the Async Programming Guide.
Getting Started
New to soildb? Start with: 1. Quick Start — Choose your pattern (script vs app) 2. Common Workflows — Detailed task-based examples 3. Examples — Runnable code samples
Convenience Functions
soildb provides several high-level functions for common tasks:
async with soildb.SDAClient() as client:
mapunits = await soildb.get_mapunit_by_areasymbol("IA109", client=client)
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()If you have suggestions for new convenience functions please file a “feature request” on GitHub.
Spatial Queries
soildb also offers support for queries by location via spatial_query(). You can specify arbitrary geometry to target several spatial and tabular types of results.
import asyncio
async def spatial_query_example():
from soildb import spatial_query
# Point query
async with soildb.SDAClient() as client:
response = await spatial_query(
geometry="POINT (-93.6 42.0)",
table="mupolygon",
spatial_relation="intersects"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
return df
# Handle different environments
try:
# Check if there's already an event loop (Jupyter, etc.)
loop = asyncio.get_running_loop()
import nest_asyncio
nest_asyncio.apply()
result = loop.run_until_complete(spatial_query_example())
except RuntimeError:
# No existing loop, use asyncio.run()
result = asyncio.run(spatial_query_example())
resultBulk Data Fetching
soildb makes it easy to retrieve large datasets efficiently, using concurrent requests and built-in functions that automatically handle pagination.
import asyncio
async def bulk_fetch_example():
from soildb import fetch_by_keys, get_mukey_by_areasymbol
# Get mukeys for multiple areas concurrently
areas = ["IA109", "IA113", "IA117"]
mukeys_tasks = [
get_mukey_by_areasymbol([area])
for area in areas
]
# Execute all mukey requests concurrently
mukeys_results = await asyncio.gather(*mukeys_tasks)
# Flatten the results (each task returns a list)
all_mukeys = []
for mukeys in mukeys_results:
all_mukeys.extend(mukeys)
print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")
# Fetch data in chunks automatically
response = await fetch_by_keys(
all_mukeys,
"component",
key_column="mukey",
chunk_size=100,
columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
return df
# Handle different environments
try:
# Check if there's already an event loop (Jupyter, etc.)
loop = asyncio.get_running_loop()
import nest_asyncio
nest_asyncio.apply()
result = loop.run_until_complete(bulk_fetch_example())
except RuntimeError:
# No existing loop, use asyncio.run()
result = asyncio.run(bulk_fetch_example())
result.head(10)The component table has a hierarchical relationship:
- mukey (map unit key) is the parent
- cokey (component key) is the child
So when fetching components, you typically want to filter by mukey to get all components for specific map units.
Use fetch_by_keys() with key_column="mukey" to achieve this and demonstrate pagination over chunks with 100 rows each.
AWDB (Air and Water Database) Access
NEW in soildb: Access real-time monitoring data from SCAN, SNOTEL, and other NRCS networks!
Quick AWDB Start
from soildb.awdb import AWDBClient
# Get soil moisture data from SCAN stations
async def awdb_example():
async with AWDBClient() as client:
# Find stations with soil moisture sensors in California
stations = await client.get_stations(
elements=['SMS:*'], # Soil moisture sensors at any depth
state_codes=['CA']
)
# Get multi-depth soil moisture data
if stations:
station = stations[0]
data = await client.get_station_data(
station.station_triplet,
'SMS', # Soil moisture element
'2024-01-01',
'2024-01-10'
)
print(f"Retrieved {len(data)} soil moisture readings")
# Handle async execution
try:
loop = asyncio.get_running_loop()
import nest_asyncio
nest_asyncio.apply()
loop.run_until_complete(awdb_example())
except RuntimeError:
asyncio.run(awdb_example())AWDB Features
- Station Discovery: Find stations by location, network, sensors, or name patterns
- Multi-Depth Soil Data: Query soil moisture and temperature at multiple depths
- Real-Time Weather: Access precipitation, temperature, wind, and solar radiation
- Snow Data: SNOTEL snow water equivalent, depth, and snowmelt data
- Forecast Data: Water supply and drought forecasts
- Quality Control: Access to quality flags and original values
- Wildcard Filtering: Advanced station filtering with
*wildcards
Advanced Station Filtering
from soildb.awdb.convenience import discover_stations
# Find all SNOTEL stations in Oregon or Washington
snotel_stations = await discover_stations(
station_triplets=['*:OR:SNTL', '*:WA:SNTL']
)
# Find stations with soil moisture sensors in specific HUC area
soil_stations = await discover_stations(
elements=['SMS:*'],
hucs=['170601*'] # Columbia River basin
)
# Find stations by name pattern
lake_stations = await discover_stations(
station_names=['*Lake*']
)Multi-Depth Soil Data
from soildb.awdb.convenience import get_soil_moisture_by_depth
# Get soil moisture at multiple depths for a location
soil_data = await get_soil_moisture_by_depth(
latitude=37.5,
longitude=-120.5,
start_date='2024-01-01',
end_date='2024-01-10',
depths=[-2, -4, -8, -20, -40] # Negative = depth below surface
)
# Each depth returns time series data
for depth, readings in soil_data.items():
print(f"Depth {depth}\": {len(readings)} readings")Examples
See the examples/ directory and documentation for detailed usage patterns.
License
This project is licensed under the MIT License. See the LICENSE file for details.