Troubleshooting
Connectivity Issues
Issue: “Failed to connect to USDA Soil Data Access service”
Symptoms:
- SDAConnectionError thrown
- Can’t reach SDA endpoint
- Network timeouts
Solutions:
Check your internet connection
    ping wcc.nrcs.usda.gov
Try a different network (if on a corporate network with a proxy)
    from soildb import SDAClient
    from soildb.base_client import ClientConfig

    # Tune timeout and retries for restrictive networks
    # (this example sets no proxy options)
    config = ClientConfig(
        timeout=30,
        retry_count=3,
        retry_delay=1
    )

    async with SDAClient(config=config) as client:
        result = await client.execute(query)
Increase timeout for slow connections
    async with SDAClient(timeout=60) as client:  # 60 seconds instead of default
        result = await client.execute(query)
Check if SDA is under maintenance
- SDA undergoes daily maintenance from 12:45 AM to 1 AM Central Time
- Check SDA status page
- Try again after maintenance window closes
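The maintenance window above can be checked programmatically before retrying. The following is a minimal sketch using only the standard library; the helper name in_maintenance_window is hypothetical and not part of soildb, and it assumes the window is exactly 12:45 AM to 1:00 AM in the America/Chicago time zone.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")
WINDOW_START = time(0, 45)  # 12:45 AM Central
WINDOW_END = time(1, 0)     # 1:00 AM Central


def in_maintenance_window(now=None):
    """Return True if `now` (a timezone-aware datetime) falls inside the window.

    Hypothetical helper for illustration; defaults to the current time.
    """
    if now is None:
        now = datetime.now(tz=CENTRAL)
    local_time = now.astimezone(CENTRAL).time()
    return WINDOW_START <= local_time < WINDOW_END


# 06:50 UTC on a winter date is 00:50 Central Standard Time
print(in_maintenance_window(datetime(2024, 1, 15, 6, 50, tzinfo=timezone.utc)))  # True
```

If the check returns True, waiting until the window closes is likely more useful than raising timeouts or retry counts.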
Issue: “Request timed out”
Symptoms:
- SDATimeoutError thrown
- Queries taking longer than expected
- Works for simple queries but not complex ones
Solutions:
Increase timeout threshold
    async with SDAClient(timeout=120) as client:
        result = await client.execute(query)
Break query into smaller chunks
    from soildb import fetch_by_keys

    # Instead of fetching all components at once:
    # Bad: Too large for one query
    # all_components = await fetch_by_keys(all_mukeys, "component", "mukey")

    # Good: Break into batches
    batch_size = 100
    all_components = []

    for i in range(0, len(all_mukeys), batch_size):
        batch = all_mukeys[i:i+batch_size]
        components = await fetch_by_keys(batch, "component", "mukey")
        all_components.extend(components)
Simplify your query
    from soildb import Query

    # Bad: Requesting too many columns and complex conditions
    complex_query = Query().select(
        "mukey", "muname", "muacres", "mapunitsymbol", "slopegradient",
        "flodfreqdcd", "comp_id", "coname", "localphase", "comppct_r",
        "taxclname", "chorizonkey", "hzname", "hzdept_r", "hzdepb_r",
        "sandtotal_r", "silttotal_r"
    ).from_("mapunit").inner_join("component").inner_join("chorizon").where(
        "areasymbol IN ('IA001', 'IA003', 'IA005', 'IA007')"
    )

    # Good: Start with essential columns
    simple_query = Query().select(
        "mukey", "muname"
    ).from_("mapunit").where("areasymbol = 'IA001'")
Use pagination for large result sets
    from soildb import Query

    # Fetch 100 rows at a time, advancing the offset until a batch comes back empty
    page_size = 100
    offset = 0
    all_data = []

    while True:
        paginated = Query().select("mukey", "muname").from_("mapunit").limit(page_size).offset(offset)
        batch = await client.execute(paginated)
        if not batch.data:
            break
        all_data.extend(batch.data)
        offset += page_size
Query Issues
Issue: “Query failed: Column ‘X’ does not exist”
Symptoms:
- SDAQueryError with message about invalid column
- Query syntax looks correct
- Works in SSURGO but not in soildb
Solutions:
Check column names (case-sensitive, no spaces)
    # Bad: Wrong column name (contains a space)
    bad_query = Query().select("mu acres").from_("mapunit")

    # Good: Correct SSURGO name
    good_query = Query().select("muacres").from_("mapunit")
Use Query builder for safety
    from soildb import Query

    # Use fluent API instead of raw SQL
    query = (Query()
        .select("mukey", "muname", "muacres")
        .from_("mapunit")
        .where("areasymbol = 'IA109'")
        .limit(10))
Inspect table structure
    from soildb import Query, SDAClient

    # Get sample data to see available columns
    async with SDAClient() as client:
        query = Query().select("*").from_("mapunit").limit(1)
        result = await client.execute(query)
        print(f"Available columns: {result.columns}")
Issue: “No results returned”
Symptoms:
- Query executes successfully but returns empty DataFrame
- No error messages
- Uncertain why no data
Solutions:
Verify area symbol exists
    from soildb import get_sacatalog

    # List all valid survey areas
    catalog = await get_sacatalog()
    df = catalog.to_pandas()
    print(df)

    # Check specific area symbols (filter the DataFrame, not the raw response)
    ia_areas = df[df['areasymbol'].str.startswith('IA')]
Check your WHERE conditions
    from soildb import Query

    # Too restrictive conditions?
    # Bad: Might return nothing if no exact matches
    restrictive = Query().select("*").from_("mapunit").where(
        "areasymbol = 'IA109' AND muacres > 10000"
    )

    # Good: Start broad then filter
    broad = Query().select("mukey", "muname", "muacres").from_("mapunit").where(
        "areasymbol = 'IA109'"
    ).limit(100)

    result = await client.execute(broad)
    print(result.to_pandas())
Verify data exists for your criteria
    from soildb import point_query

    # Check if data exists at a location
    response = await point_query(-93.6, 42.0, "mupolygon")
    if response.data:
        print(f"Found {len(response.data)} mapunits")
    else:
        print("No mapunits found at this location")
Issue: Query too complex for SDA
Symptoms:
- Complex joins cause timeouts
- Multiple table joins fail
- Performance degrades with more conditions
Solutions:
Use existing templates for complex queries
    from soildb import query_templates, fetch_by_keys

    # Use pre-built templates instead of writing complex joins
    query = query_templates.query_mapunits_by_legend("IA109")

    # For components, use fetch_by_keys with specific mukeys
    response = await fetch_by_keys([mukey1, mukey2], "component", key_column="mukey")
Execute separate queries instead of complex joins
    # Bad: Complex multi-table join
    # complex_query = Query().select(...).from_("mapunit").inner_join("component")...

    # Good: Multiple simple queries
    from soildb import Query, fetch_by_keys

    # Step 1: Get mapunits
    mapunits = await client.execute(
        Query().select("mukey", "muname").from_("mapunit").where("areasymbol = 'IA109'")
    )

    # Step 2: Get components for those mapunits
    mukeys = [row['mukey'] for row in mapunits.rows]
    components = await fetch_by_keys(mukeys, "component", "mukey")
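Once map units and components have been fetched separately, they still need to be combined on the client side. The sketch below shows one way to do that in plain Python; attach_components is a hypothetical helper, and it assumes each row is available as a dict with a mukey field. The sample rows are made up for illustration.

```python
from collections import defaultdict


def attach_components(mapunits, components):
    """Group component rows under their parent map unit by mukey.

    Hypothetical helper: both arguments are lists of dict rows.
    """
    by_mukey = defaultdict(list)
    for comp in components:
        by_mukey[comp["mukey"]].append(comp)
    # Map units without components get an empty list
    return [
        {**mu, "components": by_mukey.get(mu["mukey"], [])}
        for mu in mapunits
    ]


# Illustrative rows only, not real SSURGO data
mapunits = [{"mukey": 1, "muname": "Unit A"}, {"mukey": 2, "muname": "Unit B"}]
components = [
    {"mukey": 1, "cokey": 10, "comppct_r": 60},
    {"mukey": 1, "cokey": 11, "comppct_r": 40},
]

combined = attach_components(mapunits, components)
for mu in combined:
    print(mu["muname"], len(mu["components"]))
```

This replaces the server-side join with a dictionary lookup, so each query sent to SDA stays simple.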
Data and Validation Issues
Issue: “SoilProfileCollection validation failed”
Symptoms:
- SPCValidationError when converting response to SoilProfileCollection
- Invalid depth values detected
- Horizon stacking issues
Solutions:
Check for invalid depths
    from soildb.spc_validator import validate_soil_profile_data

    # Validate before converting to SPC
    response = await client.execute(query)
    data = response.to_dict()
    validation_result = validate_soil_profile_data(data)

    if validation_result.is_valid:
        spc = response.to_soilprofilecollection()
    else:
        print("Validation errors:", validation_result.errors)
Review depth value requirements
- Top depth (hzdept_r): Must be numeric and >= 0
- Bottom depth (hzdepb_r): Must be numeric and > top depth
- No “missing” values allowed
    # Check data for common issues
    df = response.to_pandas()

    # Find negative depths
    negative = df[df['hzdept_r'] < 0]
    if not negative.empty:
        print("Found negative depths:", negative)

    # Find invalid stacking (bottom <= top)
    invalid_stack = df[df['hzdepb_r'] <= df['hzdept_r']]
    if not invalid_stack.empty:
        print("Found invalid stacking:", invalid_stack)

    # Find missing values
    missing = df[df['hzdept_r'].isna() | df['hzdepb_r'].isna()]
    if not missing.empty:
        print("Found missing depths:", missing)
Handle gaps between horizons
    # SPC may flag gaps between horizons
    # This is not necessarily an error, but indicates discontinuity
    try:
        spc = response.to_soilprofilecollection()
    except SPCValidationError as e:
        print(f"Validation issues: {e}")
        # May be fixable with SPC preprocessing
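To see where the gaps between horizons are before converting, the depth pairs of a single profile can be compared directly. This is a minimal sketch in plain Python; find_gaps_and_overlaps is a hypothetical helper and is not the check that soildb runs internally. It assumes horizons are given as (top, bottom) depth tuples for one component.

```python
def find_gaps_and_overlaps(horizons):
    """Compare adjacent horizons of one profile.

    horizons: list of (top, bottom) depth tuples.
    Returns (gaps, overlaps), each a list of (start, end) depth ranges.
    Hypothetical helper for illustration.
    """
    ordered = sorted(horizons)
    gaps, overlaps = [], []
    for (_, upper_bottom), (lower_top, _) in zip(ordered, ordered[1:]):
        if lower_top > upper_bottom:
            # Next horizon starts below where the previous one ended
            gaps.append((upper_bottom, lower_top))
        elif lower_top < upper_bottom:
            # Next horizon starts before the previous one ended
            overlaps.append((lower_top, upper_bottom))
    return gaps, overlaps


profile = [(0, 20), (20, 45), (50, 100)]
gaps, overlaps = find_gaps_and_overlaps(profile)
print(gaps)      # [(45, 50)]
print(overlaps)  # []
```

A reported gap such as 45 to 50 points at the specific horizon rows to inspect in the source data.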
Issue: “Type conversion error”
Symptoms:
- TypeError when converting data types
- Can’t convert to expected format
- Column type mismatch
Solutions:
Check type specifications
    from soildb import SDAClient
    from soildb.type_processors import get_type_for_column

    # Verify expected types
    async with SDAClient() as client:
        mukey_type = get_type_for_column("mapunit", "mukey")
        print(f"mukey type: {mukey_type}")
Handle optional fields
    import pandas as pd

    df = response.to_pandas()

    # Some columns might be nullable
    # Convert gracefully
    if 'muacres' in df.columns:
        df['muacres'] = pd.to_numeric(df['muacres'], errors='coerce')
Check data before processing
    # Inspect data types
    df = response.to_pandas()
    print(df.dtypes)

    # Look for unexpected values
    for col in df.columns:
        unique_vals = df[col].unique()
        if len(unique_vals) < 20:
            print(f"{col}: {unique_vals}")
Performance Issues
Issue: “Slow response times”
Symptoms:
- Queries taking > 30 seconds
- High memory usage
- Program becomes unresponsive
Solutions:
Use async/await for concurrent requests
    import asyncio
    from soildb import SDAClient, Query

    async def fetch_multiple_areas(area_symbols):
        """Fetch data for multiple areas concurrently."""
        async with SDAClient() as client:
            tasks = []
            for symbol in area_symbols:
                query = Query().select("*").from_("mapunit").where(
                    f"areasymbol = '{symbol}'"
                )
                tasks.append(client.execute(query))

            # All requests happen concurrently
            results = await asyncio.gather(*tasks)
            return results
Batch large operations
    from soildb import fetch_by_keys

    # Don't fetch too many keys at once
    # Bad (slow): all at once
    # all_results = await fetch_by_keys(keys, "component", "mukey")

    # Good (fast): in batches
    batch_size = 100
    all_results = []

    for i in range(0, len(keys), batch_size):
        batch = keys[i:i+batch_size]
        results = await fetch_by_keys(batch, "component", "mukey")
        all_results.extend(results)
        # min() keeps the count from overshooting on the last, shorter batch
        print(f"Processed {min(i+batch_size, len(keys))}/{len(keys)}")
Cache results when appropriate
    from functools import lru_cache
    from soildb import Query

    @lru_cache(maxsize=128)
    def get_cached_mapunit(mukey):
        """Cache mapunit lookups."""
        # Note: Works with sync API; for async use a different approach
        pass

    # For async, use a dict cache:
    mapunit_cache = {}

    async def get_mapunit_cached(mukey):
        if mukey not in mapunit_cache:
            query = Query().select("*").from_("mapunit").where(f"mukey = {mukey}")
            result = await client.execute(query)
            mapunit_cache[mukey] = result.rows[0]
        return mapunit_cache[mukey]
Select only needed columns
    from soildb import Query

    # Bad (slow): select all columns
    # query = Query().select("*").from_("chorizon").limit(10000)

    # Good (fast): select only needed columns
    # (chkey is the SSURGO key column of the chorizon table)
    query = Query().select("chkey", "cokey", "hzname").from_("chorizon").limit(10000)
Use spatial queries for large regions
    from soildb import bbox_query, spatial_query

    # For large areas, use spatial queries
    # This is faster than area symbol filtering
    response = await bbox_query(
        bbox=(-95, 40, -94, 41),  # (west, south, east, north)
        feature_type="mupolygon"
    )
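When many requests are issued concurrently with asyncio.gather, it can help to cap how many are in flight at once so the service is not flooded. The sketch below uses asyncio.Semaphore from the standard library; gather_limited and fake_query are hypothetical names, and fake_query stands in for a real call such as client.execute(query).

```python
import asyncio


async def gather_limited(coro_factories, max_concurrent=5):
    """Run zero-argument coroutine factories with at most max_concurrent in flight.

    Hypothetical helper; results come back in the same order as the factories.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(f) for f in coro_factories))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_query(symbol):
        # Stand-in for a real request; tracks how many run at the same time
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return symbol

    symbols = [f"IA{n:03d}" for n in range(1, 21, 2)]
    factories = [lambda s=s: fake_query(s) for s in symbols]
    results = await gather_limited(factories, max_concurrent=3)
    return symbols, results, peak


symbols, results, peak = asyncio.run(demo())
print(f"{len(results)} results, peak concurrency {peak}")
```

Passing factories rather than already-created coroutines means no request starts until a semaphore slot is free.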
Issue: “Out of memory”
Symptoms:
- Memory usage grows without limit
- Application crashes
- Large DataFrames cause issues
Solutions:
Process data in chunks
    from soildb import fetch_by_keys

    async def process_large_dataset():
        # Process in chunks instead of loading all at once
        chunk_size = 1000

        for i in range(0, len(all_keys), chunk_size):
            batch = all_keys[i:i+chunk_size]
            results = await fetch_by_keys(batch, "component", "mukey")

            # Process batch immediately
            process_results(results)

            # Don't keep all in memory
            del results
Use generators for streaming
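    async def stream_large_result():
        """Yield rows one at a time instead of handing back the whole result."""
        query = Query().select("mukey", "muname").from_("mapunit")
        response = await client.execute(query)

        # The response itself is still fetched in full; yielding row by row
        # only avoids building additional copies downstream
        for row in response.data:
            yield row
Delete unused DataFrames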
    import gc

    df1 = response1.to_pandas()  # Large DataFrame
    # Process df1
    del df1
    gc.collect()  # Force garbage collection

    df2 = response2.to_pandas()  # Load next dataset
Installation and Import Issues
Issue: “ModuleNotFoundError: No module named ‘soildb’”
Symptoms:
- Import fails on fresh installation
- Can’t find soildb package
- Virtual environment not activated
Solutions:
Install the package
    pip install soildb

    # or from a development checkout
    pip install -e .
Activate virtual environment
    # Linux/Mac
    source venv/bin/activate

    # Windows
    venv\Scripts\activate
Check Python path
    import sys
    print(sys.path)

    # soildb should be in one of these paths
    import soildb
    print(soildb.__file__)
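A common cause of this error is that pip installed into one interpreter while the script runs under another. The following sketch, using only the standard library, reports which interpreter is running and where it would import a package from, without actually importing it; locate_package is a hypothetical helper name.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file the current interpreter would import `name` from, or None.

    Hypothetical helper for illustration; does not import the package.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print("Interpreter:", sys.executable)
print("soildb found at:", locate_package("soildb"))
```

If the second line prints None, compare the interpreter path with the one that `pip --version` reports.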
Issue: “Optional dependency not installed”
Symptoms:
- ImportError: pandas not found, even though the feature you called requires pandas
- Some features not available
Solutions:
Install optional dependencies
    # Quotes keep shells such as zsh from interpreting the brackets

    # With pandas support
    pip install "soildb[pandas]"

    # With geopandas (includes pandas)
    pip install "soildb[geopandas]"

    # With polars
    pip install "soildb[polars]"

    # All extras
    pip install "soildb[all]"
Check what’s installed
    pip list | grep -E 'soildb|pandas|polars|geo'
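Where grep is not available (for example in a default Windows shell), the same check can be done from Python. This is a small sketch using importlib.metadata from the standard library; installed_versions is a hypothetical helper, and the package names are the ones mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["soildb", "pandas", "geopandas", "polars"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

The output can be pasted directly into a bug report alongside the Python version.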
Getting Help
If you encounter issues not covered here:
Check the test suite for examples
- Look in the tests/ directory for test cases similar to your issue
- See how tests handle errors and edge cases
Check existing issues on GitHub
- Search py-soildb issues
- Filter by labels like “bug”, “question”, “help wanted”
Run in debug mode
    import logging

    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("soildb")
    logger.setLevel(logging.DEBUG)

    # Now run your code with detailed logging
    result = await client.execute(query)
Create a minimal reproducible example
    import asyncio
    from soildb import SDAClient, Query

    async def minimal_example():
        async with SDAClient() as client:
            query = Query().select("mukey").from_("mapunit").limit(1)
            result = await client.execute(query)
            return result.to_dict()

    asyncio.run(minimal_example())
Report the issue with:
- Your Python version: python --version
- soildb version: pip show soildb
- Operating system
- Minimal reproducible code
- Full error traceback
- What you expected vs. what happened