Troubleshooting

Solutions to common issues with py-soildb

Connectivity Issues

Issue: “Failed to connect to USDA Soil Data Access service”

Symptoms:

  • SDAConnectionError raised
  • Can’t reach the SDA endpoint
  • Network timeouts

Solutions:

  1. Check your internet connection

    ping sdmdataaccess.sc.egov.usda.gov
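  2. Try a different network (corporate proxies and firewalls are a common cause), or give the client more room on unreliable connections

    import asyncio
    from soildb import SDAClient, Query
    from soildb.base_client import ClientConfig
    
    # Longer timeout plus retries for slow or flaky networks.
    # (This config does not set a proxy; ask your network administrator
    # how outbound HTTPS traffic is routed.)
    config = ClientConfig(
        timeout=30,
        retry_count=3,
        retry_delay=1
    )
    
    async def main():
        query = Query().select("mukey").from_("mapunit").limit(1)
        async with SDAClient(config=config) as client:
            return await client.execute(query)
    
    result = asyncio.run(main())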
  3. Increase timeout for slow connections

    async with SDAClient(timeout=60) as client:  # 60 seconds instead of default
        result = await client.execute(query)
  4. Check if SDA is under maintenance

    • SDA undergoes daily maintenance from 12:45 AM to 1 AM Central Time
    • Check SDA status page
    • Try again after maintenance window closes
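
If failures are intermittent rather than constant, retrying with exponential backoff often helps. The `retry_count` and `retry_delay` settings of `ClientConfig` may already cover simple cases; the sketch below is plain asyncio for when you want control over the schedule. `with_retries` is a hypothetical helper, not part of soildb.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await make_call(), retrying on `retry_on` errors with exponential backoff.

    Hypothetical helper (not a soildb API). `make_call` must create a fresh
    awaitable on every call, because a coroutine can only be awaited once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            # Wait base_delay, then 2x, 4x, ... plus a little random jitter
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, base_delay / 10))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, `await with_retries(lambda: client.execute(query), retry_on=(SDAConnectionError,))`, assuming `SDAConnectionError` can be imported from your soildb version. The lambda matters: each attempt needs a new coroutine.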

Issue: “Request timed out”

Symptoms:

  • SDATimeoutError raised
  • Queries taking longer than expected
  • Works for simple queries but not complex ones

Solutions:

  1. Increase timeout threshold

    async with SDAClient(timeout=120) as client:
        result = await client.execute(query)
  2. Break query into smaller chunks

    from soildb import fetch_by_keys
    
    # Instead of fetching all components at once:
    # Bad: Too large for one query
    # all_components = await fetch_by_keys(mukeys, "component", "mukey")
    
    # Good: Break into batches
    batch_size = 100
    all_components = []
    for i in range(0, len(mukeys), batch_size):
        batch = mukeys[i:i+batch_size]
        components = await fetch_by_keys(batch, "component", "mukey")
        all_components.extend(components)
  3. Simplify your query

    from soildb import Query
    
    # Bad: Requesting many columns across three joined tables and several areas
    complex_query = Query().select(
        "mukey", "muname", "muacres", "musym",
        "cokey", "compname", "localphase", "comppct_r", "taxclname",
        "chkey", "hzname", "hzdept_r", "hzdepb_r", "sandtotal_r", "silttotal_r"
    ).from_("mapunit").inner_join("component").inner_join("chorizon").where(
        "areasymbol IN ('IA001', 'IA003', 'IA005', 'IA007')"
    )
    
    # Good: Start with essential columns from a single table
    # (areasymbol is stored in the legend table, so filter mapunits via lkey)
    simple_query = Query().select(
        "mukey", "muname"
    ).from_("mapunit").where(
        "lkey IN (SELECT lkey FROM legend WHERE areasymbol = 'IA001')"
    )
  4. Use pagination for large result sets

    from soildb import Query
    
    # Fetch one page at a time until a short (or empty) page comes back.
    # Note: paging without a stable sort order can repeat or skip rows.
    page_size = 100
    offset = 0
    all_data = []
    while True:
        paginated = (Query().select("mukey", "muname").from_("mapunit")
                     .limit(page_size).offset(offset))
        batch = await client.execute(paginated)
        rows = batch.data
        all_data.extend(rows)
        if len(rows) < page_size:
            break  # no more rows
        offset += page_size
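
The batching loops in this guide all use the same slicing arithmetic. A tiny generator keeps it in one place; `chunked` is a hypothetical helper name, not a soildb function. (On Python 3.12+, `itertools.batched` does much the same but yields tuples.)

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items; the last may be shorter.

    Hypothetical helper, not part of soildb.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Usage inside an async function: `for batch in chunked(mukeys, 100): components = await fetch_by_keys(batch, "component", "mukey")`.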

Query Issues

Issue: “Query failed: Column ‘X’ does not exist”

Symptoms:

  • SDAQueryError with message about invalid column
  • Query syntax looks correct
  • Works in SSURGO but not in soildb

Solutions:

  1. Check column names (case-sensitive, no spaces)

    from soildb import Query
    
    # Bad: Wrong column name (mapunit has no "mu_acres" column)
    bad_query = Query().select("mu_acres").from_("mapunit")
    
    # Good: Correct SSURGO column name
    good_query = Query().select("muacres").from_("mapunit")
  2. Use Query builder for safety

    from soildb import Query
    
    # Use fluent API instead of raw SQL
    query = (Query()
        .select("mukey", "muname", "muacres")
        .from_("mapunit")
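        # areasymbol lives in legend; mapunit links to it through lkey
        .where("lkey IN (SELECT lkey FROM legend WHERE areasymbol = 'IA109')")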
        .limit(10))
  3. Inspect table structure

    from soildb import Query, SDAClient
    
    # Get sample data to see available columns
    async with SDAClient() as client:
        query = Query().select("*").from_("mapunit").limit(1)
        result = await client.execute(query)
        print(f"Available columns: {result.columns}")
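
Before sending a query, a quick local check can catch misspelled column names. This sketch uses `difflib.get_close_matches` from the standard library to suggest alternatives. `check_columns` and the `MAPUNIT_COLUMNS` subset are illustrative assumptions, not soildb features; in practice, take the column list from the SSURGO metadata or from the sample query above.

```python
import difflib
from typing import Dict, Iterable, List

# Illustrative subset of SSURGO `mapunit` columns, not the full table;
# consult the SSURGO metadata for the authoritative list.
MAPUNIT_COLUMNS = ["mukey", "musym", "muname", "mukind", "muacres", "lkey"]


def check_columns(requested: Iterable[str], known: List[str]) -> Dict[str, List[str]]:
    """Return {bad_name: [suggestions]} for names that are not exact matches.

    Hypothetical helper. Suggestions are matched case-insensitively so that
    e.g. "MUKEY" is flagged and "mukey" is offered.
    """
    by_lower = {name.lower(): name for name in known}
    problems: Dict[str, List[str]] = {}
    for name in requested:
        if name in known:
            continue  # exact match, nothing to report
        close = difflib.get_close_matches(name.lower(), list(by_lower), n=3, cutoff=0.6)
        problems[name] = [by_lower[match] for match in close]
    return problems
```

For example, `check_columns(["mukey", "mu_acres"], MAPUNIT_COLUMNS)` reports only `"mu_acres"` and suggests `"muacres"`.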

Issue: “No results returned”

Symptoms:

  • Query executes successfully but returns empty DataFrame
  • No error messages
  • Uncertain why no data

Solutions:

  1. Verify area symbol exists

    from soildb import get_sacatalog
    
    # List all valid survey areas
    catalog = await get_sacatalog()
    df = catalog.to_pandas()
    print(df)
    
    # Check for area symbols in a specific state (e.g. Iowa)
    ia_areas = df[df['areasymbol'].str.startswith('IA')]
  2. Check your WHERE conditions

    from soildb import Query
    
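    # Too restrictive conditions?
    # Bad: Might return nothing if no map unit matches every condition
    restrictive = Query().select("*").from_("mapunit").where(
        "lkey IN (SELECT lkey FROM legend WHERE areasymbol = 'IA109') "
        "AND muacres > 10000"
    )
    
    # Good: Start broad, inspect the results, then tighten the filter
    # (areasymbol is stored in legend, so mapunits are matched via lkey)
    broad = Query().select("mukey", "muname", "muacres").from_("mapunit").where(
        "lkey IN (SELECT lkey FROM legend WHERE areasymbol = 'IA109')"
    ).limit(100)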
    result = await client.execute(broad)
    print(result.to_pandas())
  3. Verify data exists for your criteria

    from soildb import point_query
    
    # Check if data exists at a location
    response = await point_query(-93.6, 42.0, "mupolygon")
    if response.data:
        print(f"Found {len(response.data)} mapunits")
    else:
        print("No mapunits found at this location")
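
A frequent cause of empty or failing mapunit queries is filtering on `areasymbol`, which is stored in the SSURGO `legend` table rather than in `mapunit`. The hypothetical helper below normalizes symbols (trims whitespace, uppercases) and builds a subquery filter through `lkey`. The two-letters-plus-three-digits pattern is an assumption about the symbol format; loosen it if your area symbols differ.

```python
import re
from typing import Iterable

# Assumption: survey area symbols look like "IA109" (two letters, three digits).
AREASYMBOL_PATTERN = re.compile(r"[A-Z]{2}[0-9]{3}")


def areasymbol_where(symbols: Iterable[str]) -> str:
    """Build a mapunit WHERE clause for one or more survey area symbols.

    Hypothetical helper. `areasymbol` is a column of `legend`, so mapunits are
    matched through `lkey`. Validating the symbols also keeps stray quotes
    out of the SQL string.
    """
    quoted = []
    for raw in symbols:
        symbol = raw.strip().upper()
        if not AREASYMBOL_PATTERN.fullmatch(symbol):
            raise ValueError(f"unexpected area symbol: {raw!r}")
        quoted.append(f"'{symbol}'")
    if not quoted:
        raise ValueError("no area symbols given")
    return (
        "lkey IN (SELECT lkey FROM legend "
        f"WHERE areasymbol IN ({', '.join(quoted)}))"
    )
```

Usage: `Query().select("mukey", "muname").from_("mapunit").where(areasymbol_where(["ia109 "]))`.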

Issue: Query too complex for SDA

Symptoms:

  • Complex joins cause timeouts
  • Multiple table joins fail
  • Performance degrades with more conditions

Solutions:

  1. Use existing templates for complex queries

    from soildb import query_templates, fetch_by_keys
    
    # Use pre-built templates instead of writing complex joins
    query = query_templates.query_mapunits_by_legend("IA109")
    
    # For components, use fetch_by_keys with specific mukeys
    response = await fetch_by_keys([mukey1, mukey2], "component", key_column="mukey")
  2. Execute separate queries instead of complex joins

    # Bad: Complex multi-table join
    # complex_query = Query().select(...).from_("mapunit").inner_join("component")...
    
    # Good: Multiple simple queries
    from soildb import Query, fetch_by_keys
    
    # Step 1: Get mapunits (areasymbol is in legend, linked through lkey)
    mapunits = await client.execute(
        Query().select("mukey", "muname").from_("mapunit").where(
            "lkey IN (SELECT lkey FROM legend WHERE areasymbol = 'IA109')"
        )
    )
    
    # Step 2: Get components for those mapunits
    mukeys = [row['mukey'] for row in mapunits.rows]
    components = await fetch_by_keys(mukeys, "component", "mukey")
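
After running separate queries, you can combine the results in Python instead of asking SDA to join them. This sketch assumes rows are plain dicts with a `mukey` key (adapt it to however your response exposes rows); `attach_components` is a hypothetical name. With pandas, `DataFrame.merge(..., on="mukey")` does the same job.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def attach_components(mapunits: List[Row], components: List[Row]) -> List[Row]:
    """Nest component rows under their parent mapunit, matched on `mukey`.

    Hypothetical helper. Keys are compared as strings so that 123 and "123"
    match. Mapunits without components get an empty list.
    """
    by_mukey: Dict[str, List[Row]] = defaultdict(list)
    for comp in components:
        by_mukey[str(comp["mukey"])].append(comp)
    return [
        {**mu, "components": by_mukey.get(str(mu["mukey"]), [])}
        for mu in mapunits
    ]
```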

Data and Validation Issues

Issue: “SoilProfileCollection validation failed”

Symptoms:

  • SPCValidationError when converting response to SoilProfileCollection
  • Invalid depth values detected
  • Horizon stacking issues

Solutions:

  1. Check for invalid depths

    from soildb.spc_validator import validate_soil_profile_data
    
    # Validate before converting to SPC
    response = await client.execute(query)
    data = response.to_dict()
    
    validation_result = validate_soil_profile_data(data)
    if validation_result.is_valid:
        spc = response.to_soilprofilecollection()
    else:
        print("Validation errors:", validation_result.errors)
  2. Review depth value requirements

    • Top depth (hzdept_r): Must be numeric and >= 0
    • Bottom depth (hzdepb_r): Must be numeric and > top depth
    • No missing values allowed

    # Check data for common issues
    df = response.to_pandas()
    
    # Find negative depths
    negative = df[df['hzdept_r'] < 0]
    if not negative.empty:
        print("Found negative depths:", negative)
    
    # Find invalid stacking (bottom <= top)
    invalid_stack = df[df['hzdepb_r'] <= df['hzdept_r']]
    if not invalid_stack.empty:
        print("Found invalid stacking:", invalid_stack)
    
    # Find missing values
    missing = df[df['hzdept_r'].isna() | df['hzdepb_r'].isna()]
    if not missing.empty:
        print("Found missing depths:", missing)
  3. Handle gaps between horizons

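    # SPC may flag gaps between horizons.
    # A gap is not necessarily an error, but it does indicate a depth discontinuity.
    # SPCValidationError must be imported from soildb first (check your version
    # for the module that defines it).
    try:
        spc = response.to_soilprofilecollection()
    except SPCValidationError as e:
        print(f"Validation issues: {e}")
        # Correcting the reported depths before converting may resolve the error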
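
To see every depth problem at once instead of stopping at the first exception, a plain-Python checker can walk each profile. This is a sketch, not soildb's validator: `check_horizons` is hypothetical, rows are assumed to be dicts with `cokey`, `hzname`, `hzdept_r`, and `hzdepb_r`, and it reports both gaps and overlaps as problems, whereas the real validator may treat gaps differently.

```python
import math
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def _is_missing(value: Any) -> bool:
    """True for None or a float NaN (how pandas represents missing numbers)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_horizons(rows: List[Row], profile_key: str = "cokey") -> List[str]:
    """List depth problems per profile; an empty list means none were found."""
    profiles: Dict[Any, List[Row]] = {}
    for row in rows:
        profiles.setdefault(row[profile_key], []).append(row)

    problems: List[str] = []
    for key, horizons in profiles.items():
        intervals: List[Tuple[float, float]] = []
        for hz in horizons:
            top, bottom = hz.get("hzdept_r"), hz.get("hzdepb_r")
            if _is_missing(top) or _is_missing(bottom):
                problems.append(f"{key}: missing depth in horizon {hz.get('hzname')}")
            elif top < 0:
                problems.append(f"{key}: negative top depth {top}")
            elif bottom <= top:
                problems.append(f"{key}: bottom {bottom} <= top {top}")
            else:
                intervals.append((top, bottom))
        # Compare each valid horizon with the next one down
        intervals.sort()
        for (_, upper_bottom), (lower_top, _) in zip(intervals, intervals[1:]):
            if lower_top > upper_bottom:
                problems.append(f"{key}: gap between {upper_bottom} and {lower_top}")
            elif lower_top < upper_bottom:
                problems.append(f"{key}: overlap between {lower_top} and {upper_bottom}")
    return problems
```

With a DataFrame, `check_horizons(df.to_dict("records"))` produces the same report.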

Issue: “Type conversion error”

Symptoms:

  • TypeError when converting data types
  • Can’t convert to expected format
  • Column type mismatch

Solutions:

  1. Check type specifications

    from soildb.type_processors import get_type_for_column
    
    # Verify expected types
    mukey_type = get_type_for_column("mapunit", "mukey")
    print(f"mukey type: {mukey_type}")
  2. Handle optional fields

    import pandas as pd
    
    df = response.to_pandas()
    
    # Some columns might be nullable
    # Convert gracefully
    if 'muacres' in df.columns:
        df['muacres'] = pd.to_numeric(df['muacres'], errors='coerce')
  3. Check data before processing

    # Inspect data types
    df = response.to_pandas()
    print(df.dtypes)
    
    # Look for unexpected values
    for col in df.columns:
        unique_vals = df[col].unique()
        if len(unique_vals) < 20:
            print(f"{col}: {unique_vals}")
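
If you work with raw rows rather than a DataFrame, the "coerce instead of fail" idea behind `pd.to_numeric(..., errors='coerce')` can be applied value by value. `to_float` is a hypothetical helper, not part of soildb.

```python
from typing import Any, Optional


def to_float(value: Any) -> Optional[float]:
    """Convert a value to float, returning None for blanks or unparseable input.

    Hypothetical helper; similar in spirit to pandas.to_numeric(errors="coerce")
    but for a single value.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    text = str(value).strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None
```

For example, `[to_float(row.get("muacres")) for row in rows]` turns a column of mixed strings and blanks into floats and `None`.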

Performance Issues

Issue: “Slow response times”

Symptoms:

  • Queries taking > 30 seconds
  • High memory usage
  • Program becomes unresponsive

Solutions:

  1. Use async/await for concurrent requests

    import asyncio
    from soildb import SDAClient, Query
    
    async def fetch_multiple_areas(area_symbols):
        """Fetch data for multiple areas concurrently."""
        async with SDAClient() as client:
            tasks = []
            for symbol in area_symbols:
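                # areasymbol is in legend; mapunit links to it through lkey
                query = Query().select("*").from_("mapunit").where(
                    "lkey IN (SELECT lkey FROM legend "
                    f"WHERE areasymbol = '{symbol}')"
                )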
                tasks.append(client.execute(query))
    
            # All requests happen concurrently
            results = await asyncio.gather(*tasks)
            return results
  2. Batch large operations

    from soildb import fetch_by_keys
    
    # Don't fetch too many keys at once
    # Bad: Slow, all keys (e.g. 10,000) in a single request
    # all_results = await fetch_by_keys(keys, "component", "mukey")
    
    # Good: In batches of 100 keys
    batch_size = 100
    all_results = []
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i+batch_size]
        results = await fetch_by_keys(batch, "component", "mukey")
        all_results.extend(results)
        print(f"Processed {min(i + batch_size, len(keys))}/{len(keys)}")
  3. Cache results when appropriate

    from soildb import Query
    
    # functools.lru_cache does not work for async functions: it caches the
    # coroutine object, which can only be awaited once.
    # For async code, use a plain dict cache instead:
    mapunit_cache = {}
    
    async def get_mapunit_cached(client, mukey):
        if mukey not in mapunit_cache:
            query = Query().select("*").from_("mapunit").where(f"mukey = {mukey}")
            result = await client.execute(query)
            mapunit_cache[mukey] = result.rows[0]
        return mapunit_cache[mukey]
  4. Select only needed columns

    from soildb import Query
    
    # Bad: Slow, selects every column
    # query = Query().select("*").from_("chorizon").limit(10000)
    
    # Good: Faster, selects only the columns you need
    query = Query().select("chkey", "cokey", "hzname").from_("chorizon").limit(10000)
  5. Use spatial queries for large regions

    from soildb import bbox_query
    
    # For large areas, a bounding-box query can be faster
    # than filtering by many area symbols
    response = await bbox_query(
        bbox=(-95, 40, -94, 41),  # (west, south, east, north)
        feature_type="mupolygon"
    )
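
`asyncio.gather` in item 1 starts every request at once. For long lists of area symbols it can be gentler on SDA, and on your own network, to cap how many requests are in flight. This sketch uses `asyncio.Semaphore`; `gather_limited` is hypothetical, and the default limit of 4 is an arbitrary choice, not a documented SDA limit.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    factories: Iterable[Callable[[], Awaitable[T]]],
    limit: int = 4,
) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time.

    Hypothetical helper. Results come back in the same order as `factories`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        # Only `limit` tasks can hold the semaphore at once; the rest wait here
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories))
```

Pass factories, not coroutines, for example `await gather_limited([lambda s=s: client.execute(make_query(s)) for s in symbols])`, where `make_query` is your own (hypothetical) function. The `s=s` default binds each symbol when the lambda is created.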

Issue: “Out of memory”

Symptoms:

  • Memory usage grows without limit
  • Application crashes
  • Large DataFrames cause issues

Solutions:

  1. Process data in chunks

    from soildb import fetch_by_keys
    
    async def process_large_dataset(all_keys, process_results):
        # Process in chunks instead of loading all at once.
        # all_keys: the keys to fetch; process_results: your own batch handler
        chunk_size = 1000
    
        for i in range(0, len(all_keys), chunk_size):
            batch = all_keys[i:i+chunk_size]
            results = await fetch_by_keys(batch, "component", "mukey")
    
            # Process batch immediately
            process_results(results)
    
            # Don't keep all in memory
            del results
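  2. Iterate over rows with a generator

    from soildb import Query
    
    async def iter_mapunit_rows(client):
        """Yield rows one at a time.
    
        The full response is still downloaded and held in memory; iterating
        this way only avoids building additional large copies (e.g. DataFrames).
        """
        query = Query().select("mukey", "muname").from_("mapunit")
        response = await client.execute(query)
    
        # Process row by row
        for row in response.data:
            yield row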
  3. Delete unused DataFrames

    import gc
    
    df1 = response1.to_pandas()  # Large DataFrame
    # Process df1
    del df1
    gc.collect()  # Force garbage collection
    
    df2 = response2.to_pandas()  # Load next dataset
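
The chunked loop in item 1 can also be written as an async generator, so the caller handles each batch as it arrives and nothing accumulates. `iter_batches` is a hypothetical helper; `fetch` stands for any coroutine function that takes a list of keys, for example `lambda batch: fetch_by_keys(batch, "component", "mukey")`.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Sequence, TypeVar

K = TypeVar("K")
R = TypeVar("R")


async def iter_batches(
    keys: Sequence[K],
    fetch: Callable[[List[K]], Awaitable[R]],
    batch_size: int = 1000,
) -> AsyncIterator[R]:
    """Fetch `keys` in batches and yield each result as soon as it arrives.

    Hypothetical helper: the caller only needs to hold one batch at a time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keys), batch_size):
        yield await fetch(list(keys[start:start + batch_size]))


if __name__ == "__main__":
    async def demo() -> None:
        async def fake_fetch(batch: List[int]) -> List[int]:
            return [k * 10 for k in batch]  # stand-in for a real request

        async for result in iter_batches(list(range(25)), fake_fetch, batch_size=10):
            print(len(result))  # process the batch, then let it go

    asyncio.run(demo())
```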

Installation and Import Issues

Issue: “ModuleNotFoundError: No module named ‘soildb’”

Symptoms:

  • Import fails on fresh installation
  • Can’t find soildb package
  • Virtual environment not activated

Solutions:

  1. Install the package

    pip install soildb
    # or, for development, from a local clone of the repository
    pip install -e .
  2. Activate virtual environment

    # Linux/Mac
    source venv/bin/activate
    
    # Windows
    venv\Scripts\activate
  3. Check Python path

    import sys
    print(sys.path)
    
    # soildb should be in one of these paths
    import soildb
    print(soildb.__file__)

Issue: “Optional dependency not installed”

Symptoms:

  • ImportError for an optional package such as pandas
  • Some features not available

Solutions:

  1. Install optional dependencies

    # With pandas support
    # (quote the argument so shells such as zsh don't treat the brackets as a glob)
    pip install "soildb[pandas]"
    
    # With geopandas (includes pandas)
    pip install "soildb[geopandas]"
    
    # With polars
    pip install "soildb[polars]"
    
    # All extras
    pip install "soildb[all]"
  2. Check what’s installed

    pip list | grep -E 'soildb|pandas|polars|geo'
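
`pip list | grep` works in Unix-like shells and reports what pip sees, which is not always the interpreter you actually run. Checking from Python itself avoids both issues: `importlib.util.find_spec` locates a top-level module without importing it. `available_modules` is a hypothetical helper name.

```python
import importlib.util
from typing import Dict, Iterable


def available_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules can be imported, without importing them.

    Hypothetical helper. Pass top-level names only (e.g. "pandas", not
    "pandas.io"), since find_spec imports parent packages of dotted names.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_modules(["soildb", "pandas", "polars", "geopandas"]).items():
        print(f"{name:10s} {'installed' if ok else 'missing'}")
```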

Getting Help

If you encounter issues not covered here:

  1. Check the test suite for examples

    • Look in tests/ directory for test cases similar to your issue
    • See how tests handle errors and edge cases
  2. Check existing issues on GitHub

    • Search py-soildb issues
    • Filter by labels like “bug”, “question”, “help wanted”
  3. Run in debug mode

    import logging
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("soildb")
    logger.setLevel(logging.DEBUG)
    
    # Now run your code with detailed logging
    result = await client.execute(query)
  4. Create a minimal reproducible example

    import asyncio
    from soildb import SDAClient, Query
    
    async def minimal_example():
        async with SDAClient() as client:
            query = Query().select("mukey").from_("mapunit").limit(1)
            result = await client.execute(query)
            return result.to_dict()
    
    asyncio.run(minimal_example())
  5. Report the issue with:

    • Your Python version: python --version
    • soildb version: pip show soildb
    • Operating system
    • Minimal reproducible code
    • Full error traceback
    • What you expected vs. what happened
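
The version details above can be collected in one step with the standard library. `environment_report` is a hypothetical helper; `importlib.metadata.version` looks up the installed version of a distribution, and `platform.platform()` describes the operating system.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def environment_report(
    packages: Iterable[str] = ("soildb", "pandas", "polars", "geopandas"),
) -> Dict[str, str]:
    """Collect version details worth pasting into a bug report (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```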