Troubleshooting

Solutions to common issues with py-soildb

Connectivity Issues

Issue: “Failed to connect to USDA Soil Data Access service”

Symptoms:
- SDAConnectionError raised
- Can’t reach SDA endpoint
- Network timeouts

Solutions:

Check your internet connection
```
ping sdmdataaccess.sc.egov.usda.gov
```
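
Some networks block ICMP, so a failed ping does not always mean the service is unreachable. The sketch below checks whether a TCP connection can be opened instead, using only the standard library. The helper name `can_connect` is hypothetical and not part of soildb, and the host in the usage comment is an assumption about which endpoint you want to test.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


# Example (HTTPS uses port 443; replace the host with the endpoint you use):
# can_connect("sdmdataaccess.sc.egov.usda.gov")
```

A `False` result on one network and `True` on another points to a local firewall or proxy rather than a service outage.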

Try a different network (if on corporate network with proxy)

from soildb import SDAClient
from soildb.base_client import ClientConfig

# Allow more time per request and retry transient failures
config = ClientConfig(
    timeout=30,
    retry_count=3,
    retry_delay=1
)
async with SDAClient(config=config) as client:
    result = await client.execute(query)
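
How soildb applies `retry_count` and `retry_delay` internally is not shown here. As a rough illustration of the general idea, the following sketch retries an async call with exponential backoff. `retry_async` and its parameters are hypothetical names, not part of the soildb API.

```python
import asyncio


async def retry_async(make_call, retries=3, base_delay=1.0):
    """Await make_call(), retrying connection errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except (ConnectionError, TimeoutError, asyncio.TimeoutError):
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Passing a zero-argument callable (for example `lambda: client.execute(query)`) lets the helper create a fresh coroutine for each attempt.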

Increase timeout for slow connections

async with SDAClient(timeout=60) as client:  # 60 seconds instead of default
    result = await client.execute(query)

Check if SDA is under maintenance
- SDA undergoes daily maintenance from 12:45 AM to 1 AM Central Time
- Check SDA status page
- Try again after maintenance window closes

Issue: “Request timed out”

Symptoms:
- SDATimeoutError raised
- Queries taking longer than expected
- Works for simple queries but not complex ones

Solutions:

Increase timeout threshold

async with SDAClient(timeout=120) as client:
    result = await client.execute(query)
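
Independently of the client timeout, an overall deadline can be placed around any awaitable with `asyncio.wait_for`. This is a minimal sketch; `run_with_deadline` is a hypothetical helper, and returning `None` on timeout is a design choice for illustration only.

```python
import asyncio


async def run_with_deadline(coro, seconds):
    """Await coro, returning None if it does not finish within the deadline."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the pending coroutine before raising
        return None
```

A caller could then write `result = await run_with_deadline(client.execute(query), 120)` and fall back to a smaller query when the result is `None`.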

Break query into smaller chunks

from soildb import fetch_by_keys

# Instead of fetching all components at once:
# Bad: Too large for one query
# all_components = await fetch_by_keys(mukeys, "component", "mukey")

# Good: Break into batches
batch_size = 100
all_components = []
for i in range(0, len(mukeys), batch_size):
    batch = mukeys[i:i+batch_size]
    components = await fetch_by_keys(batch, "component", "mukey")
    all_components.extend(components)
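
The slicing logic can be factored into a small reusable helper. This is a sketch using only the standard library; `chunked` is a hypothetical name, and the integer keys are stand-ins for real mukeys.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


mukeys = list(range(250))  # stand-in keys for illustration only
print([len(batch) for batch in chunked(mukeys, 100)])  # [100, 100, 50]
```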

Simplify your query

from soildb import Query

# Bad: Requesting too many columns and complex conditions
complex_query = Query().select(
    "mukey", "muname", "muacres", "mapunitsymbol",
    "slopegradient", "flodfreqdcd", "comp_id", "coname",
    "localphase", "comppct_r", "taxclname", "chorizonkey",
    "hzname", "hzdept_r", "hzdepb_r", "sandtotal_r", "silttotal_r"
).from_("mapunit").inner_join("component").inner_join("chorizon").where(
    "areasymbol IN ('IA001', 'IA003', 'IA005', 'IA007')"
)

# Good: Start with essential columns
simple_query = Query().select(
    "mukey", "muname"
).from_("mapunit").where("areasymbol = 'IA001'")

Use pagination for large result sets

from soildb import Query

# Fetch 100 rows at a time, advancing the offset until a page comes back empty
page_size = 100
offset = 0
all_data = []
while True:
    paginated = Query().select("mukey", "muname").from_("mapunit").limit(page_size).offset(offset)
    batch = await client.execute(paginated)
    if not batch.data:  # an empty page means no rows remain
        break
    all_data.extend(batch.data)
    offset += page_size
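
The paging loop can also be written as an async generator that stops on a short or empty page, so no total row count is needed up front. This sketch uses a fake in-memory fetch function; `paginate` and `fake_fetch` are hypothetical names. Note that offset-based paging on SQL Server generally needs a stable sort order to return consistent pages; whether the query builder adds one is not covered here.

```python
import asyncio


async def paginate(fetch_page, page_size=100):
    """Yield rows page by page until a page comes back short or empty."""
    offset = 0
    while True:
        rows = await fetch_page(offset, page_size)
        for row in rows:
            yield row
        if len(rows) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size


async def demo():
    data = list(range(250))  # stand-in for rows held by the server

    async def fake_fetch(offset, limit):
        return data[offset:offset + limit]

    return [row async for row in paginate(fake_fetch)]


print(len(asyncio.run(demo())))  # 250
```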

Query Issues

Issue: “Query failed: Column ‘X’ does not exist”

Symptoms:
- SDAQueryError with message about invalid column
- Query syntax looks correct
- Works in SSURGO but not in soildb

Solutions:

Check column names (case-sensitive, no spaces)

from soildb import Query

# Bad: Column name contains a space
bad_query = Query().select("mu acres").from_("mapunit")

# Good: Correct SSURGO name
good_query = Query().select("muacres").from_("mapunit")

Use Query builder for safety

from soildb import Query

# Use fluent API instead of raw SQL
query = (Query()
    .select("mukey", "muname", "muacres")
    .from_("mapunit")
    .where("areasymbol = 'IA109'")
    .limit(10))

Inspect table structure

from soildb import Query, SDAClient

# Get sample data to see available columns
async with SDAClient() as client:
    query = Query().select("*").from_("mapunit").limit(1)
    result = await client.execute(query)
    print(f"Available columns: {result.columns}")
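
Once the available columns are known, misspelled names can be caught before a query is sent. The sketch below uses `difflib` from the standard library to suggest the closest valid name. `suggest_columns` is a hypothetical helper, and the column list is a small illustrative sample rather than the full table definition.

```python
import difflib


def suggest_columns(requested, available):
    """Map each requested name missing from available to its closest match, or None."""
    suggestions = {}
    for name in requested:
        if name not in available:
            close = difflib.get_close_matches(name, available, n=1)
            suggestions[name] = close[0] if close else None
    return suggestions


available = ["mukey", "muname", "muacres", "musym"]
print(suggest_columns(["mukey", "muacre", "xyz"], available))
# {'muacre': 'muacres', 'xyz': None}
```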

Issue: “No results returned”

Symptoms:
- Query executes successfully but returns empty DataFrame
- No error messages
- Uncertain why no data

Solutions:

Verify area symbol exists

from soildb import get_sacatalog

# List all valid survey areas
catalog = await get_sacatalog()
catalog_df = catalog.to_pandas()
print(catalog_df)

# Check for area symbols in one state (Iowa symbols start with 'IA')
ia_areas = catalog_df[catalog_df['areasymbol'].str.startswith('IA')]
print(ia_areas)

Check your WHERE conditions

from soildb import Query

# Too restrictive conditions?
# Bad: Might return nothing if no exact matches
restrictive = Query().select("*").from_("mapunit").where(
    "areasymbol = 'IA109' AND muacres > 10000"
)

# Good: Start broad then filter
broad = Query().select("mukey", "muname", "muacres").from_("mapunit").where(
    "areasymbol = 'IA109'"
).limit(100)
result = await client.execute(broad)
print(result.to_pandas())

Verify data exists for your criteria

from soildb import point_query

# Check if data exists at a location
response = await point_query(-93.6, 42.0, "mupolygon")
if response.data:
    print(f"Found {len(response.data)} mapunits")
else:
    print("No mapunits found at this location")

Issue: Query too complex for SDA

Symptoms:
- Complex joins cause timeouts
- Multiple table joins fail
- Performance degrades with more conditions

Solutions:

Use existing templates for complex queries

from soildb import query_templates, fetch_by_keys

# Use pre-built templates instead of writing complex joins
query = query_templates.query_mapunits_by_legend("IA109")

# For components, use fetch_by_keys with specific mukeys
response = await fetch_by_keys([mukey1, mukey2], "component", key_column="mukey")

Execute separate queries instead of complex joins

from soildb import Query, fetch_by_keys

# Bad: Complex multi-table join
# complex_query = Query().select(...).from_("mapunit").inner_join("component")...

# Good: Multiple simple queries

# Step 1: Get mapunits
mapunits = await client.execute(
    Query().select("mukey", "muname").from_("mapunit").where("areasymbol = 'IA109'")
)

# Step 2: Get components for those mapunits
mukeys = [row['mukey'] for row in mapunits.rows]
components = await fetch_by_keys(mukeys, "component", "mukey")
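
After the separate queries return, the results can be combined in Python rather than in SQL. This sketch groups component rows under their parent mapunit by `mukey`. It assumes each row is available as a plain dict; `attach_components` and the sample rows are hypothetical.

```python
from collections import defaultdict


def attach_components(mapunits, components):
    """Return mapunit dicts with a 'components' list, matched on mukey."""
    by_mukey = defaultdict(list)
    for comp in components:
        by_mukey[comp["mukey"]].append(comp)
    # Build new dicts so the input rows are left unmodified
    return [{**mu, "components": by_mukey.get(mu["mukey"], [])} for mu in mapunits]


mapunits = [{"mukey": 1, "muname": "Example A"}, {"mukey": 2, "muname": "Example B"}]
components = [
    {"mukey": 1, "cokey": 10, "compname": "Alpha"},
    {"mukey": 1, "cokey": 11, "compname": "Beta"},
]
for mu in attach_components(mapunits, components):
    print(mu["muname"], [c["compname"] for c in mu["components"]])
# Example A ['Alpha', 'Beta']
# Example B []
```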

Data and Validation Issues

Issue: “SoilProfileCollection validation failed”

Symptoms:
- SPCValidationError when converting response to SoilProfileCollection
- Invalid depth values detected
- Horizon stacking issues

Solutions:

Check for invalid depths

from soildb.spc_validator import validate_soil_profile_data

# Validate before converting to SPC
response = await client.execute(query)
data = response.to_dict()

validation_result = validate_soil_profile_data(data)
if validation_result.is_valid:
    spc = response.to_soilprofilecollection()
else:
    print("Validation errors:", validation_result.errors)

Review depth value requirements

- Top depth (hzdept_r): Must be numeric and >= 0
- Bottom depth (hzdepb_r): Must be numeric and > top depth
- No “missing” values allowed

# Check data for common issues
df = response.to_pandas()

# Find negative depths
negative = df[df['hzdept_r'] < 0]
if not negative.empty:
    print("Found negative depths:", negative)

# Find invalid stacking (bottom <= top)
invalid_stack = df[df['hzdepb_r'] <= df['hzdept_r']]
if not invalid_stack.empty:
    print("Found invalid stacking:", invalid_stack)

# Find missing values
missing = df[df['hzdept_r'].isna() | df['hzdepb_r'].isna()]
if not missing.empty:
    print("Found missing depths:", missing)

Handle gaps between horizons

# SPC may flag gaps between horizons
# This is not necessarily an error, but indicates discontinuity
try:
    spc = response.to_soilprofilecollection()
except SPCValidationError as e:
    print(f"Validation issues: {e}")
    # May be fixable with SPC preprocessing

Issue: “Type conversion error”

Symptoms:
- TypeError when converting data types
- Can’t convert to expected format
- Column type mismatch

Solutions:

Check type specifications

from soildb import SDAClient
from soildb.type_processors import get_type_for_column

# Verify expected types
async with SDAClient() as client:
    mukey_type = get_type_for_column("mapunit", "mukey")
    print(f"mukey type: {mukey_type}")

Handle optional fields

import pandas as pd

df = response.to_pandas()

# Some columns might be nullable
# Convert gracefully
if 'muacres' in df.columns:
    df['muacres'] = pd.to_numeric(df['muacres'], errors='coerce')
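
When working with plain rows instead of a DataFrame, the same tolerant conversion can be done value by value. A minimal sketch; `to_float_or_none` is a hypothetical helper that mirrors the coercing behaviour shown above by turning unparseable values into `None`.

```python
import math


def to_float_or_none(value):
    """Convert value to float, returning None for blanks, text, or NaN."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


raw = ["12.5", "", None, "n/a", 40]
print([to_float_or_none(v) for v in raw])  # [12.5, None, None, None, 40.0]
```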

Check data before processing

# Inspect data types
df = response.to_pandas()
print(df.dtypes)

# Look for unexpected values
for col in df.columns:
    unique_vals = df[col].unique()
    if len(unique_vals) < 20:
        print(f"{col}: {unique_vals}")

Performance Issues

Issue: “Slow response times”

Symptoms:
- Queries taking > 30 seconds
- High memory usage
- Program becomes unresponsive

Solutions:

Use async/await for concurrent requests

import asyncio
from soildb import SDAClient, Query

async def fetch_multiple_areas(area_symbols):
    """Fetch data for multiple areas concurrently."""
    async with SDAClient() as client:
        tasks = []
        for symbol in area_symbols:
            query = Query().select("*").from_("mapunit").where(
                f"areasymbol = '{symbol}'"
            )
            tasks.append(client.execute(query))

        # All requests happen concurrently
        results = await asyncio.gather(*tasks)
        return results
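
`asyncio.gather` starts every request at once, which may be too aggressive against a shared public service when the list of areas is long. The sketch below caps the number of requests in flight with a semaphore. `gather_limited` is a hypothetical helper, and the default limit of 5 is an arbitrary assumption, not a documented SDA limit.

```python
import asyncio


async def gather_limited(coro_factories, limit=5):
    """Run the coroutines produced by coro_factories, at most limit at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:
            return await factory()

    # Results come back in the same order as coro_factories
    return await asyncio.gather(*(run_one(f) for f in coro_factories))
```

Each factory is a zero-argument callable such as `lambda q=query: client.execute(q)`, so the coroutine is only created once a slot is free.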

Batch large operations

from soildb import fetch_by_keys

# Don't fetch too many keys at once
# Bad: All keys in one request
# all_results = await fetch_by_keys(keys, "component", "mukey")

# Good: In batches, so no single request is too large
batch_size = 100
all_results = []
for i in range(0, len(keys), batch_size):
    batch = keys[i:i+batch_size]
    results = await fetch_by_keys(batch, "component", "mukey")
    all_results.extend(results)
    # Cap the counter so the final partial batch does not overshoot
    print(f"Processed {min(i + batch_size, len(keys))}/{len(keys)}")

Cache results when appropriate

from functools import lru_cache

@lru_cache(maxsize=128)
def get_cached_mapunit(mukey):
    """Cache mapunit lookups."""
    # Note: Works with sync API; for async use a different approach
    pass

# For async, use a dict cache:
mapunit_cache = {}

async def get_mapunit_cached(mukey):
    if mukey not in mapunit_cache:
        query = Query().select("*").from_("mapunit").where(f"mukey = {mukey}")
        result = await client.execute(query)
        mapunit_cache[mukey] = result.rows[0]
    return mapunit_cache[mukey]
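
The dict-cache pattern can be wrapped in a decorator so it applies to any single-argument async function. This is a sketch under the assumption that the argument is hashable; `async_cached` is a hypothetical name. Failed calls are not cached, and two concurrent first calls for the same key may both reach the server.

```python
import asyncio
import functools


def async_cached(func):
    """Cache results of an async one-argument function, keyed by that argument."""
    cache = {}

    @functools.wraps(func)
    async def wrapper(key):
        if key not in cache:
            # Only successful results are stored; exceptions propagate uncached
            cache[key] = await func(key)
        return cache[key]

    return wrapper
```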

Select only needed columns

from soildb import Query

# Bad: Selects every column
# query = Query().select("*").from_("chorizon").limit(10000)

# Good: Selects only the needed columns (chkey is the chorizon key in SSURGO)
query = Query().select("chkey", "cokey", "hzname").from_("chorizon").limit(10000)

Use spatial queries for large regions

from soildb import bbox_query

# For large areas, query by bounding box instead of listing
# many area symbols; this can be faster for regional extracts
response = await bbox_query(
    bbox=(-95, 40, -94, 41),  # (west, south, east, north)
    feature_type="mupolygon"
)

Issue: “Out of memory”

Symptoms:
- Memory usage grows without limit
- Application crashes
- Large DataFrames cause issues

Solutions:

Process data in chunks

from soildb import fetch_by_keys

async def process_large_dataset(all_keys, process_results, chunk_size=1000):
    # Process in chunks instead of loading everything at once
    for i in range(0, len(all_keys), chunk_size):
        batch = all_keys[i:i+chunk_size]
        results = await fetch_by_keys(batch, "component", "mukey")

        # Process each batch immediately (process_results is your own callback)
        process_results(results)

        # Drop the reference so the batch can be garbage-collected
        del results

Use generators for streaming

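from soildb import Query

async def stream_large_result(client):
    """Yield rows one at a time so callers need not build their own full list."""
    query = Query().select("mukey", "muname").from_("mapunit")
    # Note: the full response is still fetched here; combine this with
    # limit/offset pagination to bound memory for very large tables
    response = await client.execute(query)

    # Hand rows to the caller one by one
    for row in response.data:
        yield row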

Delete unused DataFrames

import gc

df1 = response1.to_pandas()  # Large DataFrame
# Process df1
del df1
gc.collect()  # Force garbage collection

df2 = response2.to_pandas()  # Load next dataset
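
To find out which step actually consumes memory, the standard library's `tracemalloc` can report the peak allocation of a function call. A minimal sketch; `peak_memory_mib` is a hypothetical helper, and `tracemalloc` only sees allocations made through Python's memory allocator, so figures for native libraries may be incomplete.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


rows, peak = peak_memory_mib(lambda: [0] * 1_000_000)
print(f"{len(rows)} items, peak {peak:.1f} MiB")
```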

Installation and Import Issues

Issue: “ModuleNotFoundError: No module named ‘soildb’”

Symptoms:
- Import fails on fresh installation
- Can’t find soildb package
- Virtual environment not activated

Solutions:

Install the package

pip install soildb
# or, from a source checkout, install in editable mode
pip install -e .

Activate virtual environment

# Linux/Mac
source venv/bin/activate

# Windows
venv\Scripts\activate
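
Whether the activation took effect can be confirmed from inside Python. This sketch compares `sys.prefix` with `sys.base_prefix`, which differ inside environments created by `venv` or `virtualenv`. It does not detect conda environments, and `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

If the interpreter path points outside the project environment, `pip install` and `python` are probably resolving to different installations.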

Check Python path

import sys
print(sys.path)

# soildb should be in one of these paths
import soildb
print(soildb.__file__)

Issue: “Optional dependency not installed”

Symptoms:
- ImportError reporting that pandas (or another optional package) is missing
- Some features not available

Solutions:

Install optional dependencies

# Quotes keep shells such as zsh from expanding the brackets

# With pandas support
pip install "soildb[pandas]"

# With geopandas (includes pandas)
pip install "soildb[geopandas]"

# With polars
pip install "soildb[polars]"

# All extras
pip install "soildb[all]"

Check what’s installed

pip list | grep -E 'soildb|pandas|polars|geo'
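
On systems without `grep` (for example a default Windows shell), the same check can be done from Python with `importlib.metadata`. A minimal sketch; `installed_versions` is a hypothetical helper, and the package list simply mirrors the extras named above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["soildb", "pandas", "polars", "geopandas"]).items():
    print(f"{name}: {version or 'not installed'}")
```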

Getting Help

If you encounter issues not covered here:

Check the test suite for examples
- Look in tests/ directory for test cases similar to your issue
- See how tests handle errors and edge cases

Check existing issues on GitHub
- Search py-soildb issues
- Filter by labels like “bug”, “question”, “help wanted”

Run in debug mode

import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("soildb")
logger.setLevel(logging.DEBUG)

# Now run your code with detailed logging
result = await client.execute(query)

Create a minimal reproducible example

import asyncio
from soildb import SDAClient, Query

async def minimal_example():
    async with SDAClient() as client:
        query = Query().select("mukey").from_("mapunit").limit(1)
        result = await client.execute(query)
        return result.to_dict()

asyncio.run(minimal_example())

Report the issue with:
- Your Python version: python --version
- soildb version: pip show soildb
- Operating system
- Minimal reproducible code
- Full error traceback
- What you expected vs. what happened