Architecture
Overview
Ryoma is organized into three distinct packages with clear separation of concerns:
ryoma/
└── packages/
    ├── ryoma_data/   # Data layer: Connectors & profiling
    ├── ryoma_ai/     # AI layer: LLM agents & analysis
    └── ryoma_lab/    # UI layer: Interactive interfaces
Package Architecture
1. ryoma_data (Data Layer)
Core Components:
ryoma_data/
├── base.py      # Abstract DataSource interface
├── sql.py       # Unified SQL datasource
├── metadata.py  # Catalog, Schema, Table, Column models
├── profiler.py  # Statistical profiling
└── factory.py   # Datasource factory
Capabilities:
Database connections and query execution
Schema introspection and catalog management
Statistical profiling (row counts, null %, distinct ratios, etc.)
Data quality scoring
LSH-based column similarity
Semantic type inference (rule-based)
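Rule-based semantic type inference can be pictured as pattern matching over sampled column values. A minimal, self-contained sketch; the rule table, names, and threshold are illustrative, not ryoma_data's actual rules:

```python
import re

# Each rule pairs a semantic type name with a value pattern (illustrative only)
RULES = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("phone", re.compile(r"^\+?[\d\s\-()]{7,}$")),
]

def infer_semantic_type(samples, threshold=0.8):
    """Return the first type whose pattern matches enough sampled values."""
    for name, pattern in RULES:
        if samples and sum(bool(pattern.match(s)) for s in samples) / len(samples) >= threshold:
            return name
    return "unknown"

print(infer_semantic_type(["a@b.com", "x@y.org", "c@d.net"]))
```

A threshold below 1.0 tolerates a few dirty values per column, which is usually desirable when profiling real data.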
2. ryoma_ai (AI Layer)
Core Components:
ryoma_ai/
├── agent/                  # AI agents
│   ├── sql.py              # SQL generation agents
│   ├── workflow.py         # Base workflow agent
│   └── internals/          # Specialized agents
│       ├── enhanced_sql_agent.py
│       ├── reforce_sql_agent.py
│       ├── query_planner.py
│       ├── schema_linking_agent.py
│       ├── sql_error_handler.py
│       ├── sql_safety_validator.py
│       └── metadata_manager.py
├── profiling/              # LLM-enhanced profiling
│   └── llm_enhancer.py
├── tool/                   # Agent tools
│   ├── sql_tool.py
│   ├── pandas_tool.py
│   └── preprocess_tools.py
├── llm/                    # LLM provider abstractions
│   └── provider.py
└── utils/                  # Utilities
    └── datasource_utils.py
Capabilities:
LLM-based metadata enhancement
SQL generation from natural language
Multi-step query planning
Schema linking and relationship analysis
SQL error handling and recovery
Query safety validation
Agent orchestration
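Schema linking, in its simplest form, matches question tokens against table and column names. A toy lexical sketch; the real schema_linking_agent is LLM-assisted, and every name below is illustrative:

```python
def link_schema(question: str, schema: dict) -> list:
    """Return (table, column) pairs whose column name appears in the question."""
    words = {w.strip("?.,").lower() for w in question.split()}
    return [
        (table, column)
        for table, columns in schema.items()
        for column in columns
        if column.lower() in words
    ]

schema = {"customers": ["name", "email"], "orders": ["total", "created_at"]}
links = link_schema("Which customers have no email?", schema)
print(links)
```

Lexical matching like this is only a baseline; it misses synonyms ("price" vs. "total"), which is exactly the gap LLM-based linking is meant to close.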
3. ryoma_lab (UI Layer)
Interactive user interfaces built with Reflex.
Design Patterns
Unified Datasource Pattern
from ryoma_data import DataSource
# Single class for all SQL databases
datasource = DataSource(
    "postgres",
    host="localhost",
    database="mydb"
)
Separation of Profiling
from ryoma_data import DataSource, DatabaseProfiler
# Statistical profiling (data layer)
datasource = DataSource("postgres", connection_string="...")
profiler = DatabaseProfiler(sample_size=10000)
profile = profiler.profile_table(datasource, "customers")
# LLM enhancement (AI layer)
from ryoma_ai.profiling import LLMProfileEnhancer
enhancer = LLMProfileEnhancer(model=llm_model)
enhanced = enhancer.generate_field_description(profile, "customers")
Type Guards
from ryoma_ai.utils import is_sql_datasource, ensure_sql_datasource
# Type guard for static type checking
if is_sql_datasource(datasource):
    result = datasource.query(sql)
# Runtime validation
sql_ds = ensure_sql_datasource(datasource)
result = sql_ds.query(sql)
Data Flow
Basic Query Flow
User Question
      ↓
[ryoma_ai Agent]
      ↓
Schema Analysis (ryoma_ai)
      ↓
SQL Generation (ryoma_ai)
      ↓
Safety Validation (ryoma_ai)
      ↓
Query Execution (ryoma_data)
      ↓
Result Processing (ryoma_ai)
      ↓
User Response
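The flow above can be sketched as a pipeline of function calls. All names and return values here are illustrative stand-ins for the real agents:

```python
def analyze_schema(question: str) -> dict:
    # Stand-in for the schema analysis step
    return {"tables": ["customers"], "question": question}

def generate_sql(plan: dict) -> str:
    # Stand-in for LLM-based SQL generation
    return f"SELECT COUNT(*) FROM {plan['tables'][0]}"

def validate_safety(sql: str) -> str:
    # Stand-in for safety validation: allow only read queries
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only read queries are allowed")
    return sql

def execute_query(sql: str) -> list:
    # Stand-in for query execution in ryoma_data
    return [(42,)]

def answer(question: str) -> int:
    sql = validate_safety(generate_sql(analyze_schema(question)))
    return execute_query(sql)[0][0]

print(answer("How many customers are there?"))
```

Note the layering the sketch preserves: everything except `execute_query` lives in the AI layer, and only validated SQL ever crosses into the data layer.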
Enhanced Profiling Flow
Database
    ↓
[ryoma_data] Statistical Profiling
    ├─ Row counts
    ├─ NULL percentages
    ├─ Distinct ratios
    ├─ Top-k values
    └─ Basic semantic types
    ↓
[ryoma_ai] LLM Enhancement
    ├─ Natural language descriptions
    ├─ Business purpose analysis
    ├─ SQL generation hints
    └─ Join candidate scoring
    ↓
Enhanced Metadata
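The two stages can be sketched end to end: a pure-statistics pass followed by an enhancement step, where `describe` stands in for the LLM call. All names are illustrative:

```python
def statistical_profile(values: list) -> dict:
    # ryoma_data-style pass: pure statistics, no LLM involved
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_pct": 100 * (len(values) - len(non_null)) / len(values),
        "distinct_ratio": len(set(non_null)) / max(len(non_null), 1),
    }

def enhance(profile: dict, describe) -> dict:
    # ryoma_ai-style pass: `describe` stands in for the LLM call
    return {**profile, "description": describe(profile)}

profile = statistical_profile(["a", "b", "a", None])
enhanced = enhance(
    profile, lambda p: f"{p['row_count']} rows, {p['null_pct']:.0f}% null"
)
print(enhanced["description"])
```

Keeping `statistical_profile` free of any LLM dependency is the point of the split: the data layer stays cheap and deterministic, and the AI layer consumes its output.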
Extension Points
Adding New Datasources
Use the unified DataSource class:
from ryoma_data import DataSource
# New database automatically supported through Ibis
datasource = DataSource(
    "clickhouse",
    host="localhost",
    port=9000
)
Register in factory if needed:
from ryoma_data.factory import DataSourceFactory
datasource = DataSourceFactory.create(
    "new_backend",
    **connection_params
)
Adding New AI Features
Create new modules in ryoma_ai that use the DataSource interface:
from ryoma_data import DataSource
class MyAIFeature:
    def __init__(self, datasource: DataSource, model):
        self.datasource = datasource
        self.model = model

    def analyze(self):
        catalog = self.datasource.get_catalog()
        # Use LLM to analyze catalog
        pass
Performance Considerations
Lazy Loading
Datasources lazy-load database drivers
Only import what you need
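A common way to implement driver lazy-loading is a property that imports the module on first access. A sketch using only the standard library, with sqlite3 as the example driver; LazyBackend is a hypothetical name, not a ryoma_data class:

```python
import importlib

class LazyBackend:
    """Defer importing a database driver until it is first used."""

    def __init__(self, module_name: str):
        self._module_name = module_name
        self._module = None

    @property
    def module(self):
        # Import happens on first access only, then the module is reused
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

backend = LazyBackend("sqlite3")        # no driver imported yet
connection = backend.module.connect(":memory:")  # driver imported here
connection.close()
```

The payoff: constructing datasources for backends you never touch costs nothing, and missing optional drivers only fail when actually used.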
Caching
LLM responses cached by default
Statistical profiles can be cached
Catalog information can be cached
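At its simplest, catalog caching is a memoized lookup. A sketch with functools.lru_cache; the counter exists only to demonstrate that the second call never reaches the database, and `get_catalog` is a hypothetical helper:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=None)
def get_catalog(datasource_name: str) -> tuple:
    # Stand-in for an expensive catalog introspection round-trip
    calls["count"] += 1
    return ("customers", "orders")

get_catalog("postgres")
get_catalog("postgres")  # served from cache; no second round-trip
print(calls["count"])
```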
Batch Processing
Batch profiling modes supported
Parallel processing for large databases
Configurable worker pools
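Parallel profiling with a configurable worker pool can be sketched with concurrent.futures; `profile_table` is a stand-in for the real per-table profiler:

```python
from concurrent.futures import ThreadPoolExecutor

def profile_table(table: str) -> dict:
    # Stand-in for per-table statistical profiling
    return {"table": table, "profiled": True}

tables = ["customers", "orders", "events"]

# max_workers would normally come from configuration
with ThreadPoolExecutor(max_workers=4) as pool:
    profiles = list(pool.map(profile_table, tables))

print([p["table"] for p in profiles])
```

Threads suit profiling well because the work is I/O-bound (waiting on the database), so the GIL is not a bottleneck.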
Security
SQL Safety (AI Layer)
Query validation before execution
Configurable safety rules
Result size limits
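A minimal illustration of these rules: reject mutating statements and append a result-size limit when one is missing. This regex denylist is a sketch, not ryoma_ai's actual sql_safety_validator:

```python
import re

# Statements that mutate data or schema are rejected outright
DENYLIST = re.compile(
    r"\b(DROP|DELETE|TRUNCATE|ALTER|UPDATE|INSERT)\b", re.IGNORECASE
)

def validate_query(sql: str, max_limit: int = 1000) -> str:
    if DENYLIST.search(sql):
        raise ValueError("query contains a disallowed statement")
    # Enforce a result-size limit if the query has none
    if not re.search(r"\bLIMIT\s+\d+", sql, re.IGNORECASE):
        sql = f"{sql.rstrip(';')} LIMIT {max_limit}"
    return sql

print(validate_query("SELECT * FROM customers"))
```

Real validators typically parse the SQL rather than pattern-match it, but the two rules shown (denylist plus enforced limit) capture the intent of validation before execution.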
Connection Security (Data Layer)
Secure credential handling
Connection pooling
Timeout enforcement
References
Paper: "Automatic Metadata Extraction for Text-to-SQL"
Paper: "ReFoRCE: A Text-to-SQL Agent with Self-Refinement"
LangChain: Agent framework
Ibis: Database abstraction layer