Architecture

Overview

Ryoma is organized into three distinct packages with clear separation of concerns:

ryoma/
├── packages/
│   ├── ryoma_data/      # Data layer: Connectors & profiling
│   ├── ryoma_ai/        # AI layer: LLM agents & analysis
│   └── ryoma_lab/       # UI layer: Interactive interfaces

Package Architecture

1. ryoma_data (Data Layer)

Core Components:

ryoma_data/
├── base.py              # Abstract DataSource interface
├── sql.py               # Unified SQL datasource
├── metadata.py          # Catalog, Schema, Table, Column models
├── profiler.py          # Statistical profiling
└── factory.py           # Datasource factory

Capabilities:

  • Database connections and query execution

  • Schema introspection and catalog management

  • Statistical profiling (row counts, null %, distinct ratios, etc.)

  • Data quality scoring

  • LSH-based column similarity

  • Semantic type inference (rule-based)
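As a sketch, rule-based semantic type inference can be as simple as matching sampled values against regular-expression patterns; the `SEMANTIC_RULES` and `infer_semantic_type` names below are illustrative, not the actual ryoma_data API:

```python
import re

# Illustrative rule table: (semantic type, pattern). Order matters, since
# the first sufficiently-matching rule wins.
SEMANTIC_RULES = [
    ("email",      re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("date_iso",   re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("phone",      re.compile(r"^\+?[\d\s\-()]{7,15}$")),
    ("identifier", re.compile(r"^[A-Z]{2,4}-\d+$")),
]

def infer_semantic_type(samples, threshold=0.9):
    """Return the first semantic type matching at least `threshold` of samples."""
    if not samples:
        return "unknown"
    for type_name, pattern in SEMANTIC_RULES:
        hits = sum(1 for value in samples if pattern.match(str(value)))
        if hits / len(samples) >= threshold:
            return type_name
    return "unknown"
```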

2. ryoma_ai (AI Layer)

Core Components:

ryoma_ai/
├── agent/               # AI agents
│   ├── sql.py          # SQL generation agents
│   ├── workflow.py     # Base workflow agent
│   └── internals/      # Specialized agents
│       ├── enhanced_sql_agent.py
│       ├── reforce_sql_agent.py
│       ├── query_planner.py
│       ├── schema_linking_agent.py
│       ├── sql_error_handler.py
│       ├── sql_safety_validator.py
│       └── metadata_manager.py
├── profiling/          # LLM-enhanced profiling
│   └── llm_enhancer.py
├── tool/               # Agent tools
│   ├── sql_tool.py
│   ├── pandas_tool.py
│   └── preprocess_tools.py
├── llm/                # LLM provider abstractions
│   └── provider.py
└── utils/              # Utilities
    └── datasource_utils.py

Capabilities:

  • LLM-based metadata enhancement

  • SQL generation from natural language

  • Multi-step query planning

  • Schema linking and relationship analysis

  • SQL error handling and recovery

  • Query safety validation

  • Agent orchestration

3. ryoma_lab (UI Layer)

The ryoma_lab package provides interactive user interfaces built with Reflex.

Design Patterns

Unified Datasource Pattern

from ryoma_data import DataSource

# Single class for all SQL databases
datasource = DataSource(
    "postgres",
    host="localhost",
    database="mydb"
)

Separation of Profiling

from ryoma_data import DataSource, DatabaseProfiler

# Statistical profiling (data layer)
datasource = DataSource("postgres", connection_string="...")
profiler = DatabaseProfiler(sample_size=10000)
profile = profiler.profile_table(datasource, "customers")

# LLM enhancement (AI layer)
from ryoma_ai.profiling import LLMProfileEnhancer
enhancer = LLMProfileEnhancer(model=llm_model)
enhanced = enhancer.generate_field_description(profile, "customers")

Type Guards

from ryoma_ai.utils import is_sql_datasource, ensure_sql_datasource

# Type guard for static type checking
if is_sql_datasource(datasource):
    result = datasource.query(sql)

# Runtime validation
sql_ds = ensure_sql_datasource(datasource)
result = sql_ds.query(sql)

Data Flow

Basic Query Flow

User Question
     ↓
[ryoma_ai Agent]
     ↓
Schema Analysis (ryoma_ai)
     ↓
SQL Generation (ryoma_ai)
     ↓
Safety Validation (ryoma_ai)
     ↓
Query Execution (ryoma_data)
     ↓
Result Processing (ryoma_ai)
     ↓
User Response
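The flow above can be sketched as a linear pipeline; every stage function below is a stub (in Ryoma the AI-layer stages call an LLM and the execution stage delegates to a ryoma_data DataSource):

```python
# Stubbed stages; real implementations live in ryoma_ai and ryoma_data.
def analyze_schema(question):
    return {"question": question, "tables": ["customers"]}

def generate_sql(ctx):
    ctx["sql"] = "SELECT count(*) FROM customers"
    return ctx

def validate_safety(ctx):
    if not ctx["sql"].lstrip().upper().startswith("SELECT"):
        raise ValueError("Only read-only queries are allowed")
    return ctx

def execute_query(ctx):
    ctx["rows"] = [(42,)]  # stand-in for DataSource.query(ctx["sql"])
    return ctx

def process_result(ctx):
    return f"There are {ctx['rows'][0][0]} customers."

def answer(question):
    ctx = analyze_schema(question)
    for stage in (generate_sql, validate_safety, execute_query):
        ctx = stage(ctx)
    return process_result(ctx)
```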

Enhanced Profiling Flow

Database
     ↓
[ryoma_data] Statistical Profiling
     ├─ Row counts
     ├─ NULL percentages
     ├─ Distinct ratios
     ├─ Top-k values
     └─ Basic semantic types
     ↓
[ryoma_ai] LLM Enhancement
     ├─ Natural language descriptions
     ├─ Business purpose analysis
     ├─ SQL generation hints
     └─ Join candidate scoring
     ↓
Enhanced Metadata
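A minimal sketch of the two-stage flow, with a stubbed LLM pass layered on top of cheap statistical aggregates (the function names here are illustrative, not Ryoma's API):

```python
def statistical_profile(rows):
    """Data-layer pass: cheap aggregates over a column sample."""
    non_null = [v for v in rows if v is not None]
    return {
        "row_count": len(rows),
        "null_pct": 100.0 * (len(rows) - len(non_null)) / len(rows),
        "distinct_ratio": len(set(non_null)) / max(len(non_null), 1),
    }

def llm_enhance(profile, column_name, llm=None):
    """AI-layer pass: real code would prompt an LLM; this stub is deterministic."""
    description = f"Column '{column_name}' with {profile['row_count']} rows"
    return {**profile, "description": description}
```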

Extension Points

Adding New Datasources

  1. Use the unified DataSource class:

from ryoma_data import DataSource

# New database automatically supported through Ibis
datasource = DataSource(
    "clickhouse",
    host="localhost",
    port=9000
)

  2. Register in factory if needed:

from ryoma_data.factory import DataSourceFactory

datasource = DataSourceFactory.create(
    "new_backend",
    **connection_params
)
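One plausible shape for such a factory is a decorator-based registry; the `register` API below is an assumption for illustration, not Ryoma's actual interface:

```python
class DataSourceFactory:
    """Registry-style factory: backends register themselves by name."""
    _registry = {}

    @classmethod
    def register(cls, name):
        def decorator(klass):
            cls._registry[name] = klass
            return klass
        return decorator

    @classmethod
    def create(cls, name, **connection_params):
        try:
            return cls._registry[name](**connection_params)
        except KeyError:
            raise ValueError(f"Unknown backend: {name!r}") from None

@DataSourceFactory.register("new_backend")
class NewBackendDataSource:
    def __init__(self, **params):
        self.params = params
```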

Adding New AI Features

Create new modules in ryoma_ai that use the DataSource interface:

from ryoma_data import DataSource

class MyAIFeature:
    def __init__(self, datasource: DataSource, model):
        self.datasource = datasource
        self.model = model

    def analyze(self):
        catalog = self.datasource.get_catalog()
        # Use LLM to analyze catalog
        pass

Performance Considerations

Lazy Loading

  • Datasources lazy-load database drivers

  • Only import what you need

Caching

  • LLM responses cached by default

  • Statistical profiles can be cached

  • Catalog information can be cached

Batch Processing

  • Batch profiling modes supported

  • Parallel processing for large databases

  • Configurable worker pools
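Batch profiling with a configurable worker pool might look like the following sketch, where `profile_fn` stands in for the per-table profiling work:

```python
from concurrent.futures import ThreadPoolExecutor

def profile_tables(tables, profile_fn, max_workers=4):
    """Profile tables in parallel; returns {table_name: profile}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(tables, pool.map(profile_fn, tables)))
```

Threads suit this workload because per-table profiling is dominated by database I/O, not Python computation.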

Security

SQL Safety (AI Layer)

  • Query validation before execution

  • Configurable safety rules

  • Result size limits
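As an illustration, pre-execution validation with a deny list and a result size limit can be sketched as follows (the rule set and function name are assumptions, not Ryoma's actual validator):

```python
import re

# Deny list of write/DDL keywords; a real rule set would be configurable.
FORBIDDEN = re.compile(r"\b(DROP|DELETE|UPDATE|INSERT|ALTER|TRUNCATE)\b", re.I)

def validate_sql(sql, max_rows=10_000):
    """Reject unsafe statements and enforce a result size limit."""
    if FORBIDDEN.search(sql):
        raise ValueError("Write/DDL statements are not allowed")
    # Enforce the result size limit by appending LIMIT when absent.
    if not re.search(r"\bLIMIT\b", sql, re.I):
        sql = f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
    return sql
```

Keyword matching alone is a coarse filter; a production validator would parse the statement rather than rely on regexes.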

Connection Security (Data Layer)

  • Secure credential handling

  • Connection pooling

  • Timeout enforcement
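For example, credentials can be read from the environment rather than hard-coded, with a connection timeout always set; the environment variable names below are hypothetical:

```python
import os

def connection_params(prefix="RYOMA_DB"):
    """Build connection parameters from environment variables (names assumed)."""
    password = os.environ.get(f"{prefix}_PASSWORD")
    if password is None:
        raise RuntimeError(f"Set {prefix}_PASSWORD in the environment")
    return {
        "host": os.environ.get(f"{prefix}_HOST", "localhost"),
        "password": password,
        # Always enforce a timeout so hung connections fail fast.
        "connect_timeout": int(os.environ.get(f"{prefix}_TIMEOUT", "10")),
    }
```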

References

  • Paper: "Automatic Metadata Extraction for Text-to-SQL"

  • Paper: "ReFoRCE: A Text-to-SQL Agent with Self-Refinement"

  • LangChain: Agent framework

  • Ibis: Database abstraction layer