Architecture
Overview
Ryoma is organized into three distinct packages with clear separation of concerns:
ryoma/
└── packages/
    ├── ryoma_data/   # Data layer: Connectors & profiling
    ├── ryoma_ai/     # AI layer: LLM agents & analysis
    └── ryoma_lab/    # UI layer: Interactive interfaces
Package Architecture
1. ryoma_data (Data Layer)
Core Components:
ryoma_data/
├── base.py      # Abstract DataSource interface
├── sql.py       # Unified SQL datasource
├── metadata.py  # Catalog, Schema, Table, Column models
├── profiler.py  # Statistical profiling
└── factory.py   # Datasource factory
Capabilities:
Database connections and query execution
Schema introspection and catalog management
Statistical profiling (row counts, null %, distinct ratios, etc.)
Data quality scoring
LSH-based column similarity
Semantic type inference (rule-based)
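Rule-based semantic type inference can be pictured as pattern matching over sampled column values. A minimal, self-contained sketch; the rule table, names, and threshold are illustrative, not ryoma_data's actual rules:

```python
import re

# Each rule pairs a semantic type name with a value pattern (illustrative only)
RULES = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("phone", re.compile(r"^\+?[\d\s\-()]{7,}$")),
]

def infer_semantic_type(samples, threshold=0.8):
    """Return the first type whose pattern matches enough sampled values."""
    for name, pattern in RULES:
        if samples and sum(bool(pattern.match(s)) for s in samples) / len(samples) >= threshold:
            return name
    return "unknown"

print(infer_semantic_type(["a@b.com", "x@y.org", "c@d.net"]))
```

A threshold below 1.0 tolerates a few dirty values per column, which is usually desirable when profiling real data.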
2. ryoma_ai (AI Layer)
Core Components:
ryoma_ai/
├── agent/                  # AI agents
│   ├── sql.py              # SQL generation agents
│   ├── workflow.py         # Base workflow agent
│   └── internals/          # Specialized agents
│       ├── enhanced_sql_agent.py
│       ├── reforce_sql_agent.py
│       ├── query_planner.py
│       ├── schema_linking_agent.py
│       ├── sql_error_handler.py
│       ├── sql_safety_validator.py
│       └── metadata_manager.py
├── profiling/              # LLM-enhanced profiling
│   └── llm_enhancer.py
├── tool/                   # Agent tools
│   ├── sql_tool.py
│   ├── pandas_tool.py
│   └── preprocess_tools.py
├── llm/                    # LLM provider abstractions
│   └── provider.py
└── utils/                  # Utilities
    └── datasource_utils.py
Capabilities:
LLM-based metadata enhancement
SQL generation from natural language
Multi-step query planning
Schema linking and relationship analysis
SQL error handling and recovery
Query safety validation
Agent orchestration
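Schema linking, in its simplest form, matches question tokens against table and column names. A toy lexical sketch; the real schema_linking_agent is LLM-assisted, and every name below is illustrative:

```python
def link_schema(question: str, schema: dict) -> list:
    """Return (table, column) pairs whose column name appears in the question."""
    words = {w.strip("?.,").lower() for w in question.split()}
    return [
        (table, column)
        for table, columns in schema.items()
        for column in columns
        if column.lower() in words
    ]

schema = {"customers": ["name", "email"], "orders": ["total", "created_at"]}
links = link_schema("Which customers have no email?", schema)
print(links)
```

Lexical matching like this is only a baseline; it misses synonyms ("price" vs. "total"), which is exactly the gap LLM-based linking is meant to close.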
3. ryoma_lab (UI Layer)
Interactive user interfaces built with Reflex.
Design Patterns
Unified Datasource Pattern
from ryoma_data import DataSource
# Single class for all SQL databases
datasource = DataSource(
    "postgres",
    host="localhost",
    database="mydb"
)
Separation of Profiling
from ryoma_data import DataSource, DatabaseProfiler
# Statistical profiling (data layer)
datasource = DataSource("postgres", connection_string="...")
profiler = DatabaseProfiler(sample_size=10000)
profile = profiler.profile_table(datasource, "customers")
# LLM enhancement (AI layer)
from ryoma_ai.profiling import LLMProfileEnhancer
enhancer = LLMProfileEnhancer(model=llm_model)
enhanced = enhancer.generate_field_description(profile, "customers")
Type Guards
from ryoma_ai.utils import is_sql_datasource, ensure_sql_datasource
# Type guard for static type checking
if is_sql_datasource(datasource):
    result = datasource.query(sql)
# Runtime validation
sql_ds = ensure_sql_datasource(datasource)
result = sql_ds.query(sql)
Data Flow
Basic Query Flow
User Question
      ↓
[ryoma_ai Agent]
      ↓
Schema Analysis (ryoma_ai)
      ↓
SQL Generation (ryoma_ai)
      ↓
Safety Validation (ryoma_ai)
      ↓
Query Execution (ryoma_data)
      ↓
Result Processing (ryoma_ai)
      ↓
User Response
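The flow above can be sketched as a pipeline of function calls. All names and return values here are illustrative stand-ins for the real agents:

```python
def analyze_schema(question: str) -> dict:
    # Stand-in for the schema analysis step
    return {"tables": ["customers"], "question": question}

def generate_sql(plan: dict) -> str:
    # Stand-in for LLM-based SQL generation
    return f"SELECT COUNT(*) FROM {plan['tables'][0]}"

def validate_safety(sql: str) -> str:
    # Stand-in for safety validation: allow only read queries
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only read queries are allowed")
    return sql

def execute_query(sql: str) -> list:
    # Stand-in for query execution in ryoma_data
    return [(42,)]

def answer(question: str) -> int:
    sql = validate_safety(generate_sql(analyze_schema(question)))
    return execute_query(sql)[0][0]

print(answer("How many customers are there?"))
```

Note the layering the sketch preserves: everything except `execute_query` lives in the AI layer, and only validated SQL ever crosses into the data layer.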
Enhanced Profiling Flow
Database
    ↓
[ryoma_data] Statistical Profiling
    ├─ Row counts
    ├─ NULL percentages
    ├─ Distinct ratios
    ├─ Top-k values
    └─ Basic semantic types
    ↓
[ryoma_ai] LLM Enhancement
    ├─ Natural language descriptions
    ├─ Business purpose analysis
    ├─ SQL generation hints
    └─ Join candidate scoring
    ↓
Enhanced Metadata
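The two stages can be sketched end to end: a pure-statistics pass followed by an enhancement step, where `describe` stands in for the LLM call. All names are illustrative:

```python
def statistical_profile(values: list) -> dict:
    # ryoma_data-style pass: pure statistics, no LLM involved
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_pct": 100 * (len(values) - len(non_null)) / len(values),
        "distinct_ratio": len(set(non_null)) / max(len(non_null), 1),
    }

def enhance(profile: dict, describe) -> dict:
    # ryoma_ai-style pass: `describe` stands in for the LLM call
    return {**profile, "description": describe(profile)}

profile = statistical_profile(["a", "b", "a", None])
enhanced = enhance(
    profile, lambda p: f"{p['row_count']} rows, {p['null_pct']:.0f}% null"
)
print(enhanced["description"])
```

Keeping `statistical_profile` free of any LLM dependency is the point of the split: the data layer stays cheap and deterministic, and the AI layer consumes its output.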
Extension Points
Adding New Datasources
Use the unified DataSource class:
from ryoma_data import DataSource
# New database automatically supported through Ibis
datasource = DataSource(
    "clickhouse",
    host="localhost",
    port=9000
)
Register in factory if needed:
from ryoma_data.factory import DataSourceFactory
datasource = DataSourceFactory.create(
    "new_backend",
    **connection_params
)
Adding New AI Features
Create new modules in ryoma_ai that use the DataSource interface:
from ryoma_data import DataSource
class MyAIFeature:
    def __init__(self, datasource: DataSource, model):
        self.datasource = datasource
        self.model = model

    def analyze(self):
        catalog = self.datasource.get_catalog()
        # Use LLM to analyze catalog
        pass
Performance Considerations
Lazy Loading
Datasources lazy-load database drivers
Only import what you need
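A common way to implement driver lazy-loading is a property that imports the module on first access. A sketch using only the standard library, with sqlite3 as the example driver; LazyBackend is a hypothetical name, not a ryoma_data class:

```python
import importlib

class LazyBackend:
    """Defer importing a database driver until it is first used."""

    def __init__(self, module_name: str):
        self._module_name = module_name
        self._module = None

    @property
    def module(self):
        # Import happens on first access only, then the module is reused
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

backend = LazyBackend("sqlite3")        # no driver imported yet
connection = backend.module.connect(":memory:")  # driver imported here
connection.close()
```

The payoff: constructing datasources for backends you never touch costs nothing, and missing optional drivers only fail when actually used.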
Caching
LLM responses cached by default
Statistical profiles can be cached
Catalog information can be cached
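At its simplest, catalog caching is a memoized lookup. A sketch with functools.lru_cache; the counter exists only to demonstrate that the second call never reaches the database, and `get_catalog` is a hypothetical helper:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=None)
def get_catalog(datasource_name: str) -> tuple:
    # Stand-in for an expensive catalog introspection round-trip
    calls["count"] += 1
    return ("customers", "orders")

get_catalog("postgres")
get_catalog("postgres")  # served from cache; no second round-trip
print(calls["count"])
```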
Batch Processing
Batch profiling modes supported
Parallel processing for large databases
Configurable worker pools
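Parallel profiling with a configurable worker pool can be sketched with concurrent.futures; `profile_table` is a stand-in for the real per-table profiler:

```python
from concurrent.futures import ThreadPoolExecutor

def profile_table(table: str) -> dict:
    # Stand-in for per-table statistical profiling
    return {"table": table, "profiled": True}

tables = ["customers", "orders", "events"]

# max_workers would normally come from configuration
with ThreadPoolExecutor(max_workers=4) as pool:
    profiles = list(pool.map(profile_table, tables))

print([p["table"] for p in profiles])
```

Threads suit profiling well because the work is I/O-bound (waiting on the database), so the GIL is not a bottleneck.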
Security
SQL Safety (AI Layer)
Query validation before execution
Configurable safety rules
Result size limits
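A minimal illustration of these rules: reject mutating statements and append a result-size limit when one is missing. This regex denylist is a sketch, not ryoma_ai's actual sql_safety_validator:

```python
import re

# Statements that mutate data or schema are rejected outright
DENYLIST = re.compile(
    r"\b(DROP|DELETE|TRUNCATE|ALTER|UPDATE|INSERT)\b", re.IGNORECASE
)

def validate_query(sql: str, max_limit: int = 1000) -> str:
    if DENYLIST.search(sql):
        raise ValueError("query contains a disallowed statement")
    # Enforce a result-size limit if the query has none
    if not re.search(r"\bLIMIT\s+\d+", sql, re.IGNORECASE):
        sql = f"{sql.rstrip(';')} LIMIT {max_limit}"
    return sql

print(validate_query("SELECT * FROM customers"))
```

Real validators typically parse the SQL rather than pattern-match it, but the two rules shown (denylist plus enforced limit) capture the intent of validation before execution.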
Connection Security (Data Layer)
Secure credential handling
Connection pooling
Timeout enforcement
References
Paper: "Automatic Metadata Extraction for Text-to-SQL"
Paper: "ReFoRCE: A Text-to-SQL Agent with Self-Refinement"
LangChain: Agent framework
Ibis: Database abstraction layer