Store Architecture
Ryoma AI uses a unified store architecture that separates concerns while ensuring data consistency across all components.
🏗️ Three-Store Architecture
Overview
Ryoma implements a three-tier storage system:
graph TD
CLI[CLI Application] --> MS[Metadata Store]
CLI --> VS[Vector Store]
CLI --> DS[Data Sources]
MS --> AM[Agent Manager]
VS --> AM
DS --> AM
AM --> SA[SQL Agent]
AM --> PA[Python Agent]
AM --> DA[Data Analysis Agent]
AM --> CA[Chat Agent]
SA --> CS[Catalog Store]
PA --> CS
DA --> CS
CA --> CS
CS --> MS
CS --> VS
1. Metadata Store
Purpose: Stores structured metadata, configuration, and agent state
Types:
memory - In-memory store (default, no persistence)
postgres - PostgreSQL-based store with persistence
redis - Redis-based store for distributed deployments
What it stores:
Agent configurations and state
Data source registrations
Catalog metadata and indexes
Session information
2. Vector Store
Purpose: Handles semantic search and embeddings for catalog optimization
Types:
chroma - File-based vector database (recommended for development)
faiss - In-memory vector store for fast prototyping
qdrant - Production vector database
pgvector - PostgreSQL with vector extension
What it stores:
Table and column embeddings for semantic search
Indexed catalog elements for fast retrieval
Query history embeddings for context
3. Data Sources
Purpose: Connects to actual databases and data sources
Types:
postgres - PostgreSQL databases
mysql - MySQL databases
sqlite - SQLite databases
snowflake - Snowflake data warehouse
bigquery - Google BigQuery
duckdb - DuckDB analytics database
🔄 Store Unification
The Problem
Previously, each component created independent store instances, leading to:
Data duplication across stores
Inconsistent state between agents
Circular dependencies between modules
Performance degradation
The Solution
The unified architecture ensures:
Single source of truth: All components share the same store instances
CLI coordination: CLI creates and distributes stores to all managers
No duplication: Agents receive stores from CLI, never create their own
Consistency: All agents see the same data and state
Implementation Pattern
# CLI creates unified stores
class RyomaAI:
    def __init__(self):
        # Create unified stores from configuration
        self.meta_store = StoreFactory.create_store(**meta_config.to_factory_params())
        self.vector_store = create_vector_store(config=vector_config, embedding_function=embedding)

        # Pass unified stores to all managers
        self.agent_manager = AgentManager(...)
        self.command_handler = CommandHandler(
            meta_store=self.meta_store,
            vector_store=self.vector_store
        )


# Agents receive stores from CLI
class BaseAgent:
    def __init__(self, store=None, vector_store=None, **kwargs):
        if store is None:
            raise ValueError("store parameter is required - agents must receive stores from CLI")
        self.store = store
        self.vector_store = vector_store
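The sharing behaviour described above can be tried out with a minimal, self-contained sketch. `InMemoryStore`, `DemoAgent`, and the `set`/`get` methods are hypothetical stand-ins invented for illustration; they are not Ryoma classes or APIs. The sketch only shows the idea that one store instance is created once and injected everywhere.

```python
class InMemoryStore:
    """Hypothetical stand-in for a metadata store (not the Ryoma API)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class DemoAgent:
    """Hypothetical agent that refuses to create its own store."""

    def __init__(self, name, store=None, vector_store=None):
        if store is None:
            raise ValueError(
                "store parameter is required - agents must receive stores from CLI"
            )
        self.name = name
        self.store = store
        self.vector_store = vector_store


# One store instance is created once and handed to every agent.
shared_store = InMemoryStore()
sql_agent = DemoAgent("sql", store=shared_store)
chat_agent = DemoAgent("chat", store=shared_store)

# A write made through one agent is visible through the other.
sql_agent.store.set("active_datasource", "default")
seen_by_chat = chat_agent.store.get("active_datasource")

# Constructing an agent without a store fails fast.
try:
    DemoAgent("orphan")
    missing_store_error = None
except ValueError as exc:
    missing_store_error = str(exc)
```

Because both agents hold a reference to the same object, there is no second copy of the data to drift out of sync, which is the consistency property the unified architecture aims for.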
⚙️ Configuration Structure
New Configuration Format
The configuration is now split into three distinct sections:
{
"model": "gpt-4o",
"mode": "enhanced",
"embedding_model": "text-embedding-ada-002",
"meta_store": {
"type": "memory",
"connection_string": null,
"options": {}
},
"vector_store": {
"type": "chroma",
"collection_name": "ryoma_vectors",
"dimension": 1536,
"distance_metric": "cosine",
"extra_configs": {
"persist_directory": "./data/vectors"
}
},
"datasources": [
{
"name": "default",
"type": "postgres",
"host": "localhost",
"port": 5432,
"database": "mydb",
"user": "postgres",
"password": "password"
}
]
}
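As a rough illustration of how the three store sections might be read and checked before any store is created, the sketch below parses a configuration string into small dataclasses. The class names, the `parse_config` helper, and every default except the `memory` metadata store are assumptions made for this example, not part of Ryoma.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MetaStoreConfig:
    type: str = "memory"  # documented default: in-memory, no persistence
    connection_string: Optional[str] = None
    options: Dict[str, Any] = field(default_factory=dict)


@dataclass
class VectorStoreConfig:
    type: str = "chroma"  # assumed default for this sketch
    collection_name: str = "ryoma_vectors"
    dimension: Optional[int] = None
    distance_metric: str = "cosine"
    extra_configs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AppConfig:
    meta_store: MetaStoreConfig
    vector_store: VectorStoreConfig
    datasources: List[Dict[str, Any]]


def parse_config(text: str) -> AppConfig:
    """Parse JSON text and require all three store sections to be present."""
    raw = json.loads(text)
    required = ("meta_store", "vector_store", "datasources")
    missing = [key for key in required if key not in raw]
    if missing:
        raise ValueError(f"missing config sections: {', '.join(missing)}")
    return AppConfig(
        meta_store=MetaStoreConfig(**raw["meta_store"]),
        vector_store=VectorStoreConfig(**raw["vector_store"]),
        datasources=list(raw["datasources"]),
    )


example = """
{
  "meta_store": {"type": "memory"},
  "vector_store": {
    "type": "chroma",
    "extra_configs": {"persist_directory": "./data/vectors"}
  },
  "datasources": [{"name": "default", "type": "sqlite", "database": ":memory:"}]
}
"""
config = parse_config(example)
```

Failing early on a missing section keeps a half-configured system from starting with some stores unified and others absent.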
Configuration Examples
Development Setup
{
"meta_store": {"type": "memory"},
"vector_store": {"type": "chroma"},
"datasources": [{"type": "sqlite", "database": ":memory:"}]
}
Production Setup
{
"meta_store": {
"type": "postgres",
"connection_string": "postgresql://user:pass@prod-db:5432/metadata"
},
"vector_store": {
"type": "qdrant",
"extra_configs": {
"url": "http://qdrant-server:6333",
"api_key": "your-api-key"
}
},
"datasources": [
{
"name": "warehouse",
"type": "snowflake",
"account": "your-account",
"warehouse": "COMPUTE_WH"
}
]
}
🔧 Store Factory Pattern
Metadata Store Factory
from ryoma_ai.store.store_factory import StoreFactory

# Create store from configuration
store = StoreFactory.create_store(
    store_type="postgres",
    connection_string="postgresql://localhost:5432/metadata",
    options={}
)
Supported Store Types
| Type | Description | Use Case |
|---|---|---|
| memory | In-memory storage | Development, testing |
| postgres | PostgreSQL storage | Production, persistence |
| redis | Redis storage | Distributed, caching |
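For readers unfamiliar with the factory pattern, the following sketch shows one common way to map a type name to a backend through a registry. It is not the implementation of `StoreFactory`; `MemoryStore`, `_REGISTRY`, and this `create_store` function are hypothetical and only mirror the parameter names used above.

```python
from typing import Any, Callable, Dict, Optional


class MemoryStore:
    """Hypothetical in-memory backend used only for this sketch."""

    def __init__(
        self,
        connection_string: Optional[str] = None,
        options: Optional[Dict[str, Any]] = None,
    ):
        self.connection_string = connection_string
        self.options = options or {}
        self.data: Dict[str, Any] = {}


# Registry mapping a store type name to a constructor.
_REGISTRY: Dict[str, Callable[..., Any]] = {"memory": MemoryStore}


def create_store(
    store_type: str,
    connection_string: Optional[str] = None,
    options: Optional[Dict[str, Any]] = None,
) -> Any:
    """Look up the backend by name and build it; unknown names fail loudly."""
    try:
        constructor = _REGISTRY[store_type]
    except KeyError:
        supported = ", ".join(sorted(_REGISTRY))
        raise ValueError(
            f"unsupported store type {store_type!r}; supported: {supported}"
        ) from None
    return constructor(connection_string=connection_string, options=options)


store = create_store("memory", options={"namespace": "demo"})
```

A registry keeps backend selection in one place, so adding a new store type means registering one constructor rather than editing every caller.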
📊 Catalog Store Integration
Unified Catalog Indexing
The catalog store now uses the unified architecture:
# CatalogStore requires unified stores
catalog_store = CatalogStore(
    metadata_store=unified_meta_store,  # From CLI
    vector_store=unified_vector_store   # From CLI
)

# Indexing uses UnifiedCatalogIndexService
indexer = UnifiedCatalogIndexService(
    metadata_store=unified_meta_store,
    vector_store=unified_vector_store
)
Search Optimization
Catalog search is now optimized using indexed metadata:
# Fast semantic search without loading full catalog
relevant_catalog = catalog_store.search_relevant_catalog(
    query="customer information",
    top_k=10,
    min_score=0.3
)

# Get table suggestions
suggestions = catalog_store.get_table_suggestions(
    query="sales data",
    max_tables=5
)
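To make the `top_k` and `min_score` parameters concrete, here is a small sketch of similarity search over precomputed embeddings. It assumes cosine similarity as the scoring function and uses toy three-dimensional vectors; the `search` helper and the table names are hypothetical, and the real `CatalogStore` may rank results differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 10,
    min_score: float = 0.3,
) -> List[Tuple[str, float]]:
    """Score every entry, drop those below min_score, return the best top_k."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    kept = [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Toy embeddings; real ones would come from the configured embedding model.
index = {
    "customers": [1.0, 0.0, 0.0],
    "orders": [0.6, 0.8, 0.0],
    "audit_log": [0.0, 0.0, 1.0],
}
results = search([1.0, 0.0, 0.0], index, top_k=2, min_score=0.3)
names = [name for name, _ in results]
```

In this toy index, `audit_log` is orthogonal to the query and is filtered out by `min_score`, while `top_k` caps how many of the remaining matches are returned.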
🛡️ Error Prevention
Store Validation
The architecture prevents common errors:
# Agents MUST receive stores from CLI
class BaseAgent:
    def __init__(self, store=None, **kwargs):
        if store is None:
            raise ValueError("store parameter is required - agents must receive stores from CLI to ensure unified storage")
Circular Import Resolution
Moved exception classes to break circular dependencies:
CatalogIndexError moved from ryoma_ai.store.exceptions to ryoma_ai.catalog.exceptions
Clean separation between catalog and store modules
🔄 Migration Guide
From Legacy Configuration
Old format:
{
"database": {
"type": "postgres",
"connection_string": "postgresql://..."
},
"default_datasource": {...},
"additional_datasources": [...]
}
New format:
{
"meta_store": {"type": "memory"},
"vector_store": {"type": "chroma"},
"datasources": [{...}]
}
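A conversion between the two formats could look roughly like the sketch below. It rests on two assumptions that the formats above do not confirm: that the legacy `database` section described the metadata store, and that a legacy configuration had no vector store settings, so `chroma` is filled in. `migrate_config` and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def migrate_config(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a legacy config dict to the three-section layout (sketch only)."""
    # Assumption: the legacy "database" section described the metadata store.
    database = legacy.get("database") or {}
    meta_store: Dict[str, Any] = {"type": database.get("type", "memory")}
    if database.get("connection_string"):
        meta_store["connection_string"] = database["connection_string"]

    # The default datasource comes first, followed by any additional ones.
    datasources: List[Dict[str, Any]] = []
    if legacy.get("default_datasource"):
        datasources.append(legacy["default_datasource"])
    datasources.extend(legacy.get("additional_datasources") or [])

    return {
        "meta_store": meta_store,
        # Assumption: no legacy vector store settings exist, so use chroma.
        "vector_store": {"type": "chroma"},
        "datasources": datasources,
    }


# Sample legacy config with made-up values.
legacy = {
    "database": {
        "type": "postgres",
        "connection_string": "postgresql://localhost:5432/metadata",
    },
    "default_datasource": {"name": "default", "type": "sqlite", "database": ":memory:"},
    "additional_datasources": [{"name": "analytics", "type": "duckdb"}],
}
migrated = migrate_config(legacy)
```

Review the result by hand before using it, especially the metadata store mapping, since a deployment may prefer the in-memory store shown in the new-format example.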
Agent Initialization Updates
Old way:
agent = SqlAgent("gpt-4", mode="enhanced")
agent.add_datasource(datasource)
New way (CLI managed):
# CLI handles all initialization
ryoma-ai> show me the data
# Agents are created automatically with unified stores
New way (Programmatic):
# Must provide unified stores
agent = SqlAgent(
    model="gpt-4",
    mode="enhanced",
    datasource=datasource,
    store=unified_meta_store,          # Required
    vector_store=unified_vector_store  # Optional but recommended
)
🏎️ Performance Benefits
Before Unification
Multiple store instances per agent
Duplicate data storage
Inconsistent state
Higher memory usage
After Unification
Single store instance shared by all agents
Centralized data management
Consistent state across system
Optimized memory usage
Faster catalog operations through indexing
🔍 Debugging Store Issues
Check Store Status
ryoma-ai> /config
# Shows all store configurations
ryoma-ai> /agents
# Shows which agents are using stores
Common Issues
“Store parameter is required” Error:
Agents must receive stores from CLI
Never create agents directly in production
Use CLI or pass unified stores explicitly
Vector search fails:
Run /index-catalog first
Ensure vector store is configured
Check embedding model configuration
Circular import errors:
Import from correct exception modules
Use lazy imports if needed
Check module dependency order
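The lazy-import suggestion can be illustrated with a generic sketch. The standard-library `decimal` module stands in for whichever module takes part in the cycle, and `parse_amount` is a hypothetical function. Moving the import into the function body defers it until the first call, by which time both modules in a cycle have usually finished loading.

```python
def parse_amount(text):
    """Convert text to a Decimal, importing the dependency only when called."""
    # Deferred import: resolved on first call, not when this module is imported.
    from decimal import Decimal

    return Decimal(text)


total = parse_amount("1.50") + parse_amount("2.25")
```

Python caches imported modules in `sys.modules`, so repeated calls do not reload the dependency; the cost of the in-function import after the first call is a dictionary lookup.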