Store Architecture

Ryoma AI uses a unified store architecture that separates concerns while ensuring data consistency across all components.

🏗️ Three-Store Architecture

Overview

Ryoma implements a three-tier storage system:

graph TD
    CLI[CLI Application] --> MS[Metadata Store]
    CLI --> VS[Vector Store] 
    CLI --> DS[Data Sources]
    
    MS --> AM[Agent Manager]
    VS --> AM
    DS --> AM
    
    AM --> SA[SQL Agent]
    AM --> PA[Python Agent]
    AM --> DA[Data Analysis Agent]
    AM --> CA[Chat Agent]
    
    SA --> CS[Catalog Store]
    PA --> CS
    DA --> CS
    CA --> CS
    
    CS --> MS
    CS --> VS

1. Metadata Store

Purpose: Stores structured metadata, configuration, and agent state

Types:

  • memory - In-memory store (default, no persistence)

  • postgres - PostgreSQL-based store with persistence

  • redis - Redis-based store for distributed deployments

What it stores:

  • Agent configurations and state

  • Data source registrations

  • Catalog metadata and indexes

  • Session information
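As a rough illustration of what the memory type holds, here is a minimal sketch of a namespaced key-value store that keeps the categories above apart. The class name InMemoryMetaStore and its put/get/list_keys methods are hypothetical and are not Ryoma's actual store API.

```python
from typing import Any


class InMemoryMetaStore:
    """Hypothetical namespaced key-value store; nothing survives process exit."""

    def __init__(self) -> None:
        # namespace -> {key -> value}
        self._data: dict[str, dict[str, Any]] = {}

    def put(self, namespace: str, key: str, value: Any) -> None:
        # Create the namespace on first write, then store the value under its key
        self._data.setdefault(namespace, {})[key] = value

    def get(self, namespace: str, key: str, default: Any = None) -> Any:
        return self._data.get(namespace, {}).get(key, default)

    def list_keys(self, namespace: str) -> list[str]:
        return sorted(self._data.get(namespace, {}))


# Namespaces mirror the "What it stores" list (names are illustrative)
store = InMemoryMetaStore()
store.put("datasources", "default", {"type": "postgres", "database": "mydb"})
store.put("agents", "sql", {"model": "gpt-4o", "mode": "enhanced"})
store.put("sessions", "abc123", {"history": []})
```

A persistent backend (postgres, redis) would keep the same logical layout but write it to an external service instead of a Python dict.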

2. Vector Store

Purpose: Stores embeddings and serves semantic search over the catalog

Types:

  • chroma - File-based vector database (recommended for development)

  • faiss - In-memory vector store for fast prototyping

  • qdrant - Production vector database

  • pgvector - PostgreSQL with vector extension

What it stores:

  • Table and column embeddings for semantic search

  • Indexed catalog elements for fast retrieval

  • Query history embeddings for context
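Semantic search over the catalog comes down to nearest-neighbour lookup on embeddings. The sketch below uses brute-force cosine similarity over a Python dict in place of Chroma, FAISS, Qdrant or pgvector, and hand-written 3-dimensional vectors stand in for real embeddings. TinyVectorStore and its methods are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical exact (brute-force) vector index.

    Production vector stores usually add approximate nearest-neighbour
    indexes so they do not have to score every stored vector.
    """

    def __init__(self) -> None:
        self._items: dict[str, list[float]] = {}

    def add(self, item_id: str, vector: list[float]) -> None:
        self._items[item_id] = vector

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        # Score every stored vector, then keep the top_k highest scores
        scored = [(item_id, cosine(query, vec)) for item_id, vec in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


vs = TinyVectorStore()
vs.add("table:customers", [0.9, 0.1, 0.0])
vs.add("column:customers.email", [0.8, 0.2, 0.1])
vs.add("table:sales", [0.0, 0.2, 0.9])
hits = vs.search([1.0, 0.0, 0.0], top_k=2)
```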

3. Data Sources

Purpose: Connects to actual databases and data sources

Types:

  • postgres - PostgreSQL databases

  • mysql - MySQL databases

  • sqlite - SQLite databases

  • snowflake - Snowflake data warehouse

  • bigquery - Google BigQuery

  • duckdb - DuckDB analytics database
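Whatever the backend, a data source has to expose its catalog (tables and columns) so it can be indexed. This sketch uses Python's built-in sqlite3 module with an in-memory database, matching the ":memory:" development configuration shown later on this page. The list_catalog helper is hypothetical and is not Ryoma's connector API.

```python
import sqlite3


def list_catalog(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Hypothetical helper: return {table_name: [column names]} for a SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    catalog = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [col[1] for col in columns]
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
catalog = list_catalog(conn)
```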

🔄 Store Unification

The Problem

Previously, each component created independent store instances, leading to:

  • Data duplication across stores

  • Inconsistent state between agents

  • Circular dependencies between modules

  • Performance degradation from duplicated store instances and higher memory usage

The Solution

The unified architecture ensures:

  • Single source of truth: All components share the same store instances

  • CLI coordination: The CLI creates the stores and distributes them to all managers

  • No duplication: Agents receive stores from the CLI and never create their own

  • Consistency: All agents see the same data and state
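The difference between independent and shared stores is easiest to see with a toy agent that records a data source registration. Agent, and the plain dict used as its store, are hypothetical stand-ins for illustration, not Ryoma classes.

```python
from typing import Optional


class Agent:
    """Hypothetical agent that reads and writes registrations via an injected store."""

    def __init__(self, name: str, store: Optional[dict] = None) -> None:
        # Mirrors the BaseAgent rule: refuse to run without an injected store
        if store is None:
            raise ValueError("store parameter is required")
        self.name = name
        self.store = store

    def register_datasource(self, ds_name: str, config: dict) -> None:
        self.store[f"datasource:{ds_name}"] = config

    def known_datasources(self) -> list[str]:
        return sorted(k.split(":", 1)[1] for k in self.store if k.startswith("datasource:"))


# Before: each agent gets its own store, so registrations do not propagate.
sql_agent_old = Agent("sql", store={})
chat_agent_old = Agent("chat", store={})
sql_agent_old.register_datasource("warehouse", {"type": "snowflake"})

# After: one store instance is created up front and shared by every agent.
shared_store: dict = {}
sql_agent = Agent("sql", store=shared_store)
chat_agent = Agent("chat", store=shared_store)
sql_agent.register_datasource("warehouse", {"type": "snowflake"})
```

With separate stores the chat agent never sees the warehouse registration; with the shared store it does, without any synchronization code.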

Implementation Pattern

# CLI creates unified stores
class RyomaAI:
    def __init__(self):
        # Create unified stores from the loaded configuration (meta_config, vector_config and embedding are built elsewhere)
        self.meta_store = StoreFactory.create_store(**meta_config.to_factory_params())
        self.vector_store = create_vector_store(config=vector_config, embedding_function=embedding)
        
        # Pass unified stores to all managers
        self.agent_manager = AgentManager(...)
        self.command_handler = CommandHandler(
            meta_store=self.meta_store,
            vector_store=self.vector_store
        )

# Agents receive stores from CLI
class BaseAgent:
    def __init__(self, store=None, vector_store=None, **kwargs):
        if store is None:
            raise ValueError("store parameter is required - agents must receive stores from CLI")
        self.store = store
        self.vector_store = vector_store

⚙️ Configuration Structure

New Configuration Format

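Alongside the top-level model settings (model, mode, embedding_model), the configuration is now split into three distinct sections: meta_store, vector_store, and datasources.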

{
  "model": "gpt-4o",
  "mode": "enhanced",
  "embedding_model": "text-embedding-ada-002",
  
  "meta_store": {
    "type": "memory",
    "connection_string": null,
    "options": {}
  },
  
  "vector_store": {
    "type": "chroma", 
    "collection_name": "ryoma_vectors",
    "dimension": 1536,
    "distance_metric": "cosine",
    "extra_configs": {
      "persist_directory": "./data/vectors"
    }
  },
  
  "datasources": [
    {
      "name": "default",
      "type": "postgres", 
      "host": "localhost",
      "port": 5432,
      "database": "mydb",
      "user": "postgres",
      "password": "password"
    }
  ]
}
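A loader for this format only needs to split the file into the three sections and fill in defaults. The sketch below uses the standard-library json and dataclasses modules. The class names, load_config, and the defaults (memory and chroma, taken from the development setup below) are assumptions, not Ryoma's actual configuration classes.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MetaStoreConfig:
    # Hypothetical defaults: in-memory metadata store
    type: str = "memory"
    connection_string: Optional[str] = None
    options: dict[str, Any] = field(default_factory=dict)


@dataclass
class VectorStoreConfig:
    # Hypothetical defaults: Chroma, as in the development setup
    type: str = "chroma"
    collection_name: str = "ryoma_vectors"
    dimension: Optional[int] = None
    distance_metric: str = "cosine"
    extra_configs: dict[str, Any] = field(default_factory=dict)


@dataclass
class RyomaConfig:
    meta_store: MetaStoreConfig
    vector_store: VectorStoreConfig
    datasources: list[dict[str, Any]]


def load_config(text: str) -> RyomaConfig:
    """Parse JSON text; unknown keys in a store section raise TypeError."""
    raw = json.loads(text)
    return RyomaConfig(
        meta_store=MetaStoreConfig(**raw.get("meta_store", {})),
        vector_store=VectorStoreConfig(**raw.get("vector_store", {})),
        datasources=raw.get("datasources", []),
    )


cfg = load_config(
    '{"meta_store": {"type": "memory"}, "vector_store": {"type": "chroma"}, '
    '"datasources": [{"type": "sqlite", "database": ":memory:"}]}'
)
```

Rejecting unknown keys early (the TypeError from the dataclass constructor) catches typos such as "conection_string" before any store is created.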

Configuration Examples

Development Setup

{
  "meta_store": {"type": "memory"},
  "vector_store": {"type": "chroma"},
  "datasources": [{"type": "sqlite", "database": ":memory:"}]
}

Production Setup

{
  "meta_store": {
    "type": "postgres",
    "connection_string": "postgresql://user:pass@prod-db:5432/metadata"
  },
  "vector_store": {
    "type": "qdrant",
    "extra_configs": {
      "url": "http://qdrant-server:6333",
      "api_key": "your-api-key"
    }
  },
  "datasources": [
    {
      "name": "warehouse",
      "type": "snowflake",
      "account": "your-account", 
      "warehouse": "COMPUTE_WH"
    }
  ]
}

🔧 Store Factory Pattern

Metadata Store Factory

from ryoma_ai.store.store_factory import StoreFactory

# Create store from configuration
store = StoreFactory.create_store(
    store_type="postgres",
    connection_string="postgresql://localhost:5432/metadata",
    options={}
)
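StoreFactory.create_store dispatches on store_type. One common way to build such a factory is a registry that maps type names to constructors. The sketch below shows that pattern with a hypothetical MemoryStore, register_store, and create_store; it is not a copy of Ryoma's implementation, and real postgres or redis backends would register their own classes the same way.

```python
from typing import Any, Callable, Optional


class MemoryStore:
    """Hypothetical in-memory backend; connection_string is accepted but unused."""

    def __init__(self, connection_string: Optional[str] = None,
                 options: Optional[dict] = None) -> None:
        self.options = options or {}
        self.data: dict[str, Any] = {}


# Registry of store_type -> constructor (hypothetical)
_REGISTRY: dict[str, Callable[..., Any]] = {"memory": MemoryStore}


def register_store(store_type: str, constructor: Callable[..., Any]) -> None:
    _REGISTRY[store_type] = constructor


def create_store(store_type: str, connection_string: Optional[str] = None,
                 options: Optional[dict] = None) -> Any:
    try:
        constructor = _REGISTRY[store_type]
    except KeyError:
        supported = ", ".join(sorted(_REGISTRY))
        raise ValueError(
            f"Unsupported store type {store_type!r}; expected one of: {supported}"
        ) from None
    return constructor(connection_string=connection_string, options=options)


store = create_store("memory", options={"namespace": "dev"})
```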

Supported Store Types

| Type     | Description        | Use Case                |
|----------|--------------------|-------------------------|
| memory   | In-memory storage  | Development, testing    |
| postgres | PostgreSQL storage | Production, persistence |
| redis    | Redis storage      | Distributed, caching    |

📊 Catalog Store Integration

Unified Catalog Indexing

The catalog store now uses the unified architecture:

# CatalogStore requires unified stores
catalog_store = CatalogStore(
    metadata_store=unified_meta_store,  # From CLI
    vector_store=unified_vector_store   # From CLI  
)

# Indexing uses UnifiedCatalogIndexService
indexer = UnifiedCatalogIndexService(
    metadata_store=unified_meta_store,
    vector_store=unified_vector_store
)
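Conceptually, indexing turns each table and column into a short text description, writes the description to the metadata store, and writes its embedding to the vector store under the same ID. The sketch below shows that flow with plain dicts and a toy hashed bag-of-words embedding in place of a real embedding model. index_catalog and embed are hypothetical helpers, not the UnifiedCatalogIndexService API.

```python
import math
import zlib


def embed(text: str, dim: int = 16) -> list[float]:
    """Toy embedding: hash each token into a bucket, then L2-normalise."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def index_catalog(catalog: dict[str, list[str]], meta_store: dict, vector_store: dict) -> int:
    """Write one metadata entry and one embedding per table and per column."""
    count = 0
    for table, columns in catalog.items():
        elements = [(f"table:{table}", f"table {table} with columns {' '.join(columns)}")]
        elements += [(f"column:{table}.{col}", f"column {col} in table {table}") for col in columns]
        for element_id, description in elements:
            # Same ID in both stores keeps metadata and vectors in sync
            meta_store[element_id] = {"description": description}
            vector_store[element_id] = embed(description)
            count += 1
    return count


meta, vectors = {}, {}
n = index_catalog({"customers": ["id", "email"], "sales": ["amount"]}, meta, vectors)
```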

Search Optimization

Catalog search is now optimized using indexed metadata:

# Fast semantic search without loading full catalog
relevant_catalog = catalog_store.search_relevant_catalog(
    query="customer information",
    top_k=10,
    min_score=0.3
)

# Get table suggestions
suggestions = catalog_store.get_table_suggestions(
    query="sales data",
    max_tables=5
)
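The top_k and min_score parameters suggest a two-step filter: drop hits below the score threshold, then keep the best top_k. Table suggestions can then be derived by collapsing column hits onto their parent table. The sketch below assumes hits arrive as (element_id, score) pairs with IDs like table:sales or column:sales.amount; filter_hits and table_suggestions are hypothetical helpers that only mirror the parameter names of search_relevant_catalog and get_table_suggestions.

```python
def filter_hits(hits: list[tuple[str, float]], top_k: int = 10,
                min_score: float = 0.3) -> list[tuple[str, float]]:
    """Keep hits scoring at least min_score, best first, at most top_k."""
    kept = [h for h in hits if h[1] >= min_score]
    kept.sort(key=lambda h: h[1], reverse=True)
    return kept[:top_k]


def table_suggestions(hits: list[tuple[str, float]], max_tables: int = 5) -> list[str]:
    """Rank tables by the best score of the table itself or any of its columns."""
    best: dict[str, float] = {}
    for element_id, score in hits:
        kind, name = element_id.split(":", 1)
        table = name.split(".", 1)[0] if kind == "column" else name
        best[table] = max(score, best.get(table, float("-inf")))
    ranked = sorted(best, key=lambda t: best[t], reverse=True)
    return ranked[:max_tables]


raw_hits = [
    ("column:sales.amount", 0.82),
    ("table:orders", 0.64),
    ("column:sales.region", 0.55),
    ("table:customers", 0.21),
]
relevant = filter_hits(raw_hits, top_k=10, min_score=0.3)
suggested = table_suggestions(relevant, max_tables=5)
```

Only the filtered hits are loaded, which is why search can avoid reading the full catalog.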

🛡️ Error Prevention

Store Validation

BaseAgent refuses to start without a store, so an agent cannot silently fall back to a private store of its own:

# Agents MUST receive stores from CLI
class BaseAgent:
    def __init__(self, store=None, **kwargs):
        if store is None:
            raise ValueError("store parameter is required - agents must receive stores from CLI to ensure unified storage")

Circular Import Resolution

Exception classes were moved to break circular dependencies between the catalog and store modules:

  • CatalogIndexError moved from ryoma_ai.store.exceptions to ryoma_ai.catalog.exceptions

  • Clean separation between catalog and store modules

🔄 Migration Guide

From Legacy Configuration

Old format:

{
  "database": {
    "type": "postgres",
    "connection_string": "postgresql://..."
  },
  "default_datasource": {...},
  "additional_datasources": [...]
}

New format:

{
  "meta_store": {"type": "memory"},
  "vector_store": {"type": "chroma"}, 
  "datasources": [{...}]
}
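A one-off script can rewrite legacy files into the new layout. This sketch assumes that the legacy database block configured the metadata store and that default_datasource should become the first entry of datasources (named "default" if it has no name); check both assumptions against your own configuration. migrate_legacy_config is a hypothetical helper.

```python
import copy
from typing import Any

LEGACY_KEYS = ("database", "default_datasource", "additional_datasources")


def migrate_legacy_config(old: dict[str, Any]) -> dict[str, Any]:
    # Carry over unrelated top-level settings (model, mode, ...) unchanged
    new = {k: copy.deepcopy(v) for k, v in old.items() if k not in LEGACY_KEYS}
    # Assumption: the legacy "database" block configured the metadata store
    new["meta_store"] = copy.deepcopy(old.get("database", {"type": "memory"}))
    # Assumption: no legacy vector store setting, so use the development default
    new.setdefault("vector_store", {"type": "chroma"})
    datasources = []
    if old.get("default_datasource"):
        default = copy.deepcopy(old["default_datasource"])
        default.setdefault("name", "default")
        datasources.append(default)
    datasources.extend(copy.deepcopy(old.get("additional_datasources", [])))
    new["datasources"] = datasources
    return new


legacy = {
    "model": "gpt-4o",
    "database": {"type": "postgres", "connection_string": "postgresql://localhost:5432/metadata"},
    "default_datasource": {"type": "sqlite", "database": ":memory:"},
    "additional_datasources": [{"name": "warehouse", "type": "snowflake"}],
}
migrated = migrate_legacy_config(legacy)
```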

Agent Initialization Updates

Old way:

agent = SqlAgent("gpt-4", mode="enhanced")
agent.add_datasource(datasource)

New way (CLI managed):

# CLI handles all initialization
ryoma-ai> show me the data
# Agents are created automatically with unified stores

New way (Programmatic):

# Must provide unified stores
agent = SqlAgent(
    model="gpt-4",
    mode="enhanced",
    datasource=datasource,
    store=unified_meta_store,      # Required
    vector_store=unified_vector_store  # Optional but recommended
)

🏎️ Performance Benefits

Before Unification

  • Multiple store instances per agent

  • Duplicate data storage

  • Inconsistent state

  • Higher memory usage

After Unification

  • Single store instance shared by all agents

  • Centralized data management

  • Consistent state across system

  • Lower memory usage, since store instances are no longer duplicated

  • Faster catalog operations through indexing

🔍 Debugging Store Issues

Check Store Status

ryoma-ai> /config
# Shows all store configurations

ryoma-ai> /agents  
# Shows which agents are using stores

Common Issues

“Store parameter is required” Error:

  • Agents must receive stores from the CLI

  • In production, let the CLI create agents rather than constructing them yourself

  • When creating agents programmatically, pass the unified stores explicitly

Vector search fails:

  • Run /index-catalog first

  • Ensure vector store is configured

  • Check embedding model configuration

Circular import errors:

  • Import CatalogIndexError from ryoma_ai.catalog.exceptions, not ryoma_ai.store.exceptions

  • Use lazy (function-level) imports if needed

  • Check module dependency order