🧠 Models¶
Ryoma supports multiple LLM providers and model configurations for different use cases and requirements.
🎯 Supported Providers¶
| 🏢 Provider | 🤖 Models | 🔧 Features |
|---|---|---|
| OpenAI | GPT-4, GPT-3.5-turbo | Function calling, streaming |
| Anthropic | Claude-3 Sonnet/Haiku | Large context, safety |
| Local (Ollama) | CodeLlama, Mistral | Privacy, cost control |
| Azure OpenAI | GPT-4, GPT-3.5 | Enterprise features |
🚀 Quick Start¶
OpenAI Models¶
from ryoma_ai.agent.sql import SqlAgent

# GPT-4 (recommended for production)
agent = SqlAgent(
    model="gpt-4",
    model_parameters={
        "temperature": 0.1,
        "max_tokens": 2000
    }
)

# GPT-3.5-turbo (cost-effective)
agent = SqlAgent(
    model="gpt-3.5-turbo",
    model_parameters={
        "temperature": 0.1,
        "max_tokens": 1500
    }
)
Anthropic Claude¶
# Claude-3 Sonnet (balanced performance)
agent = SqlAgent(
    model="claude-3-sonnet-20240229",
    model_parameters={
        "temperature": 0.1,
        "max_tokens": 4000
    }
)

# Claude-3 Haiku (fast and efficient)
agent = SqlAgent(
    model="claude-3-haiku-20240307",
    model_parameters={
        "temperature": 0.1,
        "max_tokens": 2000
    }
)
Local Models (Ollama)¶
from ryoma_ai.models.ollama import OllamaModel

# CodeLlama for SQL generation
model = OllamaModel(
    model_name="codellama:13b",
    base_url="http://localhost:11434"
)
agent = SqlAgent(model=model, mode="enhanced")
⚙️ Configuration¶
Model Parameters¶
Temperature¶
Controls randomness in model outputs:
# Conservative (recommended for SQL)
model_parameters = {"temperature": 0.1}

# Balanced
model_parameters = {"temperature": 0.3}

# Creative (not recommended for SQL)
model_parameters = {"temperature": 0.7}
Max Tokens¶
Controls maximum response length:
# Simple queries
model_parameters = {"max_tokens": 1000}

# Complex analysis
model_parameters = {"max_tokens": 2000}

# Detailed explanations
model_parameters = {"max_tokens": 4000}
Advanced Parameters¶
model_parameters = {
    "temperature": 0.1,
    "max_tokens": 2000,
    "top_p": 0.9,              # Nucleus sampling
    "frequency_penalty": 0.1,  # Reduce repetition
    "presence_penalty": 0.0,   # Encourage new topics
    "stop": ["```", "END"]     # Stop sequences
}
Provider-Specific Configuration¶
OpenAI¶
import os
from ryoma_ai.models.openai import OpenAIModel

# Set API key
os.environ["OPENAI_API_KEY"] = "your-api-key"

# Custom configuration
model = OpenAIModel(
    model_name="gpt-4",
    api_key="your-api-key",
    organization="your-org-id",
    base_url="https://api.openai.com/v1",  # Custom endpoint
    timeout=60,
    max_retries=3
)
Anthropic¶
import os
from ryoma_ai.models.anthropic import AnthropicModel

# Set API key
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

# Custom configuration
model = AnthropicModel(
    model_name="claude-3-sonnet-20240229",
    api_key="your-api-key",
    timeout=60,
    max_retries=3
)
Azure OpenAI¶
from ryoma_ai.models.azure_openai import AzureOpenAIModel

model = AzureOpenAIModel(
    deployment_name="gpt-4-deployment",
    api_key="your-azure-key",
    api_base="https://your-resource.openai.azure.com/",
    api_version="2024-02-01"
)
Ollama (Local)¶
from ryoma_ai.models.ollama import OllamaModel

# Local Ollama instance
model = OllamaModel(
    model_name="codellama:13b",
    base_url="http://localhost:11434",
    timeout=120,      # Longer timeout for local models
    keep_alive="5m"   # Keep model loaded
)

# Remote Ollama instance
model = OllamaModel(
    model_name="mistral:7b",
    base_url="http://your-server:11434",
    headers={"Authorization": "Bearer your-token"}
)
🎯 Model Selection Guide¶
By Use Case¶
Production SQL Generation¶
Recommended: GPT-4 or Claude-3 Sonnet
agent = SqlAgent(
    model="gpt-4",
    mode="enhanced",
    model_parameters={"temperature": 0.1}
)
Development and Testing¶
Recommended: GPT-3.5-turbo or Claude-3 Haiku
agent = SqlAgent(
    model="gpt-3.5-turbo",
    mode="basic",
    model_parameters={"temperature": 0.2}
)
Privacy-Sensitive Environments¶
Recommended: Local models via Ollama
model = OllamaModel("codellama:13b")
agent = SqlAgent(model=model, mode="enhanced")
Cost-Optimized Deployment¶
Recommended: GPT-3.5-turbo with caching
agent = SqlAgent(
    model="gpt-3.5-turbo",
    mode="enhanced",
    model_parameters={"temperature": 0.1},
    enable_caching=True
)
By Performance Requirements¶
| 🎯 Requirement | 🥇 Best Choice | 🥈 Alternative |
|---|---|---|
| Accuracy | GPT-4 | Claude-3 Sonnet |
| Speed | GPT-3.5-turbo | Claude-3 Haiku |
| Cost | Local models | GPT-3.5-turbo |
| Privacy | Ollama | Azure OpenAI |
| Context | Claude-3 | GPT-4 |
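The selection table can be turned into a small lookup helper. The mapping below is a hypothetical sketch (not a Ryoma API): the model-name strings are assumptions chosen to match the examples elsewhere on this page, with the local CodeLlama model standing in for "Local models"/"Ollama".

```python
# Hypothetical lookup derived from the selection table above (not part of Ryoma).
MODEL_BY_REQUIREMENT = {
    "accuracy": ("gpt-4", "claude-3-sonnet-20240229"),
    "speed":    ("gpt-3.5-turbo", "claude-3-haiku-20240307"),
    "cost":     ("codellama:13b", "gpt-3.5-turbo"),   # local model via Ollama
    "privacy":  ("codellama:13b", "gpt-4"),           # Ollama first, then Azure OpenAI GPT-4
    "context":  ("claude-3-sonnet-20240229", "gpt-4"),
}

def pick_model(requirement: str, fallback: bool = False) -> str:
    """Return the best-choice model for a requirement, or its alternative."""
    best, alternative = MODEL_BY_REQUIREMENT[requirement.lower()]
    return alternative if fallback else best
```

For example, `pick_model("speed")` returns the cheap fast model, while `pick_model("accuracy", fallback=True)` returns the runner-up when the first choice is unavailable.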
🔧 Advanced Features¶
Model Switching¶
# Switch models based on query complexity
def get_model_for_complexity(complexity):
    if complexity == "high":
        return "gpt-4"
    elif complexity == "medium":
        return "gpt-3.5-turbo"
    else:
        return "claude-3-haiku-20240307"

# Dynamic model selection
query_plan = agent.get_query_plan(question)
model = get_model_for_complexity(query_plan["complexity"])
agent.set_model(model)
Model Ensembling¶
# Use multiple models for consensus
from ryoma_ai.models.ensemble import EnsembleModel

ensemble = EnsembleModel([
    "gpt-4",
    "claude-3-sonnet-20240229",
    "gpt-3.5-turbo"
])

agent = SqlAgent(
    model=ensemble,
    mode="reforce",  # Consensus voting
    ensemble_config={
        "voting_strategy": "majority",
        "confidence_threshold": 0.8
    }
)
Caching and Optimization¶
# Enable response caching
agent = SqlAgent(
    model="gpt-4",
    enable_caching=True,
    cache_config={
        "ttl": 3600,  # 1 hour
        "max_size": 1000,
        "cache_key_fields": ["question", "schema_hash"]
    }
)
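The `cache_key_fields` setting above implies a cache key built from the question plus a hash of the current schema, so cached answers are invalidated when the schema changes. A minimal sketch of that idea, using only the standard library (hypothetical, not Ryoma's actual implementation):

```python
import hashlib
import json

def make_cache_key(question: str, schema: dict) -> str:
    """Build a stable cache key from the question and a schema hash.

    Sorting the JSON keys makes the hash deterministic regardless of
    the order in which the schema dict was constructed.
    """
    schema_hash = hashlib.sha256(
        json.dumps(schema, sort_keys=True).encode()
    ).hexdigest()[:16]
    return hashlib.sha256(f"{question}|{schema_hash}".encode()).hexdigest()
```

The same question against an unchanged schema yields the same key (a cache hit); any schema change produces a new key, so stale SQL is never served.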
📊 Performance Monitoring¶
Track Model Usage¶
from ryoma_ai.monitoring import ModelMonitor

monitor = ModelMonitor()
agent = SqlAgent(
    model="gpt-4",
    monitor=monitor
)

# Get usage statistics
stats = monitor.get_stats()
print(f"Total tokens: {stats['total_tokens']}")
print(f"Average latency: {stats['avg_latency']:.2f}s")
print(f"Cost estimate: ${stats['estimated_cost']:.2f}")
A/B Testing Models¶
from ryoma_ai.testing import ModelABTest

# Compare model performance
ab_test = ModelABTest(
    model_a="gpt-4",
    model_b="claude-3-sonnet-20240229",
    traffic_split=0.5
)

agent = SqlAgent(model=ab_test)

# Analyze results
results = ab_test.get_results()
print(f"Model A accuracy: {results['model_a']['accuracy']:.2%}")
print(f"Model B accuracy: {results['model_b']['accuracy']:.2%}")
🛡️ Security and Privacy¶
API Key Management¶
# Use environment variables
import os
os.environ["OPENAI_API_KEY"] = "your-key"
# Use key management service
from ryoma_ai.security import KeyManager
key_manager = KeyManager("aws-secrets-manager")
api_key = key_manager.get_key("openai-api-key")
model = OpenAIModel(api_key=api_key)
Data Privacy¶
# Local model for sensitive data
model = OllamaModel("codellama:13b")  # No data leaves your infrastructure

# Or use privacy-focused providers
model = AnthropicModel(
    model_name="claude-3-sonnet-20240229",
    privacy_mode=True  # Enhanced privacy settings
)
🎯 Best Practices¶
1. Choose Models Appropriately¶
- Use GPT-4 for complex analysis
- Use GPT-3.5-turbo for simple queries
- Use local models for sensitive data
2. Optimize Parameters¶
- Low temperature (0.1) for SQL generation
- Appropriate max_tokens for your use case
- Enable caching for repeated queries
3. Monitor Performance¶
- Track token usage and costs
- Monitor response latency
- A/B test different models
4. Handle Failures Gracefully¶
- Implement retry logic
- Have fallback models
- Log errors for debugging
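These three points can be combined in one small wrapper. The sketch below is hypothetical (not a Ryoma API): `ask` stands for any callable that takes a model name and a question, and the model list, retry count, and backoff are assumed defaults you would tune.

```python
import logging
import time

def ask_with_fallback(ask, question,
                      models=("gpt-4", "gpt-3.5-turbo"),
                      retries=2, backoff=1.0):
    """Try each model in order, retrying transient failures with backoff."""
    last_error = None
    for model in models:
        for attempt in range(retries):
            try:
                return ask(model, question)
            except Exception as exc:  # narrow to transient errors in real code
                last_error = exc
                logging.warning("model %s attempt %d failed: %s",
                                model, attempt + 1, exc)
                time.sleep(backoff * (attempt + 1))
    raise RuntimeError("all models failed") from last_error
```

If GPT-4 keeps failing, the wrapper logs each attempt and transparently falls back to GPT-3.5-turbo before giving up.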
5. Security Considerations¶
- Rotate API keys regularly
- Use environment variables
- Consider local models for sensitive data