💡 Examples¶
Real-world examples demonstrating Ryoma’s capabilities across different use cases and industries.
🏪 E-commerce Analytics¶
Customer Segmentation Analysis¶
from ryoma_ai.agent.sql import SqlAgent
from ryoma_ai.datasource.postgres import PostgresDataSource
# Connect to e-commerce database
datasource = PostgresDataSource(
    connection_string="postgresql://user:pass@localhost:5432/ecommerce",
    enable_profiling=True
)
agent = SqlAgent(model="gpt-4", mode="enhanced")
agent.add_datasource(datasource)
# Complex customer analysis
response = agent.stream("""
Segment customers based on their purchase behavior:
1. High-value customers (>$1000 total purchases)
2. Frequent buyers (>10 orders)
3. Recent customers (first purchase in last 30 days)
4. At-risk customers (no purchase in 90+ days)
Show the count and average order value for each segment.
""")
print(response)
Product Performance Dashboard¶
# Multi-dimensional product analysis
response = agent.stream("""
Create a product performance report showing:
- Top 10 products by revenue this quarter
- Products with declining sales (comparing this month vs last month)
- Inventory turnover rate by category
- Seasonal trends for each product category
Include percentage changes and rank products by profitability.
""")
📊 Financial Analysis¶
Revenue Forecasting¶
from ryoma_ai.datasource.snowflake import SnowflakeDataSource
# Connect to financial data warehouse
datasource = SnowflakeDataSource(
    account="your-account",
    database="FINANCE_DW",
    enable_profiling=True
)
agent = SqlAgent(model="gpt-4", mode="reforce")
agent.add_datasource(datasource)
# Complex financial analysis
response = agent.stream("""
Analyze monthly recurring revenue (MRR) trends:
1. Calculate MRR growth rate for the last 12 months
2. Identify churn impact on revenue
3. Segment MRR by customer size (SMB, Mid-market, Enterprise)
4. Forecast next quarter MRR based on current trends
5. Show the contribution of new vs expansion revenue
Include confidence intervals for the forecast.
""")
Risk Assessment¶
# Credit risk analysis
response = agent.stream("""
Generate a credit risk report:
- Customers with overdue payments >30 days
- Payment pattern analysis by customer segment
- Default probability scoring based on historical data
- Recommended credit limits for new customers
- Geographic risk distribution
Rank customers by risk score and suggest actions.
""")
🏥 Healthcare Analytics¶
Patient Outcome Analysis¶
from ryoma_ai.datasource.bigquery import BigQueryDataSource
# Connect to healthcare data
datasource = BigQueryDataSource(
    project_id="healthcare-analytics",
    enable_profiling=True,
    profiler_config={"sample_size": 50000}  # Large dataset
)
agent = SqlAgent(
    model="gpt-4",
    mode="enhanced",
    safety_config={
        "enable_validation": True,
        "max_rows": 100000,
        "require_where_clause": True  # Ensure data filtering
    }
)
agent.add_datasource(datasource)
# HIPAA-compliant analysis
response = agent.stream("""
Analyze patient readmission patterns (anonymized data):
1. 30-day readmission rates by department
2. Common diagnoses leading to readmissions
3. Seasonal patterns in emergency visits
4. Average length of stay by condition type
5. Resource utilization efficiency metrics
Exclude any personally identifiable information.
""")
🏭 Manufacturing Operations¶
Supply Chain Optimization¶
# Manufacturing database with IoT sensor data
datasource = PostgresDataSource(
    connection_string="postgresql://user:pass@localhost:5432/manufacturing",
    enable_profiling=True
)
agent = SqlAgent(model="gpt-4", mode="enhanced")
agent.add_datasource(datasource)
# Complex supply chain analysis
response = agent.stream("""
Optimize our supply chain operations:
1. Identify bottlenecks in production lines
2. Calculate optimal inventory levels by component
3. Predict maintenance needs based on sensor data
4. Analyze supplier performance and delivery times
5. Recommend production schedule adjustments
Focus on reducing costs while maintaining quality standards.
""")
📱 SaaS Product Analytics¶
User Engagement Analysis¶
from ryoma_ai.agent.pandas import PandasAgent
import pandas as pd
# Load user activity data
df = pd.read_csv('user_activity.csv')
agent = PandasAgent("gpt-4")
agent.add_dataframe(df, name="user_activity")
# Comprehensive user analysis
response = agent.stream("""
Analyze user engagement patterns:
1. Daily/weekly/monthly active users trends
2. Feature adoption rates and user journey analysis
3. Cohort retention analysis by signup month
4. Identify power users vs casual users
5. Churn prediction based on activity patterns
Create actionable insights for the product team.
""")
A/B Test Analysis¶
# A/B test results analysis
test_data = pd.read_csv('ab_test_results.csv')
agent.add_dataframe(test_data, name="ab_test")
response = agent.stream("""
Analyze A/B test results for new checkout flow:
1. Statistical significance of conversion rate difference
2. Segment analysis (mobile vs desktop, new vs returning users)
3. Revenue impact calculation
4. Confidence intervals and p-values
5. Recommendation on whether to ship the new feature
Include both statistical and business impact analysis.
""")
🎓 Educational Data Analysis¶
Student Performance Insights¶
# Educational database
datasource = PostgresDataSource(
    connection_string="postgresql://user:pass@localhost:5432/education",
    enable_profiling=True
)
agent = SqlAgent(model="gpt-4", mode="enhanced")
agent.add_datasource(datasource)
response = agent.stream("""
Analyze student performance and learning outcomes:
1. Grade distribution by subject and semester
2. Correlation between attendance and performance
3. Identify early warning indicators for at-risk students
4. Course completion rates and dropout patterns
5. Teacher effectiveness metrics
Provide recommendations for improving student success.
""")
🌐 Multi-Database Analysis¶
Cross-Platform Integration¶
# Connect multiple data sources
postgres_ds = PostgresDataSource("postgresql://localhost:5432/sales")
snowflake_ds = SnowflakeDataSource(account="account", database="MARKETING")
agent = SqlAgent(model="gpt-4", mode="enhanced")
agent.add_datasource(postgres_ds, name="sales_db")
agent.add_datasource(snowflake_ds, name="marketing_db")
# Cross-database analysis
response = agent.stream("""
Analyze the complete customer journey:
1. Marketing campaign effectiveness (from marketing_db)
2. Sales conversion rates (from sales_db)
3. Customer lifetime value calculation
4. Attribution modeling across channels
5. ROI by marketing channel and campaign
Combine data from both databases for comprehensive insights.
""")
🔍 Advanced Profiling Examples¶
Database Health Check¶
# Comprehensive database profiling
datasource = PostgresDataSource(
    connection_string="postgresql://localhost:5432/production",
    enable_profiling=True,
    profiler_config={
        "sample_size": 20000,
        "top_k": 15,
        "enable_lsh": True
    }
)
# Get detailed profiling information
profile = datasource.profile_table("customers")
print("📊 Table Profile:")
print(f"Rows: {profile['table_profile']['row_count']:,}")
print(f"Completeness: {profile['table_profile']['completeness_score']:.2%}")
print("\n📋 Column Quality:")
for col, prof in profile['column_profiles'].items():
    quality = prof['data_quality_score']
    null_pct = prof['null_percentage']
    print(f"{col}: Quality {quality:.2f}, NULL {null_pct:.1f}%")
Schema Similarity Analysis¶
# Find similar columns across tables
similar_columns = datasource.find_similar_columns("customer_id", threshold=0.8)
print(f"🔗 Similar columns: {similar_columns}")
# Enhanced catalog with profiling
catalog = datasource.get_enhanced_catalog(include_profiles=True)
for schema in catalog.schemas:
    for table in schema.tables:
        high_quality_cols = table.get_high_quality_columns(min_quality_score=0.8)
        print(f"📊 {table.table_name}: {len(high_quality_cols)} high-quality columns")
🎯 Best Practices from Examples¶
1. Use Appropriate Agent Modes¶
- Enhanced mode for production analytics
- ReFoRCE mode for complex, multi-step analysis
- Basic mode for simple queries and development
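For instance, the same model can be wrapped in different modes depending on the task. A minimal sketch, reusing the mode strings from the examples above (the "basic" string is an assumption inferred from the mode name):
from ryoma_ai.agent.sql import SqlAgent

# Development and quick checks: basic mode keeps overhead low
# ("basic" is assumed here from the mode name above)
dev_agent = SqlAgent(model="gpt-4", mode="basic")

# Production analytics: enhanced mode, as in most examples on this page
prod_agent = SqlAgent(model="gpt-4", mode="enhanced")

# Complex, multi-step analysis: ReFoRCE mode, as in the revenue forecast
deep_agent = SqlAgent(model="gpt-4", mode="reforce")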
2. Configure Safety for Your Use Case¶
- Healthcare: Strict PII protection
- Finance: Audit logging and access controls
- E-commerce: Row limits for large datasets
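A rough sketch of per-domain safety settings, reusing only the safety_config keys shown in the healthcare example above (enable_validation, max_rows, require_where_clause); the specific values are illustrative, and audit-logging keys would be additional assumptions beyond this page:
# Healthcare: strict filtering so unscoped queries never run
healthcare_safety = {
    "enable_validation": True,
    "require_where_clause": True,
    "max_rows": 10000,
}

# E-commerce: mainly guard against runaway result sizes
ecommerce_safety = {
    "enable_validation": True,
    "max_rows": 100000,
}

agent = SqlAgent(model="gpt-4", mode="enhanced", safety_config=healthcare_safety)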
3. Optimize Profiling Settings¶
- Large datasets: Higher sample sizes
- Real-time systems: Cached profiles
- Development: Smaller samples for speed
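As a sketch, the profiler_config keys used on this page (sample_size, top_k, enable_lsh) can be tuned per environment; the values below are illustrative, not recommended defaults:
# Development: small samples keep profiling fast
dev_profiler = {"sample_size": 5000}

# Production / large datasets: bigger samples plus LSH for similarity search
prod_profiler = {"sample_size": 50000, "top_k": 15, "enable_lsh": True}

datasource = PostgresDataSource(
    connection_string="postgresql://localhost:5432/production",
    enable_profiling=True,
    profiler_config=prod_profiler
)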
4. Structure Complex Queries¶
- Break down multi-part questions
- Specify desired output format
- Include business context and constraints
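For example, a prompt that applies all three points (the metric, threshold, and constraints here are hypothetical):
response = agent.stream("""
Analyze weekly order volume:
1. Total orders per week for the last 8 weeks
2. Week-over-week percentage change
3. Flag any week with a drop of more than 15%

Output format: one row per week with order count and % change.
Business context: exclude internal test accounts; weeks start on Monday.
""")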
5. Monitor and Iterate¶
- Track query performance
- Refine prompts based on results
- Use profiling data to improve accuracy
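A minimal way to start tracking query performance, assuming the `agent` from the examples above; a real deployment would send this to proper telemetry rather than an in-process list:
import time

query_log = []  # simple in-process log; replace with real telemetry

prompt = "Show daily revenue for the last 7 days"
start = time.perf_counter()
response = agent.stream(prompt)
elapsed = time.perf_counter() - start

query_log.append({"prompt": prompt, "seconds": round(elapsed, 2)})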
🚀 Next Steps¶
Ready to implement these patterns in your own projects?
- Advanced Setup - Production configuration
- API Reference - Complete method documentation
- Architecture Guide - Deep dive into internals