From 19bd6db336de50422af6cdb7a8cd33a89f307b9f Mon Sep 17 00:00:00 2001 From: Husain Baghwala Date: Thu, 31 Jul 2025 13:20:22 +0530 Subject: [PATCH] docs: add comprehensive documentation for database flows and component structures --- docs/README.md | 289 ++++++++ docs/api/api-structure.md | 323 +++++++++ docs/architecture/component-interactions.md | 279 ++++++++ .../architecture/deployment-infrastructure.md | 670 ++++++++++++++++++ docs/architecture/system-architecture.md | 218 ++++++ docs/database/database-schema.md | 396 +++++++++++ docs/quick-reference/architecture-summary.md | 195 +++++ 7 files changed, 2370 insertions(+) create mode 100644 docs/README.md create mode 100644 docs/api/api-structure.md create mode 100644 docs/architecture/component-interactions.md create mode 100644 docs/architecture/deployment-infrastructure.md create mode 100644 docs/architecture/system-architecture.md create mode 100644 docs/database/database-schema.md create mode 100644 docs/quick-reference/architecture-summary.md diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 00000000..b81d65f5 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,289 @@ +# AI Middleware System - Architectural Documentation + +## Executive Summary + +The AI Middleware System is a comprehensive FastAPI-based service that provides a unified interface for multiple AI service providers including OpenAI, Anthropic, Google Gemini, Groq, Mistral, and OpenRouter. The system acts as an intelligent proxy layer, offering features such as request routing, authentication, rate limiting, caching, queue management, and comprehensive monitoring. 
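The proxy-layer behaviour summarized above (one interface in front of several providers, with request routing and caching) can be pictured with a small sketch. This is not the project's implementation: `ProviderRouter`, `flaky_provider`, `stable_provider`, the in-memory dict standing in for the Redis cache, and the "try providers in order" policy are all hypothetical names and assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical provider signature: takes a prompt, returns text, raises on failure.
Provider = Callable[[str], str]


class ProviderRouter:
    """Illustrative router: tries providers in order and caches successful replies."""

    def __init__(self, providers: Dict[str, Provider], order: List[str]) -> None:
        self.providers = providers
        self.order = order
        # Plain dict used here as a stand-in for a Redis-backed response cache.
        self.cache: Dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        # Serve from the cache when the same prompt was answered before.
        if prompt in self.cache:
            return self.cache[prompt]
        last_error: Optional[Exception] = None
        for name in self.order:
            try:
                reply = self.providers[name](prompt)
            except Exception as exc:  # assumed policy: any failure moves on to the next provider
                last_error = exc
                continue
            self.cache[prompt] = reply
            return reply
        raise RuntimeError("all providers failed") from last_error


def flaky_provider(prompt: str) -> str:
    """Simulates an upstream provider that is currently unavailable."""
    raise TimeoutError("simulated upstream timeout")


def stable_provider(prompt: str) -> str:
    """Simulates a healthy provider by echoing the prompt."""
    return f"echo: {prompt}"


router = ProviderRouter(
    providers={"primary": flaky_provider, "fallback": stable_provider},
    order=["primary", "fallback"],
)
print(router.complete("hello"))  # prints "echo: hello"
```

In this sketch the first provider raises, so the router falls through to the second and stores the reply; a repeated call with the same prompt is then answered from the cache without contacting any provider.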
+ +## 🎯 Key Features + +- **Multi-Provider AI Integration**: Unified API for 6+ AI service providers +- **Intelligent Request Routing**: Automatic failover and load balancing +- **Advanced Authentication**: JWT-based auth with multi-tier access control +- **Rate Limiting**: Granular rate limiting per user, thread, and organization +- **Caching Layer**: Redis-based response caching for improved performance +- **Queue Management**: RabbitMQ-based asynchronous processing +- **Real-time Monitoring**: Comprehensive metrics and alerting via Atatus +- **Function Call Support**: Advanced tool calling and function execution +- **RAG Capabilities**: Retrieval-Augmented Generation for document processing +- **Multi-Database Architecture**: MongoDB, PostgreSQL/TimescaleDB, and Redis + +## 📁 Documentation Structure + +### Architecture Documentation +- **[System Architecture](architecture/system-architecture.md)** - High-level system overview and component diagrams +- **[Component Interactions](architecture/component-interactions.md)** - Detailed interaction flows and sequence diagrams +- **[Deployment Infrastructure](architecture/deployment-infrastructure.md)** - Kubernetes deployment and infrastructure setup + +### API Documentation +- **[API Structure](api/api-structure.md)** - Complete API endpoint documentation and routing hierarchy + +### Database Documentation +- **[Database Schema](database/database-schema.md)** - Multi-database schema design and relationships + +## 🏗️ System Architecture Overview + +```mermaid +graph TB + subgraph "Client Layer" + WebApp[Web Applications] + MobileApp[Mobile Applications] + APIClients[API Clients] + end + + subgraph "Gateway Layer" + LB[Load Balancer] + SSL[SSL/TLS Termination] + end + + subgraph "Application Layer" + FastAPI[FastAPI Application] + Auth[Authentication Layer] + RateLimit[Rate Limiting] + Middleware[Custom Middleware] + end + + subgraph "Service Layer" + BaseService[Base AI Service] + QueueService[Queue Service] 
+ CacheService[Cache Service] + RAGService[RAG Service] + end + + subgraph "AI Providers" + OpenAI[OpenAI] + Anthropic[Anthropic] + Gemini[Google Gemini] + Groq[Groq] + Mistral[Mistral] + OpenRouter[OpenRouter] + end + + subgraph "Data Layer" + MongoDB[(MongoDB)] + PostgreSQL[(PostgreSQL/TimescaleDB)] + Redis[(Redis)] + RabbitMQ[RabbitMQ] + end + + WebApp --> LB + MobileApp --> LB + APIClients --> LB + LB --> SSL + SSL --> FastAPI + FastAPI --> Auth + Auth --> RateLimit + RateLimit --> Middleware + Middleware --> BaseService + BaseService --> QueueService + BaseService --> CacheService + BaseService --> RAGService + BaseService --> OpenAI + BaseService --> Anthropic + BaseService --> Gemini + BaseService --> Groq + BaseService --> Mistral + BaseService --> OpenRouter + QueueService --> RabbitMQ + CacheService --> Redis + BaseService --> MongoDB + BaseService --> PostgreSQL +``` + +## 🚀 Quick Start Guide + +### Prerequisites +- Python 3.10+ +- Docker and Docker Compose +- MongoDB 7.0+ +- PostgreSQL with TimescaleDB extension +- Redis 7+ +- RabbitMQ + +### Development Setup +```bash +# Clone the repository +git clone +cd AI-middleware-python + +# Install dependencies +pip install -r req.txt + +# Set up environment variables +cp .env.example .env +# Edit .env with your configuration + +# Start dependencies with Docker Compose +docker-compose -f docker-compose.dev.yml up -d + +# Run the application +uvicorn index:app --reload --port 8080 +``` + +### Production Deployment +```bash +# Build Docker image +docker build -t ai-middleware:latest . 
+ +# Deploy to Kubernetes +kubectl apply -f k8s/ +``` + +## 🔧 Configuration Management + +### Environment Variables +The system uses environment-based configuration with the following key categories: + +- **Database Connections**: MongoDB, PostgreSQL, Redis URIs +- **AI Provider Keys**: API keys for all supported providers +- **Queue Configuration**: RabbitMQ connection and queue settings +- **Security Settings**: JWT secrets, encryption keys +- **Performance Tuning**: Worker counts, timeout settings +- **Monitoring**: Atatus configuration + +### Dynamic Configuration +The system supports dynamic configuration updates through MongoDB change streams, allowing real-time configuration changes without service restart. + +## 📊 Performance Characteristics + +### Throughput +- **Peak Requests**: 10,000+ requests/minute +- **Concurrent Connections**: 1,000+ simultaneous connections +- **Average Response Time**: <500ms (cached), <2s (uncached) + +### Scalability +- **Horizontal Scaling**: Kubernetes-based auto-scaling +- **Database Scaling**: Read replicas and connection pooling +- **Cache Scaling**: Redis cluster with automatic sharding + +### Reliability +- **Uptime Target**: 99.9% +- **Failover Time**: <30 seconds +- **Data Durability**: Multi-zone replication + +## 🔒 Security Framework + +### Authentication & Authorization +- **JWT Tokens**: Stateless authentication with configurable expiration +- **API Keys**: Service-to-service authentication +- **Multi-Tier Access**: User, organization, and bridge-level permissions + +### Data Protection +- **Encryption at Rest**: Database-level encryption +- **Encryption in Transit**: TLS 1.3 for all connections +- **Key Management**: Kubernetes secrets with rotation +- **PII Handling**: Data anonymization and retention policies + +### Network Security +- **VPC Isolation**: Private network segments +- **Network Policies**: Kubernetes-based micro-segmentation +- **DDoS Protection**: Rate limiting and traffic analysis +- 
**Intrusion Detection**: Automated threat detection + +## 📈 Monitoring & Observability + +### Metrics Collection +- **Application Metrics**: Request rates, response times, error rates +- **Infrastructure Metrics**: CPU, memory, disk, network utilization +- **Business Metrics**: AI provider usage, token consumption, costs +- **Custom Metrics**: Function call success rates, queue depths + +### Logging Strategy +- **Structured Logging**: JSON-formatted logs with correlation IDs +- **Log Aggregation**: Centralized log collection with ELK stack +- **Log Retention**: Configurable retention periods by log level +- **Security Logging**: Authentication and authorization events + +### Alerting Framework +- **Performance Alerts**: Response time degradation, error rate spikes +- **Infrastructure Alerts**: Resource utilization thresholds +- **Security Alerts**: Failed authentication attempts, anomalous behavior +- **Business Alerts**: Cost thresholds, quota utilization + +## 🧪 Testing Strategy + +### Unit Testing +- **Coverage Target**: >80% code coverage +- **Test Framework**: pytest with async support +- **Mock Strategy**: External API mocking for reliable tests + +### Integration Testing +- **Database Testing**: Test database interactions with real instances +- **API Testing**: End-to-end API workflow testing +- **Queue Testing**: Message processing and error handling + +### Performance Testing +- **Load Testing**: JMeter-based load testing scenarios +- **Stress Testing**: Breaking point identification +- **Endurance Testing**: Long-running stability testing + +## 🔄 Development Workflow + +### Code Standards +- **Linting**: flake8, black, isort for code formatting +- **Type Checking**: mypy for static type analysis +- **Documentation**: Docstring standards and API documentation + +### CI/CD Pipeline +- **Continuous Integration**: Automated testing on pull requests +- **Continuous Deployment**: Automated deployment to staging/production +- **Quality Gates**: Code 
coverage, security scanning, performance testing + +### Release Management +- **Semantic Versioning**: Version numbering strategy +- **Feature Flags**: Gradual feature rollout capability +- **Rollback Strategy**: Quick rollback procedures for production issues + +## 📚 Additional Resources + +### API Reference +- **OpenAPI Documentation**: Auto-generated API documentation +- **Postman Collection**: Ready-to-use API testing collection +- **SDK Documentation**: Client SDK usage examples + +### Troubleshooting Guides +- **Common Issues**: Frequently encountered problems and solutions +- **Performance Tuning**: Optimization strategies and best practices +- **Debugging Guide**: Step-by-step debugging procedures + +### Contributing Guidelines +- **Development Setup**: Local development environment setup +- **Code Contribution**: Pull request process and guidelines +- **Bug Reporting**: Issue reporting template and process + +## 🤝 Support & Maintenance + +### Support Channels +- **Documentation**: Comprehensive documentation and FAQs +- **Issue Tracking**: GitHub issues for bug reports and feature requests +- **Community**: Developer community and discussion forums + +### Maintenance Schedule +- **Regular Updates**: Monthly dependency updates and security patches +- **Major Releases**: Quarterly feature releases with breaking changes +- **LTS Support**: Long-term support for stable versions + +--- + +## Quick Navigation + +| Section | Description | Link | +|---------|-------------|------| +| 🏗️ Architecture | System design and components | [View Details](architecture/system-architecture.md) | +| 🔄 Interactions | Component communication flows | [View Details](architecture/component-interactions.md) | +| 🚀 Deployment | Infrastructure and deployment | [View Details](architecture/deployment-infrastructure.md) | +| 📡 API | API endpoints and structure | [View Details](api/api-structure.md) | +| 🗄️ Database | Schema and relationships | [View 
Details](database/database-schema.md) | + +--- + +*This documentation is maintained by the AI Middleware development team. For questions or contributions, please refer to our contributing guidelines.* \ No newline at end of file diff --git a/docs/api/api-structure.md b/docs/api/api-structure.md new file mode 100644 index 00000000..71e05c6f --- /dev/null +++ b/docs/api/api-structure.md @@ -0,0 +1,323 @@ +# API Structure and Routing Hierarchy + +## Overview + +This document outlines the complete API structure of the AI Middleware system, including all endpoints, their purposes, authentication requirements, and routing hierarchy. + +## API Routing Structure + +```mermaid +graph TB + Root["Root /"] --> Health["/healthcheck"] + Root --> Test["/90-sec"] + Root --> Stream["/stream"] + + Root --> V1Model["/api/v1/model"] + Root --> V2Model["/api/v2/model"] + Root --> Chatbot["/chatbot"] + Root --> Bridge["/bridge"] + Root --> Config["/api/v1/config"] + Root --> Functions["/functions"] + Root --> BridgeVersions["/bridge/versions"] + Root --> ImageProcessing["/image/processing"] + Root --> Utils["/utils"] + Root --> RAG["/rag"] + Root --> Internal["/internal"] + Root --> Testcases["/testcases"] + + V1Model --> V1Deprecated[Deprecated Routes] + V2Model --> V2Active[Active Model Routes] + + Chatbot --> ChatbotSend["/{botId}/sendMessage"] + Chatbot --> ChatbotReset["/{botId}/resetchat"] + + Bridge --> BridgeMain[Bridge Operations] + + Config --> ConfigOps[Configuration Management] + + Functions --> FunctionCalls[API Function Calls] + + BridgeVersions --> VersionMgmt[Version Management] + + ImageProcessing --> ImageOps[Image Processing] + + Utils --> UtilityOps[Utility Operations] + + RAG --> RAGOps[RAG Operations] + + Internal --> InternalOps[Internal Operations] + + Testcases --> TestOps[Test Case Management] +``` + +## Detailed API Endpoints + +### Core System Endpoints + +#### Health and Monitoring +``` +GET /healthcheck +- Purpose: System health check +- Authentication: None 
+- Response: System status 
information + +GET /90-sec +- Purpose: Long-running request test (90 seconds) +- Authentication: None +- Response: Test completion status + +GET /stream +- Purpose: Server-sent events streaming endpoint +- Authentication: None +- Response: Event stream data +``` + +### Model API Endpoints + +#### V1 Model API (Deprecated) +``` +POST /api/v1/model/chat/completion +- Status: Deprecated (Returns 410) +- Migration: Use /api/v2/model/chat/completion + +POST /api/v1/model/playground/chat/completion/{bridge_id} +- Purpose: Playground chat completion +- Authentication: JWT required +- Parameters: bridge_id (path) +- Middleware: Configuration injection +``` + +#### V2 Model API (Active) +``` +POST /api/v2/model/chat/completion +- Purpose: Main chat completion endpoint +- Authentication: Based on configuration +- Features: Multi-provider AI integration +- Response: Standardized AI response format + +POST /api/v2/model/playground/chat/completion/{bridge_id} +- Purpose: V2 playground endpoint +- Authentication: JWT required +- Parameters: bridge_id (path) +``` + +### Chatbot Endpoints + +``` +POST /chatbot/{botId}/sendMessage +- Purpose: Send message to chatbot +- Authentication: Dual auth (chatbot auth OR agents auth) +- Rate Limiting: + - 100 requests per slugName + - 20 requests per threadId +- Parameters: botId (path) + +POST /chatbot/{botId}/resetchat +- Purpose: Reset chatbot conversation +- Authentication: Chatbot authentication +- Rate Limiting: Applied +- Parameters: botId (path) +``` + +### Bridge Management + +``` +/bridge/* +- Purpose: Bridge configuration and management +- Authentication: Varies by endpoint +- Features: Bridge lifecycle management +``` + +### Configuration Management + +``` +/api/v1/config/* +- Purpose: System configuration management +- Authentication: JWT required +- Operations: CRUD operations on configurations +``` + +### Function Calls + +``` +/functions/* +- Purpose: External API function calls +- Authentication: Based on function 
requirements +- Features: Function call routing and execution +``` + +### Bridge Versions + +``` +/bridge/versions/* +- Purpose: Bridge version management +- Authentication: Required +- Operations: Version CRUD operations +``` + +### Image Processing + +``` +/image/processing/* +- Purpose: Image processing operations +- Authentication: Required +- Features: Image manipulation and analysis +``` + +### Utilities + +``` +/utils/* +- Purpose: Utility functions and helpers +- Authentication: Varies +- Features: Common utility operations +``` + +### RAG (Retrieval-Augmented Generation) + +``` +/rag/* +- Purpose: RAG operations +- Authentication: Required +- Features: Document processing and retrieval +``` + +### Internal Operations + +``` +/internal/* +- Purpose: Internal system operations +- Authentication: Internal authentication +- Features: System maintenance and operations +``` + +### Test Cases + +``` +/testcases/* +- Purpose: Test case management +- Authentication: Required +- Features: Test execution and management +``` + +## Authentication Matrix + +| Endpoint Group | Authentication Type | Rate Limiting | Special Requirements | +|---------------|--------------------|--------------|--------------------| +| Health/Monitor | None | No | Public access | +| V1 Model | JWT | Yes | Deprecated | +| V2 Model | JWT/API Key | Yes | Bridge configuration | +| Chatbot | Dual Auth | Yes | Bot-specific limits | +| Bridge | JWT | Yes | Admin access | +| Config | JWT | Yes | Admin access | +| Functions | Variable | Yes | Function-dependent | +| Versions | JWT | Yes | Version control | +| Image | JWT | Yes | Processing limits | +| Utils | Variable | Variable | Utility-dependent | +| RAG | JWT | Yes | Document access | +| Internal | Internal Auth | No | System operations | +| Testcases | JWT | Yes | Test environment | + +## Middleware Stack + +### Global Middleware (Applied to All Routes) +1. 
**CORS Middleware** + - Allow all origins (*) + - Allow all methods + - Allow all headers + - Max age: 86400 seconds + +2. **Atatus Middleware** + - Application performance monitoring + - Only in production environment + +### Route-Specific Middleware + +#### Authentication Middleware +- **JWT Middleware**: Token validation and user extraction +- **Chatbot Auth**: Bot-specific authentication +- **Agents Auth**: Agent-based authentication + +#### Rate Limiting Middleware +- **User-based limiting**: Per user request limits +- **Thread-based limiting**: Per conversation thread limits +- **Slug-based limiting**: Per bot slug limits + +#### Configuration Middleware +- **Bridge Configuration**: Injects bridge-specific configuration +- **User Data**: Adds user context to requests + +## Error Handling + +### Standard Error Responses +```json +{ + "success": false, + "error": "Error description", + "code": "ERROR_CODE", + "details": {} +} +``` + +### HTTP Status Codes +- `200`: Success +- `400`: Bad Request (validation errors) +- `401`: Unauthorized +- `403`: Forbidden +- `404`: Not Found +- `410`: Gone (deprecated endpoints) +- `429`: Too Many Requests (rate limiting) +- `500`: Internal Server Error + +## Request/Response Patterns + +### Standard Request Format +```json +{ + "model": "gpt-4", + "messages": [...], + "temperature": 0.7, + "max_tokens": 1000, + "tools": [...], + "stream": false +} +``` + +### Standard Response Format +```json +{ + "success": true, + "data": { + "id": "response_id", + "choices": [...], + "usage": { + "prompt_tokens": 100, + "completion_tokens": 200, + "total_tokens": 300 + } + }, + "metadata": { + "provider": "openai", + "model": "gpt-4", + "latency": 1500 + } +} +``` + +## API Versioning Strategy + +### Version 1 (v1) +- **Status**: Deprecated +- **Endpoints**: `/api/v1/*` +- **Migration Path**: Use v2 equivalents + +### Version 2 (v2) +- **Status**: Active +- **Endpoints**: `/api/v2/*` +- **Features**: Enhanced functionality, better error 
handling + +### Unversioned Endpoints +- Core system endpoints (health, streaming) +- Domain-specific endpoints (chatbot, bridge, etc.) + +This API structure provides a comprehensive interface for AI middleware operations while maintaining backward compatibility and clear separation of concerns. \ No newline at end of file diff --git a/docs/architecture/component-interactions.md b/docs/architecture/component-interactions.md new file mode 100644 index 00000000..6cfe9cda --- /dev/null +++ b/docs/architecture/component-interactions.md @@ -0,0 +1,279 @@ +# Component Interaction Diagrams + +## Detailed Component Interaction Flow + +### 1. Chat Completion Request Flow + +```mermaid +sequenceDiagram + participant Client + participant FastAPI as FastAPI App + participant Auth as Auth Middleware + participant Controller as Model Controller + participant BaseService as Base Service + participant Cache as Redis Cache + participant Provider as AI Provider + participant Queue as RabbitMQ + participant MongoDB as MongoDB + participant TimescaleDB as TimescaleDB + + Client->>FastAPI: POST /api/v2/model/chat/completion + FastAPI->>Auth: Validate JWT token + Auth->>Controller: Authorized request + Controller->>BaseService: Process chat request + + BaseService->>Cache: Check cached response + alt Cache Hit + Cache->>BaseService: Return cached response + BaseService->>Controller: Cached result + else Cache Miss + BaseService->>Provider: Send AI request + Provider->>BaseService: AI response + BaseService->>Cache: Store response in cache + end + + BaseService->>MongoDB: Store conversation + BaseService->>TimescaleDB: Store metrics + BaseService->>Queue: Queue background tasks + + BaseService->>Controller: Final response + Controller->>FastAPI: JSON response + FastAPI->>Client: HTTP response +``` + +### 2. 
Function Call Processing Flow + +```mermaid +flowchart TD + Start([AI Request with Tools]) --> ValidateTools{Validate Tool Calls} + ValidateTools -->|Valid| ProcessTools[Process Tool Calls] + ValidateTools -->|Invalid| ErrorResponse[Return Error] + + ProcessTools --> RunTools[Execute Function Calls] + RunTools --> UpdateConfig[Update Configuration] + UpdateConfig --> SecondAICall[Make Follow-up AI Call] + SecondAICall --> CheckMoreTools{More Tool Calls?} + + CheckMoreTools -->|Yes| ProcessTools + CheckMoreTools -->|No| FinalResponse[Return Final Response] + + FinalResponse --> StoreResults[Store in Database] + StoreResults --> End([End]) + + ErrorResponse --> End +``` + +### 3. Queue Processing Flow + +```mermaid +graph LR + subgraph "Queue Producer" + Request[Incoming Request] --> QueueService[Queue Service] + QueueService --> RabbitMQ[(RabbitMQ)] + end + + subgraph "Queue Consumer" + RabbitMQ --> Consumer[Background Consumer] + Consumer --> ProcessMessage[Process Message] + ProcessMessage --> AICall[AI Provider Call] + AICall --> StoreResult[Store Result] + StoreResult --> Complete[Mark Complete] + end + + subgraph "Error Handling" + ProcessMessage --> Error{Error?} + Error -->|Yes| Retry[Retry Logic] + Error -->|No| Success[Success] + Retry -->|Max Retries| DeadLetter[Dead Letter Queue] + Retry -->|Retry| ProcessMessage + end +``` + +### 4. 
Configuration Management Flow + +```mermaid +graph TB + subgraph "Configuration Sources" + EnvVars[Environment Variables] + MongoDB[MongoDB Configs] + DefaultVals[Default Values] + end + + subgraph "Configuration Loading" + ConfigLoader[Configuration Loader] + ModelConfig[Model Configuration] + ServiceKeys[Service Keys] + end + + subgraph "Change Detection" + ChangeStream[MongoDB Change Stream] + Listener[Background Listener] + Refresh[Refresh Configuration] + end + + subgraph "Application Usage" + BaseService[Base Service] + Controllers[Controllers] + Middlewares[Middlewares] + end + + EnvVars --> ConfigLoader + MongoDB --> ConfigLoader + DefaultVals --> ConfigLoader + + ConfigLoader --> ModelConfig + ConfigLoader --> ServiceKeys + + MongoDB --> ChangeStream + ChangeStream --> Listener + Listener --> Refresh + Refresh --> ModelConfig + + ModelConfig --> BaseService + ServiceKeys --> Controllers + ModelConfig --> Middlewares +``` + +### 5. Authentication & Rate Limiting Flow + +```mermaid +graph TD + Request[Incoming Request] --> AuthCheck{Authentication Required?} + + AuthCheck -->|Yes| ValidateJWT[Validate JWT Token] + AuthCheck -->|No| RateLimit[Check Rate Limits] + + ValidateJWT -->|Valid| RateLimit + ValidateJWT -->|Invalid| AuthError[Return 401 Unauthorized] + + RateLimit --> CheckUserLimit{User Rate Limit} + CheckUserLimit -->|Within Limit| CheckThreadLimit{Thread Rate Limit} + CheckUserLimit -->|Exceeded| RateLimitError[Return 429 Too Many Requests] + + CheckThreadLimit -->|Within Limit| ProcessRequest[Process Request] + CheckThreadLimit -->|Exceeded| RateLimitError + + ProcessRequest --> Success[Continue to Controller] + + AuthError --> End[Return Error Response] + RateLimitError --> End + Success --> End +``` + +### 6. 
Database Interaction Pattern + +```mermaid +graph LR + subgraph "Application Layer" + Services[Services] + Controllers[Controllers] + end + + subgraph "Database Services" + MongoService[MongoDB Service] + TimescaleService[TimescaleDB Service] + CacheService[Redis Service] + end + + subgraph "Data Storage" + MongoDB[(MongoDB<br/>Configurations<br/>Conversations<br/>API Calls)] 
+ TimescaleDB[(TimescaleDB<br/>Metrics<br/>Analytics<br/>Time-series)] 
+ Redis[(Redis<br/>Cache<br/>Sessions<br/>Rate Limits)] + end + + Services --> MongoService + Services --> TimescaleService + Services --> CacheService + Controllers --> MongoService + + MongoService --> MongoDB + TimescaleService --> TimescaleDB + CacheService --> Redis +``` + +### 7. Error Handling & Monitoring Flow + +```mermaid +graph TB + Request[Request Processing] --> Error{Error Occurred?} + + Error -->|No| Success[Successful Response] + Error -->|Yes| LogError[Log Error Details] + + LogError --> SendMetrics[Send to Metrics Service] + LogError --> NotifyAtatus[Send to Atatus] + LogError --> CheckWebhook{Webhook Configured?} + + CheckWebhook -->|Yes| SendWebhook[Send Error Webhook] + CheckWebhook -->|No| FormatError[Format Error Response] + + SendWebhook --> FormatError + SendMetrics --> TimescaleDB[(TimescaleDB)] + NotifyAtatus --> Atatus[Atatus Monitoring] + + FormatError --> ClientResponse[Return Error to Client] + Success --> ClientResponse +``` + +### 8. AI Provider Abstraction Layer + +```mermaid +graph TB + subgraph "Service Selection" + Request[AI Request] --> ServiceRouter{Select AI Service} + ServiceRouter --> OpenAI[OpenAI Service] + ServiceRouter --> Anthropic[Anthropic Service] + ServiceRouter --> Groq[Groq Service] + ServiceRouter --> Gemini[Gemini Service] + ServiceRouter --> Mistral[Mistral Service] + ServiceRouter --> OpenRouter[OpenRouter Service] + end + + subgraph "Request Processing" + OpenAI --> FormatRequest[Format Request] + Anthropic --> FormatRequest + Groq --> FormatRequest + Gemini --> FormatRequest + Mistral --> FormatRequest + OpenRouter --> FormatRequest + + FormatRequest --> MakeAPICall[Make API Call] + MakeAPICall --> ProcessResponse[Process Response] + ProcessResponse --> StandardizeFormat[Standardize Format] + end + + subgraph "Response Handling" + StandardizeFormat --> TokenCalculation[Calculate Token Usage] + TokenCalculation --> StoreMetrics[Store Usage Metrics] + StoreMetrics --> ReturnResponse[Return Standardized Response] + end +``` 
+ +## Key Integration 
Points + +### 1. Service-to-Service Communication +- **Synchronous**: Direct function calls between services +- **Asynchronous**: RabbitMQ message queues for background tasks +- **Caching**: Redis for frequently accessed data + +### 2. Data Persistence Patterns +- **MongoDB**: Document storage for configurations and conversations +- **TimescaleDB**: Time-series data for metrics and analytics +- **Redis**: Temporary storage for cache and session data + +### 3. External API Integration +- **AI Providers**: RESTful API calls with retry logic +- **Webhooks**: Outbound notifications for events +- **Monitoring**: Atatus APM integration + +### 4. Background Processing +- **Queue Workers**: Process messages asynchronously +- **Change Stream Listeners**: React to database changes +- **Scheduled Tasks**: Periodic maintenance operations + +### 5. Error Propagation +- **Service Level**: Errors bubble up through service layers +- **Client Level**: Standardized error responses +- **Monitoring Level**: Error tracking and alerting + +This interaction model ensures loose coupling between components while maintaining data consistency and providing comprehensive error handling and monitoring capabilities. \ No newline at end of file diff --git a/docs/architecture/deployment-infrastructure.md b/docs/architecture/deployment-infrastructure.md new file mode 100644 index 00000000..5be937d3 --- /dev/null +++ b/docs/architecture/deployment-infrastructure.md @@ -0,0 +1,670 @@ +# Deployment and Infrastructure Architecture + +## Overview + +The AI Middleware system is designed for cloud-native deployment with containerization, microservices architecture, and horizontal scalability. This document outlines the infrastructure components, deployment patterns, and operational requirements. 
+ +## Infrastructure Architecture + +```mermaid +graph TB + subgraph "External Services" + OpenAI[OpenAI API] + Anthropic[Anthropic API] + Google[Google Gemini API] + Groq[Groq API] + Mistral[Mistral API] + OpenRouter[OpenRouter API] + Atatus[Atatus Monitoring] + end + + subgraph "Load Balancer Layer" + LB[Load Balancer<br/>Nginx/HAProxy] + SSL[SSL Termination] + end 
+ + subgraph "Application Layer" + subgraph "Container Orchestration" + K8s[Kubernetes Cluster] + end + + subgraph "Application Pods" + App1[FastAPI App 1<br/>Gunicorn + Uvicorn] + App2[FastAPI App 2<br/>Gunicorn + Uvicorn] + App3[FastAPI App N<br/>Gunicorn + Uvicorn] + end + + subgraph "Background Workers" + Worker1[Queue Consumer 1] + Worker2[Queue Consumer 2] + Worker3[Queue Consumer N] + end + end 
+ + subgraph "Message Queue Layer" + RabbitMQ[RabbitMQ Cluster<br/>High Availability] + RabbitMQ_LB[RabbitMQ Load Balancer] + end 
+ + subgraph "Database Layer" + subgraph "MongoDB Cluster" + MongoDB_Primary[(MongoDB Primary)] + MongoDB_Secondary1[(MongoDB Secondary)] + MongoDB_Secondary2[(MongoDB Secondary)] + end + + subgraph "PostgreSQL/TimescaleDB" + PostgreSQL_Primary[(PostgreSQL Primary<br/>TimescaleDB Extension)] + PostgreSQL_Replica[(PostgreSQL Read Replica)] + end + + subgraph "Redis Cluster" + Redis_Master[(Redis Master)] + Redis_Replica[(Redis Replica)] + end + end 
+ + subgraph "Monitoring & Logging" + Prometheus[Prometheus<br/>Metrics Collection] + Grafana[Grafana<br/>Dashboards] + ElasticSearch[ElasticSearch<br/>Log Storage] + Kibana[Kibana<br/>Log Analysis] + AlertManager[Alert Manager<br/>Notifications] + end 
+ + subgraph "Storage" + FileStorage[Object Storage<br/>S3/GCS/Azure Blob] + ConfigMaps[ConfigMaps<br/>Environment Variables] + Secrets[Kubernetes Secrets<br/>API Keys & Credentials] + end 
+ + %% Client Connections + Client[Client Applications] --> LB + LB --> SSL + SSL --> K8s + + %% Application Layer Connections + K8s --> App1 + K8s --> App2 + K8s --> App3 + K8s --> Worker1 + K8s --> Worker2 + K8s --> Worker3 + + %% External API Connections + App1 --> OpenAI + App1 --> Anthropic + App1 --> Google + App1 --> Groq + App1 --> Mistral + App1 --> OpenRouter + App1 --> Atatus + + %% Database Connections + App1 --> MongoDB_Primary + App1 --> PostgreSQL_Primary + App1 --> Redis_Master + App2 --> MongoDB_Primary + App2 --> PostgreSQL_Primary + App2 --> Redis_Master + + %% Message Queue Connections + App1 --> RabbitMQ_LB + Worker1 --> RabbitMQ_LB + RabbitMQ_LB --> RabbitMQ + + %% Monitoring Connections + App1 --> Prometheus + PostgreSQL_Primary --> Prometheus + MongoDB_Primary --> Prometheus + Redis_Master --> Prometheus + Prometheus --> Grafana + Prometheus --> AlertManager + + %% Configuration + K8s --> ConfigMaps + K8s --> Secrets + K8s --> FileStorage +``` 
+ +## Container Architecture + +### Docker Configuration + +```dockerfile +# Production Dockerfile +FROM python:3.10-slim + +WORKDIR /app + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + gcc \ + g++ \ + curl \ + && rm -rf /var/lib/apt/lists/* + +# Copy requirements and install dependencies +COPY req.txt /app/req.txt +RUN pip install --upgrade pip +RUN pip install --no-cache-dir -r req.txt + +# Copy application code +COPY . 
/app + +# Create non-root user +RUN useradd -m -u 1001 appuser && chown -R appuser:appuser /app +USER appuser + +# Health check +HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \ + CMD curl -f http://localhost:8080/healthcheck || exit 1 + +# Expose port +EXPOSE 8080 + +# Start application with Gunicorn +CMD ["gunicorn", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:8080", "--timeout", "300", "--keep-alive", "5", "index:app"] +``` + +### Kubernetes Deployment + +```yaml +# deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: ai-middleware + labels: + app: ai-middleware +spec: + replicas: 3 + selector: + matchLabels: + app: ai-middleware + template: + metadata: + labels: + app: ai-middleware + spec: + containers: + - name: ai-middleware + image: ai-middleware:latest + ports: + - containerPort: 8080 + env: + - name: ENVIRONMENT + value: "PRODUCTION" + - name: MONGODB_CONNECTION_URI + valueFrom: + secretKeyRef: + name: ai-middleware-secrets + key: mongodb-uri + - name: REDIS_URI + valueFrom: + secretKeyRef: + name: ai-middleware-secrets + key: redis-uri + - name: OPENAI_API_KEY + valueFrom: + secretKeyRef: + name: ai-middleware-secrets + key: openai-key + resources: + requests: + memory: "512Mi" + cpu: "250m" + limits: + memory: "1Gi" + cpu: "500m" + livenessProbe: + httpGet: + path: /healthcheck + port: 8080 + initialDelaySeconds: 30 + periodSeconds: 10 + readinessProbe: + httpGet: + path: /healthcheck + port: 8080 + initialDelaySeconds: 5 + periodSeconds: 5 +``` + +### Queue Worker Deployment + +```yaml +# worker-deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: ai-middleware-worker + labels: + app: ai-middleware-worker +spec: + replicas: 2 + selector: + matchLabels: + app: ai-middleware-worker + template: + metadata: + labels: + app: ai-middleware-worker + spec: + containers: + - name: worker + image: ai-middleware:latest + command: ["python", "-c"] + args: ["from 
src.services.commonServices.queueService.queueService import queue_obj; import asyncio; asyncio.run(queue_obj.consume_messages())"] + env: + - name: CONSUMER_STATUS + value: "true" + - name: QUEUE_CONNECTIONURL + valueFrom: + secretKeyRef: + name: ai-middleware-secrets + key: rabbitmq-uri + resources: + requests: + memory: "256Mi" + cpu: "100m" + limits: + memory: "512Mi" + cpu: "200m" +``` + +## Service Configuration + +### Service and Ingress + +```yaml +# service.yaml +apiVersion: v1 +kind: Service +metadata: + name: ai-middleware-service +spec: + selector: + app: ai-middleware + ports: + - protocol: TCP + port: 80 + targetPort: 8080 + type: ClusterIP + +--- +# ingress.yaml +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: ai-middleware-ingress + annotations: + nginx.ingress.kubernetes.io/ssl-redirect: "true" + nginx.ingress.kubernetes.io/proxy-body-size: "50m" + nginx.ingress.kubernetes.io/proxy-read-timeout: "300" + nginx.ingress.kubernetes.io/proxy-send-timeout: "300" +spec: + tls: + - hosts: + - api.yourdomain.com + secretName: tls-secret + rules: + - host: api.yourdomain.com + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: ai-middleware-service + port: + number: 80 +``` + +## Database Infrastructure + +### MongoDB Cluster Configuration + +```yaml +# mongodb-cluster.yaml +apiVersion: mongodbcommunity.mongodb.com/v1 +kind: MongoDBCommunity +metadata: + name: ai-middleware-mongodb +spec: + members: 3 + type: ReplicaSet + version: "7.0.0" + security: + authentication: + modes: ["SCRAM"] + users: + - name: ai-middleware-user + db: admin + passwordSecretRef: + name: mongodb-secret + roles: + - name: readWriteAnyDatabase + db: admin + additionalMongodConfig: + storage.wiredTiger.engineConfig.journalCompressor: zlib + storage.wiredTiger.collectionConfig.blockCompressor: snappy +``` + +### PostgreSQL/TimescaleDB Configuration + +```yaml +# postgres-deployment.yaml +apiVersion: apps/v1 +kind: StatefulSet +metadata: + 
name: postgres +spec: + serviceName: postgres + replicas: 1 + selector: + matchLabels: + app: postgres + template: + metadata: + labels: + app: postgres + spec: + containers: + - name: postgres + image: timescale/timescaledb:latest-pg14 + env: + - name: POSTGRES_DB + value: ai_middleware + - name: POSTGRES_USER + valueFrom: + secretKeyRef: + name: postgres-secret + key: username + - name: POSTGRES_PASSWORD + valueFrom: + secretKeyRef: + name: postgres-secret + key: password + ports: + - containerPort: 5432 + volumeMounts: + - name: postgres-storage + mountPath: /var/lib/postgresql/data + volumeClaimTemplates: + - metadata: + name: postgres-storage + spec: + accessModes: ["ReadWriteOnce"] + resources: + requests: + storage: 100Gi +``` + +### Redis Cluster Configuration + +```yaml +# redis-cluster.yaml +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: redis +spec: + serviceName: redis + replicas: 3 + selector: + matchLabels: + app: redis + template: + metadata: + labels: + app: redis + spec: + containers: + - name: redis + image: redis:7-alpine + command: + - redis-server + - --cluster-enabled + - "yes" + - --cluster-config-file + - nodes.conf + - --cluster-node-timeout + - "5000" + - --appendonly + - "yes" + ports: + - containerPort: 6379 + - containerPort: 16379 + volumeMounts: + - name: redis-data + mountPath: /data + volumeClaimTemplates: + - metadata: + name: redis-data + spec: + accessModes: ["ReadWriteOnce"] + resources: + requests: + storage: 10Gi +``` + +## Monitoring and Observability + +### Prometheus Configuration + +```yaml +# prometheus-config.yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: prometheus-config +data: + prometheus.yml: | + global: + scrape_interval: 15s + + scrape_configs: + - job_name: 'ai-middleware' + static_configs: + - targets: ['ai-middleware-service:80'] + metrics_path: '/metrics' + + - job_name: 'postgres' + static_configs: + - targets: ['postgres:5432'] + + - job_name: 'mongodb' + static_configs: + - targets: 
['ai-middleware-mongodb:27017'] + + - job_name: 'redis' + static_configs: + - targets: ['redis:6379'] +``` + +### Grafana Dashboard Configuration + +```json +{ + "dashboard": { + "title": "AI Middleware Metrics", + "panels": [ + { + "title": "Request Rate", + "type": "graph", + "targets": [ + { + "expr": "rate(http_requests_total[5m])", + "legendFormat": "{{method}} {{endpoint}}" + } + ] + }, + { + "title": "Response Times", + "type": "graph", + "targets": [ + { + "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))", + "legendFormat": "95th percentile" + } + ] + }, + { + "title": "AI Provider Usage", + "type": "pie", + "targets": [ + { + "expr": "sum by (provider) (ai_requests_total)", + "legendFormat": "{{provider}}" + } + ] + } + ] + } +} +``` + +## Environment Configurations + +### Development Environment +```yaml +# docker-compose.dev.yml +version: '3.8' +services: + app: + build: . + ports: + - "8080:8080" + environment: + - ENVIRONMENT=LOCAL + - MONGODB_CONNECTION_URI=mongodb://mongo:27017 + - REDIS_URI=redis://redis:6379 + depends_on: + - mongo + - redis + - postgres + + mongo: + image: mongo:7 + ports: + - "27017:27017" + volumes: + - mongo_data:/data/db + + redis: + image: redis:7-alpine + ports: + - "6379:6379" + + postgres: + image: timescale/timescaledb:latest-pg14 + environment: + POSTGRES_DB: ai_middleware + POSTGRES_USER: admin + POSTGRES_PASSWORD: password + ports: + - "5432:5432" + volumes: + - postgres_data:/var/lib/postgresql/data + +volumes: + mongo_data: + postgres_data: +``` + +### Production Environment Variables +```bash +# Production .env template +ENVIRONMENT=PRODUCTION +PORT=8080 + +# Database Connections +MONGODB_CONNECTION_URI=mongodb+srv://user:pass@cluster.mongodb.net/ai_middleware +TIMESCALE_SERVICE_URL=postgresql://user:pass@timescale.host:5432/ai_middleware +REDIS_URI=redis://redis.host:6379 + +# AI Provider Keys (Encrypted) +OPENAI_API_KEY=encrypted_key +ANTHROPIC_API_KEY=encrypted_key
+GOOGLE_API_KEY=encrypted_key + +# Queue Configuration +QUEUE_CONNECTIONURL=amqp://user:pass@rabbitmq.host:5672 +QUEUE_NAME=ai-middleware-queue +CONSUMER_STATUS=true +PREFETCH_COUNT=10 + +# Security +JWT_TOKEN_SECRET=jwt_secret +ENCRYPTION_KEY=encryption_key + +# Monitoring +ATATUS_LICENSE_KEY=atatus_key + +# Performance +MAX_WORKERS=4 +``` + +## Scaling and Performance + +### Horizontal Pod Autoscaler +```yaml +# hpa.yaml +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: ai-middleware-hpa +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: ai-middleware + minReplicas: 3 + maxReplicas: 20 + metrics: + - type: Resource + resource: + name: cpu + target: + type: Utilization + averageUtilization: 70 + - type: Resource + resource: + name: memory + target: + type: Utilization + averageUtilization: 80 +``` + +### Resource Requirements + +| Component | CPU Request | CPU Limit | Memory Request | Memory Limit | Storage | +|-----------|-------------|-----------|----------------|--------------|---------| +| FastAPI App | 250m | 500m | 512Mi | 1Gi | - | +| Queue Worker | 100m | 200m | 256Mi | 512Mi | - | +| MongoDB | 500m | 1000m | 1Gi | 2Gi | 100Gi | +| PostgreSQL | 500m | 1000m | 1Gi | 2Gi | 100Gi | +| Redis | 100m | 200m | 256Mi | 512Mi | 10Gi | + +## Security Considerations + +### Network Security +- TLS termination at load balancer +- Internal service mesh with mTLS +- Network policies for pod-to-pod communication +- VPC/subnet isolation + +### Application Security +- Secret management with Kubernetes secrets +- API key encryption at rest +- JWT token validation +- Rate limiting and DDoS protection + +### Database Security +- Database connection encryption +- Role-based access control +- Regular security updates +- Backup encryption + +This infrastructure provides high availability, scalability, and security for the AI Middleware system while maintaining operational efficiency and cost optimization. 
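The Horizontal Pod Autoscaler manifest above (3 to 20 replicas, 70% CPU target) can be reasoned about with the scaling rule documented for Kubernetes: desired replicas = ceil(current replicas × current metric / target metric). The following is a minimal sketch of that rule for a single metric, not the autoscaler's actual implementation. The function name `desired_replicas` is hypothetical, and the 0.1 tolerance is assumed to match the controller's default setting.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 3,
                     max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA scaling rule for one metric (illustrative only)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        wanted = current_replicas
    else:
        wanted = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 3 pods at 140% average CPU against a 70% target
    print(desired_replicas(3, 140, 70))
    # 15 pods at 140% would want 30, but maxReplicas caps the result
    print(desired_replicas(15, 140, 70))
```

When both the CPU and memory metrics are configured, as in the manifest, the autoscaler evaluates each metric separately and uses the larger replica count.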
\ No newline at end of file diff --git a/docs/architecture/system-architecture.md b/docs/architecture/system-architecture.md new file mode 100644 index 00000000..908e1530 --- /dev/null +++ b/docs/architecture/system-architecture.md @@ -0,0 +1,218 @@ +# AI Middleware System Architecture + +## Overview + +This AI middleware system is built with FastAPI and serves as a unified interface for multiple AI service providers. It provides features like request routing, authentication, rate limiting, caching, queue management, and comprehensive monitoring. + +## High-Level System Architecture + +```mermaid +graph TB + %% External Systems + Client[Client Applications] + AIProviders[AI Service Providers
OpenAI, Anthropic, Groq
Gemini, Mistral, OpenRouter] + + %% Load Balancer + LB[Load Balancer] + + %% Application Layer + subgraph "FastAPI Application" + App[FastAPI App
index.py] + Middleware[Middleware Layer
CORS, Atatus, Auth] + Routes[Route Layer
API Endpoints] + end + + %% Controller Layer + subgraph "Controller Layer" + ModelCtrl[Model Controller] + BridgeCtrl[Bridge Controller] + ConfigCtrl[Config Controller] + ChatbotCtrl[Chatbot Controller] + APICallCtrl[API Call Controller] + end + + %% Service Layer + subgraph "Service Layer" + BaseService[Base Service
Core AI Logic] + CommonServices[Common Services] + QueueService[Queue Service
Background Processing] + CacheService[Cache Service
Redis Integration] + RAGService[RAG Service
Document Processing] + end + + %% AI Provider Services + subgraph "AI Provider Services" + OpenAIService[OpenAI Service] + AnthropicService[Anthropic Service] + GroqService[Groq Service] + GeminiService[Gemini Service] + MistralService[Mistral Service] + OpenRouterService[OpenRouter Service] + end + + %% Database Layer + subgraph "Database Layer" + MongoDB[(MongoDB
Primary Database)] + TimescaleDB[(TimescaleDB
Time-series Data)] + Redis[(Redis
Cache & Sessions)] + end + + %% Message Queue + RabbitMQ[RabbitMQ
Message Queue] + + %% Monitoring + subgraph "Monitoring & Logging" + Atatus[Atatus Monitoring] + Logger[Custom Logger] + Metrics[Metrics Service] + end + + %% Data Flow + Client --> LB + LB --> App + App --> Middleware + Middleware --> Routes + Routes --> ModelCtrl + Routes --> BridgeCtrl + Routes --> ConfigCtrl + Routes --> ChatbotCtrl + Routes --> APICallCtrl + + ModelCtrl --> BaseService + BridgeCtrl --> BaseService + ChatbotCtrl --> CommonServices + + BaseService --> OpenAIService + BaseService --> AnthropicService + BaseService --> GroqService + BaseService --> GeminiService + BaseService --> MistralService + BaseService --> OpenRouterService + + BaseService --> QueueService + BaseService --> CacheService + CommonServices --> RAGService + + QueueService --> RabbitMQ + CacheService --> Redis + + OpenAIService --> AIProviders + AnthropicService --> AIProviders + GroqService --> AIProviders + GeminiService --> AIProviders + MistralService --> AIProviders + OpenRouterService --> AIProviders + + BaseService --> MongoDB + BaseService --> TimescaleDB + CommonServices --> MongoDB + + App --> Atatus + BaseService --> Logger + BaseService --> Metrics + Metrics --> TimescaleDB +``` + +## Component Details + +### 1. FastAPI Application Layer +- **Entry Point**: [`index.py`](../../index.py) +- **Responsibilities**: + - Application lifecycle management + - Middleware configuration + - Route registration + - Health checks + - Background task initialization + +### 2. Middleware Layer +- **Authentication**: JWT-based authentication +- **Rate Limiting**: Request throttling per user/endpoint +- **CORS**: Cross-origin resource sharing +- **Monitoring**: Atatus integration for APM + +### 3. Controller Layer +- **Model Controller**: Handles AI model requests +- **Bridge Controller**: Manages bridge configurations +- **Config Controller**: Configuration management +- **Chatbot Controller**: Chatbot-specific operations + +### 4. 
Service Layer +- **Base Service**: Core abstraction for AI provider interactions +- **Common Services**: Shared business logic +- **Queue Service**: Asynchronous task processing +- **Cache Service**: Redis-based caching +- **RAG Service**: Retrieval-Augmented Generation + +### 5. AI Provider Integration +- **OpenAI**: GPT models and embeddings +- **Anthropic**: Claude models +- **Google Gemini**: Gemini models +- **Groq**: High-speed inference +- **Mistral**: Mistral models +- **OpenRouter**: Multi-provider routing + +### 6. Database Systems +- **MongoDB**: Primary database for configurations, conversations, and metadata +- **TimescaleDB**: Time-series data for metrics and analytics +- **Redis**: Caching and session management + +### 7. Message Queue System +- **RabbitMQ**: Asynchronous message processing +- **Background Tasks**: Long-running operations +- **Queue Management**: Load balancing and retry mechanisms + +## Key Features + +### Authentication & Authorization +- JWT token validation +- API key management +- Rate limiting per user/organization + +### Multi-Provider Support +- Unified interface for multiple AI providers +- Automatic failover and load balancing +- Provider-specific optimizations + +### Caching Strategy +- Redis-based response caching +- TTL-based cache invalidation +- Batch operation support + +### Queue Processing +- Background task execution +- Message persistence +- Consumer scaling + +### Monitoring & Observability +- Real-time performance monitoring +- Error tracking and alerting +- Comprehensive logging +- Metrics collection and analysis + +### Configuration Management +- Dynamic configuration updates +- MongoDB change stream listeners +- Environment-based settings + +## Data Flow Patterns + +### Synchronous Request Flow +1. Client request → Load Balancer +2. FastAPI app → Middleware validation +3. Route → Controller → Service +4. AI Provider call → Response processing +5.
Caching → Client response + +### Asynchronous Processing Flow +1. Request → Queue Service +2. Background worker → Message processing +3. AI Provider interaction +4. Result storage → Notification + +### Configuration Update Flow +1. MongoDB change detection +2. Change stream listener activation +3. Configuration refresh +4. Service reconfiguration + +This architecture ensures scalability, reliability, and maintainability while providing a unified interface for multiple AI service providers. \ No newline at end of file diff --git a/docs/database/database-schema.md b/docs/database/database-schema.md new file mode 100644 index 00000000..27b987f0 --- /dev/null +++ b/docs/database/database-schema.md @@ -0,0 +1,396 @@ +# Database Schema and Relationships + +## Overview + +The AI Middleware system uses a multi-database architecture with four primary data stores, each optimized for specific use cases: + +- **MongoDB**: Document storage for configurations, dynamic data, and collections +- **PostgreSQL**: Relational data for conversations and structured records +- **TimescaleDB**: Time-series data for metrics and analytics +- **Redis**: Caching and session management + +## Database Architecture Diagram + +```mermaid +graph TB + subgraph "Application Layer" + Services[Application Services] + Controllers[Controllers] + end + + subgraph "MongoDB Cluster" + ModelConfigs[(modelconfigurations)] + Configurations[(configurations)] + APICallCollection[(apicalls)] + ChatbotCollection[(chatbots)] + TemplateCollection[(templates)] + end + + subgraph "PostgreSQL Database" + Conversations[(conversations)] + RawData[(raw_data)] + SystemPrompts[(system_prompt_versionings)] + ConfigHistory[(user_bridge_config_history)] + end + + subgraph "TimescaleDB" + MetricsRaw[(metrics_raw_data)] + end + + subgraph "Redis Cache" + ResponseCache[(Response Cache)] + SessionData[(Session Data)] + RateLimitData[(Rate Limit Data)] + end + + %% Relationships + Services --> ModelConfigs + Services -->
Configurations + Services --> Conversations + Services --> RawData + Services --> MetricsRaw + Services --> ResponseCache + + %% Key relationships (flowchart edges; erDiagram cardinality syntax is not valid in a graph) + Conversations -->|chat_id| RawData + SystemPrompts -->|bridge_id| Configurations + ConfigHistory -->|bridge_id| Configurations + + %% Data Flow + Controllers --> Services + Services --> SessionData + Services --> RateLimitData +``` + +## Detailed Schema Definitions + +### MongoDB Collections + +#### Model Configurations Collection +```javascript +// Collection: modelconfigurations +{ + _id: ObjectId, + service: String, // AI service provider + model: String, // Model identifier + configuration: Object, // Model-specific config + apikey: String, // Encrypted API key + org_id: String, // Organization identifier + created_at: Date, + updated_at: Date, + is_active: Boolean +} + +// Indexes +db.modelconfigurations.createIndex({org_id: 1, service: 1}) +db.modelconfigurations.createIndex({service: 1, model: 1}) +``` + +#### Configurations Collection +```javascript +// Collection: configurations +{ + _id: ObjectId, + org_id: String, // Organization ID + service: String, // AI service provider + bridgeType: String, // 'api' | 'chatbot' + name: String, // Configuration name + slugName: String, // Unique slug + configuration: Object, // AI model configuration + apikey: String, // API key reference + api_call: Object, // API call configuration + api_endpoints: Array, // Available endpoints + is_api_call: Boolean, // API call flag + responseIds: Array, // Response references + defaultQuestions: Array, // Default questions + actions: Object, // Available actions + created_at: Date, + updated_at: Date +} + +// Unique Index +db.configurations.createIndex({org_id: 1, slugName: 1}, {unique: true}) +``` + +#### Chatbot Collection +```javascript +// Collection: chatbots +{ + _id: ObjectId, + config: { + buttonName: String, + height: String, + heightUnit: String, + width: String, + widthUnit: String, + type: String, // 'popup' | 'embedded' +
themeColor: String + }, + orgId: String, + title: String, + createdBy: String, + type: String, // 'chatbot' + updatedBy: String, + bridge: Array, // References to configurations + created_at: Date, + updated_at: Date +} +``` + +### PostgreSQL Tables + +#### Conversations Table +```sql +CREATE TABLE conversations ( + id SERIAL PRIMARY KEY, + org_id VARCHAR, + thread_id VARCHAR, + sub_thread_id VARCHAR, + model_name VARCHAR, + bridge_id VARCHAR, + version_id VARCHAR, + message TEXT, + message_by VARCHAR, + function JSON, + type conversation_type_enum NOT NULL, + "createdAt" TIMESTAMP DEFAULT NOW(), + "updatedAt" TIMESTAMP DEFAULT NOW(), + chatbot_message TEXT, + is_reset BOOLEAN DEFAULT FALSE, + tools_call_data JSON[], + user_feedback INTEGER, + message_id UUID, + revised_prompt TEXT, + image_url TEXT, + urls VARCHAR[], + "AiConfig" JSON, + annotations JSON[] +); + +-- Indexes +CREATE INDEX idx_conversations_org_id ON conversations(org_id); +CREATE INDEX idx_conversations_thread_id ON conversations(thread_id); +CREATE INDEX idx_conversations_bridge_id ON conversations(bridge_id); +CREATE INDEX idx_conversations_created_at ON conversations("createdAt"); +``` + +#### Raw Data Table +```sql +CREATE TABLE raw_data ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + org_id VARCHAR, + authkey_name VARCHAR, + latency FLOAT, + service VARCHAR, + status BOOLEAN NOT NULL, + error TEXT DEFAULT 'none', + model VARCHAR, + input_tokens FLOAT, + output_tokens FLOAT, + expected_cost FLOAT, + created_at TIMESTAMP DEFAULT NOW(), + chat_id INTEGER REFERENCES conversations(id), + message_id UUID, + variables JSON, + is_present BOOLEAN DEFAULT FALSE, + "firstAttemptError" TEXT +); + +-- Indexes +CREATE INDEX idx_raw_data_org_id ON raw_data(org_id); +CREATE INDEX idx_raw_data_chat_id ON raw_data(chat_id); +CREATE INDEX idx_raw_data_created_at ON raw_data(created_at); +CREATE INDEX idx_raw_data_service ON raw_data(service); +``` + +#### System Prompt Versioning Table +```sql +CREATE TABLE 
system_prompt_versionings ( + id SERIAL PRIMARY KEY, + created_at TIMESTAMP NOT NULL DEFAULT NOW(), + updated_at TIMESTAMP NOT NULL DEFAULT NOW(), + system_prompt TEXT NOT NULL, + bridge_id VARCHAR NOT NULL, + org_id VARCHAR NOT NULL +); + +-- Indexes +CREATE INDEX idx_system_prompts_bridge_id ON system_prompt_versionings(bridge_id); +CREATE INDEX idx_system_prompts_org_id ON system_prompt_versionings(org_id); +``` + +#### User Bridge Config History Table +```sql +CREATE TABLE user_bridge_config_history ( + id SERIAL PRIMARY KEY, + user_id INTEGER NOT NULL, + org_id VARCHAR NOT NULL, + bridge_id VARCHAR NOT NULL, + type VARCHAR NOT NULL, + time TIMESTAMP NOT NULL DEFAULT NOW(), + version_id VARCHAR DEFAULT '' +); + +-- Indexes +CREATE INDEX idx_config_history_user_id ON user_bridge_config_history(user_id); +CREATE INDEX idx_config_history_bridge_id ON user_bridge_config_history(bridge_id); +``` + +### TimescaleDB Hypertables + +#### Metrics Raw Data Table +```sql +CREATE TABLE metrics_raw_data ( + id SERIAL, + org_id VARCHAR, + bridge_id VARCHAR, + version_id VARCHAR, + thread_id VARCHAR, + model VARCHAR, + service VARCHAR, + input_tokens FLOAT, + output_tokens FLOAT, + total_tokens FLOAT, + apikey_id VARCHAR, + created_at TIMESTAMP NOT NULL DEFAULT NOW(), + latency FLOAT, + success BOOLEAN, + cost FLOAT, + time_zone VARCHAR, + -- TimescaleDB requires unique constraints to include the partitioning column + PRIMARY KEY (id, created_at) +); + +-- Convert to hypertable for time-series optimization +SELECT create_hypertable('metrics_raw_data', 'created_at'); + +-- Indexes for efficient querying +CREATE INDEX idx_metrics_org_id_time ON metrics_raw_data(org_id, created_at); +CREATE INDEX idx_metrics_service_time ON metrics_raw_data(service, created_at); +CREATE INDEX idx_metrics_bridge_id_time ON metrics_raw_data(bridge_id, created_at); +``` + +### Redis Data Structures + +#### Cache Keys Structure +``` +# Response Cache +ai_response:{org_id}:{bridge_id}:{hash} -> JSON response +TTL: 3600 seconds (configurable) + +# Rate Limiting +rate_limit:{org_id}:{endpoint} -> counter
+rate_limit:{thread_id} -> counter +TTL: Based on rate limit window + +# Session Data +session:{user_id} -> user session data +TTL: 86400 seconds (24 hours) + +# Configuration Cache +config_cache:{org_id}:{bridge_id} -> configuration JSON +TTL: 1800 seconds (30 minutes) +``` + +## Entity Relationships + +### Primary Relationships + +```mermaid +erDiagram + ORGANIZATIONS ||--o{ CONFIGURATIONS : has + ORGANIZATIONS ||--o{ CONVERSATIONS : owns + ORGANIZATIONS ||--o{ METRICS : generates + + CONFIGURATIONS ||--o{ CONVERSATIONS : processes + CONFIGURATIONS ||--o{ SYSTEM_PROMPTS : versions + CONFIGURATIONS ||--o{ CONFIG_HISTORY : tracks + + CONVERSATIONS ||--o{ RAW_DATA : produces + CONVERSATIONS }|--|| USERS : initiated_by + + RAW_DATA }|--|| CONVERSATIONS : belongs_to + RAW_DATA ||--o{ METRICS : aggregates_to + + USERS ||--o{ CONFIG_HISTORY : modifies + + ORGANIZATIONS { + string org_id PK + string name + string created_at + } + + CONFIGURATIONS { + objectid _id PK + string org_id FK + string bridge_id UK + string slugName UK + object configuration + string service + string bridgeType + } + + CONVERSATIONS { + int id PK + string org_id FK + string bridge_id FK + string thread_id + text message + timestamp created_at + } + + RAW_DATA { + uuid id PK + int chat_id FK + string org_id FK + float latency + boolean status + timestamp created_at + } + + METRICS { + int id PK + string org_id FK + string bridge_id FK + float input_tokens + float output_tokens + timestamp created_at + } +``` + +## Data Flow Patterns + +### 1. Request Processing Flow +``` +Client Request → Authentication → Configuration Lookup (MongoDB) +→ AI Provider Call → Response Storage (PostgreSQL) +→ Metrics Recording (TimescaleDB) → Cache Update (Redis) +``` + +### 2. Analytics Data Flow +``` +Raw API Calls (PostgreSQL) → Aggregation Service +→ Metrics Storage (TimescaleDB) → Dashboard Queries +``` + +### 3.
Configuration Management Flow +``` +Configuration Update (MongoDB) → Change Stream Trigger +→ Cache Invalidation (Redis) → Service Refresh +``` + +## Performance Optimizations + +### Indexing Strategy +- **MongoDB**: Compound indexes on frequently queried fields +- **PostgreSQL**: B-tree indexes on foreign keys and timestamp columns +- **TimescaleDB**: Time-based partitioning with composite indexes +- **Redis**: Key expiration and memory optimization + +### Partitioning Strategy +- **TimescaleDB**: Automatic time-based partitioning (7-day chunks by default; set `chunk_time_interval` for 1-day chunks) +- **PostgreSQL**: Consider partitioning large tables by org_id or date + +### Caching Strategy +- **Application Level**: Redis for frequently accessed data +- **Database Level**: Connection pooling and query optimization +- **CDN Level**: Static content and API response caching + +This multi-database architecture provides optimal performance for different data access patterns while maintaining data consistency and enabling horizontal scaling. \ No newline at end of file diff --git a/docs/quick-reference/architecture-summary.md b/docs/quick-reference/architecture-summary.md new file mode 100644 index 00000000..d4470da1 --- /dev/null +++ b/docs/quick-reference/architecture-summary.md @@ -0,0 +1,195 @@ +# Architecture Quick Reference + +## System Overview +**AI Middleware System** - A FastAPI-based service providing unified access to multiple AI providers with advanced features like caching, queuing, authentication, and monitoring.
+ +## Key Components + +### 🏗️ Application Architecture +``` +Client → Load Balancer → FastAPI App → Base Service → AI Providers + ↓ + Queue/Cache/Database +``` + +### 🎯 Core Services +- **Base Service**: Central AI abstraction layer +- **Queue Service**: RabbitMQ-based async processing +- **Cache Service**: Redis-based response caching +- **RAG Service**: Document processing and retrieval + +### 🔌 AI Provider Integration +| Provider | Models | Features | +|----------|--------|----------| +| OpenAI | GPT-4, GPT-3.5, DALL-E | Chat, Embeddings, Images | +| Anthropic | Claude | Advanced reasoning | +| Google | Gemini | Multimodal capabilities | +| Groq | Various | High-speed inference | +| Mistral | Mistral models | European AI provider | +| OpenRouter | Multiple | Model routing | + +### 🗄️ Database Architecture +- **MongoDB**: Configurations, conversations, metadata +- **PostgreSQL/TimescaleDB**: Structured data, time-series metrics +- **Redis**: Caching, sessions, rate limiting + +## API Structure + +### Main Endpoints +``` +/api/v2/model/* - AI model operations +/chatbot/* - Chatbot interactions +/bridge/* - Bridge management +/api/v1/config/* - Configuration management +/functions/* - Function calls +/rag/* - RAG operations +``` + +### Authentication Flow +``` +Request → JWT Validation → Rate Limiting → Authorization → Processing +``` + +## Deployment Architecture + +### Container Stack +- **Application**: FastAPI + Gunicorn + Uvicorn +- **Workers**: Background queue processors +- **Databases**: MongoDB, PostgreSQL, Redis clusters +- **Message Queue**: RabbitMQ cluster +- **Monitoring**: Prometheus, Grafana, Atatus + +### Kubernetes Resources +- **Deployments**: Application pods with HPA +- **StatefulSets**: Database clusters +- **Services**: Internal load balancing +- **Ingress**: External traffic routing +- **ConfigMaps/Secrets**: Configuration management + +## Performance Characteristics + +### Throughput +- **Peak Load**:
10,000+ requests/minute +- **Response Time**: <500ms (cached), <2s (uncached) +- **Concurrency**: 1,000+ simultaneous connections + +### Scaling +- **Horizontal**: Auto-scaling based on CPU/memory +- **Database**: Read replicas and connection pooling +- **Cache**: Redis cluster with sharding + +## Security Framework + +### Access Control +- **JWT Authentication**: Stateless token validation +- **Rate Limiting**: Per-user, per-thread, per-organization +- **API Keys**: Service-to-service authentication + +### Data Protection +- **Encryption**: TLS 1.3 in transit, database encryption at rest +- **Key Management**: Kubernetes secrets with rotation +- **Network Security**: VPC isolation, network policies + +## Monitoring Stack + +### Metrics +- **Application**: Request rates, response times, errors +- **Infrastructure**: CPU, memory, disk, network +- **Business**: AI usage, token consumption, costs + +### Observability +- **Logging**: Structured JSON logs with correlation IDs +- **Tracing**: Distributed tracing for request flows +- **Alerting**: Multi-channel notifications for incidents + +## Development Workflow + +### Local Development +```bash +# Start dependencies +docker-compose -f docker-compose.dev.yml up -d + +# Run application +uvicorn index:app --reload --port 8080 +``` + +### Production Deployment +```bash +# Build and deploy +docker build -t ai-middleware:latest . +kubectl apply -f k8s/ +``` + +## Common Patterns + +### Request Processing +1. Authentication & authorization +2. Configuration lookup +3. Request formatting per provider +4. AI provider call with retry logic +5. Response standardization +6. Caching and metrics recording + +### Error Handling +1. Service-level error catching +2. Standardized error formatting +3. Metrics recording +4. Webhook notifications +5. Client error response + +### Function Calling +1. Tool validation +2. Function execution +3. Response formatting +4. Follow-up AI calls +5. 
Result aggregation + +## Troubleshooting Quick Reference + +### Common Issues +- **High Response Times**: Check cache hit rates, database connections +- **Authentication Failures**: Verify JWT configuration, API keys +- **Queue Backlog**: Monitor consumer health, scaling settings +- **Database Connections**: Check connection pool settings, replica health + +### Health Checks +- **Application**: `GET /healthcheck` +- **Database**: Connection pool status +- **Queue**: Consumer status and message counts +- **Cache**: Redis cluster health + +### Performance Tuning +- **App Level**: Worker count, connection pooling +- **Database**: Index optimization, query performance +- **Cache**: TTL settings, memory optimization +- **Network**: Connection keep-alive, timeout settings + +## Configuration Quick Reference + +### Environment Variables +```bash +# Core Settings +ENVIRONMENT=PRODUCTION +PORT=8080 + +# Database URLs +MONGODB_CONNECTION_URI=mongodb+srv://... +TIMESCALE_SERVICE_URL=postgresql://... +REDIS_URI=redis://... + +# AI Provider Keys +OPENAI_API_KEY=sk-... +ANTHROPIC_API_KEY=... + +# Queue Settings +QUEUE_CONNECTIONURL=amqp://... +CONSUMER_STATUS=true +``` + +### Key Configuration Files +- **index.py**: Application entry point and lifecycle +- **config.py**: Environment variable configuration +- **Dockerfile**: Container definition +- **k8s/**: Kubernetes deployment manifests + +This quick reference provides essential information for developers, operators, and stakeholders working with the AI Middleware System. \ No newline at end of file