LLM Service
An LLM-powered impact analysis and report generation service for the Rippler system. This service leverages Large Language Models (OpenAI GPT-4o-mini, Anthropic Claude, and local models via Ollama) to generate comprehensive impact analysis reports for code changes.
Architecture Diagram
The following diagram illustrates the internal architecture of the LLM Service, showing the multi-provider LLM integration, automatic fallback mechanism, request validation, prompt building, and response parsing.
To view and edit the architecture diagram:
- Open /docs/architecture/services/llm-service.drawio in diagrams.net or VS Code with the Draw.io extension
- The diagram shows the complete LLM service architecture including FastAPI endpoints, provider routing, fallback management, caching, rate limiting, and integration with OpenAI, Anthropic, and Ollama
- After making changes, export to HTML and copy to /website/static/architecture/services/llm-service.drawio.html
Overview
The LLM Service is the intelligence engine of Rippler, using advanced AI models to analyze code changes and predict their impact across the microservice architecture.
Features
- Multi-Provider LLM Integration: Support for OpenAI GPT-4o-mini, Anthropic Claude, and local models (Ollama)
- Automatic Fallback: Seamlessly falls back to local models when remote APIs are unavailable
- Impact Analysis: Generate detailed analysis of code changes and their downstream impact
- Risk Scoring: Automated risk/impact scoring (high, medium, low)
- Structured Reports: JSON-formatted reports with natural language explanations
- Rate Limiting: Graceful handling of API rate limits with retry logic
- Performance: Sub-10-second response time for typical PRs
- Cost Optimization: Streaming and caching support
Technology Stack
- Python 3.11+: Modern Python runtime
- FastAPI: High-performance async web framework
- OpenAI SDK: GPT-4o-mini integration
- Anthropic SDK: Claude integration
- Ollama: Local LLM inference
- Pydantic: Data validation and serialization
API Endpoints
POST /api/v1/analyze
Accepts structured diff/change data and returns a comprehensive impact analysis report.
Request Body:
{
"repository": {
"name": "my-repo",
"owner": "my-org",
"url": "https://github.com/my-org/my-repo"
},
"pull_request": {
"number": 123,
"title": "Add new feature",
"description": "This PR adds a new feature...",
"author": "developer",
"branch": "feature/new-feature",
"base_branch": "main"
},
"changes": [
{
"file": "src/service.py",
"type": "modified",
"additions": 50,
"deletions": 10,
"diff": "diff content..."
}
],
"dependencies": {
"direct": ["service-a", "service-b"],
"transitive": ["service-c", "service-d"]
},
"mode": "production"
}
Response:
{
"summary": {
"text": "Natural language summary...",
"confidence": 0.95
},
"changes_analysis": {
"description": "Detailed analysis...",
"confidence": 0.90
},
"affected_services": [
{
"name": "service-a",
"impact_level": "high",
"reason": "Direct dependency with breaking changes",
"confidence": 0.85
}
],
"risk_assessment": {
"overall_risk": "medium",
"score": 0.65,
"factors": ["Breaking API changes", "Multiple services affected"],
"confidence": 0.88
},
"stakeholders": [
{
"name": "Team A",
"role": "owner",
"notification_priority": "high",
"reason": "Owns affected service"
}
],
"recommendations": [
{
"priority": "high",
"action": "Add integration tests",
"rationale": "To verify compatibility..."
}
],
"metadata": {
"model": "gpt-4o-mini",
"processing_time_ms": 4500,
"tokens_used": 2500,
"used_fallback": false
}
}
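Callers that want typed access to this payload can mirror it with Pydantic models. The sketch below follows the field names in the example response; the class names are illustrative assumptions, not the service's actual models/response.py definitions.

```python
# Illustrative Pydantic models mirroring the example response above.
# Class names are assumptions; field names come from the JSON keys shown.
from pydantic import BaseModel

class Summary(BaseModel):
    text: str
    confidence: float

class AffectedService(BaseModel):
    name: str
    impact_level: str  # "high" | "medium" | "low"
    reason: str
    confidence: float

class RiskAssessment(BaseModel):
    overall_risk: str
    score: float
    factors: list[str]
    confidence: float

class Metadata(BaseModel):
    model: str
    processing_time_ms: int
    tokens_used: int
    used_fallback: bool

class AnalysisReport(BaseModel):
    summary: Summary
    affected_services: list[AffectedService]
    risk_assessment: RiskAssessment
    metadata: Metadata
    # changes_analysis, stakeholders and recommendations omitted for brevity

# Validate a raw response dict into a typed report:
# report = AnalysisReport.model_validate(response_json)
```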
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | LLM provider (openai, anthropic, or local) | openai |
| OPENAI_API_KEY | OpenAI API key | - |
| ANTHROPIC_API_KEY | Anthropic API key | - |
| OPENAI_MODEL | OpenAI model name | gpt-4o-mini |
| ANTHROPIC_MODEL | Anthropic model name | claude-3-haiku-20240307 |
| LOCAL_MODEL_ENABLED | Enable local model fallback | true |
| LOCAL_MODEL_TYPE | Local model type | ollama |
| LOCAL_MODEL_NAME | Ollama model name | codellama:7b |
| LOCAL_MODEL_BASE_URL | Ollama API URL | http://localhost:11434 |
| LOCAL_MODEL_GPU_LAYERS | GPU layers (-1 = auto, 0 = CPU only) | -1 |
| ENABLE_FALLBACK | Enable automatic fallback | true |
| HOST | Server host | 0.0.0.0 |
| PORT | Server port | 8000 |
| LOG_LEVEL | Logging level | INFO |
| MAX_RETRIES | Max retry attempts | 3 |
| TIMEOUT_SECONDS | Request timeout in seconds | 30 |
| ENABLE_CACHING | Enable result caching | true |
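Since the stack already includes Pydantic, these variables map naturally onto a settings class. The sketch below is a minimal illustration assuming the pydantic-settings package; it is not the service's actual core/config.py.

```python
# Illustrative settings loader for the variables above, assuming
# pydantic-settings is installed; not the service's actual core/config.py.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    llm_provider: str = "openai"              # openai | anthropic | local
    openai_api_key: str | None = None
    anthropic_api_key: str | None = None
    openai_model: str = "gpt-4o-mini"
    anthropic_model: str = "claude-3-haiku-20240307"
    local_model_enabled: bool = True
    local_model_type: str = "ollama"
    local_model_name: str = "codellama:7b"
    local_model_base_url: str = "http://localhost:11434"
    local_model_gpu_layers: int = -1          # -1 = auto, 0 = CPU only
    enable_fallback: bool = True
    host: str = "0.0.0.0"
    port: int = 8000
    log_level: str = "INFO"
    max_retries: int = 3
    timeout_seconds: int = 30
    enable_caching: bool = True

settings = Settings()  # field values are read from the environment (case-insensitive)
```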
Local Model Setup
The service supports local models through Ollama for offline inference with no per-token API costs.
Installing Ollama
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai/download
Pulling a Model
# Pull CodeLlama 7B (recommended for code analysis)
ollama pull codellama:7b
# Or use other code-focused models
ollama pull codellama:13b
ollama pull deepseek-coder:6.7b
Starting Ollama
# Ollama runs as a service on port 11434
ollama serve
# Verify it's running
curl http://localhost:11434/api/tags
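The same check can be done from Python, which is handy in startup probes. This is a small sketch using httpx against the /api/tags endpoint shown above; the base URL mirrors the LOCAL_MODEL_BASE_URL default.

```python
# Quick reachability check for the local Ollama server (same endpoint as the
# curl command above). Adjust the base URL if LOCAL_MODEL_BASE_URL differs.
import httpx

def ollama_available(base_url: str = "http://localhost:11434") -> bool:
    try:
        response = httpx.get(f"{base_url}/api/tags", timeout=2.0)
        response.raise_for_status()
        models = [m["name"] for m in response.json().get("models", [])]
        print(f"Ollama is up, available models: {models}")
        return True
    except httpx.HTTPError:
        return False
```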
GPU Acceleration
Enable GPU acceleration for faster inference:
# Enable GPU with automatic layer offloading (default)
export LOCAL_MODEL_GPU_LAYERS=-1
# Use CPU only
export LOCAL_MODEL_GPU_LAYERS=0
# For multi-GPU setups
export LOCAL_MODEL_MAIN_GPU=0
GPU Requirements:
- NVIDIA GPU with CUDA support (for best performance)
- AMD GPU with ROCm support (experimental)
- Apple Silicon (M1/M2/M3) uses Metal acceleration automatically
Benefits:
- 3-10x faster inference compared to CPU-only
- Lower latency for real-time analysis
- Better handling of larger models
Installation
# Clone the repository
git clone https://github.com/Citi-Rippler/llm-service.git
cd llm-service
# Install dependencies
pip install -r requirements.txt
# For development
pip install -r requirements-dev.txt
Running the Service
Using Python
# Set environment variables
export OPENAI_API_KEY=your-key-here
export LLM_PROVIDER=openai
# Run the service
python -m app.main
Using Uvicorn
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
Using Docker
docker build -t rippler-llm-service .
docker run -p 8000:8000 \
-e OPENAI_API_KEY=your-key \
-e LLM_PROVIDER=openai \
rippler-llm-service
Using Local Models
# Start Ollama
ollama serve
# Run the service with local provider
export LLM_PROVIDER=local
export LOCAL_MODEL_NAME=codellama:7b
python -m app.main
Fallback Configuration
When ENABLE_FALLBACK=true (default), the service automatically falls back to local models if:
- Remote API is unavailable (connection errors)
- Rate limits are exceeded
- Authentication fails
The fallback is transparent: the response simply includes used_fallback: true in the metadata.
Example fallback scenario:
# OpenAI API down or rate limited
# Service automatically uses local Ollama model
# Response includes: "metadata": {"model": "codellama:7b", "used_fallback": true}
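In outline, the fallback amounts to catching provider errors and re-running the prompt against Ollama. The sketch below is a simplified illustration, not the actual services/llm_service.py; call_remote_provider and call_local_model are hypothetical stand-ins for the real provider clients.

```python
# Simplified illustration of the remote-to-local fallback flow described above.
# call_remote_provider / call_local_model are hypothetical helpers, not the
# service's real provider clients.
import openai

async def generate_report(prompt: str, enable_fallback: bool = True) -> dict:
    try:
        return await call_remote_provider(prompt)   # hypothetical: OpenAI/Anthropic call
    except (openai.RateLimitError, openai.APIConnectionError,
            openai.AuthenticationError):
        if not enable_fallback:
            raise
        report = await call_local_model(prompt)     # hypothetical: Ollama call
        # Mark the response so callers can see the fallback path was taken
        report["metadata"]["used_fallback"] = True
        return report
```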
Integration
From Impact Analyzer
The Impact Analyzer sends structured data to the LLM service:
import httpx

async def analyze_pr(pr_data: dict) -> dict:
    """POST structured PR data to the LLM service and return the analysis report."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://llm-service:8000/api/v1/analyze",
            json=pr_data,
            timeout=30.0,
        )
        response.raise_for_status()  # surface HTTP errors to the caller
        return response.json()
Required Data Structure
The LLM service expects:
- Repository Information: Basic metadata about the repository
- Pull Request Details: PR number, title, description, author
- Code Changes: File-level diffs and statistics
- Dependency Information: Direct and transitive dependencies
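These inputs correspond to the request body shown earlier and can be expressed as Pydantic models. The sketch below mirrors the example's field names and is illustrative only, not the service's actual models/request.py.

```python
# Illustrative request models matching the structure above; field names come
# from the request example earlier, not from the actual models/request.py.
from pydantic import BaseModel

class Repository(BaseModel):
    name: str
    owner: str
    url: str

class PullRequest(BaseModel):
    number: int
    title: str
    description: str
    author: str
    branch: str
    base_branch: str

class FileChange(BaseModel):
    file: str
    type: str            # e.g. "modified"
    additions: int
    deletions: int
    diff: str

class Dependencies(BaseModel):
    direct: list[str]
    transitive: list[str]

class AnalyzeRequest(BaseModel):
    repository: Repository
    pull_request: PullRequest
    changes: list[FileChange]
    dependencies: Dependencies
    mode: str = "production"
```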
Performance
- Response Time: Optimized for < 10 seconds for typical PRs (10-20 file changes)
- Token Usage: Efficient prompt design to minimize token consumption
- Caching: Results cached for identical inputs (a key-derivation sketch follows this list)
- Streaming: Optional streaming support for large analyses
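A simple way to implement "identical inputs" is to hash a canonical JSON serialization of the request body and use the digest as the cache key. This is a sketch of that idea; an in-memory dict stands in for whatever backend utils/cache.py actually uses.

```python
# Sketch of cache-key derivation for "identical inputs": hash the request body
# after canonical JSON serialization. The in-memory dict is a stand-in for the
# real caching backend.
import hashlib
import json

_cache: dict[str, dict] = {}

def cache_key(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def get_cached(payload: dict) -> dict | None:
    return _cache.get(cache_key(payload))

def set_cached(payload: dict, report: dict) -> None:
    _cache[cache_key(payload)] = report
```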
Performance Benchmarks
| Provider | Avg Response Time | Cost per 1M Input Tokens |
|---|---|---|
| OpenAI GPT-4o-mini | 3-5s | $0.15 |
| Anthropic Claude Haiku | 2-4s | $0.25 |
| Ollama CodeLlama 7B (GPU) | 5-8s | Free |
| Ollama CodeLlama 7B (CPU) | 20-30s | Free |
Error Handling
The service handles:
- API rate limits (with exponential backoff; a retry sketch follows below)
- Network timeouts
- Invalid input data
- LLM service unavailability (with automatic fallback)
All errors return structured JSON responses with appropriate HTTP status codes.
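The backoff behaviour pairs with the MAX_RETRIES and TIMEOUT_SECONDS settings. A generic retry helper might look like the sketch below; it is illustrative, not the actual utils/retry.py, and a real implementation would catch provider-specific rate-limit and timeout errors rather than bare Exception.

```python
# Generic retry-with-exponential-backoff sketch for transient API errors;
# illustrative only, not the service's actual utils/retry.py.
import asyncio
import random

async def with_backoff(coro_factory, max_retries: int = 3, base_delay: float = 1.0):
    """Call `await coro_factory()` with up to `max_retries` retries on failure."""
    for attempt in range(max_retries + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter: ~1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            await asyncio.sleep(delay)
```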
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=app --cov-report=html
# Run specific test
pytest tests/test_api.py
Development
Code Quality
# Format code
black app/ tests/
# Sort imports
isort app/ tests/
# Lint
flake8 app/ tests/
# Type checking
mypy app/
Project Structure
app/
├── main.py # FastAPI application entry point
├── api/
│ └── v1/
│ └── endpoints.py # API endpoints
├── core/
│ ├── config.py # Configuration management
│ └── prompts.py # Prompt templates
├── models/
│ ├── request.py # Request models
│ └── response.py # Response models
├── services/
│ ├── llm_service.py # LLM integration
│ └── analyzer.py # Impact analysis logic
└── utils/
├── retry.py # Retry logic
└── cache.py # Caching utilities
Related Documentation
Repository
GitHub: https://github.com/Citi-Rippler/llm-service