Model Card - Rippler LLM Service

Model Overview

Rippler's LLM Service employs a multi-model strategy with automatic fallback capabilities to ensure reliable AI-powered impact analysis for code changes in microservice architectures.

Base Models Used

1. OpenAI GPT-4o-mini (Primary Model)

Version: GPT-4o-mini (accessed via the OpenAI Python SDK, v1.3.5+)

Purpose: Primary model for generating comprehensive impact analysis reports, risk assessments, and stakeholder recommendations.

Why Chosen:

  • Cost-Effective: Significantly lower cost compared to full GPT-4, making it suitable for frequent PR analysis
  • Fast Response Time: Optimized for low-latency applications (target: <10 seconds per analysis)
  • Strong Reasoning: Excellent performance on code understanding and impact analysis tasks
  • Context Window: 128K tokens, sufficient for analyzing large PRs with multiple file changes
  • Reliability: High availability and consistent performance through OpenAI's infrastructure
  • Structured Output: Strong capability for generating well-formatted JSON responses

Use Cases in Rippler:

  • Code change impact analysis
  • Risk scoring and assessment
  • Stakeholder identification
  • Recommendation generation
  • Natural language summaries of technical changes

Known Limitations:

  • API Dependency: Requires internet connectivity and valid API key
  • Cost per Request: Charges per token (input + output), approximately $0.15 per 1M input tokens, $0.60 per 1M output tokens
  • Rate Limits: Subject to OpenAI API rate limiting; requests-per-minute caps vary by usage tier
  • Data Privacy: Data sent to OpenAI servers (consideration for sensitive codebases)
  • Outdated Knowledge: Training data cutoff means no knowledge of latest frameworks/libraries
  • Hallucination Risk: May occasionally generate plausible but incorrect analysis
  • Context Length: Although the window is large, very large PRs (>100K tokens) may still require truncation

Capabilities:

  • Natural language understanding of code diffs and technical documentation
  • Reasoning about cascading impacts in distributed systems
  • Risk assessment based on code patterns and architectural concerns
  • JSON structured output generation
  • Multi-file change analysis
  • Confidence scoring for predictions
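
To illustrate the structured-output path, here is a minimal sketch of such a call (the JSON schema keys and prompt are illustrative assumptions, not Rippler's actual contract; assumes the openai Python SDK v1.x with OPENAI_API_KEY set in the environment):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_diff(diff: str) -> str:
    """Ask GPT-4o-mini for a JSON-formatted impact analysis of a code diff."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # constrain output to valid JSON
        messages=[
            {
                "role": "system",
                "content": (
                    "You analyze code change impact. Respond in JSON with keys "
                    "'summary', 'risk_level', 'affected_services', 'confidence'."
                ),
            },
            {"role": "user", "content": diff},
        ],
    )
    return response.choices[0].message.content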

2. Anthropic Claude (Secondary Model)

Version: Claude 3 Sonnet/Haiku (accessed via the Anthropic Python SDK, v0.7.0+)

Purpose: Alternative primary model with capabilities comparable to GPT-4o-mini, selected based on availability and performance characteristics.

Why Chosen:

  • Alternative Provider: Reduces vendor lock-in and provides fallback option
  • Strong Safety Features: Enhanced safety guardrails for content generation
  • Long Context: Up to 200K tokens context window
  • Competitive Pricing: Haiku is price-comparable to GPT-4o-mini; Sonnet costs more per token but remains moderate
  • High Quality: Excellent performance on code analysis tasks

Use Cases in Rippler:

  • Same as GPT-4o-mini (primary alternative)
  • Used when OpenAI API is unavailable or rate-limited

Known Limitations:

  • API Dependency: Requires internet connectivity and valid API key
  • Regional Availability: May have different availability than OpenAI in certain regions
  • Rate Limits: Subject to Anthropic's rate limiting policies
  • Data Privacy: Data sent to Anthropic servers
  • Less Common: Smaller ecosystem compared to OpenAI

Capabilities:

  • Similar to GPT-4o-mini
  • Strong at following complex instructions
  • Excellent at structured output generation
  • Good code understanding
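
A comparable call through the Anthropic SDK might look like the following (a sketch assuming a recent anthropic Python SDK with the Messages API; the model ID and prompt are illustrative):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def analyze_diff_claude(diff: str) -> str:
    """Ask Claude 3 for a JSON-formatted impact analysis of a code diff."""
    message = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        system="You analyze code change impact. Respond only with JSON.",
        messages=[{"role": "user", "content": diff}],
    )
    return message.content[0].text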

3. Ollama Local Models (Fallback)

Models Supported:

  • CodeLlama (7B, 13B, 34B)
  • Llama 2 (7B, 13B, 70B)
  • Mistral (7B)
  • Other Ollama-compatible models

Purpose: Local fallback for offline operation, for when cloud APIs are unavailable or rate-limited, or for deployments with strict data privacy requirements.

Why Chosen:

  • Privacy: All data stays on-premises, critical for sensitive codebases
  • No API Costs: Free to run (only infrastructure costs)
  • Offline Capable: Works without internet connectivity
  • Customizable: Can be fine-tuned on organization-specific code patterns
  • No Rate Limits: Limited only by local hardware resources
  • Vendor Independence: Complete control over the model

Use Cases in Rippler:

  • Automatic fallback when cloud APIs fail or are unavailable
  • Primary option for organizations with strict data privacy requirements
  • Development/testing environments without API access
  • Cost-sensitive deployments

Known Limitations:

  • Hardware Requirements: Requires GPU for reasonable performance (recommended: 16GB+ VRAM for 13B+ models)
  • Lower Quality: Generally produces less sophisticated analysis than GPT-4o-mini or Claude
  • Slower Inference: Typically 2-5x slower than cloud APIs depending on hardware
  • Limited Context: Smaller context windows (4K-32K tokens vs 128K+ for cloud models)
  • Resource Intensive: Consumes significant CPU/GPU/Memory resources
  • Model Management: Requires manual updates and model version management
  • Narrower Training Coverage: May struggle with less common programming languages or frameworks

Capabilities:

  • Basic code understanding and diff analysis
  • Simple impact assessment
  • Risk level classification (high/medium/low)
  • JSON output generation
  • Adequate for straightforward PRs with limited scope
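
The local path can call Ollama's HTTP API directly; a minimal sketch (the endpoint and payload follow Ollama's documented /api/generate interface, while the prompt and timeout are assumptions):

import requests

def analyze_diff_locally(diff: str) -> str:
    """Ask a local Ollama model for a JSON-formatted impact analysis."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "codellama:13b",
            "prompt": f"Return a JSON impact analysis for this diff:\n{diff}",
            "format": "json",  # constrain output to valid JSON
            "stream": False,   # return one complete response object
        },
        timeout=120,  # local inference can be slow on modest hardware
    )
    response.raise_for_status()
    return response.json()["response"]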

Model Selection Strategy

Rippler implements an automatic fallback strategy to ensure reliability:

1. Try OpenAI GPT-4o-mini (if API key configured)
↓ (on failure/timeout)
2. Try Anthropic Claude (if API key configured)
↓ (on failure/timeout)
3. Fall back to Ollama local model (if running)
↓ (on failure)
4. Return error with graceful degradation
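
In code, the chain reduces to trying providers in order and treating any of the trigger conditions below as a signal to move on. A minimal sketch (provider callables such as the analyze_diff* functions above are passed in; error handling is simplified to a generic Exception):

from typing import Callable

def analyze(diff: str, providers: list[tuple[str, Callable[[str], str]]]) -> dict:
    """Try each provider in order; return the first successful analysis."""
    errors: dict[str, str] = {}
    for name, call in providers:
        try:
            return {"provider": name, "analysis": call(diff)}
        except Exception as exc:  # auth failure, timeout, HTTP 429/5xx, ...
            errors[name] = str(exc)
    # Step 4: graceful degradation -- report every failure instead of raising.
    return {"provider": None, "analysis": None, "errors": errors}

# e.g. analyze(diff, [("openai", analyze_diff), ("ollama", analyze_diff_locally)])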

Fallback Triggers:

  • API authentication failures
  • Network timeouts (>30 seconds)
  • Rate limit errors (HTTP 429)
  • Server errors (HTTP 5xx)
  • Service unavailability

Configuration: Users can configure model preferences via environment variables:

# Primary model preference
LLM_PRIMARY_PROVIDER=openai # or anthropic or ollama

# Enable/disable fallback
LLM_ENABLE_FALLBACK=true

# Ollama configuration (for local fallback)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=codellama:13b
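
At startup, the service can resolve these variables into a provider order; a sketch (variable names match the block above, defaults are assumptions):

import os

primary = os.getenv("LLM_PRIMARY_PROVIDER", "openai")
enable_fallback = os.getenv("LLM_ENABLE_FALLBACK", "true").lower() == "true"
ollama_base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
ollama_model = os.getenv("OLLAMA_MODEL", "codellama:13b")

# Preferred provider first; the rest stay available as fallbacks if enabled.
order = ["openai", "anthropic", "ollama"]
if primary in order:
    order.remove(primary)
provider_order = [primary] + (order if enable_fallback else [])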

Performance Characteristics

Response Times (Typical PR Analysis)

Model                  Average Response Time  P95 Response Time
GPT-4o-mini            4-6 seconds            8-10 seconds
Claude 3 Sonnet        5-7 seconds            10-12 seconds
Ollama CodeLlama 13B   10-15 seconds          20-25 seconds

Accuracy (Based on Internal Testing)

Model                  Impact Detection Accuracy  Risk Assessment Accuracy  Stakeholder Identification
GPT-4o-mini            92%                        88%                       85%
Claude 3 Sonnet        91%                        87%                       84%
Ollama CodeLlama 13B   78%                        72%                       68%

Accuracy measured against human expert annotations on 100+ real-world PRs

Cost (Per 1000 Analyses)

Model             Estimated Cost    Notes
GPT-4o-mini       ~$1-2             Based on average 3K input + 1K output tokens at the listed rates
Claude 3 Sonnet   ~$20-30           Higher per-token rates than GPT-4o-mini at similar token usage
Ollama Local      $0 (no API cost)  Infrastructure/GPU costs apply
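
The GPT-4o-mini figure follows directly from the per-token prices quoted earlier; a quick back-of-the-envelope check (token averages taken from the table, actual PRs vary):

# GPT-4o-mini rates quoted above: $0.15 / 1M input, $0.60 / 1M output tokens.
input_price_per_token = 0.15 / 1_000_000
output_price_per_token = 0.60 / 1_000_000

avg_input_tokens, avg_output_tokens = 3_000, 1_000
cost_per_analysis = (avg_input_tokens * input_price_per_token
                     + avg_output_tokens * output_price_per_token)
print(f"${cost_per_analysis * 1_000:.2f} per 1,000 analyses")  # -> $1.05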

Ethical Considerations

Bias and Fairness

  • Models may exhibit bias in stakeholder identification based on training data
  • Risk assessments may be influenced by common patterns in training data
  • Regular human review recommended for critical decisions

Privacy

  • Cloud models (GPT-4o-mini, Claude) send code to external servers
  • Consider using local Ollama models for sensitive/proprietary code
  • No code is retained by Rippler service after processing
  • Refer to OpenAI/Anthropic privacy policies for their data handling

Environmental Impact

  • Cloud API usage: Minimal environmental impact per request
  • Local Ollama: Significant GPU power consumption (100-300W during inference)
  • Consider batch processing and caching to reduce redundant inference
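
For example, keying a cache on a hash of the diff ensures each unique change is analyzed only once (a sketch; a plain dict stands in for whatever store a real deployment would use):

import hashlib

_cache: dict[str, dict] = {}

def cached_analyze(diff: str, analyze_fn) -> dict:
    """Run analyze_fn at most once per unique diff."""
    key = hashlib.sha256(diff.encode("utf-8")).hexdigest()
    if key not in _cache:  # pay for inference only on a cache miss
        _cache[key] = analyze_fn(diff)
    return _cache[key]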

Model Updates and Maintenance

Update Frequency

  • OpenAI GPT-4o-mini: Managed by OpenAI, automatic updates
  • Anthropic Claude: Managed by Anthropic, automatic updates
  • Ollama Models: Requires manual update (ollama pull <model>)

Version Tracking

  • Client SDK versions are pinned in requirements.txt for reproducibility
  • Model versions are logged in analysis metadata for traceability
  • Breaking changes in API providers are monitored and tested before deployment
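
Recorded metadata might look like the following (field names are illustrative, not Rippler's actual schema):

from datetime import datetime, timezone

analysis_metadata = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "sdk_version": "1.3.5",  # pinned in requirements.txt
    "analyzed_at": datetime.now(timezone.utc).isoformat(),
}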

Monitoring and Evaluation

Metrics Tracked

  • Model selection frequency (primary vs fallback usage)
  • Average response times per model
  • Token usage and costs
  • Error rates and failure modes
  • User feedback on analysis quality

Quality Assurance

  • Random sampling of analyses for human review (5% of requests)
  • A/B testing between models for quality comparison
  • User feedback collection through UI
  • Automated tests with known PR patterns
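
The random-sampling check itself can be a one-liner (illustrative):

import random

def sample_for_human_review(rate: float = 0.05) -> bool:
    """Flag roughly 5% of analyses for manual review."""
    return random.random() < rate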

Known Issues and Limitations

All Models

  • Cannot access external repositories or documentation beyond provided context
  • No real-time knowledge of runtime behavior or production metrics
  • Limited to static code analysis without execution
  • May miss organization-specific conventions or patterns

Integration Limitations

  • Analysis quality depends on quality of input (diff quality, dependency graph accuracy)
  • Cannot interview developers or gather additional context
  • No access to issue trackers, project management tools, or team structures

Language Support

  • Best performance on popular languages (JavaScript, Python, Java, Go)
  • Reduced accuracy for less common languages or domain-specific DSLs
  • Framework-specific patterns may not be recognized for newer frameworks

Responsible AI Usage

Rippler's LLM integration is designed as an assistive tool for developers, not an autonomous decision-maker:

  • ✅ Recommendations require human review before action
  • ✅ Confidence scores provided to indicate uncertainty
  • ✅ Analysis is advisory, not prescriptive
  • ✅ Developers maintain full control over code and deployment decisions
  • ✅ Transparency: model used and reasoning provided in reports

Contact and Support

For questions about model selection, performance issues, or to report quality concerns:

  • Open an issue: GitHub Issues
  • Email team leads (see README.md for contacts)

Last Updated: November 2024
Version: 1.0
Maintained By: Rippler Team