AI Integration Guide

Prompt Stack provides seamless integration with multiple AI providers. Switch between OpenAI, Anthropic, Google Gemini, and DeepSeek with a single configuration change.

Supported Providers

🤖 OpenAI

  • ✅ GPT-4o, GPT-4o-mini
  • ✅ GPT-3.5 Turbo
  • ✅ Embeddings (Ada v2)
  • ✅ Streaming support
  • 💰 $5-20 per million tokens

🧠 Anthropic

  • ✅ Claude 3 Opus, Sonnet, Haiku
  • ✅ Claude 2.1
  • ✅ 200K context window
  • ✅ Streaming support
  • 💰 $3-15 per million tokens

🌟 Google Gemini

  • ✅ Gemini Pro
  • ✅ Gemini Pro Vision
  • ✅ Embeddings
  • ✅ Multi-modal support
  • 💰 Free tier available

🚀 DeepSeek

  • ✅ DeepSeek Chat
  • ✅ DeepSeek Coder
  • ✅ 32K context window
  • ✅ Best price/performance
  • 💰 $0.14 per million tokens!

💡 Pro Tip: Start with DeepSeek

DeepSeek offers performance comparable to GPT-4-class models at a small fraction of the cost, making it a good default for development and testing.

Quick Start

1. Choose Your Provider

# In backend/.env, add ONE of these:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
DEEPSEEK_API_KEY=sk-...

# Restart Docker to apply changes
docker-compose restart backend

2. Test Your Configuration

# Check available providers
curl http://localhost:8000/api/llm/providers

# Test generation
curl -X POST http://localhost:8000/api/llm/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "prompt": "Hello, AI!",
    "provider": "deepseek",
    "model": "deepseek-chat"
  }'

Provider Configuration

OpenAI

# Available models
models = [
    "gpt-4o",           # Latest, most capable
    "gpt-4o-mini",      # Faster, cheaper
    "gpt-3.5-turbo",    # Fast, affordable
    "text-embedding-ada-002"  # Embeddings
]

# Example usage
response = await llm_service.generate(
    prompt="Explain quantum computing",
    provider="openai",
    model="gpt-4o-mini",
    temperature=0.7,
    max_tokens=1000
)

Anthropic

# Available models
models = [
    "claude-3-opus-20240229",    # Most capable
    "claude-3-sonnet-20240229",  # Balanced
    "claude-3-haiku-20240307",   # Fast, cheap
    "claude-2.1"                 # Previous gen
]

# Example with system prompt
response = await llm_service.generate(
    prompt="Write a haiku about coding",
    provider="anthropic",
    model="claude-3-haiku-20240307",
    system_prompt="You are a poetic programmer"
)

Google Gemini

# Available models
models = [
    "gemini-pro",         # Text generation
    "gemini-pro-vision",  # Multi-modal
    "embedding-001"       # Embeddings
]

# Example with safety settings
response = await llm_service.generate(
    prompt="Analyze this business plan",
    provider="gemini",
    model="gemini-pro",
    additional_params={
        "safety_settings": [
            {
                "category": "HARM_CATEGORY_DANGEROUS",
                "threshold": "BLOCK_MEDIUM_AND_ABOVE"
            }
        ]
    }
)

DeepSeek

# Available models
models = [
    "deepseek-chat",   # General purpose
    "deepseek-coder"   # Code generation
]

# Example for code generation
response = await llm_service.generate(
    prompt="Write a Python function to calculate fibonacci",
    provider="deepseek",
    model="deepseek-coder",
    temperature=0.2,  # Lower for code
    max_tokens=500
)

Streaming Responses

All providers support streaming for real-time responses:

# Backend streaming endpoint (FastAPI)
from fastapi.responses import StreamingResponse

@router.post("/stream")
async def stream_generate(request: LLMRequest):
    # llm_service.stream_generate yields text chunks as they arrive
    return StreamingResponse(
        llm_service.stream_generate(
            prompt=request.prompt,
            provider=request.provider,
            model=request.model
        ),
        media_type="text/event-stream"
    )

# Frontend consumption
const response = await fetch('/api/llm/stream', {
    method: 'POST',
    body: JSON.stringify({ prompt, provider, model }),
    headers: { 'Content-Type': 'application/json' }
});

const reader = response.body.getReader();
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = new TextDecoder().decode(value);
    console.log('Chunk:', text);
}

Embeddings & Vector Search

Generate embeddings for semantic search:

# Generate embedding
embedding = await llm_service.create_embedding(
    text="Machine learning fundamentals",
    model="text-embedding-ada-002"  # OpenAI
)

# Store in vector database
await vector_service.store(
    content="Machine learning fundamentals",
    embedding=embedding,
    metadata={"category": "AI", "level": "beginner"}
)

# Semantic search
results = await vector_service.search(
    query="How does AI work?",
    top_k=5,
    namespace="documents"
)

Cost Optimization

💰 Token Usage & Costs

  • DeepSeek: $0.14/M tokens (input), $0.28/M (output)
  • GPT-3.5: $0.50/M tokens (input), $1.50/M (output)
  • GPT-4o-mini: $0.15/M tokens (input), $0.60/M (output)
  • Claude Haiku: $0.25/M tokens (input), $1.25/M (output)
  • Gemini Pro: free tier (60 requests/minute)
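
As a rough worked example using the rates above, a small helper (not part of Prompt Stack, just back-of-the-envelope arithmetic) can estimate per-request cost:

# Hypothetical helper: rates are USD per million tokens
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: 2,000 input + 500 output tokens on DeepSeek ($0.14 / $0.28)
print(estimate_cost(2_000, 500, 0.14, 0.28))  # ~0.00042 USD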

Cost-Saving Strategies

  • Use DeepSeek for development and testing
  • Implement response caching for common queries (see the sketch after this list)
  • Use smaller models (mini/haiku) when possible
  • Set appropriate max_tokens limits
  • Batch similar requests together
  • Use embeddings cache for vector search
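
A minimal sketch of the response-caching idea, assuming a redis.asyncio client and the llm_service.generate call shown earlier; the key scheme and one-hour TTL are illustrative, and generate is assumed to return plain text:

# Illustrative response cache keyed on provider + model + prompt
import hashlib
import json
import redis.asyncio as redis

cache = redis.Redis()  # assumes a local Redis instance

async def cached_generate(prompt: str, provider: str, model: str) -> str:
    key = "llm:" + hashlib.sha256(
        json.dumps({"prompt": prompt, "provider": provider, "model": model}).encode()
    ).hexdigest()

    cached = await cache.get(key)
    if cached is not None:
        return cached.decode()

    response = await llm_service.generate(prompt=prompt, provider=provider, model=model)
    await cache.set(key, response, ex=3600)  # keep for one hour
    return response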

Error Handling

try:
    response = await llm_service.generate(
        prompt=prompt,
        provider=provider,
        model=model
    )
except RateLimitError:
    # Switch to backup provider
    response = await llm_service.generate(
        prompt=prompt,
        provider="deepseek",  # Fallback
        model="deepseek-chat"
    )
except InvalidAPIKeyError:
    return {"error": "Please configure your API keys"}
except ModelNotAvailableError:
    # Use default model for provider
    response = await llm_service.generate(
        prompt=prompt,
        provider=provider
        # model param omitted - uses default
    )

Best Practices

🎯 Model Selection

  • Use GPT-4o for complex reasoning tasks
  • Use Claude for creative writing and analysis
  • Use Gemini for multi-modal tasks
  • Use DeepSeek for cost-effective development
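
One way to make this guidance concrete is a small routing table; the task labels and default models below are illustrative choices, not a Prompt Stack API:

# Illustrative task -> (provider, model) routing based on the guidance above
MODEL_ROUTES = {
    "reasoning":  ("openai",    "gpt-4o"),
    "writing":    ("anthropic", "claude-3-sonnet-20240229"),
    "multimodal": ("gemini",    "gemini-pro-vision"),
    "dev":        ("deepseek",  "deepseek-chat"),
}

async def generate_for_task(task: str, prompt: str):
    provider, model = MODEL_ROUTES.get(task, ("deepseek", "deepseek-chat"))
    return await llm_service.generate(prompt=prompt, provider=provider, model=model)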

⚡ Performance

  • Cache common responses with Redis
  • Use streaming for long responses
  • Implement request queuing for rate limits (see the sketch after this list)
  • Monitor token usage and costs
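
A simple way to queue requests under provider rate limits is to bound concurrency with an asyncio semaphore; the limit of 5 concurrent calls below is an arbitrary example:

# Allow at most 5 in-flight LLM calls; extra requests wait their turn
import asyncio

llm_semaphore = asyncio.Semaphore(5)

async def queued_generate(**kwargs):
    async with llm_semaphore:
        return await llm_service.generate(**kwargs)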

🔒 Security

  • Never expose API keys to frontend
  • Implement user-level rate limiting (see the sketch after this list)
  • Sanitize user inputs before sending
  • Log all AI interactions for audit
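
For user-level rate limiting, a fixed-window counter in Redis is one common approach; the 30-requests-per-minute limit below is illustrative and reuses the Redis client from the caching sketch:

# Illustrative per-user fixed-window rate limit
async def check_rate_limit(user_id: str, limit: int = 30, window: int = 60) -> bool:
    key = f"ratelimit:{user_id}"
    count = await cache.incr(key)        # increments and returns the new count
    if count == 1:
        await cache.expire(key, window)  # start the window on the first request
    return count <= limit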

Testing AI Features

Visit the AI testing page to try different providers and models:

🧪 Interactive Testing

Go to /test-ai (requires authentication) to:

  • Test all configured providers
  • Compare model outputs side-by-side
  • Benchmark response times
  • Monitor token usage

Troubleshooting

Common Issues

Provider not available

# Check which providers are configured
curl http://localhost:8000/api/llm/providers

# Response shows available providers
{
  "providers": ["openai", "deepseek"],
  "default_provider": "deepseek"
}

API key not working

  • Verify key format (sk-... for OpenAI)
  • Check key permissions and quotas
  • Ensure Docker was restarted after adding key
  • Log out and back in to refresh auth token

Slow responses

  • Use streaming for better UX
  • Reduce max_tokens if appropriate
  • Use faster models (GPT-3.5, Haiku)
  • Implement response caching