AI Integration Guide
Working with AI providers
Prompt Stack provides seamless integration with multiple AI providers. Switch between OpenAI, Anthropic, Google Gemini, and DeepSeek with a single configuration change.
Supported Providers
🤖 OpenAI
- ✅ GPT-4o, GPT-4o-mini
- ✅ GPT-3.5 Turbo
- ✅ Embeddings (Ada v2)
- ✅ Streaming support
- 💰 $5-20 per million tokens
🧠 Anthropic
- ✅ Claude 3 Opus, Sonnet, Haiku
- ✅ Claude 2.1
- ✅ 200K context window
- ✅ Streaming support
- 💰 $3-15 per million tokens
🌐 Google Gemini
- ✅ Gemini Pro
- ✅ Gemini Pro Vision
- ✅ Embeddings
- ✅ Multi-modal support
- 💰 Free tier available
🚀 DeepSeek
- ✅ DeepSeek Chat
- ✅ DeepSeek Coder
- ✅ 32K context window
- ✅ Best price/performance
- 💰 $0.14 per million tokens!
💡 Pro Tip: Start with DeepSeek
DeepSeek offers performance competitive with GPT-4 at a small fraction of the cost, making it a good default for development and testing.
Quick Start
1. Choose Your Provider
# In backend/.env, add ONE of these:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
DEEPSEEK_API_KEY=sk-...
# Restart Docker to apply changes
docker-compose restart backend
2. Test Your Configuration
# Check available providers
curl http://localhost:8000/api/llm/providers
# Test generation
curl -X POST http://localhost:8000/api/llm/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "prompt": "Hello, AI!",
    "provider": "deepseek",
    "model": "deepseek-chat"
  }'
Provider Configuration
OpenAI
# Available models
models = [
    "gpt-4o",                 # Latest, most capable
    "gpt-4o-mini",            # Faster, cheaper
    "gpt-3.5-turbo",          # Fast, affordable
    "text-embedding-ada-002"  # Embeddings
]

# Example usage
response = await llm_service.generate(
    prompt="Explain quantum computing",
    provider="openai",
    model="gpt-4o-mini",
    temperature=0.7,
    max_tokens=1000
)
Anthropic
# Available models
models = [
    "claude-3-opus-20240229",    # Most capable
    "claude-3-sonnet-20240229",  # Balanced
    "claude-3-haiku-20240307",   # Fast, cheap
    "claude-2.1"                 # Previous gen
]

# Example with system prompt
response = await llm_service.generate(
    prompt="Write a haiku about coding",
    provider="anthropic",
    model="claude-3-haiku-20240307",
    system_prompt="You are a poetic programmer"
)
Google Gemini
# Available models
models = [
    "gemini-pro",         # Text generation
    "gemini-pro-vision",  # Multi-modal
    "embedding-001"       # Embeddings
]

# Example with safety settings
response = await llm_service.generate(
    prompt="Analyze this business plan",
    provider="gemini",
    model="gemini-pro",
    additional_params={
        "safety_settings": [
            {
                "category": "HARM_CATEGORY_DANGEROUS",
                "threshold": "BLOCK_MEDIUM_AND_ABOVE"
            }
        ]
    }
)
DeepSeek
# Available models
models = [
    "deepseek-chat",   # General purpose
    "deepseek-coder"   # Code generation
]

# Example for code generation
response = await llm_service.generate(
    prompt="Write a Python function to calculate fibonacci",
    provider="deepseek",
    model="deepseek-coder",
    temperature=0.2,  # Lower for code
    max_tokens=500
)
Streaming Responses
All providers support streaming for real-time responses:
# Backend streaming endpoint
@router.post("/stream")
async def stream_generate(request: LLMRequest):
    return StreamingResponse(
        llm_service.stream_generate(
            prompt=request.prompt,
            provider=request.provider,
            model=request.model
        ),
        media_type="text/event-stream"
    )
// Frontend consumption
const response = await fetch('/api/llm/stream', {
  method: 'POST',
  body: JSON.stringify({ prompt, provider, model }),
  headers: { 'Content-Type': 'application/json' }
});

const reader = response.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = new TextDecoder().decode(value);
  console.log('Chunk:', text);
}
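The stream can also be consumed from Python, e.g. in tests or server-side tools. A minimal sketch using httpx (the library choice and token handling are assumptions; the endpoint and payload follow the examples above):

# Python client for the streaming endpoint (sketch)
import httpx

async def consume_stream(prompt: str, provider: str, model: str, token: str):
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/api/llm/stream",
            json={"prompt": prompt, "provider": provider, "model": model},
            headers={"Authorization": f"Bearer {token}"},
        ) as response:
            async for chunk in response.aiter_text():
                print(chunk, end="", flush=True)  # handle each chunk as it arrives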
Embeddings & Vector Search
Generate embeddings for semantic search:
# Generate embedding
embedding = await llm_service.create_embedding(
    text="Machine learning fundamentals",
    model="text-embedding-ada-002"  # OpenAI
)

# Store in vector database
await vector_service.store(
    content="Machine learning fundamentals",
    embedding=embedding,
    metadata={"category": "AI", "level": "beginner"}
)

# Semantic search
results = await vector_service.search(
    query="How does AI work?",
    top_k=5,
    namespace="documents"
)
Cost Optimization
💰 Token Usage & Costs
- DeepSeek: $0.14/M tokens (input), $0.28/M (output)
- GPT-3.5: $0.50/M tokens (input), $1.50/M (output)
- GPT-4o-mini: $0.15/M tokens (input), $0.60/M (output)
- Claude Haiku: $0.25/M tokens (input), $1.25/M (output)
- Gemini Pro: free tier (60 requests/minute)
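To reason about these numbers, a quick back-of-the-envelope estimator helps. A minimal sketch using the rates listed above (prices change often; the function and values are illustrative, not part of Prompt Stack):

# Rough per-request cost estimate (USD) from the rates above
PRICES = {  # (input $/M tokens, output $/M tokens)
    "deepseek-chat": (0.14, 0.28),
    "gpt-3.5-turbo": (0.50, 1.50),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-haiku-20240307": (0.25, 1.25),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on DeepSeek
print(f"${estimate_cost('deepseek-chat', 2000, 500):.6f}")  # ~$0.000420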
Cost-Saving Strategies
- Use DeepSeek for development and testing
- Implement response caching for common queries (see the sketch after this list)
- Use smaller models (mini/haiku) when possible
- Set appropriate max_tokens limits
- Batch similar requests together
- Use embeddings cache for vector search
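As a concrete example of response caching, here is a minimal sketch using redis-py's asyncio client (an assumption; cached_generate and the key scheme are illustrative, and it assumes llm_service.generate returns a string):

# Response cache in front of llm_service (sketch)
import hashlib
import json

import redis.asyncio as redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

async def cached_generate(prompt: str, provider: str, model: str, ttl: int = 3600):
    # Key on everything that affects the output
    key = "llm:" + hashlib.sha256(
        json.dumps([prompt, provider, model]).encode()
    ).hexdigest()
    cached = await cache.get(key)
    if cached is not None:
        return cached  # served from cache, no tokens spent
    response = await llm_service.generate(
        prompt=prompt, provider=provider, model=model
    )
    await cache.set(key, response, ex=ttl)  # JSON-serialize non-string responses
    return response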
Error Handling
try:
    response = await llm_service.generate(
        prompt=prompt,
        provider=provider,
        model=model
    )
except RateLimitError:
    # Switch to backup provider
    response = await llm_service.generate(
        prompt=prompt,
        provider="deepseek",  # Fallback
        model="deepseek-chat"
    )
except InvalidAPIKeyError:
    return {"error": "Please configure your API keys"}
except ModelNotAvailableError:
    # Use default model for provider
    response = await llm_service.generate(
        prompt=prompt,
        provider=provider
        # model param omitted - uses default
    )
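Before falling back to another provider, transient rate limits can often be absorbed with a short exponential backoff. A minimal sketch (generate_with_backoff is illustrative; RateLimitError mirrors the exception above):

# Retry with exponential backoff, then let the caller fall back
import asyncio

async def generate_with_backoff(prompt, provider, model, retries=3):
    for attempt in range(retries):
        try:
            return await llm_service.generate(
                prompt=prompt, provider=provider, model=model
            )
        except RateLimitError:
            if attempt == retries - 1:
                raise  # exhausted retries - switch provider as shown above
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s, ...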
Best Practices
🎯 Model Selection
- Use GPT-4o for complex reasoning tasks
- Use Claude for creative writing and analysis
- Use Gemini for multi-modal tasks
- Use DeepSeek for cost-effective development (one way to encode this guidance is sketched below)
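A small routing table is enough to apply these rules in code. A minimal sketch (the task labels and default are assumptions, not part of Prompt Stack):

# Map task types to (provider, model) following the guidance above
ROUTES = {
    "reasoning": ("openai", "gpt-4o"),
    "writing": ("anthropic", "claude-3-sonnet-20240229"),
    "vision": ("gemini", "gemini-pro-vision"),
    "development": ("deepseek", "deepseek-chat"),
}

def pick_model(task: str) -> tuple[str, str]:
    # Unknown tasks default to the cheapest option
    return ROUTES.get(task, ("deepseek", "deepseek-chat"))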
⚡ Performance
- Cache common responses with Redis
- Use streaming for long responses
- Implement request queuing for rate limits (a minimal sketch follows this list)
- Monitor token usage and costs
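A simple form of request queuing is to cap the number of in-flight provider calls so bursts wait instead of tripping rate limits. A minimal asyncio sketch (the limit and queued_generate are illustrative):

# Bounded concurrency for provider calls
import asyncio

MAX_CONCURRENT = 5  # tune per provider's rate limits
_slots = asyncio.Semaphore(MAX_CONCURRENT)

async def queued_generate(prompt, provider, model):
    async with _slots:  # excess requests queue here
        return await llm_service.generate(
            prompt=prompt, provider=provider, model=model
        )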
🔒 Security
- Never expose API keys to frontend
- Implement user-level rate limiting (see the sketch after this list)
- Sanitize user inputs before sending
- Log all AI interactions for audit
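For user-level rate limiting, a sliding window per user is usually enough. A minimal in-memory sketch (allow_request and the limits are illustrative; use Redis instead of process memory when running multiple workers):

# Per-user sliding-window rate limit
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 20  # per user per window
_history: dict[str, deque] = defaultdict(deque)

def allow_request(user_id: str) -> bool:
    now = time.monotonic()
    window = _history[user_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop timestamps outside the window
    if len(window) >= MAX_REQUESTS:
        return False  # over the limit - reject or queue the request
    window.append(now)
    return True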
Testing AI Features
Visit the AI testing page to try different providers and models:
🧪 Interactive Testing
Go to /test-ai (requires authentication) to:
- Test all configured providers
- Compare model outputs side-by-side
- Benchmark response times
- Monitor token usage
Troubleshooting
Common Issues
Provider not available
# Check which providers are configured
curl http://localhost:8000/api/llm/providers
# Response shows available providers
{
  "providers": ["openai", "deepseek"],
  "default_provider": "deepseek"
}
API key not working
- Verify key format (sk-... for OpenAI)
- Check key permissions and quotas
- Ensure Docker was restarted after adding key
- Log out and back in to refresh auth token
Slow responses
- Use streaming for better UX
- Reduce max_tokens if appropriate
- Use faster models (GPT-3.5, Haiku)
- Implement response caching