Model Comparison
Schatzi AI provides access to 13+ active state-of-the-art AI models from leading providers. All models run on Swiss infrastructure, ensuring your data never leaves Switzerland.
This guide helps you choose the right model for your specific needs, understand pricing, and optimize costs.
Quick Model Selector
Choose by use case:
| Your Need | Recommended Model |
|---|---|
| Daily chat & email | Chat & Document Analysis - Medium |
| Document analysis | Chat & Document Analysis - Medium |
| Swiss compliance required | Apertus Swiss LLM |
| Complex reasoning | Reasoning & Problem Solving - Xtra Large |
| Web search & research | Search, Chat & Analysis - Small |
| Multilingual chat | Llama 3.3 Multi-lingual - Medium |
| Budget-conscious | Apertus Swiss LLM - Small |
Complete Model Reference
Chat & General Purpose Models
Chat & Document Analysis - Medium
- Parameters: 24 Billion
- Context Window: 128,000 tokens (~96,000 words)
Capabilities:
- Versatile multimodal model
- Vision and image analysis
- Conversational agents
- Strong contextual understanding
- All major European languages
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Daily business communications
- Email drafting and responses
- General document analysis
- Customer service responses
- Quick content generation
Search, Chat & Analysis - Small
- Parameters: ~17 Billion (Llama 4 Scout)
- Context Window: Variable
Capabilities:
- Optimized for web search and chat
- Suitable for artists and content creation, including storytelling
- Web search integration
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Web search and research tasks
- Content creation and storytelling
- Quick information retrieval
Chat & Document Analysis - Xtra Large
- Parameters: 235 Billion (Mixture of Experts - 22B active)
- Context Window: 128,000 tokens (~96,000 words)
Capabilities:
- Very large-scale model, rivaling GPT-4 or Claude 3 Opus across a broad range of complex tasks
- Advanced multilingual capabilities
- Reasoning mode can be enabled to dynamically tailor responses to the context and complexity of queries
- Document analysis excellence
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Complex document analysis
- High-quality content generation
- Advanced multilingual tasks
- Tasks requiring GPT-4-level performance
- Dynamic reasoning tasks
Why Choose This Model:
- Premium performance at competitive pricing
- Advanced reasoning capabilities
- Multilingual excellence
Swiss LLM Models (🇨🇭 AI Act Compliant)
Apertus Swiss LLM - Large (70B)
- Parameters: 70 Billion
- Context Window: 65,536 tokens (~49,000 words)
- Max Output: 16,384 tokens
Capabilities:
- Fully documented and transparent
- AI Act compliant
- Respectful of privacy and intellectual property
- Performance on par with market leaders
- Optimized for German, French, Italian, English
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Government agencies
- Swiss financial services
- R&D teams requiring transparency
- Multilingual European services
- Compliance-heavy industries
Why Choose Apertus:
- Swiss-developed and Swiss-hosted
- Fully AI Act compliant
- Complete transparency and documentation
- Privacy-first architecture
- Multilingual European focus
Apertus Swiss LLM - Small (8B)
- Parameters: 8 Billion
- Context Window: 32,768 tokens (~24,500 words)
- Max Output: 8,192 tokens
Capabilities:
- Optimized for multilingual dialogue
- Fast response times
- Efficient for routine tasks
- Same compliance as Large model
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- High-volume, routine conversations
- Budget-conscious Swiss compliance needs
- Quick responses with Swiss data protection
- Multilingual customer support
Cost Comparison:
- Much cheaper than Apertus Large
- Perfect for high-frequency, simple tasks
Reasoning & Problem-Solving Models
Reasoning & Problem Solving - Xtra Large
- Parameters: 670 Billion
- Context Window: 65,536 tokens (~49,000 words)
- Max Output: 16,384 tokens
Capabilities:
- Advanced reasoning chat completions
- Complex problem-solving
- Multi-step logical analysis
- Strategic planning
- Deep technical understanding
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
- Note: Premium pricing for advanced reasoning
Best For:
- Complex strategic analysis
- Advanced research tasks
- Multi-step problem decomposition
- Technical architecture decisions
- Financial modeling and forecasting
When to Use:
- Tasks requiring deep reasoning
- When accuracy is critical
- Complex business decisions
- Advanced technical problems
This is our most expensive model. Use for complex reasoning tasks where the quality justifies the higher cost.
Reasoning & Problem Solving - Medium
- Parameters: 32 Billion
- Context Window: 32,768 tokens (~24,500 words)
- Max Output: 8,192 tokens
Capabilities:
- Optimized for thinking and reasoning
- Strong problem-solving abilities
- Cost-effective reasoning model
- Multi-step analysis
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Mid-complexity reasoning tasks
- Problem-solving at lower cost than XL
- Logical analysis
- Structured thinking tasks
Cost Comparison:
- Much cheaper than Reasoning - XL
- Better reasoning than general chat models
- Balanced performance-to-cost ratio
Fast Reasoning & Problem Solving - Small
- Parameters: 8 Billion
- Context Window: 32,768 tokens (~24,500 words)
- Max Output: 8,192 tokens
Capabilities:
- Optimized for thinking and reasoning
- Fast response times
- Efficient for quick problem-solving
- Function calling support
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- High-volume reasoning tasks
- Quick logical analysis
- Budget-conscious problem-solving
- Rapid iteration on solutions
Cost Comparison:
- Most affordable reasoning model
- Much cheaper than Reasoning - XL
- Perfect for high-frequency reasoning tasks
Reasoning & Agent tasks - Xtra Large
- Parameters: 120 Billion
- Context Window: 32,768 tokens (~24,500 words)
- Max Output: 8,192 tokens
Capabilities:
- Optimized for powerful reasoning
- Agentic task execution
- Versatile developer use cases
- Function calling support
- Data analysis capabilities
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Agentic workflows
- Automated task execution
- Complex data analysis
- Developer tools and automation
- Multi-step reasoning at lower cost
Why Choose This Model:
- Excellent value for advanced reasoning
- Strong agent capabilities
- Function calling for automation
- More affordable than premium reasoning models
Kimi K2 - Chat
- Context Window: Variable
Capabilities:
- Optimized for multilingual dialogue use cases
- Strong conversational abilities
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- General chat and conversation
- Multilingual dialogue
Vision & Document Analysis Models
Document Analysis - Small
- Parameters: 12 Billion
- Context Window: 32,768 tokens (~24,500 words)
- Max Output: 8,192 tokens
- Vision: ✅ Supports image analysis
Capabilities:
- Optimized for handling text and image input
- Multimodal - processes both text and images
- Document analysis with visual elements
- Chart and diagram interpretation
- Screenshot analysis
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Document analysis with images
- PDF processing with charts/diagrams
- Screenshot interpretation
- Visual content analysis
- Forms and invoice processing
Document Analysis - Xtra Small
- Parameters: 2 Billion
- Context Window: 16,384 tokens (~12,000 words)
- Max Output: 4,096 tokens
- Vision: ✅ Supports image analysis
Capabilities:
- Compact and efficient vision-language model
- Fast processing of images and text
- Budget-friendly multimodal option
- Quick visual analysis
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- High-volume image processing
- Quick visual QA
- Simple document scans
- Receipt and form analysis
- Budget-conscious vision tasks
Cost Comparison:
- Most affordable vision model
- Perfect for simple visual tasks at scale
Multilingual Models
Llama 3.3 Multi-lingual - Medium
- Parameters: 70 Billion
- Context Window: 131,072 tokens (~98,000 words)
- Max Output: 8,192 tokens
Capabilities:
- Optimized for multilingual dialogue
- Strong European language support
- Natural conversation flow
- Cultural context awareness
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Best For:
- Multilingual customer support
- Cross-border communication
- Translation with context
- International business
Supported Languages:
- German, French, Italian (Swiss variants)
- English, Spanish, Portuguese
- Dutch, Polish, Czech
- And 50+ more languages
Complete Pricing Overview
All Active Models - Price Comparison
The live price table is loaded dynamically and is not reproduced here. Check your dashboard for current pricing.
Best Value Models
Overall Best Value:
- Chat & Document Analysis - Xtra Large
- GPT-4-level performance at competitive pricing
- Great for complex document analysis
Best Budget Option:
- Fast Reasoning & Problem Solving - Small
- Most affordable reasoning model
- Perfect for high-frequency use
Best Swiss Compliance:
- Apertus Swiss LLM - Small
- Full AI Act compliance
- More affordable than Apertus Large
Best Vision Model:
- Document Analysis - Xtra Small
- Most affordable vision capabilities
- Perfect for document scanning at scale
Cost Estimation
Typical Task Types
| Task Type | Token Usage | Recommended Model |
|---|---|---|
| Email response | 500 input + 300 output | Chat & Document Analysis - Medium |
| 10-page document summary | 10K input + 1K output | Chat & Document Analysis - Medium |
| Contract analysis (30 pages) | 30K input + 2K output | Apertus Swiss LLM - Large |
| Complex reasoning task | 5K input + 3K output | Reasoning & Problem Solving - Xtra Large |
| Multilingual chat (hour) | 15K input + 10K output | Llama 3.3 Multi-lingual - Medium |
Choose the right model for each task. Use smaller, cheaper models for routine work and reserve premium models for complex analysis. See the pricing table above for current rates.
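The arithmetic behind these estimates is simple, because input and output tokens are each billed per million. The sketch below uses the token counts from the Typical Task Types table. The prices in it are invented placeholders (the real rates are not reproduced in this guide), and the names `Price` and `task_cost` are illustrative only. Substitute the current rates from your dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price per million tokens, in your plan's currency."""
    input_per_million: float
    output_per_million: float


def task_cost(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Cost of one task; input and output tokens are priced separately."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


# PLACEHOLDER prices, made up for illustration. Not actual Schatzi AI rates.
example_price = Price(input_per_million=1.00, output_per_million=2.00)

# (input tokens, output tokens) taken from the Typical Task Types table.
tasks = {
    "Email response": (500, 300),
    "10-page document summary": (10_000, 1_000),
    "Contract analysis (30 pages)": (30_000, 2_000),
}

for name, (tokens_in, tokens_out) in tasks.items():
    print(f"{name}: {task_cost(tokens_in, tokens_out, example_price):.4f}")
```

Multiplying a per-task cost by the expected number of tasks per month gives a rough monthly figure for comparing two models.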
Choosing the Right Model
Decision Framework
1. Task Complexity
- Simple/Routine → Apertus Small, Chat & Document Medium
- Complex/Technical → Apertus Large, Chat & Document Xtra Large
- Advanced Reasoning → Reasoning - XL
2. Language Requirements
- German/French/Italian primary → Apertus models
- Multilingual → Llama 3.3
- English-focused → Chat & Document Medium
3. Compliance Needs
- AI Act compliance required → Apertus models
- Swiss data sovereignty → All models (Swiss-hosted)
- Maximum transparency → Apertus models
4. Budget Constraints
- Minimal cost → Apertus Swiss LLM - Small
- Balanced → Chat & Document Analysis - Medium
- Premium quality → Reasoning & Problem Solving - Xtra Large
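The framework above can be sketched as a small function. This is one possible reading, not an official selector: the function name, its parameters, and especially the order in which the criteria are checked (compliance first, then reasoning, complexity, and language) are assumptions made for illustration.

```python
def choose_model(
    needs_ai_act_compliance: bool = False,
    needs_deep_reasoning: bool = False,
    complex_task: bool = False,
    multilingual: bool = False,
) -> str:
    """Return a model name following the decision framework.

    ASSUMPTION: compliance outranks every other criterion; the guide does
    not state a priority order.
    """
    if needs_ai_act_compliance:
        # Apertus models are the ones listed as AI Act compliant.
        if complex_task:
            return "Apertus Swiss LLM - Large"
        return "Apertus Swiss LLM - Small"
    if needs_deep_reasoning:
        return "Reasoning & Problem Solving - Xtra Large"
    if complex_task:
        return "Chat & Document Analysis - Xtra Large"
    if multilingual:
        return "Llama 3.3 Multi-lingual - Medium"
    # Balanced default for routine work.
    return "Chat & Document Analysis - Medium"


print(choose_model(needs_ai_act_compliance=True))
print(choose_model(needs_deep_reasoning=True))
print(choose_model())
```

Adjust the order of the checks if, for example, budget matters more to you than reasoning depth.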
Model Comparison Matrix
| Feature | Chat & Doc Medium | Apertus Large | Apertus Small | Reasoning - XL | Llama 3.3 |
|---|---|---|---|---|---|
| Swiss LLM | ❌ | ✅ | ✅ | ❌ | ❌ |
| AI Act Compliant | ⚠️ | ✅ | ✅ | ⚠️ | ⚠️ |
| Multilingual | ✅ | ✅✅ | ✅✅ | ✅ | ✅✅ |
| Reasoning | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Coding | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡⚡⚡ | ⚡⚡ | ⚡⚡⚡ |
| Context Window | 128K | 65K | 32K | 65K | 131K |
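The Context Window row is often the deciding factor for long documents. The sketch below checks which of the five models in the matrix can hold a given request, using the context window and max output figures listed in this guide. Two things are assumptions: that input and output share the context window (common, but not stated here), and the word-to-token ratio of 0.75, which is derived from this guide's own "128,000 tokens (~96,000 words)" figure. All function names are illustrative.

```python
import math

# (context window, max output) in tokens, as listed in this guide.
# Max output is not listed for Chat & Document Analysis - Medium, hence None.
MODEL_LIMITS = {
    "Chat & Document Analysis - Medium": (128_000, None),
    "Apertus Swiss LLM - Large": (65_536, 16_384),
    "Apertus Swiss LLM - Small": (32_768, 8_192),
    "Reasoning & Problem Solving - Xtra Large": (65_536, 16_384),
    "Llama 3.3 Multi-lingual - Medium": (131_072, 8_192),
}

WORDS_PER_TOKEN = 0.75  # rough ratio implied by 128,000 tokens ~ 96,000 words


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count (rounded up)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits(model: str, input_tokens: int, output_tokens: int) -> bool:
    """True if the request stays within the model's listed limits."""
    context_window, max_output = MODEL_LIMITS[model]
    if max_output is not None and output_tokens > max_output:
        return False
    # ASSUMPTION: input and output tokens both count against the window.
    return input_tokens + output_tokens <= context_window


def models_that_fit(input_tokens: int, output_tokens: int) -> list[str]:
    """All models from the matrix that can hold the request."""
    return [m for m in MODEL_LIMITS if fits(m, input_tokens, output_tokens)]


# A 30,000-word contract with a 2,000-token answer.
print(models_that_fit(estimate_tokens(30_000), 2_000))
```

Real token counts depend on the model's tokenizer and on the language of the text, so treat the estimate as a first filter rather than a guarantee.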
🎓 Best Practices
Cost Optimization Strategies
1. Match Model to Task
✅ Good: Use Chat & Document Medium for email responses
❌ Bad: Use Reasoning - XL for simple emails
Savings: 17x cost reduction
2. Optimize Prompt Length
❌ Inefficient: Long, repetitive context
✅ Efficient: Concise, specific prompts
Savings: 30-50% token reduction
3. Batch Similar Tasks
✅ Good: Process 10 documents in one conversation
❌ Bad: Start new chat for each document
Savings: Reduce redundant context
4. Use Smaller Models When Possible
For routine tasks: Apertus Small
For complex analysis: Apertus Large
For deep reasoning: Reasoning - XL
Quality Optimization
When to Use Premium Models:
- Critical business decisions
- Complex technical problems
- High-stakes legal/financial analysis
- Advanced reasoning requirements
When Budget Models Are Sufficient:
- Routine correspondence
- Simple summaries
- Basic translations
- FAQ responses
- Content drafts (can be refined with premium model)
Model Availability
Currently Active Models
Chat & General Purpose:
- Chat & Document Analysis - Medium (Mistral 24B)
- Search, Chat & Analysis - Small (Llama 4 Scout 17B)
- Chat & Document Analysis - Xtra Large (Qwen3-VL 235B)
- Kimi K2 - Chat
Swiss LLM (AI Act Compliant):
- Apertus Swiss LLM - Large (70B)
- Apertus Swiss LLM - Small (8B)
Reasoning & Problem-Solving:
- Reasoning & Problem Solving - Xtra Large (DeepSeek R1 670B)
- Reasoning & Problem Solving - Medium (QwQ 32B)
- Fast Reasoning & Problem Solving - Small (Qwen3 8B)
- Reasoning & Agent tasks - Xtra Large (GPT-OSS 120B)
Vision & Document Analysis:
- Document Analysis - Small (Gemma 12B) - Vision enabled
- Document Analysis - Xtra Small (Granite 2B) - Vision enabled
Multilingual:
- Llama 3.3 Multi-lingual - Medium (70B)
We regularly add new models and update existing ones. Check your dashboard for the latest available models and current pricing.
Related Documentation
Learn More:
- Model Profiles - Detailed model descriptions
- Choosing the Right Model - Interactive decision guide
- Understanding Tokens - How billing works
- Prompt Engineering - Get better results
Optimize Usage:
- Best Practices - When to use AI
- Structured Prompts - Advanced techniques
- Monitoring Usage - Track costs
FAQ
Q: Can I switch models mid-conversation?
A: Yes! You can change models at any time. The conversation context carries over (note: very long contexts may be truncated for smaller models).
Q: Which model is best for Swiss legal documents?
A: Apertus Swiss LLM (Large or Small). Both are AI Act compliant and optimized for Swiss languages.
Q: What's the cheapest way to process 100 documents?
A: Use Apertus Swiss LLM - Small for routine processing, and escalate to Apertus Large or Chat & Document Analysis - Medium for complex analysis. Check the pricing table above for current rates.
Q: Do all models support document upload?
A: Yes, all active models support document analysis. Some models have vision capabilities for image analysis.
Q: How do I track which model costs what?
A: Your usage dashboard shows token usage and costs broken down by model. Navigate to Account → Usage & Billing.
Q: Can I set spending limits per model?
A: Not yet, but you can set overall monthly spending limits. Model-specific limits are on our roadmap.
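The first answer above notes that very long contexts may be truncated when you switch to a smaller model. How Schatzi AI performs that truncation is not documented here. The sketch below shows one common strategy, keeping the most recent messages that fit a token budget, purely to illustrate what may be lost. The function names and the rough word-based token count are assumptions.

```python
import math


def rough_token_count(text: str) -> int:
    """Very rough estimate: about 0.75 words per token, rounded up."""
    return math.ceil(len(text.split()) / 0.75)


def trim_history(messages, token_budget, count_tokens=rough_token_count):
    """Keep the newest messages whose combined token count fits the budget.

    Illustrative only; not necessarily how the platform truncates context.
    """
    kept = []
    used = 0
    # Walk backwards so the most recent messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > token_budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Please summarise the attached contract",
    "Here is a summary of the contract",
    "Now list the termination clauses",
]
# Each message is 5 to 7 words, so a tight budget drops the oldest ones.
print(trim_history(history, token_budget=20))
```

Under a strategy like this, the earliest messages are dropped first, so restate any key instructions after switching to a model with a smaller context window.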
Get Started
Ready to choose your model?
1. Log in to your Schatzi AI account
2. Start a new chat in OpenWebUI
3. Click the model selector at the top
4. Choose the right model for your task
5. Start chatting!
Need help? Contact Support | View Pricing Plans