Model Comparison

Schatzi AI provides access to 13+ active state-of-the-art AI models from leading providers. All models run on Swiss infrastructure, ensuring your data never leaves Switzerland.

This guide helps you choose the right model for your specific needs, understand pricing, and optimize costs.

Quick Model Selector

Choose by use case:

Your Need	Recommended Model
Daily chat & email	Chat & Document Analysis - Medium
Document analysis	Chat & Document Analysis - Medium
Swiss compliance required	Apertus Swiss LLM
Complex reasoning	Reasoning & Problem Solving - Xtra Large
Web search & research	Search, Chat & Analysis - Small
Multilingual chat	Llama 3.3 Multi-lingual - Medium
Budget-conscious	Apertus Swiss LLM - Small

Estimated for typical professional usage on the Basic plan.

Complete Model Reference

Chat & General Purpose Models

Chat & Document Analysis - Medium

Parameters: 24 Billion
Context Window: 128,000 tokens (~96,000 words)

Capabilities:

Versatile multimodal model
Vision and image analysis
Conversational agents
Strong contextual understanding
All major European languages

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Daily business communications
Email drafting and responses
General document analysis
Customer service responses
Quick content generation

Search, Chat & Analysis - Small

Parameters: ~17 Billion active (Llama 4 Scout, Mixture of Experts)
Context Window: Variable

Capabilities:

Optimized for web search and chat
Suited to creative work and content creation, including storytelling
Web search integration

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Web search and research tasks
Content creation and storytelling
Quick information retrieval

Chat & Document Analysis - Xtra Large

Parameters: 235 Billion (Mixture of Experts - 22B active)
Context Window: 128,000 tokens (~96,000 words)

Capabilities:

Very large-scale model, rivaling GPT-4 or Claude 3 Opus across a broad range of complex tasks
Advanced multilingual capabilities
Optional reasoning mode that adapts response depth to the context and complexity of each query
Excellent document analysis

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Complex document analysis
High-quality content generation
Advanced multilingual tasks
Tasks requiring GPT-4 level performance
Dynamic reasoning tasks

Why Choose This Model:

Premium performance at competitive pricing
Advanced reasoning capabilities
Multilingual excellence

Swiss LLM Models (🇨🇭 AI Act Compliant)

Apertus Swiss LLM - Large (70B)

Parameters: 70 Billion
Context Window: 65,536 tokens (~49,000 words)
Max Output: 16,384 tokens

Capabilities:

Fully documented and transparent
AI Act compliant
Respectful of privacy and intellectual property
Performance on par with market leaders
Optimized for German, French, Italian, English

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Government agencies
Swiss financial services
R&D teams requiring transparency
Multilingual European services
Compliance-heavy industries

Why Choose Apertus:

Swiss-developed and Swiss-hosted
Fully AI Act compliant
Complete transparency and documentation
Privacy-first architecture
Multilingual European focus

Apertus Swiss LLM - Small (8B)

Parameters: 8 Billion
Context Window: 32,768 tokens (~24,500 words)
Max Output: 8,192 tokens

Capabilities:

Optimized for multilingual dialogue
Fast response times
Efficient for routine tasks
Same compliance as Large model

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

High-volume, routine conversations
Budget-conscious Swiss compliance needs
Quick responses with Swiss data protection
Multilingual customer support

Cost Comparison:

Much cheaper than Apertus Large
Perfect for high-frequency, simple tasks

Reasoning & Problem-Solving Models

Reasoning & Problem Solving - Xtra Large

Parameters: 671 Billion
Context Window: 65,536 tokens
Max Output: 16,384 tokens

Capabilities:

Advanced reasoning in chat completions
Complex problem-solving
Multi-step logical analysis
Strategic planning
Deep technical understanding

Pricing:

Input: ... per million tokens
Output: ... per million tokens
Note: Premium pricing for advanced reasoning

Best For:

Complex strategic analysis
Advanced research tasks
Multi-step problem decomposition
Technical architecture decisions
Financial modeling and forecasting

When to Use:

Tasks requiring deep reasoning
When accuracy is critical
Complex business decisions
Advanced technical problems

Premium Model

This is our most expensive model. Use for complex reasoning tasks where the quality justifies the higher cost.

Reasoning & Problem Solving - Medium

Parameters: 32 Billion
Context Window: 32,768 tokens (~24,500 words)
Max Output: 8,192 tokens

Capabilities:

Optimized for thinking and reasoning
Strong problem-solving abilities
Cost-effective reasoning model
Multi-step analysis

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Mid-complexity reasoning tasks
Problem-solving at lower cost than XL
Logical analysis
Structured thinking tasks

Cost Comparison:

Much cheaper than Reasoning - XL
Better reasoning than general chat models
Balanced performance-to-cost ratio

Fast Reasoning & Problem Solving - Small

Parameters: 8 Billion
Context Window: 32,768 tokens (~24,500 words)
Max Output: 8,192 tokens

Capabilities:

Optimized for thinking and reasoning
Fast response times
Efficient for quick problem-solving
Function calling support

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

High-volume reasoning tasks
Quick logical analysis
Budget-conscious problem-solving
Rapid iteration on solutions

Cost Comparison:

Most affordable reasoning model
Much cheaper than Reasoning - XL
Perfect for high-frequency reasoning tasks

Reasoning & Agent tasks - Xtra Large

Parameters: 120 Billion
Context Window: 32,768 tokens (~24,500 words)
Max Output: 8,192 tokens

Capabilities:

Optimized for powerful reasoning
Agentic task execution
Versatile developer use cases
Function calling support
Data analysis capabilities

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Agentic workflows
Automated task execution
Complex data analysis
Developer tools and automation
Multi-step reasoning at lower cost

Why Choose This Model:

Excellent value for advanced reasoning
Strong agent capabilities
Function calling for automation
More affordable than premium reasoning models

Kimi K2 - Chat

Context Window: Variable

Capabilities:

Optimized for multilingual dialogue use cases
Strong conversational abilities

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

General chat and conversation
Multilingual dialogue

Vision & Document Analysis Models

Document Analysis - Small

Parameters: 12 Billion
Context Window: 32,768 tokens (~24,500 words)
Max Output: 8,192 tokens
Vision: ✅ Supports image analysis

Capabilities:

Optimized for handling text and image input
Multimodal - processes both text and images
Document analysis with visual elements
Chart and diagram interpretation
Screenshot analysis

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Document analysis with images
PDF processing with charts/diagrams
Screenshot interpretation
Visual content analysis
Forms and invoice processing

Document Analysis - Xtra Small

Parameters: 2 Billion
Context Window: 16,384 tokens (~12,000 words)
Max Output: 4,096 tokens
Vision: ✅ Supports image analysis

Capabilities:

Compact and efficient vision-language model
Fast processing of images and text
Budget-friendly multimodal option
Quick visual analysis

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

High-volume image processing
Quick visual QA
Simple document scans
Receipt and form analysis
Budget-conscious vision tasks

Cost Comparison:

Most affordable vision model
Perfect for simple visual tasks at scale

Multilingual Models

Llama 3.3 Multi-lingual - Medium

Parameters: 70 Billion
Context Window: 131,072 tokens (~98,000 words)
Max Output: 8,192 tokens

Capabilities:

Optimized for multilingual dialogue
Strong European language support
Natural conversation flow
Cultural context awareness

Pricing:

Input: ... per million tokens
Output: ... per million tokens

Best For:

Multilingual customer support
Cross-border communication
Translation with context
International business

Supported Languages:

German, French, Italian (Swiss variants)
English, Spanish, Portuguese
Dutch, Polish, Czech
And 50+ more languages

Complete Pricing Overview

All Active Models - Price Comparison

Prices are loaded dynamically on the live page. Check your dashboard for current per-model rates.

Best Value Models

Overall Best Value:

Chat & Document Analysis - Xtra Large
- GPT-4 level performance at competitive pricing
- Great for complex document analysis

Best Budget Option:

Fast Reasoning & Problem Solving - Small
- Most affordable reasoning model
- Perfect for high-frequency use

Best Swiss Compliance:

Apertus Swiss LLM - Small
- Full AI Act compliance
- More affordable than Apertus Large

Best Vision Model:

Document Analysis - Xtra Small
- Most affordable vision capabilities
- Perfect for document scanning at scale

Cost Estimation

Typical Task Types

Task Type	Token Usage	Recommended Model
Email response	500 input + 300 output	Chat & Document Analysis - Medium
10-page document summary	10K input + 1K output	Chat & Document Analysis - Medium
Contract analysis (30 pages)	30K input + 2K output	Apertus Swiss LLM - Large
Complex reasoning task	5K input + 3K output	Reasoning & Problem Solving - Xtra Large
Multilingual chat (hour)	15K input + 10K output	Llama 3.3 Multi-lingual - Medium
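The token counts in the table above turn into a cost once the per-million-token rates are known. The following is a minimal sketch of that arithmetic. The function name `estimate_cost` and the example rates are hypothetical placeholders, not Schatzi AI prices; substitute the current rates from your dashboard.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_million: float,
                  output_rate_per_million: float) -> float:
    """Return the cost of one task, given per-million-token rates."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


# Token usage (input, output) taken from the task table above.
TYPICAL_TASKS = {
    "Email response": (500, 300),
    "10-page document summary": (10_000, 1_000),
    "Contract analysis (30 pages)": (30_000, 2_000),
    "Complex reasoning task": (5_000, 3_000),
    "Multilingual chat (hour)": (15_000, 10_000),
}

if __name__ == "__main__":
    # PLACEHOLDER rates for illustration only; they are not real prices.
    example_input_rate = 1.0
    example_output_rate = 2.0
    for task, (tokens_in, tokens_out) in TYPICAL_TASKS.items():
        cost = estimate_cost(tokens_in, tokens_out,
                             example_input_rate, example_output_rate)
        print(f"{task}: {cost:.4f}")
```

Because input and output are billed at separate rates, tasks with long outputs (such as an hour of multilingual chat) can cost more than their input size alone suggests.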

Cost Optimization

Choose the right model for each task. Use smaller, cheaper models for routine work and reserve premium models for complex analysis. See the pricing table above for current rates.

Choosing the Right Model

Decision Framework

1. Task Complexity

Simple/Routine → Apertus Small, Chat & Document Medium
Complex/Technical → Apertus Large, Chat & Document Xtra Large
Advanced Reasoning → Reasoning - XL

2. Language Requirements

German/French/Italian primary → Apertus models
Multilingual → Llama 3.3
English-focused → Chat & Document Medium

3. Compliance Needs

AI Act compliance required → Apertus models
Swiss data sovereignty → All models (Swiss-hosted)
Maximum transparency → Apertus models

4. Budget Constraints

Minimal cost → Apertus Swiss LLM - Small
Balanced → Chat & Document Analysis - Medium
Premium quality → Reasoning & Problem Solving - Xtra Large
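The framework above can be sketched as a small lookup function. This is an illustration only: the function `recommend_model`, its parameters, and in particular the priority order (compliance first, then reasoning, then language, then general complexity) are assumptions made for the example, since the framework itself does not say how to resolve competing criteria.

```python
def recommend_model(*, ai_act_required: bool = False,
                    complexity: str = "simple",
                    multilingual: bool = False) -> str:
    """Suggest a model name from the decision framework.

    complexity is one of "simple", "complex", or "reasoning".
    The order of the checks below is an assumed priority, not an official rule.
    """
    if complexity not in {"simple", "complex", "reasoning"}:
        raise ValueError(f"unknown complexity: {complexity!r}")

    # Compliance needs: AI Act compliance narrows the choice to Apertus models.
    if ai_act_required:
        if complexity == "simple":
            return "Apertus Swiss LLM - Small"
        return "Apertus Swiss LLM - Large"

    # Task complexity: advanced reasoning goes to the reasoning model.
    if complexity == "reasoning":
        return "Reasoning & Problem Solving - Xtra Large"

    # Language requirements: multilingual work goes to Llama 3.3.
    if multilingual:
        return "Llama 3.3 Multi-lingual - Medium"

    # Remaining cases: general chat and document models.
    if complexity == "complex":
        return "Chat & Document Analysis - Xtra Large"
    return "Chat & Document Analysis - Medium"


if __name__ == "__main__":
    print(recommend_model(ai_act_required=True, complexity="complex"))
    print(recommend_model(multilingual=True))
```

Putting the compliance check first reflects the idea that a regulatory requirement is a hard constraint, while complexity and language are preferences; reorder the checks if your priorities differ.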

Model Comparison Matrix

Feature	Chat & Doc Medium	Apertus Large	Apertus Small	Reasoning - XL	Llama 3.3
Swiss LLM	❌	✅	✅	❌	❌
AI Act Compliant	⚠️	✅	✅	⚠️	⚠️
Multilingual	✅	✅✅	✅✅	✅	✅✅
Reasoning	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Coding	⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Speed	⚡⚡⚡	⚡⚡	⚡⚡⚡⚡	⚡⚡	⚡⚡⚡
Context Window	128K	65K	32K	65K	131K
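The context window row is often the deciding factor for long documents. The sketch below checks which of the models in the matrix could hold a document of a given word count. It uses the context window sizes listed in this guide and the rough ratio of 0.75 words per token implied by its figures (128,000 tokens ≈ 96,000 words). The helper names and the `reserved_output_tokens` parameter are hypothetical, and the example assumes that the reply shares the same window as the input.

```python
import math

# Context window sizes in tokens, as listed in this guide.
CONTEXT_WINDOW_TOKENS = {
    "Chat & Document Analysis - Medium": 128_000,
    "Apertus Swiss LLM - Large": 65_536,
    "Apertus Swiss LLM - Small": 32_768,
    "Reasoning & Problem Solving - Xtra Large": 65_536,
    "Llama 3.3 Multi-lingual - Medium": 131_072,
}

# Rough ratio implied by the guide; real tokenizers vary by language and text.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Approximate the token count for a text of word_count words."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def models_that_fit(word_count: int, reserved_output_tokens: int = 0) -> list[str]:
    """List models whose context window holds the text plus reserved output."""
    needed = estimate_tokens(word_count) + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOW_TOKENS.items()
            if window >= needed]


if __name__ == "__main__":
    # A 60,000-word document needs roughly 80,000 tokens.
    for name in models_that_fit(60_000, reserved_output_tokens=2_000):
        print(name)
```

Treat the result as a first filter only: the estimate is approximate, so leave a margin rather than filling a window to its limit.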

🎓 Best Practices

Cost Optimization Strategies

1. Match Model to Task

✅ Good: Use Chat & Document Medium for email responses
❌ Bad: Use Reasoning - XL for simple emails
Savings: 17x cost reduction

2. Optimize Prompt Length

❌ Inefficient: Long, repetitive context
✅ Efficient: Concise, specific prompts
Savings: 30-50% token reduction

3. Batch Similar Tasks

✅ Good: Process 10 documents in one conversation
❌ Bad: Start new chat for each document
Savings: Reduce redundant context

4. Use Smaller Models When Possible

For routine tasks: Apertus Small
For complex analysis: Apertus Large
For deep reasoning: Reasoning - XL

Quality Optimization

When to Use Premium Models:

Critical business decisions
Complex technical problems
High-stakes legal/financial analysis
Advanced reasoning requirements

When Budget Models Are Sufficient:

Routine correspondence
Simple summaries
Basic translations
FAQ responses
Content drafts (can be refined with premium model)

Model Availability

Currently Active Models

Chat & General Purpose:

Chat & Document Analysis - Medium (Mistral 24B)
Search, Chat & Analysis - Small (Llama 4 Scout 17B)
Chat & Document Analysis - Xtra Large (Qwen3-VL 235B)
Kimi K2 - Chat

Swiss LLM (AI Act Compliant):

Apertus Swiss LLM - Large (70B)
Apertus Swiss LLM - Small (8B)

Reasoning & Problem-Solving:

Reasoning & Problem Solving - Xtra Large (DeepSeek R1 671B)
Reasoning & Problem Solving - Medium (QwQ 32B)
Fast Reasoning & Problem Solving - Small (Qwen3 8B)
Reasoning & Agent tasks - Xtra Large (GPT-OSS 120B)

Vision & Document Analysis:

Document Analysis - Small (Gemma 12B) - Vision enabled
Document Analysis - Xtra Small (Granite 2B) - Vision enabled

Multilingual:

Llama 3.3 Multi-lingual - Medium (70B)

Model Updates

We regularly add new models and update existing ones. Check your dashboard for the latest available models and current pricing.

Related Documentation

Learn More:

Model Profiles - Detailed model descriptions
Choosing the Right Model - Interactive decision guide
Understanding Tokens - How billing works
Prompt Engineering - Get better results

Optimize Usage:

Best Practices - When to use AI
Structured Prompts - Advanced techniques
Monitoring Usage - Track costs

FAQ

Q: Can I switch models mid-conversation?
A: Yes! You can change models at any time. The conversation context carries over (note: very long contexts may be truncated for smaller models).

Q: Which model is best for Swiss legal documents?
A: Apertus Swiss LLM (Large or Small) - they're AI Act compliant and optimized for Swiss languages.

Q: What's the cheapest way to process 100 documents?
A: Use Apertus Swiss LLM - Small for routine processing, escalate to Apertus Large or Chat & Document Analysis - Medium for complex analysis. Check the pricing table above for current rates.

Q: Do all models support document upload?
A: Yes, all active models support document analysis. Some models have vision capabilities for image analysis.

Q: How do I track which model costs what?
A: Your usage dashboard shows token usage and costs broken down by model. Navigate to Account → Usage & Billing.

Q: Can I set spending limits per model?
A: Not yet, but you can set overall monthly spending limits. Model-specific limits are on our roadmap.

Get Started

Ready to choose your model?

1. Log in to your Schatzi AI account
2. Start a new chat in OpenWebUI
3. Click the model selector at the top
4. Choose the right model for your task
5. Start chatting!

Need help? Contact Support | View Pricing Plans
