How I Set Up Ollama With n8n and Brought My AI API Costs to Zero#
In December I opened my OpenAI billing dashboard and saw $54. Not because I'd built something impressive — because my n8n automations had been quietly calling the API thousands of times across several workflows, and the costs had crept up while I wasn't paying attention.
The automations were worth it. But $54/month purely for API calls to classify emails and summarize text felt wrong when I knew local models existed that could do the same thing.
I spent a weekend switching to Ollama. My January OpenAI bill was $0.
This post is the honest version of how that went — what worked immediately, what tripped me up, what I had to compromise on, and whether the quality difference is actually noticeable.
What I Was Using AI For in My Workflows#
Before explaining the switch, it helps to know what I was actually using OpenAI for. My n8n workflows were calling GPT-3.5-turbo for:
- Email classification: Is this email urgent, routine, or can I ignore it? (200-300 tokens per email, runs 30-40 times daily)
- Document type detection: Is this a contract, invoice, receipt, or proposal? (runs on every PDF I receive)
- Lead scoring summaries: Given this form submission, write a 2-sentence summary and suggest a follow-up priority
- Weekly report drafting: Turn a list of completed tasks into natural language paragraphs
None of these required GPT-4. I was using GPT-3.5-turbo for everything, which is cheap — but cheap multiplied by hundreds of daily calls adds up.
The math: roughly 800 API calls per day × 500 average tokens × $0.002 per 1K tokens = ~$0.80/day = ~$24/month. Add some larger calls and occasional GPT-4 tests and you land at $50-60/month.
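The back-of-envelope arithmetic is easy to rerun with your own numbers. A quick sketch (the figures below are my usage estimates; substitute yours):

```python
# Back-of-envelope API cost estimate. These figures are my own
# usage assumptions; plug in your call volume and token counts.
CALLS_PER_DAY = 800
AVG_TOKENS_PER_CALL = 500
PRICE_PER_1K_TOKENS = 0.002  # GPT-3.5-turbo, USD

daily_tokens = CALLS_PER_DAY * AVG_TOKENS_PER_CALL
daily_cost = daily_tokens / 1000 * PRICE_PER_1K_TOKENS
monthly_cost = daily_cost * 30

print(f"${daily_cost:.2f}/day -> ${monthly_cost:.2f}/month")
# -> $0.80/day -> $24.00/month
```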
Installing Ollama#
I have two machines I work from: a MacBook Pro M2 (32GB RAM) and a Windows desktop with an RTX 3080. I installed Ollama on both.
MacBook (Apple Silicon):
# Download from ollama.com or via Homebrew
brew install ollama
# Pull models
ollama pull llama3.1
ollama pull phi3
# Start the service
ollama serve
Ollama automatically uses the GPU (via Metal) on M-series chips. Performance is excellent — Llama 3.1 8B generates around 30-40 tokens per second on an M2. Fast enough to feel instantaneous in a workflow.
Windows with NVIDIA GPU: Download the installer from ollama.com. It detects your GPU automatically and uses CUDA. Same experience, similar performance.
After installation, test it:
# Chat directly
ollama run llama3.1
# Test the API
curl http://localhost:11434/api/generate \
-d '{"model": "llama3.1", "prompt": "Classify this email as urgent or routine: Meeting at 3pm tomorrow?", "stream": false}'
If you see a response, you're ready.
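The same endpoint is just as easy to call from code as from curl. Here's a minimal sketch using only the Python standard library — the request body mirrors the curl call above, and with `"stream": false` the reply is a single JSON object whose `response` field holds the generated text:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt: str, model: str = "llama3.1",
             base_url: str = "http://localhost:11434") -> str:
    """Send a prompt to a local Ollama server and return the response text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_generate_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Classify this email as urgent or routine: Meeting at 3pm tomorrow?"))
```

Swap the base URL for `http://host.docker.internal:11434` if you're calling from inside a container (more on that below).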
The Docker Networking Problem#
My n8n instance runs in Docker. When I set up the Ollama credential in n8n with http://localhost:11434, every AI node failed with a connection error.
The issue is obvious once you know it, but it cost me an hour: inside a Docker container, localhost refers to the container's own loopback address, not the host machine. The Ollama server running on my Mac was invisible to the n8n container.
Fix on Mac/Windows:
Docker Desktop automatically creates a special hostname host.docker.internal that resolves to the host machine's IP. Change your Ollama credential in n8n from:
http://localhost:11434
to:
http://host.docker.internal:11434
That's it. Saved and tested — the connection worked immediately.
Fix on Linux (where host.docker.internal isn't automatic):
Add this to your docker-compose.yml:
services:
  n8n:
    image: n8nio/n8n
    extra_hosts:
      - "host.docker.internal:host-gateway"
    # ... rest of config
Or find your host's Docker bridge IP (172.17.0.1 on most systems) and use that directly.
Setting Up the n8n Credential#
In n8n, go to Settings → Credentials → New. Search for "Ollama".
Fields:
- Base URL: http://host.docker.internal:11434
- Name it: "Local Ollama"
Save. Then in any AI node, select "Ollama" as the provider and "Local Ollama" as the credential.
Replacing Each Workflow — What Happened#
Email Classification#
Before (OpenAI):
Model: gpt-3.5-turbo
Prompt: "Classify this email as urgent/routine/ignore. Return JSON: {category, reason}"
Average response time: 800ms
Cost: ~$0.001 per call
After (Ollama / Llama 3.1 8B):
Model: llama3.1
Same prompt
Average response time: 1.2 seconds (local, M2 Mac)
Cost: $0
Quality difference: Essentially none for this task. Llama 3.1 8B classifies emails correctly about 94-96% of the time in my testing, compared to GPT-3.5-turbo at around 96-97%. The 2% difference means maybe one misclassified email per week. Completely acceptable.
Document Type Detection#
After switch quality: Very good. The model correctly identifies invoice/contract/receipt/proposal 97%+ of the time when I give it the first page of text. Better than I expected, honestly.
One thing I had to change: my prompt for OpenAI could be relatively loose. Local models respond better to more explicit, structured prompts.
OpenAI prompt (loose):
What type of document is this? Invoice, contract, receipt, or proposal?
[document text]
Ollama prompt (structured):
You are a document classification assistant. Analyze the document text below and
identify its type.
IMPORTANT: Respond with ONLY a valid JSON object. No explanation. No markdown.
Format: {"type": "invoice|contract|receipt|proposal|other", "confidence": "high|medium|low"}
Document text:
[document text]
The more explicit instruction to return valid JSON was necessary. GPT-3.5 would usually return JSON without being told. Llama 3.1 8B needed the instruction reinforced. Once I updated my prompts, the outputs were reliable.
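Even with the reinforced prompt, a local model occasionally wraps the JSON in markdown fences or prepends a stray sentence. In n8n I handle this in a Code node; here's the same defensive parse as a Python sketch (the `"other"` fallback is my own convention, not anything Ollama-specific):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Extract the first JSON object from model output, tolerating
    markdown fences and surrounding prose. Falls back to a default."""
    # Strip ```json / ``` fences if the model added them anyway
    cleaned = re.sub(r"```(?:json)?", "", raw)
    # Grab the first {...} span and try to parse it
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return {"type": "other", "confidence": "low"}  # safe default

print(parse_model_json('```json\n{"type": "invoice", "confidence": "high"}\n```'))
# -> {'type': 'invoice', 'confidence': 'high'}
```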
Lead Scoring Summaries#
This one was trickier. The task involves some nuanced judgment — reading a form submission and writing a professional 2-sentence summary that captures the key details and suggests a follow-up priority.
My honest assessment: GPT-3.5 wrote better summaries. The language was more natural, the summaries were more insightful, and it made better judgment calls about priority.
Llama 3.1 8B was fine — the summaries were accurate and useful — but they had a slightly more mechanical feel. For internal workflow use where I'm the only one reading them, it's completely adequate. If these summaries were going to clients, I'd use GPT-4.
I kept this workflow on Ollama but added a "review" flag for any lead scored above a certain threshold, where I personally review the AI summary before using it.
Weekly Report Drafting#
This was the biggest quality gap. Report writing requires fluent prose and the ability to weave a coherent narrative from a list of completed tasks. Llama 3.1 8B produced grammatically correct text that covered all the facts, but it lacked the natural flow that GPT-4 could produce.
For this specific workflow, I switched from Ollama to a compromise: I use Ollama for the data aggregation and structuring step (which is mechanical), and GPT-4 for the final prose generation step (which benefits from the quality difference). This reduced my OpenAI costs by about 80% for this workflow while keeping the output quality high.
My Current Model Setup#
After a month of testing, here's what I actually use:
| Task | Model | Reason |
|---|---|---|
| Email classification | Phi-3 Mini | Fast, accurate enough, very low resource |
| Document detection | Llama 3.1 8B | Good accuracy, handles longer text |
| Data extraction from docs | Llama 3.1 8B | Good JSON output with proper prompts |
| Lead summaries | Llama 3.1 8B | Adequate quality for internal use |
| Report prose writing | GPT-4 (via API) | Quality matters, small volume |
| Research summaries | Llama 3.1 70B (server) | Better reasoning, justifies the resource use |
The 70B model runs on a rented GPU server (Lambda Labs, used only for batch jobs), not my laptop. For occasional heavy tasks it's cost-effective — a few dollars for an hour of GPU time is far less than equivalent GPT-4 API calls.
What My Costs Look Like Now#
Before switch:
- OpenAI API: ~$54/month
- n8n VPS: €3.79/month
- Total: ~$59/month
After switch:
- OpenAI API: ~$4/month (only report prose writing, low volume)
- n8n VPS: €3.79/month
- Lambda Labs (occasional): ~$2/month average
- Total: ~$10/month
Monthly savings: ~$49
Annual savings: ~$588.
Would I Recommend This?#
Yes, with these caveats:
Do it if:
- Your AI workflows run high volumes (100+ calls per day)
- You're processing sensitive data that shouldn't leave your network
- Your tasks are classification, extraction, or structured output generation
- You have decent hardware (16GB RAM minimum, GPU helps a lot)
Think carefully if:
- You need GPT-4-level prose quality in client-facing outputs
- Your hardware is old or underpowered (CPU-only inference is slow)
- You're doing tasks that require broad world knowledge or nuanced reasoning
The middle path (what I do): Use Ollama for the bulk of high-volume, mechanical tasks. Keep a minimal OpenAI subscription for the small percentage of tasks where quality genuinely matters. You get most of the cost savings with minimal quality compromise.
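In practice the middle path is just a routing decision per task. A sketch of the logic (task names and model labels here are illustrative, not an n8n API):

```python
# Illustrative routing for the hybrid setup: high-volume, mechanical
# tasks go to local Ollama; low-volume, quality-sensitive tasks go to
# the OpenAI API. The task names below are made up for this example.
LOCAL_TASKS = {"email_classification", "document_detection",
               "data_extraction", "lead_summary"}
API_TASKS = {"report_prose", "client_facing_writing"}

def choose_backend(task: str) -> str:
    """Pick a model backend for a given automation task."""
    if task in LOCAL_TASKS:
        return "ollama/llama3.1"
    if task in API_TASKS:
        return "openai/gpt-4"
    # Default to local: a wrong-but-free answer is cheap to rerun
    return "ollama/llama3.1"

print(choose_backend("email_classification"))  # -> ollama/llama3.1
print(choose_backend("report_prose"))          # -> openai/gpt-4
```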
Frequently Asked Questions#
Does Ollama work as well as OpenAI GPT-4 for n8n automations?#
For most automation tasks — classification, summarization, data extraction — Llama 3.1 8B performs at roughly GPT-3.5 quality, sufficient for the vast majority of workflow automation. For complex reasoning or nuanced writing, GPT-4 still has an edge. Many users run Ollama for 90% of tasks and keep OpenAI as a fallback for the other 10%.
Can I run Ollama on a VPS server without a GPU?#
Yes. On a Hetzner CX31 (4 vCPU, 8GB RAM, €8/month), Llama 3.1 8B runs at about 3-5 tokens per second — slow but functional for background automations. CPU-only is fine for overnight batch jobs. For real-time workflows, GPU or Apple M-series is much better.
How do I connect n8n in Docker to Ollama on the host machine?#
Use host.docker.internal instead of localhost. In your n8n Ollama credential, set Base URL to http://host.docker.internal:11434. On Linux, add --add-host=host.docker.internal:host-gateway to your Docker run command.
Which Ollama model should I start with for n8n automation?#
Start with Llama 3.1 8B (ollama pull llama3.1). It handles the majority of automation tasks well. If too slow, try Phi-3 Mini. If quality is insufficient for specific tasks, try Mistral 7B or Llama 3.1 70B if your hardware supports it.