How I Set Up Ollama With n8n and Brought My AI API Costs to Zero


$50/month in OpenAI API charges was eating into my automation ROI. I switched to Ollama for local AI and my costs dropped to zero. Here's the full setup and honest tradeoffs.

2026-04-05
10 min read

How I Set Up Ollama With n8n and Brought My AI API Costs to Zero#

In December I opened my OpenAI billing dashboard and saw $54. Not because I'd built something impressive — because my n8n automations had been quietly calling the API thousands of times across several workflows, and the costs had crept up while I wasn't paying attention.

The automations were worth it. But $54/month purely for API calls to classify emails and summarize text felt wrong when I knew local models existed that could do the same thing.

I spent a weekend switching to Ollama. My January OpenAI bill was $0.

This post is the honest version of how that went — what worked immediately, what tripped me up, what I had to compromise on, and whether the quality difference is actually noticeable.


What I Was Using AI For in My Workflows#

Before explaining the switch, it helps to know what I was actually using OpenAI for. My n8n workflows were calling GPT-3.5-turbo for:

  1. Email classification: Is this email urgent, routine, or can I ignore it? (200-300 tokens per email, runs 30-40 times daily)
  2. Document type detection: Is this a contract, invoice, receipt, or proposal? (runs on every PDF I receive)
  3. Lead scoring summaries: Given this form submission, write a 2-sentence summary and suggest a follow-up priority
  4. Weekly report drafting: Turn a list of completed tasks into natural language paragraphs

None of these required GPT-4. I was using GPT-3.5-turbo for everything, which is cheap — but cheap multiplied by hundreds of daily calls adds up.

The math: roughly 800 API calls per day × 500 average tokens × $0.002 per 1K tokens = ~$0.80/day = ~$24/month. Add some larger calls and occasional GPT-4 tests and you land at $50-60/month.
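That estimate is easy to sanity-check in a few lines (the numbers are the rough averages from above, not exact billing data):

```python
# Sanity check of the cost estimate above (rough averages, not exact billing).
calls_per_day = 800
avg_tokens_per_call = 500
price_per_1k_tokens = 0.002   # GPT-3.5-turbo era pricing, blended in/out

daily_cost = calls_per_day * avg_tokens_per_call / 1000 * price_per_1k_tokens
monthly_cost = daily_cost * 30

print(f"${daily_cost:.2f}/day -> ${monthly_cost:.0f}/month")  # $0.80/day -> $24/month
```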


Installing Ollama#

I have two machines I work from: a MacBook Pro M2 (32GB RAM) and a Windows desktop with an RTX 3080. I installed Ollama on both.

MacBook (Apple Silicon):

# Download from ollama.com or via Homebrew
brew install ollama

# Pull models
ollama pull llama3.1
ollama pull phi3

# Start the service
ollama serve

Ollama automatically uses the GPU (via Metal) on M-series chips. Performance is excellent — Llama 3.1 8B generates around 30-40 tokens per second on M2. Fast enough to feel instantaneous in a workflow.

Windows with NVIDIA GPU: Download the installer from ollama.com. It detects your GPU automatically and uses CUDA. Same experience, similar performance.

After installation, test it:

# Chat directly
ollama run llama3.1

# Test the API
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Classify this email as urgent or routine: Meeting at 3pm tomorrow?", "stream": false}'

If you see a response, you're ready.
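If you'd rather script against the API than use curl, the same request looks like this in Python. A standard-library-only sketch — the endpoint and fields match Ollama's /api/generate API, and the model name assumes you pulled llama3.1:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "llama3.1") -> bytes:
    """Build the JSON body for a non-streaming /api/generate request."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object back instead of a token stream
    }).encode()

def generate(prompt: str, base_url: str = "http://localhost:11434") -> str:
    """Send the request to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```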


The Docker Networking Problem#

My n8n instance runs in Docker. When I set up the Ollama credential in n8n with http://localhost:11434, every AI node failed with a connection error.

The issue is obvious once you know it, but it cost me an hour: inside a Docker container, localhost refers to the container's own loopback address, not the host machine. The Ollama server running on my Mac was invisible to the n8n container.

Fix on Mac/Windows:

Docker Desktop automatically creates a special hostname host.docker.internal that resolves to the host machine's IP. Change your Ollama credential in n8n from:

http://localhost:11434

to:

http://host.docker.internal:11434

That's it. Saved and tested — the connection worked immediately.

Fix on Linux (where host.docker.internal isn't automatic):

Add this to your docker-compose.yml:

services:
  n8n:
    image: n8nio/n8n
    extra_hosts:
      - "host.docker.internal:host-gateway"
    # ... rest of config

Or find your host's Docker bridge IP (172.17.0.1 on most systems) and use that directly.
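A quick way to confirm the fix is to probe Ollama's /api/tags endpoint from inside the container. This is my own helper sketch, not part of n8n — the same check works as a one-line curl:

```python
import urllib.request

def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on /api/tags at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout):
            return True
    except OSError:  # connection refused, DNS failure, timeout
        return False

# From inside the n8n container, this should be True once the fix is in:
# ollama_reachable("http://host.docker.internal:11434")
```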


Setting Up the n8n Credential#

In n8n, go to Settings → Credentials → New. Search for "Ollama".

Fields:

  • Base URL: http://host.docker.internal:11434
  • Name it: "Local Ollama"

Save. Then in any AI node, select "Ollama" as the provider and "Local Ollama" as the credential.


Replacing Each Workflow — What Happened#

Email Classification#

Before (OpenAI):

Model: gpt-3.5-turbo
Prompt: "Classify this email as urgent/routine/ignore. Return JSON: {category, reason}"
Average response time: 800ms
Cost: ~$0.001 per call

After (Ollama / Llama 3.1 8B):

Model: llama3.1
Same prompt
Average response time: 1.2 seconds (local, M2 Mac)
Cost: $0

Quality difference: Essentially none for this task. Llama 3.1 8B classifies emails correctly about 94-96% of the time in my testing, compared to GPT-3.5-turbo at around 96-97%. The 2% difference means maybe one misclassified email per week. Completely acceptable.

Document Type Detection#

After switch quality: Very good. The model correctly identifies invoice/contract/receipt/proposal 97%+ of the time when I give it the first page of text. Better than I expected, honestly.

One thing I had to change: my prompt for OpenAI could be relatively loose. Local models respond better to more explicit, structured prompts.

OpenAI prompt (loose):

What type of document is this? Invoice, contract, receipt, or proposal?
[document text]

Ollama prompt (structured):

You are a document classification assistant. Analyze the document text below and 
identify its type.

IMPORTANT: Respond with ONLY a valid JSON object. No explanation. No markdown.

Format: {"type": "invoice|contract|receipt|proposal|other", "confidence": "high|medium|low"}

Document text:
[document text]

The more explicit instruction to return valid JSON was necessary. GPT-3.5 would usually return JSON without being told. Llama 3.1 8B needed the instruction reinforced. Once I updated my prompts, the outputs were reliable.

Lead Scoring Summaries#

This one was trickier. The task involves some nuanced judgment — reading a form submission and writing a professional 2-sentence summary that captures the key details and suggests a follow-up priority.

My honest assessment: GPT-3.5 wrote better summaries. The language was more natural, the summaries were more insightful, and it made better judgment calls about priority.

Llama 3.1 8B was fine — the summaries were accurate and useful — but they had a slightly more mechanical feel. For internal workflow use where I'm the only one reading them, it's completely adequate. If these summaries were going to clients, I'd use GPT-4.

I kept this workflow on Ollama but added a "review" flag for any lead scored above a certain threshold, where I personally review the AI summary before using it.

Weekly Report Drafting#

This was the biggest quality gap. Report writing requires fluent prose and the ability to weave a coherent narrative from a list of completed tasks. Llama 3.1 8B produced grammatically correct text that covered all the facts, but it lacked the natural flow that GPT-4 could produce.

For this specific workflow, I switched from Ollama to a compromise: I use Ollama for the data aggregation and structuring step (which is mechanical), and GPT-4 for the final prose generation step (which benefits from the quality difference). This reduced my OpenAI costs by about 80% for this workflow while keeping the output quality high.
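The routing itself is trivial. A sketch of the idea with illustrative task names (the actual split lives in my n8n workflow, not in code):

```python
# Mechanical, high-volume steps stay local; low-volume prose goes to OpenAI.
LOCAL_TASKS = {"classify_email", "detect_doc_type", "extract_fields", "aggregate_tasks"}
CLOUD_TASKS = {"write_report_prose"}

def pick_provider(task: str) -> str:
    """Map a workflow step to a model. Defaults to free local inference."""
    if task in CLOUD_TASKS:
        return "openai/gpt-4"
    return "ollama/llama3.1"
```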


My Current Model Setup#

After a month of testing, here's what I actually use:

| Task | Model | Reason |
|---|---|---|
| Email classification | Phi-3 Mini | Fast, accurate enough, very low resource |
| Document detection | Llama 3.1 8B | Good accuracy, handles longer text |
| Data extraction from docs | Llama 3.1 8B | Good JSON output with proper prompts |
| Lead summaries | Llama 3.1 8B | Adequate quality for internal use |
| Report prose writing | GPT-4 (via API) | Quality matters, small volume |
| Research summaries | Llama 3.1 70B (server) | Better reasoning, justifies the resource use |

The 70B model runs on a rented GPU server (Lambda Labs, used only for batch jobs), not my laptop. For occasional heavy tasks it's cost-effective — a few dollars for an hour of GPU time is far less than equivalent GPT-4 API calls.


What My Costs Look Like Now#

Before switch:

  • OpenAI API: ~$54/month
  • n8n VPS: €3.79/month
  • Total: ~$59/month

After switch:

  • OpenAI API: ~$4/month (only report prose writing, low volume)
  • n8n VPS: €3.79/month
  • Lambda Labs (occasional): ~$2/month average
  • Total: ~$10/month

Monthly savings: ~$49

Annual savings: ~$588.


Would I Recommend This?#

Yes, with these caveats:

Do it if:

  • Your AI workflows run high volumes (100+ calls per day)
  • You're processing sensitive data that shouldn't leave your network
  • Your tasks are classification, extraction, or structured output generation
  • You have decent hardware (16GB RAM minimum, GPU helps a lot)

Think carefully if:

  • You need GPT-4-level prose quality in client-facing outputs
  • Your hardware is old or underpowered (CPU-only inference is slow)
  • You're doing tasks that require broad world knowledge or nuanced reasoning

The middle path (what I do): Use Ollama for the bulk of high-volume, mechanical tasks. Keep a minimal OpenAI subscription for the small percentage of tasks where quality genuinely matters. You get most of the cost savings with minimal quality compromise.


Frequently Asked Questions#

Does Ollama work as well as OpenAI GPT-4 for n8n automations?#

For most automation tasks — classification, summarization, data extraction — Llama 3.1 8B performs at roughly GPT-3.5 quality, sufficient for the vast majority of workflow automation. For complex reasoning or nuanced writing, GPT-4 still has an edge. Many users run Ollama for 90% of tasks and keep OpenAI as a fallback for the other 10%.

Can I run Ollama on a VPS server without a GPU?#

Yes. On a Hetzner CX31 (4 vCPU, 8GB RAM, €8/month), Llama 3.1 8B runs at about 3-5 tokens per second — slow but functional for background automations. CPU-only is fine for overnight batch jobs. For real-time workflows, GPU or Apple M-series is much better.
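To put those numbers in perspective, here is the rough latency math for a typical classification reply at CPU-only speeds:

```python
tokens_per_second = 4    # midpoint of the 3-5 tok/s range quoted above
reply_tokens = 100       # a short classification answer with a JSON object

seconds_per_call = reply_tokens / tokens_per_second
print(seconds_per_call)  # 25.0 -- fine for overnight batches, too slow for real time
```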

How do I connect n8n in Docker to Ollama on the host machine?#

Use host.docker.internal instead of localhost. In your n8n Ollama credential, set Base URL to http://host.docker.internal:11434. On Linux, add --add-host=host.docker.internal:host-gateway to your Docker run command.

Which Ollama model should I start with for n8n automation?#

Start with Llama 3.1 8B (ollama pull llama3.1). It handles the majority of automation tasks well. If too slow, try Phi-3 Mini. If quality is insufficient for specific tasks, try Mistral 7B or Llama 3.1 70B if your hardware supports it.
