AI Coding Assistant
👉 Enable vibe coding and an AI assistant in your VS Code / IntelliJ IDE using Ollama LLM models.
👉 A step-by-step guide to connecting VS Code and IntelliJ with Ollama models for seamless local code generation and pair programming.
Table of Contents
- Table of Contents
- Overview
- Prerequisites
- Step 1: Install Continue Extension
- Step 2: Install and Start Ollama
- Step 3: Download Local Models
- Step 4: Configure Continue with Ollama
- Step 5: Select and Use Your Models
- Model Recommendations
- Tips for Performance
- Troubleshooting
- Next Steps
Overview
Welcome to this fully local, private, and offline AI coding assistant setup! Using the Continue extension in VS Code paired with Ollama for lightweight LLMs, you’ll get autocomplete, chat, editing, and more, all without cloud dependencies. Perfect for developers prioritizing speed, privacy, and customization.
This guide is battle-tested for macOS, Linux, and Windows. Expect setup in ~15-30 minutes, depending on model downloads.
Prerequisites
Before diving in, ensure your setup meets these basics. Use this checklist for a smooth ride! ✅
| Requirement | Details | Why It Matters |
|---|---|---|
| RAM | 8GB minimum (16GB+ recommended) | Handles model inference |
| Storage | 10GB+ free | For model files (e.g., 4-8GB each) |
| OS | macOS, Linux, or Windows | Ollama compatibility |
| Terminal Access | Basic command-line skills | For Ollama setup |
| VS Code | Latest version installed | Core IDE for the extension |
💡 Pro Tip: If you’re on a laptop, plug in for downloads; models can be hefty!
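Unsure whether your machine clears the bar? A quick sketch for checking RAM and disk from the terminal (Linux and macOS; on Windows, Task Manager shows the same information):

```bash
# Linux: available RAM
free -h

# macOS: total RAM in bytes
sysctl hw.memsize

# Both: free disk space in your home directory (Ollama stores models under ~/.ollama by default)
df -h ~
```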
Step 1: Install Continue Extension
Get the AI brains into VS Code.
- Install the latest VS Code.
- Open VS Code.
- Hit Ctrl+Shift+X (Windows/Linux) or Cmd+Shift+X (macOS) to open Extensions.
- Search for Continue.
- Install the official extension from Continue.dev.
- Reload VS Code after install.
- You’ll see a new Continue icon in the activity bar (left sidebar).
💡 Follow the same steps to install the Continue plugin in IntelliJ and configure your Ollama models there.
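If you prefer the terminal, VS Code’s code CLI can install extensions directly. A minimal sketch, assuming the extension ID is Continue.continue (verify it on the extension’s marketplace page):

```bash
# Install the Continue extension from the command line
# (extension ID assumed to be Continue.continue; check the marketplace page if this fails)
code --install-extension Continue.continue

# Confirm the extension is installed
code --list-extensions | grep -i continue
```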
Step 2: Install and Start Ollama
Ollama is your local LLM server—think of it as a lightweight OpenAI API alternative.
Installation Commands
Run these in your terminal:
| OS | Command |
|---|---|
| macOS | brew install ollama |
| Linux | curl -fsSL https://ollama.com/install.sh \| sh |
| Windows | Download installer from ollama.com |
Verify and Start
```bash
# Check version
ollama --version

# Start the server (runs in background)
ollama serve

# Test connection (should echo "Ollama is running")
curl http://localhost:11434
```
⚠️ Warning: Keep the terminal open, or run ollama serve as a service (e.g., via systemd on Linux) for persistent use.
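Once the server responds, you can also confirm that it can actually generate tokens. A minimal sketch using Ollama’s HTTP API, assuming you have already pulled llama3.2:3b from the next step:

```bash
# End-to-end test of the Ollama API; "stream": false returns a single JSON object
# instead of a token stream. Assumes llama3.2:3b is already pulled (see Step 3).
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Write a one-line Python hello world.",
  "stream": false
}'
```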
Step 3: Download Local Models
Pull models from Ollama’s library. Start small for testing!
Key Commands
```bash
# General coding (balanced)
ollama pull llama3.2:3b

# Code-focused autocomplete
ollama pull qwen2.5-coder:1.5b-base

# Embeddings for search (RAG)
ollama pull nomic-embed-text:latest

# List all installed models
ollama list
```
🔍 Quick Note: Use ollama run <model_name> for interactive testing; it first pulls the model locally if it isn’t already downloaded. Tags like :1b and :3b specify the model size. Stick to 1.5B-8B models for everyday hardware.
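Before settling on a model, it can help to inspect it and to clean up ones you no longer need. A small sketch using standard Ollama commands:

```bash
# Inspect a model's parameters, context length, and template before committing to it
ollama show qwen2.5-coder:1.5b-base

# Remove a model you no longer use to reclaim disk space
ollama rm llama3.2:3b
```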
Step 4: Configure Continue with Ollama
Continue uses ~/.continue/config.yaml (YAML for readability).
- Open it via the Continue sidebar > settings gear > “Open config.yaml”.
Base Config Example
Here’s a starter config with role-separated models for efficiency:
```yaml
# ~/.continue/config.yaml
name: Local Config
version: 1.0.0
schema: v1

models:
  # Chat, edit, and apply changes (default for interactions)
  - name: Llama 3.2 3B
    provider: ollama
    model: llama3.2:3b
    roles:
      - chat
      - edit
      - apply
    default: true
    completionOptions:
      temperature: 0.2   # Low for consistent code
      maxTokens: 2048    # Balanced length

  # Lightweight autocomplete
  - name: Qwen2.5-Coder 1.5B
    provider: ollama
    model: qwen2.5-coder:1.5b-base
    # Note: comment out the autocomplete role below if you hit performance issues.
    # When enabled, it keeps the configured model running, which can consume a lot
    # of CPU and RAM, especially on machines with limited resources.
    roles:
      - autocomplete
    completionOptions:
      temperature: 0.1   # Focused predictions
      topP: 0.9

  # Embeddings for @codebase searches
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text:latest
    roles:
      - embed

  # Auto-detect new models
  - name: Autodetect
    provider: ollama
    model: AUTODETECT

# Global tweaks
contextProviders:
  - name: code
    params:
      nRetrieve: 5          # Fast searches
      useReranking: false

tabAutocompleteOptions:
  enabled: true
  model: "Qwen2.5-Coder 1.5B"
```
Apply Changes
- Save the file.
- Reload: Cmd/Ctrl+Shift+P > “Continue: Reload Config”.
- Continue auto-detects Ollama at http://localhost:11434 (you can verify what the server exposes with the sketch below).
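To confirm that every model named in config.yaml is actually available on your Ollama instance, ask the server what it has. A quick sketch against Ollama’s standard API:

```bash
# List the models the local Ollama server exposes; the names must match
# the "model:" fields in config.yaml exactly, including the tag.
curl http://localhost:11434/api/tags
```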
🎨 UX Enhancement: Use VS Code’s YAML extension for syntax highlighting and validation; install it via the Extensions marketplace.
Step 5: Select and Use Your Models
Time to code with AI!
- Open the Continue sidebar in your IDE.
- Select Model: From the dropdown at the top, pick “Llama 3.2 3B” (or your default).
- Core Features:
  - Chat: Highlight code > Cmd/Ctrl+L > ask “Refactor this?”
  - Autocomplete: As you type, code suggestions pop up (Tab to accept).
  - Edit: Select code > Cmd/Ctrl+I > “Add error handling”.
  - Search: In chat, type @codebase What does utils.py do?
- Agent Mode: For tools, add "capabilities": ["tool_use"] to a model’s config.
💫 Pro Tip: Pin your favorite model to the dropdown for one-click access.
Model Recommendations
Tailor to your workflow. Smaller = faster on CPU.
| Use Case | Model Suggestion | RAM Estimate | Speed Rating |
|---|---|---|---|
| Code Completion | qwen2.5-coder:1.5b-base | 4GB | ⚡ Fast |
| General Coding | llama3.2:3b | 8GB | 🚀 Quick |
| Advanced Tasks | codellama:7b | 8-16GB | 🐌 Balanced |
| Heavy Reasoning | mistral:7b | 16GB+ | 🐢 Thoughtful |
Run ollama pull <model> to add more. For a GPU boost (NVIDIA/AMD), Ollama auto-detects your GPU; monitor usage with nvidia-smi.
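To check whether a model is actually running on the GPU (and how much memory it takes), a couple of quick checks help; the nvidia-smi part applies to NVIDIA cards only:

```bash
# Show which models are loaded and whether they run on CPU or GPU
ollama ps

# Watch GPU memory and utilization while generating (NVIDIA only)
watch -n 1 nvidia-smi
```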
Tips for Performance
- Monitor Loads: Run ollama ps in a terminal.
- Tune Context: Lower maxTokens for speed.
- GPU Tweaks: In a custom Modelfile, set PARAMETER num_gpu 35.
- Remote Ollama: Set "apiBase": "http://your-server:11434" in the config (see the server-side sketch after this list).
  - If that URL doesn’t work, try http://your-server:11434/v1.
- Backup Config: Git-track ~/.continue/ for version control.
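For the remote setup, the Ollama server has to listen on more than localhost. A minimal sketch, assuming a reachable host you control (your-server is a placeholder):

```bash
# On the remote machine: bind Ollama to all interfaces instead of 127.0.0.1 only
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From your workstation: confirm the server is reachable before pointing Continue at it
curl http://your-server:11434
```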
🌟 Enhancement: Integrate with VS Code themes—Continue respects your dark/light mode for a seamless look.
Troubleshooting
Hit a snag? Quick fixes:
| Issue | Solution |
|---|---|
| Model 404 Error | ollama pull <exact-tag>; match config precisely. |
| Connection Failed | Restart ollama serve; check port 11434 (netstat -an \| grep 11434). |
| Slow Autocomplete | Switch to 1.5B model; reduce contextLength. |
| No Embeddings | Pull nomic-embed-text; test with @files in chat. |
| Logs Needed | Continue sidebar > Debug view. |
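When several of these symptoms show up at once, a quick sweep over the basics usually narrows it down. This is just a convenience sketch combining commands already covered above:

```bash
# Is the server up, what is installed, and what is loaded right now?
curl -s http://localhost:11434 && echo
ollama list
ollama ps

# Is anything else listening on (or blocking) port 11434?
netstat -an | grep 11434
```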
For deep dives, check Continue Docs or Ollama Guide.
Next Steps
- Experiment: Try @terminal for shell integration.
- Scale Up: Add larger models like deepseek-coder:6.7b once you’re comfortable.
- Contribute: Fork this guide on GitHub and submit a PR with enhancements!
Questions? Drop a comment or ping on Continue Discord. Happy local coding! 🛠️✨