AI Coding Assistant

👉 Enable vibe coding and an AI assistant in your VS Code / IntelliJ IDE using Ollama LLM models.

👉 A step-by-step guide to connecting VS Code and IntelliJ with Ollama models for seamless local code generation and pairing.




Overview

Welcome to this fully local, private, and offline AI coding assistant setup! Using the Continue extension in VS Code paired with Ollama for lightweight LLMs, you’ll get autocomplete, chat, editing, and more—without cloud dependencies. Perfect for developers prioritizing speed, privacy, and customization.

This guide is battle-tested for macOS, Linux, and Windows. Expect setup in ~15-30 minutes, depending on model downloads.

Prerequisites

Before diving in, ensure your setup meets these basics. Use this checklist for a smooth ride! ✅

| Requirement | Details | Why It Matters |
| --- | --- | --- |
| RAM | 8GB minimum (16GB+ recommended) | Handles model inference |
| Storage | 10GB+ free | For model files (e.g., 4-8GB each) |
| OS | macOS, Linux, or Windows | Ollama compatibility |
| Terminal Access | Basic command-line skills | For Ollama setup |
| VS Code | Install latest version | Core IDE for the extension |

💡 Pro Tip: If you’re on a laptop, plug in while downloading—models can be hefty!
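
Not sure about the numbers above? A couple of quick checks (Linux/macOS shown; on Windows use Task Manager or systeminfo):

# Linux: available memory and free disk space
free -h
df -h ~

# macOS: total RAM (in bytes) and free disk space
sysctl hw.memsize
df -h ~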


Step 1: Install Continue Extension

Get the AI brains into VS Code.

  1. Install the latest VS Code.
  2. Open VS Code.
  3. Hit Ctrl+Shift+X (Windows/Linux) or Cmd+Shift+X (macOS) to open Extensions.
  4. Search for Continue.
  5. Install the official extension from Continue.dev.
  6. Reload VS Code after install.
    • You’ll see a new Continue icon in the activity bar (left sidebar).

💡 Follow the same steps to install the Continue plugin in the IntelliJ IDE and configure Ollama models to integrate them.
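
If you prefer the command line, the extension can also be installed with the VS Code CLI (assuming the code command is on your PATH and the marketplace ID is Continue.continue):

code --install-extension Continue.continue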


Step 2: Install and Start Ollama

Ollama is your local LLM server—think of it as a lightweight OpenAI API alternative.

Installation Commands

Run these in your terminal:

| OS | Command |
| --- | --- |
| macOS | brew install ollama |
| Linux | curl -fsSL https://ollama.com/install.sh \| sh |
| Windows | Download installer from ollama.com |

Verify and Start

# Check version
ollama --version

# Start the server (runs in background)
ollama serve

# Test connection (should echo "Ollama is running")
curl http://localhost:11434

⚠️ Warning: Keep the terminal open or run ollama serve as a service (e.g., via systemd on Linux) for persistent use.
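
A sketch of running Ollama as a background service instead of keeping a terminal open (the Linux install script normally registers an ollama systemd unit, and the Homebrew formula ships a service):

# Linux (systemd)
sudo systemctl enable --now ollama
systemctl status ollama

# macOS (Homebrew)
brew services start ollama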


Step 3: Download Local Models

Pull models from Ollama’s library. Start small for testing!

Key Commands

# General coding (balanced)
ollama pull llama3.2:3b

# Code-focused autocomplete
ollama pull qwen2.5-coder:1.5b-base

# Embeddings for search (RAG)
ollama pull nomic-embed-text:latest

# List all installed models
ollama list

🔍 Quick Note: Use ollama run <model_name> for interactive testing; under the hood it first pulls the model locally if it isn’t already installed. Tags like :1b and :3b specify the model size (parameter count). For everyday hardware, stick to the 1.5B-8B range.
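
For example, a quick sanity check with the models pulled above (the prompt text is just an illustration):

# One-shot prompt; pulls the model first if it isn't installed yet
ollama run llama3.2:3b "Write a Python function that reverses a string."

# Remove a model you no longer need to reclaim disk space
ollama rm qwen2.5-coder:1.5b-base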


Step 4: Configure Continue with Ollama

Continue uses ~/.continue/config.yaml (YAML for readability).

  • Open it via “Continue sidebar” > “Settings gear” > “Open config.yaml”.
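
Or open it straight from the terminal (assuming the code CLI is installed):

code ~/.continue/config.yaml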

Base Config Example

Here’s a starter config with role-separated models for efficiency:

# config.yaml (~/.continue/config.yaml)
name: Local Config
version: 1.0.0
schema: v1
models:
  # Chats, edits, and apply changes (default for interactions)
  - name: Llama 3.2 3B
    provider: ollama
    model: llama3.2:3b
    roles:
      - chat
      - edit
      - apply
    default: true
    completionOptions:
      temperature: 0.2  # Low for consistent code
      maxTokens: 2048   # Balanced length

  # Lightweight autocomplete
  - name: Qwen2.5-Coder 1.5B
    provider: ollama
    model: qwen2.5-coder:1.5b-base
    # Note: Comment out the autocomplete role below if you run into performance issues.
    # When enabled, the configured model stays loaded and keeps serving completions,
    # which can consume a lot of CPU and RAM, especially on limited hardware.
    roles:
      - autocomplete
    completionOptions:
      temperature: 0.1  # Focused predictions
      topP: 0.9

  # Embeddings for @codebase searches
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text:latest
    roles:
      - embed

  # Auto-detect new models
  - name: Autodetect
    provider: ollama
    model: AUTODETECT

# Global tweaks
contextProviders:
  - name: code
    params:
      nRetrieve: 5  # Fast searches
      useReranking: false

tabAutocompleteOptions:
  enabled: true
  model: "Qwen2.5-Coder 1.5B"

Apply Changes

  • Save the file.
  • Reload: Cmd/Ctrl+Shift+P > “Continue: Reload Config”.
  • Auto-detects Ollama at http://localhost:11434.

🎨 UX Enhancement: Use VS Code’s YAML extension for syntax highlighting and validation; install it via the Extensions marketplace.


Step 5: Select and Use Your Models

Time to code with AI!

  1. Open the Continue sidebar in your IDE.
  2. Select Model: Use the dropdown at the top to pick “Llama 3.2 3B” (or your default).
  3. Core Features:
    • Chat: Highlight code > Cmd/Ctrl+L > Ask “Refactor this?”
    • Autocomplete: As you type code, suggestions pop up (Tab to accept).
    • Edit: Select code > Cmd/Ctrl+I > “Add error handling”.
    • Search: In chat, type @codebase What does utils.py do?
  4. Agent Mode: For tools, add "capabilities": ["tool_use"] to a model’s config (see the sketch after this step).

💫 Pro Tip: Pin your favorite model to the dropdown for one-click access.
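
Here’s a minimal sketch of how that Agent Mode tweak could look in config.yaml, reusing the Llama 3.2 entry from Step 4 (exact capability names may vary across Continue versions, so treat this as a template):

models:
  - name: Llama 3.2 3B
    provider: ollama
    model: llama3.2:3b
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use  # lets Agent mode call tools with this model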


Model Recommendations

Tailor to your workflow. Smaller = faster on CPU.

| Use Case | Model Suggestion | RAM Estimate | Speed Rating |
| --- | --- | --- | --- |
| Code Completion | qwen2.5-coder:1.5b-base | 4GB | ⚡ Fast |
| General Coding | llama3.2:3b | 8GB | 🚀 Quick |
| Advanced Tasks | codellama:7b | 8-16GB | 🐌 Balanced |
| Heavy Reasoning | mistral:7b | 16GB+ | 🐢 Thoughtful |

Run ollama pull <model> to add more. For GPU boost (NVIDIA/AMD), Ollama auto-detects—monitor with nvidia-smi.


Tips for Performance

  • Monitor Loads: ollama ps in terminal.
  • Tune Context: Lower maxTokens for speed.
  • GPU Tweaks: In a custom Modelfile: PARAMETER num_gpu 35 (sketch after this list).
  • Remote Ollama: Set apiBase: "http://your-server:11434" on the model entry in config.
    • If the above URL doesn’t work, try http://your-server:11434/v1.
  • Backup Config: Git-track ~/.continue/ for version control.
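
A minimal sketch of that GPU tweak, wrapping the num_gpu parameter mentioned above in a throwaway Modelfile (the llama3-gpu name is just illustrative):

# Create a variant of llama3.2:3b that offloads 35 layers to the GPU
cat > Modelfile <<'EOF'
FROM llama3.2:3b
PARAMETER num_gpu 35
EOF

ollama create llama3-gpu -f Modelfile
ollama run llama3-gpu "Say hello"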

🌟 Enhancement: Integrate with VS Code themes—Continue respects your dark/light mode for a seamless look.


Troubleshooting

Hit a snag? Quick fixes:

| Issue | Solution |
| --- | --- |
| Model 404 Error | ollama pull <exact-tag>; match config precisely. |
| Connection Failed | Restart ollama serve; check port 11434 (netstat -an \| grep 11434). |
| Slow Autocomplete | Switch to 1.5B model; reduce contextLength. |
| No Embeddings | Pull nomic-embed-text; test with @files in chat. |
| Logs Needed | Continue sidebar > Debug view. |
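
A few quick diagnostics worth running before digging deeper (standard Ollama endpoints; output format may differ across versions):

# Is the server reachable? Should print "Ollama is running"
curl http://localhost:11434

# Which models does the server actually have installed?
curl http://localhost:11434/api/tags

# Which models are loaded in memory right now?
ollama ps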

For deep dives, check Continue Docs or Ollama Guide.


Next Steps

  • Experiment: Try @terminal for shell integration.
  • Scale Up: Add larger models like deepseek-coder:6.7b once comfy.
  • Contribute: Fork this guide on GitHub and PR enhancements!

Questions? Drop a comment or ping on Continue Discord. Happy local coding! 🛠️✨