Integrate OpenRouter LLMs

👉 Connect to 100+ frontier LLMs using a single, unified API key and the OpenAI client.

👉 Deploy cost-effective AI apps using OpenRouter's free models and smart routing.


Why Use OpenRouter?

This guide demonstrates a powerful technique for LLM Engineering: connecting to the OpenRouter.ai platform using the standard, familiar OpenAI Python API client.

OpenRouter acts as a unified gateway, allowing developers to access a wide array of Large Language Models (LLMs), including free-tier models and models from providers other than OpenAI, all through the robust and well-documented OpenAI client library. This significantly streamlines multi-model development.

Architecture Flow

The method is simple: the OpenAI client is configured to route requests to OpenRouter’s API endpoint instead of OpenAI’s by overriding the base_url and api_key parameters.

graph TD
    A[Python Client] -->|OpenAI Library with OpenRouter base_url & api_key| B(OpenRouter API Endpoint)
    B -->|Routes Request| C(OpenRouter Model Layer)
    C -->|Sends Response| B
    B -->|Returns Final Output| A

Essential Python Imports

To make our connection, we need libraries for environment variable management, API interaction, and display manipulation (for streaming output in environments like Jupyter/IPython).

import os
from dotenv import load_dotenv
from openai import OpenAI

from IPython.display import Markdown, display, update_display
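
These come from the python-dotenv, openai, and ipython packages; if they are not already installed, a single pip command covers all three (assuming a standard pip-based environment):

pip install python-dotenv openai ipython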

Application Configuration

This setup loads the necessary credentials and configuration from a local .env file, ensuring our API keys are kept secure and separate from the main code. We specifically need the OpenRouter base URL, API key, and the chosen LLM model identifier (which can be a free-tier model).

Environment Setup

Create a .env file in your project root with your OpenRouter credentials.

# .env file
# OpenRouter
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
OPENROUTER_API_KEY="sk-or-v1-b79XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
MINIMAX_FREE="minimax/minimax-m2:free"

Application Config

The following code loads the variables and performs a quick check for their presence.

load_dotenv()

openrouter_base_url = os.getenv("OPENROUTER_BASE_URL")
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")

openrouter_llm_model = os.getenv("MINIMAX_FREE")

if not (openrouter_base_url and openrouter_api_key and openrouter_llm_model):
    print('OpenRouter base URL, API key, or model identifier is missing')
else:
    print(f"""Found base url and api key.
          Base URL: {openrouter_base_url}
          API Key: {openrouter_api_key[:10]}...
          LLM Model: {openrouter_llm_model}""")

Output:

Found base url and api key.
          Base URL: https://openrouter.ai/api/v1
          API Key: sk-or-v1-b...
          LLM Model: minimax/minimax-m2:free

Project POC:
Refer to the project source code on GitHub: Orchestrating OpenRouter LLMs


Initializing the LLM Client

This is the critical step. We initialize the standard OpenAI client, providing the OpenRouter-specific base_url and api_key. From this point forward, all API calls use the standard OpenAI chat-completion pattern but are routed through OpenRouter to the chosen LLM.

client = OpenAI(base_url=openrouter_base_url, api_key=openrouter_api_key)

payload = [
    {'role': 'system', 'content': 'You are a humorous assistant.'},
    {'role': 'user', 'content': 'Tell me a fun fact about lions.'}
]

Non-Streaming Application

A non-streaming API call is the simplest way to interact. The client waits for the entire response to be generated by the LLM and then delivers it all at once. This is suitable for backend processes or short requests.

response = client.chat.completions.create(model=openrouter_llm_model, messages=payload)

print(response.choices[0].message.content)

Response

Here’s a cool lion fact that never gets old: **A male lion’s roar can be heard from up to 5 miles (about 8 kilometers) away!** 🌟  

The roar is so powerful that it can travel across open savanna, helping pride members locate each other and even warn other lions in the area. It’s one of the few “real‑world” examples of a cat’s voice being a true communications system rather than just a hiss or a purr.  

(And just for fun, a male’s mane can weigh as much as 22 pounds (~10 kg)—talk about a heavy‑weight “hair‑do”!)
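
Before moving on to streaming, note that the same response object carries more than the message text. Here is a minimal sketch of a more production-leaning call, using the openai client's standard exception classes; the usage fields assume the routed provider reports token counts (OpenRouter generally does):

import openai

try:
    response = client.chat.completions.create(model=openrouter_llm_model, messages=payload)
    print(response.choices[0].message.content)
    # Token accounting: useful for estimating cost on paid models
    if response.usage:  # assumes the provider reports usage
        print(f"Total tokens: {response.usage.total_tokens} "
              f"(prompt: {response.usage.prompt_tokens}, completion: {response.usage.completion_tokens})")
except openai.RateLimitError:
    print("Rate limited: free-tier models often enforce strict request limits, retry later.")
except openai.APIConnectionError as exc:
    print(f"Could not reach OpenRouter: {exc}")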

Streaming Application for Enhanced UX

In real-time applications like a chat interface, streaming is essential. It significantly improves the user experience by providing immediate feedback.

By setting stream=True, we receive the response in small chunks, which we then process and update on the display.

Payload for Streaming

Let’s change the system persona to an expert software engineer and request a detailed explanation of greedy algorithms, showcasing a longer, more structured streaming response.

payload = [
    {'role': 'system', 'content': 'You are an expert software engineer with vast experience in DSA, algorithms, and AI/ML, using the Python, Java and Spring Boot tech stack.'},
    {'role': 'user', 'content': 'Write an algorithm with an example for a greedy algorithm and explain it for beginners.'}
]

Streaming Implementation

This Python snippet uses IPython.display functions (display, update_display) to incrementally render the received text chunks, creating a dynamic, real-time output effect.

stream_resp = client.chat.completions.create(model=openrouter_llm_model, messages=payload, stream=True)

display_writer = display(Markdown(""), display_id=True)
resp = ""
for chunk in stream_resp:
    # Check if content exists in the chunk
    content = chunk.choices[0].delta.content
    if content:
        resp += content
        # Update the display with the new total content
        update_display(Markdown(resp), display_id=display_writer.display_id)
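
Outside Jupyter/IPython, the same streaming loop works in a plain terminal by printing each chunk as it arrives; a minimal sketch:

stream_resp = client.chat.completions.create(model=openrouter_llm_model, messages=payload, stream=True)

for chunk in stream_resp:
    content = chunk.choices[0].delta.content
    if content:
        # Print incrementally without newlines; flush so text appears immediately
        print(content, end="", flush=True)
print()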

The LLM-generated response below provides a great example of structured, educational output using Markdown and code blocks, which is crucial for technical documentation.

Streaming Response

Here’s a simple, beginner-friendly explanation of greedy algorithms, followed by three classic examples with runnable Python code.

Greedy algorithm at a glance

  • Idea: At each step, pick the option that looks best right now (greedy choice), without reconsidering it later.
  • When it works: Problems where making the locally best choice also leads to a globally optimal solution. These usually have:
    • Optimal substructure: An optimal solution contains optimal solutions to subproblems.
    • Greedy choice property: The locally optimal choice can be made without affecting future choices.
  • When it fails: Problems where a locally optimal choice blocks a better global outcome. Greedy may be fast but incorrect in those cases.

General greedy skeleton

  • Sort candidates by some measure.
  • Iteratively select candidates that remain feasible.
  • Update the remaining problem and continue.

Example 1: Fractional Knapsack (Greedy Works)

  • Sort items by value per unit weight.
  • Keep adding items; if the next item doesn’t fit fully, take the fraction you need.
# Fractional Knapsack
# Greedy choice: pick highest value/weight ratio first; take fraction if needed.

from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    name: str
    weight: float
    value: float

def fractional_knapsack(capacity: float, items: List[Item]):
    # Sort by value/weight descending
    items_sorted = sorted(items, key=lambda it: it.value / it.weight, reverse=True)

    total_value = 0.0
    taken = []  # tuples of (name, fraction_taken)

    for item in items_sorted:
        if capacity <= 0:
            break
        if item.weight <= capacity:
            # Take whole item
            total_value += item.value
            capacity -= item.weight
            taken.append((item.name, 1.0))
        else:
            # Take fraction
            fraction = capacity / item.weight
            total_value += item.value * fraction
            taken.append((item.name, round(fraction, 4)))
            capacity = 0
    return total_value, taken

if __name__ == "__main__":
    items = [
        Item("A", 10, 60),
        Item("B", 20, 100),
        Item("C", 30, 120),
    ]
    capacity = 50
    value, taken = fractional_knapsack(capacity, items)
    print("Max value:", value)
    print("Taken:", taken)

Example 2: Activity Selection (Greedy Works)

  • Pick the activity that finishes earliest among those you can attend next.
# Activity Selection
# Greedy choice: pick the activity that finishes first among those that start after the last chosen finishes.

from dataclasses import dataclass
from typing import List

@dataclass
class Activity:
    name: str
    start: int
    finish: int

def activity_selection(activities: List[Activity]):
    # Sort by earliest finish time
    activities_sorted = sorted(activities, key=lambda a: a.finish)
    selected = []
    last_finish = -float("inf")

    for a in activities_sorted:
        if a.start >= last_finish:
            selected.append(a)
            last_finish = a.finish
    return selected

if __name__ == "__main__":
    activities = [
        Activity("A", 1, 4),
        Activity("B", 3, 5),
        Activity("C", 0, 6),
        Activity("D", 5, 7),
        Activity("E", 8, 9),
        Activity("F", 5, 9),
    ]
    chosen = activity_selection(activities)
    print("Selected activities:", [a.name for a in chosen])

Example 3: Coin Change (Greedy Does NOT Always Work)

  • Greedy approach: use the largest coin first.
  • Counterexample: Denominations [1, 3, 4] to make 6. Greedy picks 4 + 1 + 1 (3 coins), but optimal is 3 + 3 (2 coins).
# Coin Change (greedy does NOT always give optimal results)
# Greedy choice: always use the largest coin possible first.

def coin_change_greedy(amount, coins):
    coins = sorted(coins, reverse=True)
    used = []
    total = 0
    for c in coins:
        cnt = amount // c
        if cnt:
            used.append((c, cnt))
            amount -= c * cnt
            total += cnt
    if amount != 0:
        return None, None  # exact change not possible with these coins
    return total, used

if __name__ == "__main__":
    coins = [1, 3, 4]
    amount = 6
    total_coins, used = coin_change_greedy(amount, coins)
    print(f"Greedy uses {total_coins} coins: {used}")  # 3 coins (4+1+1)
    print("But optimal is 2 coins: (3, 3)")

Tips for Beginners

  • Always check if the problem has the greedy choice property; otherwise, consider dynamic programming or other methods.
  • Greedy algorithms are usually fast: often O(n log n) due to sorting.
  • When in doubt, test on edge cases and known counterexamples.