Integrate OpenRouter LLMs
👉 Connect to 100+ frontier LLMs using a single, unified API key and the OpenAI client.
👉 Deploy cost-effective AI apps using OpenRouter's free models and smart routing.
Table of Contents
- Table of Contents
- Why Use OpenRouter?
- Essential Python Imports
- Application Configuration
- Initializing the LLM Client
- Non-Streaming Application
- Streaming Application for Enhanced UX
Why Use OpenRouter?
This guide demonstrates a powerful technique for LLM Engineering: connecting to the OpenRouter.ai platform using the standard, familiar OpenAI Python API client.
OpenRouter acts as a unified gateway, allowing developers to access a wide array of Large Language Models (LLMs), including free-tier models and models from providers other than OpenAI, all through the robust and well-documented OpenAI client library. This significantly streamlines multi-model development.
Architecture Flow
The method is simple: configure the OpenAI client to route requests to OpenRouter’s API endpoint instead of OpenAI’s by overriding the base_url and api_key parameters.
graph TD
A[Python Client] -->|OpenAI Library with OpenRouter base_url & api_key| B(OpenRouter API Endpoint)
B -->|Routes Request| C(OpenRouter Model Layer)
C -->|Sends Response| B
B -->|Returns Final Output| A
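In code, the override amounts to two constructor arguments. A minimal sketch (the base_url is OpenRouter's public endpoint; the key is a placeholder):

from openai import OpenAI

# Point the standard OpenAI client at OpenRouter by swapping endpoint and key
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-v1-...",  # placeholder; use your own OpenRouter API key
)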
Essential Python Imports
To make our connection, we need libraries for environment variable management, API interaction, and display manipulation (for streaming output in environments like Jupyter/IPython).
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display
Application Configuration
This setup loads the necessary credentials and configuration from a local .env file, ensuring our API keys are kept secure and separate from the main code. We specifically need the OpenRouter base URL, API key, and the chosen LLM model identifier (which can be a free-tier model).
Environment Setup
Create a .env file in your project root with your OpenRouter credentials.
# .env file
# OpenRouter
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
OPENROUTER_API_KEY="sk-or-v1-b79XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
MINIMAX_FREE="minimax/minimax-m2:free"
Application Config
The following code loads the variables and performs a quick check for their presence.
load_dotenv()

openrouter_base_url = os.getenv("OPENROUTER_BASE_URL")
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")
openrouter_llm_model = os.getenv("MINIMAX_FREE")

# os.getenv returns None for missing keys, so verify all three are present
if not (openrouter_base_url and openrouter_api_key and openrouter_llm_model):
    print("OpenRouter base URL, API key, or model identifier is missing")
else:
    print(f"""Found base url and api key.
Base URL: {openrouter_base_url}
API Key: {openrouter_api_key[:10]}...
LLM Model: {openrouter_llm_model}""")
Output:
Found base url and api key.
Base URL: https://openrouter.ai/api/v1
API Key: sk-or-v1-b...
LLM Model: minimax/minimax-m2:free
Project POC:
Refer to the project source code on GitHub: Orchestrating OpenRouter LLMs
Initializing the LLM Client
This is the critical step. We initialize the standard OpenAI client, providing the OpenRouter-specific base_url and api_key. From this point forward, all API calls use the standard OpenAI chat completion pattern but are routed to the chosen LLM on OpenRouter.
client = OpenAI(base_url=openrouter_base_url, api_key=openrouter_api_key)

payload = [
    {'role': 'system', 'content': 'You are a humorous assistant.'},
    {'role': 'user', 'content': 'Tell me a fun fact about lions.'}
]
Non-Streaming Application
A non-streaming API call is the simplest way to interact. The client waits for the entire response to be generated by the LLM and then delivers it all at once. This is suitable for backend processes or short requests.
response = client.chat.completions.create(model=openrouter_llm_model, messages=payload)
print(response.choices[0].message.content)
Response
Here’s a cool lion fact that never gets old: **A male lion’s roar can be heard from up to 5 miles (about 8 kilometers) away!** 🌟
The roar is so powerful that it can travel across open savanna, helping pride members locate each other and even warn other lions in the area. It’s one of the few “real‑world” examples of a cat’s voice being a true communications system rather than just a hiss or a purr.
(And just for fun, a male’s mane can weigh as much as 22 pounds (~10 kg)—talk about a heavy‑weight “hair‑do”!)
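Beyond the message text, the same response object exposes metadata such as the model that served the request and token counts. A quick sketch, noting that the usage field follows the OpenAI client's schema and can be None for some providers:

# Inspect response metadata on the non-streaming response object
print(response.model)  # the model that actually served the request
if response.usage:
    print("Tokens:", response.usage.prompt_tokens, "+", response.usage.completion_tokens)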
Streaming Application for Enhanced UX
In real-time applications like a chat interface, streaming is essential. It significantly improves the user experience by providing immediate feedback.
By setting stream=True, we receive the response in small chunks, which we then process and update on the display.
Payload for Streaming
Let’s change the system persona to an expert software engineer and request a detailed explanation of greedy algorithms, to showcase a longer, more structured streaming response.
payload = [
    {'role': 'system', 'content': 'You are an expert software engineer with vast experience in DSA, algorithms, and AI/ML, using the Python, Java and Spring Boot tech stack.'},
    {'role': 'user', 'content': 'Write an algorithm with an example for a greedy algorithm and explain it for beginners.'}
]
Streaming Implementation
This Python snippet uses IPython.display functions (display, update_display) to incrementally render the received text chunks, creating a dynamic, real-time output effect.
stream_resp = client.chat.completions.create(model=openrouter_llm_model, messages=payload, stream=True)

display_writer = display(Markdown(""), display_id=True)
resp = ""

for chunk in stream_resp:
    # Check if content exists in the chunk
    content = chunk.choices[0].delta.content
    if content:
        resp += content
        # Update the display with the new total content
        update_display(Markdown(resp), display_id=display_writer.display_id)
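Outside Jupyter (a plain script or terminal), the IPython display helpers are unavailable. A minimal alternative prints each chunk as it arrives; note that a stream can only be consumed once, so we create a fresh one here:

# Terminal-friendly streaming: print each chunk as it arrives
stream_resp = client.chat.completions.create(model=openrouter_llm_model, messages=payload, stream=True)
for chunk in stream_resp:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()  # final newline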
The LLM-generated response below provides a great example of structured, educational output using Markdown and code blocks, which is crucial for technical documentation.
Streaming Response
Here’s a simple, beginner-friendly explanation of greedy algorithms, followed by three classic examples with runnable Python code.
Greedy algorithm at a glance
- Idea: At each step, pick the option that looks best right now (greedy choice), without reconsidering it later.
- When it works: Problems where making the locally best choice also leads to a globally optimal solution. These usually have:
  - Optimal substructure: An optimal solution contains optimal solutions to subproblems.
  - Greedy choice property: The locally optimal choice can be made without affecting future choices.
- When it fails: Problems where a locally optimal choice blocks a better global outcome. Greedy may be fast but incorrect in those cases.
General greedy skeleton
- Sort candidates by some measure.
- Iteratively select candidates that remain feasible.
- Update the remaining problem and continue.
Example 1: Fractional Knapsack (Greedy Works)
- Sort items by value per unit weight.
- Keep adding items; if the next item doesn’t fit fully, take the fraction you need.
# Fractional Knapsack
# Greedy choice: pick highest value/weight ratio first; take fraction if needed.
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    name: str
    weight: float
    value: float

def fractional_knapsack(capacity: float, items: List[Item]):
    # Sort by value/weight descending
    items_sorted = sorted(items, key=lambda it: it.value / it.weight, reverse=True)
    total_value = 0.0
    taken = []  # tuples of (name, fraction_taken)
    for item in items_sorted:
        if capacity <= 0:
            break
        if item.weight <= capacity:
            # Take whole item
            total_value += item.value
            capacity -= item.weight
            taken.append((item.name, 1.0))
        else:
            # Take fraction
            fraction = capacity / item.weight
            total_value += item.value * fraction
            taken.append((item.name, round(fraction, 4)))
            capacity = 0
    return total_value, taken

if __name__ == "__main__":
    items = [
        Item("A", 10, 60),
        Item("B", 20, 100),
        Item("C", 30, 120),
    ]
    capacity = 50
    value, taken = fractional_knapsack(capacity, items)
    print("Max value:", value)
    print("Taken:", taken)
Example 2: Activity Selection (Greedy Works)
- Pick the activity that finishes earliest among those you can attend next.
# Activity Selection
# Greedy choice: pick the activity that finishes first among those that start after the last chosen finishes.
from dataclasses import dataclass
from typing import List

@dataclass
class Activity:
    name: str
    start: int
    finish: int

def activity_selection(activities: List[Activity]):
    # Sort by earliest finish time
    activities_sorted = sorted(activities, key=lambda a: a.finish)
    selected = []
    last_finish = -float("inf")
    for a in activities_sorted:
        if a.start >= last_finish:
            selected.append(a)
            last_finish = a.finish
    return selected

if __name__ == "__main__":
    activities = [
        Activity("A", 1, 4),
        Activity("B", 3, 5),
        Activity("C", 0, 6),
        Activity("D", 5, 7),
        Activity("E", 8, 9),
        Activity("F", 5, 9),
    ]
    chosen = activity_selection(activities)
    print("Selected activities:", [a.name for a in chosen])
Example 3: Coin Change (Greedy Does NOT Always Work)
- Greedy approach: use the largest coin first.
- Counterexample: Denominations [1, 3, 4] to make 6. Greedy picks 4 + 1 + 1 (3 coins), but optimal is 3 + 3 (2 coins).
# Coin Change (greedy does NOT always give optimal results)
# Greedy choice: always use the largest coin possible first.
def coin_change_greedy(amount, coins):
    coins = sorted(coins, reverse=True)
    used = []
    total = 0
    for c in coins:
        cnt = amount // c
        if cnt:
            used.append((c, cnt))
            amount -= c * cnt
            total += cnt
    if amount != 0:
        return False, total, used  # exact change not possible greedily
    return True, total, used

if __name__ == "__main__":
    coins = [1, 3, 4]
    amount = 6
    _, total_coins, used = coin_change_greedy(amount, coins)
    print(f"Greedy uses {total_coins} coins: {used}")  # 3 coins (4+1+1)
    print("But optimal is 2 coins: (3, 3)")
Tips for Beginners
- Always check if the problem has the greedy choice property; otherwise, consider dynamic programming or other methods.
- Greedy algorithms are usually fast: often $O(n \log n)$ due to sorting.
- When in doubt, test on edge cases and known counterexamples.
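To make the coin-change counterexample concrete, the optimal answer that greedy misses can be computed with a short dynamic-programming routine. A minimal sketch (coin_change_dp is a hypothetical helper name, not part of the response above):

# Dynamic programming: fewest coins for each amount from 0..target
def coin_change_dp(amount, coins):
    INF = float("inf")
    best = [0] + [INF] * amount  # best[a] = fewest coins summing to a
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] != INF else None

print(coin_change_dp(6, [1, 3, 4]))  # 2, i.e. 3 + 3, beating greedy's 3 coins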