Models and Providers

LionAGI's iModel provides a unified interface for working with different LLM providers. Configure once, swap providers without changing application code.

Basic Usage

from lionagi import Branch, iModel

branch = Branch(
    chat_model=iModel(provider="openai", model="gpt-4.1-mini")
)

response = await branch.communicate("Explain quantum computing")

If you do not specify a model, Branch uses the default from environment configuration: LIONAGI_CHAT_PROVIDER (default: openai) and LIONAGI_CHAT_MODEL (default: gpt-4.1-mini).

Supported Providers

API-Based Providers

These providers communicate with hosted APIs over HTTP.

OpenAI -- API key: OPENAI_API_KEY

# Default model (gpt-4.1-mini)
branch = Branch(
    chat_model=iModel(provider="openai", model="gpt-4.1-mini")
)

# Other OpenAI models
gpt4 = iModel(provider="openai", model="gpt-4.1")
gpt4o = iModel(provider="openai", model="gpt-4o")
gpt4o_mini = iModel(provider="openai", model="gpt-4o-mini")

OpenAI also supports endpoint="response" for the Responses API:

response_model = iModel(provider="openai", endpoint="response", model="gpt-4.1")

Anthropic -- API key: ANTHROPIC_API_KEY

branch = Branch(
    chat_model=iModel(provider="anthropic", model="claude-sonnet-4-5-20250929")
)

# Other Anthropic models
haiku = iModel(provider="anthropic", model="claude-haiku-4-5-20251001")
opus = iModel(provider="anthropic", model="claude-opus-4-6")

Anthropic supports prompt caching via cache_control=True on individual calls.

Gemini (Native API) -- API key: GEMINI_API_KEY

Gemini uses Google's OpenAI-compatible endpoint:

branch = Branch(
    chat_model=iModel(provider="gemini", model="gemini-2.5-flash")
)

# Another Gemini model
gemini_flash_2 = iModel(provider="gemini", model="gemini-2.0-flash")

Groq -- API key: GROQ_API_KEY

Fast inference for open models:

branch = Branch(
    chat_model=iModel(provider="groq", model="llama-3.3-70b-versatile")
)

OpenRouter -- API key: OPENROUTER_API_KEY

Access many models through a single API:

branch = Branch(
    chat_model=iModel(provider="openrouter", model="google/gemini-2.5-flash")
)

Perplexity -- API key: PERPLEXITY_API_KEY

Real-time web search and Q&A via the Sonar API:

branch = Branch(
    chat_model=iModel(provider="perplexity", model="sonar")
)

NVIDIA NIM -- API key: NVIDIA_NIM_API_KEY

Cloud-hosted models on NVIDIA infrastructure:

# Chat models
branch = Branch(
    chat_model=iModel(provider="nvidia_nim", model="meta/llama3-8b-instruct")
)

# Embedding models
embed_model = iModel(provider="nvidia_nim", endpoint="embed", model="nvidia/nv-embed-v1")

Exa -- API key: EXA_API_KEY

Semantic search (not a chat provider):

exa = iModel(provider="exa", endpoint="search")

Ollama -- Local models, no API key required

branch = Branch(
    chat_model=iModel(
        provider="ollama",
        model="llama3",
        base_url="http://localhost:11434"
    )
)

CLI-Based Providers (Coding Agents)

CLI providers wrap agentic coding tools that run as subprocesses rather than HTTP requests. This enables agent-to-agent orchestration -- your outer agent uses lionagi to spawn and coordinate inner coding agents.

Key differences from API providers:

  • Subprocess execution -- each call spawns the CLI binary and streams NDJSON from stdout with incremental UTF-8 decoding
  • Session persistence -- the endpoint stores session_id and automatically passes --resume on subsequent calls
  • Conservative concurrency -- 3 concurrent, queue capacity of 10
  • No API key needed -- the CLI tool handles its own authentication
  • Event handlers -- optional callbacks for streaming output (on_text, on_tool_use, on_final, etc.)

Claude Code -- Uses installed claude CLI (npm i -g @anthropic-ai/claude-code)

claude_code = iModel(
    provider="claude_code",
    model="sonnet",                          # "sonnet" or "opus"
    permission_mode="bypassPermissions",      # skip approval prompts
    allowed_tools=["Read", "Grep", "Glob"],   # restrict tool access
    max_turns=10,                            # conversation turn limit
)

branch = Branch(chat_model=claude_code)
result = await branch.communicate("Refactor the auth module")

Gemini CLI -- Uses installed gemini CLI

gemini_cli = iModel(
    provider="gemini_code",
    model="gemini-2.5-pro",
    sandbox=True,                      # safety sandboxing (default)
    approval_mode="auto_edit",         # "suggest", "auto_edit", "full_auto"
)

branch = Branch(chat_model=gemini_cli)
result = await branch.communicate("Review this codebase")

Codex CLI -- Uses installed codex CLI (npm i -g codex)

codex = iModel(
    provider="codex",
    model="gpt-5.3-codex",
    full_auto=True,                    # auto-approve with sandbox
    sandbox="workspace-write",         # "read-only", "workspace-write", "danger-full-access"
)

branch = Branch(chat_model=codex)
result = await branch.communicate("Write tests for the parser module")

For full parameter references and orchestration patterns, see LLM Provider Integration and CLI Agent Providers.

OpenAI-Compatible Providers

Any provider with an OpenAI-compatible API can be used by specifying base_url:

custom = iModel(
    provider="custom",
    model="my-model",
    base_url="https://my-provider.example.com/v1",
    api_key="my-api-key"
)

iModel Constructor

The full constructor signature with all parameters:

model = iModel(
    # Provider and endpoint
    provider="openai",                  # Provider name
    model="gpt-4.1-mini",              # Model name (passed via **kwargs)
    endpoint="chat",                    # Endpoint type (default: "chat")
    base_url=None,                      # Custom base URL
    api_key=None,                       # API key (defaults to env var)

    # Rate limiting
    queue_capacity=100,                 # Max queued requests (10 for CLI)
    capacity_refresh_time=60,           # Queue refresh interval (seconds)
    interval=None,                      # Processing interval
    limit_requests=None,               # Max requests per cycle
    limit_tokens=None,                 # Max tokens per cycle
    concurrency_limit=None,            # Max concurrent requests (3 for CLI)

    # Streaming
    streaming_process_func=None,       # Custom chunk processor

    # Hooks
    hook_registry=None,                # HookRegistry for pre/post hooks
    exit_hook=False,                   # Enable exit hooks

    # Model-specific parameters (passed to the endpoint)
    temperature=0.7,
    max_tokens=2000,
)

Async Context Manager

Use iModel as an async context manager for automatic resource cleanup:

async with iModel(provider="openai", model="gpt-4.1") as model:
    branch = Branch(chat_model=model)
    result = await branch.communicate("Hello")
    # Executor is stopped and resources released on exit

Copying Models

Use copy() to create an independent iModel instance with the same configuration but a fresh ID and executor:

original = iModel(provider="openai", model="gpt-4.1-mini")

# Fresh instance, independent executor
clone = original.copy()

# For CLI endpoints, optionally share the session for resume
cli_clone = cli_model.copy(share_session=True)

This is particularly useful when creating multiple branches that need independent rate limiting.

Branch with Separate Chat and Parse Models

Branch supports separate models for chat and structured parsing:

branch = Branch(
    chat_model=iModel(provider="openai", model="gpt-4.1"),       # For conversations
    parse_model=iModel(provider="openai", model="gpt-4.1-mini"), # For parsing
    system="Technical assistant"
)

# communicate() uses chat_model
await branch.communicate("Explain this concept")

# parse() uses parse_model
result = await branch.parse(some_text, response_format=MyModel)

If parse_model is not specified, it defaults to the same model as chat_model.

Multiple Providers in One Session

Mix different providers for different tasks:

from lionagi import Session, Branch, iModel

session = Session()

fast_branch = session.new_branch(
    name="fast",
    system="Quick answers",
    imodel=iModel(provider="openai", model="gpt-4.1-mini")
)

deep_branch = session.new_branch(
    name="deep",
    system="Detailed analysis",
    imodel=iModel(provider="anthropic", model="claude-sonnet-4-5-20250929")
)

# Route tasks to the appropriate model
quick = await fast_branch.communicate("What is 2+2?")
analysis = await deep_branch.communicate("Analyze the implications of quantum computing on cryptography")

Model Configuration

Configure model parameters at construction time:

configured = iModel(
    provider="openai",
    model="gpt-4.1",
    temperature=0.7,
    max_tokens=2000,
    limit_requests=100,
    limit_tokens=50000
)

branch = Branch(chat_model=configured, system="Creative writer")

Environment Configuration

LionAGI loads API keys and defaults from environment variables:

Variable                Purpose                     Default
OPENAI_API_KEY          OpenAI authentication       --
ANTHROPIC_API_KEY       Anthropic authentication    --
GEMINI_API_KEY          Gemini authentication       --
GROQ_API_KEY            Groq authentication         --
OPENROUTER_API_KEY      OpenRouter authentication   --
PERPLEXITY_API_KEY      Perplexity authentication   --
NVIDIA_NIM_API_KEY      NVIDIA NIM authentication   --
EXA_API_KEY             Exa authentication          --
LIONAGI_CHAT_PROVIDER   Default chat provider       openai
LIONAGI_CHAT_MODEL      Default chat model          gpt-4.1-mini

Settings are loaded from .env, .env.local, or .secrets.env files automatically via pydantic-settings.
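A minimal .env for the OpenAI default setup (the key value is a placeholder):

```shell
# .env -- picked up automatically by pydantic-settings
OPENAI_API_KEY=sk-...
LIONAGI_CHAT_PROVIDER=openai
LIONAGI_CHAT_MODEL=gpt-4.1-mini
```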

Provider Comparison

Provider      Type   Default Model               Auth Key Env Var
openai        API    gpt-4.1-mini                OPENAI_API_KEY
anthropic     API    --                          ANTHROPIC_API_KEY
gemini        API    gemini-2.5-flash            GEMINI_API_KEY
groq          API    llama-3.3-70b-versatile     GROQ_API_KEY
openrouter    API    google/gemini-2.5-flash     OPENROUTER_API_KEY
perplexity    API    sonar                       PERPLEXITY_API_KEY
nvidia_nim    API    meta/llama3-8b-instruct     NVIDIA_NIM_API_KEY
ollama        API    --                          -- (local, no key)
exa           API    -- (search only)            EXA_API_KEY
claude_code   CLI    --                          -- (uses CLI auth)
gemini_code   CLI    --                          -- (uses CLI auth)
codex         CLI    --                          -- (uses CLI auth)