
What are custom integrations?

Galileo’s custom integrations provide a flexible way to set up LLMs that aren’t supported through the existing integrations, for example providers behind non-standard proxies or those that use proprietary authentication or inference protocols.

Configuring custom integrations in the Galileo console

1. Navigate to Integrations

Navigate to Settings > Integrations in the Galileo console.

2. Add Custom Integration

Find the Custom integration card and click Add Integration.

3. Configure Integration

Paste a valid JSON payload and click Save changes.

This JSON example uses API key authentication. See JSON properties below for an explanation of these and other properties, and instructions for other authentication types.
{
  "authentication_type": "api_key",
  "api_key_header": "YOUR_API_KEY_HEADER",
  "api_key_value": "YOUR_API_KEY_VALUE",
  "model_properties": [
    {
      "name": "gpt-5.2",
      "alias": "GPT 5.2",
      "supported_parameters": [
        "max_tokens",
        "n",
        "reasoning_effort",
        "stop_sequences",
        "temperature",
        "tool_choice",
        "tools",
        "verbosity"
      ]
    },
    {
      "name": "gpt-5.4",
      "alias": "GPT 5.4",
      "based_on": "gpt-5.4"
    },
    {
      "name": "claude-opus-4-6",
      "alias": "Opus 4.6",
      "based_on": "Claude Opus 4.6"
    }
  ],
  "endpoint": "https://YOUR_PROVIDER_BASE_URL"
}
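The console rejects malformed JSON, so it can save a round trip to validate the payload locally before pasting it. Below is a minimal sketch in Python; the required-key check is inferred from the examples in this guide, not from an official schema:

```python
import json

# Keys present in every example in this guide; inferred, not an official schema.
EXPECTED_KEYS = {"authentication_type", "model_properties", "endpoint"}

def check_payload(raw: str) -> list:
    """Return a list of problems found in a custom-integration payload string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - payload.keys())]
    for model in payload.get("model_properties", []):
        if "name" not in model:
            problems.append("model entry without a 'name'")
    return problems
```

Running `check_payload` over your payload string prints an empty list when the basic shape looks right; the console remains the authority on what is actually accepted.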
4. Test Integration

After saving, test the integration by selecting one of its models from a Playground, then run a prompt or evaluation that uses that model.

Note: the models will also be available for use with metrics in Galileo.

JSON properties

Authentication properties

API key authentication

For providers that use API key authentication (for example, Portkey), specify the following properties:
  • "authentication_type": "api_key".
  • "api_key_header": The name of the header that the AI provider uses for API key authentication. For example, for Portkey the name is "x-portkey-api-key". Consult your AI provider’s documentation to find the required header name.
  • "api_key_value": The API key to be used.
{
  "authentication_type": "api_key",
  "api_key_header": "YOUR_API_KEY_HEADER",
  "api_key_value": "YOUR_API_KEY_VALUE",
  "model_properties": [
    {
      "name": "gpt-5.2",
      "alias": "GPT 5.2",
      "supported_parameters": [
        "max_tokens",
        "n",
        "reasoning_effort",
        "stop_sequences",
        "temperature",
        "tool_choice",
        "tools",
        "verbosity"
      ]
    },
    {
      "name": "gpt-5.4",
      "alias": "GPT 5.4",
      "based_on": "gpt-5.4"
    },
    {
      "name": "claude-opus-4-6",
      "alias": "Opus 4.6",
      "based_on": "Claude Opus 4.6"
    }
  ],
  "endpoint": "https://YOUR_PROVIDER_BASE_URL"
}

Bearer token authentication

For providers that use a static, pre-defined token for authentication (for example, Together AI), specify the following properties:
  • "authentication_type": "api_key".
  • "api_key_header": "Authorization".
  • "api_key_value": "Bearer YOUR_TOKEN", where YOUR_TOKEN is the authentication token issued by the provider.
{
  "authentication_type": "api_key",
  "api_key_header": "Authorization",
  "api_key_value": "Bearer YOUR_TOKEN",
  "model_properties": [
    {
      "name": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
      "alias": "Llama 4 Scout",
      "supported_parameters": [
        "max_tokens",
        "stop_sequences",
        "temperature",
        "top_p"
      ]
    }
  ],
  "endpoint": "https://api.together.xyz/v1"
}

OAuth2 authentication

For providers that require dynamically generated bearer tokens:
  • "authentication_type": "oauth2".
  • "oauth2_token_url": Endpoint URL of the OAuth2 server. This endpoint must be compatible with the OAuth2 Client Credentials Grant.
  • "authentication_scope": "YOUR_SCOPE" (optional, passed to the OAuth2 endpoint as the scope property of the token request payload).
  • "token": "{\"client_id\": \"YOUR_CLIENT_ID\", \"client_secret\": \"YOUR_CLIENT_SECRET\"}" (an escaped JSON string containing the static client ID and secret that will be sent to the OAuth2 endpoint).
The access_token field of the OAuth2 endpoint’s JSON response will be used as a Bearer token for LLM inference requests.
{
  "authentication_type": "oauth2",
  "oauth2_token_url": "https://auth.provider.com/oauth2/token",
  "token": "{\"client_id\": \"YOUR_CLIENT_ID\", \"client_secret\": \"YOUR_CLIENT_SECRET\"}",
  "authentication_scope": "inference",
  "model_properties": [
    {
      "name": "provider-model-v2",
      "alias": "Provider Model v2",
      "supported_parameters": [
        "max_tokens",
        "temperature",
        "top_p"
      ]
    }
  ],
  "endpoint": "https://api.provider.com/v1"
}
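The Client Credentials Grant is a single POST to the token endpoint, so you can check your client ID and secret independently of Galileo by reproducing that request yourself. A standard-library sketch, where the URL and credentials are placeholders; note that some OAuth2 servers expect the credentials in a Basic Authorization header rather than the form body, so check your provider:

```python
import json
import urllib.parse
import urllib.request

def build_token_request(token_url, client_id, client_secret, scope=None):
    """Build the Client Credentials Grant POST that the token endpoint expects."""
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        # Mirrors the optional "authentication_scope" property.
        fields["scope"] = scope
    return urllib.request.Request(
        token_url,
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

def fetch_access_token(req):
    """POST the request and extract access_token from the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

If `fetch_access_token` succeeds against your `oauth2_token_url`, the same client ID, secret, and scope should work in the integration payload.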

No authentication

For internal endpoints or providers that don’t require authentication:
  • "authentication_type": "none".
{
  "authentication_type": "none",
  "model_properties": [
    {
      "name": "internal-model-v1",
      "alias": "Internal Model",
      "supported_parameters": [
        "max_tokens",
        "temperature",
        "top_p"
      ]
    }
  ],
  "endpoint": "https://internal-gateway.local/v1"
}
The above examples cover the most common use cases. Most users won’t need to read beyond this point. The sections below are for advanced scenarios like custom LLM handlers and single-tenant deployments.

Model properties

The model_properties JSON property is used to configure the models that the AI provider supports. For each model, the following properties are available:
  • "name": Name of the model on the AI provider. The value is passed verbatim to the AI provider on the inference request.
  • "alias": The unique identifier of the model in Galileo. The value is displayed in the user interface when selecting models. If not provided, name will be used as a default value.
  • "based_on": An optional alias of a built-in Galileo model. If provided, the supported_parameters of the built-in model will be used for this custom integration model.
  • "supported_parameters": An explicit list of parameters that the custom model supports. An alternative to based_on: instead of copying the parameter list from a built-in model, it states the list directly.
If neither based_on nor supported_parameters is provided, this default parameter list is used: ["frequency_penalty", "max_tokens", "presence_penalty", "stop", "temperature", "top_p"]. The following built-in model aliases can be used with based_on:
  • Claude 3 Haiku
  • Claude 3.7 Sonnet
  • Claude Haiku 4.5
  • Claude Opus 4
  • Claude Opus 4.1
  • Claude Opus 4.5
  • Claude Opus 4.6
  • Claude Sonnet 4
  • Claude Sonnet 4.5
  • Claude Sonnet 4.6
  • GPT-4o (Azure)
  • GPT-4o mini (Azure)
  • gpt-4.1 (Azure)
  • gpt-4.1-mini (Azure)
  • gpt-4.1-nano (Azure)
  • gpt-5 (Azure)
  • gpt-5-mini (Azure)
  • gpt-5-nano (Azure)
  • o1 (Azure)
  • o3 (Azure)
  • o3 mini (Azure)
  • o4 mini (Azure)
  • AI21 - Jamba 1.5 Large (Bedrock)
  • AI21 - Jamba 1.5 Mini (Bedrock)
  • Amazon - Nova 2 Lite (Bedrock)
  • Amazon - Nova Lite (Bedrock)
  • Amazon - Nova Micro (Bedrock)
  • Amazon - Nova Premier (Bedrock)
  • Amazon - Nova Pro (Bedrock)
  • Anthropic - Claude 3 Haiku (Bedrock)
  • Anthropic - Claude 3.5 Sonnet (Bedrock)
  • Anthropic - Claude 3.5 Sonnet v2 (Bedrock)
  • Anthropic - Claude 3.7 Sonnet (Bedrock)
  • Anthropic - Claude 4 Opus (Bedrock)
  • Anthropic - Claude 4 Sonnet (Bedrock)
  • Anthropic - Claude Haiku 4.5 (Bedrock)
  • Anthropic - Claude Opus 4.1 (Bedrock)
  • Anthropic - Claude Opus 4.5 (Bedrock)
  • Anthropic - Claude Opus 4.6 (Bedrock)
  • Anthropic - Claude Sonnet 4.5 (Bedrock)
  • Anthropic - Claude Sonnet 4.6 (Bedrock)
  • Cohere - Command R v1 (Bedrock)
  • Cohere - Command R+ v1 (Bedrock)
  • DeepSeek - R1 (Bedrock)
  • Google - Gemma 3 12B (Bedrock)
  • Google - Gemma 3 27B (Bedrock)
  • Google - Gemma 3 4B (Bedrock)
  • Meta - Llama 3 70B Instruct v1 (Bedrock)
  • Meta - Llama 3 8B Instruct v1 (Bedrock)
  • Meta - Llama 3.1 70B Instruct v1 (Bedrock)
  • Meta - Llama 3.1 8B Instruct v1 (Bedrock)
  • Meta - Llama 3.2 11B Instruct (Bedrock)
  • Meta - Llama 3.2 1B Instruct (Bedrock)
  • Meta - Llama 3.2 3B Instruct (Bedrock)
  • Meta - Llama 3.2 90B Instruct (Bedrock)
  • Meta - Llama 3.3 70B Instruct (Bedrock)
  • Meta - Llama 4 Maverick 17B Instruct (Bedrock)
  • Meta - Llama 4 Scout 17B Instruct (Bedrock)
  • MiniMax - M2 (Bedrock)
  • Mistral - 7B Instruct (Bedrock)
  • Mistral - Large (Bedrock)
  • Mistral - Large 3 (Bedrock)
  • Mistral - Magistral Small (Bedrock)
  • Mistral - Ministral 14B (Bedrock)
  • Mistral - Ministral 3B (Bedrock)
  • Mistral - Ministral 8B (Bedrock)
  • Mistral - Pixtral Large 25.02 (Bedrock)
  • Mistral - Small 24.02 (Bedrock)
  • Mixtral - 8x7B Instruct (Bedrock)
  • Moonshot - Kimi K2 Thinking (Bedrock)
  • NVIDIA - Nemotron Nano 12B (Bedrock)
  • NVIDIA - Nemotron Nano 9B (Bedrock)
  • OpenAI - GPT OSS 120B (Bedrock)
  • OpenAI - GPT OSS 20B (Bedrock)
  • Qwen - Qwen3 32B (Bedrock)
  • Qwen - Qwen3 Coder 30B (Bedrock)
  • Qwen - Qwen3 Next 80B (Bedrock)
  • Qwen - Qwen3 VL 235B A22B (Bedrock)
  • Writer - Palmyra X4 (Bedrock)
  • Writer - Palmyra X5 (Bedrock)
  • Meta Llama 3.1 405B Instruct (Databricks)
  • Meta Llama 3.1 70B Instruct (Databricks)
  • ministral-3b
  • ministral-8b
  • ministral-large
  • ministral-small
  • BAAI BGE M3 (NVIDIA)
  • BigCode StarCoder2 15B (NVIDIA)
  • BigCode StarCoder2 7B (NVIDIA)
  • Databricks DBRX Instruct (NVIDIA)
  • DeepSeek AI DeepSeek-R1 (NVIDIA)
  • DeepSeek AI DeepSeek-R1 Distill Llama 8B (NVIDIA)
  • DeepSeek AI DeepSeek-R1 Distill Qwen 32B (NVIDIA)
  • DeepSeek AI DeepSeek-R1 Distill Qwen 7B (NVIDIA)
  • Google Gemma 2B (NVIDIA)
  • Google Gemma 3 12B It (NVIDIA)
  • Google Gemma 3 1B It (NVIDIA)
  • Google Gemma 3 27B It (NVIDIA)
  • Google Gemma 3 4B It (NVIDIA)
  • Google Gemma 7B (NVIDIA)
  • Microsoft Phi 3 Medium 128K Instruct (NVIDIA)
  • Microsoft Phi 3 Medium 4K Instruct (NVIDIA)
  • Microsoft Phi 3 Mini 128K Instruct (NVIDIA)
  • Microsoft Phi 3 Mini 4K Instruct (NVIDIA)
  • Microsoft Phi 3 Small 128K Instruct (NVIDIA)
  • Microsoft Phi 3 Small 8K Instruct (NVIDIA)
  • Microsoft Phi 3 Vision 128K Instruct (NVIDIA)
  • Microsoft Phi 3.5 Mini Instruct (NVIDIA)
  • Microsoft Phi 3.5 Moe Instruct (NVIDIA)
  • Microsoft Phi 3.5 Vision Instruct (NVIDIA)
  • Microsoft Phi 4 Mini Instruct (NVIDIA)
  • Microsoft Phi 4 Multimodal Instruct (NVIDIA)
  • NVIDIA Llama 3.1 Nemotron 70B Reward (NVIDIA)
  • NVIDIA Llama 3.1 Nemotron Nano 8B V1 (NVIDIA)
  • NVIDIA Llama 3.3 Nemotron Super 49B V1 (NVIDIA)
  • NVIDIA Nemotron 4 340B Instruct (NVIDIA)
  • NVIDIA Nemotron 4 340B Reward (NVIDIA)
  • NVIDIA Nemotron 4 Mini Hindi 4B Instruct (NVIDIA)
  • Qwen Qwen2 7B Instruct (NVIDIA)
  • Qwen Qwen2.5 7B Instruct (NVIDIA)
  • Qwen Qwen2.5 Coder 32B Instruct (NVIDIA)
  • Qwen Qwen2.5 Coder 7B Instruct (NVIDIA)
  • Qwen Qwq 32B (NVIDIA)
  • Tiiuae Falcon3 7B Instruct (NVIDIA)
  • Writer Palmyra Med 70B (NVIDIA)
  • Writer Palmyra Med 70B 32K (NVIDIA)
  • Yentinglin Llama 3 Taiwan 70B Instruct (NVIDIA)
  • Zyphra Zamba2 7B Instruct (NVIDIA)
  • GPT-4o
  • GPT-4o mini
  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4.1-nano
  • gpt-5
  • gpt-5-mini
  • gpt-5-nano
  • gpt-5.1
  • gpt-5.2
  • gpt-5.4
  • o1
  • o3
  • o3-mini
  • o3-pro
  • o4-mini
  • gemini-2.0-flash
  • gemini-2.0-flash-lite
  • gemini-2.0-flash-thinking
  • gemini-2.0-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-pro
  • gemini-3.0-flash-preview
  • gemini-3.0-pro-image-preview
  • gemini-3.0-pro-preview
  • gemini-3.1-flash-image-preview
  • gemini-3.1-pro-preview
  • gemini-3.1-pro-preview-customtools
  • Exam Works
  • Palmyra Base
  • Palmyra Beta
  • Palmyra E
  • Palmyra Instruct
  • Palmyra Instruct 30
  • Palmyra Large
  • Palmyra Med
  • Palmyra X
  • Palmyra X 32K
  • Silk Road
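The precedence between supported_parameters, based_on, and the default list can be sketched as follows. The built-in table here holds a single illustrative entry copied from the gpt-5.2 example earlier on this page, not Galileo’s actual data, and the assumption that an explicit supported_parameters list wins when both keys are present is the sketch author’s, not documented behavior:

```python
DEFAULT_PARAMETERS = ["frequency_penalty", "max_tokens", "presence_penalty",
                      "stop", "temperature", "top_p"]

# Illustrative stand-in for Galileo's built-in model table (one entry only).
BUILTIN_MODELS = {
    "gpt-5.4": ["max_tokens", "n", "reasoning_effort", "stop_sequences",
                "temperature", "tool_choice", "tools", "verbosity"],
}

def resolve_supported_parameters(model):
    """An explicit list wins, then based_on, then the documented defaults."""
    if "supported_parameters" in model:
        return model["supported_parameters"]
    if "based_on" in model:
        return BUILTIN_MODELS[model["based_on"]]
    return DEFAULT_PARAMETERS
```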

General properties

  • "default_model": Name of the model to be used by default when a model is not selected. If not provided, defaults to the first model in model_properties.
  • "endpoint": URL of the AI provider’s Chat Completions endpoint. Galileo will append /chat/completions to this base URL. The endpoint must be compatible with the OpenAI Chat Completions API.
  • "custom_header_mapping": A dictionary mapping internal fields (job_id, user_id, project_id, run_id) to custom header names that will be set on inference requests.
  • "headers": A dictionary of header names as keys, and their corresponding values. Will be set on the inference request, overriding any existing value.
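The combined effect of custom_header_mapping and headers on an outgoing request can be illustrated like this; the field values and header names below are hypothetical:

```python
def build_request_headers(config, context):
    """Apply custom_header_mapping, then static headers (which override)."""
    headers = {}
    # custom_header_mapping: internal field name -> custom header name
    for field, header_name in config.get("custom_header_mapping", {}).items():
        if field in context:
            headers[header_name] = context[field]
    # headers: static values, overriding anything already set
    headers.update(config.get("headers", {}))
    return headers

config = {
    "custom_header_mapping": {"project_id": "X-Project-Id", "run_id": "X-Run-Id"},
    "headers": {"X-Env": "staging"},
}
sent = build_request_headers(config, {"project_id": "p-123", "user_id": "u-9"})
```

Here user_id has no mapping and run_id is absent from the context, so only X-Project-Id and the static X-Env header are sent.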

API schema

To configure custom integrations via Galileo’s API instead of via the UI, refer to the Custom Integrations API reference.

Troubleshooting

  • Make sure you have valid authentication credentials (e.g. an API key).
  • Make sure the model name is exactly as specified through the provider.
  • Make sure that requests are being sent to the provider’s endpoint.
As an example, use this curl command to verify that an API key and model for Portkey have been configured correctly:
curl https://api.portkey.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -d '{
    "model": "YOUR_MODEL",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Portkey"}
    ],
    "max_tokens": 512
  }'

Advanced usage: custom LLM handlers

Custom LLM handlers are only available on single-tenant Galileo deployments. They require API v1.848.0+ and runners v2.239.0+.
The JSON properties example works when your LLM provider exposes a standard OpenAI-compatible /chat/completions endpoint. However, some providers use proprietary request formats, non-standard response structures, or custom authentication flows that can’t be handled by configuration alone. For these cases, you can write a custom LLM handler — a Python class that gives you full control over how Galileo sends requests to your model and interprets the responses.

When to use a custom handler

  • Your provider’s API doesn’t follow the OpenAI /chat/completions format
  • You need to transform requests or responses (e.g., different payload structure, custom headers)
  • Your authentication flow goes beyond OAuth2 or API keys (e.g., signed requests, mTLS)
  • You need custom retry logic or error handling

Writing a handler

Create a Python file with a class that extends litellm.CustomLLM. Your class must implement the acompletion method, which receives the standard LiteLLM inputs and must return a ModelResponse:
# File: proprietary_handler.py

from litellm import CustomLLM
from litellm.types.utils import ModelResponse
import httpx


class ProprietaryLLMHandler(CustomLLM):
    """Custom handler for proprietary LLM API."""

    def __init__(self, timeout: int = 30, retry_count: int = 1):
        super().__init__()
        self.timeout = timeout
        self.retry_count = retry_count

    async def acompletion(
        self,
        model: str,
        messages: list,
        api_base: str,
        custom_llm_provider: str,
        **kwargs
    ) -> ModelResponse:
        """Handle async completion requests to the proprietary API."""

        # Transform messages to provider's format
        payload = self._transform_request(messages, **kwargs)

        # Make the async API call
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{api_base}/generate",
                json=payload,
                timeout=self.timeout
            )
            response.raise_for_status()

        # Transform response to LiteLLM format
        return self._transform_response(response.json(), model)

    def _transform_request(self, messages: list, **kwargs) -> dict:
        """Transform LiteLLM messages to provider format."""
        return {
            "prompt": messages,
            "temperature": kwargs.get("temperature", 0.7),
            "max_tokens": kwargs.get("max_tokens", 1024)
        }

    def _transform_response(self, data: dict, model: str) -> ModelResponse:
        """Transform provider response to LiteLLM format."""
        return ModelResponse(
            id=data.get("id", "response-id"),
            choices=[{
                "message": {
                    "role": "assistant",
                    "content": data["output"]
                },
                "finish_reason": "stop"
            }],
            model=model,
            usage={
                "prompt_tokens": data.get("input_tokens", 0),
                "completion_tokens": data.get("output_tokens", 0),
                "total_tokens": data.get("total_tokens", 0)
            }
        )
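Before wiring the handler into Galileo, the two transform methods can be sanity-checked locally. The sketch below mirrors them as plain functions so it runs without litellm installed; the payload and response shapes are the hypothetical ones from the handler above:

```python
def transform_request(messages, **kwargs):
    """litellm-free mirror of ProprietaryLLMHandler._transform_request."""
    return {
        "prompt": messages,
        "temperature": kwargs.get("temperature", 0.7),
        "max_tokens": kwargs.get("max_tokens", 1024),
    }

def transform_response(data, model):
    """litellm-free mirror of _transform_response, returning a plain dict."""
    return {
        "id": data.get("id", "response-id"),
        "model": model,
        "choices": [{
            "message": {"role": "assistant", "content": data["output"]},
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": data.get("input_tokens", 0),
            "completion_tokens": data.get("output_tokens", 0),
            "total_tokens": data.get("total_tokens", 0),
        },
    }

payload = transform_request([{"role": "user", "content": "ping"}], temperature=0.2)
reply = transform_response(
    {"output": "pong", "input_tokens": 3, "output_tokens": 1, "total_tokens": 4},
    "proprietary-model",
)
```

Once the mappings look right here, the same logic inside the real handler only differs by wrapping the result in a litellm ModelResponse.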

Configuring the handler in your integration payload

Reference your handler using the custom_llm_config field:
  • "file_name" (string, required): Python file containing the CustomLLM class (e.g., "proprietary_handler.py").
  • "class_name" (string, required): Class name (must be a litellm.CustomLLM subclass).
  • "init_kwargs" (object, optional): Keyword arguments passed to the handler’s constructor.
Example:
{
  "authentication_type": "none",
  "model_properties": [
    {
      "name": "proprietary-model",
      "alias": "Proprietary Model",
      "based_on": "gpt-5.4"
    }
  ],
  "endpoint": "https://ai.example.com/inference",
  "custom_llm_config": {
    "file_name": "proprietary_handler.py",
    "class_name": "ProprietaryLLMHandler",
    "init_kwargs": {
      "timeout": 60,
      "retry_count": 3
    }
  }
}
The model properties and general properties also apply to custom LLM handler JSON configuration.
The custom LLM handler receives the endpoint as the api_base parameter. It’s up to the handler’s implementation to use it or ignore it.

Deploying handler files

Handler files must be placed on the Galileo runners container filesystem.
  • GALILEO_CUSTOM_LLMS_ENABLED (boolean, default: "false"): Set to true to enable custom LLM support.
  • GALILEO_CUSTOM_LLMS_DIRECTORY (string, default: "/opt/custom_llms"): Directory where handler files are located.
Place your .py files directly in the configured directory (nested paths are not supported):
/opt/custom_llms/proprietary_handler.py
/opt/custom_llms/enterprise_adapter.py
Deployment options:
  1. Volume mount — Mount a volume containing your handler files at /opt/custom_llms
  2. Custom image — Build a custom runner image with handler files copied in
  3. Custom directory — Set GALILEO_CUSTOM_LLMS_DIRECTORY to a different path

Security notes

  • Tokens are encrypted before storage
  • OAuth2 client credentials should be kept confidential and rotated regularly
  • Custom LLM handler files should be reviewed for security before deployment