
What are custom integrations?

Galileo’s custom integrations provide a flexible way to set up LLMs that aren’t supported through the existing integrations, for example providers behind non-standard proxies or those that use proprietary authentication or inference protocols.

Configuring custom integrations in the Galileo console

1. Navigate to Integrations

Navigate to Settings > Integrations in the Galileo console.

2. Add Custom Integration

Find the Custom integration card and click Add Integration.

3. Configure Integration

Paste a valid JSON payload and click Save changes.

This JSON example uses API key authentication. See JSON properties below for an explanation of these and other properties, and instructions for other authentication types.
{
  "authentication_type": "api_key",
  "api_key_header": "YOUR_API_KEY_HEADER",
  "api_key_value": "YOUR_API_KEY_VALUE",
  "model_properties": [
    {
      "name": "gpt-5.2",
      "alias": "GPT 5.2",
      "supported_parameters": [
        "max_tokens",
        "n",
        "reasoning_effort",
        "stop_sequences",
        "temperature",
        "tool_choice",
        "tools",
        "verbosity"
      ]
    },
    {
      "name": "gpt-5.4",
      "alias": "GPT 5.4",
      "based_on": "gpt-5.4"
    },
    {
      "name": "claude-opus-4-6",
      "alias": "Opus 4.6",
      "based_on": "Claude Opus 4.6"
    }
  ],
  "endpoint": "https://YOUR_PROVIDER_BASE_URL"
}
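The console rejects malformed JSON, so it can save a round trip to validate the payload locally before pasting it. Below is a minimal sketch in Python; the required-key check is inferred from the examples in this guide, not from an official schema:

```python
import json

# Keys present in every example in this guide; inferred, not an official schema.
EXPECTED_KEYS = {"authentication_type", "model_properties", "endpoint"}

def check_payload(raw: str) -> list:
    """Return a list of problems found in a custom-integration payload string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - payload.keys())]
    for model in payload.get("model_properties", []):
        if "name" not in model:
            problems.append("model entry without a 'name'")
    return problems
```

Running `check_payload` over your payload string prints an empty list when the basic shape looks right; the console remains the authority on what is actually accepted.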
4. Test Integration

After saving, test the integration by selecting one of its models from a Playground, then run a prompt or evaluation that uses that model.

Note: the models will also be available for use with metrics in Galileo.

JSON properties

Authentication properties

API key authentication

For providers that use API key authentication (for example, Portkey), specify the following properties:
  • "authentication_type": "api_key".
  • "api_key_header": The name of the header that the AI provider uses for API key authentication. For example, for Portkey the name is "x-portkey-api-key". Consult your AI provider’s documentation to find the required header name.
  • "api_key_value": The API key to be used.
{
  "authentication_type": "api_key",
  "api_key_header": "YOUR_API_KEY_HEADER",
  "api_key_value": "YOUR_API_KEY_VALUE",
  "model_properties": [
    {
      "name": "gpt-5.2",
      "alias": "GPT 5.2",
      "supported_parameters": [
        "max_tokens",
        "n",
        "reasoning_effort",
        "stop_sequences",
        "temperature",
        "tool_choice",
        "tools",
        "verbosity"
      ]
    },
    {
      "name": "gpt-5.4",
      "alias": "GPT 5.4",
      "based_on": "gpt-5.4"
    },
    {
      "name": "claude-opus-4-6",
      "alias": "Opus 4.6",
      "based_on": "Claude Opus 4.6"
    }
  ],
  "endpoint": "https://YOUR_PROVIDER_BASE_URL"
}

Bearer token authentication

For providers that use a static, pre-defined token for authentication (for example, Together AI), specify the following properties:
  • "authentication_type": "api_key".
  • "api_key_header": "Authorization".
  • "api_key_value": "Bearer YOUR_TOKEN", where YOUR_TOKEN is the authentication token issued by the provider.
{
  "authentication_type": "api_key",
  "api_key_header": "Authorization",
  "api_key_value": "Bearer YOUR_TOKEN",
  "model_properties": [
    {
      "name": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
      "alias": "Llama 4 Scout",
      "supported_parameters": [
        "max_tokens",
        "stop_sequences",
        "temperature",
        "top_p"
      ]
    }
  ],
  "endpoint": "https://api.together.xyz/v1"
}

OAuth2 authentication

For providers that require dynamically generated bearer tokens:
  • "authentication_type": "oauth2".
  • "oauth2_token_url": Endpoint URL of the OAuth2 server. This endpoint must be compatible with the OAuth2 Client Credentials Grant.
  • "authentication_scope": "YOUR_SCOPE" (optional, passed to the OAuth2 endpoint as the scope property of the token request payload).
  • "token": "{\"client_id\": \"YOUR_CLIENT_ID\", \"client_secret\": \"YOUR_CLIENT_SECRET\"}" (an escaped JSON string containing the static client ID and secret that will be sent to the OAuth2 endpoint).
The access_token field of the OAuth2 endpoint’s JSON response will be used as a Bearer token for LLM inference requests.
{
  "authentication_type": "oauth2",
  "oauth2_token_url": "https://auth.provider.com/oauth2/token",
  "token": "{\"client_id\": \"YOUR_CLIENT_ID\", \"client_secret\": \"YOUR_CLIENT_SECRET\"}",
  "authentication_scope": "inference",
  "model_properties": [
    {
      "name": "provider-model-v2",
      "alias": "Provider Model v2",
      "supported_parameters": [
        "max_tokens",
        "temperature",
        "top_p"
      ]
    }
  ],
  "endpoint": "https://api.provider.com/v1"
}
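The Client Credentials Grant is a single POST to the token endpoint, so you can check your client ID and secret independently of Galileo by reproducing that request yourself. A standard-library sketch, where the URL and credentials are placeholders; note that some OAuth2 servers expect the credentials in a Basic Authorization header rather than the form body, so check your provider:

```python
import json
import urllib.parse
import urllib.request

def build_token_request(token_url, client_id, client_secret, scope=None):
    """Build the Client Credentials Grant POST that the token endpoint expects."""
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        # Mirrors the optional "authentication_scope" property.
        fields["scope"] = scope
    return urllib.request.Request(
        token_url,
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

def fetch_access_token(req):
    """POST the request and extract access_token from the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

If `fetch_access_token` succeeds against your `oauth2_token_url`, the same client ID, secret, and scope should work in the integration payload.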

No authentication

For internal endpoints or providers that don’t require authentication:
  • "authentication_type": "none".
{
  "authentication_type": "none",
  "model_properties": [
    {
      "name": "internal-model-v1",
      "alias": "Internal Model",
      "supported_parameters": [
        "max_tokens",
        "temperature",
        "top_p"
      ]
    }
  ],
  "endpoint": "https://internal-gateway.local/v1"
}
The above examples cover the most common use cases. Most users won’t need to read beyond this point. The sections below are for advanced scenarios like custom LLM handlers and single-tenant deployments.

Model properties

The model_properties JSON property is used to configure the models that the AI provider supports. For each model, the following properties are available:
  • "name": Name of the model on the AI provider. The value is passed verbatim to the AI provider on the inference request.
  • "alias": The unique identifier of the model in Galileo. The value is displayed in the user interface when selecting models. If not provided, name will be used as a default value.
  • "based_on": An optional alias of a built-in Galileo model. If provided, the supported_parameters of the built-in model will be used for this custom integration model.
  • "supported_parameters": An explicit list of parameters that the custom model supports. An alternative to based_on: instead of copying the parameter list from a built-in model, it states the list directly.
If neither based_on nor supported_parameters is provided, this default parameter list is used: ["frequency_penalty", "max_tokens", "presence_penalty", "stop", "temperature", "top_p"]. The following built-in model aliases can be used with based_on:
  • Claude 3 Haiku
  • Claude 3.7 Sonnet
  • Claude Haiku 4.5
  • Claude Opus 4
  • Claude Opus 4.1
  • Claude Opus 4.5
  • Claude Opus 4.6
  • Claude Sonnet 4
  • Claude Sonnet 4.5
  • Claude Sonnet 4.6
  • GPT-4o (Azure)
  • GPT-4o mini (Azure)
  • gpt-4.1 (Azure)
  • gpt-4.1-mini (Azure)
  • gpt-4.1-nano (Azure)
  • gpt-5 (Azure)
  • gpt-5-mini (Azure)
  • gpt-5-nano (Azure)
  • o1 (Azure)
  • o3 (Azure)
  • o3 mini (Azure)
  • o4 mini (Azure)
  • AI21 - Jamba 1.5 Large (Bedrock)
  • AI21 - Jamba 1.5 Mini (Bedrock)
  • Amazon - Nova 2 Lite (Bedrock)
  • Amazon - Nova Lite (Bedrock)
  • Amazon - Nova Micro (Bedrock)
  • Amazon - Nova Premier (Bedrock)
  • Amazon - Nova Pro (Bedrock)
  • Anthropic - Claude 3 Haiku (Bedrock)
  • Anthropic - Claude 3.5 Sonnet (Bedrock)
  • Anthropic - Claude 3.5 Sonnet v2 (Bedrock)
  • Anthropic - Claude 3.7 Sonnet (Bedrock)
  • Anthropic - Claude 4 Opus (Bedrock)
  • Anthropic - Claude 4 Sonnet (Bedrock)
  • Anthropic - Claude Haiku 4.5 (Bedrock)
  • Anthropic - Claude Opus 4.1 (Bedrock)
  • Anthropic - Claude Opus 4.5 (Bedrock)
  • Anthropic - Claude Opus 4.6 (Bedrock)
  • Anthropic - Claude Sonnet 4.5 (Bedrock)
  • Anthropic - Claude Sonnet 4.6 (Bedrock)
  • Cohere - Command R v1 (Bedrock)
  • Cohere - Command R+ v1 (Bedrock)
  • DeepSeek - R1 (Bedrock)
  • Google - Gemma 3 12B (Bedrock)
  • Google - Gemma 3 27B (Bedrock)
  • Google - Gemma 3 4B (Bedrock)
  • Meta - Llama 3 70B Instruct v1 (Bedrock)
  • Meta - Llama 3 8B Instruct v1 (Bedrock)
  • Meta - Llama 3.1 70B Instruct v1 (Bedrock)
  • Meta - Llama 3.1 8B Instruct v1 (Bedrock)
  • Meta - Llama 3.2 11B Instruct (Bedrock)
  • Meta - Llama 3.2 1B Instruct (Bedrock)
  • Meta - Llama 3.2 3B Instruct (Bedrock)
  • Meta - Llama 3.2 90B Instruct (Bedrock)
  • Meta - Llama 3.3 70B Instruct (Bedrock)
  • Meta - Llama 4 Maverick 17B Instruct (Bedrock)
  • Meta - Llama 4 Scout 17B Instruct (Bedrock)
  • MiniMax - M2 (Bedrock)
  • Mistral - 7B Instruct (Bedrock)
  • Mistral - Large (Bedrock)
  • Mistral - Large 3 (Bedrock)
  • Mistral - Magistral Small (Bedrock)
  • Mistral - Ministral 14B (Bedrock)
  • Mistral - Ministral 3B (Bedrock)
  • Mistral - Ministral 8B (Bedrock)
  • Mistral - Pixtral Large 25.02 (Bedrock)
  • Mistral - Small 24.02 (Bedrock)
  • Mixtral - 8x7B Instruct (Bedrock)
  • Moonshot - Kimi K2 Thinking (Bedrock)
  • NVIDIA - Nemotron Nano 12B (Bedrock)
  • NVIDIA - Nemotron Nano 9B (Bedrock)
  • OpenAI - GPT OSS 120B (Bedrock)
  • OpenAI - GPT OSS 20B (Bedrock)
  • Qwen - Qwen3 32B (Bedrock)
  • Qwen - Qwen3 Coder 30B (Bedrock)
  • Qwen - Qwen3 Next 80B (Bedrock)
  • Qwen - Qwen3 VL 235B A22B (Bedrock)
  • Writer - Palmyra X4 (Bedrock)
  • Writer - Palmyra X5 (Bedrock)
  • Meta Llama 3.1 405B Instruct (Databricks)
  • Meta Llama 3.1 70B Instruct (Databricks)
  • ministral-3b
  • ministral-8b
  • ministral-large
  • ministral-small
  • BAAI BGE M3 (NVIDIA)
  • BigCode StarCoder2 15B (NVIDIA)
  • BigCode StarCoder2 7B (NVIDIA)
  • Databricks DBRX Instruct (NVIDIA)
  • DeepSeek AI DeepSeek-R1 (NVIDIA)
  • DeepSeek AI DeepSeek-R1 Distill Llama 8B (NVIDIA)
  • DeepSeek AI DeepSeek-R1 Distill Qwen 32B (NVIDIA)
  • DeepSeek AI DeepSeek-R1 Distill Qwen 7B (NVIDIA)
  • Google Gemma 2B (NVIDIA)
  • Google Gemma 3 12B It (NVIDIA)
  • Google Gemma 3 1B It (NVIDIA)
  • Google Gemma 3 27B It (NVIDIA)
  • Google Gemma 3 4B It (NVIDIA)
  • Google Gemma 7B (NVIDIA)
  • Microsoft Phi 3 Medium 128K Instruct (NVIDIA)
  • Microsoft Phi 3 Medium 4K Instruct (NVIDIA)
  • Microsoft Phi 3 Mini 128K Instruct (NVIDIA)
  • Microsoft Phi 3 Mini 4K Instruct (NVIDIA)
  • Microsoft Phi 3 Small 128K Instruct (NVIDIA)
  • Microsoft Phi 3 Small 8K Instruct (NVIDIA)
  • Microsoft Phi 3 Vision 128K Instruct (NVIDIA)
  • Microsoft Phi 3.5 Mini Instruct (NVIDIA)
  • Microsoft Phi 3.5 Moe Instruct (NVIDIA)
  • Microsoft Phi 3.5 Vision Instruct (NVIDIA)
  • Microsoft Phi 4 Mini Instruct (NVIDIA)
  • Microsoft Phi 4 Multimodal Instruct (NVIDIA)
  • NVIDIA Llama 3.1 Nemotron 70B Reward (NVIDIA)
  • NVIDIA Llama 3.1 Nemotron Nano 8B V1 (NVIDIA)
  • NVIDIA Llama 3.3 Nemotron Super 49B V1 (NVIDIA)
  • NVIDIA Nemotron 4 340B Instruct (NVIDIA)
  • NVIDIA Nemotron 4 340B Reward (NVIDIA)
  • NVIDIA Nemotron 4 Mini Hindi 4B Instruct (NVIDIA)
  • Qwen Qwen2 7B Instruct (NVIDIA)
  • Qwen Qwen2.5 7B Instruct (NVIDIA)
  • Qwen Qwen2.5 Coder 32B Instruct (NVIDIA)
  • Qwen Qwen2.5 Coder 7B Instruct (NVIDIA)
  • Qwen Qwq 32B (NVIDIA)
  • Tiiuae Falcon3 7B Instruct (NVIDIA)
  • Writer Palmyra Med 70B (NVIDIA)
  • Writer Palmyra Med 70B 32K (NVIDIA)
  • Yentinglin Llama 3 Taiwan 70B Instruct (NVIDIA)
  • Zyphra Zamba2 7B Instruct (NVIDIA)
  • GPT-4o
  • GPT-4o mini
  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4.1-nano
  • gpt-5
  • gpt-5-mini
  • gpt-5-nano
  • gpt-5.1
  • gpt-5.2
  • gpt-5.4
  • o1
  • o3
  • o3-mini
  • o3-pro
  • o4-mini
  • gemini-2.0-flash
  • gemini-2.0-flash-lite
  • gemini-2.0-flash-thinking
  • gemini-2.0-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-pro
  • gemini-3.0-flash-preview
  • gemini-3.0-pro-image-preview
  • gemini-3.0-pro-preview
  • gemini-3.1-flash-image-preview
  • gemini-3.1-pro-preview
  • gemini-3.1-pro-preview-customtools
  • Exam Works
  • Palmyra Base
  • Palmyra Beta
  • Palmyra E
  • Palmyra Instruct
  • Palmyra Instruct 30
  • Palmyra Large
  • Palmyra Med
  • Palmyra X
  • Palmyra X 32K
  • Silk Road
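The precedence between supported_parameters, based_on, and the default list can be sketched as follows. The built-in table here holds a single illustrative entry copied from the gpt-5.2 example earlier on this page, not Galileo’s actual data, and the assumption that an explicit supported_parameters list wins when both keys are present is the sketch author’s, not documented behavior:

```python
DEFAULT_PARAMETERS = ["frequency_penalty", "max_tokens", "presence_penalty",
                      "stop", "temperature", "top_p"]

# Illustrative stand-in for Galileo's built-in model table (one entry only).
BUILTIN_MODELS = {
    "gpt-5.4": ["max_tokens", "n", "reasoning_effort", "stop_sequences",
                "temperature", "tool_choice", "tools", "verbosity"],
}

def resolve_supported_parameters(model):
    """An explicit list wins, then based_on, then the documented defaults."""
    if "supported_parameters" in model:
        return model["supported_parameters"]
    if "based_on" in model:
        return BUILTIN_MODELS[model["based_on"]]
    return DEFAULT_PARAMETERS
```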

General properties

  • "default_model": Name of the model to be used by default when a model is not selected. If not provided, defaults to the first model in model_properties.
  • "endpoint": URL of the AI provider’s Chat Completions endpoint. Galileo will append /chat/completions to this base URL. The endpoint must be compatible with the OpenAI Chat Completions API.
  • "custom_header_mapping": A dictionary mapping internal fields (job_id, user_id, project_id, run_id) to custom header names that will be set on inference requests.
  • "headers": A dictionary of header names as keys, and their corresponding values. Will be set on the inference request, overriding any existing value.
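The combined effect of custom_header_mapping and headers on an outgoing request can be illustrated like this; the field values and header names below are hypothetical:

```python
def build_request_headers(config, context):
    """Apply custom_header_mapping, then static headers (which override)."""
    headers = {}
    # custom_header_mapping: internal field name -> custom header name
    for field, header_name in config.get("custom_header_mapping", {}).items():
        if field in context:
            headers[header_name] = context[field]
    # headers: static values, overriding anything already set
    headers.update(config.get("headers", {}))
    return headers

config = {
    "custom_header_mapping": {"project_id": "X-Project-Id", "run_id": "X-Run-Id"},
    "headers": {"X-Env": "staging"},
}
sent = build_request_headers(config, {"project_id": "p-123", "user_id": "u-9"})
```

Here user_id has no mapping and run_id is absent from the context, so only X-Project-Id and the static X-Env header are sent.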

API schema

To configure custom integrations via Galileo’s API instead of via the UI, refer to the Custom Integrations API reference.

Troubleshooting

  • Make sure you have valid authentication credentials (e.g. an API key).
  • Make sure the model name is exactly as specified through the provider.
  • Make sure that requests are being sent to the provider’s endpoint.
As an example, use this curl command to verify that an API key and model for Portkey have been configured correctly:
curl https://api.portkey.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -d '{
    "model": "YOUR_MODEL",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Portkey"}
    ],
    "max_tokens": 512
  }'

Advanced usage: custom LLM handlers

Custom LLM handlers are only available on single-tenant Galileo deployments. They require API v1.848.0+ and runners v2.239.0+.
The JSON properties example works when your LLM provider exposes a standard OpenAI-compatible /chat/completions endpoint. However, some providers use proprietary request formats, non-standard response structures, or custom authentication flows that can’t be handled by configuration alone. For these cases, you can write a custom LLM handler — a Python class that gives you full control over how Galileo sends requests to your model and interprets the responses.

When to use a custom handler

  • Your provider’s API doesn’t follow the OpenAI /chat/completions format
  • You need to transform requests or responses (e.g., different payload structure, custom headers)
  • Your authentication flow goes beyond OAuth2 or API keys (e.g., signed requests, mTLS)
  • You need custom retry logic or error handling

Writing a handler

Create a Python file with a class that extends litellm.CustomLLM. Your class must implement the acompletion method, which receives the standard LiteLLM inputs and must return a ModelResponse:
# File: proprietary_handler.py

from litellm import CustomLLM
from litellm.types.utils import ModelResponse
import httpx


class ProprietaryLLMHandler(CustomLLM):
    """Custom handler for proprietary LLM API."""

    def __init__(self, timeout: int = 30, retry_count: int = 1):
        super().__init__()
        self.timeout = timeout
        self.retry_count = retry_count

    async def acompletion(
        self,
        model: str,
        messages: list,
        api_base: str,
        custom_llm_provider: str,
        **kwargs
    ) -> ModelResponse:
        """Handle async completion requests to the proprietary API."""

        # Transform messages to provider's format
        payload = self._transform_request(messages, **kwargs)

        # Make the async API call
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{api_base}/generate",
                json=payload,
                timeout=self.timeout
            )
            response.raise_for_status()

        # Transform response to LiteLLM format
        return self._transform_response(response.json(), model)

    def _transform_request(self, messages: list, **kwargs) -> dict:
        """Transform LiteLLM messages to provider format."""
        return {
            "prompt": messages,
            "temperature": kwargs.get("temperature", 0.7),
            "max_tokens": kwargs.get("max_tokens", 1024)
        }

    def _transform_response(self, data: dict, model: str) -> ModelResponse:
        """Transform provider response to LiteLLM format."""
        return ModelResponse(
            id=data.get("id", "response-id"),
            choices=[{
                "message": {
                    "role": "assistant",
                    "content": data["output"]
                },
                "finish_reason": "stop"
            }],
            model=model,
            usage={
                "prompt_tokens": data.get("input_tokens", 0),
                "completion_tokens": data.get("output_tokens", 0),
                "total_tokens": data.get("total_tokens", 0)
            }
        )
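Before wiring the handler into Galileo, the two transform methods can be sanity-checked locally. The sketch below mirrors them as plain functions so it runs without litellm installed; the payload and response shapes are the hypothetical ones from the handler above:

```python
def transform_request(messages, **kwargs):
    """litellm-free mirror of ProprietaryLLMHandler._transform_request."""
    return {
        "prompt": messages,
        "temperature": kwargs.get("temperature", 0.7),
        "max_tokens": kwargs.get("max_tokens", 1024),
    }

def transform_response(data, model):
    """litellm-free mirror of _transform_response, returning a plain dict."""
    return {
        "id": data.get("id", "response-id"),
        "model": model,
        "choices": [{
            "message": {"role": "assistant", "content": data["output"]},
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": data.get("input_tokens", 0),
            "completion_tokens": data.get("output_tokens", 0),
            "total_tokens": data.get("total_tokens", 0),
        },
    }

payload = transform_request([{"role": "user", "content": "ping"}], temperature=0.2)
reply = transform_response(
    {"output": "pong", "input_tokens": 3, "output_tokens": 1, "total_tokens": 4},
    "proprietary-model",
)
```

Once the mappings look right here, the same logic inside the real handler only differs by wrapping the result in a litellm ModelResponse.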

Configuring the handler in your integration payload

Reference your handler using the custom_llm_config field:
  • "file_name" (string, required): Python file containing the CustomLLM class (e.g., "proprietary_handler.py").
  • "class_name" (string, required): Class name (must be a litellm.CustomLLM subclass).
  • "init_kwargs" (object, optional): Keyword arguments passed to the handler’s constructor.
Example:
{
  "authentication_type": "none",
  "model_properties": [
    {
      "name": "proprietary-model",
      "alias": "Proprietary Model",
      "based_on": "gpt-5.4"
    }
  ],
  "endpoint": "https://ai.example.com/inference",
  "custom_llm_config": {
    "file_name": "proprietary_handler.py",
    "class_name": "ProprietaryLLMHandler",
    "init_kwargs": {
      "timeout": 60,
      "retry_count": 3
    }
  }
}
The model properties and general properties also apply to custom LLM handler JSON configuration.
The custom LLM handler receives the endpoint as the api_base parameter. It’s up to the handler’s implementation to use it or ignore it.

Deploying handler files

Handler files must be placed on the Galileo runners container filesystem.
  • GALILEO_CUSTOM_LLMS_ENABLED (boolean, default: "false"): Set to true to enable custom LLM support.
  • GALILEO_CUSTOM_LLMS_DIRECTORY (string, default: "/opt/custom_llms"): Directory where handler files are located.
Place your .py files directly in the configured directory (nested paths are not supported):
/opt/custom_llms/proprietary_handler.py
/opt/custom_llms/enterprise_adapter.py
Deployment options:
  1. Volume mount — Mount a volume containing your handler files at /opt/custom_llms
  2. Custom image — Build a custom runner image with handler files copied in
  3. Custom directory — Set GALILEO_CUSTOM_LLMS_DIRECTORY to a different path

Security notes

  • Tokens are encrypted before storage
  • OAuth2 client credentials should be kept confidential and rotated regularly
  • Custom LLM handler files should be reviewed for security before deployment