
OpenAI API

API Introduction

This documentation provides detailed instructions for generating and using API keys to access locally running Large Language Models (LLMs) on the e-INFRA CZ infrastructure, specifically through the Open-WebUI interface https://chat.ai.e-infra.cz. The guide is tailored for researchers and scientists who aim to integrate these models into their applications, scripts, or AI workflows via API.

To access the Open-WebUI service and generate an API key, you must meet the following prerequisites:

  • a valid MetaCentrum account (for Czech research institutions), or
  • an active Masaryk University account (if affiliated).

For a comprehensive description of available models, see the Chat AI documentation.

Creating an API Key

API keys serve as authentication tokens to securely access Open-WebUI’s API endpoint. Follow these steps to generate and use your API key:

Step-by-Step Instructions

  1. Go to the Settings section of the Open-WebUI interface https://chat.ai.e-infra.cz.
  2. Navigate to the Account (Účet) tab.
  3. Click on API keys to display them.
  4. Ignore the JWT token; under API key, either generate a new key or reveal the existing one.
  5. Copy the generated API key and store it securely.
  6. Use this key in API requests to authenticate and access Open-WebUI services.
  7. The base endpoint for the Open-WebUI API is: https://llm.ai.e-infra.cz/v1/. This endpoint follows the OpenAI API specification, enabling compatibility with many existing LLM frameworks and applications.
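
Because the endpoint is OpenAI-compatible, the official openai Python client can be pointed at it directly. A minimal sketch, assuming openai>=1.0 is installed and your key is exported as E_INFRA_API_TOKEN:

import os

from openai import OpenAI

# Point the standard client at the e-INFRA CZ endpoint instead of api.openai.com.
client = OpenAI(
    base_url="https://llm.ai.e-infra.cz/v1",
    api_key=os.environ["E_INFRA_API_TOKEN"],
)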

Using the API Key

Listing Available Models

Before querying a model, make sure you use the correct model name. To retrieve the list of all models available on the e-INFRA CZ infrastructure, run the following command, which uses curl and jq. Replace ${E_INFRA_API_TOKEN} with your real token.

curl -s -H "Authorization: Bearer ${E_INFRA_API_TOKEN}" https://llm.ai.e-infra.cz/v1/models | jq '.data[].id'

Expected Output (example):

"llama3.3:latest"
"llama3.3:70b-instruct-fp16"
"deepseek-r1:32b-qwen-distill-fp16"
"qwen2.5-coder:32b-instruct-q8_0"
"aya-expanse:latest"

This list reflects the model identifiers (id) that can be queried via the API. The identifiers follow a naming convention, typically including:

  • The model name (e.g., llama3.3, qwen2.5-coder).
  • A quantization tag (e.g., fp16, q8_0) that indicates how the model is optimized for inference (important for performance and resource consumption).
  • A variant or revision (e.g., :latest, :70b-instruct-fp16).
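
The same listing can be done from Python; a short sketch with the openai client (openai>=1.0, key in E_INFRA_API_TOKEN) that also splits each identifier into its name and tag:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.ai.e-infra.cz/v1",
    api_key=os.environ["E_INFRA_API_TOKEN"],
)

# Print each model id, split into base name and variant/quantization tag.
for model in client.models.list().data:
    name, _, tag = model.id.partition(":")  # e.g. "llama3.3", "latest"
    print(f"{name:32s} {tag}")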

Example API Request

Below is an example of how to use the API key to query the Llama 3.3 model (llama3.3:latest) with a chat completions request:

curl https://llm.ai.e-infra.cz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${E_INFRA_API_TOKEN}" \
  -d '{
    "model": "llama3.3:latest",
    "messages": [
      {
        "role": "user",
        "content": "Explain the impact of machine learning on climate research in 100 words or less."
      }
    ]
  }'

Expected Output (example):

{
  "id": "chatcmpl-XYZ123",
  "object": "chat.completion",
  "model": "llama3.3:latest",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Machine learning (ML) is revolutionizing climate research by unlocking unprecedented insights..."
      },
      "index": 0,
      "finish_reason": "length"
    }
  ]
}
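
The same request can be issued from Python; a sketch with the openai client (openai>=1.0, key in E_INFRA_API_TOKEN):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.ai.e-infra.cz/v1",
    api_key=os.environ["E_INFRA_API_TOKEN"],
)

response = client.chat.completions.create(
    model="llama3.3:latest",
    messages=[
        {
            "role": "user",
            "content": "Explain the impact of machine learning on climate research in 100 words or less.",
        }
    ],
)
print(response.choices[0].message.content)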

Framework Integration

The e-INFRA CZ LLM API is designed to be compatible with the OpenAI API specification, meaning that it can be integrated into frameworks originally built for OpenAI’s services.

PydanticAI Integration Example

PydanticAI is a framework that simplifies LLM interactions with OpenAI-compatible models. The following example configuration authenticates against the e-INFRA CZ LLM endpoint; use similar settings for other frameworks:

import os

from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Point the OpenAI-compatible model class at the e-INFRA CZ endpoint.
model = OpenAIModel(
    'deepseek-r1',
    provider=OpenAIProvider(
        base_url="https://llm.ai.e-infra.cz/v1",
        api_key=os.getenv("E_INFRA_API_TOKEN"),
    ),
)
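
Continuing the snippet above, the configured model can be handed to a pydantic_ai Agent and queried. A hedged sketch; in recent pydantic-ai releases the result text lives in result.output (older releases use result.data):

from pydantic_ai import Agent

agent = Agent(model)  # model as configured above
result = agent.run_sync("Summarize retrieval-augmented generation in one sentence.")
print(result.output)  # result.data on older pydantic-ai versions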

Beyond PydanticAI, similar configurations can be applied to:

  • LangChain (see the sketch below)
  • LlamaIndex
  • FastAPI-based applications
  • Any other framework or client that supports custom OpenAI API endpoints.
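
For LangChain, a comparable sketch, assuming the langchain-openai package (parameter names as in recent releases):

import os

from langchain_openai import ChatOpenAI

# Point LangChain's OpenAI chat wrapper at the e-INFRA CZ endpoint.
llm = ChatOpenAI(
    model="llama3.3:latest",
    base_url="https://llm.ai.e-infra.cz/v1",
    api_key=os.environ["E_INFRA_API_TOKEN"],
)

print(llm.invoke("Hello!").content)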

Reasoning Models in the API

Some models are hybrid, supporting both reasoning (thinking) and non-reasoning modes.

In the chat UI, most hybrid models are preconfigured to run in reasoning mode, which means you may see intermediate thinking output before the final response is returned. In the API, however, the default behavior depends on the specific model.

DeepSeek v3.2

By default, DeepSeek v3.2 runs without reasoning enabled. To enable reasoning, pass additional parameters via chat_template_kwargs:

curl https://llm.ai.e-infra.cz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${E_INFRA_API_TOKEN}" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {
        "role": "user",
        "content": "What are 5 creative things I could do with my kids'\'' art? I don'\''t want to throw them away, but it'\''s also so much clutter."
      }
    ],
    "chat_template_kwargs": {
      "thinking": true
    }
  }'
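
From the openai Python client, non-standard parameters such as chat_template_kwargs can be forwarded through extra_body; a minimal sketch:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.ai.e-infra.cz/v1",
    api_key=os.environ["E_INFRA_API_TOKEN"],
)

# chat_template_kwargs is not part of the standard OpenAI request schema,
# so the client must pass it via extra_body.
response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    extra_body={"chat_template_kwargs": {"thinking": True}},
)
print(response.choices[0].message.content)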

For convenience—and for environments where chat_template_kwargs cannot be used (for example, certain AI agents)—we also provide a dedicated reasoning variant named deepseek-v3.2-thinking, which is permanently forced into thinking mode.

GLM-4.7

In contrast, the GLM-4.7 model enables reasoning mode by default. To disable reasoning and return only the final response, explicitly turn it off using chat_template_kwargs:

curl https://llm.ai.e-infra.cz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${E_INFRA_API_TOKEN}" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {
        "role": "user",
        "content": "What are 5 creative things I could do with my kids'\'' art? I don'\''t want to throw them away, but it'\''s also so much clutter."
      }
    ],
    "chat_template_kwargs": {
      "enable_thinking": false
    }
  }'

Use these options to control whether intermediate reasoning is included in API responses, depending on your application’s needs.
