AI4EOSC LLM

As part of the tool repertoire that the platform is offering to increase our user’s productivity, we are current offering an LLM Chatbot service that allows users to summarize information, get code recommendations, ask questions about the documentation, etc.

We care about user privacy, so it’s important to notice that your chat history will be erased whenever you delete it, and no data will be retained by the platform (privacy policy).

Requirements

🔓 You need a platform account with basic access level.

🤔 Platform LLM vs self-deployed LLM

We also offer a self-deployed LLM option, so which one should you choose?

Self-deployed LLM

  • available to full access level users

  • let’s you deploy a wide variety of models from a catalog,

  • resources are exclusively dedicated to you and your selected coworkers,

  • you are the admin, so you can configure model and UI parameters,

  • you are the admin, so you can create your own Knowledge Bases as persistent memory banks,

  • you are the admin, so you can use Functions to create your own agents that use custom prompts, custom Knowledge Bases, and custom input/output filtering,

Platform LLM

  • available to all users (basic access level and above)

  • uses more powerful GPUs, so it offers bigger and more accurate LLMs,

  • zero configuration needed, access directly with your credentials,

  • the backend (VLLM) is load balanced so it can offer lower latency,

  • comes with some pre-configured helpful agents, like the Assistant that helps you navigate the project’s documentation,

By default, we recommend using the platform LLM, which will offer a better experience for most users. Users with more custom needs should try nevertheless the self-deployment options.

Anyway, remember that both options are compatible: you can deploy your own LLM and still access the platform-wide one. The best of both worlds! 🚀

Both options offer a privacy-first design: once your delete your chats or knowledge bases, the data is immediately wiped out from the platform.

Login

Login into: https://chat.cloud.ai4eosc.eu

../_images/login.png

Once you login, you will arrive to a landing page where you will be able to select the model which you want to interact with.

../_images/landing.png

Available models might evolve, as we are constantly deploying newer, better and more efficient models. We typically server open-source/open-weights models from the Mistral family and the Qwen family. We also create custom agents, like the AI4EOSC Assistant that allows you to ask questions to our documentation.

Using the LLM

The AI4EOSC LLM interface is based on OpenWebUI, so please refer to the OpenWebUI documentation on how to use the different interface features (like the chat, the notes, the settings, etc).

In the following sections, we will briefly explain some common usage patterns.

Chat with the LLM

We can ask generic questions to the model.

../_images/chat.png

Remember that if you answer relies on up-to-date information, you can always enable Web search under the button.

Summarize a document

Under the button, you can select Upload files. This will allow you to query a document with questions.

../_images/upload-files.png

Ask questions about the documentation

In the upper left corner, you can select the AI4EOSC/Assistant model to ask questions about the platform. The LLM with use our documentation as knowledge base to provide truthful answers to your questions.

../_images/assistant.png

Use Vision models

In the model menu, select any model with the VISION tag. Then you will be able to upload images to the model and ask questions about them. To upload an image click the and you will be offered the possibility of either Capture an image or Upload an image.

Here are some ideas on how to incorporate this into a scientific workflow:

Detexify a LaTeX equation

Generate latex code for the above picture and render it below.

../_images/vision-detexify.png
Digitize your handwritten notes

Can you generate a Mermaid graph from this sketch? To ensure valid code, make sure that text inside boxes follows the format `letter{…}`. For example `B{Some text}`.

../_images/vision-mermaid.png

Do you use it in other ways? We are happy to hear!

Integrate it with your own services

You can access the LLM models via an OpenAI-compatible API for an easier integration with your own services.

Requirements

🔓 You need a platform account with intermediate access level or above.

Retrieve the API endpoint/key

The API endpoint to query the models is:

To generate the API keys go to the Dashboard profile, to the Secrets and API keys section. There you will be able to create a new API key selecting the name and the expiration date.

../_images/profile-llm-keys.png
ㅤ ℹ️ Budgets and rate limits

Each time you use an API key you will be consuming your daily budget. When you consume all your budget you will no longer be able to make further requests. After each day, your budget will be reset and you will be able to make calls again.

If you create different keys, both keys will consume the same budget.

Your budget depends on your current user access level.

(TPM = Tokens Per Minute, RPM = Requests Per Minute)

Group

Budget (credits) / day / user

TPM Limit / user

RPM Limit / user

ap-a

0.05

1000

2

ap-a1

0.075

1500

2

ap-b

0.1

2000

2

ap-u

1

20000

20

ap-d

1.5

30000

30


Each model will consume a different amount of resources. As a general rule of thumb:

  • small models (like Smol or OLMo) consume around 1e-8 per input token and 2e-8 per output token.

  • large models (like Mistral Small or Qwen 3) consume around 1e-7 per input token and 3e-7 per output token.

  • embedding models (like Qwen3 Embeddings) consume around 2e-8 per input token.

You can use OpenAI tokenizer to get a rough estimate of how a given text maps into tokens.

Use it as a code assistant with VScode

It’s very easy to use the LLM as a code assistant, both locally and in the Development Environment. To configure it:

  1. In VScode, install the Continue.dev extension.

  2. Open the Continue config file: /home/<user>/.continue/config.yaml

  3. Modify it to add the LLM model, using your API key:

models:
  - name: AI4EOSC LLM
    provider: openai
    model: AI4EOSC/mistralai/Mistral-Small-3.1-24B-Instruct-2503
    apiKey: "sk-************************************"
    apiBase: https://vllm.cloud.ai4eosc.eu/
    roles:
      - chat
      - edit
      - apply
  1. Voilá, you are done! Check the Continue short tutorial for a quick overview on how to use it.

../_images/continue.png

Use it from within your Python code

To use the LLM from your Python scripts you need to install the openai Python package. Then you can use the LLM as following:

from openai import OpenAI


client = OpenAI(
    base_url="https://vllm.cloud.ai4eosc.eu",
    api_key="sk-************************************",
)

completion = client.chat.completions.create(
    model="AI4EOSC/mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

print(completion.choices[0].message.content)
ㅤ ⚠️ Query token limit

The models served have a typical token limit per call. If your query exceeds this token limit, you will get an error message similar to:

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'max_tokens must be at least 1, got -30979.', 'type': 'BadRequestError', 'param': None, 'code': 400}

Implement a RAG pipeline

We also have a dedicated embeddings model that let’s you perform Retrieval Augmented Generation (RAG). This allows the model to ground its answers on the specific documents you pass to it. You can implement a RAG pipeline using the llama-index Python package, for example.

After installing the required packages,

pip install llama-index
pip install llama-index-llms-openai-like
pip install llama-index-embeddings-openai-like

define your demo pipeline:

from llama_index.core import Settings, VectorStoreIndex, Document, SimpleDirectoryReader
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai_like import OpenAILikeEmbedding


Settings.embed_model = OpenAILikeEmbedding(
    api_base="https://vllm.cloud.ai4eosc.eu",
    api_key="sk-************************************",
    model_name="AI4EOSC/Qwen/Qwen3-Embedding-4B",
)
Settings.llm = OpenAILike(
    api_base="https://vllm.cloud.ai4eosc.eu",
    api_key="sk-************************************",
    model="AI4EOSC/mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    context_window=25000,
    is_chat_model=True,
    is_function_calling_model=False,
)

# Simple document example 📄️
text_to_embed = "My favorite fruit are apples."
documents = [Document(text=text_to_embed)]

# But you can also load entire document folders 📂️
# documents = SimpleDirectoryReader(input_dir="/path/to/your/document/folder").load_data()

# Build index and query engine
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Ask the LLM a question
response = query_engine.query("What is my favorite fruit?")
print(response)

# > Apples are your favorite fruit.