AI4OS LLM¶
As part of the tool repertoire that the AI4OS stack offers to increase our users’ productivity, we currently offer an LLM Chatbot service (in beta!) that allows users to summarize information, get code recommendations, ask questions about the documentation, and more.
We care about user privacy, so it’s important to note that your chat history is erased whenever you delete it, and no data is retained by the platform (privacy policy).
AI4OS LLM vs self-deployed LLM¶
We also offer a self-deployed LLM option, so which one should you choose?
Self-deployed LLM
lets you deploy a wide variety of models from a catalog,
resources are exclusively dedicated to you and your selected coworkers,
you are the admin, so you can configure model and UI parameters,
you are the admin, so you can create your own Knowledge Bases as persistent memory banks,
you are the admin, so you can use Functions to create your own agents that use custom prompts, custom Knowledge Bases, and custom input/output filtering,
AI4OS LLM (platform wide)
uses more powerful GPUs, so it offers bigger and more accurate LLMs,
zero configuration needed, access directly with your AI4OS credentials,
the backend (vLLM) is load balanced, so it can offer lower latency,
has a dedicated RAG instance for faster queries,
comes with some pre-configured helpful agents, like the AI4EOSC Assistant that helps you navigate the project’s documentation,
By default, we recommend using the AI4OS LLM, which offers a better experience for most users. Users with more customized needs should nevertheless try the self-deployment option.
In any case, remember that both options are compatible: you can deploy your own LLM and still access the platform-wide one. The best of both worlds! 🚀
Both options offer a privacy-first design: once you delete your chats or knowledge bases, the data is immediately wiped from the platform.
Login¶
The service is available at: https://llm.dev.ai4eosc.eu
To access the LLM service, you first need to register in the platform.

Once you log in, you will arrive at a landing page where you can select the model you want to interact with.

The current available models are:
Small-2409: a medium-size model from the Mistral family (22B parameters) with a smaller context window (32K tokens), released in September 2024. This is the default model.
DeepSeek-R1-Distill-Llama-8B: a small distillation (8B parameters) of the original DeepSeek R1 model, released in January 2025. The distillation is nowhere near as performant as the original model, but it serves as a nice demo of a thinking model.
Assistant: our custom model designed to help you navigate our documentation.
Qwen2.5-VL-7B-Instruct: a small model for vision-related tasks, released in August 2024.
Now, let’s explore some common usages of the tool. Keep in mind that the AI4OS LLM is built with OpenWebUI, so you can always find further information in their documentation.
Using the AI4OS LLM¶
Chat with the LLM¶
We can ask generic questions to the model.

Remember that if your answer relies on up-to-date information, you can always enable Web search under the corresponding button.
Summarize a document¶
Under the corresponding button, you can select Upload files.
This will allow you to query a document with questions.

Ask questions about the documentation¶
Important
This service is currently under development, so it might not be accessible to you.
In the upper left corner, you can select the AI4EOSC/Assistant model to ask questions about the platform. The LLM will use our documentation as a knowledge base to provide truthful answers to your questions.

Use Vision models¶
If you select the Qwen2.5-VL-7B-Instruct model, you can upload images and ask questions about them.
To upload an image, click the corresponding button; you will be offered the possibility to either Capture an image or Upload an image.
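If you prefer to send images to the vision model programmatically, the OpenAI-compatible chat API accepts images as base64-encoded data URLs. Below is a minimal sketch of building such a multimodal message; the payload layout follows the generic OpenAI vision spec, and the helper name is just illustrative:

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style multimodal user message embedding the image as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


# Pass the message to client.chat.completions.create(
#     model="AI4EOSC/Qwen2.5-VL-7B-Instruct", messages=[msg])
msg = build_vision_message("What is shown in this picture?", b"\x89PNG...")
```

The message can then be sent exactly like a text-only one through the API described further below.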
Here are some ideas on how to incorporate this into a scientific workflow:
Detexify a LaTeX equation
Generate LaTeX code for the above picture and render it below.

Digitize your handwritten notes
Can you generate a Mermaid graph from this sketch? To ensure valid code, make sure that text inside boxes follows the format `letter{…}`. For example `B{Some text}`.

Do you use it in other ways? We are happy to hear about it!
Integrate it with your own services¶
Retrieve the API endpoint/key¶
To integrate LLM completions into your workflow you need an API endpoint and an API key. There are two API options:
vLLM API (recommended): faster (load balanced), supports chat completions
API endpoint: https://llm.dev.ai4eosc.eu:8000
API key: AI4OS Keycloak → Personal Info → User metadata → LLM API key
OpenWebUI API: supports chat completions, supports Retrieval Augmented Generation
API endpoint: https://llm.dev.ai4eosc.eu/api
API key: AI4OS LLM → Settings → Account
Learn more on how to use API keys to integrate the AI4OS LLM into your own services (endpoints are compatible with the OpenAI API spec).
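Since both endpoints follow the OpenAI API spec, you can also call them with plain HTTP instead of the openai package. Here is a hedged sketch of assembling such a chat-completions request (the endpoint and key placeholder are the ones listed above; the helper name is illustrative, and the request is built but not sent):

```python
import json
import urllib.request

API_BASE = "https://llm.dev.ai4eosc.eu/api"  # or https://llm.dev.ai4eosc.eu:8000 for vLLM
API_KEY = "sk-********************************"  # your personal API key, see above


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style /chat/completions request (not sent here)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("AI4EOSC/Small", "What is the capital of France?")
# urllib.request.urlopen(req) would send it; the answer is in choices[0].message.content
```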
Use it as a code assistant with VScode¶
It’s very easy to use the AI4OS LLM as a code assistant, both locally and in the AI4OS Development Environment. To configure it:
In VScode, install the Continue.dev extension.
On the left-hand side bar, click the Continue icon. Then, in the panel, click the ⚙️ Open Continue Config.
Modify the config.json to add the AI4OS LLM model, using your API key:
{
  "models": [
    {
      "title": "AI4OS LLM",
      "provider": "openai",
      "model": "AI4EOSC/DeepSeek-R1-Distill-Llama-8B",
      "apiKey": "sk-********************************",
      "apiBase": "https://llm.dev.ai4eosc.eu/api",
      "useLegacyCompletionsEndpoint": false
    }
  ]
}
Voilà, you are done! Check the Continue short tutorial for a quick overview of how to use it.

Use it from within your Python code¶
To use the LLM from your Python scripts, you need to install the openai Python package. Then you can use the LLM as follows:
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.dev.ai4eosc.eu/api",
    api_key="******************",
)

completion = client.chat.completions.create(
    model="AI4EOSC/Small",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(completion.choices[0].message.content)
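The endpoint also supports streaming responses (pass stream=True to client.chat.completions.create), in which case tokens arrive as server-sent events. A minimal sketch of collecting the streamed deltas into the final answer; the data: {...} chunk layout is the standard OpenAI streaming format, demonstrated here on hypothetical sample lines rather than a live call:

```python
import json


def collect_stream(sse_lines):
    """Concatenate the content deltas of an OpenAI-style streaming response."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip empty keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # sentinel closing the stream
            break
        delta = json.loads(data)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)


# Hypothetical chunks as they would arrive over the wire:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Paris"}}]}',
    'data: {"choices": [{"delta": {"content": "."}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # → Paris.
```

With the openai client, the same logic is handled for you: iterating over the stream yields chunk objects whose chunk.choices[0].delta.content holds each token.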