Inference platform (OSCAR)¶
An OSCAR cluster consists of, among other components:

- A Kubernetes cluster that can optionally auto-scale the number of nodes within certain boundaries.
- MinIO, a high-performance object storage system, configured so that file uploads to a MinIO bucket can trigger the invocation of an OSCAR service to perform AI model inference.
- Knative, a FaaS platform, configured so that synchronous requests to an OSCAR service are handled via dynamically provisioned pods (containers) in the Kubernetes cluster.
The AI4OS Inference platform consists of a pre-deployed OSCAR cluster accessible exclusively to users belonging to the AI4EOSC (vo.ai4eosc.eu) and iMagine (vo.imagine-ai.eu) Virtual Organizations (VOs).
You can access the OSCAR cluster via:

- A command-line interface (CLI), accessible after logging in through EGI Check-In.
This cluster is provided for testing purposes and OSCAR services may be removed at any time depending on the underlying infrastructure capacity and usage rates. Should this happen, you can easily re-deploy the services from the corresponding FDL file.
You can also deploy your own OSCAR cluster, via the Infrastructure Manager, on whichever Cloud platform you have access to, thus choosing the computing infrastructure used to execute your AI models.
The cluster is used to deploy OSCAR services, each described by a Functions Definition Language (FDL) file that specifies, among other features:

- The Docker image, which includes the AI model (supporting the DEEPaaS API) together with all the libraries and data required to perform the inference.
- The computing requirements (CPUs, RAM, GPUs, etc.).
- The shell script executed, on each service invocation, inside a container created from the Docker image.
- (Optional) The link to a MinIO bucket and an input folder.
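As an illustrative sketch of the features listed above, an FDL file could look as follows. The cluster identifier, service name, image, and bucket paths are all hypothetical; consult the OSCAR documentation for the full FDL schema.

```yaml
functions:
  oscar:
  - my-oscar-cluster:        # cluster identifier (hypothetical)
      name: plant-classifier # service name (hypothetical)
      memory: 2Gi            # computing requirements
      cpu: '1.0'
      image: registry.example.org/plant-classifier:latest # Docker image with the AI model and DEEPaaS API
      script: script.sh      # shell script run inside the container on each invocation
      input:                 # optional MinIO bucket and input folder that trigger the service
      - storage_provider: minio.default
        path: plant-classifier/input
      output:
      - storage_provider: minio.default
        path: plant-classifier/output
```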
OSCAR services can be invoked in several ways (see Invoking services for further details):

- Asynchronously, by uploading files to a MinIO bucket, which triggers the OSCAR service upon each upload.
- Synchronously, by invoking the service from OSCAR CLI or via the OSCAR Manager's REST API. A certain number of pre-deployed containers can be kept up and running to mitigate the cold-start problem (the initial delays of the first invocations of the service).
- Through Exposed Services, for stateless services created from large containers that take too long to start for each service invocation. This is the case when supporting fast inference of pre-trained AI models that require close-to-real-time processing with high throughput. In a traditional serverless approach, the AI model weights would be loaded into memory for every invocation (each creating a new container); with this approach, the weights are loaded just once and the service performs the AI model inference for each subsequent request. An auto-scaled, load-balanced approach for these stateless services is supported.
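As a sketch of the first two invocation modes, assuming an installed OSCAR CLI with a registered cluster, a deployed service named `plant-classifier`, and a `$OSCAR_TOKEN` variable holding a valid access token (all hypothetical names):

```shell
# Asynchronous: upload a file to the service's MinIO input folder;
# the upload event triggers the service.
oscar-cli service put-file plant-classifier minio.default \
    ./leaf.jpg plant-classifier/input/leaf.jpg

# Synchronous: invoke the service from the CLI and wait for the result.
oscar-cli service run plant-classifier --input ./leaf.jpg

# Synchronous, via the OSCAR Manager's REST API (endpoint and
# authentication details as described in the OSCAR documentation).
curl -X POST "https://<oscar-endpoint>/run/plant-classifier" \
     -H "Authorization: Bearer $OSCAR_TOKEN" \
     --data-binary @leaf.jpg
```

These commands require access to a live OSCAR cluster; the exact flags and endpoint paths should be checked against the OSCAR documentation for your cluster's version.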
Extensive documentation on how to use an OSCAR cluster is available at https://docs.oscar.grycap.net.
The AI4OS inference platform based on OSCAR is not restricted to use cases involving scalable inference of AI models. It can also be used to offload sporadic (even if bursty) inference requests to a remote AI model (which can exploit GPUs).