Deployment options in AI4OS
This page serves as a guide, summarizing the pros and cons of each deployment option. With this information in mind, users can decide where best to deploy their models.
Option | ✅ Pros | ❌ Cons
---|---|---
Deploy in AI4OS (serverless) (model is loaded on demand) | Scales to many concurrent user queries. No resources are consumed between queries. | Higher latency, since the model must be loaded on demand.
Deploy in AI4OS (dedicated resources) (model is always loaded) | Low latency, since the model is always loaded. | Handles only a few concurrent user queries. Resources are consumed even while the service is idle.
Deploy in your own cloud resources | Available to users who do not belong to the project. | You must provision and manage the infrastructure yourself.
Given these trade-offs, we recommend the following typical workflows:
Deploy in AI4OS (serverless) if your service does not need low latency and expects either (1) lots of concurrent user queries or (2) user queries that are spaced out in time. This is the default recommendation for all users.
Deploy in AI4OS (dedicated resources) if your service needs to handle, with low latency, a few concurrent user queries that arrive close together in time (see the latency-check sketch after this list).
Deploy in your own cloud resources if you do not belong to the project.
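To see the latency difference in practice, the following is a minimal sketch that times a single prediction request against a deployed endpoint. The endpoint URL and payload format are hypothetical placeholders, not actual AI4OS values; substitute those of your own deployment. A serverless deployment will typically show a longer first-query time (while the model loads) than a dedicated one.

```python
# Minimal sketch: time a single prediction request to compare the
# latency of serverless vs. dedicated deployments.
# NOTE: the endpoint URL and payload below are hypothetical placeholders;
# replace them with the values of your actual deployment.
import time

import requests

ENDPOINT = "https://my-deployment.example.org/predict"  # hypothetical URL
PAYLOAD = {"input": "sample query"}                     # hypothetical payload

start = time.perf_counter()
response = requests.post(ENDPOINT, json=PAYLOAD, timeout=300)
response.raise_for_status()
elapsed = time.perf_counter() - start

print(f"Prediction returned in {elapsed:.2f} s")
```

Running this once against a freshly idle serverless deployment and once against a dedicated deployment makes the cold-start cost of on-demand model loading visible.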
If you need to generate one-off predictions on a given dataset, rather than maintain a running service, you have two options:
Deploy in AI4OS (dedicated resources) with a GPU, make your predictions and delete the deployment.
Deploy in AI4OS (serverless) and upload your dataset files to a bucket to perform async predictions (as sketched below). If your dataset is very large, you can contact support to create a custom batch-processing pipeline tailored to your use case.
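As a minimal sketch of the bucket upload, the snippet below uses the boto3 client against an S3-compatible endpoint (for example, MinIO). The endpoint URL, credentials, and bucket name are hypothetical placeholders; use the values provided for your deployment.

```python
# Minimal sketch: upload dataset files to an S3-compatible bucket so that
# a serverless deployment can process them asynchronously.
# NOTE: the endpoint, credentials, and bucket name are hypothetical
# placeholders; replace them with the values of your actual deployment.
from pathlib import Path

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example.org",  # hypothetical endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

BUCKET = "my-input-bucket"  # hypothetical bucket name

# Upload every file in the local dataset folder; each object becomes one
# input for the asynchronous prediction run.
for path in Path("dataset").iterdir():
    if path.is_file():
        s3.upload_file(str(path), BUCKET, f"inputs/{path.name}")
        print(f"Uploaded {path.name}")
```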
Beyond the above deployment options in the AI4OS Dashboard, there are several other deployment methods: