Develop a model from scratch

Requirements

For Step 1, to use the Module’s template webpage, you need at least basic authentication.
For Step 2, if you plan to use the Development Environment, you need full authentication to be able to access the Dashboard. Otherwise you can develop locally.
For Step 4, we recommend having Docker installed (though it’s not strictly mandatory).

This tutorial explains how to develop an AI module from scratch.

If you are new to Machine Learning, you might want to check some useful Machine Learning resources we compiled to help you get started.

1. Setting the framework

This first step relies on the AI Modules Template to create a template for your new module:

Access and authenticate in the Template creation webpage.
Then select the minimal branch of the template and answer the questions.
Click on Generate and you will be able to download a .zip file with your project directory.

2. Prepare your development environment

Although it is possible to develop your code locally, we also offer the possibility to develop from our Development Environment.

This offers the benefits of:

developing on dedicated resources (including GPUs),
having direct access to your data hosted in the Storage,
developing on a Docker image that is already packaged with your favorite Deep Learning framework (e.g. PyTorch, TensorFlow),
developing on your favorite IDE (JupyterLab or VScode).

Check how to create and configure the Development Environment.

⚠️ Use a storage-synced folder to develop

We strongly recommend creating a deployment attached to storage.

By doing so, you can develop your code inside the /storage folder and any changes you make will instantly be synced with the Storage. This prevents losing work if a deployment crashes unexpectedly (which does happen from time to time).

This is what a Development Environment with VScode looks like out of the box:

3. Editing the module’s code

Drag and drop the zip file created in Step 1 into the VScode editor, then unpack it.

Install your project as a Python module in editable mode, so that the changes you make to the codebase are picked up by Python.

cd <project-name>
pip install -e .

Now you can start adding your AI model code inside <project-name>/<project-name>.
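To confirm that the editable install worked, you can ask Python whether it can locate your package. The snippet below is a minimal sketch using only the standard library; `my_project` is a hypothetical name standing in for your own `<project-name>`.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if Python can locate `package_name` on its import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "my_project" is a hypothetical name; replace it with your package name.
    name = "my_project"
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status}")
```

If the package is reported as not found, re-run the install command from the project root.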

Integrating with the DEEPaaS API

Once your code is included, you need your module to be able to interface with the DEEPaaS API, which allows any module in the Marketplace to be accessed using the same API. For this, you have to define in api.py the functions you want to make accessible to the user. For this tutorial we are going to head to our official demo module and copy-paste its api.py file.

Once this is done, check that DEEPaaS is interfacing correctly by running:

deepaas-run --listen-ip 0.0.0.0

Your module should be visible at http://0.0.0.0:5000/ui. If you don’t see your module, there is probably an error in your api.py file. Try running it with Python to get a more detailed debug message.

python api.py

Remember to leave the predefined get_metadata() function that comes with your module untouched, as all modules should have proper metadata.
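Before launching DEEPaaS, it can help to check that your api.py defines the functions you intend to expose. The sketch below assumes a small set of entry point names (`get_metadata`, `get_predict_args`, `predict`); this list is an assumption for illustration, so check the DEEPaaS documentation and the demo module for the authoritative set. An in-memory stand-in module is used so the example runs on its own.

```python
import types

# Assumed entry point names, for illustration only. The DEEPaaS documentation
# is the authoritative source for what a module API may define.
EXPECTED_FUNCTIONS = ("get_metadata", "get_predict_args", "predict")


def missing_entry_points(module: types.ModuleType) -> list[str]:
    """Return the expected names that `module` does not define as callables."""
    return [
        name
        for name in EXPECTED_FUNCTIONS
        if not callable(getattr(module, name, None))
    ]


# Stand-in for a real api.py, built in memory so the sketch is self-contained.
fake_api = types.ModuleType("api")
fake_api.get_metadata = lambda: {"name": "demo"}
fake_api.predict = lambda **kwargs: {"result": "ok"}

print(missing_entry_points(fake_api))  # ['get_predict_args']
```

With a real project, you would import your own api module and pass it to `missing_entry_points` instead of the stand-in.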

In general, you will have the following ports available when making a deployment in the platform:

5000: default port used for exposing the API (e.g. DEEPaaS)
6006: default port used for exposing a monitoring endpoint (e.g. Tensorboard)
8888: default port used for exposing the IDE (e.g. JupyterLab, VScode)
80: port available to let developers expose their custom endpoint
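When debugging a deployment, a quick way to see which of these ports is actually answering is to attempt a TCP connection to each one. This is a minimal sketch using the standard library; the host `127.0.0.1` is an assumption that you are running the check from inside the deployment itself.

```python
import socket

# Port roles as listed above.
PLATFORM_PORTS = {
    5000: "API (e.g. DEEPaaS)",
    6006: "monitoring endpoint (e.g. Tensorboard)",
    8888: "IDE (e.g. JupyterLab, VScode)",
    80: "custom endpoint",
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumes the check runs on the same machine as the services.
    for port, role in PLATFORM_PORTS.items():
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"{port:>5}  {state:<6}  {role}")
```

An open port only tells you that something is listening, not that the service behind it is healthy.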

Model Quality Assurance

To improve the readability and maintainability of your module, we enforce some quality standards through tox (including style, security, etc.). Modules that fail the style tests won’t be able to build Docker images. You can check locally whether your module passes the tests:

tox -e .

There you should see a detailed report of the offending lines (if any). You can always turn off flake8 testing in some parts of the code if long lines are really needed.
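For instance, flake8 honours a trailing `# noqa` comment that silences a specific check on a single line. The sketch below disables the line-length check (code E501) for one long constant; the URL is a made-up placeholder, and the line-length limit that applies to your module depends on its tox and flake8 configuration.

```python
# A long URL cannot be wrapped without hurting readability, so the line-length
# check (flake8 code E501) is silenced for this line only.
DATASET_URL = "https://example.org/datasets/a/very/long/path/that/does/not/fit/in/the/line/limit/data.zip"  # noqa: E501


def dataset_filename(url: str = DATASET_URL) -> str:
    """Return the last path component of the URL."""
    return url.rsplit("/", 1)[-1]


print(dataset_filename())  # data.zip
```

Keeping the `noqa` comment scoped to a single error code avoids hiding unrelated problems on the same line.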

4. Editing the module’s Dockerfile

Your ./Dockerfile is in charge of creating a Docker image that integrates your application, along with deepaas and any other dependencies. You can modify that file according to your needs.

If you need to add instructions based on the runtime (e.g. perform certain actions depending on whether a GPU is detected), please use the ENTRYPOINT statement, as CMD will be overridden by the platform when you deploy a given service (e.g. JupyterLab).
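As an illustration of such runtime logic, the sketch below shows the kind of decision an entrypoint script could make. It relies on a simple heuristic, which is an assumption and not the platform’s own mechanism: it treats the presence of `nvidia-smi` on the PATH as a sign that a GPU runtime is available. The function names are hypothetical.

```python
import shutil
import sys


def gpu_tooling_present(which=shutil.which) -> bool:
    """Heuristic: `nvidia-smi` on PATH suggests a GPU runtime is available."""
    return which("nvidia-smi") is not None


def select_mode(which=shutil.which) -> str:
    """Return "gpu" or "cpu" depending on the heuristic above."""
    return "gpu" if gpu_tooling_present(which) else "cpu"


if __name__ == "__main__":
    # A real entrypoint would act on this decision and then hand over to the
    # command it was given; this sketch only reports what it found.
    print(f"Starting in {select_mode()} mode", file=sys.stderr)
```

The `which` parameter exists only to make the check easy to test without a GPU; finding the tool does not guarantee that a usable device is attached.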

We recommend checking that the installation steps work. If your module needs additional Linux packages, add them to the Dockerfile. Check that your Dockerfile works correctly by building it locally (outside the Development Environment) and running it:

docker build --no-cache -t your_project .
docker run -ti -p 5000:5000 -p 6006:6006 -p 8888:8888 your_project

Your module should be visible at http://0.0.0.0:5000/ui. You can make a POST request to the predict method to check that everything is working as intended.
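The sketch below shows one way such a POST request could be prepared from Python with the standard library. The route layout (`/v2/models/<name>/predict/`), the model name `your_project` and the parameter `demo_str` are all assumptions for illustration; the Swagger UI of your running module shows the actual route and arguments.

```python
import urllib.parse
import urllib.request


def build_predict_request(model_name, params, base_url="http://0.0.0.0:5000"):
    """Build (without sending) a POST request for a module's predict method.

    The path layout used here is an assumption; confirm it in the Swagger UI.
    """
    query = urllib.parse.urlencode(params)
    url = f"{base_url}/v2/models/{model_name}/predict/?{query}"
    return urllib.request.Request(
        url,
        data=b"",
        headers={"accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "your_project" and "demo_str" are placeholders, not real names.
    request = build_predict_request("your_project", {"demo_str": "hello"})
    print(request.get_method(), request.full_url)
    # To actually send it (the container must be running):
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode())
```

Building the request separately from sending it makes it easy to inspect the URL first. Modules that take file inputs will need a multipart body instead of query parameters.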

5. Update your project’s metadata

The module’s metadata is located in the ai4-metadata.yml file (example). This is the information that will be displayed in the Marketplace. The fields you need to edit to comply with our schemata are:

title (mandatory): short title,
summary (mandatory): one liner summary of your module,
description (optional): extended description of your module, like a README,
links (mostly optional): links to related info (training dataset, module citation, etc.),
tags (mandatory): relevant user-defined keywords (can be empty),

categories, tasks, libraries, data-type (mandatory): one or several keywords, to be chosen from a closed list (can be empty).

📋 Supported values

Libraries	Tasks	Categories	Data Type
TensorFlow	Computer Vision	AI4 pre trained	Image
PyTorch	Natural Language Processing	AI4 trainable	Text
Keras	Time Series	AI4 inference	Time Series
Scikit-learn	Recommender Systems	AI4 tools	Tabular
XGBoost	Anomaly Detection		Graph
LightGBM	Regression		Audio
CatBoost	Classification		Video
Other	Clustering		Other
	Dimensionality Reduction
	Generative Models
	Graph Neural Networks
	Optimization
	Reinforcement Learning
	Transfer Learning
	Uncertainty Estimation
	Other

inference (optional): the minimum resources your module needs to run an inference correctly (e.g. CPU cores, RAM, GPUs, etc.). If not specified, the Dashboard will prefill some defaults, which the user can later adapt during the configuration step.
provenance (optional): this gives your model richer provenance information, as your model provenance graph will show the resources and the hyper-parameters you used to train. There are two subfields you can specify:
- nomad_job: the Dashboard deployment UUID you used to train the final model,
- mlflow_run: the MLflow run UUID you used to train the final model.

Some fields are pre-filled via the AI Modules Template and usually do not need to be modified. Check you didn’t mess up the YAML definition by running our metadata validator:

pip install ai4-metadata
ai4-metadata validate ai4-metadata.yml
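As a quick pre-check before running the validator, you can verify that the fields marked mandatory above are present. This is a minimal sketch operating on an already loaded dictionary, not a replacement for the official validator; the example metadata is hypothetical, and only key presence is checked because several of these fields may be empty.

```python
# Fields marked as mandatory in the list above.
MANDATORY_FIELDS = (
    "title", "summary", "tags",
    "categories", "tasks", "libraries", "data-type",
)


def missing_mandatory_fields(metadata: dict) -> list[str]:
    """Return mandatory keys absent from `metadata` (empty values are allowed)."""
    return [field for field in MANDATORY_FIELDS if field not in metadata]


# Hypothetical metadata, as it might look once the YAML file is loaded.
example = {
    "title": "Demo module",
    "summary": "Toy module used to illustrate the check.",
    "tags": [],
    "categories": ["AI4 inference"],
    "tasks": ["Computer Vision"],
    "libraries": ["PyTorch"],
}

print(missing_mandatory_fields(example))  # ['data-type']
```

The official validator also checks value types and the closed lists of allowed keywords, which this sketch does not attempt.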

6. Integrating the module in the Marketplace

Once your repo is set, it’s time to integrate it in the Marketplace:

Open an issue in the AI Module Catalog repo.
A platform admin will create the GitHub repo for your module inside the ai4os-hub organization. You will be granted write permissions in that repo.
Naming conventions
Module repos follow this naming convention:
- ai4os-hub/ai4-<project-name>: modules officially developed by the project
- ai4os-hub/<project-name>: modules developed by external users
Upload your code to that repo.
An admin will review your code and add it to the AI Module Catalog. Once a module is approved it will take roughly 6 hours to appear in the Dashboard’s Marketplace.

Next steps

If you want to go further, check our tutorials on how to: