Deploy Machine Learning Models with Django

Version 1.0 (04/11/2019)

Piotr Płoński

Introduction

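The demand for Machine Learning (ML) applications is growing. Many resources show how to train ML algorithms. However, ML algorithms work in two phases: the training phase, in which the algorithm learns from data, and the inference phase, in which the trained algorithm computes predictions for new data.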

The business benefits come from the inference phase, when ML algorithms provide information before it is known. The technological challenge is how to make ML algorithms available for inference in production systems. There are many requirements that need to be fulfilled:

There are many ways in which ML algorithms can be used:

This tutorial provides code examples showing how to build your own ML system available through a REST API. In this book, I will use Python 3.6 and Django 2.2.4 to build the ML service. This book is the first part; it covers the basics, which should be enough to build an ML system that:

There are many ways in which this tutorial can be extended, for example:

Right now, the above topics are not covered in this tutorial. I will consider writing about them in the future based on readers' feedback. You can send me feedback using this form.

In my opinion, building your own ML system has a great advantage: it is tailored to your needs. It has all the features your ML system needs and can be as complex as you wish.

This tutorial is for readers who are familiar with ML and would like to learn how to build ML web services. Basic Python knowledge is required. The full code of this tutorial is available at: https://github.com/pplonski/my_ml_service.

Start

What you will learn in this chapter:

Setup git repository

To set up a git repository, I use GitHub (it is free for both public and private projects). If you have an account there, go to https://github.com/new and set up the repository as shown in Figure 1.

Figure 1: Setup a new project in github

The full code of this tutorial is available at: https://github.com/pplonski/my_ml_service.

Then go to your terminal and clone the repository:

git clone https://github.com/pplonski/my_ml_service.git
cd my_ml_service
ls -l

In my case, I had two files in the repository, LICENSE and README.md.

Installation

Let's set up and activate the environment for development (I'm using Ubuntu 16.04). I will use virtualenv:

virtualenv venv --python=python3.6
source venv/bin/activate

You will need to activate the environment every time you start working on your project in a new terminal.

To install the needed packages, I will use pip3:

pip3 install django==2.2.4

Django is now installed in version 2.2.4.

Start Django project

I will set up the Django project in the backend directory. The Django project is named server.

mkdir backend
cd backend
django-admin startproject server

You can run the newly initialized server with the following commands:

cd server
python manage.py runserver

When you enter 127.0.0.1:8000 in your favorite web browser, you should see the default Django welcome page (Figure 2).

Figure 2: Django default welcome site

Congratulations! You have successfully set up the environment.

Add source files to the repository

Before we go to the next chapter, let's commit new files.

# please execute it in your main project directory
git add backend/
git commit -am "setup django project"
git push

The following files should be added to your project:

new file:   backend/server/manage.py
new file:   backend/server/server/__init__.py
new file:   backend/server/server/settings.py
new file:   backend/server/server/urls.py
new file:   backend/server/server/wsgi.py

Your directory also contains other files that are not added to the repository because they are excluded in the .gitignore file.

Build ML algorithms

In this chapter you will learn:

Setup Jupyter notebook

For building ML algorithms I'm using Jupyter notebook. It can be easily installed:

# run commands in your project directory
pip3 install jupyter notebook

To make Jupyter use the local virtualenv environment, run:

ipython kernel install --user --name=venv

I will create a research directory for the Jupyter files. To start Jupyter notebook, run:

# create a research directory
mkdir research
cd research
# start Jupyter
jupyter notebook

When starting a new notebook, make sure that you select the correct kernel, venv in our case (Figure 3).

Figure 3: Start new jupyter notebook

Train ML algorithms

Before building ML algorithms we need to install packages:

pip3 install numpy pandas scikit-learn joblib

The numpy and pandas packages are used for data manipulation, and joblib is used for saving ML objects. The scikit-learn package (imported in code as sklearn) offers a wide range of ML algorithms. Jupyter needs to be restarted after the installation.

The first step in our code is to load packages:

import json # will be needed for saving preprocessing details
import numpy as np # for data manipulation
import pandas as pd # for data manipulation
from sklearn.model_selection import train_test_split # will be used for data split
from sklearn.preprocessing import LabelEncoder # for preprocessing
from sklearn.ensemble import RandomForestClassifier # for training the algorithm
from sklearn.ensemble import ExtraTreesClassifier # for training the algorithm
import joblib # for saving algorithm and preprocessing objects

Loading data

In this tutorial, I will use the Adult Income data set. The task is to predict whether a person's income exceeds $50K/year based on census data. I will load the data from my public repository of data sets that are good for getting started with ML.

Code to load the data and show its first rows (Figure 4):

# load dataset
df = pd.read_csv('https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv', skipinitialspace=True)
x_cols = [c for c in df.columns if c != 'income']
# set input matrix and target column
X = df[x_cols]
y = df['income']
# show first rows of data
df.head()
Figure 4: First rows of our dataset

The X matrix has 32,561 rows and 14 columns. This is the input data for our algorithm; each row describes one person. The y vector has 32,561 values indicating whether income exceeds $50K per year.

Before starting data pre-processing, we will split our data into training and testing subsets. We will use 30% of the data for testing.

# data split train / test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=1234)

Data pre-processing

Our data set contains missing values and categorical columns. For training, I will use the Random Forest algorithm from the sklearn package. In its current implementation, it cannot handle missing values or categorical columns, which is why we need to apply pre-processing first.

To fill missing values, we will use the most frequent value in each column (there are many other filling methods; the one I selected is just for example purposes).

# fill missing values
train_mode = dict(X_train.mode().iloc[0])
X_train = X_train.fillna(train_mode)
print(train_mode)

The train_mode values look like:

{'age': 31.0,
 'workclass': 3.0,
 'fnlwgt': 121124.0,
 'education': 11.0,
 'education-num': 9.0,
 'marital-status': 2.0,
 'occupation': 9.0,
 'relationship': 0.0,
 'race': 4.0,
 'sex': 1.0,
 'capital-gain': 0.0,
 'capital-loss': 0.0,
 'hours-per-week': 40.0,
 'native-country': 37.0}

From train_mode you can see that, for example, the most frequent value in the age column is 31.0.
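To make the fill step concrete, here is a minimal pure-Python sketch of what X_train.mode().iloc[0] followed by fillna does. It is only an illustration, not the code used in this tutorial: the helper names column_modes and fill_missing are hypothetical, rows are plain dicts, and a missing value is assumed to be None.

```python
from collections import Counter


def column_modes(rows, columns):
    """Return the most frequent non-missing value of each column.

    rows: list of dicts, one per person; missing values are None.
    Ties are broken by taking the smallest value, which matches pandas,
    where DataFrame.mode() returns sorted modes and .iloc[0] picks the first.
    Assumes every column has at least one non-missing value.
    """
    modes = {}
    for column in columns:
        counts = Counter(row[column] for row in rows if row.get(column) is not None)
        top_count = max(counts.values())
        modes[column] = min(value for value, count in counts.items() if count == top_count)
    return modes


def fill_missing(rows, modes):
    """Replace None values with the training-set mode of their column."""
    return [
        {column: modes[column] if value is None else value for column, value in row.items()}
        for row in rows
    ]


train_rows = [
    {"age": 31, "workclass": "Private"},
    {"age": 31, "workclass": None},
    {"age": 45, "workclass": "Private"},
    {"age": None, "workclass": "Self-emp"},
]
modes = column_modes(train_rows, ["age", "workclass"])
print(modes)  # {'age': 31, 'workclass': 'Private'}
# the modes computed on training data are reused for new, unseen rows
print(fill_missing([{"age": None, "workclass": None}], modes))
# [{'age': 31, 'workclass': 'Private'}]
```

The key design point is that the modes are computed once on the training data and then reused unchanged for every new input, which is why train_mode has to be saved together with the model.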

Let's convert the categorical columns into numbers. I will use LabelEncoder from the sklearn package:

# convert categoricals
encoders = {}
for column in ['workclass', 'education', 'marital-status',
                'occupation', 'relationship', 'race',
                'sex','native-country']:
    categorical_convert = LabelEncoder()
    X_train[column] = categorical_convert.fit_transform(X_train[column])
    encoders[column] = categorical_convert
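Each fitted LabelEncoder holds a fixed mapping from category to integer, and the server must reuse exactly that mapping at inference time. The sketch below is a hypothetical pure-Python class, SimpleLabelEncoder, that imitates the behaviour relied on here: the categories are sorted, each value is mapped to its index in that sorted list, and a category not seen during fitting raises ValueError (sklearn's LabelEncoder also raises ValueError in that case, although with a different message).

```python
class SimpleLabelEncoder:
    """Hypothetical pure-Python imitation of sklearn's LabelEncoder."""

    def fit(self, values):
        # sorted unique categories, like LabelEncoder.classes_
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        unseen = sorted({v for v in values if v not in self._index})
        if unseen:
            # categories that did not appear during fit cannot be encoded
            raise ValueError(f"unseen categories: {unseen}")
        return [self._index[v] for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


encoder = SimpleLabelEncoder()
print(encoder.fit_transform(["Private", "Self-emp", "Private", "Federal-gov"]))  # [1, 2, 1, 0]
print(encoder.classes_)  # ['Federal-gov', 'Private', 'Self-emp']
try:
    encoder.transform(["Never-worked"])
except ValueError as error:
    print(error)  # unseen categories: ['Never-worked']
```

A category that never appeared in the training data therefore makes transform fail; in the server-side compute_prediction method described later, such exceptions are caught and returned as an error status.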

Algorithms training

Data is ready, so we can train our Random Forest algorithm.

# train the Random Forest algorithm
rf = RandomForestClassifier(n_estimators = 100)
rf = rf.fit(X_train, y_train)

We will also train the Extra Trees algorithm:

# train the Extra Trees algorithm
et = ExtraTreesClassifier(n_estimators = 100)
et = et.fit(X_train, y_train)

As you can see, training the algorithm is easy: just 2 lines of code, much less than data reading and pre-processing. Now, let's save the algorithms that we have created. It is important to notice that the ML algorithm is not only the rf and et variables (with the model weights); we also need to save the pre-processing variables train_mode and encoders. For saving, I will use the joblib package.

# save preprocessing objects and both algorithms (RF and ET)
joblib.dump(train_mode, "./train_mode.joblib", compress=True)
joblib.dump(encoders, "./encoders.joblib", compress=True)
joblib.dump(rf, "./random_forest.joblib", compress=True)
joblib.dump(et, "./extra_trees.joblib", compress=True)

Add ML code and artifacts to the repository

Before continuing to the next chapter, let's add our notebook and files to the repository.

# execute in project main directory
git add research/*
git commit -am "add ML code and algorithms"
git push

Each file with pre-processing objects and algorithms is smaller than 100 MB, which is the GitHub file size limit. For larger files, it is better to use a separate version control system such as DVC; however, this is a more advanced topic.

Django models

What have you already accomplished:

What you will learn in this chapter:

Create Django models

To create Django models we need to create a new app:

# run this in backend/server directory
python manage.py startapp endpoints
mkdir apps
mv endpoints/ apps/

With the above commands, we have created the endpoints app and moved it to the apps directory. I have added the apps directory to keep the project clean.

# list files in apps/endpoints
ls apps/endpoints/
# admin.py  apps.py  __init__.py  migrations  models.py  tests.py  views.py

Let's open the apps/endpoints/models.py file and define the database models (Django provides an object-relational mapping (ORM) layer).

from django.db import models

class Endpoint(models.Model):
    '''
    The Endpoint object represents ML API endpoint.

    Attributes:
        name: The name of the endpoint, it will be used in API URL,
        owner: The string with owner name,
        created_at: The date when endpoint was created.
    '''
    name = models.CharField(max_length=128)
    owner = models.CharField(max_length=128)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)

class MLAlgorithm(models.Model):
    '''
    The MLAlgorithm represents the ML algorithm object.

    Attributes:
        name: The name of the algorithm.
        description: The short description of how the algorithm works.
        code: The code of the algorithm.
        version: The version of the algorithm similar to software versioning.
        owner: The name of the owner.
        created_at: The date when MLAlgorithm was added.
        parent_endpoint: The reference to the Endpoint.
    '''
    name = models.CharField(max_length=128)
    description = models.CharField(max_length=1000)
    code = models.CharField(max_length=50000)
    version = models.CharField(max_length=128)
    owner = models.CharField(max_length=128)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
    parent_endpoint = models.ForeignKey(Endpoint, on_delete=models.CASCADE)

class MLAlgorithmStatus(models.Model):
    '''
    The MLAlgorithmStatus represents the status of the MLAlgorithm, which can change over time.

    Attributes:
        status: The status of algorithm in the endpoint. Can be: testing, staging, production, ab_testing.
        active: The boolean flag which points to the currently active status.
        created_by: The name of creator.
        created_at: The date of status creation.
        parent_mlalgorithm: The reference to corresponding MLAlgorithm.

    '''
    status = models.CharField(max_length=128)
    active = models.BooleanField()
    created_by = models.CharField(max_length=128)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
    parent_mlalgorithm = models.ForeignKey(MLAlgorithm, on_delete=models.CASCADE, related_name = "status")

class MLRequest(models.Model):
    '''
    The MLRequest will keep information about all requests to ML algorithms.

    Attributes:
        input_data: The input data to ML algorithm in JSON format.
        full_response: The response of the ML algorithm.
        response: The response of the ML algorithm in JSON format.
        feedback: The feedback about the response in JSON format.
        created_at: The date when request was created.
        parent_mlalgorithm: The reference to MLAlgorithm used to compute response.
    '''
    input_data = models.CharField(max_length=10000)
    full_response = models.CharField(max_length=10000)
    response = models.CharField(max_length=10000)
    feedback = models.CharField(max_length=10000, blank=True, null=True)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
    parent_mlalgorithm = models.ForeignKey(MLAlgorithm, on_delete=models.CASCADE)

We defined four models:

We need to add our app to INSTALLED_APPS in backend/server/server/settings.py; it should look like this:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # apps
    'apps.endpoints'
]

To apply our models to the database we need to run migrations:

# please run it in backend/server directory
python manage.py makemigrations
python manage.py migrate

The above commands will create tables in the database. By default, Django uses SQLite as the database. For this tutorial, we can keep this simple database; for more advanced projects, you can set up Postgres or MySQL instead (configured with the DATABASES variable in backend/server/server/settings.py).
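As a sketch of that configuration, a PostgreSQL setup in settings.py could look like the fragment below. It assumes Django's built-in PostgreSQL backend, which needs the psycopg2 package installed; the database name, user, and the DB_* environment variable names are placeholders of my own choosing, not values used elsewhere in this tutorial.

```python
# backend/server/server/settings.py (fragment), hypothetical PostgreSQL setup
import os

DATABASES = {
    "default": {
        # Django's built-in PostgreSQL backend (requires psycopg2)
        "ENGINE": "django.db.backends.postgresql",
        # placeholder name and credentials, read from environment variables
        "NAME": os.environ.get("DB_NAME", "ml_service"),
        "USER": os.environ.get("DB_USER", "ml_user"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```

Reading credentials from the environment keeps passwords out of the repository; after switching databases, the migrations have to be applied again with python manage.py migrate.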

Create REST API for models

So far we have defined database models, but we will not see anything new when running the web server. We need to expose our objects through a REST API. The simplest and cleanest way to achieve this is to use Django REST Framework (DRF). To install DRF, run:

pip3 install djangorestframework
pip3 install markdown       # Markdown support for the browsable API.
pip3 install django-filter  # Filtering support

and add it to INSTALLED_APPS in backend/server/server/settings.py:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework', # add django rest framework
    # apps
    'apps.endpoints'
]

To see something in the browser we need to define:

DRF Serializers

Please add a serializers.py file to the backend/server/apps/endpoints directory:

# backend/server/apps/endpoints/serializers.py file
from rest_framework import serializers
from apps.endpoints.models import Endpoint
from apps.endpoints.models import MLAlgorithm
from apps.endpoints.models import MLAlgorithmStatus
from apps.endpoints.models import MLRequest

class EndpointSerializer(serializers.ModelSerializer):
    class Meta:
        model = Endpoint
        read_only_fields = ("id", "name", "owner", "created_at")
        fields = read_only_fields


class MLAlgorithmSerializer(serializers.ModelSerializer):

    current_status = serializers.SerializerMethodField(read_only=True)

    def get_current_status(self, mlalgorithm):
        return MLAlgorithmStatus.objects.filter(parent_mlalgorithm=mlalgorithm).latest('created_at').status

    class Meta:
        model = MLAlgorithm
        read_only_fields = ("id", "name", "description", "code",
                            "version", "owner", "created_at",
                            "parent_endpoint", "current_status")
        fields = read_only_fields

class MLAlgorithmStatusSerializer(serializers.ModelSerializer):
    class Meta:
        model = MLAlgorithmStatus
        read_only_fields = ("id", "active")
        fields = ("id", "active", "status", "created_by", "created_at",
                  "parent_mlalgorithm")

class MLRequestSerializer(serializers.ModelSerializer):
    class Meta:
        model = MLRequest
        read_only_fields = (
            "id",
            "input_data",
            "full_response",
            "response",
            "created_at",
            "parent_mlalgorithm",
        )
        fields =  (
            "id",
            "input_data",
            "full_response",
            "response",
            "feedback",
            "created_at",
            "parent_mlalgorithm",
        )

Serializers help with packing and unpacking database objects into JSON objects. In the Endpoint and MLAlgorithm serializers, all fields are read-only, because we will create and modify these objects only on the server side. For MLAlgorithmStatus, the fields status, created_by and parent_mlalgorithm are in read and write mode (created_at is filled automatically, so DRF treats it as read-only); we will use them to set the algorithm status through the REST API. In the MLRequest serializer, the feedback field is left in read and write mode; it will be needed to send feedback about predictions to the server.

The MLAlgorithmSerializer is more complex than the others. It has one additional field, current_status, that represents the latest status from MLAlgorithmStatus.

Views

To add views, please open the backend/server/apps/endpoints/views.py file and add the following code:

# backend/server/apps/endpoints/views.py file
from django.db import transaction
from rest_framework import viewsets
from rest_framework import mixins
from rest_framework.exceptions import APIException

from apps.endpoints.models import Endpoint
from apps.endpoints.serializers import EndpointSerializer

from apps.endpoints.models import MLAlgorithm
from apps.endpoints.serializers import MLAlgorithmSerializer

from apps.endpoints.models import MLAlgorithmStatus
from apps.endpoints.serializers import MLAlgorithmStatusSerializer

from apps.endpoints.models import MLRequest
from apps.endpoints.serializers import MLRequestSerializer

class EndpointViewSet(
    mixins.RetrieveModelMixin, mixins.ListModelMixin, viewsets.GenericViewSet
):
    serializer_class = EndpointSerializer
    queryset = Endpoint.objects.all()


class MLAlgorithmViewSet(
    mixins.RetrieveModelMixin, mixins.ListModelMixin, viewsets.GenericViewSet
):
    serializer_class = MLAlgorithmSerializer
    queryset = MLAlgorithm.objects.all()


def deactivate_other_statuses(instance):
    # set active=False for older active statuses of the same algorithm
    old_statuses = MLAlgorithmStatus.objects.filter(
        parent_mlalgorithm=instance.parent_mlalgorithm,
        created_at__lt=instance.created_at,
        active=True,
    )
    for old_status in old_statuses:
        old_status.active = False
    MLAlgorithmStatus.objects.bulk_update(old_statuses, ["active"])

class MLAlgorithmStatusViewSet(
    mixins.RetrieveModelMixin, mixins.ListModelMixin, viewsets.GenericViewSet,
    mixins.CreateModelMixin
):
    serializer_class = MLAlgorithmStatusSerializer
    queryset = MLAlgorithmStatus.objects.all()

    def perform_create(self, serializer):
        try:
            # save the new status and deactivate the old ones in one transaction
            with transaction.atomic():
                instance = serializer.save(active=True)
                deactivate_other_statuses(instance)
        except Exception as e:
            raise APIException(str(e))

class MLRequestViewSet(
    mixins.RetrieveModelMixin, mixins.ListModelMixin, viewsets.GenericViewSet,
    mixins.UpdateModelMixin
):
    serializer_class = MLRequestSerializer
    queryset = MLRequest.objects.all()

For each model, we created a view that allows retrieving a single object or a list of objects. We will not allow creating or modifying Endpoints and MLAlgorithms through the REST API. The code to handle the creation of new ML-related objects will be on the server side; I will describe it in the next chapter.

We will allow creating MLAlgorithmStatus objects through the REST API. We don't allow editing statuses of ML algorithms because we want to keep the full status history.

We allow editing MLRequest objects, but only the feedback field (please take a look at the serializer definition).
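The status rules above (the newest status wins, older active statuses are switched off, and no status is ever deleted) can be illustrated without Django. The sketch below uses a hypothetical StatusRecord dataclass in place of MLAlgorithmStatus rows; current_status mirrors the serializer's latest('created_at') lookup, and add_status mirrors perform_create together with deactivate_other_statuses.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StatusRecord:
    # plain-Python stand-in for one MLAlgorithmStatus row (hypothetical)
    status: str
    created_by: str
    created_at: datetime
    parent_mlalgorithm: int
    active: bool = False


def add_status(history, new_status):
    """Append a status and deactivate older active statuses of the same algorithm."""
    new_status.active = True
    for old in history:
        if (old.parent_mlalgorithm == new_status.parent_mlalgorithm
                and old.created_at < new_status.created_at
                and old.active):
            old.active = False
    # statuses are never removed; only the active flag of older ones changes
    history.append(new_status)


def current_status(history, algorithm_id):
    """Latest status of one algorithm, like .latest('created_at').status."""
    own = [s for s in history if s.parent_mlalgorithm == algorithm_id]
    return max(own, key=lambda s: s.created_at).status


history = []
add_status(history, StatusRecord("testing", "Piotr", datetime(2019, 11, 1), 1))
add_status(history, StatusRecord("production", "Piotr", datetime(2019, 11, 4), 1))
print(current_status(history, 1))  # production
print([s.active for s in history])  # [False, True]
```

Keeping every status row and flipping only the active flag is what makes the full history available, at the cost of a slightly more complex query for the current status.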

URLs

The last step is to add URLs to access our models. Please add a urls.py file in backend/server/apps/endpoints with the following code:

# backend/server/apps/endpoints/urls.py file
from django.conf.urls import url, include
from rest_framework.routers import DefaultRouter

from apps.endpoints.views import EndpointViewSet
from apps.endpoints.views import MLAlgorithmViewSet
from apps.endpoints.views import MLAlgorithmStatusViewSet
from apps.endpoints.views import MLRequestViewSet

router = DefaultRouter(trailing_slash=False)
router.register(r"endpoints", EndpointViewSet, basename="endpoints")
router.register(r"mlalgorithms", MLAlgorithmViewSet, basename="mlalgorithms")
router.register(r"mlalgorithmstatuses", MLAlgorithmStatusViewSet, basename="mlalgorithmstatuses")
router.register(r"mlrequests", MLRequestViewSet, basename="mlrequests")

urlpatterns = [
    url(r"^api/v1/", include(router.urls)),
]

The above code creates REST API routes for our database models. The models will be accessible at URLs following the pattern:

http://<server-ip>/api/v1/<object-name>

Notice that we include v1 in the API address. This might be needed later for API versioning.
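To see how the pieces combine, here is a simplified sketch of the URL patterns behind this setup, written with Python's re module. It is an approximation based on my understanding of DefaultRouter, not DRF's actual code: each registered prefix gets a list route and a detail route whose id segment matches [^/.]+, trailing_slash=False drops the final slash, and the ^api/v1/ prefix from apps/endpoints/urls.py is prepended. Format suffixes and the API root view are left out, and resolve_route is a hypothetical helper.

```python
import re

PREFIXES = ["endpoints", "mlalgorithms", "mlalgorithmstatuses", "mlrequests"]
TRAILING_SLASH = ""  # DefaultRouter(trailing_slash=False)

# one list pattern and one detail pattern per registered prefix
ROUTES = []
for prefix in PREFIXES:
    ROUTES.append(("list", prefix, re.compile(rf"^api/v1/{prefix}{TRAILING_SLASH}$")))
    ROUTES.append(("detail", prefix, re.compile(rf"^api/v1/{prefix}/(?P<pk>[^/.]+){TRAILING_SLASH}$")))


def resolve_route(path):
    """Return (action, prefix, pk) for a path such as 'api/v1/endpoints/1', else None."""
    for action, prefix, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            return action, prefix, match.groupdict().get("pk")
    return None


print(resolve_route("api/v1/endpoints"))     # ('list', 'endpoints', None)
print(resolve_route("api/v1/mlrequests/7"))  # ('detail', 'mlrequests', '7')
print(resolve_route("api/v1/endpoints/"))    # None: no trailing slash expected
```

If the API ever changes incompatibly, a second router can be mounted under ^api/v2/ while existing clients keep using v1.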

We need to add the endpoints URLs to the main urls.py file of the server (backend/server/server/urls.py):

# backend/server/server/urls.py file
from django.conf.urls import url, include
from django.contrib import admin
from django.urls import path

from apps.endpoints.urls import urlpatterns as endpoints_urlpatterns

urlpatterns = [
    path('admin/', admin.site.urls),
]

urlpatterns += endpoints_urlpatterns

Run the server

We have added many new things; let's check that everything works.

Please run the server:

# in backend/server
python manage.py runserver

and open http://127.0.0.1:8000/api/v1/ in the web browser. You should see the DRF view (Figure 5).

Figure 5: Default Django REST Framework view

DRF provides a nice interface, so you can click on any URL and check the objects (for example, http://127.0.0.1:8000/api/v1/endpoints). You should see an empty list for all objects, because we haven't added anything yet. We will add ML algorithms and endpoints in the next chapter.
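Besides clicking through the browsable interface, you can call the API from code. The sketch below uses only Python's standard library (urllib.request) to build two requests: creating a new MLAlgorithmStatus with POST, and sending feedback for an MLRequest with PATCH (served by the partial_update action that UpdateModelMixin provides). It assumes the development server at 127.0.0.1:8000; the helper names are hypothetical. The URLs have no trailing slash because the router was created with trailing_slash=False.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/api/v1"  # local development server


def build_status_request(algorithm_id, status, created_by):
    """POST a new MLAlgorithmStatus; the view itself sets active=True."""
    payload = {"status": status, "created_by": created_by, "parent_mlalgorithm": algorithm_id}
    return urllib.request.Request(
        f"{API_URL}/mlalgorithmstatuses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_feedback_request(request_id, feedback):
    """PATCH only the writable feedback field of an existing MLRequest."""
    return urllib.request.Request(
        f"{API_URL}/mlrequests/{request_id}",
        data=json.dumps({"feedback": feedback}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def send(request):
    """Send a prepared request to the running server and decode the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, send(build_status_request(1, "production", "Piotr")) would create a status for the algorithm with id 1. Until algorithms are added in the next chapter, no such algorithm exists, so the server would reject the request and urlopen would raise an HTTPError.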

Add code to the repository

The last step in this chapter is to add the new code to the repository.

# please run in backend/server directory
git add apps/endpoints
git commit -am "endpoints models"
git push

Add ML algorithms to the server code

So far you have accomplished:

What you will learn in this chapter:

ML code in the server

In chapter 3, we created two ML algorithms (Random Forest and Extra Trees). They were implemented in the Jupyter notebook. Now, we will write server-side code that uses the previously trained algorithms. For simplicity, in this chapter we will include only the Random Forest algorithm on the server side.

In the backend/server/apps directory, let's create a new directory ml to keep all ML-related code, and inside it an income_classifier directory to keep our income classifiers.

# please run in backend/server/apps
mkdir ml
mkdir ml/income_classifier

In the income_classifier directory, let's add a new file random_forest.py and an empty __init__.py file.

In random_forest.py file we will implement the ML algorithm code.

# file backend/server/apps/ml/income_classifier/random_forest.py
import joblib
import pandas as pd

class RandomForestClassifier:
    def __init__(self):
        path_to_artifacts = "../../research/"
        self.values_fill_missing =  joblib.load(path_to_artifacts + "train_mode.joblib")
        self.encoders = joblib.load(path_to_artifacts + "encoders.joblib")
        self.model = joblib.load(path_to_artifacts + "random_forest.joblib")

    def preprocessing(self, input_data):
        # JSON to pandas DataFrame
        input_data = pd.DataFrame(input_data, index=[0])
        # fill missing values
        input_data = input_data.fillna(self.values_fill_missing)
        # convert categoricals
        for column in [
            "workclass",
            "education",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "sex",
            "native-country",
        ]:
            categorical_convert = self.encoders[column]
            input_data[column] = categorical_convert.transform(input_data[column])

        return input_data

    def predict(self, input_data):
        return self.model.predict_proba(input_data)

    def postprocessing(self, input_data):
        label = "<=50K"
        if input_data[1] > 0.5:
            label = ">50K"
        return {"probability": input_data[1], "label": label, "status": "OK"}

    def compute_prediction(self, input_data):
        try:
            input_data = self.preprocessing(input_data)
            prediction = self.predict(input_data)[0]  # only one sample
            prediction = self.postprocessing(prediction)
        except Exception as e:
            return {"status": "Error", "message": str(e)}

        return prediction

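The RandomForestClassifier algorithm has five methods: __init__, which loads the saved artifacts (train_mode, encoders and the Random Forest model); preprocessing, which converts the JSON input into a DataFrame, fills missing values and encodes the categorical columns; predict, which calls the model's predict_proba; postprocessing, which turns the probabilities into a response with a label; and compute_prediction, which runs the whole pipeline and returns an error message if any step fails.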
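Because compute_prediction only chains preprocessing, predict and postprocessing, any object with the same methods can be plugged into the server the same way. To exercise that flow without the saved artifacts, the sketch below uses a hypothetical ThresholdClassifierSketch whose predict returns a fixed probability pair [P(<=50K), P(>50K)] instead of calling a trained model. The column order follows model.classes_, which scikit-learn keeps sorted, so for our labels index 1 corresponds to >50K.

```python
class ThresholdClassifierSketch:
    """Hypothetical stand-in exposing the same methods as RandomForestClassifier."""

    def __init__(self, probability_above_50k):
        # fixed P(>50K) returned instead of a trained model's output
        self.probability_above_50k = probability_above_50k

    def preprocessing(self, input_data):
        # a real implementation would build a DataFrame, fill and encode here
        if "age" not in input_data:
            raise ValueError("missing 'age' field")
        return input_data

    def predict(self, input_data):
        p = self.probability_above_50k
        return [[1.0 - p, p]]  # one row per sample: [P(<=50K), P(>50K)]

    def postprocessing(self, input_data):
        label = "<=50K"
        if input_data[1] > 0.5:
            label = ">50K"
        return {"probability": input_data[1], "label": label, "status": "OK"}

    def compute_prediction(self, input_data):
        try:
            input_data = self.preprocessing(input_data)
            prediction = self.predict(input_data)[0]  # only one sample
            prediction = self.postprocessing(prediction)
        except Exception as e:
            return {"status": "Error", "message": str(e)}
        return prediction


print(ThresholdClassifierSketch(0.7).compute_prediction({"age": 37}))
# {'probability': 0.7, 'label': '>50K', 'status': 'OK'}
print(ThresholdClassifierSketch(0.9).compute_prediction({}))
# {'status': 'Error', 'message': "missing 'age' field"}
```

Returning an error dictionary instead of raising keeps the web server responsive: a malformed request produces an error response rather than an unhandled exception.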

To enable our code in Django, we need to add the ml app to INSTALLED_APPS in backend/server/server/settings.py:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    # apps
    'apps.endpoints',
    'apps.ml'
]

ML code tests

Let's write a test case that will check if our Random Forest algorithm is working as expected. For testing, I will use one row from train data and check if the prediction is correct.

Please add two files into the ml directory: an empty __init__.py file and a tests.py file with the following code:

from django.test import TestCase

from apps.ml.income_classifier.random_forest import RandomForestClassifier

class MLTests(TestCase):
    def test_rf_algorithm(self):
        input_data = {
            "age": 37,
            "workclass": "Private",
            "fnlwgt": 34146,
            "education": "HS-grad",
            "education-num": 9,
            "marital-status": "Married-civ-spouse",
            "occupation": "Craft-repair",
            "relationship": "Husband",
            "race": "White",
            "sex": "Male",
            "capital-gain": 0,
            "capital-loss": 0,
            "hours-per-week": 68,
            "native-country": "United-States"
        }
        my_alg = RandomForestClassifier()
        response = my_alg.compute_prediction(input_data)
        self.assertEqual('OK', response['status'])
        self.assertTrue('label' in response)
        self.assertEqual('<=50K', response['label'])

The above test is:

To run the Django tests, use the following command:

# please run in backend/server directory
python manage.py test apps.ml.tests

You should see that 1 test was run.

System check identified no issues (0 silenced).
.
----------------------------------------------------------------------
Ran 1 test in 0.661s

OK

Algorithms registry

We have the ML code ready and tested. Now we need to connect it with the server code. For this, I will create the ML registry object, which will keep information about available algorithms and corresponding endpoints.

Let's add registry.py file in the backend/server/apps/ml/ directory.

# file backend/server/apps/ml/registry.py
from apps.endpoints.models import Endpoint
from apps.endpoints.models import MLAlgorithm
from apps.endpoints.models import MLAlgorithmStatus

class MLRegistry:
    def __init__(self):
        self.endpoints = {}

    def add_algorithm(self, endpoint_name, algorithm_object, algorithm_name,
                    algorithm_status, algorithm_version, owner,
                    algorithm_description, algorithm_code):
        # get endpoint
        endpoint, _ = Endpoint.objects.get_or_create(name=endpoint_name, owner=owner)

        # get algorithm
        database_object, algorithm_created = MLAlgorithm.objects.get_or_create(
                name=algorithm_name,
                description=algorithm_description,
                code=algorithm_code,
                version=algorithm_version,
                owner=owner,
                parent_endpoint=endpoint)
        if algorithm_created:
            status = MLAlgorithmStatus(status = algorithm_status,
                                        created_by = owner,
                                        parent_mlalgorithm = database_object,
                                        active = True)
            status.save()

        # add to registry
        self.endpoints[database_object.id] = algorithm_object

The registry keeps a simple dict that maps algorithm ids to algorithm objects.
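The mapping itself is easy to see without a database. In the hypothetical InMemoryRegistry below, a counter stands in for the primary keys the database assigns, and a lookup keyed by endpoint, name and version plays the role of get_or_create, so registering the same algorithm again (for example, after a server restart) reuses its id instead of adding a duplicate. Note that the real get_or_create call matches on all the fields it receives, including description and code, so changing the code of an algorithm creates a new MLAlgorithm row.

```python
import itertools


class InMemoryRegistry:
    """Hypothetical DB-free sketch of the bookkeeping done by MLRegistry."""

    def __init__(self):
        self.endpoints = {}                    # algorithm id -> algorithm object
        self._ids = {}                         # (endpoint, name, version) -> algorithm id
        self._id_counter = itertools.count(1)  # stands in for database primary keys

    def add_algorithm(self, endpoint_name, algorithm_object, algorithm_name, algorithm_version):
        key = (endpoint_name, algorithm_name, algorithm_version)
        if key not in self._ids:  # like get_or_create: create only if new
            self._ids[key] = next(self._id_counter)
        algorithm_id = self._ids[key]
        self.endpoints[algorithm_id] = algorithm_object
        return algorithm_id


registry = InMemoryRegistry()
first = registry.add_algorithm("income_classifier", object(), "random forest", "0.0.1")
again = registry.add_algorithm("income_classifier", object(), "random forest", "0.0.1")
other = registry.add_algorithm("income_classifier", object(), "extra trees", "0.0.1")
print(first, again, other)      # 1 1 2
print(len(registry.endpoints))  # 2
```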

To check if the code is working as expected, we can add a test case in the backend/server/apps/ml/tests.py file:

# add at the beginning of the file:
import inspect
from apps.ml.registry import MLRegistry

# ...
# the rest of the code
# ...

# add below method to MLTests class:
    def test_registry(self):
        registry = MLRegistry()
        self.assertEqual(len(registry.endpoints), 0)
        endpoint_name = "income_classifier"
        algorithm_object = RandomForestClassifier()
        algorithm_name = "random forest"
        algorithm_status = "production"
        algorithm_version = "0.0.1"
        algorithm_owner = "Piotr"
        algorithm_description = "Random Forest with simple pre- and post-processing"
        algorithm_code = inspect.getsource(RandomForestClassifier)
        # add to registry
        registry.add_algorithm(endpoint_name, algorithm_object, algorithm_name,
                    algorithm_status, algorithm_version, algorithm_owner,
                    algorithm_description, algorithm_code)
        # there should be one endpoint available
        self.assertEqual(len(registry.endpoints), 1)

This simple test adds an ML algorithm to the registry. To run the tests:

# please run in backend/server
python manage.py test apps.ml.tests

Tests output:

System check identified no issues (0 silenced).
..
----------------------------------------------------------------------
Ran 2 tests in 0.679s

OK

Add ML algorithms to the registry

The registry code is ready. Now we need one place in the server code that adds ML algorithms to the registry when the server starts. The best place to do it is the backend/server/server/wsgi.py file. Please set the following code in the file:

# file backend/server/server/wsgi.py
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'server.settings')
application = get_wsgi_application()

# ML registry
import inspect
from apps.ml.registry import MLRegistry
from apps.ml.income_classifier.random_forest import RandomForestClassifier

try:
    registry = MLRegistry() # create ML registry
    # Random Forest classifier
    rf = RandomForestClassifier()
    # add to ML registry
    registry.add_algorithm(endpoint_name="income_classifier",
                            algorithm_object=rf,
                            algorithm_name="random forest",
                            algorithm_status="production",
                            algorithm_version="0.0.1",
                            owner="Piotr",
                            algorithm_description="Random Forest with simple pre- and post-processing",
                            algorithm_code=inspect.getsource(RandomForestClassifier))

except Exception as e:
    print("Exception while loading the algorithms to the registry,", str(e))

After starting the server with:

python manage.py runserver

you can check the endpoints and ML algorithms in the browser: the endpoints are listed at http://127.0.0.1:8000/api/v1/endpoints (image 6), and the algorithms at http://127.0.0.1:8000/api/v1/mlalgorithms (image 7).

Figure 6: List of endpoints defined in the service
Figure 7: List of ML algorithms defined in the service

Add code to repository

Let's commit the new code to the repository:

# please run in backend/server directory
git add apps/ml/
git commit -am "add ml code"
git push

What's next?

We have our ML algorithm in the database and can access information about it with the REST API, but how do we make predictions? This will be the subject of the next chapter.

Making predictions

What you have learned already:

What you will learn in this chapter:

Predictions view

First, we will create a view for predictions that accepts POST requests with JSON data and forwards them to the correct ML algorithm.

In backend/server/apps/endpoints/views.py we need to add the following code:

# please add imports
import json
from numpy.random import rand
from rest_framework import views, status
from rest_framework.response import Response
from apps.ml.registry import MLRegistry
from server.wsgi import registry

'''
... the rest of the backend/server/apps/endpoints/views.py file ...
'''

class PredictView(views.APIView):
    def post(self, request, endpoint_name, format=None):
        # the algorithm status and version can be selected with URL query parameters
        algorithm_status = self.request.query_params.get("status", "production")
        algorithm_version = self.request.query_params.get("version")

        # active algorithms of this endpoint with the requested status
        algs = MLAlgorithm.objects.filter(parent_endpoint__name=endpoint_name, status__status=algorithm_status, status__active=True)

        if algorithm_version is not None:
            algs = algs.filter(version=algorithm_version)

        if len(algs) == 0:
            return Response(
                {"status": "Error", "message": "ML algorithm is not available"},
                status=status.HTTP_400_BAD_REQUEST,
            )
        if len(algs) != 1 and algorithm_status != "ab_testing":
            return Response(
                {"status": "Error", "message": "ML algorithm selection is ambiguous. Please specify algorithm version."},
                status=status.HTTP_400_BAD_REQUEST,
            )
        alg_index = 0
        if algorithm_status == "ab_testing" and len(algs) > 1:
            # A/B testing: draw one of the two algorithms at random
            alg_index = 0 if rand() < 0.5 else 1

        # the registry stores algorithm objects under their database id
        algorithm_object = registry.endpoints[algs[alg_index].id]
        prediction = algorithm_object.compute_prediction(request.data)

        # save the request, so feedback can be attached to it later
        label = prediction["label"] if "label" in prediction else "error"
        ml_request = MLRequest(
            input_data=json.dumps(request.data),
            full_response=prediction,
            response=label,
            feedback="",
            parent_mlalgorithm=algs[alg_index],
        )
        ml_request.save()

        prediction["request_id"] = ml_request.id

        return Response(prediction)

Let's add the URL for predictions. The file backend/server/apps/endpoints/urls.py should look like this:

# file backend/server/apps/endpoints/urls.py

from django.conf.urls import url, include
from rest_framework.routers import DefaultRouter

from apps.endpoints.views import EndpointViewSet
from apps.endpoints.views import MLAlgorithmViewSet
from apps.endpoints.views import MLRequestViewSet
from apps.endpoints.views import PredictView # import PredictView

router = DefaultRouter(trailing_slash=False)
router.register(r"endpoints", EndpointViewSet, basename="endpoints")
router.register(r"mlalgorithms", MLAlgorithmViewSet, basename="mlalgorithms")
router.register(r"mlrequests", MLRequestViewSet, basename="mlrequests")

urlpatterns = [
    url(r"^api/v1/", include(router.urls)),
    # add predict url
    url(
        r"^api/v1/(?P<endpoint_name>.+)/predict$", PredictView.as_view(), name="predict"
    ),
]

OK, let's go into details. The PredictView accepts only POST requests. It is available at:

https://<server_ip>/api/v1/<endpoint_name>/predict

The endpoint_name identifies the endpoint we want to reach. In our case (local development), the ML algorithm can be accessed at:

http://127.0.0.1:8000/api/v1/income_classifier/predict

The income_classifier is the endpoint name (you can check endpoints at http://127.0.0.1:8000/api/v1/endpoints).
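The endpoint name is captured by the named group in the urls.py pattern. The sketch below applies the same regular expression with Python's re module, outside Django, to show what gets captured. It assumes, as Django's resolver does, that the leading slash of the request path is removed before matching. The helper name endpoint_from_path is hypothetical.

```python
import re

# Same pattern as the predict route in backend/server/apps/endpoints/urls.py
PREDICT_PATTERN = re.compile(r"^api/v1/(?P<endpoint_name>.+)/predict$")


def endpoint_from_path(path):
    """Return the endpoint_name captured from a predict URL path, or None."""
    # Django matches app URL patterns against the path without its leading "/"
    match = PREDICT_PATTERN.match(path.lstrip("/"))
    return match.group("endpoint_name") if match else None


print(endpoint_from_path("/api/v1/income_classifier/predict"))  # income_classifier
print(endpoint_from_path("/api/v1/endpoints"))                   # None
```

Because `.+` also matches slashes, a path such as api/v1/a/b/predict yields the endpoint name a/b. A narrower group such as `[^/]+` would reject it, if that matters for your service.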

You can also select the algorithm status and version with query parameters in the URL, for example:

http://127.0.0.1:8000/api/v1/income_classifier/predict?status=testing&version=1.1.1

By default, the production status is used.

The request is routed to the correct ML algorithm based on the endpoint name, status, and version. If an algorithm is selected successfully, the JSON request is forwarded to the algorithm object and the prediction is computed.

The view also contains code that randomly draws one of the algorithms during A/B testing. We will go into the details of this code in the next chapter.
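The routing rules of PredictView can be tried out without a database. The sketch below is a plain-Python approximation. It replaces the MLAlgorithm query with a list of dicts (the keys id, endpoint, status, active and version are assumptions of this sketch, not the real model fields), returns the same error messages as the view, and injects the random number generator so the A/B draw can be tested. Like the corrected view, it only draws when at least two algorithms are in A/B testing.

```python
import random


def select_algorithm(algorithms, endpoint_name, status="production",
                     version=None, rng=random.random):
    """Mimic PredictView's choice of algorithm.

    `algorithms` is a list of dicts with the hypothetical keys "id",
    "endpoint", "status", "active" and "version". Returns a tuple
    (algorithm, error_message) where exactly one element is None.
    """
    # keep active algorithms of this endpoint that have the requested status
    candidates = [
        alg for alg in algorithms
        if alg["endpoint"] == endpoint_name
        and alg["status"] == status
        and alg["active"]
    ]
    # an explicit version narrows the selection further
    if version is not None:
        candidates = [alg for alg in candidates if alg["version"] == version]

    if not candidates:
        return None, "ML algorithm is not available"
    if len(candidates) != 1 and status != "ab_testing":
        return None, ("ML algorithm selection is ambiguous. "
                      "Please specify algorithm version.")

    index = 0
    if status == "ab_testing" and len(candidates) > 1:
        # fair coin flip between the first two algorithms, as in the view
        index = 0 if rng() < 0.5 else 1
    return candidates[index], None


algorithms = [
    {"id": 1, "endpoint": "income_classifier", "status": "production",
     "active": True, "version": "0.0.1"},
    {"id": 2, "endpoint": "income_classifier", "status": "testing",
     "active": True, "version": "0.0.1"},
]
print(select_algorithm(algorithms, "income_classifier")[0]["id"])  # 1
print(select_algorithm(algorithms, "income_classifier", status="ab_testing"))
# (None, 'ML algorithm is not available')
```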

To check if it is working, please go to http://127.0.0.1:8000/api/v1/income_classifier/predict and provide example JSON input:

{
    "age": 37,
    "workclass": "Private",
    "fnlwgt": 34146,
    "education": "HS-grad",
    "education-num": 9,
    "marital-status": "Married-civ-spouse",
    "occupation": "Craft-repair",
    "relationship": "Husband",
    "race": "White",
    "sex": "Male",
    "capital-gain": 0,
    "capital-loss": 0,
    "hours-per-week": 68,
    "native-country": "United-States"
}

and click the POST button. You should see views like those in images 8 and 9.

Figure 8: Fill input data and click POST
Figure 9: Response of the ML algorithm

Congratulations! If you see a result like the one in image 9, your ML web service is working correctly.

Your response should look like this:

{
    "probability": 0.04,
    "label": "<=50K",
    "status": "OK",
    "request_id": 1
}

The response contains the probability, label, status, and request_id. The request_id can be used later to provide feedback and to monitor the ML algorithms.
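Besides the browsable API, the endpoint can be called from any HTTP client. Below is a minimal sketch that uses only Python's standard library (urllib). It builds the prediction request with optional status and version query parameters. It also builds the feedback request that stores the true label for a request_id at /api/v1/mlrequests/<request-id>, which is the same PUT call the A/B testing script uses later in this tutorial. The helper names and BASE_URL are illustrative. The requests are only built, not sent, so the code runs without a server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000/api/v1"  # local development server


def build_predict_request(endpoint_name, payload, status=None, version=None):
    """Build (but do not send) a JSON POST request to <endpoint_name>/predict."""
    params = {k: v for k, v in (("status", status), ("version", version))
              if v is not None}
    url = "{}/{}/predict".format(BASE_URL, endpoint_name)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_feedback_request(request_id, feedback):
    """Build a PUT request that stores the true label for an earlier prediction."""
    return urllib.request.Request(
        "{}/mlrequests/{}".format(BASE_URL, request_id),
        data=json.dumps({"feedback": feedback}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# With the server running, send the request and read request_id from the reply:
# with urllib.request.urlopen(build_predict_request("income_classifier", data)) as r:
#     request_id = json.load(r)["request_id"]
```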

Add tests for PredictView

We will add a simple test case that checks if the predict view responds correctly to valid data.

# file backend/server/apps/endpoints/tests.py
from django.test import TestCase
from rest_framework.test import APIClient

class EndpointTests(TestCase):

    def test_predict_view(self):
        client = APIClient()
        input_data = {
            "age": 37,
            "workclass": "Private",
            "fnlwgt": 34146,
            "education": "HS-grad",
            "education-num": 9,
            "marital-status": "Married-civ-spouse",
            "occupation": "Craft-repair",
            "relationship": "Husband",
            "race": "White",
            "sex": "Male",
            "capital-gain": 0,
            "capital-loss": 0,
            "hours-per-week": 68,
            "native-country": "United-States"
        }
        classifier_url = "/api/v1/income_classifier/predict"
        response = client.post(classifier_url, input_data, format='json')
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.data["label"], "<=50K")
        self.assertIn("request_id", response.data)
        self.assertIn("status", response.data)

To run this test:

# please run in backend/server directory
python manage.py test apps.endpoints.tests

To run all tests:

# please run in backend/server directory
python manage.py test apps

Later, more tests can be added to cover situations where a wrong endpoint is selected in the URL or the data is in the wrong format.

Add code to the repository

Before going to the next chapter, let's add the code to the repository:

git commit -am "add predict view"
git push

In the next chapter, we will work on the A/B testing of ML algorithms.

A/B testing

What you already did:

What you will learn in this chapter:

Add second ML algorithm

We will add code and tests for the Extra Trees based algorithm. Please add a new file extra_trees.py in the backend/server/apps/ml/income_classifier directory. (The code is very similar to the RandomForestClassifier class. To keep it simple, I just copy it and change the path for reading the model. Inheritance could be used here instead.)

# file backend/server/apps/ml/income_classifier/extra_trees.py
import joblib
import pandas as pd

class ExtraTreesClassifier:
    def __init__(self):
        path_to_artifacts = "../../research/"
        self.values_fill_missing =  joblib.load(path_to_artifacts + "train_mode.joblib")
        self.encoders = joblib.load(path_to_artifacts + "encoders.joblib")
        self.model = joblib.load(path_to_artifacts + "extra_trees.joblib")

    def preprocessing(self, input_data):
        # JSON to pandas DataFrame
        input_data = pd.DataFrame(input_data, index=[0])
        # fill missing values
        # fillna returns a new DataFrame, so the result must be assigned
        input_data = input_data.fillna(self.values_fill_missing)
        # convert categoricals
        for column in [
            "workclass",
            "education",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "sex",
            "native-country",
        ]:
            categorical_convert = self.encoders[column]
            input_data[column] = categorical_convert.transform(input_data[column])

        return input_data

    def predict(self, input_data):
        return self.model.predict_proba(input_data)

    def postprocessing(self, input_data):
        label = "<=50K"
        if input_data[1] > 0.5:
            label = ">50K"
        return {"probability": input_data[1], "label": label, "status": "OK"}

    def compute_prediction(self, input_data):
        try:
            input_data = self.preprocessing(input_data)
            prediction = self.predict(input_data)[0]  # only one sample
            prediction = self.postprocessing(prediction)
        except Exception as e:
            return {"status": "Error", "message": str(e)}

        return prediction
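As mentioned above, the two classifier classes differ only in the file the model is read from, so inheritance can remove the duplication. The sketch below rests on several assumptions. The base class name IncomeClassifierBase, the model_file class attribute and the loader parameter are hypothetical. The loader defaults to joblib.load and lets the class be tested without the artifacts. The random forest model is assumed to be stored as random_forest.joblib. The fillna result is assigned back to the DataFrame.

```python
class IncomeClassifierBase:
    """Shared pre- and post-processing; subclasses only name their model file.

    `model_file` and `loader` are hypothetical names used in this sketch.
    """

    model_file = None  # set by each subclass
    categorical_columns = [
        "workclass", "education", "marital-status", "occupation",
        "relationship", "race", "sex", "native-country",
    ]

    def __init__(self, path_to_artifacts="../../research/", loader=None):
        if loader is None:
            import joblib  # the loader used by the tutorial's classes
            loader = joblib.load
        self.values_fill_missing = loader(path_to_artifacts + "train_mode.joblib")
        self.encoders = loader(path_to_artifacts + "encoders.joblib")
        self.model = loader(path_to_artifacts + self.model_file)

    def preprocessing(self, input_data):
        import pandas as pd  # imported lazily; only preprocessing needs pandas
        # JSON to pandas DataFrame, fill missing values, encode categoricals
        input_data = pd.DataFrame(input_data, index=[0])
        input_data = input_data.fillna(self.values_fill_missing)
        for column in self.categorical_columns:
            input_data[column] = self.encoders[column].transform(input_data[column])
        return input_data

    def predict(self, input_data):
        return self.model.predict_proba(input_data)

    def postprocessing(self, input_data):
        label = ">50K" if input_data[1] > 0.5 else "<=50K"
        return {"probability": input_data[1], "label": label, "status": "OK"}

    def compute_prediction(self, input_data):
        try:
            input_data = self.preprocessing(input_data)
            prediction = self.predict(input_data)[0]  # only one sample
            return self.postprocessing(prediction)
        except Exception as e:
            return {"status": "Error", "message": str(e)}


class RandomForestClassifier(IncomeClassifierBase):
    model_file = "random_forest.joblib"  # assumed file name


class ExtraTreesClassifier(IncomeClassifierBase):
    model_file = "extra_trees.joblib"
```

In the project, the base class could live in its own module in the income_classifier package, and each subclass could stay in its current file. That way, the imports in wsgi.py would not need to change.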

Add the test in backend/server/apps/ml/tests.py file:

# in file backend/server/apps/ml/tests.py
# add new import
from apps.ml.income_classifier.extra_trees import ExtraTreesClassifier

# ... the rest of the code

# add new test method to MLTests class
    def test_et_algorithm(self):
        input_data = {
            "age": 37,
            "workclass": "Private",
            "fnlwgt": 34146,
            "education": "HS-grad",
            "education-num": 9,
            "marital-status": "Married-civ-spouse",
            "occupation": "Craft-repair",
            "relationship": "Husband",
            "race": "White",
            "sex": "Male",
            "capital-gain": 0,
            "capital-loss": 0,
            "hours-per-week": 68,
            "native-country": "United-States"
        }
        my_alg = ExtraTreesClassifier()
        response = my_alg.compute_prediction(input_data)
        self.assertEqual('OK', response['status'])
        self.assertIn('label', response)
        self.assertEqual('<=50K', response['label'])

To run tests:

# please run in backend/server directory
python manage.py test apps.ml.tests

The algorithm works as expected, so let's add it to our ML registry by modifying the backend/server/server/wsgi.py file:

# file backend/server/server/wsgi.py
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'server.settings')
application = get_wsgi_application()

# ML registry
import inspect
from apps.ml.registry import MLRegistry
from apps.ml.income_classifier.random_forest import RandomForestClassifier
from apps.ml.income_classifier.extra_trees import ExtraTreesClassifier # import ExtraTrees ML algorithm

try:
    registry = MLRegistry() # create ML registry
    # Random Forest classifier
    rf = RandomForestClassifier()
    # add to ML registry
    registry.add_algorithm(endpoint_name="income_classifier",
                            algorithm_object=rf,
                            algorithm_name="random forest",
                            algorithm_status="production",
                            algorithm_version="0.0.1",
                            owner="Piotr",
                            algorithm_description="Random Forest with simple pre- and post-processing",
                            algorithm_code=inspect.getsource(RandomForestClassifier))

    # Extra Trees classifier
    et = ExtraTreesClassifier()
    # add to ML registry
    registry.add_algorithm(endpoint_name="income_classifier",
                            algorithm_object=et,
                            algorithm_name="extra trees",
                            algorithm_status="testing",
                            algorithm_version="0.0.1",
                            owner="Piotr",
                            algorithm_description="Extra Trees with simple pre- and post-processing",
                            algorithm_code=inspect.getsource(ExtraTreesClassifier))
except Exception as e:
    print("Exception while loading the algorithms to the registry,", str(e))

To see the changes, please restart the server:

# please run in backend/server
# stop server with CONTROL-C.
# start server:
python manage.py runserver

After the server restarts, please open http://127.0.0.1:8000/api/v1/mlalgorithms in the web browser. You should see two registered ML algorithms (image 10).

Figure 10: Two ML algorithms registered in the service

Create A/B model in the database

Add ABTest model

Let's add a database model to the backend/server/apps/endpoints/models.py file to keep information about A/B tests:

# please add at the end of file backend/server/apps/endpoints/models.py

class ABTest(models.Model):
    '''
    The ABTest will keep information about A/B tests.
    Attributes:
        title: The title of test.
        created_by: The name of creator.
        created_at: The date of test creation.
        ended_at: The date of test stop.
        summary: The description with test summary, created at test stop.
        parent_mlalgorithm_1: The reference to the first corresponding MLAlgorithm.
        parent_mlalgorithm_2: The reference to the second corresponding MLAlgorithm.
    '''
    title = models.CharField(max_length=10000)
    created_by = models.CharField(max_length=128)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
    ended_at = models.DateTimeField(blank=True, null=True)
    summary = models.CharField(max_length=10000, blank=True, null=True)

    parent_mlalgorithm_1 = models.ForeignKey(MLAlgorithm, on_delete=models.CASCADE, related_name="parent_mlalgorithm_1")
    parent_mlalgorithm_2 = models.ForeignKey(MLAlgorithm, on_delete=models.CASCADE, related_name="parent_mlalgorithm_2")

The ABTest keeps information about:

Define serializer

Let's add a serializer for the ABTest model.


# please add at the beginning of file backend/server/apps/endpoints/serializers.py

from apps.endpoints.models import ABTest

# ...
# rest of the code
# ...

# please add at the end of file backend/server/apps/endpoints/serializers.py
class ABTestSerializer(serializers.ModelSerializer):
    class Meta:
        model = ABTest
        read_only_fields = (
            "id",
            "ended_at",
            "created_at",
            "summary",
        )
        fields = (
            "id",
            "title",
            "created_by",
            "created_at",
            "ended_at",
            "summary",
            "parent_mlalgorithm_1",
            "parent_mlalgorithm_2",
            )

Please notice that the id, created_at, ended_at, and summary fields are marked as read-only. Users will create A/B tests with the REST API, and the read-only fields will be set by the server code.

Define view

# please add to the file backend/server/apps/endpoints/views.py

from django.db import transaction
from apps.endpoints.models import ABTest
from apps.endpoints.serializers import ABTestSerializer


class ABTestViewSet(
    mixins.RetrieveModelMixin, mixins.ListModelMixin,
    mixins.CreateModelMixin, mixins.UpdateModelMixin, viewsets.GenericViewSet
):
    serializer_class = ABTestSerializer
    queryset = ABTest.objects.all()

    def perform_create(self, serializer):
        try:
            with transaction.atomic():
                instance = serializer.save()
                # update status for first algorithm

                status_1 = MLAlgorithmStatus(status = "ab_testing",
                                created_by=instance.created_by,
                                parent_mlalgorithm = instance.parent_mlalgorithm_1,
                                active=True)
                status_1.save()
                deactivate_other_statuses(status_1)
                # update status for second algorithm
                status_2 = MLAlgorithmStatus(status = "ab_testing",
                                created_by=instance.created_by,
                                parent_mlalgorithm = instance.parent_mlalgorithm_2,
                                active=True)
                status_2.save()
                deactivate_other_statuses(status_2)

        except Exception as e:
            raise APIException(str(e))

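The ABTestViewSet allows users to list, retrieve, create, and update A/B tests. Its perform_create method creates the ABTest object and two new statuses, set to ab_testing, one for each ML algorithm. The deactivate_other_statuses calls deactivate the previous statuses of both algorithms. All of this happens in a single database transaction.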
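The status handling in perform_create can be pictured with an in-memory log. In this sketch, the StatusLog class is a hypothetical stand-in for MLAlgorithmStatus rows. Its set_status method assumes that deactivate_other_statuses marks the algorithm's earlier active statuses as inactive. As a result, each algorithm keeps its full status history, but only its newest status is active.

```python
class StatusLog:
    """In-memory stand-in for MLAlgorithmStatus rows (hypothetical helper).

    Each entry is a dict with "algorithm", "status" and "active" keys.
    """

    def __init__(self):
        self.entries = []

    def set_status(self, algorithm, status):
        # like deactivate_other_statuses: older active statuses become inactive
        for entry in self.entries:
            if entry["algorithm"] == algorithm and entry["active"]:
                entry["active"] = False
        self.entries.append({"algorithm": algorithm, "status": status, "active": True})

    def current_status(self, algorithm):
        # the newest active entry wins; None if the algorithm has no status
        for entry in reversed(self.entries):
            if entry["algorithm"] == algorithm and entry["active"]:
                return entry["status"]
        return None


log = StatusLog()
log.set_status("random forest", "production")
log.set_status("extra trees", "testing")

# what perform_create does when a new A/B test is created
for algorithm in ("random forest", "extra trees"):
    log.set_status(algorithm, "ab_testing")

print(log.current_status("random forest"), log.current_status("extra trees"))
# ab_testing ab_testing
```

In the view, the same steps run inside transaction.atomic(). If any step fails, neither the ABTest object nor the new statuses are saved.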

We will also add a view to stop the A/B test.


# please add to the file backend/server/apps/endpoints/views.py

from django.db.models import F
from django.utils import timezone

class StopABTestView(views.APIView):
    def post(self, request, ab_test_id, format=None):

        try:
            ab_test = ABTest.objects.get(pk=ab_test_id)

            if ab_test.ended_at is not None:
                return Response({"message": "AB Test already finished."})

            # timezone-aware "now", matching the aware created_at values stored when USE_TZ = True
            date_now = timezone.now()
            # alg #1 accuracy
            all_responses_1 = MLRequest.objects.filter(parent_mlalgorithm=ab_test.parent_mlalgorithm_1, created_at__gt = ab_test.created_at, created_at__lt = date_now).count()
            correct_responses_1 = MLRequest.objects.filter(parent_mlalgorithm=ab_test.parent_mlalgorithm_1, created_at__gt = ab_test.created_at, created_at__lt = date_now, response=F('feedback')).count()
            accuracy_1 = correct_responses_1 / float(all_responses_1)
            print(all_responses_1, correct_responses_1, accuracy_1)

            # alg #2 accuracy
            all_responses_2 = MLRequest.objects.filter(parent_mlalgorithm=ab_test.parent_mlalgorithm_2, created_at__gt = ab_test.created_at, created_at__lt = date_now).count()
            correct_responses_2 = MLRequest.objects.filter(parent_mlalgorithm=ab_test.parent_mlalgorithm_2, created_at__gt = ab_test.created_at, created_at__lt = date_now, response=F('feedback')).count()
            accuracy_2 = correct_responses_2 / float(all_responses_2)
            print(all_responses_2, correct_responses_2, accuracy_2)

            # select algorithm with higher accuracy
            alg_id_1, alg_id_2 = ab_test.parent_mlalgorithm_1, ab_test.parent_mlalgorithm_2
            # swap
            if accuracy_1 < accuracy_2:
                alg_id_1, alg_id_2 = alg_id_2, alg_id_1

            status_1 = MLAlgorithmStatus(status = "production",
                            created_by=ab_test.created_by,
                            parent_mlalgorithm = alg_id_1,
                            active=True)
            status_1.save()
            deactivate_other_statuses(status_1)
            # update status for second algorithm
            status_2 = MLAlgorithmStatus(status = "testing",
                            created_by=ab_test.created_by,
                            parent_mlalgorithm = alg_id_2,
                            active=True)
            status_2.save()
            deactivate_other_statuses(status_2)


            summary = "Algorithm #1 accuracy: {}, Algorithm #2 accuracy: {}".format(accuracy_1, accuracy_2)
            ab_test.ended_at = date_now
            ab_test.summary = summary
            ab_test.save()

        except Exception as e:
            return Response({"status": "Error", "message": str(e)},
                            status=status.HTTP_400_BAD_REQUEST
            )
        return Response({"message": "AB Test finished.", "summary": summary})

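The StopABTestView stops the A/B test and computes the accuracy (ratio of correct responses) for each algorithm. A response counts as correct when it equals the feedback, so requests that never received feedback count as incorrect. The algorithm with higher accuracy is set as the production algorithm, and the other algorithm is saved with the testing status. If one of the algorithms has no requests in the test period, the division by zero is caught and the view returns an error.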
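The accuracy comparison in StopABTestView can be checked on plain Python data. The sketch below works on hypothetical (response, feedback) pairs instead of MLRequest queries. Like the view, it counts a request as correct only when the response equals the feedback, and it keeps algorithm #1 on a tie. Unlike the view, it returns None for an algorithm without requests instead of dividing by zero, and pick_winner then raises a ValueError. The pairs at the end are made up.

```python
def accuracy(pairs):
    """Share of (response, feedback) pairs where the response equals the feedback.

    Returns None when there are no requests, instead of dividing by zero.
    """
    if not pairs:
        return None
    correct = sum(1 for response, feedback in pairs if response == feedback)
    return correct / float(len(pairs))


def pick_winner(pairs_1, pairs_2):
    """Return (winner, accuracy_1, accuracy_2); a tie keeps algorithm #1."""
    accuracy_1, accuracy_2 = accuracy(pairs_1), accuracy(pairs_2)
    if accuracy_1 is None or accuracy_2 is None:
        raise ValueError("Both algorithms need at least one request.")
    # same rule as the view: swap only if algorithm #1 is strictly worse
    winner = "alg_2" if accuracy_1 < accuracy_2 else "alg_1"
    return winner, accuracy_1, accuracy_2


pairs_1 = [("<=50K", "<=50K"), (">50K", "<=50K"), ("<=50K", "<=50K"), (">50K", ">50K")]
pairs_2 = [("<=50K", "<=50K"), ("<=50K", ">50K")]
print(pick_winner(pairs_1, pairs_2))  # ('alg_1', 0.75, 0.5)
```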

Add URL router for ABTest

The last thing is to add the URL router:

# the backend/server/apps/endpoints/urls.py file
from django.conf.urls import url, include
from rest_framework.routers import DefaultRouter

from apps.endpoints.views import EndpointViewSet
from apps.endpoints.views import MLAlgorithmViewSet
from apps.endpoints.views import MLAlgorithmStatusViewSet
from apps.endpoints.views import MLRequestViewSet
from apps.endpoints.views import PredictView
from apps.endpoints.views import ABTestViewSet
from apps.endpoints.views import StopABTestView

router = DefaultRouter(trailing_slash=False)
router.register(r"endpoints", EndpointViewSet, basename="endpoints")
router.register(r"mlalgorithms", MLAlgorithmViewSet, basename="mlalgorithms")
router.register(r"mlalgorithmstatuses", MLAlgorithmStatusViewSet, basename="mlalgorithmstatuses")
router.register(r"mlrequests", MLRequestViewSet, basename="mlrequests")
router.register(r"abtests", ABTestViewSet, basename="abtests")

urlpatterns = [
    url(r"^api/v1/", include(router.urls)),
    url(
        r"^api/v1/(?P<endpoint_name>.+)/predict$", PredictView.as_view(), name="predict"
    ),
    url(
        r"^api/v1/stop_ab_test/(?P<ab_test_id>.+)", StopABTestView.as_view(), name="stop_ab"
    ),
]

OK, we are almost set. Before starting a development server we need to create and apply database migrations:

python manage.py makemigrations
python manage.py migrate

Let's run the server:

# please run in backend/server
python manage.py runserver

You should see the list of DRF-generated APIs, like in image 11.

Figure 11: URL to A/B tests

Let's start a new A/B test. Please go to http://127.0.0.1:8000/api/v1/abtests (in the development environment). Please set the title and the creator name, and select the algorithms. The algorithm id is shown in brackets. Make sure that you select ids 1 and 2, like in image 12. Press the POST button to create the test.

Figure 12: View to create new A/B test

After creating the new A/B test, you should see a view like in image 13.

Figure 13: Created new A/B test

After the A/B test is created, you should see updated status fields for the ML algorithms. They should be set to ab_testing, like in image 14.

Figure 14: ML algorithms with updated statuses

Run the A/B test

To run the A/B test, we will write a Python script in a Jupyter notebook that simulates real-life A/B testing. The script will:

Before starting a new notebook, please install the requests package, which will be used to communicate with the server.

pip3 install requests

Please open Jupyter Notebook and create a new notebook ab_test.ipynb in the research directory.

Let's import the necessary packages:

import json # will be needed for saving preprocessing details
import numpy as np # for data manipulation
import pandas as pd # for data manipulation
from sklearn.model_selection import train_test_split # will be used for data split
import requests

Code to read the data:

# load dataset
df = pd.read_csv('https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv', skipinitialspace=True)
x_cols = [c for c in df.columns if c != 'income']
# set input matrix and target column
X = df[x_cols]
y = df['income']
# show first rows of data
df.head()

Split the data into train and test sets:

# data split train / test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=1234)

Please notice that we use the same seed (random_state value) as during model training, so the test rows were not used to train the models.

Let's use the first 100 rows of the test data for the A/B test:

for i in range(100):
    input_data = dict(X_test.iloc[i])
    target = y_test.iloc[i]
    r = requests.post("http://127.0.0.1:8000/api/v1/income_classifier/predict?status=ab_testing", input_data)
    response = r.json()
    # provide feedback
    requests.put("http://127.0.0.1:8000/api/v1/mlrequests/{}".format(response["request_id"]), {"feedback": target})

In each iteration, we send the data to the API endpoint:

http://127.0.0.1:8000/api/v1/income_classifier/predict?status=ab_testing

and provide feedback with the true label at:

http://127.0.0.1:8000/api/v1/mlrequests/<request-id>

After running the script, you can check the requests at http://127.0.0.1:8000/api/v1/mlrequests. You should see a list of requests like in image 15.

Figure 15: ML requests after running the A/B test script

To stop the A/B test, please open http://127.0.0.1:8000/api/v1/stop_ab_test/1, where the 1 at the end of the address is the A/B test id. Click the POST button to finish the A/B test. You should get a view like in image 16.

Figure 16: A/B test finish

The test summary shows the accuracy of each algorithm. You can check (at http://127.0.0.1:8000/api/v1/mlalgorithms) that the algorithms have updated statuses and that the model with higher accuracy is set to production.

Add code to the repository

Let's save our code to the repository:

# please run in the project's main directory
git add backend/server/apps/ml/income_classifier/extra_trees.py
git add backend/server/apps/endpoints/migrations/
git add research/ab_test.ipynb
git commit -am "ab tests"
git push

In the next chapter, we will define docker container for our server.

Containers

What you already did:

In this chapter, you will define a Docker container for our server code. With Docker, it is easy to deploy the code to the selected infrastructure and to scale the service if needed.

Prepare the code

Before creating the Docker definitions, we need to make some changes to the server code.

Please edit the backend/server/server/settings.py file and set the ALLOWED_HOSTS variable:

ALLOWED_HOSTS = ['0.0.0.0']

Additionally, set the STATIC_ROOT and STATIC_URL variables at the end of the settings file:

STATIC_ROOT = os.path.join(BASE_DIR, 'static')
STATIC_URL = '/static/'

Please add the requirements.txt file in the project's main directory:

Django==2.2.4
django-filter==2.2.0
djangorestframework==3.10.3
joblib==0.14.0
Markdown==3.1.1
numpy==1.17.3
pandas==0.25.2
requests==2.22.0
scikit-learn==0.21.3

Dockerfiles

Let's define the Dockerfiles for the nginx server and our server application. We will keep them in separate directories:

# please run in project's main directory
mkdir docker
mkdir docker/nginx
mkdir docker/backend

Please add a Dockerfile in the docker/nginx directory:

# docker/nginx/Dockerfile
FROM nginx:1.13.12-alpine
CMD ["nginx", "-g", "daemon off;"]

Additionally, we will add an nginx config file. Please add the docker/nginx/default.conf file:

server {
    listen 8000 default_server;
    listen [::]:8000;

    client_max_body_size 20M;

    location / {
        try_files $uri @proxy_api;
    }

    location @proxy_api {
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Url-Scheme $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass   http://wsgiserver:8000;
    }

    location /static/ {
        autoindex on;
        alias /app/backend/server/static/;
    }

}

Now, let's define the Dockerfile for our server application. Please add the docker/backend/Dockerfile file:

FROM ubuntu:xenial

RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y python3.6 python3.6-dev python3-pip

WORKDIR /app
COPY requirements.txt .
RUN rm -f /usr/bin/python && ln -s /usr/bin/python3.6 /usr/bin/python
RUN rm -f /usr/bin/python3 && ln -s /usr/bin/python3.6 /usr/bin/python3

RUN pip3 install -r requirements.txt
RUN pip3 install gunicorn==19.9.0

ADD ./backend /app/backend
ADD ./docker /app/docker
ADD ./research /app/research

RUN mkdir -p /app/backend/server/static

In this Dockerfile, we start from the Ubuntu (xenial) image, install all needed packages, and switch the default Python to Python 3.6. At the end, we copy the application code.

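Next, we will define a start script for our application. Please add the docker/backend/wsgi-entrypoint.sh file and make it executable (chmod +x docker/backend/wsgi-entrypoint.sh), because docker-compose runs it directly as the container entrypoint: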

#!/usr/bin/env bash

echo "Start backend server"
until cd /app/backend/server
do
    echo "Waiting for server volume..."
done

until ./manage.py migrate
do
    echo "Waiting for database to be ready..."
    sleep 2
done

./manage.py collectstatic --noinput

gunicorn server.wsgi --bind 0.0.0.0:8000 --workers 4 --threads 4

This script applies the database migrations and collects the static files before the application is started with gunicorn.

We have Dockerfiles defined for the nginx server and our application. We will manage them with the docker-compose command. Let's add the docker-compose.yml file in the main directory:

version: '2'

services:
    nginx:
        restart: always
        image: nginx:1.12-alpine
        ports:
            - 8000:8000
        volumes:
            - ./docker/nginx/default.conf:/etc/nginx/conf.d/default.conf
            - static_volume:/app/backend/server/static
    wsgiserver:
        build:
            context: .
            dockerfile: ./docker/backend/Dockerfile
        entrypoint: /app/docker/backend/wsgi-entrypoint.sh
        volumes:
            - static_volume:/app/backend/server/static
        expose:
            - 8000
volumes:
    static_volume: {}

To build the Docker images, please run:

sudo docker-compose build

To start the containers, please run:

sudo docker-compose up

You should be able to see the running server at the address:

http://0.0.0.0:8000/api/v1/

Congratulations!

That was the last step of this tutorial. You have successfully created your own web service that can serve machine learning models. Congratulations!

The full code is available on GitHub: https://github.com/pplonski/my_ml_service.

Feedback

I'm looking forward to your feedback!