How to deploy a new solution method (2.0)

The deployment of a solution method consists of sub-classing the class ApplicationCore from the package cornflow_client. ApplicationCore serves as a template for apps and requires certain properties and methods to be defined.

As an example, we will use the TSP problem defined here. The complete code, with all imports, is available at the end of the document.

Application class

First, we make a subclass of ApplicationCore:

class TspApp(ApplicationCore):
    name = "tsp"
    instance = Instance
    solution = Solution
    solvers = dict(naive=TSPNaive)
    schema = load_json(os.path.join(os.path.dirname(__file__), "config.json"))

    @property
    def test_cases(self) -> List[Dict]:
        return []

As you can see, we have given the application a class name (and a name property). We have also set several other properties: instance, solution, solvers and schema. Finally, we have created a test_cases property.

We will go over each of the required properties below.

Instance class

The Instance is just a subclass of InstanceCore. There are many default methods that can be overridden (to_dict, from_dict, etc.). The only required properties are schema and schema_checks. Here we have just loaded a file with the corresponding json-schema for schema, and defined an empty schema for schema_checks. If you want to know how to define a json-schema, check the section Write a json-schema. The instance can also define a check() method.

class Instance(InstanceCore):
    schema = load_json(os.path.join(os.path.dirname(__file__), "input.json"))
    schema_checks = get_empty_schema()

check

check returns a dictionary of lists. Each key in the dictionary represents a specific validation. Each entry of the list represents a violation of the requirement checked in this validation.
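As a rough illustration of that return shape, the sketch below runs two validations on TSP-like instance data using plain Python. The function name check_instance, the validation names negative_weights and self_loops, and the sample data are all assumptions made for this example; in an actual app the logic would live in the check() method of the Instance class.

```python
def check_instance(data: dict) -> dict:
    """Return a dictionary of lists: one key per validation, one entry per violation."""
    arcs = data["arcs"]
    # validation 1 (hypothetical): arcs whose weight is negative
    negative_weights = [
        {"n1": a["n1"], "n2": a["n2"], "w": a["w"]} for a in arcs if a["w"] < 0
    ]
    # validation 2 (hypothetical): arcs that start and end at the same node
    self_loops = [{"node": a["n1"]} for a in arcs if a["n1"] == a["n2"]]
    return dict(negative_weights=negative_weights, self_loops=self_loops)


# hypothetical sample data following the "arcs" structure of the TSP example
sample_instance = dict(
    arcs=[
        dict(n1=0, n2=1, w=5),
        dict(n1=1, n2=0, w=-2),
        dict(n1=2, n2=2, w=1),
    ]
)
instance_checks = check_instance(sample_instance)
```

An empty list under a key means that the corresponding validation found no violations.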

schema_checks

schema_checks returns a jsonschema corresponding to the output of the check method.

Solution class

The Solution class is very similar to the Instance class: it is just a subclass of SolutionCore. The only required property is the schema.

class Solution(SolutionCore):
    schema = load_json(os.path.join(os.path.dirname(__file__), "output.json"))

Experiment class

Although not strictly necessary, it is usually good practice to define an Experiment class that subclasses ExperimentCore. This class takes as input an Instance object and a Solution object. Its purpose is to evaluate and validate a given solution. To achieve this, the class should implement at least two methods, get_objective and check_solution, and a field, schema_checks.

get_objective

get_objective returns a scalar number that represents the objective function value of the solution.

check_solution

check_solution returns a dictionary of lists. Each key in the dictionary represents a specific validation. Each entry of the list represents a violation of the requirement checked in this validation.

In the example below for the TSP, a possible value for check_solution() could be:

{
    "missing_nodes": [{"node": 5}, {"node": 6}]
}

This implies that nodes 5 and 6 have not been visited in the solution.

schema_checks

schema_checks returns a jsonschema corresponding to the output of the check_solution method.
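A non-empty schema_checks could look like the sketch below, written as a Python dictionary for the missing_nodes and missing_positions keys of this TSP example. The draft version, the integer types and the helper undeclared_checks are assumptions chosen for illustration; a dedicated json-schema validator could be used instead of the small helper.

```python
# hypothetical json-schema describing the output of check_solution for the TSP
checks_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "missing_nodes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"node": {"type": "integer"}},
                "required": ["node"],
            },
        },
        "missing_positions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"position": {"type": "integer"}},
                "required": ["position"],
            },
        },
    },
}


def undeclared_checks(checks: dict, schema: dict) -> list:
    """List the validation names in `checks` that the schema does not declare."""
    return sorted(set(checks) - set(schema["properties"]))


example_output = {"missing_nodes": [{"node": 5}, {"node": 6}]}
unknown = undeclared_checks(example_output, checks_schema)
```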

Example

class Experiment(ExperimentCore):
    schema_checks = get_empty_schema()

    def get_objective(self) -> float:
        # we get a sorted list of nodes by position
        route = (
            TupList(self.solution.data["route"])
            .sorted(key=lambda v: v["pos"])
            .vapply(lambda v: v["node"])
        )
        weight = {(el["n1"], el["n2"]): el["w"] for el in self.instance.data["arcs"]}
        # we sum all arcs in the solution
        return (
            sum([weight[n1, n2] for n1, n2 in zip(route, route[1:])])
            + weight[route[-1], route[0]]
        )

    def check_solution(self, *args, **kwargs) -> dict:
        nodes_in = TupList(v["n1"] for v in self.instance.data["arcs"]).to_set()
        nodes_out = TupList(n["node"] for n in self.solution.data["route"]).to_set()
        missing_nodes = [{"node": n} for n in (nodes_in - nodes_out)]
        positions = TupList(n["pos"] for n in self.solution.data["route"]).to_set()
        missing_positions = [{"position": p} for p in set(range(len(nodes_in))) - positions]
        return SuperDict(
            missing_nodes=missing_nodes, missing_positions=missing_positions
        )

Solver class

Each solver is a subclass of the Experiment class and should define one additional method: solve. Here is the implementation of a very poor solver for the TSP:

class TSPNaive(Experiment):
    def solve(self, options: dict):
        # we just get an arbitrary but complete list of nodes and we return it
        nodes = (
            TupList(v["n1"] for v in self.instance.data["arcs"])
            .unique()
            .kvapply(lambda k, v: dict(pos=k, node=v))
        )
        self.solution = Solution(dict(route=nodes))
        return dict(
            status_sol=SOLUTION_STATUS_FEASIBLE,
            status=STATUS_UNDEFINED
        )

The solve method should return a dictionary that contains the state of the resolution. The status mapping can be found in cornflow_client.constants. The status_sol item should indicate whether the execution has found a solution or not, while the status item should give the state of the resolution (optimal, stopped because of time limit, infeasible…).

More than one solution method can be defined. This is why they are given in a dictionary to the Application class. We use them as a catalogue when deciding to solve a problem.
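The catalogue idea can be sketched with a plain dictionary lookup. The classes NaiveSolver and RandomSolver and the function get_solver are hypothetical stand-ins written for this example; they are not part of cornflow_client, and the actual lookup performed by the application may differ.

```python
class NaiveSolver:
    """Stand-in for a solver class such as TSPNaive (hypothetical)."""


class RandomSolver:
    """Stand-in for a second solution method (hypothetical)."""


# same shape as the `solvers` property of the application: name -> solver class
solvers = dict(naive=NaiveSolver, random=RandomSolver)


def get_solver(name: str, catalogue: dict = solvers):
    """Return the solver class registered under `name`, or fail with a clear message."""
    try:
        return catalogue[name]
    except KeyError:
        raise ValueError(
            f"Unknown solver '{name}'. Available: {sorted(catalogue)}"
        ) from None


chosen = get_solver("naive")
```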

Test cases

The test_cases property should return a list of datasets (in json-schema format). These datasets are used to test the app in the unit tests. For more information on how to create the unit tests for your solution method, see Test your solution method.
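As a minimal sketch, a list of datasets for the TSP could be built as shown below. It assumes that each dataset is a dictionary holding instance data with the same arcs structure used in the example (n1, n2 and w); the function name build_test_cases and the weights are made up for illustration.

```python
from typing import Dict, List


def build_test_cases() -> List[Dict]:
    # hypothetical 3-node dataset: one arc in each direction for every pair of nodes
    weights = {(0, 1): 4, (1, 2): 3, (0, 2): 6}
    arcs = []
    for (n1, n2), w in weights.items():
        arcs.append(dict(n1=n1, n2=n2, w=w))
        arcs.append(dict(n1=n2, n2=n1, w=w))
    return [dict(arcs=arcs)]


test_cases = build_test_cases()
```

The test_cases property of the application could then return such a list instead of an empty one.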

Schema property

The schema of an application is the configuration used to solve a problem. This schema needs to have at least the timeLimit and the solver properties. Besides that, it’s up to the developer to decide which configuration is needed. If you want to know how to define a json-schema, check the section Write a json-schema.
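A minimal configuration schema with these two properties might look like the sketch below, written as a Python dictionary rather than a config.json file. The draft version, the enum of solver names and the helper missing_config_properties are assumptions for illustration only.

```python
# hypothetical minimal configuration schema for the TSP app
config_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "timeLimit": {"type": "number"},
        "solver": {"type": "string", "enum": ["naive"], "default": "naive"},
    },
}

EXPECTED_PROPERTIES = ("timeLimit", "solver")


def missing_config_properties(schema: dict) -> list:
    """Return the expected properties that the schema does not declare."""
    declared = schema.get("properties", {})
    return [p for p in EXPECTED_PROPERTIES if p not in declared]


missing = missing_config_properties(config_schema)
```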

Conventions

We follow some common conventions in the configuration schema so that most apps share the main properties. Below is a list of known properties and what they usually represent.

  1. timeLimit: float. It indicates the number of seconds before the method should stop.

  2. solver: string. It indicates the solution method that should be used to solve the problem. It can support “nested solvers”. For example: pulp.cbc should be parsed as “use the pulp solution method and, inside that solution method, use the cbc solver”.

  3. msg: boolean. When true, the solution method displays details of the progress.

  4. warmStart: boolean. When true, the solution method will use the current solution (if any) to start the exploration of solutions.

  5. fixSolution: boolean. When true, the solution method will fix the existing information in the solution when exploring the solution space.

  6. gapAbs: float. The maximum absolute gap allowed when considering a solution optimal.

  7. gapRel: float. The maximum relative gap allowed when considering a solution optimal.

  8. threads: integer. The number of cores that should be used in the solution method.
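The conventions above can be sketched as a configuration dictionary together with one possible way to parse a nested solver name. The function split_solver and the concrete values are assumptions made for this example; how an app actually parses the solver string is up to its developer.

```python
def split_solver(solver: str):
    """Split 'pulp.cbc' into ('pulp', 'cbc'); a plain name gives (name, None)."""
    method, _, nested = solver.partition(".")
    return method, (nested or None)


# hypothetical configuration using some of the conventional properties
config = dict(
    timeLimit=60,  # seconds before the method should stop
    solver="pulp.cbc",  # solution method "pulp", nested solver "cbc"
    msg=True,  # display details of the progress
    warmStart=False,  # do not start from the current solution
    gapRel=0.01,  # maximum relative gap to consider a solution optimal
    threads=2,  # number of cores to use
)

method, nested_solver = split_solver(config["solver"])
```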

Cornflow-client has some utility functions to help when writing this configuration schema. For small solution methods, it does not make much sense to create a separate config.json schema file. For these cases, it is possible to use cornflow_client.schema.tools.get_empty_schema().

Instance and solution schemas

Instance and Solution classes also require their own schemas. See their own sections for more details on how to provide them: Instance class and Solution class respectively.

README

It is a good idea to have a description of the problem to be solved in text form. Even better if the text is accompanied by a mathematical formulation in LaTeX. This way, others can check the problem description and better understand the input data, output data and the solution methods.

Code structure

Assuming your solution is called my_project, the following is the official structure to organize an application:

my_project/
    __init__.py
    README.md
    core/
        __init__.py
        instance.py
        solution.py
        experiment.py
    schemas/
        __init__.py
        instance.json
        solution.json
        config.json
    data/
        data_file_1.json
        data_file_2.json
    solvers/
        __init__.py
        solver_1.py
        solver_2.py

Ideally, each app is contained inside its own directory.

The Instance class is implemented in the my_project/core/instance.py file, and the Solution class in the my_project/core/solution.py file.

Each Solver class is defined in the my_project/solvers directory: solver_1.py, solver_2.py, etc.

The Application class is defined inside my_project/__init__.py.

Schemas are stored in the my_project/schemas folder. Finally, Test cases are stored in the my_project/data directory.

Complete __init__.py code for the TSP

In this example we put everything inside the __init__.py (except the json-schema files) for simplicity.

from cornflow_client import (
    ApplicationCore,
    InstanceCore,
    SolutionCore,
    ExperimentCore,
)
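from cornflow_client.constants import SOLUTION_STATUS_FEASIBLE, STATUS_UNDEFINED
from cornflow_client.core.tools import load_json
from cornflow_client.schema.tools import get_empty_schema
from pytups import TupList, SuperDict
import os
from typing import List, Dict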


class Instance(InstanceCore):
    schema = load_json(os.path.join(os.path.dirname(__file__), "input.json"))
    schema_checks = get_empty_schema()


class Solution(SolutionCore):
    schema = load_json(os.path.join(os.path.dirname(__file__), "output.json"))


class Experiment(ExperimentCore):
    schema_checks = get_empty_schema()

    def get_objective(self) -> float:
        # we get a sorted list of nodes by position
        route = (
            TupList(self.solution.data["route"])
            .sorted(key=lambda v: v["pos"])
            .vapply(lambda v: v["node"])
        )
        weight = {(el["n1"], el["n2"]): el["w"] for el in self.instance.data["arcs"]}
        # we sum all arcs in the solution
        return (
            sum([weight[n1, n2] for n1, n2 in zip(route, route[1:])])
            + weight[route[-1], route[0]]
        )

    def check_solution(self, *args, **kwargs) -> dict:
        nodes_in = TupList(v["n1"] for v in self.instance.data["arcs"]).to_set()
        nodes_out = TupList(n["node"] for n in self.solution.data["route"]).to_set()
        missing_nodes = [{"node": n} for n in (nodes_in - nodes_out)]
        positions = TupList(n["pos"] for n in self.solution.data["route"]).to_set()
        missing_positions = [{"position": p} for p in set(range(len(nodes_in))) - positions]
        return SuperDict(
            missing_nodes=missing_nodes, missing_positions=missing_positions
        )


class TSPNaive(Experiment):
    def solve(self, options: dict):
        # we just get an arbitrary but complete list of nodes and we return it
        nodes = (
            TupList(v["n1"] for v in self.instance.data["arcs"])
            .unique()
            .kvapply(lambda k, v: dict(pos=k, node=v))
        )
        self.solution = Solution(dict(route=nodes))
        return dict(
            status_sol=SOLUTION_STATUS_FEASIBLE,
            status=STATUS_UNDEFINED
        )


class TspApp(ApplicationCore):
    name = "tsp"
    instance = Instance
    solution = Solution
    solvers = dict(naive=TSPNaive)
    schema = load_json(os.path.join(os.path.dirname(__file__), "config.json"))

    @property
    def test_cases(self) -> List[Dict]:
        return []

Requirements

The repository contains a file called requirements.txt. You will need to update this file with the names of any additional libraries that your code needs to run.

Pull request

Once all the previous steps are done, your code has been pushed to the remote repository, none of the automated tests fail (see Test your solution method), and your application seems complete, it is time to merge it into the main branch. The dag will not run until your code is on the main branch.

To do so, create a pull request on GitHub’s web interface. Once the PR has been approved, your code will be on the main branch.

From there, you will have to wait until the dag is running, which can take a few hours since the running dags are only updated once a day. To learn how to test your app, see Use your solution method. To learn how to debug your app in the Airflow web interface, see Debug your solution method.