Write a json-schema

Basics of schemas

A schema describes the data accepted by the application.

In particular, a schema must describe a dictionary of tables, where each table is a list of dictionaries whose keys are the column names.

As an example, take data with two tables: customers and allowedTrailers.

customers:

idCustomer  Demand
1           4
2           5
3           7

allowedTrailers:

idCustomer  idTrailer
1           7
1           8
2           9
3           4
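In Python, the two tables above can be held as a dictionary of tables, each one a list of row dictionaries keyed by column name. The sketch below only illustrates that layout, using the column names from the allowedTrailers schema (idCustomer, idTrailer); `is_table` and `trailers_for` are hypothetical helpers, not part of cornflow.

```python
# The example data as a dictionary of tables: each table is a list of rows,
# and each row is a dictionary keyed by column name.
data = {
    "customers": [
        {"idCustomer": 1, "Demand": 4},
        {"idCustomer": 2, "Demand": 5},
        {"idCustomer": 3, "Demand": 7},
    ],
    "allowedTrailers": [
        {"idCustomer": 1, "idTrailer": 7},
        {"idCustomer": 1, "idTrailer": 8},
        {"idCustomer": 2, "idTrailer": 9},
        {"idCustomer": 3, "idTrailer": 4},
    ],
}


def is_table(rows):
    """Return True if rows is a list of dictionaries (one dictionary per row)."""
    return isinstance(rows, list) and all(isinstance(row, dict) for row in rows)


def trailers_for(data, id_customer):
    """Hypothetical helper: trailers allowed for one customer, read from the flat table."""
    return [
        row["idTrailer"]
        for row in data["allowedTrailers"]
        if row["idCustomer"] == id_customer
    ]
```

Because allowedTrailers is a flat table, the one-to-many relation between customers and trailers is recovered by filtering rows (for example, `trailers_for(data, 1)` gives the trailers 7 and 8) rather than by storing a list in a cell.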

The schema will look like this:

{ "$schema": "http://json-schema.org/schema#",
 "type": "object",
 "properties": {
   "customers": {
     "type": "array",
     "items": {
       "type": "object",
       "properties": {
         "idCustomer": {
           "type": "integer"
         },
         "Demand": {
           "type": "integer"
         }
       },
       "required": [
         "idCustomer",
         "Demand"
       ]
     }
   },
   "allowedTrailers": {
     "type": "array",
     "items": {
       "type": "object",
       "properties": {
         "idCustomer": {
           "type": "integer"
         },
         "idTrailer": {
           "type": "integer"
         }
       },
       "required": [
         "idCustomer",
         "idTrailer"
       ]
     }
   }
 },
 "required": [
   "customers",
   "allowedTrailers"
 ]
}

This means that the input data must be an object containing two tables (customers and allowedTrailers), each represented as an array of objects. Three things are worth noting:

No nested list types

If your data previously looked like this:

idCustomer  Demand  idTrailer
1           4       [7, 8]
2           5       [9]
3           7       [4]

then it should be converted into the two tables shown before. The cells of a table must only contain single (scalar) values, never lists.
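This conversion can be automated. The following minimal sketch, assuming the nested rows are already loaded as Python dictionaries, moves the list-valued idTrailer column into a separate link table; `split_nested` and `has_only_scalars` are hypothetical helpers written for this example.

```python
# Nested layout (not allowed): the idTrailer cell holds a list.
nested = [
    {"idCustomer": 1, "Demand": 4, "idTrailer": [7, 8]},
    {"idCustomer": 2, "Demand": 5, "idTrailer": [9]},
    {"idCustomer": 3, "Demand": 7, "idTrailer": [4]},
]


def split_nested(rows, list_column, key_column):
    """Split rows with a list-valued column into a main table and a link table.

    The main table keeps every column except list_column; the link table
    gets one row per (key, list element) pair.
    """
    main = [{k: v for k, v in row.items() if k != list_column} for row in rows]
    link = [
        {key_column: row[key_column], list_column: value}
        for row in rows
        for value in row[list_column]
    ]
    return main, link


def has_only_scalars(rows):
    """True if no cell in the table holds a list or a dictionary."""
    return all(not isinstance(v, (list, dict)) for row in rows for v in row.values())


customers, allowed_trailers = split_nested(nested, "idTrailer", "idCustomer")
```

The resulting customers and allowed_trailers lists match the two tables at the start of this section.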

Required property

Every object in the schema must include a “required” property listing the fields that are needed to solve the problem.

In real problems, the core data is usually complemented by auxiliary information (such as names, comments or coordinates used for display). This information should not be listed in “required” because it is not needed to produce a solution, but it is still worth declaring in “properties” so that the schema can validate it.
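To see how “required” interacts with auxiliary fields, here is a small sketch with a hypothetical optional name column: a row without name passes, but a name of the wrong type is still reported. The `row_errors` helper only understands “required” and a few “type” values; it is an illustration, and a real application would use a full JSON Schema validator (for example the jsonschema Python package) instead.

```python
# Hypothetical row schema: "name" is auxiliary (display only), so it is
# declared in "properties" but left out of "required".
CUSTOMER_ROW = {
    "type": "object",
    "properties": {
        "idCustomer": {"type": "integer"},
        "Demand": {"type": "integer"},
        "name": {"type": "string"},
    },
    "required": ["idCustomer", "Demand"],
}

# Python types accepted for each JSON Schema type name (tiny subset).
PY_TYPES = {"integer": int, "number": (int, float), "string": str}


def row_errors(row, row_schema):
    """Return the problems found in one row: missing required keys and wrong types."""
    errors = [f"missing {key}" for key in row_schema["required"] if key not in row]
    for key, value in row.items():
        prop = row_schema["properties"].get(key)
        # bool is a subclass of int in Python, but not an integer in JSON Schema.
        if prop is not None and (
            isinstance(value, bool) or not isinstance(value, PY_TYPES[prop["type"]])
        ):
            errors.append(f"{key} should be {prop['type']}")
    return errors
```

For example, `row_errors({"idCustomer": 1, "Demand": 4}, CUSTOMER_ROW)` returns an empty list even though name is absent.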

Exception for “simple objects”

Even though most properties of our schema object must be arrays, an exception is made for problem parameters that are single (scalar) values and cannot be represented as tables. For instance, if our previous example had two parameters, trailersCapacity and timeHorizon, we would add a parameters property to our schema (the customers and allowedTrailers definitions are the same as before and are abbreviated as {} here):

{ "$schema": "http://json-schema.org/schema#",
 "type": "object",
 "properties": {
   "parameters": {
     "type": "object",
     "properties": {
       "trailersCapacity": {
         "type": "integer"
       },
       "timeHorizon": {
         "type": "integer"
       }
     },
     "required": ["trailersCapacity", "timeHorizon"]
   },
   "customers": {},
   "allowedTrailers": {}
 },
 "required": [
   "customers",
   "allowedTrailers",
   "parameters"
 ]
}
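An instance following this layout could look like the sketch below. The parameter values 20 and 48 are made up for illustration, and `layout_errors` is a hypothetical check of the rule just described: parameters is a flat object of scalars, and every other top-level property is a table.

```python
instance = {
    # Hypothetical parameter values, for illustration only.
    "parameters": {"trailersCapacity": 20, "timeHorizon": 48},
    "customers": [{"idCustomer": 1, "Demand": 4}],
    "allowedTrailers": [{"idCustomer": 1, "idTrailer": 7}],
}

SCALARS = (int, float, str, bool, type(None))


def layout_errors(data):
    """Check that "parameters" holds scalars and every other key holds a table."""
    errors = []
    for name, content in data.items():
        if name == "parameters":
            if not isinstance(content, dict) or not all(
                isinstance(v, SCALARS) for v in content.values()
            ):
                errors.append(f"{name} must be an object of scalar values")
        elif not (isinstance(content, list) and all(isinstance(r, dict) for r in content)):
            errors.append(f"{name} must be a list of objects")
    return errors
```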

Naming conventions

When naming columns in a “master table”, we refer to the unique id of each row as “id” (see the shifts property below). When an id is used as a foreign key in another table (see the resources_unavailable property), we use “id_shift” to indicate that it is the id of a shift:

{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
    "shifts": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "string"
                },
                "start_time": {
                    "type": "string"
                },
                "end_time": {
                    "type": "string"
                }
            },
            "required": [
                "id",
                "start_time",
                "end_time"
            ]
        }
    },
    "resources_unavailable": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "id_resource": {
                    "type": "string"
                },
                "id_shift": {
                    "type": "string"
                },
                "start_date": {
                    "type": "string"
                },
                "end_date": {
                    "type": "string"
                }
            },
            "required": [
                "id_resource",
                "id_shift",
                "start_date",
                "end_date"
            ]
        }
    }
},
"required": ["shifts", "resources_unavailable"]
}

As explained in the previous section, parameters that are single (scalar) values should go in an object called parameters.
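One benefit of this naming convention is that foreign keys can be checked mechanically. This sketch, with made-up rows, lists the rows of resources_unavailable whose id_shift does not match any id in shifts; `dangling_references` is a hypothetical helper, not a cornflow function.

```python
data = {
    # Hypothetical rows following the naming convention.
    "shifts": [
        {"id": "morning", "start_time": "06:00", "end_time": "14:00"},
        {"id": "night", "start_time": "22:00", "end_time": "06:00"},
    ],
    "resources_unavailable": [
        {"id_resource": "r1", "id_shift": "morning",
         "start_date": "2021-03-01", "end_date": "2021-03-05"},
        {"id_resource": "r2", "id_shift": "evening",
         "start_date": "2021-03-02", "end_date": "2021-03-02"},
    ],
}


def dangling_references(data, table, fk_column, master_table):
    """Rows of `table` whose `fk_column` does not match any "id" in `master_table`."""
    known_ids = {row["id"] for row in data[master_table]}
    return [row for row in data[table] if row[fk_column] not in known_ids]


# "id_shift" in resources_unavailable points to "id" in shifts.
bad_rows = dangling_references(data, "resources_unavailable", "id_shift", "shifts")
```

Here only the r2 row is reported, because no shift has the id "evening".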

Example with TSP

Let’s take the well-known traveling salesman problem (TSP) and write an instance, a solution and a configuration schema following these guidelines.

Instance schema

An instance of the TSP is a simple graph with a positive weight on each arc. We will represent the graph as a list of arcs:

{
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "properties": {
        "arcs": {
            "description": "Arc information between pairs of nodes",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "n1": {"type": "integer"},
                    "n2": {"type": "integer"},
                    "w": {"type": "number"}
                },
                "required": ["n1", "n2", "w"]
            }
        }
    },
    "required": ["arcs"]
}

We use n1 and n2 for the first and second node of each arc, and w for the weight of the arc.

An example input dataset that follows this schema is the following:

{
    "arcs": [
        {"n1": 0, "n2": 0, "w": 0},
        {"n1": 0, "n2": 1, "w": 633},
        {"n1": 0, "n2": 2, "w": 257},
        {"n1": 0, "n2": 3, "w": 91},
        {"n1": 0, "n2": 4, "w": 412}
    ]
}

Solution schema

A solution to a TSP is the sequence in which the nodes are visited. An ordered array of nodes would be enough, but following the guidelines above each table must be an array of objects, so we also add a property with the position of each node in the sequence:

{
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "properties": {
        "route": {
            "description": "Order of nodes in each route",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "node": {"type": "integer"},
                    "pos": {"type": "integer"}
                },
                "required": ["pos","node"]
            }
        }
    },
    "required": ["route"]
}

node is the id of each visited node, and pos is its position in the sequence.

An example output dataset that follows this schema is the following:

{
    "route": [
        {"pos": 0,"node": 0},
        {"pos": 1,"node": 4},
        {"pos": 2,"node": 2},
        {"pos": 3,"node": 3},
        {"pos": 4,"node": 1}
    ]
}
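Since the route is an array of objects, code that reads a solution should not rely on the array order; it can sort by pos instead. The sketch below does that and computes the tour length from the instance arcs. It assumes the arcs are symmetric (the weight of (a, b) may be stored as (b, a)), which the schema itself does not state, and it uses a small made-up instance because the example instance above does not list every arc the example route needs.

```python
def route_order(solution):
    """Node ids sorted by their "pos" value; rows may come in any order."""
    return [row["node"] for row in sorted(solution["route"], key=lambda row: row["pos"])]


def tour_length(instance, solution):
    """Sum of arc weights along the route, closing the loop back to the first node.

    Assumption: arcs are symmetric. Explicit arcs always win; the reverse
    direction is only used as a fallback when (a, b) is not listed.
    """
    weights = {}
    for arc in instance["arcs"]:
        weights[(arc["n1"], arc["n2"])] = arc["w"]
        weights.setdefault((arc["n2"], arc["n1"]), arc["w"])
    order = route_order(solution)
    legs = zip(order, order[1:] + order[:1])
    return sum(weights[(a, b)] for a, b in legs)


# Hypothetical three-node instance and an unsorted solution.
small_instance = {"arcs": [
    {"n1": 0, "n2": 1, "w": 5},
    {"n1": 1, "n2": 2, "w": 7},
    {"n1": 0, "n2": 2, "w": 4},
]}
small_solution = {"route": [
    {"pos": 1, "node": 1},
    {"pos": 0, "node": 0},
    {"pos": 2, "node": 2},
]}
```

With this data, `route_order(small_solution)` gives [0, 1, 2] and `tour_length(small_instance, small_solution)` adds the legs 0-1, 1-2 and 2-0.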

Configuration schema

The configuration will depend on the application. We usually have some default configuration tailored to MIP problems. Here is a minimalistic proposal:

{
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "properties": {
        "timeLimit": {"type": "number"},
        "seed": {"type": "integer"},
        "gapRel": {"type": "number"},
        "solver": {
            "type": "string",
            "enum": ["naive"],
            "default": "naive"
        }
    }
}

timeLimit constrains the time the solution method can run. gapRel sets a tolerance measured as a relative gap (with respect to the best possible solution). seed makes the solution method deterministic. The solver property is mandatory for all solution methods and should always have this format (a string with an “enum” attribute).
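In JSON Schema, “default” is only an annotation: validating a configuration does not fill in missing values. The following sketch, using a Python copy of the configuration schema above (with JSON Schema's number type for the real-valued options), shows one way an application could apply top-level defaults and check the solver against its enum. `with_defaults` and `solver_is_valid` are hypothetical helpers.

```python
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "timeLimit": {"type": "number"},
        "seed": {"type": "integer"},
        "gapRel": {"type": "number"},
        "solver": {"type": "string", "enum": ["naive"], "default": "naive"},
    },
}


def with_defaults(config, schema):
    """Return a copy of config with missing top-level "default" values filled in."""
    result = dict(config)
    for name, prop in schema["properties"].items():
        if name not in result and "default" in prop:
            result[name] = prop["default"]
    return result


def solver_is_valid(config, schema):
    """True if the configured solver is one of the values listed in its enum."""
    return config.get("solver") in schema["properties"]["solver"]["enum"]
```

For example, `with_defaults({"timeLimit": 60}, CONFIG_SCHEMA)` adds "solver": "naive" and leaves the other options untouched.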

Generating jsonschema from data

Describing the instance data by hand in json-schema format is usually tedious. At the same time, an example instance is usually already available in some format (XML, Excel, CSV, custom file).

In this case, the first method to implement in the Instance class is usually a reader along the lines of from_xml, which creates an Instance from that file.

If the second and third methods are to_dict and from_dict, it is already possible to generate a json-schema that matches the data: both InstanceCore and SolutionCore have a method called generate_schema that does just that, returning a schema compatible with the current object.

Example code that does this is available in the https://github.com/baobabsoluciones/hackathonbaobab2021 project. Here is an excerpt:

import json
from hackathonbaobab2021.core import Instance
import os


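def generate_schema():
    # Example instance file shipped with the project.
    path = os.path.join(os.path.dirname(__file__), "../data/ITC2021_Test1.xml")
    # Read the instance with the problem-specific reader.
    instance = Instance.from_xml(path)
    # Build a schema compatible with the current object (method from InstanceCore).
    schema = instance.generate_schema()
    # Write the schema next to the XML file, as ITC2021_Test1.xml.json.
    with open(path + ".json", "w") as f:
        json.dump(schema, f, indent=4, sort_keys=True)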


if __name__ == "__main__":
    generate_schema()
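To give an idea of what schema generation involves, here is a simplified, hypothetical `infer_schema` function that builds a schema from a dictionary of tables plus an optional parameters object. It is not the implementation of generate_schema in InstanceCore: it takes each column's type from the first value it sees and marks as required only the columns present in every row.

```python
JSON_TYPES = [(bool, "boolean"), (int, "integer"), (float, "number"), (str, "string")]


def json_type(value):
    """Map a Python cell value to a JSON Schema type name (bool is checked before int)."""
    for py_type, name in JSON_TYPES:
        if isinstance(value, py_type):
            return name
    raise TypeError(f"unsupported cell value: {value!r}")


def infer_schema(data):
    """Build a schema from a dict of tables (lists of rows) and scalar objects."""
    properties = {}
    for name, content in data.items():
        if isinstance(content, list):
            # A table: collect every column seen in any row, keeping the first type.
            columns = {}
            for row in content:
                for column, value in row.items():
                    columns.setdefault(column, {"type": json_type(value)})
            # Only columns present in every row are marked as required.
            required = [c for c in columns if all(c in row for row in content)]
            properties[name] = {
                "type": "array",
                "items": {"type": "object", "properties": columns, "required": required},
            }
        else:
            # A "simple object" such as parameters: one scalar per key.
            properties[name] = {
                "type": "object",
                "properties": {key: {"type": json_type(v)} for key, v in content.items()},
                "required": list(content),
            }
    return {
        "$schema": "http://json-schema.org/schema#",
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }
```

The result can be saved the same way as in the excerpt, for example with `json.dump(infer_schema(data), f, indent=4)`, and then refined by hand (descriptions, optional fields, limits).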

Json-schema validations

Data is validated against the json-schemas by the parent classes (InstanceCore, SolutionCore, ApplicationCore), usually right before solving a problem (e.g., see solve()). In any case, the app user can take advantage of the schemas to validate the input or output at any point by calling check_schema() on the instance or on the solution.

Other jsonschema properties

To display more information about our data structure in the cornflow-app user interface, we can use some other properties of the jsonschema.

The main properties used are title, description and $comment, along with the others listed below:

  • We use the title property to set up a meaningful name for the table columns on the data tables.

  • We use the description property to give a description of the table or fields of the data tables.

  • Both of these properties can also be an object indexed by language key (en or es, for example) to internationalize the user interface.

  • We use the show property to indicate if the table or column has to be seen in the user interface.

  • We use the filterable property to indicate if the table or the column can be filtered down on the user interface.

  • We use the sortable property to indicate if the table or column can be sorted on the user interface.

  • We use the format property to give additional information about the type of a property, mainly to mark date fields so that they are treated as dates for validation.

  • We use the minimum and maximum properties on integer or number fields to set limits on the data. These limits are also used by the forms in the user interface.
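The following sketch shows column definitions that use these keywords, together with two hypothetical helpers: `localized` picks a title or description in a given language, falling back to English, and `in_range` applies minimum and maximum. The column contents and limits are made up, and how cornflow-app actually reads these keywords is not shown here.

```python
# Hypothetical column definitions using the UI keywords described above.
DEMAND_COLUMN = {
    "type": "integer",
    "title": {"en": "Demand", "es": "Demanda"},
    "description": {"en": "Demand of the customer", "es": "Demanda del cliente"},
    "show": True,
    "filterable": True,
    "sortable": True,
    "minimum": 0,
    "maximum": 1000,  # made-up upper limit
}

START_DATE_COLUMN = {
    "type": "string",
    "format": "date",  # tells the UI and validators to treat the value as a date
    "title": "Start date",
    "show": True,
}


def localized(prop_schema, key, lang, fallback="en"):
    """Return title/description in lang; plain strings are returned unchanged."""
    value = prop_schema.get(key)
    if isinstance(value, dict):
        return value.get(lang, value.get(fallback))
    return value


def in_range(value, prop_schema):
    """Check a numeric value against optional minimum and maximum keywords."""
    if "minimum" in prop_schema and value < prop_schema["minimum"]:
        return False
    if "maximum" in prop_schema and value > prop_schema["maximum"]:
        return False
    return True
```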