2. Serialization (with Graphs!)

GEMD objects link together to form graphs with directional edges. For example, a MaterialRun links back to the ProcessRun that produced it. Some of these links are bi-directional. For example, a ProcessRun also links forward to the MaterialRun that it produces, if there is one. Other links are uni-directional. For example, a MaterialRun links to its MaterialSpec but that MaterialSpec doesn’t link back. Uni-directional links are typically used when the multiplicity of a relationship can be large. For example, a material may be referenced in thousands of ingredients.

In GEMD, bi-directional links are readable from both sides, but only one side is writable. For example, a MeasurementRun can set the MaterialRun it was performed on through its material field, but a MaterialRun cannot directly set its own measurements field. Instead, when a MeasurementRun’s material field is set, that MaterialRun has the MeasurementRun appended to its measurements field.
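The pattern above can be sketched in plain Python. The MaterialRun and MeasurementRun classes below are hypothetical toys, not the gemd classes. They assume a private list behind a read-only property, which is one way to make a link readable from both sides but writable from only one.

```python
from typing import List, Optional, Tuple


class MaterialRun:
    """Hypothetical toy, not gemd's MaterialRun."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._measurements: List["MeasurementRun"] = []

    @property
    def measurements(self) -> Tuple["MeasurementRun", ...]:
        # Readable side: an immutable snapshot, and no setter is defined.
        return tuple(self._measurements)


class MeasurementRun:
    """Hypothetical toy, not gemd's MeasurementRun."""

    def __init__(self, name: str, material: Optional[MaterialRun] = None) -> None:
        self.name = name
        self._material: Optional[MaterialRun] = None
        self.material = material  # go through the setter so both sides agree

    @property
    def material(self) -> Optional[MaterialRun]:
        return self._material

    @material.setter
    def material(self, value: Optional[MaterialRun]) -> None:
        # Writable side: detach from the old material, then attach to the new one.
        if self._material is not None:
            self._material._measurements.remove(self)
        self._material = value
        if value is not None:
            value._measurements.append(self)


sample = MaterialRun("sample")
hardness = MeasurementRun("hardness", material=sample)
print([m.name for m in sample.measurements])  # prints ['hardness']
```

Assigning to sample.measurements raises AttributeError because the property has no setter, so the only way to change the link is from the MeasurementRun side.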

This linking structure presents several challenges for serialization and deserialization:

  • The graph cannot be traversed through uni-directional links in the wrong direction.

  • Only the writable side of bi-directional links can be persisted.

  • Objects that are referenced by multiple objects must be deserialized to the same object.
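The last challenge, and a related problem with bi-directional links, can be seen with Python's standard json module alone. In the sketch below, plain dicts stand in for entities (an assumption for illustration only): a round trip turns one shared object into two equal but distinct copies, and a bi-directional link, being a reference cycle, is rejected outright by the default encoder.

```python
import json

# Plain dicts stand in for GEMD entities in this illustration.
spec = {"name": "shared spec"}
runs = [{"name": "run 1", "spec": spec}, {"name": "run 2", "spec": spec}]
assert runs[0]["spec"] is runs[1]["spec"]  # one shared object in memory

restored = json.loads(json.dumps(runs))
assert restored[0]["spec"] == restored[1]["spec"]      # equal content...
assert restored[0]["spec"] is not restored[1]["spec"]  # ...but two separate copies

# Persisting both sides of a bi-directional link creates a reference cycle.
material = {"name": "sample", "measurements": []}
measurement = {"name": "hardness", "material": material}
material["measurements"].append(measurement)
try:
    json.dumps(material)
except ValueError as err:
    print(err)  # prints "Circular reference detected"
```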

These challenges are addressed by a custom JSON serialization procedure and the special LinkByUID class.

  1. Each entity that doesn’t already have at least one unique identifier is assigned a unique identifier so it can be referenced.

  2. The graph is flattened by traversing it from the original object while keeping track of already-seen entities and replacing each reference to another entity with a LinkByUID object. This produces the set of all reachable entities.

  3. The entities are sorted into a special “writable” order in which every link target appears before any entity that links to it, so that link targets already exist when deserializing.

  4. This sorted list of entities is assigned to the “context” field in the serialization output.

  5. The original object (which may contain multiple entities) is assigned to the “object” field in the serialization output.

  6. The resulting structure is encoded with a custom JSONEncoder, GEMDEncoder, that skips the soft (non-writable) side of bi-directional links.
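The steps above can be sketched in a few dozen lines of plain Python. This is not the GEMDJson implementation: Entity, link_by_uid, and serialize are hypothetical names, entities are assumed to store only the writable side of their links (so step 6 reduces to an ordinary json.dumps), and those writable links are assumed to contain no cycles. A depth-first post-order traversal is used here as one way to produce a valid writable order.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(eq=False)  # eq=False keeps identity-based hashing, like graph nodes
class Entity:
    """Hypothetical stand-in for a GEMD entity (not the gemd classes)."""
    type: str
    name: str
    links: Dict[str, "Entity"] = field(default_factory=dict)  # writable side only
    uids: Dict[str, str] = field(default_factory=dict)


def link_by_uid(entity: Entity) -> Dict[str, str]:
    # Same shape as the LinkByUID objects in the example output.
    scope, uid = next(iter(entity.uids.items()))
    return {"id": uid, "scope": scope, "type": "link_by_uid"}


def serialize(obj: Entity) -> str:
    order: List[Entity] = []
    seen: Set[Entity] = set()

    def visit(entity: Entity) -> None:
        if entity in seen:  # shared targets are emitted only once
            return
        seen.add(entity)
        if not entity.uids:  # step 1: make sure the entity can be referenced
            entity.uids["auto"] = str(uuid.uuid4())
        for target in entity.links.values():  # step 2: flatten the graph
            visit(target)
        order.append(entity)  # step 3: post-order puts link targets first

    visit(obj)
    context = [
        {"type": e.type, "name": e.name, "uids": dict(e.uids),
         **{key: link_by_uid(t) for key, t in e.links.items()}}
        for e in order
    ]
    # Steps 4 and 5; with only writable links stored, step 6 is plain json.dumps.
    return json.dumps({"context": context, "object": link_by_uid(obj)},
                      indent=2, sort_keys=True)


process = Entity("process_spec", "producing process")
material = Entity("material_spec", "Produced material", links={"process": process})
print(serialize(material))
```

The printed document has the same overall shape as the serialized example in this section: the process_spec is listed first in "context", and "object" is a link to the material_spec.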

Here’s an example of the serialized output for a MaterialSpec and the ProcessSpec that produces it:

{
  "context": [
    {
      "conditions": [],
      "file_links": [],
      "name": "producing process",
      "notes": null,
      "parameters": [],
      "tags": [],
      "template": null,
      "type": "process_spec",
      "uids": {
        "auto": "a103b759-b3e9-472e-8ec1-c69ee5d1981a"
      }
    },
    {
      "file_links": [],
      "name": "Produced material",
      "notes": null,
      "process": {
        "id": "a103b759-b3e9-472e-8ec1-c69ee5d1981a",
        "scope": "auto",
        "type": "link_by_uid"
      },
      "properties": [],
      "tags": [],
      "template": null,
      "type": "material_spec",
      "uids": {
        "auto": "ad2c31ab-e8c0-40f1-a1b6-c5b5950026cd"
      }
    }
  ],
  "object": {
    "id": "ad2c31ab-e8c0-40f1-a1b6-c5b5950026cd",
    "scope": "auto",
    "type": "link_by_uid"
  }
}
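The writable order from step 3 is visible in this example: the process_spec comes first because the material_spec links to it. A small checker makes the property concrete. iter_links and is_writable_order are hypothetical helpers written for this illustration, and they assume a link is any nested object whose "type" is "link_by_uid", as in the output above.

```python
from typing import Any, Iterator, Set, Tuple


def iter_links(node: Any) -> Iterator[dict]:
    """Yield every link_by_uid object nested anywhere inside node."""
    if isinstance(node, dict):
        if node.get("type") == "link_by_uid":
            yield node
        else:
            for value in node.values():
                yield from iter_links(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_links(item)


def is_writable_order(doc: dict) -> bool:
    """True if every link in doc["context"] targets an earlier entry."""
    known: Set[Tuple[str, str]] = set()
    for entity in doc["context"]:
        if any((link["scope"], link["id"]) not in known for link in iter_links(entity)):
            return False
        # Register this entity under each of its (scope, id) pairs.
        known.update(entity["uids"].items())
    return True
```

Loading the example with json.loads and passing the result to is_writable_order returns True; swapping the two entries of "context" makes it return False, because the material_spec would then link to an entity that has not been seen yet.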

Deserialization is a comparatively simple two-step process. First, the string or file is parsed with Python’s built-in json module and a custom object hook. This hook does three things: it knows how to build GEMD entities and other DictSerializable objects, it maintains an index of the unique identifiers of the GEMD entities it has seen so far, and it replaces any LinkByUID it encounters with the matching object from that index. Second, the "object" item of the resulting dictionary is returned.
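The object-hook step can be sketched with the standard json module. This is a simplified stand-in, not GEMDJson: deserialize is a hypothetical name, and the hook builds no GEMD classes; it only indexes plain dicts that carry a "uids" field and swaps link_by_uid objects for the indexed dicts. It relies on two facts: json calls object_hook on each object as soon as that object is fully decoded (innermost first, in document order), and "context" precedes "object" in the text, so every link target is already indexed when a link to it is decoded.

```python
import json
from typing import Any, Dict, Tuple


def deserialize(text: str) -> Any:
    # Entities seen so far, keyed by (scope, id).
    index: Dict[Tuple[str, str], dict] = {}

    def hook(obj: dict) -> Any:
        if obj.get("type") == "link_by_uid":
            # Replace the link with the entity already decoded for that uid,
            # so every reference resolves to the same Python object.
            return index[(obj["scope"], obj["id"])]
        if isinstance(obj.get("uids"), dict):
            # Index the entity under each of its unique identifiers.
            for scope, uid in obj["uids"].items():
                index[(scope, uid)] = obj
        return obj

    return json.loads(text, object_hook=hook)["object"]
```

Applied to the serialized example above, this returns the material_spec dict, whose "process" entry is the process_spec dict itself rather than a link.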

This strategy is implemented in the GEMDJson class and exposed through the gemd.json module, which mirrors the familiar interface of Python’s json module.