2. GEMD Data Model

2.1. Creating Data Model Objects

Each data object and template in the GEMD (Graphical Expression of Materials Data) data model has a corresponding resource in the Citrine Python client. For example, the ProcessSpec class implements the ProcessSpec object in GEMD. The Citrine Python client implementations are consistent with the GEMD model specification.

The Citrine Python client is built on top of and entirely interoperable with the gemd-python package. Any method that accepts the Citrine Python client’s implementations of data model objects should also accept those from GEMD.

2.2. Identifying Data Model Objects

After registering a data model object, you will probably want to find it again. The easiest way to retrieve an existing object is by one of its unique identifiers. Every data model object on the Citrine Platform has a platform-issued identifier, often referred to as the “Citrine Identifier” or “CitrineId”. These identifiers are version 4 UUIDs (UUID4). They are effectively guaranteed to be unique, but they are not especially human-readable.

Alternative identifiers are an easier way to recall data objects. To create one, add a key-value pair to the uids dictionary of the data model object: the key is the scope and the value is the id. Each scope/id pair must be unique across the entire platform. You can think of each scope as defining a namespace, with the id being the name within that namespace.
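To make the scope/id rule concrete, here is a minimal, self-contained sketch of a namespace index that enforces the uniqueness constraint described above. `UidRegistry` is a hypothetical class written for illustration only; it is not part of the Citrine client, and on the real platform the check happens when objects are registered.

```python
from typing import Dict, Tuple


class UidRegistry:
    """Hypothetical in-memory index of (scope, id) pairs; not a Citrine client class."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], object] = {}

    def add(self, obj: object, uids: Dict[str, str]) -> None:
        # Check every pair before writing anything, so a conflict leaves the index unchanged.
        for scope, id_ in uids.items():
            owner = self._index.get((scope, id_))
            if owner is not None and owner is not obj:
                raise ValueError(f"{scope}/{id_} already identifies another object")
        for scope, id_ in uids.items():
            self._index[(scope, id_)] = obj

    def get(self, scope: str, id_: str) -> object:
        # Each scope acts as a namespace; the id is the name within it.
        return self._index[(scope, id_)]


registry = UidRegistry()
spec = object()  # stand-in for a data model object
registry.add(spec, {"lab-notebook": "batch-42", "supplier": "A-17"})
```

The same id may appear under different scopes; only a repeated scope/id pair is rejected.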

2.3. Registering Data Model Objects

Data model objects are created on the Citrine Platform through the register method of the data model object collections that belong to a dataset. For example:

dataset.process_specs.register(ProcessSpec(...))

Equivalent behavior is available through the type-agnostic gemd collection and directly from a dataset object:

dataset.gemd.register(ProcessSpec(...))
dataset.register(ProcessSpec(...))

Registration must be performed within the scope of a dataset, namely the dataset into which the objects are written. The data model object collections defined at the project scope (such as project.process_specs) are read-only, and calling their register method raises an error.

If you are registering several objects at once, use the register_all method, which is available on the same objects and takes a list of objects:

dataset.process_specs.register_all([ProcessSpec(...), ProcessRun(...)])
dataset.gemd.register_all([ProcessSpec(...), ProcessRun(...)])
dataset.register_all([ProcessSpec(...), ProcessRun(...)])

register and register_all work with any data model object type. register_all sorts the objects so that interdependent models (e.g., a whole material history) can be passed in a single call. GEMD objects (e.g., GEMD's ProcessSpec) can be registered exactly like the objects defined in the Citrine Python client.
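The ordering that register_all performs can be pictured as a topological sort of the reference graph: an object is written only after everything it links to. The sketch below illustrates that idea with Python's standard graphlib module (Python 3.9+) over object names; it is an assumption-based illustration, not the client's actual implementation.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def registration_order(links: Dict[str, Set[str]]) -> List[str]:
    """Order objects so every object comes after the objects it links to.

    `links` maps each object's name to the names of the objects it references.
    Raises graphlib.CycleError if the references form a cycle.
    """
    return list(TopologicalSorter(links).static_order())


# A small material history: runs link to specs, materials link to their processes.
links = {
    "material_spec": {"process_spec"},
    "process_run": {"process_spec"},
    "material_run": {"material_spec", "process_run"},
}
order = registration_order(links)
```

Objects that are only referenced (here, process_spec) are included in the order even though they have no entry of their own in `links`.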

2.4. Finding Data Model Objects

If you know any of a data model object’s unique identifiers, you can get that object by that identifier. For example:

project.process_templates.get(LinkByUID(scope="standard-templates", id="milling"))

If you know the Citrine Identifier, you do not need to specify a scope:

project.process_templates.get(CitrineID)
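Both calls come down to a scope/id lookup. The sketch below shows one way to normalize either form of identifier into a scope/id pair. It assumes, as the examples later in this section suggest (where the platform identifier is read from uids["id"]), that the Citrine Identifier is stored under the scope "id". `Link` and `as_link` are hypothetical names used only for illustration; `Link` stands in for GEMD's LinkByUID.

```python
import uuid
from typing import NamedTuple, Union


class Link(NamedTuple):
    """Hypothetical stand-in for GEMD's LinkByUID: a (scope, id) pair."""
    scope: str
    id: str


def as_link(identifier: Union[Link, uuid.UUID, str]) -> Link:
    """Return a scope/id pair; a bare UUID is treated as a Citrine Identifier."""
    if isinstance(identifier, Link):
        return identifier
    # uuid.UUID raises ValueError if the string is not a valid UUID.
    return Link(scope="id", id=str(uuid.UUID(str(identifier))))


named = as_link(Link("standard-templates", "milling"))
```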

If you don’t know any of the data model object’s unique identifiers, then you can list the data model objects and find your object in that list:

project.process_templates.list()

To restrict the results to a single dataset, list through that dataset’s collection instead:

dataset.process_templates.list()

The list_by_tag(), list_by_attribute_bounds(), and list_by_name() methods can help refine the listing to make the target object easier to find.
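Conceptually, each of these methods narrows the listing with a predicate. The sketch below shows that idea on plain in-memory records; `Record`, `by_tag`, and `by_value_bounds` are hypothetical, and the real client methods query the platform rather than filtering a local Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a listed data model object."""
    name: str
    tags: List[str] = field(default_factory=list)
    values: Dict[str, float] = field(default_factory=dict)


def by_tag(records: List[Record], tag: str) -> List[Record]:
    # Keep records carrying the exact tag string.
    return [r for r in records if tag in r.tags]


def by_value_bounds(records: List[Record], attribute: str,
                    lower: float, upper: float) -> List[Record]:
    # Keep records that have `attribute` with a value inside [lower, upper].
    return [r for r in records
            if attribute in r.values and lower <= r.values[attribute] <= upper]


records = [
    Record("a", ["mill::ball"], {"temperature": 300.0}),
    Record("b", [], {"temperature": 500.0}),
    Record("c", ["mill::ball"]),
]
```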

Several methods locate data objects by their references to other objects:

Runs may be listed by spec with MaterialRunCollection.list_by_spec(), IngredientRunCollection.list_by_spec(), MeasurementRunCollection.list_by_spec(), and ProcessRunCollection.list_by_spec().

Specs may be listed by template with MaterialSpecCollection.list_by_template(), ProcessSpecCollection.list_by_template(), and MeasurementSpecCollection.list_by_template().

The output material for a process can be located with MaterialRunCollection.get_by_process() or MaterialSpecCollection.get_by_process().

The ingredients in which a material is used can be located with IngredientRunCollection.list_by_material() or IngredientSpecCollection.list_by_material().

The measurements of a material can be located with MeasurementRunCollection.list_by_material().
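Each of these methods answers a “which objects point at this one?” question, which is the reverse of following a link. The sketch below shows that inversion over a plain dictionary, assuming each object holds a single reference; the function and object names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def index_by_reference(references: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert an object -> referenced-object mapping into referenced-object -> [objects].

    `references` maps each object's name to the name of the object it links to,
    e.g. each run to its spec.
    """
    inverted: Dict[str, List[str]] = defaultdict(list)
    for obj, target in references.items():
        inverted[target].append(obj)
    return dict(inverted)


run_to_spec = {"run-1": "milling", "run-2": "milling", "run-3": "sintering"}
runs_by_spec = index_by_reference(run_to_spec)
```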

2.5. Updating Data Model Objects

Runs and specs can be modified in place and persisted with register or register_all, but templates require more care. Changing a template’s bounds or its allowed names/labels could invalidate existing data objects, so existing objects on the platform must be checked against the desired change. If an update cannot invalidate any data (e.g., changing an object’s name), the template can be updated in the same way as runs and specs.

If such a risk exists (e.g., making bounds more restrictive), register and register_all raise exceptions. To attempt such a template update, use update() instead; if the update is invalid, the reasons for the failure are logged.
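The distinction above comes down to whether a change can only widen what the template allows. The sketch below uses plain integer pairs as hypothetical stand-ins for a template’s bounds to show why widening is always safe while narrowing requires checking stored values. It illustrates the reasoning only and is not how the platform implements update().

```python
from typing import Iterable, List, Tuple

Bounds = Tuple[int, int]  # (lower, upper), both inclusive


def is_widening(old: Bounds, new: Bounds) -> bool:
    """True if every value allowed by `old` is still allowed by `new`."""
    return new[0] <= old[0] and new[1] >= old[1]


def invalidated_values(values: Iterable[int], new: Bounds) -> List[int]:
    """Existing values that would fall outside the proposed bounds."""
    return [v for v in values if not new[0] <= v <= new[1]]


stored = [1, 3, 5]  # values already recorded against a template with bounds (1, 5)
print(is_widening((1, 5), (0, 10)))        # True: no stored data needs checking
print(invalidated_values(stored, (2, 4)))  # [1, 5]: narrowing would invalidate these
```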

2.6. Referencing Data Model Objects

Many data model objects contain links to other data model objects. For example, a MaterialSpec references the ProcessSpec that produced it. These links are created with the LinkByUID class, for example:

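from gemd.entity.link_by_uid import LinkByUID
from gemd.entity.object import MaterialSpec, ProcessSpec

# `dataset` is an existing dataset from the Citrine client
process = ProcessSpec("my process", uids={"my namespace": "my process"})
dataset.process_specs.register(process)
# The link's scope and id must match an entry in process.uids exactly
link = LinkByUID(scope="my namespace", id="my process")
material = MaterialSpec("my material", process=link)
dataset.material_specs.register(material)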

LinkByUIDs can also be useful for retrieving referenced objects:

template = dataset.gemd.get(process_spec.template.to_link())

2.7. Material History

Starting with a specific terminal MaterialRun, you can retrieve the complete material history – every process, ingredient, and material that contributed to the target material, as well as the measurements that were performed on all of those materials. The method is get_history(), and it requires you to know a unique identifier (scope/id pair) for the material.
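Conceptually, the history is everything reachable by walking backward from the terminal material. The breadth-first sketch below runs over a hypothetical adjacency dictionary whose node names and shape are illustrative only; get_history() returns the actual data objects, whereas this only shows which nodes belong to a history.

```python
from collections import deque
from typing import Dict, List, Set


def history(contributors: Dict[str, List[str]], terminal: str) -> Set[str]:
    """Collect everything reachable upstream of `terminal`, breadth-first.

    `contributors` maps each node to the nodes that fed into it or describe it.
    """
    seen = {terminal}
    queue = deque([terminal])
    while queue:
        node = queue.popleft()
        for parent in contributors.get(node, []):
            if parent not in seen:  # skip ancestors already reached by another path
                seen.add(parent)
                queue.append(parent)
    return seen


graph = {
    "material: cake": ["process: baking"],
    "process: baking": ["ingredient: batter"],
    "ingredient: batter": ["material: batter"],
    "material: batter": ["process: mixing", "measurement: viscosity"],
}
```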

2.8. Validating Data Model Objects

2.8.1. Dry-Run Validation

If registering or deleting a data model object would violate a validation rule, the operation fails with an error message that specifies which rule(s) were violated. For example, the following attempts to delete a process spec that a registered process run still references:

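from gemd.entity.object import ProcessRun, ProcessSpec

spec = ProcessSpec("foo")
run = ProcessRun("bar", spec=spec)

spec = dataset.process_specs.register(spec)
run = dataset.process_runs.register(run)

# Fails: the registered run still references this spec
dataset.process_specs.delete(spec.uids["id"])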

yields

ERROR:citrine._session:400 DELETE projects/$PROJECT_ID/datasets/$DATASET_ID/process-specs/id/$PROCESS_SPEC_ID
ERROR:citrine._session:{"code":400,"message":"object $PROCESS_SPEC_ID in dataset $DATASET_ID not deleted. See ValidationErrors for details.","validation_errors":[{"failure_message":"Referenced by process_run in dataset $DATASET_ID with ID $PROCESS_RUN_ID","failure_id":"object.mutation.referenced"}]}

To run the same validations without actually registering or deleting the object, pass dry_run=True to the register or delete method. In the example above, this looks like:

dataset.process_specs.delete(spec.uids["id"], dry_run=True)

With dry_run=True, register and delete run all of their validations. If any validation fails, the method produces the same error it would produce without dry_run. If all validations succeed, it returns the same success value it would return without dry_run, but the object is neither registered nor deleted.

Setting dry_run=False is equivalent to not specifying dry_run at all; it is the default behavior.
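The contract described above (run identical validations, then skip only the final write) can be sketched with a toy in-memory store. The `register` function, the `has_name` validator, and `ValidationError` here are hypothetical illustrations, not the client’s code.

```python
from typing import Callable, Dict, List

Validator = Callable[[dict], List[str]]


class ValidationError(Exception):
    """Hypothetical error carrying the list of failure messages."""


def register(store: Dict[str, dict], obj: dict, validators: List[Validator],
             dry_run: bool = False) -> dict:
    # Every validation runs regardless of dry_run, so failures are identical.
    failures = [msg for check in validators for msg in check(obj)]
    if failures:
        raise ValidationError(failures)
    if not dry_run:
        store[obj["id"]] = obj  # only a real run persists the object
    return obj  # same success value with or without dry_run


def has_name(obj: dict) -> List[str]:
    return [] if obj.get("name") else ["name is required"]


store: Dict[str, dict] = {}
candidate = {"id": "a", "name": "foo"}
result = register(store, candidate, [has_name], dry_run=True)  # validated, not stored
```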

2.8.2. Template and Simple Validations

Sometimes it is convenient to validate a group of runs and/or specs against their attribute and object templates before any of the data objects are stored. The .validate_templates() methods, available for all runs and specs, validate the provided object against every (already-stored) attribute template linked to an attribute on the object, and optionally against an object template. These methods do not validate linked objects in any way, which makes it possible to validate an object that links to objects that have not been stored yet. The flip side is that .validate_templates() will not surface any link-based errors. The method returns a list of validation errors, which is empty if validation succeeds.

The examples below illustrate the usage of .validate_templates() and its expected return values.

Example with validation errors and no object template:

from gemd.entity.attribute import Condition, Parameter
from gemd.entity.object import ProcessRun, ProcessSpec
from gemd.entity.value import UniformInteger

# Two conditions and two parameters deliberately share a name
condition1 = Condition('condition_name', value=UniformInteger(1, 2))
condition2 = Condition('condition_name', value=UniformInteger(1, 3))
parameter1 = Parameter('parameter_name', value=UniformInteger(1, 4))
parameter2 = Parameter('parameter_name', value=UniformInteger(1, 5))
process_spec = ProcessSpec(name='spec name')
process_run = ProcessRun(
    name='run name',
    spec=process_spec,
    conditions=[condition1, condition2],
    parameters=[parameter1, parameter2]
)
dataset.process_runs.validate_templates(process_run)

has return value:

[{'failure_message': 'Multiple Condition with named condition_name', 'property': None, 'failure_id': 'attribute.duplicate', 'input': None, 'type': NotImplemented},
 {'failure_message': 'Multiple Parameter with named parameter_name', 'property': None, 'failure_id': 'attribute.duplicate', 'input': None, 'type': NotImplemented}]
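The duplicate-name check behind these two errors can be approximated locally. This sketch assumes, as the example suggests, that two attributes of the same type may not share a name on one object. The function is hypothetical: its failure_id mirrors the output above, but its message wording is illustrative, and it returns an empty list when there are no duplicates, matching the success convention of .validate_templates().

```python
from collections import Counter
from typing import Dict, List, Tuple


def duplicate_attribute_errors(attributes: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Report each (attribute type, name) pair that occurs more than once."""
    counts = Counter(attributes)
    return [
        {"failure_message": f"Multiple {kind} named {name}",
         "failure_id": "attribute.duplicate"}
        for (kind, name), n in counts.items()
        if n > 1
    ]


attrs = [("Condition", "condition_name"), ("Condition", "condition_name"),
         ("Parameter", "parameter_name"), ("Parameter", "parameter_name")]
errors = duplicate_attribute_errors(attrs)
```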

Example with validation errors and an object template:

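from gemd.entity.attribute import Condition
from gemd.entity.bounds import IntegerBounds
from gemd.entity.link_by_uid import LinkByUID
from gemd.entity.object import ProcessRun, ProcessSpec
from gemd.entity.template import ConditionTemplate, ProcessTemplate
from gemd.entity.value import UniformInteger

condition_template = ConditionTemplate("condition template", bounds=IntegerBounds(1, 5))
condition_template = dataset.condition_templates.register(condition_template)

condition = Condition("condition", value=UniformInteger(1, 3), template=condition_template)
# The process template narrows the condition's bounds from (1, 5) to (2, 4)
process_template = ProcessTemplate(
    "pt",
    conditions=[[LinkByUID("id", condition_template.uids["id"]), IntegerBounds(2, 4)]]
)
process_spec = ProcessSpec("ps", template=process_template)
process_run = ProcessRun("pr", conditions=[condition], spec=process_spec)
dataset.process_runs.validate_templates(process_run, object_template=process_template)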

has return value:

[{'failure_message': 'UniformInteger(1,3) extends below 2 {2}', 'property': None, 'failure_id': 'attribute.bounds.value', 'input': None, 'type': NotImplemented}]
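The error arises because the process template narrows the condition’s bounds from (1, 5) to (2, 4), and the entire range of UniformInteger(1, 3) must fit inside them. The hypothetical sketch below reproduces that range check with plain integers in place of the GEMD value and bounds classes; the “extends above” wording is an illustrative assumption.

```python
from typing import List


def integer_range_errors(lower: int, upper: int,
                         bound_lower: int, bound_upper: int) -> List[str]:
    """Check a uniform integer range [lower, upper] against inclusive bounds."""
    errors = []
    if lower < bound_lower:
        errors.append(f"extends below {bound_lower}")
    if upper > bound_upper:
        errors.append(f"extends above {bound_upper}")
    return errors


# UniformInteger(1, 3) fits the condition template's bounds (1, 5) ...
assert integer_range_errors(1, 3, 1, 5) == []
# ... but not the narrower bounds (2, 4) imposed by the process template.
print(integer_range_errors(1, 3, 2, 4))  # ['extends below 2']
```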

For ingredients, the associated object template is a process template, which is passed through the separate ingredient_process_template parameter. Any value passed to the object_template parameter for an ingredient is ignored.

Example with validation errors for an ingredient:

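from gemd.entity.object import IngredientSpec, MaterialSpec, ProcessSpec
from gemd.entity.template import ProcessTemplate

process_template = ProcessTemplate("pt", allowed_names=["foo"], allowed_labels=["bar"])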
process_spec = ProcessSpec("ps", template=process_template)

mat_process_spec = ProcessSpec("mps")
material_spec = MaterialSpec("ms", process=mat_process_spec)

ingredient_spec = IngredientSpec(
    "is",
    process=process_spec,
    material=material_spec,
    labels=["ingredient"]
)
dataset.ingredient_specs.validate_templates(
    model=ingredient_spec,
    ingredient_process_template=process_template
)

has return value:

[{'failure_message': 'Ingredient label ingredient not in list of allowed labels Set(bar)', 'property': None, 'failure_id': 'ingredient.label.allowed', 'input': None, 'type': NotImplemented},
 {'failure_message': 'Ingredient name is not in list of allowed names Set(foo)', 'property': None, 'failure_id': 'ingredient.name.allowed', 'input': None, 'type': NotImplemented}]
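Both failures come from comparing the ingredient against the process template’s allowed names and labels. The hypothetical sketch below reproduces that comparison, returning only failure ids. It assumes that `None` means the template places no restriction, which is an assumption rather than documented behavior.

```python
from typing import List, Optional, Sequence


def ingredient_errors(name: str, labels: Sequence[str],
                      allowed_names: Optional[Sequence[str]],
                      allowed_labels: Optional[Sequence[str]]) -> List[str]:
    """Return failure ids for an ingredient checked against its process template."""
    errors = []
    if allowed_labels is not None:
        # One failure per label that the template does not allow.
        errors.extend("ingredient.label.allowed"
                      for label in labels if label not in allowed_labels)
    if allowed_names is not None and name not in allowed_names:
        errors.append("ingredient.name.allowed")
    return errors


# Mirrors the example above: name "is" and label "ingredient" are both disallowed.
print(ingredient_errors("is", ["ingredient"], ["foo"], ["bar"]))
```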