It's not you; it's us. GEMD is not a simple representation, and we frequently get questions about how to use it to represent certain kinds of data. This page attempts to answer some of those questions.

How do you pronounce GEMD?

This has been a matter of some debate. Both /jemd/ and /jem-dee/ are in common use.

I'm following the documentation, but still having validation problems when using the Citrine Platform.

The implementation of GEMD on the current Citrine Platform is not 100% as described here. Check out Known Limitations or contact your Citrine representative for help.

Where do I store statistics about my measurements?

Let's say that you have data for the same kind of measurement being repeated multiple times. For example, you take a sample from your material, subdivide it into 8 sub-samples, and perform the same measurement on each of them. The measurement that you are performing can be represented by a single MeasurementSpec, and the measurement of each sub-sample by its own MeasurementRun associated with that spec. Each MeasurementRun is associated with the MaterialRun representing the material that you sampled and subdivided.

For each property in the MeasurementRun objects, you may want to compute some statistics of the 8-sample distribution, e.g., the mean and the standard deviation of each property and/or its minimum and maximum values. In the future, these statistics will be automatically computed on the Citrine Platform. For now, you can compute them yourself and record them in another MeasurementRun object distinct from the 8 samples. If there are multiple properties and/or multiple statistics, they should all go in the same MeasurementRun. The statistics should be Property attributes with their origin field set to computed.
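As a rough illustration of this bookkeeping, the sketch below computes the statistics with Python's standard library and collects them in one summary record. The `Property` and `MeasurementRun` dataclasses are simplified, hypothetical stand-ins and not the classes of any GEMD library; the property name "hardness" and the readings are invented for the example. Attaching the summary to the same material as the 8 runs is also an assumption.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import List


@dataclass
class Property:
    """Hypothetical stand-in for a GEMD Property attribute."""
    name: str
    value: float
    origin: str = "measured"


@dataclass
class MeasurementRun:
    """Hypothetical stand-in for a GEMD MeasurementRun."""
    name: str
    material: str
    properties: List[Property] = field(default_factory=list)


# Eight runs of the same measurement, one per sub-sample (made-up values).
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7]
sample_runs = [
    MeasurementRun(f"hardness test {i + 1}", "sampled material",
                   [Property("hardness", value)])
    for i, value in enumerate(readings)
]


def summarize(runs: List[MeasurementRun], prop_name: str,
              material: str) -> MeasurementRun:
    """Put every statistic for one property into a single, separate run."""
    values = [p.value for run in runs for p in run.properties
              if p.name == prop_name]
    stats = {
        "mean": mean(values),
        "standard deviation": stdev(values),  # sample standard deviation
        "minimum": min(values),
        "maximum": max(values),
    }
    return MeasurementRun(
        name=f"{prop_name} summary statistics",
        material=material,
        # Each statistic is a property whose origin is marked as computed.
        properties=[Property(f"{prop_name} {label}", stat, origin="computed")
                    for label, stat in stats.items()],
    )


summary_run = summarize(sample_runs, "hardness", "sampled material")
```

The summary run is a ninth record alongside the 8 sample runs, so the measured values and the computed statistics stay distinguishable by their origin.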

How do I represent repeated applications of the same process?

Consider a situation in which a process is repeatedly applied to a material but does not substantially change the material, e.g., repeated application of heat-treatment to harden a ceramic or multiple coats of paint. Because material histories are chronological, each application must be a new process and must produce a new output material. However, the materials/processes may reference the same templates if the same attributes are relevant for each iteration. It is strongly recommended that each such process contain an attribute describing the order in which it was performed (e.g., heat-treatment number 1, 2, 3, etc.). This additional attribute allows these otherwise identical sequential processes to be told apart. The additional attribute should have an Attribute Template, but there is no requirement for it to be included in the Object Template as well.
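A minimal sketch of such a chain follows, using plain dataclasses as hypothetical stand-ins rather than real GEMD classes. The names `Material`, `Process`, `apply_repeatedly`, and the parameter name "heat-treatment number" are assumptions made for illustration, and ingredients are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Material:
    """Hypothetical stand-in for a material object."""
    name: str
    template: str
    produced_by: Optional["Process"] = None


@dataclass
class Process:
    """Hypothetical stand-in for a process object."""
    name: str
    template: str
    input_material: Material
    parameters: Dict[str, int] = field(default_factory=dict)


def apply_repeatedly(start: Material, times: int,
                     process_template: str) -> Tuple[Material, List[Process]]:
    """Create one new process and one new output material per application."""
    history: List[Process] = []
    current = start
    for number in range(1, times + 1):
        step = Process(
            name=f"heat treatment {number}",
            template=process_template,  # same template for every iteration
            input_material=current,
            # Ordering attribute that tells the applications apart.
            parameters={"heat-treatment number": number},
        )
        current = Material(
            name=f"{start.name} after heat treatment {number}",
            template=start.template,
            produced_by=step,
        )
        history.append(step)
    return current, history


green_body = Material("ceramic part", "ceramic template")
hardened, steps = apply_repeatedly(green_body, 3, "heat treatment template")
```

Walking back from `hardened` through `produced_by` and `input_material` recovers the three applications in order, and the ordering attribute identifies each one without relying on its name.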

How do I represent repeated uses of the same material?

It is perfectly fine for the same material to be used in multiple processes throughout a material history, but a new ingredient must be created for each use. Ingredients annotate the use of a specific material in a specific process, recording how much was used and labeling the role of the material. Process templates should be used to describe what types of ingredients are expected in a process.

The same material can even be used multiple times in one process. Consider the following example: a thin film is created by thermally evaporating three materials in succession onto a substrate. The thermal evaporation process template specifies the allowed names as "layer 1," "layer 2," and "layer 3." We have several materials to choose from for the three layers, but in one instance we wish to evaporate a layer of material "A," then a layer of "B," then a final layer of "A." We create a Process Spec linked to the thermal evaporation process template and create three Ingredient Specs, each of which points to the Process Spec as its process. One Ingredient Spec points to "A" as its material and has layer 1 as its name. One points to "B" as its material and has layer 2 as its name. The final Ingredient Spec also points to "A" as its material but is differentiated because it has layer 3 as its name.
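The thin-film example can be sketched as follows. The dataclasses are hypothetical stand-ins for the template, Process Spec, and Ingredient Spec and do not reproduce any GEMD library API; the field name `allowed_names` and the name check in `__post_init__` are assumptions about how the template's constraint might be modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProcessTemplate:
    """Hypothetical stand-in: lists the ingredient names a process allows."""
    name: str
    allowed_names: Tuple[str, ...]


@dataclass(frozen=True)
class ProcessSpec:
    """Hypothetical stand-in for a Process Spec linked to a template."""
    name: str
    template: ProcessTemplate


@dataclass(frozen=True)
class IngredientSpec:
    """Hypothetical stand-in: one use of one material in one process."""
    name: str
    material: str
    process: ProcessSpec

    def __post_init__(self) -> None:
        # Reject names that the process template does not allow.
        if self.name not in self.process.template.allowed_names:
            raise ValueError(f"{self.name!r} is not an allowed ingredient name")


evaporation_template = ProcessTemplate(
    "thermal evaporation", ("layer 1", "layer 2", "layer 3"))
evaporation_spec = ProcessSpec("A/B/A thin film", evaporation_template)

# Material "A" is used twice; the ingredient name distinguishes the two uses.
ingredients = [
    IngredientSpec(name, material, evaporation_spec)
    for name, material in [("layer 1", "A"), ("layer 2", "B"), ("layer 3", "A")]
]
```

Because each use of a material is its own ingredient, the two uses of "A" remain separate records even though they point to the same material and the same process.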

What is the difference between description and notes?

description is a field on both Attribute Templates and Object Templates. It describes the type of data that a template is intended to constrain; in other words, it documents the template's intended use. We strongly encourage documenting all templates, both because they play an important role in constraining data and communicating structure to analysis algorithms and because a template is likely to be reused by multiple users.

notes are associated with Attributes and Objects. This is a place to put pieces of information that may be important to understanding this particular piece of data but do not naturally fit in the other fields. This might include an annotation about something unusual about this particular sample. Notes normally contain information that is useful for a human but would not be useful in training a machine learning model.

I found a reference to something called "taurus" - what is that?

"Taurus" was the codename for GEMD when we first started work. We're in the process of migrating everything over to GEMD, but names can be surprisingly persistent.