citrine.resources.data_objects module

Top-level classes for all data objects (i.e., specs and runs) and collections thereof.

class citrine.resources.data_objects.DataObject

Bases: DataConcepts, BaseObject, ABC

An abstract data object.

DataObject must be extended along with Resource.

add_uid(scope: str, uid: str)

Add a uid.

Parameters:

scope (str) – scope of the uid
uid (str) – Unique identifier

all_dependencies() → Set[BaseEntityType | LinkByUIDType]: Return a set of all immediate dependencies (no recursion).

as_dict() → Dict[str, Any]

Convert the object to a dictionary.

Returns: A dictionary representation of the object, where the keys are its fields.
Return type: dict

classmethod build(data: dict) → SelfType: Build the underlying type.

dump() → Dict[str, Any]

Convert the object to a JSON dictionary, so that every entry is serialized.

Uses the json encoder client, so objects with uids are converted to LinkByUID dictionaries.

Returns: A dictionary representation of the object in which every entry is serialized.
Return type: dict

classmethod from_dict(d: Mapping[str, Any]) → DictSerializableType

Reconstitute the object from a dictionary.

Parameters: d (dict) – The object as a dictionary of key-value pairs that correspond to the object’s fields.
Returns: The deserialized object.
Return type: DictSerializable

classmethod get_collection_type(data) → type[DataConceptsCollection]

Determine the associated collection type for a serialized data object.

The data dictionary must have a ‘type’ key whose value corresponds to the individual key of one of the collections that extends DataConceptsCollection.

Parameters: data (dict) – A dictionary corresponding to a serialized data concepts object of unknown type. This method will also work if data is a deserialized GEMD object.
Returns: The collection type corresponding to data.
Return type: collection

classmethod get_type(data) → type[Serializable]

Determine the class of a serialized object.

The data dictionary must have a ‘type’ key whose value corresponds to the response key of one of the classes that extends DataConcepts.

Parameters: data (dict) – A dictionary corresponding to a serialized data concepts object of unknown type. This method will also work if data is a deserialized GEMD object.
Returns: The class corresponding to data.
Return type: class
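
The lookup described for get_type() amounts to a dispatch on the ‘type’ key. The following is a minimal sketch under stated assumptions, not the library's implementation: the registry, the two stub classes, the type-key strings, and the helper resolve_type are all hypothetical names introduced only for illustration.

```python
from typing import Any, Dict, Mapping, Type


class ProcessSpecStub:
    """Hypothetical stand-in for a class that extends DataConcepts."""
    typ = "process_spec"


class ProcessRunStub:
    """Hypothetical stand-in for a class that extends DataConcepts."""
    typ = "process_run"


# Hypothetical registry from type key to class.
REGISTRY: Dict[str, Type] = {cls.typ: cls for cls in (ProcessSpecStub, ProcessRunStub)}


def resolve_type(data: Any) -> Type:
    """Return the registered class for a serialized dict or a deserialized object."""
    if isinstance(data, Mapping):
        if "type" not in data:
            raise ValueError("serialized data must have a 'type' key")
        key = data["type"]
    else:
        # Assumption: deserialized objects expose their type key as `typ`.
        key = data.typ
    try:
        return REGISTRY[key]
    except KeyError:
        raise ValueError(f"unrecognized type key: {key!r}") from None


resolved = resolve_type({"type": "process_run", "name": "anneal"})
```

Both a serialized dictionary and an already-built object resolve to the same class in this sketch, which mirrors the note that the method also works on deserialized objects.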

to_link(scope: str | None = None, *, allow_fallback: bool = False) → LinkByUIDType

Generate a LinkByUID (gemd.entity.link_by_uid.LinkByUID) for this object.

Parameters:

scope (str, optional) – scope of the uid to get
allow_fallback (bool) – whether to grab another scope/id if chosen scope is missing (Default: False).

Return type:

LinkByUID
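
The interaction between scope and allow_fallback can be sketched as follows. This is an illustration under assumptions rather than the library's code: pick_link is a hypothetical helper, uids are modeled as a plain dictionary from scope to id, and the choice to use the first available pair when no scope is given is an assumption.

```python
from typing import Dict, Optional, Tuple


def pick_link(uids: Dict[str, str], scope: Optional[str] = None,
              *, allow_fallback: bool = False) -> Tuple[str, str]:
    """Choose the (scope, id) pair a link would carry (illustrative only)."""
    if not uids:
        raise ValueError("object has no uids to link by")
    if scope is not None and scope in uids:
        return scope, uids[scope]
    if scope is None or allow_fallback:
        # Assumption: with no requested scope, or with a missing scope and
        # fallback allowed, the first available scope/id pair is used.
        return next(iter(uids.items()))
    raise ValueError(f"no uid with scope {scope!r}")


uids = {"id": "abc", "lab": "42"}
chosen = pick_link(uids, "lab")
fallback = pick_link(uids, "missing", allow_fallback=True)
```

With allow_fallback left at its default of False, asking for a scope the object does not have is an error in this sketch instead of silently producing a link in a different scope.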

property audit_info: AuditInfo | None: Get the audit info object.

collection_dict = {}

Dictionary mapping each type key to the associated collection, for every class that extends DataConcepts.

Only populated if the get_collection_type() method is invoked.

Type: dict[str, class]

property dataset: UUID | None: Get the dataset of this object, if it was returned by the backend.

file_links

property name: str: Name of the object.

notes = None

skip = {}

typ = None

property uid: UUID | None: Get the Citrine Identifier (scope = “id”), or None if not registered.

class citrine.resources.data_objects.DataObjectCollection(*, session: Session, team_id: UUID, dataset_id: UUID | None = None)

Bases: DataConceptsCollection[DataObjectResourceType], ABC

A collection of one kind of data object.

async_update(model: ResourceType, *, dry_run: bool = False, wait_for_response: bool = True, timeout: float = 120, polling_delay: float = 1.0, return_model: bool = False) → UUID | ResourceType | None

Update a particular element of the collection with data validation.

Update a particular element of the collection, doing a deeper check to ensure that the dependent data objects still fall within the (potentially) changed constraints introduced by this update. This allows you to change bounds and allowed names/labels on templates.

Parameters:

model (ResourceType) – The DataConcepts object.
dry_run (bool) – Whether to actually update the item or run a dry run of the update operation. Dry run is intended to be used for validation. Default: false
wait_for_response (bool) – Whether to poll for the eventual response. This changes the return type (see below).
timeout (float) – How long to poll for the result before giving up. This is expressed in (fractional) seconds.
polling_delay (float) – How long to delay between each polling retry attempt.
return_model (bool) – Whether or not to return an updated version of the resource. If wait_for_response is False, then this argument has no effect.

Returns:

If wait_for_response is True, this call polls the backend, waiting for the eventual job result. On successful validation/update, None is returned unless return_model is True, in which case the updated resource is fetched and returned. If validating or processing the update fails, an exception (JobFailureError) is raised and an error message is logged with the underlying reason for the failure.

If wait_for_response is False, a job ID (of type UUID) is returned, which can be used to poll for job completion and the result with the poll_async_update_job() method.

Return type:

UUID | ResourceType | None

build(data: dict) → ResourceType

Build an object of type ResourceType from a serialized dictionary.

This is an internal method, and should not be called directly by users.

Parameters: data (dict) – A serialized data model object.
Returns: A data model object built from the dictionary.
Return type: ResourceType

delete(uid: UUID | str | LinkByUID | BaseEntity, *, dry_run: bool = False)

Delete an element of the collection by its id.

Parameters:

uid (UUID | str | LinkByUID | BaseEntity) – A representation of the object (Citrine id, LinkByUID, or the object itself)
dry_run (bool) – Whether to actually delete the item or run a dry run of the delete operation. Dry run is intended to be used for validation. Default: false

get(uid: UUID | str | LinkByUID | BaseEntity) → ResourceType

Get an element of the collection by its id.

Parameters: uid (UUID | str | LinkByUID | BaseEntity) – A representation of the object (Citrine id, LinkByUID, or the object itself)
Returns: An object with the specified scope and uid
Return type: ResourceType

abstract classmethod get_type() → type[Serializable]: Return the resource type in the collection.

list(*, per_page: int | None = 100, forward: bool = True) → Iterator[ResourceType]

Get all visible elements of the collection.

The order of results should not be relied upon, but for now they are sorted by dataset, object type, and creation time (in that order of priority).

Parameters:

per_page (int, optional) – Max number of results to return per page. It is very unlikely that setting this parameter to something other than the default is useful. It exists for rare situations where the client is bandwidth constrained or experiencing latency from large payload sizes.
forward (bool) – Set to False to reverse the order of results (i.e., return in descending order)

Returns:

Every object in this collection.

Return type:

Iterator[ResourceType]

list_by_attribute_bounds(attribute_bounds: dict[AttributeTemplate | LinkByUID, BaseBounds], *, forward: bool = True, per_page: int = 100) → Iterator[DataObject]

Get all objects in the collection with attributes within certain bounds.

Results are ordered first by dataset, then by attribute value.

Currently, only one attribute with a single bounds on that attribute is supported, and the attribute type must be numeric.

Parameters:

attribute_bounds (dict[AttributeTemplate | LinkByUID, BaseBounds]) – A dictionary from attributes to the bounds on that attribute. Currently only real and integer bounds are supported. Each attribute may be represented as an AttributeTemplate (PropertyTemplate, ParameterTemplate, or ConditionTemplate) or as a LinkByUID, but in either case there must be a uid and it must correspond to an AttributeTemplate that exists in the database. Only the uid is passed, so if you would like to update an attribute template you must register that change to the database before you can use it to filter.
forward (bool) – Set to False to reverse the order of results (i.e., return in descending order).
per_page (int) – Controls the number of results fetched with each HTTP request to the backend. Typically, this is set to a sensible default and should not be modified. Consider modifying this value only if you find this method is unacceptably slow.

Returns:

Every object in this collection whose attribute value falls within the specified bounds.

Return type:

Iterator[DataObject]
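
The single-attribute restriction and the bounds test can be pictured with a small local filter. This is only a sketch under assumptions, since the real filtering happens on the backend: RealBoundsStub and filter_by_bounds are hypothetical names, records are modeled as (name, values) pairs keyed by template uid, and inclusive endpoints are assumed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class RealBoundsStub:
    """Hypothetical stand-in for numeric bounds (assumed inclusive on both ends)."""
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper


def filter_by_bounds(records: Iterable[Tuple[str, Dict[str, float]]],
                     attribute_bounds: Dict[str, RealBoundsStub]) -> Iterator[str]:
    """Yield names of records whose value for the one attribute lies within bounds."""
    if len(attribute_bounds) != 1:
        raise ValueError("exactly one attribute with one bounds is supported")
    # Unpack the single (template uid, bounds) entry.
    (template_uid, bounds), = attribute_bounds.items()
    for name, values in records:
        value = values.get(template_uid)
        if value is not None and bounds.contains(value):
            yield name


records = [
    ("a", {"density": 1.0}),
    ("b", {"density": 5.0}),
    ("c", {"hardness": 2.0}),
]
matched = list(filter_by_bounds(records, {"density": RealBoundsStub(0.0, 2.0)}))
```

Keying the dictionary by template uid reflects the note that only the uid of the attribute template is passed to the backend.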

list_by_name(name: str, *, exact: bool = False, forward: bool = True, per_page: int = 100) → Iterator[ResourceType]

Get all objects with specified name in this dataset.

Parameters:

name (str) – case-insensitive object name prefix to search.
exact (bool) – Set to True to change prefix search to exact search (but still case-insensitive). Default is False.
forward (bool) – Set to False to reverse the order of results (i.e., return in descending order).
per_page (int) – Controls the number of results fetched with each HTTP request to the backend. Typically, this is set to a sensible default and should not be modified. Consider modifying this value only if you find this method is unacceptably slow.

Returns:

Every object in this collection whose name matches the search term.

Return type:

Iterator[ResourceType]
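
The difference between the default prefix search and exact=True can be sketched locally. The helper name_matches is hypothetical, and using str.casefold() for case-insensitivity is an assumption; the backend's actual comparison rules are not stated here.

```python
def name_matches(query: str, name: str, *, exact: bool = False) -> bool:
    """Case-insensitive prefix match by default; whole-name match if exact."""
    q, n = query.casefold(), name.casefold()
    return n == q if exact else n.startswith(q)


prefix_hit = name_matches("ste", "Steel 304")
exact_miss = name_matches("steel", "Steel 304", exact=True)
exact_hit = name_matches("STEEL 304", "steel 304", exact=True)
```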

list_by_tag(tag: str, *, per_page: int = 100) → Iterator[ResourceType]

Get all objects bearing a tag prefixed with tag in the collection.

The order of results is largely not meaningful. Results from the same dataset will be grouped together but no other meaningful ordering can be relied upon. Duplication in the result set may (but needn’t) occur when one object has multiple tags matching the search tag. For this reason, it is inadvisable to put 2 tags with the same prefix (e.g., ‘foo::bar’ and ‘foo::baz’) in the same object when it can be avoided.

Parameters:

tag (str) – The prefix with which to search. Must fully match up to the first delimiter (e.g., ‘foo’ and ‘foo::b’ both match ‘foo::bar’, but ‘fo’ is insufficient).
per_page (int) – Controls the number of results fetched with each HTTP request to the backend. Typically, this is set to a sensible default and should not be modified. Consider modifying this value only if you find this method is unacceptably slow.

Returns:

Every object in this collection bearing a tag that matches the search prefix.

Return type:

Iterator[ResourceType]
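
The prefix rule and the possible duplication in results can be sketched as below. This is one reading of the rule under assumptions, not the backend's implementation: tag_matches and dedupe are hypothetical helpers, and ‘::’ is taken as the delimiter based on the examples in the text.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")

DELIMITER = "::"  # assumed from the 'foo::bar' examples


def tag_matches(search: str, tag: str) -> bool:
    """True if `search` is a prefix of `tag` that covers the whole first component."""
    first_component = tag.split(DELIMITER, 1)[0]
    return tag.startswith(search) and len(search) >= len(first_component)


def dedupe(items: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Drop later items whose key was already seen, preserving order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# One object ("u1") carries two tags sharing a prefix, so it appears twice.
hits = [("u1", "foo::bar"), ("u1", "foo::baz"), ("u2", "foo::bar")]
unique_hits = list(dedupe(hits, key=lambda hit: hit[0]))
```

Deduplicating on the client by uid, as in the last lines, is one way to cope when tags with a shared prefix on the same object cannot be avoided.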

poll_async_update_job(job_id: UUID, *, timeout: float = 120, polling_delay: float = 1.0) → None

Poll for the result of the async_update call.

This call will poll the backend given the Job ID that came from a call to async_update(), waiting for the eventual job result. In the case of successful validation/update, a return value of None is provided which indicates success. In the case of a failure validating or processing the update, an exception (JobFailureError) is raised and an error message is logged with the underlying reason of the failure.

Parameters:

job_id (UUID) – The job ID for the asynchronous update job we wish to poll.
timeout (float) – How long to poll for the result before giving up. This is expressed in (fractional) seconds.
polling_delay (float) – How long to delay between each polling retry attempt.

Returns:

This method will raise an appropriate exception if the job failed, else it will return None to indicate the job was successful.

Return type:

None
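
The roles of timeout and polling_delay can be illustrated with a generic polling loop. This is a sketch under assumptions and not the library's implementation: poll_until_done, the JobFailure class (a stand-in for JobFailureError), the status strings, and the use of TimeoutError on expiry are all hypothetical.

```python
import time
from typing import Callable


class JobFailure(Exception):
    """Hypothetical stand-in for the library's JobFailureError."""


def poll_until_done(get_status: Callable[[], str], *, timeout: float = 120,
                    polling_delay: float = 1.0) -> None:
    """Poll `get_status` until it reports success or failure, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "Success":
            return None  # None signals success, as in the description above
        if status == "Failure":
            raise JobFailure("job reported failure")
        # Give up if the next attempt would land past the deadline.
        if time.monotonic() + polling_delay > deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        time.sleep(polling_delay)


statuses = iter(["Running", "Running", "Success"])
result = poll_until_done(lambda: next(statuses), timeout=1.0, polling_delay=0.01)
```

A shorter polling_delay detects completion sooner at the cost of more requests, while timeout caps the total wait regardless of how many attempts fit inside it.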

register(model: ResourceType, *, dry_run=False)

Create a new element of the collection or update an existing element.

If the input model has an ID that corresponds to an existing object in the database, then that object will be updated. Otherwise a new object will be created.

Only the top-level object in model itself is written to the database with this method. References to other objects are persisted as links, and the object returned by this method has all instances of data objects replaced by instances of LinkByUID. Registering an object which references other objects does NOT implicitly register those other objects. Rather, those other objects’ values are ignored, and the pre-existence of objects with their IDs is asserted before attempting to write model.

Parameters:

model (ResourceType) – The DataConcepts object.
dry_run (bool) – Whether to actually register the item or run a dry run of the register operation. Dry run is intended to be used for validation. Default: false

Returns:

A copy of the registered object as it now exists in the database.

Return type:

ResourceType

register_all(models: Iterable[ResourceType], *, dry_run: bool = False, status_bar: bool = False, include_nested: bool = False) → List[ResourceType]

Register multiple GEMD objects to each of their appropriate collections.

Does so in an order that is guaranteed to store all linked items before the item that references them.

If the GEMD objects have no UIDs, Citrine IDs will be assigned to them prior to passing them on to the server. This is required as otherwise there is no way to determine how objects are related to each other. When the registered objects are returned from the server, the input GEMD objects will be updated with whichever uids & _citr_auto:: tags are on the returned objects. This means GEMD objects that already exist on the server will be updated with all their on-platform uids and tags.

This method has the same behavior as register, except that no models will be written if any one of them is invalid. Using this method should yield significant improvements to write speed over separate calls to register.

Parameters:

models (Iterable[DataConcepts]) – The data model objects to register. Can be different types.
dry_run (bool) – Whether to actually register the objects or run a dry run of the register operation. Dry run is intended to be used for validation. Default: false
status_bar (bool) – Whether to display a status bar using the tqdm module to track progress in registration. Requires installing the optional tqdm module. Default: false
include_nested (bool) – Whether to just register the objects passed in the list, or include nested objects (e.g., obj.process, obj.spec.template, …). Default: false

Returns:

Each object model as it now exists in the database.

Return type:

list[DataConcepts]
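
The ordering guarantee, that every linked item is stored before the item referencing it, is a topological ordering of the reference graph. The sketch below shows one standard way to compute such an order with the standard library; it is not claimed to be how register_all does it. The helper registration_order and the example object names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def registration_order(references: Dict[str, Set[str]]) -> List[str]:
    """Order names so that each one appears after everything it references."""
    # TopologicalSorter takes a mapping from node to its predecessors,
    # and static_order() yields predecessors before the nodes that need them.
    return list(TopologicalSorter(references).static_order())


# Each key references the objects in its value set.
references = {
    "process_run": {"process_spec"},
    "process_spec": {"process_template"},
    "process_template": set(),
}
order = registration_order(references)
```

In this chain the template comes first, then the spec that links to it, then the run that links to the spec.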

update(model: ResourceType) → ResourceType

Update a data object model.

Update a particular element of the collection, first attempting a simple update using register and falling back to async_update if the changes require it (e.g., updating template bounds).

Parameters: model (ResourceType) – The DataConcepts object.

validate_templates(*, model: DataObjectResourceType, object_template: ObjectTemplateResourceType | None = None, ingredient_process_template: ProcessTemplate | None = None) → list[ValidationError]

Validate a data object against its templates.

Validates against provided object templates (passed in as parameters) and stored attribute templates linked on the data object.

Parameters:

model (DataObjectResourceType) – the data object to validate
object_template (ObjectTemplateResourceType, optional) – object template to validate against
ingredient_process_template (ProcessTemplate, optional) – process template to validate the ingredient against. Ignored unless the data object is an IngredientSpec or IngredientRun.

Returns:

list[ValidationError] of validation errors encountered. Empty if successful.