citrine.resources.data_concepts module

Top-level class for all data concepts objects and collections thereof.

class citrine.resources.data_concepts.DataConcepts

Bases: PolymorphicSerializable[DataConcepts], BaseEntity

An abstract data concepts object.

DataConcepts must be extended along with Resource.

Parameters:

typ (str) – A string denoting what type of DataConcepts class a particular instantiation is.

add_uid(scope: str, uid: str)

Add a uid.

Parameters:
  • scope (str) – scope of the uid

  • uid (str) – Unique identifier

all_dependencies() Set[BaseEntityType | LinkByUIDType]

Return a set of all immediate dependencies (no recursion).

as_dict() Dict[str, Any]

Convert the object to a dictionary.

Returns:

A dictionary representation of the object, where the keys are its fields.

Return type:

dict

classmethod build(data: dict) SelfType

Build the underlying type.

dump() Dict[str, Any]

Convert the object to a JSON dictionary, so that every entry is serialized.

Uses the json encoder client, so objects with uids are converted to LinkByUID dictionaries.

Returns:

A string representation of the object as a dictionary.

Return type:

str

classmethod from_dict(d: Mapping[str, Any]) DictSerializableType

Reconstitute the object from a dictionary.

Parameters:

d (dict) – The object as a dictionary of key-value pairs that correspond to the object’s fields.

Returns:

The deserialized object.

Return type:

DictSerializable

classmethod get_collection_type(data) Type[DataConceptsCollection]

Determine the associated collection type for a serialized data object.

The data dictionary must have a ‘type’ key whose value corresponds to the individual key of one of the collections that extends DataConceptsCollection.

Parameters:

data (dict) – A dictionary corresponding to a serialized data concepts object of unknown type. This method will also work if data is a deserialized GEMD object.

Returns:

The collection type corresponding to data.

Return type:

collection

classmethod get_type(data) Type[Serializable]

Determine the class of a serialized object.

The data dictionary must have a ‘type’ key whose value corresponds to the response key of one of the classes that extends DataConcepts.

Parameters:

data (dict) – A dictionary corresponding to a serialized data concepts object of unknown type. This method will also work if data is a deserialized GEMD object.

Returns:

The class corresponding to data.

Return type:

class

Generate a ~gemd.entity.link_by_uid.LinkByUID for this object.

Parameters:
  • scope (str, optional) – scope of the uid to get

  • allow_fallback (bool) – whether to grab another scope/id if chosen scope is missing (Default: False).

Return type:

LinkByUID

property audit_info: AuditInfo | None

Get the audit info object.

collection_dict = {}

dictionary from the type key to the associated collection for every class that extends DataConcepts.

Only populated if the get_collection_type() method is invoked.

Type:

Dict[str, class]

property dataset: UUID | None

Get the dataset of this object, if it was returned by the backend.

skip = {}
tags
typ = None
property uid: UUID | None

Get the Citrine Identifier (scope = “id”), or None if not registered.

uids
class citrine.resources.data_concepts.DataConceptsCollection(project_id: UUID, dataset_id: UUID, session: Session)

Bases: Collection[ResourceType], ABC

A collection of one kind of data concepts object.

Parameters:
  • project_id (UUID) – The uid of the project that this collection belongs to.

  • dataset_id (UUID) – The uid of the dataset that this collection belongs to. If None then the collection ranges over all datasets in the project. Note that this is only allowed for certain actions. For example, you can use list_by_tag() to search over all datasets, but when using register() to upload or update an object, a dataset must be specified.

  • session (Session) – The Citrine session used to connect to the database.

async_update(model: ResourceType, *, dry_run: bool = False, wait_for_response: bool = True, timeout: float = 120, polling_delay: float = 1.0, return_model: bool = False) UUID | ResourceType | None

Update a particular element of the collection with data validation.

Update a particular element of the collection, doing a deeper check to ensure that the dependent data objects are still with the (potentially) changed constraints of this change. This will allow you to make bounds and allowed named/labels changes to templates.

Parameters:
  • model (ResourceType) – The DataConcepts object.

  • dry_run (bool) – Whether to actually update the item or run a dry run of the update operation. Dry run is intended to be used for validation. Default: false

  • wait_for_response (bool) – Whether to poll for the eventual response. This changes the return type (see below).

  • timeout (float) – How long to poll for the result before giving up. This is expressed in (fractional) seconds.

  • polling_delay (float) – How long to delay between each polling retry attempt.

  • return_model (bool) – Whether or not to return an updated version of the resource If wait_for_response is False, then this argument has no effect

Returns:

If wait_for_response if True, then this call will poll the backend, waiting for the eventual job result. In the case of successful validation/update, a return value of None is provided unless return_model is True, in which case the updated resource is fetched and returned. In the case of a failure validating or processing the update, an exception (JobFailureError) is raised and an error message is logged with the underlying reason of the failure.

If wait_for_response if False, A job ID (of type UUID) is returned that one can use to poll for the job completion and result with the poll_async_update_job() method.

Return type:

Optional[UUID]

build(data: dict) ResourceType

Build an object of type ResourceType from a serialized dictionary.

This is an internal method, and should not be called directly by users.

Parameters:

data (dict) – A serialized data model object.

Returns:

A data model object built from the dictionary.

Return type:

ResourceType

delete(uid: UUID | str | LinkByUID | BaseEntity, *, dry_run: bool = False)

Delete an element of the collection by its id.

Parameters:
  • uid (Union[UUID, str, LinkByUID, BaseEntity]) – A representation of the object (Citrine id, LinkByUID, or the object itself)

  • dry_run (bool) – Whether to actually delete the item or run a dry run of the delete operation. Dry run is intended to be used for validation. Default: false

get(uid: UUID | str | LinkByUID | BaseEntity) ResourceType

Get an element of the collection by its id.

Parameters:

uid (Union[UUID, str, LinkByUID, BaseEntity]) – A representation of the object (Citrine id, LinkByUID, or the object itself)

Returns:

An object with specified scope and uid

Return type:

ResourceType

abstract classmethod get_type() Type[Serializable]

Return the resource type in the collection.

list(*, per_page: int | None = 100, forward: bool = True) Iterator[ResourceType]

Get all visible elements of the collection.

The order of results should not be relied upon, but for now they are sorted by dataset, object type, and creation time (in that order of priority).

Parameters:
  • per_page (int, optional) – Max number of results to return per page. It is very unlikely that setting this parameter to something other than the default is useful. It exists for rare situations where the client is bandwidth constrained or experiencing latency from large payload sizes.

  • forward (bool) – Set to False to reverse the order of results (i.e., return in descending order)

Returns:

Every object in this collection.

Return type:

Iterator[ResourceType]

list_by_name(name: str, *, exact: bool = False, forward: bool = True, per_page: int = 100) Iterator[ResourceType]

Get all objects with specified name in this dataset.

Parameters:
  • name (str) – case-insensitive object name prefix to search.

  • exact (bool) – Set to True to change prefix search to exact search (but still case-insensitive). Default is False.

  • forward (bool) – Set to False to reverse the order of results (i.e., return in descending order).

  • per_page (int) – Controls the number of results fetched with each http request to the backend. Typically, this is set to a sensible default and should not be modified. Consider modifying this value only if you find this method is unacceptably latent.

Returns:

List of every object in this collection whose name matches the search term.

Return type:

Iterator[ResourceType]

list_by_tag(tag: str, *, per_page: int = 100) Iterator[ResourceType]

Get all objects bearing a tag prefixed with tag in the collection.

The order of results is largely not meaningful. Results from the same dataset will be grouped together but no other meaningful ordering can be relied upon. Duplication in the result set may (but needn’t) occur when one object has multiple tags matching the search tag. For this reason, it is inadvisable to put 2 tags with the same prefix (e.g., ‘foo::bar’ and ‘foo::baz’) in the same object when it can be avoided.

Parameters:
  • tag (str) – The prefix with which to search. Must fully match up to the first delimiter (ex. ‘foo’ and ‘foo::b’ both match ‘foo::bar’ but ‘fo’ is insufficient.

  • per_page (int) – Controls the number of results fetched with each http request to the backend. Typically, this is set to a sensible default and should not be modified. Consider modifying this value only if you find this method is unacceptably latent.

Returns:

Every object in this collection.

Return type:

Iterator[ResourceType]

poll_async_update_job(job_id: UUID, *, timeout: float = 120, polling_delay: float = 1.0) None

Poll for the result of the async_update call.

This call will poll the backend given the Job ID that came from a call to async_update(), waiting for the eventual job result. In the case of successful validation/update, a return value of None is provided which indicates success. In the case of a failure validating or processing the update, an exception (JobFailureError) is raised and an error message is logged with the underlying reason of the failure.

Parameters:
  • job_id (UUID) – The job ID for the asynchronous update job we wish to poll.

  • timeout – How long to poll for the result before giving up. This is expressed in (fractional) seconds.

  • polling_delay – How long to delay between each polling retry attempt.

Returns:

This method will raise an appropriate exception if the job failed, else it will return None to indicate the job was successful.

Return type:

None

register(model: ResourceType, *, dry_run=False)

Create a new element of the collection or update an existing element.

If the input model has an ID that corresponds to an existing object in the database, then that object will be updated. Otherwise a new object will be created.

Only the top-level object in model itself is written to the database with this method. References to other objects are persisted as links, and the object returned by this method has all instances of data objects replaced by instances of LinkByUid. Registering an object which references other objects does NOT implicitly register those other objects. Rather, those other objects’ values are ignored, and the pre-existence of objects with their IDs is asserted before attempting to write model.

Parameters:
  • model (ResourceType) – The DataConcepts object.

  • dry_run (bool) – Whether to actually register the item or run a dry run of the register operation. Dry run is intended to be used for validation. Default: false

Returns:

A copy of the registered object as it now exists in the database.

Return type:

ResourceType

register_all(models: Iterable[ResourceType], *, dry_run: bool = False, status_bar: bool = False, include_nested: bool = False) List[ResourceType]

Register multiple GEMD objects to each of their appropriate collections.

Does so in an order that is guaranteed to store all linked items before the item that references them.

If the GEMD objects have no UIDs, Citrine IDs will be assigned to them prior to passing them on to the server. This is required as otherwise there is no way to determine how objects are related to each other. When the registered objects are returned from the server, the input GEMD objects will be updated with whichever uids & _citr_auto:: tags are on the returned objects. This means GEMD objects that already exist on the server will be updated with all their on-platform uids and tags.

This method has the same behavior as register, except that no models will be written if any one of them is invalid. Using this method should yield significant improvements to write speed over separate calls to register.

Parameters:
  • models (Iterable[DataConcepts]) – The data model objects to register. Can be different types.

  • dry_run (bool) – Whether to actually register the objects or run a dry run of the register operation. Dry run is intended to be used for validation. Default: false

  • status_bar (bool) – Whether to display a status bar using the tqdm module to track progress in registration. Requires installing the optional tqdm module. Default: false

  • include_nested (bool) – Whether to just register the objects passed in the list, or include nested objects (e.g., obj.process, obj.spec.template, …). Default: false

Returns:

Each object model as it now exists in the database.

Return type:

List[DataConcepts]

update(model: ResourceType) ResourceType

Update a data object model.

Update a particular element of the collection, first attempting a simple update using register and falling back to async_update if the changes require it (e.g., updating template bounds).

model: ResourceType

The DataConcepts object.

class citrine.resources.data_concepts.DataConceptsMeta(name, bases, *args, typ: str | None = None, skip: Set[str] = frozenset({}), **kwargs)

Bases: DictSerializableMeta

Data Concepts metaclass for handling serialization.

mro()

Return a type’s method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

property class_mapping: Dict[str, type]

Return class typ string -> class map for DictSerializable and its descendants.

Note that is actually returns a copy of the internal dict to avoid accidental breakage.

Returns:

The mapping from typ string to class

Return type:

Dict[str, type]