citrine.resources.dataset module

Resources that represent both individual datasets and collections of datasets.

class citrine.resources.dataset.Dataset(name: str, *, summary: str | None = None, description: str | None = None, unique_name: str | None = None)

Bases: Resource[Dataset]

A collection of data objects.

Datasets are the basic unit of access control. A user with read access to a dataset can view every object in that dataset. A user with write access to a dataset can create, update, and delete objects in the dataset.

Parameters:
  • name (str) – Name of the dataset. Can be used for searching.

  • summary (Optional[str]) – An optional summary of this dataset.

  • description (Optional[str]) – An optional long-form description of the dataset.

  • unique_name (Optional[str]) – An optional, globally unique name that can be used to retrieve the dataset.

access_control_dict() dict

Return an access control entity representation of this resource. Internal use only.

classmethod build(data: dict) Self

Build an instance of this object from given data.

delete(uid: UUID | str | LinkByUID | DataConcepts, *, dry_run=False)

Delete a GEMD resource from the appropriate collection.

Parameters:
  • uid (Union[UUID, str, LinkByUID, DataConcepts]) – A representation of the resource to delete (Citrine id, LinkByUID, or the object)

  • dry_run (bool) – If True, run a dry run of the delete operation instead of actually deleting the item. A dry run is intended for validation. Default: False

delete_contents(*, prompt_to_confirm: bool = True, remove_templates: bool = True, timeout: float = 120, polling_delay: float = 1.0)

Delete all the GEMD objects from within a single Dataset.

Parameters:
  • prompt_to_confirm (bool) – If True, prompt for user confirmation before triggering the delete. The flag lets a script skip confirmation, while a user working in a notebook or other REPL environment will not trigger a delete by accident. Default: True

  • remove_templates (bool) – If False, templates will not be deleted along with the other contents of the dataset. If True, all GEMD entities, including templates, will be deleted. Default: True

  • timeout (float) – Amount of time to wait on the job (in seconds) before giving up. Default: 120. Note that this number has no effect on the underlying job itself, which can also time out server-side.

  • polling_delay (float) – How long to wait (in seconds) between polling attempts. Default: 1.0

Returns:

A list of (LinkByUID, api_error) for each failure to delete an object. Note that this method doesn’t raise an exception if an object fails to be deleted.

Return type:

List[Tuple[LinkByUID, ApiError]]

dump() dict

Dump this instance.

gemd_batch_delete(id_list: List[LinkByUID | UUID | str | BaseEntity], *, timeout: float = 120, polling_delay: float = 1.0) List[Tuple[LinkByUID, ApiError]]

Remove a set of GEMD objects.

You may provide GEMD objects that reference each other, and the objects will be removed in the appropriate order.

A failure will be returned if the object cannot be deleted due to an external reference.

All data objects must be associated with this dataset resource. You must also have write access on this dataset.

Deleting more than 50 objects requires the deletes to be queued, and queuing requires that the types of the objects be known. In that case you must provide the ids in the form of BaseEntities.

Also note that Attribute Templates cannot be deleted at present.

Parameters:
  • id_list (List[Union[LinkByUID, UUID, str, BaseEntity]]) – A list of the IDs of data objects to be removed. They can be passed as a LinkByUID tuple, a UUID, a string, or the object itself. A UUID or string is assumed to be a Citrine ID, whereas a LinkByUID or BaseEntity can also be used to provide an external ID.

  • timeout (float) – Amount of time to wait on the job (in seconds) before giving up. Defaults to 2 minutes. Note that this number has no effect on the underlying job itself, which can also time out server-side.

  • polling_delay (float) – How long to wait (in seconds) between polling attempts. Default: 1.0

Returns:

A list of (LinkByUID, api_error) for each failure to delete an object. Note that this method doesn’t raise an exception if an object fails to be deleted.

Return type:

List[Tuple[LinkByUID, ApiError]]
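
Because failures are returned rather than raised, the caller has to inspect the returned list. The following sketch shows one way to summarize it. FakeLink and FakeError are stand-ins invented for this example; the attributes of the real LinkByUID and ApiError objects are not described in this section, so the field names used here are assumptions.

```python
from collections import Counter, namedtuple

# Stand-ins for illustration only; the real method returns LinkByUID and
# ApiError objects, whose attributes may differ from these field names.
FakeLink = namedtuple("FakeLink", ["scope", "id"])
FakeError = namedtuple("FakeError", ["code", "message"])


def report_failures(failures):
    """Group (link, error) pairs by error message and count them."""
    return Counter(error.message for _link, error in failures)


failures = [
    (FakeLink("id", "a1"), FakeError(400, "referenced by another object")),
    (FakeLink("id", "b2"), FakeError(400, "referenced by another object")),
    (FakeLink("id", "c3"), FakeError(404, "not found")),
]

summary = report_failures(failures)
if failures:
    # An empty list means every object was deleted; anything else needs review.
    for message, count in summary.most_common():
        print(f"{count} object(s) not deleted: {message}")
```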

register(model: DataConcepts, *, dry_run=False) DataConcepts

Register a data model object to the appropriate collection.

register_all(models: Iterable[DataConcepts], *, dry_run: bool = False, status_bar: bool = False, include_nested: bool = False) List[DataConcepts]

Register multiple GEMD objects to each of their appropriate collections.

Does so in an order that is guaranteed to store all linked items before the item that references them.

If the GEMD objects have no UIDs, Citrine IDs will be assigned to them prior to passing them on to the server. This is required as otherwise there is no way to determine how objects are related to each other. When the registered objects are returned from the server, the input GEMD objects will be updated with whichever uids & _citr_auto:: tags are on the returned objects. This means GEMD objects that already exist on the server will be updated with all their on-platform uids and tags.

Parameters:
  • models (Iterable[DataConcepts]) – The data model objects to register. Can be different types.

  • dry_run (bool) – If True, run a dry run of the register operation instead of actually registering the items. A dry run is intended for validation. Default: False

  • status_bar (bool) – Whether to display a status bar that tracks registration progress. Requires installing the optional tqdm module. Default: False

  • include_nested (bool) – If False, register only the objects passed in the list. If True, also register nested objects (e.g., obj.process, obj.spec.template, …). Default: False

Returns:

The registered versions

Return type:

List[DataConcepts]
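
The ordering guarantee described above, in which linked items are stored before the items that reference them, is a topological ordering. The sketch below illustrates the idea with the standard library's graphlib (Python 3.9+). It is not the library's implementation, and the object names and links are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical object names mapped to the names they link to.
links = {
    "material_run": {"material_spec", "process_run"},
    "material_spec": {"material_template", "process_spec"},
    "process_run": {"process_spec"},
    "process_spec": {"process_template"},
    "material_template": set(),
    "process_template": set(),
}

# static_order() yields each node only after all of the nodes it depends on.
order = list(TopologicalSorter(links).static_order())
position = {name: index for index, name in enumerate(order)}
```

Any order produced this way places templates before specs and specs before runs, so no object is stored before something it references.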

update(model: DataConcepts) DataConcepts

Update a data model object using the appropriate collection.

property condition_templates: ConditionTemplateCollection

Return a resource representing all condition templates in this dataset.

create_time = None

Time the dataset was created, in seconds since epoch.

Type:

int
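
A timestamp of this kind can be turned into a timezone-aware datetime with the standard library. This is a minimal sketch that assumes the value is in seconds, as documented here; epoch_to_utc is a hypothetical helper, and the sample value is made up.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    if seconds is None:  # attribute still at its default of None
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = epoch_to_utc(1_700_000_000)  # made-up example value
```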

created_by = None

ID of the user who created the dataset.

Type:

UUID

delete_time = None

Time the dataset was deleted, in seconds since epoch, if it is deleted.

Type:

int

deleted = None

Flag indicating whether or not this dataset has been deleted.

Type:

bool

deleted_by = None

ID of the user who deleted the dataset, if it is deleted.

Type:

UUID

description: str | None = None
property files: FileCollection

Return a resource representing all files in the dataset.

property gemd: GEMDResourceCollection

Return a resource representing all GEMD objects/templates in this dataset.

property ingestions: IngestionCollection

Return a resource representing all ingestions in the dataset.

property ingredient_runs: IngredientRunCollection

Return a resource representing all ingredient runs in this dataset.

property ingredient_specs: IngredientSpecCollection

Return a resource representing all ingredient specs in this dataset.

property material_runs: MaterialRunCollection

Return a resource representing all material runs in this dataset.

property material_specs: MaterialSpecCollection

Return a resource representing all material specs in this dataset.

property material_templates: MaterialTemplateCollection

Return a resource representing all material templates in this dataset.

property measurement_runs: MeasurementRunCollection

Return a resource representing all measurement runs in this dataset.

property measurement_specs: MeasurementSpecCollection

Return a resource representing all measurement specs in this dataset.

property measurement_templates: MeasurementTemplateCollection

Return a resource representing all measurement templates in this dataset.

name: str = None
property parameter_templates: ParameterTemplateCollection

Return a resource representing all parameter templates in this dataset.

property process_runs: ProcessRunCollection

Return a resource representing all process runs in this dataset.

property process_specs: ProcessSpecCollection

Return a resource representing all process specs in this dataset.

property process_templates: ProcessTemplateCollection

Return a resource representing all process templates in this dataset.

project_id = None

project_id is needed here until its deprecation is complete. This class attribute will be removed after the deprecation.

property property_templates: PropertyTemplateCollection

Return a resource representing all property templates in this dataset.

public = None

Flag indicating whether the dataset is publicly readable.

Type:

bool

session = None
summary: str | None = None
team_id = None
uid = None

Unique uuid4 identifier of this dataset.

Type:

UUID

unique_name = None
update_time = None

Time the dataset was most recently updated, in seconds since epoch.

Type:

int

updated_by = None

ID of the user who last updated the dataset.

Type:

UUID

class citrine.resources.dataset.DatasetCollection(*args, session: Session | None = None, team_id: UUID | None = None, project_id: UUID | None = None)

Bases: Collection[Dataset]

Represents the collection of all datasets associated with a project.

Parameters:
  • team_id (UUID) – Unique ID of the team this dataset collection belongs to.

  • session (Session) – The Citrine session used to connect to the database.

build(data: dict) Dataset

Build an individual dataset from a dictionary.

Parameters:

data (dict) – A dictionary representing the dataset.

Returns:

The dataset created from data.

Return type:

Dataset

delete(uid: UUID | str) Response

Delete a particular element of the collection.

get(uid: UUID | str) ResourceType

Get a particular element of the collection.

get_by_unique_name(unique_name: str) Dataset

Get a Dataset with the given unique name.

list(*, per_page: int = 1000) Iterator[Dataset]

List datasets using pagination.

Leaving per_page at its default value will yield all elements in the collection, paginating over all available pages.

Parameters:

per_page (int, optional) – Max number of results to return per page. Default is 1000. This parameter is used when making requests to the backend service, where it limits the maximum number of elements in each response.

Returns:

Datasets in this collection.

Return type:

Iterator[Dataset]

register(model: Dataset) Dataset

Create a new dataset in the collection, or update an existing one.

If the Dataset has an ID present, then we update the existing resource, else we create a new one.

This differs from super().register() in that None fields are scrubbed, and the json response is not assumed to come in a dictionary with a single entry ‘dataset’. Both of these behaviors are in contrast to the behavior of projects. Eventually they will be unified in the backend, and one register() method will suffice.

Parameters:

model (Dataset) – The dataset to register.

Returns:

A copy of the registered dataset as it now exists in the database.

Return type:

Dataset
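
The two behaviors described above, choosing between create and update based on the presence of an ID and scrubbing None fields, can be sketched in plain Python. The helper names and the "id" key are assumptions made for this example and do not reflect the library's internals.

```python
def scrub_none(payload):
    """Return a copy of payload without the keys whose value is None."""
    return {key: value for key, value in payload.items() if value is not None}


def choose_operation(payload):
    """Pick "update" when an ID is present, otherwise "create"."""
    # The key name "id" is an assumption made for this sketch.
    return "update" if payload.get("id") is not None else "create"


new_dataset = {"name": "Alloy trials", "summary": None, "description": None, "id": None}
body = scrub_none(new_dataset)
operation = choose_operation(new_dataset)
```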

update(model: CreationType) CreationType

Update a particular element of the collection.