citrine.resources.dataset module

Resources that represent both individual datasets and collections of datasets.

class citrine.resources.dataset.Dataset(name: str, *, summary: str | None = None, description: str | None = None, unique_name: str | None = None)

Bases: Resource[Dataset]

A collection of data objects.

Datasets are the basic unit of access control. A user with read access to a dataset can view every object in that dataset. A user with write access to a dataset can create, update, and delete objects in the dataset.

Parameters:

name (str) – Name of the dataset. Can be used for searching.
summary (Optional[str]) – An optional summary of this dataset.
description (Optional[str]) – An optional long-form description of the dataset.
unique_name (Optional[str]) – An optional, globally unique name that can be used to retrieve the dataset.

access_control_dict() → dict: Return an access control entity representation of this resource. Internal use only.

classmethod build(data: dict) → Self: Build an instance of this object from given data.

delete(uid: UUID | str | LinkByUID | DataConcepts, *, dry_run=False)

Delete a GEMD resource from the appropriate collection.

Parameters:

uid (Union[UUID, str, LinkByUID, DataConcepts]) – A representation of the resource to delete (Citrine id, LinkByUID, or the object)
dry_run (bool) – If True, run a dry run of the delete operation instead of actually deleting the item. Dry run is intended to be used for validation. Default: False

delete_contents(*, prompt_to_confirm: bool = True, remove_templates: bool = True, timeout: float = 120, polling_delay: float = 1.0)

Delete all the GEMD objects from within a single Dataset.

Parameters:

prompt_to_confirm (bool) – If True, prompt for user confirmation before triggering the delete. A script can set this to False to skip confirmation, while a user working in a notebook or other REPL environment will not trigger the delete by accident. Default: True
remove_templates (bool) – If False, templates will not be deleted along with the other contents of the dataset. If True, all GEMD entities, including templates, will be deleted. Default: True
timeout (float) – Amount of time to wait on the job (in seconds) before giving up. Defaults to 2 minutes. Note that this number has no effect on the underlying job itself, which can also time out server-side.
polling_delay (float) – How long to delay between each polling retry attempt.

Returns:

A list of (LinkByUID, api_error) for each failure to delete an object. Note that this method doesn’t raise an exception if an object fails to be deleted.

Return type:

List[Tuple[LinkByUID, ApiError]]
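Because delete_contents reports failures in its return value rather than raising, a script usually has to check that list itself. The following is a minimal sketch of that pattern, not part of the library: clear_dataset and its keep_templates flag are hypothetical names, and dataset is assumed to be any object exposing delete_contents with the signature documented above.

```python
def clear_dataset(dataset, *, keep_templates=False, timeout=120.0, polling_delay=1.0):
    """Delete a dataset's GEMD contents without prompting; raise if any delete failed.

    Hypothetical helper: `dataset` only needs a `delete_contents` method
    that accepts the keyword arguments documented above.
    """
    failures = dataset.delete_contents(
        prompt_to_confirm=False,              # skip the interactive prompt for scripted use
        remove_templates=not keep_templates,  # False keeps templates in place
        timeout=timeout,
        polling_delay=polling_delay,
    )
    if failures:
        # delete_contents returns failures instead of raising, so surface them here.
        raise RuntimeError(f"{len(failures)} object(s) could not be deleted")
    return failures
```

Raising on a non-empty list is a design choice for unattended scripts; an interactive caller might prefer to inspect the returned pairs instead.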

dump() → dict: Dump this instance.

gemd_batch_delete(id_list: List[LinkByUID | UUID | str | BaseEntity], *, timeout: float = 120, polling_delay: float = 1.0) → List[Tuple[LinkByUID, ApiError]]

Remove a set of GEMD objects.

You may provide GEMD objects that reference each other, and the objects will be removed in the appropriate order.

A failure will be returned if the object cannot be deleted due to an external reference.

All data objects must be associated with this dataset resource. You must also have write access on this dataset.

To delete more than 50 objects, you must provide the ids in the form of BaseEntities, because queuing the deletes requires that the types of the objects be known.

Also note that Attribute Templates cannot be deleted at present.

Parameters:

id_list (List[Union[LinkByUID, UUID, str, BaseEntity]]) – A list of the IDs of data objects to be removed. They can be passed as a LinkByUID tuple, a UUID, a string, or the object itself. A UUID or string is assumed to be a Citrine ID, whereas a LinkByUID or BaseEntity can also be used to provide an external ID.
timeout (float) – Amount of time to wait on the job (in seconds) before giving up. Defaults to 2 minutes. Note that this number has no effect on the underlying job itself, which can also time out server-side.
polling_delay (float) – How long to delay between each polling retry attempt.

Returns:

A list of (LinkByUID, api_error) for each failure to delete an object. Note that this method doesn’t raise an exception if an object fails to be deleted.

Return type:

List[Tuple[LinkByUID, ApiError]]
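The returned pairs can be turned into a readable report. This is a rough sketch under stated assumptions: describe_failures and batch_delete_report are hypothetical helper names, the scope and id attributes are the two fields of a gemd LinkByUID, and the error is simply rendered with str() since the fields of ApiError are not described here.

```python
def describe_failures(failures):
    """Format (LinkByUID, ApiError) pairs as one line per failed delete."""
    lines = []
    for link, error in failures:
        # `scope` and `id` identify the object; the error is rendered as text.
        lines.append(f"{link.scope}:{link.id} -> {error}")
    return lines


def batch_delete_report(dataset, id_list, *, timeout=120.0, polling_delay=1.0):
    """Run gemd_batch_delete and return a printable list of failures.

    Hypothetical helper: `dataset` only needs a `gemd_batch_delete` method
    with the signature documented above.
    """
    failures = dataset.gemd_batch_delete(
        id_list, timeout=timeout, polling_delay=polling_delay
    )
    return describe_failures(failures)
```

An empty report means every object in id_list was deleted.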

register(model: DataConcepts, *, dry_run=False) → DataConcepts: Register a data model object to the appropriate collection.

register_all(models: Iterable[DataConcepts], *, dry_run: bool = False, status_bar: bool = False, include_nested: bool = False) → List[DataConcepts]

Register multiple GEMD objects to each of their appropriate collections.

Does so in an order that is guaranteed to store all linked items before the item that references them.

If the GEMD objects have no UIDs, Citrine IDs will be assigned to them prior to passing them on to the server. This is required because otherwise there is no way to determine how objects are related to each other. When the registered objects are returned from the server, the input GEMD objects will be updated with whichever uids and _citr_auto:: tags are on the returned objects. This means GEMD objects that already exist on the server will be updated with all their on-platform uids and tags.

Parameters:

models (Iterable[DataConcepts]) – The data model objects to register. Can be different types.
dry_run (bool) – If True, run a dry run of the register operation instead of actually registering the items. Dry run is intended to be used for validation. Default: False
status_bar (bool) – Whether to display a status bar using the tqdm module to track progress in registration. Requires installing the optional tqdm module. Default: False
include_nested (bool) – If False, register only the objects passed in the list. If True, also register nested objects (e.g., obj.process, obj.spec.template, …). Default: False

Returns:

The registered versions of the objects.

Return type:

List[DataConcepts]
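Since dry_run is intended for validation, one possible workflow is to validate a batch before committing it. The sketch below assumes that a failed dry run surfaces as an exception, which is not stated above; register_validated is a hypothetical helper name, and dataset is any object exposing register_all as documented.

```python
def register_validated(dataset, models, *, include_nested=False):
    """Validate a batch with a dry run, then register it for real.

    Assumption: a failed dry run raises, so the second call is only
    reached when validation passed.
    """
    models = list(models)  # materialize so the same objects are passed to both calls
    dataset.register_all(models, dry_run=True, include_nested=include_nested)
    return dataset.register_all(models, dry_run=False, include_nested=include_nested)
```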

update(model: DataConcepts) → DataConcepts: Update a data model object using the appropriate collection.

property condition_templates: ConditionTemplateCollection: Return a resource representing all condition templates in this dataset.

create_time = None

Time the dataset was created, in seconds since epoch.

Type: int
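The timestamp attributes on this class (create_time, update_time, delete_time) default to None and are integers when set. A small sketch of converting one to a timezone-aware datetime follows, taking the documented unit of seconds since epoch at face value; epoch_to_utc is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Optional


def epoch_to_utc(seconds: Optional[float]) -> Optional[datetime]:
    """Convert seconds since epoch to an aware UTC datetime; pass None through."""
    if seconds is None:
        # Attributes such as delete_time stay None when they do not apply.
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```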

created_by = None

ID of the user who created the dataset.

Type: UUID

delete_time = None

Time the dataset was deleted, in seconds since epoch, if it is deleted.

Type: int

deleted = None

Flag indicating whether or not this dataset has been deleted.

Type: bool

deleted_by = None

ID of the user who deleted the dataset, if it is deleted.

Type: UUID

description: str | None = None

property files: FileCollection: Return a resource representing all files in the dataset.

property gemd: GEMDResourceCollection: Return a resource representing all GEMD objects/templates in this dataset.

property ingestions: IngestionCollection: Return a resource representing all ingestions in this dataset.

property ingredient_runs: IngredientRunCollection: Return a resource representing all ingredient runs in this dataset.

property ingredient_specs: IngredientSpecCollection: Return a resource representing all ingredient specs in this dataset.

property material_runs: MaterialRunCollection: Return a resource representing all material runs in this dataset.

property material_specs: MaterialSpecCollection: Return a resource representing all material specs in this dataset.

property material_templates: MaterialTemplateCollection: Return a resource representing all material templates in this dataset.

property measurement_runs: MeasurementRunCollection: Return a resource representing all measurement runs in this dataset.

property measurement_specs: MeasurementSpecCollection: Return a resource representing all measurement specs in this dataset.

property measurement_templates: MeasurementTemplateCollection: Return a resource representing all measurement templates in this dataset.

name: str = None

property parameter_templates: ParameterTemplateCollection: Return a resource representing all parameter templates in this dataset.

property process_runs: ProcessRunCollection: Return a resource representing all process runs in this dataset.

property process_specs: ProcessSpecCollection: Return a resource representing all process specs in this dataset.

property process_templates: ProcessTemplateCollection: Return a resource representing all process templates in this dataset.

project_id = None: project_id will be needed here until deprecation is complete. This class property will be removed post deprecation

property property_templates: PropertyTemplateCollection: Return a resource representing all property templates in this dataset.

public = None

Flag indicating whether the dataset is publicly readable.

Type: bool

session = None

summary: str | None = None

team_id = None

uid = None

Unique uuid4 identifier of this dataset.

Type: UUID

unique_name = None

update_time = None

Time the dataset was most recently updated, in seconds since epoch.

Type: int

updated_by = None

ID of the user who last updated the dataset.

Type: UUID

class citrine.resources.dataset.DatasetCollection(*args, session: Session = None, team_id: UUID = None, project_id: UUID | None = None)

Bases: Collection[Dataset]

Represents the collection of all datasets associated with a project.

Parameters:

team_id (UUID) – Unique ID of the team this dataset collection belongs to.
session (Session) – The Citrine session used to connect to the database.

build(data: dict) → Dataset

Build an individual dataset from a dictionary.

Parameters: data (dict) – A dictionary representing the dataset.
Returns: The dataset created from data.
Return type: Dataset

delete(uid: UUID | str) → Response: Delete a particular element of the collection.

get(uid: UUID | str) → ResourceType: Get a particular element of the collection.

get_by_unique_name(unique_name: str) → Dataset: Get a Dataset with the given unique name.

list(*, per_page: int = 1000) → Iterator[Dataset]

List datasets using pagination.

Leaving page and per_page as default values will yield all elements in the collection, paginating over all available pages.

Parameters: per_page (int, optional) – Max number of results to return per page. Default is 1000. This parameter is used when making requests to the backend service. If the page parameter is specified, it limits the maximum number of elements in the response.
Returns: Datasets in this collection.
Return type: Iterator[Dataset]
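Because list returns an iterator, a caller can stop early instead of walking every page. The following sketch takes only the first few datasets; first_datasets is a hypothetical helper name, and datasets is assumed to be any object whose list method accepts per_page and returns an iterator as documented.

```python
from itertools import islice


def first_datasets(datasets, n, *, per_page=100):
    """Return at most `n` datasets, consuming the iterator only as far as needed."""
    # islice stops pulling from the iterator after n items, so later
    # elements are never requested from it.
    return list(islice(datasets.list(per_page=per_page), n))
```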

register(model: Dataset) → Dataset

Create a new dataset in the collection, or update an existing one.

If the Dataset has an ID present, then we update the existing resource, else we create a new one.

This differs from super().register() in that None fields are scrubbed, and the json response is not assumed to come in a dictionary with a single entry ‘dataset’. Both of these behaviors are in contrast to the behavior of projects. Eventually they will be unified in the backend, and one register() method will suffice.

Parameters: model (Dataset) – The dataset to register.
Returns: A copy of the registered dataset as it now exists in the database.
Return type: Dataset
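A common use of register is a get-or-create step keyed on unique_name. The sketch below scans list() for a match and registers a new dataset only if none is found. get_by_unique_name would be more direct, but its behavior for a missing name is not described here, so this version avoids relying on it. get_or_register and make_dataset are hypothetical names; make_dataset is any zero-argument callable that builds the Dataset to register.

```python
def get_or_register(datasets, unique_name, make_dataset):
    """Return the dataset with `unique_name`, registering a new one if absent.

    `datasets` only needs the `list` and `register` methods documented above.
    """
    for existing in datasets.list():
        if existing.unique_name == unique_name:
            return existing
    # No match found: build the dataset lazily and register it.
    return datasets.register(make_dataset())
```

Scanning the whole collection is linear in the number of datasets, so this is best suited to collections of modest size.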

update(model: CreationType) → CreationType: Update a particular element of the collection.