citrine.resources.dataset module
Resources that represent both individual datasets and collections of datasets.
- class citrine.resources.dataset.Dataset(name: str, *, summary: str | None = None, description: str | None = None, unique_name: str | None = None)
Bases: Resource[Dataset]
A collection of data objects.
Datasets are the basic unit of access control. A user with read access to a dataset can view every object in that dataset. A user with write access to a dataset can create, update, and delete objects in the dataset.
- Parameters:
name (str) – Name of the dataset. Can be used for searching.
summary (Optional[str]) – An optional summary of this dataset.
description (Optional[str]) – An optional long-form description of the dataset.
unique_name (Optional[str]) – An optional, globally unique name that can be used to retrieve the dataset.
- access_control_dict() dict
Return an access control entity representation of this resource. Internal use only.
- classmethod build(data: dict) Self
Build an instance of this object from given data.
- delete(uid: UUID | str | LinkByUID | DataConcepts, *, dry_run=False)
Delete a GEMD resource from the appropriate collection.
- Parameters:
uid (Union[UUID, str, LinkByUID, DataConcepts]) – A representation of the resource to delete (Citrine id, LinkByUID, or the object)
dry_run (bool) – Whether to actually delete the item or run a dry run of the delete operation. Dry run is intended to be used for validation. Default: false
- delete_contents(*, prompt_to_confirm: bool = True, remove_templates: bool = True, timeout: float = 120, polling_delay: float = 1.0)
Delete all the GEMD objects from within a single Dataset.
- Parameters:
prompt_to_confirm (bool) – If True, prompt for user confirmation before triggering the delete. Included so that a script can skip confirmation if desired, while a user working in a notebook or other REPL environment cannot trigger the delete by accident. Default: True
remove_templates (bool) – If False, templates will not be deleted along with the other contents of the dataset. If True, all GEMD entities, including templates, will be deleted. Default: True
timeout (float) – Amount of time to wait on the job (in seconds) before giving up. Note that this number has no effect on the underlying job itself, which can also time out server-side.
polling_delay (float) – How long to delay between each polling retry attempt.
- Returns:
A list of (LinkByUID, api_error) for each failure to delete an object. Note that this method doesn’t raise an exception if an object fails to be deleted.
- Return type:
List[Tuple[LinkByUID, ApiError]]
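Because delete_contents reports failures in its return value instead of raising, a script has to inspect that list itself. The following is a minimal sketch of such a wrapper, not part of the library: purge_dataset and its keep_templates argument are hypothetical names, and the dataset argument is only assumed to expose delete_contents with the parameters documented above.

```python
from typing import Any, List, Tuple


def purge_dataset(dataset: Any, *, keep_templates: bool = True,
                  timeout: float = 120.0) -> List[str]:
    """Delete a dataset's GEMD contents and describe anything left behind.

    `dataset` is expected to behave like the Dataset documented above;
    this helper is a hypothetical convenience wrapper.
    """
    failures: List[Tuple[Any, Any]] = dataset.delete_contents(
        prompt_to_confirm=False,              # scripted run: skip the interactive prompt
        remove_templates=not keep_templates,  # False leaves templates in place
        timeout=timeout,                      # client-side wait, in seconds
    )
    # delete_contents does not raise on per-object failures, so the
    # returned (link, error) pairs are the only signal that something remains.
    return [f"{link}: {error}" for link, error in failures]
```

An empty list from purge_dataset means every object was removed; anything else names the object and the reason reported for it.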
- dump() dict
Dump this instance.
- gemd_batch_delete(id_list: List[LinkByUID | UUID | str | BaseEntity], *, timeout: float = 120, polling_delay: float = 1.0) List[Tuple[LinkByUID, ApiError]]
Remove a set of GEMD objects.
You may provide GEMD objects that reference each other, and the objects will be removed in the appropriate order.
A failure will be returned if the object cannot be deleted due to an external reference.
All data objects must be associated with this dataset resource. You must also have write access on this dataset.
If you wish to delete more than 50 objects, queuing of deletes requires that the types of objects be known, and thus you _must_ provide ids in the form of BaseEntities.
Also note that Attribute Templates cannot be deleted at present.
- Parameters:
id_list (List[Union[LinkByUID, UUID, str, BaseEntity]]) – A list of the IDs of data objects to be removed. They can be passed as a LinkByUID tuple, a UUID, a string, or the object itself. A UUID or string is assumed to be a Citrine ID, whereas a LinkByUID or BaseEntity can also be used to provide an external ID.
timeout (float) – Amount of time to wait on the job (in seconds) before giving up. Defaults to 2 minutes. Note that this number has no effect on the underlying job itself, which can also time out server-side.
polling_delay (float) – How long to delay between each polling retry attempt.
- Returns:
A list of (LinkByUID, api_error) for each failure to delete an object. Note that this method doesn’t raise an exception if an object fails to be deleted.
- Return type:
List[Tuple[LinkByUID, ApiError]]
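Two points above are easy to miss in practice: batches of more than 50 objects need typed entities, and failures come back as data instead of exceptions. The sketch below shows one way a caller might enforce both. strict_batch_delete and BatchDeleteError are hypothetical names, and the guard is deliberately partial: it only rejects bare strings and UUIDs, since checking for LinkByUID would require importing gemd.

```python
from typing import Any, List, Sequence
from uuid import UUID

BATCH_TYPE_THRESHOLD = 50  # from the text above: beyond this, object types must be known


class BatchDeleteError(RuntimeError):
    """Hypothetical exception carrying the failures reported by the batch delete."""

    def __init__(self, failures: List[Any]):
        super().__init__(f"{len(failures)} object(s) could not be deleted")
        self.failures = failures


def strict_batch_delete(dataset: Any, objects: Sequence[Any]) -> None:
    """Call gemd_batch_delete and raise if any object could not be removed."""
    if len(objects) > BATCH_TYPE_THRESHOLD:
        # Partial check: bare IDs carry no type information. LinkByUID
        # values are not detected here and would need a separate check.
        bare_ids = [obj for obj in objects if isinstance(obj, (str, UUID))]
        if bare_ids:
            raise ValueError(
                f"{len(bare_ids)} bare ID(s) in a batch of {len(objects)}; "
                "pass the objects themselves for batches over "
                f"{BATCH_TYPE_THRESHOLD}"
            )
    failures = dataset.gemd_batch_delete(list(objects))
    if failures:
        # The library returns failures instead of raising; convert them here.
        raise BatchDeleteError(failures)
```

Turning the failure list into an exception is a design choice for scripts that should stop on a partial delete; callers that prefer to continue can read the returned list directly.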
- register(model: DataConcepts, *, dry_run=False) DataConcepts
Register a data model object to the appropriate collection.
- register_all(models: Iterable[DataConcepts], *, dry_run: bool = False, status_bar: bool = False, include_nested: bool = False) List[DataConcepts]
Register multiple GEMD objects to each of their appropriate collections.
Does so in an order that is guaranteed to store all linked items before the item that references them.
If the GEMD objects have no UIDs, Citrine IDs will be assigned to them prior to passing them on to the server. This is required as otherwise there is no way to determine how objects are related to each other. When the registered objects are returned from the server, the input GEMD objects will be updated with whichever uids & _citr_auto:: tags are on the returned objects. This means GEMD objects that already exist on the server will be updated with all their on-platform uids and tags.
- Parameters:
models (Iterable[DataConcepts]) – The data model objects to register. Can be different types.
dry_run (bool) – Whether to actually register the item or run a dry run of the register operation. Dry run is intended to be used for validation. Default: false
status_bar (bool) – Whether to display a status bar using the tqdm module to track progress in registration. Requires installing the optional tqdm module. Default: false
include_nested (bool) – Whether to just register the objects passed in the list, or include nested objects (e.g., obj.process, obj.spec.template, …). Default: false
- Returns:
The registered versions
- Return type:
List[DataConcepts]
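The ordering guarantee described above is a dependency-first (topological) ordering. The sketch below illustrates the idea with the standard library's graphlib; it is not the library's implementation, and the item names and the dependency_first_order helper are hypothetical stand-ins for real GEMD objects.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def dependency_first_order(references: Dict[str, Set[str]]) -> List[str]:
    """Return item names so that every referenced item precedes its referrer.

    `references` maps each item to the set of items it links to.
    """
    # TopologicalSorter treats the mapped values as predecessors, so
    # static_order() emits linked items before the items that reference them.
    return list(TopologicalSorter(references).static_order())


# Hypothetical objects: a run links to its spec, the spec links to its template.
references = {
    "material_run": {"material_spec"},
    "material_spec": {"material_template"},
    "material_template": set(),
}
order = dependency_first_order(references)
print(order)
```

With this chain of links the template comes first and the run last, which is the order in which the items could be stored without any dangling reference.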
- update(model: DataConcepts) DataConcepts
Update a data model object using the appropriate collection.
- property condition_templates: ConditionTemplateCollection
Return a resource representing all condition templates in this dataset.
- create_time = None
Time the dataset was created, in seconds since epoch.
- Type:
int
- created_by = None
ID of the user who created the dataset.
- Type:
UUID
- delete_time = None
Time the dataset was deleted, in seconds since epoch, if it is deleted.
- Type:
int
- deleted = None
Flag indicating whether or not this dataset has been deleted.
- Type:
bool
- deleted_by = None
ID of the user who deleted the dataset, if it is deleted.
- Type:
UUID
- description: str | None = None
- property files: FileCollection
Return a resource representing all files in the dataset.
- property gemd: GEMDResourceCollection
Return a resource representing all GEMD objects/templates in this dataset.
- property ingestions: IngestionCollection
Return a resource representing all ingestions in the dataset.
- property ingredient_runs: IngredientRunCollection
Return a resource representing all ingredient runs in this dataset.
- property ingredient_specs: IngredientSpecCollection
Return a resource representing all ingredient specs in this dataset.
- property material_runs: MaterialRunCollection
Return a resource representing all material runs in this dataset.
- property material_specs: MaterialSpecCollection
Return a resource representing all material specs in this dataset.
- property material_templates: MaterialTemplateCollection
Return a resource representing all material templates in this dataset.
- property measurement_runs: MeasurementRunCollection
Return a resource representing all measurement runs in this dataset.
- property measurement_specs: MeasurementSpecCollection
Return a resource representing all measurement specs in this dataset.
- property measurement_templates: MeasurementTemplateCollection
Return a resource representing all measurement templates in this dataset.
- name: str = None
- property parameter_templates: ParameterTemplateCollection
Return a resource representing all parameter templates in this dataset.
- property process_runs: ProcessRunCollection
Return a resource representing all process runs in this dataset.
- property process_specs: ProcessSpecCollection
Return a resource representing all process specs in this dataset.
- property process_templates: ProcessTemplateCollection
Return a resource representing all process templates in this dataset.
- project_id = None
project_id is needed here until its deprecation is complete. This class property will be removed once the deprecation is finished.
- property property_templates: PropertyTemplateCollection
Return a resource representing all property templates in this dataset.
- public = None
Flag indicating whether the dataset is publicly readable.
- Type:
bool
- session = None
- summary: str | None = None
- team_id = None
- uid = None
Unique uuid4 identifier of this dataset.
- Type:
UUID
- unique_name = None
- update_time = None
Time the dataset was most recently updated, in seconds since epoch.
- Type:
int
- updated_by = None
ID of the user who last updated the dataset.
- Type:
UUID
- class citrine.resources.dataset.DatasetCollection(*args, session: Session | None = None, team_id: UUID | None = None, project_id: UUID | None = None)
Bases: Collection[Dataset]
Represents the collection of all datasets associated with a project.
- Parameters:
team_id (UUID) – Unique ID of the team this dataset collection belongs to.
session (Session) – The Citrine session used to connect to the database.
- build(data: dict) Dataset
Build an individual dataset from a dictionary.
- Parameters:
data (dict) – A dictionary representing the dataset.
- Returns:
The dataset created from data.
- Return type:
Dataset
- get(uid: UUID | str) ResourceType
Get a particular element of the collection.
- list(*, per_page: int = 1000) Iterator[Dataset]
List datasets using pagination.
Leaving page and per_page as default values will yield all elements in the collection, paginating over all available pages.
- Parameters:
per_page (int, optional) – Max number of results to return per page. Default is 1000. This parameter is used when making requests to the backend service. If the page parameter is specified it limits the maximum number of elements in the response.
- Returns:
Datasets in this collection.
- Return type:
Iterator[Dataset]
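Since list returns an iterator, a caller can stop consuming it as soon as the item of interest appears. A minimal sketch, assuming only that the collection exposes list(per_page=...) as documented and that each dataset has a name attribute; find_dataset_by_name is a hypothetical helper, not part of the library.

```python
from typing import Any, Optional


def find_dataset_by_name(collection: Any, name: str, *,
                         per_page: int = 100) -> Optional[Any]:
    """Return the first dataset whose name matches exactly, or None."""
    # next() pulls items one at a time, so iteration stops at the first
    # match instead of walking the whole collection.
    matches = (ds for ds in collection.list(per_page=per_page) if ds.name == name)
    return next(matches, None)
```

When a dataset has a unique_name, retrieving it by that name is likely the better route; scanning by name is a fallback for datasets without one.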
- register(model: Dataset) Dataset
Create a new dataset in the collection, or update an existing one.
If the Dataset has an ID present, then we update the existing resource, else we create a new one.
This differs from super().register() in that None fields are scrubbed, and the json response is not assumed to come in a dictionary with a single entry ‘dataset’. Both of these behaviors are in contrast to the behavior of projects. Eventually they will be unified in the backend, and one register() method will suffice.
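The create-or-update rule and the scrubbing of None fields can be pictured with a small stand-in. This is an illustration of the described behavior only, not the library's code: plan_register is a hypothetical function, and the "id" key is an assumed name for the identifier field in the dumped payload.

```python
from typing import Any, Dict, Tuple


def plan_register(dumped: Dict[str, Any],
                  id_key: str = "id") -> Tuple[str, Dict[str, Any]]:
    """Decide between create and update, and drop fields that are None.

    `id_key` is an assumed field name used for illustration.
    """
    # Scrub None fields so they are not sent in the request body.
    payload = {key: value for key, value in dumped.items() if value is not None}
    # An ID that is present means the dataset already exists, so update it.
    action = "update" if id_key in payload else "create"
    return action, payload
```

For example, a payload with a name and summary but no identifier would be planned as a create, while the same payload with an identifier would be planned as an update.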
- update(model: CreationType) CreationType
Update a particular element of the collection.