citrine.informatics.data_sources module

Tools for working with data sources.

class citrine.informatics.data_sources.CSVDataSource(*, file_link: FileLink, column_definitions: Mapping[str, Descriptor], identifiers: List[str] | None = None)

Bases: Serializable[CSVDataSource], DataSource

A data source based on a CSV file stored on the data platform.

Parameters:
  • file_link (FileLink) – link to the CSV file to read the data from

  • column_definitions (Mapping[str, Descriptor]) – Maps each column header to the descriptor used to interpret the contents of that column's cells

  • identifiers (Optional[List[str]]) – List of one or more column headers whose values uniquely identify a row. These columns may also appear in column_definitions if a column should serve both as data and as an identifier, but this is not required. Identifier values must be unique within a dataset: no two rows can contain the same value.
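The interplay between column_definitions and identifiers can be sketched in plain Python. This is not the platform's CSV reader. It assumes plain callables (such as float) as hypothetical stand-ins for Descriptors, and it reads "no two rows can contain the same value" as applying to each identifier column separately. The load_csv helper is hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List, Mapping

# Hypothetical stand-in for a Descriptor: turns a raw cell string into a typed value.
Converter = Callable[[str], object]


def load_csv(csv_text: str,
             column_definitions: Mapping[str, Converter],
             identifiers: List[str]) -> List[Dict[str, object]]:
    """Interpret defined columns and reject repeated identifier values."""
    rows: List[Dict[str, object]] = []
    seen: Dict[str, set] = {col: set() for col in identifiers}
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 (line 1 is the header), assuming no quoted newlines.
    for line_no, raw in enumerate(reader, start=2):
        for col in identifiers:
            value = raw[col]
            if value in seen[col]:
                raise ValueError(f"duplicate identifier {col}={value!r} on line {line_no}")
            seen[col].add(value)
        # Identifier columns stay as raw strings unless they also have a definition.
        row: Dict[str, object] = {col: raw[col] for col in identifiers}
        row.update({col: convert(raw[col]) for col, convert in column_definitions.items()})
        rows.append(row)
    return rows


csv_text = "sample_id,temperature\nA1,300\nA2,310\n"
rows = load_csv(csv_text, {"temperature": float}, ["sample_id"])
```

Here sample_id is used only as an identifier and temperature only as data; listing a column in both arguments would keep it as an identifier and also store its converted value.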

classmethod build(data: dict) → Self

Build an instance of this object from the given data.

dump() → dict

Dump this instance.

classmethod from_data_source_id(data_source_id: str) → DataSource

Build a DataSource from a data_source_id.

classmethod get_type(data) → Type[Serializable]

Return the subtype.

to_data_source_id() → str

Generate the data_source_id for this DataSource.

column_definitions = None
identifiers = None
typ = 'csv_data_source'

class citrine.informatics.data_sources.DataSource

Bases: PolymorphicSerializable[DataSource]

A source of data for the AI engine.

DataSource provides a polymorphic interface for specifying different kinds of data, both as training data for predictors and as input data for some design spaces.
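The polymorphic interface can be pictured as a registry keyed on each subclass's typ string: get_type picks the subclass and build constructs it. The sketch below only mirrors that idea and is not the library's PolymorphicSerializable machinery. It assumes the serialized dict carries the type string under a "type" key, and SketchDataSource, GemTableSketch and SnapshotRefSketch are hypothetical simplifications (the two typ strings are taken from this page).

```python
from dataclasses import dataclass
from typing import Dict, Type


class SketchDataSource:
    """Hypothetical base class: subclasses register themselves under their typ string."""
    typ = ""
    _registry: Dict[str, Type["SketchDataSource"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        SketchDataSource._registry[cls.typ] = cls

    @classmethod
    def get_type(cls, data: dict) -> Type["SketchDataSource"]:
        """Return the subclass named by the serialized type string."""
        # Assumption: the type string is stored under the "type" key.
        subclass = cls._registry.get(data.get("type"))
        if subclass is None:
            raise ValueError(f"unrecognized data source type: {data.get('type')!r}")
        return subclass

    @classmethod
    def build(cls, data: dict) -> "SketchDataSource":
        """Dispatch to the matching subclass and pass the remaining keys as fields."""
        fields = {k: v for k, v in data.items() if k != "type"}
        return cls.get_type(data)(**fields)


@dataclass
class GemTableSketch(SketchDataSource):
    typ = "hosted_table_data_source"  # class attribute, not a dataclass field
    table_id: str
    table_version: int


@dataclass
class SnapshotRefSketch(SketchDataSource):
    typ = "snapshot_data_source"
    snapshot_id: str


source = SketchDataSource.build(
    {"type": "hosted_table_data_source", "table_id": "abc", "table_version": 2}
)
```

Calling build on the base class is what makes the interface polymorphic: the caller does not need to know in advance which kind of data source the dict describes.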

classmethod build(data: dict) → SelfType

Build the underlying type.

classmethod from_data_source_id(data_source_id: str) → DataSource

Build a DataSource from a data_source_id.

classmethod get_type(data) → Type[Serializable]

Return the subtype.

abstract to_data_source_id() → str

Generate the data_source_id for this DataSource.
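to_data_source_id and from_data_source_id act as inverses: a data source is flattened into a single string and later rebuilt from it. The string format is not documented here, so the sketch below assumes a hypothetical "kind::field::..." layout purely for illustration. The "::" separator, the "table" kind tag and all helper names are made up and should not be taken as the library's actual format.

```python
from typing import Tuple
from uuid import UUID

SEP = "::"  # hypothetical separator between the kind tag and each field


def to_data_source_id(kind: str, *fields: object) -> str:
    """Flatten a kind tag and identifying fields into one string."""
    parts = [kind, *(str(f) for f in fields)]
    if any(SEP in p for p in parts):
        raise ValueError("parts must not contain the separator")
    return SEP.join(parts)


def from_data_source_id(data_source_id: str) -> Tuple[str, Tuple[str, ...]]:
    """Split a data_source_id back into its kind tag and raw field strings."""
    kind, *fields = data_source_id.split(SEP)
    return kind, tuple(fields)


def parse_table_id(data_source_id: str) -> Tuple[UUID, int]:
    """Rebuild typed table fields; the "table" kind tag is hypothetical."""
    kind, fields = from_data_source_id(data_source_id)
    if kind != "table" or len(fields) != 2:
        raise ValueError(f"not a table data_source_id: {data_source_id!r}")
    return UUID(fields[0]), int(fields[1])


table_id = UUID("12345678-1234-5678-1234-567812345678")
dsid = to_data_source_id("table", table_id, 3)
```

The kind tag is what lets a single from_data_source_id entry point decide which concrete type to rebuild, in the same spirit as get_type for serialized dicts.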

class citrine.informatics.data_sources.ExperimentDataSourceRef(*, datasource_id: UUID)

Bases: Serializable[ExperimentDataSourceRef], DataSource

A reference to a data source based on an experiment result hosted on the data platform.

Parameters:

datasource_id (UUID) – Unique identifier for the Experiment Data Source
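This reference holds nothing but a UUID. When the identifier arrives as a string (for example, copied from a URL or a config file), it has to be converted first. The standard library's uuid.UUID does the parsing and raises ValueError on malformed input. ExperimentRefSketch and its from_any constructor are hypothetical stand-ins, not part of the library.

```python
from dataclasses import dataclass
from typing import Union
from uuid import UUID


@dataclass(frozen=True)
class ExperimentRefSketch:
    """Hypothetical stand-in for a reference identified only by a UUID."""
    datasource_id: UUID

    @classmethod
    def from_any(cls, value: Union[str, UUID]) -> "ExperimentRefSketch":
        # uuid.UUID raises ValueError for malformed strings.
        return cls(datasource_id=value if isinstance(value, UUID) else UUID(value))


ref_a = ExperimentRefSketch.from_any("12345678-1234-5678-1234-567812345678")
ref_b = ExperimentRefSketch.from_any(UUID("12345678-1234-5678-1234-567812345678"))
```

Normalizing to UUID at construction means two references to the same data source compare equal regardless of how the identifier was supplied.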

classmethod build(data: dict) → Self

Build an instance of this object from the given data.

dump() → dict

Dump this instance.

classmethod from_data_source_id(data_source_id: str) → DataSource

Build a DataSource from a data_source_id.

classmethod get_type(data) → Type[Serializable]

Return the subtype.

to_data_source_id() → str

Generate the data_source_id for this DataSource.

datasource_id: UUID = None
typ = 'experiments_data_source'

class citrine.informatics.data_sources.GemTableDataSource(*, table_id: UUID, table_version: int | str)

Bases: Serializable[GemTableDataSource], DataSource

A data source based on a GEM Table hosted on the data platform.

Parameters:
  • table_id (UUID) – Unique identifier for the GEM Table

  • table_version (Union[str,int]) – Version number for the GEM Table. The first GEM Table built from a configuration has version 1. Strings are cast to ints.
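The rule that strings are cast to ints can be illustrated with a small normalizer. This is a sketch under assumptions: normalize_table_version is a hypothetical name, and rejecting versions below 1 is inferred from the statement that the first version is 1, not a documented check.

```python
from typing import Union


def normalize_table_version(table_version: Union[str, int]) -> int:
    """Cast a GEM Table version to int, as described for table_version."""
    if isinstance(table_version, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("table_version must be an int or a numeric string")
    version = int(table_version)  # "3" -> 3; int("v3") raises ValueError
    # Assumption: versions start at 1, so anything lower is treated as invalid.
    if version < 1:
        raise ValueError(f"table_version must be >= 1, got {version}")
    return version
```

For example, normalize_table_version("3") and normalize_table_version(3) both give 3, so either form identifies the same table version.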

classmethod build(data: dict) → Self

Build an instance of this object from the given data.

dump() → dict

Dump this instance.

classmethod from_data_source_id(data_source_id: str) → DataSource

Build a DataSource from a data_source_id.

classmethod from_gemtable(table: GemTable) → GemTableDataSource

Generate a GemTableDataSource that references the given GemTable.

Parameters:

table (GemTable) – The GemTable object to reference
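from_gemtable saves copying the table's identifier and version by hand. The sketch below assumes the table object exposes uid and version attributes. TableSketch and TableSourceSketch are hypothetical stand-ins, not the library's GemTable and GemTableDataSource.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class TableSketch:
    """Hypothetical table record; assumes the real GemTable exposes uid and version."""
    uid: UUID
    version: int


@dataclass(frozen=True)
class TableSourceSketch:
    table_id: UUID
    table_version: int

    @classmethod
    def from_gemtable(cls, table: TableSketch) -> "TableSourceSketch":
        # Copy both fields so the data source points at this exact table version.
        return cls(table_id=table.uid, table_version=table.version)


table = TableSketch(uid=uuid4(), version=2)
source = TableSourceSketch.from_gemtable(table)
```

Because the version is copied explicitly, the resulting data source refers to that specific table version; presumably a rebuilt table (a new version) would need a new data source.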

classmethod get_type(data) → Type[Serializable]

Return the subtype.

to_data_source_id() → str

Generate the data_source_id for this DataSource.

table_id: UUID = None
table_version: int | str = None
typ = 'hosted_table_data_source'

class citrine.informatics.data_sources.SnapshotDataSource(*, snapshot_id: UUID)

Bases: Serializable[SnapshotDataSource], DataSource

A reference to a data source based on a Snapshot on the data platform.

Parameters:

snapshot_id (UUID) – Unique identifier for the Snapshot Data Source
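Like the other data sources, a snapshot reference is turned into a plain dict by dump() and rebuilt by build(). The sketch below shows that round trip for a UUID-only reference. The 'snapshot_data_source' string comes from the typ attribute documented here; the "type" and "snapshot_id" key names and the SnapshotSketch class are assumptions for illustration.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class SnapshotSketch:
    snapshot_id: UUID
    typ = "snapshot_data_source"  # class attribute, not a dataclass field

    def dump(self) -> dict:
        # Write the UUID as a string so the dict is JSON-friendly.
        return {"type": self.typ, "snapshot_id": str(self.snapshot_id)}

    @classmethod
    def build(cls, data: dict) -> "SnapshotSketch":
        if data.get("type") != cls.typ:
            raise ValueError(f"expected type {cls.typ!r}, got {data.get('type')!r}")
        return cls(snapshot_id=UUID(data["snapshot_id"]))


snap = SnapshotSketch(UUID("12345678-1234-5678-1234-567812345678"))
```

Checking the type string in build guards against feeding one kind of serialized data source into another kind's constructor.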

classmethod build(data: dict) → Self

Build an instance of this object from the given data.

dump() → dict

Dump this instance.

classmethod from_data_source_id(data_source_id: str) → DataSource

Build a DataSource from a data_source_id.

classmethod get_type(data) → Type[Serializable]

Return the subtype.

to_data_source_id() → str

Generate the data_source_id for this DataSource.

snapshot_id = None
typ = 'snapshot_data_source'