citrine.informatics.data_sources module
Tools for working with data sources.
- class citrine.informatics.data_sources.CSVDataSource(*, file_link: FileLink, column_definitions: Mapping[str, Descriptor], identifiers: List[str] | None = None)
Bases: Serializable[CSVDataSource], DataSource
A data source based on a CSV file stored on the data platform.
- Parameters:
file_link (FileLink) – link to the CSV file to read the data from
column_definitions (Mapping[str, Descriptor]) – Map the column headers to the descriptors that will be used to interpret the cell contents
identifiers (Optional[List[str]]) – List of one or more column headers whose values uniquely identify a row. These may overlap with column_definitions if a column should be used both as data and as an identifier, but this is not required. Identifiers must be unique within a dataset: no two rows can contain the same value.
- classmethod build(data: dict) -> Self
Build an instance of this object from given data.
- dump() -> dict
Dump this instance.
- classmethod from_data_source_id(data_source_id: str) -> DataSource
Build a DataSource from a data_source_id.
- classmethod get_type(data) -> Type[Serializable]
Return the subtype.
- to_data_source_id() -> str
Generate the data_source_id for this DataSource.
- column_definitions = None
- file_link = None
- identifiers = None
- typ = 'csv_data_source'
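The uniqueness rule for `identifiers` can be checked locally before a CSV file is uploaded. The following is a minimal sketch that uses only the standard library. `find_duplicate_identifiers` is a hypothetical helper and not part of the citrine package, and treating the combination of all identifier columns as the row key is an assumption.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def find_duplicate_identifiers(csv_text: str, identifiers: List[str]) -> List[Tuple[str, ...]]:
    """Return identifier tuples that occur in more than one row.

    Hypothetical pre-upload check; not part of the citrine package.
    Assumption: the identifier columns together form the row key.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if an identifier does not match any column header.
    missing = [name for name in identifiers if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"identifier columns not found in header: {missing}")
    # Count how many rows share each identifier tuple.
    counts = Counter(tuple(row[name] for name in identifiers) for row in reader)
    return [key for key, n in counts.items() if n > 1]


sample = "sample_id,temperature\nA1,300\nA2,310\nA1,320\n"
duplicates = find_duplicate_identifiers(sample, ["sample_id"])
print(duplicates)
```

An empty result means every row has a distinct identifier. Any returned tuples name the rows to fix before the file is used as a data source.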
- class citrine.informatics.data_sources.DataSource
Bases: PolymorphicSerializable[DataSource]
A source of data for the AI engine.
A DataSource provides a polymorphic interface for specifying different kinds of data, either as training data for predictors or as input data for some design spaces.
- classmethod build(data: dict) -> SelfType
Build the underlying type.
- classmethod from_data_source_id(data_source_id: str) -> DataSource
Build a DataSource from a data_source_id.
- classmethod get_type(data) -> Type[Serializable]
Return the subtype.
- abstract to_data_source_id() -> str
Generate the data_source_id for this DataSource.
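One way to picture the polymorphic interface is a lookup from a type string to a subclass. The sketch below is illustrative only and is not the citrine implementation. The class names are hypothetical stand-ins, the `typ` strings are the ones listed for each class in this module, and the assumption that the serialized dict carries its subtype under a `"type"` key is not stated in this reference.

```python
from typing import Dict, Type


class DataSourceSketch:
    """Stand-in base class for illustration; not the citrine implementation."""

    typ: str = ""
    _registry: Dict[str, Type["DataSourceSketch"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass under its type string.
        DataSourceSketch._registry[cls.typ] = cls

    @classmethod
    def get_type(cls, data: dict) -> Type["DataSourceSketch"]:
        # Assumption: the subtype is stored under a "type" key.
        if "type" not in data:
            raise ValueError("serialized data source has no 'type' key")
        try:
            return cls._registry[data["type"]]
        except KeyError:
            raise ValueError(f"unknown data source type: {data['type']!r}") from None


class CSVSketch(DataSourceSketch):
    typ = "csv_data_source"


class ExperimentRefSketch(DataSourceSketch):
    typ = "experiments_data_source"


class GemTableSketch(DataSourceSketch):
    typ = "hosted_table_data_source"


class SnapshotSketch(DataSourceSketch):
    typ = "snapshot_data_source"


resolved = DataSourceSketch.get_type({"type": "hosted_table_data_source"})
print(resolved.__name__)
```

Dispatching through the base class lets calling code deserialize a data source without knowing in advance which concrete kind it holds.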
- class citrine.informatics.data_sources.ExperimentDataSourceRef(*, datasource_id: UUID)
Bases: Serializable[ExperimentDataSourceRef], DataSource
A reference to a data source based on an experiment result hosted on the data platform.
- Parameters:
datasource_id (UUID) – Unique identifier for the Experiment Data Source
- classmethod build(data: dict) -> Self
Build an instance of this object from given data.
- dump() -> dict
Dump this instance.
- classmethod from_data_source_id(data_source_id: str) -> DataSource
Build a DataSource from a data_source_id.
- classmethod get_type(data) -> Type[Serializable]
Return the subtype.
- to_data_source_id() -> str
Generate the data_source_id for this DataSource.
- datasource_id: UUID = None
- typ = 'experiments_data_source'
- class citrine.informatics.data_sources.GemTableDataSource(*, table_id: UUID, table_version: int | str)
Bases: Serializable[GemTableDataSource], DataSource
A data source based on a GEM Table hosted on the data platform.
- Parameters:
table_id (UUID) – Unique identifier for the GEM Table
table_version (Union[str,int]) – Version number for the GEM Table. The first GEM Table built from a configuration has version 1. Strings are cast to ints.
- classmethod build(data: dict) -> Self
Build an instance of this object from given data.
- dump() -> dict
Dump this instance.
- classmethod from_data_source_id(data_source_id: str) -> DataSource
Build a DataSource from a data_source_id.
- classmethod from_gemtable(table: GemTable) -> GemTableDataSource
Generate a DataSource that corresponds to a GemTable.
- Parameters:
table (GemTable) – The GemTable object to reference
- classmethod get_type(data) -> Type[Serializable]
Return the subtype.
- to_data_source_id() -> str
Generate the data_source_id for this DataSource.
- table_id: UUID = None
- table_version: int | str = None
- typ = 'hosted_table_data_source'
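The note that string versions are cast to ints can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the citrine implementation: `GemTableRefSketch` is a hypothetical name, the UUID is a placeholder, and rejecting versions below 1 is an assumption drawn from the statement that the first table has version 1.

```python
from dataclasses import dataclass
from typing import Union
from uuid import UUID


@dataclass
class GemTableRefSketch:
    """Stand-in for illustration only; not the citrine class."""

    table_id: UUID
    table_version: Union[int, str]

    def __post_init__(self):
        # Cast string versions to int, as the parameter description states.
        self.table_version = int(self.table_version)
        # Assumption: versions below 1 are invalid because numbering starts at 1.
        if self.table_version < 1:
            raise ValueError("table versions start at 1")


ref = GemTableRefSketch(
    table_id=UUID("12345678-1234-5678-1234-567812345678"),  # placeholder id
    table_version="3",
)
print(ref.table_version, type(ref.table_version).__name__)
```

Normalizing the version once at construction means later comparisons never mix `"3"` and `3`.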
- class citrine.informatics.data_sources.SnapshotDataSource(*, snapshot_id: UUID)
Bases: Serializable[SnapshotDataSource], DataSource
A reference to a data source based on a Snapshot on the data platform.
- Parameters:
snapshot_id (UUID) – Unique identifier for the Snapshot Data Source
- classmethod build(data: dict) -> Self
Build an instance of this object from given data.
- dump() -> dict
Dump this instance.
- classmethod from_data_source_id(data_source_id: str) -> DataSource
Build a DataSource from a data_source_id.
- classmethod get_type(data) -> Type[Serializable]
Return the subtype.
- to_data_source_id() -> str
Generate the data_source_id for this DataSource.
- snapshot_id = None
- typ = 'snapshot_data_source'