citrine.informatics.predictors.auto_ml_predictor module

class citrine.informatics.predictors.auto_ml_predictor.AutoMLEstimator(value)

Bases: BaseEnumeration

[ALPHA] Algorithms to be used during AutoML model selection.

  • LINEAR corresponds to a linear regression estimator

    (valid for single-task regression problems)

  • RANDOM_FOREST corresponds to a random forest estimator

    (valid for single-task and multi-task regression and classification problems)

  • GAUSSIAN_PROCESS corresponds to a Gaussian process estimator

    (valid for single-task regression and classification problems)

  • SUPPORT_VECTOR_MACHINE corresponds to a support vector machine estimator

    (valid for single-task classification problems)

  • ALL combines all estimator choices (valid for all learning tasks)
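
The validity rules listed above can be summarized as a lookup table. The following is a minimal sketch that does not import citrine: Estimator is a hypothetical stand-in mirroring the AutoMLEstimator members, and VALID_TASKS, is_valid and the "single"/"multi" and "regression"/"classification" labels are illustrative names, not part of the library. It assumes that RANDOM_FOREST covers all four combinations of task count and problem type.

```python
from enum import Enum


class Estimator(Enum):
    """Hypothetical stand-in that mirrors the AutoMLEstimator members."""

    LINEAR = "LINEAR"
    RANDOM_FOREST = "RANDOM_FOREST"
    GAUSSIAN_PROCESS = "GAUSSIAN_PROCESS"
    SUPPORT_VECTOR_MACHINE = "SUPPORT_VECTOR_MACHINE"
    ALL = "ALL"


# (task count, problem type) pairs each estimator is documented as valid for.
# The string labels are illustrative and not defined by the library.
VALID_TASKS = {
    Estimator.LINEAR: {("single", "regression")},
    Estimator.RANDOM_FOREST: {
        ("single", "regression"),
        ("single", "classification"),
        ("multi", "regression"),
        ("multi", "classification"),
    },
    Estimator.GAUSSIAN_PROCESS: {
        ("single", "regression"),
        ("single", "classification"),
    },
    Estimator.SUPPORT_VECTOR_MACHINE: {("single", "classification")},
}


def is_valid(estimator, task, problem):
    """Return True if the estimator is documented as valid for the task."""
    if estimator is Estimator.ALL:
        # ALL is documented as valid for all learning tasks.
        return True
    return (task, problem) in VALID_TASKS[estimator]
```

A check such as is_valid(Estimator.LINEAR, "single", "classification") returns False under these assumptions, which can help when choosing an estimator set before configuring a predictor.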

ALL = 'ALL'
GAUSSIAN_PROCESS = 'GAUSSIAN_PROCESS'
LINEAR = 'LINEAR'
RANDOM_FOREST = 'RANDOM_FOREST'
SUPPORT_VECTOR_MACHINE = 'SUPPORT_VECTOR_MACHINE'
class citrine.informatics.predictors.auto_ml_predictor.AutoMLPredictor(name: str, *, description: str, outputs: List[Descriptor], inputs: List[Descriptor], estimators: Set[AutoMLEstimator] | None = None, training_data: List[DataSource] | None = None)

Bases: Resource[AutoMLPredictor], PredictorNode

A predictor interface that builds a single ML model.

The model uses the set of inputs to predict the output(s). Only one machine learning model is built.

Parameters:
  • name (str) – name of the configuration

  • description (str) – the description of the predictor

  • inputs (list[Descriptor]) – Descriptors that represent inputs to the model

  • outputs (list[Descriptor]) – Descriptors that represent the output(s) of the model. Currently, only one output Descriptor is supported.

  • estimators (Optional[Set[AutoMLEstimator]]) – Set of estimators to consider during AutoML model selection. If None is provided, defaults to AutoMLEstimator.RANDOM_FOREST.

  • training_data (Optional[List[DataSource]]) – Sources of training data. Each can be either a CSV or a GEM Table. Candidates from multiple data sources are combined into a flattened list and de-duplicated by uid and identifiers. De-duplication is performed when a uid or identifier is shared between two or more rows. A de-duplicated row contains the union of data across all rows that share the same uid or at least one identifier. Training data is unnecessary if the predictor is part of a graph that includes all training data required by this predictor.
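
The de-duplication described for training_data can be pictured with a small stand-alone sketch. This is not the platform's implementation: the row layout (a dict with "uid", "identifiers" and "data" keys) and the deduplicate helper are hypothetical. The sketch also assumes that merging is transitive, that uids and identifiers live in separate namespaces, and that when two rows disagree on a value the later row wins; the source does not specify any of these details.

```python
def deduplicate(rows):
    """Merge rows that share a uid or at least one identifier.

    Each row is a dict with hypothetical keys: "uid" (str or None),
    "identifiers" (iterable of str) and "data" (dict of column -> value).
    """
    # Union-find over row indices; the root is the smallest index in a group.
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every row to the first row seen with the same uid or identifier.
    first_seen = {}
    for index, row in enumerate(rows):
        keys = [("identifier", value) for value in row["identifiers"]]
        if row.get("uid") is not None:
            keys.append(("uid", row["uid"]))
        for key in keys:
            if key in first_seen:
                a, b = find(index), find(first_seen[key])
                parent[max(a, b)] = min(a, b)
            else:
                first_seen[key] = index

    # Build one merged row per group, holding the union of its data.
    merged = {}
    for index, row in enumerate(rows):
        group = merged.setdefault(
            find(index), {"uid": None, "identifiers": set(), "data": {}}
        )
        if group["uid"] is None:
            group["uid"] = row.get("uid")  # keep the first uid in the group
        group["identifiers"] |= set(row["identifiers"])
        group["data"].update(row["data"])  # assumption: later rows win
    return list(merged.values())
```

With this sketch, two rows that share an identifier collapse into one row whose data dict holds the columns of both, while rows with no shared uid or identifier pass through unchanged.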

access_control_dict() dict

Return an access control entity representation of this resource. Internal use only.

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[PredictorNode]

Return the subtype.

description: str = None
estimators: Set[AutoMLEstimator] = {AutoMLEstimator.RANDOM_FOREST}
inputs: List[Descriptor] = None
name: str = None
outputs = None
training_data: List[DataSource] = []
typ = 'AutoML'