citrine.informatics.predictors.auto_ml_predictor module
- class citrine.informatics.predictors.auto_ml_predictor.AutoMLEstimator(value)
Bases: BaseEnumeration
[ALPHA] Algorithms to be used during AutoML model selection.
- LINEAR corresponds to a linear regression estimator
(valid for single-task regression problems)
- RANDOM_FOREST corresponds to a random forest estimator
(valid for single-task and multi-task regression and classification problems)
- GAUSSIAN_PROCESS corresponds to a Gaussian process estimator
(valid for single-task regression and classification problems)
- SUPPORT_VECTOR_MACHINE corresponds to a support vector machine estimator
(valid for single-task classification problems)
- ALL combines all estimator choices (valid for all learning tasks)
- ALL = 'ALL'
- GAUSSIAN_PROCESS = 'GAUSSIAN_PROCESS'
- LINEAR = 'LINEAR'
- RANDOM_FOREST = 'RANDOM_FOREST'
- SUPPORT_VECTOR_MACHINE = 'SUPPORT_VECTOR_MACHINE'
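The validity rules listed above can be read as a lookup table. The following is a minimal sketch that uses a stand-in enum named `Estimator` rather than the real `AutoMLEstimator` class; the `VALID_FOR` table and the `is_valid` helper are hypothetical names introduced only for illustration, and the library may not expose any such check.

```python
from enum import Enum


class Estimator(str, Enum):
    """Stand-in mirroring the documented AutoMLEstimator members (not the citrine class)."""

    ALL = "ALL"
    GAUSSIAN_PROCESS = "GAUSSIAN_PROCESS"
    LINEAR = "LINEAR"
    RANDOM_FOREST = "RANDOM_FOREST"
    SUPPORT_VECTOR_MACHINE = "SUPPORT_VECTOR_MACHINE"


# (task, multi_task) combinations each estimator is documented as valid for.
# Hypothetical table built from the descriptions above.
VALID_FOR = {
    Estimator.LINEAR: {("regression", False)},
    Estimator.RANDOM_FOREST: {
        ("regression", False),
        ("regression", True),
        ("classification", False),
        ("classification", True),
    },
    Estimator.GAUSSIAN_PROCESS: {("regression", False), ("classification", False)},
    Estimator.SUPPORT_VECTOR_MACHINE: {("classification", False)},
}


def is_valid(estimator: Estimator, task: str, multi_task: bool = False) -> bool:
    """Return True if the estimator is documented as valid for the given task."""
    if estimator is Estimator.ALL:
        # ALL is documented as valid for all learning tasks.
        return True
    return (task, multi_task) in VALID_FOR[estimator]
```

For instance, `is_valid(Estimator.LINEAR, "classification")` is false under this table, while `is_valid(Estimator.RANDOM_FOREST, "classification", multi_task=True)` is true.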
- class citrine.informatics.predictors.auto_ml_predictor.AutoMLPredictor(name: str, *, description: str, outputs: List[Descriptor], inputs: List[Descriptor], estimators: Set[AutoMLEstimator] | None = None, training_data: List[DataSource] | None = None)
Bases: Resource[AutoMLPredictor], PredictorNode
A predictor interface that builds a single ML model.
The model uses the set of inputs to predict the output(s).
- Parameters:
name (str) – name of the configuration
description (str) – the description of the predictor
inputs (list[Descriptor]) – Descriptors that represent inputs to the model
outputs (list[Descriptor]) – Descriptors that represent the output(s) of the model.
estimators (Optional[Set[AutoMLEstimator]]) – Set of estimators to consider during AutoML model selection. If None is provided, defaults to AutoMLEstimator.RANDOM_FOREST.
training_data (Optional[List[DataSource]] (deprecated)) – Sources of training data. Each can be either a CSV or a GEM Table. Candidates from multiple data sources are combined into a flattened list and de-duplicated by uid and identifiers: rows are merged whenever two or more of them share a uid or an identifier. The merged row contains the union of data across all rows that share the same uid or at least one identifier. Training data is unnecessary if the predictor is part of a graph that includes all training data required by this predictor.
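The de-duplication rule described for training_data can be illustrated with a small standalone sketch. This is not the platform's implementation: the row layout (a dict with `uid`, `identifiers`, and `data` keys), the function name `merge_rows`, and the choice to keep the first value seen when two merged rows disagree on a column are all assumptions made for the example.

```python
from typing import Dict, List


def merge_rows(rows: List[dict]) -> List[dict]:
    """Merge rows that share a uid or at least one identifier.

    Each row is a hypothetical dict of the form
    {"uid": str or None, "identifiers": set of str, "data": dict}.
    """
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every row to the first row seen with the same uid or identifier.
    first_seen: Dict[tuple, int] = {}
    for i, row in enumerate(rows):
        keys = [("id", ident) for ident in row.get("identifiers", ())]
        if row.get("uid") is not None:
            keys.append(("uid", row["uid"]))
        for key in keys:
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    # Union the contents of each linked group, in order of first appearance.
    merged: Dict[int, dict] = {}
    for i, row in enumerate(rows):
        group = merged.setdefault(
            find(i), {"uids": set(), "identifiers": set(), "data": {}}
        )
        if row.get("uid") is not None:
            group["uids"].add(row["uid"])
        group["identifiers"].update(row.get("identifiers", ()))
        for column, value in row.get("data", {}).items():
            # Assumption: on conflicting values, the first one seen is kept.
            group["data"].setdefault(column, value)
    return list(merged.values())
```

Note that merging is transitive in this sketch: if row A shares an identifier with row B and row B shares a uid with row C, all three end up in one merged row.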
- access_control_dict() dict
Return an access control entity representation of this resource. Internal use only.
- classmethod build(data: dict) Self
Build an instance of this object from given data.
- dump() dict
Dump this instance.
- classmethod get_type(data) Type[PredictorNode]
Return the subtype.
- description: str = None
- estimators: Set[AutoMLEstimator] = {AutoMLEstimator.RANDOM_FOREST}
- inputs: List[Descriptor] = None
- name: str = None
- outputs = None
- property training_data
[DEPRECATED] Retrieve training data associated with this node.
Deprecated since version 3.5.0: This will be removed in 4.0.0. Training data must be accessed through the top-level GraphPredictor.
- typ = 'AutoML'