citrine.informatics.predictors.mean_property_predictor module
- class citrine.informatics.predictors.mean_property_predictor.MeanPropertyPredictor(name: str, *, description: str, input_descriptor: FormulationDescriptor, properties: List[RealDescriptor | CategoricalDescriptor], p: float, impute_properties: bool, label: str | None = None, default_properties: Mapping[str, str | float] | None = None, training_data: List[DataSource] | None = None)
Bases: Resource[MeanPropertyPredictor], PredictorNode
A predictor that computes a component-weighted mean of real or categorical properties.
Each component in a formulation contributes to the mean property a weight equal to its quantity raised to the power p. For real-valued properties, the property values of each component are averaged with these weights to yield the component-weighted mean property. For categorical-valued properties, these weights are accumulated to yield a distribution over property values in the formulation.
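The weighting described above can be sketched in plain Python. This is an illustrative reading of the description, not the library's implementation: the helper names (`component_weights`, `weighted_real_mean`, `weighted_category_distribution`) are hypothetical, the weights are assumed to be normalized by their sum, and quantities are assumed to be strictly positive.

```python
from collections import defaultdict
from typing import Dict, Mapping


def component_weights(quantities: Mapping[str, float], p: float) -> Dict[str, float]:
    """Weight of each component: its quantity raised to the power p."""
    # Assumes strictly positive quantities (a zero quantity with negative p would fail).
    return {name: quantity ** p for name, quantity in quantities.items()}


def weighted_real_mean(
    quantities: Mapping[str, float], values: Mapping[str, float], p: float
) -> float:
    """Average real property values using the quantity**p weights."""
    weights = component_weights(quantities, p)
    total = sum(weights.values())
    # Assumption: weights are normalized so that they sum to one.
    return sum(weights[name] * values[name] for name in weights) / total


def weighted_category_distribution(
    quantities: Mapping[str, float], categories: Mapping[str, str], p: float
) -> Dict[str, float]:
    """Accumulate normalized weights per category value."""
    weights = component_weights(quantities, p)
    total = sum(weights.values())
    distribution: Dict[str, float] = defaultdict(float)
    for name, weight in weights.items():
        distribution[categories[name]] += weight / total
    return dict(distribution)


# Hypothetical two-ingredient formulation.
quantities = {"water": 0.75, "salt": 0.25}
density = {"water": 1.0, "salt": 2.0}
role = {"water": "solvent", "salt": "solute"}

mean_p1 = weighted_real_mean(quantities, density, p=1.0)
mean_p2 = weighted_real_mean(quantities, density, p=2.0)
roles_p1 = weighted_category_distribution(quantities, role, p=1.0)
```

With `p=1.0` the weights are the quantities themselves, so the result is the ordinary quantity-weighted average. Larger values of `p` shift the mean toward the components present in the largest amounts.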
- Parameters:
name (str) – Name of the configuration
description (str) – Description of the predictor
input_descriptor (FormulationDescriptor) – Descriptor that represents the input formulation
properties (List[Union[RealDescriptor, CategoricalDescriptor]]) – List of real or categorical descriptors to featurize
p (float) – Power of the component-weighted mean. Positive, negative, and fractional powers are supported.
impute_properties (bool) – Whether to impute missing ingredient properties. If False, all ingredients must define values for all featurized properties; otherwise, the row will not be featurized. If True and no default_properties are specified, then the average over the entire dataset is used. If True and a default is specified in default_properties, then the specified default is used in place of missing values.
label (Optional[str]) – Only ingredients with this label are counted when calculating the component-weighted mean. If None (default), all ingredients will be counted.
default_properties (Optional[Mapping[str, Union[str, float]]]) – Default values to use for imputed properties. Defaults are specified as a map from descriptor key to its default value. If not specified and impute_properties == True, the average over the entire dataset will be used to fill in missing values. Any specified defaults will be used in place of the average over the dataset. impute_properties must be True if default_properties are provided.
training_data (Optional[List[DataSource]]) – Sources of training data. Each can be either a CSV or a GEM Table. Candidates from multiple data sources will be combined into a flattened list and de-duplicated by uid and identifiers. De-duplication is performed if a uid or identifier is shared between two or more rows. The content of a de-duplicated row will contain the union of data across all rows that share the same uid or at least 1 identifier. Training data is unnecessary if the predictor is part of a graph that includes all training data required by this predictor.
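The interaction between impute_properties and default_properties can be summarized with a small decision function. This is a sketch of the rules stated in the parameter descriptions, not code from the library: `resolve_value` and the precomputed `dataset_means` mapping are hypothetical names, and returning None is used here only to signal that the row would not be featurized.

```python
from typing import Mapping, Optional, Union

Value = Union[str, float]


def resolve_value(
    key: str,
    value: Optional[Value],
    *,
    impute_properties: bool,
    default_properties: Optional[Mapping[str, Value]] = None,
    dataset_means: Optional[Mapping[str, Value]] = None,
) -> Optional[Value]:
    """Pick the value used for one ingredient property, following the documented rules."""
    # Documented constraint: defaults are only allowed when imputation is enabled.
    if default_properties and not impute_properties:
        raise ValueError("impute_properties must be True if default_properties are provided")
    if value is not None:
        return value  # the ingredient defines the property, nothing to impute
    if not impute_properties:
        return None  # missing value without imputation: the row is not featurized
    if default_properties and key in default_properties:
        return default_properties[key]  # a specified default takes precedence
    # Otherwise fall back to the dataset-wide average (hypothetical precomputed mapping).
    return (dataset_means or {}).get(key)


# Hypothetical descriptor key and values.
means = {"density": 0.9}
from_dataset = resolve_value("density", None, impute_properties=True, dataset_means=means)
from_default = resolve_value(
    "density", None, impute_properties=True,
    default_properties={"density": 1.0}, dataset_means=means,
)
not_imputed = resolve_value("density", None, impute_properties=False)
```

Here `from_dataset` falls back to the dataset average, `from_default` uses the explicit default instead, and `not_imputed` is None because imputation is disabled.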
- access_control_dict() dict
Return an access control entity representation of this resource. Internal use only.
- classmethod build(data: dict) Self
Build an instance of this object from given data.
- dump() dict
Dump this instance.
- classmethod get_type(data) Type[PredictorNode]
Return the subtype.
- default_properties: Mapping[str, str | float] | None = None
- description: str = None
- impute_properties: bool = None
- input_descriptor: FormulationDescriptor = None
- label: str | None = None
- name: str = None
- p: float = None
- properties: List[RealDescriptor | CategoricalDescriptor] = None
- property training_data
[DEPRECATED] Retrieve training data associated with this node.
Deprecated since version 3.5.0: This will be removed in 4.0.0. Training data must be accessed through the top-level GraphPredictor.
- typ = 'MeanProperty'