citrine.informatics.predictors.mean_property_predictor module

class citrine.informatics.predictors.mean_property_predictor.MeanPropertyPredictor(name: str, *, description: str, input_descriptor: FormulationDescriptor, properties: List[RealDescriptor | CategoricalDescriptor], p: float, impute_properties: bool, label: str | None = None, default_properties: Mapping[str, str | float] | None = None, training_data: List[DataSource] | None = None)

Bases: Resource[MeanPropertyPredictor], PredictorNode

A predictor that computes a component-weighted mean of real or categorical properties.

Each component in a formulation contributes to the mean with a weight equal to its quantity raised to the power p. For real-valued properties, the property values of the components are averaged using these weights to yield the component-weighted mean property. For categorical-valued properties, the weights are accumulated per category to yield a distribution over property values in the formulation.
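The weighting described above can be illustrated with a minimal sketch. It is a literal reading of the paragraph, not the platform's implementation: the function names, the dictionary-based inputs, and the normalization of weights so they sum to 1 are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Mapping


def component_weights(quantities: Mapping[str, float], p: float) -> Dict[str, float]:
    """Hypothetical helper: weight of each component = quantity ** p, normalized to sum to 1."""
    raw = {name: quantity ** p for name, quantity in quantities.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def weighted_real_mean(
    quantities: Mapping[str, float], values: Mapping[str, float], p: float
) -> float:
    """Average real-valued property values using the component weights."""
    weights = component_weights(quantities, p)
    return sum(weights[name] * values[name] for name in weights)


def weighted_categorical_distribution(
    quantities: Mapping[str, float], categories: Mapping[str, str], p: float
) -> Dict[str, float]:
    """Accumulate component weights per category to form a distribution."""
    weights = component_weights(quantities, p)
    distribution: Dict[str, float] = defaultdict(float)
    for name, weight in weights.items():
        distribution[categories[name]] += weight
    return dict(distribution)


# Illustrative formulation: 75% of ingredient "a", 25% of ingredient "b".
quantities = {"a": 0.75, "b": 0.25}
density = {"a": 1.0, "b": 5.0}
role = {"a": "solvent", "b": "solute"}

mean_p1 = weighted_real_mean(quantities, density, p=1)  # 0.75 * 1.0 + 0.25 * 5.0 = 2.0
mean_p2 = weighted_real_mean(quantities, density, p=2)  # larger quantities dominate more
role_distribution = weighted_categorical_distribution(quantities, role, p=1)
```

With p = 1 the weights are the quantities themselves. Raising p shifts weight toward the majority components, and a negative p shifts it toward the minority components.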

Parameters:

name (str) – Name of the configuration
description (str) – Description of the predictor
input_descriptor (FormulationDescriptor) – Descriptor that represents the input formulation
properties (List[Union[RealDescriptor, CategoricalDescriptor]]) – List of real or categorical descriptors to featurize
p (float) – Power of the component-weighted mean. Positive, negative, and fractional powers are supported.
impute_properties (bool) – Whether to impute missing ingredient properties. If False, every ingredient must define a value for every featurized property; otherwise the row will not be featurized. If True and no default is given for a property in default_properties, the average over the entire dataset is used. If True and a default is specified in default_properties, that default is used in place of missing values.
label (Optional[str]) – Only ingredients with this label are counted when calculating the component-weighted mean. If None (default) all ingredients will be counted.
default_properties (Optional[Mapping[str, Union[str, float]]]) – Default values to use for imputed properties. Defaults are specified as a map from descriptor key to its default value. If not specified and impute_properties == True the average over the entire dataset will be used to fill in missing values. Any specified defaults will be used in place of the average over the dataset. impute_properties must be True if default_properties are provided.
training_data (Optional[List[DataSource]]) – Sources of training data. Each can be either a CSV or a GEM Table. Candidates from multiple data sources will be combined into a flattened list and de-duplicated by uid and identifiers. De-duplication is performed if a uid or identifier is shared between two or more rows. The content of a de-duplicated row will contain the union of data across all rows that share the same uid or at least one identifier. Training data is unnecessary if the predictor is part of a graph that includes all training data required by this predictor.
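The interaction between impute_properties and default_properties described in the parameter list can be summarized in a short sketch. This is only an illustration of the documented rules for a single real-valued property: resolve_property and its arguments are hypothetical names, and returning None stands in for "the row will not be featurized".

```python
from typing import List, Mapping, Optional


def resolve_property(
    key: str,
    ingredient_values: Mapping[str, float],
    dataset_values: List[float],
    impute_properties: bool,
    default_properties: Optional[Mapping[str, float]] = None,
) -> Optional[float]:
    """Hypothetical helper: pick the value used for one ingredient property."""
    if default_properties and not impute_properties:
        # Documented constraint: defaults require imputation to be enabled.
        raise ValueError("impute_properties must be True if default_properties are provided")
    if key in ingredient_values:
        return ingredient_values[key]  # value is defined, nothing to impute
    if not impute_properties:
        return None  # missing value and no imputation: row is not featurized
    if default_properties and key in default_properties:
        return default_properties[key]  # a specified default takes precedence
    # Fall back to the average over the entire dataset (assumed non-empty here).
    return sum(dataset_values) / len(dataset_values)


known_densities = [1.0, 3.0]  # illustrative values observed elsewhere in the dataset

defined = resolve_property("density", {"density": 1.2}, known_densities, False)
skipped = resolve_property("density", {}, known_densities, False)
averaged = resolve_property("density", {}, known_densities, True)
defaulted = resolve_property("density", {}, known_densities, True, {"density": 0.9})
```

Here defined keeps the ingredient's own value, skipped is None, averaged falls back to the dataset mean, and defaulted uses the supplied default.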

access_control_dict() → dict: Return an access control entity representation of this resource. Internal use only.

classmethod build(data: dict) → Self: Build an instance of this object from given data.

dump() → dict: Dump this instance.

classmethod get_type(data) → Type[PredictorNode]: Return the subtype.

default_properties: Mapping[str, str | float] | None = None

description: str = None

impute_properties: bool = None

input_descriptor: FormulationDescriptor = None

label: str | None = None

name: str = None

p: float = None

properties: List[RealDescriptor | CategoricalDescriptor] = None

property training_data: [DEPRECATED] Retrieve training data associated with this node.

Deprecated since version 3.5.0: This will be removed in 4.0.0. Training data must be accessed through the top-level GraphPredictor.

typ = 'MeanProperty'