citrine.informatics.predictors.mean_property_predictor module

class citrine.informatics.predictors.mean_property_predictor.MeanPropertyPredictor(name: str, *, description: str, input_descriptor: FormulationDescriptor, properties: List[RealDescriptor | CategoricalDescriptor], p: float, impute_properties: bool, label: str | None = None, default_properties: Mapping[str, str | float] | None = None, training_data: List[DataSource] | None = None)

Bases: Resource[MeanPropertyPredictor], PredictorNode

A predictor that computes a component-weighted mean of real or categorical properties.

Each component in a formulation contributes to the mean property with a weight equal to its quantity raised to the power p. For real-valued properties, the property values of the components are averaged using these weights to yield the component-weighted mean property. For categorical-valued properties, the weights are accumulated per category to yield a distribution over property values in the formulation.
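The weighting described above can be illustrated with a minimal sketch in plain Python. It is one reading of the description, not the library's implementation: the helper names (component_weights, mean_real_property, categorical_distribution) are hypothetical, the normalization of weights to sum to 1 is an assumption, and quantities are assumed to be strictly positive so that negative powers are well defined.

```python
from collections import defaultdict
from typing import Dict, Mapping


def component_weights(quantities: Mapping[str, float], p: float) -> Dict[str, float]:
    """Return weights proportional to quantity ** p, normalized to sum to 1.

    Assumption: normalization is not stated in the reference text; it is
    used here so that the result is a mean rather than a raw sum.
    """
    raw = {name: quantity ** p for name, quantity in quantities.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def mean_real_property(
    quantities: Mapping[str, float],
    values: Mapping[str, float],
    p: float,
) -> float:
    """Weighted average of a real-valued property over all components."""
    weights = component_weights(quantities, p)
    return sum(weights[name] * values[name] for name in weights)


def categorical_distribution(
    quantities: Mapping[str, float],
    categories: Mapping[str, str],
    p: float,
) -> Dict[str, float]:
    """Accumulate component weights per category to form a distribution."""
    weights = component_weights(quantities, p)
    distribution: Dict[str, float] = defaultdict(float)
    for name, weight in weights.items():
        distribution[categories[name]] += weight
    return dict(distribution)


# Hypothetical two-component formulation with made-up property values.
quantities = {"water": 0.75, "salt": 0.25}
density = {"water": 1.0, "salt": 2.0}
print(mean_real_property(quantities, density, p=1.0))
```

With p = 1 the weights reduce to the component quantities themselves; larger values of p shift the weight toward the components present in the largest amounts.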

Parameters:
  • name (str) – Name of the configuration

  • description (str) – Description of the predictor

  • input_descriptor (FormulationDescriptor) – Descriptor that represents the input formulation

  • properties (List[Union[RealDescriptor, CategoricalDescriptor]]) – List of real or categorical descriptors to featurize

  • p (float) – Power of the component-weighted mean. Positive, negative, and fractional powers are supported.

  • impute_properties (bool) – Whether to impute missing ingredient properties. If False, all ingredients must define values for all featurized properties; otherwise, the row will not be featurized. If True and no default_properties are specified, the average over the entire dataset is used. If True and a default is specified in default_properties, the specified default is used in place of missing values.

  • label (Optional[str]) – Only ingredients with this label are counted when calculating the component-weighted mean. If None (default), all ingredients are counted.

  • default_properties (Optional[Mapping[str, Union[str, float]]]) – Default values to use for imputed properties. Defaults are specified as a map from descriptor key to its default value. If not specified and impute_properties == True, the average over the entire dataset is used to fill in missing values. Any specified defaults are used in place of the average over the dataset. impute_properties must be True if default_properties are provided.

  • training_data (Optional[List[DataSource]]) – Sources of training data. Each can be either a CSV or a GEM Table. Candidates from multiple data sources are combined into a flattened list and de-duplicated by uid and identifiers. De-duplication is performed if a uid or identifier is shared between two or more rows. A de-duplicated row contains the union of data across all rows that share the same uid or at least one identifier. Training data is unnecessary if the predictor is part of a graph that includes all training data required by this predictor.
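The interaction between impute_properties and default_properties described in the parameter list can be summarized with a small sketch. It is illustrative only and does not call the library: resolve_value and dataset_means are hypothetical names, and returning None stands in for "the row will not be featurized".

```python
from typing import Mapping, Optional, Union

Value = Union[str, float]


def resolve_value(
    key: str,
    ingredient_values: Mapping[str, Value],
    *,
    impute_properties: bool,
    default_properties: Optional[Mapping[str, Value]] = None,
    dataset_means: Optional[Mapping[str, Value]] = None,
) -> Optional[Value]:
    """Pick the value used for one property of one ingredient.

    Returns None when the value is missing and cannot be imputed, which
    in this sketch represents a row that is not featurized.
    """
    # Documented constraint: defaults are only allowed when imputing.
    if default_properties and not impute_properties:
        raise ValueError(
            "impute_properties must be True if default_properties are provided"
        )
    # A value defined on the ingredient is always used as-is.
    if key in ingredient_values:
        return ingredient_values[key]
    # Missing value and no imputation: the row cannot be featurized.
    if not impute_properties:
        return None
    # A specified default takes precedence over the dataset average.
    if default_properties and key in default_properties:
        return default_properties[key]
    # Otherwise fall back to the average over the entire dataset
    # (assumed here to be precomputed and passed in as dataset_means).
    return (dataset_means or {}).get(key)


# Hypothetical descriptor key and values.
print(resolve_value(
    "density",
    {},
    impute_properties=True,
    default_properties={"density": 2.0},
    dataset_means={"density": 1.5},
))
```

The order of the checks mirrors the documented precedence: a defined value first, then a specified default, then the dataset average.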

access_control_dict() → dict

Return an access control entity representation of this resource. Internal use only.

classmethod build(data: dict) → Self

Build an instance of this object from given data.

dump() → dict

Dump this instance.

classmethod get_type(data) → Type[PredictorNode]

Return the subtype.

default_properties: Mapping[str, str | float] | None = None
description: str = None
impute_properties: bool = None
input_descriptor: FormulationDescriptor = None
label: str | None = None
name: str = None
p: float = None
properties: List[RealDescriptor | CategoricalDescriptor] = None
training_data: List[DataSource] = []
typ = 'MeanProperty'