citrine.informatics.predictors.molecular_structure_featurizer module

class citrine.informatics.predictors.molecular_structure_featurizer.MolecularStructureFeaturizer(name: str, *, description: str, input_descriptor: MolecularStructureDescriptor, features: List[str] | None = None, excludes: List[str] | None = None)

Bases: Resource[MolecularStructureFeaturizer], PredictorNode

A featurizer for molecular structures, powered by CDK.

The MolecularStructureFeaturizer computes a configurable set of features on molecular structure data, e.g., SMILES or InChI strings, using the Chemistry Development Kit (CDK). The set of features is configured with the features and excludes arguments, which accept either feature names or predefined aliases.

The default is the standard alias, which corresponds to eight features that offer a good balance of cost and performance.

The extended alias includes the standard set plus additional features. These may improve model performance, but they are slower to compute and may dilute the signal in the features.

Parameters:
  • name (str) – name of the configuration

  • description (str) – the description of the predictor

  • input_descriptor (MolecularStructureDescriptor) – the descriptor to featurize

  • features (List[str]) – the list of features to compute, either by name or by group alias. If omitted, the standard alias is used.

  • excludes (List[str]) – list of features to exclude (accepts same set of values as features). The final set of outputs generated by the predictor is set(features) - set(excludes).
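
The set(features) - set(excludes) rule can be illustrated with a minimal, self-contained sketch. It does not call the citrine library. The alias table and the feature names in it (feat_a, feat_b, and so on) are hypothetical placeholders, not the real CDK feature names, and the assumption that aliases are expanded before the subtraction is an illustration of the documented behavior rather than the library's actual implementation.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical alias table. The real alias contents are defined by the
# platform and are NOT reproduced here; these names are placeholders.
ALIASES: Dict[str, List[str]] = {
    "standard": ["feat_a", "feat_b", "feat_c"],
    "extended": ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"],
}


def expand(names: Iterable[str]) -> Set[str]:
    """Expand aliases into individual feature names; keep other names as-is."""
    expanded: Set[str] = set()
    for name in names:
        expanded.update(ALIASES.get(name, [name]))
    return expanded


def resolve_outputs(
    features: Optional[List[str]] = None,
    excludes: Optional[List[str]] = None,
) -> Set[str]:
    """Return set(features) - set(excludes) after alias expansion.

    Assumption: features=None falls back to the "standard" alias and
    excludes=None excludes nothing.
    """
    selected = expand(features if features is not None else ["standard"])
    removed = expand(excludes if excludes is not None else [])
    return selected - removed


# With no arguments, only the (placeholder) standard set is produced.
default_outputs = resolve_outputs()

# Excluding an alias removes every feature that alias stands for.
extended_only = resolve_outputs(features=["extended"], excludes=["standard"])

# Names and aliases can be mixed in either argument.
mixed = resolve_outputs(features=["standard", "feat_x"], excludes=["feat_a"])
```

Because both arguments accept the same values, excluding an alias such as standard from extended leaves only the features that are unique to the larger set.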

access_control_dict() → dict

Return an access control entity representation of this resource. Internal use only.

classmethod build(data: dict) → Self

Build an instance of this object from given data.

dump() → dict

Dump this instance.

classmethod get_type(data) → Type[PredictorNode]

Return the subtype.

description: str = None
excludes = None
features = None
input_descriptor = None
name: str = None
typ = 'MoleculeFeaturizer'