citrine.informatics.predictors.molecular_structure_featurizer module
- class citrine.informatics.predictors.molecular_structure_featurizer.MolecularStructureFeaturizer(name: str, *, description: str, input_descriptor: MolecularStructureDescriptor, features: List[str] | None = None, excludes: List[str] | None = None)
Bases: Resource[MolecularStructureFeaturizer], PredictorNode
A featurizer for molecular structures, powered by CDK.
The MolecularStructureFeaturizer will compute a configurable set of features on molecular structure data, e.g., SMILES or InChI strings. The features are computed using the Chemistry Development Kit (CDK). The features are configured using the features and excludes arguments, which accept either feature names or predefined aliases. The default is the standard alias, corresponding to eight features that are a good balance of cost and performance.
The extended alias includes more features that may improve model performance but are slower to compute and may dilute the signal in the features. It includes the standard set plus additional features.
- Parameters:
name (str) – name of the configuration
description (str) – the description of the predictor
input_descriptor (MolecularStructureDescriptor) – the descriptor to featurize
features (List[str]) – the list of features to compute, either by name or by group alias.
excludes (List[str]) – list of features to exclude (accepts same set of values as features). The final set of outputs generated by the predictor is set(features) - set(excludes).
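The set(features) - set(excludes) rule can be illustrated with a minimal, library-independent sketch. It does not use the citrine package: the alias table and the feature names (feature_a through feature_e) are hypothetical placeholders, and the helpers expand and resolve_outputs are illustrative names rather than part of the API. The sketch assumes that aliases are expanded into individual feature names before the set difference is taken, and that omitting features falls back to the standard alias.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical alias table. The real alias contents are defined by the
# library and are not reproduced here; these names are placeholders.
ALIASES: Dict[str, List[str]] = {
    "standard": ["feature_a", "feature_b", "feature_c"],
    "extended": ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e"],
}


def expand(names: Optional[Iterable[str]]) -> Set[str]:
    """Expand aliases into feature names; other names pass through unchanged."""
    expanded: Set[str] = set()
    for name in names or []:
        expanded.update(ALIASES.get(name, [name]))
    return expanded


def resolve_outputs(
    features: Optional[List[str]] = None,
    excludes: Optional[List[str]] = None,
) -> Set[str]:
    """Return the final feature set as set(features) - set(excludes)."""
    # Assumption: features=None means the default "standard" alias.
    selected = expand(features if features is not None else ["standard"])
    return selected - expand(excludes)


if __name__ == "__main__":
    # Exclude a single named feature from the default set.
    print(sorted(resolve_outputs(excludes=["feature_b"])))
    # Keep only what "extended" adds on top of "standard".
    print(sorted(resolve_outputs(["extended"], ["standard"])))
```

Because excludes accepts the same values as features, an alias can be subtracted as a whole, as in the second call.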
- access_control_dict() dict
Return an access control entity representation of this resource. Internal use only.
- classmethod build(data: dict) Self
Build an instance of this object from given data.
- dump() dict
Dump this instance.
- classmethod get_type(data) Type[PredictorNode]
Return the subtype.
- description: str = None
- excludes = None
- features = None
- input_descriptor = None
- name: str = None
- typ = 'MoleculeFeaturizer'