4.10. [ALPHA] Generative Design Execution

The Citrine Platform offers a Generative Design Execution tool that creates new molecules by applying mutations to a set of seed molecules. To use this feature, provide the seed molecules and the filtering parameters through the GenerativeDesignInput class.

The class requires you to define the seed molecules for generating mutations (seeds), the fingerprint type used to calculate the fingerprint similarity (fingerprint_type), the minimum fingerprint similarity between the seed and the mutated molecule (min_fingerprint_similarity), the number of initial mutations attempted per seed (mutation_per_seed), and the minimum substructure counts for each mutated molecule (min_substructure_counts).

Various fingerprint types are available on the Citrine Platform, including Atom Pairs (AP), Path-Length Connectivity (PHCO), Binary Path (BPF), Paths of Atoms of Heteroatoms (PATH), Extended Connectivity Fingerprint with radius 4 (ECFP4) and radius 6 (ECFP6), and Focused Connectivity Fingerprint with radius 4 (FCFP4) and radius 6 (FCFP6). Each fingerprint type captures different aspects of molecular structure and influences the generated mutations. You can access these fingerprint types through the FingerprintType enum, like FingerprintType.ECFP4.
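
To make the idea of fingerprint similarity concrete, the sketch below compares two fingerprints, each represented as a set of feature identifiers, using the Tanimoto (Jaccard) coefficient. This is an assumption for illustration only: the text above does not state which similarity metric the platform uses, and the feature sets are made-up placeholders rather than real ECFP4 output.

```python
def tanimoto_similarity(features_a: set, features_b: set) -> float:
    """Size of the intersection divided by the size of the union of two feature sets."""
    if not features_a and not features_b:
        # Convention chosen for this sketch: two empty fingerprints are identical
        return 1.0
    shared = len(features_a & features_b)
    total = len(features_a | features_b)
    return shared / total


# Hypothetical feature identifiers, not real fingerprint bits
seed_features = {101, 205, 318, 402}
mutated_features = {101, 205, 777}

similarity = tanimoto_similarity(seed_features, mutated_features)
print(f"similarity = {similarity:.2f}")
```

Under this assumed metric, a value such as min_fingerprint_similarity=0.1 would admit mutated molecules that share only a small fraction of their features with the seed.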

The structure_exclusions parameter controls the structural features of mutated molecules. It is a sequence of exclusion types, each naming a structural feature or element to exclude from the list of possible mutation steps during the generative design process. If a type is present in the sequence, the generated mutation steps will avoid that feature or element. The available options are defined in the StructureExclusion class, for example StructureExclusion.BROMINE or StructureExclusion.TRIPLE_BONDS.

The min_substructure_counts parameter is a dictionary that constrains which substructures must appear in each mutated molecule. Each key is a substructure written as a SMARTS string (not a SMILES string), and each value is an integer giving the minimum number of times that substructure must appear for a mutation to be considered valid. SMARTS is required here because it is a pattern language designed to identify molecular substructures precisely, whereas SMILES is intended to describe whole molecules.
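
The constraint described above can be sketched as a simple comparison between required and observed counts. The helper below is hypothetical and not part of the citrine package. The SMARTS matching itself is not performed here, so the observed counts are hand-written stand-ins for what a substructure matcher might report.

```python
def meets_min_substructure_counts(observed_counts: dict, min_substructure_counts: dict) -> bool:
    """Return True if every required SMARTS pattern occurs at least the required number of times."""
    return all(
        observed_counts.get(smarts, 0) >= minimum
        for smarts, minimum in min_substructure_counts.items()
    )


carboxyl = "[#8X2H]-[#6](=[#8])"
min_substructure_counts = {carboxyl: 1}  # at least one carboxyl group

# Hypothetical match counts for two candidate molecules
acid_counts = {carboxyl: 1}   # e.g. a fatty acid with one carboxyl group
alkane_counts = {}            # e.g. a plain alkane with no carboxyl group

acid_ok = meets_min_substructure_counts(acid_counts, min_substructure_counts)
alkane_ok = meets_min_substructure_counts(alkane_counts, min_substructure_counts)
print(acid_ok, alkane_ok)
```

A pattern that is absent from the observed counts is treated as occurring zero times, so any positive minimum rejects the molecule.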

After the generative design process is complete, the mutations are filtered based on their similarity to the starting seed molecules. Mutations that do not meet the similarity threshold, or that are duplicates, are discarded. The remaining mutations are returned as a list of GenerativeDesignResult objects. Each result contains the seed molecule, the mutation, the similarity score, and the fingerprint type used during execution.
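
The filtering step can be sketched as follows. This is a minimal illustration rather than the platform's implementation: the Candidate type is hypothetical, the similarity scores are invented, duplicates are detected by exact string comparison, and the threshold is assumed to be inclusive.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for one generated mutation and its similarity score."""
    seed: str
    mutated: str
    similarity: float


def filter_candidates(candidates: List[Candidate], min_similarity: float) -> List[Candidate]:
    """Keep the first occurrence of each mutated structure that meets the threshold."""
    seen = set()
    kept = []
    for candidate in candidates:
        if candidate.similarity < min_similarity:
            continue  # below the similarity threshold
        if candidate.mutated in seen:
            continue  # duplicate of a mutation already kept
        seen.add(candidate.mutated)
        kept.append(candidate)
    return kept


candidates = [
    Candidate("CCO", "CCCO", 0.60),
    Candidate("CCO", "CCN", 0.05),     # discarded: below threshold
    Candidate("CCO", "CCCO", 0.60),    # discarded: duplicate
    Candidate("CCO", "CC(C)O", 0.45),
]
kept = filter_candidates(candidates, min_similarity=0.1)
print([c.mutated for c in kept])
```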

After triggering the execution and waiting for it to complete, you can retrieve the results and use them in your work. The following example demonstrates how to run a generative design execution on the Citrine Platform:

from citrine.jobs.waiting import wait_while_executing
from citrine.informatics.generative_design import GenerativeDesignInput, FingerprintType, StructureExclusion

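# Trigger a new generative design execution
# (`project` is assumed to be an existing Project object retrieved from the platform)
generative_design_input = GenerativeDesignInput(
    # stearic acid and oleic acid; the backslash in the SMILES string is escaped for Python
    seeds=["CCCCCCCCCCCCCCCCCC(=O)O", "CCCCCCCC\\C=C/CCCCCCCC(O)=O"],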
    fingerprint_type=FingerprintType.ECFP4,
    min_fingerprint_similarity=0.1,
    mutation_per_seed=1000,
    structure_exclusions=[
        StructureExclusion.BROMINE,
        StructureExclusion.CHLORINE,
        StructureExclusion.TRIPLE_BONDS,
        StructureExclusion.IONS,
    ],
    min_substructure_counts={"[#8X2H]-[#6](=[#8])": 1}  # enforce that a carboxyl group must be present
)
generative_design_execution = project.generative_design_executions.trigger(
    generative_design_input
)
execution = wait_while_executing(
    collection=project.generative_design_executions,
    execution=generative_design_execution,
    print_status_info=True,
)
generated = execution.results()
mutations = [(gen.seed, gen.mutated) for gen in generated]

# Or get a completed execution by ID
execution_id = execution.uid
execution = project.generative_design_executions.get(execution_id)
generated = execution.results()
mutations = [(gen.seed, gen.mutated) for gen in generated]
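
Once the results are retrieved, a common next step is to organize them by seed. The sketch below groups results per seed and orders each group from most to least similar. ResultStandIn is a hypothetical stand-in for GenerativeDesignResult so that the example runs without platform access. The seed and mutated attributes match the example above, while the attribute name fingerprint_similarity is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ResultStandIn:
    """Hypothetical stand-in for a GenerativeDesignResult."""
    seed: str
    mutated: str
    fingerprint_similarity: float  # assumed attribute name


def group_by_seed(results: Iterable[ResultStandIn]) -> Dict[str, List[ResultStandIn]]:
    """Group results by seed, sorting each group by descending similarity."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.seed].append(result)
    for seed_results in grouped.values():
        seed_results.sort(key=lambda r: r.fingerprint_similarity, reverse=True)
    return dict(grouped)


# Invented results for illustration
generated = [
    ResultStandIn("CCO", "CCCO", 0.45),
    ResultStandIn("CCN", "CCCN", 0.50),
    ResultStandIn("CCO", "CC(C)O", 0.60),
]
by_seed = group_by_seed(generated)
for seed, seed_results in by_seed.items():
    print(seed, [r.mutated for r in seed_results])
```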