4.9. Predictor Reports

Training a predictor generally produces a set of interconnected models. A predictor report describes those models, for example their settings and which features are important to each model. It does not include predictor evaluation metrics; to learn more about those, please see PredictorEvaluationMetrics. The report can be accessed via predictor.report.

A task to generate a predictor report is scheduled when a predictor is registered. The report has status and json member variables. The status can be one of the following (see the sketch after this list):

  • PENDING The report has been scheduled but is not yet complete.

  • ERROR An error was thrown while generating the report.

  • OK The report was generated successfully and the results are ready.
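
For example, the status can be checked before attempting to read any results. The snippet below is a minimal sketch; it assumes a predictor that has already been registered with a project, as in the full example at the end of this section.

report = predictor.report
if report.status == 'OK':
    print('report is ready')
elif report.status == 'ERROR':
    print('report generation failed')
else:  # status is 'PENDING'
    print('report is still being generated')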

Results of the report are stored in the descriptors and model_summaries attributes. descriptors is a list of Descriptor objects that may be inputs or outputs of models in the predictor.
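
For instance, the key of every descriptor known to the report can be listed as follows (a minimal sketch, assuming the report has finished generating):

report = predictor.report  # assumes a registered predictor with a completed report
for descriptor in report.descriptors:
    print(descriptor.key)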

model_summaries is a list of ModelSummary objects, each one corresponding to a single model in the predictor. Each ModelSummary includes the name of the model, a list of input descriptors, a list of output descriptors, the model’s settings, and its feature importances. model_settings is a dictionary of settings and values, the details of which depend on the type of model. One possible model settings dictionary is shown below:

{
    'Algorithm': 'Ensemble of non-linear estimators',
    'Number of estimators': 64,
    'Use jackknife': True
}
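
In code, each model's name and settings could be inspected along these lines (a sketch that uses only the attributes described above):

report = predictor.report  # assumes a completed report
for summary in report.model_summaries:
    print(summary.name)
    for setting, value in summary.model_settings.items():
        print('  {}: {}'.format(setting, value))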

feature_importances is a list of FeatureImportanceReport objects, each of which corresponds to a single output of the model. It has fields output_key, which is the key of the output descriptor, and importances, which is a dictionary from input keys to their importance. The input and output keys correspond to descriptors that can be found in the predictor report’s descriptors field. An example is shown below:

{
    'output_key': 'shear modulus',
    'importances': {
        "Young's modulus": 0.85,
        "Poisson's ratio": 0.15
    }
}
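
Programmatically, the same information can be read from each model summary (a sketch that uses the output_key and importances fields described above):

report = predictor.report  # assumes a completed report
for summary in report.model_summaries:
    for importance_report in summary.feature_importances:
        print('importances for {}:'.format(importance_report.output_key))
        for input_key, importance in importance_report.importances.items():
            print('  {}: {}'.format(input_key, importance))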

For simple models, such as those that featurize inputs, the model_settings and feature_importances fields might be empty.

As an example, consider an AutoMLPredictor with numeric inputs x and y and numeric output z. This predictor produces a single model that predicts z from x and y. In cases involving multiple ML predictors and/or input featurization, more models are produced. The code below shows how to create the predictor, register it, and view the report. Assume that there is a training data table with a known id and version.

from time import sleep

from citrine.informatics.predictors import GraphPredictor, AutoMLPredictor
from citrine.informatics.descriptors import RealDescriptor
from citrine.informatics.data_sources import GemTableDataSource

# create input descriptors
x = RealDescriptor(key='x', lower_bound=0, upper_bound=10, units='')
y = RealDescriptor(key='y', lower_bound=0, upper_bound=10, units='')
z = RealDescriptor(key='z', lower_bound=0, upper_bound=10, units='')

# create ML predictor
# (training_table_id and training_table_version are assumed to be known)
auto_ml_predictor = AutoMLPredictor(
    name='ML predictor for z',
    description='Predicts z from x and y',
    inputs=[x, y],
    outputs=[z],
    training_data=[GemTableDataSource(
        table_id=training_table_id,
        table_version=training_table_version
    )]
)

# register a predictor with a project
predictor = project.predictors.register(
    GraphPredictor(
        name='ML predictor for z',
        description='Predicts z from x and y',
        predictors=[auto_ml_predictor]
    )
)

# wait for the predictor report to be ready
while project.predictors.get(predictor.uid).report.status == 'PENDING':
    sleep(10)

# print the json report
report = project.predictors.get(predictor.uid).report
print(report.json)
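
Once the status is OK, the report's contents can also be inspected through the attributes described earlier rather than the raw JSON. A minimal sketch, continuing the example above:

# inspect the completed report
if report.status == 'OK':
    # this predictor produces a single model that predicts z from x and y
    summary = report.model_summaries[0]
    print(summary.name)
    print(summary.model_settings)
    for importance_report in summary.feature_importances:
        print(importance_report.output_key, importance_report.importances)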