citrine.gemtables.columns module

Column definitions for GEM Tables.

class citrine.gemtables.columns.ChemicalDisplayFormat(value)

Bases: BaseEnumeration

Format to use when rendering a molecular structure.

  • SMILES: Simplified molecular-input line-entry system

  • INCHI: International Chemical Identifier

INCHI = 'inchi'
SMILES = 'smiles'
class citrine.gemtables.columns.Column

Bases: PolymorphicSerializable[Column]

A column in the GEM Table, defined by an operation applied to a variable.

This is an abstract type: given a serialized dict, it returns the appropriate concrete column type.

classmethod build(data: dict) SelfType

Build the underlying type.

classmethod get_type(data) Type[Serializable]

Return the subtype.
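The polymorphic deserialization described for Column can be pictured as a registry keyed on each subclass's `typ` tag. The sketch below is a minimal, self-contained illustration, not the SDK's implementation: the `"type"` key, the `_REGISTRY` dict, and the `Sketch*` classes are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, Type


# Hypothetical stand-ins for two concrete column classes; the real classes
# live in citrine.gemtables.columns and are not imported here.
@dataclass
class SketchMeanColumn:
    data_source: str
    typ: str = "mean_column"


@dataclass
class SketchIdentityColumn:
    data_source: str
    typ: str = "identity_column"


# Registry mapping each serialized type tag to its concrete class.
_REGISTRY: Dict[str, Type] = {
    "mean_column": SketchMeanColumn,
    "identity_column": SketchIdentityColumn,
}


def get_type(data: dict) -> Type:
    """Return the concrete class for a serialized column (assumes a "type" key)."""
    try:
        return _REGISTRY[data["type"]]
    except KeyError as exc:
        raise ValueError(f"Unknown column type: {data.get('type')!r}") from exc


def build(data: dict):
    """Resolve the subtype, then construct it from the remaining fields."""
    cls = get_type(data)
    fields = {k: v for k, v in data.items() if k != "type"}
    return cls(**fields)
```

For example, `build({"type": "mean_column", "data_source": "density"})` yields a `SketchMeanColumn` whose `data_source` is `"density"`. Keeping the tag-to-class mapping in one place means callers never need to know which subclass they are deserializing.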

class citrine.gemtables.columns.ComponentQuantityColumn(*, data_source: str | Variable, component_name: str, normalize: bool = False)

Bases: Serializable[ComponentQuantityColumn], Column

Column that extracts the quantity of a given component.

If the component is not present in the composition, then the value in the column will be 0.0.

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • component_name (str) – name of the component from which to extract the quantity

  • normalize (bool) – whether to normalize the quantity by the sum of all component amounts. Defaults to False.
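The value this column holds can be sketched in plain Python, assuming a composition is modeled as a dict mapping component names to amounts. The helper `component_quantity` is hypothetical and only mirrors the documented behavior: 0.0 for a missing component, and division by the sum of all amounts when `normalize` is set. How the service treats an all-zero composition is not documented; returning 0.0 there is an assumption.

```python
from typing import Dict


def component_quantity(composition: Dict[str, float], component_name: str,
                       normalize: bool = False) -> float:
    """Quantity of one component; 0.0 if the component is absent."""
    quantity = composition.get(component_name, 0.0)
    if normalize:
        total = sum(composition.values())
        # Assumption: an empty or all-zero composition yields 0.0 rather than an error.
        return quantity / total if total else 0.0
    return quantity
```

With `{"spam": 3.0, "eggs": 1.0}`, the raw quantity of `"spam"` is 3.0, its normalized quantity is 0.75, and `"ham"` gives 0.0.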

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

component_name = None
data_source = None
normalize = None
typ = 'component_quantity_column'
class citrine.gemtables.columns.CompositionSortOrder(value)

Bases: BaseEnumeration

Order to use when sorting the components in a composition.

  • ALPHABETICAL is alphanumeric order by component name

  • QUANTITY is ordered from the largest to smallest quantity, with ties broken alphabetically

ALPHABETICAL = 'alphabetical'
QUANTITY = 'quantity'
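The two orderings can be expressed as sort keys. This is a sketch under the assumption that plain Python string comparison approximates the service's alphanumeric order (it compares by code point, so case handling may differ); `sort_components` is a hypothetical helper that takes the enum's string values.

```python
from typing import Dict, List, Tuple


def sort_components(composition: Dict[str, float], order: str) -> List[Tuple[str, float]]:
    """Sort (name, quantity) pairs per the two documented orders."""
    items = list(composition.items())
    if order == "alphabetical":
        return sorted(items, key=lambda item: item[0])
    if order == "quantity":
        # Largest quantity first; equal quantities fall back to name order.
        return sorted(items, key=lambda item: (-item[1], item[0]))
    raise ValueError(f"Unknown sort order: {order!r}")
```

Negating the quantity in a tuple key gives descending quantity with ascending-name tie breaks in a single stable sort.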
class citrine.gemtables.columns.ConcatColumn(*, data_source: str | Variable, subcolumn: Column)

Bases: Serializable[ConcatColumn], Column

Column that concatenates multiple values produced by a list- or set-valued variable.

The subcolumn need not appear elsewhere in the table config, and its parameters have no effect on how the table is built; only its column type matters. Requiring a complete Column object is simply a limitation of the current API.

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • subcolumn (Column) – a column whose type matches the individual values to be concatenated

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
subcolumn = None
typ = 'concat_column'
class citrine.gemtables.columns.FlatCompositionColumn(*, data_source: str | Variable, sort_order: CompositionSortOrder)

Bases: Serializable[FlatCompositionColumn], Column

Column that flattens the composition into a string of names and quantities.

The numeric formatting aims to be human readable. For example, if all of the quantities are round numbers, as in {"spam": 4.0, "eggs": 1.0}, then the result omits the decimal points: "(spam)4(eggs)1" (when sort_order is QUANTITY).

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • sort_order (CompositionSortOrder) – order with which to sort the components when generating the flat string
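A rough sketch of the flattening, reproducing the documented example. Only the round-number case is specified above; rendering other quantities with Python's `repr` is an assumption, and `flatten_composition` and `format_quantity` are hypothetical helpers rather than SDK functions.

```python
from typing import Dict


def format_quantity(value: float) -> str:
    # Round numbers drop the decimal point (4.0 -> "4"); others use Python's repr (assumption).
    return str(int(value)) if float(value).is_integer() else repr(value)


def flatten_composition(composition: Dict[str, float], by_quantity: bool = True) -> str:
    if by_quantity:
        # Largest first, ties broken by name (mirrors CompositionSortOrder.QUANTITY).
        items = sorted(composition.items(), key=lambda kv: (-kv[1], kv[0]))
    else:
        items = sorted(composition.items())
    return "".join(f"({name}){format_quantity(qty)}" for name, qty in items)
```

`flatten_composition({"spam": 4.0, "eggs": 1.0})` returns `"(spam)4(eggs)1"`, matching the example in the text; with `by_quantity=False` the order becomes `"(eggs)1(spam)4"`.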

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
sort_order = None
typ = 'flat_composition_column'
class citrine.gemtables.columns.IdentityColumn(*, data_source: str | Variable)

Bases: Serializable[IdentityColumn], Column

Column containing the value of a string-valued variable.

Parameters:

data_source (Union[str, Variable]) – name of the variable to use when populating the column

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
typ = 'identity_column'
class citrine.gemtables.columns.MeanColumn(*, data_source: str | Variable, target_units: str | None = None)

Bases: Serializable[MeanColumn], Column

Column containing the mean of a real-valued variable.

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • target_units (Optional[str]) –

    units to convert the real variable into. If not specified:

    1. If there is an OriginalUnitsColumn for that source, no conversion will be made.

    2. Otherwise, the real variable will be converted using the default_units from the associated template.
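The fallback rules for target_units form a short decision procedure. The hypothetical helper below restates them, assuming the caller already knows whether an OriginalUnitsColumn exists for the same source and what the template's default_units are; `None` stands for "no conversion".

```python
from typing import Optional


def resolve_units(target_units: Optional[str],
                  has_original_units_column: bool,
                  template_default_units: str) -> Optional[str]:
    """Return the units to convert into, or None for "no conversion"."""
    if target_units is not None:
        return target_units               # explicit target_units always wins
    if has_original_units_column:
        return None                       # rule 1: keep values in their original units
    return template_default_units         # rule 2: fall back to the template's default_units
```

The same rules apply wherever target_units is accepted in this module.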

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
target_units = None
typ = 'mean_column'
class citrine.gemtables.columns.MolecularStructureColumn(*, data_source: str | Variable, format: ChemicalDisplayFormat)

Bases: Serializable[MolecularStructureColumn], Column

Column containing a representation of a molecular structure.

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • format (ChemicalDisplayFormat) – the format in which to display the molecular structure

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
format = None
typ = 'molecular_structure_column'
class citrine.gemtables.columns.MostLikelyCategoryColumn(*, data_source: str | Variable)

Bases: Serializable[MostLikelyCategoryColumn], Column

Column containing the most likely category.

Parameters:

data_source (Union[str, Variable]) – name of the variable to use when populating the column

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
typ = 'most_likely_category_column'
class citrine.gemtables.columns.MostLikelyProbabilityColumn(*, data_source: str | Variable)

Bases: Serializable[MostLikelyProbabilityColumn], Column

Column containing the probability of the most likely category.

Parameters:

data_source (Union[str, Variable]) – name of the variable to use when populating the column
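MostLikelyCategoryColumn and MostLikelyProbabilityColumn both read the peak of a categorical distribution. A minimal sketch, assuming the distribution is modeled as a dict from category name to probability; `most_likely` is hypothetical, and resolving ties to the alphabetically first category is an assumption the documentation does not state.

```python
from typing import Dict, Tuple


def most_likely(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return (category, probability) for the highest-probability category."""
    # Assumption: ties resolve to the alphabetically first category.
    category = min(probabilities, key=lambda name: (-probabilities[name], name))
    return category, probabilities[category]
```

For `{"FCC": 0.7, "BCC": 0.2, "HCP": 0.1}`, the category column would hold `"FCC"` and the probability column `0.7`.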

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
typ = 'most_likely_probability_column'
class citrine.gemtables.columns.NthBiggestComponentNameColumn(*, data_source: str | Variable, n: int)

Bases: Serializable[NthBiggestComponentNameColumn], Column

Column containing the name of the Nth biggest component.

If there are fewer than N components in the composition, then this column will be empty.

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • n (int) – index of the component name to extract, starting with 1 for the biggest
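The 1-based ranking can be sketched as follows, modeling the composition as a dict of name to amount and an empty cell as `None`. The helper `nth_biggest_name` is hypothetical, and breaking ties in quantity alphabetically is an assumption.

```python
from typing import Dict, Optional


def nth_biggest_name(composition: Dict[str, float], n: int) -> Optional[str]:
    """Name of the n-th biggest component (n = 1 is the biggest), or None if absent."""
    if n < 1:
        raise ValueError("n starts at 1")
    # Assumption: ties in quantity are broken alphabetically by name.
    ranked = sorted(composition, key=lambda name: (-composition[name], name))
    return ranked[n - 1] if n <= len(ranked) else None
```

With `{"Fe": 0.7, "Cr": 0.2, "Ni": 0.1}`, `n=1` gives `"Fe"`, `n=2` gives `"Cr"`, and `n=4` gives `None` because there are only three components.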

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
n = None
typ = 'biggest_component_name_column'
class citrine.gemtables.columns.NthBiggestComponentQuantityColumn(*, data_source: str | Variable, n: int, normalize: bool = False)

Bases: Serializable[NthBiggestComponentQuantityColumn], Column

Column containing the quantity of the Nth biggest component.

If there are fewer than N components in the composition, then this column will be empty.

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • n (int) – index of the component quantity to extract, starting with 1 for the biggest

  • normalize (bool) – whether to normalize the quantity by the sum of all component amounts. Defaults to False.
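Combining the ranking with optional normalization gives a sketch of this column's value. As before, the composition is assumed to be a dict of name to amount, `None` stands for an empty cell, and `nth_biggest_quantity` is a hypothetical helper; the all-zero case returning 0.0 is an assumption.

```python
from typing import Dict, Optional


def nth_biggest_quantity(composition: Dict[str, float], n: int,
                         normalize: bool = False) -> Optional[float]:
    """Quantity of the n-th biggest component (n = 1 is the biggest), or None if absent."""
    if n < 1:
        raise ValueError("n starts at 1")
    quantities = sorted(composition.values(), reverse=True)
    if n > len(quantities):
        return None                      # fewer than n components: empty cell
    value = quantities[n - 1]
    if normalize:
        total = sum(quantities)
        # Assumption: an all-zero composition yields 0.0 rather than an error.
        value = value / total if total else 0.0
    return value
```

For `{"spam": 3.0, "eggs": 1.0}`, `n=1` gives 3.0, `n=2` with `normalize=True` gives 0.25, and `n=3` gives `None`.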

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
n = None
normalize = None
typ = 'biggest_component_quantity_column'
class citrine.gemtables.columns.OriginalUnitsColumn(*, data_source: str | Variable)

Bases: Serializable[OriginalUnitsColumn], Column

Column containing the units as entered in the source data.

Parameters:

data_source (Union[str, Variable]) – name of the variable to use when populating the column

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
typ = 'original_units_column'
class citrine.gemtables.columns.QuantileColumn(*, data_source: str | Variable, quantile: float, target_units: str | None = None)

Bases: Serializable[QuantileColumn], Column

Column containing a quantile of the variable.

The column is populated with the quantile function of the distribution evaluated at “quantile”. For example, for a uniform distribution parameterized by a lower and upper bound, the value in the column would be:

\[\mathrm{lower} + (\mathrm{upper} - \mathrm{lower}) \cdot \mathrm{quantile}\]

while for a normal distribution parameterized by a mean and stddev, the value would be:

\[\mathrm{mean} + \mathrm{stddev} \cdot \sqrt{2} \cdot \operatorname{erf}^{-1}(2 \cdot \mathrm{quantile} - 1)\]
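Both formulas can be checked numerically. The sketch below is illustrative only; the helper names are hypothetical. Python's `math` module has no inverse error function, so the normal case uses `statistics.NormalDist.inv_cdf`, which computes the same quantity as mean + stddev · √2 · erf⁻¹(2 · quantile − 1).

```python
from statistics import NormalDist


def uniform_quantile(lower: float, upper: float, quantile: float) -> float:
    """lower + (upper - lower) * quantile, for 0.0 <= quantile <= 1.0."""
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0.0 and 1.0")
    return lower + (upper - lower) * quantile


def normal_quantile(mean: float, stddev: float, quantile: float) -> float:
    """mean + stddev * sqrt(2) * erfinv(2 * quantile - 1), via NormalDist.inv_cdf."""
    # inv_cdf raises StatisticsError at 0.0 and 1.0, where the exact value is -/+ infinity.
    return NormalDist(mu=mean, sigma=stddev).inv_cdf(quantile)
```

For a uniform distribution on [0, 10], quantile 0.25 gives 2.5; for a normal distribution, quantile 0.5 returns the mean, and quantile 0.975 of a standard normal is about 1.96.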
Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • quantile (float) – the quantile to use for the column, which must lie between 0.0 and 1.0

  • target_units (Optional[str]) –

    units to convert the real variable into. If not specified:

    1. If there is an OriginalUnitsColumn for that source, no conversion will be made.

    2. Otherwise, the real variable will be converted using the default_units from the associated template.

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
quantile = None
target_units = None
typ = 'quantile_column'
class citrine.gemtables.columns.StdDevColumn(*, data_source: str | Variable, target_units: str | None = None)

Bases: Serializable[StdDevColumn], Column

Column containing the standard deviation of a real-valued variable.

Parameters:
  • data_source (Union[str, Variable]) – name of the variable to use when populating the column

  • target_units (Optional[str]) –

    units to convert the real variable into. If not specified:

    1. If there is an OriginalUnitsColumn for that source, no conversion will be made.

    2. Otherwise, the real variable will be converted using the default_units from the associated template.

classmethod build(data: dict) Self

Build an instance of this object from given data.

dump() dict

Dump this instance.

classmethod get_type(data) Type[Serializable]

Return the subtype.

data_source = None
target_units = None
typ = 'std_dev_column'