2.2. Transformers

Transformers are operators that modify data while keeping the output datatype the same as the input datatype.

2.2.1. HeaderMap

The HeaderMap transformer is used to change the column headers of an input pandas DataFrame.
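
The effect of a header mapping can be sketched with plain pandas. This is not the HeaderMap API, whose arguments are not described here. It only illustrates the kind of change such a transformer makes. The column names and the header_map dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical input with terse column headers.
df = pd.DataFrame({"temp_c": [21.5, 22.0], "ts": ["2021-01-01", "2021-01-02"]})

# Hypothetical mapping from old header to new header.
header_map = {"temp_c": "temperature_celsius", "ts": "timestamp"}

# rename returns a new DataFrame; the values and dtypes are unchanged.
renamed = df.rename(columns=header_map)
print(list(renamed.columns))
```

The input and output are both DataFrames, which is what makes this a transformer-style operation.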

2.2.2. Normalizer

The Normalizer transformer is used to shift the mean of an input pandas DataFrame.
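
As a rough illustration of a mean shift, the pandas sketch below subtracts each column's mean so that every column is centered on zero. It is an assumption that this matches what Normalizer does. The actual transformer may shift to a different target or also rescale the data. The sample values are made up.

```python
import pandas as pd

# Hypothetical numeric input.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# df.mean() is a Series indexed by column name, so the subtraction
# is applied column by column.
centered = df - df.mean()
print(centered)
```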

2.2.3. Custom Transformers

You will often need to write your own custom transformer as a subclass of TransformOperator. Define a transform method that takes the input data, transforms it, and returns the result as the same datatype.

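Below is an example of a custom transformer built on the base TransformOperator. It adds one to every numeric column of a pandas DataFrame and returns the result as a new DataFrame, leaving the input unchanged. The pipe at the end assumes a MyCustomExtractor that is defined elsewhere: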

from copy import deepcopy

from piperoni.operators.transform_operator import TransformOperator
from piperoni.operators.pipe import Pipe
from pandas import DataFrame

class IncrementNumericByOneTransformer(TransformOperator):
    def transform(self, input_: DataFrame) -> DataFrame:
        df = deepcopy(input_)
        # Select only numeric columns; strings, datetimes, and booleans are left alone.
        numeric_cols = df.select_dtypes(include="number").columns
        df[numeric_cols] += 1
        return df

extractor_pipe = Pipe(
    [
        MyCustomExtractor(),
        IncrementNumericByOneTransformer(),
    ]
)
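
To check the transform logic without piperoni installed, the same idea can be written as a plain function and applied to a small DataFrame. This is a minimal sketch: the function name increment_numeric_by_one and the sample data are hypothetical, and numeric columns are picked with pandas' select_dtypes.

```python
import pandas as pd


def increment_numeric_by_one(input_: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of input_ with 1 added to every numeric column."""
    df = input_.copy()  # work on a copy so the caller's data is untouched
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] += 1
    return df


# Hypothetical sample with one text column and two numeric columns.
sample = pd.DataFrame({"name": ["a", "b"], "count": [1, 2], "score": [0.5, 1.5]})
result = increment_numeric_by_one(sample)
print(result)
```

The text column passes through unchanged, and the input DataFrame keeps its original values.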