2.2. Transformers
Transformers are operators that change data while keeping input/output datatypes the same.
2.2.1. HeaderMap
The HeaderMap transformer is used to change the column headers of an input pandas DataFrame.
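This section does not show HeaderMap's constructor arguments, so the sketch below illustrates only the underlying operation in plain pandas, not piperoni's API. The column names and the `header_map` dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical input with terse column headers.
df = pd.DataFrame({"id": [1, 2], "temp_c": [20.5, 21.0]})

# Hypothetical mapping from existing headers to the desired ones.
header_map = {"id": "sample_id", "temp_c": "temperature_celsius"}

# rename returns a new DataFrame; only the headers change, not the values.
renamed = df.rename(columns=header_map)
print(list(renamed.columns))
```

The input and output are both DataFrames of the same shape, which matches the definition of a transformer given above.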
2.2.2. Normalizer
The Normalizer transformer is used to shift the mean of an input pandas DataFrame.
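As a rough illustration of what shifting the mean can look like, the plain-pandas sketch below centers each column on zero. This is an assumption made for the example: the text does not say which target mean Normalizer uses or whether it also rescales the data, and the code does not call piperoni.

```python
import pandas as pd

# Hypothetical numeric input.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# df.mean() gives one mean per column; subtracting it aligns on column
# names, so every column ends up with a mean of zero.
centered = df - df.mean()
print(centered)
```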
2.2.3. Custom Transformers
You will often need to write your own custom transformer as a subclass of TransformOperator. Define a transform method that transforms the data and returns it as the same datatype it received.
Below is an example of a custom transformer built on the base TransformOperator. It copies the input pandas DataFrame, adds one to every non-object column, and returns the result as a DataFrame. The transformer is then placed in a Pipe after a custom extractor:
from copy import deepcopy

from pandas import DataFrame
from piperoni.operators.pipe import Pipe
from piperoni.operators.transform_operator import TransformOperator


class IncrementNumericByOneTransformer(TransformOperator):
    def transform(self, input_: DataFrame) -> DataFrame:
        # Work on a copy so the caller's DataFrame is not modified.
        df = deepcopy(input_)
        # Treat every column whose dtype is not object as numeric.
        numeric_cols = [col for col in df if df[col].dtype.kind != 'O']
        df[numeric_cols] += 1
        return df
# MyCustomExtractor is assumed to be a custom extractor defined elsewhere.
extractor_pipe = Pipe(
    [
        MyCustomExtractor(),
        IncrementNumericByOneTransformer(),
    ]
)
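To check the transform logic without building a pipe, the body of the method can be tried as a plain function. The sketch below copies that logic into a hypothetical helper, `increment_numeric_by_one`, and runs it on made-up sample data. It does not import piperoni.

```python
from copy import deepcopy

import pandas as pd


def increment_numeric_by_one(input_: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-alone copy of the transform body shown above."""
    # Copy first so the input DataFrame is left untouched.
    df = deepcopy(input_)
    # Every column whose dtype is not object is treated as numeric.
    numeric_cols = [col for col in df if df[col].dtype.kind != "O"]
    df[numeric_cols] += 1
    return df


# Made-up sample: one object column and two numeric columns.
sample = pd.DataFrame(
    {
        "name": pd.Series(["a", "b"], dtype=object),
        "count": [1, 2],
        "score": [0.5, 1.5],
    }
)
result = increment_numeric_by_one(sample)
print(result)
```

Because the check is based on `dtype.kind`, any non-object column counts as numeric here, including boolean or datetime columns. A stricter filter may be preferable if the input can contain those.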