1. Quickstart
1.1. Creating Operators
Below are a few Operators that can be run in a Pipe for an ETL project:
from piperoni.operators.transform_operator import TransformOperator
class StrToInt(TransformOperator):
    """Converts a str to an int. The type hints match the actual input and output types."""
    def transform(self, input_: str) -> int:
        return int(input_)
class IntToStr(TransformOperator):
    """Converts an int to a str. The type hints match the actual input and output types."""
    def transform(self, input_: int) -> str:
        return str(input_)
1.2. Creating and Running a Pipe
The Operators above are run through the pipe below:
from piperoni.operators.pipe import Pipe
# initialize operators
step1 = StrToInt()
step2 = IntToStr()
# initialize pipe
pipe = Pipe([step1, step2])
# run pipe with input; get final data
final_data = pipe("1")
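As a rough mental model, a pipe hands the output of each step to the next one. The sketch below uses plain Python only and is not piperoni's implementation; `run_steps` is a hypothetical helper introduced for illustration, and the built-in `int` and `str` stand in for the two operators.

```python
from typing import Any, Callable, Sequence


def run_steps(steps: Sequence[Callable[[Any], Any]], data: Any) -> Any:
    """Feed data through each step in order (hypothetical helper, not piperoni code)."""
    for step in steps:
        # The output of one step becomes the input of the next.
        data = step(data)
    return data


# "1" is converted to the int 1, then back to the str "1".
result = run_steps([int, str], "1")
```

Because each step consumes the previous step's output, the output type of one operator has to match the input type of the next.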
1.3. End To End CSV Manipulation Example
Take an example CSV that looks like the following:
| Color | RGBColor | RGBValue | 
|---|---|---|
| darkorchid | red | 153 | 
| darkorchid | green | 50 | 
| darkorchid | blue | 204 | 
The CSV is saved in the root directory as “colors.csv”. The following walkthrough shows how to extract the CSV and manipulate it so that each color is a single row, ready to be saved to a new file.
from piperoni.operators.extract.extract_file.csv_ import CSVExtractor
from piperoni.operators.transform_operator import TransformOperator
from piperoni.operators.pipe import Pipe
import pandas as pd
from os import path as osp
# Add kwargs for pandas pivot_table
class PivotDf(TransformOperator):
    def __init__(self, **kwargs):
        self.__dict__.update(**kwargs)
        self.kwargs = kwargs
    def transform(self, input_: pd.DataFrame) -> pd.DataFrame:
        return pd.pivot_table(input_, **self.kwargs).reset_index().rename_axis(None, axis=1)
simplePipe = Pipe([
    # input is the filepath
    CSVExtractor(),
    # input_ is being passed in from the CSVExtractor() output, and the args defined here are for pivot_table
    PivotDf(values="RGBValue", index="Color", columns="RGBColor")
],
    # logging in root directory
    logging_path='')
# first input: path to the CSV file
filepath = osp.join(".", "colors.csv")
# run the pipe; output holds the pivoted data
output = simplePipe(filepath)
Now that the table has been pivoted, the new CSV will look like the following:
| Color | blue | green | red | 
|---|---|---|---|
| darkorchid | 204 | 50 | 153 | 
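The pipe in this walkthrough returns the pivoted data but does not write it anywhere. The following is a minimal sketch of the same pivot followed by a save step, using pandas directly rather than piperoni. It assumes the pipe output is a pandas DataFrame; the inline sample data stands in for the CSV file, and an in-memory buffer is used in place of a real output path.

```python
import io

import pandas as pd

# Sample data matching the input table above (stands in for the CSV file).
csv_text = """Color,RGBColor,RGBValue
darkorchid,red,153
darkorchid,green,50
darkorchid,blue,204
"""

df = pd.read_csv(io.StringIO(csv_text))

# The same pivot call that PivotDf.transform performs.
pivoted = (
    pd.pivot_table(df, values="RGBValue", index="Color", columns="RGBColor")
    .reset_index()
    .rename_axis(None, axis=1)
)

# Write the result without the index column.
# Pass a file path instead of the buffer to save to disk.
buffer = io.StringIO()
pivoted.to_csv(buffer, index=False)
csv_out = buffer.getvalue()
```

With a DataFrame returned by the pipe, the same `to_csv` call with a file path of your choice would write the pivoted table to a new file.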
