1. Quickstart

1.1. Creating Operators

Below are a few Operators that can be run in a Pipe for an ETL project:

from piperoni.operators.transform_operator import TransformOperator

class StrToInt(TransformOperator):
    """Converts a string to an integer."""

    def transform(self, input_: str) -> int:
        return int(input_)


class IntToStr(TransformOperator):
    """Converts an integer to a string."""

    def transform(self, input_: int) -> str:
        return str(input_)
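
The type annotations on transform record what each step accepts and returns. As a minimal sketch of why that matters, the snippet below reads those annotations with the standard library to check that one step's output type matches the next step's input type. The classes and the steps_are_compatible helper are hypothetical stand-ins written for illustration; this is not piperoni's own checking logic, which is not described here.

```python
from typing import get_type_hints


class StrToIntStep:
    """Stand-in step (hypothetical, not a piperoni class)."""

    def transform(self, input_: str) -> int:
        return int(input_)


class IntToStrStep:
    """Stand-in step (hypothetical, not a piperoni class)."""

    def transform(self, input_: int) -> str:
        return str(input_)


def steps_are_compatible(first: type, second: type) -> bool:
    """Return True if first's annotated return type is second's input type."""
    first_hints = get_type_hints(first.transform)
    second_hints = get_type_hints(second.transform)
    return first_hints["return"] is second_hints["input_"]


# str -> int followed by int -> str: the types line up.
print(steps_are_compatible(StrToIntStep, IntToStrStep))  # True
# str -> int followed by str -> int: the second step expects a str.
print(steps_are_compatible(StrToIntStep, StrToIntStep))  # False
```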

1.2. Creating and Running a Pipe

The Operators above are run through the pipe below:

from piperoni.operators.pipe import Pipe

# initialize operators
step1 = StrToInt()
step2 = IntToStr()

# initialize pipe
pipe = Pipe([step1, step2])

# run the pipe instance with an input; get the final data
final_data = pipe("1")
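
Conceptually, running a pipe passes the input to the first step and feeds each step's output to the next. The sketch below illustrates only that idea in plain Python. The run_steps function is a hypothetical helper, not part of piperoni, and the built-in int and str stand in for the StrToInt and IntToStr operators.

```python
from functools import reduce
from typing import Any, Callable, Sequence


def run_steps(steps: Sequence[Callable[[Any], Any]], data: Any) -> Any:
    """Apply each step to the output of the previous one, in order."""
    return reduce(lambda value, step: step(value), steps, data)


# int and str stand in for the StrToInt and IntToStr operators.
result = run_steps([int, str], "1")
print(repr(result))  # '1'
```

The string "1" is first converted to the integer 1 and then back to the string "1", which mirrors the two-step pipe above.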

1.3. End To End CSV Manipulation Example

Take an example CSV that looks like the following:

Color       RGBColor  RGBValue
darkorchid  red       153
darkorchid  green     50
darkorchid  blue      204

The CSV is saved in the root directory as a file called "colors.csv". The following walkthrough extracts the CSV and pivots it so that each color occupies a single row; the resulting table can then be saved to a new CSV file.

from piperoni.operators.extract.extract_file.csv_ import CSVExtractor
from piperoni.operators.transform_operator import TransformOperator
from piperoni.operators.pipe import Pipe
import pandas as pd
from os import path as osp


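# Transform operator that forwards its keyword arguments to pandas.pivot_table
class PivotDf(TransformOperator):
    def __init__(self, **kwargs):
        self.__dict__.update(**kwargs)
        self.kwargs = kwargs

    def transform(self, input_: pd.DataFrame) -> pd.DataFrame:
        pivoted = pd.pivot_table(input_, **self.kwargs)
        # turn the index back into a column and drop the leftover column-axis name
        return pivoted.reset_index().rename_axis(None, axis=1)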



simplePipe = Pipe(
    [
        # the input to the first step is the file path
        CSVExtractor(),
        # input_ comes from the CSVExtractor() output; the keyword arguments are passed to pivot_table
        PivotDf(values="RGBValue", index="Color", columns="RGBColor"),
    ],
    # logging in root directory
    logging_path="",
)

# first input: path to the CSV file
filepath = osp.join(".", "colors.csv")

# run the pipe; output holds the transformed table
output = simplePipe(filepath)

After the pivot, each color occupies a single row, and the output looks like the following:

Color       blue  green  red
darkorchid  204   50     153
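
To see the pivot step in isolation, here is a minimal sketch that uses pandas only, with the example rows inlined so that it runs without piperoni or a file on disk. It applies the same pivot_table call as PivotDf and then serializes the result as CSV text. The variable names are chosen for illustration.

```python
import io

import pandas as pd

# Same rows as the example CSV, inlined so the sketch runs on its own.
csv_text = """Color,RGBColor,RGBValue
darkorchid,red,153
darkorchid,green,50
darkorchid,blue,204
"""

df = pd.read_csv(io.StringIO(csv_text))

# One row per Color, one column per RGBColor, filled with RGBValue.
pivoted = (
    pd.pivot_table(df, values="RGBValue", index="Color", columns="RGBColor")
    .reset_index()
    .rename_axis(None, axis=1)
)

print(list(pivoted.columns))  # ['Color', 'blue', 'green', 'red']

# With no path argument, to_csv returns the CSV as a string.
csv_out = pivoted.to_csv(index=False)
print(csv_out)
```

Passing a file path as the first argument to to_csv writes the pivoted table to that file instead of returning a string.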