1. Quickstart
1.1. Creating Operators
Below are a few Operators that can be run in a Pipe for an ETL project:
```python
from piperoni.operators.transform_operator import TransformOperator


class StrToInt(TransformOperator):
    """Converts a string to an integer; annotations match the behavior."""

    def transform(self, input_: str) -> int:
        return int(input_)


class IntToStr(TransformOperator):
    """Converts an integer back to a string; annotations match the behavior."""

    def transform(self, input_: int) -> str:
        return str(input_)
```
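Each operator's `transform` method declares the type it accepts and the type it returns. As a hedged illustration of what "types are consistent" means, the sketch below reads those annotations with `typing.get_type_hints` and checks a real call against them. `check_transform` and the two demo classes are hypothetical, plain Python classes (not `TransformOperator` subclasses), and this is not piperoni's own validation logic.

```python
from typing import Any, Callable, get_type_hints


def check_transform(transform: Callable[[Any], Any], value: Any) -> Any:
    """Hypothetical helper (not part of piperoni): call `transform` on `value`
    and check both the input and the result against its type annotations.
    Only plain classes such as str or int work with isinstance here."""
    hints = dict(get_type_hints(transform))
    expected_out = hints.pop("return", None)
    # Assumption: the single remaining annotation belongs to the input parameter.
    expected_in = next(iter(hints.values()), None)
    if expected_in is not None and not isinstance(value, expected_in):
        raise TypeError(f"input {value!r} is not {expected_in.__name__}")
    result = transform(value)
    if expected_out is not None and not isinstance(result, expected_out):
        raise TypeError(f"output {result!r} is not {expected_out.__name__}")
    return result


class StrToIntPlain:
    """Annotations match behavior: str in, int out."""

    def transform(self, input_: str) -> int:
        return int(input_)


class MislabeledOperator:
    """Annotated to return int, but actually returns a str."""

    def transform(self, input_: str) -> int:
        return str(input_)  # type: ignore[return-value]


print(check_transform(StrToIntPlain().transform, "42"))  # prints 42
```

Calling `check_transform(MislabeledOperator().transform, "42")` raises `TypeError`, because the returned value does not match the declared return type.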
1.2. Creating and Running a Pipe
The Operators above are run through the pipe below:
```python
from piperoni.operators.pipe import Pipe

# initialize operators
step1 = StrToInt()
step2 = IntToStr()

# initialize the pipe; operators run in list order
pipe = Pipe([step1, step2])

# run the pipe instance on an input and get the final data
final_data = pipe("1")
```
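Conceptually, a pipe passes each operator's output to the next operator as its input. The sketch below is a hedged, simplified stand-in that assumes only that behavior: `MiniPipe` is hypothetical and is not piperoni's `Pipe`, which also handles logging and may do more. The two operators are plain classes so the snippet runs without piperoni installed.

```python
from typing import Any, Iterable


class MiniPipe:
    """Hypothetical stand-in for a pipe: applies each step's `transform` in order.
    Not piperoni's implementation; logging and type checks are omitted."""

    def __init__(self, steps: Iterable[Any]):
        self.steps = list(steps)

    def __call__(self, input_: Any) -> Any:
        data = input_
        for step in self.steps:
            # The output of one step becomes the input of the next.
            data = step.transform(data)
        return data


class StrToInt:
    def transform(self, input_: str) -> int:
        return int(input_)


class IntToStr:
    def transform(self, input_: int) -> str:
        return str(input_)


final_data = MiniPipe([StrToInt(), IntToStr()])("1")
print(repr(final_data))  # prints '1'
```

Because each step only relies on the previous step's output, the order of the list matters: swapping the two operators would make the first `transform` receive a string where it expects an integer.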
1.3. End-to-End CSV Manipulation Example
Take an example CSV that looks like the following:
| Color | RGBColor | RGBValue |
|---|---|---|
| darkorchid | red | 153 |
| darkorchid | green | 50 |
| darkorchid | blue | 204 |
The CSV is saved in the root directory in a file called “colors.csv”. The following walkthrough shows how to extract the CSV, pivot it so that each color becomes a single row, and save the result to a new CSV file.
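To reproduce the example, the input file can be generated with pandas. This sketch assumes the file name `colors.csv`, which is what the pipe code reads, and writes it to the current working directory:

```python
import pandas as pd

# Same rows as the table above: one row per (Color, RGBColor) pair.
colors = pd.DataFrame(
    {
        "Color": ["darkorchid", "darkorchid", "darkorchid"],
        "RGBColor": ["red", "green", "blue"],
        "RGBValue": [153, 50, 204],
    }
)

# index=False keeps the pandas row index out of the file.
colors.to_csv("colors.csv", index=False)
```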
```python
from os import path as osp

import pandas as pd

from piperoni.operators.extract.extract_file.csv_ import CSVExtractor
from piperoni.operators.pipe import Pipe
from piperoni.operators.transform_operator import TransformOperator


class PivotDf(TransformOperator):
    """Pivots a DataFrame; keyword arguments are forwarded to pandas.pivot_table."""

    def __init__(self, **kwargs):
        super().__init__()
        self.__dict__.update(kwargs)  # expose the kwargs as attributes
        self.kwargs = kwargs

    def transform(self, input_: pd.DataFrame) -> pd.DataFrame:
        return (
            pd.pivot_table(input_, **self.kwargs)
            .reset_index()  # turn the "Color" index back into a column
            .rename_axis(None, axis=1)  # drop the "RGBColor" label on the columns
        )


simple_pipe = Pipe(
    [
        # input is the file path; the output is a DataFrame
        CSVExtractor(),
        # input_ is the CSVExtractor output; these kwargs are passed to pivot_table
        PivotDf(values="RGBValue", index="Color", columns="RGBColor"),
    ],
    # write logs to the root directory
    logging_path="",
)

# first input: path to the CSV file
filepath = osp.join(".", "colors.csv")

# run the pipe to get the pivoted DataFrame
output = simple_pipe(filepath)

# save the result to a new CSV file (the file name is only an example)
output.to_csv(osp.join(".", "colors_pivoted.csv"), index=False)
```
Now that the table has been pivoted, the new CSV will look like the following:
| Color | blue | green | red |
|---|---|---|---|
| darkorchid | 204 | 50 | 153 |
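For comparison, the same reshaping can be done with pandas alone, without a pipe. This is a hedged sketch: it assumes `CSVExtractor` amounts to loading the file into a DataFrame, and it builds that DataFrame inline so the snippet runs on its own. The pivot call chain is the one used in `PivotDf.transform` with the kwargs passed in the pipe.

```python
import pandas as pd

# Long-format input, identical to colors.csv (built inline for self-containment).
long_df = pd.DataFrame(
    {
        "Color": ["darkorchid"] * 3,
        "RGBColor": ["red", "green", "blue"],
        "RGBValue": [153, 50, 204],
    }
)

# Each RGBColor value becomes its own column; pivot_table's default
# aggregation (mean) is harmless here because every cell has one value.
wide_df = (
    pd.pivot_table(long_df, values="RGBValue", index="Color", columns="RGBColor")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(wide_df)
```

The new columns come out in sorted order (`blue`, `green`, `red`), which is why they appear in that order in the pivoted table rather than in the order of the original rows.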