Data Pre-processing

Data pre-processing is the work of managing, analyzing, filtering, transforming, encoding, and otherwise preparing data so that it can be usefully processed by a machine. It is typically one of the most time-consuming parts of the data science process. Decisions made by the data scientist at this stage can have important effects on subsequent model development and even on deployment. It is therefore important to document the analytical decisions made during pre-processing, whether in the code itself or through a documentation tool, so that the lineage of the data can be traced from the original source, where the data is in its raw form, through to the model training and development stage.
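
One way to document pre-processing decisions in the code itself is to record each step, and the reason for it, as the step is applied. The following is a minimal sketch using only the Python standard library, not a recommended tool. The class LoggedPipeline, the step functions, the field names and the sample records are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedPipeline:
    """Applies pre-processing steps and keeps a record of each one (hypothetical helper)."""
    lineage: list = field(default_factory=list)

    def apply(self, rows, step, reason):
        # Run the step, then note what was done, why, and how the row count changed.
        result = step(rows)
        self.lineage.append({
            "step": step.__name__,
            "reason": reason,
            "rows_in": len(rows),
            "rows_out": len(result),
        })
        return result


def drop_missing_age(rows):
    # Filtering: discard records that have no age value.
    return [r for r in rows if r.get("age") is not None]


def encode_sex(rows):
    # Encoding: map the categorical field to integers, leaving the input rows untouched.
    mapping = {"F": 0, "M": 1}
    return [{**r, "sex": mapping[r["sex"]]} for r in rows]


# Illustrative raw data, not taken from any real source.
raw = [
    {"age": 34, "sex": "F"},
    {"age": None, "sex": "M"},
    {"age": 51, "sex": "M"},
]

pipeline = LoggedPipeline()
clean = pipeline.apply(raw, drop_missing_age, "age is a required model feature")
clean = pipeline.apply(clean, encode_sex, "model expects numeric inputs")

# The lineage log can be printed, saved alongside the model, or reviewed later.
for entry in pipeline.lineage:
    print(entry)
```

Because each step returns new records instead of modifying the raw ones, the original data stays available for comparison, and the lineage log gives an ordered account of how the training data was derived from it.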