Data & Data Quality Quick Start

Prerequisites

This quick start guide presumes that you have gone through:

Data Quality

Data quality is all about two things:

  • Spline
  • _INFO files

Spline

As Spline is under heavy development, we will postpone extensive documentation about it. Enceladus currently runs with Spline version 0.3.X, which works fine out of the box. Spline 0.3.X just needs to be deployed next to Menas.

Data for Spline are recorded even if the Spline UI is not up and running. This means they can be viewed later, so there is no need to set up the UI right now.

More about Spline at Spline GitHub.

_INFO files

The _INFO file is our way of tracking where the data came from and how much data there is. Its main purpose is to check that no data were lost on the way to and through standardization and conformance. All of this is made possible by Atum. The _INFO file needs to be placed in the source directory together with the raw data.
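To illustrate the idea, here is a minimal sketch in Python of writing an _INFO-like JSON file next to the raw data and later checking that the record count still matches. The field names (`metadata`, `checkpoints`, `controls`, `controlName`, `controlValue`) and all values are assumptions made for illustration only; the helper functions are hypothetical and are not part of Atum or Enceladus. The real layout should be taken from the _INFO file examples referenced on this page.

```python
import json
import tempfile
from pathlib import Path


def build_info(record_count: int, checkpoint_name: str = "Source") -> dict:
    """Build a minimal _INFO-like document with one record-count control.

    All field names and values here are illustrative assumptions,
    not the authoritative Atum format.
    """
    return {
        "metadata": {"sourceApplication": "example-app", "version": 1},
        "checkpoints": [
            {
                "name": checkpoint_name,
                "order": 1,
                "controls": [
                    {
                        "controlName": "recordCount",
                        "controlCol": "*",
                        "controlValue": str(record_count),
                    }
                ],
            }
        ],
    }


def write_info(source_dir: Path, info: dict) -> Path:
    """Write the document as a file named _INFO inside the source directory."""
    path = source_dir / "_INFO"
    path.write_text(json.dumps(info, indent=2))
    return path


def record_count_matches(info_path: Path, actual_count: int) -> bool:
    """Compare the count stored in the first control with an observed count."""
    info = json.loads(info_path.read_text())
    first_control = info["checkpoints"][0]["controls"][0]
    return int(first_control["controlValue"]) == actual_count


if __name__ == "__main__":
    # A temporary directory stands in for the source directory with raw data.
    with tempfile.TemporaryDirectory() as tmp:
        info_path = write_info(Path(tmp), build_info(record_count=3))
        print(record_count_matches(info_path, 3))
```

In practice the counts are recorded and compared by Atum during the standardization and conformance jobs; the sketch only shows the principle of comparing a recorded count with an observed one.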

More info about _INFO files here.

Examples of _INFO files here.

Next: Spark Jobs Quick Start