Table of contents

Each of the components released by Hermes is a command-line tool. All are .jar files.

Dataset Comparison

General tool for data set comparison. It can be used as a spark job or a library. For the library-like usage check method execute and compare in DatasetComparisonJob class in za.co.absa.hermes.datasetComparison package.

If used as a library, you just provide 2 DataFrames for comparison and an optional set of unique keys and schema.

More…


Info File Comparison

A simple comparison tool for Atum’s _INFO file. It works on local FS and Hadoop FS.


E2E Runner

E2E Runner is a tool that runs end-to-end tests based on the JSON input file. E2R Runner has a plugin architecture so the type of test depends on the plugins loaded.

Existing built-in plugins are:

  • Bash command plugin
  • Dataset comparison plugin
  • Info File comparison plugin

More…