Version 1.3.0 was released on 16 October 2019.

Menas Improvements

  • #884 New monitoring endpoints added, including a healthcheck endpoint .../menas/admin/health (see the Menas documentation for details).
  • #935 Menas environment name is now displayed in the browser’s tab.
  • #945 REST calls to Menas creating Run objects now validate that Dataset name and version are provided.

Standardization Improvements

  • #398 Renaming in Standardization (using sourcecolumn metadata key) is now more robust:
    • Renaming no longer risks name clashes with other columns.
    • The risk of a name clash with ErrorCol has been removed.
    • Nested column renames are now registered with ATUM.
    • One column can be a source for multiple standardized columns.
  • #887 The schema metadata key allowinfinity now works for Float and Double. The possible values for the key are "true" and "false" (Boolean in String format); when the key is not specified, "false" is the default. If set to "true", it allows values that are too big for the column to be marked as infinity/-infinity.
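
The allowinfinity behaviour described in #887 can be illustrated with a minimal Python sketch. This is not the Standardization code itself: the function names, the error message, and the (value, error) return shape are hypothetical and chosen only for illustration. Only the metadata key and its "true"/"false" String values come from the notes above.

```python
import math

# Largest finite values of the two column types (IEEE 754).
FLOAT_MAX = 3.4028234663852886e38
DOUBLE_MAX = 1.7976931348623157e308


def allows_infinity(metadata):
    """Read the allowinfinity key; a missing key defaults to "false"."""
    return metadata.get("allowinfinity", "false") == "true"


def standardize_number(text, metadata, max_value=DOUBLE_MAX):
    """Hypothetical helper: return (value, error) for one input string.

    Values beyond max_value become +/-infinity when allowinfinity is
    "true"; otherwise an error is reported and no value is produced.
    """
    value = float(text)  # Python parses out-of-range literals as inf
    if math.isinf(value) or abs(value) > max_value:
        if allows_infinity(metadata):
            return math.copysign(math.inf, value), None
        return None, f"value {text!r} is out of range"
    return value, None
```

With this sketch, standardize_number("1e400", {"allowinfinity": "true"}) yields infinity with no error, while the same input with an empty metadata dictionary yields no value and an error. Passing max_value=FLOAT_MAX models a Float column.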

Scripts Improvements

  • #899 Run scripts now allow you to control the storage level of dataframe persistence with the --persist-storage-level option. This can help fine-tune Standardization and Conformance jobs for specific datasets. (See README for details)
  • #922 Run scripts now allow further Spark tuning with these options: --executor-cores, --conf-spark-executor-memoryOverhead, --conf-spark-memory-fraction. (See README for details)
  • #926 Command line autocompletion is now available for the Standardization and Conformance run scripts when the enceladus_autocompletions.sh script is sourced along with a $PATH_TO_SCRIPTS environment variable.
  • #929 Added validation to run scripts to ensure authentication credentials are provided.
  • #953 Handling of environment-specific Spark configurations (ADDITIONAL_SPARK_CONF) is now done separately from JVM configurations (ADDITIONAL_JVM_CONF) in the enceladus_env.sh script.
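
The separation of Spark settings from JVM settings (#953), together with the tuning options from #922, can be pictured with the following Python sketch. It is an assumption-laden illustration, not the logic of the run scripts: the function name is hypothetical, and the mapping of the script options to the Spark properties spark.executor.memoryOverhead and spark.memory.fraction is inferred from the option names. The sketch passes JVM options through spark.driver.extraJavaOptions, which is a standard Spark property, though the scripts may handle this differently.

```python
def build_spark_submit_args(executor_cores, spark_conf, jvm_opts):
    """Hypothetical helper: assemble a spark-submit argument list.

    spark_conf holds Spark properties (the ADDITIONAL_SPARK_CONF side);
    jvm_opts holds JVM flags (the ADDITIONAL_JVM_CONF side). The two are
    kept apart until the final command line is assembled.
    """
    args = ["spark-submit", "--executor-cores", str(executor_cores)]

    # Each Spark property becomes its own --conf key=value pair.
    for key, value in sorted(spark_conf.items()):
        args += ["--conf", f"{key}={value}"]

    # JVM flags are joined into a single driver Java options property.
    if jvm_opts:
        joined = " ".join(jvm_opts)
        args += ["--conf", f"spark.driver.extraJavaOptions={joined}"]

    return args
```

For example, calling the helper with 4 cores, {"spark.memory.fraction": "0.6", "spark.executor.memoryOverhead": "2g"} and ["-Dexample.flag=1"] (a made-up flag) produces one --conf entry per Spark property followed by a single --conf entry carrying the JVM options.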

Model Migration Improvements

  • #912 Migration CLI tools now have logging enabled.
  • #920 Fixed issue with Continuous/Incremental Migration process hanging at the end.
  • #921, #931 Fixed potential memory issues in the migration framework.

Other Improvements

  • #492 Rewrote the DAO module to improve maintainability of Standardization and Conformance’s communication with Menas.
  • #868 Changed the underlying version of Spark to 2.4.4, which is now the stable release recommended by Databricks.
  • #914 Removed the selenium module, as it did not reflect the project's current state.
  • #941 Run metadata is now added at the beginning of a Spark job, so that if the job fails, users have additional information for debugging. This includes additional Spark configuration parameters of the Run.
  • #942 At the end of Standardization and Conformance jobs, the Menas URL (for the API and UI) of the concluded run will now be logged.
  • #968 Standardization and Conformance runs now provide additional metadata about the input and output data sizes. These are similar to the input and output directory sizes but do not include hidden files (_INFO, _LINEAGE, etc.).
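
The difference between directory size and data size in #968 can be sketched in Python as follows. This is only an illustration on a local filesystem, whereas the jobs themselves work with Spark storage. The function names are hypothetical, and treating any name that starts with "_" or "." as hidden is an assumption based on the _INFO and _LINEAGE examples above.

```python
import os


def is_hidden(name):
    """Assumed rule: names starting with '_' or '.' count as hidden."""
    return name.startswith(("_", "."))


def directory_size(path, include_hidden=True):
    """Sum the file sizes in bytes under path.

    With include_hidden=False, hidden files and hidden subdirectories
    are skipped, which corresponds to the data size described above.
    """
    total = 0
    for root, dirs, files in os.walk(path):
        if not include_hidden:
            # Prune hidden subdirectories so os.walk does not descend.
            dirs[:] = [d for d in dirs if not is_hidden(d)]
        for name in files:
            if include_hidden or not is_hidden(name):
                total += os.path.getsize(os.path.join(root, name))
    return total
```

Here directory_size(path) corresponds to the directory size and directory_size(path, include_hidden=False) to the data size, so the two differ by the bytes taken up by files such as _INFO.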