As of 16/10/2019, the new version 1.3.0 is out.
Menas Improvements
- #884 New monitoring endpoints added, including a healthcheck endpoint .../menas/admin/health (See Menas documentation for details)
- #935 Menas environment name is now displayed in the browser’s tab.
- #945 REST calls to Menas creating Run objects now validate that Dataset name and version are provided.
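The kind of check described in #945 can be sketched as follows. This is only an illustration of the idea, not the Menas implementation: the payload keys `dataset` and `datasetVersion`, the function name, and the error messages are all hypothetical.

```python
def validate_run_request(payload: dict) -> list:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []

    # Hypothetical key: the Dataset name must be a non-blank string.
    name = payload.get("dataset")
    if not isinstance(name, str) or not name.strip():
        errors.append("dataset name is required")

    # Hypothetical key: the Dataset version must be an integer (bool is rejected explicitly,
    # because bool is a subclass of int in Python).
    version = payload.get("datasetVersion")
    if isinstance(version, bool) or not isinstance(version, int):
        errors.append("dataset version is required")

    return errors


# A request missing the version is rejected before any Run is created.
print(validate_run_request({"dataset": "my_dataset"}))
```

Collecting all errors instead of failing on the first one lets a client fix a malformed request in a single round trip.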
Standardization Improvements
- #398 Renaming in Standardization (using sourcecolumn metadata key) is now more robust:
  - Rename no longer runs the risk of name clashes with other columns.
  - Danger of ErrorCol name clash has been removed.
  - Nested column renames are now registered with ATUM.
  - One column can be a source for multiple standardized columns.
- #887 The schema metadata key allowinfinity now works for Float and Double. The possible values for the key are "true" and "false" (Boolean in String format); when not specified, "false" is the default. If set to "true", it allows the column to be marked as infinity/-infinity for values which are too big.
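To make the two metadata keys above more concrete, here is a minimal sketch in plain Python. It is not the Standardization code: only the key names sourcecolumn and allowinfinity and their "true"/"false" string values come from these notes. The column names, the helper functions, and the assumption that a too-big value is reported as an error when infinity is not allowed are illustrative only.

```python
import math

# Largest finite single-precision (Float) value.
FLOAT_MAX = 3.4028234663852886e38

# Hypothetical schema fields, laid out like Spark's schema JSON.
# Both standardized columns read from the same source column "raw_amount".
schema = [
    {"name": "amount", "type": "float", "nullable": True,
     "metadata": {"sourcecolumn": "raw_amount", "allowinfinity": "true"}},
    {"name": "amount_strict", "type": "float", "nullable": True,
     "metadata": {"sourcecolumn": "raw_amount"}},  # allowinfinity defaults to "false"
]


def to_float(raw, allow_infinity):
    """Return (value, error); a too-big value becomes +/-infinity only when allowed."""
    value = float(raw)
    too_big = math.isinf(value) or abs(value) > FLOAT_MAX
    if not too_big:
        return value, None
    if allow_infinity:
        return math.copysign(math.inf, value), None
    return None, f"{raw!r} is out of range for float"


def standardize(row, fields):
    """Build the standardized row plus a list of error messages (assumed behaviour)."""
    out, errors = {}, []
    for field in fields:
        meta = field["metadata"]
        # Without sourcecolumn, the field is read from a column of the same name.
        source = meta.get("sourcecolumn", field["name"])
        allow = meta.get("allowinfinity", "false") == "true"
        value, error = to_float(row[source], allow)
        out[field["name"]] = value
        if error:
            errors.append(f"{field['name']}: {error}")
    return out, errors


result, errors = standardize({"raw_amount": "1e39"}, schema)
print(result)
print(errors)
```

With the input above, `amount` ends up as positive infinity, while `amount_strict` is left empty and gets an error entry, since its metadata does not enable allowinfinity.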
Scripts Improvements
- #899 Run scripts now allow you to control the storage level of dataframe persistence with the --persist-storage-level option. This can help fine-tune Standardization and Conformance jobs for specific datasets. (See README for details)
- #922 Run scripts now allow further Spark tuning with these options: --executor-cores, --conf-spark-executor-memoryOverhead, --conf-spark-memory-fraction. (See README for details)
- #926 Command line autocompletion is now available for the Standardization and Conformance run scripts when the enceladus_autocompletions.sh script is sourced along with a $PATH_TO_SCRIPTS environment variable.
- #929 Added validation to run scripts to ensure authentication credentials are provided.
- #953 Handling of environment-specific Spark configurations (ADDITIONAL_SPARK_CONF) is now done separately from JVM configurations (ADDITIONAL_JVM_CONF) in the enceladus_env.sh script.
Model Migration Improvements
- #912 Migration CLI tools now have logging enabled.
- #920 Fixed issue with Continuous/Incremental Migration process hanging at the end.
- #921, #931 Fixed potential memory issues in the migration framework.
Other Improvements
- #492 Rewrote the DAO module to improve maintainability of Standardization and Conformance’s communication with Menas.
- #868 Changed the underlying version of Spark to 2.4.4, which is now the recommended stable release by Databricks.
- #914 Removed selenium module. It did not reflect the current state of the project.
- #941 Run metadata is now added at the beginning of a Spark job so that, if the job fails, users have additional information for debugging. This includes additional Spark configuration parameters of the Run.
- #942 At the end of Standardization and Conformance jobs, the Menas URL (for the API and UI) of the concluded run will now be logged.
- #968 Standardization and Conformance runs now provide additional metadata about the input and output data sizes. These are similar to input and output directory sizes but do not include hidden files (_INFO, _LINEAGE, etc.).
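The difference between a directory size and the data size reported in #968 can be sketched like this. The sketch makes two assumptions that do not come from these notes: it walks a local directory rather than HDFS, and it treats any name starting with "_" or "." as hidden, following the usual Hadoop convention. The function names are hypothetical.

```python
import os


def is_hidden(name: str) -> bool:
    """Assumed convention: names starting with '_' or '.' are hidden (e.g. _INFO, _LINEAGE)."""
    return name.startswith(("_", "."))


def data_size(root: str) -> int:
    """Sum the sizes in bytes of all non-hidden files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not is_hidden(d)]
        for name in filenames:
            if not is_hidden(name):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

For a directory holding a 100-byte data file and a 20-byte _INFO file, `data_size` would return 100, whereas a plain directory size would be 120.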