As of 16 October 2019, the new version 1.3.0 is out.
## Menas Improvements
- #884 New monitoring endpoints added, including a health-check endpoint at `.../menas/admin/health` (see the Menas documentation for details).
- #935 The Menas environment name is now displayed in the browser's tab.
- #945 REST calls to `Menas` creating `Run` objects now validate that the `Dataset` name and version are provided.
## Standardization Improvements
- #398 Renaming in `Standardization` (using the `sourcecolumn` metadata key) is now more robust:
  - Renames no longer run the danger of name clashes with other columns.
  - The danger of an `ErrorCol` name clash has been removed.
  - Nested column renames are now registered with ATUM.
  - One column can be a source for multiple standardized columns.
- #887 The schema metadata key `allowinfinity` now works for `Float` and `Double`. The possible values for the key are `"true"` and `"false"` (Boolean in String format); when not specified, `"false"` is considered the default. If set to `"true"`, it allows the column's values which are too big to be marked as `infinity`/`-infinity`.
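As a hedged sketch of where the `allowinfinity` key lives, the snippet below writes a single Spark `StructField` entry in Spark's JSON schema representation; the field name and type are illustrative placeholders, not taken from any real dataset:

```shell
# Sketch of a Spark StructField JSON entry carrying the "allowinfinity"
# metadata key (#887). The field name "exchange_rate" is a placeholder.
cat > /tmp/allowinfinity_field.json <<'EOF'
{
  "name": "exchange_rate",
  "type": "double",
  "nullable": true,
  "metadata": { "allowinfinity": "true" }
}
EOF
cat /tmp/allowinfinity_field.json
```

Note that the value is the string `"true"`, not a JSON boolean, matching the "Boolean in String format" convention described above.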
## Scripts Improvements
- #899 Run scripts now allow you to control the storage level of DataFrame persistence with the `--persist-storage-level` option. This can help fine-tune `Standardization` and `Conformance` jobs for specific datasets. (See the README for details.)
- #922 Run scripts now allow further Spark tuning with these options: `--executor-cores`, `--conf-spark-executor-memoryOverhead`, `--conf-spark-memory-fraction`. (See the README for details.)
- #926 Command-line autocompletion is now available for the `Standardization` and `Conformance` run scripts when the `enceladus_autocompletions.sh` script is sourced along with a `$PATH_TO_SCRIPTS` environment variable.
- #929 Added validation to run scripts to ensure authentication credentials are provided.
- #953 Handling of environment-specific Spark configurations (`ADDITIONAL_SPARK_CONF`) is now done separately from JVM configurations (`ADDITIONAL_JVM_CONF`) in the `enceladus_env.sh` script.
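The new tuning options above can be sketched as follows; the script name `run_standardization.sh`, the `PATH_TO_SCRIPTS` value, and every flag value are illustrative placeholders (consult the README for real usage):

```shell
# Hedged sketch of the new run-script tuning flags (#899, #922).
# All values below are placeholders, not recommended settings.
TUNING_FLAGS="--persist-storage-level MEMORY_AND_DISK \
  --executor-cores 4 \
  --conf-spark-executor-memoryOverhead 2048 \
  --conf-spark-memory-fraction 0.6"

# Autocompletion (#926) is enabled by sourcing the helper script, e.g.:
#   export PATH_TO_SCRIPTS=/path/to/scripts   # placeholder path
#   source "$PATH_TO_SCRIPTS/enceladus_autocompletions.sh"

# A full invocation would then look roughly like (script name assumed):
#   ./run_standardization.sh <auth and dataset options> $TUNING_FLAGS
echo "$TUNING_FLAGS"
```

`MEMORY_AND_DISK` is one of Spark's standard storage-level names; any other valid `StorageLevel` name could be passed instead.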
## Model Migration Improvements
- #912 Migration CLI tools now have logging enabled.
- #920 Fixed an issue with the Continuous/Incremental Migration process hanging at the end.
- #921, #931 Fixed potential memory issues in the migration framework.
## Other Improvements
- #492 Rewrote the DAO module to improve the maintainability of `Standardization` and `Conformance`'s communication with `Menas`.
- #868 Changed the underlying version of Spark to 2.4.4, which is now the recommended stable release by Databricks.
- #914 Removed the Selenium module; it did not reflect the current state of the project.
- #941 `Run` metadata is now added at the beginning of a Spark job, so if the job fails, users have additional information for debugging. This includes additional Spark configuration parameters of the `Run`.
- #942 At the end of `Standardization` and `Conformance` jobs, the `Menas` URL (for the API and UI) of the concluded run will now be logged.
- #968 `Standardization` and `Conformance` runs now provide additional metadata about the input and output data sizes. These are similar to input and output directory sizes but do not include hidden files (`_INFO`, `_LINEAGE`, etc.).