As of 27/04/2020, the new version 2.4.0 is out
Standardization Improvements
- #1187 If the input data does not adhere to the schema (e.g. a complex type where a primitive type is expected), this is now detected, and either a new error type is added to the `errCol` of the defective row (the default), or processing is stopped with an exception altogether (if the new switch `--strict-schema-check` is set to `true`) - a sketch of both modes follows this list
- #1187 Parquet, and any other future format without an enforced schema, now behaves the same as XML and other formats when a column is missing - it is considered NULL
- #1285 Standardization now validates the _INFO file's checkpoints. The checkpoints `Source` and `Raw` are mandatory. Field names are case-insensitive (see the validation sketch below)
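
To make the two #1187 modes concrete, here is a minimal Scala/Spark sketch. The object name, the error-message format, and the flat `errCol` of strings are illustrative assumptions; the actual Enceladus error column holds structured error objects:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, lit}
import org.apache.spark.sql.types.StructType

object SchemaCheckSketch {
  /** Detects columns whose actual type does not match the expected schema
    * (e.g. a struct where a primitive is expected) and either records the
    * problem or fails fast, mirroring the two modes of #1187. */
  def checkSchema(df: DataFrame, expected: StructType, strict: Boolean): DataFrame = {
    val mismatches = for {
      field  <- expected.fields.toSeq
      actual <- df.schema.find(_.name == field.name)
      if actual.dataType != field.dataType
    } yield s"stdSchemaError: '${field.name}' is ${actual.dataType.simpleString}, " +
      s"expected ${field.dataType.simpleString}"

    if (mismatches.isEmpty) df
    else if (strict)
      // --strict-schema-check set to true: stop processing altogether
      throw new IllegalStateException(mismatches.mkString("; "))
    else
      // default: record the errors in errCol and keep processing
      df.withColumn("errCol", array(mismatches.map(lit): _*))
  }
}
```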
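And a small sketch of the #1285 checkpoint validation, with a simplified `Checkpoint` case class standing in for the actual model of the _INFO file:

```scala
// Simplified stand-in for the checkpoint entries of an _INFO file
final case class Checkpoint(name: String)

/** Fails if the mandatory Source and Raw checkpoints are absent;
  * names are compared case-insensitively, as described in #1285. */
def validateCheckpoints(checkpoints: Seq[Checkpoint]): Unit = {
  val present = checkpoints.map(_.name.toLowerCase).toSet
  val missing = Seq("source", "raw").filterNot(present.contains)
  require(missing.isEmpty,
    s"_INFO file is missing mandatory checkpoint(s): ${missing.mkString(", ")}")
}
```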
Standardization & Conformance Improvements
- #1277 Fixed the issue which caused the Kafka control measurement plugin to throw an exception at the end of a Spark job.
Streaming Improvements
- #1293 Added a check to Streaming for the default minimum Spark version (see the sketch below)
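
A sketch of such a minimum-version guard; the constant and the message are assumptions, not the actual values or code used by the streaming module:

```scala
import org.apache.spark.sql.SparkSession

// Assumed minimum for illustration only
val MinSparkVersion = "2.4.4"

def assertMinSparkVersion(spark: SparkSession): Unit = {
  // Compare dot-separated versions numerically, segment by segment
  def parts(v: String): Seq[Int] =
    v.split("[.-]").take(3).map(_.takeWhile(_.isDigit)).filter(_.nonEmpty).map(_.toInt)

  val ok = parts(spark.version).zipAll(parts(MinSparkVersion), 0, 0)
    .find { case (a, b) => a != b }
    .forall { case (a, b) => a > b }

  require(ok, s"Spark ${spark.version} is below the required minimum $MinSparkVersion")
}
```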
Other changes
- #1236 Integration tests for Menas are now run against an embedded MongoDB spawned on the fly instead of relying on a locally running instance.
- #1280 The `DAO` module was made a thin JAR so that spark-jobs don't depend on a fat JAR. The `DAO` artifact is now 90KB instead of 90MB.
- #730 Refactored `validateSchemaPathArray` to a `return`-free implementation (a generic illustration follows this list)
- #1142 Removed the `includeDisabled` parameter from `VersionedMongoRepository.getLatestVersionValue`
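
As a generic illustration of the kind of refactoring done in #730 (this is not the actual `validateSchemaPathArray` code), early `return` statements are replaced by a single expression:

```scala
// Before: early returns, which are discouraged in idiomatic Scala
def validate(path: String, schema: Set[String]): Seq[String] = {
  if (path.isEmpty) return Seq("empty path")
  if (!schema.contains(path)) return Seq(s"'$path' not found in schema")
  Seq.empty
}

// After: the same logic as one expression, with no return statements
def validateReturnFree(path: String, schema: Set[String]): Seq[String] =
  if (path.isEmpty) Seq("empty path")
  else if (!schema.contains(path)) Seq(s"'$path' not found in schema")
  else Seq.empty
```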