As of 27 April 2020, the new version 2.4.0 is out.

Standardization Improvements

  • #1187 If the input data does not adhere to the schema (e.g. a complex type where a primitive type is expected), this is now detected, and either a new error type is added to the errCol of the defective row (the default), or processing is stopped with an exception altogether (when the new switch --strict-schema-check is set to true); see the first sketch after this list
  • #1187 Parquet, and any other future format without an enforced schema, now behaves like XML and the others on a missing column - it is considered NULL
  • #1285 Standardization now validates the _INFO file's checkpoints. The Source and Raw checkpoints are mandatory; field names are case-insensitive (see the second sketch after this list)
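
A minimal sketch of how the type-mismatch detection could behave. Every name here (StandardizationError, checkColumnType, the "stdTypeError" error type) is a hypothetical illustration, not Enceladus's actual API:

    import org.apache.spark.sql.types._

    object StrictSchemaCheckSketch {
      // Hypothetical error record; real errCol entries carry more fields.
      case class StandardizationError(errType: String, errMsg: String)

      /** Detects a complex actual type where the schema expects a primitive one
        * and, depending on strictSchemaCheck, either collects an error destined
        * for the errCol (the default) or stops processing with an exception. */
      def checkColumnType(columnName: String,
                          actual: DataType,
                          expected: DataType,
                          strictSchemaCheck: Boolean): Option[StandardizationError] = {
        def isComplex(dt: DataType): Boolean = dt match {
          case _: StructType | _: ArrayType | _: MapType => true
          case _                                         => false
        }
        val msg = s"Column '$columnName': found ${actual.simpleString}, " +
          s"expected ${expected.simpleString}"

        if (!(isComplex(actual) && !isComplex(expected))) None
        else if (strictSchemaCheck) throw new IllegalStateException(msg)
        else Some(StandardizationError("stdTypeError", msg))
      }
    }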
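
Similarly, a deliberately reduced sketch of the checkpoint validation (#1285); the real Atum control-measure model carries far more than checkpoint names:

    object InfoFileValidationSketch {
      // Reduced model of an _INFO file; only the checkpoint names matter here.
      case class Checkpoint(name: String)
      case class ControlMeasure(checkpoints: Seq[Checkpoint])

      /** Source and Raw checkpoints are mandatory; matching is case-insensitive. */
      def validateInfoCheckpoints(measure: ControlMeasure): Either[String, Unit] = {
        val present = measure.checkpoints.map(_.name.toLowerCase).toSet
        val missing = Seq("source", "raw").filterNot(present.contains)
        if (missing.isEmpty) Right(())
        else Left(s"_INFO file is missing mandatory checkpoint(s): ${missing.mkString(", ")}")
      }
    }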

Standardization & Conformance Improvements

  • #1277 Fixed an issue that caused the Kafka control measurement plugin to throw an exception at the end of a Spark job.

Streaming Improvements

  • #1293 Added a check to Streaming for the default minimum Spark version (see the sketch below)
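
A sketch of what such a check could look like; the notes do not state the actual default minimum, so the "2.4.4" below is only a placeholder:

    import org.apache.spark.SPARK_VERSION

    object StreamingVersionCheckSketch {
      // Placeholder minimum; the actual default is not stated in these notes.
      val minSparkVersion = "2.4.4"

      /** Naive "major.minor.patch" comparison, sufficient for release strings. */
      def versionAtLeast(actual: String, minimum: String): Boolean = {
        def parts(v: String): Seq[Int] =
          v.split("[.-]").take(3).map(_.takeWhile(_.isDigit)).map(p => if (p.isEmpty) 0 else p.toInt)
        parts(actual).zipAll(parts(minimum), 0, 0)
          .find { case (a, m) => a != m }   // first position where they differ
          .forall { case (a, m) => a > m }  // equal everywhere counts as "at least"
      }

      def assertSparkVersion(): Unit =
        require(versionAtLeast(SPARK_VERSION, minSparkVersion),
          s"Spark $SPARK_VERSION is below the minimum $minSparkVersion required for streaming")
    }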

Other changes

  • #1236 Integration tests for Menas now run against an embedded Mongo instance spawned in place, rather than relying on a locally running one (see the first sketch after this list).
  • #1280 The DAO module was made a thin JAR so that spark-jobs don’t depend on a fat JAR. The DAO artifact is now 90KB instead of 90MB.
  • #730 Refactored validateSchemaPathArray to a return-free implementation (see the second sketch after this list)
  • #1142 Removed the includeDisabled parameter from VersionedMongoRepository.getLatestVersionValue
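
A minimal sketch of spawning an embedded Mongo around a test, assuming the de.flapdoodle.embed.mongo library and its 2.x API (the notes do not name the library actually used, so treat that choice as an assumption):

    import de.flapdoodle.embed.mongo.config.{MongodConfigBuilder, Net}
    import de.flapdoodle.embed.mongo.distribution.Version
    import de.flapdoodle.embed.mongo.{MongodExecutable, MongodProcess, MongodStarter}
    import de.flapdoodle.embed.process.runtime.Network

    object EmbeddedMongoSketch extends App {
      val starter = MongodStarter.getDefaultInstance
      val port    = Network.getFreeServerPort  // unused port, so parallel builds don't collide
      val config  = new MongodConfigBuilder()
        .version(Version.Main.PRODUCTION)
        .net(new Net("localhost", port, Network.localhostIsIPv6()))
        .build()

      val executable: MongodExecutable = starter.prepare(config)
      val process: MongodProcess       = executable.start()
      try {
        // Integration tests would run against mongodb://localhost:<port> here.
        println(s"Embedded Mongo is up at mongodb://localhost:$port")
      } finally {
        process.stop()
        executable.stop()
      }
    }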
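
And the general shape of a return-free refactor, shown on a hypothetical reconstruction of validateSchemaPathArray (the real signature and error type may differ): instead of early return statements guarding each failure case, the whole validation is one pattern-match expression.

    import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

    object SchemaPathValidationSketch {
      // Hypothetical error type for the sketch.
      case class ValidationError(message: String)

      /** Resolves a dot-separated path such as "a.b.c" against a struct schema. */
      def findField(schema: StructType, path: String): Option[DataType] =
        path.split('.').foldLeft(Option(schema: DataType)) {
          case (Some(struct: StructType), name) =>
            struct.fields.find(_.name == name).map(_.dataType)
          case _ => None
        }

      /** A single expression, no `return`: each failure case is a match arm. */
      def validateSchemaPathArray(schema: StructType, path: String): Seq[ValidationError] =
        findField(schema, path) match {
          case None =>
            Seq(ValidationError(s"Path '$path' does not exist in the schema"))
          case Some(dt) if !dt.isInstanceOf[ArrayType] =>
            Seq(ValidationError(s"Path '$path' is not an array"))
          case _ =>
            Seq.empty
        }
    }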