rialto.jobs package

Submodules

rialto.jobs.decorators module

rialto.jobs.decorators.config_parser(cf_getter: Callable) Callable[source]

Config parser functions decorator.

Registers a config parsing function into a rialto job prerequisite. You can then request the job via job function arguments.

Parameters:

cf_getter – dataset reader function

Returns:

raw function, unchanged

rialto.jobs.decorators.datasource(ds_getter: Callable) Callable[source]

Dataset reader functions decorator.

Registers a data-reading function into a rialto job prerequisite. You can then request the job via job function arguments.

Parameters:

ds_getter – dataset reader function

Returns:

raw reader function, unchanged

rialto.jobs.decorators.job(*args, custom_name=None, disable_version=False)[source]

Rialto jobs decorator.

Transforms a python function into a rialto transformation, which can be imported and ran by Rialto Runner. Is mainly used as @job and the function’s name is used, and the outputs get automatic. To override this behavior, use @job(custom_name=XXX, disable_version=True).

Parameters:
  • *args

    list of positional arguments. Empty in case custom_name or disable_version is specified.

  • custom_name – str for custom job name.

  • disable_version – bool for disabling automatically filling the VERSION column in the job’s outputs.

Returns:

One more job wrapper for run function (if custom name or version override specified). Otherwise, generates Rialto Transformation Type and returns it for in-module registration.

rialto.jobs.job_base module

class rialto.jobs.job_base.JobBase[source]

Bases: Transformation

A Base Class for Rialto Jobs. Serves as a foundation into which the @job decorators are inserted.

abstractmethod get_custom_callable() Callable[source]

Getter - Custom callable (i.e. job transformation function)

abstractmethod get_disable_version() bool[source]

Disable version getter

abstractmethod get_job_metadata() JobMetadata[source]

Job metadata getter

abstractmethod get_job_name() str[source]

Job name getter

run(reader: TableReader, run_date: date, spark: SparkSession = None, config: PipelineConfig = None, metadata_manager: MetadataManager = None, feature_loader: PysparkFeatureLoader = None) DataFrame[source]

Rialto transformation run

Parameters:
  • reader – data store api object

  • info_date – date

  • spark – spark session

  • config – pipeline config

Returns:

dataframe

rialto.jobs.moduleregister module

class rialto.jobs.module_register.ModuleRegister[source]

Bases: object

Module register. Class which is used by @datasource and @config_parser decorators to register callables / getters.

Resolver, when searching for a getter for f() defined in module M, uses find_callable(“f”, “M”).

classmethod add_callable_to_module(callable, parent_name)[source]

Add a callable to the specified module’s storage.

Parameters:
  • callable – The callable to be added.

  • parent_name – The name of the module to which the callable is added.

classmethod find_callable(callable_name, module_name)[source]

Find a callable by its name in the specified module and its dependencies.

Parameters:
  • callable_name – The name of the callable to find.

  • module_name – The name of the module to search in.

Returns:

The found callable or None if not found.

classmethod register_callable(callable)[source]

Register a callable by adding it to the module’s storage.

Parameters:

callable – The callable to be registered.

classmethod register_dependency(module, parent_name)[source]

Register a module as a dependency of the caller module.

Parameters:
  • module – The module to be registered as a dependency.

  • parent_name – The module that is registering the dependency.

classmethod remove_module(module)[source]

Remove a module from the storage.

Parameters:

module – The module to be removed.

rialto.jobs.module_register.register_dependency_callable(callable)[source]

Register a callable as a dependency of the caller module.

Note that the function will be added to the module’s list of available dependencies.

Parameters:

callable – The callable to be registered as a dependency.

rialto.jobs.module_register.register_dependency_module(module)[source]

Register a module as a dependency of the caller module.

Parameters:

module – The module to be registered as a dependency.

rialto.jobs.resolver module

class rialto.jobs.resolver.Resolver[source]

Bases: object

Resolver handles dependency management between datasets and jobs.

We register different callables, which can depend on other callables. Calling resolve() we attempt to resolve these dependencies.

register_getter(callable: Callable, name: str = None) str[source]

Register callable with a given name for later resolution.

In case name isn’t present, function’s __name__ attribute will be used.

Parameters:
  • callable – callable to register (getter)

  • name – str, custom name, f.__name__ will be used otherwise

Returns:

str, name under which the callable has been registered

register_object(object: Any, name: str) None[source]

Register an object with a given name for later resolution.

Parameters:
  • object – object to register (getter)

  • name – str, custom name

Returns:

None

resolve(callable: Callable) Dict[str, Any][source]

Take a callable and resolve its dependencies / arguments. Arguments can be a) objects registered via register_object b) callables registered via register_getter c) ModuleRegister registered callables via ModuleRegister.register_callable (+ dependencies)

Arguments are resolved recursively according to requirements; For example, if we have a(b, c), b(d), and c(), d() registered, then we recursively call resolve() methods until we resolve c, d -> b -> a

Parameters:

callable – function to resolve

Returns:

result of the callable

exception rialto.jobs.resolver.ResolverException[source]

Bases: Exception

Resolver Errors Class - In Most Cases your dependency tree is not complete.

rialto.jobs.test_utils module

rialto.jobs.test_utils.disable_job_decorators(module) None[source]

Disables job decorators in a python module. Useful for testing your rialto jobs and datasources.

Parameters:

module – python module with the decorated functions.

Returns:

None

Module contents