rialto.runner package
Subpackages
Submodules
rialto.runner.config_loader module
rialto.runner.config_overrides module
rialto.runner.date_manager module
- class rialto.runner.date_manager.DateManager[source]
Bases: object
Date generation and shifts based on configuration
- static all_dates(date_from: date, date_to: date) List[date][source]
Get a list of all dates between date_from and date_to, inclusive
- Parameters:
date_from – starting date
date_to – ending date
- Returns:
List[date]
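A minimal usage sketch with illustrative dates:

```python
from datetime import date

from rialto.runner.date_manager import DateManager

# Every calendar day from 2024-01-29 through 2024-02-02, endpoints included.
days = DateManager.all_dates(date(2024, 1, 29), date(2024, 2, 2))
print(days)  # five consecutive dates
```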
- static date_subtract(run_date: date, units: str, value: int) date[source]
Generate a starting date by shifting the given date back by the configured number of units
- Parameters:
run_date – base date
units – time unit: years, months, weeks, or days
value – number of units to subtract
- Returns:
Starting date
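A minimal sketch; the exact result depends on the library's date arithmetic:

```python
from datetime import date

from rialto.runner.date_manager import DateManager

# Start of a six-month window ending at the run date.
window_start = DateManager.date_subtract(run_date=date(2024, 3, 31), units="months", value=6)
```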
- static run_dates(date_from: date, date_to: date, schedule: ScheduleConfig) List[date][source]
Select dates inside the given interval according to the schedule's frequency and selected day
- Parameters:
date_from – interval start
date_to – interval end
schedule – schedule config
- Returns:
list of dates
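A sketch assuming a ScheduleConfig has already been loaded from pipeline configuration; its construction is not shown here:

```python
from datetime import date
from typing import List

from rialto.runner.date_manager import DateManager


def quarter_run_days(schedule) -> List[date]:
    """Return the scheduled run days inside Q1 2024 for the given ScheduleConfig."""
    return DateManager.run_dates(date(2024, 1, 1), date(2024, 3, 31), schedule)
```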
rialto.runner.runner module
- class rialto.runner.runner.Runner(spark: SparkSession, config_path: str, run_date: str = None, rerun: bool = False, op: str = None, skip_dependencies: bool = False, overrides: Dict = None, merge_schema: bool = False)[source]
Bases: object
A scheduler and dependency checker for feature runs
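A construction sketch; the config path and run date are illustrative values, and triggering the actual run is not covered by this entry:

```python
from pyspark.sql import SparkSession

from rialto.runner.runner import Runner

spark = SparkSession.builder.getOrCreate()

runner = Runner(
    spark,
    config_path="config/pipelines.yaml",  # illustrative path to the pipeline config
    run_date="2024-03-31",                # optional: date to run for
    rerun=False,                          # optional flags, see the signature above
    skip_dependencies=False,
)
```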
rialto.runner.table module
rialto.runner.transformation module
- class rialto.runner.transformation.Transformation[source]
Bases: object
Interface for feature implementation
- abstractmethod run(reader: DataReader, run_date: date, spark: SparkSession = None, config: PipelineConfig = None, metadata_manager: MetadataManager = None, feature_loader: PysparkFeatureLoader = None) DataFrame[source]
Run the transformation
- Parameters:
reader – data store API object
run_date – date of the run
spark – spark session
config – pipeline config
metadata_manager – metadata manager
feature_loader – feature loader
- Returns:
dataframe
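A hedged sketch of a concrete implementation; the table name, column name, and the reader.get_table call are assumptions, so consult the DataReader documentation for the actual read API:

```python
from datetime import date

from pyspark.sql import DataFrame, SparkSession

from rialto.runner.transformation import Transformation


class ActiveAccounts(Transformation):
    """Illustrative feature implementation; table and column names are made up."""

    def run(
        self,
        reader,
        run_date: date,
        spark: SparkSession = None,
        config=None,
        metadata_manager=None,
        feature_loader=None,
    ) -> DataFrame:
        # `reader` is the data store API object injected by the runner;
        # the exact read call below is an assumption.
        accounts = reader.get_table("accounts")
        return accounts.filter(accounts.as_of_date <= run_date)
```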
rialto.runner.utils module
- rialto.runner.utils.find_dependency(config: PipelineConfig, name: str)[source]
Get dependency from config
- Parameters:
config – Pipeline configuration
name – Dependency name
- Returns:
Dependency object
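A sketch assuming a loaded PipelineConfig; the dependency name is made up:

```python
from rialto.runner.utils import find_dependency


def accounts_dependency(config):
    """Look up the dependency named "accounts" in the pipeline configuration."""
    return find_dependency(config, "accounts")
```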
- rialto.runner.utils.get_partitions(reader: DataReader, table: Table) List[date][source]
Get partition values
- Parameters:
reader – data store API object
table – Table object
- Returns:
List of partition values
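A sketch assuming a DataReader and a Table object are already available:

```python
from rialto.runner.utils import get_partitions


def latest_partition(reader, table):
    """Return the most recent partition date of `table`, or None if it has none."""
    partitions = get_partitions(reader, table)
    return max(partitions) if partitions else None
```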
- rialto.runner.utils.init_tools(spark: SparkSession, pipeline: PipelineConfig) Tuple[MetadataManager, PysparkFeatureLoader][source]
Initialize metadata manager and feature loader
- Parameters:
spark – Spark session
pipeline – Pipeline configuration
- Returns:
MetadataManager and PysparkFeatureLoader
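A sketch showing how the returned tools could be wired into a Transformation.run call; the wrapper function itself is hypothetical:

```python
from rialto.runner.utils import init_tools


def run_with_tools(spark, pipeline, transformation, reader, run_date):
    """Initialize the tools for one pipeline and pass them to a Transformation."""
    metadata_manager, feature_loader = init_tools(spark, pipeline)
    return transformation.run(
        reader,
        run_date,
        spark=spark,
        config=pipeline,
        metadata_manager=metadata_manager,
        feature_loader=feature_loader,
    )
```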
- rialto.runner.utils.load_module(cfg: ModuleConfig) Transformation[source]
Load feature group
- Parameters:
cfg – Feature configuration
- Returns:
Transformation object
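A sketch assuming a ModuleConfig is already available; the wrapper function is hypothetical:

```python
from rialto.runner.utils import load_module


def load_and_run(cfg, reader, run_date, spark):
    """Instantiate the configured Transformation and execute it for one date."""
    transformation = load_module(cfg)
    return transformation.run(reader, run_date, spark=spark)
```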