rialto.common package

Submodules

rialto.common.table_reader module

class rialto.common.table_reader.DataReader

Bases: object

Abstract class defining the interface for readers of Spark tables.

DataReader provides two public functions, get_latest and get_table: get_latest reads a single snapshot (date partition) of the given table, while get_table reads the whole table or multiple snapshots.

abstract get_latest(table: str, date_column: str, date_until: date | None = None, uppercase_columns: bool = False) → DataFrame

Get the latest available date partition of the table, up to the specified date.

Parameters:

table – input table path
date_column – column to filter dates on
date_until – Optional upper bound on the date (inclusive)
uppercase_columns – If True, convert all column names to uppercase

Returns:

DataFrame
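To make the documented get_latest semantics concrete, here is a pure-Python model over rows represented as dicts. It is not rialto's Spark implementation: the function name latest_partition, the dict-based rows, treating date_until=None as "no upper bound", and returning an empty list when no partition qualifies are all assumptions of this sketch.

```python
from datetime import date
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def latest_partition(
    rows: List[Row], date_column: str, date_until: Optional[date] = None
) -> List[Row]:
    """Hypothetical model of get_latest: keep only the newest snapshot <= date_until."""
    # Candidate partition dates; date_until is inclusive, None means no upper bound
    # (an assumption of this sketch).
    dates = [
        r[date_column]
        for r in rows
        if date_until is None or r[date_column] <= date_until
    ]
    if not dates:
        # Sketch choice: no qualifying partition yields an empty result.
        return []
    latest = max(dates)
    # Return every row of that single date partition, i.e. one full snapshot.
    return [r for r in rows if r[date_column] == latest]
```

With partitions for 2024-01-31, 2024-02-29 and 2024-03-31, latest_partition(rows, "DATE", date(2024, 3, 15)) keeps only the 2024-02-29 snapshot.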

abstract get_table(table: str, date_column: str, date_from: date | None = None, date_to: date | None = None, uppercase_columns: bool = False) → DataFrame

Get the whole table or a slice of it between the selected dates.

Parameters:

table – input table path
date_column – column to filter dates on
date_from – Optional start date (inclusive)
date_to – Optional end date (inclusive)
uppercase_columns – If True, convert all column names to uppercase

Returns:

DataFrame
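The get_table contract (an inclusive date slice, optionally with uppercased column names) can likewise be modelled in plain Python. As before, this is a hypothetical sketch on dict rows, not rialto's code; the name date_slice and treating a missing bound as "unbounded" are assumptions.

```python
from datetime import date
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def date_slice(
    rows: List[Row],
    date_column: str,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    uppercase_columns: bool = False,
) -> List[Row]:
    """Hypothetical model of get_table: rows with date_from <= date <= date_to."""
    # Both bounds are inclusive; a None bound is treated as open (sketch assumption).
    selected = [
        r
        for r in rows
        if (date_from is None or r[date_column] >= date_from)
        and (date_to is None or r[date_column] <= date_to)
    ]
    if uppercase_columns:
        # Rename every column to its uppercase form.
        selected = [{k.upper(): v for k, v in r.items()} for r in selected]
    return selected
```

In this sketch filtering happens before renaming, so date_column always refers to the original column name.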

class rialto.common.table_reader.TableReader(spark: SparkSession)

Bases: DataReader

Implementation of DataReader for Databricks tables.

get_latest(table: str, date_column: str, date_until: date | None = None, uppercase_columns: bool = False) → DataFrame

Get the latest available date partition of the table, up to the specified date.

Parameters:

table – input table path
date_column – column to filter dates on, takes highest priority
date_until – Optional upper bound on the date (inclusive)
uppercase_columns – If True, convert all column names to uppercase

Returns:

DataFrame

get_table(table: str, date_column: str, date_from: date | None = None, date_to: date | None = None, uppercase_columns: bool = False) → DataFrame

Get the whole table or a slice of it between the selected dates.

Parameters:

table – input table path
date_column – column to filter dates on, takes highest priority
date_from – Optional start date (inclusive)
date_to – Optional end date (inclusive)
uppercase_columns – If True, convert all column names to uppercase

Returns:

DataFrame

rialto.common.utils module

rialto.common.utils.cast_decimals_to_floats(df: DataFrame) → DataFrame

Find all columns of decimal type in the DataFrame and cast them to floats. This avoids errors in .toPandas() conversions.

Parameters:

df – input DataFrame

Returns:

PySpark DataFrame with the decimal columns cast to floats
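One way to express such a cast, sketched under assumptions: build Spark SQL expressions from DataFrame.dtypes (a list of (name, type string) pairs such as ("amount", "decimal(10,2)")) and pass them to DataFrame.selectExpr. The helper name decimal_cast_exprs is hypothetical, and whether rialto casts to Spark's FLOAT or DOUBLE is not stated above, so the target type is a parameter here. Only top-level decimal columns are handled; decimals nested inside structs or arrays are left unchanged.

```python
from typing import List, Tuple


def decimal_cast_exprs(dtypes: List[Tuple[str, str]], target: str = "FLOAT") -> List[str]:
    """Hypothetical helper: selectExpr strings that cast decimal columns to `target`."""
    exprs = []
    for name, dtype in dtypes:
        # Quote the identifier with backticks, doubling any embedded backtick.
        quoted = "`" + name.replace("`", "``") + "`"
        if dtype.startswith("decimal"):
            # Cast and keep the original column name via an alias.
            exprs.append(f"CAST({quoted} AS {target}) AS {quoted}")
        else:
            # Non-decimal columns pass through unchanged.
            exprs.append(quoted)
    return exprs
```

Applied as df.selectExpr(*decimal_cast_exprs(df.dtypes)), this keeps column order and names while converting the decimal columns. Passing target="DOUBLE" preserves more precision than FLOAT.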

rialto.common.utils.get_caller_module() → Any

Get the module containing the function that called your function.

Inspects the call stack, where:

0th entry is this function
1st entry is the function which needs to know who called it
2nd entry is the calling function

Therefore, it returns the module containing the function at index 2 of the stack.

Returns:

Python module containing the calling function.
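A minimal sketch of the stack walk described above, using only the standard inspect and sys modules. The name get_caller_module_sketch is hypothetical, and resolving the module through the caller frame's globals (rather than, for example, inspect.getmodule) is this sketch's own choice; rialto's implementation may differ.

```python
import inspect
import sys
from types import ModuleType
from typing import Optional


def get_caller_module_sketch() -> Optional[ModuleType]:
    """Hypothetical stand-in for get_caller_module."""
    # context=0 skips reading source lines, which we do not need.
    stack = inspect.stack(context=0)
    # stack[0]: this function
    # stack[1]: the function that wants to know who called it
    # stack[2]: that function's caller
    caller_frame = stack[2].frame
    # Look the module up by the __name__ stored in the caller's globals.
    module_name = caller_frame.f_globals.get("__name__")
    return sys.modules.get(module_name)
```

Like the documented function, this assumes it is called from inside a function that itself has a caller; invoked directly from top-level code, the stack may be too short and stack[2] raises IndexError.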

rialto.common.utils.load_yaml(path: str) → Any

Load a YAML file.

Parameters:

path – file path

Returns:

Parsed YAML content
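A minimal equivalent, assuming the PyYAML package is installed and that yaml.safe_load is an acceptable loader; the documentation above does not say which PyYAML loader rialto uses, and load_yaml_sketch is a hypothetical name.

```python
from typing import Any

import yaml  # PyYAML, assumed to be installed


def load_yaml_sketch(path: str) -> Any:
    """Parse the YAML file at `path` into Python objects (hypothetical stand-in for load_yaml)."""
    with open(path, "r", encoding="utf-8") as f:
        # safe_load builds only standard types (dict, list, str, int, ...)
        # and rejects tags that would instantiate arbitrary Python objects.
        return yaml.safe_load(f)
```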
