finol.data_layer.DatasetLoader

class finol.data_layer.DatasetLoader

Class to load different types of datasets.

Methods

access_data(folder_path)

Load raw data files from a specified folder path and return a list of DataFrames.

augment_data(df)

Augment the provided DataFrame based on the configuration.

calculate_zscore(df)

Calculate the z-scores for numeric features in the provided DataFrame.

clean_data(df)

Clean the DataFrame by removing rows with missing values.

feature_engineering(df)

Perform feature engineering on the input DataFrame to generate various types of features.

load_dataset()

Load the raw data, perform data pre-processing operations, and prepare DataLoaders for training, validation, and testing.

make_label(raw_df, df)

Generate labels, i.e. price relatives.

normalize_data(df, zscore)

Normalize all numeric features in DataFrame.

plot_single_candlestick(index, row)

split_data(df)

Split the DataFrame into train, validation, and test sets.

access_data(folder_path)

Load raw data files from a specified folder path and return a list of DataFrames.

Parameters:

folder_path (str) – Path to the folder containing raw data files.

Returns:

List of DataFrames containing the loaded raw data.

Return type:

List[DataFrame]
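
The idea of reading a folder into a list of DataFrames can be sketched as follows. This is a minimal illustration, not FinOL's implementation: the helper name read_folder is hypothetical, and the CSV file format and the DATE and CLOSE column names are assumptions, since the file format is not stated here.

```python
import tempfile
from pathlib import Path
from typing import List

import pandas as pd


def read_folder(folder_path: str) -> List[pd.DataFrame]:
    """Hypothetical helper: read every CSV file in a folder into a DataFrame."""
    # Sort the paths so the order of the returned list is deterministic.
    csv_paths = sorted(Path(folder_path).glob("*.csv"))
    return [pd.read_csv(path) for path in csv_paths]


# Build a throwaway folder with two small files to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"DATE": ["2024-01-02"], "CLOSE": [10.0]}).to_csv(
        Path(tmp) / "AAA.csv", index=False
    )
    pd.DataFrame({"DATE": ["2024-01-02"], "CLOSE": [20.0]}).to_csv(
        Path(tmp) / "BBB.csv", index=False
    )
    frames = read_folder(tmp)

print(len(frames))
```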

augment_data(df)

Augment the provided DataFrame based on the configuration.

Parameters:

df (DataFrame) – Input DataFrame to be augmented.

Returns:

Tuple containing the augmented DataFrame with window data and an integer.

Return type:

Tuple[DataFrame, int]
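
One common reading of "window data" is a lookback window built from lagged copies of each column. The sketch below illustrates that reading only; whether FinOL augments data this way, and what the returned integer represents, is not stated here. The helper add_lookback_window and the choice to return the window size as the integer are assumptions made for illustration.

```python
from typing import Tuple

import pandas as pd


def add_lookback_window(df: pd.DataFrame, window_size: int) -> Tuple[pd.DataFrame, int]:
    """Hypothetical helper: append lagged copies of every column."""
    parts = [df]
    for lag in range(1, window_size):
        # shift(lag) moves values down by `lag` rows, so row t holds the value from t - lag.
        parts.append(df.shift(lag).add_suffix(f"_LAG{lag}"))
    # Assumption: the integer returned alongside the frame is the window size.
    return pd.concat(parts, axis=1), window_size


prices = pd.DataFrame({"CLOSE": [10.0, 11.0, 12.0, 13.0]})
windowed, size = add_lookback_window(prices, window_size=3)
print(windowed)
```

The first window_size - 1 rows contain missing values because no earlier observations exist for them, which is one reason a cleaning step typically follows this kind of augmentation.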

calculate_zscore(df)

Calculate the z-scores for numeric features in the provided DataFrame.

Parameters:

df (DataFrame) – DataFrame containing the data for z-score calculation.

Returns:

Z-score object for the numeric features in the DataFrame. May be None in some cases.

Return type:

Optional[object]
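
A z-score object can be thought of as the per-column mean and standard deviation needed to standardize data later. The following is a minimal sketch under that assumption: ZScoreStats and fit_zscore are hypothetical names, and returning None when there are no numeric columns is only one plausible reason for the None case, not a documented one.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class ZScoreStats:
    """Hypothetical container for per-column statistics."""

    mean: pd.Series
    std: pd.Series


def fit_zscore(df: pd.DataFrame) -> Optional[ZScoreStats]:
    numeric = df.select_dtypes(include="number")
    if numeric.empty:
        # Assumed None case: there is nothing numeric to standardize.
        return None
    # ddof=0 gives the population standard deviation.
    return ZScoreStats(mean=numeric.mean(), std=numeric.std(ddof=0))


df = pd.DataFrame({"DATE": ["d1", "d2", "d3"], "CLOSE": [1.0, 2.0, 3.0]})
stats = fit_zscore(df)
print(stats.mean["CLOSE"], stats.std["CLOSE"])
```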

clean_data(df)

Clean the DataFrame by removing rows with missing values.

Parameters:

df (DataFrame) – Input DataFrame to be cleaned.

Returns:

DataFrame with rows containing any missing values removed.

Return type:

DataFrame
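
Removing rows that contain any missing value corresponds to a standard pandas operation. The sketch below shows the behavior described above on a toy frame; the column names are illustrative, and resetting the index afterwards is an assumption rather than documented behavior.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"OPEN": [1.0, np.nan, 3.0], "CLOSE": [1.5, 2.5, np.nan]})

# how="any" drops a row if at least one of its values is missing.
cleaned = df.dropna(how="any").reset_index(drop=True)
print(cleaned)
```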

feature_engineering(df)

Perform feature engineering on the input DataFrame to generate various types of features.

Parameters:

df (DataFrame) – Input DataFrame to be engineered.

Returns:

Tuple containing the engineered DataFrame, detailed feature list, and number of features in each category.

Return type:

Tuple[DataFrame, List[str], Dict[str, int]]
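
The three-part return value (engineered frame, flat feature list, per-category counts) can be illustrated with a toy example. Everything specific here is hypothetical: the helper engineer, the category names PRICE and DERIVED, and the two derived features are chosen for illustration and do not reflect the features FinOL actually generates.

```python
from typing import Dict, List, Tuple

import pandas as pd


def engineer(df: pd.DataFrame) -> Tuple[pd.DataFrame, List[str], Dict[str, int]]:
    """Hypothetical helper mirroring the documented return shape."""
    out = df.copy()
    groups: Dict[str, List[str]] = {"PRICE": ["CLOSE"]}

    # Two illustrative derived features: a 2-period moving average and a 1-period return.
    out["SMA_2"] = out["CLOSE"].rolling(window=2).mean()
    out["RETURN_1"] = out["CLOSE"] / out["CLOSE"].shift(1) - 1.0
    groups["DERIVED"] = ["SMA_2", "RETURN_1"]

    # Flatten the groups into one list and count the features per category.
    feature_list = [name for names in groups.values() for name in names]
    counts = {category: len(names) for category, names in groups.items()}
    return out, feature_list, counts


engineered, feature_list, counts = engineer(pd.DataFrame({"CLOSE": [10.0, 11.0, 12.1]}))
print(feature_list, counts)
```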

load_dataset()

Load the raw data, perform data pre-processing operations, and prepare DataLoaders for training, validation, and testing.

Returns:

Dictionary containing various data loaders and information about the dataset.

Return type:

Dict

make_label(raw_df, df)

Generate labels, i.e. price relatives.

Parameters:
  • raw_df (DataFrame) – Raw DataFrame containing ‘CLOSE’ prices.

  • df (DataFrame) – DataFrame to merge the labels with.

Returns:

DataFrame containing the generated labels.

Return type:

DataFrame
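
A price relative is the ratio of one period's closing price to the previous period's. The sketch below computes it from a CLOSE column and merges it into a feature frame. The alignment (labeling row t with CLOSE at t+1 divided by CLOSE at t), the DATE merge key, and the LABEL and SMA column names are assumptions for illustration, not details confirmed by this page.

```python
import pandas as pd

dates = ["2024-01-02", "2024-01-03", "2024-01-04"]
raw_df = pd.DataFrame({"DATE": dates, "CLOSE": [10.0, 11.0, 9.9]})
features = pd.DataFrame({"DATE": dates, "SMA": [10.0, 10.5, 10.45]})

labels = raw_df[["DATE"]].copy()
# shift(-1) brings the next period's close onto the current row,
# so LABEL at row t is CLOSE[t + 1] / CLOSE[t].
labels["LABEL"] = raw_df["CLOSE"].shift(-1) / raw_df["CLOSE"]

merged = features.merge(labels, on="DATE", how="left")
print(merged)
```

Under this alignment the last row has no label because no following close exists, so it would need to be dropped before training.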

normalize_data(df, zscore)

Normalize all numeric features in DataFrame.

Parameters:
  • df (DataFrame) – Input DataFrame to be normalized.

  • zscore (object) – Z-score object used for normalization.

Returns:

DataFrame with normalized numeric features.

Return type:

DataFrame
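
Z-score normalization replaces each numeric value x with (x - mean) / std for its column, leaving non-numeric columns untouched. The sketch below shows that arithmetic on a toy frame. The statistics are computed inline only to keep the example self-contained; how the zscore object stores them is not specified here, and the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "DATE": ["d1", "d2", "d3"],
        "CLOSE": [1.0, 2.0, 3.0],
        "VOLUME": [10.0, 20.0, 30.0],
    }
)

numeric_cols = df.select_dtypes(include="number").columns
mean = df[numeric_cols].mean()
std = df[numeric_cols].std(ddof=0)  # population standard deviation

normalized = df.copy()
# Subtraction and division align on column labels, so each column uses its own statistics.
normalized[numeric_cols] = (df[numeric_cols] - mean) / std
print(normalized)
```

A constant column has a standard deviation of zero and would produce a division by zero here, so a real pipeline would need to guard against that case.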

split_data(df)

Split the DataFrame into train, validation, and test sets.

Parameters:

df (DataFrame) – Input DataFrame to be split.

Returns:

Tuple containing the train, validation, and test DataFrames.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]
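
For time-series data, a three-way split is usually done chronologically so that later observations never leak into training. The sketch below shows that approach. The helper chronological_split and the 60/20/20 proportions are assumptions; the actual proportions and split rule used by FinOL are not stated on this page.

```python
from typing import Tuple

import pandas as pd


def chronological_split(
    df: pd.DataFrame, train_frac: float = 0.6, val_frac: float = 0.2
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: split rows in order, without shuffling."""
    n = len(df)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    # Whatever remains after the train and validation slices becomes the test set.
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


df = pd.DataFrame({"CLOSE": range(10)})
train, val, test = chronological_split(df)
print(len(train), len(val), len(test))
```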