finol.data_layer.DatasetLoader

class finol.data_layer.DatasetLoader[source]

Class to load different types of datasets.

Methods

access_data(folder_path)

Load raw data files from a specified folder path and return a list of DataFrames.

augment_data(df)

Augment the provided DataFrame based on the configuration.

calculate_zscore(df)

Calculate the z-scores for numeric features in the provided DataFrame.

clean_data(df)

Clean the DataFrame by removing rows with missing values.

feature_engineering(df)

Perform feature engineering on the input DataFrame to generate various types of features.

load_dataset()

Load the raw data, perform data pre-processing operations, and prepare DataLoaders for training, validation, and testing.

make_label(raw_df, df)

Generate labels, i.e. price relatives.

normalize_data(df, zscore)

Normalize all numeric features in DataFrame.

plot_single_candlestick(index, row)

split_data(df)

Split the DataFrame into train, validation, and test sets.

access_data(folder_path)[source]

Load raw data files from a specified folder path and return a list of DataFrames.

Parameters:

folder_path (str) – Path to the folder containing raw data files.

Returns:

List of DataFrames containing the loaded raw data.

Return type:

List[DataFrame]
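
As an illustration of the folder-to-list behavior described above, here is a minimal sketch. It is not the library's implementation: `read_folder` is a hypothetical helper, and it assumes the raw data files are CSVs, which this page does not state.

```python
from pathlib import Path
from typing import List

import pandas as pd


def read_folder(folder_path: str) -> List[pd.DataFrame]:
    """Hypothetical stand-in for access_data: read every CSV in a folder."""
    # Assumption: raw files are CSVs. Sorting makes the list order deterministic.
    csv_files = sorted(Path(folder_path).glob("*.csv"))
    return [pd.read_csv(path) for path in csv_files]
```

Each element of the returned list corresponds to one file, so downstream steps can process the files one at a time.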

augment_data(df)[source]

Augment the provided DataFrame based on the configuration.

Parameters:

df (DataFrame) – Input DataFrame to be augmented.

Returns:

Augmented DataFrame with window data.

Return type:

Tuple[DataFrame, int]

calculate_zscore(df)[source]

Calculate the z-scores for numeric features in the provided DataFrame.

Parameters:

df (DataFrame) – DataFrame containing the data for z-score calculation.

Returns:

Z-score object for the numeric features in the DataFrame. In some cases the return value can be None.

Return type:

Optional[object]

clean_data(df)[source]

Clean the DataFrame by removing rows with missing values.

Parameters:

df (DataFrame) – Input DataFrame to be cleaned.

Returns:

DataFrame with rows containing any missing values removed.

Return type:

DataFrame
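
The effect of removing rows with any missing value can be sketched with plain pandas. This is only an illustration of the described behavior; the column names and values are made up, and the library may do more than a single `dropna` call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 1 and 2 each contain one missing value.
df = pd.DataFrame({"OPEN": [1.0, np.nan, 3.0], "CLOSE": [1.1, 2.2, np.nan]})

# how="any" drops a row as soon as one of its cells is missing.
cleaned = df.dropna(how="any")
```

Only the first row survives, because it is the only one without a missing cell.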

feature_engineering(df)[source]

Perform feature engineering on the input DataFrame to generate various types of features.

Parameters:

df (DataFrame) – Input DataFrame to be engineered.

Returns:

Tuple containing the engineered DataFrame, detailed feature list, and number of features in each category.

Return type:

Tuple[DataFrame, List[str], Dict[str, int]]

load_dataset()[source]

Load the raw data, perform data pre-processing operations, and prepare DataLoaders for training, validation, and testing.

Returns:

Dictionary containing various data loaders and information about the dataset.

Return type:

Dict

make_label(raw_df, df)[source]

Generate labels, i.e. price relatives.

Parameters:
  • raw_df (DataFrame) – Raw DataFrame containing ‘CLOSE’ prices.

  • df (DataFrame) – DataFrame to merge the labels with.

Returns:

DataFrame containing the generated labels.

Return type:

DataFrame
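
A price relative is the ratio of two consecutive prices. The sketch below shows one way to compute it from a ‘CLOSE’ column and attach it to a feature frame. Two things are assumptions here, not facts about the library: the label for a row is taken as the next close divided by the current close, and the output column is called `LABEL`. The `FEATURE_A` column is also hypothetical.

```python
import pandas as pd

raw_df = pd.DataFrame({"CLOSE": [10.0, 11.0, 9.9]})
features = pd.DataFrame({"FEATURE_A": [0.1, 0.2, 0.3]})  # hypothetical feature column

# Assumption: label for row t is CLOSE[t+1] / CLOSE[t].
# The last row has no next price, so its label is NaN.
price_relative = raw_df["CLOSE"].shift(-1) / raw_df["CLOSE"]

# Attach the labels by index alignment; "LABEL" is a hypothetical column name.
labeled = features.assign(LABEL=price_relative)
```

A value above 1 means the price rose over the period and a value below 1 means it fell. The trailing NaN row would normally be dropped before training.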

normalize_data(df, zscore)[source]

Normalize all numeric features in DataFrame.

Parameters:
  • df (DataFrame) – Input DataFrame to be normalized.

  • zscore (object) – Z-score object used for normalization.

Returns:

DataFrame with normalized numeric features.

Return type:

DataFrame
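
Z-score normalization replaces each value x with (x - mean) / std, computed per column. The sketch below separates the "fit" step (what `calculate_zscore` is described as doing) from the "apply" step (what `normalize_data` is described as doing). The helper names `fit_zscore` and `apply_zscore` and the dictionary used as the "z-score object" are hypothetical; the actual object type is not specified on this page.

```python
import pandas as pd


def fit_zscore(df: pd.DataFrame) -> dict:
    """Hypothetical fit step: per-column mean and population std of numeric columns."""
    numeric = df.select_dtypes(include="number")
    return {"mean": numeric.mean(), "std": numeric.std(ddof=0)}


def apply_zscore(df: pd.DataFrame, stats: dict) -> pd.DataFrame:
    """Hypothetical apply step: standardize numeric columns, leave others untouched."""
    out = df.copy()
    cols = stats["mean"].index
    # DataFrame-minus-Series aligns on column labels, so each column uses its own stats.
    out[cols] = (df[cols] - stats["mean"]) / stats["std"]
    return out


# Toy data with one numeric and one non-numeric column (both hypothetical).
df = pd.DataFrame({"CLOSE": [1.0, 2.0, 3.0], "TICKER": ["a", "a", "a"]})
stats = fit_zscore(df)
normalized = apply_zscore(df, stats)
```

Keeping the statistics in a separate object lets the same mean and standard deviation be reused on other data, for example statistics fitted on the training set and applied to validation and test sets. A constant column has zero standard deviation and would need special handling, which this sketch omits.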

split_data(df)[source]

Split the DataFrame into train, validation, and test sets.

Parameters:

df (DataFrame) – Input DataFrame to be split.

Returns:

Tuple containing the train, validation, and test DataFrames.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]
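
For time-series data, a three-way split is commonly done in chronological order so that later rows never leak into the training set. The sketch below illustrates that idea. The function name `chronological_split` and the 60/20/20 fractions are hypothetical; this page does not say how the library chooses the split points.

```python
from typing import Tuple

import pandas as pd


def chronological_split(
    df: pd.DataFrame, train_frac: float = 0.6, val_frac: float = 0.2
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical split: first rows train, next rows validation, remainder test."""
    n = len(df)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    # Slicing by position keeps the original row order within each part.
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


# Toy frame of ten rows (hypothetical data).
df = pd.DataFrame({"CLOSE": range(10)})
train, val, test = chronological_split(df)
```

The three parts are disjoint and together cover every row of the input.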