feature_encoders package

Subpackages

Submodules

feature_encoders.settings module

feature_encoders.utils module

feature_encoders.utils.add_constant(data: Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame], prepend=True, has_constant='skip')

Add a column of ones to an array.

Parameters
  • data (array-like) – A column-ordered design matrix.

  • prepend (bool, optional) – If True, the constant is added as the first column. Otherwise, the constant is appended as the last column. Defaults to True.

  • has_constant ({'raise', 'add', 'skip'}, optional) – Behavior if data already has a constant column. If 'skip' (the default), data is returned without adding another constant. If 'raise', an error is raised if any column has a constant value. If 'add', a column of ones is added even if a constant column is already present. Defaults to 'skip'.

Returns

The original values with a constant (column of ones).

Return type

numpy.ndarray
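The has_constant options are easiest to compare side by side. The following is a minimal NumPy sketch of the documented behavior, not the library's implementation: the name add_constant_sketch is hypothetical, and it assumes a column counts as "constant" when all its values are equal and that the result is always returned as a float ndarray.

```python
import numpy as np


def add_constant_sketch(data, prepend=True, has_constant="skip"):
    """Hypothetical re-implementation of the documented add_constant behavior."""
    if has_constant not in ("raise", "add", "skip"):
        raise ValueError(f"invalid has_constant option: {has_constant!r}")
    x = np.asarray(data, dtype=float)
    if x.ndim == 1:
        x = x.reshape(-1, 1)  # treat a 1-D input as a single column
    # Assumption: a column is constant when its maximum equals its minimum.
    has_const_col = bool((x.max(axis=0) == x.min(axis=0)).any())
    if has_const_col and has_constant == "raise":
        raise ValueError("data already contains a constant column")
    if has_const_col and has_constant == "skip":
        return x  # leave the data as-is instead of adding a second constant
    ones = np.ones((x.shape[0], 1))
    return np.hstack([ones, x]) if prepend else np.hstack([x, ones])
```

For X = [[1, 2], [3, 4]], the sketch returns [[1, 1, 2], [1, 3, 4]] by default and puts the ones in the last column when prepend=False.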

feature_encoders.utils.as_list(val: Any)

Cast input as list.

Helper function that always returns the input value as a list.
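A helper like this usually exists so that arguments such as exists (a str or a list of str) can be handled uniformly. The sketch below illustrates that idea. The name as_list_sketch is hypothetical, and the handling of None, tuples and sets is an assumption rather than documented behavior.

```python
from typing import Any


def as_list_sketch(val: Any) -> list:
    """Hypothetical helper mirroring the documented as_list contract."""
    if val is None:
        return []  # assumption: None maps to an empty list
    if isinstance(val, list):
        return val
    if isinstance(val, (tuple, set)):
        return list(val)  # assumption: other built-in collections are converted
    # Strings and other scalars are wrapped, not iterated character by character.
    return [val]
```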

feature_encoders.utils.as_series(x: Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame])

Cast an array-like input to a pandas Series.
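The signature accepts an ndarray, a Series or a DataFrame. A minimal sketch of one plausible conversion follows. The name as_series_sketch is hypothetical, and rejecting inputs with more than one column is an assumption, not documented behavior.

```python
import numpy as np
import pandas as pd


def as_series_sketch(x):
    """Hypothetical sketch: convert a 1-D array or single-column input to a Series."""
    if isinstance(x, pd.Series):
        return x
    if isinstance(x, pd.DataFrame):
        if x.shape[1] != 1:
            raise ValueError("only single-column DataFrames can be cast to a Series")
        return x.iloc[:, 0]  # keeps the index and uses the column name as the Series name
    arr = np.asarray(x)
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr[:, 0]  # (n, 1) -> (n,)
    if arr.ndim != 1:
        raise ValueError("input must be one-dimensional or a single column")
    return pd.Series(arr)
```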

feature_encoders.utils.check_X(X: pandas.core.frame.DataFrame, exists=None, int_is_categorical=True, return_col_info=False)

Perform a series of checks on the input dataframe.

Parameters
  • X (pandas.DataFrame) – The input dataframe.

  • exists (str or list of str, optional) – Names of columns that must be present in the input dataframe. Defaults to None.

  • int_is_categorical (bool, optional) – If True, integer types are considered categorical. Defaults to True.

  • return_col_info (bool, optional) – If True, the function also returns the names of the categorical columns and the names of the numerical columns, in addition to the input dataframe. Defaults to False.

Raises
  • ValueError – If the input is not a pandas DataFrame.

  • ValueError – If any of the column names in exists are not found in the input.

  • ValueError – If NaN or inf values are found in the input data.

Returns

pandas.DataFrame if return_col_info is False, else a tuple (pandas.DataFrame, list, list)
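The checks listed above can be sketched as follows. This is an illustration of the documented contract, not the library's code. The name check_X_sketch and the helper _is_categorical are hypothetical. The dtype rule is also an assumption: bool and non-numeric columns are treated as categorical, and integer columns are categorical only when int_is_categorical is True.

```python
import numpy as np
import pandas as pd
from pandas.api import types as ptypes


def _is_categorical(series, int_is_categorical):
    # Assumption: bool and non-numeric dtypes are categorical; integers depend on the flag.
    if ptypes.is_bool_dtype(series.dtype):
        return True
    if ptypes.is_integer_dtype(series.dtype):
        return int_is_categorical
    return not ptypes.is_numeric_dtype(series.dtype)


def check_X_sketch(X, exists=None, int_is_categorical=True, return_col_info=False):
    """Hypothetical sketch of the checks documented for check_X."""
    if not isinstance(X, pd.DataFrame):
        raise ValueError("input must be a pandas DataFrame")

    if exists is None:
        required = []
    else:
        required = [exists] if isinstance(exists, str) else list(exists)
    missing = [col for col in required if col not in X.columns]
    if missing:
        raise ValueError(f"columns not found in input: {missing}")

    cat_cols = [col for col in X.columns if _is_categorical(X[col], int_is_categorical)]
    num_cols = [col for col in X.columns if col not in cat_cols]

    # NaN is checked in every column; inf can only occur in numeric columns.
    has_nan = X.isna().to_numpy().any()
    has_inf = np.isinf(X[num_cols].to_numpy(dtype=float)).any() if num_cols else False
    if has_nan or has_inf:
        raise ValueError("input contains NaN or inf values")

    return (X, cat_cols, num_cols) if return_col_info else X
```

For a frame with an integer column "a", a float column "b" and a string column "c", the sketch reports ["a", "c"] as categorical and ["b"] as numerical under the default settings.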

feature_encoders.utils.check_y(y: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], index=None)

Perform a series of checks on the input target data.

The checks are carried out by sklearn.utils.check_array.

Parameters
  • y (Union[pandas.Series, pandas.DataFrame]) – The input data.

  • index (Union[pandas.Index, pandas.DatetimeIndex], optional) – An index to compare with the input dataframe’s index. Defaults to None.

Raises
  • ValueError – If the input is neither a pandas Series nor a pandas DataFrame with only a single column.

  • ValueError – If the input data has a different index than the one provided for comparison (only checked if index is not None).

Returns

The validated input data.

Return type

pandas.DataFrame
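The function delegates its array checks to sklearn.utils.check_array. The sketch below reproduces the documented shape and index checks with plain NumPy and pandas instead, so it is a substitute for illustration rather than the library's method. The name check_y_sketch is hypothetical. Rejecting NaN and inf mirrors check_array's default finiteness check and assumes a numeric target.

```python
import numpy as np
import pandas as pd


def check_y_sketch(y, index=None):
    """Hypothetical sketch of the documented check_y contract."""
    if isinstance(y, pd.Series):
        y = y.to_frame()  # always return a single-column DataFrame
    elif not (isinstance(y, pd.DataFrame) and y.shape[1] == 1):
        raise ValueError("y must be a Series or a single-column DataFrame")
    if index is not None and not y.index.equals(index):
        raise ValueError("y has a different index than the one provided")
    values = y.to_numpy(dtype=float)  # assumption: the target is numeric
    if not np.isfinite(values).all():
        raise ValueError("y contains NaN or inf values")
    return y
```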

feature_encoders.utils.get_categorical_cols(X: pandas.core.frame.DataFrame, int_is_categorical=True)

Return the names of the categorical columns in the input DataFrame.

Parameters
  • X (pandas.DataFrame) – Input dataframe.

  • int_is_categorical (bool, optional) – If True, integer types are considered categorical. Defaults to True.

Returns

The names of categorical columns in the input DataFrame.

Return type

list
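The effect of int_is_categorical can be shown with a dtype-based sketch. The name get_categorical_cols_sketch is hypothetical. The exact dtype rules the library applies are not documented here, so treating bool, object, string and category columns as categorical is an assumption.

```python
import pandas as pd
from pandas.api import types as ptypes


def get_categorical_cols_sketch(X, int_is_categorical=True):
    """Hypothetical sketch; the dtype rules below are assumptions."""
    categorical = []
    for col in X.columns:
        dtype = X[col].dtype
        if ptypes.is_bool_dtype(dtype):
            categorical.append(col)
        elif ptypes.is_integer_dtype(dtype):
            if int_is_categorical:
                categorical.append(col)
        elif not ptypes.is_numeric_dtype(dtype):
            # object, string and category columns, among others
            categorical.append(col)
    return categorical
```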

feature_encoders.utils.get_datetime_data(X: pandas.core.frame.DataFrame, col_name=None)

Get datetime information from the input dataframe.

Parameters
  • X (pandas.DataFrame) – The input dataframe.

  • col_name (str, optional) – The name of the column that contains datetime information. If None, it is assumed that the datetime information is provided by the input dataframe’s index. Defaults to None.

Returns

The datetime information.

Return type

pandas.Series
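The two sources of datetime information (the index or a named column) can be sketched as below. The name get_datetime_data_sketch is hypothetical, and both the error cases and the use of pandas.to_datetime on the column are assumptions about how such a helper might behave.

```python
import pandas as pd


def get_datetime_data_sketch(X, col_name=None):
    """Hypothetical sketch: read datetimes from the index or from a column."""
    if col_name is None:
        if not isinstance(X.index, pd.DatetimeIndex):
            raise ValueError("expected a DatetimeIndex when col_name is None")
        return X.index.to_series()  # Series whose values are the index timestamps
    if col_name not in X.columns:
        raise ValueError(f"column {col_name!r} not found")
    return pd.to_datetime(X[col_name])  # assumption: parse strings into datetimes
```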

feature_encoders.utils.load_config(model='towt', features='default', merge_multiple=False)

Load model configuration and feature generator mapping.

Given model and features, the function searches for files matching the following glob patterns:

    conf_path = str(CONF_PATH)
    model_files = glob.glob(f"{conf_path}/models/{model}.*")
    feature_files = glob.glob(f"{conf_path}/features/{features}.*")

Parameters
  • model (str, optional) – The name of the model configuration to load. Defaults to “towt”.

  • features (str, optional) – The name of the feature generator mapping to load. Defaults to “default”.

  • merge_multiple (bool, optional) – If True and more than one file is found when searching for either models or features, the contents of the files will be merged. Otherwise, an exception will be raised. Defaults to False.

Returns

The model configuration and feature mapping as dictionaries.

Return type

(dict, dict)
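The search-then-merge logic can be sketched as follows. This is an illustration under stated assumptions, not the library's loader. The names load_config_sketch and _load_merged are hypothetical. The configuration directory is passed in explicitly instead of read from CONF_PATH. The sketch assumes every matching file contains JSON and merges files shallowly in sorted filename order. Raising when no file matches is also an assumption.

```python
import glob
import json


def _load_merged(pattern, merge_multiple):
    files = sorted(glob.glob(pattern))
    if not files:
        raise ValueError(f"no configuration file matches {pattern}")
    if len(files) > 1 and not merge_multiple:
        raise ValueError(f"multiple files match {pattern}: {files}")
    merged = {}
    for path in files:
        with open(path) as f:
            merged.update(json.load(f))  # assumption: JSON files, shallow merge
    return merged


def load_config_sketch(conf_path, model="towt", features="default", merge_multiple=False):
    """Hypothetical sketch of the documented file search and merge behavior."""
    conf_path = str(conf_path)
    model_conf = _load_merged(f"{conf_path}/models/{model}.*", merge_multiple)
    feature_map = _load_merged(f"{conf_path}/features/{features}.*", merge_multiple)
    return model_conf, feature_map
```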

feature_encoders.utils.maybe_reshape_2d(arr: numpy.ndarray)

Reshape an array, if needed, so that it is always 2-D and long (for example, a 1-D array becomes a single column).

Parameters

arr (numpy.ndarray) – The input array.

Returns

The reshaped array.

Return type

numpy.ndarray
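The common case, turning a 1-D vector into a column, is sketched below. The name maybe_reshape_2d_sketch is hypothetical. Returning 2-D input unchanged and rejecting higher-dimensional input are assumptions, since the documentation does not say how those cases are handled.

```python
import numpy as np


def maybe_reshape_2d_sketch(arr):
    """Hypothetical sketch: promote 1-D input to a single column."""
    arr = np.asarray(arr)
    if arr.ndim == 1:
        return arr.reshape(-1, 1)  # (n,) -> (n, 1)
    if arr.ndim == 2:
        return arr  # assumption: 2-D input is returned unchanged
    raise ValueError("expected a 1-D or 2-D array")
```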

feature_encoders.utils.tensor_product(a: numpy.ndarray, b: numpy.ndarray, reshape=True)

Compute the row-wise tensor product of two matrices, that is, the outer product of each pair of corresponding rows.

Parameters
  • a (numpy array of shape (n, m_a)) – The first matrix.

  • b (numpy array of shape (n, m_b)) – The second matrix.

  • reshape (bool, optional) – Whether to reshape the result to be 2D (n, m_a * m_b) or return a 3D tensor (n, m_a, m_b). Defaults to True.

Raises
  • ValueError – If input arrays are not 2-dimensional.

  • ValueError – If the input arrays do not have the same number of samples (rows).

Returns

numpy.ndarray of shape (n, m_a * m_b) if reshape = True else of shape (n, m_a, m_b).
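Given the documented shapes, each output row i holds every product a[i, j] * b[i, k]. A minimal sketch using numpy.einsum follows. The name tensor_product_sketch is hypothetical, and the column ordering of the 2-D result (index j * m_b + k, from row-major reshaping) is an assumption about how the library flattens the tensor.

```python
import numpy as np


def tensor_product_sketch(a, b, reshape=True):
    """Hypothetical sketch of a row-wise tensor product."""
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("input arrays must be 2-dimensional")
    if a.shape[0] != b.shape[0]:
        raise ValueError("input arrays must have the same number of samples")
    # out[i, j, k] = a[i, j] * b[i, k]: an outer product for every row i.
    out = np.einsum("ij,ik->ijk", a, b)
    if reshape:
        return out.reshape(a.shape[0], a.shape[1] * b.shape[1])
    return out
```

For a = [[1, 2], [3, 4]] and b = [[10, 20, 30], [1, 1, 1]], the first row of the 2-D result is [10, 20, 30, 20, 40, 60], which equals numpy.kron(a[0], b[0]).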

Module contents