feature_encoders package
Subpackages
Submodules
feature_encoders.settings module
feature_encoders.utils module
- feature_encoders.utils.add_constant(data: Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame], prepend=True, has_constant='skip')
Add a column of ones to an array.
- Parameters
data (array-like) – A column-ordered design matrix.
prepend (bool, optional) – If True, the constant is prepended (first column). Otherwise, it is appended (last column). Defaults to True.
has_constant ({'raise', 'add', 'skip'}, optional) – Behavior if data already has a constant. The default will return data without adding another constant. If 'raise', will raise an error if any column has a constant value. Using 'add' will add a column of 1s if a constant column is present. Defaults to 'skip'.
- Returns
The original values with a constant (column of ones).
- Return type
numpy.ndarray
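As a rough illustration of the documented options, the NumPy-only sketch below mimics what prepend and has_constant describe. The name add_constant_sketch and the rule used to detect an existing constant column (a non-zero column whose values are all equal) are assumptions made for this example, not the library's implementation.

```python
import numpy as np


def add_constant_sketch(data, prepend=True, has_constant="skip"):
    """Illustrative stand-in for add_constant; not the library's code."""
    x = np.asarray(data, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as a single column
    # Assumption: a column counts as "constant" when it is non-zero
    # and all of its values are equal (peak-to-peak range of zero).
    is_const = (np.ptp(x, axis=0) == 0) & (x[0] != 0)
    if is_const.any():
        if has_constant == "skip":
            return x
        if has_constant == "raise":
            raise ValueError("data already contains a constant column")
    ones = np.ones((x.shape[0], 1))
    parts = [ones, x] if prepend else [x, ones]
    return np.column_stack(parts)


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
with_const = add_constant_sketch(X)                  # ones in the first column
appended = add_constant_sketch(X, prepend=False)     # ones in the last column
unchanged = add_constant_sketch(with_const)          # 'skip': nothing is added
```

With has_constant='add', a second column of ones would be added to with_const; with 'raise', the sketch raises a ValueError instead.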
- feature_encoders.utils.as_list(val: Any)
Cast input as list.
Helper function, always returns a list of the input value.
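The description leaves open how different input types are handled. The sketch below shows one plausible reading; the name as_list_sketch and the choice to treat a string as a single value rather than a sequence of characters are assumptions, not documented behavior.

```python
def as_list_sketch(val):
    """Hypothetical stand-in for as_list; not the library's code."""
    if isinstance(val, (str, bytes)):
        return [val]  # assumption: a string is one value, not characters
    try:
        return list(val)  # any other iterable is expanded into a list
    except TypeError:
        return [val]  # non-iterable scalars are wrapped


single = as_list_sketch("temperature")
many = as_list_sketch(("temperature", "hour"))
scalar = as_list_sketch(3)
```

A helper of this kind lets a function accept either one column name or several without branching at every call site.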
- feature_encoders.utils.as_series(x: Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame])
Cast an iterable to a pandas Series object.
- feature_encoders.utils.check_X(X: pandas.core.frame.DataFrame, exists=None, int_is_categorical=True, return_col_info=False)
Perform a series of checks on the input dataframe.
- Parameters
X (pandas.DataFrame) – The input dataframe.
exists (str or list of str, optional) – Names of columns that must be present in the input dataframe. Defaults to None.
int_is_categorical (bool, optional) – If True, integer types are considered categorical. Defaults to True.
return_col_info (bool, optional) – If True, the function will return the names of the categorical and the names of the numerical columns, in addition to the provided dataframe. Defaults to False.
- Raises
ValueError – If the input is not a pandas DataFrame.
ValueError – If any of the column names in exists are not found in the input.
ValueError – If NaN or inf values are found in the provided input data.
- Returns
pandas.DataFrame if return_col_info is False else (pandas.DataFrame, list, list)
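The three documented error conditions can be sketched with plain pandas and NumPy. This is a minimal approximation under stated assumptions: check_X_sketch is a hypothetical name, the error messages are invented, and the column-type reporting controlled by return_col_info is left out.

```python
import numpy as np
import pandas as pd


def check_X_sketch(X, exists=None):
    """Illustrative validation only; not the library's code."""
    if not isinstance(X, pd.DataFrame):
        raise ValueError("Input must be a pandas DataFrame")
    # Accept a single column name or a list of names.
    required = [exists] if isinstance(exists, str) else list(exists or [])
    missing = [col for col in required if col not in X.columns]
    if missing:
        raise ValueError(f"Columns not found in input: {missing}")
    # NaN can appear in any column; inf only in numeric ones.
    numeric = X.select_dtypes(include="number")
    if X.isna().any().any() or np.isinf(numeric.to_numpy(dtype=float)).any():
        raise ValueError("Found NaN or inf values in input data")
    return X


df = pd.DataFrame({"temperature": [10.5, 12.0], "site": ["a", "b"]})
checked = check_X_sketch(df, exists="temperature")
```

Passing exists=["humidity"] or a frame containing np.inf makes the sketch raise a ValueError, matching the conditions listed above.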
- feature_encoders.utils.check_y(y: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], index=None)
Perform a series of checks on the input dataframe.
The checks are carried out by sklearn.utils.check_array.
- Parameters
y (Union[pandas.Series, pandas.DataFrame]) – The input dataframe.
index (Union[pandas.Index, pandas.DatetimeIndex], optional) – An index to compare with the input dataframe’s index. Defaults to None.
- Raises
ValueError – If the input is neither a pandas Series nor a pandas DataFrame with only a single column.
ValueError – If the input data has a different index than the one that was provided for comparison (if index is not None).
- Returns
The validated input data.
- Return type
pandas.DataFrame
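The shape and index conditions can be approximated with pandas alone. In the sketch below, check_y_sketch is a hypothetical name, the error messages are invented, and the value checks that the real function delegates to sklearn.utils.check_array are deliberately skipped.

```python
import pandas as pd


def check_y_sketch(y, index=None):
    """Illustrative shape and index checks; not the library's code."""
    if isinstance(y, pd.Series):
        y = y.to_frame()  # the documented return type is a DataFrame
    elif not (isinstance(y, pd.DataFrame) and y.shape[1] == 1):
        raise ValueError("y must be a Series or a single-column DataFrame")
    if index is not None and not y.index.equals(index):
        raise ValueError("y has a different index than the one provided")
    return y


idx = pd.date_range("2021-01-01", periods=3, freq="D")
y = pd.Series([1.0, 2.0, 3.0], index=idx, name="consumption")
validated = check_y_sketch(y, index=idx)
```

Comparing against an index other than idx, or passing a two-column DataFrame, raises a ValueError in this sketch.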
- feature_encoders.utils.get_categorical_cols(X: pandas.core.frame.DataFrame, int_is_categorical=True)
Return the names of the categorical columns in the input DataFrame.
- Parameters
X (pandas.DataFrame) – Input dataframe.
int_is_categorical (bool, optional) – If True, integer types are considered categorical. Defaults to True.
- Returns
The names of categorical columns in the input DataFrame.
- Return type
list
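To make the effect of int_is_categorical concrete, here is a small sketch built on pandas.api.types. The name categorical_cols_sketch and the exact classification rule (booleans and non-numeric columns are categorical, datetime columns are skipped) are assumptions for illustration; the library may classify columns differently.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_integer_dtype,
    is_numeric_dtype,
)


def categorical_cols_sketch(X, int_is_categorical=True):
    """Illustrative classification rule; not the library's code."""
    cols = []
    for name in X.columns:
        dtype = X[name].dtype
        if is_datetime64_any_dtype(dtype):
            continue  # assumption: datetimes are neither kind
        if is_bool_dtype(dtype) or not is_numeric_dtype(dtype):
            cols.append(name)  # booleans, strings, categoricals
        elif int_is_categorical and is_integer_dtype(dtype):
            cols.append(name)  # integers only when requested
    return cols


df = pd.DataFrame(
    {
        "city": ["a", "b"],
        "n_rooms": [1, 2],
        "area": [1.5, 2.5],
        "occupied": [True, False],
    }
)
with_ints = categorical_cols_sketch(df)
without_ints = categorical_cols_sketch(df, int_is_categorical=False)
```

In this example n_rooms is reported as categorical only when int_is_categorical is True, while area is never reported.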
- feature_encoders.utils.get_datetime_data(X: pandas.core.frame.DataFrame, col_name=None)
Get datetime information from the input dataframe.
- Parameters
X (pandas.DataFrame) – The input dataframe.
col_name (str, optional) – The name of the column that contains datetime information. If None, it is assumed that the datetime information is provided by the input dataframe’s index. Defaults to None.
- Returns
The datetime information.
- Return type
pandas.Series
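The two sources of datetime information (the index, or a named column) can be sketched as follows. The name get_datetime_sketch is hypothetical, and any dtype validation the real function may perform is omitted.

```python
import pandas as pd


def get_datetime_sketch(X, col_name=None):
    """Illustrative lookup only; not the library's code."""
    if col_name is None:
        return X.index.to_series()  # fall back to the dataframe's index
    return X[col_name]


idx = pd.date_range("2021-01-01", periods=3, freq="D")
X = pd.DataFrame({"timestamp": idx, "load": [1.0, 2.0, 3.0]}, index=idx)
from_index = get_datetime_sketch(X)
from_column = get_datetime_sketch(X, col_name="timestamp")
```

Either way the result is a pandas Series, so the .dt accessor is available for extracting components such as the day or the hour.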
- feature_encoders.utils.load_config(model='towt', features='default', merge_multiple=False)
Load model configuration and feature generator mapping.
Given model and features, the function searches for files in:
    conf_path = str(CONF_PATH)
    model_files = glob.glob(f"{conf_path}/models/{model}.*")
    feature_files = glob.glob(f"{conf_path}/features/{features}.*")
- Parameters
model (str, optional) – The name of the model configuration to load. Defaults to “towt”.
features (str, optional) – The name of the feature generator mapping to load. Defaults to “default”.
merge_multiple (bool, optional) – If True and more than one file is found when searching for either models or features, the contents of the files will be merged. Otherwise, an exception will be raised. Defaults to False.
- Returns
The model configuration and feature mapping as dictionaries.
- Return type
(dict, dict)
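The file search described above can be tried on a throwaway directory using only the standard library. In this sketch, find_config_files, the .yaml extension, and the exception types are assumptions; parsing and merging the file contents is left out.

```python
import glob
import tempfile
from pathlib import Path


def find_config_files(conf_path, model="towt", features="default",
                      merge_multiple=False):
    """Illustrative file discovery only; not the library's code."""
    model_files = sorted(glob.glob(f"{conf_path}/models/{model}.*"))
    feature_files = sorted(glob.glob(f"{conf_path}/features/{features}.*"))
    for kind, files in (("model", model_files), ("features", feature_files)):
        if not files:
            raise FileNotFoundError(f"No {kind} configuration found")
        if len(files) > 1 and not merge_multiple:
            # Assumption: the documented "exception" is a ValueError here.
            raise ValueError(f"Multiple {kind} files found: {files}")
    return model_files, feature_files


with tempfile.TemporaryDirectory() as tmp:
    # Build the directory layout that the glob patterns expect.
    for sub, name in (("models", "towt.yaml"), ("features", "default.yaml")):
        (Path(tmp) / sub).mkdir()
        (Path(tmp) / sub / name).write_text("{}")
    model_files, feature_files = find_config_files(tmp)
    found = [Path(p).name for p in model_files + feature_files]
```

Because the pattern ends in ".*", two files that share a stem but differ in extension both match, which is the situation merge_multiple addresses.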
- feature_encoders.utils.maybe_reshape_2d(arr: numpy.ndarray)
Reshape an array (if needed) so it’s always 2-d and long.
- Parameters
arr (numpy.ndarray) – The input array.
- Returns
The reshaped array.
- Return type
numpy.ndarray
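A minimal sketch of this kind of reshaping, assuming that "long" means a 1-d input becomes a single column of shape (n, 1) and that inputs with two or more dimensions pass through unchanged. The name reshape_2d_sketch is hypothetical.

```python
import numpy as np


def reshape_2d_sketch(arr):
    """Illustrative reshaping rule; not the library's code."""
    arr = np.asarray(arr)
    if arr.ndim < 2:
        return arr.reshape(-1, 1)  # one column, as many rows as values
    return arr


column = reshape_2d_sketch(np.array([1.0, 2.0, 3.0]))
matrix = reshape_2d_sketch(np.ones((3, 2)))
```

Normalizing shapes this way lets downstream code index columns without special-casing 1-d inputs.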
- feature_encoders.utils.tensor_product(a: numpy.ndarray, b: numpy.ndarray, reshape=True)
Compute the tensor product of two matrices.
- Parameters
a (numpy array of shape (n, m_a)) – The first matrix.
b (numpy array of shape (n, m_b)) – The second matrix.
reshape (bool, optional) – Whether to reshape the result to be 2D (n, m_a * m_b) or return a 3D tensor (n, m_a, m_b). Defaults to True.
- Raises
ValueError – If input arrays are not 2-dimensional.
ValueError – If the input arrays do not have the same number of samples.
- Returns
numpy.ndarray of shape (n, m_a * m_b) if reshape = True else of shape (n, m_a, m_b).
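The shapes above correspond to a row-wise tensor product: each sample's row in a is multiplied against the same sample's row in b. The sketch below reproduces those shapes with numpy.einsum. The name tensor_product_sketch and the column ordering of the flattened result (columns of a vary slowest) are assumptions, not confirmed library behavior.

```python
import numpy as np


def tensor_product_sketch(a, b, reshape=True):
    """Illustrative row-wise tensor product; not the library's code."""
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("Inputs must be 2-dimensional")
    if a.shape[0] != b.shape[0]:
        raise ValueError("Inputs must have the same number of samples")
    # out[i, j, k] = a[i, j] * b[i, k], giving shape (n, m_a, m_b)
    out = np.einsum("ij,ik->ijk", a, b)
    return out.reshape(a.shape[0], -1) if reshape else out


a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 0, 2], [0, 1, 3]])
flat = tensor_product_sketch(a, b)                  # shape (2, 6)
cube = tensor_product_sketch(a, b, reshape=False)   # shape (2, 2, 3)
```

Here the first row of flat is [1, 0, 2, 2, 0, 4], the outer product of [1, 2] and [1, 0, 2] flattened row by row.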