feature_encoders.models package

Submodules

feature_encoders.models.grouped module

class feature_encoders.models.grouped.GroupedPredictor(*, group_feature: str, model_conf: Dict[str, Dict], feature_conf: Optional[Dict[str, Dict]] = None, estimator_params=(), fallback=False)[source]

Bases: sklearn.base.RegressorMixin, sklearn.base.BaseEstimator

Construct one predictor per data group.

The predictor splits the data by the distinct values of a single column and fits one estimator per group. Because each model in the ensemble predicts on a different subset of the input data (an observation cannot belong to more than one cluster), the final prediction is generated by vertically concatenating the individual models’ predictions.
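The split, fit and concatenate idea described above can be sketched with plain pandas and NumPy. This is only an illustration under stated assumptions, not the library's implementation: the helper names fit_per_group and predict_per_group, the column names and the ordinary least-squares base model are all hypothetical.

```python
import numpy as np
import pandas as pd


def _design(X_group: pd.DataFrame, group_feature: str) -> np.ndarray:
    """Build [1, features...] for one group, dropping the grouping column."""
    features = X_group.drop(columns=group_feature).to_numpy(dtype=float)
    return np.column_stack([np.ones(len(features)), features])


def fit_per_group(X: pd.DataFrame, y: pd.Series, group_feature: str) -> dict:
    """Fit one least-squares model per distinct value of `group_feature`."""
    models = {}
    for key, X_group in X.groupby(group_feature):
        target = y.loc[X_group.index].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(_design(X_group, group_feature), target, rcond=None)
        models[key] = coef
    return models


def predict_per_group(X: pd.DataFrame, models: dict, group_feature: str) -> pd.Series:
    """Predict each group with its own model, then concatenate vertically."""
    parts = []
    for key, X_group in X.groupby(group_feature):
        values = _design(X_group, group_feature) @ models[key]
        parts.append(pd.Series(values, index=X_group.index))
    # Every row belongs to exactly one group, so the pieces never overlap.
    return pd.concat(parts).reindex(X.index)


# Hypothetical data: two clusters that follow different linear relations.
X = pd.DataFrame(
    {"cluster": [0, 0, 0, 1, 1, 1], "temperature": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]}
)
y = pd.Series([2.0, 4.0, 6.0, 10.0, 9.0, 8.0])

models = fit_per_group(X, y, "cluster")
prediction = predict_per_group(X, models, "cluster")
```

A single global line could not fit both clusters here, while the per-group models reproduce each relation exactly.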

Parameters

group_feature (str) – The name of the column of the input dataframe to use as the grouping set.
model_conf (Dict[str, Dict]) – A dictionary that includes information about the base model’s structure.
feature_conf (Dict[str, Dict], optional) – A dictionary that maps feature generator names to the classes for the generators’ validation and creation. Defaults to None.
estimator_params (dict or tuple of tuples, optional) – The parameters to use when instantiating a new base estimator. If none are given, default parameters are used. Defaults to tuple().
fallback (bool, optional) – Whether to fall back to a global model when a group encountered during .predict() has no fitted model of its own. If False, an exception is raised instead. Defaults to False.
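The fallback behaviour can be pictured as a dictionary lookup with an optional default. The sketch below is a guess at the general pattern, not the library's code: select_model, group_models and global_model are hypothetical names, and the exception type raised here (KeyError) is an assumption, since the documentation only states that an exception is raised.

```python
def select_model(group, group_models, global_model=None, fallback=False):
    """Return the model for `group`, or the global model if fallback is allowed."""
    try:
        return group_models[group]
    except KeyError:
        if fallback and global_model is not None:
            return global_model
        # Assumption: the real exception type is not documented.
        raise KeyError(f"No model was fitted for group {group!r}")


# Hypothetical per-group models represented as simple callables.
group_models = {"a": lambda x: 2.0 * x, "b": lambda x: -1.0 * x}
global_model = lambda x: 0.5 * x

seen = select_model("a", group_models, global_model, fallback=True)
unseen = select_model("c", group_models, global_model, fallback=True)
```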

property dof

fit(X: pandas.core.frame.DataFrame, y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series])[source]

Fit the estimator with the available data.

Parameters

X (pandas.DataFrame) – Input data.
y (pandas.Series or pandas.DataFrame) – Target data.

Raises

Exception – If the estimator is re-fitted. An estimator object can only be fitted once.
ValueError – If the input data does not pass the checks of utils.check_X.
ValueError – If the target data does not pass the checks of utils.check_y.
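The fit-once rule listed under Raises can be illustrated with a toy estimator. This is a minimal sketch of the pattern only: the class FitOnceMean, the fitted_ flag and the error message are hypothetical and are not taken from the library.

```python
class FitOnceMean:
    """Toy estimator that predicts the training mean and refuses to be re-fitted."""

    def __init__(self):
        self.fitted_ = False

    def fit(self, X, y):
        if self.fitted_:
            # Mirrors the documented behaviour: a second fit raises.
            raise Exception("Estimator object can only be fitted once.")
        self.mean_ = sum(y) / len(y)
        self.fitted_ = True
        return self  # returning self allows estimator.fit(X, y).predict(X)

    def predict(self, X):
        return [self.mean_ for _ in X]


model = FitOnceMean().fit([[1], [2], [3]], [1.0, 2.0, 3.0])
prediction = model.predict([[4], [5]])

try:
    model.fit([[1]], [1.0])
    refit_allowed = True
except Exception:
    refit_allowed = False
```

To train on new data, instantiate a fresh estimator with the same parameters instead of calling fit again on the fitted one.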

Returns

Fitted estimator.

Return type

GroupedPredictor

property n_parameters

predict(X: pandas.core.frame.DataFrame, include_clusters=False, include_components=False)[source]

Predict given new input data.

Parameters

X (pandas.DataFrame) – Input data.
include_clusters (bool, optional) – Whether to include the added clusters in the returned prediction. Defaults to False.
include_components (bool, optional) – Whether to include the contribution of the individual components of the model structure in the returned prediction. Defaults to False.

Raises

ValueError – If the input data does not pass the checks of utils.check_X.

Returns

The predicted values.

Return type

pandas.DataFrame

feature_encoders.models.linear module

class feature_encoders.models.linear.LinearPredictor(*, model_structure: feature_encoders.compose._compose.ModelStructure, alpha=0.01, fit_intercept=False)[source]

Bases: sklearn.base.RegressorMixin, sklearn.base.BaseEstimator

A linear regression model with flexible parameterization.

Parameters

model_structure (ModelStructure) – The structure of a linear regression model.
alpha (float, optional) – Regularization strength of the underlying ridge regression; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Defaults to 0.01.
fit_intercept (bool, optional) – Whether to fit the intercept for this model. If set to false, no intercept will be used in calculations. Defaults to False.
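The effect of alpha can be seen in the closed-form ridge solution w = (XᵀX + alpha·I)⁻¹ Xᵀy. The sketch below solves it with NumPy and without an intercept, matching the fit_intercept=False default. It is a generic illustration of ridge regression, not the library's solver, and ridge_coefficients is a hypothetical helper.

```python
import numpy as np


def ridge_coefficients(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Solve min_w ||X w - y||^2 + alpha * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic data with known coefficients and a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

weak = ridge_coefficients(X, y, alpha=0.01)     # close to the true coefficients
strong = ridge_coefficients(X, y, alpha=100.0)  # shrunk towards zero
```

Adding alpha to the diagonal of XᵀX is what improves the conditioning of the problem, and the shrinkage of the coefficients is what reduces their variance.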

property dof

fit(X: pandas.core.frame.DataFrame, y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series])[source]

Fit the estimator with the available data.

Parameters

X (pandas.DataFrame) – Input data.
y (pandas.Series or pandas.DataFrame) – Target data.

Raises

Exception – If the estimator is re-fitted. An estimator object can only be fitted once.
ValueError – If the input data does not pass the checks of utils.check_X.
ValueError – If the target data does not pass the checks of utils.check_y.

Returns

Fitted estimator.

Return type

LinearPredictor

property n_parameters

predict(X: pandas.core.frame.DataFrame, include_components=False)[source]

Predict using the given input data.

Parameters

X (pandas.DataFrame) – Input data.
include_components (bool, optional) – If True, the prediction dataframe will also include the individual components’ contributions to the predicted values. Defaults to False.
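For a linear model, the contribution of a component is the part of the prediction produced by that component's columns, and the contributions add up to the prediction. The following sketch shows that decomposition under assumptions: the design matrix, the coefficient values, the component grouping and the output column name "prediction" are all hypothetical, and the layout actually returned by the library may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix: one column per encoded feature.
design = pd.DataFrame(
    {
        "trend": [0.0, 1.0, 2.0],
        "temp_low": [1.0, 0.0, 0.0],
        "temp_high": [0.0, 1.0, 1.0],
    }
)
# Hypothetical fitted coefficients, one per design column.
coef = pd.Series({"trend": 0.5, "temp_low": 2.0, "temp_high": -1.0})
# Which design columns belong to which model component.
component_columns = {"trend": ["trend"], "temperature": ["temp_low", "temp_high"]}

# Contribution of a component = its columns times their coefficients.
components = pd.DataFrame(
    {name: design[cols] @ coef[cols] for name, cols in component_columns.items()}
)
prediction = components.sum(axis=1).to_frame("prediction")
result = pd.concat([prediction, components], axis=1)
```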

Returns

The prediction.

Return type

pandas.DataFrame

feature_encoders.models.seasonal module

class feature_encoders.models.seasonal.SeasonalPredictor(ds: Optional[str] = None, add_trend: bool = False, yearly_seasonality: Union[str, bool, int] = 'auto', weekly_seasonality: Union[str, bool, int] = 'auto', daily_seasonality: Union[str, bool, int] = 'auto', min_samples=0.5, alpha=0.01)[source]

Bases: sklearn.base.BaseEstimator

Time series prediction model based on seasonal decomposition.

Parameters

ds (str, optional) – The name of the input dataframe’s column that contains datetime information. If None, it is assumed that the datetime information is provided by the input dataframe’s index. Defaults to None.
add_trend (bool, optional) – If True, a linear time trend will be added. Defaults to False.
yearly_seasonality (Union[str, bool, int], optional) – Fit yearly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to ‘auto’.
weekly_seasonality (Union[str, bool, int], optional) – Fit weekly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to ‘auto’.
daily_seasonality (Union[str, bool, int], optional) – Fit daily seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to ‘auto’.
min_samples (float in [0, 1], optional) – Minimum number of samples, expressed as a fraction of the original data, chosen randomly by the RANSAC (RANdom SAmple Consensus) algorithm. Defaults to 0.5.
alpha (float, optional) – Parameter for the underlying ridge estimator (base_estimator). It must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Defaults to 0.01.
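The roles of min_samples and alpha can be illustrated with scikit-learn's RANSACRegressor wrapped around a Ridge estimator. This is a sketch under the assumption that the predictor combines the two in roughly this way, and the synthetic data are invented for the example. The base estimator is passed positionally because its keyword name differs between scikit-learn versions (base_estimator in older releases, estimator in newer ones).

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor, Ridge

# Synthetic data: a clean line y = 2x plus five gross outliers.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
y[:5] += 50.0

# min_samples=0.5 draws half of the rows for every random trial;
# alpha is the regularization strength of the ridge base estimator.
ransac = RANSACRegressor(Ridge(alpha=0.01), min_samples=0.5, random_state=0)
ransac.fit(X, y)

slope = ransac.estimator_.coef_[0]        # refit on the consensus set only
outliers_kept = ransac.inlier_mask_[:5]   # flags for the five corrupted rows
```

The final estimator is refitted on the inliers only, so the corrupted rows do not pull the slope away from the underlying relation.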

add_seasonality(name: str, period: Optional[float] = None, fourier_order: Optional[int] = None, condition_name: Optional[str] = None)[source]

Add a seasonal component with specified period and number of Fourier components.

If condition_name is provided, the input dataframe passed to fit and predict should have a column with the specified condition_name containing booleans that indicate when to apply seasonality.

Parameters

name (str) – The name of the seasonality component.
period (float, optional) – Number of days in one period. Defaults to None.
fourier_order (int, optional) – Number of Fourier components to use. Defaults to None.
condition_name (str, optional) – The name of the seasonality condition. Defaults to None.
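A seasonal component with a given period and fourier_order is commonly encoded as pairs of sine and cosine terms, and a condition column switches those terms off where it is False. The sketch below shows that construction under assumptions: fourier_features and apply_condition are hypothetical helpers, the column naming is invented, and the library may build its features differently.

```python
import numpy as np
import pandas as pd


def fourier_features(
    dates: pd.DatetimeIndex, period: float, fourier_order: int, name: str
) -> pd.DataFrame:
    """Return 2 * fourier_order sine/cosine columns for a period given in days."""
    # Time in days since an arbitrary origin (timezone-naive dates assumed).
    t = np.asarray((dates - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1), dtype=float)
    columns = {}
    for k in range(1, fourier_order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns[f"{name}_sin_{k}"] = np.sin(angle)
        columns[f"{name}_cos_{k}"] = np.cos(angle)
    return pd.DataFrame(columns, index=dates)


def apply_condition(features: pd.DataFrame, condition: pd.Series) -> pd.DataFrame:
    """Zero out the seasonal features on rows where the condition is False."""
    return features.mul(condition.astype(float), axis=0)


dates = pd.date_range("2021-01-04", periods=14, freq="D")
weekly = fourier_features(dates, period=7, fourier_order=2, name="weekly")

# Boolean condition column: apply the weekly pattern on weekdays only.
is_weekday = pd.Series(dates.dayofweek < 5, index=dates)
weekday_weekly = apply_condition(weekly, is_weekday)
```

A higher fourier_order adds higher-frequency terms, which lets the component follow sharper seasonal shapes at the cost of more parameters.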

Raises

Exception – If the method is called after the estimator is fitted.
ValueError – If period or fourier_order is not provided and the seasonality name is not one of (‘daily’, ‘weekly’, ‘yearly’).

Returns

The updated estimator object.

Return type

SeasonalPredictor

fit(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame)[source]

Fit the estimator with the available data.

Parameters

X (pandas.DataFrame) – Input data.
y (pandas.DataFrame) – Target data.

Raises

Exception – If the estimator is re-fitted. An estimator object can only be fitted once.
ValueError – If the input data does not pass the checks of utils.check_X.
ValueError – If the target data does not pass the checks of utils.check_y.

Returns

Fitted estimator.

Return type

SeasonalPredictor

predict(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Predict using the given input data.

Parameters

X (pandas.DataFrame) – Input data.

Returns

The prediction.

Return type

pandas.DataFrame
