feature_encoders.models package

Submodules

feature_encoders.models.grouped module

class feature_encoders.models.grouped.GroupedPredictor(*, group_feature: str, model_conf: Dict[str, Dict], feature_conf: Optional[Dict[str, Dict]] = None, estimator_params=(), fallback=False)

Bases: sklearn.base.RegressorMixin, sklearn.base.BaseEstimator

Construct one predictor per data group.

The predictor splits the data by the distinct values of a single column and fits one estimator per group. Since each model in the ensemble predicts on a different subset of the input data (an observation cannot belong to more than one group), the final prediction is generated by vertically concatenating the individual models' predictions.
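The split-fit-concatenate scheme described above can be sketched with plain pandas and scikit-learn. This is a minimal illustration under stated assumptions, not the library's implementation: the helpers fit_grouped and predict_grouped are hypothetical, and a scikit-learn Ridge(alpha=0.01) stands in for the base estimator that model_conf would normally describe.

```python
import pandas as pd
from sklearn.linear_model import Ridge


def fit_grouped(X: pd.DataFrame, y: pd.Series, group_feature: str) -> dict:
    """Fit one model per distinct value of `group_feature` (hypothetical helper)."""
    models = {}
    for group, X_group in X.groupby(group_feature):
        features = X_group.drop(columns=group_feature)
        # Assumption: Ridge stands in for the base model defined by model_conf.
        models[group] = Ridge(alpha=0.01).fit(features, y.loc[X_group.index])
    return models


def predict_grouped(models: dict, X: pd.DataFrame, group_feature: str) -> pd.Series:
    """Predict every group with its own model and stack the pieces (hypothetical helper)."""
    parts = []
    for group, X_group in X.groupby(group_feature):
        features = X_group.drop(columns=group_feature)
        parts.append(pd.Series(models[group].predict(features), index=X_group.index))
    # Groups are disjoint, so vertical concatenation covers every row exactly once;
    # reindex restores the caller's row order.
    return pd.concat(parts).reindex(X.index)


X = pd.DataFrame({"group": ["a", "a", "b", "b"], "x": [0.0, 1.0, 0.0, 1.0]})
y = pd.Series([0.0, 1.0, 10.0, 8.0])
models = fit_grouped(X, y, "group")
preds = predict_grouped(models, X, "group")
print(preds)
```

Because each row is routed to exactly one model, the stacked output has the same length as the input. In this sketch a row whose group has no fitted model would raise a KeyError, which is the situation the fallback parameter addresses.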

Parameters
  • group_feature (str) – The name of the column of the input dataframe to use as the grouping set.

  • model_conf (Dict[str, Dict]) – A dictionary that includes information about the base model’s structure.

  • feature_conf (Dict[str, Dict], optional) – A dictionary that maps feature generator names to the classes for the generators’ validation and creation. Defaults to None.

  • estimator_params (dict or tuple of tuples, optional) – The parameters to use when instantiating a new base estimator. If none are given, default parameters are used. Defaults to tuple().

  • fallback (bool, optional) – Whether to fall back to a global model when a group encountered during .predict() has no fitted model. If False, an exception is raised instead. Defaults to False.
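The estimator_params and fallback parameters can be illustrated together. The sketch below assumes that a tuple of (name, value) pairs is simply converted with dict(...) before being passed to the estimator, and that the global fallback model is one fitted on all rows regardless of group; both are assumptions, as are the hypothetical helper names make_estimator and predict_with_fallback and the use of Ridge as the base estimator.

```python
import pandas as pd
from sklearn.linear_model import Ridge


def make_estimator(estimator_params=()):
    """Build a base estimator; dict() accepts a dict or a tuple of (name, value) tuples."""
    return Ridge(**dict(estimator_params))  # Ridge is an assumed stand-in


def predict_with_fallback(models, global_model, X, group_feature, fallback=False):
    """Route each group to its own model, or to `global_model` for unseen groups."""
    out = pd.Series(index=X.index, dtype=float)
    for group, X_group in X.groupby(group_feature):
        features = X_group.drop(columns=group_feature)
        model = models.get(group)
        if model is None:
            if not fallback:
                raise ValueError(f"No model was fitted for group {group!r}")
            model = global_model  # hypothetical model trained on every row
        out.loc[X_group.index] = model.predict(features)
    return out


train = pd.DataFrame({"group": ["a", "a", "b", "b"], "x": [0.0, 1.0, 0.0, 1.0]})
y = pd.Series([0.0, 1.0, 10.0, 8.0])
params = (("alpha", 0.01),)  # equivalent to {"alpha": 0.01}
models = {
    g: make_estimator(params).fit(d[["x"]], y.loc[d.index])
    for g, d in train.groupby("group")
}
global_model = make_estimator(params).fit(train[["x"]], y)

new = pd.DataFrame({"group": ["a", "c"], "x": [0.5, 0.5]})  # "c" was never seen
preds = predict_with_fallback(models, global_model, new, "group", fallback=True)
```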

property dof
fit(X: pandas.core.frame.DataFrame, y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series])

Fit the estimator with the available data.

Parameters
  • X (pandas.DataFrame) – Input data.

  • y (pandas.Series or pandas.DataFrame) – Target data.

Raises
  • Exception – If the estimator is re-fitted. An estimator object can only be fitted once.

  • ValueError – If the input data does not pass the checks of utils.check_X.

  • ValueError – If the target data does not pass the checks of utils.check_y.

Returns

Fitted estimator.

Return type

GroupedPredictor
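The fit-once rule means that a second call to fit on the same object raises. The toy class below only mimics that guard (its fitted_ flag and constant-mean model are hypothetical, not GroupedPredictor's internals) and shows one way to refit: create an unfitted copy with sklearn.base.clone, which should work for estimators that follow the scikit-learn get_params convention, as BaseEstimator subclasses normally do.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone


class FitOnceRegressor(RegressorMixin, BaseEstimator):
    """Toy estimator that mimics a fit-once guard (hypothetical)."""

    def fit(self, X, y):
        if getattr(self, "fitted_", False):
            raise Exception("This estimator can only be fitted once")
        self.mean_ = float(np.mean(y))  # trivial "model": the mean of the target
        self.fitted_ = True
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


X = np.zeros((4, 1))
est = FitOnceRegressor().fit(X, [1.0, 2.0, 3.0, 4.0])
try:
    est.fit(X, [0.0, 0.0, 0.0, 0.0])  # second fit on the same object
except Exception as exc:
    print("refit rejected:", exc)

fresh = clone(est).fit(X, [0.0, 0.0, 0.0, 0.0])  # clone returns an unfitted copy
```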

property n_parameters
predict(X: pandas.core.frame.DataFrame, include_clusters=False, include_components=False)

Predict given new input data.

Parameters
  • X (pandas.DataFrame) – Input data.

  • include_clusters (bool, optional) – Whether to include the added clusters in the returned prediction. Defaults to False.

  • include_components (bool, optional) – Whether to include the contribution of the individual components of the model structure in the returned prediction. Defaults to False.

Raises

ValueError – If the input data does not pass the checks of utils.check_X.

Returns

The predicted values.

Return type

pandas.DataFrame

feature_encoders.models.linear module

class feature_encoders.models.linear.LinearPredictor(*, model_structure: feature_encoders.compose._compose.ModelStructure, alpha=0.01, fit_intercept=False)

Bases: sklearn.base.RegressorMixin, sklearn.base.BaseEstimator

A linear regression model with flexible parameterization.

Parameters
  • model_structure (ModelStructure) – The structure of a linear regression model.

  • alpha (float, optional) – Regularization strength of the underlying ridge regression; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Defaults to 0.01.

  • fit_intercept (bool, optional) – Whether to fit the intercept for this model. If set to False, no intercept will be used in calculations. Defaults to False.
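Larger alpha values shrink the fitted coefficients toward zero. The sketch below shows this with scikit-learn's Ridge on toy data; it assumes LinearPredictor's penalty behaves like Ridge with the same alpha and fit_intercept settings, and it does not construct a LinearPredictor, since that requires a ModelStructure.

```python
import numpy as np
from sklearn.linear_model import Ridge

x = np.arange(10.0).reshape(-1, 1)
y = 2.0 * x.ravel()  # exact slope 2, no intercept

coefs = {}
for alpha in (0.01, 1.0, 100.0):
    # fit_intercept=False mirrors LinearPredictor's default
    model = Ridge(alpha=alpha, fit_intercept=False).fit(x, y)
    coefs[alpha] = float(model.coef_[0])
    print(f"alpha={alpha:>6}: coef={coefs[alpha]:.4f}")
```

For a single feature without intercept the ridge solution is coef = Σxy / (Σx² + α). Here Σx² = 285 and Σxy = 570, so the slope falls from almost 2 at α = 0.01 to 570 / 385 ≈ 1.48 at α = 100.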

property dof
fit(X: pandas.core.frame.DataFrame, y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series])

Fit the estimator with the available data.

Parameters
  • X (pandas.DataFrame) – Input data.

  • y (pandas.Series or pandas.DataFrame) – Target data.

Raises
  • Exception – If the estimator is re-fitted. An estimator object can only be fitted once.

  • ValueError – If the input data does not pass the checks of utils.check_X.

  • ValueError – If the target data does not pass the checks of utils.check_y.

Returns

Fitted estimator.

Return type

LinearPredictor

property n_parameters
predict(X: pandas.core.frame.DataFrame, include_components=False)

Predict using the given input data.

Parameters
  • X (pandas.DataFrame) – Input data.

  • include_components (bool, optional) – If True, the prediction dataframe also includes the individual components' contributions to the predicted values. Defaults to False.

Returns

The prediction.

Return type

pandas.DataFrame
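For a linear model, a per-component breakdown like the one include_components refers to can be computed by multiplying each group of columns by its share of the coefficients; the pieces then add up to the prediction. The sketch assumes a plain Ridge fit without intercept, and the helper predict_with_components, the component names and the column grouping are hypothetical, not the layout LinearPredictor actually returns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge


def predict_with_components(model, X: pd.DataFrame, components: dict) -> pd.DataFrame:
    """Return each component's contribution plus the total prediction (hypothetical helper)."""
    coef = pd.Series(model.coef_, index=X.columns)
    result = pd.DataFrame(index=X.index)
    for name, cols in components.items():
        # contribution of one component = its columns times its coefficients
        result[name] = X[cols].to_numpy() @ coef[cols].to_numpy()
    result["total"] = model.predict(X)
    return result


rng = np.random.default_rng(0)
X = pd.DataFrame({"t": np.arange(20.0), "temp": rng.normal(size=20), "rh": rng.normal(size=20)})
y = 0.5 * X["t"] + 2.0 * X["temp"] - 1.0 * X["rh"]

model = Ridge(alpha=0.01, fit_intercept=False).fit(X, y)
components = {"trend": ["t"], "weather": ["temp", "rh"]}
out = predict_with_components(model, X, components)
```

With fit_intercept=True the intercept would be one more additive term that belongs to no component.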

feature_encoders.models.seasonal module

class feature_encoders.models.seasonal.SeasonalPredictor(ds: Optional[str] = None, add_trend: bool = False, yearly_seasonality: Union[str, bool, int] = 'auto', weekly_seasonality: Union[str, bool, int] = 'auto', daily_seasonality: Union[str, bool, int] = 'auto', min_samples=0.5, alpha=0.01)

Bases: sklearn.base.BaseEstimator

Time series prediction model based on seasonal decomposition.

Parameters
  • ds (str, optional) – The name of the input dataframe’s column that contains datetime information. If None, it is assumed that the datetime information is provided by the input dataframe’s index. Defaults to None.

  • add_trend (bool, optional) – If True, a linear time trend will be added. Defaults to False.

  • yearly_seasonality (Union[str, bool, int], optional) – Fit yearly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to “auto”.

  • weekly_seasonality (Union[str, bool, int], optional) – Fit weekly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to “auto”.

  • daily_seasonality (Union[str, bool, int], optional) – Fit daily seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to “auto”.

  • min_samples (float ([0, 1]), optional) – Minimum number of samples chosen randomly from the original data by the RANSAC (RANdom SAmple Consensus) algorithm, given as a fraction of the total number of samples. Defaults to 0.5.

  • alpha (float, optional) – Regularization strength of the underlying ridge estimator (base_estimator); must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Defaults to 0.01.
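The seasonality and robustness parameters can be illustrated with a common construction: a seasonality of period P days and Fourier order N contributes the 2N columns sin(2πkt/P) and cos(2πkt/P) for k = 1..N, and the coefficients are estimated with RANSAC around a ridge model. This is a sketch under assumptions, not SeasonalPredictor's code: the helper fourier_terms, the Unix epoch as time origin and the orders chosen here are illustrative, and scikit-learn's RANSACRegressor is assumed to play the role of the RANSAC step.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import RANSACRegressor, Ridge


def fourier_terms(index: pd.DatetimeIndex, period_days: float, order: int) -> np.ndarray:
    """Return 2 * order sine/cosine columns for one seasonality (hypothetical helper)."""
    # Time in days since an assumed origin (the Unix epoch).
    t = np.asarray((index - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1), dtype=float)
    cols = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period_days
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


index = pd.date_range("2021-01-04", periods=24 * 28, freq="60min")  # four weeks, hourly
daily = fourier_terms(index, period_days=1.0, order=4)   # 8 columns
weekly = fourier_terms(index, period_days=7.0, order=3)  # 6 columns
features = np.hstack([daily, weekly])

rng = np.random.default_rng(0)
y = 3.0 * daily[:, 0] + 1.0 * weekly[:, 1] + rng.normal(scale=0.1, size=len(index))
y[::50] += 20.0  # a few gross outliers

# min_samples=0.5: each RANSAC trial fits on half of the rows.
ransac = RANSACRegressor(Ridge(alpha=0.01), min_samples=0.5, random_state=0)
ransac.fit(features, y)
print(ransac.estimator_.coef_[[0, 9]])  # expected to be close to [3, 1]
```

Column 0 is the first daily sine and column 9 the first weekly cosine, the two terms used to generate y; the final model is refitted on the consensus set, so the injected outliers have little influence on it.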

add_seasonality(name: str, period: Optional[float] = None, fourier_order: Optional[int] = None, condition_name: Optional[str] = None)

Add a seasonal component with specified period and number of Fourier components.

If condition_name is provided, the input dataframe passed to fit and predict should have a column with the specified condition_name containing booleans that indicate when to apply seasonality.
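A conditional seasonality is commonly implemented by zeroing its Fourier columns wherever the condition is False, so that the component only contributes on the flagged rows. The sketch assumes that approach; the helper conditional_fourier, the column-naming scheme and the example column is_weekday are hypothetical.

```python
import numpy as np
import pandas as pd


def conditional_fourier(X: pd.DataFrame, period_days: float, order: int,
                        condition_name: str) -> pd.DataFrame:
    """Fourier columns for one seasonality, switched off where the condition is False."""
    if condition_name not in X.columns:
        raise ValueError(f"Condition column {condition_name!r} not found in input")
    mask = X[condition_name].to_numpy(dtype=bool)
    # Time in days since an assumed origin (the Unix epoch).
    t = np.asarray((X.index - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1), dtype=float)
    out = {}
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period_days
        out[f"{condition_name}_sin{k}"] = np.where(mask, np.sin(angle), 0.0)
        out[f"{condition_name}_cos{k}"] = np.where(mask, np.cos(angle), 0.0)
    return pd.DataFrame(out, index=X.index)


index = pd.date_range("2021-01-04", periods=24 * 7, freq="60min")  # one week, hourly
X = pd.DataFrame({"is_weekday": index.dayofweek < 5}, index=index)
feats = conditional_fourier(X, period_days=1.0, order=2, condition_name="is_weekday")
```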

Parameters
  • name (str) – The name of the seasonality component.

  • period (float, optional) – Number of days in one period. Defaults to None.

  • fourier_order (int, optional) – Number of Fourier components to use. Defaults to None.

  • condition_name (str, optional) – The name of the seasonality condition. Defaults to None.

Raises
  • Exception – If the method is called after the estimator is fitted.

  • ValueError – If period or fourier_order is not provided and name is not one of (‘daily’, ‘weekly’, ‘yearly’).

Returns

The updated estimator object.

Return type

SeasonalPredictor
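The ValueError rule above can be read as: period and fourier_order may be omitted only for the built-in seasonality names. A sketch of that resolution step follows; the periods of 1, 7 and 365.25 days follow from the names, while the default Fourier orders (4, 3 and 10) are an assumption borrowed from Prophet's documented defaults, and resolve_seasonality is a hypothetical helper.

```python
# Assumed defaults: (period in days, Fourier order). Orders are borrowed from Prophet.
BUILTIN_SEASONALITIES = {
    "daily": (1.0, 4),
    "weekly": (7.0, 3),
    "yearly": (365.25, 10),
}


def resolve_seasonality(name, period=None, fourier_order=None):
    """Fill in a missing period or fourier_order, or raise for unknown names."""
    if period is None or fourier_order is None:
        if name not in BUILTIN_SEASONALITIES:
            raise ValueError(
                f"Seasonality {name!r} needs both period and fourier_order"
            )
        default_period, default_order = BUILTIN_SEASONALITIES[name]
        period = default_period if period is None else period
        fourier_order = default_order if fourier_order is None else fourier_order
    return float(period), int(fourier_order)


print(resolve_seasonality("weekly"))           # (7.0, 3)
print(resolve_seasonality("monthly", 30.5, 5))  # (30.5, 5)
```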

fit(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame)

Fit the estimator with the available data.

Parameters
  • X (pandas.DataFrame) – Input data.

  • y (pandas.DataFrame) – Target data.

Raises
  • Exception – If the estimator is re-fitted. An estimator object can only be fitted once.

  • ValueError – If the input data does not pass the checks of utils.check_X.

  • ValueError – If the target data does not pass the checks of utils.check_y.

Returns

Fitted estimator.

Return type

SeasonalPredictor

predict(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Predict using the given input data.

Parameters

X (pandas.DataFrame) – Input data.

Returns

The prediction.

Return type

pandas.DataFrame

Module contents