feature_encoders.models package
Submodules
feature_encoders.models.grouped module
- class feature_encoders.models.grouped.GroupedPredictor(*, group_feature: str, model_conf: Dict[str, Dict], feature_conf: Optional[Dict[str, Dict]] = None, estimator_params=(), fallback=False)[source]
Bases:
sklearn.base.RegressorMixin, sklearn.base.BaseEstimator
Construct one predictor per data group.
The predictor splits the data by the distinct values of a single column and fits one estimator per group. Since each model in the ensemble predicts on a different subset of the input data (an observation cannot belong to more than one group), the final prediction is generated by vertically concatenating the individual models’ predictions.
- Parameters
group_feature (str) – The name of the column of the input dataframe to use as the grouping set.
model_conf (Dict[str, Dict]) – A dictionary that includes information about the base model’s structure.
feature_conf (Dict[str, Dict], optional) – A dictionary that maps feature generator names to the classes for the generators’ validation and creation. Defaults to None.
estimator_params (dict or tuple of tuples, optional) – The parameters to use when instantiating a new base estimator. If none are given, default parameters are used. Defaults to tuple().
fallback (bool, optional) – Whether to fall back to a global model when a group value encountered during .predict() was not seen during .fit(). If False, an exception is raised instead. Defaults to False.
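The split-fit-concatenate idea described above can be sketched in a few lines. This is a minimal illustration, not the GroupedPredictor implementation: `SimpleGroupedRegressor` is a hypothetical class, scikit-learn's `Ridge` stands in for the base model that GroupedPredictor builds from model_conf and feature_conf, and the fallback is assumed to be a single global model fitted on all rows.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import Ridge


class SimpleGroupedRegressor:
    """Hypothetical grouped regressor; NOT the feature_encoders GroupedPredictor."""

    def __init__(self, group_feature, estimator, fallback=False):
        self.group_feature = group_feature
        self.estimator = estimator  # stand-in base model (assumption)
        self.fallback = fallback

    def fit(self, X: pd.DataFrame, y: pd.Series):
        features = X.drop(columns=[self.group_feature])
        self.models_ = {}
        # Fit an independent copy of the base estimator on each group's rows.
        for value, idx in X.groupby(self.group_feature).groups.items():
            self.models_[value] = clone(self.estimator).fit(features.loc[idx], y.loc[idx])
        # Assumed fallback: one global model trained on all rows.
        self.global_model_ = clone(self.estimator).fit(features, y) if self.fallback else None
        return self

    def predict(self, X: pd.DataFrame) -> pd.Series:
        features = X.drop(columns=[self.group_feature])
        parts = []
        for value, idx in X.groupby(self.group_feature).groups.items():
            model = self.models_.get(value)
            if model is None:
                if self.global_model_ is None:
                    raise ValueError(f"Group {value!r} was not seen during fit")
                model = self.global_model_
            parts.append(pd.Series(model.predict(features.loc[idx]), index=idx))
        # Every row belongs to exactly one group, so stacking the per-group
        # predictions and restoring the input order yields the full prediction.
        return pd.concat(parts).loc[X.index]


X = pd.DataFrame({"g": ["a", "a", "a", "b", "b", "b"], "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]})
y = pd.Series([0.0, 1.0, 2.0, 0.0, -2.0, -4.0])  # group a: y = x, group b: y = -2x
model = SimpleGroupedRegressor("g", Ridge(alpha=1e-6), fallback=True).fit(X, y)
# Group "c" was never seen during fit, so the global model handles it.
pred = model.predict(pd.DataFrame({"g": ["b", "a", "c"], "x": [3.0, 3.0, 1.0]}))
```

Restoring the original row order with `.loc[X.index]` assumes the input index is unique.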
- property dof
- fit(X: pandas.core.frame.DataFrame, y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series])[source]
Fit the estimator with the available data.
- Parameters
X (pandas.DataFrame) – Input data.
y (pandas.Series or pandas.DataFrame) – Target data.
- Raises
Exception – If the estimator is re-fitted. An estimator object can only be fitted once.
ValueError – If the input data does not pass the checks of utils.check_X.
ValueError – If the target data does not pass the checks of utils.check_y.
- Returns
Fitted estimator.
- Return type
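The fit-once rule listed under Raises can be enforced by checking for a fitted attribute at the start of fit. The sketch below is a hypothetical estimator (`FitOnceMeanRegressor`), not code from feature_encoders; it raises RuntimeError, which is a subclass of Exception. When a fresh fit is needed, `sklearn.base.clone` returns an unfitted copy with the same constructor parameters, which should work for any estimator that follows the scikit-learn `get_params` convention.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone


class FitOnceMeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical estimator that refuses to be fitted a second time."""

    def fit(self, X, y):
        # By scikit-learn convention, fitted attributes end with "_"; if one
        # already exists, fit() has run on this object before.
        if hasattr(self, "mean_"):
            raise RuntimeError("This estimator has already been fitted.")
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Predict the training mean for every input row.
        return np.full(len(X), self.mean_)


model = FitOnceMeanRegressor().fit([[0.0], [1.0]], [2.0, 4.0])
try:
    model.fit([[0.0]], [1.0])  # a second fit on the same object is rejected
except RuntimeError as err:
    print(err)
fresh = clone(model).fit([[0.0]], [10.0])  # unfitted copy, so fitting succeeds
```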
- property n_parameters
- predict(X: pandas.core.frame.DataFrame, include_clusters=False, include_components=False)[source]
Predict given new input data.
- Parameters
X (pandas.DataFrame) – Input data.
include_clusters (bool, optional) – Whether to include the added clusters in the returned prediction. Defaults to False.
include_components (bool, optional) – Whether to include the contribution of the individual components of the model structure in the returned prediction. Defaults to False.
- Raises
ValueError – If the input data does not pass the checks of utils.check_X.
- Returns
The predicted values.
- Return type
pandas.DataFrame
feature_encoders.models.linear module
- class feature_encoders.models.linear.LinearPredictor(*, model_structure: feature_encoders.compose._compose.ModelStructure, alpha=0.01, fit_intercept=False)[source]
Bases:
sklearn.base.RegressorMixin, sklearn.base.BaseEstimator
A linear regression model with flexible parameterization.
- Parameters
model_structure (ModelStructure) – The structure of a linear regression model.
alpha (float, optional) – Regularization strength of the underlying ridge regression; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Defaults to 0.01.
fit_intercept (bool, optional) – Whether to fit the intercept for this model. If set to False, no intercept is used in the calculations. Defaults to False.
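The role of alpha and fit_intercept can be made concrete with the closed-form ridge solution. Assuming the underlying ridge regression minimizes the usual objective ||y - Xb||^2 + alpha * ||b||^2 (the objective used by scikit-learn's `Ridge`), with fit_intercept=False the coefficients are (X^T X + alpha * I)^-1 X^T y. The sketch checks this against `Ridge` on synthetic data; `ridge_no_intercept` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_no_intercept(X, y, alpha):
    """Hypothetical helper: minimizes ||y - X b||^2 + alpha * ||b||^2 in closed form."""
    n_features = X.shape[1]
    # Adding alpha to the diagonal of X^T X improves its conditioning.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha = 0.01  # same default as LinearPredictor
manual = ridge_no_intercept(X, y, alpha)
# fit_intercept=False: no intercept term and no centering of X and y.
reference = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
strong = ridge_no_intercept(X, y, 100.0)  # much stronger regularization
```

Increasing alpha shrinks the coefficient vector toward zero, which is the variance reduction the parameter description refers to.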
- property dof
- fit(X: pandas.core.frame.DataFrame, y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series])[source]
Fit the estimator with the available data.
- Parameters
X (pandas.DataFrame) – Input data.
y (pandas.Series or pandas.DataFrame) – Target data.
- Raises
Exception – If the estimator is re-fitted. An estimator object can only be fitted once.
ValueError – If the input data does not pass the checks of utils.check_X.
ValueError – If the target data does not pass the checks of utils.check_y.
- Returns
Fitted estimator.
- Return type
- property n_parameters
- predict(X: pandas.core.frame.DataFrame, include_components=False)[source]
Predict using the given input data.
- Parameters
X (pandas.DataFrame) – Input data.
include_components (bool, optional) – If True, the prediction dataframe also includes the individual components’ contributions to the predicted values. Defaults to False.
- Returns
The prediction.
- Return type
pandas.DataFrame
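For a linear model without an intercept, the prediction is the sum of per-component contributions, where each contribution is the dot product of that component's columns with their coefficients. The sketch below assumes this is what include_components reports; the component names, the column-to-component mapping, and the output column names are hypothetical, and scikit-learn's `Ridge` stands in for a fitted LinearPredictor. With fit_intercept=True, the intercept would appear as one more additive term.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Hypothetical mapping from component names to the columns they own.
components = {"trend": ["t"], "weather": ["temp", "temp_sq"]}

X = pd.DataFrame({"t": np.arange(6.0), "temp": [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]})
X["temp_sq"] = X["temp"] ** 2
y = 0.5 * X["t"] + 0.2 * X["temp"] + 0.01 * X["temp_sq"]

model = Ridge(alpha=0.01, fit_intercept=False).fit(X, y)
coef = pd.Series(model.coef_, index=X.columns)

# Contribution of a component = its columns times their coefficients, summed per row.
contributions = pd.DataFrame(
    {name: X[cols] @ coef[cols] for name, cols in components.items()}
)
# Hypothetical output layout: the prediction followed by one column per component.
prediction = pd.DataFrame({"prediction": model.predict(X)}, index=X.index).join(contributions)
```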
feature_encoders.models.seasonal module
- class feature_encoders.models.seasonal.SeasonalPredictor(ds: Optional[str] = None, add_trend: bool = False, yearly_seasonality: Union[str, bool, int] = 'auto', weekly_seasonality: Union[str, bool, int] = 'auto', daily_seasonality: Union[str, bool, int] = 'auto', min_samples=0.5, alpha=0.01)[source]
Bases:
sklearn.base.BaseEstimator
Time series prediction model based on seasonal decomposition.
- Parameters
ds (str, optional) – The name of the input dataframe’s column that contains datetime information. If None, it is assumed that the datetime information is provided by the input dataframe’s index. Defaults to None.
add_trend (bool, optional) – If True, a linear time trend will be added. Defaults to False.
yearly_seasonality (Union[str, bool, int], optional) – Fit yearly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to ‘auto’.
weekly_seasonality (Union[str, bool, int], optional) – Fit weekly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to ‘auto’.
daily_seasonality (Union[str, bool, int], optional) – Fit daily seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate. Defaults to ‘auto’.
min_samples (float ([0, 1]), optional) – Minimum number of samples chosen randomly from the original data by the RANSAC (RANdom SAmple Consensus) algorithm, given as a fraction of the total number of samples. Defaults to 0.5.
alpha (float, optional) – Regularization strength of the underlying ridge estimator (base_estimator); must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Defaults to 0.01.
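The seasonality and trend parameters describe a regression on time-derived features. The sketch below assumes Prophet-style Fourier features, where a Fourier order of N produces N sine and N cosine columns for a given period in days, and assumes the ridge estimator is wrapped in scikit-learn's `RANSACRegressor` with min_samples given as a fraction. `fourier_features` is a hypothetical helper, the ‘auto’ resolution of each seasonality is not reproduced, and the datetime information is taken from the index (the ds=None case).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import RANSACRegressor, Ridge


def fourier_features(index, period_days, order, prefix):
    """Hypothetical helper: 2 * order sine/cosine columns for one seasonality."""
    # Time in (fractional) days since the Unix epoch, so the phase is absolute.
    t = np.asarray((index - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1), dtype=float)
    cols = {}
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period_days
        cols[f"{prefix}_sin{k}"] = np.sin(angle)
        cols[f"{prefix}_cos{k}"] = np.cos(angle)
    return pd.DataFrame(cols, index=index)


# Four weeks of hourly data; the datetime information lives in the index.
index = pd.date_range("2021-01-01", periods=24 * 28, freq=pd.offsets.Hour())
daily = fourier_features(index, period_days=1.0, order=3, prefix="daily")
weekly = fourier_features(index, period_days=7.0, order=2, prefix="weekly")
# Analogue of add_trend=True: a linear time trend scaled to [0, 1).
trend = pd.DataFrame({"trend": np.arange(len(index), dtype=float) / len(index)}, index=index)
design = pd.concat([trend, daily, weekly], axis=1)

# Synthetic, noise-free target built from some of the design columns.
y = 1.0 + 2.0 * trend["trend"] + 3.0 * daily["daily_sin1"] + weekly["weekly_cos1"]

# min_samples=0.5: each RANSAC trial fits the ridge model on half of the rows.
model = RANSACRegressor(Ridge(alpha=0.01), min_samples=0.5, random_state=0)
model.fit(design, y)
fitted = model.predict(design)
```

RANSAC repeatedly fits on random subsets and keeps the model with the largest consensus set, which makes the seasonal fit less sensitive to outlying observations than a single ridge fit.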
- add_seasonality(name: str, period: Optional[float] = None, fourier_order: Optional[int] = None, condition_name: Optional[str] = None)[source]
Add a seasonal component with specified period and number of Fourier components.
If condition_name is provided, the input dataframe passed to fit and predict should have a column with the specified condition_name containing booleans that indicate when to apply seasonality.
- Parameters
name (str) – The name of the seasonality component.
period (float, optional) – Number of days in one period. Defaults to None.
fourier_order (int, optional) – Number of Fourier components to use. Defaults to None.
condition_name (str, optional) – The name of the seasonality condition. Defaults to None.
- Raises
Exception – If the method is called after the estimator is fitted.
ValueError – If period or fourier_order is not provided and the seasonality name is not one of (‘daily’, ‘weekly’, ‘yearly’).
- Returns
The updated estimator object.
- Return type
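The Raises rule for add_seasonality and the effect of condition_name can be sketched as follows. Both helpers are hypothetical. The built-in periods and Fourier orders below are Prophet's defaults and are an assumption here, not values taken from feature_encoders; the condition is assumed to switch the seasonal features off (set them to zero) on rows where the boolean column is False.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: Prophet's default (period in days, Fourier order) per built-in name.
BUILT_IN = {"daily": (1.0, 4), "weekly": (7.0, 3), "yearly": (365.25, 10)}


def resolve_seasonality(name, period=None, fourier_order=None):
    """Hypothetical helper mirroring the documented ValueError rule."""
    if period is None or fourier_order is None:
        if name not in BUILT_IN:
            raise ValueError(f"Seasonality {name!r} needs both period and fourier_order")
        default_period, default_order = BUILT_IN[name]
        period = default_period if period is None else period
        fourier_order = default_order if fourier_order is None else fourier_order
    return float(period), int(fourier_order)


def conditional_fourier(index, condition, period, order):
    """Hypothetical helper: Fourier features set to 0 where condition is False."""
    t = np.asarray((index - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1), dtype=float)
    mask = np.asarray(condition, dtype=float)  # 1.0 where seasonality applies
    cols = {}
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols[f"sin{k}"] = np.sin(angle) * mask
        cols[f"cos{k}"] = np.cos(angle) * mask
    return pd.DataFrame(cols, index=index)


# 2021-06-04 is a Friday: 24 weekday hours followed by 48 weekend hours.
index = pd.date_range("2021-06-04", periods=72, freq=pd.offsets.Hour())
is_weekend = index.dayofweek >= 5  # the boolean condition column
period, order = resolve_seasonality("weekend_daily", period=1.0, fourier_order=2)
feats = conditional_fourier(index, is_weekend, period, order)
```

Here the custom daily seasonality only affects weekend rows; the Friday rows contribute nothing to that component.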
- fit(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame)[source]
Fit the estimator with the available data.
- Parameters
X (pandas.DataFrame) – Input data.
y (pandas.DataFrame) – Target data.
- Raises
Exception – If the estimator is re-fitted. An estimator object can only be fitted once.
ValueError – If the input data does not pass the checks of utils.check_X.
ValueError – If the target data does not pass the checks of utils.check_y.
- Returns
Fitted estimator.
- Return type