MLflow V3 Overview¶
Foundation Complete¶
With V2 complete, the MLflow infrastructure is stable and reliable. V3 shifts the focus to feature engineering and data exploration.
XGBoost as Base Model¶
XGBoost was identified as the most appropriate model for this dataset in previous experiments. All trials in this section use XGBoost only, since testing every model type would be time-intensive; the focus is on optimizing XGBoost performance with 20 trials per run.
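As a rough sketch of that setup, the following assumes Optuna drives the 20-trial search (the tuning library is not named here) and substitutes a synthetic imbalanced dataset for the real one; the parameter ranges are illustrative, not the project's exact values.

```python
import mlflow
import optuna
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset, roughly matching the ~7.3:1 imbalance.
X, y = make_classification(n_samples=2000, weights=[0.88], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=42)

def objective(trial):
    # Illustrative search space; the project's actual ranges may differ.
    params = {
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
    }
    # One nested MLflow run per trial, so each configuration is tracked.
    with mlflow.start_run(nested=True):
        model = xgb.XGBClassifier(**params, eval_metric="logloss")
        model.fit(X_train, y_train)
        score = f1_score(y_val, model.predict(X_val))
        mlflow.log_params(params)
        mlflow.log_metric("f1", score)
    return score

with mlflow.start_run(run_name="xgboost_tuning"):
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)  # 20 trials per run
```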
Approach¶
V3 implements systematic feature engineering based on EDA insights. Each improvement is tracked separately to measure individual impact on model performance.
Pipeline structure:
Raw Data -> Duration Treatment -> Categorical Engineering -> Numerical Enhancement -> Feature Interactions -> Model Training
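One way to express these stages is a scikit-learn Pipeline. The step names below are hypothetical placeholders for the project's actual transformers, with identity `FunctionTransformer`s standing in for each engineering step:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from xgboost import XGBClassifier

# Identity placeholders; each would be replaced by the real transformer.
pipeline = Pipeline([
    ("duration_treatment", FunctionTransformer()),       # e.g. cap/bin duration
    ("categorical_engineering", FunctionTransformer()),  # e.g. rare-label grouping
    ("numerical_enhancement", FunctionTransformer()),    # e.g. log/scale transforms
    ("feature_interactions", FunctionTransformer()),     # e.g. pairwise products
    ("model", XGBClassifier(eval_metric="logloss")),
])

# Reusing the synthetic split from the tuning sketch above.
pipeline.fit(X_train, y_train)
```

Keeping each step as a named pipeline stage makes it straightforward to toggle one improvement at a time and log the result as its own MLflow run, matching the one-change-per-run approach described above.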
Class Imbalance Handling¶
The first priority is addressing the high number of false negatives observed in the V2 best-model results.
Added the `scale_pos_weight` parameter to the XGBoost hyperparameter optimization (range 1.0-15.0) to handle the 7.3:1 class imbalance by giving more weight to positive subscription cases.
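As a sketch of how this slots into the tuning setup, extending the hypothetical Optuna objective from above: the imbalance ratio can be computed from the training labels, and `scale_pos_weight` added to the search space over the stated 1.0-15.0 range.

```python
import numpy as np

# Imbalance ratio: negatives per positive (~7.3 for this dataset).
ratio = float(np.sum(y_train == 0)) / float(np.sum(y_train == 1))

def objective(trial):
    params = {
        # Search 1.0-15.0, which brackets the computed ~7.3 ratio.
        "scale_pos_weight": trial.suggest_float("scale_pos_weight", 1.0, 15.0),
        # ...remaining hyperparameters as in the earlier tuning sketch...
    }
    model = xgb.XGBClassifier(**params, eval_metric="logloss")
    model.fit(X_train, y_train)
    return f1_score(y_val, model.predict(X_val))
```

Values above 1.0 scale the gradient contribution of positive examples, pushing the model to trade some false positives for fewer false negatives.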