Feature Interactions (Step 5)¶
Overview¶
This step creates feature interactions to capture non-linear relationships and domain-specific patterns that individual features cannot express. The approach targets combinations where business logic suggests multiplicative effects between features.
XGBoost optimization continues with 20 trials per run, focusing on extracting maximum value from feature combinations rather than exploring multiple model types.
Key Findings¶
- Previous campaign success shows amplified effects when combined with call engagement
- Professional segments emerge from job-education combinations
- Seasonal timing effectiveness varies by contact method
- Interaction features can capture complex decision patterns
Implementation¶
1. Previous Success + Duration Interactions¶
Created interactions between campaign history and engagement:
def create_previous_success_duration_interactions(train_df, test_df):
"""Create interactions between previous campaign success and call duration."""
# Previous success with high engagement duration
train_df["prev_success_high_engagement"] = (
train_df["previous_success"] * train_df["duration_high_engagement"]
)
test_df["prev_success_high_engagement"] = (
test_df["previous_success"] * test_df["duration_high_engagement"]
)
# Previous success with duration quartiles
train_df["prev_success_duration_log"] = (
train_df["previous_success"] * train_df["duration_log"]
)
test_df["prev_success_duration_log"] = (
test_df["previous_success"] * test_df["duration_log"]
)
return train_df, test_df
2. Job + Education Professional Segments¶
Enhanced professional segmentation through job-education combinations:
def create_job_education_interactions(train_df, test_df):
"""Create professional segment interactions between job and education."""
# High education professionals
train_df["high_ed_professional"] = (
((train_df["job"].isin([1, 8])) & (train_df["education"] == 3)) |
((train_df["job"] == 9) & (train_df["education"].isin([2, 3])))
).astype(int)
# Blue collar with higher education
train_df["blue_collar_educated"] = (
(train_df["job"] == 0) & (train_df["education"].isin([2, 3]))
).astype(int)
return train_df, test_df
3. Seasonal + Contact Method Interactions¶
Combined timing and communication channel effectiveness:
def create_seasonal_contact_interactions(train_df, test_df):
"""Create seasonal communication effectiveness interactions."""
# High success months with cellular contact
train_df["peak_month_cellular"] = (
(train_df["month_success_group"] == 0) & (train_df["contact"] == 0)
).astype(int)
# Medium success months with optimal contact type
train_df["medium_month_contact"] = (
(train_df["month_success_group"] == 2) & (train_df["contact"] != 2)
).astype(int)
return train_df, test_df
4. Coordination Function¶
The main orchestrator applies all interaction types:
def apply_feature_interactions(train_df, test_df):
"""Apply feature interactions based on domain insights."""
train_df, test_df = create_previous_success_duration_interactions(train_df, test_df)
train_df, test_df = create_job_education_interactions(train_df, test_df)
train_df, test_df = create_seasonal_contact_interactions(train_df, test_df)
return train_df, test_df
Features Created¶
Feature | Type | Description |
---|---|---|
prev_success_high_engagement |
Binary | Previous success combined with high engagement calls |
prev_success_duration_log |
Numerical | Previous success weighted by call duration |
high_ed_professional |
Binary | Professional segments with higher education |
blue_collar_educated |
Binary | Blue-collar workers with higher education |
peak_month_cellular |
Binary | High success months with cellular contact |
medium_month_contact |
Binary | Medium success months with optimal contact |
Data Transformation Example¶
Before Processing¶
previous_success | duration_high_engagement | job | education | month_success_group | contact |
---|---|---|---|---|---|
1 | 1 | 1 | 3 | 0 | 0 |
0 | 1 | 0 | 2 | 1 | 1 |
1 | 0 | 8 | 3 | 2 | 0 |
After Processing¶
previous_success | duration_high_engagement | prev_success_high_engagement | high_ed_professional | peak_month_cellular |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
0 | 1 | 0 | 0 | 0 |
1 | 0 | 0 | 1 | 0 |
Expected Impact¶
- Capture multiplicative effects between campaign history and engagement
- Identify high-value professional segments through education-job combinations
- Optimize timing and communication channel strategies
- Enable more nuanced customer segmentation
Results¶
MLflow Performance¶
Feature interactions results:
Single Model Performance (80/20 split): - Test AUC: 0.96840
K-Fold Cross-Validation (5 folds): - Average AUC: 0.9686
Performance comparable to numerical enhancements, with consistent cross-validation results.
Classification Metrics¶
Performance comparison with numerical enhancements results:
- False Positives: Reduced from 5,252 to 5,120 (-132)
- False Negatives: Increased from 4,452 to 4,547 (+95)
- Trade-off: Lower precision with higher recall
Feature interactions show minor changes in error patterns compared to numerical enhancements.
Kaggle Competition Results¶
Competition submission results:
- Competition Score: 0.96916
- Leaderboard Position: 1152