Feature Interactions (Step 5)

Overview

This step creates feature interactions to capture non-linear relationships and domain-specific patterns that individual features cannot express. The approach targets combinations where business logic suggests multiplicative effects between features.

XGBoost hyperparameter optimization continues with 20 trials per run. This step concentrates on extracting value from feature combinations rather than on exploring other model types.

Key Findings

  • Previous campaign success shows amplified effects when combined with call engagement
  • Professional segments emerge from job-education combinations
  • Seasonal timing effectiveness varies by contact method
  • Interaction features can capture complex decision patterns

Implementation

1. Previous Success + Duration Interactions

Created interactions between campaign history and engagement:

def create_previous_success_duration_interactions(train_df, test_df):
    """Create interactions between previous campaign success and call duration."""

    # Previous success with high engagement duration
    train_df["prev_success_high_engagement"] = (
        train_df["previous_success"] * train_df["duration_high_engagement"]
    )
    test_df["prev_success_high_engagement"] = (
        test_df["previous_success"] * test_df["duration_high_engagement"]
    )

    # Previous success weighted by log-transformed call duration
    train_df["prev_success_duration_log"] = (
        train_df["previous_success"] * train_df["duration_log"]
    )
    test_df["prev_success_duration_log"] = (
        test_df["previous_success"] * test_df["duration_log"]
    )

    return train_df, test_df
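The interactions above assume that `duration_log` and `duration_high_engagement` already exist from an earlier step, which this section does not show. The sketch below is one hypothetical way to derive them. It assumes a raw `duration` column in seconds, `np.log1p` for the log transform, and a high-engagement threshold taken from a training-split quantile. The helper name `add_duration_prerequisites` and the threshold choice are illustrative assumptions. The sketch also makes the multiplicative behaviour visible: both interaction columns are zero whenever `previous_success` is 0.

```python
import numpy as np
import pandas as pd


def add_duration_prerequisites(df: pd.DataFrame, high_threshold: float) -> pd.DataFrame:
    """Hypothetical helper: derive duration features, then the two interactions.

    Assumes a raw `duration` column (seconds) and a binary `previous_success`
    column; `high_threshold` is an assumed cut-off fitted on the training split.
    """
    out = df.copy()
    # log1p keeps zero-length calls finite (log1p(0) == 0)
    out["duration_log"] = np.log1p(out["duration"])
    # Flag calls at or above the engagement threshold
    out["duration_high_engagement"] = (out["duration"] >= high_threshold).astype(int)
    # Multiplicative interactions: non-zero only when previous_success == 1
    out["prev_success_high_engagement"] = (
        out["previous_success"] * out["duration_high_engagement"]
    )
    out["prev_success_duration_log"] = out["previous_success"] * out["duration_log"]
    return out


train = pd.DataFrame({"duration": [0, 120, 600, 900], "previous_success": [1, 0, 1, 0]})
threshold = train["duration"].quantile(0.5)  # assumed: median, fitted on train only
train = add_duration_prerequisites(train, threshold)
```

Fitting the threshold on the training split and reusing the same value for the test split keeps both splits on one scale.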

2. Job + Education Professional Segments

Enhanced professional segmentation through job-education combinations:

def create_job_education_interactions(train_df, test_df):
    """Create professional segment interactions between job and education."""

    # Apply identical rules to both splits so train and test share the same columns
    for df in (train_df, test_df):
        # High education professionals
        df["high_ed_professional"] = (
            ((df["job"].isin([1, 8])) & (df["education"] == 3)) |
            ((df["job"] == 9) & (df["education"].isin([2, 3])))
        ).astype(int)

        # Blue collar with higher education
        df["blue_collar_educated"] = (
            (df["job"] == 0) & (df["education"].isin([2, 3]))
        ).astype(int)

    return train_df, test_df
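The job and education codes in these rules are hand-picked, so it can help to confirm that the chosen cells are well populated and actually differ in target rate before relying on them. A minimal diagnostic sketch using `pandas.crosstab`, assuming the same integer-encoded `job` and `education` columns and a binary target column named `y` (a hypothetical name; the target column is not named in this section):

```python
import pandas as pd


def segment_target_rates(df: pd.DataFrame, target: str = "y") -> pd.DataFrame:
    """Mean target rate per job x education cell; NaN marks empty cells."""
    return pd.crosstab(df["job"], df["education"], values=df[target], aggfunc="mean")


def segment_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Row count per job x education cell, to spot sparsely populated segments."""
    return pd.crosstab(df["job"], df["education"])


# Toy data for illustration only
toy = pd.DataFrame({
    "job":       [1, 1, 8, 0, 0, 9],
    "education": [3, 3, 3, 2, 1, 2],
    "y":         [1, 0, 1, 0, 0, 1],
})
rates = segment_target_rates(toy)
counts = segment_counts(toy)
```

Cells with very few rows produce noisy rates. On real data, reading both tables together shows whether a hand-picked segment is large enough to be worth a dedicated feature.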

3. Seasonal + Contact Method Interactions

Combined timing and communication channel effectiveness:

def create_seasonal_contact_interactions(train_df, test_df):
    """Create seasonal communication effectiveness interactions."""

    # Apply identical rules to both splits so train and test share the same columns
    for df in (train_df, test_df):
        # High success months with cellular contact
        df["peak_month_cellular"] = (
            (df["month_success_group"] == 0) & (df["contact"] == 0)
        ).astype(int)

        # Medium success months with an optimal contact type (contact code 2 excluded)
        df["medium_month_contact"] = (
            (df["month_success_group"] == 2) & (df["contact"] != 2)
        ).astype(int)

    return train_df, test_df
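`month_success_group` is consumed here but not defined in this section. One hypothetical way to build it is to rank months by their training-set success rate and split that ranking into three equal-sized buckets. The sketch below assumes a `month` column, a binary target named `y`, and the labels `high`, `medium` and `low`. None of these names are confirmed by this document. Computing the rates on the training split only avoids leaking test labels into the grouping.

```python
import pandas as pd


def month_success_groups(train: pd.DataFrame, target: str = "y") -> pd.Series:
    """Hypothetical: map each month to 'high', 'medium' or 'low' success.

    Uses training rows only; returns a Series indexed by month.
    """
    rates = train.groupby("month")[target].mean()
    # Rank 1 = highest success rate; method="first" breaks ties deterministically
    ranks = rates.rank(ascending=False, method="first")
    # Three equal-frequency buckets over the ranking
    labels = pd.qcut(ranks, q=3, labels=["high", "medium", "low"])
    return labels.astype(str)


# Toy data for illustration only
train = pd.DataFrame({
    "month": ["mar", "mar", "may", "may", "oct", "oct",
              "jun", "jun", "sep", "sep", "aug", "aug"],
    "y":     [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1],
})
mapping = month_success_groups(train)
train["month_success_group"] = train["month"].map(mapping)
```

If such labels were later encoded alphabetically (as scikit-learn's `LabelEncoder` does), `high` would become 0, `low` 1 and `medium` 2. That would be consistent with the codes used in `create_seasonal_contact_interactions`, but this is an inference rather than something the section states.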

4. Coordination Function

The main orchestrator applies all interaction types:

def apply_feature_interactions(train_df, test_df):
    """Apply feature interactions based on domain insights."""

    train_df, test_df = create_previous_success_duration_interactions(train_df, test_df)
    train_df, test_df = create_job_education_interactions(train_df, test_df)
    train_df, test_df = create_seasonal_contact_interactions(train_df, test_df)

    return train_df, test_df
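Each helper writes to the two DataFrames separately. A rule applied to the training split but not to the test split therefore only surfaces as an error at prediction time. A small hypothetical guard, `check_interaction_columns`, can be run after `apply_feature_interactions` to confirm that both splits carry every interaction column listed in this section:

```python
import pandas as pd

# Interaction columns documented in this step
INTERACTION_COLUMNS = [
    "prev_success_high_engagement",
    "prev_success_duration_log",
    "high_ed_professional",
    "blue_collar_educated",
    "peak_month_cellular",
    "medium_month_contact",
]


def check_interaction_columns(train_df: pd.DataFrame, test_df: pd.DataFrame) -> list:
    """Return 'split:column' entries for interaction columns missing from either split."""
    missing = []
    for name, df in (("train", train_df), ("test", test_df)):
        for col in INTERACTION_COLUMNS:
            if col not in df.columns:
                missing.append(f"{name}:{col}")
    return missing


# Illustration: a test split that is missing the two seasonal features
train_df = pd.DataFrame(columns=INTERACTION_COLUMNS)
test_df = pd.DataFrame(columns=INTERACTION_COLUMNS[:4])
problems = check_interaction_columns(train_df, test_df)
# problems == ["test:peak_month_cellular", "test:medium_month_contact"]
```

An empty list means both splits are aligned. Raising on a non-empty result before training fails fast instead of at submission time.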

Features Created

Feature                       Type       Description
prev_success_high_engagement  Binary     Previous success combined with high-engagement calls
prev_success_duration_log     Numerical  Previous success weighted by log-transformed call duration
high_ed_professional          Binary     Professional segments with higher education
blue_collar_educated          Binary     Blue-collar workers with higher education
peak_month_cellular           Binary     High-success months with cellular contact
medium_month_contact          Binary     Medium-success months with optimal contact

Data Transformation Example

Before Processing

previous_success  duration_high_engagement  job  education  month_success_group  contact
1                 1                         1    3          0                    0
0                 1                         0    2          1                    1
1                 0                         8    3          2                    0

After Processing

previous_success  duration_high_engagement  prev_success_high_engagement  high_ed_professional  peak_month_cellular
1                 1                         1                             1                     1
0                 1                         0                             0                     0
1                 0                         0                             1                     0
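The After Processing rows can be reproduced directly from the rules in the helpers above. The sketch below rebuilds the three example rows in pandas and applies the same expressions inline, without introducing new logic. It also computes `blue_collar_educated`, which the table omits. That feature comes out as 1 for the second row (job 0, education 2).

```python
import pandas as pd

# The three "Before Processing" rows
before = pd.DataFrame({
    "previous_success":         [1, 0, 1],
    "duration_high_engagement": [1, 1, 0],
    "job":                      [1, 0, 8],
    "education":                [3, 2, 3],
    "month_success_group":      [0, 1, 2],
    "contact":                  [0, 1, 0],
})

after = before.copy()
# Same expressions as the helper functions in this section
after["prev_success_high_engagement"] = (
    after["previous_success"] * after["duration_high_engagement"]
)
after["high_ed_professional"] = (
    (after["job"].isin([1, 8]) & (after["education"] == 3))
    | ((after["job"] == 9) & after["education"].isin([2, 3]))
).astype(int)
after["peak_month_cellular"] = (
    (after["month_success_group"] == 0) & (after["contact"] == 0)
).astype(int)
after["blue_collar_educated"] = (
    (after["job"] == 0) & after["education"].isin([2, 3])
).astype(int)
```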

Expected Impact

  • Capture multiplicative effects between campaign history and engagement
  • Identify high-value professional segments through education-job combinations
  • Optimize timing and communication channel strategies
  • Enable more nuanced customer segmentation

Results

MLflow Performance

Feature interactions results:

Single Model Performance (80/20 split):

  • Test AUC: 0.96840

K-Fold Cross-Validation (5 folds):

  • Average AUC: 0.9686

Performance is comparable to the numerical enhancements step, and the cross-validation average agrees closely with the single-split test AUC.
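The cross-validation figure is an average of per-fold AUC scores. A minimal sketch of such an evaluation loop uses scikit-learn's `StratifiedKFold` and `roc_auc_score`. It runs on synthetic data from `make_classification` and uses `LogisticRegression` as a stand-in for the tuned XGBoost model, so it needs no extra dependencies. An `xgboost.XGBClassifier` factory could be passed as `make_model` instead. Stratification, shuffling and the seed are assumptions, since the section does not say how the folds were built.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def kfold_auc(X, y, make_model, n_splits=5, seed=42):
    """Average ROC AUC over stratified folds; make_model returns a fresh estimator."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, valid_idx in skf.split(X, y):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        # AUC uses the predicted probability of the positive class
        proba = model.predict_proba(X[valid_idx])[:, 1]
        scores.append(roc_auc_score(y[valid_idx], proba))
    return float(np.mean(scores)), scores


# Synthetic, imbalanced data (about 88% negatives) for illustration only
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.88], random_state=0)
mean_auc, fold_aucs = kfold_auc(X, y, lambda: LogisticRegression(max_iter=1000))
```

Comparing the spread of `fold_aucs` with the single-split score shows whether a single 80/20 result is representative.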

Classification Metrics

Comparison with the numerical enhancements results:

  • False Positives: Reduced from 5,252 to 5,120 (-132)
  • False Negatives: Increased from 4,452 to 4,547 (+95)
  • Trade-off: Fewer false positives and more false negatives, i.e. slightly higher precision with slightly lower recall

Feature interactions show minor changes in error patterns compared to numerical enhancements.
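The precision and recall directions follow from the definitions. This assumes both runs are scored on the same validation rows, so the number of actual positives is fixed and each extra false negative is one fewer true positive. The baseline true-positive count is not reported, so the sketch below keeps it as a parameter. It also computes the break-even count below which precision would not improve.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


# Error counts from the comparison above
FP_BEFORE, FP_AFTER = 5_252, 5_120
FN_BEFORE, FN_AFTER = 4_452, 4_547


def compare(tp_before: int):
    """Precision/recall before and after, for an assumed baseline TP count."""
    # Fixed positives: every extra false negative removes one true positive
    tp_after = tp_before - (FN_AFTER - FN_BEFORE)
    return (
        precision(tp_before, FP_BEFORE), precision(tp_after, FP_AFTER),
        recall(tp_before, FN_BEFORE), recall(tp_after, FN_AFTER),
    )


# Precision rises iff tp_before * (FP_BEFORE - FP_AFTER) > (FN_AFTER - FN_BEFORE) * FP_BEFORE
BREAK_EVEN_TP = (FN_AFTER - FN_BEFORE) * FP_BEFORE / (FP_BEFORE - FP_AFTER)

# Hypothetical baseline of 10,000 true positives
p_before, p_after, r_before, r_after = compare(10_000)
```

Recall falls for any baseline. Precision rises only if the baseline had more than about 3,780 true positives (95 × 5,252 / 132).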

Kaggle Competition Results

Competition submission results:

  • Competition Score: 0.96916
  • Leaderboard Position: 1152