Data Understanding

Dataset Overview

  • Title: Bank Term Deposit Subscription Prediction Dataset
  • Link: https://www.kaggle.com/competitions/playground-series-s5e8/data
  • Source: Portuguese banking institution marketing campaign data
  • Original Dataset: UCI Machine Learning Repository
  • Competition: Kaggle Playground Series S5E8

Objective: Predict whether a client will subscribe to a bank term deposit based on direct marketing campaign data.

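The dataset derives from real bank marketing campaigns but was processed for the Kaggle competition; as with other Playground Series datasets, the competition files are synthetically generated from a model trained on the original data. Each record represents a single client contacted during the marketing campaign.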

Data files

train.csv - training data with features and target
test.csv - test features for final predictions
sample_submission.csv - shows the submission format (id, probability)
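A minimal sketch of writing predictions in the submission format, using only the Python standard library. The helper names `read_header` and `write_submission` are hypothetical, and the header is copied from `sample_submission.csv` rather than hard-coded, since the exact column names are not stated here.

```python
import csv
from typing import Dict, List


def read_header(path: str) -> List[str]:
    """Return the column names from the first line of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def write_submission(sample_path: str, out_path: str,
                     predictions: Dict[int, float]) -> None:
    """Write id/probability pairs, reusing the sample file's header.

    Assumes the sample file has exactly two columns (id, probability),
    as described in the file list above.
    """
    header = read_header(sample_path)
    if len(header) != 2:
        raise ValueError(f"expected 2 columns, found {len(header)}")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row_id, prob in predictions.items():
            # Probabilities outside [0, 1] indicate a modelling bug.
            if not 0.0 <= prob <= 1.0:
                raise ValueError(f"probability out of range for id {row_id}")
            writer.writerow([row_id, prob])
```

A call might look like `write_submission("sample_submission.csv", "submission.csv", {750000: 0.12})`, where the id value is purely illustrative.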

Feature Descriptions

The dataset contains 17 columns: 16 input features representing client information and campaign details, plus one target variable:

Client Demographics

  • age: Age of the client (numeric)
  • job: Type of job (categorical: admin, blue-collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown)
  • marital: Marital status (categorical: married, single, divorced)
  • education: Education level (categorical: primary, secondary, tertiary, unknown)

Financial Information

  • default: Has credit in default? (categorical: yes, no)
  • balance: Average yearly balance in euros (numeric)
  • housing: Has housing loan? (categorical: yes, no)
  • loan: Has personal loan? (categorical: yes, no)

Campaign Contact Details

  • contact: Communication type (categorical: unknown, telephone, cellular)
  • day: Last contact day of month (numeric: 1-31)
  • month: Last contact month (categorical: jan, feb, mar, ..., dec)
  • duration: Last contact duration in seconds (numeric)

Campaign History

  • campaign: Number of contacts during this campaign (numeric)
  • pdays: Days since last contact from previous campaign (numeric; -1 = not previously contacted)
  • previous: Number of contacts before this campaign (numeric)
  • poutcome: Outcome of previous campaign (categorical: unknown, other, failure, success)
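Because pdays uses -1 as a sentinel rather than a real number of days, treating it as an ordinary numeric value can mislead a model. One possible way to handle it, sketched here under that assumption, is to split it into a contact flag and an optional day count. The name `decode_pdays` is hypothetical and not part of the dataset or competition code.

```python
from typing import Optional, Tuple

# Sentinel documented above: -1 means the client was not previously contacted.
NOT_CONTACTED = -1


def decode_pdays(pdays: int) -> Tuple[bool, Optional[int]]:
    """Split pdays into (was_previously_contacted, days_since_contact).

    The day count is None when the sentinel is present, so it cannot be
    mistaken for a real elapsed-time value.
    """
    if pdays == NOT_CONTACTED:
        return False, None
    return True, pdays
```

For example, `decode_pdays(-1)` yields `(False, None)`, while `decode_pdays(92)` yields `(True, 92)`.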

Target Variable

  • y: Client subscribed to term deposit (binary: 1=yes, 0=no)
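The feature descriptions above can be captured as a small schema, which is handy for sanity-checking rows before preprocessing. This is a sketch only: the names `NUMERIC`, `CATEGORICAL`, `TARGET`, and `validate_row` are hypothetical, and the category spellings are copied from the lists above, so they may not match the files exactly.

```python
from typing import Dict, List, Set

# Numeric input features, as listed in the descriptions above.
NUMERIC: List[str] = ["age", "balance", "day", "duration",
                      "campaign", "pdays", "previous"]

# Categorical input features and their documented values.
CATEGORICAL: Dict[str, Set[str]] = {
    "job": {"admin", "blue-collar", "entrepreneur", "housemaid",
            "management", "retired", "self-employed", "services",
            "student", "technician", "unemployed", "unknown"},
    "marital": {"married", "single", "divorced"},
    "education": {"primary", "secondary", "tertiary", "unknown"},
    "default": {"yes", "no"},
    "housing": {"yes", "no"},
    "loan": {"yes", "no"},
    "contact": {"unknown", "telephone", "cellular"},
    "month": {"jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"},
    "poutcome": {"unknown", "other", "failure", "success"},
}

TARGET = "y"


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    for col in NUMERIC:
        try:
            float(row[col])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{col}: missing or not numeric")
    for col, allowed in CATEGORICAL.items():
        if row.get(col) not in allowed:
            problems.append(f"{col}: unexpected value {row.get(col)!r}")
    return problems
```

Rows read with `csv.DictReader` can be passed to `validate_row` directly, since it accepts string values for every column.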

→ See Exploratory Data Analysis for detailed findings and visualizations.

The findings from that analysis inform the preprocessing and model selection decisions.