Andri Armaginda Siregar
PORTFOLIO
HEALTH INSURANCE CROSS SELL PREDICTION
INTRODUCTION
Cross-selling is the strategy of offering customers additional products that complement ones they have already purchased. Because the offer builds on an existing purchase, cross-sell products are often framed as recommendations that buyers find hard to refuse.
In this case, a health insurance company uses cross-selling to encourage its existing health insurance customers to also take up the vehicle insurance program it offers.
OBJECTIVE
The objective is to build a model that predicts whether a customer would be interested in Vehicle Insurance. Such a model is extremely helpful for the company, because it can plan its communication strategy to reach those customers and optimise its business model and revenue.
DATA DESCRIPTION
| Variable Name | Description |
| --- | --- |
| Id | Unique ID for the customer |
| Gender | Gender of the customer |
| Age | Age of the customer |
| Driving_License | 0: Customer does not have a driving license (DL); 1: Customer already has a DL |
| Region_Code | Unique code for the region of the customer |
| Previously_Insured | 0: Customer doesn't have Vehicle Insurance; 1: Customer already has Vehicle Insurance |
| Vehicle_Age | Age of the vehicle |
| Vehicle_Damage | 0: Customer's vehicle was not damaged in the past; 1: Customer's vehicle was damaged in the past |
| Annual_Premium | The amount the customer needs to pay as premium in the year |
| Policy_Sales_Channel | Anonymized code for the channel used to reach the customer, i.e. different agents, over mail, over phone, in person, etc. |
| Vintage | Number of days the customer has been associated with the company |
| Response | 0: Customer is not interested; 1: Customer is interested |
Dataset: https://www.kaggle.com/datasets/anmolkumar/health-insurance-cross-sell-prediction
I. PREPARE THE PROBLEM
IMPORT LIBRARY & DATASET
II. EDA (Exploratory Data Analysis)
SUMMARIZE DATA
The dataset has 12 columns and 381,109 rows. Next, we check the data types, shape, and null values in the dataset.
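The checks described above can be sketched with pandas as follows. This is only an illustration on a small toy frame: the sample values are made up, and the file name `train.csv` mentioned in the comment is an assumption about how the Kaggle data would be stored locally.

```python
import pandas as pd

# Toy stand-in for the training data (values are illustrative only).
# With the real data one might instead use: df = pd.read_csv("train.csv")
# (the file name is an assumption).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Age": [44, 76, 47, 21],
    "Annual_Premium": [40454.0, 33536.0, 38294.0, 28619.0],
    "Vintage": [217, 183, 27, 203],
    "Response": [1, 0, 1, 0],
})

n_rows, n_cols = df.shape          # number of rows and columns
dtypes = df.dtypes                 # data type of each column
null_counts = df.isnull().sum()    # missing values per column

print(f"{n_rows} rows, {n_cols} columns")
print(dtypes)
print(null_counts)
```

On the full dataset, `df.shape` should report the 381,109 rows and 12 columns stated above.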
DESCRIPTIVE STATISTICS
DATA VISUALIZATIONS
RESPONSE & GENDER
AGE VS RESPONSE
DRIVING LICENSE, PREVIOUSLY INSURED & VEHICLE AGE
ANNUAL PREMIUM
CORRELATION MATRIX
The correlation matrix shows that the target variable (Response) is barely affected by the Vintage variable, so this least-correlated variable can be dropped.
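A minimal sketch of how the least-correlated feature might be identified and dropped is shown below. It runs on synthetic data in which Response is deliberately tied to Vehicle_Damage; the data, the variable names `toy`, `target_corr`, `least`, and `reduced`, and the use of absolute Pearson correlation are all assumptions for illustration, not the original notebook's code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data: Response depends on Vehicle_Damage only,
# while Age and Vintage are pure noise (illustrative assumption).
damage = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "Vehicle_Damage": damage,
    "Age": rng.integers(20, 80, size=n),
    "Vintage": rng.integers(10, 300, size=n),
    "Response": damage & rng.integers(0, 2, size=n),
})

# Pearson correlation matrix of all numeric columns.
corr = toy.corr()

# Absolute correlation of each feature with the target, weakest first.
target_corr = corr["Response"].drop("Response").abs().sort_values()
least = target_corr.index[0]

# Drop the feature that is least correlated with the target.
reduced = toy.drop(columns=[least])
print(target_corr)
print("Dropped:", least)
```

Note that correlation only captures linear relationships, so a low value is a hint rather than proof that a feature is uninformative.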
III. PREPROCESSING DATA
In the data preprocessing stage, we apply label encoding to convert the categorical variables into numeric codes so that they can be used in the analysis. We then check the dataset for duplicate rows; the check finds none.
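The two preprocessing steps can be sketched as below, using scikit-learn's `LabelEncoder` and pandas' `duplicated`. The toy rows and the category strings for `Vehicle_Age` are assumptions for illustration; the original notebook may have encoded the columns differently.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows with two categorical columns (values are assumed for illustration).
raw = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Vehicle_Age": ["> 2 Years", "1-2 Year", "< 1 Year", "1-2 Year"],
    "Age": [44, 76, 47, 21],
})

encoded = raw.copy()
for column in ["Gender", "Vehicle_Age"]:
    # LabelEncoder assigns integer codes to the sorted unique values,
    # so the codes follow string order, not any natural ordering.
    encoded[column] = LabelEncoder().fit_transform(encoded[column])

# Count fully duplicated rows; 0 means no duplicates were found.
n_duplicates = int(encoded.duplicated().sum())

print(encoded)
print("Duplicate rows:", n_duplicates)
```

Because the codes follow string order, an ordered category such as vehicle age may be better served by an explicit mapping if the order matters to the model.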
FEATURE SELECTION
We can remove less important features from the data set.
HANDLING IMBALANCED DATA
Class imbalance exists when one class has many more observations than the others. In this dataset the two Response classes differ hugely in size, so we use a resampling technique to address the imbalance.
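The text does not say which resampling technique was used. As one possible illustration, the sketch below applies random oversampling of the minority class with `sklearn.utils.resample` on a toy frame; the data and the choice of oversampling are assumptions.

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced data: 8 "not interested" rows versus 2 "interested" rows.
toy = pd.DataFrame({
    "Age": [25, 40, 33, 58, 47, 29, 61, 36, 52, 44],
    "Response": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})

majority = toy[toy["Response"] == 0]
minority = toy[toy["Response"] == 1]

# Random oversampling: draw minority rows with replacement
# until the minority class is as large as the majority class.
minority_upsampled = resample(
    minority,
    replace=True,
    n_samples=len(majority),
    random_state=42,
)

# Combine, shuffle, and reindex the balanced data.
balanced = (
    pd.concat([majority, minority_upsampled])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)
counts = balanced["Response"].value_counts()
print(counts)
```

Resampling is normally applied to the training split only, so that the evaluation data keeps the original class distribution.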
IV. MODEL SELECTION
1. LOGISTIC REGRESSION
2. RANDOM FOREST CLASSIFIER
3. XGBCLASSIFIER
COMPARING THE MODEL
The models for this problem were built in Python using the dataset described above. The Random Forest and XGBClassifier models performed better than the Logistic Regression model, so for this problem the Random Forest and XGBClassifier models are the preferred choices.
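A comparison of this kind might be set up as in the sketch below. It uses synthetic data from `make_classification` instead of the insurance dataset, and ROC AUC as the metric; both are assumptions, since the text does not state how the models were scored. XGBClassifier is left out to keep the example dependent on scikit-learn only; it exposes the same `fit` / `predict_proba` interface and could be added to the `models` dictionary if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary classification data (illustrative only).
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    weights=[0.88, 0.12],
    random_state=0,
)

# Stratified split keeps the class ratio the same in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, used for ROC AUC.
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)

for name, score in scores.items():
    print(f"{name}: ROC AUC = {score:.3f}")
```

The scores printed here describe the synthetic data only and say nothing about the results reported for the insurance dataset.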
CONCLUSION