HEALTH INSURANCE CROSS SELL PREDICTION

Andri Armaginda Siregar

Summary

Cross-selling is a strategy of offering customers additional products that complement the ones they have already purchased. Cross-sell offers are therefore often framed as recommendations that buyers find hard to refuse.

In this case, cross-selling is used to encourage health insurance customers to also join the vehicle insurance program offered by the same health insurance company.

Description

INTRODUCTION

In this project, cross-selling is used to encourage existing health insurance customers to also join the vehicle insurance program offered by the same health insurance company.

OBJECTIVE

A model that predicts whether a customer would be interested in vehicle insurance is extremely helpful for the company: it can plan its communication strategy to reach those customers and optimise its business model and revenue accordingly.

DATA DESCRIPTION

Variable Name	Description
Id	Unique ID for the customer
Gender	Gender of the customer
Age	Age of the customer
Driving_License	0: Customer does not have a driving license; 1: Customer already has a driving license
Region_Code	Unique code for the region of the customer
Previously_Insured	0: Customer doesn't have vehicle insurance; 1: Customer already has vehicle insurance
Vehicle_Age	Age of the vehicle
Vehicle_Damage	0: Customer's vehicle was not damaged in the past; 1: Customer's vehicle was damaged in the past
Annual_Premium	The amount the customer needs to pay as premium for the year
Policy_Sales_Channel	Anonymized code for the channel of outreach to the customer, i.e. different agents, over mail, over phone, in person, etc.
Vintage	Number of days the customer has been associated with the company
Response	0: Customer is not interested; 1: Customer is interested

Dataset : https://www.kaggle.com/datasets/anmolkumar/health-insurance-cross-sell-prediction

I. PREPARE THE PROBLEM

IMPORT LIBRARY & DATASET

II. EDA (Exploratory Data Analysis)

SUMMARIZE DATA

The dataset has 12 columns and 381,109 rows. Next, we check the data types, shape, and null values in the dataset.
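
The checks described above can be sketched with pandas as follows. This is a minimal illustration, not the original notebook code: the small DataFrame is a hypothetical stand-in for the Kaggle file, and one missing value is inserted on purpose so the null check has something to report.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the Kaggle table (hypothetical values, not real data).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Age": [44, 76, 21, 29],
    "Annual_Premium": [40454.0, 33536.0, 28619.0, np.nan],  # NaN added deliberately
    "Response": [1, 0, 0, 0],
})

n_rows, n_cols = df.shape          # (number of rows, number of columns)
dtypes = df.dtypes                 # data type of each column
null_counts = df.isnull().sum()    # number of missing values per column

print(f"{n_rows} rows, {n_cols} columns")
print(dtypes)
print(null_counts)
```

On the real dataset, the same three calls would be applied to the DataFrame returned by `pd.read_csv` for the downloaded file.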

DESCRIPTIVE STATISTICS

DATA VISUALIZATIONS

RESPONSE & GENDER

AGE VS RESPONSE

Customers under 30 show little interest in vehicle insurance. Possible reasons include less driving experience and not yet owning an expensive vehicle.
Customers aged between 30 and 60 are more likely to be interested.
The boxplot shows no outliers in the data.
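
One way to check the age observations numerically is to bin Age and compute the share of interested customers per bin. The sketch below assumes hypothetical sample values and bin edges (under 30, 30 to 59, 60 and over); it is not the original analysis code.

```python
import pandas as pd

# Hypothetical sample; the real data has 381,109 rows.
df = pd.DataFrame({
    "Age": [22, 25, 28, 35, 42, 48, 55, 63, 70, 33],
    "Response": [0, 0, 0, 1, 0, 1, 1, 0, 0, 1],
})

# Left-closed bins: [0, 30), [30, 60), [60, 120)
bins = [0, 30, 60, 120]
labels = ["<30", "30-60", ">60"]
df["Age_Group"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

# The mean of a 0/1 column is the fraction of interested customers in each group.
rate = df.groupby("Age_Group", observed=False)["Response"].mean()
print(rate)
```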

DRIVING LICENSE, PREVIOUSLY INSURED & VEHICLE AGE

ANNUAL PREMIUM

The distribution plot shows that the Annual_Premium variable is right-skewed.
The boxplot shows many outliers in this variable.
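
Both observations can also be checked numerically. The sketch below computes the sample skewness and counts outliers with the 1.5 × IQR rule, which is the rule a standard boxplot uses for its whiskers. The premium values are hypothetical, chosen only to mimic a long right tail.

```python
import pandas as pd

# Hypothetical premiums with a long right tail (not real data).
premium = pd.Series(
    [2630, 25000, 28000, 30000, 31000, 33000, 36000, 40000, 90000, 250000],
    name="Annual_Premium",
    dtype=float,
)

skewness = premium.skew()  # a positive value indicates a right-skewed distribution

# Boxplot-style outlier rule: points beyond 1.5 * IQR from the quartiles.
q1, q3 = premium.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = premium[(premium < lower) | (premium > upper)]

print(f"skewness = {skewness:.2f}, outliers = {len(outliers)}")
```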

CORRELATION MATRIX

The target variable is barely affected by the Vintage variable, so this least-correlated variable can be dropped.
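
The idea of dropping the feature least correlated with the target can be sketched as below. The data is synthetic and generated so that Response depends on Vehicle_Damage but not on Vintage; this is an assumption made for illustration and does not reproduce the real correlation values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic columns: Response is driven by Vehicle_Damage, Vintage is pure noise.
damage = rng.integers(0, 2, n)
vintage = rng.integers(10, 300, n)
response = ((damage == 1) & (rng.random(n) < 0.6)).astype(int)

df = pd.DataFrame({"Vehicle_Damage": damage, "Vintage": vintage, "Response": response})

# Pearson correlation of every feature with the target.
corr_with_target = df.corr()["Response"].drop("Response")

# Feature with the smallest absolute correlation is the candidate to drop.
weakest = corr_with_target.abs().idxmin()
df_reduced = df.drop(columns=[weakest])

print(corr_with_target)
print("dropped:", weakest)
```

Pearson correlation only captures linear relationships, so a low value is a hint rather than proof that a feature is uninformative.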

III. PREPROCESSING DATA

In the data preprocessing stage, we apply label encoding to convert categorical variables into numeric codes so that they can be used in the analysis. We then check the dataset for duplicate rows; the check finds none.
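
A minimal sketch of these two steps, using scikit-learn's `LabelEncoder` and pandas' `duplicated`. The choice of columns and the string labels ("Male", "1-2 Year", "Yes", and so on) are assumptions about how the raw file stores its categories; the rows are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows; category labels are assumed, not taken from the source text.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Vehicle_Age": ["> 2 Years", "1-2 Year", "< 1 Year", "1-2 Year"],
    "Vehicle_Damage": ["Yes", "No", "No", "Yes"],
})

encoded = df.copy()
for col in ["Gender", "Vehicle_Age", "Vehicle_Damage"]:
    # LabelEncoder assigns integer codes in sorted order of the class labels.
    encoded[col] = LabelEncoder().fit_transform(encoded[col])

# Count fully identical rows.
n_duplicates = int(encoded.duplicated().sum())

print(encoded)
print("duplicate rows:", n_duplicates)
```

Because the codes follow the sorted label order, "1-2 Year" is encoded as 0 and "< 1 Year" as 1, which does not match the natural age order. An explicit mapping could be used instead if the order matters for the model.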

FEATURE SELECTION

We can remove less important features from the data set.

HANDLING IMBALANCED DATA

Class imbalance exists when one class has far more observations than the others. In this dataset the class counts differ greatly. To address this, we use a resampling technique.
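
The text does not say which resampling technique was used. As one possible illustration, the sketch below applies random oversampling of the minority class with `sklearn.utils.resample`; the data is hypothetical.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced sample: 8 not interested (0), 2 interested (1).
df = pd.DataFrame({
    "Age": [25, 31, 47, 52, 38, 60, 29, 44, 35, 41],
    "Response": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})

majority = df[df["Response"] == 0]
minority = df[df["Response"] == 1]

# Draw minority rows with replacement until both classes have the same size.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)

# Combine and shuffle the balanced data.
balanced = (
    pd.concat([majority, minority_upsampled])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)
counts = balanced["Response"].value_counts()
print(counts)
```

Resampling is usually applied to the training split only, so that the evaluation data keeps the original class distribution.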

IV. MODEL SELECTION

The problem is a binary classification task (whether or not a customer opts for vehicle insurance).
The dataset has more than 300k records.
An SVM classifier is not a good fit here because its training time grows quickly as the dataset gets larger.
Model selection can therefore start with several algorithms such as Logistic Regression, Random Forest, and XGBClassifier.
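
The comparison can be sketched as a loop over candidate models that share the scikit-learn `fit` / `predict_proba` interface. The example below uses synthetic data from `make_classification` and only the two scikit-learn models. The hyperparameters are assumptions, and the scores it prints say nothing about the results on the insurance data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the prepared features).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models; hyperparameters are illustrative assumptions.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]    # probability of class 1
    scores[name] = roc_auc_score(y_test, proba)  # area under the ROC curve

for name, score in scores.items():
    print(f"{name}: ROC AUC = {score:.3f}")
```

If the `xgboost` package is installed, `xgboost.XGBClassifier` exposes the same interface and could be added to the `models` dictionary in the same way.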

1. LOGISTIC REGRESSION

2. RANDOM FOREST CLASSIFIER

3. XGBCLASSIFIER

COMPARING THE MODEL

The ML models for this problem were built in Python using the dataset above. The Random Forest and XGBClassifier models performed better than the Logistic Regression model. Thus, for the given problem, the Random Forest and XGBClassifier models are the preferred choices.

CONCLUSION

Customers aged between 30 and 60 are more likely to buy insurance.
Customers with a driving license have a higher chance of buying insurance.
Customers whose vehicle has been damaged in the past (Vehicle_Damage) are likely to buy insurance.
Variables such as Age, Previously_Insured, and Annual_Premium have the strongest effect on the target variable.
Comparing the ROC curves, the Random Forest model performs better: its curve lies closer to the top-left corner, which indicates better performance.
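
The ROC comparison in the last point can be made concrete with the area under the curve (AUC): a curve closer to the top-left corner encloses a larger area. The sketch below uses two hypothetical sets of scores on four labelled examples, not the project's model outputs.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Four labelled examples and the scores from two hypothetical models.
y_true = np.array([0, 0, 1, 1])
scores_a = np.array([0.1, 0.4, 0.35, 0.8])  # one positive ranked below a negative
scores_b = np.array([0.1, 0.2, 0.7, 0.9])   # positives ranked above all negatives

# False positive rate and true positive rate at each score threshold.
fpr_a, tpr_a, _ = roc_curve(y_true, scores_a)
fpr_b, tpr_b, _ = roc_curve(y_true, scores_b)

auc_a = auc(fpr_a, tpr_a)  # 0.75
auc_b = auc(fpr_b, tpr_b)  # 1.0

print(f"model A AUC = {auc_a:.2f}, model B AUC = {auc_b:.2f}")
```

Model B separates the classes perfectly, so its curve passes through the top-left corner and its AUC is 1.0.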

Related Course Information

Category: Data Science / Big Data
Course: Artificial Intelligence Technology (Teknologi Kecerdasan Artifisial)
