Muhamad Sahrul Syabani
This analysis was created as a condition of obtaining DSBIZ certification. It is an exploratory data analysis (EDA) of a dataset taken from kaggle.com.
IMPORT LIBRARIES
READ DATA
Since none of the values are null, we continue the EDA process on the dataset.
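A null check of this kind can be sketched as follows. This is a minimal illustration on toy data, not the original notebook cell: the frame `df` and the column names `user_id`, `game`, and `hours` are assumptions standing in for the Kaggle dataset.

```python
import pandas as pd

# Toy stand-in for the Kaggle data; the column names are assumptions.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "game": ["Game A", "Game B", "Game A", "Game C"],
    "hours": [10.0, 2.5, 1.0, 40.0],
})

# Count missing values per column; all zeros means nothing to impute or drop.
null_counts = df.isnull().sum()
print(null_counts)

# Single flag summarising whether any column contains a null.
has_nulls = bool(null_counts.any())
print("Any nulls:", has_nulls)
```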
EDA PROCESS
Now we look at the basic properties of the users and games in the dataset.
We start with a count aggregation by game: we count the records for each game and show the counts in descending order.
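One way to express this count aggregation in pandas is shown below. It is a sketch on hypothetical toy rows; the column names `user_id`, `game`, and `hours` are assumed, not taken from the original dataset.

```python
import pandas as pd

# Hypothetical toy rows; the real dataset's column names may differ.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "game": ["Game A", "Game B", "Game A", "Game A", "Game B", "Game C"],
    "hours": [10.0, 2.5, 1.0, 5.0, 0.5, 40.0],
})

# Number of records per game, largest first.
game_counts = df.groupby("game").size().sort_values(ascending=False)
print(game_counts)
```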
Let's now aggregate on the sum of hours for each game.
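The same grouping can sum hours instead of counting rows. The snippet below is a sketch under the same assumptions as before: toy data and an assumed `hours` column.

```python
import pandas as pd

# Hypothetical toy rows; the real dataset's column names may differ.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "game": ["Game A", "Game B", "Game A", "Game A", "Game B", "Game C"],
    "hours": [10.0, 2.5, 1.0, 5.0, 0.5, 40.0],
})

# Total hours played per game, largest first.
hours_per_game = df.groupby("game")["hours"].sum().sort_values(ascending=False)
print(hours_per_game)
```

Comparing this ranking with the count ranking shows whether a game is popular because many users own it or because a few users play it heavily.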
Now let's have a look at the top 15 most active users by hours played.
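A possible way to get the top 15 users is to sum hours per user and keep the largest values. The data here is synthetic (20 made-up users), and the column names are assumptions.

```python
import pandas as pd

# Synthetic data: 20 hypothetical users, each with one row for two games.
user_ids = list(range(1, 21))
df = pd.DataFrame({
    "user_id": user_ids * 2,
    "game": ["Game A"] * 20 + ["Game B"] * 20,
    "hours": [float(u) for u in user_ids] * 2,
})

# Sum hours per user, then keep the 15 largest totals (sorted descending).
top_users = df.groupby("user_id")["hours"].sum().nlargest(15)
print(top_users)
```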
Now let's look at the cumulative distribution of events per user. We want to find the share of users that accounts for 80% of the activity in this dataset. By an 80:15 rule, you would assume that 15% of the users constitute 80% of the data presented here.
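The calculation can be sketched as follows: count events per user, sort descending, take the cumulative share, and find how many users are needed to reach 80%. The data is synthetic and the names `events`, `cum_share`, and `n_users_80` are illustrative, not from the original notebook.

```python
import pandas as pd

# Synthetic events: user 1 has 50 rows, user 2 has 35, users 3-5 have 5 each.
df = pd.DataFrame({
    "user_id": [1] * 50 + [2] * 35 + [3] * 5 + [4] * 5 + [5] * 5,
})

# Events per user; value_counts returns them sorted in descending order.
events = df["user_id"].value_counts()

# Cumulative share of all events covered by the most active users so far.
cum_share = events.cumsum() / events.sum()

# Users still below 80%, plus the one user that crosses the threshold.
n_users_80 = int((cum_share < 0.8).sum()) + 1
user_share = n_users_80 / len(events)
print(f"{n_users_80} users ({user_share:.0%}) account for 80% of events")
```

The same steps apply to games by counting on the game column instead of the user column.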
So it wasn't 15% but 17%, which is close enough. Now we do the same for games.
Finally, we build the user engagement matrix: we group by user id, pivot on game, and aggregate by summing the hours played in each game.
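A matrix of this shape can be built with `pivot_table`. This is a sketch on toy rows; the column names are assumptions, and filling missing user-game pairs with 0 is a choice made here, not something the source states.

```python
import pandas as pd

# Hypothetical toy rows; user 1 has two separate sessions of Game A.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3],
    "game": ["Game A", "Game A", "Game B", "Game B", "Game C"],
    "hours": [10.0, 5.0, 2.0, 3.0, 7.0],
})

# Rows are users, columns are games, cells are total hours played.
# fill_value=0 (an assumption) marks games a user never played.
engagement = df.pivot_table(
    index="user_id",
    columns="game",
    values="hours",
    aggfunc="sum",
    fill_value=0,
)
print(engagement)
```

A matrix like this, or a game-by-game correlation derived from it with `engagement.corr()`, is the kind of input that `seaborn.clustermap` accepts. Whether the original plot used raw hours or correlations is not stated.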
From the clustermap we can see that there are a few co-dependencies between games, forming about three clusters: