Exploratory Data Analysis of Steam Video Games

Muhamad Sahrul Syabani


Summary

This analysis was created as a requirement for obtaining DSBIZ certification: an exploratory data analysis (EDA) of a dataset taken from kaggle.com.

Description

Import Libraries

READ DATA

 

Since none of the columns contain null values, we continue with the EDA on the dataset.
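A minimal sketch of the read-and-check step, assuming the common Kaggle `steam-200k.csv` layout: no header row, and five columns (user ID, game title, behavior `purchase` or `play`, a value, and a trailing unused column). The column names and the sample rows below are hypothetical; with the real file, pass its path instead of the in-memory string.

```python
import io

import pandas as pd

# Made-up rows in the assumed steam-200k.csv layout (no header row):
# user_id, game, behavior ("purchase" or "play"), value, trailing unused column.
RAW = """1001,"Dota 2",purchase,1.0,0
1001,"Dota 2",play,10.5,0
1002,"Team Fortress 2",purchase,1.0,0
1002,"Team Fortress 2",play,7.0,0
"""

# Hypothetical column names, supplied by hand because the file has no header.
COLUMNS = ["user_id", "game", "behavior", "value", "unused"]

# With the real dataset, replace io.StringIO(RAW) with the CSV file path.
df = pd.read_csv(io.StringIO(RAW), header=None, names=COLUMNS)
df = df.drop(columns="unused")

# Count missing values per column; all zeros means there are no nulls.
null_counts = df.isnull().sum()
print(null_counts)
```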

EDA PROCESS

Now we are going to look at the basic properties of the users and games in the dataset.

First, look at a count aggregation by game: we count the records for each game and show them in descending order.
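A sketch of the count aggregation, assuming each row is one event with `user_id`, `game`, `behavior` and `value` columns (hypothetical names). Note that counting rows this way counts `purchase` and `play` events together; the sample rows are made up.

```python
import pandas as pd

# Made-up events in the assumed layout (one row per purchase or play event).
df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3, 3],
        "game": [
            "Dota 2", "Dota 2", "Dota 2", "Dota 2",
            "Team Fortress 2", "Team Fortress 2", "Unturned",
        ],
        "behavior": ["purchase", "play", "purchase", "play", "purchase", "play", "purchase"],
        "value": [1.0, 10.5, 1.0, 3.0, 1.0, 7.0, 1.0],
    }
)

# Number of rows per game, ordered from most to least frequent.
game_counts = df.groupby("game").size().sort_values(ascending=False)
print(game_counts)
```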

Next, let's aggregate the total hours played for each game.
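A sketch of the hours aggregation. It assumes that in `purchase` rows the `value` column is just a 1.0 flag, so only `play` rows are summed as hours; column names and rows are hypothetical.

```python
import pandas as pd

# Made-up events in the assumed layout; "purchase" rows carry a 1.0 flag,
# "play" rows carry hours played.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3, 3],
        "game": [
            "Dota 2", "Dota 2", "Dota 2", "Dota 2",
            "Team Fortress 2", "Team Fortress 2", "Unturned",
        ],
        "behavior": ["purchase", "play", "purchase", "play", "purchase", "play", "purchase"],
        "value": [1.0, 10.5, 1.0, 3.0, 1.0, 7.0, 1.0],
    }
)

# Keep only play events so purchase flags are not counted as hours.
plays = df[df["behavior"] == "play"]

# Total hours per game, largest first.
hours_per_game = plays.groupby("game")["value"].sum().sort_values(ascending=False)
print(hours_per_game)
```

Games that were only purchased and never played (here `Unturned`) drop out of this ranking.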

 

Now let's look at the top 15 most active users by hours played.
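A sketch of ranking users by total hours, under the same assumed column names. The 20 users below are synthetic (user `i` played `i` hours) so that the top-15 cut is visible.

```python
import pandas as pd

# Synthetic play events: 20 made-up users, user i played i hours in total.
df = pd.DataFrame(
    {
        "user_id": list(range(20)),
        "game": ["Dota 2"] * 20,
        "behavior": ["play"] * 20,
        "value": [float(i) for i in range(20)],
    }
)

# Sum hours over play events only, per user.
plays = df[df["behavior"] == "play"]
hours_per_user = plays.groupby("user_id")["value"].sum()

# nlargest keeps the 15 highest totals, sorted in descending order.
top_users = hours_per_user.nlargest(15)
print(top_users)
```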

Now, let's do the same to look at the cumulative distribution of events per user. We want to find the number of users who account for 80% of the activity in this dataset. By the Pareto (80/20) rule, you could expect roughly 20% of the users to account for 80% of the data presented here.

It turned out to be about 17% rather than 20%, which is close enough. Now let's do the same for games.
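The 80% question can be answered by sorting entities from most to least active, taking a running (cumulative) share of the total, and counting how many entities are needed before that share reaches 80%. A minimal sketch, assuming one row per event with `user_id` and `game` columns; the helper `share_needed_for` and the sample rows are hypothetical, not the original analysis code.

```python
import pandas as pd


def share_needed_for(activity: pd.Series, threshold: float = 0.8) -> float:
    """Fraction of entities that, taken from the most active down,
    together account for `threshold` of the total activity."""
    # Sort from most to least active and compute the running share of the total.
    ordered = activity.sort_values(ascending=False)
    cumulative_share = ordered.cumsum() / ordered.sum()
    # Entities strictly below the threshold, plus the one that crosses it.
    n_needed = int((cumulative_share < threshold).sum()) + 1
    return n_needed / len(ordered)


# Made-up event log: 100 events spread unevenly over users and games.
df = pd.DataFrame(
    {
        "user_id": ["u1"] * 60 + ["u2"] * 25 + ["u3"] * 10 + ["u4"] * 5,
        "game": ["A"] * 70 + ["B"] * 20 + ["C"] * 10,
    }
)

# Events per user: 60, 25, 10, 5 -> running shares 0.60, 0.85, 0.95, 1.00.
events_per_user = df.groupby("user_id").size()
user_share = share_needed_for(events_per_user)

# The same call works for games by grouping on the game column instead.
events_per_game = df.groupby("game").size()
game_share = share_needed_for(events_per_game)

print(f"users: {user_share:.0%}, games: {game_share:.0%}")
```

Here two of four users (50%) and two of three games reach the 80% mark; on the full dataset the same computation is what yields a figure like the 17% quoted above.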

Finally, we build the user engagement matrix: we group by user ID, pivot on game, and aggregate by summing the hours each user played in each game.
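A sketch of the engagement matrix using `pandas.DataFrame.pivot_table`, which groups by user, pivots on game and sums in one call. It assumes the same hypothetical column names and keeps only `play` rows; user/game pairs with no play time are filled with 0. The rows are made up.

```python
import pandas as pd

# Made-up events; user 1 has two separate play rows for the same game.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2],
        "game": ["Dota 2", "Dota 2", "Dota 2", "Team Fortress 2", "Team Fortress 2"],
        "behavior": ["purchase", "play", "play", "purchase", "play"],
        "value": [1.0, 10.5, 2.0, 1.0, 7.0],
    }
)

plays = df[df["behavior"] == "play"]

# Rows: users, columns: games, cells: total hours played (0 if never played).
engagement = plays.pivot_table(
    index="user_id",
    columns="game",
    values="value",
    aggfunc="sum",
    fill_value=0,
)
print(engagement)
```

A matrix of this shape (users by games) is what a clustermap takes as input.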

The clustermap shows a few co-dependencies between games, forming about three clusters:

  1. A cluster of users who play the whole Football Manager series (2012, 2013, 2014)
  2. A smaller cluster for Counter-Strike Global Offensive, Counter-Strike and Counter-Strike Source
  3. Another cluster of users who play the Call of Duty series (MW2, Black Ops, Multiplayer)

 

 

 

 

Related Course Information
  Category: Artificial Intelligence
  Course: Teknologi Game Kecerdasan Artifisial (SIB AI-GAME)