EXPLORATORY DATA ANALYSIS (EDA) ON NFT DATA

Iqbal Tri Widiyanto


Summary

Exploratory data analysis is an approach for learning the characteristics of a dataset, its distribution, and its overall shape, either from the raw data or from easy-to-understand graphics. The information obtained makes it easier to choose appropriate analytical methods.

Description

In EDA, several things must be established about the collected data: whether its distribution is symmetric, normal, or skewed; whether there are data quality problems; whether there are outliers; and whether variables are correlated or intercorrelated. The graphical techniques used in EDA are often very simple. They include plots of the raw data (histograms, dot plots, data plots, stem-and-leaf plots) and simple statistical plots (box plots, mean plots, standard deviation plots).
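As a rough illustration of the skewness and outlier checks described above, the following Python sketch uses only the standard library. The values are made up for the example and are not real NFT data; the 1.5 × IQR cutoff is the common Tukey convention, chosen here as an assumption rather than something the source prescribes.

```python
import statistics

# Made-up sample values for illustration only (not real NFT data).
prices = [1.2, 1.5, 1.7, 1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 9.5]

mean = statistics.mean(prices)
median = statistics.median(prices)
stdev = statistics.pstdev(prices)  # population standard deviation

# Moment-based skewness: > 0 means a longer right tail, < 0 a longer left tail.
skewness = sum((x - mean) ** 3 for x in prices) / (len(prices) * stdev ** 3)

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in prices if x < lower or x > upper]

print(f"mean={mean:.2f} median={median:.2f} skewness={skewness:.2f}")
print("outliers:", outliers)
```

A mean well above the median together with a positive skewness suggests a right-skewed distribution; in this toy sample the single large value 9.5 is the only point flagged as an outlier.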

Exploratory Data Analysis is almost the same as classical data analysis, but the steps come in a different order: in EDA, the analysis step comes directly after data collection, and only then is a model formed, followed by the results.

Several open-source software packages provide EDA tools, including:

1. Python: an open-source programming language often used in data analysis, data mining, and data science. Python is highly recommended for processing large amounts of data. It is available at https://www.python.org/. Python can also be used through Jupyter Notebook in the Anaconda distribution, or online through Google Colab.

2. R: a programming language designed for statistical computing and data presentation. R can be obtained at https://www.r-project.org. Like Python, R has a wide range of packages that make work easier for users.

3. Weka: an open-source application for data mining that includes EDA tools. Weka can be downloaded at https://www.cs.waikato.ac.nz/ml/weka/. Its advantage over Python and R is that it does not require coding to operate.

4. KNIME: an open-source, Eclipse-based application for data analysis. KNIME can be obtained at https://www.knime.com/.

Exploratory Data Analysis (EDA) is the process of analyzing and visualizing data to understand it better and draw insights from it. The details vary, but a data analyst conducting an EDA can generally follow these steps:

• Import the data

• Clean the data

• Process the data

• Visualize the data
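The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library: the CSV content, the column names, and the cleaning rule (drop rows with an empty field, drop exact duplicates) are all assumptions made for the example, not part of the original analysis.

```python
import csv
import io
from collections import Counter

# Made-up CSV content for illustration only; a real analysis would open a file instead.
RAW_CSV = """name,category,price
Alpha,art,1.5
Beta,music,2.0
Beta,music,2.0
Gamma,art,
Delta,game,3.5
Epsilon,art,0.5
"""

# 1. Import the data.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 2. Clean the data: drop rows with an empty field and exact duplicate rows.
seen = set()
clean_rows = []
for row in rows:
    key = tuple(row.values())
    if "" in key or key in seen:
        continue
    seen.add(key)
    clean_rows.append(row)

# 3. Process the data: convert the price column from text to float.
for row in clean_rows:
    row["price"] = float(row["price"])

# 4. Visualize the data: a text bar chart of item counts per category.
counts = Counter(row["category"] for row in clean_rows)
for category, count in sorted(counts.items()):
    print(f"{category:<6} {'#' * count} ({count})")
```

Dropping incomplete rows is only one possible cleaning strategy; depending on the data, imputing missing values may be more appropriate.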

In R, some of the packages and functions that will be used include:

• tidyverse packages for tidying up datasets

• ggplot2 for visualization

• corrplot package for correlation plots

• Several other basic functions to manipulate data such as strsplit(), cbind(), matrix() and so on.

After importing the data with read.csv(), we convert the column types: the 'numeric' variables in the final group stay numeric, and the rest become factors. Three scenarios will be carried out, but first we check the data type of each variable with the str() function.
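The text describes this step with R's read.csv() and str(). As a rough Python analogue (not the original R workflow), the sketch below reads a small CSV and classifies each column as numeric or categorical, which is approximately the distinction between numeric columns and factors in R. The CSV content and column names are hypothetical.

```python
import csv
import io

# Made-up CSV content for illustration only (hypothetical column names).
RAW_CSV = """collection,category,price,sales
Alpha,art,1.5,10
Beta,music,2.0,7
Gamma,art,0.8,3
"""

def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Classify each column: numeric if every value parses as a number,
# otherwise categorical (roughly what R calls a factor).
column_types = {}
for column in rows[0].keys():
    values = [row[column] for row in rows]
    column_types[column] = "numeric" if all(is_number(v) for v in values) else "categorical"

for column, kind in column_types.items():
    distinct = {row[column] for row in rows}
    print(f"{column}: {kind}, {len(distinct)} distinct values")
```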

One step that should not be overlooked in data analysis is exploratory data analysis (EDA). EDA is important because it saves time later in the analysis and exposes errors in the data such as missing values, outliers, duplicates, encoding problems, noisy data, and incomplete data.
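Two of the checks mentioned above, missing values and duplicates, can be counted before any cleaning takes place. The sketch below is one simple way to do that in plain Python; the records are made up for illustration, and None is assumed to mark a missing value.

```python
from collections import Counter

# Made-up records for illustration only; None marks a missing value.
records = [
    {"id": 1, "category": "art", "price": 1.5},
    {"id": 2, "category": "music", "price": None},
    {"id": 3, "category": None, "price": 2.0},
    {"id": 2, "category": "music", "price": None},
    {"id": 4, "category": "art", "price": 0.8},
]

# Count missing values per column.
missing = Counter()
for record in records:
    for column, value in record.items():
        if value is None:
            missing[column] += 1

# Count rows that repeat an earlier row exactly.
row_counts = Counter(
    tuple(sorted(record.items(), key=lambda item: item[0])) for record in records
)
duplicates = sum(count - 1 for count in row_counts.values())

print("missing values per column:", dict(missing))
print("duplicate rows:", duplicates)
```

Reporting these counts first, rather than silently dropping rows, makes it easier to judge how much of the data a cleaning rule would discard.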

Skipping EDA risks repeating errors throughout the analysis, or producing results that are less valid and less relevant to business objectives because the data was not actually ready. EDA also lets users inspect the data before making any assumptions, so they can identify errors in it.

 

The main purpose of Exploratory Data Analysis is to help look at the data before making any assumptions. This helps identify obvious errors, understand patterns in the data, detect outliers or anomalous events, and find interesting relationships between variables. Data scientists can use it to check whether the results they produce are valid and applicable to the intended purpose. Exploratory Data Analysis can also help stakeholders by confirming that they are asking the right questions, and it can help answer questions about standard deviation, categorical variables, and confidence intervals.

There are four types of Exploratory Data Analysis, namely:

• Univariate non-graphical analysis: the simplest form of data analysis. The data being analyzed consists of only one variable, so it does not deal with causes or relationships. The main goal is to describe the data and find patterns in it.

• Univariate graphical analysis: needed because non-graphical methods cannot give a complete picture of the data. Commonly used graphs include stem-and-leaf plots, histograms, and box plots.

• Multivariate non-graphical analysis: uses two or more variables, so Exploratory Data Analysis is used to show the relationships between them.

• Multivariate graphical analysis: uses graphs to show relationships between variables. Examples include scatter plots, run charts, heat maps, and bubble charts.
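The two non-graphical types in the list above can be illustrated with a short Python sketch: summary statistics for a single variable, and a Pearson correlation coefficient between two variables. The paired values are made up for the example, and Pearson correlation is just one common choice of relationship measure, assumed here for illustration.

```python
import math
import statistics

# Made-up paired observations for illustration only.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]

# Univariate, non-graphical: describe one variable on its own.
summary = {
    "min": min(price),
    "max": max(price),
    "mean": statistics.mean(price),
    "median": statistics.median(price),
    "stdev": statistics.stdev(price),  # sample standard deviation
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Multivariate, non-graphical: measure how two variables move together.
r = pearson(price, sales)

print(summary)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 indicates a strong positive linear relationship, close to -1 a strong negative one, and close to 0 little linear relationship.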

Exploratory Data Analysis needs to be supported by tools or programming languages that can handle a variety of analyses. Two are commonly used by data scientists: R and Python. R was created specifically for statistics, so it has many libraries and functions that help with exploring data. Python can also be used for Exploratory Data Analysis, especially for work that leads into machine learning.

The following is an example of implementing exploratory data analysis on NFT data

Related Course Information
  Category: Data Science / Big Data
  Course: Blockchain Kecerdasan Artifisial (SIB AI-BLOCKCHAIN)