# What is hidden behind the scatter diagram?

Olesia Stetsiuk posted on June 30th, 2016

After the battle between pie charts and bar charts, it is a good time to relax and explore scatter plots. Last time, we pointed out the main goals of using charts and graphs, including investigating trends and relationships between variables in the data. That is exactly what scatter plots of all kinds are typically used for. A scatter plot shows the values of, usually, two variables for a set of data. The data are displayed as a collection of points: each point's position on the horizontal axis is set by the value of one variable, and its position on the vertical axis by the value of the other.
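To make this mapping concrete, here is a minimal Python sketch of how a charting tool might turn value pairs into point positions. It assumes a hypothetical canvas measured in pixels with the origin in the top-left corner (as in SVG and most screen coordinate systems), so the vertical axis is flipped. The `to_plot_coords` helper and the sample data are illustrative only and are not part of any particular charting library.

```python
def to_plot_coords(points, width, height):
    """Map (x, y) value pairs to pixel positions on a width-by-height canvas.

    The first variable sets the horizontal position, the second the vertical one.
    Screen y grows downward, so larger y values are drawn closer to the top.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1 if every value of a variable is identical
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    return [
        ((x - x_min) * width / x_span, height - (y - y_min) * height / y_span)
        for x, y in points
    ]


# Hypothetical data: advertising spend (x) vs. sales (y)
data = [(10, 200), (20, 260), (30, 310), (40, 400)]
print(to_plot_coords(data, width=300, height=200))
# [(0.0, 200.0), (100.0, 140.0), (200.0, 90.0), (300.0, 0.0)]
```

Real charting libraries add padding, ticks and axis labels, but the core of a scatter plot is this linear scaling of each variable onto its own axis.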

What is the advantage of using scatter plots and line plots? A scatter plot shows the strength of the relationship between the variables, the direction and/or form of that relationship, and whether outliers exist. Let’s discuss these purposes in more detail. A scatter plot can suggest that two variables are related and, if so, how they change together. As you move from left to right, do the points on your scatter plot gradually rise or fall? If so, you may suspect a positive or a negative correlation. Be careful here, though: correlation does not imply causation. Your scatter plot may indicate that a relationship exists, but it does not and cannot confirm that one variable causes the other. An observed correlation can have other explanations, such as a third factor influencing both variables or a systematic cause affecting all of your data.
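The direction and strength you read off a scatter plot by eye are often summarised with Pearson's correlation coefficient r, which ranges from -1 to 1: its sign gives the direction and its magnitude the strength of a *linear* relationship. The sketch below computes r from its definition. The data sets are hypothetical, and the 0.3 and 0.7 cut-offs in the illustrative `describe` helper are a common rule of thumb rather than a fixed standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance term: positive when x and y tend to rise together
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def describe(r):
    """Rough verbal label for r (0.3 / 0.7 cut-offs are a rule of thumb)."""
    if abs(r) < 0.3:
        return "weak or no linear relationship"
    strength = "strong" if abs(r) >= 0.7 else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical data sets
temperature = [14, 16, 20, 23, 25, 28, 31]  # daily temperature, degrees Celsius
ice_cream_sales = [215, 240, 300, 330, 360, 410, 440]
price = [5, 6, 7, 8, 9]
units_sold = [120, 110, 95, 90, 70]

for label, xs, ys in [
    ("temperature vs sales", temperature, ice_cream_sales),
    ("price vs units sold", price, units_sold),
]:
    r = pearson_r(xs, ys)
    print(f"{label}: r = {r:.2f} ({describe(r)})")
```

Even a value of r close to 1 for temperature and sales says nothing about which variable, if either, drives the other.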

Scatter plots also help to explore patterns hidden in your data. For instance, points may be clustered in one of the quadrants or form a visible relationship, which might be linear or curved. The strength of the pattern depends on how closely the points are grouped around the underlying form.

In addition, scatter plots are useful for examining outliers and influential points in your data, because individual points that fall outside the overall pattern of the scatter plot may have an important impact on the associations in your data. Such points should be examined to decide whether they are actual data values or some kind of data error. It is therefore quite natural for an analyst to conduct two analyses: the first with the outliers left in the data set, the second with them removed. Line plots, in turn, are typically used for time series in exploratory data analysis (EDA). We have already discussed data visualisation techniques in the service of EDA in the previous blog post.

As for Flexmonster Pivot Table Component, pivot table functionality is essential for much of fundamental analysis. With the help of calculated values and built-in charts, you can conduct all the necessary business-oriented analysis. In upcoming posts, we will show more real-world cases. Stay tuned!