Olesia Stetsiuk posted on June 16th, 2016
Last time we briefly discussed the role of data visualisation in exploratory data analysis (EDA). As promised, this time we are going to run such an analysis using the Flexmonster Pivot Table Component. For this work, we have chosen the ever-relevant topic of health care.
It is a well-known fact that heart disease is one of the leading causes of death for both men and women around the world, so we will analyse different risk factors that increase the chances of developing this kind of illness. For this task, we will use a subset of the prominent Framingham Heart Study (FHS). Here are a few reminders about this longitudinal study. It is a long-running, prospective study of the etiology of cardiovascular disease among residents of Framingham, Massachusetts. It started in 1948 with 5,209 participants and remains one of the most valuable studies of its kind, outstanding both in scope and in duration.
We will use data that comes from the BioLINCC website. After loading the data, prepared for analysis as a CSV file, into the Flexmonster Pivot Table Component, we can see it in a flat table view, which is easily accessible through Options. For a proper analysis, it is crucial to understand the variables in a dataset. Can you easily explain their meaning and implications? What are their types and distributions?
In this dataset we have 4240 observations of 16 variables. Each observation is a row and corresponds to one person; 15 of the variables are risk factors that were collected during the first examination. The “TenYearCHD” variable is 1 or 0, depending on whether or not the person developed coronary heart disease (CHD) within 10 years of the first examination. CHD is a disease of the blood vessels supplying the heart. It is one type of heart disease, which has been the leading cause of death worldwide since 1921.
The dataset contains demographic characteristics, such as gender, age, and education level. It also includes behavioural risk factors associated with smoking: whether the patient is a current smoker and the number of cigarettes the person smoked on average per day. Medical history risk factors are included as well: whether or not the patient was on blood pressure medication, was hypertensive, had diabetes, or had previously had a stroke. Lastly, the dataset includes risk factors measured at the patient's first physical examination: total cholesterol level, systolic blood pressure, diastolic blood pressure, Body Mass Index (BMI), heart rate, and blood glucose level.
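The questions about variable types and distributions can also be checked outside the pivot table. Below is a minimal sketch in Python, assuming a CSV with one row per person. The sample rows are invented for illustration, and every column name except “TenYearCHD” is an assumption rather than something taken from the dataset description above.

```python
import csv
import io

# Hypothetical sample rows; only the TenYearCHD column name comes from the
# text, the other column names and all values are illustrative assumptions.
SAMPLE_CSV = """male,age,currentSmoker,cigsPerDay,totChol,TenYearCHD
1,39,0,0,195,0
0,46,0,0,250,0
1,48,1,20,245,0
0,61,1,30,225,1
0,46,1,23,,0
"""

def summarise(csv_text):
    """Return (row_count, {column: summary}) for a small numeric CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        # Empty cells are treated as missing values.
        present = [float(v) for v in values if v != ""]
        summary[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present),
            "max": max(present),
            # A 0/1 indicator such as TenYearCHD shows up as binary.
            "binary": set(present) <= {0.0, 1.0},
        }
    return len(rows), summary

row_count, summary = summarise(SAMPLE_CSV)
print(f"{row_count} observations of {len(summary)} variables")
for column, info in summary.items():
    print(column, info)
```

A summary like this makes it easy to tell indicator variables from measurements and to spot missing values before building any pivot views.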
In our analysis, we will try to explore the risk factors and their joint effects on developing coronary heart disease. As a first step, let us see whether a person's gender can be one of the risk factors. Using the Calculated Values feature, the pivot table shows that more than 18% of men and 12% of women developed CHD within 10 years.
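One way to think about such a calculated value is as the number of CHD cases in a group divided by the group size. The sketch below illustrates that ratio in Python on a handful of invented records; it is not the Flexmonster formula itself, and the records and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical (gender, TenYearCHD) pairs, invented for illustration only.
records = [
    ("male", 1), ("male", 0), ("male", 0), ("male", 0),
    ("female", 1), ("female", 0), ("female", 0), ("female", 0), ("female", 0),
]

def chd_share_by_group(pairs):
    """Return {group: share of members with TenYearCHD == 1}."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for group, chd in pairs:
        totals[group] += 1
        cases[group] += chd  # chd is 0 or 1, so the sum counts the cases
    return {group: cases[group] / totals[group] for group in totals}

shares = chd_share_by_group(records)
for group, share in shares.items():
    print(f"{group}: {share:.2f}")
```

Because the outcome is coded as 0 or 1, the share is simply the mean of the indicator within each group.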
The bar chart also shows that the relative number of patients who developed the disease is higher among older people. In the picture, red bars indicate the number of persons with CHD at a particular age, and blue bars show the total number of persons of the same age in the dataset. To build these bar charts, it is necessary to use the “Multiple values” option.
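The two series behind such a chart can be sketched as two counts per age: everyone of that age, and those of that age with CHD. The following Python snippet is a minimal illustration on invented observations, not the component's internal aggregation.

```python
from collections import Counter

# Hypothetical (age, TenYearCHD) observations, invented for illustration only.
observations = [
    (40, 0), (40, 0), (40, 1),
    (55, 0), (55, 1),
    (65, 1), (65, 1), (65, 0),
]

# Series 1: total number of persons per age.
total_by_age = Counter(age for age, _ in observations)
# Series 2: number of persons with CHD per age.
chd_by_age = Counter(age for age, chd in observations if chd == 1)

for age in sorted(total_by_age):
    total = total_by_age[age]
    with_chd = chd_by_age[age]
    print(f"age {age}: total={total}, with CHD={with_chd}, share={with_chd / total:.2f}")
```

Plotting both counts side by side, rather than only the share, keeps the group sizes visible, which matters when some ages have very few participants.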
Furthermore, we can build one more bar chart that combines the demographic characteristics, to compare the age patterns of men (on the right side) and women (on the left side) with the disease.
Regarding behavioural risk factors, we can see that the number of cigarettes smoked per day can be a risk factor too. Analysis of the medical history data showed that of the 25 patients who had previously had a stroke, 11 developed CHD within 10 years (44%).
One can also see that of the 109 persons with diabetes, 40 were later diagnosed with heart disease (around 37%). Likewise, 325 of the 1317 individuals with hypertension developed heart disease (more than 24%). With similar manipulations, we see that total cholesterol level, systolic blood pressure, and blood glucose level should also be investigated for significance as risk factors.
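The percentages quoted for stroke, diabetes, and hypertension follow directly from the counts in the text. As a quick check, here is a small Python sketch that recomputes them; the dictionary keys and the helper name are just illustrative labels.

```python
# (CHD cases, group size) as quoted in the text.
groups = {
    "previous stroke": (11, 25),
    "diabetes": (40, 109),
    "hypertension": (325, 1317),
}

def percentage(cases, total):
    """Share of cases in a group, expressed as a percentage."""
    return 100 * cases / total

for name, (cases, total) in groups.items():
    print(f"{name}: {cases}/{total} = {percentage(cases, total):.1f}%")
```

Rounded to one decimal place, this gives 44.0%, 36.7%, and 24.7%, in line with the figures above.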
To sum up, this is an example of using the Flexmonster Pivot Table Component for exploratory data analysis on data for which the analytical results are already known. It shows how one can explore information with the visualisation tools at hand and come up with ideas and insights that agree with published papers. In total, around 2500 studies have been written using the Framingham data. Over the years, many other risk factors, such as obesity, exercise, psychological, and social issues, have been evaluated too. The study is now in its third generation, which started in 2002, and the second generation enabled the study to examine family history as a risk factor as well. In addition to the classical measures we have used so far, social network analysis of the participants has also been utilised.
Besides the study, there is an online tool called the Framingham Risk Score that assesses the 10-year risk of having a heart attack. You enter your age, gender, total cholesterol, HDL cholesterol, smoking status, and systolic blood pressure into the tool, and check your personal risk.
As you might have noticed, in today's work we relied mostly on pivot tables and bar charts. In the next post, we will discuss types of charts and their use for different purposes.
Stay tuned and keep your heart healthy!