Did you know that in the early 19th century William Playfair created graphs to visualize the price of wheat against labour wages? Did you know that he is widely regarded as a founder of graphical methods in statistics? In data science it is very hard to do data analysis without data visualization. R is a widely used language in data science, known for its ability to visualize even the most intricate data sets. Without visualization it is difficult to make sense of raw statistical data. Haven’t you heard the saying ‘A picture is worth a thousand words’?
In the digital world, data visualization first became common through charts in Excel. Now entire spreadsheets and databases can be visualized at the click of a button, which automates much of the work. Data scientists are going a step further and automating the visualization of big data without losing detail.
R packages to kick-start data visualization
- plotly – An R interface to the Plotly JavaScript library. It is fairly easy to learn, and basic graphs made with this package have a distinctive look.
- ggplot2 – A very widely used, powerful, well-conceptualized package that follows the ‘grammar of graphics’ to build static graphs. It requires a considerable amount of effort to master.
- ggiraph – This package provides geom functions that make ggplot2 plots interactive. It also provides arguments for JavaScript onclick actions and tooltips.
- dygraphs – An R interface for creating time series graphs in JavaScript/HTML. If the data are xts objects, a single line of code is enough to produce a graph.
- googleVis – Google Charts can be accessed from within R itself.
- metricsgraphics – An R interface to the MetricsGraphics JavaScript library. It is used to create bare-bones line charts, scatter plots, bar charts and others.
- RColorBrewer – This package provides a selection of color palettes to help make visualizations clear and readable.
- shiny – This package is very useful for building interactive web applications from R data.
- rcdimple – An R interface to the dimple JavaScript library, with a range of customization options.
- highcharter – It provides high-end interactive graphics, and its syntax resembles that of ggplot2.
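To give a feel for the ‘grammar of graphics’ idea mentioned for ggplot2, here is a minimal sketch. It assumes the ggplot2 package is installed and uses R's built-in mtcars dataset, which was chosen only for illustration. The plot is built by combining data, aesthetic mappings and a geometric layer.

```r
# Assumption: ggplot2 is installed, e.g. via install.packages("ggplot2").
library(ggplot2)

# Data + aesthetic mappings + a point layer + labels, combined with `+`.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")

# Printing the object draws the plot on the active graphics device.
print(p)
```

Swapping geom_point() for another geom, or adding further layers, changes the chart without touching the data or the mappings. That separation is a large part of the package's appeal.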
There’s a reason why R is one of the most widely used languages in data science: few languages match the range of visualization options it offers. The developers of R visualization packages also often respond to user questions about their packages in R forums.
Charts and plots commonly used in R for data visualization:
- Histogram – A histogram splits the data into intervals (breaks) and shows the frequency distribution across those intervals. The developer can adjust the breaks and compare the results to make better sense of the data.
- Box plot – It shows the five-number summary used in statistics: minimum, 25th percentile, median, 75th percentile and maximum. Visualizing the spread of the data in this way can lead to useful inferences. It is also used to plot a combination of continuous and categorical variables.
- Line chart – This type of chart is used to analyze trends across time frames. It is also helpful for seeing relative changes in numbers over a time period.
- Bar chart – A bar chart is the natural choice for comparing cumulative totals across several groups. It is used to plot a categorical variable alone or a combination of categorical and continuous variables.
- Scatter plot – Scatter plots are used for a simple visual inspection of the data. They are especially useful for examining the relationship between two continuous variables.
- Area chart – Commonly used for time series, it depicts continuity across a dataset. It is also well suited to plotting continuous variables and studying the resulting trends.
- Heat map – A heat map is a 2D image that uses color intensity to show the relationship between many variables.
- Correlogram – A correlogram is used to examine the correlations among the variables in a data set. Its cells are colored to depict the values of the correlation matrix. Typically, blue represents positive correlations and red represents negative correlations, and the intensity of the color is proportional to the strength of the correlation.
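Several of the charts above can be sketched with base R graphics alone. The example below is only an illustration: it uses the built-in faithful and mtcars datasets, and the variable names and choice of columns are assumptions made for the demo.

```r
# Built-in datasets only; no extra packages are needed.
x <- faithful$eruptions

# Histogram: `breaks` is a suggestion for how finely to bin the data.
h_coarse <- hist(x, breaks = 5,  main = "Coarse breaks",
                 xlab = "Eruption time (min)")
h_fine   <- hist(x, breaks = 20, main = "Fine breaks",
                 xlab = "Eruption time (min)")

# Box plot: a continuous variable (mpg) split by a categorical one (cyl).
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Five-number summary. fivenum() returns Tukey's hinges, which are
# close to, but not always identical to, the 25th and 75th percentiles.
five <- fivenum(mtcars$mpg)

# Scatter plot: relationship between two continuous variables.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Correlation matrix drawn as a simple heat map.
cors <- cor(mtcars[, c("mpg", "wt", "hp", "disp")])
heatmap(cors, Rowv = NA, Colv = NA, scale = "none", symm = TRUE)
```

Comparing the two histograms shows how the choice of breaks changes the picture of the same data. The heat map of the correlation matrix is a bare-bones stand-in for a correlogram. Dedicated packages offer the blue and red coloring described above.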
Future of data visualization
There is a strong demand for substantive communication when a message backed by data is delivered to clients. Creative visuals can spark interest, but no amount of decoration can cover up a lack of substance in the data. Visualization can only represent numerical data faithfully, so a prerequisite for good visualization is that the underlying data are sound and accurate. Data analytics and data visualization are very important to organizations, yet generating highly accurate business insights remains a difficult task. A visualization should be placed in the context of its underlying data, and care should be taken in this regard: even an expertly drawn image is of little use out of context. R has long delivered on its promise of providing graphical solutions, and it looks set to continue doing so.