1. General principles

2. Taxonomies of data visualizations

Duarte (2008) has an interesting taxonomy
3D dataviz taxonomy
Tableau - Which chart?
Comparison among graph types (p. 24) [PDF]

From Sharda et al. (2014, pp. 110-112):

Line chart AKA line graph: "show relationship between two variables", "track changes or trends over time"
Scatter plot: "explore the existence of trends, concentrations, and outliers" among 2 or 3 variables
- Bubble chart: varying size of dots on scatterplot to add another variable
Pie chart: show relative proportions IFF <4 categories
Bar chart: "compare data across multiple categories"
- Horizontal, vertical, stacked
- Histogram: like a bar chart, but displays a frequency distribution
- Gantt chart: "portray project timelines, project tasks/activity durations, and overlap amongst the tasks/activities"
- Bullet chart: show progress towards a goal in the context of a bar chart
Geographic map
PERT chart AKA network diagram shows sequence of tasks in a complex project
Heat map: "quickly see where the intersection of the categories is strongest and weakest"
- Highlight table is a 2D heat map that displays numeric values in addition to color
Tree map: "display hierarchical (tree-structured) data as a set of nested rectangles"
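
The highlight table described above (a 2D heat map that also prints each numeric value) can be sketched in matplotlib roughly as follows. This is a minimal illustration, not taken from Sharda et al.: the row and column names and all the numbers are hypothetical, and `imshow` plus `ax.text` is only one of several ways to draw it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: rows and columns are two categorical variables.
rows = ["North", "South", "East"]
cols = ["Q1", "Q2", "Q3", "Q4"]
values = np.array([[12, 18, 25, 9],
                   [30, 22, 14, 17],
                   [8, 11, 27, 21]])

fig, ax = plt.subplots()
image = ax.imshow(values, cmap="viridis")  # heat map: color encodes the value

# Highlight table: additionally write each numeric value into its cell.
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        ax.text(j, i, str(values[i, j]), ha="center", va="center", color="white")

# Label the axes with the category names (ticks first, then labels).
ax.set_xticks(range(len(cols)))
ax.set_xticklabels(cols)
ax.set_yticks(range(len(rows)))
ax.set_yticklabels(rows)
fig.colorbar(image, ax=ax)
```

Leaving out the `ax.text` loop gives a plain heat map; calling `fig.savefig(...)` with a filename writes the result to disk.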

From Eckerson & Hammond (2011):

3. Data visualization with ...

3.1. Tableau

3.2. Python

import matplotlib.pyplot as plt

# convert to int:
my_data = list(map(int, data_in))

# linecharts/plots:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
x_axis_ticks = list(range(len(my_data)))
ax.plot(x_axis_ticks, my_data, linewidth=2)
ax.set_title(my_title)
ax.set_xlim([0, len(my_data)])
ax.set_xlabel('Axis name')
ax.set_ylabel('Axis name')
fig.savefig(my_filename)

# table: see also tablib
from prettytable import PrettyTable
my_data_header = my_data[0]
x = PrettyTable(my_data_header)
x.add_row(my_data[1])

# plot all bars as barchart:
import numpy
X = numpy.arange(len(my_data))
width = 0.25
plt.bar(X + width, my_data, width)  # shift each bar right by one bar width
plt.xlim(0, 5055)  # adjust the upper limit to the data

# plot buckets:
from collections import Counter

def group_data_by_range(my_data):
    tally = Counter()
    for em in my_data:
        bucket = 0  # anything outside the ranges below
        if em >= 0 and em < 10:
            bucket = 1
        elif em >= 10 and em < 20:
            bucket = 2
        tally[bucket] += 1
    return tally

my_grouped_data = group_data_by_range(my_data)
plt.style.use('ggplot')  # set the style before creating the figure
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# replaces the old 'axes.color_cycle' key, which newer matplotlib no longer supports
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
for group in my_grouped_data:
    ax.bar(group, my_grouped_data[group], color=colors[group % len(colors)])
labels = ['Group 1', 'Group 2', ...]  # fill in one label per bucket
ax.legend(labels)
ax.set_title('Title')
ax.set_xlabel('Axis name')
ax.set_xticks(range(1, len(my_grouped_data) + 1))  # set ticks before tick labels
ax.set_xticklabels(labels, ha='left')
ax.set_ylabel('Axis name')
plt.grid(True)
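
The bucket-counting function above hard-codes one if/elif branch per range. When all buckets have the same width, integer division can assign the bucket directly. The sketch below uses only the standard library; the function name `group_data_by_width`, the `width` parameter and the sample values are made up for illustration. Note that its numbering differs from the function above: bucket k covers [k*width, (k+1)*width), so values 0-9 land in bucket 0 rather than bucket 1.

```python
from collections import Counter

def group_data_by_width(values, width=10):
    """Count values per bucket, where bucket k covers [k*width, (k+1)*width)."""
    tally = Counter()
    for value in values:
        tally[value // width] += 1  # floor division picks the bucket index
    return tally

# Hypothetical sample data.
sample = [3, 7, 12, 15, 19, 24, 38]
tally = group_data_by_width(sample)

# Print the buckets in ascending order with their integer ranges.
for bucket in sorted(tally):
    low = bucket * 10
    print(f"{low}-{low + 9}: {tally[bucket]}")
```

The resulting Counter can be passed to the plotting loop in the same way as my_grouped_data, since its keys are integers and its values are counts.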

3.3. R

library for complex graphs: http://ggplot2.org/

p <- seq(0, 1, 0.01)

# scatterplot:
plot(my_df$name1, my_df$name2)

# line:
plot(..., type = "l")

plot(var1 ~ var2)

# univariate boxplot:
boxplot(my_df$var_name)
# multivariate boxplot:
boxplot(var1 ~ var2)

# histogram:
hist(data, breaks=)

# frequencies:
table()
# multivariable:
table(my_df$var1, my_df$var2)
mosaicplot(table(my_df$var1, my_df$var2) )
mosaicplot(var1 ~ var2)

# relative frequencies:
table(my_df$my_var)/length(my_df$my_var)

barplot(table())

# plot in three rows:
par(mfrow = c(3, 1))
xlimits <- range(data1)
hist(..., xlim = xlimits)

# plot_ss and its options:
plot_ss(x = mlb11$at_bats, y = mlb11$runs, x1, y1, x2, y2)
# optional arguments: showSquares = TRUE/FALSE, leastSquares = TRUE/FALSE

# OLS best-fit:
lm(y ~ x, my_df)
summary(lm(...))

# line:
abline()
abline(lm(...))

# residual diagnostics (m1 is a fitted model, e.g. m1 <- lm(y ~ x, my_df)):
qqnorm(m1$residuals)
qqline(m1$residuals)
hist(m1$residuals)

# account for overlapping data points:
plot(jitter(x), y)

4. Sources

4.1. Cited

Eckerson, W., & Hammond, M. (2011). Visual reporting and analysis. TDWI Best Practices Report. TDWI, Chatsworth. Retrieved from http://www.smartanalytics.com.au/pdf/Advizor-TDWI_VisualReportingandAnalysisReport.pdf

Sharda, R., Delen, D., & Turban, E. (2014). Business intelligence: A managerial perspective on analytics (3rd ed.). New York City, NY: Pearson.

Tufte, E. (2006). Beautiful evidence. Cheshire, CT: Graphics Press.

4.2. References

Timdream's HTML5 wordcloud generator

4.3. Read

4.4. Unread

DATA + DESIGN ebook
Design better data tables
Word clouds considered harmful
YouTube - Infographics & Data Visualization Course
OpenLearn - Effective Ways of Displaying Information
Information Visualization MOOC
The Visual Display of Quantitative Information
Visual Explanations
Beautiful Evidence
Envisioning Information
DataCamp - ggvis, ggplot 1, ggplot 2, R Markdown
FlowingData - Tutorials (mostly in R)
Tufte in R
Data visualization: modern approaches
Data visualization and infographics