Best Practices for Data Visualisation

RSS International Conference 2023
Harrogate, UK

Andreas Krause, Nicola Rennie, and Brian Tarran

Welcome

In this session we will cover what the guide is and how it came about, what’s in it, and how you can contribute to it.

We also want to hear from you about what sort of content you’d like to see added to the guide.

Some background

A survey in 2021 asked RSS members their views on Significance magazine.

Respondents were asked, “What aspects of content could be improved?”

  • “Better, more consistent charts… I’d like to see a house style like The Economist”
  • “The plots sometimes look amateurish…”
  • “The figures are often difficult to read…”

Help wanted

We put out a call:

“RSS publications seek data visualisation expert to develop best practice guidance”

Help wanted

The guide would:

  • Help contributors develop data visualisations that are high quality, readable, effective at conveying information, and fulfil their intended purpose.
  • Summarise and link to authoritative advice on chart styles and formats for different types of data.
  • Show how to override software defaults in common data visualisation software and packages.

Help wanted

It would also provide basic information on figure sizes, fonts, colours, resolution, etc., used in RSS publications.

Help wanted

Andreas and Nicola answered the call. We started work in February this year, and six months later…

The guide was published

Screenshot of data vis guide homepage

How the guide is structured

Screenshot of data vis guide table of contents

Why visualise data?

Motivation

Visualisations are found everywhere.

They are the key medium for conveying a message.

Some are better than others.

There are largely no standards.

Motivation: Anchor the message

  • Grab attention
  • Improve access to information (over text)
  • Increase precision (over text)
  • Bolster credibility: see for yourself
  • Summarise content

The Art of Visualisation (1)

Gauge the sizes, determine the largest piece

Comparison 01

The Art of Visualisation (2)

The number of pixels per pie depends on its position

Comparison 02

The Art of Visualisation (3)

Sorting the bars by height is easy

Comparison 03

The Art of Visualisation (4)

… and arguably even easier with horizontal layout

Comparison 04

The Art of Visualisation (5)

A single pixel carries the same information as a large bar

Comparison 05

The Art of Visualisation (6)

Faint gridlines help reading off values precisely

Comparison 06

The Art of Visualisation (7)

There are options in designing a visualisation!

Comparison 07

Principles and elements of visualisation

Purpose

Data visualisations must serve a purpose.

Frequent aim: comparison.

Ask yourself:

  • What is the purpose?
  • Does the visualisation support the purpose?
  • Quickly, Accurately, and Intuitively?

Elements of Charts

  • Layout
  • Aspect ratio
  • Lines
  • Points
  • Colours
  • Axes
  • Symbols
  • Legends
  • Orientation
  • Auxiliary elements
  • Dimensionality

Note the defaults: the boxplot function in R has 27 of them.

Layout

  • Which axes are to be compared?

Lines

  • Lines introduce an order
  • No order, no lines

Line types: map style elements to order

  • Line thickness
  • Dash density for dashed lines
  • Brightness (black to white)
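
As a minimal base R sketch of mapping line style to order: the three ordered groups and their values below are hypothetical, and the specific widths and gray levels are illustrative choices rather than recommendations from the guide.

```r
# Hypothetical ordered groups (low < medium < high)
time <- 0:10
resp <- cbind(low = 1 * time, medium = 1.5 * time, high = 2 * time)

# Brightness and thickness both follow the group order
cols <- gray(c(0.7, 0.4, 0))  # light gray to black
lwds <- c(1, 2, 3)            # thin to thick

matplot(time, resp, type = "l", lty = 1, col = cols, lwd = lwds,
        xlab = "Time", ylab = "Response")
legend("topleft", legend = colnames(resp), col = cols, lwd = lwds,
       lty = 1, bty = "n")
```

Because darker and thicker both read as "more", the legend should only need to be consulted once.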

Points

If data points overlap (which they generally do)

  • Open circles keep individual data points discernible
  • Smaller dots can be considered

If data points overlap exactly (example: integer data)

  • Consider jittering
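
The suggestions above can be sketched in base R as follows. The simulated integer data and the jitter amount of 0.2 are assumptions chosen for illustration only.

```r
# Simulated integer data: many observations share identical coordinates
set.seed(1)
x <- sample(1:5, 200, replace = TRUE)
y <- sample(1:5, 200, replace = TRUE)

# Add a small amount of uniform noise so coincident points separate
xj <- jitter(x, amount = 0.2)
yj <- jitter(y, amount = 0.2)

op <- par(mfrow = c(1, 3))
plot(x, y, pch = 16, main = "Filled points")
plot(x, y, pch = 1, cex = 0.6, main = "Small open circles")
plot(xj, yj, pch = 1, cex = 0.6, main = "Jittered")
par(op)
```

With exactly coincident points, only the jittered panel reveals how many observations sit at each location.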

Colours

  • Colours should serve a purpose: distinguishing groups of data

Shades of gray

  • Have a natural visual hierarchy
  • Show varying quantities better than color
  • Provide an easily comprehended order to the data measures
  • This is the key

Edward Tufte (2001), p. 154

Axes

  • Should generally start at 0
  • Should not extend to negative values when the data contain none

Axes (2): Relative changes

  • Log-axes for symmetry
  • Tick marks at reciprocal values (example: 1/4 and 4)
  • Line at “no change”
  • Faint gray grid
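
A minimal base R sketch of such a relative-change axis, assuming hypothetical fold changes for five groups (the values and group names are invented for illustration):

```r
# Hypothetical fold changes relative to baseline (1 = no change)
fold  <- c(A = 0.25, B = 0.5, C = 1, D = 2, E = 4)
ticks <- c(1/4, 1/2, 1, 2, 4)

# Set up an empty plot with a log y-axis and custom tick marks
plot(seq_along(fold), fold, log = "y", ylim = c(1/4, 4), type = "n",
     xaxt = "n", yaxt = "n", xlab = "Group",
     ylab = "Fold change (log scale)")
axis(1, at = seq_along(fold), labels = names(fold))
axis(2, at = ticks, labels = c("1/4", "1/2", "1", "2", "4"), las = 1)

abline(h = ticks, col = "gray90")        # faint gray grid
abline(h = 1, lty = 2)                   # line at "no change"
points(seq_along(fold), fold, pch = 16)  # data drawn on top
```

On the log scale, a fourfold decrease sits exactly as far below the reference line as a fourfold increase sits above it.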

Aspect ratio

  • Number of pixels allocated to 1 measurement unit in y vs x
  • Comparing x and y (example: predicted vs observed data)
  • Identical axis limits, aspect ratio = 1 -> square figure
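
For the predicted-versus-observed case, a minimal base R sketch might look like this. The data are simulated, and `predicted` is a hypothetical stand-in for model output.

```r
set.seed(42)
observed  <- rnorm(50, mean = 10, sd = 2)
predicted <- observed + rnorm(50, sd = 1)  # hypothetical model predictions

lims <- range(c(observed, predicted))      # identical limits for both axes

op <- par(pty = "s")                       # square plotting region
plot(observed, predicted, xlim = lims, ylim = lims, asp = 1,
     xlab = "Observed", ylab = "Predicted")
abline(0, 1, col = "gray50")               # line of identity
par(op)
```

With `asp = 1` and shared limits, the line of identity runs at 45 degrees, so deviations in x and y are visually comparable.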

Symbols

Should be intuitive:

  • Good: +, bad: -, neutral: 0
  • Consider using letters (example: “L”ow, “M”edium, “H”igh)

Ideal case: a single look at the legend is enough to memorise the mapping

Not intuitive: triangles, circles, squares -> repeated looks

(unless the order - number of vertices - carries a meaning)
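
One way to use letters as plotting symbols in base R is sketched below. The dose groups and coordinates are simulated purely for illustration.

```r
set.seed(3)
dose <- factor(sample(c("Low", "Medium", "High"), 30, replace = TRUE),
               levels = c("Low", "Medium", "High"))
x <- runif(30)
y <- runif(30)

# First letter of each group label serves as the plotting symbol
sym <- substr(as.character(dose), 1, 1)

plot(x, y, type = "n", xlab = "x", ylab = "y")  # empty frame
text(x, y, labels = sym)                        # draw L, M, H
```

Here the symbols explain themselves, so a legend may not be needed at all.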

Legends

  • Should not use up valuable space for data
  • May be integrated into the figure

Orientation

  • Order: y-axis from low to high, x-axis from left to right
  • Time flows from left to right (past to future)
  • Longer labels are best placed on the y-axis, written horizontally

Auxiliary elements

  • Tufte: avoid “chart junk”, elements without information
  • My example: the gray background in ggplot2 figures
  • Remove elements that make no relevant contribution (example: repeated identical axes)
  • Helpful lines (examples: y=0, faint gray lines)
  • Smoother to support trend identification
  • Confidence band only if relevant (default with R loess)
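
A minimal base R sketch of helpful auxiliary elements, using the built-in `cars` data. Here `lowess()` is simply one convenient base R smoother, not necessarily the one used in the guide, and no confidence band is drawn.

```r
# Faint grid drawn first, so the data sit on top of it
plot(cars$speed, cars$dist, pch = 1, bty = "l",
     panel.first = grid(col = "gray90", lty = 1),
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Smoother to support trend identification
smooth <- lowess(cars$speed, cars$dist)
lines(smooth, lwd = 2)
```

The open frame (`bty = "l"`) and faint grid keep the focus on the points and the trend line.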

Auxiliary elements: change

  • Symmetric y-axis, line at y=0 -> focus on data

Dimensionality

  • Accurate interpretation of 3D charts is not straightforward. Don’t.
3D

Elements of Tables

  • Layout
  • Digits
  • Alignment
  • Multiple numbers per cell
  • Orientation
  • Fonts
  • Colours

Tables: Layout

  • Choose rows and columns consciously
  • Numbers are easier to compare vertically than horizontally
  • Generally: variables in columns, observations in rows

Name     Age   Weight
Alex     55    123.45
Sandy    33    77.07

Name     Alex     Sandy
Age      55       33
Weight   123.45   77.07

Tables: Alignment

Generally helpful:

  • Decimal points aligned vertically (monospace fonts?)
  • Right adjustment (larger numbers “stick out”)
  • Difficult if numbers are very different (e.g., 953 and 0.07)

Name    Age   Weight
Alex     55   123.45
Sandy    33    77.07

Name       Alex   Sandy
Age       55.00   33.00
Weight   123.45   77.07
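
As a small sketch of producing right-adjusted, decimal-aligned numbers in R, `formatC()` can pad values to a fixed width with a fixed number of decimals. The width of 6 is an assumption that fits these two values.

```r
weights <- c(Alex = 123.45, Sandy = 77.07)

# Fixed notation, two decimals, padded on the left to six characters
formatted <- formatC(weights, format = "f", digits = 2, width = 6)

# Print name and value side by side in a monospace layout
cat(paste(format(names(weights), width = 6), formatted), sep = "\n")
```

In a monospace font the decimal points line up, and the larger number visibly "sticks out" to the left.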

Tables: Digits

  • Unnecessary precision is to be avoided
  • R: the digits argument
print(1:7 + 1/(1:7), digits=2)
[1] 2.0 2.5 3.3 4.2 5.2 6.2 7.1
print(1:7 + 1/(1:7), digits=3)
[1] 2.00 2.50 3.33 4.25 5.20 6.17 7.14
format(10+1:7 + 1/(1:7), digits=4)
[1] "12.00" "12.50" "13.33" "14.25" "15.20" "16.17" "17.14"
format(c(1234, 1/1234), digits=3)
[1] "1.23e+03" "8.10e-04"

Tables: Multiple numbers per cell

  • Hard to read, better separate columns

Variable   Mean (%CV)
Age        55 (9)
Weight     88 (25)

Variable   Mean   %CV
Age          55     9
Weight       88    25

Tables: Orientation

  • Single landscape pages are a pain
  • Consider splitting the table into two

Tables: Fonts and colours

  • Some fonts are easier to read than others
  • These are usually the standard fonts
  • Use different fonts and colours only for a purpose
  • Example: extreme values in boldface or red

Recap: Creating Charts and Tables

  • Actively designing charts is recommended
  • What is the question?
  • Does the visual enable answering the question efficiently?
  • Good visualisations increase P(paper gets accepted)
  • Visual abstracts are coming into fashion with journals

Styling charts

What’s wrong with this chart?

Colours

Why use colours in data visualisation?

  • Colours can highlight or emphasise parts of your data.

  • Colour is not always the most effective tool, e.g. for communicating differences between variables.

Colours: types of palette

Examples of sequential, diverging, and qualitative palettes

Colours: don’t rely on colour

A 2x2 grid of bar charts and approximately how they may appear to those with different types of colour deficiency

Colours: check for accessibility

  • Different types of colourblindness

  • Monochrome printing

  • Ink usage

Colours: check for accessibility

A nicely styled bar chart of guinea pig tooth growth

A 2x2 grid of bar charts and approximately how they may appear to those with different types of colour deficiency
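
A rough way to preview monochrome printing in base R is to convert each colour to a gray of similar perceived brightness. The sketch below uses a common luma weighting (0.299, 0.587, 0.114); the helper `to_gray()` and the three-colour palette are hypothetical, and this approximates grayscale only, not any type of colour blindness.

```r
# Hypothetical helper: map colours to grays of similar brightness
to_gray <- function(cols) {
  m <- col2rgb(cols)  # 3 x n matrix with rows red, green, blue
  luma <- (0.299 * m["red", ] + 0.587 * m["green", ] +
           0.114 * m["blue", ]) / 255
  gray(pmin(luma, 1))  # guard against rounding just above 1
}

pal <- c("#D55E00", "#009E73", "#0072B2")  # hypothetical palette

op <- par(mfrow = c(1, 2))
barplot(c(3, 5, 4), col = pal, main = "Colour")
barplot(c(3, 5, 4), col = to_gray(pal), main = "Monochrome preview")
par(op)
```

If two bars become hard to tell apart in the right-hand panel, the palette relies too heavily on hue.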

Annotations

  • Add clarification or detail

  • Highlight an interesting data point

  • Labelling data points (sometimes!)

Annotations

A before and after showing the same bar chart of tooth growth data without annotations on the left, and with labels showing bar values on the right.

Fonts

  • Font size: larger fonts are (usually) better

  • Font colour: ensure sufficient contrast

  • Font face: highlight text using bold font, avoid italics

  • Font family: choose a clear font with distinguishable features
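
To make "sufficient contrast" concrete, one option is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG), which is not mentioned in these slides and is brought in here as an assumption. The helpers `rel_luminance()` and `contrast_ratio()` are hypothetical names for a sketch of that formula.

```r
# Relative luminance of a single colour (WCAG definition)
rel_luminance <- function(col) {
  ch  <- col2rgb(col)[, 1] / 255  # red, green, blue scaled to 0..1
  lin <- ifelse(ch <= 0.03928, ch / 12.92, ((ch + 0.055) / 1.055)^2.4)
  sum(c(0.2126, 0.7152, 0.0722) * lin)
}

# Contrast ratio between text and background, from 1 up to 21
contrast_ratio <- function(fg, bg) {
  l <- sort(c(rel_luminance(fg), rel_luminance(bg)), decreasing = TRUE)
  (l[1] + 0.05) / (l[2] + 0.05)
}

contrast_ratio("black", "white")   # maximum possible contrast
contrast_ratio("gray60", "white")  # light gray text on white
```

WCAG level AA asks for a ratio of at least 4.5 for normal-sized text, which can serve as a rough benchmark for chart labels.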

Fonts: font family

Arial: Does it pass the 1Il test?


Times New Roman: Does it pass the 1Il test?


Courier New: Does it pass the 1Il test?

Alt Text

Alt text (AKA alternative text) is text that describes the visual aspects and purpose of an image – including charts.

Though alt text has various uses, its primary purpose is to aid visually impaired users in interpreting images when the alt text is read aloud by screen readers.

Alt Text

Screenshot of Medium article on how to write alt text, which includes a chart type, type of data, reason for including chart, and link to data source

Source: medium.com/nightingale/writing-alt-text-for-data-visualization (Amy Cesal)
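
As a sketch of attaching alt text to a chart, assuming the chart is rendered from an R code cell in a Quarto document: the `fig-alt` cell option holds the description. The wording below follows the chart type, type of data, and reason pattern, and uses the built-in `ToothGrowth` data as an illustrative example.

```r
#| fig-alt: "Bar chart of mean guinea pig tooth length by vitamin C dose, where mean length increases with dose."

# Mean tooth length for each dose level
means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)

barplot(means, xlab = "Dose (mg/day)", ylab = "Mean tooth length")
```

In a `.qmd` file the cell would open with `{r}` on the fence, and the `#|` line must come first inside the cell. Run as a plain R script, it is treated as an ordinary comment.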

Styling for RSS publications

Styling for Significance Magazine

  • Following data visualisation guidelines

  • Consistent font and colour choices

  • Easy to implement!

Photo of inside cover of Significance magazine August 2023

{RSSthemes} R package

  • Colour palettes

  • Base R helper functions

  • {ggplot2} helper functions

Installation

Install from CRAN:

install.packages("RSSthemes")

Install from GitHub (development version):

remotes::install_github("nrennie/RSSthemes")

Load package:

library(RSSthemes)

Plotting with base R

Examples using Significance colours

  • signif_blue

  • signif_green

  • signif_orange

  • signif_red

  • signif_yellow

barplot(
  height = table(mtcars$gear),
  col = signif_blue,
  cex.axis = 4, cex.names = 4
)

Plotting with base R

barplot(
  height = table(mtcars$gear),
  col = factor(
    unique(mtcars$gear)
    ),
  cex.axis = 4, cex.names = 4
)

set_rss_palette("signif_qual")
barplot(
  height = table(mtcars$gear),
  col = factor(
    unique(mtcars$gear)
    ),
  cex.axis = 4, cex.names = 4
)

Plotting with {ggplot2}

  • {ggplot2} is an R package that provides functionality for drawing graphics.
install.packages("ggplot2")

ggplot2 hex sticker logo

{ggplot2}: basic plot

library(ggplot2)
g <- ggplot(data = mtcars) +
  geom_bar(
    mapping = aes(
      x = cyl,
      fill = factor(vs)
      )
    )
g

{ggplot2}: scales

g +
  scale_fill_rss_d("signif_qual")

{ggplot2}: theme

g +
  scale_fill_rss_d("signif_qual") +
  labs(
    title = "Title",
    subtitle = "Subtitle") +
  theme_significance(base_size = 36)

I don’t use R…

The guide also has examples of charts with:

Use something else?

  • Help us add to the guide!

Questions?

Contributing to the guide

Quarto

Quarto is an open-source scientific and technical publishing system that allows you to combine text, images, code, plots, and tables in a fully-reproducible document.

Quarto supports multiple languages, including R, Python, Julia, and Observable. It works for a range of output formats, such as PDFs, HTML documents, websites, and presentations.

quarto hex sticker logo

GitHub

The source code for the guide is stored on GitHub.

If you want to contribute to the guide, the easiest way is via GitHub.

To ask a question or make a suggestion

Create or add to a GitHub discussion

Screenshot of GitHub repository with discussions shown

To report a bug or add a feature

Create an issue and describe:

  • what the bug or error is, and add the issue tag bug

  • what feature you want to include, and add the issue tag enhancement

Screenshot of GitHub repository with issue button highlighted

Make a fork

Screenshot of GitHub repository with fork button highlighted

Clone the repository

Screenshot of GitHub repository with clone button highlighted

Then make your changes and commit them…

Create a pull request

Open a pull request, describe the changes it contains, reference any issues it addresses, and wait for review.

Screenshot of GitHub repository with pull request shown

Need help with GitHub?

Ask us!


RSS Conference session: GitHub: Version control for research, teaching and industry, Thu 7th @ 11:40

Questions?