How to Make Stunning Histograms in R: A Complete Guide with ggplot2 (2024)

[This article was first published on r – Appsilon | End-to-End Data Science Solutions, and kindly contributed to R-bloggers].



Histograms with R and ggplot2

Be honest. How uninspiring are your data visualizations? Expert designers make graph design look effortless, but in reality, that couldn't be further from the truth. Luckily, the R programming language provides countless ways to make your visualizations eye-catching.


This article will show you how to make stunning histograms with R's ggplot2 package. We'll start with a brief introduction to the theory behind histograms, just in case you're rusty on the subject. You'll then see how to create and tweak ggplot histograms, taking them to new heights.


What is a Histogram?

A histogram is a way to graphically represent the distribution of your data using bars of different heights. A single bar (bin) represents a range of values, and the height of the bar represents how many data points fall into the range. You can change the number of bins easily.
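To make the counting concrete, here's a tiny base R sketch. The seven numbers and the break points are made up for illustration, and hist() with plot = FALSE computes the bins without drawing anything:

```r
# Seven made-up values and three bins: [0, 2], (2, 4], (4, 6]
x <- c(1.2, 2.5, 2.7, 3.1, 4.8, 4.9, 5.0)

# plot = FALSE returns the binning result instead of drawing the chart
h <- hist(x, breaks = c(0, 2, 4, 6), plot = FALSE)

# Number of values in each bin, i.e. the bar heights
h$counts
#> [1] 1 3 3
```

By default, hist() uses right-closed intervals, so a value sitting exactly on an interior break point is counted in the bin to its left.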

The easiest way to understand them is through visualization. The image below shows a histogram of 10,000 numbers drawn from a standard normal distribution (mean = 0, standard deviation = 1):


Image 1 – Histogram of a standard normal distribution
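The code behind Image 1 isn't included here. Below is a minimal sketch of a similar plot, assuming the ggplot2 package. The seed and the explicit 30 bins are our choices, and Image 1's styling and reference lines are left out:

```r
library(ggplot2)

# 10,000 draws from a standard normal distribution (mean = 0, sd = 1)
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(x = rnorm(10000, mean = 0, sd = 1))

# Each bar counts how many of the 10,000 values fall into its bin
p <- ggplot(df, aes(x = x)) +
  geom_histogram(bins = 30)
p
```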

Although at first glance the histogram doesn’t look like much, it actually tells you a lot. When data is distributed normally (bell curve), you can draw the following conclusions:

  • 68.26% of the data points are located between -1 and +1 standard deviations (34.13% in either direction).
  • 95.44% of the data points are located between -2 and +2 standard deviations (47.72% in either direction).
  • 99.72% of the data points are located between -3 and +3 standard deviations (49.86% in either direction).
  • Points outside the -3 to +3 standard deviation range are commonly treated as outliers.
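You can check these figures in R. The sketch below computes the theoretical shares with pnorm() and compares them with simulated data, using an arbitrary seed. Rounded to two decimals, the exact values are 68.27%, 95.45%, and 99.73%. The list above doubles the rounded one-sided percentages, which is why its figures are slightly lower:

```r
# Share of a standard normal distribution within k standard deviations of the mean
k <- 1:3
theoretical <- pnorm(k) - pnorm(-k)
round(100 * theoretical, 2)
#> [1] 68.27 95.45 99.73

# Empirical check: share of 10,000 simulated values within k standard deviations
set.seed(1)  # arbitrary seed
x <- rnorm(10000)
empirical <- sapply(k, function(i) mean(abs(x) <= i))
round(100 * empirical, 2)
```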

In reality, you're rarely dealing with a perfectly normal distribution. Real data is usually skewed in one direction or has multiple peaks. Keep this in mind when drawing conclusions from the shape of a histogram alone.

Let’s see how you can use R and ggplot to visualize histograms.

Make Your First ggplot Histogram

We’ll use the Gapminder dataset throughout the article to visualize histograms. It’s a relatively small dataset showing life expectancy, population, and GDP per capita in countries between 1952 and 2007. We’ll use only a subset that shows countries in Europe and discard everything else.

Here’s the code you need to import libraries, load, and filter the dataset:
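The original code block didn't survive here. Below is a minimal sketch that assumes the gapminder package, which ships the dataset as a tibble. It uses base R's subset() for the filtering, although the authors may well have used dplyr instead:

```r
# install.packages(c("ggplot2", "gapminder")) if you don't have them yet
library(ggplot2)
library(gapminder)

# Keep only European countries and discard everything else
gm_eu <- subset(gapminder, continent == "Europe")

# First few rows: country, continent, year, lifeExp, pop, gdpPercap
head(gm_eu)
```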

Here's what the first couple of rows of gm_eu look like:

We’ll visualize the lifeExp column with histograms, as it provides enough continuous data to play around with.

Let's make the most basic ggplot histogram first. You can use the geom_histogram() function to do so, provided you've passed in the dataset and the default aesthetics (here, lifeExp mapped to the X-axis):
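Here's a minimal sketch of that call. It assumes the ggplot2 and gapminder packages and repeats the data loading so the snippet runs on its own:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

# lifeExp on the X-axis; geom_histogram() counts values per bin (30 bins by default)
p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram()
p
```

Because no bin count or width is given, ggplot2 will likely print a message nudging you to pick a better value.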


Image 3 – Default histogram

Well, you won't see anything like that on a website or in a magazine, so we'd better get our hands dirty with some tweaking.

Let's start by changing the number of bins (bars). The default value is 30, which is a reasonable starting point in most cases. If you want your histograms to look boxier, use fewer bins. On the other hand, go big if you want your histograms to look more like density plots. Here's what a histogram with 10 bins looks like:
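Below is a sketch of the 10-bin version, under the same assumptions (ggplot2 and gapminder installed):

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

# bins sets the number of bars; fewer bins give a boxier histogram
p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(bins = 10)
p
```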


Image 4 – Histogram with 10 bins

Let’s stick with the default number of bins for the rest of the article, as it looks somewhat better.

The coloring is painful to look at. There's nothing wrong with gray, but it looks too boring. Here's how to give your ggplot histogram some Appsilon flair with a blue fill color and black borders:
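Here's a sketch of the styling. Inside geom_histogram(), fill sets the bar color and color sets the outline. The exact hex code for the blue is our assumption, since the original snippet isn't preserved:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  # fill = inside of the bars, color = bar outlines
  geom_histogram(color = "#000000", fill = "#0099F8")
p
```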


Image 5 – Tweaking the fill and outline color

Much better, provided you like the blue color. Let’s dive deeper into styling and annotations next.

How to Style and Annotate ggplot Histograms

Styling

You can bring more life to your ggplot histogram. For example, we sometimes like to add a vertical line representing the mean, and two surrounding lines representing the range between -1 and +1 standard deviations from the mean. It’s a good idea to style the lines differently, just so your histogram isn’t confusing.

The following code snippet draws a black line at the mean, and dashed black lines at -1 and +1 standard deviation marks:
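Here's a sketch that computes the mean and standard deviation up front and passes them to geom_vline(). The colors, and the choice to compute the statistics outside aes(), are our assumptions:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

# Summary statistics for the reference lines
mu <- mean(gm_eu$lifeExp)
s <- sd(gm_eu$lifeExp)

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  # Solid line at the mean
  geom_vline(xintercept = mu, color = "#000000") +
  # Dashed lines at -1 and +1 standard deviations
  geom_vline(xintercept = c(mu - s, mu + s), color = "#000000", linetype = "dashed")
p
```

Passing xintercept as a plain value rather than inside aes() avoids drawing the same line once per row of the data.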


Image 6 – Adding vertical lines to histograms

Are you up for a challenge? Try to recreate our histogram from Image 1. Hint: use geom_segment() instead of geom_vline().

Every so often you want to make your ggplot histogram richer by combining it with a density plot. It shows more or less the same information, just in a smoother format. Here’s how you can add a density plot overlay to your histogram:
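Here's a sketch of the overlay. The histogram is rescaled with after_stat(density) so both layers share the same Y-axis. after_stat() requires ggplot2 3.3.0 or newer, and the overlay colors are our picks:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  # Bar heights become densities (count / (n * bin width)) instead of counts
  geom_histogram(aes(y = after_stat(density)), color = "#000000", fill = "#0099F8") +
  # Semi-transparent kernel density estimate drawn on top
  geom_density(color = "#000000", fill = "#F85700", alpha = 0.6)
p
```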


Image 7 – Adding density plots to histograms

It's a somewhat richer data representation than the histogram alone. For example, if you were to embed the above chart in a dashboard, you could let the user toggle the overlay for maximum customizability.

Do you want to build dashboards professionally? Here’s how to start a career as an R Shiny Developer.

Annotations

Finally, let’s see how you can add annotations to your ggplot histogram. Maybe you find vertical lines too intrusive, and you just want a plain textual representation of specific values.

First things first, you’ll need to create a data.frame for annotations. It should contain X and Y values, and also the labels that will be displayed:

You can now include these in a geom_text() layer. Hint: make the annotations bold, so they’re easier to spot:
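Here's a sketch of both steps, with the annotation data frame built from the mean and standard deviation. The label wording and the Y positions are hypothetical, picked only to keep the labels apart:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

mu <- mean(gm_eu$lifeExp)
s <- sd(gm_eu$lifeExp)

# One row per annotation: X position, Y position (hypothetical), and the label text
annotations <- data.frame(
  x = c(mu - s, mu, mu + s),
  y = c(30, 45, 30),
  label = c(
    sprintf("-1 SD: %.2f", mu - s),
    sprintf("Mean: %.2f", mu),
    sprintf("+1 SD: %.2f", mu + s)
  )
)

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  # Bold labels; inherit.aes = FALSE so this layer ignores aes(x = lifeExp)
  geom_text(data = annotations, aes(x = x, y = y, label = label),
            fontface = "bold", inherit.aes = FALSE)
p
```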


Image 8 – Adding annotations to histograms

The trick with annotations is making sure there’s some gap between them, so the text doesn’t overlap.

Let's also see how you can remove the grayish background color. The easiest approach is to add a more minimalistic theme to the chart. The theme_classic() theme is one of our top picks:
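Here's a sketch of the histogram with theme_classic() added, under the same package assumptions as before:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  # Complete theme: white background, axis lines, no grid lines
  theme_classic()
p
```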


Image 9 – Changing the theme

The only thing missing from our ggplot histogram is the title and axis labels. The users don’t know what they’re looking at without them.

Add Text, Titles, Subtitles, Captions, and Axis Labels to ggplot Histograms

Titles and axis labels are mandatory for production-ready charts. Subtitles or captions are optional, but we’ll show you how to add them as well. The magic happens in the labs() layer. You can use it to specify the values for title, subtitle, caption, X-axis, and Y-axis:
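Here's a sketch of a labs() layer. The title, subtitle, caption, and axis strings are placeholders of our own, not the ones shown in Image 10:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  theme_classic() +
  # Text elements around the plot panel
  labs(
    title = "Life expectancy in Europe",
    subtitle = "Gapminder data, 1952 to 2007",
    caption = "Source: gapminder R package",
    x = "Life expectancy (years)",
    y = "Count"
  )
p
```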


Image 10 – Adding title, subtitle, caption, and axis labels

It's a good start, but the newly added elements don't stand out. You can change the font, color, and size, among other things, in the theme() layer. Just make sure to add a complete theme like theme_classic() before your theme() styles; otherwise, the complete theme will override them:
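Here's a sketch of the styling step. The colors, sizes, and font faces are our assumptions; what matters is the order, with theme_classic() first and theme() after it:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  labs(
    title = "Life expectancy in Europe",
    subtitle = "Gapminder data, 1952 to 2007",
    caption = "Source: gapminder R package",
    x = "Life expectancy (years)",
    y = "Count"
  ) +
  # Complete theme first...
  theme_classic() +
  # ...then the custom styles, so theme_classic() doesn't overwrite them
  theme(
    plot.title = element_text(color = "#0099F8", size = 16, face = "bold"),
    plot.subtitle = element_text(size = 10, face = "bold"),
    plot.caption = element_text(face = "italic")
  )
p
```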


Image 11 – Styling title, subtitle, and caption

It's starting to take shape now, and it also matches the color palette of our ggplot histogram. We've covered everything needed to get you started visualizing your data distributions with histograms, so we'll call it a day here. But there's so much more you can do with your visualizations. Check out some of our Shiny demos to see where advanced R programming can take your data visualizations.

Did you know there’s another way to visualize data distributions? Read our complete guide to boxplots.

Conclusion

Today you’ve learned what histograms are, why they are important for visualizing the distribution of continuous data, and how to make them appealing with R and the ggplot2 library. It’s enough to set you on the right track, and now it’s up to you to apply this knowledge to your datasets. We’re sure you can manage it.

At Appsilon, we've used histograms and the ggplot2 package in developing enterprise R Shiny dashboards for Fortune 500 companies. If R and R Shiny are something you have experience with, we might have a position ready for you.

Start a career at Appsilon — positions available.


FAQs

How to make beautiful histograms in R?

How to Make a ggplot2 Histogram in R
  1. Plot Probability Densities Instead of Counts.
  2. Update Binning Using bins.
  3. Customize the Color of the Histogram.
  4. Customize the Color of the ggplot2 Histogram Based on Groups.
  5. Add Labels and Titles Using labs().
  6. Set x-Axis Limits Using xlim().
  7. Change the Legend Position.

How to make ggplot look professional?

Using themes with your plots can give them a more professional look. Add a theme at the end of your plot code, for example p + theme_minimal(). If there are other custom theme elements you want to add, add them with theme() after the complete theme.

How to add color in ggplot histogram?

When the fill color is mapped to a grouping variable, you can also set the fill colors manually with these functions:
  1. scale_fill_manual(): to use custom colors.
  2. scale_fill_brewer(): to use color palettes from the RColorBrewer package.
  3. scale_fill_grey(): to use grey color palettes.
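Here's a sketch of scale_fill_manual() with fill mapped to a group. It compares life expectancy in Europe and Asia using the gapminder package; the continent pair and the colors are our choices:

```r
library(ggplot2)
library(gapminder)
gm_two <- subset(gapminder, continent %in% c("Europe", "Asia"))

p <- ggplot(gm_two, aes(x = lifeExp, fill = continent)) +
  # position = "identity" overlays the groups instead of stacking them
  geom_histogram(position = "identity", alpha = 0.5, color = "#000000") +
  # Named vector: each continent gets its own fill color
  scale_fill_manual(values = c(Europe = "#0099F8", Asia = "#F8A000"))
p
```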

Which function creates a histogram in ggplot2?

In the ggplot2 package, histograms are created with the geom_histogram() function.

How to make beautiful histograms?

Personalize your histogram

Add images or illustrations to spruce up the histogram. Customize the roundness of each bar or the spacing in between. To emphasize the frequency distribution, you can superimpose a line using the "Draw" tool or any line element in our media library.

How do you make a perfect histogram?

Follow the steps below to construct a histogram.
  1. Begin by marking the class intervals on the X-axis and frequencies on the Y-axis.
  2. Use a consistent scale along each axis (the two axes don't need to share a scale).
  3. Make the class intervals mutually exclusive, so no value falls into two of them.
  4. Draw rectangles with the class intervals as bases and the corresponding frequencies as heights.

What is aesthetic mapping in ggplot?

Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Aesthetic mappings can be set in ggplot() and in individual layers.

Why is ggplot so good?

ggplot2 is declarative and efficient at creating data visualizations based on the Grammar of Graphics. The layered grammar makes building charts structured and expressive. Building a ggplot2 chart feels like playing with LEGO blocks.

How to modify a histogram in R?

To change the number of bins in a ggplot2 histogram, use the bins argument of the geom_histogram() function. It sets the number of bars (bins) that the whole range of the data is divided into.

How to change the outline of a histogram in R?

In base R, you can change the fill color of the bars with the col argument of the hist() function (for example, col = "blue"), and the outline color of the bars with the border argument (for example, border = "white").

How to plot a histogram in R?

Base R uses the hist() function to create histograms. It takes a vector of values and plots a histogram: the X-axis covers the range of the continuous values, and the height of each bar on the Y-axis shows how many values fall into that bin. The breaks argument controls the binning. Among other options, it accepts a number of bins (treated as a suggestion) or a vector of break points.
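Here's a base R sketch, assuming the gapminder package for the data. The breaks = 20 value is only a suggestion, and hist() picks "pretty" break points near it. The col and border arguments set the fill and outline colors:

```r
library(gapminder)

# European life expectancy values as a plain numeric vector
life_eu <- gapminder$lifeExp[gapminder$continent == "Europe"]

# Draws the histogram and invisibly returns the bin breaks and counts
h <- hist(life_eu, breaks = 20, col = "blue", border = "white",
          main = "Life expectancy in Europe", xlab = "Life expectancy (years)")
```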

How many bins are there in a ggplot2 histogram?

By default, the underlying computation (stat_bin()) uses 30 bins. This is not a good default, but the idea is to get you experimenting with different numbers of bins. You can also experiment with the binwidth argument, together with the center or boundary arguments. binwidth overrides bins, so change one at a time.
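Here's a sketch using binwidth with boundary on the European gapminder subset. Every bin is 2 years wide, and 70 sits exactly on a bin edge. Both values are arbitrary picks for illustration:

```r
library(ggplot2)
library(gapminder)
gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  # binwidth fixes the bar width; boundary pins one bin edge at 70
  geom_histogram(binwidth = 2, boundary = 70, color = "#000000", fill = "#0099F8")
p
```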

What is the difference between ggplot histogram and barplot?

A bar plot (geom_bar() or geom_col() in ggplot2) shows counts or values for discrete categories, with one bar per category. A histogram (geom_histogram()) bins a continuous variable into adjacent ranges, so its bars touch and together give a clear picture of the variable's distribution.

How can you improve the smoothness of a histogram?

You can adjust the number of bins, which is the number of classes the histogram divides the data into. More bins make the outline of the histogram finer and smoother, provided you have enough data. With too many bins, each bar holds only a few points and the shape gets noisy.

What does a perfect histogram look like?

So, a healthy histogram will show up as a strong graph which shouldn't appear too "thin" or lacking in data, ideally centered around the middle with no data running off either end of the graph. Then again, rules can and will often be broken.

How do I color a histogram bar in R?

Customizing the color

In base R, pass a color to the col argument of the hist() function, for example col = "blue". In ggplot2, set the fill argument of geom_histogram() instead, for example geom_histogram(fill = "blue").
