Chapter 7 Histograms | Data Visualization with R (2024)

7.1 Introduction

In this chapter, we will learn to:

  • create a bare bones histogram
  • specify the number of bins/intervals
  • represent frequency density on the Y axis
  • add colors to the bars and the border
  • add labels to the bars

A histogram is a plot that can be used to examine the shape and spread of continuous data. It looks very similar to a bar graph and can be used to detect outliers and skewness in data. The histogram graphically shows the following:

  • center (location) of the data
  • spread (dispersion) of the data
  • skewness
  • outliers
  • presence of multiple modes

To construct a histogram, the data is split into intervals called bins. The intervals may or may not be of equal size. For each bin, the number of data points that fall into it is counted (the frequency). The Y axis of the histogram represents the frequency and the X axis represents the variable.
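The binning and counting described above can be sketched by hand. This is only an illustration: the vector x and the bin edges are made-up values, not data from this chapter.

```r
# Toy data, chosen only for illustration (not from the chapter)
x <- c(2.1, 3.4, 4.8, 5.0, 6.7, 7.2, 9.9)

# Bin edges for three equal-width intervals: (0, 4], (4, 8], (8, 12]
edges <- c(0, 4, 8, 12)

# cut() assigns each value to a bin; table() counts the values per bin
bins <- cut(x, breaks = edges)
freq <- table(bins)
print(freq)

# hist() performs the same counting; plot = FALSE skips the drawing
h <- hist(x, breaks = edges, plot = FALSE)
h$counts
```

Both approaches should report the same frequency for each bin.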

7.2 Distributions

Before we learn how to create histograms, let us see how normal and skewed distributions look when represented by a histogram.

7.2.1 Normal Distribution

[Figure: histogram of a normal distribution]

7.2.2 Skewed Distributions

[Figure: histograms of skewed distributions]

7.3 Basics

Histograms are created using the hist() function in R. The minimum input required to create a bare bones histogram is a continuous variable. Below is an example:

[Figure: bare bones histogram of mtcars$mpg]
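A minimal call looks like the sketch below. It assumes the variable is mtcars$mpg, which is the name reported by h$xname later in this section; mtcars ships with base R.

```r
# mpg (miles per gallon) is a continuous variable in the built-in mtcars data
mpg <- mtcars$mpg

# The only required input is the numeric vector itself
hist(mpg)
```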

The hist() function returns the details of the histogram, which can be accessed by assigning the histogram to a variable. Let us assign the above histogram to a variable h and use the $ operator to access the details stored in it.

[Figure: the same histogram, stored in h]
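The assignment could be written as follows. The sketch also lists the components of the returned object, so it is clear which details are available.

```r
# Draw the histogram and keep the returned object
h <- hist(mtcars$mpg)

# The result is a list of class "histogram"
class(h)

# Names of the components that can be accessed with $
names(h)
```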

# break points of the intervals
h$breaks
## [1] 10 15 20 25 30 35

# frequency of the intervals
h$counts
## [1]  6 12  8  2  4

# frequency density
h$density
## [1] 0.0375 0.0750 0.0500 0.0125 0.0250

# mid points of the intervals
h$mids
## [1] 12.5 17.5 22.5 27.5 32.5

# variable name
h$xname
## [1] "mtcars$mpg"

# whether intervals are of equal size
h$equidist
## [1] TRUE

7.4 Bins

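The hist() function creates equidistant intervals by default. We can specify the number of bins using the breaks argument. Note that R treats a single number as a suggestion only: the break points are set to "pretty" values, so the number of bins drawn may differ from the number requested.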

[Figure: histogram with the number of bins set through breaks]
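As a small sketch, the call below requests 10 bins. The value 10 is an arbitrary choice for illustration; inspecting the returned object shows how many bins were actually drawn.

```r
# Request (roughly) 10 bins
h10 <- hist(mtcars$mpg, breaks = 10)

# Number of bins actually drawn; this may differ from the request
length(h10$counts)

# Break points chosen by hist()
h10$breaks
```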

The plot below shows histograms with different numbers of bins:

[Figure: histograms with different numbers of bins]
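A comparison like this can be drawn in a 2 x 2 grid, as in the sketch below. The requested bin numbers (4, 6, 8, 10) are assumptions made for the example, not the values used in the original figure.

```r
# Requested numbers of bins (illustrative values)
bin_requests <- c(4, 6, 8, 10)

# Arrange four plots in a 2 x 2 grid and remember the old settings
old_par <- par(mfrow = c(2, 2))

# Draw one histogram per request and record how many bins were drawn
bins_drawn <- sapply(bin_requests, function(n) {
  h <- hist(mtcars$mpg, breaks = n, main = paste("breaks =", n))
  length(h$counts)
})

# Restore the previous plot layout
par(old_par)

bins_drawn
```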

7.5 Intervals

If we want to create a histogram with specific intervals, we can supply the break points as a numeric vector to the breaks argument.

[Figure: histogram with user-specified intervals]

If you observe the Y axis, it does not represent frequency any more. Because the intervals are no longer of equal width, hist() plots the frequency density instead. What is frequency density?

7.5.1 Frequency Density

Frequency Density = Relative Frequency / Class Width

Relative Frequency = Frequency / Total Observations

h <- hist(mtcars$mpg, breaks = c(10, 18, 24, 30, 35))

[Figure: histogram of mtcars$mpg with breaks at 10, 18, 24, 30 and 35]

frequency <- h$counts
class_width <- c(8, 6, 6, 5)
rel_freq <- frequency / length(mtcars$mpg)
freq_density <- rel_freq / class_width
d <- data.frame(frequency = frequency,
                class_width = class_width,
                relative_frequency = rel_freq,
                frequency_density = freq_density)
d
##   frequency class_width relative_frequency frequency_density
## 1        13           8            0.40625        0.05078125
## 2        12           6            0.37500        0.06250000
## 3         3           6            0.09375        0.01562500
## 4         4           5            0.12500        0.02500000

When each frequency density is multiplied by its class width, the products always sum to 1.

sum(d$frequency_density * d$class_width)
## [1] 1
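As a cross-check, the hand-computed values can be compared with the density component that hist() returns. The sketch below derives the class widths from the break points with diff() rather than typing them in.

```r
# Histogram with unequal intervals
h <- hist(mtcars$mpg, breaks = c(10, 18, 24, 30, 35))

# Class widths derived from the break points
class_width <- diff(h$breaks)

# Frequency density = relative frequency / class width
freq_density <- h$counts / sum(h$counts) / class_width

# Should agree with the density stored in the histogram object
all.equal(freq_density, h$density)

# The bar areas (density * width) add up to 1
sum(h$density * class_width)
```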

We will return to frequency density shortly. Before we end this section, there is one more way to specify the intervals of the histogram: naming an algorithm that computes the number of bins. The hist() function supports the following algorithms:

  • Sturges (default)
  • Scott
  • Freedman-Diaconis (FD)

In the plot below, we examine how the algorithms work:

[Figure: histograms using the Sturges, Scott and Freedman-Diaconis algorithms]
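An algorithm is selected by passing its name as a string to breaks. The sketch below draws the three variants side by side and then uses the nclass.* helper functions from grDevices to show how many bins each rule suggests; the side-by-side layout is an assumption for the example.

```r
# Three plots in one row
old_par <- par(mfrow = c(1, 3))
hist(mtcars$mpg, breaks = "Sturges", main = "Sturges")
hist(mtcars$mpg, breaks = "Scott", main = "Scott")
hist(mtcars$mpg, breaks = "FD", main = "Freedman-Diaconis")
par(old_par)

# Number of bins suggested by each rule
n_sturges <- nclass.Sturges(mtcars$mpg)
n_scott <- nclass.scott(mtcars$mpg)
n_fd <- nclass.FD(mtcars$mpg)
c(Sturges = n_sturges, Scott = n_scott, FD = n_fd)
```

As with a numeric breaks value, the suggested number is passed through R's "pretty" break selection, so the bins drawn may not match it exactly.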

7.6 Frequency Distribution II

Let us come back to frequency density. If you want the Y axis of the histogram to represent frequency density instead of counts, set the freq argument to FALSE.

[Figure: histogram with frequency density on the Y axis (freq = FALSE)]
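A minimal sketch of the freq argument, using the same mtcars$mpg variable as the rest of the chapter:

```r
# Y axis shows frequency density instead of counts
h_density <- hist(mtcars$mpg, freq = FALSE)

# The heights drawn are the values stored in the density component
h_density$density
```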

The same result can be achieved with the probability argument, which is an alias for !freq. It takes a logical value and, with bins of equal width, defaults to FALSE. If set to TRUE, the Y axis represents the frequency density instead of counts.

hist(mtcars$mpg, probability = TRUE)

[Figure: histogram with frequency density on the Y axis (probability = TRUE)]

7.7 Color

To add colors to the bars of the histogram, use the col argument. If the number of colors specified is less than the number of bars, the colors are recycled. Below are a few examples:

7.7.1 Single Color

[Figure: histogram with bars in a single color]
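For example, a single color name fills every bar. The color "blue" is an arbitrary choice for this sketch.

```r
# One color, applied to all bars
bar_color <- "blue"
hist(mtcars$mpg, col = bar_color)
```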

7.7.2 Different Colors

[Figure: histogram with a different color for each bar]
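To give each bar its own color, supply one color per bar. The default histogram of mtcars$mpg has five bars, so the sketch passes five colors; the particular colors are assumptions.

```r
# One color per bar (the default histogram of mpg has five bars)
bar_colors <- c("red", "blue", "green", "yellow", "brown")
hist(mtcars$mpg, col = bar_colors)
```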

7.7.3 Recycled Colors

[Figure: histogram with recycled colors]
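When fewer colors than bars are given, they are reused in order. In this sketch two illustrative colors are spread over five bars; rep_len() is used only to spell out the resulting sequence.

```r
# Two colors for five bars: the colors are reused in order
two_colors <- c("red", "blue")
hist(mtcars$mpg, col = two_colors)

# The sequence of colors the five bars end up with
recycled <- rep_len(two_colors, 5)
recycled
```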

7.8 Border Color

Colors can be specified for the borders of the histogram bars using the border argument.

[Figure: histogram with colored bar borders]
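A short sketch with a single border color. Both color names are arbitrary choices; a light fill is added only so the border stands out.

```r
# Same border color for every bar
border_color <- "red"
hist(mtcars$mpg, col = "gray90", border = border_color)
```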

7.8.1 Different Colors

[Figure: histogram with a different border color for each bar]
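The border argument also accepts a vector, one color per bar. The colors below are assumptions made for the example.

```r
# One border color per bar (five bars in the default histogram of mpg)
border_colors <- c("red", "blue", "green", "orange", "brown")
hist(mtcars$mpg, border = border_colors)
```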

7.9 Labels

In certain cases, we might want to add the frequency counts on the histogram bars. It is easier for the user to know the frequencies of each bin when they are present on top of the bars. Let us add the frequency counts on top of the bars using the labels argument. We can either set it to TRUE or a character vector containing the label values. Let us look at both the methods.

7.9.1 Method 1

Set labels to TRUE.

[Figure: histogram with frequency counts on top of the bars (labels = TRUE)]
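A sketch of the first method. The ylim value is an assumption, added so that the label above the tallest bar has room inside the plot area.

```r
# Print the count of each bin above its bar
h_lab <- hist(mtcars$mpg, labels = TRUE, ylim = c(0, 15))

# These are the values shown as labels
h_lab$counts
```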

7.9.2 Method 2

Specify the label values in a character vector.

[Figure: histogram with labels supplied as a character vector]
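A sketch of the second method. The labels here simply repeat the bin counts reported earlier for mtcars$mpg, written as strings; any text of the same length could be used instead. As before, the ylim value is an assumption that leaves room for the top label.

```r
# One label per bar, supplied as a character vector
bar_labels <- c("6", "12", "8", "2", "4")
hist(mtcars$mpg, labels = bar_labels, ylim = c(0, 15))
```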

7.10 Putting it all together

Let us add a title and axis labels to the histogram.

hist(mtcars$mpg, labels = TRUE, probability = TRUE, ylim = c(0, 0.1),
     xlab = 'Miles Per Gallon', main = 'Distribution of Miles Per Gallon',
     col = rainbow(5))

[Figure: histogram of mtcars$mpg with title, axis label, bar labels and colors]
