Probability Distributions
Some special distributions and visualizing probabilities
Concept Acquisition
- Probability distributions
- Probability histograms
- Empirical histograms
- Distribution tables
Tool Acquisition
- How to write down the distribution of the probabilities of outcomes
- What a probability histogram represents
- Empirical histograms vs probability histograms
- geom_col(), R script files, replicate()
Concept Application
1. Drawing probability histograms
2. Using R to simulate probabilities
3. Drawing empirical histograms
So far we have seen examples of outcome spaces, and descriptions of how we might compute probabilities, along with tabular representations of the probabilities. In this set of notes, we are going to talk about how to visualize probabilities using tables and histograms, as well as how to visualize simulations of outcomes from actions such as tossing coins or rolling dice.
Probability distributions and histograms
Probability distributions
Recall the example in which we drew a ticket from a box with 5 tickets in it.
If we draw one ticket at random from this box, we know that the probabilities of the four distinct outcomes can be listed in a table as:
Outcome | | | | |
---|---|---|---|---|
Probability | | | | |
What we have described in the table above is a probability distribution. We have shown how the total probability of one, or 100%, is distributed among all the possible outcomes. Since one of the ticket values appears on two of the five tickets, that outcome has probability 2/5, while each of the other three outcomes has probability 1/5.
Probability histograms
A table is nice, but a visual representation would be even better.
We have represented the distribution in the form of a histogram, with the areas of the bars representing probabilities. Notice that this histogram is different from the ones we have seen before, since we didn’t collect any data. We just defined the probabilities based on the outcomes, and then drew bars with the heights being the probabilities. This type of theoretical histogram is called a probability histogram.
Empirical histograms
What about if we don’t know the probability distribution of the outcomes of an experiment? For example, what if we didn’t know how to compute the probability distribution above? What could we do to get an idea of what the probabilities might be? Well, we could keep drawing tickets over and over again from the box, with replacement (that is, we put the selected tickets back before choosing again), keep track of the tickets we draw, and make a histogram of our results. This kind of histogram, which is the kind we have seen before, is a visual representation of data, and is called an empirical histogram.
On the x-axis of this histogram, we have the ticket values; on the y-axis, we have the proportion of times each ticket was selected out of the 50 with-replacement draws we took. We can see that the sample proportions look similar to the values given by the probability distribution, but there are some differences. For example, we appear to have drawn one ticket value rather more often than its probability (a sample proportion of 0.48), and another rather less often (a sample proportion of 0.12).
Ticket | Number of times drawn | Proportion of times drawn |
---|---|---|
 | 10 | 0.2 |
 | 24 | 0.48 |
 | 10 | 0.2 |
 | 6 | 0.12 |
What we have seen here is that when we draw at random, we get a sample that resembles the population, that is, a representative sample, but the sample proportions are not exactly equal to the true probabilities. If we increase the number of draws, however, say to 500, we will get something that more closely aligns with the truth.
Examples
Rolling a pair of dice and summing the spots
The outcomes are already numbers, so we don't need to represent them differently. We know that there are 36 equally likely ways the two dice can land, and that the possible sums run from 2 to 12.
The probability histogram will have the possible outcomes (the sums 2 through 12) listed on the x-axis, and bars of width 1 centered at each outcome, with heights equal to the corresponding probabilities.
What about the probability distribution? Make a table showing the probability distribution for rolling a pair of dice and summing the spots.
Check your answer
Outcome | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
Probability | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
Tossing a fair coin 3 times and counting the number of heads
We have seen that there are 8 equally likely outcomes from tossing a fair coin three times: HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT. Counting the number of heads in each outcome gives the following probability distribution:
Outcome (number of heads) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Probability | 1/8 | 3/8 | 3/8 | 1/8 |
What would the probability histogram look like?
Check your answer
The probability histogram has bars at 0, 1, 2, and 3 heads, with heights 1/8, 3/8, 3/8, and 1/8, respectively.
We are going to introduce some special distributions. We have seen most of these distributions, but will introduce some names and definitions. Before we do this, let’s recall how to count the number of outcomes for various experiments such as tossing coins or drawing tickets from a box (both with and without replacement).
Basic rule of counting
Recall that if we have multiple steps (say two steps, such as tossing a coin and then rolling a die), and the first step can turn out in $m$ ways while the second can turn out in $n$ ways, then the two steps together can turn out in $m \times n$ ways. For the coin toss followed by the die roll, that gives $2 \times 6 = 12$ possible outcomes.
This example seems trivial, but it illustrates the basic principle of counting: we get the total number of possible outcomes of an action that has multiple steps by multiplying together the number of outcomes for each step. All the counting that follows in our notes applies this rule. For example, let's suppose we are drawing tickets from a box which has $n$ tickets, each marked with a different letter, and we form "words" by drawing $k$ tickets one at a time with replacement and writing down the letters in the order they appear. Each draw has $n$ possible outcomes, so by the counting rule there are $n \times n \times \cdots \times n = n^k$ possible words.
How many possible words are there if we draw without replacement? That is, we don’t put the drawn ticket back?
Check your answer
We have $n$ choices for the first letter, but only $n - 1$ choices for the second (since the first ticket is not put back), $n - 2$ for the third, and so on, down to $n - k + 1$ choices for the $k$th letter. By the counting rule, there are $n \times (n-1) \times (n-2) \times \cdots \times (n-k+1)$ possible words.
We usually write the quantity $n \times (n-1) \times \cdots \times 2 \times 1$ as $n!$, read as "$n$ factorial", so the count above can be written compactly as $\dfrac{n!}{(n-k)!}$.
Counting the number of ways to select a subset
What if, in this example of selecting $k$ tickets out of $n$ without replacement, we didn't care about the order in which the tickets were drawn, but only about which set of tickets we ended up with? Each distinct set of $k$ tickets appears $k!$ times among the ordered words (once for every way of arranging those $k$ letters), so the number of distinct sets is
$$\frac{n \times (n-1) \times \cdots \times (n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!},$$
and is called the number of combinations of $n$ things taken $k$ at a time.
To recap: when we draw $k$ tickets out of $n$ without replacement, there are two ways to count the outcomes, depending on whether or not the order of the draws matters.
- Permutations
- The number of possible arrangements or sequences of $n$ things taken $k$ at a time (the ordering matters), which is given by $n \times (n-1) \times \cdots \times (n-k+1) = \dfrac{n!}{(n-k)!}$.
- Combinations
- The number of ways to choose a subset of $k$ things out of $n$ possible things, which is given by $\dfrac{n!}{k!\,(n-k)!}$. This number is just the number of distinct arrangements or permutations of $n$ things taken $k$ at a time, divided by the number of arrangements of $k$ things, which is $k!$. It is denoted by $\binom{n}{k}$, which is read as "$n$ choose $k$".
Example
How many ways can I deal a hand of $k$ cards from a standard deck of 52 cards?
Check your answer
When we deal cards, the order in which the cards end up in the hand does not matter, so this number is $\binom{52}{k}$. For instance, for a five-card hand this is $\binom{52}{5} = 2{,}598{,}960$.
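If you want to check counts like these numerically, base R's factorial() and choose() functions compute factorials and binomial coefficients directly (the particular numbers below are just illustrations):

# permutations of 5 things taken 2 at a time: 5!/(5-2)!
factorial(5) / factorial(5 - 2)
[1] 20

# combinations: the number of 5-card hands from a 52-card deck
choose(52, 5)
[1] 2598960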
Special distributions
There are some important special distributions that every student of probability must know. Here are a few, and we will learn some more later in the course. We have already seen most of these distributions. All we are doing now is identifying their names. First, we need a vocabulary term:
- Parameter of a probability distribution
- A constant (or constants) associated with the distribution. If you know the parameters of a probability distribution, then you can compute the probabilities of all the possible outcomes.
Each of the distributions we will cover below has one or more parameters associated with it.
Discrete uniform distribution
This is the probability distribution over the numbers $1, 2, \ldots, n$ in which every number has the same probability, $1/n$. Its parameter is $n$. We have already seen an example: the number of spots on one roll of a fair die has the discrete uniform distribution on $1, 2, \ldots, 6$, with each outcome having probability $1/6$.
Bernoulli distribution
This is a probability distribution describing the probabilities associated with binary outcomes that result from one action, such as one coin toss that can either land Heads or Tails. We can represent the action as drawing one ticket from a box with tickets marked 1 and 0, where a 1 stands for a success (say, the coin landing Heads) and a 0 stands for a failure; the proportion of tickets marked 1 is the chance of success.
For the Bernoulli distribution, our parameter is $p$, the probability of success, that is, the probability of drawing a ticket marked 1. The probability of the other outcome is then $1 - p$.
In the figure above, each histogram shows a Bernoulli distribution for a different value of the parameter $p$: the bar at 1 has height $p$ and the bar at 0 has height $1 - p$.
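As a quick sketch of what this box model looks like in R (the value p = 0.3 is an arbitrary illustration, not a value from the figure), we can use sample() with its prob argument to draw one ticket from a 0/1 box:

p <- 0.3    # illustrative chance of success

# draw one ticket from a box where the chance of a 1 is p and the chance of a 0 is 1 - p
sample(c(0, 1), size = 1, prob = c(1 - p, p))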
Binomial Distribution
The binomial distribution describes the total number of successes in a sequence of $n$ independent trials, where each trial results in a success with the same probability $p$. For example, the number of heads in three tosses of a fair coin has a binomial distribution with $n = 3$ and $p = 1/2$.
What would the probability distribution and histogram look like for the number of heads in three tosses of a biased coin, where the chance of the coin landing heads is some probability $p$ that need not be $1/2$?
Check your answer
Note that the outcomes are no longer equally likely. By the multiplication rule, each sequence with exactly one head (HTT, THT, or TTH) has probability $p(1-p)^2$, and each sequence with exactly two heads (HHT, HTH, or THH) has probability $p^2(1-p)$. Collecting the outcomes by the number of heads gives $P(0 \text{ heads}) = (1-p)^3$, $P(1 \text{ head}) = 3p(1-p)^2$, $P(2 \text{ heads}) = 3p^2(1-p)$, and $P(3 \text{ heads}) = p^3$.
More generally, suppose that we have $n$ independent trials, each of which results in a success with probability $p$ and a failure with probability $1 - p$, and we count the total number of successes. What is the chance of seeing exactly $k$ successes?
The multiplication rule for independent events tells us how to compute the probability of any particular sequence of successes and failures. For example, the sequence that consists of the first $k$ trials being successes and the remaining $n - k$ trials being failures has probability $p^k(1-p)^{n-k}$. In fact, every sequence with exactly $k$ successes and $n - k$ failures has this same probability, regardless of the order in which they occur.
How many such sequences are there? We can count them using our rules above. We have $n$ trials, and we need to choose which $k$ of them are the successes; the order of that choice does not matter, so there are $\binom{n}{k}$ such sequences. Adding up their probabilities gives
$$P(\text{exactly } k \text{ successes in } n \text{ trials}) = \binom{n}{k}\, p^k (1-p)^{n-k}.$$
The probability distribution described by the above formula is called the binomial distribution. Its parameters are $n$, the number of trials, and $p$, the chance of success on each trial; the name comes from the binomial coefficients $\binom{n}{k}$ that appear in the formula.
Example
Toss a weighted coin five times, where the chance that the coin lands heads on any single toss is $p$. What is the chance of seeing exactly four heads?
Check your answer
Across the five tosses, we need to see four heads and one tail. Since each toss is independent, one possible way to obtain what we are looking for is the sequence HHHHT, which by the multiplication rule has probability $p^4(1-p)$.
However, we need to consider all the possible orderings of tosses that involve four heads and one tail. This is given by the binomial coefficient $\binom{5}{4} = 5$, so the chance of exactly four heads in five tosses is $\binom{5}{4}\, p^4(1-p) = 5\,p^4(1-p)$.
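We can check this sort of binomial calculation in R. The value p = 0.6 below is just an illustrative choice, since the example leaves p general; dbinom() is base R's binomial probability function.

p <- 0.6    # illustrative chance of heads

# chance of exactly 4 heads in 5 tosses, from the formula
choose(5, 4) * p^4 * (1 - p)
[1] 0.2592

# the same probability using the built-in binomial function
dbinom(x = 4, size = 5, prob = p)
[1] 0.2592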
Hypergeometric distribution
In the binomial scenario described above, we had a fixed number of independent trials, each with the same chance of success. In terms of boxes, this is like drawing $n$ tickets with replacement from a box of tickets marked 1 and 0. What if we draw the tickets without replacement instead, as a simple random sample? Suppose the box contains $N$ tickets in total, of which $G$ are marked 1 and the remaining $N - G$ are marked 0, and we draw $n$ tickets at random without replacement.
As usual, the ticket marked 1 represents a success and the ticket marked 0 represents a failure.
What is the probability that we will have exactly $k$ successes among our $n$ draws? The probability is given by
$$P(\text{exactly } k \text{ successes}) = \frac{\binom{G}{k}\,\binom{N-G}{n-k}}{\binom{N}{n}},$$
and is called the hypergeometric distribution. It has three parameters: $N$, the number of tickets in the box; $G$, the number of tickets marked 1; and $n$, the number of draws. To see where the formula comes from, look at the numerator and the denominator separately.
Numerator
We count the number of samples drawn without replacement that have exactly $k$ tickets marked 1 and $n - k$ tickets marked 0. There are $\binom{G}{k}$ ways to choose the tickets marked 1 and $\binom{N-G}{n-k}$ ways to choose the tickets marked 0, so by the basic counting rule there are $\binom{G}{k}\,\binom{N-G}{n-k}$ such samples.
Denominator
We count the total number of simple random samples of size $n$ that can be drawn without replacement from the $N$ tickets in the box, which is $\binom{N}{n}$.
Example
Say we have a box of $N$ tickets, of which $G$ are marked 1 and the rest are marked 0, and we draw $n$ tickets from the box at random without replacement.
What is the probability that exactly two of the tickets drawn are marked 1?
Check your answer
We draw two tickets marked 1 and $n - 2$ tickets marked 0. There are $\binom{G}{2}$ ways to choose the tickets marked 1 and $\binom{N-G}{n-2}$ ways to choose the tickets marked 0, so the numerator is $\binom{G}{2}\,\binom{N-G}{n-2}$.
How many total ways are there to draw $n$ tickets from the $N$ tickets in the box? This is $\binom{N}{n}$, which is the denominator.
Therefore, our final answer is given by
$$P(\text{exactly two 1's}) = \frac{\binom{G}{2}\,\binom{N-G}{n-2}}{\binom{N}{n}}.$$
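To make the formula concrete in R, here is a sketch with made-up numbers: a box of N = 20 tickets, G = 6 of them marked 1, and 5 draws (these values are purely illustrative, not taken from the notes). Base R's dhyper() computes the same probability.

N <- 20        # illustrative: total tickets in the box
G <- 6         # illustrative: tickets marked 1
draws <- 5     # illustrative: number of draws without replacement

# chance of exactly two tickets marked 1, from the counting formula
choose(G, 2) * choose(N - G, draws - 2) / choose(N, draws)

# the same probability via the built-in hypergeometric function
# (dhyper's m = tickets marked 1, n = tickets marked 0, k = number of draws)
dhyper(x = 2, m = G, n = N - G, k = draws)

# both give about 0.352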
Binomial vs Hypergeometric distributions
Both these distributions deal with:
- a fixed number of trials, or instances of the random experiment;
- outcomes that are deemed either successes or failures.
The difference is that for a binomial random variable the probability of a success stays the same for each trial (like drawing with replacement), while for a hypergeometric random variable the probability of a success changes from trial to trial (like drawing without replacement).
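To see the difference numerically, compare the chance of exactly two 1s in five draws from the illustrative box above (20 tickets, 6 marked 1) when drawing with replacement versus without replacement:

# with replacement: each draw has chance 6/20 of a 1, so the count is binomial
dbinom(x = 2, size = 5, prob = 6/20)

# without replacement: the count is hypergeometric
dhyper(x = 2, m = 6, n = 14, k = 5)

# about 0.309 with replacement versus about 0.352 without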
The Ideas in Code
Before discussing how to simulate the distributions, we are going to introduce three more useful functions.
Three useful functions
1. rep(): replicates values in a vector

Sometimes we need to create vectors with repeated values. In these cases, rep() is very useful.

- Arguments
- x: the vector or list that is to be repeated. This must be specified.
- times: the number of times we should repeat the elements of x. This could be a vector the same length as x detailing how many times each element is to be repeated, or it could be a single number, in which case the entire x is repeated that many times.
- each: the default is 1, and if specified, each element of x is repeated each times.
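For example, here are the three ways of using rep() described above (the vectors are just illustrations):

rep(c(1, 2, 3), times = 2)            # repeat the whole vector twice
[1] 1 2 3 1 2 3

rep(c(1, 2, 3), times = c(1, 2, 3))   # repeat each element a different number of times
[1] 1 2 2 3 3 3

rep(c(1, 2, 3), each = 2)             # repeat each element twice
[1] 1 1 2 2 3 3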
2. replicate(): repeats a specific set of tasks a large number of times

- Arguments
- n: the number of times we want to repeat the task. This must be specified.
- expr: the task we want to repeat, usually an expression that is some combination of functions; for example, maybe we take a sample from a vector and then sum the sample values.
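For instance, here is a small sketch (a made-up task, not one from the notes): toss a fair coin three times, count the heads, and repeat that task five times.

coin <- c(0, 1)    # 0 stands for tails, 1 for heads

# each repetition: toss three times and count the heads
replicate(n = 5, expr = sum(sample(coin, size = 3, replace = TRUE)))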
3. geom_col(): plotting with probability

When plotting probability histograms, we know exactly what the height of each bar should be. This is as opposed to the bar charts you have seen before (and empirical histograms), where we are just trying to visualize the data that we have collected. geom_col() creates a bar chart in which the bar heights represent numbers that are specified via an aesthetic. In other words, the y variable will appear in our call to aes()!
Example: Rolling a die twice and summing the spots
As you read through the code in this section, keep RStudio open in another window to code along at the console. Keep in mind that we use set.seed()
more than once for demonstration purposes only.
Suppose we want to simulate the task of rolling a pair of dice and summing the two spots. We can accomplish this task and examine our results using the functions we have just introduced. First, we will make a vector representing a fair, six-sided die.
die <- seq(from = 1, to = 6, by = 1)
Obtaining a sum
Method 1 - replicate()
We can use the sample()
function to roll the die twice; this will output a vector with the two numbers rolled. Then, we can take the sum of this vector by nesting the call to sample()
inside of sum().
set.seed(214)
sum(sample(die, size = 2, replace = TRUE))
[1] 7
If we would like to repeat this action many times (for instance, in a game of Monopoly, each player has to roll two dice on their turn and sum the spots), the replicate()
function will come in handy. In the following line of code, we obtain 10 sums.
replicate(n = 10, expr = sum(sample(die, size = 2, replace = TRUE)))
[1] 11 8 8 12 5 7 8 7 10 7
Method 2 - rep()
We could also list out all the possible sums in advance and then sample from them: 2 through 12. However, when rolling two dice, there is only one way to get a sum of 2 but six ways to get a sum of 7, so the sums are not all equally likely. We need a box in which each possible sum appears once for every way it can be rolled, and we can use the times argument of the rep() function to make such a box. The number of times each sum is repeated matches the number of ways of rolling it: 1 way for a sum of 2, 2 ways for a sum of 3, and so on up to 6 ways for a sum of 7, then back down to 1 way for a sum of 12.
possible_sums <- seq(from = 2, by = 1, to = 12)
correct_sums <- rep(possible_sums,
                    times = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1))
correct_sums
[1] 2 3 3 4 4 4 5 5 5 5 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8
[26] 8 9 9 9 9 10 10 10 11 11 12
To get 10 sums as we did before, we just need to sample ten times with replacement from this new box, correct_sums.
sample(x = correct_sums, size = 10, replace = TRUE)
[1] 7 4 8 9 2 7 4 7 6 8
Visualizing our results
Making a probability histogram with geom_col()
First, let's create a vector with the probabilities associated with each possible sum that can be obtained from rolling two dice. We are taking these probabilities from the probability histogram we drew earlier in the notes.
prob_sums <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)/36
Now, using the above and the possible_sums
vector from before, we can make a data frame with the information about the probability distribution and use it to draw the probability histogram with geom_col().
library(ggplot2)   # for ggplot(), aes(), geom_col(), and labs()

prob_hist <- data.frame(possible_sums, prob_sums) |>
  ggplot(mapping = aes(x = factor(possible_sums),
                       y = prob_sums)) +
  geom_col(fill = "goldenrod") +
  labs(x = "sum value",
       y = "probability")
prob_hist
The use of factor()
is to make sure that, for the purposes of the plot, the sum values are treated categorically.
Performing a simulation and making an empirical histogram
Let's simulate rolling two dice and computing the sum fifty times. Then, we can make a data frame out of our results and count the number of rolls, grouped by sum value. This can be done with the n()
summary function, and if we divide by 50, we get the sample proportion of each sum.
set.seed(214)
library(dplyr)   # for group_by(), summarise(), and n()

results <- replicate(n = 50,
                     expr = sum(sample(die, size = 2, replace = TRUE)))
empirical <- data.frame(results) |>
  group_by(results) |>
  summarise(props = n()/50)
empirical
# A tibble: 11 × 2
results props
<dbl> <dbl>
1 2 0.04
2 3 0.02
3 4 0.06
4 5 0.1
5 6 0.08
6 7 0.24
7 8 0.1
8 9 0.12
9 10 0.14
10 11 0.08
11 12 0.02
Now, we can construct an empirical histogram using the empirical
data frame.
emp_50 <- empirical |>
  ggplot(mapping = aes(x = factor(results),
                       y = props)) +
  geom_col(fill = "blue") +
  labs(x = "sum value",
       y = "sample proportion")
emp_50
Comparing our results to the truth
You may have wondered why we bothered to save the plot objects. The reason is that we can use a nifty library called patchwork
which will help us to more easily visualize multiple plots at once by using mathematical and logical syntax. For instance, using +
will put plots side by side.
library(patchwork)
prob_hist + emp_50
With only 50 repetitions, we see that the empirical histogram doesn't quite match the probability histogram. However, modify the above code by increasing the number of repetitions (as sketched below), and you will see the empirical histogram begin to resemble the true probability distribution more closely. This is an example of long-run relative frequency.
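For example, here is one way to redo the simulation with 5,000 repetitions (the number 5,000 is an arbitrary choice) and place the result next to the probability histogram:

results_5000 <- replicate(n = 5000,
                          expr = sum(sample(die, size = 2, replace = TRUE)))

emp_5000 <- data.frame(results = results_5000) |>
  group_by(results) |>
  summarise(props = n()/5000) |>
  ggplot(mapping = aes(x = factor(results),
                       y = props)) +
  geom_col(fill = "blue") +
  labs(x = "sum value",
       y = "sample proportion")

prob_hist + emp_5000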
Summary
- Defined probability distributions
- Stated the basic counting principle and introduced permutations and combinations
- Defined some famous named distributions (Bernoulli, discrete uniform, binomial, hypergeometric)
- Visualized probability distributions using probability histograms
- Looked at the relationship between empirical histograms and probability histograms.
- Introduced the functions rep(), replicate(), and geom_col()
- Simulated random experiments such as die rolls and coin tosses to visualize the distributions.