Continuous Distributions and Normal Approximations

Connections to boxes, continuous distributions, and a fundamental result

Ideas in code

Useful functions

Uniform\((a, b)\)

  1. dunif computes the density \(f(x)\) of \(X\) where \(f(x) = \displaystyle \frac{1}{b-a}\), for \(a < x < b\).
  • Arguments:
    • x: the value of \(x\) in \(f(x)\)
    • min: the parameter \(a\), the lower end point of the interval for \(X\). The default value is min = 0
    • max: the parameter \(b\), the upper end point of the interval for \(X\). The default value is max = 1
  1. punif computes the cdf \(F(x) = P(X \le x)\) of \(X\).
  • Arguments:
    • q: the value of \(x\) in \(F(x)\)
    • min: the parameter \(a\), the lower end point of the interval for \(X\). The default value is min = 0
    • max: the parameter \(b\), the upper end point of the interval for \(X\). The default value is max = 1
  1. runif generates random numbers from the \(Unif(a,b)\) distribution.
  • Arguments:
    • n: the size of the sample we want
    • min: the parameter \(a\), the lower end point of the interval for \(X\). The default value is min = 0
    • max: the parameter \(b\), the upper end point of the interval for \(X\). The default value is max = 1

Normal\((\mu, \sigma^2)\)

  1. dnorm computes the density \(f(x)\) of \(X \sim N(\mu,\sigma^2)\)
  • Arguments:
    • x: the value of \(x\) in \(f(x)\)
    • mean: the parameter \(\mu\), the mean of the distribution. The default value is mean = 0
    • sd: the parameter \(\sigma\), the sd of the distribution. The default value is sd = 1
  1. pnorm computes the cdf \(F(x) = P(X \le x)\) of \(X\).
  • Arguments:
    • q: the value of \(x\) in \(F(x)\)
    • mean: the parameter \(\mu\), the mean of the distribution. The default value is mean = 0
    • sd: the parameter \(\sigma\), the sd of the distribution. The default value is sd = 1
  1. rnorm generates random numbers from the Normal\((\mu, \sigma^2)\) distribution.
  • Arguments:
    • n: the size of the sample we want
    • mean: the parameter \(\mu\), the mean of the distribution. The default value is mean = 0
    • sd: the parameter \(\sigma\), the sd of the distribution. The default value is sd = 1

Example

Let’s verify the empirical rule for the standard normal random variable:

Note that (for example) \(P(-1 \le X \le 1) = F(1) - F(-1)\):

pnorm(q = 1) - pnorm(q = -1)
[1] 0.6826895
pnorm(q = 2) - pnorm(q = -2)
[1] 0.9544997
pnorm(q = 3) - pnorm(q = -3)
[1] 0.9973002

The argument prob in the function sample()

We have seen the function sample(), but so far, have only used it when we were sampling uniformly at random. That is, all the values are equally likely. We can sample according to a weighted probability, though, by putting in a vector of probabilities. Let’s look at the example of net gain while betting on red on a roulette spin. Recall that if we bet a dollar on red, then our net gain is \(+1\) with a probability of \(\displaystyle \frac{18}{38}\) and \(-1\) with a probability of \(\displaystyle \frac{20}{38}\).

gain <- c(1,-1) # define the gain for a single spin
prob_gain <- c(18/38,20/38) #define the corresponding probabilities
exp_gain <- sum(gain*prob_gain)
exp_gain
[1] -0.05263158
set.seed(123)
#simulate gain from 10 spins of the wheel
sample(x = gain, size = 10, prob = prob_gain, replace = TRUE)
 [1] -1  1 -1  1  1 -1  1  1  1 -1
#simulate net gain from 10 spins of the wheel which would sum these
sum(sample(x = gain, size = 10, prob = prob_gain, replace = TRUE))
[1] 0

Here is a simulation showing the Central Limit Theorem at work, with the empirical distribution becoming gradually more bell-shaped. Net gain is the sum of \(n\) draws with replacement from the vector gain defined above using the prob_gain vector.