Professor: Erik Hintz | Term: Fall 2023


Introduction to Probability

Definitions of Probability

Definition

Classical Definition of Probability

$P(A) = \dfrac{\text{number of outcomes in } A}{\text{total number of outcomes}} = \dfrac{|A|}{|S|}$, provided all outcomes are equally likely.

Definition

Relative Frequency

The probability of an event is proportional to the number of times the event occurs in a long run of repetitions. For example, the probability of rolling a 2 is $\frac{1}{6}$ because after 6 million rolls, a 2 will appear roughly 1 million times. (This is not very practical.)

Definition

Subjective Definition

A person's belief about how likely something is to happen (unclear, since it varies by person).

Mathematical Probability Models

Sample Spaces and Probability

Sample Spaces & Sets

A sample space $S$ is a set of distinct outcomes of an experiment with the property that in a single trial, exactly one outcome occurs.

  • Die: $S = \{1, 2, 3, 4, 5, 6\}$ or $S = \{\text{even}, \text{odd}\}$

  • Number of coin flips until heads occurs: $S = \{1, 2, 3, \dots\}$

  • Waiting time in minutes until a sunny day: $S = [0, \infty)$

A sample space is discrete if it is finite or countably infinite (in one-to-one correspondence with a subset of $\mathbb{N}$). Otherwise it is non-discrete.

$S = \{1, 2, 3, \dots\}$ is discrete (countably infinite).

An event $A$ is a subset of a sample space $S$.

$A$ is an event if $A \subseteq S$ ($A$ is a subset of $S$; $A$ is contained in $S$).

  • Die shows a 2: $A = \{2\}$

  • 3 or fewer tosses till heads: $A = \{1, 2, 3\}$

Notation

  1. Element of: $\omega \in A$ means $\omega$ is in $A$

  2. Union: $A \cup B$ ($A$ or $B$)

  3. Intersection: $A \cap B$ ($A$ and $B$)

  4. Complement: $\bar{A} = \{\omega \in S : \omega \notin A\}$

  5. Empty set: $\emptyset$

  6. Disjoint: $A \cap B = \emptyset$

e.g. a single die is rolled: $S = \{1, 2, 3, 4, 5, 6\}$.

Definition

Probability

Let $\mathcal{A}$ denote the set of all events on a given sample space $S$. A probability $P$ defined on $\mathcal{A}$ is a function satisfying:

  1. Scale: $0 \le P(A) \le 1$ for all $A \in \mathcal{A}$, with $P(S) = 1$

  2. Additivity (countable): for disjoint events $A_1, A_2, \dots$, $\quad P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

e.g. $S = \{1, 2, 3, \dots\}$ or $S = \{1, 2, \dots, n\}$.

Suppose $S$ is discrete and $A \subseteq S$ is an event.

If $A$ is indivisible (a single point), it is a simple event; otherwise it is compound.

e.g. $\{2\}$ is simple, $\{2, 4, 6\}$ is compound.

Assign probabilities $P(\omega)$ to the outcomes $\omega \in S$ so that $\sum_{\omega \in S} P(\omega) = 1$.

For $S$ discrete and $A$ an event, $P(A) = \sum_{\omega \in A} P(\omega)$.

e.g. for a die roll, $A$ = "Number is odd" $= \{1, 3, 5\}$, and since $P(\omega) = \frac{1}{6}$ for each $\omega \in S$, $P(A) = \frac{3}{6} = \frac{1}{2}$.

A sample space is equally likely if the probability of every outcome is the same.

Write $|A|$ for the number of elements in a set $A$. For an equally likely sample space, $P(A) = \frac{|A|}{|S|}$.

Probability and Counting Techniques

Fact

If $A$ and $B$ are disjoint ($A \cap B = \emptyset$), then $P(A \cup B) = P(A) + P(B)$.

Addition and Multiplication Rules

Addition Rule

Example: two dice are rolled. What is the probability the sum is larger than 8?

The event "Sum of dice larger than 8" is the union of the events Sum 9, Sum 10, Sum 11, and Sum 12.

These events are disjoint, containing 4, 3, 2, and 1 of the 36 equally likely outcomes respectively.

Thus, $P(\text{Sum} > 8) = \frac{4 + 3 + 2 + 1}{36} = \frac{10}{36} = \frac{5}{18}$.
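As a quick sanity check, here is a minimal Python sketch (not part of the original notes) that enumerates the 36 equally likely outcomes and confirms the count above:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of rolling two dice
# and count those whose sum is larger than 8.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) > 8]

print(len(favourable), len(outcomes))            # 10 36
print(Fraction(len(favourable), len(outcomes)))  # 5/18
```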

Multiplication Rules

Ordered tuple $(a_1, a_2, \dots, a_k)$: $n_1$ choices for $a_1$, $n_2$ choices for $a_2$, etc.

Definition

If there are $m$ ways to do task 1, and $n$ ways to do task 2, there are $m \times n$ ways to do tasks 1 and 2.

With replacement: When the object is selected, put it back after.

Without replacement: Once an object is picked, it stays out of the pool.

Counting Arrangements or Permutations

Definition

Factorials

Given $n$ distinct objects, the number of arrangements of all of them in a row is $n! = n(n-1)(n-2)\cdots(2)(1)$, read "$n$ factorial".

Note $0! = 1$ and $1! = 1$.

e.g. 10 people standing next to each other: $10!$ arrangements.

e.g. Out of 5 students, we must choose 1 president and 1 secretary: $5 \times 4 = 20$ ways.

Note, $n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$ as $n \to \infty$ (Stirling's Approximation).

Definition

Permutations

Given $n$ distinct objects, a permutation of size $k$ is an ordered subset of $k$ of the objects (without replacement). The number of such permutations is

$n^{(k)} = n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$

With replacement, there are $n^k$ ordered selections:

$n \times n \times \cdots \times n$ to the $k$ factors.

Our example of president and secretary above can thus be seen as $5^{(2)} = 5 \times 4 = 20$.

Counting Subsets or Combinations

Question

If a club has $n$ members, in how many ways can we select $k$ of them to serve on a committee?

Here, order does not matter. Unlike a permutation, this is a combination.

In general, if we have $k$ objects, there are $k!$ ways to order those objects. We can therefore get the combination count by dividing the permutation count $n^{(k)}$ by the number of ways of ordering the $k$ objects, namely $k!$.

Definition

Given $n$ distinct objects, a combination of size $k$ is an unordered subset of $k$ of the objects (without replacement). The number of combinations of size $k$ taken from $n$ objects is denoted $\binom{n}{k}$, which reads "$n$ choose $k$", and

$\binom{n}{k} = \frac{n^{(k)}}{k!} = \frac{n!}{k!\,(n-k)!}$

For combinatorial proofs of these identities see MATH239.

For our previous example we have $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$ unordered pairs.

Properties of $\binom{n}{k}$

  1. $\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \binom{n}{n-k}$ for $0 \le k \le n$

  2. If we define $0! = 1$, then $\binom{n}{0} = \binom{n}{n} = 1$

  3. Binomial Theorem: $(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$

Example

A group consists of $n_w$ women and $n_m$ men, and a committee of $k_w$ women and $k_m$ men is formed at random. Two of the men (call them 1 and 2) dislike each other. What is the probability they don't serve together?

First, get the size of the sample space $S$.

Pick $k_w$ women from $n_w$ women: $\binom{n_w}{k_w}$ ways.

Pick $k_m$ men from $n_m$ men: $\binom{n_m}{k_m}$ ways.

Thus $|S| = \binom{n_w}{k_w}\binom{n_m}{k_m}$.

Consider the event $A$: men 1 and 2 do not serve in the committee together.

It is easier to consider $\bar{A}$: men 1 and 2 are in the committee together.

The size of this event is given by:

Pick $k_w$ women from $n_w$ women: $\binom{n_w}{k_w}$ ways.

Men 1 and 2 are in the committee already, so we need to pick the remaining $k_m - 2$ men from the $n_m - 2$ men left: $\binom{n_m - 2}{k_m - 2}$ ways.

Thus, $P(A) = 1 - P(\bar{A}) = 1 - \frac{\binom{n_w}{k_w}\binom{n_m - 2}{k_m - 2}}{\binom{n_w}{k_w}\binom{n_m}{k_m}} = 1 - \frac{\binom{n_m - 2}{k_m - 2}}{\binom{n_m}{k_m}}$
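A minimal sketch of this computation in Python. The group and committee sizes below are assumptions chosen for illustration (the original numbers did not survive extraction):

```python
from math import comb

# Hypothetical sizes: 6 women and 9 men; committee of 3 women and 4 men.
n_w, n_m = 6, 9
k_w, k_m = 3, 4

total = comb(n_w, k_w) * comb(n_m, k_m)         # |S|
both = comb(n_w, k_w) * comb(n_m - 2, k_m - 2)  # men 1 and 2 both serve

print(1 - both / total)  # P(they don't serve together) = 5/6 here
```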

Arrangements when Symbols are Repeated

Definition

Consider $n$ objects of $k$ types. Suppose there are $n_1$ objects of type 1, $n_2$ objects of type 2, $\dots$, and $n_k$ objects of type $k$, with $n_1 + n_2 + \cdots + n_k = n$. Thus there are

$\frac{n!}{n_1!\, n_2! \cdots n_k!}$

distinguishable arrangements of the $n$ objects. This is the multinomial coefficient.

Letters of "SLEEVELESS" (S$\times$3, L$\times$2, E$\times$4, V$\times$1; 10 letters) are arranged at random. What is the probability the word begins and ends with "S"?

Sample space: $|S| = \frac{10!}{3!\,2!\,4!\,1!} = 12600$ distinguishable arrangements.

Event: fix an S in the first and last positions and arrange the remaining 8 letters (S$\times$1, L$\times$2, E$\times$4, V$\times$1): $\frac{8!}{1!\,2!\,4!\,1!} = 840$ arrangements.

Thus the probability is $\frac{840}{12600} = \frac{1}{15}$.
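A short Python check of this multinomial-coefficient calculation (added here as a sketch, not from the notes):

```python
from math import factorial
from fractions import Fraction

# Distinguishable arrangements of a multiset via the multinomial coefficient.
def arrangements(counts):
    n = sum(counts)
    out = factorial(n)
    for c in counts:
        out //= factorial(c)
    return out

total = arrangements([3, 2, 4, 1])  # SLEEVELESS: S=3, L=2, E=4, V=1
event = arrangements([1, 2, 4, 1])  # one S fixed at each end; S=1 remains

print(Fraction(event, total))  # 1/15
```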

Useful Series and Sums

Theorem

Finite Geometric Series: $1 + r + r^2 + \cdots + r^{n-1} = \sum_{i=0}^{n-1} r^i = \frac{1 - r^n}{1 - r}$ for $r \ne 1$.

Theorem

Infinite Geometric Series, if $|r| < 1$: $\sum_{i=0}^{\infty} r^i = \frac{1}{1 - r}$.

Theorem

Binomial Theorem, if $n$ is a positive integer and $x$ is any real number: $(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$.

Theorem

Binomial Theorem, if $n$ is not a positive integer but $|x| < 1$: $(1 + x)^n = \sum_{k=0}^{\infty} \binom{n}{k} x^k$, where $\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}$.

Theorem

Multinomial Theorem:

$(x_1 + x_2 + \cdots + x_k)^n = \sum \frac{n!}{n_1!\, n_2! \cdots n_k!}\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$

where the summation is over all non-negative integers $n_1, n_2, \dots, n_k$ such that $n_1 + n_2 + \cdots + n_k = n$, where $n$ is a positive integer.

Theorem

Hypergeometric Identity: $\sum_{x=0}^{\infty} \binom{a}{x}\binom{b}{n-x} = \binom{a+b}{n}$.

Theorem

Exponential Series: $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$

for all $x$ in the real numbers.

A related identity: $e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$.

Series involving integers: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, $\quad \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$.

Probability Rules and Conditional Probability

General Methods

Rules:

For any event $A$: $0 \le P(A) \le 1$.

If $A$ and $B$ are two events with $A \subseteq B$, then $P(A) \le P(B)$.

Fact

Fundamental Laws of Set Algebra

Commutativity: $A \cup B = B \cup A$, $\quad A \cap B = B \cap A$

Associativity: $(A \cup B) \cup C = A \cup (B \cup C)$, $\quad (A \cap B) \cap C = A \cap (B \cap C)$

Distributivity: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $\quad A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

De Morgan's Laws

$\overline{A \cup B} = \bar{A} \cap \bar{B}$: the complement of a union is the intersection of the complements.

$\overline{A \cap B} = \bar{A} \cup \bar{B}$: the complement of an intersection is the union of the complements.

Applied to $n$ events: $\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A_i}$ and $\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A_i}$.

Rules for Unions of Events

For arbitrary events $A$ and $B$: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Fact

If $A$ and $B$ are disjoint, then $P(A \cup B) = P(A) + P(B)$.

Fact

What if $A \subseteq B$?

Then $B = A \cup (\bar{A} \cap B)$ with the two pieces disjoint, so $P(B) = P(A) + P(\bar{A} \cap B) \ge P(A)$.

Fact

The probability of the complement event is $P(\bar{A}) = 1 - P(A)$.

If $A$ and $B$ are disjoint (mutually exclusive), then $P(A \cup B) = P(A) + P(B)$.

Otherwise, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

In general (inclusion-exclusion), $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_i P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots$

Note, $P(A \cap B)$ is often written as $P(AB)$.

Intersection of Events and Independence

Independent Events

Rolling die twice is an example of independent events. The outcome of the first doesn’t affect the second.

Definition

Events $A$ and $B$ are independent if and only if $P(A \cap B) = P(A)P(B)$.

e.g. $A$ = "roll 6 on first", $B$ = "roll 6 on second": $P(AB) = \frac{1}{36} = \frac{1}{6} \cdot \frac{1}{6} = P(A)P(B)$, so $A$ and $B$ are independent.

Not independent: $A$ = "first roll is 2", $B$ = "first roll is even". Here $P(AB) = P(A) = \frac{1}{6} \ne P(A)P(B) = \frac{1}{12}$.

A common misconception is that if $A$ and $B$ are mutually exclusive, then $A$ and $B$ are independent. In fact, if $P(A) > 0$ and $P(B) > 0$, mutually exclusive events are dependent, since $P(AB) = 0 \ne P(A)P(B)$.

Fact

If $A$ and $B$ are independent, then $A$ and $\bar{B}$ are independent. By the law of total probability, $P(A\bar{B}) = P(A) - P(AB) = P(A) - P(A)P(B) = P(A)P(\bar{B})$.

Fact

We can also see that $\bar{A}$ and $\bar{B}$ are independent: $P(\bar{A}\bar{B}) = P(\overline{A \cup B}) = 1 - P(A) - P(B) + P(A)P(B) = P(\bar{A})P(\bar{B})$.

Conditional Probability

Definition

$P(A \mid B)$ represents the probability that event $A$ occurs, when we know that $B$ occurs. This is the conditional probability of $A$ given $B$, defined (for $P(B) > 0$) by $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.

Fact

If $A$ and $B$ are independent, then $P(A \mid B) = P(A)$.

$A$ and $B$ are independent events if and only if either of the following statements is true: $P(A \mid B) = P(A)$; $\quad P(B \mid A) = P(B)$.

Example

$A$ = "The sum of two dice is 10", $B$ = "The first die is 6". Then $P(A) = \frac{3}{36} = \frac{1}{12}$, but $P(A \mid B) = P(\text{second die is } 4) = \frac{1}{6} \ne P(A)$, so $A$ and $B$ are dependent.

If $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$, then $A$ and $B$ are independent.

Properties

  • If $B \subseteq A$: $P(A \mid B) = 1$

  • If $A$ and $B$ are disjoint: $P(A \mid B) = 0$

Product Rules, Law of Total Probability and Bayes’ Theorem

Fact

Product Rules

For events $A$ and $B$,

$P(AB) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$

That means if we know $P(A)$ and $P(B \mid A)$, or $P(B)$ and $P(A \mid B)$, we can find $P(AB)$.

More events: $P(ABC) = P(A)\,P(B \mid A)\,P(C \mid AB)$, and so on.

Definition

Law of Total Probability

Let $A_1, A_2, \dots, A_k$ be a partition of the sample space $S$ into disjoint (mutually exclusive) events, that is

$A_1 \cup A_2 \cup \cdots \cup A_k = S$ and $A_i \cap A_j = \emptyset$ for $i \ne j$.

Let $B$ be an arbitrary event in $S$. Then

$P(B) = \sum_{i=1}^{k} P(B \mid A_i)\,P(A_i)$

A common way in which this is used is that

$P(B) = P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})$

since $A$ and $\bar{A}$ partition $S$.

Theorem

Bayes' Theorem: Suppose $A$ and $B$ are events defined on a sample space $S$. Suppose also that $P(B) > 0$. Then

$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

Proof: $P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(B \mid A)P(A)}{P(B)}$, by the product rule.
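A minimal numeric sketch of the law of total probability and Bayes' theorem. The numbers are made up for illustration (a test with assumed sensitivity, false-positive rate, and prevalence):

```python
# Assumed values: P(B|A) = 0.99, P(B|A-bar) = 0.05, P(A) = 0.01,
# where A = "has condition" and B = "test is positive".
p_a = 0.01
p_b_given_a = 0.99
p_b_given_not_a = 0.05

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A-bar)P(A-bar)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
print(round(p_b_given_a * p_a / p_b, 4))  # 0.1667
```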

Discrete Random Variables

Random Variables and Probability Functions

Definition

A random variable is a function that assigns a real number to each point in a sample space . Often a random variable is abbreviated with RV or rv.

Definition

The values that a random variable can take on are called the range of the random variable. We often denote the range of a random variable $X$ by $A$.

Example

If we roll a 6-sided die twice, our sample space is $S = \{(i, j) : i, j \in \{1, \dots, 6\}\}$, and if we define $X =$ sum of the die rolls, the range is $A = \{2, 3, \dots, 12\}$.

Definitions

We say that a random variable is discrete if it takes values in a countable set (finite or countably infinite).

We say that a random variable is continuous if it takes values in some interval of real numbers, e.g. $[0, 1]$.

Important: don't forget that a discrete random variable can have a countably infinite range.

Definition

Let $X$ be a discrete random variable with range $A$. The probability function (p.f.) of $X$ is the function

$f(x) = P(X = x)$ for $x \in A$.

The set of pairs $\{(x, f(x)) : x \in A\}$ is called the probability distribution of $X$.

A probability function has two important properties:

  1. $f(x) \ge 0$ for all $x \in A$

  2. $\sum_{x \in A} f(x) = 1$

Example

A random variable has a range $A$ with p.f. $f(x)$ that depends on an unknown constant $k$. What value of $k$ makes $f$ a probability function?

From our rules, we need $f(x) \ge 0$ and $\sum_{x \in A} f(x) = 1$; solve the latter equation for $k$.

Definition

The Cumulative Distribution Function (CDF) of a random variable $X$ is

$F(x) = P(X \le x)$ for $x \in \mathbb{R}$.

We use the shorthand $X \sim F$ if $X$ has CDF $F$. Here,

$P(X \le x) = P(B)$,

where $B$ is the event that contains all outcomes $\omega$ with $X(\omega) \le x$.

In general, for any $a < b$:

$P(a < X \le b) = F(b) - F(a)$

The CDF satisfies:

  1. $F(x) \le F(y)$ for $x \le y$ (and we say $F$ is a non-decreasing function of $x$)

  2. $\lim_{x \to -\infty} F(x) = 0$, and $\lim_{x \to \infty} F(x) = 1$

  3. $\lim_{t \downarrow x} F(t) = F(x)$ (right continuous)

If $X$ takes values on the integers, we can get the probability function from the CDF:

$f(x) = F(x) - F(x - 1)$

In general, we have $f(x) = F(x) - \lim_{t \uparrow x} F(t)$.

Note, we often use $F(x^-)$ for $\lim_{t \uparrow x} F(t)$.

Definition

Two random variables $X$ and $Y$ are said to have the same distribution if $F_X(x) = F_Y(x)$ for all $x$. We denote this by $X \stackrel{d}{=} Y$.

Note, $X$ and $Y$ having the same distribution does not mean $X = Y$.

Discrete Uniform Distribution

Setup

Suppose the range of $X$ is $\{a, a+1, \dots, b\}$ where $a$ and $b$ are integers, and suppose all values are equally probable. Then $X$ has a Discrete Uniform Distribution on the set $\{a, a+1, \dots, b\}$. The values $a$ and $b$ are called the parameters of the distribution.

Illustrations

If $X$ is the number obtained when a die is rolled, then $X$ has a discrete Uniform distribution with $a = 1$ and $b = 6$.

Definition

Probability Function

There are $b - a + 1$ values in the set, so the probability of each value must be $\frac{1}{b - a + 1}$ in order for $\sum_{x=a}^{b} f(x) = 1$. Therefore

$f(x) = \frac{1}{b - a + 1}$ for $x \in \{a, a+1, \dots, b\}$.

Example

Suppose a fair die is thrown once and let $X$ be the number on the face. Find the cumulative distribution function of $X$.

Solution

This is an example of a Discrete Uniform distribution on the set $\{1, 2, \dots, 6\}$ having $a = 1$, $b = 6$, and probability function

$f(x) = P(X = x) = \frac{1}{6}$ for $x \in \{1, \dots, 6\}$.

The cumulative distribution function is

$F(x) = \begin{cases} 0 & x < 1 \\ \lfloor x \rfloor / 6 & 1 \le x < 6 \\ 1 & x \ge 6 \end{cases}$

where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$.

Example

Let $X$ be the largest number when a die is rolled 3 times. First find the cumulative distribution function, and then find the probability function of $X$.

Solution

This is another example of a distribution constructed from the Discrete Uniform.

The probability that the largest number is less than or equal to $x$ is

$F(x) = P(X \le x) = P(\text{all three rolls} \le x) = \left(\frac{\lfloor x \rfloor}{6}\right)^3$

for $1 \le x < 6$. Here is the CDF for all real values of $x$:

$F(x) = \begin{cases} 0 & x < 1 \\ \left(\lfloor x \rfloor / 6\right)^3 & 1 \le x < 6 \\ 1 & x \ge 6 \end{cases}$

To find the p.f. we may use the fact that $X$ takes integer values, so that

$f(x) = F(x) - F(x - 1) = \frac{x^3 - (x-1)^3}{216}$ for $x \in \{1, 2, \dots, 6\}$.
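A short enumeration sketch (added, not from the notes) that tabulates the p.f. of the maximum of three rolls and checks it against the formula above:

```python
from itertools import product
from fractions import Fraction

# Tabulate the p.f. of X = max of three die rolls over all 6^3 outcomes
# and compare with f(x) = (x^3 - (x-1)^3) / 216.
counts = {x: 0 for x in range(1, 7)}
for rolls in product(range(1, 7), repeat=3):
    counts[max(rolls)] += 1

for x in range(1, 7):
    empirical = Fraction(counts[x], 6 ** 3)
    formula = Fraction(x ** 3 - (x - 1) ** 3, 216)
    assert empirical == formula
    print(x, empirical)
```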

Hypergeometric Distribution

Setup

Consider a population of $N$ objects, of which $r$ are considered "successes" (S) and the remaining $N - r$ are considered "failures" (F).

Suppose that a subset of size $n$ is chosen at random from the population without replacement.

We say that the random variable $X$ has a hypergeometric distribution if $X$ denotes the number of successes in the subset (shorthand: $X \sim \text{Hyp}(N, r, n)$).

  • $N$: Number of objects

  • $r$: Number of successes

  • $n$: Number of draws

Illustrations

Drawing $n$ balls without replacement from a bag with $r$ blue balls and $N - r$ red balls. Let $X$ denote the number of blue balls drawn. Then $X \sim \text{Hyp}(N, r, n)$.

Drawing $n$ cards from a deck of 52 cards. Let $X$ denote the number of aces. Then $X \sim \text{Hyp}(52, 4, n)$.

Definition

Probability Function

Suppose $X \sim \text{Hyp}(N, r, n)$. Then

$f(x) = P(X = x) = \frac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}}$

for $\max(0,\, n - (N - r)) \le x \le \min(n, r)$:

  • $x \le n$: the number of successes drawn cannot exceed the number drawn

  • $x \le r$: we have at most $r$ successes

  • $x \ge 0$ obviously

  • $x \ge n - (N - r)$: if $n$ exceeds the number of failures $N - r$, we have at least $n - (N - r)$ successes.

We can verify the probability function of the hypergeometric distribution sums to $1$ using the hypergeometric identity.

Example

Consider drawing a 5-card hand at random from a standard 52-card deck. What is the probability that the hand contains at least 2 K's?

Let $X$ = the number of K's in the hand. Success type: K's. Failure type: not-K cards. Then $X \sim \text{Hyp}(52, 4, 5)$ and

$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - \frac{\binom{4}{0}\binom{48}{5} + \binom{4}{1}\binom{48}{4}}{\binom{52}{5}}$
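A sketch of this computation in Python; the hand size (5) and the threshold (at least 2) are the assumed values used in the reconstruction above:

```python
from math import comb

# X ~ Hyp(N=52, r=4, n=5): number of kings in a 5-card hand.
N, r, n = 52, 4, 5

def f(x):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

print(1 - f(0) - f(1))  # P(X >= 2), roughly 0.0417
```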

Binomial Distribution

Definition

A Bernoulli trial with probability of success $p$ is an experiment that results in either a success or a failure, where the probability of success is $p$.

Setup

Consider an experiment in which $n$ Bernoulli trials are independently performed, each with probability of success $p$. $X$ denotes the number of successes from the $n$ trials (shorthand: $X \sim \text{Bin}(n, p)$).

Illustrations

Flip a coin independently $n$ times, and let $X$ denote the number of heads observed. Then $X \sim \text{Bin}(n, \frac{1}{2})$.

Drawing $n$ balls with replacement from a bag with $r$ blue balls and $N - r$ red balls. Let $X$ denote the number of blue balls drawn. Then $X \sim \text{Bin}(n, \frac{r}{N})$.

Assumptions:

  1. Trials are independent

  2. The probability of success, $p$, is the same in each Bernoulli trial

Definition

Probability Function

$f(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x \in \{0, 1, \dots, n\}$.

Proof that $\sum_{x=0}^{n} f(x) = 1$: by the binomial theorem,

$\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = (p + (1 - p))^n = 1^n = 1$.

If $N$ is very large and we keep the proportion of successes $p = \frac{r}{N}$ fixed, where $X \sim \text{Hyp}(N, r, n)$, we choose a relatively small $n$ without replacement from $N$.

Let $N \to \infty$ and $r \to \infty$ with $\frac{r}{N} = p$ fixed. Then

$\frac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}} \to \binom{n}{x} p^x (1-p)^{n-x}$

The approximation is good if $r$ and $N - r$ are large compared to $n$.
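A quick numeric comparison of the two probability functions, with illustrative sizes chosen so the approximation is visibly good (the specific numbers are assumptions):

```python
from math import comb

# Illustrative: N = 10000 objects, r = 3000 successes (p = 0.3), n = 10 draws.
N, r, n = 10_000, 3_000, 10
p = r / N

for x in range(n + 1):
    hyp = comb(r, x) * comb(N - r, n - x) / comb(N, n)
    binom = comb(n, x) * p ** x * (1 - p) ** (n - x)
    print(x, round(hyp, 5), round(binom, 5))
```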

Negative Binomial Distribution

Setup

Consider an experiment in which Bernoulli trials are independently performed, each with probability of success $p$, until exactly $k$ successes are observed. Then if $X$ denotes the number of failures before observing the $k$-th success, we say that $X$ is Negative Binomial with parameters $k$ and $p$ (shorthand: $X \sim \text{NB}(k, p)$).

Let $T$ be the number of trials until exactly $k$ successes are observed. We have $T = X + k$.

Illustrations

Flip a coin until $k$ heads are observed, and let $X$ denote the number of tails observed. Then $X \sim \text{NB}(k, \frac{1}{2})$.

Definition

Probability Function

$f(x) = P(X = x) = \binom{x + k - 1}{x} p^k (1-p)^x$ for $x \in \{0, 1, 2, \dots\}$.

Proof it is valid: by the binomial theorem with a negative exponent,

$\sum_{x=0}^{\infty} \binom{x + k - 1}{x} (1-p)^x = (1 - (1-p))^{-k} = p^{-k}$,

so $\sum_{x=0}^{\infty} f(x) = p^k \cdot p^{-k} = 1$.

Geometric Distribution

Setup

The Geometric distribution is a special case of the Negative Binomial where we stop after the first success ($k = 1$). Here $X$ is the number of failures before the first success (shorthand: $X \sim \text{Geo}(p)$).

Definition

Probability Function

$f(x) = P(X = x) = p(1-p)^x$ for $x \in \{0, 1, 2, \dots\}$.

Geometric Distribution: CDF. For an integer $x \ge 0$, the CDF of $X$ is

$F(x) = P(X \le x) = \sum_{t=0}^{x} p(1-p)^t = 1 - (1-p)^{x+1}$

So $F(x) = 1 - (1-p)^{\lfloor x \rfloor + 1}$ for $x \ge 0$, and $0$ otherwise.

Example

The number of times I have to roll two dice before I get snake eyes is $\text{Geo}(\frac{1}{36})$.

Poisson Distribution (from Binomial)

As $n \to \infty$ and $p \to 0$ with $\mu = np$ held fixed, the binomial probability function approaches

$f(x) = \frac{\mu^x e^{-\mu}}{x!}$ for $x \in \{0, 1, 2, \dots\}$.

Setup

Let $\mu = np$. Then if $n$ is large, and $p$ is close to zero,

$\binom{n}{x} p^x (1-p)^{n-x} \approx \frac{\mu^x e^{-\mu}}{x!}$

Definition

Probability Function

A random variable $X$ has a Poisson distribution with parameter $\mu$ ($X \sim \text{Poi}(\mu)$, $\mu > 0$) if

$f(x) = P(X = x) = \frac{\mu^x e^{-\mu}}{x!}$ for $x \in \{0, 1, 2, \dots\}$.

Poisson Distribution from Poisson Process

A process satisfying the following conditions is called a Poisson process.

  1. Independence: The numbers of occurrences in non-overlapping intervals are independent. For non-overlapping intervals $(t_1, t_2]$ and $(t_3, t_4]$, the counts of events are independent.

  2. Individuality or Singularity: Events occur singly, not in clusters. $P(2 \text{ or more events in } (t, t + \Delta t]) = o(\Delta t)$ as $\Delta t \to 0$.

  3. Homogeneity or Uniformity: Events occur at a uniform rate $\lambda$ (in events per unit of time). $P(\text{one event in } (t, t + \Delta t]) = \lambda \Delta t + o(\Delta t)$ as $\Delta t \to 0$.

Little o

A function $g$ is $o(\Delta t)$ as $\Delta t \to 0$ if $\lim_{\Delta t \to 0} \frac{g(\Delta t)}{\Delta t} = 0$.

Poisson Distribution

If $X_t$ is a Poisson counting process with a rate of $\lambda$ per unit time, then the number of events in an interval of length $t$ satisfies $X_t \sim \text{Poi}(\mu)$ with $\mu = \lambda t$,

and

$P(X_t = x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}$ for $x \in \{0, 1, 2, \dots\}$.

Examples

Suppose that trucks arrive at a receiving dock with an average arrival rate of $\lambda$ per hour. What is the probability exactly $x$ trucks will arrive in a two-hour period?

We have $\mu = 2\lambda$, so $P(X = x) = \frac{(2\lambda)^x e^{-2\lambda}}{x!}$.

Combining Models

Website hits for a given website occur according to a Poisson process with a rate of $\lambda$ hits per minute. A second is a break if there are no hits in that second.

What is the probability of a break in any given second?

The number of hits in one second is $\text{Poi}(\lambda/60)$, so $p = P(\text{break}) = e^{-\lambda/60}$.

Compute the probability of observing exactly $k$ breaks in $n$ consecutive seconds.

$P(k \text{ breaks}) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k \in \{0, 1, \dots, n\}$ (Binomial, since seconds are non-overlapping and hence independent).

Compute the probability that one must wait for $x$ seconds to get $k$ breaks.

Let $T$ be the number of seconds we wait for $k$ breaks. We want $P(T = x)$.

Negative Binomial Distribution: $P(T = x) = \binom{x - 1}{k - 1} p^k (1-p)^{x-k}$ (the last second is a break, and $k - 1$ breaks occur among the first $x - 1$ seconds).
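A minimal sketch of this combined model; the hit rate and the values of $n$, $k$, $x$ below are assumptions for illustration:

```python
from math import exp, comb

# Assumed rate: lam = 12 hits per minute, so hits per second ~ Poi(12/60).
lam = 12
mu = lam / 60
p = exp(-mu)  # P(break) = P(0 hits in one second)

# Exactly k breaks in n consecutive seconds: Bin(n, p)
n, k = 10, 3
print(comb(n, k) * p ** k * (1 - p) ** (n - k))

# Wait exactly x seconds for k breaks: the Negative Binomial form above
x = 7
print(comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k))
```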

Expected Value and Variance

Summarizing Data on Random Variables

Median: A value such that half the results are below it and half above it, when the results are arranged in numerical order.

Mode: Value which occurs most often.

Mean: The mean of $n$ outcomes $x_1, \dots, x_n$ for a random variable is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Expectation of a Random Variable

Definition

Let $X$ be a discrete random variable with range $A$ and probability function $f(x)$. The expected value of $X$ is given by

$E(X) = \sum_{x \in A} x f(x)$

Suppose $X$ denotes the outcome of one fair six-sided die roll. Compute $E(X)$.

We know that the p.f. is $f(x) = \frac{1}{6}$ for $x \in \{1, \dots, 6\}$.

And so, $E(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5$.

Possible Range

Suppose $X$ is a random variable satisfying $a \le x \le b$ for all possible values $x$ of $X$. We have $a \le E(X) \le b$.

Expectation of

Let $X$ be a discrete random variable with range $A$ and probability function $f(x)$. The expected value of some function $g$ of $X$ is given by

$E[g(X)] = \sum_{x \in A} g(x) f(x)$

Proof: group the outcomes by the value of $g(x)$ and sum the corresponding probabilities.

Linearity Properties of Expectation

For constants $a$ and $b$,

$E(aX + b) = aE(X) + b$

Proof: $E(aX + b) = \sum_{x} (ax + b) f(x) = a\sum_{x} x f(x) + b\sum_{x} f(x) = aE(X) + b$.

Thus it is also easy to show that

$E[g_1(X) + g_2(X)] = E[g_1(X)] + E[g_2(X)]$

The expectation of a sum is the sum of the expectations.

Means and Variances of Distributions

Expected value of a Binomial random variable

Let $X \sim \text{Bin}(n, p)$. Find $E(X)$.

$E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = np$

Example

If we toss a coin $n$ times, with probability $p$ of a head, we'd expect $np$ heads on average.

Hypergeometric Distribution

If $X \sim \text{Hyp}(N, r, n)$, then $E(X) = \frac{nr}{N}$.

Geometric Distribution

If $X \sim \text{Geo}(p)$, then $E(X) = \frac{1 - p}{p}$.

Negative Binomial

If $X \sim \text{NB}(k, p)$, then $E(X) = \frac{k(1 - p)}{p}$.

Poisson Distribution

If $X \sim \text{Poi}(\mu)$, then $E(X) = \mu$.

Variances of Distribution

Using the expected value is one way of predicting the value of a random variable. But we might want to know “how likely will observed data be exactly (or close to) the expected value?”

One may wonder how much a random variable tends to deviate from its mean. Suppose $E(X) = \mu$.

Expected deviation: $E(X - \mu) = 0$ always, so it is uninformative.

Expected absolute deviation: $E|X - \mu|$

Expected squared deviation: $E[(X - \mu)^2]$

Definition

Variance

The variance of a random variable $X$ is denoted $\text{Var}(X)$ (or $\sigma^2$), and is defined by

$\text{Var}(X) = E[(X - \mu)^2]$

or sometimes, via the shortcut,

$\text{Var}(X) = E(X^2) - \mu^2$

We derive this by expanding: $E[(X - \mu)^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$.

e.g. suppose $X$ satisfies $E(X) = 2$ and $E(X^2) = 6$. What is $\text{Var}(X)$? $\text{Var}(X) = 6 - 2^2 = 2$.

For all random variables $X$, $\text{Var}(X) \ge 0$. $\text{Var}(X) = 0$ if and only if $X$ is constant.

Definition

Standard Deviation

The standard deviation of a random variable $X$ is denoted $\sigma$ (or $\text{sd}(X)$), and is defined by $\sigma = \sqrt{\text{Var}(X)}$.

Variance of a Linear Transformation

If $a$ and $b$ are constants, and $Y = aX + b$, then

$\text{Var}(Y) = a^2\,\text{Var}(X)$

The additive constant $b$ does not affect the variance.

Variances of distributions

  • Binomial: $\text{Var}(X) = np(1-p)$

  • Hypergeometric: $\text{Var}(X) = n\frac{r}{N}\left(1 - \frac{r}{N}\right)\frac{N - n}{N - 1}$

  • Geometric: $\text{Var}(X) = \frac{1-p}{p^2}$

  • Negative Binomial: $\text{Var}(X) = \frac{k(1-p)}{p^2}$

  • Poisson: $\text{Var}(X) = \mu$
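A small sketch (added, not from the notes) verifying the variance shortcut against the direct definition for a fair die:

```python
from fractions import Fraction

# Verify Var(X) = E(X^2) - E(X)^2 for a fair six-sided die.
pf = {x: Fraction(1, 6) for x in range(1, 7)}

ex = sum(x * p for x, p in pf.items())          # 7/2
ex2 = sum(x * x * p for x, p in pf.items())     # 91/6
var_shortcut = ex2 - ex ** 2                    # 35/12
var_direct = sum((x - ex) ** 2 * p for x, p in pf.items())

assert var_shortcut == var_direct
print(ex, var_shortcut)  # 7/2 35/12
```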

Continuous Random Variables

General Terminology and Notation

Definition

A random variable $X$ is said to be continuous if its range is an interval $[a, b]$ (or a union of intervals). $X$ is continuous if it can take any value between $a$ and $b$.

We can't use probability functions with continuous distributions because $P(X = x) = 0$ for every single value $x$.

Suppose $X$ is a continuous random variable (crv) with range $[a, b]$. We now use integrals instead of sums.

Probability Density Function

We say that a continuous random variable $X$ has probability density function (pdf) $f(x)$ if

$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$, with $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

For a four-number spinner, if $X$ is where the spinner stops, we can define our pdf as:

$f(x) = \frac{1}{4}$ for $0 < x \le 4$, and $f(x) = 0$ otherwise.

We can see that this satisfies $f(x) \ge 0$ and $\int_{0}^{4} \frac{1}{4}\,dx = 1$.

Definition

The support of a pdf $f$ is defined as the set $\{x : f(x) > 0\}$.

Example

Suppose that $X$ is a continuous random variable with probability density function $f(x)$ that depends on an unknown constant $k$.

Compute $k$ so that this is a valid pdf: solve $\int_{-\infty}^{\infty} f(x)\,dx = 1$ for $k$.

Compute probabilities such as $P(a \le X \le b)$ by integrating the pdf: $\int_{a}^{b} f(x)\,dx$.

Note: for a crv, $P(X = x) = 0$, so including or excluding endpoints does not change the probability.

Definition

The Cumulative Distribution Function of a random variable $X$ is $F(x) = P(X \le x)$.

If $X$ is continuous with pdf $f$, then

$F(x) = \int_{-\infty}^{x} f(t)\,dt$

By the fundamental theorem of calculus, $f(x) = F'(x)$ wherever the derivative exists.

Properties:

  1. $F(x)$ is defined for all real $x$

  2. $F(x)$ is a non-decreasing function of $x$ for all real $x$

  3. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$

  4. Since $P(X = x) = 0$, we have $P(a \le X \le b) = P(a < X \le b) = F(b) - F(a)$

Spinner example: $F(x) = P(X \le x) = \int_{0}^{x} \frac{1}{4}\,dt = \frac{x}{4}$ for $0 < x \le 4$.

Thus,

$F(x) = \begin{cases} 0 & x \le 0 \\ x/4 & 0 < x \le 4 \\ 1 & x > 4 \end{cases}$

We can then get the pdf by differentiating: $f(x) = F'(x) = \frac{1}{4}$ on $(0, 4)$.

General approach to find from

  • Treat each piece of $f$ separately

  • Note $F(x) = 0$ for $x$ below the minimum value in the support of $f$

  • Note $F(x) = 1$ for $x$ above the maximum value in the support of $f$

  • Find $F(x) = \int_{-\infty}^{x} f(t)\,dt$ on each piece

General approach to find from

  • Treat each piece of $F$ separately

  • Find $f(x) = F'(x)$ on each piece

Note

Quantile

Suppose $X$ is a continuous random variable with CDF $F$. The $p$-th quantile of $X$ is the value $q_p$ such that

$F(q_p) = p$

If $p = 0.5$, then $q_{0.5}$ is the median of $X$. We can find a given quantile by solving $F(q) = p$ for $q$.

Change of Variables

What if we want to find the CDF or pdf of a function of $X$? For the spinner example, the winnings are the inverse of the point we spin. Hence, the winnings are $Y = \frac{1}{X}$, and we want to find the distribution of $Y$.

Specific example: $P(Y \le y) = P\left(\frac{1}{X} \le y\right) = P\left(X \ge \frac{1}{y}\right) = 1 - F_X\left(\frac{1}{y}\right)$.

How can we generalize this approach to an arbitrary function $Y = h(X)$?

The process

  1. Find the range of values of $Y$

  2. Write the CDF of $Y$ as a function of the CDF of $X$

  3. Use $F_X(x)$ to find $F_Y(y)$

  4. Differentiate if we want the pdf of $Y$: $f_Y(y) = F_Y'(y)$

Spinner example:

Since $0 < X \le 4$, we know $Y = \frac{1}{X} \ge \frac{1}{4}$.

Write the CDF of $Y$ as a function of $F_X$. Let $y \ge \frac{1}{4}$.

Then use $F_X$ to find $F_Y$:

$F_Y(y) = P(Y \le y) = P\left(\frac{1}{X} \le y\right) = P\left(X \ge \frac{1}{y}\right) = 1 - F_X\left(\frac{1}{y}\right)$, and we know $F_X(x) = \frac{x}{4}$.

Thus, $F_Y(y) = 1 - \frac{1}{4y}$ for $y \ge \frac{1}{4}$ (and $0$ otherwise).

Differentiate if we want the pdf of $Y$: $f_Y(y) = F_Y'(y)$. We have $F_Y(y) = 1 - \frac{1}{4y}$ for $y \ge \frac{1}{4}$, so the pdf is

$f_Y(y) = \frac{1}{4y^2}$ for $y \ge \frac{1}{4}$.
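A Monte Carlo sketch of this change of variables, under the spinner model assumed above ($X$ uniform on $(0, 4]$, $Y = 1/X$):

```python
import random

# Check F_Y(y) = 1 - 1/(4y) for y >= 1/4 against simulation.
random.seed(1)
n = 200_000
ys = [1 / (4 * (1 - random.random())) for _ in range(n)]  # 4*(1-U) lies in (0, 4]

for y in [0.25, 0.5, 1.0, 2.0, 5.0]:
    empirical = sum(v <= y for v in ys) / n
    print(y, round(empirical, 4), round(1 - 1 / (4 * y), 4))
```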

Expectation, Mean, and Variance for Continuous Random Variables

If $X$ is a continuous random variable with pdf $f$, and $g$ is a function, then

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ and $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

Thus,

$\text{Var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$

Note that the shortcut $\text{Var}(X) = E(X^2) - [E(X)]^2$ still holds.

Example

For the spinner, $E(X) = \int_{0}^{4} \frac{x}{4}\,dx = 2$.

Now solve for $E(X^2) = \int_{0}^{4} \frac{x^2}{4}\,dx = \frac{16}{3}$.

Thus, $\text{Var}(X) = \frac{16}{3} - 2^2 = \frac{4}{3}$.

Continuous Uniform Distribution

Definition

We say that $X$ has a continuous uniform distribution on $[a, b]$ (shorthand: $X \sim U(a, b)$) if $X$ takes values in $[a, b]$ (or $(a, b)$) where all subintervals of a fixed length have the same probability.

$X$ has pdf

$f(x) = \frac{1}{b - a}$ for $a \le x \le b$ (and $0$ otherwise),

then the CDF is

$F(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$

Continuous Uniform: Mean and Variance

$E(X) = \frac{a + b}{2}$, $\quad \text{Var}(X) = \frac{(b - a)^2}{12}$

Exponential Distribution

Consider cars arriving following a Poisson process with rate $\lambda$ per minute. The number of cars arriving in $t$ minutes follows $\text{Poi}(\lambda t)$.

Let $X$ = the time you wait before you see the first car, in minutes. This is a crv.

CDF

$F(x) = P(X \le x) = 1 - P(X > x) = 1 - P(N_x = 0)$

where $N_x \sim \text{Poi}(\lambda x)$ is the number of events in $[0, x]$.

Thus,

$F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$ (and $0$ for $x < 0$).

Taking the derivative gives the pdf:

$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$.

Exponential Distribution

Let $\theta = \frac{1}{\lambda}$ provide an alternate parameterization of the exponential distribution.

Definition

We say that $X$ has an exponential distribution with parameter $\theta > 0$ if the density of $X$ is

$f(x) = \frac{1}{\theta} e^{-x/\theta}$ for $x \ge 0$ (and $0$ otherwise).

The CDF becomes $F(x) = 1 - e^{-x/\theta}$ for $x \ge 0$.

In general, for any Poisson process with rate $\lambda$, the time between events will follow an exponential distribution with $\theta = \frac{1}{\lambda}$.

Now we want to compute $E(X)$ and $\text{Var}(X)$. We will need to solve

$E(X) = \int_{0}^{\infty} \frac{x}{\theta} e^{-x/\theta}\,dx$

and

$E(X^2) = \int_{0}^{\infty} \frac{x^2}{\theta} e^{-x/\theta}\,dx$

We can use the Gamma function.

Definition

The Gamma function, $\Gamma(\alpha)$, is defined for all $\alpha > 0$ as

$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x}\,dx$

Note

  • If $\alpha > 1$, then $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$

The Gamma function tells us that if $n$ is a positive integer, then

$\Gamma(n) = (n - 1)!$

If we write $y = \frac{x}{\theta}$, then $dx = \theta\,dy$ and

$E(X) = \int_{0}^{\infty} \frac{x}{\theta} e^{-x/\theta}\,dx = \theta \int_{0}^{\infty} y\,e^{-y}\,dy = \theta\,\Gamma(2)$

Because $\Gamma(2) = 1! = 1$, we get $E(X) = \theta$.

We can use the same trick with the variance to see that

$E(X^2) = \theta^2\,\Gamma(3) = 2\theta^2$, so $\text{Var}(X) = 2\theta^2 - \theta^2 = \theta^2$.

Memoryless Property

For $X \sim \text{Exp}(\theta)$ and $s, t \ge 0$: $P(X > s + t \mid X > s) = \frac{e^{-(s+t)/\theta}}{e^{-s/\theta}} = e^{-t/\theta} = P(X > t)$. The exponential distribution "forgets" how long you have already waited.

Computer Generated Random Numbers

Transform simulated observations from $U \sim U(0, 1)$ to obtain observations from $X$ with CDF $F$. Find a function $h$ such that $X = h(U)$ has CDF $F$.

Assuming $F$ is continuous and strictly increasing, our result implies $X = F^{-1}(U)$ has CDF $F$.

Example

Consider the CDF $F$ of a discrete distribution.

Find a transformation $h$ so that $X = h(U)$ has CDF $F$ if $U \sim U(0, 1)$.

A discrete CDF is not continuous nor strictly increasing, so we have to use the generalized inverse $F^{-1}(u) = \min\{x : F(x) \ge u\}$. For a given $u$:

The smallest $x$ that satisfies $F(x) \ge u$ is the value returned.

Thus $X = F^{-1}(U)$ has CDF $F$.
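A minimal inverse-transform sampling sketch (added, not from the notes), covering both cases: a continuous CDF inverted in closed form, and a discrete CDF via the generalized inverse. Parameter values are illustrative:

```python
import random
from math import log, ceil

random.seed(0)
theta, p = 2.0, 0.3

# Continuous: F(x) = 1 - exp(-x/theta) inverts to x = -theta * ln(1 - u).
def exp_sample():
    return -theta * log(1 - random.random())

# Discrete: smallest integer x with F(x) = 1 - (1-p)^(x+1) >= u (Geometric).
def geo_sample():
    u = random.random()
    return max(0, ceil(log(1 - u) / log(1 - p)) - 1)

n = 100_000
print(sum(exp_sample() for _ in range(n)) / n)  # close to theta = 2
print(sum(geo_sample() for _ in range(n)) / n)  # close to (1-p)/p = 2.33
```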

Normal Distribution

Characteristics: Symmetric about a mean value, more concentrated around the mean than the tails (and unimodal).

Definition

$X$ is said to have a normal distribution (or Gaussian distribution) with mean $\mu$ and variance $\sigma^2$ if the density of $X$ is

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$ for $x \in \mathbb{R}$,

written $X \sim N(\mu, \sigma^2)$, or $X \sim G(\mu, \sigma)$.

Properties:

  1. Symmetric about its mean: If $X \sim N(\mu, \sigma^2)$, then $f(\mu + a) = f(\mu - a)$ for all $a$

  2. Density of $X$ is unimodal: Peak is at $x = \mu$. The mode, median, and mean are all the same, $\mu$.

  3. Mean and Variance are the parameters: $E(X) = \mu$, and $\text{Var}(X) = \sigma^2$

A classic problem: if $X \sim N(\mu, \sigma^2)$, then what is the value of

$P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right)dt$?

This integral has no closed-form antiderivative, so we evaluate it numerically (via tables).

Definition

We say that $Z$ is a standard normal random variable if $Z \sim N(0, 1)$. The probability density function is:

$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$

The CDF is

$\Phi(z) = P(Z \le z)$

Standard Normal Tables (Z-Tables). Values of the function $\Phi(z)$ are tabulated in Standard Normal Tables.

We use symmetry: $\Phi(-z) = 1 - \Phi(z)$.

Standardization

Theorem

If $X \sim N(\mu, \sigma^2)$, then defining

$Z = \frac{X - \mu}{\sigma}$

we have $Z \sim N(0, 1)$.

An important consequence of standardization:

$P(X \le x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$

where $\Phi$ is the standard normal CDF.

The process to find $P(X \le x)$ when $X \sim N(\mu, \sigma^2)$:

  1. Compute $z = \frac{x - \mu}{\sigma}$

  2. Use standard normal tables to find $\Phi(z)$

  3. This equals $P(X \le x)$

where $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.

Let $X \sim N(\mu, \sigma^2)$. Find $x$ such that $P(X \le x) = p$; this is the $100p$-th percentile of $X$.

So $\frac{x - \mu}{\sigma} = z_p$ where $\Phi(z_p) = p$, and therefore $x = \mu + \sigma z_p$.
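In place of a Z-table, the standard normal CDF can be evaluated with the error function from Python's math module, using the identity $\Phi(z) = \frac{1}{2}\left(1 + \text{erf}(z/\sqrt{2})\right)$. The values of $\mu$, $\sigma$, $x$ below are illustrative:

```python
from math import erf, sqrt

# Standard normal CDF via the error function.
def phi(z):
    return (1 + erf(z / sqrt(2))) / 2

# P(X <= x) for X ~ N(mu, sigma^2), with assumed values.
mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma
print(z, round(phi(z), 4))  # 2.0 0.9772
```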

Multivariate Distributions

Basic Terminology and Techniques

So far we’ve only considered univariate distributions.

Suppose that $X$ and $Y$ are discrete random variables defined on the same sample space. The joint probability function of $X$ and $Y$ is

$f(x, y) = P(X = x \text{ and } Y = y)$;

a shorthand is $f(x, y) = P(X = x, Y = y)$.

For a collection of discrete random variables $X_1, \dots, X_n$, the joint probability function is defined as

$f(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n)$

For example if we rolled three dice, and let $X_i$ denote the result on the $i$-th die, we'd have

$f(x_1, x_2, x_3) = \frac{1}{216}$ for each $(x_1, x_2, x_3) \in \{1, \dots, 6\}^3$.

Properties: $f(x, y) \ge 0$ and $\sum_{x}\sum_{y} f(x, y) = 1$.

Computing probability from the joint probability function:

Let $R$ be a subset of the values that $(X, Y)$ could take. Then

$P((X, Y) \in R) = \sum_{(x, y) \in R} f(x, y)$

Definition

Suppose that $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$. The marginal probability function of $X$ is

$f_X(x) = \sum_{y} f(x, y)$

Similarly, the marginal distribution of $Y$ is

$f_Y(y) = \sum_{x} f(x, y)$

Definition

Suppose that $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$ and marginal probability functions $f_X(x)$ and $f_Y(y)$. $X$ and $Y$ are said to be independent random variables if and only if

$f(x, y) = f_X(x)\,f_Y(y)$ for all pairs $(x, y)$.

This is the same as saying $P(X = x, Y = y) = P(X = x)\,P(Y = y)$.

Rolling two 6-sided dice with $X$ the outcome on the first die and $Y$ the outcome on the second die:

$f(x, y) = \frac{1}{36}$ for each pair $(x, y)$,

but we also know

$f_X(x) = \frac{1}{6}$ and $f_Y(y) = \frac{1}{6}$,

and so $f(x, y) = f_X(x)\,f_Y(y)$ and we have independence.

This abstracts to collections of size $n$: $X_1, \dots, X_n$ are independent if and only if $f(x_1, \dots, x_n) = f_1(x_1)\cdots f_n(x_n)$.

Definition

The conditional probability function of $X$ given $Y = y$ is denoted $f(x \mid y)$, and is defined to be

$f(x \mid y) = \frac{f(x, y)}{f_Y(y)}$, provided $f_Y(y) > 0$.

What about functions of random variables?

Suppose that $U = g(X, Y)$ for some function $g$.

For jointly distributed random variables $X$ and $Y$, $U = g(X, Y)$ is a random variable. If $X$ and $Y$ have joint p.f. $f(x, y)$, then the probability function of $U$ is given by:

$f_U(u) = P(U = u) = \sum_{(x, y):\, g(x, y) = u} f(x, y)$

  • If $X \sim \text{Poi}(\mu_1)$ and $Y \sim \text{Poi}(\mu_2)$ independently, then $X + Y \sim \text{Poi}(\mu_1 + \mu_2)$

  • If $X \sim \text{Bin}(n, p)$ and $Y \sim \text{Bin}(m, p)$ independently, then $X + Y \sim \text{Bin}(n + m, p)$

Multinomial Distribution

Definition

Multinomial Distribution: Consider an experiment in which:

  1. Individual trials have $k$ possible outcomes, and the probabilities of each individual outcome are denoted $p_1, \dots, p_k$, so that $p_1 + p_2 + \cdots + p_k = 1$

  2. Trials are independently repeated $n$ times, with $X_i$ denoting the number of times outcome $i$ occurred, so that $X_1 + X_2 + \cdots + X_k = n$

If $X_1, \dots, X_k$ have multinomial distribution with parameters $n$ and $p_1, \dots, p_k$, then their joint probability function is

$f(x_1, \dots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$

where the $x_i \ge 0$ satisfy $x_1 + x_2 + \cdots + x_k = n$.

The terms

$\frac{n!}{x_1!\,x_2!\cdots x_k!}$

are called multinomial coefficients.

If $X_1, \dots, X_k$ have joint Multinomial distribution with parameters $n$ and $p_1, \dots, p_k$, then

$X_i \sim \text{Bin}(n, p_i)$

Also $X_i + X_j \sim \text{Bin}(n, p_i + p_j)$ for $i \ne j$.
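A direct sketch of the joint probability function formula (added; the trial counts and probabilities are illustrative):

```python
from math import factorial

# Joint p.f. of Multinomial(n; p_1, ..., p_k), straight from the formula.
def multinomial_pf(xs, ps, n):
    assert sum(xs) == n
    coeff = factorial(n)
    for x in xs:
        coeff //= factorial(x)
    prob = float(coeff)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

# e.g. 10 independent trials with outcome probabilities 0.5, 0.3, 0.2
print(multinomial_pf([5, 3, 2], [0.5, 0.3, 0.2], 10))
```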

Expectation for Multivariate Distributions: Covariance and Correlation

Recall, $E[g(X)] = \sum_{x} g(x) f(x)$ for a single discrete random variable $X$.

Definition

Suppose $X$ and $Y$ are jointly distributed random variables with joint probability function $f(x, y)$. Then for a function $g$,

$E[g(X, Y)] = \sum_{(x, y)} g(x, y)\,f(x, y)$

More generally, if $X_1, \dots, X_n$ have joint p.f. $f(x_1, \dots, x_n)$, then

$E[g(X_1, \dots, X_n)] = \sum_{(x_1, \dots, x_n)} g(x_1, \dots, x_n)\,f(x_1, \dots, x_n)$

Linear Properties of Expectation

$E(aX + bY) = aE(X) + bE(Y)$, and in general, $E\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i E(X_i)$ (whether or not the $X_i$ are independent).

Independence gives a useful - but simplistic - way of describing relationships between variables.

Definition

If $X$ and $Y$ are jointly distributed, then $\text{Cov}(X, Y)$ denotes the covariance between $X$ and $Y$. It is defined by

$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$

or, by the shortcut,

$\text{Cov}(X, Y) = E(XY) - E(X)E(Y)$

Properties of Covariance

  • Positive covariance: $Y$ tends to increase as $X$ increases

  • Negative covariance: $Y$ tends to decrease as $X$ increases

  • The larger the absolute value of $\text{Cov}(X, Y)$, the stronger the relationship is

  • $\text{Cov}(X, c) = 0$ for any constant $c$; also $\text{Cov}(X, X) = \text{Var}(X)$

Theorem

If $X$ and $Y$ are independent, then $\text{Cov}(X, Y) = 0$.

Proof

$X$ and $Y$ are independent $\Rightarrow E(XY) = E(X)E(Y) \Rightarrow \text{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0$.

It is important to know that the converse is false. If $\text{Cov}(X, Y) = 0$, then $X$ and $Y$ are not necessarily independent.

$X$ and $Y$ are uncorrelated if $\text{Cov}(X, Y) = 0$, but that does not mean they are independent.

Independence $\Rightarrow \text{Cov}(X, Y) = 0$; $\quad \text{Cov}(X, Y) = 0 \not\Rightarrow$ Independence.

Definition

The correlation of $X$ and $Y$ is denoted $\rho$ (or $\text{Corr}(X, Y)$), and is defined by

$\rho = \frac{\text{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$

with $-1 \le \rho \le 1$.

Correlation measures the strength of the linear relationship between and .

Properties:

  • If $\rho > 0$ (close to $1$), then $X$ and $Y$ will have an approximately positive linear relationship

  • If $\rho < 0$ (close to $-1$), then $X$ and $Y$ will have an approximately negative linear relationship

  • If $\rho = 0$, then $X$ and $Y$ are said to be uncorrelated.

We say that $X$ and $Y$ are uncorrelated if $\rho = 0$ (or $\text{Cov}(X, Y) = 0$).

$X$ and $Y$ are independent $\Rightarrow$ $X$ and $Y$ are uncorrelated.

Once again, the converse is not true.

Mean and Variance of a Linear Combination of Random Variables

Suppose $X_1, \dots, X_n$ are jointly distributed random variables with joint probability function $f(x_1, \dots, x_n)$. A linear combination of the random variables is a random variable of the form

$Y = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$

where $a_1, \dots, a_n$ are constants.

Examples

e.g. the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, or a difference $X_1 - X_2$.

Expected Value of a Linear Combination:

$E\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i E(X_i)$

Mean of sample mean: if $E(X_i) = \mu$ for all $i$, then $E(\bar{X}) = \mu$.

Covariance of linear combinations:

$\text{Cov}\left(\sum_{i} a_i X_i,\; \sum_{j} b_j Y_j\right) = \sum_{i}\sum_{j} a_i b_j\,\text{Cov}(X_i, Y_j)$

Two useful results:

$\text{Cov}(X + Y, U + V) = \text{Cov}(X, U) + \text{Cov}(X, V) + \text{Cov}(Y, U) + \text{Cov}(Y, V)$

$\text{Cov}(aX, bY) = ab\,\text{Cov}(X, Y)$

Variance of a linear combination:

$\text{Var}(aX + bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X, Y)$

When $X$ and $Y$ are independent: $\text{Var}(aX + bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y)$

When $\text{Cov}(X, Y) = 0$ the same simplification holds.

In the case of two random variables $X$ and $Y$, and constants $a$ and $b$, we have:

If $X$ and $Y$ are independent: $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$

If $X$ and $Y$ are independent: $\text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y)$

In general, let $X_1, \dots, X_n$ be random variables, and write $Y = \sum_{i=1}^{n} a_i X_i$; then:

$\text{Var}(Y) = \sum_{i=1}^{n} a_i^2\,\text{Var}(X_i) + 2\sum_{i < j} a_i a_j\,\text{Cov}(X_i, X_j)$

If $X_1, \dots, X_n$ are independent, then $\text{Cov}(X_i, X_j) = 0$ (for $i \ne j$) and so

$\text{Var}(Y) = \sum_{i=1}^{n} a_i^2\,\text{Var}(X_i)$

Variance of sample mean

In general, we have for independent random variables $X_1, \dots, X_n$ all with variance $\sigma^2$:

$\text{Var}(\bar{X}) = \text{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(X_i) = \frac{\sigma^2}{n}$

This holds for any set of independent random variables that have the same variance.

Linear Combinations of Independent Normal Random Variables

Linear transformation of a normal random variable:

Let $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$ where $a \ne 0$ and $b$ are constants, then

$Y \sim N(a\mu + b,\; a^2\sigma^2)$

The linear transformation of a normal random variable is still normal.

Linear combination of independent normal random variables:

Let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ independently, and let $a$ and $b$ be constants, then

$aX + bY \sim N(a\mu_1 + b\mu_2,\; a^2\sigma_1^2 + b^2\sigma_2^2)$

Linear combination of independent normal random variables is still normal.

Let $X_1, \dots, X_n$ be independent $N(\mu, \sigma^2)$ random variables. Then

$\sum_{i=1}^{n} X_i \sim N(n\mu,\; n\sigma^2)$

and

$\bar{X} \sim N\left(\mu,\; \frac{\sigma^2}{n}\right)$

The IQs of UWaterloo Math students are normally distributed with mean $\mu$ and variance $\sigma^2$. The probability that the average IQ in a class of $n$ students is between $a$ and $b$ is:

$P(a \le \bar{X} \le b) = \Phi\left(\frac{b - \mu}{\sigma/\sqrt{n}}\right) - \Phi\left(\frac{a - \mu}{\sigma/\sqrt{n}}\right)$
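A sketch of this sample-mean probability; the mean, variance, class size, and bounds below are assumed values (the originals did not survive extraction):

```python
from math import erf, sqrt

def phi(z):
    return (1 + erf(z / sqrt(2))) / 2

# Assumed: IQ ~ N(mu=100, sigma^2=225), class of n=25, bounds a=95, b=105.
mu, sigma2, n = 100, 225, 25
a, b = 95, 105

sd_mean = sqrt(sigma2 / n)  # standard deviation of the sample mean = 3
print(round(phi((b - mu) / sd_mean) - phi((a - mu) / sd_mean), 4))  # 0.9044
```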

Indicator Random Variables

If $X \sim \text{Bin}(n, p)$ we can think of $X$ in the following way:

Observe the first trial, and see if it succeeds. We set $X_1$ to be $1$ if it succeeds and $0$ if it fails.

Observe the second trial, and see if it succeeds. We set $X_2$ to be $1$ if it succeeds and $0$ if it fails.

Observe the $n$-th trial, and see if it succeeds. We set $X_n$ to be $1$ if it succeeds and $0$ if it fails.

Sum the successful events to find $X = X_1 + X_2 + \cdots + X_n$.

$X$ is a linear combination of indicator random variables.

Definition

Let $A$ be an event which may possibly result from an experiment. We say that $X_A$ is the indicator random variable of the event $A$. $X_A$ is defined by: $X_A = 1$ if $A$ occurs, and $X_A = 0$ otherwise. Note that $E(X_A) = 1 \cdot P(A) + 0 \cdot P(\bar{A}) = P(A)$.

Covariance of Indicator Random Variables:

Suppose we have two events, $A$ and $B$, and we define

$X = X_A$ and $Y = X_B$ (the indicators of $A$ and $B$).

How do we find $\text{Cov}(X, Y)$?

In general,

$\text{Cov}(X, Y) = E(XY) - E(X)E(Y)$,

so for indicator variables we have

$\text{Cov}(X, Y) = P(A \cap B) - P(A)P(B)$.

We know that $E(X) = P(A)$ and $E(Y) = P(B)$, so we just have to find $E(XY)$; since $XY = 1$ exactly when both $A$ and $B$ occur, $E(XY) = P(A \cap B)$.

If $A$ and $B$ are events, then $\text{Cov}(X_A, X_B) = 0$ if and only if $P(AB) = P(A)P(B)$, i.e. if and only if $A$ and $B$ are independent.

For indicators: $A$ and $B$ are independent $\iff X_A$ and $X_B$ are uncorrelated.

Central Limit Theorem and Moment Generating Functions

Central Limit Theorem

If $X_1, \dots, X_n$ are independent random variables from the same distribution, with mean $\mu$ and variance $\sigma^2$, then as $n \to \infty$, the shape of the probability histogram for the random variable $\sum_{i=1}^{n} X_i$ approaches the shape of a normal probability density function.

The cumulative distribution function of the random variable

$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$

approaches the $N(0, 1)$ cumulative distribution function. Similarly, the cumulative distribution function of

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

approaches the $N(0, 1)$ cumulative distribution function.

In other words, if $n$ is large:

$\sum_{i=1}^{n} X_i$ has approximately a $N(n\mu, n\sigma^2)$ distribution, and

$\bar{X}$ has approximately a $N\left(\mu, \frac{\sigma^2}{n}\right)$ distribution.

Example

Roll a 6-sided die $n$ times and record each result. If the die is a fair die, estimate the probability that the sum of the die rolls is less than some value $s$.

Let $X_i$ be the number of dots on the $i$-th roll. We want the probability $P\left(\sum_{i=1}^{n} X_i < s\right)$. Without CLT this is very difficult.

Using CLT:

It seems reasonable to assume independence, each $X_i$ is a discrete random variable from the same distribution, and $n$ is large.

Mean and variance: $\mu = E(X_i) = 3.5$ and $\sigma^2 = \text{Var}(X_i) = \frac{35}{12}$.

Apply CLT (Standardization): $P\left(\sum X_i < s\right) \approx \Phi\left(\frac{s - 3.5n}{\sqrt{35n/12}}\right)$

Use the Z-table to evaluate $\Phi$.
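A sketch comparing the CLT approximation with simulation; $n$ and $s$ below are assumed values. The small remaining gap is what the continuity correction (next subsection) addresses:

```python
import random
from math import erf, sqrt

def phi(z):
    return (1 + erf(z / sqrt(2))) / 2

# Assumed: n = 100 rolls, threshold s = 330.
random.seed(42)
n, s = 100, 330
mu, var = 3.5, 35 / 12

# CLT approximation to P(sum < s)
print(round(phi((s - n * mu) / sqrt(n * var)), 4))

# Monte Carlo estimate for comparison
trials = 20_000
hits = sum(sum(random.randint(1, 6) for _ in range(n)) < s for _ in range(trials))
print(hits / trials)
```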

Note:

  • CLT doesn't hold if $\mu$ and/or $\sigma$ don't exist

  • Accuracy depends on the size of $n$ and the actual distribution of the $X_i$

  • CLT works for any distribution of the $X_i$ (with finite mean and variance)

Normal approximation to binomial

Note that $X \sim \text{Bin}(n, p)$ can be written as $X = \sum_{i=1}^{n} X_i$ where all the $X_i$ are independent Bernoulli($p$) indicators.

Then if $X \sim \text{Bin}(n, p)$, for large $n$, the random variable

$\frac{X - np}{\sqrt{np(1-p)}}$

has approximately a $N(0, 1)$ distribution.

Continuity Correction

We need to be careful with CLT when working with discrete random variables. For example, suppose we have used the normal approximation to compute $P(X \le x)$ for an integer $x$. But we know that $X$ can't take non-integer values. Therefore, approximating with $P(X \le x + 0.5)$ gives us a better estimate. This is called continuity correction.

We should apply this when approximating discrete distributions using the CLT, and not continuous distributions.

If $X \sim \text{Poi}(\mu)$, then the cumulative distribution function of

$\frac{X - \mu}{\sqrt{\mu}}$

approaches that of a $N(0, 1)$ random variable as $\mu \to \infty$.

Moment Generating Functions

This is a third type of function (after the probability function and the CDF) that uniquely determines a distribution.

Definition

The moment generating function (MGF) of a random variable $X$ is given by

$M_X(t) = E\left(e^{tX}\right)$

If $X$ is discrete with probability function $f(x)$, then

$M_X(t) = \sum_{x} e^{tx} f(x)$

If $X$ is continuous with density $f(x)$, then

$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$

Properties:

So long as $M_X(t)$ is defined in a neighbourhood of $t = 0$: $M_X(0) = 1$, $M_X'(0) = E(X)$, $M_X''(0) = E(X^2)$, and in general $M_X^{(k)}(0) = E(X^k)$, the $k$-th moment of $X$.

Suppose $X$ has an $\text{Exp}(\theta)$ distribution. Then its moment generating function is

$M(t) = \int_{0}^{\infty} e^{tx}\,\frac{1}{\theta} e^{-x/\theta}\,dx = \frac{1}{1 - \theta t}$ for $t < \frac{1}{\theta}$.

Therefore,

$M'(t) = \frac{\theta}{(1 - \theta t)^2}$

so $E(X) = M'(0) = \theta$, as we found before.

What is the Moment Generating Function for $X \sim \text{Bernoulli}(p)$?

The distribution of $X$ is $P(X = 1) = p$, $P(X = 0) = 1 - p$, so $M_X(t) = (1 - p) + p e^t$.

What is the Moment Generating Function for $Y \sim \text{Bin}(n, p)$?

When $n = 1$, we have the Bernoulli case above.

When $n > 1$, write $Y = X_1 + \cdots + X_n$ as a sum of independent Bernoulli($p$) variables, so $M_Y(t) = \left(1 - p + p e^t\right)^n$.

Theorem (Uniqueness)

Suppose that random variables $X$ and $Y$ have MGFs $M_X(t)$ and $M_Y(t)$ respectively. If $M_X(t) = M_Y(t)$ for all $t$, then $X$ and $Y$ have the same distribution.

Theorem

Suppose that $X$ and $Y$ are independent and have moment generating functions $M_X(t)$ and $M_Y(t)$ respectively. Then the moment generating function of $Z = X + Y$ is

$M_Z(t) = M_X(t)\,M_Y(t)$

We can use this to prove the following.

Suppose that $X \sim \text{Poi}(\mu_1)$ and $Y \sim \text{Poi}(\mu_2)$, and $X$ and $Y$ are independent. Then $M_{X+Y}(t) = M_X(t)\,M_Y(t) = e^{(\mu_1 + \mu_2)(e^t - 1)}$, so $X + Y \sim \text{Poi}(\mu_1 + \mu_2)$.

Multivariate Moment Generating Functions

Definition

The joint moment generating function of two random variables $X$ and $Y$ is

$M(s, t) = E\left(e^{sX + tY}\right)$

And so if $X$ and $Y$ are independent,

$M(s, t) = M_X(s)\,M_Y(t)$
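A small numeric sanity check of the moment property $M'(0) = E(X)$, using the standard closed form of the Poisson MGF, $M(t) = e^{\mu(e^t - 1)}$ (a known result, stated here only for the check; $\mu = 3$ is an arbitrary choice):

```python
from math import exp

# Check M'(0) = E(X) for X ~ Poi(mu) with a central finite difference.
mu = 3.0

def M(t):
    return exp(mu * (exp(t) - 1))

h = 1e-6
print(round((M(h) - M(-h)) / (2 * h), 4))  # approximately mu = 3.0
```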