Professor: Erik Hintz | Term: Fall 2023
Introduction to Probability
Definitions of Probability
Definition
Classical Definition of Probability
$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in } S}$, provided all outcomes are equally likely.
Definition
Relative Frequency
The probability of an event is the proportion of times the event occurs in a long run of repetitions. For example, the probability of rolling a 2 is $\frac{1}{6}$ because after 6 million rolls, a 2 will appear roughly 1 million times. (This is not very practical.)
Definition
Subjective Definition
A person's belief about how likely something is to happen (unclear, since it varies from person to person).
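As a quick illustration of the relative-frequency idea, here is a minimal simulation sketch (the trial count is an arbitrary choice):

```python
# Estimate P(roll a 2) for a fair die by relative frequency.
import random

trials = 1_000_000
hits = sum(1 for _ in range(trials) if random.randint(1, 6) == 2)
print(hits / trials)  # tends to 1/6 ≈ 0.1667 as trials grows
```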
Mathematical Probability Models
Sample Spaces and Probability
Sample Spaces & Sets
A sample space $S$ is a set of distinct outcomes of an experiment, with the property that in a single trial exactly one of the outcomes occurs.
- Die: $S = \{1, 2, 3, 4, 5, 6\}$ or $S = \{\text{even}, \text{odd}\}$.
- Number of coin flips until heads occurs: $S = \{1, 2, 3, \dots\}$.
- Waiting time in minutes until a sunny day: $S = [0, \infty)$.
A sample space is discrete if it is finite or countably infinite (can be put in one-to-one correspondence with a subset of the natural numbers $\mathbb{N}$). Otherwise it is non-discrete.
$S = \{1, 2, 3, \dots\}$ is discrete (countably infinite).
An event $A$ is a subset of a sample space $S$.
$A$ is an event if $A \subseteq S$ ($A$ is a subset of $S$; $A$ is contained in $S$).
- Die shows an even number: $A = \{2, 4, 6\}$.
- $3$ or fewer tosses until heads: $A = \{1, 2, 3\}$.
Notation
- Element of: $\omega \in A$ means $\omega$ is in $A$
- Union: $A \cup B$ ($A$ or $B$)
- Intersection: $A \cap B$ ($A$ and $B$)
- Complement: $\bar{A}$ (not $A$)
- Empty set: $\emptyset$
- Disjoint: $A \cap B = \emptyset$
e.g. $A = \{1, 2\}$ and $B = \{5, 6\}$ are disjoint when a die is rolled.
Definition
Probability
Let $\mathcal{A}$ denote the set of all events on a given sample space $S$. A probability $P$ defined on $\mathcal{A}$ is a function satisfying:
- Scale: $0 \le P(A) \le 1$ for every event $A$
- $P(S) = 1$
- Additivity (infinite): if $A_1, A_2, \dots$ are pairwise disjoint events, then $P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
e.g. for a die, $P(\text{roll } 1 \text{ or } 2) = P(\{1\}) + P(\{2\}) = \frac{2}{6}$.
Suppose $S$ is discrete and $A \subseteq S$ is an event.
An event $A = \{\omega\}$ that is indivisible (a single point) is a simple event; otherwise it is compound.
e.g. $\{2\}$ is simple, $\{2, 4, 6\}$ is compound.
Assign probabilities $P(\{\omega\})$ to the simple events so that $\sum_{\omega \in S} P(\{\omega\}) = 1$.
For $S$ discrete and $A$ an event, $P(A) = \sum_{\omega \in A} P(\{\omega\})$.
e.g. for a fair die, $A$ = "number is odd" $= \{1, 3, 5\}$, so $P(A) = 3 \cdot \frac{1}{6} = \frac{1}{2}$, since $P(\{\omega\}) = \frac{1}{6}$ for each $\omega \in S$.
A sample space $S$ is equally likely if the probability of every outcome is the same. In that case, $P(A) = \frac{|A|}{|S|}$, where $|\cdot|$ denotes the number of elements in a set.
Probability and Counting Techniques
Fact
If $A$ and $B$ are disjoint events, then $P(A \cup B) = P(A) + P(B)$.
Addition and Multiplication Rules
Addition Rule
Roll two dice. What is the probability that the sum is larger than 8?
"Sum of dice larger than 8" $=$ "Sum 9" $\cup$ "Sum 10" $\cup$ "Sum 11" $\cup$ "Sum 12",
and these events are all disjoint, so the probabilities add.
Thus,
$P(\text{sum} > 8) = \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36} = \frac{5}{18}$.
Multiplication Rules
Ordered tuples $(a_1, a_2, \dots, a_k)$: with $n_1$ choices for $a_1$, $n_2$ choices for $a_2$, etc., there are $n_1 n_2 \cdots n_k$ tuples.
Definition
If there are $m$ ways to do task 1, and $n$ ways to do task 2, there are $m \times n$ ways to do tasks 1 and 2.
With replacement: when an object is selected, it is put back afterwards.
Without replacement: once an object is picked, it stays out of the pool.
Counting Arrangements or Permutations
Definition
Factorials
$n$ distinct objects can be arranged in $n!$ ways ("$n$ factorial"), where
$n! = n(n-1)(n-2) \cdots 2 \cdot 1$ and $0! = 1$.
10 people standing next to each other: $10!$ arrangements.
Out of 5 students, we must choose 1 president and 1 secretary: $5 \times 4 = 20$ ways.
Note: $n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$, Stirling's approximation.
Definition
Permutations
Given $n$ distinct objects, a permutation of size $k$ is an ordered subset of $k$ of the objects (without replacement). The number of such permutations is
$n^{(k)} = n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$.
With replacement, the count is $n^k$.
Note that $n^{(k)} = n(n-1)\cdots(n-k+1)$ has $k$ factors.
Our example of president and secretary above can thus be seen as $5^{(2)} = 5 \cdot 4 = 20$.
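These counts are easy to check in code; a small sketch using Python's standard library (`math.factorial` and `math.perm`, the latter available in Python ≥ 3.8):

```python
import math

print(math.factorial(10))        # 10! = 3628800 arrangements of 10 people
print(math.perm(5, 2))           # 5^(2) = 5*4 = 20: president and secretary
print(math.factorial(5) // math.factorial(5 - 2))  # same count via n!/(n-k)!
print(5 ** 2)                    # 25 ordered picks with replacement
```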
Counting Subsets or Combinations
Question
If we have $n$ members in a club, how many ways can we select $k$ of them to serve on a committee?
Here, order does not matter. Unlike a permutation, this is a combination.
In general, if we have $k$ objects, there are $k!$ ways to reorder those objects. We can therefore get the combination count by dividing the permutation count by the number of ways of ordering the $k$ objects, $k!$.
Definition
Given $n$ distinct objects, a combination of size $k$ is an unordered subset of $k$ of the objects (without replacement). The number of combinations of size $k$ taken from $n$ objects is denoted $\binom{n}{k}$, which reads "$n$ choose $k$", and
$\binom{n}{k} = \frac{n^{(k)}}{k!} = \frac{n!}{k!\,(n-k)!}$.
For combinatorial proofs of these identities see MATH239.
For our previous example we have $\binom{5}{2} = \frac{5 \cdot 4}{2} = 10$ possible two-person committees.
Properties of $\binom{n}{k}$
- $\binom{n}{k} = \frac{n^{(k)}}{k!} = \frac{n!}{k!\,(n-k)!}$ for $0 \le k \le n$
- $\binom{n}{k} = \binom{n}{n-k}$
- $\binom{n}{1} = n$
- If we define $0! = 1$, then $\binom{n}{0} = \binom{n}{n} = 1$
- $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$ (Pascal's identity)
- Binomial Theorem: $(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$
Example
From a group of $n_w$ women and $n_m$ men, a committee of $k_w$ women and $k_m$ men is formed at random. Two of the men dislike each other. What is the probability they don't serve together?
First, get the size of the sample space $S$:
Pick $k_w$ women from $n_w$ women: $\binom{n_w}{k_w}$ ways.
Pick $k_m$ men from $n_m$ men: $\binom{n_m}{k_m}$ ways.
Thus $|S| = \binom{n_w}{k_w}\binom{n_m}{k_m}$.
Consider the event $A$ = "men 1 and 2 do not serve on the committee together".
It is easier to count the complement, $\bar{A}$ = "men 1 and 2 are in the committee together":
Pick $k_w$ women from $n_w$ women: $\binom{n_w}{k_w}$ ways.
Men 1 and 2 are in the committee already, so we need to pick the remaining $k_m - 2$ men from the $n_m - 2$ men left: $\binom{n_m - 2}{k_m - 2}$ ways.
Thus,
$P(A) = 1 - P(\bar{A}) = 1 - \frac{\binom{n_w}{k_w}\binom{n_m-2}{k_m-2}}{\binom{n_w}{k_w}\binom{n_m}{k_m}} = 1 - \frac{\binom{n_m-2}{k_m-2}}{\binom{n_m}{k_m}}$.
Arrangements when Symbols are Repeated
Definition
Consider $n$ objects of $k$ types: $n_1$ objects of type 1, $n_2$ objects of type 2, ..., and $n_k$ objects of type $k$, with $n_1 + n_2 + \cdots + n_k = n$. Thus there are
$\frac{n!}{n_1!\,n_2!\,\cdots\,n_k!}$
distinguishable arrangements of the $n$ objects. This is the multinomial coefficient.
The letters of "SLEEVELESS" (S$\times$3, L$\times$2, E$\times$4, V$\times$1; 10 letters in total) are arranged at random. What is the probability the word begins and ends with "S"?
Sample space: $|S| = \frac{10!}{3!\,2!\,4!\,1!} = 12600$.
Event (fix an S at each end and arrange the remaining 8 letters S, L, L, E, E, E, E, V): $\frac{8!}{1!\,2!\,4!\,1!} = 840$.
Thus $P = \frac{840}{12600} = \frac{1}{15}$.
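A small sanity check of this computation, counting arrangements with multinomial coefficients:

```python
from math import factorial

def arrangements(counts):
    """Distinguishable arrangements given letter multiplicities."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# SLEEVELESS: S x 3, L x 2, E x 4, V x 1
whole = arrangements([3, 2, 4, 1])   # 12600
# Fix S at both ends; arrange S x 1, L x 2, E x 4, V x 1 in the middle
event = arrangements([1, 2, 4, 1])   # 840
print(event / whole)                 # 1/15 ≈ 0.0667
```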
Useful Series and Sums
Theorem
Finite Geometric Series: $1 + r + r^2 + \cdots + r^{n-1} = \sum_{i=0}^{n-1} r^i = \frac{1 - r^n}{1 - r}$, for $r \neq 1$.
Theorem
Infinite Geometric Series, if $|r| < 1$: $\sum_{i=0}^{\infty} r^i = \frac{1}{1-r}$.
Theorem
Binomial Theorem, if $n$ is a positive integer and $x$ is any real number: $(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$.
Theorem
Binomial Theorem, if $n$ is not a positive integer but $|x| < 1$: $(1+x)^n = \sum_{k=0}^{\infty} \binom{n}{k} x^k$, where $\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}$.
Theorem
Multinomial Theorem: $(x_1 + x_2 + \cdots + x_k)^n = \sum \frac{n!}{n_1!\,n_2!\,\cdots\,n_k!}\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$,
where the summation is over all non-negative integers $n_1, n_2, \dots, n_k$ such that $n_1 + n_2 + \cdots + n_k = n$, where $n$ is a positive integer.
Theorem
Hypergeometric Identity: $\sum_{x=0}^{\infty} \binom{a}{x}\binom{b}{n-x} = \binom{a+b}{n}$.
Theorem
Exponential Series: $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$,
for all $x$ in the real numbers.
A related identity: $e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$.
Series involving integers: $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ and $\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$.
Probability Rules and Conditional Probability
General Methods
Rules:
For any event $A$, $0 \le P(A) \le 1$.
If $A$ and $B$ are two events with $A \subseteq B$, then $P(A) \le P(B)$.
Fact
Fundamental Laws of Set Algebra
Commutativity: $A \cup B = B \cup A$ and $A \cap B = B \cap A$
Associativity: $(A \cup B) \cup C = A \cup (B \cup C)$ and $(A \cap B) \cap C = A \cap (B \cap C)$
Distributivity: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
De Morgan's Laws:
$\overline{A \cup B} = \bar{A} \cap \bar{B}$ — the complement of a union is the intersection of the complements.
$\overline{A \cap B} = \bar{A} \cup \bar{B}$ — the complement of an intersection is the union of the complements.
These laws apply to events, since events are sets.
Rules for Unions of Events
For arbitrary events $A$ and $B$,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Fact
If $A$ and $B$ are disjoint, then
$P(A \cup B) = P(A) + P(B)$.
Fact
What if there are three events $A$, $B$, $C$?
Then $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
Fact
The probability of the complement event is
$P(\bar{A}) = 1 - P(A)$.
If $A$ and $B$ are disjoint (mutually exclusive), then $P(A \cup B) = P(A) + P(B)$.
Otherwise, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
In general, for pairwise disjoint events $A_1, \dots, A_n$: $P(A_1 \cup \cdots \cup A_n) = P(A_1) + \cdots + P(A_n)$.
Note: $A \cap B$ is often written as $AB$.
Intersection of Events and Independence
Independent Events
Rolling a die twice is an example of independent events: the outcome of the first roll doesn't affect the second.
Definition
Events $A$ and $B$ are independent if and only if $P(A \cap B) = P(A)P(B)$.
e.g. $A$ = "roll 6 on first", $B$ = "roll 6 on second": $P(A \cap B) = \frac{1}{36} = \frac{1}{6} \cdot \frac{1}{6} = P(A)P(B)$, so $A$ and $B$ are independent.
Not independent: $A$ = "first roll is 1", $B$ = "first roll is even". Then $P(A \cap B) = 0 \neq \frac{1}{6} \cdot \frac{1}{2} = P(A)P(B)$.
A common misconception is that if $A$ and $B$ are mutually exclusive, then $A$ and $B$ are independent; in fact, mutually exclusive events with positive probabilities are never independent.
Fact
If $A$ and $B$ are independent, then $A$ and $\bar{B}$ are independent. (This uses the law of total probability: $P(A) = P(A \cap B) + P(A \cap \bar{B})$.)
Fact
We can also see:
$P(A \cap \bar{B}) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)\big(1 - P(B)\big) = P(A)P(\bar{B})$.
Conditional Probability
Definition
$P(A \mid B)$ represents the probability that event $A$ occurs, when we know that $B$ occurs. This is the conditional probability of $A$ given $B$, defined by $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Fact
If $A$ and $B$ are independent, then $P(A \mid B) = P(A)$.
$A$ and $B$ are independent events if and only if either of the following statements is true: $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$.
Example
$A$ = "the sum of two dice is 10", $B$ = "the first die is 6". Then $P(A) = \frac{3}{36} = \frac{1}{12}$, but $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{1/6} = \frac{1}{6}$, so $A$ and $B$ are not independent.
If $P(A) = 0$ or $P(A) = 1$, then $A$ and $B$ are independent for any event $B$.
Properties
- $P(A \mid B) \ge 0$
- $P(S \mid B) = 1$
- $P(B \mid B) = 1$
- If $B \subseteq A$, then $P(A \mid B) = 1$
- $P(\bar{A} \mid B) = 1 - P(A \mid B)$
- If $A_1$ and $A_2$ are disjoint:
$P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B)$
Product Rules, Law of Total Probability and Bayes’ Theorem
Fact
Product Rules
For events $A$ and $B$,
$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$.
That means if we know $P(A \mid B)$ and $P(B)$, or $P(B \mid A)$ and $P(A)$, we can find $P(A \cap B)$.
More events: $P(A \cap B \cap C) = P(A)\,P(B \mid A)\,P(C \mid A \cap B)$.
Definition
Law of Total Probability
Let $A_1, A_2, \dots, A_k$ be a partition of the sample space $S$ into disjoint (mutually exclusive) events, that is
$A_1 \cup A_2 \cup \cdots \cup A_k = S$ and $A_i \cap A_j = \emptyset$ for $i \neq j$.
Let $B$ be an arbitrary event in $S$. Then
$P(B) = \sum_{i=1}^{k} P(B \mid A_i)\,P(A_i)$.
A common way in which this is used is that
$P(B) = P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A})$,
since $A$ and $\bar{A}$ partition $S$.
Theorem
Bayes' Theorem: Suppose $A$ and $B$ are events defined on a sample space $S$. Suppose also that $P(B) > 0$. Then
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$.
Proof: $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$.
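A numeric sketch of Bayes' Theorem combined with the law of total probability; the disease-testing numbers below are made up purely for illustration:

```python
# Hypothetical numbers: P(D) = 0.01, sensitivity P(+|D) = 0.95,
# false positive rate P(+|not D) = 0.02.
p_d = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 0.02

# Law of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' Theorem: P(D|+) = P(+|D)P(D) / P(+)
print(p_pos_given_d * p_d / p_pos)  # ≈ 0.324
```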
Discrete Random Variables
Random Variables and Probability Functions
Definition
A random variable $X$ is a function that assigns a real number to each point in a sample space $S$. Often "random variable" is abbreviated RV or rv.
Definition
The values that a random variable $X$ can take on are called the range of the random variable. We often denote the range of a random variable $X$ by $X(S)$.
Example
If we roll a 6-sided die twice, our sample space is $S = \{(i, j) : 1 \le i, j \le 6\}$, and if we define $X$ = sum of the die rolls, the range is $X(S) = \{2, 3, \dots, 12\}$.
Definitions
We say that a random variable $X$ is discrete if it takes values in a countable set (finite or countably infinite).
We say that a random variable $X$ is continuous if it takes values in some interval of real numbers, e.g. $[0, 1]$ or $(0, \infty)$.
Important: don't forget that a discrete random variable can have an infinite (countably infinite) range.
Definition
Let $X$ be a discrete random variable with range $A = X(S)$. The probability function (p.f.) of $X$ is the function
$f(x) = P(X = x)$, for $x \in A$.
The set of pairs $\{(x, f(x)) : x \in A\}$ is called the probability distribution of $X$.
A probability function has two important properties:
- $f(x) \ge 0$ for all $x \in A$
- $\sum_{x \in A} f(x) = 1$
Example
A random variable $X$ has range $\{1, 2, 3\}$ with, say, $f(x) = kx$; what value of $k$ makes $f$ a probability function?
From our rules, $\sum_{x=1}^{3} f(x) = k(1 + 2 + 3) = 6k = 1$, so $k = \frac{1}{6}$.
Definition
The Cumulative Distribution Function (CDF) of a random variable $X$ is
$F(x) = P(X \le x)$, defined for all $x \in \mathbb{R}$.
We use the shorthand $X \sim F$ if $X$ has CDF $F$. Here,
$P(X \le x) = P(A_x)$, where $A_x$ is the event that contains all outcomes $\omega$ with $X(\omega) \le x$.
In general, for any $a < b$,
$P(a < X \le b) = F(b) - F(a)$.
The CDF satisfies:
- $0 \le F(x) \le 1$ for all $x$
- $F(x_1) \le F(x_2)$ for $x_1 \le x_2$ (and we say $F$ is a non-decreasing function of $x$)
- $\lim_{x \to -\infty} F(x) = 0$, and $\lim_{x \to \infty} F(x) = 1$
- $\lim_{t \to x^+} F(t) = F(x)$ (right continuous)
If $X$ takes values on the integers, we can get the probability function from the CDF:
$f(x) = F(x) - F(x - 1)$.
In general, we have $f(x) = F(x) - \lim_{t \to x^-} F(t)$.
Note: we often use $f$ for the probability function and $F$ for the CDF.
Definition
Two random variables $X$ and $Y$ are said to have the same distribution if $F_X(x) = F_Y(x)$ for all $x$. We denote this by $X \stackrel{d}{=} Y$.
Note: $X$ and $Y$ having the same distribution does not mean $X = Y$.
Discrete Uniform Distribution
Setup
Suppose the range of $X$ is $\{a, a+1, \dots, b\}$ where $a$ and $b$ are integers, and suppose all values are equally probable. Then $X$ has a Discrete Uniform Distribution on the set $\{a, a+1, \dots, b\}$. The values $a$ and $b$ are called the parameters of the distribution.
Illustrations
If $X$ is the number obtained when a die is rolled, then $X$ has a Discrete Uniform distribution with $a = 1$ and $b = 6$.
Definition
Probability Function
There are $b - a + 1$ values in the set, so the probability of each value must be $\frac{1}{b-a+1}$ in order for $\sum_x f(x) = 1$. Therefore
$f(x) = \frac{1}{b - a + 1}$ for $x = a, a+1, \dots, b$ (and $0$ otherwise).
Example
Suppose a fair die is thrown once and let $X$ be the number on the face. Find the cumulative distribution function of $X$.
Solution
This is an example of a Discrete Uniform distribution on the set $\{1, 2, \dots, 6\}$, having $a = 1$, $b = 6$, and probability function $f(x) = \frac{1}{6}$ for $x = 1, \dots, 6$.
The cumulative distribution function is
$F(x) = \begin{cases} 0 & x < 1 \\ \lfloor x \rfloor / 6 & 1 \le x < 6 \\ 1 & x \ge 6 \end{cases}$
where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$.
Example
Let $X$ be the largest number obtained when a die is rolled 3 times. First find the cumulative distribution function, and then find the probability function of $X$.
Solution
This is another example of a distribution constructed from the Discrete Uniform.
The probability that the largest number is less than or equal to $x$ is the probability that all three rolls are at most $x$:
$P(X \le x) = \left(\frac{x}{6}\right)^3$
for $x = 1, 2, \dots, 6$. Here is the CDF for all real values of $x$:
$F(x) = \begin{cases} 0 & x < 1 \\ (\lfloor x \rfloor / 6)^3 & 1 \le x < 6 \\ 1 & x \ge 6 \end{cases}$
To find the p.f. we may use the fact that $X$ takes integer values, so that
$f(x) = F(x) - F(x - 1) = \frac{x^3 - (x-1)^3}{216}$ for $x = 1, 2, \dots, 6$.
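A quick check of the derived p.f. against simulation:

```python
import random

# Exact p.f.: f(x) = (x^3 - (x-1)^3) / 216
exact = {x: (x**3 - (x - 1)**3) / 216 for x in range(1, 7)}

# Empirical frequencies of the largest of 3 rolls
trials = 200_000
counts = {x: 0 for x in range(1, 7)}
for _ in range(trials):
    counts[max(random.randint(1, 6) for _ in range(3))] += 1

for x in range(1, 7):
    print(x, exact[x], counts[x] / trials)  # the two columns should agree
```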
Hypergeometric Distribution
Setup
Consider a population of $N$ objects, of which $r$ are considered "successes" (S) and the remaining $N - r$ are considered "failures" (F).
Suppose that a subset of size $n$ is chosen at random from the population without replacement.
We say that the random variable $X$ has a hypergeometric distribution if $X$ denotes the number of successes in the subset (shorthand: $X \sim \mathrm{Hyp}(N, r, n)$).
- $N$: number of objects in the population
- $r$: number of successes
- $n$: number of draws
Illustrations
Drawing $n$ balls without replacement from a bag with $r$ blue balls and $b$ red balls. Let $X$ denote the number of blue balls drawn. Then $X \sim \mathrm{Hyp}(r + b,\, r,\, n)$.
Drawing 5 cards from a deck of 52 cards. Let $X$ denote the number of aces. Then $X \sim \mathrm{Hyp}(52, 4, 5)$.
Definition
Probability Function
Suppose $X \sim \mathrm{Hyp}(N, r, n)$. Then
$f(x) = P(X = x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$ for $\max\big(0,\, n - (N - r)\big) \le x \le \min(n, r)$.
- $x \le n$: the number of successes drawn cannot exceed the number drawn
- $x \le r$: we have at most $r$ successes available
- $x \ge 0$, obviously
- $x \ge n - (N - r)$: if $n$ exceeds the number of failures $N - r$, we must have at least $n - (N - r)$ successes
We can verify that the probability function of the hypergeometric distribution sums to $1$ using the hypergeometric identity: $\sum_x \binom{r}{x}\binom{N-r}{n-x} = \binom{N}{n}$.
Example
Consider drawing a 5-card hand at random from a standard 52-card deck. What is the probability that the hand contains at least two K's?
Let $X$ = the number of K's in the hand. Success type: K's ($r = 4$); failure type: non-K cards ($N - r = 48$). Then $X \sim \mathrm{Hyp}(52, 4, 5)$ and
$P(X \ge 2) = 1 - f(0) - f(1) = 1 - \frac{\binom{4}{0}\binom{48}{5}}{\binom{52}{5}} - \frac{\binom{4}{1}\binom{48}{4}}{\binom{52}{5}} \approx 0.0417$.
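The same computation with `math.comb`, a minimal sketch:

```python
from math import comb

def hyp_pf(N, r, n, x):
    """Hypergeometric probability P(X = x)."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# 5-card hand, X = number of kings (N = 52, r = 4, n = 5)
p_at_least_two = 1 - hyp_pf(52, 4, 5, 0) - hyp_pf(52, 4, 5, 1)
print(p_at_least_two)  # ≈ 0.0417
```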
Binomial Distribution
Definition
A Bernoulli trial with probability of success $p$ is an experiment that results in either a success or a failure, where the probability of success is $p$.
Setup
Consider an experiment in which $n$ Bernoulli trials are independently performed, each with probability of success $p$. Let $X$ denote the number of successes from the $n$ trials; we write $X \sim \mathrm{Bin}(n, p)$.
Illustrations
Flip a coin independently $n$ times and let $X$ denote the number of heads observed. Then $X \sim \mathrm{Bin}(n, \frac{1}{2})$.
Drawing $n$ balls with replacement from a bag with $r$ blue balls and $b$ red balls. Let $X$ denote the number of blue balls drawn. Then $X \sim \mathrm{Bin}(n, \frac{r}{r+b})$.
Assumptions:
- Trials are independent
- The probability of success, $p$, is the same in each Bernoulli trial
Definition
Probability Function
If $X \sim \mathrm{Bin}(n, p)$, then
$f(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \dots, n$.
Proof that $\sum_{x=0}^{n} f(x) = 1$: by the Binomial Theorem,
$\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = \big(p + (1-p)\big)^n = 1$.
Binomial approximation to the hypergeometric: suppose $N$ is very large and we keep the proportion of successes $p = \frac{r}{N}$ fixed, and we choose a relatively small $n$ without replacement from the $N$ objects.
Let $X \sim \mathrm{Hyp}(N, r, n)$ and $p = \frac{r}{N}$. Then
$f(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} \approx \binom{n}{x} p^x (1-p)^{n-x}$.
The approximation is good if $r$ and $N - r$ are large compared to $n$.
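A sketch comparing the two probability functions for a large population (the numbers are chosen arbitrarily):

```python
from math import comb

N, r, n = 10_000, 3_000, 10        # large population, p = r/N = 0.3
p = r / N
for x in range(n + 1):
    hyp = comb(r, x) * comb(N - r, n - x) / comb(N, n)
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    print(x, round(hyp, 5), round(binom, 5))  # nearly identical columns
```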
Negative Binomial Distribution
Setup
Consider an experiment in which Bernoulli trials are independently performed, each with probability of success $p$, until exactly $k$ successes are observed. Then if $X$ denotes the number of failures before observing the $k$-th success, we say that $X$ is Negative Binomial with parameters $k$ and $p$: $X \sim \mathrm{NB}(k, p)$.
Let $T$ be the number of trials until exactly $k$ successes are observed. We have $T = X + k$.
Illustrations
Flip a coin until $k$ heads are observed, and let $X$ denote the number of tails observed. Then $X \sim \mathrm{NB}(k, \frac{1}{2})$.
Definition
Probability Function
If $X \sim \mathrm{NB}(k, p)$, then
$f(x) = P(X = x) = \binom{x + k - 1}{x} p^k (1-p)^x$ for $x = 0, 1, 2, \dots$
(The last trial must be the $k$-th success; there are $\binom{x+k-1}{x}$ orderings of the $x$ failures and the first $k - 1$ successes.)
Proof it is valid: using $\binom{x+k-1}{x} = \binom{-k}{x}(-1)^x$ and the Binomial Theorem for negative exponents,
$\sum_{x=0}^{\infty} \binom{x+k-1}{x} p^k (1-p)^x = p^k \big(1 - (1-p)\big)^{-k} = p^k p^{-k} = 1$.
Geometric Distribution
Setup
The Geometric distribution is a special case of the Negative Binomial where we stop after the first success ($k = 1$). We write $X \sim \mathrm{Geo}(p)$, where $X$ is the number of failures before the first success.
Definition
Probability Function
$f(x) = P(X = x) = p(1-p)^x$ for $x = 0, 1, 2, \dots$
Geometric Distribution: CDF. For an integer $x \ge 0$, the CDF of $X \sim \mathrm{Geo}(p)$ is
$F(x) = \sum_{t=0}^{x} p(1-p)^t = p \cdot \frac{1 - (1-p)^{x+1}}{1 - (1-p)} = 1 - (1-p)^{x+1}$.
So $F(x) = 1 - (1-p)^{\lfloor x \rfloor + 1}$ for $x \ge 0$, and $0$ otherwise.
Example
The number of times I have to roll two dice before I get snake eyes is $\mathrm{Geo}(\frac{1}{36})$.
Poisson Distribution (from Binomial)
As $n \to \infty$ and $p \to 0$ with $\mu = np$ held fixed, the binomial probability function approaches
$f(x) = \frac{\mu^x e^{-\mu}}{x!}$
for $x = 0, 1, 2, \dots$
Setup
Let $X \sim \mathrm{Bin}(n, p)$. Then if $n$ is large and $p$ is close to zero,
$P(X = x) \approx \frac{(np)^x e^{-np}}{x!}$.
Definition
Probability Function
A random variable $X$ has a Poisson distribution with parameter $\mu$ ($X \sim \mathrm{Poi}(\mu)$, $\mu > 0$) if
$f(x) = P(X = x) = \frac{\mu^x e^{-\mu}}{x!}$ for $x = 0, 1, 2, \dots$
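A sketch of the limit in action: $\mathrm{Bin}(n, p)$ with large $n$ and small $p$ versus $\mathrm{Poi}(\mu = np)$ (parameter values are arbitrary):

```python
from math import comb, exp, factorial

n, p = 1000, 0.002                  # large n, small p
mu = n * p                          # 2.0
for x in range(8):
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    poisson = mu**x * exp(-mu) / factorial(x)
    print(x, round(binom, 5), round(poisson, 5))  # very close
```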
Poisson Distribution from Poisson Process
A process satisfying the following conditions is called a Poisson process with rate $\lambda > 0$.
- Independence: the numbers of occurrences in non-overlapping intervals are independent. For $t_1 < t_2 \le t_3 < t_4$, the counts in $(t_1, t_2]$ and $(t_3, t_4]$ are independent.
- Individuality or Singularity: events occur singly, not in clusters. $P(\text{2 or more events in } (t, t + \Delta t]) = o(\Delta t)$ as $\Delta t \to 0$.
- Homogeneity or Uniformity: events occur at a uniform rate of $\lambda$ (in events per unit of time). $P(\text{one event in } (t, t + \Delta t]) = \lambda \Delta t + o(\Delta t)$ as $\Delta t \to 0$.
Little o
A function $g$ is $o(\Delta t)$ as $\Delta t \to 0$ if
$\lim_{\Delta t \to 0} \frac{g(\Delta t)}{\Delta t} = 0$.
Poisson Distribution
If $X_t$ is the number of events in $(0, t]$ for a Poisson counting process with a rate of $\lambda$ per unit time, then $X_t \sim \mathrm{Poi}(\lambda t)$,
and
$P(X_t = x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}$ for $x = 0, 1, 2, \dots$
Examples
Suppose that trucks arrive at a receiving dock with an average arrival rate of $\lambda$ per hour. What is the probability that exactly $x$ trucks will arrive in a two-hour period?
We have $\mu = 2\lambda$, so $P(X = x) = \frac{(2\lambda)^x e^{-2\lambda}}{x!}$.
Combining Models
Website hits for a given website occur according to a Poisson process with a rate of $\lambda$ hits per minute. A second is a break if there are no hits in that second.
What is the probability of a break in any given second?
The number of hits in one second is $\mathrm{Poi}(\mu)$ with $\mu = \lambda/60$, so $P(\text{break}) = e^{-\lambda/60} =: \pi$.
Compute the probability of observing exactly $k$ breaks in $n$ consecutive seconds.
Distinct seconds are non-overlapping intervals, so breaks occur independently, each with probability $\pi$, and the number of breaks is $\mathrm{Bin}(n, \pi)$:
$\binom{n}{k} \pi^k (1 - \pi)^{n-k}$, for $k = 0, 1, \dots, n$.
Compute the probability that one must wait $x$ seconds to get $k$ breaks.
Let $T$ be the number of seconds we wait for $k$ breaks. We want $P(T = x)$.
Negative Binomial Distribution: the $x$-th second must be the $k$-th break, so
$P(T = x) = \binom{x-1}{k-1} \pi^k (1 - \pi)^{x-k}$ for $x = k, k+1, \dots$
Expected Value and Variance
Summarizing Data on Random Variables
Median: a value such that half the results are below it and half above it, when the results are arranged in numerical order.
Mode: the value which occurs most often.
Mean: the mean of $n$ outcomes $x_1, \dots, x_n$ for a random variable is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
Expectation of a Random Variable
Definition
Let $X$ be a discrete random variable with range $A$ and probability function $f(x)$. The expected value of $X$ is given by
$E(X) = \mu = \sum_{x \in A} x f(x)$.
Suppose $X$ denotes the outcome of one fair six-sided die roll. Compute $E(X)$.
We know that $f(x)$ is $\frac{1}{6}$ for $x = 1, \dots, 6$.
And so, $E(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5$.
Possible Range
Suppose $X$ is a random variable satisfying $a \le x \le b$ for all possible values $x$ of $X$. We have
$a \le E(X) \le b$.
Expectation of $g(X)$
Let $X$ be a discrete random variable with range $A$ and probability function $f(x)$. The expected value of some function $g$ of $X$ is given by
$E[g(X)] = \sum_{x \in A} g(x) f(x)$.
Proof: let $Y = g(X)$, which has range $g(A)$. Then
$E(Y) = \sum_{y \in g(A)} y\, P(Y = y) = \sum_{y \in g(A)} y \sum_{x : g(x) = y} f(x) = \sum_{x \in A} g(x) f(x)$.
Linearity Properties of Expectation
For constants $a$ and $b$,
$E(aX + b) = aE(X) + b$.
Proof: $\sum_x (ax + b) f(x) = a \sum_x x f(x) + b \sum_x f(x) = aE(X) + b$.
Thus it is also easy to show
$E[a g_1(X) + b g_2(X)] = a E[g_1(X)] + b E[g_2(X)]$.
The expectation of a sum is the sum of the expectations.
Means and Variances of Distributions
Expected value of a Binomial random variable
Let $X \sim \mathrm{Bin}(n, p)$. Find $E(X)$.
Using the identity $x\binom{n}{x} = n\binom{n-1}{x-1}$ and the Binomial Theorem,
$E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} (1-p)^{(n-1)-(x-1)} = np$.
Example
If we toss a coin $n$ times, with probability $p$ of a head, we'd expect $np$ heads.
Hypergeometric Distribution
If $X \sim \mathrm{Hyp}(N, r, n)$, then $E(X) = \frac{nr}{N}$.
Geometric Distribution
If $X \sim \mathrm{Geo}(p)$, then $E(X) = \frac{1-p}{p}$.
Negative Binomial
If $X \sim \mathrm{NB}(k, p)$, then $E(X) = \frac{k(1-p)}{p}$.
Poisson Distribution
If $X \sim \mathrm{Poi}(\mu)$, then $E(X) = \mu$.
Variances of Distributions
Using the expected value is one way of predicting the value of a random variable. But we might want to know “how likely will observed data be exactly (or close to) the expected value?”
One may wonder how much a random variable tends to deviate from its mean. Suppose $E(X) = \mu$.
Expected deviation: $E(X - \mu) = E(X) - \mu = 0$ (always zero, so not useful).
Expected absolute deviation: $E|X - \mu|$ (meaningful, but hard to work with).
Expected squared deviation: $E[(X - \mu)^2]$ (this is the variance).
Definition
Variance
The variance of a random variable $X$ is denoted $\mathrm{Var}(X)$ (or $\sigma^2$), and is defined by
$\mathrm{Var}(X) = E[(X - \mu)^2]$,
or sometimes, via the shortcut,
$\mathrm{Var}(X) = E(X^2) - \mu^2$.
We derive this by
$E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$.
Suppose $X$ satisfies $P(X = c) = 1$ for a constant $c$. What is $\mathrm{Var}(X)$? Since $\mu = c$, $\mathrm{Var}(X) = E[(X - c)^2] = 0$.
For all random variables $X$, $\mathrm{Var}(X) \ge 0$, and $\mathrm{Var}(X) = 0$ if and only if $P(X = \mu) = 1$.
Definition
Standard Deviation
The standard deviation of a random variable $X$ is denoted $\sigma$ (or $\mathrm{SD}(X)$), and defined by
$\sigma = \sqrt{\mathrm{Var}(X)}$.
Variance of a Linear Transformation
If $a$ and $b$ are constants, and $Y = aX + b$, then
$\mathrm{Var}(Y) = a^2\,\mathrm{Var}(X)$;
the additive constant $b$ does not affect anything.
Variances of distributions: $\mathrm{Bin}(n, p)$: $np(1-p)$; $\mathrm{Hyp}(N, r, n)$: $n\frac{r}{N}\left(1 - \frac{r}{N}\right)\frac{N-n}{N-1}$; $\mathrm{Geo}(p)$: $\frac{1-p}{p^2}$; $\mathrm{NB}(k, p)$: $\frac{k(1-p)}{p^2}$; $\mathrm{Poi}(\mu)$: $\mu$.
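A minimal sketch checking the variance shortcut and the linear-transformation rule by direct enumeration over a fair die's probability function:

```python
f = {x: 1/6 for x in range(1, 7)}          # p.f. of a fair die

EX  = sum(x * p for x, p in f.items())      # 3.5
EX2 = sum(x**2 * p for x, p in f.items())   # 91/6
var = EX2 - EX**2                           # Var(X) = 35/12 ≈ 2.9167
print(EX, var)

a, b = 2, 5                                 # arbitrary constants, Y = aX + b
EY  = sum((a*x + b) * p for x, p in f.items())
EY2 = sum((a*x + b)**2 * p for x, p in f.items())
print(EY2 - EY**2, a**2 * var)              # equal: Var(Y) = a^2 Var(X)
```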
Continuous Random Variables
General Terminology and Notation
Definition
A random variable $X$ is said to be continuous if its range is an interval, e.g. $[a, b]$ or $(-\infty, \infty)$: $X$ is continuous if it can take any value between $a$ and $b$.
We can't use $f(x) = P(X = x)$ with continuous distributions because $P(X = x) = 0$ for every single value $x$.
Suppose $X$ is a continuous random variable (crv) with range $[a, b]$. We now use integrals instead of sums.
Probability Density Function
We say that a continuous random variable $X$ has probability density function (pdf) $f$ if
$P(c \le X \le d) = \int_{c}^{d} f(x)\,dx$ for all $c \le d$, where $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
For a spinner that stops at a uniformly random point on a dial labelled $(0, 4]$, if $X$ is where the spinner stops, we can define our pdf as
$f(x) = \frac{1}{4}$ for $0 < x \le 4$, and $0$ otherwise.
We can see that this satisfies $f(x) \ge 0$ and $\int_{0}^{4} \frac{1}{4}\,dx = 1$.
Definition
The support of a continuous random variable $X$ with pdf $f$ is defined as $\{x : f(x) > 0\}$.
Example
Suppose that $X$ is a continuous random variable with probability density function of the form, say,
$f(x) = kx^2$ for $0 \le x \le 1$ (and $0$ otherwise).
Compute $k$ so that this is a valid pdf: $\int_{0}^{1} kx^2\,dx = \frac{k}{3} = 1$, so $k = 3$.
Compute $P\!\left(0 \le X \le \frac{1}{2}\right) = \int_{0}^{1/2} 3x^2\,dx = \frac{1}{8}$.
Note: since single points carry zero probability, it does not matter whether we use $\le$ or $<$ in such computations.
Definition
The Cumulative Distribution Function of a random variable $X$ is
$F(x) = P(X \le x)$.
If $X$ is continuous with pdf $f$, then
$F(x) = \int_{-\infty}^{x} f(t)\,dt$.
By the fundamental theorem of calculus, $f(x) = F'(x)$ wherever the derivative exists.
Properties:
- $F(x)$ is defined for all real $x$
- $F(x)$ is a non-decreasing function of $x$, for all real $x$
- $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
- $P(a < X \le b) = F(b) - F(a)$
- Since $P(X = x) = 0$, we have $P(a < X \le b) = P(a \le X \le b) = P(a < X < b) = F(b) - F(a)$
Spinner example:
$F(x) = \begin{cases} 0 & x \le 0 \\ x/4 & 0 < x \le 4 \\ 1 & x > 4 \end{cases}$
Thus, for instance, $P(1 < X \le 3) = F(3) - F(1) = \frac{3}{4} - \frac{1}{4} = \frac{1}{2}$.
We can then get the pdf by differentiating: $f(x) = F'(x) = \frac{1}{4}$ for $0 < x \le 4$.
General approach to find $F$ from $f$:
- Treat each piece of $f$ separately
- Note $F(x) = 0$ for $x$ below the minimum value in the support of $X$
- Note $F(x) = 1$ for $x$ above the maximum value in the support of $X$
- Find $F(x) = \int_{-\infty}^{x} f(t)\,dt$ on each piece
General approach to find $f$ from $F$:
- Treat each piece of $F$ separately
- Find $f(x) = F'(x)$ on each piece
Note: the value of $f$ at the boundary points between pieces does not matter, since changing $f$ at finitely many points does not change any integral of $f$.
Quantile
Suppose $X$ is a continuous random variable with CDF $F$. The $p$-th quantile of $X$ (for $0 < p < 1$) is the value $q_p$ such that
$F(q_p) = P(X \le q_p) = p$.
If $p = 0.5$, then $q_{0.5}$ is the median of $X$. We can find a given quantile by solving $F(q) = p$ for $q$.
Change of Variables
What if we want to find the CDF or pdf of a function of $X$? For the spinner example, suppose the winnings are the inverse of the point we spin. Hence, the winnings are $Y = \frac{1}{X}$, and we want to find $F_Y$ and $f_Y$.
Specific examples aside, how can we generalize this approach to $Y = h(X)$?
The process
- Find the range of values of $Y$
- Write the CDF of $Y$ as a function of $X$
- Use $F_X$ to find $F_Y$
- Differentiate if we want the pdf of $Y$: $f_Y(y) = F_Y'(y)$
Spinner example:
Since $X \in (0, 4]$, we know $Y = \frac{1}{X} \in [\frac{1}{4}, \infty)$.
Write the CDF of $Y$ as a function of $X$. Let $y \ge \frac{1}{4}$.
Then use $F_X$ to find $F_Y$:
$F_Y(y) = P(Y \le y) = P\!\left(\frac{1}{X} \le y\right) = P\!\left(X \ge \frac{1}{y}\right) = 1 - F_X\!\left(\frac{1}{y}\right)$, and we know $F_X(x) = \frac{x}{4}$ on $(0, 4]$.
Thus,
$F_Y(y) = 1 - \frac{1}{4y}$ for $y \ge \frac{1}{4}$ (and $0$ otherwise).
Differentiate if we want the pdf of $Y$: we have $F_Y(y) = 1 - \frac{1}{4y}$ for $y \ge \frac{1}{4}$, so the pdf is
$f_Y(y) = \frac{1}{4y^2}$ for $y \ge \frac{1}{4}$, and $0$ otherwise.
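A simulation sketch of the spinner winnings $Y = 1/X$, checking the derived CDF at a few points:

```python
import random

# X ~ Uniform(0, 4]; derived CDF: F_Y(y) = 1 - 1/(4y) for y >= 1/4
trials = 200_000
ys = [1 / (random.uniform(0, 4) or 1e-12) for _ in range(trials)]  # guard x=0

for y in [0.5, 1.0, 2.0, 5.0]:
    empirical = sum(v <= y for v in ys) / trials
    print(y, empirical, 1 - 1 / (4 * y))  # empirical vs. exact
```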
Expectation, Mean, and Variance for Continuous Random Variables
If $X$ is a continuous random variable with pdf $f$, and $g$ is a function, then
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$.
Thus,
$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ and $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$.
Note that the shortcut $\mathrm{Var}(X) = E(X^2) - \mu^2$ still holds.
Example
Spinner example: $E(X) = \int_{0}^{4} \frac{x}{4}\,dx = 2$.
Now solve for $E(X^2) = \int_{0}^{4} \frac{x^2}{4}\,dx = \frac{16}{3}$.
Thus, $\mathrm{Var}(X) = \frac{16}{3} - 2^2 = \frac{4}{3}$.
Continuous Uniform Distribution
Definition
We say that $X$ has a continuous Uniform distribution on $[a, b]$, written $X \sim \mathrm{U}(a, b)$, if $X$ takes values in $[a, b]$ (or $(a, b)$) where all subintervals of a fixed length have the same probability.
$X \sim \mathrm{U}(a, b)$ has pdf
$f(x) = \frac{1}{b - a}$ for $a \le x \le b$ (and $0$ otherwise);
then the CDF is
$F(x) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$
Continuous Uniform: Mean and Variance
$E(X) = \frac{a+b}{2}$ and $\mathrm{Var}(X) = \frac{(b-a)^2}{12}$.
Exponential Distribution
Consider cars arriving according to a Poisson process with rate $\lambda$ per minute. The number of cars arriving in $t$ minutes follows $\mathrm{Poi}(\lambda t)$.
Let $X$ = the time you wait before you see the first car, in minutes. This is a crv.
CDF: $F(x) = P(X \le x) = 1 - P(X > x) = 1 - P(N_x = 0)$,
where $N_x \sim \mathrm{Poi}(\lambda x)$ is the number of events in $(0, x]$.
Thus, $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$ (and $0$ otherwise).
Taking the derivative gives the pdf: $f(x) = \lambda e^{-\lambda x}$ for $x > 0$.
Exponential Distribution
Let $\theta = \frac{1}{\lambda}$ provide an alternate parameterization of the exponential distribution.
Definition
We say that $X$ has an exponential distribution with parameter $\theta > 0$, written $X \sim \mathrm{Exp}(\theta)$, if the density of $X$ is
$f(x) = \frac{1}{\theta} e^{-x/\theta}$ for $x > 0$ (and $0$ otherwise).
The CDF becomes $F(x) = 1 - e^{-x/\theta}$ for $x \ge 0$.
In general, for any Poisson process with rate $\lambda$, the time between events will follow an exponential distribution with $\theta = \frac{1}{\lambda}$.
Now we want to compute $E(X)$ and $\mathrm{Var}(X)$. We will need to solve
$E(X) = \int_{0}^{\infty} x \cdot \frac{1}{\theta} e^{-x/\theta}\,dx$
and
$E(X^2) = \int_{0}^{\infty} x^2 \cdot \frac{1}{\theta} e^{-x/\theta}\,dx$.
We can use the Gamma function.
Definition
The Gamma function, $\Gamma(\alpha)$, is defined for all $\alpha > 0$ as
$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x}\,dx$.
Note
- $\Gamma(1) = 1$
- If $\alpha > 1$, then $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$
The Gamma function tells us that if $n$ is a positive integer, then $\Gamma(n) = (n - 1)!$.
If we write $y = \frac{x}{\theta}$, then $dx = \theta\,dy$ and
$E(X) = \int_{0}^{\infty} x\,\frac{1}{\theta} e^{-x/\theta}\,dx = \theta \int_{0}^{\infty} y\, e^{-y}\,dy = \theta\,\Gamma(2) = \theta$,
because $\Gamma(2) = 1! = 1$.
We can use the same trick with the variance to see that
$E(X^2) = \theta^2\,\Gamma(3) = 2\theta^2$, so $\mathrm{Var}(X) = E(X^2) - \theta^2 = \theta^2$.
Memoryless Property
For $X \sim \mathrm{Exp}(\theta)$ and $s, t > 0$:
$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-(s+t)/\theta}}{e^{-s/\theta}} = e^{-t/\theta} = P(X > t)$.
Computer Generated Random Numbers
Transform simulated observations $U \sim \mathrm{U}(0, 1)$ to obtain observations from $X$ with CDF $F$: find a function $h$ such that $X = h(U) \sim F$.
Assuming $F$ is continuous and strictly increasing, taking $h = F^{-1}$ works, since
$P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$.
Example
Consider the CDF of the $\mathrm{Geo}(p)$ distribution,
$F(x) = 1 - (1-p)^{\lfloor x \rfloor + 1}$ for $x \ge 0$.
Find a transformation $h$ so that $h(U) \sim F$ if $U \sim \mathrm{U}(0, 1)$.
This CDF is neither continuous nor strictly increasing, so we have to use the generalized inverse $F^{-1}(u) = \min\{x : F(x) \ge u\}$. For a given $u$:
$1 - (1-p)^{x+1} \ge u \iff (x + 1)\ln(1-p) \le \ln(1-u) \iff x \ge \frac{\ln(1-u)}{\ln(1-p)} - 1$.
The smallest integer $x$ that satisfies the inequality above is
$x = \left\lceil \frac{\ln(1-u)}{\ln(1-p)} - 1 \right\rceil$.
Thus $h(U) = \left\lceil \frac{\ln(1-U)}{\ln(1-p)} - 1 \right\rceil \sim \mathrm{Geo}(p)$.
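Inverse-transform sampling is easy to implement; a sketch for the continuous case, $\mathrm{Exp}(\theta)$ with $F^{-1}(u) = -\theta\ln(1-u)$, and for the geometric transformation derived above:

```python
import math
import random

def exp_sample(theta):
    """X = F^{-1}(U) = -theta * ln(1 - U) ~ Exp(theta)."""
    return -theta * math.log(1 - random.random())

def geo_sample(p):
    """Generalized inverse: smallest integer x with F(x) >= u."""
    u = random.random()
    return max(0, math.ceil(math.log(1 - u) / math.log(1 - p) - 1))

theta, p = 2.0, 0.3
xs = [exp_sample(theta) for _ in range(100_000)]
print(sum(xs) / len(xs))            # ≈ theta = 2
gs = [geo_sample(p) for _ in range(100_000)]
print(sum(gs) / len(gs))            # ≈ (1-p)/p ≈ 2.33
```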
Normal Distribution
Characteristics: symmetric about a mean value, more concentrated around the mean than in the tails (and unimodal).
Definition
$X$ is said to have a normal distribution (or Gaussian distribution) with mean $\mu \in \mathbb{R}$ and variance $\sigma^2 > 0$ if the density of $X$ is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, for $x \in \mathbb{R}$.
We write $X \sim N(\mu, \sigma^2)$, or $X \sim \mathcal{N}(\mu, \sigma^2)$, or $X \sim G(\mu, \sigma)$.
Properties:
- Symmetric about its mean: if $X \sim N(\mu, \sigma^2)$, then $f(\mu + x) = f(\mu - x)$
- Density of $X$ is unimodal: the peak is at $x = \mu$. The mode, median, and mean are all the same, $\mu$
- Mean and variance are the parameters: $E(X) = \mu$, and $\mathrm{Var}(X) = \sigma^2$
A classic problem: if $X \sim N(\mu, \sigma^2)$, then what is the value of
$P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$?
This integral has no closed form.
Definition
We say that $Z$ is a standard normal random variable if $Z \sim N(0, 1)$. The probability density function is:
$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$.
The CDF is
$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(t)\,dt$.
Standard Normal Tables (Z-Tables): values of the function $\Phi(z)$ are tabulated in Standard Normal Tables (typically for $z \ge 0$).
We use symmetry: $\Phi(-z) = 1 - \Phi(z)$.
Standardization
Theorem
If $X \sim N(\mu, \sigma^2)$, then defining
$Z = \frac{X - \mu}{\sigma}$,
we have $Z \sim N(0, 1)$.
An important consequence of this:
$P(X \le x) = \Phi(z)$, where $z = \frac{x - \mu}{\sigma}$.
The process to find $P(X \le x)$ when $X \sim N(\mu, \sigma^2)$:
- Compute $z = \frac{x - \mu}{\sigma}$
- Use standard normal tables to find $\Phi(z)$
- This equals $P(X \le x)$,
where $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
Let $X \sim N(\mu, \sigma^2)$. Find $c$ such that $P(X \le c) = p$; this is the $100p$-th percentile of $X$. Standardizing gives $\Phi\!\left(\frac{c - \mu}{\sigma}\right) = p$,
so $c = \mu + \sigma\,\Phi^{-1}(p)$.
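$\Phi$ can also be evaluated without tables via the error function, $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$; a minimal sketch (the parameters are arbitrary):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 10, 2                      # arbitrary example parameters
x = 13
print(Phi((x - mu) / sigma))           # P(X <= 13) = Phi(1.5) ≈ 0.9332
print(Phi(-1.5), 1 - Phi(1.5))         # symmetry: Phi(-z) = 1 - Phi(z)
```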
Multivariate Distributions
Basic Terminology and Techniques
So far we’ve only considered univariate distributions.
Suppose that $X$ and $Y$ are discrete random variables defined on the same sample space. The joint probability function of $X$ and $Y$ is
$f(x, y) = P(X = x \text{ and } Y = y)$;
a shorthand is $f(x, y) = P(X = x, Y = y)$.
For a collection of discrete random variables $X_1, \dots, X_n$, the joint probability function is defined as
$f(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n)$.
For example, if we rolled three dice, and let $X_i$ denote the result on the $i$-th die, we'd have
$f(x_1, x_2, x_3) = \frac{1}{6^3} = \frac{1}{216}$ for each $(x_1, x_2, x_3) \in \{1, \dots, 6\}^3$.
Properties: $f(x, y) \ge 0$ and $\sum_{x}\sum_{y} f(x, y) = 1$.
Computing probability from the joint probability function:
let $A$ be a subset of the values $(x, y)$ that $(X, Y)$ could take. Then
$P\big((X, Y) \in A\big) = \sum_{(x, y) \in A} f(x, y)$.
Definition
Suppose that $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$. The marginal probability function of $X$ is
$f_X(x) = \sum_{y} f(x, y)$.
Similarly, the marginal probability function of $Y$ is
$f_Y(y) = \sum_{x} f(x, y)$.
Definition
Suppose that $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$ and marginal probability functions $f_X(x)$ and $f_Y(y)$. $X$ and $Y$ are said to be independent random variables if and only if
$f(x, y) = f_X(x)\, f_Y(y)$ for all $(x, y)$.
This is the same as saying
$P(X = x, Y = y) = P(X = x)\,P(Y = y)$ for all $(x, y)$.
Rolling two 6-sided dice with $X$ the outcome on the first die and $Y$ the outcome on the second die:
$f(x, y) = \frac{1}{36}$ for all $(x, y)$, but we also know
$f_X(x) = \frac{1}{6}$ and $f_Y(y) = \frac{1}{6}$,
and so $f(x, y) = f_X(x) f_Y(y)$ and we have independence.
This abstracts to collections of size $n$: $X_1, \dots, X_n$ are independent if and only if $f(x_1, \dots, x_n) = \prod_{i=1}^{n} f_i(x_i)$.
Definition
The conditional probability function of $X$ given $Y = y$ is denoted $f(x \mid y)$, and is defined to be
$f(x \mid y) = \frac{f(x, y)}{f_Y(y)}$, provided $f_Y(y) > 0$.
What about functions of random variables?
Suppose that $T = g(X, Y)$.
For jointly distributed random variables $X$ and $Y$, $T = g(X, Y)$ is a random variable. If $X$ and $Y$ have joint p.f. $f(x, y)$, then the probability function of $T$ is given by
$f_T(t) = P(T = t) = \sum_{(x, y):\, g(x, y) = t} f(x, y)$.
- If $X \sim \mathrm{Poi}(\mu_1)$ and $Y \sim \mathrm{Poi}(\mu_2)$ independently, then $X + Y \sim \mathrm{Poi}(\mu_1 + \mu_2)$
- If $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(m, p)$ independently, then $X + Y \sim \mathrm{Bin}(n + m, p)$
Multinomial Distribution
Definition
Multinomial Distribution: Consider an experiment in which:
Individual trials have $k$ possible outcomes, and the probabilities of the individual outcomes are denoted $p_1, \dots, p_k$, so that $p_1 + p_2 + \cdots + p_k = 1$.
Trials are independently repeated $n$ times, with $X_i$ denoting the number of times outcome $i$ occurred, so that $X_1 + X_2 + \cdots + X_k = n$.
If $X_1, \dots, X_k$ have a Multinomial distribution with parameters $n$ and $p_1, \dots, p_k$, then their joint probability function is
$f(x_1, \dots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$,
where the $x_i$ are non-negative integers that satisfy $x_1 + x_2 + \cdots + x_k = n$.
The terms
$\frac{n!}{x_1!\,x_2!\cdots x_k!}$
are called multinomial coefficients.
If $X_1, \dots, X_k$ have a joint Multinomial distribution with parameters $n$ and $p_1, \dots, p_k$, then marginally
$X_i \sim \mathrm{Bin}(n, p_i)$.
Also, $X_i + X_j \sim \mathrm{Bin}(n, p_i + p_j)$ for $i \neq j$.
Expectation for Multivariate Distributions: Covariance and Correlation
Recall, $E[g(X)] = \sum_{x} g(x) f(x)$.
Definition
Suppose $X$ and $Y$ are jointly distributed random variables with joint probability function $f(x, y)$. Then for a function $g$,
$E[g(X, Y)] = \sum_{x, y} g(x, y)\, f(x, y)$.
More generally, if $X_1, \dots, X_n$ have joint p.f. $f(x_1, \dots, x_n)$, then
$E[g(X_1, \dots, X_n)] = \sum_{x_1, \dots, x_n} g(x_1, \dots, x_n)\, f(x_1, \dots, x_n)$.
Linear Properties of Expectation
$E(aX + bY) = aE(X) + bE(Y)$, and in general,
$E\!\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i E(X_i)$.
Independence gives a useful, but simplistic, way of describing relationships between variables; covariance and correlation quantify the strength of a relationship.
Definition
If $X$ and $Y$ are jointly distributed, then $\mathrm{Cov}(X, Y)$ denotes the covariance between $X$ and $Y$. It is defined by
$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$,
or the shortcut,
$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$.
Properties of Covariance
- Positive covariance: $Y$ tends to increase as $X$ increases
- Negative covariance: $Y$ tends to decrease as $X$ increases
- The larger the absolute value of $\mathrm{Cov}(X, Y)$, the stronger the relationship is
- $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
- $\mathrm{Cov}(X, c) = 0$ for any constant $c$
- $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
Theorem
If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$.
Proof
$X$ and $Y$ independent $\implies E(XY) = \sum_{x,y} xy\, f_X(x) f_Y(y) = \Big(\sum_x x f_X(x)\Big)\Big(\sum_y y f_Y(y)\Big) = E(X)E(Y)$, so $\mathrm{Cov}(X, Y) = 0$.
It is important to know that the converse is false. If $\mathrm{Cov}(X, Y) = 0$, then $X$ and $Y$ are not necessarily independent.
$X$ and $Y$ are uncorrelated if $\mathrm{Cov}(X, Y) = 0$, but that does not mean they are independent.
Independence $\implies$ uncorrelated; uncorrelated $\not\implies$ independence.
Definition
The correlation of $X$ and $Y$ is denoted $\rho = \mathrm{Corr}(X, Y)$, and is defined by
$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y}$,
with $-1 \le \rho \le 1$.
Correlation measures the strength of the linear relationship between $X$ and $Y$.
Properties:
- If $\rho$ is close to $1$, then $X$ and $Y$ will have an approximately positive linear relationship
- If $\rho$ is close to $-1$, then $X$ and $Y$ will have an approximately negative linear relationship
- If $\rho = 0$, then $X$ and $Y$ are said to be uncorrelated
We say that $X$ and $Y$ are uncorrelated if $\rho = 0$ (or equivalently $\mathrm{Cov}(X, Y) = 0$).
$X$ and $Y$ independent $\implies$ $X$ and $Y$ uncorrelated.
Once again, the converse is not true.
Mean and Variance of a Linear Combination of Random Variables
Suppose $X_1, \dots, X_n$ are jointly distributed random variables with joint probability function $f(x_1, \dots, x_n)$. A linear combination of the random variables is a random variable of the form
$T = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$,
where $a_1, \dots, a_n$ are constants.
Examples: $X_1 + X_2$ (all $a_i = 1$); the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (all $a_i = \frac{1}{n}$).
Expected Value of a Linear Combination:
$E(T) = \sum_{i=1}^{n} a_i E(X_i)$.
Mean of sample mean: if $E(X_i) = \mu$ for all $i$, then $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$.
Covariance of linear combinations:
$\mathrm{Cov}\!\left(\sum_i a_i X_i,\ \sum_j b_j Y_j\right) = \sum_i \sum_j a_i b_j\, \mathrm{Cov}(X_i, Y_j)$.
Two useful results:
$\mathrm{Cov}(X + Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)$ and $\mathrm{Cov}(aX, bY) = ab\,\mathrm{Cov}(X, Y)$.
Variance of a linear combination:
In the case of two random variables $X$ and $Y$, and constants $a$ and $b$, we have
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$.
If $X$ and $Y$ are independent, then $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$.
In general, let $X_1, \dots, X_n$ be random variables, and write $T = \sum_{i=1}^{n} a_i X_i$; then
$\mathrm{Var}(T) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) + 2\sum_{i < j} a_i a_j\, \mathrm{Cov}(X_i, X_j)$.
If $X_1, \dots, X_n$ are independent, then $\mathrm{Cov}(X_i, X_j) = 0$ (for $i \neq j$), and so
$\mathrm{Var}(T) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i)$.
Variance of sample mean:
in general, we have for independent random variables $X_1, \dots, X_n$ all with variance $\sigma^2$,
$\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}$;
this holds for any set of independent random variables that have the same variance.
Linear Combinations of Independent Normal Random Variables
Linear transformation of a normal random variable:
let $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, where $a \neq 0$ and $b$ are constants; then
$Y \sim N(a\mu + b,\ a^2\sigma^2)$.
The linear transformation of a normal random variable is still normal.
Linear combination of independent normal random variables:
let $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ independently, and let $a$ and $b$ be constants; then
$aX_1 + bX_2 \sim N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$.
A linear combination of independent normal random variables is still normal.
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ be independent random variables. Then
$\sum_{i=1}^{n} X_i \sim N(n\mu,\ n\sigma^2)$
and
$\bar{X} \sim N\!\left(\mu,\ \frac{\sigma^2}{n}\right)$.
The IQs of UWaterloo Math students are normally distributed with mean $\mu$ and variance $\sigma^2$. The probability that the average IQ in a class of $n$ students is between $a$ and $b$ is:
$P(a \le \bar{X} \le b) = \Phi\!\left(\frac{b - \mu}{\sigma/\sqrt{n}}\right) - \Phi\!\left(\frac{a - \mu}{\sigma/\sqrt{n}}\right)$.
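Plugging in illustrative numbers (assumed for the sketch, not from the course): $\mu = 120$, $\sigma^2 = 100$, a class of $n = 25$, and the probability that the average lies between 118 and 122:

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, var, n = 120, 100, 25          # assumed illustrative values
sd_mean = sqrt(var / n)            # sigma/sqrt(n) = 2
a, b = 118, 122
print(Phi((b - mu) / sd_mean) - Phi((a - mu) / sd_mean))  # ≈ 0.683
```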
Indicator Random Variables
If $X \sim \mathrm{Bin}(n, p)$ we can think of $X$ in the following way:
Observe the first trial and see if it succeeds. We set $X_1$ to be $1$ if it succeeds and $0$ if it fails.
Observe the second trial and see if it succeeds. We set $X_2$ to be $1$ if it succeeds and $0$ if it fails.
Observe the $n$-th trial and see if it succeeds. We set $X_n$ to be $1$ if it succeeds and $0$ if it fails.
Sum the successful events to find $X = X_1 + X_2 + \cdots + X_n$:
$X$ is a linear combination of random variables.
Definition
Let $A$ be an event which may possibly result from an experiment. We say that $X_A$ is the indicator random variable of the event $A$. $X_A$ is defined by:
$X_A = 1$ if $A$ occurs, and $X_A = 0$ otherwise. Note $E(X_A) = 1 \cdot P(A) + 0 \cdot P(\bar{A}) = P(A)$.
Covariance of Indicator Random Variables:
Suppose we have two events, $A$ and $B$, and we define $X = X_A$ and $Y = X_B$.
How do we find $\mathrm{Cov}(X, Y)$?
In general,
$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$,
so for indicator variables we have
$\mathrm{Cov}(X_A, X_B) = E(X_A X_B) - P(A)P(B)$.
We know that $E(X_A) = P(A)$ and $E(X_B) = P(B)$, so we just have to find $E(X_A X_B)$. Since $X_A X_B = 1$ exactly when both $A$ and $B$ occur, $E(X_A X_B) = P(A \cap B)$.
If $A$ and $B$ are events, then
$\mathrm{Cov}(X_A, X_B) = P(A \cap B) - P(A)P(B)$.
$X_A$ and $X_B$ are independent $\iff$ $X_A$ and $X_B$ are uncorrelated (this equivalence is special to indicators).
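As an illustration of why indicators are convenient, the binomial mean stated earlier falls out in one line: since each $X_i$ is the indicator of success on trial $i$, $E(X_i) = p$, and linearity gives
$E(X) = E(X_1 + X_2 + \cdots + X_n) = \sum_{i=1}^{n} E(X_i) = np$.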
Central Limit Theorem and Moment Generating Functions
Central Limit Theorem
If $X_1, X_2, \dots, X_n$ are independent random variables from the same distribution, with mean $\mu$ and variance $\sigma^2$, then as $n \to \infty$, the shape of the probability histogram for the random variable $\sum_{i=1}^{n} X_i$ approaches the shape of a normal probability density function.
The cumulative distribution function of the random variable
$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$
approaches the $N(0, 1)$ cumulative distribution function. Similarly, the cumulative distribution function of
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
approaches the $N(0, 1)$ cumulative distribution function.
In other words, if $n$ is large:
$\sum_{i=1}^{n} X_i$ has approximately a $N(n\mu,\ n\sigma^2)$ distribution, and
$\bar{X}$ has approximately a $N(\mu,\ \sigma^2/n)$ distribution.
Example
Roll a 6-sided die $n$ times and record each result. If the die is a fair die, estimate the probability that the sum of the die rolls is less than some value $s$.
Let $X_i$ be the number of dots on the $i$-th roll. We want the probability $P\!\left(\sum_{i=1}^{n} X_i < s\right)$. Without the CLT this is very difficult.
Using the CLT:
It seems reasonable to assume independence; each $X_i$ is a discrete random variable from the same distribution, and $n$ is large.
Mean and variance: $\mu = 3.5$ and $\sigma^2 = \frac{35}{12}$ per roll.
Apply the CLT (standardization): $P\!\left(\sum X_i < s\right) \approx \Phi\!\left(\frac{s - 3.5n}{\sqrt{35n/12}}\right)$.
Use the Z-table to evaluate $\Phi$.
Note:
- The CLT doesn't hold if $\mu$ and/or $\sigma^2$ don't exist
- Accuracy depends on the size of $n$ and the actual distribution of the $X_i$
- The CLT works for any distribution of the $X_i$ (with finite mean and variance)
Normal approximation to binomial
Note that $X \sim \mathrm{Bin}(n, p)$ is a sum $X = \sum_{i=1}^{n} X_i$ of independent Bernoulli random variables.
Then if $X \sim \mathrm{Bin}(n, p)$, for large $n$ the random variable
$\frac{X - np}{\sqrt{np(1-p)}}$
has approximately a $N(0, 1)$ distribution.
Continuity Correction
We need to be careful with the CLT when working with discrete random variables. For example, suppose we've approximated $P(X \le x)$ for an integer $x$. But we know that $X$ can't take non-integer values; therefore, evaluating the normal approximation at $x + 0.5$ gives us a better estimate. This is called the continuity correction.
We should apply this when approximating discrete distributions using the CLT, and not continuous distributions.
If $X \sim \mathrm{Poi}(\mu)$, then the cumulative distribution function of
$\frac{X - \mu}{\sqrt{\mu}}$
approaches that of a $N(0, 1)$ random variable as $\mu \to \infty$.
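A simulation sketch of the die-sum example with the continuity correction ($n$ and the threshold are arbitrary choices):

```python
import random
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, s = 100, 330                       # arbitrary: 100 rolls, P(sum < 330)
mu, var = 3.5, 35 / 12

# CLT with continuity correction: P(sum < 330) = P(sum <= 329) ≈ Phi(...)
approx = Phi((329.5 - n * mu) / sqrt(n * var))

trials = 100_000
hits = sum(
    sum(random.randint(1, 6) for _ in range(n)) < s for _ in range(trials)
)
print(approx, hits / trials)          # the two numbers should be close
```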
Moment Generating Functions
This is a third type of function that uniquely determines a distribution (after the p.f. and the CDF).
Definition
The moment generating function (MGF) of a random variable $X$ is given by
$M_X(t) = E\!\left(e^{tX}\right)$.
If $X$ is discrete with probability function $f(x)$, then
$M_X(t) = \sum_{x} e^{tx} f(x)$.
If $X$ is continuous with density $f(x)$, then
$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$.
Properties:
so long as $M_X$ is defined in a neighbourhood of $t = 0$,
$M_X^{(k)}(0) = E\!\left(X^k\right)$, the $k$-th moment of $X$.
Suppose $X$ has a $\mathrm{Poi}(\mu)$ distribution. Then its moment generating function is
$M_X(t) = \sum_{x=0}^{\infty} e^{tx} \frac{\mu^x e^{-\mu}}{x!} = e^{-\mu} \sum_{x=0}^{\infty} \frac{(\mu e^t)^x}{x!} = e^{\mu(e^t - 1)}$.
Therefore,
$M_X'(0) = \mu$ and $M_X''(0) = \mu^2 + \mu$,
so
$E(X) = \mu$ and $\mathrm{Var}(X) = E(X^2) - E(X)^2 = \mu$.
What is the moment generating function for $X \sim \mathrm{Bin}(n, p)$?
The p.f. of $X$ is $\binom{n}{x} p^x (1-p)^{n-x}$, so by the Binomial Theorem
$M_X(t) = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1-p)^{n-x} = \left(pe^t + 1 - p\right)^n$.
What is the moment generating function for $X \sim \mathrm{Exp}(\theta)$?
When $t < \frac{1}{\theta}$, we have
$M_X(t) = \int_{0}^{\infty} e^{tx} \frac{1}{\theta} e^{-x/\theta}\,dx = \frac{1}{1 - \theta t}$.
When $t \ge \frac{1}{\theta}$, the integral diverges and the MGF is undefined.
Theorem (Uniqueness)
Suppose that random variables $X$ and $Y$ have MGFs $M_X$ and $M_Y$ respectively. If $M_X(t) = M_Y(t)$ for all $t$ in a neighbourhood of $0$, then $X$ and $Y$ have the same distribution.
Theorem
Suppose that $X$ and $Y$ are independent and have moment generating functions $M_X$ and $M_Y$. Then the moment generating function of $X + Y$ is
$M_{X+Y}(t) = E\!\left(e^{t(X+Y)}\right) = E\!\left(e^{tX}\right)E\!\left(e^{tY}\right) = M_X(t)\,M_Y(t)$.
We can use this to prove the following.
Suppose that $X \sim \mathrm{Poi}(\mu_1)$ and $Y \sim \mathrm{Poi}(\mu_2)$, and $X$ and $Y$ are independent. Then
$M_{X+Y}(t) = e^{\mu_1(e^t - 1)} e^{\mu_2(e^t - 1)} = e^{(\mu_1 + \mu_2)(e^t - 1)}$,
so by uniqueness $X + Y \sim \mathrm{Poi}(\mu_1 + \mu_2)$.
Multivariate Moment Generating Functions
Definition
The joint moment generating function of two random variables $X$ and $Y$ is
$M(s, t) = E\!\left(e^{sX + tY}\right)$.
And so if $X$ and $Y$ are independent,
$M(s, t) = M_X(s)\,M_Y(t)$.
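A numeric sketch of the MGF computation: approximate $E(e^{tX})$ for a Poisson by truncating the series and compare with the closed form $e^{\mu(e^t - 1)}$ (the truncation length is an arbitrary choice):

```python
from math import exp, factorial

def mgf_poisson_series(mu, t, terms=100):
    """Approximate E(e^{tX}) = sum_x e^{tx} mu^x e^{-mu} / x! by truncation."""
    return sum(exp(t * x) * mu**x * exp(-mu) / factorial(x)
               for x in range(terms))

mu, t = 2.0, 0.5
print(mgf_poisson_series(mu, t))        # series value ≈ 3.660
print(exp(mu * (exp(t) - 1)))           # closed form: e^{mu(e^t - 1)}
```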