Relative Frequency
Definition of Probability
The relative frequency definition of probability says:
If an experiment is repeated, the probability of an event (a
specified outcome of the experiment) is the relative frequency of
occurrence of that event in a large number of repetitions of the
experiment.
In this definition 'large number' is vague and will
not be made precise here.
The following links use the Virtual
Laboratories in Probability and Statistics by Professor
Kyle Siegrist of the University of Alabama at Huntsville.
Each link opens a page that illustrates the relative frequency definition
of probability for a different situation.
A page by Professor Siegrist with a link to a simulation of the
birthday problem is found here.
A description of the problem appears below the graphs. What
are the probabilities that at least two people have the same birthday
if there are 5 people in a room? What are they if the number
of people in the room is set at 10? At 20? At 23?
Now set k at 30 and run the simulation of the experiment 100 times.
What is the actual proportion of cases in which there were no
matching birthdays? What is the actual proportion of cases
in which there was at least one matching birthday?
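The exact probabilities asked for above can be computed directly and compared with a simulated relative frequency. This is a minimal Python sketch. It assumes 365 equally likely birthdays and ignores leap years, which is the usual simplification; the applet's settings may differ. The function names `prob_shared_birthday` and `simulate_shared_birthday` are hypothetical.

```python
import random

def prob_shared_birthday(k: int, days: int = 365) -> float:
    """Exact P[at least two of k people share a birthday], assuming
    `days` equally likely birthdays (no leap years)."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (days - i) / days  # person i avoids the i earlier birthdays
    return 1.0 - p_all_distinct

def simulate_shared_birthday(k: int, runs: int, seed: int = 0) -> float:
    """Relative frequency of rooms (out of `runs`) with at least one shared birthday."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        birthdays = [rng.randrange(365) for _ in range(k)]
        if len(set(birthdays)) < k:  # some day was drawn twice
            hits += 1
    return hits / runs

for k in (5, 10, 20, 23, 30):
    print(k, "people:", round(prob_shared_birthday(k), 4))

freq = simulate_shared_birthday(30, runs=100)
print("k=30, 100 runs: at least one match", freq, "; no match", 1 - freq)
```

Under these assumptions the exact probabilities are about 0.027, 0.117, 0.411, 0.507, and 0.706 for 5, 10, 20, 23, and 30 people. The simulated proportion for 30 people varies from seed to seed around 0.706.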
A link to a simulation of coin tossing is found here.
This will open the coin tossing simulation. Run the coin
tossing simulation with 1 single fair coin 100 times. How
many heads occurred in the 100 tosses? How many heads would
you expect? Does the number that you have observed lead
you to believe that the coin is not fair?
Now run the experiment with 2 fair coins. How many times
would you expect to get 2 heads? What did you observe in
your 100 tosses? Try the experiment with 3 fair coins.
How many times would you expect to get 3 heads? What did
you observe in your 100 tosses?
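The expected counts come from multiplying the number of runs by the probability of the outcome. That gives 100(1/2) = 50 heads for one coin, 100(1/4) = 25 runs with 2 heads for two coins, and 100(1/8) = 12.5 runs with 3 heads for three coins. The Python sketch below simulates the same exercise offline. It assumes Python's `random` module as a stand-in for the applet, and `count_all_heads` is a hypothetical helper.

```python
import random

def count_all_heads(num_coins: int, runs: int, seed: int = 1) -> int:
    """Number of runs (out of `runs`) in which all `num_coins` fair coins land heads."""
    rng = random.Random(seed)
    count = 0
    for _ in range(runs):
        if all(rng.random() < 0.5 for _ in range(num_coins)):
            count += 1
    return count

for coins in (1, 2, 3):
    expected = 100 * 0.5 ** coins  # expected number of all-heads runs out of 100
    observed = count_all_heads(coins, runs=100)
    print(f"{coins} coin(s): expected {expected}, observed {observed}")
```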
Click here to open the dice experiment. Run the die tossing simulation
with 1 fair die 100 times. Record the number of times that
a 6 occurs. For a fair die, about how many times would you
expect a 6 to occur? Now change the experiment so 2 fair
dice are tossed. About how many times would you expect a
sum of 5 on the dice to occur in 100 tosses? How many times
did you get a sum of 5 on your 100 tosses?
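For one fair die, the expected number of sixes in 100 rolls is 100(1/6) ≈ 16.7. For two dice, 4 of the 36 equally likely pairs sum to 5: (1,4), (2,3), (3,2), and (4,1). So about 100(4/36) ≈ 11.1 sums of 5 are expected. Below is a minimal Python sketch that checks the count by enumeration and then simulates 100 rolls. The seed is arbitrary and only makes the run repeatable.

```python
import random
from fractions import Fraction
from itertools import product

# Exact: count the outcomes of two dice whose faces sum to 5.
pairs = list(product(range(1, 7), repeat=2))  # 36 equally likely (die1, die2) pairs
p_sum5 = Fraction(sum(1 for a, b in pairs if a + b == 5), len(pairs))
p_six = Fraction(1, 6)
print("expected sixes in 100 rolls:", float(100 * p_six))
print("expected sums of 5 in 100 rolls:", float(100 * p_sum5))

# Simulated: counts from 100 rolls of one die and 100 rolls of two dice.
rng = random.Random(2)
sixes = sum(1 for _ in range(100) if rng.randint(1, 6) == 6)
fives = sum(1 for _ in range(100) if rng.randint(1, 6) + rng.randint(1, 6) == 5)
print("observed:", sixes, "sixes,", fives, "sums of 5")
```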
Go to this link
for a page on drawing cards. On the page that has opened,
click the red die in front of Exercise 13. Run the experiment
20 times. About how many times should a red card occur in
20 runs of the experiment? How many times did you get a
red card in your 20 runs of the experiment?
This link
goes to a page on drawing cards for a Poker Deck. Read
the instructions near the top of the linked page and then do Exercise
2 on that page.
The matching
experiment is explained at this link. Go to this link
and do exercise 9.
The above examples involve gambling. The mathematical study
of probability has roots in the study of certain gambling situations.
The next link takes you to a web page that has much information
about modern gambling. If you choose to visit the page, be
careful if you don't want to get on their mailing list.
The page is called the Wizard of Odds.
Sample Space
Definition of Probability
-
Sample Point
Any fundamental outcome of an experiment.
-
Sample Space
The collection of all sample points for an experiment.
-
Event
Any subset of the sample space of an experiment. Events
are usually denoted by capital letters from the first part of
the alphabet.
-
Probability of an Event
- The probability of an event is a number between 0 and 1
(it may be 0 or 1). The probability of event A is denoted
by P[A].
- Probability indicates the likelihood that the event will
occur. Events with probabilities close to 1 are more
likely to happen than events with probabilities near 0.
-
Probability Laws
- P[Not E]=1-P[E]
- P[A or B]=P[A]+P[B]-P[A and B]
-
Examples
- Example 1: A fair coin is tossed 3 times. Events defined
are: A=At least one head in the 3 tosses, B=Exactly 2 heads
in the 3 tosses, and C=No heads in 3 tosses. Find P[A],
P[B], P[C], P[Not A], P[Not B], P[Not C], P[A or B], P[A or
C], P[B or C], P[A and B], P[A and C], P[B and C].
- The sample space consists of sample points
HHH, HHT, HTH, HTT, THH, THT, TTH, and TTT. These
sample points can be divided into those in A (red ones),
those in B (underlined ones), and those in
C (italicized ones). Listed by event, they are:
A (red): HHH, HHT, HTH, HTT, THH, THT, TTH
B (underlined): HHT, HTH, THH
C (italicized): TTT
- From the last display, A consists of sample points in
red, so P[A]=7/8; the sample points in Not A are those
not red, so P[Not A]=1/8.
- From the last display, B consists of sample points underlined,
so P[B]=3/8; sample points in Not B are those not underlined,
so P[Not B]=5/8.
- From the last display, C consists of sample points italicized,
so P[C]=1/8; sample points in Not C are those not italicized,
so P[Not C]=7/8.
- From the last display, A and B is all sample points
that are both red and underlined,
so P[A and B]=3/8 while A or B is all sample points that
are red or underlined (or both),
so P[A or B]=7/8. Notice that P[A or B]=7/8=P[A]+P[B]-P[A
and B]=7/8 + 3/8 - 3/8.
- From the last display, A and C is all sample points
that are both red and italicized,
so P[A and C]=0/8 while A or C is all sample points that
are red or italicized (or both),
so P[A or C]=8/8. Notice that P[A or C]=8/8=P[A]+P[C]-P[A
and C]=7/8 + 1/8 - 0/8.
- From the last display, B and C is all sample points
that are both underlined and italicized,
so P[B and C]=0/8 while B or C is all sample points that
are underlined or italicized (or both),
so P[B or C]=4/8. Notice that P[B or C]=4/8=P[B]+P[C]-P[B
and C]=3/8 + 1/8 - 0/8.
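The counting in Example 1 can be checked mechanically by treating each event as a set of sample points. Then "or" becomes set union, "and" becomes intersection, and "Not" becomes the complement within the sample space. This is a minimal Python sketch; the helper `P` is hypothetical and assumes equally likely sample points.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 8 equally likely sequences of 3 fair-coin tosses.
space = {"".join(t) for t in product("HT", repeat=3)}

def P(event: set) -> Fraction:
    """Probability of an event (a subset of the sample space) by counting."""
    return Fraction(len(event), len(space))

A = {w for w in space if w.count("H") >= 1}  # at least one head
B = {w for w in space if w.count("H") == 2}  # exactly two heads
C = {w for w in space if w.count("H") == 0}  # no heads

for name, E in (("A", A), ("B", B), ("C", C)):
    print(f"P[{name}] = {P(E)}, P[Not {name}] = {P(space - E)}")

for n1, E1, n2, E2 in (("A", A, "B", B), ("A", A, "C", C), ("B", B, "C", C)):
    union, inter = P(E1 | E2), P(E1 & E2)
    # Addition law: P[E1 or E2] = P[E1] + P[E2] - P[E1 and E2]
    assert union == P(E1) + P(E2) - inter
    print(f"P[{n1} or {n2}] = {union}, P[{n1} and {n2}] = {inter}")
```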
- Example 2: A pair of fair dice is tossed. One die
is green and the other die is white. Events defined
are: A=Green die is greater than or equal to 2, B=White die
shows an even number, C=Sum of numbers on the dice is 6. Find
P[A], P[B], P[C], P[Not A], P[Not B], P[Not C], P[A or B],
P[A or C], P[B or C], P[A and B], P[A and C], P[B and C].
The number on the white die is shown along the left margin
of the next table, and the number on the green die is shown
along the top margin. Sums are shown in the interior of the table.
Outcomes belonging to event A are shown in red (every sum in the
columns headed 2 through 6), outcomes belonging to event B are
shown underlined (every sum in the rows labelled 2, 4, and 6), and
those in event C are in italics (every sum equal to 6).
White \ Green | 1 | 2 | 3 | 4 | 5 | 6
1 | 2 | 3 | 4 | 5 | 6 | 7
2 | 3 | 4 | 5 | 6 | 7 | 8
3 | 4 | 5 | 6 | 7 | 8 | 9
4 | 5 | 6 | 7 | 8 | 9 | 10
5 | 6 | 7 | 8 | 9 | 10 | 11
6 | 7 | 8 | 9 | 10 | 11 | 12
- From the table shown above P[A]=(Number of red sums)/36=30/36,
P[B]=(Number of underlined sums)/36=18/36, and P[C]=(Number
of italicized sums)/36=5/36. P[Not A]=1-P[A]=1-(30/36)=6/36,
P[Not B]=1-P[B]=1-(18/36)=18/36, P[Not C]=1-P[C]=1-(5/36)=31/36.
- From the table above P[A or B]=(Number of red or underlined
(or both) sums)/36=33/36, P[A or C]=(Number of red or
italicized (or both) sums)/36=31/36, and P[B or C]=(Number
of underlined or italicized (or both) sums)/36=21/36.
- From the table above P[A and B]=(Number of red and underlined
sums)/36=15/36, P[A and C]=(Number of red and italicized
sums)/36=4/36, and P[B and C]=(Number of underlined and
italicized sums)/36=2/36.
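The same set-based counting applies to Example 2, with sample points written as (green, white) pairs. This is a minimal Python sketch that assumes all 36 pairs are equally likely. Note that `Fraction` reduces results to lowest terms, so 30/36 prints as 5/6.

```python
from fractions import Fraction
from itertools import product

# Sample points are (green, white) pairs; all 36 are equally likely.
space = set(product(range(1, 7), repeat=2))

def P(event: set) -> Fraction:
    """Probability of an event by counting its sample points."""
    return Fraction(len(event), len(space))

A = {(g, w) for g, w in space if g >= 2}       # green die is at least 2
B = {(g, w) for g, w in space if w % 2 == 0}   # white die shows an even number
C = {(g, w) for g, w in space if g + w == 6}   # sum of the dice is 6

print("P[A], P[B], P[C]:", P(A), P(B), P(C))
print("complements:", P(space - A), P(space - B), P(space - C))
print("or:", P(A | B), P(A | C), P(B | C))
print("and:", P(A & B), P(A & C), P(B & C))
```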
- Example 3: A card is drawn from an ordinary deck of 52 cards.
Events defined are: A=Black card, B=Spade,
and C=Ace. Find P[A], P[B], P[C], P[Not A], P[Not B],
P[Not C], P[A or B],
P[A or C], P[B or C], P[A and B], P[A and C], P[B and C].
- The deck has 26 black cards, so P[A]=26/52, it has 13
spades, so P[B]=13/52, and it has 4 aces, so P[C]=4/52.
By the complement rule, P[Not A]=26/52, P[Not B]=39/52,
and P[Not C]=48/52.
- In the deck 13 cards are both black and spades, so P[A
and B]=13/52, 2 cards are both black
and aces, so P[A and C]=2/52, and one card is both a spade
and an ace, so P[B and C]=1/52.
- In an ordinary deck 26 cards are either black or a spade
(or both), so P[A or B]=26/52; 28 cards are either black
or an ace (or both), so P[A or C]=28/52; and 16 cards
are either a spade or an ace (or both), so P[B or C]=16/52.
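For Example 3, the deck can be modelled as 52 (rank, suit) pairs, with each event counted directly. This is a minimal Python sketch. The rank and suit labels are illustrative choices, not part of the original.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = set(product(ranks, suits))  # 52 equally likely (rank, suit) cards

def P(event: set) -> Fraction:
    """Probability of drawing a card in `event`."""
    return Fraction(len(event), len(deck))

A = {c for c in deck if c[1] in ("spades", "clubs")}  # black card
B = {c for c in deck if c[1] == "spades"}             # spade
C = {c for c in deck if c[0] == "A"}                  # ace

print("card counts for A or B, A or C, B or C:", len(A | B), len(A | C), len(B | C))
print("P[A and C] =", P(A & C), " P[Not C] =", P(deck - C))
```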
- Example 4: A fair die is rolled and a fair coin is flipped
the number of times shown on the die. For example, if the die lands
with a four facing up, the coin is flipped 4 times; if the
die lands 1, the coin is flipped once. List a few
sample points for this experiment. How many sample points
are there in the sample space of this experiment? Should
each sample point be assigned the same probability measure?
- Example 5: A fair coin is tossed. If this coin comes
up heads, a second fair coin is tossed; otherwise, a coin with probability
of heads equal to 2/3 is tossed. List all of the sample
points in the experiment. Should each sample point be
assigned the same probability?
- Example 6: Pick 2 balls without replacement from a container
with 4 balls numbered 1 through 4. Also,
balls numbered 1 and 2 are green and balls numbered 3 and
4 are red. List all of the sample points in the experiment.
What probability should be assigned to each of the sample
points?
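One way to check answers to Examples 4 through 6 is to list every sample point together with its probability. The Python sketch below does this under stated assumptions:
- In Example 4, a sample point is the die value followed by the sequence of flips.
- In Example 5, a sample point is the pair of coin results.
- In Example 6, the two balls are treated as an ordered pair (first draw, second draw). Unordered draws would instead give 6 equally likely points.

The dictionary names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

# Example 4: roll a fair die (value k), then flip a fair coin k times.
ex4 = {}
for k in range(1, 7):
    for flips in product("HT", repeat=k):
        ex4[(k, "".join(flips))] = Fraction(1, 6) * Fraction(1, 2) ** k
print(len(ex4), "sample points; P[(1, H)] =", ex4[(1, "H")],
      "but P[(6, HHHHHH)] =", ex4[(6, "HHHHHH")])

# Example 5: first coin is fair; P[heads] for the second coin depends on the first.
p_second_heads = {"H": Fraction(1, 2), "T": Fraction(2, 3)}
ex5 = {}
for first in "HT":
    for second in "HT":
        p_h = p_second_heads[first]
        ex5[first + second] = Fraction(1, 2) * (p_h if second == "H" else 1 - p_h)
print({point: str(p) for point, p in ex5.items()})

# Example 6: ordered draws of 2 different balls from balls numbered 1-4.
pairs = list(permutations(range(1, 5), 2))
ex6 = {pair: Fraction(1, len(pairs)) for pair in pairs}
print(len(ex6), "ordered pairs, each with probability", ex6[(1, 2)])

# In every example the probabilities must add up to 1.
assert sum(ex4.values()) == sum(ex5.values()) == sum(ex6.values()) == 1
```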
-
Conditional Probability and Independence
- Rules for Conditional Probability
- P[A|B] = P[A and B] / P[B]
- or P[A and B] = P[A | B] P[B] or P[A and B] = P[B |
A] P[A]
- Conditional Probability Calculations
- Consider drawing two cards, one after another without
replacement, from an ordinary deck. On drawing the
first card the sample space consists of 52 cards.
However, once the first card is drawn, the sample space
for the second draw consists of the 51 cards that remain.
Then, for example, the probability that the 2nd card is
a heart given that the first card was a heart is 12/51,
while the probability that the 2nd card is a heart given
that the first card was not a heart is 13/51. These
probabilities are called conditional probabilities.
They are denoted by
P[2nd is heart | 1st is heart] and P[2nd is heart | 1st
is not heart].
The vertical symbol | means given.
- A ball is selected from a container that holds 6 red
and 10 green balls. Then a second ball is selected
from the container without replacing the first ball.
The probability that the second ball is green given that
the first ball is green is 9/15 while the probability
that the second ball is green given that the first ball
is red is 10/15. These probabilities can be written
as
P[2nd ball green | 1st ball green]=9/15 and P[2nd ball
green | 1st ball red]=10/15
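The heart-card conditional probabilities above can also be obtained from the rule P[A|B] = P[A and B] / P[B], applied to the full sample space of ordered pairs of different cards. This is a minimal Python sketch that assumes all 52 × 51 ordered pairs are equally likely. `Fraction` prints 12/51 in lowest terms as 4/17.

```python
from fractions import Fraction
from itertools import permutations, product

suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(range(1, 14), suits))  # 52 cards as (rank, suit)
draws = list(permutations(deck, 2))        # ordered (1st card, 2nd card) pairs

def P(event) -> Fraction:
    """Probability of an event, given as a predicate on (1st, 2nd) pairs."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

def first_heart(d):
    return d[0][1] == "hearts"

def second_heart(d):
    return d[1][1] == "hearts"

p_given_heart = P(lambda d: first_heart(d) and second_heart(d)) / P(first_heart)
p_given_not_heart = (P(lambda d: not first_heart(d) and second_heart(d))
                     / P(lambda d: not first_heart(d)))
print(p_given_heart, p_given_not_heart)
```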
- A population is classified by gender and political party
affiliation. The membership of the population is
shown in the following table with gender along the top
and political party along the left side of the table.
The interior of the table gives the number of people
in each classification.
Party | Female | Male
Democrat | 45 | 50
Republican | 60 | 44
Independent | 10 | 15
A person is selected at random from this population.
The following ordinary probabilities can be found directly
from the table: P[Democrat]=95/224, P[Female]=115/224,
P[Democrat and Female]=45/224. Also, the following
conditional probabilities can be found directly from
the table by working with the appropriate row or column
of the table. P[Democrat | Female]= 45/115, P[Female
| Democrat]= 45/95, and P[Independent | Male]= 15/109.
From this example, notice that the following are
true:
(1) P[Female | Democrat] = P[Female and Democrat]/P[Democrat]
(2) P[Independent and Male] = P[Independent | Male]
P[Male].
The two rules, P[A | B] = P[A and B]/P[B] (for P[B]
not equal to 0) and P[A and B] = P[A | B] P[B], are true
for any two events, A and B.
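Both rules can be checked numerically from the table counts. This is a minimal Python sketch. The dictionary layout and helper names are hypothetical choices for illustration.

```python
from fractions import Fraction

# Counts from the table, indexed as counts[party][gender].
counts = {
    "Democrat": {"Female": 45, "Male": 50},
    "Republican": {"Female": 60, "Male": 44},
    "Independent": {"Female": 10, "Male": 15},
}
total = sum(sum(row.values()) for row in counts.values())

def p_party(party):
    """P[party]: row total divided by the population size."""
    return Fraction(sum(counts[party].values()), total)

def p_gender(gender):
    """P[gender]: column total divided by the population size."""
    return Fraction(sum(row[gender] for row in counts.values()), total)

def p_and(party, gender):
    """P[party and gender]: one interior cell divided by the population size."""
    return Fraction(counts[party][gender], total)

def p_party_given_gender(party, gender):
    """P[party | gender]: work within the gender column only."""
    return Fraction(counts[party][gender], sum(row[gender] for row in counts.values()))

def p_gender_given_party(gender, party):
    """P[gender | party]: work within the party row only."""
    return Fraction(counts[party][gender], sum(counts[party].values()))

# Rule (1): P[Female | Democrat] = P[Female and Democrat] / P[Democrat]
assert p_gender_given_party("Female", "Democrat") == p_and("Democrat", "Female") / p_party("Democrat")
# Rule (2): P[Independent and Male] = P[Independent | Male] P[Male]
assert p_and("Independent", "Male") == p_party_given_gender("Independent", "Male") * p_gender("Male")
print(total, p_party("Democrat"), p_gender("Female"), p_party_given_gender("Independent", "Male"))
```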
- Independence of Events: Events A and B are independent if
and only if P[A | B] = P[A]
- The definition of independence can be stated as: A and
B are independent if and only if the occurrence of one
of them, say B, does not change the probability that the
other, A, occurs.
- Since P[A and B] = P[A | B] P[B], and since for independent
events, P[A | B]=P[A],
P[A and B] = P[A] P[B]
- In tossing a fair coin twice, the probability of event
A, getting heads on the first toss is 1/2. The probability
of event B, getting heads on the second toss is also 1/2.
The probability of event A and B, getting heads on the
first and second toss is 1/4. In this case P[A and
B] = P[A] P[B], so A and B are independent events.
- Consider drawing two balls without replacement from a container
that holds 6 red and 10 green balls. The probability
of event A, that the first ball picked is red, is 6/16.
The probability of event B, that the second ball picked
is red, is also 6/16. This second probability may
seem surprising, but it follows from splitting on the color
of the first ball:
P[B] = P[B | A] P[A] + P[B | Not A] P[Not A]
= (5/15)(6/16) + (6/15)(10/16) = 90/240 = 6/16.
The probability of event A and B, that both balls are
red, is (6/16)(5/15). In this case P[A and B] is
not equal to P[A] P[B], that is, A and B are not independent
events.
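The two examples above can be compared by enumeration, since independence holds exactly when P[A and B] = P[A] P[B]. This is a minimal Python sketch. It assumes the balls are labelled 0 through 15 (the labels are only for counting), so that all 16 × 15 ordered draws are equally likely.

```python
from fractions import Fraction
from itertools import permutations, product

def P(space, event) -> Fraction:
    """Probability of an event (a predicate) over equally likely sample points."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

# Two tosses of a fair coin: 4 equally likely sequences.
coins = list(product("HT", repeat=2))
pA = P(coins, lambda w: w[0] == "H")
pB = P(coins, lambda w: w[1] == "H")
pAB = P(coins, lambda w: w[0] == "H" and w[1] == "H")
print("coin tosses independent:", pAB == pA * pB)

# Two draws without replacement from 6 red and 10 green (labelled) balls.
balls = ["red"] * 6 + ["green"] * 10
draws = list(permutations(range(16), 2))
qA = P(draws, lambda d: balls[d[0]] == "red")
qB = P(draws, lambda d: balls[d[1]] == "red")
qAB = P(draws, lambda d: balls[d[0]] == "red" and balls[d[1]] == "red")
print("P[2nd red] =", qB, "; ball draws independent:", qAB == qA * qB)
```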
- Suppose the engines on a jet airplane are identical in their
ability to operate throughout a particular flight, the
probability that any one of the engines works properly
for that flight is 0.97, the engines operate
independently of one another, and the plane can make
the flight safely if at least one engine continues
working. What is the probability that a jet plane with
four of these engines is able to make this flight safely?
$$P[\text{Makes flight safely}] = 1 - P[\text{Doesn't make flight safely}] = 1 - P[\text{All engines fail}] = 1 - (0.03)^4 = 1 - 0.00000081 = 0.99999919$$
The key step was finding the probability that all engines
fail. Each engine fails with probability 1 - 0.97 = 0.03,
and multiplying this failure probability by itself 4 times
is allowed by the assumption that the engines operate independently.
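The complement-and-multiply step works for any number of independent engines. This is a minimal Python sketch; the function `p_safe_flight` is hypothetical. For 4 engines the result matches the calculation above up to floating-point rounding.

```python
def p_safe_flight(p_engine_works: float, engines: int) -> float:
    """P[at least one engine works], assuming identical, independent engines."""
    p_all_fail = (1 - p_engine_works) ** engines  # independence lets us multiply
    return 1 - p_all_fail

for n in range(1, 5):
    print(n, "engine(s):", round(p_safe_flight(0.97, n), 8))
```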
-
Random Variables
- Definition--A random variable is a quantitative variable
whose value is determined by some chance mechanism.
Examples are the total number of heads in 10 tosses of a fair
coin, the toss number of the first head in 10 tosses of a
fair coin, the number of red balls selected when drawing 2
balls (without replacement) from a container that holds 8
red balls and 20 green balls.
- Discrete and Continuous Random Variables
- Discrete random variables can only have a finite or
countably infinite number of values. Examples are the number
of heads in 10 tosses of a fair coin, the toss number
of the first head if a fair coin is tossed until a head
appears, and the number of red balls selected in the
example given above.
- Continuous random variables can assume any of an uncountably
infinite set of values. For example, if a point
is picked at random from the interval [0,1], there
are an uncountably infinite number of values that could
be picked.
- Probability Distribution of a Discrete Random Variable--Each
discrete random variable can only assume a finite or countably
infinite number of values. If a table is made associating
the probability of each value with the value, the table or
association is the probability distribution of that random
variable. As an example consider tossing a fair coin
3 times. A random variable that can be associated with
this experiment is a count of the number of heads in the 3
tosses. Call this random variable T. T can assume
any of the values 0, 1, 2, or 3. The probabilities associated
with each of these values are P[T=0]=1/8, P[T=1]=3/8, P[T=2]=3/8,
and P[T=3]=1/8. These probabilities make up the probability
distribution of this random variable. In table form,
this probability distribution is:
t | 0 | 1 | 2 | 3
P[T=t] | 1/8 | 3/8 | 3/8 | 1/8
Notice that the sum of the probabilities is 1. This
is true for any discrete random variable.
Run the experiment of tossing a fair coin 3 times, 1000
times in all, updating after every 100 runs. To do this,
follow the link here,
and when the page opens, click the red die in front of number
4. Set the number of coins at 3. After running
the experiment 1000 times, what can you say about the theoretical
(in blue) and actual (in red) probability distributions
of the number of heads?
In all examples of discrete random
variables, the probabilities in the probability distribution
table give the 'long-term' proportion of times that the
random variable assumes each possible value.
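The applet exercise can be mimicked offline. The sketch below derives the theoretical distribution of T by listing the 8 sequences. It then prints simulated relative frequencies after every 100 of 1000 runs. This is a minimal Python sketch that assumes Python's `random` module as a stand-in for the applet; the seed is arbitrary.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Theoretical distribution of T = number of heads in 3 fair-coin tosses.
outcomes = list(product("HT", repeat=3))
theory = {t: Fraction(sum(1 for w in outcomes if w.count("H") == t), len(outcomes))
          for t in range(4)}
assert sum(theory.values()) == 1  # probabilities of a discrete r.v. sum to 1

# Simulated relative frequencies, reported after every 100 runs.
rng = random.Random(3)
counts = Counter()
for run in range(1, 1001):
    counts[sum(rng.random() < 0.5 for _ in range(3))] += 1
    if run % 100 == 0:
        print(run, [round(counts[t] / run, 3) for t in range(4)])
print("theory:", [str(theory[t]) for t in range(4)])
```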
- Example 1: Find the probability distribution of the number
of red balls selected if two balls are selected (without replacement)
from a container which has 4 balls numbered 1 through 4, with
balls numbered 1 and 2 red balls and balls numbered 3 and
4 green balls.
- Example 2: Find the probability distribution of the number
of red balls selected if two balls are selected (with replacement)
from a container which has 4 balls numbered 1 through 4, with
balls numbered 1 and 2 red
balls and balls numbered 3 and 4 green balls.
- Example 3: Find the probability distribution of the number
of red balls selected if two balls are selected (with replacement)
from a container with 20 balls, 10 of them red and 10 green.
- Example 4: Find the probability distribution of the number
of red balls selected if two balls are selected (without replacement)
from a container with 20 balls, 10 of them red and 10 green.
For examples 3 and 4, you can use this link
to a simulation of the situation. When the page
opens, click the red die in front of number 4 to open the simulation.
When the simulation opens, set N, the number of balls, to 20; set R,
the number of red balls, to 10; and set n, the sample size, to 2. Select
with or without replacement as appropriate for the example.
The blue graph and the text below it will show probabilities
for each number of red balls.
- Example 5: In examples 3 and 4 decrease the number of red
balls. What happens to the probability distribution
of the number of red balls in the sample? Is this expected?
Also, for each number of red balls, observe the differences
in the probability distribution with and without replacement.
Next, set the number of red balls at 10, use sampling with
or without replacement, and run the simulation of drawing
2 balls, 1000 times, updating every 100 times. What
do you see?
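Exact answers to Examples 1 through 4 can be found by listing every equally likely ordered draw of labelled balls and counting the red balls. This is a minimal Python sketch; `red_count_distribution` is a hypothetical helper. For Example 5, call it with fewer red balls and correspondingly more green balls to keep 20 balls in total, for example `red_count_distribution(5, 15, replace=False)`.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def red_count_distribution(num_red: int, num_green: int, replace: bool) -> dict:
    """Distribution of the number of red balls in 2 draws, found by listing
    every equally likely ordered draw of (labelled) balls."""
    balls = ["red"] * num_red + ["green"] * num_green
    labels = range(len(balls))
    if replace:
        draws = list(product(labels, repeat=2))   # same ball may be drawn twice
    else:
        draws = list(permutations(labels, 2))     # two different balls
    counts = Counter(sum(balls[i] == "red" for i in d) for d in draws)
    return {k: Fraction(counts[k], len(draws)) for k in range(3)}

examples = {
    "Example 1": (2, 2, False),
    "Example 2": (2, 2, True),
    "Example 3": (10, 10, True),
    "Example 4": (10, 10, False),
}
for name, (red, green, replace) in examples.items():
    dist = red_count_distribution(red, green, replace)
    print(name, {k: str(p) for k, p in dist.items()})
```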
- Mean (also called Expected Value) and Standard Deviation
of a Discrete Random Variable
Consider again the count of heads in 3 tosses of a fair
coin. If this experiment is repeated, say 10 times,
and the number of heads in each series of 3 tosses is counted,
you will have a set of numbers like 0,1,3,1,2,2,1,1,3,0.
The average of these numbers is the average value of the
random variable, the number of heads in 3 tosses of a fair
coin, over these 10 runs. To see what happens in a larger number of runs
of the experiment, again follow the link here,
and when the page opens, click the red die in front of number
4. Set the number of coins at 3 and run the experiment
100 times. What is the average number of heads per
3 tosses? Now reset and run the experiment 1000 times.
What is the average number of heads per 3 tosses?
The long-term average number of heads is called the expected
value of the random variable, the number of heads in 3 tosses
of a fair coin. This expected value can be found for
most random variables. Think of expected value as
the average value of a random variable.
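You can watch the long-run average settle down by simulating more and more runs. This is a minimal Python sketch that assumes Python's `random` module in place of the applet; `average_heads` is a hypothetical helper. The averages should drift toward 1.5, the expected value found below by formula.

```python
import random

def average_heads(runs: int, seed: int = 4) -> float:
    """Average number of heads per run of 3 fair-coin tosses over `runs` runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += sum(rng.random() < 0.5 for _ in range(3))  # heads in this run
    return total / runs

for runs in (10, 100, 1000, 100_000):
    print(runs, "runs:", average_heads(runs))
```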
There is an easier way to find the expected value of this
(or any) discrete random variable. If the experiment
of tossing the coin 3 times is repeated a large number
N of times, the experiment will end in 0 heads $n_0$ times, in
1 head $n_1$ times, in 2 heads $n_2$ times, and in 3 heads $n_3$
times. The total number of heads is $0 n_0 + 1 n_1 + 2 n_2 + 3 n_3$,
and the average number of heads per run of the experiment is
$$\frac{0 n_0 + 1 n_1 + 2 n_2 + 3 n_3}{N} = 0\left(\frac{n_0}{N}\right) + 1\left(\frac{n_1}{N}\right) + 2\left(\frac{n_2}{N}\right) + 3\left(\frac{n_3}{N}\right)$$
For large N, $n_0/N \approx P[0 \text{ Heads}]$, $n_1/N \approx P[1 \text{ Head}]$,
$n_2/N \approx P[2 \text{ Heads}]$, and $n_3/N \approx P[3 \text{ Heads}]$, so the average number
of heads per run of the experiment is approximately
$$0\,P[0 \text{ Heads}] + 1\,P[1 \text{ Head}] + 2\,P[2 \text{ Heads}] + 3\,P[3 \text{ Heads}]$$
This is called the Expected Value or Mean and is denoted,
for a general random variable X, by E[X]. It can be
computed by
$$\mu = E[X] = \sum_{x} x \, P[X = x],$$
where the sum runs over all values x that X can assume.
Using this formula on the random variable T, the total
number of heads in 3 tosses of a fair coin, you get
µ=E[T] = 0 (1/8) + 1 (3/8) + 2 (3/8) + 3 (1/8) = 12/8 =
3/2 = 1.5. This can be interpreted as the average
number of heads per sequence of 3 tosses if the experiment
is repeated a large number of times.
Just as you are able to find the average value for a random
variable, so you can also find the standard deviation of
the random variable. In the case of a random variable,
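the standard deviation is given by
$$\sigma = SD[X] = \sqrt{\sum_{x} (x - \mu)^2 \, P[X = x]} = \sqrt{\sum_{x} x^2 \, P[X = x] - \mu^2}$$
For random variable T, the total number of
heads in 3 tosses of a fair coin, the standard deviation
computed by the rightmost formula is
$$SD[T] = \sqrt{0^2 \left(\tfrac{1}{8}\right) + 1^2 \left(\tfrac{3}{8}\right) + 2^2 \left(\tfrac{3}{8}\right) + 3^2 \left(\tfrac{1}{8}\right) - \left(\tfrac{3}{2}\right)^2} = \sqrt{3 - \tfrac{9}{4}} = \sqrt{\tfrac{3}{4}} = \frac{\sqrt{3}}{2} \approx 0.866$$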
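Both standard deviation formulas can be evaluated exactly with fractions to confirm that they agree. This is a minimal Python sketch; the helper names are hypothetical, and the distribution of T is taken from the probability distribution table given earlier.

```python
import math
from fractions import Fraction

# Distribution of T, the number of heads in 3 tosses of a fair coin.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean(dist):
    """E[X]: sum of x * P[X = x] over all values x."""
    return sum(x * p for x, p in dist.items())

def variance_by_definition(dist):
    """Sum of (x - mu)^2 * P[X = x]."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Sum of x^2 * P[X = x], minus mu^2 (the rightmost formula)."""
    return sum(x ** 2 * p for x, p in dist.items()) - mean(dist) ** 2

assert variance_by_definition(dist) == variance_shortcut(dist)
print("E[T] =", mean(dist), " SD[T] =", math.sqrt(variance_shortcut(dist)))
```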
-
The Binomial Random Variable
- Definition--A binomial random variable with parameters n
and p is a count of the number of successes in n experiments
(or trials).
- Each trial can result in only two outcomes, a success,
S, or a failure, F.
- The probability of a success on any trial is P[S] =
p and the probability of a failure on any trial is P[F]
= 1-p = q
- The outcome of any trial has no effect on outcomes of
other trials.
- The binomial random variable is a count of the number
of successes in n trials.
An example is the toss of a fair coin 3 times. If
you think of a success as a head, the count of the number
of heads in 3 tosses satisfies the definition of a binomial
random variable. Here n=3 and p=1/2.
Another example is the drawing of 2 balls with replacement
from a container with 20 balls, 10 of which are red and
10 white. A red ball is considered a success.
Then the count of red balls is a binomial random variable
with n=2 and p=1/2. If the drawing is without replacement,
the probabilities of a red ball change from draw to draw,
so the count of red balls is no longer a binomial random
variable. However, when the number of balls drawn
is much smaller than the total number of balls in the container,
the count of red balls will be distributed almost like a
binomial random variable.
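The closing remark above can be checked numerically: drawing without replacement is almost binomial when few balls are drawn from a large container. This is a minimal Python sketch with two hypothetical helpers:
- `binomial_pmf` uses the standard binomial formula P[X=k] = C(n,k) p^k q^(n-k).
- `without_replacement_pmf` uses the hypergeometric formula for drawing without replacement (a name not used in the text above).

Containers of 20, 200, and 2000 balls, half of them red, are assumed for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P[X = k] for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def without_replacement_pmf(k: int, n: int, red: int, total: int) -> float:
    """P[k red balls in n draws without replacement] (hypergeometric)."""
    return comb(red, k) * comb(total - red, n - k) / comb(total, n)

for total in (20, 200, 2000):
    red = total // 2
    exact = [round(without_replacement_pmf(k, 2, red, total), 4) for k in range(3)]
    print(total, "balls, without replacement:", exact)
print("binomial n=2, p=1/2:", [binomial_pmf(k, 2, 0.5) for k in range(3)])
```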
- Probability Distribution of a Binomial Random Variable with
parameters n and p.
Since the binomial random variable is a count of the number
of successes in n trials, the number of successes can only
be an integer between 0 and n. Thus, if X is the total
number of successes in n trials, P[X=k] will only be nonzero
for k=0,1,2,3,...,n. The formula for P[X=k] is
$$P[X=k] = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \ldots, n,$$
where
|