Relative Frequency Definition of Probability
The relative frequency definition of probability says: If an
experiment is repeated, the probability of an event (a specified outcome of the
experiment) is the relative frequency of occurrence of that event in a large number of
repetitions of the experiment.
In this definition 'large number' is vague and will not be made precise
here.
The following links use the Virtual
Laboratories in Probability and Statistics by Professor
Kyle Siegrist of the University of Alabama at Huntsville.
Each link opens a page that illustrates the relative frequency definition
of probability for a different situation.
A page by Professor Siegrist with a link to a simulation of the birthday problem is
found here. When you
reach this page, scroll down until you reach red underlined text that says
birthday experiment. Follow this link to open the birthday problem simulation. Set the number of people in
a room at 30 (k=30). The height of the blue bar on the right above 0.0 indicates the
probability of no matching birthdays while the bar above 1.0 shows the probability of at
least two of the people having the same birthday. What are these probabilities for
30 people? What are they if the number of people in the room is set at 20? At
23? Now set k at 30 and run the simulation of the experiment 100 times. What
is the actual proportion of cases in which there were no matching birthdays? What is
the actual proportion of cases in which there was at least one matching birthday?
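You can also check the birthday probabilities without the applet. The Python sketch below assumes 365 equally likely birthdays and ignores leap days; the applet's own settings may differ. `prob_no_match` multiplies the chances that each new person avoids all earlier birthdays. `simulate_no_match` is a hypothetical helper that estimates the same probability as a relative frequency over many simulated rooms, which is the relative frequency definition in action.

```python
import random

def prob_no_match(k: int, days: int = 365) -> float:
    """Exact P[no two of k people share a birthday], assuming equally likely days."""
    p = 1.0
    for i in range(k):
        p *= (days - i) / days   # person i+1 avoids the i birthdays already taken
    return p

def simulate_no_match(k: int, runs: int, days: int = 365, seed: int = 1) -> float:
    """Relative frequency of 'no matching birthdays' over `runs` simulated rooms."""
    rng = random.Random(seed)
    no_match = 0
    for _ in range(runs):
        birthdays = [rng.randrange(days) for _ in range(k)]
        if len(set(birthdays)) == k:   # all birthdays distinct
            no_match += 1
    return no_match / runs

for k in (20, 23, 30):
    exact = prob_no_match(k)
    print(f"k={k}: P[no match]={exact:.4f}, P[at least one match]={1 - exact:.4f}")

# Relative frequencies from 100 and from 10,000 simulated rooms of 30 people
print(simulate_no_match(30, 100), simulate_no_match(30, 10_000))
```

With only 100 runs the relative frequency can be noticeably off. With 10,000 runs it usually lands close to the exact value.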
A link to a simulation of coin tossing is found here. When you reach this
page, scroll until you see coin experiment underlined in red. Follow
this link to open the coin tossing simulation. Run the coin tossing simulation with a single
fair coin 100 times. Record the number of times a head occurs in the 100 tosses.
How many heads would you expect? Does the number that you have observed lead
you to believe that the coin is not fair?
Now run the experiment with 2 fair coins. How many times would you expect to get
2 heads? What did you observe in your 100 tosses?
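The same coin experiments can be imitated in a few lines of Python. This sketch uses a pseudo-random generator in place of the applet. `count_heads` is a hypothetical helper that tallies how many runs produced each number of heads. The expected counts shown are 100 × 1/2 for one coin and 100 × 1/4 for two heads with two coins.

```python
import random

def count_heads(num_coins: int, runs: int, seed: int = 2) -> list[int]:
    """Toss `num_coins` fair coins `runs` times; counts[h] = number of runs with h heads."""
    rng = random.Random(seed)
    counts = [0] * (num_coins + 1)
    for _ in range(runs):
        heads = sum(rng.random() < 0.5 for _ in range(num_coins))
        counts[heads] += 1
    return counts

one = count_heads(1, 100)
print("1 coin, 100 tosses: heads =", one[1], "(expected about 50)")
two = count_heads(2, 100)
print("2 coins, 100 tosses: two heads =", two[2], "(expected about 25)")
```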
Follow the link in the previous paragraph and scroll to the
red-underlined dice experiment on the
linked page. Run the die
tossing simulation with 1 fair die 100 times. Record the number of times that a 6
occurs. For a fair die, about how many times would you expect a 6 to occur?
Now change the experiment so 2 fair dice are tossed. About how many times would you
expect a sum of 5 on the dice to occur in 100 tosses? How many times did you get a
sum of 5 on your 100 tosses?
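For the dice questions, the expected counts follow from counting equally likely outcomes. A six has probability 1/6. With two dice, a sum of 5 occurs in 4 of the 36 outcomes. The Python sketch below computes both expected counts exactly and then runs one simulated set of 100 rolls for comparison. The seed value is arbitrary.

```python
import random
from fractions import Fraction
from itertools import product

# Exact probabilities by counting equally likely outcomes
p_six = Fraction(1, 6)
sum5 = [roll for roll in product(range(1, 7), repeat=2) if sum(roll) == 5]
p_sum5 = Fraction(len(sum5), 36)   # outcomes (1,4), (2,3), (3,2), (4,1)
print("expected sixes in 100 rolls of 1 die:", float(100 * p_six))
print("expected sums of 5 in 100 rolls of 2 dice:", float(100 * p_sum5))

# One simulated run of 100 rolls for comparison
rng = random.Random(3)
sixes = sum(rng.randint(1, 6) == 6 for _ in range(100))
fives = sum(rng.randint(1, 6) + rng.randint(1, 6) == 5 for _ in range(100))
print("observed:", sixes, "sixes and", fives, "sums of 5")
```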
From the same linked page used in the last two sections, scroll to the
red-underlined card experiment and follow this link. Run the card experiment 20 times. About how many times should a red
card occur in 20 runs of the experiment? How many times did you get a red card in
your 20 runs of the experiment?
This link goes to a page on
drawing cards from a poker deck. When the page opens, follow the red
Poker Experiment link. Choose a certain outcome, try a few hands,
and see whether the observed relative frequencies of your chosen outcome are close to the
theoretical probability of that event.
The matching experiment is
explained at this link. Go to this link, read the information on the
matching experiment, and try the experiment by clicking the red-underlined
matching experiment link.
The above examples involve gambling. The mathematical study of
probability has roots in the study of certain gambling situations. The
next link takes you to a web page that has much information about modern
gambling. If you choose to visit the page, be careful if you do not want
to be added to their mailing list. The page is called the Wizard
of Odds.
Sample Space Definition of Probability
-
Sample Point
Any fundamental outcome of an experiment.
-
Sample Space
The collection of all sample points for an experiment.
-
Event
Any subset of the sample space of an experiment. Events are
usually denoted by capital letters from the first part of the
alphabet.
-
Probability of an Event
- The probability of an event is a number between 0 and 1 (it may
be 0 or 1). The probability of event A is denoted by P[A].
- Probability indicates the likelihood that the event will occur.
Events with probabilities close to 1 are more likely to happen than events with
probabilities near 0.
-
Probability Laws
- P[Not E]=1-P[E]
- P[A or B]=P[A]+P[B]-P[A and B]
-
Examples
- Example 1: A fair coin is tossed 3 times. Events defined are: A=At
least one head in the 3 tosses, B=Exactly 2 heads in the 3 tosses,
and C=No heads in 3 tosses. Find P[A], P[B], P[C], P[Not A],
P[Not B], P[Not C], P[A or B], P[A or C], P[B or C], P[A and B],
P[A and C], P[B and C].
-
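The sample space consists of the sample points HHH,
HHT, HTH, HTT, THH, THT, TTH, and TTT. These sample
points can be classified as those in A (the red ones), those in B
(the underlined ones), and those in
C (the italicized ones). Labeling each sample point with its markings gives:
HHH (red), HHT (red, underlined),
HTH (red, underlined), HTT (red),
THH (red, underlined), THT (red),
TTH (red), and TTT (italicized)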
- From the last display, A consists of sample points in red,
so P[A]=7/8; the sample points in Not A are those not red, so
P[Not A]=1/8.
- From the last display, B consists of sample points
underlined, so P[B]=3/8; sample points in Not B are those not
underlined, so P[Not B]=5/8.
- From the last display, C consists of sample points
italicized, so P[C]=1/8; sample points in Not C are those not
italicized, so P[Not C]=7/8.
- From the last display, the event A and B consists of all sample points that are
both red and underlined,
so P[A and B]=3/8, while the event A or B consists of all sample points that are
red or underlined (or both),
so P[A or B]=7/8. Notice that P[A or B]=7/8=P[A]+P[B]-P[A
and B]=7/8 + 3/8 - 3/8.
- From the last display, the event A and C consists of all sample points that are
both red and italicized,
so P[A and C]=0/8, while the event A or C consists of all sample points that are
red or italicized (or both),
so P[A or C]=8/8. Notice that P[A or C]=8/8=P[A]+P[C]-P[A
and C]=7/8 + 1/8 - 0/8.
- From the last display, the event B and C consists of all sample points that are
both underlined and italicized,
so P[B and C]=0/8, while the event B or C consists of all sample points that are
underlined or italicized (or both),
so P[B or C]=4/8. Notice that P[B or C]=4/8=P[B]+P[C]-P[B
and C]=3/8 + 1/8 - 0/8.
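The counting in Example 1 can be checked by listing the sample space in a program. This is a minimal Python sketch. The helper names `P`, `A`, `B`, `C`, `both`, and `either` are illustrative choices, not taken from the text. Each probability is the number of sample points in the event divided by 8, and the loop confirms the addition law for each pair of events.

```python
from fractions import Fraction
from itertools import product

# Sample space for 3 tosses: the 8 equally likely strings HHH, HHT, ..., TTT
space = ["".join(s) for s in product("HT", repeat=3)]

def P(event) -> Fraction:
    """P[event] = (number of sample points in the event) / (number of sample points)."""
    return Fraction(sum(1 for s in space if event(s)), len(space))

def A(s): return s.count("H") >= 1   # at least one head
def B(s): return s.count("H") == 2   # exactly two heads
def C(s): return s.count("H") == 0   # no heads

def both(e1, e2): return lambda s: e1(s) and e2(s)    # the event "e1 and e2"
def either(e1, e2): return lambda s: e1(s) or e2(s)   # the event "e1 or e2"

print(P(A), P(B), P(C))               # 7/8 3/8 1/8
print(1 - P(A), 1 - P(B), 1 - P(C))   # P[Not A], P[Not B], P[Not C]
for label, e1, e2 in [("A,B", A, B), ("A,C", A, C), ("B,C", B, C)]:
    # Addition law: P[e1 or e2] = P[e1] + P[e2] - P[e1 and e2]
    assert P(either(e1, e2)) == P(e1) + P(e2) - P(both(e1, e2))
    print(label, "and:", P(both(e1, e2)), "or:", P(either(e1, e2)))
```

`Fraction` prints results in lowest terms, so 8/8 appears as 1 and 0/8 as 0.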
- Example 2: A pair of fair dice is tossed. One die is green and the
other die is white. Events defined are: A=Green die is
greater than or equal to 2, B=White die shows an even number,
C=Sum of numbers on the dice is 6. Find P[A], P[B], P[C], P[Not
A], P[Not B], P[Not C], P[A or B], P[A or C], P[B or C], P[A and
B], P[A and C], P[B and C].
The number on the white die is shown along the left margin of the
next table, and the number on the green die is shown along the top margin. Sums are shown in
the interior of the table. Outcomes belonging to event A (the columns where the green die shows 2 through 6) are
shown in red, outcomes belonging to event B (the rows where the white die shows 2, 4, or 6) are underlined, and outcomes in event C (the cells with sum 6) are in italics.
White \ Green | 1 | 2 | 3 | 4 | 5 | 6
1 | 2 | 3 | 4 | 5 | 6 | 7
2 | 3 | 4 | 5 | 6 | 7 | 8
3 | 4 | 5 | 6 | 7 | 8 | 9
4 | 5 | 6 | 7 | 8 | 9 | 10
5 | 6 | 7 | 8 | 9 | 10 | 11
6 | 7 | 8 | 9 | 10 | 11 | 12
- From
the table shown above, P[A]=(Number of red
sums)/36=30/36, P[B]=(Number of underlined
sums)/36=18/36, and P[C]=(Number of italicized
sums)/36=5/36. P[Not A]=1-P[A]=1-(30/36)=6/36,
P[Not B]=1-P[B]=1-(18/36)=18/36, P[Not C]=1-P[C]=1-(5/36)=31/36.
- From
the table above, P[A or B]=(Number of red or underlined (or both) sums)/36=33/36, P[A or C]=(Number of red or
italicized sums)/36=31/36, and P[B or C]=(Number of
underlined or italicized sums)/36=21/36.
- From
the table above, P[A and B]=(Number of red and underlined
sums)/36=15/36, P[A and C]=(Number of red and
italicized sums)/36=4/36, and P[B and C]=(Number of
underlined and italicized sums)/36=2/36.
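The table counts for Example 2 can be verified by listing all 36 (white, green) outcomes. In this sketch the names `P`, `A`, `B`, and `C` are illustrative. Note that `Fraction` reduces its results, so 30/36 prints as 5/6 and 15/36 prints as 5/12.

```python
from fractions import Fraction
from itertools import product

# Each sample point is (white, green); all 36 are equally likely
space = list(product(range(1, 7), repeat=2))

def P(event) -> Fraction:
    """Fraction of the 36 sample points (white, green) that satisfy the event."""
    return Fraction(sum(1 for w, g in space if event(w, g)), len(space))

def A(w, g): return g >= 2        # green die is at least 2
def B(w, g): return w % 2 == 0    # white die shows an even number
def C(w, g): return w + g == 6    # sum of the dice is 6

print(P(A), P(B), P(C))           # 30/36, 18/36, 5/36 in lowest terms
for label, e1, e2 in [("A,B", A, B), ("A,C", A, C), ("B,C", B, C)]:
    p_and = P(lambda w, g: e1(w, g) and e2(w, g))
    p_or = P(lambda w, g: e1(w, g) or e2(w, g))
    print(label, "and:", p_and, "or:", p_or)
```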
- Example 3: A card is drawn from an ordinary deck of 52 cards. Events
defined are: A=Black card, B=Spade,
and C=Ace. Find P[A], P[B], P[C], P[Not A], P[Not B], P[Not
C], P[A or B],
P[A or C], P[B or C], P[A and B], P[A and C], P[B and C].
- The
deck has 26 black cards, so P[A]=26/52, it has 13
spades, so P[B]=13/52, and it has 4 aces, so P[C]=4/52.
- In
the deck 13 cards are both black and spades, so P[A and
B]=13/52, 2 cards are both black
and aces, so P[A and C]=2/52, and one card is both a
spade and an ace, so P[B and C]=1/52.
- In
an ordinary deck 26 cards are either black or a spade
(or both), so P[A or B]=26/52, 28 cards are either black
or an ace (or both), so P[A or C]=28/52, and 16 cards
are either a spade or an ace (or both), so P[B or
C]=16/52.
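A quick way to check the card counts in Example 3 is to build the 52-card deck as (rank, suit) pairs and count. This Python sketch uses illustrative names (`RANKS`, `SUIT_COLOR`, `black`, `spade`, `ace`). The results print in lowest terms, for example 28/52 as 7/13.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUIT_COLOR = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = list(product(RANKS, SUIT_COLOR))   # 52 (rank, suit) pairs

def P(event) -> Fraction:
    """Fraction of the 52 equally likely cards that satisfy the event."""
    return Fraction(sum(1 for rank, suit in deck if event(rank, suit)), len(deck))

def black(rank, suit): return SUIT_COLOR[suit] == "black"   # event A
def spade(rank, suit): return suit == "spades"              # event B
def ace(rank, suit): return rank == "A"                     # event C

print(P(black), P(spade), P(ace))
print(P(lambda r, s: black(r, s) or ace(r, s)))   # 28/52, printed as 7/13
print(P(lambda r, s: spade(r, s) or ace(r, s)))   # 16/52, printed as 4/13
```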
- Example 4: A fair die is rolled and a fair coin is flipped the
number of times shown on the die, e.g. if the die lands with a
four facing up, the coin is flipped 4 times, if the die lands 1,
the coin is flipped once, etc. List a few sample points for
this experiment. How many sample points are there in the
sample space of this experiment? Should each sample point be
assigned the same probability measure?
- Example 5: A fair coin is tossed. If this coin comes up
heads, a fair coin is tossed, otherwise, a
coin with probability of heads equal to 2/3 is tossed. List
all of the sample points in the experiment. Should each
sample point be assigned the same probability?
- Example 6: Pick 2 balls without replacement from a container with 4 balls
numbered 1 through 4. Also,
balls numbered 1 and 2 are green and balls numbered 3 and 4
are red. List all of the sample points in the experiment.
What probability should be assigned to each of the sample points?
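One way to check your answers to Examples 4 and 6 is to enumerate the sample points in a program. In this sketch, a sample point for Example 4 is written as (die value, sequence of coin results). Its probability is P[die shows d] times (1/2)^d. For Example 6, the draws are treated as ordered. That is an assumption: treating them as unordered pairs gives 6 sample points instead of 12, each with probability 1/6.

```python
from fractions import Fraction
from itertools import permutations, product

# Example 4: a sample point is (die value d, sequence of d coin results)
example4 = {}
for d in range(1, 7):
    for flips in product("HT", repeat=d):
        # P[die shows d] times P[this particular sequence of d fair flips]
        example4[(d, "".join(flips))] = Fraction(1, 6) * Fraction(1, 2) ** d

print(len(example4))                                 # 2 + 4 + 8 + 16 + 32 + 64 = 126
print(example4[(1, "H")], example4[(6, "HHHHHH")])   # 1/12 and 1/384: not equal
assert sum(example4.values()) == 1

# Example 6: ordered draws of 2 of the balls numbered 1-4, without replacement
example6 = list(permutations(range(1, 5), 2))
print(len(example6), Fraction(1, len(example6)))     # 12 points, each 1/12
```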
-
Conditional Probability and Independence
- Rules for Conditional Probability
- P[A|B] = P[A and B] / P[B] (for P[B] not equal to 0)
- or P[A and B] = P[A | B] P[B] or P[A and B] = P[B | A] P[A]
- Conditional
Probability Calculations
- Consider drawing two cards, one after another without
replacement, from an ordinary deck. On drawing the first
card the sample space consists of 52 cards. However, once
the first card is drawn, the sample space for the second draw
consists of the 51 cards that remain. Then, for example, the
probability that the 2nd card is a heart given that the first card
was a heart is 12/51, while the probability that the 2nd card is a
heart given that the first card was not a heart is 13/51.
These probabilities are called conditional probabilities.
They are denoted by
P[2nd is heart | 1st is heart] and P[2nd is
heart | 1st is not heart].
The vertical symbol | means
given.
- A
ball is selected from a container that holds 6 red and
10 green balls. Then a second ball is selected
from the container without replacing the first ball.
The probability that the second ball is green given that
the first ball is green is 9/15 while the probability
that the second ball is green given that the first ball
is red is 10/15. These probabilities can be
written as
P[2nd ball green | 1st ball green]=9/15 and P[2nd ball
green | 1st ball red]=10/15
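The conditional probabilities in the two examples above can be reproduced by counting ordered pairs of draws. When all sample points are equally likely, P[A | B] = P[A and B]/P[B] reduces to a ratio of counts. The helper `conditional` in this sketch is illustrative, not from the text. The card deck is simplified to "heart" and "other" because only hearts matter here.

```python
from fractions import Fraction
from itertools import permutations

def conditional(space, a, b) -> Fraction:
    """P[A | B] = P[A and B] / P[B]; with equally likely points this is a ratio of counts."""
    n_b = sum(1 for pt in space if b(pt))
    n_ab = sum(1 for pt in space if a(pt) and b(pt))
    return Fraction(n_ab, n_b)

# Cards: 13 hearts among 52; ordered pairs of distinct cards are equally likely
deck = ["heart"] * 13 + ["other"] * 39
card_pairs = list(permutations(range(52), 2))
def first_heart(p): return deck[p[0]] == "heart"
def second_heart(p): return deck[p[1]] == "heart"
print(conditional(card_pairs, second_heart, first_heart))                    # 12/51 = 4/17
print(conditional(card_pairs, second_heart, lambda p: not first_heart(p)))   # 13/51

# Balls: 6 red and 10 green
balls = ["red"] * 6 + ["green"] * 10
ball_pairs = list(permutations(range(16), 2))
def second_green(p): return balls[p[1]] == "green"
print(conditional(ball_pairs, second_green, lambda p: balls[p[0]] == "green"))  # 9/15 = 3/5
print(conditional(ball_pairs, second_green, lambda p: balls[p[0]] == "red"))    # 10/15 = 2/3
```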
- A
population is classified by gender and political party
affiliation. The membership of the population is
shown in the following table with gender along the top
and political party along the left side of the table.
The interior of the table gives the number of people in
each classification.
Party | Female | Male
Democrat | 45 | 50
Republican | 60 | 44
Independent | 10 | 15
A person is selected at random from this population.
The following ordinary probabilities can be found
directly from the table: P[Democrat]=95/224, P[Female]=115/224,
P[Democrat and Female]=45/224. Also, the following
conditional probabilities can be found directly from the
table by working with the appropriate row or column of
the table. P[Democrat | Female]= 45/115, P[Female
| Democrat]= 45/95, and P[Independent | Male]= 15/109.
From this example, notice that the following are
true:
(1) P[Female | Democrat] = P[Female and Democrat]/P[Democrat]
(2) P[Independent and Male] = P[Independent | Male]
P[Male].
The two rules, P[A | B] = P[A and B]/P[B] (for P[B]
unequal to 0) and P[A and B] = P[A | B] P[B] are true
for any two events, A and B.
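Both rules can be checked against the population table with exact fractions. In this Python sketch, `counts` is just the table above typed in. `P` is an illustrative helper that adds up the matching cells and divides by the total of 224.

```python
from fractions import Fraction

counts = {
    ("Democrat", "Female"): 45, ("Democrat", "Male"): 50,
    ("Republican", "Female"): 60, ("Republican", "Male"): 44,
    ("Independent", "Female"): 10, ("Independent", "Male"): 15,
}
total = sum(counts.values())   # 224 people

def P(party=None, gender=None) -> Fraction:
    """Probability that a randomly chosen person matches the given party and/or gender."""
    n = sum(c for (pa, ge), c in counts.items()
            if (party is None or pa == party) and (gender is None or ge == gender))
    return Fraction(n, total)

# Rule (1): P[Female | Democrat] = P[Female and Democrat] / P[Democrat]
assert P("Democrat", "Female") / P("Democrat") == Fraction(45, 95)
# Rule (2): P[Independent and Male] = P[Independent | Male] P[Male]
assert Fraction(15, 109) * P(gender="Male") == P("Independent", "Male")
print(P("Democrat"), P(gender="Female"), P("Democrat", "Female"))
```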
- Independence
of Events: Events A and B are independent if and only if P[A |
B] = P[A]
- The definition of independence can be restated: A and B are
independent if and only if the occurrence of one of them, say
B, does not affect the probability that the other, A, occurs.
- Since P[A and B] = P[A | B] P[B], and since for independent
events, P[A | B]=P[A],
P[A and B] = P[A] P[B]
- In tossing a fair coin twice, the probability of event A,
getting heads on the first toss is 1/2. The probability
of event B, getting heads on the second toss is also 1/2.
The probability of event A and B, getting heads on the first
and second toss is 1/4. In this case P[A and B] = P[A]
P[B], so A and B are independent events.
- Suppose two balls are drawn without replacement from a container
that holds 6 red and 10 green balls. The probability of
event A, that the first ball picked is red, is 6/16. The
probability of event B, that the second ball picked is red, is
also 6/16. This second probability may seem surprising,
but it follows from considering both possible colors of the first ball:
P[B] = (6/16)(5/15) + (10/16)(6/15) = 30/240 + 60/240 = 90/240 = 6/16.
The probability of event
A and B, that both balls are red, is (6/16)(5/15). In
this case P[A and B] is not equal to P[A] P[B], that is, A
and B are not independent events.
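The surprising value P[B] = 6/16 and the failure of independence can both be confirmed by listing all 16 × 15 = 240 equally likely ordered draws. The function names in this Python sketch are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["red"] * 6 + ["green"] * 10
space = list(permutations(range(16), 2))   # 240 equally likely ordered draws

def P(event) -> Fraction:
    """Fraction of the ordered draws that satisfy the event."""
    return Fraction(sum(1 for d in space if event(d)), len(space))

def first_red(d): return balls[d[0]] == "red"    # event A
def second_red(d): return balls[d[1]] == "red"   # event B

p_a = P(first_red)
p_b = P(second_red)                                  # equals P[A] = 6/16
p_ab = P(lambda d: first_red(d) and second_red(d))   # (6/16)(5/15)
print(p_a, p_b, p_ab, p_a * p_b)                     # 3/8 3/8 1/8 9/64
print("independent?", p_ab == p_a * p_b)             # False
```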
- If the engines on a jet airplane are identical in their
ability to operate throughout a particular flight, if the
probability that any one of the engines works properly for a
particular flight is 0.97, if the engines operate
independently of one another, and if the plane can make this
particular flight safely if at least one engine continues
working, what is the probability that a jet plane with four of
these engines is able to make this flight safely?
$$P[\text{Makes flight safely}] = 1 - P[\text{Doesn't make flight safely}] = 1 - P[\text{all engines fail}] = 1 - (0.03)^4 = 0.99999919$$
The key step was finding the probability that all engines fail.
Each engine fails with probability 1 - 0.97 = 0.03, and this failure
probability was multiplied by itself 4 times. The multiplication is allowed by the
assumption that the engines operate independently.
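The engine calculation generalizes to any number of engines. This short Python sketch assumes, as the example does, that engines fail independently and each works with the same probability. `p_safe` is a hypothetical helper name.

```python
def p_safe(p_work: float, engines: int) -> float:
    """P[at least one of `engines` engines works], assuming independent engines."""
    p_all_fail = (1 - p_work) ** engines   # independence: multiply single-engine failure probabilities
    return 1 - p_all_fail

for k in (1, 2, 4):
    print(k, "engine(s):", round(p_safe(0.97, k), 8))
```

Each extra engine multiplies the chance of total failure by another factor of 0.03.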
-
Random Variables
- Definition--A random variable is a quantitative variable whose
value is determined by some chance mechanism. Examples are
the total number of heads in 10 tosses of a fair coin, the toss
number of the first head in 10 tosses of a fair coin, the number
of red balls selected when drawing 2 balls (without replacement) from a container that holds 8
red balls and 20 green balls.
- Discrete and Continuous Random Variables
- Discrete random variables can only have a finite or
countably infinite number of values. Examples are the number of heads
in 10 tosses of a fair coin, the toss number of the first head
if a fair coin is tossed until a head appears, and the number
of red balls selected in the example given above.
- Continuous random variables can assume any of an uncountably
infinite set of values. For example, if a point is
picked at random from the interval [0,1], there are
uncountably infinitely many values that could be picked.
- Probability Distribution of a Discrete Random Variable--Each
discrete random variable can only assume a finite or countably infinite number of values.
If a table is made associating the probability of each value with
the value, the table or association is the probability
distribution of that random variable. As an example consider
tossing a fair coin 3 times. A random variable that can be
associated with this experiment is a count of the number of heads
in the 3 tosses. Call this random variable T. T can
assume any of the values 0, 1, 2, or 3. The probabilities
associated with each of these values are P[T=0]=1/8, P[T=1]=3/8,
P[T=2]=3/8, and P[T=3]=1/8. These probabilities make up the
probability distribution of this random variable. In table
form, this probability distribution is:
t | 0 | 1 | 2 | 3
P[T=t] | 1/8 | 3/8 | 3/8 | 1/8
Notice that the sum of the probabilities is 1. This is true
for any discrete random variable.
Repeat the experiment of
tossing a fair coin 3 times, 1000 times in all, updating the display after every
100 runs. To do this, follow the link here,
and when the page opens click the red die in front of number
4. Set the number of coins at 3. After running the
experiment 1000 times, what can you say about the theoretical
(in blue) and actual (in red) probability distributions of the
number of heads?
In all examples of discrete random
variables, the probabilities in the probability distribution table
give the 'long-term' proportion of times that the random variable
assumes each possible value.
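The same comparison of theoretical and observed distributions can be sketched in Python. The theoretical distribution of T comes from counting the 8 equally likely toss sequences. The observed one comes from 1000 simulated runs of 3 tosses. The seed is arbitrary, so your observed frequencies will differ somewhat from run to run.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Theoretical distribution of T = number of heads in 3 fair tosses
outcomes = list(product("HT", repeat=3))
dist = {t: Fraction(sum(1 for o in outcomes if o.count("H") == t), len(outcomes))
        for t in range(4)}
assert sum(dist.values()) == 1   # probabilities of a discrete random variable sum to 1

# Relative frequencies from 1000 simulated runs of 3 tosses
rng = random.Random(4)
runs = 1000
observed = Counter(sum(rng.random() < 0.5 for _ in range(3)) for _ in range(runs))
for t in range(4):
    print(f"t={t}: theoretical {dist[t]}, observed {observed[t] / runs:.3f}")
```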
- Example 1: Find the probability distribution of the number of
red balls selected if two balls are selected (without
replacement) from a container which has 4 balls numbered 1 through
4, with balls numbered 1 and 2 red balls and balls numbered 3
and 4 green balls.
- Example 2: Find the probability distribution of the number of
red balls selected if two balls are selected (with
replacement) from a container which has 4 balls numbered 1 through
4, with balls numbered 1 and 2 red
balls and balls numbered 3
and 4 green balls.
- Example 3: Find the probability distribution of the number of
red balls selected if two balls are selected (with
replacement) from a container with 20 balls, 10 of them red and 10
green.
- Example 4: Find the probability distribution of the number of
red balls selected if two balls are selected (without
replacement) from a container with 20 balls, 10 of them red and 10
green.
For examples 3 and 4, you can use this link
to a simulation of the situation. When the page opens
click the red die in front of number 4 to open the simulation.
When the simulation opens, set N to 20, set R, the number of
red balls, to 10, and set n, the sample size, to 2. Select with or
without replacement as appropriate for the example. The blue
graph and the text below it will show probabilities for each
number of red balls.
- Example 5: In examples 3 and 4 decrease the number of red balls.
What happens to the probability distribution of the number of red
balls in the sample? Is this expected? Also, for each
number of red balls, observe the differences in the probability
distribution with and without replacement. Next, set the
number of red balls at 10, use sampling with or without
replacement, and run the simulation of drawing 2 balls, 1000
times, updating every 100 times. What do you see?
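If you want to check Examples 1 through 4 without the applet, you can list every equally likely ordered draw of 2 balls, just as the sample space section does. `red_count_distribution` is an illustrative helper. Sampling with replacement allows the same ball to be drawn twice, which gives `product` pairs. Sampling without replacement requires two different balls, which gives `permutations` pairs.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def red_count_distribution(colors: list[str], replace: bool) -> dict[int, Fraction]:
    """Distribution of the number of red balls in 2 draws, found by listing the
    equally likely ordered draws (pairs of ball positions)."""
    positions = range(len(colors))
    if replace:
        draws = list(product(positions, repeat=2))   # the same ball may be drawn twice
    else:
        draws = list(permutations(positions, 2))     # two different balls
    tally = Counter(sum(colors[i] == "red" for i in d) for d in draws)
    return {k: Fraction(tally[k], len(draws)) for k in range(3)}

small = ["red", "red", "green", "green"]   # Examples 1 and 2: balls 1, 2 red; 3, 4 green
large = ["red"] * 10 + ["green"] * 10      # Examples 3 and 4
for label, colors in [("4 balls", small), ("20 balls", large)]:
    for replace in (False, True):
        dist = red_count_distribution(colors, replace)
        shown = ", ".join(f"P[{k} red]={p}" for k, p in dist.items())
        print(f"{label}, {'with' if replace else 'without'} replacement: {shown}")
```

To explore Example 5, change how many entries of `large` are `"red"` and rerun the loop.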
- Mean
(also called Expected Value) and Standard Deviation of a Discrete Random Variable
Consider again the count of heads in 3 tosses of a fair
coin. If this experiment is repeated, say 10 times, and
the
number of heads in each series of 3 tosses is counted, you
will have a set of numbers like 0,1,3,1,2,2,1,1,3,0. The
average of these numbers is the observed average value of the random
variable, the number of heads in 3 tosses of a fair coin, over those 10 runs.
To see what happens in a larger number of runs of the
experiment, again follow the link here,
and when the page opens click the red die in front of number
4. Set the number of coins at 3 and run the experiment
100 times. What is the average number of heads per 3
tosses? Now reset and run the experiment 1000 times.
What is the average number of heads per 3 tosses?
The long-term average number of heads is called the
expected value of the random variable, the number of heads in
3 tosses of a fair coin. This expected value can be
found for most random variables. Think of expected value
as the average value of a random variable.
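The applet exercise above can also be imitated in code. This Python sketch averages the number of heads per run of 3 simulated fair tosses over increasingly many runs, so you can watch the average settle near a single value. `average_heads` is a hypothetical helper, and the seed is arbitrary.

```python
import random

def average_heads(runs: int, seed: int = 5) -> float:
    """Average number of heads per run of 3 fair-coin tosses over `runs` runs."""
    rng = random.Random(seed)
    total = sum(sum(rng.random() < 0.5 for _ in range(3)) for _ in range(runs))
    return total / runs

for runs in (10, 100, 1000, 100_000):
    print(runs, "runs: average heads per 3 tosses =", average_heads(runs))
```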
There is an easier way to find the expected value of this
(or any) discrete random variable. Suppose the experiment of
tossing the coin 3 times is repeated a large number, N, of times,
and that it ends in 0 heads $n_0$ times, in 1 head $n_1$ times, in 2 heads $n_2$ times,
and in 3 heads $n_3$ times.
The total number of heads is $0 n_0 + 1 n_1 + 2 n_2 + 3 n_3$, and
the average number of heads per run of the experiment is
$$\frac{0 n_0 + 1 n_1 + 2 n_2 + 3 n_3}{N} = 0\left(\frac{n_0}{N}\right) + 1\left(\frac{n_1}{N}\right) + 2\left(\frac{n_2}{N}\right) + 3\left(\frac{n_3}{N}\right)$$
For large N, $n_0/N \approx P[0 \text{ Heads}]$, $n_1/N \approx P[1 \text{ Head}]$,
$n_2/N \approx P[2 \text{ Heads}]$, and $n_3/N \approx P[3 \text{ Heads}]$, so the average
number of heads per run of the experiment is approximately
$$0\,P[0 \text{ Heads}] + 1\,P[1 \text{ Head}] + 2\,P[2 \text{ Heads}] + 3\,P[3 \text{ Heads}]$$
This is called the Expected Value or Mean and is denoted, for a
general random variable X, by E[X]. It can be computed
by
$$\mu = E[X] = \sum_x x \, P[X=x],$$
where the sum runs over all possible values x of X.
Using this formula on the random variable T, the total
number of heads in 3 tosses of a fair coin, you get
µ=E[T] = 0 (1/8) + 1 (3/8) + 2 (3/8) + 3 (1/8) = 12/8 = 3/2 =
1.5. This can be interpreted as the average number of
heads per sequence of 3 tosses if the experiment is repeated a
large number of times.
Just as you can find the average value of a random
variable, you can also find its standard deviation. For a
discrete random variable X with mean µ, the
standard deviation is given by
$$\sigma = SD[X] = \sqrt{\sum_x (x - \mu)^2 \, P[X=x]} = \sqrt{\sum_x x^2 \, P[X=x] - \mu^2}$$
For random variable T, the total number of
heads in 3 tosses of a fair coin, the standard deviation
computed by the rightmost formula is
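$$SD[T] = \sqrt{0^2\left(\tfrac{1}{8}\right) + 1^2\left(\tfrac{3}{8}\right) + 2^2\left(\tfrac{3}{8}\right) + 3^2\left(\tfrac{1}{8}\right) - \left(\tfrac{3}{2}\right)^2} = \sqrt{\tfrac{3}{4}} = \frac{\sqrt{3}}{2} \approx 0.866$$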
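Both formulas can be applied directly to the probability distribution table of T. This Python sketch uses exact fractions. It confirms that the definition, the sum of (t - µ)² P[T=t], and the shortcut, the sum of t² P[T=t] minus µ², give the same variance before taking the square root.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of T (heads in 3 fair tosses), from the table above
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(t * p for t, p in dist.items())                     # E[T] = sum of t P[T=t]
var_def = sum((t - mean) ** 2 * p for t, p in dist.items())    # sum of (t - mu)^2 P[T=t]
var_short = sum(t**2 * p for t, p in dist.items()) - mean**2   # sum of t^2 P[T=t] - mu^2
assert var_def == var_short == Fraction(3, 4)
print(mean, var_def, sqrt(var_def))   # mean 3/2, variance 3/4, SD about 0.866
```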
-
The Binomial Random Variable
- Definition--A
binomial random variable with parameters n and p is a count of
the number of successes in n experiments (or trials) that satisfy the following conditions:
- Each
trial can result in only two outcomes, a success, S, or
a failure, F.
- The
probability of a success on any trial is P[S] = p and
the probability of a failure on any trial is P[F] = 1-p
= q
- The
outcome of any trial has no effect on outcomes of other
trials.
- The
binomial random variable is a count of the number of
successes in n trials.
An example is the toss of a fair coin 3 times. If you
think of a success as a head, the count of the number of heads
in 3 tosses satisfies the definition of a binomial random
variable. Here n=3 and p=1/2.
Another example is the drawing of 2 balls with replacement
from a container with 20 balls, 10 of which are red and 10
white. A red ball is considered a success.
Then the count of red balls is a binomial random variable with
n=2 and p=1/2. If the drawing is without replacement,
the probabilities of a red ball change from draw to draw, so
the count of red balls is no longer a binomial random
variable. However, when the number of balls drawn is
much smaller than the total number of balls in the container,
the count of red balls will be distributed almost like a
binomial random variable.
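The claim that drawing without replacement behaves almost like the binomial case when the container is large can be checked numerically for 2 draws from a half-red container. In this Python sketch, the without-replacement probabilities come from the rule P[A and B] = P[A] P[B | A]. The with-replacement probabilities are the binomial values q², 2pq, p² for n=2. The container sizes N are illustrative choices.

```python
from fractions import Fraction

def without_replacement(N: int, R: int) -> list[Fraction]:
    """[P[0 red], P[1 red], P[2 red]] for 2 draws without replacement,
    using P[A and B] = P[A] P[B | A]."""
    both = Fraction(R, N) * Fraction(R - 1, N - 1)
    neither = Fraction(N - R, N) * Fraction(N - R - 1, N - 1)
    return [neither, 1 - both - neither, both]

def with_replacement(N: int, R: int) -> list[Fraction]:
    """Binomial probabilities with n=2 and p=R/N: [q^2, 2pq, p^2]."""
    p = Fraction(R, N)
    q = 1 - p
    return [q**2, 2 * p * q, p**2]

def max_gap(N: int) -> float:
    """Largest difference between the two distributions for a half-red container of N balls."""
    R = N // 2
    return float(max(abs(a - b) for a, b in zip(without_replacement(N, R), with_replacement(N, R))))

for N in (20, 200, 2000):
    print(f"N={N}: largest difference = {max_gap(N):.5f}")
```

The gap shrinks roughly in proportion to 1/N, which matches the statement that the count is nearly binomial when only a few balls are drawn from many.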
- Probability
Distribution of a Binomial Random Variable with parameters n
and p.
Since the binomial random variable is a count of the number of
successes in n trials, the number of successes can only be an
integer between 0 and n. Thus, if X is the total number
of successes in n trials, P[X=k] will only be nonzero for
k=0,1,2,3,...,n. The formula for P[X=k] is
$$P[X=k] = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n$$
where
|