Operant Conditioning
Thorndike's Law of Effect: behaviors followed by satisfying consequences tend to be repeated, while behaviors followed by unpleasant consequences tend not to be.
e.g. Thorndike discovered this by testing cats in a "puzzle box": over repeated trials, the cats escaped the box more and more quickly.
B.F. Skinner was inspired by Thorndike's work. He defined this type of learning as
Operant Conditioning: learning in which the consequences of a behavior change how likely that behavior is to occur again.
e.g. Skinner box: rats will press a bar if they then receive food
pigeons will peck at a disk to get food.
Behavioral Contingencies:
1. Positive reinforcement: presenting a pleasant stimulus after a behavior, which makes the behavior more likely in the future.
e.g. giving a dog a treat for sitting when told to "sit"
giving praise to a child who helped with household chores
Premack principle: a more-preferred (higher-probability) activity can be used to reinforce a less-preferred (lower-probability) activity.
e.g. "Eat your vegetables and then you can have dessert."
e.g. "Finish your homework and then you can play outside."
Primary Reinforcer: a stimulus that is naturally reinforcing because it satisfies a biological need.
e.g. food, water, warmth
Secondary Reinforcer: a stimulus that acquires its reinforcing power through association with a primary reinforcer.
e.g. money
e.g. Wolfe (1936) showed chimpanzees could be reinforced by giving them poker chips which could be used to "buy" food out of a vending machine called the "chimp-o-mat".
How do you reinforce a behavior that an individual does not yet perform?
e.g. rats pressing a lever -- they don't do it naturally
Shaping: reinforcing successive approximations of the target behavior until the full behavior is performed.
e.g. training a rat to press a lever, step by step (see the sketch after these steps):
1st reinforce rats for looking in the direction of the lever
2nd reinforce rats for looking at the lever and standing on that side of the cage
3rd reinforce rats for touching the lever
4th reinforce rats for pressing the lever
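This is the sketch referred to above: a minimal Python simulation of shaping, which reinforces the current approximation and tightens the criterion once it has been met reliably. It is not from the lecture; the mastery rule (3 successes in a row) and the chance of meeting a criterion on any given trial are illustrative assumptions.

```python
import random

# Successive approximations toward lever pressing, easiest first
# (matching the four steps listed above).
CRITERIA = [
    "looks toward the lever",
    "stands on the lever side of the cage",
    "touches the lever",
    "presses the lever",
]

def shape(trials=500, mastery=3, p_meet=0.5):
    """Reinforce the current approximation; tighten the criterion
    (move to the next step) once it is met `mastery` times in a row."""
    step, streak = 0, 0
    for t in range(trials):
        # Stand-in for watching the rat: did it happen to meet the
        # current criterion on this trial?
        if random.random() < p_meet:
            print(f"trial {t}: rat {CRITERIA[step]} -> food pellet")
            streak += 1
            if streak == mastery:
                step, streak = step + 1, 0
                if step == len(CRITERIA):
                    print("lever pressing established")
                    return
        else:
            streak = 0  # criterion not met, no reinforcer this trial

random.seed(1)
shape()
```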
Chaining: linking a series of learned behaviors together so that each response leads to the next, with reinforcement delivered at the end of the chain.
e.g. getting a rat to perform a complex series of behaviors that is reinforced only when the whole sequence is completed
Schedules of Reinforcement:
a) continuous schedule of reinforcement: every occurrence of the behavior is reinforced.
e.g. a rat in a Skinner box receives a food pellet every time it presses the bar
e.g. a candy vending machine delivers candy every time you put money in
b) partial schedules of reinforcement: only some occurrences of the behavior are reinforced (a short sketch of all four types follows the examples below).
i) fixed-ratio schedule of reinforcement: reinforcement is delivered after a fixed number of responses.
e.g. give a rat a food pellet every 3rd time it presses the lever
ii) variable-ratio schedule of reinforcement: reinforcement is delivered after a number of responses that varies around an average.
e.g. give a rat a food pellet on average every 3rd time it presses the lever.
1st time: after 4 presses
2nd time: after 2 presses
3rd time: after 3 presses
e.g. slot machines in casinos -- win money after an unpredictable number of times playing
e.g. Skinner was able to get pigeons to peck up to 10,000 times to get a single pellet of food.
iii) fixed-interval schedule of reinforcement: the first response after a fixed amount of time has passed is reinforced.
e.g. give a rat a food pellet for the first lever press after 30 seconds have elapsed; any presses before then produce no reward.
e.g. receiving your mail at exactly the same time every day: if you check your mailbox before 2 pm you get no reward (no mail), but if you check after 2 pm you get the reward (mail).
iv) variable-interval schedule of reinforcement: the first response after an amount of time that varies around an average is reinforced.
e.g. a rat gets a food pellet for the first lever press after an interval that averages 30 seconds.
1st time: after 45 seconds
2nd time: after 30 seconds
3rd time: after 15 seconds
e.g. fishing -- you will catch a fish after an unpredictable amount of time.
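This is the sketch referred to above: a minimal Python illustration of the decision rule behind each of the four partial schedules, asking on every response (or check) whether a reinforcer is delivered. It is not from the lecture; the class names are made up for illustration, the 3-response ratios and 30-second intervals simply mirror the rat examples, and the way the variable schedules randomize around their averages is an assumption.

```python
import random
import time

class FixedRatio:
    """Reinforce exactly every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """Reinforce after a number of responses that averages n."""
    def __init__(self, n):
        self.n, self.required, self.count = n, random.randint(1, 2 * n - 1), 0
    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, random.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """Reinforce the first response after `secs` seconds have passed."""
    def __init__(self, secs):
        self.secs, self.start = secs, time.monotonic()
    def respond(self):
        if time.monotonic() - self.start >= self.secs:
            self.start = time.monotonic()
            return True
        return False

class VariableInterval:
    """Reinforce the first response after an interval averaging `secs`."""
    def __init__(self, secs):
        self.mean = secs
        self.start, self.wait = time.monotonic(), random.uniform(0, 2 * secs)
    def respond(self):
        if time.monotonic() - self.start >= self.wait:
            self.start, self.wait = time.monotonic(), random.uniform(0, 2 * self.mean)
            return True
        return False

# e.g. a fixed-ratio-3 schedule pays off on exactly every 3rd press:
fr3 = FixedRatio(3)
print([fr3.respond() for _ in range(9)])  # [False, False, True, False, False, True, ...]
```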
Behavioral Response Patterns to each type of Reinforcement Schedule:
e.g. students study hard for an exam, and after the exam they typically wait a while before starting to study that subject again.
- the pattern of pausing after reinforcement differs for fixed-ratio and fixed-interval schedules -
e.g. the fixed-interval schedule produces a "scallop" effect: because another reinforcer is never available immediately after the last one, responding does not resume right away; as the time of the next reinforcement approaches, responding gradually increases.
2nd type of behavioral contingency:
2. Negative Reinforcement: removing or avoiding an unpleasant stimulus after a behavior, which makes the behavior more likely in the future.
e.g. the annoying seat-belt buzzer in a car stops when you buckle your seat belt, so buckling up is strengthened
a) escape learning - learning a response that ends an aversive stimulus that is already present.
e.g. buckling your seat belt to stop the buzzer that is already sounding
b) avoidance learning - learning a response that prevents the aversive stimulus from occurring at all.
e.g. buckling your seat belt as soon as you get in the car, before the buzzer ever sounds
other examples:
i) you are outside and the bright sun is stinging your eyes. You put on your sunglasses and your eyes instantly feel better.
ii) a pale man is going to the beach. He puts on a hat and sunblock before spending time in the sun so he won't get sunburned.
iii) a woman gets acid indigestion from eating a hamburger at her favorite restaurant, and she quickly takes an antacid to relieve the pain. The next week, she takes some antacid before she eats the burger.
3. Punishment: presenting an unpleasant consequence (or taking away a pleasant one) after a behavior, which makes the behavior less likely in the future.
e.g. put a child in "time out" in the corner for throwing a temper tantrum. "Time out" is not fun and the child doesn't like it. So in the future they are less likely to throw a tantrum.
e.g. squirting a cat with water when it jumps up on a table where it is not allowed.
Punishment is often ineffective. The following make punishment most effective:
e.g. the punishment is delivered immediately after the unwanted behavior
e.g. the punishment is delivered consistently, every time the behavior occurs
e.g. the punishment is strong enough to matter but not excessive
e.g. the punishment is paired with reinforcement of an acceptable alternative behavior