Edward Thorndike

Operant conditioning grew out of the work of Edward Thorndike and his theory of instrumental learning, summarized in the "law of effect." The law of effect states that responses followed by satisfaction are more likely to recur in the future.

Thorndike observed that cats put into a box would try different strategies to escape. The strategies that worked were strengthened and those that did not work were eliminated.

Thorndike's work was elaborated by B.F. Skinner, who at first called his approach radical behaviorism to distinguish it from Watson's ideas. He later renamed it operant conditioning, which he considered more descriptive.

In Skinner's theory, a response leads to one of two consequences, which determine whether the response will be repeated.

The first is reinforcement, which increases the probability that the behavior will be repeated.

The second is punishment, which decreases the probability that the behavior will occur.

Operant conditioning focuses on strengthening or weakening voluntary behaviors, while classical conditioning focuses on involuntary, automatic behaviors.

"Image of Edward Thorndike"Edward Thorndike

Reinforcers

There are positive reinforcers, negative reinforcers, and punishments. Positive reinforcers increase the occurrence of the behavior because they are rewards. Food, praise, and money can all work as positive reinforcers.

A negative reinforcer increases the behavior because it takes something bad away.

For example, you clean your room so that your parents stop nagging you. There is no reward in cleaning but something bad is taken away (being nagged) and the behavior of cleaning your room increases.

"Image of a young boy being offered cookies as a reward"

A punishment is an unwanted consequence following a behavior.

When training a dog, spanking him for making a mistake in the house is an example of punishment. The spanking is something unwanted and should decrease the behavior.
Reinforcers can also be primary or secondary.

Primary reinforcers satisfy biological needs (hunger, thirst, and sleep). They are not learned.

Secondary reinforcers have to be learned. Examples are praise and encouragement.

"Image of a dog being scolded by its owner"

Skinner

B.F. Skinner worked mainly with animals. He constructed a special training chamber, now known as a Skinner box, to train animals to perform various activities based on reward and punishment.

His animals of choice were pigeons and rats since he did most of his work in laboratories.
He got the animals to press bars to dispense food, shocked them to see how strong the food drive was, and got them to perform simple routines. He even got a pigeon to play a toy piano.

Reinforcement Schedules

Skinner also talked about the importance of reinforcement schedules.
He stated that when teaching a new behavior, continuous reinforcement is necessary during the acquisition phase.

Every target (wanted) behavior is rewarded. After the behavior is learned, he would apply one of four intermittent (partial) reinforcement schedules. Two are based on time and two are based on number of responses.

Fixed-ratio - A fixed number of target responses must be made before a reward is gained. Buy five, get one free is an example of this.

Variable-ratio - The number of target behaviors required for reward keeps changing. The organism never knows when it will or will not be rewarded. Slot machines are a perfect example.

Fixed-interval - The first target response after a fixed interval of time has passed is rewarded. If you are paid once a week on Friday, Fridays become the fixed interval.

Variable-interval - The length of time of the interval changes after each time. If you have a class where the teacher can give a pop quiz at any time…that is a variable interval.
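The four schedules above are simple enough to sketch in code. Here is a minimal, hypothetical Python simulation (the function names and parameters are illustrative, not from the source); each function returns a rule that decides whether a given response earns a reward:

```python
import random

def fixed_ratio(n):
    """Reward every n-th response (e.g. buy five, get one free)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(mean_n):
    """Each response has a 1-in-mean_n chance of reward (like a slot machine)."""
    def respond():
        return random.random() < 1.0 / mean_n
    return respond

def fixed_interval(period):
    """Reward the first response after `period` time units have passed
    since the last reward (like a weekly paycheck every Friday)."""
    last_reward = 0.0
    def respond(t):
        nonlocal last_reward
        if t - last_reward >= period:
            last_reward = t
            return True
        return False
    return respond

def variable_interval(mean_period):
    """Like fixed_interval, but the required wait changes after each
    reward (like a pop quiz that can come at any time)."""
    next_wait = random.uniform(0, 2 * mean_period)
    last_reward = 0.0
    def respond(t):
        nonlocal last_reward, next_wait
        if t - last_reward >= next_wait:
            last_reward = t
            next_wait = random.uniform(0, 2 * mean_period)
            return True
        return False
    return respond

# A fixed-ratio 5 schedule rewards exactly every fifth response:
fr5 = fixed_ratio(5)
print([fr5() for _ in range(10)])
# → [False, False, False, False, True, False, False, False, False, True]
```

Note how the ratio schedules count responses while the interval schedules watch the clock, which mirrors the time-based versus response-based distinction above.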

Which type of reinforcement schedule do you think would work the best once you have taught your dog a new trick?

Watch Operant Conditioning (4:27) and learn more about how it works and how it is different from classical conditioning.

Conditioning

Conditioning accounts for a lot of learning, both in humans and nonhuman species.

In operant conditioning, there is a tendency for conditioning to be hindered by natural instincts. Instinct drift is a good example: over time, learned behaviors drift back toward an animal's natural, instinctive behaviors, which resist retraining.

An example is an experiment conducted by two psychologists, Keller and Marian Breland, who wanted to train raccoons to put a coin in a box. As long as they used only one coin, the raccoons learned to drop the coin in the box. When they gave the raccoons two coins to drop in the box, their natural instincts took over: the raccoons rubbed the two coins together, as they would to clean food before eating.

In classical conditioning, a good example of biological influences on conditioning is taste aversion. An example would be if you watch a movie with a friend and have pizza. A few hours later you get nauseated. You may then have an aversion to pizza.

The combination of taste and nausea seems to be a special case. Researchers think that learning to quickly associate taste and nausea is an evolutionary adaptation, since this association helps people to know what foods to avoid in order to survive.

Summary

Applications of operant conditioning are extensive. Your educational journey has been lined with rewards and punishments. Teaching and learning machines are based on operant conditioning. Animal training is done through punishment and reward, and the use of extreme punishment has been very controversial. Animals are trained to be performers by shaping, that is, rewarding behaviors in small increments until the whole routine is learned. Linking these small behaviors together into a sequence is called chaining.

Operant conditioning can be used to explain a wide variety of behavior, from the process of learning, to addiction and language acquisition. However, operant conditioning fails to take into account the role of inherited and cognitive factors in learning, and thus is an incomplete explanation of the learning process in humans and animals.

 
