By Jeff Stallings, CPDT-KA
How do you achieve a high level of response in dog training?
You might think that dispensing an awesome treat each and every time your dog successfully completes a command would make him more reliable. But in practice, rewarding every trial of a cued behavior (command) results in lower response rates, not higher.
Technically referred to as Continuous Reinforcement (CR), rewarding every iteration causes a behavior to degrade: The dog knows a treat will be waiting, so why hurry?
What keeps a dog at the top of his game with cued behaviors is rewarding randomly—and handsomely.
The Vending Machine vs. the Slot Machine
At the outset of teaching a brand-new command, Continuous Reinforcement (rewarding each success) is the most effective option. For instance, when teaching a puppy to sit, a food reward for each successful sit makes sense because at this point you’re focused on clearly pairing the verbal cue and hand gesture with the behavior. This basically makes you a vending machine: Put the money in (your puppy sits) and the reward appears (treat!).
But staying a vending machine for too long causes the puppy to stop working so hard. Why bother sitting quickly when a treat inevitably appears regardless of how long it takes that butt to hit the ground? Staying on Continuous Reinforcement too long also makes the dog dependent on the food reward: She will refuse to work unless food is present.
Before you get to that point, usually within just a few days of teaching a new command, it’s time to move to some sort of intermittent reinforcement schedule. That means retiring the vending machine and firing up the slot machine.
Entire books are dedicated to reinforcement schedules, and the subject can quickly become very confusing. Dog owners do not want or need to know all the permutations; they just need to know what works. And here it is: Once your dog reliably performs a behavior on cue using Continuous Reinforcement, say, getting it right around 80% of the time, it’s time to shift to Variable Ratio (VR) reinforcement.
Variable Ratio (VR) means you are a slot machine, pure and simple. When you’re in Vegas playing the slots, what keeps you sitting there pumping quarters for hours—besides the free cocktails, of course—is this:
The probability of hitting the jackpot remains constant, even though the number of plays required to hit the jackpot changes.
Variable ratio essentially means that you’re a random number generator, although it doesn’t have to be that formal. Start with a low ratio; VR3 means that you reward, on average, 1 out of every 3 responses. For example:
Your dog earns a treat on successful trials number 1, 2, 7, 9, 15, 18, 19, 23, 29, and 30.
In that example, we have provided 10 rewards over the course of 30 trials, which averages out to 1 in 3, or VR3. You can then increase your ratio, to VR5 for example, which decreases the frequency of reward. Doing so too fast, though, can frustrate your dog (a problem known as “ratio strain”), so take it slow.
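To make the arithmetic concrete, here is a minimal Python sketch of a VR schedule, assuming you simply count successful trials by number. The names `vr_block`, `vr_random`, and `average_ratio` are hypothetical helpers for illustration, not part of any training tool. `vr_block` picks exactly one reward per `ratio` trials at random positions, like the example above; `vr_random` instead rewards each success with a fixed probability of 1/ratio, which mirrors the slot machine’s constant odds (strictly speaking, learning theorists call that variant a random-ratio schedule, a close cousin of VR).

```python
import random


def vr_block(n_trials, ratio, seed=None):
    """Hypothetical helper: choose exactly n_trials // ratio rewarded trials.

    Rewards land at random positions, so the dog can't predict which
    success pays, but they average out to 1 in `ratio` (VR3, VR5, ...).
    """
    rng = random.Random(seed)
    n_rewards = n_trials // ratio
    return sorted(rng.sample(range(1, n_trials + 1), n_rewards))


def vr_random(n_trials, ratio, seed=None):
    """Hypothetical helper: reward each success with probability 1/ratio.

    The odds stay constant on every trial (the slot-machine property);
    the total number of rewards only approximates n_trials / ratio.
    """
    rng = random.Random(seed)
    return [t for t in range(1, n_trials + 1) if rng.random() < 1 / ratio]


def average_ratio(rewarded, n_trials):
    """Successful trials per reward, e.g. 30 / 10 = 3.0, i.e. VR3."""
    return n_trials / len(rewarded)


# The article's example: 10 rewarded trials out of 30.
example = [1, 2, 7, 9, 15, 18, 19, 23, 29, 30]
print(average_ratio(example, 30))  # 3.0

# A fresh VR3 plan for the next 30 successful trials.
print(vr_block(30, 3))
```

Moving up to VR5 is just `vr_block(30, 5)`, which spreads 6 rewards over 30 trials; as with real training, raise the ratio gradually to avoid ratio strain.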
With a given cue and behavior, once you’ve quickly moved past Continuous Reinforcement and your dog is reliably performing on an intermittent (VR) schedule, it’s time to polish the behavior by rewarding only the very best trials. In the acronym-loving world of learning theory, this is referred to as Differential Reinforcement of Excellent Behaviors, or DRE.
Now is when you start getting picky about what you’re willing to reward and become a referee who rewards only the best trials.
For example, to refine a puppy’s sit cue using DRE, reward only when she sits immediately on cue (verbal or gestural), with no delay and without moving from where she is standing. If she delays or takes any steps before sitting, simply turn away, ignore her, and wait a few minutes before trying again.
This is a good time to use very high value food (chicken, cheese, liver) because your dog will be more willing to work, just like you’d be more likely to work holidays if paid double-time-and-a-half.
Keep it Simple
Teach a new command/cued behavior and make it reliable by moving quickly through this hierarchy:
- You’re a vending machine who rewards all successful trials (Continuous Reinforcement)
- You’re a slot machine who rewards random successful trials (Variable Ratio)
- You’re a referee who judges each trial and determines if it is better than at least half of all trials and rewards only those…maybe.
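The three-level hierarchy can also be read as a decision rule applied after each trial. The Python sketch below is one hypothetical way to encode it, under several assumptions: trial quality is measured by latency (seconds from cue to a completed sit, lower is better), “better than at least half of all trials” means faster than the median of earlier successful trials, the 80% readiness threshold comes from the guideline earlier in the article, and the final “maybe” is modeled as a coin flip with an assumed probability `p_maybe`. Names such as `should_reward` and `ready_for_vr` are illustrative, not an established method.

```python
import random
from statistics import median


def ready_for_vr(successes, attempts, threshold=0.8):
    """Assumed readiness check: the dog gets it right ~80% of the time."""
    return attempts > 0 and successes / attempts >= threshold


def should_reward(phase, success, latency, history, rng, ratio=3, p_maybe=0.5):
    """Hypothetical per-trial rule for the three phases.

    phase:   "vending" (Continuous Reinforcement), "slot" (Variable Ratio)
             or "referee" (Differential Reinforcement of Excellent Behaviors).
    latency: seconds from cue to completed behavior (assumed quality score).
    history: latencies of earlier successful trials.
    """
    if not success:
        return False  # failed trials never pay in any phase
    if phase == "vending":
        return True  # every success pays
    if phase == "slot":
        return rng.random() < 1 / ratio  # on average 1 in `ratio` successes
    if phase == "referee":
        # Only trials faster than the median (better than half) qualify...
        if history and latency >= median(history):
            return False
        # ...and even those pay out only sometimes ("maybe").
        return rng.random() < p_maybe
    raise ValueError(f"unknown phase: {phase!r}")


rng = random.Random()
past = [2.5, 1.8, 3.0, 1.2]  # earlier sit latencies, in seconds
print(ready_for_vr(17, 20))  # True (85%)
print(should_reward("referee", True, 1.0, past, rng))  # True or False, at random
```

Because every phase first requires a success, the rule only ever tightens as you move down the list; backing up a step, as the article advises, is just a matter of passing the previous `phase` again.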
Progress through these levels is not rigid and you may combine aspects of more than one as you move through them. Use your intuition and be ready to back up a step if you’ve moved too fast—your dog will tell you. And remember to make the payoff worth it to your dog: Real food (chicken or liver, for example) beats commercial treats any day of the week.
Learning All the Time
These rate-of-reinforcement games are in play all the time in your dog’s mind, whether you’re explicitly training or not. Dogs who beg at the dinner table make an unpredictable number of attempts (begging, pleading, looking very sad/cute) before finally being rewarded. Even relenting once in a great while (Variable Ratio) or accidentally dropping a crumb keeps that dog casino open, and the dog begging.
This is why even inadvertent rewards—the random dropped crumb—can make an undesirable behavior, such as begging, difficult to extinguish. Management is key—avoid begging behavior by keeping your puppy elsewhere during food preparation and meals.
Your dog is learning all the time, not just during formal training sessions. Try to be aware of how random rewards, intentional and inadvertent, affect your dog’s behavior for better or worse.
[A version of this article was printed in the January 2019 Bark Magazine.]