The Catch 22 of Playtesting Games

27 January 2010

While the importance of playtest might seem obvious, the practice took time to catch on and gain acceptance.  In the late ’90s, it was still pretty routine to run into developers who didn’t playtest the games they were working on.

Luckily, I earned my bones at Ensemble, where there was never anything else.  As the only founder with much real experience making games, Bruce Shelley was often responsible for providing us with some clue as to what we should be doing.  Playtesting was the process he had seen work in the past.  (Apparently, the development loop for Civilization involved Sid Meier writing some code, then Bruce playing and making notes, then Sid writing some code, then….)  Ensemble started with it and we never questioned it.

I’ve conducted something north of 2000 sessions for 10 different games during the last 13 years.  The exact manner in which testing is conducted and managed has evolved during that time (and is in fact changing on the game Robot is currently developing) but organized team playtest has remained the foundation for all game development at all times and on all projects.

For good reason too:

It’s very effective.

There’s honestly no other way to get anything good.  A novel is the result of myriad revisions and edits.  Every track on any album is recorded dozens of times and mixed before you hear it.  For every hour of movie you see there are 100 you don’t.   Games are no different.  Playtesting is our re-write.

There is nothing better for team communications.

I don’t care what font or graphics you use or how expertly you manage to distill your message down to the minimum words required, few people get anything from anything they need to read (sadly).  You can send COME TO MY OFFICE FOR FREE GOLD BARS and half the team will delete it unread.  Meetings are no better.  There are 40 people in the room and 40 iPhones on the conference table.  You have five guys trading stocks, five checking to see that their EVE characters are training, and the remainder are seeing if the Words With Friends dictionary will accept “BOOBS”.  You could be reading the phonebook up there.

It is infinitely more likely that someone on your team will be aware of the state of the game — what works, what doesn’t, what needs fixed, what is coming next — if they are playing the game.

It promotes team ownership.

It’s empowering to be asked your opinion on almost anything.  A formal process where you’re requested to play the game you’re working on, render your opinion, suggest features, and so on is massively so.  Particularly if, on occasion, you see your comments converted into changes to the game.

When used correctly, it’s very efficient.

Wars can be fought in two ways, push or pull.  Push fighting is when you make a plan and execute that plan no matter what.  You’re pushing your plan on the world.  You’re going to march from A to B — if you run into a minefield and a machine gun on the way you do not care — you’re going to march from A to B.  Pull warfare is where you allow the battlefield to dictate your actions; you let the situation pull you along.  If you see a minefield and a machine gun on your way to B, you maybe flank around them.

Playtest is the epitome of the “pull” approach.  Every day that you go in and test, you’re using the test to answer this question: What needs to be done?  If you can properly convert the answer into tasks, no other process can hold a candle to it in terms of the ratio of development hours spent to game improvement gained.

It exerts healthy pressure on the team.

If you’re playtesting every day, the game needs to be playable.  Every day.  If it isn’t, the team should (on its own) see this as a problem and work to fix it.  Knowing that everyone is going to be testing the game on a regular basis also inspires some not to check in horrible game-destroying bugs (or at least to fix them quickly once the bitching starts).

Now, all that said, it’s pretty obvious that everything that exists has two sides.  Nothing is all happy with no sad.  There are goldfish who understand this.  For that reason, I feel more or less like a total idiot for having had my first conversation about the shortcomings of playtest just a month ago:

Playtest smooths out all of the rough edges.

Imagine a graph that represents all of the bits of your game.  Above the baseline are all of the good things, below the baseline are all of the bad things.  Something like this:

The greatness of playtest is that it identifies the bad things and gets them fixed.

The awfulness of playtest is that the same magic that permits this also identifies the good things and gets them “fixed”.

You fill in some of the “pits” but you also chop off some of the “peaks”.
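The pit-filling and peak-chopping can be sketched as a toy model (the feature names and scores below are invented for illustration, not from the post): treat each feature’s quality as a signed score, and let each round of consensus-driven feedback pull every score toward the baseline.

```python
# Toy model: signed "quality" scores for hypothetical features.
# Positive = a peak players love, negative = a pit that needs fixing.
features = {"combat": 4.0, "ui": -3.0, "pathing": -5.0, "soundtrack": 5.0}

def playtest_pass(scores, strength=0.5):
    """One round of consensus feedback: every score moves toward zero.

    Fixing the pits (negatives shrink) is the point; blunting the
    peaks (positives shrink too) is the unnoticed side effect.
    """
    return {name: q * (1.0 - strength) for name, q in scores.items()}

after = playtest_pass(features)
# pathing improves (-5.0 -> -2.5), but soundtrack is blunted (5.0 -> 2.5)
```

Run the pass enough times and everything converges on the baseline: the game has no glaring flaws left, and nothing remarkable either.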

This is happening all the time but it generally goes unnoticed because, to many people, any negative comment is something “bad” that needs “fixed”.  If you’re keeping track via a democratic tally, it’s very difficult to determine which decisions belong in which category because there’s always a group that thinks X will increase your sales by 1 million units and another that insists the same feature will result in fans with pitchforks outside your office.  (Typically, both are wrong.)

If you stop to think about it, it’s not difficult to understand how this works.  Think back to whatever blockbuster game last swept your group of friends.  At some point while you were talking about it, someone said something along the lines of “it’s awesome but I hated the way they did X”.  If your buddy had been working on that game, he would have pushed to have that feature cut or changed.  In and of itself, that might not damage whatever awesome game you’re all in love with.  But everyone has one (or more) of these “it was great except for…” items and together they are pure anti-awesome.

It is time consuming to manage.

As mentioned above, it’s very empowering to be asked your opinion and only more so if you perceive that your opinion has resulted in a change.  So much so that when you next opine on something, you might become curious as to what change is going to result.  And when.  And, if a change isn’t going to happen, why the hell not.

It isn’t that difficult to collect feedback and route the tasks produced from it to the right people.  Explaining what is happening and why to everyone involved is another story.  With a team accustomed to development by playtesting and numbering more than 20 or so, it can be easy to spend your week arguing about how any given system should work.  Once you arrive (frequently after several hours) at a solution amenable to one person, the next person you encounter will invariably comment on the bankruptcy certain to take place as a result of that implementation.  At this point, you get to start over and work through all of the points again with someone who wasn’t around the first time.

After the third or fourth conversation of this sort, you’ll start to think that an email or a meeting would be a better way to get this information out to people.  See my comments above on email and meetings.  Feel free to weep.

It provides fertile ground for feature creep.

If your team is playing their game daily during development, by the time it ships they will have logged far more hours with it than the average player ever will (in most cases).  The average player will play it, enjoy it, and move on to something else within a month.  Your team would do the same thing too…if they weren’t making the game.

Because they are making the game, they keep playing it.  But, like the audience, when they start getting bored, they start wanting something new.  The only difference is, “something new” for your team means deleting work that has been done and replacing it with work that needs done.  Work that players aren’t going to appreciate, because they’ll never have the chance to get tired of the feature everyone now finds dull.

It’s easy for people to forget that their perception has been altered by playing a billion hours of the game.  Playtesting will produce pressure to make changes for the sake of change, even with experienced teams.

Used to guide development, it will make your producer and publisher batshit insane.

Despite irrefutable evidence that they are nothing more than a giant collection of outrageous lies, publishers love schedules.  If you’re going to honestly use playtest to drive development, your schedule is even more worthless than usual because you could, at any point, answer the “what needs to be done” question without consulting the aforementioned useless table of fibs.

Conversations of this variety will occur:

Hey, I thought the pather was getting done this month?

Yeah, we were going to do that but the programmer who knows that best caught rickets from playing too many videogames and he’s out for a few weeks.  We could have put someone else on it but that would take four times as long to get done and everyone would be sitting around waiting.  So we just put everyone on the outdoor stuff that doesn’t need too much short-range pathfinding until dude is back.

So, the pather isn’t getting done this month?

No.  It makes more sense to task people on outdoor stuff right now.  We can have everyone working instead of nobody working.

But the schedule says the pather is supposed to be done this month.

As will gems like this:

So what are you guys going to be working on in June?

Is that a codename for something or do you mean “June” like six months from now?

The month.

Hell if I know.  Whatever makes sense.

Sounds good.  You’re all fired.

Interactions like this will sometimes cause friction.

In the end I’m still firmly in the playtesting camp.  Despite delaying specific conversation about it for more than a decade, I can look back and see solutions to the shortcomings of the playtest-centric approach that surfaced on their own as we went.  In a lot of ways, that’s more evidence of the inherent strength of the technique.
