Playtest Design Lessons, Part Two: Five More Lessons

Last week, I discussed overall playtest design basics, and how designers and testers alike can get the most out of each other to create awesome games. Last week was Lesson One, which I would sum up as “Understand and communicate about the scope of the playtest.” Today, I’m delving into five more lessons I see as key to being a designer who sets their playtesters up for success and being a playtester who delivers invaluable feedback.

Lesson Two: Identify What You Like

The microscope is a metaphor for, uh… your own sense of introspection? Listen, this was a hard one to find an image for.

Different people are going to like different things about a game. Bartle’s Taxonomy is one lens through which to look at this (as Andrew Fischer kindly described in one of his articles, saving me the trouble), and there are numerous others. Whichever model(s) you want to apply to your game’s players, individual player preferences for activities in games will certainly come out during playtest. This is a good thing - the player base at large will have a diverse set of opinions, too, and it’s important that the playtesters reflect this range of experiences. But remember that a game is malleable during playtest. This means that when two groups like the game for different reasons, the feedback will naturally pull the designer in two different directions. Designers and playtesters should both account for this in their thinking.

Lesson for Designers: If you’re the designer, then you need to know what you like about your game (or piece of content you’re designing for an existing game). I’ve talked about identifying the heart of an experience before, and during early development (especially of a new game), that can be a process of discovery within yourself. Even if the heart of the experience evolves as you playtest, iterate, and playtest more, you should continue to check in with yourself: does what you like about the game still shine? If what you like and what the playtesters like are diverging, why is this happening? If the playtesters are pushing back on a mechanic, ask yourself whether that mechanic is actually enhancing the parts of the game you like, or whether you’re just defending what you want it to accomplish.

A concrete example of this comes from when I was working on the calculate mechanic for X-Wing 2nd Edition. Frank, Alex, and I knew that we wanted it to reflect the difference between synthetic and organic pilots, and Frank and I were working on specific implementations. We tried iterations where it persisted round-to-round, iterations where it wasn’t spent on use, iterations where the action gave you multiple tokens, but none of them quite clicked - they were proving too difficult to balance against focus, and the playtesters were pushing back. We were a bit stymied. But coming in with a bit of distance, Alex came up with an answer: calculate would be intrinsically different only by being weaker.

Initially, I didn’t like it - the expected values between changing one focus result and changing all focus results looked too similar on paper. But on assessing this idea more deeply, I realized that Alex’s solution did still achieve what we had liked about the prior iterations. Individual droid ships and pilots could have abilities to make them behave more differently on a case-by-case basis, but “change one focus result” does feel significantly more limited than “change all focus results” when you’re actually choosing to take a calculate action. Players tend to vividly imagine the risk of rolling all eyeballs even if it rarely happens. What I had liked about the prior implementations of the content was that they enforced the difference between the two modes of reflecting the pilot’s concentration, but Alex’s option still achieved that.
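For the curious, here’s a quick sketch of that paper math (assuming the attack die’s standard odds of two eyeball faces in eight), comparing the expected bonus from each token alongside the odds of that dreaded all-eyeballs roll:

```python
# Compare the expected extra hits from a focus token ("change ALL
# eyeball results") vs. a calculate token ("change ONE eyeball result")
# on an X-Wing 2nd Edition attack roll.
# Assumption: 2 of the attack die's 8 faces show an eyeball.

P_EYE = 2 / 8  # chance a single attack die shows an eyeball

def expected_bonus(n_dice: int) -> tuple[float, float]:
    """Expected extra hits from spending focus vs. calculate."""
    focus = n_dice * P_EYE                 # every eyeball converts
    calculate = 1 - (1 - P_EYE) ** n_dice  # at most one eyeball converts
    return focus, calculate

for n in (2, 3, 4):
    focus, calc = expected_bonus(n)
    all_eyes = P_EYE ** n  # the roll players dread
    print(f"{n} dice: focus +{focus:.2f}, calculate +{calc:.2f}, "
          f"P(all eyeballs) = {all_eyes:.2%}")
```

On a typical two- or three-die attack, the gap between the tokens is a fraction of a hit, and “all eyeballs” comes up less than 7% of the time - yet that worst case is exactly what players picture when choosing their action.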

Lesson for Playtesters: It is important to remember that, unless you have worked with the designer for a very long time, they probably don’t know what you like about the game. Are you a player who enjoys the mental challenge of the game? A player who comes in for the lore, or thematic considerations? A player who is very much driven by a particular fantasy the game provides? Are you a player for whom community is the biggest draw? Is there a particular playstyle that keeps you invested? In all likelihood, all of these things factor into your enjoyment of the game in some way, and providing this context along with your feedback can make it much more useful to the designer.

This information helps the designer make a better game in a couple of ways. First off, it helps the designer see which parts of the game are working for which types of players. If the alignment of the mechanics with the lore is sufficient for most people, but not landing for the people who care the most about the setting of the game, it might need a tweak even if most people report it as acceptable. By contrast, if what you like about the game is a particular playstyle, the designer needs to weigh this alongside your opinions about competitive balance. That doesn’t mean that the designer can discount your opinion - on the contrary, if that playstyle is keeping you playing the game, and others echo this sentiment, the designer needs to consider it in their long-term thinking. This context can also help untangle why experiences were negative for you in the moment. If something negatively affects your preferred playstyle, but wouldn’t affect it as much with a minor tweak, the designer can make that minor course-correction if they are aware of the reason behind your feedback. Simply asking for a change without explaining why it’s impacting something you like is far less useful to the designer.

Lesson Three: Delineate Events and Experiences

Subjective experiences are a critical element of what playtesters provide to a designer. I outlined the role of balance feedback in the prior article, but as the example with Darth Vader illustrates, “balance” is not something that can be measured with numerical models alone. As a designer, you need to know what you are balancing “toward.” As a playtester, you need to give the designer experiential feedback that helps them understand how the current balance “feels” so that they can decide whether and how to adjust it.

But the designer also needs to know the events that transpired in the game, to understand mechanistic issues that may have arisen or assess where an individual’s experiences may have differed from the rest of the playtesting pool.

Lesson for Designers: When designing your feedback mechanism (questionnaires, verbal interviews, or whatever method you prefer), ask the questions in such a way that they encourage the testers to report events in one question, then reflect on those events in a subsequent question. Even if you are purely looking for competitive analysis data, give your testers plenty of space in which to report their subjective experiences. And once you have the data, assess it in three ways:

  • What are the trends in the subjective feedback?

  • What are the trends in the reported events?

  • How do these trends overlap and differ?

Imagine that while comparing subjective feedback, you notice that numerous testers are claiming a particular mechanic or strategy is too weak. Look at whether it is actually winning games. Then look at how its record stacks up to the subjective feedback. In the games where it is reported as being weak, are there any other consistent variables? Sometimes you will find a lurking variable this way - perhaps a piece of content is particularly vulnerable to a given archetype or strategy, or perhaps some counterplay trick players can learn renders it far less effective. This can help you decide whether the issue is one of feel, mechanical potency, or some combination of the two.
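As a loose illustration of that three-way assessment, here’s a sketch that tallies subjective “too weak” reports against actual win rates, then breaks results down by opposing archetype to hunt for lurking variables. The report fields, the “cloak” mechanic, and all the numbers are invented for illustration:

```python
from collections import defaultdict

reports = [
    # (mechanic, won_game, rated_too_weak, opposing_archetype)
    ("cloak", True,  True,  "swarm"),
    ("cloak", False, True,  "swarm"),
    ("cloak", True,  False, "ace"),
    ("cloak", True,  False, "ace"),
]

stats = defaultdict(lambda: {"games": 0, "wins": 0, "too_weak": 0})
matchups = defaultdict(lambda: [0, 0])  # (mechanic, archetype) -> [wins, games]

for mechanic, won, weak, archetype in reports:
    stats[mechanic]["games"] += 1
    stats[mechanic]["wins"] += won
    stats[mechanic]["too_weak"] += weak
    matchups[(mechanic, archetype)][0] += won
    matchups[(mechanic, archetype)][1] += 1

for mechanic, s in stats.items():
    print(f"{mechanic}: {s['wins']}/{s['games']} wins, "
          f"rated too weak in {s['too_weak']}/{s['games']} reports")

# Hunt for lurking variables: does the complaint track a matchup?
for (mechanic, archetype), (wins, games) in matchups.items():
    print(f"  {mechanic} vs {archetype}: {wins}/{games} wins")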

Lesson for Playtesters: As outlined above, presenting your subjective feedback is key to your role as a playtester. Unfortunately, the truth is that for most people, losing is unenjoyable. This can make it difficult to determine whether you disliked something because of an intrinsic part of the experience, or simply because you lost. Where this differentiation will become apparent to the designer is across a wide array of data from numerous testers. If lots of people are reporting losing to something and also reporting hating it, the designer might determine that it’s a balance issue. If only a few people are reporting losing to something but lots of them are also reporting hating it, the designer might instead decide that the issue has to be solved with a fundamental rework of the content in question. You can make the designer’s job a lot easier in this regard by analyzing your subjective experiences, and presenting them (as best you can) separately from the relatively objective series of events that occurred in the playtest.

Some designers ask for this by default, such as with a series of questions about what happened in the playtest game, followed by more detailed feedback about the content that felt weak, powerful, or problematic. Others do not, in which case it is useful for you to make this division yourself, such as by first giving a brief overview of the game’s events (“On turn 1, I did this and my opponent did this. On turn 2…” etc). This doesn’t need to be exhaustive in detail, but try to focus on the events as they transpired. It can be especially useful to take notes on these events during the game itself, so that you don’t let the details slip into the vagaries of memory.

Then present your subjective experiences. Did you really enjoy some piece of content you were playing with? What made it fun? Did a particular ability make you have less fun in the game? If so, why? Did you regret anything you chose to play? For this portion of the feedback, try to give the designer an impression of what someone encountering this content in the world might feel.

Finally, give your conclusions. What do you take away from this experience? Do you have suggestions for how the designer might proceed? Did something seem fine on a conceptual level, but too efficient? Offer details as needed. This could also be where you provide the math to underscore your conclusions (if efficiency is easily derivable from a formula in the game in question - in many, it is not), or offer comparative insights.
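When the math is derivable, it doesn’t need to be fancy. Here’s a toy example of “showing your work,” with entirely invented upgrade names, numbers, and costs - the point is the shape of the argument, not any real game’s formula:

```python
# Invented upgrades: (expected extra damage per round, rounds alive, cost)
upgrades = {
    "Overcharged Cannon": (0.6, 4, 8),
    "Twin Launcher":      (1.1, 4, 18),
}

for name, (dmg, rounds, cost) in upgrades.items():
    efficiency = dmg * rounds / cost  # expected damage per point spent
    print(f"{name}: {efficiency:.3f} expected damage per point")
```

A comparison like this turns “Twin Launcher feels overcosted” into “Twin Launcher delivers about 20% less expected damage per point,” which gives the designer something concrete to check.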

Lesson Four: Don’t Catastrophize Over a Single Bad Test

The most tempting, forbidden button in all of Tabletop Simulator…

Whether you’re the designer or a playtester, after you see or experience a frustrating loss firsthand, it’s easy to feel like the content you’re looking at will ruin the game. And it’s possible that that instinct is right, but it rarely helps to make big changes based on only one data point.

Designer Lesson: Sometimes a single test game goes off the rails, and everyone has a bad time. As a designer, there are few work experiences more excruciating than observing or playing in the test game that just sucks. Reading about it happening is marginally better (for me, at least), but still can be painful. It usually makes me want to set my prototype on fire.

It’s also part of the job, and usually necessary to get to a good game. So what do you do?

First, take a breath. The point of playtesting isn’t for you or the testers to play a game that is already fun. That’s the point of the game you’re trying to create. But the actual act of playtesting sometimes has to include playing a game that is currently unfun. Otherwise, you’ll never get the information you need to make it fun.

So if you’ve had a really bad test, try it again. Try it with a different group. If there’s something egregious, tweak it, but try not to blow up the whole design. Instead, change as little as possible and see if the issue recurs. If it does, consider making more sweeping changes to address it.

Playtester Lesson: You’ve just had a terrible loss, and you really want to warn the designer about how they’re about to ruin the entire game. What do you do?

First, take a breath. Any experienced designer has assuredly heard countless doom-and-gloom prophecies about their games, and will likely be fairly jaded to this sort of feedback when it appears in isolation. This doesn’t mean that you shouldn’t report this bad experience, but it does mean that you should scale the level of alarm you raise to the amount of information you’re basing it upon. Report the data of the test, but before you ring the bells too loudly, see about dedicating another one of your allocated tests to duplicating the circumstances. Switch lists with your opponent. Play the strategy against another tester, and see what happens. Did you get the same outcome? If you have access to other testers, encourage them to try the problem content and report their results. Remember, the power of playtesting lies in data aggregation. Occasionally, a single test will be enough to suss out that something is a problem, but most of the time, more data is needed to draw a solid conclusion. While tabletop game playtesting rarely reaches the level of rigor of a scientific journal (and won’t until universities with large endowments start chucking around generous tabletop game research grants), the same fundamental principles apply to the inquiry process.

Lesson Five: Examine the Things that Seem Inefficient

Over years of watching games develop, I’ve seen one clear trend: the most overpowered things that make it to print are almost always parts of the game that failed to make a good first impression. It happens like this: something doesn’t catch the playtesters’ eyes early on, so the designer continues to drop the price or increase the efficiency to get people to try it. But the playtesters have limited time, and have already mentally categorized that piece of content as “weak,” so they don’t end up testing it. Even the designer ends up undervaluing it, but they keep dropping the price, hoping the next adjustment will be the one that gets it played. This goes on through the entire process. Then you come to a new phase of testing with new testers (or, in the worst-case scenario, the product release) and everyone looking at it with fresh eyes immediately identifies the overpowered content.

Designer Lesson: You should be watching like a hawk to see what is going unused. Overused content will always be at the forefront of your mind, but underused content in testing is far more dangerous to the balance of your game. If a piece of content isn’t showing up in tests, don’t just drop its cost a little bit and expect the playtesters to realize it has changed. They’re trying to absorb an entire new pool of content, and some mental shortcuts are inevitable. This means that if they’ve written something off as “weak,” it will stay “weak” in their minds until such time as you radically increase its efficiency or rework it entirely. Remember the frog in the boiling pot of water: a slow increase in temperature isn’t noticeable, but a dramatic, rapid shift will jolt the frog to react. This isn’t to say that frogs make good playtesters, but I’ve never actually tried, so if you do give it a shot and it works out, let me know.

If you want to make sure this isn’t happening (and your game accommodates it), during balance testing, you can always create a spreadsheet of all of the content and assign it out for some number of games. While this doesn’t guarantee undervalued things will be identified, it at least means they won’t go entirely untested.
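Here’s a minimal sketch of that assignment spreadsheet as code - the content names, tester names, and games-per-item quota are invented placeholders:

```python
from itertools import cycle

content = ["Ion Torpedo", "Decoy Beacon", "Salvage Crane", "Null Field"]
testers = ["Avery", "Bo", "Casey"]
GAMES_PER_ITEM = 2  # minimum table time for every piece of content

tester_cycle = cycle(testers)
assignments = [(next(tester_cycle), item)
               for item in content
               for _ in range(GAMES_PER_ITEM)]

for tester, item in assignments:
    print(f"{tester}: play at least one game featuring {item}")
```

Rotating assignments through the tester pool doesn’t guarantee someone will discover the hidden gem, but it does guarantee nothing ships with zero table time.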

Playtester Lesson: This one’s really simple: you can help by intentionally dedicating some of your tests to content you find boring, weak, or otherwise unappealing. Then report in on how it fared. Remember, the more data the designer has on something, the better they can tune it to excellence. This goes for bringing overpowered content down to an appropriate level, but it applies equally to unappealing content.

This may not be necessary if the designer provides assignments to playtesters, making sure that all elements receive table time. However, taking the attitude of wanting to use content that seems unappealing can still apply in these circumstances. Instead of simply selecting it as required and then bringing other pieces you know are powerful to compensate, try to find a way to make the thing you’ve been assigned really shine. If you can’t, the designer definitely needs to know. Remember, your job as a playtester isn’t necessarily to win any given game - it is to gather crucial data for the designer. If you suffer a crushing defeat because you leaned into a strategy that you thought was inefficient, the designer should recognize that this test was just as valuable as a close-fought game.

Lesson Six: Try Not to Put the Conclusions Ahead of the Data

No, no, you just teach the horse to walk backwards, and then…

As a designer or as a playtester, it can be easy to muddle your conclusions with the data you are assessing or gathering. Worse, you may be tempted to place your conclusions first and then find data to support it. Playtesting is almost never double-blind, and rarely peer-reviewed. It is data-based, but much of the data is subjective. Games are, after all, at least as much art as science.

Designer Lesson: As a designer, you want your game to be good. And when you’re six iterations deep on a specific mechanic, you really want it to be good now. It’s easy to cherry-pick from a large enough pool of playtest data to support the conclusion that things are fine already, without even meaning to. Make an active effort to draw your conclusions from the data, and not vice-versa.

Here are a few concrete strategies I’ve found effective for drawing conclusions from data without letting your desires leech into the process (too much, at least):

  • Use spreadsheets to tally up how many times events (positive, negative, or neutral) are occurring across the testing feedback you’ve received. This is especially helpful if you have access to full reports on the games.

  • Ask your playtesters to give numerical ratings to their feelings, especially with regards to how fair a game felt (I usually use a 1-5 scale, with 1 being “very unfair in Player 1’s favor” and 5 being “very unfair in Player 2’s favor”). This can help reduce the ambiguity you might read into more qualitatively presented feedback. (See the sketch after this list.)

  • Get someone uninvolved with the project to read over the playtest feedback around a particular topic and tell you their conclusions from the data. They might well identify trends you missed, or see something you wanted to avoid looking at to preserve your ego.
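As a small sketch of the rating idea from the second bullet, here’s what aggregating those 1-5 fairness scores might look like - the ratings themselves are invented:

```python
from collections import Counter
from statistics import mean, stdev

# 1 = very unfair in Player 1's favor, 5 = very unfair in Player 2's favor
ratings = [3, 3, 4, 2, 3, 5, 5, 1, 4, 3]

print(f"mean {mean(ratings):.2f}, stdev {stdev(ratings):.2f}")
print("distribution:", dict(sorted(Counter(ratings).items())))
```

A mean near 3 looks healthy, but a wide spread or a lumpy distribution means your testers are split rather than satisfied - worth reading those reports side by side before concluding everything is fine.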

Playtester Lesson: As a playtester, your job is first and foremost to gather and present data: subjective data on your experiences, and relatively more objective data on events that occurred in your playtest games. Your conclusions may also be very useful, but they are secondary to providing the designer with the raw material needed to improve the game.

While you can run tests based on a hypothesis (“List A is overpowered”), test that hypothesis (play List A against List B three times), and then submit the data that supports your hypothesis (“List A beat List B three times”), always ask yourself at each step of the way: “Is my data driving my conclusions, or vice-versa?” If you are concerned that your desire to see a certain outcome may be biasing your results, suggest to the designer that they run a test with unbiased participants. If your hypothesis is correct, this gives it a much better chance of being addressed than your data alone.
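To put a number on why a short series proves less than it feels like it does, here’s a rough back-of-the-envelope check, treating each game as a coin flip between genuinely even lists:

```python
# How likely is a perfect record if the lists are actually even?
# Each game is modeled as an independent 50/50 - a simplification,
# since player skill and matchup knowledge aren't independent.
for games in (3, 5, 10):
    p_sweep = 0.5 ** games  # P(winning every game | even matchup)
    print(f"{games}-0 record: {p_sweep:.1%} chance under an even matchup")
```

A 3-0 sweep happens one time in eight by luck alone - enough to flag a trend, not enough to declare the sky is falling.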

Bonus Lesson: Consider the Other Side

This is a gimme, but an important one: think about things from the playtester’s perspective if you’re a designer, and from the designer’s perspective if you’re a playtester. If you’re a designer, consider how you would see the playtest you’ve designed as a playtester. If you’re a playtester, ask yourself what the designer is trying to get from you, and why they might want that information. Both sets of lessons apply to both groups, because each can do their job better by putting themselves in the other’s shoes.
