Playtest Design Lessons, Part One: Scoping it Out

Last week, I took a break from my article schedule because… it just didn’t feel right to post a normal article after a violent insurrection that attempted to overthrow the legitimate result of the United States’ most recent democratic election. I tried to write something more topical, something that took on hard questions of art and social responsibility, but I couldn’t find the words to express what I wanted. Maybe someday I’ll be able to articulate myself on that topic. I’ll keep trying. But today I want to publish the article I had intended for last week: part one of a two-part article on playtesting and design.

Despite the well-known importance of playtesting, I feel like I rarely see designers writing to other designers about the specifics of their playtesting process, or writing to playtesters about how to give the most impactful feedback. As such, I’ve decided to delve into this topic over the next two weeks. The remainder of this week’s article will address some basics and the fundamental concept of playtest scope. Next week’s article will cover five lessons for how designers and playtesters can set each other up for success.

First Off, What is Playtesting?

For those completely new to tabletop games, I would define playtesting as follows:

Playtesting is the act of playing an unfinished game with the intent of observing and reporting on what one experiences.

Playtesting should help the designer answer certain questions: How do other people feel when they play the game? What incentives does the game create through its rewards and challenges? How is the game interpreted by players who don’t know what you were thinking? Does it feel fair? Is it fun?

The designer then takes this feedback, assesses it, integrates some parts of it into the game, and creates a new iteration, which is subjected to playtesting again. This process is repeated until the game is good (or until the deadline means that the game must be finalized).

Within the game design process, I see playtesting as where a tabletop game goes from being a concept to being a game. I’ve worked with a lot of playtesters, read more playtest feedback than I can really conceptualize, done a lot of playtesting of games I did and didn’t design, and generally seen the process from all angles, across all sorts of tabletop games. It is unquestionably one of the most vital and most labor-intensive parts of the design process. For ongoing games, the playtesters are often the most engaged, dedicated players of the game, and many put vast amounts of time and energy into playtesting. For a designer, high-quality playtest feedback is always a cornerstone of making an excellent game (or creating excellent new content for an existing game).

What Does Playtesting Look Like?

Playtesting looks like playing the game, then delivering some sort of report of the experience (verbal, written, or mental, if the designer is playtesting and/or a telepath) to the designer. I say that it “looks like” playing the game because, fundamentally, it isn’t the same activity. While playtesting can be quite fun and very fulfilling, it isn’t the same as playing the game recreationally. First, the game is unfinished, and may well not be very good (yet). Second, and perhaps more significantly, it requires the tester to have a certain degree of introspection, and to assess not just the game but also their experiences playing it. It’s not even the same activity as playing a game and reviewing it, because a review is, fundamentally, for a potential consumer, but playtest data is for the designer. And the designer’s goal is not just to understand the merits and drawbacks of the game as it is, but to assess from that what the merits and drawbacks of the game could be if it were slightly different.

It’s worth noting at this point that there aren’t really widely held standards for playtest organization or feedback across the tabletop games industry. Even within the same playtesting communities, goals and expectations often differ between individuals or playtest groups. And, as a result, a lot of the playtest feedback I’ve seen over the years has ended up being unhelpful. Although in every case the playtesters were earnestly engaged and trying to be helpful, sometimes feedback missed the mark. Playtesting is all about communication, and that can break down in a couple of ways.

*Pictured: Those things that used to fall over and deprive us of telephone service. You know. Bird seats.*

The first way communication can break down is if the designer fails to communicate important information to the testers about the scope of feedback needed at a particular stage (I’ve certainly been guilty of this). The second way it can break down is if the playtester gets stuck in patterns of feedback that fall outside the scope and the designer doesn’t step in to correct this. Whenever I’ve seen this happen, the mismatch between the data the designer needed at that time and the data the tester delivered was almost always avoidable in retrospect.

Know the Scope

Not all playtests are the same, even if you’re testing the same game you tested a week ago. For any given playtest, the designer on the project should have specific goals, and these objectives should be articulated to the testers. As the game (or piece of content for an existing game) begins to take form, these goals should shift to deliver the information that is most relevant to the designer for that phase of development.

For most of the games and game lines I have worked on, testing fell into one of three general categories:

Function Testing: Is the game fun and understandable?
Balance Testing: Is the game competitively balanced?
Patch Testing: Should a game that is already out in the world be tweaked or altered in some way?

For each category, the designer and playtesters should approach the test differently.

Function Testing

*Cooked and ready to be thrown at a wall.*

Function testing encompasses everything from the earliest tests of a new game to the development of new expansions for an existing game, and its goal is primarily experiential rather than mechanistic. While the work of defining a wholly new game is quite different in some ways from designing a successful expansion to an existing game, the process for playtesting starts much the same way. The designer needs to know if players are interpreting the content the way the designer intends and, if so, if they enjoy it on a fundamental level. This is the phase for the designer to experiment, to throw the proverbial spaghetti at the wall of playtesting and see what sticks. And this is the phase for the playtesters to throw some spaghetti back at the designer’s face.

Function Testing (Designer): Start by asking yourself what you want to know in this stage of testing, and then design your feedback systems around that. Personally, I often use short questionnaires (via Google Forms) as feedback systems, and design different questionnaires for each stage of testing. Written reports allow you to aggregate more data more quickly, which lets you look for commonalities across a larger pool of tests, and having some direct observation to supplement that larger pool of data is also very useful at this stage. Regardless of how you’re receiving feedback, make sure you’re asking the right questions to get the information you want. For function testing, you generally want to know if players are understanding and enjoying the game, so ask questions that will get you this information. Instead of asking “Did you understand X mechanic?”, ask “How would you explain X mechanic to me?” or ask players to give you a summary of their game including a sequence of play. Instead of asking the testers if they enjoyed a specific part of the game, ask them what they enjoyed and see what answers arise. And finally, give playtesters a dedicated space in your reporting system to discuss topics that are not in scope. For example, if you are working on a competitive game, playtesters will always want to raise balance concerns - even if you’re just trying to figure out if a mechanic is fun enough to be worth trying to balance. Instead of fighting their inclinations, give them a dedicated space to address these concerns. You can always set aside this information until later if it really isn’t relevant yet, and not having to extract it from the feedback you’re looking for will save you a lot of time and mental effort.

Function Testing (Playtester): During function testing, focus your feedback on two things: function, and that thing you can’t spell it without: fun. Don’t get stuck on the nitty-gritty of balance at this stage. You can still give balance feedback, but segment it off to a separate part of your report, presented after your main findings about why you understood and enjoyed the content (or not). Additionally, try to get not just into whether you enjoyed or understood something, but how and why you understood or enjoyed it. If you think you might be misunderstanding something, tell the designer how you think it works - if your interpretation isn’t lining up with their expectation, they’ll know they have a problem. If something was unfun to play against, drill down to what specifically made it unfun. Did it take away choices you expected to have during the game? Did it negate choices you made during list-building or the early stages of the game? If you had made different choices, could you have prevented the unpleasant experience from happening? Remember the problems you ran into, and see if they happen again in future games.

Balance Testing

*Just try stacking yet another* Millennium Falcon *on top of that pile…*

Balance testing is what I thought playtesting meant before I worked in the industry. This is the part of testing where a designer wants to discover if specific strategies, builds, or paradigms that haven’t yet been released are “overpowered,” and generally if the game rules are refined in their presentation. Balance testing is in many ways simpler than function testing: the game works, and the question is just whether or not it can be “solved” too easily, and whether significant uncertainties occur in play. Compared with function testing, the importance of balance testing varies significantly from game to game. For competitive games, the focus of balance testing is usually on creating parity among strategic options. For other games, the focus might be more on whether the game feels fair (such as in roleplaying games and cooperative games), or if players can generally come to a shared understanding of the rules.

Balance Testing (Designer): As with function testing, a designer needs to know what they want to do with the data they acquire in balance testing. Is the goal to promote certain strategies or game pieces over others? Making all pieces equal is not necessarily the target of balance testing, depending on the game. For example, if you are making a competitive game around a famous movie (say, Star Wars), many players will want to use Darth Vader simply because he is already their favorite character. If any other piece is an equally effective choice, this means Darth Vader will be viewed as being “average,” which in turn may not lead to the most iconic game experience (this is a topic for a future article unto itself). By contrast, if the game is an abstract tile-placement game, players may not have any sentimental attachment to the individual tiles, and making one tile better than average could well be seen as a major design flaw. The designer must decide which of these outcomes they intend to deliver, then build the test to get the information needed to bring about that result.

Balance Testing (Playtester): From the perspective of the playtester, it is in some ways more straightforward to give good balance feedback than function feedback. Simply report the results of the game and your impressions of why those results occurred, so that the designer can look at the data across numerous games and determine what’s overperforming, what’s underperforming, and what’s in about the right place. However, there is an additional wrinkle to consider. As I discussed in a previous article on how player choices impact the overall game, players sometimes gravitate toward options that, while not optimally effective, optimize the fun of one player. For example, when the Nantex-class starfighter underwent pre-release points testing in X-Wing 2nd Edition, tests gravitated toward the most tournament-effective build: Sun Fac or another ace with Ensnare. However, the build that began to crop up at tournaments with the most pernicious results was a build with numerous mid-tier pilots with Ensnare. This build could control the entire field, and generally made the other player’s life miserable. It also didn’t win tournaments. By the numbers, it wasn’t overpowered, but it won enough games to satisfy the players using it because it won them in a fun (for that player) way. This is where the trickiest part of balance testing comes in: uncovering the strategies and builds that sit at the cross section of “extremely unpleasant to play against” and “effective enough to be taken, but not great.” And this is why it is important to include experiential feedback even at this stage, because sometimes an ineffective strategy can still be a balance problem the designer should address.
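For designers who collect reports in a spreadsheet or form, the idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report fields (`build`, `won`, `opponent_fun`), the 1-to-5 fun scale, the sample data, and the thresholds are all invented for the example and do not come from any real reporting system. The point is simply that tracking the opponent's experience alongside win rate lets a "not great, but miserable to face" build show up in the data.

```python
from collections import defaultdict

# Hypothetical playtest reports. The field names and the 1-5
# "opponent_fun" scale are assumptions made for this sketch.
REPORTS = [
    {"build": "ace", "won": True, "opponent_fun": 4},
    {"build": "ace", "won": True, "opponent_fun": 3},
    {"build": "ace", "won": True, "opponent_fun": 4},
    {"build": "ace", "won": False, "opponent_fun": 4},
    {"build": "swarm", "won": True, "opponent_fun": 1},
    {"build": "swarm", "won": False, "opponent_fun": 2},
    {"build": "swarm", "won": True, "opponent_fun": 1},
    {"build": "swarm", "won": False, "opponent_fun": 2},
]


def summarize(reports):
    """Return {build: (win_rate, average_opponent_fun)}."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report["build"]].append(report)
    summary = {}
    for build, games in grouped.items():
        win_rate = sum(g["won"] for g in games) / len(games)
        avg_fun = sum(g["opponent_fun"] for g in games) / len(games)
        summary[build] = (win_rate, avg_fun)
    return summary


def flag(summary, strong=0.6, viable=0.4, unfun=2.0):
    """Label each build. The thresholds are arbitrary placeholders."""
    flags = {}
    for build, (win_rate, avg_fun) in summary.items():
        if win_rate >= strong:
            flags[build] = "overperforming"
        elif win_rate < viable:
            flags[build] = "underperforming"
        elif avg_fun <= unfun:
            # Wins enough to be taken, but opponents hate facing it.
            flags[build] = "viable but miserable to face"
        else:
            flags[build] = "about right"
    return flags


if __name__ == "__main__":
    for build, label in flag(summarize(REPORTS)).items():
        print(build, label)
```

With the made-up data above, the "swarm" build wins only half its games, so a win-rate check alone would pass it by; it is the low opponent rating that flags it for a closer look.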

Patch Testing

*Hey, they got it working again, didn’t they?*

Patch testing is any test of content that has already been released, to see if it should be changed via errata or some other sort of update. Due to the difficulty of changing printed content, patch testing is a far smaller part of the tabletop game space than it is in the digital space. However, the divide between these two has grown fuzzier and fuzzier in recent years. The highly successful launch of Magic: The Gathering’s Arena online version has certainly led to more “patch testing” of Magic. While cards in Standard were once very rarely changed for balance via errata (instead being banned, if necessary), the focus on Arena has led to several changes to released cards purely for balance reasons. Patch testing is essentially the ongoing form of balance testing, and it has many similarities, but also some key differences.

Patch Testing (Designer): First and foremost, patch tests should be orchestrated to make the most of the data you have from the game being played “in the wild.” Even if your game has a modest player base, it’s likely that the feedback you receive from the wider community will be much more numerically substantial than what you can draw in an internal playtest. As such, make sure that you’re leveraging what you already know not only when you set up your proposed changes but also when you consider their impact. If a piece of content is highly played and popular, consider not just whether you want to bring down its effectiveness, but by how much, and then ask questions that will get you the data to find the sweet spot you’re looking for. For instance, asking if a piece you have made weaker is still effective can often lead to misleading information. If players are forced to use something, they will often find it is reasonably effective on the table, but this doesn’t mean that they will assess it as effective when deciding whether or not to use it. Instead, ask testers to compare the piece with a now-comparable piece, preferably across two games. If they consistently report the weakened piece underperforming the now-comparable piece, you may have overshot your mark. Additionally, for patch testing, asking players to dedicate some portion of tests to using certain specific builds that currently exist in the wild (especially those you aren’t changing) against altered content is often an extremely worthwhile benchmark.
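The paired comparison described above is easy to tally once the reports are in. The following is a minimal sketch, assuming each tester rates both pieces on the same 1-to-5 scale after playing one game with each; the field names, the sample ratings, and the 75% cutoff for "consistently" are hypothetical choices for illustration, not a recommended standard.

```python
# Hypothetical paired reports: each tester played one game with the weakened
# piece and one with the now-comparable piece, rating each from 1 to 5.
PAIRED_RATINGS = [
    {"tester": "A", "weakened": 2, "comparable": 4},
    {"tester": "B", "weakened": 3, "comparable": 4},
    {"tester": "C", "weakened": 3, "comparable": 3},
    {"tester": "D", "weakened": 2, "comparable": 3},
]


def share_rating_weakened_lower(pairs):
    """Fraction of testers who rated the weakened piece below the comparable one."""
    lower = sum(1 for p in pairs if p["weakened"] < p["comparable"])
    return lower / len(pairs)


def likely_overshot(pairs, threshold=0.75):
    """True if testers 'consistently' prefer the comparable piece.

    The threshold is an arbitrary placeholder for what counts as consistent.
    """
    return share_rating_weakened_lower(pairs) >= threshold


if __name__ == "__main__":
    print(share_rating_weakened_lower(PAIRED_RATINGS))
    print(likely_overshot(PAIRED_RATINGS))
```

Comparing within each tester's own pair of games, rather than averaging all ratings together, keeps one harsh or generous rater from skewing the picture.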

Patch Testing (Playtester): For the playtester, patch testing is quite a bit like balance testing, but there’s one extra important thing to consider: what are the second-level effects of the changes? While many groups will likely report on the direct effects, you can stand out by examining these secondary effects and reporting the results. Does making one strategy weaker suddenly create a new threat from a game element or list that it was holding in check? Does making something that is underperforming cheaper change the threshold for how many of that thing a player can purchase within a legal list? Will an errata for clarity on one issue create uncertainty on a different issue by establishing an inconsistent precedent? Try to step back and look at the game holistically with the proposed changes implemented, rather than simply focusing on the things that are changing. Oftentimes these tests will end in a negative answer to the question you have set out to answer (“Is this weird new strategy actually good? Probably not.”), but remember that a negative answer is still extremely valuable feedback to the designer.

And that brings us to the end of playtesting scope! Tune in next week for five lessons on designing for playtesters and playtesting for designers!