Evaluating Eternal: The importance of variance for deck diversity and why it’s not easy to “fix”

Hi everyone! Flash2351 here. Today I’ll be taking a break from draft articles to talk about something else close to my heart: math! Specifically, I want to show why variance is important for deck diversity, both in terms of variety of archetypes (Part 1) and variety within archetypes (Part 2). At the end of each part, I will also cover why there isn’t an easy and clean fix for it. Lastly, I will talk about what can be done to hopefully reduce variance without killing diversity.

 

Part 1: Variety of Archetypes

The Model

As with any good math theory, let us start off with a model. Let’s assume that for each game, your draw quality can be rated on an arbitrary scale of 0 to 100. Initially, for simplicity, let us assume that each draw quality is equally likely (I will later address why this doesn’t hold, and why that both adds a further degree of deck-building complexity and makes the variance problem even harder to fix).

In any case, with this model, we can roughly plot out your chances of winning given both your and your opponent’s draw quality as follows (assuming both players have identical decks):

[Figure: Slide1]

To optimize both players’ experience and to reduce non-games, we would want to shrink the dark red/green areas, since those are games where player decisions often do not matter. Ideally, we also want to maximize the number of games in the yellow area, since that is the region where decisions are likely to matter and a single misstep could cost either player the game. Now, looking at this, there seem to be two easy and obvious ways to fix this:

Proposed Method 1: Removing Bad and/or Nut draws

[Figure: Slide3]

By removing bad and nut draws, every game will effectively take place within the red dotted circle. And behold, significantly more games are even and there are no strongly favored/unfavored draws. This is the situation that many of the fixes suggested on Reddit would likely create, such as allowing players to cycle 2 non-power cards for a power card or using a loaded shuffler (guaranteed power every few turns).

Proposed Method 2: Sampling along the diagonal

[Figure: Slide2]

This is a slightly more complicated fix: it involves pegging both players’ hand quality to each other. By ensuring both players have similar draws, we maximize the number of impactful decisions in each game and remove any extreme cases.

So far, both fixes seem relatively straightforward and generate the result that we want for ladder. However, once we look at non-identical decks, everything falls apart.
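To make the mirror-match picture concrete, here is a minimal Python sketch of the model. Everything numeric in it is my own assumption for illustration: the linear win-chance rule, the 15-point gap that counts as a "close" game, the 20 to 80 window used for Method 1, and all of the function names.

```python
# Toy mirror-match model. All numbers are illustrative assumptions.

# Draw quality is bucketed into 100 equally likely bins (midpoints 0.5 .. 99.5).
QUALITIES = [q + 0.5 for q in range(100)]


def win_chance(mine, theirs):
    """Assumed rule: each point of draw-quality advantage is worth 1% win chance."""
    return min(1.0, max(0.0, 0.5 + (mine - theirs) / 100))


def is_close(mine, theirs, max_gap=15):
    """'Yellow zone' game: draws within max_gap points (neither side above 65%)."""
    return abs(mine - theirs) <= max_gap


def share_of_close_games(pairs):
    """Fraction of equally likely (mine, theirs) draw pairs that are close games."""
    pairs = list(pairs)
    return sum(is_close(m, t) for m, t in pairs) / len(pairs)


# Current system: every combination of draws is equally likely.
close_now = share_of_close_games((m, t) for m in QUALITIES for t in QUALITIES)

# Method 1: bad and nut draws removed, both players stay between 20 and 80.
trimmed = [q for q in QUALITIES if 20 <= q < 80]
close_method_1 = share_of_close_games((m, t) for m in trimmed for t in trimmed)

# Method 2: both players are pegged to the same draw quality.
close_method_2 = share_of_close_games((q, q) for q in QUALITIES)

print(f"close games, current system: {close_now:.1%}")
print(f"close games, Method 1:       {close_method_1:.1%}")
print(f"close games, Method 2:       {close_method_2:.1%}")
```

With these made-up numbers, the share of close games rises from roughly 29% to 45% under Method 1 and to 100% under Method 2, which is why both fixes look so attractive in a mirror match.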

Big Combrei vs Stonescar Burn

To see why both proposed fixes would destroy deck diversity, I chose Big Combrei (BC) vs Stonescar Burn (SSB) as my example match-up. There will be slight variations due to different builds across the board, but I think pre-board both decks are approximately 50-50. This is also why, in the period after the Stonescar nerf (RIP ChaCha) and prior to Chalice, these two decks were arguably the best in the format (Stonescar Burn less so for tournaments because it loses out post-board).

The way this match-up plays out is very different from the mirror match illustrated above:

-If SSB nut draws, it destroys BC out of the gates.
-If BC stumbles on power, the match is firmly in SSB’s favor.
-If both players get decent draws, BC is naturally favored.

This is illustrated in the following graph:

[Figure: Slide4]

Notably, despite having a very different profile from the previous graph, the match-up is still 50-50 (you can measure the areas to check). Now, let’s see what happens to this match-up when we apply either proposed fix:

[Figures: Slide6, Slide5]

Now, the problem with either fix should be very obvious! Both methods end up heavily distorting what was originally a 50-50 match-up into either BC favored (Method 1) or SSB favored (Method 2)!
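The same distortion can be reproduced with a small Python sketch. The thresholds (a nut draw is 80 or better, a stumble is below 30) and the three win rates are numbers I invented so that the match-up comes out at exactly 50-50 when every pair of draws is equally likely. They are not taken from the graphs or from real match data.

```python
# Toy BC vs SSB model. Thresholds and win rates are invented for illustration.

QUALITIES = [q + 0.5 for q in range(100)]  # 100 equally likely draw-quality bins


def ssb_win_chance(ssb, bc):
    """Stonescar Burn's assumed chance to beat Big Combrei for one pair of draws."""
    if ssb >= 80:   # SSB nut draw: destroys BC out of the gates
        return 0.90
    if bc < 30:     # BC stumbles on power: firmly in SSB's favor
        return 0.75
    return 0.25     # both draws decent: BC is naturally favored


def ssb_win_rate(pairs):
    """Average SSB win chance over equally likely (ssb, bc) draw pairs."""
    pairs = list(pairs)
    return sum(ssb_win_chance(s, b) for s, b in pairs) / len(pairs)


trimmed = [q for q in QUALITIES if 20 <= q < 80]

# Current system: any pair of draws can happen.
current = ssb_win_rate((s, b) for s in QUALITIES for b in QUALITIES)
# Method 1: no bad draws and no nut draws for either player.
method_1 = ssb_win_rate((s, b) for s in trimmed for b in trimmed)
# Method 2: both players always get the same draw quality.
method_2 = ssb_win_rate((q, q) for q in QUALITIES)

print(f"SSB win rate, current system: {current:.1%}")
print(f"SSB win rate, Method 1:       {method_1:.1%}")
print(f"SSB win rate, Method 2:       {method_2:.1%}")
```

Under these assumptions SSB wins 50% of games today, about 33% under Method 1 and 53% under Method 2. Only the direction of each shift is meaningful here; the sizes are an artifact of the numbers I picked.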

Different Decks Capitalize on Different Draw Quality

Now, one can argue that it doesn’t matter since we could just buff Stonescar or Combrei to re-balance the match-up, but that’s missing the point. Different decks are designed to capitalize on different regions of the graph. Aggro as an archetype strongly capitalizes on nut draws and on the opponent stumbling. Control decks generally capitalize on stable, consistent draws and give up the occasional game to bad variance. On another level, 3-faction greed piles accept much weaker bad draws in exchange for a lot more power in their nut draws. The whole idea of a wide diversity of decks arises because they are able to target different draw qualities from both players. And being able to go even against each other despite having vastly different strategies is what allows for a varied meta.

Any fix that shrinks this sampling space of draws also effectively kills deck diversity. Let’s go over some of the proposed fixes, armed with our deeper conceptual understanding, and see why they won’t work:

Better Mulligan System

One of the most popular proposals I’ve seen is to improve the mulligan system. Common suggestions include guaranteed 2-4 power hands both pre- and post-mulligan, an HS-style mulligan, and allowing a third mulligan at the cost of a card. In terms of the graph, a better mulligan system effectively increases the potential for nut draws and reduces the odds of a bad hand. This, coupled with the fact that the effect of mulligans is most pronounced in the first few turns, means that aggro would be greatly buffed by this change. In fact, I think the current system is already extremely generous and allows aggro decks to cheat on power. Further improvements to the mulligan system could cause the power level of aggro to spiral out of control.

Increased Card Cycling

Another very common suggestion is to allow for cycling of cards, such as banishing a sigil for a card (or even a guaranteed non-sigil), and vice versa. Now, this is a system that would greatly benefit control. Because of balance concerns, this effect has to be restricted to once a turn, which means that the effect becomes increasingly pronounced as the game goes on. This allows control decks a much greater degree of freedom to sculpt their hands and easily dominate most match-ups. Aggro decks do also benefit, but significantly less than control decks because they have much more redundancy and dump their hands very fast.

This also greatly reduces deck-building decisions. Currently, tech cards for specific match-ups are a costly include because they are potential dead draws in other match-ups. With this fix, there is nearly no deck-building cost to including tech cards (especially for control decks) because you can simply cycle them away if needed.

“Power-weaving”

The last of the top 3 common suggestions is some form of power-weaving. For those unfamiliar with Magic, this term comes from players pile shuffling with separate land and unit piles to “weave” the lands with units (this is banned in tournaments). Translated to Eternal, it’s effectively some form of power draw guarantee. For example, it could be implemented as a guaranteed minimum of 1 power and maximum of 4 power every 5 cards, or by using a pseudo-random number generator to determine power vs non-power draws.

This would greatly affect card balance. Currently, a card’s strength can scale steeply with its power cost. For example, a 7 cost card can be significantly better than a 6 cost card, whereas a 2 cost card is usually only marginally better than a 1 cost card. This is because you could be stuck at 6 power for multiple turns, making the 6 cost card come down multiple turns earlier than the 7. This adds a risk vs reward element to deck building that power-weaving would remove. The added consistency of hitting your power drops would also benefit control decks significantly.
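To get a feel for how often a "1 to 4 power in every 5 cards" guarantee would actually step in, here is a rough Python sketch. It assumes a 75-card deck and only looks at the first 5 cards off a freshly shuffled deck, and the function names are mine. It says nothing about how such a rule would really be implemented.

```python
from math import comb

# Assumptions: a 75-card deck, and a rule that forbids 0 or 5 power in a
# block of 5 consecutive cards (the example guarantee from the text).
DECK_SIZE = 75
BLOCK = 5


def chance_of_power_count(power_in_deck, k, deck_size=DECK_SIZE, block=BLOCK):
    """Hypergeometric chance that the top `block` cards hold exactly k power."""
    non_power = deck_size - power_in_deck
    return comb(power_in_deck, k) * comb(non_power, block - k) / comb(deck_size, block)


def chance_rule_intervenes(power_in_deck):
    """Chance that a natural 5-card block has 0 or 5 power, which the rule forbids."""
    return (chance_of_power_count(power_in_deck, 0)
            + chance_of_power_count(power_in_deck, BLOCK))


for power in (25, 28):
    share = chance_rule_intervenes(power)
    print(f"{power} power: the rule would alter {share:.1%} of opening 5-card blocks")
```

The lower the power count, the more often the rule has to rescue a block, so a guarantee like this props up exactly the greedy low-power builds that currently pay for their greed with screw.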

Mirror-Pegged Draws

Now, let me first say that I think this is next to impossible to implement because hand and draw quality are both deck and match-up dependent (something that I will illustrate and elaborate on later). However, given that there are probably better programmers and smarter people than me working at DWD, let us assume, for the sake of argument, that it is possible to somehow generate a model estimate and peg both players to the same value. In this situation, it seems like the optimal strategy is just to play big fatties, aka midrange soup. By doing so, you make it such that your average draw is always better than your opponent’s average draw. You no longer have to worry about a draw mismatch against aggro, and against control, either they stumble or they draw well but you beat them down before they can stabilize.

Some combination of the above

One solution that I have actually been contemplating is whether some combination of the above can be implemented so that we reduce variance but maintain diversity. This is something that I believe DWD has also toyed with, as seen from their improvements to the mulligan system, where the initial draw improvement helps aggro while the redraw rule hurts aggro.

However, there are two huge problems with this approach. Firstly, there is the risk of adding too much complexity and making the system too non-intuitive. It increases the burden of knowledge on the player and messes with the player’s game sense. The current mulligan “fix” has already shown some of the flaws. For example, control decks are being pushed towards running multiple power searches instead of actual power simply to inflate power count post-mulligan. This is extremely counter-intuitive and can throw players off unless they actually run the numbers themselves.

Secondly, there is the issue of shifting goalposts. The increased complexity could be worth it if it substantially reduces non-games. However, if that happens, I think all that is going to happen is people will complain about the adjusted bad variance as non-games. For example, if the fix was power-weaving, guaranteeing a maximum of 3 power draws every 5 cards, instead of people salting about drawing 5 power in a row, they will salt about drawing 3 power and 2 non-units. Then a fix comes out so that you get 1 unit every 5 cards. People will now salt about only drawing 1 unit in 5 cards. Ultimately, people will always find something to complain about. Thus, a fix should only be implemented if it substantially improves the experience, but the simple fact that there are players complaining about it does not mean a fix is needed.

 

Part 2: Variety Within Archetypes

Manipulating Your Draw Quality

Remember the caveat at the start? The assumption that all draw qualities are equally likely? Well, not only is this not true, it is also another factor that you can manipulate in your deck-building. For an arbitrary deck, you would expect the distribution of draw qualities to follow a roughly normal distribution, similar to the following:

[Figure: normal]

Now, this is something that we can alter by changing cards in our deck. Playing more tech cards for a certain match-up would increase the odds of a game with good draw quality for that match-up while trading off draw quality in other match-ups. Similarly, playing too much or too little power would reduce your draw quality.

While this may seem straightforward, it really is not. It creates an intricate balance of trade-offs that you have to manage when building your deck. To illustrate this, I want to use Sunyveil’s World’s list as an example.

Tradeoffs: Sunyveil’s World’s list

If you haven’t checked out the ETS World’s series, I would highly recommend checking it out on our YouTube channel. For reference, Sunyveil’s list is also available here. While most of his deck choices were similar to a standard Stonescar Burn list, Sunyveil made one unconventional (and in my opinion, critical) adjustment to his deck: running exactly the minimum 25 power instead of the conventional 28-29 power.

I know most players will be up in arms about this, because in no way should 25 power be the correct number of power. It makes the deck more prone to screw, and sometimes you are only able to fire off that crucial Obliterate one turn too late. However, I believe Sunyveil made this decision because running the minimum power significantly increases the potency and potential of his nut draws, at the cost of weaker average draws. If I were to plot Sunyveil’s draw quality (red line) against the average (blue line), I think it would look something like this:

[Figure: sunyveil]

With 25 power, the deck increases its chances of nut draws, but also loses more games to subpar draws. This still doesn’t seem like a great idea though, given your average draw quality is now 45, as compared to 50 when running 28-29 power.

However, when we look back at the Big Combrei vs Stonescar match-up (which arguably has a similar profile to the match-ups against many other midrange decks), we notice that nut draws for Stonescar are highly favorable, while the difference between an average draw and a poor draw for Stonescar is marginal at best. And this is the genius bit: by running 25 power and weakening your average draw, you are also increasing the variance. By doing so, you increase the odds of getting a winning draw and actually tilt the match-up in your favor. Thus, if Sunyveil was expecting a significant portion of midrange and control decks, bringing this build greatly improved his odds.
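Here is a quick Python sketch of that trade-off, treating draw quality as a normal curve. The means (50 for the standard build, 45 for the 25-power build) are the figures quoted above. The spreads and the nut-draw cut-off of 80 are my own guesses, and the sketch ignores the fact that a true normal curve spills outside the 0 to 100 scale.

```python
from statistics import NormalDist

# Assumed draw-quality curves. The means come from the text; the spreads
# (sigma) and the nut-draw cut-off are made-up illustrative values.
standard_build = NormalDist(mu=50, sigma=15)   # 28-29 power
low_power_build = NormalDist(mu=45, sigma=25)  # 25 power: lower mean, wider spread
NUT_DRAW = 80


def chance_of_nut_draw(build, cutoff=NUT_DRAW):
    """Probability that a game's draw quality lands at or above the cut-off."""
    return 1 - build.cdf(cutoff)


standard_nut = chance_of_nut_draw(standard_build)
low_power_nut = chance_of_nut_draw(low_power_build)

print(f"standard build:  mean {standard_build.mean:.0f}, nut draws {standard_nut:.1%}")
print(f"25-power build:  mean {low_power_build.mean:.0f}, nut draws {low_power_nut:.1%}")
```

Under these guesses the 25-power build has a worse average draw but reaches nut-draw territory several times as often. That is the trade being made when average and poor draws lose the match-up at about the same rate anyway.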

This philosophy generalizes to many other variations within archetypes. Another notable example is the two most popular TJP midrange decks: the OND variant and the ET variant. Both decks work off the same idea: an aggressive Combrei core, backed up by Kothon, Scouting Party and the best stall-breaker in the format, Crystallize. However, the key difference is that the OND variant is a more stable build, with a solid power base that loses significantly fewer games to influence screw. In contrast, the ET build accepts a small portion of influence screw to increase the potency of its nut draws (turn 1 initiate into turn 2 student+power and turn 3 titan). Both decks have their strengths and weaknesses, and for different metagame landscapes, one deck would be a better choice than the other.

This adds a whole additional layer of complexity to deck building and alteration. Not only can you build your deck to be favored at different relative draw qualities, you can further tweak the odds of your draw quality by adjusting the card ratios in your deck. Removing variance would likely kill off this level of deck tuning.

Final Caveat: Draw Quality is Dependent on Match-ups

Another caveat that I haven’t mentioned, and one that would also significantly complicate any potential “fix”, is that draw quality is an ambiguous term and often depends on what deck you are playing with and against. For example, being stuck at 4 power is probably fine for most aggro decks, but an utter nightmare for control decks. Also, Feln Control’s dream hand against aggro is probably 2 Sigils, 2 Lightning Storm, 1 Feln Bloodcaster, 1 Vara’s Favor and 1 Permafrost. However, this same hand does literally nothing if Feln Control is matched up against Big Combrei. (This is also why I specifically referenced each match-up in earlier paragraphs, so what I wrote still stands.)

This means that it is extremely hard, if not downright impossible, to generate a reliable estimate of draw quality without knowing how both archetypes match up and their key cards for the match-up. Solutions such as guaranteed power draws every X turns would also not work out well, because it would sometimes increase your draw quality, but other times, actually hurt it. For example, as Feln Control against Rally Queen, being stuck at 3 or 4 is ideal as long as you continuously draw Lightning Storms and spot removal. In this case, a guaranteed power draw can actually cost you the game, rather than increase your chances of winning. In contrast, as Feln Control against Chalice, the ideal draw is probably 6 straight power, a Vara’s Favor to pop face aegis, and Azindel’s Gift. Again, a forced non-power draw could dramatically shake up the odds, rather than preventing flood.

 

Part 3: Is All Hope Lost?

No. Or at least, I don’t think so. Variance is a fundamental part of card games and I think that it is important to accept that no matter what is done, at some level, there are going to be games decided purely by draw quality. However, what could be done is to increase the complexity of the game, giving “players more rope to hang themselves with” as the saying goes. Increasing the number and complexity of decisions increases the margin for error and gives the better pilot more opportunities to outmaneuver the opponent.

More Actions vs Increased Complexity

However, there is a need to distinguish between simply increasing the number of actions per turn and increasing complexity. In a recent Reddit thread, some players argued that Hearthstone was more complex because you get to do more actions per turn, and with combo potential such as Miracle Rogue and OTK Warrior (pre-nerf), a single trip-up could cost you the game.

I strongly and utterly disagree with this viewpoint. Yes, more actions per turn CAN lead to increased complexity, but not necessarily. Think about it this way: Eternal implements a new rule where, before you play any card, you have to first select it, then spin around 3 times, drink a glass of water, enter your password and then select the same card again. Selecting the wrong card means you can’t play either card. There are a ton more actions, and more ways for you to trip up, but in actual fact, the game did not get any more complex. The same can be argued for many Hearthstone combo decks. There was little adaptation on the fly, simply the repetition of the same series of actions.

Card Draw, Modular Cards and Activated Abilities

So how do we increase complexity then? I think the recent spoilers by DWD have mostly demonstrated the 3 obvious solutions that I see. This is also extremely encouraging because this shows that they really do know where they are going with this whole project. Firstly, more and/or better card draw can increase decision making because more cards obviously give more choices. However, as mentioned above, without sufficient cards of similar power level and differing impact, improved card draw might not actually increase decision making if there is a clearly optimal choice.

Secondly, modular cards are extremely skill-testing, especially in match-ups where both options could be useful. A good example is Rolant’s Choice in draft. While a Plague effect is often game-winning, there are occasions where giving a unit of your choice +3/+3 is marginally better, and being able to spot those lines can potentially swing a game. This does seem to be the direction DWD is taking, with the Choice cycle and the spoiled Disjunction card.

Lastly, activated abilities could provide an additional avenue for players to spend their power each turn. At the risk of sounding like a broken record, these abilities need to be comparable to actual cards (plus the cost of the card draw) to increase actual decision making. For example, at 8 power, with Siraf on board and multiple playable cards in hand, the optimal line is almost always to activate Siraf because not only is the unit going to be better on average, it also does not cost a card. The presence of Siraf’s activated ability, in this case, simply increases your deck’s power without increasing the complexity of its decision making.

Seeing Complexity where Others Auto-pilot

The last bit is mainly to address the naysayers who claim there is almost no decision making in Eternal. I think that Eternal, like most games, is a game of margins. There isn’t always a clear critical move where a single play/misplay wins/loses you the game. Rather, it is a game of taking incremental advantages and avoiding giving up random percentage points elsewhere. Most importantly, it is a game of seeing lines of play where others simply autopilot.

Example 1: Kalsir vs Piquette

As an example, I wanted to highlight the Kalsir vs Piquette match during the KcBandit Sealed Tournament last weekend. The video of the game is here and the decklists are here for reference.

[Figure: kalsir.PNG]

We join the game on Kalsir’s turn 5. Notably, Kalsir has been slightly unlucky, drawing only 1 unit and being unable to play it due to the lack of Justice influence. Fortunately, Kalsir is not under a ton of pressure. The natural thing to do here is to simply play a power and pass (which is what Kalsir did). However, upon closer inspection of the decklist, we note that Kalsir is running an Emerald Monument, which can only be a Justice source prior to 5 power. With that nugget of information, it might actually be worth holding off on playing a power to ensure that the Monument can still be used as a Justice source.

Kalsir indeed topdecks the Monument next and unfortunately is unable to play it as a power. This leads to a swift demise as he is unable to contest the board at all. However, if Kalsir had instead optimised his play and not played the 5th power, he would have been able to play Silverwing Familiar the next turn. He could then follow up with a Beastcaller Amulet (which he topdecked) and Mirror Image the summoned beast. This would have put him in prime position to win the game with a 2/1 lifesteal, aegis flier and two 5/5 units holding the ground.

Now, I am not saying Kalsir threw this game; in fact, I think if 100 players were to play out this exact game, at least 95 of them (myself included) would have played the 5th power. The point I want to illustrate is that it is often important to be able to identify alternative lines of play where there seem to be none.

Example 2: Flash2351 vs OneStepBehind

Another example is my game 1 against OneStepBehind in the same tournament. It’s a nailbiter, with both of us trading blows back and forth. I seem to have gotten the upper hand, and we join the game here (note the hand-sync in the stream is slightly off):

[Figure: flash.PNG]

On this turn, I topdecked a Trickster’s Cloak. Intuitively, the right play seems to be to just slam the Cloak on to maximize damage. However, I noted that outside of Lethrai Falchion (which is already discarded), OneStepBehind had nearly no way to instantaneously gain life. This meant that the Trickster’s Cloak would be lethal regardless of whether I equipped it this turn or the next. Moreover, holding Trickster’s Cloak plays around both a topdecked flier and removal (since I can play the Trickster’s Cloak on my next unit for surprise lethal if they remove my Humbug). Holding the Trickster’s Cloak paid dividends as OneStepBehind topdecked an Umbrean Reaper, which would’ve been a solid answer if I had played the Trickster’s Cloak a turn earlier.

This is a more obvious line compared to the previous example, but I do think that in my shoes, a lot of players would have just jammed the Cloak instantly and blamed variance for the loss.

 

Both examples that I’ve shown are instantaneous rewards for the optimal line, but often, the optimal line may not make a difference, or might only matter 10 turns later. For example, something as simple as using an unnecessary Vara’s Favor on turn 2 as Feln Control could result in you needing 2 removal spells to deal with an Icaria, the Liberator. This doesn’t seem like a big deal, except that a few turns down the line, a Tarvod gets slammed down and you haven’t drawn into your 3rd removal.

Ultimately, what I wanted to highlight is that while many of these decisions might seem like an easy auto-pilot, any one of them could easily turn into the crucial straw that broke the proverbial camel’s back. So next time you play a game and feel like you made no meaningful decisions, go back and go over it. I’m sure there are some lines that you didn’t consider. Sure, those lines might not have mattered this game, but they could end up mattering the next.

 

Conclusion

Wow, this article turned out much, MUCH longer than I expected so if you made it all the way to the end, congrats! This is definitely a topic that I have thought deeply about and I’m happy to discuss any bits in further detail if you wish. Just poke me on reddit or discord!

This column, Evaluating Eternal, will probably appear on a bi-monthly basis because 1) it takes a lot of time to write and 2) I joined RNGeternal primarily as a draft writer, so aReNGee might not be too happy letting me off writing draft articles regularly xP. Do let me know in the Reddit thread what you think of this article and whether this sort of “science”-centric article floats your boat! Also, do let me know if there is something you would love for me to cover!

Statistics are like bikinis.
What they reveal is suggestive, but what they conceal is vital.
Flash2351
