Some Conclusions from the Tournament Data Collected this Cycle

tzurugby Registered Users Posts: 275
There was a raging debate over whether the new Rune of Wrath and Ruin (RoWR) was overpowered..... it was vigorous on both sides and there were a lot of interesting points made.

In the midst of the debate I made the assertion that if the new RoWR was that OP, then we should see the Dwarfs do very well in pick/ban tournaments and end up over-represented in the pick rates. I also put up a post asking if CA would post win rates for the Dwarfs in Quick Battle (QB). I got no response, and heard from others that it was extremely unlikely they would.

So I decided to get the data myself.... after all, complaining about something without taking action is just whining.

I had hoped to get 750 games in the pick/ban format, but only ended up getting 661. I also started recording Blind Pick and Quick Battle data, just because it was fun.



The results: the average number of games played per faction in the pick/ban format was 88, and the Dwarfs played 79 games, which is below average but close enough for me to treat them as average. They also had a 52% win rate over their 79 games, which is solid.
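As a rough check on those two figures, here is a minimal sketch that reproduces the average and puts an uncertainty band on the win rate. It assumes a pool of 15 factions (not stated in the post, but consistent with 661 games giving an average of about 88), and it uses a simple normal-approximation interval, which is only one of several reasonable choices.

```python
import math
from typing import Tuple

GAMES = 661            # pick/ban games recorded (from the post)
FACTIONS = 15          # assumption: size of the faction pool
DWARF_GAMES = 79       # from the post
DWARF_WIN_RATE = 0.52  # from the post


def average_picks(games: int, factions: int) -> float:
    """Each game has two faction picks, so the mean is 2 * games / factions."""
    return 2 * games / factions


def win_rate_interval(win_rate: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval for a win rate observed over n games."""
    half_width = z * math.sqrt(win_rate * (1 - win_rate) / n)
    return win_rate - half_width, win_rate + half_width


mean_picks = average_picks(GAMES, FACTIONS)
low, high = win_rate_interval(DWARF_WIN_RATE, DWARF_GAMES)
print(f"average picks per faction: {mean_picks:.1f}")
print(f"Dwarf win rate, approximate 95% interval: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval runs from roughly 41% to 63%, so 79 games on their own cannot separate a slightly weak faction from a slightly strong one.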

It has been noted that an ability can be overpowered without the faction being overpowered, and there is variance in these numbers for sure.... but based on the data, personally playing the Dwarfs, talking to other Dwarf players and watching a LOT of games, I believe the following is true:

1) The Dwarfs are not OP (though RoWR may still be)
2) The Dwarfs are competitive in enough matches to be a worthwhile faction to be able to play in pick/ban tournaments.


Some may argue those statements, but I think they are defensible.

That may seem like a lot of effort to understand those two things, but that is simply the nature of data.... there are no shortcuts.



As for other bits of information that simply emerged from the data, and were not anything I was looking for, two stand out: the overall balance of MP, and the nature of the types of tournaments being played.



3) Multiplayer in Total War: Warhammer II is incredibly well balanced.

The standard deviation of the win rate of each faction in tournaments was 0.0386, which is amazing. Essentially, almost every faction's overall win rate landed between 46% and 54%. I have heard some strong players say this game is balanced, and that it is "all about the match ups", but I never imagined it would be this good.
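For anyone who wants to redo this kind of calculation from the spreadsheets, a minimal sketch follows. The win rates in it are hypothetical placeholders, not the real data, and it uses the population standard deviation; the post does not say whether the population or sample formula was used, so that choice is an assumption.

```python
import statistics

# Hypothetical per-faction win rates, placeholders only. The real values
# are in the author's spreadsheets and are not reproduced here.
win_rates = {
    "Faction A": 0.46,
    "Faction B": 0.50,
    "Faction C": 0.54,
}


def win_rate_spread(rates: dict) -> float:
    """Population standard deviation of win rates (fractions, not percent)."""
    return statistics.pstdev(list(rates.values()))


def within_band(rates: dict, low: float = 0.46, high: float = 0.54) -> list:
    """Names of factions whose win rate falls inside the given band."""
    return [name for name, rate in rates.items() if low <= rate <= high]


print(round(win_rate_spread(win_rates), 4))
print(within_band(win_rates))
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample version, which comes out slightly larger for a small number of factions.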

KUDOS CA BALANCE TEAM!!!!!!! Seriously, just wow.





4) Pick/Ban tournaments are the most competitive, while blind pick tournaments have the most faction representation.

The standard deviation of win rates in pick/ban tournaments is 0.052, compared to 0.067 in blind pick tournaments. This somewhat quantifies the difference that skillful faction selection makes, compared to pure luck, in terms of win rate.

On the other hand, the standard deviation of faction selection counts in pick/ban tournaments is 32.3, whereas for blind pick it is 14.4, which is a gargantuan differential. To put it another way, in pick/ban tournaments the top 3 factions were selected 418 times and the bottom 3 factions a total of 145 times, for a nearly 2.9:1 ratio..... in blind pick the top 3 factions were picked 259 times versus 141 times for the bottom 3, which is about a 1.8:1 ratio.
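The two ratios can be checked directly from the totals quoted above. The sketch below also includes a small helper for computing the same ratio from a full list of per-faction pick counts; that helper and its name are illustrative only, since the per-faction counts are not given in this post.

```python
def ratio(top_total: int, bottom_total: int) -> float:
    """Ratio of picks for the most-picked group to the least-picked group."""
    return top_total / bottom_total


def top_bottom_ratio(pick_counts: list, k: int = 3) -> float:
    """Same ratio, computed from per-faction pick counts (hypothetical helper)."""
    ordered = sorted(pick_counts, reverse=True)
    return ratio(sum(ordered[:k]), sum(ordered[-k:]))


pick_ban = ratio(418, 145)    # totals from the post
blind_pick = ratio(259, 141)  # totals from the post
print(f"pick/ban: {pick_ban:.2f}:1, blind pick: {blind_pick:.2f}:1")
```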

From a qualitative point of view, pick/ban tournaments are better at determining the best player, while blind pick tournaments are better at determining the best faction.

As someone who reviewed 1816 games this cycle, I will say that blind pick tournaments are much more interesting to watch, as in pick/ban you often see the same match ups over and over, but that is just my personal preference.





Beyond all this I have not really looked into what is possible with the data. I did put a few feelers out to see if the faction vs faction data could be made sense of beyond a 7 match set, but I think there are just not enough games to get there.

I do hope to see other people's takes on the data, and some further crunching of the numbers, as I did put all the spreadsheets out there for anyone to use.







Comments

  • OrkLads Registered Users Posts: 1,875
    tzurugby said:

    As someone who reviewed 1816 games this cycle, I will say that blind pick tournaments are much more interesting to watch, as in pick/ban you often see the same match ups over and over, but that is just my personal preference.



    I wholeheartedly agree with this. So many tournaments these days are the same matchups again and again and again. It makes for incredibly dull viewing in a game whose biggest strength is faction diversity.
  • Lotus_Moon Registered Users Posts: 9,082
    If you remove the outlier that is Skaven at 0 (considering the new rune is not a good pick there anyway), you have Dwarfs sitting at a 58% win rate, which is too good.

    I'd be very surprised if the W&R doesn't get hit this patch. At minimum I expect a duration nerf, but perhaps -50m range also.

    Don't forget the tournament data uses different restrictions, such as limiting rune casters to 2, not 3.

    Enough about Dwarfs though. This data is fine to have, but to make balance suggestions from it the events should use the same restrictions as quick battles. VP, for example, are limited to 1 colossus, whereas in QB you can have 2 and dominate with them.

    If tournaments adopt CA's restrictions, which I personally find better than the tournament ones (with few exceptions), then I think balance suggestions can be made from this data.
  • tank3487 Member Registered Users Posts: 1,727
    There is quite an interesting correlation in the data. Quite often a low pick rate goes with a higher win rate for a faction, which I believe is the result of low pick rate factions being played only by the players most experienced with them.

    The only factions that do not fit this are Skaven (high win rate despite a high pick rate) and VP (low win rate despite a low pick rate).
  • MTech Registered Users Posts: 495
    Lotus_Moon said:

    If you remove the outlier that is Skaven at 0 (considering the new rune is not a good pick there anyway), you have Dwarfs sitting at a 58% win rate, which is too good.

    That's not how you remove outliers. Usually removing outliers disqualifies both the highest and the lowest; you can't just pick the one that suits you.
    But I know you need to prove your point, which turned out to be wrong after all.
  • Lotus_Moon Registered Users Posts: 9,082
    edited December 2019
    MTech said:

    That's not how you remove outliers. Usually removing outliers disqualifies both the highest and the lowest; you can't just pick the one that suits you.
    But I know you need to prove your point, which turned out to be wrong after all.

    I removed the match-up the rune is not good in anyway.

    Patch is less than 2 weeks away, so let's see who turns out to be right. Like I said, I will be very surprised if W&R doesn't get some kind of nerf, be it duration or range, hopefully both. To top it off, it would be great if it came with a Swordmaster buff and a Longbeard nerf, just to see you happy :tongue:
  • tank3487 Member Registered Users Posts: 1,727
    Lotus_Moon said:

    Don't forget the tournament data uses different restrictions, such as limiting rune casters to 2, not 3.

    I would say that, despite all the pluses of the new rune, 3 Runecasters are less frequent now. You just do not have enough dakka to concentrate on that many slowed targets. Two slows are optimal in most cases.
    The old rune was so good in triple form because of its damage. You could always make back the gold you had spent on it. Plus you needed two casts on one target for enough slow.

    Lotus_Moon said:

    I'd be very surprised if the W&R doesn't get hit this patch. At minimum I expect a duration nerf, but perhaps -50m range also.

    Only if there are significant nerfs to DE, HE and Skaven, or some other kind of buffs to counter Dawi abuse.
    Because right now the really strong Dawi matchups are offset by really bad ones.
  • Lotus_Moon Registered Users Posts: 9,082
    tank3487 said:

    Only if there are significant nerfs to DE, HE and Skaven, or some other kind of buffs to counter Dawi abuse.
    Because right now the really strong Dawi matchups are offset by really bad ones.

    SKV I doubt, and DE also since they are getting DLC. HE got a lot better after the last patch so I don't see it as necessary, though BM are still great.
  • rymeintrinseca Registered Users Posts: 864
    Complete list of MUs with at least 5 wins and 80% win rate:

    BR 12-3 DE
    DW 9-2 VP
    DW 5-1 WE
    HE 5-1 VP
    LM 5-1 BR
    SK 5-0 DW
    SK 8-2 LM
    WE 11-2 LM

    If the results were random or systematically biased (by variations in player skill etc) there would be no tendency for intuitively more favourable MUs to get better results. But in fact almost all of these show more wins for the side with the intuitively more favourable MU. This suggests that the data may indeed be a good guide to difficulty of specific MUs.
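The filter behind that list, and a rough feel for how surprising each record would be if every game were a coin flip, can be sketched as follows. The records are copied from the list above; the function names are made up for illustration, and the 50/50 model is a simplifying assumption, not something the post claims.

```python
from math import comb

# (winner, wins, losses, loser), copied from the list above
MATCHUPS = [
    ("BR", 12, 3, "DE"),
    ("DW", 9, 2, "VP"),
    ("DW", 5, 1, "WE"),
    ("HE", 5, 1, "VP"),
    ("LM", 5, 1, "BR"),
    ("SK", 5, 0, "DW"),
    ("SK", 8, 2, "LM"),
    ("WE", 11, 2, "LM"),
]


def is_lopsided(wins: int, losses: int, min_wins: int = 5, min_pct: int = 80) -> bool:
    """At least min_wins wins and at least min_pct percent of games won."""
    return wins >= min_wins and wins * 100 >= min_pct * (wins + losses)


def coin_flip_tail(wins: int, losses: int) -> float:
    """Chance of at least `wins` wins in wins + losses games at 50/50 odds."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


for winner, wins, losses, loser in MATCHUPS:
    if is_lopsided(wins, losses):
        tail = coin_flip_tail(wins, losses)
        print(f"{winner} {wins}-{losses} {loser}: coin-flip chance {tail:.3f}")
```

Because the data covers many faction pairs, a few records this lopsided can turn up by chance alone, so these numbers are better read as a rough guide than as a significance test.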
  • yst Registered Users Posts: 7,361
    So garbage Coast with 40%, prolly looking at massive buffs incoming
    https://imgur.com/a/Cj4b9
    Top #3 Leaderboard on Warhammer Totalwar.
  • mightygloin Registered Users Posts: 2,702
    Arduous work, but I doubt many people were calling this game unbalanced even before any data (there are still some power creep issues though). Too much balance can even be detrimental and boring, especially when it's about Warhammer ;)

  • eumaies Senior Member Registered Users Posts: 6,236

    Lotus_Moon said:

    I removed the match-up the rune is not good in anyway.

    Yeah, no one ever gets value from a Rune of Wrath and Ruin against doomwheels, doomflayers, and fleeing expensive rat units. Funny how wrong that is.
  • yukishiro1 Member Registered Users Posts: 510
    Game is pretty well "balanced" in pick/ban in the sense that if everybody picks and bans intelligently there are enough competitive matchups to run a tournament.

    I'm not sure that really means the game is balanced. In a balanced game the difference between pick/ban and free wouldn't be so large. What this shows is that there are a lot of hard counters that you have to work around in the faction selection system to make the game balanced.
  • OrkLads Registered Users Posts: 1,875
    eumaies said:

    Yeah, no one ever gets value from a Rune of Wrath and Ruin against doomwheels, doomflayers, and fleeing expensive rat units. Funny how wrong that is.

    Yea, it is pretty critical against Flayers imo. Pretty sure it doesn't affect fleeing units though; as far as I can tell, routing units seem to drop speed debuffs for some reason. I assume it has something to do with how, when a unit is surrounded and routs, it seems to be able to slip through the lines of a unit that had previously locked it in.
  • Aerocrastic Registered Users Posts: 464
    edited December 2019
    yst said:

    So garbage Coast with 40%, prolly looking at massive buffs incoming

    50% win rate vs Beastmen
    25% vs Chaos and Bretonnia
    30% vs Greenskins
    20% vs Vampire Counts

    This just tells me that there's a lot of coast players that don't know how to play these matchups or that overall experience with vampire coast is not very high.

    Beastmen have a 71% win rate vs Lizardmen

    or in other words like we've been saying this whole time
    This information shows trends, but it does not account at all for player skill in specific matchups, or for a weighted average of how likely player X is to win matchup N.

    Overall this is a depiction of the state of the meta, not something you can draw deep conclusions from.

    If, for example, the best Wood Elf player in the world beats an average TK player, this is recorded as a 1:0. If they take this matchup several times and win 8 times out of 11, but 2 other players play this matchup and win only 2 times out of 6, this leaves us with 10/17, or a 59% win rate, skewed by the very strong performance of a few people as an outlier.

    Vampire Coast's top representation is by Tutorial (Korean player, doubt his stats are counted here) and Arkhan the Black (banned from the majority of English speaking tournaments). If their data isn't incorporated into these tournament sets, then the other factions will still have a couple of players swinging their win rates higher. And if we go solely off the numbers and conclude from the win rates here that Vampire Coast is a terrible faction, then we would also have to conclude that the best factions right now are

    Beastmen, Wood Elves, and Vampire Counts

    As someone who actually understands the multiplayer meta, this is pretty ridiculous and an insult to the skill of the players whose games are integrated as a part of this data set.

    Aside from this, thanks a ton for your work Tzu! This has been a very interesting snapshot of the multiplayer meta. I wish it were possible to also integrate ban rates and if players had individually weighted scores which were calculated alongside win rates as a sort of ELO type system, but that is a ton of work and the information also isn't very readily accessible. Hopefully CA considers something like this in the near future if multiplayer balance is ever that high of a priority. Integrating a draft-ban multiplayer mode would be really cool and allow them to keep track of this information I'm sure.
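The Wood Elf example above can be worked through in a few lines. This is only a sketch using the 8-of-11 and 2-of-6 figures from that example; the group labels and function names are made up for illustration.

```python
# (wins, games) per group, using the figures from the example above
records = {
    "top Wood Elf player": (8, 11),
    "two other players": (2, 6),
}


def pooled_win_rate(groups: dict) -> float:
    """Win rate when every game is pooled, regardless of who played it."""
    wins = sum(w for w, _ in groups.values())
    games = sum(g for _, g in groups.values())
    return wins / games


def win_rate_without(groups: dict, excluded: str) -> float:
    """Pooled win rate after dropping one group's games."""
    remaining = {name: rec for name, rec in groups.items() if name != excluded}
    return pooled_win_rate(remaining)


pooled = pooled_win_rate(records)
rest = win_rate_without(records, "top Wood Elf player")
print(f"pooled: {pooled:.0%}, without the top player: {rest:.0%}")
```

The gap between the two numbers is the point: a pooled matchup win rate can say more about who happened to play the matchup than about the matchup itself.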


  • Bordiga Registered Users Posts: 277
    I think the data for Dwarfs and Tomb Kings is probably the most reliable and accurate, though even those two factions have clearly spurious results, like the Skaven vs Dwarf match up, which is actually even in my experience, or Tomb Kings vs Lizardmen, which is the opposite of your data, because Tomb Kings are actually good vs Lizardmen.


    In my opinion it is also important to note that for a lot of match ups the sample size is pretty low, which means some of the results are not that meaningful. A clear example is the BRET vs GS match up, which only took place 4 times.

    Your win rates for that match up are based on that small sample.

    By the way, I think that including the sample size in your chart with the win rates would be a pretty nice addition.

    So for example something like this

            Beastmen      GS            and so on....
    Bret    33% / n=6     50% / n=4

    All opinions my own.

    Medieval II is still the best Total War.
  • tzurugby Registered Users Posts: 275
    Vistahm said:

    By the way, I think that including the sample size in your chart with the win rates would be a pretty nice addition.

    So for example something like this

            Beastmen      GS            and so on....
    Bret    33% / n=6     50% / n=4

    Just FYI, the sample sizes are in another chart.... so it takes a bit of doing, but it can be worked out.
  • tzurugby Registered Users Posts: 275



    Aside from this, thanks a ton for your work Tzu! This has been a very interesting snapshot of the multiplayer meta. I wish it were possible to also integrate ban rates and if players had individually weighted scores which were calculated alongside win rates as a sort of ELO type system, but that is a ton of work and the information also isn't very readily accessible. Hopefully CA considers something like this in the near future if multiplayer balance is ever that high of a priority. Integrating a draft-ban multiplayer mode would be really cool and allow them to keep track of this information I'm sure.


    I'd love to peg the results to an ELO, but that would be an order of magnitude more work.... I never started taking this data with those goals in mind. I just wanted to see how the Dawi stacked up in pick/ban tournaments. The rest is gravy :)

    I am a big fan of chess in general, having played seriously in my youth, and think ELO would be perfect for this game.

    I am also a big fan of World of Tanks and the way they manage and release data, which is essentially completely open, and the model I'd like to see CA follow.

    My sincere hope is that CA will see that with the addition of data to the conversation, debates become more interesting, and the discussion deeper. Data is not perfect and can never be the only thing considered, but it is an excellent addition to thoughtful conversation.
  • The_real_FAUST Registered Users Posts: 834
    Tzu,

    As ever thank you for the time and effort that has gone into this.

    Doff of the hat!
  • Arkhawn Registered Users Posts: 3
    tzurugby said:

    It has been noted that an ability can be overpowered without the faction being overpowered, and there is variance in these numbers for sure.... but based on the data, personally playing the Dwarfs, talking to other Dwarf players and watching a LOT of games, I believe the following is true:

    1) The Dwarfs are not OP (though RoWR may still be)
    2) The Dwarfs are competitive in enough matches to be a worthwhile faction to be able to play in pick/ban tournaments.

    May I ask what RoWR stands for?
  • Ephraim_Dalton Senior Member Registered Users Posts: 25,095
    Arkhawn said:

    May I ask what RoWR stands for?

    Rune of Sloth and Snail, formerly known as Rune of Wrath and Ruin.
