Consistency in 2.1 engine

Savard · Post by **Savard** » Fri Sep 25, 2020 10:58 am

Hey guys! I am currently trying to create ratings for the 2.1 engine. I am aware of 36Henry's research, the "inverted" SK, tendency calculations etc.

I am going a bit of a different route than most. While others tested their ratings with only one set of rosters (usually NHL rosters/lines), I created 32 different leagues with randomized rosters and am comparing the average results of those 32 leagues to the target values. For that I even rented a cloud computer on Amazon to have 16 threads simulating 16 leagues simultaneously. Crazy me
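A minimal sketch of that kind of setup, assuming each league can be run as an independent job. `run_league` is a hypothetical placeholder (it only returns seeded fake numbers); a real version would have to start the simulator's automatic test for that league and read back its output, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def run_league(league_id: int) -> dict:
    """Placeholder for one league simulation (hypothetical).

    A real version would launch the simulator's automatic test for this
    league and parse its output; here we only return a fake point total.
    """
    rng = random.Random(league_id)  # seeded so each league is reproducible
    return {"league": league_id, "points": rng.randint(10, 110)}


def run_all(n_leagues: int = 32, workers: int = 16) -> list:
    # Up to `workers` leagues run at the same time; map() keeps league order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_league, range(1, n_leagues + 1)))


results = run_all()
average_points = sum(r["points"] for r in results) / len(results)
```

With 32 leagues and 16 workers, the jobs run in roughly two waves, which matches the "16 leagues simultaneously" idea.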

I am pretty confident that I can come up with ratings where the averages hit the target values pretty closely, but I noticed, worryingly, that there is sometimes a huge difference between a player's stats in his best league and in his worst. Example Brady Tkachuk:

Target: 23G, 25A, 48P.
Average sim result: 21G, 24A, 45P.
Almost perfect. HOWEVER:

in his best league, he has 49G, 65A, 114P playing with Lars Eller and Tyler Toffoli (hardly a killer line on paper)
in his worst he has 5G, 6A, 11P playing with Joe Pavelski and Alex Chiasson

Ice time is factored in already, so that is not the issue.
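To make the comparison concrete, here is a small sketch of the per-player summary described above: average against target, plus best, worst and the worst/best ratio. The list of totals is made up for illustration, except for 114 and 11, which are the best and worst totals quoted above.

```python
from statistics import mean


def summarize(points_by_league, target):
    """Return average, best, worst and worst/best ratio for one player."""
    best = max(points_by_league)
    worst = min(points_by_league)
    average = mean(points_by_league)
    return {
        "average": average,
        "best": best,
        "worst": worst,
        "worst_over_best": worst / best if best else 0.0,
        "avg_minus_target": average - target,
    }


# Illustrative point totals for one player in five leagues (not real sim
# output, apart from the quoted best of 114 and worst of 11).
example = [114, 11, 40, 35, 25]
summary = summarize(example, target=48)
```

A player can sit close to target on average while the worst/best ratio is still tiny, which is exactly the consistency problem.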

The problem can also be seen in "real life":
http://lhsm3.ca/PlayerReport.php?Player=773

I am assuming the LHSM used 36Henry's 2.1 ratings for their 2017 and 2018 seasons. In 2017 Pacioretty had 150P in 80GP; a year later it was only 14P(!) in the same number of games. He probably had a drop in ice time, but the P/20 went from 1.97 to 0.27.

Those are extreme examples, but I guess you get my point. Has anyone done any research in that department? Does anyone have an idea how to make the results more consistent?

Kitsune · Post by **Kitsune** » Fri Sep 25, 2020 9:06 pm

Morale - it can have a huge impact (I did something similar too; I have a random player generator in my toolbox). Once I greatly reduced the slider for that, it generated more consistent results.

Savard · Post by **Savard** » Sat Sep 26, 2020 10:47 am

Mmmmh. It could be true for that real life example, I have no idea what settings they used, but for my tests morale was turned off. I did some more testing and, using R, I found that there is a certain correlation between the "range" of the results and the amount by which SC is higher than PA or PH. Having PA and PH closer to (or higher than) SC produces more consistent results, which makes it hard to get ratings for snipers.

edit: OK, some more research: I noticed that the results range for d-men is much smaller compared to forwards, so I checked the correlation again for forwards only, and the effect described above is reduced. The correlation coefficient between (SC-PA) and (worst points divided by best points) is -0.337, in case anyone knows what that means.
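For reference, a coefficient of -0.337 is a weak-to-moderate negative linear relationship: the larger SC is relative to PA, the smaller the worst/best ratio tends to be (i.e. the wider the range). Squared it is about 0.11, so a straight-line fit on (SC-PA) would account for roughly 11% of the variation in that ratio. The sketch below shows how such a coefficient is computed; the two data lists are made up and only stand in for the real per-forward values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: SC minus PA, and worst points / best points, per forward.
sc_minus_pa = [-5, 0, 5, 10, 20, 30]
worst_over_best = [0.40, 0.45, 0.30, 0.35, 0.20, 0.25]

r = pearson_r(sc_minus_pa, worst_over_best)
shared_variance = r ** 2  # share of variance explained by a linear fit
```

This is the same quantity R's `cor()` returns by default, just written out by hand.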

It seems that the 2.1 engine generally is much more consistent with the results for d-men compared to forwards...

36Henry · Post by **36Henry** » Sat Sep 26, 2020 1:43 pm

VERY cool stuff Savard!

Obviously I have no insight into the rosters you're using, but in my experience running tests using different lines/teams will produce inconsistent results as the mechanisms are very sensitive to how lines and teams are put together.

As a guess I would think Tkachuk was playing top line minutes, as one of the most talented (if not the most talented) players on his team, when he scored 114 points. And in the season he played with Pavelski and Chiasson he was further down the depth chart, with some talented teammates ahead of him.

I think a very interesting experiment would be to run your 32 leagues using identical rosters and line combinations (preferably as close to the real NHL rosters and line-ups as possible). That way you would have a wealth of data to use in tweaking every single player into being as close to his real life persona as possible.

I never thought of going the Amazon route. Cool stuff indeed!

Savard · Post by **Savard** » Sun Sep 27, 2020 12:15 pm

Thanks, 36Henry! You had emailed me a few weeks ago with your suspicion that overachieving players have to do with them being the most talented on their team, but I don't think this is the case here. Tkachuk's best-season team also has Mikko Rantanen, Jonathan Drouin and Ryan O'Reilly on the first line (Tkachuk's is the second).

The reason why I don't want to use the same rosters 32 times is that I fear a fortunate/unfortunate line combination may give a player a huge advantage/disadvantage in ratings. If, for example, a player in STHS doesn't click with his real life linemates, it may result in undeservedly high ratings for him. In theory you are right of course, I just don't believe the sim mimics the real world and the effect of linemates well enough.

From a purely scientific standpoint it may be interesting to use random players (either fully random numbers or Kitsune's random player generator), sim 100 seasons each with different randomized players, and then run some clever code in R to find some hidden correlations between results and consistency and different ratings and ratings combinations. But I simply don't have the time for that right now plus I think it would require some better statistical knowledge than simply finding the correlation coefficients.

What I will do in the coming days is to reduce the "spread" of PH, PA and SC for all forwards to see if the average range of results is reduced.

Also, instead of targeting real life performance with the average of all 32 leagues a player is in, I am thinking about using only the average of the player's best 16 leagues. That lowers the ratings and prevents unrealistically high season totals from occurring. Unlike with the automatic tests, real life GMs will (hopefully) move around underachieving players and try different line combinations, so the lower end of the results range doesn't play such a big role in the end.
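A rough sketch of the best-16 idea, with made-up season totals. Because the average of the better half can never be lower than the average of all leagues, tuning it to the same real life target pushes the ratings down.

```python
def best_half_average(points_by_league):
    """Average of the top half of a player's league results."""
    ranked = sorted(points_by_league, reverse=True)
    keep = max(1, len(ranked) // 2)  # 16 when there are 32 leagues
    return sum(ranked[:keep]) / keep


# Made-up totals for one player across 32 leagues: 20, 21, ..., 51.
totals = list(range(20, 52))

full_avg = sum(totals) / len(totals)    # plain average over all 32 leagues
best16_avg = best_half_average(totals)  # average over the best 16 only
```

Here the full average is 35.5 and the best-16 average is 43.5, so a player tuned on the best-16 figure would end up with lower ratings than one tuned on the full average.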

edit: got my AWS bill today, and maybe it is cheaper to get a good processor for my own PC

Post by **SimonT** » Sun Sep 27, 2020 1:31 pm

Savard wrote: ↑Sun Sep 27, 2020 12:15 pm edit: got my AWS bill today, and maybe it is cheaper to get a good processor for my own PC

Any old spare computer could do the job. I have an old PC in my living room that I leave crunching data all night long quite often.

Kitsune · Post by **Kitsune** » Sun Sep 27, 2020 3:00 pm

My player generator is probably geared for this kind of activity. The parts that are supposed to be macro'ed, but that I instead generated manually, are the names (I have a 100,000 name database split among 20 nationalities according to NHL makeup), nationality, height (weight is generated based on height), PO and CSV creation. For this kind of activity you can really generate easily and just copy/paste to create the CSV. This is what the rating selection screen will look like:

I am launching a fictional league tomorrow using this generator and will do 8+ seasons in a calendar year. If this launch is successful we will prolly see some interesting data start coming out of a human intervention league.

Savard · Post by **Savard** » Tue Sep 29, 2020 11:29 am

SimonT wrote: ↑Sun Sep 27, 2020 1:31 pm Any old spare computer could do the job. I have an old PC in my living room that I leave crunching data all night long quite often.

That may be true if you have access to the code and can write your own testing routines. I have to use the automatic test feature, and then parallel processing comes in very handy.

Savard · Post by **Savard** » Tue Sep 29, 2020 11:30 am

Kitsune wrote: ↑Sun Sep 27, 2020 3:00 pm I am launching a fictional league tomorrow using this generator and will do 8+ seasons in a calendar year. If this launch is successful we will prolly see some interesting data start coming out of a human intervention league.

Oh if you could give me the v3players csv files, I would love to run some R based analyses on them!

Savard · Post by **Savard** » Mon Oct 05, 2020 12:55 pm

Some more findings:

I picked an algorithm to reduce the spread between PA, SC and PH and wanted to check if that reduced the spread of the results. It almost had no effect.
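The post does not say which algorithm was used, so the following is only one possible way to do it, given as an assumption: pull PA, SC and PH toward their common mean by a fixed factor. The function name, the `factor` parameter and the lower rating bound of 1 are all hypothetical.

```python
def compress_spread(ratings, keys=("PA", "SC", "PH"), factor=0.5):
    """Pull the given ratings toward their mean; factor=1 leaves them as is."""
    centre = sum(ratings[k] for k in keys) / len(keys)
    out = dict(ratings)  # copy, so the input is not modified
    for k in keys:
        value = centre + factor * (ratings[k] - centre)
        out[k] = max(1, min(99, round(value)))  # clamp to an assumed 1-99 range
    return out


# Made-up sniper profile: SC far above PA and PH.
sniper = {"PA": 60, "SC": 90, "PH": 66, "CK": 70}
flattened = compress_spread(sniper, factor=0.5)
```

With `factor=0.5` the 30-point gap between SC and PA shrinks to 15 while the mean of the three ratings stays at 72; ratings not listed in `keys` are left alone.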

So I started to look into some of the most obvious overachievers. For example the curious case of the line of Blake Coleman, Nazem Kadri and Filip Zadina. In other leagues, where they were not on the same line, their results (on average) were pretty close to what I targeted. Together, however, they completely tore apart League 19. Coleman led the league with 144 points. I cannot remember the numbers for Kadri and Zadina, but they were close behind.

The first thing that struck me was that Coleman obviously had pretty high ratings in CK and DF. From earlier research I already had the suspicion that high DF and CK lines sometimes overachieve. Coleman had 87 in CK and 97 in DF. I played around with those ratings, and the points dropped as I decreased CK and DF (both had an effect). When I had them both at 82, Coleman's points total was at 58, which is more in line with what to expect. Zadina's and Kadri's points were reduced accordingly.

Great, I thought. That means if I reduce CK and DF for forwards, that may keep those strange overachieving lines from showing up. I fired up my Excel and reduced both CK and DF by 15% for all forwards, ran my big tests (32 leagues, 31 teams, almost 4000 different line combinations) AAAAAND...

...still the same problem. The line mentioned above was still horribly overachieving (Zadina 155 points, Kadri 135, Coleman 132). What a bummer.
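The blanket 15% cut to CK and DF was done in Excel; the same step can be sketched in code as below. The record layout (a `pos` field, `"D"` for defensemen) is an assumption for illustration and not the actual roster file format. Coleman's 87 CK and 97 DF are the values quoted above.

```python
def scale_ratings(players, keys=("CK", "DF"), reduction=0.15):
    """Return copies of the players with `keys` reduced for non-defensemen."""
    adjusted = []
    for p in players:
        q = dict(p)  # copy, so the original ratings stay available
        if q["pos"] != "D":  # assumed position field; d-men are left untouched
            for k in keys:
                q[k] = round(q[k] * (1 - reduction))
        adjusted.append(q)
    return adjusted


players = [
    {"name": "Blake Coleman", "pos": "F", "CK": 87, "DF": 97},
    {"name": "Some Defenseman", "pos": "D", "CK": 80, "DF": 85},  # made up
]
adjusted = scale_ratings(players)
```

Keeping the original list untouched makes it easy to rerun the tests with a different reduction and compare.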

Then I noticed something different: those overachieving lines often had players on them with a high SC. My tuning cycles always try to increase the SC if the player is scoring too few goals. The sim has a big problem reproducing the play of players who score goals but have fewer assists than goals. In fact only a handful of the 880 skaters I have in the sim produce averages where the number of G is larger than the number of A. And they only have one or two more goals than assists.

Some players with more real life goals than assists tend to get a high SC, especially compared to PA. Eventually I guess most of them would be at 99 SC without reaching their NHL goal total, since the PA rating is reduced automatically as well, because the higher SC causes too many assists. 36Henry described this behaviour of the engine in another thread.

Does anyone know how to create a player who *on average* (and not just on his specific test line) produces more G than A? If not, I may have to modify the target values for the players so that for none of them the target goals are greater than the target assists. Too bad, but it seems that the engine favouring assists from rebounds(?) makes it almost impossible to recreate a player like Ovechkin.

Edit: my hope is that if my tuning method stops producing players with high SC and low PA, the overachievers go away.

Kitsune · Post by **Kitsune** » Tue Oct 06, 2020 8:59 am

It is possible .. I vividly remember one guy getting 72 goals and 8 assists in 60 games. I'm going to send you the test league files on the weekend.

Savard · Post by **Savard** » Mon Oct 19, 2020 4:37 pm

Hey fellow 2.1 nerds. Before I write a longer post tomorrow about my research, I wanted to ask you about the following:

Did you guys ever check the assists/goals ratio in your 2.1 (test) leagues? Total assists divided by total goals?

In the NHL the ratio is about 1.75 assists per goal. In the STHS 2.1 engine I always get around 1.9 a/g, and I struggle to make the ratio more realistic. It's a problem because I need to adjust some players' target values, because the engine doesn't produce players with many goals and few assists (like 50-20-70, Ovechkin) *and* high scoring players with few goals and many assists (think 19-62-81, Marner), and I think I should take the sim's tendency to produce too many assists into account.

Well, maybe you can quickly check a few of your test results in Excel or so.
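If it helps, the check is just total assists divided by total goals over all skaters. A minimal sketch, assuming each player record carries `G` and `A` fields (the field names are an assumption); the three stat lines are the ones quoted in this thread and serve only as sample input, not as a real league.

```python
def assists_per_goal(players):
    """League-wide assists divided by league-wide goals."""
    goals = sum(p["G"] for p in players)
    assists = sum(p["A"] for p in players)
    if goals == 0:
        raise ValueError("no goals recorded")
    return assists / goals


# Stat lines quoted in this thread, used only as sample input.
sample = [
    {"name": "Tkachuk target", "G": 23, "A": 25},
    {"name": "Marner-style", "G": 19, "A": 62},
    {"name": "Ovechkin-style", "G": 50, "A": 20},
]
ratio = assists_per_goal(sample)
```

Run over a full test league export, the result can be compared directly against the roughly 1.75 (NHL) and 1.9 (2.1 engine) figures mentioned above.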

36Henry · Post by **36Henry** » Tue Oct 20, 2020 12:18 pm

Pure snipers are extremely hard to create in my experience. Most of the goals in the Sim come from rebounds so I would think that has a lot to do with it. Players who shoot a lot will simply get quite a few assists by default.

The other way around is much more common. Having forwards finish an 82 game schedule with numbers like 8+52 is not uncommon.

Savard · Post by **Savard** » Mon Oct 26, 2020 1:34 pm

I am also struggling to create a forward with more than 60 assists, no matter if he has 19 goals or 38 goals (and the corresponding SC). Playing around with SK didn't help either. We know it's not a simple inversion, as there seems to be a "sweet spot" for SK where the production is highest.

On the other hand I also struggle to create defensive defensemen with <10 assists. *sigh*

Overall I am on a good path though. Maybe I am being too much of a perfectionist?

I guess I will play around with SK more...

ynohtna · Post by **ynohtna** » Sun Nov 08, 2020 4:14 am

cool, I always use 2.1 engine cause it's in theory the best engine but so many people don't like the results.

I guess it's kind of good that it's not too predictable, definitely want some 'playability' for the GMs to tinker and line match or something. I've always felt 2.1 gave the best chance for that.

Realistic is hard to gauge, different league sizes, different roster combinations... my biggest gripe these days are super goalies and not enough bottom six scoring

Unfortunately, I don't have the time or enough know how to tinker and run with what I have. As long as my best GMs are tops of the league, something is going right enough!
