The inevitable tier thread

Well, here is the list with the +/-/0 system, based on Thelo’s list. I may or may not have made a mistake in my addition/subtraction; I’m not going back over it =]

Rog=7
Ryu=7
Vega=7

DeeJay=6

Chun=4

Sim=3
Honda=3

Sagat=2

Ken=1

Guile=0

Bison=-3
Cammy=-3

Fei=-5
Blanka=-5

Hawk=-10
Gief=-11
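
For anyone who wants to redo the addition/subtraction, here is a minimal sketch of one plausible reading of the +/-/0 system. The scoring rule is an assumption, not something spelled out above: +1 for every matchup rated above 5, -1 for every matchup rated below 5, and 0 for an even matchup. The function name `net_score` is made up for illustration. The sample row is Zangief's, copied from the matchup chart posted later in this thread.

```python
def net_score(row):
    """Return (#favorable - #unfavorable) for one character's matchup ratings.

    Assumed rule: rating > 5 counts +1, rating < 5 counts -1, exactly 5 counts 0.
    """
    plus = sum(1 for rating in row if rating > 5)
    minus = sum(1 for rating in row if rating < 5)
    return plus - minus


# Zangief's row from the chart later in the thread (15 opponents).
zangief = [6, 4, 4, 5, 3, 3.5, 3, 3, 4, 4.5, 3, 3, 3, 3, 5]

print(net_score(zangief))  # -11
```

Under that assumed rule the Zangief row comes out to -11, the same as the list above. Other rows may differ, since the chart was revised after this list was made.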

Online might be the only practical way to do it, but I have to admit that for something like this, I’m not sure how solid the results would be. For example, I’m a LOT more confident about the Sim vs Honda match offline than I am online.

Still, it might be an interesting exercise if the matches were recorded. If nothing else, it’d give everyone a visual guide to some of the various strategies that people use.

I agree with you :). But I’d take online results over no results.

Agreed.

Anyways, to kick this off I’m up for playing anyone a set of 20. We can do more if anyone is interested, like 50. I guess I could represent Chun, Claw, Dic (probably Boxer again eventually, I used to main him but dropped him in HDR because I was excited about Dictator and Ryu :tup: )

I could do Chun in just about any match but Vega and Cammy - I know the theory of playing those matches, but in practice I’m awful at them.

Same. I’ll take Hawk and Gief or Boxer.

One thing though, and this is VERY IMPORTANT. In order to accurately reflect the tier lists we need to make sure that players are evenly skilled in each match. So there should be some planning with regards to matchups.

Agree:
A 5-Star system with halves offers the same number of data points as a 10 point system without halves.

Disagree:
We use the same system as a 5-Star with halves movie ranking system.

Movies use a ranking system where the numbers correspond to a straight quality scale, where quality is determined by comparing films against one another.
5-Star films are “the best” and 0-Star/Turkeys are “the worst”.
I don’t know what the % population at each Star point trends toward, but I know that both 0-Star and 5-Star are used.

Street Fighter players allocate numbers on the ranking chart in a modified form.
I believe it is commonly accepted that this form is that the numbers represent how many matches out of 10 that the character should win in a given matchup.
Taking Thelo’s chart as an example, we can see that this modified form results in a very uneven point distribution:

Total rankings listed: 240
Frequency of showing up at each data point:
0 = none
0.5 = none
1 = none
1.5 = none
2 = 1
2.5 = none
3 = 15
3.5 = 12
4 = 47
4.5 = 20
5 = 50
5.5 = 20
6 = 47
6.5 = 12
7 = 15
7.5 = none
8 = 1
8.5 = none
9 = none
9.5 = none
10 = none

99% of all rankings fall between 3 and 7.
In a system without halves, that gives a reader 5 data points with which to ascertain differences in ranking between matchups.

77% of all rankings fall between 4 and 6.
In a system without halves, that gives a reader 3 data points with which to ascertain differences in rankings between matchups.

Consequently, with the 5-Star with halves movie ranking, we have a full use of the scale.
This means the movie system has 11 data points with which people reading the data can ascertain differences in ranking between the movies.

Whereas in the Street Fighter 10 point without halves ranking, while there are 11 data points on the scale, only a much smaller subset of data points are actually used for 99% of the rankings: 5 data points.

Using halves on the Street Fighter rankings system allows us to give greater definition to the giant clump of rankings in the middle of the scale, giving us ~11 data points within which to populate 99% of the data.
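
The percentages quoted above can be re-derived from the frequency list. This is just a small sketch of that arithmetic. The counts are copied from the list, and the helper names `share` and `whole_points` are made up for illustration.

```python
# Frequency of each rating value, copied from the list above (zeros omitted).
freq = {2: 1, 3: 15, 3.5: 12, 4: 47, 4.5: 20, 5: 50,
        5.5: 20, 6: 47, 6.5: 12, 7: 15, 8: 1}

total = sum(freq.values())


def share(low, high):
    """Fraction of all ratings that fall within [low, high]."""
    return sum(n for rating, n in freq.items() if low <= rating <= high) / total


def whole_points(low, high):
    """Number of whole-number scale points in [low, high] (no halves)."""
    return len(range(low, high + 1))


print(total)                     # 240
print(round(share(3, 7) * 100))  # 99
print(round(share(4, 6) * 100))  # 77
print(whole_points(3, 7))        # 5
print(whole_points(4, 6))        # 3
```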

I do not believe these definitions match up relative to the ranking method players currently use* to ascertain numbers for the Street Fighter ranking charts.

Compare it to the definitions I listed:
5-5 = even
6-4 = edge
7-3 = beating
8-2 = blowout
9-1 = might get lucky
10-0 = never

Our 10-0 definitions match up.
My 7-3 equals your 9-1.
My 6-4 equals your 7-3.
My definitions do not have the capability to define a “slight advantage” or a “big advantage”.

I believe if you line-up our definitions against the numbers on Thelo’s chart, that my definitions will better describe the number chosen for each matchup.
In fact, matching up your scale means that only 1% of matchups qualify as involving a “big advantage”.
With matchups like Ken-vs-Honda as “advantage”, Guile-vs-Honda as “slight advantage”,

But by adding in the “.5” data points to the scale, we can then have the capability to define a “slight advantage” and a “big advantage”.

    • Street Fighter players allocate numbers on the ranking chart in a modified form.
      I believe it is commonly accepted that this form is that the numbers represent how many matches out of 10 that the character should win in a given matchup.

I DO like this modified form, because I think it really fixes the concept of matchup rankings across tangible/practical references.
I just think that it skews the distribution of the data across the scale, such that using halves becomes greatly helpful in discerning minute (but very material) variations within the clumped distributions.

I do appreciate your attention to ensuring the credibility of your statistics and your desire to remove pre-test bias before using the data, but inevitably, if the desire of a tier system is to measure how characters play against one another and to measure differences between them, you have to find a way to keep the variability of talent / ability at a fixed value. This simply isn’t possible. Even if we take the best players of every character to MINIMIZE the variability of this value, you will still run into pre-test bias errors.

Econometrically speaking, you’ll run into a situation where more skilled players skew toward well-known, higher-end characters, which inevitably skews your matchup data in those characters’ favor. Never mind that this is no fault of your own. The skewness of your distribution is inevitable from the start, and your job as the statistician or econometrician is to minimize its impact. The only way to do that is to introduce confidence levels, e.g. Ryu vs. Ken is 6-4 at a 95% confidence level, or whatever it may be. But acknowledging that our methodology is imperfect then lets the debate move forward: should this process be a quantitative approach at all, or do the qualitative approaches used so far (i.e. the personal opinions of everyone on SRK and their mothers…) have some intrinsic, and possibly superior, measuring capacity, given that we cannot narrow our data set because of the randomness in what can be argued is the most important variable?

Of course, this leads to the natural discussion of measuring ability / talent / skill and then finding people of equal values. How would we do this? Tournament values are unreliable due to the level of randomness, and online play is unreliable because lag cuts into the time players have to react, so we have to make some concession here as well. Let’s say we agree to use online play data from matches with under 75 ping. Well, then how do we measure skill inside of these matches? Certainly it can’t just be wins and losses, since we’re doing this exercise to measure characters against one another, and wins and losses will be skewed by the matchups we’re trying to measure. So do we measure it by the number of combos someone can do? How technically adept they are? This will inevitably turn into a qualitative shouting match as well. This also leads to an interesting side note on new players and how to measure them. If such a quantitative tier chart existed, would that data inevitably steer new players toward characters whose quantitative values have already been recognized as higher, further skewing those values in future revisions of the data?

In short, from a scientific point of view, I am not sure that a truly viable quantitative study of matchups would be helpful, or even really possible, without the data being called into question at some point. Now, Zasspacer makes a good point when he says that some information is better than none. I agree with this conclusion 100%. That’s why, instead of approaching the problem character vs. character, maybe we shift the discussion to a more personal level and match individuals against other characters. We remove skill from the equation and develop a data set that lets players see how they do individually against other characters, compare that data with other players who main the same character, and from there identify each individual’s strengths and weaknesses. The WWL is taking this type of personalized approach, as we’re collecting hundreds of individual matchup results from over 30 different individuals. Some characters are not well represented yet, but my goal is that after collecting this data for perhaps a year or more, a reasonable, arguable quantitative study may be possible. I mean, if I can say that Silver Rain 007, who plays Cammy, wins 37% of his matches against Honda players (I wish…), and Noriega wins 31% of his matches against Honda players (he wishes too), and we’ve both played 100 matches against Hondas, then that data becomes valuable through repetition, without the chaotic and immeasurable impact of the talent / skill / ability variable. Just something to think about as this discussion moves forward.
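
To put a rough number on the "confidence levels" idea for per-player records like the ones above, here is a minimal sketch using the Wilson score interval for a win rate. Choosing the Wilson interval is an assumption of this sketch, not a method anyone in the thread committed to. It also treats every match as an independent coin flip with a fixed win chance, which real sets only approximate. The function name `wilson_interval` is made up for illustration.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate; z=1.96 corresponds to ~95% confidence."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


# The two hypothetical records from the post: 37 and 31 wins out of 100 vs Honda.
for name, wins in [("Silver Rain 007", 37), ("Noriega", 31)]:
    low, high = wilson_interval(wins, 100)
    print(f"{name}: {low:.2f} to {high:.2f}")
```

At 100 matches each, both intervals are still roughly nine percentage points wide on either side and they overlap, so under these assumptions a 37% vs 31% gap would not yet separate the two players.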

Theory Fighter FTW.

Some updates to my chart:


         Bal  Dha  Veg  Ryu  Sag  Dee  E.H  Chu  M.B  Ken  Gui  Cam  Bla  Fei  T.H  Zan  Total
Balrog   -    5    5.5  5.5  6    6    5    5    6    6    6    4    6    6.5  6    4    82.5
Dhalsim  5    -    4    6    7    6    4    4    5    6    6    3    4.5  5.5  6    6    78
Vega     4.5  6    -    6    6    5    4    5    4    5.5  6.5  5.5  6    6    7    6    83
Ryu      4.5  4    4    -    5    6    7    6    6    6    5    6.5  6    6    5.5  5    82.5
Sagat    4    3    4    5    -    5    5.5  5.5  5    5.5  6    5    5    4.5  6.5  7    76.5
Dee Jay  4    4    5    4    5    -    6.5  5    6    5    5    6    6    7    6.5  6.5  81.5
E.Honda  5    6    6    3    4.5  3.5  -    4.5  6    3    4    6.5  6    6.5  8    7    79.5
Chun Li  5    6    5    4    4.5  5    5.5  -    5.5  5.5  5    5.5  5.5  5.5  4.5  7    79
M.Bison  4    5    6    4    5    4    4    4.5  -    4    5    6    5    5    5    6    72.5
Ken      4    4    4.5  4    4.5  5    7    4.5  6    -    4.5  6.5  6    6    5    5.5  77
Guile    4    4    3.5  5    4    5    6    5    5    5.5  -    7    3.5  6    6    7    76.5
Cammy    6    7    4.5  3.5  5    4    3.5  4.5  4    3.5  3    -    5.5  5.5  7    7    73.5
Blanka   4    5.5  4    4    5    4    4    4.5  5    4    6.5  4.5  -    4.5  6.5  7    73
Fei Long 3.5  4.5  4    4    5.5  3    3.5  4.5  5    4    4    4.5  5.5  -    6    7    68.5
T.Hawk   4    4    3    4.5  3.5  3.5  2    5.5  5    5    4    3    3.5  4    -    5    59.5
Zangief  6    4    4    5    3    3.5  3    3    4    4.5  3    3    3    3    5    -    57
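
If anyone wants to sanity-check a chart like this after an edit, here is a small sketch of two checks: a row's Total should equal the sum of its ratings, and each pair of mirrored entries (A vs B and B vs A) should add up to 10. Only the Balrog row and a three-character slice are copied from the chart above to keep it short. The dictionary layout and the name `mismatched_pairs` are assumptions for illustration.

```python
# chart[a][b] is a's rating against b, copied from the chart above.
chart = {
    "Balrog":  {"Dhalsim": 5, "Vega": 5.5},
    "Dhalsim": {"Balrog": 5, "Vega": 4},
    "Vega":    {"Balrog": 4.5, "Dhalsim": 6},
}


def mismatched_pairs(chart):
    """Return the (a, b) pairs whose two mirrored ratings do not sum to 10."""
    bad = []
    for a, row in chart.items():
        for b, rating in row.items():
            # a < b visits each unordered pair once.
            if a < b and rating + chart[b][a] != 10:
                bad.append((a, b))
    return bad


# Balrog's full row from the chart (15 opponents); its listed Total is 82.5.
balrog_row = [5, 5.5, 5.5, 6, 6, 5, 5, 6, 6, 6, 4, 6, 6.5, 6, 4]

print(sum(balrog_row))          # 82.5
print(mismatched_pairs(chart))  # []
```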

2009-Sept-21 changes:

Deejay vs Balrog
Homer Pimpson: 5-5
Tschesae: 4-6
BruceLB: 4-6
Eventhubs guy: 5-5
#Original chart rating: 6-4
##New chart rating: 4-6

Chun Li vs T. Hawk
Zass: 4-6 ## removed vote
JigglyNorris: 4.5-5.5
Eventhubs guy: 5-5
skankin garbage: 6-4 (?)
Ouroborus: "Chun’s worst matchup"
gridman: “Hawk splash still kills Chun Li”
#Original chart rating: 4-6
##New chart rating: 4.5-5.5

Fei vs Cammy
Noriega: 4-6
Aqua Snake: 4.5-5.5
Eventhubs guy: 5-5
Sirlin: “About even”
#Original chart rating: 5-5
##New chart rating: 4.5-5.5

Honda vs Chun Li
Thelo: 5-5
Basically everyone else: 4-6
#Original chart rating: 5-5
##New chart rating: 4.5-5.5

That’s what I figured, Fei tends to have some kind of disadvantage to somebody.

Let me be clear.

I am Zass.

I am not Zaspacer.

Zass and Zaspacer are two different people.

As for Zasspacer, such a one does not exist. No, he is not the product of an unholy union, whose skill in Street Fighter ends the Era of Tiers. No he is not destined to become He Who Ravages Souls, the Sun Slayer: Bringer of Götterdämmerung.

Best not to name him, in any case. Not that he exists. He doesn’t.

I got the chance to play several sets of Chun vs Fei with Aqua Snake – I definitely agree that this is largely in Fei’s favor. Unless I am missing something big, I don’t see a good defense against chicken wing lockdown. Another big factor in that fight is that CW hits her neutral jump attacks, which are one of her core defenses.

I think we went 10-0 in Fei’s favor in our set.

I would be inclined to agree with BruceLB about DeeJay/Boxer being 4-6. He has played much better Boxers than I have, so I will take his word for it.

Oh now Zass, you know I just accidentally added the extra S and I know you’re two different people, but just for that you start league play with a 3 point pool play penalty… j/k lol

Cross (Fei Long) vs. Nuki (Chun)
Round 1: Chun Li
Round 2: Chun Li
[media=youtube]A_XKg4q0jxU&feature=related[/media]

K (Fei Long) vs. Prince (Chun Li)
Round 1: Chun Li
Round 2: Fei Long
Round 3: Fei Long
[media=youtube]0HQvTx1V1Oc&feature=channel_page[/media]

Noguchi (Fei Long) vs. Kita (Chun Li)
Round 1: Fei Long
Round 2: Fei Long
[media=youtube]FMAKRQFFqdI[/media]

Okafei (Fei Long) vs. Prince (Chun Li)
Round 1: Chun Li
Round 2: Chun Li
[media=youtube]Gb35xe2mq9Q&feature=channel_page[/media]

Noguchi (Fei Long) vs. Tohjyo (Chun Li)
Round 1: Chun Li
Round 2: Chun Li
[media=youtube]AncLmb_qhNY&feature=related[/media]

http://homepage.mac.com/andrew_j_stewart/geo/chun-li.pdf

http://homepage.mac.com/andrew_j_stewart/geo/fei-long.pdf

Kwisatz Haderach?
Golden Path?

Please note that my last post was not advocating the pros/cons of an “ultimate/universal/master-of-the-universe” ranking chart.

It was simply advocating that these charts (their subjective/objective value aside) in their current formatting would benefit if done with fractions in their numbers.

It really depends on what you are trying to do.

If people want to determine the numbers for the ULTIMATE ranking chart where all characters are played with ultimate game knowledge and ultimate ability and cherubs do cartwheels around it… then someone at CalTech needs to get busy creating Super AI’s that can pursue this.

And I agree, THAT is a pipe dream.

However, there are many practical uses for a ranking chart.

Entering a team tourney?
Make a chart from your perspective for your main and hand it to your teammate.
They can see what matchups you do poorly in, and pick a character that does well in them.

Curious at what the “realistic”, ballpark top potential is with a character vs. each opponent?
Find matches where top skilled/knowledgeable/performing players competed and note their track record(s).
Only have one match? Think of it as a record of a possible outcome.
Have a series of matches? Think of them as a record of a possible outcome trend.
Have access to a chart that top players made up based on their beliefs? Take it and add it to the information you have, but take it with a grain of salt.

I have used ranking charts (after reviewing them and weighing the likely size and danger of their potential errors as I used them), and I find they can be very helpful as a component in my learning process.

Sure, you could make a ranking chart with players instead of characters just as easily.

I like it.

You could also add modifiers like extra negative/positive multiplier weighting for such things as VERY bad matchups, or relative to frequency of opposing character being played, or tier ranking of opposing character, etc.
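
As a rough illustration of the frequency-weighting idea, here is a minimal sketch that weights each matchup rating by how often the opposing character shows up. The three Chun Li ratings are copied from the chart earlier in this thread. The pick rates are made-up numbers, and the name `weighted_score` is hypothetical.

```python
def weighted_score(ratings, weights):
    """Weighted mean of matchup ratings; the weights do not need to sum to 1."""
    total_weight = sum(weights[opponent] for opponent in ratings)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(ratings[opponent] * weights[opponent] for opponent in ratings)
    return weighted_sum / total_weight


# Chun Li's ratings against three opponents, from the chart in this thread.
chun = {"Ryu": 4, "Fei Long": 5.5, "Zangief": 7}

# Hypothetical pick rates: how often each opponent is actually played.
pick_rate = {"Ryu": 0.5, "Fei Long": 0.3, "Zangief": 0.2}

plain = sum(chun.values()) / len(chun)
weighted = weighted_score(chun, pick_rate)
print(f"unweighted {plain:.2f}, weighted {weighted:.2f}")
```

With these made-up pick rates, the weighted figure drops below the plain average because the most common opponent is also the worst of the three matchups. Extra multipliers for VERY bad matchups could be folded into the same weights.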

Well you don’t need someone at Caltech… just need a half decent economist :D. And I might have misunderstood your intention. I would certainly agree that even in a qualitative measuring system, having mark posts in between set integers is helpful for investigating the minutiae of a given matchup. Of course, the quantifiable values that lead to each mark post should, I think, be rigidly defined, so that players on either the higher or lower end can understand the underlying details with a greater level of accuracy.

I also certainly agree that a ranking chart has its uses. I guess I (mis)read your argument as favoring a strict quantitative approach over a qualitative one; re-reading your posts, I’m not sure you were trying to make that point. Just trying to move things forward.

Oh man, see, now I wish I hadn’t read this because now I’m thinking about how to regress the data in a way to remove pre-test bias and come up with a reasonable formula to determine reasonable coefficients towards measuring a match-up’s chance of victory…

The funny thing about these matchups is, few of them show a very solid defense against Chicken Wings. The only one that really demonstrates a good defense against CW is the Noguchi vs Tohjyo match, but here’s the thing about those methods of D:

  • S.Hk is slow, so you really have to do it early, or even anticipate a CW. It is a good move for it though, because it also stops Rekka attempts. Still, it requires anticipation and leaves you wide open if you were wrong.

  • walk under/duck under and throw is good, but it’s not easy to get in that position; too far away and it’s impossible to get under in time, and if you’re too close, you’ll be hit by CW on the way up.

  • Chundouken is an AWESOME counter to CW…in ST. In STHD, a smart Fei player will mix in Short Chicken Wings with Forward and Roundhouse ones, so if you throw a Chundouken at the wrong time (and how can you possibly tell by the startup?), you’ll take one to the face.

  • Neutral J.Lk and Neutral J.Hp will beat CW clean, but only if it’s done from farther away; if you try to use this as anti-air from close up, the first hit of CW will go right through Chun’s attack (or, technically, it hits her from under her attack).

I’ve been working on a few things to stop CW spam from JERKS like Aqua Snake (throw him out of startup, S.Mk him out of startup), but I haven’t got it to a point where I can do it reliably yet. Basically, though, I’d say this match is 6-4 Fei, but the matches pretty much always look one-sided; if Chun can zone Fei (and I like to go for crossups on knockdown), Chun can pretty much avoid taking damage the whole fight, but if Fei gets close enough to do repeated CWs, there’s not much that Chun can do, and the round could be over.

P.S: I love you Aqua Snake

P.P.S: No homo

Yeah, it looks like it was a misunderstanding.

I was speaking specifically towards that “ultimate game knowledge” and “ultimate ability” skills that could be determined/reached/simulated through AI… though I would tone down the “ultimate ability” so that the reaction time was at least as slow as the fastest human reaction speed.

:lovin:

Kwisatz haderach and the Golden path are two separate paths that are opposite of each other.

one turns you into someone who can see the future and still be able to see things after your eyeballs melted out of their sockets from radiation.

the golden path is where you stick a bunch of sandslugs on your body and then you slowly transform into a giant worm that doesn’t like water.

this is relevant to the tier lists somehow.