ForgeHub contests suggest that maps are judged on the following qualities:

"Eligible map submissions will be judged based on:

Game-type Support: Does the map function for the game-type as necessary? Are all required objects and systems in place to successfully play a game?

Fun-Factor: Is the map fun to play on? Is the space fun to navigate through? Do the weapons, spawns, and layout promote a positive game experience? Is the map balanced and fair for the given game-type?

Performance: Does the map suffer from any negative visual or performance issues?

Originality / Creativity: Is the map interesting and unique in both design and appearance?"

PROBLEM: These categories can be interpreted differently by the judges depending on which one matters most to them, leading to maps that are arbitrarily cut for reasons specific to the preferences and opinions of each judge.

PROPOSAL: Eliminating bias is impossible, but diminishing it through a consistent and transparent assessment system can lead to a more accurate representation of contest results. Use this thread to propose your own assessment system for any contest.

Here is my system for 2v2 contests, where each category is streamlined and weighted 25%:

Originality: Layout, Theme, Design - 25%
Gameplay: Spawns, Weapons, Balance - 25%
Presentation: Lighting, Art, Aesthetics, Colors - 25%
Performance: Framerate, Boundaries - 25%

Every category uses the same scale:

5 - Strongly Agree
4 - Agree
3 - Indifferent
2 - Disagree
1 - Strongly Disagree

Here is the system in practice:

Originality: Strongly Agree, 5 = 25%
Gameplay: Agree, 4 = 20%
Presentation: Indifferent, 3 = 15%
Performance: Indifferent, 3 = 15%

Percentages are added up to give a total score from each individual judge.
So this judge has given the map 75%, or 75 out of 100. Assuming this map receives four scores of 70, 75, 80, and 80, the total would be 305, the average of which is 76 rounded down. Therefore, this particular map is weighted at 76%, or 76/100.

This system assumes a few things:

1. There are four judges. (I do not believe the number of judges negatively impacts the system.)
2. All maps that are tested are passed through this system.
3. Each judge gives the fairest and most accurate rating based on several playtests.
4. This system is used to grade maps; a map graded below another map is not necessarily ranked below it, and therefore is not eliminated by default.

TIEBREAKERS AND ELIMINATIONS: Maps that are similar in score would then be tested against one another repeatedly during elimination rounds, after which a decision is made. The difference between making this decision under this system versus under no system is that each judge's assessment of the map is clearly communicated, and the same criteria apply to each map.

If there is a flaw in this system, feel free to correct it or add onto it, but I'd also like to see other systems proposed.
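The arithmetic above can be sketched in a few lines (a minimal illustration of the proposal, assuming the 5%-per-grade weighting implied by the worked example, where a 5 in one category equals 25%):

```python
# Sketch of the proposed grading scale: four categories, each worth 25%,
# graded 1-5 (Strongly Disagree .. Strongly Agree). A grade of g in one
# category contributes g * 5 percentage points, so a 5 equals 25%.

GRADE_WEIGHT = 5  # each grade point is worth 5% of the total

def judge_score(grades):
    """grades: dict of category -> grade (1-5). Returns a 0-100 score."""
    return sum(g * GRADE_WEIGHT for g in grades.values())

def map_score(judge_scores):
    """Average the judges' scores, rounded down as in the example."""
    return sum(judge_scores) // len(judge_scores)

# Worked example from the post:
one_judge = judge_score({"Originality": 5, "Gameplay": 4,
                         "Presentation": 3, "Performance": 3})
# one_judge == 75

overall = map_score([70, 75, 80, 80])
# overall == 76  (305 / 4 = 76.25, rounded down)
```

The function names here are invented for illustration; the post itself only describes the arithmetic.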
@BodeyBode Explain to me how you arrived at your 18 and 19 numbers. You can arrive at the number 19 by adding scores of 70, 75, 80, and 80 together, taking the average, which is 76, and then dividing that by 4 again. But why would you do that? You already have your average. There's no need to lower the scale.

My proposal assumes 4 judges, 4 categories, and 5 potential grades within each category, with each category weighted at 25%. Categories are not being compared to one another; rather, the sum total of each judge's categories is compared to the other judges' totals. If my math is wrong, then please point it out to me. But your numbers lead me to believe that you're misunderstanding the system. Your post seems to operate on a 1-25 point scale, which is not what I was suggesting. This is a 100%-based grading scale where the judges' scores are averaged to give the map an assessment number, forming a more concrete comparison point between maps.

It's imperative to stress that this grading system does not automatically rank maps for elimination. It is a grading scale, not a ranking system.
I agree with the scale for the most part. One thing I think needs to be changed, though, is having Performance as its own category; it should be removed. A map should be expected to have no game-breaking out-of-map spots and no framerate issues. If it does, it needs to be optimized until it works flawlessly, which isn't too difficult; it just takes a bit more time. Having it as part of the scale will just dilute the ratings, assuming all considered maps have no technical issues.
An inherent bias is what makes a contest. Usually the judges share a majority opinion on the aspects they judge. This hasn't been the case for many contests that have hosted judges who cling to various design principles and philosophies. The system you show is, imo, "better" because it separates playability from replayability (whether the map is qualified for the contest, and whether it is fun to play), and it allows a contest to be more "fair". The goal of the contest is to let those with subjectively better talent win, but the question I have is: do the judges also play a role in the issue you described?
Your system is 4 judges, 4 categories, and 5 grades per category. For me, the number of judges and categories is irrelevant, but for simplification purposes we will use 4 categories. If each category is weighted the same at 25%, then each category can be worth a maximum of 25 points (4 categories, each worth 25 points, would make a perfect map 100 points).

Example:

Map X
Cat1 - 22 points
Cat2 - 19 points
Cat3 - 23 points
Cat4 - 24 points
88 points = 88%

Map Y
Cat1 - 21 points
Cat2 - 20 points
Cat3 - 22 points
Cat4 - 24 points
87 points = 87%

If these same maps were judged with only 5 grades per category, it would look like this:

Map X
Cat1 - 5, Strongly Agree
Cat2 - 4, Agree
Cat3 - 5, Strongly Agree
Cat4 - 5, Strongly Agree
Total score: 95%

Map Y
Cat1 - 5, Strongly Agree
Cat2 - 4, Agree
Cat3 - 5, Strongly Agree
Cat4 - 5, Strongly Agree
Total score: 95%
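The resolution loss this example illustrates can be made concrete (a sketch using the point values from the example above; the points-to-grade conversion is an assumption I've added, since the post doesn't define one):

```python
# Under a 25-points-per-category system, Map X and Map Y differ by a point:
map_x_points = [22, 19, 23, 24]   # sums to 88 -> 88%
map_y_points = [21, 20, 22, 24]   # sums to 87 -> 87%
assert sum(map_x_points) != sum(map_y_points)

def to_grade(points):
    # One possible mapping from a /25 point score to a 1-5 grade
    # (an assumption; the post does not define this conversion):
    # 21-25 -> 5, 16-20 -> 4, and so on.
    return max(1, (points + 4) // 5)

# Under the 5-grade system, both maps collapse to identical grades:
map_x_grades = [to_grade(p) for p in map_x_points]  # [5, 4, 5, 5]
map_y_grades = [to_grade(p) for p in map_y_points]  # [5, 4, 5, 5]
# Both total 5 * (5 + 4 + 5 + 5) = 95%, and the one-point gap is lost.
```

This is the trade-off at issue in the thread: the coarser scale cannot distinguish an 87 from an 88.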
In my experience, a submitted map was deemed "playable" before it was played, and the ones that were then played had either major or minor performance problems. I can see how it dilutes the rating, though, but what would you replace the category with if you remove it?
If each category is worth 25% of the total score, it's no different than 25 points. Numbers such as 19 are out of 25 possible points.
Yes, if you divide twice, which you are not supposed to do. It's not a point system. You cannot arrive at an uneven grade like 19 unless you average the scores into a percentage out of 100 and then divide that again. I do not understand what you are trying to say. Does what I proposed have a problem, or are you saying a point system would be better?
Well, how would you give a map points? What determines the difference between an 88 and an 87? Is it some sort of scale, or a perceived difference in quality by a single point? What is a point worth on a 25-based scale? Do you take one point off if there are bad spawns? Do you take 13 points off if the light in half of the rooms is too bright? Do you take 7 points off if you get framerate drops in one corner? The way you are portraying it looks as ambiguous and arbitrary as the current way of doing things.

Here's the way I sorted the Top 5 maps in the contest. I lined all of them up and gave each a ranking based on who did what the best. So let's say I have 5 maps:

Gameplay
5 - Map B
4 - Map A
3 - Map C
2 - Map E
1 - Map D

I think there were 6 of these categories, so the maximum points I could give a map was 30. At that point, it was a simple ranking and I didn't need to assess it any further. That's how you use points: they are small numbers comparing different values, not deductions based on an individual's whims.
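The rank-based approach described above can be sketched like so (the category names beyond Gameplay and the second ranking are hypothetical placeholders; the post only lists the Gameplay order):

```python
# Rank-based points: in each category, line the maps up and award points
# by rank (best of 5 maps gets 5 points, worst gets 1). With 6 categories
# the maximum total is 30, as described in the post.
from collections import defaultdict

# Per-category rankings, best first. Gameplay matches the post;
# Visuals is an invented example, and a real contest would have 6 lists.
rankings = {
    "Gameplay": ["B", "A", "C", "E", "D"],
    "Visuals":  ["A", "B", "D", "C", "E"],
}

totals = defaultdict(int)
for order in rankings.values():
    # zip the descending point values (5, 4, 3, 2, 1) with the ranking
    for points, map_name in zip(range(len(order), 0, -1), order):
        totals[map_name] += points

# With the two categories above: B gets 5 + 4 = 9, A gets 4 + 5 = 9,
# C gets 3 + 2 = 5, D gets 1 + 3 = 4, E gets 2 + 1 = 3.
```

Note that this only compares maps against each other within one wave; it doesn't produce the absolute percentage grade the OP's system does.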
This doesn't look anything like what you wrote in your OP. How would you rate maps when there are more than 5 maps to judge? I gotta head back into work, but I'll respond later.
Sigh. I used a point system to rank the top 5 maps. That is completely separate from a percentage-based assessment system used to judge 20 or 30 maps. The only similarity between the two is that I parsed my assessment through a scale, but they serve two completely different purposes.

You are suggesting a point-based assessment system to grade the first waves of tested maps. I find that inefficient, and certainly not the same as percentages, because you cannot quantify the difference between numbers without a clearly demonstrated scale. In simpler words, you cannot, or at least have not in this case, told me how a map can get a 23 or a 22 in one category. That doesn't make any sense. You have not explained why a judge can arbitrarily deduct 2 points from a category. The only way you can arrive at those uneven numbers is if you take several judges' scores in each category, add them up, and take the average of them twice, but that is done after the fact and is still possible with the agree/disagree method. That is assuming I understand what you have posted thus far. There seems to be some sort of communication barrier here.
Hey, this sounds similar to what I proposed for our most recent Ghost in the Shell Forge contest. We had trouble collectively determining why maps were being chosen, so this was going to help us out. The 4 categories are different because that is what 343 provided us with; however, I proposed a similar point system out of 5. Eventually, even ForgeHub's map rating system will support 4 categories. It provides so much more information to the author, regardless of post-game discussion or comments.
The system makes sense at certain points in judging. Early on you don't need this because you're just hacking away the bad maps, but once things get a bit closer, this comes in handy. What doesn't make sense is your categories. Having performance be worth as much as visuals or gameplay is a little odd; that should just be expected, not scored. I would use just 3 categories: Gameplay, Visuals, and Creativity. Score out of 15 total, and the highest average wins. Simple as that.
The only thing I'd add would be the concept of a "binary" in case a specific contest had strict conditions. You multiply the final % score by these binaries to let them act as "disqualifiers". For instance, the Ghost in the Shell maps might have a "Has Spider Tank" binary, and those that don't meet the criterion automagically get graded at 0%. Or an art contest for "Sierra 117"-themed maps could use it to filter out maps that aren't remotely close to the theme. Or maybe a matchmaking contest has a binary for "player containment is airtight." This would be differentiated from something like framerate performance, where you might have varying degrees of bad, from flawless to unplayable; binaries mark a map as broken regardless of its other scores. It'd be up to contest organizers to set these criteria, of course. Other than that, the system looks pretty good, and I agree with War's point that having judges split their scores over categories gives actionable feedback on certain areas without the need for writeups.
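The binary-multiplier idea could look something like this (a sketch; the criterion names are the hypothetical ones from the post, and `final_score` is an invented name):

```python
# Binary disqualifiers: each strict contest condition is a 0/1 factor
# multiplied into the final percentage, so any single failed condition
# zeroes the score no matter how well the map graded otherwise.

def final_score(graded_percent, binaries):
    """binaries: dict of criterion -> bool (True = condition met)."""
    score = graded_percent
    for met in binaries.values():
        score *= 1 if met else 0
    return score

# A map graded at 82% that lacks the required Spider Tank is out:
final_score(82, {"Has Spider Tank": False, "Airtight containment": True})
# -> 0

# The same map with all conditions met keeps its grade:
final_score(82, {"Has Spider Tank": True, "Airtight containment": True})
# -> 82
```

The multiplication framing means binaries compose cleanly with the percentage grade without changing how the grade itself is computed.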
I'd rate maps like this:

Visual Performance - /2 (0 = back to Forge [removed from judging until fixed], 1 = very minor problems, 2 = flawless)
Gameplay - /20 (0-10 = removed from judging until improved)
Fun-Factor - /10
Creativity (Layout) - /10
Creativity (Art) - /8
Total = 50

I'd say this is the best system of judging. Creativity is very important, but it can negatively impact gameplay. A great-playing map is fairly guaranteed points, but can it edge out maps that play well and are also creative?

Edit: This is for fairly open and relaxed rules.
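The gated 50-point scale above could be expressed as follows (a sketch; the category maxima and gate thresholds are the ones from the post, while the function name and example scores are invented):

```python
# 50-point scale with "remove from judging" gates: a Visual Performance
# score of 0, or a Gameplay score of 10 or below, pulls the map from
# judging until it is fixed instead of merely lowering its total.

MAXES = {"Visual Performance": 2, "Gameplay": 20,
         "Fun-Factor": 10, "Creativity (Layout)": 10, "Creativity (Art)": 8}

def rate(scores):
    """scores: dict matching MAXES. Returns a total out of 50,
    or None if the map is gated out of judging."""
    assert all(0 <= scores[c] <= m for c, m in MAXES.items())
    if scores["Visual Performance"] == 0 or scores["Gameplay"] <= 10:
        return None  # back to Forge / removed until improved
    return sum(scores.values())

rate({"Visual Performance": 2, "Gameplay": 17, "Fun-Factor": 8,
      "Creativity (Layout)": 9, "Creativity (Art)": 6})
# -> 42 out of 50
```

The gates make the "expected baseline" argument from earlier in the thread explicit: technical soundness is a precondition, not a scored category competing with creativity.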
Here is my system:

How good is the head the forger gives? x/5
How much does said forger suck up to the judges? x/5
Are they a good kisser? x/10 (This is crucial.)
Did the forger spend enough time stroking my ego? x/80

It is the perfect 100-point system.

On a serious note: giving a map an empirical rating can be tricky, because people's opinions vary. I think you could grade it out with what people have presented here, but I would then test the maps in a focus-group-type environment with non-forgers who are good at the game and those who play it more casually. Then you can adjust your list based on feedback. This is more so for matchmaking than anything, though.