Discussion on Contest Judging

Discussion in 'Halo and Forge Discussion' started by Goat, Mar 26, 2017.

  1. Goat

    Goat Rock Paper Scissors Scrap
    Forge Critic Senior Member

    Messages:
    4,570
    Likes Received:
    14,945
    ForgeHub contests suggest that maps are judged on the following qualities:

    Eligible map submissions will be judged based on:


    "Game-type Support
    Does the map function for the game-type as necessary? Are all required objects and systems in place to successfully play a game?

    Fun-Factor
    Is the map fun to play on? Is the space fun to navigate through? Do the weapons, spawns, and layout promote a positive game experience? Is the map balanced and fair for the given game-type?

    Performance

    Does the map suffer from any negative visual or performance issues?

    Originality / Creativity

    Is the map interesting and unique in both design and appearance?"

    PROBLEM: These categories can be interpreted differently by the judges depending on which one matters most to them, leading to maps that are arbitrarily cut for reasons specific to the preferences and opinions of each judge.

    PROPOSAL: Eliminating bias is impossible, but diminishing it through a consistent and transparent assessment system can lead to more accurate representation of contest results.

    Use this thread to propose your own assessment system for any contest. Here is my system for 2v2 contests, where each category is streamlined and weighted 25%:

    Originality: Layout, Theme, Design - 25%
    5 - Strongly Agree
    4 - Agree
    3 - Indifferent
    2 - Disagree
    1 - Strongly Disagree

    Gameplay: Spawns, Weapons, Balance - 25%

    5 - Strongly Agree
    4 - Agree
    3 - Indifferent
    2 - Disagree
    1 - Strongly Disagree

    Presentation: Lighting, Art, Aesthetics, Colors - 25%
    5 - Strongly Agree
    4 - Agree
    3 - Indifferent
    2 - Disagree
    1 - Strongly Disagree

    Performance: Framerate, Boundaries - 25%
    5 - Strongly Agree
    4 - Agree
    3 - Indifferent
    2 - Disagree
    1 - Strongly Disagree

    Here is the system in practice:

    Originality: Strongly Agree, 5 = 25%
    Gameplay: Agree, 4 = 20%
    Presentation: Indifferent, 3 = 15%
    Performance: Indifferent, 3 = 15%

    Percentages are added up and then given a total score for each individual judge. So this judge has given the map 75%, or 75 out of 100.

    Assuming this map receives four scores of 70, 75, 80 and 80, the total would be 305, which averages to 76.25, or 76 rounded down. Therefore, this particular map is weighted at 76%, or 76/100.
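
    To make the arithmetic above concrete, here is a minimal Python sketch of the two steps (one judge's score, then the average across judges). The category names and the 1-5 scale come from the post; the function names judge_score and map_grade are made up for illustration, and rounding down is assumed from the 76 example rather than from any stated rule.

```python
from math import floor

# Categories and the 1-5 agree/disagree scale are taken from the post.
# Everything else (names, structure) is a hypothetical sketch.
CATEGORIES = ("Originality", "Gameplay", "Presentation", "Performance")
MAX_GRADE = 5


def judge_score(grades):
    """Turn one judge's 1-5 grades into a score out of 100.

    Each category is weighted equally, so with four categories
    a grade of 5 is worth 25 and a grade of 3 is worth 15.
    """
    total = sum(grades[c] for c in CATEGORIES)
    return total * 100 / (MAX_GRADE * len(CATEGORIES))


def map_grade(judge_scores):
    """Average the judges' scores and round down (assumed from the example)."""
    return floor(sum(judge_scores) / len(judge_scores))


example = {"Originality": 5, "Gameplay": 4, "Presentation": 3, "Performance": 3}
print(judge_score(example))         # one judge's score for the map
print(map_grade([70, 75, 80, 80]))  # the map's grade across four judges
```

    Nothing in this sketch depends on there being exactly four judges; map_grade takes however many scores it is given.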


    This system assumes a few things:
    1. There are four judges. (I do not believe the number of judges negatively impacts the system.)
    2. Maps that are tested are all passed through this system
    3. Each judge gives the fairest and most accurate rating based on several playtests
    4. This system is used to grade maps, not to rank them. A map that is graded below another map is not necessarily ranked below it, and is therefore not eliminated by default.

    TIEBREAKERS AND ELIMINATIONS:

    Maps that are similar in score would then be tested against one another repeatedly during elimination rounds, after which a decision is made. The difference between this decision under this system versus this decision under no system is that each judge's assessment of the map is clearly communicated, and the same criteria apply to each map.

    If there is a flaw in this system, feel free to correct or add onto it, but I'd also like to see other systems proposed.


     
    #1 Goat, Mar 26, 2017
    Last edited: Mar 26, 2017
  2. Goat

    Goat Rock Paper Scissors Scrap
    Forge Critic Senior Member

    Messages:
    4,570
    Likes Received:
    14,945
    @BodeyBode Explain to me how you arrived at your 18 and 19 number.

    You can arrive at the number 19 by adding a score of 70, 75, 80 and 80 together, taking the average which is 76, and then dividing that by 4 again. But why would you do that? You already have your average. There's no need to lower the scale.

    My proposal assumes 4 judges, 4 categories and 5 potential grades within that category, each weighted at 25%. Categories are not being compared to one another; rather, the sum total of each category is compared to the other judges' totals. If my math is wrong, then please point it out to me. But your numbers lead me to believe that you're misunderstanding the system. Your post seems to operate on a 1-25 scale system, which is not what I was suggesting. This is a 100% based grading scale where the judges' scores are averaged to give the map an assessment number to form a more concrete comparison point between maps.
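
    For anyone following the numbers in this exchange, a small Python sketch of how a figure like 19 can fall out of the example scores. The helper names are hypothetical; the scores are the ones from the first post, and the second division is the step being questioned here, not part of the proposed system.

```python
judge_totals = [70, 75, 80, 80]  # each judge's score out of 100 (from the first post)


def average_out_of_100(totals):
    """Average of the judges' totals, rounded down as in the first post."""
    return sum(totals) // len(totals)


def per_category_equivalent(score_out_of_100, categories=4):
    """Dividing again by the number of categories gives a value on a 25-point scale."""
    return score_out_of_100 / categories


average = average_out_of_100(judge_totals)
print(average)                           # the map's grade on the 100 scale
print(per_category_equivalent(average))  # the same grade expressed out of 25
```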

    It's imperative to stress that this grading system does not automatically rank maps for elimination. It is a grading scale, not a ranking system.
     
    #2 Goat, Mar 26, 2017
    Last edited: Mar 26, 2017
  3. LargerFiend

    LargerFiend Legendary
    Senior Member

    Messages:
    338
    Likes Received:
    1,297
    I agree with the scale for the most part. One thing I think needs to be changed, though, is having Performance removed as its own category. I think a map should be expected to not have any game-breaking out-of-map spots and no framerate issues. If it does, it needs to be optimized until it works flawlessly, which isn’t too difficult; it just takes a bit more time.


    Having it as a part of the scale will just dilute the ratings, assuming all considered maps have no technical issues.
     
    K a n t a l o p e likes this.
  4. Agent Zero85

    Agent Zero85 Legendary
    Wiki Contributor Senior Member

    Messages:
    400
    Likes Received:
    435
    An inherent bias is what makes a contest. Usually the judges share a majority opinion on the aspects on which they judge. This hasn't been the case for many contests that have hosted judges who claim/cling to various design principles and philosophies. The system you show is imo "better" because it separates playability/replayability (whether the map is qualified for the contest, and whether it is fun to play) and allows for a contest to be more "fair".

    The goal of the contest is to allow those with a subjectively better talent to win, but the question I have is: Do judges also play a role in the issue you described?
     
  5. BodeyBode

    BodeyBode Ancient

    Messages:
    342
    Likes Received:
    557
    Your system is 4 judges, 4 categories, 5 grades per category.

    For me, the number of judges and categories is irrelevant, but for simplification purposes we will use 4 categories.

    If each of the categories is weighted the same at 25%, then each category can be worth a maximum of 25 points (4 categories, each worth 25 points, would make a perfect map 100 points).


    Example:

    Map X
    Cat1- 22 points
    Cat2- 19 points
    Cat3- 23 points
    Cat4- 24 points
    88 points = 88%

    Map Y
    Cat1- 21 points
    Cat2- 20 points
    Cat3- 22 points
    Cat4- 24 points
    87 points = 87%

    If these same maps were judged with only 5 grades per category, it would look like this:

    Map X
    Cat1- 5 strongly agree
    Cat2- 4 agree
    Cat3- 5 strongly agree
    Cat4- 5 strongly agree
    Total score 95%

    Map Y
    Cat1- 5 strongly agree
    Cat2- 4 agree
    Cat3- 5 strongly agree
    Cat4- 5 strongly agree
    Total score 95%
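
    The comparison above can be reproduced with a short Python sketch. The per-category points are the ones listed in this post. The post does not say how a 25-point score maps onto a 1-5 grade, so the sketch ASSUMES each block of 5 points rounds up to the next grade, which happens to reproduce the grades shown; the function names are hypothetical.

```python
map_x = [22, 19, 23, 24]  # points per category, from the post
map_y = [21, 20, 22, 24]


def points_total(points):
    """Total on the 25-points-per-category scale (out of 100 for 4 categories)."""
    return sum(points)


def to_grade(points):
    """ASSUMPTION: bucket a 0-25 score into a 1-5 grade by rounding up per 5 points."""
    return max(1, (points + 4) // 5)


def graded_total(points):
    """Total as a percentage when only the 1-5 grade per category is recorded."""
    grades = [to_grade(p) for p in points]
    return sum(grades) * 100 // (5 * len(grades))


for name, pts in (("Map X", map_x), ("Map Y", map_y)):
    print(name, points_total(pts), graded_total(pts))
```

    Under that assumed mapping, the two maps stay one point apart on the point scale but land on the same percentage on the five-grade scale, which is the loss of resolution this example is pointing at.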
     
  6. Goat

    Goat Rock Paper Scissors Scrap
    Forge Critic Senior Member

    Messages:
    4,570
    Likes Received:
    14,945
    @BodeyBode Your first example doesn't make any sense. Where are you getting uneven numbers from?
     
    #6 Goat, Mar 26, 2017
    Last edited: Mar 26, 2017
  7. Goat

    Goat Rock Paper Scissors Scrap
    Forge Critic Senior Member

    Messages:
    4,570
    Likes Received:
    14,945
    In my experience, a submitted map was deemed "playable" before it was played, and the ones that were then played either had major or minor performance problems. I can see how it dilutes the rating though, but what would you replace the category with if you remove it?
     
  8. BodeyBode

    BodeyBode Ancient

    Messages:
    342
    Likes Received:
    557
    If each category is worth 25% of the total score, it's no different than 25 points. The numbers such as 19 are out of 25 possible points.
     
  9. Goat

    Goat Rock Paper Scissors Scrap
    Forge Critic Senior Member

    Messages:
    4,570
    Likes Received:
    14,945
    Yes, if you divide twice, which you are not supposed to do. It's not a point system. You cannot directly award a map an uneven grade; you only get one after you average the scores into a percentage out of 100 and then divide that again.

    I do not understand what you are trying to say. Does what I proposed have a problem, or are you saying a point system would be better?
     
    #9 Goat, Mar 26, 2017
    Last edited: Mar 26, 2017
  10. BodeyBode

    BodeyBode Ancient

    Messages:
    342
    Likes Received:
    557
    Dividing? I'm confused now.

    Yes, points would be a better system in my opinion
     
  11. a Chunk

    a Chunk Blockout Artist
    Forge Critic Wiki Contributor Senior Member

    Messages:
    2,670
    Likes Received:
    7,152
  12. Goat

    Goat Rock Paper Scissors Scrap
    Forge Critic Senior Member

    Messages:
    4,570
    Likes Received:
    14,945
    Well how would you give a map points? What determines the difference between an 88 and an 87? Is it some sort of scale or a perceived difference in quality by a single point?

    What is a point worth in a 25-based scale? Do you take one point off if there are bad spawns? Do you take 13 points off if the light in half of the rooms is too bright? Do you take 7 points off if you get frame drops in one corner? The way you are portraying it looks as ambiguous and arbitrary as the current way of doing it.

    Here's the way I sorted the Top 5 maps in the contest. I lined all of them up and gave them all a ranking based on who did what the best. So let's say I have 5 maps:

    Gameplay
    5 - Map B
    4 - Map A
    3 - Map C
    2 - Map E
    1 - Map D

    I think there were 6 of these categories, so the max points I could give a map was 30. At that point, it was a simple ranking and I didn't need to assess it any further.

    That's how you use points. They are small numbers comparing different values, not deducting based on an individual's whims.
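
    As a rough Python sketch of this kind of ranking: each category orders the maps from best to worst, the best gets 5 points and the worst gets 1, and the points are summed per map. Only the Gameplay ordering comes from the post; the second category and its ordering are HYPOTHETICAL, added just so there is something to sum, and the function names are made up.

```python
def rank_points(ordering):
    """Give the best map len(ordering) points, down to 1 for the worst."""
    n = len(ordering)
    return {name: n - position for position, name in enumerate(ordering)}


def total_points(rankings):
    """Sum each map's rank points across all categories."""
    totals = {}
    for ordering in rankings.values():
        for name, points in rank_points(ordering).items():
            totals[name] = totals.get(name, 0) + points
    return totals


rankings = {
    # Ordering from the post, best to worst.
    "Gameplay": ["Map B", "Map A", "Map C", "Map E", "Map D"],
    # HYPOTHETICAL second category, not from the post.
    "Visuals": ["Map A", "Map C", "Map B", "Map D", "Map E"],
}

for name, points in sorted(total_points(rankings).items(), key=lambda kv: -kv[1]):
    print(name, points)
```

    With six categories and five maps, the maximum any map could reach this way is 6 x 5 = 30, matching the figure mentioned above.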
     
    #12 Goat, Mar 26, 2017
    Last edited: Mar 26, 2017
  13. BodeyBode

    BodeyBode Ancient

    Messages:
    342
    Likes Received:
    557
    This doesn't look anything like what you wrote in your OP. How would you rate maps when there are more than 5 maps to judge?

    I gotta head back into work but I'll respond later.
     
  14. Goat

    Goat Rock Paper Scissors Scrap
    Forge Critic Senior Member

    Messages:
    4,570
    Likes Received:
    14,945
    Sigh

    I used a point system to rank the top 5 maps. That is completely separate from a percentage-based assessment system to judge 20 or 30 maps. The only similarity between the two is that I parsed my assessment through a scale. But they serve two completely different purposes.

    You are suggesting a point-based assessment system to grade the first waves of tested maps. I find that inefficient, and certainly not the same as percentages, because you cannot quantify the difference between numbers without a clearly demonstrated scale. In simpler words, you cannot - or at least have not in this case - told me how a map can get a 23 or a 22 in one category. That doesn't make any sense. You have not explained why a judge can arbitrarily deduct 2 points off of a category. The only way you can arrive at those uneven numbers is if you take several judges' scores in the categories, add them up, and take the average of them twice, but that is done after the fact and is still possible with the agree/disagree method.

    That is assuming I understand what you have posted thus far. There seems to be some sort of barrier for communication here.
     
    Xandrith, MultiLockOn and Box Knows like this.
  15. WAR

    WAR Cartographer
    The Creator Forge Critic

    Messages:
    1,568
    Likes Received:
    3,893
    Hey, this sounds similar to what I proposed for our most recent Ghost in the Shell Forge contest. We had trouble collectively determining why maps were being chosen, so this was going to help us out. The 4 categories are different because that is what 343 provided us with; however, I proposed a similar point system out of 5. Eventually, even ForgeHub's map rating system will support 4 categories. It provides so much more information to the author regardless of post game discussion or comments.

    gits.jpg
     
    fame28 and Goat like this.
  16. MultiLockOn

    MultiLockOn Ancient
    Forge Critic Banned Senior Member

    Messages:
    4,815
    Likes Received:
    12,124
    The system makes sense for certain points in judging. Early on you don't need this because you're just hacking away the bad maps. But once things get a bit closer this comes in handy.

    What doesn't make sense is your categories. Having performance be worth as much as visuals or gameplay is a little odd; that should just be something that's expected, not something that's scored.

    I would do just 3 categories: Gameplay, Visuals, and Creativity. Out of 15 total and highest average wins. Simple as that.
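
    A minimal Python sketch of that three-category version, assuming each category is scored 1-5 by each judge (so 15 per judge) and the map with the highest average across judges wins. The map names and scores are HYPOTHETICAL, and so are the function names.

```python
from statistics import mean

CATEGORIES = ("Gameplay", "Visuals", "Creativity")


def judge_total(scores):
    """Sum one judge's 1-5 scores across the three categories (max 15)."""
    return sum(scores[c] for c in CATEGORIES)


def winner(scores_by_map):
    """Return the map with the highest average judge total, plus all averages."""
    averages = {
        name: mean([judge_total(s) for s in judges])
        for name, judges in scores_by_map.items()
    }
    return max(averages, key=averages.get), averages


# HYPOTHETICAL scores from two judges per map.
scores_by_map = {
    "Map A": [
        {"Gameplay": 5, "Visuals": 3, "Creativity": 4},
        {"Gameplay": 4, "Visuals": 4, "Creativity": 4},
    ],
    "Map B": [
        {"Gameplay": 4, "Visuals": 5, "Creativity": 5},
        {"Gameplay": 3, "Visuals": 4, "Creativity": 4},
    ],
}

best, averages = winner(scores_by_map)
print(best, averages)
```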
     
  17. BodeyBode

    BodeyBode Ancient

    Messages:
    342
    Likes Received:
    557
    Sigh...
     
    #17 BodeyBode, Mar 26, 2017
    Last edited: Mar 26, 2017
  18. Sn1p3r C

    Sn1p3r C Halo 3 Era
    Creative Force

    Messages:
    379
    Likes Received:
    578
    Only thing I'd add would be the concept of a "binary" in case a specific contest had strict conditions. You multiply the final % score by these binaries to let them act as "disqualifiers".

    For instance, the Ghost in the Shell maps might have a "Has Spider Tank" - and those that don't meet the criteria automagically get graded at 0%. Or an art contest for "Sierra 117" themed maps could use it to filter out maps that aren't remotely close to theme. Or maybe a matchmaking contest has a binary for "player containment is airtight."

    This would be differentiated from something like framerate performance, where you might have varying degrees of bad from flawless to unplayable. The binaries instead mark a map as broken regardless of its other scores. It'd be up to contest organizers to set these criteria, of course.
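
    In code, the binary idea might look something like the Python sketch below: every pass/fail check counts as 1 or 0, and the final percentage is multiplied by each of them. The check names are hypothetical stand-ins for whatever a contest organizer would define.

```python
def apply_binaries(percent_score, binaries):
    """Multiply a percentage score by each pass/fail check (pass = 1, fail = 0).

    A single failed check drops the score to 0, so the checks act as
    disqualifiers rather than as graded categories.
    """
    result = percent_score
    for passed in binaries.values():
        result *= 1 if passed else 0
    return result


# HYPOTHETICAL checks for a themed contest.
passing_map = {"has_spider_tank": True, "player_containment_airtight": True}
failing_map = {"has_spider_tank": False, "player_containment_airtight": True}

print(apply_binaries(76, passing_map))  # score is kept
print(apply_binaries(76, failing_map))  # score is zeroed out
```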

    Other than that, the system looks pretty good, and I agree with War's point that having judges split their scores over categories gives actionable feedback on certain areas without the need for writeups.
     
    Goat likes this.
  19. K a n t a l o p e

    K a n t a l o p e Promethean
    Senior Member

    Messages:
    882
    Likes Received:
    1,474
    I'd rate maps like this:
    Visual Performance - points/2 (0=Back to Forge[remove from judging until fixed], 1=very minor problems, 2=flawless)
    Gameplay - points/20 (0-10=[remove from judging until improved])
    Fun-Factor - points/10
    Creativity(Layout) - points/10
    Creativity(Art) - points/8
    Total=50

    I'd say this is the best system of judging. Creativity is very important, but it could negatively impact gameplay. A great playing map is fairly guaranteed points, but can it edge out good, but also creative maps?

    Edit: This is for fairly open and relaxed rules
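
    A rough Python sketch of the breakdown above, for anyone who wants to try it on a spreadsheet of maps. The category maximums and the two "remove from judging" thresholds come from the list in this post; returning None for a removed map and the function name rate are choices made for this sketch only.

```python
# Maximum points per category, from the list above (they sum to 50).
MAX_POINTS = {
    "Visual Performance": 2,
    "Gameplay": 20,
    "Fun-Factor": 10,
    "Creativity (Layout)": 10,
    "Creativity (Art)": 8,
}


def rate(points):
    """Return a total out of 50, or None if the map is pulled from judging."""
    for category, value in points.items():
        if not 0 <= value <= MAX_POINTS[category]:
            raise ValueError(f"{category} must be between 0 and {MAX_POINTS[category]}")
    if points["Visual Performance"] == 0:
        return None  # back to Forge until fixed
    if points["Gameplay"] <= 10:
        return None  # removed from judging until improved
    return sum(points.values())


# HYPOTHETICAL scores for one map.
example = {
    "Visual Performance": 2,
    "Gameplay": 16,
    "Fun-Factor": 8,
    "Creativity (Layout)": 7,
    "Creativity (Art)": 5,
}
print(rate(example))
```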
     
  20. purely fat

    purely fat The Fattest Forger
    Forge Critic Senior Member

    Messages:
    3,010
    Likes Received:
    5,899
    Here is my system:
    How good is the head the forger gives? x/5
    How much does said forger suck up to judges? x/5
    Are they a good kisser? x/10(This is crucial.)
    Did the forger spend enough time stroking my ego? x/80

    It is the perfect 100 point system.

    On a serious note, giving a map an empirical rating can be tricky because people's opinions vary. I think you could grade it out with what people have presented here. But I would then test the maps in a focus group type environment with non-forgers that are good at the game and those that play it more casually. Then you can adjust your list based on feedback. This is more so for matchmaking than anything though.
     
    fame28, a Chunk, Dunco and 1 other person like this.
