ForgeHub contests suggest that maps are judged on the following qualities:

"Eligible map submissions will be judged based on:

Game-type Support: Does the map function for the game-type as necessary? Are all required objects and systems in place to successfully play a game?

Fun-Factor: Is the map fun to play on? Is the space fun to navigate through? Do the weapons, spawns, and layout promote a positive game experience? Is the map balanced and fair for the given game-type?

Performance: Does the map suffer from any negative visual or performance issues?

Originality / Creativity: Is the map interesting and unique in both design and appearance?"

PROBLEM: These categories can be interpreted differently by the judges depending on which one matters most to them, leading to maps that are arbitrarily cut for reasons specific to the preferences and opinions of each judge.

PROPOSAL: Eliminating bias is impossible, but diminishing it through a consistent and transparent assessment system can lead to a more accurate representation of contest results. Use this thread to propose your own assessment system for any contest.

Here is my system for 2v2 contests, where each category is streamlined and weighted 25%:

Originality: Layout, Theme, Design - 25%
Gameplay: Spawns, Weapons, Balance - 25%
Presentation: Lighting, Art, Aesthetics, Colors - 25%
Performance: Framerate, Boundaries - 25%

Every category uses the same scale:

5 - Strongly Agree
4 - Agree
3 - Indifferent
2 - Disagree
1 - Strongly Disagree

Here is the system in practice:

Originality: Strongly Agree, 5 = 25%
Gameplay: Agree, 4 = 20%
Presentation: Indifferent, 3 = 15%
Performance: Indifferent, 3 = 15%

Percentages are added up to give a total score from each individual judge.
So this judge has given the map 75%, or 75 out of 100. Assuming this map receives four scores of 70, 75, 80, and 80, the total would be 305, the average of which is 76 rounded down. Therefore, this particular map is weighted at 76%, or 76/100.

This system assumes a few things:

1. There are four judges. (I do not believe the number of judges negatively impacts the system.)
2. All maps that are tested are passed through this system.
3. Each judge gives the fairest and most accurate rating based on several playtests.
4. This system is used to grade maps; a map graded below another map is not necessarily ranked below it, and therefore is not eliminated by default.

TIEBREAKERS AND ELIMINATIONS: Maps that are similar in score would then be tested against one another repeatedly during elimination rounds, after which a decision is made. The difference between making this decision under this system versus under no system is that each judge's assessment of the map is clearly communicated, and the same criteria apply to each map.

If there is a flaw in this system, feel free to correct it or add onto it, but I'd also like to see other systems proposed.
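The arithmetic above can be sketched in a few lines (a minimal illustration of the proposal, assuming the 5%-per-grade weighting implied by the worked example, where a 5 in one category equals 25%):

```python
# Sketch of the proposed grading scale: four categories, each worth 25%,
# graded 1-5 (Strongly Disagree .. Strongly Agree). A grade of g in one
# category contributes g * 5 percentage points, so a 5 equals 25%.

GRADE_WEIGHT = 5  # each grade point is worth 5% of the total

def judge_score(grades):
    """grades: dict of category -> grade (1-5). Returns a 0-100 score."""
    return sum(g * GRADE_WEIGHT for g in grades.values())

def map_score(judge_scores):
    """Average the judges' scores, rounded down as in the example."""
    return sum(judge_scores) // len(judge_scores)

# Worked example from the post:
one_judge = judge_score({"Originality": 5, "Gameplay": 4,
                         "Presentation": 3, "Performance": 3})
# one_judge == 75

overall = map_score([70, 75, 80, 80])
# overall == 76  (305 / 4 = 76.25, rounded down)
```

The function names here are invented for illustration; the post itself only describes the arithmetic.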
@BodeyBode Explain to me how you arrived at your 18 and 19 numbers. You can arrive at the number 19 by adding scores of 70, 75, 80, and 80 together, taking the average, which is 76, and then dividing that by 4 again. But why would you do that? You already have your average. There's no need to lower the scale.

My proposal assumes 4 judges, 4 categories, and 5 potential grades within each category, with each category weighted at 25%. Categories are not being compared to one another; rather, the sum total of each judge's categories is compared to the other judges' totals. If my math is wrong, then please point it out to me. But your numbers lead me to believe that you're misunderstanding the system. Your post seems to operate on a 1-25 point scale, which is not what I was suggesting. This is a 100%-based grading scale where the judges' scores are averaged to give the map an assessment number, forming a more concrete comparison point between maps.

It's imperative to stress that this grading system does not automatically rank maps for elimination. It is a grading scale, not a ranking system.
I agree with the scale for the most part. One thing I think needs to be changed, though, is having Performance as its own category; it should be removed. A map should be expected to have no game-breaking out-of-map spots and no framerate issues. If it does, it needs to be optimized until it works flawlessly, which isn't too difficult; it just takes a bit more time. Having it as part of the scale will just dilute the ratings, assuming all considered maps have no technical issues.
An inherent bias is what makes a contest. Usually the judges share a majority opinion on the aspects they judge. This hasn't been the case for many contests that have hosted judges who cling to various design principles and philosophies. The system you show is, imo, "better" because it separates playability from replayability (whether the map is qualified for the contest, and whether it is fun to play), and it allows a contest to be more "fair". The goal of the contest is to let those with subjectively better talent win, but the question I have is: do the judges also play a role in the issue you described?
Your system is 4 judges, 4 categories, and 5 grades per category. For me, the number of judges and categories is irrelevant, but for simplification purposes we will use 4 categories. If each category is weighted the same at 25%, then each category can be worth a maximum of 25 points (4 categories, each worth 25 points, would make a perfect map 100 points).

Example:

Map X
Cat1 - 22 points
Cat2 - 19 points
Cat3 - 23 points
Cat4 - 24 points
88 points = 88%

Map Y
Cat1 - 21 points
Cat2 - 20 points
Cat3 - 22 points
Cat4 - 24 points
87 points = 87%

If these same maps were judged with only 5 grades per category, it would look like this:

Map X
Cat1 - 5, Strongly Agree
Cat2 - 4, Agree
Cat3 - 5, Strongly Agree
Cat4 - 5, Strongly Agree
Total score: 95%

Map Y
Cat1 - 5, Strongly Agree
Cat2 - 4, Agree
Cat3 - 5, Strongly Agree
Cat4 - 5, Strongly Agree
Total score: 95%
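The resolution loss this example illustrates can be made concrete (a sketch using the point values from the example above; the points-to-grade conversion is an assumption I've added, since the post doesn't define one):

```python
# Under a 25-points-per-category system, Map X and Map Y differ by a point:
map_x_points = [22, 19, 23, 24]   # sums to 88 -> 88%
map_y_points = [21, 20, 22, 24]   # sums to 87 -> 87%
assert sum(map_x_points) != sum(map_y_points)

def to_grade(points):
    # One possible mapping from a /25 point score to a 1-5 grade
    # (an assumption; the post does not define this conversion):
    # 21-25 -> 5, 16-20 -> 4, and so on.
    return max(1, (points + 4) // 5)

# Under the 5-grade system, both maps collapse to identical grades:
map_x_grades = [to_grade(p) for p in map_x_points]  # [5, 4, 5, 5]
map_y_grades = [to_grade(p) for p in map_y_points]  # [5, 4, 5, 5]
# Both total 5 * (5 + 4 + 5 + 5) = 95%, and the one-point gap is lost.
```

This is the trade-off at issue in the thread: the coarser scale cannot distinguish an 87 from an 88.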
In my experience, a submitted map was deemed "playable" before it was played, and the ones that were then played had either major or minor performance problems. I can see how it dilutes the rating, though, but what would you replace the category with if you remove it?
If each category is worth 25% of the total score, it's no different than 25 points. Numbers such as 19 are out of 25 possible points.
Yes, if you divide twice, which you are not supposed to do. It's not a point system. You cannot arrive at an uneven grade like 19 unless you average the scores into a percentage out of 100 and then divide that again. I do not understand what you are trying to say. Does what I proposed have a problem, or are you saying a point system would be better?
Well, how would you give a map points? What determines the difference between an 88 and an 87? Is it some sort of scale, or a perceived difference in quality by a single point? What is a point worth on a 25-based scale? Do you take one point off if there are bad spawns? Do you take 13 points off if the light in half of the rooms is too bright? Do you take 7 points off if you get framerate drops in one corner? The way you are portraying it looks as ambiguous and arbitrary as the current way of doing things.

Here's the way I sorted the Top 5 maps in the contest. I lined all of them up and gave each a ranking based on who did what the best. So let's say I have 5 maps:

Gameplay
5 - Map B
4 - Map A
3 - Map C
2 - Map E
1 - Map D

I think there were 6 of these categories, so the maximum points I could give a map was 30. At that point, it was a simple ranking and I didn't need to assess it any further. That's how you use points: they are small numbers comparing different values, not deductions based on an individual's whims.
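The rank-based approach described above can be sketched like so (the category names beyond Gameplay and the second ranking are hypothetical placeholders; the post only lists the Gameplay order):

```python
# Rank-based points: in each category, line the maps up and award points
# by rank (best of 5 maps gets 5 points, worst gets 1). With 6 categories
# the maximum total is 30, as described in the post.
from collections import defaultdict

# Per-category rankings, best first. Gameplay matches the post;
# Visuals is an invented example, and a real contest would have 6 lists.
rankings = {
    "Gameplay": ["B", "A", "C", "E", "D"],
    "Visuals":  ["A", "B", "D", "C", "E"],
}

totals = defaultdict(int)
for order in rankings.values():
    # zip the descending point values (5, 4, 3, 2, 1) with the ranking
    for points, map_name in zip(range(len(order), 0, -1), order):
        totals[map_name] += points

# With the two categories above: B gets 5 + 4 = 9, A gets 4 + 5 = 9,
# C gets 3 + 2 = 5, D gets 1 + 3 = 4, E gets 2 + 1 = 3.
```

Note that this only compares maps against each other within one wave; it doesn't produce the absolute percentage grade the OP's system does.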
This doesn't look anything like what you wrote in your OP. How would you rate maps when there are more than 5 maps to judge? I gotta head back into work, but I'll respond later.
Sigh. I used a point system to rank the top 5 maps. That is completely separate from a percentage-based assessment system used to judge 20 or 30 maps. The only similarity between the two is that I parsed my assessment through a scale, but they serve two completely different purposes.

You are suggesting a point-based assessment system to grade the first waves of tested maps. I find that inefficient, and certainly not the same as percentages, because you cannot quantify the difference between numbers without a clearly demonstrated scale. In simpler words, you cannot, or at least have not in this case, told me how a map can get a 23 or a 22 in one category. That doesn't make any sense. You have not explained why a judge can arbitrarily deduct 2 points from a category. The only way you can arrive at those uneven numbers is if you take several judges' scores in each category, add them up, and take the average of them twice, but that is done after the fact and is still possible with the agree/disagree method. That is assuming I understand what you have posted thus far. There seems to be some sort of communication barrier here.
Hey, this sounds similar to what I proposed for our most recent Ghost in the Shell Forge contest. We had trouble collectively determining why maps were being chosen, so this was going to help us out. The 4 categories are different because that is what 343 provided us with; however, I proposed a similar point system out of 5. Eventually, even ForgeHub's map rating system will support 4 categories. It provides so much more information to the author, regardless of post-game discussion or comments.
The system makes sense at certain points in judging. Early on you don't need this because you're just hacking away the bad maps, but once things get a bit closer, this comes in handy. What doesn't make sense is your categories. Having performance be worth as much as visuals or gameplay is a little odd; that should just be expected, not scored. I would use just 3 categories: Gameplay, Visuals, and Creativity. Score out of 15 total, and the highest average wins. Simple as that.
The only thing I'd add would be the concept of a "binary" in case a specific contest had strict conditions. You multiply the final % score by these binaries to let them act as "disqualifiers". For instance, the Ghost in the Shell maps might have a "Has Spider Tank" binary, and those that don't meet the criterion automagically get graded at 0%. Or an art contest for "Sierra 117"-themed maps could use it to filter out maps that aren't remotely close to the theme. Or maybe a matchmaking contest has a binary for "player containment is airtight." This would be differentiated from something like framerate performance, where you might have varying degrees of bad, from flawless to unplayable; binaries mark a map as broken regardless of its other scores. It'd be up to contest organizers to set these criteria, of course. Other than that, the system looks pretty good, and I agree with War's point that having judges split their scores over categories gives actionable feedback on certain areas without the need for writeups.
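The binary-multiplier idea could look something like this (a sketch; the criterion names are the hypothetical ones from the post, and `final_score` is an invented name):

```python
# Binary disqualifiers: each strict contest condition is a 0/1 factor
# multiplied into the final percentage, so any single failed condition
# zeroes the score no matter how well the map graded otherwise.

def final_score(graded_percent, binaries):
    """binaries: dict of criterion -> bool (True = condition met)."""
    score = graded_percent
    for met in binaries.values():
        score *= 1 if met else 0
    return score

# A map graded at 82% that lacks the required Spider Tank is out:
final_score(82, {"Has Spider Tank": False, "Airtight containment": True})
# -> 0

# The same map with all conditions met keeps its grade:
final_score(82, {"Has Spider Tank": True, "Airtight containment": True})
# -> 82
```

The multiplication framing means binaries compose cleanly with the percentage grade without changing how the grade itself is computed.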
I'd rate maps like this:

Visual Performance - /2 (0 = back to Forge [removed from judging until fixed], 1 = very minor problems, 2 = flawless)
Gameplay - /20 (0-10 = removed from judging until improved)
Fun-Factor - /10
Creativity (Layout) - /10
Creativity (Art) - /8
Total = 50

I'd say this is the best system of judging. Creativity is very important, but it can negatively impact gameplay. A great-playing map is fairly guaranteed points, but can it edge out maps that play well and are also creative?

Edit: This is for fairly open and relaxed rules.
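The gated 50-point scale above could be expressed as follows (a sketch; the category maxima and gate thresholds are the ones from the post, while the function name and example scores are invented):

```python
# 50-point scale with "remove from judging" gates: a Visual Performance
# score of 0, or a Gameplay score of 10 or below, pulls the map from
# judging until it is fixed instead of merely lowering its total.

MAXES = {"Visual Performance": 2, "Gameplay": 20,
         "Fun-Factor": 10, "Creativity (Layout)": 10, "Creativity (Art)": 8}

def rate(scores):
    """scores: dict matching MAXES. Returns a total out of 50,
    or None if the map is gated out of judging."""
    assert all(0 <= scores[c] <= m for c, m in MAXES.items())
    if scores["Visual Performance"] == 0 or scores["Gameplay"] <= 10:
        return None  # back to Forge / removed until improved
    return sum(scores.values())

rate({"Visual Performance": 2, "Gameplay": 17, "Fun-Factor": 8,
      "Creativity (Layout)": 9, "Creativity (Art)": 6})
# -> 42 out of 50
```

The gates make the "expected baseline" argument from earlier in the thread explicit: technical soundness is a precondition, not a scored category competing with creativity.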
Here is my system:

How good is the head the forger gives? x/5
How much does said forger suck up to the judges? x/5
Are they a good kisser? x/10 (This is crucial.)
Did the forger spend enough time stroking my ego? x/80

It is the perfect 100-point system.

On a serious note: giving a map an empirical rating can be tricky, because people's opinions vary. I think you could grade it out with what people have presented here, but I would then test the maps in a focus-group-type environment with non-forgers who are good at the game and those who play it more casually. Then you can adjust your list based on feedback. This is more so for matchmaking than anything, though.