WBC Scoresheets – a few thoughts

I am aware there is some potential for me to seem like an arrogant so-and-so in this post, but it really is just about having a bit of a discussion.

It is no surprise that I am a big fan of barista competitions, but having recently gone through the UK judges workshop there are a couple of things I would like to post about and get some discussion going on.  First off an issue that both Anette and I find very frustrating:

The Scale of Words

For those unfamiliar with them, these are the words used to describe each point on the 0-6 scale:

0 – Unacceptable
1 – Acceptable
2 – Average
3 – Good
4 – Very Good
5 – Excellent
6 – Extraordinary

Let’s start with 0 & 1.  I can see why they chose “Unacceptable” for 0 – if a judge is giving you no points whatsoever you must have done something pretty wrong.  However, I think using “Acceptable” creates an issue in the mind of the judge.  The drink might be very bad, but could certainly be worse.  Judges will often revert to the words – is this drink acceptable?  It may not be, but surely a single point out of six is punishment enough?

I guess it comes down to the difference in how numbers are perceived by judges and by competitors.  A score of 3 and below does not feel good.  Despite the words, a 3 feels mediocre.  However, a judge will often hold back from giving a 3 asking themselves if “good” is really the word to describe the drink.

Steps of 0.5 are allowed between 1 and 6, but these don’t come with words.  What is halfway between “good” and “very good”?  It is a question that needs to be answered as you see a lot of 3.5s awarded.  “Really quite good” perhaps?

Using “Average” to describe 2 is also a bit depressing.  I would have thought average would have been in the middle – so a 3?  Are we saying that we expect the average competing barista to only score 2 in the 6 point boxes?

Choosing the language is obviously very difficult.  It would be hard to replace “Acceptable” with a word that wasn’t more damning.  I would argue that as a barista I would be happier with a numerical score, and then written feedback alongside it indicating both the problem and a possible solution.  (i.e. scoring 2.5 for tactile balance on an espresso, with a note saying “the shot was lacking in body, likely due to fast brew/underextraction.  Perhaps a slower brew would improve the body”)

I know a lot of people like and use the words, believing them to be an important frame of reference.  I’d be very interested to hear people’s suggestions for alternative words in the comments.  Would people like to get rid of the words?  Do they think they are fine as they are?

The Scoresheets

It seems churlish to complain about something and not at least offer some sort of solution.  The layout of the scoresheets hasn’t really changed in 7 or 8 years.  Rules have come and gone but the layout has been pretty rigid.  I took the Sensory Scoresheet and moved a few things around, changed a couple of words but it is designed to be used with the current rules.

You can view it here.

The changes are based on how I use a scoresheet, so perhaps it says something about my judging!

First off – intros have changed a lot.  Competitors often deliver a lot of information in the first 90 seconds, including details about the coffee(s) they are using.  I wanted a dedicated space where I could take notes.  Previously I had used the espresso section, but it quickly becomes crowded, especially if you want to write detailed feedback on the taste of the drink.

Secondly – the boxes switched sides.  I wanted more space and a stronger emphasis on notes.  Leaving a wide open space to the right makes it even more explicit that judges should be filling this up completely with lots of useful notes.  Returning a scoresheet without detailed notes should be grounds for disqualifying a judge.  Only one barista gets a prize, the others get the scoresheets and feedback from the event – so it had better be damned good!

Thirdly – circles.  This is something a lot of judges do already – draw a little circle to better communicate what was wrong with the visuals of the espresso/cappuccino.  Interestingly the UK judges have come up with a slightly more complex system for noting down the visuals of drinks to better communicate scores – especially to other judges debriefing a competitor on sheets they didn’t write.  I like the idea – though I feel like it would make a nice ancillary piece of info, rather than replacing words and helpful advice.  An area to watch nonetheless.

Fourth – a little rewording.  In an effort to squeeze more notation space onto the sheets I trimmed a few words.  In other cases I added words that the rules say to look for but hadn’t been included on the sheets.  Thoughts and comments on this very welcome.

Ultimately I wanted more space to write notes, because I think that will improve the use of the sheets returned to the competitors.  Would love to hear some feedback – from baristas, judges or anyone else?

16 Comments

  1. Just on the points and words used for the points – this discussion’s been going on for ages. I think Sherri Johns is responsible for the wording (I could be wrong though); people were asking about this at my first Judges’ training seminar for the USBC in 2003 and IIRC, we were told something like the following (paraphrasing here):

    “We’re at the USBC (WBC), so therefore the baristas in the competition are already pretty good and better than most baristas in the field – so even their poorest showings are going to be acceptable by most cafes’ standards, hence the 1 being “acceptable””

    The argument some had was to start baristas at 3 (“good”) and scale them up or down depending on what they do. Others felt start them at 6 (I think it was “exceptional” at the time) and they lose points depending on how well they execute. Still others started at 0 (actually at “1” – we were told, as judges, to never dole out a zero for its demoralising nature) and add points as their performance went on, ie, earn as you go.

    I always felt points should be like school (ie, 50% or 3 would be “passing”), but in a way I see why this cuppers-score model was chosen. That, after all, is what this scoring is modelled after: everything above 1 point is “better than the average barista”, and cupping’s “50 points is your starting point” is a similar model.

    I’ve seen MANY baristas get demoralised, upset, even pissed off if they got a 2, or a 2.5 or even a 3. Even 3.5s left some baristas upset. Imagine if we set the medium (3) as the passing mark, and actually allowed for room under the 3 for various levels of issues, problems, incomplete, taste taints, etc. It doesn’t give much wriggle room above 3, and as the model evolves, there’ll come a time where anything under, say 5 points, will be seen as a disappointment by a competing barista.

  2. re: “scale of words”
    The scale is only as useful as an individual’s adherence to it. This is partly why the USBC strictly requires judges to utilize the words instead of the number, during calibration. If “a score of 3 and below does not feel good,” even though the judge meant “good” when they marked a “3,” then it means that getting a score of “good” “does not feel good.” There’s a disconnect there, and more times than not, the barista doesn’t know or care about the words that correspond with the score. There are many reasons for this, I suppose.
    I do think that you’re perhaps focusing too much on the scale though. As long as the scale is useful and represents a valuable calibration, it’s useful enough, no?
    My greater concern is how the scale is translated to other languages. However, either way, the scale is nothing on its own… it requires calibration. What good is a Cup of Excellence or SCAA cupping form evaluation scale, if there is no calibration?
    re: The Scoresheets
    Mr. Morrissey also opines often about how important notes and feedback are. I agree on principle, but two thoughts on the subject:
    I’d rather have a judge who scores consistently and accurately and in calibration, than a judge who provides copious notes. Quantity does not mean quality, and words very often fail to adequately communicate the sensory experience. Perhaps a cop-out, but something to consider.
    2nd thought: I have, by my own principles, major problems with your earlier sample note:

    “i.e. scoring 2.5 for tactile balance on an espresso, with a note saying “the shot was lacking in body, likely due to fast brew/underextraction.  Perhaps a slower brew would improve the body”

    I train judges NEVER to ‘play barista’ in their feedback. Very few judges (arguably no judge at all) are qualified to be assessing technique or lack thereof from the sensory table. I encourage customer-side feedback (“the shot lacked sweetness,” “the foam was dry and cold,” “sig drink did not taste as described” etc.) instead of barista-side feedback (“didn’t use enough coffee in the portafilter basket,” “steamed too aggressively,” etc.). MOST of the time, barista-side feedback is misguided and completely undermines the rest of that judge’s feedback. Haven’t you ever gotten feedback from a judge that you knew was wrong or baseless, and decided that judge was ridiculous and stupid?
    I lied… thought #3: circles. Does this encourage the judges to hyperfocus on flaws and little issues, rather than on the general visual assessment?
    For better or for worse, in the USBC system, the new paradigm that we instituted in the 2008-2009 competition year led to a much improved correlation between the judges’ evaluation and the barista’s understanding, which in practice meant higher scores overall.
    Anyway, just a few choice thoughts.

  3. I get your point on the scales – but if it is all about calibration then surely the numbers would be enough on their own – they are for CoE? Removing the words, and calibrating, may actually improve a judge’s ability to score consistently and without subconscious bias.

    “I’d rather have a judge who scores consistently and accurately and in calibration, than a judge who provides copious notes.”

    In what scenario? At a regional or at the WBC finals? I ask this because very few of the scoresheets filled out in a year are filled out in the finals. Most are at regional level and if the competition is going to genuinely succeed at increasing coffee quality (if that really is to be a defined goal) then the baristas who compete and don’t win need to understand what went wrong. They need to be better baristas as a result of the competition. If it is just about picking a winner (which I don’t think it is) then accurate numbers are enough.

    I wouldn’t describe someone’s shot as being fast/underextracted without checking in with the tech judges first to better understand why the shot tasted as it did. There needs to be, I think, greater interaction between techs and sensory judges in the post performance calibration. I get your point about judges undermining themselves – I don’t think judges should give emphatic advice on how to solve a problem, but a competitor is going to ask you how to fix that, most likely face to face in a debrief. I don’t think it would go down well to not offer any advice at all. “The espresso wasn’t very good, and I am not going to tell you how it might have been improved. But keep trying!”

    I don’t think the circles cause judges to hyperfocus any more than they already do. I wondered if it would serve as a useful memory aid for debrief. Just getting a score of a 3 for your art from one judge is a little frustrating if you don’t really understand why, and sometimes a little diagram helps.

    When you say “new paradigm”, what exactly do you mean?

  4. New paradigm: use the words, not the numbers. Assess what was GOOD about it, rather than cataloguing flaws. Be there to support the barista, not provide the opponent. Finally, and perhaps the most important: contrary to popular and oft repeated belief, THE JUDGES DO NOT “SELECT” THE CHAMPION! THE CHAMPIONS SELECT THEMSELVES! We have a semi-joking saying: the judge’s job is to be like the hole on the golf course… merely record what happened. Far too many judges all over are injecting too much of themselves to the detriment of the competitors.

    Also keep something important in mind regarding feedback, James: you are the 2007 WBC Champion. Of course barista competitors will and should look to you for technical advice. Without needing to name names though, most judges are NOT WBC Champions and would never be good enough to be a local or regional barista competition champion… and represent the majority of judges out there. Ya know what I mean?

  5. I think there are levels of technical feedback and we should be able to expect a minimum level from a qualified judge.
    If a shot is a 15 second, 45ml single then there really should be no need to avoid some basic advice in terms of how to improve it. If you can’t offer constructive criticism on that kind of shot (which are seen in many regionals round the world) then you absolutely should not be judging.

    I am not saying judges ought to be giving out specific brew recipe instruction/opinion.

    On a side note I still believe at least tech judges, if not all judges, should be required to do an 8 drink performance (no sig) in order to qualify as a WBC, or National judge. Just perform to a basic level, nothing fancy, but at least go through the process of being on the other side of the table.

    Also – if we are recording what is happening – surely lots of space for detailed notes is a good thing!

  6. AMEN to that comment re the judges should do the capp and espresso rounds (to a minimum score level or higher!) to get their judges’ cert.

    Hopefully if you or someone else suggests it at a National or WBC judges training session, it won’t garner chuckles the next time it’s brought up. Many judge-candidates and trainers found the idea funny a few years ago.

  7. Some good points and suggestions here. The WBC scoresheets could absolutely benefit from a little re-wording and consolidating. I would agree that a scale is needed and the words help judges find some meaning/relativity in the numbers. As Nick noted, calibration is more important than the wording. It’s all relative, and I discourage people from comparing scores between competitions. The scores can be inflated or deflated by a lot of different factors, including the overall quality of the competitors, who goes first, etc.

    I would agree that the circles probably don’t belong on the score sheet as they relate to only one score in each category and they encourage judges to create notes and artistic renderings that we have never provided training on. (Why don’t we just use photos anyhow to help better communicate to the competitors what the actual beverage appearances were? We have the technology.) I’d rather that they focus their time on the taste balance, a much more valuable score, than trying to draw a picture of the beverage. (And who can draw a nice looking espresso with a pencil anyhow?)

    One item in wording that I see many judges get hung up on is in the list of words that follows Taste Balance of Espresso. “Sweet, Acidic, Bitter.” As was explained to me by the person (who claims to have) created that list, it was never meant to be an exhaustive list of words. It was just a list of possible examples. In the rules, we changed the language to say “…a balance of sweet, acidic AND/OR bitter.” Still, I see some judges (esp. new ones) using that as a checklist. A balanced shot does not have to have ALL THREE OF THOSE to be a balanced shot and I think the score sheet words should make that more obvious. (And it’s not intended to be an exhaustive list of the possible elements that create balance. Maybe it even needs an “etc.”)

    As part of the newer judges training programs we are piloting in some NB’s, we ARE in fact requiring judges to prepare shots of espressos in teams. We have also tried having tech judges pull shots, not necessarily in a flight of 8 drinks as the logistics of that are not always reasonable. However the next time new WBC Judge certifications are done, it’s very likely there will be more hands-on and baristaing requirements for WBC tech judges. Considering the current pool of seven WBC certified tech judges, I am highly confident that ALL OF THEM could pull 8 espressos and capps successfully. The best tech judges I know are also excellent baristas.

    I agree that tech judges should be able to offer some suggestions with their feedback, such as “try using more coffee?” or “try slowing down the shot/fining up the grind?” but that has to be based on communicating with the sensory judges, as you suggested, to get a complete picture of what happened. And these suggestions are probably best delivered face-to-face vs. on the score sheet.

    Some of these comments and suggestions would be worth communicating on the WBC website where they can be submitted to the WBC R&R Committee:
    http://www.worldbaristachampionship.com/contact/rules-submissions

    And finally James, why are you not a WBC Certified Judge??? (And why do we only have ONE former WBC Champion – Morrissey – who is, or attempted to be, a WBC Judge?) :) We would all love to see more of the former champions judging in the WBC!

  8. I’ve certainly been frustrated about the words in the scoresheet many times, as I like to use them as a guidance and maybe also as communication back to the competing barista.

    The most problematic one is “Average”. What does that mean? Average of what? All the espressos served in the world? In the WBC? In this particular competition? I’ve always wanted to change that to “Mediocre”.

    I like your suggestions for changes in the scoring sheet. I like the space for more comments, and I agree that it is really important to give good comments, and some hints. However, I see Nickcho’s point on the judges not playing barista.

    Interesting discussion anyway.

    Best,
    E.

  9. I am pretty sure Fritz is a judge isn’t he? You can’t forget the Storm….

    I didn’t go for WBC certification for very practical reasons – either Anette and I could travel to the certification and we thought she was the better candidate. I hope to get more involved in the future. I will give Gwilym a nudge too!

    I will also put together a few suggestions for the R&R committee in the next couple of days.

  10. As a competitor I’ve always looked at the comments on scoresheets as an overture to further conversation. Even for those of us who take care to time the presentation of information appropriately, there’s not always enough time for a judge to listen to what I’m saying, then experience the beverage, and then make extensive notes without possibly missing crucial details of what I’m saying. When judges are available to me at length after a competition, my scores improve the next time out.

    Personally, the numbers never mean much to me.

  11. Wording should be more like this I reckon (with dictionary definitions just to show it is a linear progression):

    0 – Unacceptable (not satisfactory)
    1 – Poor (worse than usual, expected, or desired of)
    2 – Acceptable (adequate, satisfactory)
    3 – Good (desired or approved of)
    4 – Fine (of high quality)
    5 – Excellent (extremely good)
    6 – Exceptional (unusually good, outstanding)

    The only problem with this wording is that ‘Poor’ is sandwiched between ‘Acceptable’ and ‘Unacceptable’, begging the question: is it acceptable or not?!

    I did think about using ‘hopeless’ or ‘crappy’ instead of unacceptable, but decided against it! The fact of the matter is that there has to be a word in between acceptable and unacceptable otherwise the leap is just too big. For that reason I placed ‘poor’ in the no-man’s-land and I think it works, especially when backed up by the definitions.

    I don’t think that there need to be descriptors for half points. The descriptions are there to give a numeric reference, not the other way round. If a drink is too good to be ‘Good’, but not good enough to be ‘Fine’, it has to be scored a 3.5; exactly what English word that translates to is irrelevant, the points have been scored.

    I am of course in agreement that the scoring should be backed up by notes as much as possible.

  12. With the risk of making myself a laughing stock, I’ll take my chances on commenting anyway. I’ve in fact never been to a barista competition, but this discussion resembles what occurs in school grading and maybe the latter might be of relevance anyway, although posing a slightly alternative view on the main topic.

    Scale
    ======
    A major question in the Norwegian school system lately has been: Is a graded scale constructed from the top and downwards, or is it a scale in which each grade has its own well defined value? In evaluating an espresso: for each tasting, are you looking for how far below 6 that espresso is, or are there well defined criteria for the different grades? The two extremes of giving e.g. grade 3 would be:

    either

    “because this cup lacks this, this, that, that and this virtue, it is three grades below 6 and is thus given grade 3”

    or

    “this cup has got this, this, this, this and that virtue. Hence we would give it a grade 3. For it to reach grade four, it would in addition need to have this and this virtue”.

    The difference is: the first way of seeing it would be to define the result as how far one is from the ideal (rating 6). The second would be to value it for what it is and give indications of how to do better next time (ref. Anthony’s comment above). The last version is considered to be more constructive than the first, helping the student (barista) to make an even better effort next time.

    Of course, the last model is quite laborious for the judges beforehand. Criteria must be constructed for all grades prior to the competition. However, when this is done the scoring might be both more consistent and transparent. In that respect, I’d say that Tristan’s descriptions in parentheses are more precise and should be easier to judge by than the ones in bold (the words in parentheses are indeed one step on the way to consistent criteria).

    Yes, there are indeed great differences between baristas attending a competition and pupils attending compulsory education. But who knows, maybe there might be relevant meeting points anyway.

  13. Had to rummage around the dictionary a bit, trying to find words that I as a non native English speaker could sort of see fitting onto a scale of 0-6… Not easy to find the ones that have the right connotations for everyone, some of them sound a bit harsh and others just condescending… How to choose the right ones that everyone will relate to and not be upset by?

    Based on a few school and university grading systems, a few ideas:

    0 – Fail/Poor/Inadequate/Insufficient/Unacceptable/Unsatisfactory
    1 – Marginal/Adequate/Sufficient/Passable
    2 – Fair/Satisfactory/Acceptable
    3 – Good
    4 – Very Good
    5 – Excellent
    6 – Outstanding/Superior/Exceptional

  14. My 2 cents as a judge for drum competitions:
    I think you will always fail with wording if it has to fit for all parts of judging. I recommend separate and significant explanatory keywords for each part like taste, tactile, crema etc. (possibly more than one for each point).
    Such an “explanation table” will be more helpful for the judges on stage but also for the competitors after getting their scoresheets. It also decreases the amount of notes on the scoresheets (which is still a good thing) and for regionals each keyword can correlate to some basic advice.

    A lot of them are just a “write down of what is going on in a judge’s head” but it will also cause some interesting debates I think…

  15. I’m agreed on that one: if judges are to judge, they themselves should be able to explain the details and offer constructive advice, and I think it’s more credible if they have performed or can perform the task at hand: producing well made coffee & milk. This would solve a lot of frustration on the barista’s side of things. If the industry as a whole is to evolve and improve, the judging itself needs to evolve along with the ever-changing barista’s approach to it.

  16. I definitely agree with you on the wording / number scoring. If I were to judge the espressos I can get around me (Albany, NY area) most would receive a 0, a few would receive a 1, even fewer a 2, and only one would receive a 3 or a 4.

    Only 50% score for that one good espresso? I think they deserve a bit better than that. And I hate having to give a miserable 33% to my favourite coffee house. Oh dear.
