Matt Pocock @mattpocockuk
Friday, February 7, 2025
Tweet
LLM-as-a-judge in a few lines of code. Important points:
- LLMs are bad at numeric scales, so we ask the model to return a text enum, which we then convert to a scale.
- Super useful for situations where deterministic tests just can't work.
https://t.co/ow8sydQWmb
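A minimal sketch of the enum-to-scale idea, not the code from the linked post. The labels, the scores, and every name here (`VERDICTS`, `SCORE_BY_VERDICT`, `CallModel`, `buildJudgePrompt`, `parseVerdict`, `verdictToScore`, `judge`) are assumptions for illustration. The model call is passed in as a function, so no particular SDK is implied.

```typescript
// Hypothetical labels: the tweet only says "a text enum", not which values.
const VERDICTS = ["excellent", "good", "poor", "wrong"] as const;
type Verdict = (typeof VERDICTS)[number];

// Hypothetical mapping from each label to a numeric score in [0, 1].
const SCORE_BY_VERDICT: Record<Verdict, number> = {
  excellent: 1,
  good: 0.75,
  poor: 0.25,
  wrong: 0,
};

// Any function that sends a prompt to a model and resolves to its text reply.
type CallModel = (prompt: string) => Promise<string>;

// Ask for one label from the enum instead of a number.
function buildJudgePrompt(question: string, answer: string): string {
  return [
    "You are grading an answer to a question.",
    `Question: ${question}`,
    `Answer: ${answer}`,
    `Reply with exactly one word from: ${VERDICTS.join(", ")}.`,
  ].join("\n");
}

// Normalize the reply (case, whitespace, punctuation) and check it against the enum.
// Returns null when the reply is not one of the allowed labels.
function parseVerdict(reply: string): Verdict | null {
  const cleaned = reply.trim().toLowerCase().replace(/[^a-z]/g, "");
  return (VERDICTS as readonly string[]).includes(cleaned)
    ? (cleaned as Verdict)
    : null;
}

// Convert the text label to a number in our own code, not in the model.
function verdictToScore(verdict: Verdict): number {
  return SCORE_BY_VERDICT[verdict];
}

// Full round trip: prompt the judge, parse its label, return a score (or null).
async function judge(
  callModel: CallModel,
  question: string,
  answer: string,
): Promise<number | null> {
  const reply = await callModel(buildJudgePrompt(question, answer));
  const verdict = parseVerdict(reply);
  return verdict === null ? null : verdictToScore(verdict);
}
```

In a test you would pass your own model client as `callModel` and assert that the returned score clears a threshold. Returning `null` for an unrecognized reply keeps a malformed judge response from silently counting as a score.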