Let's say I have someone manually grading model outputs. For example, we prompted "2+2" and the model output "5". Is there a way in Helicone to store what the correct model output would have been (e.g. "4")? Either as "request scoring/grading" or as some alternative metadata (although it feels like this should be a first-class citizen)?