So this is Benchmarks: standardized tests that measure model performance. Then there's some more stuff I made up, and then we're done.