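FlagEval (天秤) is a large-model evaluation platform built by the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) together with several university teams. It uses a three-dimensional "capability-task-metric" evaluation framework and aims to provide comprehensive, fine-grained evaluation results. The platform currently covers more than 30 capabilities, 5 task types, and 4 major categories of metrics, for more than 600 evaluation dimensions in total. The task dimension includes 22 subjective and objective evaluation datasets with 84,433 questions.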
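One way to read the "600+ dimensions" figure is as the product of the three axes: 30 capabilities × 5 tasks × 4 metric categories = 600. The sketch below illustrates that arithmetic only. The labels are hypothetical placeholders, since the text does not list the actual capability, task, or metric names, and it assumes the lower bound of 30 capabilities. It is not FlagEval's own code or data model.

```python
from itertools import product

# Hypothetical placeholder labels: the real capability, task, and metric
# names are not given in the text above.
capabilities = [f"capability_{i}" for i in range(1, 31)]           # 30 (text says "more than 30")
tasks = [f"task_{i}" for i in range(1, 6)]                         # 5 task types
metric_categories = [f"metric_category_{i}" for i in range(1, 5)]  # 4 metric categories

# Each evaluation dimension is one (capability, task, metric category) triple.
dimensions = list(product(capabilities, tasks, metric_categories))

print(len(dimensions))  # 600
```

Because the platform reports more than 30 capabilities, the real count would exceed 600, which matches the "600+" wording.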
Copyright Notice: Unless otherwise stated, all articles on this website are originally created and owned by AINAVNews. Without permission, no individual, media, website or group may reprint, plagiarize or reproduce the content of this website in other ways, or set up a mirror on servers that do not belong to our website. Otherwise, our website will reserve the right to pursue relevant legal responsibilities in accordance with the law.