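HELM (Holistic Evaluation of Language Models) is an evaluation framework for large language models released by Stanford University. Its methodology is built from three modules: scenarios, adaptation, and metrics. Each evaluation run specifies one scenario, one prompt used to adapt the model, and one or more metrics. The evaluation mainly covers English and uses seven metrics: accuracy, uncertainty/calibration, robustness, fairness, bias, toxicity, and inference efficiency. Tasks include question answering, information retrieval, summarization, and text classification, among others.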
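The scenario, adaptation, and metrics structure of a run can be illustrated with a minimal Python sketch. This is not the HELM library's API: the names `Scenario`, `Adaptation`, `exact_match`, and `run_evaluation` are hypothetical, the model is a toy stand-in function, and exact match is used only as a simple proxy for an accuracy metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical and for illustration only;
# they are NOT taken from the HELM codebase.
Metric = Callable[[List[str], List[str]], float]


@dataclass(frozen=True)
class Scenario:
    """What is evaluated: inputs paired with reference answers."""
    name: str
    inputs: List[str]
    references: List[str]


@dataclass(frozen=True)
class Adaptation:
    """How the model is prompted: a template containing '{input}'."""
    prompt_template: str

    def render(self, text: str) -> str:
        return self.prompt_template.format(input=text)


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference (an accuracy proxy)."""
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def run_evaluation(
    scenario: Scenario,
    adaptation: Adaptation,
    metrics: Dict[str, Metric],
    model: Callable[[str], str],
) -> Dict[str, float]:
    """One run: one scenario, one adaptation prompt, one or more metrics."""
    if not metrics:
        raise ValueError("at least one metric is required")
    prompts = [adaptation.render(x) for x in scenario.inputs]
    predictions = [model(p) for p in prompts]
    return {name: fn(predictions, scenario.references) for name, fn in metrics.items()}


def toy_model(prompt: str) -> str:
    """Stand-in for a language model; only knows one answer."""
    return "4" if "2+2" in prompt else "unknown"


scenario = Scenario("toy_qa", ["2+2", "capital of France"], ["4", "Paris"])
adaptation = Adaptation("Question: {input}\nAnswer:")
scores = run_evaluation(scenario, adaptation, {"exact_match": exact_match}, toy_model)
print(scores)
```

Keeping the three pieces separate means the same scenario can be rerun with a different prompt or an extra metric without touching the others, which is the design idea the three-module description points to.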