The DRAGON benchmark comprises 28 tasks for evaluating model performance. See the manuscript for details on each task [PENDING]. The figure and table below give an overview.


Figure: Overview of the tasks in the DRAGON benchmark.

Table: Overview of the tasks in the DRAGON benchmark.

| ID | Name | Task type | Metric | Sample report |
|----|------|-----------|--------|---------------|
| T1 | Adhesion presence | SL Bin CLF | AUC | Report |
| T2 | Pulmonary nodule presence | SL Bin CLF | AUC | Report |
| T3 | Kidney abnormality identification | SL Bin CLF | AUC | Report |
| T4 | Skin histopathology case selection | SL Bin CLF | AUC | Report |
| T5 | RECIST timeline | SL Bin CLF | AUC | Report |
| T6 | Histopathology cancer origin | SL Bin CLF | AUC | Report |
| T7 | Pulmonary nodule size presence | SL Bin CLF | AUC | Report |
| T8 | Pancreatic ductal adenocarcinoma size presence | SL Bin CLF | AUC | Report |
| T9 | Pancreatic ductal adenocarcinoma diagnosis | SL MC CLF | Unweighted Kappa | Report |
| T10 | Prostate radiology suspicious lesions | SL MC CLF | Linearly Weighted Kappa | Report |
| T11 | Prostate histopathology significant cancers | SL MC CLF | Linearly Weighted Kappa | Report |
| T12 | Histopathology tissue type | SL MC CLF | Unweighted Kappa | Report |
| T13 | Histopathology tissue origin | SL MC CLF | Unweighted Kappa | Report |
| T14 | Entailment diagnostic sentences | SL MC CLF | Linearly Weighted Kappa | Report |
| T15 | Colon histopathology diagnosis | ML Bin CLF | Macro AUC | Report |
| T16 | RECIST lesion size presence | ML Bin CLF | AUC | Report |
| T17 | Pancreatic ductal adenocarcinoma attributes | ML MC CLF | Unweighted Kappa | Report |
| T18 | Hip Kellgren-Lawrence scoring | ML MC CLF | Unweighted Kappa | Report |
| T19 | Prostate volume extraction | SL Reg | RSMAPE (ε = 4 cm³) | Report |
| T20 | Prostate specific antigen extraction | SL Reg | RSMAPE (ε = 0.4 ng/mL) | Report |
| T21 | Prostate specific antigen density extraction | SL Reg | RSMAPE (ε = 0.04 ng/mL²) | Report |
| T22 | Pancreatic ductal adenocarcinoma size measurement | SL Reg | RSMAPE (ε = 4 mm) | Report |
| T23 | Pulmonary nodule size measurement | SL Reg | RSMAPE (ε = 4 mm) | Report |
| T24 | RECIST lesion size measurement | ML Reg | RSMAPE (ε = 4 mm) | Report |
| T25 | Anonymization | SL NER | Macro F1 | Report |
| T26 | Medical terminology recognition | SL NER | F1 | Report |
| T27 | Prostate biopsy sampling | ML NER | Weighted F1 | Report |
| T28 | Skin histopathology diagnosis | ML NER | Weighted F1 | Report |

SL = single-label, ML = multi-label, Bin = binary, MC = multi-class, CLF = classification, Reg = regression, NER = named entity recognition, RSMAPE = Robust Symmetric Mean Absolute Percentage Error, RECIST = response evaluation criteria in solid tumors.
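
The table lists two kappa variants for the multi-class tasks. The sketch below illustrates how they differ, using the textbook definition of Cohen's kappa. It is not the benchmark's evaluation code: the function name `cohen_kappa`, its parameters, and the toy labels are hypothetical, and labels are assumed to be integers in `range(num_classes)`.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred, num_classes, weights=None):
    """Cohen's kappa for integer labels in range(num_classes).

    weights=None     -> unweighted: every disagreement costs 1
    weights="linear" -> linearly weighted: cost is |i - j|
    (Hypothetical helper for illustration only.)
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    def cost(i, j):
        if weights == "linear":
            # A constant factor such as 1 / (num_classes - 1) would cancel
            # in the ratio below, so it is omitted.
            return abs(i - j)
        return 0 if i == j else 1

    n = len(y_true)
    pair_counts = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    # Mean disagreement cost actually observed.
    observed = sum(cost(i, j) * c for (i, j), c in pair_counts.items()) / n
    # Mean disagreement cost expected if labels were assigned independently
    # with the same marginal frequencies.
    expected = sum(
        cost(i, j) * true_counts[i] * pred_counts[j]
        for i in range(num_classes)
        for j in range(num_classes)
    ) / (n * n)

    if expected == 0:
        return float("nan")  # undefined when only one class ever occurs
    return 1.0 - observed / expected


# Toy labels (made up): one prediction is off by one class, or by two.
y_true = [0, 1, 2, 0, 1, 2]
near_miss = [0, 1, 1, 0, 1, 2]
far_miss = [0, 1, 0, 0, 1, 2]

kappa_unweighted_near = cohen_kappa(y_true, near_miss, 3)           # 0.75
kappa_unweighted_far = cohen_kappa(y_true, far_miss, 3)             # 0.75
kappa_linear_near = cohen_kappa(y_true, near_miss, 3, "linear")     # 0.8
kappa_linear_far = cohen_kappa(y_true, far_miss, 3, "linear")       # 0.625
```

With unweighted kappa, both mistakes are scored identically. With linear weights, the off-by-two error is penalized more than the off-by-one error, which is the relevant behavior when class indices follow a meaningful order.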