The DRAGON benchmark comprises 28 tasks for evaluating model performance. See the manuscript [PENDING] for a detailed description of each task. An overview is given in the figure and table below.


Figure: Overview of the tasks in the DRAGON benchmark.
Table: Overview of the tasks in the DRAGON benchmark.

ID | Name | Task type | Metric
T1 | Adhesion presence | SL Bin CLF | AUC
T2 | Pulmonary nodule presence | SL Bin CLF | AUC
T3 | Kidney abnormality identification | SL Bin CLF | AUC
T4 | Skin histopathology case selection | SL Bin CLF | AUC
T5 | RECIST timeline | SL Bin CLF | AUC
T6 | Histopathology cancer origin | SL Bin CLF | AUC
T7 | Pulmonary nodule size presence | SL Bin CLF | AUC
T8 | Pancreatic ductal adenocarcinoma size presence | SL Bin CLF | AUC
T9 | Pancreatic ductal adenocarcinoma diagnosis | SL MC CLF | Unweighted Kappa
T10 | Prostate radiology suspicious lesions | SL MC CLF | Linearly Weighted Kappa
T11 | Prostate histopathology significant cancers | SL MC CLF | Linearly Weighted Kappa
T12 | Histopathology tissue type | SL MC CLF | Unweighted Kappa
T13 | Histopathology tissue origin | SL MC CLF | Unweighted Kappa
T14 | Entailment diagnostic sentences | SL MC CLF | Linearly Weighted Kappa
T15 | Colon histopathology diagnosis | ML Bin CLF | Macro AUC
T16 | RECIST lesion size presence | ML Bin CLF | AUC
T17 | Pancreatic ductal adenocarcinoma attributes | ML MC CLF | Unweighted Kappa
T18 | Hip Kellgren-Lawrence scoring | ML MC CLF | Unweighted Kappa
T19 | Prostate volume extraction | SL Reg | RSMAPE (ε = 4 cm³)
T20 | Prostate specific antigen extraction | SL Reg | RSMAPE (ε = 0.4 ng/mL)
T21 | Prostate specific antigen density extraction | SL Reg | RSMAPE (ε = 0.04 ng/mL²)
T22 | Pancreatic ductal adenocarcinoma size measurement | SL Reg | RSMAPE (ε = 4 mm)
T23 | Pulmonary nodule size measurement | SL Reg | RSMAPE (ε = 4 mm)
T24 | RECIST lesion size measurement | ML Reg | RSMAPE (ε = 4 mm)
T25 | Anonymization | SL NER | Macro F1
T26 | Medical terminology recognition | SL NER | F1
T27 | Prostate biopsy sampling | ML NER | Weighted F1
T28 | Skin histopathology diagnosis | ML NER | Weighted F1

SL = single-label, ML = multi-label, Bin = binary, MC = multi-class, CLF = classification, Reg = regression, NER = named entity recognition, AUC = area under the receiver operating characteristic curve, RSMAPE = Robust Symmetric Mean Absolute Percentage Error, RECIST = Response Evaluation Criteria in Solid Tumors.
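The regression tasks (T19 to T24) are scored with RSMAPE, each with its own ε. One plausible reading, used in the Python sketch below, is that ε acts as a floor on the error denominator so that references and predictions close to zero do not blow up the percentage error. The sketch assumes a per-case term of |y - ŷ| / max(|y| + |ŷ|, ε) averaged over all cases. This form is an assumption: SMAPE conventions differ (some include a factor of 2 or halve the denominator), the manuscript's exact definition may differ, and the function name `rsmape` is hypothetical.

```python
from typing import Sequence


def rsmape(y_true: Sequence[float], y_pred: Sequence[float], epsilon: float) -> float:
    """Mean of |y - y_hat| / max(|y| + |y_hat|, epsilon) over all cases.

    ASSUMED form of RSMAPE for illustration; not the benchmark's evaluation code.
    """
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        # The epsilon floor keeps the denominator from collapsing when both
        # values are close to zero (e.g. epsilon = 4 for sizes in mm).
        denominator = max(abs(y) + abs(y_hat), epsilon)
        total += abs(y - y_hat) / denominator
    return total / len(y_true)


print(rsmape([0.0], [1.0], epsilon=4.0))  # 0.25
```

Under this form each per-case term lies between 0 and 1, because |y - ŷ| never exceeds |y| + |ŷ|. With ε = 4, a reference of 0 and a prediction of 1 contributes 1 / max(1, 4) = 0.25, whereas the same error without the floor would contribute 1 / 1 = 1.0.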
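The multi-class classification tasks report kappa either unweighted (T9, T12, T13, T17, T18) or linearly weighted (T10, T11, T14). The difference lies in how disagreements are penalized. Unweighted kappa counts every wrong class equally. Linear weighting charges a disagreement in proportion to its distance |i - j| on an ordinal scale, so a near miss costs less than a distant one. The pure-Python sketch below assumes Cohen's kappa and integer class labels 0 to k - 1. The function `cohen_kappa` and its `weighting` parameter are illustrative, not the benchmark's evaluation API.

```python
from typing import List, Sequence


def cohen_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int, weighting: str = "none"
) -> float:
    """Cohen's kappa as 1 - (weighted observed disagreement / weighted chance disagreement).

    weighting="none":   every disagreement has weight 1 (unweighted kappa).
    weighting="linear": a disagreement between classes i and j has weight |i - j|.
    Scaling all weights by a constant does not change kappa, so |i - j| is not
    normalized by (num_classes - 1).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if weighting not in ("none", "linear"):
        raise ValueError(f"unknown weighting: {weighting!r}")

    # Observed joint distribution of (true, predicted) labels, as proportions.
    observed: List[List[float]] = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0 / n

    # Marginal distributions; their outer product is the joint expected by chance.
    true_marginal = [sum(row) for row in observed]
    pred_marginal = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    def weight(i: int, j: int) -> float:
        if weighting == "none":
            return 0.0 if i == j else 1.0
        return float(abs(i - j))

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            w = weight(i, j)
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * true_marginal[i] * pred_marginal[j]
    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed_disagreement / expected_disagreement


y_true, y_pred = [0, 1, 2, 2], [0, 2, 2, 1]
print(cohen_kappa(y_true, y_pred, 3))            # approximately 0.2
print(cohen_kappa(y_true, y_pred, 3, "linear"))  # approximately 0.4286 (3/7)
```

In this example both disagreements are only one step apart (1 vs 2), so the linearly weighted kappa (3/7) is noticeably higher than the unweighted kappa (0.2).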
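The NER tasks (T25 to T28) report F1 with different averaging. Macro F1 takes the unweighted mean of the per-label F1 scores, so rare labels count as much as frequent ones. Weighted F1 instead weights each label's F1 by its support, i.e. the number of times the label occurs in the reference. The sketch below assumes that references and predictions have already been flattened into aligned per-token label sequences. That is an assumption about the evaluation setup, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_label_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2TP / (2TP + FP + FN) for every label seen in either sequence."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            # A mismatch is a false positive for the predicted label
            # and a false negative for the true label.
            fp[pred_label] += 1
            fn[true_label] += 1
    scores = {}
    for label in set(y_true) | set(y_pred):
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denominator if denominator else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-label F1 weighted by support (occurrences of the label in y_true)."""
    scores = per_label_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
print(macro_f1(y_true, y_pred))     # approximately 0.733 (11/15)
print(weighted_f1(y_true, y_pred))  # approximately 0.767 (23/30)
```

Here the per-label F1 scores are 0.8 for "a" and 2/3 for "b". Because "a" has three times the support of "b", weighted F1 sits closer to 0.8 than macro F1 does.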