Challenge website is under construction.
The DRAGON benchmark comprises 28 tasks for evaluating model performance. See the manuscript for details on all tasks [PENDING]. The figure and table below give an overview.
Figure: Overview of the tasks in the DRAGON benchmark.
Table: Overview of the tasks in the DRAGON benchmark.
ID | Name | Task type | Metric |
---|---|---|---|
T1 | Adhesion presence | SL Bin CLF | AUC |
T2 | Pulmonary nodule presence | SL Bin CLF | AUC |
T3 | Kidney abnormality identification | SL Bin CLF | AUC |
T4 | Skin histopathology case selection | SL Bin CLF | AUC |
T5 | RECIST timeline | SL Bin CLF | AUC |
T6 | Histopathology cancer origin | SL Bin CLF | AUC |
T7 | Pulmonary nodule size presence | SL Bin CLF | AUC |
T8 | Pancreatic ductal adenocarcinoma size presence | SL Bin CLF | AUC |
T9 | Pancreatic ductal adenocarcinoma diagnosis | SL MC CLF | Unweighted Kappa |
T10 | Prostate radiology suspicious lesions | SL MC CLF | Linearly Weighted Kappa |
T11 | Prostate histopathology significant cancers | SL MC CLF | Linearly Weighted Kappa |
T12 | Histopathology tissue type | SL MC CLF | Unweighted Kappa |
T13 | Histopathology tissue origin | SL MC CLF | Unweighted Kappa |
T14 | Entailment diagnostic sentences | SL MC CLF | Linearly Weighted Kappa |
T15 | Colon histopathology diagnosis | ML Bin CLF | Macro AUC |
T16 | RECIST lesion size presence | ML Bin CLF | AUC |
T17 | Pancreatic ductal adenocarcinoma attributes | ML MC CLF | Unweighted Kappa |
T18 | Hip Kellgren-Lawrence scoring | ML MC CLF | Unweighted Kappa |
T19 | Prostate volume extraction | SL Reg | RSMAPE (ε = 4 cm³) |
T20 | Prostate specific antigen extraction | SL Reg | RSMAPE (ε = 0.4 ng/mL) |
T21 | Prostate specific antigen density extraction | SL Reg | RSMAPE (ε = 0.04 ng/mL²) |
T22 | Pancreatic ductal adenocarcinoma size measurement | SL Reg | RSMAPE (ε = 4 mm) |
T23 | Pulmonary nodule size measurement | SL Reg | RSMAPE (ε = 4 mm) |
T24 | RECIST lesion size measurement | ML Reg | RSMAPE (ε = 4 mm) |
T25 | Anonymization | SL NER | Macro F1 |
T26 | Medical terminology recognition | SL NER | F1 |
T27 | Prostate biopsy sampling | ML NER | Weighted F1 |
T28 | Skin histopathology diagnosis | ML NER | Weighted F1 |
SL = single-label, ML = multi-label, Bin = binary, MC = multi-class, CLF = classification, Reg = regression, NER = named entity recognition, AUC = area under the receiver operating characteristic curve, RSMAPE = Robust Symmetric Mean Absolute Percentage Error, RECIST = Response Evaluation Criteria in Solid Tumors.
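
Two metrics in the table are less commonly seen than AUC or F1: RSMAPE with its task-specific ε, and the unweighted versus linearly weighted kappa. The sketch below illustrates how such metrics are typically computed. It is not the benchmark's official evaluation code. The RSMAPE form used here, the mean of |y − ŷ| / (|y| + |ŷ| + ε), is an assumption, and the manuscript remains the authoritative definition. The function names `rsmape` and `cohen_kappa` are hypothetical, and the kappa follows the standard Cohen's kappa formula with integer labels 0..k−1.

```python
from collections import Counter
from typing import Sequence


def rsmape(y_true: Sequence[float], y_pred: Sequence[float], epsilon: float) -> float:
    """Assumed RSMAPE form: mean of |y - y_hat| / (|y| + |y_hat| + epsilon).

    epsilon keeps the ratio bounded when both values are close to zero.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        abs(t - p) / (abs(t) + abs(p) + epsilon) for t, p in zip(y_true, y_pred)
    )
    return total / len(y_true)


def cohen_kappa(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_classes: int,
    linear: bool = False,
) -> float:
    """Cohen's kappa for integer labels 0..n_classes-1.

    linear=False: every disagreement costs 1 (unweighted kappa).
    linear=True: a disagreement costs |i - j| / (n_classes - 1), so
    near-misses on an ordinal scale are penalised less.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    def weight(i: int, j: int) -> float:
        if linear:
            return abs(i - j) / (n_classes - 1)
        return 0.0 if i == j else 1.0

    pair_counts = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    # Observed disagreement, weighted by the cost of each (true, predicted) pair.
    observed = sum(weight(i, j) * c for (i, j), c in pair_counts.items()) / n
    # Disagreement expected by chance from the two marginal distributions.
    expected = sum(
        weight(i, j) * true_counts[i] * pred_counts[j]
        for i in range(n_classes)
        for j in range(n_classes)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected
```

Under these assumptions, a 10 mm lesion predicted as 6 mm with ε = 4 mm contributes 4 / (10 + 6 + 4) = 0.2 to the RSMAPE. For the kappa, confusing adjacent classes on a three-class ordinal scale costs half as much under linear weighting as under no weighting, which fits the use of the linearly weighted variant on ordered labels.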