The DRAGON Challenge 🐉


Artificial Intelligence (AI) can help mitigate the global shortage of medical diagnostic personnel, but it requires large-scale datasets to train clinical algorithms to perform at an expert level. Natural Language Processing (NLP) shows great potential to annotate large volumes of unlabeled data from routine clinical practice and thereby facilitate the training of these algorithms.

The DRAGON (Diagnostic Report Analysis: General Optimization of NLP) challenge aims to facilitate the development of NLP algorithms, including Large Language Models (LLMs), for automated dataset curation. First, it provides a unique and publicly available benchmark for clinical NLP that spans 28 clinically relevant tasks with 28,824 annotated medical reports from five Dutch care centers. The tasks are designed to facilitate automated dataset curation, enabling high-quality, low-cost, large-scale annotation for training clinical algorithms. Second, we provide foundational LLMs that were pretrained on four million clinical reports from a sixth Dutch care center.

About the Challenge 🐲


The DRAGON Challenge introduces the first large-scale, publicly accessible benchmark for NLP using full-length clinical reports, featuring:

28 Clinically Relevant Tasks: From predicting diagnoses to extracting key medical measurements. See Tasks for a full list of tasks included in the benchmark.
Data from Five Dutch Care Centers: Encompassing over 28,000 annotated clinical reports across various imaging modalities and conditions.
Pretrained Foundational Models: Large Language Models pretrained on four million clinical reports. Five architectures were pretrained using two pretraining strategies, and all models are publicly available on Hugging Face! See Pretrained Models for an overview.

Participation


Join leading researchers from around the globe in advancing the field of clinical NLP. Whether you are a student, a seasoned researcher, or part of a clinical team, your contribution can significantly impact the future of medical diagnostics. Start innovating, developing, and testing cutting-edge NLP solutions in a real-world medical context in three steps:

1. Register: Get started by registering on the Grand Challenge platform and joining the challenge by clicking "Join".
2. Download Resources: Access our guidelines, baseline training code, and synthetic data to prepare your submission. Navigate to github.com/DIAGNijmegen/dragon_baseline for documentation on how to get started!
3. Submit Your Solution: Contribute to the advancement of clinical diagnostics by submitting your NLP models. Please read the Submission page for more information.

Contact

Have questions? You can contact me via email (Joeran.Bosma@radboudumc.nl), leave a message on the forum, or open an issue on GitHub.