The DRAGON Challenge 🐉
Artificial Intelligence (AI) can mitigate the global shortage of medical diagnostic personnel but requires large-scale datasets to train clinical algorithms to perform at an expert level. Natural Language Processing (NLP) shows great potential to annotate large volumes of unlabeled data from clinical routine and facilitate the training of these algorithms.
The DRAGON (Diagnostic Report Analysis: General Optimization of NLP) challenge aims to facilitate the development of NLP algorithms, including Large Language Models (LLMs), for automated dataset curation. First, it provides a unique and publicly available benchmark for clinical NLP that spans 28 clinically relevant tasks with 28,824 annotated medical reports from five Dutch care centers. The tasks are designed to facilitate automated dataset curation, enabling high-quality, low-cost, large-scale annotation for training clinical algorithms. Second, it provides foundational LLMs pretrained on four million clinical reports from a sixth Dutch care center.
About the Challenge 🐲
The DRAGON Challenge introduces the first large-scale, publicly accessible benchmark for NLP using full-length clinical reports, featuring:
28 Clinically Relevant Tasks: From predicting diagnoses to extracting key medical measurements. See Tasks for a full list of tasks included in the benchmark.
Data from Five Dutch Care Centers: Encompassing over 28,000 annotated clinical reports across various imaging modalities and conditions.
Pretrained Foundational Models: Large Language Models pretrained using four million clinical reports. Five architectures were pretrained using two pretraining strategies, and all models are publicly available on HuggingFace! See Pretrained Models for an overview.
Participation
Join leading researchers from around the globe in advancing the field of clinical NLP. Whether you are a student, a seasoned researcher, or part of a clinical team, your contribution can significantly impact the future of medical diagnostics. Start innovating, developing, and testing cutting-edge NLP solutions in a real-world medical context in three steps:
1. Register: Get started by registering on the Grand Challenge platform and joining the challenge by clicking "Join".
2. Download Resources: Access our guidelines, baseline training code, and synthetic data to prepare your submission. Navigate to github.com/DIAGNijmegen/dragon_baseline for documentation on how to get started!
3. Submit Your Solution: Contribute to the advancement of clinical diagnostics by submitting your NLP models. Please read the Submission page for more information.
Contact
Have questions? You can contact me via email (Joeran.Bosma@radboudumc.nl), leave a message on the forum, or open an issue on GitHub.