The DRAGON Challenge 🐉

Artificial Intelligence (AI) can mitigate the global shortage of medical diagnostic personnel but requires large-scale datasets to train clinical algorithms to perform at an expert level. Natural Language Processing (NLP) shows great potential to annotate large volumes of unlabeled data from clinical routine and facilitate the training of these algorithms.

The DRAGON (Diagnostic Report Analysis: General Optimization of NLP) challenge aims to facilitate the development of NLP algorithms, including Large Language Models, for automated dataset curation. It provides a unique and publicly available benchmark for clinical NLP that spans 28 clinically relevant tasks with 28,824 annotated medical reports from five Dutch care centers. The tasks are designed to facilitate automated dataset curation, enabling high-quality, low-cost, large-scale annotation for training clinical algorithms.

About the Challenge 🐲

The DRAGON Challenge introduces the first large-scale, publicly accessible benchmark for clinical NLP, featuring:

🏥 Data from Five Dutch Care Centers: Encompassing over 28,000 annotated medical reports across various imaging modalities and conditions.
📋 28 Clinically Relevant Tasks: From predicting diagnoses to extracting key medical measurements. See Tasks for a full list of tasks included in the benchmark. See Sample Reports for anonymized example reports.
🤖 Pretrained Foundational Models: Large Language Models pretrained on four million clinical reports. Five architectures were pretrained using two pretraining strategies, and all models are publicly available on HuggingFace! See Pretrained Models for an overview.

Participation

Join leading researchers from around the globe in advancing the field of clinical NLP. Whether you are a student or a seasoned researcher, your contribution can significantly impact the future of medical diagnostics. Start innovating, developing, and testing cutting-edge NLP solutions in a real-world medical context in three steps:

1. Register: Get started by registering on the Grand Challenge platform and joining the challenge by clicking "Join".
2. Develop Your Solution: Access our guidelines, baseline training code, and synthetic data to prepare your submission. Navigate to github.com/DIAGNijmegen/dragon_baseline and the dedicated development guide for documentation on how to get started!
3. Submit Your Solution: Contribute to the advancement of clinical diagnostics by submitting your NLP models. Please read the Submission page for more information.

Contact

Have questions? You can contact me via email (Joeran.Bosma@radboudumc.nl), leave a message on the forum, or open an issue on GitHub.