GeMTeX - German Medical Text Corpus
Automated indexing of medical texts for research
-
Everyday clinical practice produces many texts, such as physicians' letters and findings reports, that contain valuable information on a patient's medical history, disease course and treatment. Programs for the automatic processing of natural language (natural language processing, NLP for short) could use these texts to support doctors and researchers in their work. However, the full potential of clinical documents cannot yet be exploited because they lack standardization. The GeMTeX (German Medical Text Corpus) methods platform aims to close this gap by making medical texts from patient care available for research projects. The goal is to create the largest medical text corpus in the German language.
Before texts from routine care can be used for clinical and research purposes, they must first be made machine-readable for NLP programs. This requires large quantities of annotated texts from daily patient care. Annotated texts are documents enriched with systematic annotations, for example markers for diagnoses or medications. Trainee doctors check the annotations manually, so the checked annotations serve as a reference for further improving automatic annotation. Information structured in this way can be combined with existing data for analyses and statistical models.
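To make the idea of an annotated text more concrete, the following is a minimal Python sketch, not the GeMTeX implementation. It assumes that annotations are stored as character offsets plus a label, and that automatic annotations are compared with the manually checked reference by exact match. The class and function names, the labels "Diagnosis" and "Medication", and the example sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the annotated span begins
    end: int    # character offset just past the end of the span
    label: str  # hypothetical category such as "Diagnosis" or "Medication"


def annotate(text: str, phrase: str, label: str) -> Annotation:
    """Create an annotation for the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} does not occur in the text")
    return Annotation(start, start + len(phrase), label)


def covered_text(text: str, ann: Annotation) -> str:
    """Return the part of the text that an annotation refers to."""
    return text[ann.start:ann.end]


def precision_recall(automatic, reference):
    """Compare automatic annotations with a manually checked reference.

    Only exact matches (same offsets and same label) count as correct.
    """
    automatic, reference = set(automatic), set(reference)
    matches = len(automatic & reference)
    precision = matches / len(automatic) if automatic else 0.0
    recall = matches / len(reference) if reference else 0.0
    return precision, recall


# Invented example sentence, not taken from any real patient record.
text = "Patient with hypertension was started on ramipril."

# Manually checked reference annotations (assumed labels).
reference = [
    annotate(text, "hypertension", "Diagnosis"),
    annotate(text, "ramipril", "Medication"),
]

# Output of a hypothetical automatic annotator that missed the medication.
automatic = [annotate(text, "hypertension", "Diagnosis")]

precision, recall = precision_recall(automatic, reference)
```

In this sketch the automatic annotator finds one of the two reference annotations, so every annotation it produces is correct but half of the reference is missed. A comparison of this kind is one simple way in which a manually checked reference could guide the improvement of automatic annotation.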
As part of GeMTeX, the team at the Institute of General Medicine at the LMU Clinic is working on a sub-project dealing with the annotation of adverse drug reactions.
https://www.smith.care/de/gemtex_mii/ueber-gemtex/
https://www.smith.care/wp-content/uploads/2024/03/GeMTeX_Faktenblatt_DE_RGB.pdf
-
No publications yet.