Winning first place at CLEF eHealth 2020, an international conference on medical AI

January 29, 2021


Medical institutions around the world handle a huge number of medical documents every day, and assigning diagnosis and treatment classification codes to these documents has become a major burden. CLEF eHealth, an international conference on medical AI, holds a competition to improve AI technology for medical text. This year, the challenge was to automatically assign World Health Organisation (WHO) International Classification of Diseases (ICD) codes to clinical texts written in Spanish. We entered the competition with our own natural language processing AI technology and achieved the best score in the explainable-coding subtask among the 22 participating organisations (universities, companies and research institutes) from 11 countries. This led to an invited lecture at the conference in September and an award from the academic society in November.

Researchers Nuria Garcia and Kendrick Cetina, first-prize winners

Overview of the Competition

In medical institutions around the world, doctors keep text records of their daily practice, and in many countries and regions they are required to assign international classification codes for diagnoses and treatments when storing those records. Assigning these codes is very time-consuming, because medical staff must read through and understand large quantities of text, so there are high expectations for AI support.

CLEF eHealth is a conference on AI for medical texts, and this year's topic was automatic code assignment for Spanish medical texts. The 10th revision of the International Classification of Diseases (ICD-10), maintained by the World Health Organisation (WHO), is the world standard for disease coding, and the task was to assign codes from its Spanish version, CIE-10. Each participating organisation built an AI system that automatically assigns classification codes to a shared dataset provided by the organisers, and the systems competed on accuracy. The task consisted of three subtasks: 1) assigning diagnosis codes, 2) assigning procedure codes, and 3) assigning diagnosis and procedure codes together with explainable references to the supporting text.
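As a rough illustration of how coding accuracy can be measured, the sketch below scores predicted code sets against gold code sets using micro-averaged precision, recall and F1. This is an assumption made for illustration, not the competition's official evaluation: the function name `micro_prf` and the sample documents and codes (I10, E11, J45) are hypothetical. The explainable subtask would also need to check the referenced text, which this sketch leaves out.

```python
# A minimal sketch of set-based scoring for multi-label clinical coding.
# Assumption: each document's gold labels and predictions are sets of code
# strings, scored with micro-averaged precision/recall/F1. The competition's
# official metrics may differ; this is for illustration only.

def micro_prf(gold: dict[str, set[str]], pred: dict[str, set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all documents."""
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)  # codes predicted and correct
        fp += len(p - g)  # codes predicted but wrong
        fn += len(g - p)  # correct codes that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold and predicted codes for two documents.
gold = {"doc1": {"I10", "E11"}, "doc2": {"J45"}}
pred = {"doc1": {"I10"}, "doc2": {"J45", "E11"}}
print(micro_prf(gold, pred))
```

In this sample, two of the three predicted codes are correct and two of the three gold codes are found, so precision, recall and F1 all come out to 2/3.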

Among the 22 universities, companies and research institutes from 11 countries that took part, we entered all three subtasks. We devised a unique approach that integrates rule-based and machine-learning-based methods to perform high-precision clinical coding without manual work. Applying this method to the task data provided by the organisers, we achieved the highest score in subtask 3, taking first place, and took second place in subtasks 1 and 2. In addition to reporting these results at the conference, held from 22 to 25 September, we were invited as one of the best-performing teams to give a lecture at a special session of the conference (see Figure 1).

Figure 1 Invited lecture at CLEF 2020 (lecture video)

Features of the technology

Previous studies of automated clinical coding have taken two main approaches: rule-based and machine-learning-based. Rule-based approaches can apply detailed domain knowledge about clinical codes, but constructing the rules takes domain experts a great deal of time and effort. Machine-learning approaches, on the other hand, can greatly reduce manual work, but they suffer from a scarcity of training data: a model that must choose the right code from 10,000 candidates needs a large amount of labelled data, and preparing such a large collection of annotated clinical text is impractical.
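To make the rule-based side concrete, here is a minimal sketch of dictionary lookup, assuming a hand-written table that maps Spanish trigger phrases to ICD-10 codes. The `RULES` table and the `rule_based_codes` function are hypothetical; a real rule base would need many thousands of expert-curated entries, which is exactly the cost described above.

```python
import re

# Hypothetical hand-written rules: Spanish trigger phrase -> ICD-10 code.
# A real rule base would need thousands of expert-written entries.
RULES = {
    "hipertensión arterial": "I10",
    "diabetes mellitus tipo 2": "E11",
    "asma": "J45",
}

def rule_based_codes(text: str, rules: dict[str, str] = RULES) -> set[str]:
    """Return the codes whose trigger phrase appears as a whole phrase in the text."""
    lowered = text.lower()
    found = set()
    for term, code in rules.items():
        # Word boundaries stop "asma" from matching inside words such as "plasma".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(code)
    return found

note = "Paciente con hipertensión arterial y diabetes mellitus tipo 2."
print(sorted(rule_based_codes(note)))  # ['E11', 'I10']
```

The rules are precise when they fire but brittle: a note that writes "HTA", a common Spanish abbreviation for arterial hypertension, would be missed until someone adds a rule for it.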

For this workshop, we developed a unique method that integrates rule-based and machine-learning-based methods, used it to perform high-precision clinical coding without manual labour, and demonstrated its effectiveness. The method automatically extracts technical terms from clinical texts and matches them against a knowledge base of clinical codes, which is itself constructed automatically using natural language processing. Machine learning is used only for term extraction, a task to which it is well suited, rather than for directly classifying each input text into one of many codes, as most machine-learning approaches have done. Circumventing that difficult direct classification is what makes the process both automatic and highly precise.
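The extract-then-match idea can be sketched as follows, under deliberately simplified assumptions: the trained term extractor is replaced by a caller-supplied function, the code knowledge base is a small dictionary of code descriptions (illustrative paraphrases, not official CIE-10 wording), and `difflib` string similarity stands in for the system's own distance calculation, which is not described here. All names (`assign_codes`, `CODE_DESCRIPTIONS`, `stub_extractor`, the 0.6 threshold) are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical code knowledge base: code -> description (illustrative paraphrases).
CODE_DESCRIPTIONS = {
    "I10": "hipertensión esencial (primaria)",
    "E11": "diabetes mellitus tipo 2",
    "J45": "asma",
}

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the system's own distance."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def assign_codes(
    text: str,
    extract_terms: Callable[[str], list[str]],
    descriptions: dict[str, str] = CODE_DESCRIPTIONS,
    threshold: float = 0.6,
) -> dict[str, str]:
    """Stage 1: extract terms. Stage 2: map each term to its most similar code description."""
    assigned: dict[str, str] = {}
    if not descriptions:
        return assigned
    for term in extract_terms(text):
        # Compare the term with every description and keep the closest one.
        code, desc = max(descriptions.items(), key=lambda kv: similarity(term, kv[1]))
        # Drop terms that are not close enough to any known code.
        if similarity(term, desc) >= threshold:
            assigned[term] = code
    return assigned

def stub_extractor(text: str) -> list[str]:
    """Hypothetical stand-in for the fine-tuned extractor: finds terms from a fixed list."""
    candidates = ["hipertensión esencial", "diabetes tipo 2", "fractura de fémur"]
    return [t for t in candidates if t in text.lower()]

note = "Antecedentes de hipertensión esencial, diabetes tipo 2 y fractura de fémur."
print(assign_codes(note, stub_extractor))
# {'hipertensión esencial': 'I10', 'diabetes tipo 2': 'E11'}
```

In this sketch, "fractura de fémur" is dropped because no description in the tiny knowledge base is close enough, and supporting a new code only requires adding a description entry rather than new labelled training texts.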

We will continue to enhance this technology as part of our AI technologies for the healthcare domain and promote its real-world deployment through practical trials.

Appendix

Figure 2 shows the system architecture used in the competition. "CIE-10" is the Spanish version of the International Classification of Diseases, 10th revision (ICD-10), and the task is to assign CIE-10 codes to input clinical texts. In the preparatory training stage, CIE-10 is converted into a machine-processable knowledge graph (KG) of diseases. Next, a term extraction model is trained by fine-tuning a pre-trained BERT model on the workshop's training data, which consists of clinical texts paired with their correct codes. In the inference stage, terms are extracted from the input clinical texts, and classification codes are then assigned by matching those terms against the KG using a distance calculation method developed for the system. In addition, Fujitsu applied its own text augmentation technology to expand the training data for term extraction and so overcome the data scarcity problem.
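The knowledge-graph step can be illustrated with a small sketch that links each code to its parent in the code hierarchy and measures how far apart two codes are in that tree. This is an assumption made for illustration only: the parent rule (drop the last character after the dot), the sample codes (which follow the ICD-10-CM pattern of extra characters after the dot), the paraphrased descriptions and the tree distance are all hypothetical stand-ins, not the KG construction or distance method used in the actual system.

```python
from typing import Optional

# Illustrative sample: code -> description (paraphrased, not official wording).
DESCRIPTIONS = {
    "E11": "diabetes mellitus tipo 2",
    "E11.6": "diabetes mellitus tipo 2 con otras complicaciones especificadas",
    "E11.65": "diabetes mellitus tipo 2 con hiperglucemia",
    "E11.9": "diabetes mellitus tipo 2 sin complicaciones",
    "I10": "hipertensión esencial (primaria)",
}

def parent_code(code: str) -> Optional[str]:
    """Hypothetical hierarchy rule: 'E11.65' -> 'E11.6' -> 'E11' -> None."""
    if "." not in code:
        return None
    head, tail = code.split(".", 1)
    return head if len(tail) == 1 else f"{head}.{tail[:-1]}"

def build_kg(descriptions: dict[str, str]) -> dict[str, dict]:
    """Create one node per code with its description, parent and children."""
    kg = {
        code: {"description": desc, "parent": parent_code(code), "children": []}
        for code, desc in descriptions.items()
    }
    for code, node in kg.items():
        if node["parent"] in kg:  # only link parents that exist in the graph
            kg[node["parent"]]["children"].append(code)
    return kg

def ancestors(kg: dict[str, dict], code: str) -> list[str]:
    """Walk parent links upward, nearest ancestor first."""
    chain = []
    parent = kg[code]["parent"]
    while parent in kg:
        chain.append(parent)
        parent = kg[parent]["parent"]
    return chain

def tree_distance(kg: dict[str, dict], a: str, b: str) -> Optional[int]:
    """Number of edges between two codes via their lowest common ancestor, or None."""
    path_a = [a] + ancestors(kg, a)
    path_b = [b] + ancestors(kg, b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            return steps_a + path_b.index(node)
    return None

kg = build_kg(DESCRIPTIONS)
print(tree_distance(kg, "E11.65", "E11.9"))  # 3
print(tree_distance(kg, "E11.65", "I10"))    # None
```

A graph like this lets a matcher treat a prediction of a sibling or parent code as a near miss rather than a complete error, which a flat list of codes cannot express.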

Figure 2 System architecture

Contact Information

contact-clef2020@cs.jp.fujitsu.com

Please note that residents of the European Economic Area (EEA) are asked to contact us at the following address instead.
Ask Fujitsu
Tel: +44-12-3579-7711
http://www.fujitsu.com/uk/contact/index.html

Fujitsu, London Office
Address: 22 Baker Street
London, United Kingdom
W1U 3BW
