Talk title: Image Understanding of Figures in Biomedical Literature
Dong Xu is the Shumaker Endowed Professor in the Department of Electrical Engineering and Computer Science and Director of the Information Technology Program, with appointments in the Christopher S. Bond Life Sciences Center and the Informatics Institute at the University of Missouri-Columbia. He obtained his PhD from the University of Illinois, Urbana-Champaign in 1995 and did two years of postdoctoral work at the US National Cancer Institute. He was a Staff Scientist at Oak Ridge National Laboratory until 2003, when he joined the University of Missouri, where he served as Department Chair of Computer Science from 2007 to 2016. His research is in computational biology and bioinformatics, including machine-learning applications in bioinformatics, protein structure prediction, post-translational modification prediction, high-throughput biological data analysis, in silico studies of plants, microbes, and cancers, biological information systems, and mobile app development for healthcare. He has published more than 300 papers. He was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2015.
Figures in the scientific literature contain rich information. For example, many new molecular mechanisms in genomics, pharmacogenomics, immunology, and other fields are reflected in pathway figures and need to be curated for various applications, especially in precision medicine. Current manual curation approaches cannot keep up with the pace of biomedical literature growth. Compared with textual descriptions, pathway figures in biomedical literature often represent the mechanisms more directly. However, no systematic method exists for curating pathway figures in publications. Here, we propose a pathway curation pipeline that integrates a deep learning model with an optical character recognition method and an image processing strategy to capture the locations, names, and interactions of pathway entities in a figure. Our pipeline was evaluated on figures from PubMed publications. The results demonstrate that our model can effectively retrieve molecular entities and their interactions from pathway figures at large scale. The proposed pipeline offers an alternative to text-mining approaches in biological literature mining. In future work, we will combine our method with text-mining tools to enrich the extracted information and reconstruct pathway mechanisms more fully.
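To make the last stage of such a pipeline concrete, here is a minimal sketch of how entity locations and names might be combined with detected arrows to yield interactions. It is not the speaker's method. It assumes that an upstream detector and OCR step have already produced named bounding boxes and arrow endpoints, and it pairs each arrow with the nearest boxes by simple geometry. The names `Entity`, `Arrow`, `pair_interactions`, and `max_gap`, as well as the coordinates and gene names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Entity:
    name: str   # text an OCR step is assumed to have read inside the box
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Arrow:
    tail: tuple  # (x, y) where the arrow starts
    head: tuple  # (x, y) where the arrow points
    kind: str    # relation label, e.g. "activate" or "inhibit"


def distance_to_box(point, box):
    """Euclidean distance from a point to a box (0 if the point is inside)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    dx = max(x_min - x, 0, x - x_max)
    dy = max(y_min - y, 0, y - y_max)
    return hypot(dx, dy)


def pair_interactions(entities, arrows, max_gap=30.0):
    """Attach each arrow to its nearest source and target entity.

    max_gap is a hypothetical pixel threshold: arrows whose ends are
    farther than this from every entity are dropped as unmatched.
    """
    if not entities:
        return []
    interactions = []
    for arrow in arrows:
        source = min(entities, key=lambda e: distance_to_box(arrow.tail, e.box))
        target = min(entities, key=lambda e: distance_to_box(arrow.head, e.box))
        too_far = (
            distance_to_box(arrow.tail, source.box) > max_gap
            or distance_to_box(arrow.head, target.box) > max_gap
        )
        if source is target or too_far:
            continue  # skip self-loops and arrows not near any entity
        interactions.append((source.name, arrow.kind, target.name))
    return interactions


# Illustrative input only; these boxes and arrows are made up.
example_entities = [
    Entity("EGFR", (10, 10, 60, 30)),
    Entity("RAS", (120, 10, 170, 30)),
    Entity("RAF", (120, 80, 170, 100)),
]
example_arrows = [
    Arrow(tail=(62, 20), head=(118, 20), kind="activate"),
    Arrow(tail=(145, 32), head=(145, 78), kind="activate"),
    Arrow(tail=(300, 300), head=(350, 300), kind="inhibit"),  # stray arrow
]

for triple in pair_interactions(example_entities, example_arrows):
    print(triple)
```

Nearest-box matching is only one plausible heuristic. A real system would also need to handle curved or branching arrows, overlapping boxes, and OCR errors in entity names.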