Angus Roberts (Life Science Lead for GATE, Sheffield University) and Dr. Robert Stewart (Clinical Informatics Lead, NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust & King’s College London) spoke about extracting structure from free text using open source toolkits, in particular UIMA and GATE (see this article on combining UIMA and GATE). Of note, training is available for the latter. As the GATE guys put it: “coaxing state-of-the-art performance (accuracy, speed) from these tools is still a fine art, and likely to remain so”, so a little training might come in handy.
As they talked about the language engineering skills required to make effective use of these tools, and about the creation of pattern suites, the question that formed for us was: “Are you going to make the pattern suites you created for clinical data mining publicly available in pattern libraries?” We’ll get back to you when we have an answer. The abstract for their talk follows:
The South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLAM BRC) hosts Europe's largest psychiatric case register. This is updated daily from live electronic medical records, and made available to approved researchers via a clinical records interactive search system (CRIS): a web search interface, and other search tools. While the structured records in the case register are of great value, an estimated 80% of the value of the data lies in free text entries made by clinicians in day-to-day practice.
With over 180,000 records, automated information extraction or text mining is essential if we are to make use of this. GATE, a ‘General Architecture for Text Engineering’, has been integrated with CRIS, to give a text mining capability. Several GATE applications have already been built to extract specific variables from free text. Output from these applications is being used successfully in a number of research projects, and a number of new applications are currently under development. There is an increasing understanding of how to maximise the benefits of GATE given the particular characteristics of the data.
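To give a flavour of what "extracting specific variables from free text" can involve, here is a minimal Python sketch of a tiny pattern suite. It is not GATE and not the CRIS applications described in the abstract. The variable names, the regular expressions and the sample note are all invented for illustration, and real clinical text would need far more robust patterns (negation, misspellings, context).

```python
import re

# Hypothetical pattern suite: each entry maps a variable name to a regex
# with one capture group holding the value we want. These patterns are
# illustrative assumptions, not taken from GATE or CRIS.
PATTERNS = {
    # A score written as "<n>/30" shortly after the token "MMSE".
    "mmse_score": re.compile(r"\bMMSE\b\D{0,20}?(\d{1,2})\s*/\s*30", re.IGNORECASE),
    # Smoking status; longer alternatives come first so "ex-smoker" wins.
    "smoker": re.compile(r"\b(non-smoker|ex-smoker|smoker)\b", re.IGNORECASE),
}


def extract_variables(note):
    """Return a dict of variable name -> first matched value in the note."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match:
            found[name] = match.group(1)
    return found


# Made-up example note, not real patient data.
note = "Seen in clinic today. MMSE score was 24/30. Patient is an ex-smoker."
print(extract_variables(note))
```

Even in a toy like this, the structured output is only as good as the patterns behind it, which is why the "fine art" remark above rings true.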