Angus Roberts (Life Science Lead for GATE, Sheffield University) & Dr. Robert
Stewart (Clinical Informatics Lead, NIHR Biomedical Research Centre at South
London and Maudsley NHS Foundation Trust & King’s College London) spoke
about extracting structure from free text and using open source toolkits to do
it, in particular UIMA and GATE (see this article on combining UIMA and GATE). Of note, training
is available for the latter. As the GATE guys put it: “coaxing
state-of-the-art performance (accuracy, speed) from these tools is still a fine
art, and likely to remain so”, so a little training might come in handy.
In talking about the language
engineering skills required to make effective use of these tools and the
creation of pattern suites, the question which formed for us was “are you going
to make the pattern suites you created for clinical data mining publicly
available in pattern libraries?” We’ll get back to you when we have an answer. The
abstract for their talk follows:
“The South London and Maudsley NHS
Foundation Trust Biomedical Research Centre (SLAM BRC) hosts Europe's largest
psychiatric case register. This is updated daily from live electronic medical
records, and made available to approved researchers via a clinical records
interactive search system (CRIS): a web search interface, and other search
tools. While the structured records in the case register are of great value, an
estimated 80% of the value of the data lies in free text entries made by
clinicians in day-to-day practice.
With over 180,000 records, automated
information extraction or text mining is essential if we are to make use of
this. GATE - a General Architecture for Text Engineering - has been integrated
with CRIS, to give a text mining capability. Several GATE applications have
already been built to extract specific variables from free text. Output from
these applications is being used successfully in a number of research projects,
and a number of new applications are currently under development. There is an
increasing understanding of how to maximise the benefits of GATE given the
particular characteristics of the data.”
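
To give a feel for what "extracting a specific variable from free text" can look like, here is a minimal sketch in plain Python. It is not GATE, not JAPE, and not one of the SLAM BRC applications. The variable (an MMSE score), the pattern, the function name `extract_mmse_scores` and the sample note are all our own assumptions, chosen only for illustration.

```python
import re
from typing import List

# Hypothetical pattern (our assumption, not from the talk): matches phrases
# such as "MMSE 24/30", "MMSE: 27" or "MMSE score of 18".
MMSE_PATTERN = re.compile(
    r"\bMMSE\b(?:\s+score)?(?:\s*(?:of|was|is|=|:))?\s*(\d{1,2})(?:\s*/\s*30)?",
    re.IGNORECASE,
)


def extract_mmse_scores(note: str) -> List[int]:
    """Return every plausible MMSE score (0 to 30) mentioned in a note."""
    scores = []
    for match in MMSE_PATTERN.finditer(note):
        value = int(match.group(1))
        # Discard numbers outside the valid range of the scale.
        if 0 <= value <= 30:
            scores.append(value)
    return scores


if __name__ == "__main__":
    # Invented sample text, not real clinical data.
    note = "Seen in clinic. MMSE 24/30 today; previous MMSE score of 18."
    print(extract_mmse_scores(note))
```

Even this toy version hints at why tuning is a "fine art": a single pattern misses negations, misspellings and the many other ways a clinician might record the same score, which is where a maintained pattern suite earns its keep.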