We have submitted a response from the perspective of those who have worked with data supporting basic and translational research. We will also be posting a link later to the NCIN and Department of Health publication from December last year, An Intelligence Framework for Cancer, but for now, here is how we replied to the consultation:
The UK is currently supporting or considering the development of several initiatives seeking to promote the skills and infrastructure necessary to carry out health research based on linked large-scale or population-level datasets generated through routine processes of data collection.
In Wales, the Health Information Research Unit at Swansea University maintains the Secure Anonymised Information Linkage (SAIL) system; in Scotland, the Scottish Health Informatics Programme (SHIP) supports the “collation, management, dissemination and research analysis of anonymised Electronic Patient Records”; and at UK level, the Research Capability Programme of the National Institute for Health Research has piloted a Health Research Support Service, which is due to be formally implemented as a full service: the Clinical Practice Research Datalink. The Medical Research Council has also recently issued a call for e-Health Informatics Research Centres to “maximise the health research potential offered by linking electronic health records with other forms of routinely collected data and research datasets”.
However, of the National Cancer Research Institute (NCRI) partners’ 2010 funding, over 50% was spent on research that could be described as basic, translational or early-stage, with 40% on Biology and Aetiology alone. Some proportion of the spending on discovery and development under Common Scientific Outcome (CSO) 5 (Treatment), on technology development and evaluation under CSO 4 (Early Detection, Diagnosis and Prognosis), and under CSO 7 (Scientific Model Systems) can also be ascribed to these types of research.
The infrastructure used to support this work is in many cases intra-institutional, and in some inter-institutional, rather than national, although with appropriate standardisation, integrated datasets from within institutions could be submitted to national-scale repositories with greater ease. Completeness, accuracy and granularity of the data are vital for this research. Often the data that support and contextualise laboratory observations during these research projects are drawn from multiple hospital systems and collated with difficulty, which affects both timescales and the validation of observations. Some proportion of the NCRI partners’ spend in each CSO is dedicated to Resources and Infrastructure (R&I), which may include informatics. However, if we look at the other project types falling under R&I for, e.g., CSO 4 (Early Detection, Diagnosis and Prognosis), we may reasonably conclude that informatics does not account for the majority of this spend; closer analysis of the NCRI CaRD database would be required to confirm this.
CSO 4.4 examples of science that would fit:
· Informatics and informatics networks; for example, patient databanks
· Specimen resources (serum, tissue, images, etc.)
· Clinical trials infrastructure
· Epidemiological resources pertaining to risk assessment, detection, diagnosis, or prognosis
· Statistical methodology or biostatistical methods
· Centers, consortia, and/or networks
· Education and training of investigators at all levels (including clinicians), such as participation in training workshops, advanced research technique courses, and Master's course attendance. This does not include longer term research based training, such as Ph.D. or post-doctoral fellowships
The MRC, in its call for e-Health Informatics Research Centres, adduces the key findings of the ABPI and UK research funders’ mapping exercise reviewing UK capability in e-Health records research. A number of these can be applied to the intra-institutional situation:
· There is a shortage of people with the breadth of skills necessary to carry out the complex linkage and analyses required in health informatics research.
· There is an absence of career structure in enabling roles such as data managers, software engineers, informaticians and data analysts.
· There are no clear interfaces between researchers and industry, policy makers or the NHS and there is no ready means for sharing best practice.
Certainly, my own experience of supporting institutes, even those with strong research reputations, is that they lack the skills, focus and confidence in informatics to make much progress in developing their infrastructure, and that they have been extremely glad of the opportunity to take advice and receive support from experienced individuals with a background in both research and informatics.
Perhaps the NCIN could consider devoting some resource to skills development in this area, disseminating its accumulated expertise and knowledge of best practice in data management and handling and in the use of technology. Might this sit alongside the work currently envisaged by Proposal 6 of the consultation?
During a meeting with Oracle at the end of last year, an ex-colleague who specialises in molecular and gynaecological oncology suggested that their institute would not be seeking data integration services and infrastructure supply from the likes of Oracle with such urgency if they felt they could get ‘stage and grade’ at diagnosis from the Thames Cancer Registry.
The paucity of staging data in the registries is an established weakness, as discussed in the NCIN and Department of Health document An Intelligence Framework for Cancer, and steps are being taken to address it. In the case of this professor, however, the perception that the dataset collected by the Thames Cancer Registry was inadequate (and, by extension, the amalgamated registries’ dataset, despite shining examples such as ECRIC) extended unfairly beyond the known weaknesses to the dataset as a whole. Such perceptions were not uncommon at that centre and need to be overturned.
The vastly extended dataset that the registries will collect in future sounds extremely promising in its potential to support not only epidemiological and population-level research but also basic and small-scale clinical research. It will be vital, however, to create a sustained ‘sales’ initiative to establish a new level of confidence in the data among those parts of the research community that have not hitherto engaged with these datasets because of the concerns described in An Intelligence Framework. Their concern may be this: if a smaller dataset was found wanting, will collecting a larger one not push already stretched resources beyond their elastic limit?
Having seen first-hand the way in which MDT data is fed into the Somerset system, and the ample opportunities it presents for error (opportunities often taken by overburdened MDT co-ordinators), I find it inspiring that a truly modern approach to data extraction and aggregation is being implemented, as described by Dr. Rashbass at, to give one instance, the NOCRI Information Systems Workshop. According to Dr. Rashbass, various technologies, including natural language querying, will take data from full-text pathology reports, local imaging systems and myriad other systems to create the amalgamated national dataset, and this data will be quality controlled and assured. More information on how this quality control and assurance will be achieved would be welcome.
Similar initiatives and technologies are being employed by healthcare delivery and research organisations themselves: for example, the ORIS oncology platform being implemented intra-organisationally by King’s Health Partners, and the Acropolis platform being implemented inter-organisationally. It is important to note that these implementations may be beyond the budget of smaller organisations that deliver oncology services and conduct research, and here a new ‘high-resolution’, quality-assured, timely dataset such as that envisaged by the registry modernisation team has the potential to deliver enormous benefit.
But this will depend on the quality of the data, and on ensuring that this quality is recognised in the research community. “This service will ensure that common standards and working practices are applied to data extraction, linkage and quality assurance to both national feeds and a range of local sources.” This assertion needs to be backed up by a strong communications and ‘marketing’ effort.
To this end, should the NCIN devote some resource to supporting activities at the provider end of the process? This might involve:
· ensuring that, where providers are implementing their own data infrastructures, these can interface with the unified registries and supply bulk data to the appropriate standard;
· where providers are not yet capable of developing their own infrastructures, supporting them in providing accurate and complete data to the registries, and potentially in designing their own data architectures and integration solutions; and
· communicating this work to improve registry data effectively to the research community, concentrating not on the sophisticated use of technology to capture and amalgamate data, but on the procedural changes being implemented to assure quality.
Many of the proposals made in the consultation document might be realised by the same infrastructural components, and many of these components are similar to those which will hopefully be implemented by the Clinical Practice Research Datalink (CPRD). Where respondents to the consultation indicate that the proposed data linkage and notification services would deliver great benefit to their work, it may be worth establishing how aware they are of the CPRD. The concern is that creating facilities which duplicate aspects of the CPRD could carry an enormous overhead, given the proportion of the CPRD’s budget devoted to infrastructure. Might there be a greater return on investment in focusing on data rather than infrastructure at the national level?
In conclusion, it might be worth considering whether Proposal 6 (a research support service advising on the availability of and access to data) could be expanded to include work on supporting data quality and intra-institutional infrastructure development, and on engaging the basic and translational research communities to overturn perceptions about the dataset’s ability to support their work.
Let us know what you think: are we way off-beam?