In BMJ Health & Care Informatics
BACKGROUND : Data, particularly 'big' data, are increasingly being used for research in health. Making optimal use of data from electronic medical records requires coded data, but not all systems produce coded data.
OBJECTIVE : To design a suitable, accurate method for converting large volumes of narrative diagnoses from Australian general practice records into SNOMED CT-AU codes. Such codification will make the diagnoses clinically useful for aggregation for population health and research purposes.
METHOD : The developed method consisted of using natural language processing to automatically code the texts, followed by a manual process to correct codes and subsequent natural language processing re-computation. These steps were repeated for four iterations until 95% of the records were coded. The coded data were then aggregated into classes considered to be useful for population health analytics.
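The iterative loop described in the method can be illustrated with a minimal sketch. It is not the authors' pipeline: the dictionary lookup stands in for the natural language processing step, the `review` callback stands in for the manual correction step, and all function names and code values (for example `CODE_ASTHMA`) are hypothetical placeholders rather than real SNOMED CT-AU identifiers. Only the 95% target and the cap of four iterations are taken from the abstract.

```python
from typing import Callable, Dict, List, Optional, Tuple


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def auto_code(texts: List[str], lexicon: Dict[str, str]) -> List[Optional[str]]:
    """Stand-in for the NLP step: look each narrative up in the lexicon.

    None means the narrative could not be coded.
    """
    return [lexicon.get(normalise(t)) for t in texts]


def coverage(codes: List[Optional[str]]) -> float:
    """Fraction of records that received a code."""
    return sum(c is not None for c in codes) / len(codes) if codes else 0.0


def iterative_coding(
    texts: List[str],
    lexicon: Dict[str, str],
    review: Callable[[List[str]], Dict[str, str]],
    target: float = 0.95,   # coverage threshold reported in the abstract
    max_rounds: int = 4,    # number of iterations reported in the abstract
) -> Tuple[List[Optional[str]], Dict[str, str]]:
    """Auto-code, collect manual corrections, then recompute, repeatedly."""
    lexicon = dict(lexicon)  # work on a copy; the caller's lexicon is untouched
    codes = auto_code(texts, lexicon)
    for _ in range(max_rounds):
        if coverage(codes) >= target:
            break
        # Distinct uncoded narratives are handed to the (hypothetical) reviewer.
        uncoded = sorted({normalise(t) for t, c in zip(texts, codes) if c is None})
        lexicon.update(review(uncoded))
        # Recompute the automatic coding with the corrected lexicon.
        codes = auto_code(texts, lexicon)
    return codes, lexicon
```

In this sketch each round of manual review enlarges the lexicon, so the next automatic pass codes every record that shares a corrected narrative, which is why coverage can climb quickly over a small number of iterations.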
RESULTS : Coding the data effectively covered 95% of the corpus. Problems with the use of SNOMED CT-AU (SCT) were identified, and protocols for consistent coding were developed. These protocols can be used to guide further development of SCT. The coded values will be immensely useful for the development of population health analytics for Australia, and the lessons learnt are applicable elsewhere.
Pearce Christopher, McLeod Adam, Patrick Jon, Ferrigi Jason, Bainbridge Michael, Rinehart Natalie, Fragkoudi Anna
information management, information science