
Big Data Is Not Enough: Lessons from the History of Public Health

2013 marks the 200th anniversary of the birth of John Snow.  The London School of Hygiene & Tropical Medicine celebrated with an exhibition titled “Cartographies of Life & Death: John Snow & Disease Mapping.”  Snow is widely cited in histories of both epidemiology and visualization, both for helping to stem the 1854 London cholera epidemic and for using a map overlaid with a Voronoi diagram to support his theory that cholera was spread by a waterborne germ.

It’s a wonderful story, and still highly relevant today.  In fact, The Guardian recently republished the data over a map of modern London using modern visualization tools, and it doesn’t look very different.  What is especially striking about the story, though, is that while Snow was a brilliant pioneer who got it right, he might not be considered the better data scientist when compared with one of his contemporaries, William Farr.

Farr was the Statistical Superintendent of the General Register Office in England.  His work on disease classification contributed heavily to the development of the International Classification of Diseases used today.  He was also an impressive example of the early use of “big data,” albeit manually, as he attempted to understand the factors contributing to cholera outbreaks.  Farr produced a study of the 1848-49 epidemic that relied heavily on tables, charts and maps – that is, visualization – to present his analysis of the outbreak.  His analysis reportedly covered demographic, social and environmental factors including age, sex, weather statistics, day of the week, living conditions, property values and geography.  He used clinical data mining to search for predictive factors correlating with disease, in this case cholera.

The really interesting thing is that Farr found his correlation and presented it visually.

His data showed a clear relationship between elevation and cholera mortality.  So why do we remember Snow and his map rather than Farr and his elevation chart?

Because for all the careful data science, Farr got it wrong.  Snow, who considered far fewer factors in his analysis and published much less detailed research, got it right.  Farr’s data wasn’t wrong, just the conclusion he drew from it.  You see, he believed cholera was caused by miasma, or bad air, and that the closer a patient lived to sea level, the more the vapor from polluted water exposed him or her to that miasma.  Snow believed the causative factor was an as-yet-unknown germ (Pasteur had not yet propounded germ theory) transmitted through water, and he looked for evidence to back his theory.

Farr later came to agree with Snow, and both men made enormous contributions to epidemiology.  But what can we learn from them about big data and the use of clinical data mining to help us understand how to optimize care processes?

  • Visualization is enormously effective in conveying results, but it may not tell the whole story.
  • The presuppositions we bring to our analysis are critical.  Snow actually went out and found the data necessary to support his hypothesis, and his contemporaries believed he was too dismissive of contrary evidence.  Farr cast a much wider net, and asked better questions, but allowed his theory to prejudice the interpretation.
  • You’ve got to take the insights from the data, test them further, and make sure they stand up to reality.
  • The data can’t do the job alone.  It takes human insight and expertise, and probably a spark of brilliance, to get to valid answers.
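
As a rough illustration of the third point, here is a minimal Python sketch of one way to test an insight like Farr's: check whether the correlation survives once a competing explanation is held fixed.  Everything in it is an assumption made for illustration.  The district figures are invented toy numbers, not Farr's or Snow's data, and the `pearson` and `correlation` helpers are hypothetical names, not part of any product or library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL toy data, invented for this sketch (not historical records):
# (elevation, water source, cholera deaths per 10,000 residents).
# Low-lying districts happen to be the ones drawing contaminated water.
districts = [
    (0, "contaminated", 100),
    (5, "contaminated", 96),
    (10, "contaminated", 102),
    (15, "contaminated", 98),
    (40, "clean", 11),
    (50, "clean", 7),
    (60, "clean", 13),
    (70, "clean", 9),
]


def correlation(rows):
    """Correlation between elevation and mortality for the given rows."""
    return pearson([r[0] for r in rows], [r[2] for r in rows])


# Farr-style view: every district pooled together.
overall = correlation(districts)

# Follow-up test: hold the water source fixed and look again.
within = {
    source: correlation([r for r in districts if r[1] == source])
    for source in ("contaminated", "clean")
}

print(f"all districts: r = {overall:.2f}")
for source, r in within.items():
    print(f"{source} water only: r = {r:.2f}")
```

With these made-up numbers, the pooled correlation between elevation and mortality is strongly negative (roughly -0.94), yet it vanishes within each water-source group.  The chart would look convincing either way; only the follow-up test separates the two explanations, and someone still has to think of water source as the thing to test.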

When I speak with people about Recommind’s big data and machine learning technology, they sometimes want to simply unleash the software on their data, expecting it to find the answers – be they care process optimizations, patient safety improvements, or better outcomes – all by itself.  In practice, the technology can bring information to the surface and present it in new ways.  It is up to people to make that information useful.



The post Big Data Is Not Enough: Lessons from the History of Public Health appeared first on Recommind.

