Distinguished Lecture in Precision Medicine: Jeannette Wing, PhD - At the Intersection of Health and Data

Jeannette Wing, Director of the Data Science Institute and Professor of Computer Science at Columbia University, delivered a Distinguished Lecture as part of the ongoing Columbia Precision Medicine series.

Benjamin Grady Young
June 19, 2018

Dr. Wing’s presentation began with the basics of ‘what is data science?’, moved through her philosophy and mission as the newly appointed Director of the Data Science Institute (DSI), surveyed some of the most interesting data science research projects currently underway at Columbia, and closed with her thoughts on the future of data science and why, despite the sometimes gross misuse of data, there is cause for optimism.

By keeping good ethical practice, embodied in her motto “Data for Good”, at the center of all her initiatives, Dr. Wing is working to hold the data science community to a higher standard. Fairness, Accountability, Transparency, and Ethics (FATE) are the pillars of good data science research. To this list Dr. Wing added Safety and Security, creating the modified acronym FATES, and emphasized the importance of safeguarding personal data at a time when large-scale data breaches regularly make headlines. Even when forecasting the privacy pitfalls the global data science community will face in the 21st century, she remained optimistic.

Columbia University is quickly becoming a leader in innovative health and data science research.

The Danino Lab, headed by Dr. Tal Danino, has used deep sequencing and data analysis to characterize the microbiome surrounding pancreatic cancer cells, and is developing a combined antibiotic-chemotherapeutic treatment to reach these notoriously elusive tumors.

Microarray and sequencing studies of heterogeneous tumors have taught us that no two tumors are alike, and ‘big data’ analytics are starting to yield real-time, personalized treatment recommendations based on a tumor’s genomic makeup.

The Observational Health Data Sciences and Informatics (OHDSI) program[1], housed at Columbia and spearheaded by co-PIs Drs. George Hripcsak and David Madigan, is a “collaboration to bring out the value of health data through large-scale analytics.” The program draws data directly from patient records and has already demonstrated its value by identifying patient populations that consistently do not respond to standard first-line treatment for hypertension, diabetes mellitus, or depression. OHDSI aims to collect 1 billion patient records to analyze for hidden patterns. It draws its patient population from 25 countries, works with 200 researchers, and has already sourced 600 million patient records.

While large datasets are vital for effective data analysis, translating information into a clear story matters just as much. The OHDSI program produces visual models comparing diagnoses with first-, second-, and third-line treatment options, helping physicians see which therapies are effective in which populations and explain options and decisions to patients.

A 2012 survey[2] found that 30% of the world’s data resides in the healthcare industry, including everything from Fitbit™ heart rate records to pediatric surgery prognostics. As the price of genetic sequencing drops and wearable medical devices capable of continuous data collection spread, health technology’s share of data generation will only increase. Wielding this data will require the brightest, most enthusiastic, and most ethically conscious data scientists if we are going to put ‘Data for Good’ into practice.[3]


[1] https://www.ohdsi.org/

[2] https://www.buildingbetterhealthcare.co.uk/technical/article_page/Comment_Health_networks__delivering_the_future_of_healthcare/94931

[3] http://engineering.columbia.edu/web/newsletter/fall_2017/qa_jeannette_wing