By Allison Proffitt
October 12, 2020 | The Bio-IT World Conference & Expo Virtual launched last week with an opening plenary keynote from Drs. Susan Gregurick, Associate Director for Data Science at NIH, and Rebecca Baker, Director of the HEAL Initiative at NIH.
Gregurick opened the first virtual Bio-IT World event by painting a picture of the industry poised on the starting line of FAIR (findable, accessible, interoperable, and reusable) data. Wouldn’t it be wonderful, she said, if we could pull data out of papers directly into our own Jupyter Notebooks or Galaxy, or apply GitHub algorithms?
Of course, that’s rarely possible now. Instead, a large fraction of the data we generate and publish is left on the table, she said. Measuring and quantifying data use and reuse is difficult, Gregurick acknowledged, but she estimated that probably 80% of the data generated are not really reused in an effective way. That’s a shame, she lamented. Experiments are expensive.
Our view must shift, she argued, to seeing metadata as a “love letter to our future selves” and to future researchers, one that makes future research and replication easier. HEAL has done a good job of making metadata super rich and standardized in a way that can be computable, Gregurick pointed out.
She’s heard the arguments against keeping data: “Just rerun the experiment!” many say. But we’re “data packrats,” Gregurick acknowledged, and the data that scientists generate are the tangible product of the work. Data and metadata provide point-in-time snapshots of where science stands today, and publishing that data now informs the future.
At NIH, 12 institutes and centers focusing on imaging, genomics, biophysics, biomolecular simulation, and more are working together to make this process more sustainable. For instance, NIH hosts nearly 37PB of genomics data from the Sequence Read Archive on two clouds, Google Cloud Platform and Amazon Web Services, giving researchers faster access to large datasets so that data can be shared easily and connected to a variety of compute resources.
NIH is working on new ideas as well, Gregurick said, including new infrastructure, new tools to let people participate in their health, new approaches to automated health, and new clinical data interpretation technologies. Some of these are coming soon; Gregurick said to expect new applications in early winter.
At a conference that’s newly virtual, COVID-19 was never far from anyone’s thoughts. NIH is collecting COVID-19 data in BioData Catalyst, All of Us, the COVID Cohort Collaborative, the Alzheimer’s Disease Sequencing Project, and the Medical Imaging and Data Resource Center (MIDRC), Gregurick said. The challenge now is delivering all of those data to researchers to use. NIH is moving toward a connected data platform ecosystem for COVID-19 data, she explained.
NIH is launching the Research Authentication Service (RAS), a single sign-on effort to save researchers time and effort, using standards created by GA4GH and others. In August, NIH deployed a RAS-dbGaP Visa and related services that allow researchers to log in to RAS one time to access any integrated repository and run an analysis for up to 15 days without re-authenticating.
This is an important first step, Gregurick said, but there is much more to be done. NIH will be piloting a system to link data across platforms via an honest broker, both across NIH sites like BioData Catalyst and even between consumer data sites. “This is not new,” Gregurick said, but “it’s something we’ll be piloting across a number of our systems, to really make de-duplication and linkage of data across a federated system a much more feasible endeavor.”
She also highlighted “some really interesting work” from GA4GH, its Federated Analysis Systems Project (FASP), and emphasized that there’s more to be done to enable workbenches and tools across platforms to be used with Jupyter, GitHub, and other resources. “Most difficult is the dynamical and always-fluctuating policy with data access,” she said. “Policy resources for data access are important!”
Finally, she called for knowledge systems that allow users to navigate not just data but knowledge across systems. “All of these make interoperability across data platforms a reality for the future.”
Rebecca Baker, Director of the HEAL Initiative at NIH, is hoping to live in that future. Five years ago, she said, if she had asked her team to make sure all the data they gathered could be re-used, “It would have felt too hard!” Now, though, NIH has many of the foundations in place to bring data sharing and reuse within reach. “The technical challenges feel manageable.”
HEAL’s work focuses on ending opioid misuse, addiction, and overdose. Overdose deaths increased 4.6% in 2019, and the earliest provisional data since the start of the COVID-19 pandemic suggest a 50% year-over-year increase in overdoses in 2020. Opioid misuse and overdose, which was named a public health emergency in 2017, is only getting worse.
Along with opioid misuse are sister crises. Upstream: chronic pain, Baker said. Most opioid addiction starts with a legitimate prescription for pain management, and so addressing the opioid crisis must begin with the pain crisis and how we treat and medicate pain. And downstream: the number of babies born opioid dependent each year. The HEAL Research Programs include $50M a year for prevention research, translational research, clinical trials, and implementation science.
HEAL’s work, therefore, is broad and varied: extended-release addiction medications, alternative pain treatments, clinical trials on how best to nurture babies born with opioid dependence and their mothers, and much more. The bolus of data that HEAL is generating is extremely diverse, Baker said: clinical trial data, behavioral data, genomic data, imaging data, demographic and social determinant data, and much more.
Baker is working on harmonization of data collection across HEAL, beginning with implementing common data elements (CDEs) to facilitate cross-study comparisons. They started with a core set of CDEs: nine pain domains and chronic and acute pain questionnaires. The supplemental set of CDEs is much broader: 375 supplemental questionnaires and 360 additional measures.
It is with these CDEs that the challenges of data re-use arise, Baker said. HEAL wants to make its data FAIR, which includes using unique identifiers, labeling data, using open systems to make data easily retrievable, and being “creative about which parts we make interoperable.” But the cultural challenges still require finesse. Researchers come to studies with a great deal of expertise, Baker said, and many are very set on their study questions and endpoints. We need to find a way to bring them together, she said: marrying their questions with HEAL’s CDEs.
It’s a cultural change, she said, for researchers to run their experiments and ask their own questions, all while NIH encourages adding on additional capabilities so the research can be enriched over time.
HEAL has published a public access and data sharing policy that requires rapid sharing of all underlying data at the time of publication, and it is taking the first steps to launch a Gen3 platform, allowing investigators and data generators to submit their data into data management organizations that can be put into a single platform.
These steps will all empower HEAL to begin asking new kinds of innovative questions, Baker said. For instance, how do data on opioid misuse and addiction overlap with prescription data? Which patients are most likely to develop opioid use disorder? Which are more likely to overdose?
“I think we’ve made progress. We have a long way to go, but to me the future looks bright!” Gregurick said.