NCCS’ First Designated Chief Data Architect Developing a Data Ecosystem at ORNL

May 7, 2021 - When J. “Robert” Michael began his new job at the National Center for Computational Sciences (NCCS) at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) last September, he took on a novel task: ensuring that data is a “first-class citizen” in the scientific research being performed on NCCS supercomputers.

As the NCCS’s first designated chief data architect, Michael supports the development, refinement, and architecture of systems to overcome challenges encountered in data-intensive computing across many NCCS programs.

What does that mean?

Traditionally, scientific research on high-performance computing (HPC) systems has used ab initio (“from first principles”) mathematical calculations to model phenomena and then run simulations to produce results that are compared with experiments. A more recent approach analyzes large data models on a particular subject to extract findings, so data rather than equations is the “ground truth” for the research. Both methods are essential tools in computational science, though they’re often used separately.

“What I’m trying to help facilitate is integrating these two approaches. We have a lot of history solving scientific problems, but I think we’re still building the expertise, the manpower, and the infrastructure to connect these systems,” Michael stated. “So it’s less about convincing people that data needs to be a first-class citizen and more about helping to provide the infrastructure to allow it to be a first-class citizen.”

Robert Michael is the NCCS’ new chief data architect.

One of Michael’s first projects has been leading the Scalable Protected Infrastructure team, which is establishing security protocols for handling sensitive data that Oak Ridge Leadership Computing Facility (OLCF) supercomputers haven’t previously been authorized to analyze effectively at scale, such as patient data protected by the federal law restricting release of medical information. This required coordinating staff from several different departments: HPC Cybersecurity and Information Engineering, HPC Scalable Systems, Scalable Protected Data, and the Information Technology Services Division.

“A lot of what I bring to the table is being able to bring together the policymakers, the program managers, the engineers, and the scientists. I’m able to kind of speak all of those languages,” Michael stated. “And that’s one thing that interested me in this position of chief data architect—it fit all of those areas that I’ve enjoyed in my career, from the computational side of things to the data side of things, as well as leadership and management.”

Michael is applying this cross-team approach to a variety of programs within the NCCS. He is currently the principal investigator for the Clinical Concept Repository project, which seeks to design a scalable framework for searching a library of medical concepts in collaboration with the US Department of Veterans Affairs. He also serves as the biosciences lead for ORNL’s edge computing strategy, which is developing ways to process data right at the scientific instruments as well as on centralized computers. This is closely tied to his involvement with ORNL’s Compute and Data Environment for Science, which helps coordinate compute and data solutions across the lab.

Michael’s education and employment history make him uniquely suited to such interdepartmental collaborations; he has a PhD in computational science with an emphasis in quantum physics and master’s degrees in theoretical mathematics and computer science. At St. Jude Children’s Research Hospital in Memphis, he managed a team of engineers developing bioinformatics software for hybrid HPC and cloud architectures.

Although Michael now lives in Knoxville near the ORNL campus, he still works remotely from his home while the lab continues to deal with the COVID-19 pandemic. This has made team building and project management a bit more challenging, even when everyone’s cameras are turned on during video conferences.

“As the chief data architect, one thing that sets me apart from a data engineer is really the strategy piece of it. I think that’s probably one of the hardest things, especially working in a virtual world. It’s a bit easier to get to know people in person,” Michael stated, “so one of the first things I set out to do was really understand the landscape and really understand what are the projects that are happening, who are the people, where are the needs.”

One overall need that Michael would like to fulfill is the development of a data management ecosystem that makes sure the data itself is tracked. This will become even more important as data science advances along with the continuing increases in computational power, such as the OLCF’s upcoming exascale-class supercomputer, Frontier.

“In my mind, Summit and Frontier and whatever big HPC system comes next are all pieces to this large ecosystem puzzle,” Michael stated. “But I have been less focused on the big HPC systems and more focused on how we integrate all of these disparate systems and do federated computing and federated data analysis. How do we do real-time analytics with very big data right there at the instrument?”

UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.


Source: OLCF
