UNIVERSITY PARK, Pa. — In her 20 years of bee research, Christina Grozinger had not confronted a data management problem quite like the one she encountered in 2020. Grozinger, Publius Vergilius Maro Professor of Entomology at Penn State, studies how to counteract declining bee populations, and her research requires her to acquire, send and analyze data across teams in a precise way.
“There’s tremendous interest in understanding what flowering plants bees preferentially use for collecting nectar to make honey, or pollen to feed to their developing larvae,” said Grozinger, who is also director of Penn State’s Center for Pollinator Research and associate director of the College of Agricultural Sciences’ Institute for Sustainable Agricultural, Food, and Environmental Science (SAFES).
“Researchers, land managers and beekeepers all would like to know what the key plant species are in different regions and times of year,” said Grozinger. “We can also link these plant communities to land use patterns or climate conditions, to predict how bees will perform in different locations. But to do this, we need a substantial amount of data, which needs to be accessible to large collaborative teams.”
Grozinger’s research group conducts DNA analyses of pollen and honey to link these samples back to their floral source. She is working on several projects with different teams that use this approach, and she realized there was an opportunity to integrate across these projects, and across spatial, genomic and plant distribution data sets. However, Grozinger said, data for different projects were being stored on separate spreadsheets and databases, which were challenging to share and integrate.
After a colleague told her about a team of computational scientists known as Research Innovations with Scientists and Engineers (RISE) in the Institute for Computational and Data Sciences (ICDS), Grozinger felt hopeful about a solution to her data science challenges.
The RISE team consists of staff members with a variety of software engineering and computational science skills, ranging from optimizing complex computational codes, to building custom web platforms and data management infrastructure, to data visualization. After an initial consultation, Grozinger partnered with Danying Shao, research and development engineer, who then had several more consultations with Grozinger and the other researchers on the project.
“I created a database and a web application for the team to manage the meta data throughout a pipeline that includes sample collection, DNA sequencing and downstream analysis, and easily share this across their team,” said Shao.
This solution was a resounding success, said Grozinger. She said that the data management platform that Shao developed will serve as foundational research infrastructure. Already, Grozinger is using this platform for multiple projects, and she expects to continue scaling up its uses.
Connecting and optimizing computer models
Grozinger had heard about the RISE team through Karen Fisher-Vanden, professor of environmental and resource economics and public policy, who learned about RISE as a member of ICDS’s Coordinating Committee, a faculty group that provides feedback on ICDS’s strategic initiatives.
Fisher-Vanden and her research team, the Program on Coupled Human and Earth Systems, had been struggling to integrate individual system models so they could capture important feedbacks between water, energy, agricultural and economic systems. Hearing about the RISE team’s services, she said she felt that help from RISE might be exactly what her team needed to overcome the computational challenges they were facing.
“If there is water scarcity in a specific region, we are not just interested in how that impacts one sector, such as the agriculture, but also how it impacts sectors with competing demands for that water, say, the power system and urban areas,” said Fisher-Vanden, who also directs the College of Agricultural Sciences’ Institute for Sustainable Agricultural, Food, and Environmental Science (SAFES). “To study this, we’re coupling computational models that were developed by researchers in different disciplines, which is a huge computational challenge because of how the models differ in spatial and temporal scales.”
For instance, according to Fisher-Vanden, the power system model optimizes at an hourly and spatial grid scale, while the economic model optimizes at a yearly and state-level scale. Water scarcity may cause certain power generators to go offline, leading to spikes in electricity prices and potential outages. Consumers of electricity will respond to these price spikes by reducing demand for electricity, which will reduce the need for electricity generation. To capture these feedbacks, the two models must pass information to each other, re-optimize, and iterate until convergence is reached. Writing the code to automate and manage this process in an efficient way posed a challenge to the team.
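The pass-information, re-optimize and iterate loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two functions are toy stand-ins rather than the team's actual power system and economic models, the parameter values are invented, and the sketch ignores the difference in spatial and temporal scales that makes the real coupling hard.

```python
def power_model(demand, capacity=100.0, base_price=30.0):
    """Toy stand-in for a power system model (hypothetical).

    Returns a price that rises sharply as demand approaches capacity.
    """
    utilization = min(demand / capacity, 0.99)
    return base_price / (1.0 - utilization)


def economic_model(price, base_demand=90.0, ref_price=30.0, elasticity=-0.3):
    """Toy stand-in for an economic model (hypothetical).

    Returns electricity demand that falls as the price rises.
    """
    return base_demand * (price / ref_price) ** elasticity


def couple(demand0=90.0, tol=1e-6, max_iter=200, damping=0.5):
    """Pass price and demand back and forth until demand stops changing."""
    demand = demand0
    for iteration in range(1, max_iter + 1):
        price = power_model(demand)          # power model reacts to demand
        new_demand = economic_model(price)   # consumers react to price
        if abs(new_demand - demand) < tol:
            return price, new_demand, iteration
        # Damped update: move only part of the way to limit oscillation.
        demand = damping * new_demand + (1.0 - damping) * demand
    raise RuntimeError("coupled models did not converge")


price, demand, iterations = couple()
print(f"converged after {iterations} iterations: "
      f"price={price:.2f}, demand={demand:.2f}")
```

The damping factor is a common choice in this kind of fixed-point iteration, used here because the undamped exchange between two models can overshoot; whether the team's workflow uses anything similar is not stated in the article.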
Collaborating with ICDS’s RISE team helped the team address this computational challenge. External funding from the Program on Coupled Human and Earth Systems provided support for one RISE team member’s time for several months. Fisher-Vanden’s team partnered with Jeff Nucciarone, a research and development engineer, whose expertise is in optimizing and parallelizing computer code. Optimization and parallelization both allow code to run faster: optimization eliminates unnecessary steps in the code, and parallelization breaks the code down into chunks that can run simultaneously.
“I wrote a parallelizer, which used an interface to manage 52 separate processes that would run at same time,” he said. “It also included logic to detect common failure modes, so if the code detected failure for any of the 52 processes, it would restart quickly. Improving this step allowed greater automation of the workflow.”
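As a rough illustration of the idea in Nucciarone's description, the Python sketch below runs 52 tasks at once and restarts any that fail. It is not his parallelizer: the function names, the retry limit and the deliberately flaky example task are all hypothetical, and for simplicity the sketch supervises tasks with threads from the standard library rather than managing separate operating-system processes. In a real workflow each task would more likely launch an external model run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_restarts(task_id, run_one, max_restarts=3):
    """Run one task, restarting it if it raises an exception.

    Returns (task_id, result, attempts). Re-raises after the last attempt.
    """
    for attempt in range(1, max_restarts + 2):
        try:
            return task_id, run_one(task_id), attempt
        except Exception:
            if attempt == max_restarts + 1:
                raise


def run_all(task_ids, run_one, max_workers=52, max_restarts=3):
    """Run all tasks concurrently and collect {task_id: (result, attempts)}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_with_restarts, task_id, run_one, max_restarts)
            for task_id in task_ids
        ]
        for future in as_completed(futures):
            task_id, result, attempts = future.result()
            results[task_id] = (result, attempts)
    return results


# Hypothetical task used only to exercise the restart logic:
# every tenth task fails on its first attempt, then succeeds.
_seen = set()
_lock = threading.Lock()


def flaky_task(task_id):
    with _lock:
        first_time = task_id not in _seen
        _seen.add(task_id)
    if first_time and task_id % 10 == 0:
        raise RuntimeError(f"task {task_id} failed")
    return task_id * 2


results = run_all(range(52), flaky_task)
restarted = sorted(t for t, (_, attempts) in results.items() if attempts > 1)
print(f"{len(results)} tasks finished; restarted: {restarted}")
```

Treating every exception as a restartable failure keeps the sketch short. The quote mentions detecting common failure modes, which suggests the real tool is more selective about what it restarts.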
The result reliably and efficiently connects the power system model and the economic model, a first step in the team’s process. Now, Fisher-Vanden’s team is working with RISE to integrate other models into this coupled system, specifically a water balance model and a crop/land-use model. They are also exploring whether machine learning techniques can help identify stress points in the coupled system. This could help inform decision-makers when and where older power plants should be retired, for example. These types of decisions are often made on a state-by-state basis, but the impacts often extend beyond state lines. Being able to quantify these impacts could improve future decision-making, said the researchers.
Providing RISE time to agricultural sciences researchers
Fisher-Vanden and Grozinger praised the RISE team’s versatility and their ability to translate information between the worlds of data science and the researchers’ respective domains.
“If you want to use cutting-edge computational tools, you have to know what they are to make that connection with your research,” said Fisher-Vanden. “The RISE team was able to bridge that gap.”
After their positive experience of collaborating with RISE, Fisher-Vanden and Grozinger sought to expose others in the College of Agricultural Sciences to this valuable resource through a joint SAFES-RISE seed grant competition. Through this seed grant program, researchers can apply to be allocated time with RISE team members who can address data science or computational science challenges. The program mirrors a similar program, established by ICDS and funded by the National Science Foundation, which is designed to enable computational research at the University scale.
“Many faculty members are used to having everything run in our own lab, but for these types of data science challenges, we need help,” stated Grozinger. “We have expertise within our own fields of genomics, organismal biology, and ecology — we do not have the training or expertise in computational data science that is needed for constructing these sophisticated systems. The RISE team provides a great system for having access to a team of skilled specialists.”
Researchers in the College of Agricultural Sciences can apply for a SAFES-RISE seed grant by May 31. Researchers in other Penn State colleges or campuses can also apply for RISE time through the ICDS RISE seed grant program, which will be offered each semester through 2023.