Dr. Tiziana Ferrari
Director, EGI Foundation
IoT as enabler of scientific projects
In the coming decade, we see IoT becoming a fundamental enabler of scientific projects. In various scientific domains, and in particular in environmental science, research data is harvested from distributed sensors, which require in-situ, real-time processing. All this data needs to be accessed in a scalable manner, processed, and integrated with artificial intelligence techniques and data analytics tools. We envisage that IoT will become a key enabler of the initial process of data harvesting and processing, which is one of the main steps in any research experiment. Especially in the environmental sector, we see a tremendous push for the ability to integrate in-situ data with satellite data, for example. This is still a technical issue that has not been completely solved. Having standard solutions and commodity services for IoT will simplify this task tremendously. Typically, researchers have been addressing it by creating domain-specific technical solutions. Having more standardization in those domains will also enable cross-project collaboration and faster paths to scientific results, boosting European scientific excellence.
Bring computing even closer to where the data is being sourced
Historically, in our scientific domain we have been developing European infrastructures for research in a way that separates data from the analytics tools and computing infrastructures that are necessary to process such data. This has to do with how funding is organized and how research organizations have been operating. As a result, we have a lot of data preservation infrastructures which collect data from IoT, from observatories, from larger detectors, and from instrumentation of different sizes and scales, which are fully distributed and are major sources of raw data. This data is typically processed in situ and then needs to be aggregated, curated and deposited for exploitation and sharing. This is, in a way, the data infrastructure which is being built by many scientific communities. Typically, in Europe the infrastructures have been developed in a non-integrated way. The ability to bring together data from different sectors and from different research projects, along with the computing infrastructure needed to extract knowledge from this data, is one of the major challenges being faced by research communities.
In Europe we have a strong push towards open science. This means the ability to open research data and tools, to make them as open as possible and as protected as necessary. This has had a tremendous impact and has accelerated many research projects. We have seen this in projects where thousands of users can integrate data and applications from different research communities. However, being able to produce open data also means having longer-term funding and know-how to curate it, to make it available for third-party exploitation, and to support data users in extracting knowledge from it. Open data exploitation is not something that is at the core of a research project. Projects start with a research idea, which comes with the infrastructure and funding to collect the data and extract the knowledge. After a project ends, there is no more funding for the research team that has produced the data. One of the major challenges would be to have a European infrastructure that can be searched and where data can be deposited for exploitation next to computing. Why next to computing? Because data is becoming increasingly large, and because some data, such as in the health sector, cannot be transferred outside a given organization for privacy reasons, we need computing next to the data. Successful research data exploitation will require distributed computing facilities, including near-edge computing services. Distributed computing is the strong legacy of the scientific communities, and our future challenge would be to bring computing on-device and on-premises, so that these communities can become even more effective in processing the data.
The adoption of a distributed and federated approach to scientific computing started back in 2000 with the invention of Grid computing. Back then, IoT was not even a concept known to research communities. Now we need to bring computing even closer to where the data is being sourced.
The power of digital twins to enable science
In our domain, data is being used to simulate natural systems such as the brain and sub-nuclear particles, but also elements of our environment. The ability to simulate processes and systems is integral to producing scientific outputs. Our community has been developing plenty of applications that allow scientists to simulate data and to extract information, and the pace of this digital transformation is still increasing. The twin concept is very close to scientists, and it has been part of research projects and the way we do research for decades.
I would like to mention one of our best examples, which shows the power of having digital twins to enable science. In the EGI Federation, the largest international infrastructure for scientific computing, structural biology is being enabled through a suite of applications. One of the most popular applications is HADDOCK, an integrative platform for the modeling of biomolecular complexes. It supports a large variety of input data and can deal with multi-component assemblies of proteins, peptides, small molecules, and nucleic acids (https://wenmr.science.uu.nl/). Applications such as HADDOCK build on the knowledge and expertise of the researchers. Because such tools are open to every researcher, in the last 18 months, since the start of the pandemic, scientists across the entire planet, from 125 different countries, have been able to use these applications for their science. We have enabled these scientific projects by mobilizing millions of CPU hours across major research centers in Europe, in the United States and in the Asia Pacific region. Thanks to these bilateral collaborations and by joining forces across borders, we have been able to boost research into finding drugs that would treat COVID-19 symptoms. This example shows the power of digital twins, and the power of making these digital twins available to every researcher on the globe.
We are proud to have strong expertise in Europe in developing scientific applications. This is one of the areas where European research excels in the world. Our duty as EGI, as a federated distributed infrastructure, is to enable these digital twins on a strongly edge-based computing infrastructure, which makes the data available for the simulations that scientists need.