To make it easier to monitor COVID-19 in society and track the spread of mutations of the virus, the University of Gothenburg is now establishing a unique database. It will coordinate large amounts of data about the virus and share this information among hospitals and universities throughout Sweden and with the Swedish Public Health Agency.
“We will soon be able to follow trends nationally while the virus is spreading,” says Per Sikora, coordinator at Genomic Medicine Sweden, a collaboration among Sahlgrenska Academy, SciLifeLab and Sahlgrenska University Hospital.
Data-driven approach
Information such as age, gender, geographical location and time of testing, combined with laboratory results, will be collected in what is known as a data lake, hosted on high-capacity servers. Because the combined data also include the strength of each positive result and the virus subtype, the paths the virus takes through society can be tracked effectively. The information is anonymized so that it cannot be linked to individuals.
“The method becomes a data-driven approach. Instead of structuring the data in advance, you use the large amount of information that the data lake offers to conduct searches and visualizations, enabling you to arrive at conclusions and find answers. You will probably be able to answer questions that you did not have before, with data you did not know you needed,” says Sikora.
Data will be made available continuously starting in April 2021.
Unique solution within medicine
As far as Sikora knows, there is no similar solution in Sweden within diagnostics, health care or translational research. He does, however, cite Helsinki University Hospital and Genomics England as two places that have built comparable infrastructure. What makes the Swedish initiative stand out is the way the database connects regional and national participants.
“By making this powerful genomics tool available nationally, we are opening up great opportunities for the future, not only for COVID research but also for monitoring the development of multi-resistant bacteria. In the long run, the platform will enable hospitals to share and aggregate data for diagnostics.”
Core Facilities is the hub
Core Facilities at the University of Gothenburg purchases and manages the servers on behalf of Genomic Medicine Sweden (GMS), Sweden’s precision medicine project.
“Within both the University of Gothenburg and Region Västra Götaland, we have a special national commission to develop informatics and IT infrastructure in genomics. At Core Facilities, we have special expertise in the field and will serve as the hub for the work,” says Sikora, who also chairs informatics and infrastructure for Genomic Medicine Sweden.
FACTS
Who has access to the content?
Participating regions in the collaboration that will have access to the data are Västra Götaland, Östergötland, Skåne, Örebro, Stockholm, Uppsala and Västerbotten. Data will also be shared with the Swedish Public Health Agency.
What is the design of the server system, and how will it be used?
All data will be stored on a cluster of servers with large amounts of memory, linked to one another and to the data warehouse by extremely fast switches. Millions of files and large quantities of metadata about COVID-19 will be stored unstructured in the data lake, ready to be “fished up” so that researchers, infection control physicians and clinical microbiology labs can find answers to their questions. The information cannot be linked to any individual.
“In the automotive industry and the banking world, this technology is used to find trends in large amounts of data, for machine learning and to keep track of transactions. But in medical research and health care, this is unique in Sweden,” says Per Sikora.
Who is responsible for the database?
Core Facilities at the University of Gothenburg purchases and manages the infrastructure on behalf of Genomic Medicine Sweden (GMS), which is part of SciLifeLab. Both the University of Gothenburg and Region Västra Götaland have a national commission through GMS to develop informatics and IT infrastructure in genomics.
The database is the first step in building the next generation of infrastructure for translational research and diagnostics. It is a linchpin that can be scaled up indefinitely and will be able to support almost all genomics data generated throughout Sweden in the future.
BY: CHARBEL SADER