Unlocking the Future of Cancer Genomics with High-Speed Data Transfer: A Collaboration with Cancer Science Institute of Singapore, SingAREN and NSCC

3 April 2025 – In 2024, SingAREN, in tandem with NSCC, worked closely with Dr Jason J. Pitt from the Cancer Science Institute of Singapore (CSI Singapore) on his research project “Leveraging the Scalable Workflows for the Analysis of Genomes framework to identify and characterise genome instability patterns in cancer”. Dr Pitt is a Principal Investigator at CSI Singapore, where he is also the Head of the Genomics and Data Analytics Core (GeDaC) facility. His laboratory is one of the five advanced facilities within CSI Singapore and provides bioinformatics and computational support services to the institute’s labs and the university. Its mission is to 1) provide investigators with access to bioinformatics expertise and solutions, 2) use data science to enhance the impact of CSI Singapore’s research, and 3) develop innovative platforms for cancer genome analytics.

One major challenge faced by Dr Pitt’s team was harmonising massive amounts of cancer genomics data acquired from various sources, which led to technical batch effects. Batch effects can occur when disparate computational tools and algorithms are applied to distinct subsets of sequencing data, making it difficult to distinguish real mutational patterns from systematic errors. Because the Pitt Lab is harmonising petabytes of whole genome sequencing (WGS) data, downloads need to be extremely fast, stable, and efficient, after which the downloaded data must be analysed and reprocessed with high computing power.

A high-performance workflow, known as Scalable Workflows for the Analysis of Genomes (SWAG), was deployed and optimised on NSCC ASPIRE 2A to clean and harmonise data from local and external heterogeneous sources and so eliminate batch effects. SWAG was designed to extract valuable mutation calls from DNA sequencing data. In addition to fuelling the Pitt Lab’s data-intensive exploration of genome instability patterns in cancer, the harmonised, batch-effect-free output will be distributed within NUS research centres to bolster collaborative scientific projects.

In this project, the SingAREN network provided Dr Pitt’s team with a high-speed link to download petabytes of cancer genomics data from US repositories into NSCC ASPIRE 2A at “lightning” speed. The entire project consumed ~2 petabytes of genome data from the National Cancer Institute’s Genomic Data Commons (GDC), administered by the University of Chicago. The amount of data downloaded from GDC each day was in the tens of terabytes, with individual files ranging from 3 GB to 500 GB. Dr Pitt commented that SingAREN’s download speed was very fast and had significantly improved the team’s data processing capabilities.
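To put these volumes in perspective, the short Python sketch below converts a daily download volume into the average line rate it implies, and estimates how long ~2 petabytes would take at that pace. The daily volumes used (10 TB and 50 TB) are illustrative assumptions chosen from within the "tens of terabytes" range, not figures reported by the project, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 86_400


def sustained_gbps(terabytes_per_day: float) -> float:
    """Average rate in Gbit/s needed to move this volume in 24 hours (decimal TB)."""
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def days_to_transfer(total_petabytes: float, terabytes_per_day: float) -> float:
    """Days needed to move the total volume at a constant daily volume (1 PB = 1000 TB)."""
    return total_petabytes * 1000 / terabytes_per_day


# Hypothetical daily volumes; the article only states "tens of terabytes".
for tb_per_day in (10, 50):
    rate = sustained_gbps(tb_per_day)
    days = days_to_transfer(2, tb_per_day)
    print(f"{tb_per_day} TB/day needs about {rate:.2f} Gbit/s sustained; "
          f"2 PB would take {days:.0f} days")
```

Under these assumptions, even the lower figure requires close to 1 Gbit/s held continuously around the clock, which gives a sense of why a dedicated research network link matters for a project of this scale.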

Dr Pitt’s team has started applying discriminative AI, generative AI, and representation learning tools downstream to understand and predict clinically relevant genome instability phenotypes. As Singapore evolves into a repository of research data, SingAREN, an inherent part of the ecosystem that supports advanced research, will facilitate the high-speed distribution of research data to researchers in diverse locations across the island.

“We have observed a significant speed improvement in our data transfers over the Internet using SingAREN’s infrastructure compared to other resources. Specifically, we’ve seen up to a tenfold increase in download speeds, greatly accelerating the throughput of our genomic analysis workflows.” – Akila Perera, HPC/Cloud Engineer & SWAG development lead in the Pitt Lab

“The network speeds of SingAREN, combined with the computational power of NSCC via ASPIRE2A, has allowed us to greatly enhance local scientific efforts by enabling us to effectively utilise petabytes of sequencing data generated globally” – Dr Jason Pitt, CSI/Pitt Lab

“SingAREN is committed to providing the core infrastructure support to researchers to carry out their work speedily and reliably.” – A/Prof Francis Lee, VP – SingAREN

 

Picture credit: Pitt Lab

References:

The Cancer Science Institute of Singapore (CSI Singapore) is a University Research Institute at the National University of Singapore (NUS). Officially launched on 15 October 2008, CSI Singapore aims to position Singapore as a global leader in the field of Biomedical Sciences. Its mission is to better understand the causes of human cancer across Asia, and thereby improve its detection, treatment and prevention for the benefit of patients. CSI Singapore’s outstanding researchers and excellent scientific facilities create an energetic environment for ground-breaking research and world-class training. CSI Singapore is internationally recognised for its innovative research on the biology of cancers prevalent in Asia, and for taking new methods for cancer treatment from the laboratory to the clinic. Through its local and global partnerships, CSI Singapore works with leading minds from multiple scientific and clinical disciplines, both in academia and in industry. For more information on CSI Singapore, visit https://csi.nus.edu.sg

The National Supercomputing Centre (NSCC) Singapore, established in 2015, manages Singapore’s first national petascale facility providing high-performance computing (HPC) resources. As a National Research Infrastructure, NSCC supports private and public sector research, including commercial companies, government agencies, and higher education and research institutes. With the support of its stakeholders, including the Agency for Science, Technology and Research (A*STAR); Nanyang Technological University (NTU); National University of Singapore (NUS); Singapore University of Technology and Design (SUTD); the National Environment Agency (NEA); and the Technology Centre for Offshore and Marine, Singapore (TCOMS), and with funding from the National Research Foundation (NRF), NSCC catalyses national research and development initiatives, attracts industrial research collaborations and enhances Singapore’s research capabilities. For more information, please visit: https://nscc.sg

SingAREN is Singapore’s National Research and Education (R&E) Network and the sole provider of local and international networks dedicated to serving the R&E community in Singapore. SingAREN facilitates high-speed transfers of large datasets, both domestically and across international boundaries, for scientific research and enables advanced network technology demonstrations through its resilient links and high-speed fibre network. The SingAREN Open Exchange (SOE) interconnects Singapore’s R&E community with R&E networks in other regions, including Asia, Australia, Europe and the US. For more information, please visit singaren.net.sg

This article was written by Vee Len (SingAREN) and Dr Jason J. Pitt (CSI Singapore), and reviewed by Eugene Low (NSCC).
