TCGA and Working in Big Science: A Medical Student’s Former Journey In Cancer Genomics

Author: Galen Gao

Two months ago, I had the opportunity to attend and present two posters at the TCGA Legacy Symposium in Washington, DC. As a sort of final capstone and celebration of The Cancer Genome Atlas (TCGA) and the associated Pan-Cancer Atlas, it was an exciting opportunity for me both to showcase my own work and to see what other scientists from across the world have been working on in the realm of cancer genomics. The projects presented at the symposium spanned topics from genomic ancestries’ contributions to cancer risk to improved identification of outliers in high-dimensional gene expression data, and their quantity and diversity served as an excellent testament to the resources that TCGA was able to provide the scientific community.

Launched in December 2005, The Cancer Genome Atlas was a massive undertaking by the NCI and the NHGRI to comprehensively characterize a wide range of malignancies. Blossoming from a small pilot program of 206 glioblastoma patients, it grew to profile over 11,000 cancer patients representing 33 different cancer types through a diverse array of platforms encompassing SNP microarrays, methylation arrays, whole-exome sequencing, and several more. Together, the information currently totals an impressive 2.5 petabytes of data. To summarize the findings of this extensive dataset, the Pan-Cancer Atlas was then launched as a collection of analyses across these multiple cancer types that explore broad themes of oncogenic processes, signaling pathways, and cell-of-origin patterns in these cancers. This September’s TCGA Legacy Symposium and the preceding publication of 30 PanCanAtlas papers in April 2018 have been fitting capstones of these endeavors and effective demonstrations of how applying large-scale bioinformatic efforts to over 10,000 tumors can help uncover novel insights into tumor biology.

As a member of the Cancer Genome Atlas Research Network that spearheaded TCGA and the Pan-Cancer Atlas, I had the exciting opportunity at the symposium to finally meet many of my colleagues in person for the first time. It was wonderful to put faces to the countless voices I had listened to and worked with via telephone calls over the 2 years before I joined UT Southwestern. For me, working with TCGA was an eye-opening experience into the world of modern cancer genomics and its gradual evolution over the past decade. While I had taken classes in general biology, data analysis, and statistics as an undergrad, I had no formal background in cancer genomics, and I remember spending much of my first few days of working on the Pan-Cancer Atlas wondering when my group would finally realize that they had made a huge mistake hiring a confused kid who had somehow stumbled his way through college and into the world of modern cancer research. Nevertheless, for two years, I had the privilege of working with and, very importantly, learning from many other researchers from across the nation and even the world, as we collectively tried to understand and characterize these cancers together.

As I left the hotel for the airport on the morning after the symposium had ended, I was able to reflect on my whirlwind 2-year introduction to cancer genomics and the role it had played in my scientific development. Although TCGA has now officially drawn to a close with the symposium, and my own daily worries have shifted from finding molecular associations in cancer to memorizing cranial nerves in medical school, the legacy of TCGA and the lessons I learned from my time there will carry on. Heralded as the “End of the Beginning” of cancer genomics, TCGA now serves as a template for “big” and “open” science: it operated at a scale that far exceeded what any single institution could have undertaken on its own at the time, and it made all of its data freely available to the general public for further mining and analysis through the Genomic Data Commons. Further, TCGA’s discoveries undoubtedly will affect my future in the clinic too. Already, starting with the earliest findings from the glioblastoma pilot project, discoveries announced in TCGA publications are beginning to redefine traditional, histological classifications of tumors in terms of molecular markers instead. While I had not planned for a 2-year hiatus between undergraduate and medical school, I can definitely say that I am more than happy to have both learned from and played a small role in the story of TCGA. With the close of the TCGA Legacy Symposium, an entire decade’s worth of work can now help springboard the next chapter of both my career and that of many others in the scientific and medical community who have helped guide and inspire me. Here’s to the next decade of cancer genomics.