
21st-century medicine involves the integration of data from many sources as researchers and physicians work to address diseases such as cancer comprehensively. Through its Clinical Proteomic Tumor Analysis Consortium (CPTAC), the National Cancer Institute has initiated an extensive analysis of the proteins expressed in cancer cells. The Data Coordinating Center and serve as the central repository for the proteomics data and distribute it to physicians, clinicians, and scientists in the cancer research community. It is the largest cancer proteomic data warehouse in the world.
Challenge
The National Cancer Institute receives large volumes of mass spectrometry data from research groups in the . The agency needed a way to store these data in one central location, both to make the information accessible to all cancer researchers interested in the tumor proteome and to preserve the results for future research after each CPTAC cancer program concludes.
In addition, the proteomic data had to be transferred securely and without any loss of content. The proteomic data storage site the research community had used previously suffered from slow transfer times and some file loss.
Solution
Our team created a secure data portal for researchers by combining a web server, a database, a file storage system, and an IBM Aspera high-speed data transfer server. We also developed daily transfer logs to track and troubleshoot errors.
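The daily transfer logs are described only at a high level. As a rough illustration of how such a log can support troubleshooting, the Python sketch below groups transfer records by day and collects the failed files. The CSV layout (`timestamp,file,bytes,status,message`), the `ok` status value, and the names `summarize_transfer_log` and `DailySummary` are hypothetical assumptions for this example; they are not the portal's or Aspera's actual log format.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class DailySummary:
    """Per-day totals used to spot and troubleshoot failed transfers."""
    transfers: int = 0
    bytes_ok: int = 0
    failures: list[tuple[str, str]] = field(default_factory=list)


def summarize_transfer_log(lines: Iterable[str]) -> dict[str, DailySummary]:
    """Group transfer records by calendar day.

    Assumes a hypothetical CSV log with the header
    timestamp,file,bytes,status,message, where timestamps are ISO 8601
    and status is 'ok' for a completed transfer.
    """
    summaries: defaultdict[str, DailySummary] = defaultdict(DailySummary)
    for row in csv.DictReader(lines):
        day = row["timestamp"][:10]  # 'YYYY-MM-DD' prefix of the ISO timestamp
        summary = summaries[day]
        summary.transfers += 1
        if row["status"] == "ok":
            summary.bytes_ok += int(row["bytes"])
        else:
            # Keep the file name and error message for follow-up.
            summary.failures.append((row["file"], row["message"]))
    return dict(summaries)
```

Grouping on the date prefix avoids full timestamp parsing; a production version would more likely use `datetime.fromisoformat` and handle time zones explicitly.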
The portal gives as many researchers as possible access to this important proteogenomic data. We built quality control and security into data receipt: data are encrypted in transit and then verified against a checksum file. Because of this focus on data integrity, researchers can trust that each file maps back to the correct sample and accurately captures the information associated with tumor acquisition. Our team also harmonizes clinical data from many different sources so that they are usable and comparable across cancer programs.
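One way to picture the checksum step is the minimal Python sketch below, which recomputes a digest for each received file and compares it with the value recorded in a checksum manifest. It assumes an md5sum-style manifest (one `<digest>  <relative path>` entry per line) and MD5 as the default algorithm; the portal's actual checksum format and algorithm are not specified, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(manifest: Path) -> dict[str, str]:
    """Parse md5sum-style lines: '<hex digest>  <relative path>' ('*' before the path marks binary mode)."""
    expected: dict[str, str] = {}
    for line in manifest.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_delivery(root: Path, manifest: Path, algorithm: str = "md5") -> dict[str, list[str]]:
    """Check every file listed in the manifest and sort it into ok, missing, or mismatch."""
    report: dict[str, list[str]] = {"ok": [], "missing": [], "mismatch": []}
    for name, expected_digest in parse_manifest(manifest).items():
        path = root / name
        if not path.is_file():
            report["missing"].append(name)  # file never arrived
        elif file_digest(path, algorithm) != expected_digest:
            report["mismatch"].append(name)  # content altered or truncated in transit
        else:
            report["ok"].append(name)
    return report
```

Reading in 1 MiB chunks keeps memory use flat even for very large raw spectrum files, and any algorithm name accepted by `hashlib.new`, such as `"sha256"`, can be passed instead of MD5.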
Results
The CPTAC Data Coordinating Center and Proteomic Data Commons provide information about the cancer proteome to researchers around the world for use in their own work. The site offers private areas where each research team can exchange data, as well as a public Proteomic Data Commons portal that distributes data from the CPTAC program and from collaborators in the . The portal routinely manages 29 terabytes of data, and 785 terabytes have been downloaded in 140 countries. CPTAC's impact is showcased in 18 scholarly publications, which highlight the breadth of researchers using this technology and data resource to advance the understanding of proteogenomics across many cancers, including ovarian, breast, colon, lung, and pediatric and adult brain cancers, among others.