Complete Genomics Analysis Platform

Our sequencing service uses a proprietary, custom-built sequencing platform developed by Complete Genomics. The platform is the result of technological advances in DNA library manufacturing, DNA nanoarrays, sequencing chemistry, instrumentation, and software (Drmanac, et al., Science 2010). The accuracy of Complete Genomics’ novel sequencing chemistry, combined with advanced algorithms for mapping and assembly, provides high-quality data (99.9998% accurate) that is ready for biological interpretation, for much less than the total cost of purchasing and operating DNA sequencing instruments.

Our Novel Array and Read Technology

There are two primary components of our sequencing technology: DNA nanoball arrays, or DNB™ arrays, and combinatorial probe-anchor ligation reads, or cPAL™ reads.

I. DNA Nanoball (DNB) Arrays

We have developed a novel approach to preparing fragmented DNA so that it can be packed very efficiently onto a silicon chip. Each DNA fragment is reproduced in a manner that connects all copies together in a head-to-tail configuration, forming a single long molecule of connected nucleotides. Proprietary techniques developed by Complete Genomics cause each long molecule to consolidate, or ball up, into a small particle of DNA that we call a DNA nanoball, or DNB. DNBs are approximately 200 nanometers in diameter, and each contains hundreds of copies of the 70 bases of DNA we seek to read from each fragment.
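The head-to-tail copying described above can be pictured with a simple string model. The sketch below is illustrative only: it assumes a fragment can be treated as a plain base string, ignores the proprietary chemistry and adapter sequences entirely, and uses hypothetical helper names (`make_concatemer`, `count_copies`) that are not part of any Complete Genomics software.

```python
# Toy string model of a head-to-tail concatemer (illustrative assumption only;
# the real DNB preparation chemistry is proprietary and is not modeled here).


def make_concatemer(fragment: str, copies: int) -> str:
    """Join `copies` copies of `fragment` head to tail into one long strand."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return fragment * copies


def count_copies(concatemer: str, fragment: str) -> int:
    """Return how many back-to-back copies of `fragment` make up `concatemer`,
    or 0 if the strand is not an exact head-to-tail repeat of the fragment."""
    n = len(fragment)
    if n == 0 or len(concatemer) % n != 0:
        return 0
    chunks = [concatemer[i:i + n] for i in range(0, len(concatemer), n)]
    return len(chunks) if all(chunk == fragment for chunk in chunks) else 0


if __name__ == "__main__":
    fragment = "ACGT" * 17 + "AC"                 # hypothetical 70-base fragment
    dnb_strand = make_concatemer(fragment, 200)   # "hundreds of copies"
    print(len(fragment), len(dnb_strand), count_copies(dnb_strand, fragment))  # 70 14000 200
```

The copy count of 200 is an arbitrary stand-in for "hundreds of copies"; the point is only that one long molecule carries many identical, contiguous repeats of the same fragment.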

Because of their small size and biochemical characteristics, DNBs can be packed very efficiently onto a patterned silicon chip. A proprietary process causes the DNA to adhere to designated spots on the chip while preventing it from adhering to the areas between those spots. As a result, we can affix an individual DNB to over 90% of these spots, which makes nanoarray assembly more efficient.

II. cPAL Technology (Combinatorial Probe-Anchor Ligation)

We have also developed a unique and highly accurate cPAL technology that allows the sequence of each DNB to be read very efficiently on our sequencing platform. A ligase enzyme joins fluorescently labeled probes to anchor sequences hybridized to each DNB, and the fluorescent label on each ligated probe identifies the nucleotide base at a particular position in the DNB. By imaging the fluorescence, we can then determine the sequence of nucleotides in each DNB.
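As a rough illustration of turning fluorescence images into base calls, the sketch below assumes one dye per base, one ligation-and-imaging cycle per queried position, and a simple "brightest dye wins" rule with a hypothetical `min_ratio` threshold for ambiguous signals. The dye names, threshold, and functions are all assumptions for illustration, not the actual cPAL image-analysis method.

```python
from __future__ import annotations

# Hypothetical dye names: the text does not say which fluorophores are used.
DYE_TO_BASE = {"dye_A": "A", "dye_C": "C", "dye_G": "G", "dye_T": "T"}


def call_base(intensities: dict[str, float], min_ratio: float = 2.0) -> str:
    """Call the queried base for one DNB in one imaging cycle.

    `intensities` maps each dye to the fluorescence measured for that DNB.
    Returns "N" (no call) when there is no signal, or when the brightest dye
    is not at least `min_ratio` times brighter than the runner-up.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_dye, best), (_, second) = ranked[0], ranked[1]
    if best <= 0 or (second > 0 and best / second < min_ratio):
        return "N"
    return DYE_TO_BASE[best_dye]


def call_positions(cycles: list[dict[str, float]]) -> str:
    """Assume one probe-ligation cycle per queried position; join calls in order."""
    return "".join(call_base(c) for c in cycles)


if __name__ == "__main__":
    cycles = [
        {"dye_A": 900, "dye_C": 40, "dye_G": 35, "dye_T": 20},
        {"dye_A": 30, "dye_C": 25, "dye_G": 850, "dye_T": 60},
        {"dye_A": 50, "dye_C": 45, "dye_G": 40, "dye_T": 700},
        {"dye_A": 400, "dye_C": 380, "dye_G": 10, "dye_T": 15},  # ambiguous signal
        {"dye_A": 20, "dye_C": 760, "dye_G": 30, "dye_T": 25},
    ]
    print(call_positions(cycles))  # AGTNC
```

Returning "N" rather than forcing a call is one simple way to keep low-confidence positions from being reported as bases.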

A key characteristic of our cPAL technology is its high accuracy in reading short, 5-base sequences of DNA. Using another proprietary technique for preparing the DNA fragments, we read seven 5-base segments from each of the two ends of each DNA fragment, for a total of 70 bases per fragment (2 × 7 × 5). Our proprietary assembly software accurately reconstructs over 90% of the whole human genome from these 70-base reads.
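The read arithmetic above (two ends, seven segments per end, five bases per segment) can be checked with a short sketch. It assumes the segments from each end are simply joined back to back; the real read layout, including any spacing between segments, is not described here and is not modeled. The function names are hypothetical.

```python
from __future__ import annotations

# Read layout as stated in the text: 2 ends x 7 segments x 5 bases.
ENDS_PER_FRAGMENT = 2
SEGMENTS_PER_END = 7
BASES_PER_SEGMENT = 5


def bases_per_fragment() -> int:
    """Total number of bases read from one DNA fragment."""
    return ENDS_PER_FRAGMENT * SEGMENTS_PER_END * BASES_PER_SEGMENT


def join_end(segments: list[str]) -> str:
    """Concatenate the seven 5-base segments read from one end of a fragment.

    Simplification: segments are joined back to back; any gaps between
    segments in the real read layout are not modeled.
    """
    if len(segments) != SEGMENTS_PER_END:
        raise ValueError(f"expected {SEGMENTS_PER_END} segments, got {len(segments)}")
    for seg in segments:
        if len(seg) != BASES_PER_SEGMENT:
            raise ValueError(f"segment {seg!r} is not {BASES_PER_SEGMENT} bases long")
    return "".join(segments)


def read_fragment(left: list[str], right: list[str]) -> tuple[str, str]:
    """Return the two end reads for one fragment."""
    return join_end(left), join_end(right)


if __name__ == "__main__":
    left = ["ACGTA", "CCGTA", "TTGCA", "GGATC", "ATCGA", "CATGC", "TTAAC"]
    right = ["GATTC", "CAGTA", "TGCAT", "AACCG", "TTGGA", "CGCGA", "ATATG"]
    l_read, r_read = read_fragment(left, right)
    print(bases_per_fragment(), len(l_read) + len(r_read))  # 70 70
```

Validating segment counts and lengths up front makes a malformed read fail loudly instead of silently producing a read of the wrong length.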

Advanced Informatics and Data Management Software

Sequencing whole human genomes generates large amounts of data that must be managed, stored, and analyzed. In response to this need, we have built a genomics data processing facility with the computing infrastructure and storage to manage both small- and large-scale sequencing projects.

Our Complete Genomics Analysis Pipeline has two major components: assembly software and analysis software.

  • Assembly. Assembly is the process of organizing all of the overlapping 70-base nucleotide sequences to reconstruct the whole human genome. Our proprietary assembly software uses advanced data analysis algorithms and statistical modeling techniques to accurately reconstruct over 90% of the whole human genome from approximately two billion 70-base reads.
  • Analysis. After assembly, our analysis software identifies key differences, or variants, in each genome. Detected variants include single nucleotide polymorphisms (SNPs), indels, substitutions, copy number variants (CNVs), structural variants (SVs), and mobile element insertions (MEIs). We then query publicly available databases of variants and genomic information to annotate each variant in the genome that these databases describe. For the Cancer Sequencing Service, we also identify somatic variants by comparing each tumor genome with its paired normal genome, in addition to detecting variants by comparing each genome with the human reference genome. By using our service and open source analytical tools, our customers can significantly reduce their investments in computing and data storage infrastructure.
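The sketch below illustrates two ideas from the list above under heavily simplified assumptions. First, it estimates average sequencing depth from approximately two billion 70-base reads, assuming a human genome size of roughly 3.1 billion bases. Second, it shows a toy single-base variant comparison against a reference and a tumor-normal subtraction for somatic calls. The toy caller assumes sequences are already aligned and of equal length, handles only single-base substitutions (no indels, CNVs, SVs, or MEIs), and is not the statistical method used by our analysis software; all function names are hypothetical.

```python
from __future__ import annotations


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def call_snvs(reference: str, sample: str) -> dict[int, tuple[str, str]]:
    """Toy caller: report (ref, alt) at each position where an already aligned,
    equal-length sample sequence differs from the reference.
    No-calls ("N") in the sample are skipped."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return {
        pos: (ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt and alt != "N"
    }


def somatic_snvs(tumor: dict[int, tuple[str, str]],
                 normal: dict[int, tuple[str, str]]) -> dict[int, tuple[str, str]]:
    """Keep tumor variants that are absent from the paired normal genome."""
    return {pos: call for pos, call in tumor.items() if normal.get(pos) != call}


if __name__ == "__main__":
    # ~2 billion 70-base reads over an assumed ~3.1-billion-base genome.
    print(f"{mean_coverage(2_000_000_000, 70, 3_100_000_000):.1f}x")  # 45.2x
    reference = "ACGTACGT"
    tumor = call_snvs(reference, "ACCTACGA")
    normal = call_snvs(reference, "ACGTACGA")
    print(tumor)                        # {2: ('G', 'C'), 7: ('T', 'A')}
    print(somatic_snvs(tumor, normal))  # {2: ('G', 'C')}
```

In this example the variant at position 7 appears in both tumor and normal, so it is treated as inherited; only the position 2 change is reported as somatic.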

The final output is a comprehensive data file that lists all variant calls, their annotations and supporting evidence, and the underlying reads and mappings. Genomic data is assembled and analyzed in Complete Genomics’ data center and then securely transferred over a dedicated network to Amazon Web Services (AWS), from which it is delivered to customers either electronically or on shipped hard disk drives.

By broadly enabling researchers to conduct large-scale whole human genome studies, we have helped revolutionize the scale and significance of these studies, and have helped researchers expand their understanding of complex diseases.

To Get the Whole Picture, Sequence the Whole Genome

Complete Genomics' whole genome sequencing service gives researchers a complete view of the genome that is not possible with exome, partial, or low-coverage sequencing.


Copyright © 2012 Complete Genomics Incorporated. All rights reserved.