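The Complete Genomics Analysis Tools (CGA™ Tools) is an open source project that provides tools for downstream analysis of genomics data produced by Complete Genomics. The general areas of functionality include genome comparison, format conversion, manipulation of tab-delimited files, and reference genome tools.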
The CGA Tools distribution is available on SourceForge, along with pre-compiled binary distributions for 64-bit Linux and Mac systems.
The documentation for CGA Tools is available here.
CGA Tools includes five genome comparison utilities.
snpdiff – Compares a Complete Genomics variation file to SNP results from an alternative sequencing or genotyping platform that produces only SNP calls.
calldiff – Compares two Complete Genomics variation files to determine where the two genomes differ, and how.
calldiff for scoring somatic variations (beta) - Finds differences between two genomes from the same individual, such as a tumor/normal pair, and reports somatic scores for them.
Multigenome comparison tools: listvariants (beta) and testvariants (beta) - listvariants creates a list of all variants found in at least one of the genomes being compared, and testvariants reports, for each listed variant, whether it is present in each of those genomes.
junctiondiff (beta) - Identifies junctions (connections between two genomic positions that are not adjacent on the reference genome) present in one genome but not in another.
CGA Tools provides utilities for converting Complete Genomics export formats into other standard formats for further analysis or data processing.
map2sam – Converts Complete Genomics exported reads and initial reference mappings to the SAM format.
evidence2sam (beta) - Converts Complete Genomics evidence mappings from local de novo assembly into SAM format.
generatemastervar (beta) - Aggregates variant calls, annotation data, and CNV information into a simple, integrated master variation (masterVar) file that contains one line per locus. This file can be used with other cgatools commands.
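For orientation, a SAM alignment line consists of 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The sketch below formats a single unpaired alignment into those fields. The `Alignment` record and its field names are hypothetical and do not describe the Complete Genomics mapping format; Complete Genomics reads have a more complex gapped, mate-paired structure that map2sam must encode, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    # Hypothetical input record for illustration only.
    read_name: str
    chrom: str
    pos0: int       # 0-based leftmost reference position
    reverse: bool   # True if the read aligns to the reverse strand
    mapq: int
    cigar: str
    seq: str
    qual: str


def to_sam_line(a: Alignment) -> str:
    """Format one unpaired alignment as the 11 mandatory SAM fields."""
    flag = 0x10 if a.reverse else 0  # 0x10: SEQ is reverse complemented
    fields = [
        a.read_name,
        str(flag),
        a.chrom,
        str(a.pos0 + 1),  # SAM POS is 1-based
        str(a.mapq),
        a.cigar,
        "*", "0", "0",    # RNEXT, PNEXT, TLEN: no mate for an unpaired read
        a.seq,
        a.qual,
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    print(to_sam_line(Alignment("r1", "chr1", 99, True, 60, "5M", "ACGTA", "IIIII")))
```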
CGA Tools provides the following tools for manipulating tab-delimited files.
varfilter (beta) - Allows users to filter the content of var or masterVarBeta files based on one or more call selectors.
join (beta) - Adds user-supplied annotations to a variation file, gene file, or any other file containing genomic coordinates, based on overlap in genomic coordinates.
junctions2events – Considers possible relationships among the junctions in the input file and determines which events each junction, or group of junctions, is consistent with.
The reference tools build a copy of the reference genome for use with CGA Tools.
CGA Tools uses a specific reference sequence format called Compact Randomly Accessible Reference (CRR) file format to represent a reference sequence. The CRR format is designed to minimize memory usage and to support processing tasks that require random access to the reference.
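A common way to make a reference both compact and randomly accessible is to pack A/C/G/T at 2 bits per base and record runs of N separately. The sketch below illustrates that general technique; the actual CRR layout is defined in the CGA Tools documentation and may differ. The `PackedSequence` class is hypothetical, and it accepts only A, C, G, T, and N (other IUPAC codes raise `KeyError`).

```python
import bisect
from typing import List, Optional, Tuple

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


class PackedSequence:
    """Stores A/C/G/T at 2 bits per base (4 bases per byte); N runs kept as ranges."""

    def __init__(self, seq: str) -> None:
        self.length = len(seq)
        self.data = bytearray((len(seq) + 3) // 4)
        self.n_ranges: List[Tuple[int, int]] = []  # half-open [start, end) runs of N
        run_start: Optional[int] = None
        for i, base in enumerate(seq.upper()):
            if base == "N":
                if run_start is None:
                    run_start = i
                continue  # N positions keep 0 bits; n_ranges marks them
            if run_start is not None:
                self.n_ranges.append((run_start, i))
                run_start = None
            self.data[i // 4] |= _CODE[base] << (2 * (i % 4))
        if run_start is not None:
            self.n_ranges.append((run_start, self.length))
        # Ranges are built left to right, so their starts are already sorted.
        self.n_starts = [start for start, _ in self.n_ranges]

    def base_at(self, pos: int) -> str:
        """Random access: binary search over N runs, then a 2-bit lookup."""
        if not 0 <= pos < self.length:
            raise IndexError(pos)
        idx = bisect.bisect_right(self.n_starts, pos) - 1
        if idx >= 0 and pos < self.n_ranges[idx][1]:
            return "N"
        return _BASE[(self.data[pos // 4] >> (2 * (pos % 4))) & 0b11]

    def substring(self, begin: int, end: int) -> str:
        """Return bases in the half-open range [begin, end)."""
        return "".join(self.base_at(i) for i in range(begin, end))


if __name__ == "__main__":
    ref = PackedSequence("ACGTNNGA")
    print(ref.substring(2, 8), len(ref.data))
```

Storing N runs as ranges rather than giving N its own code keeps every remaining base at exactly 2 bits, which is what allows a position to be located with simple arithmetic.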