The IGM has performed research diagnostic whole-exome or whole-genome sequencing on 5,261 CUIMC patients with presentations including undiagnosed diseases of childhood, chronic kidney disease, fetal anomalies and neurological diseases (with a focus on epilepsy). All of these patient genomes have been analyzed with a standardized diagnostic pipeline in the IGM in an effort to identify single genotypes that are responsible for disease. Diagnostic genotypes are those that the study team (a multidisciplinary team that includes population geneticists, molecular geneticists, clinicians, genetic counselors, bioinformaticians and analysts) considers, by consensus, likely to contribute to the patient’s presentation. The IGMDx database provides an easy-to-use interface to access all such diagnoses in disease-causing genes. Users can enter a gene of interest and quickly identify all patients with positive diagnoses in the indicated gene, including non-identifying diagnostic information about the patient and information about the relevant diagnostic mutation or mutations. Whether the variant was confirmed in a CLIA-approved laboratory is also indicated. If additional information is needed about patients carrying the indicated causal genotypes, please contact firstname.lastname@example.org.
Sequencing of DNA was performed by the Institute for Genomic Medicine. Samples were either exome sequenced using the Agilent All Exon (37MB, 50MB or 65MB) or the Nimblegen SeqCap EZ V2.0 or 3.0 Exome Enrichment kit, or whole-genome sequenced, using Illumina NovaSeq 6000 sequencers according to standard protocols.
The Illumina lane-level FASTQ files were aligned to the Human Reference Genome (NCBI Build 37) using the Illumina DRAGEN Alignment tool. Picard software was used to remove duplicate reads and process these lane-level SAM files, resulting in a sample-level BAM file that was used for variant calling. GATK was used to recalibrate base quality scores, realign around indels, and call variants. Variants were required to have a quality score (QUAL) of at least 30, a quality by depth (QD) score of at least 2, a mapping quality (MQ) score of at least 40, a genotype quality (GQ) score of at least 20, a read position rank sum score greater than -10 and at least 10x coverage. Additionally, variants were restricted according to VQSR tranche (calculated using the known SNV sites from HapMap v3.3, dbSNP, and the Omni chip array from the 1000 Genomes Project): the cutoffs were a tranche of 99.9% for SNVs and 99% for indels.
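The quality thresholds listed above can be read as a single pass/fail test per variant call. The following is only a minimal sketch of that logic in Python, not the IGM pipeline itself: the `VariantCall` record, its field names, and the `passes_filters` function are hypothetical, and the sketch assumes every annotation is present (how missing values are handled is not described here).

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical record holding the annotations named in the text."""
    qual: float               # variant quality score (QUAL)
    qd: float                 # quality by depth
    mq: float                 # mapping quality
    gq: float                 # genotype quality
    read_pos_rank_sum: float  # read position rank sum
    depth: int                # coverage at the site
    is_snv: bool              # True for an SNV, False for an indel
    vqsr_tranche: float       # upper bound (%) of the VQSR tranche of the call


def passes_filters(v: VariantCall) -> bool:
    """Apply the thresholds stated in the text to one call."""
    # Tranche cutoff differs by variant type: 99.9% for SNVs, 99% for indels.
    max_tranche = 99.9 if v.is_snv else 99.0
    return (
        v.qual >= 30
        and v.qd >= 2
        and v.mq >= 40
        and v.gq >= 20
        and v.read_pos_rank_sum > -10
        and v.depth >= 10
        and v.vqsr_tranche <= max_tranche
    )


# Illustrative values only; these are not real calls.
good_snv = VariantCall(qual=50, qd=5, mq=60, gq=40, read_pos_rank_sum=0.3,
                       depth=25, is_snv=True, vqsr_tranche=99.9)
low_qual = VariantCall(qual=29, qd=5, mq=60, gq=40, read_pos_rank_sum=0.3,
                       depth=25, is_snv=True, vqsr_tranche=99.0)
loose_indel = VariantCall(qual=50, qd=5, mq=60, gq=40, read_pos_rank_sum=0.3,
                          depth=25, is_snv=False, vqsr_tranche=99.9)
```

In this sketch an SNV in the 99.9% tranche is kept, while an indel in the same tranche is rejected because the indel cutoff is 99%.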
Variant calls were restricted to coordinates within the Consensus Coding Sequence (CCDS) release 20. All variants were annotated to Ensembl 87 using CLINEFF.
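Restricting calls to a set of coordinates amounts to an interval lookup per variant position. The sketch below illustrates one common way to do this with sorted, merged intervals and a binary search. It is an assumption for illustration, not a description of the IGM implementation: `build_index`, `in_regions`, and the example coordinates are hypothetical, and regions are assumed to be 1-based and inclusive.

```python
from bisect import bisect_right


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome and merge overlaps."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        merged = []
        for start, end in sorted(spans):
            # Merge with the previous span if they overlap or touch.
            if merged and start <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def in_regions(index, chrom, pos):
    """Return True if pos lies inside any region on chrom."""
    intervals = index.get(chrom, [])
    # Find the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos <= intervals[i][1]


# Made-up coordinates for illustration; not real CCDS regions.
example_regions = [("1", 100, 200), ("1", 150, 300), ("2", 50, 60)]
example_index = build_index(example_regions)
```

Merging overlapping regions first keeps the binary search correct, because each position then needs to be checked against only one candidate interval.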
All individual-level diagnostic evaluation is performed by IGM analysts using ATAV.
We will periodically add new disease-causing genes and their individual counts to the list.
Individual count N and Gene name