Kyushu University
Definitive Haplotype Database (D-HaploDB)
version 4.1 (Dec, 2011)

Haplotype Browser
(D4.1 on GRCh37)


On this site, we present genome-wide definitive haplotypes, determined using a collection of 84 Japanese complete hydatidiform moles (CHMs), each carrying a genome derived from a single sperm. The haplotypes incorporate 1.7 million SNPs and 2,339 copy number variation regions (CNVRs), determined with high-throughput array-based oligonucleotide hybridization techniques (Affymetrix SNP 6.0 and Illumina 1M-duo).

The advantages of CHMs over conventional diploid cells for determining haplotype structures are as follows.

1. Haplotypes of CHMs can be read directly by genotyping, and no phase determinations are needed.

2. CNV segments (CNVSs) can be detected with a greater signal-to-noise ratio because all sites are homozygous.

3. There is no need to resolve heterozygous sites of overlapping CNVSs, which make identification of CNV alleles difficult in diploid data.
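The first advantage can be illustrated with a minimal sketch. It assumes a hypothetical call encoding ("AA", "BB", "NoCall") and, as a further assumption, masks any heterozygous call as "N", since a duplicated haploid genome should not produce one. The point is that each homozygous call yields the haplotype allele directly, with no phasing step.

```python
# Sketch only: the "AA"/"BB"/"NoCall" encoding and the masking of
# unexpected heterozygous calls as "N" are illustrative assumptions.

def calls_to_haplotype(calls):
    """Read one CHM's haplotype directly from its genotype calls.

    A CHM genome is a duplicated haploid genome, so every true call is
    homozygous and its single allele is the haplotype allele; no phase
    determination is needed.
    """
    alleles = []
    for call in calls:
        if len(call) == 2 and call[0] == call[1] and call[0] in "AB":
            alleles.append(call[0])  # homozygous call: take the allele
        else:
            alleles.append("N")      # no call, or a heterozygote a haploid genome should not give
    return "".join(alleles)


if __name__ == "__main__":
    print(calls_to_haplotype(["AA", "BB", "AB", "NoCall", "BB"]))  # ABNNB
```

In diploid samples the same calls would need statistical phase inference before a haplotype could be written down.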

The Definitive Haplotype Browser can be used to view the SNP alleles, linkage disequilibrium (LD) bins and CNVs determined in this project.


Version           Contents            Platform                               Annotation   Reference
Phase I (D1)      281K SNPs           Perlegen                               NCBI35       1, 2
Phase II (D2)     581K SNPs           Perlegen + Affymetrix 500K             NCBI35       3
Phase III (D3)    876K SNPs + CNVs    Affymetrix SNP 6.0                     NCBI36       4
Phase IV (D4.1)   1.7M SNPs + CNVs    Affymetrix SNP 6.0 + Illumina 1M-duo   GRCh37       5

News

2014-04-24,    Our paper about D-Haplo Phase IV (D4.1) was published online (see Reference 5).

2011-12-24,    D-Haplo Phase IV (D4.1) is now browsable. In this update, all data were reanalyzed using the latest array annotations (see the 2011-09-27 news item).

2011-09-27,    Note that D-Haplo Phase IV will be updated shortly using the latest versions of the array annotations (Illumina 1M-duov3_H and Affymetrix na32).

2011-06-23,    Genotypes, LD bin and CNV data for D-Haplo Phase IV release are now browsable.

2011-06-23,    CNV data for D-Haplo Phase III release are now browsable.

2010-05-28,    D-Haplo DB reopened after maintenance.

2009-09-03,    Genotype data for D-Haplo Phase II release are now downloadable.

2008-05-27,    Genotypes and LD bin data for D-Haplo Phase III release are now browsable.


Getting Started

You can get started browsing by selecting a chromosome, gene, genomic region, or reference SNP for study. You will then be able to customize your view using the functionality of the Generic Genome Browser.

By Chromosome
By Genomic Region
By Gene Name
By refSNP Identifier

Track descriptions of the Definitive Haplotype Browser


References

1. Kukita Y, Miyatake K, Stokowski R, Hinds D, Higasa K, Wake N, Hirakawa T, Kato H, Matsuda T, Pant K, Cox D, Tahira T, Hayashi K (2005) Genome-Wide Definitive Haplotypes Determined Using a Collection of Complete Hydatidiform Moles. Genome Res. 15: 1511-1518.

2. Higasa K, Miyatake K, Kukita Y, Tahira T, Hayashi K (2007) D-HaploDB: a database of definitive haplotypes determined by genotyping complete hydatidiform mole samples. Nucleic Acids Res. 35: D685-689.

3. Higasa K, Kukita Y, Kato K, Wake N, Tahira T, Hayashi K (2009) Evaluation of haplotype inference using definitive haplotype data obtained from complete hydatidiform moles, and its significance for the analyses of positively selected regions. PLoS Genet. 5(5): e1000468.

4. Kukita Y, Yahara K, Tahira T, Higasa K, Sonoda M, Yamamoto K, Kato K, Wake N, Hayashi K (2010) A definitive haplotype map as determined by genotyping duplicated haploid genomes finds a predominant haplotype preference at copy number variation events. Am. J. Hum. Genet. 86(6): 918-928.

5. Tahira T, Yahara K, Kukita Y, Higasa K, Kato K, Wake N, Hayashi K (2014) A definitive haplotype map of structural variations determined by microarray analysis of duplicated haploid genomes. Genomics Data 2: 55-59.


Track descriptions of the Definitive Haplotype Browser

Fig. 1. Example of a browser view (D4.1 on GRCh37)



Cytoband
Information (cytoBandIdeo.txt.gz) was obtained from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/).

Gene
Annotations in NCBI Build 37.2 (seq_contig.md.gz) were obtained from NCBI (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/), and genes labeled "GRCh37.p2-Primary Assembly" are displayed.

Transcripts
Annotations in NCBI Build 37.2 (seq_gene.md.gz) were obtained from NCBI (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/), and transcripts labeled "GRCh37.p2-Primary Assembly" are displayed.
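The Gene and Transcripts tracks both keep only records labeled "GRCh37.p2-Primary Assembly". A minimal filtering sketch follows; it assumes the annotation file is tab-delimited with a header line (optionally starting with "#") that names a "group_label" column. That column name and the function name are assumptions for illustration, not a description of the pipeline actually used here.

```python
import csv
import io

PRIMARY_ASSEMBLY = "GRCh37.p2-Primary Assembly"


def primary_assembly_rows(handle, label=PRIMARY_ASSEMBLY):
    """Yield rows of a tab-delimited seq_gene.md-style table whose
    'group_label' column equals `label`.

    Assumes the first line is a header naming the columns (a leading
    '#' is stripped) and that one column is called 'group_label'.
    """
    header = handle.readline().lstrip("#").rstrip("\n").split("\t")
    reader = csv.DictReader(handle, fieldnames=header,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get("group_label") == label:
            yield row


if __name__ == "__main__":
    # Hypothetical two-record table; GENE_X and GENE_Y are made-up names.
    demo = (
        "#tax_id\tchromosome\tfeature_name\tgroup_label\n"
        "9606\t1\tGENE_X\tGRCh37.p2-Primary Assembly\n"
        "9606\t1\tGENE_Y\tOTHER_ASSEMBLY\n"
    )
    for row in primary_assembly_rows(io.StringIO(demo)):
        print(row["feature_name"])  # GENE_X
```

For the compressed files, the same function can be given `gzip.open(path, "rt")` as its handle.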

CHMSNPs
SNPs genotyped in CHM samples with a call rate greater than 90% are shown. Before call-rate QC, genotype calls with a log2 ratio below our empirically determined threshold (< -0.6 for Affymetrix, < -1 for Illumina) were set to no-calls, because such calls proved unreliable in our analysis of "haploid" (thus all homozygous) CHM samples. Genotypes obtained with the Illumina and Affymetrix arrays were merged for each sample. For SNPs common to both arrays, the strand of some Illumina annotations was changed to match the Affymetrix annotation; strand-flip errors may still remain.
Details containing individual genotypes and allele counts are viewable by clicking the glyphs.
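The QC steps above can be sketched as follows, under stated assumptions: calls are stored per SNP as a list over samples (None meaning no-call), each call has a matching log2 ratio, and all function names are hypothetical. The strand helper is one plausible way to align Illumina alleles to the Affymetrix strand, not the procedure actually used; it also shows why palindromic (A/T, C/G) SNPs are one possible source of remaining strand-flip errors.

```python
# Illustrative sketch; data layout and function names are hypothetical.
LOG2_CUTOFF = {"affymetrix": -0.6, "illumina": -1.0}  # calls below these become no-calls
MIN_CALL_RATE = 0.90
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mask_low_signal(calls, log2_ratios, platform):
    """Force calls whose log2 ratio is below the platform cutoff to None (no-call)."""
    cutoff = LOG2_CUTOFF[platform]
    return [None if ratio < cutoff else call
            for call, ratio in zip(calls, log2_ratios)]


def call_rate(calls):
    """Fraction of samples with a non-missing call."""
    return sum(call is not None for call in calls) / len(calls)


def passing_snps(snp_calls, snp_log2, platform):
    """IDs of SNPs whose call rate after masking is greater than 90%."""
    return [snp for snp, calls in snp_calls.items()
            if call_rate(mask_low_signal(calls, snp_log2[snp], platform)) > MIN_CALL_RATE]


def harmonize_strand(illumina_alleles, affy_alleles):
    """Return Illumina alleles expressed on the Affymetrix strand.

    A/T and C/G SNPs have the same allele set on both strands, so they
    are never flipped here; a wrong strand for such SNPs goes undetected.
    """
    if set(illumina_alleles) == set(affy_alleles):
        return list(illumina_alleles)
    flipped = [a.translate(COMPLEMENT) for a in illumina_alleles]
    if set(flipped) == set(affy_alleles):
        return flipped
    raise ValueError("alleles do not match on either strand")
```

Masking happens before the call-rate filter, so a SNP with many low-signal calls drops out even if the genotyping software assigned it a genotype.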

LD bins
Pairwise r2 values were calculated for SNPs with a minor allele frequency (MAF) of at least 5%, using a maximum inter-marker distance of 300 kb. LD bins and tag SNPs were determined with TagZilla version 1.0, which builds bins using a greedy maximal approach similar to that of ldSelect (Carlson et al. AJHG 74: 106-120, 2004), with a threshold of r2 >= 0.80. In the LD bin track, the best tags (i.e., the tag SNP in each bin with the highest average r2 to the remaining members of the bin) are highlighted in red. Details containing SNP and haplotype information are viewable by clicking the glyphs.
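Because CHM haplotypes are observed directly, r2 can be computed from haplotype frequencies without estimating phase. The sketch below combines that with a simplified greedy, ldSelect-style binning using the parameters quoted above (MAF >= 5%, 300 kb, r2 >= 0.80). It is not TagZilla's code: alleles are assumed to be coded 0/1 (None for no-call), bins are seeded by the SNP linked to the most remaining SNPs, and the best tag is taken as the member with the highest mean r2 to the others.

```python
from itertools import combinations


def maf(hap):
    """Minor allele frequency of one SNP; alleles coded 0/1, None = no call."""
    called = [a for a in hap if a is not None]
    if not called:
        return 0.0
    p = sum(called) / len(called)
    return min(p, 1 - p)


def r_squared(x, y):
    """r2 between two SNPs from directly observed haploid haplotypes."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    pa = sum(a for a, _ in pairs) / n
    pb = sum(b for _, b in pairs) / n
    pab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # monomorphic in the jointly called samples
    d = pab - pa * pb
    return d * d / denom


def ld_bins(snps, max_dist=300_000, min_maf=0.05, threshold=0.80):
    """Greedy ldSelect-style binning (simplified; not TagZilla itself).

    snps maps SNP id -> (position_bp, haplotype alleles across CHMs).
    Returns a list of (best_tag, sorted_members) tuples.
    """
    ids = sorted(s for s, (_, hap) in snps.items() if maf(hap) >= min_maf)
    r2 = {}
    for a, b in combinations(ids, 2):
        if abs(snps[a][0] - snps[b][0]) <= max_dist:
            r2[a, b] = r2[b, a] = r_squared(snps[a][1], snps[b][1])

    def linked(s, pool):
        return {t for t in pool if t != s and r2.get((s, t), 0.0) >= threshold}

    def mean_r2(s, members):
        others = members - {s}
        if not others:
            return 1.0
        return sum(r2.get((s, t), 0.0) for t in others) / len(others)

    remaining, bins = set(ids), []
    while remaining:
        # Seed the next bin with the SNP linked to the most remaining SNPs.
        seed = max(sorted(remaining), key=lambda s: len(linked(s, remaining)))
        members = {seed} | linked(seed, remaining)
        best = max(sorted(members), key=lambda s: mean_r2(s, members))
        bins.append((best, sorted(members)))
        remaining -= members
    return bins
```

SNPs farther apart than `max_dist` are never compared, so they cannot share a bin even when their alleles are perfectly correlated.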

Fig. 2. Details of an LD bin


- In this version, haplotypes containing "N" (an undetermined allele) are counted as distinct, unambiguous haplotypes, so some haplotypes are split at the positions of "N".
- The LD metric D' (D prime) was not calculated in D-Haplo D4.1.
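The effect of counting "N" as an ordinary allele can be shown with a small sketch using made-up haplotype strings. Counting strings verbatim keeps "ANG" separate from "ACG", even though the two could be the same haplotype if "N" were treated as a wildcard; the `compatible` helper is a hypothetical illustration of that alternative, not something the database computes.

```python
from collections import Counter


def count_haplotypes(haplotypes):
    """Count haplotype strings verbatim, so 'N' acts as its own allele."""
    return Counter(haplotypes)


def compatible(h1, h2):
    """True if two haplotypes could be identical once 'N' is treated as a wildcard."""
    return len(h1) == len(h2) and all(a == b or "N" in (a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    haps = ["ACG", "ACG", "ANG", "TCA"]
    print(count_haplotypes(haps))    # Counter({'ACG': 2, 'ANG': 1, 'TCA': 1})
    print(compatible("ANG", "ACG"))  # True: 'ANG' may belong with 'ACG'
```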


CNV
CNV segments (segments that were deleted or amplified in each of the 84 CHMs) were identified separately in the Affymetrix SNP 6.0 and Illumina 1M-duo data sets. The circular binary segmentation (CBS) method (DNAcopy 1.24.0; Venkatraman & Olshen) was used to identify segments from the log2 ratio values. Segments (CNVSs) whose mean log2 ratio deviated from 0 were classified as deletions (< -1.0 for Affymetrix, < -1.5 for Illumina) or amplifications (> 0.5 for both data sets).

Color-codes for the CNVS are as follows.
Red : deletion (Affymetrix)
Light red : deletion (Illumina)
Blue : amplification (Affymetrix)
Light blue : amplification (Illumina)
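The classification thresholds and color codes above can be expressed as a short sketch. It assumes segmentation has already been done (for example with DNAcopy, which is not reproduced here) and that each segment's mean log2 ratio is available; the function names are hypothetical.

```python
# Thresholds and colors as stated in the text; function names are hypothetical.
DELETION_CUTOFF = {"affymetrix": -1.0, "illumina": -1.5}
AMPLIFICATION_CUTOFF = 0.5  # same for both platforms
COLORS = {
    ("deletion", "affymetrix"): "red",
    ("deletion", "illumina"): "light red",
    ("amplification", "affymetrix"): "blue",
    ("amplification", "illumina"): "light blue",
}


def classify_segment(mean_log2, platform):
    """Return 'deletion', 'amplification', or None for a segment's mean log2 ratio."""
    if mean_log2 < DELETION_CUTOFF[platform]:
        return "deletion"
    if mean_log2 > AMPLIFICATION_CUTOFF:
        return "amplification"
    return None


def segment_color(mean_log2, platform):
    """Track color for a CNVS, or None if the segment is not called as a CNV."""
    kind = classify_segment(mean_log2, platform)
    return None if kind is None else COLORS[(kind, platform)]
```

Note that a segment with a mean log2 ratio of -1.2 is a deletion on Affymetrix but not on Illumina, because the Illumina deletion threshold is stricter.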

In rare cases, the same segment was called as a CNVS of opposite direction (deletion versus amplification) in the Affymetrix and Illumina data sets. This apparent discrepancy arises from the different reference samples used to calculate log2 ratios on the two platforms.

CNV regions (CNVRs) in CHMs are defined by merging CNV segments (>= 50 bp) across CHM samples. CNVs identified by several other groups can also be displayed.
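A minimal sketch of building CNVRs from CNVSs pooled across samples. Two points are assumptions rather than statements from the text: the 50 bp limit is read as a minimum segment length (1-based inclusive coordinates), and segments are merged when they overlap. The function name is hypothetical.

```python
def merge_cnv_segments(segments, min_length=50):
    """Merge CNV segments from all CHM samples into CNV regions.

    segments: iterable of (chrom, start, end), 1-based inclusive.
    Segments shorter than min_length are discarded (assumed reading of
    the 50 bp limit); overlapping segments on the same chromosome merge.
    """
    kept = sorted((c, s, e) for c, s, e in segments if e - s + 1 >= min_length)
    regions = []
    for chrom, start, end in kept:
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            regions[-1][2] = max(regions[-1][2], end)  # extend the current region
        else:
            regions.append([chrom, start, end])        # start a new region
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    segs = [("chr1", 100, 400), ("chr1", 300, 900), ("chr1", 2000, 2030),
            ("chr1", 5000, 5600), ("chr2", 100, 200)]
    print(merge_cnv_segments(segs))
    # [('chr1', 100, 900), ('chr1', 5000, 5600), ('chr2', 100, 200)]
```

Sorting first means each segment only needs to be compared with the most recently opened region.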

LD plot
A plot of pairwise LD values (r2) is generated by the LDplot plugin. Each box is colored according to the r2 value of the marker pair, on a scale from 0 (white) to 1 (red).
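The white-to-red scale can be sketched as a simple color mapping. The linear interpolation used here is an assumption; the LDplot plugin's actual scale may differ, and the function names are hypothetical.

```python
def r2_to_rgb(r2):
    """Map r2 in [0, 1] to a white-to-red RGB triple (linear scale assumed)."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    fade = round(255 * (1.0 - r2))  # green and blue fade out as LD increases
    return (255, fade, fade)


def r2_to_hex(r2):
    """Same mapping as a hex color string, e.g. for an image or SVG."""
    return "#{:02x}{:02x}{:02x}".format(*r2_to_rgb(r2))
```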


For questions or comments regarding this site, please contact hayashi.kenshi[at]gmail.com