The data available for download here are based on the D1, D2, and D3 SNP sets.
Those interested in the D4 dataset should contact ttahira[at]

Version         Contents   Platform                    Annotation  Samples
Phase I (D1)    281K SNPs  Perlegen                    NCBI35      74 CHM
Phase II (D2)   581K SNPs  Perlegen + Affymetrix 500K  NCBI35      74 CHM
Phase III (D3)  876K SNPs  Affymetrix SNP 6.0          NCBI36      87 CHM

- The D2 dataset merges the genotypes presented in D1 with a subset of the genotypes determined using Affymetrix 500K arrays, followed by QC (Higasa K, et al., 2009).

- The D3 dataset was prepared essentially as described in Kukita et al. (2010), but the QC in the paper is more stringent, and two samples (CHM010 and CHM035) were further excluded because of grossly abnormal log2 ratio profiles.

Genotype Data of D2 and D3

The file "mole_info_DhaploD*.txt" extracted from the zip file is a tab-delimited plain text table, with the following columns:


Note that CHM genotypes are homozygous at all sites and are therefore shown as a single letter.

LD_bin Data of D2 and D3

The file "bin_*.gff" extracted from the zip file is a GFF-format file, as explained below:


SNPs with "score = 1" are tagSNPs, and SNPs with "score = 2" are best tagSNPs.
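
As an illustration of how the score field might be used, the following is a minimal sketch that selects SNPs by score from a "bin_*.gff" file. It assumes the file follows the common tab-delimited GFF layout, in which the sixth column holds the score; the function name, the demo lines, and the rs numbers in them are made up for this example and are not taken from the actual data.

```python
import io


def snps_with_score(handle, wanted_scores):
    """Return (seqname, start, score, attributes) for features with a wanted score.

    Assumes the usual GFF column order: seqname, source, feature,
    start, end, score, strand, frame, [attributes].
    """
    selected = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete GFF feature line
        try:
            score = int(float(fields[5]))
        except ValueError:
            continue  # e.g. "." used for a missing score
        if score in wanted_scores:
            attributes = fields[8] if len(fields) > 8 else ""
            selected.append((fields[0], int(fields[3]), score, attributes))
    return selected


# Made-up demo lines (placeholder identifiers, not real records).
DEMO_GFF = (
    "chr1\tdemo\tSNP\t1000\t1000\t1\t+\t.\trs0000001\n"
    "chr1\tdemo\tSNP\t2000\t2000\t2\t+\t.\trs0000002\n"
    "chr1\tdemo\tSNP\t3000\t3000\t0\t+\t.\trs0000003\n"
)

tag_snps = snps_with_score(io.StringIO(DEMO_GFF), {1})
best_tag_snps = snps_with_score(io.StringIO(DEMO_GFF), {2})
print(len(tag_snps), len(best_tag_snps))
```

For a real file, the handle would come from `open(path)` instead of `io.StringIO`. The scores are matched exactly, so pass `{1, 2}` if both tagSNPs and best tagSNPs are wanted.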

Genotype Data of D1

We prepared two compressed files for download. When the "" file is expanded, a single file containing SNP data on all chromosomes is produced. When the "" file is expanded, 23 files are produced, each containing SNP data for one chromosome (22 autosomes and the X chromosome). These files give the dbSNP ID, NCBI Build 35 coordinates, and genotype results for 74 complete hydatidiform mole (CHM) samples. Alleles are given for the (+) strand of the specified NCBI sequence. Each file is a tab-delimited plain text table with the following columns:

Column Name      Description
Refsnp_ID        refSNP rs number
Perlegen_ID      Perlegen unique identifier for this SNP
Chromosome       Chromosome: 01-22, X
Accession_ID     NCBI Build 35 sequence accession number
Contig_position  Position within the specified Build 35 sequence
Alleles          The SNP alleles, in arbitrary order
CHM001 ~         Haploid genotypes for the specified Japanese CHM sample identifier
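
The column layout above can be read with a few lines of Python. The following is a minimal sketch, assuming that each file starts with a header row, that the six descriptive columns come first in the order listed, and that every remaining column is one CHM sample. The demo row, its identifiers, and the "A/G" notation for the Alleles column are made-up placeholders, and the encoding of missing genotype calls is not specified here, so anything other than A, C, G, or T is simply ignored when counting.

```python
import csv
import io
from collections import Counter

# Number of descriptive columns before the per-sample genotype columns
# (assumption based on the column table).
FIXED_COLUMNS = 6


def read_d1_genotypes(handle):
    """Yield (rs_id, chromosome, position, alleles, genotypes) for each SNP row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # assumes the first line is a header row
    samples = header[FIXED_COLUMNS:]
    for row in reader:
        if not row:
            continue
        rs_id, _perlegen_id, chrom, _accession, pos, alleles = row[:FIXED_COLUMNS]
        genotypes = dict(zip(samples, row[FIXED_COLUMNS:]))
        yield rs_id, chrom, int(pos), alleles, genotypes


def allele_counts(genotypes):
    """Count haploid alleles across samples, ignoring non-A/C/G/T entries."""
    return Counter(g for g in genotypes.values() if g in {"A", "C", "G", "T"})


# Made-up demo table (placeholder identifiers, not real records).
DEMO_D1 = (
    "Refsnp_ID\tPerlegen_ID\tChromosome\tAccession_ID\tContig_position\t"
    "Alleles\tCHM001\tCHM002\tCHM003\n"
    "rs0000001\tP0000001\t01\tNT_000000\t12345\tA/G\tA\tG\tA\n"
)

for rs_id, chrom, pos, alleles, genotypes in read_d1_genotypes(io.StringIO(DEMO_D1)):
    print(rs_id, chrom, pos, dict(allele_counts(genotypes)))
```

Because CHM genotypes are haploid, each sample contributes one allele per SNP, so the counts can be turned into allele frequencies directly by dividing by the number of called samples.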

LD bin data of D1

The following files, extracted from the compressed file, contain detailed information on the LD bins.

File Name        Description
*_info.txt       Summary of LD_bin
                 (Chromosome, bin ID, #SNP, #tagSNP, #Unambiguous Haplotype)
*_rs_info.txt    SNP ID in each block/LD_bin
                 (Chromosome, bin ID, RS ID, Perlegen ID, position)
*_hap_info.txt   Frequency of haplotype in each LD_bin
                 (Chromosome, bin ID, haplotype ID, frequency, haplotype)
*_tag_info.txt   tagSNPs in each LD_bin
                 (Chromosome, bin ID, name, Perlegen ID)
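
As an example of working with these files, the sketch below finds the most frequent haplotype in each LD bin from a "*_hap_info.txt" file. It assumes the file is tab-delimited with the five fields in the order listed above and that the frequency is a decimal number; neither the delimiter nor the presence of a header row is stated here, so rows whose frequency field is not numeric are skipped. The function name and the demo rows are made up for illustration.

```python
import csv
import io


def most_frequent_haplotypes(handle):
    """Map (chromosome, bin_id) to (haplotype_id, frequency, haplotype).

    Assumed column order: chromosome, bin ID, haplotype ID,
    frequency, haplotype.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or incomplete lines
        chrom, bin_id, hap_id, freq_text, haplotype = row[:5]
        try:
            freq = float(freq_text)
        except ValueError:
            continue  # e.g. a header row, if the file has one
        key = (chrom, bin_id)
        if key not in best or freq > best[key][1]:
            best[key] = (hap_id, freq, haplotype)
    return best


# Made-up demo rows (placeholder identifiers, not real records).
DEMO_HAP = (
    "1\tbin1\th1\t0.6\tAG\n"
    "1\tbin1\th2\t0.4\tGA\n"
    "1\tbin2\th1\t1.0\tC\n"
)

for key, value in most_frequent_haplotypes(io.StringIO(DEMO_HAP)).items():
    print(key, value)
```

The same grouping key, (chromosome, bin ID), appears in all four file types, so it could also serve to join the haplotype table with the SNP and tagSNP tables.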

PML converted Data of D1

We have begun to provide our data in PML (Polymorphism Markup Language),
which is based on XML, to facilitate the portability of our data on SNPs and other sequence variations.
The available files are divided into header and readme files.
dhaplo_pml_bin.tar (17.3 MB), header and readme files.