Genotype Data of D2 SNPs
When the "mole_info_DhaploD2.zip" file is expanded, it produces a single file containing SNP data for all chromosomes. The file gives genotype results for 581,235 SNPs in 74 complete hydatidiform mole (CHM) samples. This dataset merges the genotypes presented in D1 with a subset of the genotypes determined using Affymetrix 500K Arrays and filtered by quality control (QC) (Higasa K, Kukita Y, Kato K, Wake N, Tahira T, Hayashi K, "Evaluation of haplotype inference using definitive haplotype data obtained from complete hydatidiform moles, and its significance for the analyses of positively selected regions". PLoS Genetics 5:e1000468, 2009). The file is a tab-delimited plain-text table with the following columns:
| Column Name | Description |
| rs | rs number |
| chr | chromosome: 1-22, X |
| pos | position in chromosome (NCBI Build 35) |
| allele1 | allele 1 nucleotide |
| allele2 | allele 2 nucleotide |
| gtype | genotype results for 74 CHM samples |
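A minimal Python sketch for reading the D2 table, assuming the expanded file starts with one header line holding the column names above. The encoding of the `gtype` field is not described here, so the parser keeps it as a raw string; `split_calls`, which assumes one character per sample with no separators, is hypothetical and should be checked against the actual file before use.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, List

N_SAMPLES = 74  # number of CHM samples stated for the D2 file


@dataclass
class D2Snp:
    rs: str       # rs number
    chrom: str    # "1".."22" or "X"
    pos: int      # position on NCBI Build 35
    allele1: str
    allele2: str
    gtype: str    # raw genotype field for all samples, left unparsed


def read_d2(lines: Iterable[str], has_header: bool = True) -> Iterator[D2Snp]:
    """Yield one D2Snp per non-blank line of the tab-delimited D2 table."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:  # ASSUMPTION: the first line holds the column names
        next(reader, None)
    for row in reader:
        if not row:  # skip blank lines
            continue
        rs, chrom, pos, allele1, allele2, gtype = row[:6]
        yield D2Snp(rs, chrom, int(pos), allele1, allele2, gtype)


def split_calls(gtype: str) -> List[str]:
    """HYPOTHETICAL encoding: one character per sample, no separators."""
    if len(gtype) != N_SAMPLES:
        raise ValueError(f"expected {N_SAMPLES} calls, got {len(gtype)}")
    return list(gtype)
```

A file handle opened with `open(path, newline="")` can be passed directly as `lines`; the name of the expanded file is not given above, so substitute whatever the archive produces.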
Genotype Data of D1 SNPs
Two compressed files are provided for download. When "gen_data_all.zip" is expanded, it produces a single file containing SNP data for all chromosomes. When "gen_data.zip" is expanded, it produces 23 files, one per chromosome (22 autosomes and the X chromosome). These files give the dbSNP ID, NCBI Build 35 coordinates, and genotype results for 74 complete hydatidiform mole (CHM) samples. Alleles are given for the (+) strand of the specified NCBI sequence. Each file is a tab-delimited plain-text table with the following columns:
| Column Name | Description |
| Refsnp_ID | refSNP rs number |
| Perlegen_ID | Perlegen unique identifier for this SNP |
| Chromosome | Chromosome: 01-22, X |
| Accession_ID | NCBI Build 35 sequence accession number |
| Contig_position | Position within the specified Build 35 sequence |
| Alleles | The SNP alleles, in arbitrary order |
| CHM001, ... | Haploid genotype for the Japanese CHM sample named in the column header (one column per sample) |
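Because each CHM sample is haploid, every called sample contributes exactly one allele, so allele frequencies can be computed by simple counting across the `CHM...` columns. The sketch below assumes the first line of each file contains the column names listed above and that each sample cell holds a single nucleotide, treating any other value as missing. These are assumptions rather than documented encodings, and the function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# ASSUMPTION: each haploid call is one nucleotide; any other value is treated as missing.
VALID_CALLS = {"A", "C", "G", "T"}


def allele_counts(lines: Iterable[str]) -> Iterator[Tuple[str, Counter]]:
    """Yield (Refsnp_ID, Counter of called alleles over all CHM columns) per SNP."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # ASSUMPTION: header uses the column names in the table
    rs_col = header.index("Refsnp_ID")
    sample_cols = [i for i, name in enumerate(header) if name.startswith("CHM")]
    for row in reader:
        if not row:  # skip blank lines
            continue
        calls = (row[i].strip().upper() for i in sample_cols)
        yield row[rs_col], Counter(c for c in calls if c in VALID_CALLS)


def minor_allele_frequency(counts: Counter) -> Optional[float]:
    """Haploid samples: allele count divided by number of called samples."""
    total = sum(counts.values())
    if total == 0:
        return None  # no usable calls
    if len(counts) < 2:
        return 0.0  # monomorphic in these samples
    return min(counts.values()) / total
```

D1 labels chromosomes `01`-`22` and `X`, whereas D2 uses `1`-`22` and `X`, and D1 positions are relative to the Build 35 accession rather than the chromosome; joining the two datasets on the rs number avoids both differences for SNPs that have one.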
Annotation Data
The following two files give detailed information on haplotype blocks and LD bins.
Each contains four tab-delimited text files:
| File Name | Description |
| *_info.txt | Summary of each block/LD bin (columns: Chromosome, block/bin ID, #SNP, #tagSNP, #Unambiguous Haplotype) |
| *_rs_info.txt | SNP IDs in each block/LD bin (columns: Chromosome, block/bin ID, RS ID, Perlegen ID, position) |
| *_hap_info.txt | Haplotype frequencies in each block/LD bin (columns: Chromosome, block/bin ID, haplotype ID, frequency, haplotype) |
| *_tag_info.txt | tagSNPs in each block/LD bin (columns: Chromosome, block/bin ID, name, Perlegen ID) |
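The `*_info.txt` and `*_rs_info.txt` files share the (Chromosome, block/bin ID) key, so the per-block #SNP count can be cross-checked against the SNP rows. A minimal sketch, assuming both files have a single header line (controlled by `has_header`) and the column order given in the table; the file-name prefixes are elided above (`*`), and the function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (Chromosome, block/bin ID)


def read_rows(lines: Iterable[str], has_header: bool) -> List[List[str]]:
    """Parse tab-delimited lines, optionally skipping one header line and blank lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:  # ASSUMPTION: one header line per file
        next(reader, None)
    return [row for row in reader if row]


def snps_by_block(rs_info_lines: Iterable[str], has_header: bool = True) -> Dict[Key, List[str]]:
    """Group RS IDs from a *_rs_info.txt file by (Chromosome, block/bin ID)."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for row in read_rows(rs_info_lines, has_header):
        chrom, block_id, rs_id = row[0], row[1], row[2]  # Perlegen ID and position ignored
        groups[(chrom, block_id)].append(rs_id)
    return dict(groups)


def mismatched_snp_counts(info_lines: Iterable[str], rs_info_lines: Iterable[str],
                          has_header: bool = True) -> List[Key]:
    """Return keys whose #SNP in *_info.txt differs from the rows in *_rs_info.txt."""
    groups = snps_by_block(rs_info_lines, has_header)
    bad: List[Key] = []
    for row in read_rows(info_lines, has_header):
        key, n_snp = (row[0], row[1]), int(row[2])
        if len(groups.get(key, [])) != n_snp:
            bad.append(key)
    return bad
```

The same grouping key works for `*_hap_info.txt` and `*_tag_info.txt`, so haplotypes and tagSNPs can be attached to each block in the same way.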
PML-Converted Data
We have begun providing our data in PML (Polymorphism Markup Language), an XML-based format, to improve the portability of our data on SNPs and other sequence variations.
The files are available as dhaplo_pml.tar (49.5 MB), which is divided into dhaplo_pml_snp.tar (32.2 MB), dhaplo_pml_bin.tar (17.3 MB), and header and readme files.
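The PML schema itself is not described here, so the following sketch only inspects the archive layout with Python's standard `tarfile` module. It assumes `dhaplo_pml_snp.tar` and `dhaplo_pml_bin.tar` are stored as members of `dhaplo_pml.tar`, and it lists the contents of each nested archive without extracting anything to disk; `list_pml_archive` is a hypothetical helper name.

```python
import tarfile
from typing import Dict, List


def list_pml_archive(path: str) -> Dict[str, List[str]]:
    """Map each member of the outer tar to the member names of a nested .tar (else [])."""
    listing: Dict[str, List[str]] = {}
    with tarfile.open(path) as outer:  # default mode "r" detects compression automatically
        for member in outer.getmembers():
            nested: List[str] = []
            if member.isfile() and member.name.endswith(".tar"):
                inner_fobj = outer.extractfile(member)  # read in place, nothing written to disk
                with tarfile.open(fileobj=inner_fobj) as inner:
                    nested = inner.getnames()
            listing[member.name] = nested
    return listing
```

For example, `for name, inner in list_pml_archive("dhaplo_pml.tar").items(): print(name, len(inner))` prints each top-level member with the number of files it contains, which is a quick way to locate the XML files before choosing a parser.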