Data download
The data available for download here are based on the D1, D2, and D3 SNP sets. Those interested in D4 or later datasets should contact ttahira[at]gen.kyushu-u.ac.jp.
| Version | Contents | Platform | Annotation | Samples |
| Phase I (D1) | 281K SNPs | Perlegen | NCBI35 | 74CHM |
| Phase II (D2) | 581K SNPs | Perlegen + Affymetrix 500K | NCBI35 | 74CHM |
| Phase III (D3) | 876K SNPs | Affymetrix SNP 6.0 | NCBI36 | 87CHM |
- The D2 dataset is a merge of the D1 genotypes and a subset of the genotypes determined using Affymetrix 500K arrays, followed by QC (Higasa K, et al., 2009).
- The D3 dataset was prepared essentially as described in Kukita et al. (2010), but the QC in the paper is more stringent, and two samples (CHM010 and CHM035) were further excluded due to grossly abnormal log2 ratio profiles.
Genotype Data of D2 and D3
The file "mole_info_DhaploD*.txt" extracted from the zip file is a tab-delimited plain text table, with the following columns:
Note that CHM genotypes are homozygous at all sites and are therefore shown as a single letter.
LD_bin Data of D2 and D3
The file "bin_*.gff" extracted from the zip file is a GFF-format file, as explained below:
SNPs with "score = 1" are tagSNPs, and those with "score = 2" are best tagSNPs.
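The score convention above can be used to pull tagSNPs out of a "bin_*.gff" file. The following is a minimal sketch, not a description of how the files were produced. It assumes the conventional nine-column, tab-delimited GFF layout (seqname, source, feature, start, end, score, strand, frame, attribute) and assumes that best tagSNPs (score 2) also count as tagSNPs. The function name `tag_snps` is hypothetical.

```python
def tag_snps(lines, best_only=False):
    """Yield (seqname, start, attribute) for tagSNP rows of a GFF stream.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: the score is the sixth tab-delimited column.
    """
    # Assumption: score 2 (best tagSNP) is a special case of a tagSNP.
    wanted = {2.0} if best_only else {1.0, 2.0}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comment/pragma lines
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # too few columns to carry a score
        try:
            score = float(fields[5])
        except ValueError:
            continue  # e.g. "." for a missing score
        if score in wanted:
            attribute = fields[8] if len(fields) > 8 else ""
            yield fields[0], int(fields[3]), attribute
```

Passing an open file handle, for example `tag_snps(open(path), best_only=True)`, would list only the best tagSNPs; `path` stands for whichever file the archive yields.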
Genotype Data of D1
We prepared two compressed files for download. Expanding "gen_data_all.zip" produces a single file containing SNP data for all chromosomes. Expanding "gen_data.zip" produces 23 files, one per chromosome (22 autosomes and the X chromosome). These files give the dbSNP ID, NCBI Build 35 coordinates, and genotype results for 74 complete hydatidiform mole (CHM) samples. Alleles are given for the (+) strand of the specified NCBI sequence. Each file is a tab-delimited plain text table with the following columns:
| Column Name | Description |
| Refsnp_ID | refSNP rs number |
| Perlegen_ID | Perlegen unique identifier for this SNP |
| Chromosome | Chromosome: 01-22, X |
| Accession_ID | NCBI Build 35 sequence accession number |
| Contig_position | Position within the specified Build 35 sequence |
| Alleles | The SNP alleles, in arbitrary order |
| CHM001 ~ | Haploid genotypes for the specified Japanese CHM sample identifier |
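As an illustration of how the column layout above might be consumed, the sketch below tallies the haploid alleles observed at each SNP. It assumes that the first line of each file is a header row carrying the column names listed in the table, that every column other than the six annotation columns is a per-sample genotype column, and that any value other than A, C, G, or T marks a missing call. None of these assumptions is confirmed by the description above, and the function name `allele_counts` is hypothetical.

```python
import csv
from collections import Counter

# Annotation columns named in the table; all remaining columns are
# assumed to be per-sample genotype columns (CHM001, CHM002, ...).
ANNOTATION_COLUMNS = {
    "Refsnp_ID", "Perlegen_ID", "Chromosome",
    "Accession_ID", "Contig_position", "Alleles",
}
BASES = {"A", "C", "G", "T"}


def allele_counts(lines):
    """Yield (Refsnp_ID, Counter of haploid alleles) for each SNP row.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: the first line is a tab-delimited header row.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    sample_columns = [
        name for name in (reader.fieldnames or [])
        if name not in ANNOTATION_COLUMNS
    ]
    for row in reader:
        # Assumption: anything that is not a single base is a missing call.
        counts = Counter(
            row[name] for name in sample_columns if row[name] in BASES
        )
        yield row["Refsnp_ID"], counts
```

Because each CHM genotype is haploid, the counts divided by the number of called samples give allele frequencies directly, with no need to halve diploid genotype counts.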
LD bin data of D1
The following files, extracted from LDbin.zip, contain detailed information on the LD bins.
| File Name | Description |
| *_info.txt | Summary of LD_bin (Chromosome, bin ID, #SNP, #tagSNP, #Unambiguous Haplotype) |
| *_rs_info.txt | SNP ID in each block/LD_bin (Chromosome, bin ID, RS ID, Perlegen ID, position) |
| *_hap_info.txt | Frequency of haplotype in each LD_bin (Chromosome, bin ID, haplotype ID, frequency, haplotype) |
| *_tag_info.txt | tagSNPs in each LD_bin (Chromosome, bin ID, name, Perlegen ID) |
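The field lists above suggest a simple way to summarize a "*_hap_info.txt" file, for example by picking the most frequent haplotype in each LD bin. The sketch below is written under stated assumptions only: the fields are taken to be tab-delimited and in the order listed (Chromosome, bin ID, haplotype ID, frequency, haplotype), and any line whose frequency field is not numeric, such as a possible header row, is skipped. The function name `most_frequent_haplotypes` is hypothetical.

```python
def most_frequent_haplotypes(lines, delimiter="\t"):
    """Map (chromosome, bin ID) to its (haplotype ID, frequency, haplotype).

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: the five fields appear in the order listed in the table.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) < 5:
            continue  # not a complete record
        chromosome, bin_id, hap_id, freq_text, haplotype = fields[:5]
        try:
            freq = float(freq_text)
        except ValueError:
            continue  # header row or malformed frequency
        key = (chromosome, bin_id)
        # Keep the first haplotype seen at the highest frequency.
        if key not in best or freq > best[key][1]:
            best[key] = (hap_id, freq, haplotype)
    return best
```

If the files turn out to use a different delimiter, it can be supplied through the `delimiter` argument.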
PML converted Data of D1
We have begun providing our data in PML (Polymorphism Markup Language), which is based on XML, to facilitate the portability of our data on SNPs and other sequence variations.
The files are available as dhaplo_pml.tar (49.5 MB), which is divided into dhaplo_pml_snp.tar (32.2 MB), dhaplo_pml_bin.tar (17.3 MB), and header and readme files.