HGDP
The Human Genome Diversity Project (HGDP) is the best resource for a diverse set of genomic data. It has 1,050 individuals from 52 different populations. I got the Stanford University data, which has data...
Admixture: Reference Population
For regular admixture analysis, I am using HapMap, HGDP, SGVP and Behar datasets with some samples removed as I wrote earlier. For each of these datasets, I first filtered to keep only the list of SNPs...
Reference Dataset II
Combining my reference population with the Xing et al. data gets me 3,161 samples but with only about 23,000 SNPs after LD-pruning. The good thing is that this dataset has 544 South Asian samples from...
San and Pygmy
I have removed San and Pygmy groups from my reference datasets. That meant removing 39 samples from Reference Data I and 61 samples from Reference Data II. The presence of those groups was creating...
HGDP to PED Conversion
For converting the HGDP data (from Stanford University) to Plink PED format, I used the following code. #!/bin/bash dos2unix HGDP_FinalReport_Forward.txt dos2unix HGDP_Map.txt dos2unix...
One PED File to Rule Them All
I am interested in North African populations due to my own heritage, so when Razib alerted me that Henn et al had a paper out about South African origins of humans and their African dataset was...
Introducing Reference 3
Having collected 12 datasets, I have gone through them and finally selected the samples and SNPs I want to include in my new dataset, which I'll call Reference 3. It has 3,889 individuals and 217,957...
Burusho Kalash HarappaWorld Admixture
Someone asked for the individual HarappaWorld Admixture results for the Burusho and Kalash from HGDP. In the chart below as well as in the spreadsheet, the IDs starting with "b" belong to the Burusho...