27 May 2015

According to its authors, the biggest advantage of ANGSD is that “Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes. This is especially useful for low and medium depth data”. I forked the software to my own repo.

Install from GitHub

You also need htslib, which ANGSD uses to read BAM and CRAM files. CRAM is a compressed format for aligned reads, a more space-efficient alternative to BAM. I have to say that installation has become so easy with GitHub!

git clone https://github.com/samtools/htslib.git
git clone https://github.com/ANGSD/angsd.git
### build htslib first, then point the ANGSD build at it
cd htslib; make
cd ../angsd; make HTSSRC=../htslib

Preparation for BAM input

The following commands all follow ANGSD’s tutorial.

### download data
wget http://popgen.dk/software/download/angsd/bams.tar.gz
tar xf bams.tar.gz
### indexing them
for i in bams/*.bam; do samtools index "$i"; done
### create a list
ls bams/*.bam > bam.filelist
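
Before running ANGSD it can help to confirm that every BAM in the list exists and has an index next to it. This is only a minimal sketch: it assumes `samtools index` wrote each index as `<file>.bam.bai`, and the function name `check_bam_list` is made up for illustration.

```shell
#!/bin/sh
# check_bam_list LIST: report BAMs named in LIST that are missing or lack an index.
# Assumption: the index sits next to the BAM as <file>.bam.bai.
# Returns 0 when every entry is present and indexed, 1 otherwise.
check_bam_list() {
    list=$1
    rc=0
    while IFS= read -r bam; do
        # skip empty lines in the list
        [ -n "$bam" ] || continue
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam"
            rc=1
        elif [ ! -f "$bam.bai" ]; then
            echo "missing index: $bam.bai"
            rc=1
        fi
    done < "$list"
    return $rc
}
```

It could be used as `check_bam_list bam.filelist && echo "all BAMs indexed"`.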

SNP and genotype calling

SNPs are called from the allele frequencies estimated by -doMaf. Basically, ANGSD calls a SNP at a site when its minor allele frequency is significantly different from 0, with the significance threshold set by -SNP_pval. (Note: what about really rare alleles?)

### MAF for every basepair
angsd -bam bam.filelist -doMajorMinor 2 -doMaf 8 -doCounts 1 -out out
 
### SNP calling
angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -SNP_pval 1e-6 -doMajorMinor 1
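
The SNP calling run writes a gzipped table of per-site allele frequencies, and it is often handy to keep only sites above some frequency cutoff. Here is a rough sketch of that filter. The helper name `filter_maf` is hypothetical, and the frequency column is passed in by name rather than hard-coded, because the name depends on the -doMaf method used (check the header of your own output file).

```shell
#!/bin/sh
# filter_maf COLUMN MIN: read a whitespace-delimited table with a header row
# on stdin; print the header plus every row whose COLUMN value is >= MIN.
filter_maf() {
    awk -v col="$1" -v min="$2" '
        NR == 1 {
            # locate the requested column by its header name
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        # force numeric comparison on both sides
        $idx + 0 >= min + 0 { print }
    '
}
```

For example, `gunzip -c outfile.mafs.gz | filter_maf unknownEM 0.05`, where `outfile.mafs.gz` and the column name `unknownEM` are my assumptions about what the command above produces.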

### Genotype Likelihoods
angsd -bam bam.filelist -GL 1 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 2e-6 -out genolike -nThreads 10

### Genotype calling in one step
angsd -bam bam.filelist -GL 2 -out gatk_outfile -doMaf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doGeno 5 -doPost 1 -postCutoff 0.95

With the option -doPlink 1, it will output PLINK format.

Population Genetics


