Archive

Scripts

A recent paper by Earle et. al. nicely showed the use of linear mixed models to determine drug resistance related genetic variants. Part of the software provided is an R package called bugwas, which will make the nice plots in figure 1 for you.

Here are some notes on how to get it to run, and correctly format the input files

Getting gemma to work

You’ll need to use the author’s modified version of gemma, which can be downloaded here. This may not run on your system, due to the blas and lapack libraries being in unexpected places. You can easily solve this by making some symlinks.

First run ldd on the gemma executable to check which libraries cannot be found

ldd /nfs/users/nfs_j/jl11/installations/gemma.0.93b
 linux-vdso.so.1 => (0x00007fff28000000)
 libgsl.so.0 => /usr/lib/libgsl.so.0 (0x00007f9dfbcc0000)
 libgslcblas.so.0 => /usr/lib/libgslcblas.so.0 (0x00007f9dfba78000)
 libblas.so.3 => not found
 liblapack.so.3 => not found
 libstdc++.so.6 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libstdc++.so.6 (0x00007f9dfa8d0000)
 libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9dfa5d0000)
 libgcc_s.so.1 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgcc_s.so.1 (0x00007f9dfa3b8000)
 libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9df9ff8000)
 libgfortran.so.3 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgfortran.so.3 (0x00007f9df9cd8000)
 /lib64/ld-linux-x86-64.so.2 (0x00007f9dfc120000)
 libquadmath.so.0 => /software/hgi/pkglocal/gcc-4.9.1/lib/../lib64/libquadmath.so.0 (0x00007f9df9a98000)

In my case this was libblas and liblapack. Find the libraries using ‘locate liblapack’ etc. Above, I have created symlinks to these in a location in my LD_LIBRARY_PATH variable ‘/nfs/users/nfs_j/jl11/software/lib’ (you can make a similar directory and export it using ‘export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<new directory>’).

ln -s /usr/lib/lapack/liblapack.so /nfs/users/nfs_j/jl11/software/lib/liblapack.so.3
ln -s /usr/lib/libblas/libblas.so /nfs/users/nfs_j/jl11/software/lib/libblas.so.3

File formatting

The bugwas gen file format is as follows:
ps sample_1_name sample_2_name
23 A C
etc.

If you have a file readable into plink (stored as snps.bed, snps.bim and snps.fam) and bcftools, this is an easy conversion.

plink --bfile snps --recode vcf-iid --geno 0.05 --out bugwas
bcftools plugin missing2ref bugwas.vcf > bugwas2.vcf
bcftools query -f '%ID\t[%TGT\t]\n' bugwas2.vcf > bugwas_in.gen
sed -i -e 's/coor_//' -e 's/\/.//g' -e 's/\t$//' bugwas_in.gen
head -7 bugwas.vcf| tail -1 | cut -f 10- | sed 's/#/_/g'| awk '{print "ps\t" $0}' | cat - bugwas_in.gen > tmp.gen
mv tmp.gene bugwas_in.gen

As bugwas requires all sites to be imputed this code will take the major allele where any site is missing. An alternative would be to use the ancestral state, which the authors suggest using ClonalFrameML to do, though this will take longer to run.

If you’ve got a messy vcf rather than a bed as input a script I wrote may help you with missing/multiallelic sites.

Other pieces

You’ll also need gemma to generate the kinship matrix for the random effects:

gemma -bfile snps -gk 1 -o gemma_snps

You’ll need a phylogeny, which you can draw by standard methods (I’d recommend fasttree on your alignment if you’ve got > 1000 samples). If you’ve already got a tree make sure the tip labels match the samples in the gen file.

Run bugwas

The command you’ll need to run in R is of the form:

lin_loc(gen=”bugwas_in.gen”,pheno=”bugwas.pheno”,phylo=”fasttree.tr”,prefix=”gwas”,gem.path=”/nfs/users/nfs_j/jl11/installations/gemma.0.93b”, relmatrix=”output/gemma_snps.cXX.txt”, output.dir=”./”)

I’ve found for around 2 000 samples and 110 000 snps I needed around 15Gb of RAM and 4-5 hours to run.

Thermo Scientific NanoDrop range of UV/Vis spectrophotometers (http://www.nanodrop.com/Absorbance.aspx) seem to be pretty good to me — apart from their terrible software.

I used the NanoDrop 1000 Spectrophotometer, and found the software to be so unintuitive that I had to read the entirety of the manual for the sections I used (which can be found at http://goo.gl/smXCT). The two main points I would note are:

  • The data can be exported from the report – select ‘Show report’->’Save report’->’Full report’  which will save a .ndv file of all the spectra taken in the current session.
  • The absorbance data at 0.2mm (NB on the report it is shown for 0.1mm so you can visually confirm it is a tenth of the 1mm absorbance, see the wikipedia page on the Beer-Lambert law for an explanation of this linearity) is stored in C:\NanoDrop Data\User name\ HiAbs — there is no other way to get to it.

The .ndv files saved are tab-delimited files which you can load into spreadsheet software to manipulate and plot. As I wanted to do further spectral analysis this was a bit useless to me, so I wrote a quick perl script to convert these files into a set of .csv files (one for each spectrum listed in the .ndv file).

This is available at my other site (for now, this link will expire in July 2013) at http://users.ox.ac.uk/~ball3126/ndv-converter.pl and is also listed below. Run ‘ndv-converter.pl –help’ for a usage guide. The default behaviour is to convert all .ndv files in the current working directory

ndv-converter code listing (pdf)