Installing Augustus with manual bamtools installation

Tags

, ,

Augustus is a very popular tool for gene annotation, however its installation process can be a bit tricky. For example, if we just download and install Augustus like below, changes are it will not work.

wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz
tar xvzf augustus-3.2.3.tar.gz
cd augustus-3.2.3
make

You will probably get these errors:

cd auxprogs && make
make[1]: Entering directory '/home/jxyue/Tools/augustus-3.2.3/auxprogs'
cd bam2hints; make;
make[2]: Entering directory '/home/jxyue/Tools/augustus-3.2.3/auxprogs/bam2hints'
g++ -Wall -O2    -c bam2hints.cc -o bam2hints.o -I/usr/include/bamtools 
bam2hints.cc:16:27: fatal error: api/BamReader.h: No such file or directory
 #include <api/BamReader.h>
                           ^
compilation terminated.
Makefile:29: recipe for target 'bam2hints.o' failed
make[2]: *** [bam2hints.o] Error 1
make[2]: Leaving directory '/home/jxyue/Tools/augustus-3.2.3/auxprogs/bam2hints'
Makefile:7: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/jxyue/Tools/augustus-3.2.3/auxprogs'
Makefile:7: recipe for target 'all' failed
make: *** [all] Error 2

This is because the auxiliary tools, bam2hints and filterBam, depend on the pre-installation of bamtools with proper library path configured.

Therefore, let’s install bamtools first. It will be much easier if we just do system-wide installation like:

sudo apt-get install bamtools libbamtools-dev

However, when we are working with a shared computing server and we do not have the root permission, we mostly likely will need install bamtools under our local directory like below.

git clone git://github.com/pezmaster31/bamtools.git
mkdir build
cd build
cmake ..
make

Now bamtools should have been correctly installed. Next, we need to modify the Makefiles of bam2hints and filterBam to adapt them with our manually installed bamtools.

First, go to the “augustus-3.2.3/auxprogs/bam2hints” directory and make the following changes for the Makefile:

Add:
BAMTOOLS = /your/path/to/bamtools

Replace:
INCLUDES = /usr/include/bamtools
By:
INCLUDES = $(BAMTOOLS)/include

Replace:
LIBS = -lbamtools -lz
By:
LIBS = $(BAMTOOLS)/lib/libbamtools.a -lz

Then, go to the “augustus-3.2.3/auxprogs/filterBam/src” directory and make the following changes for the Makefile:

Replace:

BAMTOOLS = /usr/include/bamtools
By:
BAMTOOLS = /your/path/to/bamtools

Replace:
INCLUDES = -I$(BAMTOOLS) -Iheaders -I./bamtools
By:
INCLUDES = -I$(BAMTOOLS)/include -Iheaders -I./bamtools

Replace:
LIBS = -lbamtools -lz
By:
LIBS = $(BAMTOOLS)/lib/libbamtools.a -lz

Now, we are finally ready to compile Augustus. Get back to the “augustus-3.2.3” directory and type “make”, viola!

References

https://github.com/pezmaster31/bamtools/wiki/Building-and-installing

https://www.biostars.org/p/189048/

Advertisements

Custom installation of PacBio’s GenomicConsensus (Quiver)

Tags

, , ,

It has always been a tricky experience for me to install individual PacBio’s SMRTanalysis tools on isolated machines that do not have the SMRTanalysis package pre-installed. Below is a summary of my recent custom installation of the GenomicConsensus (Quiver) tool, which is used for “polishing” (error-correction) PacBio assemblies.

First of all, I installed Conda to set up an independent python environment that is isolated from the python environment pre-configured for the server. To keep compatible with the SMRTanalysis tools,  I chose python2.7.9 for my conda python environment.

conda install python=2.7.9

Then log out and re-log in.

Install required Python libraries NumPy and h5py, which can be installed via:

conda install numpy=1.12.1 conda install h5py=2.7.0

Now we are ready to install required PacBio libraries

pip install git+https://github.com/PacificBiosciences/pbcore
git clone https://github.com/PacificBiosciences/ConsensusCore
cd ConsensusCore
python setup.py install --swig=$SWIG --boost=$BOOST

Here you should replace $SWIG with the path to your swig executable and $BOOST with the path to your boost installation (the top level directory). (Note that if SWIG is in your $PATH and boost is in /usr/local or /usr/include/, you do not need to specify these flags on the command line. setup.py will find them automatically).

Now, we can install GenomicConsensus (Quiver). Given the significant revision made for GenomicConsensus recently, here I install an older version (v1.1.0) to make sure the installation can succeed. ( I also tried the most recent version but it didn’t work.)

git clone https://github.com/PacificBiosciences/GenomicConsensus.git
cd GenomicConsensus
git reset --hard 654d0276d4a03f269cd1a14ddd7dfd0f54bede45
python setup.py install

Now, GenomicConsensus (Quiver) should have been successfully installed. Let’s test!

quiver -h

usage: variantCaller.py --referenceFilename REFERENCEFILENAME -o
                        OUTPUTFILENAMES [-j NUMWORKERS]
                        [--minConfidence MINCONFIDENCE]
                        [--minCoverage MINCOVERAGE]
                        [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                        [--coverage COVERAGE] [--minMapQV MINMAPQV]
                        [--referenceWindow REFERENCEWINDOWSASSTRING]
                        [--alignmentSetRefWindows]
                        [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                        [--barcode _BARCODE] [--readStratum READSTRATUM]
                        [--algorithm ALGORITHM]
                        [--parametersFile PARAMETERSFILE]
...

References

http://bioinfo-master.ird.fr:8080/smrtanalysis/doc/bioinformatics-tools/GenomicConsensus/doc/HowToQuiver.html

https://github.com/PacificBiosciences/GenomicConsensus/tree/654d0276d4a03f269cd1a14ddd7dfd0f54bede45

http://pb-falcon.readthedocs.io/en/latest/quick_start.html#quick-start

Compiling 64-bit MUMmer

Tags

I encountered the following error when running nucmer (part of the MUMmer package) for a large genome assembly:

mummer: suffix tree construction failed: textlen=727301215 larger than maximal textlen=536870908

It appears that we can work around this problem by compiling a 64-bit version of mummer. There are two ways of doing this:

  1. Edit the MUMmer3.23/src/kurtz/libbasedir/types.h file.  We need to add the following line “#define SIXTYFOURBITS” before the line “#ifdef SIXTYFOURBITS”. Then run make install to compile the package.
  2. When compiling the package, use the following command:
    make clean
    make CPPFLAGS=”-O3 -DSIXTYFOURBITS”

Either way should work.

References

http://seqanswers.com/wiki/Talk:MUMmer
https://sourceforge.net/p/mummer/discussion/451664/thread/59ad55aa/

Installing tRNAscan-SE and snoscan

Tags

, , ,

For tRNAscan:

wget http://lowelab.ucsc.edu/software/tRNAscan-SE.tar.gz
tar -zxvf tRNAscan-SE.tar.gz
ln -s tRNAscan-SE-1.3.1 tRNAscan-SE
cd tRNAscan-SE 

we might need to edit the Makefile if we want to customize the installation directory, here I used emacs for this:

 emacs Makefile

I modified the following lines from:

BINDIR  = $(HOME)/bin
LIBDIR  = $(HOME)/lib/tRNAscan-SE
MANDIR  = $(HOME)/man

to:

BINDIR  = $(HOME)/local/bin
LIBDIR  = $(HOME)/local/lib/tRNAscan-SE
MANDIR  = $(HOME)/local/share/man

Then, we are ready to compile the program.

make
make install
chmod 755 ~/local/bin/tRNAscan-SE

When running tRNAscan-SE, I got the following error message:
“Can’t locate tRNAscanSE/Utils.pm in @INC (@INC contains:…”

So we need to patch the following line in ~/local/bin/tRNAscan-SE at line 28 so that tRNAscan-SE can find the modules that it needs.

use lib "/home/<my_userid>/local/bin";

 

For snocan:

wget http://lowelab.ucsc.edu/snoRNAdb/code/snoscan-0.9b.tar.Z
tar xvzf snoscan-0.9b.tar.Z
ln -s snoscan-0.9b snoscan
cd snoscan

According to the README file, we need to compile squid-1.5j first, so

cd squid-1.5j

Again, we can customize the installation path like we did for tRNAscan-SE:

SQUIDHOME  = $(HOME)/local/lib/squid
BINDIR     = $(HOME)/local/bin
SCRIPTDIR  = $(HOME)/local/scripts
MANDIR     = $(HOME)/local/share/man

There is a type conflict in the original Makefile of squid-1.5j, so we need to do this:

sed -i 's/getline/getLine/g' sqio.c

then we can compile squid-1.5j:

make

Switch back to the directory of snocan, and modify the Makefile to set the installation path

cd ..
emacs Makefile

BINDIR  = $(HOME)/local/bin
MANDIR  = $(HOME)/local/share/man

make
make install

 

References

http://www.vcru.wisc.edu/simonlab/bioinformatics/programs/install/trnascan-se.htm
http://gif.biotech.iastate.edu/Tutorial/doku.php?id=errors:snoscan_error

 

Wrapping R functions for use in Perl scripts

Tags

,

It comes in handy when we can use R functions in our Perl scripts sometimes. So how could we do that?
Here is an exmaple:

Say we want to wrap the R function phyper(), dhyper(), ppois() and atan() as a Perl module named “rmath”,
first we need to create a file (here I named it as “rmath.in”) with the following content:

%module rmath
%{
       double phyper (double x, double NR, double NB, double n, int lower_tail, int log_p) ;
       double dhyper (double x, double NR, double NB, double n, int log_p) ;
       double ppois(double x, double lambda, int lower_tail, int log_p) ;
       double atan(double x) ;
%}
       double phyper (double x, double NR, double NB, double n, int lower_tail, int log_p) ;
       double dhyper (double x, double NR, double NB, double n, int log_p) ;
       double ppois(double x, double lambda, int lower_tail, int log_p) ;
       double atan(double x) ;

We will use swig (http://www.swig.org/) to wrap it. In Mac OSX, we can simply install it using homebrew (http://brew.sh/).


# install swig with homebrew
brew install swig

We need to download a temporary copy of R for compiling the standalone Rmath library.


# download R 
wget https://cran.r-project.org/src/base/R-3/R-3.2.1.tar.gz
tar xvzf R-3.2.1.tar.gz
ln -s R-3.2.1 R
cd R
./configure --prefix=/<Path to this temporary R>/R
cd ./src/nmath/standalone
make
make install

Now we use swig to wrap the R functions that we need into Perl modules.

swig -perl rmath.i 

rm *.o 
for f in *.c ; 
do 
gcc -I. -I../../../src/include -I../../../src/include -I./.. -I/usr/local/include -DHAVE_CONFIG_H -DMATHLIB_STANDALONE -arch x86_64 -g -O2 -std=gnu99 -c $f -o ${f/.c/.o} `perl -MExtUtils::Embed -e ccopts`;
ls -l ${f/.c/.o}; 
done rm sunif.o; 

gcc -bundle -flat_namespace -undefined dynamic_lookup -o rmath.bundle *.o
sudo cp rmath.bundle  /usr/local/ActivePerl-5.18/site/lib/ 
sudo cp rmath.pm  /usr/local/ActivePerl-5.18/site/lib/ 

Note: I am using ActivePerl-5.18 here since ActivePerl-5.22 doesn’t work for this for some reason.

References
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#The-standalone-Rmath-library

Installing blasr on the the CentOS/RHEL system

Tags

, ,

I am working heavily with PacBio data recently, so it will be nice to have PacBio’s blasr (https://github.com/PacificBiosciences/blasr) installed on our local server.

First, make sure the library glibc, glibc-devel and glibc-static have been pre-installed on the system. I got errors when compiling blasr without glibc-devl and glibc-static libraries.

Second, we need to install the hdf5 library. It is a bit tricky with regard to this. I firstly installed the latest version hdf5-1.8.15-patch1 but I got errors when compiling blasr with this version. I did a bit  research online and found out this is due to an API change of the hdf5 library happened since the version hdf5-1.8.12. So I removed the hdf5-1.8.15-patch1 and installed hdf5-1.8.11 instead. Here I recorded the commands that I used for installing the hdf501.8.11 library and blasr as follows.

#!/usr/bin/sh

# download and install hdf5-1.8.11
wget http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.11/src/hdf5-1.8.11.tar.gz
tar -xvzf hdf5-1.8.11.tar.gz
cd hdf5-1.8.11
./configure --enable-cxx --prefix=/home/yjx/local/lib/libhdf5 --with-zlib=/home/yjx/local/lib/libhdf5
# I don't have the root permission so I installed this library locally under my home directory (/home/yjx). 
# You can use "--prefix=/usr/local/lib/libhdf5 --with-zlib=/usr/local/lib/libhdf5" instead if you have root permission.

make
make check &gt; check.out
# Please check the content of the check.out file generated here to make sure everything is OK
make install &gt; install.log
cd ..


# download and install blasr
wget https://github.com/PacificBiosciences/blasr/archive/smrtanalysis-2.2.tar.gz
tar -xvzf smrtanalysis-2.2.tar.gz
cd blasr-smrtanalysis-2.2
# edit the common.mk file to set the path to your hdf library like this:
HDF5INCLUDEDIR ?= /home/yjx/local/lib/libhdf5/include
HDF5LIBDIR     ?= /home/yjx/local/lib/libhdf5/lib

# Now we are ready to compile blasr. Just type "make":
make
# We probably want to put the executable files that we compiled into a dedicated directory.
make install ASSEMBLY_HOME=/home/yjx/Programs/smrtanalysis/analysis 
# Now you should see all the executable files were copied to the following directory:
"/home/yjx/Programs/smrtanalysis/analysis/bin" 
# Then, add this path to your .bashrc or .bash_profile file.
PATH=$PATH:/home/yjx/Programs/smrtanalysis/analysis/bin
export PATH
# also add the following line:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yjx/local/lib/libhdf5/lib
# now restart a login session and type "blasr -h" to test. You should see the help information for 
# blasr correctly displayed.

 blasr -h
   Options for blasr 
   Basic usage: 'blasr reads.{fasta,bas.h5} genome.fasta [-options] 
 option    Description (default_value).

 Input Files.
   reads.fasta is a multi-fasta file of reads.  While any fasta file is valid input, 
               it is preferable to use pls.h5 or bas.h5 files because they contain
               more rich quality value information.

   reads.bas.h5|reads.pls.h5 Is the native output format in Hierarchical Data Format of 
               SMRT reads. This is the preferred input to blasr because rich quality
               value (insertion,deletion, and substitution quality values) information is 
               maintained.  The extra quality information improves variant detection and mapping
               speed.
...

Voilà!

References
https://github.com/PacificBiosciences/blasr/wiki/Blasr-Installation-Qs-&-As
http://www.vcru.wisc.edu/simonlab/bioinformatics/programs/install/blasr.htm

Install genewise on the CentOS/RHEL system

Tags

, ,

I am trying to deploy a gene annotation pipeline for eukaryote genomes recently. The genewise (or “Wise2”) package (http://www.ebi.ac.uk/~birney/wise2/) is one of many packages that I need for this purpose. The version that I am trying to install here is the latest one (v2.4.1), which can be obtained following this link: http://www.ebi.ac.uk/~birney/wise2/wise2.4.1.tar.gz . From the installation instruction, it seemed all I need to do is simply typing “make all” in the “./src” directory but in reality there are a few more steps needed before I get it compiled successfully. So I am recording these steps here for my future reference and hope it can help anyone who encountered the same problem.

#!/usr/bin/sh

# install glib2-devel package if it has not been installed on the system
<span class="im">sudo yum install -y glib2-devel</span>

# download and decompress the package
wget <a title="http://www.ebi.ac.uk/~birney/wise2/wise2.4.1.tar.gz" href="http://www.ebi.ac.uk/~birney/wise2/wise2.4.1.tar.gz" target="_blank">http://www.ebi.ac.uk/~birney/wise2/wise2.4.1.tar.gz</a>
tar xvzf wise2.4.1.tar.gz

# make some necessary changes in the makefile and source files.
cd wise2.4.1/src
# Here we need to modify makefile to replace "CC = cc" with "CC = gcc" in Line 25 and save the change
find ./ -name makefile |xargs sed -i 's/glib-config/pkg-config --libs glib-2.0/'
cd ./HMMer2/
sed 's/getline/getline_new/' sqio.c  > a &&  mv a sqio.c
cd ../models/
# Change phasemodel.c to replace "if( !isnumber(line[0]) ) {" with "if( !isdigit(line[0]) ) {" in Line 23
cd ..
make all

Now genewise should be correctly compiled and you can find in the “wise2.4.1/src/bin” directory.

References
https://www.biostars.org/p/87823/

Using LiftOver to convert genome assembly coordinates

Tags

, , ,

Suppose we have two different assemblies of the same organism, G1.fasta and G2.fasta, and we want to know the which regions in G2 correspond to our interested regions in G1, what should we do? UCSC LiftOver is a great tool to solve this kind of problems and here is how>

#!/usr/bin/sh

# split G2 into individual chromosome/scaffold (G2.$i.fa) and save them into ./G2_chr direcotry
# I did this step by using a perl script which I wrote, but you can also use UCSC faSplit tool I think.
# Say there are n chromosomes/scaffolds in the G2 assembly, so we will have G2.1.fa, G2.2.fa, ..., G2.n.fa.

# Split the G2 chromosomes/scaffolds into 3K chunks and make lift files
mkdir lift
mkdir split
for i in {1..n}
do
  faSplit size ./G2_chr/G2.$i.fa  3000 ./split/G2.$i.split  -lift=./lift/G2.$i.lft -oneFile
done

# run blat
mkdir psl
for i in {1..n}
do
  blat G1.fasta ./split/G2.$i.split.fa -t=dna -q=dna -tileSize=12 -fastMap -minIdentity=95 -noHead -minScore=100  ./psl/G2.$i.psl
done

# Change coordinates of .psl files to parent coordinate system
mkdir liftup
for i in {1..n}
do
	liftUp -pslQ ./liftup/G2.$i.liftup.psl ./lift/G2.$i.lft warn ./psl/G2.$i.psl
done

# Make chain files
mkdir chain_raw
for i in {1..n}
do
	axtChain -linearGap=medium  -faQ -faT -psl ./liftup/G2.$i.liftup.psl ./G1.fasta ./G2.fasta ./chain_raw/$i.chain
done

# Merge and sort chain files
chainMergeSort ./chain_raw/*.chain | chainSplit chain_split stdin

faSize G1.fasta  -detailed >G1.chr_length.txt
faSize G2.fasta  -detailed >G2.chr_length.txt

# Make alignment nets from chain files
mkdir net
for i in  ./chain_split/*.chain
do
 tag=${i/\.\/chain_split\//}
 chainNet $i ./G1.chr_length.txt ./G2.chr_length.txt ./net/$tag.net /dev/null
done

# Create liftOver chain file
mkdir over
for i in ./chain_split/*.chain
do
  tag=${i/\.\/chain_split\//}
  netChainSubset  ./net/$tag.net $i ./over/$tag.chain
done

cat ./over/*.chain >G1_G2.over.chain

# Make bed file to report converted coordinates. We can give the coordinates of our query regions (based on G1 assembly) in the input.bed file and liftOver will report the converted coordinates in conversion.bed file.
liftOver input.bed  ./G1_G2.over.chain conversion.bed unMapped

References
http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt
http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver
http://genomewiki.ucsc.edu/index.php/LiftOver_Howto

Installing Git via Macports on Mac OSX

Tags

, ,

As I wrote more and more scripts for my work, it become difficult to keep tracking different version of my scripts for different specific tasks. Adopting a source control management (SCM) tool has been put on my agenda. An SCM tool will be even more useful if one is participating a collaborative code development project. There are many such tools available today: Subversions, Mercurial, Git, CVS, etc. I choose Git not only because it is a neat solution in terms of version control but also due to the popularity of Git-based online code development platform, GitHub. Continue reading