
The UCSC Genome Browser website is surely one of the must-know sites for both experimental and computational biologists. As its name suggests, it is very good for visualizing and analyzing genomics data without getting bogged down in bioinformatics details. You can even do quite powerful data mining on your favorite organisms or genes with just a few clicks. A good tutorial for the UCSC Genome Browser, and for another powerful genome browser called Ensembl, can be found here. Behind the nice, intuitively designed web interface, many programs and utilities are organized into integrated pipelines that do the actual work. These utilities (called the Kent Source Utilities, since they were originally written by Dr. Jim Kent) can be of great help in our own data analysis if we have a copy installed on our local machine. The UCSC Genome Browser website provides some of the most commonly used tools for standalone use, but this is only a subset of all their tools. If we need the others, we have no choice but to build our own copy from the source code. This is what we are going to do in this post.

1) MySQL is a prerequisite for the build. Although MySQL is installed on the Rice SUG@R cluster, I cannot find the file “libmysqlclient.a” there, and the UCSC Kent Utilities build needs it. So I had to install a copy of MySQL in my home directory. At the time of writing, the newest version of MySQL is 5.5.24, and it can be downloaded here. I downloaded the source code version for Generic Linux (Architecture Independent). This version of MySQL needs CMake to compile, so before we can do anything with MySQL, we have to back up a step and install CMake first.

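> wget http://www.cmake.org/files/v2.8/cmake-2.8.8.tar.gz
> tar -xzvf cmake-2.8.8.tar.gz
> cd cmake-2.8.8
> ./configure --prefix=/users/NetID/local/
> make
> make install
> export PATH=/users/NetID/local/bin:$PATH # make sure this new cmake is found before any system copy
> cd ..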

Then install MySQL using CMake:

> tar xvzf mysql-5.5.24.tar.gz
> cd mysql-5.5.24
> mkdir -p /users/NetID/local/mysql
> cmake . -DCMAKE_INSTALL_PREFIX=/users/NetID/local/mysql # "." is the MySQL source directory
> make
> make install

Of course, MySQL still needs further configuration before it works well as a server. But since SUG@R already provides MySQL and we only need one file from this installation, we can take care of those configuration steps later, when we actually need to run our local copy of MySQL.
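Since this whole MySQL detour exists only to obtain libmysqlclient.a, it may help to check where (or whether) a copy exists before building, and to confirm the new one afterwards. This is a minimal sketch, assuming the library sits in a lib, lib64, lib/mysql or lib64/mysql subdirectory of some prefix; `find_mysqlclient` is a hypothetical helper, and the prefixes in the usage line are only examples.

```shell
# find_mysqlclient PREFIX...
# Prints the first libmysqlclient.a found under the given prefixes, checking
# <prefix>/lib, <prefix>/lib64, <prefix>/lib/mysql and <prefix>/lib64/mysql.
# Returns 1 if none is found. The subdirectory list is an assumption.
find_mysqlclient() {
    for prefix in "$@"; do
        for sub in lib lib64 lib/mysql lib64/mysql; do
            candidate="$prefix/$sub/libmysqlclient.a"
            if [ -f "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    echo "libmysqlclient.a not found" >&2
    return 1
}

# Usage (example prefixes; the first is the local install from this post):
# find_mysqlclient /users/NetID/local/mysql /usr
```

Listing the local prefix first means the freshly built library wins over any system copy.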

2) Now we can finally work on building the UCSC Kent Utilities. Check the shell environment first. We need to change the value of MACHTYPE to x86_64 for our build, export it so that make can see it, and also set it in the .bashrc file in my home directory.

> echo $MACHTYPE # on SUG@R this prints x86_64-redhat-linux-gnu
> export MACHTYPE=x86_64
> emacs ~/.bashrc
> export MACHTYPE=x86_64 # add this line into the .bashrc file
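Editing .bashrc by hand works, but if these setup steps are ever rerun (for example on another account), it is easy to end up with duplicate lines. This is a minimal sketch of an idempotent helper, assuming a POSIX shell with grep and tail; `ensure_line` is a hypothetical name and not part of the Kent source.

```shell
# ensure_line FILE LINE
# Appends LINE to FILE unless an identical line is already there, so running
# the setup twice does not duplicate it.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        # If the file does not end with a newline, add one first so the new
        # text starts on its own line.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file"
        fi
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage:
# ensure_line "$HOME/.bashrc" 'export MACHTYPE=x86_64'
```

The `-x` and `-F` flags make grep match the whole line literally, so a commented-out or slightly different line does not count as present.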

3) Under the $HOME/bin/ directory, create a directory named x86_64 (the value of $MACHTYPE) to hold the binaries produced by the build.

> mkdir -p $HOME/bin/${MACHTYPE}

4) Set and export the MySQL environment variables the build needs:

> export MYSQLINC=/usr/include/mysql
> export MYSQLLIBS="/users/NetID/local/mysql/lib/libmysqlclient.a -lz"
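A typo in either variable usually only shows up deep in the compile as a missing header or an undefined symbol. A quick sanity check before running make can catch that earlier. This sketch assumes MYSQLINC is a directory containing mysql.h and that the first word of MYSQLLIBS is the path to the static library; `check_mysql_vars` is a hypothetical helper.

```shell
# check_mysql_vars
# Sanity-checks MYSQLINC and MYSQLLIBS before running make.
# Assumptions: MYSQLINC contains mysql.h, and the first word of MYSQLLIBS is
# the path to the static client library (the rest are linker flags).
check_mysql_vars() {
    # make only sees variables that are exported to the environment.
    for var in MYSQLINC MYSQLLIBS; do
        if ! printenv "$var" > /dev/null; then
            echo "$var is not exported; make will not see it" >&2
            return 1
        fi
    done
    if [ ! -f "$MYSQLINC/mysql.h" ]; then
        echo "no mysql.h in MYSQLINC=$MYSQLINC" >&2
        return 1
    fi
    mysql_lib=${MYSQLLIBS%% *}    # keep only the first word
    if [ ! -f "$mysql_lib" ]; then
        echo "library $mysql_lib (from MYSQLLIBS) not found" >&2
        return 1
    fi
    echo "MySQL settings look OK"
}
```

Running `check_mysql_vars` right before the build should print "MySQL settings look OK"; anything else points at the variable to fix.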

5) Now we are ready to compile the utilities. Unzip the UCSC source code archive we downloaded (jksrc.zip); this creates a directory named kent.

> unzip jksrc.zip
> cd kent/src
> make libs
> cd utils/
> make
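The utilities build prints a lot of output, so an error in the middle is easy to miss. This is a minimal sketch of a wrapper that runs the same two make steps, sends their output to a log file, and stops at the first failure. It uses GNU make's `-C` option in place of `cd`; `build_kent` and the log path in the usage line are hypothetical.

```shell
# build_kent SRC_DIR LOG_FILE
# Runs "make libs" in SRC_DIR (kent/src), then "make" in SRC_DIR/utils,
# appending all output to LOG_FILE and stopping at the first failing step.
build_kent() {
    src=$1
    log=$2
    : > "$log"    # create or truncate the log file
    if ! make -C "$src" libs >> "$log" 2>&1; then
        echo "'make libs' failed; see $log" >&2
        return 1
    fi
    if ! make -C "$src/utils" >> "$log" 2>&1; then
        echo "'make' in utils failed; see $log" >&2
        return 1
    fi
    echo "build finished"
}

# Usage (example paths; MACHTYPE, MYSQLINC and MYSQLLIBS must already be
# exported):
# build_kent "$HOME/kent/src" "$HOME/kent-build.log"
```

On failure, the end of the log (for example `tail -n 30` on it) usually shows the first real error.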

6) Then we are done. All the compiled utilities can be found in the $HOME/bin/$MACHTYPE directory; for me, it is /users/NetID/bin/x86_64/. Just try one, say faSplit. Running it without arguments prints its usage message, so our build works!

> cd $HOME/bin/$MACHTYPE
> ./faSplit
> faSplit - Split an fa file into several files.
> usage:
>   faSplit how input.fa count outRoot
> where how is either 'about' 'byname' 'base' 'gap' 'sequence' or 'size'.
> Files split by sequence will be broken at the nearest fa record boundary.
> Files split by base will be broken at any base.
> Files broken by size will be broken every count bases.
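To check several tools at once, a small helper can report which expected binaries are present and executable. This is a sketch: `check_kent_bins` is a hypothetical helper, and the utility names other than faSplit in the usage line are only examples of tools you might need.

```shell
# check_kent_bins BIN_DIR NAME...
# Prints "ok" or "MISSING" for each named utility in BIN_DIR and returns 1
# if any of them is missing or not executable.
check_kent_bins() {
    bindir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -x "$bindir/$name" ]; then
            echo "ok      $name"
        else
            echo "MISSING $name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (utility names besides faSplit are examples):
# check_kent_bins "$HOME/bin/$MACHTYPE" faSplit faToTwoBit twoBitToFa
# Optionally, to run the tools from any directory:
# export PATH="$HOME/bin/$MACHTYPE:$PATH"
```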