• Welcome
  • Research
  • CV
  • Publications
  • Software
  • Blogs
  • Travel
  • Contact
  • Links

iAmphioxus

~ It's a long way from Amphioxus

iAmphioxus

Tag Archives: Kent Utilities

Using LiftOver to convert genome assembly coordinates

25 Tuesday Jun 2013

Posted by iAmphioxus in Bioinformatics, Blogs

≈ 4 Comments

Tags

Genome assembly, Kent Utilities, LiftOver, UCSC Genome Browser

Suppose we have two different assemblies of the same organism, G1.fasta and G2.fasta, and we want to know the which regions in G2 correspond to our interested regions in G1, what should we do? UCSC LiftOver is a great tool to solve this kind of problems and here is how>

#!/usr/bin/sh

# split G2 into individual chromosome/scaffold (G2.$i.fa) and save them into ./G2_chr direcotry
# I did this step by using a perl script which I wrote, but you can also use UCSC faSplit tool I think.
# Say there are n chromosomes/scaffolds in the G2 assembly, so we will have G2.1.fa, G2.2.fa, ..., G2.n.fa.

# Split the G2 chromosomes/scaffolds into 3K chunks and make lift files
mkdir lift
mkdir split
for i in {1..n}
do
  faSplit size ./G2_chr/G2.$i.fa  3000 ./split/G2.$i.split  -lift=./lift/G2.$i.lft -oneFile
done

# run blat
mkdir psl
for i in {1..n}
do
  blat G1.fasta ./split/G2.$i.split.fa -t=dna -q=dna -tileSize=12 -fastMap -minIdentity=95 -noHead -minScore=100  ./psl/G2.$i.psl
done

# Change coordinates of .psl files to parent coordinate system
mkdir liftup
for i in {1..n}
do
	liftUp -pslQ ./liftup/G2.$i.liftup.psl ./lift/G2.$i.lft warn ./psl/G2.$i.psl
done

# Make chain files
mkdir chain_raw
for i in {1..n}
do
	axtChain -linearGap=medium  -faQ -faT -psl ./liftup/G2.$i.liftup.psl ./G1.fasta ./G2.fasta ./chain_raw/$i.chain
done

# Merge and sort chain files
chainMergeSort ./chain_raw/*.chain | chainSplit chain_split stdin

faSize G1.fasta  -detailed >G1.chr_length.txt
faSize G2.fasta  -detailed >G2.chr_length.txt

# Make alignment nets from chain files
mkdir net
for i in  ./chain_split/*.chain
do
 tag=${i/\.\/chain_split\//}
 chainNet $i ./G1.chr_length.txt ./G2.chr_length.txt ./net/$tag.net /dev/null
done

# Create liftOver chain file
mkdir over
for i in ./chain_split/*.chain
do
  tag=${i/\.\/chain_split\//}
  netChainSubset  ./net/$tag.net $i ./over/$tag.chain
done

cat ./over/*.chain >G1_G2.over.chain

# Make bed file to report converted coordinates. We can give the coordinates of our query regions (based on G1 assembly) in the input.bed file and liftOver will report the converted coordinates in conversion.bed file.
liftOver input.bed  ./G1_G2.over.chain conversion.bed unMapped

References
http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt
http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver
http://genomewiki.ucsc.edu/index.php/LiftOver_Howto

29.718460 -95.402683

Share this:

  • Email
  • Twitter
  • Facebook

Like this:

Like Loading...

Building UCSC Kent Source Utilities on Rice SUG@R Cluster

17 Thursday May 2012

Posted by iAmphioxus in Bioinformatics, Blogs

≈ Leave a comment

Tags

CMake, Ensembl, Kent Utilities, MySQL, UCSC Genome Browser

UCSC Genome Browser website surely is one of the must-know sites for both experimental and computational biologists. As their name suggested, it is pretty nice for visualizing and analyzing genomics data without getting bogged down in bioinformatics details. You can even do quite powerful data-mining on your favorite organisms or genes by just a few clicks. A good tutorial for UCSC Genome Browser and also another powerful genome browser called Ensembl can be found here. With the nice, intuitively designed web interface in the front, what happens behind scenes is many programs or utilities being organized together and doing their jobs as integrated pipelines. These utilities (called Kent Source Utilities since they were originally written by Dr. Jim Kent) can be of great help to our own data analysis if we can have a copy of them installed on our local machine. On the UCSC Genome Browser website, they provided access to some of those most commonly used tools for stand alone use. However, this is only a subset of all their tools. If we need to use other tools, we have no choice but to build a copy of our own from their source code. This is what we gonna do in this blog. Continue reading →

29.718460 -95.402683

Share this:

  • Email
  • Twitter
  • Facebook

Like this:

Like Loading...

Recent Posts

  • Using anaconda to manage local python environment
  • Local installation of ncbi-blast+ together with the nr and taxonomy database
  • Installing Augustus with manual bamtools installation
  • Custom installation of PacBio’s GenomicConsensus (Quiver)
  • Compiling 64-bit MUMmer

Archives

  • January 2018 (2)
  • May 2017 (2)
  • February 2016 (1)
  • January 2016 (1)
  • August 2015 (1)
  • July 2015 (1)
  • April 2015 (1)
  • June 2013 (1)
  • April 2013 (1)
  • October 2012 (1)
  • May 2012 (1)
  • April 2012 (1)
  • March 2012 (1)
  • February 2012 (2)
  • January 2012 (2)

Categories

  • Blogs (19)
    • Bioinformatics (14)
    • Journal club (3)

2012 annotation arabidopsis Augustus bamtools BioPerl blasr blast C. elegans Centennial CMake conda CPAN developmental biology embryonic genesis Ensembl evolution genewise Genome assembly GenomicConsensus Git GitHub Kent Utilities LiftOver Macports maker maternal effect multicellularity mummer MySQL ncbi new year nr PacBio parallel evolution Perl plant biology Python Quiver resolution Rice rmath SMRTanalysis snoscan Stampy synteny taxonomy tRNAscan-SE UCSC Genome Browser wise2 yeast

Meta

  • Register
  • Log in
  • Entries feed
  • Comments feed
  • WordPress.com

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 320 other followers

Blog Stats

  • 34,627 hits
Locations of visitors to this page
Map

Blog at WordPress.com.

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.
%d bloggers like this: