Suppose we have two different assemblies of the same organism, G1.fasta and G2.fasta, and we want to know the which regions in G2 correspond to our interested regions in G1, what should we do? UCSC LiftOver is a great tool to solve this kind of problems and here is how>
#!/usr/bin/sh # split G2 into individual chromosome/scaffold (G2.$i.fa) and save them into ./G2_chr direcotry # I did this step by using a perl script which I wrote, but you can also use UCSC faSplit tool I think. # Say there are n chromosomes/scaffolds in the G2 assembly, so we will have G2.1.fa, G2.2.fa, ..., G2.n.fa. # Split the G2 chromosomes/scaffolds into 3K chunks and make lift files mkdir lift mkdir split for i in {1..n} do faSplit size ./G2_chr/G2.$i.fa 3000 ./split/G2.$i.split -lift=./lift/G2.$i.lft -oneFile done # run blat mkdir psl for i in {1..n} do blat G1.fasta ./split/G2.$i.split.fa -t=dna -q=dna -tileSize=12 -fastMap -minIdentity=95 -noHead -minScore=100 ./psl/G2.$i.psl done # Change coordinates of .psl files to parent coordinate system mkdir liftup for i in {1..n} do liftUp -pslQ ./liftup/G2.$i.liftup.psl ./lift/G2.$i.lft warn ./psl/G2.$i.psl done # Make chain files mkdir chain_raw for i in {1..n} do axtChain -linearGap=medium -faQ -faT -psl ./liftup/G2.$i.liftup.psl ./G1.fasta ./G2.fasta ./chain_raw/$i.chain done # Merge and sort chain files chainMergeSort ./chain_raw/*.chain | chainSplit chain_split stdin faSize G1.fasta -detailed >G1.chr_length.txt faSize G2.fasta -detailed >G2.chr_length.txt # Make alignment nets from chain files mkdir net for i in ./chain_split/*.chain do tag=${i/\.\/chain_split\//} chainNet $i ./G1.chr_length.txt ./G2.chr_length.txt ./net/$tag.net /dev/null done # Create liftOver chain file mkdir over for i in ./chain_split/*.chain do tag=${i/\.\/chain_split\//} netChainSubset ./net/$tag.net $i ./over/$tag.chain done cat ./over/*.chain >G1_G2.over.chain # Make bed file to report converted coordinates. We can give the coordinates of our query regions (based on G1 assembly) in the input.bed file and liftOver will report the converted coordinates in conversion.bed file. liftOver input.bed ./G1_G2.over.chain conversion.bed unMapped
References
http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt
http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver
http://genomewiki.ucsc.edu/index.php/LiftOver_Howto