New “concat” operation in filo’s “groupBy” tool

I recently added a new concatenation operation to filo’s groupBy tool. Like the “collapse” operation, it combines column values from multiple lines in a file or stream that share a common group. For example, imagine we have the following simple BED file (names.bed):

$ cat names.bed
chr1 10  20  aaa
chr1 10  20  bbb
chr1 10  20  ccc

Using the “collapse” operation, groupBy will give us a list of the names for this common interval:

$ groupBy -i names.bed -g 1,2,3 -c 4 -o collapse
chr1 10  20  aaa,bbb,ccc

By using the “concat” operation instead, groupBy joins the names for this common interval directly, with no delimiter between them:

$ groupBy -i names.bed -g 1,2,3 -c 4 -o concat
chr1 10  20  aaabbbccc
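
To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is not groupBy's actual implementation. The helper name `group_column` is hypothetical, and the sketch assumes that rows belonging to the same group are adjacent in the input.

```python
from itertools import groupby

# Rows from names.bed as (chrom, start, end, name) tuples.
rows = [
    ("chr1", 10, 20, "aaa"),
    ("chr1", 10, 20, "bbb"),
    ("chr1", 10, 20, "ccc"),
]

def group_column(rows, delimiter):
    """Group adjacent rows on the first three fields and join column 4.

    Hypothetical helper: a delimiter of "," mimics "collapse",
    an empty delimiter mimics "concat".
    """
    out = []
    for key, members in groupby(rows, key=lambda r: r[:3]):
        out.append(key + (delimiter.join(m[3] for m in members),))
    return out

collapsed = group_column(rows, ",")     # like -o collapse
concatenated = group_column(rows, "")   # like -o concat

print(collapsed)     # [('chr1', 10, 20, 'aaa,bbb,ccc')]
print(concatenated)  # [('chr1', 10, 20, 'aaabbbccc')]
```

Seen this way, the only difference between the two operations is the delimiter used when joining the grouped values.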

This feature is especially useful with DNA sequences. For example, groupBy can be combined with BEDTools to create a cDNA sequence for each gene/transcript from a BED file of exons: group the exon sequences by transcript and concatenate them. Such a starting file could be created from the UCSC “knownGene” track.
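
The idea behind building cDNA by concatenation can be sketched in Python as follows. This is an illustration of the grouping logic only, not the groupBy/BEDTools pipeline itself. The exon records, transcript names, and the `build_cdna` helper are all made up for the example, and the sketch assumes each exon sequence has already been extracted and strand-corrected.

```python
from collections import defaultdict

# Hypothetical exon records: (transcript, strand, exon_start, exon_sequence).
# Assumption: sequences are already strand-corrected
# (reverse-complemented for "-" strand exons).
exons = [
    ("tx1", "+", 100, "ATG"),
    ("tx1", "+", 300, "GGC"),
    ("tx1", "+", 200, "CCA"),
    ("tx2", "-", 500, "TTA"),
    ("tx2", "-", 900, "ATG"),
]

def build_cdna(exons):
    """Concatenate exon sequences per transcript in transcription order."""
    by_tx = defaultdict(list)
    for tx, strand, start, seq in exons:
        by_tx[(tx, strand)].append((start, seq))

    cdna = {}
    for (tx, strand), parts in by_tx.items():
        # "+" transcripts read exons by ascending start,
        # "-" transcripts by descending start.
        parts.sort(key=lambda p: p[0], reverse=(strand == "-"))
        cdna[tx] = "".join(seq for _, seq in parts)
    return cdna

cdna = build_cdna(exons)
print(cdna)  # {'tx1': 'ATGCCAGGC', 'tx2': 'ATGTTA'}
```

The final join with an empty delimiter is the step that “concat” performs. The sorting and strand handling would have to happen before the grouped values reach groupBy.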
