[Inparanoid] InParanoid v4.1 does not work on more than 10000 sequences

Erik Sonnhammer erik.sonnhammer at scilifelab.se
Thu Mar 27 15:48:01 CET 2014


Hi Jean-Nicolas,


Sounds like your sequence files contain some weird characters that 
Blast chokes on.  Use "od -c" to check; there should only be normal 
characters and \n in there.
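
If reading raw "od -c" output for ~30 000 sequences is too tedious, the same check can be scripted. Below is a minimal sketch in Python, assuming that "normal characters" means printable ASCII (bytes 0x20 to 0x7E) and that "\n" is the only control character allowed. The function names are illustrative and not part of InParanoid or Blast.

```python
from collections import Counter

# Assumption: "normal characters" = printable ASCII (0x20-0x7E).
# The newline byte (0x0A) is the only control character accepted.
ALLOWED = set(range(0x20, 0x7F)) | {0x0A}


def find_unexpected_bytes(data: bytes) -> Counter:
    """Count every byte that is neither printable ASCII nor a newline."""
    return Counter(b for b in data if b not in ALLOWED)


def report(path: str) -> Counter:
    """Print a per-byte summary of unexpected bytes found in a file."""
    with open(path, "rb") as handle:
        bad = find_unexpected_bytes(handle.read())
    for byte, count in sorted(bad.items()):
        print(f"byte 0x{byte:02x} ({bytes([byte])!r}): {count} occurrence(s)")
    return bad
```

Calling report() on each fasta file should print nothing if the file is clean. Carriage returns (0x0d) from Windows line endings or non-ASCII bytes such as non-breaking spaces would show up here, and those are the kind of thing that is easy to miss when cleaning a file by eye.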


/Erik


On 2014-03-27 13:59, Jean-Nicolas Audet, Mr wrote:
>
> Hello,
>
> I've been trying to make InParanoid work using de novo transcriptomes 
> of two non-model species (birds) I assembled with Trinity. They were 
> 'translated' to protein sequences using Transdecoder. The resulting 
> fasta file I'm trying to use in InParanoid looks like this (~30 000 seqs):
>
> >comp100291_c0_seq1
>
> LPKKILLPIQQVLGHLLLALSYRGKVMQVKALKSKHEHNGPETLDAFLSSKLVVVKQPRE
>
> QAGFPLSIVFIPGEGRQERFLLHGEYNQSFCKEPVMELPRQ
>
> >comp102162_c0_seq1
>
> PNMTLHFLKSSPGSWRLSGLVLIPYVTETISGSCETLTRLQMPAHIQQSRWKAKHGPRIL
>
> LLGLLQNLRSLFPLKVLPPGANSQLKRNCSFTSVCLIGTFYVESS
>
> >comp102206_c0_seq1
>
> CQEQKWQKGNREEKGWAGVTVWGAYFPYLLIRCPNHQTSTPLSIHSQQHFMLCIIICPFS
>
> WLKPPVKTTQMFKGFFFKSGLKKFLALFLISWAAFATDRPLLGKQQSR
>
> I tried the example fasta files supplied with the program (called SC 
> and EC) and they work, but when I use my files, it gets stuck at the first 
> step and creates no files (and shows no disk usage) even after days. Here 
> is what I get with my fasta files:
>
> Loading module bio/ncbi-blast-2.2.22.
>
> Formatting BLAST databases
>
> Done formatting
>
> Starting BLAST searches...
>
> Starting first BLAST pass for bf - bf on [blastall] WARNING: the -C 3 
> argument is currently experimental
>
> It then stays like this forever (I tried up to 6 days with 24 CPUs and 
> 256G).
>
> I also tried supplying inter-sample Blast results that I generated 
> myself and parsed with the supplied parser, but then it stays 
> forever in the following state, again without generating any file:
>
> Done BLAST searches. Starting ortholog detection...
>
> I tried with and without bootstrapping, with multithreading (-a16 option) or 
> without, and, as I said, with or without supplied Blast results. I also 
> cleaned my fasta files of any weird characters (removed annotations, 
> all ' * ', spaces, empty lines and dots). Now I'm running out of 
> ideas... I'm using a Unix cluster. I tried these jobs using 4 to 
> 24 CPUs with 8 to 256G memory.
>
> Finally, since it was working with SC and EC, I tried with a small 
> subset of my transcriptomes (a few thousand sequences) and it worked. 
> Thus it seems to me that the problem could be that InParanoid cannot 
> take more than, say, 10000 sequences. I could split my transcriptomes 
> into several smaller files of specified sequence ranges but then the 
> orthologs that do not have the exact same length will have a chance to 
> be missed if they are in two different split files.
>
> Thanks in advance for your help,
>
> Jean-Nicolas
>
>
>
> _______________________________________________
> InParanoid mailing list
> InParanoid at lists.su.se
> https://lists.su.se/mailman/listinfo/inparanoid-at-sbc.su.se
