Protocol1

Someone else should write the summary of protocol1. I just work with the code, and I honestly have no idea what it does.

Relevant Usage at Saierlab
Protocol1 runs a PSI-BLAST search for all the homologues of the specified query protein (the default number of homologues to return is 500). If too few homologues are found (i.e. fewer than 50), it is a good idea to perform additional iterations. Use the optional "-i" option to add iterations; without it, the default number of additional iterations is zero. Each iteration is just an additional BLAST round that uses the evolutionarily conserved regions from all the homologues to pick up distant homologues. It may be necessary to eliminate evolutionarily irrelevant sequences directly from the NCBI PSI-BLAST page; check the functional annotations to confirm this.

If too many homologues are found, you can also supply a "cutoff" value (the "-c" option followed by an appropriate number between 0.4 and 1) to eliminate highly conserved homologues that will not help you retrieve distant homologues. It is also helpful to retrieve the number of TMSs for each homologue (the "--tms" option) so that we can detect additions/deletions of domains and see whether they make evolutionary sense. You can also change the name of the output folder with the "-o" option ("p1out" is the default name).
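The "fewer than 50" rule of thumb above can be checked with a few lines of Python. This is only a sketch: it assumes the "results.faa" output file is ordinary FASTA with one ">" header line per sequence, and the function names are made up for illustration (they are not part of protocol1).

```python
def count_fasta_records(fasta_text):
    """Count sequences in FASTA text (one record per line starting with '>')."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def needs_more_iterations(n_homologues, minimum=50):
    """Return True when the hit count is below the rule-of-thumb minimum of 50."""
    return n_homologues < minimum


# Tiny in-memory example; in practice you would read the text of results.faa
# (assumed here to be plain FASTA).
example = ">Eco1\nMKTLLVA\n>Bsu\nMSSLLA\n"
n = count_fasta_records(example)
print(n, needs_more_iterations(n))
```

If the check comes back True, rerunning protocol1 with "-i" is the next thing to try.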

Once all homologues are found, all the relevant information pertaining to the proteins is written in tabular form to an output file ("results.tab") that you can sort in Excel. Note that "Abbreviation" consists of the first letter of the genus name followed by the first 2 letters of the species name. If a number follows the abbreviation, there are multiple paralogues within the same organism. You can find the protein sequences and their abbreviations in the "results.faa" file.
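The abbreviation rule can be illustrated with a short Python sketch. It is not protocol1's own code: the function names are hypothetical, the capitalization (upper-case genus letter, lower-case species letters) is an assumption, and how protocol1 assigns the paralogue numbers is not described here, so the sketch only reads the numbers back.

```python
import re
from collections import defaultdict


def abbreviate(organism):
    """First letter of the genus plus first 2 letters of the species."""
    genus, species = organism.split()[:2]
    # Capitalization is an assumption, not confirmed by the protocol1 output.
    return genus[0].upper() + species[:2].lower()


def split_abbreviation(abbrev):
    """Split 'Eco2' into ('Eco', 2); abbreviations without a number give None."""
    match = re.fullmatch(r"([A-Za-z]+)(\d*)", abbrev)
    if match is None:
        raise ValueError(f"unexpected abbreviation: {abbrev!r}")
    base, number = match.groups()
    return base, int(number) if number else None


def group_paralogues(abbrevs):
    """Group abbreviations by organism so paralogues end up together."""
    groups = defaultdict(list)
    for abbrev in abbrevs:
        base, _ = split_abbreviation(abbrev)
        groups[base].append(abbrev)
    return dict(groups)


print(abbreviate("Escherichia coli"))
print(group_paralogues(["Eco1", "Eco2", "Bsu"]))
```

Any organism whose group holds more than one entry has multiple paralogues among the hits.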

In addition to the "results.tab" file, there will also be a "results.16S" file, which contains the 16S rRNA sequences from all the relevant organisms. You will need to compare the phylogenetic tree of the 16S sequences with the tree of the protein hits listed in "results.tab" to determine whether or not each protein arose from its respective host.

Protocol1 Usage Without Command Line Options
Just open the terminal and enter "protocol1". It will ask you for the options that you would otherwise have entered as command line options, specifically the protein ID (accession/GI), whether or not you want to count TMSs, the CD-HIT threshold, and the output path. If the terminal says something like "command not found", protocol1 may not be installed on your computer. Contact Bryant or Vamsee.
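If you would rather check for the program before running anything, a minimal Python sketch using the standard library's shutil.which looks like this. The helper name find_tool is made up for illustration; the only thing it checks is whether an executable called "protocol1" is on your PATH.

```python
import shutil


def find_tool(name="protocol1"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_tool()
if path is None:
    print("protocol1 not found on PATH; it may not be installed")
else:
    print("protocol1 found at", path)
```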

Protocol1 Usage With Command Line Options
If you want to learn more about the command line options, just enter "protocol1"; by default, protocol1 will display the available command line options. It will also prompt you for input, so press Ctrl-C to exit. The options are as follows:

    Welcome to Protocol1 - This tool will run a PSI Blast with iterations, collect results,
    remove redundant/simmilar sequences annotate, tabulate, & count TMSs.
    Developed by Vamsee Reddy :: Part of the BioV Suite.

    Options:
      --version   show program's version number and exit
      -h, --help  show this help message and exit
      -q QUERY    Gi/Accession/Sequence to BLAST.
      -i ITERATE  Number of additional iterations to perform. (0)
      -n NUMBER   Number of results to fetch each round (500)
      -e EXPECT   E-Value cutoff (0.005)
      -c CUTOFF   CD-HIT threshold. From 0.4 - 1 (0.8)
      -o OUTDIR   Output folder (p1out)
      --tms       Include this flag to tabulate TMS stats.
      --min=MIN   Minimum sequences lengths to retrieve
      --max=MAX   Maximum sequences lengths to retrieve

When you run protocol1 without command line options, it asks by default for the query, yes/no on TMS stats, the output path, and the CD-HIT threshold. Here is an example of a protocol1 command that supplies all of that information:

    protocol1 -q 16131859 --tms -o test2 -c 0.8

Changing the number of iterations
Protocol1 does not ask for the number of iterations when it is run without command line options, so the number of iterations can only be changed from the command line, with the -i option. Here's an example:

    protocol1 -q 16131859 --tms -o test2 -c 0.8 -i 2
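If you launch protocol1 from a script, the same command can be assembled in Python. This is a rough sketch, not part of protocol1: the function build_protocol1_command and its parameter names are invented for illustration, and only flags listed in the help text above (-q, -i, -c, -o, --tms) are used.

```python
import shlex


def build_protocol1_command(query, iterate=0, cutoff=0.8, outdir="p1out", tms=False):
    """Build the protocol1 argument list, using the defaults from the help text."""
    if not 0.4 <= cutoff <= 1:
        raise ValueError("CD-HIT threshold must be between 0.4 and 1")
    if iterate < 0:
        raise ValueError("number of additional iterations cannot be negative")
    cmd = ["protocol1", "-q", str(query)]
    if tms:
        cmd.append("--tms")
    cmd += ["-o", outdir, "-c", str(cutoff)]
    if iterate:
        # Only pass -i when extra iterations are requested (the default is 0).
        cmd += ["-i", str(iterate)]
    return cmd


cmd = build_protocol1_command("16131859", iterate=2, outdir="test2", tms=True)
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where protocol1 is installed.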

Person to Ask

 * Eric
 * Ujj
 * Most people; a lot of people use protocol1