Superfamilytree

To use SFT1 to derive a tree showing the relationships between individual proteins:

 * 1) BLAST all proteins that go into the tree against the NCBI NR protein database and save each protein's hits to a file named after that protein (getNcbiSeq.pl).
 * 2) For each protein destined to be in the tree (a TreeProtein), the program randomly selects 5 proteins from its NCBI BLAST hits. (From this step on, the program is supertree.pl.)
 * 3) The program then compares each of the 5 hits selected for one TreeProtein with each of the 5 hits selected for every other TreeProtein, and each comparison yields a score. This gives 5 x 5 = 25 comparisons for each pair of TreeProteins. With n TreeProteins there are n(n - 1)/2 pairs, and therefore 25 x n(n - 1)/2 scores.
 * 4) The program then averages the 25 scores for each pair of TreeProteins.
 * 5) The program converts each average score into a relative distance [100/average-BLAST-bit-score].
 * 6) The program repeats steps 2 to 5 a total of 100 times, each time drawing a new random set of 5 hits for each TreeProtein.
 * 7) The Fitch program (from the PHYLIP package) then builds one tree from each of the 100 distance matrices.
 * 8) The Consense program (also from PHYLIP) combines the 100 trees into a single consensus tree.
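The resampling in steps 2 to 6 can be sketched in Python as below. This is a minimal sketch, not the code of getNcbiSeq.pl or supertree.pl: it assumes the NCBI BLAST hits for each TreeProtein are already loaded into a dict (`hits_by_protein`) and that a caller-supplied function `bit_score(a, b)` returns the BLAST bit score for two hit proteins. Those inputs and all function names here are hypothetical. Each replicate is written as a square PHYLIP distance matrix, the input format Fitch reads; concatenating the 100 matrices assumes Fitch is run with its multiple-data-sets option.

```python
import random
from itertools import combinations


def sample_hits(hits_by_protein, k=5, rng=random):
    """Step 2: randomly pick k BLAST hits (without replacement) per TreeProtein."""
    return {p: rng.sample(hits, k) for p, hits in hits_by_protein.items()}


def replicate_distances(sampled, bit_score):
    """Steps 3-5 for one replicate.

    For each pair of TreeProteins, score every combination of their sampled
    hits (5 x 5 = 25 scores), average them, and convert the average to the
    relative distance 100 / average bit score.
    """
    dist = {}
    for p, q in combinations(sorted(sampled), 2):
        scores = [bit_score(a, b) for a in sampled[p] for b in sampled[q]]
        avg = sum(scores) / len(scores)
        dist[(p, q)] = dist[(q, p)] = 100.0 / avg
    return dist


def to_phylip(names, dist):
    """Format one replicate as a square PHYLIP distance matrix (Fitch input)."""
    lines = [f"{len(names):5d}"]
    for p in names:
        row = " ".join(f"{(0.0 if p == q else dist[(p, q)]):.4f}" for q in names)
        # PHYLIP reserves the first 10 columns for the sequence name.
        lines.append(f"{p[:10]:<10}{row}")
    return "\n".join(lines)


def bootstrap_matrices(hits_by_protein, bit_score, replicates=100, k=5, seed=None):
    """Step 6: repeat sampling and scoring; return all matrices as one string."""
    rng = random.Random(seed)
    names = sorted(hits_by_protein)
    matrices = []
    for _ in range(replicates):
        sampled = sample_hits(hits_by_protein, k, rng)
        matrices.append(to_phylip(names, replicate_distances(sampled, bit_score)))
    return "\n".join(matrices)
```

Passing a `seed` makes a run reproducible. `rng.sample` raises `ValueError` when a TreeProtein has fewer than 5 hits, a case this sketch does not handle.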

To use SFT2 to derive a tree showing only family-level (not protein-level) relationships:

 * 1) Assign all members of a family to a single group.
 * 2) Within each group, the program randomly selects 5 proteins, drawn from the group's original queries or their NCBI BLAST hits.
 * 3) The program repeats this selection 100 times, producing 100 sets of scores.
 * 4) The Fitch program is used to derive 100 family-level trees.
 * 5) The Consense program is used to integrate the 100 trees into a single family-level tree.
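A comparable sketch for SFT2 pools each family's query proteins and their NCBI BLAST hits, then samples 5 proteins per family in each of 100 replicates. Scoring each family pair by averaging its 25 bit scores and converting with 100 / average is an assumption carried over from SFT1, since the steps above do not spell out how SFT2 scores groups. `family_of`, `hits_by_query`, `bit_score`, and the function names are hypothetical.

```python
import random
from itertools import combinations


def build_family_pools(family_of, hits_by_query):
    """Step 1: pool every query and its BLAST hits under the query's family.

    family_of maps query ID -> family ID; hits_by_query maps query ID -> hit IDs.
    Duplicate proteins within a family are kept only once.
    """
    pools = {}
    for query, family in family_of.items():
        pool = pools.setdefault(family, [])
        pool.append(query)
        pool.extend(hits_by_query.get(query, []))
    return {f: list(dict.fromkeys(p)) for f, p in pools.items()}


def family_replicates(pools, bit_score, replicates=100, k=5, seed=None):
    """Steps 2-3: sample k proteins per family and score every family pair.

    Returns one dict per replicate mapping (family1, family2) -> distance,
    using the SFT1 transform 100 / average bit score (assumed for SFT2).
    """
    rng = random.Random(seed)
    families = sorted(pools)
    results = []
    for _ in range(replicates):
        picks = {f: rng.sample(pools[f], k) for f in families}
        dist = {}
        for f, g in combinations(families, 2):
            scores = [bit_score(a, b) for a in picks[f] for b in picks[g]]
            dist[(f, g)] = 100.0 / (sum(scores) / len(scores))
        results.append(dist)
    return results
```

Each of the 100 dicts would then be written as a distance matrix for Fitch, as in SFT1.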

Steps to construct a phylogenetic tree using the Hierarchical Clustering (HC) program (Gabo's method):

 * 1) BLAST all proteins to be included in the tree against each other.
 * 2) Transform the bit scores into distances with a new program. As in the Superfamilytree approach, the distance is [100/BLAST-bit-score].
 * 3) Perform hierarchical clustering on these distances using the Ward method, via the R package "cluster".
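The HC steps can be sketched in Python, assuming BLAST was run all-against-all with tabular output (`-outfmt 6`, where the bit score is the 12th column). Keeping the best bit score per unordered pair and skipping self-hits are assumptions, as is every function name below. The method itself clusters in R with the cluster package; this sketch instead hand-codes Ward's method with the Lance-Williams update on squared dissimilarities, which is what R's `hclust(method = "ward.D2")` does (its documentation states this corresponds to `agnes(method = "ward")`).

```python
import math


def blast_distances(tabular_lines):
    """Steps 1-2: turn all-vs-all BLAST tabular hits into 100 / bit-score distances.

    Keeps the best bit score per unordered pair and ignores self-hits.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bits = cols[0], cols[1], float(cols[11])
        if query == subject:
            continue
        pair = frozenset((query, subject))
        best[pair] = max(bits, best.get(pair, 0.0))
    return {pair: 100.0 / bits for pair, bits in best.items()}


def _key(a, b):
    return (a, b) if a < b else (b, a)


def ward_d2(names, dist):
    """Step 3: Ward agglomerative clustering, following hclust "ward.D2".

    names: list of protein IDs; dist(a, b): symmetric distance between two IDs.
    Returns merges as (left_label, right_label, height), in merge order.
    """
    label = dict(enumerate(names))
    size = {i: 1 for i in label}
    # ward.D2 squares the input dissimilarities before merging.
    d2 = {(i, j): dist(names[i], names[j]) ** 2
          for i in range(len(names)) for j in range(i + 1, len(names))}
    merges, next_id = [], len(names)
    while len(size) > 1:
        (i, j), dij = min(d2.items(), key=lambda kv: kv[1])
        ni, nj = size.pop(i), size.pop(j)
        del d2[(i, j)]
        for k, nk in size.items():
            dik, djk = d2.pop(_key(i, k)), d2.pop(_key(j, k))
            # Lance-Williams update for Ward's criterion.
            d2[(k, next_id)] = ((ni + nk) * dik + (nj + nk) * djk - nk * dij) / (ni + nj + nk)
        label[next_id] = f"({label[i]},{label[j]})"
        size[next_id] = ni + nj
        merges.append((label[i], label[j], math.sqrt(dij)))
        next_id += 1
    return merges
```

Pairs with no BLAST hit have no distance. In a call such as `ward_d2(names, lambda a, b: d.get(frozenset((a, b)), 100.0))`, the fallback of 100.0 (a bit score of 1) is an arbitrary placeholder, not part of the method.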