| 
 | NeoBio API | ||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Object | +--neobio.alignment.FactorSequence
This class builds a list of factors of a character sequence as induced by its Lempel-Ziv (LZ78) factorisation. Each factor is enconded as the longest factor previously seen plus one character.
The input can come from any source, provided it is encapsulated in a proper
 Reader instance. The stream is expected to be ready (i.e. the next
 read operation must return the first character of the sequence) and it is
 not closed when its end is reached, so the client is allowed to reset it and maybe use
 it for another purpose.
Sequences can contain letters only although lines started with the
 COMMENT_CHAR character ('>') are regarded as comments and are completely
 skipped. White spaces (including tabs, line feeds and carriage returns) are also
 ignored throughout.
This class uses a Trie to keep track of a list of factors. Each node of the trie contains a Factor of the text. As the sequence is read from the input, the trie is traversed as far as possible. When a leaf node is reached (which means that the longest prefix of the input has been found), two tasks are accomplished:
Factor is created with the character at the current position of
 the input and the leaf node's factor;
 Each factor also receives a serial number according to the order they are found and a pointer to the next factor (in that order) for fast access. This pointer, together with the factor's ancestor pointer forms a doubly-linked list of factors. The original text can then be reconstructed simply by following the linked list and writing out its factors.
As an example, the sequence ACTAAACCGCATTAATAATAAAA is parsed into the
 following 12 factors:
 0  ( , ) = empty
 1  (0,A) = A
 2  (0,C) = C
 3  (0,T) = T
 4  (1,A) = AA
 5  (1,C) = AC
 6  (2,G) = CG
 7  (2,A) = CA
 8  (3,T) = TT
 9  (4,T) = AAT
 10 (9,A) = AATA
 11 (4,A) = AAA
 serial # (prefix, new char) = factor text
 
This class is used by CrochemoreLandauZivUkelson algorithm to speed up the classic dynamic programming approach to sequence alignment.
Factor, 
Trie, 
CrochemoreLandauZivUkelson| Field Summary | |
| protected static char | COMMENT_CHARThe character used to start a comment line in a sequence file. | 
| protected  int | num_charsThe numbers of character represented by this sequence. | 
| protected  int | num_factorsThe numbers of factors generated by the LZ78 parsing of the sequence. | 
| protected  Factor | root_factorA pointer to the root factor, the one that starts the list of factors. | 
| Constructor Summary | |
| FactorSequence(java.io.Reader reader)Creates a new instance of a FactorSequence, loading the sequence data
 from theReaderinput stream. | |
| Method Summary | |
|  Factor | getRootFactor()Returns the root factor, the one that starts the list of factors. | 
|  int | numChars()Returns the number of characters of the original sequence. | 
|  int | numFactors()Returns the number of factors produced by the LZ78 parsing of the text. | 
|  java.lang.String | printFactors()Returns a string representation of the actual list of factors produced by the LZ78 parsing of the text. | 
|  java.lang.String | toString()Reconstructs the sequence from the list of factors induced by the LZ78 parsing of the text. | 
| Methods inherited from class java.lang.Object | 
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait | 
| Field Detail | 
protected static final char COMMENT_CHAR
protected Factor root_factor
protected int num_chars
protected int num_factors
| Constructor Detail | 
public FactorSequence(java.io.Reader reader)
               throws java.io.IOException,
                      InvalidSequenceException
FactorSequence, loading the sequence data
 from the Reader input stream. A doubly-linked list of factors is built
 according to its LZ78 factorisation.
reader - source of characters for this sequence
java.io.IOException - if an I/O exception occurs when reading the input
InvalidSequenceException - if the input does not contain a valid sequence| Method Detail | 
public Factor getRootFactor()
public int numFactors()
public int numChars()
public java.lang.String toString()
toString in class java.lang.Objectpublic java.lang.String printFactors()
| 
 |  | ||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||