Glossary of PorToL Processing Pipeline Terms

base call

A process by which a program analyzes sequencer data to determine which base (A, C, G, or T) is most likely present at a particular location in a sequence. The sequencer is an analog device that relies on chromatography, which is subject to interference from background noise, adjacent bases, and other factors. The confidence of a call is known as a quality score.
A program that searches for known primers and removes them from the resulting sequences.
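The primer removal step described above can be illustrated with a minimal sketch. It is not the pipeline's actual program: the function names `remove_primers` and `reverse_complement` are hypothetical, and the sketch assumes exact matches only, with a primer at the start of the read and its reverse complement at the end. A real tool would probably also tolerate mismatches.

```python
from typing import Iterable

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def remove_primers(sequence: str, primers: Iterable[str]) -> str:
    """Strip exact primer matches from the ends of a sequence.

    Assumption: a primer may appear at the start of the read, and its
    reverse complement may appear at the end. Inexact matches are ignored.
    """
    seq = sequence.upper()
    for primer in primers:
        p = primer.upper()
        if not p:
            continue  # skip empty primers so the slices below stay valid
        if seq.startswith(p):
            seq = seq[len(p):]
        rc = reverse_complement(p)
        if seq.endswith(rc):
            seq = seq[:-len(rc)]
    return seq
```

For example, `remove_primers("AAGGACGTCCTT", ["AAGG"])` removes the leading `AAGG` and the trailing `CCTT` (its reverse complement), leaving `ACGT`.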
PCR
Polymerase Chain Reaction, a technique used to amplify a single sample of DNA into millions or billions of copies in a short period of time, so that there is enough material for the sequencer to produce a usable signal.
A script which converts phred (phd) formatted files to FASTA format.
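As an illustration of the conversion described above, here is a minimal sketch rather than the script itself. It assumes the usual PHD layout, in which the read name follows `BEGIN_SEQUENCE` and each line between `BEGIN_DNA` and `END_DNA` holds a base, its quality value, and its trace position. The function name `phd_to_fasta` and the 60-column line width are choices made for this example.

```python
def phd_to_fasta(phd_text: str, width: int = 60) -> str:
    """Convert the text of one PHD file to a FASTA record.

    Only the read name and the base column are used; quality values
    and trace positions are discarded in this simplified sketch.
    """
    name = "unknown"
    bases = []
    in_dna = False
    for raw_line in phd_text.splitlines():
        line = raw_line.strip()
        if line.startswith("BEGIN_SEQUENCE"):
            parts = line.split()
            if len(parts) > 1:
                name = parts[1]
        elif line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif in_dna and line:
            # First column is the called base, e.g. "a 35 120".
            bases.append(line.split()[0])
    seq = "".join(bases)
    # Wrap the sequence to a fixed line width, as is conventional for FASTA.
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + name] + wrapped) + "\n"
```

Because the quality column is dropped, a caller that needs qualities would have to collect the second field of each DNA line separately.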
A program which aligns a set of contigs which were produced by phred for a given project.
phred
A program that reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the base calls and quality values to output files. After calling bases, phred writes the sequences to files in FASTA format, a format suitable for XBAP, PHD format, or SCF format. Quality values for the bases are written to FASTA-format files or PHD files, which the phrap sequence assembly program can use to increase the accuracy of the assembled sequence.
primer
A molecule of DNA that binds to a predictable location on a candidate sequence. Primers are used to create known reference points when sequencing and assembling unknown fragments of DNA.
quality score
A measure of confidence in a base call. Quality scores can be presented in a variety of formats and number scales. A scale commonly used in the PorToL project runs from 0 to 99, with a score of 30 considered high quality.
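To make the scale concrete, the sketch below assumes the scores follow the phred convention, Q = -10 * log10(P), where P is the estimated probability that the call is wrong. The glossary does not state that PorToL uses exactly this definition, so treat it as an assumption; the function names are illustrative.

```python
import math


def error_probability(quality: float) -> float:
    """Estimated probability that a base call is wrong.

    Assumes the phred convention Q = -10 * log10(P).
    """
    return 10 ** (-quality / 10)


def quality_from_error(p_error: float) -> int:
    """Inverse mapping: the phred-style score for an error probability."""
    return round(-10 * math.log10(p_error))
```

Under this convention a score of 30 corresponds to an estimated error rate of 1 in 1,000, and a score of 20 to 1 in 100.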
sequencer
A machine that analyzes DNA fragments and produces a stream of base sequences using one of a number of techniques. A common technique uses lasers to detect bases that have been segregated in capillary tubes by a chromatography process.
The removal of low-quality portions of a contig, which often occur at the beginning and end of a DNA sample due to the limitations of the sequencing process.
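One simple way to carry out the removal described above is to drop bases from each end until a base meets a quality threshold. The sketch below shows that approach only. The function name `trim_low_quality` and the default threshold of 20 are assumptions made for this example, not the pipeline's actual settings, and production trimmers often use windowed or cumulative-score methods instead.

```python
from typing import List, Tuple


def trim_low_quality(
    bases: str, quals: List[int], threshold: int = 20
) -> Tuple[str, List[int]]:
    """Remove leading and trailing bases whose quality is below threshold.

    Low-quality bases in the interior of the read are left in place.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    start, end = 0, len(bases)
    # Advance past low-quality bases at the start of the read.
    while start < end and quals[start] < threshold:
        start += 1
    # Back up past low-quality bases at the end of the read.
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return bases[start:end], quals[start:end]
```

For instance, `trim_low_quality("ACGTAC", [5, 10, 30, 40, 35, 8])` keeps the middle `GTA` with qualities `[30, 40, 35]`. A read with no base at or above the threshold is trimmed to an empty sequence.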