Variable Region Finder
Database Introduction
This database lists variable regions within a group of strains or species. A variable region is defined as a locus that varies significantly between genomes belonging to the same group. These regions are found by looking for stretches that have no significant BLAST 'hits' between the genomes. They can form the basis of new typing assays. For example, if primers are designed to conserved sequences in the flanking regions, or within the variable region itself, the amplification product may reveal length or sequence polymorphisms that could be exploited to derive a genotype. The regions may also be of interest to biologists looking at phenotypic variation between strains and its possible correlation with genomic variation.
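The idea of a region without significant hits can be illustrated with a minimal Python sketch. It is not the database's own algorithm, which is not given here. It assumes the significant hits on one reference genome have already been reduced to 1-based, inclusive (start, end) intervals, and it simply reports the stretches that no hit covers. The function name `uncovered_regions` and the `min_length` parameter are hypothetical.

```python
from typing import List, Tuple

# Assumption: 1-based, inclusive (start, end) coordinates on the reference genome.
Interval = Tuple[int, int]


def uncovered_regions(
    genome_length: int, hits: List[Interval], min_length: int = 1
) -> List[Interval]:
    """Return the stretches of the reference that no hit interval covers."""
    gaps: List[Interval] = []
    next_uncovered = 1  # first position not yet covered by any hit seen so far
    for start, end in sorted(hits):
        if start > next_uncovered:
            # Nothing covers next_uncovered .. start - 1, so it is a candidate region.
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= genome_length:
        # Uncovered tail after the last hit.
        gaps.append((next_uncovered, genome_length))
    # min_length is a hypothetical threshold for discarding very short gaps.
    return [(s, e) for s, e in gaps if e - s + 1 >= min_length]


# Hits cover 1-400 and 600-1000, so only 401-599 is left uncovered.
print(uncovered_regions(1000, [(1, 200), (150, 400), (600, 1000)]))  # [(401, 599)]
```

In practice each genome in the group is used in turn as the query, so a step like this would be repeated once per genome.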
Database Use
- Click on a group in the left-hand column. This will display a table in which each row represents a variable region. By default the rows are ordered by position in the genome of the strain in the group that has been used as the reference strain.
- You can re-order the regions by clicking on the large blue arrows in each column. In this way you can sort the regions in ascending or descending order by percentage identity, by length of the variable region alignment, or by genomic position. You can also change the cut-off value for the percentage identity by typing a number in the box at the top of the web page and clicking on the 'Filter by percent id cutoff' button. This cut-off excludes those regions for which the overall percentage identity of the alignment is equal to or above the value.
- The simplest option is to view an alignment of a variable region by clicking on the 'Pre-defined clustal alignment' button, if it exists. This will open a new window and display the alignment for the variable region. The alignment can be downloaded in FASTA format using the link at the top of the page.
- For more details about a variable region, click on the plus button at the start of the row. This will reveal more data about the strains (it may take a few seconds to retrieve the data from the server). For each strain, the data include the genomic position, the length and direction of the sequence within the alignment, and any features described in the GenBank file for this particular region.
'Clustal alignment' and 'MAFFT alignment' buttons will also be revealed. MAFFT is an alignment algorithm similar to Clustal but much quicker. In some cases Clustal may produce a more accurate alignment. These buttons will regenerate an alignment (probably taking from several seconds to a minute) based on any changes made in this panel or in the alternative positions panel described in the next step.
- Clicking on the plus button to the left of each strain name will reveal the alternative genomic positions of this variable region for that strain. These alternative positions arise because the algorithm uses each genome in the set in turn as the query sequence for the BLAST searches. The number of alternative positions therefore equals the number of strains in the group. By default the 'earliest' start position and the 'latest' end position are used to extract the sequence used in the alignment.
Sometimes the algorithm will obtain incorrect values for the start and/or end position because of a chance BLAST 'hit' in a part of the genome that is not adjacent to the variable region in question. This often shows up as a very large maximum alignment length, with the percentage identity values and the 'Pre-defined clustal alignment' button absent from the row. By looking at the alternative positions, an 'offending' alternative position can be identified and excluded from the alignment by deselecting the adjacent check-box. Sometimes it is obvious which alternative position is causing the problem and it can simply be eliminated. For other variable regions a certain amount of trial and error may be required, changing first the alternative positions and, as a last resort, the directions, before recalculating the alignment by clicking on the Clustal or MAFFT alignment buttons.
- By looking at the features found within the positions of the variable region and the alignment itself, those regions which are of potential interest can be identified.
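
Two of the behaviours described in the steps above can be sketched in Python: the percentage identity cut-off, and the default choice of the 'earliest' start and 'latest' end among a strain's alternative positions. This is only an illustration under stated assumptions, not the site's code. The function names, the strain names and the coordinates are hypothetical, and every region is assumed to have a percentage identity value.

```python
from typing import AbstractSet, Dict, List, Optional, Tuple

Position = Tuple[int, int]  # (start, end) of the region in one strain's genome


def filter_by_percent_id(
    regions: List[Tuple[str, float]], cutoff: float
) -> List[Tuple[str, float]]:
    """Keep regions whose percentage identity is strictly below the cut-off.

    Regions at or above the cut-off are excluded, as described in the steps above.
    """
    return [(name, pid) for name, pid in regions if pid < cutoff]


def merged_extent(
    alternatives: Dict[str, Position],
    excluded: AbstractSet[str] = frozenset(),
) -> Optional[Position]:
    """Combine alternative positions into the earliest start and latest end.

    `alternatives` maps the query genome that produced each alternative position
    to that position. Names in `excluded` play the role of deselected check-boxes.
    """
    kept = [pos for query, pos in alternatives.items() if query not in excluded]
    if not kept:
        return None
    return (min(start for start, _ in kept), max(end for _, end in kept))


# Hypothetical alternative positions for one strain; the strainC entry stands in
# for a chance hit far from the region, which inflates the end coordinate.
alternatives = {
    "strainA": (12000, 13500),
    "strainB": (11950, 13480),
    "strainC": (11980, 910000),
}
print(merged_extent(alternatives))                        # (11950, 910000)
print(merged_extent(alternatives, excluded={"strainC"}))  # (11950, 13500)
```

Excluding the outlying position brings the extracted sequence back to a plausible length, which is the effect of deselecting a check-box before recalculating the alignment.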