Submodule contained by Alignment.
#include <Cleaner.h>
Public Member Functions | |
int | selectMethod () |
Method that selects the best cleaning method based on statistics of the alignment.
Alignment * | cleanByCutValueOverpass (double cut, float baseLine, const int *gInCol, bool complementary) |
Method to clean an alignment. It removes columns that exceed or equal a certain threshold. The function detects whether it would remove too many columns and, if so, relaxes the threshold until enough columns are kept to reach the desired percentage. To achieve this, the method starts in the middle of the alignment and, alternating sides, adds columns one by one.
Alignment * | cleanByCutValueFallBehind (float cut, float baseLine, const float *ValueVect, bool complementary) |
Method to clean an alignment. It removes columns that fall behind a certain threshold. The function detects whether it would remove too many columns and, if so, relaxes the threshold until enough columns are kept to reach the desired percentage. To achieve this, the method starts in the middle of the alignment and, alternating sides, adds columns one by one.
Alignment * | cleanByCutValueOverpassOrEquals (double cutGaps, const int *gInCol, float baseLine, float cutCons, const float *MDK_Win, bool complementary) |
Method to clean an alignment. It removes columns that exceed or equal the gap threshold but fall behind the similarity threshold. The function detects whether it would remove too many columns and, if so, relaxes both thresholds until enough columns are kept to reach the desired percentage. To achieve this, the method starts in the middle of the alignment and, alternating sides, adds columns one by one.
Alignment * | cleanStrict (int gapCut, const int *gInCol, float simCut, const float *MDK_W, bool complementary, bool variable) |
Method to clean an alignment. It carries out strict and strictplus. It removes columns that exceed the gap threshold but fall behind the similarity threshold. The function recovers columns that would be rejected on their own but whose neighbours (3 of 4) would not. Column blocks smaller than a minimum size, set by the method itself, are removed too.
Alignment * | cleanOverlapSeq (float minimumOverlap, float *overlapSeq, bool complementary) |
Method to trim an alignment based on a minimum sequence overlap threshold. The method selects a combination of parameters to maximize the final number of columns in the new alignment.
Alignment * | cleanGaps (float baseLine, float gapsPct, bool complementary) |
Method to trim an alignment based on the gap distribution values. Column blocks smaller than a minimum size, set by the method itself, are removed too.
Alignment * | cleanConservation (float baseLine, float conservationPct, bool complementary) |
Method to trim an alignment based on the similarity distribution values.
Alignment * | clean (float baseLine, float GapsPct, float conservationPct, bool complementary) |
Method to trim an alignment based on the similarity and gaps distribution values.
Alignment * | cleanCompareFile (float cutpoint, float baseLine, float *vectValues, bool complementary) |
Method to trim an alignment based on consistency values obtained from a dataset of alignments. The function computes the optimal parameter combination values to trim an alignment based on the consistency value from the comparison among a dataset of alignments with the same sequences.
bool | calculateSpuriousVector (float overlap, float *spuriousVector) |
Method to compute the overlap values. See Alignment::overlaps.
Alignment * | cleanSpuriousSeq (float overlapColumn, float minimumOverlap, bool complementary) |
Method to remove sequences misaligned with the rest of the sequences in the alignment. For each residue in the sequence, it tests its similarity. If the similarity of that residue is higher than the overlapColumn value, it counts as a hit for the sequence. After calculating the number of hits for the sequence, it removes the sequence if its proportion hits/residues is lower than minimumOverlap.
Alignment * | clean2ndSlope (bool complementary) |
Method that carries out the gappyout approach. This method calculates the slope of the gaps distribution on the original alignment. Then, it compares groups of three consecutive residues, searching for the group with the most abrupt change in slope. When found, the first residue is taken as the cutpoint for the sequences.
Alignment * | cleanCombMethods (bool complementary, bool variable) |
Method to clean an alignment. It carries out strict and strictplus.
Alignment * | cleanNoAllGaps (bool complementary) |
Method to remove columns composed only of gaps. This method is especially useful when we remove misaligned sequences from a given alignment.
Alignment * | removeColumns (int *columns, int init, int size, bool complementary) |
Method to remove columns, expressed as ranges.
Alignment * | removeSequences (int *seqs, int init, int size, bool complementary) |
Method to remove sequences, expressed as ranges.
Alignment * | getClustering (float identityThreshold) |
Method to select the most representative sequence (the longest one) for each cluster from the input alignment to generate a new alignment.
float | getCutPointClusters (int clusterNumber) |
Method that calculates the optimal cut point for a given number of clusters. The idea is to obtain a cutpoint that can be used to obtain as many sequences as clusterNumber.
void | removeSmallerBlocks (int blockSize, Alignment &original) |
Method to remove blocks of consecutive non-rejected columns that are smaller than a given size.
bool | removeOnlyTerminal () |
Method to detect right and left borders. Borders are the first column found with no gaps. Everything between the borders is kept in the trimmed alignment.
void | removeAllGapsSeqsAndCols (bool seqs=true, bool cols=true) |
Method that identifies and removes columns and sequences composed only of gaps.
void | setTrimTerminalGapsFlag (bool terminalOnly_) |
Setter method for the Terminal Only Flag.
void | setBoundaries (int *boundaries) |
Boundaries setter.
void | calculateSeqIdentity () |
Method to calculate identities between the sequences from the alignment. See Alignment::identities.
void | calculateRelaxedSeqIdentity () |
Method that makes a rough approximation of the sequence identity computation.
int * | calculateRepresentativeSeq (float maximumIdent) |
Method to assign sequences to clusters.
void | computeComplementaryAlig (bool residues, bool sequences) |
Method for computing the complementary alignment. The complementary alignment is an alignment containing all sequences and columns that the original alignment would reject. It inverts the saveResidues / saveSequences tags.
void | removeDuplicates () |
Public Attributes | |
bool | terminalGapOnly |
Flag for trimming only on the terminal positions.
bool | keepSequences |
Flag for keeping sequences even when they are composed only of gaps.
int | blockSize |
Block size to use on the cleaning methods.
int | left_boundary |
Left boundary of the alignment.
int | right_boundary |
Right boundary of the alignment.
Private Member Functions | |
Cleaner (Alignment *parent) | |
Class Constructor - Called by Alignment.
Cleaner (Alignment *parent, Cleaner *mold) | |
Copy Constructor - Called by Alignment.
Private Attributes | |
Alignment * | alig |
Pointer to the alignment that contains this object.
Friends | |
class | Alignment |
Submodule contained by Alignment.
Cleaner::Cleaner | ( | Alignment * | parent | ) | explicit, private |
Class Constructor - Called by Alignment.
Definition at line 1633 of file Cleaner.cpp.
References alig, blockSize, keepSequences, left_boundary, right_boundary, and terminalGapOnly.
Referenced by Alignment::Alignment().
Cleaner::Cleaner | ( | Alignment * | parent, Cleaner * | mold | ) | private |
Copy Constructor - Called by Alignment.
Definition at line 1648 of file Cleaner.cpp.
References alig, blockSize, keepSequences, left_boundary, right_boundary, and terminalGapOnly.
Referenced by Alignment::Alignment().
void Cleaner::calculateRelaxedSeqIdentity | ( | ) |
Method that makes a rough approximation of the sequence identity computation.
Definition at line 1483 of file Cleaner.cpp.
References alig, Alignment::identities, Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, Alignment::saveResidues, Alignment::saveSequences, and Alignment::sequences.
int * Cleaner::calculateRepresentativeSeq | ( | float | maximumIdent | ) |
Method to assign sequences to clusters.
Clusters are calculated following this flow:
maximumIdent | Identity threshold used to decide if a sequence should be part of a cluster or create a new one. |
Definition at line 1525 of file Cleaner.cpp.
References alig, calculateSeqIdentity(), Alignment::Cleaning, Alignment::identities, Alignment::originalNumberOfSequences, utils::quicksort(), utils::removeCharacter(), Alignment::saveSequences, and Alignment::sequences.
Referenced by getClustering().
void Cleaner::calculateSeqIdentity | ( | ) |
Method to calculate identities between the sequences from the alignment.
See Alignment::identities.
Definition at line 1434 of file Cleaner.cpp.
References AA, alig, Alignment::getAlignmentType(), Alignment::identities, Alignment::numberOfResidues, Alignment::originalNumberOfSequences, Alignment::saveResidues, Alignment::saveSequences, and Alignment::sequences.
Referenced by calculateRepresentativeSeq(), getCutPointClusters(), Alignment::printSeqIdentity(), and selectMethod().
bool Cleaner::calculateSpuriousVector | ( | float | overlap, |
float * | spuriousVector | ||
) |
Method to compute the overlap values.
See Alignment::overlaps.
overlap | Overlap threshold. | |
[out] | spuriousVector | Pointer to the spuriousVector to fill. |
Definition at line 855 of file Cleaner.cpp.
References AA, alig, Alignment::getAlignmentType(), Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, and Alignment::sequences.
Referenced by cleanSpuriousSeq().
Alignment * Cleaner::clean | ( | float | baseLine, |
float | GapsPct, | ||
float | conservationPct, | ||
bool | complementary | ||
) |
Method to trim an alignment based on the similarity and gaps distribution values.
baseLine | Minimum percentage of columns to conserve in the new alignment. |
GapsPct | Maximum percentage of gaps per column. |
conservationPct | Minimum value of similarity per column to keep the column. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 784 of file Cleaner.cpp.
References alig, statistics::Gaps::calcCutPoint(), statistics::Similarity::calcCutPoint(), statistics::Manager::calculateConservationStats(), statistics::Manager::calculateGapStats(), cleanByCutValueOverpassOrEquals(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), statistics::Similarity::getMdkWindowedVector(), statistics::Manager::similarity, and Alignment::Statistics.
Referenced by trimAlManager::CleanResiduesNonAuto().
Alignment * Cleaner::clean2ndSlope | ( | bool | complementary | ) |
Method that carries out the gappyout approach.
This method calculates the slope of the gaps distribution on the original alignment.
Then, it compares groups of three consecutive residues, searching for the group with the most abrupt change in slope.
When found, the first residue is taken as the cutpoint for the sequences.
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 957 of file Cleaner.cpp.
References alig, statistics::Gaps::calcCutPoint2ndSlope(), statistics::Manager::calculateGapStats(), cleanByCutValueOverpass(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), and Alignment::Statistics.
Referenced by trimAlManager::CleanResiduesAuto().
Alignment * Cleaner::cleanByCutValueFallBehind | ( | float | cut, |
float | baseLine, | ||
const float * | ValueVect, | ||
bool | complementary | ||
) |
Method to clean an alignment.
It removes columns that fall behind a certain threshold.
The function detects whether it would remove too many columns and, if so, relaxes the threshold until enough columns are kept to reach the desired percentage.
To achieve this, the method starts in the middle of the alignment and, alternating sides, adds columns one by one.
cut | Cut value to use. |
baseLine | Percentage of columns to keep. |
ValueVect | Vector that contains the values that will be tested for each column. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 231 of file Cleaner.cpp.
References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, removeAllGapsSeqsAndCols(), removeSmallerBlocks(), utils::roundInt(), and Alignment::saveResidues.
Referenced by cleanCompareFile(), and cleanConservation().
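The relaxation described above can be illustrated with a minimal stand-alone sketch. It is not the code in Cleaner.cpp: the function name `keepAtLeastBaseline`, the use of `std::vector`, and the choice of the relaxed threshold (the score of the last column needed to reach the baseline) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical sketch; name and signature are NOT part of Cleaner.
// values[i] is the score of column i, cut is the initial threshold and
// baseLine is the minimum percentage (0-100) of columns to keep.
std::vector<bool> keepAtLeastBaseline(const std::vector<float>& values,
                                      float cut, float baseLine) {
    const int n = static_cast<int>(values.size());
    std::vector<bool> kept(n, false);
    int keptCount = 0;

    // First pass: keep every column that does not fall behind the cut.
    for (int i = 0; i < n; ++i) {
        if (values[i] >= cut) { kept[i] = true; ++keptCount; }
    }

    // Number of columns required by the baseline percentage.
    const int needed =
        std::min(n, static_cast<int>(std::lround(n * baseLine / 100.0f)));
    if (keptCount >= needed) return kept;

    // Assumption: the relaxed threshold is the score of the needed-th best column.
    std::vector<float> sorted(values);
    std::sort(sorted.begin(), sorted.end(), std::greater<float>());
    const float relaxedCut = sorted[needed - 1];

    // Walk outwards from the middle, alternating left and right, and
    // re-add rejected columns that pass the relaxed threshold.
    int left = (n - 1) / 2;
    int right = left + 1;
    while (keptCount < needed && (left >= 0 || right < n)) {
        if (left >= 0) {
            if (!kept[left] && values[left] >= relaxedCut) { kept[left] = true; ++keptCount; }
            --left;
        }
        if (keptCount < needed && right < n) {
            if (!kept[right] && values[right] >= relaxedCut) { kept[right] = true; ++keptCount; }
            ++right;
        }
    }
    return kept;
}
```

Walking from the middle means that, when several columns tie at the relaxed threshold, the ones closest to the centre of the alignment are recovered first.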
Alignment * Cleaner::cleanByCutValueOverpass | ( | double | cut, |
float | baseLine, | ||
const int * | gInCol, | ||
bool | complementary | ||
) |
Method to clean an alignment.
It removes columns that exceed or equal a certain threshold.
The function detects whether it would remove too many columns and, if so, relaxes the threshold until enough columns are kept to reach the desired percentage.
To achieve this, the method starts in the middle of the alignment and, alternating sides, adds columns one by one.
cut | Cut value to use. If a column has a value lower or equal to the cut value, it is removed. |
baseLine | Percentage of columns to keep. |
gInCol | Vector that contains the values that will be tested for each column. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 90 of file Cleaner.cpp.
References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, utils::quicksort(), removeAllGapsSeqsAndCols(), removeSmallerBlocks(), utils::roundInt(), and Alignment::saveResidues.
Referenced by clean2ndSlope(), cleanGaps(), and cleanNoAllGaps().
Alignment * Cleaner::cleanByCutValueOverpassOrEquals | ( | double | cutGaps, |
const int * | gInCol, | ||
float | baseLine, | ||
float | cutCons, | ||
const float * | MDK_Win, | ||
bool | complementary | ||
) |
Method to clean an alignment.
It removes columns that exceed or equal the gap threshold but fall behind the similarity threshold.
The function detects whether it would remove too many columns and, if so, relaxes both thresholds until enough columns are kept to reach the desired percentage.
To achieve this, the method starts in the middle of the alignment and, alternating sides, adds columns one by one.
cutGaps | Gap cut value to use. |
baseLine | Percentage of columns to keep. |
gInCol | Vector that contains the gaps present on each column. |
MDK_Win | Vector that contains the similarity value for each column. |
cutCons | Minimum similarity value to keep a column in the alignment. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 357 of file Cleaner.cpp.
References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, utils::quicksort(), removeAllGapsSeqsAndCols(), removeSmallerBlocks(), utils::roundInt(), and Alignment::saveResidues.
Referenced by clean().
Alignment * Cleaner::cleanCombMethods | ( | bool | complementary, |
bool | variable | ||
) |
Method to clean an alignment. It carries out strict and strictplus.
The method:
complementary | Whether or not to return the complementary version of the trimmed alignment. |
variable | Whether to use a variable block length. If false, block will be size 5. Else, it will use 1% of the alignment length, with a minimum of 3 and maximum of 12. This value will be overwritten if blockSize (of this object) is bigger than 0. |
Definition at line 980 of file Cleaner.cpp.
References alig, statistics::Gaps::calcCutPoint2ndSlope(), statistics::Manager::calculateConservationStats(), cleanStrict(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), statistics::Similarity::getMdkWindowedVector(), utils::initlVect(), Alignment::originalNumberOfResidues, utils::quicksort(), Alignment::saveResidues, statistics::Manager::similarity, and Alignment::Statistics.
Referenced by trimAlManager::CleanResiduesAuto().
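The block-length rule given for the `variable` parameter can be written as a small helper. This is a sketch only: `chooseBlockSize` is a hypothetical name that does not exist in Cleaner, and rounding the 1% value to the nearest integer is an assumption, since the description does not say how it is rounded.

```cpp
#include <algorithm>

// Hypothetical helper, NOT part of Cleaner: picks the minimum block length
// following the rule stated for the `variable` parameter.
// userBlockSize plays the role of Cleaner::blockSize.
int chooseBlockSize(int alignmentLength, bool variable, int userBlockSize) {
    // A block size bigger than 0 set on the object overrides everything else.
    if (userBlockSize > 0) return userBlockSize;

    // Fixed mode: blocks of 5 columns.
    if (!variable) return 5;

    // Variable mode: 1% of the alignment length, limited to the range [3, 12].
    // Assumption: the 1% value is rounded to the nearest integer.
    const int onePercent = static_cast<int>(alignmentLength * 0.01 + 0.5);
    return std::min(12, std::max(3, onePercent));
}
```

For example, a 1000-column alignment in variable mode would use blocks of 10 columns, while a 100-column one would fall back to the minimum of 3.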
Alignment * Cleaner::cleanCompareFile | ( | float | cutpoint, |
float | baseLine, | ||
float * | vectValues, | ||
bool | complementary | ||
) |
Method to trim an alignment based on consistency values obtained from a dataset of alignments.
The function computes the optimal parameter combination values to trim an alignment based on the consistency value from the comparison among a dataset of alignments with the same sequences.
cutpoint | Hint of Gap cut point. May be used if it's lower than the minimum percentage threshold. |
baseLine | Minimum percentage of columns to conserve in the new alignment. |
vectValues | Vector with alignment consistency values |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 820 of file Cleaner.cpp.
References alig, cleanByCutValueFallBehind(), utils::copyVect(), utils::min(), Alignment::originalNumberOfResidues, and utils::quicksort().
Referenced by trimAlManager::CleanResiduesNonAuto().
Alignment * Cleaner::cleanConservation | ( | float | baseLine, |
float | conservationPct, | ||
bool | complementary | ||
) |
Method to trim an alignment based on the similarity distribution values.
baseLine | Minimum percentage of columns to conserve in the new alignment. |
conservationPct | Minimum value of similarity per column to keep the column. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 760 of file Cleaner.cpp.
References alig, statistics::Similarity::calcCutPoint(), statistics::Manager::calculateConservationStats(), cleanByCutValueFallBehind(), statistics::Similarity::getMdkWindowedVector(), statistics::Manager::similarity, and Alignment::Statistics.
Referenced by trimAlManager::CleanResiduesNonAuto().
Alignment * Cleaner::cleanGaps | ( | float | baseLine, |
float | gapsPct, | ||
bool | complementary | ||
) |
Method to trim an alignment based on the gap distribution values. Column blocks smaller than a minimum size, set by the method itself, are removed too.
baseLine | Minimum percentage of columns to conserve in the new alignment. |
gapsPct | Maximum percentage of gaps per column. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 736 of file Cleaner.cpp.
References alig, statistics::Gaps::calcCutPoint(), statistics::Manager::calculateGapStats(), cleanByCutValueOverpass(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), and Alignment::Statistics.
Referenced by trimAlManager::CleanResiduesAuto(), and trimAlManager::CleanResiduesNonAuto().
Alignment * Cleaner::cleanNoAllGaps | ( | bool | complementary | ) |
Method to remove columns composed only of gaps.
This method is especially useful when we remove misaligned sequences from a given alignment.
complementary | Whether or not to return the complementary version of the trimmed alignment. Although this method accepts a complementary flag, setting it would return an alignment made up of gaps-only columns. |
Definition at line 1049 of file Cleaner.cpp.
References alig, statistics::Manager::calculateGapStats(), cleanByCutValueOverpass(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), Alignment::originalNumberOfSequences, and Alignment::Statistics.
Referenced by trimAlManager::CleanResiduesAuto(), and trimAlManager::CleanSequences().
Alignment * Cleaner::cleanOverlapSeq | ( | float | minimumOverlap, |
float * | overlapSeq, | ||
bool | complementary | ||
) |
Method to trim an alignment based on a minimum sequence overlap threshold.
The method selects a combination of parameters to maximize the final number of columns in the new alignment.
minimumOverlap | Min overlap to keep a sequence. |
overlapSeq | Vector containing the overlap for each column. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 714 of file Cleaner.cpp.
References alig, Alignment::Alignment(), Alignment::Cleaning, Alignment::originalNumberOfSequences, removeAllGapsSeqsAndCols(), and Alignment::saveSequences.
Referenced by cleanSpuriousSeq().
Alignment * Cleaner::cleanSpuriousSeq | ( | float | overlapColumn, |
float | minimumOverlap, | ||
bool | complementary | ||
) |
Method to remove sequences misaligned with the rest of the sequences in the alignment.
For each residue in the sequence, it tests its similarity. If the similarity of that residue is higher than the overlapColumn value, it counts as a hit for the sequence.
After calculating the number of hits for the sequence, it removes the sequence if its proportion hits/residues is lower than minimumOverlap.
overlapColumn | Minimum similarity value that a residue needs to be considered a hit. |
minimumOverlap | Minimum proportion of hits that a sequence needs to be kept in the new alignment. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 934 of file Cleaner.cpp.
References alig, calculateSpuriousVector(), cleanOverlapSeq(), and Alignment::originalNumberOfSequences.
Referenced by trimAlManager::CleanSequences().
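The hit-counting rule can be sketched for a single sequence as follows. The helper `keepSequence` and its inputs are hypothetical: it assumes one similarity score per residue is already available, and that a sequence without residues is dropped, neither of which is stated above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of Cleaner. residueScores holds one
// similarity score per residue of a single sequence (an assumption about
// how the per-residue test is exposed).
bool keepSequence(const std::vector<float>& residueScores,
                  float overlapColumn, float minimumOverlap) {
    // Assumption: a sequence with no residues cannot reach any overlap.
    if (residueScores.empty()) return false;

    // A residue is a hit when its similarity is higher than overlapColumn.
    std::size_t hits = 0;
    for (float score : residueScores) {
        if (score > overlapColumn) ++hits;
    }

    // The sequence is removed when hits/residues is lower than minimumOverlap.
    const float proportion =
        static_cast<float>(hits) / static_cast<float>(residueScores.size());
    return proportion >= minimumOverlap;
}
```

With scores {0.9, 0.8, 0.1, 0.2} and overlapColumn = 0.5, two of four residues are hits, so the sequence is kept for minimumOverlap = 0.5 and removed for minimumOverlap = 0.75.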
Alignment * Cleaner::cleanStrict | ( | int | gapCut, |
const int * | gInCol, | ||
float | simCut, | ||
const float * | MDK_W, | ||
bool | complementary, | ||
bool | variable | ||
) |
Method to clean an alignment. It carries out strict and strictplus.
It removes columns that exceed the gap threshold but fall behind the similarity threshold.
The function recovers columns that would be rejected on their own but whose neighbours (3 of 4) would not.
Column blocks smaller than a minimum size, set by the method itself, are removed too.
gapCut | Gap cut value to use. |
gInCol | Vector that contains the gaps present on each column. |
simCut | Minimum similarity value to keep a column. |
MDK_W | Vector that contains the similarity of each column. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
variable | Whether to use a variable block length. If false, block will be size 5. Else, it will use 1% of the alignment length, with a minimum of 3 and maximum of 12. This value will be overwritten if blockSize (of this object) is bigger than 0. |
Definition at line 513 of file Cleaner.cpp.
References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, removeAllGapsSeqsAndCols(), utils::roundInt(), and Alignment::saveResidues.
Referenced by cleanCombMethods().
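The "3 of 4 neighbours" recovery step might look like the sketch below. It is an illustration under stated assumptions, not the code in Cleaner.cpp: the four neighbours are taken to be the two columns on each side, neighbours outside the alignment count as rejected, and decisions are based on the flags as they were before any recovery. `recoverByNeighbours` is a hypothetical name.

```cpp
#include <initializer_list>
#include <vector>

// Hypothetical helper, NOT part of Cleaner. kept[i] is true when column i
// passed the gap and similarity thresholds on its own.
std::vector<bool> recoverByNeighbours(const std::vector<bool>& kept) {
    std::vector<bool> result = kept;
    const int n = static_cast<int>(kept.size());

    for (int i = 0; i < n; ++i) {
        if (kept[i]) continue;  // only rejected columns can be recovered

        // Assumption: the 4 neighbours are the 2 columns on each side.
        int keptNeighbours = 0;
        for (int offset : {-2, -1, 1, 2}) {
            const int j = i + offset;
            // Neighbours outside the alignment count as rejected (assumption).
            if (j >= 0 && j < n && kept[j]) ++keptNeighbours;
        }

        // Recover the column when at least 3 of its 4 neighbours were kept.
        if (keptNeighbours >= 3) result[i] = true;
    }
    return result;
}
```

Reading from `kept` and writing to `result` keeps a recovered column from helping its own neighbours to be recovered in the same pass.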
void Cleaner::computeComplementaryAlig | ( | bool | residues, |
bool | sequences | ||
) |
Method for computing the complementary alignment.
The complementary alignment is an alignment containing all sequences and columns that the original alignment would reject.
It inverts the saveResidues / saveSequences tags.
residues | Whether to reverse residues tags. |
sequences | Whether to reverse sequences tags. |
Definition at line 1589 of file Cleaner.cpp.
References alig, Alignment::numberOfResidues, Alignment::numberOfSequences, Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, Alignment::saveResidues, and Alignment::saveSequences.
Referenced by trimAlManager::postprocess_alignment().
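Inverting the tags can be pictured with the following sketch. The tag encoding is an assumption, not something this page states: a value of -1 is taken to mark a rejected element and any other value a kept one, with kept elements storing their own index. `invertTags` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of Cleaner.
// Assumption: tags[i] == -1 marks element i as rejected; any other value
// marks it as kept, and kept elements store their own index.
std::vector<int> invertTags(std::vector<int> tags) {
    for (std::size_t i = 0; i < tags.size(); ++i) {
        // Rejected elements become kept, kept elements become rejected.
        tags[i] = (tags[i] == -1) ? static_cast<int>(i) : -1;
    }
    return tags;
}
```

Under that encoding, applying the inversion twice gives back the original tags, so the complementary of the complementary is the trimmed alignment again.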
Alignment * Cleaner::getClustering | ( | float | identityThreshold | ) |
Method to select the most representative sequence (the longest one) for each cluster from the input alignment to generate a new alignment.
identityThreshold | Threshold used to assign sequences to clusters. If identity between representative sequence of the cluster and sequence to assign is superior to this threshold and no other cluster has a better identity with its representative, it will be assigned to that cluster. |
Definition at line 1211 of file Cleaner.cpp.
References alig, Alignment::Alignment(), calculateRepresentativeSeq(), Alignment::numberOfSequences, Alignment::originalNumberOfSequences, and Alignment::saveSequences.
Referenced by trimAlManager::CleanSequences().
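The assignment rule in the identityThreshold description can be sketched as a greedy pass over the sequences. Everything here is illustrative: `assignToClusters`, the precomputed identity matrix and the explicit processing order (for instance longest sequence first) are assumptions, and the actual flow in calculateRepresentativeSeq() may differ.

```cpp
#include <vector>

// Hypothetical helper, NOT part of Cleaner. identity[i][j] is the identity
// between sequences i and j; order lists sequence indices in the order in
// which they are processed (assumption: e.g. longest first).
// Returns, for each sequence, the index of its cluster representative.
std::vector<int> assignToClusters(const std::vector<std::vector<float>>& identity,
                                  const std::vector<int>& order,
                                  float identityThreshold) {
    std::vector<int> representativeOf(identity.size(), -1);
    std::vector<int> representatives;

    for (int seq : order) {
        // Look for the representative with the best identity above the threshold.
        int best = -1;
        float bestIdentity = identityThreshold;
        for (int rep : representatives) {
            if (identity[seq][rep] > bestIdentity) {
                bestIdentity = identity[seq][rep];
                best = rep;
            }
        }
        // No representative is similar enough: the sequence starts a new cluster.
        if (best == -1) {
            representatives.push_back(seq);
            best = seq;
        }
        representativeOf[seq] = best;
    }
    return representativeOf;
}
```

A lower threshold merges more sequences into existing clusters, which is why a cut point can be tuned to reach a target number of representatives.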
float Cleaner::getCutPointClusters | ( | int | clusterNumber | ) |
Method that calculates the optimal cut point for a given number of clusters.
The idea is to obtain a cutpoint that can be used to obtain as many sequences as clusterNumber.
clusterNumber | Number of representative sequences to obtain. |
Definition at line 1068 of file Cleaner.cpp.
References alig, calculateSeqIdentity(), Alignment::identities, Alignment::numberOfSequences, Alignment::originalNumberOfSequences, utils::quicksort(), utils::removeCharacter(), Alignment::saveSequences, and Alignment::sequences.
Referenced by trimAlManager::CleanSequences().
void Cleaner::removeAllGapsSeqsAndCols | ( | bool | seqs = true , |
bool | cols = true |
||
) |
Method that identifies and removes columns and sequences composed only of gaps.
Definition at line 1367 of file Cleaner.cpp.
References alig, debug, KeepingOnlyGapsSequence, keepSequences, Alignment::numberOfResidues, Alignment::numberOfSequences, Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, RemovingOnlyGapsSequence, reporting::reportManager::report(), Alignment::saveResidues, Alignment::saveSequences, Alignment::seqsName, and Alignment::sequences.
Referenced by cleanByCutValueFallBehind(), cleanByCutValueOverpass(), cleanByCutValueOverpassOrEquals(), cleanOverlapSeq(), cleanStrict(), removeColumns(), and removeSequences().
Alignment * Cleaner::removeColumns | ( | int * | columns, |
int | init, | ||
int | size, | ||
bool | complementary | ||
) |
Method to remove columns, expressed as ranges.
columns | Vector containing the columns to remove. |
init | Where does the vector start. Set to 1 if the vector contains its size as first element. |
size | Size of the columns vector. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 1240 of file Cleaner.cpp.
References alig, Alignment::Alignment(), Alignment::Cleaning, removeAllGapsSeqsAndCols(), Alignment::saveResidues, and Alignment::updateSequencesAndResiduesNums().
Referenced by trimAlManager::CleanResiduesNonAuto().
void Cleaner::removeDuplicates | ( | ) |
Definition at line 1610 of file Cleaner.cpp.
References alig, debug, Alignment::originalNumberOfSequences, RemovingDuplicateSequences, reporting::reportManager::report(), Alignment::saveSequences, Alignment::seqsName, and Alignment::sequences.
Referenced by trimAlManager::CleanSequences().
bool Cleaner::removeOnlyTerminal | ( | ) |
Method to detect right and left borders. Borders are the first column found with no gaps.
Everything between the borders is kept in the trimmed alignment.
Definition at line 1283 of file Cleaner.cpp.
References alig, statistics::Manager::calculateGapStats(), debug, statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), left_boundary, LeftBoundaryBiggerThanRightBoundary, Alignment::originalNumberOfResidues, reporting::reportManager::report(), right_boundary, Alignment::saveResidues, Alignment::Statistics, and Alignment::updateSequencesAndResiduesNums().
Referenced by trimAlManager::postprocess_alignment().
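Border detection can be sketched from a per-column gap count. The helper `findBorders` is hypothetical, and returning {-1, -1} when no gap-free column exists is an assumption made for the sketch; the real method reports that situation through its return value and the reporting system.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper, NOT part of Cleaner. gapsPerColumn[i] is the number
// of gaps in column i. Returns {left, right}: the first gap-free column
// found from each end, or {-1, -1} if there is none (assumption).
std::pair<int, int> findBorders(const std::vector<int>& gapsPerColumn) {
    const int n = static_cast<int>(gapsPerColumn.size());

    // Left border: first column without gaps, scanning from the start.
    int left = 0;
    while (left < n && gapsPerColumn[left] != 0) ++left;
    if (left == n) return {-1, -1};

    // Right border: first column without gaps, scanning from the end.
    int right = n - 1;
    while (right > left && gapsPerColumn[right] != 0) --right;

    return {left, right};
}
```

For gap counts {2, 1, 0, 3, 0, 0, 1} the borders are columns 2 and 5, so the gappy column 3 lies between them and is protected from trimming.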
Alignment * Cleaner::removeSequences | ( | int * | seqs, |
int | init, | ||
int | size, | ||
bool | complementary | ||
) |
Method to remove sequences, expressed as ranges.
seqs | Vector containing the sequences to remove. |
init | Where does the vector start. Set to 1 if the vector contains its size as first element. |
size | Size of the seqs vector. |
complementary | Whether or not to return the complementary version of the trimmed alignment. |
Definition at line 1262 of file Cleaner.cpp.
References alig, Alignment::Alignment(), Alignment::Cleaning, removeAllGapsSeqsAndCols(), Alignment::saveSequences, and Alignment::updateSequencesAndResiduesNums().
Referenced by trimAlManager::CleanSequences().
void Cleaner::removeSmallerBlocks | ( | int | blockSize, |
Alignment & | original | ||
) |
Method to remove blocks of consecutive non-rejected columns that are smaller than a given size.
blockSize | Minimum size a block has to be to be kept. |
original | Alignment to apply the removal. |
Definition at line 1326 of file Cleaner.cpp.
References alig, Alignment::numberOfResidues, and Alignment::saveResidues.
Referenced by cleanByCutValueFallBehind(), cleanByCutValueOverpass(), and cleanByCutValueOverpassOrEquals().
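A minimal sketch of the block filter, working on a plain vector of keep flags instead of an Alignment. `dropShortBlocks` is a hypothetical name, and treating a block as a maximal run of consecutive kept columns is an assumption based on the description above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of Cleaner. kept[i] is true when column i
// is currently kept. Runs of kept columns shorter than blockSize are rejected.
std::vector<bool> dropShortBlocks(std::vector<bool> kept, int blockSize) {
    if (blockSize <= 0) return kept;  // nothing can be shorter than that

    const std::size_t n = kept.size();
    std::size_t i = 0;
    while (i < n) {
        if (!kept[i]) { ++i; continue; }

        // Find the end of the current run of kept columns.
        const std::size_t start = i;
        while (i < n && kept[i]) ++i;

        // Reject the whole run if it is shorter than the minimum block size.
        if (i - start < static_cast<std::size_t>(blockSize)) {
            for (std::size_t j = start; j < i; ++j) kept[j] = false;
        }
    }
    return kept;
}
```

With flags {1, 1, 0, 1, 1, 1, 0, 1} and a block size of 3, only the middle run of three columns survives.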
int Cleaner::selectMethod | ( | ) |
Method that selects the best cleaning method based on statistics of the alignment.
Definition at line 42 of file Cleaner.cpp.
References alig, calculateSeqIdentity(), Alignment::identities, and Alignment::numberOfSequences.
Referenced by trimAlManager::CleanResiduesAuto().
void Cleaner::setBoundaries | ( | int * | boundaries | ) |
Boundaries setter.
[in] | boundaries | New boundaries values. |
Definition at line 1201 of file Cleaner.cpp.
References left_boundary, and right_boundary.
void Cleaner::setTrimTerminalGapsFlag | ( | bool | terminalOnly_ | ) |
Setter method to Terminal Only Flag.
terminalOnly_ | New value of the Terminal Only Flag. |
Definition at line 1194 of file Cleaner.cpp.
References terminalGapOnly.
Referenced by trimAlManager::innerPerform().
Alignment * Cleaner::alig | private |
Pointer to the alignment that contains this object.
Definition at line 485 of file Cleaner.h.
Referenced by calculateRelaxedSeqIdentity(), calculateRepresentativeSeq(), calculateSeqIdentity(), calculateSpuriousVector(), clean(), clean2ndSlope(), cleanByCutValueFallBehind(), cleanByCutValueOverpass(), cleanByCutValueOverpassOrEquals(), cleanCombMethods(), cleanCompareFile(), cleanConservation(), Cleaner(), cleanGaps(), cleanNoAllGaps(), cleanOverlapSeq(), cleanSpuriousSeq(), cleanStrict(), computeComplementaryAlig(), getClustering(), getCutPointClusters(), removeAllGapsSeqsAndCols(), removeColumns(), removeDuplicates(), removeOnlyTerminal(), removeSequences(), removeSmallerBlocks(), and selectMethod().
int Cleaner::blockSize |
Block size to use on the cleaning methods.
Definition at line 67 of file Cleaner.h.
Referenced by cleanByCutValueFallBehind(), cleanByCutValueOverpass(), cleanByCutValueOverpassOrEquals(), Cleaner(), cleanStrict(), and Alignment::setBlockSize().
bool Cleaner::keepSequences |
Flag for keeping sequences even when they are composed only of gaps.
Definition at line 63 of file Cleaner.h.
Referenced by Cleaner(), removeAllGapsSeqsAndCols(), and Alignment::setKeepSequencesFlag().
int Cleaner::left_boundary |
Left boundary of the alignment.
Definition at line 71 of file Cleaner.h.
Referenced by Cleaner(), removeOnlyTerminal(), and setBoundaries().
int Cleaner::right_boundary |
Right boundary of the alignment.
Definition at line 75 of file Cleaner.h.
Referenced by Cleaner(), removeOnlyTerminal(), and setBoundaries().
bool Cleaner::terminalGapOnly |
Flag for trimming only on the terminal positions.
Definition at line 59 of file Cleaner.h.
Referenced by Cleaner(), and setTrimTerminalGapsFlag().