Cleaner Class Reference

Submodule contained by Alignment . More...

#include <Cleaner.h>

Public Member Functions

int selectMethod ()
 Method that selects the best cleaning method based on statistics of the alignment. More...
 
AlignmentcleanByCutValueOverpass (double cut, float baseLine, const int *gInCol, bool complementary)
 Method to clean an alignment.
It removes sequences that overpass or equals a certain threshold.
The function detects if it would clean too many sequences, and relaxes the threshold until we have enough sequences to achieve the given percentage desired to keep.
To achieve this, the method starts in the middle of the sequence and, alternating sides, adds column one by one.
. More...
 
AlignmentcleanByCutValueFallBehind (float cut, float baseLine, const float *ValueVect, bool complementary)
 Method to clean an alignment.
It removes sequences that fall behind a certain threshold.
The function detects if it would clean too many sequences, and relaxes the threshold until we have enough sequences to achieve the given percentage desired to keep.
To achieve this, the method starts in the middle of the sequence and, alternating sides, adds column one by one. More...
 
AlignmentcleanByCutValueOverpassOrEquals (double cutGaps, const int *gInCol, float baseLine, float cutCons, const float *MDK_Win, bool complementary)
 Method to clean an alignment.
It removes sequences that overpass or equals the gap threshold but fall behind the similarity threshold.
The function detects if it would clean too many sequences, and relaxes both thresholds until we have enough sequences to achieve the given percentage desired to keep.
To achieve this, the method starts in the middle of the sequence and, alternating sides, adds column one by one. More...
 
AlignmentcleanStrict (int gapCut, const int *gInCol, float simCut, const float *MDK_W, bool complementary, bool variable)
 Method to clean an alignment. It carries out strict and strictplus.
It removes sequences that overpass the gap threshold but fall behind the similarity threshold.
The function recovers those columns that, by themselves would be rejected, but it's neighbours (3 of 4) don't.
Column blocks that don't have a minimum size set by the method itself, will be removed too. More...
 
AlignmentcleanOverlapSeq (float minimumOverlap, float *overlapSeq, bool complementary)
 Method to trim an alignment based on a minimum sequence overlap threshold.
The method selects a combination of parameters to maximize the final number of columns in the new alignment.
. More...
 
AlignmentcleanGaps (float baseLine, float gapsPct, bool complementary)
 Method to trim an alignment based on the gap distribution values. Column blocks that don't have a minimum size set by the method itself, will be removed too. More...
 
AlignmentcleanConservation (float baseLine, float conservationPct, bool complementary)
 Method to trim an alignment based on the similarity distribution values. More...
 
Alignmentclean (float baseLine, float GapsPct, float conservationPct, bool complementary)
 Method to trim an alignment based on the similarity and gaps distribution values.
. More...
 
AlignmentcleanCompareFile (float cutpoint, float baseLine, float *vectValues, bool complementary)
 Method to trim an alignment based on consistency values obtained from a dataset of alignments.
The function computes the optimal parameter combination values to trim an alignment based on the consistency value from the comparison among a dataset of alignments with the same sequences. More...
 
bool calculateSpuriousVector (float overlap, float *spuriousVector)
 Method to compute the overlap values.
See Alignment::overlaps. More...
 
AlignmentcleanSpuriousSeq (float overlapColumn, float minimumOverlap, bool complementary)
 Method to remove sequences missaligned with the rest of sequences in the alignment.
For each residue in the sequence, it tests it's similarity. If the similarity of that residue is higher than overlapColumn value, it counts as a hit for the sequence.
After calculating the number of hits for the sequence, it removes the sequence if it has a proportion hits/residues lower tan minimumOverlap. More...
 
Alignmentclean2ndSlope (bool complementary)
 Method that carries the gappyout approach.
This methods calculates the slope in gaps distribution on the original alignment.
Then, it compares groups of three consecutive residues, searching for the group with the most abrupt change in slope.
When found, the first residue is taken as the cutpoint for the sequences. More...
 
AlignmentcleanCombMethods (bool complementary, bool variable)
 Method to clean an alignment. It carries out strict and strictplus.
The method: More...
 
AlignmentcleanNoAllGaps (bool complementary)
 Method to remove columns composed only by gaps
This method is specially useful when we remove missaligned sequences from a given alignment. More...
 
AlignmentremoveColumns (int *columns, int init, int size, bool complementary)
 Method to remove columns, expressed as ranges. More...
 
AlignmentremoveSequences (int *seqs, int init, int size, bool complementary)
 Method to remove sequences, expressed as ranges. More...
 
AlignmentgetClustering (float identityThreshold)
 Method to select the most representative sequence (the longest one) for each cluster from the input alignment to generate a new alignment. More...
 
float getCutPointClusters (int clusterNumber)
 Method that calculates the optimal cut point for a given clusters number.
The idea is to obtain a cutpoint that can be used to obtain as sequences as clusterNumber. More...
 
void removeSmallerBlocks (int blockSize, Alignment &original)
 Method to remove blocks of columns with no rejected residue smaller than a given size. More...
 
bool removeOnlyTerminal ()
 Method to detect right and left borders. Borders are the first column found with no gaps.
Everything between the borders are kept in the trimmed alignments. More...
 
void removeAllGapsSeqsAndCols (bool seqs=true, bool cols=true)
 Method that identifies and removes columns and sequences composed only by gaps. More...
 
void setTrimTerminalGapsFlag (bool terminalOnly_)
 Setter method to Terminal Only Flag. More...
 
void setBoundaries (int *boundaries)
 Boundaries setter. More...
 
void calculateSeqIdentity ()
 Method to calculate identities between the sequences from the alignment.
See Alignment::identities. More...
 
void calculateRelaxedSeqIdentity ()
 Method that makes a raw approximation of sequence identity computation.
. More...
 
int * calculateRepresentativeSeq (float maximumIdent)
 Method to assign sequences to clusters.
Clusters are calculated following this flow: More...
 
void computeComplementaryAlig (bool residues, bool sequences)
 Method for computing the complementary alignment.
Complementary alignment is an alignment containing all sequences and columns that the original alignment would reject.
It inverses the saveResidues / saveSequences tags. More...
 
void removeDuplicates ()
 

Public Attributes

bool terminalGapOnly
 Flag for trimming only on the terminal positions. More...
 
bool keepSequences
 Flag for keeping sequences even when they are composed only by gaps. More...
 
int blockSize
 Block size to use on the cleaning methods. More...
 
int left_boundary
 left boundary of the alignment More...
 
int right_boundary
 right boundary of the alignment More...
 

Private Member Functions

 Cleaner (Alignment *parent)
 Class Constructor - Called by Alignment. More...
 
 Cleaner (Alignment *parent, Cleaner *mold)
 Copy Constructor - Called by Alignment. More...
 

Private Attributes

Alignmentalig
 Pointer to the alignment that contains this object. More...
 

Friends

class Alignment
 

Detailed Description

Submodule contained by Alignment .

Complementary parameter
Cleaning methods may contain a boolean flag argument called 'complementary' .
If set to true, the function will return a new alignment that contains the sequences and residues that method would reject.
Otherwise it returns the cleaned version of the original alignment this object belongs to.

The use of this flag - complementary - is not recommended.
Complementarity is now performed when ALL methods have been applied.

Definition at line 52 of file Cleaner.h.

Constructor & Destructor Documentation

◆ Cleaner() [1/2]

Cleaner::Cleaner ( Alignment parent)
explicitprivate

Class Constructor - Called by Alignment.

Definition at line 1633 of file Cleaner.cpp.

References alig, blockSize, keepSequences, left_boundary, right_boundary, and terminalGapOnly.

Referenced by Alignment::Alignment().

+ Here is the caller graph for this function:

◆ Cleaner() [2/2]

Cleaner::Cleaner ( Alignment parent,
Cleaner mold 
)
private

Copy Constructor - Called by Alignment.

Definition at line 1648 of file Cleaner.cpp.

References alig, blockSize, keepSequences, left_boundary, right_boundary, and terminalGapOnly.

Referenced by Alignment::Alignment().

+ Here is the caller graph for this function:

Member Function Documentation

◆ calculateRelaxedSeqIdentity()

void Cleaner::calculateRelaxedSeqIdentity ( )

Method that makes a raw approximation of sequence identity computation.
.

Warning
Not In Use
Note
Designed for reducing comparisons for huge alignments.

Definition at line 1483 of file Cleaner.cpp.

References alig, Alignment::identities, Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, Alignment::saveResidues, Alignment::saveSequences, and Alignment::sequences.

◆ calculateRepresentativeSeq()

int * Cleaner::calculateRepresentativeSeq ( float  maximumIdent)

Method to assign sequences to clusters.
Clusters are calculated following this flow:

  • Select the longest sequence and use it as representative of the first cluster.
  • Using the previously computed identity values, check if second longest sequence should be part of cluster of first sequence. This is computed using the identity value in comparison with a threshold.
  • If the second sequence should be part of the existing cluster, we add it. Otherwise we create a new cluster and use this sequence as representative for this new cluster.
  • Continue with the rest of sequences, comparing them with the representatives of existing clusters. If a sequence can pertain to more than one cluster, we choose the one that maximizes the identity value with the representative sequence.
    Parameters
    maximumIdentIdentity threshold used to decide if a sequence should be part of a cluster or create a new one.
    Returns
    Vector that contains the clustering info.
    The first item in the vector contains it's size.

Definition at line 1525 of file Cleaner.cpp.

References alig, calculateSeqIdentity(), Alignment::Cleaning, Alignment::identities, Alignment::originalNumberOfSequences, utils::quicksort(), utils::removeCharacter(), Alignment::saveSequences, and Alignment::sequences.

Referenced by getClustering().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ calculateSeqIdentity()

void Cleaner::calculateSeqIdentity ( )

Method to calculate identities between the sequences from the alignment.
See Alignment::identities.

Definition at line 1434 of file Cleaner.cpp.

References AA, alig, Alignment::getAlignmentType(), Alignment::identities, Alignment::numberOfResidues, Alignment::originalNumberOfSequences, Alignment::saveResidues, Alignment::saveSequences, and Alignment::sequences.

Referenced by calculateRepresentativeSeq(), getCutPointClusters(), Alignment::printSeqIdentity(), and selectMethod().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ calculateSpuriousVector()

bool Cleaner::calculateSpuriousVector ( float  overlap,
float *  spuriousVector 
)

Method to compute the overlap values.
See Alignment::overlaps.

Parameters
overlapOverlap threshold.
[out]spuriousVectorPointer to the spuriousVector to fill.
Returns
True if the calculation went ok.
False otherwise.
This should happen only if you pass a null pointer instead of a spuriousVector.

Definition at line 855 of file Cleaner.cpp.

References AA, alig, Alignment::getAlignmentType(), Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, and Alignment::sequences.

Referenced by cleanSpuriousSeq().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ clean()

Alignment * Cleaner::clean ( float  baseLine,
float  GapsPct,
float  conservationPct,
bool  complementary 
)

Method to trim an alignment based on the similarity and gaps distribution values.
.

Parameters
baseLineMinimum percentage of columns to conserve in the new alignment.
GapsPctMaximum percentage of gaps per column.
conservationPctMinimum value of similarity per column to keep the column.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Take in mind
If baseLine is too strict, the other two will be relaxed to obtain the minimum percentage desired.
Returns
Pointer to the cleaned alignment.

Definition at line 784 of file Cleaner.cpp.

References alig, statistics::Gaps::calcCutPoint(), statistics::Similarity::calcCutPoint(), statistics::Manager::calculateConservationStats(), statistics::Manager::calculateGapStats(), cleanByCutValueOverpassOrEquals(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), statistics::Similarity::getMdkWindowedVector(), statistics::Manager::similarity, and Alignment::Statistics.

Referenced by trimAlManager::CleanResiduesNonAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ clean2ndSlope()

Alignment * Cleaner::clean2ndSlope ( bool  complementary)

Method that carries the gappyout approach.
This methods calculates the slope in gaps distribution on the original alignment.
Then, it compares groups of three consecutive residues, searching for the group with the most abrupt change in slope.
When found, the first residue is taken as the cutpoint for the sequences.

Parameters
complementaryWhether or not to return the complementary version of the trimmed alignment.

Definition at line 957 of file Cleaner.cpp.

References alig, statistics::Gaps::calcCutPoint2ndSlope(), statistics::Manager::calculateGapStats(), cleanByCutValueOverpass(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), and Alignment::Statistics.

Referenced by trimAlManager::CleanResiduesAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanByCutValueFallBehind()

Alignment * Cleaner::cleanByCutValueFallBehind ( float  cut,
float  baseLine,
const float *  ValueVect,
bool  complementary 
)

Method to clean an alignment.
It removes sequences that fall behind a certain threshold.
The function detects if it would clean too many sequences, and relaxes the threshold until we have enough sequences to achieve the given percentage desired to keep.
To achieve this, the method starts in the middle of the sequence and, alternating sides, adds column one by one.

Parameters
cutGap cut value to use.
baseLinePercent of sequences to keep
ValueVectVector that contains the gaps present on each column.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment.

Definition at line 231 of file Cleaner.cpp.

References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, removeAllGapsSeqsAndCols(), removeSmallerBlocks(), utils::roundInt(), and Alignment::saveResidues.

Referenced by cleanCompareFile(), and cleanConservation().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanByCutValueOverpass()

Alignment * Cleaner::cleanByCutValueOverpass ( double  cut,
float  baseLine,
const int *  gInCol,
bool  complementary 
)

Method to clean an alignment.
It removes sequences that overpass or equals a certain threshold.
The function detects if it would clean too many sequences, and relaxes the threshold until we have enough sequences to achieve the given percentage desired to keep.
To achieve this, the method starts in the middle of the sequence and, alternating sides, adds column one by one.
.

Parameters
cutCut value to use. If a column has a value lower or equal to the cut value, it is removed.
baseLinePercent of sequences to keep
gInColVector that contains the values that will be tested for each column.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment.

Definition at line 90 of file Cleaner.cpp.

References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, utils::quicksort(), removeAllGapsSeqsAndCols(), removeSmallerBlocks(), utils::roundInt(), and Alignment::saveResidues.

Referenced by clean2ndSlope(), cleanGaps(), and cleanNoAllGaps().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanByCutValueOverpassOrEquals()

Alignment * Cleaner::cleanByCutValueOverpassOrEquals ( double  cutGaps,
const int *  gInCol,
float  baseLine,
float  cutCons,
const float *  MDK_Win,
bool  complementary 
)

Method to clean an alignment.
It removes sequences that overpass or equals the gap threshold but fall behind the similarity threshold.
The function detects if it would clean too many sequences, and relaxes both thresholds until we have enough sequences to achieve the given percentage desired to keep.
To achieve this, the method starts in the middle of the sequence and, alternating sides, adds column one by one.

Parameters
cutGapsGap cut value to use.
baseLinePercent of sequences to keep
gInColVector that contains the gaps present on each column.
MDK_WinVector that contains similarity value for each column.
cutConsMinimum similarity value to keep a column in the alignment.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment.

Definition at line 357 of file Cleaner.cpp.

References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, utils::quicksort(), removeAllGapsSeqsAndCols(), removeSmallerBlocks(), utils::roundInt(), and Alignment::saveResidues.

Referenced by clean().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanCombMethods()

Alignment * Cleaner::cleanCombMethods ( bool  complementary,
bool  variable 
)

Method to clean an alignment. It carries out strict and strictplus.
The method:

  • Computes gaps values and gap cut point of the alignment.
  • Computes similarity values and similarity cut point of the alignment.
  • Calls the cleanStrict method with these values and returns its output.
    Parameters
    complementaryWhether or not to return the complementary version of the trimmed alignment.
    variableWhether to use a variable block length. If false, block will be size 5. Else, it will use 1% of the alignment length, with a minimum of 3 and maximum of 12. This value will be overwritten if blockSize (of this object) is bigger than 0.
    Returns
    Pointer to the cleaned alignment returned by cleanStrict using the parameters calculated in this function.

Definition at line 980 of file Cleaner.cpp.

References alig, statistics::Gaps::calcCutPoint2ndSlope(), statistics::Manager::calculateConservationStats(), cleanStrict(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), statistics::Similarity::getMdkWindowedVector(), utils::initlVect(), Alignment::originalNumberOfResidues, utils::quicksort(), Alignment::saveResidues, statistics::Manager::similarity, and Alignment::Statistics.

Referenced by trimAlManager::CleanResiduesAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanCompareFile()

Alignment * Cleaner::cleanCompareFile ( float  cutpoint,
float  baseLine,
float *  vectValues,
bool  complementary 
)

Method to trim an alignment based on consistency values obtained from a dataset of alignments.
The function computes the optimal parameter combination values to trim an alignment based on the consistency value from the comparison among a dataset of alignments with the same sequences.

Parameters
cutpointHint of Gap cut point. May be used if it's lower than the minimum percentage threshold.
baseLineMinimum percentage of columns to conserve in the new alignment.
vectValuesVector with alignment consistency values
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment.

Definition at line 820 of file Cleaner.cpp.

References alig, cleanByCutValueFallBehind(), utils::copyVect(), utils::min(), Alignment::originalNumberOfResidues, and utils::quicksort().

Referenced by trimAlManager::CleanResiduesNonAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanConservation()

Alignment * Cleaner::cleanConservation ( float  baseLine,
float  conservationPct,
bool  complementary 
)

Method to trim an alignment based on the similarity distribution values.

Parameters
baseLineMinimum percentage of columns to conserve in the new alignment.
conservationPctMinimum value of similarity per column to keep the column.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment.

Definition at line 760 of file Cleaner.cpp.

References alig, statistics::Similarity::calcCutPoint(), statistics::Manager::calculateConservationStats(), cleanByCutValueFallBehind(), statistics::Similarity::getMdkWindowedVector(), statistics::Manager::similarity, and Alignment::Statistics.

Referenced by trimAlManager::CleanResiduesNonAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanGaps()

Alignment * Cleaner::cleanGaps ( float  baseLine,
float  gapsPct,
bool  complementary 
)

Method to trim an alignment based on the gap distribution values. Column blocks that don't have a minimum size set by the method itself, will be removed too.

Parameters
baseLineMinimum percentage of columns to conserve in the new alignment.
gapsPctMaximum percentage of gaps per column.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment.

Definition at line 736 of file Cleaner.cpp.

References alig, statistics::Gaps::calcCutPoint(), statistics::Manager::calculateGapStats(), cleanByCutValueOverpass(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), and Alignment::Statistics.

Referenced by trimAlManager::CleanResiduesAuto(), and trimAlManager::CleanResiduesNonAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanNoAllGaps()

Alignment * Cleaner::cleanNoAllGaps ( bool  complementary)

Method to remove columns composed only by gaps
This method is specially useful when we remove missaligned sequences from a given alignment.

Parameters
complementaryWhether or not to return the complementary version of the trimmed alignment.
Although this method contains a complementary flag, setting this up would return an alignment full of gaps-only columns.

Definition at line 1049 of file Cleaner.cpp.

References alig, statistics::Manager::calculateGapStats(), cleanByCutValueOverpass(), statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), Alignment::originalNumberOfSequences, and Alignment::Statistics.

Referenced by trimAlManager::CleanResiduesAuto(), and trimAlManager::CleanSequences().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanOverlapSeq()

Alignment * Cleaner::cleanOverlapSeq ( float  minimumOverlap,
float *  overlapSeq,
bool  complementary 
)

Method to trim an alignment based on a minimum sequence overlap threshold.
The method selects a combination of parameters to maximize the final number of columns in the new alignment.
.

Parameters
minimumOverlapMin overlap to keep a sequence.
overlapSeqVector containing the overlap for each column.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment.

Definition at line 714 of file Cleaner.cpp.

References alig, Alignment::Alignment(), Alignment::Cleaning, Alignment::originalNumberOfSequences, removeAllGapsSeqsAndCols(), and Alignment::saveSequences.

Referenced by cleanSpuriousSeq().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanSpuriousSeq()

Alignment * Cleaner::cleanSpuriousSeq ( float  overlapColumn,
float  minimumOverlap,
bool  complementary 
)

Method to remove sequences missaligned with the rest of sequences in the alignment.
For each residue in the sequence, it tests it's similarity. If the similarity of that residue is higher than overlapColumn value, it counts as a hit for the sequence.
After calculating the number of hits for the sequence, it removes the sequence if it has a proportion hits/residues lower tan minimumOverlap.

Parameters
overlapColumnMinimum similarity value that a residue needs to be considered a hit.
minimumOverlapMinimum proportion of hits that a sequence needs to be kept in the new alignment.
complementaryWhether or not to return the complementary version of the trimmed alignment.
Returns
Pointer to the cleaned alignment. \

Definition at line 934 of file Cleaner.cpp.

References alig, calculateSpuriousVector(), cleanOverlapSeq(), and Alignment::originalNumberOfSequences.

Referenced by trimAlManager::CleanSequences().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ cleanStrict()

Alignment * Cleaner::cleanStrict ( int  gapCut,
const int *  gInCol,
float  simCut,
const float *  MDK_W,
bool  complementary,
bool  variable 
)

Method to clean an alignment. It carries out strict and strictplus.
It removes sequences that overpass the gap threshold but fall behind the similarity threshold.
The function recovers those columns that, by themselves would be rejected, but it's neighbours (3 of 4) don't.
Column blocks that don't have a minimum size set by the method itself, will be removed too.

Parameters
gapCutGap cut value to use.
gInColVector that contains the gaps present on each column.
simCutMinimum similarity value to keep a column.
MDK_WVector that contains the similarity of each column.
complementaryWhether or not to return the complementary version of the trimmed alignment.
variableWhether to use a variable block length. If false, block will be size 5. Else, it will use 1% of the alignment length, with a minimum of 3 and maximum of 12. This value will be overwritten if blockSize (of this object) is bigger than 0.
Returns
Pointer to the cleaned alignment.

Definition at line 513 of file Cleaner.cpp.

References alig, Alignment::Alignment(), blockSize, Alignment::Cleaning, Alignment::numberOfResidues, Alignment::originalNumberOfResidues, removeAllGapsSeqsAndCols(), utils::roundInt(), and Alignment::saveResidues.

Referenced by cleanCombMethods().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ computeComplementaryAlig()

void Cleaner::computeComplementaryAlig ( bool  residues,
bool  sequences 
)

Method for computing the complementary alignment.
Complementary alignment is an alignment containing all sequences and columns that the original alignment would reject.
It inverses the saveResidues / saveSequences tags.

Parameters
residuesWhether to reverse resiudes tags.
sequencesWhether to reverse sequences tags.

Definition at line 1589 of file Cleaner.cpp.

References alig, Alignment::numberOfResidues, Alignment::numberOfSequences, Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, Alignment::saveResidues, and Alignment::saveSequences.

Referenced by trimAlManager::postprocess_alignment().

+ Here is the caller graph for this function:

◆ getClustering()

Alignment * Cleaner::getClustering ( float  identityThreshold)

Method to select the most representative sequence (the longest one) for each cluster from the input alignment to generate a new alignment.

Parameters
identityThresholdThreshold used to assign sequences to clusters.
If identity between representative sequence of the cluster and sequence to assign is superior to this threshold and no other cluster has a better identity with its representative, it will be assigned to that cluster.

Definition at line 1211 of file Cleaner.cpp.

References alig, Alignment::Alignment(), calculateRepresentativeSeq(), Alignment::numberOfSequences, Alignment::originalNumberOfSequences, and Alignment::saveSequences.

Referenced by trimAlManager::CleanSequences().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ getCutPointClusters()

float Cleaner::getCutPointClusters ( int  clusterNumber)

Method that calculates the optimal cut point for a given clusters number.
The idea is to obtain a cutpoint that can be used to obtain as sequences as clusterNumber.

Parameters
clusterNumberNumber of representative sequences to obtain.
Returns
CutPoint to obtain clusterNumber sequences.

Definition at line 1068 of file Cleaner.cpp.

References alig, calculateSeqIdentity(), Alignment::identities, Alignment::numberOfSequences, Alignment::originalNumberOfSequences, utils::quicksort(), utils::removeCharacter(), Alignment::saveSequences, and Alignment::sequences.

Referenced by trimAlManager::CleanSequences().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ removeAllGapsSeqsAndCols()

void Cleaner::removeAllGapsSeqsAndCols ( bool  seqs = true,
bool  cols = true 
)

Method that identifies and removes columns and sequences composed only by gaps.

Definition at line 1367 of file Cleaner.cpp.

References alig, debug, KeepingOnlyGapsSequence, keepSequences, Alignment::numberOfResidues, Alignment::numberOfSequences, Alignment::originalNumberOfResidues, Alignment::originalNumberOfSequences, RemovingOnlyGapsSequence, reporting::reportManager::report(), Alignment::saveResidues, Alignment::saveSequences, Alignment::seqsName, and Alignment::sequences.

Referenced by cleanByCutValueFallBehind(), cleanByCutValueOverpass(), cleanByCutValueOverpassOrEquals(), cleanOverlapSeq(), cleanStrict(), removeColumns(), and removeSequences().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ removeColumns()

Alignment * Cleaner::removeColumns ( int *  columns,
int  init,
int  size,
bool  complementary 
)

Method to remove columns, expressed as ranges.

Parameters
columnsVector containing the columns to remove.
initWhere does the vector start. Set to 1 if the vector contains its size as first element.
sizeSize of the columns vector.
complementaryWhether or not to return the complementary version of the trimmed alignment.

Definition at line 1240 of file Cleaner.cpp.

References alig, Alignment::Alignment(), Alignment::Cleaning, removeAllGapsSeqsAndCols(), Alignment::saveResidues, and Alignment::updateSequencesAndResiduesNums().

Referenced by trimAlManager::CleanResiduesNonAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ removeDuplicates()

void Cleaner::removeDuplicates ( )

Definition at line 1610 of file Cleaner.cpp.

References alig, debug, Alignment::originalNumberOfSequences, RemovingDuplicateSequences, reporting::reportManager::report(), Alignment::saveSequences, Alignment::seqsName, and Alignment::sequences.

Referenced by trimAlManager::CleanSequences().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ removeOnlyTerminal()

bool Cleaner::removeOnlyTerminal ( )

Method to detect right and left borders. Borders are the first column found with no gaps.
Everything between the borders are kept in the trimmed alignments.

Returns
True if everything went ok.
False if it was not possible to calculate the gap stats.

Definition at line 1283 of file Cleaner.cpp.

References alig, statistics::Manager::calculateGapStats(), debug, statistics::Manager::gaps, statistics::Gaps::getGapsWindow(), left_boundary, LeftBoundaryBiggerThanRightBoundary, Alignment::originalNumberOfResidues, reporting::reportManager::report(), right_boundary, Alignment::saveResidues, Alignment::Statistics, and Alignment::updateSequencesAndResiduesNums().

Referenced by trimAlManager::postprocess_alignment().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ removeSequences()

Alignment * Cleaner::removeSequences ( int *  seqs,
int  init,
int  size,
bool  complementary 
)

Method to remove sequences, expressed as ranges.

Parameters
seqsVector containing the sequences to remove.
initWhere does the vector start. Set to 1 if the vector contains its size as first element.
sizeSize of the columns vector.
complementaryWhether or not to return the complementary version of the trimmed alignment.

Definition at line 1262 of file Cleaner.cpp.

References alig, Alignment::Alignment(), Alignment::Cleaning, removeAllGapsSeqsAndCols(), Alignment::saveSequences, and Alignment::updateSequencesAndResiduesNums().

Referenced by trimAlManager::CleanSequences().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ removeSmallerBlocks()

void Cleaner::removeSmallerBlocks ( int  blockSize,
Alignment original 
)

Method to remove blocks of columns with no rejected residue smaller than a given size.

Parameters
blockSizeMinimum size to remove a block from the alignment
originalAlignment to apply the removal. Minimum size a block has to be to be kept.

Definition at line 1326 of file Cleaner.cpp.

References alig, Alignment::numberOfResidues, and Alignment::saveResidues.

Referenced by cleanByCutValueFallBehind(), cleanByCutValueOverpass(), and cleanByCutValueOverpassOrEquals().

+ Here is the caller graph for this function:

◆ selectMethod()

int Cleaner::selectMethod ( )

Method that selects the best cleaning method based on statistics of the alignment.

Definition at line 42 of file Cleaner.cpp.

References alig, calculateSeqIdentity(), Alignment::identities, and Alignment::numberOfSequences.

Referenced by trimAlManager::CleanResiduesAuto().

+ Here is the call graph for this function:
+ Here is the caller graph for this function:

◆ setBoundaries()

void Cleaner::setBoundaries ( int *  boundaries)

Boundaries setter.

Warning
Not In Use
Parameters
[in]boundariesNew boundaries values.

Definition at line 1201 of file Cleaner.cpp.

References left_boundary, and right_boundary.

◆ setTrimTerminalGapsFlag()

void Cleaner::setTrimTerminalGapsFlag ( bool  terminalOnly_)

Setter method to Terminal Only Flag.

Parameters
terminalOnly_New vlue of the Terminal Only Flag.

Definition at line 1194 of file Cleaner.cpp.

References terminalGapOnly.

Referenced by trimAlManager::innerPerform().

+ Here is the caller graph for this function:

Friends And Related Function Documentation

◆ Alignment

friend class Alignment
friend

Definition at line 480 of file Cleaner.h.

Member Data Documentation

◆ alig

◆ blockSize

int Cleaner::blockSize

Block size to use on the cleaning methods.

Definition at line 67 of file Cleaner.h.

Referenced by cleanByCutValueFallBehind(), cleanByCutValueOverpass(), cleanByCutValueOverpassOrEquals(), Cleaner(), cleanStrict(), and Alignment::setBlockSize().

◆ keepSequences

bool Cleaner::keepSequences

Flag for keeping sequences even when they are composed only by gaps.

Definition at line 63 of file Cleaner.h.

Referenced by Cleaner(), removeAllGapsSeqsAndCols(), and Alignment::setKeepSequencesFlag().

◆ left_boundary

int Cleaner::left_boundary

left boundary of the alignment

Definition at line 71 of file Cleaner.h.

Referenced by Cleaner(), removeOnlyTerminal(), and setBoundaries().

◆ right_boundary

int Cleaner::right_boundary

right boundary of the alignment

Definition at line 75 of file Cleaner.h.

Referenced by Cleaner(), removeOnlyTerminal(), and setBoundaries().

◆ terminalGapOnly

bool Cleaner::terminalGapOnly

Flag for trimming only on the terminal positions.

Definition at line 59 of file Cleaner.h.

Referenced by Cleaner(), and setTrimTerminalGapsFlag().


The documentation for this class was generated from the following files: