Data Flow

trimAl trims an alignment in several steps.
These steps are grouped into the following stages:

Preprocessing

This step obtains an alignment from a different source instead of reading one provided with the '-in' argument:

  • NGS: Loads a VCF file containing mutations for a REF file.
    When both are provided, an MSA can be built and used in later steps. (WIP)
  • CompareSet: Performs a consistency analysis on a set of alignments.
    The most consistent alignment in the set is selected.
    Note
    You can force the selection of a specific alignment regardless of its consistency using -forceselect.
    This lets you clean the alignment using consistency stats while using the manually selected alignment as input.

Warning
If no preprocessing is used, an alignment must be provided (using the -in argument).
preprocess.svg

Sequence Trimming

This step trims the alignment by removing sequences.
Only one method can be used at a time.

  • Cluster: Clusters the sequences of the MSA.
    The most representative sequences are selected.
  • MaxIdentity: Performs an identity analysis.
    It keeps only the sequences with a minimum identity (provided by the user) among the sequences in the set.
  • SelectSeqs: Manually removes sequences from the alignment.
seqTrim.svg

Residue Trimming

This step allows to trim the alignment by removing columns.

  • Automated Methods (Based on Gaps)
    • Gappyout: Searches for a threshold based on the distribution of gaps.
      Removes the columns that exceed this threshold.
    • NoGaps: Removes all columns that contain gaps.
    • NoAllGaps: Removes all columns that contain only gaps.
      Although the input file usually does not contain gap-only columns,
      removing sequences can produce such 'all gaps' columns.
  • Automated Methods (Based on Similarity)
    • Strict: Searches for a threshold based on the distributions of gaps and similarity.
    • StrictPlus: Searches for a threshold based on the distributions of gaps and similarity.
  • Meta Automated Methods
    These methods use heuristics to select, from a set of methods, the one best suited to clean an alignment, based on certain properties of that alignment:
    • Automated1: Decides between the Gappyout and Strict methods.
  • Semi Automated Methods
    These methods trim the alignment according to user-provided thresholds on different statistics.
    • Consistency Threshold: Trims the alignment by removing columns with a consistency ratio below the provided threshold.
      When a consistency threshold is provided, this trimming is done before the others.
      Note
      Only one of the following will be used, depending on the thresholds provided:
    • Gap Threshold: Trims the alignment by removing columns with a gap ratio above the provided threshold.
    • Similarity Threshold: Trims the alignment by removing columns with a similarity ratio below the provided threshold.
    • Similarity and Gap Thresholds: Trims the alignment by removing columns with a gap ratio above the gap threshold or a similarity ratio below the similarity threshold.
resTrim.svg
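The gap-based column filters above can be sketched in a few lines of Python. This is an illustration of the idea only, not trimAl's implementation: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings with '-' as the gap symbol, and the gap ratio is assumed to be the fraction of gaps in a column.

```python
from typing import List

GAP = "-"  # assumed gap symbol


def columns(msa: List[str]) -> List[str]:
    """Transpose the alignment rows (sequences) into columns."""
    return ["".join(col) for col in zip(*msa)]


def gap_ratio(column: str) -> float:
    """Fraction of positions in a column that are gaps (assumed definition)."""
    return column.count(GAP) / len(column)


def keep_nogaps(msa: List[str]) -> List[int]:
    """NoGaps: keep only the columns that contain no gaps."""
    return [i for i, col in enumerate(columns(msa)) if GAP not in col]


def keep_noallgaps(msa: List[str]) -> List[int]:
    """NoAllGaps: drop only the columns made entirely of gaps."""
    return [i for i, col in enumerate(columns(msa)) if col.count(GAP) < len(col)]


def keep_gap_threshold(msa: List[str], threshold: float) -> List[int]:
    """Gap Threshold: drop the columns whose gap ratio is above the threshold."""
    return [i for i, col in enumerate(columns(msa)) if gap_ratio(col) <= threshold]


def apply_selection(msa: List[str], kept: List[int]) -> List[str]:
    """Build the trimmed alignment from the indices of the kept columns."""
    return ["".join(seq[i] for i in kept) for seq in msa]


if __name__ == "__main__":
    msa = ["AC-GT", "A--GT", "AT-G-"]
    print(apply_selection(msa, keep_nogaps(msa)))
    print(apply_selection(msa, keep_gap_threshold(msa, 0.4)))
```

Representing the result as a list of kept column indices, rather than as a new alignment, makes it easy to combine the selection with later steps that operate on the same selection.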

Post Process

This step performs post processing on the resulting alignment.
These steps are sequential and optional, so you can perform any combination of them, or none at all.

  • Only Terminal: Restricts trimming to the terminal ends by recovering all removed residues that lie outside them.
    The center of the alignment is preserved, keeping columns until a column with at least one gap is found.
    This is done for both ends independently, so their sizes don't need to match.
  • Complementary: Inverts the residue selection, keeping all originally removed residues and removing all originally kept residues.
  • Backtranslate: If an AA alignment is provided, it is also possible to provide a non-aligned file that contains the original DNA sequences.
    This step translates the trimmed AA alignment into a DNA alignment, using the original sequences provided. This yields better results than trimming the DNA alignment directly.
postProcess.svg
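The Complementary and Backtranslate steps can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not trimAl's code: the function names are hypothetical, the column selection is assumed to be a list of kept column indices, gaps are assumed to be '-', and each residue is assumed to map to exactly three consecutive nucleotides of the unaligned DNA sequence.

```python
from typing import List

GAP = "-"  # assumed gap symbol


def complementary(kept: List[int], n_columns: int) -> List[int]:
    """Invert a column selection: keep what was removed, remove what was kept."""
    kept_set = set(kept)
    return [i for i in range(n_columns) if i not in kept_set]


def backtranslate(aa_row: str, dna: str, kept: List[int]) -> str:
    """Map the kept columns of one aligned AA sequence back to its codons.

    aa_row is one row of the untrimmed AA alignment, dna is the matching
    unaligned DNA sequence, and kept holds the indices of the AA columns
    that survived trimming.
    """
    n_residues = sum(1 for aa in aa_row if aa != GAP)
    if len(dna) < 3 * n_residues:
        raise ValueError("DNA sequence is too short for the AA sequence")

    kept_set = set(kept)
    codons = []
    residue = 0  # index of the next ungapped residue in aa_row
    for col, aa in enumerate(aa_row):
        if aa == GAP:
            codon = GAP * 3  # a gap column becomes a three-gap codon
        else:
            codon = dna[3 * residue: 3 * residue + 3]
            residue += 1
        if col in kept_set:
            codons.append(codon)
    return "".join(codons)


if __name__ == "__main__":
    print(complementary([0, 3], 5))
    print(backtranslate("M-KV", "ATGAAAGTT", kept=[0, 1, 3]))
```

The codon counter advances for every residue, including those in removed columns, so the kept columns still pick up the correct codons from the original DNA.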

Output

This step does not perform any trimming, but is used to report the information obtained to the user.

  • SVG Output – Obtain a report in SVG format, where the original alignment is shown in a manner similar to the HTML output format.
    This report includes all the statistics used in the trimming steps and also points out which residues and sequences are kept or removed.
  • HTML Output – Obtain a report in HTML format, including the statistics and the kept/removed sequences and residues.
  • Output Alignment – Outputs the alignment. Two options are available, and only one can be selected:
    • STDOUT (default) – The alignment is printed on the terminal or STDOUT.
      • Some messages (such as errors and warnings) are displayed on STDOUT too, which can make it difficult to extract the MSA.
      • Stats cannot be exported in this mode, as they are displayed on STDOUT too.
      • Only one format can be selected from the set.
    • FILE (recommended) – The resulting MSA is written to a file.
      • Multiple formats are allowed, using an output pattern.
      • Stats can be requested.
  • Print Stats – Print the requested stats if available.
    You can request statistics that were neither used to trim the alignment nor included in the reports.
    Note
    For this purpose it may be handy to take a look at statAl (WIP).
output.svg
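As a rough illustration of the recommended FILE output, the sketch below builds a trimAl command line in Python and runs it only if the binary is available. Only '-in' appears in the text above; the '-out' and '-gappyout' spellings, the 'trimal' executable name, and the file names are assumptions here and should be checked against the help output of your trimAl version.

```python
import shutil
import subprocess
from typing import List


def build_command(input_path: str, output_path: str) -> List[str]:
    """Build an argument list that writes the trimmed MSA to a file.

    The '-out' and '-gappyout' flags are assumed spellings; verify them
    against your trimAl version before relying on them.
    """
    return ["trimal", "-in", input_path, "-out", output_path, "-gappyout"]


def run_trimal(input_path: str, output_path: str) -> bool:
    """Run trimAl if it is installed; return True when it was executed."""
    if shutil.which("trimal") is None:
        return False  # binary not found on PATH, nothing was run
    subprocess.run(build_command(input_path, output_path), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    print(" ".join(build_command("example.fasta", "trimmed.fasta")))
```

Writing to a file keeps the alignment separate from any messages printed on STDOUT.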