SARS-CoV-2 Consensus Genome QC: Frameshift Correction

This page guides you through removing frameshifts (FS) from your SARS-CoV-2 consensus genome FASTA file before submitting it to GISAID.


Last updated 2 years ago


Khan W, Kanwar S

AKU CITRIC Center for Bioinformatics and Computational Biology, Department of Pediatrics and Child Health, Faculty of Health Sciences, Medical College, The Aga Khan University, Karachi-74800, Pakistan.

In Nextclade QC, a frameshift (FS) is marked with "F" and shown as a red line. The color code reflects the total number of FS in the sequence:

  • Red indicates more than 2 FS.

  • Yellow indicates 2 FS.

  • Green indicates 1 FS or fewer (acceptable).
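The thresholds above can be written as a small lookup, which may help when triaging many sequences at once. This is a minimal sketch: the function name `fs_color` is hypothetical, and the cutoffs are the ones listed on this page, not taken from Nextclade's own QC scoring.

```python
def fs_color(fs_count: int) -> str:
    """Map the number of frameshifts (FS) in a sequence to this page's color code.

    Assumption: cutoffs follow the color list on this page
    (green <= 1, yellow == 2, red > 2), not Nextclade's internal scoring.
    """
    if fs_count < 0:
        raise ValueError("fs_count must be non-negative")
    if fs_count <= 1:
        return "green"  # acceptable
    if fs_count == 2:
        return "yellow"
    return "red"  # more than 2 FS
```

For example, a sequence with no frameshifts maps to `"green"` and one with three maps to `"red"`.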

Follow the steps below to remove FS from your consensus genome:

1) Identify the nucleotide position where the FS starts. Hovering the mouse over the red line shows this position.

Figure: In this example, the FS runs from position 4,148 to 13,468.

2) Open the alignment file of the sample containing the FS, for example muscle.out.fasta (generated as an intermediate file in the CZ ID pipeline), in MEGA X and search for that nucleotide position.

Figure: Visualization of the nucleotide position in MEGA X.

3) As the figure above shows, the "N" at this position should be a gap "-"; this is what caused the FS. In this case, removing the "N" from this position removes the FS. Search for the region around the "N" in the consensus genome FASTA file.

Hint: To find the exact FS position, copy the flanking nucleotides (3 on each side) around that specific "N" and search for them in the FASTA file.

Figure: Identified position of "N" in consensus.fa.

4) Remove the "N" and run Nextclade QC again.

Figure: FS removed from the consensus genome.

And voilà! The frameshift is corrected.

Note: The alignment file tells you whether you need to insert or delete an "N" at a specific position in the consensus file.

We hope this helps you improve the QC of your consensus genome.
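The note about the alignment file can be illustrated with a minimal sketch that scans a pairwise alignment column by column. It assumes you have already read the reference row and the sample row of an alignment such as muscle.out.fasta into two equal-length strings; the function name `classify_n_indels` is hypothetical. A column where the reference has a gap "-" and the sample has "N" suggests deleting that N, and a column where the sample has a gap while the reference has a base suggests inserting an N.

```python
def classify_n_indels(ref_aln: str, sample_aln: str) -> list:
    """Suggest N edits from a pairwise alignment of reference and sample.

    Returns (action, ref_pos, sample_pos) tuples with 1-based, gap-free positions:
      - "delete N": reference has "-" and sample has "N"; sample_pos is the
        position of that N in the ungapped consensus, ref_pos is the last
        reference base before the gap column.
      - "insert N": sample has "-" where the reference has a base; an N would
        go right after sample_pos in the ungapped consensus.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    suggestions = []
    ref_pos = sample_pos = 0
    for ref_base, sample_base in zip(ref_aln.upper(), sample_aln.upper()):
        if ref_base != "-":
            ref_pos += 1  # count reference bases only
        if sample_base != "-":
            sample_pos += 1  # count sample bases only
        if ref_base == "-" and sample_base == "N":
            suggestions.append(("delete N", ref_pos, sample_pos))
        elif sample_base == "-" and ref_base != "-":
            suggestions.append(("insert N", ref_pos, sample_pos))
    return suggestions
```

For example, `classify_n_indels("ACGT-ACGT", "ACGTNACGT")` returns `[("delete N", 4, 5)]`: the N is the 5th base of the sample. The sketch flags every single-column gap; an indel only shifts the reading frame when its total length is not a multiple of 3, so compare the flagged positions with the FS range reported by Nextclade before editing.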
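The hint in step 3 and the edit in step 4 can be sketched in Python for a single-record consensus FASTA. The helper names (`read_single_fasta`, `remove_n_by_context`, `write_single_fasta`) and any file names or flanks you pass to them are hypothetical. The sketch refuses to edit if the flanking motif is missing or occurs more than once; in that case, copy longer flanks.

```python
def read_single_fasta(path):
    """Read a single-record FASTA file and return (header, sequence)."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    raise ValueError("expected a single FASTA record")
                header = line[1:]
            elif line:
                chunks.append(line)
    return header, "".join(chunks)


def remove_n_by_context(seq, left, right):
    """Delete the N that sits between the flanking motifs `left` and `right`.

    Returns (new_sequence, 1-based position of the removed N).
    Matching is case-insensitive; the motif must occur exactly once.
    """
    motif = (left + "N" + right).upper()
    haystack = seq.upper()
    start = haystack.find(motif)
    if start == -1:
        raise ValueError(f"motif {motif} not found")
    if haystack.find(motif, start + 1) != -1:
        raise ValueError(f"motif {motif} occurs more than once; use longer flanks")
    n_index = start + len(left)  # 0-based index of the N to delete
    return seq[:n_index] + seq[n_index + 1:], n_index + 1


def write_single_fasta(path, header, seq, width=60):
    """Write one FASTA record, wrapping the sequence every `width` characters."""
    with open(path, "w") as handle:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

For example, `remove_n_by_context("AAACGTNTTAGGG", "CGT", "TTA")` returns `("AAACGTTTAGGG", 7)`. On a real sample you would read consensus.fa with `read_single_fasta`, apply the deletion, write the result to a new file with `write_single_fasta`, and then re-run Nextclade QC on that file.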