Abstract
Background: Multiple sequence alignment (MSA) is a fundamental task in bioinformatics and a prerequisite for many biological analyses. The more accurate the alignment, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups of sequences and then realigning the two groups. However, this strategy does not account for the fact that different regions of the sequences differ in conservation. Such differences can produce incorrect residue-residue or residue-gap pairs that horizontal splitting cannot correct.
Objective: Our aim in this article is to develop a novel refinement method based on splitting-splicing vertically.
Methods: Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. Given an alignment, we split it vertically into three parts, remove the gap characters from the middle part, realign the middle part alone, and splice the realigned middle part with the two unchanged flanking parts to obtain a refined alignment. During realignment, the aligner focuses only on the middle region, free of interference from the other regions, which can help fix incorrect pairs.
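The split, realign, and splice steps can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the two split fractions, and the stand-in `pad_aligner` are assumptions made for illustration. In practice the `aligner` argument would wrap a real MSA tool that returns rows in the input order and can handle a sequence whose middle part is empty.

```python
from typing import Callable, List

Aligner = Callable[[List[str]], List[str]]


def pad_aligner(seqs: List[str]) -> List[str]:
    """Stand-in aligner (assumption): right-pads with gaps to equal width.

    It only keeps the sketch runnable; replace it with a wrapper around
    a real MSA tool.
    """
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, "-") for s in seqs]


def splivert_refine(alignment: List[str], left_frac: float, right_frac: float,
                    aligner: Aligner = pad_aligner) -> List[str]:
    """Split vertically into three parts, realign the middle, splice back.

    left_frac and right_frac are hypothetical parameters giving the column
    boundaries of the middle part as fractions of the alignment width.
    """
    if not alignment:
        return []
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    if not 0.0 <= left_frac <= right_frac <= 1.0:
        raise ValueError("need 0 <= left_frac <= right_frac <= 1")

    lo, hi = int(ncols * left_frac), int(ncols * right_frac)

    # Vertical split: the flanking parts are kept exactly as they are.
    left = [row[:lo] for row in alignment]
    right = [row[hi:] for row in alignment]

    # Remove gap characters from the middle part, then realign it alone.
    middle = [row[lo:hi].replace("-", "") for row in alignment]
    realigned = aligner(middle)

    # Splice the realigned middle between the two untouched flanks.
    return [l + m + r for l, m, r in zip(left, realigned, right)]
```

For example, `splivert_refine(["AC-GT-AA", "ACG-T-AA", "AC--TTAA"], 0.25, 0.75)` realigns only columns 2 to 5 and leaves the first two and last two columns untouched. Whatever aligner is used, the ungapped sequence of every row should be unchanged by the refinement.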
Results: We tested our refinement strategy with two leading MSA tools on three standard benchmarks, using the commonly used average SP and TC scores. The results show that, given appropriate proportions for splitting the initial alignment, the average scores after applying our method are comparable to or slightly higher than those of the initial alignments. We also compared the alignments refined by our method with alignments refined directly by the original alignment tools. The results suggest that refining alignments with SpliVert can also outperform direct refinement with the original tools.
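For readers unfamiliar with the two scores, the sketch below shows one common way to compute them against a reference alignment: SP as the fraction of reference residue pairs that are also aligned in the test alignment, and TC as the fraction of reference columns reproduced exactly. The function names are hypothetical, and the choices to score every column (rather than only benchmark core blocks) and to count only columns with at least two residues for TC are assumptions; benchmark scoring programs may differ in these details.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Cell = Tuple[int, int]  # (sequence index, residue index within that sequence)


def residue_columns(alignment: List[str]) -> List[FrozenSet[Cell]]:
    """Describe each column by the set of residues (not gaps) it contains."""
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        cells = []
        for i, row in enumerate(alignment):
            if row[c] != "-":
                cells.append((i, counters[i]))
                counters[i] += 1
        columns.append(frozenset(cells))
    return columns


def aligned_pairs(alignment: List[str]) -> set:
    """All residue pairs that share a column."""
    return {pair
            for col in residue_columns(alignment)
            for pair in combinations(sorted(col), 2)}


def sp_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference residue pairs also aligned in the test."""
    ref_pairs = aligned_pairs(ref)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference columns reproduced exactly in the test.

    Assumption: only reference columns with at least two residues count.
    """
    ref_cols = [col for col in residue_columns(ref) if len(col) >= 2]
    if not ref_cols:
        return 0.0
    test_cols = set(residue_columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)
```

Both functions assume the test and reference alignments contain the same sequences in the same order. Averaging the per-alignment scores over a benchmark gives the average SP and TC values referred to above.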
Conclusion: The results indicate that splitting an alignment vertically and realigning part of it is an effective strategy for refining protein multiple sequence alignments.
Keywords: Protein, multiple sequence alignment, progressive alignment, realign, refinement, splitting-splicing vertically.