About Article

Molecular Insights into Arylsulfatase B Mutation-Induced Instability in Mucopolysaccharidosis Type VI

Cellular & Molecular Intelligence • Vol. 1, No. 1 • 2026

Open Access • CC BY 4.0 • Corresponding author email shown in PDF: azharasim@gmail.com

Highlights

The article investigates how missense mutations in the ARSB gene destabilize protein structure and contribute to mucopolysaccharidosis type VI (MPS VI).
The study combines sequence-based prediction, structure-based stability analysis, pathogenicity scoring, conservation analysis, aggregation propensity analysis, and molecular dynamics simulation.
The abstract reports 430 nsSNPs screened, 141 variants analyzed structurally, 57 overlapping high-confidence variants, and 44 highly deleterious mutations identified.
ConSurf analysis highlighted 12 final mutations in highly conserved regions, while SODA and structural interaction analysis emphasized A237D and W353R as key low-solubility, aggregation-relevant variants.
The PDF contains several internal inconsistencies in dates and variant counts, so this summary reports those conflicts as they appear instead of silently normalizing them.

Abstract

Mutations in the arylsulfatase B (ARSB) gene are directly implicated in mucopolysaccharidosis type VI (MPS VI). Several non-synonymous single-nucleotide polymorphisms (nsSNPs) in ARSB have been associated with disease pathogenesis. A comprehensive evaluation of these variants is essential to understand their structural and functional consequences. In this study, a systematic in silico analysis was performed to identify deleterious nsSNPs in the ARSB gene. Initially, 430 nsSNPs were evaluated using sequence-based prediction tools, including SIFT, PolyPhen-2, FATHMM, and Mutation Assessor. Subsequently, 141 nsSNPs were subjected to structure-based stability analysis using MAESTROweb, SDM2, mCSM, and DynaMut2, of which 57 variants overlapped with previous reports. High-confidence deleterious nsSNPs were further assessed for pathogenicity using PMut and MutPred2 servers. Our integrated computational approach identified 44 highly deleterious mutations. Aggregation propensity analysis revealed that 29 of these variants exhibit increased aggregation tendencies, while one variant demonstrated progressive loss of solubility. Molecular dynamics simulations further indicated that high-confidence deleterious nsSNPs significantly disrupt ARSB structural integrity, enhance molecular flexibility, reduce structural rigidity, and promote atomic-level aggregation. Overall, this study provides mechanistic insights into how pathogenic mutations destabilize the ARSB protein and contribute to MPS VI pathogenesis, highlighting potential targets for future therapeutic investigation.

Keywords

ARSB gene; Mucopolysaccharidosis type VI; Non-synonymous SNPs; Pathogenic mutations; Protein stability; Protein aggregation; Genetic variation; Computational mutagenesis

Article Overview

The article introduces lysosomal storage diseases as disorders caused by deficiency of specific lysosomal enzymes and places MPS VI within that group as an autosomal recessive disease caused by ARSB mutations.

ARSB encodes N-acetylgalactosamine-4-sulfatase, which removes sulfate groups from dermatan sulfate and chondroitin-4-sulfate. Loss of ARSB function leads to glycosaminoglycan accumulation and progressive tissue pathology affecting the skeleton, joints, cornea, heart, liver, spleen, and respiratory system.

The paper emphasizes that more than 200 ARSB variants have been reported but that the structural and functional effects of many variants remain incompletely characterized, which complicates prognosis and therapeutic planning.

To address that gap, the study applies a computational workflow integrating sequence, structure, conservation, aggregation, and dynamics analyses to prioritize pathogenic ARSB nsSNPs and explain their mechanistic contribution to disease.

1. About the Article

This is a research article published in Cellular & Molecular Intelligence. It studies mutation-induced instability of arylsulfatase B (ARSB) in the context of mucopolysaccharidosis type VI.

The authors are affiliated with Jamia Hamdard, New Delhi, and King Faisal Specialist Hospital and Research, Riyadh. The PDF identifies Mohammad Asim Azhar as corresponding author in the affiliation block, but the email line names Barka Basharat together with that email address.

Journal: Cellular & Molecular Intelligence
Volume/Issue shown: Vol. 1, No. 1
Publication year shown in header: 2026
Article title: Molecular Insights into Arylsulfatase B Mutation-Induced Instability in Mucopolysaccharidosis Type VI
Authors: Barka Basharat; Nushrat Jahan; Mohammad Asim Azhar
Affiliation 1: Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, Hamdard Nagar, New Delhi, India
Affiliation 2: Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research, Riyadh, Kingdom of Saudi Arabia
Corresponding email shown: azharasim@gmail.com
Received: April 16th, 2024
Accepted: March 31, 2026
Published: May 21, 2024
License statement: Creative Commons CC-BY 4.0
Open-access status: Open Access
Header page range shown on first page: 1000–1025
Printed article page range visible in pages: 1000–1007

2. Introduction

The introduction explains lysosomal storage diseases as progressive disorders caused by deficiency of lysosomal enzymes or associated proteins, leading to substrate accumulation and multisystem disease. MPS VI is described specifically as Maroteaux-Lamy syndrome caused by ARSB mutations.

ARSB hydrolyzes sulfate groups from dermatan sulfate and chondroitin-4-sulfate. Deficient activity causes glycosaminoglycan buildup and progressive manifestations such as dysostosis multiplex, joint stiffness, corneal clouding, cardiac valve disease, hepatosplenomegaly, and respiratory complications.

Clinical severity varies, with severe cases presenting early in life and progressing to major disability. Diagnosis is described as combining clinical evaluation, urinary GAG quantification, enzymatic assays, and ARSB genotyping.

The authors argue that ARSB nsSNPs can disrupt protein stability, folding, or interactions, and they position the study as a systematic effort to evaluate such variants using a computational prioritization pipeline. Figure 1 on page 2 depicts that multi-step pipeline across sequence-based, structure-based, disease-prediction, and aggregation analyses.

Human ARSB UniProt ID used in the study: P15848
Crystal structure used in the study: PDB ID 1FSU
Figure 1 shows the computational workflow for forecasting pathogenicity in ARSB
The introduction mentions more than 200 reported ARSB variants

3. Materials and Methods

The methods section describes a fully computational analysis workflow built around mutation retrieval, sequence-based deleteriousness prediction, structure-based stability estimation, pathogenicity scoring, conservation analysis, aggregation propensity analysis, interaction analysis, and molecular dynamics simulation.

2.1. Data Retrieval

The FASTA sequence of the human ARSB gene was obtained from UniProt using accession P15848. Missense mutation information was gathered through PubMed review and from dbSNP, HGMD, ClinVar, and Ensembl, with redundant nsSNPs removed.

The crystal structure of human ARSB was taken from the Protein Data Bank using PDB ID 1FSU. The text also states that remaining mutations were taken from Ensembl and that multiple mutation types were included during data collection.

Sequence source: UniProt
UniProt ID: P15848
Structure source: Protein Data Bank
PDB ID: 1FSU
Variant sources listed: PubMed, dbSNP, HGMD, ClinVar, Ensembl
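
The retrieval step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the two URL patterns are assumptions about the public UniProt and RCSB download services and should be verified before use, and `parse_fasta` and `dedupe_variants` are hypothetical helper names. No network call is made here.

```python
from typing import Iterable, List, Tuple

UNIPROT_ID = "P15848"  # accession stated in the article
PDB_ID = "1FSU"        # PDB entry stated in the article


def uniprot_fasta_url(accession: str) -> str:
    # Assumed URL pattern of the UniProt REST service; verify before use.
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def rcsb_pdb_url(pdb_id: str) -> str:
    # Assumed URL pattern of the RCSB file server; verify before use.
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def dedupe_variants(variants: Iterable[str]) -> List[str]:
    """Drop redundant variant codes while keeping first-seen order."""
    return list(dict.fromkeys(v.strip() for v in variants))
```

Merging the lists exported from dbSNP, HGMD, ClinVar, and Ensembl and passing them through `dedupe_variants` mirrors the "redundant nsSNPs removed" step in spirit; the article does not describe how duplicates were actually detected.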

2.2. Sequence-Based Prediction of Deleterious Mutations

SIFT was used to predict functional impact based on sequence homology and conservation, with scores less than or equal to 0.05 treated as damaging. PolyPhen-2 evaluated substitutions using sequence and structure-based features and classified them as benign, possibly damaging, or probably damaging.

Mutation Assessor measured functional impact using evolutionary conservation patterns within protein families and subfamilies, and FATHMM applied hidden Markov model-based prediction with lower scores indicating higher pathogenic likelihood.

Sequence-based tools listed: SIFT, PolyPhen-2, Mutation Assessor, FATHMM
SIFT damaging threshold: ≤ 0.05
Mutation Assessor deleterious FI threshold noted: > 2.0
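
The thresholds listed above translate directly into simple predicates. The sketch below encodes only the two cutoffs the summary states; treating "possibly damaging" as a damaging PolyPhen-2 call is an assumption, and no FATHMM cutoff is encoded because none is given here. The function names are hypothetical.

```python
SIFT_DAMAGING_MAX = 0.05        # from the article: score <= 0.05 is damaging
MUTATION_ASSESSOR_FI_MIN = 2.0  # from the article: FI > 2.0 is deleterious


def sift_is_damaging(score: float) -> bool:
    """SIFT call using the stated threshold (inclusive)."""
    return score <= SIFT_DAMAGING_MAX


def mutation_assessor_is_deleterious(fi_score: float) -> bool:
    """Mutation Assessor call using the stated threshold (exclusive)."""
    return fi_score > MUTATION_ASSESSOR_FI_MIN


def polyphen2_is_damaging(label: str) -> bool:
    """Assumption: both 'possibly' and 'probably damaging' count as damaging."""
    return label.strip().lower() in {"possibly damaging", "probably damaging"}
```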

2.3. Structure-Based Stability Prediction

MAESTROweb, mCSM, DynaMut2, and PremPS were used to predict mutation-induced stability changes. These tools estimate changes in Gibbs free energy (ΔΔG) and, in the case of DynaMut2, changes in vibrational entropy and conformational flexibility.

The paper explains that these methods help identify destabilizing substitutions and mutation hotspots by combining structural and evolutionary features with machine-learning or graph-based signatures.

Structure-based tools listed: MAESTROweb, mCSM, DynaMut2, PremPS
Negative ΔΔG is described in the methods as destabilizing for MAESTROweb and mCSM
DynaMut2 also evaluates ΔΔS / flexibility changes
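
A small sketch of the sign convention described above, under which a negative ΔΔG marks a destabilizing substitution. Stability servers do not all report ΔΔG with the same sign, so each tool's output should be checked against its own documentation before applying this rule. The tool names in the usage note carry hypothetical values, not results from the article.

```python
from typing import Dict


def is_destabilizing(ddg: float) -> bool:
    """Apply the convention stated in the methods: negative ddG destabilizes."""
    return ddg < 0.0


def count_destabilizing_calls(ddg_by_tool: Dict[str, float]) -> int:
    """Number of tools whose ddG value is destabilizing under that convention."""
    return sum(1 for value in ddg_by_tool.values() if is_destabilizing(value))
```

For example, `count_destabilizing_calls({"tool_a": -1.2, "tool_b": 0.3, "tool_c": -0.4})` counts two destabilizing calls for one illustrative variant.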

2.4. Pathogenicity Prediction Tools

MutPred2 was used as a machine-learning-based pathogenicity predictor that also suggests possible molecular consequences such as altered secondary structure or interaction changes. SNPs&GO and PhD-SNP were used to classify disease-associated versus neutral variants using sequence, Gene Ontology, and support-vector-machine approaches.

Pathogenicity tools listed: MutPred2, SNPs&GO, PhD-SNP
MutPred2 high-confidence pathogenicity threshold mentioned: > 0.589
PhD-SNP disease threshold mentioned: > 0.5
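
The two numeric cutoffs above can be combined with the SNPs&GO class label into one consensus call. This is a sketch under stated assumptions: the article does not say how the three predictors were combined, so the unanimous rule below is illustrative, and the "Disease" label string for SNPs&GO is assumed.

```python
MUTPRED2_MIN = 0.589  # threshold mentioned in the article
PHD_SNP_MIN = 0.5     # threshold mentioned in the article


def mutpred2_is_pathogenic(score: float) -> bool:
    return score > MUTPRED2_MIN


def phd_snp_is_disease(score: float) -> bool:
    return score > PHD_SNP_MIN


def consensus_pathogenic(mutpred2: float, phd_snp: float, snps_go_label: str) -> bool:
    """Assumed rule: a variant is pathogenic only if all three tools agree."""
    return (
        mutpred2_is_pathogenic(mutpred2)
        and phd_snp_is_disease(phd_snp)
        and snps_go_label.strip().lower() == "disease"
    )
```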

2.5. Evolutionary Conservation Analysis

ConSurf was used to evaluate evolutionary conservation of amino acid residues using multiple sequence alignment and phylogenetic analysis. The methods describe scores from 1 for variable residues to 9 for highly conserved residues and note that disease-associated substitutions often occur at highly conserved positions.

Conservation tool: ConSurf
ConSurf score range described: 1 to 9
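
Filtering residues by conservation grade can be sketched as follows. The 1 to 9 scale comes from the methods; the cutoff of 8 for "highly conserved" is an assumption, since the summary does not state which grades the authors used, and the grades in the usage note are made up for illustration.

```python
from typing import Dict, List


def is_highly_conserved(grade: int, cutoff: int = 8) -> bool:
    """True when a ConSurf grade (1 = variable, 9 = conserved) meets the cutoff."""
    if not 1 <= grade <= 9:
        raise ValueError(f"ConSurf grade out of range: {grade}")
    return grade >= cutoff


def conserved_positions(grades: Dict[int, int], cutoff: int = 8) -> List[int]:
    """Residue positions whose grade meets the (assumed) cutoff, sorted."""
    return sorted(pos for pos, g in grades.items() if is_highly_conserved(g, cutoff))
```

With hypothetical grades `{10: 9, 11: 3, 12: 8}`, `conserved_positions` keeps positions 10 and 12.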

2.6. Aggregation Propensity Analysis

SODA was used to predict effects of mutations on solubility, disorder, secondary structure, and aggregation propensity. Arpeggio was used to analyze interatomic interactions such as hydrogen bonds, van der Waals contacts, hydrophobic interactions, and ionic interactions in protein structures.

Aggregation tool: SODA
Interaction-analysis tool: Arpeggio

4. Results and Discussion

The article combines results and discussion into a single major section. It evaluates ARSB variants stepwise using sequence, structure, disease-prediction, conservation, aggregation, interaction, and molecular-dynamics analyses to explain how selected mutations destabilize the protein and may contribute to MPS VI pathophysiology.

3.1. Deleterious Mutation Identification of nsSNPs

The paper states that sequence-based evaluation was carried out on ARSB nsSNPs using online tools including SIFT, PolyPhen2, PROVEAN, Mutation Assessor, and FATHMM, while structure-based analysis used PremPS, MAESTROweb, DynaMut2, and mCSM on variants located within the structurally resolved region.

The narrative reports that the sequence-based tools flagged 302, 264, 238, and 167 substitutions as deleterious, depending on the tool, as shown graphically in Figure 2 on page 4. Structure-based predictions identified destabilizing substitutions with counts 132, 130, 64, and 130 across mCSM, DynaMut2, MAESTROweb, and PremPS, as summarized in Figure 3 on page 4.

After intersecting sequence-based and structure-based outputs, the article says that 44 amino acid changes were identified as harmful and destabilizing candidates for further investigation.

Sequence-based tools named in results: SIFT, PolyPhen2, PROVEAN, Mutation Assessor, FATHMM
Figure 2 on page 4 visualizes sequence-based deleterious mutation counts
Figure 3 on page 4 visualizes structure-based destabilizing versus stabilizing counts
Final harmful/destabilizing set highlighted in this subsection: 44 mutations
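
Intersecting per-tool outputs, as described above, is a set operation. The sketch below is one plausible way to do it, not the authors' script: `consensus_variants` is a hypothetical helper, and the variant labels in the usage note are placeholders rather than ARSB results.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def consensus_variants(
    calls_by_tool: Dict[str, Iterable[str]],
    min_tools: Optional[int] = None,
) -> Set[str]:
    """Variants flagged by at least `min_tools` tools (default: every tool)."""
    if min_tools is None:
        min_tools = len(calls_by_tool)
    # set(calls) guards against a tool listing the same variant twice.
    counts = Counter(v for calls in calls_by_tool.values() for v in set(calls))
    return {v for v, n in counts.items() if n >= min_tools}
```

For placeholder calls `{"seq_tool": ["v1", "v2", "v3"], "struct_tool": ["v2", "v3"], "other": ["v3"]}`, the default returns only `v3`, while `min_tools=2` also admits `v2`. Relaxing `min_tools` is one way to explore how sensitive a final shortlist is to a single dissenting tool.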

3.2. Identification of Pathogenic Mutations Using a Computational Approach

The disease-phenotype step used PhD-SNP, SNPs&GO, and MutPred2. The text says that among 57 high-confidence variants identified through the earlier sequence and structure workflow, 44 were classified as pathogenic after disease-phenotype assessment.

The article explicitly lists the 44 pathogenic mutations as A237D, C91R, E323K, E483D, G149R, G302R, G324V, G56D, G64R, H393P, I296N, I67N, K145E, L129P, L236P, L360P, L498P, L51P, L72P, L72R, L82P, L82R, L90P, L98P, L98R, P93R, R315P, R315Q, R327G, R327Q, R388T, T92K, V277G, V80G, W146R, W146S, W353R, W438G, Y138C, Y175D, Y210C, Y266S, Y86C, and Y86N.

Table 3 on page 8 provides per-variant disease predictions from PhD-SNP, SNPs&GO, and MutPred2 and also includes a final remark column.

High-confidence variants before phenotype classification: 57
Pathogenic variants after phenotype classification: 44
Table 3 location: page 8

3.3. Analysis of Conserved Residue

ConSurf analysis was used to assess evolutionary conservation across the ARSB protein. According to the text, 12 of the final mutations were considered especially disease-causing because they fell within highly conserved regions.

The 12 highlighted conserved-region variants are A237D, G56D, G64R, L51P, I67N, L236P, W353R, V277G, T92K, R315Q, R315P, and H393P. Figure 5 on page 5 presents the conserved-residue mapping across the ARSB region.

Highly conserved disease-linked set highlighted: 12 mutations
Figure 5 location: page 5
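
The variant lists quoted in sections 3.2 and 3.3 can be checked mechanically. The sketch below parses the one-letter mutation codes, confirms the list sizes, and checks that the conserved-region set is drawn from the pathogenic set. The lists are copied from this summary; the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# The 44 variants listed in section 3.2 of this summary.
PATHOGENIC_44 = (
    "A237D C91R E323K E483D G149R G302R G324V G56D G64R H393P I296N I67N "
    "K145E L129P L236P L360P L498P L51P L72P L72R L82P L82R L90P L98P L98R "
    "P93R R315P R315Q R327G R327Q R388T T92K V277G V80G W146R W146S W353R "
    "W438G Y138C Y175D Y210C Y266S Y86C Y86N"
).split()

# The 12 conserved-region variants listed in section 3.3.
CONSERVED_12 = "A237D G56D G64R L51P I67N L236P W353R V277G T92K R315Q R315P H393P".split()

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> Tuple[str, int, str]:
    """Split a code such as 'A237D' into (wild type, position, mutant)."""
    match = MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code}")
    wild, pos, mutant = match.groups()
    return wild, int(pos), mutant


def recurrent_positions(codes: List[str]) -> List[int]:
    """Positions that carry more than one distinct substitution."""
    counts = Counter(parse_mutation(c)[1] for c in set(codes))
    return sorted(pos for pos, n in counts.items() if n > 1)
```

Run against the lists above, the checks confirm that 44 distinct variants are named and that all 12 conserved-region variants belong to that set.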

3.4. Analysis of Aggregation Propensity

SODA was used to estimate effects of selected mutations on solubility and aggregation. Table 1 on page 5 reports values for 12 variants and marks A237D and W353R as less soluble, while the remaining listed variants are marked as more soluble or unchanged.

The text surrounding Table 1 contains multiple count statements, including that 12 nsSNPs decrease protein solubility, that all three solubility-reducing alterations are concerning, and that 29 nsSNPs enhanced amino-acid dissolution. These statements do not fully align with the table itself, so they are reported here as stated rather than normalized.

Figure 6 on page 5 shows mutant structural snapshots for A237D and W353R and is used in the discussion to explain how altered noncovalent interactions may contribute to misfolding, loss of function, or aggregation.

Table 1 less-soluble variants explicitly listed: A237D and W353R
Table 1 more-soluble / non-negative entries listed: G56D, G64R, H393P, I67N, L236P, L51P, R315P, R315Q, T92K, V277G
Figure 6 location: page 5
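
A quick consistency check on the Table 1 groupings listed above: the two solubility groups should be disjoint and together cover the 12 conserved-region variants from section 3.3. The sets are copied from this summary; `is_partition` is a hypothetical helper.

```python
from typing import Iterable, Set

LESS_SOLUBLE = {"A237D", "W353R"}
MORE_SOLUBLE_OR_UNCHANGED = {
    "G56D", "G64R", "H393P", "I67N", "L236P",
    "L51P", "R315P", "R315Q", "T92K", "V277G",
}
CONSERVED_12 = {
    "A237D", "G56D", "G64R", "L51P", "I67N", "L236P",
    "W353R", "V277G", "T92K", "R315Q", "R315P", "H393P",
}


def is_partition(groups: Iterable[Set[str]], universe: Set[str]) -> bool:
    """True if the groups are pairwise disjoint and their union is `universe`."""
    seen: Set[str] = set()
    for group in groups:
        if seen & group:
            return False
        seen |= group
    return seen == universe
```

`is_partition([LESS_SOLUBLE, MORE_SOLUBLE_OR_UNCHANGED], CONSERVED_12)` holds for the lists as given, so the table entries agree with the conserved-residue set even though the surrounding count statements do not.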

3.4–3.5. Structural Interaction Interpretation

Table 2 on page 6 compares noncovalent interactions in wild-type ARSB versus A237D and W353R mutants. For wild type, the table lists 71 proximal contacts plus 3 van der Waals clashes for a total of 74. A237D is reported with 1 van der Waals contact, 11 van der Waals clashes, and 119 proximal contacts for a total of 131, while W353R is reported with 4 van der Waals contacts, 7 van der Waals clashes, and 119 proximal contacts for a total of 130.

The text argues that A237D increases local contacts and structural strain but also gains polar, hydrogen-bond, and ionic interactions, potentially preserving some integrity. By contrast, W353R is described as locally destabilizing through altered aromatic-region interactions and reduced hydrophobic organization.

Figure 7 on page 6 presents a conceptual disease model in which ARSB mutation leads to accumulation of sulfated glycosaminoglycans in lysosomes and contributes to lysosomal disorder pathogenesis.

Table 2 location: page 6
Wild-type total contact count in Table 2: 74
A237D total contact count in Table 2: 131
W353R total contact count in Table 2: 130
Figure 7 location: page 6
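
The Table 2 totals can be re-derived from the component counts quoted above. The sketch records only the categories this summary reports for each structure and compares each sum with the reported total; the dictionary keys are hypothetical labels.

```python
from typing import Dict

# Component counts as quoted in this summary of Table 2.
TABLE2_COMPONENTS: Dict[str, Dict[str, int]] = {
    "wild type": {"proximal": 71, "vdw_clash": 3},
    "A237D": {"vdw": 1, "vdw_clash": 11, "proximal": 119},
    "W353R": {"vdw": 4, "vdw_clash": 7, "proximal": 119},
}
REPORTED_TOTALS: Dict[str, int] = {"wild type": 74, "A237D": 131, "W353R": 130}


def total_contacts(components: Dict[str, int]) -> int:
    """Sum of all interaction categories for one structure."""
    return sum(components.values())


def total_mismatches() -> Dict[str, int]:
    """Structures whose summed components differ from the reported total."""
    return {
        name: total_contacts(parts)
        for name, parts in TABLE2_COMPONENTS.items()
        if total_contacts(parts) != REPORTED_TOTALS[name]
    }
```

An empty result from `total_mismatches()` means the component counts and totals agree for all three structures.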

3.6. Molecular Dynamics Simulation Analysis

Selected mutant variants and wild-type ARSB were subjected to molecular dynamics simulation to validate structural consequences of high-confidence deleterious substitutions. The analysis included RMSD, RMSF, radius of gyration (Rg), SASA, and hydrogen-bond evaluations.

Compared with wild type, mutant proteins are described as showing delayed equilibration, higher RMSD, greater flexibility in conserved and functional regions, increased Rg and SASA, reduced compactness, and greater solvent exposure. The authors interpret these patterns as consistent with unfolding tendency and increased aggregation propensity.

Figure 8 on page 7 presents radius-of-gyration analysis, while Figure 9 on page 7 presents RMSD behavior and conformational deviation during the simulation.

MD metrics named: RMSD, RMSF, Rg, SASA, hydrogen bonds
Figure 8 location: page 7
Figure 9 location: page 7
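
Two of the metrics named above, RMSD and radius of gyration, have compact definitions that a short sketch can make concrete. The code assumes frames that are already superimposed on the reference (no fitting is performed) and uses unit masses unless others are supplied. The summary does not name the simulation package, so this is a plain-Python illustration rather than the authors' analysis pipeline.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(reference, frame)
    )
    return math.sqrt(squared / len(reference))


def radius_of_gyration(
    coords: Sequence[Coord], masses: Optional[Sequence[float]] = None
) -> float:
    """Mass-weighted RMS distance of atoms from their center of mass."""
    if not coords:
        raise ValueError("no coordinates given")
    weights: List[float] = list(masses) if masses is not None else [1.0] * len(coords)
    total = sum(weights)
    center = [sum(w * c[i] for w, c in zip(weights, coords)) / total for i in range(3)]
    squared = sum(
        w * sum((c[i] - center[i]) ** 2 for i in range(3))
        for w, c in zip(weights, coords)
    )
    return math.sqrt(squared / total)
```

Higher RMSD relative to the starting structure and a larger radius of gyration are the patterns the authors read as reduced compactness in the mutants.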

5. Conclusion

The conclusion states that SNPs are among the most common genetic variants linked to human disease and argues that computational mutational analysis can reveal mechanisms underlying MPS VI pathogenesis.

According to the conclusion text, 139 of 429 mutations were harmful and destabilizing by sequence- and structure-based study, 44 variants were harmful after pathogenicity assessment, and two final mutations remained as likely disease-driving factors after ConSurf and aggregation analysis.

The article presents the work as a foundation for future targeted therapy development and as evidence for the value of advanced computational analysis in understanding mutation-driven molecular pathology in ARSB.

Conclusion-level harmful/destabilizing count reported: 139 of 429
Conclusion-level final disease-driving mutations after conservation/aggregation narrowing: 2
Therapeutic implication: identified mutations may guide targeted treatment strategies

6. Statements and Declarations

The article includes end-matter sections for conflicts of interest, funding, data availability, and declaration of AI-tool use.

Conflicts of Interest

The authors declare no conflict of interest.

Funding

This work received no funding.

Data availability statement

All data generated or analyzed during this study are included in this manuscript.

Declaration on the Use of AI Tools

The authors declare that ChatGPT (OpenAI) was used solely to refine the language, improve grammar, and enhance the clarity of the manuscript.

Figures

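The PDF includes nine numbered figures covering the computational workflow, sequence- and structure-based prediction summaries, predicted pathogenic mutations, conserved-residue mapping, mutant structural views, a conceptual disease model, and molecular dynamics results.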

Figure 1 · p. 2
Figure 1. Diagram of the computational methods used to forecast pathogenicity of ARSB mutations across sequence-based, structure-based, disease-prediction, and sequence-aggregation analyses.

Figure 2 · p. 4
Figure 2. Sequence-based deleterious mutation counts for ARSB shown as a computational summary graph.

Figure 3 · p. 4
Figure 3. Structure-based deleterious mutation counts for ARSB, comparing destabilizing and stabilizing predictions across tools.

Figure 4 · p. 4
Figure 4. Pathogenic mutations predicted in the ARSB gene using structure-based tools and disease-prediction outputs.

Figure 5 · p. 5

Figure 5. Conserved residue map of the ARSB region used for ConSurf-based conservation interpretation.

Figure 6 · p. 5
Figure 6. Mutant structural views for A237D and W353R illustrating local structural defects and altered interaction environments.