TY - GEN
T1 - Subspace Modeling for Classification of Protein Secondary Structure Elements from Cα Trace
AU - Sekmen, Ali
AU - Al Nasr, Kamal
AU - Jones, Christopher
N1 - Funding Information:
Ali Sekmen’s research is supported by DOD grant W911NF-20-100284. Kamal Al Nasr’s research is supported by NIH Academic Research Enhancement Award (R15 AREA: 1R15GM126509 01).
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - This paper presents a novel subspace segmentation algorithm that models protein Cα traces of secondary structure elements (SSEs) as a union of subspaces. For each Cα, a set of general geometric features is considered. The algorithm first identifies the most relevant features for each SSE using a new matrix rank estimation technique and combinatorics. It then groups Cα traces in a sliding window so that each group represents a data point in a high-dimensional ambient space, and a lower-dimensional subspace is matched to each SSE. When a group of unknown Cα traces is presented, the algorithm determines a neighborhood around each Cα and then uses two approaches to classify it. In the first approach, the Cα is represented as a data point in the ambient space and its distance to each SSE subspace is calculated. In the second approach, a local subspace is matched to the Cα, and the separation of this local subspace from each SSE subspace is computed using the geodesic distance on the Grassmannian manifold of the subspaces. The minimum point-to-subspace distance and the minimum separation of subspaces are used to classify the Cα. This geometric and mathematical approach has been applied to a large protein dataset and achieved an 85% classification rate without the need to train a large machine learning system.
KW - Cα backbone
KW - protein modeling
KW - secondary structure classification
KW - subspace segmentation
UR - http://www.scopus.com/inward/record.url?scp=85125175610&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125175610&partnerID=8YFLogxK
U2 - 10.1109/BIBM52615.2021.9669762
DO - 10.1109/BIBM52615.2021.9669762
M3 - Conference contribution
AN - SCOPUS:85125175610
T3 - Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
SP - 72
EP - 79
BT - Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
A2 - Huang, Yufei
A2 - Kurgan, Lukasz
A2 - Luo, Feng
A2 - Hu, Xiaohua Tony
A2 - Chen, Yidong
A2 - Dougherty, Edward
A2 - Kloczkowski, Andrzej
A2 - Li, Yaohang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Y2 - 9 December 2021 through 12 December 2021
ER -