Machine Learning Approach to Assign Protein Secondary Structure Elements from Cα Trace

Mohammad Al Sallal, Wei Chen, Kamal Al Nasr

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Secondary structure elements in protein molecules are local sub-conformational regions stabilized by hydrogen bonding. They can be divided into three types: helix, sheet, and loop. Secondary structure elements bolster the folding and topology of the protein, and they are important for modern structural bioinformatics tasks such as protein modeling and functional analysis. Assigning the types of secondary structure in proteins is therefore crucial. Many methods have been developed to address the problem, and they fall into two approaches. One approach uses information about hydrogen bonding and energy, while the other uses the geometry of the protein trace. If the information for some atoms is missing, the second approach is more feasible. In this paper, we develop a machine learning method that belongs to the second approach to assign secondary structure elements. We develop a 3-state machine learning classifier that uses only the protein's Cα information. The classifier is an ensemble of four machine learning models: Random Forest, Support Vector Machine, Multilayer Perceptron, and eXtreme Gradient Boosting. The classifier is trained on 600K amino acids. We tested our classifier on two different data sets. On the first, which contains 150K amino acids, the accuracy of our system was 94.6%. In addition, the classifier was tested on a set of 20 protein structures and compared with PCASSO, a method from the same category, using the information from the Protein Data Bank (PDB) as a reference. The comparison shows that our method produces assignments that are more closely aligned with the PDB, at 93% accuracy, while PCASSO achieved 84% accuracy.
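The abstract does not list the geometric features or the rule used to combine the four models, so the following is only a minimal sketch of what a Cα-only pipeline could look like, not the authors' implementation. It assumes inter-Cα distances and a pseudo-dihedral angle as per-residue features, a simple majority vote as the ensemble rule, and the labels H, E, and L for helix, sheet, and loop. All function names and feature keys are hypothetical.

```python
import math
from collections import Counter


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pseudo_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four consecutive Ca atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def ca_features(coords, i):
    """Assumed feature set for residue i, built from Ca coordinates only.

    Returns None when the window i..i+4 runs past the end of the chain.
    """
    if i < 0 or i + 4 >= len(coords):
        return None
    return {
        "d_i_i2": math.dist(coords[i], coords[i + 2]),
        "d_i_i3": math.dist(coords[i], coords[i + 3]),
        "d_i_i4": math.dist(coords[i], coords[i + 4]),
        "dihedral": pseudo_dihedral(*coords[i:i + 4]),
    }


def majority_vote(per_model_labels):
    """Combine per-residue labels from several models by majority vote.

    Assumption: ties go to the label voted first, i.e. to the earliest
    model in the list that voted for one of the tied labels.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_model_labels)]


def accuracy(predicted, reference):
    """Fraction of residues whose predicted label matches the reference."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)
```

In use, each of the four trained models would supply one label per residue, `majority_vote` would merge them into a single 3-state assignment, and `accuracy` would compare that assignment with a reference such as the annotation stored in the PDB.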

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Editors: Taesung Park, Young-Rae Cho, Xiaohua Tony Hu, Illhoi Yoo, Hyun Goo Woo, Jianxin Wang, Julio Facelli, Seungyoon Nam, Mingon Kang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 35-41
Number of pages: 7
ISBN (Electronic): 9781728162157
State: Published - Dec 16 2020
Externally published: Yes
Event: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020 - Virtual, Seoul, Korea, Republic of
Duration: Dec 16 2020 to Dec 19 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

Conference

Conference: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Country/Territory: Korea, Republic of
City: Virtual, Seoul
Period: 12/16/20 to 12/19/20

Keywords

  • Cα backbone
  • chain trace
  • protein
  • protein modeling
  • secondary structure assignment
  • secondary structure prediction

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems and Management
  • Medicine (miscellaneous)
  • Health Informatics
