Towards Standardizing and Improving Classification of Bug-Fix Commits

Sarim Zafar; Muhammad Zubair Malik; Gursimran Singh Walia

doi:10.1109/ESEM.2019.8870174

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Scopus citations

Abstract

Background: Open source software repositories such as GitHub are mined to gain empirical software engineering insights and to answer critical research questions. However, state-of-the-art mining approaches suffer from a high error rate in labeling the data used for such analyses. This is particularly true when labels are generated automatically from the commit message, and it seriously undermines the results of these studies.

Aim: Our goal is to label commit messages automatically and with high accuracy. In this work, we focus on classifying whether or not a commit is a 'Bug-Fix commit'.

Method: Traditionally, researchers have used keyword-based approaches to identify bug-fix commits, which leads to a significant increase in the error rate. We present an alternative methodology that leverages a deep neural network model, Bidirectional Encoder Representations from Transformers (BERT), which can understand the context of the commit message. We provide rules for the semantic interpretation of commit messages. We construct a hand-labeled dataset of real GitHub commits according to these rules and fine-tune BERT for classification.

Results: Our initial evaluation shows that our approach significantly reduces the error rate, with up to a 10% relative improvement in classification over keyword-based approaches.

Future Direction: We plan to extend our dataset to cover more corner cases and to reduce programming-language-specific biases. We also plan to refine the semantic rules. In this work, we have considered only a simple binary classification problem (Bug-Fix or not); we plan to extend the approach to other classes and to multiclass problems.

Conclusion: The rules, data, and model proposed in this paper can help those analyzing open source repositories improve the labeling of the data used in their analyses.
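The keyword-based baseline that the abstract contrasts with BERT can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the keyword list, the word-prefix matching rule, and the function name is_bug_fix_keyword are all assumptions chosen for the example.

```python
import re

# Hypothetical keyword list chosen for illustration; the paper does not
# specify this exact list.
BUG_FIX_KEYWORDS = ("fix", "bug", "error", "fault", "defect", "issue", "patch")

# Match each keyword at the start of a word, case-insensitively, so that
# "Fixes", "fixed", and "errors" also match (an assumed matching rule).
_KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(BUG_FIX_KEYWORDS) + r")", re.IGNORECASE
)


def is_bug_fix_keyword(message: str) -> bool:
    """Return True if the commit message contains any bug-fix keyword."""
    return _KEYWORD_PATTERN.search(message) is not None


if __name__ == "__main__":
    examples = [
        "Fix null pointer dereference in parser",     # bug fix, flagged
        "Rename ErrorHandler class for consistency",  # refactoring, still flagged
        "Add issue template for new contributors",    # not a fix, still flagged
        "Handle empty input without crashing",        # likely a fix, missed
    ]
    for msg in examples:
        print(f"{is_bug_fix_keyword(msg)!s:5}  {msg}")
```

The last three messages illustrate the failure modes the abstract attributes to keyword matching: the rule ignores context, so it flags a refactoring and a template change, and it misses a likely fix that is phrased without any listed keyword. A context-aware classifier such as a fine-tuned BERT model is intended to reduce exactly these errors.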

Original language	English (US)
Title of host publication	Proceedings - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019
Publisher	IEEE Computer Society
ISBN (Electronic)	9781728129686
DOIs	https://doi.org/10.1109/ESEM.2019.8870174
State	Published - Sep 2019
Externally published	Yes
Event	13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019 - Porto de Galinhas, Pernambuco, Brazil
Duration	Sep 19 2019 → Sep 20 2019

Publication series

Name	International Symposium on Empirical Software Engineering and Measurement
Volume	2019-Septemer
ISSN (Print)	1949-3770
ISSN (Electronic)	1949-3789

Conference

Conference	13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019
Country/Territory	Brazil
City	Porto de Galinhas, Pernambuco
Period	9/19/19 → 9/20/19

Keywords

Human Factors
Mining Software Repositories
Predictive Models
Software Maintenance

ASJC Scopus subject areas

Computer Science Applications
Software

Cite this

Zafar, S., Malik, M. Z., & Walia, G. S. (2019). Towards Standardizing and Improving Classification of Bug-Fix Commits. In Proceedings - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019 Article 8870174 (International Symposium on Empirical Software Engineering and Measurement; Vol. 2019-Septemer). IEEE Computer Society. https://doi.org/10.1109/ESEM.2019.8870174

Towards Standardizing and Improving Classification of Bug-Fix Commits. / Zafar, Sarim; Malik, Muhammad Zubair; Walia, Gursimran Singh.
Proceedings - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019. IEEE Computer Society, 2019. 8870174 (International Symposium on Empirical Software Engineering and Measurement; Vol. 2019-Septemer).

Zafar, S, Malik, MZ & Walia, GS 2019, Towards Standardizing and Improving Classification of Bug-Fix Commits. in Proceedings - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019., 8870174, International Symposium on Empirical Software Engineering and Measurement, vol. 2019-Septemer, IEEE Computer Society, 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019, Porto de Galinhas, Pernambuco, Brazil, 9/19/19. https://doi.org/10.1109/ESEM.2019.8870174

@inproceedings{c2e06cc717d741f19f4045471a33f3d1,
  title = "Towards Standardizing and Improving Classification of Bug-Fix Commits",
  keywords = "Human Factors, Mining Software Repositories, Predictive Models, Software Maintenance",
  author = "Zafar, Sarim and Malik, {Muhammad Zubair} and Walia, {Gursimran Singh}",
  note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019 ; Conference date: 19-09-2019 Through 20-09-2019",
  year = "2019",
  month = sep,
  doi = "10.1109/ESEM.2019.8870174",
  language = "English (US)",
  series = "International Symposium on Empirical Software Engineering and Measurement",
  publisher = "IEEE Computer Society",
  booktitle = "Proceedings - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019"
}

TY  - GEN
T1  - Towards Standardizing and Improving Classification of Bug-Fix Commits
AU  - Zafar, Sarim
AU  - Malik, Muhammad Zubair
AU  - Walia, Gursimran Singh
PY  - 2019/9
Y1  - 2019/9
KW  - Human Factors
KW  - Mining Software Repositories
KW  - Predictive Models
KW  - Software Maintenance
UR  - http://www.scopus.com/inward/record.url?scp=85074293829&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85074293829&partnerID=8YFLogxK
U2  - 10.1109/ESEM.2019.8870174
DO  - 10.1109/ESEM.2019.8870174
M3  - Conference contribution
AN  - SCOPUS:85074293829
T3  - International Symposium on Empirical Software Engineering and Measurement
BT  - Proceedings - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019
PB  - IEEE Computer Society
T2  - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019
Y2  - 19 September 2019 through 20 September 2019
ER  -