TY - JOUR
T1 - A graph proximity feature augmentation approach for identifying accounts of terrorists on Twitter
AU - Aleroud, Ahmed
AU - Abu-Alsheeh, Nisreen
AU - Al-Shawakfa, Emad
N1 - Funding Information:
Ahmed AlEroud is a post-doctoral researcher at the University of Maryland, Baltimore County. He holds degrees in Information Systems (Ph.D. and M.S.) from the University of Maryland, Baltimore County, and in Software Engineering (B.S.) from Hashemite University in Jordan. He was a Visiting Associate Research Scientist at the University of Maryland, Baltimore County, working on cyber security research projects. His research focuses on cyber security, privacy-preserving network data analytics, detection of social engineering attacks, and identification of radical content on the web. He is a certified Cyber Security Trainer at KADD Cyber Security Military Academy in Jordan. He has published articles in journals such as Computers and Security, Information Systems Frontiers, IEEE Transactions, and Knowledge and Information Systems. His research has been sponsored by MITRE, European IP Networks (RIPE), the State of Maryland, and the Higher Council for Science and Technology in Jordan.
Publisher Copyright:
© 2020
PY - 2020/12
Y1 - 2020/12
N2 - With the popularity of social networks, terrorist groups such as ISIS have encouraged others to follow their activities, share their ideas, recruit fans, radicalize communities, and raise funds to support future attacks. This has led to the emergence of radicalized online accounts that belong to terrorists or their fans. Existing counter-terrorism investigation techniques that aim to suspend such accounts rely on user reports or on syntactic sentiment analysis techniques, which are not accurate on short texts such as tweets shared by terrorists. This work proposes a feature augmentation approach that enriches the content of tweets before they are examined for radicalized content. The augmented tweets are then used to classify accounts into Pro-ISIS or Anti-ISIS categories. We use topic modeling as a baseline method for feature augmentation, and we study how using tweets from different time intervals affects the quality of the generated models that classify tweets and the corresponding accounts. We then introduce a novel feature augmentation approach based on Neighborhood Overlap, a graph proximity technique that discovers terms having a strong relationship with the Pro-ISIS category. Terms extracted from tweets are represented as nodes in a graph, which is then partitioned into clusters of terms. Terms in strongly connected parts of each cluster are added to the original term vectors of the tweets based on the similarity between those terms and each tweet. We compare our approach with baseline augmentation techniques such as Term-to-Term correlation and Topic Modeling, as well as other existing techniques. Experimental results on a dataset containing Pro- and Anti-ISIS tweets show that our approach is promising for automating the identification of terrorist content online.
The results show that using graph proximity measures such as Neighborhood Overlap for term augmentation yields higher Precision, Recall, and F-measure than the typical approaches. In addition, we found that applying time-based analysis with term augmentation to identify radicalized accounts improved the Precision of the investigation process.
KW - Feature augmentation
KW - Graph Neighborhood
KW - Latent Dirichlet allocation (LDA)
KW - Social network analysis
KW - Temporal analysis
KW - Terrorism Informatics
UR - http://www.scopus.com/inward/record.url?scp=85092238247&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092238247&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2020.102056
DO - 10.1016/j.cose.2020.102056
M3 - Article
AN - SCOPUS:85092238247
SN - 0167-4048
VL - 99
JO - Computers and Security
JF - Computers and Security
M1 - 102056
ER -