Detecting science-based health disinformation: a stylometric machine learning approach

Jason A. Williams; Ahmed Aleroud; Danielle Zimmerman

doi:10.1007/s42001-023-00213-y

Research output: Contribution to journal › Article › peer-review

Abstract

The COVID-19 pandemic showed that misleading scientific health information has become widespread and is challenging to counteract. Some of this disinformation comes from modification of medical research results. This paper investigates how humans create health disinformation through controlled changes of text from abstracts of peer-reviewed COVID-19 research papers. We also developed a machine learning model that used statement embeddings, readability, and text quality features to create datasets that contain falsified scientific statements. We then created machine learning classification models to identify statements containing disinformation. Our results reveal the importance of readability metrics and information quality features in identifying which statements were falsified. We show that text embeddings and semantic similarity do not yield a high detection rate of true/falsified statements compared to using information quality and readability features.

Original language	English (US)
Pages (from-to)	817-843
Number of pages	27
Journal	Journal of Computational Social Science
Volume	6
Issue number	2
DOIs	https://doi.org/10.1007/s42001-023-00213-y
State	Published - Oct 2023

Keywords

COVID-19
Health disinformation
Human behavior
Machine learning
Science

ASJC Scopus subject areas

Transportation
Artificial Intelligence

Cite this

@article{cef6634cc27443df871616991fc291cb,
  title = "Detecting science-based health disinformation: a stylometric machine learning approach",
  abstract = "The COVID-19 pandemic showed that misleading scientific health information has become widespread and is challenging to counteract. Some of this disinformation comes from modification of medical research results. This paper investigates how humans create health disinformation through controlled changes of text from abstracts of peer-reviewed COVID-19 research papers. We also developed a machine learning model that used statement embeddings, readability, and text quality features to create datasets that contain falsified scientific statements. We then created machine learning classification models to identify statements containing disinformation. Our results reveal the importance of readability metrics and information quality features in identifying which statements were falsified. We show that text embeddings and semantic similarity do not yield a high detection rate of true/falsified statements compared to using information quality and readability features.",
  keywords = "COVID-19, Health disinformation, Human behavior, Machine learning, Science",
  author = "Williams, {Jason A.} and Ahmed Aleroud and Danielle Zimmerman",
  note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.",
  year = "2023",
  month = oct,
  doi = "10.1007/s42001-023-00213-y",
  language = "English (US)",
  volume = "6",
  pages = "817--843",
  journal = "Journal of Computational Social Science",
  issn = "2432-2717",
  publisher = "Springer Nature",
  number = "2",
}

TY  - JOUR
T1  - Detecting science-based health disinformation
T2  - a stylometric machine learning approach
AU  - Williams, Jason A.
AU  - Aleroud, Ahmed
AU  - Zimmerman, Danielle
PY  - 2023/10
Y1  - 2023/10
N2  - The COVID-19 pandemic showed that misleading scientific health information has become widespread and is challenging to counteract. Some of this disinformation comes from modification of medical research results. This paper investigates how humans create health disinformation through controlled changes of text from abstracts of peer-reviewed COVID-19 research papers. We also developed a machine learning model that used statement embeddings, readability, and text quality features to create datasets that contain falsified scientific statements. We then created machine learning classification models to identify statements containing disinformation. Our results reveal the importance of readability metrics and information quality features in identifying which statements were falsified. We show that text embeddings and semantic similarity do not yield a high detection rate of true/falsified statements compared to using information quality and readability features.
AB  - The COVID-19 pandemic showed that misleading scientific health information has become widespread and is challenging to counteract. Some of this disinformation comes from modification of medical research results. This paper investigates how humans create health disinformation through controlled changes of text from abstracts of peer-reviewed COVID-19 research papers. We also developed a machine learning model that used statement embeddings, readability, and text quality features to create datasets that contain falsified scientific statements. We then created machine learning classification models to identify statements containing disinformation. Our results reveal the importance of readability metrics and information quality features in identifying which statements were falsified. We show that text embeddings and semantic similarity do not yield a high detection rate of true/falsified statements compared to using information quality and readability features.
KW  - COVID-19
KW  - Health disinformation
KW  - Human behavior
KW  - Machine learning
KW  - Science
UR  - http://www.scopus.com/inward/record.url?scp=85163369334&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85163369334&partnerID=8YFLogxK
U2  - 10.1007/s42001-023-00213-y
DO  - 10.1007/s42001-023-00213-y
M3  - Article
AN  - SCOPUS:85163369334
SN  - 2432-2717
VL  - 6
SP  - 817
EP  - 843
JO  - Journal of Computational Social Science
JF  - Journal of Computational Social Science
IS  - 2
ER  -
