This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Interactive Journal of Medical Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.i-jmr.org/, as well as this copyright and license information must be included.
When people with health conditions begin to manage their health issues, an important question is what exactly they do with the information they have obtained from various sources (eg, news media, social media, health professionals, friends, and family). The information they gather helps form their opinions and, to some degree, influences their attitudes toward managing their condition.
This study aimed to understand how tinnitus is represented in the US newspaper media and in Facebook pages (ie, social media) using text pattern analysis.
This was a cross-sectional study based upon secondary analyses of publicly available data. The 2 datasets (ie, text corpuses) analyzed in this study were generated from US newspaper media during 1980-2017 (downloaded from the database US Major Dailies by ProQuest) and Facebook pages during 2010-2016. The text corpuses were analyzed with the Iramuteq software using cluster analysis and chi-square tests.
The newspaper dataset had 432 articles. The cluster analysis resulted in 5 clusters, which were named as follows: (1) brain stimulation (26.2%), (2) symptoms (13.5%), (3) coping (19.8%), (4) social support (24.2%), and (5) treatment innovation (16.4%). A time series analysis of clusters indicated a change in the pattern of information presented in newspaper media during 1980-2017 (eg, more emphasis on cluster 5, focusing on treatment inventions). The Facebook dataset had 1569 texts. The cluster analysis resulted in 7 clusters, which were named as: (1) diagnosis (21.9%), (2) cause (4.1%), (3) research and development (13.6%), (4) social support (18.8%), (5) challenges (11.1%), (6) symptoms (21.4%), and (7) coping (9.2%). A time series analysis of clusters indicated no change in information presented in Facebook pages on tinnitus during 2011-2016.
The study highlights the specific aspects about tinnitus that the US newspaper media and Facebook pages focus on, as well as how these aspects change over time. These findings can help health care providers better understand the presuppositions that tinnitus patients may have. More importantly, the findings can help public health experts and health communication experts in tailoring health information about tinnitus to promote self-management, as well as assisting in appropriate choices of treatment for those living with tinnitus.
Tinnitus can be defined as the conscious perception of an auditory sensation in the absence of a corresponding external stimulus [
Medical therapy alone is not sufficient to address the distress caused by various chronic symptoms and conditions [
The ubiquitous nature of mass media, in particular news media, makes it one of the most powerful sources of influence on specific issues (eg, climate change, political party), including health issues [
Although the media can help place specific health issues on a local and national agenda, there may be a discrepancy between which issues are covered and their social and economic impact on society. For instance, we can assume that the issues raised more frequently in the media are those that are likely to draw more attention and readership. Moreover, Hartz et al report that there is a growing tension between media reporters, scientists, and health professionals [
There is some literature on how the deaf population is represented in the media [
This study aimed to understand how tinnitus is represented in the US newspaper media and in Facebook pages (ie, social media) using text pattern analysis.
This study used a cross-sectional design based upon secondary analyses of publicly available data. The 2 datasets analyzed in this study were generated from US newspaper media and Facebook pages. The study did not require ethical approval as the data were gathered from publicly available sources. Only the publicly available Facebook pages (not personal pages) were included in the data extraction, thus maintaining the anonymity of the responses, and no personally identifiable information was included. Moreover, no individual dataset was discussed in the paper, again maintaining the anonymity of the data. Considering the minimal or no potential risk to individual participants, no ethical approval was required [
To develop the newspaper media text corpus (ie, large and structured set of texts), we first explored the databases with the newspaper collection available at Lamar University, United States. Major Dailies by ProQuest was the database that had the largest newspaper collection. We then searched for articles related to tinnitus in this database between the years 1980 and 2017 and downloaded the results as a text corpus. A Python script was written to convert the text corpus to the format needed for data analysis and to preserve the metadata (ie, newspaper name, year of publication).
A different Python script was written to extract posts from Facebook pages dealing with tinnitus (during 2010-2016). In total, 20 Facebook pages with more than 100 likes were identified, and the postings were downloaded as a text corpus. It is important to note that the data extraction was limited to the data that were publicly available (ie, newspaper data during 1980-2017 and Facebook posts during 2010-2016).
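The conversion step mentioned above can be illustrated with a minimal sketch. It assumes the target is the Iramuteq corpus layout as we understand it, in which each text is introduced by a line of four asterisks followed by starred variables that carry the metadata. The function name `to_iramuteq`, the variable names `paper` and `year`, and the sample record are hypothetical and are not taken from the scripts used in this study.

```python
import re


def to_iramuteq(articles):
    """Format article records as an Iramuteq-style corpus string.

    Each text starts with a header line of four asterisks followed by
    starred variables (*name_value) that hold the metadata.
    """
    def clean(value):
        # Keep only letters and digits so the value is a single token.
        return re.sub(r"[^A-Za-z0-9]+", "", str(value))

    blocks = []
    for art in articles:
        header = f"**** *paper_{clean(art['newspaper'])} *year_{clean(art['year'])}"
        body = " ".join(art["text"].split())  # collapse line breaks and extra spaces
        blocks.append(f"{header}\n{body}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical record, for illustration only.
sample = [
    {"newspaper": "New York Times", "year": 2005,
     "text": "Tinnitus is a ringing\nin the ears."},
]
corpus = to_iramuteq(sample)
print(corpus)
```

Keeping the newspaper name and the year in the header line is what allows the later analyses to relate clusters to the source and to the year of publication.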
The text corpuses were analyzed using the Iramuteq software [
The text corpus is composed of multiple newspaper articles. The software treats each of these articles as a text (ie, the first unit of analysis). The first step of the analysis is to segment each article (ie, text) into smaller units called text segments (ie, each text is split into multiple text segments based on criteria of size and punctuation). Splitting the texts into segments yields smaller units, which makes it possible to increase the precision of certain analyses, in particular the search for themes within the text corpus. The goal is to create segments of consistent size while trying to maintain the natural segmentation of texts marked by punctuation. The segmentation process is based on a cutoff criterion that weighs the segment size by punctuation. The procedure deviates from the parameterized segment size if a strong punctuation mark (eg, a period, a question mark, or an exclamation mark) is present within a 15% margin of the planned cutoff. In the next step, the text corpus is lemmatized (ie, inflected or variant forms of the same word are grouped together), so that words are reduced to their simplest forms, which are called lemmas. Moreover, the software distinguishes between “full words” (eg, verbs, nouns, adjectives, and adverbs) and “tool words” (eg, pronouns, determiners, and auxiliary verbs such as to be and to have). This distinction is made so that only full words are included in the main analysis. These steps are necessary to convert the large corpus into a more manageable dataset for further analysis. To specifically analyze the text that is closely related to “tinnitus,” the text segments containing this term were extracted and new corpuses were formed. It is important to note that the expression “tinnitus” appeared in each of the extracted text segments. This more focused subcorpus was used for all further analyses.
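The segmentation rule described above can be sketched as follows. This is an illustration under stated assumptions and not the Iramuteq implementation: segment size is counted in words, the default size of 40 is a hypothetical value, and the function name `segment` is ours. The cut is placed at the planned size unless a word ending in strong punctuation lies within the margin, in which case the cut moves to that word.

```python
STRONG = (".", "?", "!")  # strong punctuation marks


def segment(text, size=40, margin=0.15):
    """Split text into segments of about `size` words.

    If a word ending in strong punctuation lies within `margin` of the
    planned cutoff, the cut is moved there; otherwise it is made at `size`.
    """
    words = text.split()
    slack = max(1, round(size * margin))  # margin expressed in words
    segments = []
    start = 0
    while start < len(words):
        planned = start + size  # index just past the planned last word
        if planned >= len(words):
            segments.append(" ".join(words[start:]))
            break
        cut = planned
        # Search outward from the planned cutoff for the nearest punctuation.
        for offset in range(slack + 1):
            for cand in (planned - offset, planned + offset):
                if start < cand <= len(words) and words[cand - 1].endswith(STRONG):
                    cut = cand
                    break
            else:
                continue  # nothing found at this offset, widen the search
            break
        segments.append(" ".join(words[start:cut]))
        start = cut
    return segments


# Toy example: planned size of 5 words, margin of 1 word.
parts = segment("a b c d. e f g h i j", size=5, margin=0.2)
print(parts)
```

In the toy example the first cut is moved one word earlier so that it falls after the period, whereas the second cut is made at the planned size because no strong punctuation lies within the margin.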
This was followed by cluster analysis made with the Reinert method used for textual data analysis [
In this analysis, the program initially builds a binary matrix with text segments in rows and full words in columns, and it then performs hierarchical divisive clustering based on a series of bipartitions made with correspondence analysis. At each step of the classification, the largest remaining cluster is cut into 2 parts by computing the information extracted when cutting after each line of the matrix, ordered along the first factor of the correspondence analysis. The cut that is retained is the one that maximizes the information extracted. In a second step, each line is moved from one cluster to the other. If this move increases the information extracted, it is kept. This step loops until no move increases the information extracted for the whole table. This cluster analysis groups the text segments based on co-occurrence of lemmas. Each cluster aims to be homogeneous (ie, grouping text segments with a common pattern of lemmas). Moreover, the clusters have to be as different from one another as possible (ie, the patterns of lemmas should differ as much as possible between clusters). The results are presented in a dendrogram that characterizes the clustering. For each cluster, the program computes profiles of lemmas that are overrepresented (ie, present in a significantly higher proportion within the cluster than in the rest of the text corpus, based on chi-square analysis). Finally, the same text corpus was subjected to a time series analysis using the metadata. For example, in this corpus, we analyzed how the patterns of clusters change over time (see
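A single bipartition of the kind described above can be sketched with NumPy. This is a simplified illustration and not the Iramuteq code. It assumes that the information extracted is measured by the chi-square statistic of the table obtained by summing the lines of each part, which is our reading of the Reinert method, and that the matrix has no empty rows or columns. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def chi2_two_groups(X, mask):
    """Chi-square of the 2 x n_words table obtained by summing each group."""
    table = np.vstack([X[mask].sum(axis=0), X[~mask].sum(axis=0)])
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    keep = expected > 0
    return float((((table - expected) ** 2)[keep] / expected[keep]).sum())


def first_factor(X):
    """Row coordinates on the first axis of a correspondence analysis."""
    P = X / X.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, 0] * s[0] / np.sqrt(r)


def bipartition(X):
    """Split the lines of X into 2 groups; return (boolean mask, chi-square)."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(first_factor(X))
    n = len(order)
    # Step 1: try a cut after each line along the first factor.
    best_mask, best = None, -1.0
    for k in range(1, n):
        mask = np.zeros(n, dtype=bool)
        mask[order[:k]] = True
        score = chi2_two_groups(X, mask)
        if score > best:
            best_mask, best = mask, score
    # Step 2: move single lines between groups while the chi-square increases.
    improved = True
    while improved:
        improved = False
        for i in range(n):
            trial = best_mask.copy()
            trial[i] = not trial[i]
            if trial.all() or not trial.any():
                continue  # never leave one group empty
            score = chi2_two_groups(X, trial)
            if score > best + 1e-12:
                best_mask, best, improved = trial, score, True
    return best_mask, best


# Toy binary matrix: 6 text segments (rows) by 4 full words (columns).
X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
              [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]])
mask, score = bipartition(X)
print(mask, score)
```

Applying the same function repeatedly to the largest remaining group would give the hierarchical divisive clustering that the dendrogram summarizes.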
Dendrogram (ie, classification of clusters), size of clusters as percentage of the text segments, and overrepresented words in each cluster in tinnitus newspaper corpus. (Note: The words are ordered by chi-square value with words at the bottom having a lower value).
Chronological bar showing proportion of each cluster for each year in tinnitus US newspaper media corpus (Note: Width of the bar is proportional to the number of text segments each year, and the height of the clusters represents the frequency of text segments within clusters).
Chronological bar based on chi-square analysis showing proportion of each cluster for each year in tinnitus US newspaper media corpus (Note: Width of the bar is proportional to the number of text segments each year, and the height of the bar represents the size of clusters).
Dendrogram (ie, classification of clusters), size of clusters as percentage of the text segments, and overrepresented words in each cluster in tinnitus on Facebook pages corpus. (Note: The words are ordered by chi-square value with words at the bottom having a lower value).
Chronological bar showing proportion of each cluster for each year on tinnitus Facebook pages corpus (Note: Width of the bar is proportional to the number of text segments each year, and the height of the clusters represents the frequency of text segments within clusters).
Chronological bar based on chi-square analysis showing the proportion of each cluster for each year on tinnitus Facebook pages corpus (Note: Width of the bar is proportional to the number of text segments each year, and the height of the bar represents the size of clusters).
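The chi-square values used to identify overrepresented words and to order them in the dendrograms can be illustrated with a minimal sketch. It assumes a 2 x 2 table that crosses cluster membership of text segments with the presence of a lemma, tested with 1 degree of freedom (critical value 3.84 at the 5% level). The function names and the counts are hypothetical and are not taken from the study data.

```python
def lemma_chi2(in_with, in_without, out_with, out_without):
    """Chi-square (1 df) for a lemma in a cluster versus the rest of the corpus.

    in_with / in_without: segments inside the cluster with / without the lemma.
    out_with / out_without: segments outside the cluster with / without it.
    """
    a, b, c, d = in_with, in_without, out_with, out_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so no association can be tested
    return n * (a * d - b * c) ** 2 / denom


def is_overrepresented(in_with, in_without, out_with, out_without, critical=3.84):
    """True if the lemma is more frequent inside the cluster and chi-square is significant."""
    share_in = in_with / (in_with + in_without)
    share_out = out_with / (out_with + out_without)
    chi2 = lemma_chi2(in_with, in_without, out_with, out_without)
    return share_in > share_out and chi2 > critical


# Hypothetical counts: the lemma appears in 30 of 100 segments of the
# cluster and in 20 of 400 segments elsewhere.
value = lemma_chi2(30, 70, 20, 380)
flag = is_overrepresented(30, 70, 20, 380)
print(round(value, 2), flag)
```

Sorting the lemmas of a cluster by this value in descending order gives the kind of profile shown in the dendrograms, with the most characteristic words at the top.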
The initial text corpus had 433 texts (ie, each article from a newspaper is considered a text), 9176 text segments (ie, each text is split into multiple text segments based on criteria of size and punctuation), and 309,524 occurrences or tokens (ie, number of words). After extraction of the text segments related to “tinnitus,” the text corpus was reduced to 432 texts, 2173 text segments, and 79,684 occurrences or tokens.
Frequency and percentage of articles containing at least one text-segment related to tinnitus in the subcorpus from different newspapers.
Newspaper | Frequency of articles, n (%) |
Chicago Tribune | 46 (11) |
Farm Weekly | 3 (0.7) |
Investor’s Business Daily | 8 (1.9) |
Journal of Commerce | 1 (0.2) |
Journal Record | 2 (0.5) |
Los Angeles Times | 25 (5.8) |
Marine Corps Times | 1 (0.2) |
Miami Daily Business Review | 1 (0.2) |
Missouri Lawyers Media | 1 (0.2) |
NASDAQ OMXs News Release Distribution Channel | 30 (7.0) |
New York Times | 27 (6.3) |
Roll Call | 2 (0.5) |
Targeted News Service | 57 (13.1) |
The Village Voice | 2 (0.5) |
The Washington Post | 72 (16.7) |
The Weekly Times | 1 (0.2) |
US Fed News Services | 133 (30.8) |
Wall Street Journal | 20 (4.6) |
Total | 432 (100.0) |
Frequency and percentage of articles containing at least one text segment related to tinnitus in the subcorpus, by year of publication.
Year | Frequency of articles, n (%) |
1980 | 1 (0.2) |
1981 | 1 (0.2) |
1985 | 2 (0.5) |
1986 | 2 (0.5) |
1987 | 3 (0.7) |
1988 | 3 (0.7) |
1989 | 3 (0.7) |
1990 | 4 (0.9) |
1991 | 1 (0.2) |
1992 | 2 (0.5) |
1993 | 3 (0.7) |
1994 | 4 (0.9) |
1995 | 3 (0.7) |
1996 | 1 (0.2) |
1997 | 1 (0.2) |
1998 | 10 (2.3) |
1999 | 7 (1.6) |
2000 | 7 (1.6) |
2001 | 3 (0.7) |
2002 | 6 (1.4) |
2003 | 7 (1.6) |
2004 | 14 (3.2) |
2005 | 43 (10.0) |
2006 | 6 (1.4) |
2007 | 21 (4.9) |
2008 | 18 (4.2) |
2009 | 14 (3.2) |
2010 | 24 (5.6) |
2011 | 46 (10.7) |
2012 | 28 (6.5) |
2013 | 20 (4.6) |
2014 | 42 (9.7) |
2015 | 41 (9.5) |
2016 | 33 (7.7) |
2017 | 8 (1.9) |
Total | 432 (100.0) |
Example of a text segment for each cluster in newspaper media text corpus.
Cluster | Example of a text segment |
Cluster 1: |
Past research has |
Cluster 2: |
|
Cluster 3: |
Playing |
Cluster 4: |
A |
Cluster 5: |
As of |
Example of a text segment for each cluster in Facebook pages text corpus.
Cluster | Example of a text segment |
Cluster 1: |
I went to an |
Cluster 2: |
I have temporo mandibular |
Cluster 3: |
|
Cluster 4: |
Greetings, please |
Cluster 5: |
It was |
Cluster 6: |
|
Cluster 7: |
Can a thyroid affect your tinnitus and my thyroid is coming out soon. I am hoping the hissing in my ear will subside a bit. I had to |
The Facebook pages text corpus had 1569 texts, 2747 text segments, and 78,218 occurrences or tokens. Before analysis, we removed URLs as well as texts that contained only a URL.
The media has played a prominent role in the dissemination of health information [
The analysis of the text corpus extracted from the US newspaper media suggested that the information disseminated via newspaper media focuses mainly on 2 elements: (1) new developments in treatments (ie, brain stimulation, treatment innovations) and (2) disease information (ie, symptoms, coping, and social support). It is interesting to note that there is a fairly equal spread of information among all of these elements. Moreover, the analysis of trends over time indicated a change in the patterns of information presented in the newspaper media during 1980-2017. For example, there is an increasing emphasis on information about treatment inventions (cluster 5 in
The analysis of content in Facebook pages related to tinnitus in this study suggests that tinnitus sufferers use social media for various purposes, including obtaining information about symptoms and diagnosis, seeking social support, learning to cope, and obtaining information about research in this area. It is important to note that nearly half of the discussions in Facebook pages were related to diagnosis and symptoms (ie, 43.3%), suggesting that this platform is used by tinnitus sufferers for self-assessment of their condition. Facebook pages are fairly recent when compared with news media, and we explored Facebook pages information only during the years 2011-2016. Time series analysis showed no change in patterns of information during this time, unlike newspaper media. The results of this study relate well to previous studies on social media and other health conditions; for example, a recent study of the hearing aid community found that its members used social media sources for advice and support, information sharing, and service-related information [
One would expect that the media would provide publicity for an intervention that is evidence-based. Although there are various treatments and/or management strategies available for tinnitus sufferers, psychological interventions such as cognitive behavioral therapy have the best evidence base for alleviating tinnitus distress [
Overall, there is growing literature about the importance of health communication in chronic condition management, particularly concerning the role of mass media and social media in forming individuals’ opinions, and its bearing upon health behaviors. For this reason, it is important for health care professionals to be aware of the type of information that is being provided by the media on specific health conditions such as hearing loss and/or tinnitus.
This study has several practical implications. As the media plays an important role in formulating people’s knowledge and opinions [
This study is the first of its kind in the area of tinnitus. However, it has a few limitations. First, the text pattern analysis using software helps analyze big data, and although the analysis provides more of a macro view of the data, it may only provide a superficial understanding of its content. The automated data analysis may help us understand “what” aspects of tinnitus are represented in the media, rather than “how” tinnitus and these aspects are represented. For example, questions such as
This study explored how tinnitus is represented in the media, specifically in the US newspaper media and in Facebook pages. The information in the newspapers regarding tinnitus was mainly about brain stimulation, symptoms, coping, social support, and treatment innovations. Time series analysis showed that there were some changes in the patterns of information presented in newspaper media during 1980-2017. The information in Facebook pages about tinnitus was mainly related to diagnosis, cause, research and development, social support, challenges, symptoms, and coping. However, no changes in the patterns of information presented in Facebook pages between 2011 and 2016 were noted. These findings can help clinicians to better understand presuppositions tinnitus patients may have in regard to their condition. In addition, the findings can also be of interest to public health and health communication experts, allowing them to tailor health information about tinnitus to promote self-management and appropriate treatment choices for tinnitus sufferers.
The authors would like to acknowledge Gayatri Bhatta and Amit Kumar for help with data extraction from the US Newspaper database and Facebook pages and Prof William Harn for initial thoughts about text pattern analysis.
None declared.