Analysis of Breast Cancer Information on Facebook Using Neural Network-Based Topic Modeling and Metadata Analysis of English and Spanish Content: Comparative Study

J Med Internet Res. 2025 Oct 15;27:e79161. doi: 10.2196/79161.

ABSTRACT

BACKGROUND: Breast cancer is the most common cancer diagnosis among women, with approximately 2.3 million new cases annually. When faced with a cancer diagnosis, individuals often turn to the internet for information or reassurance, despite the risk of encountering low-quality or incorrect information. While this pattern is well documented for English-language content, limited work has examined the quality of breast cancer information in Spanish, the second most commonly spoken language in the United States.

OBJECTIVE: This study uses natural language processing methods and quantitative modeling to analyze English and Spanish breast cancer posts from Facebook, a vital source of health-related information for 40% of English-speaking and 60% of Spanish-speaking adults in the United States.

METHODS: Using the CrowdTangle application programming interface, we collected and processed 243,029 English-language and 96,334 Spanish-language Facebook posts. We applied BERTopic with the all-MiniLM-L6 model and k-means clustering to infer thematic structures and used coherence scores to determine the optimal number of topics for each language. To compare post metadata across languages, we calculated descriptive statistics and ran inferential tests for likes, comments, and shares. Finally, we examined the top 1% of the most engaged content (n=2430 English and n=963 Spanish) to analyze differences in poster characteristics across languages.
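
The reported top 1% sample sizes follow from the corpus sizes when the count is rounded down. The Python sketch below illustrates that arithmetic and one possible way to select the posts. It assumes engagement is ranked by the sum of likes, comments, and shares, which the abstract does not specify, and the dictionary keys are hypothetical rather than actual CrowdTangle field names.

```python
def top_percent_count(n_posts: int, percent: int = 1) -> int:
    """Number of posts in the top `percent`, rounded down (integer math)."""
    return n_posts * percent // 100


def top_engaged(posts, percent=1):
    """Return the most engaged posts, highest total engagement first.

    Assumption: total engagement = likes + comments + shares.
    The keys "likes", "comments", "shares" are hypothetical names.
    """
    def total(post):
        return post["likes"] + post["comments"] + post["shares"]

    k = top_percent_count(len(posts), percent)
    return sorted(posts, key=total, reverse=True)[:k]


print(top_percent_count(243_029))  # 2430, the English sample size
print(top_percent_count(96_334))   # 963, the Spanish sample size
```

Integer division avoids floating-point rounding surprises when the cutoff is computed on large corpora.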

RESULTS: Coherence scores indicated an optimal topic solution of k=40 (coherence=0.58) for English and k=30 (coherence=0.52) for Spanish. Thematically, we observed similar content in both languages, with topics spanning mammography, breast cancer events, pink ribbon month, and personal narratives. However, Spanish posts included local and municipal breast cancer events not present in English. Additionally, Spanish posts were more likely to mention at-home breast exams, which are no longer recommended in the United States. Engagement behavior showed statistically significant differences by language across likes, comments, and shares. English posts exhibited more consistent liking and sharing behavior, while Spanish posts showed more consistency in commenting. The top 1% of engaged content in English (n=2430) came from leading breast cancer nonprofits, whereas in Spanish (n=963) it originated from local governments or food and beverage companies.
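
The abstract does not define how "consistency" of engagement was measured. One possible reading is relative dispersion. The Python sketch below uses the coefficient of variation (standard deviation divided by mean) as an assumed stand-in for that idea, not as the study's method, and the counts are made up for illustration only.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean.

    Lower values mean the counts cluster more tightly around their mean,
    which is one possible reading of "more consistent" engagement.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(counts) / m


# Made-up like counts for two hypothetical groups; not data from the study.
likes_a = [10, 12, 11, 9, 13]
likes_b = [1, 40, 2, 3, 9]

cv_a = coefficient_of_variation(likes_a)
cv_b = coefficient_of_variation(likes_b)
more_consistent = "a" if cv_a < cv_b else "b"
print(more_consistent)
```

Both toy groups have the same mean, so the comparison isolates spread. Real engagement counts are typically heavily skewed, so a rank-based or log-scale measure may be a better fit in practice.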

CONCLUSIONS: Facebook breast cancer content is generally consistent across languages. However, differences in engagement behavior suggest that English- and Spanish-speaking populations engage with content differently, highlighting cultural variability that warrants further exploration. Notably, leading cancer authorities may not have a strong presence in Spanish, indicating that the most accurate and up-to-date information may not be reaching a population particularly prone to worse breast cancer prognoses.

PMID:41091542 | DOI:10.2196/79161

By Nevin Manimala