Reporting Social Media-based Adverse Events with Artificial Intelligence: Elaborating the Challenges – Mitigating with Innovation
Tata Consultancy Services
By Dr. Ashish Indani, Devraj Goulikar, Akhil Nair, Pratibha Potare, Dr. Sonal More
Over the last decade, social media, which represents freedom of expression at its peak, has gained immense popularity and importance 1. With hundreds of millions of Twitter users and even more Facebook users, a great deal of information is generated on these two platforms alone. Among many other topics, social media posts also cover healthcare, such as reviews of doctors, treatment protocols and reports of adverse medical events 2. The Life Sciences industry had already acknowledged the influence of social media on people’s choice of healthcare services and products 3,4,5,6, but its attitude shifted fundamentally after regulatory authorities accepted social media as one of the sources of adverse drug events (ADEs) in February 2014 7. Although social media, as a new source, added only 5-7% to the volume of ADEs, it created a complex and challenging situation for industry and regulatory stakeholders.
Researchers stress the need for innovation to identify adverse events in social media posts despite these challenges 8. Any solution for processing adverse events (AEs), adverse drug reactions (ADRs) or device incident reports from social media sources needs to overcome all of them.
Searching social media posts for adverse events and processing them presents numerous vigilance challenges 9. The challenges that impose the greatest limitations are large data volumes; a high proportion of noise; diversity in content, expression, language and posting formats; non-textual content used as text; and the use of symbols, emoticons and jargon. These challenges are especially sensitive for the industry because regulatory timelines are rigid and stringent, and they keep tightening. Hence, filtering out actual and complete ADE reports from social media, with nothing missed, is a herculean task 10, 11.
Data Volume and Noise
The first major problem is the sheer volume of data, most of which is noise. Pierce, C.E., Bouri, K., Pamer, C. et al. found that, out of almost a million posts processed, only six potential adverse events were identified 12. The second major challenge is unpredictable volume spikes. Triggers such as popular posts, popular links and sentiment drivers (for example a mass event, a price hike, or a burst of posts by an influencer) commonly cause steep spikes in post volume 13. Such spikes are unpredictable in terms of the total volume of posts, the number of true and false positive cases, the amount of noise, and even intentionally posted negative material. Noise also arises when multiple users repost the same content, either independently or with citation of the original content or author.
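Spike detection can be made concrete with a small sketch. It is not the authors' method: it assumes posts have already been bucketed into hourly counts and flags an hour whose count lies more than `threshold` standard deviations above the mean of the preceding `window` hours (a trailing z-score). The function name `detect_spikes` and its default parameters are hypothetical.

```python
from statistics import mean, stdev

def detect_spikes(hourly_counts, window=24, threshold=3.0):
    """Return indices of hours whose post count exceeds the trailing-window
    mean by more than `threshold` sample standard deviations.
    `window` must be at least 2 so that stdev() is defined."""
    spikes = []
    for i in range(window, len(hourly_counts)):
        history = hourly_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any increase counts as a spike.
            if hourly_counts[i] > mu:
                spikes.append(i)
            continue
        if (hourly_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes
```

For example, with 24 hours of counts around 100 followed by 100, 450 and 120, only the 450 hour (index 25) is flagged. Once a spike enters the trailing window it inflates the baseline for the following hours, so a production monitor would likely use a more robust baseline such as a median.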
Information on social media is highly diverse in content, language, jargon and so on 14, which makes information extraction extremely difficult. For example, ‘BTW’ is commonly used as an acronym for ‘by the way’, but also as an abbreviation for the preposition ‘between’.
Similarly, ‘W/o’ and ‘WO’ are both used to mean ‘without’. However, ‘W/o’ also stands for ‘wife of’, ‘which one’ and ‘was offending’, while ‘WO’ is used for ‘working in office’, ‘written off’ and ‘which outfit’. The intended meaning usually has to be inferred from the context of the statement 15.
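A minimal sketch of context-based abbreviation expansion, assuming a hand-built sense inventory (`SENSES`) in which each candidate expansion is paired with cue words. The inventory, cue words and function names are hypothetical and far smaller than anything usable in practice. Each abbreviation is expanded to the sense whose cues overlap most with the surrounding three tokens on either side.

```python
import re

# Hypothetical sense inventory: abbreviation -> {expansion: cue words}.
SENSES = {
    "btw": {
        "by the way": {"anyway", "also", "oh", "forgot"},
        "between": {"doses", "meals", "days", "hours"},
    },
    "w/o": {
        "without": {"food", "water", "sleep", "meds", "eating", "taking"},
        "wife of": {"mr", "husband", "married"},
    },
}

def tokenize(text):
    return re.findall(r"[a-z/']+", text.lower())

def expand_abbreviations(text, window=3):
    tokens = tokenize(text)
    out = []
    for i, tok in enumerate(tokens):
        senses = SENSES.get(tok)
        if senses is None:
            out.append(tok)
            continue
        # Local context: up to `window` tokens before and after the abbreviation.
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        scores = {exp: len(cues & context) for exp, cues in senses.items()}
        # Ties (including all-zero scores) fall back to the first listed sense.
        out.append(max(scores, key=scores.get))
    return " ".join(out)
```

`expand_abbreviations("btw forgot to say I take it btw meals")` yields `"by the way forgot to say i take it between meals"`: the same token resolves differently because its neighbors differ. When no cue matches, a production system would route the post to review rather than silently take the first sense.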
Need for Contextualization
Contextualization is extremely important in processing social media posts 16. For example, “I am high” can have two different meanings: one refers to climbing a hill or a staircase (increasing altitude), and the other to taking a psychotropic or intoxicating substance (euphoric mood). Inferring the meaning of a statement from its context and spatial references can be one of the most critical aspects of resolving social media processing errors.
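One classical way to use context for this kind of ambiguity is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the sentence. The sketch below applies it to ‘high’. The glosses and stop-word list are hypothetical stand-ins for a curated ontology, and this is an illustration rather than the method used in this article.

```python
import re

# Hypothetical glosses for two senses of "high".
SENSE_GLOSSES = {
    "increased altitude": "climbing hill mountain stairs staircase altitude top view floor",
    "euphoric mood": "took taking drug pill dose smoked feel feeling dizzy buzzed euphoric",
}

STOPWORDS = {"i", "am", "a", "the", "to", "and", "so", "up", "my", "it", "is"}

def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def disambiguate(sentence, glosses=SENSE_GLOSSES):
    """Simplified Lesk: return the sense whose gloss overlaps most with the
    sentence, or None when no gloss overlaps at all."""
    context = content_words(sentence)
    overlaps = {sense: len(context & content_words(gloss))
                for sense, gloss in glosses.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None
```

A bare “I am high” overlaps with neither gloss, so the function returns `None`; that is exactly the case where a post should go to human review rather than be guessed.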
Emoticons and Symbol Sequence
Often, a specific combination of symbols is used to create or represent an emoticon. For example, the sequence ‘:)’ is automatically converted into a smiling-face emoji or, if not converted, is used to represent one. Similarly, ‘:(’ stands for a sad face and ‘;)’ for a wink. In NLP, these symbol combinations are either not used or carry different meanings, so they need careful recalibration. Because emoticons are not part of natural language, they require special consideration. Despite standardization efforts in Machine Learning, challenges arise from the non-standard and uncontrolled use of images and emoticons, contexts, words and references. For example, the emoticon ‘(ﾟ⊿ﾟ)’, denoting an angry bird, is often confused with ‘(:|)’, which expresses a straight face, or ‘:-~’, which expresses a tick.
Text, special characters and symbols are also often combined to create text art, which may have little or no meaning. For example, ‘(•‾̑⌣‾̑•)’ is a combination of symbols that means the same as the corresponding emoji. Images, including faces, are also used to express sentiment, and many social media platforms and applications now offer emoticon libraries comprising images with specific meanings attached to each. All of these require the machine to learn specifically through image-recognition ontology and rules.
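Emoticon sequences can be normalized before NLP by replacing them with placeholder tokens that a model can treat as ordinary vocabulary. This sketch assumes a small hypothetical lexicon (`EMOTICONS`) and matches longer sequences first so that ‘:-)’ is not partially consumed. Real systems would need far larger, regularly updated lexicons plus image handling, which this does not cover.

```python
import re

# Hypothetical emoticon lexicon mapping symbol sequences to placeholder tokens.
EMOTICONS = {
    ":-)": "<smile>",
    ":)": "<smile>",
    ":-(": "<sad>",
    ":(": "<sad>",
    ";-)": "<wink>",
    ";)": "<wink>",
    "(ﾟ⊿ﾟ)": "<angry>",
}

# Longest sequences first, so the regex alternation prefers ':-)' over ':)'.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)

def normalize_emoticons(text):
    """Replace known emoticons with placeholder tokens and tidy whitespace."""
    replaced = _PATTERN.sub(lambda m: f" {EMOTICONS[m.group(0)]} ", text)
    return " ".join(replaced.split())
```

For instance, `normalize_emoticons("great :-) ;)")` returns `"great <smile> <wink>"`.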
Geo-oriented Language Variations
Geographical variations in language can also make cases more complex to process. For example, the phrase ‘do the needful’ in place of ‘take required action’ is specific to India. In the Philippines, ‘bad shot’ substitutes for ‘annoying actions’, and the brand name ‘Frigidaire’ is used in place of ‘refrigerator’ in parts of the US and the Philippines. ‘Xerox’ is used to mean ‘photocopy’ in the Indian subcontinent. Another example of geography-specific complexity is double negation used where a single negative is meant. This is especially common among speakers of languages such as Spanish and Portuguese, whose grammar uses double negatives. For example, ‘I don’t know nothing’ is not a positive statement despite the double negation. Hence, processing information from a diverse population with different styles and expressions is difficult 9.
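A sketch of region-aware normalization, assuming the post's region has already been inferred (for example from profile metadata) and using a hypothetical lexicon built only from the examples above. It also rewrites the double-negative pattern so that ‘I don’t know nothing’ is read as a single negation. The region codes and the function name are illustrative.

```python
import re

# Hypothetical region-tagged lexicon built from the examples in the text.
REGIONAL_PHRASES = {
    "IN": {"do the needful": "take required action", "xerox": "photocopy"},
    "PH": {"bad shot": "annoying actions", "frigidaire": "refrigerator"},
    "US": {"frigidaire": "refrigerator"},
}

# Double negative ("don't know nothing") intended as a single negation.
NEG_CONCORD = re.compile(r"\b(don't|do not|didn't|can't)\s+(\w+)\s+nothing\b",
                         re.IGNORECASE)

def normalize_regional(text, region):
    text = text.lower()
    for phrase, standard in REGIONAL_PHRASES.get(region, {}).items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", standard, text)
    # Rewrite the double negative to its intended single-negation meaning.
    return NEG_CONCORD.sub(lambda m: f"{m.group(1)} {m.group(2)} anything", text)
```

Keying the lexicon by region keeps a phrase from being rewritten for posts where it carries its ordinary meaning.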
Variation in Spellings
Several words are pronounced identically by most people but spelled differently. For example, ‘colour’ in the UK is written ‘color’ in the United States. Social media increases the complexity of expression and spelling variation. The idiosyncratic use of ‘hy’, ‘hey’, ‘hai’, ‘hei’ and ‘hye’ for the greeting ‘hi’ is social media slang and is not specific to any geography. Greetings such as ‘adieu’ and ‘au revoir’ are common in social media posts; however, the use of ‘ar’ for ‘au revoir’ is an example of dialect mixed with slang 9.
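Spelling variants can be handled in two tiers: an explicit variant table for known forms, and fuzzy matching against a canonical vocabulary for unseen misspellings. The sketch uses Python's standard `difflib.get_close_matches`; the vocabulary, the variant table and the 0.8 cutoff are assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical explicit variant table (tier 1).
VARIANTS = {
    "colour": "color",
    "hy": "hi", "hey": "hi", "hai": "hi", "hei": "hi", "hye": "hi",
    "ar": "au revoir",
}
# Hypothetical canonical vocabulary for fuzzy matching (tier 2).
CANONICAL = ["color", "hi", "au revoir", "headache", "nausea", "dizzy"]

def normalize_token(token, cutoff=0.8):
    t = token.lower()
    if t in VARIANTS:       # tier 1: known variant, exact mapping
        return VARIANTS[t]
    if t in CANONICAL:
        return t
    # tier 2: unseen misspelling, closest canonical term above the cutoff
    match = get_close_matches(t, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else t
```

Short tokens such as ‘ar’ are risky to map blindly; in practice such entries would be gated by context.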
These challenges severely limit approaches such as robotic process automation. Modern technology backed by Artificial Intelligence (AI), such as Machine Learning (ML) and Natural Language Processing (NLP), with the system trained on social media language and jargon as well as medical terminology, is a promising solution to these challenges 17.
Leveraging AI for Better Adverse Event Detection
Leveraging AI for social media case processing is arguably both a solution and the source of a new problem. While AI helps mitigate the challenges above, it raises a new one: regulatory compliance, including a reasonable justification for rejecting a post.
Several studies with various focuses in the medical product vigilance process have mainly explored the potential of AI 18, 19, 20, 21, 22, 23, 24. Here, we discuss how social media posts can be processed with AI to, figuratively, find the diamonds in the coal mine. To process unstructured and complex information from social media, intelligence must come with efficiency and scalability. Humans gain understanding through training and exposure; similarly, an AI system learns through data analysis and feedback loops. Social media is a vast ocean of data, and it offers a big opportunity to implement an AI-based solution that delivers the desired results. However, because AE processing is highly regulated, an optimal solution with validation and regulatory compliance would be the Holy Grail of AI-based AE processing in social media.
NLP can auto-detect ADRs, derive meaning from social media posts, filter out relevant information and place it in context, as explained in the blog Supervised & Unsupervised Learning for ICSR Processing. Machine Learning capabilities, using advanced algorithms, are being used to evaluate the strength of ADRs in relation to what is being discussed in a social media post. Machine Learning techniques that identify possible AEs or ADRs through statistical analysis of pharmacovigilance data are already in use 25.
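Statistical analysis of pharmacovigilance data is often done with disproportionality measures. As one illustration, not necessarily the technique behind ref. 25, the proportional reporting ratio, PRR = [a/(a+b)] / [c/(c+d)], and the reporting odds ratio, ROR = (a·d)/(b·c), compare how often an event is reported with a given drug versus all other drugs, using a 2×2 table of report counts. The class and function names below are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class ContingencyTable:
    a: int  # reports with the suspect drug AND the event of interest
    b: int  # reports with the suspect drug and other events
    c: int  # reports with other drugs AND the event of interest
    d: int  # reports with other drugs and other events

def prr(t):
    """Proportional reporting ratio."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))

def ror_with_ci(t, z=1.96):
    """Reporting odds ratio with an approximate 95% confidence interval
    computed on the log scale."""
    ror = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    return ror, math.exp(math.log(ror) - z * se), math.exp(math.log(ror) + z * se)
```

For a = 20, b = 180, c = 100 and d = 9,700, PRR = 0.1 / (100/9,800) = 9.8. Zero cells make these ratios undefined, so a continuity correction is commonly applied in practice; the sketch does not include one.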
Thinking Ahead of the Problem: AI with Deterministic Approach
The challenges, regulatory requirements and business prerequisites call for a solution that retains the dynamic nature of machine learning for efficiency and the rigid nature of a defined process for compliance. Hence, a method for handling social media posts requires an AI solution with a justification trail. The proposed methodology combines AI with a thoroughly trained and tested deterministic approach. The method has three main steps:
– Spatial contextualization for mining target information and extracting unique records
– Appraisal and logical selection of potential posts of interest
– Use of a strong ontology for high-accuracy search and data transformation
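The three steps can be sketched as a deterministic pipeline in which every decision appends a human-readable reason to the post's audit trail. Everything here (the `Post` class, the step functions, the substring matching and the sample terms) is a simplified, hypothetical illustration of the structure, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    post_id: str
    text: str
    status: str = "pending"
    trail: list = field(default_factory=list)  # justification for every decision

    def reject(self, reason):
        self.status = "rejected"
        self.trail.append(reason)

def mine(posts, query_terms):
    """Step 1: keep query matches and drop verbatim reposts."""
    seen = set()
    for p in posts:
        key = " ".join(p.text.lower().split())
        if key in seen:
            p.reject("step 1: duplicate of an earlier post")
        elif not any(t in key for t in query_terms):
            p.reject("step 1: no query term matched")
        else:
            seen.add(key)
            p.trail.append("step 1: query match, unique")

def appraise(posts, event_terms):
    """Step 2: deterministic appraisal; keep posts mentioning an event term."""
    for p in posts:
        if p.status != "pending":
            continue
        hits = [t for t in event_terms if t in p.text.lower()]
        if hits:
            p.trail.append(f"step 2: event term(s) {hits}")
        else:
            p.reject("step 2: no event term found")

def code_with_ontology(posts, ontology):
    """Step 3: map lay terms to preferred terms via a lookup ontology."""
    for p in posts:
        if p.status != "pending":
            continue
        coded = sorted({pt for lay, pt in ontology.items() if lay in p.text.lower()})
        p.status = "case"
        p.trail.append(f"step 3: coded as {coded}")

def run_pipeline(posts, query_terms, event_terms, ontology):
    mine(posts, query_terms)
    appraise(posts, event_terms)
    code_with_ontology(posts, ontology)
    return posts
```

Because each rejection carries its reason, the output doubles as a justification trail. Substring matching is used only for brevity and would, for instance, match ‘high’ inside ‘highway’.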
Spatial Contextualization for Mining Target Information and Identifying Unique Records
Multiple tools are available for mining social media information. In addition to social media companies making their own data available for continuous monitoring, these tools fetch data into a system, where it forms a “data lake” for further use. The first step of the social media vigilance process is to identify relevant information by running an appropriate query against the data lake.
The first important step in data mining is a search strategy that includes query terms and their synonyms or related words. The search query must fetch every post that has even the slightest probability of matching the target outcome. Semantic search is, in fact, among the best-suited features of any search engine for business users, as explained in the blog, Assessing Product Complaints and Ensuring Vigilance in the Pharma Industry Using Semantic Search. Machine Learning, which learns language, dialects, slang, emoticons and jargon, enriches the query and helps identify all target posts. After the target posts are fetched, duplicates, which commonly arise when multiple users repost the same post (for example, retweets), are removed based on a set of parameters, eliminating a significant share of the data. The remaining unique records are used for further processing.
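A minimal sketch of the search-strategy and de-duplication steps, assuming posts arrive as plain strings. Query terms are expanded with a hypothetical synonym map to favor recall, and reposts are collapsed by hashing a normalized form of the text (lower-cased, with a leading Twitter-style ‘RT @user:’ prefix and links removed). The synonym map and function names are illustrative.

```python
import hashlib
import re

# Hypothetical synonym map for the search strategy (recall over precision).
SYNONYMS = {
    "dizzy": ["dizziness", "lightheaded", "light-headed", "room spinning"],
    "nausea": ["nauseous", "queasy", "feel sick"],
}

def expand_query(terms):
    expanded = set()
    for term in terms:
        expanded.add(term)
        expanded.update(SYNONYMS.get(term, []))
    return expanded

def matches(text, expanded_terms):
    lowered = text.lower()
    return any(term in lowered for term in expanded_terms)

def fingerprint(text):
    """Normalize a post so that reposts of the same content hash identically."""
    t = text.lower()
    t = re.sub(r"^rt @\w+:\s*", "", t)   # strip a leading retweet prefix
    t = re.sub(r"https?://\S+", "", t)   # strip links (shortened URLs differ)
    t = re.sub(r"\s+", " ", t).strip()
    return hashlib.sha256(t.encode("utf-8")).hexdigest()

def fetch_unique_matches(posts, terms):
    expanded = expand_query(terms)
    seen, unique = set(), []
    for post in posts:
        if not matches(post, expanded):
            continue
        fp = fingerprint(post)
        if fp not in seen:
            seen.add(fp)
            unique.append(post)
    return unique
```

Exact hashing only catches reposts that are identical after normalization; edited reposts would need a near-duplicate technique such as shingling, which this sketch omits.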
An AI system processes natural language, extracts named entities from a post and encodes not just medical terms but also the sentiments expressed in the post. After data identification, de-duplication and appraisal, the post is further processed into a “case”, and all the required information is populated in the vigilance database. For truncated records or snippets, the AI system visits the URL, then fetches and processes the complete web content. Once data extraction is finished, the record is marked ‘completed’. Completion includes processes such as derivations from formulae, determinations based on search outcomes, and algorithm-based decisions.
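Building a case from a post can be illustrated with a dictionary-based entity extractor. This sketch assumes tiny hypothetical lexicons and treats ‘product’ and ‘event’ as the required fields purely for illustration (not a regulatory list). It only flags truncated snippets (text ending in an ellipsis) instead of fetching the URL.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicons; a real system would use trained NER models.
DRUG_LEXICON = {"drugx", "drugy"}
EVENT_LEXICON = {"dizzy": "dizziness", "headache": "headache", "rash": "rash"}
REQUIRED_FIELDS = ("product", "event")  # illustrative minimum only

@dataclass
class Case:
    source_text: str
    fields: dict = field(default_factory=dict)
    status: str = "in progress"
    needs_full_fetch: bool = False

def build_case(post_text):
    case = Case(post_text)
    tokens = re.findall(r"[a-z]+", post_text.lower())
    products = [t for t in tokens if t in DRUG_LEXICON]
    events = [EVENT_LEXICON[t] for t in tokens if t in EVENT_LEXICON]
    if products:
        case.fields["product"] = products[0]
    if events:
        case.fields["event"] = events[0]
    # A full system would re-fetch truncated snippets from the source URL;
    # this sketch only flags them.
    if post_text.rstrip().endswith(("...", "…")):
        case.needs_full_fetch = True
    if all(f in case.fields for f in REQUIRED_FIELDS):
        case.status = "completed"
    return case
```

A post such as “Started DrugX yesterday and now…” stays ‘in progress’ and is flagged for a full fetch, since the event may be in the truncated part.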
Appraisal and Logical Selection of Posts for Processing
Even after these capabilities are applied across the social media data, a large amount of data remains, much of it noise. Therefore, the second step of the solution should include identifying unique records and systematically appraising each post, with a traceable justification for its inclusion or exclusion.
Appraisal is the process humans use to select relevant information from large sources with the least effort. For social media posts, appraisal is used not only to make the correct selection from a large amount of data, but also to create a trail of justification for selected and rejected posts. Once adapted to social media-specific requirements (slang, dialects, geo-specific linguistic diversity, emoticons, image-based sentiment recognition and ontology ingestion), a customized NLP solution is ready for social media processing. It is imperative that our AI solution eliminates noise from the crawled data, backed by a trail of justification, without missing any true positive case (completeness). In addition to fetching the correct information, using acceptable and validated methods is also essential. Figure 1 below shows our unique in-house appraisal system, which meets all of these requirements and generates a record of posts along with their appraisal.
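The appraisal step can be sketched as an ordered list of deterministic rules, each returning a decision and a reason. To protect completeness, the inclusion rule runs first, and a post that matches no rule is routed to human review rather than discarded. The rules, term lists and reason codes are hypothetical.

```python
import re

PRODUCT_TERMS = ("drugx",)                      # hypothetical monitored product
EVENT_TERMS = ("dizzy", "rash", "headache", "high")

def has_any(text, terms):
    return any(re.search(rf"\b{re.escape(t)}\b", text) for t in terms)

def rule_product_and_event(text):
    if has_any(text, PRODUCT_TERMS) and has_any(text, EVENT_TERMS):
        return "include", "R1: product and possible event co-mentioned"
    return None

def rule_promotional(text):
    if re.search(r"\b(buy now|discount|promo code)\b", text):
        return "exclude", "R2: promotional content"
    return None

def rule_no_product(text):
    if not has_any(text, PRODUCT_TERMS):
        return "exclude", "R3: no monitored product mentioned"
    return None

# Order matters: inclusion is tested first, so no exclusion rule can
# override a post that already meets the inclusion criterion.
RULES = (rule_product_and_event, rule_promotional, rule_no_product)

def appraise(text):
    lowered = text.lower()
    for rule in RULES:
        result = rule(lowered)
        if result is not None:
            decision, reason = result
            return {"text": text, "decision": decision, "reason": reason}
    # No rule fired: route to a human rather than silently discard.
    return {"text": text, "decision": "review", "reason": "R0: no rule matched"}
```

Because the rule set is fixed and ordered, the same post always receives the same decision and reason, which is what makes the trail auditable.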
Figure 1: Deciphering social media slang and appraising posts
Use of Strong Unique Ontology
The AI solution must learn to process social media cases appropriately. Rules, ontologies and configurations can expedite this learning. Rules for de-duplication and appraisal, together with ontologies specific to each social media type, are among the most important forms of supervised learning. At the same time, NLP capabilities are recalibrated to better understand slang and emoticons based on the ontology. As shown above, slang can often carry multiple meanings depending on the context in which the message is conveyed.
In this method, various capabilities were developed to use the correct ontology, which is in turn enriched through reinforced learning. As a result, the method can handle diverse content, including language, dialects, symbols, jargon and rapidly changing conventions, as well as medical conditions and the sentiment of expressions. It also gives the solution high-speed processing, quick learning and immediate implementation, among other attributes needed to handle all the major challenges. Using contextualization, the AI system can distinguish between terms of medical language and colloquial terms that are coded as medical terms. We have developed an in-house dictionary of slang, emoticons and symbols for training the solution for social media monitoring.
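A toy ontology lookup shows how lay expressions can be coded to a preferred term and a broader class. The entries and the two-level hierarchy are hypothetical and are not taken from MedDRA or any other licensed dictionary. Longer expressions are matched first and consumed, so that ‘room spinning’ is not also coded through ‘spinning’.

```python
# Hypothetical toy ontology: lay expression -> preferred term -> class.
LAY_TO_PREFERRED = {
    "lil high": "euphoric mood",
    "buzzed": "euphoric mood",
    "room spinning": "dizziness",
    "spinning": "dizziness",
    "lightheaded": "dizziness",
    "throwing up": "vomiting",
}
PREFERRED_TO_CLASS = {
    "euphoric mood": "psychiatric",
    "dizziness": "nervous system",
    "vomiting": "gastrointestinal",
}

def code_terms(text):
    """Return (lay term, preferred term, class) for each lay expression found."""
    lowered = text.lower()
    results = []
    # Longest expressions first; sorted() is stable for equal lengths.
    for lay in sorted(LAY_TO_PREFERRED, key=len, reverse=True):
        if lay in lowered:
            pt = LAY_TO_PREFERRED[lay]
            results.append((lay, pt, PREFERRED_TO_CLASS[pt]))
            lowered = lowered.replace(lay, " ")  # consume the matched text
    return results
```

Keeping the hierarchy in the lookup tables, rather than in code, lets the ontology be enriched without changing the coding logic.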
From Substance to Sublime – Example of the 3-step Method
Completing the record includes medical coding as well as assessing the relationship between the event and the product, in the correct context (Figure 2).
Figure 2: Social Media Case Processing with AI
This example demonstrates how the proposed method mitigates the challenges of processing the tweet “I took B******** [drug] last night….. think I am a lil [jargon] high [colloquial term for a medical term]”.
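The tweet above can be traced through a compressed version of the three steps. This sketch assumes hypothetical slang and cue dictionaries, treats a capital letter followed by asterisks as a masked product name, and codes ‘high’ as euphoric mood only when an intake cue and a product are both present. It illustrates the flow, not the actual system shown in Figure 2.

```python
import re

# Hypothetical resources for this one example.
SLANG = {"lil": "little", "gonna": "going to"}
MASKED_PRODUCT = re.compile(r"\b[A-Z]\*{3,}")   # e.g. 'B********'
INTAKE_CUES = {"took", "take", "taking", "dose", "pill"}
ALTITUDE_CUES = {"climb", "climbed", "hill", "stairs", "floor"}

def process_tweet(text):
    # Step 1: find the (masked) product, then normalize slang in the rest.
    product = MASKED_PRODUCT.search(text)
    remainder = MASKED_PRODUCT.sub(" ", text).lower()
    tokens = [SLANG.get(t, t) for t in re.findall(r"[a-z]+", remainder)]
    record = {
        "product": product.group(0) if product else None,
        "normalized": " ".join(tokens),
        "event": None,
    }
    # Steps 2 and 3 (simplified): appraise the ambiguous 'high' in context
    # and code it.
    if "high" in tokens:
        context = set(tokens)
        if record["product"] and context & INTAKE_CUES:
            record["event"] = "euphoric mood"
        elif context & ALTITUDE_CUES:
            record["event"] = "none (altitude sense)"
        else:
            record["event"] = "needs review"
    return record
```

For the tweet, the result is product `B********`, normalized text `i took last night think i am a little high` and event `euphoric mood`, whereas “Climbed the stairs to the top floor, I am high” yields `none (altitude sense)`.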
Social media is embedded in our daily lives and has become a widely used and accepted medium for discussing and exchanging information on patients’ lives, drug intake, adverse events and personal observations. Given the challenges imposed by social media, processing social media-based ADRs is practically infeasible for human beings. In addition, the noise and the variations in language, jargon and dialect multiply the volume of information that must be interpreted. Moreover, this diversity limits robotic and other transactional automation. Hence, the industry is striving to find a way to monitor and mine correct information from social media while remaining compliant with regulations.
A promising and viable solution to all these issues is the deployment of Artificial Intelligence with a systematic, deterministic approach. In addition to Machine Learning and Natural Language Processing capabilities, the method uses spatial and contextual inference, rich ontologies and systematic appraisal, and makes decisions based on a deterministic algorithm, thus helping the system remain compliant and validated for regulatory purposes.
More info about TCS Life Sciences Advanced Drug Development Platforms https://www.tcs.com/advanced-drug-development
1 R. Leaman, L. Wojtulewicz, R. Sullivan, A. Skariah, J. Yang, G. Gonzalez, Towards internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks, Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, Association for Computational Linguistics (2010), pp. 117-125
2 Gage-Bouchard, E. A., LaValley, S., Warunek, M., Beaupin, L. K., & Mollica, M. (2018). Is cancer information exchanged on social media scientifically accurate?. Journal of Cancer Education, 33(6), 1328-1332.
3 Chee BW, Berlin R, Schatz B. Predicting adverse drug events from personal health messages. In: AMIA Annual Symposium Proceedings: American Medical Informatics Association, 2011: 217–26.
4 Nikfarjam A, Gonzalez GH. Pattern mining for extraction of mentions of adverse drug reactions from user comments. In: AMIA Annual Symposium Proceedings: American Medical Informatics Association, 2011: 1019–1026
5 Yang CC, Yang H, Jiang L, Zhang M. Social media mining for drug safety signal detection. In: Proceedings of the 2012 international workshop on Smart health and wellbeing: ACM, 2012: 33–40.
6 H. Wu, H. Fang, S. Stanhope, et al., Exploiting online discussions to discover unrecognized drug side effects, Methods Inform. Med., 52 (2) (2013), pp. 152-159
7 Center for Drug Evaluation and Research, Drug Safety Priorities 2017, available at https://www.fda.gov/downloads/Drugs/DrugSafety/UCM605229.pdf
8 A. Benton, L. Ungar, S. Hill, S. Hennessy, J. Mao, A. Chung, C.E. Leonard, J.H. Holmes, Identifying potential adverse effects using the web: a new approach to medical hypothesis generation, J. Biomed. Inform., 44 (6) (2011), pp. 989-996
9 Anna Katrine Jørgensen, Dirk Hovy, and Anders Søgaard, Challenges of studying and processing dialects in social media, available at https://www.aclweb.org/anthology/W15-4302
10 Ginn R, Pimpalkhute P, Nikfarjam A, Patki A, O’Connor K, Sarker A, Gonzalez G. Mining Twitter for adverse drug reaction mentions: a corpus and classification benchmark. LREC BioTexM. Proceedings of the fourth workshop on building and evaluating resources for health and biomedical text processing, 2014; 2: 1–8.
11 O’Connor K, Pimpalkhute P, Nikfarjam A, Ginn R, Smith KL, Gonzalez G. Pharmacovigilance on Twitter? Mining Tweets for adverse drug reactions. AMIA Annual Symposium Proceedings: American Medical Informatics Association, 2014: 924–933.
12 Carrie E. Pierce, Khaled Bouri, Carol Pamer, Scott Proestel, Harold W. Rodriguez, Hoa Van Le, Clark C. Freifeld, John S. Brownstein, Mark Walderhaug, I. Ralph Edwards, Nabarun Dasgupta, Evaluation of Facebook and Twitter Monitoring to Detect Safety Signals for Medical Products: An Analysis of Recent FDA Safety Alerts, Drug Saf (2017) 40: 317. https://doi.org/10.1007/s40264-016-0491-0
13 NetBase Staff, Investigating Spikes in Social Media Conversations, Social Analytics, May 16, 2016, available at https://www.netbase.com/blog/spikes-in-social-media-conversations/
14 Seargeant, P., & Tagg, C. (Eds.). (2014). The language of social media: Identity and community on the internet. Springer.
15 Rahul Goel, Sandeep Soni, Naman Goyal, John Paparrizos, Hanna Wallach, Fernando Diaz, Jacob Eisenstein, The Social Dynamics of Language Change in Online Networks, In: Spiro E., Ahn YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham
16 Knezevic MZ, Bivolarevic IC, Peric TS, Jankovic SM. Using Facebook to increase spontaneous reporting of adverse drug reactions. Drug Saf 2011; 34: 351–2.
17 Sloane, R., Osanlou, O., Lewis, D., Bollegala, D., Maskell, S., & Pirmohamed, M. (2015). Social media and pharmacovigilance: A review of the opportunities and challenges. British journal of clinical pharmacology, 80(4), 910–920. doi:10.1111/bcp.12717
18 R. Leaman, L. Wojtulewicz, R. Sullivan, A. Skariah, J. Yang, G. Gonzalez, Towards internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks, Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, Association for Computational Linguistics (2010), pp. 117-125
19 A. Nikfarjam, G.H. Gonzalez, Pattern mining for extraction of mentions of adverse drug reactions from user comments, AMIA Annual Symposium Proceedings, vol. 2011, American Medical Informatics Association (2011), p. 1019
20 B.W. Chee, R. Berlin, B. Schatz, Predicting adverse drug events from personal health messages, AMIA Annual Symposium Proceedings, 2011, American Medical Informatics Association (2011), p. 217
21 A. Benton, L. Ungar, S. Hill, S. Hennessy, J. Mao, A. Chung, C.E. Leonard, J.H. Holmes, Identifying potential adverse effects using the web: a new approach to medical hypothesis generation, J. Biomed. Inform., 44 (6) (2011), pp. 989-996
22 J. Hadzi-Puric, J. Grmusa, Automatic drug adverse reaction discovery from parenting websites using disproportionality methods, Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), IEEE Computer Society (2012), pp. 792-797
23 J. Bian, U. Topaloglu, F. Yu, Towards large-scale twitter mining for drug-related adverse events, Proceedings of the 2012 International Workshop on Smart Health and Wellbeing, ACM (2012), pp. 25-32
24 Karen O’Connor, Pranoti Pimpalkhute, Azadeh Nikfarjam, Rachel Ginn, Karen L. Smith, Graciela Gonzalez, Pharmacovigilance on Twitter? Mining Tweets for Adverse Drug Reactions, AMIA Annu Symp Proc. 2014; 2014: 924–933. Published online 2014 Nov 14. PMCID: PMC4419871, PMID: 25954400
25 International Society of Pharmacovigilance (ISoP), Excerpts of ISoP Seminar, Intelligent Automation in Pharmacovigilance, 4th-5th December 2017, Boston/Cambridge, MA (USA)