Social Media Listening and the Winter of AI
Trying to navigate the complexities of selecting the proper social media listening tool
Martin Miliev, Head of Social Media Intelligence in Publicis Groupe Bulgaria
Media listening originated as a function of PR, and for a while it was limited to print and online news clippings and rudimentary metrics like Share of Voice (SOV) and Advertising Value Equivalency (AVE). Then social media networks emerged, and PR professionals sought the help of tools to monitor and make sense of the unfathomable volumes of user-generated content from such platforms. This led to the rapid evolution of media listening to encompass more sophisticated analysis like brand health check-ups, consumer profiling and influencer identification. Likewise, other verticals of the marcom industry, like media planning and consumer research, took an interest and started incorporating social media intelligence into their arsenal of research tools.
As a result, the tech industry has jumped into the fray in an attempt to meet this growing demand and plethora of needs. A number of incumbents like Radian6, Netbase, Synthesio, Sysomos, Brandwatch/CrimsonHexagon, Talkwalker and Radarly, and an ever-increasing number of up-and-coming tools like Pulsar, YouScan and DigiMind, have been vying for our social listening tool budgets.
Speaking of the tools, those of us who have delved into social media intelligence, be it as PR practitioners, brand managers, media planners or marketing researchers, usually apply several key criteria when selecting a social listening tool: price, breadth and quality of data sources, search prowess, and various data analytics and automation features.
With all this said, selecting the proper social listening tool should be a relatively easy task. We have a wide range of vendors and tools to choose from, and as The Forrester Wave: Social Listening Platforms reports have persistently and correctly pointed out over the past several years, the vendors have continuously invested in stronger analytics, deeper tech integration, artificial intelligence (AI), machine learning (ML), Natural Language Processing (NLP) and Image Recognition.
Unfortunately, this arms race has made social listening technology highly commoditized. Both incumbents and challengers have been offering marginally different data sources, analytics and bold claims about their AI and NLP. What’s more, content platforms often come with complex and confusing price models (do any of these sound familiar – Analyst User, monitor, topic, Twitter Decahose, historical data restriction, etc.) and prohibitively high price tags.
What is worse, though, is that the drummed-up AI gizmos often fall short of what has been advertised. Regardless of whether you have been working in media analytics for three months or ten years, by now you know the feeling of pre-Christmas elation and excitement when a content vendor starts strutting its AI and NLP gadgets, and the quick dejection that usually sets in once you get your hands on the content platform. Don’t get me wrong, the content vendor industry has made great strides when it comes to data analytics technology and has provided us with widgets that enhance and speed up our work. However, the NLP offerings keep leaving a bitter aftertaste.
So in the end, content tool comparison and selection can be a very daunting task, and we are often left at the mercy of the likes of Forrester and the vendors, with little opportunity to run actual head-to-head benchmarks.
In August 2019 the Publicis Groupe – Sofia Social Media Intelligence team in Bulgaria embarked on a journey to test how Netbase, Brandwatch/CrimsonHexagon, Talkwalker, Sysomos and YouScan stack up in terms of data sources, search results and NLP/Sentiment Scoring.
For full disclosure, the Publicis Groupe – Sofia Social Media Intelligence team is currently using CrimsonHexagon, Netbase and TalkWalker. In the past the team has used Sysomos too. Also, in the past six months, the content systems evaluated in our study have undergone updates and upgrades that might have direct implications for the results we obtained against the criteria we applied.
For the purpose of this project, we picked a relatively cheerful topic – beer, and India pale ale (IPA) in particular, hoping to capture a lot of content with strong sentiment indicators. With regard to the highly vaunted NLP/Sentiment Scores, our goal was to analyze how the various platforms performed against each other, not the overall accuracy of the Sentiment score. As those of you who have dealt with manual sentiment scoring know, “no two analysts ever scored the same tweet” and I will leave it up to you to decide if an Instagram post of a photo of a glass of IPA beer accompanied by #IPA #Craftbeer #craftbeerporn is a positive or neutral post.*
* In my humble opinion this is a consumer endorsement post and should be scored as positive.
Search query (adapted accordingly to Boolean search capabilities and operators of each tool): (IPA OR IPAS) NEAR 7 (“craft beer” OR #craftbeer)
Geographic markets: Global
Time Period: 12 August 2019 – 18 August 2019
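Because each tool interprets proximity operators in its own way, it helps to pin down what a query like the one above is meant to do. The following is a minimal Python sketch of one possible reading of NEAR 7, not any vendor's implementation. The function names, the tokenizer, and the choice to measure distance as the difference in token positions are all assumptions made for illustration.

```python
import re

def tokenize(text):
    # Lowercase the text; hashtags stay single tokens, punctuation is dropped.
    # Assumption: "#ipa" and "ipa" are different tokens, as in some tools.
    return re.findall(r"#?\w+", text.lower())

def matches_query(text, window=7):
    """Hypothetical reading of: (IPA OR IPAS) NEAR 7 ("craft beer" OR #craftbeer).

    Returns True if IPA/IPAS occurs within `window` token positions of the
    phrase "craft beer" or the hashtag #craftbeer.
    """
    tokens = tokenize(text)
    left = [i for i, t in enumerate(tokens) if t in {"ipa", "ipas"}]
    right = [i for i, t in enumerate(tokens) if t == "#craftbeer"]
    # The two-word phrase is anchored at the position of "craft".
    right += [i for i in range(len(tokens) - 1)
              if tokens[i] == "craft" and tokens[i + 1] == "beer"]
    return any(abs(i - j) <= window for i in left for j in right)
```

A platform that counts the words between the two terms, or that treats #IPA as a match for IPA, would return a different set of posts for the same query, which is one reason the query had to be adapted per tool.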
Data Sources & Search Volumes
In an ideal world, the five content platforms would have dished out similar, if not identical, search volumes and composition of data sources.
In the real world, although all five vendors sport excellent Boolean search capabilities, our searches produced varying results across the different content platforms. The discrepancies were not extreme in most cases but still large enough to raise some concerns and questions about the source books and crawling techniques of each vendor. (For the record, Netbase and CrimsonHexagon provided the highest search volumes and an almost identical mix of data sources.)
The discrepancy in the volumes, most noticeable on Twitter, can be attributed to several factors:
- Ability to recognize and assign particular languages to a piece of content.
- Automatic wildcard operators – NetBase runs wildcard operators as a default option so “craft beer” automatically pulls mentions of “craft beerS” too.
- Twitter API access and Decahose restrictions – these are subject to the vendor and Twitter’s contractual agreements.
- Crawling script criteria – some platforms look at the tweet and if the Tweet contains a link, the algorithm checks for the words featured in the URL. In a similar fashion, some platforms search the forum thread title and the post for key words, not just the forum post.
- Source of content – proprietary crawlers developed in-house, content bought from third-party providers (Twitter, LexisNexis/MoreOver, BoardReader.com, etc.) or a combination of both.
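To illustrate how two of the factors above (automatic wildcards and matching against URL text) can change volumes for the same query, here is a small Python sketch. The three posts, the function name and its two switches are invented for illustration and do not describe how any of the five platforms actually work.

```python
import re

# Hypothetical posts; none of these come from the study's data.
posts = [
    {"text": "Great craft beer at the festival", "url": ""},
    {"text": "So many craft beers to try", "url": ""},
    {"text": "Weekend reading", "url": "https://example.com/best-craft-beer-guide"},
]

def count_matches(posts, wildcard=False, include_url=False):
    # With wildcard=True the phrase also matches trailing characters ("craft beers").
    pattern = r"\bcraft beer\w*" if wildcard else r"\bcraft beer\b"
    total = 0
    for post in posts:
        haystack = post["text"].lower()
        if include_url:
            # Treat URL punctuation as word separators so slug words become searchable.
            haystack += " " + re.sub(r"[-/_.:]+", " ", post["url"].lower())
        if re.search(pattern, haystack):
            total += 1
    return total

strict = count_matches(posts)
with_wildcard = count_matches(posts, wildcard=True)
with_url = count_matches(posts, include_url=True)
both = count_matches(posts, wildcard=True, include_url=True)
```

On this toy set the same phrase yields one, two or three matches depending on the switches, which is the kind of gap that shows up as differing search volumes between platforms.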
Automated Sentiment Score:
The NLP Sentiment Scores of the five content vendors differed. CrimsonHexagon, Talkwalker and Sysomos had relatively close Positive scores (42%, 37% and 48% respectively), but the deviation was still pretty high. NetBase and YouScan were stingiest when it came to Positive scores, which was a bit surprising given the topic we picked (craft beer).
Regardless, the NLP Sentiment Scores clearly showed the various platforms utilized different NLP rules and algorithms. The differences in the search results and compositions of data sources might be one factor impacting the NLP score but on the other hand Twitter comprised over 80% of the content across all platforms, so the NLP algorithms should have been applied on identical or almost identical content.
Deep Dive into the Automated Sentiment Score:
As a next step in our NLP/Sentiment Scoring assessment, we focused on Twitter, as the combination of short content (280 characters) and English language offered a higher chance of accurate scoring. Additionally, as mentioned above, Twitter contributed on average 85% of the content on the topic.
(Chart legend: Blue: CrimsonHexagon, Red: NetBase, Green: Sysomos, Yellow: TalkWalker, Grey: YouScan)
We discovered that a particular tweet received a unanimous sentiment score from all five platforms in only 44% of instances. Furthermore, in 61% of that unanimous subset, the score assigned was Neutral.
The shares of Positive and Negative posts within the unanimous subset were 37% and 2% respectively. It must be noted that in both instances the content was loaded with sentiment trigger words and phrases like awesome, genuinely impressive, would recommend and disgusting, which did not leave much room for ambiguity and error in the score.
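The unanimity figures above come down to simple counting. The Python sketch below shows the calculation, assuming each tweet has been matched across the five platforms and carries one label per platform. The function name, the data layout and the four sample tweets are hypothetical and are not the study's data.

```python
from collections import Counter

def unanimity_report(scores):
    """scores maps a tweet id to the list of labels it received, one per platform.

    Returns the share of tweets with a unanimous label and, within that
    unanimous subset, the share of each label.
    """
    unanimous = [labels[0] for labels in scores.values()
                 if len(set(labels)) == 1]
    rate = len(unanimous) / len(scores)
    breakdown = {label: n / len(unanimous)
                 for label, n in Counter(unanimous).items()} if unanimous else {}
    return rate, breakdown

# Invented sample: two unanimous tweets, two with disagreement.
sample = {
    "t1": ["neutral"] * 5,
    "t2": ["positive"] * 5,
    "t3": ["positive", "neutral", "positive", "neutral", "neutral"],
    "t4": ["neutral", "negative", "neutral", "neutral", "positive"],
}
rate, breakdown = unanimity_report(sample)
```

The hard part in practice is not this arithmetic but matching the same tweet across platforms, which depends on each tool exposing a stable tweet identifier in its export.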
Lack of Consistency within the platforms:
While the variations in the Sentiment scores across the different content vendor platforms were somewhat expected, deviations in the automated sentiment within a single platform on almost identical content were not. Yet we discovered that all five NLP algorithms veered off and assigned different scores to distinct tweets with almost identical content and no obvious trigger words that would prompt a Positive or Negative score.
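A check of this kind can be approximated by grouping one platform's tweets by their normalized text and flagging groups that received more than one label. The Python sketch below is a simplification: it only catches tweets that become identical after stripping links, mentions and punctuation, whereas the near-identical tweets described above may differ in other ways. The function names and sample tweets are hypothetical.

```python
import re
from collections import defaultdict

def normalize(text):
    # Strip links and mentions, then drop punctuation, so near-duplicates share a key.
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return " ".join(re.findall(r"[#\w]+", text))

def inconsistent_groups(scored_tweets):
    """scored_tweets: iterable of (text, label) pairs exported from ONE platform.

    Returns the normalized texts that received more than one distinct label.
    """
    groups = defaultdict(set)
    for text, label in scored_tweets:
        groups[normalize(text)].add(label)
    return {key: labels for key, labels in groups.items() if len(labels) > 1}

# Invented sample: the first two tweets differ only in punctuation and link.
tweets = [
    ("Trying a new IPA tonight #craftbeer https://t.co/abc", "neutral"),
    ("Trying a new IPA tonight! #craftbeer https://t.co/xyz", "positive"),
    ("Disgusting beer", "negative"),
]
flagged = inconsistent_groups(tweets)
```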
Automated Neutral Sentiment
If you have been using content platforms like Netbase or Sysomos for a while, you are by now used to seeing over 70% of the searched content graded as Neutral, regardless of search terms, topic and language. In all fairness, this is not an unheard-of phenomenon even when manual sentiment coding is applied.
Upon closer examination, the Twitter Neutral data set showed that the vendors’ NLP algorithms struggled somewhat to speak “hashtag”: Tweets often don’t follow standard grammar and syntax rules, consist of seemingly unrelated or spliced words and emojis, and contain links and visual content that are not subject to the NLP sentiment analysis.
It is evident that the content vendors’ current text-only NLP analysis model has a blind spot, and this inability to take full advantage of the data points available in social media content leads to sub-par sentiment grades. Our guess and hope is that as the vendors continue to increase their image recognition capabilities and extract more contextual data from visual content (objects, scenes, actions), the combination of text and visual analytics will improve automated sentiment scoring.
However, the current one-size-fits-all approach to text analytics will continue to distort the Automated Sentiment Scores. The current AI model, with its over-reliance on dictionaries and sluggish NLP algorithms playing catch-up with modern language developments, fads and trends, is not sustainable and does not do the end customers justice. We have seen some attempts to empower the end users with Machine Learning tools, but alas these have been rather limited in scope and capabilities.
Last but not least, for the foreseeable future, the content behind a link, be it an article or an Instagram post, will likely remain out of reach for reasons that have less to do with technology and the content vendors than with social media network policy (Will Facebook allow Twitter to display Instagram photos?).
I am sure none of this is news for the experienced media analysts out there. Nor is the idea that after we sliced and diced the data for several days, we circled back to the conviction that proper social media intelligence still requires the right mix of tech and human touch to extract the relevant data, translate it through the proper zeitgeist lenses, and distill actionable insights out of it.
The proliferation of data sources, the volumes of user-generated content and the metadata that accompanies this content will continue to fuel the demand for enhanced and automated analytics features in the content platforms. This will remain the key differentiator for the content vendors.
However, when it comes to “qualitative” content analysis, the AI and machine learning vendor solutions will remain sub-par. Any progress in this field, even an incremental one, or in any related field like image and video recognition, will definitely have a positive impact on the analysts’ daily work, but for the foreseeable future the output of the algorithms will and should continue to be treated with suspicion.
We started off quoting the Forrester Wave reports and will wrap things up by agreeing with Forrester’s verdict on the best content platforms available: we can confirm that Netbase and CrimsonHexagon currently provide some of the best tools on the market. At this time, we are yet to see the end result of the Brandwatch-CrimsonHexagon merger, but we are cautiously optimistic about the outcome. Also, the rapid pace of technological change makes it a must to keep an eye on the rest of the competitors and any newcomers.
This report was first published on SILAB.com.
Martin Miliev is responsible for the operational business of the Social Media Intelligence unit at Publicis Groupe (Bulgaria). His team provide social media analytics and monitoring services to a number of Publicis Groupe agencies like MSL UK, MSL Germany, MSL Italy, Publicis Media (UK) & Digitas Pixelpark.