DATA TIMELINE: OCT 18 – OCT 27, 2019
REPORT 3.0 PUBLISHED NOV 4, 2019
BLACKBIRD.AI – INTELLIGENCE REPORT
CHINESE COORDINATION CAMPAIGNS AND THEIR SIGNATURES
The Hong Kong Anti-ELAB protests have been going on for several months, having been triggered by a March 31, 2019 bill that would allow extradition from Hong Kong to mainland China or Taiwan. In early June, protests began in earnest and progressed to repeated confrontations between protestors and the Hong Kong police.
While the standoffs continue on the ground, a parallel battle has been launched on social media for control of the ongoing narrative. The primary players have been, on one side, an Anti-ELAB faction, which styles itself as pro-democracy and pro-Hong Kong, and on the other, the Chinese Government, which insists that Hong Kong is part of China. As is frequently the case, both sides have been gaming social media to amplify their message. For instance, on August 19, Twitter itself published a blog post and accompanying data related to 936 user accounts it found to be “deliberately and specifically attempting to sow political discord in Hong Kong.”
In the course of this report we will focus on spotting coordinated Twitter campaigns by:
- Describing Blackbird.AI’s data collection/analysis pipeline.
- Identifying “fingerprints” of coordinated Twitter campaigns (likely) from Chinese state actors.
- Showing various examples of such campaigns.
- Recommending users for Twitter suspension/awareness.
- Suggesting further exploration going forward.
TOP LEVEL TAKEAWAYS
- Blackbird.AI collected 442,220 tweets surrounding the Hong Kong protests from 133,185 users between October 18 – 27, 2019.
- Using community detection and toxic language analysis, Blackbird.AI immediately identified 392 suspicious users from 33 communities.
- Restricting to apparent Chinese or Hong Kong actors (excluding K-Pop, porn and Japanese anime) Blackbird.AI flagged 302 highly suspicious users from 17 communities.
- To date, Twitter has suspended only 99 of the 302 suspicious users (roughly a third).
- A suspension rate of roughly one third suggests the flagged communities merit attention.
- Blackbird.AI will submit the remaining 203 users to Twitter and recommend suspension based on its analysis.
- The remainder of this post describes Blackbird.AI’s analysis with examples of bad actors and behavioral patterns found.
Blackbird.AI collects data using the Twitter API and runs a continuous stream that filters tweets by a predefined list of keywords. Active news stories can then be monitored by optimized keywords. Using keywords related to the Hong Kong protests, Blackbird.AI surfaced:
- 442,220 tweets and 133,185 users in 1,946 communities (highlighted below)
- Within the October 18 – 27, 2019 timeframe
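As a rough illustration of the keyword-filtering step described above, the following is a minimal sketch and not Blackbird.AI's actual collector. The keyword list, the `matches_keywords` and `filter_stream` helpers, and the tweet dictionary shape are all assumptions made for this example; the report does not publish the real keywords, and no Twitter API calls are shown.

```python
# Hypothetical keyword list; the report does not publish the actual terms used.
KEYWORDS = ["hong kong", "hongkong", "antielab", "香港"]


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if the tweet text contains any keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def filter_stream(tweets, keywords=KEYWORDS):
    """Yield only the tweets whose text matches at least one keyword."""
    for tweet in tweets:
        if matches_keywords(tweet["text"], keywords):
            yield tweet


# Made-up sample tweets, standing in for a live stream.
sample = [
    {"user": "a", "text": "Marching in #HongKong today"},
    {"user": "b", "text": "Great weather in Paris"},
    {"user": "c", "text": "香港加油"},
]
kept = list(filter_stream(sample))
```

Substring matching is the simplest possible choice here; a production filter would likely need tokenization and language-aware matching.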
From the Twitter data collected above, Blackbird.AI’s preprocessing pipeline extracts a variety of useful fields and transforms the data for network analysis. For this analysis, 2 derived fields in particular, Tweet Toxicity and User Community, turn out to be very effective at identifying coordinated campaigns. Blackbird.AI combines aspects of those 2 fields to extract candidate coordination campaigns with surprisingly high precision.
Blackbird.AI has built an in-house, normalized machine learning model (multilingual: English and Chinese) for classifying toxic language in text. It is one of several similar ‘classifiers’ used in our pipeline (others cover partisanship, emotion, etc.), and is currently the most effective at identifying bad actors.
Here are some of the most toxic tweets in the dataset according to our model, translated from Chinese using Google Translate:
Western dog media are flies. No matter what happens in the world, it is to discredit China and vilify China. It always talks about democracy and human rights. Now the truth comes out, it’s a face, and the Chinese government of China is supposed to be like China…
The vast majority of mainland Chinese people do not support those thugs…
Overseas is the opposite of the anti-community, but it is a good thing for the Chinese and the CCP. Look at what kind of freak the wheel has become overseas for 20 years. Look at the Guo ant gang and the Hong Kong egg thugs. The shameless performance, a bit of a brain Chinese will see clearly that the despicable hypocrisy and double standards of Western politicians support the turmoil and injury to Hong Kong and other countries, and the only countries in the world that can contain US hegemony are the Chinese-led China…
Toxicity scores will be a useful filter in the future when searching for possible bad actors. When we combined them with community information, we managed to surface hundreds of suspicious accounts.
Blackbird.AI identifies communities based on how users interact with each other. In this case, we focused on the Retweet Network from the dataset:
- The retweet network structures the users as nodes in a network graph, with links between a pair of users when they retweet one another (link strength is the number of retweets between them).
- In our pipeline we also compute communities for other relations including mentions, quotes, replies, and a combination of these. Our exploration found that retweets were the most valuable measure, as retweets are most indicative of affirmative support or agreement.
Roughly 11K users were assigned no community, meaning they did not interact with anyone in the dataset.
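The retweet network described above can be sketched as follows. This is a simplified illustration under stated assumptions: the report does not say which community detection algorithm the pipeline uses, so connected components stand in for it here as the crudest possible grouping (a real pipeline would more likely use a modularity-based or similar method). The function names and the `(retweeter, original_author)` input format are hypothetical.

```python
from collections import Counter, defaultdict


def build_retweet_graph(retweets):
    """Map each unordered user pair to the number of retweets between them."""
    weights = Counter()
    for retweeter, original_author in retweets:
        if retweeter != original_author:  # ignore self-retweets
            weights[frozenset((retweeter, original_author))] += 1
    return weights


def assign_communities(users, weights):
    """Label connected components; users with no links get community None.

    Connected components are only a stand-in for real community detection.
    """
    neighbors = defaultdict(set)
    for pair in weights:
        a, b = tuple(pair)
        neighbors[a].add(b)
        neighbors[b].add(a)

    community = {}
    next_id = 0
    for user in sorted(users):
        if user in community:
            continue
        if user not in neighbors:
            community[user] = None  # never interacted with anyone
            continue
        # Depth-first traversal labels everyone reachable from this user.
        community[user] = next_id
        stack = [user]
        while stack:
            current = stack.pop()
            for other in neighbors[current]:
                if other not in community:
                    community[other] = next_id
                    stack.append(other)
        next_id += 1
    return community


# Made-up retweet events: (retweeter, original_author).
retweets = [("u1", "u2"), ("u2", "u1"), ("u1", "u2"), ("u3", "u4")]
users = ["u1", "u2", "u3", "u4", "u5"]
weights = build_retweet_graph(retweets)
communities = assign_communities(users, weights)
```

Here the link between `u1` and `u2` has strength 3 (the number of retweets between them), and `u5`, who interacted with no one, is left without a community.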
Below is a sample snapshot of those communities using our in-house ‘Constellation’ tool:
Each color represents a different community in the data, and node size represents user importance. The highlighted account, @CGTNOfficial, is a Chinese media source; it appears as the very large red dot in the upper red cluster (a community full of seemingly pro-Chinese users).
Now that we have identified toxic tweets and assigned users to communities, we need one more variable to extract the highly coordinated campaigns we are searching for: community sign-up variance, which we examine later in this report.
Taking a wider view of narratives across communities with our ‘Constellation’ tool, the snapshot below shows the major manipulated hashtags across multiple communities, which are represented by the different colors. The larger the keyword, the higher the level of suspected manipulation activity around its conversations.
- The pink cluster represents the Pro-China network (AntiHKNazi, HKIS, HKRioters) using a smaller variety of hashtags to drive its narratives.
- The Pro-Protest (Anti-China) side dominates in volume in this dataset and uses a much larger variety of hashtags, seen clustered in the bottom right of the image. Its level of manipulation is slightly lower but still apparent.
- The light green cluster features narratives of support (freehongkong, standwithhongkong, fightforfreedom), as does the darker green cluster (StandwithHongKong, antichinazi, BoycottChina, antitotalitarianism).
- Clusters evoking more focused narratives against China and the Hong Kong government include the red cluster (antiELABhk, chinazism, carrielam, policebrutality, hongkongprotests, ccp) and the purple cluster (hkpolicebrutality, HKPoliceState, HKPoliceTerrorism).
- Concurrently, the ‘Yellow Vest’ protests in France appear in the orange cluster, leveraging narratives across the Hong Kong protests, as shown by the connecting lines at the centre of the image.
- QAnon makes an extensive appearance in the dataset, leveraging existing world events to push its conspiracy-fuelled narratives, focused on destabilizing world events. As seen in the orange cluster, it has primarily leveraged the ‘Yellow Vest’ protests in France, along with narratives around Brexit, the Catalonian protests and, naturally, the Hong Kong protests.
- The QAnon network includes very large volumes and the highest levels of manipulation within the dataset, mainly around its related hashtags (WWG1WGA and TheGreatAwakening) and including amplification around ‘MAGA’.
COMMUNITY SIGN-UP VARIANCE
Here are some users from a particular community in our dataset:
You might have noticed that it is unnatural for all 9 of these users to have signed up within a few hours of one another while also belonging to the same community. Specifically, 15 of the 54 users in this community signed up on the same day, October 18. In probability theory, the ‘Birthday Paradox’ states that with 23 or more people in a room, it is more likely than not that 2 of them will share a birthday. A single pair of matching dates could therefore easily be coincidence; it is highly unlikely, however, that 15 of the 54 members of the same community (almost a third of the so-called “random” users) would share the same account creation date by chance.
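To make the comparison concrete, here is a small worked calculation. It is a sketch under a loudly stated assumption: sign-up dates are treated as uniformly spread over a 365-day window, which is not something the report states (real account creation dates span many years, which would make a same-day burst even less likely). The function names are our own.

```python
from math import comb


def birthday_collision_probability(people, days=365):
    """Probability that at least two of `people` share a day (uniform days)."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def same_day_tail_probability(members, at_least, days):
    """P(at least `at_least` of `members` land on one fixed day), uniform days.

    This is the upper tail of a Binomial(members, 1/days) distribution.
    """
    p = 1.0 / days
    return sum(
        comb(members, k) * p**k * (1 - p) ** (members - k)
        for k in range(at_least, members + 1)
    )


# Classic birthday paradox: just over one half for 23 people.
p_birthday = birthday_collision_probability(23)

# Assumed 365-day window: chance that 15 or more of 54 accounts share one day.
p_burst = same_day_tail_probability(54, 15, days=365)
```

Under these assumptions `p_birthday` is slightly above one half, while `p_burst` is vanishingly small, which is the contrast the paragraph above draws.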
Further, from the graph observe the sign-up volume by day for this community:
These are clear traits of an inorganic distribution, and they are the basis for our final insight, which allows us to cull the large starting dataset of 133K users down to a few hundred users for examination (392, to be exact). Specifically, we use Community Sign-up Variance: the variance of the user sign-up dates within a community. Communities whose users all signed up nearly simultaneously are likely to be coordinating, so communities with a low Community Sign-up Variance should be inspected.
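Community Sign-up Variance as defined above can be computed along the following lines. This is a minimal sketch: the record layout (`community`, `created_at` as an ISO timestamp), the choice of days as the time unit, and the use of population variance are assumptions for illustration, not details given in the report.

```python
from collections import defaultdict
from datetime import datetime
from statistics import pvariance

EPOCH = datetime(1970, 1, 1)


def signup_variance_by_community(users):
    """Return {community: variance of sign-up time, in days squared}."""
    signup_days = defaultdict(list)
    for user in users:
        created = datetime.fromisoformat(user["created_at"])
        days_since_epoch = (created - EPOCH).total_seconds() / 86400
        signup_days[user["community"]].append(days_since_epoch)
    return {c: pvariance(days) for c, days in signup_days.items()}


# Made-up accounts: one community created within hours, one spread over years.
accounts = [
    {"community": "burst", "created_at": "2019-10-18T10:00:00"},
    {"community": "burst", "created_at": "2019-10-18T11:00:00"},
    {"community": "burst", "created_at": "2019-10-18T12:00:00"},
    {"community": "organic", "created_at": "2012-03-01T09:30:00"},
    {"community": "organic", "created_at": "2015-07-15T18:00:00"},
    {"community": "organic", "created_at": "2018-11-30T23:45:00"},
]
variances = signup_variance_by_community(accounts)
```

The "burst" community's variance is a tiny fraction of a day squared, while the "organic" one is many orders of magnitude larger, so sorting communities by this value surfaces the near-simultaneous sign-ups first.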
PUTTING IT ALL TOGETHER: RETRIEVING CANDIDATES
Having explained the fields used in our analysis, we now show how they are applied together.
Below is a graph that immediately suggests the value in community sign-up variance:
This plot shows all communities with toxic tweets, with the community sign-up variance plotted against community size (the smaller communities will more likely have low sign-up variance). The communities we have targeted as most likely coordinated are highlighted in red, and the remainder in blue. The node size is a representation of community toxicity.
The key finding is that the two groups of points are almost cleanly separable, which is highly inorganic. A single curve (a logarithmic function of community size) can be drawn to separate the reds from the blues, meaning that, after a pre-filter on toxicity, we can distinguish the 2 community classes clearly. By querying for retweet communities with (1) toxic tweets and (2) low average yearly sign-up variance relative to their size, we find likely candidates for coordinated campaign users.
This is a community-driven approach to finding inorganic users. Our further manual examination of the 392 candidates led us to conclude that 302 of them are part of coordinated pro-Chinese communities, a hit rate that would be extremely unlikely by chance.
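The two-part query described above might look roughly like this. It is an illustrative sketch only: the report does not give the actual separating curve, toxicity cutoff, or units, so the `slope`, `intercept`, and `min_toxicity` values, the field names, and the sample communities below are all hypothetical.

```python
from math import log


def variance_threshold(size, slope=1.0, intercept=0.0):
    """Hypothetical logarithmic boundary: allowed variance grows with log(size)."""
    return slope * log(size) + intercept


def candidate_communities(communities, min_toxicity=0.8, slope=1.0, intercept=0.0):
    """Return ids of communities that pass both filters.

    (1) Pre-filter: the community contains toxic tweets.
    (2) Its sign-up variance falls below the size-dependent log curve.
    """
    flagged = []
    for c in communities:
        if c["size"] < 2:
            continue  # variance is meaningless for a single user
        if c["max_toxicity"] < min_toxicity:
            continue
        if c["signup_variance"] < variance_threshold(c["size"], slope, intercept):
            flagged.append(c["id"])
    return flagged


# Made-up communities; variance units are arbitrary for this illustration.
sample_communities = [
    {"id": "c1", "size": 54, "max_toxicity": 0.95, "signup_variance": 0.3},
    {"id": "c2", "size": 500, "max_toxicity": 0.90, "signup_variance": 9.5},
    {"id": "c3", "size": 40, "max_toxicity": 0.10, "signup_variance": 0.2},
]
flagged = candidate_communities(sample_communities)
```

In this toy data only `c1` is flagged: `c2` is toxic but its sign-ups are too spread out for its size, and `c3` has a sign-up burst but no toxic tweets. In practice the curve parameters would have to be fitted to labeled communities rather than chosen by hand.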
In this section we show some anecdotal evidence for the illegitimacy of the various users we have found.
COMMUNITY 1261: Here are some images from a community of 9 users that almost exclusively retweet one another and relentlessly attack 3 users: @IKusGtdqayjgqS1 (suspended), @anksa19 (suspended), and @Guzhuohenghk.
Points to note:
- Accounts are heavily Pro-China.
- Users have almost exclusively 0 followers and 0 follows.
- Users post many duplicate images.
COMMUNITY 1354: For our earlier example community of 54 users, below is a list of the usernames and sign-up times.
Points to note:
- Near-duplicate user names. A common theme within these communities.
- Near-duplicate sign-up time.
COMMUNITY 1844: Here is an interesting community of 33 users that presents itself as American.
Points to note:
- Profiles are exclusively random “American” women.
- Users often have 0 followers and 0 follows.
- Users often have a random American town explicitly listed.
- Bursts of sign-ups.
- Tweets are almost all Chinese remarks about Hong Kong, despite profiles set up in English.
COMMUNITY 1897 and 10800: Here are 2 communities of 25 and 22 users that consistently use the same or very similar-looking models in their pictures.
Points to note:
- Same or similar model is used repeatedly.
- Minimal followers/following.
- Bursts of sign-ups.
- Plenty of content around Hong Kong.
The sample results shown demonstrate the different types of “leads” that we found in manually examining the 392 candidate users. We noted examples of these inorganic behaviors repeatedly throughout the entire set of users, flagging 302 of them as a result. Most of the users we did not flag did in fact meet the criteria of suspected coordination, but they were often focused on a different subject matter (mainly K-Pop, but also porn, among other things). When we limited our search to Chinese-language communities, almost every user examined was marked as a threat actor.
CONCLUSIONS AND CAVEATS
By looking for communities that show bursts in sign-up activity and use toxic language, we can uncover a large volume of coordinated Twitter users and campaigns with very high precision. Given the current state of Chinese social media influence operations, our findings offer a type of digital “fingerprint” for identifying Chinese Government-backed Twitter campaigns.
That said, both parties are responsible for creating targeted campaigns against one another. While the findings are intriguing, we can only assume, not prove, that the underlying source of the activity described in this report is a government operation. We can certainly conclude that these communities of users exhibit similar behavioral patterns indicative of coordinated, toxic, and possibly synthetic campaigns. However, without additional information, in the data or elsewhere, we should not draw a firm conclusion as to the ultimate source. Still, we can make a few additional points:
- As Twitter itself pointed out, Twitter is banned in China. While VPN users in mainland China exist, the likelihood that they are the ones pushing mass amounts of pro-China propaganda is low.
- The tweets often use phrases specific to common Chinese Government language, for instance frequent references to the “sinister” “thugs” “rioting” (protestors) or “Western dogs” (usually the United States CIA) interfering in Chinese (Hong Kong) affairs.
- The users promote a large number of Chinese state-run media organizations on Twitter, with some users doing so almost exclusively.
At Blackbird.AI, we consistently test a wide range of methodologies in order to surface the activities of threat actors. In this report, our goal was to highlight the unique fingerprint signatures of these specific coordinated campaigns without relying on our bot detection techniques, since many of the most damaging coordinated campaigns (Russian election interference, for example) have been driven primarily by real humans sitting at keyboards. Additionally, we perform analysis of language usage, further derived field extraction, network analysis, visualization, insight generation, and more as we tackle the problem of disinformation from many different angles.