In this episode, we will take a look at a disinformation campaign on a Polish social media platform, wykop.pl. It is very similar to the widely known Reddit, but instead of subreddits it has tags and a microblog. I will present techniques for gathering information about users, upvotes and downvotes, and content. In addition, everything will be presented in a clear way, with charts and a relationship graph of users.
In case you missed the last episode about deobfuscation, source code analysis and taking down a CP campaign, you can still read it here
OSINT & DISINFORMATION
We all know that social media platforms are not only for sharing photos and memories with friends but also for the quick exchange of links and information, which can be used to manipulate public opinion. There are many past examples of this behavior, but the biggest one, Russian interference in the 2016 United States elections, was described in detail in the Mueller report, and some of the propaganda methods used by Russian agents were uncovered. Their main tactics were to buy Facebook advertisements for their candidate, create groups associated with supporters of one political side, and share disinformation: building websites with favorable content and hiding uncomfortable information that did not fit their narrative.
The majority of the tasks were carried out by a bot network, and the main social media platforms used were Facebook and Twitter.
Due to the upcoming election in Poland, I decided to analyze the biggest Polish information-sharing platform, wykop.pl. Wykop is continuously accused of poor moderation and of allowing vote manipulation and the sharing of fake news. According to users, some content on the main site and Mikroblog is censored and deleted by moderators because of political correctness, even though it gets many upvotes at the same time.
I took a look at content from the tags #wybory (#election, from December 2019) and #polityka (#politics, from April 2020), and at the users who gave the most downvotes in these tags, together with the accounts they follow. The data will be presented in a readable way; I was looking for anomalies that might point to vote manipulation and potentially to the use of a bot network that blocks content from reaching the main page.
Wykop has an API (https://www.wykop.pl/dla-programistow/api/), however access to it is quite difficult, which is why I decided to scrape all of the necessary information manually.
At the beginning, I needed articles from previous weeks and even months. While browsing the website with Burp Suite, I saw an AJAX request sent to
which returns the last 25 articles in HTML format, which can then be easily parsed with Python and Beautiful Soup. To paginate over the results, one needs to change "link-[ID]" to the ID of the last article. The hash parameter is just for decoration.
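The pagination described above amounts to a cursor loop: request a page, remember the ID of its last article, and pass that ID in the next request. Below is a minimal sketch of that loop only. The `fetch_page` callable and the `(article_id, timestamp)` record shape are assumptions made for illustration; in practice `fetch_page` would send the AJAX request and parse the returned HTML, for example with Beautiful Soup.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical record shape: (article ID, timestamp string).
Article = Tuple[int, str]


def iter_articles(
    fetch_page: Callable[[int], List[Article]],
    start_id: int,
    max_pages: int = 100,
) -> Iterator[Article]:
    """Walk backwards through a tag feed, using the last seen ID as the cursor."""
    cursor = start_id
    for _ in range(max_pages):
        page = fetch_page(cursor)  # assumed to return up to 25 older articles
        if not page:
            break  # nothing older than the cursor is left
        yield from page
        last_id = page[-1][0]
        if last_id == cursor:
            break  # no progress, avoid looping forever
        cursor = last_id
```

Keeping the HTTP and parsing details behind `fetch_page` makes the loop easy to test with a fake data source, and `max_pages` acts as a safety limit so a misbehaving endpoint cannot keep the scraper running indefinitely.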
Having this data (timestamp and ID) in our Elasticsearch database, we need to find all users who downvoted each of the previously scraped articles. We can get this data by accessing the following endpoint
where ID is the ID of the article.
So now the database has been populated with articles from politics-related tags and all users who downvoted these articles.
The last step is to create a relationship graph between users; to achieve this, I needed to scrape the accounts followed by every user in the database. In this case too, access to the data is not protected against scraping in any way.
https://www.wykop.pl/ludzie/followers/USERNAME/ - followed users
https://www.wykop.pl/ludzie/followed/USERNAME - following users
After getting all the data, we can start visualizing it and looking for patterns or anomalies.
Visualization & interpretation of the data
As mentioned earlier, the first step was to compare the number of articles added on particular days with the number of downvotes on the same day.
The chart below presents this relationship.
It is reasonable to assume that more articles mean more downvotes, and that this pattern should be visible in any other tag as well. However, if there are only a few articles but a large number of downvotes, it suggests that some of the articles were heavily downvoted to keep them from getting more views.
One example is the 18th of November, when only one article was added and it got 127 downvotes (637 upvotes); on other days the proportions were more balanced.
A similar situation took place on the 16th of December: the only article that day got 158 downvotes (235 upvotes), so it had no chance of trending.
If you take a closer look at the 18th of January, you can see that 4 articles were added, with 166 downvotes in total. Having all the data in the database, we can quickly check which articles got the most downvotes.
It has 97 downvotes, making it the most downvoted article of that day.
Others posted on the same day got fewer downvotes:
So this is how we can find the most downvoted content on specific days. Next, each article needs to be examined to check whether its votes were manipulated.
The #polityka (#politics) tag is used often because of its versatility, so many more articles are added each day. The chart below shows the relationship between the number of downvotes and the number of articles per day in the tag #polityka.
If we compare the first days of April, we can see the following pattern:
04.04 - 83 articles, 854 downvotes,
05.04 - 82 articles, 1640 downvotes,
06.04 - 141 articles, 1560 downvotes
What happened on the 5th of April? The number of articles and downvotes does not match the pattern of the previous or the next day. It means that one or several articles must have been heavily downvoted, so we need to dig deeper into this day and check the number of downvotes for each article.
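The mismatch becomes obvious once the daily totals are turned into downvotes per article. The sketch below uses the three days listed above; the rule of flagging any day whose ratio exceeds 1.5 times the median is an assumption chosen for illustration, not a threshold used in the original analysis.

```python
from statistics import median

# (articles, downvotes) per day, taken from the figures listed above.
daily = {
    "04.04": (83, 854),
    "05.04": (82, 1640),
    "06.04": (141, 1560),
}


def downvotes_per_article(stats):
    """Return the average number of downvotes per article for each day."""
    return {day: downvotes / articles for day, (articles, downvotes) in stats.items()}


def flag_outliers(ratios, factor=1.5):
    """Flag days whose ratio exceeds `factor` times the median ratio (assumed rule)."""
    baseline = median(ratios.values())
    return [day for day, ratio in ratios.items() if ratio > factor * baseline]


ratios = downvotes_per_article(daily)
for day, ratio in ratios.items():
    print(f"{day}: {ratio:.1f} downvotes per article")
print("Days worth a closer look:", flag_outliers(ratios))
```

The 4th and 6th of April come out at roughly 10 and 11 downvotes per article, while the 5th of April reaches 20, which is why it is the only day flagged here.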
The chart shows that some articles stand out:
https://www.wykop.pl/link/5432323/prezydent-czech-mowi-o-porazce-ue-i-ke-ws-koronawirusa/ 156 downvotes (240 upvotes)
https://www.wykop.pl/link/5432507/co-probuje-nam-zaaplikowac-wielki-brat/ 134 downvotes
https://www.wykop.pl/link/5432043/kaczynski-nie-chce-stanu-wyjatkowego-bo-przestalby-byc-naczelnikiem/ 102 downvotes
https://www.wykop.pl/link/5432023/zaraz-bedziemy-mieli-wiecej-samobojstw-i-bankructw-z-powodu-epidemii-niz-wirusa/ 88 downvotes
https://www.wykop.pl/link/5431519/bosak-rozsadnikiem-koronawirusa-w-polsce-stala-sie-ochrona-zdrowia/ 116 downvotes
https://www.wykop.pl/link/5431513/zmarla-wdowa-po-premierze-janie-olszewskim/ 99 downvotes
https://www.wykop.pl/link/5430873/teraz-juz-rozumiecie-dlaczego-polska-przez-tyle-lat-byla-pod-zaborami/ 118 downvotes
The total number of downvotes on this day was 1640. The 7 articles above got 813 downvotes in total, just under half of that number.
Looking at the 15th of April, we can see that 107 articles were added with 1535 downvotes in total, and only 8 of these 107 articles got more than half of the downvotes: 900.
https://www.wykop.pl/link/5449905/kaczynski-nie-dorosl-do-demokracji-i-nie-radzi-sobie-ze-skutkami-epidemii/ 177 downvotes
https://www.wykop.pl/link/5450175/26-02-2020-lukasz-szumowski-ujawnil-ze-byl-w-tym-roku-na-nartach-we-wloszech/ 143 downvotes
https://www.wykop.pl/link/5450719/pis-popiera-zakaz-edukacji-seksualnej-i-kary-wiezienia-za-nia/ 66 downvotes
https://www.wykop.pl/link/5450813/grzegorz-braun-pokazuje-swoje-prawdziwe-oblicze/ 240 downvotes
https://www.wykop.pl/link/5451133/mati-sobie-harrego-pottera-zatrudnil-w-mf-szykujcie-sie-na-magie-portfelach/ 77 downvotes
https://www.wykop.pl/link/5450911/piec-wielkich-klamstw-jaroslawa-kaczynskiego/ 63 downvotes
https://www.wykop.pl/link/5450639/ustawa-stop-447-w-sejmie-robert-bakiewicz/ 61 downvotes
https://www.wykop.pl/link/5449009/krakow-baner-na-wiadukcie-wybory-to-zbrodnia/ 73 downvotes
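How concentrated the downvotes were on both days can be checked with a little arithmetic. The sketch below sums the per-article counts listed above and divides them by the daily totals; the helper name `top_share` is only illustrative.

```python
def top_share(top_downvotes, day_total):
    """Fraction of a day's downvotes that landed on the listed articles."""
    return sum(top_downvotes) / day_total


# Per-article downvotes as listed above for each day.
april_5 = [156, 134, 102, 88, 116, 99, 118]
april_15 = [177, 143, 66, 240, 77, 63, 61, 73]

shares = {
    "05.04": top_share(april_5, 1640),
    "15.04": top_share(april_15, 1535),
}

for day, share in shares.items():
    print(f"{day}: {share:.1%} of all downvotes")
# 05.04: 49.6% of all downvotes
# 15.04: 58.6% of all downvotes
```

In other words, 7 of 82 articles collected about half of the downvotes on the 5th of April, and 8 of 107 articles collected well over half on the 15th of April.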
Some of the articles might overlap because they carry both tags, #polityka (politics) and #wybory (election).
Looking for a network
Users are at the center of every social media platform; they decide what kind of content goes trending and gets attention. Wykop works the same way: users vote on each article, and all upvotes are visible in the user's profile. Downvotes are different: they are hidden in the profile and cannot be accessed there. However, if you read the previous chapter carefully, you could see that we can build a network from the data already stored in the database.
First, we need to identify the users who gave the most downvotes.
The chart shows that some users were more active in downvoting than others. From all of the users, I chose the ones most engaged in downvoting, checked their followers and finally created a relationship graph.
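Ranking users by downvote activity is a simple counting problem once the downvoters of each article are known. The sketch below assumes the scraped data has been loaded into a dictionary that maps an article ID to the list of usernames who downvoted it; that structure and the usernames in the example are hypothetical.

```python
from collections import Counter


def top_downvoters(downvoters_by_article, n=10):
    """Return the `n` users with the most downvoted articles as (user, count) pairs."""
    counts = Counter()
    for users in downvoters_by_article.values():
        # A user is counted once per article, even if scraped twice.
        counts.update(set(users))
    return counts.most_common(n)


# Hypothetical sample: article ID -> usernames that downvoted it.
sample = {
    101: ["user_a", "user_b", "user_c"],
    102: ["user_a", "user_b"],
    103: ["user_a"],
}

print(top_downvoters(sample, n=2))
```

The users at the top of this ranking are the ones whose followed accounts are worth scraping for the relationship graph.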
Having this kind of graphical representation, we can start to look for similarities, in this case accounts that connect users. The same modus operandi is used by bot networks across all social media, especially Twitter. All of the accounts follow their "mother account", which works like a command and control server, sending commands to execute.
The graph has been colored by modularity, showing how it splits into communities or clusters. Each user has its own group of connections, and only a couple of second-level connections are visible.
The account marked here is 'wykop', the official account of the administration. This account is followed by almost all users, so it is not unusual that it connects to many of our accounts. Visualizing the data in this way helps a lot when working with social connection data, and it is often used to track bot networks.
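The same search for connecting accounts can also be done without a graph tool, by counting how many of the selected users follow each account. The sketch below is one possible approach, not the method used to produce the graph: the input dictionary, the usernames, and the `min_share` threshold are assumptions, and the official 'wykop' account is ignored because almost everyone follows it.

```python
from collections import Counter


def shared_followed(followed_by_user, min_share=0.5, ignore=("wykop",)):
    """Return accounts followed by at least `min_share` of the given users.

    The result is a list of (account, follower_count) pairs, most shared first.
    """
    ignored = set(ignore)
    counts = Counter()
    for followed in followed_by_user.values():
        counts.update(set(followed) - ignored)
    threshold = min_share * len(followed_by_user)
    hubs = [(account, c) for account, c in counts.items() if c >= threshold]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Hypothetical sample: heavy downvoter -> accounts they follow.
followed = {
    "user_a": ["wykop", "hub_account", "x"],
    "user_b": ["wykop", "hub_account"],
    "user_c": ["wykop", "hub_account", "y"],
    "user_d": ["wykop", "z"],
}

print(shared_followed(followed, min_share=0.75))
```

An account that shows up here is only a lead: a popular legitimate account will be shared by many users too, so each hit still needs manual review.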
From raw data scraped directly from the site, we arrived at charts comparing downvotes and articles over a specific time period, and at a graph representing relationships between the "suspicious" users extracted in the previous part.
Of course, this is only one part of a whole disinformation campaign; other things worth mentioning are aggressive or off-topic comments and their votes, and broadly defined fake news. Moreover, accounts that have been compromised through malware or weak passwords can also take part in a campaign. Based on activity alone, we cannot state whether an account participates in a campaign or not, but the view from the charts gives insight into all accounts that were most active in specific tags.
Nowadays, there are plenty of misinformation campaigns across all social media platforms, and you should be prepared for actors abusing social media to spread their own narrative and propaganda.