In this episode, we will take a look at obfuscated JavaScript code that has been actively used in a CP campaign since at least 2018, and at the whole distribution process, which is also obfuscated by posing as legitimate files.

As a source code analysis example, I will present a very brief analysis of known coin miner malware that targets Redis servers.

If you missed s01e01 about preparation and BlueKeep monitoring, you can still read it here

Last Saturday, the Missing Persons CTF by Trace Labs took place, and our team of 3 people finished in 10th place. It's a very decent result considering it was our first CTF of this kind. I really recommend that everyone participate; it is a lot of fun.

We were so proud to see our global community come together for a 6 hour #MissingCTF and use the power of #OSINTForGood to generate all of these important leads! The TL team is now working to generate valuable intelligence reports for Law Enforcement using these #OSINT findings! pic.twitter.com/sIE1X53vZK
— Trace Labs (@TraceLabs) April 12, 2020

DEOBFUSCATION & OSINT

Reading and understanding code is part of the intelligence gathering process, yet no OSINT course covers deobfuscation and source code analysis, so we will go a little deeper. From an offensive point of view, it is priceless to possess the source code of adversary tools and to know how they work and what they are capable of. Blue teams also read and reverse malware code to report on possible damage and to find the threat actor behind a campaign. That’s why source code analysis and reversing skills are needed to provide a complete intelligence report.

Obfuscation - the action of making something obscure, unclear, or unintelligible

In almost every technical investigation, you will come across different types of code obfuscation or encryption; that’s how threat actors hide their payload or the whole infection/distribution process. These techniques are used to slow researchers down, prevent hijacking and hide from law enforcement or antivirus companies. In the malware world, encryption, packing and obfuscation are widely used to make a malicious file or payload undetectable by anti-malware engines.

Some obfuscation methods are easily reversible and based on string encoding and variable substitution; however, when you have hundreds of lines of code it’s easy to get lost.

Red tip #300: Renaming Mimikatz to Mimidogz will bypass China common security products such as 360. :)
— Vincent Yiu (@vysecurity) February 23, 2018

way of obfuscation ;)

Malwarebytes published a good example of deobfuscating a complex VBScript

https://blog.malwarebytes.com/cybercrime/2016/02/de-obfuscating-malicious-vbscripts/

In this kind of task, patience is your friend. Some write-ups might seem simple to follow, but the author spent a lot of time deobfuscating and understanding the code. In the next chapter we will go into the CP distribution network and see how one obfuscated JavaScript file took part in this malicious activity.

Deep dive into CP network distribution

I'm aware that this is not a widely discussed topic and only a few people go into this world, but I want to share my findings and one method by which this network can be managed. To understand the roots of this investigation we need to go back in time to September 2018 and my tool “Danger Zone”.

From one paste (listed in the article) I gathered an email address and a couple of domains. It was enough to find more and go down the rabbit hole of the CP network. Since then I have been monitoring some websites and people. The majority of the forums are already down, but one distribution method survived and I want to describe how it operates.

From the distributor's point of view, any exchange of illegal material should provide:

Anonymity
Reaching as many people as possible
Being hidden from third parties
Lasting as long as possible
Being temporary

You might notice that the last two points are mutually exclusive. It depends on the type of material: if it’s a Pastebin link to the actual data that you want to share with someone, it should be temporary, treated as a private conversation and deleted afterwards. On the other hand, if one distributes material on a large scale, he has to care about the lifetime of the links and ensure their reliability.

You can read a great article about the investigation of one Russian CP hosting website

https://sijmen.ruwhof.net/weblog/1782-massive-child-porn-site-is-hiding-in-plain-sight-and-the-owners-behind-it

So, back to our main topic: the initial vector is [REDACTED]. This campaign has been running for a couple of years already and hundreds of new links are posted every day. It’s easy to find with Google if you know the keywords used in the posts, the right dork or the email addresses associated with the posts.

All the links are shortened by different URL shortener services like bit.ly, gg.gg, sflk.in, qps.ru and similar ones, some of them widely known in the community. It’s worth mentioning that these services allow users to get paid per click, so it’s one of their “revenue” streams as well.

This was posted on the 23rd of March 2019 and, as of today, 4108 pages exist. Around 1100 pages are created per year, which is about 3 pages per day, and each page contains 10 posts. You can easily calculate that around 30 posts are shared per day, each with a couple of different links, on this one board alone.
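The back-of-the-envelope numbers above can be reproduced in a few lines of Python. This is only a sketch using the rounded figures quoted in this article (1100 pages per year, 10 posts per page and, as an average, 3 links per post); the variable names are mine.

```python
# Rounded figures quoted in the article; treat them as estimates.
pages_per_year = 1100
posts_per_page = 10
links_per_post = 3  # average number of links per post

pages_per_day = pages_per_year / 365
posts_per_day = round(pages_per_day) * posts_per_page
links_per_day = posts_per_day * links_per_post

print(f"{pages_per_day:.2f} pages per day")
print(f"{posts_per_day} posts per day")
print(f"{links_per_day} links per day")
```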

The links redirect you to Pastebin pastes, then again to bit.ly. It is really confusing to gather and extract all the links, but it’s really easy to find your way through them and download the final stage material.

Bit.ly links redirect to different cloud providers that deliver the first stage, i.e. an HTML file.

What about the reach, you may ask? Some of the links were not properly secured, i.e. you could look up the number of clicks by adding a "+" sign at the end of a bit.ly link. As you remember, 30 posts per day are uploaded with 3 links on average, so 90 links per day. Some of them, of course, repeat, but it is still a lot. Also, links have been taken down over the years, which makes the investigation harder.

The click counts go into the thousands per month, and that is for only three links, while around 90 links are posted per day. It’s quite profitable to add many redirections and earn money from clicks and from cloud services that also pay for each file downloaded by other users.

Let’s jump to the file downloaded from dl4free.com (redirection from bit.ly).

The file is a zip archive with a password that was shared in one of the pastes. After unpacking we don’t have any videos yet, only one HTML file. Of course, a smart investigator won’t immediately open the file but will examine its content in a text editor instead. I use Sublime Text; it has all the functions we need, like syntax highlighting, split view and find and replace.

alt + shift + 2 - split the Sublime Text window into two panes

ctrl + H - find & replace

The HTML file weighs 5 KB and contains a little bit of CSS, a link to JavaScript code (sflk.in) (line 8) and a large base64 encoded image (displayed on the right side of the screenshot), which is the spinner shown during loading.
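If you prefer not to eyeball the file, the external references can also be pulled out statically, without rendering anything. Below is a minimal sketch using Python's standard html.parser; the sample markup and the shortener.example URL are placeholders of mine, not the real file.

```python
from html.parser import HTMLParser


class ExternalRefParser(HTMLParser):
    """Collect src/href attribute values without executing any content."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Skip inline data: URIs such as an embedded base64 image.
                if value.startswith("data:"):
                    continue
                self.refs.append((tag, name, value))


def extract_refs(html_text):
    """Return (tag, attribute, url) tuples for every external reference."""
    parser = ExternalRefParser()
    parser.feed(html_text)
    return parser.refs


# Placeholder markup shaped like the file described above.
sample = """<html><head>
<script src="http://shortener.example/abc"></script>
</head><body><img src="data:image/gif;base64,R0lGODlh"></body></html>"""

for tag, attr, url in extract_refs(sample):
    print(tag, attr, url)
```

Running the same function over several delivered HTML files makes it easy to see whether they all point to the same script.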

Sflk.in is another URL shortener service and in this case it redirects to pastebin.com/[REDACTED], where the actual JavaScript code is hosted. This is where the technical aspect of deobfuscation comes into play. At first sight it looks quite messy, but that was the author’s intention: to make it hard to draw conclusions at a glance.

Obfuscated javascript code (Censored strings)

The first thing I always do is clean up the code, i.e. add line breaks and indentation and generally make it look nice. There are plenty of online tools that can do that for you, but I want to show you step by step how this process is done. After beautifying the code, it’s recommended to rename variables to something more human readable. In the script, we see a large array “_0x764d” containing encoded strings. To make it clearer we rename it to “main_array” or a similar name that tells what data it holds. This is how the code looks after cleaning and renaming variables.

The next step is to decode the strings and replace each array reference with its actual value. The strings are hex encoded, which means we don’t need any special tools besides Python: it automatically prints the ASCII representation of the encoded values. Just paste the "main_array" into Python and print it in the usual way

# main_array is the list pasted from the script; Python decodes the \x escapes itself
for value in main_array:
	print(value)

Returned strings:

innerHTML
head

createElement
script
src
link
rel
stylesheet
href
appendChild
Added!
DOMContentLoaded
html
body
<h1 class="animated infinite bounce container" id="Loading" style="font-size:3em;">Please wait for a while...</h1>
append
ready
log
reload
location
http://sflk.in/[REDACTED]
then
http://sflk.in//[REDACTED]
http://sflk.in//[REDACTED]
http://sflk.in//[REDACTED]
http://sflk.in//[REDACTED]
https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js
https://cdnjs.cloudflare.com/ajax/libs/vue/2.5.16/vue.min.js
https://cdnjs.cloudflare.com/ajax/libs/animate.css/3.5.2/animate.min.css
all
addEventListener
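When the script is too long to paste by hand, the same decoding can be done on the raw text. The sketch below assumes the array is written with double-quoted \xNN escapes; the two-element js_source line is a made-up excerpt of mine, not the real script.

```python
import re

# Hypothetical excerpt of an obfuscated script, kept as raw text.
js_source = r'var _0x764d = ["\x69\x6E\x6E\x65\x72\x48\x54\x4D\x4C", "\x68\x65\x61\x64"];'


def decode_hex_escapes(literal):
    """Turn every \\xNN escape sequence into the character it encodes."""
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda match: chr(int(match.group(1), 16)),
        literal,
    )


# Grab every double-quoted string literal from the array definition.
literals = re.findall(r'"((?:[^"\\]|\\.)*)"', js_source)
main_array = [decode_hex_escapes(item) for item in literals]

for index, value in enumerate(main_array):
    print(index, value)
```

Printing the index next to each value helps later, when array references have to be mapped back to strings.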

Besides the usual things like jQuery, animation CSS and JavaScript related functions, we found 5 external links to sflk.in that probably serve other scripts, but we still need to deobfuscate the rest of the first one.

I recommend, especially for smaller scripts, replacing the references with strings manually. It gives insight and lets you get acquainted with the code in different formats. If you would rather automate it, read the whole code as a string, write a regex to extract each “main_array” reference together with its index, and then substitute it with the proper value from the decoded strings array.
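The regex-based substitution can be sketched as follows. The code line and the shortened array are illustrative assumptions of mine (the real indexes differ), and the sketch only handles plain decimal indexes such as main_array[3].

```python
import json
import re

# Shortened, illustrative version of the decoded strings array.
main_array = ["innerHTML", "head", "createElement", "script"]

# Hypothetical beautified line that still uses array references.
code = "var tag = document[main_array[2]](main_array[3]);"


def inline_strings(source, array_name, values):
    """Replace array_name[N] references with the quoted string they point to."""
    pattern = re.compile(re.escape(array_name) + r"\[(\d+)\]")

    def replace(match):
        index = int(match.group(1))
        if index >= len(values):
            return match.group(0)  # leave unknown indexes untouched
        return json.dumps(values[index])  # quote and escape the string

    return pattern.sub(replace, source)


print(inline_strings(code, "main_array", main_array))
```

For the sample line this prints `var tag = document["createElement"]("script");`, which is already much easier to read.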

We are left with not so complex code: first it creates the document and displays the message “Please wait for a while...” until the rest of the scripts are loaded. The “addTag” function, as the name implies, adds tags to the HTML based on the script or link (lines 4, 7). This method allows stealthy distribution of sflk.in links: they are obfuscated, used once and redirect only to a particular private paste.

Going deeper, we need to investigate the other external links, which turn out to redirect to more code hosted on Pastebin. This code refers to the variables from lines 47-50 in the main JavaScript code

The Dataname variable is a long array of dictionaries with the keys “rname” and “fname”. The second key holds titles of different books (really), and “rname” is the base64 encoded representation of the… “fname”? No, the value of “rname” is the base64 encoded real video name, so the book “Ash Lo Novel” is paired with a base64 encoded key that decodes to an explicit video file name ([REDACTED].mp4, with a lot of Dutch references). At a later stage the “rname” value is decoded and presented in the browser, and “fname” is used as a reference to the link from the variable below.

Basically, they are links to the last stage payload, i.e. videos. They pose as different kinds of books so they look innocent; however, you can see the .rar extension at the end, which points to the true type of the file. The files are password protected, so it's hard to prove that these links are actually malicious and take part in child abuse.

The last edit time indicates that the campaign is still running and new links appear on a weekly/monthly basis. So far it has been viewed by 163k people since August 15th, 2018, which gives around 270 views daily.
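The daily average is easy to sanity-check. The sketch below takes the two rounded figures from the paragraph above and derives how many days the paste must have been online for them to be consistent; the variable names are mine.

```python
from datetime import date, timedelta

first_seen = date(2018, 8, 15)
total_views = 163_000   # "163k", rounded
views_per_day = 270     # rounded daily average

# Number of days online implied by the two figures.
days_online = round(total_views / views_per_day)
implied_date = first_seen + timedelta(days=days_online)

print(days_online, "days online, counted up to", implied_date.isoformat())
```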

So far we have identified 4 variables in 4 different external JavaScript files (Video_1, Video_2, Dataname_1, Dataname_2) that are responsible for keeping links to the videos and showing the real titles of the videos to end users.

The last file includes many scripts related to source mapping, Vue etc.; however, a small piece of code that creates the links and shows the output to users was also hidden there. After basic deobfuscation I was able to see the following code, from the last of the sflk.in links, which is also hosted on Pastebin.

We found another katfile.com link that redirects to a registration page for premium CP content. It looks exactly like the default sign-up page on katfile.com but is served on a completely different path.

The last function, which creates the actual links, is called “makeFnN” (line 31), and the big variables “Video_1” and “Video_2”, with the actual links to katfile servers, are passed to it.

This code is also not hard to understand. First it takes an HTTP link (from Videos) and strips all URL related parts (extension, URI) (lines 8-9) so that only the filename (book title) is left. Next, it is compared with the datanames, and at the end the corresponding base64 decoded string related to that book title is shown (line 12).
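For documentation purposes, the lookup described above can be re-implemented in Python so that the mapping between cover names and real names ends up in the report. This is only my reconstruction of the logic, not the original JavaScript; the data is placeholder text, filehost.example is a made-up host, and the exact matching rules of the real script may differ.

```python
import base64
import posixpath
from urllib.parse import unquote, urlparse

# Placeholder data shaped like the structures described above.
datanames = [
    {"fname": "SomeBookTitle",
     "rname": base64.b64encode(b"placeholder-real-name.mp4").decode()},
]
videos = ["https://filehost.example/abc123/SomeBookTitle.rar"]


def cover_name(url):
    """Strip host, path and extension so only the file name is left."""
    filename = posixpath.basename(urlparse(url).path)
    return posixpath.splitext(unquote(filename))[0]


def resolve(url, names):
    """Return the decoded rname whose fname matches the link, or None."""
    key = cover_name(url)
    for entry in names:
        if entry["fname"] == key:
            return base64.b64decode(entry["rname"]).decode("utf-8", "replace")
    return None


for link in videos:
    print(cover_name(link), "->", resolve(link, datanames))
```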

Even when you think you know how it works, you should intercept the whole traffic anyway. It’s useful as proof for the investigation and can give you insight into external links in case you missed something.

Content of HTML file hosted on cloud storage

Clicking on any link will redirect to the link associated with the book, as mentioned before.

In this case "Harry and his bucket full of dinosaurs" is a series of children's books written and drawn by Ian Whybrow and Adrian Reynolds. The series is about a 5-year-old boy named Harry, who has a bucket full of dinosaurs. (Obviously!) But the actual archive contains CP content. I'm not a specialist and do not actively follow this world, however I think some of the videos might come from old Playpen archive.

And that’s it: we know how it operates, how it tries to hide itself and what services it uses. The last thing is to draw a process flow that shows exactly the steps taken by the actor and can be shared among team members so they know the tactics, techniques and procedures used in the attack.

It’s a high level diagram of the methods used in this campaign, but we can clearly see the flow of delivery.

To sum up, it starts from links in different paste services that redirect to yet another paste platform, which in turn gives a link to an HTML file hosted on cloud storage. The file includes sflk.in links that point to Pastebin, which serves obfuscated JavaScript code. From there, we got 5 different external sflk.in links with redirections to Pastebin. Four of them contain links and obfuscated titles of the movies, impersonating legitimate content at the same time. In one of the scripts, the function makeFnN exists and is responsible for merging everything together to deliver the final content.

Knowing the exact process and techniques used by the adversary gives a huge advantage. In the current case, 3 ways in which one could abuse this process come to my mind.

- Hijacking revenue – they earn money from clicks on URL shortener services and downloads from cloud storage. Someone can post his own links, send users who look for this kind of content into a loop of redirections and serve different content at the end.

- Tracking – To identify people who distribute and share this content, someone can put a small tracking script inside the HTML file, at the second stage, and post links as if they were identical.

- Taking down – I examined a couple of delivered HTML files and all of them had links to the same Pastebin JavaScript code. It means that the delivery flow is centralized at one point. Cutting off access to the scripts will make the whole operation unavailable.

After my report, Katfile removed around 6000 CP videos and Pastebin took all the links down, making the whole 2+ year old distribution network completely unavailable.

You can read another piece of source code reversing research, which was a lot of fun for me, here

MALWARE SOURCE CODE ANALYSIS & OSINT

Every cyber security researcher, threat hunter or intelligence analyst deals with source code of unknown origin, so in the second part of this episode we will take a high level look at Redis malware that has already infected more than 3600 servers all over the world.

As in the previous article about solving any problem, the top-down approach works best. First, gather general information about operating methods, external links etc., and then go into details if they are needed for the high level analysis.

Redis databases have always been an easy target for cybercriminals deploying coin miners, and one campaign included worm-like capabilities.

In the infected database, 4 keyspaces have been added by the threat actors.

The script is identical in both sources; the second entries are used as a backup in case the first command and control server is taken down. It contains the following functions:

kill_miner_proc
kill_sus_proc
downloads
unlock_cron
lock_cron

but first it needs to do basic assignments and checks.

Anyone familiar with Linux should not have any problem understanding the code at a high level. Basically, it turns off SELinux and sets paths, users and attributes.

It also checks for the presence of the “[a]liyun” string in processes (line 32) and, if it exists, downloads another pair of scripts to remove everything related to Aliyun. You can check the code for both here

https://hughsite.com/post/aliyun-remove-monitor.html

The first mention of this file was around 5 years ago

It just uninstalls everything related to aegis, but it’s hard to find anything useful about what aegis actually is. Aliyun is the name of the cloud that belongs to the Chinese company Alibaba, and the documentation on their site mentions the Server Guard agent and the aegis process located in “/usr/local/aegis/ /etc/init.d/aegis”. So it seems reasonable for the malware to turn off any security products that run on the machine.
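A side note on the “[a]liyun” string mentioned earlier: I assume it is the usual grep bracket idiom, where the pattern matches "aliyun" in the process list but not the command line of the grep process itself, because that command line contains the literal brackets. The behaviour can be demonstrated with Python's re module; the process names below are made up for illustration.

```python
import re

# Same regular expression the shell idiom would pass to grep.
pattern = re.compile(r"[a]liyun")

# Hypothetical process list, including the searching process itself.
process_list = [
    "/usr/sbin/aliyun-service",  # made-up process name
    "grep [a]liyun",             # the grep command looking for it
    "/usr/sbin/sshd -D",
]

matches = [line for line in process_list if pattern.search(line)]
print(matches)
```

Only the first entry matches, so the check does not trigger on its own grep process.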

If you take a closer look at the uninstall link, your offensive mind should suggest that if a path to “uninstall.sh” exists, “install.sh” should exist too. Of course, it does. You can check the code on Pastebin

https://pastebin.com/17BKpttw

It refers to the following external URLs; however, none of them is active

AEGIS_UPDATE_SITE="http://aegis.alicdn.com/download"
AEGIS_UPDATE_SITE2="http://update.aegis.aliyun.com/download"
AEGIS_UPDATE_SITE3="http://update2.aegis.aliyun.com/download"
AEGIS_UPDATE_SITE4="http://update4.aegis.aliyun.com/download"
AEGIS_UPDATE_SITE5="http://update5.aegis.aliyun.com/download"
AEGIS_UPDATE_SITE6=http://update6.aegis.aliyun.com/download

The basic checks end with hardcoded URLs to the config, scripts and executable files. The URLs don’t serve any files; however, the rest of the C2 is active. I assume there are some additional server side checks.

Kill_miner_proc, as the name suggests, kills any other miner process currently running on the machine. It’s just a bunch of netstat, grep, ps and kill commands combined to look for any relevant string; it also checks whether the machine has previously been infected with any other type of miner.

netstat -anp | grep 185.71.65.238 | awk '{print $7}' | awk -F'[/]' '{print $1}' | xargs -I % kill -9 %
ps aux | grep -v grep | grep ':3333' | awk '{print $2}' | xargs -I % kill -9 %

Some of the strings were quite intriguing, like email addresses, IP addresses or base64 encoded commands.
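Strings like these are worth collecting systematically, because they become IOCs and pivot points for further OSINT. Below is a minimal sketch with deliberately simple regular expressions; the sample text is made up (only the IP address comes from the script above) and operator@example.com is a placeholder.

```python
import base64
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Long runs of base64 characters; expect false positives on real scripts.
B64 = re.compile(r"\b[A-Za-z0-9+/]{20,}={0,2}")


def extract_iocs(text):
    """Return unique IPs, email addresses and base64-looking blobs."""
    return {
        "ips": sorted(set(IPV4.findall(text))),
        "emails": sorted(set(EMAIL.findall(text))),
        "base64": sorted(set(B64.findall(text))),
    }


# Made-up sample shaped like a dropper script.
encoded = base64.b64encode(b"echo hello from an example payload").decode()
sample = f"""
netstat -anp | grep 185.71.65.238 | awk '{{print $7}}'
echo "{encoded}" | base64 -d | sh
# contact: operator@example.com
"""

iocs = extract_iocs(sample)
print(iocs["ips"], iocs["emails"])
for blob in iocs["base64"]:
    print(base64.b64decode(blob))
```

Decoding only prints the bytes; nothing from the sample is ever executed.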

kill_sus_proc function kills all processes associated with previously downloaded files, i.e. sysguerd, updata.sh, sysupdata, networkservics.

downloads is a function for downloading files, it uses wget and curl

Lock_cron and unlock_cron set the proper permissions on /etc/crontab and /var/spool/cron.

In the next stage, the malware establishes persistence by adding a new SSH key to authorized keys

It also has a self-update mechanism: with every run it compares the hardcoded file size (line 53) with the previously loaded config file. This method allows the operator to push a new version of the malware as soon as it’s ready.

At the end it loads and runs another script, is.sh, from hxxp://178[.]157.91.26/ec8ce6ab/is.sh. It also contains checks for the previously mentioned “aliyun” and has a “download” function to remotely retrieve files, which are Masscan and pnscan.

But before download happens, all dependencies are installed and permissions are set.

As you can see, it does not stop there and downloads another script, rs.sh, from the same IP address (line 154), but it was not accessible at the time of writing. I tried many ways and URLs to retrieve this payload but, for reasons unknown to me, I couldn’t, so unfortunately the journey ends here.

The Imperva article is from March 2018, but I hope you see some similarities to this example. It is also self-sufficient thanks to installing many dependencies on the victim’s machine, adds an SSH key entry to authorized keys, and drops a coin miner and Masscan at a later stage. It looks like this might still be the same campaign, but updated and possibly with new features in further stages of the infection.

To sum up, it has following features:

Download from external sources
Backdoor as ssh keys
Configurable
Updates implemented
Self-sufficient
Checks and kills processes

As you can see, it’s not rocket science to analyse, from an OSINT perspective, malicious code used in a real malware campaign. I took a simple example of a current campaign only to present my way of quickly getting acquainted with the code; real malware used by nation state actors is analyzed by the best researchers and it takes weeks or even months. I presented a high level (and unfortunately unfinished) analysis, but it is enough to make everyone aware of the TTPs used by these cyber criminals and of the weak and strong features of the malware itself. It’s also enough to determine IOCs for the campaign.

Before this analysis, I had no idea about a couple of the commands and processes and their usage, but I was googling, reading manuals and learned a lot. I still don’t know many details of this code, but low level analysis is very time consuming: I would have to analyze many lines of code and check each rabbit hole on the way.

init.sh https://www.virustotal.com/gui/file/3c7faf7512565d86b1ec4fe2810b2006b75c3476b4a5b955f0141d9a1c237d38/detection

is.sh https://www.virustotal.com/gui/file/6faa026af253c784ef97ffec3a9953055d394061a9a1fbfdcc5b28445b73ffdc/detection
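The identifiers in the VirusTotal links above are SHA-256 hashes of the scripts. If you have your own copy of a sample, you can compute the hash locally and build the same kind of link without uploading anything. The sketch below demonstrates it on a harmless temporary file; the helper names are mine and the URL pattern is simply copied from the links above.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def virustotal_link(file_hash):
    """Build a lookup link following the pattern of the links above."""
    return f"https://www.virustotal.com/gui/file/{file_hash}/detection"


# Demonstration on a harmless temporary file instead of a real sample.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_hash = sha256_of_file(demo_path)
os.remove(demo_path)
print(virustotal_link(demo_hash))
```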

In the past I analyzed the multi stage packing of Betabot; this analysis is available below

Conclusion

I can’t present every case of deobfuscation, that’s just impossible, but I can give some tips on how to determine the features of code of unknown origin. Reversing applications and source code is an essential skill in every technical investigation and should be in the toolbox of every security analyst. In addition, source code analysis gives an inside view of adversary tactics, techniques and procedures and allows you to protect your assets from similar attacks.

Please subscribe for early access, new awesome things and more.

Offensive OSINT s01e02 - Deobfuscation & Source code analysis + uncovering CP distribution network

DEOBFUSCATION & OSINT

Deep dive into CP network distribution

MALWARE SOURCE CODE ANALYSIS & OSINT

Conclusion

Wojciech
