Written by Carolina Christofoletti
Whilst a great part of the keyword-crawlable Dark Web is made of criminals announcing, through particular keywords, their criminal intents to others, the other part of the underworld seems to be made of cyber counter-threat actors setting those very same “announcements” from the other side of the brick wall: as “bait”.
The Dark Web is a place where Law Enforcement Agencies are not the only ones to have declared a cyberwar against Child Sexual Abuse and Child Sexual Abuse Material (CSAM): cyber-threat actors and people with excellent computer skills have declared war against it too, and some of them also inhabit legal Dark-Web-related discussion forums. To put it simply, those are also the people who are able to create an easily phishable version of the Dark Web.
In terms of research methodology, this finding is to be read as follows: no values are absolute and everything must be calibrated. Not all paths are “true ones”, not all leads are worth pursuing (such as the obvious scam ones), and keeping a methodological track of what one is doing is more relevant than ever, not only for researchers but especially for Law Enforcement investigators.
When we talk about Child Sexual Abuse Material (CSAM) forums in the Dark Web, there are three hypotheses:
a) URLs are born from maieutics (which is an absurd one: keywords cannot lead to alphanumeric URLs)
b) Criminals are using Dark Web Directories to access those, which is a more plausible one
c) Criminals are using Open Web trusted peers to get to know the Dark Web directories, which is one I see as very credible
The purpose of today’s article is to examine these hypotheses by diving deeper into Dark Web search engine parameters. When we talk about Dark Web search engines (which have declared war on CSAM keywords), linkage operates through the so-called directories.
The Dark Web Directories
If the Dark Web’s gates could be placed (conceptually) anywhere, they would be placed in the “directories”: broadly speaking, places where compilations of .onion addresses are not only shared but also cataloged and constantly updated by the directory administrator. Without the links, one cannot access anything. Directories are what make the Dark Web searchable.
Most directories are indirectly indexed by Open Web search engines. After all, the Dark Web is not made only of illegal content. Illegal directories, on the other hand, are usually not indexed as such, but rather as links inside links inside links, where the root will meet, at some point, an Open Web URL.
From a Trust & Safety perspective, this explains how hard moderating the spread of CSAM links can be, especially if one has no idea of what is going on in the Dark side of things. Yes, Trust & Safety Teams and Researchers are not the only ones reading search engines’ indexing policies: criminals are reading them too. Not only are they reading them, they are carefully studying them. Those instructions are the parameters around which criminals do their index-but-do-not-leak mathematics.
Directories are “special pages” where .onion (otherwise non-searchable) links are compiled. Directories are often (most, but not all of them) themselves .onion URLs, and they interact with Dark Web users through search engines. Sometimes, this linkage is made through Open Web search engines such as DuckDuckGo (which is in no way a Dark Web search engine) but, sometimes, the tools used are more specialized ones: Dark Web Search Engines.
Dark Web Search Engines
To put it simply, the difference between an Open Web and a Dark Web search engine is that, while Open Web ones obey, for legal reasons, the “non-crawl” command (which Open Web criminals usually write into their code), Deep Web search engines do not. As long as Dark Web Search Engines find the URLs, they index them.
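The contrast between an obeying and a non-obeying crawler can be sketched in a few lines. This is a minimal illustration, assuming the “non-crawl” command is expressed as a robots.txt file; the rule set, the `ExampleBot` agent name, and the example.com URLs are hypothetical placeholders, and no real search engine's logic is being reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: everything under /private/ is off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def obeying_crawler_may_fetch(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Decision of a crawler that honours robots.txt before fetching a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def non_obeying_crawler_may_fetch(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Decision of a crawler that ignores robots.txt: any discovered URL is fetched."""
    return True


if __name__ == "__main__":
    for url in ("https://example.com/public/page", "https://example.com/private/page"):
        print(url,
              obeying_crawler_may_fetch(ROBOTS_TXT, url),
              non_obeying_crawler_may_fetch(ROBOTS_TXT, url))
```

The only difference between the two functions is whether the site's instruction is consulted at all, which is the whole point of the distinction drawn above.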
Being a .onion is one thing, being non-crawlable is another. In either case, Dark Web search engines can crawl both. Some of those Deep Web search engines are specialized in .onion results but, still, some of them keep displaying mixed Open & Dark results. Because things are defined by their aims and not by their technical structure, not all Dark Web search engines necessarily belong to the Deep Web environment.
Because crawlers are content-neutral and because the Deep Web (the non-crawlable world) is known to sit, in part, behind Tor, I2P, Freenet, and other anonymous Dark Web walls, Deep Web search engines are also, to a great extent, Dark Web ones.
If it wasn’t for the directories, the Dark Web would be “uncrawlable”. If it wasn’t for the Deep Web Search Engines, which ignore robots.txt, the Dark Web would be unsearchable. If it wasn’t for this combination, the .onions wouldn’t be displayed anywhere.
Dark vs. Open Web Search Engines
While it is true that such a thing as a “Dark Web search engine” already exists, those are usually non-operational, depending on where you want to go. Dark Web Search Engines still face a certain degree of moderation and a high degree of technical issues.
Dark Web Search Engines are not, like their Open Web peers, searchable 24 hours per day and 7 days per week. Furthermore, as their structure is most of the time hosted on a .onion, it is sometimes difficult to know what their valid (and non-malware-poisoned) .onion address is.
Crawlable illegal things
For anyone researching the Dark Web through Dark Web search engines, there is something worth mentioning. Even if the search engine supports the keyword search parameter, the underworld does not. There is nothing more mined than the Dark Web Search Engine field.
If you are trying to make any point whatsoever from keywords, you might be paying attention to the fact that criminals are announcing their illegal intent and links, but you are trying to find a non-uniformed enemy in the middle of a battlefield and under crossfire. Things need to be retrospectively analyzed. And the only way to do it, for sure, is to have the cooperation and authorization of Law Enforcement to go after that.
That is what I have done here. And I discovered that, most of the time that Dark Web search engines said they had a non-void result for the parameter (not keyword) I was looking for, the results were “fake” ones. Generally speaking, I can say that, depending on the search engine, getting a non-in-URL-readable-scam was not easy.
Where my internal parameters led, in approximately 70% of cases, to the final parameter I had to validate, we were talking about a URL crawled one single time… in the middle of pages and pages of search results. Great news: Dark Web search engines are confused about how to display these search results, about what the valid links are, and especially about how to provide “validation” in a world where, as we have seen from the amount of “fake” results, anyone can claim the URL to be anything.
Mined field? Keep walking.
The “Cyber-Threat” Scenario
Before moving on to the next paragraph, just a terminological consideration: even though malware sharers, phishers, and other such actors are considered to be, as mentioned, “threat actors” when their acts are aimed at a legally protected good, I do not consider the threat to CSAM forums worthy of being called a “Cyber Threat”. Coming from a Law School, I cannot accept having two opposite meanings represented under the same concept. Furthermore, I also cannot accept the claim that there would be any “legal right” to be protected (such as the right of not having your machine infected by malware) if you are looking explicitly for CSAM.
For Dark Web search engines, identifying and removing the CSAM links from search results would be the first step toward making their products “a little less legally threatening”, a case that we are going to see next. What those parallel links (also announced as CSAM directories) coming up in my search results, together with my single validation point, meant is one of those great cyber-threat research questions that would be worth pursuing. I had the help of Hades (a Dark Web monitoring tool licensed by the Anti-Human Trafficking Intelligence Initiative) to validate who the real link hosters were. While I cannot affirm that those other links were fake ones, I can affirm with certitude, combining Hades with the Dark Web search engine mechanics in question, that those were click-to-see ones.
CSAM forums are full of discussions saying how click-to-see links, but also search engine access, are concrete dangers. In the words of @userA: “If you are trapped in the middle, it is your fault”. This is the siege, in its purest symbolism: a situation where CSAM criminals will provide no support (again @userA: “you can be a threat”) and where cyber actors are still shooting. Also, do not forget that different Dark Web search engines will lead to different search results, which further expands the degree of uncertainty.
Dark Web Search Engines “Content + embedded Link Moderation”
To some extent, Dark Web search engines are also opponents of some very specific criminal things going on in the dark. Like black markets, they mirror the moralities of the non-cyber world: while counterfeit money, stolen goods, and even drugs might be acceptable, where moderation exists (report abuse button), CSAM is not tolerated by any means.
Ahmia, for example, does crawl CSAM links: not to display them, but to keep them in a non-crawlable list, which the search engine hands, in the form of a check-your-lead white flag, to Law Enforcement and Researchers as a hash table. As hash values are non-reversible, the intelligence here works in one direction only.
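The “check-your-lead” idea can be sketched as follows. This is a minimal illustration under loud assumptions: the choice of SHA-256, the normalization of the hostname, and the placeholder hostnames are all hypothetical and are not taken from Ahmia or from any real list. The point is only that a holder of the hash table can confirm a lead already in hand, but cannot read addresses out of the table.

```python
import hashlib


def onion_digest(hostname: str, algorithm: str = "sha256") -> str:
    """Hash a normalized hostname. Algorithm and normalization are assumptions."""
    normalized = hostname.strip().lower()
    return hashlib.new(algorithm, normalized.encode("ascii")).hexdigest()


def is_on_hash_list(hostname: str, hash_list: set, algorithm: str = "sha256") -> bool:
    """One-way check: confirms a known lead, reveals nothing about other entries."""
    return onion_digest(hostname, algorithm) in hash_list


if __name__ == "__main__":
    # Hypothetical list built from a placeholder hostname (not a real service).
    hash_list = {onion_digest("placeholderplaceholder.onion")}
    print(is_on_hash_list("PlaceholderPlaceholder.onion", hash_list))
    print(is_on_hash_list("someotherplaceholder.onion", hash_list))
```

A researcher with a validated lead gets a yes-or-no answer, while someone who merely obtains the list learns no usable address from it.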
With this preliminary consideration done, I would like to take a minute to show, mathematically, how mined the CSAM Dark Web field is. Of course, for legal reasons, but also because no CSAM Researcher needs to face CSAM content to be able to research anything, I am working with very specific outside parameters. Valid, non-compromising methodologies for sensitive cases such as this one: that is how one should evaluate a Dark Web Researcher.
And, in the Dark Web, one thing is the finding that criminals (and most specifically, CSAM criminals) know exactly what they are looking for. They know what those should look like and, also, what the hidden referrals are. Another thing is predicting how often criminals are able to find, without a previous bookmark, a non-trapped version of it. And, as we will see next, the answer to this second question is: not so often.
With the help of Hades and with a previously law-enforcement-validated lead in my hands, I was able to prove that the CSAM forum I was analyzing was properly listed in eleven Dark Web directories (out of all those crawled by Hades), reached link to link to link. Nine of those displayed the “right” URL, two displayed fake ones, and two were considered “mirrors” of two links-to-links URLs displayed in the same list. Curiously enough, out of my eleven linked URLs, the two fake ones were one explicit CSAM advertising directory and a mirror… of a still active CSAM forum.
The most interesting part of it is that, where my valid leads were reverse searched, I usually faced a situation where the same Dark Web search engine that was answering correctly to my internal parameters search query was leading me, literally, nowhere when I searched the valid links themselves. This may indicate that, somehow, the problematic feature of my URL is not unknown to the Dark Web search engine but only not so obviously crawled. Open Web search engines have a very similar problem.
The existence of two mirrored CSAM forums in a list of seven directories makes one think about why such a thing as a CSAM directory would be so constantly mirrored. Take note of the fact that some CSAM forums (and not directories) have ten, twenty, or sometimes even hundreds of mirrors. What does that mean? The case here seems to provide a very good hypothesis . . .
If one digs a little bit deeper into the analysis and tries to find out what is actually happening, or has happened, to the mirrored URLs, one will discover that, in one of the cases, the three versions of the Dark Web CSAM directory were taken down and seized by the police, information that one finds in the Dark Web but also in the Open Web. Still, this was an onion that remained ‘online’. One of those two mirrors is, very probably, law enforcement controlled: a strong hypothesis if you consider that these very mirrors were easily Open Web searchable.
For the second case, the directory was one of those rare cases where Dark Web users could “accidentally” click and (shock alert), face CSAM for the way it is indexed. The mirror was, in this case, an update to a non-deprecated onion version – a process that changes, substantially, the links.
From Dark Web search engines to CSAM forums and forum directories, how far was I? As far as… a second-degree connection.
A Suspend-to-Check, a Trust-your-Reporter, or a Hide-it-in-this-Section Dark Web Search Engine moderation?
As good research ethics mandate, everything one finds in the middle or at the end is to be reported to CSAM hotlines but, also, to the search engines themselves. What happened with my Dark Web search engine Abuse Report is a case worth mentioning.
In all places where I found active links to active CSAM directories or forums, if the report abuse button was active, I pressed it. In two of them (an Open Web Directory), the result disappeared from my next searches. Maybe the Dark Web search engine has removed or suspended my reported URL. Maybe it has only been hidden from me. Maybe it is a matter of non-stable pages: also for Dark Web Search Engines.
Now or Never: The V2 to V3 Onion Services Transition
CSAM routes are full of traps, and this is not a 2021 finding. Never before, however, have “true” CSAM links had such a chance of being properly replaced by something else, and for law enforcement purposes. Tor is requiring onion services, CSAM forums included, to update their onions and, consequently, to change their .onion addresses. Deprecated CSAM forums will either move or die.
After all, if one does not know what onion address one is looking for, and if V3 onions look pretty different from the V2 ones, everything will look legitimate. Remember also that, on CSAM forums, link directories are usually controlled by the admin, who will now have to collect the V3 versions “manually” and in complete darkness, and who does not moderate old posts.
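How different the two generations look can be made concrete. The sketch below is a minimal illustration, assuming the publicly documented layouts: a V2 label is 16 base32 characters, while a V3 label is 56 base32 characters encoding a 32-byte public key, a 2-byte checksum, and a version byte. The function names are my own, and the addresses used are synthetic (built from dummy key bytes), not real services.

```python
import base64
import hashlib
import re

V2_LABEL = re.compile(r"^[a-z2-7]{16}$")
V3_LABEL = re.compile(r"^[a-z2-7]{56}$")
V3_VERSION = b"\x03"


def v3_checksum(pubkey: bytes, version: bytes = V3_VERSION) -> bytes:
    """First two bytes of SHA3-256 over a fixed prefix, the key, and the version."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]


def encode_v3(pubkey: bytes) -> str:
    """Build a synthetic V3 hostname from 32 key bytes (used here for illustration)."""
    raw = pubkey + v3_checksum(pubkey) + V3_VERSION
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def classify_onion(hostname: str) -> str:
    """Return 'v2', 'v3', 'invalid', or 'not-onion' for a hostname."""
    host = hostname.strip().lower()
    if not host.endswith(".onion"):
        return "not-onion"
    label = host[: -len(".onion")].split(".")[-1]  # ignore any subdomain part
    if V2_LABEL.match(label):
        return "v2"
    if V3_LABEL.match(label):
        raw = base64.b32decode(label.upper())
        pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
        if version == V3_VERSION and checksum == v3_checksum(pubkey):
            return "v3"
    return "invalid"


if __name__ == "__main__":
    print(classify_onion("a" * 16 + ".onion"))          # synthetic V2-shaped label
    print(classify_onion(encode_v3(bytes(range(32)))))  # synthetic V3 address
```

A check like this only tells a reader whether a string is well formed. It says nothing about who operates the address, which is exactly why a well-formed but unfamiliar V3 link “looks legitimate”.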
This is a historical moment for CSAM forums, with their routes, to die.
Considering that .onion addresses are usually non-static and that some CSAM forums have, as shown by Hades, more than ten, and sometimes even more than a hundred, possible (not necessarily valid) addresses, the chance of having someone who is looking for CSAM click an old .onion address that was posted and validated in a CSAM forum one year ago, and whose address has since been appropriated by the police, is, let us recognize, very high. This is but the V2 situation.
The V3 scenario is a new one. The URL pointers will start running… practically from zero. How long will it take for Dark Web and CSAM route charts to catch up with the updated addresses? Maybe some URLs will become forever lost: deprecated, forever. But, with timely Law Enforcement action, those routes can also be falsified. And, let us recognize, the chances of CSAM criminals bypassing this posted gossip are also very low. Shuffle everything.
As far as CSAM forums have been announcing, there is but a single URL vulnerability point that is being checked overall. For purposes of Law Enforcement URL counterintelligence, there could be no better time to work on that than the transition of Onion Services from V2 to V3.
Think about it.