Written by Carolina Christofoletti
Research Methodology Considerations:
In the qualitative analysis of very large datasets, a common research methodology for dealing with, among other things, hidden data is the Snowball Method. Put simply, it holds that, where things are not in plain sight, researchers start from a few approximate entry points, which lead, by referral, to other points of informative interest. Usually, this is a research method used for qualitative interviews.
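To make the idea concrete, here is a minimal sketch of snowball sampling in its classic interview setting. Everything in it is an assumption made for illustration: the function name snowball_sample, the max_waves limit, and the toy REFERRALS table (interviewees naming other people worth contacting) are hypothetical and are not taken from the study described in this article.

```python
from collections import deque


def snowball_sample(seeds, get_referrals, max_waves=2):
    """Collect items wave by wave, following referrals outward from a few seeds.

    seeds         -- the approximate entry points the researcher starts from
    get_referrals -- function returning the items a given item points to
    max_waves     -- how many referral steps away from a seed we may go
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    order = list(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        item, wave = frontier.popleft()
        if wave >= max_waves:
            continue  # do not expand beyond the last permitted wave
        for referral in get_referrals(item):
            if referral not in seen:
                seen.add(referral)
                order.append(referral)
                frontier.append((referral, wave + 1))
    return order


# Hypothetical referral table: each interviewee names others worth contacting.
REFERRALS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": ["G"],
}

sample = snowball_sample(["A"], lambda person: REFERRALS.get(person, []), max_waves=2)
print(sample)
```

With a single entry point and two waves, the sample reaches only the people who lie within two referrals of "A". The sample is therefore shaped by where the researcher started, a property that matters for the limitations discussed later in this article.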
Nothing prevents, however, the interviews from being replaced by another referential dataset, and that is where the paths of the labyrinth start to become clear. For researching the Dark, and in the darkness, the Snowball Method remains a valid one. It is valid provided that the researcher agrees to read the data according to where it is actually pointing. As a researcher who has worked with these kinds of methodological problems, I believe that “expected results” are something that no Research Proposal should ask for, or even accept.
This seems to be the wrong terminology, because what we call the “expected result” is, in fact, the hypothesis. There are no expected results where things are meant to be unbiased. If the data points in another, very different and even richer direction, researchers should be able to recalibrate, and even change route to the very opposite direction.
Unfortunately, this is not always possible. Sometimes this is because the hypothesis blinds the researcher. Sometimes it is because research projects are written without any prior checking, so that changing route becomes impossible once things are already formalized.
Fortunately, my work with the Anti-Human Trafficking Intelligence Initiative (ATII) and the University of São Paulo gives me the freedom to “correct routes” in order to keep the results at their best quality. Personally, as a researcher, my very first question tends to be not about my hypothesis but “what data do you have?”, not least because I need to be able to evaluate, before starting to analyze anything, whether any conclusion is possible.
Working with problem-solving computer programmers dedicated to a noble cause such as fighting CSAM crimes, with Law Enforcement personnel, and with adequate software is what it takes to make the missing data appear. Consequently, research advances. And, through this feedback loop, things grow into an innovative solution.
On this occasion, I want to present some preliminary results of my latest analysis of CSAM forums on the Dark Web, conducted in partnership with ATII and Law Enforcement personnel. They are preliminary because, in order to affirm my hypothesis with certainty (which, in my proposed inversion, is in fact my conclusion), I would need to analyze the webpage structures from a technical point of view.
As such, I will not go into detail here about what the anti-forensic feature being observed is, but will restrict myself to shedding some preliminary light on what I have observed so far. The research is, as such, not finished. In any case, I must also reckon with the possibility that I am facing not a mechanism but a temporal bias. I will deconflict that later on. For now, all I can say is that my research is being conducted on active, already reported CSAM forums.
Let us now turn to the specific study in question. Following a methodology that I have proposed, multiple CSAM forums hiding in the Dark were searched for a unique identifier. Where the same file was found in a second, different place, a note was kept. A second note was also kept, recording some additional information related to that file. This second set of data was also analyzed for patterns, happily with a successful result: the non-hash pattern actually existed.
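The note-keeping logic described above can be sketched in a few lines. This is only an illustration under stated assumptions and not the tooling used in the study: the function name recurring_identifiers, the record layout (source, primary identifier, secondary value), and all placeholder labels are hypothetical, and the secondary value is an opaque stand-in, since the real second identifier is deliberately not disclosed.

```python
from collections import defaultdict


def recurring_identifiers(observations):
    """Find primary identifiers seen in two or more sources.

    observations -- iterable of (source, primary_id, secondary_value) tuples
    Returns, for each recurring primary identifier, the sources where it was
    seen and whether its secondary value was the same in every observation.
    """
    sources = defaultdict(set)
    secondary = defaultdict(set)
    for source, primary_id, secondary_value in observations:
        sources[primary_id].add(source)          # first note: where it was found
        secondary[primary_id].add(secondary_value)  # second note: extra information
    return {
        primary_id: {
            "sources": sorted(found_in),
            "secondary_constant": len(secondary[primary_id]) == 1,
        }
        for primary_id, found_in in sources.items()
        if len(found_in) >= 2
    }


# Placeholder records only; none of these values come from the study.
OBSERVATIONS = [
    ("source-1", "id-aaa", "x1"),
    ("source-2", "id-aaa", "x1"),
    ("source-3", "id-aaa", "x1"),
    ("source-1", "id-bbb", "y7"),
    ("source-2", "id-ccc", "z3"),
    ("source-3", "id-ccc", "z9"),
]

result = recurring_identifiers(OBSERVATIONS)
print(result)
```

In this toy data, "id-aaa" recurs in three sources with a constant secondary value, "id-ccc" recurs in two sources with differing values, and "id-bbb" appears only once and is therefore not reported.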
The idea that CSAM files tend to spread very rapidly in the Dark is nothing new under the sun. What does bring some moonlight to the starry sky is the fact that, in addition to the unique identifiers, those CSAM files share:
a. a second identifier that is actually the same identifier contained in its mirrors
b. an identifiable sharing pattern
The point I want to discuss is point “a”. Point “b” will be held on standby for the next article.
Even though I will not, for intelligence reasons, disclose what this second identifier is (and no, it is not metadata), I can affirm that it led to a third research hypothesis. Maybe this second identifier exists only because criminals keep, as part of the underworld’s code, a safety rule that necessarily requires “burning” this identifier, which would be why the identifiers are constant.
Either one burns one’s own computer or mobile device, or one burns the identifier. Criminals seem to have chosen the second option. And, from a forensic point of view, I might say that they do, indeed, have a point.
Without mentioning any compromising information, I can say that, from a general point of view, CSAM forums in the underworld contain, overall, repeated files hosted at their open locations. And, with few exceptions, the Big Data here is scary: it seems that CSAM forums are endless “mirrors” of the very same thing.
Classifying them as mirrors would require analyzing them in depth, for the mirror classification is a complex one. I will, for that reason, not classify them as mirrors, but rather say that the URLs do not have the same name. Yes, names: CSAM forums have names.
This study has limitations, and one should say so openly. The first one is that we have not analyzed, and could not have analyzed, all existing CSAM forums on the Dark Web. The dataset that has been compiled is, from the outset, a limited one. Even though this selection was meant to be blind, we must also reckon with the hypothesis that the CSAM forums I am now analyzing were found because they were, at some point, associated with each other.
This does not invalidate the conclusion, since this is the very same path that criminals follow on the Dark Web, which has now been simulated for analysis. There is a second, very interesting study still to be conducted in this regard. For now, I can say that this can also be methodologically validated, and also used for deeper investigative purposes. Additionally, where things have gone deeper (a dataset that was initially excluded from this analysis), things do not repeat as often.
I will keep observing, with ATII and Law Enforcement personnel, how this second identifier evolves. On a future occasion, I plan to cross-reference it with the webpage structure and with the CSAM forums’ rules, where they exist (rules are usually seen as a common feature of closed CSAM forums), to see whether my anti-forensics “safety tip” explanation holds.
Because we are talking not about a single case where this appears but about an extensive dataset, I am afraid we are facing a replicable CSAM club mechanism, one that could, luckily, be operationalized to solve the present intelligence issue involving what has been, until now, the hash databases.
To be continued . . .