Big data, terrorism and the role of the Digital Humanities

This week a friend who prefers not to be mentioned here (thanks, anonymous friend!) sent me a link to this article on how intelligence agencies around the world are using “big data” to fight terrorism.

Another recent article reports how Israel’s Shin Bet domestic intelligence agency “have begun using advanced algorithms to sift through a vast volume of social media activity. Their goal is to search for the small – yet deadly – needles in a haystack of online information, and find warning signs pointing to an individual who is primed to strike.” ‘Primed to strike’ is an interesting expression that reminds me of Spielberg’s movie Minority Report, where a special police unit is able to arrest murderers before they commit their crimes… But in fact the chief of the “PreCrime” unit, John Anderton (Tom Cruise), finds out that the system is flawed and that the “reports” by the PreCogs (three mutated humans who predict future crimes) are being manipulated.

The present geopolitical scenario pushes governments and intelligence agencies to develop strategies to prevent crimes much in the style of Spielberg’s movie (based on a story by Philip K. Dick, by the way). However, for the moment they prefer to use brick and mortar software rather than some unreliable psychic powers. Nonetheless, in both cases it is a story of how data and the stories they generate are used and misused. A much less recent academic paper jointly published by US and Israeli scholars explained in more detail how one of these contemporary systems works:

“The proposed methodology learns the typical behavior (‘profile’) of terrorists by applying a data mining algorithm to the textual content of terror-related Web sites. The resulting profile is used by the system to perform real-time detection of users suspected of being engaged in terrorist activities. (…).”

Particularly interesting is the (ideological? ontological?) approach behind the “Detection Environment”:

“It is assumed that terror-related content usually viewed by terrorists and their supporters can be used as training data for a learning process to obtain a ‘Typical-Terrorist-Behavior’. This typical behavior will be used to detect further terrorists and their supporters.”
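To make the quoted idea concrete, here is a minimal sketch of what “learning a profile” from page text and then scoring new pages against it could look like. It is not the paper’s method: the quotes above do not specify the data mining algorithm, so I am swapping in a plain bag-of-words profile compared by cosine similarity. The function names, the threshold of 0.5 and the deliberately harmless toy pages are all my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text):
    """Represent a page as raw word counts (a bag of words)."""
    return Counter(tokenize(text))


def build_profile(training_pages):
    """Sum the word counts of all training pages into one 'typical' profile."""
    profile = Counter()
    for page in training_pages:
        profile.update(term_vector(page))
    return profile


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_flagged(page, profile, threshold=0.5):
    """Flag a page when its vocabulary is close enough to the profile.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return cosine_similarity(term_vector(page), profile) >= threshold


# Hypothetical, deliberately neutral training data standing in for the
# "content usually viewed by" the target group.
TRAINING_PAGES = [
    "orchid care watering orchid roots",
    "repotting orchid roots and bark",
]
PROFILE = build_profile(TRAINING_PAGES)

if __name__ == "__main__":
    for page in ["orchid roots need bark", "train timetable for monday"]:
        print(page, "->", is_flagged(page, PROFILE))
```

Note what the score in this sketch actually measures: overlap in vocabulary with the training pages. Anyone whose reading shares that vocabulary, for whatever reason, gets the same number.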

To see how a “Typical-Terrorist-Behavior” works, you can watch this video (thanks to Geoffrey Rockwell for pointing this out). Cathy O’Neil, author of the book Weapons of Math Destruction, said in a recent interview that some of these opaque algorithms can have destructive consequences for people, i.e. “they actually can really screw up somebody’s life.”

But as potential human rights violations are not the main concern of Digital Humanities, I won’t discuss them here. Instead, my attention was attracted by the software Palantir, used among others by the United States Department of Defense. Palantir (a word familiar to Tolkien fans) does many things, like building associations and connections across a set of collected data and then visualizing everything in nice clusters, graphs, histograms, etc. The online demo shows how “easy” it is to connect people, places, dates, etc. and draw reliable conclusions on the 2009 Jakarta hotel bombings. But as Geoffrey Rockwell was asking when analyzing the CSEC presentation in 2013, “Could they come to believe that all there is to know is what is revealed by signals intelligence?”

Putting aside for a moment the surveillance issues and the ideology that informs data collection, in these mass surveillance tools I see a huge challenge for the digital humanist: what we see are “representations” based on other “representations”… The semiotic, discursive and rhetorical layers of these tools are totally opaque; everything is taken for granted. However, it is one thing when you are using multimedia to build an interactive story or a literary archive, and another thing when you use locked software to build a hypertext connecting facts and people to terrorism. So, if these surveillance tools build stories, what happened to interpretation? And when did data start to become neutral?

History has never been too generous with the social sciences and the humanities, but at least we knew (and learned how to teach) how symbols and representations were built, and although we did not control them, we could find out who made them and for what purpose. But today layers of digital representations and connected dispositifs – what they call big data and algorithms – are escaping any rational and perhaps human law. They produce knowledge, they control knowledge, and they manipulate knowledge. For what purpose, I suspect, they don’t even know.

So, in all this mess, where are we, digital humanists? I think once again we are missing the opportunity to get involved in these knowledge-shaping processes, and try to direct them towards more transparent, more equitable and more human aims.

*I’m grateful to Geoffrey Rockwell for his help with this post. With Stéfan Sinclair, he was among the first DH scholars to write on text analysis and surveillance issues (see their recent Hermeneutica book).