A “tsunami of data”: the investigative technology behind the Pandora Papers

Multi-million dollar beachfront properties purchased by King Abdullah II of Jordan through shell companies. $1 billion in foreign assets hidden away in trusts in the US. A $22 million chateau purchased on the French Riviera by the Czech Republic’s prime minister through offshore companies. These are just a handful of the mass of revelations that came to light through the Pandora Papers investigation.

Led by the International Consortium of Investigative Journalists (ICIJ), the Pandora Papers investigation comes just one year after the FinCEN Files and five years after the Panama Papers. This latest investigation – the largest journalistic collaboration ever undertaken – exposes the truly global nature of tax and secrecy havens that enable billionaires, politicians, and fraudsters to conceal their wealth and assets. How did ICIJ and its media partners around the world investigate the most expansive leak of tax haven files in history?

What is the Pandora Papers investigation?

At the heart of the Pandora Papers is a truly massive number of documents – “a tsunami of data” in the words of ICIJ. The leak included 11.9 million records from 14 different offshore services firms, amounting to 2.94 terabytes of data. It took more than 600 journalists from 150 media organizations over a year to fully investigate.

The findings in the Pandora Papers investigation echo what we learned from the Panama Papers. But the Pandora Papers go further and tell us even more. The investigation implicates 330 politicians and 130 Forbes billionaires, in addition to celebrities, fraudsters, members of royal families, drug dealers, and religious leaders. It gathers information on more than 27,000 companies and 29,000 beneficial owners – over twice the number identified in the Panama Papers. 

That the leaks came from 14 different firms and include documents in multiple languages is even more evidence of a complex global system of financial secrecy and tax evasion. This isn’t the work of a few bad actors, but a worldwide system to conceal wealth and assets. The result is more global inequality, increased distrust and discontent among populations around the world, and criminality and corruption going unseen and unpunished.

Making sense of millions of documents with Linkurious

How do you make sense of 11.9 million documents, including 10,000-page PDFs and years' worth of emails, written in many different languages? Thanks to several powerful technology solutions, including the Linkurious investigation platform, ICIJ was able to unravel the stories in this leak in just over one year – a remarkable feat considering that only 4% of the files were structured to begin with.

First, ICIJ had to identify the files containing beneficial ownership information and structure that data. They combined individual spreadsheets into master spreadsheets. For PDF or document files, ICIJ used programming languages like Python to automate data extraction and structuring where possible. For more complex cases, ICIJ used machine learning tools such as Fonduer and scikit-learn to identify and separate certain forms from longer documents.
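To give a sense of what the "master spreadsheet" step can involve, here is a minimal Python sketch. It is not ICIJ's actual code: the column names, the alias table, and the `build_master` helper are all assumptions made for illustration. It maps differently labeled provider columns onto one shared schema and drops exact duplicate rows.

```python
import csv
import io

# Hypothetical mapping from provider-specific column names to a shared schema.
COLUMN_ALIASES = {
    "company": "company_name",
    "company name": "company_name",
    "entity": "company_name",
    "owner": "beneficial_owner",
    "beneficial owner": "beneficial_owner",
    "ubo": "beneficial_owner",
    "jurisdiction": "jurisdiction",
    "country": "jurisdiction",
}

MASTER_FIELDS = ["provider", "company_name", "beneficial_owner", "jurisdiction"]


def normalize_row(provider, row):
    """Map one provider-specific row onto the shared master schema."""
    record = {field: "" for field in MASTER_FIELDS}
    record["provider"] = provider
    for column, value in row.items():
        target = COLUMN_ALIASES.get((column or "").strip().lower())
        if target:
            record[target] = (value or "").strip()
    return record


def build_master(sheets):
    """Combine {provider: csv_text} into one de-duplicated list of records."""
    master, seen = [], set()
    for provider, text in sheets.items():
        for row in csv.DictReader(io.StringIO(text)):
            record = normalize_row(provider, row)
            # Case-insensitive key so repeated rows are kept only once.
            key = tuple(record[field].lower() for field in MASTER_FIELDS)
            if key not in seen:
                seen.add(key)
                master.append(record)
    return master


if __name__ == "__main__":
    # Invented sample data standing in for two providers' spreadsheets.
    sheets = {
        "Provider A": "Company,Owner,Country\nAcme Ltd,Jane Doe,BVI\nAcme Ltd,Jane Doe,BVI\n",
        "Provider B": "Entity,UBO,Jurisdiction\nBlue Sky Inc, John Roe ,Panama\n",
    }
    for record in build_master(sheets):
        print(record)
```

In practice the hard part is the alias table itself: each provider labels its fields differently, so building and checking that mapping is largely manual work.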

After filtering and structuring the data, the Linkurious Enterprise investigation platform and the Neo4j graph database helped the journalists search, explore, and visualize this huge quantity of data.

Linkurious Enterprise is built for this. It enables analysts and investigators to easily explore and visualize data through a network analysis approach to quickly understand all the complex direct and indirect connections within huge amounts of data.

A longtime user of Linkurious through the Linkurious for Good program, ICIJ also used the investigation platform for the Panama Papers, Paradise Papers, and FinCEN Files investigations. “Linkurious is very user-friendly. It’s easy for anyone to use, even without a technical background. Yet if you are an advanced user, you can also turn it into something extremely powerful. It’s a very versatile tool,” explains Miguel Fiandor, data specialist at ICIJ. 

With Linkurious, the journalists were able to collaboratively establish a precise picture of the tens of thousands of businesses and ultimate beneficial owners (UBOs) implicated in the leak. Through intuitive network visualizations, the journalists could explore and understand the connections between all of these entities across providers. They were also able to bring in external data sets, like sanctions lists, previous leaks, and public records, to help identify the most interesting stories and give extra context to the leaked data.
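Cross-referencing leaked names against an external list can be illustrated with a short Python sketch. This is an assumption-laden example, not a description of how ICIJ or Linkurious match records: the `normalize_name` and `flag_matches` helpers and the sample names are invented. It compares names only after stripping accents, punctuation, case, and word order.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort the name's tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in without_accents.lower())
    return " ".join(sorted(cleaned.split()))


def flag_matches(owners, watchlist):
    """Return the owners whose normalized name appears on the watchlist."""
    index = {normalize_name(name) for name in watchlist}
    return [owner for owner in owners if normalize_name(owner) in index]


if __name__ == "__main__":
    # Invented names; a real watchlist would come from an external data set.
    owners = ["José Pérez", "Jane Doe"]
    watchlist = ["PEREZ, Jose"]
    print(flag_matches(owners, watchlist))
```

Exact matching on normalized names misses transliterations and misspellings, and it can flag unrelated people who share a name, so any hit is a lead for a journalist to verify rather than a finding.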

“We have complex data sets and our partners can explore them independently on Linkurious,” says Emilia Diaz-Struck, research editor at ICIJ. “In this investigation, we focused on beneficial owners. If you know who the real person behind a company is, you can understand whether or not they’re potentially connected to any controversial activity. Using graph technology, it’s easy to understand corporate structures and who is behind a given company.”
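The idea of finding "the real person behind a company" amounts to walking ownership links upward through layers of intermediary entities. The Python sketch below shows that traversal on a tiny invented data set. It uses neither the Neo4j nor the Linkurious API, and the entity names, the `OWNS` edge list, and the `beneficial_owners` function are hypothetical.

```python
from collections import deque

# Hypothetical ownership edges: (owner, owned entity).
OWNS = [
    ("Person A", "Holding One Ltd"),
    ("Holding One Ltd", "Shell Two SA"),
    ("Shell Two SA", "Villa Co"),
    ("Person B", "Shell Two SA"),
]
PEOPLE = {"Person A", "Person B"}


def beneficial_owners(company, edges, people):
    """Walk ownership edges upward from a company and collect the people found.

    Returns a list of (person, chain) pairs, where chain is the path of
    entities from the company up to that person.
    """
    owners_of = {}
    for owner, owned in edges:
        owners_of.setdefault(owned, []).append(owner)

    found, seen = [], {company}
    queue = deque([(company, [company])])
    while queue:
        entity, path = queue.popleft()
        for owner in owners_of.get(entity, []):
            if owner in seen:
                continue  # guards against circular ownership structures
            seen.add(owner)
            chain = path + [owner]
            if owner in people:
                found.append((owner, chain))
            else:
                queue.append((owner, chain))
    return found


if __name__ == "__main__":
    for person, chain in beneficial_owners("Villa Co", OWNS, PEOPLE):
        print(person, "<-", " <- ".join(reversed(chain[:-1])))
```

A graph database runs this kind of multi-hop query natively, which is why a structure that spans several shell companies is much easier to follow as a graph than across rows of a spreadsheet.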

Some of the visualizations created for the investigation are available to the public on ICIJ’s website.

Note: This article was written with information available as of October 6, 2021 and may be updated as new information becomes available. 
