How big data technology is transforming fraud investigations

The inside story of the Paradise Papers leak

Linkurious has been a partner of the International Consortium of Investigative Journalists (ICIJ) since the Swiss Leaks and the Panama Papers scandals. ICIJ’s network of 380 journalists used Linkurious’ data analytics software to investigate the Paradise Papers leak. With more than 1.4 terabytes of data, the Paradise Papers illustrate the new possibilities opening up for fraud investigation.


Paradise Papers © ICIJ

The leak

On November 5th, 2017, ICIJ and 96 media organizations around the world shared revelations based on a massive new leak. The Paradise Papers contained 13.4 million documents from offshore service providers Appleby and Asiaciti Trust, as well as from the company registries of 19 offshore tax havens.

German newspaper Süddeutsche Zeitung obtained 1.4 terabytes of confidential files and shared them with ICIJ. Together with 380 journalists worldwide, they processed and scrutinized the information for several months before releasing their findings.

The Paradise Papers are the second-biggest data leak in history after last year’s Panama Papers (2.6 terabytes). They are also another case of a successful large-scale, data-driven investigation that illustrates the recent shift in fraud investigation. Graph technology is changing the scale and the possibilities of this field by providing investigators, reporters and analysts with new tools to handle the complexity of data-driven investigations.

The complexity of working with big data

When it comes to big data, the challenges derive from the nature and the volume of the data. Whether it’s a data leak or a financial company’s internal data, the amount of data involved is considerable. While the journalists working on the Paradise Papers were dealing with about 1.4 TB of data, some organizations gather dozens of terabytes every month.

To complicate things, investigations usually start from raw, unstructured data, and it’s impossible to automate or scale an investigation without a predefined data model or some kind of organizational logic. The files obtained by Süddeutsche Zeitung included millions of loan agreements, financial statements, emails, trust deeds and other paperwork dating back nearly 50 years.

The large amounts of data and their unstructured form raise the first difficulty: organizations have to turn these large volumes of raw data into computable information that can be organized, stored and analyzed.

“Depending on the source, we had different formats and many of those were not machine-readable,” declared Pierre Romera, ICIJ’s Chief Technology Officer.

The second obstacle is related to the way data is stored. The success of a fraud investigation depends on finding connections between entities. Yet in many investigations, data is kept in silos, which makes it difficult to cross-reference records and highlight connections. For the Paradise Papers, ICIJ’s reporters worked with data from the leak but also from public databases. To make siloed data talk, it’s essential to bring everything together.

Finally, data-driven investigations raise a problem of accessibility. As at ICIJ, making data exploration accessible to non-tech-savvy reporters is both a challenge and a necessity. Without it, data-driven investigations would be nearly impossible to lead unless an army of data analysts and database specialists were available.

According to Romera, “one of the key challenges is to make our technology user-friendly for the journalists so that everyone around the world is able to use it.”

The ICIJ’s method: an efficient approach to fraud investigation

As with the Panama Papers, ICIJ proceeded in several phases to make the documents usable by its network of 380 journalists.

The Data & Research unit was in charge of processing the documents into a machine-readable format, indexing them and connecting them through their metadata. ICIJ used optical character recognition (OCR) and its content-extraction technology Extract to transform the unstructured data, and Apache Solr to index it into a searchable knowledge center.

“The knowledge center was essential to let our partners access and explore all the information,” stated Romera.
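To make the indexing step more concrete, here is a minimal sketch of an inverted index, the general data structure behind full-text search engines. It is not ICIJ’s pipeline and does not use Apache Solr or Extract: the function names and the sample documents are hypothetical, and the text is assumed to have already been produced by OCR.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical, made-up text standing in for OCR output.
documents = {
    "doc-1": "Loan agreement between Alpha Holdings Ltd and Beta Trust",
    "doc-2": "Email regarding the Beta Trust deed",
    "doc-3": "Financial statement of Alpha Holdings Ltd",
}
index = build_index(documents)
print(sorted(search(index, "beta trust")))
```

A production search engine adds much more (stemming, ranking, metadata fields), but the principle of looking up documents by the words they contain is the same.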

Additionally, they used graph technology to bring all the sources and data together. The team used Talend ETL (Extract, Transform, Load) tools to load the data into Neo4j, a graph database platform, creating a network of nodes and edges. On top of that, they provided the reporters with visual investigation and analytics software: Linkurious Enterprise let them explore the data, connect the dots and share visualizations of their stories.
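The idea of merging several sources into one network of nodes and edges can be sketched in a few lines of plain Python. This is an illustration only, not the Talend or Neo4j setup ICIJ used: the `Graph` class, the sample rows and the relationship names are all hypothetical, and the four node types are borrowed from the Offshore Leaks database model (entity, officer, address, intermediary).

```python
class Graph:
    """A tiny in-memory property graph (illustrative, not Neo4j)."""

    def __init__(self):
        self.nodes = {}  # node id -> {"type": ..., "name": ...}
        self.edges = []  # (source id, relationship, target id)

    def add_node(self, node_id, node_type, name):
        # Merge on id: a record seen in two sources yields a single node.
        self.nodes.setdefault(node_id, {"type": node_type, "name": name})

    def add_edge(self, source, relationship, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be loaded before the edge")
        self.edges.append((source, relationship, target))

    def neighbors(self, node_id):
        """Return the ids of all nodes directly connected to node_id."""
        found = set()
        for source, _, target in self.edges:
            if source == node_id:
                found.add(target)
            elif target == node_id:
                found.add(source)
        return found


# Hypothetical rows from two separate sources (a leak and a public registry).
leak_rows = [
    {"id": "E1", "type": "entity", "name": "Alpha Holdings Ltd"},
    {"id": "O1", "type": "officer", "name": "Jane Doe"},
]
registry_rows = [
    {"id": "E1", "type": "entity", "name": "Alpha Holdings Ltd"},
    {"id": "A1", "type": "address", "name": "1 Example Street"},
    {"id": "I1", "type": "intermediary", "name": "Example Law Firm"},
]
# Hypothetical relationship names.
links = [
    ("O1", "officer_of", "E1"),
    ("E1", "registered_address", "A1"),
    ("I1", "intermediary_of", "E1"),
]

graph = Graph()
for rows in (leak_rows, registry_rows):
    for row in rows:
        graph.add_node(row["id"], row["type"], row["name"])
for source, relationship, target in links:
    graph.add_edge(source, relationship, target)

print(sorted(graph.neighbors("E1")))
```

Because both sources share the identifier `E1`, the company appears once in the graph and its officer, address and intermediary all end up attached to the same node, which is what makes cross-referencing siloed data possible.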

The result provides unique insights into the offshore interests and tax activities of more than 120 politicians and world leaders. Reporters highlighted the relationships between politicians, offshore companies, and their lawyers.

According to Romera, “graph visualization technologies like Linkurious are a great asset. It’s intuitive for the non-tech-savvy reporters. They just need to click on dots to expand the connections and uncover persons of interest and potential stories in a short time-frame.”

Linkurious Enterprise visualization interface displaying a former Icelandic prime minister’s indirect connections to an offshore account.

With this approach, analysts and investigators can deal with the growing complexity and heterogeneity of data, and get around the problem of multiple data sources and siloed resources. They can uncover hidden networks by focusing on the relationships in complex data. More broadly, this method gives significant results when applied to fraud investigation, anti-money laundering, or first- and third-party bank fraud.
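Finding an indirect connection between two parties amounts to finding a path between two nodes of the graph. The sketch below uses a breadth-first search to return the shortest chain of relationships. It is a generic illustration, not the query engine of Linkurious Enterprise or Neo4j, and the adjacency list and its names are made up.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return the shortest list of nodes linking start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = current
            if neighbor == goal:
                # Walk back from the goal to the start to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical undirected network, stored as an adjacency list.
adjacency = {
    "Person A": ["Company X"],
    "Company X": ["Person A", "Law Firm Y", "Address Z"],
    "Law Firm Y": ["Company X", "Offshore Account Q"],
    "Address Z": ["Company X"],
    "Offshore Account Q": ["Law Firm Y"],
    "Person B": [],
}

print(shortest_path(adjacency, "Person A", "Offshore Account Q"))
```

Here "Person A" has no direct link to the account, but the search surfaces the chain through the company and the law firm, which is the kind of hidden relationship an investigator would then verify against the underlying documents.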

The future of fraud investigation

With only 20 people, ICIJ was able to organize an efficient and reproducible process for 380 journalists to investigate millions of documents for the Paradise Papers. The breakthrough revelations were made possible by Linkurious’ software for data analytics and visualization. Today this technology is used by public authorities, such as the French Ministry of Finance and authorities in other European countries, to fight tax evasion, and by private organizations like banks.

With the investigation tools used for the Paradise Papers now available to them, banks, payment providers and money transfer companies can block more fraud and comply with anti-money laundering regulations.

Linkurious’ CEO Sébastien Heymann believes that it is the right time for companies in the financial sector to improve their investigation units with modern software. Technology will help them dramatically increase their efficiency, control the cost of compliance, and meet regulatory expectations.

5 Responses to “How big data technology is transforming fraud investigations”

  1. David Murphy November 30, 2017 at 4:27 pm #

    Fascinating article, but it leaves some questions unanswered. How was the data model constructed? How was the tagging of the data carried out by such a small team on so many documents? How was the tagging mapped to the model? It would be very interesting to understand the process in more detail, but great piece of work.

  2. Elise Devaux December 1, 2017 at 8:34 am #

    Thank you for the feedback.
    For the graph model, they used the same one as the Offshore Leaks database model (4 node types: entity, officer, address, intermediary), so the Paradise Papers data can ultimately be added to this DB. You can find more details about this in Neo4j’s article.
    It took the ICIJ team months to merge all the sources together. Textual documents were indexed in full text in a search engine using different tools. Different company registries were imported into the graph DB and Linkurious to look for suspicious connections.

