This post was originally published on the Global Witness blog on January 17th, 2018 by Sam Leon, who takes us behind the scenes of Global Witness’ investigation “Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama”. He explains how the teams collected, processed and analyzed the data gathered for the investigation.
JOINING THE DOTS: FROM PROPERTY DEALS AT THE TRUMP OCEAN CLUB, PANAMA TO LATIN AMERICAN DRUG CARTELS, HOW WE’RE USING NEW TECHNOLOGY TO EXPOSE CORRUPT NETWORKS
In October Global Witness’ investigation Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama exposed how profits from Colombian cartels were trafficked through the purchase of real estate units at a Panama property development to which Donald Trump sold his name, making millions in the process. The investigation was based on months of research, interviews and in-depth data work on public sources including the Panamanian Company Registry, the Panamanian Land Registry and OpenCorporates.com.
This blog post describes in broad brush strokes some of the steps we took in the data research for Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama. It outlines how we built programmes that took raw data from OpenCorporates.com and turned it into something that the broader investigative team could visually explore, verify and then incorporate into their final story. The scripts could be re-run quickly whenever the underlying data changed, for example when new companies of interest were identified as having been involved in purchasing property. The approaches outlined here are becoming increasingly central to how organisations like ours collaborate around digital sources and data to expose corruption more effectively in the digital age.
AUTOMATING SEARCHES FOR RELEVANT COMPANIES USING OPENCORPORATES
By using scripts that automatically queried OpenCorporates we were able to incorporate much larger sets of information in our research than manual extraction would have allowed. OpenCorporates is the world’s largest open database of company information; it contains records on over 138 million companies registered in a huge range of jurisdictions. As well as web-based search, it offers data journalists a number of tools via its Application Programming Interface (API) to help them find, extract and connect large numbers of companies in automated workflows. In some cases this can substantially reduce the amount of time required to do investigative tasks. For instance, extracting all the companies of which a given individual is a director can be automated by writing a simple script. In the case of Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama we used this to identify companies that had been created apparently in order to purchase properties at Trump Ocean Club.
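As a rough illustration of the kind of script described above, the Python sketch below builds an officer search request and flattens the response into one row per officer and company link. It is not the code used in the investigation. The endpoint path, the parameter names (`q`, `jurisdiction_code`, `api_token`, `page`) and the response layout are assumptions based on version 0.4 of the public OpenCorporates API and should be checked against the current documentation. The helper names and the sample payload are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed base URL for version 0.4 of the OpenCorporates API.
API_BASE = "https://api.opencorporates.com/v0.4"


def build_officer_search_url(name, jurisdiction_code=None, api_token=None, page=1):
    """Build the URL for an officer search (hypothetical helper)."""
    params = {"q": name, "page": page}
    if jurisdiction_code:
        params["jurisdiction_code"] = jurisdiction_code
    if api_token:
        params["api_token"] = api_token
    return f"{API_BASE}/officers/search?{urlencode(params)}"


def companies_from_officer_results(payload):
    """Flatten a parsed JSON response into one dict per officer-company link.

    Assumes the layout {"results": {"officers": [{"officer": {...}}]}}.
    """
    rows = []
    for item in payload.get("results", {}).get("officers", []):
        officer = item.get("officer", {})
        company = officer.get("company") or {}
        rows.append({
            "officer_name": officer.get("name"),
            "position": officer.get("position"),
            "company_name": company.get("name"),
            "company_number": company.get("company_number"),
            "jurisdiction_code": company.get("jurisdiction_code"),
        })
    return rows


# Placeholder payload in the assumed response shape; not real registry data.
SAMPLE_PAYLOAD = {
    "results": {
        "officers": [
            {"officer": {
                "name": "EXAMPLE PERSON",
                "position": "director",
                "company": {
                    "name": "EXAMPLE HOLDINGS S.A.",
                    "company_number": "000000",
                    "jurisdiction_code": "pa",
                },
            }},
        ]
    }
}

print(build_officer_search_url("Example Person", jurisdiction_code="pa"))
print(companies_from_officer_results(SAMPLE_PAYLOAD))
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `companies_from_officer_results`, looping over pages until the results run out. Any match found this way still needs to be verified against the original registry.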
USING NETWORK ANALYSIS TOOLS – NEO4J AND LINKURIOUS – TO REVEAL PATTERNS IN THE DATA
We used a visualisation tool called Linkurious that enables us to spot and share patterns that would be difficult to detect from tables and statistics alone. As is the case in many Global Witness investigations, our core dataset on Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama was made up of properties, people and companies that represented a complicated real-world network. This network had certain features that were of interest to our researchers. For example, we wanted to know which people directed the most property-purchasing companies and who else the companies were connected to.
In order to find this out and explore the connections in our data we used a database tool called Neo4j and an exploration technology, Linkurious. This tool stack was also used by the International Consortium of Investigative Journalists to interrogate the data contained within both the Panama and Paradise Papers and now powers their Offshore Leaks site. While the dataset for Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama was much smaller than either of these gigantic leaks, these tools helped us to do three things.
First, they enabled us to quantify and see how individuals of interest, such as Alexandre Ventura Nogueira, were connected to companies apparently used to purchase properties at the Trump Ocean Club, Panama. Second, they allowed us to gain a different perspective on our data. Data visualisation has long been understood in statistics as not simply about generating pretty insights, but also integral to the process of data exploration (for those who haven’t seen it, Anscombe’s quartet is a wonderful illustration of this). Third, they allowed us to provide an interactive snapshot of some limited portions of our data when we published the report online.
The interactive above illustrates the networks we found using OpenCorporates and the network analysis tool Linkurious. It shows some of the directors, officers, agents, and subscribers of 7 Homes companies discussed in our investigation Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama. Click on the companies to find out more information about them. Company and officer data was extracted from OpenCorporates and verified in the original Panamanian company registry. Read the full report here.
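The two questions raised in this section, which people directed the most companies and who else those companies were connected to, can be sketched in a few lines of Python. This is a plain-Python stand-in for the kind of query a graph database answers, not the Neo4j or Linkurious setup we used. The edge list, the names in it and the function names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical (person, role, company) links; placeholders, not real data.
EDGES = [
    ("Person A", "director", "Company 1"),
    ("Person A", "director", "Company 2"),
    ("Person A", "subscriber", "Company 3"),
    ("Person B", "director", "Company 2"),
    ("Person C", "agent", "Company 3"),
    ("Person B", "director", "Company 4"),
]


def companies_by_person(edges, role=None):
    """Map each person to the set of companies they are linked to.

    If role is given, only links with that role are counted.
    """
    result = defaultdict(set)
    for person, edge_role, company in edges:
        if role is None or edge_role == role:
            result[person].add(company)
    return result


def rank_people(edges, role="director"):
    """List (person, company count) pairs, largest count first, ties by name."""
    counts = {p: len(c) for p, c in companies_by_person(edges, role).items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def connected_people(edges, person):
    """List the other people who share at least one company with person."""
    by_person = companies_by_person(edges)
    own = by_person.get(person, set())
    return sorted(p for p, cs in by_person.items() if p != person and cs & own)


print(rank_people(EDGES))
print(connected_people(EDGES, "Person A"))
```

A graph database becomes worthwhile once the questions involve longer chains of connections, which are awkward to express with dictionaries and sets.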
CREATING REPRODUCIBLE ANALYSES
An analysis is reproducible if you can reach the same conclusions or make the same inferences based on the same data. The importance of reproducibility to the development of modern science has been understood for at least three centuries. It is the means through which the conclusion of one person or group can be tested and interrogated by others. As more journalism becomes dependent on data sources, the tools used for doing reproducible analysis in the sciences have started to be applied to investigations. These tools help data journalists and the teams they are working with move from the raw data to their conclusions, documenting each step on the way. For those wanting a good explanation of the key components of a reproducible analysis that can be shared with others, see Jeff Leek. In Narco-A-Lago: Money Laundering at the Trump Ocean Club, Panama all our analysis was written in the easy-to-learn programming language Python and stored as a reproducible script in a Jupyter Notebook. This had three key advantages.
- If the underlying data was changed (for instance we identified new companies that had been set up with the apparent intention of being used to purchase Trump Ocean Club properties) the analysis could be re-run at the click of a button. This saved us time and reduced the risk of introducing errors when re-doing the analysis by hand.
- Every step that we took extracting, cleaning and analysing the data was written out in plain non-technical English so that we developed a record of each assumption we made. This meant that lawyers, researchers and investigators could understand and agree on the chosen approach. Having the analysis organised and stored in this fashion is also very helpful later down the line when revisiting an analysis that was undertaken previously. Anyone who has undertaken complicated data transformation and analysis using Excel will know how easy it is to lose track of the steps taken to arrive at your final analysis.
- Jupyter Notebooks can easily be shared within a team and published online in conjunction with a story. This enables others to rerun your analysis and understand the exact steps you took to reach your conclusions. Code sharing sites like GitHub now render notebooks and any visuals they contain, so specialist software is no longer required to view them. You can see our recent analysis of property owned by offshore companies in England and Wales in the form of a Jupyter Notebook here.
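To make the idea of a re-runnable analysis concrete, here is a minimal sketch of a pipeline in which each step is a small documented function and the whole analysis is a single call that can be repeated whenever the input changes. It is an illustration only, not our notebook. The column names (`director`, `company`), the cleaning rule and the input fingerprint are assumptions made for the example, and the data is placeholder text.

```python
import csv
import hashlib
import io
from collections import Counter


def load_rows(csv_text):
    """Step 1: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean_rows(rows):
    """Step 2: normalise names and drop duplicate director-company pairs.

    Assumption: two names are the same once trimmed and upper-cased.
    """
    cleaned, seen = [], set()
    for row in rows:
        record = (row["director"].strip().upper(), row["company"].strip().upper())
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def run_analysis(csv_text):
    """Step 3: count companies per director and fingerprint the input."""
    records = clean_rows(load_rows(csv_text))
    counts = Counter(director for director, _ in records)
    return {
        # The hash records exactly which version of the data was analysed.
        "input_sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
        "companies_per_director": dict(counts),
    }


# Placeholder input; the second version adds one newly identified company.
DATA_V1 = (
    "director,company\n"
    "Person A,Company 1\n"
    " person a ,company 1\n"
    "Person B,Company 2\n"
)
DATA_V2 = DATA_V1 + "Person A,Company 3\n"

print(run_analysis(DATA_V1)["companies_per_director"])
print(run_analysis(DATA_V2)["companies_per_director"])
```

Because every assumption sits in a named step, a colleague can challenge the cleaning rule without having to reverse-engineer the rest of the analysis.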