We have previously explored how businesses are targeted by fraud schemes. Countries are victims of fraud too. In Europe, criminals manipulate the Value Added Tax (VAT) to steal hundreds of millions of euros. Let’s see how these rings operate and how graph technologies can help detect them.
Getting rich with taxes: not a privilege reserved for sovereign states
The Value Added Tax, or VAT, is a consumption tax assessed on the value added to goods and services. In the countries that apply it, such as European countries, consumers pay VAT every time they buy a product or service. It is a very important source of income: France, for example, levies 135 billion euros in VAT. That is twice what French citizens pay in income tax.
Unlike sales taxes, which are calculated and paid once, VAT involves a lot of paperwork: each company has to keep track of it whenever it makes a transaction, regardless of whether it is dealing with a consumer or with another business.
VAT is complex and costly to administer. It is no surprise that fraudsters try to take advantage of this, particularly in Europe, where the scheme is known as “carousel fraud”.
In 2012, in the United Kingdom, a fraud ringleader was jailed for 17 years, together with fifteen of his accomplices across five trials, for faking the sale of 4m phones through ghost companies in a complex £176m VAT scam. Before their arrests, these criminals achieved results that would make bank robbers and drug dealers jealous. Their system generated millions in cash on a regular basis: let’s see how.
How does carousel fraud work?
In a carousel fraud, a fraudster imports goods VAT-free. He then sells the goods to a company controlled by an accomplice and charges it VAT. The goods are then sold through a series of companies, each liable for VAT, and finally exported.
The first link in the chain always disappears. He vanishes with the VAT that he has charged his customer and that he should have reported and transferred to the tax agency. The final link also disappears, but not before he has reclaimed from the tax agency the VAT he has paid.
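To make the money flow concrete, here is a minimal Python sketch of the chain described above. The 20% rate, the prices and the three company roles are illustrative assumptions, not figures from a real case. It adds up what the tax agency receives or refunds at each link.

```python
from dataclasses import dataclass

VAT_RATE_PERCENT = 20  # assumed rate, for illustration only


@dataclass
class Link:
    name: str
    net_sale_price: int       # price before VAT, in whole euros
    exports: bool = False     # exports are zero-rated: no VAT is charged
    disappears: bool = False  # a missing trader never files a VAT return


def treasury_balance(chain: list[Link]) -> int:
    """Net VAT received by the tax agency across the whole chain."""
    balance = 0
    input_vat = 0  # the first link imports the goods VAT-free
    for link in chain:
        if link.exports:
            output_vat = 0
        else:
            output_vat = link.net_sale_price * VAT_RATE_PERCENT // 100
        if not link.disappears:
            # A link that files its return remits output VAT minus input VAT;
            # a negative amount is a refund paid out by the tax agency.
            balance += output_vat - input_vat
        input_vat = output_vat  # the next link paid this VAT to its supplier
    return balance


# Hypothetical three-link chain
chain = [
    Link("missing trader", 1_000_000, disappears=True),
    Link("buffer company", 1_050_000),
    Link("exporter", 1_100_000, exports=True),
]
print(treasury_balance(chain))  # -200000
```

Under these assumptions the tax agency ends up 200,000 euros short, which is exactly the VAT the first link charged and never transferred. The intermediate margins do not change that loss.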
Carousel fraud requires a lot of sophistication. The criminals have to invest money, create companies and execute a series of transactions in a short amount of time to be successful. The returns are substantial: in various European cases, a few criminals appropriated more than a hundred million euros.
An added benefit of this fraud, for the criminals, is that it is hard to prove and even harder to detect before they have vanished. Thankfully, graphs can change that!
A fraud detection example for VAT fraud
To prove this, I worked with Scott Mongeau, a data scientist from SARK7. Together we prepared a dataset that represents the kind of data national tax agencies may hold on fraudsters. It includes company information, business transactions, tax reports and access to a blacklist of known fraudsters. Each of these sources of information can help tax investigators identify fraudsters. Typically, the problem is that the information sits in separate silos. It is thus very hard to piece it together and build a complete picture: criminals use this to their advantage and slip through the cracks.
A graph data model is going to help us solve the technical dimension of the challenge:
Here we can see in a single picture how the different entities relate to each other. This makes things easier to understand. Furthermore, the graph data model is going to allow us to ask questions by looking at all the data at the same time, instead of focusing on a specific silo.
The traditional fraud detection systems that VAT criminals know how to bypass use statistical techniques to identify, for example, a suspicious company, set of transactions or individual profile. Each result is interesting, but we want to look across the different data sources to identify a potential criminal ring faster and more precisely.
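As a rough illustration of what looking at all the data at the same time means, the Python sketch below merges three of the silos (company records, transactions and the blacklist) into one small in-memory graph. All records, identifiers and relation names (`DIRECTOR_OF`, `SELLS_TO`) are hypothetical and are not taken from our dataset. Tax reports are left out to keep it short.

```python
from collections import defaultdict

# Hypothetical records, one list per silo.
companies = [
    {"id": "C1", "country": "IT", "director": "P1"},
    {"id": "C2", "country": "UK", "director": "P2"},
    {"id": "C3", "country": "US", "director": "P3"},
]
transactions = [
    {"seller": "C1", "buyer": "C2"},
    {"seller": "C2", "buyer": "C3"},
]
blacklist = ["P1"]


def build_graph(companies, transactions, blacklist):
    """Return (node properties, adjacency map: node -> [(relation, neighbour)])."""
    nodes = {}
    edges = defaultdict(list)
    for c in companies:
        nodes[c["id"]] = {"kind": "company", "country": c["country"]}
        nodes.setdefault(c["director"], {"kind": "person", "blacklisted": False})
        edges[c["director"]].append(("DIRECTOR_OF", c["id"]))
    for t in transactions:
        edges[t["seller"]].append(("SELLS_TO", t["buyer"]))
    for person in blacklist:
        nodes.setdefault(person, {"kind": "person"})["blacklisted"] = True
    return nodes, edges


def buyers_from_blacklisted(nodes, edges):
    """Companies that bought from a company run by a blacklisted director."""
    found = set()
    for person, props in nodes.items():
        if props.get("blacklisted"):
            for relation, company in edges[person]:
                if relation == "DIRECTOR_OF":
                    found.update(b for r, b in edges[company] if r == "SELLS_TO")
    return sorted(found)


nodes, edges = build_graph(companies, transactions, blacklist)
print(buyers_from_blacklisted(nodes, edges))  # ['C2']
```

The question answered here crosses three silos in one traversal: blacklist to director, director to company, company to customer. With separate tables it would need several joins.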
Graph analytics are going to help us identify the VAT fraud in our dataset.
Hunting for fraudsters with graph analytics
To identify potential fraudsters based on the data we have, we are going to look for:
- a chain of at least three transactions that involves companies from two different countries;
- a young company in the middle of the chain (fraudsters like to create dummy companies they can easily discard when they disappear);
- transactions that all occur within a short amount of time.
Together, these characteristics define a fraud pattern. Fraud analysts are experts at articulating such patterns, which reflect their experience of the scams and the signs they look out for to identify them. Fraud analysts cannot, however, analyse hundreds of millions of data points by hand. To do that they have to rely on technology.
There are a couple of options to analyse graph data. Today, we are going to use Neo4j, the leading graph database. It embeds a query language, Cypher, designed to search data in a graph.
Here is the Cypher query (designed with the help of Jim Biard) that returns all the companies matching our fraud pattern:
// The opening MATCH clause is missing from this copy of the query. The line below is an assumed reconstruction: the variable names come from the rest of the query, the minimum path length from the pattern described above.
MATCH p = (a)-[rs*3..]->(c)
WHERE a.country <> c.country
WITH p, a, c, rs, nodes(p) AS ns
WITH p, a, c, rs, filter(n IN ns WHERE n.epoch - 1383123473 < (90*60*60*24)) AS bs
WITH p, a, c, rs, head(bs) AS b
WHERE NOT b IS NULL
WITH p, a, b, c, head(rs) AS r1, last(rs) AS rn
WITH p, a, b, c, r1, rn, rn.epoch - r1.epoch AS d
WHERE d < (15*60*60*24)
RETURN a, b, c, d, r1, rn
These few lines of code are sufficient to analyse our data and detect a potential carousel fraud case. The query could be run, for example, when a new transaction occurs or when a fraudster is added to the financial crime blacklist. A simple query doesn’t make a fraud detection system though. The query will produce false positives, miss cases and need to be complemented by human analysis. That is where graph visualization comes in.
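For readers who do not have Neo4j at hand, the same pattern can be sketched in plain Python over an in-memory list of transactions. This is not a translation of the query, only an approximation under stated assumptions. A company counts as young if it was created less than 90 days before the reference timestamp used in the query. Only intermediate companies are tested for age. Transactions in a chain must be in chronological order. The company names and dates are made up.

```python
DAY = 60 * 60 * 24
REFERENCE_EPOCH = 1383123473     # timestamp that appears in the query
MAX_COMPANY_AGE = 90 * DAY       # "young" company threshold
MAX_CHAIN_DURATION = 15 * DAY    # "short amount of time" threshold


def find_chains(companies, transactions, min_length=3):
    """Yield (first, young, last, duration) for each chain matching the pattern.

    companies: {name: {"country": str, "epoch": creation timestamp}}
    transactions: list of (seller, buyer, epoch) tuples
    """
    outgoing = {}
    for seller, buyer, epoch in transactions:
        outgoing.setdefault(seller, []).append((buyer, epoch))

    def walk(path, epochs):
        if len(epochs) >= min_length:
            first, last = path[0], path[-1]
            # Intermediate companies created shortly before the reference date
            young = [n for n in path[1:-1]
                     if 0 <= REFERENCE_EPOCH - companies[n]["epoch"] < MAX_COMPANY_AGE]
            duration = epochs[-1] - epochs[0]
            if (companies[first]["country"] != companies[last]["country"]
                    and young and duration < MAX_CHAIN_DURATION):
                yield first, young[0], last, duration
        for buyer, epoch in outgoing.get(path[-1], []):
            # Assumption: no company twice in a chain, sales in time order
            if buyer not in path and (not epochs or epoch >= epochs[-1]):
                yield from walk(path + [buyer], epochs + [epoch])

    for start in companies:
        yield from walk([start], [])


# Hypothetical data
companies = {
    "Alpha": {"country": "IT", "epoch": REFERENCE_EPOCH - 400 * DAY},
    "Beta": {"country": "UK", "epoch": REFERENCE_EPOCH - 30 * DAY},
    "Gamma": {"country": "UK", "epoch": REFERENCE_EPOCH - 500 * DAY},
    "Delta": {"country": "US", "epoch": REFERENCE_EPOCH - 600 * DAY},
}
transactions = [
    ("Alpha", "Beta", REFERENCE_EPOCH - 10 * DAY),
    ("Beta", "Gamma", REFERENCE_EPOCH - 8 * DAY),
    ("Gamma", "Delta", REFERENCE_EPOCH - 5 * DAY),
]
print(list(find_chains(companies, transactions)))
# [('Alpha', 'Beta', 'Delta', 432000)]
```

The exhaustive path walk is fine for a toy dataset but grows quickly with the number of transactions, which is one reason to leave this kind of search to a graph database.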
Graph visualization is the last mile of the analysis
Graph visualization enables analysts to understand graph data faster. For example, we can look at the result our fraud query returns:
In a single picture, we can see that two chains of transactions match our pattern. In orange we see Italian companies, in dark green a US company and in light green a UK company.
In real life, this picture would be the starting point of the investigation, not its end. Are these transactions really criminal? Who are the people involved? How much is at stake? To answer these questions, an analyst must take a close look at the data. Graph visualization allows analysts to answer these questions quickly.
This time we have chosen to add to the graph the different entities our original companies are linked to. In purple we see the holding companies, in pink the companies and in green the people. The graph shows that the two original transactions are actually linked. In particular, Joint Bridge Co. controls Joint IT Group which is doing business with Swift Co. This gives us an overview of the fraud case that would be long and difficult to build by reading a text or looking at a table.
At this point, we might want to zoom in on the companies based in Italy.
In a few clicks, we can see that the fraud seems to start with Cletis Bysshe. He is a Director of Souther Europa Telco. That company has sold phone card rights to Joint Bridge Co. That information can be accessed while looking at the graph.
Graph visualization complements the automatic graph analysis. The algorithms are great at surfacing information hidden in large datasets, but in sensitive contexts like fraud, we need humans to analyse the information before decisions are made. Graph visualization solutions like Linkurious help data analysis experts investigate suspicious cases: they can dismiss the cases that turn out not to be suspicious and investigate the real ones further.
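The chain of links read off the picture can also be recovered programmatically. The Python sketch below encodes only the four relationships mentioned in the text and runs a breadth-first search between two entities. Treating the links as undirected is an assumption made for exploration purposes, and the function name is ours.

```python
from collections import deque

# The four links described in the text.
links = [
    ("Cletis Bysshe", "is a director of", "Souther Europa Telco"),
    ("Souther Europa Telco", "sold phone card rights to", "Joint Bridge Co."),
    ("Joint Bridge Co.", "controls", "Joint IT Group"),
    ("Joint IT Group", "does business with", "Swift Co."),
]


def shortest_path(links, start, goal):
    """Breadth-first search over the links, ignoring their direction."""
    neighbours = {}
    for source, _, target in links:
        neighbours.setdefault(source, []).append(target)
        neighbours.setdefault(target, []).append(source)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two entities are not connected


print(" -> ".join(shortest_path(links, "Cletis Bysshe", "Swift Co.")))
# Cletis Bysshe -> Souther Europa Telco -> Joint Bridge Co. -> Joint IT Group -> Swift Co.
```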
Tax fraud is a major problem in most countries. In the European Union, some VAT fraud rings rack up hundreds of millions of euros. Graph technologies can help detect these schemes and bring the criminals to justice.