We have explored in the past how businesses are targeted by fraud schemes. Countries are victims of fraud too. In Europe, criminals manipulate the Value Added Tax (VAT) and can pocket hundreds of millions of euros. Let’s see how these VAT fraud rings operate and how graph technologies can help detect them.
Getting rich with taxes: not a privilege reserved for sovereign states
The Value Added Tax, or VAT, is a consumption tax assessed on the value added to goods and services. In countries that apply it, such as European countries, consumers pay VAT every time they buy a product or service. It is a very important source of income: France, for example, collects more than 200 billion euros in VAT. That is twice what French citizens pay in income tax.
Unlike sales taxes, which are calculated and paid once, VAT involves a lot of paperwork: each company has to keep track of it for every transaction it makes, whether it is dealing with a consumer or with another business.
VAT is complex and costs a lot of money. It is no surprise that fraudsters try to take advantage of this, particularly in Europe, where the scheme is known as “carousel fraud.”
In 2018, the European Anti-Fraud Office (OLAF) arrested eight criminals running a highly sophisticated carousel fraud scheme designed to avoid VAT duties in several countries across the EU. The amount of evaded VAT was estimated at around €30 million. Let’s see how VAT criminals proceed.
How does carousel fraud work?
In carousel fraud, a form of VAT fraud, a fraudster imports goods VAT-free. He sells the goods to a company controlled by an accomplice and charges VAT on the sale. The goods are then sold through a series of companies, each liable for VAT, and finally exported.
The first link in the chain always disappears. He vanishes with the VAT that he charged his customer and that he should have reported and transferred to the tax agency. The final link also disappears, but not before reclaiming from the tax agency the VAT he paid.
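To make the mechanism concrete, here is a minimal Python sketch of the money flows in a simplified three-company chain. The 20% rate, the prices and the roles (`missing_trader`, `buffer`, `exporter`) are illustrative assumptions, not figures from a real case.

```python
from decimal import Decimal

VAT_RATE = Decimal("0.20")  # assumed rate, for illustration only


def vat_due(net_sales, net_purchases, sales_taxed=True, purchases_taxed=True):
    """VAT a company owes the tax agency: VAT charged on its sales
    minus VAT paid on its purchases. A negative result is a refund."""
    output_vat = net_sales * VAT_RATE if sales_taxed else Decimal("0")
    input_vat = net_purchases * VAT_RATE if purchases_taxed else Decimal("0")
    return output_vat - input_vat


# Hypothetical chain: the first link imports VAT-free and sells with VAT,
# a buffer company resells, and the last link exports VAT-free.
missing_trader = vat_due(Decimal("1000000"), Decimal("900000"), purchases_taxed=False)
buffer = vat_due(Decimal("1050000"), Decimal("1000000"))
exporter = vat_due(Decimal("1100000"), Decimal("1050000"), sales_taxed=False)

# If every company files honestly, payments and refunds cancel out.
if_everyone_pays = missing_trader + buffer + exporter

# In the fraud, the first link vanishes without paying what it owes,
# while the exporter still collects its refund.
with_fraud = buffer + exporter

loss_for_tax_agency = if_everyone_pays - with_fraud
print(f"VAT lost: {loss_for_tax_agency:,.0f}")
```

Under these assumptions the tax agency refunds VAT that it never received, and its loss equals the VAT charged by the first link.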
Carousel fraud requires a lot of sophistication. The criminals have to invest money, create companies and execute a series of transactions in a short amount of time to be successful. The proceeds are substantial: in several European cases, a few criminals appropriated more than a hundred million euros.
The added ‘benefit’ of this fraud is that it is hard to prove and even harder to detect before the criminals vanish. Thankfully, graphs can change that!
A fraud detection example for VAT fraud
To prove this, I have worked with Scott Mongeau, a Data Scientist from SARK7. Together we have prepared a dataset that represents the kind of data national tax agencies can have on the fraudsters. It includes company information, business transactions, tax reports and access to a blacklist of known fraudsters. Each of these sources of information can help tax investigators identify fraudsters. Typically, the problem is that the information sits in separate silos. It is thus very hard to piece it together and build a complete picture: criminals use this to their advantage and slip through the cracks.
A graph data model is going to help us solve the technical dimension of the challenge:
Here we can see in a single picture how the entities relate to one another. It’s one of the advantages of graphs: they make things easier to understand. Furthermore, the graph data model is going to allow us to ask questions by looking at all the data at the same time, instead of focusing on a specific silo.
The traditional fraud detection systems that VAT criminals know how to bypass use statistical techniques to identify, for example, a suspicious company, set of transactions or individual profile. Each result is interesting, but we want to look across the different data sources to identify a potential criminal ring faster and more precisely.
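As a rough illustration of the idea, the sketch below stores companies, people and tax reports in one small in-memory graph. The node labels, relationship types, property names and sample records are all assumptions made for the example; the schema of the actual dataset may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # assumed labels: "Company", "Person", "TaxReport"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    kind: str  # assumed types: "DIRECTOR_OF", "SOLD_TO", "FILED"
    target: str


class Graph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, kind, target):
        self.edges.append(Edge(source, kind, target))

    def neighbors(self, node_id):
        """Ids of all nodes directly connected to node_id, in either direction."""
        linked = set()
        for edge in self.edges:
            if edge.source == node_id:
                linked.add(edge.target)
            elif edge.target == node_id:
                linked.add(edge.source)
        return linked


# Hypothetical records that would normally live in separate silos.
graph = Graph()
graph.add_node("company_a", "Company", country="IT", founded="2018-01-15")
graph.add_node("company_b", "Company", country="GB", founded="2012-06-01")
graph.add_node("person_x", "Person", blacklisted=True)
graph.add_node("report_1", "TaxReport", period="2018-Q1")
graph.add_edge("person_x", "DIRECTOR_OF", "company_a")
graph.add_edge("company_a", "SOLD_TO", "company_b")
graph.add_edge("company_a", "FILED", "report_1")

# One lookup returns the director, the customer and the tax report together.
around_company_a = graph.neighbors("company_a")
```

A graph database does the same thing at scale, with indexing and a query language on top.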
Graph analytics is going to help us identify the VAT fraud in our dataset.
Hunting for fraudsters with graph analytics
To identify potential fraudsters based on the data we have, we are going to look for:
- a chain of at least three transactions that includes companies from two different countries;
- a young company in the middle of the chain (fraudsters like to create dummy companies they can easily discard when they disappear);
- transactions that occur within a short amount of time.
Together, these characteristics define a fraud pattern. Fraud analysts are experts at articulating such patterns, which reflect their experience of the scams and the signs they look out for to identify them. Fraud analysts cannot, however, analyze hundreds of millions of data points by hand. To do that, they have to rely on technology.
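The three criteria can be sketched in plain Python to show what such a pattern looks like once it is made precise. This is not the query used in this article: the thresholds, the sample companies and the choice to look at chains of exactly three transactions, treating both intermediaries as "the middle", are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import NamedTuple

MAX_COMPANY_AGE = timedelta(days=365)  # assumed meaning of "young"
MAX_CHAIN_SPAN = timedelta(days=30)    # assumed meaning of "short amount of time"


class Transaction(NamedTuple):
    seller: str
    buyer: str
    day: date


# Hypothetical sample data.
companies = {
    "A": {"country": "IT", "founded": date(2010, 1, 1)},
    "B": {"country": "IT", "founded": date(2018, 1, 15)},
    "C": {"country": "GB", "founded": date(2012, 6, 1)},
    "D": {"country": "IT", "founded": date(2009, 3, 1)},
    "E": {"country": "IT", "founded": date(2001, 5, 1)},
}
transactions = [
    Transaction("A", "B", date(2018, 3, 1)),
    Transaction("B", "C", date(2018, 3, 5)),
    Transaction("C", "D", date(2018, 3, 9)),
    Transaction("D", "E", date(2018, 9, 1)),  # months later, so not part of a fast chain
]


def find_suspicious_chains(companies, transactions):
    """Return the company paths of three chained transactions that match the pattern."""
    by_seller = {}
    for tx in transactions:
        by_seller.setdefault(tx.seller, []).append(tx)

    chains = []
    for t1 in transactions:
        for t2 in by_seller.get(t1.buyer, []):
            for t3 in by_seller.get(t2.buyer, []):
                if len({t1, t2, t3}) < 3 or not (t1.day <= t2.day <= t3.day):
                    continue
                path = [t1.seller, t1.buyer, t2.buyer, t3.buyer]
                cross_border = len({companies[c]["country"] for c in path}) >= 2
                fast = t3.day - t1.day <= MAX_CHAIN_SPAN
                young_middle = any(
                    t1.day - companies[c]["founded"] <= MAX_COMPANY_AGE
                    for c in path[1:3]
                )
                if cross_border and fast and young_middle:
                    chains.append(path)
    return chains


suspicious = find_suspicious_chains(companies, transactions)
```

Nested loops like these do not scale to hundreds of millions of records, which is why a graph database and its query language are a better fit for the real workload.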
There are a couple of options for analyzing graph data. Today, we are going to use Neo4j, a leading graph database. It comes with a query language, Cypher, designed to search graph data for patterns.
Here is the Cypher query (designed with the help of Jim Biard) that returns all the companies matching our fraud pattern:
These few lines of code are sufficient to analyze our data and detect a potential carousel fraud case. The query could be run for example when a new transaction occurs, when a fraudster is added to the financial crime blacklist, etc. A simple query doesn’t make a fraud detection system, though. The query will have false positives, miss cases and need to be complemented by human analysis. That is where graph visualization comes in.
Graph visualization is the last mile of the analysis
Graph visualization enables analysts to understand graph data faster. For example, we can look at the result our fraud query returns:
We can see that two chains of transactions match our pattern. We see Italian companies, a US company, and a UK company.
In real life, this visualization would be the starting point of the investigation, not its end. Are these transactions really criminal? Who are the people involved? How much is at stake? To answer these questions, an analyst must take a close look at the data. Graph visualization tools like Linkurious Enterprise allow analysts to answer these questions quickly.
This time we have chosen to add to the graph the different entities our original companies are linked to. Holding companies are shown with a flag background, companies in blue and people in turquoise. The graph shows that the two original chains of transactions are actually linked. In particular, Joint Bridge Co. controls Joint IT Group, which is doing business with Swift Co. This gives us an overview of the fraud case that would be slow and difficult to build by reading a text or looking at a table.
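Behind the visual expansion, the underlying question is whether a path connects two entities. A minimal sketch of that check is a breadth-first search over the relationships just described. The relationship type names and the unrelated company are hypothetical, and the sketch assumes Joint Bridge Co. and Swift Co. sit in different flagged chains.

```python
from collections import deque

# Relationships described above, plus one hypothetical unrelated pair.
edges = [
    ("Joint Bridge Co.", "CONTROLS", "Joint IT Group"),
    ("Joint IT Group", "DOES_BUSINESS_WITH", "Swift Co."),
    ("Hypothetical Unrelated Ltd", "DOES_BUSINESS_WITH", "Another Hypothetical Ltd"),
]


def build_adjacency(edges):
    """Undirected adjacency map: a link counts whichever way it points."""
    adjacency = {}
    for source, _kind, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest list of nodes or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(edges)
link = shortest_path(adjacency, "Joint Bridge Co.", "Swift Co.")
no_link = shortest_path(adjacency, "Joint Bridge Co.", "Hypothetical Unrelated Ltd")
print(" -> ".join(link))
```

A visualization tool lets an analyst run the same kind of exploration interactively, without writing code.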
At this point, we might want to focus on the companies based in Italy.
In a few clicks, we can see that the fraud seems to start with Cletis Bysshe. He is a director of Southern Europa Telco, which sold phone card rights to Joint Bridge Co. That information can be accessed by looking at the graph and the data properties.
Graph visualization complements the automatic graph analysis. Algorithms are great at surfacing information hidden in large datasets, but in sensitive contexts like fraud, we need humans to analyze the information before decisions are made. Graph visualization platforms like Linkurious Enterprise allow data analysis experts to investigate suspicious cases faster: they can search, expand, filter and review data connections to decide whether or not a case is suspicious, and investigate the real cases further.
VAT fraud is a major problem in most countries. In the European Union, some VAT fraud rings rack up hundreds of millions of euros. Graph technology is a real asset for investigators, helping them detect these schemes and bring criminals to justice.