Fraud detection in retail

Fraud detection is all about connecting the dots. In this article, we show how to use graph analysis to identify stolen credit cards and fake identities. We worked on it with Ralf Becher of irregular.bi. Ralf is a Qlik Luminary who provides solutions that integrate the graph approach into business intelligence tools such as QlikView and Qlik Sense.

Third party fraud in retail

Third party fraud occurs when a criminal uses someone else’s identity to commit fraud. For a typical retail operation, this takes the form of individuals or groups using stolen credit cards to purchase high-value items.

Fighting it is a challenge. It requires the ability to detect potential fraud cases in large datasets and to distinguish real cases from false positives (cases that look suspicious but are legitimate).

Traditional fraud detection systems focus on thresholds related to customer activity. Suspicious activities include, for example, multiple purchases of the same product or a high number of transactions per person or per credit card.
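
As a point of comparison, a threshold rule of this kind can be sketched in a few lines of Python. This is only an illustration: the records, the field names and the threshold value are assumptions, not part of the article's dataset.

```python
from collections import Counter

# Hypothetical order records; the field names are assumptions for illustration.
orders = [
    {"order_id": 1, "person": "A", "credit_card": "1111"},
    {"order_id": 2, "person": "A", "credit_card": "1111"},
    {"order_id": 3, "person": "A", "credit_card": "1111"},
    {"order_id": 4, "person": "B", "credit_card": "2222"},
]


def cards_over_threshold(orders, max_orders_per_card):
    """Return the credit cards used in more orders than the threshold allows."""
    counts = Counter(order["credit_card"] for order in orders)
    return sorted(card for card, n in counts.items() if n > max_orders_per_card)


flagged = cards_over_threshold(orders, max_orders_per_card=2)
print(flagged)
```

A rule like this looks at each card in isolation. It says nothing about how cards, people and addresses relate to each other, which is the gap graph analysis aims to fill.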

Graph analysis can add an extra layer of security by focusing on the relationships between fraudsters or fraud cases. It helps identify fraud cases that would otherwise go undetected until it is too late. We recently explained how to use graph analysis to identify stolen credit cards.

For this article, we have prepared a dummy dataset typical of an online retail operation. It includes:

  • order details: product, amount, order-id, date;
  • personal details: first name, last name;
  • contact info: phone, email;
  • payment: credit card;
  • shipping: address, zip, city, country;
  • tracking: IP address.

To analyse the connections in our data, we stored it in Neo4j, the leading graph database. The graph approach consists of modelling data as nodes and edges. Here is a schema of our data represented as a graph:

Graph data model

You can download the data here.
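
To give a rough idea of what modelling data as nodes and edges means, here is a minimal Python sketch that splits one flat order record into nodes and edges. The relationship types (ORDERED, USED_EMAIL and so on) are the ones that appear in the Cypher queries of this article. The record, its field names and the `record_to_graph` helper are hypothetical, and the sketch only links facts to the person, which may differ from the actual schema.

```python
# Hypothetical flat order record; the field names and values are assumptions.
record = {
    "order_id": "o-1",
    "full_name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-0100",
    "ip": "203.0.113.7",
    "credit_card": "card-1",
    "address": "1 Main St",
}

# Relationship types as they appear in the article's Cypher queries.
FACT_RELATIONSHIPS = {
    "email": "USED_EMAIL",
    "phone": "USED_PHONE",
    "ip": "USED_IP",
    "credit_card": "USED_CREDIT_CARD",
    "address": "USED_ADDRESS",
}


def record_to_graph(record):
    """Split one flat record into a set of nodes and a set of edges."""
    person = ("Person", record["full_name"])
    order = ("Order", record["order_id"])
    nodes = {person, order}
    edges = {(person, "ORDERED", order)}
    for field, rel_type in FACT_RELATIONSHIPS.items():
        # Each piece of personal information becomes its own node.
        fact = (field, record[field])
        nodes.add(fact)
        edges.add((person, rel_type, fact))
    return nodes, edges


nodes, edges = record_to_graph(record)
print(len(nodes), "nodes,", len(edges), "edges")
```

Because each node is identified by its kind and value, two records with the same email would point to the same email node. That is what makes shared information visible in a graph.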

Finding suspicious transactions

Now that the data is stored in Neo4j, we can analyse it.

First of all we need to set a benchmark for what’s normal. Here is an example of a transaction:

Example of a legitimate account

Now that we have an idea of what not to look for, we can start thinking about patterns specifically associated with fraud. One such pattern is a piece of personal information (IP, email, credit card, address) associated with multiple persons.

Neo4j includes a graph query language called Cypher that allows us to detect such a pattern. Here is how to do it:

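// -----------------------
// Detect fraud pattern
// -----------------------
// Every order together with the person who placed it
MATCH (order:Order)<-[:ORDERED]-(person:Person)
// Every node directly connected to that order (a candidate shared "fact")
MATCH (order)-[]-(fact)
// Group by fact: the orders and the distinct people attached to it
WITH fact, collect(order) AS orders, collect(DISTINCT person) AS people
// Keep facts linked to more than one order and more than one person
WHERE size(orders) > 1 AND size(people) > 1
RETURN fact, orders, people
LIMIT 20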

This query searches for shared pieces of personal information. It returns every group of at least two persons and two orders connected by a common piece of personal information.
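
To make the grouping logic concrete, here is a minimal plain-Python sketch of what the query computes. The rows below stand in for the (order, person, fact) combinations matched by the query. They are made up for illustration and do not come from the dataset, and `shared_facts` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (order, person, fact) rows; values are made up for illustration.
rows = [
    ("o-1", "Alice", ("email", "shared@example.com")),
    ("o-2", "Bob", ("email", "shared@example.com")),
    ("o-3", "Carol", ("email", "carol@example.com")),
    ("o-4", "Carol", ("email", "carol@example.com")),
]


def shared_facts(rows):
    """Group rows by fact and keep facts tied to >1 order and >1 person."""
    orders = defaultdict(list)  # mirrors collect(order)
    people = defaultdict(set)   # mirrors collect(distinct person)
    for order, person, fact in rows:
        orders[fact].append(order)
        people[fact].add(person)
    return {
        fact: (orders[fact], sorted(people[fact]))
        for fact in orders
        if len(orders[fact]) > 1 and len(people[fact]) > 1
    }


result = shared_facts(rows)
for fact, (fact_orders, fact_people) in result.items():
    print(fact, fact_orders, fact_people)
```

In this toy input, Carol's email appears on two orders but belongs to a single person, so it is not reported. Only the email used by both Alice and Bob matches the pattern.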

To verify the accuracy of our query, fine-tune it or evaluate how to act on the alerts it returns, we will use graph visualization.

Case #1: multiple people sharing the same email

The address edmund@gmail.com (center) is shared by 3 people (purple nodes)

Here we can see that 3 persons share the same email address. Are we looking at potential fraud? If we expand the graph, we can see that these 3 persons have distinct addresses, IPs, phones and credit cards.

Data associated with the 3 distinct people using edmund@gmail.com

In isolation, each of these persons looks normal. Edmund Cagliostro, for example, seems like a legitimate customer.

Details of Edmund Cagliostro

The fact that these seemingly distinct accounts share a common email address is suspicious. It justifies further investigation of Edmund Cagliostro and his connections.

Case #2: multiple people using the same IP address

Our query also reveals an IP address shared by multiple persons.

An IP address (center) with connections to 5 persons (purple) and orders (orange)

We can see that IP address 0.106.244.75 is shared by 5 people. Once again this is suspicious and should be investigated.

Graph visualization can help us inspect potential fraud cases and quickly evaluate them.

Identifying a ring of fraudsters

Now that we have found a couple of suspicious cases, it’s time to dig deeper. We want to assess the full impact of an individual fraud case so that we can take appropriate action.

Let’s say we noticed in our dummy dataset that a “Leisa Gugliotta” is involved in a fraud. Not only do we want to block any transactions from her but we also need to identify her potential accomplices. In order to do that, we need to see who else is using the personal information used by Leisa Gugliotta.

Here is how to do that via Cypher:

// -----------------------
// Who are Leisa’s accomplices?
// -----------------------
MATCH (suspect:Person {full_name: "Leisa Gugliotta"})
// Personal information used by the suspect
MATCH (fact)<-[:USED_EMAIL|USED_PHONE|USED_IP|USED_CREDIT_CARD|USED_ADDRESS]-(suspect)
// Anyone else who used the same information
MATCH (fact)<-[:USED_EMAIL|USED_PHONE|USED_IP|USED_CREDIT_CARD|USED_ADDRESS]-(other)
WHERE suspect <> other
RETURN suspect, other, collect(DISTINCT fact) AS facts
LIMIT 20
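
For readers less familiar with Cypher, the same lookup can be sketched in plain Python. The (person, fact) pairs are hypothetical and use placeholder names rather than values from the dataset, and `find_accomplices` is an illustrative helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical (person, fact) pairs: who used which piece of information.
usages = [
    ("Suspect", ("email", "ring@example.com")),
    ("Person B", ("email", "ring@example.com")),
    ("Person C", ("email", "ring@example.com")),
    ("Suspect", ("credit_card", "card-1")),
    ("Person B", ("credit_card", "card-1")),
    ("Person D", ("email", "d@example.com")),
]


def find_accomplices(usages, suspect):
    """Map every other person to the facts they share with the suspect."""
    suspect_facts = {fact for person, fact in usages if person == suspect}
    shared = defaultdict(set)
    for person, fact in usages:
        # Same condition as `WHERE suspect <> other` in the query.
        if person != suspect and fact in suspect_facts:
            shared[person].add(fact)
    return dict(shared)


accomplices = find_accomplices(usages, "Suspect")
for person, facts in sorted(accomplices.items()):
    print(person, sorted(facts))
```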

We can run the same analysis via Linkurious. The result is the following graph:

The people involved in the fraud ring led by Leisa Gugliotta

This picture makes it easy to see that our retail operation has been targeted by a fraud ring. Leisa Gugliotta shares a credit card with one other person and an email address with 4 people. These fraudsters can all be identified through the connections between them. Now we can freeze their accounts and add their information to our blacklist.
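
The query above only looks one step away from the suspect. One possible way to follow the connections further, which goes beyond what the query does, is a breadth-first search over shared pieces of information. The Python sketch below is a minimal illustration under that assumption. The data and the `fraud_ring` helper are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical (person, fact) pairs; values are made up for illustration.
usages = [
    ("P1", "email:a"), ("P2", "email:a"),
    ("P2", "card:x"), ("P3", "card:x"),
    ("P4", "email:z"),
]


def fraud_ring(usages, start):
    """Return each reachable person with the number of shared-fact hops from start."""
    facts_of = defaultdict(set)   # person -> facts they used
    people_of = defaultdict(set)  # fact -> people who used it
    for person, fact in usages:
        facts_of[person].add(fact)
        people_of[fact].add(person)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for fact in facts_of[person]:
            for other in people_of[fact]:
                if other not in distance:
                    distance[other] = distance[person] + 1
                    queue.append(other)
    return distance


ring = fraud_ring(usages, "P1")
print(ring)
```

The hop count gives a rough measure of how close each person is to the suspect. A widely shared fact, such as a public IP address, would pull unrelated people into the result, so the output is a starting point for investigation rather than proof.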


Third party fraud means that pieces of personal information are reused to create fake identities (known as synthetic identities). Graph analysis makes it possible to spot that pattern and prevent fraud. Through graph visualization, we can quickly evaluate potential fraud cases and make informed decisions. Try Linkurious now to learn more!

9 Responses to “Fraud detection in retail”

  1. JohnM June 6, 2015 at 1:17 am #

    Hi, I really enjoy these posts. A small point on this one is that the image titled “An IP address (center) with connections to 5 persons (purple) and orders (orange)” is not clickable unlike the other ones. 🙂

  2. Al June 17, 2015 at 3:22 pm #

    Amazing how easy this is with Neo4j and its Cypher query language. Too bad Linkurious is not free and open source 🙂

  3. Al June 22, 2015 at 7:36 pm #

    Linkurious has documentation, including setup for their commercial product. However, the scripts they provide do not apply to the open source version on GitHub at https://github.com/Linkurious/linkurious.js
    The scripts they refer to do not exist among the files in the open source version.
    Linkurious claims on their web site that their version “is” available on GitHub, but the link is broken. I believe that since they now charge a price for their version, they had to remove the free version they had on GitHub. I think this was just to get people interested in their product.
    Since there is no documentation for linkurious.js, I will not be using it.

    • jean June 23, 2015 at 8:34 am #

      Hello,

      May I ask where the broken link is? Hopefully we can fix it.
      There seems to be some confusion: linkurious.js can indeed be downloaded for free.

      It is available here: https://github.com/Linkurious/linkurious.js
      Documentation for linkurious.js is here: https://github.com/Linkurious/linkurious.js/wiki
      The doc has been used by our customers (who had no prior knowledge of the product) to build beautiful web-apps.

      linkurious.js is free to use for open-source projects. Commercial projects do need to buy a license though: https://linkurio.us/toolkit/#pricing

      In addition to linkurious.js, we sell Linkurious Enterprise and Linkurious Starter. These are available only through a commercial license. You need to buy licenses to download these products: https://linkurio.us/product/#plans

      Does that make things clearer?

  4. gul-ash July 13, 2015 at 6:44 am #

    Why would one need a graph database to find similar items? An RDBMS does it as well. What would be interesting is how “close/near” potential accomplices are to an identified person, by sharing non-unified information like address, house number, etc.

  5. Bill August 12, 2016 at 5:39 am #

    Be careful assuming that it is suspicious for people to share IP addresses. What if they are on public WiFi? The same could be true about street addresses and phone numbers in cultures where it is more common to share resources. The people sharing the phone or address could be family or roommates. While it may be a good starting point to look for shared resources, the evidence should be considered in tandem with other fraud indicators.
