Lyft vs Uber : visualizing fraud patterns

You might have heard that the competition between Lyft and Uber, the leading car-sharing services, is heating up. Accusations are flying, with each party accusing the other of sabotage: employees of each company allegedly ordered and then cancelled thousands of rides. Both Uber and Lyft claim to have data to back up their allegations. Of course, at Linkurious we have no insight into the situation beyond what has been published. We do love stories that involve deception and data though, so we thought it would be fun to use the Lyft vs Uber case to illustrate how graph visualization can be used to identify suspicious patterns.

The digital fingerprints of ride cancellations

The various outlets which have covered the car-sharing dispute allege the same thing :

  • employees from Lyft/Uber have ordered rides from their competitors ;
  • at the last minute they cancelled the rides.

This tactic frustrates drivers, who lose time (and thus money) chasing phantom customers. It also deprives real customers of access to available drivers. It sounds like a good way for not-so-honest people to disrupt a competitor!

Lyft claims to have spotted the tactic by investigating its data. Analysts at Lyft :

Cross-referenced the phone numbers associated with known Uber recruiters with those attached to accounts that have canceled rides. They found, all told, 5,560 phantom requests since October 3, 2013.

In other words, Lyft looked at the cancellations and found some intriguing links between them. Lyft alleges that some cancellations were linked to the same accounts, that these accounts were in turn linked to the same phone number…and that these phone numbers could be traced back to Uber’s employees.
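
To make the cross-referencing idea concrete, here is a minimal Python sketch. It is not Lyft's actual method, which has not been published: the phone numbers, account names and the `cancellations_by_known_phone` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical inputs: phone numbers of known recruiters and a log of
# canceled rides (one record per cancellation). All values are invented.
known_recruiter_phones = {"555-0101", "555-0102"}

canceled_rides = [
    {"account": "acct_1", "phone": "555-0101"},
    {"account": "acct_2", "phone": "555-0101"},
    {"account": "acct_3", "phone": "555-0199"},
    {"account": "acct_4", "phone": "555-0102"},
    {"account": "acct_1", "phone": "555-0101"},
]


def cancellations_by_known_phone(rides, known_phones):
    """Count canceled rides per phone number, keeping only known phones."""
    counts = Counter(ride["phone"] for ride in rides)
    return {phone: n for phone, n in counts.items() if phone in known_phones}


suspicious = cancellations_by_known_phone(canceled_rides, known_recruiter_phones)
print(suspicious)
```

The cancellation made from the unknown number `555-0199` is ignored, which is exactly the weakness of this approach: it only catches phone numbers we already know about.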

A few of these employees allegedly acted with a lot of enthusiasm :

One Lyft passenger, identified by seven different Lyft drivers as an Uber recruiter, canceled 300 rides from May 26 to June 10. That user’s phone number was tied to 21 other accounts, for a total of 1,524 canceled rides. Another Uber recruiter created 14 different accounts responsible for 680 cancellations. A single account from a Los Angeles-based Uber representative canceled 49 rides from October through mid-April.

How long did it take Lyft or Uber to find out about these attacks? Would they have found them if the attacks had been executed more discreetly? According to Valleywag, Uber completed more than 800,000 rides per day in late 2013. Finding suspicious patterns in such a high volume of data can be challenging.

Can you spot the fraudsters : visualizing fraud patterns

I have prepared a small dataset based on the information Lyft and Uber may have used to identify the “DDoS” attacks. The data contains some 30 rides (canceled or not) with the corresponding accounts, phone numbers…and IP addresses. Each order is linked to an account, and each account is linked to one phone number and one IP address. The articles on Lyft and Uber do not mention IP addresses being used to detect suspicious activity, but both companies could have used them.
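
For readers who prefer code to prose, the data model can be sketched in a few lines of Python. This is only an illustration of the structure described above, not the actual dataset: the rows, the node labels and the `build_graph` helper are assumptions.

```python
from collections import defaultdict

# Hypothetical rows: (order_id, canceled, account, phone, ip).
# The IP addresses and phone numbers are invented placeholders.
orders = [
    ("order_1", False, "acct_a", "555-0100", "203.0.113.10"),
    ("order_2", True, "acct_a", "555-0100", "203.0.113.10"),
    ("order_3", True, "acct_b", "555-0111", "203.0.113.20"),
]


def build_graph(rows):
    """Return an undirected adjacency map: node -> set of neighbouring nodes."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for order_id, _canceled, account, phone, ip in rows:
        # Each order is linked to an account; each account is linked
        # to one phone number and one IP address.
        link(("order", order_id), ("account", account))
        link(("account", account), ("phone", phone))
        link(("account", account), ("ip", ip))
    return graph


graph = build_graph(orders)
print(sorted(graph[("account", "acct_a")]))
```

Each node is a `(kind, value)` pair so that a phone number and an IP address can never be confused, even if their text happened to collide.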

The entire dataset has been loaded into Neo4j and can be downloaded here. Neo4j is a graph database and can be used to quickly find the connections between various entities, even in large datasets.

I have pulled up this data in Linkurious, the graph visualization tool we develop, to understand what the people at Uber or Lyft may have experienced. IP addresses are in light green, phone numbers in dark green and orders in orange (with their dates displayed). The names represent the accounts.

Digital lineup : can you spot the suspect(s)?

What does this picture tell us? We can see 3 separate groups of nodes and edges. When investigating a large dataset, identifying patterns is key. In a typical fraud project, for example, data scientists explore the data to learn what a “normal” pattern and a “fraud” pattern look like. They then write code that can identify these patterns automatically. In the end, the code flags potential fraud cases, and these results are investigated by a human being, who has to look at the data and decide whether it looks suspicious or not.

This process can be done by looking at tables, distributions or histograms, but graph visualization is usually what works best for the human brain. After all, fraud is always about deception and hiding links.

Case 1 : everything looks fine.

Here we are looking at an average customer. He has one IP address, one phone number and multiple orders. Only one of them has been cancelled. This is the “normal” situation against which we can compare suspicious cases.

Case 2 : there's something fishy...

Here we are visualizing the kind of pattern described by Lyft and Uber. We have one IP address and one phone number. They have been used in combination to create multiple accounts and each of these accounts has canceled 5 rides. We can see that it looks very different from the “normal” customer.
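
The same pattern can be searched for without a picture. Below is a rough Python sketch, assuming account records that carry a phone number, an IP address and a cancellation count; the records and the `shared_identity_groups` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical account records; all values are invented for illustration.
accounts = [
    {"name": "acct_a", "phone": "555-0100", "ip": "203.0.113.10", "canceled": 1},
    {"name": "acct_b", "phone": "555-0111", "ip": "203.0.113.20", "canceled": 5},
    {"name": "acct_c", "phone": "555-0111", "ip": "203.0.113.20", "canceled": 5},
    {"name": "acct_d", "phone": "555-0111", "ip": "203.0.113.20", "canceled": 5},
]


def shared_identity_groups(records, min_accounts=2):
    """Group accounts by (phone, ip) and keep pairs used by several accounts."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["phone"], record["ip"])].append(record)
    return {
        key: {
            "accounts": [r["name"] for r in members],
            "canceled": sum(r["canceled"] for r in members),
        }
        for key, members in groups.items()
        if len(members) >= min_accounts
    }


print(shared_identity_groups(accounts))
```

The `min_accounts` threshold is an arbitrary choice here. In practice it would have to be tuned, since families or colleagues can legitimately share a phone number or an IP address.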

Case 3 : the pattern of a more sophisticated agent.

In this last case, there is still a single IP address, but now each account has a distinct phone number. If we were only using the phone numbers, we’d miss the connections between Domitrius Dimiatr, Dorotheos Rudolf and Seth Ekewaka. The IP address ties them together, and we can see that a single device is used to cancel multiple car rides.
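
One way to catch this kind of indirect link in code is to cluster accounts that share any identifier, directly or transitively. The sketch below uses a small union-find structure in Python. The account names come from the figure, but the phone numbers, the IP addresses and the `normal_customer` record are invented, and `cluster_accounts` is a hypothetical helper.

```python
def find(parent, node):
    """Return the representative of the set containing node (path halving)."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def cluster_accounts(accounts):
    """Group accounts that share a phone number or an IP address."""
    parent = {}
    for name, phone, ip in accounts:
        nodes = (("account", name), ("phone", phone), ("ip", ip))
        for node in nodes:
            parent.setdefault(node, node)
        # Merge the account with each of its identifiers.
        for identifier in nodes[1:]:
            root_a = find(parent, nodes[0])
            root_b = find(parent, identifier)
            parent[root_a] = root_b
    clusters = {}
    for name, _phone, _ip in accounts:
        root = find(parent, ("account", name))
        clusters.setdefault(root, []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Placeholder phone numbers and IP addresses, one shared IP for three accounts.
accounts = [
    ("Domitrius Dimiatr", "555-0121", "203.0.113.30"),
    ("Dorotheos Rudolf", "555-0122", "203.0.113.30"),
    ("Seth Ekewaka", "555-0123", "203.0.113.30"),
    ("normal_customer", "555-0100", "203.0.113.10"),
]

clusters = cluster_accounts(accounts)
print(clusters)
```

Grouping by phone number alone would return four separate accounts here. Treating the IP address as a second kind of link is what pulls the three suspicious accounts into one cluster.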

Based on this quick data visualization analysis we could :

  • use our data to look for people who have created numerous accounts tied to one phone number and/or one IP address ;
  • look at more canceled rides and see whether they resemble the suspicious patterns or the “normal” pattern.
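
As a rough illustration of the second idea, the following Python sketch compares each account's cancellation rate with a threshold. The threshold, the minimum number of orders and the sample data are arbitrary assumptions, not values taken from Lyft or Uber.

```python
def cancellation_rate(orders):
    """orders is a list of booleans, True when the ride was canceled."""
    return sum(orders) / len(orders) if orders else 0.0


def flag_accounts(orders_by_account, threshold=0.5, min_orders=3):
    """Return accounts with enough orders and a high share of cancellations."""
    flagged = []
    for account, orders in orders_by_account.items():
        if len(orders) >= min_orders and cancellation_rate(orders) >= threshold:
            flagged.append(account)
    return sorted(flagged)


# Hypothetical order histories: one canceled ride out of five looks "normal",
# five out of five does not, and two orders are too few to judge.
orders_by_account = {
    "normal_customer": [False, False, True, False, False],
    "acct_b": [True] * 5,
    "acct_c": [True, True],
}

flagged = flag_accounts(orders_by_account)
print(flagged)
```

A flag like this is only a starting point for a human investigator. As with the visual approach, someone still has to look at the flagged accounts and decide whether they are really suspicious.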

Graph visualization helps quickly identify patterns in data. If you are looking for hidden connections in a large dataset, graphs and graph visualization can help you get results faster!

8 Responses to “Lyft vs Uber : visualizing fraud patterns”

  1. Anonymous August 15, 2014 at 6:47 pm #

    It took me a while to realize that “727…” was supposed to be a fake IP address.

    If you’re going to use fake but plausible names like “John Hudson”, why not use fake but plausible IP addresses like “127.26.83.98”? Using numbers that never appear in an IP address is like making fake names with characters that never appear in a name.

    • Simon Shine August 15, 2014 at 7:15 pm #

      I was similarly confused about this and figured that they were some crazy number scheme for American phone numbers that I hadn’t heard of and assumed the IP addresses were implicit. Until a bell rang. ;)

      • jean August 15, 2014 at 9:33 pm #

        Thank you both for pointing out my error!
        I’ll update the article tomorrow :)

  2. OriginalGeek August 16, 2014 at 4:49 am #

    Don’t assume malice where incompetence or ignorance can explain a happening. Case in point, while driving for Uber and Lyft simultaneously, when I go offline on Lyft to take an Uber ride, I have on multiple occasions summoned a Lyft driver because of the poor placement of the “Request Ride” button in the Lyft app, the lack of a confirmation prompt, and the general bad design choice of combining the rider and driver modes into the same app. I NEVER request an Uber when switching them off to give a Lyft ride because the Uber Driver app has no “Request Ride” button.

    IMO Lyft should hire some competent UI designers, and make all of their designers and engineers drive for Lyft before slinging mud.

  3. Ross August 16, 2014 at 7:37 am #

    Using IP addresses as unique identifiers becomes more challenging in a mobile environment. It’s very rare for a single subscriber to have anything remotely resembling a static IP address on mobile networks, so one might have to come up with other means to tie users together.

    • jean August 16, 2014 at 8:02 am #

      +1. Do you want to expand on some of the techniques used to tie users together?

  4. Peter Kaye August 18, 2014 at 8:22 pm #

    Does the data specify who cancelled, the driver or the user? That would be a major factor in what is actually happening rather than speculative theories.

  5. steven williams October 15, 2014 at 6:34 am #

    Or it could be customers trying to get a first ride discount
