How to use phone calls and network analysis to identify criminals?

Call records are a great source of information on real-life networks. We are going to see how graph technologies can be used to analyse these records in order to find potential criminals. This article was written with the assistance of Ashley Englefield, Detective in California and instructor at Police Technical.

How to use phone calls to identify criminals?

The fact that a mobile phone can be a dangerous thing for a professional criminal to carry entered popular culture a while ago. In The Wire for example, drug dealers use “burners”, cheap phones they dispose of regularly. Why? Because your phone operator is authorized to collect information about whom you call, for how long and from where. In certain circumstances, that data can be used by law enforcement. But do you know the techniques police officers use to turn phone data into arrests and convictions?


We are going to see how graph technologies make it possible to analyse phone calls to find criminals.

To illustrate our use case, let’s use a common scenario. In a residential neighborhood, a store robbery is committed during the day by a group of 4 criminals. The criminals are masked, use a stolen vehicle and leave no fingerprints. In that kind of case, finding an answer may take a lot of legwork. There is one lead, however: a witness noticed that one of the criminals used his phone to make a call minutes before the crime.

Equipped with a search warrant, a police officer can contact mobile phone operators to collect information about the calls made and received near the robbery when it happened.

Data model to analyse the network in the phone calls

The data phone operators provide law enforcement is highly tabular. Trying to identify unique phone numbers and their relationships in tabular data is very hard. We are thus going to use the phone calls data to build a graph. That graph will show how the phone numbers are connected by phone calls. From a list of calls, we are inferring a network.
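As a rough illustration of what inferring a network from a list of calls means, here is a minimal Python sketch. The sample rows and the helper name `build_call_graph` are made up for the example; only the `CALLING_NBR` and `CALLED_NBR` column names come from this article's dataset.

```python
from collections import Counter, defaultdict

# Hypothetical call records: one dict per row of the operator's spreadsheet.
calls = [
    {"CALLING_NBR": "555-0101", "CALLED_NBR": "555-0102"},
    {"CALLING_NBR": "555-0101", "CALLED_NBR": "555-0103"},
    {"CALLING_NBR": "555-0102", "CALLED_NBR": "555-0101"},
    {"CALLING_NBR": "555-0101", "CALLED_NBR": "555-0102"},
]

def build_call_graph(rows):
    """Return (edge_counts, neighbours) inferred from a list of call rows."""
    edge_counts = Counter()        # (caller, called) -> number of calls
    neighbours = defaultdict(set)  # number -> numbers it exchanged calls with
    for row in rows:
        caller, called = row["CALLING_NBR"], row["CALLED_NBR"]
        edge_counts[(caller, called)] += 1
        neighbours[caller].add(called)
        neighbours[called].add(caller)
    return edge_counts, neighbours

edge_counts, neighbours = build_call_graph(calls)
print(edge_counts[("555-0101", "555-0102")])  # calls from the first number to the second
print(sorted(neighbours["555-0101"]))         # every number the first one is linked to
```

Each row only mentions two numbers, but once all rows are folded into `neighbours`, the connections of any given number can be read in a single lookup, which is awkward to do in the raw table.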

Finding relationships within a spreadsheet is hard.

For this article, we have prepared a small dataset using Mockaroo. That data is in a spreadsheet format. Here are the columns :

  • FULL_NAME : full name of phone subscriber ;
  • FIRST_NAME : first name of phone subscriber ;
  • LAST_NAME : last name of phone subscriber ;
  • CALLING_NBR : phone number of the caller ;
  • CALLED_NBR : phone number of the person called ;
  • START_DATE : start of phone call ;
  • END_DATE : end of phone call ;
  • DURATION : duration of phone call ;
  • CELL_SITE: ID of cell site used to route phone call ;
  • CITY : city of cell site used to route phone call ;
  • STATE : state of cell site used to route phone call ;
  • ADDRESS : address of cell site used to route phone call ;

We are going to use the data stored in the spreadsheet to build a graph. In order to do that, we need to define a graph model.

Graph model to represent the phone calls.

You can see above that our graph model for phone calls is centered around calls. A single phone call connects four kinds of entities : the 2 phone owners, a location (the cell site the caller was next to when initiating the call), a city and a state.
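To make the call-centered model more concrete, here is a small Python sketch that maps one spreadsheet row to nodes and relationships. The row values are invented, and the relationship names (`MADE_CALL`, `RECEIVED_BY`, `LOCATED_IN`, `IN_CITY`, `IN_STATE`) are assumptions chosen for the illustration; they are not necessarily the ones used in the actual import script.

```python
import csv
import io

# One made-up row using column names from the dataset (values are hypothetical).
SAMPLE_CSV = """FULL_NAME,CALLING_NBR,CALLED_NBR,START_DATE,DURATION,CELL_SITE,CITY,STATE,ADDRESS
Jane Doe,555-0101,555-0102,1416940800,42,0101,Sacramento,CA,1 Example Street
"""

def row_to_graph(row):
    """Map one call record to nodes and relationships around a CALL node."""
    call = ("CALL", f'{row["CALLING_NBR"]}>{row["CALLED_NBR"]}@{row["START_DATE"]}')
    caller = ("PERSON", row["CALLING_NBR"])
    called = ("PERSON", row["CALLED_NBR"])
    location = ("LOCATION", row["CELL_SITE"])
    city = ("CITY", row["CITY"])
    state = ("STATE", row["STATE"])
    nodes = [call, caller, called, location, city, state]
    # Relationship names are illustrative assumptions, not the article's schema.
    edges = [
        (caller, "MADE_CALL", call),
        (call, "RECEIVED_BY", called),
        (call, "LOCATED_IN", location),
        (call, "IN_CITY", city),
        (call, "IN_STATE", state),
    ]
    return nodes, edges

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nodes, edges = row_to_graph(rows[0])
for source, relationship, target in edges:
    print(source, relationship, target)
```

Keying people by phone number rather than by name reflects the remark that follows: in practice the subscriber names are often unavailable.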

It is important to note that in real life, most of the time we would not have access to the names of the phone number owners.

Importing the call records

Now that we have defined a model, we are going to populate it with the data stored in the spreadsheet. To store our graph, we will use Neo4j, a popular graph database. Neo4j has a query language called Cypher that makes it easy to import CSV files.

Here is a script that can turn our data into a Neo4j graph :

The result can be found here.

Now that our data is actually stored as a graph, we are going to be able to analyse it to find our criminals.

Exploring the phone records

What we need first is to identify the criminal who made the phone call. For the sake of this story, we are going to assume that the robbery was perpetrated at 2524 Thelma Avenue in Sacramento on the 25th of November, 2014 around 10:40am.

Find the potential suspect

In that case, the police officers would ask the phone operators for the phone calls made 10 minutes before and after 10:40am near 2524 Thelma Avenue. Here is how a phone operator could quickly answer that question using Cypher, the query language for Neo4j :

WHERE (b.cell_site = '0101' OR b.cell_site = '0102') AND 1416904730 < toInt(a.start) AND toInt(a.start) < 1416911930
WITH a, b
RETURN c.full_name as caller, d.full_name as called, a.start as time, a.duration as duration, b.address as address

The query above looks for the phone calls routed through the 2 towers nearest to 2524 Thelma Avenue, where the call started between 10:29 and 10:49. Here are the results of that query :

caller           called             time        duration  address
David Mccoy      Rachel Carpenter   1417746372  12        2524 Thelma Avenue
Timothy Stevens  Sharon Allen       1417015626  9         2524 Thelma Avenue
Irene Green      Elizabeth Ramirez  1414917918  4396      2524 Thelma Avenue

This list gives us 3 potential suspects. They all made a phone call in the vicinity of our crime location. The only problem is that we have multiple names. Is one of them our perpetrator?
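The filter behind that kind of query (a set of towers combined with a time window) can be sketched in Python as follows. This is only an illustration: the records are invented, the helper names `epoch_window` and `calls_near` are hypothetical, and the sketch assumes the time of the robbery is expressed in Pacific Standard Time (UTC-8). Note that the tower test and the time test are two separate conditions joined by a logical AND; in Cypher, where AND binds more tightly than OR, the same grouping requires parentheses around the tower alternatives.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the robbery time is given in Pacific Standard Time (UTC-8).
PST = timezone(timedelta(hours=-8))

def epoch_window(center, minutes):
    """Return (start, end) Unix timestamps `minutes` either side of `center`."""
    delta = timedelta(minutes=minutes)
    return int((center - delta).timestamp()), int((center + delta).timestamp())

def calls_near(calls, cell_sites, start, end):
    """Keep calls routed through one of `cell_sites` AND started inside the window."""
    return [
        call for call in calls
        if call["cell_site"] in cell_sites and start < int(call["start"]) < end
    ]

start, end = epoch_window(datetime(2014, 11, 25, 10, 40, tzinfo=PST), minutes=10)

# Hypothetical records; `start` is stored as text, hence the int() conversion.
calls = [
    {"caller": "A", "cell_site": "0101", "start": str(start + 60)},  # right tower, right time
    {"caller": "B", "cell_site": "0101", "start": str(end + 3600)},  # right tower, too late
    {"caller": "C", "cell_site": "0999", "start": str(start + 60)},  # right time, other tower
]
print([call["caller"] for call in calls_near(calls, {"0101", "0102"}, start, end)])
```

Only caller A passes both tests. Dropping the grouping would let caller B through, because any call on the first tower would match regardless of its time.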

What is the network of our suspects?

Let’s say that as a police investigator the names in the list of suspects do not ring any bells. We need to dig further to identify our perpetrator. We could interview the different suspects and check their backgrounds, but we are going to use data to speed up our investigation :

WHERE (b.cell_site = '0101' OR b.cell_site = '0102') AND 1416904730 < toInt(a.start) AND toInt(a.start) < 1416911930
WITH a, b
WITH c,d
RETURN e,c,d

We are reusing the query we built to find potential suspects and adding a last part that gives us the names of the people they are in contact with. These are the 2nd degree contacts of our suspects.

That same search can be done very quickly with Linkurious. We simply have to type the suspect names in the search bar and then visually query their relationships.

Here is the result :

The 3 suspects and the calls they made. Note that there are no connections between the different suspects.

Above we can see the phone calls made by each of our suspects. If we want to see the persons our suspects are in contact with, we have to display the persons connected to the calls.

The 3 suspects, the calls they made and whom they called.

Graph visualization makes it easy to search and understand connected data. The picture above sums up the network of our suspects. That information would have required a long investigation with Excel or with traditional BI solutions.

To make the visualization more useful, let’s modify the data. Instead of displaying the people, the calls and the locations, we are going to focus on the people. To do that, let’s create a direct relationship called “KNOWS” between everyone who shares a phone call. This way we will display less data and it will be easier to analyse what is left.

CREATE (c)-[:KNOWS]->(d);
MATCH (a)-[r]-()
DELETE a, r;
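The idea of collapsing each call into a direct “KNOWS” link between two people can be sketched in Python like this. The names and calls are invented and `project_knows` is a hypothetical helper. Unlike a plain CREATE per call, this sketch keeps a single link per pair of people, which is a simplification chosen for the example.

```python
# Hypothetical call nodes: each call links one caller to one callee.
calls = [
    {"id": 1, "caller": "Alice", "called": "Bob"},
    {"id": 2, "caller": "Bob", "called": "Carol"},
    {"id": 3, "caller": "Alice", "called": "Bob"},
]

def project_knows(call_nodes):
    """Collapse person-call-person paths into direct (caller, called) KNOWS pairs."""
    knows = set()
    for call in call_nodes:
        # Ignore degenerate records where a number appears to call itself.
        if call["caller"] != call["called"]:
            knows.add((call["caller"], call["called"]))
    return knows

knows = project_knows(calls)
print(sorted(knows))
print(len(calls), "calls reduced to", len(knows), "KNOWS relationships")
```

Once the pairs exist, the call records themselves are no longer needed for the visual analysis, which is why the call nodes can be dropped from the graph.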

Here is what the new graph schema looks like :

Simplified model for our call records analysis.

Visual analysis of the network

First of all, let’s look at the network of our 3 suspects : David Mccoy, Timothy Stevens and Irene Green. The graph is fairly dense and thus hard to read.

34 nodes and 150 relationships : the 3 suspects and the people they know.

We can select one of the suspects to see their connections highlighted.

Highlight of connections for Tim

As a police investigator we are going to assume that we recognize a few names that have already appeared in other investigations : Paul Sims and Richard Greene.

Suspicious people in the graph

These people are not directly tied to the crime we are investigating but they are in contact with someone who is. Visually we can investigate that connection.

The phone call analysis shows that Timothy Stevens is connected to two known criminals : Paul Sims and Richard Greene. They are part of a small community within the larger graph. Among our initial suspects, Timothy Stevens is the most likely to be a criminal. We should focus our investigation on him.
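The reasoning above, which ranks suspects by their direct ties to people already known to investigators, can also be expressed as a few lines of Python. The names come from this article, but the list of KNOWS pairs is invented for the illustration and `contacts_of` and `rank_suspects` are hypothetical helpers.

```python
# Hypothetical KNOWS pairs: the names come from the article, the edges are invented.
knows = [
    ("Timothy Stevens", "Paul Sims"),
    ("Timothy Stevens", "Richard Greene"),
    ("Timothy Stevens", "Sharon Allen"),
    ("Paul Sims", "Richard Greene"),
    ("David Mccoy", "Rachel Carpenter"),
    ("Irene Green", "Elizabeth Ramirez"),
]
suspects = ["David Mccoy", "Timothy Stevens", "Irene Green"]
known_offenders = {"Paul Sims", "Richard Greene"}

def contacts_of(person, pairs):
    """People sharing a KNOWS relationship with `person`, in either direction."""
    return {b if a == person else a for a, b in pairs if person in (a, b)}

def rank_suspects(suspects, pairs, flagged):
    """Sort suspects by how many flagged people appear among their direct contacts."""
    scores = {s: sorted(contacts_of(s, pairs) & flagged) for s in suspects}
    return sorted(scores.items(), key=lambda item: len(item[1]), reverse=True)

for name, flagged_contacts in rank_suspects(suspects, knows, known_offenders):
    print(name, flagged_contacts)
```

The score is only a way to prioritise leads; it mirrors the visual reading of the graph rather than replacing it.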

In a few steps, we turned lines and lines of call records into one specific insight : Timothy Stevens is the likeliest suspect in our criminal investigation. In order to achieve that result, we simply used the power of graph analysis.

Police investigation is one of the fields where graph analysis is used, together with more traditional techniques, to discover insights in complex data. Try our demo to learn how to explore graphs!

A note about Ashley Englefield

Ashley Englefield is a 13-year veteran of a moderate-sized law enforcement agency in California. Prior to becoming a police officer, Mr. Englefield was a United States Marine and obtained a Bachelor's degree in Information Systems from California State University Sacramento.

After his time in the patrol division, Mr. Englefield joined the Detective division and worked as an investigator in the narcotics, gangs, and eventually the homicide division.

During his time as a homicide detective Mr. Englefield gained expertise in conducting investigations into cell phones, cell phone records, and many aspects of internet related technologies.

Mr. Englefield is a graduate of the University of Cambridge (UK), having earned a Master's degree in Criminology. Mr. Englefield resides in California with his wife and daughter.
