Archive | Tutorial

Reinforcing AML systems with graph technologies

Fighting financial crime is a daily battle worldwide. Organizations have to deploy intelligent systems, such as anti-money laundering (AML) control frameworks, to prevent and detect wrongdoing. In this blog post, we’ll see how graph technologies can reinforce those systems.

Using graph technologies to fight financial crimes

In today’s complex economy, law enforcement and financial organizations fight against a wide range of financial crimes: embezzlement, tax evasion, extortion, corruption, terrorism funding and money laundering, to name a few. While tracking down those activities, governments and financial institutions have to deal with a fast-moving financial crime landscape and a growing volume of information in various formats.

Graph technologies like Linkurious can be powerful assets in the fight against financial crime. They provide a comprehensive overview of the different entities and their connections, and they support complex queries on large datasets in near real time.

In this article, we’ll focus on anti-money laundering procedures and explore a specific case with a graph approach.

Strengthening AML controls with network analysis

Money laundering is the act of converting proceeds from criminal activities into legal assets, concealing their true origins. Governments have been steadily strengthening AML rules to prevent those activities. Banking institutions are now required to follow strict AML policies and to report suspected money laundering activity. Failure to comply can result in heavy financial penalties.

Organizations have developed risk-based AML frameworks to monitor their customers and financial transactions. But criminals deploy sophisticated tactics to hide their wrongdoing. Shell corporations, tax havens or complex financial schemes are used to prevent the identification or tracking of money flows. To thwart such strategies, finding information about a specific suspicious entity is not enough. Financial crime units have to investigate the connections between individuals, accounts, companies and locations to trace complex transactions. This is why network analysis and visualization technologies have proven to be effective tools to support AML processes.

We will see below how graph technologies like Linkurious can be an additional asset, for example when monitoring high-risk customers.

Financial activities visualized as networks

Banking institutions keep track of numerous sources of information about their customers (individuals or companies) and their financial activities. Graph database (GDB) technologies like Neo4j, Titan, AllegroGraph or DataStax Enterprise Graph make it possible to index complex connected data and query it easily to find patterns. With such systems, organizations can compile various kinds of information into a single data model.

A possible graph model of financial information

Linkurious provides an advanced graph interface compatible with numerous graph databases to easily explore and monitor the data.

Identifying money laundering patterns with Linkurious

AML regulations require banks to monitor their high-risk customers. Listed on special watchlists, those individuals can be identified either by authorities (e.g., Politically Exposed Persons or Specially Designated Nationals lists) or by the institution itself (e.g., customers with repeated suspicious transactions). “Are my customers currently involved in activities with flagged individuals?” “If yes, are these activities suspicious?” Organizations need to be able to answer those questions.

Linkurious offers an interface to monitor graph data in real time. In addition, analysts can set up alerts for specific patterns with Cypher queries. For instance, as an AML analyst, I want to be warned each time there is any type of connection between my customers’ financial activities and my watchlist. I can use the following query to create my alert in the system:

Creation of an alert query

When new data is collected, such as transactions, persons, companies or relationships, Linkurious automatically updates and looks for suspicious connections. With the advanced graph visualization interface, it’s then easy to investigate and assess the different cases.
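To make the idea of such an alert concrete, here is a minimal plain-Python sketch. It is not the Linkurious alert query, and it runs outside any graph database. It assumes that “any type of connection” means “belongs to the same connected component”, and it uses a union-find structure so that new relationships can be merged in as they arrive. The class, function and entity names are all hypothetical.

```python
class UnionFind:
    """Disjoint-set structure tracking which entities are connected."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        # Register unseen nodes as their own component.
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def watchlist_alerts(edges, customers, watchlist):
    """Return sorted (customer, flagged) pairs that share a component."""
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    return sorted(
        (c, w)
        for c in customers
        for w in watchlist
        if uf.find(c) == uf.find(w)
    )


# Hypothetical data: entity names are placeholders, not taken from the article.
edges = [
    ("customer_1", "account_1"),
    ("account_1", "transaction_1"),
    ("transaction_1", "account_2"),
    ("account_2", "flagged_person_1"),
    ("customer_2", "account_3"),
]
alerts = watchlist_alerts(edges, ["customer_1", "customer_2"], ["flagged_person_1"])
print(alerts)
```

On a real dataset, a bounded path length would probably be needed, since almost everything ends up connected in a large enough graph.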

Visual investigation of financial activities

The alert system reported several matches to our query. To evaluate the risk level of each case, analysts can use the interface to quickly visualize and investigate it. Let’s check one of them:

Visualization of one of the matches signaled by the alert query

At a glance, I see that Angela Marshall (a fictitious individual who appears on my watchlist) is indirectly connected to a transaction on a customer’s bank account. She appears to share the same address as my customer, the company Miboo.

This pattern is relatively suspicious. I might want to explore beyond this single connection and see which other entities are linked to this address.

Investigating the entities linked to the address

In addition to sharing an address with an individual on a watchlist, our suspicious customer also shares its address with three other companies and two of our internal employees. They are all located at 3557 Straubel Circle in the city of Hongqiao, China.

The address, 3557 Straubel Circle, is located in Hongqiao, China

As an analyst, I might recognize a known money laundering pattern: different companies registered at a single address. Also, some employees are connected to a known high-risk customer. This information can be reported to the relevant authorities for further investigation in the field.
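The “several entities at one address” pattern can be checked with a simple grouping step. This is a minimal sketch in plain Python, assuming the registrations have been exported as (entity, address) pairs. The function name, the threshold and the sample data are illustrative assumptions, not part of Linkurious.

```python
from collections import defaultdict


def shared_addresses(registrations, min_entities=2):
    """Map each address to its entities, keeping addresses shared by several."""
    by_address = defaultdict(set)
    for entity, address in registrations:
        by_address[address].add(entity)
    return {
        address: sorted(entities)
        for address, entities in by_address.items()
        if len(entities) >= min_entities
    }


# Hypothetical (entity, address) pairs; names are placeholders.
registrations = [
    ("company_a", "address_1"),
    ("company_b", "address_1"),
    ("person_x", "address_1"),
    ("company_c", "address_2"),
]
flagged = shared_addresses(registrations, min_entities=3)
print(flagged)
```

The threshold is a judgment call: shared office buildings and registered agents will produce false positives, so the output is a list of leads to review, not proof of wrongdoing.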

Graph analysis focuses on relationships and therefore helps to discover hidden connections between different entities. Linkurious also operates an alert system in near real time. That way, financial crime units can identify suspicious activity schemes quickly and reinforce their AML control framework.

Leverage the power of graph analysis and visualization today to fight financial crime. Try the Linkurious demo or contact us to discuss your project.

Using graphs for intelligence analysis

Identifying and monitoring terrorist or criminal networks is imperative to detect threats and defeat attacks. Let’s see how Linkurious and graph visualizations can help identify and track potentially dangerous individuals and networks.

Challenges for intelligence analysis

Criminal or terrorist activities are rarely the acts of isolated individuals. Behind these activities we find more or less centralized organizations or networks. Intelligence experts are in charge of identifying every actor in such groups, despite their strategies to hide their connections to the network (encrypted communication services, numerous middlemen, fake identities, etc.). Getting the whole picture of the network is essential to monitor suspect activities, prevent attacks and detect potential threats.

Countering such activities is also about gathering as much information as possible, from every possible source. The more data intelligence and security organizations are able to obtain, the easier it is to track and anticipate criminal or terrorist activities. This means that analysts and investigators have to handle large sets of heterogeneous data.

Graph analysis is particularly suited to this sort of challenge. Graph databases allow organizations to store and query, in near real time, the relationships between billions of entities. Let’s see how these systems, combined with tools like Linkurious, can help intelligence analysts identify and investigate threats.

Applying a graph approach to intelligence analysis

We will dive into the investigation of a potential terrorism threat and explore how Linkurious can help identify and investigate suspicious networks.

For this purpose, we have created a dataset with fictitious data about people, including addresses, phone numbers and travel information. This data can easily be modeled as a graph:

Graph data model of our investigation data.

To keep our analysis understandable, we chose a very simple model with a limited volume of data. A real-world situation would involve larger volumes and a wider range of data types.

Data entities, such as individuals, emails and phones, are modeled as nodes. Relationships between entities are represented as edges, labeled with the nature of the connection. The data then forms a network.

In our graph model we have five types of nodes, including people, countries, addresses and phone numbers, and as many types of edges, or relationships.
Let’s start our investigation by trying to detect suspicious patterns in our data.

How to use graph patterns to detect potential threats

When dealing with large datasets, we need to find ways to focus the analysts’ attention on relevant information. Here, we want to detect potential terrorist cells. We are going to try to detect groups of at least three people who 1) visited an at-risk country (in our case Syria) and 2) are indirectly in contact (via their addresses or phone communications).

With a simple Cypher query, Linkurious users can set up monitoring for chosen patterns. Below is the query we will use to identify our pattern:

// Detecting threats: persons linked (within 10 hops) to more than 2 people who have been to Syria
MATCH (a:Person)-[s:HAS_CONTACTED|HAS_PHONE|HAS_ADDRESS*..10]-(b:Person)-[:HAS_BEEN_TO]->(d:Country {name: 'Syria'})
WITH a, collect(s) AS rels, collect(DISTINCT b) AS suspects, d, count(DISTINCT b) AS score
WHERE score > 2
RETURN a, suspects
ORDER BY score DESC

Linkurious reported three individuals: Jessica Wells, Bobby Murphy and Ruth Warren (on the left of the graph). As an analyst, I can visualize them and how they are interconnected. Jessica, Bobby and Ruth each have a “has been to” relationship with Syria and all appear to be connected to a single phone number: Judy Lewis’ (on the right of the graph).

Visualization of a suspicious network around Jessica, Bobby & Ruth.

Several intermediate nodes sit between our three people and Judy’s phone number. Phone calls and addresses are the bridges that connect our individuals. For analysts, this particular pattern could point toward a recruiting network, with numerous middlemen to avoid detection. Those results could lead to specific recommendations and further investigations.

A graph approach provides the opportunity to detect specific cross-data patterns. With Linkurious, it is easy to visualize and understand both the network and the relationships between its members. Node-edge graph visualizations combine all the available information in a single representation.
Some of the nodes here seem to be connected to other entities. Linkurious allows analysts to interactively explore the data and uncover new information.
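For readers without a Neo4j instance at hand, the pattern behind the Cypher query above can be sketched in plain Python: for each person, collect the travellers to the at-risk country who are reachable within 10 hops over contact, phone or address links. This is only an illustration on invented data. It assumes every source of a HAS_BEEN_TO edge is a person, it excludes the starting person from their own suspect list, and the function and node names are hypothetical.

```python
from collections import defaultdict, deque

LINK_TYPES = {"HAS_CONTACTED", "HAS_PHONE", "HAS_ADDRESS"}


def suspicious_groups(edges, people, country="Syria", max_hops=10, min_score=3):
    """For each person, list travellers to `country` reachable within max_hops."""
    neighbours = defaultdict(set)
    travellers = set()
    for source, rel_type, target in edges:
        if rel_type in LINK_TYPES:
            # Links are followed in both directions, like the undirected Cypher pattern.
            neighbours[source].add(target)
            neighbours[target].add(source)
        elif rel_type == "HAS_BEEN_TO" and target == country:
            travellers.add(source)

    results = {}
    for start in people:
        # Breadth-first search bounded by max_hops.
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        suspects = sorted((seen - {start}) & travellers)
        if len(suspects) >= min_score:
            results[start] = suspects
    return results


# Hypothetical toy graph; node names are placeholders.
edges = [
    ("p1", "HAS_BEEN_TO", "Syria"),
    ("p2", "HAS_BEEN_TO", "Syria"),
    ("p3", "HAS_BEEN_TO", "Syria"),
    ("p1", "HAS_PHONE", "phone_1"),
    ("p2", "HAS_PHONE", "phone_2"),
    ("p3", "HAS_ADDRESS", "address_3"),
    ("phone_1", "HAS_CONTACTED", "phone_hub"),
    ("phone_2", "HAS_CONTACTED", "phone_hub"),
    ("hub_owner", "HAS_ADDRESS", "address_3"),
    ("hub_owner", "HAS_PHONE", "phone_hub"),
    ("p4", "HAS_PHONE", "phone_4"),
]
groups = suspicious_groups(edges, ["hub_owner", "p1", "p2", "p3", "p4"])
print(groups)
```

The `min_score=3` default mirrors the `score > 2` condition of the query: a person is reported only when linked to at least three travellers.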

Investigating complex networks with graph visualization

We identified a potential network with several people. Perhaps they have accomplices? We can investigate further, starting from one node of the network. Let’s pick Judy’s phone number, for instance, and expand the nodes around it.

Investigating Judy’s closest connections via her phone number.

Judy is connected to a certain Robert Wells, via phone communications, and Robert is himself connected to Theresa Mills’ phone number. If we expand the nodes linked to Theresa’s phone, we get the following visualization.

Visualization of a sub-network around Theresa’s phone number.

The sub-network around Theresa Mills is very specific. The nodes, all linked together, are phone numbers associated with seven individuals. Such a pattern (a small, highly connected group with a single bridge to other potential suspects) represents a sub-network within the larger network we are investigating.

Starting from a single node, we reached another group and gathered new information about the network. Interactive and scalable tools like Linkurious make this kind of exploration and analysis easier for experts.

Visualize and analyse intelligence and security data with Linkurious

Graph approaches are well suited to the investigation of criminal networks and terrorist groups. Linkurious offers intelligence analysts a single entry point to identify hidden insights in complex connected data. Analysts can define specific patterns to monitor suspicious activities. The visualization interface allows them to navigate between the nodes to identify new key actors through hidden connections.

Discover how you can identify hidden insights in your graph data and try the demo of Linkurious.

Graph data visualisation for cyber-security threats analysis

In this blog post, we offer an overview of how to deal with security information and event management/log management (SIEM/LM) data overflow. Let’s see how Linkurious’ advanced graph visualisation solution makes it easier to identify and investigate cyber-security threats.

Switching to a data lake architecture is often a required first step for analysts who wish to use graph data visualisation solutions such as Linkurious to start visualising their SIEM/LM data. Linkurious enables analysts to deal with SIEM/LM data overflow and perform precise real-time and/or post-attack forensic analysis. In the second part, we will demonstrate the extent of Linkurious’ possibilities with a use case based on a real-life SIEM/LM dataset and an example of forensic analysis.

Dealing with SIEM/LM data overflow: putting security analysts back in control

SIEM/LM solutions have evolved continuously over the last 15 years to match the ever-changing landscape of cyber-security threats. They aim to provide analysts with all the information and context they need to determine the nature of an attack, its degree of sophistication and how far it has spread inside the network. To contain the damage of a security breach and react efficiently, analysts need the right information at the right time.

Today, it remains a considerable challenge for organisations of all sizes to meet their operational, audit and security needs. As networks become more and more complex, the number of devices to monitor has increased significantly. Analysts are overwhelmed with data, and aggregating the different SIEM/LM data sources has become a challenge in itself. These limitations hamper analysts: they have too much data, but not enough information. There is a real need to reduce the scale and complexity of the analysis to a more intelligible level so that analysts can come up with appropriate solutions to improve overall security. Advanced data visualisation solutions enable just that.

But for the moment, SIEM/LM solutions still rarely include data visualisation tools. Even when they do, these tools are not efficient at handling such large amounts of data and do not offer real-time pattern detection and exploration. Right now, most companies relying on SIEM/LM data visualisation solutions only use them for illustration purposes rather than for analysis. They often have to rely on external services to carry out post-attack forensics, as these operations require a lot of skill and time.

Using graph data visualisation tackles this problem and makes SIEM/LM data operational again

Today, the trend in the cyber-security world to resolve these issues is to switch from the traditional data warehouse framework to more flexible and scalable data backends. This enables the use of new tools such as graph data visualisation and analytics solutions. Typically these new backends take the form of data lake frameworks: often Hadoop combined with other services such as graph databases and other analytics tools. Data lakes have many advantages over data warehouses when it comes to managing terabytes of security logs: centralisation, flexibility, operability and high scalability. Companies that are serious about using new analytics applications such as Linkurious for their SIEM/LM data will have to make the switch sooner or later. Depending on the company’s needs, the switch can also be fairly non-intrusive for the existing system architecture.

How Linkurious empowers security analysts

Once the SIEM/LM data is centralised into the data lake, using a graph data visualisation solution like Linkurious to explore and investigate the data provides analysts with a real added value for their everyday operations. They are operational in real time, can visualise the data instantly and can carry out precise post-attack forensics analysis in much simpler ways than ever before. The detection of suspicious activity patterns can be largely automated using pattern recognition algorithms. That way, analysts can focus on investigating suspicious activity visually.

Visualisation is empowering for analysts as it resolves to a great extent the problem of having large amounts of data to interpret. Visualisation considerably reduces the scale and complexity of the analysis. It also allows companies to carry out most of their forensic analysis internally. With Linkurious’ advanced collaboration and security features, analysts are able to work together, share visualisations and administer user access rights to the data. Finally, the advanced customisation possibilities that Linkurious offers allow it to be integrated into internal security systems.

Next, we will demonstrate Linkurious’ possibilities using a real-life SIEM/LM dataset to see the advantages of graph visualisation technology to monitor networks in real-time and perform advanced forensics analysis.

Using Linkurious for cyber-security: a real-life use case

This dataset was created from a real-life log archive of an enterprise network, courtesy of the University of Victoria, which created the dataset and made it public for general research purposes. It combines several existing publicly available malicious and non-malicious SIEM/LM log datasets and reproduces the day-to-day usage of an enterprise network. More information on the dataset here.

The PCAP files were generated with Wireshark, and we converted them into a CSV file. We then generated several CSV files to model the dataset and import it into Neo4j.

Modelling

We used the following model for the Neo4j database:

Cyber-Security Linkurious

Import Script:

cd C:\Users\linkurious\Downloads\neo4j-community-3.0.1-windows\neo4j-community-3.0.1\bin

neo4j-import --skip-bad-relationships --nodes nodeip.src.csv --nodes nodeport.csv --relationships Relationshipdst.portip.dst.csv --relationships RelationshipIP.srcdst.port.csv --into C:\

The connections were aggregated, keeping their start date and end date, to reduce the number of edges. Creating an edge for each transmitted packet would create super nodes and make the graph very difficult to read. The model we use is very simple, but the modelling can be adapted to very specific use cases depending on what the analyst is looking for.
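The aggregation step can be pictured as follows. This is a minimal sketch, assuming each packet row has already been reduced to a (timestamp, source IP, destination IP, destination port) tuple. The field layout, the function name and the sample addresses (taken from the reserved documentation ranges) are assumptions, not the actual preprocessing used for this dataset.

```python
def aggregate_connections(packets):
    """Collapse (timestamp, src_ip, dst_ip, dst_port) rows into one edge per triple."""
    edges = {}
    for timestamp, src_ip, dst_ip, dst_port in packets:
        key = (src_ip, dst_ip, dst_port)
        if key not in edges:
            edges[key] = {"start": timestamp, "end": timestamp, "packets": 1}
        else:
            edge = edges[key]
            # Widen the time window and count the extra packet.
            edge["start"] = min(edge["start"], timestamp)
            edge["end"] = max(edge["end"], timestamp)
            edge["packets"] += 1
    return edges


# Hypothetical packet rows; IPs come from documentation-only ranges.
packets = [
    (10.0, "192.0.2.1", "198.51.100.5", 25),
    (12.5, "192.0.2.1", "198.51.100.5", 25),
    (11.0, "192.0.2.1", "198.51.100.5", 25),
    (20.0, "192.0.2.2", "198.51.100.5", 80),
]
edges = aggregate_connections(packets)
print(edges)
```

Four packets become two edges here. The packet count is kept as an edge property so that volume information is not lost in the aggregation.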

Using Linkurious to identify an SMTP storm botnet

Linkurious enables analysts to visualise data that is otherwise difficult to conceptualise. Experienced analysts know what “normal” behaviour looks like on the network they manage. This makes it possible for them to set up pattern detection algorithms that pull abnormal behaviours from the database. For example, the following visualisation shows a “normal” interaction in the network: IPs interact with a wide variety of service ports of 131.243.125.208.

Normal activity on the network

On the other hand, here is an abnormal behaviour pattern. Most of the IPs that connect to “172.16.0.11” use port 25 (the SMTP port) and don’t generate traffic to any other service. This is suspicious in itself. But the large number of IPs doing the same operation at the same time seems to indicate a botnet carrying out an SMTP storm attack. Such an attack is essentially a denial-of-service (DoS) attack.

An SMTP storm attack
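The abnormal pattern described above (many sources whose only traffic goes to one host on port 25) can be expressed as a simple filter. The sketch below is an illustration in plain Python on invented connection tuples. The function name and the sample IPs (from documentation-only ranges) are assumptions, and a real detection rule would also need a time window and a threshold on the number of sources.

```python
from collections import defaultdict


def single_service_sources(connections, target_ip, port=25):
    """Return sources whose only observed traffic is to target_ip on `port`."""
    services_by_source = defaultdict(set)
    for src_ip, dst_ip, dst_port in connections:
        services_by_source[src_ip].add((dst_ip, dst_port))
    return sorted(
        src
        for src, services in services_by_source.items()
        if services == {(target_ip, port)}
    )


# Hypothetical (src_ip, dst_ip, dst_port) connections.
connections = [
    ("203.0.113.1", "192.0.2.10", 25),
    ("203.0.113.2", "192.0.2.10", 25),
    ("203.0.113.3", "192.0.2.10", 25),
    ("203.0.113.3", "192.0.2.10", 80),   # also uses another service
    ("203.0.113.4", "192.0.2.20", 443),  # different target
]
suspects = single_service_sources(connections, "192.0.2.10")
print(suspects)
```

A large number of sources in the result, all active at the same time, would correspond to the star-shaped cluster visible in the visualisation.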

If a geolocation service provides the GPS coordinates of the IP addresses, it is possible to visualise them directly on a map. In one click, using Linkurious’ geospatial visualisation feature, we can see that most of the IPs that are part of the botnet are in the same region: most of them come from around Odessa, Ukraine.

Geospatial representation of the IP addresses of the SMTP botnet attack

Zoom in to the most concentrated activity region

Most of the malicious traffic comes from around Odessa, Ukraine

We can then explore the activity of specific IP addresses and see which services were affected by their activity. For example, the address “12.166.237.145” has other links that we haven’t examined yet. Let’s examine it separately and expand it to see all its connections. That way, we see it links to another IP on our network: “172.16.0.12”.

Exploring the connections of 12.166.237.145 on the network

If we expand the IP address “172.16.0.12” to see its connections, we find it is connected to another attack. This suggests that the two are probably linked and that the network may have been compromised several times. The attack follows the same pattern as the first SMTP storm attack we just saw.

A second botnet attack

Linkurious: graph data visualisation for cyber-security threats analysis

This simple use case shows the potential of graph visualisation technology for cyber-security analysts. Analysts can start to make sense of their connected data and investigate any suspicious behaviour on their network. Graph visualisation gives analysts the level of precision they need to quickly understand a security event. Assessing the degree of sophistication of an attack and reacting accordingly becomes much easier.

Once the company’s data framework is ready for graph data visualisation, Linkurious becomes a solid ally for security analysts. The possibilities that solutions like Linkurious offer enable analysts to overcome the overflow of SIEM/LM data and extract the information they need. Graph visualisation has the potential to reduce the complexity of their analysis, making SIEM/LM data operational. Forensic analysis also becomes less expensive, as it can be conducted internally more often.

Graph technology enables the automation of a large part of the detection process. That way, analysts can focus on investigating the security alerts on the network. Linkurious’ collaboration features also enable them to work together more efficiently and rapidly. Linkurious meets all security standards for such sensitive data and provides all the necessary tools to administer user access rights. Using a graph-based approach also offers many advantages when working with non-technical users and other departments inside the company because of its inherent simplicity. Who doesn’t understand nodes and edges?

Want to explore and understand your graph data? Simply try the demo of Linkurious or contact us!

Investigating Enron’s email corpus: The trail of Tim Belden

In fraud and white-collar crime cases, forensic investigators often have to go through massive amounts of complex connected data to gather evidence. In recent years, the development of graph databases and data visualization tools has made it much easier to quickly find information that would have taken days to find by other means. Let’s see how Linkurious can help investigate a real-life email network dataset to establish responsibilities or evidence of guilt. We’ll use real emails from Enron, one of the biggest financial scandals in US history.

Investigating Enron’s emails

In October 2001, the U.S. Securities and Exchange Commission (SEC) began investigating what would rapidly become known worldwide as the Enron scandal. For years, the energy company had been using accounting loopholes and offshore entities to conceal billions of dollars of debt in its financial reports. It was also found to have manipulated the Californian and Canadian energy markets to push prices up artificially and increase its profit. The scandal eventually led to Enron’s bankruptcy, the largest bankruptcy reorganization in American history at the time. Many executives were indicted and tried.

During its investigation, the Federal Energy Regulatory Commission (FERC) made the controversial decision to publish online all of the company’s emails for transparency, historical and academic research purposes. The “Enron email corpus”, as it is now widely known, constitutes the largest public domain database of real world company e-mails in the world and has been used in a very large range of studies and research projects worldwide.

Importing the email corpus into Neo4j

To start exploring the corpus, we needed to import it into a Neo4j graph database. To do so, we relied heavily on Arne Hendrik Schulz’s work and his MySQL 4 dumps of the dataset, which we turned into CSV files. The result is a graph with 328,209 nodes and 2,317,231 relationships. You can learn more about how to import large datasets into Neo4j here.

Enron email corpus

Our graph model is pretty simple. We have two types of nodes: persons and emails. Persons are linked to emails by “HAS_RECEIVED” and “HAS_SENT” relationships. We could use Linkurious to explore the email contents themselves, but in this article we are more interested in exploring the network of key executives in the scandal to see if we can find information that could be useful for investigators.

Investigating Tim Belden’s network

Tim Belden, the head of trading at Enron, was one of the first executives to be prosecuted and to admit wrongdoing at Enron. He pleaded guilty to conspiracy to commit wire fraud as part of a plea bargain and agreed to cooperate with the authorities to help convict many top Enron executives. He’ll be the starting point of our fictional investigation. Let’s see if we can find relevant information just by analysing his email activity.

The first problem we have to deal with is that many of the emails he sent and received had many recipients. The ones that really interest us as investigators are his personal emails. A quick and easy way to isolate them is to expand only the least connected nodes among his sent and received emails. That way, we find the interlocutors with whom he had direct one-to-one contact. This method is effective as long as we do not need 100% precision to explore the data.

A quick look at the graph shows that he used his email address primarily to send emails to Enron’s World Trade Center office: ‘center.dl-portland@enron.com’. But he did send a few emails to individuals inside the company as well.

Belden’s sent email activity

Now, if we remove all the emails sent to the WTC office and add the 200 least connected emails he received, we get a map of all his interactions inside Enron. After cleaning out the uninteresting emails, we see that his primary interlocutors inside the company were John Lavorato, Jeff Dasovich, Kevin M. Presto, Phillip K. Allen, Louise Kitchen and Kate Symes, all top executives at Enron. Dasovich was Enron’s governmental affairs executive, Presto was Vice President, Lavorato and Kitchen were senior traders, and Allen and Symes were both traders as well.

Belden’s cleaned email activity map

Assessing Belden’s relationships

Now let’s play the part of a forensic investigator who wants to assess Belden’s relationships inside the company. Lavorato appears to be by far the individual with whom he had the most interactions, even though Belden only sent him a few emails. With such information, an investigator could have decided to investigate their relationship further. Doing so, the investigator could have discovered a conversation between the two proving that they both knew Enron was actively manipulating the Canadian energy market in August 2000. The scam operation was called Project Stanley. As the FERC most probably lacked the tools to explore the dataset efficiently, this story only came out in 2005. With a tool like Linkurious, they would have been able to spot significant relationships more easily and would have known which emails to drill into.

Shortest paths between Lavorato and Dasovich

Now, we can also investigate whether the people in Belden’s first circle knew each other. An easy and effective way to do this is to use the “find the shortest path” feature of Linkurious. For example, let’s check whether Lavorato and Dasovich interacted directly. We see instantly that they never exchanged any private emails and only received the same chain emails with many recipients.

On the other hand, Lavorato and Presto did have many private email interactions. It could be interesting to investigate their relationship as well, since they are both connected to Belden.

Shortest paths between Lavorato and Presto
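The idea behind a shortest-path search on the person-email graph can be sketched with a breadth-first search in plain Python. This is not how Linkurious or Neo4j implement the feature. It is a minimal illustration on an invented graph in which each email is given as a list of its participants, and all names are hypothetical.

```python
from collections import deque


def build_adjacency(mails):
    """Build an undirected person-mail graph from {mail_id: participants}."""
    adjacency = {}
    for mail_id, participants in mails.items():
        for person in participants:
            adjacency.setdefault(mail_id, []).append(person)
            adjacency.setdefault(person, []).append(mail_id)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search returning one shortest path, or None if unconnected."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical mails; each value lists the sender and the recipients.
mails = {
    "mail_1": ["person_a", "person_b"],
    "mail_2": ["person_b", "person_c", "person_d"],
}
adjacency = build_adjacency(mails)
path = shortest_path(adjacency, "person_a", "person_c")
print(path)
```

Path length alone does not distinguish a private exchange from a chain email: two people on the same email are always two hops apart. The number of participants attached to the intermediate email node is what tells the two cases apart.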

A quick search on Google tells us that the FERC established in 2002 that “Presto’s role paralleled that of Tim Belden” and that he was also involved in Project Stanley. Using the dataset, we can establish that Belden, Lavorato and Presto were part of the same circle inside the company and communicated with one another.

Querying the dataset

Now let’s see how we can return nodes that fit more complex patterns and criteria in the dataset. Queries written in Cypher, Neo4j’s query language, can be entered directly in Linkurious. For example, this request returns the persons whose address ends with “@enron.com” and who received emails but never sent any. This could be a useful query if the investigators suspect some emails were deleted from the dataset and wish to check which email addresses were affected.

// Cypher request: Enron addresses that received emails but never sent any
MATCH (p:Person)-[:HAS_RECEIVED]->(:Mail)
WHERE p.email =~ '.*@enron.com'
  AND NOT (p)-[:HAS_SENT]->(:Mail)
RETURN DISTINCT p
LIMIT 10

Here we have three results, but they don’t seem to highlight any wrongdoing on Enron’s side:

Cypher query enron

 

Another good example of a graph query is finding all the personal emails connected to a person. The following query returns all the emails that have fewer than 3 connections and were sent or received by Tim Belden:

// Cypher request: emails with exactly two connections that involve Tim Belden
MATCH (n)--()
WITH n, count(*) AS rel_cnt
WHERE 1 < rel_cnt <= 2
MATCH (n:Mail)--(:Person {email: 'tim.belden@enron.com'})
RETURN n

Cypher query Enron Belden

The result is nearly the same as what we had earlier when we expanded Belden’s least connected emails, except that this time we’re sure not to have missed any node that fits the criteria we set. It is simply a more rigorous and precise way of obtaining a map of his interlocutors.
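The same filter can be pictured outside the database. The sketch below assumes each email is represented by the list of addresses attached to it (sender and recipients, each appearing once), so that the length of the list stands in for the number of connections of the email node. The function name and the colleague addresses are placeholders.

```python
def personal_mails(mails, address, max_connections=2):
    """Return ids of mails involving `address` with at most max_connections links."""
    return sorted(
        mail_id
        for mail_id, participants in mails.items()
        if address in participants and 1 < len(participants) <= max_connections
    )


# Hypothetical mails; the colleague addresses are placeholders.
mails = {
    "mail_1": ["tim.belden@enron.com", "colleague_1@example.com"],
    "mail_2": ["tim.belden@enron.com", "colleague_1@example.com", "colleague_2@example.com"],
    "mail_3": ["colleague_1@example.com", "colleague_2@example.com"],
}
personal = personal_mails(mails, "tim.belden@enron.com")
print(personal)
```

Only the one-to-one email involving the given address is returned. The group email and the email between other people are filtered out.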

 

If anything, this exercise demonstrates the power of graph visualisation when investigating or auditing a network. Without even reading the emails, we managed to establish who belonged to Belden’s first circle inside Enron and that some people in his network knew each other as well. It turned out that Belden, Lavorato and Presto indeed knew about Project Stanley and were all potentially involved in it. Linkurious is well suited to investigating social networks in detail, finding key people and communities, and establishing responsibilities and relationships. It can be used to conduct large-scale audits or investigations inside large organisations of any kind.

Want to explore and understand your graph data? Simply try the demo of Linkurious!