The NSA and other intelligence agencies are among the organizations that face the biggest data challenges. Tasked with finding people who are trained to hide, they rely heavily on data analysis. A recent leak illustrates how the NSA can use graph visualization to turn email metadata into insights on the networks of key leaders. If you’re struggling to understand big data sets, you should try graph visualization too!
Why did Huawei become a target for the NSA?
Huawei is not yet a well-known brand for consumers outside China, but it is already a global powerhouse. The Chinese telecommunications company claims to connect close to a third of the world’s population to the internet. No wonder, then, that the NSA has been interested in getting access to Huawei’s network and equipment. Identifying vulnerabilities in Huawei products and controlling them means being able to target one third of the world’s population. The US interest in Huawei doesn’t stop there. In 2012, the US House of Representatives intelligence committee made public its suspicions of links between Huawei and the People’s Liberation Army. The report claimed that Huawei and another Chinese company, ZTE, “cannot be trusted to be free of foreign state influence and thus pose a security threat to the United States and to our systems.”
According to recent revelations originating from Edward Snowden, it seems the NSA tried to establish whether Huawei is collaborating with the Chinese military to spy on its customers. Ironically, the NSA created back doors and pried its way into the servers in Huawei’s headquarters.
The network of Huawei’s CEO can be inferred from the metadata of his emails
The NSA operation targeting Huawei was code-named “Shotgiant”. In addition to collecting proprietary information about Huawei’s technology, the NSA monitored the communications of the company’s top executives. Among those whose emails the NSA was able to read were Ren Zhengfei, the president of Huawei, and Sun Yafang, the Chairwoman. What can you learn from spying on hundreds, thousands or millions of emails? We don’t know what intelligence the NSA extracted from the content of the emails it intercepted. But sometimes the content is not the only source of information.
Since Edward Snowden made public the information about the NSA’s actions, we have learnt the importance of metadata. Metadata is information generated by the electronic services we use. According to the Guardian, it relates to the “transactional information about the user, the device and activities”. Very little legal protection applies to this information, which is considered less “private” than the actual content of electronic records. As a consequence, the NSA can often collect metadata such as the date and time of emails even when it is not allowed to scrutinize the content of those same emails.
The leaked Shotgiant documents show how this metadata can be exploited. Operation Shotgiant gave the NSA access to the emails of Huawei’s CEO and Chairwoman. By studying the metadata of their communications, the NSA was able to get a glimpse of their social network.
The picture above shows the result of this data collection program. The graph represents email addresses and the exchanges between them. Without more context it’s impossible to interpret the graph with certainty. Two points, or nodes, seem to gather most of the connections: they are probably the email addresses of Ren Zhengfei and Sun Yafang, the CEO and the Chairwoman of Huawei.
They seem to have different profiles. Mr. Zhengfei, on the left, has an extensive network of close to 100 people. It is made up of a first set of people who form a circle close to him and of a second circle of more distant people. Ms. Yafang has a smaller network, with only a few contacts in common with Mr. Zhengfei.
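To make the idea concrete, here is a minimal sketch of how a graph like the one described above could be built from email metadata alone. It is not the NSA's method, which the leaked documents do not detail. The records, the addresses and the `build_graph` helper are all hypothetical and only illustrate the principle: each address becomes a node, and each pair of addresses that exchanged mail becomes a link weighted by the number of messages.

```python
from collections import Counter, defaultdict

# Hypothetical metadata records: (sender, recipient, timestamp).
# No message content is involved, only who wrote to whom and when.
records = [
    ("ceo@example.com", "alice@example.com", "2010-03-01T09:12"),
    ("alice@example.com", "ceo@example.com", "2010-03-01T09:40"),
    ("ceo@example.com", "bob@example.com", "2010-03-02T14:05"),
    ("chair@example.com", "bob@example.com", "2010-03-02T16:30"),
    ("chair@example.com", "carol@example.com", "2010-03-03T08:00"),
]


def build_graph(records):
    """Return (edge_weights, neighbors) for an undirected email graph.

    edge_weights maps a sorted (address, address) pair to a message count.
    neighbors maps each address to the set of addresses it exchanged mail with.
    """
    edge_weights = Counter()
    neighbors = defaultdict(set)
    for sender, recipient, _timestamp in records:
        if sender == recipient:
            continue  # ignore mail sent to oneself
        edge = tuple(sorted((sender, recipient)))
        edge_weights[edge] += 1
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)
    return edge_weights, neighbors


edge_weights, neighbors = build_graph(records)

for (a, b), count in sorted(edge_weights.items()):
    print(f"{a} <-> {b}: {count} message(s)")
```

A layout algorithm in a graph visualization tool would then position the nodes; the weights could, for instance, be used to draw frequent contacts closer to each other, although we don't know whether the leaked graph works that way.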
Without context, the graph leaves a lot of questions open:
- Are the emails collected only from people within Huawei?
- How are the nodes within the graph positioned? Is the first circle of contacts around Huawei’s CEO made up of the people he emails the most? Who are these people?
- Why does Huawei’s Chairwoman have so few contacts? Is she using a secondary email address? Does she prefer communicating by phone?
- Who are the people connecting Mr. Zhengfei and Ms. Yafang?
The metadata of Huawei’s emails offers an interesting perspective on the social networks of key leaders within the organization. Without even looking at the actual content of their emails, the NSA can use graph visualization to identify the people they communicate with and the relative power and influence of those contacts.
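As a rough illustration, two of the simplest questions you can ask of such a graph are "who has the most contacts?" and "who connects two given people?". The sketch below answers both with plain Python. The addresses and helper names are hypothetical, and counting contacts is only a crude proxy for influence; nothing here is taken from the leaked documents.

```python
from collections import defaultdict

# Hypothetical (sender, recipient) pairs extracted from email metadata.
pairs = [
    ("ceo@example.com", "alice@example.com"),
    ("ceo@example.com", "bob@example.com"),
    ("ceo@example.com", "dave@example.com"),
    ("chair@example.com", "bob@example.com"),
    ("chair@example.com", "carol@example.com"),
]


def contacts_by_address(pairs):
    """Map each address to the set of addresses it exchanged mail with."""
    contacts = defaultdict(set)
    for a, b in pairs:
        contacts[a].add(b)
        contacts[b].add(a)
    return contacts


def rank_by_degree(contacts):
    """Sort addresses by number of distinct contacts, most connected first.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(contacts, key=lambda addr: (-len(contacts[addr]), addr))


def common_contacts(contacts, first, second):
    """Return the addresses that exchanged mail with both first and second."""
    return contacts.get(first, set()) & contacts.get(second, set())


contacts = contacts_by_address(pairs)
ranking = rank_by_degree(contacts)
shared = common_contacts(contacts, "ceo@example.com", "chair@example.com")

for address in ranking:
    print(address, len(contacts[address]))
print("Shared contacts:", sorted(shared))
```

On a real data set, the addresses at the top of the ranking would correspond to the hubs that stand out visually in the graph, and the shared contacts to the nodes sitting between the two hubs.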
Graph visualization is a key technique for analysing large data sets. It can reveal links and structures that are hard to grasp with table-oriented tools such as spreadsheets.
The way the NSA uses graph visualization is just one of the many ways to turn emails into insights. For example, scientists and investigators have used the Enron emails to understand a fraud scheme and the responsibilities of its participants. You can visualize your own email too with a project from MIT called Immersion. You can even do the same thing with Neo4j and Linkurious: just follow this quick guide!