We are launching Graph Viz 101, a series of posts that teaches the basics of graph visualization, written by Sébastien Heymann in collaboration with Bénédicte Le Grand of Université de Paris 1. This is our first post; please discuss it below!
A graph (also called a network) is made of a set of entities, called nodes, and a set of relationships between these entities, called edges or links. The way nodes are connected constitutes the topology of the network. Additional information can also be attached in the form of properties, which are key-value pairs associated with each node or relationship. For example, individuals in a social network may be characterized by properties like gender, language, and age.
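To make these definitions concrete, here is a minimal sketch of such a "property graph" in plain Python. The class name `PropertyGraph`, its methods, and the toy people and attribute values are hypothetical, chosen only for illustration; real graph tools store this information in their own formats. Nodes and edges each carry a dictionary of key-value properties, and the topology is simply which nodes are linked to which.

```python
class PropertyGraph:
    """Hypothetical minimal property graph: nodes and edges carry key-value properties."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = []   # list of (source, target, dict of properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = dict(properties)

    def add_edge(self, source, target, **properties):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, target, dict(properties)))

    def neighbors(self, node_id):
        # Edges are treated as undirected: the topology is who is linked to whom.
        result = set()
        for s, t, _ in self.edges:
            if s == node_id:
                result.add(t)
            elif t == node_id:
                result.add(s)
        return result

    def degree(self, node_id):
        # Number of distinct neighbors of the node.
        return len(self.neighbors(node_id))


# Illustrative toy social network (made-up people and attributes).
g = PropertyGraph()
g.add_node("alice", gender="female", language="fr", age=34)
g.add_node("bob", gender="male", language="en", age=29)
g.add_node("carol", gender="female", language="en", age=41)
g.add_edge("alice", "bob", relation="friend")
g.add_edge("alice", "carol", relation="colleague")

print(sorted(g.neighbors("alice")))  # → ['bob', 'carol']
print(g.degree("alice"))             # → 2
print(g.nodes["bob"]["language"])    # → en
```

Keeping properties in plain dictionaries mirrors the key-value definition above: any node or edge can carry any attributes, and the topology (the edge list) stays separate from the data attached to it.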
The analysis of complex networks covers (but is not limited to) diverse types of tasks, such as understanding the statistical properties of their topologies, identifying significant nodes, and detecting anomalies. One of the biggest challenges is gaining a good intuition of the network under study. Even when information like node properties is available, extracting valuable knowledge and providing insights remains difficult. Analysts may indeed have to deal with multiple dimensions, including (but not limited to) social, topical, geographical, and temporal data, which may also be aggregated at different levels of detail.
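As a small illustration of the first two tasks, the sketch below computes a network's degree distribution (one of the simplest statistical properties of a topology) and picks the node with the highest degree as a crude notion of a "significant" node. The edge list and node names are hypothetical toy data, and degree is only one of many possible significance measures.

```python
from collections import Counter

# Hypothetical toy undirected network, given as an edge list.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]


def degrees(edge_list):
    """Count how many edges touch each node (undirected degree)."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_distribution(deg):
    """Map each degree value k to the number of nodes that have degree k."""
    return Counter(deg.values())


deg = degrees(edges)
dist = degree_distribution(deg)
hub, hub_degree = deg.most_common(1)[0]  # node with the highest degree

print(dict(sorted(dist.items())))  # → {1: 1, 2: 3, 3: 1}
print(hub, hub_degree)             # → a 3
```

On a real network, such numbers are only a starting point: they summarize the topology but say little on their own, which is why an exploratory, visual approach is useful.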
Faced with such a diversity of data and the potentially unlimited number of analyses to perform in the first steps of a new project, analysts usually follow an exploratory approach: they inspect the data and outline interesting perspectives before drilling down into specific issues. When the datasets describe complex networks, this process is called Exploratory Network Analysis (ENA); it relies on data visualization and manipulation to analyze complex networks. ENA has its roots in the more general framework of Exploratory Data Analysis (EDA), which consists of performing a preliminary analysis guided by visualization before proposing a model or conducting a statistical analysis. Described by J. Tukey in the article “The Future of Data Analysis” (1962), the philosophy of EDA can be summed up as follows:
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
The main goal of EDA is to speed up the formulation of novel questions and relevant hypotheses about data through serendipitous findings (i.e., discoveries made while searching for something else). The EDA process relies on visualization and interaction techniques embedded in a broader workflow that includes data cleaning, storage, and mining. Related goals include checking for errors in data input, validating results, and confirming more quickly the facts we intuit. In this context, we outline the objectives of ENA as follows:
- to speed up research in complex networks,
- to provide technological platforms for the development of novel methods and industrial products using complex networks,
- to democratize the concepts related to complex networks and reach a broad audience in order to empower civil society.
Intended for beginners, Graph Viz 101 introduces the most common approaches to the visual exploratory analysis of networks. We will first focus on the importance of visualization for ENA, then give an overview of the whole ENA processing chain. We will split visual exploration into two distinct approaches: the global approach aims at observing the general properties of the data, whereas the local approach aims at investigating entities in their context.
Don’t miss out on the Graph Viz 101 series! Subscribe to the email alerts below (you can unsubscribe at any time), or follow us on your favorite social network: Twitter, LinkedIn, Google+, Facebook. Help us spread the word so that everyone can make better, more useful graph visualizations!