加载中

In this post, we are going to explore non-linear data structures like graphs. We are going to cover the central concepts and typical applications.

You are probably using programs that use graphs (and trees). Let’s say for instance that you want to know the shortest path between your workplace and home you can use graph algorithms to get the answer! We are going to explore this and other fun challenges.

In the previous post, we explore linear data structures like arrays, linked lists, sets, stacks and so on. This post builds on top of what we learned.

This post is part of a tutorial series:

Learning Data Structures and Algorithms (DSA) for Beginners

  1. Intro to algorithm’s time complexity and Big O notation

  2. Eight time complexities that every programmer should know

  3. Data Structures for Beginners: Arrays, HashMaps, and Lists

  4. Graph Data Structures for Beginners <== you are here

  5. Trees Data Structures for Beginners

  6. Self-balanced Binary Search Trees

  7. Appendix I: Analysis of Recursive Algorithms

Here is the summary of the operations that we are going to cover on this post:


Adjacency ListAdjacency Matrix
SpaceO(|V| + |E|)O(|V|2)
addVertexO(1)O(|V|2)
removeVertexO(|V| + |E|)O(|V|2)
addEdgeO(1)O(1)
removeEdge (using Array)O(|E|)O(1)
removeEdge (using HashSet)O(1)O(1)
getAdjacentsO(|E|)O(|V|)
isAdjacent (using Array)O(|E|)O(1)
isAdjacent (using HashSet)O(1)O(1)

在这篇文章中,我们将探索像图这样的非线性数据结构。我们将介绍其核心概念和典型应用。

你可能正在使用使用图(和树)数据结构的程序。比方说,你想知道你工作的地方和家之间的最短路径,你可以使用图算法来获得答案!我们将探讨这个和其他有趣的挑战。

在上一篇文章中,我们探索线性数据结构,如数组、链表、集合、栈等。这篇文章是建立在我们已学到的知识之上的。

这篇文章是此教程系列的一部分:

给初学者的数据结构及算法之路

  1. 算法时间复杂度及大O表示法简介

  2. 每个程序员必知的三个时间复杂度

  3. 给初学者的数据结构:数组、哈希表以及链表

  4. 给初学者的数据结构之图 <== 你在这里

  5. 给初学者的数据结构之树

  6. 自平衡二叉搜索树

  7. 附录I: 递归算法分析

以下是我们将在本文中介绍的操作的摘要:

 Adjacency ListAdjacency Matrix
SpaceO(|V| + |E|)O(|V|2)
addVertexO(1)O(|V|2)
removeVertexO(|V| + |E|)O(|V|2)
addEdgeO(1)O(1)
removeEdge (using Array)O(|E|)O(1)
removeEdge (using HashSet)O(1)O(1)
getAdjacentsO(|E|)O(|V|)
isAdjacent (using Array)O(|E|)O(1)
isAdjacent (using HashSet)O(1)O(1)

Graphs Basics

A graph is a data structure where a node can have zero or more adjacent elements.

The connection between two nodes is called edge. Nodes can also be called vertices.

The degree is the number of edges connected to a vertex. E.g., the purple vertex has a degree of 3 while the blue one has a degree of 1.

If the edges are bi-directional, then we have a undirected graph. But, if the edges have a direction, then we have a directed graph or di-graph for short. You can think of it as a one-way street (directed) or two-way street (undirected).

Vertex can have edges that go to itself (e.g., blue node), this is called self-loop.

图的基础知识

图是一种数据结构,其中的节点可以具有零个或多个相邻元素。两个节点之间的连接称为边。 节点亦可称为顶点。

度是与顶点相关联的边的个数。例如,紫色顶点的度数为3,而蓝色顶点度数为1.

如果边是双向的,我们就有了一个无向图。 但如果边有方向,那么我们就有一个有向图或简单图。 你可以将其视为单行道(有向)或双向街道(无向)。

顶点可以有关于自己的边,例如蓝的顶点,称之为自循环。

A graph can have cycles which means that if you traverse through the node, you could get to the same node more than once. The graph without cycles is called acyclic graph.

Also, acyclic undirected graphs are called tree. We are going to cover trees in depth in the next post.

Not all vertices have to be connected in the graph. You might have isolated nodes or even separated subgraphs. If all nodes are has a least one edge, then we have a connected graph. When all nodes are connected to all other nodes, then we have a complete graph.

For a complete graph, each node has to have #nodes - 1 edges. In the previous example we have 7 vertices, so each node has 6 edges.

图可以具有循环,这意味着如果要遍历节点,则可以多次访问同一节点。 没有循环的图称为非循环图。

此外,非循环无向图称为树。 我们将在下篇中深入讲解树。

并非所有顶点都必须在图中连接。 可以有孤立的节点甚至是分离的子图。 如果所有节点都至少有一个边,那么我们有一个连通图。 当所有节点都连接到所有其他节点时,我们就有了完全图。

对于完全图,每个节点必须有#nodes - 1个边。 在前面的例子中,我们有7个顶点,因此每个节点有6个边。

Graph Applications

When edges have values/cost assigned to them, we say we have a weighted graph. If the weight is absent, we can assume it’s 1.

Weighted graphs have many applications depending on the domain where you need to solve a problem. To name a few:

  • Airline Traffic (image above)

    • Node/vertex = Airport

    • Edges = direct flights between two airports

    • Weight = miles between two airports

  • GPS Navigation

    • Node = road insersection

    • Edge = road

    • Weigth = time required to go from one intersection to another

  • Networks routing

    • Node = server

    • Edge = data link

    • Weight = connection speed

In general, graphs have many real-world applications like:

  • Electronic circuits

  • Flight reservations

  • Driving directions

  • Telcom: Cell tower frequency planning

  • Social networks. E.g., Facebook uses a graph for suggesting friends

  • Recommendations: Amazon/Netflix uses graphs to make suggestions products/movies

  • Graphs help to plan logistics of delivering goods

We just learned the basics of graphs and some applications. Let’s learn now how to represent graphs in code.

图的应用

当边具有已分配给它们的价值/费用时,我们说我们有一个加权图。如果没有加权值,我们可以假设它是1。

加权图具有许多应用场景,具体取决于你需要解决问题的域。仅举几例:

  • 航线交通 (见上图)

    • 节点/顶点 = 机场

    • 边 = 两个机场之间的直接航班

    • 权重 = 两个机场之间的里程数

  • GPS导航

    • 节点 = 道路交点

    • 边 = 道路

    • 权重 = 两个路口之间距离所需时间

  • 网络路由

    • 节点 = 服务器

    • 边 = 数据链路

    • 权重 = 连接速度

通常,图有很多现实世界的应用,比如: 

  • 电子电路

  • 航班预订

  • 驾驶导航

  • 电性:信号塔频率规划

  • 社交网络,比如,Facebook使用图来实现朋友推荐

  • 推荐: Amazon/Netflix使用图做产品/电影推荐

  • 图有助于物流规划设计

我们刚刚学习了图的基础知识及其相关的一些应用场景。现在让我们学习如何在代码中表示图。

Representing graphs

Thre are two primary ways of representing graph:

  1. Adjacency list

  2. Adjacency Matrix

Let’s explain it with the following directed graph (digraph) as an example:

We a digraph with 4 nodes. When a vertex has link to itself (e.g. a) is called self-loop.

Adjacency Matrix

The adjacency matrix is one way of representing a graph using a two-dimensional array (NxN matrix). In the intersection of nodes, we add 1 (or other weight) if they are connected and 0 or - if they are not connected.

Using the same example as before, we can build the following adjacency matrix:


It’s important to notice that for undirected graphs the adjacency matrix will alwaysbe symmetrical by the diagonal. However, that’s not always the case on a digraph (like our example).As you can see, the matrix list all nodes horizontally and vertically. If there a few connections we called sparse graph if there are many connections (close to the max number of links) we called it dense graph. If all possible connections are reached, then we have a complete graph.

What is the time complexity of finding connections of two vertices?

Querying if two nodes are connected in an adjacency matrix is O(1).

What is the space complexity?

Storing a graph as an adjacency matrix has a space complexity of O(n2), where n is the number of vertices. Also, represented as O(|V|2)

What is the runtime to add a vertex?

The vertices are stored as a VxV matrix. So, everytime a vertex is added, the matrix needs to be reconstructed to a V+1xV+1.

Adding a vertex on a adjacency matrix is O(|V|2)

表示图

有两种主要用于表示图的方式:

  1. 邻接表

  2. 邻接矩阵

让我们用以下有向图作为示例分析一下:

此有向图有4个节点。当一个顶点有指向自身的链接时(比如a)则称之为自循环

邻接矩阵

邻接矩阵是一种使用二维数组(NxN矩阵)来表示图的方式。针对顶点的相交的情况,如果二者是连接的,我们将添加1(或者其他权重),否则设置为0或-。

使用之前的示例,我们可以构建以下邻接矩阵:

需要注意的是无向图的邻接矩阵总是以对角线为轴对称的。然而,在有向图中并不是如此(例如我们的示例中的有向图)。正如你所看到的,该矩阵从水平和垂直方向列出了所有节点。如果连接比较少,我们称之为稀疏图;如果连接很多(接近最大连接数目),我们称之为稠密图。如果达到了所有可能的连接,我们就拥有了完全图。

查找两个顶点之间的时间复杂度如何呢?

在邻接矩阵中查询两个顶点是否相连的复杂度是O(1).

空间复杂度如何呢?

将图保存为邻接矩阵的空间复杂度是O(n2),其中n是顶点的数目。也可表示为 O(|V|2)。

添加一个顶点的运行时间如何呢?

顶点被保存在VxV的矩阵中。因此,每次添加顶点时,矩阵必须被重构为新的V+1xV+1

在邻接矩阵中添加顶点的复杂度是O(|V|2)

What about getting the adjacent nodes?

Since the matrix has a VxV matrix, to get all the adjacent nodes to a given vertex, we would have to go to the node row and get all its edges with the other nodes.

In our previous example, let’s say we want all the adjacent nodes to b. We have to get the full row where b with all the other nodes.


We have to visit all nodes so,

Getting adjacent nodes on an adjacency matrix is O(|V|)

Imagine that you need to represent Facebook network as a graph. You would have to create a matrix of 2 billion x 2 billion, where most of it would be empty! Nobody would know everybody else just a few thousands at most.

In general, we deal with sparse graphs so the matrix will waste a lot of space. That’s why in most implementation we would use an adjacency list rather than the matrix.

那么如何获得相邻的顶点呢?

既然邻接矩阵拥有一个VxV的矩阵,为了获得与给定顶点相邻的所有节点,我们必须进入顶点所在的行,然后获取它与其他顶点相连的所有边。

在之前的示例中,假设我们需要获得所有与b相邻的顶点。我们需要获取存储b与其他所有顶点连接信息的行数据。


我们需要遍历所有节点,

在邻接矩阵中获取相邻节点的时间复杂度是O(|V|)

假设你需要将Facebook网络用图来表示。你可能需要创建一个20亿x20亿的矩阵,其中多数节点是空的!没有人可以认识其他所有人,通常至多几千人而已。

通常,我们需要处理稀疏图的矩阵,因为这种矩阵将浪费大量存储空间。这就是为什么在多数实现中我们使用邻接表而不选择邻接矩阵的原因了。

Adjacency List

Adjacency List is one of the most common ways to represent graphs. Each node has a list of all the nodes connected to it.

Graphs can be represented as an adjacency list using an Array (or HashMap) containing the nodes. Each of this node entries includes a list (array, linked list, set, etc.) that list its adjacent nodes.

For instance in the graph above we have that a has an connection to b and also a self-loop to itself. In turn, b has a connection to c and so on:


Querying if two nodes are connected in an adjacency list is O(n), where n is the number of vertices. Also represented as O(|V|)As you can imagine if you want to know if a node is connected to another node, you would have to go through the list.

What about the space complexity?

Storing a graph as an adjacency list has a space complexity of O(n), where n is the sum of vertices and edges. Also, represented as O(|V| + |E|)

邻接表

一种表示图的比较通用的方法是使用邻接表. 每一个节点都包含一个列表, 这个列表中的每一个元素都是与该节点相连的.

可以使用Array(或HashMap)作为邻接表容器来表示图, 容器中的元素就是图节点. 而每一个节点实体又包含一个列表(数组,链表,集合等), 来包含与该节点邻接的节点.

例如,在上面的图中, a 与 b 相连接, 同时 a 又连接到自身. 接下来 b 与 c 相连接, 如此继续下去.

给定邻接表中两个节点是否相连的复杂度是O(n), 其中 n 代表顶点的数量. 也可以写为O(|V|). 你可以想象, 如果你想知道一个节点与另一个节点是否相连, 你必须遍历这个列表.

那么空间复杂度又如何呢?

将图存储为邻接表的空间复杂度是O(n), 其中 n 是顶点数与边数的和. 也可以表示为O(|V| + |E|).

Adjacency List Graph HashMap Implementation

The adjacency list is the most common way of representing graphs. There are several ways to implement the adjacency list:

One of the most simple is using a HashMap. The key is the value of the node, and the value is an array of adjacency.


Add and remove verticesGraph usually needs the following operations:

  • Add and remove edges

Adding and removing vertices involves updating the adjacency list.

Let’s say that we want to remove the vertex b. We could do delete graph['b'];, however, we still have to remove the references on the adjacency list on d and a.

Everytime we remove a node, we would have to iterate through all the nodes’ list O(|V| + |E|). Can do better? We will answer that later, first let’s *implement our list in a more object-oriented way so we can swap implementations easily.

使用邻接表作为图表示方式的HashMap实现

邻接表是图的最通用的一种表示方式, 也有很多方式可以实现邻接表.

其中最简单的一种实现方式就是HashMap. 节点值可以作为key, value值是一个邻接数组.

增加/移除图中的顶点经常需要以下操作:

  • 增加/移除边

增加/移除顶点也需要更新邻接表.

假设我们想移除顶点b, 我们可以这样做: delete graph['b'];, 可是我们还必须移除邻接表中顶点d和顶点a对b的引用.

每次我们移除一个节点, 我们都必须迭代所有的节点列表, 复杂度是 O(|V| + |E|). 有没有更好的方法呢? 我们后面再回答这个问题. 首先让我们一起来以一种更"面向对象"的方式来实现我们的表, 这样做swap操作就会更容易些.

Adjacency List Graph OO Implementation

Let’s start with the Node class that holds the vertex’s value and its adjacent vertices. We can also have helper functions for adding and removing adjacent nodes from the list.


Make it work. Make it right. Make it faster.Notice that adjacent runtime is O(1), while remove adjacent is O(|E|). What if instead of an array use a HashSet ? It could be O(1). But, let first get it working and later we can make it faster.

Ok, now that we have the Node class, let’s build the Graph class that can perform operations such as adding/removing vertices and edges.

Graph.constructor


The first thing that we need to know is if the graph is directed or undirected. That makes a difference when we are adding edges.

邻接表图"面向对象"实现

我们先新建一个 Node 类, 顶点值和对应的邻接顶点作为类成员. 然后创建一个帮助函数来对表进行添加/移除邻接顶点的操作.

然后我们让它变得更快更正确更好. 这里你可以看到: 增加 adjacent 时间复杂度是O(1), 移除 adjacent 的时间复杂度是 O(|E|). 那么如果使用HashSet来代替数组呢? 那将是 O(1). 但是我们先让它可以运行, 后面再做优化.

好了, 现在我们有了 Node 类, 接下来让我们构建 Graph 类, 以及可以实现添加/移除顶点和边的操作.

Graph.constructor

我们需要知道的第一件事是这个图是个有向图还是无向图. 在添加边时这两种类型的图会有所不同.

Graph.addEdge

Two add an edge we need two nodes. One is the source, and the other is the destination.


We add an edge from the source vertex to the destination. If we have an undirected graph, then we also add from target node to source since it’s bidirectional.

The runtime of adding an edge from a graph adjacency list is: O(1)

If we try to add an edge and the nodes don’t exist, we need to create them first. Let’s do that next!

Graph.addVertex

The way we create a node is that we add it to the this.nodes Map. The map store a key/value pair, where the key is the vertex’s value while the map value is the instance of the node class. Take a look at line 5-6:


If the node already exists we don’t want to overwrite it. So, we first check if it already exists if not then we create it.

The runtime of adding a vertex from a graph adjacency list is: O(1)

图 --- 添加边操作(Graph.addEdge)

增加一条边需要两个节点。一个是源节点,一个是目的节点。

我们增加一条从源顶点到目的顶点的边。 如果是一个无向图, 那么我们还要增加一条从目标点到源节点的边,因为它是双向的。

为一个图的邻接表增加一条边的时间复杂度是O(1)

如果我们要增加边时,对应的节点不存在,我们就需要先创建它们。接下来我们看如何实现:

图 --- 添加顶点操作(Graph.addVertex)

创建节点的方法就是将它添加到 this.nodes Map中。Map存储一个键值对,key 是顶点值,value 是Node类的实例。下面让我们看一下代码:

如果节点已经存在,我们不需要覆盖它的值,所以我们首先要检查节点是否存在,如果不存在,就创建。

对图的邻接表添加顶点的时间复杂度是 O(1)。

返回顶部
顶部