Skip to content

Graphs

graph 用于建模关系与连接:从社交网络到道路导航,再到依赖链。本文件覆盖表示方法、BFS、DFS、shortest path、topological sort 与 connected component,并给出图算法面试中最常见的遍历与路径模式。

  • 我们已经在第 12 章学习过 graph theory(adjacency matrix、Laplacian、spectral 性质),在第 13 章学习过更偏理论的内容(tree、planarity、colouring)。本文件聚焦算法模式:如何在代码中遍历、搜索与优化 graph。

  • graph 题最核心的两大算法是 BFSDFS。绝大多数 graph 问题都可归约到这两者(或其改造版)。把这两个模式吃透,就能覆盖大部分题目。

Graph 表示方法

  • adjacency list:为每个 node 存一份邻居列表。空间复杂度 \(O(|V| + |E|)\),最适合 sparse graph(真实世界大多数 graph 都是 sparse)。
# Undirected graph
graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2]
}

# From edge list
def build_graph(n, edges):
    graph = {i: [] for i in range(n)}
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)  # omit for directed
    return graph
  • adjacency matrix\(n \times n\) 矩阵,若边 \((i, j)\) 存在则 \(A[i][j] = 1\)。空间复杂度 \(O(|V|^2)\),适合 dense graph 或需要 \(O(1)\) 判边场景。

  • 如何选择:绝大多数情况下优先 adjacency list。只有当 graph 很稠密(\(|E| \approx |V|^2\))或你必须频繁常数时间判断边存在性时,才考虑 matrix。

  • BFS 用 queue 逐层扩展(level by level),适用于:
    • 无权 graph 的 shortest path
    • 层序遍历
    • connected component
    • 任意“最少步数”问题
from collections import deque

def bfs(graph, start):
    visited = {start}
    queue = deque([start])

    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
  • 关键细节visited 应在 enqueue 时标记,而非 dequeue 时。若出队才标记,同一 node 可能被多个前驱重复入队,造成额外开销甚至错误。

Easy: Number of Islands

  • 问题:给定 2D 网格 '1'(陆地)和 '0'(海水),统计岛屿个数。

  • 模式:扫描网格,遇到 '1' 就发起一次 BFS/DFS,把该连通陆地全部标记为已访问。发起次数即岛屿数。

from collections import deque

def num_islands(grid):
    if not grid:
        return 0

    rows, cols = len(grid), len(grid[0])
    count = 0

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1
                # BFS to mark entire island
                queue = deque([(r, c)])
                grid[r][c] = '0'  # mark visited
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                        nr, nc = cr + dr, cc + dc
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
                            grid[nr][nc] = '0'
                            queue.append((nr, nc))

    return count
  • 常见坑directions = [(0,1),(0,-1),(1,0),(-1,0)] 是 4 邻接网格题的高频模板。建议直接记忆;若是 8 邻接再加四个对角方向。

  • 常见坑:此解法通过 grid[r][c] = '0' 原地标记,省去 visited 集合,但会修改输入。面试中可用,但要主动说明这一取舍。

Medium: Rotting Oranges

  • 问题:新鲜橘子与腐烂橘子相邻会被感染。求全部腐烂所需最短时间,若无法全部腐烂返回 -1。

  • 模式:多源 BFS(multi-source BFS)。把所有初始腐烂橘子同时入队;每一层 BFS 对应 1 个时间单位。

from collections import deque

def oranges_rotting(grid):
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    fresh = 0

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c))
            elif grid[r][c] == 1:
                fresh += 1

    if fresh == 0:
        return 0

    time = 0
    while queue and fresh > 0:
        time += 1
        for _ in range(len(queue)):
            cr, cc = queue.popleft()
            for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                nr, nc = cr + dr, cc + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    grid[nr][nc] = 2
                    fresh -= 1
                    queue.append((nr, nc))

    return time if fresh == 0 else -1
  • 关键洞察:multi-source BFS 计算的是“每个点到任一源点的最短距离”,正好对应“最后一个新鲜橘子被感染所需时间”。
  • DFS 会尽可能向深处走,再回溯。可用显式 stack 或递归调用栈实现。典型应用:
    • 环检测
    • topological sort
    • connected component
    • 回溯/穷举
    • 带约束路径搜索
def dfs(graph, node, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)
    for neighbour in graph[node]:
        if neighbour not in visited:
            dfs(graph, neighbour, visited)

Medium: Course Schedule(Cycle Detection)

  • 问题:给定课程数与先修关系,判断能否修完全部课程(即有向图中是否无环)。

  • 模式:在 directed graph 中做环检测。DFS 使用三色状态:未访问、访问中(当前路径上)、已完成。

def can_finish(num_courses, prerequisites):
    graph = {i: [] for i in range(num_courses)}
    for course, prereq in prerequisites:
        graph[course].append(prereq)

    # 0 = unvisited, 1 = in-progress, 2 = completed
    state = [0] * num_courses

    def has_cycle(node):
        if state[node] == 1:
            return True   # back edge → cycle
        if state[node] == 2:
            return False  # already fully explored

        state[node] = 1  # mark in-progress
        for neighbour in graph[node]:
            if has_cycle(neighbour):
                return True
        state[node] = 2  # mark completed
        return False

    for course in range(num_courses):
        if has_cycle(course):
            return False
    return True
  • 为什么必须三状态:两状态(visited/unvisited)无法区分“正在当前 DFS 路径中”与“已完全处理完”。碰到状态 1 才是 back edge(有环);碰到状态 2 只是交叉/前向边,不代表有环。

Medium: Course Schedule II(Topological Sort)

  • 问题:输出一种合法修课顺序(topological order)。

  • 模式(Kahn algorithm,BFS 版):把入度为 0 的 node 入队,逐个弹出并削减其邻居入度,新的入度 0 再入队,直到处理结束。

from collections import deque

def find_order(num_courses, prerequisites):
    graph = {i: [] for i in range(num_courses)}
    indegree = [0] * num_courses

    for course, prereq in prerequisites:
        graph[prereq].append(course)
        indegree[course] += 1

    queue = deque([i for i in range(num_courses) if indegree[i] == 0])
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            indegree[neighbour] -= 1
            if indegree[neighbour] == 0:
                queue.append(neighbour)

    return order if len(order) == num_courses else []  # empty = cycle exists
  • 常见坑:若结果长度小于 node 总数,说明有环(某些 node 入度永远无法降到 0)。

Shortest Path

Dijkstra Algorithm

  • 用于非负 weighted graph 中,从单源到其余所有 node 的 shortest path。实现依赖 priority queue(min-heap)。
import heapq

def dijkstra(graph, start):
    # graph: {node: [(neighbour, weight), ...]}
    dist = {node: float('inf') for node in graph}
    dist[start] = 0
    heap = [(0, start)]

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry

        for neighbour, weight in graph[node]:
            new_dist = d + weight
            if new_dist < dist[neighbour]:
                dist[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))

    return dist
  • 时间复杂度:二叉 heap 下为 \(O((|V| + |E|) \log |V|)\)

  • 常见坑if d > dist[node]: continue 必不可少。否则会反复处理过期堆项,严重拖慢性能,甚至退化。

  • 常见坑:Dijkstra 不适用于负权边。只要存在负权,贪心前提(节点一旦出堆即最优)就被破坏,应改用 Bellman-Ford。

Hard: Network Delay Time

  • 问题:给定 \(n\) 个 node 和有向带权边,求信号从源点到达所有 node 的时间;若有 node 不可达返回 -1。
def network_delay(times, n, k):
    graph = {i: [] for i in range(1, n + 1)}
    for u, v, w in times:
        graph[u].append((v, w))

    dist = dijkstra(graph, k)
    max_time = max(dist.values())
    return max_time if max_time < float('inf') else -1

Strongly Connected Component(SCC)

  • 在 directed graph 中,SCC 是一个极大 node 集,使得集合内任意两 node 彼此可达。

  • Kosaraju algorithm: 1) 在原图 DFS,记录完成时间顺序; 2) 转置图(反向所有边); 3) 按完成时间逆序在转置图 DFS。第 3 步每棵 DFS 树就是一个 SCC。

  • 适用场景:查循环依赖、2-SAT、把有向图压缩为 SCC DAG。


Common Pitfalls Summary

Pitfall Example Fix
出队时才标记 visited 同一 node 多次入队 入队时立刻标记
有向图只用两状态 visited 回边与交叉边混淆 用三状态(未访问/访问中/完成)
负权图用 Dijkstra shortest path 错误 换 Bellman-Ford
忘记跳过 stale heap 项 重复处理过期路径 if d > dist[node]: continue
网格越界判断漏写 数组越界异常 固定写边界条件
忽略 time=0 情况 Rotting Oranges 无新鲜橘子 BFS 前先判 fresh == 0
有向边错建成双向 先修关系方向错误 只按题意单向建边

Take-Home Problems(NeetCode)

BFS Pattern

DFS Pattern

Shortest Path

Advanced