Hey there! Welcome to KnowledgeKnot! Don't forget to share this with your friends and revisit often. Your support motivates us to create more content in the future. Thanks for being awesome!

Introduction

The disjoint set data structure, also known as the union-find data structure, is a data structure that efficiently manages a partition of a set into disjoint (non-overlapping) subsets. It supports two primary operations:
→ Find, which determines the representative of the subset containing a particular element, and
→Union, which merges two subsets into one. This data structure is particularly useful for applications involving dynamic connectivity, such as in graph algorithms and network connectivity.

Use Examples

Initialization

In the disjoint set data structure, each element starts in its own subset. This can be represented by initializing an array where each element points to itself.

Pseudo Code:


function Initialize(n):
    parent = array of size n
    rank = array of size n
    for i from 0 to n-1:
        parent[i] = i
        rank[i] = 0
        

Example:

For a set with 5 elements (0 through 4), the initial arrays are:
parent=[0,1,2,3,4]parent = [0, 1, 2, 3, 4]
rank=[0,0,0,0,0]rank = [0, 0, 0, 0, 0]

Find Operation

The Find operation locates the representative (or root) of the subset containing a particular element. This can be optimized with path compression to flatten the structure and speed up future operations.

Pseudo Code:


function Find(x):
    if parent[x] != x:
        parent[x] = Find(parent[x])  # Path compression
    return parent[x]
        

Example:

Consider the following unions have been performed:
→ Union(0, 1)
→ Union(1, 2)
The parent array might look like:
parent=[0,0,0,3,4]parent = [0, 0, 0, 3, 4]
To find the representative of element 2:
Find(2)parent[2]=0Find(2) \rightarrow parent[2] = 0

Union Operation

The Union operation merges two subsets into a single subset. Union by rank ensures that the smaller tree is always attached under the root of the larger tree, keeping the structure balanced.

Pseudo Code:


function Union(x, y):
    rootX = Find(x)
    rootY = Find(y)
    
    if rootX != rootY:
        if rank[rootX] > rank[rootY]:
            parent[rootY] = rootX
        elif rank[rootX] < rank[rootY]:
            parent[rootX] = rootY
        else:
            parent[rootY] = rootX
            rank[rootX] += 1
        

Example:

Continuing with the previous example, perform the union of elements 3 and 4:
→ Union(3, 4)
The updated arrays might look like:
parent=[0,0,0,3,3]parent = [0, 0, 0, 3, 3]
rank=[1,0,0,1,0]rank = [1, 0, 0, 1, 0]

Use Cases

Kruskal's Algorithm: Used to find the minimum spanning tree of a graph. The union-find structure helps efficiently manage the merging of connected components.

Network Connectivity: To determine if there is a path between two nodes in a network. The Find operation helps check if two nodes belong to the same connected component.

Image Processing: For segmenting an image into connected components, where each pixel can be connected to its neighbors.

Dynamic Equivalence: Managing dynamic connectivity in systems where connections between components are frequently added or queried.

Merging Accounts: In applications like social networks, where it is necessary to merge user accounts and manage connections between users dynamically.

Advanced Techniques and Optimizations

Path Compression

Path compression is an optimization technique used in the Find operation to make the tree representing each subset as flat as possible. Whenever Find is called, it flattens the structure by making every node point directly to the root. This ensures that future operations are faster.

Pseudo Code:


function Find(x):
    if parent[x] != x:
        parent[x] = Find(parent[x])  # Path compression
    return parent[x]
        

Union by Rank/Size

Union by rank or size is used during the Union operation to keep the tree balanced. The idea is to always attach the smaller tree under the root of the larger tree. This helps in keeping the tree shallow, which in turn keeps the Find operations efficient.

Pseudo Code:


function Union(x, y):
    rootX = Find(x)
    rootY = Find(y)
    
    if rootX != rootY:
        if rank[rootX] > rank[rootY]:
            parent[rootY] = rootX
        elif rank[rootX] < rank[rootY]:
            parent[rootX] = rootY
        else:
            parent[rootY] = rootX
            rank[rootX] += 1
        

Complexity Analysis

The time complexity of both Union and Find operations is nearly constant, specifically O(α(n))O(\alpha(n)), where α(n)\alpha(n) is the inverse Ackermann function. This function grows extremely slowly, making it effectively constant for all practical purposes. Thus, the disjoint set data structure provides very efficient operations for managing and merging subsets.