B-Tree Data Structure

Hey there! Welcome to KnowledgeKnot! Don't forget to share this with your friends and revisit often. Your support motivates us to create more content in the future. Thanks for being awesome!

B-Tree Introduction

A B-Tree is a self-balancing tree data structure that maintains sorted data and allows for efficient insertion, deletion, and search operations. It is particularly well-suited for systems that read and write large blocks of data, such as databases and file systems. B-Trees are a generalization of binary search trees in that a node can have more than two children.

Consider a B-Tree of order 3.


     [30]
   /     \
[10,20] [40,50,60]

In this B-Tree:

→ The root node contains one key: 30.
→ The root has two children: the left child contains keys 10 and 20, and the right child contains keys 40, 50, and 60.

Order of a B-Tree

The order of a B-Tree, denoted as $m$ , is a crucial property that defines the maximum number of children a node can have. It determines the capacity of nodes and the structure of the tree. Specifically:

→ Each node can contain at most $m-1$ keys.
→ Each node (except for the root) must contain at least $\lceil \frac{m}{2} \rceil - 1$ keys.
→ A node with $k$ keys will have $k+1$ children.
→ The root must have at least one key if it is not a leaf node.

In essence, the order of the B-Tree influences how "wide" the tree is, affecting its height and the efficiency of operations.

Consider a B-Tree of order 3. Here's a visual representation:


      [30]
    /     \
[10,20]  [40,50,60]

In this B-Tree:

→ The root node contains one key: 30.
→ The root has two children: the left child contains keys 10 and 20, and the right child contains keys 40, 50, and 60.

Structure of a B-Tree

A B-Tree of order $m$ (also referred to as a B-Tree of degree $m$ ) is defined by the following properties:

Node Structure:

→ Each node contains a certain number of keys.
→ The number of keys in a node must be at least $\lceil \frac{m}{2} \rceil - 1$ and at most $m - 1$ .
→ Each node has at most $m$ children.

Root Properties:

→ The root node must have at least one key if it is not a leaf node.

Internal Node Properties:

→ Each internal node has exactly one more child than the number of keys.
→ The keys in each node are in sorted order.
→ All leaves have the same depth (the tree is balanced).

Leaf Properties:

→ Leaves do not carry children and serve as endpoints of the tree.

Search in B-Tree

Searching in a B-Tree is similar to searching in a binary search tree but involves more comparisons per node due to the larger number of keys. The search operation proceeds as follows:

→ Start from the root and recursively traverse down the tree.
→ At each node, perform a binary search to find the range in which the search key falls.
→ Move to the corresponding child pointer.
→ Repeat until the key is found or a leaf node is reached.

Consider the following 3-level B-Tree of order 3:


          [50]
        /     \
       /       \
  [20,30]        [60,70]
  /   |  \      /    \
[10] [25] [35] [55]  [71,75]

Let's search for the key 75 in this B-Tree:

→ Start at the root: [50].
→ Since 75 > 50, move to the right child: [60,70].
→ Perform a binary search in [60,70]. Since 75 > 70, move to the right child: [71,75].
→ Perform a binary search in [71,75].
→ Find 75 in the node [71,75].

Therefore, the key 75 is found in the B-Tree.

Insertion in B-Tree

Inserting a new key into a B-Tree involves the following steps:

Locate the Leaf: Traverse the tree to find the appropriate leaf node where the new key should be inserted.
Insert the Key: If the leaf node has fewer than $m-1$ keys (where m is the order of the B-Tree), insert the new key in the correct sorted position.
Split the Node: If the leaf node already contains $m-1$ keys, split the node:

Choose the median key as the split point.
Move the median key up to the parent node.
Create a new node to the right of the original node.
Move keys greater than the median to the new node.

Propagate Changes: If the parent node becomes full due to the insertion of the median key, recursively split the parent and continue this process up the tree if necessary.
Adjust Root: If the split propagates to the root, create a new root node, increasing the tree's height.

Here's an example illustrating insertion into a B-Tree of order 3 (each node can have 2 to 4 keys):

1. Initial state (inserting 70):


      [30]
     /    \
  [10,20] [40,50,60]

2. Insert 70, causing node overflow:


      [30]
     /    \
  [10,20] [40,50,60,70] (overflow)

3. Split the overflowing node:


      [30,50]
     /   |    \
  [10,20] [40] [60,70]

Note: In a B-Tree of order m, each node (except the root) must contain between $\lceil \frac{m}{2} \rceil - 1$ and $m-1$ keys. The root can have between 1 and $m-1$ keys.

Deletion in B-Tree

Deleting from a B-Tree is more complex and involves several cases:

→ Locate the Key: Perform a search to find the key to be deleted.
→ Leaf Node Deletion: If the key is in a leaf node, remove it directly.
→ Internal Node Deletion: If the key is in an internal node, replace it with the predecessor or successor key from the subtree.
→ Merge or Borrow: If removing the key causes a node to have fewer than the minimum number of keys:

→ Borrow from Sibling: If an adjacent sibling has more than the minimum number of keys, borrow a key.
→ Merge Nodes: If both siblings have the minimum number of keys, merge the node with a sibling, and adjust the parent accordingly.

Consider the following B-Tree of order 3 (each node can have 2 to 4 keys):


         [30]
        /    \
     [10,20] [40,50,60]
      /   |   \    |    \
   [5] [15] [25] [35] [45,55]

Let's delete the key 20 from this B-Tree:

→ Start at the root: [30].
→ Traverse to the appropriate leaf node [10,20].
→ Remove 20 from the leaf node.
→ Adjust the B-Tree to maintain balance, possibly borrowing from siblings or merging nodes if necessary.

Complexity Analysis

B-Trees are optimized for systems that read and write large blocks of data. The height of a B-Tree is $O(\log_m n)$ , where $n$ is the number of keys and $m$ is the order of the B-Tree. This guarantees that operations like search, insertion, and deletion can be performed in logarithmic time relative to the number of keys.

Applications of B-Trees

→ Databases: B-Trees are widely used in databases and database indexes to allow efficient querying and data management.
→ File Systems: File systems use B-Trees to manage directories and file metadata.
→ Multilevel Indexing: B-Trees are used in multilevel indexing to manage large amounts of data with efficient access patterns.

Suggetested Articles