This post has already been read 2015 times!

Introduction

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

The B-tree is a generalization of a binary search tree in that a node can have more than two children. Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data.

It is commonly used in databases and filesystems.

831px-B-tree.svg

The database problem

This section describes a problem faced by database designers, outlines a series of increasingly effective solutions to the problem, and ends by describing how the B-tree solves the problem completely.

Time to search a sorted file

Usually, sorting and searching algorithms have been characterized by the number of comparison operations that must be performed using order notation.

A binary search of a sorted table with

N records

, for example, can be done in roughly

Log 2 (N) comparisons

.

If the table had

1,000,000 records

, then a specific record could be located with at most 20 comparisons:

Log 2 (1,000,000) = 20

.

Large databases have historically been kept on disk drives. The time to read a record on a disk drive far exceeds the time needed to compare keys once the record is available.

The time to read a record from a disk drive involves a seek time and a rotational delay. The seek time may be 0 to 20 or more milliseconds, and the rotational delay averages about half the rotation period. For a 7200 RPM drive, the rotation period is 8.33 milliseconds. For a drive such as the Seagate ST3500320NS, the track-to-track seek time is 0.8 milliseconds and the average reading seek time is 8.5 milliseconds. For simplicity, assume reading from disk takes about 10 milliseconds.

Naively, then, the time to locate one record out of a million would take :

20 disk reads times 10 milliseconds per disk read, which is 0.2 seconds

The time won't be that bad because individual records are grouped together in a disk block.
A disk block might be 16 kilobytes. If each record is 160 bytes, then 100 records could be stored in each block.

disk_block_record_storage

The disk read time above was actually for an entire block. Once the disk head is in position, one or more disk blocks can be read with little delay. With 100 records per block, the last 6 or so comparisons don't need to do any disk reads—the comparisons are all within the last disk block read.

To speed the search further, the first 13 to 14 comparisons (which each required a disk access) must be sped up.

The B-tree uses all those ideas

The B-tree uses all of the ideas described above.

In particular, a B-tree:

  1. keeps keys in sorted order for sequential traversing
  2. uses a hierarchical index to minimize the number of disk reads
  3. uses partially full blocks to speed insertions and deletions
  4. keeps the index balanced with an elegant recursive algorithm

In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree can handle an arbitrary number of insertions and deletions.

B-tree implementation with C++


#include "iostream.h"
#include "stdlib.h"
#include "alloc.h"
const int MAX = 4 ;
const int MIN = 2 ;
struct btnode
{
	int count ;
	int value[MAX + 1] ;
	btnode *child[MAX + 1] ;
} ;
class btree
{
	private :
		btnode *root ;
	public :
		btree( ) ;
		void insert ( int val ) ;
		int setval ( int val, btnode *n, int *p, btnode **c ) ;
		static btnode * search ( int val, btnode *root, int *pos ) ;
		static int searchnode ( int val, btnode *n, int *pos ) ;
		void fillnode ( int val, btnode *c, btnode *n, int k ) ;
		void split ( int val, btnode *c, btnode *n,
				int k, int *y, btnode **newnode ) ;
		void del ( int val ) ;
		int delhelp ( int val, btnode *root ) ;
		void clear ( btnode *root, int k ) ;
		void copysucc ( btnode *root, int i ) ;
		void restore ( btnode *root, int i ) ;
		void rightshift ( int k ) ;
		void leftshift ( int k ) ;
		void merge ( int k ) ;
		void show( ) ;
		static void display ( btnode *root ) ;
		static void deltree ( btnode *root ) ;
		~btree( ) ;
} ;
 
btree :: btree( )
{
	root = NULL ;
}
void btree :: insert ( int val )
{
	int i ;
	btnode *c, *n ;
	int flag ;
	flag = setval ( val, root, &i, &c ) ;
	if ( flag )
	{
		n = new btnode ;
		n -> count = 1 ;
		n -> value[1] = i ;
		n -> child[0] = root ;
		n -> child[1] = c ;
		root = n ;
	}
}
int btree :: setval ( int val, btnode *n, int *p, btnode **c )
{
	int k ;
	if ( n == NULL )
	{
		*p = val ;
		*c = NULL ;
		return 1 ;
	}
	else
	{
		if ( searchnode ( val, n, &k ) )
			cout << endl << "Key value already exists." << endl ;
		if ( setval ( val, n -> child[k], p, c ) )
		{
			if ( n -> count < MAX )
			{
				fillnode ( *p, *c, n, k ) ;
				return 0 ;
			}
			else
			{
				split ( *p, *c, n, k, p, c ) ;
				return 1 ;
			}
		}
		return 0 ;
	}
}
btnode * btree :: search ( int val, btnode *root, int *pos )
{
	if ( root == NULL )
		return NULL ;
	else
	{
		if ( searchnode ( val, root, pos ) )
			return root ;
		else
			return search ( val, root -> child[*pos], pos ) ;
	}
}
int btree :: searchnode ( int val, btnode *n, int *pos )
{
	if ( val < n -> value[1] )
	{
		*pos = 0 ;
		return 0 ;
	}
	else
	{
		*pos = n -> count ;
		while ( ( val < n -> value[*pos] ) && *pos > 1 )
			( *pos )-- ;
		if ( val == n -> value[*pos] )
			return 1 ;
		else
			return 0 ;
	}
}
void btree :: fillnode ( int val, btnode *c, btnode *n, int k )
{
	int i ;
	for ( i = n -> count ; i > k ; i-- )
	{
		n -> value[i + 1] = n -> value[i] ;
		n -> child[i + 1] = n -> child[i] ;
	}
	n -> value[k + 1] = val ;
	n -> child[k + 1] = c ;
	n -> count++ ;
}
void btree :: split ( int val, btnode *c, btnode *n,
		int k, int *y, btnode **newnode )
{
	int i, mid ;
 
	if ( k <= MIN )
		mid = MIN ;
	else
		mid = MIN + 1 ;
 
	*newnode = new btnode ;
 
	for ( i = mid + 1 ; i <= MAX ; i++ )
	{
		( *newnode ) -> value[i - mid] = n -> value[i] ;
		( *newnode ) -> child[i - mid] = n -> child[i] ;
	}
 
	( *newnode ) -> count = MAX - mid ;
	n -> count = mid ;
 
	if ( k <= MIN )
		fillnode ( val, c, n, k ) ;
	else
		fillnode ( val, c, *newnode, k - mid ) ;
 
	*y = n -> value[n -> count] ;
	( *newnode ) -> child[0] = n -> child[n -> count] ;
	n -> count-- ;
}
void btree :: del ( int val )
{
	btnode * temp ;
 
	if ( ! delhelp ( val, root ) )
		cout << endl << "Value " << val << " not found." ;
	else
	{
		if ( root -> count == 0 )
		{
			temp = root ;
			root = root -> child[0] ;
			delete temp ;
		}
	}
}
int btree :: delhelp ( int val, btnode *root )
{
	int i ;
	int flag ;
 
	if ( root == NULL )
		return 0 ;
	else
	{
		flag = searchnode ( val, root, &i ) ;
		if ( flag )
		{
			if ( root -> child[i - 1] )
			{
				copysucc ( root, i ) ;
				flag = delhelp ( root -> value[i], root -> child[i] ) ;
				if ( !flag )
					cout << endl << "Value " << val << " not found." ;
			}
			else
				clear ( root, i ) ;
		}
		else
			flag = delhelp ( val, root -> child[i] ) ;
		if ( root -> child[i] != NULL )
		{
			if ( root -> child[i] -> count < MIN )
				restore ( root, i ) ;
		}
		return flag ;
	}
}
void btree :: clear ( btnode *root, int k )
{
	int i ;
	for ( i = k + 1 ; i <= root -> count ; i++ )
	{
		root -> value[i - 1] = root -> value[i] ;
		root -> child[i - 1] = root -> child[i] ;
	}
	root -> count-- ;
}
void btree :: copysucc ( btnode *root, int i )
{
	btnode *temp = root -> child[i] ;
 
	while ( temp -> child[0] )
		temp = temp -> child[0] ;
 
	root -> value[i] = temp -> value[1] ;
}
void btree :: restore ( btnode *root, int i )
{
	if ( i == 0 )
	{
		if ( root -> child [1] -> count > MIN )
			leftshift ( 1 ) ;
		else
			merge ( 1 ) ;
	}
	else
	{
		if ( i == root -> count )
		{
			if ( root -> child[i - 1] -> count > MIN )
				rightshift ( i ) ;
			else
				merge ( i ) ;
		}
		else
		{
			if ( root -> child[i - 1] -> count > MIN )
				rightshift ( i ) ;
			else
			{
				if ( root -> child[i + 1] -> count > MIN )
					leftshift ( i + 1 ) ;
				else
					merge ( i ) ;
			}
		}
	}
}
void btree :: rightshift ( int k )
{
	int i ;
	btnode *temp ;
 
	temp = root -> child[k] ;
 
	for ( i = temp -> count ; i > 0 ; i-- )
	{
		temp -> value[i + 1] = temp -> value[i] ;
		temp -> child[i + 1] = temp -> child[i] ;
	}
 
	temp -> child[1] = temp -> child[0] ;
	temp -> count++ ;
	temp -> value[1] = root -> value[k] ;
	temp = root -> child[k - 1] ;
	root -> value[k] = temp -> value[temp -> count] ;
	root -> child[k] -> child [0] = temp -> child[temp -> count] ;
	temp -> count-- ;
}
void btree :: leftshift ( int k )
{
	btnode *temp ;
 
	temp = root -> child[k - 1] ;
	temp -> count++ ;
	temp -> value[temp -> count] = root -> value[k] ;
	temp -> child[temp -> count] = root -> child[k] -> child[0] ;
	temp = root -> child[k] ;
	root -> value[k] = temp -> value[1] ;
	temp -> child[0] = temp -> child[1] ;
	temp -> count-- ;
	for ( int i = 1 ; i <= temp -> count ; i++ )
	{
		temp -> value[i] = temp -> value[i + 1] ;
		temp -> child[i] = temp -> child[i + 1] ;
	}
}
void btree :: merge ( int k )
{
	btnode *temp1, *temp2 ;
	temp1 = root -> child[k] ;
	temp2 = root -> child[k - 1] ;
	temp2 -> count++ ;
	temp2 -> value[temp2 -> count] = root -> value[k] ;
	temp2 -> child[temp2 -> count] = root -> child[0] ;
	for ( int i = 1 ; i <= temp1 -> count ; i++ )
	{
		temp2 -> count++ ;
		temp2 -> value[temp2 -> count] = temp1 -> value[i] ;
		temp2 -> child[temp2 -> count] = temp1 -> child[i] ;
	}
	for ( i = k ; i < root -> count ; i++ )
	{
		root -> value[i] = root -> value[i + 1] ;
		root -> child[i] = root -> child[i + 1] ;
	}
	root -> count-- ;
	delete temp1 ;
}
void btree :: show( )
{
	display ( root ) ;
}
void btree :: display ( btnode *root )
{
	if ( root != NULL )
	{
		for ( int i = 0 ; i < root -> count ; i++ )
		{
			display ( root -> child[i] ) ;
			cout << root -> value[i + 1] << "\t" ;
		}
		display ( root -> child[i] ) ;
	}
}
void btree :: deltree ( btnode *root )
{
	if ( root != NULL )
	{
		for ( int i = 0 ; i < root -> count ; i++ )
		{
			deltree ( root -> child[i] ) ;
			delete ( root -> child[i] ) ;
		}
		deltree ( root -> child[i] ) ;
		delete ( root -> child[i] ) ;
	}
}
 
btree :: ~btree( )
{
	deltree ( root ) ;
}
 
void main( )
{
	btree b ;
	int arr[ ] = { 27, 42, 22, 47, 32, 2, 51, 40, 13 } ;
	int sz = sizeof ( arr ) / sizeof ( int ) ;
	for ( int i = 0 ; i < sz ; i++ )
		b.insert ( arr[i] ) ;
	cout << "B-tree of order 5:" << endl ;
	b.show( ) ;
	b.del ( 22 ) ;
	b.del ( 11 ) ;
	cout << "\n\nB-tree after deletion of values:" << endl ;
	b.show( ) ;
}

Leave a Reply

Post Navigation