home technical personal links weddings
spacer general tech mapinfo/gis oracle/database misc/useless

Information on Indexes

B-tree
Binary-tree or Balanced-tree indexes. The standard indexing scheme used in an RDBMS - keep on narrowing the area down by eliminating half the data at a time. Provide good functionality (like partial searches), but they increase disk I/O and multi-user contention. Good for high-cardinality type data like customer name (when column values are unique or near unique - should only return 20% at most of the rows in a table). In general, a B-tree index consists of a root block, any number of branch blocks, and associated leaf blocks that reference data blocks. The whole scheme of a B-tree index is to keep itself balanced so that all requests for a leaf block take about the same amount of time. A B-tree index stores ROWIDs for each key that in turn points to the rows in the table containing the key value. The database attempts to balance the levels across multiple branches. This index is a hierarchical index that contains a header block (some DBMSs use the term "page" or "node" for the term "block") at the top level. The header (root) block contains keys and pointers to the intermediate (branch) blocks, which in turn contain keys and pointers to the leaf block. At the leaf level, all keys are entered with pointers (rowids) to the table data. Every row is represented at the leaf level, while every leaf block has an individual entry at the intermediate level and every intermediate-level block has an entry at the header level.A B-tree index keeps all index entries in order.

Individual entries will be kept in physically continuous blocks but may become logically continuous if a block fills up and an index split must be performed. Within a leaf block, the index entries are not kept in physical order but in logical order. As the table and index are populated, only a header block is first populated. At this stage, the B-tree index is a one-level (and one-block) index. Once this becomes full, leaf blocks are created, and the header block is transformed to point to the first two new leaf blocks. This is now a two-level index. Once the header block again becomes full, intermediate blocks are created. The header block is transformed to point to each intermediate block, and the intermediate blocks are transformed to point to the leaf blocks. The leaf blocks still point to the individual rows on the table. At this stage, we have a three-level index, which is useful for accessing millions of rows. Once the intermediate-level blocks become full, an extra level of intermediate-level blocks can be created, making this a four-level index.

Keeping track of the number of levels in B-tree index structures is important, since this can affect the number of logical and physical reads that are required to access a row of data. Accessing a single row of data using a three-level unique index and a perfectly written query requires that four blocks of data be accessed--the index header, the intermediate-level block, leaf blocks, and the table data. Once an index moves to four levels, an extra I/O may be required. When you see this occur, reorganize your indexes, which could bring the indexes back to a three-level one. In fact, indexes should be reorganized regularly to remove any index block-splitting that may have occurred. You will most commonly see three-level indexes in your database.

Deleting a row that has indexed values has a relatively large impact on system performance. When a row of data is physically deleted, the index entries must be deleted as well, which means that the entry on the leaf block (which points to the data row) must be deleted. Remember that an intermediate block (assuming a three-level B-tree index) will have a single entry that contains a high key for each leaf block; if the row being deleted happens to be the entry on the intermediate block, it must be deleted and a new value must be placed in the intermediate block that corresponds to the new high value on the leaf block. Similarly, the index header block contains a single high value for every intermediate block. If the intermediate-block value that was changed was also the highest value for the intermediate block, then the header block must be changed to enter the new high value of the intermediate block.

When an indexed row is inserted, a new index leaf-block entry must be added. This entry must be placed in the correct leaf block, since the index entries are always kept in order. Some shifting of entries in the leaf block may be needed to keep the values in that block in the correct logical sequence. If the new entry is the new high value in the leaf block, then the intermediate block (assuming a three-level B-tree index) will need to be changed to put the new value pointing to this leaf block in place of the previous high value. Similarly, if this is the new high value of the intermediate block, the header block must be changed to replace the old high value pointing to the intermediate block with the new one. If this were not complicated enough, an insert into an already full leaf block causes a leaf-block split. In this situation, half of the leaf-block entries remain in the same place and a new leaf block will be created with the other half of the entries; the new inserted value will be placed on the correct leaf block. This causes two changes to occur on the intermediate block. First, the original leaf block may need to have the original high value replaced with the new one. Second, the new leaf block will need to be added to the intermediate block. This may also cascade-up to the header block. In the case that an intermediate block is full and a new entry is being added, a "block split" may also occur at the intermediate-block level with similar changes being performed at the header block.

Bitmapped
Represent data values as bits instead of strings or integers - the index assigns a bitmap for each key value - each bit in the bitmap scheme matches a possible ROWID, and if the bit is set, it indicates a hit. Coexist with and complement other available indexing schemes like standard B-tree indexes, clustered tables, and hash indexes. Suited for low-cardinality column data (less than 100 or so values) like gender or department. Can effectively use to retieve large percentages of a table (up to 80%) and be used to get results based on nulls (since they are also indexed). So if 3 possible regions then 3 columns and either a 0 or 1 value depending on which region it was. Use when where clause contains multiple predicates (lengthy where clause) on low-cardinality columns and the tables have many rows and return many rows. May require 100 times less space than a B-tree index (since you would need several permutations on different concatenated orderings of the columns) and offer significant performance gains. NOT suitable for tables with heavy DML activity (update, insert, delete) or OLTP environments because locking of bitmapped indexes occur at the block level rather than the row level and updates take longer. Also, should only be on multiple columns that have a few distinct values, they aren't good for large numbers of distinct values. Use them for data warehousing applications where do bulk inserts and updates. Can't be used with rules-based optimizer since they aren't considered. Should be rebuilt (like b-tree indxes) if there is a lot of DML activity.

CREATE BITMAP INDEX emp_sex_b1 ON employees (sex);

Bitmap Join Index
The rowids from one table are stored along with the indexed column from the joined table. The result of the join is stored so only one table can be updated concurrently by different transactions and parallel DML is only supported on the fact (parent) table. No table can appear twice in the join and can't create one on an index-organized table (IOT) or on a temporary table.

CREATE BITMAP INDEX empdept_idx ON emp(dept.deptno) FROM emp, dept WHERE emp.deptno = dept.deptno;
SELECT /*+ index(emp empdept_idx) */ count(*) FROM emp, dept WHERE emp.deptno = dept.deptno; (this query would not have to even access the dept table)

CREATE BITMAP INDEX emp_deptloc_ms_idx ON emp(dept.loc, sales.marital_status) FROM emp, dept, sales WHERE emp.deptno = dept.deptno AND emp.empno = sales.empno;
SELECT emp.empno, emp.name, FROM emp, dept, sales WHERE emp.deptno = dept.deptno AND emp.empno = sales.empno and dept.loc = 10 and sales.marital_status = 'N';

Cluster Index
They reduce I/O because related rows are stored together in the same data blocks, they improve access times for joins of the cluster tables, and they reduce the amount of space needed for indexes since the cluster key value is stored only once in the index. The cluster key is the column(s) the cluster tables share. Should be used on data that is queried (based on a range of values), not updated whose columns contain redundant data with a wide range of values, and where the tables are joined a lot (accessed together). By using them the performance of table scans of individual tables in the cluster may be severly degraded.The maximum number of columns in a cluster key is 16, but the actual value depends on the data block size since the cluster key value can't exceed 1/3 of the data block size.

A cluster index is unlike a table index in the following ways:

Hash Clusters
Use on tables that are a consistent size and where equality queries are done on the hashed columns - it isn't good for partial searches like WHERE name = 'SM%'. The hash key should be well distributed (like employee #), otherwise will have chaining in overflow blocks. If the cluster is poorly configured or the size of the cluster changes overflows of the hash keys may occur or the cluser may become sparsely populated - in the first case performance of hash-key retrieval can degrade and in the second case table scans will be lesss efficient. They eliminate the need to perform sorts because in-memory hash table is constructed at run time - avoids traversing a deep B-tree structure. Suited for scaleable parallel execution. Only requires 1 I/O - the index is the Row ID - it uses a hash algorithm on the key value to assign the row to a database block (page). Cannot perform range scans of the hash key.

Examples:
SQL: Select * from customers where gender = 'male';
Answer: parallel table scan - since it will return approximately 50% of the rows

SQL: Select * from customers where cust_no = 101;
Answer: use B-tree index or hash index - will return only a small # of records

SQL: Select * from customers where gender = 'male' and region in('central','west') and marital_status in ('married','divorced');
Answer: bit-mapped index - multiple predicated on low-cardinality data

SQL: Select * from customers where gender = 'male' and cust_no < 100;
Answer: B-tree index on cust_no - return small # of rows on the high-cardinality data

for questions/comments: kgmahoney@yahoo.com   © 2001-2017 kmahoney.com