home technical personal links weddings
spacer general tech mapinfo/gis oracle/database misc/useless

Information on Joins

Join
A query that retrieves data from multiple database tables, specifying how rows are to be related using a join condition. Occasionally one table is joined to itself. There are 3 basic join operations in Oracle Server:

NESTED LOOPS
It's a row operation which returns each row to the next operation as processed rather than waiting for the entire set to be processed. It's the most common way Oracle does a join and in most cases it indicates that an index is available for use during the join so it's a very effective join for multi-user online applications. The Cost-based optimizer selects one of the tables to be the driving table for the join which will be the first table accessed in the join (it's expected to the return the fewest rows). Rule-based optimization selects the last table in the FROM clause as the driving table.

The steps of the operation are:
1. FULL TABLE SCAN of the driving table
2. INDEX RANGE SCAN of the driven table
3. If a match is found then a TABLE ACCESS BY ROWID of the driven table

Hints to use with NESTED LOOPS:
ORDERED - join the tables based on their order in the FROM clause
INDEX - list specific indexes to use
FULL - list a specific table for a FULL TABLE SCAN so this table will be the driving table
USE_NL - list tables to join via NESTED LOOPS joins

MERGE JOIN
It's a set operation that has to process all the rows before anything is returned to the next operation. You usually user merge joins when indexes are either unavailable or disabled by the query's syntax. It's not appropriate for muti-user online operations because it's slow to return the first row of a query, and the result set won't stay in the SGA very long. In batch operations or large reports MERGE JOIN may result in the best possible throughput for the query. Tune by concentrating on the first two steps below - full-table scans can be improved via I/O tuning and improved use of Oracle's multiblock read capabilities or by using the Parallel Query Option. Sort performance is improved by increasing the size of the SORT_AREA_SIZE parameter in INIT.ORA (as much of the data as possible will be sorted in the sort area - if can't hold it all Oracle allocates a temporary segment for the duration of the query which will cost equivalently 10 insert/update/delete statements), upgrading to Oracle 7.2 or 7.3, and dedicating a tablespace to temporary segment activity.

The steps of the operation are:
1. TABLE ACCESS FULL to scan the rows of each table in the join
2. SORT JOIN to sort the results of the data-access operations
3. MERGE JOIN to merge the results of the SORT JOIN operation

SORT-MERGE JOINS
The sort-merge join--also called a merge-scan--is a very popular join performed when accessing a large number of rows. It is also seen in background tasks, batch processing, and decision-support systems.

Take an example of two tables being joined and returning a large number of rows (say, thousands), as follows:

Select * From Table1 T1, Table2 T2 Where T1.Table1_Id = T2.Table1_id;

The merge-scan join will be chosen because the database has detected that a large number of rows need to be processed, and it may also notice that index access to the rows is not efficient, since the data is not clustered (ordered) efficiently for this join. Here are the steps to perform this type of join:

  1. Pick an inner and outer table.
  2. Access the inner table; choose the rows that match the predicates in the Where clause of the SQL statement.
  3. Sort the rows retrieved from the inner table by the joining columns and store these as a temporary table (this step may not be performed if data is ordered by the keys, and efficient index access can be performed).
  4. The outer table may also need to be sorted by the joining columns so that both tables to be joined are sorted in the same manner (this step is also optional and depends on whether the outer table is already well ordered by the keys and whether efficient index access can be ).
  5. Read both outer and inner tables (these may be the sorted temporary tables created in previous steps), choosing rows that match the join criteria. This operation is very quick, since both tables are sorted in the same manner and Database Prefetch can be used.
  6. Optionally sort the data one more time if a Sort was performed (for example, an Order By clause), using columns that are not the same as were used to perform the join.
The merge join can be deceivingly fast because of database prefetch capabilities and the fact that each table is accessed only one time each.

HASH JOIN
It's a combination of row and set operations. It's similar to NESTED LOOPS in that it loops through the rows coming from each step in the plan, but instead of going to the index and to the table to retrieve the row, it builds an internal cache structure that keeps the rows that were retrieved. It tries to keep the cache totally in memory, otherwise it allocates secondary cache on-disk. The memory size allocated is defined with the HASH_AREA_SIZE parameter in the INIT.ORA file. HASH JOINS only work 7.3 with the cost-based optimizer and if the HASH_JOIN_ENABLED parameter in the INIT.ORA file is set to TRUE. It's useful when joining large result sets and also when joining a large result set to a small result set. HASH JOINs are almost always faster than MERGE JOINS. You can alter the HASH_AREA_SIZE for a specific query by doing the following command: ALTER SESSION SET hash_area_size = 65000

The hash join is being implemented in most major DBMSs and is a very efficient join. With the hash join, one table is chosen as the outer table. This is the larger of the two tables in the join--and the other is chosen as the inner table. Both tables are broken into sections, and the inner table is "hashed." This hashing provides an algorithmic pointer that makes data access very efficient. The database attempts to keep the inner table in memory since it will be "scanned" many times.

The outer rows that match the query predicates are then selected, and for each outer table row chosen, hashing is performed on the key, and the hash value is used to quickly find the matching row in the inner table.

This join often can outperform a sort-merge join, particularly when one table is much larger than another. No sorting is performed, and memory usage is efficient for this join.

Example:
Select /*+ USE_HASH(COMPANY) */ COMPANY.Name, SUM(Dollar_Amount) From COMPANY, SALES Where COMPANY.Company_ID = SALES.Company_ID Group by COMPANY.Name;

REFORMATTING JOINS
This join technique is used by Sybase Adaptive Server and many versions of Microsoft SQL Server. This is a variation on the sort-merge join, and it has similar pros and cons.

Reformatting is chosen in cases where a suitable index cannot be found on an inner table, and where the inner table is large enough that repeated table scans are not a viable option. The reformatting option creates an internal table from the inner table and a temporary clustered index on it. This clustered index is created on the join column, and the join can then be performed between the outer table and the new temporary clustered index.



for questions/comments: kgmahoney@yahoo.com   © 2001-2017 kmahoney.com