home technical personal links weddings
spacer general tech mapinfo/gis oracle/database misc/useless

OLAP/OLTP Information

Data Mart
While traditional legacy data warehouses attempt to be all things to all people, data marts distribute it to serve the specific needs of individual workgroups or departments - they focus on one subject area or on one group of users - therefore thy should be easier to navigate and understand than a data warehouse. Can have many data marts, but only one enterprise data warehouse in an organization. They also don't contain operational data store information like in a data warehouse.

Data Mining
Identifies and characterizes interrelationships among multivariable dimensions without requiring a human to ask specific questions. Used for large applications that grow rapidly and have high computational requirements. It will go through vast amounts of data and look for hidden trends and correlations. Typical question would be "What are the characteristics of customers who switch long-distance carriers?" The response might list factors not anticipated by the user, such as spending more than $75 per month on long-distance service, subscribing to a calling plan and living in a large city. It enables knowledge discovery rather than knowledge verification like with ad hoc queries and OLAP. Typical data mining tasks are to learn about associations/affinity, sequences/temporal patterns, clustering, classification, and forecasting/prediction. Data mining algorithms include association rule induction, descision trees, K-nearest neighbor, neural network, and genetic algorithms.

Data Warehouse
Place to store organization's massive information infrastructure. When an organization consolidates the mountains of data they have an then use OLAP tools to help understand their customers, markets, and internal processes better. They usually contain hundreds of gigabytes of data - the data is not static and is usually updated nightly. It doesn't need as much normalization like for an OLTP system b/c speed of queries, not space, is the key.

DSS
Decision Support System

OLAP
On-line analytical processing - ability to analyze data across multiple dimensions in order to support critical decsion making. Data is in a multidimensional (array-oriented) model like used in Oracle Express Server. Often used against a data warehouse. Answers question like 'How much have we spent on health benefits, by month, in division X, in each state, compared to plan?' - 5-dimensional question. Provide functions for data synthesis, analysis, and consolidation - try to discover trends and anomalies via it. Ask 'What if' questions in addition to 'What' questions. Use Oracle/Discover 2000's GUI to access it.

OLTP
On-line transaction processing - ability to capture, manipulate, and view transactions for detailed processing of specific events - used against typical RDBMS. Answers question like 'How much have we spent?'

ROLAP
Relational OLAP - represents multi-dimensional data using an RDBMS as its basis - done using a star schema.

VLDB
Very Large Databases - Gartner defines an OLTP VLDB as anything bigger than 200 gigabytes and a VLDB data warehouse as anything bigger than 500 gigabytes.

<-- from oracle developer programme news dec/jan '97 p. 8 -->

Top 10 Really Slick Data Warehouse Tricks

  1. Use star schema hierarchies to facilitate drill down/roll-up queries.
  2. Reduce the number of joins needed by using Denormalization (judiciously!).
  3. Consider data integrity issues.
    No online updates so may not want to bother with referntial integrity constraints - use UNRECOVERABLE clause in load and index rebuilds (bypasses the logs) - use new export option DIRECT=TRUE (bypasses the buffer cache so twice as fast) - consider caching small tables by using the CACHE option in the CREATE TABLE statement
  4. Avoid large row size.
    The longer the row size the longer table scans will take - use shorter character stinr and then use the DECODE function to translate them
  5. Use table partitioning and parallel query whenever possible.
    Logical partitioning involves splitting a table into horizontal partitions creating separate tables (use Partitioned Views in Oracle 7.3) - Physical partitioning is where you split one table across multiple disks for performance (like RAID-0 disk striping)
  6. Optimize loads and scrubbing operations
    use Oracle tools tkprof and EXPLAIN on scrub programs
  7. Get creative about aggregations
  8. Use 7.3 degreees of parallelization
    set degree of parallelism at twice the number of CPUs in the CREATE TABLE statement PARALLEL(DEGREE X) clause - use optimizer hints to specify the degree of parallelization for consistency with the /*+PARALLEL(tablename X) */* syntax
  9. Index ideas
    Oracle doesn't parallize index scans so to force a table scan use SELECT /*+PARALLEL(table, X) FULL(table)*/* - will get a large performance gain if all the columns speicifed in both the SELCT and WHERE clauses are covered in an index (if only need to add one more do it) - compute statistics, don';t estimate them - don't have to many indexes because they take up space and slow loads down - at load time use ALTER INDEX REBUILD which creates an index from an existing one (considerably faster) and use the UNRECOVERABLE option
  10. Use tools whenever possible
<-- from oracle developer programme news dec/jan '97 p. 8 -->

Top 10 Data Warehouse Tricks

  1. Avoiding joins is a good thing.
  2. Very large row size is a bad thing.
  3. Normalize your data, then Denormalize, then Overnormalize, then normalize again.
  4. If a certain report is required frequently, put all of the information required into one table.
  5. Each entity must have a primary key. Use shorter primary keys for faster joins.
  6. Oracle will scan only the tables required by the query. Queries will run faster on a smaller table.
  7. Use dimensions to partition tables, departments, product lines, geographic regions, etc.
  8. Don't have too many indices! Try hashing.
  9. Bit-mapped indexes are good for columns with a small domain, such as "male", "female" or binary-type values.
  10. Code generators and other tools can make life easier.

General Application Differences

Charactersitics RDBMS OLTP RDBMS DSS/Warehouse MDBMS OLAP
Typical operation Update Report Analyze
Level of analytical requirements Low Medium High
Screens Unchanging User-defined User-defined
Amount of data per transaction Small Small to large Large
Data level Detail Detail and summary Mostly summary
Age of data Current Historical and current Historical, current, and projected
Orientation Records Records Arrays


for questions/comments: kgmahoney@yahoo.com   © 2001-2017 kmahoney.com