Search This Blog

Thursday, February 14, 2013

Merge statement in oracle 9i


Merge statement in oracle 9i

This article introduces the new MERGE SQL command (sometimes referred to as "UPSERT"). MERGE is a DML command that enables us to optionally update or insert data into a target table, depending on whether matching records already exist. In versions prior to 9i, we would have to code this scenario either in separate bulk SQL statements or in PL/SQL. We will compare MERGE to these methods later in this article.

getting started: sample data

Before we begin, we will create two scratch tables to work with; a source table and a target table. We will start by creating the source table as a selection of ALL_OBJECTS data.
SQL> CREATE TABLE source_table
  2  NOLOGGING
  3  AS
  4     SELECT ROWNUM AS object_id
  5     ,      object_name
  6     ,      object_type
  7     FROM   all_objects;

Table created.
To add some volume and some data with different characteristics, we will add to our source table as follows.
SQL> INSERT /*+ APPEND */ INTO source_table
  2  SELECT ROWNUM + (COUNT(*) OVER () * 5) AS object_id
  3  ,      LOWER(object_name)              AS object_name
  4  ,      SUBSTR(object_type,1,1)         AS object_type
  5  FROM   all_objects;

28963 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT /*+ APPEND */ INTO source_table
  2  SELECT ROWNUM + (COUNT(*) OVER() * 10) AS object_id
  3  ,      INITCAP(object_name)            AS object_name
  4  ,      SUBSTR(object_type,-1)          AS object_type
  5  FROM   all_objects;

28963 rows created.

SQL> COMMIT;

Commit complete.
Finally, we will add a primary key as follows.
SQL> ALTER TABLE source_table ADD PRIMARY KEY (object_id);

Table altered.
Our target table will be a 50% sample of the source table. Like the source table, it will also have a primary key on OBJECT_ID.
SQL> CREATE TABLE target_table
  2  NOLOGGING
  3  AS
  4     SELECT *
  5     FROM   source_table SAMPLE (50);

Table created.

SQL> ALTER TABLE target_table ADD PRIMARY KEY (object_id);

Table altered.

the merge statement

We can now see an example of the MERGE statement. In the following example, we will merge the source table into the target table. We will capture a count of the target table rows before and after the merge.
SQL> SELECT COUNT(*) FROM target_table;

  COUNT(*)
----------
     43485

1 row selected.

SQL> MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_id = tgt.object_id )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET   tgt.object_name = src.object_name
  9     ,     tgt.object_type = src.object_type
 10  WHEN NOT MATCHED
 11  THEN
 12     INSERT ( tgt.object_id
 13            , tgt.object_name
 14            , tgt.object_type )
 15     VALUES ( src.object_id
 16            , src.object_name
 17            , src.object_type );

86889 rows merged.
The syntax at first looks a little daunting, but if we read through from top to bottom, it is quite intuitive. Note the following clauses:
  • MERGE (line 1): as stated previously, this is now the 4th DML statement in Oracle. Any hints we might wish to add directly follow this keyword (i.e. MERGE /*+ HINT */);
  • INTO (line 2): this is how we specify the target for the MERGE. The target must be either a table or an updateable view (an in-line view cannot be used here);
  • USING (line 3): the USING clause represents the source dataset for the MERGE. This can be a single table (as in our example) or an in-line view;
  • ON () (line 4): the ON clause is where we supply the join between the source dataset and target table. Note that the join conditions must be in parentheses;
  • WHEN MATCHED (line 5): this clause is where we instruct Oracle on what to do when we already have a matching record in the target table (i.e. there is a join between the source and target datasets). We obviously want an UPDATE in this case. One of the restrictions of this clause is that we cannot update any of the columns used in the ON clause (though of course we don't need to as they already match). Any attempt to include a join column will raise an unintuitive invalid identifier exception; and
  • WHEN NOT MATCHED (line 10): this clause is where we INSERT records for which there is no current match.
Note that sqlplus reports the number of rows merged. This includes both the updates and inserts. Oracle treats MERGE as a MERGE and not an UPDATE+INSERT statement. The same is true of SQL%ROWCOUNT in PL/SQL.
As a rough sanity check, we can report the record count in the target table following the MERGE. We can see that this is the same as the MERGE count.
SQL> SELECT COUNT(*) FROM target_table;

  COUNT(*)
----------
     86889

1 row selected.

merge performance considerations

MERGE is useful for combining larger source and target datasets, particularly for slowly-changing dimensions in data warehouses. If we explain a MERGE statement, we can see the mechanism Oracle uses to determine whether source and target rows match. The following output is an Autotrace explain plan for our original MERGE statement from above.

Execution Plan
----------------------------------------------------------
   0      MERGE STATEMENT Optimizer=CHOOSE (Cost=194 Card=86889 Bytes=9557790)
   1    0   MERGE OF 'TARGET_TABLE'
   2    1     VIEW
   3    2       HASH JOIN (OUTER) (Cost=194 Card=86889 Bytes=7038009)
   4    3         TABLE ACCESS (FULL) OF 'SOURCE_TABLE' (Cost=46 Card=86889 Bytes=2867337)
   5    3         TABLE ACCESS (FULL) OF 'TARGET_TABLE' (Cost=24 Card=18950 Bytes=909600)
We can see that Oracle performs an outer join between the source dataset and target table (in our case we happen to have a hash join). This creates a non-mergeable view (this is an unfortunate coincidence in terminology) that is applied back to the target table. Without the outer join, Oracle would need to implement a "two-pass" solution such as we might code ourselves with a separate INSERT and UPDATE statement.
Major performance gains will be achieved by tuning the source-target join (for example, using indexes, hints, partitioning etc) or by tuning the USING clause if it is a complex in-line view. In addition, we can achieve some minor gains by ordering the WHEN clauses in a MERGE according to which event is the most likely (i.e. most frequent first), although at our sample data volume, the effects of this are not generally noticeable.

merge vs bulk insert + update

We can compare the overall performance of MERGE against a pre-9i SQL solution. In the following example, we will use a variation of Tom Kyte's RUNSTATS utility to compare our original MERGE with a solution that runs a bulk update followed by a bulk insert. We will pause the statistics between the two runs to reset the data. We will begin with the MERGE.
SQL> exec runstats_pkg.rs_start();

PL/SQL procedure successfully completed.

SQL> MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_id = tgt.object_id )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET   tgt.object_name = src.object_name
  9     ,     tgt.object_type = src.object_type
 10  WHEN NOT MATCHED
 11  THEN
 12     INSERT ( tgt.object_id
 13            , tgt.object_name
 14            , tgt.object_type )
 15     VALUES ( src.object_id
 16            , src.object_name
 17            , src.object_type );

86889 rows merged.
Now we will pause runstats, rollback the merged data to reset it, and resume our resource snapshots.
SQL> exec runstats_pkg.rs_pause();

PL/SQL procedure successfully completed.

SQL> ROLLBACK;

Rollback complete.

SQL> exec runstats_pkg.rs_resume();

PL/SQL procedure successfully completed.
We will now run a bulk update and insert as separate SQL statements. The update is written as an updateable in-line view which is often the fastest technique for bulk updating one table from another. The sqlplus feedback gives us the breakdown of the previous merge rowcount.
SQL> UPDATE ( SELECT src.object_name AS src_name
  2           ,      src.object_type AS src_type
  3           ,      tgt.object_name AS tgt_name
  4           ,      tgt.object_type AS tgt_type
  5           FROM   source_table src
  6           ,      target_table tgt
  7           WHERE  src.object_id = tgt.object_id )
  8  SET tgt_name = src_name
  9  ,   tgt_type = src_type;

43485 rows updated.

SQL> INSERT INTO target_table tgt
  2  SELECT *
  3  FROM   source_table src
  4  WHERE  NOT EXISTS ( SELECT NULL
  5                      FROM   target_table tgt
  6                      WHERE  src.object_id = tgt.object_id );

43404 rows created.
Finally we can report the differences in time and resources with runstats. We will only report major differences to keep the output to a minimum, as follows.
SQL> exec runstats_pkg.rs_stop(1000);
Run1 ran in 1239 hsecs
Run2 ran in 522 hsecs
Run1 ran in 237.36% of the time


Name                                  Run1        Run2        Diff
STAT..hot buffers moved to hea           2       1,024       1,022
LATCH.simulator hash latch           5,423       7,820       2,397
LATCH.checkpoint queue latch        12,435       8,966      -3,469
STAT..session pga memory max         7,904           0      -7,904
STAT..redo entries                  87,326      44,564     -42,762
LATCH.redo allocation               87,793      44,871     -42,922
LATCH.cache buffers chains         580,501     499,486     -81,015
STAT..db block changes             175,708      89,825     -85,883
STAT..db block gets                132,934      46,622     -86,312
STAT..index fetch by key                 0      86,889      86,889
STAT..table scan rows gotten       130,374     217,263      86,889
STAT..consistent gets - examin         230      87,120      86,890
STAT..buffer is not pinned cou         978      88,317      87,339
STAT..no work - consistent rea         969      88,312      87,343
STAT..session logical reads        134,389     222,507      88,118
STAT..consistent gets                1,455     175,885     174,430
STAT..redo size                 23,369,732  15,606,560  -7,763,172


Run1 latches total versus run2 -- difference and pct
Run1        Run2        Diff        Pct
693,976     568,252    -125,724    122.12%

PL/SQL procedure successfully completed.
We can see that MERGE performed less well than our two-part SQL solution; with it taking over twice as long. It generated more redo and used more latches. We can repeat the test against a typical PL/SQL-coded merge (common in older applications). We will replace the two-part SQL solution with a PL/SQL loop that will attempt an update first and insert only if the update affects no rows. The alternative to this would be to insert first and only update when a DUP_VAL_ON_INDEX exception was raised (a primary or unique key is required for this to work).
SQL> exec runstats_pkg.rs_start();

PL/SQL procedure successfully completed.

SQL> MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_id = tgt.object_id )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET   tgt.object_name = src.object_name
  9     ,     tgt.object_type = src.object_type
 10  WHEN NOT MATCHED
 11  THEN
 12     INSERT ( tgt.object_id
 13            , tgt.object_name
 14            , tgt.object_type )
 15     VALUES ( src.object_id
 16            , src.object_name
 17            , src.object_type );

86889 rows merged.

SQL> exec runstats_pkg.rs_pause();

PL/SQL procedure successfully completed.

SQL> ROLLBACK;

Rollback complete.

SQL> exec runstats_pkg.rs_resume();

PL/SQL procedure successfully completed.

SQL> DECLARE
  2     i PLS_INTEGER := 0;
  3     u PLS_INTEGER := 0;
  4  BEGIN
  5     FOR r IN ( SELECT * FROM source_table )
  6     LOOP
  7
  8        UPDATE target_table tgt
  9        SET    tgt.object_name = r.object_name
 10        ,      tgt.object_type = r.object_type
 11        WHERE  tgt.object_id = r.object_id;
 12
 13        u := u + SQL%ROWCOUNT;
 14
 15        IF SQL%ROWCOUNT = 0 THEN
 16           INSERT INTO target_table
 17              ( object_id, object_name, object_type )
 18           VALUES
 19              ( r.object_id, r.object_name, r.object_type );
 20           i := i + 1;
 21        END IF;
 22
 23     END LOOP;
 24
 25     DBMS_OUTPUT.PUT_LINE( u || ' rows updated.' );
 26     DBMS_OUTPUT.PUT_LINE( i || ' rows inserted.' );
 27  END;
 28  /
43485 rows updated.
43404 rows inserted.

PL/SQL procedure successfully completed.
We can now report the differences between the MERGE and the PL/SQL solution.
SQL> exec runstats_pkg.rs_stop(1000);
Run1 ran in 1197 hsecs
Run2 ran in 2617 hsecs
Run1 ran in 45.74% of the time


Name                                  Run1        Run2        Diff
STAT..CPU used by this session       1,087       2,283       1,196
STAT..CPU used when call start       1,087       2,283       1,196
STAT..hot buffers moved to hea          13       1,299       1,286
LATCH.cache buffers lru chain        2,742       4,825       2,083
STAT..recursive cpu usage                2       2,137       2,135
LATCH.checkpoint queue latch         8,576      15,028       6,452
LATCH.simulator hash latch           5,406      18,595      13,189
STAT..shared hash latch upgrad           8      43,092      43,084
STAT..redo entries                  87,340     130,585      43,245
LATCH.redo allocation               87,790     131,215      43,425
STAT..table scan rows gotten       130,374      86,889     -43,485
STAT..db block gets                132,942     177,148      44,206
STAT..Cached Commit SCN refere         286      56,854      56,568
STAT..table scan blocks gotten         951      86,889      85,938
STAT..buffer is not pinned cou         975      86,924      85,949
STAT..db block changes             175,740     262,562      86,822
STAT..index fetch by key                 0      86,891      86,891
STAT..no work - consistent rea         967     129,991     129,024
STAT..execute count                     13     130,338     130,325
STAT..calls to get snapshot sc          28     130,355     130,327
STAT..consistent gets - examin         230     130,962     130,732
STAT..recursive calls                  567     218,074     217,507
STAT..consistent gets                1,469     260,988     259,519
LATCH.library cache pin                119     261,164     261,045
LATCH.library cache                    166     261,675     261,509
STAT..session logical reads        134,411     438,136     303,725
LATCH.cache buffers chains         579,034   1,190,456     611,422
STAT..redo size                 23,372,796  32,740,676   9,367,880


Run1 latches total versus run2 -- difference and pct
Run1        Run2        Diff        Pct
687,702   1,889,744   1,202,042     36.39%

PL/SQL procedure successfully completed.
It is probably no surprise that the MERGE was significantly faster than the PL/SQL solution. We could speed up the latter by using bulk processing, but we wouldn't be able to achieve a reduction of two-thirds required to match the MERGE.

merge and key preservation

MERGE is a deterministic, key-preserved operation. This means that for each source row, Oracle needs to be able to identify a single target record for update. The simplest method of ensuring that the MERGE is key-preserved is to join source and target according to the primary key of the target. We can demonstrate what happens if we cannot ensure key-preservation by modifying our MERGE to join on a column other than that of the primary key.
SQL> MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_name = tgt.object_name )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET    tgt.object_type = src.object_type
  9  WHEN NOT MATCHED
 10  THEN
 11     INSERT ( tgt.object_id
 12            , tgt.object_name
 13            , tgt.object_type )
 14     VALUES ( src.object_id
 15            , src.object_name
 16            , src.object_type );
   USING source_table src
         *
ERROR at line 3:
ORA-30926: unable to get a stable set of rows in the source tables
This error message is slightly confusing. It has been raised because Oracle found more than one target row that matched a single source row. MERGE will allow multiple updates of a single target row, however, as long as the join is key-preserved. In the following example, we'll remove the primary key from the source table and duplicate some source data. We will revert to the key-preserved join and this time our MERGE should be successful.
SQL> ALTER TABLE source_table DROP PRIMARY KEY;

Table altered.

SQL> INSERT INTO source_table
  2  SELECT *
  3  FROM   target_table
  4  WHERE  ROWNUM <= 10000;

10000 rows created.

SQL> MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_id = tgt.object_id )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET    tgt.object_type = src.object_type
  9     ,      tgt.object_name = src.object_name
 10  WHEN NOT MATCHED
 11  THEN
 12     INSERT ( tgt.object_id
 13            , tgt.object_name
 14            , tgt.object_type )
 15     VALUES ( src.object_id
 16            , src.object_name
 17            , src.object_type );

96889 rows merged.
This simply means that a single target row might be updated multiple times, which could give rise to some data quality issues (for example, if compound arithmetic such as summing is included in the update of some columns).

merge in pl/sql

MERGE can be used with in PL/SQL (even row-by-row) but should generally be considered a set operation. The following example shows how we might use MERGE with FORALL.
SQL> DECLARE
  2
  3     TYPE aat_id IS TABLE OF source_table.object_id%TYPE
  4        INDEX BY PLS_INTEGER;
  5     TYPE aat_name IS TABLE OF source_table.object_name%TYPE
  6        INDEX BY PLS_INTEGER;
  7     TYPE aat_type IS TABLE OF source_table.object_type%TYPE
  8        INDEX BY PLS_INTEGER;
  9
 10     aa_ids   aat_id;
 11     aa_names aat_name;
 12     aa_types aat_type;
 13
 14     CURSOR c_source IS
 15        SELECT object_id
 16        ,      object_name
 17        ,      object_type
 18        FROM   source_table;
 19
 20     n PLS_INTEGER := 0;
 21
 22  BEGIN
 23
 24     OPEN c_source;
 25     LOOP
 26        FETCH c_source BULK COLLECT INTO aa_ids, aa_names, aa_types LIMIT 1000;
 27
 28        --[some processing]--
 29
 30        FORALL i IN 1 .. aa_ids.COUNT
 31           MERGE
 32              INTO  target_table tgt
 33              USING (
 34                     SELECT aa_ids(i) AS object_id
 35                     ,      aa_names(i) AS object_name
 36                     ,      aa_types(i) AS object_type
 37                     FROM   dual
 38                    ) src
 39              ON   ( src.object_id = tgt.object_id )
 40           WHEN MATCHED
 41           THEN
 42              UPDATE
 43              SET   tgt.object_name = src.object_name
 44              ,     tgt.object_type = src.object_type
 45           WHEN NOT MATCHED
 46           THEN
 47              INSERT ( tgt.object_id
 48                     , tgt.object_name
 49                     , tgt.object_type )
 50              VALUES ( src.object_id
 51                     , src.object_name
 52                     , src.object_type );
 53
 54        n := n + SQL%ROWCOUNT;
 55
 56        EXIT WHEN c_source%NOTFOUND;
 57     END LOOP;
 58     CLOSE c_source;
 59
 60     DBMS_OUTPUT.PUT_LINE(n || ' rows merged.');
 61  END;
 62  /
86889 rows merged.

PL/SQL procedure successfully completed.
We can see that this is quite cumbersome. To use MERGE with either row-by-row PL/SQL or FORALL (as we saw above), each source record must be selected from DUAL to generate a rowsource. We cannot use the variables in direct assignments within the update or insert sections of the MERGE itself. This will generally perform better than row-by-row "insert-else-update" or "update-else-insert", but it is not generally recommended as an approach for reasonable data volumes.

merge and triggers

The MERGE statement will cause before and after event triggers to fire. Ironically, we have stated more than once that MERGE is a DML statement in its own right, yet Oracle will fire UPDATE and INSERT triggers if these events occur within the merge. We can demonstrate this quite easily below. We will create separate insert and update triggers and simply output a message from each.
SQL> CREATE OR REPLACE TRIGGER insert_trigger
  2     AFTER INSERT
  3     ON    target_table
  4  BEGIN
  5     DBMS_OUTPUT.PUT_LINE('Insert trigger...');
  6  END;
  7  /

Trigger created.

SQL> CREATE OR REPLACE TRIGGER update_trigger
  2     AFTER UPDATE
  3     ON    target_table
  4  BEGIN
  5     DBMS_OUTPUT.PUT_LINE('Update trigger...');
  6  END;
  7  /

Trigger created.

SQL> MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_id = tgt.object_id )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET   tgt.object_name = src.object_name
  9     ,     tgt.object_type = src.object_type
 10  WHEN NOT MATCHED
 11  THEN
 12     INSERT ( tgt.object_id
 13            , tgt.object_name
 14            , tgt.object_type )
 15     VALUES ( src.object_id
 16            , src.object_name
 17            , src.object_type );
Update trigger...
Insert trigger...

86889 rows merged.
As an aside, we can compile a MERGE trigger, but it doesn't seem to do anything (and the documentation doesn't contain any information on this either).
SQL> DROP TRIGGER insert_trigger;

Trigger dropped.

SQL> DROP TRIGGER update_trigger;

Trigger dropped.

SQL> CREATE OR REPLACE TRIGGER merge_trigger
  2     AFTER MERGE
  3     ON    target_table
  4  BEGIN
  5     DBMS_OUTPUT.PUT_LINE('Merge trigger...');
  6  END;
  7  /

Trigger created.

SQL> MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_id = tgt.object_id )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET   tgt.object_name = src.object_name
  9     ,     tgt.object_type = src.object_type
 10  WHEN NOT MATCHED
 11  THEN
 12     INSERT ( tgt.object_id
 13            , tgt.object_name
 14            , tgt.object_type )
 15     VALUES ( src.object_id
 16            , src.object_name
 17            , src.object_type );

86889 rows merged.

No comments:

Post a Comment