Merge statement in oracle 9i
This article introduces the new MERGE SQL command (sometimes referred to as "UPSERT"). MERGE is a DML command that enables us to optionally update or insert data into a target table, depending on whether matching records already exist. In versions prior to 9i, we would have to code this scenario either in separate bulk SQL statements or in PL/SQL. We will compare MERGE to these methods later in this article.getting started: sample data
Before we begin, we will create two scratch tables to work with; a source table and a target table. We will start by creating the source table as a selection of ALL_OBJECTS data.SQL> CREATE TABLE source_table 2 NOLOGGING 3 AS 4 SELECT ROWNUM AS object_id 5 , object_name 6 , object_type 7 FROM all_objects;
Table created.To add some volume and some data with different characteristics, we will add to our source table as follows.
SQL> INSERT /*+ APPEND */ INTO source_table 2 SELECT ROWNUM + (COUNT(*) OVER () * 5) AS object_id 3 , LOWER(object_name) AS object_name 4 , SUBSTR(object_type,1,1) AS object_type 5 FROM all_objects;
28963 rows created.
SQL> COMMIT;
Commit complete.
SQL> INSERT /*+ APPEND */ INTO source_table 2 SELECT ROWNUM + (COUNT(*) OVER() * 10) AS object_id 3 , INITCAP(object_name) AS object_name 4 , SUBSTR(object_type,-1) AS object_type 5 FROM all_objects;
28963 rows created.
SQL> COMMIT;
Commit complete.Finally, we will add a primary key as follows.
SQL> ALTER TABLE source_table ADD PRIMARY KEY (object_id);
Table altered.Our target table will be a 50% sample of the source table. Like the source table, it will also have a primary key on OBJECT_ID.
SQL> CREATE TABLE target_table 2 NOLOGGING 3 AS 4 SELECT * 5 FROM source_table SAMPLE (50);
Table created.
SQL> ALTER TABLE target_table ADD PRIMARY KEY (object_id);
Table altered.
the merge statement
We can now see an example of the MERGE statement. In the following example, we will merge the source table into the target table. We will capture a count of the target table rows before and after the merge.SQL> SELECT COUNT(*) FROM target_table;
COUNT(*) ---------- 43485 1 row selected.
SQL> MERGE 2 INTO target_table tgt 3 USING source_table src 4 ON ( src.object_id = tgt.object_id ) 5 WHEN MATCHED 6 THEN 7 UPDATE 8 SET tgt.object_name = src.object_name 9 , tgt.object_type = src.object_type 10 WHEN NOT MATCHED 11 THEN 12 INSERT ( tgt.object_id 13 , tgt.object_name 14 , tgt.object_type ) 15 VALUES ( src.object_id 16 , src.object_name 17 , src.object_type );
86889 rows merged.The syntax at first looks a little daunting, but if we read through from top to bottom, it is quite intuitive. Note the following clauses:
- MERGE (line 1): as stated previously, this is now the 4th DML statement in Oracle. Any hints we might wish to add directly follow this keyword (i.e. MERGE /*+ HINT */);
- INTO (line 2): this is how we specify the target for the MERGE. The target must be either a table or an updateable view (an in-line view cannot be used here);
- USING (line 3): the USING clause represents the source dataset for the MERGE. This can be a single table (as in our example) or an in-line view;
- ON () (line 4): the ON clause is where we supply the join between the source dataset and target table. Note that the join conditions must be in parentheses;
- WHEN MATCHED (line 5): this clause is where we instruct Oracle on what to do when we already have a matching record in the target table (i.e. there is a join between the source and target datasets). We obviously want an UPDATE in this case. One of the restrictions of this clause is that we cannot update any of the columns used in the ON clause (though of course we don't need to as they already match). Any attempt to include a join column will raise an unintuitive invalid identifier exception; and
- WHEN NOT MATCHED (line 10): this clause is where we INSERT records for which there is no current match.
As a rough sanity check, we can report the record count in the target table following the MERGE. We can see that this is the same as the MERGE count.
SQL> SELECT COUNT(*) FROM target_table;
COUNT(*) ---------- 86889 1 row selected.
merge performance considerations
MERGE is useful for combining larger source and target datasets, particularly for slowly-changing dimensions in data warehouses. If we explain a MERGE statement, we can see the mechanism Oracle uses to determine whether source and target rows match. The following output is an Autotrace explain plan for our original MERGE statement from above.Execution Plan ---------------------------------------------------------- 0 MERGE STATEMENT Optimizer=CHOOSE (Cost=194 Card=86889 Bytes=9557790) 1 0 MERGE OF 'TARGET_TABLE' 2 1 VIEW 3 2 HASH JOIN (OUTER) (Cost=194 Card=86889 Bytes=7038009) 4 3 TABLE ACCESS (FULL) OF 'SOURCE_TABLE' (Cost=46 Card=86889 Bytes=2867337) 5 3 TABLE ACCESS (FULL) OF 'TARGET_TABLE' (Cost=24 Card=18950 Bytes=909600)We can see that Oracle performs an outer join between the source dataset and target table (in our case we happen to have a hash join). This creates a non-mergeable view (this is an unfortunate coincidence in terminology) that is applied back to the target table. Without the outer join, Oracle would need to implement a "two-pass" solution such as we might code ourselves with a separate INSERT and UPDATE statement.
Major performance gains will be achieved by tuning the source-target join (for example, using indexes, hints, partitioning etc) or by tuning the USING clause if it is a complex in-line view. In addition, we can achieve some minor gains by ordering the WHEN clauses in a MERGE according to which event is the most likely (i.e. most frequent first), although at our sample data volume, the effects of this are not generally noticeable.
merge vs bulk insert + update
We can compare the overall performance of MERGE against a pre-9i SQL solution. In the following example, we will use a variation of Tom Kyte's RUNSTATS utility to compare our original MERGE with a solution that runs a bulk update followed by a bulk insert. We will pause the statistics between the two runs to reset the data. We will begin with the MERGE.SQL> exec runstats_pkg.rs_start();
PL/SQL procedure successfully completed.
SQL> MERGE 2 INTO target_table tgt 3 USING source_table src 4 ON ( src.object_id = tgt.object_id ) 5 WHEN MATCHED 6 THEN 7 UPDATE 8 SET tgt.object_name = src.object_name 9 , tgt.object_type = src.object_type 10 WHEN NOT MATCHED 11 THEN 12 INSERT ( tgt.object_id 13 , tgt.object_name 14 , tgt.object_type ) 15 VALUES ( src.object_id 16 , src.object_name 17 , src.object_type );
86889 rows merged.Now we will pause runstats, rollback the merged data to reset it, and resume our resource snapshots.
SQL> exec runstats_pkg.rs_pause();
PL/SQL procedure successfully completed.
SQL> ROLLBACK;
Rollback complete.
SQL> exec runstats_pkg.rs_resume();
PL/SQL procedure successfully completed.We will now run a bulk update and insert as separate SQL statements. The update is written as an updateable in-line view which is often the fastest technique for bulk updating one table from another. The sqlplus feedback gives us the breakdown of the previous merge rowcount.
SQL> UPDATE ( SELECT src.object_name AS src_name 2 , src.object_type AS src_type 3 , tgt.object_name AS tgt_name 4 , tgt.object_type AS tgt_type 5 FROM source_table src 6 , target_table tgt 7 WHERE src.object_id = tgt.object_id ) 8 SET tgt_name = src_name 9 , tgt_type = src_type;
43485 rows updated.
SQL> INSERT INTO target_table tgt 2 SELECT * 3 FROM source_table src 4 WHERE NOT EXISTS ( SELECT NULL 5 FROM target_table tgt 6 WHERE src.object_id = tgt.object_id );
43404 rows created.Finally we can report the differences in time and resources with runstats. We will only report major differences to keep the output to a minimum, as follows.
SQL> exec runstats_pkg.rs_stop(1000);
Run1 ran in 1239 hsecs Run2 ran in 522 hsecs Run1 ran in 237.36% of the time Name Run1 Run2 Diff STAT..hot buffers moved to hea 2 1,024 1,022 LATCH.simulator hash latch 5,423 7,820 2,397 LATCH.checkpoint queue latch 12,435 8,966 -3,469 STAT..session pga memory max 7,904 0 -7,904 STAT..redo entries 87,326 44,564 -42,762 LATCH.redo allocation 87,793 44,871 -42,922 LATCH.cache buffers chains 580,501 499,486 -81,015 STAT..db block changes 175,708 89,825 -85,883 STAT..db block gets 132,934 46,622 -86,312 STAT..index fetch by key 0 86,889 86,889 STAT..table scan rows gotten 130,374 217,263 86,889 STAT..consistent gets - examin 230 87,120 86,890 STAT..buffer is not pinned cou 978 88,317 87,339 STAT..no work - consistent rea 969 88,312 87,343 STAT..session logical reads 134,389 222,507 88,118 STAT..consistent gets 1,455 175,885 174,430 STAT..redo size 23,369,732 15,606,560 -7,763,172 Run1 latches total versus run2 -- difference and pct Run1 Run2 Diff Pct 693,976 568,252 -125,724 122.12% PL/SQL procedure successfully completed.We can see that MERGE performed less well than our two-part SQL solution; with it taking over twice as long. It generated more redo and used more latches. We can repeat the test against a typical PL/SQL-coded merge (common in older applications). We will replace the two-part SQL solution with a PL/SQL loop that will attempt an update first and insert only if the update affects no rows. The alternative to this would be to insert first and only update when a DUP_VAL_ON_INDEX exception was raised (a primary or unique key is required for this to work).
SQL> exec runstats_pkg.rs_start();
PL/SQL procedure successfully completed.
SQL> MERGE 2 INTO target_table tgt 3 USING source_table src 4 ON ( src.object_id = tgt.object_id ) 5 WHEN MATCHED 6 THEN 7 UPDATE 8 SET tgt.object_name = src.object_name 9 , tgt.object_type = src.object_type 10 WHEN NOT MATCHED 11 THEN 12 INSERT ( tgt.object_id 13 , tgt.object_name 14 , tgt.object_type ) 15 VALUES ( src.object_id 16 , src.object_name 17 , src.object_type );
86889 rows merged.
SQL> exec runstats_pkg.rs_pause();
PL/SQL procedure successfully completed.
SQL> ROLLBACK;
Rollback complete.
SQL> exec runstats_pkg.rs_resume();
PL/SQL procedure successfully completed.
SQL> DECLARE 2 i PLS_INTEGER := 0; 3 u PLS_INTEGER := 0; 4 BEGIN 5 FOR r IN ( SELECT * FROM source_table ) 6 LOOP 7 8 UPDATE target_table tgt 9 SET tgt.object_name = r.object_name 10 , tgt.object_type = r.object_type 11 WHERE tgt.object_id = r.object_id; 12 13 u := u + SQL%ROWCOUNT; 14 15 IF SQL%ROWCOUNT = 0 THEN 16 INSERT INTO target_table 17 ( object_id, object_name, object_type ) 18 VALUES 19 ( r.object_id, r.object_name, r.object_type ); 20 i := i + 1; 21 END IF; 22 23 END LOOP; 24 25 DBMS_OUTPUT.PUT_LINE( u || ' rows updated.' ); 26 DBMS_OUTPUT.PUT_LINE( i || ' rows inserted.' ); 27 END; 28 /
43485 rows updated. 43404 rows inserted. PL/SQL procedure successfully completed.We can now report the differences between the MERGE and the PL/SQL solution.
SQL> exec runstats_pkg.rs_stop(1000);
Run1 ran in 1197 hsecs Run2 ran in 2617 hsecs Run1 ran in 45.74% of the time Name Run1 Run2 Diff STAT..CPU used by this session 1,087 2,283 1,196 STAT..CPU used when call start 1,087 2,283 1,196 STAT..hot buffers moved to hea 13 1,299 1,286 LATCH.cache buffers lru chain 2,742 4,825 2,083 STAT..recursive cpu usage 2 2,137 2,135 LATCH.checkpoint queue latch 8,576 15,028 6,452 LATCH.simulator hash latch 5,406 18,595 13,189 STAT..shared hash latch upgrad 8 43,092 43,084 STAT..redo entries 87,340 130,585 43,245 LATCH.redo allocation 87,790 131,215 43,425 STAT..table scan rows gotten 130,374 86,889 -43,485 STAT..db block gets 132,942 177,148 44,206 STAT..Cached Commit SCN refere 286 56,854 56,568 STAT..table scan blocks gotten 951 86,889 85,938 STAT..buffer is not pinned cou 975 86,924 85,949 STAT..db block changes 175,740 262,562 86,822 STAT..index fetch by key 0 86,891 86,891 STAT..no work - consistent rea 967 129,991 129,024 STAT..execute count 13 130,338 130,325 STAT..calls to get snapshot sc 28 130,355 130,327 STAT..consistent gets - examin 230 130,962 130,732 STAT..recursive calls 567 218,074 217,507 STAT..consistent gets 1,469 260,988 259,519 LATCH.library cache pin 119 261,164 261,045 LATCH.library cache 166 261,675 261,509 STAT..session logical reads 134,411 438,136 303,725 LATCH.cache buffers chains 579,034 1,190,456 611,422 STAT..redo size 23,372,796 32,740,676 9,367,880 Run1 latches total versus run2 -- difference and pct Run1 Run2 Diff Pct 687,702 1,889,744 1,202,042 36.39% PL/SQL procedure successfully completed.It is probably no surprise that the MERGE was significantly faster than the PL/SQL solution. We could speed up the latter by using bulk processing, but we wouldn't be able to achieve a reduction of two-thirds required to match the MERGE.
merge and key preservation
MERGE is a deterministic, key-preserved operation. This means that for each source row, Oracle needs to be able to identify a single target record for update. The simplest method of ensuring that the MERGE is key-preserved is to join source and target according to the primary key of the target. We can demonstrate what happens if we cannot ensure key-preservation by modifying our MERGE to join on a column other than that of the primary key.SQL> MERGE 2 INTO target_table tgt 3 USING source_table src 4 ON ( src.object_name = tgt.object_name ) 5 WHEN MATCHED 6 THEN 7 UPDATE 8 SET tgt.object_type = src.object_type 9 WHEN NOT MATCHED 10 THEN 11 INSERT ( tgt.object_id 12 , tgt.object_name 13 , tgt.object_type ) 14 VALUES ( src.object_id 15 , src.object_name 16 , src.object_type );
USING source_table src * ERROR at line 3: ORA-30926: unable to get a stable set of rows in the source tablesThis error message is slightly confusing. It has been raised because Oracle found more than one target row that matched a single source row. MERGE will allow multiple updates of a single target row, however, as long as the join is key-preserved. In the following example, we'll remove the primary key from the source table and duplicate some source data. We will revert to the key-preserved join and this time our MERGE should be successful.
SQL> ALTER TABLE source_table DROP PRIMARY KEY;
Table altered.
SQL> INSERT INTO source_table 2 SELECT * 3 FROM target_table 4 WHERE ROWNUM <= 10000;
10000 rows created.
SQL> MERGE 2 INTO target_table tgt 3 USING source_table src 4 ON ( src.object_id = tgt.object_id ) 5 WHEN MATCHED 6 THEN 7 UPDATE 8 SET tgt.object_type = src.object_type 9 , tgt.object_name = src.object_name 10 WHEN NOT MATCHED 11 THEN 12 INSERT ( tgt.object_id 13 , tgt.object_name 14 , tgt.object_type ) 15 VALUES ( src.object_id 16 , src.object_name 17 , src.object_type );
96889 rows merged.This simply means that a single target row might be updated multiple times, which could give rise to some data quality issues (for example, if compound arithmetic such as summing is included in the update of some columns).
merge in pl/sql
MERGE can be used with in PL/SQL (even row-by-row) but should generally be considered a set operation. The following example shows how we might use MERGE with FORALL.SQL> DECLARE 2 3 TYPE aat_id IS TABLE OF source_table.object_id%TYPE 4 INDEX BY PLS_INTEGER; 5 TYPE aat_name IS TABLE OF source_table.object_name%TYPE 6 INDEX BY PLS_INTEGER; 7 TYPE aat_type IS TABLE OF source_table.object_type%TYPE 8 INDEX BY PLS_INTEGER; 9 10 aa_ids aat_id; 11 aa_names aat_name; 12 aa_types aat_type; 13 14 CURSOR c_source IS 15 SELECT object_id 16 , object_name 17 , object_type 18 FROM source_table; 19 20 n PLS_INTEGER := 0; 21 22 BEGIN 23 24 OPEN c_source; 25 LOOP 26 FETCH c_source BULK COLLECT INTO aa_ids, aa_names, aa_types LIMIT 1000; 27 28 --[some processing]-- 29 30 FORALL i IN 1 .. aa_ids.COUNT 31 MERGE 32 INTO target_table tgt 33 USING ( 34 SELECT aa_ids(i) AS object_id 35 , aa_names(i) AS object_name 36 , aa_types(i) AS object_type 37 FROM dual 38 ) src 39 ON ( src.object_id = tgt.object_id ) 40 WHEN MATCHED 41 THEN 42 UPDATE 43 SET tgt.object_name = src.object_name 44 , tgt.object_type = src.object_type 45 WHEN NOT MATCHED 46 THEN 47 INSERT ( tgt.object_id 48 , tgt.object_name 49 , tgt.object_type ) 50 VALUES ( src.object_id 51 , src.object_name 52 , src.object_type ); 53 54 n := n + SQL%ROWCOUNT; 55 56 EXIT WHEN c_source%NOTFOUND; 57 END LOOP; 58 CLOSE c_source; 59 60 DBMS_OUTPUT.PUT_LINE(n || ' rows merged.'); 61 END; 62 /
86889 rows merged. PL/SQL procedure successfully completed.We can see that this is quite cumbersome. To use MERGE with either row-by-row PL/SQL or FORALL (as we saw above), each source record must be selected from DUAL to generate a rowsource. We cannot use the variables in direct assignments within the update or insert sections of the MERGE itself. This will generally perform better than row-by-row "insert-else-update" or "update-else-insert", but it is not generally recommended as an approach for reasonable data volumes.
merge and triggers
The MERGE statement will cause before and after event triggers to fire. Ironically, we have stated more than once that MERGE is a DML statement in its own right, yet Oracle will fire UPDATE and INSERT triggers if these events occur within the merge. We can demonstrate this quite easily below. We will create separate insert and update triggers and simply output a message from each.SQL> CREATE OR REPLACE TRIGGER insert_trigger 2 AFTER INSERT 3 ON target_table 4 BEGIN 5 DBMS_OUTPUT.PUT_LINE('Insert trigger...'); 6 END; 7 /
Trigger created.
SQL> CREATE OR REPLACE TRIGGER update_trigger 2 AFTER UPDATE 3 ON target_table 4 BEGIN 5 DBMS_OUTPUT.PUT_LINE('Update trigger...'); 6 END; 7 /
Trigger created.
SQL> MERGE 2 INTO target_table tgt 3 USING source_table src 4 ON ( src.object_id = tgt.object_id ) 5 WHEN MATCHED 6 THEN 7 UPDATE 8 SET tgt.object_name = src.object_name 9 , tgt.object_type = src.object_type 10 WHEN NOT MATCHED 11 THEN 12 INSERT ( tgt.object_id 13 , tgt.object_name 14 , tgt.object_type ) 15 VALUES ( src.object_id 16 , src.object_name 17 , src.object_type );
Update trigger... Insert trigger... 86889 rows merged.As an aside, we can compile a MERGE trigger, but it doesn't seem to do anything (and the documentation doesn't contain any information on this either).
SQL> DROP TRIGGER insert_trigger;
Trigger dropped.
SQL> DROP TRIGGER update_trigger;
Trigger dropped.
SQL> CREATE OR REPLACE TRIGGER merge_trigger 2 AFTER MERGE 3 ON target_table 4 BEGIN 5 DBMS_OUTPUT.PUT_LINE('Merge trigger...'); 6 END; 7 /
Trigger created.
SQL> MERGE 2 INTO target_table tgt 3 USING source_table src 4 ON ( src.object_id = tgt.object_id ) 5 WHEN MATCHED 6 THEN 7 UPDATE 8 SET tgt.object_name = src.object_name 9 , tgt.object_type = src.object_type 10 WHEN NOT MATCHED 11 THEN 12 INSERT ( tgt.object_id 13 , tgt.object_name 14 , tgt.object_type ) 15 VALUES ( src.object_id 16 , src.object_name 17 , src.object_type );
86889 rows merged.
No comments:
Post a Comment