Home > Database, Oracle > When should I use BULK COLLECT?

When should I use BULK COLLECT?

When should I use BULK COLLECT?

The primary use case for BULK COLLECT is any kind of loop that results in fetching data one row at a time, which Tom Kyte refers as “slow by slow” processing.

You might think, therefore, that a cursor FOR loop is an obvious opportunity for conversion to BULK COLLECT. If you are running Oracle Database 10g Release 2 or higher, however, that is not necessary – as long as you have turned on Oracle’s automatic PL/SQL optimization to at least level 2.

Oracle Database 10g introduced automatic optimization of PL/SQL code. The default optimization level is 2 – and you should just leave it that way. At this level, Oracle will take many, careful steps to transform your code so that it runs substantially faster.

One of the most interesting and non-trivial optimization is that of cursor loops. Consider this code:

BEGIN
FOR rec IN (  SELECT *
FROM employees
ORDER BY last_name DESC)
LOOP
DBMS_OUTPUT.put_line (l_employees (indx).last_name);
END LOOP;
END;
/

This block seems to fetch data from the employees table one row at a time, and then display the last name. If, however, optimization is set to level 2 or higher, then Oracle will automatically convert this block into C code that fetches data in bulk, just like with BULK COLLECT.

As a result, you do not need to explicitly convert your cursor FOR loops to use BULK COLLECT if:

  1. You are running 10.2 or higher.
  2. Your optimization level is to 2 or higher.
  3. The cursor FOR loop does not contain any DML statements (insert, update, delete).

In other words, you definitely should convert code like the following block either to a cursor FOR loop (to benefit from automatic optimization) or an explicit BULK COLLECT:

DECLARE
CURSOR employees_cur
IS
SELECT *
BULK COLLECT INTO l_employees
FROM employees
ORDER BY last_name DESC;

   l_employee   employees_cur%ROWTYPE;
BEGIN
OPEN employees_cur;

   LOOP
FETCH employees_cur INTO l_employee;

      EXIT WHEN employees_cur%NOTFOUND;
DBMS_OUTPUT.put_line (l_employee.last_name);
END LOOP;
END;
/

Here is the explicit BULK COLLECT version:

DECLARE
TYPE employees_aat IS TABLE OF employees%ROWTYPE;

   l_employees   employees_aat;
BEGIN
— All rows at once…
     SELECT *
BULK COLLECT INTO l_employees
FROM employees
ORDER BY last_name DESC;

   DBMS_OUTPUT.put_line (l_employees.COUNT);

   FOR indx IN 1 .. l_employees.COUNT
LOOP
DBMS_OUTPUT.put_line (l_employees (indx).last_name);
END LOOP;
END;
/

and here is the cursor FOR loop version (same as shown earlier):

BEGIN
FOR rec IN (  SELECT *
FROM employees
ORDER BY last_name DESC)
LOOP
DBMS_OUTPUT.put_line (l_employees (indx).last_name);
END LOOP;
END;
/

Clearly, the cursor FOR loop offers the simplest solution. Use it! You will find at the end of this QuickTip a script you can run to prove to yourself the impact of the optimization level on cursor FOR loop optimization.

I noted earlier that you will need to explicitly convert to BULK COLLECT if your loop (even a cursor FOR loop) contains one or more DML statements. The reason for this is simple: Oracle does not automatically optimize the performance of DML statements inside loops to use FORALL. So even if the cursor FOR loop is automatically “converted” to BULK COLLECT-like levels of performance, the DML statements will still run for each row individually.

Since DML overhead is much higher than querying, you will still likely need to convert those DML statements to use FORALL. And that means you will need to populate collections to “feed” into the FORALL. Which means you will need (or should) use BULK COLLECT to fill up those collections.

Here is an example of a procedure that should be converted explicitly to use both BULK COLLECT and FORALL:

CREATE OR REPLACE PROCEDURE upd_for_dept (
dept_in     IN   employees.department_id%TYPE
, newsal_in   IN   employees.salary%TYPE
)
IS
CURSOR emp_cur
IS
SELECT employee_id, salary, hire_date
FROM employees
WHERE department_id = dept_in;
BEGIN
FOR rec IN emp_cur
LOOP
BEGIN
INSERT INTO employee_history
(employee_id, salary, hire_date
)
VALUES (rec.employee_id, rec.salary, rec.hire_date
);

         rec.salary := newsal_in;

adjust_compensation (rec.employee_id, rec.salary);

         UPDATE employees
SET salary = rec.salary
WHERE employee_id = rec.employee_id;
EXCEPTION
WHEN OTHERS
THEN
log_error;
END;
END LOOP;
END upd_for_dept;

So be on the lookout for any kind of loop in your code that contains DML statements. They will offer the greatest “bang for the buck” on improving application performance with relatively little effort on your part.

BULK COLLECT Good to Knows

Keep the following factoids in mind when using BULK COLLECT:

  • Prior to Oracle9i Database, you could use BULK COLLECT only with static SQL. Now you can use BULK COLLECT with both dynamic and static SQL.
  • You can use BULK COLLECT keywords in any of the following clauses: SELECT INTO, FETCH INTO, and RETURNING INTO.
  • A collection populated by BULK COLLECT is always filled sequentially from index value 1 (or is empty).
  • If you are BULK COLLECTing into a varray or a nested table, you do not need to initialize or extend prior to executing the BULK COLLECT query. Oracle will take care of all of that for you.
  • SELECT…BULK COLLECT will not raise NO_DATA_FOUND if no rows are found. Instead, you must check the contents of the collection to see if there is any data inside it.
  • If the query returns no rows, the collection’s COUNT method will return 0.
  • As with any operation involving collections, use of BULK COLLECT will likely increase the amount of PGA (Process Global Area) memory used by an application session.
  • Use the LIMIT clause with BULK COLLECT to put a cap on the amount of PGA memory used by your query.
  • You can specify the limit (example in next section) as a literal, expression or parameter. I suggest you avoid hard-coding the value.
  • 100 is a good default or starting value for LIMIT. Setting the value higher, to say 500 or 1000, will probably not improve performance significantly, but willincrease PGA consumption. For very large volumes of data, such millions of rows of data, you should probably experiment with higher limit values to see what kind of impact you see.
  • You can fetch into a single collection of records or a series of collections (one for each expression returned in the SELECT list).

Here are some examples:

  1. Select all the rows from the employees table in a single “round trip” to the SQL engine:

DECLARE
TYPE employees_aat IS TABLE OF employees%ROWTYPE;

   l_employees   employees_aat;
BEGIN
— All rows at once…
     SELECT *
BULK COLLECT INTO l_employees
FROM employees
ORDER BY last_name DESC;

   DBMS_OUTPUT.put_line (l_employees.COUNT);

   FOR indx IN 1 .. l_employees.COUNT
LOOP
DBMS_OUTPUT.put_line (l_employees (indx).last_name);
END LOOP;
END;
/

  1. Now limit the fetching to 10 rows at a time.

DECLARE
CURSOR employees_cur
IS
SELECT *
FROM employees
ORDER BY last_name DESC;

   TYPE employees_aat IS TABLE OF employees%ROWTYPE;

   l_employees   employees_aat;
BEGIN
OPEN employees_cur;

   LOOP
FETCH employees_cur
BULK COLLECT INTO l_employees
LIMIT 10;

      EXIT WHEN l_employees.COUNT = 0;

      FOR indx IN 1 .. l_employees.COUNT
LOOP
DBMS_OUTPUT.put_line (l_employee.last_name);
END LOOP;
END LOOP;
END;
/

  1. Use BULK COLLET with a dynamic query.

CREATE OR REPLACE PROCEDURE show_names (query_in IN VARCHAR2)
IS
l_names   DBMS_SQL.varchar2_table;
BEGIN
EXECUTE IMMEDIATE query_in BULK COLLECT INTO l_names;

   FOR indx IN 1 .. l_names.COUNT
LOOP
DBMS_OUTPUT.put_line (l_names (indx));
END LOOP;
END;
/

  1. Use BULK COLLECT with a cursor variable.

CREATE OR REPLACE PROCEDURE show_names (cv_io IN OUT SYS_REFCURSOR)
IS
l_names   DBMS_SQL.varchar2_table;
BEGIN
LOOP
FETCH cv_io
BULK COLLECT INTO l_names
LIMIT 10;

      EXIT WHEN l_names.COUNT = 0;

      FOR indx IN 1 .. l_names.COUNT
LOOP
DBMS_OUTPUT.put_line (l_names (indx));
END LOOP;
END LOOP;

   CLOSE cv_io;
END;
/

DECLARE
CV   SYS_REFCURSOR;
BEGIN
OPEN CV FOR ‘select last_name from employees’;

   show_names (CV);
END;
/

Using LIMIT with BULK COLLECT

Oracle provides a LIMIT clause for BULK COLLECT that allows you to limit the number of rows fetched from the database. The syntax is:

FETCH cursor BULK COLLECT INTO … [LIMIT rows];

where rows can be any literal, variable, or expression that evaluates to an integer (otherwise, the database will raise a VALUE_ERROR exception).

LIMIT is very useful with BULK COLLECT, because it helps you manage how much memory your program will used to process data. Suppose, for example, that you need to query and process 10,000 rows of data. You could use BULK COLLECT to retrieve all those rows and populate a rather large collection. However, this approach will consume lots of memory in the PGA for that session. If this code is run by many separate Oracle schemas, your application performance may degrade because of PGA swapping.

The following block of code uses the LIMIT clause in a FETCH that is inside a simple loop.

    DECLARE
l_limit PLS_INTEGER := 100;
CURSOR allrows_cur IS SELECT * FROM employees;
TYPE employees_aat IS TABLE OF allrows_cur%ROWTYPE
INDEX BY BINARY_INTEGER;
l_employees employees_aat;
BEGIN
OPEN allrows_cur;
LOOP
FETCH allrows_cur BULK COLLECT INTO l_employees LIMIT l_limit;

/* Process the data by scanning through the collection. */
          FOR l_row IN 1 .. l_employees.COUNT
LOOP
upgrade_employee_status (l_employees(l_row).employee_id);
END LOOP;

          EXIT WHEN allrows_cur%NOTFOUND;
END LOOP;

CLOSE allrows_cur;
END;

Notice that I terminate the loop by checking the value of allrows_cur%NOTFOUND at the bottom of the loop. When querying data one row at a time, we usually put this code immediately after the FETCH statement. You should not do that when using BULK COLLECT, because when the fetch retrieves the last set of rows, the cursor will be exhausted (and %NOTFOUND will return TRUE) but you will still have some elements in the collection to process.

So either check the %NOTFOUND attribute at the bottom of your loop, or check the contents of the collection immediately after the fetch:

LOOP
FETCH allrows_cur BULK COLLECT INTO l_employees LIMIT l_limit;
EXIT WHEN l_employees.COUNT = 0;

The disadvantage of this second approach is that you will perform an extra fetch that returns no rows, compared to checking %NOTFOUND at the bottom of the loop body. The performance impact of this extra fetch should, however, be minimal.

Advertisements
Categories: Database, Oracle
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: