Home > big data > listing files with the external table preprocessor in 11g

listing files with the external table preprocessor in 11g

listing files with the external table preprocessor in 11g

The external table preprocessor was introduced in 11g Release 1 (patchset 11.1.0.7) and formally documented in 11g Release 2. Briefly, the preprocessor enables us to define an executable or shell/batch script that can be used to generate a rowsource for the external table as it is queried. The most obvious use for such a feature is to enable compressed files to be queried directly from external tables and this is described in another oracle-developer.net article (see references below).

Another obvious use for the preprocessor is to solve the common problem of how to list files in directories from within Oracle (i.e. using SQL or PL/SQL). There is no documented built-in method for doing this in Oracle and techniques described on the web include Java stored procedures or an undocumented DBMS_BACKUP_RESTORE procedure. With the introduction of the preprocessor, however, we will demonstrate how directory listings can be achieved with an external table and a preprocessor batch/shell script.

background reading

Developers who are not familiar with the new preprocessor can read this oracle-developer.net article on new features for external tables in 11g, which describes the feature in some detail and how it can be used to query compressed files.

a simple file listing example

To begin, we will create a simple example of a preprocessor directory listing and for this we require the following components:

  • a read-write Oracle directory (for the external table);
  • an executable Oracle directory (for the preprocessor);
  • an external table;
  • a shell/batch script;
  • a dummy text file (for our table location).

As a SYSDBA, we will create the Oracle directories and grant the relevant privileges to SCOTT, as follows.

SQL> conn / as sysdba
Connected.

SQL> CREATE DIRECTORY xt_dir AS 'd: \oracle\dir\xt_dir';

Directory created.

SQL> GRANT READ, WRITE ON DIRECTORY xt_dir TO scott;

Grant succeeded.

SQL> CREATE DIRECTORY bin_dir AS 'd: \oracle\dir\bin_dir';

Directory created.

SQL> GRANT EXECUTE ON DIRECTORY bin_dir TO scott;

Grant succeeded.

We will now create the external table to read a standard DOS directory listing, as follows.

SQL> conn scott/tiger
Connected.

SQL> CREATE TABLE files_xt
  2  ( file_date VARCHAR2(50)
  3  , file_time VARCHAR2(50)
  4  , file_size VARCHAR2(50)
  5  , file_name VARCHAR2(255)
  6  )
  7  ORGANIZATION EXTERNAL
  8  (
  9    TYPE ORACLE_LOADER
 10    DEFAULT DIRECTORY xt_dir
 11    ACCESS PARAMETERS
 12    (
 13       RECORDS DELIMITED BY NEWLINE
 14       LOAD WHEN file_size != '<DIR>'
 15       PREPROCESSOR bin_dir: 'list_files.bat'
 16       FIELDS TERMINATED BY WHITESPACE
 17    )
 18    LOCATION ('sticky.txt')
 19  )
 20  REJECT LIMIT UNLIMITED;

Table created.

Note the following points about this external table:

  • Line 14: we use the LOAD WHEN clause to exclude directories from our listings;
  • Line 15: our preprocessor command simply calls a batch script named list_files.bat in the BIN_DIR;
  • Line 18: we need a location clause, even though we are executing a script and not reading a file. This file must exist (even if it is not used) and cannot be a directory. Note that the file name given for the location can be read by the preprocessor batch script, so we will exploit this feature later on in this article to build a dynamic file listing. For now, however, we have no interest in the sticky.txt file other than its presence.

We now need to create two files; namely the preprocessor script (i.e. list_files.bat) and the location file (i.e. sticky.txt). As this is a Windows system, the preprocessor script needs to be a batch script. In our case, list_files.bat contains the following simple code:

 
@echo off
dir d: \oracle\diag\rdbms\ora11\ora11\trace

The first line is mandatory for batch scripts and the script itself must be located in the executable Oracle directory (in our case BIN_DIR). Note that for now we have hard-coded the directory we wish to query (i.e. the user dump destination for this instance).

Moving on, we can now create our sticky.txt location file in our Oracle read-write directory (XT_DIR), as follows.

SQL> host echo > d: \oracle\dir\xt_dir\sticky.txt

For this example, the file doesn’t need to contain anything. It simply has to exist. We will now query our preprocessor external table to get a file listing for the 11g trace directory that we coded in the list_files.bat script, as follows.

SQL> SELECT * FROM files_xt WHERE ROWNUM <= 5;

FILE_DATE    FILE_TIME    FILE_SIZE    FILE_NAME
------------ ------------ ------------ ----------------------
Volume       in           drive        D
Volume       Serial       Number       is
14/10/2009   22:39        618,508      alert_ora11.log
14/10/2009   22:38        1,474        ora11_cjq0_5148.trc
14/10/2009   22:38        175          ora11_cjq0_5148.trm

5 rows selected.

We can see our directory listing at work, but the data requires some cleanup. The easiest way to do this is with a view, to filter and format the data we require in a consistent manner, as follows.

SQL> CREATE VIEW files_vxt
  2  AS
  3     SELECT file_name
  4     ,      TO_DATE(
  5               file_date||','||file_time,
  6               'DD/MM/YYYY HH24:MI') AS file_time
  7     ,      TO_NUMBER(
  8               file_size,
  9               'fm999,999,999,999') AS file_size
 10     FROM   files_xt
 11     WHERE  REGEXP_LIKE(
 12               file_date,
 13               '[0-9]{2}/[0-9]{2}/[0-9]{4}');

View created.

We can now query our file listing more cleanly, as follows.

SQL> col file_time format a20
SQL> col file_name format a30
SQL> col file_size format 999,999,999

SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;

FILE_NAME                      FILE_TIME               FILE_SIZE
------------------------------ -------------------- ------------
alert_ora11.log                14/10/2009 22:39:00       618,508
ora11_cjq0_5148.trc            14/10/2009 22:38:00         1,474
ora11_cjq0_5148.trm            14/10/2009 22:38:00           175
ora11_dbrm_4016.trc            14/10/2009 23:20:00        20,184
ora11_dbrm_4016.trm            14/10/2009 23:20:00           991

5 rows selected.

Of course, because we are using SQL to access our file listing, we can use wildcard searches, date ordering and so on, as the following example demonstrates.

SQL> SELECT *
  2  FROM   files_vxt
  3  WHERE  file_name LIKE '%.trc'
  4  ORDER  BY
  5         file_time DESC;

FILE_NAME                      FILE_TIME               FILE_SIZE
------------------------------ -------------------- ------------
ora11_dbrm_4016.trc            14/10/2009 22:51:00        18,385
ora11_cjq0_5148.trc            14/10/2009 22:38:00         1,474
ora11_lgwr_2444.trc            14/10/2009 22:37:00         1,312
ora11_smon_2040.trc            14/10/2009 22:37:00         1,133
ora11_mmnl_3760.trc            14/10/2009 22:37:00         1,478
ora11_reco_5792.trc            14/10/2009 22:37:00         1,222
ora11_fbda_3052.trc            13/10/2009 17:41:00         1,167
ora11_w000_4796.trc            13/10/2009 06:52:00           905

8 rows selected.

dynamic file listing with preprocessor scripts

The small batch script example above uses a pre-defined directory listing (i.e. it only lists the 11g trace directory files). One of the features of the batch or shell script that we use is that it can read the path and name of the file used in the external table location clause. This means that it is possible to make the directory listing more dynamic by putting the paths to the directory or directories we wish to list inside the external table location file.

To demonstrate this, we will re-create our FILES_XT external table, using a new batch script and location file, as follows.

SQL> DROP TABLE files_xt;

Table dropped.

SQL> CREATE TABLE files_xt
  2  ( file_date VARCHAR2(50)
  3  , file_time VARCHAR2(50)
  4  , file_size VARCHAR2(50)
  5  , file_name VARCHAR2(255)
  6  )
  7  ORGANIZATION EXTERNAL
  8  (
  9    TYPE ORACLE_LOADER
 10    DEFAULT DIRECTORY xt_dir
 11    ACCESS PARAMETERS
 12    (
 13       RECORDS DELIMITED by NEWLINE
 14       LOAD WHEN file_size != '<DIR>'
 15       PREPROCESSOR bin_dir: 'dynamic_list_files.bat'
 16       FIELDS TERMINATED BY WHITESPACE
 17    )
 18    LOCATION ('directories.txt')
 19  )
 20  REJECT LIMIT UNLIMITED;

Table created.

This time we have a batch script in our BIN_DIR named dynamic_list_files.bat and a location file called directories.txt. The batch file contents are quite simple, as follows.

 
@echo off
for /F %%X in (%1) do dir %%X

This small script reads the external table location file (%1) and each line in the file (/F) is assigned to a variable (%%X) and used as the target for a directory listing (dir %%X). Each line in the location file therefore needs to be a valid directory path. With this method, there is no hard-coding of paths required in the preprocessor script and we can change the contents of the location file to list different directories as and when required.

To test this, we will add the 11g trace file path to the directories.txt file and run a query over our external table, as follows.

SQL> host echo d: \oracle\diag\rdbms\ora11\ora11\trace > d: \oracle\dir\xt_dir\directories.txt

SQL> SELECT *
  2  FROM   files_vxt
  3  WHERE  file_name LIKE '%.trc'
  4  ORDER  BY
  5         file_time DESC;

FILE_NAME                      FILE_TIME               FILE_SIZE
------------------------------ -------------------- ------------
ora11_dbrm_4016.trc            14/10/2009 22:59:00        18,899
ora11_cjq0_5148.trc            14/10/2009 22:38:00         1,474
ora11_lgwr_2444.trc            14/10/2009 22:37:00         1,312
ora11_smon_2040.trc            14/10/2009 22:37:00         1,133
ora11_mmnl_3760.trc            14/10/2009 22:37:00         1,478
ora11_reco_5792.trc            14/10/2009 22:37:00         1,222
ora11_fbda_3052.trc            13/10/2009 17:41:00         1,167
ora11_w000_4796.trc            13/10/2009 06:52:00           905

8 rows selected.

As described above, the batch script is able to read inside the directories.txt file and pass the contents (i.e. a path) to the dir command. If we wish, we can list multiple paths in the location file. In the following example, we will list the files in two different user dump destinations. First we will add the paths to the location file, as follows.

SQL> host echo d: \oracle\admin\ora92\udump > d: \oracle\dir\xt_dir\directories.txt

SQL> host echo d: \oracle\admin\ora102\udump >> d: \oracle\dir\xt_dir\directories.txt

We are now able to list a sample of 9i and 10g trace files from the same external table, below.

SQL> SELECT *
  2  FROM   files_vxt
  3  WHERE  file_name LIKE '%.trc'
  4  AND    ROWNUM <= 10;

FILE_NAME                      FILE_TIME               FILE_SIZE
------------------------------ -------------------- ------------
ora92_ora_3164.trc             29/04/2009 18:32:00   448,045,831
ora92_ora_3340_bv.trc          19/10/2007 18:14:00         9,802
ora92_ora_4364_sctx.trc        19/10/2007 18:23:00        10,536
ora92_ora_4532_lit.trc         19/10/2007 18:14:00        10,799
ora92_ora_4708_bvcs.trc        19/10/2007 18:28:00        10,513
ora92_ora_5156.trc             28/04/2009 18:00:00         1,784
ora102_ora_1344_x2.trc         18/10/2007 19:14:00        36,239
ora102_ora_1544.trc            17/03/2009 23:04:00           934
ora102_ora_1896.trc            13/03/2009 13:19:00           608
ora102_ora_2272.trc            17/11/2008 07:47:00           591

10 rows selected.

We can clearly see trace files from the two different paths we included in our directories.txt location file.

a more secure dynamic method

Being able to dynamically inject any path into the directories.txt location file has security implications. As an alternative, readers might like to consider the following, more secure, method. Rather than a single “injectable” location file for the FILES_XT external table, we can setup individual location files that each contain one hard-coded directory path. These files can be placed in a readonly directory and whenever a directory needs to be listed, the external table’s location can be changed to the corresponding location file.

To demonstrate this, we will begin by creating a readonly Oracle directory as follows.

SQL> conn / as sysdba
Connected.

SQL> CREATE DIRECTORY readonly_dir AS 'd: \oracle\dir\readonly_dir';

Directory created.

SQL> GRANT READ ON DIRECTORY readonly_dir TO scott;

Grant succeeded.

We will now add three location files to the READONLY_DIR directory, one for each of the user dump destinations used in the previous examples. Each file contains the path to the directory indicated by the filename.

SQL> host echo d: \oracle\admin\ora92\udump > d: \oracle\dir\readonly_dir\ora92_udump.txt

SQL> host echo d: \oracle\admin\ora102\udump > d: \oracle\dir\readonly_dir\ora102_udump.txt

SQL> host echo d: \oracle\diag\rdbms\ora11\ora11\trace > d: \oracle\dir\readonly_dir\ora11_udump.txt

We will now modify our FILES_XT table to set the default directory to the READONLY_DIR, as follows.

SQL> conn scott/tiger
Connected.

SQL> ALTER TABLE files_xt DEFAULT DIRECTORY readonly_dir;

Table altered.

We are now ready to use the more secure file listing method. We will begin by listing the files in the ORA92 user dump destination, as follows.

SQL> ALTER TABLE files_xt LOCATION ('ora92_udump.txt');

Table altered.

SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;

FILE_NAME                      FILE_TIME               FILE_SIZE
------------------------------ -------------------- ------------
ora92_ora_3164.trc             29/04/2009 18:32:00   448,045,831
ora92_ora_3340_bv.trc          19/10/2007 18:14:00         9,802
ora92_ora_4364_sctx.trc        19/10/2007 18:23:00        10,536
ora92_ora_4532_lit.trc         19/10/2007 18:14:00        10,799
ora92_ora_4708_bvcs.trc        19/10/2007 18:28:00        10,513

5 rows selected.

To query the ORA102 and ORA11 destinations, we simply change the location of the external table to the corresponding (and more importantly, controlled) location files, as follows.

SQL> ALTER TABLE files_xt LOCATION ('ora102_udump.txt');

Table altered.

SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;

FILE_NAME                      FILE_TIME               FILE_SIZE
------------------------------ -------------------- ------------
ora102_ora_1344_x2.trc         18/10/2007 19:14:00        36,239
ora102_ora_1544.trc            17/03/2009 23:04:00           934
ora102_ora_1896.trc            13/03/2009 13:19:00           608
ora102_ora_2272.trc            17/11/2008 07:47:00           591
ora102_ora_2564.trc            20/08/2008 07:39:00           591

5 rows selected.

SQL> ALTER TABLE files_xt LOCATION ('ora11_udump.txt');

Table altered.

SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;

FILE_NAME                      FILE_TIME               FILE_SIZE
------------------------------ -------------------- ------------
alert_ora11.log                14/10/2009 22:39:00       618,508
ora11_cjq0_5148.trc            14/10/2009 22:38:00         1,474
ora11_cjq0_5148.trm            14/10/2009 22:38:00           175
ora11_dbrm_4016.trc            14/10/2009 23:25:00        20,441
ora11_dbrm_4016.trm            14/10/2009 23:25:00         1,003

5 rows selected.

This method is equally as flexible from an application’s perspective, but it gives the DBA more control over which directories are available for listing and who can use the utility. Be aware, however, that if the application has access to any read-write directories, it is still possible for the external table’s default directory to be altered and the less secure method used instead. If this is the case, it might be more secure to create a dedicated external table for each directory that needs to be listed and ensure that the application is unable to alter the external tables’ directories or locations.

using a pipelined function for dynamic file listing

Finally, we will take the dynamic file listing to a logical conclusion and create a pipelined function to perform the actions in the previous example. For this, we will need to create some types to describe our pipelined function’s output data, as follows.

SQL> CREATE TYPE file_listing_ot AS OBJECT
  2  ( file_path VARCHAR(1000)
  3  , file_name VARCHAR2(100)
  4  , file_time DATE
  5  , file_size NUMBER
  6  );
  7  /

Type created.

SQL> CREATE TYPE file_listing_ntt AS TABLE OF file_listing_ot;
  2  /

Type created.

As we can see, the function will return the basic file listing information, together with the path of the directory being listed. We will now create a simple pipelined function to list the files, as follows.

SQL> CREATE FUNCTION list_files(
  2                  p_directory_file IN VARCHAR2
  3                  ) RETURN file_listing_ntt
  4                    AUTHID CURRENT_USER
  5                    PIPELINED IS
  6
  7     v_path VARCHAR2(1000);
  8
  9     ----------------------------------------------------------
 10     PROCEDURE get_path( p_file IN  VARCHAR2,
 11                         p_path OUT VARCHAR2 ) IS
 12        v_file UTL_FILE.FILE_TYPE;
 13     BEGIN
 14        v_file := UTL_FILE.FOPEN('READONLY_DIR', p_file, 'r');
 15        UTL_FILE.GET_LINE(v_file, p_path);
 16        UTL_FILE.FCLOSE(v_file);
 17     END get_path;
 18
 19     ----------------------------------------------------------
 20     PROCEDURE set_location( p_file IN VARCHAR2 ) IS
 21        PRAGMA AUTONOMOUS_TRANSACTION;
 22     BEGIN
 23        EXECUTE IMMEDIATE
 24           'ALTER TABLE files_xt LOCATION (''' || p_file || ''')';
 25     END set_location;
 26
 27  BEGIN
 28
 29     /* Read the path from the directory file... */
 30     get_path( p_directory_file, v_path );
 31
 32     /* Prepare the external table... */
 33     set_location( p_directory_file );
 34
 35     /* Read the file listing... */
 36     FOR r_files IN (SELECT * FROM files_vxt) LOOP
 37        PIPE ROW ( file_listing_ot( v_path,
 38                                    r_files.file_name,
 39                                    r_files.file_time,
 40                                    r_files.file_size ));
 41     END LOOP;
 42     RETURN;
 43  END list_files;
 44  /

Function created.

This function wraps the previous interactive example and performs three simple tasks:

  • Line 30 (calling lines 10-17): first, it reads the path of the listed directory from the static location file;
  • Line 33 (calling lines 20-25): second, it sets the location of the FILES_XT to the static location file parameter;
  • Lines 36-41: third, it queries the FILES_VXT view (that sits over the FILES_XT table) to output the directory listing.

We will now query the pipelined function to get a file listing of the user dump destination for the ORA92 instance on this system.

SQL> SELECT *
  2  FROM   TABLE(list_files('ora92_udump.txt'))
  3  ORDER  BY
  4         file_size DESC;

FILE_PATH                      FILE_NAME                 FILE_TIME               FILE_SIZE
------------------------------ ------------------------- -------------------- ------------
d:\oracle\admin\ora92\udump    ora92_ora_3164.trc        29/04/2009 18:32:00   448,045,831
d:\oracle\admin\ora92\udump    ora92_ora_4532_lit.trc    19/10/2007 18:14:00        10,799
d:\oracle\admin\ora92\udump    ora92_ora_4364_sctx.trc   19/10/2007 18:23:00        10,536
d:\oracle\admin\ora92\udump    ora92_ora_4708_bvcs.trc   19/10/2007 18:28:00        10,513
d:\oracle\admin\ora92\udump    ora92_ora_3340_bv.trc     19/10/2007 18:14:00         9,802
d:\oracle\admin\ora92\udump    ora92_ora_5156.trc        28/04/2009 18:00:00         1,784

6 rows selected.

We can see that this is a simple and relatively secure method for listing directory files with a preprocessor. To make this more robust, we would need to add some exception handling and an application lock around the FILES_XT access. We could also pass in a collection of directory files and have multiple locations set and listed in one query, but enhancements like this can be an exercise for the reader!

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: