Splitting a clob into rows


I’ve used this tool for a wide variety of other parsing projects. One of the interesting tuning techniques I used was to pull the clob apart into 32K varchar2 chunks.
It is possible to split the clob directly using the DBMS_LOB package or through the overloaded SQL functions; but clobs are expensive objects. Varchar2 variables on the other hand are relatively light weight making the sub-parsing within them much faster. Doing this does take a little bit of care though to make sure the chunks don’t accidentally split a line in two.

Also, I do make the assumption that no one line will be more than 32K long which is fine for this function anyway since the output is a SQL collection with a varchar2 limit of 4000 bytes.
The returned VCARRAY type is a simple table collection type.

CREATE OR REPLACE TYPE VCARRAY as table of varchar2(4000)

I wrote this originally in 9i. With 12c support for 32K varchar2 in SQL I may need to revisit it and make a new version.

CREATE OR REPLACE FUNCTION split_clob(p_clob IN CLOB, p_delimiter IN VARCHAR2 DEFAULT CHR(10))
    RETURN vcarray
    PIPELINED
IS
    --                    .///.
    --                   (0 o)
    ---------------0000--(_)--0000---------------
    --
    --  Sean D. Stuber
    --  sean.stuber@gmail.com
    --
    --             oooO      Oooo
    --------------(   )-----(   )---------------
    --             \ (       ) /
    --              \_)     (_/

    c_chunk_limit   CONSTANT INTEGER := 32767;
    v_clob_length            INTEGER := DBMS_LOB.getlength(p_clob);
    v_clob_index             INTEGER;
    v_chunk                  VARCHAR2(32767);
    v_chunk_end              INTEGER;
    v_chunk_length           INTEGER;
    v_chunk_index            INTEGER;
    v_delim_len              INTEGER := LENGTH(p_delimiter);
    v_line_end               INTEGER;
BEGIN
    v_clob_length := DBMS_LOB.getlength(p_clob);
    v_clob_index := 1;

    WHILE v_clob_index <= v_clob_length
    LOOP
        /*
            Pull one 32K chunk off the clob at a time.
            This is because it's MUCH faster to use built in functions
            on a varchar2 type than to use dbms_lob functions on a clob.
        */
        v_chunk := DBMS_LOB.SUBSTR(p_clob, c_chunk_limit, v_clob_index);

        IF v_clob_index > v_clob_length - c_chunk_limit
        THEN
            -- if we walked off the end the clob,
            -- then the chunk is whatever we picked up at the end
            -- delimited or not
            v_clob_index := v_clob_length + 1;
        ELSE
            v_chunk_end := INSTR(v_chunk, p_delimiter, -1);

            IF v_chunk_end = 0
            THEN
                DBMS_OUTPUT.put_line('No delimiters found!');
                RETURN;
            END IF;

            v_chunk := SUBSTR(v_chunk, 1, v_chunk_end);
            v_clob_index := v_clob_index + v_chunk_end + v_delim_len - 1;
        END IF;

        /*
            Given a varchar2 chunk split it into lines
        */

        v_chunk_index := 1;
        v_chunk_length := NVL(LENGTH(v_chunk), 0);

        WHILE v_chunk_index <= v_chunk_length
        LOOP
            v_line_end := INSTR(v_chunk, p_delimiter, v_chunk_index);

            IF v_line_end = 0 OR (v_line_end - v_chunk_index) > 4000
            THEN
                PIPE ROW (SUBSTR(v_chunk, v_chunk_index, 4000));
                v_chunk_index := v_chunk_index + 4000;
            ELSE
                PIPE ROW (SUBSTR(v_chunk, v_chunk_index, v_line_end - v_chunk_index));
                v_chunk_index := v_line_end + v_delim_len;
            END IF;
        END LOOP;
    END LOOP;

    RETURN;
EXCEPTION
    WHEN no_data_needed
    THEN
        NULL;
END split_clob;
/

Advertisements

2 Responses to Splitting a clob into rows

  1. Claudio says:

    No other comments here? Really? This solution seems to be the fastest clob splitter by far – if you are parsing large clobs you must try this.

    Thanks Sean!

Questions and Comments always welcome

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: