Using SQL to query USGS data to monitor recent earthquakes


I live in Ohio, which is fairly stable as far as earthquakes go, but I have coworkers and friends who live in other, more quake-prone areas of the country. So, sometimes I like to look up what’s going on in those areas. Naturally, I decided to see how I could use Oracle features to assist in my monitoring.

First, I needed a reliable source of information. The U.S. Geological Survey web site is my data source, and they have a variety of free data feeds available. If you’re going to do a lot of data analysis, I recommend pulling the data into a local table and saving it rather than hammering their server again and again for each query.

So, while these examples show how to access and parse the data, production use should go against local storage. You’ll get better performance on repeated access than pulling across the internet anyway.
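As a minimal sketch of that approach (the cache table and the fixed date range here are my own inventions for illustration), you could pull the raw feed once and parse the local copy:

CREATE TABLE usgs_feed_cache
(
    fetched_at TIMESTAMP DEFAULT SYSTIMESTAMP,
    feed_xml   XMLTYPE
);

INSERT INTO usgs_feed_cache (feed_xml)
SELECT httpuritype(
           'http://earthquake.usgs.gov/fdsnws/event/1/query.quakeml?'
        || 'starttime=2015-07-25&endtime=2015-07-28'
        || '&maxlatitude=37&minlatitude=33&maxlongitude=-94&minlongitude=-103'
        || '&minmagnitude=0&orderby=time').getxml()
  FROM dual;
COMMIT;

The XMLTABLE query shown below could then be passed the cached feed_xml instead of calling httpuritype on every execution.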

The USGS has several different formats available. I’ll be using the QuakeML-formatted feed. QuakeML is simply XML with a predefined schema. Since I’m not a seismologist, I’m only interested in the basic information of when?/where?/how big? and not necessarily all of the minute details like sampling method, uncertainty, etc.

In the examples below I’ll use a rectangular box to define the area I want to examine and I’ll request all of the data for the current and previous 2 days.  For more complete information on constructing other query urls, see the reference here: http://earthquake.usgs.gov/fdsnws/event/1

The service returns a single xml document consisting of multiple event tags.  Each event represents one earthquake.  We’ll extract the text within the following nodes to get our result set.

/q:quakeml/eventParameters/event/description/text
/q:quakeml/eventParameters/event/origin/time/value
/q:quakeml/eventParameters/event/origin/longitude/value
/q:quakeml/eventParameters/event/origin/latitude/value
/q:quakeml/eventParameters/event/origin/depth/value
/q:quakeml/eventParameters/event/magnitude/mag/value

Note that text here is an XML tag, not the XQuery function text().
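To make those paths concrete, here is a skeletal QuakeML document trimmed to just the nodes above (the values come from the first row of the result set shown later; a real feed carries many more elements per event):

<q:quakeml xmlns="http://quakeml.org/xmlns/bed/1.2"
           xmlns:q="http://quakeml.org/xmlns/quakeml/1.2">
  <eventParameters>
    <event>
      <description>
        <text>5km NNE of Crescent, Oklahoma</text>
      </description>
      <origin>
        <time><value>2015-07-28T00:24:03.770Z</value></time>
        <longitude><value>-97.5664</value></longitude>
        <latitude><value>35.9938</value></latitude>
        <depth><value>3290</value></depth>
      </origin>
      <magnitude>
        <mag><value>3.2</value></mag>
      </magnitude>
    </event>
  </eventParameters>
</q:quakeml>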

The url construction requires some cumbersome syntax but isn’t really all that complicated. It basically boils down to a base url followed by the parameters for the start and end times (in GMT), a bounding box of latitude/longitude coordinates, a minimum magnitude and an ordering criterion.

In this case I’m looking for a bounding box that includes the state of Oklahoma, thus yielding a url construction of:

'http://earthquake.usgs.gov/fdsnws/event/1/query.quakeml?starttime='
|| TO_CHAR(FROM_TZ(CAST(TRUNC(SYSDATE - 2) AS TIMESTAMP), 'US/Eastern') AT TIME ZONE 'GMT', 'yyyy-mm-dd')
|| '&endtime='
|| TO_CHAR(SYSTIMESTAMP AT TIME ZONE 'GMT', 'yyyy-mm-dd"%20"hh24:mi:ss')
|| '&maxlatitude=37&minlatitude=33&maxlongitude=-94&minlongitude=-103'
|| '&minmagnitude=0&orderby=time'
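For example, if it were run at 18:30 GMT on 2015-07-27 (the timestamp is illustrative), that expression would evaluate to something like:

http://earthquake.usgs.gov/fdsnws/event/1/query.quakeml?starttime=2015-07-25&endtime=2015-07-27%2018:30:00&maxlatitude=37&minlatitude=33&maxlongitude=-94&minlongitude=-103&minmagnitude=0&orderby=time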

 

Using the HTTPURITYPE type we can read the USGS site for that url and extract the xml. Using XMLTABLE we can then split each of the event nodes into individual rows and parse out the values we’re interested in.

SELECT FROM_TZ(TO_TIMESTAMP(time, 'yyyy-mm-dd"T"hh24:mi:ssxff"Z"'), 'GMT') 
            AT TIME ZONE 'US/Central' quake_time,
       latitude,
       longitude,
       magnitude,
       depth_in_meters,
       location_name,
       XMLSERIALIZE(DOCUMENT eventxml AS CLOB INDENT SIZE = 3) event_xml
  FROM XMLTABLE(
           xmlnamespaces(DEFAULT 'http://quakeml.org/xmlns/bed/1.2',
                         'http://anss.org/xmlns/catalog/0.1' AS "catalog",
                         'http://quakeml.org/xmlns/quakeml/1.2' AS "q"),
           '/q:quakeml/eventParameters/event'
           PASSING httpuritype(
                          'http://earthquake.usgs.gov/fdsnws/event/1/query.quakeml?starttime='
                       || TO_CHAR(FROM_TZ(CAST(TRUNC(SYSDATE - 2) AS TIMESTAMP), 'US/Eastern')
                                 AT TIME ZONE 'GMT', 'yyyy-mm-dd')
                       || '&endtime='
                       || TO_CHAR(SYSTIMESTAMP AT TIME ZONE 'GMT', 'yyyy-mm-dd"%20"hh24:mi:ss')
                       || '&maxlatitude=37&minlatitude=33&maxlongitude=-94&minlongitude=-103'
                       || '&minmagnitude=0&orderby=time').getxml()
           COLUMNS 
              eventxml XMLTYPE PATH '.',
              time VARCHAR2(100) PATH './origin/time/value',
              longitude NUMBER PATH './origin/longitude/value',
              latitude NUMBER PATH './origin/latitude/value',
              magnitude NUMBER PATH './magnitude/mag/value',
              depth_in_meters NUMBER PATH './origin/depth/value',
              location_name VARCHAR2(100) PATH './description/text');

 

Since Oklahoma is in the US/Central time zone, I’ll present the data in time local to the events.

Running the query above yields a result set like the following…

QUAKE_TIME                           LATITUDE LONGITUDE MAGNITUDE DEPTH_IN_METERS LOCATION_NAME
------------------------------------ -------- --------- --------- --------------- -------------------------------
2015-07-27 19:24:03.770000000 -05:00  35.9938  -97.5664       3.2            3290 5km NNE of Crescent, Oklahoma
2015-07-27 16:07:46.590000000 -05:00  36.0019  -97.5183       2.5            4490 8km NE of Crescent, Oklahoma
2015-07-27 13:12:15.330000000 -05:00   36.006  -97.5761       4.5            3180 6km NNE of Crescent, Oklahoma
2015-07-27 12:49:28.000000000 -05:00  36.0018  -97.5667         4            5590 6km NNE of Crescent, Oklahoma
2015-07-26 08:26:50.480000000 -05:00  35.9983  -97.5693       3.2            4270 5km NNE of Crescent, Oklahoma
2015-07-26 06:30:48.720000000 -05:00  36.5294  -98.8821         3            5000 30km ENE of Mooreland, Oklahoma
2015-07-26 04:54:33.520000000 -05:00  36.0082  -97.5709       3.9            5860 6km NNE of Crescent, Oklahoma
2015-07-26 04:15:06.100000000 -05:00  35.9656  -96.8124       2.8            4160 4km WSW of Cushing, Oklahoma
2015-07-26 01:59:44.610000000 -05:00  36.0016  -97.5775       2.9            7130 5km NNE of Crescent, Oklahoma
2015-07-25 06:14:44.200000000 -05:00  36.0034   -97.568       3.9            5570 6km NNE of Crescent, Oklahoma
2015-07-25 03:47:53.420000000 -05:00  36.3478  -96.8192       2.9            1100 1km NW of Pawnee, Oklahoma
2015-07-25 01:48:21.880000000 -05:00   36.136  -97.0685       1.8            4280 2km NNW of Stillwater, Oklahoma
2015-07-24 23:49:02.820000000 -05:00  36.1675   -96.967         3            5000 10km NE of Stillwater, Oklahoma
2015-07-24 23:10:33.690000000 -05:00  36.5804  -97.6208       3.1            5000 27km SSE of Medford, Oklahoma

New version of LDAP package published


I let this project sit on the back burner for too many months, but tonight I finally got around to cleaning it up and getting it out for people to use.

Do note, the authentication is a little different and hence not compatible with prior versions.
While I don’t like breaking compatibility, it was a necessary change since the earlier versions made too many assumptions about administrative distinguished names.
Other than that, most of the api should still look and behave pretty similarly.

Version 2 now supports aliases, so if you have multiple names for the same database service, you can consolidate them into a single entry with aliases pointing to it.

Enjoy!

Version 2 can be found here

Oracle optimizer doesn’t understand elementary math


If A < B and B < C then A < C

Right? We all know the transitive rule… all of us except the Oracle optimizer.
Here’s a seasonal example to illustrate how the optimizer changes its methods based on seemingly trivial options.

Let’s say I bake 10 types of cookies and then I want to see how many distinct bundles of 3 types I can make.
Mathematically we’d represent that as C(10,3) = (10·9·8)/(3·2·1) = 120.

In SQL, we could solve for them like this:

CREATE TABLE cookies
AS
SELECT 'oatmeal' cookie FROM DUAL
UNION ALL
SELECT 'm&m' cookie FROM DUAL
UNION ALL
SELECT 'chocolate chip' cookie FROM DUAL
UNION ALL
SELECT 'molasses' cookie FROM DUAL
UNION ALL
SELECT 'gingerbread' cookie FROM DUAL
UNION ALL
SELECT 'macadamia nut' cookie FROM DUAL
UNION ALL
SELECT 'peanut butter' cookie FROM DUAL
UNION ALL
SELECT 'snickerdoodle' cookie FROM DUAL
UNION ALL
SELECT 'sugar cookie' cookie FROM DUAL
UNION ALL
SELECT 'shortbread' cookie FROM DUAL;

EXEC dbms_stats.gather_table_stats(ownname=>user,tabname=>'COOKIES');

Now run a couple of test queries to check our math.

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie;

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie AND a.cookie < c.cookie;

Note, both queries return the expected 120 rows.
However, the optimizer came up with 2 different plans for these functionally identical queries.
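As an aside, here is one way to capture plans with estimated row counts like those shown below; this is a sketch using EXPLAIN PLAN and DBMS_XPLAN, though the plans in this post may have been captured by other means:

EXPLAIN PLAN FOR
SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie;

SELECT * FROM TABLE(dbms_xplan.display(format => 'BASIC +ROWS'));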

With implied transitivity

--------------------------------------------------
| Id  | Operation             | Name    | E-Rows |
--------------------------------------------------
|   0 | SELECT STATEMENT      |         |    203 |
|   1 |  MERGE JOIN           |         |    203 |
|   2 |   SORT JOIN           |         |     45 |
|   3 |    MERGE JOIN         |         |     45 |
|   4 |     SORT JOIN         |         |     10 |
|   5 |      TABLE ACCESS FULL| COOKIES |     10 |
|*  6 |     SORT JOIN         |         |     10 |
|   7 |      TABLE ACCESS FULL| COOKIES |     10 |
|*  8 |   SORT JOIN           |         |     10 |
|   9 |    TABLE ACCESS FULL  | COOKIES |     10 |
--------------------------------------------------

With explicit transitivity

-------------------------------------------------
| Id  | Operation            | Name    | E-Rows |
-------------------------------------------------
|   0 | SELECT STATEMENT     |         |     91 |
|   1 |  MERGE JOIN          |         |     91 |
|   2 |   MERGE JOIN         |         |     45 |
|   3 |    SORT JOIN         |         |     10 |
|   4 |     TABLE ACCESS FULL| COOKIES |     10 |
|*  5 |    SORT JOIN         |         |     10 |
|   6 |     TABLE ACCESS FULL| COOKIES |     10 |
|*  7 |   FILTER             |         |        |
|*  8 |    SORT JOIN         |         |     10 |
|   9 |     TABLE ACCESS FULL| COOKIES |     10 |
-------------------------------------------------

We can take it a step further. Let’s say I want all of the cookies to be the same type.
This is obviously going to be just 10 different combinations and no joins are needed; but I’ll write it this way to further illustrate the point.

With implied transitivity

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie = b.cookie AND b.cookie = c.cookie;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |     10 |
|*  1 |  HASH JOIN          |         |     10 |
|*  2 |   HASH JOIN         |         |     10 |
|   3 |    TABLE ACCESS FULL| COOKIES |     10 |
|   4 |    TABLE ACCESS FULL| COOKIES |     10 |
|   5 |   TABLE ACCESS FULL | COOKIES |     10 |
------------------------------------------------

With explicit transitivity

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie = b.cookie AND b.cookie = c.cookie AND a.cookie = c.cookie;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |
|*  1 |  HASH JOIN          |         |      1 |
|*  2 |   HASH JOIN         |         |     10 |
|   3 |    TABLE ACCESS FULL| COOKIES |     10 |
|   4 |    TABLE ACCESS FULL| COOKIES |     10 |
|   5 |   TABLE ACCESS FULL | COOKIES |     10 |
------------------------------------------------

Again note the change in estimated rows for an evaluation that we might assume the optimizer could figure out on its own.

The optimizer doesn’t understand transitivity of columns; but what if we add variables or constants?
Let’s go back to the original query and take a closer look:

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie;

The plan’s predicates include:

Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access(INTERNAL_FUNCTION("A"."COOKIE")<INTERNAL_FUNCTION("B"."COOKIE"))
       filter(INTERNAL_FUNCTION("A"."COOKIE")<INTERNAL_FUNCTION("B"."COOKIE"))
   8 - access("B"."COOKIE"<"C"."COOKIE")
       filter("B"."COOKIE"<"C"."COOKIE")

If we add a variable, that obviously changes the results; but it’s also interesting to note how the SQL condition gets interpreted as filter conditions within the optimizer’s plan predicates.

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie and a.cookie= :b1;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |
|   1 |  NESTED LOOPS       |         |      1 |
|   2 |   NESTED LOOPS      |         |      1 |
|*  3 |    TABLE ACCESS FULL| COOKIES |      1 |
|*  4 |    TABLE ACCESS FULL| COOKIES |      1 |
|*  5 |   TABLE ACCESS FULL | COOKIES |      1 |
------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("A"."COOKIE"=:B1)
   4 - filter("B"."COOKIE">:B1 AND "A"."COOKIE"<"B"."COOKIE")
   5 - filter("C"."COOKIE">:B1 AND "B"."COOKIE"<"C"."COOKIE")

The variable does get propagated through to the other conditions, as we might expect.
Now let’s try adding a literal value. We’ll see Oracle again is able to derive transitivity: it propagates the constant literal through to all of the table predicates and radically alters the estimates and the execution plan.

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie and a.cookie= 'gingerbread';

--------------------------------------------------
| Id  | Operation             | Name    | E-Rows |
--------------------------------------------------
|   0 | SELECT STATEMENT      |         |     14 |
|   1 |  MERGE JOIN           |         |     14 |
|   2 |   MERGE JOIN CARTESIAN|         |      8 |
|*  3 |    TABLE ACCESS FULL  | COOKIES |      1 |
|   4 |    BUFFER SORT        |         |      8 |
|*  5 |     TABLE ACCESS FULL | COOKIES |      8 |
|*  6 |   FILTER              |         |        |
|*  7 |    SORT JOIN          |         |      8 |
|*  8 |     TABLE ACCESS FULL | COOKIES |      8 |
--------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("A"."COOKIE"='gingerbread')
   5 - filter("C"."COOKIE">'gingerbread')
   6 - filter("B"."COOKIE"<"C"."COOKIE")
   7 - access("A"."COOKIE"<"B"."COOKIE")
       filter("A"."COOKIE"<"B"."COOKIE")
   8 - filter("B"."COOKIE">'gingerbread')

But even though it now seems to understand the transitivity, that understanding is still flawed: if we add the implied condition a.cookie < c.cookie, which offers no functional change, we still get different plans with different joins and estimates.

SELECT *
  FROM cookies a, cookies b, cookies c
 WHERE a.cookie < b.cookie
   AND b.cookie < c.cookie
   AND a.cookie < c.cookie
   AND a.cookie = 'gingerbread';
   
-------------------------------------------------
| Id  | Operation            | Name    | E-Rows |
-------------------------------------------------
|   0 | SELECT STATEMENT     |         |      8 |
|   1 |  MERGE JOIN          |         |      8 |
|   2 |   NESTED LOOPS       |         |      4 |
|*  3 |    TABLE ACCESS FULL | COOKIES |      1 |
|*  4 |    TABLE ACCESS FULL | COOKIES |      4 |
|*  5 |   FILTER             |         |        |
|*  6 |    SORT JOIN         |         |      8 |
|*  7 |     TABLE ACCESS FULL| COOKIES |      8 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("A"."COOKIE"='gingerbread')
   4 - filter("A"."COOKIE"<"C"."COOKIE" AND "C"."COOKIE">'gingerbread')
   5 - filter("B"."COOKIE"<"C"."COOKIE")
   6 - access("A"."COOKIE"<"B"."COOKIE")
       filter("A"."COOKIE"<"B"."COOKIE")
   7 - filter("B"."COOKIE">'gingerbread')

So, what are the lessons here?

First, I really like oatmeal cookies.

Second, I’m reminded that I should bake more.

Third, if you’re stuck on a query where the optimizer seems like it’s ignoring some crucial information, it may be a hole in its logic where it doesn’t extend “obvious” rules throughout the plan. If so, it may be worth changing the where conditions to include, exclude or alter functionally identical conditions.

For example:
WHERE a=b and b=c
could be written
WHERE a=b and a=c
Or
WHERE a=b and a=c and b=c

All three conditions are functionally identical; but writing them in different ways could hide or expose information the optimizer can’t derive on its own (even if we think it should).

Book Review: Oracle PL/SQL Performance Tuning Tips & Techniques


I finished “Oracle PL/SQL Performance Tuning Tips & Techniques” by Michael Rosenblum and Paul Dorsey today.

I recommend it as a lesson, a reference and an inspiration for things to try.
Steven Feuerstein still gets the nod as the “go to” for first learning how to use pl/sql; but this book has great examples for learning how to use it better.

It’s quite readable; definitely one of the best from Oracle Press I’ve seen in a while.
The examples cover 11g, and some 12c as well when warranted. The code snippet construction is particularly good. The snippets are not only complete and functional, they don’t come off as completely arbitrary. They are modeled after real-world tasks, making it much easier to relate to them.

Update to LDAP package


I’m in the process of updating my OID package from last year to version 2.0. The new version will include a significant change in API but will also include more functionality, in particular, support for TNS aliases. I’m also changing the way exceptions are raised as well as improving the efficiency of searches.

Along the way to 2.0, I have made a few incremental updates to the original release; so I’ve updated the original post with a link to Version 1.3 and updated the example to show the new authentication method.

If someone still wants the original release, it can be found here: Version 1.2

Parallel PL/SQL with DBMS_PARALLEL_EXECUTE


11gR2 introduced the DBMS_PARALLEL_EXECUTE package.

The basic idea behind it is you divide some large task into smaller chunks and then execute all the chunks in parallel.
The API and the examples in the Oracle Documentation might lead one to believe the package can only be used to split large SQL statements into parallel chunks.

You can, however, use the same package to execute a pl/sql block multiple times simultaneously.
With proper use of the chunk ids you can even execute multiple different procedures in parallel as a single task.

I’ll illustrate both below.

First, create a table to hold some results:

CREATE TABLE parallel_exec_test_table
(
    test_name   VARCHAR2(100),
    chunk_name  VARCHAR2(100),
    chunk_value INTEGER,
    chunk_start TIMESTAMP,
    chunk_end   TIMESTAMP
);

Next, I’ll create a procedure that does some dummy work, pausing for a second and updating the test table multiple times.

CREATE OR REPLACE PROCEDURE parallel_exec_test1_chunk(
    p_test_name     IN VARCHAR2,
    p_chunk_name    IN VARCHAR2,
    p_chunk_value   IN INTEGER
)
IS
BEGIN
    IF p_chunk_value = 4
    THEN
        RAISE VALUE_ERROR;
    END IF;

    INSERT INTO parallel_exec_test_table(test_name, chunk_name, chunk_start)
    VALUES (p_test_name, p_chunk_name, SYSTIMESTAMP);

    DBMS_LOCK.sleep(2);

    FOR i IN 1 .. p_chunk_value
    LOOP
        UPDATE parallel_exec_test_table
           SET chunk_value = i
         WHERE test_name = p_test_name AND chunk_name = p_chunk_name;
    END LOOP;

    UPDATE parallel_exec_test_table
       SET chunk_end = SYSTIMESTAMP
     WHERE test_name = p_test_name AND chunk_name = p_chunk_name;
END;
/

You might note the intentional RAISE for an input of 4; that’s there to illustrate the parallel execution exception handling.

Next, create a procedure to call the test chunk multiple times in parallel.
It’s important to note that your pl/sql block must have bind variables called :start_id and :end_id.
If you don’t have both, or you name them something different, you’ll get an ORA-01006 exception.
In this example, I generate a list of chunks with ids 1-10. Each chunk starts and ends with a single id, so there are 10 chunks.
I then pass these ids to the chunk procedure and execute 5 of them at a time. Since each invocation of the chunk procedure logs when it starts, this should create 2 clusters of five parallel executions.

CREATE OR REPLACE PROCEDURE parallel_exec_test_proc
IS
    v_task_name   VARCHAR2(30) := 'parallel_test1';
    v_plsql_block VARCHAR2(32767);
BEGIN
    DELETE parallel_exec_test_table
     WHERE test_name = v_task_name;

    DBMS_PARALLEL_EXECUTE.create_task(task_name => v_task_name);

    DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(
        task_name   => v_task_name,
        sql_stmt    => 'SELECT level start_id, level end_id FROM dual connect by level <=10',
        by_rowid    => FALSE
    );


    v_plsql_block :=
        q'[
begin 
   parallel_exec_test1_chunk('parallel_test1',to_char(to_date(:start_id,'j'),'jsp'),:end_id);
end;
]';
    DBMS_PARALLEL_EXECUTE.run_task(
        task_name        => v_task_name,
        sql_stmt         => v_plsql_block,
        language_flag    => DBMS_SQL.native,
        parallel_level   => 5
    );

    DBMS_OUTPUT.put_line(
           TO_CHAR(SYSTIMESTAMP, 'yyyy-mm-dd hh24:mi:ss.ff')
        || '  '
        || DBMS_PARALLEL_EXECUTE.task_status(v_task_name)
    );
END;
/

Testing produces the following results…

SQL> set timing on
SQL> EXEC parallel_exec_test_proc

PL/SQL procedure successfully completed.

Elapsed: 00:00:06.03
SQL> SELECT * FROM parallel_exec_test_table WHERE test_name = 'parallel_test1' ORDER BY chunk_start;

TEST_NAME            CHUNK_NAME CHUNK_VALUE CHUNK_START                CHUNK_END
-------------------- ---------- ----------- -------------------------- ------------------------------
parallel_test1       three                3 2013-10-17 00:44:27.494957 2013-10-17 00:44:29.496812
parallel_test1       five                 5 2013-10-17 00:44:27.495694 2013-10-17 00:44:29.499113
parallel_test1       six                  6 2013-10-17 00:44:27.496019 2013-10-17 00:44:29.499743
parallel_test1       seven                7 2013-10-17 00:44:27.496139 2013-10-17 00:44:29.498747
parallel_test1       eight                8 2013-10-17 00:44:27.510579 2013-10-17 00:44:29.514314
parallel_test1       nine                 9 2013-10-17 00:44:29.498327 2013-10-17 00:44:31.500831
parallel_test1       ten                 10 2013-10-17 00:44:29.500020 2013-10-17 00:44:31.503208
parallel_test1       one                  1 2013-10-17 00:44:29.500746 2013-10-17 00:44:31.502249
parallel_test1       two                  2 2013-10-17 00:44:29.501337 2013-10-17 00:44:31.502495

9 rows selected.

Elapsed: 00:00:00.04

As you can see, 5 runs started at 44:27 and 4 more at 44:29. You might also note that even though the chunks were defined in order 1-10, the actual execution order is effectively random.
Also note, 10 chunks were submitted but only 9 records were written. This is because id 4 raised an exception.

Checking the chunk status in the dictionary, we can see the error:

SQL> SELECT start_id,end_id,status,start_ts,error_message FROM dba_parallel_execute_chunks WHERE task_name = 'parallel_test1';

  START_ID     END_ID STATUS               START_TS                                                            ERROR_MESSAGE
---------- ---------- -------------------- --------------------------------------------------------------------------- --------------------------------------------------
         3          3 PROCESSED            2013-10-17 00:44:27.494342
         5          5 PROCESSED            2013-10-17 00:44:27.494354
         6          6 PROCESSED            2013-10-17 00:44:27.495262
         7          7 PROCESSED            2013-10-17 00:44:27.495951
         8          8 PROCESSED            2013-10-17 00:44:27.510169
         9          9 PROCESSED            2013-10-17 00:44:29.497890
        10         10 PROCESSED            2013-10-17 00:44:29.499605
         1          1 PROCESSED            2013-10-17 00:44:29.500361
         2          2 PROCESSED            2013-10-17 00:44:29.500928
         4          4 PROCESSED_WITH_ERROR 2013-10-17 00:44:27.494370                                              ORA-06502: PL/SQL: numeric or value error

10 rows selected.
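If some chunks fail, you don’t have to rerun the whole task. A minimal sketch of retrying just the failed chunks and then cleaning up (assuming the task metadata is still in place):

BEGIN
    -- reruns only the chunks that did not reach PROCESSED status
    DBMS_PARALLEL_EXECUTE.resume_task(task_name => 'parallel_test1');

    -- when completely done, remove the task and its chunk metadata
    DBMS_PARALLEL_EXECUTE.drop_task(task_name => 'parallel_test1');
END;
/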

Another, potentially more useful option is executing different procedures simultaneously.
To demonstrate this I’ll create 5 new procedures.

CREATE OR REPLACE PROCEDURE parallel_exec_test2_chunk1(p_test_name IN VARCHAR2)
IS
    c_chunk_name CONSTANT VARCHAR2(6) := 'chunk1';
BEGIN
    INSERT INTO parallel_exec_test_table(test_name, chunk_name, chunk_start)
    VALUES (p_test_name, c_chunk_name, SYSTIMESTAMP);

    DBMS_LOCK.sleep(2);

    UPDATE parallel_exec_test_table
       SET chunk_end = SYSTIMESTAMP
     WHERE test_name = p_test_name AND chunk_name = c_chunk_name;
END;
/

CREATE OR REPLACE PROCEDURE parallel_exec_test2_chunk2(p_test_name IN VARCHAR2)
IS
    c_chunk_name CONSTANT VARCHAR2(6) := 'chunk2';
BEGIN
    INSERT INTO parallel_exec_test_table(test_name, chunk_name, chunk_start)
    VALUES (p_test_name, c_chunk_name, SYSTIMESTAMP);

    DBMS_LOCK.sleep(2);

    UPDATE parallel_exec_test_table
       SET chunk_end = SYSTIMESTAMP
     WHERE test_name = p_test_name AND chunk_name = c_chunk_name;
END;
/

CREATE OR REPLACE PROCEDURE parallel_exec_test2_chunk3(p_test_name IN VARCHAR2)
IS
    c_chunk_name CONSTANT VARCHAR2(6) := 'chunk3';
BEGIN
    INSERT INTO parallel_exec_test_table(test_name, chunk_name, chunk_start)
    VALUES (p_test_name, c_chunk_name, SYSTIMESTAMP);

    DBMS_LOCK.sleep(2);

    UPDATE parallel_exec_test_table
       SET chunk_end = SYSTIMESTAMP
     WHERE test_name = p_test_name AND chunk_name = c_chunk_name;
END;
/

CREATE OR REPLACE PROCEDURE parallel_exec_test2_chunk4(p_test_name IN VARCHAR2)
IS
    c_chunk_name CONSTANT VARCHAR2(6) := 'chunk4';
BEGIN
    INSERT INTO parallel_exec_test_table(test_name, chunk_name, chunk_start)
    VALUES (p_test_name, c_chunk_name, SYSTIMESTAMP);

    DBMS_LOCK.sleep(2);

    UPDATE parallel_exec_test_table
       SET chunk_end = SYSTIMESTAMP
     WHERE test_name = p_test_name AND chunk_name = c_chunk_name;
END;
/

CREATE OR REPLACE PROCEDURE parallel_exec_test2_chunk5(p_test_name IN VARCHAR2)
IS
    c_chunk_name CONSTANT VARCHAR2(6) := 'chunk5';
BEGIN
    INSERT INTO parallel_exec_test_table(test_name, chunk_name, chunk_start)
    VALUES (p_test_name, c_chunk_name, SYSTIMESTAMP);

    DBMS_LOCK.sleep(2);

    UPDATE parallel_exec_test_table
       SET chunk_end = SYSTIMESTAMP
     WHERE test_name = p_test_name AND chunk_name = c_chunk_name;
END;
/

Just as in the first test, I’ll create a procedure to execute them simultaneously. The trick here is that the API requires chunk ids, not procedure names. That’s easily handled, though, with a simple CASE statement to call the procedure I want for each id.

CREATE OR REPLACE PROCEDURE parallel_exec_test2_proc
IS
    v_task_name   VARCHAR2(30) := 'parallel_test2';
    v_plsql_block VARCHAR2(32767);
BEGIN
    DELETE parallel_exec_test_table
     WHERE test_name = v_task_name;

    DBMS_PARALLEL_EXECUTE.create_task(task_name => v_task_name);

    DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(
        task_name   => v_task_name,
        sql_stmt    => 'SELECT level start_id, level end_id FROM dual connect by level <=5',
        by_rowid    => FALSE
    );


    v_plsql_block := q'[
declare
    v_dummy integer := :end_id; 
begin 
  case :start_id
    when 1 then parallel_exec_test2_chunk1('parallel_test2');
    when 2 then parallel_exec_test2_chunk2('parallel_test2');
    when 3 then parallel_exec_test2_chunk3('parallel_test2');
    when 4 then parallel_exec_test2_chunk4('parallel_test2');
    when 5 then parallel_exec_test2_chunk5('parallel_test2');
  end case; 
end;
]';
    DBMS_PARALLEL_EXECUTE.run_task(
        task_name        => v_task_name,
        sql_stmt         => v_plsql_block,
        language_flag    => DBMS_SQL.native,
        parallel_level   => 5
    );

    DBMS_OUTPUT.put_line(
           TO_CHAR(SYSTIMESTAMP, 'yyyy-mm-dd hh24:mi:ss.ff')
        || '  '
        || DBMS_PARALLEL_EXECUTE.task_status(v_task_name)
    );
END;
/

And testing produces these results…

SQL> EXEC parallel_exec_test2_proc

PL/SQL procedure successfully completed.

Elapsed: 00:00:06.02
SQL> SELECT *
  2    FROM parallel_exec_test_table
  3   WHERE test_name = 'parallel_test2'
  4  ORDER BY chunk_start;

TEST_NAME            CHUNK_NAME CHUNK_VALUE CHUNK_START                CHUNK_END
-------------------- ---------- ----------- -------------------------- ----------------------------
parallel_test2       chunk3                 2013-10-17 01:11:18.236996 2013-10-17 01:11:20.238041
parallel_test2       chunk4                 2013-10-17 01:11:18.242705 2013-10-17 01:11:20.244119
parallel_test2       chunk5                 2013-10-17 01:11:18.251068 2013-10-17 01:11:20.252110
parallel_test2       chunk1                 2013-10-17 01:11:18.259176 2013-10-17 01:11:20.260017
parallel_test2       chunk2                 2013-10-17 01:11:18.269944 2013-10-17 01:11:20.271156

If I were working with 11gR1 or older, I’d use dbms_scheduler, or dbms_job for really old versions like 9i and lower. Using dbms_parallel_execute is certainly easier. One important thing to note is that each chunk is committed if it completes successfully; so this is not a good technique if you need to maintain an atomic transaction across all the chunks. Still, it’s another tool for the toolbox and one I appreciate.

I hope it helps.
Questions and comments, as always, are welcome.

Managing TNS lookups in OpenLDAP via PL/SQL


I have an Oracle Express Edition (XE) database running on almost every pc in my home. They’re not so numerous that I can’t manage them by manipulating TNSNAMES.ORA; but I always thought it would be nice to manage them centrally via LDAP service name lookup.

Unfortunately, while Oracle Internet Directory (OID) licensing is included with a valid Oracle Database license, that doesn’t extend to XE. I don’t really need or want a full OID installation anyway. So, I looked into other LDAP alternatives and decided on OpenLDAP. It’s open source, free and I can install it on Linux or Windows. I chose Linux and followed the instructions here without issue:
http://diznix.com/articles/oracle-names-resolution-with-openldap

Configuring one of the Windows XE installs to use my new Linux server was simple enough:
C:\oraclexe\app\oracle\product\11.2.0\server\network\ADMIN\sqlnet.ora
NAMES.DIRECTORY_PATH= (LDAP, EZCONNECT)

C:\oraclexe\app\oracle\product\11.2.0\server\network\ADMIN\ldap.ora
DIRECTORY_SERVERS= (mylinuxserver:389:636)
DEFAULT_ADMIN_CONTEXT = "dc=home"
DIRECTORY_SERVER_TYPE = OID

The next step was getting my databases defined in the ldap server. I could do it through ldif files as shown in the link above; but I thought it would be nicer to have a sql and pl/sql interface to the server. Oracle facilitates that with the DBMS_LDAP package. To make it even easier, I built a wrapper package for it.
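For a sense of what the wrapper hides, here is a minimal sketch of raw DBMS_LDAP usage (the host and credentials are the same placeholders used elsewhere in this post):

DECLARE
    v_session DBMS_LDAP.session;
    v_result  PLS_INTEGER;
BEGIN
    -- raise exceptions instead of returning status codes
    DBMS_LDAP.use_exception := TRUE;

    -- connect and authenticate to the directory
    v_session := DBMS_LDAP.init(hostname => 'mylinuxserver', portnum => 389);
    v_result  := DBMS_LDAP.simple_bind_s(ld     => v_session,
                                         dn     => 'cn=myadminuser,dc=home',
                                         passwd => 'myadminpassword');

    -- ... add/search/modify entries here, as the wrapper package does ...

    v_result := DBMS_LDAP.unbind_s(ld => v_session);
END;
/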

First, set up a user to own the package. Since I’m using 11g XE, I also need to create an Access Control List to allow my owner to talk to my ldap/oid server. I’m using the default non-SSL port 389.

create user oidowner identified by pa55w0rd;
grant create session to oidowner;
grant create procedure to oidowner;
grant create type to oidowner;

BEGIN
    DBMS_NETWORK_ACL_ADMIN.create_acl(
        acl         => 'oid.xml',
        description => 'ACL that lets users see oid server',
        principal   => 'OIDOWNER',
        is_grant    => TRUE,
        privilege   => 'connect'
    );

    DBMS_NETWORK_ACL_ADMIN.assign_acl('oid.xml', 'mylinuxserver', 389, 389);
    COMMIT;
END;
/

Next, I create my package and a supporting type to make listing the contents easier.

create or replace type vcarray as table of varchar2(4000);
/

Package code is available from my dropbox https://dl.dropboxusercontent.com/u/75336811/sdsoid_pkg_1.3.txt

And finally, I add an entry for an XE installation on one of my PCs.

BEGIN
    sdsoid.insert_oid(
        'dc=home',
        'mylinuxserver',
        'seanxe',
        '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=seanpc)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=xe)))',
        'cn=myadminuser,dc=home',
        'myadminpassword'
    );
END;
/

Then I check my work…

C:\oraclexe\app\oracle\product\11.2.0\server\network\ADMIN>tnsping seanxe

TNS Ping Utility for 32-bit Windows: Version 11.2.0.2.0 - Production on 24-AUG-2013 13:38:16

Copyright (c) 1997, 2010, Oracle. All rights reserved.

Used parameter files:
c:\oracle\product\11.2.0.2\network\admin\sqlnet.ora

Used LDAP adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=seanpc)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=xe)))
OK (0 msec)

But, I want a SQL interface to the LDAP entries. This is where the collection type defined before the package comes in.
Using the list_oid table function it’s easy to retrieve some of the entries for other databases I’ve created.

C:\oraclexe\app\oracle\product\11.2.0\server\network\ADMIN>sqlplus oidowner/pa55w0rd@seanxe

SQL*Plus: Release 11.2.0.2.0 Production on Sat Aug 24 14:01:54 2013

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Express Edition Release 11.2.0.2.0 - Production

SQL> set lines 1000
SQL> SELECT * FROM TABLE(sdsoid.list_oid('dc=home', 'mylinuxserver'));

COLUMN_VALUE
-------------------------------------------------------------------------------------------------------------------------
kidsxe: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=familypc)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=xe)))
testxe: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=testpc)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=xe)))
seanxe: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=seanpc)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=xe)))

Currently all entries are maintained and retrieved with the full connect descriptor. In my next release I’ll add support for parameterized entries. I have tested this code on 12c databases and it works without modification.

Questions and comments are welcome.
