Setting up the Star Schema Benchmark (SSB) in Oracle


In my previous two posts I showed how to set up a schema for the TPC-H tables and test data. A related test system, the Star Schema Benchmark (SSB) from Pat O'Neil, Betty O'Neil, and Xuedong Chen at the University of Massachusetts at Boston, alters the TPC-H structures to create a data warehousing model.

The SSB is documented here. As with the TPC-H setup, the data is created by a modified version of the dbgen utility, which can also be found on the UMass site here.

Unzip the dbgen file somewhere on your Linux system. As of the time of this writing, the June 5, 2009 Revision 3 is the current version of the SSB. The steps may change slightly in the future but should remain fairly similar to what is described below.
When you unzip, a dbgen directory will be created. Change to that directory, make a copy of the makefile template, and edit it. The SSB version of dbgen doesn't support Oracle, but that doesn't matter for the data setup; only the query generation requires a database-specific setting, so you can pick any of the supported databases for the purposes of data creation.

$ cd dbgen
$ cp makefile.suite makefile
$ vi makefile

Within the makefile, change the following lines

CC = gcc
DATABASE= SQLSERVER
MACHINE = LINUX
WORKLOAD = SSBM

Then running make will create the dbgen utility. You'll get several compiler warnings, but since we're not distributing the resulting binary and its usage is controlled, they are safe to ignore.

Oracle offers a training lab for the 12c In-Memory option that utilizes the SSB schema. The training guide states the test system is based on a 50GB scale, but the query results illustrated in the guide show only a 4GB data set, so I'll demonstrate the same here.
We'll generate a small, approximately 4GB, set of test data and place it in a directory we'll use later for the upload into the Oracle tables. Unlike the TPC-H dbgen, you must generate each file type individually.

$ ./dbgen -s 4 -T c
$ ./dbgen -s 4 -T p
$ ./dbgen -s 4 -T s
$ ./dbgen -s 4 -T d
$ ./dbgen -s 4 -T l
$ mv *.tbl /home/oracle/ssb

Within the database, set up the SSB schema. All tables will be created according to the layouts described in section 2 of the SSB specification. For the purposes of data import, a set of external tables will also be created. These are not part of SSB itself and may be left in place or dropped after the data load is complete. The SSB specification describes primary keys and foreign keys but no check constraints. The TPC-H schema allows NOT NULL declarations, so I've included them here as well. If you prefer, simply remove the "NOT NULL" in the DDL below to allow nulls; the dbgen utility will populate every column anyway. The descriptions of a couple of tables have some errors, which I've noted in the comments below. One is a partially documented but seemingly unused column, which I have removed. The other is in the date table, which defines a day-of-week column of 8 characters even though dbgen generates some day names that are 9 letters long ("Wednesday").

One final change: the "DATE" table has been renamed "DATE_DIM" since DATE is an Oracle keyword. This change also makes the schema compatible with the Oracle In-Memory lab.

CREATE USER ssb IDENTIFIED BY ssb;

GRANT CREATE SESSION,
      CREATE TABLE,
      CREATE ANY DIRECTORY,
      UNLIMITED TABLESPACE
    TO ssb;

CREATE OR REPLACE DIRECTORY ssb_dir AS '/home/oracle/ssb';

GRANT READ, WRITE ON DIRECTORY ssb_dir TO ssb;

CREATE TABLE ssb.ext_lineorder
(
    lo_orderkey        INTEGER,
    lo_linenumber      NUMBER(1, 0),
    lo_custkey         INTEGER,
    lo_partkey         INTEGER,
    lo_suppkey         INTEGER,
    lo_orderdate       INTEGER,
    lo_orderpriority   CHAR(15),
    lo_shippriority    CHAR(1),
    lo_quantity        NUMBER(2, 0),
    lo_extendedprice   NUMBER,
    lo_ordtotalprice   NUMBER,
    lo_discount        NUMBER(2, 0),
    lo_revenue         NUMBER,
    lo_supplycost      NUMBER,
    --lo_ordsupplycost   NUMBER, -- this is mentioned in 2.2 Notes(c) but isn't in the layout or sample queries, so not needed?
    lo_tax             NUMBER(1, 0),
    lo_commitdate      INTEGER,
    lo_shipmode        CHAR(10)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY ssb_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('lineorder.tbl*'))
          PARALLEL 4;

CREATE TABLE ssb.lineorder
(
    lo_orderkey        INTEGER NOT NULL,
    lo_linenumber      NUMBER(1, 0) NOT NULL,
    lo_custkey         INTEGER NOT NULL,
    lo_partkey         INTEGER NOT NULL,
    lo_suppkey         INTEGER NOT NULL,
    lo_orderdate       NUMBER(8,0) NOT NULL,
    lo_orderpriority   CHAR(15) NOT NULL,
    lo_shippriority    CHAR(1) NOT NULL,
    lo_quantity        NUMBER(2, 0) NOT NULL,
    lo_extendedprice   NUMBER NOT NULL,
    lo_ordtotalprice   NUMBER NOT NULL,
    lo_discount        NUMBER(2, 0) NOT NULL,
    lo_revenue         NUMBER NOT NULL,
    lo_supplycost      NUMBER NOT NULL,
    --lo_ordsupplycost   NUMBER not null, -- this is mentioned in 2.2 Notes(c) but isn't in the layout or sample queries, so not needed?
    lo_tax             NUMBER(1, 0) NOT NULL,
    lo_commitdate      NUMBER(8,0) NOT NULL,
    lo_shipmode        CHAR(10) NOT NULL
);

CREATE TABLE ssb.ext_part
(
    p_partkey     INTEGER,
    p_name        VARCHAR2(22),
    p_mfgr        CHAR(6),
    p_category    CHAR(7),
    p_brand1      CHAR(9),
    p_color       VARCHAR2(11),
    p_type        VARCHAR2(25),
    p_size        NUMBER(2, 0),
    p_container   CHAR(10)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY ssb_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('part.tbl'));

CREATE TABLE ssb.part
(
    p_partkey     INTEGER NOT NULL,
    p_name        VARCHAR2(22) NOT NULL,
    p_mfgr        CHAR(6) NOT NULL,
    p_category    CHAR(7) NOT NULL,
    p_brand1      CHAR(9) NOT NULL,
    p_color       VARCHAR2(11) NOT NULL,
    p_type        VARCHAR2(25) NOT NULL,
    p_size        NUMBER(2, 0) NOT NULL,
    p_container   CHAR(10) NOT NULL
);

CREATE TABLE ssb.ext_supplier
(
    s_suppkey   INTEGER,
    s_name      CHAR(25),
    s_address   VARCHAR2(25),
    s_city      CHAR(10),
    s_nation    CHAR(15),
    s_region    CHAR(12),
    s_phone     CHAR(15)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY ssb_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('supplier.tbl'));

CREATE TABLE ssb.supplier
(
    s_suppkey   INTEGER NOT NULL,
    s_name      CHAR(25) NOT NULL,
    s_address   VARCHAR2(25) NOT NULL,
    s_city      CHAR(10) NOT NULL,
    s_nation    CHAR(15) NOT NULL,
    s_region    CHAR(12) NOT NULL,
    s_phone     CHAR(15) NOT NULL
);

CREATE TABLE ssb.ext_customer
(
    c_custkey      INTEGER,
    c_name         VARCHAR2(25),
    c_address      VARCHAR2(25),
    c_city         CHAR(10),
    c_nation       CHAR(15),
    c_region       CHAR(12),
    c_phone        CHAR(15),
    c_mktsegment   CHAR(10)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY ssb_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('customer.tbl'));

CREATE TABLE ssb.customer
(
    c_custkey      INTEGER NOT NULL,
    c_name         VARCHAR2(25) NOT NULL,
    c_address      VARCHAR2(25) NOT NULL,
    c_city         CHAR(10) NOT NULL,
    c_nation       CHAR(15) NOT NULL,
    c_region       CHAR(12) NOT NULL,
    c_phone        CHAR(15) NOT NULL,
    c_mktsegment   CHAR(10) NOT NULL
);

CREATE TABLE ssb.ext_date_dim
(
    d_datekey            NUMBER(8,0),
    d_date               CHAR(18),
    d_dayofweek          CHAR(9),    -- defined in Section 2.6 as Size 8, but Wednesday is 9 letters
    d_month              CHAR(9),
    d_year               NUMBER(4, 0),
    d_yearmonthnum       NUMBER(6, 0),
    d_yearmonth          CHAR(7),
    d_daynuminweek       NUMBER(1, 0),
    d_daynuminmonth      NUMBER(2, 0),
    d_daynuminyear       NUMBER(3, 0),
    d_monthnuminyear     NUMBER(2, 0),
    d_weeknuminyear      NUMBER(2, 0),
    d_sellingseason      CHAR(12),
    d_lastdayinweekfl    NUMBER(1, 0),
    d_lastdayinmonthfl   NUMBER(1, 0),
    d_holidayfl          NUMBER(1, 0),
    d_weekdayfl          NUMBER(1, 0)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY ssb_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('date.tbl'));

CREATE TABLE ssb.date_dim
(
    d_datekey            NUMBER(8,0) NOT NULL,
    d_date               CHAR(18) NOT NULL,
    d_dayofweek          CHAR(9) NOT NULL,    -- defined in Section 2.6 as Size 8, but Wednesday is 9 letters
    d_month              CHAR(9) NOT NULL,
    d_year               NUMBER(4, 0) NOT NULL,
    d_yearmonthnum       NUMBER(6, 0) NOT NULL,
    d_yearmonth          CHAR(7) NOT NULL,
    d_daynuminweek       NUMBER(1, 0) NOT NULL,
    d_daynuminmonth      NUMBER(2, 0) NOT NULL,
    d_daynuminyear       NUMBER(3, 0) NOT NULL,
    d_monthnuminyear     NUMBER(2, 0) NOT NULL,
    d_weeknuminyear      NUMBER(2, 0) NOT NULL,
    d_sellingseason      CHAR(12) NOT NULL,
    d_lastdayinweekfl    NUMBER(1, 0) NOT NULL,
    d_lastdayinmonthfl   NUMBER(1, 0) NOT NULL,
    d_holidayfl          NUMBER(1, 0) NOT NULL,
    d_weekdayfl          NUMBER(1, 0) NOT NULL
);

Now load the data. As you scale up to larger volumes these steps are still valid, but you may want to split the loads into separate steps, alter the LINEORDER external table to read multiple files in parallel, and use parallel DML on the insert to speed up the process. The TRUNCATE statements aren't necessary for the first data load, but they are included for future reloads of the dbgen data at other scales.

TRUNCATE TABLE ssb.lineorder;
TRUNCATE TABLE ssb.part;
TRUNCATE TABLE ssb.supplier;
TRUNCATE TABLE ssb.customer;
TRUNCATE TABLE ssb.date_dim;

ALTER TABLE ssb.lineorder PARALLEL 4;
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND */ INTO  ssb.part      SELECT * FROM ssb.ext_part;
commit;
INSERT /*+ APPEND */ INTO  ssb.supplier  SELECT * FROM ssb.ext_supplier;
commit;
INSERT /*+ APPEND */ INTO  ssb.customer  SELECT * FROM ssb.ext_customer;
commit;
INSERT /*+ APPEND */ INTO  ssb.date_dim  SELECT * FROM ssb.ext_date_dim;
commit;
INSERT /*+ APPEND */ INTO  ssb.lineorder SELECT * FROM ssb.ext_lineorder;
commit;
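
Optionally, a quick sanity check that the loads completed. This is just an illustrative query; the counts you see will depend on the scale factor used.

SELECT 'lineorder' table_name, COUNT(*) row_count FROM ssb.lineorder
UNION ALL SELECT 'part',     COUNT(*) FROM ssb.part
UNION ALL SELECT 'supplier', COUNT(*) FROM ssb.supplier
UNION ALL SELECT 'customer', COUNT(*) FROM ssb.customer
UNION ALL SELECT 'date_dim', COUNT(*) FROM ssb.date_dim;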

And finally, add the constraints and indexes.

ALTER TABLE ssb.lineorder
    ADD CONSTRAINT pk_lineorder PRIMARY KEY(lo_orderkey, lo_linenumber);

ALTER TABLE ssb.part
    ADD CONSTRAINT pk_part PRIMARY KEY(p_partkey);

ALTER TABLE ssb.supplier
    ADD CONSTRAINT pk_supplier PRIMARY KEY(s_suppkey);

ALTER TABLE ssb.customer
    ADD CONSTRAINT pk_customer PRIMARY KEY(c_custkey);

ALTER TABLE ssb.date_dim
    ADD CONSTRAINT pk_date_dim PRIMARY KEY(d_datekey);

---

ALTER TABLE ssb.lineorder
    ADD CONSTRAINT fk_lineitem_customer FOREIGN KEY(lo_custkey) REFERENCES ssb.customer(c_custkey);

ALTER TABLE ssb.lineorder
    ADD CONSTRAINT fk_lineitem_part FOREIGN KEY(lo_partkey) REFERENCES ssb.part(p_partkey);

ALTER TABLE ssb.lineorder
    ADD CONSTRAINT fk_lineitem_supplier FOREIGN KEY(lo_suppkey) REFERENCES ssb.supplier(s_suppkey);

ALTER TABLE ssb.lineorder
    ADD CONSTRAINT fk_lineitem_orderdate FOREIGN KEY(lo_orderdate) REFERENCES ssb.date_dim(d_datekey);

ALTER TABLE ssb.lineorder
    ADD CONSTRAINT fk_lineitem_commitdate FOREIGN KEY(lo_commitdate) REFERENCES ssb.date_dim(d_datekey);

And that's it. You should now have a complete SSB scale-4 data set with which to run the SSB suite of test queries, the Oracle In-Memory labs, or your own tests.

If you want to generate larger data sets, you can do so with syntax similar to that used for TPC-H. The DDL and inserts above are already defined for either single-file or parallel multi-file loads of the LINEORDER table. The other tables are relatively small in comparison; they can still be split if desired, but you probably won't need to. A sketch for pointing the LINEORDER external table at the individual chunk files follows the commands below.

$ ./dbgen -s 10 -T c
$ ./dbgen -s 10 -T p
$ ./dbgen -s 10 -T s
$ ./dbgen -s 10 -T d
$ ./dbgen -s 10 -T l -C 4 -S 1
$ ./dbgen -s 10 -T l -C 4 -S 2
$ ./dbgen -s 10 -T l -C 4 -S 3
$ ./dbgen -s 10 -T l -C 4 -S 4
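
If you prefer to list the chunk files explicitly rather than rely on the wildcard location in the DDL above, an ALTER along these lines should work. This is a sketch: the file names lineorder.tbl.1 through lineorder.tbl.4 are an assumption about how dbgen names its chunked output, so adjust them to match the files you actually get.

ALTER TABLE ssb.ext_lineorder
    LOCATION ('lineorder.tbl.1', 'lineorder.tbl.2', 'lineorder.tbl.3', 'lineorder.tbl.4');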

Enjoy!

Setting up TPC-H test data with Oracle on Linux (Part 1 – Small data sets)


While Oracle has its own SCOTT, SH, OE, HR, etc. sample schemas, it's often useful to be able to scale data volumes for different experiments. The TPC-H schema and sample data sets provide a convenient means of doing so. They are especially helpful in scenarios such as this one, blogging, where readers may have a database but not the same data. Setting up the TPC-H data is fairly simple.

First, go to the TPC home page, click Downloads, and select the tools zip file. You may also want to read the PDF file documenting the schema and the data scales. Unzip the file somewhere on your Linux system. As of the time of this writing, 2.17.2 is the current version. The steps may change slightly in the future but should remain fairly similar to what is described below.
When you unzip, a 2.17.2 directory will be created. Change to that directory, make a copy of the makefile template, and edit it.

$ cd 2.17.2/dbgen
$ cp makefile.suite makefile
$ vi makefile

Within the makefile, change the following lines

CC = gcc
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH

Then make will create the dbgen utility.

$ make
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o build.o build.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o driver.o driver.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o bm_utils.o bm_utils.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o rnd.o rnd.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o print.o print.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o load_stub.o load_stub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o bcd2.o bcd2.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o speed_seed.o speed_seed.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o text.o text.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o permute.o permute.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o rng64.o rng64.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64  -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o qgen.o qgen.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o varsub.o varsub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64  -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm

We’ll generate a small, approximately 4GB, set of test data and place it in a directory we’ll use later for the upload into the Oracle tables.

$ ./dbgen -s 4
TPC-H Population Generator (Version 2.17.2)
Copyright Transaction Processing Performance Council 1994 - 2010
$ ls *.tbl
customer.tbl  lineitem.tbl  nation.tbl  orders.tbl  partsupp.tbl  part.tbl  region.tbl  supplier.tbl
$ mv *.tbl /home/oracle/tpch

Within the database, set up the tpch schema. All tables will be created according to the layouts described in section 1.4.1 of the TPC-H standard specification. For the purposes of data import, a set of external tables will also be created. These are not part of TPC-H itself and may be left in place or dropped after data load is complete. Also, per section 1.4.2, constraints are optional. I’m including the allowable primary key, foreign key, not null, and check constraints described in the 1.4.2 subsections. Other than the default indexes created to support the primary key constraints, no other indexes are included in the steps below. For efficiency of data loading, the constraints and indexes will be added after the data loading is complete.

CREATE USER tpch IDENTIFIED BY tpch;

GRANT CREATE SESSION,
      CREATE TABLE,
      UNLIMITED TABLESPACE
    TO tpch;

CREATE OR REPLACE DIRECTORY tpch_dir AS '/home/oracle/tpch';

GRANT READ ON DIRECTORY tpch_dir TO tpch;

-- 1.4.1
--  per 1.4.2.1  all table columns may be defined NOT NULL

CREATE TABLE tpch.ext_part
(
    p_partkey       NUMBER(10, 0),
    p_name          VARCHAR2(55),
    p_mfgr          CHAR(25),
    p_brand         CHAR(10),
    p_type          VARCHAR2(25),
    p_size          INTEGER,
    p_container     CHAR(10),
    p_retailprice   NUMBER,
    p_comment       VARCHAR2(23)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('part.tbl'));

CREATE TABLE tpch.part
(
    p_partkey       NUMBER(10, 0) NOT NULL,
    p_name          VARCHAR2(55) NOT NULL,
    p_mfgr          CHAR(25) NOT NULL,
    p_brand         CHAR(10) NOT NULL,
    p_type          VARCHAR2(25) NOT NULL,
    p_size          INTEGER NOT NULL,
    p_container     CHAR(10) NOT NULL,
    p_retailprice   NUMBER NOT NULL,
    p_comment       VARCHAR2(23) NOT NULL
);


CREATE TABLE tpch.ext_supplier
(
    s_suppkey     NUMBER(10, 0),
    s_name        CHAR(25),
    s_address     VARCHAR2(40),
    s_nationkey   NUMBER(10, 0),
    s_phone       CHAR(15),
    s_acctbal     NUMBER,
    s_comment     VARCHAR2(101)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('supplier.tbl'));

CREATE TABLE tpch.supplier
(
    s_suppkey     NUMBER(10, 0) NOT NULL,
    s_name        CHAR(25) NOT NULL,
    s_address     VARCHAR2(40) NOT NULL,
    s_nationkey   NUMBER(10, 0) NOT NULL,
    s_phone       CHAR(15) NOT NULL,
    s_acctbal     NUMBER NOT NULL,
    s_comment     VARCHAR2(101) NOT NULL
);

CREATE TABLE tpch.ext_partsupp
(
    ps_partkey      NUMBER(10, 0),
    ps_suppkey      NUMBER(10, 0),
    ps_availqty     INTEGER,
    ps_supplycost   NUMBER,
    ps_comment      VARCHAR2(199)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('partsupp.tbl'));

CREATE TABLE tpch.partsupp
(
    ps_partkey      NUMBER(10, 0) NOT NULL,
    ps_suppkey      NUMBER(10, 0) NOT NULL,
    ps_availqty     INTEGER NOT NULL,
    ps_supplycost   NUMBER NOT NULL,
    ps_comment      VARCHAR2(199) NOT NULL
);

CREATE TABLE tpch.ext_customer
(
    c_custkey      NUMBER(10, 0),
    c_name         VARCHAR2(25),
    c_address      VARCHAR2(40),
    c_nationkey    NUMBER(10, 0),
    c_phone        CHAR(15),
    c_acctbal      NUMBER,
    c_mktsegment   CHAR(10),
    c_comment      VARCHAR2(117)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('customer.tbl'));

CREATE TABLE tpch.customer
(
    c_custkey      NUMBER(10, 0) NOT NULL,
    c_name         VARCHAR2(25) NOT NULL,
    c_address      VARCHAR2(40) NOT NULL,
    c_nationkey    NUMBER(10, 0) NOT NULL,
    c_phone        CHAR(15) NOT NULL,
    c_acctbal      NUMBER NOT NULL,
    c_mktsegment   CHAR(10) NOT NULL,
    c_comment      VARCHAR2(117) NOT NULL
);

-- read date values as yyyy-mm-dd text

CREATE TABLE tpch.ext_orders
(
    o_orderkey        NUMBER(10, 0),
    o_custkey         NUMBER(10, 0),
    o_orderstatus     CHAR(1),
    o_totalprice      NUMBER,
    o_orderdate       CHAR(10),
    o_orderpriority   CHAR(15),
    o_clerk           CHAR(15),
    o_shippriority    INTEGER,
    o_comment         VARCHAR2(79)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('orders.tbl'));

CREATE TABLE tpch.orders
(
    o_orderkey        NUMBER(10, 0) NOT NULL,
    o_custkey         NUMBER(10, 0) NOT NULL,
    o_orderstatus     CHAR(1) NOT NULL,
    o_totalprice      NUMBER NOT NULL,
    o_orderdate       DATE NOT NULL,
    o_orderpriority   CHAR(15) NOT NULL,
    o_clerk           CHAR(15) NOT NULL,
    o_shippriority    INTEGER NOT NULL,
    o_comment         VARCHAR2(79) NOT NULL
);

-- read date values as yyyy-mm-dd text

CREATE TABLE tpch.ext_lineitem
(
    l_orderkey        NUMBER(10, 0),
    l_partkey         NUMBER(10, 0),
    l_suppkey         NUMBER(10, 0),
    l_linenumber      INTEGER,
    l_quantity        NUMBER,
    l_extendedprice   NUMBER,
    l_discount        NUMBER,
    l_tax             NUMBER,
    l_returnflag      CHAR(1),
    l_linestatus      CHAR(1),
    l_shipdate        CHAR(10),
    l_commitdate      CHAR(10),
    l_receiptdate     CHAR(10),
    l_shipinstruct    CHAR(25),
    l_shipmode        CHAR(10),
    l_comment         VARCHAR2(44)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('lineitem.tbl'));

CREATE TABLE tpch.lineitem
(
    l_orderkey        NUMBER(10, 0),
    l_partkey         NUMBER(10, 0),
    l_suppkey         NUMBER(10, 0),
    l_linenumber      INTEGER,
    l_quantity        NUMBER,
    l_extendedprice   NUMBER,
    l_discount        NUMBER,
    l_tax             NUMBER,
    l_returnflag      CHAR(1),
    l_linestatus      CHAR(1),
    l_shipdate        DATE,
    l_commitdate      DATE,
    l_receiptdate     DATE,
    l_shipinstruct    CHAR(25),
    l_shipmode        CHAR(10),
    l_comment         VARCHAR2(44)
);

CREATE TABLE tpch.ext_nation
(
    n_nationkey   NUMBER(10, 0),
    n_name        CHAR(25),
    n_regionkey   NUMBER(10, 0),
    n_comment     VARCHAR(152)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('nation.tbl'));

CREATE TABLE tpch.nation
(
    n_nationkey   NUMBER(10, 0),
    n_name        CHAR(25),
    n_regionkey   NUMBER(10, 0),
    n_comment     VARCHAR(152)
);

CREATE TABLE tpch.ext_region
(
    r_regionkey   NUMBER(10, 0),
    r_name        CHAR(25),
    r_comment     VARCHAR(152)
)
ORGANIZATION EXTERNAL
    (TYPE oracle_loader
          DEFAULT DIRECTORY tpch_dir
              ACCESS PARAMETERS (
                  FIELDS
                      TERMINATED BY '|'
                  MISSING FIELD VALUES ARE NULL
              )
          LOCATION('region.tbl'));

CREATE TABLE tpch.region
(
    r_regionkey   NUMBER(10, 0),
    r_name        CHAR(25),
    r_comment     VARCHAR(152)
);

Now load the data. The external tables read the date values as text, so we must either set NLS_DATE_FORMAT prior to loading so the text is parsed correctly, or embed the conversion within each SQL statement (a sketch of the latter follows). For the small data set in this example, the steps described here should complete within a few minutes. As you scale up to larger volumes these steps are still valid, but you may want to split the loads into separate steps, alter the external tables to read multiple files in parallel, and use parallel DML on the inserts to speed up the process. The TRUNCATE statements aren't necessary for the first data load, but they are included for future reloads of the dbgen data at other scales.
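
Here is that sketch, for the ORDERS load only: it wraps the external table's text date in an explicit TO_DATE instead of relying on the session's NLS_DATE_FORMAT. The column list matches the ORDERS DDL above; the same pattern applies to the three date columns of LINEITEM.

INSERT /*+ APPEND */ INTO tpch.orders
SELECT o_orderkey, o_custkey, o_orderstatus, o_totalprice,
       TO_DATE(o_orderdate, 'YYYY-MM-DD'),   -- explicit conversion of the text date
       o_orderpriority, o_clerk, o_shippriority, o_comment
  FROM tpch.ext_orders;
COMMIT;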

TRUNCATE TABLE tpch.part;
TRUNCATE TABLE tpch.supplier;
TRUNCATE TABLE tpch.partsupp;
TRUNCATE TABLE tpch.customer;
TRUNCATE TABLE tpch.orders;
TRUNCATE TABLE tpch.lineitem;
TRUNCATE TABLE tpch.nation;
TRUNCATE TABLE tpch.region;

ALTER SESSION SET nls_date_format='YYYY-MM-DD';

INSERT /*+ APPEND */ INTO  tpch.part     SELECT * FROM tpch.ext_part;
INSERT /*+ APPEND */ INTO  tpch.supplier SELECT * FROM tpch.ext_supplier;
INSERT /*+ APPEND */ INTO  tpch.partsupp SELECT * FROM tpch.ext_partsupp;
INSERT /*+ APPEND */ INTO  tpch.customer SELECT * FROM tpch.ext_customer;
INSERT /*+ APPEND */ INTO  tpch.orders   SELECT * FROM tpch.ext_orders;
INSERT /*+ APPEND */ INTO  tpch.lineitem SELECT * FROM tpch.ext_lineitem;
INSERT /*+ APPEND */ INTO  tpch.nation   SELECT * FROM tpch.ext_nation;
INSERT /*+ APPEND */ INTO  tpch.region   SELECT * FROM tpch.ext_region;

And finally, add the constraints and indexes.

ALTER TABLE tpch.part
    ADD CONSTRAINT pk_part PRIMARY KEY(p_partkey);

ALTER TABLE tpch.supplier
    ADD CONSTRAINT pk_supplier PRIMARY KEY(s_suppkey);

ALTER TABLE tpch.partsupp
    ADD CONSTRAINT pk_partsupp PRIMARY KEY(ps_partkey, ps_suppkey);

ALTER TABLE tpch.customer
    ADD CONSTRAINT pk_customer PRIMARY KEY(c_custkey);

ALTER TABLE tpch.orders
    ADD CONSTRAINT pk_orders PRIMARY KEY(o_orderkey);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT pk_lineitem PRIMARY KEY(l_linenumber, l_orderkey);

ALTER TABLE tpch.nation
    ADD CONSTRAINT pk_nation PRIMARY KEY(n_nationkey);

ALTER TABLE tpch.region
    ADD CONSTRAINT pk_region PRIMARY KEY(r_regionkey);

-- 1.4.2.3

ALTER TABLE tpch.partsupp
    ADD CONSTRAINT fk_partsupp_part FOREIGN KEY(ps_partkey) REFERENCES tpch.part(p_partkey);

ALTER TABLE tpch.partsupp
    ADD CONSTRAINT fk_partsupp_supplier FOREIGN KEY(ps_suppkey) REFERENCES tpch.supplier(s_suppkey);

ALTER TABLE tpch.customer
    ADD CONSTRAINT fk_customer_nation FOREIGN KEY(c_nationkey) REFERENCES tpch.nation(n_nationkey);

ALTER TABLE tpch.orders
    ADD CONSTRAINT fk_orders_customer FOREIGN KEY(o_custkey) REFERENCES tpch.customer(c_custkey);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT fk_lineitem_order FOREIGN KEY(l_orderkey) REFERENCES tpch.orders(o_orderkey);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT fk_lineitem_part FOREIGN KEY(l_partkey) REFERENCES tpch.part(p_partkey);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT fk_lineitem_supplier FOREIGN KEY(l_suppkey) REFERENCES tpch.supplier(s_suppkey);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT fk_lineitem_partsupp FOREIGN KEY(l_partkey, l_suppkey)
        REFERENCES tpch.partsupp(ps_partkey, ps_suppkey);

-- 1.4.2.4 - 1

ALTER TABLE tpch.part
    ADD CONSTRAINT chk_part_partkey CHECK(p_partkey >= 0);

ALTER TABLE tpch.supplier
    ADD CONSTRAINT chk_supplier_suppkey CHECK(s_suppkey >= 0);

ALTER TABLE tpch.customer
    ADD CONSTRAINT chk_customer_custkey CHECK(c_custkey >= 0);

ALTER TABLE tpch.partsupp
    ADD CONSTRAINT chk_partsupp_partkey CHECK(ps_partkey >= 0);

ALTER TABLE tpch.region
    ADD CONSTRAINT chk_region_regionkey CHECK(r_regionkey >= 0);

ALTER TABLE tpch.nation
    ADD CONSTRAINT chk_nation_nationkey CHECK(n_nationkey >= 0);

-- 1.4.2.4 - 2

ALTER TABLE tpch.part
    ADD CONSTRAINT chk_part_size CHECK(p_size >= 0);

ALTER TABLE tpch.part
    ADD CONSTRAINT chk_part_retailprice CHECK(p_retailprice >= 0);

ALTER TABLE tpch.partsupp
    ADD CONSTRAINT chk_partsupp_availqty CHECK(ps_availqty >= 0);

ALTER TABLE tpch.partsupp
    ADD CONSTRAINT chk_partsupp_supplycost CHECK(ps_supplycost >= 0);

ALTER TABLE tpch.orders
    ADD CONSTRAINT chk_orders_totalprice CHECK(o_totalprice >= 0);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT chk_lineitem_quantity CHECK(l_quantity >= 0);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT chk_lineitem_extendedprice CHECK(l_extendedprice >= 0);

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT chk_lineitem_tax CHECK(l_tax >= 0);

-- 1.4.2.4 - 3

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT chk_lineitem_discount CHECK(l_discount >= 0.00 AND l_discount <= 1.00);

-- 1.4.2.4 - 4

ALTER TABLE tpch.lineitem
    ADD CONSTRAINT chk_lineitem_ship_rcpt CHECK(l_shipdate <= l_receiptdate);

And that's it. You should now have a complete TPC-H scale-4 data set with which to run the TPC-H suite of test queries, Oracle labs, or your own tests.

Enjoy!

Why does my code break with no changes and no invalid objects?


Recently I encountered an interesting situation where a job had been running successfully for months and then one day suddenly started failing every time with an ORA-06502: PL/SQL: numeric or value error. The same error can also manifest itself in bulk operations as ORA-06502: PL/SQL: numeric or value error: Bulk Bind: Truncated Bind.

The more confusing part is that no changes had been made to the code and there were no invalid objects. Digging into the problem some more, I found that the job called a procedure which read from a view that selected from remote tables via a database link.

So, the obvious next step: check the remote side. But again, I found no invalid objects there, and looking at LAST_DDL_TIME in DBA_OBJECTS I saw that none of the tables had been modified in a couple of months either.

Those old remote changes seemed innocuous, but it had been even longer since I had changed anything in the view or procedure within the local db. I recompiled the view in the local db and the job started working again. So, what changed, and why did it only fail now and not a couple of months ago when the remote tables changed?

For this I had to connect as SYS and query DBA_TAB_COLS with a flashback query to check the columns of my view, and sure enough, one of the VARCHAR2 columns had grown. The reason nothing failed immediately was that no new data arrived right away that used the new, bigger limit.
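
The check looked something like this. It's only a sketch using the example objects created later in this post; the timestamp is illustrative, and the query can only go back as far as your undo retention allows.

SELECT column_name, data_type, data_length
  FROM dba_tab_cols AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '30' DAY)
 WHERE owner = 'SDS'
   AND table_name = 'TEST_VIEW';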

Once I saw the problem, replicating it was fairly easy. Another interesting quirk between SQL and PL/SQL: the query inside the procedure would run without error when executed as plain SQL, because as SQL it has no precompiled size limitations and adapts on its own.

The example objects and data to replicate the problem are fairly simple, but it can be a little tricky to follow along since the session alternates between the local and remote databases. I've left in the connect statements and altered the SQL> prompt to include the database name to help highlight where the action is taking place.

sds@REMOTE_DB> connect sds/pa55w0rd@remote_db
Connected.
sds@REMOTE_DB> create table sds.testtable(test_id integer, testtext varchar2(10));

Table created.

sds@REMOTE_DB> insert into sds.testtable(test_id,testtext) values (1,'abcd');

1 row created.

sds@REMOTE_DB> insert into sds.testtable(test_id,testtext) values (2,'efgh');

1 row created.

sds@REMOTE_DB> commit;

Commit complete.

sds@REMOTE_DB> connect sds/pa55w0rd@local_db
Connected.
sds@LOCAL_DB> CREATE DATABASE LINK remote_db_link CONNECT TO sds IDENTIFIED BY "pa55w0rd" USING 'remote_db';

Database link created.

sds@LOCAL_DB> CREATE OR REPLACE VIEW test_view
  2  AS
  3  SELECT * FROM sds.testtable@remote_db_link;

View created.

sds@LOCAL_DB> SELECT * FROM test_view;

   TEST_ID TESTTEXT
---------- ----------
         1 abcd
         2 efgh

sds@LOCAL_DB> CREATE OR REPLACE PROCEDURE show_remote_data
  2  IS
  3  BEGIN
  4      FOR x IN ( SELECT test_id, testtext
  5                   FROM test_view
  6                  ORDER BY test_id)
  7      LOOP
  8          DBMS_OUTPUT.put_line(x.test_id || ' ' || x.testtext);
  9      END LOOP;
 10  END;
 11  /

Procedure created.

sds@LOCAL_DB> set serveroutput on
sds@LOCAL_DB> exec show_remote_data;
1 abcd
2 efgh

PL/SQL procedure successfully completed.

Up to here everything is working normally; now we'll make the text column larger.
The procedure will still work correctly even though the local and remote sizes don’t match. The error doesn’t occur until new data shows up that exceeds the prior limit.

sds@LOCAL_DB> connect sds/pa55w0rd@remote_db
Connected.
sds@REMOTE_DB> alter table sds.testtable modify (testtext varchar2(30));

Table altered.

sds@REMOTE_DB> connect sds/pa55w0rd@local_db
Connected.
sds@LOCAL_DB> exec show_remote_data;

PL/SQL procedure successfully completed.

sds@LOCAL_DB> set serveroutput on
sds@LOCAL_DB> exec show_remote_data;
1 abcd
2 efgh

PL/SQL procedure successfully completed.

sds@LOCAL_DB> connect sds/pa55w0rd@remote_db
Connected.
sds@REMOTE_DB> insert into sds.testtable(test_id,testtext) values (3,'abcdefghijklmnopqrstuvwxyz');

1 row created.

sds@REMOTE_DB> commit;

Commit complete.

sds@REMOTE_DB> connect sds/pa55w0rd@local_db
Connected.
sds@LOCAL_DB> set serveroutput on
sds@LOCAL_DB> exec show_remote_data;
1 abcd
2 efgh
BEGIN show_remote_data; END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "SDS.SHOW_REMOTE_DATA", line 4
ORA-06512: at line 1

sds@LOCAL_DB>
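
To close the loop on the example: the fix is the same one used on the production system above, recompiling the view so it picks up the new remote column size. This is a minimal sketch; the recompile invalidates the dependent procedure, which then revalidates automatically on its next call, after which all three rows print without error.

ALTER VIEW test_view COMPILE;

EXEC show_remote_data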

Missing peaks in ASH results


The ASH charts in OEM are great utilities for getting a quick summary of your system's activity. However, these results can be misleading because of how the data is represented on screen. First, ASH data is collected by sampling, so it's not a complete picture of everything that runs. Another thing to consider is that the charting in OEM doesn't plot every ASH data point; instead, it averages them across time slices. Within the Top Activity and ASH Analytics summary charts these points are then connected by curves or straight lines, which further dilutes the results.

Some example snapshots will help illustrate these issues.

The OEM Top Activity screen may produce a chart like this…
Top Activity

First, note the large spike around 1:30am on the 16th. This spike consisted largely of RMAN backups and represents a significant increase in overall activity on the server, with approximately 9 active sessions at its peak and a sustained activity level of 8 for most of that period.

Next, let's look at that same database using ASH Analytics and note how that spike is drawn as a pyramid of activity. While the slope of the sides is fairly steep, it's still significantly more gradual than that illustrated by the Top Activity chart. The peak activity is still approximately 9 active sessions at its highest, but it's harder to determine when and where the activity tapers off because the charting simply draws straight lines between time slices.

ASH Analytics

But ASH Analytics offers a zoom window feature; using it to highlight the 1am-2am hour, we get a different picture that more closely reflects the story told in the Top Activity chart. Note the sharp increase at 1:30, as seen in Top Activity. Also note the higher peaks approaching and exceeding 12 active sessions, whereas each of the previous charts indicated a peak of 9. The last curiosity is that when the activity declines, the decline is more gradual than in the Top Activity chart but steeper than in the Analytics overall chart.

ASH Analytics wall

The charts above demonstrate some of the ambiguities of relying on any one visualization. In those examples, though, the data was mostly consistent in magnitude and differed only in apparent rate of change due to the resolution of the time slices.

Another potential problem with the averaging is losing accuracy by dropping information. For instance, in the first chart above, note the brief IO spike around 9:30am with a peak of 6 active sessions. If you look at the ASH Analytics summary chart, it has averaged the curve down to approximately 2 active sessions. If we now go to the ASH Analytics page and zoom in to only the 9am-10am hour, we see that the spike was in fact much larger, at 24 active sessions! This is 4 to 12 times the previous values and, more importantly, twice the number of available processors. It was a brief surge and the system recovered fine, but if you were looking for potential trouble areas of resource contention, the first two charts could be misleading.

ASH Analytics peak

I definitely don't want to discourage readers from using OEM's ASH tools, nor do I want to suggest you need to zoom in on every single time range to get an accurate picture. Rather, I want readers to be aware of the limitations inherent in data averaging. If you do have reason to inspect activity in a narrow time range, then by all means zoom in with ASH Analytics to get the best picture; and if you need larger-scale summary views, consider querying the ASH data yourself to find extreme values that may have been hidden by the averaging.
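
For example, here is a minimal sketch of that kind of check against the in-memory ASH view. The one-hour window is illustrative; for older periods you would query DBA_HIST_ACTIVE_SESS_HISTORY instead, keeping in mind it retains only a subset of the samples.

-- count of sampled active sessions per ASH sample, highest first
SELECT sample_time, COUNT(*) AS active_sessions
  FROM v$active_session_history
 WHERE sample_time >= SYSTIMESTAMP - INTERVAL '1' HOUR
 GROUP BY sample_time
 ORDER BY active_sessions DESC;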

The Curse of “Expertise”


Like everyone else, I make mistakes. While the results can sometimes be unfortunate, it's a truth that shouldn't be ignored. A recurring problem, though, is that because I'm a designated "expert," people sometimes don't bother to test what I've given them. They just roll with it and then are surprised when their production installation goes awry.

I just ran into this situation again a few days ago. I was asked to help with a query that didn’t ever finish. I worked on it for a little while and came up with something that finished in a few seconds. Since the original didn’t finish, I didn’t have a predetermined set of results to test against. I manually walked through some sample data and my results seemed to tie out… so, it seemed like I was on the right track. I showed the client what I had and they were elated with the speed improvement.

I gave a brief description of what I had attempted to do and why it ran quickly. Then I asked them to test and contact me again if there were any questions.

The next day I got a message that they were very happy with the speed and were using it. I was glad to hear that, but I had also been thinking that my query was extremely complicated, so even though it had apparently passed inspection I spent a few more minutes on it and came up with a simpler approach. This new method was almost as fast as the other one, but more significantly it returned more rows than my previous version. Clearly, at least one of them was incorrect.

With the simplified logic of the new version, it was much easier to verify that this second attempt was correct and the older, more complicated version was wrong. I reached out to my client again, notified them of the change in the query and the problem I had found, and then suggested they rerun more extensive tests anyway because I could still be wrong.

Fortunately, this second attempt did appear to be truly correct and the performance was still more than adequate.

Finding the name of an Oracle database


Oracle offers several methods for finding the name of a database.

More significantly, 12c introduces new multi-tenant functionality that may change the value some of the older methods return.

Here are 11 methods for finding the name of a database.

SELECT 'ora_database_name' method, ora_database_name VALUE FROM DUAL
UNION ALL
SELECT 'SYS_CONTEXT(userenv,db_name)', SYS_CONTEXT('userenv', 'db_name') FROM DUAL
UNION ALL
SELECT 'SYS_CONTEXT(userenv,db_unique_name)', SYS_CONTEXT('userenv', 'db_unique_name') FROM DUAL
UNION ALL
SELECT 'SYS_CONTEXT(userenv,con_name)', SYS_CONTEXT('userenv', 'con_name') FROM DUAL
UNION ALL
SELECT 'SYS_CONTEXT(userenv,cdb_name)', SYS_CONTEXT('userenv', 'cdb_name') FROM DUAL
UNION ALL
SELECT 'V$DATABASE name', name FROM v$database
UNION ALL
SELECT 'V$PARAMETER db_name', VALUE
FROM v$parameter
WHERE name = 'db_name'
UNION ALL
SELECT 'V$PARAMETER db_unique_name', VALUE
FROM v$parameter
WHERE name = 'db_unique_name'
UNION ALL
SELECT 'GLOBAL_NAME global_name', global_name FROM global_name
UNION ALL
SELECT 'DATABASE_PROPERTIES GLOBAL_DB_NAME', property_value
FROM database_properties
WHERE property_name = 'GLOBAL_DB_NAME'
UNION ALL
SELECT 'DBMS_STANDARD.database_name', DBMS_STANDARD.database_name FROM DUAL;

The results of these will vary by version, by whether the db is a container or not, and, if it is a container, by whether the query runs within a pluggable database or the container root database.
Note, the con_name and cdb_name options for the SYS_CONTEXT function do not exist in 11g or lower, so those queries must be removed from the union to execute it in an 11g database. Within a pluggable database some of the methods recognize the PDB as the database, while others recognize the container as the database.

So, if you are using any of these methods in an 11g database and you upgrade to a 12c pluggable db, you may expect the PDB name to be returned, but you'll get the CDB name instead.
Also note, some of the methods always return the name in capital letters, while others return the exact value used when the database was created.

METHOD                                 12c Non-Container   12c CDB$Root   12c PDB    11g
GLOBAL_NAME global_name                SDS12CR1            SDSCDB1        SDSPDB1    SDS11GR2
DATABASE_PROPERTIES GLOBAL_DB_NAME     SDS12CR1            SDSCDB1        SDSPDB1    SDS11GR2
DBMS_STANDARD.database_name            SDS12CR1            SDSCDB1        SDSPDB1    SDS11GR2
ora_database_name                      SDS12CR1            SDSCDB1        SDSPDB1    SDS11GR2
V$DATABASE name                        SDS12CR1            SDSCDB1        SDSCDB1    SDS11GR2
SYS_CONTEXT(userenv,db_name)           sds12cr1            sdscdb1        sdscdb1    sds11gr2
SYS_CONTEXT(userenv,db_unique_name)    sds12cr1            sdscdb1        sdscdb1    sds11gr2
SYS_CONTEXT(userenv,con_name)          sds12cr1            CDB$ROOT       SDSPDB1    n/a
SYS_CONTEXT(userenv,cdb_name)                              sdscdb1        sdscdb1    n/a
V$PARAMETER db_name                    sds12cr1            sdscdb1        sdscdb1    sds11gr2
V$PARAMETER db_unique_name             sds12cr1            sdscdb1        sdscdb1    sds11gr2

On a related note, only the container of a multi-tenant database has instances. So, while PDBs can declare their own name at the database level with some of the methods above, there is no corresponding PDB-level instance name functionality.
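
For comparison, the usual instance-name lookups are shown below; run from within a PDB they return the CDB's instance name, since the PDB has no instance of its own. These two queries are just an illustration.

SELECT SYS_CONTEXT('userenv', 'instance_name') FROM DUAL;

SELECT instance_name FROM v$instance;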

How Oracle Stores Passwords


Several years ago I wrote a small summary of the Oracle password hashing and storage for versions up to 11g.

Today I've completed my update of that article, covering versions up to 12.1.0.2 and including code to mimic the generation of the stored password hashes given the appropriate salts.
The initial publication is in PDF format; I may convert and reformat it to other forms for better distribution.

The PDF file can be downloaded from my Dropbox here.

It was interesting and enjoyable digging into the details of the hashes and how they change between versions and interact with the case-sensitivity settings.

I hope you enjoy reading it as much as I enjoyed writing it.
