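A preprocessor exists because the data is stored in some format that has to be processed before it can be read, and that processing must not change the original file. Take a gzip-compressed file: the preprocessor should not try to decompress the file in place, it should write the file's contents to standard output. Programs that fit this description are zcat and gzip -cd. In my earlier experiments I took it for granted that preprocessing meant decompressing the file beforehand, so I wrote a shell script that ran gzip -d $1. It took me a long time and never worked: every time I finished creating the external table, the original file had been decompressed.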
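The difference between the two gzip invocations can be checked outside the database. The following is a minimal sketch that runs in a scratch directory; the file name sample.dat and the variable workdir are made up for illustration and are not part of the original setup.

```shell
#!/bin/sh
# Sketch: compare `gzip -cd` (streams to stdout) with `gzip -d` (replaces the file).
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

printf '13,987,1998-01-10 00:00:00,3,999,1,1232.16,\n' > sample.dat
gzip sample.dat                   # creates sample.dat.gz and removes sample.dat

# What a preprocessor should do: write the contents to stdout, leave the archive alone.
gzip -cd sample.dat.gz > streamed.out
ls sample.dat.gz                  # the archive is still there

# What went wrong in the first attempt: the archive is replaced by the plain file.
gzip -d sample.dat.gz
ls sample.dat                     # sample.dat.gz no longer exists at this point
```

After the second command the .gz file named in the table's LOCATION clause is gone, which matches the behavior described above.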
Oracle® Database Utilities -> 14 The ORACLE_LOADER Access Driver
If the file you want to load contains data records that are not in a format supported by the ORACLE_LOADER access driver, then use the PREPROCESSOR clause to specify a user-supplied preprocessor program that will execute for every data file. Note that the program specification must be enclosed in a shell script if it uses arguments (see the description of "file_spec").
The preprocessor program converts the data to a record format supported by the access driver and then writes the converted record data to standard output (stdout), which the access driver reads as input. The syntax of the PREPROCESSOR clause is as follows:
Here is a simple example:
1. Create the data file d.dat with the following content:
13,987,1998-01-10 00:00:00,3,999,1,1232.16,
13,1660,1998-01-10 00:00:00,3,999,1,1232.16,
13,1762,1998-01-10 00:00:00,3,999,1,1232.16,
13,1843,1998-01-10 00:00:00,3,999,1,1232.16,
13,1948,1998-01-10 00:00:00,3,999,1,1232.16,
13,2273,1998-01-10 00:00:00,3,999,1,1232.16,
13,2380,1998-01-10 00:00:00,3,999,1,1232.16,
2. Compress the data file:
gzip d.dat
3. Write a shell script named uncompress with the following content. The oracle user needs execute (x) permission on the script:
/bin/gzip -cd "$1"
4. Create a directory object named DUMP:
create directory dump as '/home/oracle/dump';
5. Put the compressed data file and the shell script in the dump directory.
6. Create the external table:
CREATE TABLE ext2
(
"PROD_ID" NUMBER,
"CUST_ID" NUMBER,
"TIME_ID" DATE,
"CHANNEL_ID" NUMBER,
"PROMO_ID" NUMBER,
"QUANTITY_SOLD" NUMBER(10,2),
"AMOUNT_SOLD" NUMBER(10,2)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY DUMP
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
PREPROCESSOR DUMP:'uncompress'
BADFILE 'DUMP':'c.bad'
LOGFILE 'c.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY "," LDRTRIM
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
(
"PROD_ID" CHAR(255)
TERMINATED BY ",",
"CUST_ID" CHAR(255)
TERMINATED BY ",",
"TIME_ID" CHAR(255)
TERMINATED BY ","
DATE_FORMAT DATE MASK "yyyy-mm-dd hh24:mi:ss",
"CHANNEL_ID" CHAR(255)
TERMINATED BY ",",
"PROMO_ID" CHAR(255)
TERMINATED BY ",",
"QUANTITY_SOLD" CHAR(255)
TERMINATED BY ",",
"AMOUNT_SOLD" CHAR(255)
TERMINATED BY ","
)
)
location
(
'd.dat.gz'
)
)REJECT LIMIT UNLIMITED parallel 6
;
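Before querying the table, it can help to rehearse steps 1 to 3 by hand and confirm that the script streams the rows to stdout without touching the compressed file. The sketch below makes several assumptions: it uses a scratch directory (the variable stage) as a stand-in for /home/oracle/dump, only two of the sample rows, and a gzip path resolved with command -v rather than the hard-coded /bin/gzip. A full path is kept inside the script because the preprocessor may not run with the usual PATH.

```shell
#!/bin/sh
# Sketch: rehearse the preprocessor outside the database.
set -eu
stage=$(mktemp -d)                # stand-in for /home/oracle/dump (assumption)

# Steps 1-2: write two sample rows of d.dat, then compress the file.
cat > "$stage/d.dat" <<'EOF'
13,987,1998-01-10 00:00:00,3,999,1,1232.16,
13,1660,1998-01-10 00:00:00,3,999,1,1232.16,
EOF
gzip "$stage/d.dat"

# Step 3: the preprocessor script. It receives the data file path as $1.
gzip_bin=$(command -v gzip)       # the article uses /bin/gzip; resolved here for portability
printf '#!/bin/sh\nexec "%s" -cd "$1"\n' "$gzip_bin" > "$stage/uncompress"
chmod 755 "$stage/uncompress"     # the oracle user needs execute permission

# Invoke it with the data file as the only argument and count the rows on stdout.
rows=$("$stage/uncompress" "$stage/d.dat.gz" | wc -l | tr -d ' ')
echo "rows streamed: $rows"
```

If the row count matches the number of lines in the data file and d.dat.gz is still in place afterwards, the script behaves the way the PREPROCESSOR clause expects.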