Parallel Processing Shell Script - SQLPLUS example...

I have a data collection system that I run against all DBs in my environment nightly. I inherited a script that would just start at the top of the list and SQLPlus into them one by one to collect the data. Well, the scope of the data I needed to collect expanded and put me in the 0.75-1 minute per database range... and I now have 600+ databases I need to talk to nightly. It was taking over 5 hours a night to collect this info, and I wanted it to take an hour or less.

Yes there are other methods to collect this data. My main intent was to have no configuration and zero footprint on target databases. Log in, collect and store data, log out. No fuss, no muss, no distributed code, just get the data and run.

I was about to roll my own script to run multiple SQLPLUS sessions in parallel when I came across Parallel Processing Shell Script on Google Code. I hate reinventing the wheel. And after messing with PPSS for a few minutes I realized they had covered all the territory I needed to cover, and had done a much better and more robust job than I could have in the time I had to deal with my issue.

What is PPSS?
From the project page:
PPSS is a Bash shell script that executes commands, scripts or programs in parallel. It is designed to make full use of current multi-core CPUs. It will detect the number of available CPUs and start a separate job for each CPU core. It will also use hyper threading by default.
PPSS can be run on multiple hosts, processing a single group of items, like a cluster.

Well, I didn't need to go massively parallel across multiple machines, but running 6 or 8 SQLPLUS sessions at a time would get me below my 60-minute goal, so that's where I started.

1) Download and set up ppss
The Google Code project and download page is here.

2) Create or generate a list of database names you want to visit
I leveraged TNS via LDAP for connections to the databases. So before I kick off ppss, my shell script creates a text file with one TNS name per line. An example of how to do this is here.

dblist.txt:

MYDB1
MYDB2
MYDB3
MYDB4

... and so on.

If you have OEM you may be able to generate a list of DB names from the OEM repository. More info on these views is here. If you have the full connection string to each database available you could use that as well. We are just going to tack the contents of each line on to a SQLPLUS command line so whatever works easiest for you.
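If your databases are registered in a local oratab, that can be another easy source for the list. Here is a hypothetical sketch (I used LDAP, so adjust the source to whatever your site has; the ORATAB path and field layout are assumptions):

```shell
# Hypothetical: build dblist.txt from oratab entries of the form
#   MYDB1:/u01/app/oracle/product/11.1.0.7:N
# Skips comment lines and blanks; prints the first field (the SID/name).
ORATAB=${ORATAB:-/etc/oratab}
grep -v '^#' "$ORATAB" 2>/dev/null | awk -F: 'NF >= 2 && $1 != "" { print $1 }' > dblist.txt
```

This only works if the SID in oratab matches a resolvable TNS name; if not, stick with the LDAP or OEM approaches above.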

3) Create a shell script that handles one execution of the process you want to parallelize

As when creating a script to run from cron, it is best to have the shell script set up all of the environment it may possibly need. In this case that is PATH and ORACLE_SID, and we leverage oraenv to set the library paths, etc.

Keep in mind the script must not interfere with other instances of itself that are running. For example, if your script outputs to LOG.TXT... what happens when 8 copies of the script are running and all writing to LOG.TXT? Consider appending something unique (like the DB name in this case) to the file name so each run gets its own log.
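That uniqueness idea can be sketched like this (a minimal, hypothetical fragment; the LOGDIR path and file naming are assumptions, not part of getdata.sh below, which doesn't write its own log):

```shell
# Hypothetical sketch: derive a per-database log name so 8 parallel
# copies of the script never collide on one file.
CONNDB=${1:-MYDB1}                   # TNS name passed in by PPSS
LOGDIR=${LOGDIR:-/tmp/collect_logs}  # assumed location; pick your own
mkdir -p "$LOGDIR"
# One log per database; append $$ (the PID) too if the same DB
# could ever be processed twice concurrently.
LOGFILE="$LOGDIR/getdata_${CONNDB}.log"
echo "collection started at $(date)" >> "$LOGFILE"
```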

getdata.sh

#!/bin/bash

# Pass in the TNS name so we can use it to connect
CONNDB=$1; export CONNDB

PATH=$PATH:/usr/local/bin:/u01/app/oracle/product/11.1.0.7/bin; export PATH
ORACLE_SID=orcl; export ORACLE_SID
ORAENV_ASK=NO; export ORAENV_ASK
. oraenv orcl

# Note the $ on $CONNDB in the connect string -- the variable expands
# inside the unquoted here-document.
$ORACLE_HOME/bin/sqlplus -s /nolog << EOF
set echo on
connect util/util@$CONNDB
insert into db_link_@repolink select * from dba_db_links;
exit sql.sqlcode
EOF
exit

4) Prepare to run in parallel
This fires up one instance of the ppss shell script on your local machine, using dblist.txt as input and getdata.sh as the script to run, and spinning up 6 parallel processes.

ppss -f dblist.txt -c 'getdata.sh "$ITEM"' -p 6 -l ppss_log.txt

A demonstration from the PPSS project folks is available here.

Debugging
PPSS writes extensive info to its logs. It is sometimes hard to navigate all of the logs it generates, but if you have a failure you should be able to find out why by looking at the PPSS output directory.
Here is a script that examines all of the files in the ppss job log directory for errors. It prints out the DB that was being contacted, along with the relevant ORA- errors that were thrown:

#!/bin/bash
LOGFILES=./ppss_dir/job_log/*
ERRNUM=1
for file in $LOGFILES
do

  # Determine if an ORA- error occurred in this log file
  NUMERRORS=$(grep -c 'ORA-' "$file")
  if [ "$NUMERRORS" -gt 0 ]
  then
    echo Error Number : $ERRNUM
    ERRNUM=$((ERRNUM+1))
    grep 'Item:' "$file"
    grep 'ORA-' "$file" | grep -v 06512
    echo ----------------------------------------------------------------------
  fi

done
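For a quick summary before digging into individual logs, this one-liner counts how many job logs contain an ORA- error (it assumes the same ./ppss_dir/job_log layout as the script above):

```shell
# Count how many PPSS job logs contain at least one ORA- error.
# grep -l prints each matching file name once; wc -l counts them.
LOGDIR=${LOGDIR:-./ppss_dir/job_log}
FAILED=$(grep -l 'ORA-' "$LOGDIR"/* 2>/dev/null | wc -l)
echo "databases with errors: $FAILED"
```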