pg_batch -- run SQL jobs in parallel
pg_batch [OPTIONS] FILENAME [args...]
The following parameters are available for OPTIONS; see Options below for details.
pg_batch executes SQL jobs in PostgreSQL. It first runs a SQL script that lists SQL jobs, then executes the resulting SQL statements serially or in parallel.
pg_batch provides the following features:
As a useful example, "A script that lists VACUUM commands" is attached. The script can be used instead of vacuumdb.
Run VACUUM jobs for the test database.
$ pg_batch -d test $PGSHARE/contrib/pg_batch_vacuum.sql
Run user-jobs.sql for all databases.
$ pg_batch --all user-jobs.sql
The following command line options are available in pg_batch.
Options to connect to servers.
This utility, like most other PostgreSQL utilities, also uses the environment variables supported by libpq (see Environment Variables).
pg_batch has the following restrictions and limitations:
pg_batch is a single-threaded program, but uses multiple sessions (-j option) and asynchronous queries to execute SQL jobs in parallel.
Job-listing scripts must contain a query that returns SETOF (query AS text, priority AS float8). "query" is the SQL statement to be executed as a job, and "priority" is the job's priority. In general, a job-listing script will look like:
SELECT query, priority FROM jobs_list WHERE should_be_run_today;
"priority" column can be omitted. If so, priorities of all jobs are 0.
pg_batch executes jobs in descending order of "priority". In addition, if jobs are canceled by timeout or Ctrl+C, it raises WARNING for canceled jobs that have priorities equal to or higher than 0. The priority should be chosen according to how important it is that the job be executed. For example, if you run jobs once per day, priorities can be assigned with the following rules:
If a timeout is specified, the job running at the timeout is canceled and the remaining jobs are skipped. When jobs that have priority 0 or higher are canceled or skipped, pg_batch writes WARNING messages for those jobs, and its exit code will be non-zero. An interruption with Ctrl+C is treated the same as a timeout.
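For instance, a daily job list could use negative priorities for optional jobs, so that skipping them on timeout stays silent, and priority 0 or higher for jobs that must complete. The table and column names below are hypothetical:

```sql
-- Illustrative priority assignment (daily_jobs is a hypothetical
-- table): jobs with priority >= 0 produce WARNINGs and a non-zero
-- exit code if canceled or skipped; negative-priority jobs do not.
SELECT query,
       CASE WHEN mandatory THEN 10.0 ELSE -1.0 END AS priority
  FROM daily_jobs;
```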
Log messages written by pg_batch are explained in this section.
INFO: DATABASE 'database name' (number of jobs jobs)
INFO: [no/total] START: start timestamp (YYYY-MM-DD HH:MI:SS)
INFO: [no/total] QUERY: SQL
/* when the job succeeds */
INFO: [no/total] SUCCESS: end timestamp (YYYY-MM-DD HH:MI:SS) (duration (HH:MM:SS))
/* when the job fails */
WARNING: [no/total] FAILED: end timestamp (YYYY-MM-DD HH:MI:SS) (duration (HH:MM:SS))
/* when no jobs failed */
INFO: TIMEOUT timestamp of timeout (YYYY-MM-DD HH:MI:SS)
/* when some jobs failed */
WARNING: TIMEOUT timestamp of timeout (YYYY-MM-DD HH:MI:SS)
/* for low-priority jobs (priority < 0) */
INFO: [no/total] SKIP: SQL
/* for high-priority jobs (priority >= 0) */
WARNING: [no/total] SKIP: SQL
ERROR: error messages
The script runs VACUUM for each table that needs to be vacuumed, like autovacuum. Only tables that have many dead tuples or are close to transaction ID wraparound will be vacuumed. The script improves on vacuumdb because it skips tables that don't need to be vacuumed and runs VACUUMs in parallel.
The script runs VACUUMs more aggressively than autovacuum. It uses the following thresholds to determine whether to run VACUUM. See also "The Autovacuum Daemon" for details.
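The shipped script's exact conditions may differ, but a trigger condition in the spirit of autovacuum's formula can be sketched as follows, using the standard autovacuum settings and catalog columns:

```sql
-- Sketch of an autovacuum-style selection query; the attached
-- pg_batch_vacuum.sql may use different constants and conditions.
-- A table qualifies when its dead tuples exceed
-- autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples,
-- or when it is approaching transaction ID wraparound.
SELECT format('VACUUM %I.%I', n.nspname, c.relname) AS query
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_stat_user_tables s ON s.relid = c.oid
 WHERE s.n_dead_tup >
         current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float8
           * c.reltuples
    OR age(c.relfrozenxid) >
         current_setting('autovacuum_freeze_max_age')::bigint;
```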
pg_batch can be installed like other standard contrib modules.
The module can be built with PGXS.
$ cd pg_batch
$ make USE_PGXS=1
$ make USE_PGXS=1 install
No registration is required just to use pg_batch. If you will run "VACUUM jobs", you need to run $PGSHARE/contrib/pg_batch.sql in each target database:
$ psql -d dbname -f $PGSHARE/contrib/pg_batch.sql