Parallel

From TBP Wiki

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input into blocks and pipe a block into each command in parallel.

If you use xargs and tee today, you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find that GNU parallel can often replace them and make them finish faster by running several jobs in parallel.

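GNU parallel makes sure the output from the commands is the same as you would get had you run the commands sequentially: each job's output is collected and printed as a whole, so output from different jobs never mixes (add -k to also print the jobs in input order). This makes it possible to use output from GNU parallel as input for other programs.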

For each line of input, GNU parallel executes the command with the line as its arguments. If no command is given, the line of input itself is executed. Several lines are run in parallel. GNU parallel can often be used as a substitute for xargs or cat | bash.

Using Parallel

This lists the contents of the /srv/mail directory and pipes the names to parallel, which runs one rsync per item, 8 at a time (set by -j8), to copy them to another host.

   cd /srv/mail/ && ls | parallel -v -j8 rsync -azvP {} user@myserver.com:/srv/mail/

This version prompts for the source and destination directories and then runs the same copy. The number of simultaneous jobs is still fixed at 8 by -j8; edit that value to change it.

   read -ep "What is the directory from? " from; read -ep "What is the directory to? " to; (cd "$from" && ls | parallel -v -j8 rsync -azvP {} "$to"/)

This deletes everything (files and subdirectories) inside the specified directory using rm -rf. Hidden dot-files are not listed by ls, so they are left in place.

   read -ep "What is the directory to delete stuff in? " delete; (cd "$delete" && ls | parallel -v -j8 rm -rf {})
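A caveat with any `ls`-based pipeline: `ls` without `-a` does not list hidden dot-files. This sketch shows the difference in a scratch directory created by `mktemp` (nothing outside it is touched); `find -mindepth 1 -maxdepth 1` lists every direct child, hidden ones included, but not the directory itself.

```shell
#!/usr/bin/env bash
# Compare what ls and find report for a directory containing a hidden file.
demo=$(mktemp -d)
touch "$demo/visible" "$demo/.hidden"

(cd "$demo" && ls | wc -l)                               # 1: only "visible"
(cd "$demo" && find . -mindepth 1 -maxdepth 1 | wc -l)   # 2: includes ".hidden"

rm -rf "$demo"
```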

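If a plain rm -rf * fails with "Argument list too long" (the shell expanded more names than fit on one command line), this find-based version works instead. Unlike ls, it also finds hidden dot-files.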

   read -ep "What is the directory to delete stuff in? " delete; (cd "$delete" && find . -mindepth 1 -maxdepth 1 -print0 | parallel -0 -v -j8 rm -rf {})

Recursively search the files under /DIR for STRING (case-insensitive), running up to 8 grep jobs at a time and printing the names of matching files:

   find /DIR -type f -print0 | parallel -0 -k -j8 -n 1000 -m grep -il STRING {}

Install Parallel

GNU parallel is packaged in the repositories of most operating systems. If yours doesn't include it, you can usually add a third-party repository, as in the CentOS example below.

Install Parallel on CentOS 6 (as root), using the home:tange repository on the openSUSE Build Service:

   cd /etc/yum.repos.d/
   wget http://download.opensuse.org/repositories/home:tange/CentOS_CentOS-6/home:tange.repo
   yum install parallel