Parallel
Latest revision as of 09:43, 20 July 2019
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input into blocks and pipe a block into each command in parallel.
If you use xargs and tee today you will find GNU parallel very easy to use as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.
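GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially: the output of each job is kept together rather than interleaved with the output of other jobs, and the -k flag additionally keeps the output in the same order as the input. This makes it possible to use output from GNU parallel as input for other programs.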
For each line of input GNU parallel will execute command with the line as arguments. If no command is given, the line of input is executed. Several lines will be run in parallel. GNU parallel can often be used as a substitute for xargs or cat | bash.
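As a small illustration of replacing a shell loop, the sketch below runs the same trivial job first as a sequential loop and then through parallel. The function names label_sequential and label_parallel are made up for this example, and -j4 is an arbitrary choice.

```shell
#!/usr/bin/env bash

# Sequential loop: handles one input line at a time.
label_sequential() {
    while IFS= read -r item; do
        echo "processed $item"
    done
}

# Same work with GNU parallel: up to 4 jobs at once.
# {} is replaced by the input line; -k keeps output in input order.
label_parallel() {
    parallel -k -j4 echo "processed {}"
}

# Usage (both print the same three lines):
#   printf '%s\n' alpha beta gamma | label_sequential
#   printf '%s\n' alpha beta gamma | label_parallel
```

For a job as cheap as echo the parallel version will not be faster; the gain shows up when each job does real work, such as the rsync examples in the next section.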
Using Parallel
This lists the contents of the /srv/mail directory and pipes the list to parallel, which rsyncs the items to another location, 8 at a time (per the -j8 flag).
cd /srv/mail/ && ls | parallel -v -j8 rsync -azvP --progress {} user@myserver.com:/srv/mail/{}
This automates the above process: it asks which directory to copy from and which to copy to, then runs the same copy. Be aware that the number of simultaneous jobs is still set to 8 (per the flag "-j8"), but you can change this manually.
read -rep "What is the directory from? " from; read -rep "What is the directory to? " to; pwd=$(pwd); [ -n "$from" ] && cd "$from" && ls | parallel -v -j8 rsync -azvP --progress {} "$to"/{}; cd "$pwd"
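Before starting a large copy it can help to see what parallel would run. The sketch below uses parallel's --dry-run option, which prints each command instead of executing it. The function name preview_sync and its two arguments are assumptions made for this example, not part of the original one-liners.

```shell
#!/usr/bin/env bash

# Print the rsync commands that would be run, without copying anything.
#   $1 = source directory, $2 = destination (local path or user@host:/path)
preview_sync() {
    local src=$1 dest=$2
    # The subshell keeps the cd from changing the caller's working directory.
    (
        cd "$src" || exit 1
        ls | parallel --dry-run -j8 rsync -azP {} "$dest"/{}
    )
}

# Usage:
#   preview_sync /srv/mail user@myserver.com:/srv/mail
```

Once the printed commands look right, dropping --dry-run runs them for real.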
This will delete everything that ls lists within the specified directory using rm -rf. Hidden files (names starting with a dot) are not listed by ls, so they are left alone.
read -rep "What is the directory to delete stuff in? " delete; pwd=$(pwd); [ -n "$delete" ] && cd "$delete" && ls | parallel -v -j8 rm -rf {}; cd "$pwd"
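If rm fails with "Argument list too long", generate the list with find instead. Unlike ls, find also picks up hidden files.
read -rep "What is the directory to delete stuff in? " delete; pwd=$(pwd); [ -n "$delete" ] && cd "$delete" && find . -mindepth 1 -maxdepth 1 | parallel -v -j8 rm -rf {}; cd "$pwd"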
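File names containing spaces or newlines can confuse a line-based pipe. A more defensive variant, sketched below, passes NUL-separated names from find -print0 to parallel -0 and refuses to run unless the target is an existing directory. The function name empty_dir is hypothetical.

```shell
#!/usr/bin/env bash

# Delete everything inside a directory (but not the directory itself),
# 8 rm jobs at a time.
#   $1 = directory to empty
empty_dir() {
    local target=$1
    # Refuse to run on an empty or non-directory argument.
    if [ -z "$target" ] || [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi
    # -mindepth 1 skips the directory itself; -maxdepth 1 lists only its
    # top-level entries, which rm -rf then removes recursively.
    find "$target" -mindepth 1 -maxdepth 1 -print0 |
        parallel -0 -j8 rm -rf -- {}
}

# Usage:
#   empty_dir /path/to/scratch
```

Because the path is passed as an argument rather than entered with cd, a mistyped directory produces an error instead of deleting the contents of whatever directory the shell happens to be in.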
Recursively grep for STRING within /DIR using 8 jobs in parallel (grep's -l flag prints only the names of the files that match):
find /DIR -type f | parallel -k -j8 -n 1000 -m grep -H -n -ril STRING {}
Install Parallel
GNU parallel is usually available from each OS's package repository. There's usually a workaround if it isn't.
Install Parallel to CentOS 6:
cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/home:tange/CentOS_CentOS-6/home:tange.repo
yum install parallel
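After installing, it is worth confirming that the parallel on the PATH is GNU parallel, since some distributions also ship an unrelated tool under the same name. A minimal check, assuming the first line of parallel --version contains the text "GNU parallel" (the function name have_gnu_parallel is made up for this sketch):

```shell
#!/usr/bin/env bash

# Succeeds only if a parallel command exists and identifies itself as GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

if have_gnu_parallel; then
    echo "GNU parallel is installed"
else
    echo "GNU parallel not found"
fi
```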