Post by Craig Sanders via luv-main
Post by Joel W. Shea via luv-main
Are you maxing out your disk/network bandwidth already?
This is key, IMO, to whether running multiple rsyncs in parallel is
worth it or not. Almost all of the time, rsync is going to be I/O
bound (disk and network) rather than CPU bound - so adding more rsync
processes is just going to slow them all down even more. A single rsync
process can saturate the I/O bandwidth of most common disk
subsystems and network connections.
If you have a RAID-1 array then you should be able to benefit from having as
many processes as there are mirrors of the data for reading (i.e. on the
transmitting end, and on the receiver when it is updating previously
transferred data).
If you have a RAID-5 then you should get some benefits from multiple readers
but it's not as easy to predict. The same applies for command queuing in a
single device, but for a much smaller benefit.
Linux does some queuing of requests and it's theoretically possible to get
some benefits from multiple processes accessing a single disk at one time. But
the benefits will probably be small.
If you have a process that does some CPU operations as well as some IO there
is potential for performance improvement from running multiple processes at
once if nothing else is using the disk. For example, if the process is using
10% CPU time and 90% iowait then you could get roughly a 10% performance
increase by using a second process, because its disk IO can overlap with the
first process's CPU work, so there will almost always be a process blocked on
disk IO.
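
The arithmetic behind that figure can be sketched with a very simple model. It
assumes each unit of work needs a fixed slice of CPU time and a fixed slice of
disk time, that the disk and the CPU each serve one request at a time, and that
one process's CPU work can fully overlap another's disk wait. The function name
and the model are illustrative assumptions, not a measurement of rsync.

```python
def overlap_speedup(cpu_fraction: float, io_fraction: float, processes: int) -> float:
    """Estimate the throughput gain from running several identical processes.

    Hypothetical model: the disk and the CPU are each a single serial
    resource, and work from different processes overlaps perfectly.
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")

    # Time one process needs for one unit of work when running alone.
    single = cpu_fraction + io_fraction

    # With n processes the time per unit of work is limited by whichever is
    # slowest: the disk, the CPU, or the per-process work spread over n workers.
    per_unit = max(io_fraction, cpu_fraction, single / processes)

    return single / per_unit


if __name__ == "__main__":
    # 10% CPU, 90% iowait, two processes.
    print(f"{overlap_speedup(0.1, 0.9, 2):.2f}")  # prints 1.11
```

Under these assumptions the disk becomes the only bottleneck as soon as there
are two processes, so adding a third or fourth gives nothing further.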
Apart from the case of 2 processes reading from a RAID-1 device the benefits
from all these are small. But if, for example, you want to transition a server
to new hardware or a new DC in an 8 hour downtime window and the transfer
looks like it will take 9 hours, these are things you really want to do.
Post by Craig Sanders via luv-main
splitting up the transfer into multiple smaller rsync jobs to be
run consecutively, not simultaneously, can be useful....especially
if you intend to run the transfers multiple times to get
new/changed/deleted/etc files since the last run. There's a lot
of startup overhead (and RAM & CPU usage) with rsync on every run,
comparing file lists and file timestamps and/or checksums to figure
out what needs to be transferred. Multiple smaller transfers (e.g. of
entire subdirectory trees) tend to be noticeably faster than one
large transfer.
Yes, especially if you are running out of dentry cache.
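
A minimal sketch of the "one rsync per subdirectory tree, run consecutively"
approach described above. The helper names, the example paths and the default
options (-a and --delete) are assumptions for illustration, not anything from
this thread. Files sitting directly in the top-level directory, and top-level
directories that have been deleted on the source, are not handled here and
would need a separate pass.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def build_rsync_jobs(
    src_root: Path,
    dest: str,
    extra_args: Sequence[str] = ("-a", "--delete"),
) -> List[List[str]]:
    """Return one rsync command per top-level subdirectory of src_root."""
    jobs = []
    for sub in sorted(p for p in src_root.iterdir() if p.is_dir()):
        # The trailing slash on the source makes rsync copy the directory's
        # contents into dest/<name>/ rather than creating dest/<name>/<name>/.
        jobs.append(
            ["rsync", *extra_args, f"{sub}/", f"{dest.rstrip('/')}/{sub.name}/"]
        )
    return jobs


def run_consecutively(
    jobs: Sequence[Sequence[str]],
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run the jobs one after another, stopping at the first failure."""
    for cmd in jobs:
        # check=True makes subprocess.run raise if rsync exits non-zero.
        runner(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical source and destination; adjust before running.
    run_consecutively(build_rsync_jobs(Path("/srv/data"), "newhost:/srv/data"))
```

Keeping the command construction separate from the execution makes it easy to
print and review the job list before anything is transferred.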
Post by Craig Sanders via luv-main
in other words, multiple parallel rsyncs is usually a false
optimisation.
The thing that concerns me most about such things is the potential for
mistakes. For everything you do there is some probability of stuffing it up.
Is the probability of a stuff-up a reasonable trade-off for a performance
improvement?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/