[BackupPC-users] BackupPC Pool synchronization?

Discussion:

Mark Campbell

2013-02-28 21:10:13 UTC

So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD RAID1 array to an external fireproof drive (with plans to also sync to a remote server at our colo). I found the script BackupPC_copyPcPool.pl by Jeffrey, but its syntax and the few examples I've seen online suggest that it isn't quite what I'm looking for, since it appears to write its output in a different layout. I initially tried the rsync method with -H, but my server would end up choking at 350GB. Any suggestions on how to do this?

Thanks,

Mark

Les Mikesell

2013-02-28 23:34:43 UTC

On Thu, Feb 28, 2013 at 3:10 PM, Mark Campbell

Post by Mark Campbell
So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD
RAID1 array to an external fireproof drive (with plans to also sync to a
remote server at our colo). I found the script BackupPC_copyPcPool.pl by
Jeffrey, but the syntax and the few examples I've seen online have indicated
to me that this isn't quite what I'm looking for, since it appears to output
it to a different layout. I initially tried the rsync method with -H, but
my server would end up choking at 350GB. Any suggestions on how to do this?

I'm not sure anyone has come up with a really good way to do this.
One approach is to use a 3-member raid1 where you periodically remove
a drive and resync a new one. If you have reasonable remote
bandwidth and enough of a backup window, it is much easier to just run
another instance of backuppc hitting the same targets independently.

--
Les Mikesell
***@gmail.com

Lars Tobias Skjong-Børsting

2013-03-01 09:17:56 UTC

Hi,

I have come up with what is IMHO a good way to do this using ZFS (ZFSonLinux).

Description:
* uses 3 disks.
* at all times, keep 1 mirrored disk in a fire safe.
* periodically swap the safe disk with a mirror disk in the server.

1. create a zpool with three mirrored members.
2. create a filesystem on it and mount at /var/lib/backuppc.
3. do some backups.
4. detach one disk and put in safe.
5. do more backups.
6. detach one disk and swap with the other disk in the safe.
7. attach and online the disk from the safe.
8. watch it sync up.
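
The rotation above might be scripted roughly as follows. This is only a dry-run sketch in Python: it builds the command lines and prints them, it never runs them. The pool name "backup", the dataset name and the device names are hypothetical. It also assumes that "detach" in steps 4 and 6 means taking the disk offline with "zpool offline" rather than running "zpool detach": as far as I know, an offlined disk stays a pool member and only needs the changes resilvered when it returns, while a disk removed with "zpool detach" has to be re-attached and fully resilvered.

```python
import shlex

POOL = "backup"                      # hypothetical pool name
DATASET = f"{POOL}/backuppc"         # hypothetical dataset name
DISKS = ["/dev/disk/by-id/diskA",    # hypothetical device names
         "/dev/disk/by-id/diskB",
         "/dev/disk/by-id/diskC"]


def initial_setup(pool, dataset, disks):
    """Steps 1-2: a three-way mirror plus a filesystem for BackupPC."""
    return [
        ["zpool", "create", pool, "mirror", *disks],
        ["zfs", "create", "-o", "mountpoint=/var/lib/backuppc", dataset],
    ]


def swap(pool, disk_to_safe, disk_from_safe):
    """Steps 6-8: retire one mirror member, bring back the one from the safe."""
    return [
        ["zpool", "offline", pool, disk_to_safe],   # assumption: offline, not detach
        ["zpool", "online", pool, disk_from_safe],
        ["zpool", "status", pool],                  # shows resilver progress
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in initial_setup(POOL, DATASET, DISKS) + swap(POOL, DISKS[1], DISKS[2]):
        print(shlex.join(cmd))
```

Printing first and running by hand seems the safer habit here, since a wrong device name in "zpool create" is not something you want to discover afterwards.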

I am currently using 2TB disks, and swap period of 1 month. Because of
ZFS it doesn't need to sync all the blocks, but only the changed blocks
since 1 month ago. For example, with 10GB changed it will sync in less
than 25 minutes (approx. 7 MB/s speed). That's a lot faster than
anything I got with mdraid which syncs every block.
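
As a quick sanity check of the numbers above, here is the arithmetic as a small Python sketch. It assumes binary units (1 GB = 1024 MB) and a constant transfer rate, both of which are simplifications.

```python
def resilver_minutes(changed_gb, speed_mb_per_s):
    """Minutes needed to copy `changed_gb` gigabytes at `speed_mb_per_s`."""
    changed_mb = changed_gb * 1024          # assumption: binary units
    return changed_mb / speed_mb_per_s / 60


if __name__ == "__main__":
    print(f"10 GB at 7 MB/s: {resilver_minutes(10, 7):.1f} minutes")
```

With decimal units the figure comes out slightly lower, so "less than 25 minutes" holds either way.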

ZFS also comes with benefits of checksumming and error correction of
file content and file metadata. BackupPC also supports error correction
through par2, and this gives an extra layer of data protection.

Backing up large numbers of files can take a very long time because of
hard disk seeking. This can be alleviated by using an SSD cache drive for
ZFS. Read (ZFS L2ARC) and write (ZFS ZIL) caching on a small SSD (30 GB)
cuts incremental backup time in half for some shares.

As for remote sync, you can use "zfs send" on the backup server and "zfs
receive" on the offsite server. This will only send the differences
since the last sync (like rsync), and will probably be significantly
faster than rsync, which also has to resolve all the hardlinks.
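
A minimal sketch of what an incremental transfer could look like, again as a Python dry run that only builds the command strings. The dataset names, snapshot names and the host name "offsite" are hypothetical; "zfs snapshot", "zfs send -i" and "zfs receive" are the standard subcommands.

```python
import shlex


def snapshot_cmd(dataset, snap):
    """Command that creates the snapshot dataset@snap."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def incremental_send_pipeline(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Shell pipeline sending only the changes between two snapshots over ssh."""
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


if __name__ == "__main__":
    print(shlex.join(snapshot_cmd("backup/backuppc", "2013-03-02")))
    print(incremental_send_pipeline("backup/backuppc", "2013-03-01", "2013-03-02",
                                    "offsite", "tank/backuppc"))
```

The very first transfer has to be a full "zfs send" of a snapshot, without "-i"; only later runs can be incremental against a snapshot that exists on both sides.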

--
Best regards,
Lars Tobias

Mark Campbell

2013-03-01 21:37:56 UTC

Lars,

Thanks for the interesting idea! I confess I haven't played with ZFS much (though I've been wanting to for some time), maybe this is the excuse I need ;). Question, taking your model here, and applying it to my situation, how well would this work:

BackupPC server, with a RAID1 zpool, with the third member being my external fireproof drive. Rather than the rotation you described, just leave it as is as it does its daily routine. Then, should the day come where I need to grab the drive and go, plugging the drive into a system with ZFSonLinux & BackupPC installed, could I mount this drive by itself?

I really like your idea of zfs send/receive for the remote copy. Do you have any tips/pointers/docs on the best way to run it in this scenario?

Thanks,

--Mark

------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_feb
_______________________________________________
BackupPC-users mailing list
BackupPC-***@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

Lars Tobias Skjong-Børsting

2013-03-02 19:28:18 UTC

Hi Mark,

Post by Mark Campbell
Question, taking your model here, and applying it to my situation,
BackupPC server, with a RAID1 zpool, with the third member being my
external fireproof drive. Rather than the rotation you described,
just leave it as is as it does its daily routine. Then, should the
day come where I need to grab the drive and go, plugging the drive
into a system with ZFSonLinux & BackupPC installed, could I mount
this drive by itself?

Yes, this is no different, really. It would work very well. Just keep it
in sync, and all should be fine. You can just pull out any drive at
will, without causing any filesystem corruption. The fireproof drive can
be inserted in a different computer with ZFS support and you can run
"zpool import" and then you can mount the filesystem.

You shouldn't use USB for your external drive, though. eSATA, FireWire
or Thunderbolt is fine.

--
Best regards,
Lars Tobias

Mark Campbell

2013-03-04 15:59:07 UTC

Thanks Lars,

I think that this is the way I'm going to go. I'm going to migrate the existing pool from its current location on a 1TB Linux MD RAID 1 to a newly created 2TB ZFS RAID 1 using three drives (the third being the fireproof external). I do believe that this is where BackupPC_copyPcPool.pl will come in handy; am I correct, Jeffrey?

When we're ready to put the offsite backup in place, could I temporarily sync a 4th drive to the ZFS RAID array so that I can then transport the drive to our colo and import it there? Also, would I be correct in assuming that the ZFS resilvering process is like other RAID systems, in that I wouldn't have to shut down BackupPC during resilvering (that it would just update changes as it went along automatically)?

Thanks,

--Mark

-----Original Message-----
From: Lars Tobias Skjong-Børsting [mailto:***@snota.no]
Sent: Saturday, March 02, 2013 2:28 PM
To: backuppc-***@lists.sourceforge.net
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

Post by Mark Campbell
I really like your idea of zfs send/receive for the remote copy. Do
you have any tips/pointers/docs on the best way to run it in this
scenario?

I don't mean to say RTFM, but the top results of a Google search are as good a starting point as any:
https://www.google.com/search?q=zfs+send+receive+backup

I think this article is quite good:
http://cuddletech.com/blog/pivot/entry.php?id=984

If you have any further questions, don't hesitate to ask. :)

--
Best regards,
Lars Tobias


Mark Campbell

2013-03-04 16:20:44 UTC

Oh, and while I'm thinking of it, what are your thoughts on using ZFS' dedup feature on a BackupPC pool? I'm aware that a goodly amount of RAM would be required for that feature. But since BackupPC's dedup feature is file-based, and ZFS' dedup feature is block-based, even more space could be saved; particularly when you're backing up things like .pst files, where a large majority of the file is the same, save a few bytes/KB/MB. Such files are flagged by BackupPC as different.
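
To make the file-based versus block-based difference concrete, here is a toy Python sketch. It assumes a 128 KiB block size and a synthetic stand-in for a large .pst file, and it changes a few bytes in place. Real mailbox files may instead shift data around, which would hurt block-level matching, so treat this as the best case.

```python
import hashlib

BLOCK = 128 * 1024   # assumed block size for the illustration


def file_digest(data):
    """Whole-file hash, standing in for file-level pooling."""
    return hashlib.sha256(data).hexdigest()


def block_digests(data, block=BLOCK):
    """Per-block hashes, standing in for block-level dedup."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def make_mailbox(n_blocks=32, block=BLOCK):
    """Deterministic fake mailbox in which every block has different content."""
    return b"".join(hashlib.sha256(str(i).encode()).digest() * (block // 32)
                    for i in range(n_blocks))


def compare():
    """Return (file-level match, shared block count, total block count)."""
    monday = make_mailbox()
    tuesday = bytearray(monday)
    tuesday[5:10] = b"EDIT!"          # a few bytes change in place
    tuesday = bytes(tuesday)
    same_file = file_digest(monday) == file_digest(tuesday)
    shared = len(set(block_digests(monday)) & set(block_digests(tuesday)))
    return same_file, shared, len(block_digests(monday))


if __name__ == "__main__":
    same_file, shared, total = compare()
    print(f"file-level match: {same_file}")
    print(f"block-level: {shared} of {total} blocks shared")
```

File-level pooling sees two unrelated files, while a block-level scheme only has to store the one block that changed.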

Thanks,

--Mark


Tyler J. Wagner

2013-03-04 16:38:19 UTC

Post by Mark Campbell
Oh, and while I'm thinking of it, what are your thoughts on using ZFS' dedup feature on a BackupPC pool? I'm aware that a goodly amount of RAM would be required for that feature. But since BackupPC's dedup feature is file-based, and ZFS' dedup feature is block-based, even more space could be saved; particularly when you're backing up things like .pst files, where a large majority of the file is the same, save a few bytes/KB/MB. Such files are flagged by BackupPC as different.

Just disable BackupPC's pooling entirely. You'd have to disable
BackupPC_nightlyAdmin, and the link process after completing the dump stage.

Tyler

--
"Any advert in a public space that gives you no choice whether you see it
or not is yours. It’s yours to take, re-arrange and re-use. You can do
whatever you like with it. Asking for permission is like asking to keep
a rock someone just threw at your head. You owe the companies nothing."
-- Banksy on Advertising

Trey Dockendorf

2013-03-04 16:39:58 UTC

On Mon, Mar 4, 2013 at 10:20 AM, Mark Campbell

Mark Campbell

2013-03-05 16:39:12 UTC

Trey,

I haven't really played with zfs' dedup much either, which is why I posed the question. But my understanding of how the dedup feature works is that it would be transparent to BackupPC & its hardlinking. Perhaps that's a question better asked on a zfs mailing list/forum, but I thought I'd pose it here since not everyone who works with zfs would understand how BackupPC works.

Thanks,

--Mark

-----Original Message-----
From: Trey Dockendorf [mailto:***@gmail.com]
Sent: Monday, March 04, 2013 11:40 AM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

Resilvering in ZFS is slightly different than standard RAID. The main difference is that ZFS is 'filesystem aware'. ZFS only has to copy the existing data, rather than the entire block device like in traditional RAID. This means resilvering can be very fast compared to traditional RAID.

Some useful docs on the subject:

http://www.cupfighter.net/index.php/2012/10/default-nexenta-zfs-settings-you-want-to-change/

http://info.nexenta.com/rs/nexenta/images/white_paper_nexentastor_zfs_initialization_and_resilvering.pdf

https://github.com/szaydel/Nexenta-Docs-Pub

As for dedup, I have no experience with its usage, but it seems that deduplication has the potential to upset BackupPC, as what BackupPC does is already a form of dedup: hardlinking files to save space. I personally would not use it in this case.

Useful docs on dedup and when to use:

https://blogs.oracle.com/bonwick/entry/zfs_dedup

http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup

- Trey

Lars Tobias Skjong-Børsting

2013-03-06 10:10:38 UTC

As long as you can keep the DDT in RAM, it would not slow down too much.
You should test that, though.

For about 1 TB of used disk space, I believe you would need something
like 3 GB of RAM, depending on your filesystem's average block size. You
can calculate the required RAM with "zdb -S trunk" (where "trunk" is the
pool name), which simulates dedup on your pool; multiply the total number
of blocks it reports by 320 bytes to get the required RAM.
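
The rule of thumb above, worked through in Python. The average block sizes are assumptions for illustration; the real block count would come from the "zdb -S" output.

```python
DDT_ENTRY_BYTES = 320   # per-block figure quoted above
TIB = 1024 ** 4


def blocks_for(used_bytes, avg_block_bytes):
    """Approximate number of blocks for a given amount of used space."""
    return used_bytes // avg_block_bytes


def ddt_ram_bytes(total_blocks):
    """RAM needed to hold the dedup table for `total_blocks` blocks."""
    return total_blocks * DDT_ENTRY_BYTES


if __name__ == "__main__":
    for avg_kib in (128, 64, 16):           # assumed average block sizes
        blocks = blocks_for(1 * TIB, avg_kib * 1024)
        gib = ddt_ram_bytes(blocks) / 1024 ** 3
        print(f"avg block {avg_kib:>3} KiB: {blocks:>11,} blocks -> {gib:.1f} GiB")
```

At a 128 KiB average this lands at about 2.5 GiB per TiB, in line with the "something like 3 GB" estimate; smaller average blocks push the requirement up quickly.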

If the dedup tables (DDTs) spill over your amount of available RAM, it
will use the L2ARC if you have one or the disk if you don't have L2ARC
cache. Harddisk access for the DDTs would slow you down significantly.
If you can keep a pretty decent sized L2ARC on a very fast SSD it would
be less slow if your RAM is too small.

Also, like Tyler said, you should disable BackupPC's pooling to increase
your performance.

--
Best regards,
Lars Tobias

Lars Tobias Skjong-Børsting

2013-03-06 11:14:46 UTC

Post by Lars Tobias Skjong-Børsting
Also, like Tyler said, you should disable BackupPC's pooling to increase
your performance.

And you must also disable compression in BackupPC, and enable it in ZFS
instead. Compressing the files will destroy your dedup potential.
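
A rough way to see this effect, sketched in Python. It uses plain zlib as a stand-in for BackupPC's compressed pool format and a 4 KiB block size, both of which are assumptions, and the log-like test data is made up. The idea: change a few bytes in place, then compare how many fixed-size blocks the old and new versions share, first raw and then after compressing each version as a whole.

```python
import hashlib
import zlib

BLOCK = 4096   # small assumed block size so the toy data spans many blocks


def blocks(data, block=BLOCK):
    """Set of per-block hashes for fixed-size blocks of `data`."""
    return {hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)}


def shared_fraction(a, b):
    """Fraction of a's distinct blocks that also occur in b."""
    blocks_a = blocks(a)
    return len(blocks_a & blocks(b)) / len(blocks_a)


def make_log(n_lines=20000):
    """Made-up, highly compressible log-like data."""
    return "".join(f"{i:08d} host backup finished ok\n"
                   for i in range(n_lines)).encode()


def demo():
    """Return (raw shared fraction, compressed shared fraction)."""
    old = make_log()
    new = bytearray(old)
    new[0:8] = b"CHANGED!"            # small in-place edit at the start
    new = bytes(new)
    raw = shared_fraction(old, new)
    packed = shared_fraction(zlib.compress(old), zlib.compress(new))
    return raw, packed


if __name__ == "__main__":
    raw, packed = demo()
    print(f"shared blocks, uncompressed: {raw:.1%}")
    print(f"shared blocks, compressed:   {packed:.1%}")
```

Uncompressed, only the block containing the edit differs. After whole-file compression an early change typically alters the rest of the stream, so far fewer blocks line up for the filesystem to deduplicate.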

--
Best regards,
Lars Tobias

Mark Campbell

2013-03-06 16:01:28 UTC

I don't mean to bring up another "RTFM" moment, but I've searched around, and I haven't found the location for enabling/disabling the pooling. The compression option I've found, but not pooling.

Thanks,

--Mark

Les Mikesell

2013-03-06 16:20:00 UTC

On Wed, Mar 6, 2013 at 10:01 AM, Mark Campbell

Post by Mark Campbell
I don't mean to bring up another "RTFM" moment, but I've searched around, and I haven't found the location for enabling/disabling the pooling. The compression option I've found, but not pooling.

I don't think you can - short of changing permissions on the
pool/cpool directories and letting your logs fill with "can't link"
errors. It would probably take a code change.

--
Les Mikesell
***@gmail.com

Mark Campbell

2013-03-06 17:43:04 UTC

I see. I just assumed that it was possible based on Lars' comments. Something I just noticed as I was browsing through the config files, could it be possible to disable this by changing $Conf{HardLinkMax} = 31999 to 1?

Thanks,

--Mark

Les Mikesell

2013-03-06 18:03:05 UTC

On Wed, Mar 6, 2013 at 11:43 AM, Mark Campbell

Post by Mark Campbell
I see. I just assumed that it was possible based on Lars' comments. Something I just noticed as I was browsing through the config files, could it be possible to disable this by changing $Conf{HardLinkMax} = 31999 to 1?

That might make things worse. Not sure, but I think when the max is
hit it makes a new pool name to become the target of additional links.

--
Les Mikesell
***@gmail.com

Mark Campbell

2013-03-06 18:16:58 UTC

Interesting. Well then I guess the answer is to not muck with pooling (as redundant as it is, at least it theoretically shouldn't hurt anything), disable compression, and enable dedup & compression on ZFS.

Thanks,

--Mark

Les Mikesell

2013-03-06 19:42:17 UTC

On Wed, Mar 6, 2013 at 12:16 PM, Mark Campbell

Post by Mark Campbell
Interesting. Well then I guess the answer is to not muck with pooling (as redundant as it is, at least it theoretically shouldn't hurt anything), disable compression, and enable dedup & compression on ZFS.

--
Les Mikesell
***@gmail.com

Holger Parplies

2013-03-07 14:15:49 UTC

Mark Campbell

2013-03-07 14:34:45 UTC

Holger,

My thinking at this point is that I'll leave the pooling be--it may require some extra CPU cycles & RAM from time to time, but my understanding of the zfs dedup & compress features is that they should be transparent to BackupPC, so while pooling in BackupPC won't avail much, it probably wouldn't hurt anything either.

Thanks,

--Mark

-----Original Message-----
From: Holger Parplies [mailto:***@parplies.de]
Sent: Thursday, March 07, 2013 9:16 AM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

Hi,

Post by Les Mikesell
On Wed, Mar 6, 2013 at 12:16 PM, Mark Campbell

Yes, I'd do that and try out the mirroring and send/receive features.
If you are sure everything else is good you can probably find the part
in the code that makes the links and remove it.

It's a bit more than one "part in the code". *New pool entries* are created by BackupPC_link, which would then be essentially unnecessary. That part is simple enough to turn off. But there's really a rather complex strategy to link to *existing pool entries*. In fact, without pooling there is not much point in using the Perl rsync implementation, for instance (well, maybe the attrib files, but then again, maybe we could get rid of them as well, if we don't use pooling). It really sounds like a major redesign of BackupPC if you want to gain all the benefits you can. Sort of like halfway to 4.0 :).
Basically, you end up with just the BackupPC scheduler, rsync (or tar or just about anything you can put into a command line) for transport, and ZFS for storage. Personally, I'd probably get rid of the attrib files (leaving plain file system snapshots easily accessible with all known tools and subject to kernel permission checking) and the whole web interface ;-). Most others will want to be able to browse backups through the web interface, which probably entails keeping attrib files (and having all files be owned by the backuppc user, just like the current situation). Then again, 'fakeroot' emulates root-type file system semantics through a preloaded library. Maybe this idea could be adapted for BackupPC to use stock tools for transport and get attrib files (and backuppc file ownership) just the same.

ZFS is an interesting topic these days. It's probably best to gain some BackupPC community experience with ZFS first, before contemplating changing BackupPC to take the most advantage. Even with BackupPC pooling in place, significant gains seem possible.

Regards,
Holger

Tyler J. Wagner

2013-03-07 15:14:14 UTC

Post by Mark Campbell
My thinking at this point is that I'll leave the pooling be--it may
require some extra CPU cycles & RAM from time to time, but my
understanding of the zfs dedup & compress features are that they should
be transparent to BackupPC, so while pooling in BackupPC won't avail
much, it probably wouldn't hurt anything either.

--
"... I've never seen the Icarus story as a lesson about the limitations of
humans. I see it as a lesson about the limitations of wax as an adhesive."
-- Randall Munroe, "XKCD What IF?: Interplanetary Cessna"

Mark Campbell

2013-03-07 16:33:22 UTC

Mark Campbell

2013-08-06 12:41:54 UTC

I thought I would give an update to this old thread. For those unfamiliar with it: I was looking for a way to synchronize BackupPC data, and we brainstormed an approach using the ZFS file system. ZFS has an rsync-like ability to send only changes, but it works at the filesystem/block level (like dd) rather than at the file level like rsync, so the problems that plague rsyncing a BackupPC pool shouldn't affect ZFS send/receive. The topic then expanded into using ZFS deduplication (which is block-based, to complement BackupPC's pooling, which is file-based) to further cut down on disk usage. It is on that subject that I wanted to report my findings.

A spare Supermicro server was recently acquired that had an Opteron quad core, 16GB of RAM, and a RAID-5 array of four 250GB drives, giving it a 750GB array (25% less than my production system, but good for an experimental setup). So I decided to use it as an experimental ZFS/BackupPC box. I started by loading CentOS 6.4 on it, installed ZFSonLinux (I know, experimental, but that's exactly what this is), and installed BackupPC. I created a single-disk ZFS pool from a partition of the array, mounted in the same location relative to / as on my production BackupPC box, and enabled both dedup & compression. I then copied /etc/BackupPC from production to my test box and modified the server config to not do any compression ($Conf{CompressLevel}=0); not doing this completely negated dedup's abilities. Once I had created the basic BackupPC pool structure on the ZFS pool, I started the BackupPC service and let her go for several days (but monitored her), accumulating the same backups that my production box does.

It should be noted that this box needed ALL 16GB of RAM for the dedup feature, but it never crashed or even used swap significantly, and performance remained reasonable. Over the course of 7 days, this box has been extremely successful with its dedup & compression features. With 25% less total disk space, I'm currently at a 2.17x dedup ratio (really good!), and it is storing nearly as many backups as production, with more space free!
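
For a sense of scale, the 2.17x figure can be turned into space numbers with a couple of lines of Python. This is only a sketch: it treats the ratio as logical data divided by physical data, applies it to the whole 750GB array, and ignores compression and metadata overhead, all of which are simplifying assumptions.

```python
def physical_fraction(dedup_ratio):
    """Share of the logical data that actually occupies disk space."""
    return 1 / dedup_ratio


def logical_capacity_gb(physical_gb, dedup_ratio):
    """Logical data that fits on `physical_gb` at the given dedup ratio."""
    return physical_gb * dedup_ratio


if __name__ == "__main__":
    ratio = 2.17
    print(f"space saved: {1 - physical_fraction(ratio):.1%}")
    print(f"750 GB holds about {logical_capacity_gb(750, ratio):.0f} GB of logical data")
```

That is a saving of a bit over half, which fits the smaller test box holding nearly as many backups as the production system.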

I have not yet had a second box to play with to do ZFS transfers, but when I do, I will report on that too.

Thanks,

--Mark

-----Original Message-----
From: Tyler J. Wagner [mailto:***@tolaris.com]
Sent: Thursday, March 07, 2013 10:14 AM
To: General list for user discussion, questions and support
Cc: Mark Campbell
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

Post by Mark Campbell
My thinking at this point is that I'll leave the pooling be--it may
require some extra CPU cycles & RAM from time to time, but my
understanding of the zfs dedup & compress features are that they
should be transparent to BackupPC, so while pooling in BackupPC won't
avail much, it probably wouldn't hurt anything either.

Except that it's the pooling (hardlinking) that makes pool synchronization suck so badly. Although perhaps ZFS mirror might make that better, I'd much rather disable pooling entirely (disable the linking process), and then just use rsync to sync the backuppc/pc tree between primary and secondary hosts.

Regards,
Tyler

--
"... I've never seen the Icarus story as a lesson about the limitations of humans. I see it as a lesson about the limitations of wax as an adhesive."
-- Randall Munroe, "XKCD What IF?: Interplanetary Cessna"

Les Mikesell

2013-03-07 16:04:02 UTC

Permalink

Post by unknown
It's a bit more than one "part in the code". *New pool entries* are created
by BackupPC_link, which would then be essentially unnecessary. That part is
simple enough to turn off. But there's really a rather complex strategy to
link to *existing pool entries*. In fact, without pooling there is not much
point in using the Perl rsync implementation, for instance (well, maybe the
attrib files, but then again, maybe we could get rid of them as well, if we
don't use pooling).

The perl rsync understands the local compression - which may also be
better handled by the file system. Clearly the snapshots of a growing
logfile could be stored more efficiently with a block level scheme -
but backuppc's checksum caching might be a win for non-changing files
in terms of processing efficiency.

Post by unknown
It really sounds like a major redesign of BackupPC if you
want to gain all the benefits you can. Sort of like halfway to 4.0 :).
Basically, you end up with just the BackupPC scheduler, rsync (or tar or just
about anything you can put into a command line) for transport, and ZFS for
storage. Personally, I'd probably get rid of the attrib files (leaving plain
file system snapshots easily accessible with all known tools and subject to
kernel permission checking) and the whole web interface ;-).

If anyone is designing for the future, I think it makes sense to split
out all of the dedup and compression operations, since odds are good
that future filesystems will handle this well and your backup system
won't be a special case. Keeping 'real' filesystem attributes is more
of a problem, since the system hosting the backups may not have the
same user base as the targets, the filesystem may not be capable of
holding the same attributes, and even if those were not a problem it
would mean the backup system would have to run as root to have full
access.

Post by unknown
Most others will
want to be able to browse backups through the web interface, which probably
entails keeping attrib files (and having all files be owned by the backuppc
user, just like the current situation). Then again, 'fakeroot' emulates
root-type file system semantics through a preloaded library.

That's interesting - it would be nice to have a user-level abstraction
where a non-admin web user could access things with approximately the
permissions he would have on the source host.

Post by unknown
Maybe this idea
could be adapted for BackupPC to use stock tools for transport and get attrib
files (and backuppc file ownership) just the same.
ZFS is an interesting topic these days. It's probably best to gain some
BackupPC community experience with ZFS first, before contemplating changing
BackupPC to take the most advantage. Even with BackupPC pooling in place,
significant gains seem possible.

Hmmm, maybe something even more extreme for the future would be to
work out a way to have snapshots of virtual-machine images updated
with block-level file pooling. Then, assuming appropriate network
connectivity, you'd have the option of firing up the VM as an instant
replacement instead of rebuilding/restoring a failed host.

--
Les Mikesell
***@gmail.com

Mark Campbell

2013-03-07 14:32:02 UTC

Permalink

I realize that it's probably not considered "production" yet, but I was considering zfsonlinux on top of CentOS. All my linux servers at this time run CentOS (my current BackupPC implementation included), and this is just a natural extension of that.

Thanks,

--Mark

-----Original Message-----
From: Les Mikesell [mailto:***@gmail.com]
Sent: Wednesday, March 06, 2013 2:42 PM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

Yes, I'd do that and try out the mirroring and send/receive features.
If you are sure everything else is good you can probably find the part
in the code that makes the links and remove it. What zfs-supporting
platform are you planning to use?

--
Les Mikesell
***@gmail.com

_______________________________________________
BackupPC-users mailing list
BackupPC-***@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

Les Mikesell

2013-03-07 15:42:48 UTC

Permalink

Post by Mark Campbell
I realize that it's probably not considered "production" yet, but I was considering zfsonlinux on top of CentOS. All my linux servers at this time run CentOS (my current BackupPC implementation included), and this is just a natural extension of that.

I wouldn't consider anything that has not been in wide use for several
years to store backups. If you ever look in project changelogs at the
number and severity of bugs that are still being fixed long after any
complex code is shipped it will scare you off. I've always
considered it to be extremely unfortunate how the GPL prevents
assembling 'best-of-breed' components together for distribution.

--
Les Mikesell
***@gmail.com

Lars Tobias Skjong-Børsting

2013-03-06 21:47:11 UTC

Permalink

Post by Tyler J. Wagner
Just disable BackupPC's pooling entirely. You'd have to disable
BackupPC_nightlyAdmin, and the link process after completing the dump
stage.
Also, like Tyler said, you should disable BackupPC's pooling to
increase your performance.
I see. I just assumed that it was possible based on Lars' comments.

I didn't know how to do it myself and I just referred to Tyler's advice
in the text above. It sounds like it almost certainly involves some Perl
coding or at the very least commenting out some code.

--
Best regards,
Lars Tobias

b***@kosowsky.org

2013-03-08 01:20:31 UTC

Permalink

Adam Goryachev

2013-03-08 01:34:03 UTC

Permalink

Post by b***@kosowsky.org

Indeed, perhaps a better solution would be to simply rm -rf the pool
every night. Still run the nightly script to send warnings about
machines not backed up/etc

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au

Mark Campbell

2013-03-08 14:17:30 UTC

Permalink

My apologies. I was just going by what everyone else seemed to be recommending at the time, and when browsing the config, I happened upon a variable that I wondered might accomplish what they were advocating. If you read the rest of the thread, you'll see that I abandoned that idea, since ZFS's mechanisms would very likely be transparent to BackupPC anyway.

Thanks,

--Mark

-----Original Message-----
From: ***@kosowsky.org [mailto:***@kosowsky.org]
Sent: Thursday, March 07, 2013 8:21 PM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

This must be one of the absolute WORST ideas I have heard in a long time...

First, setting it to 1 would mean nothing could hard link to the pool, not even the first backup, so BackupPC would probably crash/thrash immediately. You would at least want to set it to 2 since the pool file itself is a link.

Furthermore, doing so would really slow things down since now every duplicated file would lead to a chain that would need to be walked when adding/comparing a new file. Also, BackupPC_nightly would slow down as chains would need to be renumbered now most of the time instead of by exception.

Plus, there might be cases where HardLinkMax will fail if set to 1.

Most of all, why even have a pool if only one file ever links to it?


Timothy J Massey

2013-03-06 16:14:24 UTC

Permalink

Post by Mark Campbell
I don't mean to bring up another "RTFM" moment, but I've searched
around, and I haven't found the location for enabling/disabling the
pooling. The compression option I've found, but not pooling.

There is no way of doing this. If you don't want pooling, you don't want
BackupPC. BackupPC without pooling equals rsync (or tar or cp or...).

What *exactly* are you trying to accomplish by turning off pooling?

Tim Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
***@obscorp.com

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796

Les Mikesell

2013-03-06 16:52:58 UTC

Permalink

Post by Timothy J Massey

Post by Mark Campbell
I don't mean to bring up another "RTFM" moment, but I've searched
around, and I haven't found the location for enabling/disabling the
pooling. The compression option I've found, but not pooling.

There's a long thread here - the idea is to run backuppc (for its many
other features) with the archive on a zfs filesystem with block-level
dedup (and probably compression) enabled. That should actually be
transparent to backuppc's hardlink dedup but it would be unnecessary
and redundant. Turning off compression would allow dedup of matching
chunks within files where backuppc will store whole copies with slight
differences. The real win is (may be?) getting the more
intelligent zfs mirror rebuild for offsite swaps and the incremental
send/receive to update remote backups.
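
A minimal sketch of what that incremental send/receive step could look like. The dataset names, snapshot names and remote host are made-up placeholders, and the commands are only printed unless DRY_RUN=0 is set:

```shell
# Sketch only: dataset names and remote host are assumed placeholders.
DATASET=${DATASET:-tank/backuppc}
REMOTE=${REMOTE:-offsite.example.com}
REMOTE_DATASET=${REMOTE_DATASET:-backup/backuppc}

# Print the command line; execute it through sh only when DRY_RUN=0.
run() {
    echo "+ $1"
    if [ "${DRY_RUN:-1}" = "0" ]; then sh -c "$1"; fi
}

# sync_offsite PREV NEW: snapshot the dataset as NEW, then send only the
# blocks that changed between the PREV and NEW snapshots.
sync_offsite() {
    run "zfs snapshot $DATASET@$2"
    run "zfs send -i $DATASET@$1 $DATASET@$2 | ssh $REMOTE zfs receive -F $REMOTE_DATASET"
}
```

For example, `sync_offsite daily-1 daily-2` assumes that `daily-1` already exists on both sides from an earlier full `zfs send`.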

--
Les Mikesell
***@gmail.com

Trey Dockendorf

2013-03-03 07:02:47 UTC

Permalink

Post by Mark Campbell
Lars,
Thanks for the interesting idea! I confess I haven't played with ZFS much (though I've been wanting to for some time), maybe this is the excuse I need ;). Question, taking your model here, and applying it to my BackupPC server, with a RAID1 zpool, with the third member being my external fireproof drive. Rather than the rotation you described, just leave it as is as it does its daily routine. Then, should the day come where I need to grab the drive and go, plugging the drive into a system with ZFSonLinux & BackupPC installed, could I mount this drive by itself?
I really like your idea of zfs send/receive for the remote copy. Do you have any tips/pointers/docs on the best way to run it in this scenario?
Thanks,
--Mark
-----Original Message-----
Sent: Friday, March 01, 2013 4:18 AM
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
Hi,

On Thu, Feb 28, 2013 at 3:10 PM, Mark Campbell <

I have come up with a IMHO good way to do this using ZFS (ZFSonLinux).
* uses 3 disks.
* at all times, keep 1 mirrored disk in a fire safe.
* periodically swap the safe disk with mirror in server.
1. create a zpool with three mirrored members.
2. create a filesystem on it and mount at /var/lib/backuppc.
3. do some backups.
4. detach one disk and put in safe.
5. do more backups.
6. detach one disk and swap with the other disk in the safe.
7. attach and online the disk from the safe.
8. watch it sync up.
I am currently using 2TB disks, and swap period of 1 month. Because of ZFS it doesn't need to sync all the blocks, but only the changed blocks since 1 month ago. For example, with 10GB changed it will sync in less than 25 minutes (approx. 7 MB/s speed). That's a lot faster than anything I got with mdraid which syncs every block.
ZFS also comes with benefits of checksumming and error correction of file content and file metadata. BackupPC also supports error correction through par2, and this gives an extra layer of data protection.
Backing up large numbers of files can take a very long time because of harddisk seeking. This can be alleviated by using a SSD cache drive for ZFS. This support for read (ZFS L2ARC) and write (ZFS ZIL) caching on a small SSD (30 GB) cuts incremental time down to half for some shares.
As for remote sync, you can use "zfs send" on the backup server and "zfs receive" on the offsite server. This will only send the differences since last sync (like rsync), and will probably be significantly faster than rsync that in addition has to resolve all the hardlinks.
--
Best regards,
Lars Tobias


+1 for ZFS as a means to replicate the pool without lots of rsyncing.
However the checksumming in ZFS only takes place on RAIDZ sets. ZFS
mirroring (RAID 1) does not do checksum verification. You would have to
use RAIDZ1 (RAID 5) , RAIDZ2 (RAID 6) or RAIDZ3 (triple parity) to benefit
from checksum verification.

Lars Tobias Skjong-Børsting

2013-03-03 11:20:49 UTC

Permalink

Post by Trey Dockendorf
However the checksumming in ZFS only takes place on RAIDZ sets.

ZFS actually always does checksumming and error detection, also on RAID1 and
RAID0. For RAID0 there is no redundant data to attempt correction with.
For RAID1 there is a copy of the data to use for correction.

Post by Trey Dockendorf
ZFS mirroring (RAID 1) does not do checksum verification.

That's wrong, it certainly does, also with error correction.

Read up on it here:
https://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data

--
Best regards,
Lars Tobias

Trey Dockendorf

2013-03-03 19:26:23 UTC

Permalink

Post by Lars Tobias Skjong-Børsting

Post by Trey Dockendorf
However the checksumming in ZFS only takes place on RAIDZ sets.

Post by Trey Dockendorf
ZFS mirroring (RAID 1) does not do checksum verification.

That's wrong, it certainly does, also with error correction.
https://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data
--
Best regards,
Lars Tobias


My mistake, not sure where that incorrect assumption came from. So a
mirror would achieve the desired result of OP.
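
For reference, the three-disk mirror rotation quoted above could be sketched roughly as follows. The pool name and device paths are placeholders. Lars's steps say "detach"; this sketch assumes `zpool offline` / `zpool online` instead, on the understanding that an offlined disk stays a pool member and so only needs the changed blocks resynced. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Sketch only: pool name and disk paths are assumed placeholders.
POOL=${POOL:-tank}

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# create_mirror DISK1 DISK2 DISK3: three-way mirror holding the BackupPC data
create_mirror() {
    run zpool create "$POOL" mirror "$1" "$2" "$3"
    run zfs create -o mountpoint=/var/lib/backuppc "$POOL/backuppc"
}

# swap_out DISK: take DISK offline before moving it to the safe
swap_out() {
    run zpool offline "$POOL" "$1"
}

# swap_in DISK: bring the disk from the safe back online and watch the resync
swap_in() {
    run zpool online "$POOL" "$1"
    run zpool status "$POOL"
}
```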

b***@kosowsky.org

2013-03-01 02:43:05 UTC

Permalink

Post by Mark Campbell
So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD RAID1 array to an external Fireproof drive (with plans to also sync to a remote server at our collo). I found the script BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few examples I've seen online have indicated to me that this isn't quite what I'm looking for, since it appears to output it to a different layout. I initially tried the rsync method with -H, but my server would end up choking at 350GB. Any suggestions on how to do this?

The bottom line is that other than doing a block level file system
copy there is no "free lunch" that gets around the hard problem of
copying over densely hard-linked files.

As many like yourself have noted, rsync bogs down using the -H (hard
links) flag, in part because rsync knows nothing of the special structure
of the pool & pc trees so it has to keep full track of all possible
hard links.

One solution is BackupPC_tarPCCopy which uses a tar-like perl script
to track and copy over the structure.

My script BackupPC_copyPcPool tries to combine the best of both
worlds. It allows you to use rsync or even "cp -r" to copy over the
pool disregarding any hard links. The pc tree with its links to the
pool are re-created by creating a flat file listing all the links,
directories, and zero size files that comprise the pc tree. This is
done with the help of a hash that caches the inode number of each pool
entry. The pc tree is then recreated by sequentially (re)creating
directories, zero size files, and links to the pool.
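
To illustrate the inode-lookup idea only (this is not the BackupPC_copyPcPool code, which is Perl and uses a packed in-memory hash): a small shell sketch that records the inode of every pool file and then reports which pc-tree files share one of those inodes. It assumes GNU find (for -printf) and file names without tabs or newlines; the function name is made up.

```shell
# Sketch only: list_pc_links POOL_DIR PC_DIR
# Prints "pc_path -> pool_path" for every pc-tree file hard-linked to a pool file.
list_pc_links() {
    {
        # P lines: inode and path of each pool entry
        find "$1" -type f -printf 'P\t%i\t%p\n'
        # C lines: inode and path of each pc-tree file
        find "$2" -type f -printf 'C\t%i\t%p\n' | sort
    } | awk -F'\t' '
        $1 == "P" { pool[$2] = $3; next }            # remember inode -> pool path
        ($2 in pool) { print $3 " -> " pool[$2] }    # pc file sharing a pool inode
    '
}
```

Files in the pc tree that are not linked into the pool are simply not listed.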

I have substantially re-written my original script to make it orders
of magnitude faster by substituting a packed in-memory hash for the
file-system inode-tree I used in the previous version. Several other
improvements have been added, including the ability to record full
file md5sums and to fix broken/missing links.

I was able to copy over a BackupPC tree consisting of 1.3 million pool
files (180 GB) and 24 million pc tree entries (4 million directories, 20
million links, 300 thousand zero-length files) in the following time:

~4 hours to copy over the pool
~5 hours to create the flat file mapping out the pc tree directories,
hard links & zero length files
~7 hours to convert the flat file into a new pc tree on the target filesystem

These numbers are approximate since I didn't really time it. But it
was all done on a low end AMD dual-core laptop with a single USB3
drive.

For this case, the flat file of links/directories/zero length files is 660 MB
compressed (about 3.5 GB uncompressed). The inode caching requires about
250MB of RAM (mostly due to perl overhead) for the 1.3 million pool
files.

Note, before I release the revised script, I also hope to add a feature that
allows the copying of one or more backups from the pc tree on one
machine to the pc tree on another machine (with a different
pool). This feature is not available on any other backup scheme... and
effectively will allow "incremental-like" backups.

I also plan to allow the option to more tightly pack the inode caching
to save memory at the expense of some speed. I should be able to fit
10 million pool nodes in a 300MB cache.

I would like to benchmark my revised routine against
BackupPC_tarPCCopy in terms of speed, memory requirement, and
generated file size...

Les Mikesell

2013-03-01 03:14:08 UTC

Permalink

Post by b***@kosowsky.org
Note, before I release the revised script, I also hope to add a feature that
allows the copying of one or more backups from the pc tree on one
machine to the pc tree on another machine (with a different
pool). This feature is not available on any other backup scheme... and
effectively will allow "incremental-like" backups.

That could also be extremely handy when migrating to a new server - I
always have a lot of machines where I don't care about old history
mixed in with the few where I do.

--
Les Mikesell
***@gmail.com

Mark Campbell

2013-03-01 21:26:35 UTC

Permalink

I find myself rather surprised that this is a major issue in what is otherwise a really good enterprise-level backup tool. Synchronizing backups just seems to be a basic element of the idea of backups in a corporate environment. Should the building that my backup server resides in burn down, get hit by a tornado, etc., there should be a process whereby you can have a synchronized backup elsewhere. Also by extension, what happens when you want to have a "cluster" of BackupPC?

The idea that you just run two BackupPC servers each running their own backups may work in some cases, but you are talking about double the transfers on the machines being backed up, and that can be unacceptable in some cases. For example, one of my machines being backed up is a linux server acting as a network drive. Backups of this can take a long time; BackupPC tells me 514 minutes for its last full backup (naturally, this occurs after business hours). Once it's been backed up, it's been deduped & compressed. It would ideally be better, even on a LAN, to transfer this compressed & deduped pool than it would be to back it up twice on the same day. In the case of my network drive, worst case it gets bogged down 8hrs a day for backup. I have a small space of time that is considered "off hours" for it. My backup server, on the other hand, can be bogged down 24 hrs/day for all I care; no one else is using its services but me.

Jeffrey, what is your latest version of your script? I have 0.1.3, circa Sept '11. Given how your script generally works, could it be made to simply recreate the pool structure on an external drive on the same system, rather than compressing it to a tarball? My end goal here is to be able to simply grab the external drive at a moment's notice, plug it into a new linux machine, and, using a tarball of the BackupPC config files, stand it up long enough to restore everyone's PCs & appropriate servers.

Greg, I would definitely have an interest in seeing the script; anything that will help me achieve a tertiary remote backup...

Thanks,

--Mark

-----Original Message-----
From: ***@kosowsky.org [mailto:***@kosowsky.org]
Sent: Thursday, February 28, 2013 9:43 PM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

Post by Mark Campbell
So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD RAID1 array to an external Fireproof drive (with plans to also sync to a remote server at our collo). I found the script BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few examples I've seen online have indicated to me that this isn't quite what I'm looking for, since it appears to output it to a different layout. I initially tried the rsync method with -H, but my server would end up choking at 350GB. Any suggestions on how to do this?

The bottom line is that other than doing a block level file system copy there is no "free lunch" that gets around the hard problem of copying over densely hard-linked files.

As many like yourself have noted, rsync bogs down using the -H (hard
links) flag, in part because rsync knows nothing of the special structure of the pool & pc trees so it has to keep full track of all possible hard links.

One solution is BackupPC_tarPCCopy which uses a tar-like perl script to track and copy over the structure.

My script BackupPC_copyPcPool tries to combine the best of both worlds. It allows you to use rsync or even "cp -r" to copy over the pool disregarding any hard links. The pc tree with its links to the pool are re-created by creating a flat file listing all the links, directories, and zero size files that comprise the pc tree. This is done with the help of a hash that caches the inode number of each pool entry. The pc tree is then recreated by sequentially (re)creating directories, zero size files, and links to the pool.

I have substantially re-written my original script to make it orders of magnitude faster by substituting a packed in-memory hash for the file-system inode-tree I used in the previous version. Several other improvements have been added, including the ability to record full file md5sums and to fix broken/missing links.

I was able to copy over a BackupPC tree consisting of 1.3 million pool files (180 GB) and 24 million pc tree entries (4 million directories, 20 million links, 300 thousand zero-length files) in the following time:

~4 hours to copy over the pool
~5 hours to create the flat file mapping out the pc tree directories,
hard links & zero length files
~7 hours to convert the flat file into a new pc tree on the target filesystem

These numbers are approximate since I didn't really time it. But it was all done on a low end AMD dual-core laptop with a single USB3 drive.

For this case, the flat file of links/directories/zero length files is 660 MB compressed (about 3.5 GB uncompressed). The inode caching requires about 250MB of RAM (mostly due to perl overhead) for the 1.3 million pool files.

Note, before I release the revised script, I also hope to add a feature that allows the copying of one or more backups from the pc tree on one machine to the pc tree on another machine (with a different pool). This feature is not available on any other backup scheme... and effectively will allow "incremental-like" backups.

I also plan to allow the option to more tightly pack the inode caching to save memory at the expense of some speed. I should be able to fit
10 million pool nodes in a 300MB cache.

I would like to benchmark my revised routine against BackupPC_tarPCCopy in terms of speed, memory requirement, and generated file size...


John Rouillard

2013-03-01 22:14:36 UTC

Permalink

Post by Mark Campbell
I find myself rather surprised that this is a major issue in what is
otherwise a really good enterprise-level backup tool. Synchronizing
backups just seems to be a basic element of the idea of backups in a
corporate environment. Should the building that my backup server
resides in burn down, get hit by a tornado, etc., there should be a
process whereby you can have a synchronized backup elsewhere. Also by
extension, what happens when you want to have a "cluster" of BackupPC?

Handling this at the device/block level with zfs send/receive, DRBD
etc. is another way to handle the sync.

I had some luck running DRBD across a simulated laggy WAN (using WANEM
to simulate the wan) with a subset of 50 or so hosts being backed
up. The compress cycle after each backup did bog things down a bit
though.

--
-- rouilj

John Rouillard System Administrator
Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111

b***@kosowsky.org

2013-03-01 22:41:48 UTC

Permalink

Post by Mark Campbell
Jeffrey, what is your latest version of your script? I have 0.1.3, circa Sept '11. Given how your script generally works, could it be made to simply recreate the pool structure on an external drive on the same system, rather than compressing it to a tarball? My end goal here is to be able to simply grab the external drive at a moment's notice, plug it into a new linux machine, and using a tarball of the BackupPC config files, and stand it up long enough to restore everyone's PCs & appropriate servers.

Well there is no need for a tarball.
You rsync or cp the pool (without paying any attention to hard
links)... so this can even be done incrementally or over a ssh (or
netcat) pipe.

My program then (quickly) crawls the pool to create an in-memory inode
hash of the pool (which can be saved to a file too for reuse). Then
the program crawls all (or some) of the pc tree to create a flat file
specifying the directories, zeros, and links. Then you run the same
program in restore mode on the new machine to re-create the directory
tree, zero files, and links.

The only storage required is for the links file -- which for me
compressed took about 600MB to store 4M directories, 20M links and
300K zero length files. This plus the unpooled log files at the root
of each machine (small!) plus the (incremental) rsync of the pool is
all that needs to be transferred between machines to do a full
BackupPC tree copy.

Arnold Krille

2013-02-28 22:56:56 UTC

Permalink

On Thu, 28 Feb 2013 14:10:13 -0700 Mark Campbell

Post by Mark Campbell
So I'm trying to get a BackupPC pool synced on a daily basis from a
1TB MD RAID1 array to an external Fireproof drive (with plans to also
sync to a remote server at our collo). I found the script
BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few
examples I've seen online have indicated to me that this isn't quite
what I'm looking for, since it appears to output it to a different
layout. I initially tried the rsync method with -H, but my server
would end up choking at 350GB. Any suggestions on how to do this?

Create a snapshot from the underlying lvm-volume and then copy / zip
that snapshot directly.

Or use BackupPC's 'archive' method to write full tar.gz of your hosts
to your external disk. We are using that to write tgz to a directory
where amanda then writes these to tape...
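
A rough sketch of the snapshot-and-copy variant. The volume group, logical volume, snapshot size and target path are all made-up placeholders, and the commands are only printed unless DRY_RUN=0 is set:

```shell
# Sketch only: VG, LV, SNAP, SNAP_SIZE and TARGET are assumed placeholders.
VG=${VG:-vg0}
LV=${LV:-backuppc}
SNAP=${SNAP:-backuppc-snap}
SNAP_SIZE=${SNAP_SIZE:-20G}
TARGET=${TARGET:-/mnt/external/backuppc.img.gz}

# Print the command line; execute it through sh only when DRY_RUN=0.
run() {
    echo "+ $1"
    if [ "${DRY_RUN:-1}" = "0" ]; then sh -c "$1"; fi
}

snapshot_copy() {
    run "lvcreate --snapshot --size $SNAP_SIZE --name $SNAP /dev/$VG/$LV"   # freeze a point-in-time view
    run "dd if=/dev/$VG/$SNAP bs=4M | gzip -c > $TARGET"                    # copy / zip the snapshot
    run "lvremove -f /dev/$VG/$SNAP"                                        # drop the snapshot again
}
```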

Have fun,

Arnold

gregrwm

2013-03-01 10:24:10 UTC

Permalink

i'm using a simple procedure i cooked up to maintain a "third copy" at a
third physical location using as little bandwidth as possible. it simply
looks at each pc/*/backups, selects the most recent full and most recent
incremental (plus any partial or /new), and copies them across the wire,
together with the most recently copied full&incremental set (plus any
incompletely copied sets), using rsync, with its hardlink copying
feature. thus my third location has a copy of the most recent (already
compressed) pc/ tree data, using rsync to avoid copying stuff over the wire
that's already there (and not bothering with the cpool), which, for me, is
a happily sized set of hardlinks that rsync can actually manage (ymmv). i
have successfully used this together with a script to recreate the cpool
if/when necessary. if it's of interest i could share it.
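
The selection step on its own can be sketched like this (a simplified illustration, not the script offered above). It assumes each pc/*/backups file is tab-separated with the backup number in the first field and the type in the second, listed oldest first; the function name is made up.

```shell
# Sketch only: latest_backups FILE
# Prints the number of the most recent full and most recent incr backup.
latest_backups() {
    awk -F'\t' '
        $2 == "full" { full = $1 }    # later lines overwrite earlier ones
        $2 == "incr" { incr = $1 }
        END {
            if (full != "") print "full " full
            if (incr != "") print "incr " incr
        }' "$1"
}
```

The two numbers can then be turned into rsync --include patterns for that host's directory.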

John Rouillard

2013-03-01 19:58:39 UTC

Permalink

I think this fills a useful use case, so yeah I would say send it to
the mailing list.

--
-- rouilj

John Rouillard System Administrator
Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111

gregrwm

2013-03-03 00:51:47 UTC

Permalink

Post by John Rouillard

Post by gregrwm
a happily sized set of hardlinks that rsync can actually manage (ymmv).
[...] if it's of interest i could share it.

I think this fills a useful use case, so yeah I would say send it to
the mailing list.

#currently running as "pull", tho can run as "push" with minor mods:
#(root bash)
rbh=remote.backuppc.host
b=/var/lib/BackupPC/pc
cd /local/third/copy
sb='sudo -ubackuppc'
ssp="$sb ssh -p2222" #nonstandard ssh port
ssb="$ssp $rbh cd $b;" #trailing ; so the remote shell runs cd before the commands appended to it
from_to=("$rbh:$b/*" .)
fob=$ssb
df=($(df -m .))
prun= #prun=--delete-excluded to prune /local/third/copy down to most recent backups only
echo df=${df[10]} prun=$prun #show current filespace and prun setting
[ ! -s iMRFIN ]&&{ touch iMRFIN ||exit $?;} #most recent finished set
[ ! -s iMRUNF ]&&{ touch iLRUNF ||exit $?;}||{ cat iMRUNF>>iLRUNF||exit $?;} #most recent and less recent unfinished sets
$fob 'echo " --include=*/new" #any unfinished backups
for m in */backups;do unset f i #look at all pc/*/backups files
while read -r r;do r=($r)
[[ ${r[1]} = full ]]&&fu[f++]=$r
[[ ${r[1]} = incr ]]&&in[i++]=$r
[[ ${r[1]} = partial ]]&&echo " --include=${m%backups}$r" #any incomplete backups
done < $m
[[ $f -gt 0 ]]&&echo " --include=${m%backups}${fu[f-1]}" #most recent full
[[ $i -gt 0 ]]&&echo " --include=${m%backups}${in[i-1]}" #most recent incremental
done'>| iMRUNF ||echo badexit;head -99 i* #show backup sets included for transfer
rc=255;while [[ $rc = 255 ]];do date #reconnect if 255(connection dropped)
#note some special custom excludes are on a separate line
rsync -qPHSae"$ssp" --rsync-path="sudo rsync" $(cat iMRFIN iLRUNF iMRUNF) $prun --exclude="/*/*/" \
--exclude=fNAVupdate --exclude=fDownloads --exclude=\*Personal --exclude="*COPY of C*" \
"${from_to[@]}"
rc=$?;echo rc=$rc;if [ $rc = 0 ];then mv iMRUNF iMRFIN;rm iLRUNF;fi;done;df -m .
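
For readers who find the selection step in the script hard to follow, here is a standalone sketch of just that part, written with awk instead of bash arrays. It is not the author's code. It assumes, as the script does, that the first field of each line in a pc/HOST/backups file is the backup number and the second is the type. The function name `select_sets` and the sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints rsync --include patterns for one host's "backups" file.
# Assumes field 1 is the backup number and field 2 the backup type.
select_sets() {   # usage: select_sets HOSTNAME < pc/HOSTNAME/backups
  awk -v host="$1" '
    $2 == "full"    { full = $1 }                        # remember the latest full
    $2 == "incr"    { incr = $1 }                        # remember the latest incremental
    $2 == "partial" { print "--include=" host "/" $1 }   # always take incomplete backups
    END {
      if (full != "") print "--include=" host "/" full
      if (incr != "") print "--include=" host "/" incr
    }'
}

# Example with a made-up backups file (number, type, start time).
printf '%s\n' '0 full 1361900000' '1 incr 1361990000' '2 incr 1362080000' \
              '3 full 1362170000' '4 incr 1362260000' | select_sets pc1
```

With the sample input the function keeps only backups 3 (latest full) and 4 (latest incremental) for host pc1.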

b***@kosowsky.org

2013-03-01 21:32:16 UTC

Post by gregrwm
i'm using a simple procedure i cooked up to maintain a "third copy" at a
third physical location using as little bandwidth as possible. it simply
looks at each pc/*/backups, selects the most recent full and most recent
incremental (plus any partial or /new), and copies them across the wire,
together with the most recently copied full&incremental set (plus any
incompletely copied sets), using rsync, with its hardlink copying
feature. thus my third location has a copy of the most recent (already
compressed) pc/ tree data, using rsync to avoid copying stuff over the wire
that's already there (and not bothering with the cpool), which, for me, is
a happily sized set of hardlinks that rsync can actually manage (ymmv). i
have successfully used this together with a script to recreate the cpool
if/when necessary. if it's of interest i could share it.

One caution: If one is managing multiple pc's with redundant files across them
(e.g., OS, apps), then you will waste a lot of bandwidth (and time)
copying them since you will lose the pooling. Alternatively, if you
use rsync with the -H flag, then you are back to the problem of rsync
choking on hardlinks.

Les Mikesell

2013-03-02 16:20:58 UTC

Post by b***@kosowsky.org

But, you'd split the difference of these problems if you 'rsync -H' a
single pc tree or just the recent runs at a time. Then the inode
table to track the links would be smaller and less likely to cause
trouble - and most of the hard links are to previous runs of the same
file anyway. And even backuppc itself won't identify the duplicates
before the transfer in each new location.
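
A minimal sketch of the one-tree-at-a-time idea follows. The source path, the remote host `offsite.example.com` and the destination directory are assumptions. Note that hard links between different hosts' trees are not carried over this way, so files shared across hosts are stored once per host on the destination. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: SRC, DEST and the remote host name are hypothetical.
SRC=${SRC:-/var/lib/BackupPC/pc}
DEST=${DEST:-offsite.example.com:/srv/backuppc-copy/pc}

# Print each command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

sync_per_host() {
  for dir in "$SRC"/*/; do
    [ -d "$dir" ] || continue          # skip if the glob matched nothing
    host=$(basename "$dir")
    # One rsync per host keeps the hard-link table limited to that host's tree.
    run rsync -aH "$dir" "$DEST/$host/"
  done
}

sync_per_host
```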

--
Les Mikesell
***@gmail.com

Adam Goryachev

2013-02-28 22:49:21 UTC

Post by Mark Campbell
So I'm trying to get a BackupPC pool synced on a daily basis from a
1TB MD RAID1 array to an external Fireproof drive (with plans to also
sync to a remote server at our collo). I found the script
BackupPC_CopyPcPool.pl by Jeffrey, but the syntax and the few examples
I've seen online have indicated to me that this isn't quite what I'm
looking for, since it appears to output it to a different layout. I
initially tried the rsync method with -H, but my server would end up
choking at 350GB. Any suggestions on how to do this?

The best option I've found if using an external drive of equal size to
the pool is to use Linux md RAID1, and use the --write-mostly on the
external drive. Make sure you enable bitmaps on the RAID1 array, and
after you rotate drives, you may not need to resync the entire content.
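
As a sketch of what that could look like with mdadm: the device names /dev/md0 and /dev/sdc1 are assumptions, and it also assumes the array has a free slot for the external disk and does not yet have a bitmap. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: /dev/md0 and /dev/sdc1 are hypothetical device names.
MD=${MD:-/dev/md0}
EXT=${EXT:-/dev/sdc1}

# Print each command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

attach_external() {
  # A write-intent bitmap lets md resync only the blocks that changed.
  run mdadm --grow "$MD" --bitmap=internal
  # write-mostly keeps normal reads off the slower external disk.
  run mdadm "$MD" --add --write-mostly "$EXT"
}

detach_external() {
  # Mark the external member failed, then remove it before unplugging.
  run mdadm "$MD" --fail "$EXT"
  run mdadm "$MD" --remove "$EXT"
}

reattach_external() {
  # Re-adding the same disk lets md use the bitmap for a partial resync.
  run mdadm "$MD" --re-add "$EXT"
}

attach_external
```

The partial resync only helps when the same disk comes back; a different disk rotated in still needs a full sync.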

For offsite, you can use something like Linux md RAID1 over the top of
NBD, ENBD (or whatever it is called), DRBD, etc. However, how well this
works depends on the speed and reliability of your remote connection, and
it will most likely degrade performance significantly.

There have been plenty of discussions on this topic on the list over the
years, so try searching the archives: there are lots of options which work
for different people, and plenty of pros and cons for each method which
have already been discussed.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au

unknown

1970-01-01 00:00:00 UTC

Hi,

Post by Les Mikesell
On Wed, Mar 6, 2013 at 12:16 PM, Mark Campbell

Yes, I'd do that and try out the mirroring and send/receive features.
If you are sure everything else is good you can probably find the part
in the code that makes the links and remove it.

It's a bit more than one "part in the code". *New pool entries* are created
by BackupPC_link, which would then be essentially unnecessary. That part is
simple enough to turn off. But there's really a rather complex strategy to
link to *existing pool entries*. In fact, without pooling there is not much
point in using the Perl rsync implementation, for instance (well, maybe the
attrib files, but then again, maybe we could get rid of them as well, if we
don't use pooling). It really sounds like a major redesign of BackupPC if you
want to gain all the benefits you can. Sort of like halfway to 4.0 :).
Basically, you end up with just the BackupPC scheduler, rsync (or tar or just
about anything you can put into a command line) for transport, and ZFS for
storage. Personally, I'd probably get rid of the attrib files (leaving plain
file system snapshots easily accessible with all known tools and subject to
kernel permission checking) and the whole web interface ;-). Most others will
want to be able to browse backups through the web interface, which probably
entails keeping attrib files (and having all files be owned by the backuppc
user, just like the current situation). Then again, 'fakeroot' emulates
root-type file system semantics through a preloaded library. Maybe this idea
could be adapted for BackupPC to use stock tools for transport and get attrib
files (and backuppc file ownership) just the same.

ZFS is an interesting topic these days. It's probably best to gain some
BackupPC community experience with ZFS first, before contemplating changing
BackupPC to take the most advantage. Even with BackupPC pooling in place,
significant gains seem possible.
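
For anyone wanting to try the ZFS send/receive feature mentioned in this thread with the pool left as it is, a minimal sketch of an incremental replication might look like this. The dataset name `tank/backuppc`, the remote host and the snapshot names are assumptions, and it presumes an initial full send has already been received on the remote side. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: dataset, remote host and snapshot names are hypothetical.
POOLFS=${POOLFS:-tank/backuppc}
REMOTE=${REMOTE:-offsite.example.com}

# Print each command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else eval "$*"; fi
}

replicate() {   # usage: replicate PREVIOUS_SNAPSHOT NEW_SNAPSHOT
  # Take a new point-in-time snapshot of the BackupPC dataset.
  run "zfs snapshot $POOLFS@$2" || return
  # Send only the blocks changed since the previous snapshot.
  run "zfs send -i $POOLFS@$1 $POOLFS@$2 | ssh $REMOTE zfs receive -F $POOLFS"
}

replicate 2013-03-05 2013-03-06
```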

Regards,
Holger