Discussion:
[PATCH] [WIP] AD DC backup and restore tool
Andrew Bartlett via samba-technical
2018-05-16 08:30:16 UTC
G'Day,

Just a heads up that Tim and I plan to finish up the backup tool soon.

Given the strong feedback so far, it will include the restore tool.
However, it won't include the extended attributes (file permissions on
the backup of the [netlogon share]).

Handling the extended attributes turns out to be harder than you might
expect. While it is 'just' a couple more options to a tar command,
testing it runs up against the fake xattrs we use in our selftest
environment, which are not visible to tar.

Likewise, for the online backup, the ideal option would be to query
these over SMB and store the NT ACL directly into the tar xattrs via
the Python API. However, this isn't available until Python 3.x.
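
For readers wondering what storing the NT ACL "directly into the tar xattrs via the Python API" might look like, here is a minimal sketch using Python 3's tarfile module. It assumes the GNU tar convention of recording xattrs as SCHILY.xattr.<name> PAX headers, and assumes the ACL blob (Samba's security.NTACL xattr) has already been fetched; the SMB query itself is not shown. The function names are hypothetical and not part of the proposed tool.

```python
import io
import tarfile

# Assumption: GNU tar records extended attributes as PAX headers named
# "SCHILY.xattr.<name>"; Samba keeps NT ACLs in the "security.NTACL" xattr.
NTACL_KEY = "SCHILY.xattr.security.NTACL"


def add_file_with_ntacl(tar, name, data, ntacl_blob):
    """Add a regular file plus a raw NT ACL blob as a PAX xattr header."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    # Binary bytes survive as surrogate-escaped text; tarfile then writes the
    # header with hdrcharset=BINARY and re-encodes the original bytes.
    info.pax_headers = {NTACL_KEY: ntacl_blob.decode("utf-8", "surrogateescape")}
    tar.addfile(info, io.BytesIO(data))


def read_ntacl(tar, name):
    """Return the raw NT ACL blob stored for a member, or None if absent."""
    value = tar.getmember(name).pax_headers.get(NTACL_KEY)
    if value is None:
        return None
    return value.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    buf = io.BytesIO()
    blob = b"\x01\x04\x80\xfe\xff"  # placeholder bytes, not a real ACL
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      encoding="utf-8", errors="surrogateescape") as tar:
        add_file_with_ntacl(tar, "sysvol/example.txt", b"hello\n", blob)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding="utf-8",
                      errors="surrogateescape") as tar:
        print(read_ntacl(tar, "sysvol/example.txt") == blob)
```

Getting the ACL back into Samba's xattr.tdb on restore is a separate step that this sketch does not attempt.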

Finally, testing runs into the same issue: we can't just extract the
files with tar, because we need the xattrs put into the xattr.tdb.

As the client task was literally named 'samba_backup is not tested', I'm
loath to add a feature I can't test.

Therefore, I will be proposing a tool that matches the existing
samba_backup feature for feature but, more importantly, has the critical
locking bug addressed.

For the overly curious, the current WIP patches are part of this tree:
https://gitlab.com/catalyst-samba/samba/commits/aaron-backup2

Please let us know any further feedback we should be aware of when
presenting these.

Thanks,

Andrew Bartlett
--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba
Rowland Penny via samba-technical
2018-05-16 09:10:19 UTC
Hi Andrew, what does backing up another server online get you that just
backing up the server the Python code is running on doesn't?

If the code has mostly been written by Aaron, why is the copyright
assigned to you?

As I said before, get the locking bug fix into Samba and backport it;
we can discuss the Python backup script then.

Rowland
Rowland Penny via samba-technical
2018-05-16 09:53:26 UTC
On Wed, 16 May 2018 21:28:46 +1200
Post by Rowland Penny via samba-technical
Hi Andrew, what does backing up another server online get you that
just backing up the server the Python code is running on doesn't?
Quite a few things. We generally find that DRS replication gives a
much more reliable snapshot of the DB, without hidden faults lurking
within the raw database.
No, I still don't understand this. Yes, I understand that some attributes
do not get replicated, but this is the same for all DCs. And if the DB
is damaged in some way, it is very probably damaged on all DCs, including
the one being backed up online.
On the flip side, a file-based backup will get non-replicated
attributes. Each has their place.
Tim will also be building on the basis of the online tool to provide a
new domain-rename feature, which should be quite handy.
This is a very good idea, this topic comes up quite often.
Post by Rowland Penny via samba-technical
If the code has mostly been written by Aaron, why is the copyright
assigned to you?
For internal corporate reasons. While it looks strange, it is
deliberate and legitimate.
I didn't think otherwise, but I still think Aaron should be given
credit.
Post by Rowland Penny via samba-technical
As I said before, get the locking bug fix into Samba and backport
it; we can discuss the Python backup script then.
Fixing samba_backup's inability to lock the various databases safely
is the major purpose of the new script. However, now that we have
needed to build it, we have tried to make it a proper part of
samba-tool, built in a way we can support long-term and backed by a
dazzling array of tests.
By your own admission, you cannot test it. The only valid test I can
think of is: can you back up a DC, remove the DB etc. and then restore
it to a working state again?
Running tdbbackup -r won't change a shell script into one that locks
the whole DB. This is because a transaction lock (or at least a
global read lock) needs to be taken out correctly over the DB while
the databases are being copied.
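
The point about holding a lock over the whole DB while copying can be sketched as a two-phase pattern: take a read lock on every file first, copy them all, and only then release. A minimal Python sketch, assuming plain advisory flock() locks; this is not how Samba does it (TDB uses fcntl byte-range locks and LDB transactions, which flock() does not interoperate with), and snapshot_databases is a hypothetical name.

```python
import fcntl
import os
import shutil
from contextlib import ExitStack


def snapshot_databases(db_paths, dest_dir):
    """Copy several database files as one point-in-time snapshot.

    Every read lock is taken before any copy starts and is released only
    after the last copy finishes, so a writer that honours these locks
    cannot change one file while another is being copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with ExitStack() as stack:
        # Phase 1: lock everything, in a fixed (sorted) order so that
        # processes locking the same set of files cannot deadlock on order.
        for path in sorted(db_paths):
            f = stack.enter_context(open(path, "rb"))
            fcntl.flock(f.fileno(), fcntl.LOCK_SH)  # waits while a writer holds LOCK_EX
            stack.callback(fcntl.flock, f.fileno(), fcntl.LOCK_UN)
        # Phase 2: copy while all the locks are still held.
        for path in db_paths:
            shutil.copy2(path, os.path.join(dest_dir, os.path.basename(path)))
    # Phase 3: leaving the ExitStack unlocks and closes every file.
```

Locking one file, copying it, and unlocking it before moving on (roughly what a per-file tool gives you) would let the other files change in between, which is the gap described above.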
I think what you are trying to point out here is that, even with
'tdbbackup -r', you only get a lock on the file being backed up, and the
others could change while that file is being backed up.
(tdbbackup -r is being added to allow this snapshot. It is a
necessary, but not on its own sufficient, element of the process.)
Thanks,
Andrew Bartlett
Rowland
Andrew Bartlett via samba-technical
2018-05-16 10:00:17 UTC
Post by Rowland Penny via samba-technical
Running tdbbackup -r won't change a shell script into one that locks
the whole DB. This is because a transaction lock (or at least a
global read lock) needs to be taken out correctly over the DB while
the databases are being copied.
I think what you are trying to point out here is that, even with
'tdbbackup -r', you only get a lock on the file being backed up, and the
others could change while that file is being backed up.
Yes. The -r option just makes tdbbackup use a read lock that is
compatible with the overall transaction lock, so that both can co-exist
and the backup can proceed.
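
"Compatible" here is meant in the usual shared-lock sense: any number of readers can hold a read lock at once, while a writer that needs an exclusive lock has to wait. A small illustration with POSIX flock() via Python's fcntl module. This is an assumption chosen for brevity: TDB itself uses fcntl byte-range locks, the mapping to the backup's locks is only an analogy, and the function names are hypothetical.

```python
import fcntl
import os


def try_lock(path, mode):
    """Open path and try a non-blocking flock(); return the fd, or None."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:  # an incompatible lock is already held
        os.close(fd)
        return None


def lock_compatibility(path):
    """While one read lock is held, test a second reader and a writer.

    Returns (second_reader_granted, writer_refused).
    """
    held = try_lock(path, fcntl.LOCK_SH)  # stands in for the overall lock
    if held is None:
        raise RuntimeError("could not take the initial read lock")
    try:
        reader = try_lock(path, fcntl.LOCK_SH)  # a second, compatible reader
        writer = try_lock(path, fcntl.LOCK_EX)  # a would-be writer
        for fd in (reader, writer):
            if fd is not None:
                os.close(fd)
        return reader is not None, writer is None
    finally:
        os.close(held)
```

Each try_lock call opens the file separately, so flock() treats the descriptors as independent lockers even within one process.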

I trust this clarifies things,

Andrew Bartlett
Andrew Bartlett via samba-technical
2018-05-16 10:13:03 UTC
Post by Rowland Penny via samba-technical
On Wed, 16 May 2018 21:28:46 +1200
Post by Rowland Penny via samba-technical
Hi Andrew, what does backing up another server online get you that
just backing up the server the Python code is running on doesn't?
Quite a few things. We generally find that DRS replication gives a
much more reliable snapshot of the DB, without hidden faults lurking
within the raw database.
No, I still don't understand this. Yes, I understand that some attributes
do not get replicated, but this is the same for all DCs. And if the DB
is damaged in some way, it is very probably damaged on all DCs, including
the one being backed up online.
We are providing two new tools that will help our administrators and
cover the two types of backup that make sense to have.

- Offline: Files as-is, just locked
- Online: What a new DC would get if it joined the domain
Post by Rowland Penny via samba-technical
On the flip side, a file-based backup will get non-replicated
attributes. Each has their place.
Tim will also be building on the basis of the online tool to provide a
new domain-rename feature, which should be quite handy.
This is a very good idea, this topic comes up quite often.
Post by Rowland Penny via samba-technical
If the code has mostly been written by Aaron, why is the copyright
assigned to you?
For internal corporate reasons. While it looks strange, it is
deliberate and legitimate.
I didn't think otherwise, but I still think Aaron should be given
credit.
Douglas sometimes added a note to that effect, and in any case Aaron is
still listed as the author of each patch, which is the primary way we
track these things.
Post by Rowland Penny via samba-technical
Post by Rowland Penny via samba-technical
As I said before, get the locking bug fix into Samba and backport
it; we can discuss the Python backup script then.
Fixing samba_backup's inability to lock the various databases safely
is the major purpose of the new script. However, now that we have
needed to build it, we have tried to make it a proper part of
samba-tool, built in a way we can support long-term and backed by a
dazzling array of tests.
By your own admission, you cannot test it. The only valid test I can
think of is: can you back up a DC, remove the DB etc. and then restore
it to a working state again?
I'm not sure where I said I can't test it, because we plan to do just
that. One of the tests we plan will fully animate a working DC alongside
ad_dc and the other selftest environments, using a stored backup and
these tools.
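
The round trip described above (back up, remove, restore, check) has a simple generic shape. Below is a minimal Python sketch of that shape only: a plain directory stands in for a DC's data and tarfile stands in for the real backup and restore commands. The function names are hypothetical, and trees_match compares names and file contents only (no ownership, ACLs or xattrs); a real test would go further and start the restored DC.

```python
import filecmp
import os
import shutil
import tarfile
import tempfile


def backup(src_dir, archive):
    """Offline-style backup: archive the directory contents as-is."""
    with tarfile.open(archive, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)


def restore(archive, dest_dir):
    """Recreate the directory contents from the archive."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # Pythons with extraction filters
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


def trees_match(a, b):
    """True if both trees hold the same names and file contents."""
    cmp = filecmp.dircmp(a, b)
    if cmp.left_only or cmp.right_only or cmp.diff_files or cmp.funny_files:
        return False
    return all(trees_match(os.path.join(a, d), os.path.join(b, d))
               for d in cmp.common_dirs)


def round_trip_ok(data_dir):
    """Back up data_dir, delete it, restore it, compare with a reference.

    Note: data_dir really is removed and recreated.
    """
    work = tempfile.mkdtemp()
    try:
        reference = os.path.join(work, "reference")
        shutil.copytree(data_dir, reference)   # the known-good state
        archive = os.path.join(work, "backup.tar.gz")
        backup(data_dir, archive)
        shutil.rmtree(data_dir)                # "remove the DB etc."
        restore(archive, data_dir)
        return trees_match(reference, data_dir)
    finally:
        shutil.rmtree(work)
```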

There are lots of other good tests already written that validate the
state of the backup, which likewise give us good confidence that the
tool works as described.

This is a serious matter; we don't intend to leave it to chance.

I hope to post some actual patches soon, so we can move this discussion
to the practical details of what can reasonably be improved, as we are
going back over ground we already discussed a few weeks ago.

Thanks,

Andrew Bartlett
Rowland Penny via samba-technical
2018-05-16 10:25:57 UTC
On Wed, 16 May 2018 22:13:03 +1200
Post by Andrew Bartlett via samba-technical
Post by Rowland Penny via samba-technical
On Wed, 16 May 2018 21:28:46 +1200
Post by Rowland Penny via samba-technical
Hi Andrew, what does backing up another server online get you
that just backing up the server the Python code is running on
doesn't?
Quite a few things. We generally find that DRS replication gives
a much more reliable snapshot of the DB, without hidden faults
lurking within the raw database.
No, I still don't understand this. Yes, I understand that some
attributes do not get replicated, but this is the same for all DCs.
And if the DB is damaged in some way, it is very probably damaged
on all DCs, including the one being backed up online.
We are providing two new tools that will help our administrators and
cover the two types of backup that make sense to have.
- Offline: Files as-is, just locked
- Online: What a new DC would get if it joined the domain
Ah, this I do understand, but I do not see the need for it. Surely, if
at least one DC is running correctly, you do not need the 'online'
backup; you should just join a new DC. My understanding is that you
should only restore a DC if no other DCs are running, i.e. you have
suffered a catastrophic DB failure.
Post by Rowland Penny via samba-technical
Post by Rowland Penny via samba-technical
As I said before, get the locking bug fix into Samba and
backport it; we can discuss the Python backup script then.
Fixing samba_backup's inability to lock the various databases safely
is the major purpose of the new script. However, now that we have
needed to build it, we have tried to make it a proper part of
samba-tool, built in a way we can support long-term and backed by a
dazzling array of tests.
By your own admission, you cannot test it. The only valid test I can
think of is: can you back up a DC, remove the DB etc. and then
restore it to a working state again?
I'm not sure where I said I can't test it, because we plan to do just
that. One of the tests we plan will fully animate a working DC alongside
ad_dc and the other selftest environments, using a stored backup and
these tools.
This must have been my misunderstanding; it sounded like you couldn't
write an actual test.
Rowland

L.P.H. van Belle via samba-technical
2018-05-16 10:14:16 UTC
A bit of feedback on this subject.
Post by Rowland Penny via samba-technical
On the flip side, a file-based backup will get non-replicated
attributes. Each has their place.
Tim will also be building on the basis of the online tool to provide a
new domain-rename feature, which should be quite handy.
This is a very good idea, this topic comes up quite often.
Good idea? Yes, big time, and no, also big time.
I prefer NO.

If a domain name changes, how is Samba going to handle the needed system changes?
These often get forgotten, or people miss changes, which will result in more list questions IMO.
Harder-to-detect problems, etc.

Like the primary domain/search or /etc/hosts, or how to handle the changes needed on the "clients".
This is not a small thing.

The idea is good, but I've seen this idea for at least 15-20 years now in my Windows/Novell/Samba (and others') experience.
And it (almost) always ends up with: a clean install would have been better, with fewer problems, and quicker, because of the other problems that result.
I've done it a few times with success on Windows 2000/2003/2008, but it is only 100% successful before you join any PC and before production.

Once a server is in production, I don't do it; I know the risk, and the time it costs to fix things.

Just some extra input here from a system engineer's viewpoint.


Greetz,

Louis