Discussion:
Slow browsing of DFS namespace
d***@gmx.net
2006-09-06 14:33:56 UTC
Problem:
Users occasionally experience delays of about 30 seconds when browsing our DFS
namespace. Connecting to the fileserver directly by UNC path works without
delays.

Configuration:
Single domain called domain1.domain.net with 10 sites all over the
world. All domain controllers are W2K3 SP1. Each site has at least one
domain controller acting as a root target. All root targets are part of
the same namespace called domain1.domain.net\dfsshare and are pointing
to their local fileserver.
All sites are configured correctly in Sites and Services. We have not
encountered any DNS issues. Logon times and AD replication are all fine.

What we found out so far:
- dfsutil /pktinfo shows that the clients which have the problem have
their local root target selected (active). Because of this I don't
think that KB905846 is our issue.
- Network analysis with Netmon shows a lot of SMB errors when the
problem occurs. What actually happens is that the client tries to
connect to one specific fileserver in a remote site and gets a lot of
Error Code = (52) STATUS_OBJECT_NAME_NOT_FOUND. After 30sec it connects
to the local fileserver and everything works fast and smooth. The name
of that remote fileserver starts with an "A" and is therefore the first
in the alphabetical list of all our fileservers. It seems like the
client tries to connect to the first fileserver in the list before it
connects to the local fileserver.

Any help is appreciated
Dominique
Ned Pyle [MSFT]
2006-09-06 14:38:57 UTC
Do you use McAfee antivirus in your environment, and is it running Patch 11?
--
Ned Pyle
Microsoft Enterprise Platforms Support

All postings on this newsgroup are provided "AS IS" with no warranties, and
confer no rights.
For more information please visit
http://www.microsoft.com/info/cpyright.mspx to find terms of use.
d***@gmx.net
2006-09-06 14:45:40 UTC
Post by Ned Pyle [MSFT]
Do you use McAfee antivirus in your environment, and is it running Patch 11?
--
Thanks for the fast reply.

No, we have Trendmicro Officescan 7.3. But as my next step I will
disable Officescan on my test machine.

PS: All clients are running WinXP SP2 with the latest patches.
Ned Pyle [MSFT]
2006-09-06 21:13:07 UTC
Hmmm... interesting. So it's trying to connect to a remote file server
before it connects to the local one? Are these the only two possible
machines serving that particular share? Should always do the local first if
the AD Sites and subnets are configured correctly, of course...
d***@gmx.net
2006-09-07 06:59:17 UTC
No, we have 10 file servers which are all serving the same share. Each
one is located at a different site. But when the delay occurs it's
always the same remote file server that the client wants to connect to.
That particular remote file server happens to be the one that is
farthest away from our site. I have to say that the problem occurs only
once in a while (maybe every 10th access is delayed) but it's clearly
reproducible on my test workstation.
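
To put numbers on an intermittent delay like this, one option is to list the same path repeatedly and log how long each attempt takes. The following is only a rough sketch: the function name, the attempt count and the 5-second threshold are assumptions made for illustration, and nothing in it is specific to DFS.

```python
import os
import time


def time_listings(path, attempts=20, slow_threshold=5.0, pause=0.0):
    """List `path` repeatedly and record how long each attempt takes.

    Returns (results, slow). Each entry is (attempt, seconds, error),
    where error is None if the listing succeeded. `slow` holds the
    entries that took at least `slow_threshold` seconds.
    """
    results = []
    for attempt in range(1, attempts + 1):
        start = time.perf_counter()
        try:
            os.listdir(path)
            error = None
        except OSError as exc:
            # Keep going: a failed listing is still a data point.
            error = str(exc)
        elapsed = time.perf_counter() - start
        results.append((attempt, elapsed, error))
        if pause:
            time.sleep(pause)
    slow = [entry for entry in results if entry[1] >= slow_threshold]
    return results, slow
```

Running it once against the DFS path (for example r"\\domain1.domain.net\dfsshare") and once against the direct UNC path of the local file server would show whether only the DFS path produces slow attempts. Client-side referral caching may hide the delay on back-to-back attempts, so a non-zero `pause` may be needed.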

I'm positive that our AD sites and subnets are correctly configured.
Also the referral determination seems to work as it should. I ran
dfsutil /pktinfo before and after the problem and the client always has
the local domain controller as its active root target.

Here is an example of the network traffic during the delay. This goes
on for about 30sec between the client and the wrong file server until
it finally connects to the local file server:

SMB R NT create & X - NT error, System, Error, Code = (52) STATUS_OBJECT_NAME_NOT_FOUND

SMB C NT create & X, File = :Docf_ OzngklrtOwudrp0bAayojd1qWh:$DATA

It's a mystery to me what the part between File= and $DATA means. This
kind of data doesn't show up when the client communicates with the
local file server.
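
For what it's worth, the `File = ...` value has the shape of NTFS stream syntax, `filename:streamname:$DATA`, which is how a named (alternate) data stream is addressed. The sketch below only splits such a name into its parts. The helper name is made up for illustration, and it assumes a name without a drive letter, as in the capture. It does not say which application asked for the stream.

```python
def split_stream_name(name):
    """Split 'file:stream:type' into (file, stream, stream_type).

    A missing stream type defaults to '$DATA'. An empty stream name
    refers to the file's default (unnamed) data stream.
    """
    parts = name.split(":")
    if len(parts) > 3:
        raise ValueError("unexpected number of ':' separators: %r" % name)
    file_part = parts[0]
    stream = parts[1] if len(parts) > 1 else ""
    stream_type = parts[2] if len(parts) > 2 else "$DATA"
    return file_part, stream, stream_type


# The name from the capture above, copied as it appears in the trace.
print(split_stream_name(":Docf_ OzngklrtOwudrp0bAayojd1qWh:$DATA"))
```

Here the file part is empty, the stream name is the odd-looking string in the middle, and the type is `$DATA`. In other words the client is asking for a named stream, not for an ordinary file.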

Dominique
d***@gmx.net
2006-09-07 08:41:26 UTC
PS: I've disabled all Officescan services in the meantime but the error
still occurs.
Ned Pyle [MSFT]
2006-09-07 16:11:33 UTC
Mmmph. The fact that we're leaving the site is definitely one problem. Then
the fact that we find a target which refuses the connections is another
issue. We could be leaving the site because there's a network issue with the
local target, then based on bridge all site links/site link costs, the out
of site referrals might simply be a big ol' random list at that point.

Since you know the error behavior, you might consider running NETCAP with a
trigger that stops a large circular trace when the issue occurs; then we'd
see the whole problem as it unfolded.
d***@gmx.net
2006-09-08 09:16:52 UTC
I was able to capture the network traffic with NETMON several times
when the issue occurred. Would a capture with NETCAP be any different?
Unfortunately I don't have enough in-depth network knowledge to really
tell what's going on. The only thing that is obvious to me is the
packets that I mentioned in my last posting.
Would you like to take a look at those CAP files?

Dominique
Ned Pyle [MSFT]
2006-09-08 15:10:34 UTC
No difference - one's just cmdline.

Feel free to send me the CAP files zipped - ***@microsoft.com.
Please point out which IPs are which - who's the client, DC, DFS servers,
etc.
d***@gmx.net
2006-09-11 12:18:18 UTC
I've sent you an email with the caps.
Let me know if you need further information.

Dominique
tonyr
2006-09-11 20:29:01 UTC
Get dfsutil; it will tell you what's responding. Most likely it's not the
correct site!
tr
Ned Pyle [MSFT]
2006-09-11 21:53:26 UTC
(Dominique, I responded to you offline as well):

Basically, in frame 999 we can see that ONLY the \REMOTESERVER\ server is
returned in the referral.

Referrals
  Referral
    Version: 3
    Size: 34
    Server Type: Don't know (0)
    Flags: 0x0000
      .... .... .... ...0 = Strip: Do NOT strip off any characters
    Proximity: 1800
    TTL: 0
    Path Offset: 34
    Alt Path Offset: 94
    Node Offset: 154
    Path: \CONTOSO.COM\ROOT\LINKTARGETNAMESPACE
    Alt Path: \CONTOSO.COM\ROOT\LINKTARGETNAMESPACE
    Node: \REMOTESERVER\SOME_SHARE
    Unknown Data: 00000000000000000000000000000000

No others are listed. This means that as far as the DC is concerned, the
client’s settings should put him in the same site as this server. Do the
client’s IP address and
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters |
DynamicSiteName match up with the DC1, DC2, LOCALSERVER subnets in AD
(including checking for one-off VLANs, DHCP super-netting, whatever)? Also,
was the /INSITE switch applied to this namespace or link target namespace
(http://technet2.microsoft.com/WindowsServer/en/library/a9096e88-1634-4da6-b820-537341d349061033.mspx?mfr=true)?
That would explain why there is only one link returned in total (without it,
you should see other machines, but ordered with targets in site at the top).
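
The subnet check can be tried offline before digging through Sites and Services. Here is a minimal sketch, assuming the AD subnet objects have been written down by hand as CIDR strings mapped to site names. The subnets and the names SiteA and SiteB below are invented for illustration. It picks the most specific matching subnet, which as far as I know is how a client gets mapped to a site when subnets overlap.

```python
import ipaddress


def site_for_ip(ip, subnet_sites):
    """Return the site of the most specific subnet containing `ip`.

    `subnet_sites` maps CIDR strings to site names. Returns None when
    no subnet matches.
    """
    addr = ipaddress.ip_address(ip)
    matches = []
    for cidr, site in subnet_sites.items():
        network = ipaddress.ip_network(cidr)
        if addr in network:
            matches.append((network.prefixlen, site))
    if not matches:
        return None
    # The longest prefix is the most specific subnet.
    return max(matches)[1]


# Hypothetical subnet-to-site table, not taken from the thread.
subnets = {
    "10.1.0.0/16": "SiteA",
    "10.1.50.0/24": "SiteB",
    "10.2.0.0/16": "SiteB",
}
print(site_for_ip("10.1.3.4", subnets))
print(site_for_ip("10.1.50.7", subnets))
```

The second lookup is the kind of one-off case worth checking for: the address sits inside the SiteA /16 but a narrower /24 assigns it to another site. If the site calculated this way differs from the client's DynamicSiteName, the subnet definitions are the first place to look.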