starlight
2008-03-30 17:09:43 UTC
Hello,
I'm trying to configure a small network for high precision time.
Recently acquired an Endrun CDMA time server that runs like
a dream, tracking CDMA time to about +/- 5 microseconds.
The clients are a rag-tag assembly of diverse systems including
a CentOS 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80,
IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.
All are configured to prefer the Endrun clock and poll it on a
16 second interval. All are attached to a single SMC gigabit
Ethernet switch with only the Endrun and two Sun systems running
at a lower speed of 100 Mbps. Close to zero network traffic
and system loads.
All systems are running 'ntpd' 4.2.4p4. Compiled NTP native
64-bit for the Windows X64 system. [A #ifdef tweak to
'intptr_t' and 'uintptr_t' is required, will provide patch if
desired].
It generally is working well, with the systems tracking anywhere
from +/- 100 microseconds to +/- 500 microseconds most of the
time.
However once or twice a day, all the systems experience a
random, uncorrelated time shift of one to several
milliseconds. Had an issue where a UPS voltage correction shift
and cheap power supply on the Windows X64 box appeared to be a
problem, but that was fixed by configuring the UPS to consider
110V nominal instead of 120V.
Does anyone have any ideas about what could be causing these
random time jumps and what might be done to eliminate them?
Something I'm planning to try is to make sure that 'mlock' is
configured in the daemons--presently 'autoconf' has left it
disabled for some reason. However, I don't believe page
faults are the culprit. All the daemons are running at
the highest real-time priority in the respective systems.
The above configuration is a controlled lab setup. The next
target is a stack of eight Dell 1950 servers in a production
data center running Windows 2003 R2 and slaved to a newer Endrun
time server. Don't have useful data from these systems yet
because the network jitter is outrageous. Working with the
network admin to hopefully have the NTP traffic to and from the
Endrun clock bypass layer 3 switch/router rule checking. They
have large, complex router ACL rulesets, which I suspect are the cause
of the jitter.
Attached are fairly representative graphs of the offset and
frequency for two of the lab servers.
Thanks
P.S. Resent without graphs as the list mailer says
they're not allowed. Happy to send them or the raw
'loopstats' to anyone interested.
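
For anyone who wants to look for the jumps in their own 'loopstats',
here is a minimal Python sketch of how the raw file could be scanned.
It assumes the standard 'loopstats' record layout (MJD, seconds past
midnight, offset in seconds, frequency in PPM, then further fields).
The names 'parse_loopstats' and 'find_jumps' and the 1 ms threshold
are my own illustrative choices, not part of 'ntpd'.

```python
from typing import Iterable, List, NamedTuple


class LoopSample(NamedTuple):
    mjd: int          # Modified Julian Day of the record
    seconds: float    # seconds past midnight UTC
    offset: float     # clock offset in seconds
    freq_ppm: float   # frequency offset in PPM


def parse_loopstats(lines: Iterable[str]) -> List[LoopSample]:
    """Parse loopstats records, skipping blank or malformed lines."""
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            samples.append(LoopSample(int(fields[0]), float(fields[1]),
                                      float(fields[2]), float(fields[3])))
        except ValueError:
            continue  # not a numeric record, ignore it
    return samples


def find_jumps(samples: Iterable[LoopSample],
               threshold: float = 0.001) -> List[LoopSample]:
    """Return samples whose absolute offset is at least `threshold` seconds.

    The 1 ms default is an assumption, chosen to match the
    'one to several milliseconds' shifts described above.
    """
    return [s for s in samples if abs(s.offset) >= threshold]


if __name__ == "__main__":
    import sys

    # Usage: python find_jumps.py /path/to/loopstats
    with open(sys.argv[1]) as f:
        for s in find_jumps(parse_loopstats(f)):
            h, rem = divmod(int(s.seconds), 3600)
            m, sec = divmod(rem, 60)
            print(f"MJD {s.mjd} {h:02d}:{m:02d}:{sec:02d} UTC "
                  f"offset {s.offset * 1e3:+.3f} ms")
```

Running this against the 'loopstats' from each client and comparing
the timestamps would show whether the shifts really are uncorrelated
between systems.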