Geekzone: technology news, blogs, forums


alexx

867 posts

Ultimate Geek
+1 received by user: 291


#102986 28-May-2012 00:35

New system as follows:

* Gigabyte GA-Z77-D3H Intel Z77 Ivy Bridge Motherboard
* Intel Core i5 2500K Sandy Bridge 3.30GHz
* Corsair CML16GX3M4A1600C9 Vengeance LP 4x4GB DDR3-1600 RAM
* Crucial M4 64GB SSD

Motherboard BIOS/firmware updated to the most recent version: F13
http://www.gigabyte.com/products/product-page.aspx?pid=4140#bios

Motherboard settings are at their defaults, except that the time is set to UTC and the SATA mode has been changed from the default IDE to AHCI. The operating system is Debian GNU/Linux 6.0.5 (squeeze) with a backported Linux 3.2.0 kernel, which is needed to support the new motherboard.

I'm running ntp and I notice that when I restart the ntp service the system time is accurate.
But after a while it drifts outside the limits of what ntp can handle.

For example, restarting ntp and checking the status at 5-second intervals:

# service ntp restart; sleep 5; ntpq -pn; sleep 5; ntpq -pn; sleep 5; ntpq -pn
Stopping NTP server: ntpd.
Starting NTP server: ntpd.
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 219.88.250.190  119.47.118.129   3 u    2   64    1   32.344  -4645.6  24.938
 202.21.137.10   211.219.125.170  2 u    1   64    1   42.656  -4657.0  24.844
 202.6.116.123   202.46.183.13    2 u    2   64    1   52.763  -4642.1   0.000
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 219.88.250.190  119.47.118.129   3 u    2   64    1   32.373  -12.021   0.000
 202.21.137.10   211.219.125.170  2 u    2   64    1   42.895  -11.757   0.000
 202.6.116.123   202.46.183.13    2 u    3   64    0    0.000    0.000   0.000
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*219.88.250.190  119.47.118.129   3 u    1   64    1   32.713  -85.881  53.304
+202.21.137.10   211.219.125.170  2 u    1   64    1   42.945  -85.295  53.092
 202.6.116.123   202.46.183.13    2 u    8   64    0    0.000    0.000   0.000

Then check about 15 minutes later.

# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 219.88.250.190  119.47.118.129   3 u   61   64  377   31.689  -14676. 3779.72
 202.21.137.10   211.219.125.170  2 u   63   64  377   42.899  -14650. 3699.08
 202.6.116.123   202.46.181.123   2 u   34   64  377   49.662  -15006. 3725.38

Note: the first one is my ISP's time server, ntp.orcon.net.nz (219.88.250.190).

The offset above (ntpq reports it in milliseconds) is about 15 seconds less than 20 minutes after the ntp restart, and none of the time sources has a "+" or "*" mark, which means that ntp doesn't consider any of them valid.
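The reach column helps separate "can't reach the server" from "won't use the server". As I understand it, reach is an 8-bit shift register printed in octal: each poll shifts it left and sets the lowest bit if a reply arrived, so 377 means the last eight polls were all answered. A small Python sketch to decode it (the function name is just for illustration):

```python
from typing import List


def reach_history(reach_octal: str) -> List[bool]:
    """Decode ntpq's 'reach' column, an 8-bit shift register printed in octal.

    Returns the last eight polls, oldest first; True means a reply came back.
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach out of range: {reach_octal}")
    # Bit 0 is the most recent poll, bit 7 the oldest.
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


# Reach values taken from the ntpq output above.
for sample in ("0", "1", "377"):
    answered = sum(reach_history(sample))
    print(f"reach={sample:>3}: {answered}/8 recent polls answered")
```

With reach at 377 for all three servers, every poll was being answered, which points at the huge offset and jitter, rather than reachability, as the reason none of them was selected.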

Then the next day after about 14 hours:

$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 219.88.250.190  119.47.118.129   3 u    3   64  377   31.807  -544726 3599.06
 202.6.116.123   202.46.183.13    2 u   15   64  377   51.482  -544573 3576.86
 202.21.137.10   211.219.125.170  2 u    1   64  377   42.417  -544749 3612.79

This appears to be a sign of a bad hardware clock that is drifting too fast for NTP to cope with.

I started to follow the procedure here:
http://support.ntp.org/bin/view/Support/HowToCalibrateSystemClockUsingNTP

I simplified /etc/ntp.conf so that it has just one server line:
server ntp.orcon.net.nz iburst

Then restart NTP and check after a few minutes.

$ ntpq -c rv
associd=0 status=c028 leap_alarm, sync_unspec, 2 events, no_sys_peer,
version="ntpd 4.2.6p2@1.2194-o Sun Oct 17 13:35:13 UTC 2010 (1)",
processor="x86_64", system="Linux/3.2.0-0.bpo.1-amd64", leap=11,
stratum=4, precision=-23, rootdelay=44.022, rootdisp=2693.963,
refid=219.88.250.190,
reftime=d36c13cc.47138567  Sun, May 27 2012 14:45:32.277,
clock=d36c15ea.13d6b8c4  Sun, May 27 2012 14:54:34.077, peer=0, tc=6,
mintc=3, offset=0.000, frequency=15.778, sys_jitter=1153.294,
clk_jitter=30.364, clk_wander=0.000

But I don't think this is valid because of the "no_sys_peer" status, so there has been no successful synchronisation.
If I check another machine, it reports "clock_sync".
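The status=c028 word can be taken apart by hand. A sketch, assuming the system status word layout described in the ntpq documentation (2-bit leap indicator, 6-bit clock source, 4-bit event counter, 4-bit event code); the name tables are deliberately partial, so any code not listed is returned as a plain number:

```python
from typing import Dict, Union

# Partial name tables (an assumption for illustration): only a handful of
# codes are named, mostly the ones seen in this thread.
LEAP = {0: "leap_none", 3: "leap_alarm"}
SOURCE = {0: "sync_unspec", 6: "sync_ntp"}
EVENT = {5: "clock_sync", 8: "no_sys_peer"}


def decode_system_status(word: str) -> Dict[str, Union[str, int]]:
    """Split an ntpq system status word such as 'c028' into its four fields."""
    value = int(word, 16)
    leap = (value >> 14) & 0x3     # bits 15-14: leap indicator
    source = (value >> 8) & 0x3F   # bits 13-8: clock source
    count = (value >> 4) & 0xF     # bits 7-4: event counter
    code = value & 0xF             # bits 3-0: most recent event code
    return {
        "leap": LEAP.get(leap, leap),
        "source": SOURCE.get(source, source),
        "events": count,
        "last_event": EVENT.get(code, code),
    }


print(decode_system_status("c028"))
```

For c028 this gives leap_alarm, sync_unspec, 2 events and no_sys_peer, the same text ntpq printed, so the "no_sys_peer" is the most recent system event rather than a separate error.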

The next step is to check the drift by switching off the ntp daemon and using ntpdate (the -q option means query only, without setting the clock).

First I ran "ntptime -f 0" and removed the ntp.drift file, as suggested in the link above.
Then I stopped ntp and used ntpdate.

root@debian-i5:~# service ntp stop
Stopping NTP server: ntpd.
root@debian-i5:~# ntpdate ntp.orcon.net.nz
27 May 16:47:15 ntpdate[21638]: adjust time server 219.88.250.190 offset -0.449953 sec

Then after about one hour:

root@debian-i5:~# ntpdate -q ntp.orcon.net.nz
server 219.88.250.190, stratum 3, offset -46.641260, delay 0.05699
27 May 17:50:33 ntpdate[21804]: step time server 219.88.250.190 offset -46.641260 sec

A huge jump of more than 46 seconds, but perhaps it will settle down after a while...

But an hour later we see a similar drift of about 44 seconds, making a total of more than 90 seconds in two hours.

root@debian-i5:~# ntpdate -q ntp.orcon.net.nz
server 219.88.250.190, stratum 3, offset -90.685901, delay 0.05710
27 May 18:50:15 ntpdate[22051]: step time server 219.88.250.190 offset -90.685901 sec
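Those two -q readings give a drift rate directly. A rough Python sketch that pulls the timestamp and offset out of ntpdate's summary lines and turns the change into PPM. Assumptions: the year (ntpdate doesn't print one), the helper names, and the sign convention, where a positive result means the local clock is running fast. Only the two -q readings are used, because the first ntpdate call (without -q) corrected the clock, so its reported offset is not what remained afterwards.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches ntpdate's summary line, for example:
# "27 May 17:50:33 ntpdate[21804]: step time server 219.88.250.190 offset -46.641260 sec"
SUMMARY = re.compile(
    r"^(\d{1,2} \w{3} \d\d:\d\d:\d\d) ntpdate\[\d+\]: .* offset (-?\d+\.\d+) sec$"
)

Sample = Tuple[datetime, float]


def parse_samples(lines: List[str], year: int) -> List[Sample]:
    """Return (timestamp, offset in seconds) pairs; ntpdate omits the year."""
    samples = []
    for line in lines:
        match = SUMMARY.match(line.strip())
        if match:
            when = datetime.strptime(f"{year} {match.group(1)}", "%Y %d %b %H:%M:%S")
            samples.append((when, float(match.group(2))))
    return samples


def drift_ppm(earlier: Sample, later: Sample) -> float:
    """Frequency error in parts per million; positive means the local clock runs fast.

    A negative ntpdate offset means the local clock is ahead of the server,
    so an offset that keeps falling means the local clock is gaining time.
    """
    (t0, off0), (t1, off1) = earlier, later
    elapsed = (t1 - t0).total_seconds()
    return (off0 - off1) / elapsed * 1e6


log = [
    "27 May 17:50:33 ntpdate[21804]: step time server 219.88.250.190 offset -46.641260 sec",
    "27 May 18:50:15 ntpdate[22051]: step time server 219.88.250.190 offset -90.685901 sec",
]
first, second = parse_samples(log, year=2012)  # year assumed from the post date
print(f"{drift_ppm(first, second):.0f} PPM")
```

That works out at roughly 12,300 PPM (the local clock gaining about 44 seconds per hour), far beyond ntpd's 500 PPM frequency tolerance.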


So I'm about to e-mail Computerlounge to request a replacement motherboard, but decide to just reboot first and see what happens.
The system reboots and ntpd starts.

Several hours later ntp is doing just fine.

# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*219.88.250.190  119.47.118.129   3 u  257 1024  377   31.948   -2.031   1.403

# ntpdate -q ntp.orcon.net.nz
server 219.88.250.190, stratum 3, offset -0.002055, delay 0.05696
28 May 00:05:43 ntpdate[30722]: adjust time server 219.88.250.190 offset -0.002055 sec

Then I find some similar cases; at least the first one appears to be software-related:
https://bugzilla.redhat.com/show_bug.cgi?id=666558
http://forums.fedoraforum.org/showthread.php?p=1443346

Then I remember that the previous day I had used the power control button on the Logitech K200 keyboard to suspend the PC while I went out for dinner, and a couple of other times as well. Maybe this upset ntpd or the kernel's timekeeping.

Questions:
1. Has anyone else seen this sort of strange behaviour with ntpd?
2. If I eliminate the operating system issues and find there is some drift in the hardware clock, how much is too much?
3. Has anyone here measured the drift of their real time clock?

This FAQ page suggests that 12 PPM (about one second per day) might be considered normal/acceptable, but 500 PPM would be considered very bad (only poor old mechanical wristwatches are worse):
http://www.ntp.org/ntpfaq/NTP-s-sw-clocks-quality.htm

From memory, I recall reading elsewhere that ntp might be able to cope with a drift of 100 PPM, but that should only be seen in some old systems, not in a brand-new motherboard. My understanding is that the real-time clock is integrated into the Southbridge chipset, so replacing it would mean swapping the motherboard.
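To put those figures in perspective: 1 PPM is one microsecond of error per second, or 86.4 ms per day. A minimal Python sketch converting between PPM and seconds per day or per hour; the 500 PPM constant is the maximum frequency tolerance the NTP documentation gives for ntpd, and the helper names are made up for this example:

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600
NTPD_MAX_PPM = 500  # ntpd's maximum frequency tolerance per the NTP docs


def ppm_to_seconds_per_day(ppm: float) -> float:
    """How many seconds a clock with this frequency error gains or loses per day."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def seconds_per_hour_to_ppm(seconds: float) -> float:
    """Convert an observed drift in seconds per hour into parts per million."""
    return seconds / SECONDS_PER_HOUR * 1e6


# 12, 100 and 500 PPM are the figures discussed above; 44 s/hour is the drift
# seen with ntpdate earlier in this thread.
for ppm in (12, 100, 500, seconds_per_hour_to_ppm(44)):
    verdict = "within" if ppm <= NTPD_MAX_PPM else "beyond"
    print(f"{ppm:8.0f} PPM = {ppm_to_seconds_per_day(ppm):7.1f} s/day "
          f"({verdict} ntpd's {NTPD_MAX_PPM} PPM tolerance)")
```

Even the "very bad" 500 PPM is only about 43 seconds per day, whereas 44 seconds per hour is more than 12,000 PPM.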

Edit: fix some typos.





#include <standard.disclaimer>


Ragnor
8279 posts

Uber Geek
+1 received by user: 585

Trusted

  #631385 28-May-2012 11:39

A small time drift is normal, but that seems extreme.

Could you boot a live USB/CD distro to try to rule out the current OS as part of the problem?



alexx

  #631720 28-May-2012 22:34

It seems to be quite good now: most of the time the offset is just 1-2 ms and quite stable.

$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*219.88.250.190  202.21.137.10    3 u  902 1024  377   31.577    1.095   3.242

I'm going to remove ntp, reboot, and just let the clock free-run for a while (checking with ntpdate).
I thought about the bootable CD-ROM/USB idea, but I'm not sure how many of those come with ntpdate.

If I really want to eliminate the OS, I can just boot into the BIOS menu, leave it overnight, and see if there is any significant change.







alexx

  #632345 30-May-2012 00:12

Removed ntpd and rebooted, then set the time with ntpdate.

# ntpdate ntp.orcon.net.nz
28 May 22:50:16 ntpdate[2312]: adjust time server 219.88.250.190 offset -0.363547 sec

Check time after about 20 minutes.

# ntpdate -q ntp.orcon.net.nz
server 219.88.250.190, stratum 3, offset 0.085513, delay 0.06082
28 May 23:12:25 ntpdate[2463]: adjust time server 219.88.250.190 offset 0.085513 sec

Check after 24 hours.

# ntpdate -q ntp.orcon.net.nz
server 219.88.250.190, stratum 3, offset 1.471014, delay 0.05736
29 May 23:52:24 ntpdate[4897]: step time server 219.88.250.190 offset 1.471014 sec

If we ignore the initial drift shortly after the reboot, the clock drifted about 1.39 seconds in a little over 24 hours, or less than 16 PPM. Even including the initial drift from that first ntpdate, the overall drift is about 20 PPM, which is most likely still quite acceptable and certainly within the limits of what ntpd should be able to handle.
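The "less than 16 PPM" figure can be checked from the two ntpdate -q timestamps. A small Python sketch; the year 2012 is taken from the post date, and the sign convention is an assumption of this example (positive means the local clock runs fast), based on ntpdate reporting server time minus local time:

```python
from datetime import datetime


def drift_ppm(t0: datetime, off0: float, t1: datetime, off1: float) -> float:
    """Drift between two ntpdate offsets in PPM; positive means the local clock runs fast.

    ntpdate reports (server time - local time), so a rising offset means the
    local clock is falling behind, i.e. running slow.
    """
    return (off0 - off1) / (t1 - t0).total_seconds() * 1e6


# The two "ntpdate -q" readings above; the year (2012) comes from the post date.
t0, off0 = datetime(2012, 5, 28, 23, 12, 25), 0.085513
t1, off1 = datetime(2012, 5, 29, 23, 52, 24), 1.471014

elapsed_h = (t1 - t0).total_seconds() / 3600
print(f"{off1 - off0:.3f} s over {elapsed_h:.1f} h -> {drift_ppm(t0, off0, t1, off1):.1f} PPM")
```

The result is about -15.6 PPM: the clock runs slightly slow, well inside the range ntpd can discipline.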

I suspect the earlier problem might have been caused by using "suspend" from the keyboard control. Perhaps suspend and ntpd don't play nicely together. I might do some more experiments with that later.








alexx

  #1076221 28-Jun-2014 19:28

Long time no update, but in case anyone finds this as the result of a search: the problem is solved. Several motherboard firmware updates and multiple Linux kernel updates later, there are no problems. I'm not sure whether it was the firmware or the Linux updates that fixed it. Perhaps I just didn't give the new installation enough time to settle down.

One thing I noticed is that ntp.orcon.net.nz has disappeared, but that would be off-topic for the Linux forum.






alexx

  #1076319 28-Jun-2014 23:34

One more update... now using s1.ntp.net.nz and s2.ntp.net.nz instead of the old orcon ntp server.
Please read the acceptable use policy: https://ntp.net.nz/aup.html







insane
3324 posts

Uber Geek
+1 received by user: 1006

ID Verified
Trusted
2degrees
Subscriber

  #1076321 28-Jun-2014 23:40

I'd say this sounds like a problem with your CPU going into a very deep power-saving state; you could try disabling the C3 state in the BIOS. I see a similar thing with virtual machines, where clock cycles are shared, resulting in some clock drift if the guest isn't tied to a time source or to the host through VMware Tools / open-vm-tools.
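On Linux you can check which idle states the kernel exposes before going into the BIOS. A sketch that reads the cpuidle entries in sysfs for CPU 0; the /sys/devices/system/cpu/cpu0/cpuidle/stateN/name layout is assumed here, and what appears (and how states are named) depends on the kernel version and idle driver, so verify it on your own system:

```python
from pathlib import Path
from typing import List, Tuple

# Assumed sysfs location of CPU 0's idle states on Linux.
CPUIDLE = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def list_cstates(base: Path = CPUIDLE) -> List[Tuple[str, str]]:
    """Return (stateN, name) pairs for CPU 0, or an empty list if cpuidle isn't exposed."""
    if not base.is_dir():
        return []
    # Keep only stateN directories and sort them numerically (state10 after state9).
    candidates = [p for p in base.glob("state*") if p.name[len("state"):].isdigit()]
    states = []
    for state_dir in sorted(candidates, key=lambda p: int(p.name[len("state"):])):
        name_file = state_dir / "name"
        if name_file.is_file():
            states.append((state_dir.name, name_file.read_text().strip()))
    return states


if __name__ == "__main__":
    for state, name in list_cstates():
        print(state, name)
```

If a deep state such as C3 is listed, that is the kind of state the BIOS option would affect.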





alexx

  #1076324 29-Jun-2014 00:04

The problem is solved now, thanks; perhaps one of the BIOS updates fixed it. But yes, virtual machines and ntp can be a problem, and OS X 10.9 appears to do some tricks with ntp to achieve very long battery life, which can also cause a few problems.






