[Tfug] ntpd problems - server losing an hour or more each day

Mon Oct 30 14:08:52 MST 2006

--- Chad Woolley <thewoolleyman at gmail.com> wrote:

> As a follow up, this ended up NOT fixing the problem after all.  It
> kept time for a couple of days, then started losing again.  I finally
> just stuck this in root's crontab to get rid of the problem for now:
> 
> 0 1 * * * /usr/sbin/ntpdate pool.ntp.org;/etc/init.d/ntp-server start
> 
> This will force a sync and restart the ntp server
> every day at 01:00, whether it needs it or not.

What does this machine normally *do*?
I assume the RTC (if you reboot the machine and
examine it *before* any ntpdate(1) calls) *is*
maintaining proper time.

ntpd will "give up" if the kernel's idea of the
"current time" differs too much from *its* idea of
the current time.  Presumably, a discrepancy comes
into existence at some point during normal operation.
Then, ntpd takes itself out of the loop and the
discrepancy *grows*.

Usually, the result is that time is *lost* (not
gained).

If the machine sees heavy I/O, timer interrupts can
be missed, and each missed interrupt "loses" one
jiffy.  Do this a few hundred times in each period
of heavy disk I/O (e.g., burning CDs!) and the
effects are cumulative.
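
To put rough numbers on that, here is a back-of-the-envelope
sketch in Python.  It assumes a 100 Hz timer tick (your kernel's
HZ may be 250 or 1000) and uses ntpd's 500 ppm cap on frequency
correction.  The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
NTPD_MAX_SLEW_PPM = 500  # ntpd will not correct frequency errors beyond this


def seconds_lost(ticks_missed, hz=100):
    """Clock time lost when `ticks_missed` timer interrupts are dropped.

    hz=100 (one jiffy = 10 ms) is an assumption; check your kernel's HZ.
    """
    return ticks_missed / hz


def ticks_for(seconds, hz=100):
    """Number of missed ticks needed to lose `seconds` of clock time."""
    return seconds * hz


def correctable_seconds_per_day(max_ppm=NTPD_MAX_SLEW_PPM):
    """Largest steady daily drift that a slew of `max_ppm` can absorb."""
    return SECONDS_PER_DAY * max_ppm / 1_000_000


if __name__ == "__main__":
    print(seconds_lost(300))              # 300 missed ticks -> 3.0 s
    print(ticks_for(3600))                # ticks behind "an hour a day"
    print(correctable_seconds_per_day())  # what ntpd can slew away per day
```

At 100 Hz, an hour a day works out to 360,000 missed ticks, while
500 ppm covers only 43.2 seconds a day.  A loss of that size is
far outside anything ntpd can slew away.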

I don't know how much leeway you have regarding how
the machine is used.  Can you leave it quiescent
and verify that everything works fine?  With and
*without* ntpd running?
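
One way to run that test: note the clock's offset against a
reference at the start and end of a quiet period, once with ntpd
running and once without, and compare the drift rates.  A minimal
sketch follows.  How you obtain the offsets is up to you (for
example `ntpdate -q`, which queries without setting the clock).
The sign convention here (local minus reference) is my assumption,
so flip it if your tool reports the opposite, and the sample
numbers are invented.

```python
def drift_rate_ppm(offset_start, offset_end, elapsed_seconds):
    """Drift in parts per million between two offset readings.

    Offsets are in seconds, taken as (local clock - reference clock),
    so a negative result means the local clock is losing time.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (offset_end - offset_start) / elapsed_seconds * 1_000_000


def seconds_per_day(ppm):
    """Convert a drift rate in ppm to seconds gained or lost per day."""
    return ppm * 86_400 / 1_000_000


if __name__ == "__main__":
    # Invented readings: in sync at the start, 150 s behind an hour later.
    ppm = drift_rate_ppm(0.0, -150.0, 3600)
    print(round(ppm), "ppm")
    print(round(seconds_per_day(ppm)), "seconds per day")
```

If the idle machine stays within a few tens of ppm in both runs
and only drifts badly under load, that points at the workload
rather than at ntpd or the RTC.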

If the problem is losing interrupts, then you have
to ensure that you don't engage in activities that
let you lose "too many" interrupts for ntpd to
compensate.  I suspect there are splx()'s in
your drivers that are used instead of genuine
mutexes.  The more time your system spends *in*
these locks, the greater the chance of a lost IRQ.
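
A toy model of why long lock-outs cost ticks, sketched in Python.
It assumes the interrupt controller can hold only one pending
timer interrupt while interrupts are masked, so any further ticks
that fire during the same masked window are dropped.  That is a
simplification, not a description of your hardware or drivers.
The function name and the 10 ms tick are assumptions.

```python
def ticks_lost(mask_start_us, mask_len_us, tick_us=10_000):
    """Ticks dropped while interrupts are masked (simplified model).

    Ticks fire at every multiple of `tick_us` microseconds.  Of the
    ticks that fire inside the masked window, one stays pending and
    is delivered on unmask; the rest are lost.
    """
    end_us = mask_start_us + mask_len_us
    fired = end_us // tick_us - mask_start_us // tick_us
    return max(0, fired - 1)


if __name__ == "__main__":
    print(ticks_lost(0, 5_000))        # shorter than one tick: 0 lost
    print(ticks_lost(2_000, 35_000))   # spans three ticks: 2 lost
```

In this model nothing is lost as long as each masked window stays
shorter than one tick period.  The loss only starts once a driver
keeps interrupts blocked across two or more ticks.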

--don
