* hrtimer_interrupt time sync issues across cores
@ 2017-12-14  7:01 Rajasekaran Chandrasekaran
  2017-12-14  7:47 ` Greg KH
  2017-12-14 17:02 ` valdis.kletnieks at vt.edu
  0 siblings, 2 replies; 3+ messages in thread
From: Rajasekaran Chandrasekaran @ 2017-12-14  7:01 UTC (permalink / raw)
  To: kernelnewbies

Hi,


In our multi-core x86 based system, which is running kernel version 3.4.19,
hrtimer_interrupt (called from apic_timer_interrupt) keeps looping
in hardirq for at least 1.6 seconds.  We use tsc as our clock source. The
issue happens very rarely in our system and is hard to reproduce.



Problem:

Inside the hrtimer_interrupt function, basenow.tv64 on CPU-3 is 1.6 seconds
ahead of the other CPUs (we have 4 cores), whereas hrtimer->_softexpires.tv64
is in sync with the remaining CPUs. Because of this, the check inside
hrtimer_interrupt that tests whether basenow.tv64 <
hrtimer_get_softexpires_tv64(timer) does not become true for 1.6 seconds, which
keeps the while loop inside hrtimer_interrupt from exiting. Below is the
ftrace captured during the problem.



<idle>-0     [002] d.h. 800364.533632: hrtimer_expire_entry: hrtimer=ffff88017fd0c960 function=tick_sched_timer now=801616439840902

ksoftirqd/3-19    [003] dNh. 800364.539178: hrtimer_expire_entry: hrtimer=ffff88017fd8c960 function=tick_sched_timer now=801618042768641

ksoftirqd/3-19    [003] dNh. 800364.539185: hrtimer_start: hrtimer=ffff88017fd8c960 function=tick_sched_timer expires=801616446505014 softexpires=801616446505014
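
For reference, the gaps in the trace can be worked out directly from the values shown. The snippet below is only a sketch of that arithmetic: the macro and function names are made up for illustration, and it assumes the ftrace timestamps (seconds.microseconds) and the now/expires fields (nanoseconds) advance at the same rate, so that their differences can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the trace. now/expires are in nanoseconds. */
#define CPU2_NOW_NS     801616439840902LL
#define CPU3_NOW_NS     801618042768641LL
#define CPU3_EXPIRES_NS 801616446505014LL

/* ftrace timestamps 800364.533632 and 800364.539178, in microseconds. */
#define CPU2_STAMP_US   800364533632LL
#define CPU3_STAMP_US   800364539178LL

/* Difference between the now values reported on CPU-3 and CPU-2. */
int64_t now_gap_ns(void)
{
    return CPU3_NOW_NS - CPU2_NOW_NS;
}

/* Time that actually passed between the two events per the trace clock. */
int64_t stamp_gap_ns(void)
{
    return (CPU3_STAMP_US - CPU2_STAMP_US) * 1000;
}

/* Part of the now gap that elapsed time does not account for. */
int64_t unexplained_skew_ns(void)
{
    return now_gap_ns() - stamp_gap_ns();
}

/* How far in the past the re-armed expiry is relative to CPU-3's now. */
int64_t expiry_lag_ns(void)
{
    return CPU3_NOW_NS - CPU3_EXPIRES_NS;
}

/* Sanity check against the rounded figures: about 1602 ms versus about 6 ms. */
void check_trace_gaps(void)
{
    assert(now_gap_ns() / 1000000 == 1602);
    assert((stamp_gap_ns() + 500000) / 1000000 == 6);
}
```

Calling these from a small main() and printing the results gives the now gap, the elapsed trace time, the skew left over after subtracting the elapsed time, and how far behind CPU-3's now the re-armed timer is.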



As we can see, the difference in the now value between CPU-2 and CPU-3 (where the
time jump is seen) is significant. Ftrace indicates that now on CPU-3 is
ahead by 1602 milliseconds, even though the trace timestamps are only
6 milliseconds apart. Also, since the hrtimer expiry time is in the past,
we end up spending a lot of time in hardirq. From my understanding of the
code, basenow.tv64 is computed in hrtimer_update_base()
-> ktime_get_update_offsets() as timekeeper.xtime - offs_real. Both
timekeeper.xtime and offs_real are always updated under a lock, so I am
still unsure how only one core is seeing the wrong time.
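
To illustrate why a skewed basenow keeps the handler in hardirq, here is a small user-space model of the expiry loop. It is a simplified sketch, not the kernel code: simulate_expiry_loop, struct sim_result and all parameter names are hypothetical, and it assumes that basenow is sampled once before the loop while the timer callback re-arms the timer from a correct clock, in the style of hrtimer_forward(). The tick period and per-callback cost are inputs the caller has to guess.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ns_t;

struct sim_result {
    long iterations;   /* how many times the callback ran */
    ns_t elapsed_ns;   /* modelled time spent inside the loop */
};

/*
 * Model of the while loop described above.
 *   true_now         correct time when the interrupt fires
 *   skew_ns          how far basenow is ahead of true_now
 *   period_ns        tick period used when re-arming (assumed)
 *   callback_cost_ns time consumed by one callback run (assumed)
 */
struct sim_result simulate_expiry_loop(ns_t true_now, ns_t skew_ns,
                                       ns_t period_ns, ns_t callback_cost_ns)
{
    assert(period_ns > 0 && callback_cost_ns > 0 && skew_ns >= 0);

    const ns_t basenow = true_now + skew_ns; /* sampled once, never refreshed */
    const ns_t start = true_now;
    ns_t softexpires = true_now;             /* timer is due right now */
    struct sim_result res = { 0, 0 };

    /* Same exit test as in the text: leave once basenow < softexpires. */
    while (!(basenow < softexpires)) {
        true_now += callback_cost_ns;        /* callback runs */

        /* Re-arm from the correct clock: move the expiry forward in
         * whole periods until it is later than true_now. */
        ns_t delta = true_now - softexpires;
        if (delta >= 0)
            softexpires += (delta / period_ns + 1) * period_ns;

        res.iterations++;
    }

    res.elapsed_ns = true_now - start;
    return res;
}
```

With a skew of about 1.6 seconds, an assumed 1 ms tick and an assumed cost of 7 microseconds per callback (the spacing of the two CPU-3 trace lines), the model only leaves the loop once roughly 1.6 seconds of real time have passed, because every re-armed expiry is still behind the stale basenow until then.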


Any input would be a great help.



Thanks,

Raj

* hrtimer_interrupt time sync issues across cores
  2017-12-14  7:01 hrtimer_interrupt time sync issues across cores Rajasekaran Chandrasekaran
@ 2017-12-14  7:47 ` Greg KH
  2017-12-14 17:02 ` valdis.kletnieks at vt.edu
  1 sibling, 0 replies; 3+ messages in thread
From: Greg KH @ 2017-12-14  7:47 UTC (permalink / raw)
  To: kernelnewbies

On Wed, Dec 13, 2017 at 11:01:57PM -0800, Rajasekaran Chandrasekaran wrote:
> Hi,
> 
> 
> In our multi-core x86 based system, which is running kernel version 3.4.19,
> hrtimer_interrupt (called from apic_timer_interrupt) keeps looping
> in hardirq for at least 1.6 seconds.  We use tsc as our clock source. The
> issue happens very rarely in our system and is hard to reproduce.

Please note, 3.4.19 is _very_ old, almost 300 thousand kernel changes
have been done since then.  Also, your kernel is missing an untold
number of security fixes, making your systems totally insecure.

If you are stuck at this kernel version, please work with the company
that you are paying to support it, as they are the only ones that can do
this, not the community, sorry.  Also, you are paying for that support,
please use it!

best of luck!

greg k-h


* hrtimer_interrupt time sync issues across cores
  2017-12-14  7:01 hrtimer_interrupt time sync issues across cores Rajasekaran Chandrasekaran
  2017-12-14  7:47 ` Greg KH
@ 2017-12-14 17:02 ` valdis.kletnieks at vt.edu
  1 sibling, 0 replies; 3+ messages in thread
From: valdis.kletnieks at vt.edu @ 2017-12-14 17:02 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 13 Dec 2017 23:01:57 -0800, Rajasekaran Chandrasekaran said:

> In our multi-core x86 based system that is running 3.4.19 version of

As Greg already pointed out, that's ancient history suitable only for kernel
archaeologists and masochists.

> Problem:
>
> Inside hrtimer_interrupt function, basenow.tv64 in CPU-3 is 1.6 seconds
> ahead of other CPUs (we have 4 cores)

I may be mis-remembering, but I think the timer code has undergone at least
2 major reworkings in the 6 years since 3.4.  So there's a good chance the bug
has already been fixed.  Of course, back-porting the fix may be close to impossible.

Good luck, you will need it.
