From: John Stultz <johnstul@us.ibm.com>
To: Prarit Bhargava <prarit@redhat.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	stable@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/3][RFC] Potential fix for leapsecond caused futex issue (v3)
Date: Tue, 03 Jul 2012 17:19:54 -0700
Message-ID: <4FF38C2A.9080301@us.ibm.com>
In-Reply-To: <4FF30F48.3030702@redhat.com>

On 07/03/2012 08:27 AM, Prarit Bhargava wrote:
> Thanks John -- I moved to using this for testing and hit the following
> softlockup when running latest + your patchset:
>
> [ 1084.433362] BUG: soft lockup - CPU#17 stuck for 22s! [leap-a-day:1275]
[snip]
> [ 1084.531860] RIP: 0010:[<ffffffff810b3d57>]  [<ffffffff810b3d57>]
> smp_call_function_many+0x1f7/0x260
[snip]
> [ 1084.663723] Call Trace:
> [ 1084.666466]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
> [ 1084.672784]  [<ffffffff8107e960>] ? hrtimer_wakeup+0x30/0x30
> [ 1084.679107]  [<ffffffff810b3f12>] smp_call_function+0x22/0x30
> [ 1084.685530]  [<ffffffff810b3f78>] on_each_cpu+0x28/0x70
> [ 1084.691371]  [<ffffffff8107ec1c>] do_clock_was_set+0x1c/0x30
> [ 1084.697691]  [<ffffffff8107f005>] clock_was_set+0x55/0x60
> [ 1084.703732]  [<ffffffff810a6a23>] do_settimeofday+0xd3/0xe0
> [ 1084.709971]  [<ffffffff8105f4e5>] do_sys_settimeofday+0xb5/0x110
> [ 1084.716677]  [<ffffffff8105f5c3>] sys_settimeofday+0x83/0xb0
> [ 1084.723012]  [<ffffffff8160f129>] system_call_fastpath+0x16/0x1b
> [ 1084.729782] Code: f7 ff 15 95 89 b6 00 80 7d bf 00 0f 84 9c fe ff ff 41 f6 47
> 20 01 0f 84 91 fe ff ff 0f 1f 84 00 00 00 00 00 f3 90 41 f6 47 20 01 <75> f7 e9
> 7b fe ff ff 66 90 4c 89 e2 4c 89 ee 89 df e8 53 8b 21
>
> I'm taking a look now ... I'm not sure I believe the hrtimer_wakeup() calls on
> the stack.
I worked with Prarit and Thomas today to try to chase this down.

Prarit was also seeing "BUG at kernel/timer.c:1091!" problems, and once
he sent me his config I was able to reproduce the problem. Thomas
suggested enabling debugobjects, and that quickly pointed out the
think-o: I had mistaken __hrtimer_init() for the hrtimer subsystem
initialization, rather than the function that initializes every
individual hrtimer. So when my patch initialized the clock_was_set_timer
there, we could end up re-initializing that timer while it was enqueued,
which can cause the cpu it's enqueued on to lock up with irqs off, which
then gums up the smp_call_function().

The obvious fix is to initialize the clock_was_set_timer when we define it.
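
To make the failure mode concrete, here is a rough userspace sketch of
why re-initializing a queued object is dangerous. It is not kernel code
and not the patch itself: struct toy_timer, struct toy_queue and all of
the toy_* helpers are hypothetical names made up for illustration, and a
plain doubly linked list stands in for the real hrtimer queue. The only
point is that wiping a node's linkage while the queue still references
it leaves the queue inconsistent, whereas initializing once (for
example, at definition) does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer; NOT the kernel's struct hrtimer. */
struct toy_timer {
	struct toy_timer *prev, *next;	/* queue linkage */
	bool enqueued;
};

/* Hypothetical queue with a sentinel head node. */
struct toy_queue {
	struct toy_timer head;
};

static void toy_queue_init(struct toy_queue *q)
{
	q->head.prev = q->head.next = &q->head;
	q->head.enqueued = false;
}

/* Per-timer init: blindly resets the linkage, whatever state it was in. */
static void toy_timer_init(struct toy_timer *t)
{
	t->prev = t->next = t;
	t->enqueued = false;
}

/* Append the timer at the tail of the queue. */
static void toy_enqueue(struct toy_queue *q, struct toy_timer *t)
{
	t->next = &q->head;
	t->prev = q->head.prev;
	q->head.prev->next = t;
	q->head.prev = t;
	t->enqueued = true;
}

/* Bounded walk: every queued node must be marked enqueued and be
 * linked consistently with both of its neighbours. */
static bool toy_queue_consistent(const struct toy_queue *q, size_t max_nodes)
{
	const struct toy_timer *n = q->head.next;

	for (size_t i = 0; i <= max_nodes; i++) {
		if (n == &q->head)
			return true;
		if (!n->enqueued || n->next->prev != n || n->prev->next != n)
			return false;
		n = n->next;
	}
	return false;	/* never got back to the head: a cycle */
}

/* Initialize once, then enqueue: the queue stays consistent. */
static bool init_once_is_safe(void)
{
	struct toy_queue q;
	struct toy_timer t;

	toy_queue_init(&q);
	toy_timer_init(&t);
	toy_enqueue(&q, &t);
	return toy_queue_consistent(&q, 4);
}

/* Re-run the per-timer init while the timer is still queued. */
static bool reinit_while_enqueued_is_safe(void)
{
	struct toy_queue q;
	struct toy_timer t;

	toy_queue_init(&q);
	toy_timer_init(&t);
	toy_enqueue(&q, &t);
	toy_timer_init(&t);	/* the mistake: queue still points at t */
	return toy_queue_consistent(&q, 4);
}
```

In this toy model the second helper reports an inconsistent queue,
because the queue head still points at a timer that now claims it was
never queued. In the kernel the consequences are much nastier than a
failed check, as the lockup above shows.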

Thanks to Prarit for testing and noticing the problem, and to Thomas for
suggesting how to isolate it!

I'm going to continue testing for a bit longer and then will send out
the revised patchset. Hopefully I can collect some acks tomorrow and
get it merged later Thursday (I'd like Prarit to get a chance to test
the patch Thursday before pushing it).

thanks
-john

