All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Rafael David Tinoco <inaddy@ubuntu.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jens Axboe <axboe@kernel.dk>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Gema Gomez <gema.gomez-solano@canonical.com>,
	Christopher Arges <chris.j.arges@canonical.com>
Subject: Re: smp_call_function_single lockups
Date: Mon, 23 Feb 2015 11:32:50 -0800	[thread overview]
Message-ID: <CA+55aFx_H3-=qy09Gu2mTSGR9F31fO50eFzKVRditAxhV9p7CA@mail.gmail.com> (raw)
In-Reply-To: <CAMiJ5CU0K4b9xK+u1DRFsJBUE8Bp6NvZrUwZWHiiMRGA8Uqaig@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2187 bytes --]

On Mon, Feb 23, 2015 at 6:01 AM, Rafael David Tinoco <inaddy@ubuntu.com> wrote:
>
> This is v3.19 + your patch (smp acquire/release)
> - (nested kvm with 2 vcpus on top of proliant with x2apic cluster mode
> and acpi_idle)

Hmm. There is absolutely nothing else going on on that machine, except
for the single call to smp_call_function_single() that is waiting for
the CSD to be released.

> * It looks like we got locked because of reentrant flush_tlb_* through
> smp_call_*
> but I'll leave it to you.

No, that is all a perfectly regular callchain:

 .. native_flush_tlb_others -> smp_call_function_many ->
smp_call_function_single

but the stack contains some stale addresses (one is probably just from
smp_call_function_single() calling into "generic_exec_single()", and
thus the stack contains the return address inside
smp_call_function_single() in _addition_ to the actual place where the
watchdog timer then interrupted it).

It all really looks very regular and sane, and looks like
smp_call_function_single() is happily just waiting for the IPI to
finish in the (inlined) csd_lock_wait().

I see nothing wrong at all.

However, here's a patch to the actual x86
smp_call_function_single_interrupt() handler, which probably doesn't
make any difference at all, but does:

 - gets rid of the incredibly crazy and ugly smp_entering_irq() that
inexplicably pairs with irq_exit()

 - makes the SMP IPI functions dot he APIC ACK cycle at the *end*
rather than at the beginning.

The exact placement of the ACK should make absolutely no difference,
since interrupts should be disabled for the whole period anyway. But
I'm desperate and am flailing around worrying about x2apic bugs and
whatnot in addition to whatever bugs we migth have in this area in sw.

So maybe there is some timing thing. This makes us consistently ack
the IPI _after_ we have cleared the list of smp call actions, and
executed the list, rather than ack it before, and then go on to muck
with the list.

Plus that old code really annoyed me anyway.

So I don't see why this should matter, but since I don't see how the
bug could happen in the first place, might as well try it..

                          Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 2511 bytes --]

 arch/x86/kernel/smp.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index be8e1bde07aa..ed43259111d8 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -265,12 +265,23 @@ __visible void smp_reschedule_interrupt(struct pt_regs *regs)
 	 */
 }
 
-static inline void smp_entering_irq(void)
+/*
+ * This is the same as "entering_irq()", except it doesn't
+ * do the 'exit_idle()'.
+ *
+ * It pairs with smp_irq_exit().
+ */
+static inline void smp_irq_enter(void)
 {
-	ack_APIC_irq();
 	irq_enter();
 }
 
+static inline void smp_irq_exit(void)
+{
+	irq_exit();
+	ack_APIC_irq();
+}
+
 __visible void smp_trace_reschedule_interrupt(struct pt_regs *regs)
 {
 	/*
@@ -279,11 +290,11 @@ __visible void smp_trace_reschedule_interrupt(struct pt_regs *regs)
 	 * scheduler_ipi(). This is OK, since those functions are allowed
 	 * to nest.
 	 */
-	smp_entering_irq();
+	smp_irq_enter();
 	trace_reschedule_entry(RESCHEDULE_VECTOR);
 	__smp_reschedule_interrupt();
 	trace_reschedule_exit(RESCHEDULE_VECTOR);
-	exiting_irq();
+	smp_irq_exit();
 	/*
 	 * KVM uses this interrupt to force a cpu out of guest mode
 	 */
@@ -297,18 +308,18 @@ static inline void __smp_call_function_interrupt(void)
 
 __visible void smp_call_function_interrupt(struct pt_regs *regs)
 {
-	smp_entering_irq();
+	smp_irq_enter();
 	__smp_call_function_interrupt();
-	exiting_irq();
+	smp_irq_exit();
 }
 
 __visible void smp_trace_call_function_interrupt(struct pt_regs *regs)
 {
-	smp_entering_irq();
+	smp_irq_enter();
 	trace_call_function_entry(CALL_FUNCTION_VECTOR);
 	__smp_call_function_interrupt();
 	trace_call_function_exit(CALL_FUNCTION_VECTOR);
-	exiting_irq();
+	smp_irq_exit();
 }
 
 static inline void __smp_call_function_single_interrupt(void)
@@ -319,18 +330,18 @@ static inline void __smp_call_function_single_interrupt(void)
 
 __visible void smp_call_function_single_interrupt(struct pt_regs *regs)
 {
-	smp_entering_irq();
+	smp_irq_enter();
 	__smp_call_function_single_interrupt();
-	exiting_irq();
+	smp_irq_exit();
 }
 
 __visible void smp_trace_call_function_single_interrupt(struct pt_regs *regs)
 {
-	smp_entering_irq();
+	smp_irq_enter();
 	trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
 	__smp_call_function_single_interrupt();
 	trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
-	exiting_irq();
+	smp_irq_exit();
 }
 
 static int __init nonmi_ipi_setup(char *str)

  reply	other threads:[~2015-02-23 19:32 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-11 13:19 smp_call_function_single lockups Rafael David Tinoco
2015-02-11 18:18 ` Linus Torvalds
2015-02-11 19:59   ` Linus Torvalds
2015-02-11 20:42     ` Linus Torvalds
2015-02-12 16:38       ` Rafael David Tinoco
2015-02-18 22:25       ` Peter Zijlstra
2015-02-19 15:42         ` Rafael David Tinoco
2015-02-19 16:14           ` Linus Torvalds
2015-02-23 14:01             ` Rafael David Tinoco
2015-02-23 19:32               ` Linus Torvalds [this message]
2015-02-23 20:50                 ` Peter Zijlstra
2015-02-23 21:02                   ` Rafael David Tinoco
2015-02-19 16:16           ` Peter Zijlstra
2015-02-19 16:26           ` Linus Torvalds
2015-02-19 16:32             ` Rafael David Tinoco
2015-02-19 16:59               ` Linus Torvalds
2015-02-19 17:30                 ` Rafael David Tinoco
2015-02-19 17:39                 ` Linus Torvalds
2015-02-19 20:29                   ` Linus Torvalds
2015-02-19 21:59                     ` Linus Torvalds
2015-02-19 22:45                       ` Linus Torvalds
2015-03-31  3:15                         ` Chris J Arges
2015-03-31  4:28                           ` Linus Torvalds
2015-03-31 10:56                             ` [debug PATCHes] " Ingo Molnar
2015-03-31 22:38                               ` Chris J Arges
2015-04-01 12:39                                 ` Ingo Molnar
2015-04-01 14:10                                   ` Chris J Arges
2015-04-01 14:55                                     ` Ingo Molnar
2015-03-31  4:46                           ` Linus Torvalds
2015-03-31 15:08                           ` Linus Torvalds
2015-03-31 22:23                             ` Chris J Arges
2015-03-31 23:07                               ` Linus Torvalds
2015-04-01 14:32                                 ` Chris J Arges
2015-04-01 15:36                                   ` Linus Torvalds
2015-04-02  9:55                                     ` Ingo Molnar
2015-04-02 17:35                                       ` Linus Torvalds
2015-04-01 12:43                               ` Ingo Molnar
2015-04-01 16:10                                 ` Chris J Arges
2015-04-01 16:14                                   ` Linus Torvalds
2015-04-01 21:59                                     ` Chris J Arges
2015-04-02 17:31                                       ` Linus Torvalds
2015-04-02 18:26                                         ` Ingo Molnar
2015-04-02 18:51                                           ` Chris J Arges
2015-04-02 19:07                                             ` Ingo Molnar
2015-04-02 20:57                                               ` Linus Torvalds
2015-04-02 21:13                                               ` Chris J Arges
2015-04-03  5:43                                                 ` [PATCH] smp/call: Detect stuck CSD locks Ingo Molnar
2015-04-03  5:47                                                   ` Ingo Molnar
2015-04-06 16:58                                                   ` Chris J Arges
2015-04-06 17:32                                                     ` Linus Torvalds
2015-04-07  9:21                                                       ` Ingo Molnar
2015-04-07 20:59                                                         ` Chris J Arges
2015-04-07 21:15                                                           ` Linus Torvalds
2015-04-08  6:47                                                           ` Ingo Molnar
2015-04-13  3:56                                                             ` Chris J Arges
2015-04-13  6:14                                                               ` Ingo Molnar
2015-04-15 19:54                                                                 ` Chris J Arges
2015-04-16 11:04                                                                   ` Ingo Molnar
2015-04-16 15:58                                                                     ` Chris J Arges
2015-04-16 16:31                                                                       ` Ingo Molnar
2015-04-29 21:08                                                                         ` Chris J Arges
2015-05-11 14:00                                                                           ` Ingo Molnar
2015-05-20 18:19                                                                             ` Chris J Arges
2015-04-03  5:45                                                 ` smp_call_function_single lockups Ingo Molnar
2015-04-06 17:23                                         ` Chris J Arges
2015-02-20  9:30                     ` Ingo Molnar
2015-02-20 16:49                       ` Linus Torvalds
2015-02-20 19:41                         ` Ingo Molnar
2015-02-20 20:03                           ` Linus Torvalds
2015-02-20 20:11                             ` Ingo Molnar
2015-03-20 10:15       ` Peter Zijlstra
2015-03-20 16:26         ` Linus Torvalds
2015-03-20 17:14           ` Mike Galbraith
2015-04-01 14:22       ` Frederic Weisbecker
2015-04-18 10:13       ` [tip:locking/urgent] smp: Fix smp_call_function_single_async() locking tip-bot for Linus Torvalds
2015-02-22  8:59 smp_call_function_single lockups Daniel J Blueman
2015-02-22 10:37 ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFx_H3-=qy09Gu2mTSGR9F31fO50eFzKVRditAxhV9p7CA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=chris.j.arges@canonical.com \
    --cc=fweisbec@gmail.com \
    --cc=gema.gomez-solano@canonical.com \
    --cc=inaddy@ubuntu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.