[PATCH v3 0/2] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue

linux-kernel.vger.kernel.org archive mirror

* [PATCH v3 0/2] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue
@ 2014-05-11 20:36 Srivatsa S. Bhat
  2014-05-11 20:36 ` [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU Srivatsa S. Bhat
  2014-05-11 20:37 ` [PATCH v3 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
  0 siblings, 2 replies; 10+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-11 20:36 UTC
  To: peterz, tglx, mingo, tj, rusty, akpm, fweisbec, hch
  Cc: mgorman, riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
	linux-kernel, srivatsa.bhat

Hi,

There is a long-standing problem related to CPU hotplug which causes IPIs to
be delivered to offline CPUs, and the smp-call-function IPI handler code
prints out a warning whenever this is detected. Every once in a while this
(usually harmless) warning gets reported on LKML, but so far it has not been
completely fixed. Usually the fix involves identifying the IPI sender and
adding appropriate synchronization with CPU hotplug.

However, while going through one such internal bug report, I found that
there is a significant bug on the receiver side itself (more specifically,
in stop-machine) that can lead to this problem even when the sender code
is perfectly fine. This patchset fixes that synchronization problem in the
CPU hotplug stop-machine code.

Patch 1 adds some additional debug code to the smp-call-function framework,
to help debug such issues easily.

Patch 2 modifies the stop-machine code to ensure that any IPIs that were sent
while the target CPU was online are noticed and handled by that CPU without
fail before it goes offline. This avoids scenarios where IPIs are received on
offline CPUs (as long as the sender uses proper hotplug synchronization).

In fact, I debugged the problem by using Patch 1, and found that the
payload of the IPI was always the block layer's trigger_softirq() function.
But I was not able to find anything wrong with the block layer code. That's
when I started looking at the stop-machine code and realized that there is
a race-window which makes the IPI _receiver_ the culprit, not the sender.
Patch 2 fixes that race and hence this should put an end to most of the
hard-to-debug IPI-to-offline-CPU issues.

Changes in v3:

Rewrote patch 2 and split the MULTI_STOP_DISABLE_IRQ state into two:
MULTI_STOP_DISABLE_IRQ_INACTIVE and MULTI_STOP_DISABLE_IRQ_ACTIVE, and
used this framework to ensure that the CPU going offline always disables
its interrupts last. Suggested by Tejun Heo.

v1 and v2:
https://lkml.org/lkml/2014/5/6/474

 Srivatsa S. Bhat (2):
      smp: Print more useful debug info upon receiving IPI on an offline CPU
      CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"

 kernel/smp.c          |   18 ++++++++++++++----
 kernel/stop_machine.c |   25 ++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 7 deletions(-)

Thanks,
Srivatsa S. Bhat
IBM Linux Technology Center


* [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU
  2014-05-11 20:36 [PATCH v3 0/2] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue Srivatsa S. Bhat
@ 2014-05-11 20:36 ` Srivatsa S. Bhat
  2014-05-13 15:38   ` Frederic Weisbecker
  2014-05-11 20:37 ` [PATCH v3 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
  1 sibling, 1 reply; 10+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-11 20:36 UTC
  To: peterz, tglx, mingo, tj, rusty, akpm, fweisbec, hch
  Cc: mgorman, riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
	linux-kernel, srivatsa.bhat

Today the smp-call-function code just prints a warning if we get an IPI on
an offline CPU. This info is sufficient to let us know that something went
wrong, but often it is very hard to debug exactly who sent the IPI and why,
from this info alone.

In most cases, we get the warning about the IPI to an offline CPU immediately
after the CPU going offline comes out of the stop-machine phase and re-enables
interrupts. Since all online CPUs participate in stop-machine, the information
regarding the sender of the IPI is already lost by the time we exit the
stop-machine loop. So even if we dump the stack on each CPU at this point,
we won't find anything useful, since all of them will show the stack trace of
the stopper thread. So we need a better way to figure out who sent the IPI and
why.

To achieve this, when we detect an IPI targeted to an offline CPU, loop through
the call-single-data linked list and print out the payload (i.e., the name
of the function which was supposed to be executed by the target CPU). This
gives us a hint as to who might have sent the IPI and helps us debug this
further.
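The detach-then-walk-twice pattern described above can be modelled in user space. This is a simplified sketch, not kernel code: `struct csd`, its `name` field (used instead of the kernel's `%pS` symbol lookup), `csd_enqueue()`, `handle_ipi()` and `count_call()` are hypothetical, and a C11 atomic pointer stands in for the per-CPU `llist` head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for struct call_single_data. 'name' is a
 * hypothetical field used here instead of the kernel's %pS lookup. */
struct csd {
	struct csd *next;
	void (*func)(void *info);
	void *info;
	const char *name;
};

/* One CPU's call_single_queue, modelled as an atomic list head. */
static _Atomic(struct csd *) call_single_queue;

/* Sender side: lock-free push onto the head, as llist_add() does. */
static void csd_enqueue(struct csd *c)
{
	struct csd *head = atomic_load(&call_single_queue);

	do {
		c->next = head;
	} while (!atomic_compare_exchange_weak(&call_single_queue, &head, c));
}

/* Pushes make the list LIFO; reverse it to run callbacks in send order. */
static struct csd *reverse_order(struct csd *head)
{
	struct csd *prev = NULL;

	while (head) {
		struct csd *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}

/* Receiver side. Returns how many payloads were reported as stray. */
static int handle_ipi(bool cpu_online)
{
	static bool warned;
	int reported = 0;
	/* Detach everything first (like llist_del_all()), so the list can be
	 * walked once for reporting and once for running the callbacks. */
	struct csd *entry = reverse_order(atomic_exchange(&call_single_queue, NULL));

	if (!cpu_online && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: IPI on offline CPU\n");
		/* Plain walk is fine: nothing has been invoked or freed yet. */
		for (struct csd *c = entry; c; c = c->next, reported++)
			fprintf(stderr, "IPI function %s sent to offline CPU\n", c->name);
	}

	/* "_safe" walk: fetch ->next before func() may reuse the entry. */
	for (struct csd *c = entry, *n; c; c = n) {
		n = c->next;
		c->func(c->info);
	}
	return reported;
}

/* Example payload: counts how many times it ran. */
static void count_call(void *info)
{
	++*(int *)info;
}
```

Calling `handle_ipi(false)` a second time still runs the queued callbacks but reports nothing, mirroring the `static bool warned` in the patch.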

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..f864921 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -185,14 +185,24 @@ void generic_smp_call_function_single_interrupt(void)
 {
 	struct llist_node *entry;
 	struct call_single_data *csd, *csd_next;
+	static bool warned;
+
+	entry = llist_del_all(&__get_cpu_var(call_single_queue));
+	entry = llist_reverse_order(entry);

 	/*
 	 * Shouldn't receive this interrupt on a cpu that is not yet online.
 	 */
-	WARN_ON_ONCE(!cpu_online(smp_processor_id()));
-
-	entry = llist_del_all(&__get_cpu_var(call_single_queue));
-	entry = llist_reverse_order(entry);
+	if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
+		warned = true;
+		WARN_ON(1);
+		/*
+		 * We don't have to use the _safe() variant here
+		 * because we are not invoking the IPI handlers yet.
+		 */
+		llist_for_each_entry(csd, entry, llist)
+			pr_warn("SMP IPI Payload: %pS \n", csd->func);
+	}

 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
 		csd->func(csd->info);


* [PATCH v3 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
  2014-05-11 20:36 [PATCH v3 0/2] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue Srivatsa S. Bhat
  2014-05-11 20:36 ` [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU Srivatsa S. Bhat
@ 2014-05-11 20:37 ` Srivatsa S. Bhat
  2014-05-12 20:57   ` Tejun Heo
  1 sibling, 1 reply; 10+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-11 20:37 UTC
  To: peterz, tglx, mingo, tj, rusty, akpm, fweisbec, hch
  Cc: mgorman, riel, bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw,
	linux-kernel, srivatsa.bhat

During CPU offline, stop-machine is used to take control over all the online
CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
that is to be taken offline.

But stop-machine itself has several stages: _PREPARE, _DISABLE_IRQ, _RUN etc.
The important thing to note here is that the _DISABLE_IRQ stage comes well
after stop-machine starts, and hence there is a large window where other
CPUs can send IPIs to the CPU going offline. As a result, we can encounter
the scenario depicted below, in which IPIs are sent to the CPU going offline
and that CPU notices them *after* it has gone offline, triggering the
"IPI-to-offline-CPU" warning from the smp-call-function code.

              CPU 1                                         CPU 2
          (Online CPU)                               (CPU going offline)

       Enter _PREPARE stage                          Enter _PREPARE stage

                                                     Enter _DISABLE_IRQ stage

                                                   =
       Got a device interrupt,                     | Didn't notice the IPI
       and the interrupt handler                   | since interrupts were
       called smp_call_function()                  | disabled on this CPU.
       and sent an IPI to CPU 2.                   |
                                                   =

       Enter _DISABLE_IRQ stage

       Enter _RUN stage                              Enter _RUN stage

                                  =
       Busy loop with interrupts  |                  Invoke take_cpu_down()
       disabled.                  |                  and take CPU 2 offline
                                  =

       Enter _EXIT stage                             Enter _EXIT stage

       Re-enable interrupts                          Re-enable interrupts

                                                     The pending IPI is noted
                                                     immediately, but alas,
                                                     the CPU is offline at
                                                     this point.
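The timeline above can be encoded as a sequence of events and replayed in a tiny user-space model, which also makes it easy to try other orderings. Everything here is hypothetical (`replay()`, the event names, the two-CPU model), and it assumes an IPI is noticed the instant the target's interrupts are on; real hardware adds delivery latency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the diagram: SENDER is "CPU 1",
 * OUTGOING is "CPU 2" (the CPU going offline). */
enum { SENDER = 0, OUTGOING = 1 };
enum ev_kind { DISABLE_IRQ, SEND_IPI, GO_OFFLINE, ENABLE_IRQ };
struct event { int cpu; enum ev_kind what; };

struct cpu_model { bool irqs_on, online; };

/*
 * Replays the events and reports when OUTGOING services the IPI:
 *   1 = while still online, 0 = only after going offline (the warning),
 *  -1 = never (no IPI was actually sent).
 */
static int replay(const struct event *ev, int n)
{
	struct cpu_model cpu[2] = { { true, true }, { true, true } };
	bool ipi_pending = false;

	for (int i = 0; i < n; i++) {
		assert(ev[i].cpu == SENDER || ev[i].cpu == OUTGOING);
		struct cpu_model *c = &cpu[ev[i].cpu];

		switch (ev[i].what) {
		case DISABLE_IRQ: c->irqs_on = false; break;
		case ENABLE_IRQ:  c->irqs_on = true;  break;
		case GO_OFFLINE:  c->online = false;  break;
		case SEND_IPI:
			/* The send happens in a device interrupt handler, which
			 * the sender cannot take with its own IRQs disabled. */
			if (c->irqs_on)
				ipi_pending = true;
			break;
		}
		/* The target handles a pending IPI as soon as its IRQs are on. */
		if (ipi_pending && cpu[OUTGOING].irqs_on)
			return cpu[OUTGOING].online ? 1 : 0;
	}
	return -1;
}
```

With the diagram's order, `replay()` returns 0 (the IPI is first noticed offline). If the sender's `DISABLE_IRQ` is moved ahead of the outgoing CPU's, an IPI sent in time is handled while the target is still online, and a late one cannot be sent at all.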

So, as we can observe from this scenario, the IPI was sent when CPU 2 was
still online, and hence it was perfectly legal. But unfortunately it was
noticed only after CPU 2 went offline, resulting in the warning from the
IPI handling code. In other words, the fault was not on the sender side but
on the receiver side, and if we look closely, the real bug is in the
stop-machine sequence itself.

The problem here is that the CPU going offline disabled its local interrupts
(by entering the _DISABLE_IRQ phase) *before* the other CPUs did, which is
why it was not able to respond to the IPI before going offline.

A simple solution to this problem is to ensure that the CPU going offline
disables its interrupts only *after* the other CPUs do the same thing.
To achieve this, split the _DISABLE_IRQ state into 2 parts:

1st part: MULTI_STOP_DISABLE_IRQ_INACTIVE, where only the non-active CPUs
(i.e., the "other" CPUs) disable their interrupts.

2nd part: MULTI_STOP_DISABLE_IRQ_ACTIVE, where the active CPU (i.e., the
CPU going offline) disables its interrupts.

With this in place, the CPU going offline will always be the last one to
disable interrupts. After this step, no further IPIs can be sent to the
outgoing CPU, since all the other CPUs would be executing the stop-machine
code with interrupts disabled. And by the time stop-machine ends, the CPU
would have gone offline and disappeared from the cpu_online_mask, and hence
future invocations of smp_call_function() and friends will automatically
prune that CPU out. Thus, we can guarantee that no CPU will end up
*inadvertently* sending IPIs to an offline CPU.
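The ordering argument above can be checked with a small deterministic model. It is a user-space sketch, not kernel code: the enum mirrors the one in the patch, while `MODEL_CPUS`, `disables_irq_in()` and `irq_off_order()` are hypothetical. The assumption is that the next state starts only after every CPU has finished the current one, roughly what the per-state acknowledgement (`ack_state()` in kernel/stop_machine.c) provides; within a state, CPUs may run in any order.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the enum in the patch; values are used only for ordering. */
enum multi_stop_state {
	MULTI_STOP_NONE,
	MULTI_STOP_PREPARE,
	MULTI_STOP_DISABLE_IRQ_INACTIVE,
	MULTI_STOP_DISABLE_IRQ_ACTIVE,
	MULTI_STOP_RUN,
	MULTI_STOP_EXIT,
};

#define MODEL_CPUS 4	/* hypothetical machine size */

/* Does a CPU disable its IRQs on entering 'state'? */
static bool disables_irq_in(enum multi_stop_state state, bool is_active)
{
	switch (state) {
	case MULTI_STOP_DISABLE_IRQ_INACTIVE:
		return !is_active;
	case MULTI_STOP_DISABLE_IRQ_ACTIVE:
		return is_active;
	default:
		return false;
	}
}

/*
 * Steps every CPU through every state. Inside one state the CPUs run in
 * the caller-supplied (arbitrary) order; the next state begins only after
 * all CPUs are done. Writes CPU ids to 'out' in the order their IRQs go off.
 */
static int irq_off_order(int active, const int order[MODEL_CPUS],
			 int out[MODEL_CPUS])
{
	int n = 0;

	for (int s = MULTI_STOP_PREPARE; s <= MULTI_STOP_EXIT; s++) {
		for (int i = 0; i < MODEL_CPUS; i++) {
			int cpu = order[i];

			if (disables_irq_in((enum multi_stop_state)s, cpu == active)) {
				assert(n < MODEL_CPUS);
				out[n++] = cpu;
			}
		}
	}
	return n;
}
```

Whatever order is passed in, the active CPU's id ends up last in `out`, which is the property the split into _INACTIVE and _ACTIVE is meant to guarantee.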

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/stop_machine.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 01fbae5..cac8590 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -130,8 +130,10 @@ enum multi_stop_state {
 	MULTI_STOP_NONE,
 	/* Awaiting everyone to be scheduled. */
 	MULTI_STOP_PREPARE,
-	/* Disable interrupts. */
-	MULTI_STOP_DISABLE_IRQ,
+	/* Disable interrupts on CPUs not in ->active_cpus mask. */
+	MULTI_STOP_DISABLE_IRQ_INACTIVE,
+	/* Disable interrupts on CPUs in ->active_cpus mask. */
+	MULTI_STOP_DISABLE_IRQ_ACTIVE,
 	/* Run the function */
 	MULTI_STOP_RUN,
 	/* Exit */
@@ -189,10 +191,27 @@ static int multi_cpu_stop(void *data)
 	do {
 		/* Chill out and ensure we re-read multi_stop_state. */
 		cpu_relax();
+
+		/*
+		 * In the case of CPU offline, we don't want the other CPUs to
+		 * send IPIs to the active_cpu (the one going offline) after it
+		 * has disabled interrupts in the _DISABLE_IRQ state (because,
+		 * then it will notice the IPIs only after it goes offline). So
+		 * we split this state into _INACTIVE and _ACTIVE, and thereby
+		 * ensure that the active_cpu disables interrupts only after
+		 * the other CPUs do the same thing.
+		 */
+
 		if (msdata->state != curstate) {
 			curstate = msdata->state;
 			switch (curstate) {
-			case MULTI_STOP_DISABLE_IRQ:
+			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
+				if (is_active)
+					break;
+
+				/* Else, fall-through */
+
+			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
 				local_irq_disable();
 				hard_irq_disable();
 				break;


* Re: [PATCH v3 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
  2014-05-11 20:37 ` [PATCH v3 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
@ 2014-05-12 20:57   ` Tejun Heo
  2014-05-13  9:02     ` [PATCH v4 " Srivatsa S. Bhat
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2014-05-12 20:57 UTC
  To: Srivatsa S. Bhat
  Cc: peterz, tglx, mingo, rusty, akpm, fweisbec, hch, mgorman, riel,
	bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel

Hello,

On Mon, May 12, 2014 at 02:07:04AM +0530, Srivatsa S. Bhat wrote:
> @@ -189,10 +191,27 @@ static int multi_cpu_stop(void *data)
>  	do {
>  		/* Chill out and ensure we re-read multi_stop_state. */
>  		cpu_relax();
> +
> +		/*
> +		 * In the case of CPU offline, we don't want the other CPUs to
> +		 * send IPIs to the active_cpu (the one going offline) after it
> +		 * has disabled interrupts in the _DISABLE_IRQ state (because,
> +		 * then it will notice the IPIs only after it goes offline). So
> +		 * we split this state into _INACTIVE and _ACTIVE, and thereby
> +		 * ensure that the active_cpu disables interrupts only after
> +		 * the other CPUs do the same thing.
> +		 */

It probably would be clearer to first describe what's going on and
then provide rationale for that.  IOW, state that inactive cpus
disable irqs first and then explain why that's done.  The above
paragraph looks somewhat out of place as is.

> +
>  		if (msdata->state != curstate) {
>  			curstate = msdata->state;
>  			switch (curstate) {
> -			case MULTI_STOP_DISABLE_IRQ:
> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
> +				if (is_active)
> +					break;
> +
> +				/* Else, fall-through */
> +
> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:

Wouldn't it be cleaner to do the following?

	case MULTI_STOP_DISABLE_IRQ_INACTIVE:
		if (!is_active) {
			disable;
		}
		break;
	case MULTI_STOP_DISABLE_IRQ_ACTIVE:
		if (is_active) {
			disable;
		}
		break;

The duplicated amount is trivial and what's going on would be far
clearer.
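The two forms can be compared in a small user-space model. `v3_disables()`, `v4_disables()` and `irqs_on_after()` are hypothetical helpers that return whether a CPU would call `local_irq_disable()` on entering a state, rather than calling it. The model makes one difference visible: with the fall-through, inactive CPUs disable IRQs a second time on entering `_ACTIVE`. That is redundant rather than harmful, since their IRQs are already off, so the resulting IRQ state matches the expanded form at every step.

```c
#include <assert.h>
#include <stdbool.h>

enum multi_stop_state {
	MULTI_STOP_NONE,
	MULTI_STOP_PREPARE,
	MULTI_STOP_DISABLE_IRQ_INACTIVE,
	MULTI_STOP_DISABLE_IRQ_ACTIVE,
	MULTI_STOP_RUN,
	MULTI_STOP_EXIT,
};

/* v3 form: _INACTIVE falls through into _ACTIVE for inactive CPUs. */
static bool v3_disables(enum multi_stop_state s, bool is_active)
{
	switch (s) {
	case MULTI_STOP_DISABLE_IRQ_INACTIVE:
		if (is_active)
			break;
		/* fall through */
	case MULTI_STOP_DISABLE_IRQ_ACTIVE:
		return true;
	default:
		break;
	}
	return false;
}

/* Expanded form suggested in review: each state names its CPUs. */
static bool v4_disables(enum multi_stop_state s, bool is_active)
{
	switch (s) {
	case MULTI_STOP_DISABLE_IRQ_INACTIVE:
		return !is_active;
	case MULTI_STOP_DISABLE_IRQ_ACTIVE:
		return is_active;
	default:
		return false;
	}
}

/* IRQ state of one CPU after stepping through states up to 'upto'
 * (ignoring the restore that happens after the stop-machine loop). */
static bool irqs_on_after(bool (*disables)(enum multi_stop_state, bool),
			  enum multi_stop_state upto, bool is_active)
{
	bool irqs_on = true;

	for (int s = MULTI_STOP_PREPARE; s <= (int)upto; s++)
		if (disables((enum multi_stop_state)s, is_active))
			irqs_on = false;
	return irqs_on;
}
```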

Thanks.

-- 
tejun


* [PATCH v4 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
  2014-05-12 20:57   ` Tejun Heo
@ 2014-05-13  9:02     ` Srivatsa S. Bhat
  2014-05-13 15:57       ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-13  9:02 UTC
  To: Tejun Heo
  Cc: peterz, tglx, mingo, rusty, akpm, fweisbec, hch, mgorman, riel,
	bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel

On 05/13/2014 02:27 AM, Tejun Heo wrote:
> Hello,
> 
> On Mon, May 12, 2014 at 02:07:04AM +0530, Srivatsa S. Bhat wrote:
>> @@ -189,10 +191,27 @@ static int multi_cpu_stop(void *data)
>>  	do {
>>  		/* Chill out and ensure we re-read multi_stop_state. */
>>  		cpu_relax();
>> +
>> +		/*
>> +		 * In the case of CPU offline, we don't want the other CPUs to
>> +		 * send IPIs to the active_cpu (the one going offline) after it
>> +		 * has disabled interrupts in the _DISABLE_IRQ state (because,
>> +		 * then it will notice the IPIs only after it goes offline). So
>> +		 * we split this state into _INACTIVE and _ACTIVE, and thereby
>> +		 * ensure that the active_cpu disables interrupts only after
>> +		 * the other CPUs do the same thing.
>> +		 */
> 
> It probably would be clearer to first describe what's going on and
> then provide rationale for that.  IOW, state that inactive cpus
> disable irqs first and then explain why that's done.  The above
> paragraph looks somewhat out of place as is.
> 

Ok..

>> +
>>  		if (msdata->state != curstate) {
>>  			curstate = msdata->state;
>>  			switch (curstate) {
>> -			case MULTI_STOP_DISABLE_IRQ:
>> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
>> +				if (is_active)
>> +					break;
>> +
>> +				/* Else, fall-through */
>> +
>> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
> 
> Wouldn't it be cleaner to do the following?
> 
> 	case MULTI_STOP_DISABLE_IRQ_INACTIVE:
> 		if (!is_active) {
> 			disable;
> 		}
> 		break;
> 	case MULTI_STOP_DISABLE_IRQ_ACTIVE:
> 		if (is_active) {
> 			disable;
> 		}
> 		break;
>

Well, I wrote it this way the first time and later thought of using
the switch fall-through mechanism to avoid the duplication :-)

> The duplicated amount is trivial and what's going on would be far
> clearer.
>

But yeah, I agree that the expanded form is less cryptic and hence
better for readability.

How about the updated version below?

Regards,
Srivatsa S. Bhat

-------------------------------------------------------------------

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[PATCH v4 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"

During CPU offline, stop-machine is used to take control over all the online
CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
that is to be taken offline.

But stop-machine itself has several stages: _PREPARE, _DISABLE_IRQ, _RUN etc.
The important thing to note here is that the _DISABLE_IRQ stage comes well
after stop-machine starts, and hence there is a large window where other
CPUs can send IPIs to the CPU going offline. As a result, we can encounter
the scenario depicted below, in which IPIs are sent to the CPU going offline
and that CPU notices them *after* it has gone offline, triggering the
"IPI-to-offline-CPU" warning from the smp-call-function code.

              CPU 1                                         CPU 2
          (Online CPU)                               (CPU going offline)

       Enter _PREPARE stage                          Enter _PREPARE stage

                                                     Enter _DISABLE_IRQ stage

                                                   =
       Got a device interrupt,                     | Didn't notice the IPI
       and the interrupt handler                   | since interrupts were
       called smp_call_function()                  | disabled on this CPU.
       and sent an IPI to CPU 2.                   |
                                                   =

       Enter _DISABLE_IRQ stage

       Enter _RUN stage                              Enter _RUN stage

                                  =
       Busy loop with interrupts  |                  Invoke take_cpu_down()
       disabled.                  |                  and take CPU 2 offline
                                  =

       Enter _EXIT stage                             Enter _EXIT stage

       Re-enable interrupts                          Re-enable interrupts

                                                     The pending IPI is noted
                                                     immediately, but alas,
                                                     the CPU is offline at
                                                     this point.

So, as we can observe from this scenario, the IPI was sent when CPU 2 was
still online, and hence it was perfectly legal. But unfortunately it was
noticed only after CPU 2 went offline, resulting in the warning from the
IPI handling code. In other words, the fault was not on the sender side but
on the receiver side, and if we look closely, the real bug is in the
stop-machine sequence itself.

The problem here is that the CPU going offline disabled its local interrupts
(by entering the _DISABLE_IRQ phase) *before* the other CPUs did, which is
why it was not able to respond to the IPI before going offline.

A simple solution to this problem is to ensure that the CPU going offline
disables its interrupts only *after* the other CPUs do the same thing.
To achieve this, split the _DISABLE_IRQ state into 2 parts:

1st part: MULTI_STOP_DISABLE_IRQ_INACTIVE, where only the non-active CPUs
(i.e., the "other" CPUs) disable their interrupts.

2nd part: MULTI_STOP_DISABLE_IRQ_ACTIVE, where the active CPU (i.e., the
CPU going offline) disables its interrupts.

With this in place, the CPU going offline will always be the last one to
disable interrupts. After this step, no further IPIs can be sent to the
outgoing CPU, since all the other CPUs would be executing the stop-machine
code with interrupts disabled. And by the time stop-machine ends, the CPU
would have gone offline and disappeared from the cpu_online_mask, and hence
future invocations of smp_call_function() and friends will automatically
prune that CPU out. Thus, we can guarantee that no CPU will end up
*inadvertently* sending IPIs to an offline CPU.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/stop_machine.c |   39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 01fbae5..288f7fe 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -130,8 +130,10 @@ enum multi_stop_state {
 	MULTI_STOP_NONE,
 	/* Awaiting everyone to be scheduled. */
 	MULTI_STOP_PREPARE,
-	/* Disable interrupts. */
-	MULTI_STOP_DISABLE_IRQ,
+	/* Disable interrupts on CPUs not in ->active_cpus mask. */
+	MULTI_STOP_DISABLE_IRQ_INACTIVE,
+	/* Disable interrupts on CPUs in ->active_cpus mask. */
+	MULTI_STOP_DISABLE_IRQ_ACTIVE,
 	/* Run the function */
 	MULTI_STOP_RUN,
 	/* Exit */
@@ -189,12 +191,39 @@ static int multi_cpu_stop(void *data)
 	do {
 		/* Chill out and ensure we re-read multi_stop_state. */
 		cpu_relax();
+
+		/*
+		 * We use 2 separate stages to disable interrupts, namely
+		 * _INACTIVE and _ACTIVE, to ensure that the inactive CPUs
+		 * disable their interrupts first, followed by the active CPUs.
+		 *
+		 * This is done to avoid a race in the CPU offline path, which
+		 * can lead to receiving IPIs on the outgoing CPU *after* it
+		 * has gone offline.
+		 *
+		 * During CPU offline, we don't want the other CPUs to send
+		 * IPIs to the active_cpu (the outgoing CPU) *after* it has
+		 * disabled interrupts (because, then it will notice the IPIs
+		 * only after it has gone offline). We can prevent this by
+		 * making the other CPUs disable their interrupts first - that
+		 * way, they will run the stop-machine code with interrupts
+		 * disabled, and hence won't send IPIs after that point.
+		 */
+
 		if (msdata->state != curstate) {
 			curstate = msdata->state;
 			switch (curstate) {
-			case MULTI_STOP_DISABLE_IRQ:
-				local_irq_disable();
-				hard_irq_disable();
+			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
+				if (!is_active) {
+					local_irq_disable();
+					hard_irq_disable();
+				}
+				break;
+			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
+				if (is_active) {
+					local_irq_disable();
+					hard_irq_disable();
+				}
 				break;
 			case MULTI_STOP_RUN:
 				if (is_active)


* Re: [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU
  2014-05-11 20:36 ` [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU Srivatsa S. Bhat
@ 2014-05-13 15:38   ` Frederic Weisbecker
  2014-05-15  6:42     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2014-05-13 15:38 UTC
  To: Srivatsa S. Bhat
  Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
	rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel

On Mon, May 12, 2014 at 02:06:49AM +0530, Srivatsa S. Bhat wrote:
> Today the smp-call-function code just prints a warning if we get an IPI on
> an offline CPU. This info is sufficient to let us know that something went
> wrong, but often it is very hard to debug exactly who sent the IPI and why,
> from this info alone.
> 
> In most cases, we get the warning about the IPI to an offline CPU, immediately
> after the CPU going offline comes out of the stop-machine phase and reenables
> interrupts. Since all online CPUs participate in stop-machine, the information
> regarding the sender of the IPI is already lost by the time we exit the
> stop-machine loop. So even if we dump the stack on each CPU at this point,
> we won't find anything useful since all of them will show the stack-trace of
> the stopper thread. So we need a better way to figure out who sent the IPI and
> why.
> 
> To achieve this, when we detect an IPI targeted to an offline CPU, loop through
> the call-single-data linked list and print out the payload (i.e., the name
> of the function which was supposed to be executed by the target CPU). This
> would give us an insight as to who might have sent the IPI and help us debug
> this further.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  kernel/smp.c |   18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 06d574e..f864921 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -185,14 +185,24 @@ void generic_smp_call_function_single_interrupt(void)
>  {
>  	struct llist_node *entry;
>  	struct call_single_data *csd, *csd_next;
> +	static bool warned;
> +
> +	entry = llist_del_all(&__get_cpu_var(call_single_queue));
> +	entry = llist_reverse_order(entry);
>  
>  	/*
>  	 * Shouldn't receive this interrupt on a cpu that is not yet online.
>  	 */
> -	WARN_ON_ONCE(!cpu_online(smp_processor_id()));
> -
> -	entry = llist_del_all(&__get_cpu_var(call_single_queue));
> -	entry = llist_reverse_order(entry);
> +	if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
> +		warned = true;
> +		WARN_ON(1);

More details may be better:

WARN_ONCE(1, "IPI on offline CPU");

> +		/*
> +		 * We don't have to use the _safe() variant here
> +		 * because we are not invoking the IPI handlers yet.
> +		 */
> +		llist_for_each_entry(csd, entry, llist)
> +			pr_warn("SMP IPI Payload: %pS \n", csd->func);

Payload is kind of vague. How about "IPI func %pS sent on offline CPU".

> +	}
>  
>  	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
>  		csd->func(csd->info);
> 
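For reference, the kernel's `WARN_ONCE(condition, format...)` prints its message at most once per call site and evaluates to the condition, so it can sit inside an `if ()`. The user-space stand-in below only sketches those semantics: `WARN_ONCE_US`, `warnings_printed` and `ipi_on_offline_cpu()` are hypothetical names, the format must be a string literal, and it relies on GNU C statement expressions (GCC or Clang), as the kernel macro does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;	/* lets a caller observe the "once" part */

/* One static flag per expansion, i.e. per call site. */
#define WARN_ONCE_US(cond, ...)						\
	({								\
		static bool warned_;					\
		bool ret_ = !!(cond);					\
		if (ret_ && !warned_) {					\
			warned_ = true;					\
			warnings_printed++;				\
			fprintf(stderr, "WARNING: " __VA_ARGS__);	\
			fputc('\n', stderr);				\
		}							\
		ret_;							\
	})

/* A single call site: true whenever the CPU is offline, but only the
 * first offline call prints the warning. */
static bool ipi_on_offline_cpu(bool cpu_online)
{
	return WARN_ONCE_US(!cpu_online, "IPI on offline CPU");
}
```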


* Re: [PATCH v4 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
  2014-05-13  9:02     ` [PATCH v4 " Srivatsa S. Bhat
@ 2014-05-13 15:57       ` Frederic Weisbecker
  2014-05-15  6:54         ` Srivatsa S. Bhat
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2014-05-13 15:57 UTC
  To: Srivatsa S. Bhat
  Cc: Tejun Heo, peterz, tglx, mingo, rusty, akpm, hch, mgorman, riel,
	bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel

On Tue, May 13, 2014 at 02:32:00PM +0530, Srivatsa S. Bhat wrote:
> 
>  kernel/stop_machine.c |   39 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 34 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 01fbae5..288f7fe 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -130,8 +130,10 @@ enum multi_stop_state {
>  	MULTI_STOP_NONE,
>  	/* Awaiting everyone to be scheduled. */
>  	MULTI_STOP_PREPARE,
> -	/* Disable interrupts. */
> -	MULTI_STOP_DISABLE_IRQ,
> +	/* Disable interrupts on CPUs not in ->active_cpus mask. */
> +	MULTI_STOP_DISABLE_IRQ_INACTIVE,
> +	/* Disable interrupts on CPUs in ->active_cpus mask. */
> +	MULTI_STOP_DISABLE_IRQ_ACTIVE,
>  	/* Run the function */
>  	MULTI_STOP_RUN,
>  	/* Exit */
> @@ -189,12 +191,39 @@ static int multi_cpu_stop(void *data)
>  	do {
>  		/* Chill out and ensure we re-read multi_stop_state. */
>  		cpu_relax();
> +
> +		/*
> +		 * We use 2 separate stages to disable interrupts, namely
> +		 * _INACTIVE and _ACTIVE, to ensure that the inactive CPUs
> +		 * disable their interrupts first, followed by the active CPUs.
> +		 *
> +		 * This is done to avoid a race in the CPU offline path, which
> +		 * can lead to receiving IPIs on the outgoing CPU *after* it
> +		 * has gone offline.
> +		 *
> +		 * During CPU offline, we don't want the other CPUs to send
> +		 * IPIs to the active_cpu (the outgoing CPU) *after* it has
> +		 * disabled interrupts (because, then it will notice the IPIs
> +		 * only after it has gone offline). We can prevent this by
> +		 * making the other CPUs disable their interrupts first - that
> +		 * way, they will run the stop-machine code with interrupts
> +		 * disabled, and hence won't send IPIs after that point.
> +		 */
> +
>  		if (msdata->state != curstate) {
>  			curstate = msdata->state;
>  			switch (curstate) {
> -			case MULTI_STOP_DISABLE_IRQ:
> -				local_irq_disable();
> -				hard_irq_disable();
> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
> +				if (!is_active) {
> +					local_irq_disable();
> +					hard_irq_disable();
> +				}
> +				break;
> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
> +				if (is_active) {
> +					local_irq_disable();
> +					hard_irq_disable();

I have no idea about possible IPI latencies due to hardware. But are we sure that a single
stop-machine state transition is enough to ensure that we receive any pending IPI? Shouldn't
we have some sort of IPI flush in between, like polling on call_single_queue?
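One possible shape of the flush asked about here: before going offline, the outgoing CPU (with IRQs already off) polls its own queue and runs whatever is already there, so a request whose interrupt has not arrived yet is still serviced. This is a user-space sketch only; `struct call`, `queue_call()`, `flush_pending_calls()` and `bump()` are hypothetical and are not taken from this thread or from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct call {
	struct call *next;
	void (*func)(void *info);
	void *info;
};

/* The outgoing CPU's queue (hypothetical single-CPU model). */
static _Atomic(struct call *) pending;

/* Sender: publish the request; the interrupt itself may lag behind. */
static void queue_call(struct call *c)
{
	struct call *head = atomic_load(&pending);

	do {
		c->next = head;
	} while (!atomic_compare_exchange_weak(&pending, &head, c));
}

/*
 * Outgoing CPU, IRQs disabled: run everything already queued without
 * waiting for the interrupt. Returns the number of callbacks run.
 * Runs newest-first for brevity; the IPI handler in the patch reverses
 * the list first so that callbacks run in send order.
 */
static int flush_pending_calls(void)
{
	struct call *c = atomic_exchange(&pending, NULL);
	int ran = 0;

	while (c) {
		struct call *next = c->next;	/* func() may reuse c */

		c->func(c->info);
		c = next;
		ran++;
	}
	return ran;
}

/* Example callback: counts invocations. */
static void bump(void *info)
{
	++*(int *)info;
}
```

Polling only helps if nothing can be queued after the flush; in the ordering the patch enforces, the other CPUs are already spinning in stop-machine with IRQs off at that point.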


* Re: [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU
  2014-05-13 15:38   ` Frederic Weisbecker
@ 2014-05-15  6:42     ` Srivatsa S. Bhat
  2014-05-15 14:16       ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-15  6:42 UTC
  To: Frederic Weisbecker
  Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
	rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel

On 05/13/2014 09:08 PM, Frederic Weisbecker wrote:
> On Mon, May 12, 2014 at 02:06:49AM +0530, Srivatsa S. Bhat wrote:
>> Today the smp-call-function code just prints a warning if we get an IPI on
>> an offline CPU. This info is sufficient to let us know that something went
>> wrong, but often it is very hard to debug exactly who sent the IPI and why,
>> from this info alone.
>>
>> In most cases, we get the warning about the IPI to an offline CPU, immediately
>> after the CPU going offline comes out of the stop-machine phase and reenables
>> interrupts. Since all online CPUs participate in stop-machine, the information
>> regarding the sender of the IPI is already lost by the time we exit the
>> stop-machine loop. So even if we dump the stack on each CPU at this point,
>> we won't find anything useful since all of them will show the stack-trace of
>> the stopper thread. So we need a better way to figure out who sent the IPI and
>> why.
>>
>> To achieve this, when we detect an IPI targeted to an offline CPU, loop through
>> the call-single-data linked list and print out the payload (i.e., the name
>> of the function which was supposed to be executed by the target CPU). This
>> would give us an insight as to who might have sent the IPI and help us debug
>> this further.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>>  kernel/smp.c |   18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index 06d574e..f864921 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -185,14 +185,24 @@ void generic_smp_call_function_single_interrupt(void)
>>  {
>>  	struct llist_node *entry;
>>  	struct call_single_data *csd, *csd_next;
>> +	static bool warned;
>> +
>> +	entry = llist_del_all(&__get_cpu_var(call_single_queue));
>> +	entry = llist_reverse_order(entry);
>>  
>>  	/*
>>  	 * Shouldn't receive this interrupt on a cpu that is not yet online.
>>  	 */
>> -	WARN_ON_ONCE(!cpu_online(smp_processor_id()));
>> -
>> -	entry = llist_del_all(&__get_cpu_var(call_single_queue));
>> -	entry = llist_reverse_order(entry);
>> +	if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
>> +		warned = true;
>> +		WARN_ON(1);
> 
> More details may be better:
> 
> WARN_ONCE(1, "IPI on offline CPU");
>

Sure, that sounds better.
 
>> +		/*
>> +		 * We don't have to use the _safe() variant here
>> +		 * because we are not invoking the IPI handlers yet.
>> +		 */
>> +		llist_for_each_entry(csd, entry, llist)
>> +			pr_warn("SMP IPI Payload: %pS \n", csd->func);
> 
> Payload is kind of vague. How about "IPI func %pS sent on offline CPU".
> 

Ok, and maybe s/func/function and s/on/to ?

>> +	}
>>  
>>  	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
>>  		csd->func(csd->info);
>>
> 

Regards,
Srivatsa S. Bhat
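The debug path in the patch quoted above (detach the per-CPU call_single_queue, restore arrival order, and, only on the first IPI seen while offline, print each queued function before any of them run) can be modeled in user space. This is a minimal sketch, assuming C11 atomics as a stand-in for the kernel's llist primitives: queue_push() plays the role of llist_add(), queue_del_all() of llist_del_all(), and reverse_order() of llist_reverse_order(). The names queue_push(), queue_del_all(), reverse_order(), report_offline_ipi() and the func_name field are hypothetical; in the kernel the function name comes from printing csd->func with %pS. The message text follows the wording agreed in this exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for struct llist_node and call_single_data. */
struct llist_node {
	struct llist_node *next;
};

struct csd {
	struct llist_node llist;   /* first member: a node pointer converts back to a csd */
	const char *func_name;     /* stand-in for printing csd->func with %pS */
};

/* Sender side: push with a compare-and-swap loop, as llist_add() does. */
static void queue_push(_Atomic(struct llist_node *) *head, struct llist_node *n)
{
	struct llist_node *first = atomic_load(head);

	assert(n != NULL);
	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(head, &first, n));
}

/* Receiver side: detach the whole list in one atomic exchange (llist_del_all()). */
static struct llist_node *queue_del_all(_Atomic(struct llist_node *) *head)
{
	return atomic_exchange(head, NULL);
}

/* Pushes leave the list newest-first; reverse it into arrival order. */
static struct llist_node *reverse_order(struct llist_node *n)
{
	struct llist_node *prev = NULL;

	while (n) {
		struct llist_node *next = n->next;
		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}

/*
 * Warn once if we are offline, listing every queued function before any
 * callback runs. Returns the number of function lines printed.
 */
static int report_offline_ipi(struct llist_node *entry, bool cpu_online)
{
	static bool warned;
	int printed = 0;

	if (!cpu_online && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: IPI on offline CPU\n");
		/* Read-only walk, so no _safe() variant is needed here. */
		for (struct llist_node *n = entry; n; n = n->next) {
			struct csd *c = (struct csd *)n;
			fprintf(stderr, "IPI function %s sent to offline CPU\n", c->func_name);
			printed++;
		}
	}
	return printed;
}
```

The static `warned` flag, rather than a bare WARN_ONCE(), is what keeps the whole list from being dumped on every later IPI: both the warning and the per-function lines are gated by the same one-shot condition.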



* Re: [PATCH v4 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
  2014-05-13 15:57       ` Frederic Weisbecker
@ 2014-05-15  6:54         ` Srivatsa S. Bhat
  0 siblings, 0 replies; 10+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-15  6:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tejun Heo, peterz, tglx, mingo, rusty, akpm, hch, mgorman, riel,
	bp, rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel

On 05/13/2014 09:27 PM, Frederic Weisbecker wrote:
> On Tue, May 13, 2014 at 02:32:00PM +0530, Srivatsa S. Bhat wrote:
>>
>>  kernel/stop_machine.c |   39 ++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 34 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index 01fbae5..288f7fe 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -130,8 +130,10 @@ enum multi_stop_state {
>>  	MULTI_STOP_NONE,
>>  	/* Awaiting everyone to be scheduled. */
>>  	MULTI_STOP_PREPARE,
>> -	/* Disable interrupts. */
>> -	MULTI_STOP_DISABLE_IRQ,
>> +	/* Disable interrupts on CPUs not in ->active_cpus mask. */
>> +	MULTI_STOP_DISABLE_IRQ_INACTIVE,
>> +	/* Disable interrupts on CPUs in ->active_cpus mask. */
>> +	MULTI_STOP_DISABLE_IRQ_ACTIVE,
>>  	/* Run the function */
>>  	MULTI_STOP_RUN,
>>  	/* Exit */
>> @@ -189,12 +191,39 @@ static int multi_cpu_stop(void *data)
>>  	do {
>>  		/* Chill out and ensure we re-read multi_stop_state. */
>>  		cpu_relax();
>> +
>> +		/*
>> +		 * We use 2 separate stages to disable interrupts, namely
>> +		 * _INACTIVE and _ACTIVE, to ensure that the inactive CPUs
>> +		 * disable their interrupts first, followed by the active CPUs.
>> +		 *
>> +		 * This is done to avoid a race in the CPU offline path, which
>> +		 * can lead to receiving IPIs on the outgoing CPU *after* it
>> +		 * has gone offline.
>> +		 *
>> +		 * During CPU offline, we don't want the other CPUs to send
>> +		 * IPIs to the active_cpu (the outgoing CPU) *after* it has
>> +		 * disabled interrupts (because, then it will notice the IPIs
>> +		 * only after it has gone offline). We can prevent this by
>> +		 * making the other CPUs disable their interrupts first - that
>> +		 * way, they will run the stop-machine code with interrupts
>> +		 * disabled, and hence won't send IPIs after that point.
>> +		 */
>> +
>>  		if (msdata->state != curstate) {
>>  			curstate = msdata->state;
>>  			switch (curstate) {
>> -			case MULTI_STOP_DISABLE_IRQ:
>> -				local_irq_disable();
>> -				hard_irq_disable();
>> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
>> +				if (!is_active) {
>> +					local_irq_disable();
>> +					hard_irq_disable();
>> +				}
>> +				break;
>> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
>> +				if (is_active) {
>> +					local_irq_disable();
>> +					hard_irq_disable();
> 
> I have no idea about possible IPI latencies due to hardware. But are we sure that a stop
> machine transition state is enough to make sure we get a pending IPI? Shouldn't we have
> some sort of IPI flush in between, like polling on call_single_queue?
> 

That might not actually be required, but flushing out all pending work
before going offline (irrespective of whether the outgoing CPU received
the corresponding IPIs in time) sounds like a good idea, because then we
can guarantee that any late IPIs landing on the CPU (and thus triggering
the warning) will be completely harmless.

We can empty the call_single_queue after disabling interrupts in
the _ACTIVE stage. That way, we can guarantee that all pending IPI
functions have been executed by the outgoing CPU (even if the
corresponding IPIs arrive a bit later). Also, no new IPI work can be
queued to that CPU after the _INACTIVE stage, because by then all the
other CPUs are spinning in stop-machine with interrupts disabled.

Thanks for the suggestion, Frederic!

Regards,
Srivatsa S. Bhat
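The ordering argument in the v4 patch comment, combined with the flush proposed in this reply (drain call_single_queue right after the outgoing CPU disables interrupts in the _ACTIVE stage), can be checked with a small user-space simulation. It is a sketch under stated assumptions, not kernel code: POSIX threads stand in for CPUs, an atomic flag stands in for local_irq_disable()/hard_irq_disable(), and an atomic counter stands in for the per-CPU call_single_queue. The state names follow the patch and ack_state() is modeled on the helper of that name in kernel/stop_machine.c; SIM_NR_CPUS, SIM_ACTIVE, pending_work, flushed_work, inactive_off_seen and run_stop_machine() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* State list as in the v4 patch (simulation only). */
enum multi_stop_state {
	MULTI_STOP_NONE,
	MULTI_STOP_PREPARE,
	MULTI_STOP_DISABLE_IRQ_INACTIVE,
	MULTI_STOP_DISABLE_IRQ_ACTIVE,
	MULTI_STOP_RUN,
	MULTI_STOP_EXIT,
};

#define SIM_NR_CPUS 4   /* hypothetical number of simulated CPUs */
#define SIM_ACTIVE  0   /* hypothetical index of the outgoing (active) CPU */

static atomic_int state;
static atomic_int thread_ack;
static atomic_bool irqs_off[SIM_NR_CPUS];  /* simulated "interrupts disabled" flags */
static atomic_int pending_work;            /* simulated call_single_queue length */
static atomic_int flushed_work;            /* callbacks run by the active CPU's flush */
static atomic_int inactive_off_seen;       /* inactive CPUs already "off" at _ACTIVE */

/* The last CPU to acknowledge a state moves everyone to the next one. */
static void ack_state(void)
{
	if (atomic_fetch_sub(&thread_ack, 1) == 1) {
		atomic_store(&thread_ack, SIM_NR_CPUS);
		atomic_fetch_add(&state, 1);
	}
}

static void *multi_cpu_stop(void *arg)
{
	int cpu = (int)(intptr_t)arg;
	bool is_active = (cpu == SIM_ACTIVE);
	int curstate = MULTI_STOP_NONE;

	do {
		int newstate = atomic_load(&state);
		if (newstate == curstate) {
			sched_yield();  /* stand-in for cpu_relax() */
			continue;
		}
		curstate = newstate;
		switch (curstate) {
		case MULTI_STOP_PREPARE:
			/* Simulated IPI work sent to the outgoing CPU while IRQs are on. */
			if (!is_active)
				atomic_fetch_add(&pending_work, 1);
			break;
		case MULTI_STOP_DISABLE_IRQ_INACTIVE:
			if (!is_active)
				atomic_store(&irqs_off[cpu], true);
			break;
		case MULTI_STOP_DISABLE_IRQ_ACTIVE:
			if (is_active) {
				int off = 0;
				for (int i = 0; i < SIM_NR_CPUS; i++)
					if (i != cpu && atomic_load(&irqs_off[i]))
						off++;
				atomic_store(&inactive_off_seen, off);
				atomic_store(&irqs_off[cpu], true);
				/* Flush: run everything still queued, even if its IPI is late. */
				atomic_fetch_add(&flushed_work, atomic_exchange(&pending_work, 0));
			}
			break;
		default:
			break;
		}
		ack_state();
	} while (curstate != MULTI_STOP_EXIT);
	return NULL;
}

/* Runs one single-shot simulated stop-machine pass; returns 0 on success. */
static int run_stop_machine(void)
{
	pthread_t tids[SIM_NR_CPUS];

	atomic_store(&thread_ack, SIM_NR_CPUS);
	atomic_store(&state, MULTI_STOP_PREPARE);
	for (int i = 0; i < SIM_NR_CPUS; i++)
		if (pthread_create(&tids[i], NULL, multi_cpu_stop, (void *)(intptr_t)i))
			return -1;  /* no unwinding in this sketch */
	for (int i = 0; i < SIM_NR_CPUS; i++)
		pthread_join(tids[i], NULL);
	assert(atomic_load(&state) == MULTI_STOP_EXIT + 1);
	return 0;
}
```

Because the shared state only advances once every thread has acknowledged the current one, every inactive thread has set its flag before the active thread reaches _ACTIVE, so inactive_off_seen is always SIM_NR_CPUS - 1. With a single combined stage (the original MULTI_STOP_DISABLE_IRQ), that count would depend on thread timing, which is the race window the patch closes.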



* Re: [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU
  2014-05-15  6:42     ` Srivatsa S. Bhat
@ 2014-05-15 14:16       ` Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2014-05-15 14:16 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: peterz, tglx, mingo, tj, rusty, akpm, hch, mgorman, riel, bp,
	rostedt, mgalbraith, ego, paulmck, oleg, rjw, linux-kernel

On Thu, May 15, 2014 at 12:12:17PM +0530, Srivatsa S. Bhat wrote:
> On 05/13/2014 09:08 PM, Frederic Weisbecker wrote:
> > On Mon, May 12, 2014 at 02:06:49AM +0530, Srivatsa S. Bhat wrote:
> >> +		llist_for_each_entry(csd, entry, llist)
> >> +			pr_warn("SMP IPI Payload: %pS \n", csd->func);
> > 
> > Payload is kind of vague. How about "IPI func %pS sent on offline CPU".
> > 
> 
> Ok, and maybe s/func/function and s/on/to ?

Yeah looks good.

Thanks.




Thread overview: 10+ messages
2014-05-11 20:36 [PATCH v3 0/2] CPU hotplug: Fix the long-standing "IPI to offline CPU" issue Srivatsa S. Bhat
2014-05-11 20:36 ` [PATCH v3 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU Srivatsa S. Bhat
2014-05-13 15:38   ` Frederic Weisbecker
2014-05-15  6:42     ` Srivatsa S. Bhat
2014-05-15 14:16       ` Frederic Weisbecker
2014-05-11 20:37 ` [PATCH v3 2/2] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU" Srivatsa S. Bhat
2014-05-12 20:57   ` Tejun Heo
2014-05-13  9:02     ` [PATCH v4 " Srivatsa S. Bhat
2014-05-13 15:57       ` Frederic Weisbecker
2014-05-15  6:54         ` Srivatsa S. Bhat
