From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
To: paulmck@linux.vnet.ibm.com
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>,
Fengguang Wu <fengguang.wu@intel.com>, LKP <lkp@01.org>,
linux-kernel@vger.kernel.org,
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
linux-arm-kernel@lists.infradead.org,
Arnd Bergmann <arnd@arndb.de>, MarkRutland <mark.rutland@arm.com>
Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]
Date: Wed, 04 Feb 2015 16:22:28 +0100
Message-ID: <1423063348.24415.10.camel@AMDC1943>
In-Reply-To: <20150204151028.GD5370@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 3139 bytes --]
On Wed, 2015-02-04 at 07:10 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> > On Wed, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > > +Cc some ARM people
> > > >
> > > > I wish that people would CC this list with problems seen on ARM. I'm
> > > > minded to just ignore this message because of this in the hope that by
> > > > doing so, people will learn something...
> > > >
> > > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > > was departing, which would allow a call such as the following to be placed
> > > > > > immediately after the complete():
> > > > > >
> > > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > > >
> > > > > > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > > > > > be allowed to execute!!! This will not work if there is -any- possibility
> > > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > > rcu_cpu_notify() returns.
> > > >
> > > > Exactly, so that's not going to be possible. The completion at that
> > > > point marks the point at which power _could_ be removed from the CPU
> > > > going down.
> > >
> > > OK, sounds like a polling loop is required.
> >
> > I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> > and clearing the bit on CPU being powered down. What do you think about
> > such idea?
>
> Hmmm... It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
> which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
> finish_wait(). These are in the scheduler, but this is being called from
> the CPU that remains online, so that should be OK.
>
> But what do you invoke on the outgoing CPU? Can you get away with
> simply clearing the bit, or do you also have to do a wakeup? It looks
> to me like a wakeup is required, which would be illegal on the outgoing
> CPU, which is at a point where it cannot legally invoke the scheduler.
> Or am I missing something?
Actually I am using the timeout variants, but I think that doesn't matter.
wait_on_bit() loops, testing the bit. Inside the loop it calls the
'action', which in my case is bit_wait_timeout(). That in turn
calls schedule_timeout().
See the proof of concept in the attachment. One observed issue: hot unplug
from the command line takes a lot longer, about 7 seconds instead of ~0.5.
Probably I did something wrong.
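To make the test-then-sleep loop described above easier to reason about, here is a small userspace model of it. This is only a sketch under stated assumptions: every name in it (struct model, model_action() and so on) is made up for illustration, the clock is simulated in abstract ticks rather than jiffies, and it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model {
	unsigned long now;	/* simulated clock, in ticks */
	unsigned long clear_at;	/* tick at which the "dying CPU" clears the bit */
	bool wakeup;		/* does clearing the bit also wake the waiter? */
};

/* The bit reads as set until the simulated clock reaches clear_at. */
static bool model_test_bit(const struct model *m)
{
	return m->now < m->clear_at;
}

/*
 * Models a sleeping action: returns nonzero if the deadline has passed,
 * otherwise sleeps until the deadline or until woken, whichever is first.
 */
static int model_action(struct model *m, unsigned long deadline)
{
	if (m->now >= deadline)
		return -1;		/* timed out */
	if (m->wakeup && m->clear_at > m->now && m->clear_at < deadline)
		m->now = m->clear_at;	/* woken early by the clearing side */
	else
		m->now = deadline;	/* nobody wakes us: sleep to the deadline */
	return 0;
}

/* Test-then-sleep loop: returns 0 once the bit is clear, nonzero on timeout. */
static int model_wait_on_bit_timeout(struct model *m, unsigned long timeout)
{
	unsigned long deadline;
	int ret = 0;

	assert(timeout > 0);
	deadline = m->now + timeout;

	while (model_test_bit(m) && !ret)
		ret = model_action(m, deadline);
	return ret;
}
```

In this model, a waiter that is never woken still returns success, but only once its sleep has run to the deadline, even if the bit was cleared much earlier. Whether that is what slows down the real hot unplug is an assumption that has not been verified here.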
>
> You know, this situation is giving me a bad case of nostalgia for the
> old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
> outgoing CPU could turn itself off, and thus didn't need to tell some
> other CPU when it was ready to be turned off. Seems to me that this
> self-turn-off capability would be a great feature for future systems!
There are a lot more issues with hotplug on ARM...
Patch/RFC attached.
[-- Attachment #2: 0001-ARM-Don-t-use-complete-during-__cpu_die.patch --]
[-- Type: text/x-patch, Size: 2311 bytes --]
From feaad18a483871747170fa797f80b49592489ad1 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Date: Wed, 4 Feb 2015 16:14:41 +0100
Subject: [RFC] ARM: Don't use complete() during __cpu_die
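complete() should not be used on a CPU that is going offline: it performs
a wakeup, and the outgoing CPU is at a point where it cannot legally
invoke the scheduler. Use a bit that the dying CPU clears instead.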
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
---
arch/arm/kernel/smp.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 86ef244c5a24..f3a5ad80a253 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -26,6 +26,7 @@
#include <linux/completion.h>
#include <linux/cpufreq.h>
#include <linux/irq_work.h>
+#include <linux/wait.h>
#include <linux/atomic.h>
#include <asm/smp.h>
@@ -76,6 +77,9 @@ enum ipi_msg_type {
static DECLARE_COMPLETION(cpu_running);
+#define CPU_DIE_WAIT_BIT 0
+static unsigned long wait_cpu_die;
+
static struct smp_operations smp_ops;
void __init smp_set_ops(struct smp_operations *ops)
@@ -133,7 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
}
-
+ set_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
memset(&secondary_data, 0, sizeof(secondary_data));
return ret;
}
@@ -213,7 +217,17 @@ int __cpu_disable(void)
return 0;
}
-static DECLARE_COMPLETION(cpu_died);
+static int wait_for_cpu_die(void)
+{
+	might_sleep();
+
+	if (!test_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die))
+		return 0;
+
+	return out_of_line_wait_on_bit_timeout(&wait_cpu_die, CPU_DIE_WAIT_BIT,
+			bit_wait_timeout, TASK_UNINTERRUPTIBLE,
+			msecs_to_jiffies(5000));
+}
/*
* called on the thread which is asking for a CPU to be shutdown -
@@ -221,7 +235,7 @@ static DECLARE_COMPLETION(cpu_died);
*/
 void __cpu_die(unsigned int cpu)
 {
-	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
+	if (wait_for_cpu_die()) {
 		pr_err("CPU%u: cpu didn't die\n", cpu);
 		return;
 	}
@@ -267,7 +281,7 @@ void __ref cpu_die(void)
* this returns, power and/or clocks can be removed at any point
* from this CPU and its cache by platform_cpu_kill().
*/
- complete(&cpu_died);
+ clear_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
/*
* Ensure that the cache lines associated with that completion are
--
1.9.1
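The handshake the patch sets up (set the bit on bring-up, clear it on the dying CPU, check it from the waiting thread) can be sketched in userspace with C11 atomics. This is a rough model only: the model_* names and MODEL_DIE_WAIT_BIT are hypothetical stand-ins, the waiter polls a fixed number of times instead of sleeping with a jiffies timeout, and nothing here touches real CPU hotplug.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_DIE_WAIT_BIT 0	/* hypothetical, mirrors CPU_DIE_WAIT_BIT */

static atomic_ulong model_wait_cpu_die;	/* zero-initialised: bit clear */

/* Bring-up side: mark that a later offline has to wait for this CPU. */
static void model_cpu_up(void)
{
	atomic_fetch_or(&model_wait_cpu_die, 1UL << MODEL_DIE_WAIT_BIT);
}

/* Dying side: a plain atomic clear, no wakeup and no scheduler involvement. */
static void model_cpu_die(void)
{
	atomic_fetch_and(&model_wait_cpu_die, ~(1UL << MODEL_DIE_WAIT_BIT));
}

/* Waiting side: poll up to max_polls times; 0 = CPU died, -1 = gave up. */
static int model_wait_for_cpu_die(unsigned int max_polls)
{
	assert(max_polls > 0);
	for (unsigned int i = 0; i < max_polls; i++) {
		if (!(atomic_load(&model_wait_cpu_die) &
		      (1UL << MODEL_DIE_WAIT_BIT)))
			return 0;
		/* a real waiter would sleep or relax the CPU between polls */
	}
	return -1;
}
```

The point of the sketch is that the dying side only performs an atomic store, so all of the waiting logic, including any timeout, lives on the CPU that stays online.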