All of lore.kernel.org
* RCU related performance regression in 3.3
@ 2012-04-04 15:27 Josh Boyer
  2012-04-04 21:36 ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Josh Boyer @ 2012-04-04 15:27 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, kernel-team

Hi Paul,

We've had a few reports of some boot slowdowns with the 3.3 rebase we
did in Fedora 16.  One of our users volunteered to bisect a vanilla
kernel and wound up at "rcu: Permit dyntick-idle with callbacks pending
(commit 7cb924990)".  You can find more details in the bug [1], but it's
been reported on both physical hardware and in virtual machines.

Have you seen anything like this in your testing?  Given the user used a
vanilla 3.3 kernel, I'm wondering if there is a targeted fix that might
be backported.  If so, it would be good to get that headed to the 3.3.y
stable tree.

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=806548

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-04-04 15:27 RCU related performance regression in 3.3 Josh Boyer
@ 2012-04-04 21:36 ` Paul E. McKenney
  2012-04-05 12:37   ` Josh Boyer
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-04 21:36 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-kernel, kernel-team

On Wed, Apr 04, 2012 at 11:27:27AM -0400, Josh Boyer wrote:
> Hi Paul,
> 
> We've had a few reports of some boot slowdowns with the 3.3 rebase we
> did in Fedora 16.  One of our users volunteered to bisect a vanilla
> kernel and wound up at "rcu: Permit dyntick-idle with callbacks pending
> (commit 7cb924990)".  You can find more details in the bug [1], but it's
> been reported on both physical hardware and in virtual machines.
> 
> Have you seen anything like this in your testing?  Given the user used a
> vanilla 3.3 kernel, I'm wondering if there is a targeted fix that might
> be backported.  If so, it would be good to get that headed to the 3.3.y
> stable tree.
> 
> josh
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=806548

I will look into this.  In the meantime, does setting
CONFIG_RCU_FAST_NO_HZ=n get rid of the slowdowns?

							Thanx, Paul



* Re: RCU related performance regression in 3.3
  2012-04-04 21:36 ` Paul E. McKenney
@ 2012-04-05 12:37   ` Josh Boyer
  2012-04-05 14:00     ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Josh Boyer @ 2012-04-05 12:37 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, kernel-team, pascal.chapperon

On Wed, Apr 04, 2012 at 02:36:33PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 04, 2012 at 11:27:27AM -0400, Josh Boyer wrote:
> > Hi Paul,
> > 
> > We've had a few reports of some boot slowdowns with the 3.3 rebase we
> > did in Fedora 16.  One of our users volunteered to bisect a vanilla
> > kernel and wound up at "rcu: Permit dyntick-idle with callbacks pending
> > (commit 7cb924990)".  You can find more details in the bug [1], but it's
> > been reported on both physical hardware and in virtual machines.
> > 
> > Have you seen anything like this in your testing?  Given the user used a
> > vanilla 3.3 kernel, I'm wondering if there is a targeted fix that might
> > be backported.  If so, it would be good to get that headed to the 3.3.y
> > stable tree.
> > 
> > josh
> > 
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=806548
> 
> I will look into this.  In the meantime, does setting
> CONFIG_RCU_FAST_NO_HZ=n get rid of the slowdowns?

Seems so.  Pascal tried 3.4-rc1 with and without that set and found that
disabling it led to consistent timings.  He put his findings in the
bug, and I've also now CC'd him.

josh


* Re: RCU related performance regression in 3.3
  2012-04-05 12:37   ` Josh Boyer
@ 2012-04-05 14:00     ` Paul E. McKenney
  2012-04-05 14:15       ` Pascal CHAPPERON
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-05 14:00 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-kernel, kernel-team, pascal.chapperon

On Thu, Apr 05, 2012 at 08:37:59AM -0400, Josh Boyer wrote:
> On Wed, Apr 04, 2012 at 02:36:33PM -0700, Paul E. McKenney wrote:
> > On Wed, Apr 04, 2012 at 11:27:27AM -0400, Josh Boyer wrote:
> > > Hi Paul,
> > > 
> > > We've had a few reports of some boot slowdowns with the 3.3 rebase we
> > > did in Fedora 16.  One of our users volunteered to bisect a vanilla
> > > kernel and wound up at "rcu: Permit dyntick-idle with callbacks pending
> > > (commit 7cb924990)".  You can find more details in the bug [1], but it's
> > > been reported on both physical hardware and in virtual machines.
> > > 
> > > Have you seen anything like this in your testing?  Given the user used a
> > > vanilla 3.3 kernel, I'm wondering if there is a targeted fix that might
> > > be backported.  If so, it would be good to get that headed to the 3.3.y
> > > stable tree.
> > > 
> > > josh
> > > 
> > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=806548
> > 
> > I will look into this.  In the meantime, does setting
> > CONFIG_RCU_FAST_NO_HZ=n get rid of the slowdowns?
> 
> Seems so.  Pascal tried 3.4-rc1 with and without that set and found that
> disabling it led to consistent timings.  He put his findings in the
> bug, and I've also now CC'd him.

Thank you for the info!  Is the performance problem limited to boot time,
or are there performance problems when the system is up and running?

							Thanx, Paul



* Re: RCU related performance regression in 3.3
  2012-04-05 14:00     ` Paul E. McKenney
@ 2012-04-05 14:15       ` Pascal CHAPPERON
  2012-04-05 14:39         ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal CHAPPERON @ 2012-04-05 14:15 UTC (permalink / raw)
  To: paulmck, Josh Boyer; +Cc: linux-kernel, kernel-team

Hello,

I didn't notice any significant slowdown while the system is up and running. A full kernel compilation (make -j 16) takes 14 min with both 3.2.10 and 3.3.0.

Pascal


> Message of 05/04/12 16:01
> From: "Paul E. McKenney"
> To: "Josh Boyer"
> Cc: linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org, pascal.chapperon@wanadoo.fr
> Subject: Re: RCU related performance regression in 3.3
> 
> On Thu, Apr 05, 2012 at 08:37:59AM -0400, Josh Boyer wrote:
> > On Wed, Apr 04, 2012 at 02:36:33PM -0700, Paul E. McKenney wrote:
> > > On Wed, Apr 04, 2012 at 11:27:27AM -0400, Josh Boyer wrote:
> > > > Hi Paul,
> > > > 
> > > > We've had a few reports of some boot slowdowns with the 3.3 rebase we
> > > > did in Fedora 16. One of our users volunteered to bisect a vanilla
> > > > kernel and wound up at "rcu: Permit dyntick-idle with callbacks pending
> > > > (commit 7cb924990)". You can find more details in the bug [1], but it's
> > > > been reported on both physical hardware and in virtual machines.
> > > > 
> > > > Have you seen anything like this in your testing? Given the user used a
> > > > vanilla 3.3 kernel, I'm wondering if there is a targeted fix that might
> > > > be backported. If so, it would be good to get that headed to the 3.3.y
> > > > stable tree.
> > > > 
> > > > josh
> > > > 
> > > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=806548
> > > 
> > > I will look into this. In the meantime, does setting
> > > CONFIG_RCU_FAST_NO_HZ=n get rid of the slowdowns?
> > 
> > Seems so. Pascal tried 3.4-rc1 with and without that set and found that
> > disabling it led to consistent timings. He put his findings in the
> > bug, and I've also now CC'd him.
> 
> Thank you for the info! Is the performance problem limited to boot time,
> or are there performance problems when the system is up and running?
> 
> Thanx, Paul
> 
>


* Re: RCU related performance regression in 3.3
  2012-04-05 14:15       ` Pascal CHAPPERON
@ 2012-04-05 14:39         ` Paul E. McKenney
  2012-04-06  9:18           ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-05 14:39 UTC (permalink / raw)
  To: Pascal CHAPPERON; +Cc: Josh Boyer, linux-kernel, kernel-team

On Thu, Apr 05, 2012 at 04:15:33PM +0200, Pascal CHAPPERON wrote:
> Hello,
> 
> I didn't notice any significant slowdown while the system is up and running. A full kernel compilation (make -j 16) takes 14 min with both 3.2.10 and 3.3.0.

OK, so the natural approach is to disable CONFIG_RCU_FAST_NO_HZ at
boot time.  Unfortunately, you appear to need it to remain disabled
through at least filesystem mounting, which if I understand correctly
happens long after system_state gets set to SYSTEM_RUNNING.

If RCU has some way to find out when init is complete, I can easily
make it so that CONFIG_RCU_FAST_NO_HZ optimizes for speed during boot
and energy efficiency during runtime.

One thing I could easily do would be to provide a sysfs parameter or
some such that allows the boot process to enable energy-efficiency
mode at runtime.  I would much prefer to make this automatic, though.

Other thoughts?

							Thanx, Paul

> Pascal
> 
> 
> > Message of 05/04/12 16:01
> > From: "Paul E. McKenney"
> > To: "Josh Boyer"
> > Cc: linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org, pascal.chapperon@wanadoo.fr
> > Subject: Re: RCU related performance regression in 3.3
> > 
> > On Thu, Apr 05, 2012 at 08:37:59AM -0400, Josh Boyer wrote:
> > > On Wed, Apr 04, 2012 at 02:36:33PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Apr 04, 2012 at 11:27:27AM -0400, Josh Boyer wrote:
> > > > > Hi Paul,
> > > > > 
> > > > > We've had a few reports of some boot slowdowns with the 3.3 rebase we
> > > > > did in Fedora 16. One of our users volunteered to bisect a vanilla
> > > > > kernel and wound up at "rcu: Permit dyntick-idle with callbacks pending
> > > > > (commit 7cb924990)". You can find more details in the bug [1], but it's
> > > > > been reported on both physical hardware and in virtual machines.
> > > > > 
> > > > > Have you seen anything like this in your testing? Given the user used a
> > > > > vanilla 3.3 kernel, I'm wondering if there is a targeted fix that might
> > > > > be backported. If so, it would be good to get that headed to the 3.3.y
> > > > > stable tree.
> > > > > 
> > > > > josh
> > > > > 
> > > > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=806548
> > > > 
> > > > I will look into this. In the meantime, does setting
> > > > CONFIG_RCU_FAST_NO_HZ=n get rid of the slowdowns?
> > > 
> > > Seems so. Pascal tried 3.4-rc1 with and without that set and found that
> > > disabling it led to consistent timings. He put his findings in the
> > > bug, and I've also now CC'd him.
> > 
> > Thank you for the info! Is the performance problem limited to boot time,
> > or are there performance problems when the system is up and running?
> > 
> > Thanx, Paul
> > 
> >
> 



* Re: RCU related performance regression in 3.3
  2012-04-05 14:39         ` Paul E. McKenney
@ 2012-04-06  9:18           ` Pascal Chapperon
  2012-04-10 16:07             ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-04-06  9:18 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

> Message of 05/04/12 16:40
> From: "Paul E. McKenney"
> To: "Pascal CHAPPERON"
> Cc: "Josh Boyer", linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org
> Subject: Re: RCU related performance regression in 3.3
> 
> On Thu, Apr 05, 2012 at 04:15:33PM +0200, Pascal CHAPPERON wrote:
> > Hello,
> > 
> > I didn't notice any significant slowdown while the system is up and running.
> > A full kernel compilation (make -j 16) takes 14 min with both 3.2.10 and 3.3.0.
> 
> OK, so the natural approach is to disable CONFIG_RCU_FAST_NO_HZ at
> boot time. Unfortunately, you appear to need it to remain disabled
> through at least filesystem mounting, which if I understand correctly
> happens long after system_state gets set to SYSTEM_RUNNING.
> 
In fact, I need it to remain disabled until all the systemd units have completed.
Some units, such as NetworkManager, can take longer to complete with
RCU_FAST_NO_HZ enabled.
And I need it to be disabled at shutdown, as unmounting cgroups, sysfs, etc.,
plus the old-root mounts, can take a full second for each unmount.

> If RCU has some way to find out when init is complete, I can easily
> make it so that CONFIG_RCU_FAST_NO_HZ optimizes for speed during boot
> and energy efficiency during runtime.
> 
I said that I didn't notice any significant slowdown during runtime, but my
laptop usage is basic. Some specific tasks, similar to what systemd does, may
perhaps be impacted by this feature.
Is there a task/program I could run to stress RCU_FAST_NO_HZ?

> One thing I could easily do would be to provide a sysfs parameter or
> some such that allows the boot process to enable energy-efficiency
> mode at runtime. I would much prefer to make this automatic, though.
> 
So the feature would be disabled until you trigger a sysfs parameter, and could be
disabled again before shutdown? That would be fine, at least for hardware like my
own.

> Other thoughts?
> 
Do you think that the culprit is buggy hardware in my laptop, or the
number of CPUs/threads?

> Thanx, Paul
> 

Resent in plain text because the original was rejected. Sorry, I forgot the rules.

Pascal


* Re: RCU related performance regression in 3.3
  2012-04-06  9:18           ` Pascal Chapperon
@ 2012-04-10 16:07             ` Paul E. McKenney
  2012-04-11 15:06               ` Pascal
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-10 16:07 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Fri, Apr 06, 2012 at 11:18:03AM +0200, Pascal Chapperon wrote:
> > Message of 05/04/12 16:40
> > From: "Paul E. McKenney"
> > To: "Pascal CHAPPERON"
> > Cc: "Josh Boyer", linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org
> > Subject: Re: RCU related performance regression in 3.3
> > 
> > On Thu, Apr 05, 2012 at 04:15:33PM +0200, Pascal CHAPPERON wrote:
> > > Hello,
> > > 
> > > I didn't notice any significant slowdown while the system is up and running.
> > > A full kernel compilation (make -j 16) takes 14 min with both 3.2.10 and 3.3.0.

OK, good.

> > OK, so the natural approach is to disable CONFIG_RCU_FAST_NO_HZ at
> > boot time. Unfortunately, you appear to need it to remain disabled
> > through at least filesystem mounting, which if I understand correctly
> > happens long after system_state gets set to SYSTEM_RUNNING.
> > 
> In fact, I need it to remain disabled until all the systemd units have completed.
> Some units, such as NetworkManager, can take longer to complete with
> RCU_FAST_NO_HZ enabled.
> And I need it to be disabled at shutdown, as unmounting cgroups, sysfs, etc.,
> plus the old-root mounts, can take a full second for each unmount.

OK...

> > If RCU has some way to find out when init is complete, I can easily
> > make it so that CONFIG_RCU_FAST_NO_HZ optimizes for speed during boot
> > and energy efficiency during runtime.
> > 
> I said that I didn't notice any significant slowdown during runtime, but my
> laptop usage is basic. Some specific tasks, similar to what systemd does, may
> perhaps be impacted by this feature.
> Is there a task/program I could run to stress RCU_FAST_NO_HZ?

One thing to try first -- could you please check boot/shutdown slowdown
with the patch below?

But yes, there are things like modifying netfilter rules and updating
security configuration that might be affected.

> > One thing I could easily do would be to provide a sysfs parameter or
> > some such that allows the boot process to enable energy-efficiency
> > mode at runtime. I would much prefer to make this automatic, though.
> > 
> So the feature would be disabled until you trigger a sysfs parameter, and could be
> disabled again before shutdown? That would be fine, at least for hardware like my
> own.

That is the thought, though again I would really really prefer that this
be automated.

> > Other thoughts?
> > 
> Do you think that the culprit is buggy hardware in my laptop, or the
> number of CPUs/threads?

Maybe just more filesystems to mount?

> > Thanx, Paul
> > 
> 
> Resent in plain text because the original was rejected. Sorry, I forgot the rules.

No problem!  Please see below for the patch.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Decrease RCU_FAST_NO_HZ timer interval

Use of RCU_FAST_NO_HZ reportedly significantly slowed boot time in
configurations involving lots of filesystem mounts.  This commit attempts
to work around this by decreasing the RCU_FAST_NO_HZ timer interval
from 6 jiffies down to 3 jiffies.

Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c023464..980a4bb 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1975,7 +1975,7 @@ static void rcu_prepare_for_idle(int cpu)
  */
 #define RCU_IDLE_FLUSHES 5		/* Number of dyntick-idle tries. */
 #define RCU_IDLE_OPT_FLUSHES 3		/* Optional dyntick-idle tries. */
-#define RCU_IDLE_GP_DELAY 6		/* Roughly one grace period. */
+#define RCU_IDLE_GP_DELAY 3		/* Roughly one grace period. */
 #define RCU_IDLE_LAZY_GP_DELAY (6 * HZ)	/* Roughly six seconds. */
 
 static DEFINE_PER_CPU(int, rcu_dyntick_drain);



* Re: RCU related performance regression in 3.3
  2012-04-10 16:07             ` Paul E. McKenney
@ 2012-04-11 15:06               ` Pascal
  2012-04-12 18:04                 ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal @ 2012-04-11 15:06 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 10/04/2012 18:07, Paul E. McKenney wrote:
> On Fri, Apr 06, 2012 at 11:18:03AM +0200, Pascal Chapperon wrote:
>>> Message of 05/04/12 16:40
>>> From: "Paul E. McKenney"
>>> To: "Pascal CHAPPERON"
>>> Cc: "Josh Boyer", linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org
>>> Subject: Re: RCU related performance regression in 3.3
>>>
>>> On Thu, Apr 05, 2012 at 04:15:33PM +0200, Pascal CHAPPERON wrote:
>>>> Hello,
>>>>
>>>> I didn't notice any significant slowdown while the system is up and running.
>>>> A full kernel compilation (make -j 16) takes 14 min with both 3.2.10 and 3.3.0.
>
> OK, good.
>
>>> OK, so the natural approach is to disable CONFIG_RCU_FAST_NO_HZ at
>>> boot time. Unfortunately, you appear to need it to remain disabled
>>> through at least filesystem mounting, which if I understand correctly
>>> happens long after system_state gets set to SYSTEM_RUNNING.
>>>
>> In fact, I need it to remain disabled until all the systemd units have completed.
>> Some units, such as NetworkManager, can take longer to complete with
>> RCU_FAST_NO_HZ enabled.
>> And I need it to be disabled at shutdown, as unmounting cgroups, sysfs, etc.,
>> plus the old-root mounts, can take a full second for each unmount.
>
> OK...
>
>>> If RCU has some way to find out when init is complete, I can easily
>>> make it so that CONFIG_RCU_FAST_NO_HZ optimizes for speed during boot
>>> and energy efficiency during runtime.
>>>
>> I said that I didn't notice any significant slowdown during runtime, but my
>> laptop usage is basic. Some specific tasks, similar to what systemd does, may
>> perhaps be impacted by this feature.
>> Is there a task/program I could run to stress RCU_FAST_NO_HZ?
>
> One thing to try first -- could you please check boot/shutdown slowdown
> with the patch below?
>
> But yes, there are things like modifying netfilter rules and updating
> security configuration that might be affected.
>
>>> One thing I could easily do would be to provide a sysfs parameter or
>>> some such that allows the boot process to enable energy-efficiency
>>> mode at runtime. I would much prefer to make this automatic, though.
>>>
>> So the feature would be disabled until you trigger a sysfs parameter, and could be
>> disabled again before shutdown? That would be fine, at least for hardware like my
>> own.
>
> That is the thought, though again I would really really prefer that this
> be automated.
>
>>> Other thoughts?
>>>
>> Do you think that the culprit is buggy hardware in my laptop, or the
>> number of CPUs/threads?
>
> Maybe just more filesystems to mount?
>
>>> Thanx, Paul
>>>
>>
>> Resent in plain text because the original was rejected. Sorry, I forgot the rules.
>
> No problem!  Please see below for the patch.
>
> 							Thanx, Paul
>
RCU_IDLE_GP_DELAY=3 instead of 6 does not significantly improve
startup time. Shutdown is shorter, but there are still some delays on
unmounting some sysfs or oldroot mounts (not always the same).
Startup time always varies randomly from 12s to 22s (a stable 8s without
RCU_FAST_NO_HZ).
The tasks taking time during startup are rarely the same from boot to
boot, and some of them run after the filesystems are mounted.
Example: the "console-kit-system-log-start.service" systemd unit took 5s to
complete in my last try, and 1s in the previous run. This one runs
after the mounts.
I enabled CONFIG_RCU_TRACE, and hey, the result in debugfs is beyond
my knowledge :(
Do you want some data from /sys/kernel/debug/rcu (rcudata, ...)?

Pascal



* Re: RCU related performance regression in 3.3
  2012-04-11 15:06               ` Pascal
@ 2012-04-12 18:04                 ` Paul E. McKenney
  2012-04-16 21:02                   ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-12 18:04 UTC (permalink / raw)
  To: Pascal; +Cc: Josh Boyer, linux-kernel, kernel-team

On Wed, Apr 11, 2012 at 05:06:54PM +0200, Pascal wrote:
> On 10/04/2012 18:07, Paul E. McKenney wrote:
> >On Fri, Apr 06, 2012 at 11:18:03AM +0200, Pascal Chapperon wrote:
> >>>Message of 05/04/12 16:40
> >>>From: "Paul E. McKenney"
> >>>To: "Pascal CHAPPERON"
> >>>Cc: "Josh Boyer", linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org
> >>>Subject: Re: RCU related performance regression in 3.3
> >>>
> >>>On Thu, Apr 05, 2012 at 04:15:33PM +0200, Pascal CHAPPERON wrote:
> >>>>Hello,
> >>>>
> >>>>I didn't notice any significant slowdown while the system is up and running.
> >>>>A full kernel compilation (make -j 16) takes 14 min with both 3.2.10 and 3.3.0.
> >
> >OK, good.
> >
> >>>OK, so the natural approach is to disable CONFIG_RCU_FAST_NO_HZ at
> >>>boot time. Unfortunately, you appear to need it to remain disabled
> >>>through at least filesystem mounting, which if I understand correctly
> >>>happens long after system_state gets set to SYSTEM_RUNNING.
> >>>
> >>In fact, I need it to remain disabled until all the systemd units have completed.
> >>Some units, such as NetworkManager, can take longer to complete with
> >>RCU_FAST_NO_HZ enabled.
> >>And I need it to be disabled at shutdown, as unmounting cgroups, sysfs, etc.,
> >>plus the old-root mounts, can take a full second for each unmount.
> >
> >OK...
> >
> >>>If RCU has some way to find out when init is complete, I can easily
> >>>make it so that CONFIG_RCU_FAST_NO_HZ optimizes for speed during boot
> >>>and energy efficiency during runtime.
> >>>
> >>I said that I didn't notice any significant slowdown during runtime, but my
> >>laptop usage is basic. Some specific tasks, similar to what systemd does, may
> >>perhaps be impacted by this feature.
> >>Is there a task/program I could run to stress RCU_FAST_NO_HZ?
> >
> >One thing to try first -- could you please check boot/shutdown slowdown
> >with the patch below?
> >
> >But yes, there are things like modifying netfilter rules and updating
> >security configuration that might be affected.
> >
> >>>One thing I could easily do would be to provide a sysfs parameter or
> >>>some such that allows the boot process to enable energy-efficiency
> >>>mode at runtime. I would much prefer to make this automatic, though.
> >>>
> >>So the feature would be disabled until you trigger a sysfs parameter, and could be
> >>disabled again before shutdown? That would be fine, at least for hardware like my
> >>own.
> >
> >That is the thought, though again I would really really prefer that this
> >be automated.
> >
> >>>Other thoughts?
> >>>
> >>Do you think that the culprit is buggy hardware in my laptop, or the
> >>number of CPUs/threads?
> >
> >Maybe just more filesystems to mount?
> >
> >>>Thanx, Paul
> >>>
> >>
> >>Resent in plain text because the original was rejected. Sorry, I forgot the rules.
> >
> >No problem!  Please see below for the patch.
> >
> >							Thanx, Paul
> >
> RCU_IDLE_GP_DELAY=3 instead of 6 does not significantly improve
> startup time. Shutdown is shorter, but there are still some delays on
> unmounting some sysfs or oldroot mounts (not always the same).
> Startup time always varies randomly from 12s to 22s (a stable 8s without
> RCU_FAST_NO_HZ).
> The tasks taking time during startup are rarely the same from boot to
> boot, and some of them run after the filesystems are mounted.
> Example: the "console-kit-system-log-start.service" systemd unit took 5s to
> complete in my last try, and 1s in the previous run. This one runs
> after the mounts.
> I enabled CONFIG_RCU_TRACE, and hey, the result in debugfs is beyond
> my knowledge :(
> Do you want some data from /sys/kernel/debug/rcu (rcudata, ...)?

So it seems that mount and unmount operations are often slower with
RCU_FAST_NO_HZ during boot and shutdown.  Are these operations also
slower during runtime?  If so, the RCU event tracing across both a fast
and a slow mount or unmount operation would likely be quite helpful.

							Thanx, Paul



* Re: RCU related performance regression in 3.3
  2012-04-12 18:04                 ` Paul E. McKenney
@ 2012-04-16 21:02                   ` Paul E. McKenney
  2012-04-18  9:37                     ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-16 21:02 UTC (permalink / raw)
  To: Pascal; +Cc: Josh Boyer, linux-kernel, kernel-team

On Thu, Apr 12, 2012 at 11:04:32AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 11, 2012 at 05:06:54PM +0200, Pascal wrote:

[ . . .]

> > RCU_IDLE_GP_DELAY=3 instead of 6 does not significantly improve
> > startup time. Shutdown is shorter, but there are still some delays on
> > unmounting some sysfs or oldroot mounts (not always the same).
> > Startup time always varies randomly from 12s to 22s (a stable 8s without
> > RCU_FAST_NO_HZ).
> > The tasks taking time during startup are rarely the same from boot to
> > boot, and some of them run after the filesystems are mounted.
> > Example: the "console-kit-system-log-start.service" systemd unit took 5s to
> > complete in my last try, and 1s in the previous run. This one runs
> > after the mounts.
> > I enabled CONFIG_RCU_TRACE, and hey, the result in debugfs is beyond
> > my knowledge :(
> > Do you want some data from /sys/kernel/debug/rcu (rcudata, ...)?
> 
> So it seems that mount and unmount operations are often slower with
> RCU_FAST_NO_HZ during boot and shutdown.  Are these operations also
> slower during runtime?  If so, the RCU event tracing across both a fast
> and a slow mount or unmount operation would likely be quite helpful.

Actually, one other possibility is that RCU_FAST_NO_HZ's timer is
being migrated.  If you get a chance, could you please try out the
diagnostic patch below?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Check for timer migration for RCU_FAST_NO_HZ

If RCU_FAST_NO_HZ's timer is migrated, then the CPU that went dyntick-idle
with callbacks might never wake up, which could indefinitely postpone
invocation of its callbacks, which could in turn result in a system hang.
But if the timer is migrated, then it might actually fire.  In contrast,
if it remains on the CPU that posted it, it is guaranteed to be cancelled.

This patch therefore adds a WARN_ON_ONCE() to this timer's handler as
a diagnostic test.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c023464..67ee640 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2053,6 +2053,7 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
  */
 static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
 {
+	WARN_ON_ONCE(1);
 	trace_rcu_prep_idle("Timer");
 	return HRTIMER_NORESTART;
 }



* Re: RCU related performance regression in 3.3
  2012-04-16 21:02                   ` Paul E. McKenney
@ 2012-04-18  9:37                     ` Pascal Chapperon
  2012-04-18 14:01                       ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-04-18  9:37 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 16/04/2012 23:02, Paul E. McKenney wrote:
.
>> So it seems that mount and unmount operations are often slower with
>> RCU_FAST_NO_HZ during boot and shutdown.  Are these operations also
>> slower during runtime?  If so, the RCU event tracing across both a fast
>> and a slow mount or unmount operation would likely be quite helpful.
-
Mount and umount operations are not slower with RCU_FAST_NO_HZ during
runtime; systemctl start and stop operations are also not slower. In
fact, I couldn't find a single operation slower during runtime with
RCU_FAST_NO_HZ.
-
>
> Actually, one other possibility is that RCU_FAST_NO_HZ's timer is
> being migrated.  If you get a chance, could you please try out the
> diagnostic patch below?
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Check for timer migration for RCU_FAST_NO_HZ
>
> If RCU_FAST_NO_HZ's timer is migrated, then the CPU that went dyntick-idle
> with callbacks might never wake up, which could indefinitely postpone
> invocation of its callbacks, which could in turn result in a system hang.
> But if the timer is migrated, then it might actually fire.  In contrast,
> if it remains on the CPU that posted it, it is guaranteed to be cancelled.
>
> This patch therefore adds a WARN_ON_ONCE() to this timer's handler as
> a diagnostic test.
>
> Signed-off-by: Paul E. McKenney<paulmck@linux.vnet.ibm.com>
>
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index c023464..67ee640 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -2053,6 +2053,7 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
>    */
>   static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
>   {
> +	WARN_ON_ONCE(1);
>   	trace_rcu_prep_idle("Timer");
>   	return HRTIMER_NORESTART;
>   }
>
>
-
The result below.
Pascal
[    0.758325] ------------[ cut here ]------------
[    0.758330] WARNING: at kernel/rcutree_plugin.h:2056 rcu_idle_gp_timer_func+0x27/0x30()
[    0.758332] Hardware name: GX780R/GT780R/GT780DXR/GT783R
[    0.758334] Modules linked in:
[    0.758337] Pid: 0, comm: swapper/0 Not tainted 3.4.0-rc2-rcu+ #38
[    0.758338] Call Trace:
[    0.758340]  <IRQ>  [<ffffffff81057a1f>] warn_slowpath_common+0x7f/0xc0
[    0.758348]  [<ffffffff81057a7a>] warn_slowpath_null+0x1a/0x20
[    0.758350]  [<ffffffff810e61e7>] rcu_idle_gp_timer_func+0x27/0x30
[    0.758354]  [<ffffffff8107d5f1>] __run_hrtimer+0x71/0x1e0
[    0.758357]  [<ffffffff810e61c0>] ? rcu_batches_completed+0x20/0x20
[    0.758360]  [<ffffffff8107df3b>] hrtimer_interrupt+0xeb/0x210
[    0.758365]  [<ffffffff81601dd9>] smp_apic_timer_interrupt+0x69/0x99
[    0.758368]  [<ffffffff81600b4a>] apic_timer_interrupt+0x6a/0x70
[    0.758369]  <EOI>  [<ffffffff8101b979>] ? sched_clock+0x9/0x10
[    0.758374]  [<ffffffff8101cc05>] ? mwait_idle+0x95/0x230
[    0.758377]  [<ffffffff8101d629>] cpu_idle+0xd9/0x120
[    0.758380]  [<ffffffff815d4f0e>] rest_init+0x72/0x74
[    0.758384]  [<ffffffff81cf6c12>] start_kernel+0x3c1/0x3ce
[    0.758386]  [<ffffffff81cf6582>] ? loglevel+0x31/0x31
[    0.758389]  [<ffffffff81cf6346>] x86_64_start_reservations+0x131/0x135
[    0.758392]  [<ffffffff81cf6140>] ? early_idt_handlers+0x140/0x140
[    0.758394]  [<ffffffff81cf644c>] x86_64_start_kernel+0x102/0x111
[    0.758398] ---[ end trace 82bc736bb33fe366 ]---
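For readers following the diagnostic: WARN_ON_ONCE() prints its splat at most once per call site, so the single warning above is all this instrumentation will ever emit, no matter how many times the timer fires. A rough userspace sketch of that once-only behavior (the kernel's real macro lives in include/asm-generic/bug.h and additionally dumps registers and a backtrace):

```c
#include <stdio.h>

/*
 * Userspace approximation of the kernel's WARN_ON_ONCE(): print the
 * warning banner the first time the condition is true at this call
 * site, stay silent on later hits, but keep returning the condition.
 */
#define WARN_ON_ONCE(cond)						\
	({								\
		static int __warned;					\
		int __ret_warn_once = !!(cond);				\
		if (__ret_warn_once && !__warned) {			\
			__warned = 1;					\
			fprintf(stderr, "WARNING: at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
		__ret_warn_once;					\
	})

/* Stand-in for rcu_idle_gp_timer_func(): each call is one timer firing. */
static int timer_fired(void)
{
	return WARN_ON_ONCE(1);
}
```

Because the guard variable is static inside the macro expansion, each WARN_ON_ONCE() site gets its own flag, so separate sites can each warn once.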


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-04-18  9:37                     ` Pascal Chapperon
@ 2012-04-18 14:01                       ` Paul E. McKenney
  2012-04-18 15:00                         ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-18 14:01 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
> On 16/04/2012 23:02, Paul E. McKenney wrote:
> .
> >>So it seems that mount and unmount operations are often slower with
> >>RCU_FAST_NO_HZ during boot and shutdown.  Are these operations also
> >>slower during runtime?  If so, the RCU event tracing across both a fast
> >>and a slow mount or unmount operation would likely be quite helpful.
> -
> Mount and umount operations are not slower with RCU_FAST_NO_HZ during
> runtime; systemctl start and stop operations are also not slower. In
> >>fact, I couldn't find a single operation slower during runtime with
> RCU_FAST_NO_HZ.

Your boot-time setup is such that all CPUs are online before the
boot-time mount operations take place, right?  Struggling to understand
how RCU can tell the difference between post-CPU-bringup boot time
and run time...

							Thanx, Paul

> >Actually, one other possibility is that RCU_FAST_NO_HZ's timer is
> >being migrated.  If you get a chance, could you please try out the
> >diagnostic patch below?
> >
> >							Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >rcu: Check for timer migration for RCU_FAST_NO_HZ
> >
> >If RCU_FAST_NO_HZ's timer is migrated, then the CPU that went dyntick-idle
> >with callbacks might never wake up, which could indefinitely postpone
> >invocation of its callbacks, which could in turn result in a system hang.
> >But if the timer is migrated, then it might actually fire.  In contrast,
> >if it remains on the CPU that posted it, it is guaranteed to be cancelled.
> >
> >This patch therefore adds a WARN_ON_ONCE() to this timer's handler as
> >a diagnostic test.
> >
> >Signed-off-by: Paul E. McKenney<paulmck@linux.vnet.ibm.com>
> >
> >diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> >index c023464..67ee640 100644
> >--- a/kernel/rcutree_plugin.h
> >+++ b/kernel/rcutree_plugin.h
> >@@ -2053,6 +2053,7 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
> >   */
> >  static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
> >  {
> >+	WARN_ON_ONCE(1);
> >  	trace_rcu_prep_idle("Timer");
> >  	return HRTIMER_NORESTART;
> >  }
> >
> >
> -
> The result is below.
> Pascal
> [    0.758325] ------------[ cut here ]------------
> [    0.758330] WARNING: at kernel/rcutree_plugin.h:2056 rcu_idle_gp_timer_func+0x27/0x30()
> [    0.758332] Hardware name: GX780R/GT780R/GT780DXR/GT783R
> [    0.758334] Modules linked in:
> [    0.758337] Pid: 0, comm: swapper/0 Not tainted 3.4.0-rc2-rcu+ #38
> [    0.758338] Call Trace:
> [    0.758340]  <IRQ>  [<ffffffff81057a1f>] warn_slowpath_common+0x7f/0xc0
> [    0.758348]  [<ffffffff81057a7a>] warn_slowpath_null+0x1a/0x20
> [    0.758350]  [<ffffffff810e61e7>] rcu_idle_gp_timer_func+0x27/0x30
> [    0.758354]  [<ffffffff8107d5f1>] __run_hrtimer+0x71/0x1e0
> [    0.758357]  [<ffffffff810e61c0>] ? rcu_batches_completed+0x20/0x20
> [    0.758360]  [<ffffffff8107df3b>] hrtimer_interrupt+0xeb/0x210
> [    0.758365]  [<ffffffff81601dd9>] smp_apic_timer_interrupt+0x69/0x99
> [    0.758368]  [<ffffffff81600b4a>] apic_timer_interrupt+0x6a/0x70
> [    0.758369]  <EOI>  [<ffffffff8101b979>] ? sched_clock+0x9/0x10
> [    0.758374]  [<ffffffff8101cc05>] ? mwait_idle+0x95/0x230
> [    0.758377]  [<ffffffff8101d629>] cpu_idle+0xd9/0x120
> [    0.758380]  [<ffffffff815d4f0e>] rest_init+0x72/0x74
> [    0.758384]  [<ffffffff81cf6c12>] start_kernel+0x3c1/0x3ce
> [    0.758386]  [<ffffffff81cf6582>] ? loglevel+0x31/0x31
> [    0.758389]  [<ffffffff81cf6346>] x86_64_start_reservations+0x131/0x135
> [    0.758392]  [<ffffffff81cf6140>] ? early_idt_handlers+0x140/0x140
> [    0.758394]  [<ffffffff81cf644c>] x86_64_start_kernel+0x102/0x111
> [    0.758398] ---[ end trace 82bc736bb33fe366 ]---
> 



* Re: RCU related performance regression in 3.3
  2012-04-18 14:01                       ` Paul E. McKenney
@ 2012-04-18 15:00                         ` Pascal Chapperon
  2012-04-18 15:23                           ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-04-18 15:00 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 18/04/2012 16:01, Paul E. McKenney wrote:
> On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
>> Mount and umount operations are not slower with RCU_FAST_NO_HZ during
>> runtime; systemctl start and stop operations are also not slower. In
>> fact, I couldn't find a single operation slower during runtime with
>> RCU_FAST_NO_HZ.
>
> Your boot-time setup is such that all CPUs are online before the
> boot-time mount operations take place, right?
Yes:
[    0.242697] Brought up 8 CPUs
[    0.242699] Total of 8 processors activated (35118.33 BogoMIPS).

>  Struggling to understand
> how RCU can tell the difference between post-CPU-bringup boot time
> and run time...
>
systemd controls the whole boot process, including mount operations
(apart from the root filesystem), and as far as I can see it makes
heavy use of sockets to do so (not to mention CPU affinity). It also
controls most umount operations. Is it possible that your patch hits
a systemd bug?
What I don't understand is that systemd coexists well with
RCU_FAST_NO_HZ on a smaller laptop with an older and much less
powerful CPU.
I'll do further tests on another machine.

Pascal



* Re: RCU related performance regression in 3.3
  2012-04-18 15:00                         ` Pascal Chapperon
@ 2012-04-18 15:23                           ` Paul E. McKenney
  2012-04-20 14:45                             ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-18 15:23 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Wed, Apr 18, 2012 at 05:00:14PM +0200, Pascal Chapperon wrote:
> On 18/04/2012 16:01, Paul E. McKenney wrote:
> >On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
> >>Mount and umount operations are not slower with RCU_FAST_NO_HZ during
> >>runtime; systemctl start and stop operations are also not slower. In
> >>fact, I couldn't find a single operation slower during runtime with
> >>RCU_FAST_NO_HZ.
> >
> >Your boot-time setup is such that all CPUs are online before the
> >boot-time mount operations take place, right?
> Yes:
> [    0.242697] Brought up 8 CPUs
> [    0.242699] Total of 8 processors activated (35118.33 BogoMIPS).
> 
> > Struggling to understand
> >how RCU can tell the difference between post-CPU-bringup boot time
> >and run time...
> >
> systemd controls the whole boot process, including mount operations
> (apart from the root filesystem), and as far as I can see it makes
> heavy use of sockets to do so (not to mention CPU affinity). It also
> controls most umount operations. Is it possible that your patch hits
> a systemd bug?

Is it possible that systemd is using network operations that include
synchronize_rcu()?  Then if you did the same operation from the
command line at runtime, you might not see the slowdown.
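One way to see why a per-operation synchronize_rcu() would hurt boot but not interactive use: boot serializes hundreds of unit starts and mounts back to back, so each blocking grace period adds up. A toy model of that scaling (the operation count and grace-period length below are purely illustrative, not measured values):

```c
/*
 * Illustrative only: if each serialized boot-time operation performs
 * one blocking grace period, the added boot delay scales linearly
 * with the number of operations.
 */
static double added_boot_delay_ms(int n_ops, double grace_period_ms)
{
	return n_ops * grace_period_ms;
}
```

For example, 300 serialized operations at roughly 24 ms per grace period would add several seconds to boot, while a single operation typed at a shell would cost only one barely noticeable grace period.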

Is it possible for you to convince systemd to collect RCU event tracing
during the slow operation?  RCU event tracing is available under
/sys/kernel/debug/tracing/rcu.

> What I don't understand is that systemd coexists well with
> RCU_FAST_NO_HZ on a smaller laptop with an older and much less
> powerful CPU.
> I'll do further tests on another machine.

There might well be a timing-related problem.

							Thanx, Paul



* Re: RCU related performance regression in 3.3
  2012-04-18 15:23                           ` Paul E. McKenney
@ 2012-04-20 14:45                             ` Pascal Chapperon
  0 siblings, 0 replies; 30+ messages in thread
From: Pascal Chapperon @ 2012-04-20 14:45 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 18/04/2012 17:23, Paul E. McKenney wrote:
>> systemd controls the whole boot process, including mount operations
>> (apart from the root filesystem), and as far as I can see it makes
>> heavy use of sockets to do so (not to mention CPU affinity). It also
>> controls most umount operations. Is it possible that your patch hits
>> a systemd bug?
>
> Is it possible that systemd is using network operations that include
> synchronize_rcu()?  Then if you did the same operation from the
> command line at runtime, you might not see the slowdown.
>
> Is it possible for you to convince systemd to collect RCU event tracing
> during the slow operation?  RCU event tracing is available under
> /sys/kernel/debug/tracing/rcu.
>
-
Paul, thank you for these suggestions: I'll arrange to enable RCU event
tracing immediately after debugfs is mounted (systemd permits custom
units). As debugfs is mounted early in the boot sequence, maybe I'll
catch something.
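Such a custom unit might look roughly like the sketch below; the unit name and ordering are illustrative guesses, not Pascal's actual setup, though the echo targets are the standard tracefs control files mentioned above.

```ini
# /etc/systemd/system/rcu-trace.service (hypothetical name)
[Unit]
Description=Enable RCU event tracing early in boot
# Order ourselves right after debugfs becomes available.
After=sys-kernel-debug.mount
Requires=sys-kernel-debug.mount
DefaultDependencies=no

[Service]
Type=oneshot
# Turn on the whole rcu event group and make sure tracing is running.
ExecStart=/bin/sh -c 'echo 1 > /sys/kernel/debug/tracing/events/rcu/enable'
ExecStart=/bin/sh -c 'echo 1 > /sys/kernel/debug/tracing/tracing_on'
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target
```

After boot, the collected events can then be read from /sys/kernel/debug/tracing/trace.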

Pascal



* Re: RCU related performance regression in 3.3
  2012-05-18 12:14                     ` Paul E. McKenney
@ 2012-05-18 14:48                       ` Pascal Chapperon
  0 siblings, 0 replies; 30+ messages in thread
From: Pascal Chapperon @ 2012-05-18 14:48 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 18/05/2012 14:14, Paul E. McKenney wrote:
> On Fri, May 18, 2012 at 01:01:41PM +0200, Pascal Chapperon wrote:
>> On 15/05/2012 00:32, Paul E. McKenney wrote:
>>> On Fri, May 04, 2012 at 04:14:42PM -0700, Paul E. McKenney wrote:
>>>> On Fri, May 04, 2012 at 11:41:13PM +0200, Pascal Chapperon wrote:
>>>>> On 04/05/2012 17:04, Paul E. McKenney wrote:
>>>>>> On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
>>>>>>> On 01/05/2012 17:45, Paul E. McKenney wrote:
>>>>>>>
>>>>>>>> Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
>>>>>>>>
>>>>>>>> Or you can pull branch fnh.2012.05.01a from:
>>>>>>>>
>>>>>>>> 	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>>>>>>>>
>>>>>>>> 							Thanx, Paul
>>>>>>>>
>>>>>>> I applied your global patch on top of v3.4-rc4. But the slowdown is
>>>>>>> worse than before : boot sequence took 80s instead 20-30s (12s for
>>>>>>> initramfs instead of 2s).
>>>>>>>
>>>>>>> I'll send you rcu tracing log in a second mail.
>>>>>>
>>>>>> Hmmm...  Well, I guess I am glad that I finally did something that
>>>>>> had an effect, but I sure wish that the effect had been in the other
>>>>>> direction!
>>>>>>
>>>>>> Just to make sure I understand: the difference between the 20-30s and
>>>>>> the 80s is exactly the patch I sent you?
>>>>>>
>>>>>> 							Thanx, Paul
>>>>>>
>>>>>>
>>>>> Yes. Exactly the same kernel config as in the previous results; I
>>>>> applied your patch against v3.4-rc4, and sorry, the result is
>>>>> exactly what I said.
>>>>> I saw that your global patch was quite large, and it addresses
>>>>> things that are not directly related to the initial patch (commit
>>>>> 7cb92499000e3c86dae653077b1465458a039ef6); maybe a side effect?
>>>>>
>>>>> However, I'm ready to try this patch on my smaller laptop, which
>>>>> handles CONFIG_RCU_FAST_NO_HZ=y and systemd well, if you think it
>>>>> can help.
>>>>>
>>>>> Another thought: could this issue have anything to do with the i7's
>>>>> Hyper-Threading capabilities? (I tested a Core 2 Duo and a Pentium
>>>>> ULV under the same conditions and didn't encounter any slowdown.)
>>>>
>>>> Well, one possibility is that your setup starts the jiffies counter
>>>> at some interesting value.  The attached patch (also against v3.4-rc4)
>>>> applies a bit more paranoia to the initialization to handle this
>>>> and other possibilities.
>>>
>>> This patchset fixes the problem where RCU_FAST_NO_HZ's timers were
>>> being ignored due to the dyntick-idle code having already calculated
>>> the CPU's wakeup time (which I sent earlier, mistakenly offlist), but
>>> also fixes a botched check in my workaround.
>>>
>>> Could you please try it out?  This patch is against 3.4-rc4.
>>>
>>> 							Thanx, Paul
>>>
>> Hi Paul,
>>
>> <  +     if (!rcu_cpu_has_nonlazy_callbacks(cpu))
>> ---
>>> +     if (rcu_cpu_has_nonlazy_callbacks(cpu))
>>
>> I was a little disappointed by the previous patch (the boot sequence
>> still took 72 s), but this one makes a huge difference ;-)
>> The slowdown during boot or shutdown with CONFIG_RCU_FAST_NO_HZ has
>> disappeared (~10 attempts):
>> # systemd-analyze
>> Startup finished in 1990ms (kernel) + 1174ms (initramfs) + 3121ms
>> (userspace) = 6285ms
>> .
>
> Very good!  And thank you very much for all your testing efforts and
> for bearing with me through this!
>
> Does this mean that I can add your Tested-by?

Yes: the results are good and stable, at least for my hardware.
I tried with both a standard Fedora 16 kernel configuration and a custom
one (hardware-optimized, preemptible, etc.), over more than 20
attempts.
With or without RCU_FAST_NO_HZ it makes no difference now.

>
>> Do you want the RCU tracing log for this patch?
>
> Could you please?  Just in case there is some other surprise that
> I should know about that might not be visible.  ;-)
>
> 							Thanx, Paul
I'll send you the logs in a second mail (offlist).

Pascal




* Re: RCU related performance regression in 3.3
  2012-05-18 11:01                   ` Pascal Chapperon
@ 2012-05-18 12:14                     ` Paul E. McKenney
  2012-05-18 14:48                       ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-05-18 12:14 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Fri, May 18, 2012 at 01:01:41PM +0200, Pascal Chapperon wrote:
> >On 15/05/2012 00:32, Paul E. McKenney wrote:
> >>On Fri, May 04, 2012 at 04:14:42PM -0700, Paul E. McKenney wrote:
> >>>On Fri, May 04, 2012 at 11:41:13PM +0200, Pascal Chapperon wrote:
> >>>>On 04/05/2012 17:04, Paul E. McKenney wrote:
> >>>>>On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
> >>>>>>On 01/05/2012 17:45, Paul E. McKenney wrote:
> >>>>>
> >>>>>>Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
> >>>>>>
> >>>>>>Or you can pull branch fnh.2012.05.01a from:
> >>>>>>
> >>>>>>	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> >>>>>>
> >>>>>>							Thanx, Paul
> >>>>>>
> >>>>>I applied your global patch on top of v3.4-rc4. But the slowdown is
> >>>>>worse than before : boot sequence took 80s instead 20-30s (12s for
> >>>>>initramfs instead of 2s).
> >>>>>
> >>>>>I'll send you rcu tracing log in a second mail.
> >>>>
> >>>>Hmmm...  Well, I guess I am glad that I finally did something that
> >>>>had an effect, but I sure wish that the effect had been in the other
> >>>>direction!
> >>>>
> >>>>Just to make sure I understand: the difference between the 20-30s and
> >>>>the 80s is exactly the patch I sent you?
> >>>>
> >>>>							Thanx, Paul
> >>>>
> >>>>
> >>>Yes. Exactly the same kernel config as in the previous results; I
> >>>applied your patch against v3.4-rc4, and sorry, the result is
> >>>exactly what I said.
> >>>I saw that your global patch was quite large, and it addresses
> >>>things that are not directly related to the initial patch (commit
> >>>7cb92499000e3c86dae653077b1465458a039ef6); maybe a side effect?
> >>>
> >>>However, I'm ready to try this patch on my smaller laptop, which
> >>>handles CONFIG_RCU_FAST_NO_HZ=y and systemd well, if you think it
> >>>can help.
> >>>
> >>>Another thought: could this issue have anything to do with the i7's
> >>>Hyper-Threading capabilities? (I tested a Core 2 Duo and a Pentium
> >>>ULV under the same conditions and didn't encounter any slowdown.)
> >>
> >>Well, one possibility is that your setup starts the jiffies counter
> >>at some interesting value.  The attached patch (also against v3.4-rc4)
> >>applies a bit more paranoia to the initialization to handle this
> >>and other possibilities.
> >
> >This patchset fixes the problem where RCU_FAST_NO_HZ's timers were
> >being ignored due to the dyntick-idle code having already calculated
> >the CPU's wakeup time (which I sent earlier, mistakenly offlist), but
> >also fixes a botched check in my workaround.
> >
> >Could you please try it out?  This patch is against 3.4-rc4.
> >
> >							Thanx, Paul
> >
> Hi Paul,
> 
> < +     if (!rcu_cpu_has_nonlazy_callbacks(cpu))
> ---
> > +     if (rcu_cpu_has_nonlazy_callbacks(cpu))
> 
> I was a little disappointed by the previous patch (the boot sequence
> still took 72 s), but this one makes a huge difference ;-)
> The slowdown during boot or shutdown with CONFIG_RCU_FAST_NO_HZ has
> disappeared (~10 attempts):
> # systemd-analyze
> Startup finished in 1990ms (kernel) + 1174ms (initramfs) + 3121ms
> (userspace) = 6285ms
> .

Very good!  And thank you very much for all your testing efforts and
for bearing with me through this!

Does this mean that I can add your Tested-by?

> Do you want the RCU tracing log for this patch?

Could you please?  Just in case there is some other surprise that
I should know about that might not be visible.  ;-)

							Thanx, Paul



* Re: RCU related performance regression in 3.3
  2012-05-14 22:32                 ` Paul E. McKenney
@ 2012-05-18 11:01                   ` Pascal Chapperon
  2012-05-18 12:14                     ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-05-18 11:01 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 15/05/2012 00:32, Paul E. McKenney wrote:
> On Fri, May 04, 2012 at 04:14:42PM -0700, Paul E. McKenney wrote:
>> On Fri, May 04, 2012 at 11:41:13PM +0200, Pascal Chapperon wrote:
>>> On 04/05/2012 17:04, Paul E. McKenney wrote:
>>>> On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
>>>>> On 01/05/2012 17:45, Paul E. McKenney wrote:
>>>>>
>>>>>> Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
>>>>>>
>>>>>> Or you can pull branch fnh.2012.05.01a from:
>>>>>>
>>>>>> 	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>>>>>>
>>>>>> 							Thanx, Paul
>>>>>>
>>>>> I applied your global patch on top of v3.4-rc4. But the slowdown is
>>>>> worse than before : boot sequence took 80s instead 20-30s (12s for
>>>>> initramfs instead of 2s).
>>>>>
>>>>> I'll send you rcu tracing log in a second mail.
>>>>
>>>> Hmmm...  Well, I guess I am glad that I finally did something that
>>>> had an effect, but I sure wish that the effect had been in the other
>>>> direction!
>>>>
>>>> Just to make sure I understand: the difference between the 20-30s and
>>>> the 80s is exactly the patch I sent you?
>>>>
>>>> 							Thanx, Paul
>>>>
>>>>
>>> Yes. Exactly the same kernel config as in the previous results; I
>>> applied your patch against v3.4-rc4, and sorry, the result is
>>> exactly what I said.
>>> I saw that your global patch was quite large, and it addresses
>>> things that are not directly related to the initial patch (commit
>>> 7cb92499000e3c86dae653077b1465458a039ef6); maybe a side effect?
>>>
>>> However, I'm ready to try this patch on my smaller laptop, which
>>> handles CONFIG_RCU_FAST_NO_HZ=y and systemd well, if you think it
>>> can help.
>>>
>>> Another thought: could this issue have anything to do with the i7's
>>> Hyper-Threading capabilities? (I tested a Core 2 Duo and a Pentium
>>> ULV under the same conditions and didn't encounter any slowdown.)
>>
>> Well, one possibility is that your setup starts the jiffies counter
>> at some interesting value.  The attached patch (also against v3.4-rc4)
>> applies a bit more paranoia to the initialization to handle this
>> and other possibilities.
>
> This patchset fixes the problem where RCU_FAST_NO_HZ's timers were
> being ignored due to the dyntick-idle code having already calculated
> the CPU's wakeup time (which I sent earlier, mistakenly offlist), but
> also fixes a botched check in my workaround.
>
> Could you please try it out?  This patch is against 3.4-rc4.
>
> 							Thanx, Paul
>
Hi Paul,

< +     if (!rcu_cpu_has_nonlazy_callbacks(cpu))
---
 > +     if (rcu_cpu_has_nonlazy_callbacks(cpu))

I was a little disappointed by the previous patch (the boot sequence
still took 72 s), but this one makes a huge difference ;-)
The slowdown during boot or shutdown with CONFIG_RCU_FAST_NO_HZ has
disappeared (~10 attempts):
# systemd-analyze
Startup finished in 1990ms (kernel) + 1174ms (initramfs) + 3121ms
(userspace) = 6285ms
.
Do you want the RCU tracing log for this patch?

Pascal



* Re: RCU related performance regression in 3.3
  2012-05-04 23:14               ` Paul E. McKenney
  2012-05-10  8:40                 ` Pascal Chapperon
@ 2012-05-14 22:32                 ` Paul E. McKenney
  2012-05-18 11:01                   ` Pascal Chapperon
  1 sibling, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-05-14 22:32 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Fri, May 04, 2012 at 04:14:42PM -0700, Paul E. McKenney wrote:
> On Fri, May 04, 2012 at 11:41:13PM +0200, Pascal Chapperon wrote:
> > On 04/05/2012 17:04, Paul E. McKenney wrote:
> > >On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
> > >>On 01/05/2012 17:45, Paul E. McKenney wrote:
> > >>
> > >>>Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
> > >>>
> > >>>Or you can pull branch fnh.2012.05.01a from:
> > >>>
> > >>>	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> > >>>
> > >>>							Thanx, Paul
> > >>>
> > >>I applied your global patch on top of v3.4-rc4. But the slowdown is
> > >>worse than before : boot sequence took 80s instead 20-30s (12s for
> > >>initramfs instead of 2s).
> > >>
> > >>I'll send you rcu tracing log in a second mail.
> > >
> > >Hmmm...  Well, I guess I am glad that I finally did something that
> > >had an effect, but I sure wish that the effect had been in the other
> > >direction!
> > >
> > >Just to make sure I understand: the difference between the 20-30s and
> > >the 80s is exactly the patch I sent you?
> > >
> > >							Thanx, Paul
> > >
> > >
> > Yes. Exactly same kernel config as in previous results, I applied
> > your patch against v3.4-rc4, and sorry, the result is exactly what I
> > said;
> > I saw that your global patch was quite huge, and addresses things which
> > are not directly related with the initial patch (commit
> > 7cb92499000e3c86dae653077b1465458a039ef6); maybe a side effect?
> > 
> > However, I'm ready to try this patch on my smaller laptop which
> > supports well CONFIG_FAST_NO_HZ=y and systemd, if you think it can
> > help ?
> > 
> > Another thought: this issue as nothing to do with i7 Hyper-threading
> > capacities ? (as I test core2duo, Pentium ulv in same conditions and I
> > don't encountered any slowdown ?)
> 
> Well, one possibility is that your setup starts the jiffies counter
> at some interesting value.  The attached patch (also against v3.4-rc4)
> applies a bit more paranoia to the initialization to handle this
> and other possibilities.

This patchset fixes the problem where RCU_FAST_NO_HZ's timers were
being ignored due to the dyntick-idle code having already calculated
the CPU's wakeup time (which I sent earlier, mistakenly offlist), but
also fixes a botched check in my workaround.

Could you please try it out?  This patch is against 3.4-rc4.

							Thanx, Paul
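The heart of the fix is visible in the rcu_needs_cpu() hunks below: the function now also reports, via delta_jiffies, how soon RCU will need the CPU again, so the dyntick-idle code can bound its sleep before RCU's timer is even posted. A userspace sketch of just that decision logic (names and constants follow the patch; the lazy delay is (6 * HZ) in the patch, approximated here with HZ=100, and all kernel context is simplified away):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define RCU_IDLE_GP_DELAY	6		/* jiffies; roughly one grace period */
#define RCU_IDLE_LAZY_GP_DELAY	(6 * 100)	/* roughly six seconds at HZ=100 */

struct cpu_state {
	bool has_callbacks;	/* any RCU callbacks queued on this CPU? */
	bool has_nonlazy;	/* any non-lazy (urgent) callbacks? */
	unsigned long holdoff;	/* jiffy of last failed idle-entry attempt */
};

/*
 * Mirror of the patched rcu_needs_cpu(): returns nonzero if RCU needs
 * the CPU right now, and stores in *delta the number of jiffies until
 * RCU will want the CPU again.
 */
static int rcu_needs_cpu(struct cpu_state *sp, unsigned long now,
			 unsigned long *delta)
{
	if (!sp->has_callbacks) {
		*delta = ULONG_MAX;	/* nothing queued: sleep indefinitely */
		return 0;
	}
	if (sp->holdoff == now) {
		*delta = 1;		/* recently failed: retry next jiffy */
		return 1;
	}
	/* Idle entry is allowed, but bound the sleep so callbacks
	 * posted on this CPU are eventually invoked. */
	*delta = sp->has_nonlazy ? RCU_IDLE_GP_DELAY : RCU_IDLE_LAZY_GP_DELAY;
	return 0;
}
```

With this in place, tick_nohz_stop_sched_tick() can cap the CPU's wakeup time before rcu_prepare_for_idle() posts its timer, instead of the timer being silently deferred past its intended expiry.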

------------------------------------------------------------------------

 include/linux/rcutiny.h    |    6 -
 include/linux/rcutree.h    |    2 
 include/trace/events/rcu.h |    3 
 kernel/rcutiny_plugin.h    |    2 
 kernel/rcutree.c           |    6 -
 kernel/rcutree.h           |   15 +++
 kernel/rcutree_plugin.h    |  213 ++++++++++++++++++++++++++++++---------------
 kernel/time/tick-sched.c   |    7 +
 8 files changed, 179 insertions(+), 75 deletions(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index e93df77..56048bc 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -95,8 +95,9 @@ static inline void exit_rcu(void)
 {
 }
 
-static inline int rcu_needs_cpu(int cpu)
+static inline int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
 {
+	*delta_jiffies = ULONG_MAX;
 	return 0;
 }
 
@@ -106,8 +107,9 @@ void rcu_preempt_note_context_switch(void);
 extern void exit_rcu(void);
 int rcu_preempt_needs_cpu(void);
 
-static inline int rcu_needs_cpu(int cpu)
+static inline int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
 {
+	*delta_jiffies = ULONG_MAX;
 	return rcu_preempt_needs_cpu();
 }
 
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index e8ee5dd..06b1939 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -32,7 +32,7 @@
 
 extern void rcu_init(void);
 extern void rcu_note_context_switch(int cpu);
-extern int rcu_needs_cpu(int cpu);
+extern int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies);
 extern void rcu_cpu_stall_reset(void);
 
 /*
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 3370997..d274734 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -289,9 +289,12 @@ TRACE_EVENT(rcu_dyntick,
  *	"In holdoff": Nothing to do, holding off after unsuccessful attempt.
  *	"Begin holdoff": Attempt failed, don't retry until next jiffy.
  *	"Dyntick with callbacks": Entering dyntick-idle despite callbacks.
+ *	"Dyntick with lazy callbacks": Entering dyntick-idle w/lazy callbacks.
  *	"More callbacks": Still more callbacks, try again to clear them out.
  *	"Callbacks drained": All callbacks processed, off to dyntick idle!
  *	"Timer": Timer fired to cause CPU to continue processing callbacks.
+ *	"Demigrate": Timer fired on wrong CPU, woke up correct CPU.
+ *	"Cleanup after idle": Idle exited, timer canceled.
  */
 TRACE_EVENT(rcu_prep_idle,
 
diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
index 22ecea0..f3b995a 100644
--- a/kernel/rcutiny_plugin.h
+++ b/kernel/rcutiny_plugin.h
@@ -846,8 +846,6 @@ EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
  */
 int rcu_preempt_needs_cpu(void)
 {
-	if (!rcu_preempt_running_reader())
-		rcu_preempt_cpu_qs();
 	return rcu_preempt_ctrlblk.rcb.rcucblist != NULL;
 }
 
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 1050d6d..38300c0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -349,7 +349,7 @@ static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)
 		struct task_struct *idle = idle_task(smp_processor_id());
 
 		trace_rcu_dyntick("Error on entry: not idle task", oldval, 0);
-		ftrace_dump(DUMP_ALL);
+		ftrace_dump(DUMP_ORIG);
 		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
 			  current->pid, current->comm,
 			  idle->pid, idle->comm); /* must be idle task! */
@@ -459,7 +459,7 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
 
 		trace_rcu_dyntick("Error on exit: not idle task",
 				  oldval, rdtp->dynticks_nesting);
-		ftrace_dump(DUMP_ALL);
+		ftrace_dump(DUMP_ORIG);
 		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
 			  current->pid, current->comm,
 			  idle->pid, idle->comm); /* must be idle task! */
@@ -1829,6 +1829,8 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 	rdp->qlen++;
 	if (lazy)
 		rdp->qlen_lazy++;
+	else
+		rcu_idle_count_callbacks_posted();
 
 	if (__is_kfree_rcu_offset((unsigned long)func))
 		trace_rcu_kfree_callback(rsp->name, head, (unsigned long)func,
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index cdd1be0..7e914db 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -88,6 +88,20 @@ struct rcu_dynticks {
 				    /* Process level is worth LLONG_MAX/2. */
 	int dynticks_nmi_nesting;   /* Track NMI nesting level. */
 	atomic_t dynticks;	    /* Even value for idle, else odd. */
+#ifdef CONFIG_RCU_FAST_NO_HZ
+	int dyntick_drain;	    /* Prepare-for-idle state variable. */
+	unsigned long dyntick_holdoff;
+				    /* No retries for the jiffy of failure. */
+	struct timer_list idle_gp_timer;
+				    /* Wake up CPU sleeping with callbacks. */
+	unsigned long idle_gp_timer_expires;
+				    /* When to wake up CPU (for repost). */
+	bool idle_first_pass;	    /* First pass of attempt to go idle? */
+	unsigned long nonlazy_posted;
+				    /* # times non-lazy CBs posted to CPU. */
+	unsigned long nonlazy_posted_snap;
+				    /* idle-period nonlazy_posted snapshot. */
+#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 };
 
 /* RCU's kthread states for tracing. */
@@ -471,6 +485,7 @@ static void __cpuinit rcu_prepare_kthreads(int cpu);
 static void rcu_prepare_for_idle_init(int cpu);
 static void rcu_cleanup_after_idle(int cpu);
 static void rcu_prepare_for_idle(int cpu);
+static void rcu_idle_count_callbacks_posted(void);
 static void print_cpu_stall_info_begin(void);
 static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c023464..7eef245 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1910,8 +1910,9 @@ static void __cpuinit rcu_prepare_kthreads(int cpu)
  * Because we do not have RCU_FAST_NO_HZ, just check whether this CPU needs
  * any flavor of RCU.
  */
-int rcu_needs_cpu(int cpu)
+int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
 {
+	*delta_jiffies = ULONG_MAX;
 	return rcu_cpu_has_callbacks(cpu);
 }
 
@@ -1938,6 +1939,14 @@ static void rcu_prepare_for_idle(int cpu)
 {
 }
 
+/*
+ * Don't bother keeping a running count of the number of RCU callbacks
+ * posted because CONFIG_RCU_FAST_NO_HZ=n.
+ */
+static void rcu_idle_count_callbacks_posted(void)
+{
+}
+
 #else /* #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 /*
@@ -1978,30 +1987,6 @@ static void rcu_prepare_for_idle(int cpu)
 #define RCU_IDLE_GP_DELAY 6		/* Roughly one grace period. */
 #define RCU_IDLE_LAZY_GP_DELAY (6 * HZ)	/* Roughly six seconds. */
 
-static DEFINE_PER_CPU(int, rcu_dyntick_drain);
-static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);
-static DEFINE_PER_CPU(struct hrtimer, rcu_idle_gp_timer);
-static ktime_t rcu_idle_gp_wait;	/* If some non-lazy callbacks. */
-static ktime_t rcu_idle_lazy_gp_wait;	/* If only lazy callbacks. */
-
-/*
- * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
- * callbacks on this CPU, (2) this CPU has not yet attempted to enter
- * dyntick-idle mode, or (3) this CPU is in the process of attempting to
- * enter dyntick-idle mode.  Otherwise, if we have recently tried and failed
- * to enter dyntick-idle mode, we refuse to try to enter it.  After all,
- * it is better to incur scheduling-clock interrupts than to spin
- * continuously for the same time duration!
- */
-int rcu_needs_cpu(int cpu)
-{
-	/* If no callbacks, RCU doesn't need the CPU. */
-	if (!rcu_cpu_has_callbacks(cpu))
-		return 0;
-	/* Otherwise, RCU needs the CPU only if it recently tried and failed. */
-	return per_cpu(rcu_dyntick_holdoff, cpu) == jiffies;
-}
-
 /*
  * Does the specified flavor of RCU have non-lazy callbacks pending on
  * the specified CPU?  Both RCU flavor and CPU are specified by the
@@ -2045,16 +2030,75 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
 }
 
 /*
+ * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
+ * callbacks on this CPU, (2) this CPU has not yet attempted to enter
+ * dyntick-idle mode, or (3) this CPU is in the process of attempting to
+ * enter dyntick-idle mode.  Otherwise, if we have recently tried and failed
+ * to enter dyntick-idle mode, we refuse to try to enter it.  After all,
+ * it is better to incur scheduling-clock interrupts than to spin
+ * continuously for the same time duration!
+ *
+ * The delta_jiffies argument is used to store the time when RCU is
+ * going to need the CPU again if it still has callbacks.  The reason
+ * for this is that rcu_prepare_for_idle() might need to post a timer,
+ * but if so, it will do so after tick_nohz_stop_sched_tick() has set
+ * the wakeup time for this CPU.  This means that RCU's timer can be
+ * delayed until the wakeup time, which defeats the purpose of posting
+ * a timer.
+ */
+int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
+{
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+	/* Flag a new idle sojourn to the idle-entry state machine. */
+	rdtp->idle_first_pass = 1;
+	/* If no callbacks, RCU doesn't need the CPU. */
+	if (!rcu_cpu_has_callbacks(cpu)) {
+		*delta_jiffies = ULONG_MAX;
+		return 0;
+	}
+	if (rdtp->dyntick_holdoff == jiffies) {
+		/* RCU recently tried and failed, so don't try again. */
+		*delta_jiffies = 1;
+		return 1;
+	}
+	/* Set up for the possibility that RCU will post a timer. */
+	if (rcu_cpu_has_nonlazy_callbacks(cpu))
+		*delta_jiffies = RCU_IDLE_GP_DELAY;
+	else
+		*delta_jiffies = RCU_IDLE_LAZY_GP_DELAY;
+	return 0;
+}
+
+/*
+ * Handler for smp_call_function_single().  The only point of this
+ * handler is to wake the CPU up, so the handler does only tracing.
+ */
+void rcu_idle_demigrate(void *unused)
+{
+	trace_rcu_prep_idle("Demigrate");
+}
+
+/*
  * Timer handler used to force CPU to start pushing its remaining RCU
  * callbacks in the case where it entered dyntick-idle mode with callbacks
  * pending.  The handler doesn't really need to do anything because the
  * real work is done upon re-entry to idle, or by the next scheduling-clock
  * interrupt should idle not be re-entered.
+ *
+ * One special case: the timer gets migrated without awakening the CPU
+ * on which the timer was scheduled.  In this case, we must wake up
+ * that CPU.  We do so with smp_call_function_single().
  */
-static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
+static void rcu_idle_gp_timer_func(unsigned long cpu_in)
 {
+	int cpu = (int)cpu_in;
+
 	trace_rcu_prep_idle("Timer");
-	return HRTIMER_NORESTART;
+	if (cpu != smp_processor_id())
+		smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
+	else
+		WARN_ON_ONCE(1); /* Getting here can hang the system... */
 }
 
 /*
@@ -2062,29 +2106,25 @@ static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
  */
 static void rcu_prepare_for_idle_init(int cpu)
 {
-	static int firsttime = 1;
-	struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
-	hrtimer_init(hrtp, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hrtp->function = rcu_idle_gp_timer_func;
-	if (firsttime) {
-		unsigned int upj = jiffies_to_usecs(RCU_IDLE_GP_DELAY);
-
-		rcu_idle_gp_wait = ns_to_ktime(upj * (u64)1000);
-		upj = jiffies_to_usecs(RCU_IDLE_LAZY_GP_DELAY);
-		rcu_idle_lazy_gp_wait = ns_to_ktime(upj * (u64)1000);
-		firsttime = 0;
-	}
+	rdtp->dyntick_holdoff = jiffies - 1;
+	setup_timer(&rdtp->idle_gp_timer, rcu_idle_gp_timer_func, cpu);
+	rdtp->idle_gp_timer_expires = jiffies - 1;
+	rdtp->idle_first_pass = 1;
 }
 
 /*
  * Clean up for exit from idle.  Because we are exiting from idle, there
- * is no longer any point to rcu_idle_gp_timer, so cancel it.  This will
+ * is no longer any point to ->idle_gp_timer, so cancel it.  This will
  * do nothing if this timer is not active, so just cancel it unconditionally.
  */
 static void rcu_cleanup_after_idle(int cpu)
 {
-	hrtimer_cancel(&per_cpu(rcu_idle_gp_timer, cpu));
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+	del_timer(&rdtp->idle_gp_timer);
+	trace_rcu_prep_idle("Cleanup after idle");
 }
 
 /*
@@ -2102,19 +2142,41 @@ static void rcu_cleanup_after_idle(int cpu)
  * Because it is not legal to invoke rcu_process_callbacks() with irqs
  * disabled, we do one pass of force_quiescent_state(), then do a
  * invoke_rcu_core() to cause rcu_process_callbacks() to be invoked
- * later.  The per-cpu rcu_dyntick_drain variable controls the sequencing.
+ * later.  The ->dyntick_drain field controls the sequencing.
  *
  * The caller must have disabled interrupts.
  */
 static void rcu_prepare_for_idle(int cpu)
 {
+	struct timer_list *tp;
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+	/*
+	 * If this is an idle re-entry, for example, due to use of
+	 * RCU_NONIDLE() or the new idle-loop tracing API within the idle
+	 * loop, then don't take any state-machine actions, unless the
+	 * momentary exit from idle queued additional non-lazy callbacks.
+	 * Instead, repost the ->idle_gp_timer if this CPU has callbacks
+	 * pending.
+	 */
+	if (!rdtp->idle_first_pass &&
+	    (rdtp->nonlazy_posted == rdtp->nonlazy_posted_snap)) {
+		if (rcu_cpu_has_callbacks(cpu)) {
+			tp = &rdtp->idle_gp_timer;
+			mod_timer_pinned(tp, rdtp->idle_gp_timer_expires);
+		}
+		return;
+	}
+	rdtp->idle_first_pass = 0;
+	rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted - 1;
+
 	/*
 	 * If there are no callbacks on this CPU, enter dyntick-idle mode.
 	 * Also reset state to avoid prejudicing later attempts.
 	 */
 	if (!rcu_cpu_has_callbacks(cpu)) {
-		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;
-		per_cpu(rcu_dyntick_drain, cpu) = 0;
+		rdtp->dyntick_holdoff = jiffies - 1;
+		rdtp->dyntick_drain = 0;
 		trace_rcu_prep_idle("No callbacks");
 		return;
 	}
@@ -2123,32 +2185,37 @@ static void rcu_prepare_for_idle(int cpu)
 	 * If in holdoff mode, just return.  We will presumably have
 	 * refrained from disabling the scheduling-clock tick.
 	 */
-	if (per_cpu(rcu_dyntick_holdoff, cpu) == jiffies) {
+	if (rdtp->dyntick_holdoff == jiffies) {
 		trace_rcu_prep_idle("In holdoff");
 		return;
 	}
 
-	/* Check and update the rcu_dyntick_drain sequencing. */
-	if (per_cpu(rcu_dyntick_drain, cpu) <= 0) {
+	/* Check and update the ->dyntick_drain sequencing. */
+	if (rdtp->dyntick_drain <= 0) {
 		/* First time through, initialize the counter. */
-		per_cpu(rcu_dyntick_drain, cpu) = RCU_IDLE_FLUSHES;
-	} else if (per_cpu(rcu_dyntick_drain, cpu) <= RCU_IDLE_OPT_FLUSHES &&
+		rdtp->dyntick_drain = RCU_IDLE_FLUSHES;
+	} else if (rdtp->dyntick_drain <= RCU_IDLE_OPT_FLUSHES &&
 		   !rcu_pending(cpu) &&
 		   !local_softirq_pending()) {
 		/* Can we go dyntick-idle despite still having callbacks? */
-		trace_rcu_prep_idle("Dyntick with callbacks");
-		per_cpu(rcu_dyntick_drain, cpu) = 0;
-		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
-		if (rcu_cpu_has_nonlazy_callbacks(cpu))
-			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
-				      rcu_idle_gp_wait, HRTIMER_MODE_REL);
-		else
-			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
-				      rcu_idle_lazy_gp_wait, HRTIMER_MODE_REL);
+		rdtp->dyntick_drain = 0;
+		rdtp->dyntick_holdoff = jiffies;
+		if (rcu_cpu_has_nonlazy_callbacks(cpu)) {
+			trace_rcu_prep_idle("Dyntick with callbacks");
+			rdtp->idle_gp_timer_expires =
+					   jiffies + RCU_IDLE_GP_DELAY;
+		} else {
+			rdtp->idle_gp_timer_expires =
+					   jiffies + RCU_IDLE_LAZY_GP_DELAY;
+			trace_rcu_prep_idle("Dyntick with lazy callbacks");
+		}
+		tp = &rdtp->idle_gp_timer;
+		mod_timer_pinned(tp, rdtp->idle_gp_timer_expires);
+		rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted;
 		return; /* Nothing more to do immediately. */
-	} else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {
+	} else if (--(rdtp->dyntick_drain) <= 0) {
 		/* We have hit the limit, so time to give up. */
-		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
+		rdtp->dyntick_holdoff = jiffies;
 		trace_rcu_prep_idle("Begin holdoff");
 		invoke_rcu_core();  /* Force the CPU out of dyntick-idle. */
 		return;
@@ -2184,6 +2251,19 @@ static void rcu_prepare_for_idle(int cpu)
 		trace_rcu_prep_idle("Callbacks drained");
 }
 
+/*
+ * Keep a running count of the number of non-lazy callbacks posted
+ * on this CPU.  This running counter (which is never decremented) allows
+ * rcu_prepare_for_idle() to detect when something out of the idle loop
+ * posts a callback, even if an equal number of callbacks are invoked.
+ * Of course, callbacks should only be posted from within a trace event
+ * designed to be called from idle or from within RCU_NONIDLE().
+ */
+static void rcu_idle_count_callbacks_posted(void)
+{
+	__this_cpu_add(rcu_dynticks.nonlazy_posted, 1);
+}
+
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 #ifdef CONFIG_RCU_CPU_STALL_INFO
@@ -2192,14 +2272,13 @@ static void rcu_prepare_for_idle(int cpu)
 
 static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
 {
-	struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+	struct timer_list *tltp = &rdtp->idle_gp_timer;
 
-	sprintf(cp, "drain=%d %c timer=%lld",
-		per_cpu(rcu_dyntick_drain, cpu),
-		per_cpu(rcu_dyntick_holdoff, cpu) == jiffies ? 'H' : '.',
-		hrtimer_active(hrtp)
-			? ktime_to_us(hrtimer_get_remaining(hrtp))
-			: -1);
+	sprintf(cp, "drain=%d %c timer=%lu",
+		rdtp->dyntick_drain,
+		rdtp->dyntick_holdoff == jiffies ? 'H' : '.',
+		timer_pending(tltp) ? tltp->expires - jiffies : -1);
 }
 
 #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6a3a5b9..52f5ebb 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -274,6 +274,7 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 {
 	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies;
+	unsigned long rcu_delta_jiffies;
 	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
@@ -322,7 +323,7 @@ static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 		time_delta = timekeeping_max_deferment();
 	} while (read_seqretry(&xtime_lock, seq));
 
-	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
+	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
 	    arch_needs_cpu(cpu)) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
@@ -330,6 +331,10 @@ static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 		/* Get the next timer wheel timer */
 		next_jiffies = get_next_timer_interrupt(last_jiffies);
 		delta_jiffies = next_jiffies - last_jiffies;
+		if (rcu_delta_jiffies < delta_jiffies) {
+			next_jiffies = last_jiffies + rcu_delta_jiffies;
+			delta_jiffies = rcu_delta_jiffies;
+		}
 	}
 	/*
 	 * Do not stop the tick, if we are only one off

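The tick-sched.c hunk above is the consumer side of the new rcu_needs_cpu()
signature: before stopping the tick, the idle loop shortens its sleep to
whichever comes first, the next timer-wheel event or the jiffy at which RCU
said it will need the CPU again.  As a rough user-space sketch of just that
clamping arithmetic (plain C with an invented function name, not kernel
code):

```c
#include <assert.h>
#include <limits.h>

/*
 * Sketch of the wakeup clamp in tick_nohz_stop_sched_tick(): sleep
 * until the earlier of the next timer-wheel event and the jiffy at
 * which rcu_needs_cpu() said RCU will need the CPU.  ULONG_MAX from
 * rcu_needs_cpu() means "RCU imposes no deadline".  Unsigned
 * subtraction keeps the math correct across jiffies wraparound.
 */
static unsigned long next_wakeup(unsigned long last_jiffies,
                                 unsigned long next_timer_jiffies,
                                 unsigned long rcu_delta_jiffies)
{
	unsigned long delta_jiffies = next_timer_jiffies - last_jiffies;

	if (rcu_delta_jiffies < delta_jiffies)
		delta_jiffies = rcu_delta_jiffies;
	return last_jiffies + delta_jiffies;
}
```

With non-lazy callbacks pending, rcu_needs_cpu() reports RCU_IDLE_GP_DELAY
(6 jiffies), so a CPU whose next timer-wheel event is far in the future is
still awakened roughly one grace period later.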

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-05-04 23:14               ` Paul E. McKenney
@ 2012-05-10  8:40                 ` Pascal Chapperon
  2012-05-14 22:32                 ` Paul E. McKenney
  1 sibling, 0 replies; 30+ messages in thread
From: Pascal Chapperon @ 2012-05-10  8:40 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 05/05/2012 01:14, Paul E. McKenney wrote:
> On Fri, May 04, 2012 at 11:41:13PM +0200, Pascal Chapperon wrote:
>> On 04/05/2012 17:04, Paul E. McKenney wrote:
>>> On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
>>>> On 01/05/2012 17:45, Paul E. McKenney wrote:
>>>>
>>>>> Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
>>>>>
>>>>> Or you can pull branch fnh.2012.05.01a from:
>>>>>
>>>>> 	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>>>>>
>>>>> 							Thanx, Paul
>>>>>
>>>> I applied your global patch on top of v3.4-rc4. But the slowdown is
>>>> worse than before: the boot sequence took 80s instead of 20-30s (12s
>>>> for initramfs instead of 2s).
>>>>
>>>> I'll send you the rcu tracing log in a second mail.
>>>
>>> Hmmm...  Well, I guess I am glad that I finally did something that
>>> had an effect, but I sure wish that the effect had been in the other
>>> direction!
>>>
>>> Just to make sure I understand: the difference between the 20-30s and
>>> the 80s is exactly the patch I sent you?
>>>
>>> 							Thanx, Paul
>>>
>>>
>> Yes. Exactly the same kernel config as in the previous results; I
>> applied your patch against v3.4-rc4, and sorry, the result is exactly
>> what I said.
>> I saw that your global patch was quite large and addresses things that
>> are not directly related to the initial patch (commit
>> 7cb92499000e3c86dae653077b1465458a039ef6); maybe a side effect?
>>
>> However, I'm ready to try this patch on my smaller laptop, which
>> handles CONFIG_RCU_FAST_NO_HZ=y and systemd well, if you think it can
>> help?
>>
>> Another thought: could this issue have something to do with the i7's
>> Hyper-Threading capabilities? (I tested a Core 2 Duo and a Pentium ULV
>> under the same conditions and didn't encounter any slowdown.)
>
> Well, one possibility is that your setup starts the jiffies counter
> at some interesting value.  The attached patch (also against v3.4-rc4)
> applies a bit more paranoia to the initialization to handle this
> and other possibilities.
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
I tried your new patch against v3.4-rc5 and saw no improvement:
still 75s for the boot sequence.

I'll send you the logs in a second mail.

Pascal



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-05-04 21:41             ` Pascal Chapperon
@ 2012-05-04 23:14               ` Paul E. McKenney
  2012-05-10  8:40                 ` Pascal Chapperon
  2012-05-14 22:32                 ` Paul E. McKenney
  0 siblings, 2 replies; 30+ messages in thread
From: Paul E. McKenney @ 2012-05-04 23:14 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Fri, May 04, 2012 at 11:41:13PM +0200, Pascal Chapperon wrote:
> On 04/05/2012 17:04, Paul E. McKenney wrote:
> >On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
> >>On 01/05/2012 17:45, Paul E. McKenney wrote:
> >>
> >>>Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
> >>>
> >>>Or you can pull branch fnh.2012.05.01a from:
> >>>
> >>>	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> >>>
> >>>							Thanx, Paul
> >>>
> >>I applied your global patch on top of v3.4-rc4. But the slowdown is
> >>worse than before: the boot sequence took 80s instead of 20-30s (12s
> >>for initramfs instead of 2s).
> >>
> >>I'll send you the rcu tracing log in a second mail.
> >
> >Hmmm...  Well, I guess I am glad that I finally did something that
> >had an effect, but I sure wish that the effect had been in the other
> >direction!
> >
> >Just to make sure I understand: the difference between the 20-30s and
> >the 80s is exactly the patch I sent you?
> >
> >							Thanx, Paul
> >
> >
> Yes. Exactly the same kernel config as in the previous results; I
> applied your patch against v3.4-rc4, and sorry, the result is exactly
> what I said.
> I saw that your global patch was quite large and addresses things that
> are not directly related to the initial patch (commit
> 7cb92499000e3c86dae653077b1465458a039ef6); maybe a side effect?
>
> However, I'm ready to try this patch on my smaller laptop, which
> handles CONFIG_RCU_FAST_NO_HZ=y and systemd well, if you think it can
> help?
>
> Another thought: could this issue have something to do with the i7's
> Hyper-Threading capabilities? (I tested a Core 2 Duo and a Pentium ULV
> under the same conditions and didn't encounter any slowdown.)

Well, one possibility is that your setup starts the jiffies counter
at some interesting value.  The attached patch (also against v3.4-rc4)
applies a bit more paranoia to the initialization to handle this
and other possibilities.
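The extra paranoia is mostly in rcu_prepare_for_idle_init() in the patch
below, which now seeds every jiffies-based field with jiffies - 1, so that
equality tests such as rcu_dyntick_holdoff == jiffies start out false no
matter what value the jiffies counter happens to boot with.  A minimal
user-space illustration of that property (invented helper names, not
kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Seed a holdoff field so it cannot accidentally equal the current jiffy. */
static unsigned long holdoff_init(unsigned long jiffies)
{
	return jiffies - 1;	/* unsigned wraparound is well defined */
}

/* Holdoff test as in rcu_needs_cpu(): held off only during this jiffy. */
static bool in_holdoff(unsigned long holdoff, unsigned long jiffies)
{
	return holdoff == jiffies;
}
```

Because the subtraction wraps, the initialization is safe even when the
counter starts near the top of its range, as Linux deliberately arranges.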

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 3370997..1480900 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -292,6 +292,8 @@ TRACE_EVENT(rcu_dyntick,
  *	"More callbacks": Still more callbacks, try again to clear them out.
  *	"Callbacks drained": All callbacks processed, off to dyntick idle!
  *	"Timer": Timer fired to cause CPU to continue processing callbacks.
+ *	"Demigrate": Timer fired on wrong CPU, woke up correct CPU.
+ *	"Cleanup after idle": Idle exited, timer canceled.
  */
 TRACE_EVENT(rcu_prep_idle,
 
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 1050d6d..403306b 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1829,6 +1829,8 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 	rdp->qlen++;
 	if (lazy)
 		rdp->qlen_lazy++;
+	else
+		rcu_idle_count_callbacks_posted();
 
 	if (__is_kfree_rcu_offset((unsigned long)func))
 		trace_rcu_kfree_callback(rsp->name, head, (unsigned long)func,
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index cdd1be0..36ca28e 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -471,6 +471,7 @@ static void __cpuinit rcu_prepare_kthreads(int cpu);
 static void rcu_prepare_for_idle_init(int cpu);
 static void rcu_cleanup_after_idle(int cpu);
 static void rcu_prepare_for_idle(int cpu);
+static void rcu_idle_count_callbacks_posted(void);
 static void print_cpu_stall_info_begin(void);
 static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c023464..ccbdc72 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1938,6 +1938,14 @@ static void rcu_prepare_for_idle(int cpu)
 {
 }
 
+/*
+ * Don't bother keeping a running count of the number of RCU callbacks
+ * posted because CONFIG_RCU_FAST_NO_HZ=n.
+ */
+static void rcu_idle_count_callbacks_posted(void)
+{
+}
+
 #else /* #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 /*
@@ -1978,11 +1986,20 @@ static void rcu_prepare_for_idle(int cpu)
 #define RCU_IDLE_GP_DELAY 6		/* Roughly one grace period. */
 #define RCU_IDLE_LAZY_GP_DELAY (6 * HZ)	/* Roughly six seconds. */
 
+/* Loop counter for rcu_prepare_for_idle(). */
 static DEFINE_PER_CPU(int, rcu_dyntick_drain);
+/* If rcu_dyntick_holdoff==jiffies, don't try to enter dyntick-idle mode. */
 static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);
-static DEFINE_PER_CPU(struct hrtimer, rcu_idle_gp_timer);
-static ktime_t rcu_idle_gp_wait;	/* If some non-lazy callbacks. */
-static ktime_t rcu_idle_lazy_gp_wait;	/* If only lazy callbacks. */
+/* Timer to awaken the CPU if it enters dyntick-idle mode with callbacks. */
+static DEFINE_PER_CPU(struct timer_list, rcu_idle_gp_timer);
+/* Scheduled expiry time for rcu_idle_gp_timer to allow reposting. */
+static DEFINE_PER_CPU(unsigned long, rcu_idle_gp_timer_expires);
+/* Enable special processing on first attempt to enter dyntick-idle mode. */
+static DEFINE_PER_CPU(bool, rcu_idle_first_pass);
+/* Running count of non-lazy callbacks posted, never decremented. */
+static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted);
+/* Snapshot of rcu_nonlazy_posted to detect meaningful exits from idle. */
+static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted_snap);
 
 /*
  * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
@@ -1995,6 +2012,8 @@ static ktime_t rcu_idle_lazy_gp_wait;	/* If only lazy callbacks. */
  */
 int rcu_needs_cpu(int cpu)
 {
+	/* Flag a new idle sojourn to the idle-entry state machine. */
+	per_cpu(rcu_idle_first_pass, cpu) = 1;
 	/* If no callbacks, RCU doesn't need the CPU. */
 	if (!rcu_cpu_has_callbacks(cpu))
 		return 0;
@@ -2045,16 +2064,39 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
 }
 
 /*
+ * Handler for smp_call_function_single().  The only point of this
+ * handler is to wake the CPU up, so the handler does only tracing.
+ */
+void rcu_idle_demigrate(void *unused)
+{
+	trace_rcu_prep_idle("Demigrate");
+}
+
+/*
  * Timer handler used to force CPU to start pushing its remaining RCU
  * callbacks in the case where it entered dyntick-idle mode with callbacks
  * pending.  The handler doesn't really need to do anything because the
  * real work is done upon re-entry to idle, or by the next scheduling-clock
  * interrupt should idle not be re-entered.
+ *
+ * One special case: the timer gets migrated without awakening the CPU
+ * on which the timer was scheduled.  In this case, we must wake up
+ * that CPU.  We do so with smp_call_function_single().
  */
-static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
+static void rcu_idle_gp_timer_func(unsigned long cpu_in)
 {
+	int cpu = (int)cpu_in;
+
 	trace_rcu_prep_idle("Timer");
-	return HRTIMER_NORESTART;
+	if (cpu == smp_processor_id()) {
+		WARN_ON_ONCE(1); /* Getting here can hang the system... */
+	} else {
+		preempt_disable();
+		if (cpu_online(cpu))
+			smp_call_function_single(cpu, rcu_idle_demigrate,
+						 NULL, 0);
+		preempt_enable();
+	}
 }
 
 /*
@@ -2062,19 +2104,11 @@ static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
  */
 static void rcu_prepare_for_idle_init(int cpu)
 {
-	static int firsttime = 1;
-	struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
-
-	hrtimer_init(hrtp, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hrtp->function = rcu_idle_gp_timer_func;
-	if (firsttime) {
-		unsigned int upj = jiffies_to_usecs(RCU_IDLE_GP_DELAY);
-
-		rcu_idle_gp_wait = ns_to_ktime(upj * (u64)1000);
-		upj = jiffies_to_usecs(RCU_IDLE_LAZY_GP_DELAY);
-		rcu_idle_lazy_gp_wait = ns_to_ktime(upj * (u64)1000);
-		firsttime = 0;
-	}
+	per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;
+	setup_timer(&per_cpu(rcu_idle_gp_timer, cpu),
+		    rcu_idle_gp_timer_func, cpu);
+	per_cpu(rcu_idle_gp_timer_expires, cpu) = jiffies - 1;
+	per_cpu(rcu_idle_first_pass, cpu) = 1;
 }
 
 /*
@@ -2084,7 +2118,8 @@ static void rcu_prepare_for_idle_init(int cpu)
  */
 static void rcu_cleanup_after_idle(int cpu)
 {
-	hrtimer_cancel(&per_cpu(rcu_idle_gp_timer, cpu));
+	del_timer(&per_cpu(rcu_idle_gp_timer, cpu));
+	trace_rcu_prep_idle("Cleanup after idle");
 }
 
 /*
@@ -2108,6 +2143,29 @@ static void rcu_cleanup_after_idle(int cpu)
  */
 static void rcu_prepare_for_idle(int cpu)
 {
+	struct timer_list *tp;
+
+	/*
+	 * If this is an idle re-entry, for example, due to use of
+	 * RCU_NONIDLE() or the new idle-loop tracing API within the idle
+	 * loop, then don't take any state-machine actions, unless the
+	 * momentary exit from idle queued additional non-lazy callbacks.
+	 * Instead, repost the rcu_idle_gp_timer if this CPU has callbacks
+	 * pending.
+	 */
+	if (!per_cpu(rcu_idle_first_pass, cpu) &&
+	    (per_cpu(rcu_nonlazy_posted, cpu) ==
+	     per_cpu(rcu_nonlazy_posted_snap, cpu))) {
+		if (rcu_cpu_has_callbacks(cpu)) {
+			tp = &per_cpu(rcu_idle_gp_timer, cpu);
+			mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
+		}
+		return;
+	}
+	per_cpu(rcu_idle_first_pass, cpu) = 0;
+	per_cpu(rcu_nonlazy_posted_snap, cpu) =
+		per_cpu(rcu_nonlazy_posted, cpu) - 1;
+
 	/*
 	 * If there are no callbacks on this CPU, enter dyntick-idle mode.
 	 * Also reset state to avoid prejudicing later attempts.
@@ -2140,11 +2198,15 @@ static void rcu_prepare_for_idle(int cpu)
 		per_cpu(rcu_dyntick_drain, cpu) = 0;
 		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
 		if (rcu_cpu_has_nonlazy_callbacks(cpu))
-			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
-				      rcu_idle_gp_wait, HRTIMER_MODE_REL);
+			per_cpu(rcu_idle_gp_timer_expires, cpu) =
+					   jiffies + RCU_IDLE_GP_DELAY;
 		else
-			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
-				      rcu_idle_lazy_gp_wait, HRTIMER_MODE_REL);
+			per_cpu(rcu_idle_gp_timer_expires, cpu) =
+					   jiffies + RCU_IDLE_LAZY_GP_DELAY;
+		tp = &per_cpu(rcu_idle_gp_timer, cpu);
+		mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
+		per_cpu(rcu_nonlazy_posted_snap, cpu) =
+			per_cpu(rcu_nonlazy_posted, cpu);
 		return; /* Nothing more to do immediately. */
 	} else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {
 		/* We have hit the limit, so time to give up. */
@@ -2184,6 +2246,19 @@ static void rcu_prepare_for_idle(int cpu)
 		trace_rcu_prep_idle("Callbacks drained");
 }
 
+/*
+ * Keep a running count of the number of non-lazy callbacks posted
+ * on this CPU.  This running counter (which is never decremented) allows
+ * rcu_prepare_for_idle() to detect when something out of the idle loop
+ * posts a callback, even if an equal number of callbacks are invoked.
+ * Of course, callbacks should only be posted from within a trace event
+ * designed to be called from idle or from within RCU_NONIDLE().
+ */
+static void rcu_idle_count_callbacks_posted(void)
+{
+	__this_cpu_add(rcu_nonlazy_posted, 1);
+}
+
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 #ifdef CONFIG_RCU_CPU_STALL_INFO
@@ -2192,14 +2267,12 @@ static void rcu_prepare_for_idle(int cpu)
 
 static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
 {
-	struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
+	struct timer_list *tltp = &per_cpu(rcu_idle_gp_timer, cpu);
 
-	sprintf(cp, "drain=%d %c timer=%lld",
+	sprintf(cp, "drain=%d %c timer=%lu",
 		per_cpu(rcu_dyntick_drain, cpu),
 		per_cpu(rcu_dyntick_holdoff, cpu) == jiffies ? 'H' : '.',
-		hrtimer_active(hrtp)
-			? ktime_to_us(hrtimer_get_remaining(hrtp))
-			: -1);
+		timer_pending(tltp) ? tltp->expires - jiffies : -1);
 }
 
 #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */

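The rcu_nonlazy_posted counter in the patch above is monotonic on purpose:
comparing it against a snapshot taken at idle entry tells
rcu_prepare_for_idle() that new non-lazy callbacks arrived during a
momentary exit from idle, even if just as many callbacks were invoked in
the meantime, so a queue-length check would see no change.  A user-space
sketch of the idea, with invented names (the kernel keeps this state
per CPU via __this_cpu_add(); the sketch collapses it into one struct):

```c
#include <assert.h>
#include <stdbool.h>

struct idle_state {
	unsigned long nonlazy_posted;		/* bumped per non-lazy callback */
	unsigned long nonlazy_posted_snap;	/* captured at last idle entry */
};

/* Called for every non-lazy callback posted; never decremented. */
static void count_callback(struct idle_state *s)
{
	s->nonlazy_posted++;
}

/* Capture the counter when the idle state machine runs. */
static void snapshot(struct idle_state *s)
{
	s->nonlazy_posted_snap = s->nonlazy_posted;
}

/* Did anything get posted since the snapshot? */
static bool posted_since_snapshot(const struct idle_state *s)
{
	return s->nonlazy_posted != s->nonlazy_posted_snap;
}
```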

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-05-04 15:04           ` Paul E. McKenney
@ 2012-05-04 21:41             ` Pascal Chapperon
  2012-05-04 23:14               ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-05-04 21:41 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 04/05/2012 17:04, Paul E. McKenney wrote:
> On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
>> On 01/05/2012 17:45, Paul E. McKenney wrote:
>>
>>> Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
>>>
>>> Or you can pull branch fnh.2012.05.01a from:
>>>
>>> 	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>>>
>>> 							Thanx, Paul
>>>
>> I applied your global patch on top of v3.4-rc4. But the slowdown is
>> worse than before: the boot sequence took 80s instead of 20-30s (12s
>> for initramfs instead of 2s).
>>
>> I'll send you the rcu tracing log in a second mail.
>
> Hmmm...  Well, I guess I am glad that I finally did something that
> had an effect, but I sure wish that the effect had been in the other
> direction!
>
> Just to make sure I understand: the difference between the 20-30s and
> the 80s is exactly the patch I sent you?
>
> 							Thanx, Paul
>
>
Yes. Exactly the same kernel config as in the previous results; I applied
your patch against v3.4-rc4, and sorry, the result is exactly what I said.
I saw that your global patch was quite large and addresses things that
are not directly related to the initial patch (commit
7cb92499000e3c86dae653077b1465458a039ef6); maybe a side effect?

However, I'm ready to try this patch on my smaller laptop, which
handles CONFIG_RCU_FAST_NO_HZ=y and systemd well, if you think it can
help?

Another thought: could this issue have something to do with the i7's
Hyper-Threading capabilities? (I tested a Core 2 Duo and a Pentium ULV
under the same conditions and didn't encounter any slowdown.)

Pascal


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-05-04 14:42         ` Pascal Chapperon
@ 2012-05-04 15:04           ` Paul E. McKenney
  2012-05-04 21:41             ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-05-04 15:04 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Fri, May 04, 2012 at 04:42:54PM +0200, Pascal Chapperon wrote:
> On 01/05/2012 17:45, Paul E. McKenney wrote:
> 
> >Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
> >
> >Or you can pull branch fnh.2012.05.01a from:
> >
> >	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> >
> >							Thanx, Paul
> >
> I applied your global patch on top of v3.4-rc4. But the slowdown is
> worse than before: the boot sequence took 80s instead of 20-30s (12s
> for initramfs instead of 2s).
>
> I'll send you the rcu tracing log in a second mail.

Hmmm...  Well, I guess I am glad that I finally did something that
had an effect, but I sure wish that the effect had been in the other
direction!

Just to make sure I understand: the difference between the 20-30s and
the 80s is exactly the patch I sent you?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-05-01 15:45       ` Paul E. McKenney
@ 2012-05-04 14:42         ` Pascal Chapperon
  2012-05-04 15:04           ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-05-04 14:42 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 01/05/2012 17:45, Paul E. McKenney wrote:

> Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.
>
> Or you can pull branch fnh.2012.05.01a from:
>
> 	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>
> 							Thanx, Paul
>
I applied your global patch on top of v3.4-rc4. But the slowdown is
worse than before: the boot sequence took 80s instead of 20-30s (12s
for initramfs instead of 2s).

I'll send you the rcu tracing log in a second mail.

Pascal


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-05-01  8:55     ` Pascal Chapperon
@ 2012-05-01 15:45       ` Paul E. McKenney
  2012-05-04 14:42         ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-05-01 15:45 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Tue, May 01, 2012 at 10:55:40AM +0200, Pascal Chapperon wrote:
> On 01/05/2012 02:02, Paul E. McKenney wrote:
> >On Fri, Apr 27, 2012 at 08:42:58PM -0700, Paul E. McKenney wrote:
> >>On Fri, Apr 27, 2012 at 02:15:20PM +0200, Pascal Chapperon wrote:
> >>>On 18/04/2012 17:23, Paul E. McKenney wrote:
> >>>>On Wed, Apr 18, 2012 at 05:00:14PM +0200, Pascal Chapperon wrote:
> >>>>>On 18/04/2012 16:01, Paul E. McKenney wrote:
> >>>>>>On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
> >>>>>>>Mount and umount operations are not slower with RCU_FAST_NO_HZ during
> >>>>>>>runtime; systemctl start and stop operations are also not slower. In
> >>>>>>>fact, i couldn't find a single operation slower during runtime with
> >>>>>>>RCU_FAST_NO_HZ.
> >>>>>>
> >>>>>>Your boot-time setup is such that all CPUs are online before the
> >>>>>>boot-time mount operations take place, right?
> >>>>>Yes:
> >>>>>[ 0.242697] Brought up 8 CPUs
> >>>>>[ 0.242699] Total of 8 processors activated (35118.33 BogoMIPS).
> >>>>>
> >>>>>>Struggling to understand
> >>>>>>how RCU can tell the difference between post-CPU-bringup boot time
> >>>>>>and run time...
> >>>>>>
> >>>>>systemd controls the whole boot process, including mount operations
> >>>>>(apart from the root filesystem), and as far as I can see uses
> >>>>>sockets heavily to do it (not to mention CPU affinity). It also
> >>>>>controls the major part of the umount operations. Is it possible
> >>>>>that your patch hits a systemd bug?
> >>>>
> >>>>Is it possible that systemd is using network operations that include
> >>>>synchronize_rcu()? Then if you did the same operation from the
> >>>>command line at runtime, you might not see the slowdown.
> >>>>
> >>>>Is it possible for you to convince systemd to collect RCU event tracing
> >>>>during the slow operation? RCU event tracing is available under
> >>>>/sys/kernel/debug/tracing/rcu.
> >>>>
> >>>I have collected the RCU event tracing during a slow boot with
> >>>FAST_NO_HZ (and the same without FAST_NO_HZ, same kernel config).
> >>>The full logs and associated "systemd-analyze plot" can be found
> >>>(in comment 32) at :
> >>>
> >>>https://bugzilla.redhat.com/show_bug.cgi?id=806548
> >>>
> >>>With FAST_NO_HZ, almost each rcu_prep_idle is followed by ksoftirqd
> >>>(75000 ksoftirqd lines with FAST_NO_HZ, 4000 without).
> >>>
> >>>Sorry, the logs are huge, but I can't figure out where the
> >>>interesting parts are.
> >>
> >>Thank you for collecting them!  I clearly will need to do some scripting.  ;-)
> >
> >And it appears that your system is migrating timers without waking up
> >the CPU on which the timer was posted.  This explains the slowdowns:
> >RCU assumes that the timer will either fire on the CPU that it was posted
> >on or that that CPU will be awakened when it goes offline.  If the timer
> >does not fire on that CPU and that CPU is not otherwise awakened, then
> >that CPU's RCU callbacks can be indefinitely postponed, which could account
> >for the slowdowns that you were seeing.
> >
> >Please see below for a lightly tested patch that should address this
> >problem, and thank you again for your patient testing efforts!
> >
> >							Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >rcu: Make RCU_FAST_NO_HZ handle timer migration
> >
> >The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
> >CPU goes offline, in which case it assumes that the CPU will have to come
> >out of dyntick-idle mode (cancelling the timer) in order to go offline.
> >This is important because when RCU_FAST_NO_HZ permits a CPU to enter
> >dyntick-idle mode despite having RCU callbacks pending, it posts a timer
> >on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
> >CPU will eventually handle the end of the grace period, including invoking
> >its RCU callbacks.
> >
> >However, Pascal Chapperon's test setup shows that the timer handler
> >rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
> >problematic because this can cause the CPU that entered dyntick-idle
> >mode despite still having RCU callbacks pending to remain in
> >dyntick-idle mode indefinitely, which means that its RCU callbacks might
> >never be invoked.  This situation can result in grace-period delays or
> >even system hangs, which matches Pascal's observations of slow boot-up
> >and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:
> >
> >https://bugzilla.redhat.com/show_bug.cgi?id=806548
> >
> >This commit therefore causes the "should never be invoked" timer handler
> >rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
> >the CPU for which the timer was intended, allowing that CPU to invoke
> >its RCU callbacks in a timely manner.
> >
> >Reported-by: Pascal Chapperon<pascal.chapperon@wanadoo.fr>
> >Signed-off-by: Paul E. McKenney<paul.mckenney@linaro.org>
> >Signed-off-by: Paul E. McKenney<paulmck@linux.vnet.ibm.com>
> >---
> >
> >  include/trace/events/rcu.h |    1 +
> >  kernel/rcutree_plugin.h    |   23 ++++++++++++++++++++---
> >  2 files changed, 21 insertions(+), 3 deletions(-)
> >
> >diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> >index aaa55e1..1480900 100644
> >--- a/include/trace/events/rcu.h
> >+++ b/include/trace/events/rcu.h
> >@@ -292,6 +292,7 @@ TRACE_EVENT(rcu_dyntick,
> >   *	"More callbacks": Still more callbacks, try again to clear them out.
> >   *	"Callbacks drained": All callbacks processed, off to dyntick idle!
> >   *	"Timer": Timer fired to cause CPU to continue processing callbacks.
> >+ *	"Demigrate": Timer fired on wrong CPU, woke up correct CPU.
> >   *	"Cleanup after idle": Idle exited, timer canceled.
> >   */
> >  TRACE_EVENT(rcu_prep_idle,
> >diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> >index dc12efc..bbd064a 100644
> >--- a/kernel/rcutree_plugin.h
> >+++ b/kernel/rcutree_plugin.h
> >@@ -1994,16 +1994,33 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
> >  }
> >
> >  /*
> >+ * Invoked via smp_call_function_single() to wake the intended CPU; it need only trace the event.
> >+ */
> >+void rcu_idle_demigrate(void *unused)
> >+{
> >+	trace_rcu_prep_idle("Demigrate");
> >+}
> >+
> >+/*
> >   * Timer handler used to force CPU to start pushing its remaining RCU
> >   * callbacks in the case where it entered dyntick-idle mode with callbacks
> >   * pending.  The handler doesn't really need to do anything because the
> >   * real work is done upon re-entry to idle, or by the next scheduling-clock
> >   * interrupt should idle not be re-entered.
> >+ *
> >+ * One special case: the timer gets migrated without awakening the CPU
> >+ * on which the timer was scheduled.  In this case, we must wake up
> >+ * that CPU.  We do so with smp_call_function_single().
> >   */
> >-static void rcu_idle_gp_timer_func(unsigned long unused)
> >+static void rcu_idle_gp_timer_func(unsigned long cpu_in)
> >  {
> >-	WARN_ON_ONCE(1); /* Getting here can hang the system... */
> >+	int cpu = (int)cpu_in;
> >+
> >  	trace_rcu_prep_idle("Timer");
> >+	if (cpu != smp_processor_id())
> >+		smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
> >+	else
> >+		WARN_ON_ONCE(1); /* Getting here can hang the system... */
> >  }
> >
> >  /*
> >@@ -2012,7 +2029,7 @@ static void rcu_idle_gp_timer_func(unsigned long unused)
> >  static void rcu_prepare_for_idle_init(int cpu)
> >  {
> >  	setup_timer(&per_cpu(rcu_idle_gp_timer, cpu),
> >-		    rcu_idle_gp_timer_func, 0);
> >+		    rcu_idle_gp_timer_func, cpu);
> >  }
> >
> >  /*
> >
> >
> Paul, I can't apply your patch on top of the master branch; perhaps I
> need to pull from your git repository?
> Among other things, you have:
> static void rcu_idle_gp_timer_func(unsigned long unused)
> and I have:
> static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)

Here is my RCU_FAST_NO_HZ patch stack on top of v3.4-rc4.

Or you can pull branch fnh.2012.05.01a from:

	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 3370997..1480900 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -292,6 +292,8 @@ TRACE_EVENT(rcu_dyntick,
  *	"More callbacks": Still more callbacks, try again to clear them out.
  *	"Callbacks drained": All callbacks processed, off to dyntick idle!
  *	"Timer": Timer fired to cause CPU to continue processing callbacks.
+ *	"Demigrate": Timer fired on wrong CPU, woke up correct CPU.
+ *	"Cleanup after idle": Idle exited, timer canceled.
  */
 TRACE_EVENT(rcu_prep_idle,
 
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 1050d6d..403306b 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1829,6 +1829,8 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 	rdp->qlen++;
 	if (lazy)
 		rdp->qlen_lazy++;
+	else
+		rcu_idle_count_callbacks_posted();
 
 	if (__is_kfree_rcu_offset((unsigned long)func))
 		trace_rcu_kfree_callback(rsp->name, head, (unsigned long)func,
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index cdd1be0..36ca28e 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -471,6 +471,7 @@ static void __cpuinit rcu_prepare_kthreads(int cpu);
 static void rcu_prepare_for_idle_init(int cpu);
 static void rcu_cleanup_after_idle(int cpu);
 static void rcu_prepare_for_idle(int cpu);
+static void rcu_idle_count_callbacks_posted(void);
 static void print_cpu_stall_info_begin(void);
 static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c023464..cadef05 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1938,6 +1938,14 @@ static void rcu_prepare_for_idle(int cpu)
 {
 }
 
+/*
+ * Don't bother keeping a running count of the number of RCU callbacks
+ * posted because CONFIG_RCU_FAST_NO_HZ=n.
+ */
+static void rcu_idle_count_callbacks_posted(void)
+{
+}
+
 #else /* #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 /*
@@ -1980,9 +1988,11 @@ static void rcu_prepare_for_idle(int cpu)
 
 static DEFINE_PER_CPU(int, rcu_dyntick_drain);
 static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);
-static DEFINE_PER_CPU(struct hrtimer, rcu_idle_gp_timer);
-static ktime_t rcu_idle_gp_wait;	/* If some non-lazy callbacks. */
-static ktime_t rcu_idle_lazy_gp_wait;	/* If only lazy callbacks. */
+static DEFINE_PER_CPU(struct timer_list, rcu_idle_gp_timer);
+static DEFINE_PER_CPU(unsigned long, rcu_idle_gp_timer_expires);
+static DEFINE_PER_CPU(bool, rcu_idle_first_pass);
+static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted);
+static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted_snap);
 
 /*
  * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
@@ -1995,6 +2005,8 @@ static ktime_t rcu_idle_lazy_gp_wait;	/* If only lazy callbacks. */
  */
 int rcu_needs_cpu(int cpu)
 {
+	/* Flag a new idle sojourn to the idle-entry state machine. */
+	per_cpu(rcu_idle_first_pass, cpu) = 1;
 	/* If no callbacks, RCU doesn't need the CPU. */
 	if (!rcu_cpu_has_callbacks(cpu))
 		return 0;
@@ -2045,16 +2057,33 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
 }
 
 /*
+ * Invoked via smp_call_function_single() to wake the intended CPU; it need only trace the event.
+ */
+void rcu_idle_demigrate(void *unused)
+{
+	trace_rcu_prep_idle("Demigrate");
+}
+
+/*
  * Timer handler used to force CPU to start pushing its remaining RCU
  * callbacks in the case where it entered dyntick-idle mode with callbacks
  * pending.  The handler doesn't really need to do anything because the
  * real work is done upon re-entry to idle, or by the next scheduling-clock
  * interrupt should idle not be re-entered.
+ *
+ * One special case: the timer gets migrated without awakening the CPU
+ * on which the timer was scheduled.  In this case, we must wake up
+ * that CPU.  We do so with smp_call_function_single().
  */
-static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
+static void rcu_idle_gp_timer_func(unsigned long cpu_in)
 {
+	int cpu = (int)cpu_in;
+
 	trace_rcu_prep_idle("Timer");
-	return HRTIMER_NORESTART;
+	if (cpu != smp_processor_id())
+		smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
+	else
+		WARN_ON_ONCE(1); /* Getting here can hang the system... */
 }
 
 /*
@@ -2062,19 +2091,8 @@ static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
  */
 static void rcu_prepare_for_idle_init(int cpu)
 {
-	static int firsttime = 1;
-	struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
-
-	hrtimer_init(hrtp, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hrtp->function = rcu_idle_gp_timer_func;
-	if (firsttime) {
-		unsigned int upj = jiffies_to_usecs(RCU_IDLE_GP_DELAY);
-
-		rcu_idle_gp_wait = ns_to_ktime(upj * (u64)1000);
-		upj = jiffies_to_usecs(RCU_IDLE_LAZY_GP_DELAY);
-		rcu_idle_lazy_gp_wait = ns_to_ktime(upj * (u64)1000);
-		firsttime = 0;
-	}
+	setup_timer(&per_cpu(rcu_idle_gp_timer, cpu),
+		    rcu_idle_gp_timer_func, cpu);
 }
 
 /*
@@ -2084,7 +2102,8 @@ static void rcu_prepare_for_idle_init(int cpu)
  */
 static void rcu_cleanup_after_idle(int cpu)
 {
-	hrtimer_cancel(&per_cpu(rcu_idle_gp_timer, cpu));
+	del_timer(&per_cpu(rcu_idle_gp_timer, cpu));
+	trace_rcu_prep_idle("Cleanup after idle");
 }
 
 /*
@@ -2108,6 +2127,29 @@ static void rcu_cleanup_after_idle(int cpu)
  */
 static void rcu_prepare_for_idle(int cpu)
 {
+	struct timer_list *tp;
+
+	/*
+	 * If this is an idle re-entry, for example, due to use of
+	 * RCU_NONIDLE() or the new idle-loop tracing API within the idle
+	 * loop, then don't take any state-machine actions, unless the
+	 * momentary exit from idle queued additional non-lazy callbacks.
+	 * Instead, repost the rcu_idle_gp_timer if this CPU has callbacks
+	 * pending.
+	 */
+	if (!per_cpu(rcu_idle_first_pass, cpu) &&
+	    (per_cpu(rcu_nonlazy_posted, cpu) ==
+	     per_cpu(rcu_nonlazy_posted_snap, cpu))) {
+		if (rcu_cpu_has_callbacks(cpu)) {
+			tp = &per_cpu(rcu_idle_gp_timer, cpu);
+			mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
+		}
+		return;
+	}
+	per_cpu(rcu_idle_first_pass, cpu) = 0;
+	per_cpu(rcu_nonlazy_posted_snap, cpu) =
+		per_cpu(rcu_nonlazy_posted, cpu) - 1;
+
 	/*
 	 * If there are no callbacks on this CPU, enter dyntick-idle mode.
 	 * Also reset state to avoid prejudicing later attempts.
@@ -2140,11 +2182,15 @@ static void rcu_prepare_for_idle(int cpu)
 		per_cpu(rcu_dyntick_drain, cpu) = 0;
 		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
 		if (rcu_cpu_has_nonlazy_callbacks(cpu))
-			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
-				      rcu_idle_gp_wait, HRTIMER_MODE_REL);
+			per_cpu(rcu_idle_gp_timer_expires, cpu) =
+					   jiffies + RCU_IDLE_GP_DELAY;
 		else
-			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
-				      rcu_idle_lazy_gp_wait, HRTIMER_MODE_REL);
+			per_cpu(rcu_idle_gp_timer_expires, cpu) =
+					   jiffies + RCU_IDLE_LAZY_GP_DELAY;
+		tp = &per_cpu(rcu_idle_gp_timer, cpu);
+		mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
+		per_cpu(rcu_nonlazy_posted_snap, cpu) =
+			per_cpu(rcu_nonlazy_posted, cpu);
 		return; /* Nothing more to do immediately. */
 	} else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {
 		/* We have hit the limit, so time to give up. */
@@ -2184,6 +2230,17 @@ static void rcu_prepare_for_idle(int cpu)
 		trace_rcu_prep_idle("Callbacks drained");
 }
 
+/*
+ * Keep a running count of callbacks posted so that rcu_prepare_for_idle()
+ * can detect when something out of the idle loop posts a callback.
+ * Of course, it had better do so either from a trace event designed to
+ * be called from idle or from within RCU_NONIDLE().
+ */
+static void rcu_idle_count_callbacks_posted(void)
+{
+	__this_cpu_add(rcu_nonlazy_posted, 1);
+}
+
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 #ifdef CONFIG_RCU_CPU_STALL_INFO
@@ -2192,14 +2249,12 @@ static void rcu_prepare_for_idle(int cpu)
 
 static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
 {
-	struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
+	struct timer_list *tltp = &per_cpu(rcu_idle_gp_timer, cpu);
 
-	sprintf(cp, "drain=%d %c timer=%lld",
+	sprintf(cp, "drain=%d %c timer=%lu",
 		per_cpu(rcu_dyntick_drain, cpu),
 		per_cpu(rcu_dyntick_holdoff, cpu) == jiffies ? 'H' : '.',
-		hrtimer_active(hrtp)
-			? ktime_to_us(hrtimer_get_remaining(hrtp))
-			: -1);
+		timer_pending(tltp) ? tltp->expires - jiffies : -1);
 }
 
 #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-05-01  0:02   ` Paul E. McKenney
@ 2012-05-01  8:55     ` Pascal Chapperon
  2012-05-01 15:45       ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-05-01  8:55 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 01/05/2012 02:02, Paul E. McKenney wrote:
> On Fri, Apr 27, 2012 at 08:42:58PM -0700, Paul E. McKenney wrote:
>> On Fri, Apr 27, 2012 at 02:15:20PM +0200, Pascal Chapperon wrote:
>>> On 18/04/2012 17:23, Paul E. McKenney wrote:
>>>> On Wed, Apr 18, 2012 at 05:00:14PM +0200, Pascal Chapperon wrote:
>>>>> On 18/04/2012 16:01, Paul E. McKenney wrote:
>>>>>> On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
>>>>>>> Mount and umount operations are not slower with RCU_FAST_NO_HZ during
>>>>>>> runtime; systemctl start and stop operations are also not slower. In
>>>>>>> fact, I couldn't find a single operation slower during runtime with
>>>>>>> RCU_FAST_NO_HZ.
>>>>>>
>>>>>> Your boot-time setup is such that all CPUs are online before the
>>>>>> boot-time mount operations take place, right?
>>>>> Yes :
>>>>> [ 0.242697] Brought up 8 CPUs
>>>>> [ 0.242699] Total of 8 processors activated (35118.33 BogoMIPS).
>>>>>
>>>>>> Struggling to understand
>>>>>> how RCU can tell the difference between post-CPU-bringup boot time
>>>>>> and run time...
>>>>>>
>>>>> systemd is controlling the whole boot process including mount
>>>>> operation (apart root filesystem) and as I can see, uses heavily
>>>>> sockets to do it (not to mention cpu-affinity). It controls also the
>>>>> major part of umount operations. Is it possible that your patch hits
>>>>> a systemd bug ?
>>>>
>>>> Is it possible that systemd is using network operations that include
>>>> synchronize_rcu()? Then if you did the same operation from the
>>>> command line at runtime, you might not see the slowdown.
>>>>
>>>> Is it possible for you to convince systemd to collect RCU event tracing
>>>> during the slow operation? RCU event tracing is available under
>>>> /sys/kernel/debug/tracing/rcu.
>>>>
>>> I have collected the RCU event tracing during a slow boot with
>>> FAST_NO_HZ (and the same without FAST_NO_HZ, same kernel config).
>>> The full logs and associated "systemd-analyze plot" can be found
>>> (in comment 32) at :
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=806548
>>>
>>> With FAST_NO_HZ, almost each rcu_prep_idle is followed by ksoftirqd
>>> (75000 ksoftirqd lines with FAST_NO_HZ, 4000 without).
>>>
>>> Sorry, the logs are huge, but I can't figure out where the
>>> interesting parts are.
>>
>> Thank you for collecting them!  I clearly will need to do some scripting.  ;-)
>
> And it appears that your system is migrating timers without waking up
> the CPU on which the timer was posted.  This explains the slowdowns:
> RCU assumes that the timer will either fire on the CPU that it was posted
> on or that that CPU will be awakened when it goes offline.  If the timer
> does not fire on that CPU and that CPU is not otherwise awakened, then
> that CPU's RCU callbacks can be indefinitely postponed, which could account
> for the slowdowns that you were seeing.
>
> Please see below for a lightly tested patch that should address this
> problem, and thank you again for your patient testing efforts!
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Make RCU_FAST_NO_HZ handle timer migration
>
> The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
> CPU goes offline, in which case it assumes that the CPU will have to come
> out of dyntick-idle mode (cancelling the timer) in order to go offline.
> This is important because when RCU_FAST_NO_HZ permits a CPU to enter
> dyntick-idle mode despite having RCU callbacks pending, it posts a timer
> on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
> CPU will eventually handle the end of the grace period, including invoking
> its RCU callbacks.
>
> However, Pascal Chapperon's test setup shows that the timer handler
> rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
> problematic because this can cause the CPU that entered dyntick-idle
> mode despite still having RCU callbacks pending to remain in
> dyntick-idle mode indefinitely, which means that its RCU callbacks might
> never be invoked.  This situation can result in grace-period delays or
> even system hangs, which matches Pascal's observations of slow boot-up
> and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=806548
>
> This commit therefore causes the "should never be invoked" timer handler
> rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
> the CPU for which the timer was intended, allowing that CPU to invoke
> its RCU callbacks in a timely manner.
>
> Reported-by: Pascal Chapperon<pascal.chapperon@wanadoo.fr>
> Signed-off-by: Paul E. McKenney<paul.mckenney@linaro.org>
> Signed-off-by: Paul E. McKenney<paulmck@linux.vnet.ibm.com>
> ---
>
>   include/trace/events/rcu.h |    1 +
>   kernel/rcutree_plugin.h    |   23 ++++++++++++++++++++---
>   2 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index aaa55e1..1480900 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -292,6 +292,7 @@ TRACE_EVENT(rcu_dyntick,
>    *	"More callbacks": Still more callbacks, try again to clear them out.
>    *	"Callbacks drained": All callbacks processed, off to dyntick idle!
>    *	"Timer": Timer fired to cause CPU to continue processing callbacks.
> + *	"Demigrate": Timer fired on wrong CPU, woke up correct CPU.
>    *	"Cleanup after idle": Idle exited, timer canceled.
>    */
>   TRACE_EVENT(rcu_prep_idle,
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index dc12efc..bbd064a 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -1994,16 +1994,33 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
>   }
>
>   /*
> + * Invoked via smp_call_function_single() to wake the intended CPU; it need only trace the event.
> + */
> +void rcu_idle_demigrate(void *unused)
> +{
> +	trace_rcu_prep_idle("Demigrate");
> +}
> +
> +/*
>    * Timer handler used to force CPU to start pushing its remaining RCU
>    * callbacks in the case where it entered dyntick-idle mode with callbacks
>    * pending.  The handler doesn't really need to do anything because the
>    * real work is done upon re-entry to idle, or by the next scheduling-clock
>    * interrupt should idle not be re-entered.
> + *
> + * One special case: the timer gets migrated without awakening the CPU
> + * on which the timer was scheduled.  In this case, we must wake up
> + * that CPU.  We do so with smp_call_function_single().
>    */
> -static void rcu_idle_gp_timer_func(unsigned long unused)
> +static void rcu_idle_gp_timer_func(unsigned long cpu_in)
>   {
> -	WARN_ON_ONCE(1); /* Getting here can hang the system... */
> +	int cpu = (int)cpu_in;
> +
>   	trace_rcu_prep_idle("Timer");
> +	if (cpu != smp_processor_id())
> +		smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
> +	else
> +		WARN_ON_ONCE(1); /* Getting here can hang the system... */
>   }
>
>   /*
> @@ -2012,7 +2029,7 @@ static void rcu_idle_gp_timer_func(unsigned long unused)
>   static void rcu_prepare_for_idle_init(int cpu)
>   {
>   	setup_timer(&per_cpu(rcu_idle_gp_timer, cpu),
> -		    rcu_idle_gp_timer_func, 0);
> +		    rcu_idle_gp_timer_func, cpu);
>   }
>
>   /*
>
>
Paul, I can't apply your patch on top of the master branch; perhaps I
need to pull from your git repository?
Among other things, you have:
static void rcu_idle_gp_timer_func(unsigned long unused)
and I have:
static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)

Pascal


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-04-28  3:42 ` Paul E. McKenney
@ 2012-05-01  0:02   ` Paul E. McKenney
  2012-05-01  8:55     ` Pascal Chapperon
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-05-01  0:02 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Fri, Apr 27, 2012 at 08:42:58PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 27, 2012 at 02:15:20PM +0200, Pascal Chapperon wrote:
> > On 18/04/2012 17:23, Paul E. McKenney wrote:
> > > On Wed, Apr 18, 2012 at 05:00:14PM +0200, Pascal Chapperon wrote:
> > >> On 18/04/2012 16:01, Paul E. McKenney wrote:
> > >>> On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
> > >>>> Mount and umount operations are not slower with RCU_FAST_NO_HZ during
> > >>>> runtime; systemctl start and stop operations are also not slower. In
> > >>>> fact, I couldn't find a single operation slower during runtime with
> > >>>> RCU_FAST_NO_HZ.
> > >>>
> > >>> Your boot-time setup is such that all CPUs are online before the
> > >>> boot-time mount operations take place, right?
> > >> Yes :
> > >> [ 0.242697] Brought up 8 CPUs
> > >> [ 0.242699] Total of 8 processors activated (35118.33 BogoMIPS).
> > >>
> > >>> Struggling to understand
> > >>> how RCU can tell the difference between post-CPU-bringup boot time
> > >>> and run time...
> > >>>
> > >> systemd is controlling the whole boot process including mount
> > >> operation (apart root filesystem) and as I can see, uses heavily
> > >> sockets to do it (not to mention cpu-affinity). It controls also the
> > >> major part of umount operations. Is it possible that your patch hits
> > >> a systemd bug ?
> > > 
> > > Is it possible that systemd is using network operations that include
> > > synchronize_rcu()? Then if you did the same operation from the
> > > command line at runtime, you might not see the slowdown.
> > > 
> > > Is it possible for you to convince systemd to collect RCU event tracing
> > > during the slow operation? RCU event tracing is available under
> > > /sys/kernel/debug/tracing/rcu.
> > >
> > I have collected the RCU event tracing during a slow boot with
> > FAST_NO_HZ (and the same without FAST_NO_HZ, same kernel config).
> > The full logs and associated "systemd-analyze plot" can be found
> > (in comment 32) at :
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=806548
> > 
> > With FAST_NO_HZ, almost each rcu_prep_idle is followed by ksoftirqd
> > (75000 ksoftirqd lines with FAST_NO_HZ, 4000 without).
> > 
> > Sorry, the logs are huge, but I can't figure out where the
> > interesting parts are.
> 
> Thank you for collecting them!  I clearly will need to do some scripting.  ;-)

And it appears that your system is migrating timers without waking up
the CPU on which the timer was posted.  This explains the slowdowns:
RCU assumes that the timer will either fire on the CPU that it was posted
on or that that CPU will be awakened when it goes offline.  If the timer
does not fire on that CPU and that CPU is not otherwise awakened, then
that CPU's RCU callbacks can be indefinitely postponed, which could account
for the slowdowns that you were seeing.

Please see below for a lightly tested patch that should address this
problem, and thank you again for your patient testing efforts!

							Thanx, Paul

------------------------------------------------------------------------

rcu: Make RCU_FAST_NO_HZ handle timer migration

The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
CPU goes offline, in which case it assumes that the CPU will have to come
out of dyntick-idle mode (cancelling the timer) in order to go offline.
This is important because when RCU_FAST_NO_HZ permits a CPU to enter
dyntick-idle mode despite having RCU callbacks pending, it posts a timer
on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
CPU will eventually handle the end of the grace period, including invoking
its RCU callbacks.

However, Pascal Chapperon's test setup shows that the timer handler
rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
problematic because this can cause the CPU that entered dyntick-idle
mode despite still having RCU callbacks pending to remain in
dyntick-idle mode indefinitely, which means that its RCU callbacks might
never be invoked.  This situation can result in grace-period delays or
even system hangs, which matches Pascal's observations of slow boot-up
and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=806548

This commit therefore causes the "should never be invoked" timer handler
rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
the CPU for which the timer was intended, allowing that CPU to invoke
its RCU callbacks in a timely manner.

Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/trace/events/rcu.h |    1 +
 kernel/rcutree_plugin.h    |   23 ++++++++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aaa55e1..1480900 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -292,6 +292,7 @@ TRACE_EVENT(rcu_dyntick,
  *	"More callbacks": Still more callbacks, try again to clear them out.
  *	"Callbacks drained": All callbacks processed, off to dyntick idle!
  *	"Timer": Timer fired to cause CPU to continue processing callbacks.
+ *	"Demigrate": Timer fired on wrong CPU, woke up correct CPU.
  *	"Cleanup after idle": Idle exited, timer canceled.
  */
 TRACE_EVENT(rcu_prep_idle,
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index dc12efc..bbd064a 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1994,16 +1994,33 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
 }
 
 /*
+ * Invoked via smp_call_function_single() to wake the intended CPU; it need only trace the event.
+ */
+void rcu_idle_demigrate(void *unused)
+{
+	trace_rcu_prep_idle("Demigrate");
+}
+
+/*
  * Timer handler used to force CPU to start pushing its remaining RCU
  * callbacks in the case where it entered dyntick-idle mode with callbacks
  * pending.  The handler doesn't really need to do anything because the
  * real work is done upon re-entry to idle, or by the next scheduling-clock
  * interrupt should idle not be re-entered.
+ *
+ * One special case: the timer gets migrated without awakening the CPU
+ * on which the timer was scheduled.  In this case, we must wake up
+ * that CPU.  We do so with smp_call_function_single().
  */
-static void rcu_idle_gp_timer_func(unsigned long unused)
+static void rcu_idle_gp_timer_func(unsigned long cpu_in)
 {
-	WARN_ON_ONCE(1); /* Getting here can hang the system... */
+	int cpu = (int)cpu_in;
+
 	trace_rcu_prep_idle("Timer");
+	if (cpu != smp_processor_id())
+		smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
+	else
+		WARN_ON_ONCE(1); /* Getting here can hang the system... */
 }
 
 /*
@@ -2012,7 +2029,7 @@ static void rcu_idle_gp_timer_func(unsigned long unused)
 static void rcu_prepare_for_idle_init(int cpu)
 {
 	setup_timer(&per_cpu(rcu_idle_gp_timer, cpu),
-		    rcu_idle_gp_timer_func, 0);
+		    rcu_idle_gp_timer_func, cpu);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: RCU related performance regression in 3.3
  2012-04-27 12:15 Pascal Chapperon
@ 2012-04-28  3:42 ` Paul E. McKenney
  2012-05-01  0:02   ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2012-04-28  3:42 UTC (permalink / raw)
  To: Pascal Chapperon; +Cc: Josh Boyer, linux-kernel, kernel-team

On Fri, Apr 27, 2012 at 02:15:20PM +0200, Pascal Chapperon wrote:
> On 18/04/2012 17:23, Paul E. McKenney wrote:
> > On Wed, Apr 18, 2012 at 05:00:14PM +0200, Pascal Chapperon wrote:
> >> On 18/04/2012 16:01, Paul E. McKenney wrote:
> >>> On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
> >>>> Mount and umount operations are not slower with RCU_FAST_NO_HZ during
> >>>> runtime; systemctl start and stop operations are also not slower. In
> >>>> fact, I couldn't find a single operation slower during runtime with
> >>>> RCU_FAST_NO_HZ.
> >>>
> >>> Your boot-time setup is such that all CPUs are online before the
> >>> boot-time mount operations take place, right?
> >> Yes :
> >> [ 0.242697] Brought up 8 CPUs
> >> [ 0.242699] Total of 8 processors activated (35118.33 BogoMIPS).
> >>
> >>> Struggling to understand
> >>> how RCU can tell the difference between post-CPU-bringup boot time
> >>> and run time...
> >>>
> >> systemd controls the whole boot process, including mount operations
> >> (apart from the root filesystem), and as far as I can see it heavily
> >> uses sockets to do so (not to mention CPU affinity). It also controls
> >> most of the umount operations. Is it possible that your patch hits
> >> a systemd bug?
> > 
> > Is it possible that systemd is using network operations that include
> > synchronize_rcu()? Then if you did the same operation from the
> > command line at runtime, you might not see the slowdown.
> > 
> > Is it possible for you to convince systemd to collect RCU event tracing
> > during the slow operation? RCU event tracing is available under
> > /sys/kernel/debug/tracing/rcu.
> >
> .
> I have collected the RCU event tracing during a slow boot with
> FAST_NO_HZ (and the same without FAST_NO_HZ, same kernel config).
> The full logs and associated "systemd-analyze plot" can be found
> (in comment 32) at :
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=806548
> 
> With FAST_NO_HZ, almost every rcu_prep_idle event is followed by a
> ksoftirqd wakeup (75000 ksoftirqd lines with FAST_NO_HZ, 4000 without).
> 
> Sorry, the logs are huge, but I can't figure out which parts
> are of interest.

Thank you for collecting them!  I clearly will need to do some scripting.  ;-)

							Thanx, Paul



* Re: RCU related performance regression in 3.3
@ 2012-04-27 12:15 Pascal Chapperon
  2012-04-28  3:42 ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Pascal Chapperon @ 2012-04-27 12:15 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, linux-kernel, kernel-team

On 18/04/2012 17:23, Paul E. McKenney wrote:
> On Wed, Apr 18, 2012 at 05:00:14PM +0200, Pascal Chapperon wrote:
>> On 18/04/2012 16:01, Paul E. McKenney wrote:
>>> On Wed, Apr 18, 2012 at 11:37:28AM +0200, Pascal Chapperon wrote:
>>>> Mount and umount operations are not slower with RCU_FAST_NO_HZ during
>>>> runtime; systemctl start and stop operations are also not slower. In
>>>> fact, I couldn't find a single operation that was slower at runtime
>>>> with RCU_FAST_NO_HZ.
>>>
>>> Your boot-time setup is such that all CPUs are online before the
>>> boot-time mount operations take place, right?
>> Yes:
>> [ 0.242697] Brought up 8 CPUs
>> [ 0.242699] Total of 8 processors activated (35118.33 BogoMIPS).
>>
>>> Struggling to understand
>>> how RCU can tell the difference between post-CPU-bringup boot time
>>> and run time...
>>>
>> systemd controls the whole boot process, including mount operations
>> (apart from the root filesystem), and as far as I can see it heavily
>> uses sockets to do so (not to mention CPU affinity). It also controls
>> most of the umount operations. Is it possible that your patch hits
>> a systemd bug?
> 
> Is it possible that systemd is using network operations that include
> synchronize_rcu()? Then if you did the same operation from the
> command line at runtime, you might not see the slowdown.
> 
> Is it possible for you to convince systemd to collect RCU event tracing
> during the slow operation? RCU event tracing is available under
> /sys/kernel/debug/tracing/rcu.
>
I have collected the RCU event tracing during a slow boot with
FAST_NO_HZ (and the same without FAST_NO_HZ, same kernel config).
The full logs and associated "systemd-analyze plot" can be found
(in comment 32) at :

https://bugzilla.redhat.com/show_bug.cgi?id=806548

With FAST_NO_HZ, almost every rcu_prep_idle event is followed by a
ksoftirqd wakeup (75000 ksoftirqd lines with FAST_NO_HZ, 4000 without).

Sorry, the logs are huge, but I can't figure out which parts
are of interest.

Pascal


end of thread, other threads:[~2012-05-18 14:48 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-04 15:27 RCU related performance regression in 3.3 Josh Boyer
2012-04-04 21:36 ` Paul E. McKenney
2012-04-05 12:37   ` Josh Boyer
2012-04-05 14:00     ` Paul E. McKenney
2012-04-05 14:15       ` Pascal CHAPPERON
2012-04-05 14:39         ` Paul E. McKenney
2012-04-06  9:18           ` Pascal Chapperon
2012-04-10 16:07             ` Paul E. McKenney
2012-04-11 15:06               ` Pascal
2012-04-12 18:04                 ` Paul E. McKenney
2012-04-16 21:02                   ` Paul E. McKenney
2012-04-18  9:37                     ` Pascal Chapperon
2012-04-18 14:01                       ` Paul E. McKenney
2012-04-18 15:00                         ` Pascal Chapperon
2012-04-18 15:23                           ` Paul E. McKenney
2012-04-20 14:45                             ` Pascal Chapperon
2012-04-27 12:15 Pascal Chapperon
2012-04-28  3:42 ` Paul E. McKenney
2012-05-01  0:02   ` Paul E. McKenney
2012-05-01  8:55     ` Pascal Chapperon
2012-05-01 15:45       ` Paul E. McKenney
2012-05-04 14:42         ` Pascal Chapperon
2012-05-04 15:04           ` Paul E. McKenney
2012-05-04 21:41             ` Pascal Chapperon
2012-05-04 23:14               ` Paul E. McKenney
2012-05-10  8:40                 ` Pascal Chapperon
2012-05-14 22:32                 ` Paul E. McKenney
2012-05-18 11:01                   ` Pascal Chapperon
2012-05-18 12:14                     ` Paul E. McKenney
2012-05-18 14:48                       ` Pascal Chapperon
