All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] rcu: use killable versions of swait
@ 2017-06-14 23:06 Luis R. Rodriguez
  2017-06-14 23:43 ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-14 23:06 UTC (permalink / raw)
  To: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: paul.gortmaker, ebiederm, dmitry.torokhov, linux-kernel,
	Luis R. Rodriguez

These waits don't even check for the return value for interruption,
using the non-killable variants means we could be killed by other
signals than SIGKILL, this is fragile.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---

The killable swaits were just posted [1] as part of a series where SIGCHLD
was detected as interrupting and killing kernel calls waiting using
non-killable swaits [1]. The fragility here made curious about other callers
and seeing if they really meant to use such broad wait which captures a lot
of signals.

I can't see why we'd want to have these killed by other signals, specialy
since it seems we don't even check for the return value... Granted to abort
properly we'd have to check for the return value for -ERESTARTSYS, but yeah,
none of this is done, so it would seem we don't want fragile signals
interrupting these ?

Also can someone confirm if the original change of to swait_event_timeout()
from wait_event_interruptible_timeout() was actually intentional on
synchronize_sched_expedited_wait() on commit abedf8e2419fb ("rcu: Use simple
wait queues where possible in rcutree") ? I can't easily confirm.

[0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org
[1] https://lkml.kernel.org/r/20170614222017.14653-1-mcgrof@kernel.org

 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 695fee7cafe0..9a8d06486a3c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2191,7 +2191,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("reqwait"));
 			rsp->gp_state = RCU_GP_WAIT_GPS;
-			swait_event_interruptible(rsp->gp_wq,
+			swait_event_killable(rsp->gp_wq,
 						 READ_ONCE(rsp->gp_flags) &
 						 RCU_GP_FLAG_INIT);
 			rsp->gp_state = RCU_GP_DONE_GPS;
@@ -2224,7 +2224,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = swait_event_interruptible_timeout(rsp->gp_wq,
+			ret = swait_event_killable_timeout(rsp->gp_wq,
 					rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC] rcu: use killable versions of swait
  2017-06-14 23:06 [RFC] rcu: use killable versions of swait Luis R. Rodriguez
@ 2017-06-14 23:43 ` Paul E. McKenney
  2017-06-15 15:50   ` Luis R. Rodriguez
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-14 23:43 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, paul.gortmaker,
	ebiederm, dmitry.torokhov, linux-kernel, oleg

On Wed, Jun 14, 2017 at 04:06:39PM -0700, Luis R. Rodriguez wrote:
> These waits don't even check for the return value for interruption,
> using the non-killable variants means we could be killed by other
> signals than SIGKILL, this is fragile.

A number of people asserted that kthreads could never catch signals.
I checked this at the time, and it seemed like this was the case, and
the call to ignore_signals() seems to confirm this.

So it looks to me like RCU doesn't care about this change (give or
take any affect on the load average, anyway), but there is much that I
don't know about Linux-kernel signal handling, so please feel free to
educate me.

> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> 
> The killable swaits were just posted [1] as part of a series where SIGCHLD
> was detected as interrupting and killing kernel calls waiting using
> non-killable swaits [1]. The fragility here made curious about other callers
> and seeing if they really meant to use such broad wait which captures a lot
> of signals.
> 
> I can't see why we'd want to have these killed by other signals, specialy
> since it seems we don't even check for the return value... Granted to abort
> properly we'd have to check for the return value for -ERESTARTSYS, but yeah,
> none of this is done, so it would seem we don't want fragile signals
> interrupting these ?

The later WARN_ON(signal_pending(current)) complains if a signal somehow
makes it to this task.  Assuming that the signal is nonfatal, anyway.

> Also can someone confirm if the original change of to swait_event_timeout()
> from wait_event_interruptible_timeout() was actually intentional on
> synchronize_sched_expedited_wait() on commit abedf8e2419fb ("rcu: Use simple
> wait queues where possible in rcutree") ? I can't easily confirm.

This is also called from a workqueue (at least once the core_initcall()s
get done), which again should mean no signals.  A WARN_ON(ret < 0)
complains if this ever changes.  So it should be OK that this is
swait_event_timeout().  And expedited grace periods are supposed to
get done quickly, so effect on the load average should be negligible.

Or am I missing something?

							Thanx, Paul

> [0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org
> [1] https://lkml.kernel.org/r/20170614222017.14653-1-mcgrof@kernel.org
> 
>  kernel/rcu/tree.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 695fee7cafe0..9a8d06486a3c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2191,7 +2191,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
>  					       READ_ONCE(rsp->gpnum),
>  					       TPS("reqwait"));
>  			rsp->gp_state = RCU_GP_WAIT_GPS;
> -			swait_event_interruptible(rsp->gp_wq,
> +			swait_event_killable(rsp->gp_wq,
>  						 READ_ONCE(rsp->gp_flags) &
>  						 RCU_GP_FLAG_INIT);
>  			rsp->gp_state = RCU_GP_DONE_GPS;
> @@ -2224,7 +2224,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
>  					       READ_ONCE(rsp->gpnum),
>  					       TPS("fqswait"));
>  			rsp->gp_state = RCU_GP_WAIT_FQS;
> -			ret = swait_event_interruptible_timeout(rsp->gp_wq,
> +			ret = swait_event_killable_timeout(rsp->gp_wq,
>  					rcu_gp_fqs_check_wake(rsp, &gf), j);
>  			rsp->gp_state = RCU_GP_DOING_FQS;
>  			/* Locking provides needed memory barriers. */
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] rcu: use killable versions of swait
  2017-06-14 23:43 ` Paul E. McKenney
@ 2017-06-15 15:50   ` Luis R. Rodriguez
  2017-06-15 16:22     ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-15 15:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Luis R. Rodriguez, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, paul.gortmaker, ebiederm, dmitry.torokhov,
	linux-kernel, oleg

On Wed, Jun 14, 2017 at 04:43:45PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 14, 2017 at 04:06:39PM -0700, Luis R. Rodriguez wrote:
> > These waits don't even check for the return value for interruption,
> > using the non-killable variants means we could be killed by other
> > signals than SIGKILL, this is fragile.
> 
> A number of people asserted that kthreads could never catch signals.
> I checked this at the time, and it seemed like this was the case, and
> the call to ignore_signals() seems to confirm this.
> 
> So it looks to me like RCU doesn't care about this change (give or
> take any affect on the load average, anyway), but there is much that I
> don't know about Linux-kernel signal handling, so please feel free to
> educate me.

Thanks, I had seen the kthread but figured best to ask, just got into parnaoia
mode. If we were to do a sanity check for usage we'd then have to white list
when kthreads are used, however since we don't care to be interrupted why not
use a wait which is also explicit about our current uninterruptible state?

> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> > 
> > The killable swaits were just posted [1] as part of a series where SIGCHLD
> > was detected as interrupting and killing kernel calls waiting using
> > non-killable swaits [1]. The fragility here made curious about other callers
> > and seeing if they really meant to use such broad wait which captures a lot
> > of signals.
> > 
> > I can't see why we'd want to have these killed by other signals, specialy
> > since it seems we don't even check for the return value... Granted to abort
> > properly we'd have to check for the return value for -ERESTARTSYS, but yeah,
> > none of this is done, so it would seem we don't want fragile signals
> > interrupting these ?
> 
> The later WARN_ON(signal_pending(current)) complains if a signal somehow
> makes it to this task.  Assuming that the signal is nonfatal, anyway.

I see, how about just using swait_event_timeout() and removing the WARN_ON()?
Is there a reason for having the interruptible ?

> > Also can someone confirm if the original change of to swait_event_timeout()
> > from wait_event_interruptible_timeout() was actually intentional on
> > synchronize_sched_expedited_wait() on commit abedf8e2419fb ("rcu: Use simple
> > wait queues where possible in rcutree") ? I can't easily confirm.
> 
> This is also called from a workqueue (at least once the core_initcall()s
> get done), which again should mean no signals.  A WARN_ON(ret < 0)
> complains if this ever changes.  So it should be OK that this is
> swait_event_timeout().  And expedited grace periods are supposed to
> get done quickly, so effect on the load average should be negligible.

Great thanks.

> Or am I missing something?

No, just got into parnoia mode and better to ask than be sorry later!

  Luis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] rcu: use killable versions of swait
  2017-06-15 15:50   ` Luis R. Rodriguez
@ 2017-06-15 16:22     ` Paul E. McKenney
  2017-06-15 16:35       ` Luis R. Rodriguez
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-15 16:22 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, paul.gortmaker,
	ebiederm, dmitry.torokhov, linux-kernel, oleg

On Thu, Jun 15, 2017 at 05:50:39PM +0200, Luis R. Rodriguez wrote:
> On Wed, Jun 14, 2017 at 04:43:45PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 14, 2017 at 04:06:39PM -0700, Luis R. Rodriguez wrote:
> > > These waits don't even check for the return value for interruption,
> > > using the non-killable variants means we could be killed by other
> > > signals than SIGKILL, this is fragile.
> > 
> > A number of people asserted that kthreads could never catch signals.
> > I checked this at the time, and it seemed like this was the case, and
> > the call to ignore_signals() seems to confirm this.
> > 
> > So it looks to me like RCU doesn't care about this change (give or
> > take any affect on the load average, anyway), but there is much that I
> > don't know about Linux-kernel signal handling, so please feel free to
> > educate me.
> 
> Thanks, I had seen the kthread but figured best to ask, just got into parnaoia
> mode. If we were to do a sanity check for usage we'd then have to white list
> when kthreads are used, however since we don't care to be interrupted why not
> use a wait which is also explicit about our current uninterruptible state?

I do appreciate any and all inspection, actually, so thank you!

I used to have it uninterruptable, but got complaints on the effect on
the load average -- sysadms didn't like the "D" state and the fact that
the load average was always greater than 2 (or 3 on PREEMPT=y kernels)
when the system was completely idle.

> > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > > ---
> > > 
> > > The killable swaits were just posted [1] as part of a series where SIGCHLD
> > > was detected as interrupting and killing kernel calls waiting using
> > > non-killable swaits [1]. The fragility here made curious about other callers
> > > and seeing if they really meant to use such broad wait which captures a lot
> > > of signals.
> > > 
> > > I can't see why we'd want to have these killed by other signals, specialy
> > > since it seems we don't even check for the return value... Granted to abort
> > > properly we'd have to check for the return value for -ERESTARTSYS, but yeah,
> > > none of this is done, so it would seem we don't want fragile signals
> > > interrupting these ?
> > 
> > The later WARN_ON(signal_pending(current)) complains if a signal somehow
> > makes it to this task.  Assuming that the signal is nonfatal, anyway.
> 
> I see, how about just using swait_event_timeout() and removing the WARN_ON()?
> Is there a reason for having the interruptible ?

If sleeping-uninterruptible kthreads are now excluded from the load average,
no reason.  But if sleeping-uninterruptible kthreads are still included in
the load average, it must stay interruptible.

> > > Also can someone confirm if the original change of to swait_event_timeout()
> > > from wait_event_interruptible_timeout() was actually intentional on
> > > synchronize_sched_expedited_wait() on commit abedf8e2419fb ("rcu: Use simple
> > > wait queues where possible in rcutree") ? I can't easily confirm.
> > 
> > This is also called from a workqueue (at least once the core_initcall()s
> > get done), which again should mean no signals.  A WARN_ON(ret < 0)
> > complains if this ever changes.  So it should be OK that this is
> > swait_event_timeout().  And expedited grace periods are supposed to
> > get done quickly, so effect on the load average should be negligible.
> 
> Great thanks.
> 
> > Or am I missing something?
> 
> No, just got into parnoia mode and better to ask than be sorry later!

Indeed, better safe that sorry!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] rcu: use killable versions of swait
  2017-06-15 16:22     ` Paul E. McKenney
@ 2017-06-15 16:35       ` Luis R. Rodriguez
  2017-06-15 16:55         ` Eric W. Biederman
  2017-06-15 17:34         ` [RFC] rcu: use killable versions of swait Paul E. McKenney
  0 siblings, 2 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-15 16:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Triplett, Steven Rostedt, mathieu.desnoyers, jiangshanlai,
	Paul Gortmaker, Eric W. Biederman, Dmitry Torokhov, linux-kernel,
	Oleg Nesterov, Peter Zijlstra

On Thu, Jun 15, 2017 at 9:22 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Jun 15, 2017 at 05:50:39PM +0200, Luis R. Rodriguez wrote:
>> On Wed, Jun 14, 2017 at 04:43:45PM -0700, Paul E. McKenney wrote:
>> > On Wed, Jun 14, 2017 at 04:06:39PM -0700, Luis R. Rodriguez wrote:
>> > > These waits don't even check for the return value for interruption,
>> > > using the non-killable variants means we could be killed by other
>> > > signals than SIGKILL, this is fragile.
>> >
>> > A number of people asserted that kthreads could never catch signals.
>> > I checked this at the time, and it seemed like this was the case, and
>> > the call to ignore_signals() seems to confirm this.
>> >
>> > So it looks to me like RCU doesn't care about this change (give or
>> > take any affect on the load average, anyway), but there is much that I
>> > don't know about Linux-kernel signal handling, so please feel free to
>> > educate me.
>>
>> Thanks, I had seen the kthread but figured best to ask, just got into parnaoia
>> mode. If we were to do a sanity check for usage we'd then have to white list
>> when kthreads are used, however since we don't care to be interrupted why not
>> use a wait which is also explicit about our current uninterruptible state?
>
> I do appreciate any and all inspection, actually, so thank you!
>
> I used to have it uninterruptable, but got complaints on the effect on
> the load average -- sysadms didn't like the "D" state and the fact that
> the load average was always greater than 2 (or 3 on PREEMPT=y kernels)
> when the system was completely idle.

Ah wow, never would I have imagined this to be the reason. Are you
aware of other cases in the kernel that use interruptible wait even
though kthreads are used (therefore uninterruptible) as work around
for this? The above information is very helpful for folks doing random
code introspection, having this annotated somehow via API seems fair
to me. Would we be concerned with abuse? Would it be fair for other
kthreads to use similar strategy? If not, why?

>> > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>> > > ---
>> > >
>> > > The killable swaits were just posted [1] as part of a series where SIGCHLD
>> > > was detected as interrupting and killing kernel calls waiting using
>> > > non-killable swaits [1]. The fragility here made curious about other callers
>> > > and seeing if they really meant to use such broad wait which captures a lot
>> > > of signals.
>> > >
>> > > I can't see why we'd want to have these killed by other signals, specialy
>> > > since it seems we don't even check for the return value... Granted to abort
>> > > properly we'd have to check for the return value for -ERESTARTSYS, but yeah,
>> > > none of this is done, so it would seem we don't want fragile signals
>> > > interrupting these ?
>> >
>> > The later WARN_ON(signal_pending(current)) complains if a signal somehow
>> > makes it to this task.  Assuming that the signal is nonfatal, anyway.
>>
>> I see, how about just using swait_event_timeout() and removing the WARN_ON()?
>> Is there a reason for having the interruptible ?
>
> If sleeping-uninterruptible kthreads are now excluded from the load average,
> no reason.  But if sleeping-uninterruptible kthreads are still included in
> the load average, it must stay interruptible.

Got it!

>> > > Also can someone confirm if the original change of to swait_event_timeout()
>> > > from wait_event_interruptible_timeout() was actually intentional on
>> > > synchronize_sched_expedited_wait() on commit abedf8e2419fb ("rcu: Use simple
>> > > wait queues where possible in rcutree") ? I can't easily confirm.
>> >
>> > This is also called from a workqueue (at least once the core_initcall()s
>> > get done), which again should mean no signals.  A WARN_ON(ret < 0)
>> > complains if this ever changes.  So it should be OK that this is
>> > swait_event_timeout().  And expedited grace periods are supposed to
>> > get done quickly, so effect on the load average should be negligible.
>>
>> Great thanks.
>>
>> > Or am I missing something?
>>
>> No, just got into parnoia mode and better to ask than be sorry later!
>
> Indeed, better safe that sorry!

:D

 Luis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] rcu: use killable versions of swait
  2017-06-15 16:35       ` Luis R. Rodriguez
@ 2017-06-15 16:55         ` Eric W. Biederman
  2017-06-15 18:48           ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Luis R. Rodriguez
  2017-06-15 17:34         ` [RFC] rcu: use killable versions of swait Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2017-06-15 16:55 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	mathieu.desnoyers, jiangshanlai, Paul Gortmaker, Dmitry Torokhov,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

"Luis R. Rodriguez" <mcgrof@kernel.org> writes:

> On Thu, Jun 15, 2017 at 9:22 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
>> On Thu, Jun 15, 2017 at 05:50:39PM +0200, Luis R. Rodriguez wrote:
>>> On Wed, Jun 14, 2017 at 04:43:45PM -0700, Paul E. McKenney wrote:
>>> > On Wed, Jun 14, 2017 at 04:06:39PM -0700, Luis R. Rodriguez wrote:

>>> > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>>> > > ---
>>> > >
>>> > > The killable swaits were just posted [1] as part of a series where SIGCHLD
>>> > > was detected as interrupting and killing kernel calls waiting using
>>> > > non-killable swaits [1]. The fragility here made curious about other callers
>>> > > and seeing if they really meant to use such broad wait which captures a lot
>>> > > of signals.
>>> > >
>>> > > I can't see why we'd want to have these killed by other signals, specialy
>>> > > since it seems we don't even check for the return value... Granted to abort
>>> > > properly we'd have to check for the return value for -ERESTARTSYS, but yeah,
>>> > > none of this is done, so it would seem we don't want fragile signals
>>> > > interrupting these ?
>>> >
>>> > The later WARN_ON(signal_pending(current)) complains if a signal somehow
>>> > makes it to this task.  Assuming that the signal is nonfatal, anyway.
>>>
>>> I see, how about just using swait_event_timeout() and removing the WARN_ON()?
>>> Is there a reason for having the interruptible ?
>>
>> If sleeping-uninterruptible kthreads are now excluded from the load average,
>> no reason.  But if sleeping-uninterruptible kthreads are still included in
>> the load average, it must stay interruptible.
>
> Got it!

There is now TASK_IDLE that is uninterruptible and does not contribute
to load avearage.  see: task_contributes_to_load.

So a swait_event_idle() could be written for this case.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] rcu: use killable versions of swait
  2017-06-15 16:35       ` Luis R. Rodriguez
  2017-06-15 16:55         ` Eric W. Biederman
@ 2017-06-15 17:34         ` Paul E. McKenney
  1 sibling, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-15 17:34 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Josh Triplett, Steven Rostedt, mathieu.desnoyers, jiangshanlai,
	Paul Gortmaker, Eric W. Biederman, Dmitry Torokhov, linux-kernel,
	Oleg Nesterov, Peter Zijlstra

On Thu, Jun 15, 2017 at 09:35:45AM -0700, Luis R. Rodriguez wrote:
> On Thu, Jun 15, 2017 at 9:22 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, Jun 15, 2017 at 05:50:39PM +0200, Luis R. Rodriguez wrote:
> >> On Wed, Jun 14, 2017 at 04:43:45PM -0700, Paul E. McKenney wrote:
> >> > On Wed, Jun 14, 2017 at 04:06:39PM -0700, Luis R. Rodriguez wrote:
> >> > > These waits don't even check for the return value for interruption,
> >> > > using the non-killable variants means we could be killed by other
> >> > > signals than SIGKILL, this is fragile.
> >> >
> >> > A number of people asserted that kthreads could never catch signals.
> >> > I checked this at the time, and it seemed like this was the case, and
> >> > the call to ignore_signals() seems to confirm this.
> >> >
> >> > So it looks to me like RCU doesn't care about this change (give or
> >> > take any affect on the load average, anyway), but there is much that I
> >> > don't know about Linux-kernel signal handling, so please feel free to
> >> > educate me.
> >>
> >> Thanks, I had seen the kthread but figured best to ask, just got into parnaoia
> >> mode. If we were to do a sanity check for usage we'd then have to white list
> >> when kthreads are used, however since we don't care to be interrupted why not
> >> use a wait which is also explicit about our current uninterruptible state?
> >
> > I do appreciate any and all inspection, actually, so thank you!
> >
> > I used to have it uninterruptable, but got complaints on the effect on
> > the load average -- sysadms didn't like the "D" state and the fact that
> > the load average was always greater than 2 (or 3 on PREEMPT=y kernels)
> > when the system was completely idle.
> 
> Ah wow, never would I have imagined this to be the reason.

Let's just say that it came as a bit of a surprise to me as well.  ;-)

And yes, this does deserve a comment.  Please see the patch at the
end of this email.

>                                                            Are you
> aware of other cases in the kernel that use interruptible wait even
> though kthreads are used (therefore uninterruptible) as work around
> for this? The above information is very helpful for folks doing random
> code introspection, having this annotated somehow via API seems fair
> to me. Would we be concerned with abuse? Would it be fair for other
> kthreads to use similar strategy? If not, why?

The per-CPU smp_hotplug_thread kthreads also use TASK_INTERRUPTIBLE
when they are idle (or TASK_PARKED when their CPU is offline).  However,
their use of TASK_INTERRUPTIBLE is buried inside smpboot_thread_fn(),
which seems less likely to cause confusion.

Taking a quick look at a few other kthreads...

o	event_test_thread() manually sets TASK_INTERRUPTIBLE and
	calls schedule() directly.

o	ecard_task() uses wait_event_interruptible(), just like
	rcu_gp_kthread() used to do before it switched to swait.

o	eeh_event_handler() gets the same effect via down_interruptible(),
	but silently terminates if a signal is received.

o	pika_dtm_thread() manually sets TASK_INTERRUPTIBLE and then
	calls schedule_timeout().

o	xen_blkif_schedule() uses wait_event_interruptible().

So waiting interruptibly seems to be a common kthread implementation
pattern, though there does appear to be some variety in the actual
means of interruptibly waiting.  I could argue either way on having
a separate API for this purpose, though my default position is to add
comments rather than new APIs -- it is not like the Linux kernel has
any sort of API shortage.  But it would be good to get others' opinions.

							Thanx, Paul

------------------------------------------------------------------------

commit 2ef081931b9be6abdda4b6355edcd970913bed4c
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Jun 15 10:06:57 2017 -0700

    rcu: Add comment relating _interruptible to load average
    
    The rcu_gp_kthread() function uses swait_event_interruptible() and
    swait_event_interruptible_timeout(), which at first glance seems quite
    strange.  Especially given WARN_ON(signal_pending(current)) following
    both calls.  After all, this function is invoked only from a kthread,
    which ignores all signals, so why bother with either the _interruptible
    or the WARN_ON()?
    
    Let's start with the swait_event_interruptible().  If the system is idle,
    the RCU grace-period kthread will spend all its time blocked inside this
    call.  If the _interruptible() form was not used, then this kthread would
    contribute to the load average.  This means that an idle system would
    have a load average of 2 (or 3 if PREEMPT=y), rather than the load average
    of 0 that almost fifty years of UNIX has conditioned sysadms to expect.
    Ergo, swait_event_interruptible() is used to avoid surprising load-average
    values.
    
    This same argument applies to the swait_event_interruptible_timeout()
    invocation.  The RCU grace-period kthread spends its time blocked inside
    this call while waiting for grace periods to complete.  In particular,
    if there was only one busy CPU, but that CPU was frequently invoking
    call_rcu(), then the RCU grace-period kthread would spend almost all its
    time blocked inside the swait_event_interruptible_timeout().  This would
    mean that the load average would be 2 rather than the expected 1 for
    the single busy CPU.
    
    This is not an intuitively obvious situation, in fact, it came as quite
    a surprise to me when I first implemented the grace-period kthreads.
    It therefore deserves comments, which this commit supplies.
    
    Reported-by: "Luis R. Rodriguez" <mcgrof@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 695fee7cafe0..2d5cc9315cfb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2052,7 +2052,7 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 }
 
 /*
- * Helper function for wait_event_interruptible_timeout() wakeup
+ * Helper function for swait_event_interruptible_timeout() wakeup
  * at force-quiescent-state time.
  */
 static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
@@ -2191,6 +2191,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("reqwait"));
 			rsp->gp_state = RCU_GP_WAIT_GPS;
+			/* _interruptible() to avoid messing up load average. */
 			swait_event_interruptible(rsp->gp_wq,
 						 READ_ONCE(rsp->gp_flags) &
 						 RCU_GP_FLAG_INIT);
@@ -2224,6 +2225,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
+			/* _interruptible() to avoid messing up load average. */
 			ret = swait_event_interruptible_timeout(rsp->gp_wq,
 					rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-15 16:55         ` Eric W. Biederman
@ 2017-06-15 18:48           ` Luis R. Rodriguez
  2017-06-15 18:48             ` [RFC v2 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
                               ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-15 18:48 UTC (permalink / raw)
  To: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: paul.gortmaker, ebiederm, dmitry.torokhov, linux-kernel,
	Luis R. Rodriguez

While reviewing RCU's interruptible swaits I noticed signals were actually
not expected. Paul explained that the reason signals are not expected is
we use kthreads, which don't get signals, furthermore the code avoided the
uninterruptible swaits as otherwise it would contribute to the system load
average on idle, bumping it from 0 to 2 or 3 (depending on preemption).

Since this can be confusing its best to be explicit about the requirements and
goals. This patch depends on the other killable swaits [0] recently proposed as
well interms of context. Thee patch can however be tested independently if
the hunk is addressed separately.

[0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org

Luis R. Rodriguez (2):
  swait: add idle variants which don't contribute to load average
  rcu: use idle versions of swait to make idle-hack clear

 include/linux/swait.h | 25 +++++++++++++++++++++++++
 kernel/rcu/tree.c     |  4 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC v2 1/2] swait: add idle variants which don't contribute to load average
  2017-06-15 18:48           ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Luis R. Rodriguez
@ 2017-06-15 18:48             ` Luis R. Rodriguez
  2017-06-16  0:47               ` Boqun Feng
  2017-06-16 20:31               ` Eric W. Biederman
  2017-06-15 18:48             ` [RFC v2 2/2] rcu: use idle versions of swait to make idle-hack clear Luis R. Rodriguez
                               ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-15 18:48 UTC (permalink / raw)
  To: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: paul.gortmaker, ebiederm, dmitry.torokhov, linux-kernel,
	Luis R. Rodriguez

There are cases where folks are using an interruptible swait when
using kthreads. This is rather confusing given you'd expect
interruptible waits to be -- interruptible, but kthreads are not
interruptible ! The reason for such practice though is to avoid
having these kthreads contribute to the system load average.

When systems are idle some kthreads may spend a lot of time blocking if
using swait_event_timeout(). This would contribute to the system load
average. On systems without preemption this would mean the load average
of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
this would mean the load average of an idle system is bumped to 3
instead of 0.

This adds proper API using TASK_IDLE to make such goals explicit and
avoid confusion.

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 include/linux/swait.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 2c700694d50a..105c70e23286 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -194,4 +194,29 @@ do {									\
 	__ret;								\
 })
 
+#define __swait_event_idle(wq, condition)				\
+	___swait_event(wq, condition, TASK_IDLE, 0, schedule())
+
+#define swait_event_idle(wq, condition)					\
+({									\
+	int __ret = 0;							\
+	if (!(condition))						\
+		__ret = __swait_event_idle(wq, condition);		\
+	__ret;								\
+})
+
+#define __swait_event_idle_timeout(wq, condition, timeout)		\
+	___swait_event(wq, ___wait_cond_timeout(condition),		\
+		       TASK_IDLE, timeout,				\
+		       __ret = schedule_timeout(__ret))
+
+#define swait_event_idle_timeout(wq, condition, timeout)		\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __swait_event_idle_timeout(wq,			\
+						   condition, timeout);	\
+	__ret;								\
+})
+
 #endif /* _LINUX_SWAIT_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 2/2] rcu: use idle versions of swait to make idle-hack clear
  2017-06-15 18:48           ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Luis R. Rodriguez
  2017-06-15 18:48             ` [RFC v2 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
@ 2017-06-15 18:48             ` Luis R. Rodriguez
  2017-06-15 21:57             ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
  2017-06-20 21:45             ` [PATCH " Luis R. Rodriguez
  3 siblings, 0 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-15 18:48 UTC (permalink / raw)
  To: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: paul.gortmaker, ebiederm, dmitry.torokhov, linux-kernel,
	Luis R. Rodriguez

These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to system load average, even though they are not interruptible
as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes
it clear our goal, and removes confusion about these paths possibly being
interruptible -- they are not.

When the system is idle the RCU grace-period kthread will spend all its time
blocked inside the swait_event_interruptible(). If the interruptible() was
not used, then this kthread would contribute to the load average. This means
that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
rather than the load average of 0 that almost fifty years of UNIX has
conditioned sysadms to expect.

The same argument applies to swait_event_interruptible_timeout() use. The
RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there was only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside the
swait_event_interruptible_timeout(). This would mean that the load average
would be 2 rather than the expected 1 for the single busy CPU.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 695fee7cafe0..1de8e55636c0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2191,7 +2191,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("reqwait"));
 			rsp->gp_state = RCU_GP_WAIT_GPS;
-			swait_event_interruptible(rsp->gp_wq,
+			swait_event_idle(rsp->gp_wq,
 						 READ_ONCE(rsp->gp_flags) &
 						 RCU_GP_FLAG_INIT);
 			rsp->gp_state = RCU_GP_DONE_GPS;
@@ -2224,7 +2224,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = swait_event_interruptible_timeout(rsp->gp_wq,
+			ret = swait_event_idle_timeout(rsp->gp_wq,
 					rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-15 18:48           ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Luis R. Rodriguez
  2017-06-15 18:48             ` [RFC v2 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
  2017-06-15 18:48             ` [RFC v2 2/2] rcu: use idle versions of swait to make idle-hack clear Luis R. Rodriguez
@ 2017-06-15 21:57             ` Paul E. McKenney
  2017-06-15 23:26               ` Luis R. Rodriguez
  2017-06-20 21:45             ` [PATCH " Luis R. Rodriguez
  3 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-15 21:57 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: peterz, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	paul.gortmaker, ebiederm, dmitry.torokhov, linux-kernel

On Thu, Jun 15, 2017 at 11:48:18AM -0700, Luis R. Rodriguez wrote:
> While reviewing RCU's interruptible swaits I noticed signals were actually
> not expected. Paul explained that the reason signals are not expected is
> we use kthreads, which don't get signals, furthermore the code avoided the
> uninterruptible swaits as otherwise it would contribute to the system load
> average on idle, bumping it from 0 to 2 or 3 (depending on preemption).
> 
> Since this can be confusing its best to be explicit about the requirements and
> goals. This patch depends on the other killable swaits [0] recently proposed as
> well interms of context. Thee patch can however be tested independently if
> the hunk is addressed separately.
> 
> [0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org

Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Are you looking to push these or were you wanting me to?

							Thanx, Paul

> Luis R. Rodriguez (2):
>   swait: add idle variants which don't contribute to load average
>   rcu: use idle versions of swait to make idle-hack clear
> 
>  include/linux/swait.h | 25 +++++++++++++++++++++++++
>  kernel/rcu/tree.c     |  4 ++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
> 
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-15 21:57             ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
@ 2017-06-15 23:26               ` Luis R. Rodriguez
  2017-06-15 23:43                 ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-15 23:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Luis R. Rodriguez, peterz, oleg, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, paul.gortmaker, ebiederm,
	dmitry.torokhov, linux-kernel

On Thu, Jun 15, 2017 at 02:57:17PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 15, 2017 at 11:48:18AM -0700, Luis R. Rodriguez wrote:
> > While reviewing RCU's interruptible swaits I noticed signals were actually
> > not expected. Paul explained that the reason signals are not expected is
> > we use kthreads, which don't get signals, furthermore the code avoided the
> > uninterruptible swaits as otherwise it would contribute to the system load
> > average on idle, bumping it from 0 to 2 or 3 (depending on preemption).
> > 
> > Since this can be confusing its best to be explicit about the requirements and
> > goals. This patch depends on the other killable swaits [0] recently proposed as
> > well interms of context. Thee patch can however be tested independently if
> > the hunk is addressed separately.
> > 
> > [0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org
> 
> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Are you looking to push these or were you wanting me to?

I'd be happy for you to take them.

  Luis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-15 23:26               ` Luis R. Rodriguez
@ 2017-06-15 23:43                 ` Paul E. McKenney
  2017-06-16 20:37                   ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-15 23:43 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: peterz, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	paul.gortmaker, ebiederm, dmitry.torokhov, linux-kernel

On Fri, Jun 16, 2017 at 01:26:19AM +0200, Luis R. Rodriguez wrote:
> On Thu, Jun 15, 2017 at 02:57:17PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 15, 2017 at 11:48:18AM -0700, Luis R. Rodriguez wrote:
> > > While reviewing RCU's interruptible swaits I noticed signals were actually
> > > not expected. Paul explained that the reason signals are not expected is
> > > we use kthreads, which don't get signals, furthermore the code avoided the
> > > uninterruptible swaits as otherwise it would contribute to the system load
> > > average on idle, bumping it from 0 to 2 or 3 (depending on preemption).
> > > 
> > > Since this can be confusing its best to be explicit about the requirements and
> > > goals. This patch depends on the other killable swaits [0] recently proposed as
> > > well interms of context. Thee patch can however be tested independently if
> > > the hunk is addressed separately.
> > > 
> > > [0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org
> > 
> > Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > Are you looking to push these or were you wanting me to?
> 
> I'd be happy for you to take them.

OK, let's see if we can get some Acked-by's or Reviewed-by's from the
relevant people.

For but one example, Eric, does this look good to you or are adjustments
needed?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/2] swait: add idle variants which don't contribute to load average
  2017-06-15 18:48             ` [RFC v2 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
@ 2017-06-16  0:47               ` Boqun Feng
  2017-06-20 21:32                 ` Luis R. Rodriguez
  2017-06-16 20:31               ` Eric W. Biederman
  1 sibling, 1 reply; 25+ messages in thread
From: Boqun Feng @ 2017-06-16  0:47 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, paul.gortmaker, ebiederm, dmitry.torokhov,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2883 bytes --]

On Thu, Jun 15, 2017 at 11:48:19AM -0700, Luis R. Rodriguez wrote:
> There are cases where folks are using an interruptible swait when
> using kthreads. This is rather confusing given you'd expect
> interruptible waits to be -- interruptible, but kthreads are not
> interruptible ! The reason for such practice though is to avoid
> having these kthreads contribute to the system load average.
> 
> When systems are idle some kthreads may spend a lot of time blocking if
> using swait_event_timeout(). This would contribute to the system load
> average. On systems without preemption this would mean the load average
> of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
> this would mean the load average of an idle system is bumped to 3
> instead of 0.
> 
> This adds proper API using TASK_IDLE to make such goals explicit and
> avoid confusion.
> 
> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
>  include/linux/swait.h | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/include/linux/swait.h b/include/linux/swait.h
> index 2c700694d50a..105c70e23286 100644
> --- a/include/linux/swait.h
> +++ b/include/linux/swait.h
> @@ -194,4 +194,29 @@ do {									\
>  	__ret;								\
>  })
>  
> +#define __swait_event_idle(wq, condition)				\
> +	___swait_event(wq, condition, TASK_IDLE, 0, schedule())
> +
> +#define swait_event_idle(wq, condition)					\

Better to have some comments before to describe the purpose of this API?
Something like:

/**
 * swait_event_idle - wait uninterruptibly without system load contribution
 * @wq: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_IDLE) until the
 * @condition evaluates to true or a signal is received.
 * The @condition is checked each time the waitqueue @wq is woken up.
 *
 * This function is mostly used when a kthread waits some condition and
 * doesn't want to contribute to system load
 */

so is for swait_event_idle_timeout().

Thoughts?

Regards,
Boqun

> +({									\
> +	int __ret = 0;							\
> +	if (!(condition))						\
> +		__ret = __swait_event_idle(wq, condition);		\
> +	__ret;								\
> +})
> +
> +#define __swait_event_idle_timeout(wq, condition, timeout)		\
> +	___swait_event(wq, ___wait_cond_timeout(condition),		\
> +		       TASK_IDLE, timeout,				\
> +		       __ret = schedule_timeout(__ret))
> +
> +#define swait_event_idle_timeout(wq, condition, timeout)		\
> +({									\
> +	long __ret = timeout;						\
> +	if (!___wait_cond_timeout(condition))				\
> +		__ret = __swait_event_idle_timeout(wq,			\
> +						   condition, timeout);	\
> +	__ret;								\
> +})
> +
>  #endif /* _LINUX_SWAIT_H */
> -- 
> 2.11.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/2] swait: add idle variants which don't contribute to load average
  2017-06-15 18:48             ` [RFC v2 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
  2017-06-16  0:47               ` Boqun Feng
@ 2017-06-16 20:31               ` Eric W. Biederman
  2017-06-19 17:53                 ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2017-06-16 20:31 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, paul.gortmaker, dmitry.torokhov, linux-kernel

"Luis R. Rodriguez" <mcgrof@kernel.org> writes:

> There are cases where folks are using an interruptible swait when
> using kthreads. This is rather confusing given you'd expect
> interruptible waits to be -- interruptible, but kthreads are not
> interruptible ! The reason for such practice though is to avoid
> having these kthreads contribute to the system load average.
>
> When systems are idle some kthreads may spend a lot of time blocking if
> using swait_event_timeout(). This would contribute to the system load
> average. On systems without preemption this would mean the load average
> of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
> this would mean the load average of an idle system is bumped to 3
> instead of 0.
>
> This adds proper API using TASK_IDLE to make such goals explicit and
> avoid confusion.
>
> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
>  include/linux/swait.h | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/include/linux/swait.h b/include/linux/swait.h
> index 2c700694d50a..105c70e23286 100644
> --- a/include/linux/swait.h
> +++ b/include/linux/swait.h
> @@ -194,4 +194,29 @@ do {									\
>  	__ret;								\
>  })
>  
> +#define __swait_event_idle(wq, condition)				\
> +	___swait_event(wq, condition, TASK_IDLE, 0, schedule())
> +
> +#define swait_event_idle(wq, condition)					\
> +({									\
> +	int __ret = 0;							\
> +	if (!(condition))						\
> +		__ret = __swait_event_idle(wq, condition);		\
> +	__ret;								\
> +})

The wait isn't interruptible so a return code doesn't make sense here.

> +#define __swait_event_idle_timeout(wq, condition, timeout)		\
> +	___swait_event(wq, ___wait_cond_timeout(condition),		\
> +		       TASK_IDLE, timeout,				\
> +		       __ret = schedule_timeout(__ret))
> +
> +#define swait_event_idle_timeout(wq, condition, timeout)		\
> +({									\
> +	long __ret = timeout;						\
> +	if (!___wait_cond_timeout(condition))				\
> +		__ret = __swait_event_idle_timeout(wq,			\
> +						   condition, timeout);	\
> +	__ret;								\
> +})
> +
>  #endif /* _LINUX_SWAIT_H */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-15 23:43                 ` Paul E. McKenney
@ 2017-06-16 20:37                   ` Eric W. Biederman
  2017-06-19 17:54                     ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2017-06-16 20:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Luis R. Rodriguez, peterz, oleg, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, paul.gortmaker, dmitry.torokhov,
	linux-kernel

"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:

> On Fri, Jun 16, 2017 at 01:26:19AM +0200, Luis R. Rodriguez wrote:
>> On Thu, Jun 15, 2017 at 02:57:17PM -0700, Paul E. McKenney wrote:
>> > On Thu, Jun 15, 2017 at 11:48:18AM -0700, Luis R. Rodriguez wrote:
>> > > While reviewing RCU's interruptible swaits I noticed signals were actually
>> > > not expected. Paul explained that the reason signals are not expected is
>> > > we use kthreads, which don't get signals, furthermore the code avoided the
>> > > uninterruptible swaits as otherwise it would contribute to the system load
>> > > average on idle, bumping it from 0 to 2 or 3 (depending on preemption).
>> > > 
>> > > Since this can be confusing its best to be explicit about the requirements and
>> > > goals. This patch depends on the other killable swaits [0] recently proposed as
>> > > well interms of context. Thee patch can however be tested independently if
>> > > the hunk is addressed separately.
>> > > 
>> > > [0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org
>> > 
>> > Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> > 
>> > Are you looking to push these or were you wanting me to?
>> 
>> I'd be happy for you to take them.
>
> OK, let's see if we can get some Acked-by's or Reviewed-by's from the
> relevant people.
>
> For but one example, Eric, does this look good to you or are adjustments
> needed?

Other than an unnecessary return code I don't see any issues.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

In truth I am just barely ahead of you folks.  I ran into the same issue
the other day with a piece of my code and someone pointed me to TASK_IDLE.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/2] swait: add idle variants which don't contribute to load average
  2017-06-16 20:31               ` Eric W. Biederman
@ 2017-06-19 17:53                 ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-19 17:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Luis R. Rodriguez, peterz, oleg, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, paul.gortmaker, dmitry.torokhov,
	linux-kernel

On Fri, Jun 16, 2017 at 03:31:51PM -0500, Eric W. Biederman wrote:
> "Luis R. Rodriguez" <mcgrof@kernel.org> writes:
> 
> > There are cases where folks are using an interruptible swait when
> > using kthreads. This is rather confusing given you'd expect
> > interruptible waits to be -- interruptible, but kthreads are not
> > interruptible ! The reason for such practice though is to avoid
> > having these kthreads contribute to the system load average.
> >
> > When systems are idle some kthreads may spend a lot of time blocking if
> > using swait_event_timeout(). This would contribute to the system load
> > average. On systems without preemption this would mean the load average
> > of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
> > this would mean the load average of an idle system is bumped to 3
> > instead of 0.
> >
> > This adds proper API using TASK_IDLE to make such goals explicit and
> > avoid confusion.
> >
> > Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> >  include/linux/swait.h | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/include/linux/swait.h b/include/linux/swait.h
> > index 2c700694d50a..105c70e23286 100644
> > --- a/include/linux/swait.h
> > +++ b/include/linux/swait.h
> > @@ -194,4 +194,29 @@ do {									\
> >  	__ret;								\
> >  })
> >  
> > +#define __swait_event_idle(wq, condition)				\
> > +	___swait_event(wq, condition, TASK_IDLE, 0, schedule())
> > +
> > +#define swait_event_idle(wq, condition)					\
> > +({									\
> > +	int __ret = 0;							\
> > +	if (!(condition))						\
> > +		__ret = __swait_event_idle(wq, condition);		\
> > +	__ret;								\
> > +})
> 
> The wait isn't interruptible so a return code doesn't make sense here.

How about as shown below?

							Thanx, Paul

------------------------------------------------------------------------

commit 48a96574e9c62ca5e13744e86e2219eb38f080b4
Author: Luis R. Rodriguez <mcgrof@kernel.org>
Date:   Thu Jun 15 11:48:19 2017 -0700

    swait: add idle variants which don't contribute to load average
    
    There are cases where folks are using an interruptible swait when
    using kthreads. This is rather confusing given you'd expect
    interruptible waits to be -- interruptible, but kthreads are not
    interruptible ! The reason for such practice though is to avoid
    having these kthreads contribute to the system load average.
    
    When systems are idle some kthreads may spend a lot of time blocking if
    using swait_event_timeout(). This would contribute to the system load
    average. On systems without preemption this would mean the load average
    of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
    this would mean the load average of an idle system is bumped to 3
    instead of 0.
    
    This adds proper API using TASK_IDLE to make such goals explicit and
    avoid confusion.
    
    Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
    [ paulmck: No return value from swait_event_idle per Eric's feedback. ]
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

diff --git a/include/linux/swait.h b/include/linux/swait.h
index c1f9c62a8a50..80d8aee5e7e1 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -169,4 +169,27 @@ do {									\
 	__ret;								\
 })
 
+#define __swait_event_idle(wq, condition)				\
+	___swait_event(wq, condition, TASK_IDLE, 0, schedule())
+
+#define swait_event_idle(wq, condition)					\
+do {									\
+	if (!(condition))						\
+		__swait_event_idle(wq, condition);			\
+} while (0)
+
+#define __swait_event_idle_timeout(wq, condition, timeout)		\
+	___swait_event(wq, ___wait_cond_timeout(condition),		\
+		       TASK_IDLE, timeout,				\
+		       __ret = schedule_timeout(__ret))
+
+#define swait_event_idle_timeout(wq, condition, timeout)		\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __swait_event_idle_timeout(wq,			\
+						   condition, timeout);	\
+	__ret;								\
+})
+
 #endif /* _LINUX_SWAIT_H */

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-16 20:37                   ` Eric W. Biederman
@ 2017-06-19 17:54                     ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-19 17:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Luis R. Rodriguez, peterz, oleg, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, paul.gortmaker, dmitry.torokhov,
	linux-kernel

On Fri, Jun 16, 2017 at 03:37:54PM -0500, Eric W. Biederman wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> 
> > On Fri, Jun 16, 2017 at 01:26:19AM +0200, Luis R. Rodriguez wrote:
> >> On Thu, Jun 15, 2017 at 02:57:17PM -0700, Paul E. McKenney wrote:
> >> > On Thu, Jun 15, 2017 at 11:48:18AM -0700, Luis R. Rodriguez wrote:
> >> > > While reviewing RCU's interruptible swaits I noticed signals were actually
> >> > > not expected. Paul explained that the reason signals are not expected is
> >> > > we use kthreads, which don't get signals, furthermore the code avoided the
> >> > > uninterruptible swaits as otherwise it would contribute to the system load
> >> > > average on idle, bumping it from 0 to 2 or 3 (depending on preemption).
> >> > > 
> >> > > Since this can be confusing its best to be explicit about the requirements and
> >> > > goals. This patch depends on the other killable swaits [0] recently proposed as
> >> > > well interms of context. Thee patch can however be tested independently if
> >> > > the hunk is addressed separately.
> >> > > 
> >> > > [0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@kernel.org
> >> > 
> >> > Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >> > 
> >> > Are you looking to push these or were you wanting me to?
> >> 
> >> I'd be happy for you to take them.
> >
> > OK, let's see if we can get some Acked-by's or Reviewed-by's from the
> > relevant people.
> >
> > For but one example, Eric, does this look good to you or are adjustments
> > needed?
> 
> Other than an unnecessary return code I don't see any issues.
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> In truth I am just barely ahead of you folks.  I ran into the same issue
> the other day with a piece of my code and someone pointed me to TASK_IDLE.

;-) ;-) ;-)

And here is an updated version of the second patch.  Thoughts?

(The bogus comment was not Luis's fault, but I figured I should fix
it while thinking about it.)

							Thanx, Paul

------------------------------------------------------------------------

commit 5877121be4ba90a32298d7a00a678cae5cbb6a82
Author: Luis R. Rodriguez <mcgrof@kernel.org>
Date:   Thu Jun 15 11:48:20 2017 -0700

    rcu: use idle versions of swait to make idle-hack clear
    
    These RCU waits were set to use interruptible waits to avoid the kthreads
    contributing to system load average, even though they are not interruptible
    as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes
    it clear our goal, and removes confusion about these paths possibly being
    interruptible -- they are not.
    
    When the system is idle the RCU grace-period kthread will spend all its time
    blocked inside the swait_event_interruptible(). If the interruptible() was
    not used, then this kthread would contribute to the load average. This means
    that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
    rather than the load average of 0 that almost fifty years of UNIX has
    conditioned sysadms to expect.
    
    The same argument applies to swait_event_interruptible_timeout() use. The
    RCU grace-period kthread spends its time blocked inside this call while
    waiting for grace periods to complete. In particular, if there was only one
    busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
    grace-period kthread would spend almost all its time blocked inside the
    swait_event_interruptible_timeout(). This would mean that the load average
    would be 2 rather than the expected 1 for the single busy CPU.
    
    Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    [ paulmck: Fix indentation and obsolete comment. ]
    Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 695fee7cafe0..94ec7455fc46 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2052,8 +2052,8 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 }
 
 /*
- * Helper function for wait_event_interruptible_timeout() wakeup
- * at force-quiescent-state time.
+ * Helper function for swait_event_idle() wakeup at force-quiescent-state
+ * time.
  */
 static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
 {
@@ -2191,9 +2191,8 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("reqwait"));
 			rsp->gp_state = RCU_GP_WAIT_GPS;
-			swait_event_interruptible(rsp->gp_wq,
-						 READ_ONCE(rsp->gp_flags) &
-						 RCU_GP_FLAG_INIT);
+			swait_event_idle(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+						     RCU_GP_FLAG_INIT);
 			rsp->gp_state = RCU_GP_DONE_GPS;
 			/* Locking provides needed memory barrier. */
 			if (rcu_gp_init(rsp))
@@ -2224,7 +2223,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = swait_event_interruptible_timeout(rsp->gp_wq,
+			ret = swait_event_idle_timeout(rsp->gp_wq,
 					rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/2] swait: add idle variants which don't contribute to load average
  2017-06-16  0:47               ` Boqun Feng
@ 2017-06-20 21:32                 ` Luis R. Rodriguez
  0 siblings, 0 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-20 21:32 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Luis R. Rodriguez, peterz, paulmck, oleg, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, paul.gortmaker, ebiederm,
	dmitry.torokhov, linux-kernel

On Fri, Jun 16, 2017 at 08:47:32AM +0800, Boqun Feng wrote:
> On Thu, Jun 15, 2017 at 11:48:19AM -0700, Luis R. Rodriguez wrote:
> > There are cases where folks are using an interruptible swait when
> > using kthreads. This is rather confusing given you'd expect
> > interruptible waits to be -- interruptible, but kthreads are not
> > interruptible ! The reason for such practice though is to avoid
> > having these kthreads contribute to the system load average.
> > 
> > When systems are idle some kthreads may spend a lot of time blocking if
> > using swait_event_timeout(). This would contribute to the system load
> > average. On systems without preemption this would mean the load average
> > of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
> > this would mean the load average of an idle system is bumped to 3
> > instead of 0.
> > 
> > This adds proper API using TASK_IDLE to make such goals explicit and
> > avoid confusion.
> > 
> > Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> >  include/linux/swait.h | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/include/linux/swait.h b/include/linux/swait.h
> > index 2c700694d50a..105c70e23286 100644
> > --- a/include/linux/swait.h
> > +++ b/include/linux/swait.h
> > @@ -194,4 +194,29 @@ do {									\
> >  	__ret;								\
> >  })
> >  
> > +#define __swait_event_idle(wq, condition)				\
> > +	___swait_event(wq, condition, TASK_IDLE, 0, schedule())
> > +
> > +#define swait_event_idle(wq, condition)					\
> 
> Better to have some comments before to describe the purpose of this API?
> Something like:
> 
> /**
>  * swait_event_idle - wait uninterruptibly without system load contribution
>  * @wq: the waitqueue to wait on
>  * @condition: a C expression for the event to wait for
>  *
>  * The process is put to sleep (TASK_IDLE) until the
>  * @condition evaluates to true or a signal is received.

Except we get no signals.

>  * The @condition is checked each time the waitqueue @wq is woken up.
>  *
>  * This function is mostly used when a kthread waits some condition and
>  * doesn't want to contribute to system load
>  */
> 
> so is for swait_event_idle_timeout().

Will add for both calls and elaborate on the return value for the timeout case.

  Luis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-15 18:48           ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Luis R. Rodriguez
                               ` (2 preceding siblings ...)
  2017-06-15 21:57             ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
@ 2017-06-20 21:45             ` Luis R. Rodriguez
  2017-06-20 21:45               ` [PATCH 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
                                 ` (2 more replies)
  3 siblings, 3 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-20 21:45 UTC (permalink / raw)
  To: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: paul.gortmaker, boqun.feng, ebiederm, dmitry.torokhov,
	linux-kernel, Luis R. Rodriguez

In this proper patch form I've made the non-timeout idle swait void.
I've also integrated Paul's comment / ident changes, and added documentation
as suggested by Boqun.

Let me know if there are issue, otherwise, Paul feel free to take!

Luis R. Rodriguez (2):
  swait: add idle variants which don't contribute to load average
  rcu: use idle versions of swait to make idle-hack clear

 include/linux/swait.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree.c     | 11 +++++------
 2 files changed, 60 insertions(+), 6 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/2] swait: add idle variants which don't contribute to load average
  2017-06-20 21:45             ` [PATCH " Luis R. Rodriguez
@ 2017-06-20 21:45               ` Luis R. Rodriguez
  2017-06-20 21:45               ` [PATCH 2/2] rcu: use idle versions of swait to make idle-hack clear Luis R. Rodriguez
  2017-06-21 16:48               ` [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
  2 siblings, 0 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-20 21:45 UTC (permalink / raw)
  To: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: paul.gortmaker, boqun.feng, ebiederm, dmitry.torokhov,
	linux-kernel, Luis R. Rodriguez

There are cases where folks are using an interruptible swait when
using kthreads. This is rather confusing given you'd expect
interruptible waits to be -- interruptible, but kthreads are not
interruptible ! The reason for such practice though is to avoid
having these kthreads contribute to the system load average.

When systems are idle some kthreads may spend a lot of time blocking if
using swait_event_timeout(). This would contribute to the system load
average. On systems without preemption this would mean the load average
of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
this would mean the load average of an idle system is bumped to 3
instead of 0.

This adds proper API using TASK_IDLE to make such goals explicit and
avoid confusion.

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 include/linux/swait.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 2c700694d50a..10b80900b509 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -194,4 +194,59 @@ do {									\
 	__ret;								\
 })
 
+#define __swait_event_idle(wq, condition)				\
+	(void)___swait_event(wq, condition, TASK_IDLE, 0, schedule())
+
+/**
+ * swait_event_idle - wait without system load contribution
+ * @wq: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ *
+ * The process is put to sleep (TASK_IDLE) until the @condition evaluates to
+ * true. The @condition is checked each time the waitqueue @wq is woken up.
+ *
+ * This function is mostly used when a kthread or workqueue waits for some
+ * condition and doesn't want to contribute to system load. Signals are
+ * ignored.
+ */
+#define swait_event_idle(wq, condition)					\
+do {									\
+	if (condition)							\
+		break;							\
+	__swait_event_idle(wq, condition);				\
+} while (0)
+
+#define __swait_event_idle_timeout(wq, condition, timeout)		\
+	___swait_event(wq, ___wait_cond_timeout(condition),		\
+		       TASK_IDLE, timeout,				\
+		       __ret = schedule_timeout(__ret))
+
+/**
+ * swait_event_idle_timeout - wait up to timeout without load contribution
+ * @wq: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ * @timeout: timeout at which we'll give up in jiffies
+ *
+ * The process is put to sleep (TASK_IDLE) until the @condition evaluates to
+ * true. The @condition is checked each time the waitqueue @wq is woken up.
+ *
+ * This function is mostly used when a kthread or workqueue waits for some
+ * condition and doesn't want to contribute to system load. Signals are
+ * ignored.
+ *
+ * Returns:
+ * 0 if the @condition evaluated to %false after the @timeout elapsed,
+ * 1 if the @condition evaluated to %true after the @timeout elapsed,
+ * or the remaining jiffies (at least 1) if the @condition evaluated
+ * to %true before the @timeout elapsed.
+ */
+#define swait_event_idle_timeout(wq, condition, timeout)		\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __swait_event_idle_timeout(wq,			\
+						   condition, timeout);	\
+	__ret;								\
+})
+
 #endif /* _LINUX_SWAIT_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/2] rcu: use idle versions of swait to make idle-hack clear
  2017-06-20 21:45             ` [PATCH " Luis R. Rodriguez
  2017-06-20 21:45               ` [PATCH 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
@ 2017-06-20 21:45               ` Luis R. Rodriguez
  2017-06-21 16:48               ` [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
  2 siblings, 0 replies; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-20 21:45 UTC (permalink / raw)
  To: peterz, paulmck, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: paul.gortmaker, boqun.feng, ebiederm, dmitry.torokhov,
	linux-kernel, Luis R. Rodriguez

These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to system load average, even though they are not interruptible
as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes
our goal clear, and removes confusion about these paths possibly being
interruptible -- they are not.

When the system is idle the RCU grace-period kthread will spend all its time
blocked inside the swait_event_interruptible(). If the interruptible() was
not used, then this kthread would contribute to the load average. This means
that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
rather than the load average of 0 that almost fifty years of UNIX has
conditioned sysadmins to expect.

The same argument applies to swait_event_interruptible_timeout() use. The
RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there was only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside the
swait_event_interruptible_timeout(). This would mean that the load average
would be 2 rather than the expected 1 for the single busy CPU.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 kernel/rcu/tree.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 695fee7cafe0..94ec7455fc46 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2052,8 +2052,8 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 }
 
 /*
- * Helper function for wait_event_interruptible_timeout() wakeup
- * at force-quiescent-state time.
+ * Helper function for swait_event_idle() wakeup at force-quiescent-state
+ * time.
  */
 static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
 {
@@ -2191,9 +2191,8 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("reqwait"));
 			rsp->gp_state = RCU_GP_WAIT_GPS;
-			swait_event_interruptible(rsp->gp_wq,
-						 READ_ONCE(rsp->gp_flags) &
-						 RCU_GP_FLAG_INIT);
+			swait_event_idle(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+						     RCU_GP_FLAG_INIT);
 			rsp->gp_state = RCU_GP_DONE_GPS;
 			/* Locking provides needed memory barrier. */
 			if (rcu_gp_init(rsp))
@@ -2224,7 +2223,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = swait_event_interruptible_timeout(rsp->gp_wq,
+			ret = swait_event_idle_timeout(rsp->gp_wq,
 					rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-20 21:45             ` [PATCH " Luis R. Rodriguez
  2017-06-20 21:45               ` [PATCH 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
  2017-06-20 21:45               ` [PATCH 2/2] rcu: use idle versions of swait to make idle-hack clear Luis R. Rodriguez
@ 2017-06-21 16:48               ` Paul E. McKenney
  2017-06-21 17:57                 ` Luis R. Rodriguez
  2 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-21 16:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: peterz, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	paul.gortmaker, boqun.feng, ebiederm, dmitry.torokhov,
	linux-kernel

On Tue, Jun 20, 2017 at 02:45:45PM -0700, Luis R. Rodriguez wrote:
> In this proper patch form I've made the non-timeout idle swait void.
> I've also integrated Paul's comment / ident changes, and added documentation
> as suggested by Boqun.
> 
> Let me know if there are issue, otherwise, Paul feel free to take!

Nice docbook comments!  I replaced my modified commits with your new ones,
queued for further review and testing.

Just out of curiosity, why the three-line swait_event_idle() with the
"break" statement instead of the two-line version with the inverted
condition?  (I am fine either way, just curious.)

							Thanx, Paul

> Luis R. Rodriguez (2):
>   swait: add idle variants which don't contribute to load average
>   rcu: use idle versions of swait to make idle-hack clear
> 
>  include/linux/swait.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/rcu/tree.c     | 11 +++++------
>  2 files changed, 60 insertions(+), 6 deletions(-)
> 
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-21 16:48               ` [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
@ 2017-06-21 17:57                 ` Luis R. Rodriguez
  2017-06-21 18:19                   ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Luis R. Rodriguez @ 2017-06-21 17:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Luis R. Rodriguez, peterz, oleg, josh, rostedt,
	mathieu.desnoyers, jiangshanlai, paul.gortmaker, boqun.feng,
	ebiederm, dmitry.torokhov, linux-kernel

On Wed, Jun 21, 2017 at 09:48:46AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 20, 2017 at 02:45:45PM -0700, Luis R. Rodriguez wrote:
> > In this proper patch form I've made the non-timeout idle swait void.
> > I've also integrated Paul's comment / ident changes, and added documentation
> > as suggested by Boqun.
> > 
> > Let me know if there are issue, otherwise, Paul feel free to take!
> 
> Nice docbook comments!  I replaced my modified commits with your new ones,
> queued for further review and testing.

Great thanks!

> Just out of curiosity, why the three-line swait_event_idle() with the
> "break" statement instead of the two-line version with the inverted
> condition?  (I am fine either way, just curious.)

No strong reason -- just went with what swait.h already used before on
similar condition before, in this case it follows swait_event() model.
In the future it may be possible to share a very nasty macro for both
but since that would involve using a helper function as an argument
I deferred that at this point -- it'd be ugly.

  Luis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit
  2017-06-21 17:57                 ` Luis R. Rodriguez
@ 2017-06-21 18:19                   ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2017-06-21 18:19 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: peterz, oleg, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	paul.gortmaker, boqun.feng, ebiederm, dmitry.torokhov,
	linux-kernel

On Wed, Jun 21, 2017 at 07:57:08PM +0200, Luis R. Rodriguez wrote:
> On Wed, Jun 21, 2017 at 09:48:46AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 20, 2017 at 02:45:45PM -0700, Luis R. Rodriguez wrote:
> > > In this proper patch form I've made the non-timeout idle swait void.
> > > I've also integrated Paul's comment / ident changes, and added documentation
> > > as suggested by Boqun.
> > > 
> > > Let me know if there are issue, otherwise, Paul feel free to take!
> > 
> > Nice docbook comments!  I replaced my modified commits with your new ones,
> > queued for further review and testing.
> 
> Great thanks!
> 
> > Just out of curiosity, why the three-line swait_event_idle() with the
> > "break" statement instead of the two-line version with the inverted
> > condition?  (I am fine either way, just curious.)
> 
> No strong reason -- just went with what swait.h already used before on
> similar condition before, in this case it follows swait_event() model.
> In the future it may be possible to share a very nasty macro for both
> but since that would involve using a helper function as an argument
> I deferred that at this point -- it'd be ugly.

Fair enough, works for me!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-06-21 18:19 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-14 23:06 [RFC] rcu: use killable versions of swait Luis R. Rodriguez
2017-06-14 23:43 ` Paul E. McKenney
2017-06-15 15:50   ` Luis R. Rodriguez
2017-06-15 16:22     ` Paul E. McKenney
2017-06-15 16:35       ` Luis R. Rodriguez
2017-06-15 16:55         ` Eric W. Biederman
2017-06-15 18:48           ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Luis R. Rodriguez
2017-06-15 18:48             ` [RFC v2 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
2017-06-16  0:47               ` Boqun Feng
2017-06-20 21:32                 ` Luis R. Rodriguez
2017-06-16 20:31               ` Eric W. Biederman
2017-06-19 17:53                 ` Paul E. McKenney
2017-06-15 18:48             ` [RFC v2 2/2] rcu: use idle versions of swait to make idle-hack clear Luis R. Rodriguez
2017-06-15 21:57             ` [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
2017-06-15 23:26               ` Luis R. Rodriguez
2017-06-15 23:43                 ` Paul E. McKenney
2017-06-16 20:37                   ` Eric W. Biederman
2017-06-19 17:54                     ` Paul E. McKenney
2017-06-20 21:45             ` [PATCH " Luis R. Rodriguez
2017-06-20 21:45               ` [PATCH 1/2] swait: add idle variants which don't contribute to load average Luis R. Rodriguez
2017-06-20 21:45               ` [PATCH 2/2] rcu: use idle versions of swait to make idle-hack clear Luis R. Rodriguez
2017-06-21 16:48               ` [PATCH 0/2] swait: add idle to make idle-hacks on kthreads explicit Paul E. McKenney
2017-06-21 17:57                 ` Luis R. Rodriguez
2017-06-21 18:19                   ` Paul E. McKenney
2017-06-15 17:34         ` [RFC] rcu: use killable versions of swait Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.