linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* kmod: avoid propagating PF_NO_SETAFFINITY into userspace child
@ 2013-11-14  1:51 zhang.yi20
  2013-11-14  5:23 ` Tejun Heo
  0 siblings, 1 reply; 68+ messages in thread
From: zhang.yi20 @ 2013-11-14  1:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Tetsuo Handa, Oleg Nesterov


The kernel worker thread has the PF_NO_SETAFFINITY flag, and it is propagated
into the userspace child. Clearing this flag in usersapce child to enable its
migrating capability.


Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>

--- linux3-12/kernel/kmod.c	2013-11-14 09:06:58.991781656 +0000
+++ linux3-12/kernel/kmod.c	2013-11-14 09:08:47.511781621 +0000
@@ -217,6 +217,7 @@ static int ____call_usermodehelper(void
 	 * Avoid propagating that into the userspace child.
 	 */
 	set_user_nice(current, 0);
+	current->flags &= ~PF_NO_SETAFFINITY;

 	retval = -ENOMEM;
 	new = prepare_kernel_cred(current);


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: kmod: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-14  1:51 kmod: avoid propagating PF_NO_SETAFFINITY into userspace child zhang.yi20
@ 2013-11-14  5:23 ` Tejun Heo
  2013-11-14 11:40   ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-14  5:23 UTC (permalink / raw)
  To: zhang.yi20; +Cc: linux-kernel, Tetsuo Handa, Oleg Nesterov

Hello,

On Thu, Nov 14, 2013 at 09:51:47AM +0800, zhang.yi20@zte.com.cn wrote:
> The kernel worker thread has the PF_NO_SETAFFINITY flag, and it is propagated
> into the userspace child. Clearing this flag in usersapce child to enable its
> migrating capability.
> 
> 
> Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
> 
> --- linux3-12/kernel/kmod.c	2013-11-14 09:06:58.991781656 +0000
> +++ linux3-12/kernel/kmod.c	2013-11-14 09:08:47.511781621 +0000
> @@ -217,6 +217,7 @@ static int ____call_usermodehelper(void
>  	 * Avoid propagating that into the userspace child.
>  	 */
>  	set_user_nice(current, 0);
> +	current->flags &= ~PF_NO_SETAFFINITY;

I'm a bit confused.  kernel_thread() doesn't use workqueue or
kthread_bind(), so the thread shouldn't have PF_NO_SETAFFINITY set.
Have you actually observed this happening?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: kmod: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-14  5:23 ` Tejun Heo
@ 2013-11-14 11:40   ` Oleg Nesterov
  2013-11-14 11:55     ` [PATCH 0/1]: (Was: kmod: avoid propagating PF_NO_SETAFFINITY into userspace child) Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-14 11:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: zhang.yi20, linux-kernel, Tetsuo Handa

On 11/14, Tejun Heo wrote:
>
> Hello,
>
> On Thu, Nov 14, 2013 at 09:51:47AM +0800, zhang.yi20@zte.com.cn wrote:
> > The kernel worker thread has the PF_NO_SETAFFINITY flag, and it is propagated
> > into the userspace child. Clearing this flag in usersapce child to enable its
> > migrating capability.
> >
> >
> > Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
> >
> > --- linux3-12/kernel/kmod.c	2013-11-14 09:06:58.991781656 +0000
> > +++ linux3-12/kernel/kmod.c	2013-11-14 09:08:47.511781621 +0000
> > @@ -217,6 +217,7 @@ static int ____call_usermodehelper(void
> >  	 * Avoid propagating that into the userspace child.
> >  	 */
> >  	set_user_nice(current, 0);
> > +	current->flags &= ~PF_NO_SETAFFINITY;
>
> I'm a bit confused.  kernel_thread() doesn't use workqueue or
> kthread_bind(),

Yes, but kernel_thread() is called by the worker thread which has
PF_NO_SETAFFINITY, this flag is copied to child->flags.

Looks like Zhang is right... But I'd suggest to change flush_old_exec()
instead (see "current->flags &= ...").

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 0/1]: (Was: kmod: avoid propagating PF_NO_SETAFFINITY into userspace child)
  2013-11-14 11:40   ` Oleg Nesterov
@ 2013-11-14 11:55     ` Oleg Nesterov
  2013-11-14 11:56       ` [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-14 11:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: zhang.yi20, linux-kernel, Tetsuo Handa

On 11/14, Oleg Nesterov wrote:
>
> Yes, but kernel_thread() is called by the worker thread which has
> PF_NO_SETAFFINITY, this flag is copied to child->flags.

Hmm, and it seems that create_worker() needs a minor fix, please see
the patch.

> Looks like Zhang is right... But I'd suggest to change flush_old_exec()
> instead (see "current->flags &= ...").

Yes.


But... May be we should (also?) change copy_flags() to clear
PF_NO_SETAFFINITY? Probably not, this can lead to the subtle behaviour
changes. But I am wondering if there is any kthread() which relies on
PF_NO_SETAFFINITY copied by kernel_thread().

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
  2013-11-14 11:55     ` [PATCH 0/1]: (Was: kmod: avoid propagating PF_NO_SETAFFINITY into userspace child) Oleg Nesterov
@ 2013-11-14 11:56       ` Oleg Nesterov
  2013-11-22 23:13         ` Tejun Heo
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-14 11:56 UTC (permalink / raw)
  To: Tejun Heo; +Cc: zhang.yi20, linux-kernel, Tetsuo Handa

Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
in create_worker(). Otherwise userland can change ->cpus_allowed
in between.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/workqueue.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 987293d..775af74 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1736,16 +1736,16 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (IS_ERR(worker->task))
 		goto fail;
 
+	set_user_nice(worker->task, pool->attrs->nice);
+
+	/* prevent userland from meddling with cpumask of workqueue workers */
+	worker->task->flags |= PF_NO_SETAFFINITY;
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
 	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
 	 */
-	set_user_nice(worker->task, pool->attrs->nice);
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
-	/* prevent userland from meddling with cpumask of workqueue workers */
-	worker->task->flags |= PF_NO_SETAFFINITY;
-
 	/*
 	 * The caller is responsible for ensuring %POOL_DISASSOCIATED
 	 * remains stable across this function.  See the comments above the
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
  2013-11-14 11:56       ` [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY Oleg Nesterov
@ 2013-11-22 23:13         ` Tejun Heo
       [not found]           ` <OF36E72FA9.51146BE3-ON48257C2E.0008BC6D-48257C2E.0008FF9C@zte.com.cn>
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-22 23:13 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: zhang.yi20, lkml, Tetsuo Handa

On Thu, Nov 14, 2013 at 6:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
> in create_worker(). Otherwise userland can change ->cpus_allowed
> in between.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Applied to wq/for-3.13-fixes.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: 答复: Re: [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
       [not found]           ` <OF36E72FA9.51146BE3-ON48257C2E.0008BC6D-48257C2E.0008FF9C@zte.com.cn>
@ 2013-11-25 12:14             ` Oleg Nesterov
  2013-11-26  2:10               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child zhang.yi20
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-25 12:14 UTC (permalink / raw)
  To: zhang.yi20; +Cc: Tejun Heo, htejun, lkml, Tetsuo Handa

On 11/25, zhang.yi20@zte.com.cn wrote:
>
> htejun@gmail.com wrote on 2013/11/23 07:13:43:
>
> >
> > Re: [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
> >
> > On Thu, Nov 14, 2013 at 6:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > > Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
> > > in create_worker(). Otherwise userland can change ->cpus_allowed
> > > in between.
> > >
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> >
> > Applied to wq/for-3.13-fixes.
> >
> > Thanks!
> >
> > --
> > tejun
>
> How about the first patch?

Let me quote myself:

	Looks like Zhang is right... But I'd suggest to change flush_old_exec()
	instead (see "current->flags &= ...").

Do you agree? If yes, could you make v2? flush_old_exec() already clears
the unwanted PF_ bits, I do not think we should add another place.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-25 12:14             ` 答复: " Oleg Nesterov
@ 2013-11-26  2:10               ` zhang.yi20
  2013-11-26 18:04                 ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: zhang.yi20 @ 2013-11-26  2:10 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: htejun, lkml, Tetsuo Handa, Tejun Heo

Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
a kernel worker thread which has PF_NO_SETAFFINITY set.
Clearing this flag in usersapce child to enable its migrating capability.


Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>

--- linux-3.12.old/fs/exec.c	2013-11-26 08:53:12.175811856 +0000
+++ linux-3.12/fs/exec.c	2013-11-26 09:26:53.575999604 +0000
@@ -1091,7 +1091,8 @@ int flush_old_exec(struct linux_binprm *

 	set_fs(USER_DS);
 	current->flags &=
-		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE);
+		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE
+		| PF_NO_SETAFFINITY);
 	flush_thread();
 	current->personality &= ~bprm->per_clear;


Thanks.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-26  2:10               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child zhang.yi20
@ 2013-11-26 18:04                 ` Oleg Nesterov
  2013-11-27  2:07                   ` zhang.yi20
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-26 18:04 UTC (permalink / raw)
  To: zhang.yi20; +Cc: htejun, lkml, Tetsuo Handa, Tejun Heo

On 11/26, zhang.yi20@zte.com.cn wrote:
>
> Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
> a kernel worker thread which has PF_NO_SETAFFINITY set.

... and this worker thread can do kernel_thread() to create the child.

> Clearing this flag in usersapce child to enable its migrating capability.
>
>
> Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
>
> --- linux-3.12.old/fs/exec.c	2013-11-26 08:53:12.175811856 +0000
> +++ linux-3.12/fs/exec.c	2013-11-26 09:26:53.575999604 +0000
> @@ -1091,7 +1091,8 @@ int flush_old_exec(struct linux_binprm *
>
>  	set_fs(USER_DS);
>  	current->flags &=
> -		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE);
> +		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE
> +		| PF_NO_SETAFFINITY);

Thanks,

Acked-by: Oleg Nesterov <oleg@redhat.com>




Can't resist... perhaps

	current->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
			    PF_NOFREEZE | PF_NO_SETAFFINITY);

looks a bit better, but this is minor and I won't insist.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-26 18:04                 ` Oleg Nesterov
@ 2013-11-27  2:07                   ` zhang.yi20
  2013-11-27 13:20                     ` Oleg Nesterov
  2013-11-27 18:31                     ` Tejun Heo
  0 siblings, 2 replies; 68+ messages in thread
From: zhang.yi20 @ 2013-11-27  2:07 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: htejun, lkml, Tetsuo Handa, Tejun Heo

Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
a kernel worker thread which has PF_NO_SETAFFINITY set, and this worker thread
can do kernel_thread() to create the child.
Clearing this flag in usersapce child to enable its migrating capability.


Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>

--- linux-3.12.old/fs/exec.c	2013-11-26 08:53:12.175811856 +0000
+++ linux-3.12/fs/exec.c	2013-11-27 09:36:56.231972168 +0000
@@ -1090,8 +1090,8 @@ int flush_old_exec(struct linux_binprm *
 	bprm->mm = NULL;		/* We're using it now */

 	set_fs(USER_DS);
-	current->flags &=
-		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE);
+	current->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
+					PF_NOFREEZE | PF_NO_SETAFFINITY);
 	flush_thread();
 	current->personality &= ~bprm->per_clear;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-27  2:07                   ` zhang.yi20
@ 2013-11-27 13:20                     ` Oleg Nesterov
  2013-11-27 18:31                     ` Tejun Heo
  1 sibling, 0 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-27 13:20 UTC (permalink / raw)
  To: zhang.yi20; +Cc: htejun, lkml, Tetsuo Handa, Tejun Heo

On 11/27, zhang.yi20@zte.com.cn wrote:
>
> Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
> a kernel worker thread which has PF_NO_SETAFFINITY set, and this worker thread
> can do kernel_thread() to create the child.
> Clearing this flag in usersapce child to enable its migrating capability.
>
>
> Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>

Thanks!

Acked-by: Oleg Nesterov <oleg@redhat.com>

> --- linux-3.12.old/fs/exec.c	2013-11-26 08:53:12.175811856 +0000
> +++ linux-3.12/fs/exec.c	2013-11-27 09:36:56.231972168 +0000
> @@ -1090,8 +1090,8 @@ int flush_old_exec(struct linux_binprm *
>  	bprm->mm = NULL;		/* We're using it now */
> 
>  	set_fs(USER_DS);
> -	current->flags &=
> -		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE);
> +	current->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
> +					PF_NOFREEZE | PF_NO_SETAFFINITY);
>  	flush_thread();
>  	current->personality &= ~bprm->per_clear;
> 
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-27  2:07                   ` zhang.yi20
  2013-11-27 13:20                     ` Oleg Nesterov
@ 2013-11-27 18:31                     ` Tejun Heo
  2013-11-28  9:13                       ` Peter Zijlstra
  1 sibling, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-27 18:31 UTC (permalink / raw)
  To: zhang.yi20; +Cc: Oleg Nesterov, lkml, Tetsuo Handa, Ingo Molnar, Peter Zijlstra

On Wed, Nov 27, 2013 at 10:07:03AM +0800, zhang.yi20@zte.com.cn wrote:
> Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
> a kernel worker thread which has PF_NO_SETAFFINITY set, and this worker thread
> can do kernel_thread() to create the child.
> Clearing this flag in usersapce child to enable its migrating capability.
> 
> 
> Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>

Acked-by: Tejun Heo <tj@kernel.org>

cc'ing Ingo and Peter.  Ingo, I think this one doesn't really suit the
workqueue tree.  Can you please pick this one up w/ Oleg's ack added
and stable cc'd?  The original patch is

  http://article.gmane.org/gmane.linux.kernel/1602429/raw

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-27 18:31                     ` Tejun Heo
@ 2013-11-28  9:13                       ` Peter Zijlstra
  2013-11-28 11:45                         ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28  9:13 UTC (permalink / raw)
  To: Tejun Heo; +Cc: zhang.yi20, Oleg Nesterov, lkml, Tetsuo Handa, Ingo Molnar

On Wed, Nov 27, 2013 at 01:31:17PM -0500, Tejun Heo wrote:
> On Wed, Nov 27, 2013 at 10:07:03AM +0800, zhang.yi20@zte.com.cn wrote:
> > Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
> > a kernel worker thread which has PF_NO_SETAFFINITY set, and this worker thread
> > can do kernel_thread() to create the child.
> > Clearing this flag in usersapce child to enable its migrating capability.
> > 
> > 
> > Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> 
> cc'ing Ingo and Peter.  Ingo, I think this one doesn't really suit the
> workqueue tree.  Can you please pick this one up w/ Oleg's ack added
> and stable cc'd?  The original patch is
> 
>   http://article.gmane.org/gmane.linux.kernel/1602429/raw

So I don't get the problem; aren't all usermode helper thingies spawned
by the khelper task, which doesn't have PG_NO_SETAFFINITY set?

So how come this is a problem?

The Changelog is not explaining anything much -- so no I will not take
this patch.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28  9:13                       ` Peter Zijlstra
@ 2013-11-28 11:45                         ` Oleg Nesterov
  2013-11-28 12:17                           ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 11:45 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> On Wed, Nov 27, 2013 at 01:31:17PM -0500, Tejun Heo wrote:
> > On Wed, Nov 27, 2013 at 10:07:03AM +0800, zhang.yi20@zte.com.cn wrote:
> > > Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
> > > a kernel worker thread which has PF_NO_SETAFFINITY set, and this worker thread
> > > can do kernel_thread() to create the child.
> > > Clearing this flag in usersapce child to enable its migrating capability.
> > >
> > >
> > > Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
> >
> > Acked-by: Tejun Heo <tj@kernel.org>
> >
> > cc'ing Ingo and Peter.  Ingo, I think this one doesn't really suit the
> > workqueue tree.  Can you please pick this one up w/ Oleg's ack added
> > and stable cc'd?  The original patch is
> >
> >   http://article.gmane.org/gmane.linux.kernel/1602429/raw
>
> So I don't get the problem; aren't all usermode helper thingies spawned
> by the khelper task, which doesn't have PG_NO_SETAFFINITY set?

It has? khelper is a workqueue thread, this flag is set by create_worker().

And it does kernel_thread() (not kthread_create()) so the child gets this
flag too.

> The Changelog is not explaining anything much -- so no I will not take
> this patch.

Well, perhaps the changelog can be more verbose and clear...

Zhang, I'm afraid you have to send v3 ;)

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 11:45                         ` Oleg Nesterov
@ 2013-11-28 12:17                           ` Peter Zijlstra
  2013-11-28 13:31                             ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 12:17 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 12:45:42PM +0100, Oleg Nesterov wrote:
> It has? khelper is a workqueue thread, this flag is set by create_worker().
> 
> And it does kernel_thread() (not kthread_create()) so the child gets this
> flag too.

Urgh, but that's still completely wrong. khelper is a singlethread
workqueue, those should be unbound and therefore should not have this
flag set at all.

In fact, I know people want to set affinity on khelper to ensure all the
usermode helper tasks get spawned on the specific system-cpus and leave
the rest of the system unperturbed.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 12:17                           ` Peter Zijlstra
@ 2013-11-28 13:31                             ` Oleg Nesterov
  2013-11-28 13:39                               ` Peter Zijlstra
  2013-11-28 13:41                               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child Peter Zijlstra
  0 siblings, 2 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 13:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> On Thu, Nov 28, 2013 at 12:45:42PM +0100, Oleg Nesterov wrote:
> > It has? khelper is a workqueue thread, this flag is set by create_worker().
> >
> > And it does kernel_thread() (not kthread_create()) so the child gets this
> > flag too.
>
> Urgh, but that's still completely wrong. khelper is a singlethread
> workqueue, those should be unbound and therefore should not have this
> flag set at all.

Well. This is debatable, but I leave this to you and Tejun ;)

But:

> In fact, I know people want to set affinity on khelper

This is not that simple. Note that khelper itself is the rescuer thread,
it doesn't not process the works. There are other kworker/u* threads which
run the work queued on khelper_wq. There is a pool of threads.

> to ensure all the
> usermode helper tasks get spawned on the specific system-cpus and leave
> the rest of the system unperturbed.

I _guess_ usermodehelper_init() should use WQ_SYSFS then, and in this case
the user can write to wq_cpumask_store somewhere in /sys/.

Not sure I read this code correctly though, I didn't check.

But PF_NO_SETAFFINITY should be cleared on exec anyway, so I still think
the patch does the right thing.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 13:31                             ` Oleg Nesterov
@ 2013-11-28 13:39                               ` Peter Zijlstra
  2013-11-28 14:13                                 ` Tejun Heo
  2013-11-28 14:23                                 ` Oleg Nesterov
  2013-11-28 13:41                               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child Peter Zijlstra
  1 sibling, 2 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 13:39 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 02:31:52PM +0100, Oleg Nesterov wrote:
> On 11/28, Peter Zijlstra wrote:
> >
> > On Thu, Nov 28, 2013 at 12:45:42PM +0100, Oleg Nesterov wrote:
> > > It has? khelper is a workqueue thread, this flag is set by create_worker().
> > >
> > > And it does kernel_thread() (not kthread_create()) so the child gets this
> > > flag too.
> >
> > Urgh, but that's still completely wrong. khelper is a singlethread
> > workqueue, those should be unbound and therefore should not have this
> > flag set at all.
> 
> Well. This is debatable, but I leave this to you and Tejun ;)

How can that be debatable? I don't see a single argument in favour of
doing that; its plain ridiculous.

> > In fact, I know people want to set affinity on khelper
> 
> This is not that simple. Note that khelper itself is the rescuer thread,
> it doesn't not process the works. There are other kworker/u* threads which
> run the work queued on khelper_wq. There is a pool of threads.

That's just fucked. WTF does singlethreaded mean then?

A single parent process for all usermode helpers makes so much sense;
not doing it is just weird.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 13:31                             ` Oleg Nesterov
  2013-11-28 13:39                               ` Peter Zijlstra
@ 2013-11-28 13:41                               ` Peter Zijlstra
  2013-11-28 14:05                                 ` Tejun Heo
  2013-11-28 14:34                                 ` Oleg Nesterov
  1 sibling, 2 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 13:41 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 02:31:52PM +0100, Oleg Nesterov wrote:
> I _guess_ usermodehelper_init() should use WQ_SYSFS then, and in this case
> the user can write to wq_cpumask_store somewhere in /sys/.

WTF is that and why are we creating alternative affinity interfaces when
sched_setaffinity() is a prefectly fine one?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 13:41                               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child Peter Zijlstra
@ 2013-11-28 14:05                                 ` Tejun Heo
  2013-11-28 14:34                                 ` Oleg Nesterov
  1 sibling, 0 replies; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hello, Peter.

On Thu, Nov 28, 2013 at 02:41:24PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 28, 2013 at 02:31:52PM +0100, Oleg Nesterov wrote:
> > I _guess_ usermodehelper_init() should use WQ_SYSFS then, and in this case
> > the user can write to wq_cpumask_store somewhere in /sys/.
> 
> WTF is that and why are we creating alternative affinity interfaces when
> sched_setaffinity() is a prefectly fine one?

Hmmm?  Because workqueue is a shared worker pool implementation.  The
attributes are per-workqueue and if two workqueues have a single set
of attributes, they share the same execution resources.  There's no
one-to-one relationship between a worker thread and a workqueue;
otherwise, we end up with a ton of tasks sitting around doing nothing,
so you can't individually do sched_setaffinity() on a task and expect
it to work.

The singlethread name just exists for compatibility and there's new
interface named alloc_ordered_workqueue() and it creates a workqueue
which execute work items one-by-one in issue order.  For singlethread
/ ordered workqueues with rescuer, it could work to always execute
that work item on the rescuer as we're reserving an execution resource
anyway, but that'd unnecessarily increase cache footprint from
actively using more tasks than necessary and also increase code
complexity.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 13:39                               ` Peter Zijlstra
@ 2013-11-28 14:13                                 ` Tejun Heo
  2013-11-28 14:31                                   ` Peter Zijlstra
  2013-11-28 14:38                                   ` Peter Zijlstra
  2013-11-28 14:23                                 ` Oleg Nesterov
  1 sibling, 2 replies; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hello,

> > > Urgh, but that's still completely wrong. khelper is a singlethread
> > > workqueue, those should be unbound and therefore should not have this
> > > flag set at all.
> > 
> > Well. This is debatable, but I leave this to you and Tejun ;)
> 
> How can that be debatable? I don't see a single argument in favour of
> doing that; its plain ridiculous.

As the flag name suggests, it prevents userland from changing affinity
of the worker threads which we need whether the worker is confined to
a cpu, NUMA node, random subset of CPUs or not at all.  Please note
that there's no one-to-one mapping between a worker and any given
workqueue.  workqueue can't allow userland to set affinity on random
workers.  They're shared resources.

> > > In fact, I know people want to set affinity on khelper
> > 
> > This is not that simple. Note that khelper itself is the rescuer thread,
> > it doesn't not process the works. There are other kworker/u* threads which
> > run the work queued on khelper_wq. There is a pool of threads.
> 
> That's just fucked. WTF does singlethreaded mean then?

As explained in the other reply, it's a carried-over name and will be
replaced by alloc_ordered_workqueue().  Note that if khelper_wq is
converted to use alloc_ordered_workqueue(), the task named "khelper"
wouldn't even exist as there's no reason for the wq to have a rescuer.

> A single parent process for all usermode helpers makes so much sense;
> not doing it is just weird.

If we're gonna allow userland to play with the parent attributes,
yeah, that'd make sense.  I'm not sure whether that's an interface
that we'd want to commit to tho?  Do we really want to tell userland
"there will always be a kernel task khelper and if you change that
one's attributes all processes forked from it will inherit those
attributes no matter what they are."  I think we'd want something more
specific cause that's a lot of commitment to things that we haven't
carefully thought about.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 13:39                               ` Peter Zijlstra
  2013-11-28 14:13                                 ` Tejun Heo
@ 2013-11-28 14:23                                 ` Oleg Nesterov
  2013-11-28 14:31                                   ` Tejun Heo
  1 sibling, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 14:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> On Thu, Nov 28, 2013 at 02:31:52PM +0100, Oleg Nesterov wrote:
> > On 11/28, Peter Zijlstra wrote:
> > >
> > > On Thu, Nov 28, 2013 at 12:45:42PM +0100, Oleg Nesterov wrote:
> > > > It has? khelper is a workqueue thread, this flag is set by create_worker().
> > > >
> > > > And it does kernel_thread() (not kthread_create()) so the child gets this
> > > > flag too.
> > >
> > > Urgh, but that's still completely wrong. khelper is a singlethread
> > > workqueue, those should be unbound and therefore should not have this
> > > flag set at all.
> >
> > Well. This is debatable, but I leave this to you and Tejun ;)
>
> How can that be debatable? I don't see a single argument in favour of
> doing that; its plain ridiculous.

Let me repeat, I do not pretend I understand the current implementation
in details.

However, I'd like to say that I do not think think this is ridiculous.
I think this is clever.


> > > In fact, I know people want to set affinity on khelper
> >
> > This is not that simple. Note that khelper itself is the rescuer thread,
> > it doesn't not process the works. There are other kworker/u* threads which
> > run the work queued on khelper_wq. There is a pool of threads.
>
> That's just fucked. WTF does singlethreaded mean then?

Yes, the naming is misleading. But it was always misleading.

"singlethreaded" meant a single thread, yes, but this just reflected the
implementation details. What it actually meant is: not bound to any cpu,
and the works can't race with each other.

create_singlethread_workqueue() still has the same semantics due to
WQ_UNBOUND && max_active == 1. So in this sense (max_active == 1) it
is still single threaded, just (iiuc, Tejun can correct me) it does
not guarantee that the kernel thread which actually runs the work will
be always the same.

> A single parent process for all usermode helpers makes so much sense;

This was never true, at least in UMH_WAIT_PROC case.

> not doing it is just weird.

Why?

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:13                                 ` Tejun Heo
@ 2013-11-28 14:31                                   ` Peter Zijlstra
  2013-11-28 14:38                                     ` Tejun Heo
  2013-11-28 14:43                                     ` Peter Zijlstra
  2013-11-28 14:38                                   ` Peter Zijlstra
  1 sibling, 2 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 09:13:29AM -0500, Tejun Heo wrote:
> Hello,
> 
> > > > Urgh, but that's still completely wrong. khelper is a singlethread
> > > > workqueue, those should be unbound and therefore should not have this
> > > > flag set at all.
> > > 
> > > Well. This is debatable, but I leave this to you and Tejun ;)
> > 
> > How can that be debatable? I don't see a single argument in favour of
> > doing that; its plain ridiculous.
> 
> As the flag name suggests, it prevents userland from changing affinity
> of the worker threads which we need whether the worker is confined to
> a cpu, NUMA node, random subset of CPUs or not at all.  Please note
> that there's no one-to-one mapping between a worker and any given
> workqueue.  workqueue can't allow userland to set affinity on random
> workers.  They're shared resources.

But the WQ_UNBOUND thingies should be just that and should thus not have
the NO_SETAFFINITY flag set because there is no valid reason to have it
set.

Regardless of whether the threads are shared between unbound workqueues
or not.

Sure, we absolutely must set it for per-cpu workqueues (and their
workers) otherwise we cannot guarantee correctness. Same for per-node if
we have that.

But absolutely not for unbound.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:23                                 ` Oleg Nesterov
@ 2013-11-28 14:31                                   ` Tejun Heo
  2013-11-28 15:00                                     ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:31 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Peter Zijlstra, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hello,

On Thu, Nov 28, 2013 at 03:23:59PM +0100, Oleg Nesterov wrote:
> Yes, the naming is misleading. But it was always misleading.
> "singlethreaded" meant a single thread, yes, but this just reflected the
> implementation details. What it actually meant is: not bound to any cpu,
> and the works can't race with each other.

Way back when it was actually a singlethread, the naming was kinda
okay.  Now it's a deprecated interface.  I just didn't finish
convering the existing ones to alloc_ordered_workqueue().

> create_singlethread_workqueue() still has the same semantics due to
> WQ_UNBOUND && max_active == 1. So in this sense (max_active == 1) it
> is still single threaded, just (iiuc, Tejun can correct me) it does
> not guarantee that the kernel thread which actually runs the work will
> be always the same.

So, the thing with create_singlethread_workqueue() is that the
original implementation guaranteed dedicated execution resource, which
means that it requires WQ_RESCUER in the new (well, newer)
implementation.  Also, singlethread workqueue used to be used not
because singlethreadedness was necessary but just to avoid creating
all the per-cpu workers, which is no longer necessary and actually is
less efficient.  So, converting create_singlethread_workqueue()
instances to the new interface usually involves making the two
determinations.

* Is WQ_RESCUER actually necessary?  If not, WQ_RESCUER will be
  dropped and the task bearing the name of the workqueue will no
  longer exist.

* Is ordered execution necessary?  If not, it can be converted to
  alloc_workqueue() or just to use system_wq.

khelper is special as its attributes get inherited to its children,
so, yeah, we probably wanna keep that one's cpumask set to all.
Hmmm... I think it's already broken.  So, on NUMA machines, unbound
workqueues will be segmented to each NUMA node by default.  If khelper
is gonna keep using workqueue, it should set turn on no_numa attribute
explicitly.  I'll prep a patch.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 13:41                               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child Peter Zijlstra
  2013-11-28 14:05                                 ` Tejun Heo
@ 2013-11-28 14:34                                 ` Oleg Nesterov
  1 sibling, 0 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 14:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> On Thu, Nov 28, 2013 at 02:31:52PM +0100, Oleg Nesterov wrote:
> > I _guess_ usermodehelper_init() should use WQ_SYSFS then, and in this case
> > the user can write to wq_cpumask_store somewhere in /sys/.
>
> WTF is that and why are we creating alternative affinity interfaces when
> sched_setaffinity() is a prefectly fine one?

Because there is no a simple workqueue/thread connection, I guess.

And I do not understand why do you dislike this.

For example. Please note that with the new design we can even kill
khelper_wq and the ugly kmod_thread_locker hack (just in case, I am not
saying that the patch which added kmod_thread_locker was ugly ;).

We can just use one of the system_ WQ_UNBOUND workqueues which has
the "large enough" max_active.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:31                                   ` Peter Zijlstra
@ 2013-11-28 14:38                                     ` Tejun Heo
  2013-11-28 14:47                                       ` Peter Zijlstra
  2013-11-28 14:55                                       ` Peter Zijlstra
  2013-11-28 14:43                                     ` Peter Zijlstra
  1 sibling, 2 replies; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hey, Peter.

On Thu, Nov 28, 2013 at 03:31:45PM +0100, Peter Zijlstra wrote:
> But the WQ_UNBOUND thingies should be just that and should thus not have
> the NO_SETAFFINITY flag set because there is no valid reason to have it
> set.
> 
> Regardless of whether the threads are shared between unbound workqueues
> or not.

Hah?  No, we do not want to allow userland to be able to set
affinities on any workqueue workers, period.  That's just inviting
people to do weirdest things and then reporting things like "crypt
jobs on some of our 500 machines end up stuck on a single cpu once in
a while" which will eventually be tracked down to some weird shell
script setting affinity on workers doing something else.

We really want to insulate workers and pool operation from userland.
e.g. unbound workqueues now default to per-NUMA affinity unless
explicitly told not to, which leads to better overall behavior for
most workloads.  We do wanna keep those details from userland so that
they can be improved in the future too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:13                                 ` Tejun Heo
  2013-11-28 14:31                                   ` Peter Zijlstra
@ 2013-11-28 14:38                                   ` Peter Zijlstra
  2013-11-28 14:45                                     ` Tejun Heo
  2013-11-28 15:34                                     ` Oleg Nesterov
  1 sibling, 2 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 09:13:29AM -0500, Tejun Heo wrote:
> > A single parent process for all usermode helpers makes so much sense;
> > not doing it is just weird.
> 
> If we're gonna allow userland to play with the parent attributes,
> yeah, that'd make sense.  I'm not sure whether that's an interface
> that we'd want to commit to tho?  Do we really want to tell userland
> "there will always be a kernel task khelper and if you change that
> one's attributes all processes forked from it will inherit those
> attributes no matter what they are."  I think we'd want something more
> specific cause that's a lot of commitment to things that we haven't
> carefully thought about.

It seems like a perfectly fine interface to me. And much preferable to
creating yet another weird interface to manage tasks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:31                                   ` Peter Zijlstra
  2013-11-28 14:38                                     ` Tejun Heo
@ 2013-11-28 14:43                                     ` Peter Zijlstra
  2013-11-28 14:53                                       ` Tejun Heo
  1 sibling, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 03:31:45PM +0100, Peter Zijlstra wrote:
> Sure, we absolutely must set it for per-cpu workqueues (and their
> workers) otherwise we cannot guarantee correctness. Same for per-node if
> we have that.

On that, the per-node thing is debatable. There's no correctness issues
with per-node stuff as we have with per-cpu storage.

And if there's no correctness implications we should not force things.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:38                                   ` Peter Zijlstra
@ 2013-11-28 14:45                                     ` Tejun Heo
  2013-11-28 14:53                                       ` Peter Zijlstra
  2013-11-28 15:34                                     ` Oleg Nesterov
  1 sibling, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:45 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hello,

On Thu, Nov 28, 2013 at 03:38:57PM +0100, Peter Zijlstra wrote:
> It seems like a perfectly fine interface to me. And much preferable to
> creating yet another weird interface to manage tasks.

The problem with that is it isn't an interface where the user
specifies what's desired but more of just an exposed facet of
implementation details and locks us into either keeping that single
parent model for the eternity or doing a shitty hack like extracing
attributes from that task if we ever need to change the implementation
for whatever reason.  In general, it's a much better idea to have an
interface where the feature supported is represented explicitly and
finitely.  If this is something which is actually necessary, I'd vote
for having an explicit interface for the desired feature, whatever
that may be.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:38                                     ` Tejun Heo
@ 2013-11-28 14:47                                       ` Peter Zijlstra
  2013-11-28 14:51                                         ` Tejun Heo
  2013-11-28 14:55                                       ` Peter Zijlstra
  1 sibling, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 09:38:48AM -0500, Tejun Heo wrote:
> Hey, Peter.
> 
> On Thu, Nov 28, 2013 at 03:31:45PM +0100, Peter Zijlstra wrote:
> > But the WQ_UNBOUND thingies should be just that and should thus not have
> > the NO_SETAFFINITY flag set because there is no valid reason to have it
> > set.
> > 
> > Regardless of whether the threads are shared between unbound workqueues
> > or not.
> 
> Hah?  No, we do not want to allow userland to be able to set
> affinities on any workqueue workers, period.  That's just inviting
> people to do weirdest things and then reporting things like "crypt
> jobs on some of our 500 machines end up stuck on a single cpu once in
> a while" which will eventually be tracked down to some weird shell
> script setting affinity on workers doing something else.
> 
> We really want to insulate workers and pool operation from userland.
> e.g. unbound workqueues now default to per-NUMA affinity unless
> explicitly told not to, which leads to better overall behavior for
> most workloads.  We do wanna keep those details from userland so that
> they can be improved in the future too.

Bah, windows mentality of we know better.

If they do stupid they get stupid; the only thing we should be concerned
about is correctness, they shouldn't be able to crash the system.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:47                                       ` Peter Zijlstra
@ 2013-11-28 14:51                                         ` Tejun Heo
  0 siblings, 0 replies; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 03:47:14PM +0100, Peter Zijlstra wrote:
> Bah, windows mentality of we know better.
> 
> If they do stupid they get stupid; the only thing we should be concerned
> about is correctness, they shouldn't be able to crash the system.

It isn't about that we know better but more that we do not want to
leak implementation details to userland.  "Whatever isn't crashing is
okay" doesn't work because we aren't supposed to change behaviors
which are depended upon from userland.  Unless what's exposed is
actively managed, we end up being locked into specific unintended
characteristics of the current implementation.  It hurts both the
kernel and userland.  We end up having to hack around silly
constraints in the future and userland gets surprised when such hacks
inevitably falls apart in corner cases.

So, no, it's not about which part knows better or windows mentaility.
It's about having proper separation between implementation details and
exposed interface.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:43                                     ` Peter Zijlstra
@ 2013-11-28 14:53                                       ` Tejun Heo
  2013-11-28 14:57                                         ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 03:43:59PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 28, 2013 at 03:31:45PM +0100, Peter Zijlstra wrote:
> > Sure, we absolutely must set it for per-cpu workqueues (and their
> > workers) otherwise we cannot guarantee correctness. Same for per-node if
> > we have that.
> 
> On that, the per-node thing is debatable. There's no correctness issues
> with per-node stuff as we have with per-cpu storage.
> 
> And if there's no correctness implications we should not force things.

That's true iff you confine the "correctness" to not crashing.  That's
an extremely narrow definition tho and most will argue against that.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:45                                     ` Tejun Heo
@ 2013-11-28 14:53                                       ` Peter Zijlstra
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 09:45:02AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Nov 28, 2013 at 03:38:57PM +0100, Peter Zijlstra wrote:
> > It seems like a perfectly fine interface to me. And much preferable to
> > creating yet another weird interface to manage tasks.
> 
> The problem with that is it isn't an interface where the user
> specifies what's desired but more of just an exposed facet of
> implementation details and locks us into either keeping that single
> parent model for the eternity or doing a shitty hack like extracing
> attributes from that task if we ever need to change the implementation
> for whatever reason.  In general, it's a much better idea to have an
> interface where the feature supported is represented explicitly and
> finitely.  If this is something which is actually necessary, I'd vote
> for having an explicit interface for the desired feature, whatever
> that may be.

The desired feature is to contain all current and future userspace
helpers.

Having the single parent allows using either cgroups or direct affinity
constraints, depending on whatever the user likes.

Related is the desire to contain all current and future unbound
workqueue threads. Not being able to set affinity (or cpusets for that
matter) on unbound workqueue threads actively breaks these things.

While the current use-cases I'm aware of treat these two groups as one,
I'm not sure we want to merge them.

The problem with yet another interface is that we'll likely expose just
the affinity, which would preclude using cgroups for this task.

Having single parents (even virtual as you propose, where we harvest
context attributes) allows us to easily find the extant child tasks and
set the desired attributes for future children with the full set of task
interfaces we've currently got.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:38                                     ` Tejun Heo
  2013-11-28 14:47                                       ` Peter Zijlstra
@ 2013-11-28 14:55                                       ` Peter Zijlstra
  2013-11-28 14:57                                         ` Tejun Heo
  1 sibling, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 09:38:48AM -0500, Tejun Heo wrote:
> Hah?  No, we do not want to allow userland to be able to set
> affinities on any workqueue workers, period.  That's just inviting
> people to do weirdest things and then reporting things like "crypt
> jobs on some of our 500 machines end up stuck on a single cpu once in
> a while" which will eventually be tracked down to some weird shell
> script setting affinity on workers doing something else.

Because that is exactly what some users desire? Some RT and HPC people
want all those crypt jobs to run on a few designated cpus and not wreck
their compute jobs.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:55                                       ` Peter Zijlstra
@ 2013-11-28 14:57                                         ` Tejun Heo
  2013-11-28 14:59                                           ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 14:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 03:55:05PM +0100, Peter Zijlstra wrote:
> Because that is exactly what some users desire? Some RT and HPC people
> want all those crypt jobs to run on a few designated cpus and not wreck
> their compute jobs.

And you can't do that reliably from userland.  Those things are
created dynamically.  workqueue provides per-workqueue attributes to
enable those use cases.  It doesn't have system-wide default
attributes right now but adding that shouldn't be too difficult.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:53                                       ` Tejun Heo
@ 2013-11-28 14:57                                         ` Peter Zijlstra
  2013-11-28 15:02                                           ` Tejun Heo
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 09:53:25AM -0500, Tejun Heo wrote:
> On Thu, Nov 28, 2013 at 03:43:59PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 28, 2013 at 03:31:45PM +0100, Peter Zijlstra wrote:
> > > Sure, we absolutely must set it for per-cpu workqueues (and their
> > > workers) otherwise we cannot guarantee correctness. Same for per-node if
> > > we have that.
> > 
> > On that, the per-node thing is debatable. There's no correctness issues
> > with per-node stuff as we have with per-cpu storage.
> > 
> > And if there's no correctness implications we should not force things.
> 
> That's true iff you confine the "correctness" to not crashing.  That's
> an extremely narrow definition tho and most will argue against that.

Do you have a better definition that doesn't get in the way of people
wanting to do actual work?

So far I just see you breaking existing setups because you don't want to
support things that work perfectly well.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:57                                         ` Tejun Heo
@ 2013-11-28 14:59                                           ` Peter Zijlstra
  2013-11-28 15:07                                             ` Tejun Heo
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 14:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 09:57:23AM -0500, Tejun Heo wrote:
> On Thu, Nov 28, 2013 at 03:55:05PM +0100, Peter Zijlstra wrote:
> > Because that is exactly what some users desire? Some RT and HPC people
> > want all those crypt jobs to run on a few designated cpus and not wreck
> > their compute jobs.
> 
> And you can't do that reliably from userland.  Those things are
> created dynamically.  workqueue provides per-workqueue attributes to
> enable those use cases.  It doesn't have system-wide default
> attributes right now but adding that shouldn't be too difficult.

Like a parent process, right? So people can use the existing task
interfaces.

Or are we going to tell people; here, there's cgroups for managing your
tasks. Oh and here is this crippled interface X for if you want to
manage these few tasks, and interface Y for managing these other few
tasks.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:31                                   ` Tejun Heo
@ 2013-11-28 15:00                                     ` Oleg Nesterov
  2013-11-28 15:02                                       ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 15:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Tejun Heo wrote:
>
> * Is WQ_RESCUER actually necessary?  If not, WQ_RESCUER will be
>   dropped and the task bearing the name of the workqueue will no
>   longer exist.

WQ_MEM_RECLAIM, I guess. Probably not...

> * Is ordered execution necessary?  If not, it can be converted to
>   alloc_workqueue() or just to use system_wq.

I think no. This is the reason for kmod_thread_locker hack.

> khelper is special as its attributes get inherited to its children,
> so, yeah, we probably wanna keep that one's cpumask set to all.

And btw. Note ____call_usermodehelper()->set_cpus_allowed_ptr(cpu_all_mask).

Even if we change the affinity of the "khelper" worker threads this
won't restrict the user-space helpers.

I think this set_cpus_allowed_ptr() should die in any case?

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:57                                         ` Peter Zijlstra
@ 2013-11-28 15:02                                           ` Tejun Heo
  2013-11-28 15:07                                             ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 15:02 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 03:57:55PM +0100, Peter Zijlstra wrote:
> Do you have a better definition that doesn't get in the way of people
> wanting to do actual work?

As written multiple times now, workqueue does have per-workqueue
attributes and if you really want a dedicated worker thread, use a
dedicated kthread.

> So far I just see you breaking existing setups because you don't want to
> support things that work perfectly well.

It doesn't work as explained multiple times in this thread.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:00                                     ` Oleg Nesterov
@ 2013-11-28 15:02                                       ` Peter Zijlstra
  2013-11-28 19:33                                         ` [PATCH 0/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr() Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 15:02 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 04:00:55PM +0100, Oleg Nesterov wrote:
> And btw. Note ____call_usermodehelper()->set_cpus_allowed_ptr(cpu_all_mask).
> 
> Even if we change the affinity of the "khelper" worker threads this
> won't restrict the user-space helpers.
> 
> I think this set_cpus_allowed_ptr() should die in any case?

Yes. A while back someone proposed adding a magic cpumask to tweak in
order to contain the usermode helpers.

Obviously I thought that was a bad idea.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:02                                           ` Tejun Heo
@ 2013-11-28 15:07                                             ` Peter Zijlstra
  2013-11-28 15:10                                               ` Tejun Heo
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 15:07 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 10:02:10AM -0500, Tejun Heo wrote:
> > So far I just see you breaking existing setups because you don't want to
> > support things that work perfectly well.
> 
> It doesn't work as explained multiple times in this thread.

It used to.. just not on recent kernels. You know 'enterprise' latency.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:59                                           ` Peter Zijlstra
@ 2013-11-28 15:07                                             ` Tejun Heo
  2013-11-28 15:17                                               ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 15:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 03:59:48PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 28, 2013 at 09:57:23AM -0500, Tejun Heo wrote:
> > On Thu, Nov 28, 2013 at 03:55:05PM +0100, Peter Zijlstra wrote:
> > > Because that is exactly what some users desire? Some RT and HPC people
> > > want all those crypt jobs to run on a few designated cpus and not wreck
> > > their compute jobs.
> > 
> > And you can't do that reliably from userland.  Those things are
> > created dynamically.  workqueue provides per-workqueue attributes to
> > enable those use cases.  It doesn't have system-wide default
> > attributes right now but adding that shouldn't be too difficult.
> 
> Like a parent process, right? So people can use the existing task
> interfaces.
>
> Or are we going to tell people; here, there's cgroups for managing your
> tasks. Oh and here is this crippled interface X for if you want to
> manage these few tasks, and interface Y for managing these other few
> tasks.

Changing attributes on the parent doesn't get propagated to its
children, so I don't think that'd be a terribly good interface for
workqueue.  You'll end up with workers with mixed attributes and
regardless, workqueue management logic needs to know which workqueue
has which set of attributes to decide how the workers are shared.
Overrloading all that over task mgmt interface would be horrible.

For transient processes like usermode helpers, single parent could
work.  I don't necessarily think exposing that is a good idea but if
that's gonna happen, that's not gonna be part of workqueue.  Just
create a dedicated kthread for it.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:07                                             ` Peter Zijlstra
@ 2013-11-28 15:10                                               ` Tejun Heo
  2013-11-28 15:18                                                 ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 15:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 04:07:04PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 28, 2013 at 10:02:10AM -0500, Tejun Heo wrote:
> > > So far I just see you breaking existing setups because you don't want to
> > > support things that work perfectly well.
> > 
> > It doesn't work as explained multiple times in this thread.
> 
> It used to.. just not on recent kernels. You know 'enterprise' latency.

If you're talking about khelfper and wanna restore it, it really
should be broken out into a separate kthread.  It doesn't make any
sense to implement that in the workqueue framework.  Why would you
implement a dedicated task inside a worker pool implementation which
makes use of the said tasks?  There's even kthread_work interface
which pretty much provides workqueue-equivalent interface on top of a
single task for cases like this.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:07                                             ` Tejun Heo
@ 2013-11-28 15:17                                               ` Peter Zijlstra
  2013-11-28 15:39                                                 ` Tejun Heo
  2013-11-28 15:47                                                 ` Oleg Nesterov
  0 siblings, 2 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 15:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 10:07:22AM -0500, Tejun Heo wrote:
> Changing attributes on the parent doesn't get propagated to its
> children, so I don't think that'd be a terribly good interface for
> workqueue. 

The idea was that if there's a single parent, its easy to find all the
children using the task hierarchy. So flip the parent to the new
attributes, then iterate the children and flip them.

That's a single pass race-free approach. New children will already have
the desired attributes, dead children we don't care about.

So there's three useful parts to having a single parent task:

 - its a task so you can change the entire task attribute set; current
   and future.

 - new children will automatically get the desired attributes.

 - all children are easily identified by virtual of being children of
   said parent process.

Currently its hard to find all usermode helpers or all workqueue tasks.

I know people who are poking at /proc/$pid/stat to inspect
task_struct::flags while gleaning the PF_ flags from sched.h and run
this in a polling loop to catch new entries.

Surely that's a bad idea ;-)

> You'll end up with workers with mixed attributes and
> regardless, workqueue management logic needs to know which workqueue
> has which set of attributes to decide how the workers are shared.
> Overrloading all that over task mgmt interface would be horrible.

Well, mixed attributes is you own responsibility. I'm all for letting
people shoot themselves in the foot as long we don't crash.

The huge disadvantage to creating special interfaces is that you can
only capture a small part of the task attributes; and worse, you create
a special limited interface for a special few tasks.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:10                                               ` Tejun Heo
@ 2013-11-28 15:18                                                 ` Peter Zijlstra
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 15:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 10:10:35AM -0500, Tejun Heo wrote:
> On Thu, Nov 28, 2013 at 04:07:04PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 28, 2013 at 10:02:10AM -0500, Tejun Heo wrote:
> > > > So far I just see you breaking existing setups because you don't want to
> > > > support things that work perfectly well.
> > > 
> > > It doesn't work as explained multiple times in this thread.
> > 
> > It used to.. just not on recent kernels. You know 'enterprise' latency.
> 
> If you're talking about khelfper and wanna restore it, it really
> should be broken out into a separate kthread.  It doesn't make any
> sense to implement that in the workqueue framework.  Why would you
> implement a dedicated task inside a worker pool implementation which
> makes use of the said tasks?  There's even kthread_work interface
> which pretty much provides workqueue-equivalent interface on top of a
> single task for cases like this.

That would only solve one of my problems. People want to contain the
unbound workqueues too.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 14:38                                   ` Peter Zijlstra
  2013-11-28 14:45                                     ` Tejun Heo
@ 2013-11-28 15:34                                     ` Oleg Nesterov
  2013-11-28 15:40                                       ` Peter Zijlstra
  1 sibling, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 15:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> On Thu, Nov 28, 2013 at 09:13:29AM -0500, Tejun Heo wrote:
> > > A single parent process for all usermode helpers makes so much sense;
> > > not doing it is just weird.
> >
> > If we're gonna allow userland to play with the parent attributes,
> > yeah, that'd make sense.  I'm not sure whether that's an interface
> > that we'd want to commit to tho?  Do we really want to tell userland
> > "there will always be a kernel task khelper and if you change that
> > one's attributes all processes forked from it will inherit those
> > attributes no matter what they are."  I think we'd want something more
> > specific cause that's a lot of commitment to things that we haven't
> > carefully thought about.
>
> It seems like a perfectly fine interface to me. And much preferable to
> creating yet another weird interface to manage tasks.

OK. I am not sure, but perhaps this makes sense.

(Although this means that we will always have the problem with the
 recursive UMH_WAIT_* requests).

In this case khelper should be turned into kthread_worker, this looks
simple.

But note that in the longer term we might want even more. We probably
want a non-daemonized thread controlled by the user-space. And even
more, this thread should be per-namespace (this needs a lot more
discussion).

But whataver we do later, I believe that the patch from Zhang should
be applied now.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:17                                               ` Peter Zijlstra
@ 2013-11-28 15:39                                                 ` Tejun Heo
  2013-11-28 16:33                                                   ` Peter Zijlstra
  2013-11-28 15:47                                                 ` Oleg Nesterov
  1 sibling, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-28 15:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hey,

On Thu, Nov 28, 2013 at 04:17:04PM +0100, Peter Zijlstra wrote:
> So there's three useful parts to having a single parent task:
> 
>  - its a task so you can change the entire task attribute set; current
>    and future.

Using task as interface could be okay but I'd still go for explicitly
specifying what gets inherited and expand them gradually; otherwise,
we end up exposing broken stuff unintentionally.  cpuset did this with
bound workers and the capability was removed retro-actively, which is
not a happy situation.

>  - new children will automatically get the desired attributes.
> 
>  - all children are easily identified by virtual of being children of
>    said parent process.

That'd mean that we'd have to have a dummy target task for attributes
for each workqueue and hooks for workqueue to get notified of
attribute changes.  Unless we're gonna go back to per-workqueue
workers, we can't have a single parent per workqueue and all its
workers as children of it.  Different workqueue configure different
set of attributes.  Not all !percpu workers are equal and each
workqueue serves as an attribute domain.

We *could* do all that and it proably won't require walking the
children from userland as each attribute change would surmount to
finding or creating a matching worker pool, but it doesn't look
attractive to me.

> Well, mixed attributes is you own responsibility. I'm all for letting
> people shoot themselves in the foot as long we don't crash.

Again, I'm worried about exposing unintended characteristics of
implementation and being locked into it.  Regardless of interface, I
think it's important to control what can be depended upon from
userland if we're gonna keep up "no userland visible behavior will
break" thing.

> The huge disadvantage to creating special interfaces is that you can
> only capture a small part of the task attributes; and worse, you create
> a special limited interface for a special few tasks.

Yeah, that's the disadvantage but I don't think the single parent per
workqueue model is gonna work.  FWIW, workqueue implements
standardized sysfs interface so that each user doesn't end up with
custom interface (writeback was growing one and got switched to the
workqueue standard one).

workqueue is shared pools of workers keyed by specific worker
attributes.  There evidently are restrictions coming from its nature
and no matter what we do workqueue needs to be taught to distinguish
each attribute.  I think workqueue-wide interface is an acceptable
compromise especially considering that there are attributes which
can't be represented by a single task such as max_active and automatic
NUMA binding, which means we need workqueue-specific interface anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:34                                     ` Oleg Nesterov
@ 2013-11-28 15:40                                       ` Peter Zijlstra
  2013-11-28 16:20                                         ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 15:40 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 04:34:43PM +0100, Oleg Nesterov wrote:
> But note that in the longer term we might want even more. We probably
> want a non-daemonized thread controlled by the user-space. And even
> more, this thread should be per-namespace (this needs a lot more
> discussion).

Which namespace? PID namespace I presume where we can have a 'new' init
task and everything.

I'm not sure, are any of these things (workqueues, userspace helpers)
pid namespace aware? If not it doesn't seem to make sense to expose this
to nested PID namespaces and would be something special for the root
namespace.

> But whataver we do later, I believe that the patch from Zhang should
> be applied now.

Probably, although with a much adjusted changelog.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:17                                               ` Peter Zijlstra
  2013-11-28 15:39                                                 ` Tejun Heo
@ 2013-11-28 15:47                                                 ` Oleg Nesterov
  2013-11-28 16:07                                                   ` Oleg Nesterov
  1 sibling, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 15:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> So there's three useful parts to having a single parent task:
>
>  - its a task so you can change the entire task attribute set; current
>    and future.

You can't change, say, its fs->root.

OK, OK, this is almost off-topic, I am just trying to say that
(perhaps) in the longer term we might want even more than just
a dedicated kernel thread.

>  - new children will automatically get the desired attributes.

yes,

>  - all children are easily identified by virtual of being children of
>    said parent process.

Well, UMH_WAIT_PROC won't work in this case, or it will be "single-threaded"
too. OK, this is probably solvable, but nontrivial. In this case "khelper"
should act as a /sbin/init for user-mode helpers.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:47                                                 ` Oleg Nesterov
@ 2013-11-28 16:07                                                   ` Oleg Nesterov
  0 siblings, 0 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 16:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Oleg Nesterov wrote:
>
> On 11/28, Peter Zijlstra wrote:
> >
> > So there's three useful parts to having a single parent task:
> >
> >  - its a task so you can change the entire task attribute set; current
> >    and future.
>
> You can't change, say, its fs->root.
>
> OK, OK, this is almost off-topic, I am just trying to say that
> (perhaps) in the longer term we might want even more than just
> a dedicated kernel thread.
>
> >  - new children will automatically get the desired attributes.
>
> yes,
>
> >  - all children are easily identified by virtual of being children of
> >    said parent process.
>
> Well, UMH_WAIT_PROC won't work in this case, or it will be "single-threaded"
> too. OK, this is probably solvable, but nontrivial. In this case "khelper"
> should act as a /sbin/init for user-mode helpers.

But I'm afraid I got lost...

I interpreted "having a single parent task" as "parent of all usermode
helpers".

If you actually meant that every wq thread should have a parent which
represents the workqueue, well, I disagree but I leave this to you and
Tejun again ;)

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:40                                       ` Peter Zijlstra
@ 2013-11-28 16:20                                         ` Oleg Nesterov
  2013-11-28 16:38                                           ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 16:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> On Thu, Nov 28, 2013 at 04:34:43PM +0100, Oleg Nesterov wrote:
> > But note that in the longer term we might want even more. We probably
> > want a non-daemonized thread controlled by the user-space. And even
> > more, this thread should be per-namespace (this needs a lot more
> > discussion).
>
> Which namespace? PID namespace I presume where we can have a 'new' init
> task and everything.
>
> I'm not sure, are any of these things (workqueues, userspace helpers)
> pid namespace aware? If not it doesn't seem to make sense to expose this
> to nested PID namespaces and would be something special for the root
> namespace.

Not sure I understand correctly. But yes, of course, it is not that
call_usermodehelper() should be namespace-friendly "unconditionally".

We need another API (although perhaps we can simply add UMH_NAMESPACE
flag, this doesn't matter).

Just for example, the piped core handler. Currently it is hardly useful
in containers.

> > But whataver we do later, I believe that the patch from Zhang should
> > be applied now.
>
> Probably, although with a much adjusted changelog.

Zhang, please make v3 ;)

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 15:39                                                 ` Tejun Heo
@ 2013-11-28 16:33                                                   ` Peter Zijlstra
  2013-11-29 14:33                                                     ` Tejun Heo
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 16:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 10:39:06AM -0500, Tejun Heo wrote:
> Hey,
> 
> On Thu, Nov 28, 2013 at 04:17:04PM +0100, Peter Zijlstra wrote:
> > So there's three useful parts to having a single parent task:
> > 
> >  - its a task so you can change the entire task attribute set; current
> >    and future.
> 
> Using task as interface could be okay but I'd still go for explicitly
> specifying what gets inherited and expand them gradually; otherwise,
> we end up exposing broken stuff unintentionally.  cpuset did this with
> bound workers and the capability was removed retro-actively, which is
> not a happy situation.

I can work with that. We'd need way to inhibit setting certain
attributes, but that can be worked out -- its all in-kernel anyway.

> >  - new children will automatically get the desired attributes.
> > 
> >  - all children are easily identified by virtual of being children of
> >    said parent process.
> 
> That'd mean that we'd have to have a dummy target task for attributes
> for each workqueue and hooks for workqueue to get notified of
> attribute changes.  Unless we're gonna go back to per-workqueue
> workers, we can't have a single parent per workqueue and all its
> workers as children of it.  Different workqueue configure different
> set of attributes.  Not all !percpu workers are equal and each
> workqueue serves as an attribute domain.
> 
> We *could* do all that and it proably won't require walking the
> children from userland as each attribute change would surmount to
> finding or creating a matching worker pool, but it doesn't look
> attractive to me.

I'm not sure we need a single parent per workqueue; certainly the case I
get asked most frequently about doesn't care, they only want to contain
_all_ unbound workers.

I don't see a problem with later splitting out other workqueues if
there's a good use-case for those.

I'm not even sure we need to split out the userspace helpers per-se;
again, they fall in the all-unbound category and I don't think I've seen
people ask for specific control of those over other unbound workers --
although conceptually it does make some sense to split them out.

> > Well, mixed attributes is you own responsibility. I'm all for letting
> > people shoot themselves in the foot as long we don't crash.
> 
> Again, I'm worried about exposing unintended characteristics of
> implementation and being locked into it.  Regardless of interface, I
> think it's important to control what can be depended upon from
> userland if we're gonna keep up "no userland visible behavior will
> break" thing.

I appreciate your caution, but we shouldn't overdo the thing and
dis-allow everything.

> > The huge disadvantage to creating special interfaces is that you can
> > only capture a small part of the task attributes; and worse, you create
> > a special limited interface for a special few tasks.
> 
> Yeah, that's the disadvantage but I don't think the single parent per
> workqueue model is gonna work. 

I never proposed a parent per workqueue. The most I proposed was a
single parent for all unbound workers and a parent for all usermode
helpers.

> automatic
> NUMA binding, which means we need workqueue-specific interface anyway.

I'm curious; why is there workqueue numa stuff? NUMA doesn't have the
correctness issues per-cpu has -- per-cpu is fundamentally special in
that there's no concurrency.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 16:20                                         ` Oleg Nesterov
@ 2013-11-28 16:38                                           ` Peter Zijlstra
  2013-11-28 18:13                                             ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-11-28 16:38 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On Thu, Nov 28, 2013 at 05:20:25PM +0100, Oleg Nesterov wrote:
> On 11/28, Peter Zijlstra wrote:
> >
> > On Thu, Nov 28, 2013 at 04:34:43PM +0100, Oleg Nesterov wrote:
> > > But note that in the longer term we might want even more. We probably
> > > want a non-daemonized thread controlled by the user-space. And even
> > > more, this thread should be per-namespace (this needs a lot more
> > > discussion).
> >
> > Which namespace? PID namespace I presume where we can have a 'new' init
> > task and everything.
> >
> > I'm not sure, are any of these things (workqueues, userspace helpers)
> > pid namespace aware? If not it doesn't seem to make sense to expose this
> > to nested PID namespaces and would be something special for the root
> > namespace.
> 
> Not sure I understand correctly. But yes, of course, it is not that
> call_usermodehelper() should be namespace-friendly "unconditionally".
> 
> We need another API (although perhaps we can simply add UMH_NAMESPACE
> flag, this doesn't matter).
> 
> Just for example, the piped core handler. Currently it is hardly useful
> in containers.

I'm afraid I'm not much familiar with the entire namespace thing other
than broad concepts.

But if there's specific per-pid-namespace functionality for
usermode-helpers, then yes it makes sense to have per-pid-namespace
parents.

So in specific, you say that piping a core file into a usermode helper
is currently busted in pid-namespaces and that fixing that would indeed
introduce such pid-namespace awareness to the usermode-helper stuff?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 16:38                                           ` Peter Zijlstra
@ 2013-11-28 18:13                                             ` Oleg Nesterov
  0 siblings, 0 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 18:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> So in specific, you say that piping a core file into a usermode helper
> is currently busted in pid-namespaces and that fixing that would indeed
> introduce such pid-namespace awareness to the usermode-helper stuff?

Perhaps yes. People want to run the core handler in the namespace of
the crashed task. And (perhaps) even dumping to the file should use
per-namespace ->root/etc.

(and of course core_pattern[] should not be global, but this is
 another story).

And there are other potential users. Say, nfsd user-mode helpers wants
to have the "right" mnt (at least) namespace.

But again, this all needs a lot of discussions. Even the API is not
actually clear. As for implementation, we could probably even force
->child_reaper (the subnamespace's init) to fork/exec the usermode
helper in this case. Or a namespace should explicitly create a thread
which should do this.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 0/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-11-28 15:02                                       ` Peter Zijlstra
@ 2013-11-28 19:33                                         ` Oleg Nesterov
  2013-11-28 19:33                                           ` [PATCH 1/1] " Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 19:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

On 11/28, Peter Zijlstra wrote:
>
> On Thu, Nov 28, 2013 at 04:00:55PM +0100, Oleg Nesterov wrote:
> > And btw. Note ____call_usermodehelper()->set_cpus_allowed_ptr(cpu_all_mask).
> >
> > Even if we change the affinity of the "khelper" worker threads this
> > won't restrict the user-space helpers.
> >
> > I think this set_cpus_allowed_ptr() should die in any case?
>
> Yes. A while back someone proposed adding a magic cpumask to tweak in
> order to contain the usermode helpers.
>
> Obviously I thought that was a bad idea.

OK. let me send the trivial patch then, please take it or ack
for Andrew.

Assumung that you and Tejun agree with this change, of course.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-11-28 19:33                                         ` [PATCH 0/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr() Oleg Nesterov
@ 2013-11-28 19:33                                           ` Oleg Nesterov
  2013-11-29 13:44                                             ` Tejun Heo
  2013-12-05 14:21                                             ` Frederic Weisbecker
  0 siblings, 2 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-28 19:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

____call_usermodehelper() does set_cpus_allowed_ptr(cpu_all_mask),
this (and the comment) is misleading. We no longer have keventd_wq,
and kmod.c switched to khelper_wq a long ago.

And more importantly, "unlike our parent" is no longer true too,
this thread was created by WQ_UNBOUND worker thread which has the
full ->cpus_allowed mask, so this set_cpus_allowed_ptr() is simply
unnecessary.

Perhaps we will change this later, so that userspace can control
the affinity of the usermode helper tasks, but this is yet another
reason to remove this set_cpus_allowed_ptr().

To some degree this also applies to set_user_nice(), but this
patch only updates the comment.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/kmod.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index b086006..2fe4544 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -208,13 +208,9 @@ static int ____call_usermodehelper(void *data)
 	spin_lock_irq(&current->sighand->siglock);
 	flush_signal_handlers(current, 1);
 	spin_unlock_irq(&current->sighand->siglock);
-
-	/* We can run anywhere, unlike our parent keventd(). */
-	set_cpus_allowed_ptr(current, cpu_all_mask);
-
 	/*
-	 * Our parent is keventd, which runs with elevated scheduling priority.
-	 * Avoid propagating that into the userspace child.
+	 * Our parent is a workqueue thread, which can run with elevated
+	 * scheduling priority. Avoid propagating that into the userspace.
 	 */
 	set_user_nice(current, 0);
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-11-28 19:33                                           ` [PATCH 1/1] " Oleg Nesterov
@ 2013-11-29 13:44                                             ` Tejun Heo
  2013-11-29 16:49                                               ` Oleg Nesterov
  2013-12-05 14:21                                             ` Frederic Weisbecker
  1 sibling, 1 reply; 68+ messages in thread
From: Tejun Heo @ 2013-11-29 13:44 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Peter Zijlstra, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hey, Oleg.

On Thu, Nov 28, 2013 at 08:33:49PM +0100, Oleg Nesterov wrote:
> @@ -208,13 +208,9 @@ static int ____call_usermodehelper(void *data)
>  	spin_lock_irq(&current->sighand->siglock);
>  	flush_signal_handlers(current, 1);
>  	spin_unlock_irq(&current->sighand->siglock);
> -
> -	/* We can run anywhere, unlike our parent keventd(). */
> -	set_cpus_allowed_ptr(current, cpu_all_mask);

I don't think this is correct on numa machines where unbound workers
are NUMA-node affine by default.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child
  2013-11-28 16:33                                                   ` Peter Zijlstra
@ 2013-11-29 14:33                                                     ` Tejun Heo
  0 siblings, 0 replies; 68+ messages in thread
From: Tejun Heo @ 2013-11-29 14:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Oleg Nesterov, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hey,

On Thu, Nov 28, 2013 at 05:33:23PM +0100, Peter Zijlstra wrote:
> I'm not sure we need a single parent per workqueue; certainly the case I
> get asked most frequently about doesn't care, they only want to contain
> _all_ unbound workers.
> 
> I don't see a problem with later splitting out other workqueues if
> there's a good use-case for those.
>
> I'm not even sure we need to split out the userspace helpers per-se;
> again, they fall in the all-unbound category and I don't think I've seen
> people ask for specific control of those over other unbound workers --
> although conceptually it does make some sense to split them out.

There are workqueues with custom attributes, so, if we want to have
default settings for all workers, we'll have to implement nested
configuration where global knobs control the default settings with
individual settings overriding them.

> > Again, I'm worried about exposing unintended characteristics of
> > implementation and being locked into it.  Regardless of interface, I
> > think it's important to control what can be depended upon from
> > userland if we're gonna keep up "no userland visible behavior will
> > break" thing.
> 
> I appreciate your caution, but we shouldn't overdo the thing and
> dis-allow everything.

Yeah, going extreme on either extreme would suck but I'm not
suggesting disallowing everything, just that we need to vet what gets
exposed.

> I never proposed a parent per workqueue. The most I proposed was a
> single parent for all unbound workers and a parent for all usermode
> helpers.

It'd be weird to have one mechanism to control default attributes and
a different one to control per-workqueue attributes (again, we need a
custom mechanism to control attributes outside task scope anyway).  If
we're actually gonna do this, I'd much prefer reusing the existing
workqueue mechanism than introducing something new which wouldn't be
able to replace the existing one.

> > automatic
> > NUMA binding, which means we need workqueue-specific interface anyway.
> 
> I'm curious; why is there workqueue numa stuff? NUMA doesn't have the
> correctness issues per-cpu has -- per-cpu is fundamentally special in
> that there's no concurrency.

NUMA doesn't have correctness issues but it does have significant
performance impacts.  There are work itmes which may consume
considerable amount of cpu cycles, e.g. crypt, writeback, io
completion.  Binding them to cpu limits how much they can scale and
using plain unbound deteriorates NUMA affinity.  Even for unbound work
items which aren't necessary cpu cycle / memory access heavy, sticking
to NUMA affinity leads to generally better behavior.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-11-29 13:44                                             ` Tejun Heo
@ 2013-11-29 16:49                                               ` Oleg Nesterov
  0 siblings, 0 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-11-29 16:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hi Tejun,

On 11/29, Tejun Heo wrote:
>
> Hey, Oleg.
>
> On Thu, Nov 28, 2013 at 08:33:49PM +0100, Oleg Nesterov wrote:
> > @@ -208,13 +208,9 @@ static int ____call_usermodehelper(void *data)
> >  	spin_lock_irq(&current->sighand->siglock);
> >  	flush_signal_handlers(current, 1);
> >  	spin_unlock_irq(&current->sighand->siglock);
> > -
> > -	/* We can run anywhere, unlike our parent keventd(). */
> > -	set_cpus_allowed_ptr(current, cpu_all_mask);
>
> I don't think this is correct on numa machines where unbound workers
> are NUMA-node affine by default.

Aaah, I see. And you even mentioned this before, but I didn't pay
attention.

Thank!

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-11-28 19:33                                           ` [PATCH 1/1] " Oleg Nesterov
  2013-11-29 13:44                                             ` Tejun Heo
@ 2013-12-05 14:21                                             ` Frederic Weisbecker
  2013-12-05 14:37                                               ` Oleg Nesterov
  1 sibling, 1 reply; 68+ messages in thread
From: Frederic Weisbecker @ 2013-12-05 14:21 UTC (permalink / raw)
  To: Oleg Nesterov, Christoph Lameter
  Cc: Peter Zijlstra, Tejun Heo, zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

2013/11/28 Oleg Nesterov <oleg@redhat.com>:
> ____call_usermodehelper() does set_cpus_allowed_ptr(cpu_all_mask),
> this (and the comment) is misleading. We no longer have keventd_wq,
> and kmod.c switched to khelper_wq a long ago.
>
> And more importantly, "unlike our parent" is no longer true too,
> this thread was created by WQ_UNBOUND worker thread which has the
> full ->cpus_allowed mask, so this set_cpus_allowed_ptr() is simply
> unnecessary.
>
> Perhaps we will change this later, so that userspace can control
> the affinity of the usermode helper tasks, but this is yet another
> reason to remove this set_cpus_allowed_ptr().
>
> To some degree this also applies to set_user_nice(), but this
> patch only updates the comment.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

I'm adding Christophe in Cc because he is interested in tweaking the
affinity of call_usermodehelper for cpu isolation. This welcome
cleanup confirms that we want to take the direction of being able to
change the affinity of workqueue themselves instead of just
call_usermodehelper() alone.

> ---
>  kernel/kmod.c |    8 ++------
>  1 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index b086006..2fe4544 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -208,13 +208,9 @@ static int ____call_usermodehelper(void *data)
>         spin_lock_irq(&current->sighand->siglock);
>         flush_signal_handlers(current, 1);
>         spin_unlock_irq(&current->sighand->siglock);
> -
> -       /* We can run anywhere, unlike our parent keventd(). */
> -       set_cpus_allowed_ptr(current, cpu_all_mask);
> -
>         /*
> -        * Our parent is keventd, which runs with elevated scheduling priority.
> -        * Avoid propagating that into the userspace child.
> +        * Our parent is a workqueue thread, which can run with elevated
> +        * scheduling priority. Avoid propagating that into the userspace.
>          */
>         set_user_nice(current, 0);
>
> --
> 1.5.5.1
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-05 14:21                                             ` Frederic Weisbecker
@ 2013-12-05 14:37                                               ` Oleg Nesterov
  2013-12-05 14:39                                                 ` Tejun Heo
  2013-12-05 14:42                                                 ` Frederic Weisbecker
  0 siblings, 2 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-12-05 14:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, Peter Zijlstra, Tejun Heo, zhang.yi20, lkml,
	Tetsuo Handa, Ingo Molnar

On 12/05, Frederic Weisbecker wrote:
>
> 2013/11/28 Oleg Nesterov <oleg@redhat.com>:
> > ____call_usermodehelper() does set_cpus_allowed_ptr(cpu_all_mask),
> > this (and the comment) is misleading. We no longer have keventd_wq,
> > and kmod.c switched to khelper_wq a long ago.
> >
> > And more importantly, "unlike our parent" is no longer true too,
> > this thread was created by WQ_UNBOUND worker thread which has the
> > full ->cpus_allowed mask, so this set_cpus_allowed_ptr() is simply
> > unnecessary.
> >
> > Perhaps we will change this later, so that userspace can control
> > the affinity of the usermode helper tasks, but this is yet another
> > reason to remove this set_cpus_allowed_ptr().
> >
> > To some degree this also applies to set_user_nice(), but this
> > patch only updates the comment.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> I'm adding Christophe in Cc because he is interested in tweaking the
> affinity of call_usermodehelper for cpu isolation. This welcome
> cleanup confirms that we want to take the direction of being able to
> change the affinity of workqueue themselves instead of just
> call_usermodehelper() alone.

OK, but I'd like to remind just in case, as Tejun pointed out this
patch is wrong ;)

And "change the affinity of workqueue themselves" is not simple, but
we can make khelper_wq WQ_SYSFS.

> >  kernel/kmod.c |    8 ++------
> >  1 files changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index b086006..2fe4544 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -208,13 +208,9 @@ static int ____call_usermodehelper(void *data)
> >         spin_lock_irq(&current->sighand->siglock);
> >         flush_signal_handlers(current, 1);
> >         spin_unlock_irq(&current->sighand->siglock);
> > -
> > -       /* We can run anywhere, unlike our parent keventd(). */
> > -       set_cpus_allowed_ptr(current, cpu_all_mask);
> > -
> >         /*
> > -        * Our parent is keventd, which runs with elevated scheduling priority.
> > -        * Avoid propagating that into the userspace child.
> > +        * Our parent is a workqueue thread, which can run with elevated
> > +        * scheduling priority. Avoid propagating that into the userspace.
> >          */
> >         set_user_nice(current, 0);
> >
> > --
> > 1.5.5.1
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-05 14:37                                               ` Oleg Nesterov
@ 2013-12-05 14:39                                                 ` Tejun Heo
  2013-12-05 15:30                                                   ` Oleg Nesterov
  2013-12-05 16:26                                                   ` Frederic Weisbecker
  2013-12-05 14:42                                                 ` Frederic Weisbecker
  1 sibling, 2 replies; 68+ messages in thread
From: Tejun Heo @ 2013-12-05 14:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Frederic Weisbecker, Christoph Lameter, Peter Zijlstra,
	zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hello,

On Thu, Dec 05, 2013 at 03:37:45PM +0100, Oleg Nesterov wrote:
> > I'm adding Christophe in Cc because he is interested in tweaking the
> > affinity of call_usermodehelper for cpu isolation. This welcome
> > cleanup confirms that we want to take the direction of being able to
> > change the affinity of workqueue themselves instead of just
> > call_usermodehelper() alone.
> 
> OK, but I'd like to remind just in case, as Tejun pointed out this
> patch is wrong ;)
> 
> And "change the affinity of workqueue themselves" is not simple, but
> we can make khelper_wq WQ_SYSFS.

Maybe workqueue should implement and expose default attributes which
are inherited by all workqueues unless they're explicitly overridden?
The use case here is not really about isolating certain subgroup of
workers but rather being able to control the default behavior, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-05 14:37                                               ` Oleg Nesterov
  2013-12-05 14:39                                                 ` Tejun Heo
@ 2013-12-05 14:42                                                 ` Frederic Weisbecker
  1 sibling, 0 replies; 68+ messages in thread
From: Frederic Weisbecker @ 2013-12-05 14:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Christoph Lameter, Peter Zijlstra, Tejun Heo, zhang.yi20, lkml,
	Tetsuo Handa, Ingo Molnar

On Thu, Dec 05, 2013 at 03:37:45PM +0100, Oleg Nesterov wrote:
> On 12/05, Frederic Weisbecker wrote:
> >
> > 2013/11/28 Oleg Nesterov <oleg@redhat.com>:
> > > ____call_usermodehelper() does set_cpus_allowed_ptr(cpu_all_mask),
> > > this (and the comment) is misleading. We no longer have keventd_wq,
> > > and kmod.c switched to khelper_wq a long ago.
> > >
> > > And more importantly, "unlike our parent" is no longer true too,
> > > this thread was created by WQ_UNBOUND worker thread which has the
> > > full ->cpus_allowed mask, so this set_cpus_allowed_ptr() is simply
> > > unnecessary.
> > >
> > > Perhaps we will change this later, so that userspace can control
> > > the affinity of the usermode helper tasks, but this is yet another
> > > reason to remove this set_cpus_allowed_ptr().
> > >
> > > To some degree this also applies to set_user_nice(), but this
> > > patch only updates the comment.
> > >
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> >
> > I'm adding Christophe in Cc because he is interested in tweaking the
> > affinity of call_usermodehelper for cpu isolation. This welcome
> > cleanup confirms that we want to take the direction of being able to
> > change the affinity of workqueue themselves instead of just
> > call_usermodehelper() alone.
> 
> OK, but I'd like to remind just in case, as Tejun pointed out this
> patch is wrong ;)

Ok, I see that's because unboud workqueues are actually bound to NUMA nodes.
Doesn't that eventually work for call_usermodehelper() as well? Or may be there
are some reasons for it to keep a global affinity?

> 
> And "change the affinity of workqueue themselves" is not simple, but
> we can make khelper_wq WQ_SYSFS.

Right it can be either user driven like through sysfs or some adaptive decision
taken by the kernel on top of full dynticks CPUs, etc...

But the sysfs interface based solution looks better from a quick POV.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-05 14:39                                                 ` Tejun Heo
@ 2013-12-05 15:30                                                   ` Oleg Nesterov
  2013-12-05 19:13                                                     ` Christoph Lameter
  2013-12-05 16:26                                                   ` Frederic Weisbecker
  1 sibling, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-12-05 15:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Christoph Lameter, Peter Zijlstra,
	zhang.yi20, lkml, Tetsuo Handa, Ingo Molnar

Hi,

On 12/05, Tejun Heo wrote:
> Hello,
>
> On Thu, Dec 05, 2013 at 03:37:45PM +0100, Oleg Nesterov wrote:
> >
> > OK, but I'd like to remind just in case, as Tejun pointed out this
> > patch is wrong ;)
> >
> > And "change the affinity of workqueue themselves" is not simple, but
> > we can make khelper_wq WQ_SYSFS.
>
> Maybe workqueue should implement and expose default attributes which
> are inherited by all workqueues unless they're explicitly overridden?
> The use case here is not really about isolating certain subgroup of
> workers but rather being able to control the default behavior, right?

Perhaps, I simply do not know what the users want.

Either way, if khelper_wq will be controlled by sysfs, then this
set_cpus_allowed_ptr() should probably die.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-05 14:39                                                 ` Tejun Heo
  2013-12-05 15:30                                                   ` Oleg Nesterov
@ 2013-12-05 16:26                                                   ` Frederic Weisbecker
  1 sibling, 0 replies; 68+ messages in thread
From: Frederic Weisbecker @ 2013-12-05 16:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Christoph Lameter, Peter Zijlstra, zhang.yi20,
	lkml, Tetsuo Handa, Ingo Molnar

On Thu, Dec 05, 2013 at 09:39:03AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Dec 05, 2013 at 03:37:45PM +0100, Oleg Nesterov wrote:
> > > I'm adding Christophe in Cc because he is interested in tweaking the
> > > affinity of call_usermodehelper for cpu isolation. This welcome
> > > cleanup confirms that we want to take the direction of being able to
> > > change the affinity of workqueue themselves instead of just
> > > call_usermodehelper() alone.
> > 
> > OK, but I'd like to remind just in case, as Tejun pointed out this
> > patch is wrong ;)
> > 
> > And "change the affinity of workqueue themselves" is not simple, but
> > we can make khelper_wq WQ_SYSFS.
> 
> Maybe workqueue should implement and expose default attributes which
> are inherited by all workqueues unless they're explicitly overridden?
> The use case here is not really about isolating certain subgroup of
> workers but rather being able to control the default behavior, right?

So the need I know of is to perform CPU isolation on a given set of CPUs.
Their are two known usecases:

1) Optimize the use of the CPU and avoid being disturbed with interrupts, other
tasks, etc... These disturbances trash the cache and steal cputime. We want to
maximize the use of the CPU. So that's the HPC case.

2) Guarantee that the CPU won't be disturbed while some high priority tasks are running.
So we have a critical task that has something to do under a given time deadline or
some bounded latency. That's the real time case.

Case 1) can deal with a few rare noise, but case 2) can't.

In both cases I think that people are happy with having one CPU that handles all
the housekeeping: handle timekeeping, unbound timers, workqueues, RCU callbacks
and random stats maintainance. But this housekeeping could possibly be divided in
the future with some per NUMA node affine housekeepers, who knows? So I think it may be nice to
have some flexibility in how we affine these workqueues, hence a user-driven
action taken through exisiting interfaces like sysfs sounds like a good
candidate.
 
> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-05 15:30                                                   ` Oleg Nesterov
@ 2013-12-05 19:13                                                     ` Christoph Lameter
  2013-12-06 14:53                                                       ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Christoph Lameter @ 2013-12-05 19:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, Frederic Weisbecker, Peter Zijlstra, zhang.yi20, lkml,
	Tetsuo Handa, Ingo Molnar

On Thu, 5 Dec 2013, Oleg Nesterov wrote:

> Perhaps, I simply do not know what the users want.

We would like to control OS spawning of threads and restrict them to a
limited set of cpus so that the other processors can do latency sensitive
work without being impacted by creation of kernel threads etc.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-05 19:13                                                     ` Christoph Lameter
@ 2013-12-06 14:53                                                       ` Oleg Nesterov
  2013-12-06 15:37                                                         ` Christoph Lameter
  0 siblings, 1 reply; 68+ messages in thread
From: Oleg Nesterov @ 2013-12-06 14:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, Frederic Weisbecker, Peter Zijlstra, zhang.yi20, lkml,
	Tetsuo Handa, Ingo Molnar

On 12/05, Christoph Lameter wrote:
>
> On Thu, 5 Dec 2013, Oleg Nesterov wrote:
>
> > Perhaps, I simply do not know what the users want.
>
> We would like to control OS spawning of threads and restrict them to a
> limited set of cpus so that the other processors can do latency sensitive
> work without being impacted by creation of kernel threads etc.

This probably means that Tejun's "default attributes which are inherited
by all workqueues" suggestion was right.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-06 14:53                                                       ` Oleg Nesterov
@ 2013-12-06 15:37                                                         ` Christoph Lameter
  2013-12-06 15:56                                                           ` Oleg Nesterov
  0 siblings, 1 reply; 68+ messages in thread
From: Christoph Lameter @ 2013-12-06 15:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, Frederic Weisbecker, Peter Zijlstra, zhang.yi20, lkml,
	Tetsuo Handa, Ingo Molnar

On Fri, 6 Dec 2013, Oleg Nesterov wrote:

> This probably means that Tejun's "default attributes which are inherited
> by all workqueues" suggestion was right.

There are workqueues that are required to run on specific nodes /
processors. Those cannot inherit these attributes and would need to be
used regardless of the attributes if the kernel code demands it.




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr()
  2013-12-06 15:37                                                         ` Christoph Lameter
@ 2013-12-06 15:56                                                           ` Oleg Nesterov
  0 siblings, 0 replies; 68+ messages in thread
From: Oleg Nesterov @ 2013-12-06 15:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, Frederic Weisbecker, Peter Zijlstra, zhang.yi20, lkml,
	Tetsuo Handa, Ingo Molnar

On 12/06, Christoph Lameter wrote:
>
> On Fri, 6 Dec 2013, Oleg Nesterov wrote:
>
> > This probably means that Tejun's "default attributes which are inherited
> > by all workqueues" suggestion was right.
>
> There are workqueues that are required to run on specific nodes /
> processors. Those cannot inherit these attributes and would need to be
> used regardless of the attributes if the kernel code demands it.

Yes, yes, Tejun actually said "inherited by all workqueues unless
they're explicitly overridden", I stripped his words too much.

Oleg.


^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2013-12-06 15:55 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-14  1:51 kmod: avoid propagating PF_NO_SETAFFINITY into userspace child zhang.yi20
2013-11-14  5:23 ` Tejun Heo
2013-11-14 11:40   ` Oleg Nesterov
2013-11-14 11:55     ` [PATCH 0/1]: (Was: kmod: avoid propagating PF_NO_SETAFFINITY into userspace child) Oleg Nesterov
2013-11-14 11:56       ` [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY Oleg Nesterov
2013-11-22 23:13         ` Tejun Heo
     [not found]           ` <OF36E72FA9.51146BE3-ON48257C2E.0008BC6D-48257C2E.0008FF9C@zte.com.cn>
2013-11-25 12:14             ` 答复: " Oleg Nesterov
2013-11-26  2:10               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child zhang.yi20
2013-11-26 18:04                 ` Oleg Nesterov
2013-11-27  2:07                   ` zhang.yi20
2013-11-27 13:20                     ` Oleg Nesterov
2013-11-27 18:31                     ` Tejun Heo
2013-11-28  9:13                       ` Peter Zijlstra
2013-11-28 11:45                         ` Oleg Nesterov
2013-11-28 12:17                           ` Peter Zijlstra
2013-11-28 13:31                             ` Oleg Nesterov
2013-11-28 13:39                               ` Peter Zijlstra
2013-11-28 14:13                                 ` Tejun Heo
2013-11-28 14:31                                   ` Peter Zijlstra
2013-11-28 14:38                                     ` Tejun Heo
2013-11-28 14:47                                       ` Peter Zijlstra
2013-11-28 14:51                                         ` Tejun Heo
2013-11-28 14:55                                       ` Peter Zijlstra
2013-11-28 14:57                                         ` Tejun Heo
2013-11-28 14:59                                           ` Peter Zijlstra
2013-11-28 15:07                                             ` Tejun Heo
2013-11-28 15:17                                               ` Peter Zijlstra
2013-11-28 15:39                                                 ` Tejun Heo
2013-11-28 16:33                                                   ` Peter Zijlstra
2013-11-29 14:33                                                     ` Tejun Heo
2013-11-28 15:47                                                 ` Oleg Nesterov
2013-11-28 16:07                                                   ` Oleg Nesterov
2013-11-28 14:43                                     ` Peter Zijlstra
2013-11-28 14:53                                       ` Tejun Heo
2013-11-28 14:57                                         ` Peter Zijlstra
2013-11-28 15:02                                           ` Tejun Heo
2013-11-28 15:07                                             ` Peter Zijlstra
2013-11-28 15:10                                               ` Tejun Heo
2013-11-28 15:18                                                 ` Peter Zijlstra
2013-11-28 14:38                                   ` Peter Zijlstra
2013-11-28 14:45                                     ` Tejun Heo
2013-11-28 14:53                                       ` Peter Zijlstra
2013-11-28 15:34                                     ` Oleg Nesterov
2013-11-28 15:40                                       ` Peter Zijlstra
2013-11-28 16:20                                         ` Oleg Nesterov
2013-11-28 16:38                                           ` Peter Zijlstra
2013-11-28 18:13                                             ` Oleg Nesterov
2013-11-28 14:23                                 ` Oleg Nesterov
2013-11-28 14:31                                   ` Tejun Heo
2013-11-28 15:00                                     ` Oleg Nesterov
2013-11-28 15:02                                       ` Peter Zijlstra
2013-11-28 19:33                                         ` [PATCH 0/1] usermodehelper: kill ____call_usermodehelper()->set_cpus_allowed_ptr() Oleg Nesterov
2013-11-28 19:33                                           ` [PATCH 1/1] " Oleg Nesterov
2013-11-29 13:44                                             ` Tejun Heo
2013-11-29 16:49                                               ` Oleg Nesterov
2013-12-05 14:21                                             ` Frederic Weisbecker
2013-12-05 14:37                                               ` Oleg Nesterov
2013-12-05 14:39                                                 ` Tejun Heo
2013-12-05 15:30                                                   ` Oleg Nesterov
2013-12-05 19:13                                                     ` Christoph Lameter
2013-12-06 14:53                                                       ` Oleg Nesterov
2013-12-06 15:37                                                         ` Christoph Lameter
2013-12-06 15:56                                                           ` Oleg Nesterov
2013-12-05 16:26                                                   ` Frederic Weisbecker
2013-12-05 14:42                                                 ` Frederic Weisbecker
2013-11-28 13:41                               ` [PATCH]: exec: avoid propagating PF_NO_SETAFFINITY into userspace child Peter Zijlstra
2013-11-28 14:05                                 ` Tejun Heo
2013-11-28 14:34                                 ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).