All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance
@ 2021-05-25 23:58 Frederic Weisbecker
  2021-05-26  7:16 ` [tip: sched/core] sched: Stop PF_NO_SETAFFINITY from being inherited by various init system threads tip-bot2 for Frederic Weisbecker
  2021-05-26 10:10 ` [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance Valentin Schneider
  0 siblings, 2 replies; 3+ messages in thread
From: Frederic Weisbecker @ 2021-05-25 23:58 UTC (permalink / raw)
  To: Valentin Schneider, Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Paul E . McKenney,
	Daniel Bristot de Oliveira, Yejune Deng

The patch 00b89fe0197f
	("sched: Make the idle task quack like a per-CPU kthread")
added PF_KTHREAD | PF_NO_SETAFFINITY to the idle kernel threads.

Unfortunately these properties are inherited to the init/0 children
through kernel_thread() calls: init/1 and kthreadd. There are several
side effects to that:

1) kthreadd affinity can not be reset anymore from userspace. Also
   PF_NO_SETAFFINITY propagates to all kthreadd children, including
   the unbound kthreads Therefore it's not possible anymore to overwrite
   the affinity of any of them. Here is an example of warning reported
   by rcutorture:

		WARNING: CPU: 0 PID: 116 at kernel/rcu/tree_nocb.h:1306 rcu_bind_current_to_nocb+0x31/0x40
		Modules linked in:
		CPU: 0 PID: 116 Comm: rcu_torture_fwd Not tainted 5.13.0-rc3-next-20210524 #478
		Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
		RIP: 0010:rcu_bind_current_to_nocb+0x31/0x40
		Code: e8 d4 0c 35 00 85 c0 75 01 c3 65 48 8b 04 25 00 6d 01 00 8b b8 58 05 00 00 48 c7 c6 68 76 ce a6 e8 e4 2e fc ff 48 85 c0 74 df <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 55 48
		+63 c7 48 c7
		RSP: 0000:ffff9bb78036fe48 EFLAGS: 00010282
		RAX: ffffffffffffffea RBX: ffff8a2d01b1b800 RCX: 0000000000000000
		RDX: ffff8a2d01b1b800 RSI: 0000000000000074 RDI: ffffffffa644eec0
		RBP: ffff9bb78036ff00 R08: ffff8a2d01b06400 R09: ffff8a2d014346d8
		R10: ffff8a2d014348a0 R11: 0000000000000000 R12: ffff8a2d01af8420
		R13: ffff8a2d01895000 R14: ffff8a2d01895000 R15: ffff8a2d01b1b800
		FS:  0000000000000000(0000) GS:ffff8a2d1f400000(0000) knlGS:0000000000000000
		CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
		CR2: 0000000000000000 CR3: 000000000660c000 CR4: 00000000000006f0
		Call Trace:
		 rcu_torture_fwd_prog+0x62/0x730
		 ? ttwu_do_wakeup.isra.0+0xd/0xd0
		 ? rcu_torture_read_lock_trivial+0x10/0x10
		 ? kthread+0x122/0x140
		 kthread+0x122/0x140
		 ? set_kthread_struct+0x40/0x40
		 ret_from_fork+0x22/0x30
		---[ end trace 135cca867023305b ]---

2) init/1 does an exec() in the end which clears both
   PF_KTHREAD and PF_NO_SETAFFINITY so we are fine once kernel_init()
   escapes to userspace. But until then, no initcall or init code can
   successfully call sched_setaffinity() to init/1.

   Also PF_KTHREAD looks legit on init/1 before it calls exec() but
   we better be careful with unknown introduced side effects.

One way to solve the PF_NO_SETAFFINITY issue is to not inherit this flag
on copy_process() at all. The cases where it matters are:

* fork_idle(): explicitly set the flag already.
* fork() syscalls: userspace tasks that shouldn't be concerned by that.
* create_io_thread(): the callers explicitly attribute the flag to the
                      newly created tasks.
* kernel_thread():
	_ Fix the issues on init/1 and kthreadd
	_ Fix the issues on kthreadd children.
	_ Usermode helper created by an unbound workqueue. This shouldn't
	  matter. In the worst case it gives more control to userspace
	  on setting affinity to these short living tasks although this can
	  be tuned with inherited unbound workqueues affinity already.

Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Yejune Deng <yejune.deng@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index ace4631b5b54..e595e77913eb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2000,7 +2000,7 @@ static __latent_entropy struct task_struct *copy_process(
 		goto bad_fork_cleanup_count;
 
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
-	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
+	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE | PF_NO_SETAFFINITY);
 	p->flags |= PF_FORKNOEXEC;
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [tip: sched/core] sched: Stop PF_NO_SETAFFINITY from being inherited by various init system threads
  2021-05-25 23:58 [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance Frederic Weisbecker
@ 2021-05-26  7:16 ` tip-bot2 for Frederic Weisbecker
  2021-05-26 10:10 ` [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2021-05-26  7:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Paul E. McKenney, Frederic Weisbecker, Ingo Molnar, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     a8ea6fc9b089156d9230bfeef964dd9be101a4a9
Gitweb:        https://git.kernel.org/tip/a8ea6fc9b089156d9230bfeef964dd9be101a4a9
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 26 May 2021 01:58:49 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 26 May 2021 08:58:53 +02:00

sched: Stop PF_NO_SETAFFINITY from being inherited by various init system threads

Commit:

  00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")

... added PF_KTHREAD | PF_NO_SETAFFINITY to the idle kernel threads.

Unfortunately these properties are inherited to the init/0 children
through kernel_thread() calls: init/1 and kthreadd. There are several
side effects to that:

1) kthreadd affinity can not be reset anymore from userspace. Also
   PF_NO_SETAFFINITY propagates to all kthreadd children, including
   the unbound kthreads Therefore it's not possible anymore to overwrite
   the affinity of any of them. Here is an example of warning reported
   by rcutorture:

		WARNING: CPU: 0 PID: 116 at kernel/rcu/tree_nocb.h:1306 rcu_bind_current_to_nocb+0x31/0x40
		Call Trace:
		 rcu_torture_fwd_prog+0x62/0x730
		 kthread+0x122/0x140
		 ret_from_fork+0x22/0x30

2) init/1 does an exec() in the end which clears both
   PF_KTHREAD and PF_NO_SETAFFINITY so we are fine once kernel_init()
   escapes to userspace. But until then, no initcall or init code can
   successfully call sched_setaffinity() to init/1.

   Also PF_KTHREAD looks legit on init/1 before it calls exec() but
   we better be careful with unknown introduced side effects.

One way to solve the PF_NO_SETAFFINITY issue is to not inherit this flag
on copy_process() at all. The cases where it matters are:

* fork_idle(): explicitly set the flag already.
* fork() syscalls: userspace tasks that shouldn't be concerned by that.
* create_io_thread(): the callers explicitly attribute the flag to the
                      newly created tasks.
* kernel_thread():
	- Fix the issues on init/1 and kthreadd
	- Fix the issues on kthreadd children.
	- Usermode helper created by an unbound workqueue. This shouldn't
	  matter. In the worst case it gives more control to userspace
	  on setting affinity to these short living tasks although this can
	  be tuned with inherited unbound workqueues affinity already.

Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20210525235849.441842-1-frederic@kernel.org
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index ace4631..e595e77 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2000,7 +2000,7 @@ static __latent_entropy struct task_struct *copy_process(
 		goto bad_fork_cleanup_count;
 
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
-	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
+	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE | PF_NO_SETAFFINITY);
 	p->flags |= PF_FORKNOEXEC;
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance
  2021-05-25 23:58 [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance Frederic Weisbecker
  2021-05-26  7:16 ` [tip: sched/core] sched: Stop PF_NO_SETAFFINITY from being inherited by various init system threads tip-bot2 for Frederic Weisbecker
@ 2021-05-26 10:10 ` Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: Valentin Schneider @ 2021-05-26 10:10 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Paul E . McKenney,
	Daniel Bristot de Oliveira, Yejune Deng


Hi Frederic,

Sorry about that one; thanks for having dug into it.

On 26/05/21 01:58, Frederic Weisbecker wrote:
> One way to solve the PF_NO_SETAFFINITY issue is to not inherit this flag
> on copy_process() at all. The cases where it matters are:
>
> * fork_idle(): explicitly set the flag already.
> * fork() syscalls: userspace tasks that shouldn't be concerned by that.
> * create_io_thread(): the callers explicitly attribute the flag to the
>                       newly created tasks.
> * kernel_thread():
>       _ Fix the issues on init/1 and kthreadd
>       _ Fix the issues on kthreadd children.
>       _ Usermode helper created by an unbound workqueue. This shouldn't
>         matter. In the worst case it gives more control to userspace
>         on setting affinity to these short living tasks although this can
>         be tuned with inherited unbound workqueues affinity already.
>

(I just saw it got shoved into tip already, but in any case:)

That makes sense to me. Regarding the UMH point, I don't believe there are
others like it creeping around; otherwise we might've had to go with e.g.

  p->flags &= ~(... | (PF_NO_SETAFFINITY * !!(p->flags & PF_IDLE)))

but per the above that doesn't seem necessary.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-05-26 10:10 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-25 23:58 [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance Frederic Weisbecker
2021-05-26  7:16 ` [tip: sched/core] sched: Stop PF_NO_SETAFFINITY from being inherited by various init system threads tip-bot2 for Frederic Weisbecker
2021-05-26 10:10 ` [PATCH] sched: Fix PF_NO_SETAFFINITY blind inheritance Valentin Schneider

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.