stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/7] kthread: Don't allocate kthread_struct for init and umh
       [not found] <87mtfu4up3.fsf@email.froward.int.ebiederm.org>
@ 2022-05-06 14:15 ` Eric W. Biederman
       [not found]   ` <CANpfEhNAQvazzCSN-dVgYmwNSRjqOrqZF0_j7GPLbCdEkogzSg@mail.gmail.com>
  2022-05-10 14:38   ` Thomas Gleixner
  0 siblings, 2 replies; 5+ messages in thread
From: Eric W. Biederman @ 2022-05-06 14:15 UTC (permalink / raw)
  To: linux-arch
  Cc: Tejun Heo, Peter Zijlstra, Vincent Guittot, Al Viro, Jens Axboe,
	Thomas Gleixner, Linus Torvalds, linux-kernel, Eric W. Biederman,
	stable,
	Максим
	Кутявин

If kthread_is_per_cpu runs concurrently with free_kthread_struct the
kthread_struct that was just freed may be read from.

This bug was introduced by commit 40966e316f86 ("kthread: Ensure
struct kthread is present for all kthreads").  When kthread_struct
started to be allocated for all tasks that have PF_KTHREAD set.  This
in turn required the kthread_struct to be freed in kernel_execve and
violated the assumption that kthread_struct will have the same
lifetime as the task.

Looking a bit deeper this only applies to callers of kernel_execve
which is just the init process and the user mode helper processes.
These processes really don't want to be kernel threads but are for
historical reasons.  Mostly that copy_thread does not know how to take
a kernel mode function to the process with for processes without
PF_KTHREAD or PF_IO_WORKER set.

Solve this by not allocating kthread_struct for the init process and
the user mode helper processes.

This is done by adding a kthread member to struct kernel_clone_args.
Setting kthread in fork_idle and kernel_thread.  Adding
user_mode_thread that works like kernel_thread except it does not set
kthread.  In fork only allocating the kthread_struct if .kthread is set.

I have looked at kernel/kthread.c and since commit 40966e316f86
("kthread: Ensure struct kthread is present for all kthreads") there
have been no assumptions added that to_kthread or __to_kthread will
not return NULL.

There are a few callers of to_kthread or __to_kthread that assume a
non-NULL struct kthread pointer will be returned.  These functions are
kthread_data(), kthread_parmme(), kthread_exit(), kthread(),
kthread_park(), kthread_unpark(), kthread_stop().  All of those functions
can reasonably expected to be called when it is know that a task is a
kthread so that assumption seems reasonable.

Cc: stable@vger.kernel.org
Fixes: 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads")
Reported-by: Максим Кутявин <maximkabox13@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                  |  6 ++++--
 include/linux/sched/task.h |  2 ++
 init/main.c                |  2 +-
 kernel/fork.c              | 22 ++++++++++++++++++++--
 kernel/umh.c               |  6 +++---
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index e3e55d5e0be1..75eb6e0ee7b2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1308,8 +1308,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out_unlock;
 
-	if (me->flags & PF_KTHREAD)
-		free_kthread_struct(me);
 	me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
 					PF_NOFREEZE | PF_NO_SETAFFINITY);
 	flush_thread();
@@ -1955,6 +1953,10 @@ int kernel_execve(const char *kernel_filename,
 	int fd = AT_FDCWD;
 	int retval;
 
+	if (WARN_ON_ONCE((current->flags & PF_KTHREAD) &&
+			(current->worker_private)))
+		return -EINVAL;
+
 	filename = getname_kernel(kernel_filename);
 	if (IS_ERR(filename))
 		return PTR_ERR(filename);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 719c9a6cac8d..4492266935dd 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -32,6 +32,7 @@ struct kernel_clone_args {
 	size_t set_tid_size;
 	int cgroup;
 	int io_thread;
+	int kthread;
 	struct cgroup *cgrp;
 	struct css_set *cset;
 };
@@ -89,6 +90,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
 struct task_struct *fork_idle(int);
 struct mm_struct *copy_init_mm(void);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
+extern pid_t user_mode_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
 int kernel_wait(pid_t pid, int *stat);
 
diff --git a/init/main.c b/init/main.c
index 98182c3c2c4b..39baac0211c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -688,7 +688,7 @@ noinline void __ref rest_init(void)
 	 * the init task will end up wanting to create kthreads, which, if
 	 * we schedule it before we create kthreadd, will OOPS.
 	 */
-	pid = kernel_thread(kernel_init, NULL, CLONE_FS);
+	pid = user_mode_thread(kernel_init, NULL, CLONE_FS);
 	/*
 	 * Pin init on the boot CPU. Task migration is not properly working
 	 * until sched_init_smp() has been run. It will set the allowed
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..27c5203750b4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2157,7 +2157,7 @@ static __latent_entropy struct task_struct *copy_process(
 	p->io_context = NULL;
 	audit_set_context(p, NULL);
 	cgroup_fork(p);
-	if (p->flags & PF_KTHREAD) {
+	if (args->kthread) {
 		if (!set_kthread_struct(p))
 			goto bad_fork_cleanup_delayacct;
 	}
@@ -2548,7 +2548,8 @@ struct task_struct * __init fork_idle(int cpu)
 {
 	struct task_struct *task;
 	struct kernel_clone_args args = {
-		.flags = CLONE_VM,
+		.flags		= CLONE_VM,
+		.kthread	= 1,
 	};
 
 	task = copy_process(&init_struct_pid, 0, cpu_to_node(cpu), &args);
@@ -2679,6 +2680,23 @@ pid_t kernel_clone(struct kernel_clone_args *args)
  * Create a kernel thread.
  */
 pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+	struct kernel_clone_args args = {
+		.flags		= ((lower_32_bits(flags) | CLONE_VM |
+				    CLONE_UNTRACED) & ~CSIGNAL),
+		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
+		.stack		= (unsigned long)fn,
+		.stack_size	= (unsigned long)arg,
+		.kthread	= 1,
+	};
+
+	return kernel_clone(&args);
+}
+
+/*
+ * Create a user mode thread.
+ */
+pid_t user_mode_thread(int (*fn)(void *), void *arg, unsigned long flags)
 {
 	struct kernel_clone_args args = {
 		.flags		= ((lower_32_bits(flags) | CLONE_VM |
diff --git a/kernel/umh.c b/kernel/umh.c
index 36c123360ab8..b989736e8707 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -132,7 +132,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 
 	/* If SIGCLD is ignored do_wait won't populate the status. */
 	kernel_sigaction(SIGCHLD, SIG_DFL);
-	pid = kernel_thread(call_usermodehelper_exec_async, sub_info, SIGCHLD);
+	pid = user_mode_thread(call_usermodehelper_exec_async, sub_info, SIGCHLD);
 	if (pid < 0)
 		sub_info->retval = pid;
 	else
@@ -171,8 +171,8 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
 		 * want to pollute current->children, and we need a parent
 		 * that always ignores SIGCHLD to ensure auto-reaping.
 		 */
-		pid = kernel_thread(call_usermodehelper_exec_async, sub_info,
-				    CLONE_PARENT | SIGCHLD);
+		pid = user_mode_thread(call_usermodehelper_exec_async, sub_info,
+				       CLONE_PARENT | SIGCHLD);
 		if (pid < 0) {
 			sub_info->retval = pid;
 			umh_complete(sub_info);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/7] kthread: Don't allocate kthread_struct for init and umh
       [not found]   ` <CANpfEhNAQvazzCSN-dVgYmwNSRjqOrqZF0_j7GPLbCdEkogzSg@mail.gmail.com>
@ 2022-05-06 20:53     ` Eric W. Biederman
  0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2022-05-06 20:53 UTC (permalink / raw)
  To: chill
  Cc: linux-arch, Tejun Heo, Peter Zijlstra, Vincent Guittot, Al Viro,
	Jens Axboe, Thomas Gleixner, Linus Torvalds, linux-kernel,
	stable

chill <maximkabox13@gmail.com> writes:

> this looks like a real uaf vulnerability and can be executed by the user

The potential to use memory after it has been freed appears completely
real.  As such it is a bug and it should definitely be fixed.  That is
as far as I can see.

What I don't see, and I am very bad at this so I could be missing
something, is what bad thing kthread_is_per_cpu could be tricked into
doing.

I see a window of a single instruction which reads a single bit
that normally will return false.  If that bit instead reads true
it looks like the scheduler will simply decide to not run the
process on another cpu.


So I will put this change in linux-next.  It will be tested and I will
send it to Linus when the merge window for v5.19 opens.  After Linus
merges this I expect after a week or so it will be backported to the
various stable kernels.  Not that it needs to go farther than about
v5.17 where I introduced the bug.

Eric

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/7] kthread: Don't allocate kthread_struct for init and umh
  2022-05-06 14:15 ` [PATCH 1/7] kthread: Don't allocate kthread_struct for init and umh Eric W. Biederman
       [not found]   ` <CANpfEhNAQvazzCSN-dVgYmwNSRjqOrqZF0_j7GPLbCdEkogzSg@mail.gmail.com>
@ 2022-05-10 14:38   ` Thomas Gleixner
  2022-05-10 15:14     ` Eric W. Biederman
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2022-05-10 14:38 UTC (permalink / raw)
  To: Eric W. Biederman, linux-arch
  Cc: Tejun Heo, Peter Zijlstra, Vincent Guittot, Al Viro, Jens Axboe,
	Linus Torvalds, linux-kernel, Eric W. Biederman, stable,
	Максим
	Кутявин

On Fri, May 06 2022 at 09:15, Eric W. Biederman wrote:
>  	 * the init task will end up wanting to create kthreads, which, if
>  	 * we schedule it before we create kthreadd, will OOPS.
>  	 */
> -	pid = kernel_thread(kernel_init, NULL, CLONE_FS);
> +	pid = user_mode_thread(kernel_init, NULL, CLONE_FS);

So init does not have PF_KTHREAD set anymore, which causes this to go
sideways with a NULL pointer dereference in get_mm_counter() on next:

 get_mm_counter include/linux/mm.h:1996 [inline]
 get_mm_rss include/linux/mm.h:2049 [inline]
 task_nr_scan_windows.isra.0+0x23/0x120 kernel/sched/fair.c:1123
 task_scan_min kernel/sched/fair.c:1144 [inline]
 task_scan_start+0x6c/0x400 kernel/sched/fair.c:1150
 task_tick_numa kernel/sched/fair.c:2944 [inline]
 task_tick_fair+0xaeb/0xef0 kernel/sched/fair.c:11186
 scheduler_tick+0x20a/0x5e0 kernel/sched/core.c:5380

 https://lore.kernel.org/lkml/0000000000008a9fbb05dea76400@google.com

because the fence in task_tick_numa():

 	if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
		return;

is not longer sufficient. It needs also to bail if !curr->mm.

I'm worried that there are more of these issues lurking. Haven't looked
yet.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/7] kthread: Don't allocate kthread_struct for init and umh
  2022-05-10 14:38   ` Thomas Gleixner
@ 2022-05-10 15:14     ` Eric W. Biederman
  2022-05-11 17:41       ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2022-05-10 15:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-arch, Tejun Heo, Peter Zijlstra, Vincent Guittot, Al Viro,
	Jens Axboe, Linus Torvalds, linux-kernel, stable,
	Максим
	Кутявин

Thomas Gleixner <tglx@linutronix.de> writes:

> On Fri, May 06 2022 at 09:15, Eric W. Biederman wrote:
>>  	 * the init task will end up wanting to create kthreads, which, if
>>  	 * we schedule it before we create kthreadd, will OOPS.
>>  	 */
>> -	pid = kernel_thread(kernel_init, NULL, CLONE_FS);
>> +	pid = user_mode_thread(kernel_init, NULL, CLONE_FS);
>
> So init does not have PF_KTHREAD set anymore, which causes this to go
> sideways with a NULL pointer dereference in get_mm_counter() on next:

Well not after the change above, but in a later patch yes.

Patch 1/7 really gets us back to the previous status quo, where
I introduced the breakage.

>  get_mm_counter include/linux/mm.h:1996 [inline]
>  get_mm_rss include/linux/mm.h:2049 [inline]
>  task_nr_scan_windows.isra.0+0x23/0x120 kernel/sched/fair.c:1123
>  task_scan_min kernel/sched/fair.c:1144 [inline]
>  task_scan_start+0x6c/0x400 kernel/sched/fair.c:1150
>  task_tick_numa kernel/sched/fair.c:2944 [inline]
>  task_tick_fair+0xaeb/0xef0 kernel/sched/fair.c:11186
>  scheduler_tick+0x20a/0x5e0 kernel/sched/core.c:5380
>
>  https://lore.kernel.org/lkml/0000000000008a9fbb05dea76400@google.com
>
> because the fence in task_tick_numa():
>
>  	if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
> 		return;
>
> is not longer sufficient. It needs also to bail if !curr->mm.

Agreed.  I proposed a patch to do just that a little while ago.

> I'm worried that there are more of these issues lurking. Haven't looked
> yet.

I looked earlier and I missed this one.  I am going to look again today,
along with applying the obvious fix to task_tick_numa.

I don't think there are many but when the code has evolved into a shape
that is not easy to understand things occasionally slip through when the
abstractions are made clear to understand.  The reason to rework the
code and make it clear is that once the code has evolved to a point of
many subtle issues making any change is brittle.

Eric


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/7] kthread: Don't allocate kthread_struct for init and umh
  2022-05-10 15:14     ` Eric W. Biederman
@ 2022-05-11 17:41       ` Eric W. Biederman
  0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2022-05-11 17:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-arch, Tejun Heo, Peter Zijlstra, Vincent Guittot, Al Viro,
	Jens Axboe, Linus Torvalds, linux-kernel, stable,
	Максим
	Кутявин

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Thomas Gleixner <tglx@linutronix.de> writes:
>
>> I'm worried that there are more of these issues lurking. Haven't looked
>> yet.
>
> I looked earlier and I missed this one.  I am going to look again today,
> along with applying the obvious fix to task_tick_numa.

So I have just looked again and I don't see anything.  There are a
couple of subtle issues on x86.  Especially with saving floating
point but as I read the code copy_thread initializes things
properly so that code that doesn't touch floating point registers
doesn't need to do anything.

The important thing for lurking issues is that even if I missed
something practically all of the uses of PF_KTHREAD are on x86
or generic code so they should be flushed out quickly.

Eric


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-05-11 17:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <87mtfu4up3.fsf@email.froward.int.ebiederm.org>
2022-05-06 14:15 ` [PATCH 1/7] kthread: Don't allocate kthread_struct for init and umh Eric W. Biederman
     [not found]   ` <CANpfEhNAQvazzCSN-dVgYmwNSRjqOrqZF0_j7GPLbCdEkogzSg@mail.gmail.com>
2022-05-06 20:53     ` Eric W. Biederman
2022-05-10 14:38   ` Thomas Gleixner
2022-05-10 15:14     ` Eric W. Biederman
2022-05-11 17:41       ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).