linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michal Koutný" <mkoutny@suse.com>
To: linux-kernel@vger.kernel.org
Cc: "Eric W . Biederman" <ebiederm@xmission.com>,
	Jens Axboe <axboe@kernel.dk>, Kees Cook <keescook@chromium.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jim Newsome <jnewsome@torproject.org>,
	Alexey Gladkov <legion@kernel.org>, Tejun Heo <tj@kernel.org>
Subject: [PATCH] exit: Retain nsproxy for exit_task_work() work entries
Date: Wed,  8 Dec 2021 19:05:01 +0100	[thread overview]
Message-ID: <20211208180501.11969-1-mkoutny@suse.com> (raw)

The reported issue is an attempted write in a cgroup file, by a zombie
who has/had acct(2) enabled. Such a write needs cgroup_ns for access
checking. Ordinary acct_process() would not be affected by this as it is
called well before exit_task_namespaces(). However, the reported NULL
dereference is a different acct data writer:

	Call Trace:
	 <TASK>
	 kernfs_fop_write_iter+0x3b6/0x510 fs/kernfs/file.c:296
	 __kernel_write+0x5d1/0xaf0 fs/read_write.c:535
	 do_acct_process+0x112a/0x17b0 kernel/acct.c:518
	 acct_pin_kill+0x27/0x130 kernel/acct.c:173
	 pin_kill+0x2a6/0x940 fs/fs_pin.c:44
	 mnt_pin_kill+0xc1/0x170 fs/fs_pin.c:81
	 cleanup_mnt+0x4bc/0x510 fs/namespace.c:1130
	 task_work_run+0x146/0x1c0 kernel/task_work.c:164
	 exit_task_work include/linux/task_work.h:32 [inline]
	 do_exit+0x705/0x24f0 kernel/exit.c:832
	 do_group_exit+0x168/0x2d0 kernel/exit.c:929
	 get_signal+0x16b0/0x2090 kernel/signal.c:2820
	 arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868
	 handle_signal_work kernel/entry/common.c:148 [inline]
	 exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
	 exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207
	 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
	 syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300
	 do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

i.e. called as one of task_work_run() entries.

The historical commit 8aac62706ada ("move exit_task_namespaces() outside
of exit_notify()") argues that exit_task_namespaces() must come before
exit_task_work() because ipc_ns cleanup calls fput/task_work_add.

There is accompanying commit e7b2c4069252 ("fput: task_work_add() can
fail if the caller has passed exit_task_work()") in the original series
that makes fput() robust in situations when task_work_add() cannot be
used anymore.

So in order to ensure that task_work_run() entries of the exiting task
have the nsproxy still available, swap the order of
exit_task_namespaces() and exit_task_work().

This change may appear like a partial revert of 8aac62706ada but this
particular ordering change shouldn't matter with the fix from
e7b2c4069252 and the other reason for 8aac62706ada (keeping exit_notify
simpler) still holds.

Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Cc: Oleg Nesterov <oleg@redhat.com>
Fixes: 5136f6365ce3 ("cgroup: implement "nsdelegate" mount option")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/exit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

I wasn't able to reproduce the syzbot's crash manually so the effectiveness of
the fix is only based on the reasoning.

diff --git a/kernel/exit.c b/kernel/exit.c
index f702a6a63686..0c2abdebb87c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -828,8 +828,8 @@ void __noreturn do_exit(long code)
 	exit_fs(tsk);
 	if (group_dead)
 		disassociate_ctty(1);
-	exit_task_namespaces(tsk);
 	exit_task_work(tsk);
+	exit_task_namespaces(tsk);
 	exit_thread(tsk);
 
 	/*
-- 
2.33.1


             reply	other threads:[~2021-12-08 18:06 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 18:05 Michal Koutný [this message]
2021-12-08 18:45 ` [PATCH] exit: Retain nsproxy for exit_task_work() work entries Eric W. Biederman
2021-12-08 19:06   ` Tejun Heo
2021-12-08 19:39     ` Linus Torvalds
2021-12-08 19:49       ` Tejun Heo
2021-12-08 23:07         ` Tejun Heo
2021-12-09 13:44           ` Michal Koutný
2021-12-09 14:08             ` Christian Brauner
2021-12-09 14:47               ` Michal Koutný
2021-12-09 15:06                 ` Christian Brauner
2021-12-09 16:39                   ` Michal Koutný
2021-12-10 23:12   ` Michal Koutný
2021-12-13 15:24     ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211208180501.11969-1-mkoutny@suse.com \
    --to=mkoutny@suse.com \
    --cc=axboe@kernel.dk \
    --cc=ebiederm@xmission.com \
    --cc=jnewsome@torproject.org \
    --cc=keescook@chromium.org \
    --cc=legion@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).