All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, dave@stgolabs.net, dbueso@suse.de,
	linux-mm@kvack.org, mm-commits@vger.kernel.org, oleg@redhat.com,
	torvalds@linux-foundation.org
Subject: [patch 20/55] kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP)
Date: Wed, 19 Jan 2022 18:08:47 -0800	[thread overview]
Message-ID: <20220120020847.8t0ZohHcP%akpm@linux-foundation.org> (raw)
In-Reply-To: <20220119180714.9e187ce100e4510de3cd9f7d@linux-foundation.org>

From: Davidlohr Bueso <dave@stgolabs.net>
Subject: kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP)

PRIO_PGRP needs the tasklist_lock mainly to serialize vs setpgid(2), to
protect against any concurrent change_pid(PIDTYPE_PGID) that can move the
task from one hlist to another while iterating.

However, the remaining can only rely only on RCU:

PRIO_PROCESS only does the task lookup and never iterates over tasklist
and we already have an rcu-aware stable pointer.

PRIO_USER is already racy vs setuid(2) so with creds being rcu protected,
we can end up seeing stale data.  When removing the tasklist_lock there
can be a race with (i) fork but this is benign as the child's nice is
inherited and the new task is not observable by the user yet either, hence
the return semantics do not differ.  And (ii) a race with exit, which is a
small window and can cause us to miss a task which was removed from the
list and it had the highest nice.

Similarly change the buggy do_each_thread/while_each_thread combo in
PRIO_USER for the rcu-safe for_each_process_thread flavor, which doesn't
make use of next_thread/p->thread_group.

[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20211210182250.43734-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/sys.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/kernel/sys.c~kernel-sys-only-take-tasklist_lock-for-get-setpriorityprio_pgrp
+++ a/kernel/sys.c
@@ -220,7 +220,6 @@ SYSCALL_DEFINE3(setpriority, int, which,
 		niceval = MAX_NICE;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -235,9 +234,11 @@ SYSCALL_DEFINE3(setpriority, int, which,
 			pgrp = find_vpid(who);
 		else
 			pgrp = task_pgrp(current);
+		read_lock(&tasklist_lock);
 		do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
 			error = set_one_prio(p, niceval, error);
 		} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+		read_unlock(&tasklist_lock);
 		break;
 	case PRIO_USER:
 		uid = make_kuid(cred->user_ns, who);
@@ -249,16 +250,15 @@ SYSCALL_DEFINE3(setpriority, int, which,
 			if (!user)
 				goto out_unlock;	/* No processes for this user */
 		}
-		do_each_thread(g, p) {
+		for_each_process_thread(g, p) {
 			if (uid_eq(task_uid(p), uid) && task_pid_vnr(p))
 				error = set_one_prio(p, niceval, error);
-		} while_each_thread(g, p);
+		}
 		if (!uid_eq(uid, cred->uid))
 			free_uid(user);		/* For find_user() */
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 out:
 	return error;
@@ -283,7 +283,6 @@ SYSCALL_DEFINE2(getpriority, int, which,
 		return -EINVAL;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -301,11 +300,13 @@ SYSCALL_DEFINE2(getpriority, int, which,
 			pgrp = find_vpid(who);
 		else
 			pgrp = task_pgrp(current);
+		read_lock(&tasklist_lock);
 		do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
 			niceval = nice_to_rlimit(task_nice(p));
 			if (niceval > retval)
 				retval = niceval;
 		} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+		read_unlock(&tasklist_lock);
 		break;
 	case PRIO_USER:
 		uid = make_kuid(cred->user_ns, who);
@@ -317,19 +318,18 @@ SYSCALL_DEFINE2(getpriority, int, which,
 			if (!user)
 				goto out_unlock;	/* No processes for this user */
 		}
-		do_each_thread(g, p) {
+		for_each_process_thread(g, p) {
 			if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) {
 				niceval = nice_to_rlimit(task_nice(p));
 				if (niceval > retval)
 					retval = niceval;
 			}
-		} while_each_thread(g, p);
+		}
 		if (!uid_eq(uid, cred->uid))
 			free_uid(user);		/* for find_user() */
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
 	return retval;
_

  parent reply	other threads:[~2022-01-20  2:08 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-20  2:07 incoming Andrew Morton
2022-01-20  2:07 ` [patch 01/55] mm: percpu: generalize percpu related config Andrew Morton
2022-01-20  2:07 ` [patch 02/55] mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef Andrew Morton
2022-01-20  2:07 ` [patch 03/55] mm: percpu: add generic pcpu_fc_alloc/free funciton Andrew Morton
2022-01-20  2:07 ` [patch 04/55] mm: percpu: add generic pcpu_populate_pte() function Andrew Morton
2022-01-20  2:07 ` [patch 05/55] proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration Andrew Morton
2022-01-20  2:08 ` [patch 06/55] proc: make the proc_create[_data]() stubs static inlines Andrew Morton
2022-01-20  2:08 ` [patch 07/55] proc: convert the return type of proc_fd_access_allowed() to be boolean Andrew Morton
2022-01-20  2:08 ` [patch 08/55] sysctl: fix duplicate path separator in printed entries Andrew Morton
2022-01-20  2:08 ` [patch 09/55] sysctl: remove redundant ret assignment Andrew Morton
2022-01-20  2:08 ` [patch 10/55] include/linux/unaligned: replace kernel.h with the necessary inclusions Andrew Morton
2022-01-20  2:08 ` [patch 11/55] kernel.h: include a note to discourage people from including it in headers Andrew Morton
2022-01-20  2:08 ` [patch 12/55] fs/exec: replace strlcpy with strscpy_pad in __set_task_comm Andrew Morton
2022-01-20  2:08 ` [patch 13/55] fs/exec: replace strncpy with strscpy_pad in __get_task_comm Andrew Morton
2022-01-20  2:08 ` [patch 14/55] drivers/infiniband: replace open-coded string copy with get_task_comm Andrew Morton
2022-01-20  2:08 ` [patch 15/55] fs/binfmt_elf: " Andrew Morton
2022-01-20  2:08 ` [patch 16/55] samples/bpf/test_overhead_kprobe_kern: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm Andrew Morton
2022-01-20  2:08 ` [patch 17/55] tools/bpf/bpftool/skeleton: " Andrew Morton
2022-01-20  2:08 ` [patch 18/55] tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN Andrew Morton
2022-01-20  2:08 ` [patch 19/55] kthread: dynamically allocate memory to store kthread's full name Andrew Morton
2022-01-20  2:08 ` Andrew Morton [this message]
2022-01-20  2:08 ` [patch 21/55] get_maintainer: don't remind about no git repo when --nogit is used Andrew Morton
2022-01-20  2:08 ` [patch 22/55] kstrtox: uninline everything Andrew Morton
2022-01-20  2:08 ` [patch 23/55] list: introduce list_is_head() helper and re-use it in list.h Andrew Morton
2022-01-20  2:08 ` [patch 24/55] lib/list_debug.c: print more list debugging context in __list_del_entry_valid() Andrew Morton
2022-01-20  2:09 ` [patch 25/55] hash.h: remove unused define directive Andrew Morton
2022-01-20  2:09 ` [patch 26/55] test_hash.c: split test_int_hash into arch-specific functions Andrew Morton
2022-01-20  2:09 ` [patch 27/55] test_hash.c: split test_hash_init Andrew Morton
2022-01-20  2:09 ` [patch 28/55] lib/Kconfig.debug: properly split hash test kernel entries Andrew Morton
2022-01-20  2:09 ` [patch 29/55] test_hash.c: refactor into kunit Andrew Morton
2022-01-20  2:09 ` [patch 30/55] kunit: replace kernel.h with the necessary inclusions Andrew Morton
2022-01-20  2:09 ` [patch 31/55] uuid: discourage people from using UAPI header in new code Andrew Morton
2022-01-20  2:09 ` [patch 32/55] uuid: remove licence boilerplate text from the header Andrew Morton
2022-01-20  2:09 ` [patch 33/55] lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test Andrew Morton
2022-01-20  2:09 ` [patch 34/55] checkpatch: relax regexp for COMMIT_LOG_LONG_LINE Andrew Morton
2022-01-20  2:09 ` [patch 35/55] checkpatch: improve Kconfig help test Andrew Morton
2022-01-20  2:09 ` [patch 36/55] const_structs.checkpatch: add frequently used ops structs Andrew Morton
2022-01-20  2:09 ` [patch 37/55] fs/binfmt_elf: use PT_LOAD p_align values for static PIE Andrew Morton
2022-01-20  2:09 ` [patch 38/55] nilfs2: remove redundant pointer sbufs Andrew Morton
2022-01-20  2:09 ` [patch 39/55] hfsplus: use struct_group_attr() for memcpy() region Andrew Morton
2022-01-20  2:09 ` [patch 40/55] FAT: use io_schedule_timeout() instead of congestion_wait() Andrew Morton
2022-01-20  2:09 ` [patch 41/55] fs/adfs: remove unneeded variable make code cleaner Andrew Morton
2022-01-20  2:09 ` [patch 42/55] panic: use error_report_end tracepoint on warnings Andrew Morton
2022-01-20  2:09 ` [patch 43/55] panic: remove oops_id Andrew Morton
2022-01-20  2:10 ` [patch 44/55] delayacct: support swapin delay accounting for swapping without blkio Andrew Morton
2022-01-20  2:10 ` [patch 45/55] delayacct: fix incomplete disable operation when switch enable to disable Andrew Morton
2022-01-20  2:10 ` [patch 46/55] delayacct: cleanup flags in struct task_delay_info and functions use it Andrew Morton
2022-01-20  2:10 ` [patch 47/55] Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact Andrew Morton
2022-01-20  2:10 ` [patch 48/55] delayacct: track delays from memory compact Andrew Morton
2022-01-20  2:10 ` [patch 49/55] configs: introduce debug.config for CI-like setup Andrew Morton
2022-01-20  2:10 ` [patch 50/55] arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB Andrew Morton
2022-01-20  2:10 ` [patch 51/55] btrfs: use generic Kconfig option for 256kB page size limit Andrew Morton
2022-01-20  2:10 ` [patch 52/55] lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB Andrew Morton
2022-01-20  2:10 ` [patch 53/55] kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR Andrew Morton
2022-01-20  2:10 ` [patch 54/55] ubsan: remove CONFIG_UBSAN_OBJECT_SIZE Andrew Morton
2022-01-20  2:10 ` [patch 55/55] lib: remove redundant assignment to variable ret Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220120020847.8t0ZohHcP%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=dbueso@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.