linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Davidlohr Bueso <dave@stgolabs.net>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, oleg@redhat.com, paulmck@kernel.org,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	dave@stgolabs.net, Davidlohr Bueso <dbueso@suse.de>
Subject: [PATCH 1/2] kernel/sys: only rely on rcu for getpriority(2)
Date: Mon, 11 May 2020 17:03:52 -0700	[thread overview]
Message-ID: <20200512000353.23653-2-dave@stgolabs.net> (raw)
In-Reply-To: <20200512000353.23653-1-dave@stgolabs.net>

Currently the tasklist_lock is shared mainly in order to observe
the list atomically for the PRIO_PGRP and PRIO_USER cases, as
the actual lookups are already rcu-safe, providing a stable task
pointer. By removing the lock, we can race with:

(i) fork (insertion), but this is benign as the child's nice
is inherited and the actual task is not observable by the user
yet either. The return semantics do not differ.

and;

(ii) exit (deletion), this window is small but if a task is
deleted with the highest nice and it is not observed this would
cause a change in return semantics. To further reduce the window
we ignore any tasks that are PF_EXITING in the 'old' version of
the list.

The case for PRIO_PROCESS does not need the lock at all as it only
looks up the pointer.

The following raw microbenchmark improvements on a 40-core box
were seen running the stressng-get workload, which pathologically
pounds on various syscalls that get information from the kernel.
Increasing thread counts of course shows more wins, albeit probably
not something that would be seen in a real workload.

			      5.7.0-rc3              5.7.0-rc3
						getpriority-v1
Hmean     get-1      3443.65 (   0.00%)     3314.08 *  -3.76%*
Hmean     get-2      7809.99 (   0.00%)     8547.60 *   9.44%*
Hmean     get-4     15498.01 (   0.00%)    17396.85 *  12.25%*
Hmean     get-8     28001.37 (   0.00%)    31137.53 *  11.20%*
Hmean     get-16    31460.88 (   0.00%)    40284.35 *  28.05%*
Hmean     get-32    30036.64 (   0.00%)    40657.88 *  35.36%*
Hmean     get-64    31429.86 (   0.00%)    41021.73 *  30.52%*
Hmean     get-80    31804.13 (   0.00%)    39188.55 *  23.22%*

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/sys.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..0b72184f5e3e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -277,7 +277,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		return -EINVAL;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -296,6 +295,9 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		else
 			pgrp = task_pgrp(current);
 		do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
+			if (unlikely(p->flags & PF_EXITING))
+				continue;
+
 			niceval = nice_to_rlimit(task_nice(p));
 			if (niceval > retval)
 				retval = niceval;
@@ -313,6 +315,9 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		}
 		do_each_thread(g, p) {
 			if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) {
+				if (unlikely(p->flags & PF_EXITING))
+					continue;
+
 				niceval = nice_to_rlimit(task_nice(p));
 				if (niceval > retval)
 					retval = niceval;
@@ -323,7 +328,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
 	return retval;
-- 
2.26.1


  reply	other threads:[~2020-05-12  0:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-12  0:03 [PATCH -next v2 0/2] kernel/sys: reduce tasklist_lock usage get/set priorities Davidlohr Bueso
2020-05-12  0:03 ` Davidlohr Bueso [this message]
2020-05-12 15:09   ` [PATCH 1/2] kernel/sys: only rely on rcu for getpriority(2) Oleg Nesterov
2020-05-12 16:09     ` Davidlohr Bueso
2020-05-12 16:41       ` Oleg Nesterov
2020-05-12 16:58         ` Davidlohr Bueso
2020-05-12 18:16           ` Oleg Nesterov
2020-05-12  0:03 ` [PATCH 2/2] kernel/sys: do not grab tasklist_lock for sys_setpriority(PRIO_PROCESS) Davidlohr Bueso
2020-05-12 16:10   ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200512000353.23653-2-dave@stgolabs.net \
    --to=dave@stgolabs.net \
    --cc=akpm@linux-foundation.org \
    --cc=dbueso@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).