linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls
@ 2020-08-17  0:31 Davidlohr Bueso
  2020-08-17  0:31 ` [PATCH 1/2] kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP) Davidlohr Bueso
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Davidlohr Bueso @ 2020-08-17  0:31 UTC (permalink / raw)
  To: akpm; +Cc: oleg, axboe, linux-kernel, Davidlohr Bueso

Hi,

This is a (late) update on trying to reduce some of the scope of the tasklist_lock
for get/setpriority(2) as well as the block io equivalent. This version addresses
Oleg's previous concerns and incorporates his feedback.

Changes from v1:
https://lore.kernel.org/lkml/20200512000353.23653-1-dave@stgolabs.net/

 - only take the lock for PGID cases.
 - drop bogus PF_EXITING checks (and live with the small exit race).
 - add patch for IOPRIO_WHO_PGRP.

Please consider for v5.10.

Thanks!

Davidlohr Bueso (2):
  kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP)
  block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2)

 block/ioprio.c |  4 ++++
 kernel/sys.c   | 16 ++++++++--------
 2 files changed, 12 insertions(+), 8 deletions(-)

--
2.26.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP)
  2020-08-17  0:31 [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls Davidlohr Bueso
@ 2020-08-17  0:31 ` Davidlohr Bueso
  2020-08-17 14:10   ` Oleg Nesterov
  2020-08-17  0:31 ` [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2) Davidlohr Bueso
  2020-09-04 22:39 ` [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls Davidlohr Bueso
  2 siblings, 1 reply; 7+ messages in thread
From: Davidlohr Bueso @ 2020-08-17  0:31 UTC (permalink / raw)
  To: akpm; +Cc: oleg, axboe, linux-kernel, Davidlohr Bueso, Davidlohr Bueso

PRIO_PGRP needs the tasklist_lock mainly to serialize vs setpgid(2),
to protect against any concurrent change_pid(PIDTYPE_PGID) that
can move the task from one hlist to another while iterating.

However, the remaining can only rely only on RCU:

PRIO_PROCESS only does the task lookup and never iterates over
tasklist and we already have an rcu-aware stable pointer.

PRIO_USER is already racy vs setuid(2) so with creds being rcu
protected, we can end up seeing stale data. When removing the
tasklist_lock there can be a race with (i) fork but this is benign
as the child's nice is inherited and the new task is not observable
by the user yet either, hence the return semantics do not differ.
And (ii) a race with exit, which is a small window and can cause
us to miss a task which was removed from the list and it had the
highest nice.

Similarly change the buggy do_each_thread/while_each_thread combo
in PRIO_USER for the rcu-safe for_each_process_thread flavor,
which doesn't make use of next_thread/p->thread_group.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/sys.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index ca11af9d815d..4d4f17c01a4f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -214,7 +214,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 		niceval = MAX_NICE;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -229,9 +228,11 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			pgrp = find_vpid(who);
 		else
 			pgrp = task_pgrp(current);
+		read_lock(&tasklist_lock);
 		do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
 			error = set_one_prio(p, niceval, error);
 		} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+		read_unlock(&tasklist_lock);
 		break;
 	case PRIO_USER:
 		uid = make_kuid(cred->user_ns, who);
@@ -243,16 +244,15 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			if (!user)
 				goto out_unlock;	/* No processes for this user */
 		}
-		do_each_thread(g, p) {
+		for_each_process_thread(g, p) {
 			if (uid_eq(task_uid(p), uid) && task_pid_vnr(p))
 				error = set_one_prio(p, niceval, error);
-		} while_each_thread(g, p);
+		}
 		if (!uid_eq(uid, cred->uid))
 			free_uid(user);		/* For find_user() */
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 out:
 	return error;
@@ -277,7 +277,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		return -EINVAL;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -295,11 +294,13 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			pgrp = find_vpid(who);
 		else
 			pgrp = task_pgrp(current);
+		read_lock(&tasklist_lock);
 		do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
 			niceval = nice_to_rlimit(task_nice(p));
 			if (niceval > retval)
 				retval = niceval;
 		} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+		read_unlock(&tasklist_lock);
 		break;
 	case PRIO_USER:
 		uid = make_kuid(cred->user_ns, who);
@@ -311,19 +312,18 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			if (!user)
 				goto out_unlock;	/* No processes for this user */
 		}
-		do_each_thread(g, p) {
+	        for_each_process_thread(g, p) {
 			if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) {
 				niceval = nice_to_rlimit(task_nice(p));
 				if (niceval > retval)
 					retval = niceval;
 			}
-		} while_each_thread(g, p);
+		}
 		if (!uid_eq(uid, cred->uid))
 			free_uid(user);		/* for find_user() */
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
 	return retval;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2)
  2020-08-17  0:31 [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls Davidlohr Bueso
  2020-08-17  0:31 ` [PATCH 1/2] kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP) Davidlohr Bueso
@ 2020-08-17  0:31 ` Davidlohr Bueso
  2020-08-17 14:09   ` Oleg Nesterov
  2020-08-17 14:12   ` Jens Axboe
  2020-09-04 22:39 ` [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls Davidlohr Bueso
  2 siblings, 2 replies; 7+ messages in thread
From: Davidlohr Bueso @ 2020-08-17  0:31 UTC (permalink / raw)
  To: akpm
  Cc: oleg, axboe, linux-kernel, Davidlohr Bueso, Greg Thelen, Davidlohr Bueso

do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent
change_pid(PIDTYPE_PGID) that can move the task from one hlist
to another while iterating. Serialize ioprio_get/set to take
the tasklist_lock in this case.

Fixes: d69b78ba1de (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}())
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 block/ioprio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/ioprio.c b/block/ioprio.c
index 77bcab11dce5..4ede2da961bb 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -119,11 +119,13 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 				pgrp = task_pgrp(current);
 			else
 				pgrp = find_vpid(who);
+			read_lock(&tasklist_lock);
 			do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
 				ret = set_task_ioprio(p, ioprio);
 				if (ret)
 					break;
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+			read_unlock(&tasklist_lock);
 			break;
 		case IOPRIO_WHO_USER:
 			uid = make_kuid(current_user_ns(), who);
@@ -207,6 +209,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 				pgrp = task_pgrp(current);
 			else
 				pgrp = find_vpid(who);
+			read_lock(&tasklist_lock);
 			do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
 				tmpio = get_task_ioprio(p);
 				if (tmpio < 0)
@@ -216,6 +219,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 				else
 					ret = ioprio_best(ret, tmpio);
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+			read_unlock(&tasklist_lock);
 			break;
 		case IOPRIO_WHO_USER:
 			uid = make_kuid(current_user_ns(), who);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2)
  2020-08-17  0:31 ` [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2) Davidlohr Bueso
@ 2020-08-17 14:09   ` Oleg Nesterov
  2020-08-17 14:12   ` Jens Axboe
  1 sibling, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2020-08-17 14:09 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: akpm, axboe, linux-kernel, Greg Thelen, Davidlohr Bueso

On 08/16, Davidlohr Bueso wrote:
>
> --- a/block/ioprio.c
> +++ b/block/ioprio.c
> @@ -119,11 +119,13 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
>  				pgrp = task_pgrp(current);
>  			else
>  				pgrp = find_vpid(who);
> +			read_lock(&tasklist_lock);
>  			do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
>  				ret = set_task_ioprio(p, ioprio);
>  				if (ret)
>  					break;
>  			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
> +			read_unlock(&tasklist_lock);
>  			break;
>  		case IOPRIO_WHO_USER:
>  			uid = make_kuid(current_user_ns(), who);
> @@ -207,6 +209,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
>  				pgrp = task_pgrp(current);
>  			else
>  				pgrp = find_vpid(who);
> +			read_lock(&tasklist_lock);
>  			do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
>  				tmpio = get_task_ioprio(p);
>  				if (tmpio < 0)
> @@ -216,6 +219,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
>  				else
>  					ret = ioprio_best(ret, tmpio);
>  			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
> +			read_unlock(&tasklist_lock);

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP)
  2020-08-17  0:31 ` [PATCH 1/2] kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP) Davidlohr Bueso
@ 2020-08-17 14:10   ` Oleg Nesterov
  0 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2020-08-17 14:10 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: akpm, axboe, linux-kernel, Davidlohr Bueso

On 08/16, Davidlohr Bueso wrote:
>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/sys.c | 16 ++++++++--------

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2)
  2020-08-17  0:31 ` [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2) Davidlohr Bueso
  2020-08-17 14:09   ` Oleg Nesterov
@ 2020-08-17 14:12   ` Jens Axboe
  1 sibling, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2020-08-17 14:12 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm; +Cc: oleg, linux-kernel, Greg Thelen, Davidlohr Bueso

On 8/16/20 5:31 PM, Davidlohr Bueso wrote:
> do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent
> change_pid(PIDTYPE_PGID) that can move the task from one hlist
> to another while iterating. Serialize ioprio_get/set to take
> the tasklist_lock in this case.

LGTM:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls
  2020-08-17  0:31 [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls Davidlohr Bueso
  2020-08-17  0:31 ` [PATCH 1/2] kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP) Davidlohr Bueso
  2020-08-17  0:31 ` [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2) Davidlohr Bueso
@ 2020-09-04 22:39 ` Davidlohr Bueso
  2 siblings, 0 replies; 7+ messages in thread
From: Davidlohr Bueso @ 2020-09-04 22:39 UTC (permalink / raw)
  To: akpm; +Cc: oleg, axboe, linux-kernel

On Sun, 16 Aug 2020, Davidlohr Bueso wrote:

>Hi,
>
>This is a (late) update on trying to reduce some of the scope of the tasklist_lock
>for get/setpriority(2) as well as the block io equivalent. This version addresses
>Oleg's previous concerns and incorporates his feedback.
>
>Changes from v1:
>https://lore.kernel.org/lkml/20200512000353.23653-1-dave@stgolabs.net/
>
> - only take the lock for PGID cases.
> - drop bogus PF_EXITING checks (and live with the small exit race).
> - add patch for IOPRIO_WHO_PGRP.
>
>Please consider for v5.10.

Andrew, unless you have any objections, could you please pick these up?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-04 22:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-17  0:31 [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls Davidlohr Bueso
2020-08-17  0:31 ` [PATCH 1/2] kernel/sys: only take tasklist_lock for get/setpriority(PRIO_PGRP) Davidlohr Bueso
2020-08-17 14:10   ` Oleg Nesterov
2020-08-17  0:31 ` [PATCH 2/2] block: fix ioprio_get/set(IOPRIO_WHO_PGRP) vs setuid(2) Davidlohr Bueso
2020-08-17 14:09   ` Oleg Nesterov
2020-08-17 14:12   ` Jens Axboe
2020-09-04 22:39 ` [PATCH -next 0/2] tasklist_lock vs get/set priority syscalls Davidlohr Bueso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).