All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] sched: sched_getattr should return E2BIG, not EFBIG
@ 2019-09-03 17:16 Thadeu Lima de Souza Cascardo
  2019-09-03 17:16 ` [PATCH 2/2] sched: allow sched_getattr with old size Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 10+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2019-09-03 17:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Thadeu Lima de Souza Cascardo

As documented and the behavior before commit 22400674945c (sched: Simplify
return logic in sched_read_attr()), sched_getattr should return E2BIG instead
of EFBIG when there is not enough space to copy sched_attr.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Fixes: 22400674945c (sched: Simplify return logic in sched_read_attr())
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2477893dd069..0fd67281e656 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5129,7 +5129,7 @@ static int sched_read_attr(struct sched_attr __user *uattr,
 
 		for (; addr < end; addr++) {
 			if (*addr)
-				return -EFBIG;
+				return -E2BIG;
 		}
 
 		attr->size = usize;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] sched: allow sched_getattr with old size
  2019-09-03 17:16 [PATCH 1/2] sched: sched_getattr should return E2BIG, not EFBIG Thadeu Lima de Souza Cascardo
@ 2019-09-03 17:16 ` Thadeu Lima de Souza Cascardo
  2019-09-04  7:55   ` [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2019-09-03 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Thadeu Lima de Souza Cascardo,
	Patrick Bellasi

After commit a509a7cd7974 (sched/uclamp: Extend sched_setattr() to support
utilization clamping), using sched_getattr with size 48 will return E2BIG.

This breaks, for example, chrt.
$ chrt -p $$
chrt: failed to get pid 26306's policy: Argument list too long
$

With this fix, when using the old size, the utilization clamping values will be
absent from sched_attr. When using the new size or some larger size, they will
be present.

After the fix, chrt works again.
$ chrt -p $$
pid 4649's current scheduling policy: SCHED_OTHER
pid 4649's current scheduling priority: 0
$

The drawback with this solution is that userspace will ignore there are
non-default utilization clamps, but it's arguable whether returning E2BIG in
this case makes sense when that same userspace doesn't know about those values
anyway.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Fixes: a509a7cd7974 (sched/uclamp: Extend sched_setattr() to support utilization clamping)
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
---
 kernel/sched/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0fd67281e656..0ccc7fa80da6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5183,8 +5183,10 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 		attr.sched_nice = task_nice(p);
 
 #ifdef CONFIG_UCLAMP_TASK
-	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+	if (size >= SCHED_ATTR_SIZE_VER1) {
+		attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+		attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+	}
 #endif
 
 	rcu_read_unlock();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-03 17:16 ` [PATCH 2/2] sched: allow sched_getattr with old size Thadeu Lima de Souza Cascardo
@ 2019-09-04  7:55   ` Ingo Molnar
  2019-09-04  8:49     ` [PATCH v2] " Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2019-09-04  7:55 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Patrick Bellasi,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Linus Torvalds,
	Jiri Olsa


* Thadeu Lima de Souza Cascardo <cascardo@canonical.com> wrote:

> After commit a509a7cd7974 (sched/uclamp: Extend sched_setattr() to support
> utilization clamping), using sched_getattr with size 48 will return E2BIG.
> 
> This breaks, for example, chrt.
> $ chrt -p $$
> chrt: failed to get pid 26306's policy: Argument list too long
> $
> 
> With this fix, when using the old size, the utilization clamping values will be
> absent from sched_attr. When using the new size or some larger size, they will
> be present.
> 
> After the fix, chrt works again.
> $ chrt -p $$
> pid 4649's current scheduling policy: SCHED_OTHER
> pid 4649's current scheduling priority: 0
> $
> 
> The drawback with this solution is that userspace will ignore there are
> non-default utilization clamps, but it's arguable whether returning E2BIG in
> this case makes sense when that same userspace doesn't know about those values
> anyway.

a509a7cd7974 breaks the ABI when CONFIG_UCLAMP_TASK=y, so there's really 
no choice here but to fix it.

But this is not the right fix:

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5183,8 +5183,10 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
>  		attr.sched_nice = task_nice(p);
>  
>  #ifdef CONFIG_UCLAMP_TASK
> -	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
> -	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
> +	if (size >= SCHED_ATTR_SIZE_VER1) {
> +		attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
> +		attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
> +	}
>  #endif

The real bug is that the whole sched_read_attr() logic of checking 
non-zero bits in new ABI components is arguably broken, and pretty much 
any extension of the ABI will spuriously break the ABI as you just found 
out. That's way too fragile.

Instead how about something like the patch below? It cleans up the ABI 
logic:

 - if user-attributes have the same size as kernel attributes then the 
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply 
   skip the extra bits, but set attr->size to the (smaller) kernel size 
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just 
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we 
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This 
   code is so far away from VFS read() semantics that the naming is 
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr->size assignment from the head of sched_getattr() to the 
   sched_attr_copy_to_user() function. Nothing else within the kernel 
   should care about the size of the structure.

With this patch I believe the sched_getattr() syscall now nicely supports 
an extensible ABI in both a forward and backward compatible fashion, and 
will also fix the bug.

As an added bonus the bogus -EFBIG return is removed as well.

This needs to be fixed before v5.3 is released.

NOTE: one additional open question is whether to set attr->size to the 
      *larger* value if the kernel has a newer ABI. This would allow 
      user-space to detect a new ABI.

I've also Cc:-ed perf tooling ABI experts, which handles things in a 
similar fashion. (And if it doesn't or shouldn't then please chime in.)

Thanks,

	Ingo

------
Subject: sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
From: Ingo Molnar <mingo@kernel.org>
Date: Sat Aug 24 17:41:57 CEST 2019

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c |   80 ++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

Index: linux/kernel/sched/core.c
===================================================================
--- linux.orig/kernel/sched/core.c
+++ linux/kernel/sched/core.c
@@ -5299,37 +5299,40 @@ out_unlock:
 	return retval;
 }
 
-static int sched_read_attr(struct sched_attr __user *uattr,
-			   struct sched_attr *attr,
-			   unsigned int usize)
+/*
+ * Copy the kernel size attribute structure (which might be larger
+ * than what user-space knows about) to user-space.
+ *
+ * Note that all cases are valid: user-space buffer can be larger or
+ * smaller than the kernel-space buffer. The usual case is that both
+ * have the same size.
+ */
+static int
+sched_attr_copy_to_user(struct sched_attr __user *uattr,
+			struct sched_attr *kattr,
+			unsigned int usize)
 {
-	int ret;
+	unsigned int ksize = sizeof(*kattr);
 
-	if (!access_ok(uattr, usize))
+	if (!access_ok(uattr, max(usize, ksize)))
 		return -EFAULT;
 
 	/*
-	 * If we're handed a smaller struct than we know of,
-	 * ensure all the unknown bits are 0 - i.e. old
-	 * user-space does not get uncomplete information.
+	 * sched_getattr() ABI forwards and backwards compatibility:
+	 *
+	 * If usize == ksize then we just copy everything to user-space and all is good.
+	 *
+	 * If usize < ksize then we only copy as much as user-space has space for,
+	 * this keeps ABI compatibility as well. We skip the rest.
+	 *
+	 * If usize > ksize then user-space is using a newer version of the ABI,
+	 * which part the kernel doesn't know about. Just ignore it - tooling can
+	 * detect the kernel's knowledge of attributes from the attr->size value
+	 * which is set to ksize in this case.
 	 */
-	if (usize < sizeof(*attr)) {
-		unsigned char *addr;
-		unsigned char *end;
-
-		addr = (void *)attr + usize;
-		end  = (void *)attr + sizeof(*attr);
-
-		for (; addr < end; addr++) {
-			if (*addr)
-				return -EFBIG;
-		}
-
-		attr->size = usize;
-	}
+	kattr->size = min(usize, ksize);
 
-	ret = copy_to_user(uattr, attr, attr->size);
-	if (ret)
+	if (copy_to_user(uattr, kattr, kattr->size))
 		return -EFAULT;
 
 	return 0;
@@ -5339,20 +5342,18 @@ static int sched_read_attr(struct sched_
  * sys_sched_getattr - similar to sched_getparam, but with sched_attr
  * @pid: the pid in question.
  * @uattr: structure containing the extended parameters.
- * @size: sizeof(attr) for fwd/bwd comp.
+ * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
  * @flags: for future extension.
  */
 SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
-		unsigned int, size, unsigned int, flags)
+		unsigned int, usize, unsigned int, flags)
 {
-	struct sched_attr attr = {
-		.size = sizeof(struct sched_attr),
-	};
+	struct sched_attr kattr;
 	struct task_struct *p;
 	int retval;
 
-	if (!uattr || pid < 0 || size > PAGE_SIZE ||
-	    size < SCHED_ATTR_SIZE_VER0 || flags)
+	if (!uattr || pid < 0 || usize > PAGE_SIZE ||
+	    usize < SCHED_ATTR_SIZE_VER0 || flags)
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -5365,25 +5366,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pi
 	if (retval)
 		goto out_unlock;
 
-	attr.sched_policy = p->policy;
+	kattr.sched_policy = p->policy;
 	if (p->sched_reset_on_fork)
-		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
+		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
 	if (task_has_dl_policy(p))
-		__getparam_dl(p, &attr);
+		__getparam_dl(p, &kattr);
 	else if (task_has_rt_policy(p))
-		attr.sched_priority = p->rt_priority;
+		kattr.sched_priority = p->rt_priority;
 	else
-		attr.sched_nice = task_nice(p);
+		kattr.sched_nice = task_nice(p);
 
 #ifdef CONFIG_UCLAMP_TASK
-	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
 #endif
 
 	rcu_read_unlock();
 
-	retval = sched_read_attr(uattr, &attr, size);
-	return retval;
+	return sched_attr_copy_to_user(uattr, &kattr, usize);
 
 out_unlock:
 	rcu_read_unlock();

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-04  7:55   ` [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code Ingo Molnar
@ 2019-09-04  8:49     ` Ingo Molnar
  2019-09-04  8:55       ` [PATCH v3] " Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2019-09-04  8:49 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Patrick Bellasi,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Linus Torvalds,
	Jiri Olsa


* Ingo Molnar <mingo@kernel.org> wrote:

> As an added bonus the bogus -EFBIG return is removed as well.
> 
> This needs to be fixed before v5.3 is released.
> 
> NOTE: one additional open question is whether to set attr->size to the 
>       *larger* value if the kernel has a newer ABI. This would allow 
>       user-space to detect a new ABI.
> 
> I've also Cc:-ed perf tooling ABI experts, which handles things in a 
> similar fashion. (And if it doesn't or shouldn't then please chime in.)

Latest version of the patch - simplified it some more and created a 
changelog:

===================>
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 4 Sep 2019 09:55:32 +0200
Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code

Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

  $ chrt -p $$
  chrt: failed to get pid 26306's policy: Argument list too long

and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:

  a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.

Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:

 - if user-attributes have the same size as kernel attributes then the
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply
   skip the extra bits, but set attr->size to the (smaller) kernel size
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This
   code is so far away from VFS read() semantics that the naming is
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr->size assignment from the head of sched_getattr() to the
   sched_attr_copy_to_user() function. Nothing else within the kernel
   should care about the size of the structure.

With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.

As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 80 ++++++++++++++++++++++++++---------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 010d578118d6..3258ee421695 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5105,37 +5105,40 @@ SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct sched_param __user *, param)
 	return retval;
 }
 
-static int sched_read_attr(struct sched_attr __user *uattr,
-			   struct sched_attr *attr,
-			   unsigned int usize)
+/*
+ * Copy the kernel size attribute structure (which might be larger
+ * than what user-space knows about) to user-space.
+ *
+ * Note that all cases are valid: user-space buffer can be larger or
+ * smaller than the kernel-space buffer. The usual case is that both
+ * have the same size.
+ */
+static int
+sched_attr_copy_to_user(struct sched_attr __user *uattr,
+			struct sched_attr *kattr,
+			unsigned int usize)
 {
-	int ret;
+	unsigned int ksize = sizeof(*kattr);
 
-	if (!access_ok(uattr, usize))
+	if (!access_ok(uattr, ksize)
 		return -EFAULT;
 
 	/*
-	 * If we're handed a smaller struct than we know of,
-	 * ensure all the unknown bits are 0 - i.e. old
-	 * user-space does not get uncomplete information.
+	 * sched_getattr() ABI forwards and backwards compatibility:
+	 *
+	 * If usize == ksize then we just copy everything to user-space and all is good.
+	 *
+	 * If usize < ksize then we only copy as much as user-space has space for,
+	 * this keeps ABI compatibility as well. We skip the rest.
+	 *
+	 * If usize > ksize then user-space is using a newer version of the ABI,
+	 * which part the kernel doesn't know about. Just ignore it - tooling can
+	 * detect the kernel's knowledge of attributes from the attr->size value
+	 * which is set to ksize in this case.
 	 */
-	if (usize < sizeof(*attr)) {
-		unsigned char *addr;
-		unsigned char *end;
+	kattr->size = min(usize, ksize);
 
-		addr = (void *)attr + usize;
-		end  = (void *)attr + sizeof(*attr);
-
-		for (; addr < end; addr++) {
-			if (*addr)
-				return -EFBIG;
-		}
-
-		attr->size = usize;
-	}
-
-	ret = copy_to_user(uattr, attr, attr->size);
-	if (ret)
+	if (copy_to_user(uattr, kattr, kattr->size))
 		return -EFAULT;
 
 	return 0;
@@ -5145,20 +5148,18 @@ static int sched_read_attr(struct sched_attr __user *uattr,
  * sys_sched_getattr - similar to sched_getparam, but with sched_attr
  * @pid: the pid in question.
  * @uattr: structure containing the extended parameters.
- * @size: sizeof(attr) for fwd/bwd comp.
+ * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
  * @flags: for future extension.
  */
 SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
-		unsigned int, size, unsigned int, flags)
+		unsigned int, usize, unsigned int, flags)
 {
-	struct sched_attr attr = {
-		.size = sizeof(struct sched_attr),
-	};
+	struct sched_attr kattr;
 	struct task_struct *p;
 	int retval;
 
-	if (!uattr || pid < 0 || size > PAGE_SIZE ||
-	    size < SCHED_ATTR_SIZE_VER0 || flags)
+	if (!uattr || pid < 0 || usize > PAGE_SIZE ||
+	    usize < SCHED_ATTR_SIZE_VER0 || flags)
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -5171,25 +5172,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	if (retval)
 		goto out_unlock;
 
-	attr.sched_policy = p->policy;
+	kattr.sched_policy = p->policy;
 	if (p->sched_reset_on_fork)
-		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
+		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
 	if (task_has_dl_policy(p))
-		__getparam_dl(p, &attr);
+		__getparam_dl(p, &kattr);
 	else if (task_has_rt_policy(p))
-		attr.sched_priority = p->rt_priority;
+		kattr.sched_priority = p->rt_priority;
 	else
-		attr.sched_nice = task_nice(p);
+		kattr.sched_nice = task_nice(p);
 
 #ifdef CONFIG_UCLAMP_TASK
-	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
 #endif
 
 	rcu_read_unlock();
 
-	retval = sched_read_attr(uattr, &attr, size);
-	return retval;
+	return sched_attr_copy_to_user(uattr, &kattr, usize);
 
 out_unlock:
 	rcu_read_unlock();

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-04  8:49     ` [PATCH v2] " Ingo Molnar
@ 2019-09-04  8:55       ` Ingo Molnar
  2019-09-04  9:31         ` Dietmar Eggemann
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2019-09-04  8:55 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Patrick Bellasi,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Linus Torvalds,
	Jiri Olsa


* Ingo Molnar <mingo@kernel.org> wrote:

> +	if (!access_ok(uattr, ksize)
>  		return -EFAULT;

How about we pretend that I never sent v2? ;-)

-v3 attached. Build and minimally boot tested.

Thanks,

	Ingo

======================>
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 4 Sep 2019 09:55:32 +0200
Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code

Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

  $ chrt -p $$
  chrt: failed to get pid 26306's policy: Argument list too long

and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:

  a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.

Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:

 - if user-attributes have the same size as kernel attributes then the
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply
   skip the extra bits, but set attr->size to the (smaller) kernel size
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This
   code is so far away from VFS read() semantics that the naming is
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr->size assignment from the head of sched_getattr() to the
   sched_attr_copy_to_user() function. Nothing else within the kernel
   should care about the size of the structure.

With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.

As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 80 ++++++++++++++++++++++++++---------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 010d578118d6..437a1a479e3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5105,37 +5105,40 @@ SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct sched_param __user *, param)
 	return retval;
 }
 
-static int sched_read_attr(struct sched_attr __user *uattr,
-			   struct sched_attr *attr,
-			   unsigned int usize)
+/*
+ * Copy the kernel size attribute structure (which might be larger
+ * than what user-space knows about) to user-space.
+ *
+ * Note that all cases are valid: user-space buffer can be larger or
+ * smaller than the kernel-space buffer. The usual case is that both
+ * have the same size.
+ */
+static int
+sched_attr_copy_to_user(struct sched_attr __user *uattr,
+			struct sched_attr *kattr,
+			unsigned int usize)
 {
-	int ret;
+	unsigned int ksize = sizeof(*kattr);
 
-	if (!access_ok(uattr, usize))
+	if (!access_ok(uattr, ksize))
 		return -EFAULT;
 
 	/*
-	 * If we're handed a smaller struct than we know of,
-	 * ensure all the unknown bits are 0 - i.e. old
-	 * user-space does not get uncomplete information.
+	 * sched_getattr() ABI forwards and backwards compatibility:
+	 *
+	 * If usize == ksize then we just copy everything to user-space and all is good.
+	 *
+	 * If usize < ksize then we only copy as much as user-space has space for,
+	 * this keeps ABI compatibility as well. We skip the rest.
+	 *
+	 * If usize > ksize then user-space is using a newer version of the ABI,
+	 * which part the kernel doesn't know about. Just ignore it - tooling can
+	 * detect the kernel's knowledge of attributes from the attr->size value
+	 * which is set to ksize in this case.
 	 */
-	if (usize < sizeof(*attr)) {
-		unsigned char *addr;
-		unsigned char *end;
+	kattr->size = min(usize, ksize);
 
-		addr = (void *)attr + usize;
-		end  = (void *)attr + sizeof(*attr);
-
-		for (; addr < end; addr++) {
-			if (*addr)
-				return -EFBIG;
-		}
-
-		attr->size = usize;
-	}
-
-	ret = copy_to_user(uattr, attr, attr->size);
-	if (ret)
+	if (copy_to_user(uattr, kattr, kattr->size))
 		return -EFAULT;
 
 	return 0;
@@ -5145,20 +5148,18 @@ static int sched_read_attr(struct sched_attr __user *uattr,
  * sys_sched_getattr - similar to sched_getparam, but with sched_attr
  * @pid: the pid in question.
  * @uattr: structure containing the extended parameters.
- * @size: sizeof(attr) for fwd/bwd comp.
+ * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
  * @flags: for future extension.
  */
 SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
-		unsigned int, size, unsigned int, flags)
+		unsigned int, usize, unsigned int, flags)
 {
-	struct sched_attr attr = {
-		.size = sizeof(struct sched_attr),
-	};
+	struct sched_attr kattr;
 	struct task_struct *p;
 	int retval;
 
-	if (!uattr || pid < 0 || size > PAGE_SIZE ||
-	    size < SCHED_ATTR_SIZE_VER0 || flags)
+	if (!uattr || pid < 0 || usize > PAGE_SIZE ||
+	    usize < SCHED_ATTR_SIZE_VER0 || flags)
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -5171,25 +5172,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	if (retval)
 		goto out_unlock;
 
-	attr.sched_policy = p->policy;
+	kattr.sched_policy = p->policy;
 	if (p->sched_reset_on_fork)
-		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
+		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
 	if (task_has_dl_policy(p))
-		__getparam_dl(p, &attr);
+		__getparam_dl(p, &kattr);
 	else if (task_has_rt_policy(p))
-		attr.sched_priority = p->rt_priority;
+		kattr.sched_priority = p->rt_priority;
 	else
-		attr.sched_nice = task_nice(p);
+		kattr.sched_nice = task_nice(p);
 
 #ifdef CONFIG_UCLAMP_TASK
-	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
 #endif
 
 	rcu_read_unlock();
 
-	retval = sched_read_attr(uattr, &attr, size);
-	return retval;
+	return sched_attr_copy_to_user(uattr, &kattr, usize);
 
 out_unlock:
 	rcu_read_unlock();

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-04  8:55       ` [PATCH v3] " Ingo Molnar
@ 2019-09-04  9:31         ` Dietmar Eggemann
  2019-09-04 10:39           ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Dietmar Eggemann @ 2019-09-04  9:31 UTC (permalink / raw)
  To: Ingo Molnar, Thadeu Lima de Souza Cascardo
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Patrick Bellasi,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Linus Torvalds,
	Jiri Olsa

On 04/09/2019 10:55, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
>> +	if (!access_ok(uattr, ksize)
>>  		return -EFAULT;
> 
> How about we pretend that I never sent v2? ;-)
> 
> -v3 attached. Build and minimally boot tested.
> 
> Thanks,
> 
> 	Ingo
> 

This patch fixes the issue (almost).

LTP's sched_getattr01 passes again. But IMHO the prio 'chrt -p $$'
should be 0 instead of -65536.

- tip sched/core (w/ CONFIG_UCLAMP_TASK):

root@juno:/opt/ltp/results# chrt -p $$
chrt: failed to get pid 1635's policy: File too large

Test Start Time: Wed Sep  4 10:17:34 2019
-----------------------------------------
Testcase                                           Result     Exit Value
--------                                           ------     ----------
sched_get_priority_min01                           PASS       0
sched_get_priority_min02                           PASS       0
sched_getparam01                                   PASS       0
sched_getparam02                                   PASS       0
sched_getparam03                                   PASS       0
sched_rr_get_interval01                            PASS       0
sched_rr_get_interval02                            PASS       0
sched_rr_get_interval03                            PASS       0
sched_setparam01                                   PASS       0
sched_setparam02                                   PASS       0
sched_setparam03                                   PASS       0
sched_setparam04                                   PASS       0
sched_setparam05                                   PASS       0
sched_getscheduler01                               PASS       0
sched_getscheduler02                               PASS       0
sched_setscheduler01                               PASS       0
sched_setscheduler02                               PASS       0
sched_setscheduler03                               PASS       0
sched_yield01                                      PASS       0
sched_setaffinity01                                PASS       0
sched_getaffinity01                                PASS       0
sched_setattr01                                    PASS       0
sched_getattr01                                    FAIL       1  <---
sched_getattr02                                    PASS       0

-----------------------------------------------
Total Tests: 24
Total Skipped Tests: 0
Total Failures: 1
Kernel Version: 5.3.0-rc1-00101-g0413d7f33e60
Machine Architecture: aarch64
Hostname: juno

- tip sched/core (w/ CONFIG_UCLAMP_TASK) + patch:

root@juno:~# chrt -p $$
pid 1633's current scheduling policy: SCHED_OTHER
pid 1633's current scheduling priority: -65536    <--- should be 0

Test Start Time: Wed Sep  4 10:22:45 2019
-----------------------------------------
Testcase                                           Result     Exit Value
--------                                           ------     ----------
sched_get_priority_min01                           PASS       0
sched_get_priority_min02                           PASS       0
sched_getparam01                                   PASS       0
sched_getparam02                                   PASS       0
sched_getparam03                                   PASS       0
sched_rr_get_interval01                            PASS       0
sched_rr_get_interval02                            PASS       0
sched_rr_get_interval03                            PASS       0
sched_setparam01                                   PASS       0
sched_setparam02                                   PASS       0
sched_setparam03                                   PASS       0
sched_setparam04                                   PASS       0
sched_setparam05                                   PASS       0
sched_getscheduler01                               PASS       0
sched_getscheduler02                               PASS       0
sched_setscheduler01                               PASS       0
sched_setscheduler02                               PASS       0
sched_setscheduler03                               PASS       0
sched_yield01                                      PASS       0
sched_setaffinity01                                PASS       0
sched_getaffinity01                                PASS       0
sched_setattr01                                    PASS       0
sched_getattr01                                    PASS       0 <---
sched_getattr02                                    PASS       0

-----------------------------------------------
Total Tests: 24
Total Skipped Tests: 0
Total Failures: 0
Kernel Version: 5.3.0-rc1-00102-g80a776a6e3b7
Machine Architecture: aarch64
Hostname: juno

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-04  9:31         ` Dietmar Eggemann
@ 2019-09-04 10:39           ` Ingo Molnar
  2019-09-04 10:55             ` Dietmar Eggemann
  2019-09-04 13:19             ` Thadeu Lima de Souza Cascardo
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2019-09-04 10:39 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Thadeu Lima de Souza Cascardo, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Patrick Bellasi, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Linus Torvalds, Jiri Olsa


* Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

> > -v3 attached. Build and minimally boot tested.
> > 
> > Thanks,
> > 
> > 	Ingo
> > 
> 
> This patch fixes the issue (almost).
> 
> LTP's sched_getattr01 passes again. But IMHO the prio 'chrt -p $$'
> should be 0 instead of -65536.

Yeah, I forgot to zero-initialize the structure ...

Does the attached -v4 work any better for you?

Thanks,

	Ingo

===============================>
Date: Wed, 4 Sep 2019 09:55:32 +0200
Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code

Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

  $ chrt -p $$
  chrt: failed to get pid 26306's policy: Argument list too long

and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:

  a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.

Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:

 - if user-attributes have the same size as kernel attributes then the
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply
   skip the extra bits, but set attr->size to the (smaller) kernel size
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This
   code is so far away from VFS read() semantics that the naming is
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr->size assignment from the head of sched_getattr() to the
   sched_attr_copy_to_user() function. Nothing else within the kernel
   should care about the size of the structure.

With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.

As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 82 +++++++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 40 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 010d578118d6..3c64eb28dd70 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5105,39 +5105,44 @@ SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct sched_param __user *, param)
 	return retval;
 }
 
-static int sched_read_attr(struct sched_attr __user *uattr,
-			   struct sched_attr *attr,
-			   unsigned int usize)
+/*
+ * Copy the kernel size attribute structure (which might be larger
+ * than what user-space knows about) to user-space.
+ *
+ * Note that all cases are valid: user-space buffer can be larger or
+ * smaller than the kernel-space buffer. The usual case is that both
+ * have the same size.
+ */
+static int
+sched_attr_copy_to_user(struct sched_attr __user *uattr,
+			struct sched_attr *kattr,
+			unsigned int usize)
 {
-	int ret;
+	unsigned int ksize = sizeof(*kattr);
 
-	if (!access_ok(uattr, usize))
+	if (!access_ok(uattr, ksize))
 		return -EFAULT;
 
 	/*
-	 * If we're handed a smaller struct than we know of,
-	 * ensure all the unknown bits are 0 - i.e. old
-	 * user-space does not get uncomplete information.
+	 * sched_getattr() ABI forwards and backwards compatibility:
+	 *
+	 * If usize == ksize then we just copy everything to user-space and all is good.
+	 *
+	 * If usize < ksize then we only copy as much as user-space has space for,
+	 * this keeps ABI compatibility as well. We skip the rest.
+	 *
+	 * If usize > ksize then user-space is using a newer version of the ABI,
+	 * which part the kernel doesn't know about. Just ignore it - tooling can
+	 * detect the kernel's knowledge of attributes from the attr->size value
+	 * which is set to ksize in this case.
 	 */
-	if (usize < sizeof(*attr)) {
-		unsigned char *addr;
-		unsigned char *end;
-
-		addr = (void *)attr + usize;
-		end  = (void *)attr + sizeof(*attr);
-
-		for (; addr < end; addr++) {
-			if (*addr)
-				return -EFBIG;
-		}
+	kattr->size = min(usize, ksize);
 
-		attr->size = usize;
-	}
-
-	ret = copy_to_user(uattr, attr, attr->size);
-	if (ret)
+	if (copy_to_user(uattr, kattr, kattr->size))
 		return -EFAULT;
 
+	printk("sched_attr_copy_to_user(): copied %d bytes. (ksize: %d, usize: %d)\n", kattr->size, ksize, usize);
+
 	return 0;
 }
 
@@ -5145,20 +5150,18 @@ static int sched_read_attr(struct sched_attr __user *uattr,
  * sys_sched_getattr - similar to sched_getparam, but with sched_attr
  * @pid: the pid in question.
  * @uattr: structure containing the extended parameters.
- * @size: sizeof(attr) for fwd/bwd comp.
+ * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
  * @flags: for future extension.
  */
 SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
-		unsigned int, size, unsigned int, flags)
+		unsigned int, usize, unsigned int, flags)
 {
-	struct sched_attr attr = {
-		.size = sizeof(struct sched_attr),
-	};
+	struct sched_attr kattr = { };
 	struct task_struct *p;
 	int retval;
 
-	if (!uattr || pid < 0 || size > PAGE_SIZE ||
-	    size < SCHED_ATTR_SIZE_VER0 || flags)
+	if (!uattr || pid < 0 || usize > PAGE_SIZE ||
+	    usize < SCHED_ATTR_SIZE_VER0 || flags)
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -5171,25 +5174,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	if (retval)
 		goto out_unlock;
 
-	attr.sched_policy = p->policy;
+	kattr.sched_policy = p->policy;
 	if (p->sched_reset_on_fork)
-		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
+		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
 	if (task_has_dl_policy(p))
-		__getparam_dl(p, &attr);
+		__getparam_dl(p, &kattr);
 	else if (task_has_rt_policy(p))
-		attr.sched_priority = p->rt_priority;
+		kattr.sched_priority = p->rt_priority;
 	else
-		attr.sched_nice = task_nice(p);
+		kattr.sched_nice = task_nice(p);
 
 #ifdef CONFIG_UCLAMP_TASK
-	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
 #endif
 
 	rcu_read_unlock();
 
-	retval = sched_read_attr(uattr, &attr, size);
-	return retval;
+	return sched_attr_copy_to_user(uattr, &kattr, usize);
 
 out_unlock:
 	rcu_read_unlock();

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-04 10:39           ` Ingo Molnar
@ 2019-09-04 10:55             ` Dietmar Eggemann
  2019-09-04 13:19             ` Thadeu Lima de Souza Cascardo
  1 sibling, 0 replies; 10+ messages in thread
From: Dietmar Eggemann @ 2019-09-04 10:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thadeu Lima de Souza Cascardo, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Patrick Bellasi, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Linus Torvalds, Jiri Olsa

On 04/09/2019 12:39, Ingo Molnar wrote:
> 
> * Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> 
>>> -v3 attached. Build and minimally boot tested.
>>>
>>> Thanks,
>>>
>>> 	Ingo
>>>
>>
>> This patch fixes the issue (almost).
>>
>> LTP's sched_getattr01 passes again. But IMHO the prio 'chrt -p $$'
>> should be 0 instead of -65536.
> 
> Yeah, I forgot to zero-initialize the structure ...
> 
> Does the attached -v4 work any better for you?

Yeah, looks better now.

# chrt -p $$
[  258.245786] sched_attr_copy_to_user(): copied 48 bytes. (ksize: 56,
usize: 48)
pid 1635's current scheduling policy: SCHED_OTHER
pid 1635's current scheduling priority: 0

w/o the extra printk:

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-04 10:39           ` Ingo Molnar
  2019-09-04 10:55             ` Dietmar Eggemann
@ 2019-09-04 13:19             ` Thadeu Lima de Souza Cascardo
  2019-09-04 17:53               ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2019-09-04 13:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Patrick Bellasi, Thomas Gleixner, Arnaldo Carvalho de Melo,
	Linus Torvalds, Jiri Olsa

On Wed, Sep 04, 2019 at 12:39:25PM +0200, Ingo Molnar wrote:
> 
> * Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> 
> > > -v3 attached. Build and minimally boot tested.
> > > 
> > > Thanks,
> > > 
> > > 	Ingo
> > > 
> > 
> > This patch fixes the issue (almost).
> > 
> > LTP's sched_getattr01 passes again. But IMHO the prio 'chrt -p $$'
> > should be 0 instead of -65536.
> 
> Yeah, I forgot to zero-initialize the structure ...
> 
> Does the attached -v4 work any better for you?
> 
> Thanks,
> 
> 	Ingo

Hi, Ingo.

Thanks for the patch. It works just fine for me, I have only one comment about
it, below.

Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

I also tested LTP, chrt, and that 5.2 (before the breaking commit) and 5.3 with
your patch behave the same when the user size is larger than what the kernel
knows, the return is 0 and the struct size is set to the known value.

There is one odd behaviour now, which is whenever size is set between VER0 and
VER1, we will have a partial struct filled up, instead of getting E2BIG or
EINVAL.

> 
> ===============================>
> Date: Wed, 4 Sep 2019 09:55:32 +0200
> Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
> 
> Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:
> 
>   $ chrt -p $$
>   chrt: failed to get pid 26306's policy: Argument list too long
> 
> and he has root-caused the bug to the following commit increasing sched_attr
> size and breaking sched_read_attr() into returning -EFBIG:
> 
>   a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
> 
> The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
> logic of checking non-zero bits in new ABI components is arguably broken,
> and pretty much any extension of the ABI will spuriously break the ABI.
> That's way too fragile.
> 
> Instead implement the perf syscall's extensible ABI instead, which we
> already implement on the sched_setattr() side:
> 
>  - if user-attributes have the same size as kernel attributes then the
>    logic is unchanged.
> 
>  - if user-attributes are larger than the kernel knows about then simply
>    skip the extra bits, but set attr->size to the (smaller) kernel size
>    so that tooling can (in principle) handle older kernel as well.
> 
>  - if user-attributes are smaller than the kernel knows about then just
>    copy whatever user-space can accept.
> 
> Also clean up the whole logic:
> 
>  - Simplify the code flow - there's no need for 'ret' for example.
> 
>  - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
>    always know which side we are dealing with.
> 
>  - Why is it called 'read' when what it does is to copy to user? This
>    code is so far away from VFS read() semantics that the naming is
>    actively confusing. Name it sched_attr_copy_to_user() instead, which
>    mirrors other copy_to_user() functionality.
> 
>  - Move the attr->size assignment from the head of sched_getattr() to the
>    sched_attr_copy_to_user() function. Nothing else within the kernel
>    should care about the size of the structure.
> 
> With these fixes the sched_getattr() syscall now nicely supports an
> extensible ABI in both a forward and backward compatible fashion, and
> will also fix the chrt bug.
> 
> As an added bonus the bogus -EFBIG return is removed as well, which as
> Thadeu noted should have been -E2BIG to begin with.
> 
> Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Patrick Bellasi <patrick.bellasi@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
> Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/core.c | 82 +++++++++++++++++++++++++++--------------------------
>  1 file changed, 42 insertions(+), 40 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 010d578118d6..3c64eb28dd70 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5105,39 +5105,44 @@ SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct sched_param __user *, param)
>  	return retval;
>  }
>  
> -static int sched_read_attr(struct sched_attr __user *uattr,
> -			   struct sched_attr *attr,
> -			   unsigned int usize)
> +/*
> + * Copy the kernel size attribute structure (which might be larger
> + * than what user-space knows about) to user-space.
> + *
> + * Note that all cases are valid: user-space buffer can be larger or
> + * smaller than the kernel-space buffer. The usual case is that both
> + * have the same size.
> + */
> +static int
> +sched_attr_copy_to_user(struct sched_attr __user *uattr,
> +			struct sched_attr *kattr,
> +			unsigned int usize)
>  {
> -	int ret;
> +	unsigned int ksize = sizeof(*kattr);
>  
> -	if (!access_ok(uattr, usize))
> +	if (!access_ok(uattr, ksize))
>  		return -EFAULT;
>  

I believe this should be either usize or kattr->size and moved below (or just
reuse min(usize,ksize)).

Otherwise, an old binary that uses the old ABI (that is, VER0 as size) may get
EFAULT here. This should almost never in practice. I tried, but I could hardly
use an address that would fail access_ok. But, theoretically, that still breaks
ABI.

Regards.
Cascardo.

>  	/*
> -	 * If we're handed a smaller struct than we know of,
> -	 * ensure all the unknown bits are 0 - i.e. old
> -	 * user-space does not get uncomplete information.
> +	 * sched_getattr() ABI forwards and backwards compatibility:
> +	 *
> +	 * If usize == ksize then we just copy everything to user-space and all is good.
> +	 *
> +	 * If usize < ksize then we only copy as much as user-space has space for,
> +	 * this keeps ABI compatibility as well. We skip the rest.
> +	 *
> +	 * If usize > ksize then user-space is using a newer version of the ABI,
> +	 * which part the kernel doesn't know about. Just ignore it - tooling can
> +	 * detect the kernel's knowledge of attributes from the attr->size value
> +	 * which is set to ksize in this case.
>  	 */
> -	if (usize < sizeof(*attr)) {
> -		unsigned char *addr;
> -		unsigned char *end;
> -
> -		addr = (void *)attr + usize;
> -		end  = (void *)attr + sizeof(*attr);
> -
> -		for (; addr < end; addr++) {
> -			if (*addr)
> -				return -EFBIG;
> -		}
> +	kattr->size = min(usize, ksize);
>  
> -		attr->size = usize;
> -	}
> -
> -	ret = copy_to_user(uattr, attr, attr->size);
> -	if (ret)
> +	if (copy_to_user(uattr, kattr, kattr->size))
>  		return -EFAULT;
>  
> +	printk("sched_attr_copy_to_user(): copied %d bytes. (ksize: %d, usize: %d)\n", kattr->size, ksize, usize);
> +
>  	return 0;
>  }
>  
> @@ -5145,20 +5150,18 @@ static int sched_read_attr(struct sched_attr __user *uattr,
>   * sys_sched_getattr - similar to sched_getparam, but with sched_attr
>   * @pid: the pid in question.
>   * @uattr: structure containing the extended parameters.
> - * @size: sizeof(attr) for fwd/bwd comp.
> + * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
>   * @flags: for future extension.
>   */
>  SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
> -		unsigned int, size, unsigned int, flags)
> +		unsigned int, usize, unsigned int, flags)
>  {
> -	struct sched_attr attr = {
> -		.size = sizeof(struct sched_attr),
> -	};
> +	struct sched_attr kattr = { };
>  	struct task_struct *p;
>  	int retval;
>  
> -	if (!uattr || pid < 0 || size > PAGE_SIZE ||
> -	    size < SCHED_ATTR_SIZE_VER0 || flags)
> +	if (!uattr || pid < 0 || usize > PAGE_SIZE ||
> +	    usize < SCHED_ATTR_SIZE_VER0 || flags)
>  		return -EINVAL;
>  
>  	rcu_read_lock();
> @@ -5171,25 +5174,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
>  	if (retval)
>  		goto out_unlock;
>  
> -	attr.sched_policy = p->policy;
> +	kattr.sched_policy = p->policy;
>  	if (p->sched_reset_on_fork)
> -		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
> +		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
>  	if (task_has_dl_policy(p))
> -		__getparam_dl(p, &attr);
> +		__getparam_dl(p, &kattr);
>  	else if (task_has_rt_policy(p))
> -		attr.sched_priority = p->rt_priority;
> +		kattr.sched_priority = p->rt_priority;
>  	else
> -		attr.sched_nice = task_nice(p);
> +		kattr.sched_nice = task_nice(p);
>  
>  #ifdef CONFIG_UCLAMP_TASK
> -	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
> -	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
> +	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
> +	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
>  #endif
>  
>  	rcu_read_unlock();
>  
> -	retval = sched_read_attr(uattr, &attr, size);
> -	return retval;
> +	return sched_attr_copy_to_user(uattr, &kattr, usize);
>  
>  out_unlock:
>  	rcu_read_unlock();

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
  2019-09-04 13:19             ` Thadeu Lima de Souza Cascardo
@ 2019-09-04 17:53               ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2019-09-04 17:53 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Dietmar Eggemann, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Patrick Bellasi, Thomas Gleixner, Arnaldo Carvalho de Melo,
	Linus Torvalds, Jiri Olsa


* Thadeu Lima de Souza Cascardo <cascardo@canonical.com> wrote:

> There is one odd behaviour now, which is whenever size is set between 
> VER0 and VER1, we will have a partial struct filled up, instead of 
> getting E2BIG or EINVAL.

Well, that's pretty much by design: user-space is asking for 'usize' 
bytes of information, and the kernel provides it if it can.

This way the kernel can keep backwards compatibility indefinitely, 
without new kernels having to be aware of the zillions of prior ABI 
versions that were due to slow expansion of the ABI. (This actually 
happened to the perf ABI, which was expanded dozens of times already.)

> > +static int
> > +sched_attr_copy_to_user(struct sched_attr __user *uattr,
> > +			struct sched_attr *kattr,
> > +			unsigned int usize)
> >  {
> > -	int ret;
> > +	unsigned int ksize = sizeof(*kattr);
> >  
> > -	if (!access_ok(uattr, usize))
> > +	if (!access_ok(uattr, ksize))
> >  		return -EFAULT;
> >  
> 
> I believe this should be either usize or kattr->size and moved below 
> (or just reuse min(usize,ksize)).
> 
> Otherwise, an old binary that uses the old ABI (that is, VER0 as size) 
> may get EFAULT here. This should almost never in practice. I tried, but 
> I could hardly use an address that would fail access_ok. But, 
> theoretically, that still breaks ABI.

I agree that this is mostly theoretical, as currently these sizes are 
limited between VER0 and PAGE_SIZE - and hardly anyone puts these 
structures near the very end of virtual memory.

But checking 'usize' is arguably the more correct value, as that's the 
size of the user-space buffer. I've done this change in the latest 
version of the fix.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-09-04 17:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-03 17:16 [PATCH 1/2] sched: sched_getattr should return E2BIG, not EFBIG Thadeu Lima de Souza Cascardo
2019-09-03 17:16 ` [PATCH 2/2] sched: allow sched_getattr with old size Thadeu Lima de Souza Cascardo
2019-09-04  7:55   ` [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code Ingo Molnar
2019-09-04  8:49     ` [PATCH v2] " Ingo Molnar
2019-09-04  8:55       ` [PATCH v3] " Ingo Molnar
2019-09-04  9:31         ` Dietmar Eggemann
2019-09-04 10:39           ` Ingo Molnar
2019-09-04 10:55             ` Dietmar Eggemann
2019-09-04 13:19             ` Thadeu Lima de Souza Cascardo
2019-09-04 17:53               ` Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.