linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace
@ 2019-05-03 18:10 Joel Savitz
  2019-05-03 18:10 ` [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2) Joel Savitz
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Joel Savitz @ 2019-05-03 18:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Savitz, Thomas Gleixner, Ingo Molnar, Masami Hiramatsu,
	Waiman Long, Mauro Carvalho Chehab, Kristina Martsenko,
	Andrew Morton, Cyrill Gorcunov, Kees Cook, Gustavo A. R. Silva,
	YueHaibing, Micah Morton, Yang Shi, Jann Horn, Alexey Dobriyan,
	Rafael Aquini, Michael Kerrisk, Yury Norov, David Laight

In the mainline kernel, there is no quick mechanism to get the virtual
memory size of the current process from userspace.

Despite the current state of affairs, this information is available to the
user through several means, one being a linear search of the entire address
space. This is an inefficient use of cpu cycles.

A component of the libhugetlb kernel test does exactly this, and as
systems' address spaces increase beyond 32-bits, this method becomes
exceedingly tedious.

For example, on a ppc64le system with a 47-bit address space, the linear
search causes the test to hang for some unknown amount of time. I
couldn't give you an exact number because I just ran it for about 10-20
minutes and went to go do something else, probably to get coffee or
something, and when I came back, I just killed the test and patched it
to use this new mechanism. I re-ran my new version of the test using a
kernel with this patch, and of course it passed through the previously
bottlenecking codepath nearly instantaneously.

As such, I propose that the prctl syscall be extended to include the
option to retrieve TASK_SIZE from the kernel.

This patch will allow us to upgrade an O(n) codepath to O(1) in an
architecture-independent manner, and provide a mechanism for future
generations to do the same.

Changes from v2:
 We now account for the case of 32-bit compat userspace on a 64-bit kernel
 More detail about the nature of TASK_SIZE in documentation

Joel Savitz(2):
  sys/prctl: add PR_GET_TASK_SIZE option to prctl(2)
  prctl.2: Document the new PR_GET_TASK_SIZE option

 include/uapi/linux/prctl.h |  3 +++
 kernel/sys.c               | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

 man2/prctl.2 | 10 ++++++++++
 1 file changed, 10 insertions(+)
--
2.18.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)
  2019-05-03 18:10 [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Joel Savitz
@ 2019-05-03 18:10 ` Joel Savitz
  2019-05-03 21:08   ` Yury Norov
                     ` (2 more replies)
  2019-05-03 18:10 ` [PATCH v3 2/2] prctl.2: Document the new PR_GET_TASK_SIZE option Joel Savitz
  2019-05-03 20:49 ` [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Yury Norov
  2 siblings, 3 replies; 11+ messages in thread
From: Joel Savitz @ 2019-05-03 18:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Savitz, Thomas Gleixner, Ingo Molnar, Masami Hiramatsu,
	Waiman Long, Mauro Carvalho Chehab, Kristina Martsenko,
	Andrew Morton, Cyrill Gorcunov, Kees Cook, Gustavo A. R. Silva,
	YueHaibing, Micah Morton, Yang Shi, Jann Horn, Alexey Dobriyan,
	Rafael Aquini, Michael Kerrisk, Yury Norov, David Laight

When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
copy the value of TASK_SIZE to the userspace address in arg2.

It is important that we account for the case of the userspace task
running in 32-bit compat mode on a 64-bit kernel. As such, we must be
careful to copy the correct number of bytes to userspace to avoid stack
corruption.

Suggested-by: Yuri Norov <yury.norov@gmail.com>
Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
---
 include/uapi/linux/prctl.h |  3 +++
 kernel/sys.c               | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 094bb03b9cc2..2c261c461952 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -229,4 +229,7 @@ struct prctl_mm_map {
 # define PR_PAC_APDBKEY                        (1UL << 3)
 # define PR_PAC_APGAKEY                        (1UL << 4)

+/* Get the process virtual memory size (i.e. the highest usable VM address) */
+#define PR_GET_TASK_SIZE               55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..709584400070 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
        return 1;
 }

+static int prctl_get_tasksize(void __user *uaddr)
+{
+	unsigned long current_task_size, current_word_size;
+
+	current_task_size = TASK_SIZE;
+	current_word_size = sizeof(unsigned long);
+
+#ifdef CONFIG_64BIT
+	/* On 64-bit architecture, we must check whether the current thread
+	 * is running in 32-bit compat mode. If it is, we can simply cut
+	 * the size in half. This avoids corruption of the userspace stack.
+	 */
+	if (test_thread_flag(TIF_ADDR32))
+		current_word_size >>= 1;
+#endif
+
+	return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
+}
+
 int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
 {
        return -EINVAL;
@@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
                        return -EINVAL;
                error = PAC_RESET_KEYS(me, arg2);
                break;
+	case PR_GET_TASK_SIZE:
+		error = prctl_get_tasksize((void *)arg2);
+		break;
        default:
                error = -EINVAL;
                break;
--
2.18.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/2] prctl.2: Document the new PR_GET_TASK_SIZE option
  2019-05-03 18:10 [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Joel Savitz
  2019-05-03 18:10 ` [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2) Joel Savitz
@ 2019-05-03 18:10 ` Joel Savitz
  2019-05-03 20:49 ` [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Yury Norov
  2 siblings, 0 replies; 11+ messages in thread
From: Joel Savitz @ 2019-05-03 18:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Savitz, Thomas Gleixner, Ingo Molnar, Masami Hiramatsu,
	Waiman Long, Mauro Carvalho Chehab, Kristina Martsenko,
	Andrew Morton, Cyrill Gorcunov, Kees Cook, Gustavo A. R. Silva,
	YueHaibing, Micah Morton, Yang Shi, Jann Horn, Alexey Dobriyan,
	Rafael Aquini, Michael Kerrisk, Yury Norov, David Laight

Add a short explanation of the new PR_GET_TASK_SIZE option for the benefit
of future generations.

Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
---
 man2/prctl.2 | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/man2/prctl.2 b/man2/prctl.2
index 06d8e13c7..cae582726 100644
--- a/man2/prctl.2
+++ b/man2/prctl.2
@@ -49,6 +49,7 @@
 .\" 2013-01-10 Kees Cook, document PR_SET_PTRACER
 .\" 2012-02-04 Michael Kerrisk, document PR_{SET,GET}_CHILD_SUBREAPER
 .\" 2014-11-10 Dave Hansen, document PR_MPX_{EN,DIS}ABLE_MANAGEMENT
+.\" 2019-05-03 Joel Savitz, document PR_GET_TASK_SIZE
 .\"
 .\"
 .TH PRCTL 2 2019-03-06 "Linux" "Linux Programmer's Manual"
@@ -1375,6 +1376,15 @@ system call on Tru64).
 for information on versions and architectures)
 Return unaligned access control bits, in the location pointed to by
 .IR "(unsigned int\ *) arg2" .
+.TP
+.B PR_GET_TASK_SIZE
+Copy the value of TASK_SIZE to the userspace address in
+.IR "(unsigned long\ *) arg2" .
+This value represents the highest virtual address available
+to the current process. Return
+.B EFAULT
+if this operation fails.
+
 .SH RETURN VALUE
 On success,
 .BR PR_GET_DUMPABLE ,
--
2.18.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace
  2019-05-03 18:10 [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Joel Savitz
  2019-05-03 18:10 ` [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2) Joel Savitz
  2019-05-03 18:10 ` [PATCH v3 2/2] prctl.2: Document the new PR_GET_TASK_SIZE option Joel Savitz
@ 2019-05-03 20:49 ` Yury Norov
  2019-05-03 21:57   ` Rafael Aquini
  2 siblings, 1 reply; 11+ messages in thread
From: Yury Norov @ 2019-05-03 20:49 UTC (permalink / raw)
  To: Joel Savitz
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Masami Hiramatsu,
	Waiman Long, Mauro Carvalho Chehab, Kristina Martsenko,
	Andrew Morton, Cyrill Gorcunov, Kees Cook, Gustavo A. R. Silva,
	YueHaibing, Micah Morton, Yang Shi, Jann Horn, Alexey Dobriyan,
	Rafael Aquini, Michael Kerrisk, David Laight

On Fri, May 03, 2019 at 02:10:19PM -0400, Joel Savitz wrote:
> In the mainline kernel, there is no quick mechanism to get the virtual
> memory size of the current process from userspace.
> 
> Despite the current state of affairs, this information is available to the
> user through several means, one being a linear search of the entire address
> space. This is an inefficient use of cpu cycles.
> 
> A component of the libhugetlb kernel test does exactly this, and as
> systems' address spaces increase beyond 32-bits, this method becomes
> exceedingly tedious.
> 
> For example, on a ppc64le system with a 47-bit address space, the linear
> search causes the test to hang for some unknown amount of time. I
> couldn't give you an exact number because I just ran it for about 10-20
> minutes and went to go do something else, probably to get coffee or
> something, and when I came back, I just killed the test and patched it
> to use this new mechanism. I re-ran my new version of the test using a
> kernel with this patch, and of course it passed through the previously
> bottlenecking codepath nearly instantaneously.
> 
> As such, I propose that the prctl syscall be extended to include the
> option to retrieve TASK_SIZE from the kernel.
> 
> This patch will allow us to upgrade an O(n) codepath to O(1) in an
> architecture-independent manner, and provide a mechanism for future
> generations to do the same.

So the only reason for the new API is boosting some random poorly
written userspace test? Why don't you introduce binary search instead?

Look at /proc/<pid>/maps. It may help to reduce the memory area to be
checked.
 
> Changes from v2:
>  We now account for the case of 32-bit compat userspace on a 64-bit kernel
>  More detail about the nature of TASK_SIZE in documentation
> 
> Joel Savitz(2):
>   sys/prctl: add PR_GET_TASK_SIZE option to prctl(2)
>   prctl.2: Document the new PR_GET_TASK_SIZE option
> 
>  include/uapi/linux/prctl.h |  3 +++
>  kernel/sys.c               | 23 +++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
>  man2/prctl.2 | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> --
> 2.18.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)
  2019-05-03 18:10 ` [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2) Joel Savitz
@ 2019-05-03 21:08   ` Yury Norov
  2019-05-03 21:51     ` Rafael Aquini
  2019-05-03 22:14     ` Rafael Aquini
  2019-05-03 23:15   ` Jann Horn
  2019-05-04  6:56   ` Alexey Dobriyan
  2 siblings, 2 replies; 11+ messages in thread
From: Yury Norov @ 2019-05-03 21:08 UTC (permalink / raw)
  To: Joel Savitz
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Masami Hiramatsu,
	Waiman Long, Mauro Carvalho Chehab, Kristina Martsenko,
	Andrew Morton, Cyrill Gorcunov, Kees Cook, Gustavo A. R. Silva,
	YueHaibing, Micah Morton, Yang Shi, Jann Horn, Alexey Dobriyan,
	Rafael Aquini, Michael Kerrisk, David Laight

On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> copy the value of TASK_SIZE to the userspace address in arg2.
> 
> It is important that we account for the case of the userspace task
> running in 32-bit compat mode on a 64-bit kernel. As such, we must be
> careful to copy the correct number of bytes to userspace to avoid stack
> corruption.
> 
> Suggested-by: Yuri Norov <yury.norov@gmail.com>

I actually didn't suggest that. If you _really_ need TASK_SIZE to
be exposed, I would suggest to expose it in kernel headers. TASK_SIZE
is a compile-time information, and it may available for userspace at
compile time as well.

> Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
> Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> ---
>  include/uapi/linux/prctl.h |  3 +++
>  kernel/sys.c               | 23 +++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 094bb03b9cc2..2c261c461952 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -229,4 +229,7 @@ struct prctl_mm_map {
>  # define PR_PAC_APDBKEY                        (1UL << 3)
>  # define PR_PAC_APGAKEY                        (1UL << 4)
> 
> +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> +#define PR_GET_TASK_SIZE               55
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..709584400070 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
>         return 1;
>  }
> 
> +static int prctl_get_tasksize(void __user *uaddr)
> +{
> +	unsigned long current_task_size, current_word_size;
> +
> +	current_task_size = TASK_SIZE;
> +	current_word_size = sizeof(unsigned long);
> +
> +#ifdef CONFIG_64BIT
> +	/* On 64-bit architecture, we must check whether the current thread
> +	 * is running in 32-bit compat mode. If it is, we can simply cut
> +	 * the size in half. This avoids corruption of the userspace stack.
> +	 */
> +	if (test_thread_flag(TIF_ADDR32))

It breaks build for all architectures except x86 since TIF_ADDR32 is
defined for x86 only.

In comment to v2 I suggested you to stick to fixed-size data type to
avoid exactly this problem.

NACK

Yury

> +		current_word_size >>= 1;
> +#endif
> +
> +	return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> +}
> +
>  int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
>  {
>         return -EINVAL;
> @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>                         return -EINVAL;
>                 error = PAC_RESET_KEYS(me, arg2);
>                 break;
> +	case PR_GET_TASK_SIZE:
> +		error = prctl_get_tasksize((void *)arg2);
> +		break;
>         default:
>                 error = -EINVAL;
>                 break;
> --
> 2.18.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)
  2019-05-03 21:08   ` Yury Norov
@ 2019-05-03 21:51     ` Rafael Aquini
  2019-05-03 22:14     ` Rafael Aquini
  1 sibling, 0 replies; 11+ messages in thread
From: Rafael Aquini @ 2019-05-03 21:51 UTC (permalink / raw)
  To: Yury Norov
  Cc: Joel Savitz, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Masami Hiramatsu, Waiman Long, Mauro Carvalho Chehab,
	Kristina Martsenko, Andrew Morton, Cyrill Gorcunov, Kees Cook,
	Gustavo A. R. Silva, YueHaibing, Micah Morton, Yang Shi,
	Jann Horn, Alexey Dobriyan, Michael Kerrisk, David Laight

On Fri, May 03, 2019 at 02:08:31PM -0700, Yury Norov wrote:
> On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> > When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> > copy the value of TASK_SIZE to the userspace address in arg2.
> > 
> > It is important that we account for the case of the userspace task
> > running in 32-bit compat mode on a 64-bit kernel. As such, we must be
> > careful to copy the correct number of bytes to userspace to avoid stack
> > corruption.
> > 
> > Suggested-by: Yuri Norov <yury.norov@gmail.com>
> 
> I actually didn't suggest that. If you _really_ need TASK_SIZE to
> be exposed, I would suggest to expose it in kernel headers. TASK_SIZE
> is a compile-time information, and it may available for userspace at
> compile time as well.
> 

TASK_SIZE is a runtime resolved macro, dependent on the thread currently
running on the CPU. It's not a compile time constant.

Anyways, it's proven that going prctl(2), although interesting, as
suggested by Alexey, wasn't worth the hassle as it poses more issues 
than it can possibly solve. 

A better way to get this value exposed to userspace is really through
/proc/<pid>/status, where one can utilize TASK_SIZE_OF(mm->owner), or
simply mm->task_size, which seems to be properly assigned for each arch


> > Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
> > Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> > ---
> >  include/uapi/linux/prctl.h |  3 +++
> >  kernel/sys.c               | 23 +++++++++++++++++++++++
> >  2 files changed, 26 insertions(+)
> > 
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 094bb03b9cc2..2c261c461952 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,7 @@ struct prctl_mm_map {
> >  # define PR_PAC_APDBKEY                        (1UL << 3)
> >  # define PR_PAC_APGAKEY                        (1UL << 4)
> > 
> > +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> > +#define PR_GET_TASK_SIZE               55
> > +
> >  #endif /* _LINUX_PRCTL_H */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 12df0e5434b8..709584400070 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
> >         return 1;
> >  }
> > 
> > +static int prctl_get_tasksize(void __user *uaddr)
> > +{
> > +	unsigned long current_task_size, current_word_size;
> > +
> > +	current_task_size = TASK_SIZE;
> > +	current_word_size = sizeof(unsigned long);
> > +
> > +#ifdef CONFIG_64BIT
> > +	/* On 64-bit architecture, we must check whether the current thread
> > +	 * is running in 32-bit compat mode. If it is, we can simply cut
> > +	 * the size in half. This avoids corruption of the userspace stack.
> > +	 */
> > +	if (test_thread_flag(TIF_ADDR32))
> 
> It breaks build for all architectures except x86 since TIF_ADDR32 is
> defined for x86 only.
> 
> In comment to v2 I suggested you to stick to fixed-size data type to
> avoid exactly this problem.
> 
> NACK
> 
> Yury
> 
> > +		current_word_size >>= 1;
> > +#endif
> > +
> > +	return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> > +}
> > +
> >  int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
> >  {
> >         return -EINVAL;
> > @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >                         return -EINVAL;
> >                 error = PAC_RESET_KEYS(me, arg2);
> >                 break;
> > +	case PR_GET_TASK_SIZE:
> > +		error = prctl_get_tasksize((void *)arg2);
> > +		break;
> >         default:
> >                 error = -EINVAL;
> >                 break;
> > --
> > 2.18.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace
  2019-05-03 20:49 ` [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Yury Norov
@ 2019-05-03 21:57   ` Rafael Aquini
  2019-05-04  4:21     ` Yury Norov
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael Aquini @ 2019-05-03 21:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: Joel Savitz, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Masami Hiramatsu, Waiman Long, Mauro Carvalho Chehab,
	Kristina Martsenko, Andrew Morton, Cyrill Gorcunov, Kees Cook,
	Gustavo A. R. Silva, YueHaibing, Micah Morton, Yang Shi,
	Jann Horn, Alexey Dobriyan, Michael Kerrisk, David Laight

On Fri, May 03, 2019 at 01:49:12PM -0700, Yury Norov wrote:
> On Fri, May 03, 2019 at 02:10:19PM -0400, Joel Savitz wrote:
> > In the mainline kernel, there is no quick mechanism to get the virtual
> > memory size of the current process from userspace.
> > 
> > Despite the current state of affairs, this information is available to the
> > user through several means, one being a linear search of the entire address
> > space. This is an inefficient use of cpu cycles.
> > 
> > A component of the libhugetlb kernel test does exactly this, and as
> > systems' address spaces increase beyond 32-bits, this method becomes
> > exceedingly tedious.
> > 
> > For example, on a ppc64le system with a 47-bit address space, the linear
> > search causes the test to hang for some unknown amount of time. I
> > couldn't give you an exact number because I just ran it for about 10-20
> > minutes and went to go do something else, probably to get coffee or
> > something, and when I came back, I just killed the test and patched it
> > to use this new mechanism. I re-ran my new version of the test using a
> > kernel with this patch, and of course it passed through the previously
> > bottlenecking codepath nearly instantaneously.
> > 
> > As such, I propose that the prctl syscall be extended to include the
> > option to retrieve TASK_SIZE from the kernel.
> > 
> > This patch will allow us to upgrade an O(n) codepath to O(1) in an
> > architecture-independent manner, and provide a mechanism for future
> > generations to do the same.
> 
> So the only reason for the new API is boosting some random poorly
> written userspace test? Why don't you introduce binary search instead?
>

there's no real cost in exposing the value that is known to the kernel,
anyways, as long as it's not a freaking hassle (like trying to go with
this prctl(2) stunt). We just need to get it properly exported alongside
other task's VM-related values at /proc/<pid>/status.
 
> Look at /proc/<pid>/maps. It may help to reduce the memory area to be
> checked.
>  
> > Changes from v2:
> >  We now account for the case of 32-bit compat userspace on a 64-bit kernel
> >  More detail about the nature of TASK_SIZE in documentation
> > 
> > Joel Savitz(2):
> >   sys/prctl: add PR_GET_TASK_SIZE option to prctl(2)
> >   prctl.2: Document the new PR_GET_TASK_SIZE option
> > 
> >  include/uapi/linux/prctl.h |  3 +++
> >  kernel/sys.c               | 23 +++++++++++++++++++++++
> >  2 files changed, 26 insertions(+)
> > 
> >  man2/prctl.2 | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > --
> > 2.18.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)
  2019-05-03 21:08   ` Yury Norov
  2019-05-03 21:51     ` Rafael Aquini
@ 2019-05-03 22:14     ` Rafael Aquini
  1 sibling, 0 replies; 11+ messages in thread
From: Rafael Aquini @ 2019-05-03 22:14 UTC (permalink / raw)
  To: Yury Norov
  Cc: Joel Savitz, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Masami Hiramatsu, Waiman Long, Mauro Carvalho Chehab,
	Kristina Martsenko, Andrew Morton, Cyrill Gorcunov, Kees Cook,
	Gustavo A. R. Silva, YueHaibing, Micah Morton, Yang Shi,
	Jann Horn, Alexey Dobriyan, Michael Kerrisk, David Laight

On Fri, May 03, 2019 at 02:08:31PM -0700, Yury Norov wrote:
> On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> > When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> > copy the value of TASK_SIZE to the userspace address in arg2.
> > 
> > It is important that we account for the case of the userspace task
> > running in 32-bit compat mode on a 64-bit kernel. As such, we must be
> > careful to copy the correct number of bytes to userspace to avoid stack
> > corruption.
> > 
> > Suggested-by: Yuri Norov <yury.norov@gmail.com>
> 
> I actually didn't suggest that. If you _really_ need TASK_SIZE to
> be exposed, I would suggest to expose it in kernel headers. TASK_SIZE
> is a compile-time information, and it may available for userspace at
> compile time as well.
> 
> > Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
> > Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> > ---
> >  include/uapi/linux/prctl.h |  3 +++
> >  kernel/sys.c               | 23 +++++++++++++++++++++++
> >  2 files changed, 26 insertions(+)
> > 
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 094bb03b9cc2..2c261c461952 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,7 @@ struct prctl_mm_map {
> >  # define PR_PAC_APDBKEY                        (1UL << 3)
> >  # define PR_PAC_APGAKEY                        (1UL << 4)
> > 
> > +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> > +#define PR_GET_TASK_SIZE               55
> > +
> >  #endif /* _LINUX_PRCTL_H */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 12df0e5434b8..709584400070 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
> >         return 1;
> >  }
> > 
> > +static int prctl_get_tasksize(void __user *uaddr)
> > +{
> > +	unsigned long current_task_size, current_word_size;
> > +
> > +	current_task_size = TASK_SIZE;
> > +	current_word_size = sizeof(unsigned long);
> > +
> > +#ifdef CONFIG_64BIT
> > +	/* On 64-bit architecture, we must check whether the current thread
> > +	 * is running in 32-bit compat mode. If it is, we can simply cut
> > +	 * the size in half. This avoids corruption of the userspace stack.
> > +	 */
> > +	if (test_thread_flag(TIF_ADDR32))
> 
> It breaks build for all architectures except x86 since TIF_ADDR32 is
> defined for x86 only.

Or we could get TIF_32BIT also defined for x86 (same value of
 TIF_ADDR32) and check for it instead. i.e.

...
#if defined(CONFIG_64BIT) && defined(TIF_32BIT)
	if (test_thread_flag(TIF_32BIT))
... 

which is also uglier and keeps adding unecessary complexity to a very
simple task. At this point, I think we just should give up on trying
this via prctl(2) and do it via /proc/<pid>/status instead. 


> 
> In comment to v2 I suggested you to stick to fixed-size data type to
> avoid exactly this problem.
> 
> NACK
> 
> Yury
> 
> > +		current_word_size >>= 1;
> > +#endif
> > +
> > +	return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> > +}
> > +
> >  int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
> >  {
> >         return -EINVAL;
> > @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >                         return -EINVAL;
> >                 error = PAC_RESET_KEYS(me, arg2);
> >                 break;
> > +	case PR_GET_TASK_SIZE:
> > +		error = prctl_get_tasksize((void *)arg2);
> > +		break;
> >         default:
> >                 error = -EINVAL;
> >                 break;
> > --
> > 2.18.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)
  2019-05-03 18:10 ` [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2) Joel Savitz
  2019-05-03 21:08   ` Yury Norov
@ 2019-05-03 23:15   ` Jann Horn
  2019-05-04  6:56   ` Alexey Dobriyan
  2 siblings, 0 replies; 11+ messages in thread
From: Jann Horn @ 2019-05-03 23:15 UTC (permalink / raw)
  To: Joel Savitz
  Cc: kernel list, Thomas Gleixner, Ingo Molnar, Masami Hiramatsu,
	Waiman Long, Mauro Carvalho Chehab, Kristina Martsenko,
	Andrew Morton, Cyrill Gorcunov, Kees Cook, Gustavo A. R. Silva,
	YueHaibing, Micah Morton, Yang Shi, Alexey Dobriyan,
	Rafael Aquini, Michael Kerrisk, Yury Norov, David Laight

On Fri, May 3, 2019 at 2:12 PM Joel Savitz <jsavitz@redhat.com> wrote:
> When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> copy the value of TASK_SIZE to the userspace address in arg2.

A commit message shouldn't just describe what you're doing, but also
why you're doing it. Is this intended for processes that are running
on X86-64 and want to determine whether the system supports 5-level
paging, or something like that?

> +static int prctl_get_tasksize(void __user *uaddr)
> +{
> +       unsigned long current_task_size, current_word_size;
> +
> +       current_task_size = TASK_SIZE;
> +       current_word_size = sizeof(unsigned long);
> +
> +#ifdef CONFIG_64BIT
> +       /* On 64-bit architecture, we must check whether the current thread
> +        * is running in 32-bit compat mode. If it is, we can simply cut
> +        * the size in half. This avoids corruption of the userspace stack.
> +        */
> +       if (test_thread_flag(TIF_ADDR32))
> +               current_word_size >>= 1;
> +#endif
> +
> +       return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> +}

This function looks completely wrong; in particular, you're assuming
that the architecture is little-endian.
Make the value a u64, and you won't have these problems:

static int prctl_get_tasksize(u64 __user *uaddr)
{
        return put_user(TASK_SIZE, uaddr) ? -EFAULT : 0;
}

A bunch of other new pieces of userspace API already use "u64" to
store userspace pointers and lengths to avoid compat issues.

> @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>                         return -EINVAL;
>                 error = PAC_RESET_KEYS(me, arg2);
>                 break;
> +       case PR_GET_TASK_SIZE:
> +               error = prctl_get_tasksize((void *)arg2);

s/void */void __user */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace
  2019-05-03 21:57   ` Rafael Aquini
@ 2019-05-04  4:21     ` Yury Norov
  0 siblings, 0 replies; 11+ messages in thread
From: Yury Norov @ 2019-05-04  4:21 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: Joel Savitz, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Masami Hiramatsu, Waiman Long, Mauro Carvalho Chehab,
	Kristina Martsenko, Andrew Morton, Cyrill Gorcunov, Kees Cook,
	Gustavo A. R. Silva, YueHaibing, Micah Morton, Yang Shi,
	Jann Horn, Alexey Dobriyan, Michael Kerrisk, David Laight

On Fri, May 03, 2019 at 05:57:31PM -0400, Rafael Aquini wrote:
> On Fri, May 03, 2019 at 01:49:12PM -0700, Yury Norov wrote:
> > On Fri, May 03, 2019 at 02:10:19PM -0400, Joel Savitz wrote:
> > > In the mainline kernel, there is no quick mechanism to get the virtual
> > > memory size of the current process from userspace.
> > > 
> > > Despite the current state of affairs, this information is available to the
> > > user through several means, one being a linear search of the entire address
> > > space. This is an inefficient use of cpu cycles.
> > > 
> > > A component of the libhugetlb kernel test does exactly this, and as
> > > systems' address spaces increase beyond 32-bits, this method becomes
> > > exceedingly tedious.
> > > 
> > > For example, on a ppc64le system with a 47-bit address space, the linear
> > > search causes the test to hang for some unknown amount of time. I
> > > couldn't give you an exact number because I just ran it for about 10-20
> > > minutes and went to go do something else, probably to get coffee or
> > > something, and when I came back, I just killed the test and patched it
> > > to use this new mechanism. I re-ran my new version of the test using a
> > > kernel with this patch, and of course it passed through the previously
> > > bottlenecking codepath nearly instantaneously.
> > > 
> > > As such, I propose that the prctl syscall be extended to include the
> > > option to retrieve TASK_SIZE from the kernel.
> > > 
> > > This patch will allow us to upgrade an O(n) codepath to O(1) in an
> > > architecture-independent manner, and provide a mechanism for future
> > > generations to do the same.
> > 
> > So the only reason for the new API is boosting some random poorly
> > written userspace test? Why don't you introduce binary search instead?
> >
> 
> there's no real cost in exposing the value that is known to the kernel,

Really? We all here used to think that kernel programming is one of
the most difficult professions in the world. There is huge cost of
proper implementation of a feature, careful review, spread testing on
various platforms and long-term maintenance and support.

In this specific example of exposing TASK_SIZE your team made too much
things wrong to realize it, I hope.

> anyways, as long as it's not a freaking hassle (like trying to go with
> this prctl(2) stunt). We just need to get it properly exported alongside
> other task's VM-related values at /proc/<pid>/status.

I found this thread thrilling. Please keep me in CC with your
/proc/<pid>/status effort.

Yury
  
> > Look at /proc/<pid>/maps. It may help to reduce the memory area to be
> > checked.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)
  2019-05-03 18:10 ` [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2) Joel Savitz
  2019-05-03 21:08   ` Yury Norov
  2019-05-03 23:15   ` Jann Horn
@ 2019-05-04  6:56   ` Alexey Dobriyan
  2 siblings, 0 replies; 11+ messages in thread
From: Alexey Dobriyan @ 2019-05-04  6:56 UTC (permalink / raw)
  To: Joel Savitz
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Masami Hiramatsu,
	Waiman Long, Mauro Carvalho Chehab, Kristina Martsenko,
	Andrew Morton, Cyrill Gorcunov, Kees Cook, Gustavo A. R. Silva,
	YueHaibing, Micah Morton, Yang Shi, Jann Horn, Rafael Aquini,
	Michael Kerrisk, Yury Norov, David Laight

On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> +#define PR_GET_TASK_SIZE               55

TASK_SIZE is in fact the lowest _un_usable address. :^)

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-05-04  6:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-03 18:10 [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Joel Savitz
2019-05-03 18:10 ` [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2) Joel Savitz
2019-05-03 21:08   ` Yury Norov
2019-05-03 21:51     ` Rafael Aquini
2019-05-03 22:14     ` Rafael Aquini
2019-05-03 23:15   ` Jann Horn
2019-05-04  6:56   ` Alexey Dobriyan
2019-05-03 18:10 ` [PATCH v3 2/2] prctl.2: Document the new PR_GET_TASK_SIZE option Joel Savitz
2019-05-03 20:49 ` [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace Yury Norov
2019-05-03 21:57   ` Rafael Aquini
2019-05-04  4:21     ` Yury Norov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).