All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
@ 2021-03-03 18:58 ` Suren Baghdasaryan
  0 siblings, 0 replies; 14+ messages in thread
From: Suren Baghdasaryan @ 2021-03-03 18:58 UTC (permalink / raw)
  To: akpm
  Cc: jannh, keescook, jeffv, minchan, mhocko, shakeelb, rientjes,
	edgararriaga, timmurray, fweimer, oleg, jmorris, linux-mm,
	selinux, linux-api, linux-security-module, stable, linux-kernel,
	kernel-team, surenb

process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the
two processes (in one direction). Granting ptrace attach capability
even to a system process is considered dangerous since it creates an
attack surface. This severely limits the usage of this API.
The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data
is physically located (and therefore, how fast it can be accessed).
What we want is the ability for one process to influence another process
in order to optimize performance across the entire system while leaving
the security boundary intact.
Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
and CAP_SYS_NICE for influencing process performance.

Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
---
changes in v3
- Added Reviewed-by: Kees Cook <keescook@chromium.org>
- Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
- cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
- cc'ed linux-security-module@vger.kernel.org per James Morris's request

 mm/madvise.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index df692d2e35d4..01fef79ac761 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto release_task;
 	}
 
-	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+	/* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
 	if (IS_ERR_OR_NULL(mm)) {
 		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		goto release_task;
 	}
 
+	/*
+	 * Require CAP_SYS_NICE for influencing process performance. Note that
+	 * only non-destructive hints are currently supported.
+	 */
+	if (!capable(CAP_SYS_NICE)) {
+		ret = -EPERM;
+		goto release_mm;
+	}
+
 	total_len = iov_iter_count(&iter);
 
 	while (iov_iter_count(&iter)) {
@@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 	if (ret == 0)
 		ret = total_len - iov_iter_count(&iter);
 
+release_mm:
 	mmput(mm);
 release_task:
 	put_task_struct(task);
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
@ 2021-03-03 18:58 ` Suren Baghdasaryan
  0 siblings, 0 replies; 14+ messages in thread
From: Suren Baghdasaryan @ 2021-03-03 18:58 UTC (permalink / raw)
  To: akpm
  Cc: jannh, keescook, jeffv, minchan, mhocko, shakeelb, rientjes,
	edgararriaga, timmurray, fweimer, oleg, jmorris, linux-mm,
	selinux, linux-api, linux-security-module, stable, linux-kernel,
	kernel-team, surenb

process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the
two processes (in one direction). Granting ptrace attach capability
even to a system process is considered dangerous since it creates an
attack surface. This severely limits the usage of this API.
The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data
is physically located (and therefore, how fast it can be accessed).
What we want is the ability for one process to influence another process
in order to optimize performance across the entire system while leaving
the security boundary intact.
Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
and CAP_SYS_NICE for influencing process performance.

Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
---
changes in v3
- Added Reviewed-by: Kees Cook <keescook@chromium.org>
- Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
- cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
- cc'ed linux-security-module@vger.kernel.org per James Morris's request

 mm/madvise.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index df692d2e35d4..01fef79ac761 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto release_task;
 	}
 
-	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+	/* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
 	if (IS_ERR_OR_NULL(mm)) {
 		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		goto release_task;
 	}
 
+	/*
+	 * Require CAP_SYS_NICE for influencing process performance. Note that
+	 * only non-destructive hints are currently supported.
+	 */
+	if (!capable(CAP_SYS_NICE)) {
+		ret = -EPERM;
+		goto release_mm;
+	}
+
 	total_len = iov_iter_count(&iter);
 
 	while (iov_iter_count(&iter)) {
@@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 	if (ret == 0)
 		ret = total_len - iov_iter_count(&iter);
 
+release_mm:
 	mmput(mm);
 release_task:
 	put_task_struct(task);
-- 
2.30.1.766.gb4fecdf3b7-goog



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-03 18:58 ` Suren Baghdasaryan
@ 2021-03-03 23:17   ` Shakeel Butt
  -1 siblings, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2021-03-03 23:17 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, Jann Horn, Kees Cook, jeffv, Minchan Kim,
	Michal Hocko, David Rientjes, edgararriaga, Tim Murray, fweimer,
	oleg, jmorris, Linux MM, selinux, linux-api,
	linux-security-module, stable, LKML, kernel-team

On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> process_madvise currently requires ptrace attach capability.
> PTRACE_MODE_ATTACH gives one process complete control over another
> process. It effectively removes the security boundary between the
> two processes (in one direction). Granting ptrace attach capability
> even to a system process is considered dangerous since it creates an
> attack surface. This severely limits the usage of this API.
> The operations process_madvise can perform do not affect the correctness
> of the operation of the target process; they only affect where the data
> is physically located (and therefore, how fast it can be accessed).
> What we want is the ability for one process to influence another process
> in order to optimize performance across the entire system while leaving
> the security boundary intact.
> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> and CAP_SYS_NICE for influencing process performance.
>
> Cc: stable@vger.kernel.org # 5.10+
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Acked-by: David Rientjes <rientjes@google.com>
> ---
> changes in v3
> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
>
>  mm/madvise.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index df692d2e35d4..01fef79ac761 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>                 goto release_task;
>         }
>
> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>         if (IS_ERR_OR_NULL(mm)) {
>                 ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>                 goto release_task;
>         }
>
> +       /*
> +        * Require CAP_SYS_NICE for influencing process performance. Note that
> +        * only non-destructive hints are currently supported.

How is non-destructive defined? Is MADV_DONTNEED non-destructive?

> +        */
> +       if (!capable(CAP_SYS_NICE)) {
> +               ret = -EPERM;
> +               goto release_mm;
> +       }
> +
>         total_len = iov_iter_count(&iter);
>
>         while (iov_iter_count(&iter)) {
> @@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>         if (ret == 0)
>                 ret = total_len - iov_iter_count(&iter);
>
> +release_mm:
>         mmput(mm);
>  release_task:
>         put_task_struct(task);
> --
> 2.30.1.766.gb4fecdf3b7-goog
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
@ 2021-03-03 23:17   ` Shakeel Butt
  0 siblings, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2021-03-03 23:17 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, Jann Horn, Kees Cook, jeffv, Minchan Kim,
	Michal Hocko, David Rientjes, edgararriaga, Tim Murray, fweimer,
	oleg, jmorris, Linux MM, selinux, linux-api,
	linux-security-module, stable, LKML, kernel-team

On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> process_madvise currently requires ptrace attach capability.
> PTRACE_MODE_ATTACH gives one process complete control over another
> process. It effectively removes the security boundary between the
> two processes (in one direction). Granting ptrace attach capability
> even to a system process is considered dangerous since it creates an
> attack surface. This severely limits the usage of this API.
> The operations process_madvise can perform do not affect the correctness
> of the operation of the target process; they only affect where the data
> is physically located (and therefore, how fast it can be accessed).
> What we want is the ability for one process to influence another process
> in order to optimize performance across the entire system while leaving
> the security boundary intact.
> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> and CAP_SYS_NICE for influencing process performance.
>
> Cc: stable@vger.kernel.org # 5.10+
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Acked-by: David Rientjes <rientjes@google.com>
> ---
> changes in v3
> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
>
>  mm/madvise.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index df692d2e35d4..01fef79ac761 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>                 goto release_task;
>         }
>
> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>         if (IS_ERR_OR_NULL(mm)) {
>                 ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>                 goto release_task;
>         }
>
> +       /*
> +        * Require CAP_SYS_NICE for influencing process performance. Note that
> +        * only non-destructive hints are currently supported.

How is non-destructive defined? Is MADV_DONTNEED non-destructive?

> +        */
> +       if (!capable(CAP_SYS_NICE)) {
> +               ret = -EPERM;
> +               goto release_mm;
> +       }
> +
>         total_len = iov_iter_count(&iter);
>
>         while (iov_iter_count(&iter)) {
> @@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>         if (ret == 0)
>                 ret = total_len - iov_iter_count(&iter);
>
> +release_mm:
>         mmput(mm);
>  release_task:
>         put_task_struct(task);
> --
> 2.30.1.766.gb4fecdf3b7-goog
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-03 23:17   ` Shakeel Butt
  (?)
@ 2021-03-03 23:34   ` Suren Baghdasaryan
  2021-03-04  0:03     ` Shakeel Butt
  -1 siblings, 1 reply; 14+ messages in thread
From: Suren Baghdasaryan @ 2021-03-03 23:34 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Jann Horn, Kees Cook, Jeffrey Vander Stoep,
	Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > process_madvise currently requires ptrace attach capability.
> > PTRACE_MODE_ATTACH gives one process complete control over another
> > process. It effectively removes the security boundary between the
> > two processes (in one direction). Granting ptrace attach capability
> > even to a system process is considered dangerous since it creates an
> > attack surface. This severely limits the usage of this API.
> > The operations process_madvise can perform do not affect the correctness
> > of the operation of the target process; they only affect where the data
> > is physically located (and therefore, how fast it can be accessed).
> > What we want is the ability for one process to influence another process
> > in order to optimize performance across the entire system while leaving
> > the security boundary intact.
> > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> > and CAP_SYS_NICE for influencing process performance.
> >
> > Cc: stable@vger.kernel.org # 5.10+
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > Acked-by: Minchan Kim <minchan@kernel.org>
> > Acked-by: David Rientjes <rientjes@google.com>
> > ---
> > changes in v3
> > - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> > - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> > - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> > - cc'ed linux-security-module@vger.kernel.org per James Morris's request
> >
> >  mm/madvise.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index df692d2e35d4..01fef79ac761 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >                 goto release_task;
> >         }
> >
> > -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> >         if (IS_ERR_OR_NULL(mm)) {
> >                 ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> >                 goto release_task;
> >         }
> >
> > +       /*
> > +        * Require CAP_SYS_NICE for influencing process performance. Note that
> > +        * only non-destructive hints are currently supported.
>
> How is non-destructive defined? Is MADV_DONTNEED non-destructive?

Non-destructive in this context means the data is not lost and can be
recovered. I follow the logic described in
https://lwn.net/Articles/794704/ where Minchan was introducing
MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
would be considered destructive hints.
Note that process_madvise_behavior_valid() allows only MADV_COLD and
MADV_PAGEOUT at the moment, which are both non-destructive.

>
> > +        */
> > +       if (!capable(CAP_SYS_NICE)) {
> > +               ret = -EPERM;
> > +               goto release_mm;
> > +       }
> > +
> >         total_len = iov_iter_count(&iter);
> >
> >         while (iov_iter_count(&iter)) {
> > @@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >         if (ret == 0)
> >                 ret = total_len - iov_iter_count(&iter);
> >
> > +release_mm:
> >         mmput(mm);
> >  release_task:
> >         put_task_struct(task);
> > --
> > 2.30.1.766.gb4fecdf3b7-goog
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-03 23:34   ` Suren Baghdasaryan
@ 2021-03-04  0:03     ` Shakeel Butt
  2021-03-04  1:17       ` Suren Baghdasaryan
  2021-03-05 17:37       ` David Hildenbrand
  0 siblings, 2 replies; 14+ messages in thread
From: Shakeel Butt @ 2021-03-04  0:03 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, Jann Horn, Kees Cook, Jeffrey Vander Stoep,
	Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > process_madvise currently requires ptrace attach capability.
> > > PTRACE_MODE_ATTACH gives one process complete control over another
> > > process. It effectively removes the security boundary between the
> > > two processes (in one direction). Granting ptrace attach capability
> > > even to a system process is considered dangerous since it creates an
> > > attack surface. This severely limits the usage of this API.
> > > The operations process_madvise can perform do not affect the correctness
> > > of the operation of the target process; they only affect where the data
> > > is physically located (and therefore, how fast it can be accessed).
> > > What we want is the ability for one process to influence another process
> > > in order to optimize performance across the entire system while leaving
> > > the security boundary intact.
> > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> > > and CAP_SYS_NICE for influencing process performance.
> > >
> > > Cc: stable@vger.kernel.org # 5.10+
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Reviewed-by: Kees Cook <keescook@chromium.org>
> > > Acked-by: Minchan Kim <minchan@kernel.org>
> > > Acked-by: David Rientjes <rientjes@google.com>
> > > ---
> > > changes in v3
> > > - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> > > - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> > > - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> > > - cc'ed linux-security-module@vger.kernel.org per James Morris's request
> > >
> > >  mm/madvise.c | 13 ++++++++++++-
> > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index df692d2e35d4..01fef79ac761 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > >                 goto release_task;
> > >         }
> > >
> > > -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > >         if (IS_ERR_OR_NULL(mm)) {
> > >                 ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > >                 goto release_task;
> > >         }
> > >
> > > +       /*
> > > +        * Require CAP_SYS_NICE for influencing process performance. Note that
> > > +        * only non-destructive hints are currently supported.
> >
> > How is non-destructive defined? Is MADV_DONTNEED non-destructive?
>
> Non-destructive in this context means the data is not lost and can be
> recovered. I follow the logic described in
> https://lwn.net/Articles/794704/ where Minchan was introducing
> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
> would be considered destructive hints.
> Note that process_madvise_behavior_valid() allows only MADV_COLD and
> MADV_PAGEOUT at the moment, which are both non-destructive.
>

There is a plan to support MADV_DONTNEED for this syscall. Do we need
to change these access checks again with that support?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-04  0:03     ` Shakeel Butt
@ 2021-03-04  1:17       ` Suren Baghdasaryan
  2021-03-05 17:37       ` David Hildenbrand
  1 sibling, 0 replies; 14+ messages in thread
From: Suren Baghdasaryan @ 2021-03-04  1:17 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Jann Horn, Kees Cook, Jeffrey Vander Stoep,
	Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On Wed, Mar 3, 2021 at 4:04 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > process_madvise currently requires ptrace attach capability.
> > > > PTRACE_MODE_ATTACH gives one process complete control over another
> > > > process. It effectively removes the security boundary between the
> > > > two processes (in one direction). Granting ptrace attach capability
> > > > even to a system process is considered dangerous since it creates an
> > > > attack surface. This severely limits the usage of this API.
> > > > The operations process_madvise can perform do not affect the correctness
> > > > of the operation of the target process; they only affect where the data
> > > > is physically located (and therefore, how fast it can be accessed).
> > > > What we want is the ability for one process to influence another process
> > > > in order to optimize performance across the entire system while leaving
> > > > the security boundary intact.
> > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> > > > and CAP_SYS_NICE for influencing process performance.
> > > >
> > > > Cc: stable@vger.kernel.org # 5.10+
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > Reviewed-by: Kees Cook <keescook@chromium.org>
> > > > Acked-by: Minchan Kim <minchan@kernel.org>
> > > > Acked-by: David Rientjes <rientjes@google.com>
> > > > ---
> > > > changes in v3
> > > > - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> > > > - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> > > > - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> > > > - cc'ed linux-security-module@vger.kernel.org per James Morris's request
> > > >
> > > >  mm/madvise.c | 13 ++++++++++++-
> > > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index df692d2e35d4..01fef79ac761 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > > >                 goto release_task;
> > > >         }
> > > >
> > > > -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > > +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > > +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > > >         if (IS_ERR_OR_NULL(mm)) {
> > > >                 ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > > >                 goto release_task;
> > > >         }
> > > >
> > > > +       /*
> > > > +        * Require CAP_SYS_NICE for influencing process performance. Note that
> > > > +        * only non-destructive hints are currently supported.
> > >
> > > How is non-destructive defined? Is MADV_DONTNEED non-destructive?
> >
> > Non-destructive in this context means the data is not lost and can be
> > recovered. I follow the logic described in
> > https://lwn.net/Articles/794704/ where Minchan was introducing
> > MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
> > and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
> > would be considered destructive hints.
> > Note that process_madvise_behavior_valid() allows only MADV_COLD and
> > MADV_PAGEOUT at the moment, which are both non-destructive.
> >
>
> There is a plan to support MADV_DONTNEED for this syscall. Do we need
> to change these access checks again with that support?

I think so. Destructive hints affect the data, so we will probably
need stricter checks for those hints.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-04  0:03     ` Shakeel Butt
  2021-03-04  1:17       ` Suren Baghdasaryan
@ 2021-03-05 17:37       ` David Hildenbrand
  2021-03-05 17:45         ` Shakeel Butt
  1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2021-03-05 17:37 UTC (permalink / raw)
  To: Shakeel Butt, Suren Baghdasaryan
  Cc: Andrew Morton, Jann Horn, Kees Cook, Jeffrey Vander Stoep,
	Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On 04.03.21 01:03, Shakeel Butt wrote:
> On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
>>>
>>> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>
>>>> process_madvise currently requires ptrace attach capability.
>>>> PTRACE_MODE_ATTACH gives one process complete control over another
>>>> process. It effectively removes the security boundary between the
>>>> two processes (in one direction). Granting ptrace attach capability
>>>> even to a system process is considered dangerous since it creates an
>>>> attack surface. This severely limits the usage of this API.
>>>> The operations process_madvise can perform do not affect the correctness
>>>> of the operation of the target process; they only affect where the data
>>>> is physically located (and therefore, how fast it can be accessed).
>>>> What we want is the ability for one process to influence another process
>>>> in order to optimize performance across the entire system while leaving
>>>> the security boundary intact.
>>>> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
>>>> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
>>>> and CAP_SYS_NICE for influencing process performance.
>>>>
>>>> Cc: stable@vger.kernel.org # 5.10+
>>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>>>> Reviewed-by: Kees Cook <keescook@chromium.org>
>>>> Acked-by: Minchan Kim <minchan@kernel.org>
>>>> Acked-by: David Rientjes <rientjes@google.com>
>>>> ---
>>>> changes in v3
>>>> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
>>>> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
>>>> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
>>>> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
>>>>
>>>>   mm/madvise.c | 13 ++++++++++++-
>>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>> index df692d2e35d4..01fef79ac761 100644
>>>> --- a/mm/madvise.c
>>>> +++ b/mm/madvise.c
>>>> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>>>>                  goto release_task;
>>>>          }
>>>>
>>>> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
>>>> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
>>>> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>>>>          if (IS_ERR_OR_NULL(mm)) {
>>>>                  ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>>>>                  goto release_task;
>>>>          }
>>>>
>>>> +       /*
>>>> +        * Require CAP_SYS_NICE for influencing process performance. Note that
>>>> +        * only non-destructive hints are currently supported.
>>>
>>> How is non-destructive defined? Is MADV_DONTNEED non-destructive?
>>
>> Non-destructive in this context means the data is not lost and can be
>> recovered. I follow the logic described in
>> https://lwn.net/Articles/794704/ where Minchan was introducing
>> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
>> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
>> would be considered destructive hints.
>> Note that process_madvise_behavior_valid() allows only MADV_COLD and
>> MADV_PAGEOUT at the moment, which are both non-destructive.
>>
> 
> There is a plan to support MADV_DONTNEED for this syscall. Do we need
> to change these access checks again with that support?

Eh, I absolutely don't think letting another process discard memory in 
another process' address space is a good idea. The target process can 
observe that easily and might even run into real issues.

What's the use case?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-05 17:37       ` David Hildenbrand
@ 2021-03-05 17:45         ` Shakeel Butt
  2021-03-05 17:52           ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Shakeel Butt @ 2021-03-05 17:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Suren Baghdasaryan, Andrew Morton, Jann Horn, Kees Cook,
	Jeffrey Vander Stoep, Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On Fri, Mar 5, 2021 at 9:37 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.03.21 01:03, Shakeel Butt wrote:
> > On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>
> >> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
> >>>
> >>> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>
> >>>> process_madvise currently requires ptrace attach capability.
> >>>> PTRACE_MODE_ATTACH gives one process complete control over another
> >>>> process. It effectively removes the security boundary between the
> >>>> two processes (in one direction). Granting ptrace attach capability
> >>>> even to a system process is considered dangerous since it creates an
> >>>> attack surface. This severely limits the usage of this API.
> >>>> The operations process_madvise can perform do not affect the correctness
> >>>> of the operation of the target process; they only affect where the data
> >>>> is physically located (and therefore, how fast it can be accessed).
> >>>> What we want is the ability for one process to influence another process
> >>>> in order to optimize performance across the entire system while leaving
> >>>> the security boundary intact.
> >>>> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> >>>> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> >>>> and CAP_SYS_NICE for influencing process performance.
> >>>>
> >>>> Cc: stable@vger.kernel.org # 5.10+
> >>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >>>> Reviewed-by: Kees Cook <keescook@chromium.org>
> >>>> Acked-by: Minchan Kim <minchan@kernel.org>
> >>>> Acked-by: David Rientjes <rientjes@google.com>
> >>>> ---
> >>>> changes in v3
> >>>> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> >>>> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> >>>> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> >>>> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
> >>>>
> >>>>   mm/madvise.c | 13 ++++++++++++-
> >>>>   1 file changed, 12 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/mm/madvise.c b/mm/madvise.c
> >>>> index df692d2e35d4..01fef79ac761 100644
> >>>> --- a/mm/madvise.c
> >>>> +++ b/mm/madvise.c
> >>>> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >>>>                  goto release_task;
> >>>>          }
> >>>>
> >>>> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> >>>> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> >>>> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> >>>>          if (IS_ERR_OR_NULL(mm)) {
> >>>>                  ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> >>>>                  goto release_task;
> >>>>          }
> >>>>
> >>>> +       /*
> >>>> +        * Require CAP_SYS_NICE for influencing process performance. Note that
> >>>> +        * only non-destructive hints are currently supported.
> >>>
> >>> How is non-destructive defined? Is MADV_DONTNEED non-destructive?
> >>
> >> Non-destructive in this context means the data is not lost and can be
> >> recovered. I follow the logic described in
> >> https://lwn.net/Articles/794704/ where Minchan was introducing
> >> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
> >> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
> >> would be considered destructive hints.
> >> Note that process_madvise_behavior_valid() allows only MADV_COLD and
> >> MADV_PAGEOUT at the moment, which are both non-destructive.
> >>
> >
> > There is a plan to support MADV_DONTNEED for this syscall. Do we need
> > to change these access checks again with that support?
>
> Eh, I absolutely don't think letting another process discard memory in
> another process' address space is a good idea. The target process can
> observe that easily and might even run into real issues.
>
> What's the use case?
>

Userspace oom reaper. Please look at
https://lore.kernel.org/linux-api/20201014183943.GA1489464@google.com/T/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-05 17:45         ` Shakeel Butt
@ 2021-03-05 17:52           ` David Hildenbrand
  2021-03-05 18:08             ` Suren Baghdasaryan
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2021-03-05 17:52 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Suren Baghdasaryan, Andrew Morton, Jann Horn, Kees Cook,
	Jeffrey Vander Stoep, Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On 05.03.21 18:45, Shakeel Butt wrote:
> On Fri, Mar 5, 2021 at 9:37 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 04.03.21 01:03, Shakeel Butt wrote:
>>> On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>
>>>> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
>>>>>
>>>>> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>>
>>>>>> process_madvise currently requires ptrace attach capability.
>>>>>> PTRACE_MODE_ATTACH gives one process complete control over another
>>>>>> process. It effectively removes the security boundary between the
>>>>>> two processes (in one direction). Granting ptrace attach capability
>>>>>> even to a system process is considered dangerous since it creates an
>>>>>> attack surface. This severely limits the usage of this API.
>>>>>> The operations process_madvise can perform do not affect the correctness
>>>>>> of the operation of the target process; they only affect where the data
>>>>>> is physically located (and therefore, how fast it can be accessed).
>>>>>> What we want is the ability for one process to influence another process
>>>>>> in order to optimize performance across the entire system while leaving
>>>>>> the security boundary intact.
>>>>>> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
>>>>>> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
>>>>>> and CAP_SYS_NICE for influencing process performance.
>>>>>>
>>>>>> Cc: stable@vger.kernel.org # 5.10+
>>>>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>>>>>> Reviewed-by: Kees Cook <keescook@chromium.org>
>>>>>> Acked-by: Minchan Kim <minchan@kernel.org>
>>>>>> Acked-by: David Rientjes <rientjes@google.com>
>>>>>> ---
>>>>>> changes in v3
>>>>>> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
>>>>>> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
>>>>>> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
>>>>>> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
>>>>>>
>>>>>>    mm/madvise.c | 13 ++++++++++++-
>>>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>>>> index df692d2e35d4..01fef79ac761 100644
>>>>>> --- a/mm/madvise.c
>>>>>> +++ b/mm/madvise.c
>>>>>> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>>>>>>                   goto release_task;
>>>>>>           }
>>>>>>
>>>>>> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
>>>>>> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
>>>>>> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>>>>>>           if (IS_ERR_OR_NULL(mm)) {
>>>>>>                   ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>>>>>>                   goto release_task;
>>>>>>           }
>>>>>>
>>>>>> +       /*
>>>>>> +        * Require CAP_SYS_NICE for influencing process performance. Note that
>>>>>> +        * only non-destructive hints are currently supported.
>>>>>
>>>>> How is non-destructive defined? Is MADV_DONTNEED non-destructive?
>>>>
>>>> Non-destructive in this context means the data is not lost and can be
>>>> recovered. I follow the logic described in
>>>> https://lwn.net/Articles/794704/ where Minchan was introducing
>>>> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
>>>> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
>>>> would be considered destructive hints.
>>>> Note that process_madvise_behavior_valid() allows only MADV_COLD and
>>>> MADV_PAGEOUT at the moment, which are both non-destructive.
>>>>
>>>
>>> There is a plan to support MADV_DONTNEED for this syscall. Do we need
>>> to change these access checks again with that support?
>>
>> Eh, I absolutely don't think letting another process discard memory in
>> another process' address space is a good idea. The target process can
>> observe that easily and might even run into real issues.
>>
>> What's the use case?
>>
> 
> Userspace oom reaper. Please look at
> https://lore.kernel.org/linux-api/20201014183943.GA1489464@google.com/T/
> 

Thanks, somehow I missed that (not that it really changed my opinion on 
the approach while skimming over the discussion :) will have a more 
detailed look)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-05 17:52           ` David Hildenbrand
@ 2021-03-05 18:08             ` Suren Baghdasaryan
  2021-03-05 18:22               ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Suren Baghdasaryan @ 2021-03-05 18:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Shakeel Butt, Andrew Morton, Jann Horn, Kees Cook,
	Jeffrey Vander Stoep, Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On Fri, Mar 5, 2021 at 9:52 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 05.03.21 18:45, Shakeel Butt wrote:
> > On Fri, Mar 5, 2021 at 9:37 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 04.03.21 01:03, Shakeel Butt wrote:
> >>> On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>
> >>>> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
> >>>>>
> >>>>> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>>>
> >>>>>> process_madvise currently requires ptrace attach capability.
> >>>>>> PTRACE_MODE_ATTACH gives one process complete control over another
> >>>>>> process. It effectively removes the security boundary between the
> >>>>>> two processes (in one direction). Granting ptrace attach capability
> >>>>>> even to a system process is considered dangerous since it creates an
> >>>>>> attack surface. This severely limits the usage of this API.
> >>>>>> The operations process_madvise can perform do not affect the correctness
> >>>>>> of the operation of the target process; they only affect where the data
> >>>>>> is physically located (and therefore, how fast it can be accessed).
> >>>>>> What we want is the ability for one process to influence another process
> >>>>>> in order to optimize performance across the entire system while leaving
> >>>>>> the security boundary intact.
> >>>>>> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> >>>>>> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> >>>>>> and CAP_SYS_NICE for influencing process performance.
> >>>>>>
> >>>>>> Cc: stable@vger.kernel.org # 5.10+
> >>>>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >>>>>> Reviewed-by: Kees Cook <keescook@chromium.org>
> >>>>>> Acked-by: Minchan Kim <minchan@kernel.org>
> >>>>>> Acked-by: David Rientjes <rientjes@google.com>
> >>>>>> ---
> >>>>>> changes in v3
> >>>>>> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> >>>>>> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> >>>>>> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> >>>>>> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
> >>>>>>
> >>>>>>    mm/madvise.c | 13 ++++++++++++-
> >>>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
> >>>>>> index df692d2e35d4..01fef79ac761 100644
> >>>>>> --- a/mm/madvise.c
> >>>>>> +++ b/mm/madvise.c
> >>>>>> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >>>>>>                   goto release_task;
> >>>>>>           }
> >>>>>>
> >>>>>> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> >>>>>> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> >>>>>> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> >>>>>>           if (IS_ERR_OR_NULL(mm)) {
> >>>>>>                   ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> >>>>>>                   goto release_task;
> >>>>>>           }
> >>>>>>
> >>>>>> +       /*
> >>>>>> +        * Require CAP_SYS_NICE for influencing process performance. Note that
> >>>>>> +        * only non-destructive hints are currently supported.
> >>>>>
> >>>>> How is non-destructive defined? Is MADV_DONTNEED non-destructive?
> >>>>
> >>>> Non-destructive in this context means the data is not lost and can be
> >>>> recovered. I follow the logic described in
> >>>> https://lwn.net/Articles/794704/ where Minchan was introducing
> >>>> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
> >>>> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
> >>>> would be considered destructive hints.
> >>>> Note that process_madvise_behavior_valid() allows only MADV_COLD and
> >>>> MADV_PAGEOUT at the moment, which are both non-destructive.
> >>>>
> >>>
> >>> There is a plan to support MADV_DONTNEED for this syscall. Do we need
> >>> to change these access checks again with that support?
> >>
> >> Eh, I absolutely don't think letting another process discard memory in
> >> another process' address space is a good idea. The target process can
> >> observe that easily and might even run into real issues.
> >>
> >> What's the use case?
> >>
> >
> > Userspace oom reaper. Please look at
> > https://lore.kernel.org/linux-api/20201014183943.GA1489464@google.com/T/
> >
>
> Thanks, somehow I missed that (not that it really changed my opinion on
> the approach while skimming over the discussion :) will have a more
> detailed look)

The latest version of that patchset is:
https://lore.kernel.org/patchwork/patch/1344419/
Yeah, memory reaping is a special case when we are operating on a
dying process to speed up the release of its memory. I don't know if
for that particular case we need to make the checks stricter. It's a
dying process anyway and the data is being destroyed. Allowing to
speed up that process probably can still use CAP_SYS_NICE.

>
> --
> Thanks,
>
> David / dhildenb
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-05 18:08             ` Suren Baghdasaryan
@ 2021-03-05 18:22               ` David Hildenbrand
  2021-03-05 18:36                 ` Suren Baghdasaryan
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2021-03-05 18:22 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Shakeel Butt, Andrew Morton, Jann Horn, Kees Cook,
	Jeffrey Vander Stoep, Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On 05.03.21 19:08, Suren Baghdasaryan wrote:
> On Fri, Mar 5, 2021 at 9:52 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 05.03.21 18:45, Shakeel Butt wrote:
>>> On Fri, Mar 5, 2021 at 9:37 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 04.03.21 01:03, Shakeel Butt wrote:
>>>>> On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>>
>>>>>> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
>>>>>>>
>>>>>>> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>>>>
>>>>>>>> process_madvise currently requires ptrace attach capability.
>>>>>>>> PTRACE_MODE_ATTACH gives one process complete control over another
>>>>>>>> process. It effectively removes the security boundary between the
>>>>>>>> two processes (in one direction). Granting ptrace attach capability
>>>>>>>> even to a system process is considered dangerous since it creates an
>>>>>>>> attack surface. This severely limits the usage of this API.
>>>>>>>> The operations process_madvise can perform do not affect the correctness
>>>>>>>> of the operation of the target process; they only affect where the data
>>>>>>>> is physically located (and therefore, how fast it can be accessed).
>>>>>>>> What we want is the ability for one process to influence another process
>>>>>>>> in order to optimize performance across the entire system while leaving
>>>>>>>> the security boundary intact.
>>>>>>>> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
>>>>>>>> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
>>>>>>>> and CAP_SYS_NICE for influencing process performance.
>>>>>>>>
>>>>>>>> Cc: stable@vger.kernel.org # 5.10+
>>>>>>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>>>>>>>> Reviewed-by: Kees Cook <keescook@chromium.org>
>>>>>>>> Acked-by: Minchan Kim <minchan@kernel.org>
>>>>>>>> Acked-by: David Rientjes <rientjes@google.com>
>>>>>>>> ---
>>>>>>>> changes in v3
>>>>>>>> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
>>>>>>>> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
>>>>>>>> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
>>>>>>>> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
>>>>>>>>
>>>>>>>>     mm/madvise.c | 13 ++++++++++++-
>>>>>>>>     1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>>>>>> index df692d2e35d4..01fef79ac761 100644
>>>>>>>> --- a/mm/madvise.c
>>>>>>>> +++ b/mm/madvise.c
>>>>>>>> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>>>>>>>>                    goto release_task;
>>>>>>>>            }
>>>>>>>>
>>>>>>>> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
>>>>>>>> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
>>>>>>>> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>>>>>>>>            if (IS_ERR_OR_NULL(mm)) {
>>>>>>>>                    ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>>>>>>>>                    goto release_task;
>>>>>>>>            }
>>>>>>>>
>>>>>>>> +       /*
>>>>>>>> +        * Require CAP_SYS_NICE for influencing process performance. Note that
>>>>>>>> +        * only non-destructive hints are currently supported.
>>>>>>>
>>>>>>> How is non-destructive defined? Is MADV_DONTNEED non-destructive?
>>>>>>
>>>>>> Non-destructive in this context means the data is not lost and can be
>>>>>> recovered. I follow the logic described in
>>>>>> https://lwn.net/Articles/794704/ where Minchan was introducing
>>>>>> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
>>>>>> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
>>>>>> would be considered destructive hints.
>>>>>> Note that process_madvise_behavior_valid() allows only MADV_COLD and
>>>>>> MADV_PAGEOUT at the moment, which are both non-destructive.
>>>>>>
>>>>>
>>>>> There is a plan to support MADV_DONTNEED for this syscall. Do we need
>>>>> to change these access checks again with that support?
>>>>
>>>> Eh, I absolutely don't think letting another process discard memory in
>>>> another process' address space is a good idea. The target process can
>>>> observe that easily and might even run into real issues.
>>>>
>>>> What's the use case?
>>>>
>>>
>>> Userspace oom reaper. Please look at
>>> https://lore.kernel.org/linux-api/20201014183943.GA1489464@google.com/T/
>>>
>>
>> Thanks, somehow I missed that (not that it really changed my opinion on
>> the approach while skimming over the discussion :) will have a more
>> detailed look)
> 
> The latest version of that patchset is:
> https://lore.kernel.org/patchwork/patch/1344419/
> Yeah, memory reaping is a special case when we are operating on a
> dying process to speed up the release of its memory. I don't know if
> for that particular case we need to make the checks stricter. It's a
> dying process anyway and the data is being destroyed. Allowing to
> speed up that process probably can still use CAP_SYS_NICE.

I know, unrelated discussion (sorry, I don't have above thread in my 
archive anymore due to automatic cleanups ...) , but introducing 
MADV_DONTEED on a remote processes, having to tweak range logic because 
we always want to apply it to the whole MM, just to speed up memory 
reaping sounds like completely abusing madvise()/process_madvise() to me.

You want different semantics than MADV_DONTNEED. You want different 
semantics than madvise.

Simple example: mlock()ed pages in the target process. MADV_DONTNEED 
would choke on that. For the use case of reaping, you certainly don't care.

I am not sure if process_madvise() is the right interface to enforce 
discarding of all target memory.


Not to mention that MADV_FREE doesn't make any sense IMHO for memory 
reaping ... no to mention exposing this via process_madvise().

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-05 18:22               ` David Hildenbrand
@ 2021-03-05 18:36                 ` Suren Baghdasaryan
  2021-03-05 19:41                   ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Suren Baghdasaryan @ 2021-03-05 18:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Shakeel Butt, Andrew Morton, Jann Horn, Kees Cook,
	Jeffrey Vander Stoep, Minchan Kim, Michal Hocko, David Rientjes,
	Edgar Arriaga García, Tim Murray, Florian Weimer,
	Oleg Nesterov, James Morris, Linux MM, SElinux list, Linux API,
	linux-security-module, stable, LKML, kernel-team

On Fri, Mar 5, 2021 at 10:23 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 05.03.21 19:08, Suren Baghdasaryan wrote:
> > On Fri, Mar 5, 2021 at 9:52 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 05.03.21 18:45, Shakeel Butt wrote:
> >>> On Fri, Mar 5, 2021 at 9:37 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 04.03.21 01:03, Shakeel Butt wrote:
> >>>>> On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>>>
> >>>>>> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
> >>>>>>>
> >>>>>>> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>>>>>
> >>>>>>>> process_madvise currently requires ptrace attach capability.
> >>>>>>>> PTRACE_MODE_ATTACH gives one process complete control over another
> >>>>>>>> process. It effectively removes the security boundary between the
> >>>>>>>> two processes (in one direction). Granting ptrace attach capability
> >>>>>>>> even to a system process is considered dangerous since it creates an
> >>>>>>>> attack surface. This severely limits the usage of this API.
> >>>>>>>> The operations process_madvise can perform do not affect the correctness
> >>>>>>>> of the operation of the target process; they only affect where the data
> >>>>>>>> is physically located (and therefore, how fast it can be accessed).
> >>>>>>>> What we want is the ability for one process to influence another process
> >>>>>>>> in order to optimize performance across the entire system while leaving
> >>>>>>>> the security boundary intact.
> >>>>>>>> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> >>>>>>>> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> >>>>>>>> and CAP_SYS_NICE for influencing process performance.
> >>>>>>>>
> >>>>>>>> Cc: stable@vger.kernel.org # 5.10+
> >>>>>>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >>>>>>>> Reviewed-by: Kees Cook <keescook@chromium.org>
> >>>>>>>> Acked-by: Minchan Kim <minchan@kernel.org>
> >>>>>>>> Acked-by: David Rientjes <rientjes@google.com>
> >>>>>>>> ---
> >>>>>>>> changes in v3
> >>>>>>>> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
> >>>>>>>> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
> >>>>>>>> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
> >>>>>>>> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
> >>>>>>>>
> >>>>>>>>     mm/madvise.c | 13 ++++++++++++-
> >>>>>>>>     1 file changed, 12 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
> >>>>>>>> index df692d2e35d4..01fef79ac761 100644
> >>>>>>>> --- a/mm/madvise.c
> >>>>>>>> +++ b/mm/madvise.c
> >>>>>>>> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >>>>>>>>                    goto release_task;
> >>>>>>>>            }
> >>>>>>>>
> >>>>>>>> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> >>>>>>>> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> >>>>>>>> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> >>>>>>>>            if (IS_ERR_OR_NULL(mm)) {
> >>>>>>>>                    ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> >>>>>>>>                    goto release_task;
> >>>>>>>>            }
> >>>>>>>>
> >>>>>>>> +       /*
> >>>>>>>> +        * Require CAP_SYS_NICE for influencing process performance. Note that
> >>>>>>>> +        * only non-destructive hints are currently supported.
> >>>>>>>
> >>>>>>> How is non-destructive defined? Is MADV_DONTNEED non-destructive?
> >>>>>>
> >>>>>> Non-destructive in this context means the data is not lost and can be
> >>>>>> recovered. I follow the logic described in
> >>>>>> https://lwn.net/Articles/794704/ where Minchan was introducing
> >>>>>> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
> >>>>>> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
> >>>>>> would be considered destructive hints.
> >>>>>> Note that process_madvise_behavior_valid() allows only MADV_COLD and
> >>>>>> MADV_PAGEOUT at the moment, which are both non-destructive.
> >>>>>>
> >>>>>
> >>>>> There is a plan to support MADV_DONTNEED for this syscall. Do we need
> >>>>> to change these access checks again with that support?
> >>>>
> >>>> Eh, I absolutely don't think letting another process discard memory in
> >>>> another process' address space is a good idea. The target process can
> >>>> observe that easily and might even run into real issues.
> >>>>
> >>>> What's the use case?
> >>>>
> >>>
> >>> Userspace oom reaper. Please look at
> >>> https://lore.kernel.org/linux-api/20201014183943.GA1489464@google.com/T/
> >>>
> >>
> >> Thanks, somehow I missed that (not that it really changed my opinion on
> >> the approach while skimming over the discussion :) will have a more
> >> detailed look)
> >
> > The latest version of that patchset is:
> > https://lore.kernel.org/patchwork/patch/1344419/
> > Yeah, memory reaping is a special case when we are operating on a
> > dying process to speed up the release of its memory. I don't know if
> > for that particular case we need to make the checks stricter. It's a
> > dying process anyway and the data is being destroyed. Allowing to
> > speed up that process probably can still use CAP_SYS_NICE.
>
> I know, unrelated discussion (sorry, I don't have above thread in my
> archive anymore due to automatic cleanups ...) , but introducing
> MADV_DONTEED on a remote processes, having to tweak range logic because
> we always want to apply it to the whole MM, just to speed up memory
> reaping sounds like completely abusing madvise()/process_madvise() to me.
>
> You want different semantics than MADV_DONTNEED. You want different
> semantics than madvise.
>
> Simple example: mlock()ed pages in the target process. MADV_DONTNEED
> would choke on that. For the use case of reaping, you certainly don't care.
>
> I am not sure if process_madvise() is the right interface to enforce
> discarding of all target memory.
>
>
> Not to mention that MADV_FREE doesn't make any sense IMHO for memory
> reaping ... no to mention exposing this via process_madvise().

Yeah, that was the last comment from Christoph Hellwig on
https://lore.kernel.org/patchwork/patch/1344418/
I'll be rethinking the whole approach. Previously I proposed couple
different approaches that would make reaping a part of the kill by
adding a new flag for pidfd_send_signal:
https://lore.kernel.org/patchwork/patch/1338196/
https://lore.kernel.org/patchwork/patch/1060407/
but maybe a separate syscall for reaping is indeed the right way to go...

>
> --
> Thanks,
>
> David / dhildenb
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise
  2021-03-05 18:36                 ` Suren Baghdasaryan
@ 2021-03-05 19:41                   ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2021-03-05 19:41 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: David Hildenbrand, Shakeel Butt, Andrew Morton, Jann Horn,
	Kees Cook, Jeffrey Vander Stoep, Minchan Kim, Michal Hocko,
	David Rientjes, Edgar Arriaga García, Tim Murray,
	Florian Weimer, Oleg Nesterov, James Morris, Linux MM,
	SElinux list, Linux API, linux-security-module, stable, LKML,
	kernel-team


> Am 05.03.2021 um 19:36 schrieb Suren Baghdasaryan <surenb@google.com>:
> 
> On Fri, Mar 5, 2021 at 10:23 AM David Hildenbrand <david@redhat.com> wrote:
>> 
>>> On 05.03.21 19:08, Suren Baghdasaryan wrote:
>>> On Fri, Mar 5, 2021 at 9:52 AM David Hildenbrand <david@redhat.com> wrote:
>>>> 
>>>> On 05.03.21 18:45, Shakeel Butt wrote:
>>>>> On Fri, Mar 5, 2021 at 9:37 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>> 
>>>>>> On 04.03.21 01:03, Shakeel Butt wrote:
>>>>>>> On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>>>> 
>>>>>>>> On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@google.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> process_madvise currently requires ptrace attach capability.
>>>>>>>>>> PTRACE_MODE_ATTACH gives one process complete control over another
>>>>>>>>>> process. It effectively removes the security boundary between the
>>>>>>>>>> two processes (in one direction). Granting ptrace attach capability
>>>>>>>>>> even to a system process is considered dangerous since it creates an
>>>>>>>>>> attack surface. This severely limits the usage of this API.
>>>>>>>>>> The operations process_madvise can perform do not affect the correctness
>>>>>>>>>> of the operation of the target process; they only affect where the data
>>>>>>>>>> is physically located (and therefore, how fast it can be accessed).
>>>>>>>>>> What we want is the ability for one process to influence another process
>>>>>>>>>> in order to optimize performance across the entire system while leaving
>>>>>>>>>> the security boundary intact.
>>>>>>>>>> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
>>>>>>>>>> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
>>>>>>>>>> and CAP_SYS_NICE for influencing process performance.
>>>>>>>>>> 
>>>>>>>>>> Cc: stable@vger.kernel.org # 5.10+
>>>>>>>>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>>>>>>>>>> Reviewed-by: Kees Cook <keescook@chromium.org>
>>>>>>>>>> Acked-by: Minchan Kim <minchan@kernel.org>
>>>>>>>>>> Acked-by: David Rientjes <rientjes@google.com>
>>>>>>>>>> ---
>>>>>>>>>> changes in v3
>>>>>>>>>> - Added Reviewed-by: Kees Cook <keescook@chromium.org>
>>>>>>>>>> - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
>>>>>>>>>> - cc'ed stable@vger.kernel.org # 5.10+ per Andrew's request
>>>>>>>>>> - cc'ed linux-security-module@vger.kernel.org per James Morris's request
>>>>>>>>>> 
>>>>>>>>>>    mm/madvise.c | 13 ++++++++++++-
>>>>>>>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>>>>> 
>>>>>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>>>>>>>> index df692d2e35d4..01fef79ac761 100644
>>>>>>>>>> --- a/mm/madvise.c
>>>>>>>>>> +++ b/mm/madvise.c
>>>>>>>>>> @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>>>>>>>>>>                   goto release_task;
>>>>>>>>>>           }
>>>>>>>>>> 
>>>>>>>>>> -       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
>>>>>>>>>> +       /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
>>>>>>>>>> +       mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>>>>>>>>>>           if (IS_ERR_OR_NULL(mm)) {
>>>>>>>>>>                   ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>>>>>>>>>>                   goto release_task;
>>>>>>>>>>           }
>>>>>>>>>> 
>>>>>>>>>> +       /*
>>>>>>>>>> +        * Require CAP_SYS_NICE for influencing process performance. Note that
>>>>>>>>>> +        * only non-destructive hints are currently supported.
>>>>>>>>> 
>>>>>>>>> How is non-destructive defined? Is MADV_DONTNEED non-destructive?
>>>>>>>> 
>>>>>>>> Non-destructive in this context means the data is not lost and can be
>>>>>>>> recovered. I follow the logic described in
>>>>>>>> https://lwn.net/Articles/794704/ where Minchan was introducing
>>>>>>>> MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
>>>>>>>> and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
>>>>>>>> would be considered destructive hints.
>>>>>>>> Note that process_madvise_behavior_valid() allows only MADV_COLD and
>>>>>>>> MADV_PAGEOUT at the moment, which are both non-destructive.
>>>>>>>> 
>>>>>>> 
>>>>>>> There is a plan to support MADV_DONTNEED for this syscall. Do we need
>>>>>>> to change these access checks again with that support?
>>>>>> 
>>>>>> Eh, I absolutely don't think letting another process discard memory in
>>>>>> another process' address space is a good idea. The target process can
>>>>>> observe that easily and might even run into real issues.
>>>>>> 
>>>>>> What's the use case?
>>>>>> 
>>>>> 
>>>>> Userspace oom reaper. Please look at
>>>>> https://lore.kernel.org/linux-api/20201014183943.GA1489464@google.com/T/
>>>>> 
>>>> 
>>>> Thanks, somehow I missed that (not that it really changed my opinion on
>>>> the approach while skimming over the discussion :) will have a more
>>>> detailed look)
>>> 
>>> The latest version of that patchset is:
>>> https://lore.kernel.org/patchwork/patch/1344419/
>>> Yeah, memory reaping is a special case when we are operating on a
>>> dying process to speed up the release of its memory. I don't know if
>>> for that particular case we need to make the checks stricter. It's a
>>> dying process anyway and the data is being destroyed. Allowing to
>>> speed up that process probably can still use CAP_SYS_NICE.
>> 
>> I know, unrelated discussion (sorry, I don't have above thread in my
>> archive anymore due to automatic cleanups ...) , but introducing
>> MADV_DONTEED on a remote processes, having to tweak range logic because
>> we always want to apply it to the whole MM, just to speed up memory
>> reaping sounds like completely abusing madvise()/process_madvise() to me.
>> 
>> You want different semantics than MADV_DONTNEED. You want different
>> semantics than madvise.
>> 
>> Simple example: mlock()ed pages in the target process. MADV_DONTNEED
>> would choke on that. For the use case of reaping, you certainly don't care.
>> 
>> I am not sure if process_madvise() is the right interface to enforce
>> discarding of all target memory.
>> 
>> 
>> Not to mention that MADV_FREE doesn't make any sense IMHO for memory
>> reaping ... no to mention exposing this via process_madvise().
> 
> Yeah, that was the last comment from Christoph Hellwig on
> https://lore.kernel.org/patchwork/patch/1344418/
> I'll be rethinking the whole approach. Previously I proposed couple
> different approaches that would make reaping a part of the kill by
> adding a new flag for pidfd_send_signal:
> https://lore.kernel.org/patchwork/patch/1338196/
> https://lore.kernel.org/patchwork/patch/1060407/
> but maybe a separate syscall for reaping is indeed the right way to go...

Yeah, most likely!

> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-03-05 19:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-03 18:58 [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise Suren Baghdasaryan
2021-03-03 18:58 ` Suren Baghdasaryan
2021-03-03 23:17 ` Shakeel Butt
2021-03-03 23:17   ` Shakeel Butt
2021-03-03 23:34   ` Suren Baghdasaryan
2021-03-04  0:03     ` Shakeel Butt
2021-03-04  1:17       ` Suren Baghdasaryan
2021-03-05 17:37       ` David Hildenbrand
2021-03-05 17:45         ` Shakeel Butt
2021-03-05 17:52           ` David Hildenbrand
2021-03-05 18:08             ` Suren Baghdasaryan
2021-03-05 18:22               ` David Hildenbrand
2021-03-05 18:36                 ` Suren Baghdasaryan
2021-03-05 19:41                   ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.