linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	 Christian Brauner <christian.brauner@ubuntu.com>,
	linux-mm <linux-mm@kvack.org>,
	 linux-api@vger.kernel.org,
	Oleksandr Natalenko <oleksandr@redhat.com>,
	 Tim Murray <timmurray@google.com>,
	Daniel Colascione <dancol@google.com>,
	 Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	 Brian Geffon <bgeffon@google.com>,
	Michal Hocko <mhocko@suse.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	 John Dias <joaodias@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	 Jann Horn <jannh@google.com>,
	alexander.h.duyck@linux.intel.com,
	 SeongJae Park <sjpark@amazon.com>,
	David Rientjes <rientjes@google.com>,
	Arjun Roy <arjunroy@google.com>,
	 Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: Re: [PATCH] mm: use only pidfd for process_madvise syscall
Date: Mon, 18 May 2020 12:22:56 -0700	[thread overview]
Message-ID: <CAJuCfpGbPUpWLDgwt5ZP4Uf8fp6ht_6eqUypMVYYh3btJdz_8Q@mail.gmail.com> (raw)
In-Reply-To: <20200516012055.126205-1-minchan@kernel.org>

On Fri, May 15, 2020 at 6:21 PM Minchan Kim <minchan@kernel.org> wrote:
>
> Based on discussion[1], people didn't feel we need to support both
> pid and pidfd for every new coming API[2] so this patch keeps only
> pidfd. This patch also changes flags's type with "unsigned int".
> So finally, the API is as follows,
>
>       ssize_t process_madvise(int pidfd, const struct iovec *iovec,
>                 unsigned long vlen, int advice, unsigned int flags);
>
>     DESCRIPTION
>       The process_madvise() system call is used to give advice or directions
>       to the kernel about the address ranges from external process as well as
>       local process. It provides the advice to address ranges of process
>       described by iovec and vlen. The goal of such advice is to improve system
>       or application performance.
>
>       The pidfd selects the process referred to by the PID file descriptor
>       specified in pidfd. (See pidofd_open(2) for further information)
>
>       The pointer iovec points to an array of iovec structures, defined in
>       <sys/uio.h> as:
>
>         struct iovec {
>             void *iov_base;         /* starting address */
>             size_t iov_len;         /* number of bytes to be advised */
>         };
>
>       The iovec describes address ranges beginning at address(iov_base)
>       and with size length of bytes(iov_len).
>
>       The vlen represents the number of elements in iovec.
>
>       The advice is indicated in the advice argument, which is one of the
>       following at this moment if the target process specified by idtype and

There is no idtype parameter anymore, so maybe just "if the target
process is external"?

>       id is external.
>
>         MADV_COLD
>         MADV_PAGEOUT
>         MADV_MERGEABLE
>         MADV_UNMERGEABLE
>
>       Permission to provide a hint to external process is governed by a
>       ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
>
>       The process_madvise supports every advice madvise(2) has if target
>       process is in same thread group with calling process so user could
>       use process_madvise(2) to extend existing madvise(2) to support
>       vector address ranges.
>
>     RETURN VALUE
>       On success, process_madvise() returns the number of bytes advised.
>       This return value may be less than the total number of requested
>       bytes, if an error occurred. The caller should check return value
>       to determine whether a partial advice occurred.
>
> [1] https://lore.kernel.org/linux-mm/20200509124817.xmrvsrq3mla6b76k@wittgenstein/
> [2] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/madvise.c | 42 +++++++++++++-----------------------------
>  1 file changed, 13 insertions(+), 29 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index d3fbbe52d230..35c9b220146a 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1229,8 +1229,8 @@ static int process_madvise_vec(struct task_struct *target_task,
>         return ret;
>  }
>
> -static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
> -                                      int behavior, unsigned long flags)
> +static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
> +                               int behavior, unsigned int flags)
>  {
>         ssize_t ret;
>         struct pid *pid;
> @@ -1241,26 +1241,12 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
>         if (flags != 0)
>                 return -EINVAL;
>
> -       switch (which) {
> -       case P_PID:
> -               if (upid <= 0)
> -                       return -EINVAL;
> -
> -               pid = find_get_pid(upid);
> -               if (!pid)
> -                       return -ESRCH;
> -               break;
> -       case P_PIDFD:
> -               if (upid < 0)
> -                       return -EINVAL;
> -
> -               pid = pidfd_get_pid(upid);
> -               if (IS_ERR(pid))
> -                       return PTR_ERR(pid);
> -               break;
> -       default:
> +       if (pidfd < 0)
>                 return -EINVAL;
> -       }
> +
> +       pid = pidfd_get_pid(pidfd);
> +       if (IS_ERR(pid))
> +               return PTR_ERR(pid);
>
>         task = get_pid_task(pid, PIDTYPE_PID);
>         if (!task) {
> @@ -1292,9 +1278,8 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
>         return ret;
>  }
>
> -SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid,
> -               const struct iovec __user *, vec, unsigned long, vlen,
> -               int, behavior, unsigned long, flags)
> +SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> +               unsigned long, vlen, int, behavior, unsigned int, flags)
>  {
>         ssize_t ret;
>         struct iovec iovstack[UIO_FASTIOV];
> @@ -1303,19 +1288,18 @@ SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid,
>
>         ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
>         if (ret >= 0) {
> -               ret = do_process_madvise(which, upid, &iter, behavior, flags);
> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
>                 kfree(iov);
>         }
>         return ret;
>  }
>
>  #ifdef CONFIG_COMPAT
> -COMPAT_SYSCALL_DEFINE6(process_madvise, compat_int_t, which,
> -                       compat_pid_t, upid,
> +COMPAT_SYSCALL_DEFINE5(process_madvise, compat_int_t, pidfd,
>                         const struct compat_iovec __user *, vec,
>                         compat_ulong_t, vlen,
>                         compat_int_t, behavior,
> -                       compat_ulong_t, flags)
> +                       compat_int_t, flags)
>
>  {
>         ssize_t ret;
> @@ -1326,7 +1310,7 @@ COMPAT_SYSCALL_DEFINE6(process_madvise, compat_int_t, which,
>         ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
>                                 &iov, &iter);
>         if (ret >= 0) {
> -               ret = do_process_madvise(which, upid, &iter, behavior, flags);
> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
>                 kfree(iov);
>         }
>         return ret;
> --
> 2.26.2.761.g0e0b3e54be-goog
>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>


  reply	other threads:[~2020-05-18 19:23 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-16  1:20 [PATCH] mm: use only pidfd for process_madvise syscall Minchan Kim
2020-05-18 19:22 ` Suren Baghdasaryan [this message]
2020-05-18 21:13   ` Minchan Kim
2020-05-18 23:06     ` Andrew Morton
2020-05-19  5:54       ` Minchan Kim
2020-05-19  7:45 ` Christian Brauner
2020-05-19 16:19   ` Minchan Kim
2020-05-19 18:14     ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJuCfpGbPUpWLDgwt5ZP4Uf8fp6ht_6eqUypMVYYh3btJdz_8Q@mail.gmail.com \
    --to=surenb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=arjunroy@google.com \
    --cc=bgeffon@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=joaodias@google.com \
    --cc=joel@joelfernandes.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=oleksandr@redhat.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=sjpark@amazon.com \
    --cc=sonnyrao@google.com \
    --cc=sspatil@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).