All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-api@vger.kernel.org, oleksandr@redhat.com,
	Suren Baghdasaryan <surenb@google.com>,
	Tim Murray <timmurray@google.com>,
	Daniel Colascione <dancol@google.com>,
	Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	John Dias <joaodias@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Jann Horn <jannh@google.com>,
	alexander.h.duyck@linux.intel.com,
	SeongJae Park <sjpark@amazon.com>,
	David Rientjes <rientjes@google.com>,
	Arjun Roy <arjunroy@google.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: Re: [PATCH] mm: use only pidfd for process_madvise syscall
Date: Tue, 19 May 2020 11:14:47 -0700	[thread overview]
Message-ID: <20200519181447.GA220547@google.com> (raw)
In-Reply-To: <20200519161956.GA66748@google.com>

On Tue, May 19, 2020 at 09:19:56AM -0700, Minchan Kim wrote:
> Hi Christian,
> 
> On Tue, May 19, 2020 at 09:45:18AM +0200, Christian Brauner wrote:
> > On Fri, May 15, 2020 at 06:20:55PM -0700, Minchan Kim wrote:
> > > Based on discussion[1], people didn't feel we need to support both
> > > pid and pidfd for every new coming API[2] so this patch keeps only
> > > pidfd. This patch also changes flags's type with "unsigned int".
> > > So finally, the API is as follows,
> > > 
> > >       ssize_t process_madvise(int pidfd, const struct iovec *iovec,
> > >       		unsigned long vlen, int advice, unsigned int flags);
> > > 
> > >     DESCRIPTION
> > >       The process_madvise() system call is used to give advice or directions
> > >       to the kernel about the address ranges from external process as well as
> > >       local process. It provides the advice to address ranges of process
> > >       described by iovec and vlen. The goal of such advice is to improve system
> > >       or application performance.
> > > 
> > >       The pidfd selects the process referred to by the PID file descriptor
> > >       specified in pidfd. (See pidofd_open(2) for further information)
> > > 
> > >       The pointer iovec points to an array of iovec structures, defined in
> > >       <sys/uio.h> as:
> > > 
> > >         struct iovec {
> > >             void *iov_base;         /* starting address */
> > >             size_t iov_len;         /* number of bytes to be advised */
> > >         };
> > > 
> > >       The iovec describes address ranges beginning at address(iov_base)
> > >       and with size length of bytes(iov_len).
> > > 
> > >       The vlen represents the number of elements in iovec.
> > > 
> > >       The advice is indicated in the advice argument, which is one of the
> > >       following at this moment if the target process specified by idtype and
> > >       id is external.
> > > 
> > >         MADV_COLD
> > >         MADV_PAGEOUT
> > >         MADV_MERGEABLE
> > >         MADV_UNMERGEABLE
> > > 
> > >       Permission to provide a hint to external process is governed by a
> > >       ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
> > > 
> > >       The process_madvise supports every advice madvise(2) has if target
> > >       process is in same thread group with calling process so user could
> > >       use process_madvise(2) to extend existing madvise(2) to support
> > >       vector address ranges.
> > > 
> > >     RETURN VALUE
> > >       On success, process_madvise() returns the number of bytes advised.
> > >       This return value may be less than the total number of requested
> > >       bytes, if an error occurred. The caller should check return value
> > >       to determine whether a partial advice occurred.
> > > 
> > > [1] https://lore.kernel.org/linux-mm/20200509124817.xmrvsrq3mla6b76k@wittgenstein/
> > > [2] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > Thanks for the ping Minchan, and sorry for not replying earlier to this.
> > 
> > Also, sorry that i delayed the patch but this here really seems a way
> > cleaner api to me and feels less hackish. In general this patch seems
> > fine to me.
> 
> Thanks.
> 
> > But two comments below:
> > 
> > > ---
> > >  mm/madvise.c | 42 +++++++++++++-----------------------------
> > >  1 file changed, 13 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index d3fbbe52d230..35c9b220146a 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -1229,8 +1229,8 @@ static int process_madvise_vec(struct task_struct *target_task,
> > >  	return ret;
> > >  }
> > >  
> > > -static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
> > > -				       int behavior, unsigned long flags)
> > > +static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
> > > +				int behavior, unsigned int flags)
> > >  {
> > >  	ssize_t ret;
> > >  	struct pid *pid;
> > > @@ -1241,26 +1241,12 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
> > >  	if (flags != 0)
> > >  		return -EINVAL;
> > >  
> > > -	switch (which) {
> > > -	case P_PID:
> > > -		if (upid <= 0)
> > > -			return -EINVAL;
> > > -
> > > -		pid = find_get_pid(upid);
> > > -		if (!pid)
> > > -			return -ESRCH;
> > > -		break;
> > > -	case P_PIDFD:
> > > -		if (upid < 0)
> > > -			return -EINVAL;
> > > -
> > > -		pid = pidfd_get_pid(upid);
> > > -		if (IS_ERR(pid))
> > > -			return PTR_ERR(pid);
> > > -		break;
> > > -	default:
> > > +	if (pidfd < 0)
> > >  		return -EINVAL;
> > 
> > When garbage file descriptors are passed EBADF needs to be returned, not
> > EINVAL. That's the case with most apis and also with pidfds, compare:
> 
> True. Let me cook a patch for that.

Hi Andrew,

Please fold this patch against on last patch you have for process_madvise.
Thanks!

From 939e4c5b7ca12efc5d5eeb8ff55fc02752f70544 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 19 May 2020 11:06:57 -0700
Subject: [PATCH] mm: return EBADF if pidfd is invalid

This patch makes returning of EBADF when the fd passed as argument is
invalid. The implementaion relies on pidfd_get_pid's error return.

This patch also fixes syscall declare part since we removed pid support.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/compat.h   | 6 +++---
 include/linux/syscalls.h | 5 ++---
 mm/madvise.c             | 3 ---
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 2e2f0a2700ab..86b61e873947 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -827,10 +827,10 @@ asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
 
-asmlinkage ssize_t compat_sys_process_madvise(compat_int_t which,
-		compat_pid_t upid, const struct compat_iovec __user *vec,
+asmlinkage ssize_t compat_sys_process_madvise(compat_int_t pidfd,
+		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, compat_int_t behavior,
-		compat_ulong_t flags);
+		compat_int_t flags);
 
 /*
  * Deprecated system calls which are still defined in
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2cbd3660e8e6..63ffa6dc9da3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -879,9 +879,8 @@ asmlinkage long sys_munlockall(void);
 asmlinkage long sys_mincore(unsigned long start, size_t len,
 				unsigned char __user * vec);
 asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
-asmlinkage long sys_process_madvise(int which, pid_t upid,
-		const struct iovec __user *vec, unsigned long vlen,
-		int behavior, unsigned long flags);
+asmlinkage long sys_process_madvise(int pidfd, const struct iovec __user *vec,
+		unsigned long vlen, int behavior, unsigned int flags);
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
 			unsigned long prot, unsigned long pgoff,
 			unsigned long flags);
diff --git a/mm/madvise.c b/mm/madvise.c
index 35c9b220146a..3ac1eda1203f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1241,9 +1241,6 @@ static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
 	if (flags != 0)
 		return -EINVAL;
 
-	if (pidfd < 0)
-		return -EINVAL;
-
 	pid = pidfd_get_pid(pidfd);
 	if (IS_ERR(pid))
 		return PTR_ERR(pid);
-- 
2.26.2.761.g0e0b3e54be-goog


      reply	other threads:[~2020-05-19 18:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-16  1:20 [PATCH] mm: use only pidfd for process_madvise syscall Minchan Kim
2020-05-18 19:22 ` Suren Baghdasaryan
2020-05-18 19:22   ` Suren Baghdasaryan
2020-05-18 21:13   ` Minchan Kim
2020-05-18 23:06     ` Andrew Morton
2020-05-19  5:54       ` Minchan Kim
2020-05-19  7:45 ` Christian Brauner
2020-05-19 16:19   ` Minchan Kim
2020-05-19 18:14     ` Minchan Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200519181447.GA220547@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=arjunroy@google.com \
    --cc=bgeffon@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=joaodias@google.com \
    --cc=joel@joelfernandes.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleksandr@redhat.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=sjpark@amazon.com \
    --cc=sonnyrao@google.com \
    --cc=sspatil@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.