linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Arnd Bergmann <arnd@arndb.de>, Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux API <linux-api@vger.kernel.org>,
	Oleksandr Natalenko <oleksandr@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Tim Murray <timmurray@google.com>,
	Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	John Dias <joaodias@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Jann Horn <jannh@google.com>,
	alexander.h.duyck@linux.intel.com,
	SeongJae Park <sj38.park@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Arjun Roy <arjunroy@google.com>, Vlastimil Babka <vbabka@suse.cz>,
	Christian Brauner <christian@brauner.io>,
	Daniel Colascione <dancol@google.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	SeongJae Park <sjpark@amazon.de>,
	linux-man <linux-man@vger.kernel.org>
Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
Date: Fri, 28 Aug 2020 12:24:08 -0600	[thread overview]
Message-ID: <9c339413-68c7-344e-dd01-327cb988d385@kernel.dk> (raw)
In-Reply-To: <CAK8P3a0Mnp2ekmX-BX9yr+N8fy2=gBtASELLXoa9uGSpSS9aOA@mail.gmail.com>

On 8/28/20 11:40 AM, Arnd Bergmann wrote:
> On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim <minchan@kernel.org> wrote:
>> So finally, the API is as follows,
>>
>>      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
>>                unsigned long vlen, int advice, unsigned int flags);
> 
> I had not followed the discussion earlier and only now came across
> the syscall in linux-next, sorry for stirring things up this late.
> 
>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>> index 94bf4958d114..8f959d90338a 100644
>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>> @@ -364,6 +364,7 @@
>>  440    common  watch_mount             sys_watch_mount
>>  441    common  watch_sb                sys_watch_sb
>>  442    common  fsinfo                  sys_fsinfo
>> +443    64      process_madvise         sys_process_madvise
>>
>>  #
>>  # x32-specific system call numbers start at 512 to avoid cache impact
>> @@ -407,3 +408,4 @@
>>  545    x32     execveat                compat_sys_execveat
>>  546    x32     preadv2                 compat_sys_preadv64v2
>>  547    x32     pwritev2                compat_sys_pwritev64v2
>> +548    x32     process_madvise         compat_sys_process_madvise
> 
> I think we should not add any new x32-specific syscalls. Instead I think
> the compat_sys_process_madvise/sys_process_madvise can be
> merged into one.
> 
>> +       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
>> +       if (IS_ERR_OR_NULL(mm)) {
>> +               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>> +               goto release_task;
>> +       }
> 
> Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile,
> and I would try to avoid that. Can mm_access() be changed to
> itself return PTR_ERR(-ESRCH) instead of NULL to improve its
> calling conventions? I see there are only three other callers.
> 
> 
>> +       ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
>> +       if (ret >= 0) {
>> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
>> +               kfree(iov);
>> +       }
>> +       return ret;
>> +}
>> +
>> +#ifdef CONFIG_COMPAT
> ...
>> +
>> +       ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
>> +                               &iov, &iter);
>> +       if (ret >= 0) {
>> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
>> +               kfree(iov);
>> +       }
> 
> Every syscall that passes an iovec seems to do this. If we make import_iovec()
> handle both cases directly, this syscall and a number of others can
> be simplified, and you avoid the x32 entry point I mentioned above
> 
> Something like (untested)
> 
> index dad8d0cfaaf7..0de4ddff24c1 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct
> iovec __user * uvector,
>  {
>         ssize_t n;
>         struct iovec *p;
> -       n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
> -                                 *iov, &p);
> +
> +       if (in_compat_syscall())
> +               n = compat_rw_copy_check_uvector(type, uvector, nr_segs,
> +                                                fast_segs, *iov, &p);
> +       else
> +               n = rw_copy_check_uvector(type, uvector, nr_segs,
> +                                         fast_segs, *iov, &p);
>         if (n < 0) {
>                 if (p != *iov)
>                         kfree(p);

Doesn't work for the async case, where you want to be holding on to the
allocated iovec. But in general I think it's a good helper for the sync
case, which is by far the majority.

-- 
Jens Axboe


  reply	other threads:[~2020-08-28 18:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-22 19:28 [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-22 19:28 ` [PATCH v8 1/4] mm/madvise: pass task and mm to do_madvise Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 2/4] pid: move pidfd_get_pid() to pid.c Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-25 20:38     ` Minchan Kim
2020-08-28 17:40   ` Arnd Bergmann
2020-08-28 18:24     ` Jens Axboe [this message]
2020-08-28 18:25       ` Christian Brauner
2020-08-28 19:04         ` Minchan Kim
2020-08-28 19:26       ` Jens Axboe
2020-08-28 20:15         ` Arnd Bergmann
2020-08-29  7:02     ` Christoph Hellwig
2020-06-22 19:29 ` [PATCH v8 4/4] mm/madvise: check fatal signal pending of target process Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-22 19:36 ` [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-23  8:59   ` Oleksandr Natalenko
2020-06-23  9:07 ` Oleksandr Natalenko
2020-06-24  1:31   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9c339413-68c7-344e-dd01-327cb988d385@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=arjunroy@google.com \
    --cc=arnd@arndb.de \
    --cc=bgeffon@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=christian@brauner.io \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=joaodias@google.com \
    --cc=joel@joelfernandes.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=oleksandr@redhat.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=sj38.park@gmail.com \
    --cc=sjpark@amazon.de \
    --cc=sonnyrao@google.com \
    --cc=sspatil@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).