All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Arnd Bergmann <arnd@arndb.de>, Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux API <linux-api@vger.kernel.org>,
	Oleksandr Natalenko <oleksandr@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Tim Murray <timmurray@google.com>,
	Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	John Dias <joaodias@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Jann Horn <jannh@google.com>,
	alexander.h.duyck@linux.intel.com,
	SeongJae Park <sj38.park@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Arjun Roy <arjunroy@google.com>, Vlastimil Babka <vbabka@suse.cz>,
	Christian Brauner <christian@brauner.io>,
	Daniel Colascione <dancol@google.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	SeongJae Park <sjpark@amazon.de>,
	linux-man <linux-man@vger.kernel.org>
Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
Date: Fri, 28 Aug 2020 12:24:08 -0600	[thread overview]
Message-ID: <9c339413-68c7-344e-dd01-327cb988d385@kernel.dk> (raw)
In-Reply-To: <CAK8P3a0Mnp2ekmX-BX9yr+N8fy2=gBtASELLXoa9uGSpSS9aOA@mail.gmail.com>

On 8/28/20 11:40 AM, Arnd Bergmann wrote:
> On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim <minchan@kernel.org> wrote:
>> So finally, the API is as follows,
>>
>>      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
>>                unsigned long vlen, int advice, unsigned int flags);
> 
> I had not followed the discussion earlier and only now came across
> the syscall in linux-next, sorry for stirring things up this late.
> 
>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>> index 94bf4958d114..8f959d90338a 100644
>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>> @@ -364,6 +364,7 @@
>>  440    common  watch_mount             sys_watch_mount
>>  441    common  watch_sb                sys_watch_sb
>>  442    common  fsinfo                  sys_fsinfo
>> +443    64      process_madvise         sys_process_madvise
>>
>>  #
>>  # x32-specific system call numbers start at 512 to avoid cache impact
>> @@ -407,3 +408,4 @@
>>  545    x32     execveat                compat_sys_execveat
>>  546    x32     preadv2                 compat_sys_preadv64v2
>>  547    x32     pwritev2                compat_sys_pwritev64v2
>> +548    x32     process_madvise         compat_sys_process_madvise
> 
> I think we should not add any new x32-specific syscalls. Instead I think
> the compat_sys_process_madvise/sys_process_madvise can be
> merged into one.
> 
>> +       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
>> +       if (IS_ERR_OR_NULL(mm)) {
>> +               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>> +               goto release_task;
>> +       }
> 
> Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile,
> and I would try to avoid that. Can mm_access() be changed to
> itself return PTR_ERR(-ESRCH) instead of NULL to improve its
> calling conventions? I see there are only three other callers.
> 
> 
>> +       ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
>> +       if (ret >= 0) {
>> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
>> +               kfree(iov);
>> +       }
>> +       return ret;
>> +}
>> +
>> +#ifdef CONFIG_COMPAT
> ...
>> +
>> +       ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
>> +                               &iov, &iter);
>> +       if (ret >= 0) {
>> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
>> +               kfree(iov);
>> +       }
> 
> Every syscall that passes an iovec seems to do this. If we make import_iovec()
> handle both cases directly, this syscall and a number of others can
> be simplified, and you avoid the x32 entry point I mentioned above
> 
> Something like (untested)
> 
> index dad8d0cfaaf7..0de4ddff24c1 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct
> iovec __user * uvector,
>  {
>         ssize_t n;
>         struct iovec *p;
> -       n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
> -                                 *iov, &p);
> +
> +       if (in_compat_syscall())
> +               n = compat_rw_copy_check_uvector(type, uvector, nr_segs,
> +                                                fast_segs, *iov, &p);
> +       else
> +               n = rw_copy_check_uvector(type, uvector, nr_segs,
> +                                         fast_segs, *iov, &p);
>         if (n < 0) {
>                 if (p != *iov)
>                         kfree(p);

Doesn't work for the async case, where you want to be holding on to the
allocated iovec. But in general I think it's a good helper for the sync
case, which is by far the majority.

-- 
Jens Axboe


  reply	other threads:[~2020-08-28 18:24 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-22 19:28 [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-22 19:28 ` [PATCH v8 1/4] mm/madvise: pass task and mm to do_madvise Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 2/4] pid: move pidfd_get_pid() to pid.c Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-25 20:38     ` Minchan Kim
2020-08-28 17:40   ` Arnd Bergmann
2020-08-28 17:40     ` Arnd Bergmann
2020-08-28 18:24     ` Jens Axboe [this message]
2020-08-28 18:25       ` Christian Brauner
2020-08-28 18:25         ` Christian Brauner
2020-08-28 19:04         ` Minchan Kim
2020-08-28 19:26       ` Jens Axboe
2020-08-28 20:15         ` Arnd Bergmann
2020-08-28 20:15           ` Arnd Bergmann
2020-08-29  7:02     ` Christoph Hellwig
2020-06-22 19:29 ` [PATCH v8 4/4] mm/madvise: check fatal signal pending of target process Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-22 19:36 ` [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-23  8:59   ` Oleksandr Natalenko
2020-06-23  9:07 ` Oleksandr Natalenko
2020-06-24  1:31   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9c339413-68c7-344e-dd01-327cb988d385@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=arjunroy@google.com \
    --cc=arnd@arndb.de \
    --cc=bgeffon@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=christian@brauner.io \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=joaodias@google.com \
    --cc=joel@joelfernandes.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=oleksandr@redhat.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=sj38.park@gmail.com \
    --cc=sjpark@amazon.de \
    --cc=sonnyrao@google.com \
    --cc=sspatil@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.