From: Jens Axboe <axboe@kernel.dk>
To: Arnd Bergmann <arnd@arndb.de>, Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Christian Brauner <christian.brauner@ubuntu.com>,
linux-mm <linux-mm@kvack.org>,
Linux API <linux-api@vger.kernel.org>,
Oleksandr Natalenko <oleksandr@redhat.com>,
Suren Baghdasaryan <surenb@google.com>,
Tim Murray <timmurray@google.com>,
Sandeep Patil <sspatil@google.com>,
Sonny Rao <sonnyrao@google.com>,
Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeelb@google.com>,
John Dias <joaodias@google.com>,
Joel Fernandes <joel@joelfernandes.org>,
Jann Horn <jannh@google.com>,
alexander.h.duyck@linux.intel.com,
SeongJae Park <sj38.park@gmail.com>,
David Rientjes <rientjes@google.com>,
Arjun Roy <arjunroy@google.com>, Vlastimil Babka <vbabka@suse.cz>,
Christian Brauner <christian@brauner.io>,
Daniel Colascione <dancol@google.com>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
SeongJae Park <sjpark@amazon.de>,
linux-man <linux-man@vger.kernel.org>
Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
Date: Fri, 28 Aug 2020 12:24:08 -0600 [thread overview]
Message-ID: <9c339413-68c7-344e-dd01-327cb988d385@kernel.dk> (raw)
In-Reply-To: <CAK8P3a0Mnp2ekmX-BX9yr+N8fy2=gBtASELLXoa9uGSpSS9aOA@mail.gmail.com>
On 8/28/20 11:40 AM, Arnd Bergmann wrote:
> On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim <minchan@kernel.org> wrote:
>> So finally, the API is as follows,
>>
>> ssize_t process_madvise(int pidfd, const struct iovec *iovec,
>> unsigned long vlen, int advice, unsigned int flags);
>
> I had not followed the discussion earlier and only now came across
> the syscall in linux-next, sorry for stirring things up this late.
>
>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>> index 94bf4958d114..8f959d90338a 100644
>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>> @@ -364,6 +364,7 @@
>> 440 common watch_mount sys_watch_mount
>> 441 common watch_sb sys_watch_sb
>> 442 common fsinfo sys_fsinfo
>> +443 64 process_madvise sys_process_madvise
>>
>> #
>> # x32-specific system call numbers start at 512 to avoid cache impact
>> @@ -407,3 +408,4 @@
>> 545 x32 execveat compat_sys_execveat
>> 546 x32 preadv2 compat_sys_preadv64v2
>> 547 x32 pwritev2 compat_sys_pwritev64v2
>> +548 x32 process_madvise compat_sys_process_madvise
>
> I think we should not add any new x32-specific syscalls. Instead I think
> the compat_sys_process_madvise/sys_process_madvise can be
> merged into one.
>
>> + mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
>> + if (IS_ERR_OR_NULL(mm)) {
>> + ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>> + goto release_task;
>> + }
>
> Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile,
> and I would try to avoid that. Can mm_access() be changed to
> itself return PTR_ERR(-ESRCH) instead of NULL to improve its
> calling conventions? I see there are only three other callers.
>
>
>> + ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
>> + if (ret >= 0) {
>> + ret = do_process_madvise(pidfd, &iter, behavior, flags);
>> + kfree(iov);
>> + }
>> + return ret;
>> +}
>> +
>> +#ifdef CONFIG_COMPAT
> ...
>> +
>> + ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
>> + &iov, &iter);
>> + if (ret >= 0) {
>> + ret = do_process_madvise(pidfd, &iter, behavior, flags);
>> + kfree(iov);
>> + }
>
> Every syscall that passes an iovec seems to do this. If we make import_iovec()
> handle both cases directly, this syscall and a number of others can
> be simplified, and you avoid the x32 entry point I mentioned above
>
> Something like (untested)
>
> index dad8d0cfaaf7..0de4ddff24c1 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct
> iovec __user * uvector,
> {
> ssize_t n;
> struct iovec *p;
> - n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
> - *iov, &p);
> +
> + if (in_compat_syscall())
> + n = compat_rw_copy_check_uvector(type, uvector, nr_segs,
> + fast_segs, *iov, &p);
> + else
> + n = rw_copy_check_uvector(type, uvector, nr_segs,
> + fast_segs, *iov, &p);
> if (n < 0) {
> if (p != *iov)
> kfree(p);
Doesn't work for the async case, where you want to be holding on to the
allocated iovec. But in general I think it's a good helper for the sync
case, which is by far the majority.
--
Jens Axboe
next prev parent reply other threads:[~2020-08-28 18:24 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-22 19:28 [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-22 19:28 ` [PATCH v8 1/4] mm/madvise: pass task and mm to do_madvise Minchan Kim
2020-06-24 20:00 ` David Rientjes
2020-06-24 20:00 ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 2/4] pid: move pidfd_get_pid() to pid.c Minchan Kim
2020-06-24 20:00 ` David Rientjes
2020-06-24 20:00 ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API Minchan Kim
2020-06-24 20:00 ` David Rientjes
2020-06-24 20:00 ` David Rientjes
2020-06-25 20:38 ` Minchan Kim
2020-08-28 17:40 ` Arnd Bergmann
2020-08-28 17:40 ` Arnd Bergmann
2020-08-28 18:24 ` Jens Axboe [this message]
2020-08-28 18:25 ` Christian Brauner
2020-08-28 18:25 ` Christian Brauner
2020-08-28 19:04 ` Minchan Kim
2020-08-28 19:26 ` Jens Axboe
2020-08-28 20:15 ` Arnd Bergmann
2020-08-28 20:15 ` Arnd Bergmann
2020-08-29 7:02 ` Christoph Hellwig
2020-06-22 19:29 ` [PATCH v8 4/4] mm/madvise: check fatal signal pending of target process Minchan Kim
2020-06-24 20:00 ` David Rientjes
2020-06-24 20:00 ` David Rientjes
2020-06-22 19:36 ` [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-23 8:59 ` Oleksandr Natalenko
2020-06-23 9:07 ` Oleksandr Natalenko
2020-06-24 1:31 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9c339413-68c7-344e-dd01-327cb988d385@kernel.dk \
--to=axboe@kernel.dk \
--cc=akpm@linux-foundation.org \
--cc=alexander.h.duyck@linux.intel.com \
--cc=arjunroy@google.com \
--cc=arnd@arndb.de \
--cc=bgeffon@google.com \
--cc=christian.brauner@ubuntu.com \
--cc=christian@brauner.io \
--cc=dancol@google.com \
--cc=hannes@cmpxchg.org \
--cc=jannh@google.com \
--cc=joaodias@google.com \
--cc=joel@joelfernandes.org \
--cc=ktkhai@virtuozzo.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-man@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=oleksandr@redhat.com \
--cc=rientjes@google.com \
--cc=shakeelb@google.com \
--cc=sj38.park@gmail.com \
--cc=sjpark@amazon.de \
--cc=sonnyrao@google.com \
--cc=sspatil@google.com \
--cc=surenb@google.com \
--cc=timmurray@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.