All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
	lustre-devel@lists.lustre.org,
	v9fs-developer@lists.sourceforge.net,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jan Kara <jack@suse.cz>, Chris Wilson <chris@chris-wilson.co.uk>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call
Date: Fri, 3 Feb 2017 07:29:52 +0000	[thread overview]
Message-ID: <20170203072952.GI27291@ZenIV.linux.org.uk> (raw)
In-Reply-To: <1486040452.2812.6.camel@redhat.com>

On Thu, Feb 02, 2017 at 08:00:52AM -0500, Jeff Layton wrote:
> > I'm not sure we need to touch any get_user_pages_fast() at all; let it
> > fill a medium-sized array and use that as a buffer.  In particular,
> > I *really* don't like the idea of having the callbacks done in an
> > inconsistent locking environment - sometimes under ->mmap_sem, sometimes
> > not.
> > 
> 
> Yeah, that might work. You could kmalloc the buffer array according to
> the maxsize value. For small ones we could even consider using an on-
> stack buffer.

Yes.  FWIW, I'm not proposing to kill iov_iter_get_pages{,_alloc}()
immediately - the new primitive would initially use the damn thing.
That can be backported without any problems, and conversions would be
one-by-one.  If/when we get to the point where most of the users have
been switched to the new helper, we could try and see if what remains
could be dealt with; if that works, it might be possible to fold
iov_iter_get_pages() into it.  In particular, for ITER_BVEC and
ITER_PIPE we don't need to bother with any intermediate arrays at all, etc.
But that's only after several cycles when everyone is asked to try and use
the new primitive instead of iov_iter_get_pages().

The calling conventions of iov_iter_get_pages() are ugly and I would love to
get rid of them, but doing that in a flagday conversion is a lot of PITA
for no good reason.

Speaking of get_user_pages() and conversions:

get_user_pages() relies upon find_extend_vma() to reject kernel
addresses; the fast side of get_user_pages_fast() doesn't have anything
of that sort in case e.g. HAVE_GENERIC_RCU_GUP.  Sure, usually we have
that verified and rejected earlier anyway, but it's a fairly subtle difference
that doesn't seem to be documented anywhere.

For example, /dev/st* reads and writes (st_read()/st_write()) feed the
address of buffer to get_user_pages_unlocked().  If somebody replaced
those with get_user_pages_fast(), we'd be in trouble as soon as
some code got tricked into using kernel_write() on /dev/st*.

access_ok() in HAVE_GENERIC_RCU_GUP {__,}get_user_pages_fast() obviously
doesn't help in that scenario.  What am I missing here?

WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
To: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Chris Wilson
	<chris-Y6uKTt2uX1cEflXRtASbqLVCufUGDwFn@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	v9fs-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	"Kirill A. Shutemov"
	<kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	lustre-devel-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
Subject: Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call
Date: Fri, 3 Feb 2017 07:29:52 +0000	[thread overview]
Message-ID: <20170203072952.GI27291@ZenIV.linux.org.uk> (raw)
In-Reply-To: <1486040452.2812.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Feb 02, 2017 at 08:00:52AM -0500, Jeff Layton wrote:
> > I'm not sure we need to touch any get_user_pages_fast() at all; let it
> > fill a medium-sized array and use that as a buffer.  In particular,
> > I *really* don't like the idea of having the callbacks done in an
> > inconsistent locking environment - sometimes under ->mmap_sem, sometimes
> > not.
> > 
> 
> Yeah, that might work. You could kmalloc the buffer array according to
> the maxsize value. For small ones we could even consider using an on-
> stack buffer.

Yes.  FWIW, I'm not proposing to kill iov_iter_get_pages{,_alloc}()
immediately - the new primitive would initially use the damn thing.
That can be backported without any problems, and conversions would be
one-by-one.  If/when we get to the point where most of the users have
been switched to the new helper, we could try and see if what remains
could be dealt with; if that works, it might be possible to fold
iov_iter_get_pages() into it.  In particular, for ITER_BVEC and
ITER_PIPE we don't need to bother with any intermediate arrays at all, etc.
But that's only after several cycles when everyone is asked to try and use
the new primitive instead of iov_iter_get_pages().

The calling conventions of iov_iter_get_pages() are ugly and I would love to
get rid of them, but doing that in a flagday conversion is a lot of PITA
for no good reason.

Speaking of get_user_pages() and conversions:

get_user_pages() relies upon find_extend_vma() to reject kernel
addresses; the fast side of get_user_pages_fast() doesn't have anything
of that sort in case e.g. HAVE_GENERIC_RCU_GUP.  Sure, usually we have
that verified and rejected earlier anyway, but it's a fairly subtle difference
that doesn't seem to be documented anywhere.

For example, /dev/st* reads and writes (st_read()/st_write()) feed the
address of buffer to get_user_pages_unlocked().  If somebody replaced
those with get_user_pages_fast(), we'd be in trouble as soon as
some code got tricked into using kernel_write() on /dev/st*.

access_ok() in HAVE_GENERIC_RCU_GUP {__,}get_user_pages_fast() obviously
doesn't help in that scenario.  What am I missing here?

WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Chris Wilson
	<chris-Y6uKTt2uX1cEflXRtASbqLVCufUGDwFn@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	v9fs-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	"Kirill A. Shutemov"
	<kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	lustre-devel-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
Subject: [lustre-devel] [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call
Date: Fri, 3 Feb 2017 07:29:52 +0000	[thread overview]
Message-ID: <20170203072952.GI27291@ZenIV.linux.org.uk> (raw)
In-Reply-To: <1486040452.2812.6.camel@redhat.com>

On Thu, Feb 02, 2017 at 08:00:52AM -0500, Jeff Layton wrote:
> > I'm not sure we need to touch any get_user_pages_fast() at all; let it
> > fill a medium-sized array and use that as a buffer.  In particular,
> > I *really* don't like the idea of having the callbacks done in an
> > inconsistent locking environment - sometimes under ->mmap_sem, sometimes
> > not.
> > 
> 
> Yeah, that might work. You could kmalloc the buffer array according to
> the maxsize value. For small ones we could even consider using an on-
> stack buffer.

Yes.  FWIW, I'm not proposing to kill iov_iter_get_pages{,_alloc}()
immediately - the new primitive would initially use the damn thing.
That can be backported without any problems, and conversions would be
one-by-one.  If/when we get to the point where most of the users have
been switched to the new helper, we could try and see if what remains
could be dealt with; if that works, it might be possible to fold
iov_iter_get_pages() into it.  In particular, for ITER_BVEC and
ITER_PIPE we don't need to bother with any intermediate arrays at all, etc.
But that's only after several cycles when everyone is asked to try and use
the new primitive instead of iov_iter_get_pages().

The calling conventions of iov_iter_get_pages() are ugly and I would love to
get rid of them, but doing that in a flagday conversion is a lot of PITA
for no good reason.

Speaking of get_user_pages() and conversions:

get_user_pages() relies upon find_extend_vma() to reject kernel
addresses; the fast side of get_user_pages_fast() doesn't have anything
of that sort in case e.g. HAVE_GENERIC_RCU_GUP.  Sure, usually we have
that verified and rejected earlier anyway, but it's a fairly subtle difference
that doesn't seem to be documented anywhere.

For example, /dev/st* reads and writes (st_read()/st_write()) feed the
address of buffer to get_user_pages_unlocked().  If somebody replaced
those with get_user_pages_fast(), we'd be in trouble as soon as
some code got tricked into using kernel_write() on /dev/st*.

access_ok() in HAVE_GENERIC_RCU_GUP {__,}get_user_pages_fast() obviously
doesn't help in that scenario.  What am I missing here?

  reply	other threads:[~2017-02-03  7:30 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-24 21:23 [PATCH] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call Jeff Layton
2017-01-25 13:32 ` [PATCH v3 0/2] " Jeff Layton
2017-01-25 13:32   ` [PATCH v3 1/2] " Jeff Layton
2017-01-26 12:35     ` Jeff Layton
2017-01-27 13:24       ` [PATCH v4 0/2] " Jeff Layton
2017-01-27 13:24         ` [PATCH v4 1/2] " Jeff Layton
2017-01-27 13:24         ` [PATCH v4 2/2] ceph: switch DIO code to use iov_iter_get_pages_alloc Jeff Layton
2017-01-30 15:40           ` Jeff Layton
2017-01-30 15:40             ` Jeff Layton
2017-01-25 13:32   ` [PATCH v3 " Jeff Layton
2017-02-02  9:51   ` [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call Al Viro
2017-02-02  9:51     ` [lustre-devel] " Al Viro
2017-02-02  9:51     ` Al Viro
2017-02-02 10:56     ` Christoph Hellwig
2017-02-02 10:56       ` [lustre-devel] " Christoph Hellwig
2017-02-02 10:56       ` Christoph Hellwig
2017-02-02 11:16       ` Al Viro
2017-02-02 11:16         ` [lustre-devel] " Al Viro
2017-02-02 11:16         ` Al Viro
2017-02-02 13:00         ` Jeff Layton
2017-02-02 13:00           ` Jeff Layton
2017-02-03  7:29           ` Al Viro [this message]
2017-02-03  7:29             ` [lustre-devel] " Al Viro
2017-02-03  7:29             ` Al Viro
2017-02-03 18:29             ` Linus Torvalds
2017-02-03 18:29               ` [lustre-devel] " Linus Torvalds
2017-02-03 18:29               ` Linus Torvalds
2017-02-03 19:08               ` Al Viro
2017-02-03 19:08                 ` [lustre-devel] " Al Viro
2017-02-03 19:08                 ` Al Viro
2017-02-03 19:28                 ` Linus Torvalds
2017-02-03 19:28                   ` [lustre-devel] " Linus Torvalds
2017-02-03 19:28                   ` Linus Torvalds
2017-02-13  9:56                   ` Steve Capper
2017-02-13 21:40                     ` Linus Torvalds
2017-02-13 21:40                       ` [lustre-devel] " Linus Torvalds
2017-02-13 21:40                       ` Linus Torvalds
2017-02-03  7:49           ` Christoph Hellwig
2017-02-03  7:49             ` [lustre-devel] " Christoph Hellwig
2017-02-03  7:49             ` Christoph Hellwig
2017-02-03  8:54             ` Al Viro
2017-02-03  8:54               ` [lustre-devel] " Al Viro
2017-02-03  8:54               ` Al Viro
2017-02-03 11:09               ` Christoph Hellwig
2017-02-03 11:09                 ` [lustre-devel] " Christoph Hellwig
2017-02-03 11:09                 ` Christoph Hellwig
2017-02-02 14:48     ` Jan Kara
2017-02-02 14:48       ` [lustre-devel] " Jan Kara
2017-02-02 14:48       ` Jan Kara
2017-02-02 18:28       ` Al Viro
2017-02-02 18:28         ` [lustre-devel] " Al Viro
2017-02-02 18:28         ` Al Viro
2017-02-03 14:47         ` Jan Kara
2017-02-03 14:47           ` [lustre-devel] " Jan Kara
2017-02-03 14:47           ` Jan Kara
2017-02-04  3:08     ` Al Viro
2017-02-04  3:08       ` [lustre-devel] " Al Viro
2017-02-04  3:08       ` Al Viro
2017-02-04 19:26       ` Al Viro
2017-02-04 19:26         ` [lustre-devel] " Al Viro
2017-02-04 19:26         ` Al Viro
2017-02-04 22:12         ` Miklos Szeredi
2017-02-04 22:12           ` Miklos Szeredi
2017-02-04 22:11       ` Miklos Szeredi
2017-02-04 22:11         ` Miklos Szeredi
2017-02-05  1:51         ` Al Viro
2017-02-05  1:51           ` [lustre-devel] " Al Viro
2017-02-05  1:51           ` Al Viro
2017-02-05 20:15           ` Miklos Szeredi
2017-02-05 20:15             ` Miklos Szeredi
2017-02-05 21:01             ` Al Viro
2017-02-05 21:01               ` [lustre-devel] " Al Viro
2017-02-05 21:01               ` Al Viro
2017-02-05 21:19               ` Miklos Szeredi
2017-02-05 21:19                 ` Miklos Szeredi
2017-02-05 22:04                 ` Al Viro
2017-02-05 22:04                   ` [lustre-devel] " Al Viro
2017-02-05 22:04                   ` Al Viro
2017-02-05 22:04                   ` Al Viro
2017-02-06  3:05                   ` Al Viro
2017-02-06  3:05                     ` [lustre-devel] " Al Viro
2017-02-06  3:05                     ` Al Viro
2017-02-06  9:08                     ` Miklos Szeredi
2017-02-06  9:57                       ` Al Viro
2017-02-06  9:57                         ` [lustre-devel] " Al Viro
2017-02-06  9:57                         ` Al Viro
2017-02-06 14:18                         ` Miklos Szeredi
2017-02-07  7:19                           ` Al Viro
2017-02-07  7:19                             ` [lustre-devel] " Al Viro
2017-02-07  7:19                             ` Al Viro
2017-02-07 11:35                             ` Miklos Szeredi
2017-02-07 11:35                               ` Miklos Szeredi
2017-02-08  5:54                               ` Al Viro
2017-02-08  5:54                                 ` [lustre-devel] " Al Viro
2017-02-08  5:54                                 ` Al Viro
2017-02-08  9:53                                 ` Miklos Szeredi
2017-02-06  8:37                   ` Miklos Szeredi
2017-02-05 20:56           ` Al Viro
2017-02-05 20:56             ` [lustre-devel] " Al Viro
2017-02-05 20:56             ` Al Viro
2017-02-16 13:10     ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170203072952.GI27291@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jlayton@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=lustre-devel@lists.lustre.org \
    --cc=torvalds@linux-foundation.org \
    --cc=v9fs-developer@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.