All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@redhat.com>
To: Al Viro <viro@ZenIV.linux.org.uk>, Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
	lustre-devel@lists.lustre.org,
	v9fs-developer@lists.sourceforge.net,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jan Kara <jack@suse.cz>, Chris Wilson <chris@chris-wilson.co.uk>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call
Date: Thu, 02 Feb 2017 08:00:52 -0500	[thread overview]
Message-ID: <1486040452.2812.6.camel@redhat.com> (raw)
In-Reply-To: <20170202111625.GG27291@ZenIV.linux.org.uk>

On Thu, 2017-02-02 at 11:16 +0000, Al Viro wrote:
> On Thu, Feb 02, 2017 at 02:56:51AM -0800, Christoph Hellwig wrote:
> > > I really wonder if this is the right approach.  Most of the users of
> > > iov_iter_get_pages()/iov_iter_get_pages_alloc() look like they want
> > > something like
> > > 	iov_iter_for_each_page(iter, size, f, data)
> > > with int (*f)(struct page *page, size_t from, size_t size, void *data)
> > > passed as callback.  Not everything fits that model, but there's a whole
> > > lot of things that do.
> > 

Yeah, that does seem nicer.

My only concern is that it sounds more invasive, and I'd like to get
some sort of interim fix in for Ceph in the meantime (preferably
something we can send to stable). It really is trivial to vmsplice some
data into the kernel, and then splice write that to a DIO file --
instant softlockup in my testing.

This patch also makes vectored DIO with small iovecs on NFS a lot more
efficient, which is a nice bonus.
 
> > I was planning to do that, mostly because of the iomap dio code that
> > would not only get a lot cleaner with this, but also support multi-page
> > bvecs that we hope to have in the block layer soon.  The issue with it
> > is that we need to touch all the arch get_user_pages_fast
> > implementations, so it's going to be a relatively invasive change that I
> > didn't want to fix with just introducing the new direct I/O code.
> 
> I'm not sure we need to touch any get_user_pages_fast() at all; let it
> fill a medium-sized array and use that as a buffer.  In particular,
> I *really* don't like the idea of having the callbacks done in an
> inconsistent locking environment - sometimes under ->mmap_sem, sometimes
> not.
> 

Yeah, that might work. You could kmalloc the buffer array according to
the maxsize value. For small ones we could even consider using an on-
stack buffer.

> I played with "let it fill bio_vec array", but it doesn't really fit the
> users; variant with callbacks is cleaner, IMO.

I looked at converting this over to a bio_vec array as well, but that
really doesn't work well for Ceph. I like the callback idea better too.

-- 
Jeff Layton <jlayton@redhat.com>

WARNING: multiple messages have this Message-ID (diff)
From: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	lustre-devel-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org,
	v9fs-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	Chris Wilson
	<chris-Y6uKTt2uX1cEflXRtASbqLVCufUGDwFn@public.gmane.org>,
	"Kirill A. Shutemov"
	<kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Subject: Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call
Date: Thu, 02 Feb 2017 08:00:52 -0500	[thread overview]
Message-ID: <1486040452.2812.6.camel@redhat.com> (raw)
In-Reply-To: <20170202111625.GG27291-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>

On Thu, 2017-02-02 at 11:16 +0000, Al Viro wrote:
> On Thu, Feb 02, 2017 at 02:56:51AM -0800, Christoph Hellwig wrote:
> > > I really wonder if this is the right approach.  Most of the users of
> > > iov_iter_get_pages()/iov_iter_get_pages_alloc() look like they want
> > > something like
> > > 	iov_iter_for_each_page(iter, size, f, data)
> > > with int (*f)(struct page *page, size_t from, size_t size, void *data)
> > > passed as callback.  Not everything fits that model, but there's a whole
> > > lot of things that do.
> > 

Yeah, that does seem nicer.

My only concern is that it sounds more invasive, and I'd like to get
some sort of interim fix in for Ceph in the meantime (preferably
something we can send to stable). It really is trivial to vmsplice some
data into the kernel, and then splice write that to a DIO file --
instant softlockup in my testing.

This patch also makes vectored DIO with small iovecs on NFS a lot more
efficient, which is a nice bonus.
 
> > I was planning to do that, mostly because of the iomap dio code that
> > would not only get a lot cleaner with this, but also support multi-page
> > bvecs that we hope to have in the block layer soon.  The issue with it
> > is that we need to touch all the arch get_user_pages_fast
> > implementations, so it's going to be a relatively invasive change that I
> > didn't want to fix with just introducing the new direct I/O code.
> 
> I'm not sure we need to touch any get_user_pages_fast() at all; let it
> fill a medium-sized array and use that as a buffer.  In particular,
> I *really* don't like the idea of having the callbacks done in an
> inconsistent locking environment - sometimes under ->mmap_sem, sometimes
> not.
> 

Yeah, that might work. You could kmalloc the buffer array according to
the maxsize value. For small ones we could even consider using an on-
stack buffer.

> I played with "let it fill bio_vec array", but it doesn't really fit the
> users; variant with callbacks is cleaner, IMO.

I looked at converting this over to a bio_vec array as well, but that
really doesn't work well for Ceph. I like the callback idea better too.

-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-02-02 13:00 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-24 21:23 [PATCH] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call Jeff Layton
2017-01-25 13:32 ` [PATCH v3 0/2] " Jeff Layton
2017-01-25 13:32   ` [PATCH v3 1/2] " Jeff Layton
2017-01-26 12:35     ` Jeff Layton
2017-01-27 13:24       ` [PATCH v4 0/2] " Jeff Layton
2017-01-27 13:24         ` [PATCH v4 1/2] " Jeff Layton
2017-01-27 13:24         ` [PATCH v4 2/2] ceph: switch DIO code to use iov_iter_get_pages_alloc Jeff Layton
2017-01-30 15:40           ` Jeff Layton
2017-01-30 15:40             ` Jeff Layton
2017-01-25 13:32   ` [PATCH v3 " Jeff Layton
2017-02-02  9:51   ` [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call Al Viro
2017-02-02  9:51     ` [lustre-devel] " Al Viro
2017-02-02  9:51     ` Al Viro
2017-02-02 10:56     ` Christoph Hellwig
2017-02-02 10:56       ` [lustre-devel] " Christoph Hellwig
2017-02-02 10:56       ` Christoph Hellwig
2017-02-02 11:16       ` Al Viro
2017-02-02 11:16         ` [lustre-devel] " Al Viro
2017-02-02 11:16         ` Al Viro
2017-02-02 13:00         ` Jeff Layton [this message]
2017-02-02 13:00           ` Jeff Layton
2017-02-03  7:29           ` Al Viro
2017-02-03  7:29             ` [lustre-devel] " Al Viro
2017-02-03  7:29             ` Al Viro
2017-02-03 18:29             ` Linus Torvalds
2017-02-03 18:29               ` [lustre-devel] " Linus Torvalds
2017-02-03 18:29               ` Linus Torvalds
2017-02-03 19:08               ` Al Viro
2017-02-03 19:08                 ` [lustre-devel] " Al Viro
2017-02-03 19:08                 ` Al Viro
2017-02-03 19:28                 ` Linus Torvalds
2017-02-03 19:28                   ` [lustre-devel] " Linus Torvalds
2017-02-03 19:28                   ` Linus Torvalds
2017-02-13  9:56                   ` Steve Capper
2017-02-13 21:40                     ` Linus Torvalds
2017-02-13 21:40                       ` [lustre-devel] " Linus Torvalds
2017-02-13 21:40                       ` Linus Torvalds
2017-02-03  7:49           ` Christoph Hellwig
2017-02-03  7:49             ` [lustre-devel] " Christoph Hellwig
2017-02-03  7:49             ` Christoph Hellwig
2017-02-03  8:54             ` Al Viro
2017-02-03  8:54               ` [lustre-devel] " Al Viro
2017-02-03  8:54               ` Al Viro
2017-02-03 11:09               ` Christoph Hellwig
2017-02-03 11:09                 ` [lustre-devel] " Christoph Hellwig
2017-02-03 11:09                 ` Christoph Hellwig
2017-02-02 14:48     ` Jan Kara
2017-02-02 14:48       ` [lustre-devel] " Jan Kara
2017-02-02 14:48       ` Jan Kara
2017-02-02 18:28       ` Al Viro
2017-02-02 18:28         ` [lustre-devel] " Al Viro
2017-02-02 18:28         ` Al Viro
2017-02-03 14:47         ` Jan Kara
2017-02-03 14:47           ` [lustre-devel] " Jan Kara
2017-02-03 14:47           ` Jan Kara
2017-02-04  3:08     ` Al Viro
2017-02-04  3:08       ` [lustre-devel] " Al Viro
2017-02-04  3:08       ` Al Viro
2017-02-04 19:26       ` Al Viro
2017-02-04 19:26         ` [lustre-devel] " Al Viro
2017-02-04 19:26         ` Al Viro
2017-02-04 22:12         ` Miklos Szeredi
2017-02-04 22:12           ` Miklos Szeredi
2017-02-04 22:11       ` Miklos Szeredi
2017-02-04 22:11         ` Miklos Szeredi
2017-02-05  1:51         ` Al Viro
2017-02-05  1:51           ` [lustre-devel] " Al Viro
2017-02-05  1:51           ` Al Viro
2017-02-05 20:15           ` Miklos Szeredi
2017-02-05 20:15             ` Miklos Szeredi
2017-02-05 21:01             ` Al Viro
2017-02-05 21:01               ` [lustre-devel] " Al Viro
2017-02-05 21:01               ` Al Viro
2017-02-05 21:19               ` Miklos Szeredi
2017-02-05 21:19                 ` Miklos Szeredi
2017-02-05 22:04                 ` Al Viro
2017-02-05 22:04                   ` [lustre-devel] " Al Viro
2017-02-05 22:04                   ` Al Viro
2017-02-05 22:04                   ` Al Viro
2017-02-06  3:05                   ` Al Viro
2017-02-06  3:05                     ` [lustre-devel] " Al Viro
2017-02-06  3:05                     ` Al Viro
2017-02-06  9:08                     ` Miklos Szeredi
2017-02-06  9:57                       ` Al Viro
2017-02-06  9:57                         ` [lustre-devel] " Al Viro
2017-02-06  9:57                         ` Al Viro
2017-02-06 14:18                         ` Miklos Szeredi
2017-02-07  7:19                           ` Al Viro
2017-02-07  7:19                             ` [lustre-devel] " Al Viro
2017-02-07  7:19                             ` Al Viro
2017-02-07 11:35                             ` Miklos Szeredi
2017-02-07 11:35                               ` Miklos Szeredi
2017-02-08  5:54                               ` Al Viro
2017-02-08  5:54                                 ` [lustre-devel] " Al Viro
2017-02-08  5:54                                 ` Al Viro
2017-02-08  9:53                                 ` Miklos Szeredi
2017-02-06  8:37                   ` Miklos Szeredi
2017-02-05 20:56           ` Al Viro
2017-02-05 20:56             ` [lustre-devel] " Al Viro
2017-02-05 20:56             ` Al Viro
2017-02-16 13:10     ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1486040452.2812.6.camel@redhat.com \
    --to=jlayton@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=lustre-devel@lists.lustre.org \
    --cc=torvalds@linux-foundation.org \
    --cc=v9fs-developer@lists.sourceforge.net \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.