From: Al Viro <viro@zeniv.linux.org.uk>
To: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
John Hubbard <jhubbard@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>, Miklos Szeredi <miklos@szeredi.hu>,
"Darrick J . Wong" <djwong@kernel.org>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Anna Schumaker <anna@kernel.org>,
David Hildenbrand <david@redhat.com>,
Logan Gunthorpe <logang@deltatee.com>,
linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines
Date: Thu, 22 Sep 2022 03:22:48 +0100 [thread overview]
Message-ID: <YyvG+Oih2A37Grcf@ZenIV> (raw)
In-Reply-To: <20220915081625.6a72nza6yq4l5etp@quack3>
On Thu, Sep 15, 2022 at 10:16:25AM +0200, Jan Kara wrote:
> > How would that work? What protects the area where you want to avoid running
> > into pinned pages from previously acceptable page getting pinned? If "they
> > must have been successfully unmapped" is a part of what you are planning, we
> > really do have a problem...
>
> But this is a very good question. So far the idea was that we lock the
> page, unmap (or writeprotect) the page, and then check pincount == 0 and
> that is a reliable method for making sure page data is stable (until we
> unlock the page & release other locks blocking page faults and writes). But
> once suddently ordinary page references can be used to create pins this
> does not work anymore. Hrm.
>
> Just brainstorming ideas now: So we'd either need to obtain the pins early
> when we still have the virtual address (but I guess that is often not
> practical but should work e.g. for normal direct IO path) or we need some
> way to "simulate" the page fault when pinning the page, just don't map it
> into page tables in the end. This simulated page fault could be perhaps
> avoided if rmap walk shows that the page is already mapped somewhere with
> suitable permissions.
OK. As far as I can see, the rules are along the lines of
* creator of ITER_BVEC/ITER_XARRAY is responsible for pages being safe.
That includes
* page known to be locked by caller
* page being privately allocated and not visible to anyone else
* iterator being data source
* page coming from pin_user_pages(), possibly as the result of
iov_iter_pin_pages() on ITER_IOVEC/ITER_UBUF.
* ITER_PIPE pages are always safe
* pages found in ITER_BVEC/ITER_XARRAY are safe, since the iterator
had been created with such.
My preference would be to have iov_iter_get_pages() and friends pin if and
only if we have data-destination iov_iter that is user-backed. For
data-source user-backed we only need FOLL_GET, and for all other flavours
(ITER_BVEC, etc.) we only do get_page(), if we need to grab any references
at all.
What I'd like to have is the understanding of the places where we drop
the references acquired by iov_iter_get_pages(). How do we decide
whether to unpin? E.g. pipe_buffer carries a reference to page and no
way to tell whether it's a pinned one; results of iov_iter_get_pages()
on ITER_IOVEC *can* end up there, but thankfully only from data-source
(== WRITE, aka. ITER_SOURCE) iov_iter. So for those we don't care.
Then there's nfs_request; AFAICS, we do need to pin the references in
those if they are coming from nfs_direct_read_schedule_iovec(), but
not if they come from readpage_async_filler(). How do we deal with
coalescence, etc.? It's been a long time since I really looked at
that code... Christoph, could you give any comments on that one?
Note, BTW, that nfs_request coming from readpage_async_filler() have
pages locked by caller; the ones from nfs_direct_read_schedule_iovec()
do not, and that's where we want them pinned. Resulting page references
end up (after quite a trip through data structures) stuffed into struct
rpc_rqst ->rc_recv_buf.pages[] and when a response arrives from server,
they get picked by xs_read_bvec() and fed to iov_iter_bvec(). In one
case it's safe since the pages are locked; in another - since they would
come from pin_user_pages(). The call chain at the time they are used
has nothing to do with the originator - sunrpc is looking at the arrived
response to READ that matches an rpc_rqst that had been created by sender
of that request and safety is the sender's responsibility.
next prev parent reply other threads:[~2022-09-22 2:23 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-31 4:18 [PATCH v2 0/7] convert most filesystems to pin_user_pages_fast() John Hubbard
2022-08-31 4:18 ` [PATCH v2 1/7] mm: change release_pages() to use unsigned long for npages John Hubbard
2022-08-31 4:18 ` [PATCH v2 2/7] mm/gup: introduce pin_user_page() John Hubbard
2022-09-06 6:37 ` Christoph Hellwig
2022-09-06 7:12 ` John Hubbard
2022-08-31 4:18 ` [PATCH v2 3/7] block: add dio_w_*() wrappers for pin, unpin user pages John Hubbard
2022-08-31 4:18 ` [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines John Hubbard
2022-09-01 0:42 ` Al Viro
2022-09-01 1:48 ` John Hubbard
2022-09-06 6:47 ` Christoph Hellwig
2022-09-06 7:44 ` John Hubbard
2022-09-06 7:48 ` Christoph Hellwig
2022-09-06 7:58 ` John Hubbard
2022-09-07 8:50 ` Christoph Hellwig
2022-09-06 10:21 ` Jan Kara
2022-09-07 8:45 ` Christoph Hellwig
2022-09-14 3:51 ` Al Viro
2022-09-14 14:52 ` Jan Kara
2022-09-14 16:42 ` Al Viro
2022-09-15 8:16 ` Jan Kara
2022-09-16 1:55 ` Al Viro
2022-09-20 5:02 ` Al Viro
2022-09-22 14:36 ` Christoph Hellwig
2022-09-22 14:43 ` David Hildenbrand
2022-09-22 14:45 ` Christoph Hellwig
2022-09-22 2:22 ` Al Viro [this message]
2022-09-22 6:09 ` John Hubbard
2022-09-22 11:29 ` Jan Kara
2022-09-23 3:19 ` Al Viro
2022-09-23 4:05 ` John Hubbard
2022-09-23 8:39 ` Christoph Hellwig
2022-09-23 12:22 ` Jan Kara
2022-09-23 4:34 ` John Hubbard
2022-09-22 14:38 ` Christoph Hellwig
2022-09-23 4:22 ` Al Viro
2022-09-23 8:44 ` Christoph Hellwig
2022-09-23 16:13 ` Al Viro
2022-09-26 15:53 ` Christoph Hellwig
2022-09-26 19:55 ` Al Viro
2022-09-22 14:31 ` Christoph Hellwig
2022-09-22 14:36 ` Al Viro
2022-08-31 4:18 ` [PATCH v2 5/7] block, bio, fs: convert most filesystems to pin_user_pages_fast() John Hubbard
2022-09-06 6:48 ` Christoph Hellwig
2022-09-06 7:15 ` John Hubbard
2022-08-31 4:18 ` [PATCH v2 6/7] NFS: direct-io: convert to FOLL_PIN pages John Hubbard
2022-09-06 6:49 ` Christoph Hellwig
2022-09-06 7:16 ` John Hubbard
2022-08-31 4:18 ` [PATCH v2 7/7] fuse: convert direct IO paths to use FOLL_PIN John Hubbard
2022-08-31 10:37 ` Miklos Szeredi
2022-09-01 1:33 ` John Hubbard
2022-09-06 6:36 ` [PATCH v2 0/7] convert most filesystems to pin_user_pages_fast() Christoph Hellwig
2022-09-06 7:10 ` John Hubbard
2022-09-06 7:22 ` Christoph Hellwig
2022-09-06 7:37 ` John Hubbard
2022-09-06 7:46 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YyvG+Oih2A37Grcf@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=akpm@linux-foundation.org \
--cc=anna@kernel.org \
--cc=axboe@kernel.dk \
--cc=david@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=jhubbard@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=miklos@szeredi.hu \
--cc=trond.myklebust@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).