All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>, Miklos Szeredi <miklos@szeredi.hu>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines
Date: Fri, 23 Sep 2022 14:22:43 +0200	[thread overview]
Message-ID: <20220923122243.bbw6agvopkhz5yud@quack3> (raw)
In-Reply-To: <7e652ba4-8b03-59e0-a9ef-1118c4bbd492@nvidia.com>

On Thu 22-09-22 21:05:16, John Hubbard wrote:
> On 9/22/22 20:19, Al Viro wrote:
> > On Thu, Sep 22, 2022 at 01:29:35PM +0200, Jan Kara wrote:
> > 
> >>> This rule would mostly work, as long as we can relax it in some cases, to
> >>> allow pinning of both source and dest pages, instead of just destination
> >>> pages, in some cases. In particular, bio_release_pages() has lost all
> >>> context about whether it was a read or a write request, as far as I can
> >>> tell. And bio_release_pages() is the primary place to unpin pages for
> >>> direct IO.
> >>
> >> Well, we already do have BIO_NO_PAGE_REF bio flag that gets checked in
> >> bio_release_pages(). I think we can easily spare another bio flag to tell
> >> whether we need to unpin or not. So as long as all the pages in the created
> >> bio need the same treatment, the situation should be simple.
> > 
> > Yes.  Incidentally, the same condition is already checked by the creators
> > of those bio - see the assorted should_dirty logics.
> 
> Beautiful!
> 
> > 
> > While we are at it - how much of the rationale around bio_check_pages_dirty()
> > doing dirtying is still applicable with pinning pages before we stick them
> > into bio?  We do dirty them before submitting bio, then on completion
> > bio_check_pages_dirty() checks if something has marked them clean while
> > we'd been doing IO; if all of them are still dirty we just drop the pages
> > (well, unpin and drop), otherwise we arrange for dirty + unpin + drop
> > done in process context (via schedule_work()).  Can they be marked clean by
> > anyone while they are pinned?  After all, pinning is done to prevent
> > writeback getting done on them while we are modifying the suckers...
> 
> I certainly hope not. And in fact, we should really just say that that's
> a rule: the whole time the page is pinned, it simply must remain dirty
> and writable, at least with the way things are right now.

I agree the page should be staying dirty the whole time it is pinned. I
don't think it is feasible to keep it writeable in the page tables because
that would mean you would need to block e.g. munmap() until the pages gets
unpinned and that will almost certainly upset some current userspace.

But keeping page dirty should be enough so that we can get rid of all these
nasty calls to set_page_dirty() from IO completion.

> This reminds me that I'm not exactly sure what the rules for
> FOLL_LONGTERM callers should be, with respect to dirtying. At the
> moment, most, if not all of the code that does "set_page_dirty_lock();
> unpin_user_page()" is wrong.

Right.

> To fix those cases, IIUC, the answer is: you must make the page dirty
> properly, with page_mkwrite(), not just with set_page_dirty_lock(). And

Correct, and GUP (or PUP) actually does that under the hood so I don't
think we need to change anything there.

> that has to be done probably a lot earlier, for reasons that I'm still
> vague on. But perhaps right after pinning the page. (Assuming that we
> hold off writeback while the page is pinned.)

Holding off writeback is not always doable - as Christoph mentions, for
data integrity writeback we'll have to get the data to disk before the page
is unpinned (as for longterm users it can take days for the page to be
unpinned). But we can just writeback the page without clearing the dirty
bit in these cases. We may need to use bounce pages to be able to safely
writeback pinned pages but that's another part of the story...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  parent reply	other threads:[~2022-09-23 12:27 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-31  4:18 [PATCH v2 0/7] convert most filesystems to pin_user_pages_fast() John Hubbard
2022-08-31  4:18 ` [PATCH v2 1/7] mm: change release_pages() to use unsigned long for npages John Hubbard
2022-08-31  4:18 ` [PATCH v2 2/7] mm/gup: introduce pin_user_page() John Hubbard
2022-09-06  6:37   ` Christoph Hellwig
2022-09-06  7:12     ` John Hubbard
2022-08-31  4:18 ` [PATCH v2 3/7] block: add dio_w_*() wrappers for pin, unpin user pages John Hubbard
2022-08-31  4:18 ` [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines John Hubbard
2022-09-01  0:42   ` Al Viro
2022-09-01  1:48     ` John Hubbard
2022-09-06  6:47   ` Christoph Hellwig
2022-09-06  7:44     ` John Hubbard
2022-09-06  7:48       ` Christoph Hellwig
2022-09-06  7:58         ` John Hubbard
2022-09-07  8:50           ` Christoph Hellwig
2022-09-06 10:21         ` Jan Kara
2022-09-07  8:45           ` Christoph Hellwig
2022-09-14  3:51             ` Al Viro
2022-09-14 14:52               ` Jan Kara
2022-09-14 16:42                 ` Al Viro
2022-09-15  8:16                   ` Jan Kara
2022-09-16  1:55                     ` Al Viro
2022-09-20  5:02                       ` Al Viro
2022-09-22 14:36                         ` Christoph Hellwig
2022-09-22 14:43                           ` David Hildenbrand
2022-09-22 14:45                             ` Christoph Hellwig
2022-09-22  2:22                     ` Al Viro
2022-09-22  6:09                       ` John Hubbard
2022-09-22 11:29                         ` Jan Kara
2022-09-23  3:19                           ` Al Viro
2022-09-23  4:05                             ` John Hubbard
2022-09-23  8:39                               ` Christoph Hellwig
2022-09-23 12:22                               ` Jan Kara [this message]
2022-09-23  4:34                           ` John Hubbard
2022-09-22 14:38                       ` Christoph Hellwig
2022-09-23  4:22                         ` Al Viro
2022-09-23  8:44                           ` Christoph Hellwig
2022-09-23 16:13                             ` Al Viro
2022-09-26 15:53                               ` Christoph Hellwig
2022-09-26 19:55                                 ` Al Viro
2022-09-22 14:31               ` Christoph Hellwig
2022-09-22 14:36                 ` Al Viro
2022-08-31  4:18 ` [PATCH v2 5/7] block, bio, fs: convert most filesystems to pin_user_pages_fast() John Hubbard
2022-09-06  6:48   ` Christoph Hellwig
2022-09-06  7:15     ` John Hubbard
2022-08-31  4:18 ` [PATCH v2 6/7] NFS: direct-io: convert to FOLL_PIN pages John Hubbard
2022-09-06  6:49   ` Christoph Hellwig
2022-09-06  7:16     ` John Hubbard
2022-08-31  4:18 ` [PATCH v2 7/7] fuse: convert direct IO paths to use FOLL_PIN John Hubbard
2022-08-31 10:37   ` Miklos Szeredi
2022-09-01  1:33     ` John Hubbard
2022-09-06  6:36 ` [PATCH v2 0/7] convert most filesystems to pin_user_pages_fast() Christoph Hellwig
2022-09-06  7:10   ` John Hubbard
2022-09-06  7:22     ` Christoph Hellwig
2022-09-06  7:37       ` John Hubbard
2022-09-06  7:46         ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220923122243.bbw6agvopkhz5yud@quack3 \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=anna@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=david@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=jhubbard@nvidia.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=miklos@szeredi.hu \
    --cc=trond.myklebust@hammerspace.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.