From: David Howells <dhowells@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: dhowells@redhat.com, Jens Axboe <axboe@kernel.dk>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Hillf Danton <hdanton@sina.com>,
	Christian Brauner <brauner@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Extending page pinning into fs/direct-io.c
Date: Tue, 23 May 2023 21:16:11 +0100
Message-ID: <3068545.1684872971@warthog.procyon.org.uk>
In-Reply-To: <ZGxfrOLZ4aN9/MvE@infradead.org>

Christoph Hellwig <hch@infradead.org> wrote:

> But can you please also take care of the legacy direct I/O code?  I'd really
> hate to leave yet another unfinished transition around.

I've been poking at it this afternoon, but it doesn't look like it's going to
be straightforward, unfortunately.  The mm folks have been withdrawing access
to the pinning API behind the ramparts of the mm/ dir.  Further, the dio code
will (I think), under some circumstances, arbitrarily insert the zero_page
into a list of pages that may or may not be pinned.  I can (I think) also be
given a pinned zero_page by the GUP code if the page tables point to one and a
DIO-write is requested, so just checking "page == zero_page" isn't sufficient
to tell whether a pin needs to be dropped.

What I'd like to do is to make the GUP code not take a ref on the zero_page
if, say, FOLL_DONT_PIN_ZEROPAGE is passed in, and then make the bio cleanup
code always ignore the zero_page.

Alternatively, I can drop the pin immediately if I get given one on the
zero_page - it's not going anywhere, after all.
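
As a rough illustration of the first option, here is a minimal userspace
model, not kernel code.  Every name in it (model_page, model_zero_page,
MODEL_DONT_PIN_ZEROPAGE, model_pin_page, model_release_pages) is
hypothetical and only mirrors the FOLL_DONT_PIN_ZEROPAGE idea; it assumes
a page can be reduced to a bare pin counter.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct page: nothing but a pin counter. */
struct model_page {
	int pincount;
};

/* Stand-in for the shared zero page (hypothetical name). */
struct model_page model_zero_page;

/* Hypothetical flag modelled on the FOLL_DONT_PIN_ZEROPAGE suggestion. */
#define MODEL_DONT_PIN_ZEROPAGE 0x1u

/* Model of a GUP-style pin: leave the zero page alone if the flag is set. */
void model_pin_page(struct model_page *page, unsigned int flags)
{
	if (page == &model_zero_page && (flags & MODEL_DONT_PIN_ZEROPAGE))
		return;
	page->pincount++;
}

/*
 * Model of the cleanup side: always skip the zero page, so it does not
 * matter whether an entry came from the pinning path or was inserted
 * directly by the I/O code.
 */
void model_release_pages(struct model_page **pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (pages[i] == &model_zero_page)
			continue;
		assert(pages[i]->pincount > 0);
		pages[i]->pincount--;
	}
}
```

With both halves in place, a list holding a pinned data page, a zero page
obtained through the pinning path and a zero page inserted directly can be
released uniformly, and the zero page's counter never moves.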

I also need to be able to take an additional pin on a folio that gets split
across multiple bio submissions to replace the get_page() that's there now.
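
A small userspace sketch of that requirement, again not kernel code: the
types and helpers here (model_folio, model_bio, model_folio_add_pin,
model_bio_attach, model_bio_complete) are hypothetical, and the model
assumes each bio drops exactly one pin per folio it carries when it
completes.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BIO_MAX 4

/* Userspace stand-in for a folio: nothing but a pin counter. */
struct model_folio {
	int pincount;
};

/* Userspace stand-in for a bio: a short list of folios it carries. */
struct model_bio {
	struct model_folio *folios[MODEL_BIO_MAX];
	size_t nr;
};

/* Hypothetical helper: take one more pin on an already-pinned folio. */
void model_folio_add_pin(struct model_folio *folio)
{
	assert(folio->pincount > 0);
	folio->pincount++;
}

/* Hand a pinned folio to a bio; the bio now owns one pin on it. */
void model_bio_attach(struct model_bio *bio, struct model_folio *folio)
{
	assert(bio->nr < MODEL_BIO_MAX);
	bio->folios[bio->nr++] = folio;
}

/* Completion drops one pin for every folio the bio carried. */
void model_bio_complete(struct model_bio *bio)
{
	for (size_t i = 0; i < bio->nr; i++) {
		assert(bio->folios[i]->pincount > 0);
		bio->folios[i]->pincount--;
	}
	bio->nr = 0;
}
```

A folio that straddles two submissions is attached to the first bio with
its original pin, gets an extra pin via model_folio_add_pin(), and is then
attached to the second bio; the two completions can run in either order
without the counter going negative.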

Alternatively to that, I can decide how much data I'm willing to read/write in
one batch, call something like netfs_extract_user_iter() to decant that
portion of the parameter iterator into a bvec[] and let that look up the
overlapping page multiple times.  However, I'm not sure if this would work
well for a couple of reasons: does a single bio have to refer to a contiguous
range of disk blocks?  And we might expend time on getting pages we then have
to give up because we hit a hole.

Something that I noticed is that the dio code seems to wangle the page bits on
the target pages for a DIO-read, which seems odd, but I'm not sure I fully
understand the code yet.

David

