From: Matthew Wilcox <willy@infradead.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 04/11] block: Add bio_for_each_thp_segment_all
Date: Tue, 1 Sep 2020 14:05:25 +0100	[thread overview]
Message-ID: <20200901130525.GK14765@casper.infradead.org> (raw)
In-Reply-To: <20200901053426.GB24560@infradead.org>

On Tue, Sep 01, 2020 at 06:34:26AM +0100, Christoph Hellwig wrote:
> On Mon, Aug 31, 2020 at 08:48:37PM +0100, Matthew Wilcox wrote:
> > static void iomap_read_end_io(struct bio *bio)
> > {
> >         int i, error = blk_status_to_errno(bio->bi_status);
> > 
> >         for (i = 0; i < bio->bi_vcnt; i++) {
> >                 struct bio_vec *bvec = &bio->bi_io_vec[i];
> 
> This should probably use bio_for_each_bvec_all instead of directly
> poking into the bio.  I'd also be tempted to move the loop body into
> a separate helper, but that's just a slight stylistic preference.

Ah, got it.

> >                 size_t offset = bvec->bv_offset;
> >                 size_t length = bvec->bv_len;
> >                 struct page *page = bvec->bv_page;
> > 
> >                 while (length > 0) { 
> >                         size_t count = thp_size(page) - offset;
> >                         
> >                         if (count > length)
> >                                 count = length;
> >                         iomap_read_page_end_io(page, offset, count, error);
> >                         page += (offset + count) / PAGE_SIZE;
> 
> Shouldn't the page_size here be thp_size?

No.  Let's suppose we have a 20kB I/O which starts on a page boundary and
the first page is order-2.  To get from the first head page to the second
page, we need to add 4, which is 16kB / 4kB, not 16kB / 16kB.

> > Maybe I'm missing something important here, but it's significantly
> > simpler code -- iomap_read_end_io() goes down from 816 bytes to 560 bytes
> > (256 bytes less!) iomap_read_page_end_io is inlined into it both before
> > and after.
> 
> Yes, that's exactly why I think avoiding bio_for_each_segment_all is
> a good idea in general.

I took out all the attempts to cope with insane bv_offset to compare like
with like and I got the bio_for_each_thp_segment_all() version down to
656 bytes:

@@ -166,21 +166,15 @@ static inline void bvec_thp_advance(const struct bio_vec *bvec,
                                struct bvec_iter_all *iter_all)
 {
        struct bio_vec *bv = &iter_all->bv;
-       unsigned int page_size;
 
        if (iter_all->done) {
                bv->bv_page += thp_nr_pages(bv->bv_page);
-               page_size = thp_size(bv->bv_page);
                bv->bv_offset = 0;
        } else {
-               bv->bv_page = thp_head(bvec->bv_page +
-                               (bvec->bv_offset >> PAGE_SHIFT));
-               page_size = thp_size(bv->bv_page);
-               bv->bv_offset = bvec->bv_offset -
-                               (bv->bv_page - bvec->bv_page) * PAGE_SIZE;
-               BUG_ON(bv->bv_offset >= page_size);
+               bv->bv_page = bvec->bv_page;
+               bv->bv_offset = bvec->bv_offset;
        }
-       bv->bv_len = min(page_size - bv->bv_offset,
+       bv->bv_len = min_t(unsigned int, thp_size(bv->bv_page) - bv->bv_offset,
                         bvec->bv_len - iter_all->done);
        iter_all->done += bv->bv_len;

> And yes, eventually bv_page and bv_offset should be replaced with a
> 
> 	phys_addr_t		bv_phys;
> 
> and life would become simpler in many places (and the bvec would
> shrink for most common setups as well).

I'd very much like to see that.  It causes quite considerable pain for
our virtualisation people that we need a struct page.  They'd like the
hypervisor to not have struct pages for the guest's memory, but if they
don't have them, they can't do I/O to them.  Perhaps I'll try getting
one of them to work on this.

I'm not entirely sure the bvec would shrink.  On 64-bit systems, it's
currently 8 bytes for the struct page, 4 bytes for the len and 4 bytes
for the offset.  Sure, we can get rid of the offset, but the compiler
will just pad the struct from 12 bytes back to 16.  On 32-bit systems
with 32-bit phys_addr_t, we go from 12 bytes down to 8, but most 32-bit
systems have a 64-bit phys_addr_t these days, don't they?

> For now I'd end up with something like:
> 
> static void iomap_read_end_bvec(struct page *page, size_t offset,
> 		size_t length, int error)
> {
> 	while (length > 0) {
> 		size_t page_size = thp_size(page);
> 		size_t count = min(page_size - offset, length);
> 
> 		iomap_read_page_end_io(page, offset, count, error);
> 
> 		page += (offset + count) / page_size;
> 		length -= count;
> 		offset = 0;
> 	}
> }
> 
> static void iomap_read_end_io(struct bio *bio)
> {
> 	int i, error = blk_status_to_errno(bio->bi_status);
> 	struct bio_vec *bvec;
> 
> 	bio_for_each_bvec_all(bvec, bio, i)
> 		iomap_read_end_bvec(bvec->bv_page, bvec->bv_offset,
> 				    bvec->bv_len, error);
> 	bio_put(bio);
> }
> 
> and maybe even merge iomap_read_page_end_io into iomap_read_end_bvec.

The lines start to get a bit long.  Here's what I currently have on
the write side:

@@ -1104,6 +1117,20 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
                end_page_writeback(page);
 }
 
+static void iomap_finish_bvec_write(struct inode *inode, struct page *page,
+               size_t offset, size_t length, int error)
+{
+       while (length > 0) {
+               size_t count = min(thp_size(page) - offset, length);
+
+               iomap_finish_page_writeback(inode, page, error, count);
+
+               page += (offset + count) / PAGE_SIZE;
+               offset = 0;
+               length -= count;
+       }
+}
+
 /*
  * We're now finished for good with this ioend structure.  Update the page
  * state, release holds on bios, and finally free up memory.  Do not use the
@@ -1121,7 +1148,7 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 
        for (bio = &ioend->io_inline_bio; bio; bio = next) {
                struct bio_vec *bv;
-               struct bvec_iter_all iter_all;
+               int i;
 
                /*
                 * For the last bio, bi_private points to the ioend, so we
@@ -1133,9 +1160,9 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
                        next = bio->bi_private;
 
                /* walk each page on bio, ending page IO on them */
-               bio_for_each_thp_segment_all(bv, bio, iter_all)
-                       iomap_finish_page_writeback(inode, bv->bv_page, error,
-                                       bv->bv_len);
+               bio_for_each_bvec_all(bv, bio, i)
+                       iomap_finish_bvec_write(inode, bv->bv_page,
+                                       bv->bv_offset, bv->bv_len, error);
                bio_put(bio);
        }
        /* The ioend has been freed by bio_put() */

That's a bit more boilerplate than I'd like, but if bio_vec is going to
lose its bv_page then I don't see a better way.  Unless we come up with
a different page/offset/length struct that bio_vecs are decomposed into.


