From: Jan Kara <jack@suse.cz>
To: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	Logan Gunthorpe <logang@deltatee.com>,
	Christoph Hellwig <hch@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 7/7] iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate
Date: Mon, 9 Jan 2023 18:35:13 +0100
Message-ID: <20230109173513.htfqbkrtqm52pnye@quack3>
In-Reply-To: <167305166150.1521586.10220949115402059720.stgit@warthog.procyon.org.uk>

On Sat 07-01-23 00:34:21, David Howells wrote:
> Convert the block layer's bio code to use iov_iter_extract_pages() instead
> of iov_iter_get_pages().  This will pin pages or leave them unaltered
> rather than getting a ref on them as appropriate to the source iterator.
> 
> A field, bi_cleanup_mode, is added to the bio struct that gets set by
> iov_iter_extract_pages() with FOLL_* flags indicating what cleanup is
> necessary.  FOLL_GET -> put_page(), FOLL_PIN -> unpin_user_page().  Other
> flags could also be used in future.
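The FOLL_GET/FOLL_PIN mapping described above can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions: `struct sketch_page` and the `sketch_*` helpers are hypothetical stand-ins for struct page, put_page() and unpin_user_page(); the flag values are placeholders rather than the kernel's; and keeping references and pins in separate counters is a simplification of how the kernel actually accounts pins. Only the dispatch itself follows the bio_release_page() helper added by the patch.

```c
#include <assert.h>

/* Placeholder values for this userspace sketch; the real FOLL_* gup
 * flags are defined in kernel headers with different values. */
#define FOLL_GET (1u << 0)
#define FOLL_PIN (1u << 1)

/* Toy page that counts plain references and pins separately
 * (a simplification: the kernel folds pins into its refcounting). */
struct sketch_page {
	int refcount;
	int pincount;
};

/* Hypothetical stand-in for put_page(). */
void sketch_put_page(struct sketch_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Hypothetical stand-in for unpin_user_page(). */
void sketch_unpin_user_page(struct sketch_page *page)
{
	assert(page->pincount > 0);
	page->pincount--;
}

/* Same dispatch as the patch's bio_release_page(): the mode recorded at
 * extraction time selects unpin, put, or (for 0) nothing at all. */
void sketch_release_page(unsigned int cleanup_mode, struct sketch_page *page)
{
	if (cleanup_mode & FOLL_PIN)
		sketch_unpin_user_page(page);
	if (cleanup_mode & FOLL_GET)
		sketch_put_page(page);
}
```

A mode of 0 releases nothing, which is what a bio with no ownership of its pages needs.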
> 
> Newly allocated bio structs have bi_cleanup_mode set to FOLL_GET to
> indicate that attached pages are ref'd by default.  Cloning sets it to 0.
> __bio_iov_iter_get_pages() overrides it to what iov_iter_extract_pages()
> indicates.
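The default rules for bi_cleanup_mode (FOLL_GET on init, 0 on clone) can be sketched the same way, assuming a toy bio that tracks only its cleanup mode and page count. The `sketch_bio` type and its helpers are hypothetical, and the flag values are placeholders for the kernel's FOLL_* constants. The clone case is the interesting one: a clone shares the source bio's pages without holding its own reference or pin on them, so releasing through the clone must do nothing.

```c
#include <assert.h>

/* Placeholder flag values, not the kernel's. */
#define FOLL_GET (1u << 0)
#define FOLL_PIN (1u << 1)

/* Toy bio: only the fields needed to follow page ownership. */
struct sketch_bio {
	unsigned int cleanup_mode;	/* FOLL_* bits: how to drop pages */
	int nr_pages;			/* pages in the bvec table */
};

/* Like bio_init()/bio_reset() in the patch: assume pages are ref'd. */
void sketch_bio_init(struct sketch_bio *bio)
{
	bio->cleanup_mode = FOLL_GET;
	bio->nr_pages = 0;
}

/* Like __bio_clone(): the clone sees the same pages but owns no
 * reference or pin on them, so its cleanup mode is 0. */
void sketch_bio_clone(struct sketch_bio *clone, const struct sketch_bio *src)
{
	clone->nr_pages = src->nr_pages;
	clone->cleanup_mode = 0;
}

/* Number of put/unpin calls this bio would make when releasing pages. */
int sketch_bio_release_count(const struct sketch_bio *bio)
{
	return (bio->cleanup_mode & (FOLL_GET | FOLL_PIN)) ? bio->nr_pages : 0;
}
```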
> 
> [!] Note that this is tested a bit with ext4, but nothing else.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: Logan Gunthorpe <logang@deltatee.com>

So currently we already have the BIO_NO_PAGE_REF flag, and what you do in this
patch partially duplicates it. So I'd either drop that flag or, instead of the
bi_cleanup_mode variable (which honestly looks a bit wasteful given how much we
micro-optimize struct bio), just add another BIO_ flag...

								Honza
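The alternative suggested above, keeping the cleanup policy in flag bits instead of a new bi_cleanup_mode field, might look roughly like this. A minimal sketch, assuming a toy bio with a 16-bit flags word: SKETCH_BIO_NO_PAGE_REF models the existing BIO_NO_PAGE_REF, SKETCH_BIO_PAGE_PINNED is a hypothetical new flag, and the helpers only imitate the style of the kernel's bio_flagged()/bio_set_flag(). Bit numbers and names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers within the flags word. The first models the existing
 * BIO_NO_PAGE_REF; the second is a hypothetical "pages are pinned" flag.
 * Both values are illustrative, not the kernel's. */
enum {
	SKETCH_BIO_NO_PAGE_REF = 0,
	SKETCH_BIO_PAGE_PINNED = 1,
};

/* Toy bio: the cleanup policy lives in a flags word, no extra field. */
struct sketch_bio {
	unsigned short bi_flags;
};

bool sketch_bio_flagged(const struct sketch_bio *bio, unsigned int bit)
{
	return (bio->bi_flags & (1u << bit)) != 0;
}

void sketch_bio_set_flag(struct sketch_bio *bio, unsigned int bit)
{
	bio->bi_flags |= (unsigned short)(1u << bit);
}

enum sketch_release {
	SKETCH_RELEASE_PUT,	/* default: each page holds a reference */
	SKETCH_RELEASE_UNPIN,	/* pages were pinned */
	SKETCH_RELEASE_NONE,	/* no reference or pin was taken */
};

/* Derive the cleanup action from the two flag bits. */
enum sketch_release sketch_release_action(const struct sketch_bio *bio)
{
	if (sketch_bio_flagged(bio, SKETCH_BIO_PAGE_PINNED))
		return SKETCH_RELEASE_UNPIN;
	if (sketch_bio_flagged(bio, SKETCH_BIO_NO_PAGE_REF))
		return SKETCH_RELEASE_NONE;
	return SKETCH_RELEASE_PUT;
}
```

The three cases the patch needs (put, unpin, leave alone) fit in two bits of an existing word.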

> ---
> 
>  block/bio.c               |   47 +++++++++++++++++++++++++++++++++------------
>  include/linux/blk_types.h |    1 +
>  2 files changed, 35 insertions(+), 13 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 5f96fcae3f75..eafcbeba0bab 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -243,6 +243,11 @@ static void bio_free(struct bio *bio)
>   * Users of this function have their own bio allocation. Subsequently,
>   * they must remember to pair any call to bio_init() with bio_uninit()
>   * when IO has completed, or when the bio is released.
> + *
> + * We set the initial assumption that pages attached to the bio will be
> + * released with put_page() by setting bi_cleanup_mode to FOLL_GET, but this
> + * should be set to FOLL_PIN if the page should be unpinned instead; if the
> + * pages should not be put or unpinned, this should be set to 0.
>   */
>  void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
>  	      unsigned short max_vecs, blk_opf_t opf)
> @@ -274,6 +279,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
>  #ifdef CONFIG_BLK_DEV_INTEGRITY
>  	bio->bi_integrity = NULL;
>  #endif
> +	bio->bi_cleanup_mode = FOLL_GET;
>  	bio->bi_vcnt = 0;
>  
>  	atomic_set(&bio->__bi_remaining, 1);
> @@ -302,6 +308,7 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)
>  {
>  	bio_uninit(bio);
>  	memset(bio, 0, BIO_RESET_BYTES);
> +	bio->bi_cleanup_mode = FOLL_GET;
>  	atomic_set(&bio->__bi_remaining, 1);
>  	bio->bi_bdev = bdev;
>  	if (bio->bi_bdev)
> @@ -814,6 +821,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
>  	bio_set_flag(bio, BIO_CLONED);
>  	bio->bi_ioprio = bio_src->bi_ioprio;
>  	bio->bi_iter = bio_src->bi_iter;
> +	bio->bi_cleanup_mode = 0;
>  
>  	if (bio->bi_bdev) {
>  		if (bio->bi_bdev == bio_src->bi_bdev &&
> @@ -1168,6 +1176,18 @@ bool bio_add_folio(struct bio *bio, struct folio *folio, size_t len,
>  	return bio_add_page(bio, &folio->page, len, off) > 0;
>  }
>  
> +/*
> + * Clean up a page according to the mode indicated by iov_iter_extract_pages(),
> + * where the page may be pinned or may have a ref taken on it.
> + */
> +static void bio_release_page(struct bio *bio, struct page *page)
> +{
> +	if (bio->bi_cleanup_mode & FOLL_PIN)
> +		unpin_user_page(page);
> +	if (bio->bi_cleanup_mode & FOLL_GET)
> +		put_page(page);
> +}
> +
>  void __bio_release_pages(struct bio *bio, bool mark_dirty)
>  {
>  	struct bvec_iter_all iter_all;
> @@ -1176,7 +1196,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
>  	bio_for_each_segment_all(bvec, bio, iter_all) {
>  		if (mark_dirty && !PageCompound(bvec->bv_page))
>  			set_page_dirty_lock(bvec->bv_page);
> -		put_page(bvec->bv_page);
> +		bio_release_page(bio, bvec->bv_page);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__bio_release_pages);
> @@ -1213,7 +1233,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
>  	}
>  
>  	if (same_page)
> -		put_page(page);
> +		bio_release_page(bio, page);
>  	return 0;
>  }
>  
> @@ -1227,7 +1247,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
>  			queue_max_zone_append_sectors(q), &same_page) != len)
>  		return -EINVAL;
>  	if (same_page)
> -		put_page(page);
> +		bio_release_page(bio, page);
>  	return 0;
>  }
>  
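The same_page branches above drop a page that extraction handed back a second time because the new segment merged into the previous bvec, which already holds that page. A minimal sketch of that bookkeeping, assuming a toy bio whose pages were all pinned once per extracted segment and a simplified merge test (same page, contiguous offset) in place of the kernel's real merge logic. All names are hypothetical, and the toy always unpins where the patch would call bio_release_page().

```c
#include <assert.h>

/* Toy page that only counts pins. */
struct sketch_page {
	int pincount;
};

struct sketch_bvec {
	struct sketch_page *page;
	unsigned int offset;
	unsigned int len;
};

#define SKETCH_MAX_VECS 8

struct sketch_bio {
	struct sketch_bvec vec[SKETCH_MAX_VECS];
	int vcnt;
};

/*
 * Append a segment whose page was pinned once by extraction. If it
 * continues the last bvec on the same page, merge it and drop the
 * duplicate pin; otherwise start a new bvec.
 * Returns 0 on success or -1 if the bvec table is full.
 */
int sketch_bio_add_pinned(struct sketch_bio *bio, struct sketch_page *page,
			  unsigned int offset, unsigned int len)
{
	if (bio->vcnt > 0) {
		struct sketch_bvec *last = &bio->vec[bio->vcnt - 1];

		if (last->page == page && last->offset + last->len == offset) {
			last->len += len;
			page->pincount--;	/* bvec already holds a pin */
			return 0;
		}
	}
	if (bio->vcnt == SKETCH_MAX_VECS)
		return -1;
	bio->vec[bio->vcnt++] = (struct sketch_bvec){ page, offset, len };
	return 0;
}
```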
> @@ -1238,10 +1258,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
>   * @bio: bio to add pages to
>   * @iter: iov iterator describing the region to be mapped
>   *
> - * Pins pages from *iter and appends them to @bio's bvec array. The
> - * pages will have to be released using put_page() when done.
> - * For multi-segment *iter, this function only adds pages from the
> - * next non-empty segment of the iov iterator.
> + * Pins pages from *iter and appends them to @bio's bvec array.  The pages will
> + * have to be released using put_page() or unpin_user_page(), as indicated
> + * by bi_cleanup_mode, when done.  For multi-segment *iter, this function only
> + * adds pages from the next non-empty segment of the iov iterator.
>   */
>  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  {
> @@ -1273,9 +1293,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	 * result to ensure the bio's total size is correct. The remainder of
>  	 * the iov data will be picked up in the next bio iteration.
>  	 */
> -	size = iov_iter_get_pages(iter, pages,
> -				  UINT_MAX - bio->bi_iter.bi_size,
> -				  nr_pages, &offset, gup_flags);
> +	size = iov_iter_extract_pages(iter, &pages,
> +				      UINT_MAX - bio->bi_iter.bi_size,
> +				      nr_pages, gup_flags,
> +				      &offset, &bio->bi_cleanup_mode);
>  	if (unlikely(size <= 0))
>  		return size ? size : -EFAULT;
>  
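The extraction call above now reports, through its last argument, how the returned pages must be cleaned up, and that value replaces the FOLL_GET default set at init time. The sketch below assumes, for illustration only, that user-backed iterators (UBUF, IOVEC) are pinned and kernel-backed ones (BVEC, KVEC) are left unaltered, in line with the commit message's statement that pages are pinned or left alone as appropriate to the source iterator. The enum, helpers and flag values are hypothetical and do not reproduce the real iov_iter_extract_pages().

```c
#include <assert.h>

/* Placeholder flag values, not the kernel's. */
#define FOLL_GET (1u << 0)
#define FOLL_PIN (1u << 1)

/* Stand-in for a subset of the kernel's iov_iter types. */
enum sketch_iter_type {
	SKETCH_ITER_UBUF,	/* single user buffer */
	SKETCH_ITER_IOVEC,	/* user iovec array */
	SKETCH_ITER_BVEC,	/* kernel pages */
	SKETCH_ITER_KVEC,	/* kernel virtual addresses */
};

/* ASSUMPTION: user-backed memory is pinned, kernel-backed memory is
 * left unaltered. This mapping is illustrative only. */
unsigned int sketch_extract_cleanup_mode(enum sketch_iter_type type)
{
	switch (type) {
	case SKETCH_ITER_UBUF:
	case SKETCH_ITER_IOVEC:
		return FOLL_PIN;
	default:
		return 0;
	}
}

struct sketch_bio {
	unsigned int cleanup_mode;
};

/* As in the patched __bio_iov_iter_get_pages(): the mode reported by
 * extraction overwrites whatever default the bio started with. */
void sketch_bio_extract(struct sketch_bio *bio, enum sketch_iter_type type)
{
	bio->cleanup_mode = sketch_extract_cleanup_mode(type);
}
```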
> @@ -1308,7 +1329,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	iov_iter_revert(iter, left);
>  out:
>  	while (i < nr_pages)
> -		put_page(pages[i++]);
> +		bio_release_page(bio, pages[i++]);
>  
>  	return ret;
>  }
> @@ -1489,8 +1510,8 @@ void bio_set_pages_dirty(struct bio *bio)
>   * the BIO and re-dirty the pages in process context.
>   *
>   * It is expected that bio_check_pages_dirty() will wholly own the BIO from
> - * here on.  It will run one put_page() against each page and will run one
> - * bio_put() against the BIO.
> + * here on.  It will run one put_page() or unpin_user_page() against each page
> + * and will run one bio_put() against the BIO.
>   */
>  
>  static void bio_dirty_fn(struct work_struct *work);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 99be590f952f..883f873a01ef 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -289,6 +289,7 @@ struct bio {
>  #endif
>  	};
>  
> +	unsigned int		bi_cleanup_mode; /* How to clean up pages */
>  	unsigned short		bi_vcnt;	/* how many bio_vec's */
>  
>  	/*
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
