* [PATCH v1 00/15] Keep track of GUPed pages in fs and block
@ 2019-04-11 21:08 jglisse
  2019-04-11 21:08 ` [PATCH v1 01/15] fs/direct-io: fix trailing whitespace issues jglisse
                   ` (16 more replies)
  0 siblings, 17 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox, Steve French,
	linux-cifs, samba-technical, Yan Zheng, Sage Weil, Ilya Dryomov,
	Alex Elder, ceph-devel, Eric Van Hensbergen, Latchesar Ionkov,
	Mike Marshall, Martin Brandenburg, devel, Dominique Martinet,
	v9fs-developer, Coly Li, Kent Overstreet, linux-bcache,
	Ernesto A . Fernández

From: Jérôme Glisse <jglisse@redhat.com>

This patchset depends on various small fixes [1] and also on the patchset
which introduces put_user_page*() [2], and thus is 5.3 material as those
prerequisites will land in 5.2 at best. Nonetheless I am posting it now
so that it can get review and comments on how and what should be done
to test things.

For various reasons [2] [3] we want to track page references taken through
GUP differently from "regular" page references. Thus we need to keep track
of how we got a page within the block and fs layers. To do so this patchset
changes the bio_vec struct to store a pfn and flags instead of a direct
pointer to a page. This way we can flag pages that are coming from GUP.
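
To make this concrete, here is a minimal sketch of such an encoding.
It is illustrative only: the struct, flag and helper names below are
made up for this cover letter; the series itself goes through
bvec_page()/bvec_set_page() and later a bv_pfn field.

    /* Pack a pfn plus a "came from GUP" flag into one unsigned long. */
    #define BVEC_PFN_GUP    (1UL << 0)

    struct bio_vec_sketch {
        unsigned long   bv_pfn;     /* pfn << 1 | flags */
        unsigned int    bv_len;
        unsigned int    bv_offset;
    };

    static inline struct page *sketch_bvec_page(const struct bio_vec_sketch *bv)
    {
        return pfn_to_page(bv->bv_pfn >> 1);
    }

    static inline void sketch_bvec_set_page(struct bio_vec_sketch *bv,
                                            struct page *page, bool gup)
    {
        bv->bv_pfn = (page_to_pfn(page) << 1) | (gup ? BVEC_PFN_GUP : 0);
    }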

This patchset is divided as follows:
    - The first part of the patchset is just small cleanups; I believe
      they can go in as is, assuming people are ok with them.
    - The second part converts bio_vec->bv_page to bio_vec->bv_pfn. This
      is done in multiple steps: first we replace all direct dereferences
      of the field by calls to an inline helper, then we introduce a macro
      for bio_vec structures that are initialized on the stack. Finally we
      change the bv_page field to bv_pfn.
    - The third part replaces put_page(bvec_page(bio_vec)) with a new
      helper which will use put_user_page() when the page in the bio_vec
      is coming from GUP (see the sketch after this list).
    - The fourth part updates BIO to use bvec_set_gup_page() for pages
      that are coming from GUP; this means updating bio_add_page*() to
      pass down the origin of the page (GUP or not).
    - The fifth part converts a few more places that directly use bio_vec
      or BIO.
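
As a rough sketch of the helper from the third part (reusing the made-up
names from the sketch above; the real series adds bvec_put_page*() in
bvec.h):

    /* Drop the reference on a bio_vec page with the matching helper. */
    static inline void sketch_bvec_put_page(struct bio_vec_sketch *bv)
    {
        struct page *page = sketch_bvec_page(bv);

        if (bv->bv_pfn & BVEC_PFN_GUP)
            put_user_page(page);    /* reference taken by GUP */
        else
            put_page(page);         /* regular page reference */
    }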

Note that after this patchset there are still places in the kernel where
we should use put_user_page*(). The intention is to split that task into
chewable chunks (driver by driver, sub-system by sub-system).


I have only lightly tested this patchset (branch [4]) on my desktop and
have not seen anything obviously wrong, but I might have missed something.
What kind of test suite should I run to stress-test the vfs/block layer
around DIO and BIO?


Note that you need a coccinelle [5] recent enough for the semantic patch to
work properly ([5] with git commit >= eac73d191e4f03d759957fc5620062428fadada8).

Cheers,
Jérôme Glisse

[1] https://cgit.freedesktop.org/~glisse/linux/commit/?h=gup-fs-block&id=5f67db69fd9f95d12987d2a030a82bc390e05a71
    https://cgit.freedesktop.org/~glisse/linux/commit/?h=gup-fs-block&id=b070348d0e1fd9397eb8d0e97b4c89f1d04d5a0a
    https://cgit.freedesktop.org/~glisse/linux/commit/?h=gup-fs-block&id=83691c86a6c8f560b5b78f3f57fcd62c0f3f1c7a
[2] https://lkml.org/lkml/2019/3/26/1395
[3] https://lwn.net/Articles/753027/
[4] https://cgit.freedesktop.org/~glisse/linux/log/?h=gup-fs-block
[5] https://github.com/coccinelle/coccinelle

Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: Yan Zheng <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: devel@lists.orangefs.org
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: v9fs-developer@lists.sourceforge.net
Cc: Coly Li <colyli@suse.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-bcache@vger.kernel.org
Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>

Jérôme Glisse (15):
  fs/direct-io: fix trailing whitespace issues
  iov_iter: add helper to test if an iter would use GUP
  block: introduce bvec_page()/bvec_set_page() to get/set
    bio_vec.bv_page
  block: introduce BIO_VEC_INIT() macro to initialize bio_vec structure
  block: replace all bio_vec->bv_page by bvec_page()/bvec_set_page()
  block: convert bio_vec.bv_page to bv_pfn to store pfn and not page
  block: add bvec_put_page_dirty*() to replace put_page(bvec_page())
  block: use bvec_put_page() instead of put_page(bvec_page())
  block: bvec_put_page_dirty* instead of set_page_dirty* and
    bvec_put_page
  block: add gup flag to
    bio_add_page()/bio_add_pc_page()/__bio_add_page()
  block: make sure bio_add_page*() knows page that are coming from GUP
  fs/direct-io: keep track of wether a page is coming from GUP or not
  fs/splice: use put_user_page() when appropriate
  fs: use bvec_set_gup_page() where appropriate
  ceph: use put_user_pages() instead of ceph_put_page_vector()

 Documentation/block/biodoc.txt      |  7 +-
 arch/m68k/emu/nfblock.c             |  2 +-
 arch/um/drivers/ubd_kern.c          |  2 +-
 arch/xtensa/platforms/iss/simdisk.c |  2 +-
 block/bio-integrity.c               |  8 +--
 block/bio.c                         | 92 ++++++++++++++++-----------
 block/blk-core.c                    |  2 +-
 block/blk-integrity.c               |  7 +-
 block/blk-lib.c                     |  5 +-
 block/blk-merge.c                   |  9 +--
 block/blk.h                         |  4 +-
 block/bounce.c                      | 26 ++++----
 block/t10-pi.c                      |  4 +-
 drivers/block/aoe/aoecmd.c          |  4 +-
 drivers/block/brd.c                 |  2 +-
 drivers/block/drbd/drbd_actlog.c    |  2 +-
 drivers/block/drbd/drbd_bitmap.c    |  4 +-
 drivers/block/drbd/drbd_main.c      |  4 +-
 drivers/block/drbd/drbd_receiver.c  |  6 +-
 drivers/block/drbd/drbd_worker.c    |  2 +-
 drivers/block/floppy.c              |  6 +-
 drivers/block/loop.c                | 16 ++---
 drivers/block/null_blk_main.c       |  6 +-
 drivers/block/pktcdvd.c             |  4 +-
 drivers/block/ps3disk.c             |  2 +-
 drivers/block/ps3vram.c             |  2 +-
 drivers/block/rbd.c                 | 12 ++--
 drivers/block/rsxx/dma.c            |  3 +-
 drivers/block/umem.c                |  2 +-
 drivers/block/virtio_blk.c          |  4 +-
 drivers/block/xen-blkback/blkback.c |  2 +-
 drivers/block/zram/zram_drv.c       | 24 +++----
 drivers/lightnvm/core.c             |  2 +-
 drivers/lightnvm/pblk-core.c        | 12 ++--
 drivers/lightnvm/pblk-rb.c          |  2 +-
 drivers/lightnvm/pblk-read.c        |  6 +-
 drivers/md/bcache/btree.c           |  2 +-
 drivers/md/bcache/debug.c           |  4 +-
 drivers/md/bcache/request.c         |  4 +-
 drivers/md/bcache/super.c           |  6 +-
 drivers/md/bcache/util.c            | 11 ++--
 drivers/md/dm-bufio.c               |  2 +-
 drivers/md/dm-crypt.c               | 18 ++++--
 drivers/md/dm-integrity.c           | 18 +++---
 drivers/md/dm-io.c                  |  7 +-
 drivers/md/dm-log-writes.c          | 20 +++---
 drivers/md/dm-verity-target.c       |  4 +-
 drivers/md/dm-writecache.c          |  3 +-
 drivers/md/dm-zoned-metadata.c      |  6 +-
 drivers/md/md.c                     |  4 +-
 drivers/md/raid1-10.c               |  2 +-
 drivers/md/raid1.c                  |  4 +-
 drivers/md/raid10.c                 |  4 +-
 drivers/md/raid5-cache.c            |  7 +-
 drivers/md/raid5-ppl.c              |  6 +-
 drivers/md/raid5.c                  | 10 +--
 drivers/nvdimm/blk.c                |  6 +-
 drivers/nvdimm/btt.c                |  5 +-
 drivers/nvdimm/pmem.c               |  4 +-
 drivers/nvme/host/core.c            |  4 +-
 drivers/nvme/host/tcp.c             |  2 +-
 drivers/nvme/target/io-cmd-bdev.c   |  2 +-
 drivers/nvme/target/io-cmd-file.c   |  2 +-
 drivers/s390/block/dasd_diag.c      |  2 +-
 drivers/s390/block/dasd_eckd.c      | 14 ++--
 drivers/s390/block/dasd_fba.c       |  6 +-
 drivers/s390/block/dcssblk.c        |  2 +-
 drivers/s390/block/scm_blk.c        |  2 +-
 drivers/s390/block/xpram.c          |  2 +-
 drivers/scsi/sd.c                   | 25 ++++----
 drivers/staging/erofs/data.c        |  6 +-
 drivers/staging/erofs/unzip_vle.c   |  4 +-
 drivers/target/target_core_file.c   |  6 +-
 drivers/target/target_core_iblock.c |  4 +-
 drivers/target/target_core_pscsi.c  |  2 +-
 drivers/xen/biomerge.c              |  4 +-
 fs/9p/vfs_addr.c                    |  4 +-
 fs/afs/fsclient.c                   |  2 +-
 fs/afs/rxrpc.c                      |  4 +-
 fs/afs/yfsclient.c                  |  2 +-
 fs/block_dev.c                      | 10 ++-
 fs/btrfs/check-integrity.c          |  6 +-
 fs/btrfs/compression.c              | 22 +++----
 fs/btrfs/disk-io.c                  |  4 +-
 fs/btrfs/extent_io.c                | 16 ++---
 fs/btrfs/file-item.c                |  8 +--
 fs/btrfs/inode.c                    | 20 +++---
 fs/btrfs/raid56.c                   |  8 +--
 fs/btrfs/scrub.c                    | 10 +--
 fs/buffer.c                         |  4 +-
 fs/ceph/file.c                      | 20 +++---
 fs/cifs/connect.c                   |  4 +-
 fs/cifs/misc.c                      | 14 ++--
 fs/cifs/smb2ops.c                   |  2 +-
 fs/cifs/smbdirect.c                 |  2 +-
 fs/cifs/transport.c                 |  2 +-
 fs/crypto/bio.c                     |  4 +-
 fs/direct-io.c                      | 94 +++++++++++++++++++--------
 fs/ext4/page-io.c                   |  4 +-
 fs/ext4/readpage.c                  |  4 +-
 fs/f2fs/data.c                      | 20 +++---
 fs/gfs2/lops.c                      |  8 +--
 fs/gfs2/meta_io.c                   |  4 +-
 fs/gfs2/ops_fstype.c                |  2 +-
 fs/hfsplus/wrapper.c                |  3 +-
 fs/io_uring.c                       |  4 +-
 fs/iomap.c                          | 10 +--
 fs/jfs/jfs_logmgr.c                 |  4 +-
 fs/jfs/jfs_metapage.c               |  6 +-
 fs/mpage.c                          |  6 +-
 fs/nfs/blocklayout/blocklayout.c    |  2 +-
 fs/nilfs2/segbuf.c                  |  3 +-
 fs/ocfs2/cluster/heartbeat.c        |  2 +-
 fs/orangefs/inode.c                 |  2 +-
 fs/splice.c                         | 13 ++--
 fs/xfs/xfs_aops.c                   |  8 +--
 fs/xfs/xfs_buf.c                    |  2 +-
 include/linux/bio.h                 | 13 ++--
 include/linux/bvec.h                | 99 +++++++++++++++++++++++++----
 include/linux/uio.h                 | 11 ++++
 kernel/power/swap.c                 |  2 +-
 lib/iov_iter.c                      | 32 +++++-----
 mm/page_io.c                        |  8 +--
 net/ceph/messenger.c                | 10 +--
 net/sunrpc/xdr.c                    |  2 +-
 net/sunrpc/xprtsock.c               |  4 +-
 126 files changed, 628 insertions(+), 467 deletions(-)

-- 
2.20.1



* [PATCH v1 01/15] fs/direct-io: fix trailing whitespace issues
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 02/15] iov_iter: add helper to test if an iter would use GUP jglisse
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox,
	Ernesto A . Fernández

From: Jérôme Glisse <jglisse@redhat.com>

Remove a bunch of trailing whitespace. It just hurts my eyes.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
---
 fs/direct-io.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 9bb015bc4a83..52a18858e3e7 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -196,7 +196,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 		sdio->to = ((ret - 1) & (PAGE_SIZE - 1)) + 1;
 		return 0;
 	}
-	return ret;	
+	return ret;
 }
 
 /*
@@ -344,7 +344,7 @@ static void dio_aio_complete_work(struct work_struct *work)
 static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio);
 
 /*
- * Asynchronous IO callback. 
+ * Asynchronous IO callback.
  */
 static void dio_bio_end_aio(struct bio *bio)
 {
@@ -777,7 +777,7 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
 	}
 	return ret;
 }
-		
+
 /*
  * Put cur_page under IO.  The section of cur_page which is described by
  * cur_page_offset,cur_page_len is put into a BIO.  The section of cur_page
@@ -839,7 +839,7 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
  * An autonomous function to put a chunk of a page under deferred IO.
  *
  * The caller doesn't actually know (or care) whether this piece of page is in
- * a BIO, or is under IO or whatever.  We just take care of all possible 
+ * a BIO, or is under IO or whatever.  We just take care of all possible
  * situations here.  The separation between the logic of do_direct_IO() and
  * that of submit_page_section() is important for clarity.  Please don't break.
  *
@@ -940,7 +940,7 @@ static inline void dio_zero_block(struct dio *dio, struct dio_submit *sdio,
 	 * We need to zero out part of an fs block.  It is either at the
 	 * beginning or the end of the fs block.
 	 */
-	if (end) 
+	if (end)
 		this_chunk_blocks = dio_blocks_per_fs_block - this_chunk_blocks;
 
 	this_chunk_bytes = this_chunk_blocks << sdio->blkbits;
-- 
2.20.1



* [PATCH v1 02/15] iov_iter: add helper to test if an iter would use GUP
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
  2019-04-11 21:08 ` [PATCH v1 01/15] fs/direct-io: fix trailing whitespace issues jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 03/15] block: introduce bvec_page()/bvec_set_page() to get/set bio_vec.bv_page jglisse
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

Add a helper to test whether a call to iov_iter_get_pages*() with a given
iter would result in calls to GUP (get_user_pages*()). We want to track
page references differently if they are coming from GUP, and thus we need
to know when GUP is used for a given iter.
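
For instance, a direct-IO style caller could use it along these lines.
This is illustrative only: struct dio_state and its pages_from_gup
field are made up for this example.

    static ssize_t sketch_dio_get_pages(struct dio_state *dio,
                                        struct iov_iter *iter,
                                        struct page **pages, size_t maxsize,
                                        unsigned maxpages, size_t *start)
    {
        /* Record up front whether the pages will carry GUP references,
         * so that the completion path can release them with
         * put_user_page() instead of put_page().
         */
        dio->pages_from_gup = iov_iter_get_pages_use_gup(iter);

        return iov_iter_get_pages(iter, pages, maxsize, maxpages, start);
    }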

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 include/linux/uio.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index f184af1999a8..b12b2878a266 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -98,6 +98,17 @@ static inline bool iov_iter_bvec_no_ref(const struct iov_iter *i)
 	return (i->type & ITER_BVEC_FLAG_NO_REF) != 0;
 }
 
+/**
+ * iov_iter_get_pages_use_gup - true if iov_iter_get_pages(i) uses GUP
+ * @i: iter
+ * Returns: true if a call to iov_iter_get_pages*() with the given iter
+ *          would result in the use of get_user_pages*()
+ */
+static inline bool iov_iter_get_pages_use_gup(const struct iov_iter *i)
+{
+	return iov_iter_type(i) & (ITER_IOVEC | ITER_PIPE);
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *
-- 
2.20.1



* [PATCH v1 03/15] block: introduce bvec_page()/bvec_set_page() to get/set bio_vec.bv_page
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
  2019-04-11 21:08 ` [PATCH v1 01/15] fs/direct-io: fix trailing whitespace issues jglisse
  2019-04-11 21:08 ` [PATCH v1 02/15] iov_iter: add helper to test if an iter would use GUP jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 04/15] block: introduce BIO_VEC_INIT() macro to initialize bio_vec structure jglisse
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox, Coly Li,
	Kent Overstreet, linux-bcache

From: Jérôme Glisse <jglisse@redhat.com>

This adds a helper to look up the page a bvec struct points to, and one to
set it. We want to convert all direct dereferences of bvec->bv_page to
calls to those helpers so that we can change the bv_page field.

To make the coccinelle conversion (in a later patch) easier, this patch
also updates some macros and some code that coccinelle is not able to
match.
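
In practice the conversion is mechanical; a typical before/after looks
like this:

    /* Before: direct access to the field. */
    page = bvec->bv_page;
    bvec->bv_page = newpage;

    /* After: through the accessors, so the representation of the field
     * can change in a later patch without touching the callers again.
     */
    page = bvec_page(bvec);
    bvec_set_page(bvec, newpage);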

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Coly Li <colyli@suse.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-bcache@vger.kernel.org
---
 block/bounce.c            |  2 +-
 drivers/block/rbd.c       |  2 +-
 drivers/md/bcache/btree.c |  2 +-
 include/linux/bvec.h      | 14 ++++++++++++--
 lib/iov_iter.c            | 32 ++++++++++++++++----------------
 5 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index 47eb7e936e22..d6ba1cac969f 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -85,7 +85,7 @@ static void bounce_copy_vec(struct bio_vec *to, unsigned char *vfrom)
 #else /* CONFIG_HIGHMEM */
 
 #define bounce_copy_vec(to, vfrom)	\
-	memcpy(page_address((to)->bv_page) + (to)->bv_offset, vfrom, (to)->bv_len)
+	memcpy(page_address(bvec_page(to)) + (to)->bv_offset, vfrom, (to)->bv_len)
 
 #endif /* CONFIG_HIGHMEM */
 
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 2210c1b9491b..aa3b82be5946 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2454,7 +2454,7 @@ static bool is_zero_bvecs(struct bio_vec *bvecs, u32 bytes)
 	};
 
 	ceph_bvec_iter_advance_step(&it, bytes, ({
-		if (memchr_inv(page_address(bv.bv_page) + bv.bv_offset, 0,
+		if (memchr_inv(page_address(bvec_page(&bv)) + bv.bv_offset, 0,
 			       bv.bv_len))
 			return false;
 	}));
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 64def336f053..b5f3168dc5ff 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -435,7 +435,7 @@ static void do_btree_node_write(struct btree *b)
 		struct bvec_iter_all iter_all;
 
 		bio_for_each_segment_all(bv, b->bio, j, iter_all)
-			memcpy(page_address(bv->bv_page),
+			memcpy(page_address(bvec_page(bv)),
 			       base + j * PAGE_SIZE, PAGE_SIZE);
 
 		bch_submit_bbio(b->bio, b->c, &k.key, 0);
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index f6275c4da13a..44866555258a 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -51,6 +51,16 @@ struct bvec_iter_all {
 	unsigned	done;
 };
 
+static inline struct page *bvec_page(const struct bio_vec *bvec)
+{
+	return bvec->bv_page;
+}
+
+static inline void bvec_set_page(struct bio_vec *bvec, struct page *page)
+{
+	bvec->bv_page = page;
+}
+
 static inline struct page *bvec_nth_page(struct page *page, int idx)
 {
 	return idx == 0 ? page : nth_page(page, idx);
@@ -64,7 +74,7 @@ static inline struct page *bvec_nth_page(struct page *page, int idx)
 
 /* multi-page (mp_bvec) helpers */
 #define mp_bvec_iter_page(bvec, iter)				\
-	(__bvec_iter_bvec((bvec), (iter))->bv_page)
+	(bvec_page(__bvec_iter_bvec((bvec), (iter))))
 
 #define mp_bvec_iter_len(bvec, iter)				\
 	min((iter).bi_size,					\
@@ -192,6 +202,6 @@ static inline void mp_bvec_last_segment(const struct bio_vec *bvec,
 #define mp_bvec_for_each_page(pg, bv, i)				\
 	for (i = (bv)->bv_offset / PAGE_SIZE;				\
 		(i <= (((bv)->bv_offset + (bv)->bv_len - 1) / PAGE_SIZE)) && \
-		(pg = bvec_nth_page((bv)->bv_page, i)); i += 1)
+		(pg = bvec_nth_page(bvec_page(bv), i)); i += 1)
 
 #endif /* __LINUX_BVEC_ITER_H */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ea36dc355da1..e20a3b1d8b0e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -608,7 +608,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		might_fault();
 	iterate_and_advance(i, bytes, v,
 		copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
-		memcpy_to_page(v.bv_page, v.bv_offset,
+		memcpy_to_page(bvec_page(&v), v.bv_offset,
 			       (from += v.bv_len) - v.bv_len, v.bv_len),
 		memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len)
 	)
@@ -709,7 +709,7 @@ size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
 	iterate_and_advance(i, bytes, v,
 		copyout_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
 		({
-		rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset,
+		rem = memcpy_mcsafe_to_page(bvec_page(&v), v.bv_offset,
                                (from += v.bv_len) - v.bv_len, v.bv_len);
 		if (rem) {
 			curr_addr = (unsigned long) from;
@@ -744,7 +744,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 		might_fault();
 	iterate_and_advance(i, bytes, v,
 		copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
-		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+		memcpy_from_page((to += v.bv_len) - v.bv_len, bvec_page(&v),
 				 v.bv_offset, v.bv_len),
 		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
 	)
@@ -770,7 +770,7 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
 				      v.iov_base, v.iov_len))
 			return false;
 		0;}),
-		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+		memcpy_from_page((to += v.bv_len) - v.bv_len, bvec_page(&v),
 				 v.bv_offset, v.bv_len),
 		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
 	)
@@ -790,7 +790,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 	iterate_and_advance(i, bytes, v,
 		__copy_from_user_inatomic_nocache((to += v.iov_len) - v.iov_len,
 					 v.iov_base, v.iov_len),
-		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+		memcpy_from_page((to += v.bv_len) - v.bv_len, bvec_page(&v),
 				 v.bv_offset, v.bv_len),
 		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
 	)
@@ -824,7 +824,7 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 	iterate_and_advance(i, bytes, v,
 		__copy_from_user_flushcache((to += v.iov_len) - v.iov_len,
 					 v.iov_base, v.iov_len),
-		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, bvec_page(&v),
 				 v.bv_offset, v.bv_len),
 		memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
 			v.iov_len)
@@ -849,7 +849,7 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 					     v.iov_base, v.iov_len))
 			return false;
 		0;}),
-		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+		memcpy_from_page((to += v.bv_len) - v.bv_len, bvec_page(&v),
 				 v.bv_offset, v.bv_len),
 		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
 	)
@@ -951,7 +951,7 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 		return pipe_zero(bytes, i);
 	iterate_and_advance(i, bytes, v,
 		clear_user(v.iov_base, v.iov_len),
-		memzero_page(v.bv_page, v.bv_offset, v.bv_len),
+		memzero_page(bvec_page(&v), v.bv_offset, v.bv_len),
 		memset(v.iov_base, 0, v.iov_len)
 	)
 
@@ -974,7 +974,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
 	}
 	iterate_all_kinds(i, bytes, v,
 		copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
-		memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
+		memcpy_from_page((p += v.bv_len) - v.bv_len, bvec_page(&v),
 				 v.bv_offset, v.bv_len),
 		memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
 	)
@@ -1300,7 +1300,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 	0;}),({
 		/* can't be more than PAGE_SIZE */
 		*start = v.bv_offset;
-		get_page(*pages = v.bv_page);
+		get_page(*pages = bvec_page(&v));
 		return v.bv_len;
 	}),({
 		return -EFAULT;
@@ -1387,7 +1387,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		*pages = p = get_pages_array(1);
 		if (!p)
 			return -ENOMEM;
-		get_page(*p = v.bv_page);
+		get_page(*p = bvec_page(&v));
 		return v.bv_len;
 	}),({
 		return -EFAULT;
@@ -1419,7 +1419,7 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 		}
 		err ? v.iov_len : 0;
 	}), ({
-		char *p = kmap_atomic(v.bv_page);
+		char *p = kmap_atomic(bvec_page(&v));
 		sum = csum_and_memcpy((to += v.bv_len) - v.bv_len,
 				      p + v.bv_offset, v.bv_len,
 				      sum, off);
@@ -1461,7 +1461,7 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
 		off += v.iov_len;
 		0;
 	}), ({
-		char *p = kmap_atomic(v.bv_page);
+		char *p = kmap_atomic(bvec_page(&v));
 		sum = csum_and_memcpy((to += v.bv_len) - v.bv_len,
 				      p + v.bv_offset, v.bv_len,
 				      sum, off);
@@ -1507,7 +1507,7 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
 		}
 		err ? v.iov_len : 0;
 	}), ({
-		char *p = kmap_atomic(v.bv_page);
+		char *p = kmap_atomic(bvec_page(&v));
 		sum = csum_and_memcpy(p + v.bv_offset,
 				      (from += v.bv_len) - v.bv_len,
 				      v.bv_len, sum, off);
@@ -1696,10 +1696,10 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
 		return 0;
 
 	iterate_all_kinds(i, bytes, v, -EINVAL, ({
-		w.iov_base = kmap(v.bv_page) + v.bv_offset;
+		w.iov_base = kmap(bvec_page(&v)) + v.bv_offset;
 		w.iov_len = v.bv_len;
 		err = f(&w, context);
-		kunmap(v.bv_page);
+		kunmap(bvec_page(&v));
 		err;}), ({
 		w = v;
 		err = f(&w, context);})
-- 
2.20.1



* [PATCH v1 04/15] block: introduce BIO_VEC_INIT() macro to initialize bio_vec structure
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (2 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 03/15] block: introduce bvec_page()/bvec_set_page() to get/set bio_vec.bv_page jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 05/15] block: replace all bio_vec->bv_page by bvec_page()/bvec_set_page() jglisse
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox, Ilya Dryomov,
	Sage Weil, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Steve French, linux-cifs, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer

From: Jérôme Glisse <jglisse@redhat.com>

This adds a macro to initialize the bio_vec structure. We want to convert
all initializations to that macro so that it is easier to change the
bv_page field in a later patch.
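
A typical conversion looks like this (arguments are page, length,
offset, matching the hunks below):

    /* Before: open-coded designated initializer. */
    struct bio_vec bv = { .bv_page = page, .bv_len = PAGE_SIZE, .bv_offset = 0 };

    /* After: one macro, so a later patch can change the underlying
     * representation in a single place.
     */
    struct bio_vec bv = BIO_VEC_INIT(page, PAGE_SIZE, 0);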

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Sage Weil <sage@redhat.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: devel@lists.orangefs.org
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: v9fs-developer@lists.sourceforge.net
---
 block/blk-integrity.c | 4 ++--
 block/blk-merge.c     | 2 +-
 fs/9p/vfs_addr.c      | 2 +-
 fs/ceph/file.c        | 8 +++-----
 fs/cifs/connect.c     | 4 ++--
 fs/orangefs/inode.c   | 2 +-
 include/linux/bvec.h  | 2 ++
 mm/page_io.c          | 6 +-----
 net/ceph/messenger.c  | 6 +-----
 9 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d1ab089e0919..916a5406649d 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -40,7 +40,7 @@
  */
 int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
 {
-	struct bio_vec iv, ivprv = { NULL };
+	struct bio_vec iv, ivprv = BIO_VEC_INIT(NULL, 0, 0);
 	unsigned int segments = 0;
 	unsigned int seg_size = 0;
 	struct bvec_iter iter;
@@ -82,7 +82,7 @@ EXPORT_SYMBOL(blk_rq_count_integrity_sg);
 int blk_rq_map_integrity_sg(struct request_queue *q, struct bio *bio,
 			    struct scatterlist *sglist)
 {
-	struct bio_vec iv, ivprv = { NULL };
+	struct bio_vec iv, ivprv = BIO_VEC_INIT(NULL, 0, 0);
 	struct scatterlist *sg = NULL;
 	unsigned int segments = 0;
 	struct bvec_iter iter;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1c9d4f0f96ea..c355fb9e9e8e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -447,7 +447,7 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio)
 static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
 				   struct bio *nxt)
 {
-	struct bio_vec end_bv = { NULL }, nxt_bv;
+	struct bio_vec end_bv = BIO_VEC_INIT(NULL, 0, 0), nxt_bv;
 
 	if (bio->bi_seg_back_size + nxt->bi_seg_front_size >
 	    queue_max_segment_size(q))
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 0bcbcc20f769..b626b28f0ce9 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -53,7 +53,7 @@
 static int v9fs_fid_readpage(struct p9_fid *fid, struct page *page)
 {
 	struct inode *inode = page->mapping->host;
-	struct bio_vec bvec = {.bv_page = page, .bv_len = PAGE_SIZE};
+	struct bio_vec bvec = BIO_VEC_INIT(page, PAGE_SIZE, 0);
 	struct iov_iter to;
 	int retval, err;
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 9f53c3d99304..d3c8035335a2 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -100,11 +100,9 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize,
 		size += bytes;
 
 		for ( ; bytes; idx++, bvec_idx++) {
-			struct bio_vec bv = {
-				.bv_page = pages[idx],
-				.bv_len = min_t(int, bytes, PAGE_SIZE - start),
-				.bv_offset = start,
-			};
+			struct bio_vec bv = BIO_VEC_INIT(pages[idx],
+				min_t(int, bytes, PAGE_SIZE - start),
+				start);
 
 			bvecs[bvec_idx] = bv;
 			bytes -= bv.bv_len;
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 4c0e44489f21..86438f3933a9 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -809,8 +809,8 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
 	unsigned int page_offset, unsigned int to_read)
 {
 	struct msghdr smb_msg;
-	struct bio_vec bv = {
-		.bv_page = page, .bv_len = to_read, .bv_offset = page_offset};
+	struct bio_vec bv = BIO_VEC_INIT(page, to_read, page_offset);
+
 	iov_iter_bvec(&smb_msg.msg_iter, READ, &bv, 1, to_read);
 	return cifs_readv_from_socket(server, &smb_msg);
 }
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index c3334eca18c7..5ebd2da4c093 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -23,7 +23,7 @@ static int read_one_page(struct page *page)
 	const __u32 blocksize = PAGE_SIZE;
 	const __u32 blockbits = PAGE_SHIFT;
 	struct iov_iter to;
-	struct bio_vec bv = {.bv_page = page, .bv_len = PAGE_SIZE};
+	struct bio_vec bv = BIO_VEC_INIT(page, PAGE_SIZE, 0);
 
 	iov_iter_bvec(&to, READ, &bv, 1, PAGE_SIZE);
 
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 44866555258a..8f8fb528ce53 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -70,6 +70,8 @@ static inline struct page *bvec_nth_page(struct page *page, int idx)
  * various member access, note that bio_data should of course not be used
  * on highmem page vectors
  */
+#define BIO_VEC_INIT(p, l, o) {.bv_page = (p), .bv_len = (l), .bv_offset = (o)}
+
 #define __bvec_iter_bvec(bvec, iter)	(&(bvec)[(iter).bi_idx])
 
 /* multi-page (mp_bvec) helpers */
diff --git a/mm/page_io.c b/mm/page_io.c
index 2e8019d0e048..6b3be0445c61 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -287,11 +287,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		struct kiocb kiocb;
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
-		struct bio_vec bv = {
-			.bv_page = page,
-			.bv_len  = PAGE_SIZE,
-			.bv_offset = 0
-		};
+		struct bio_vec bv = BIO_VEC_INIT(page, PAGE_SIZE, 0);
 		struct iov_iter from;
 
 		iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE);
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 3083988ce729..3e16187491d8 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -523,11 +523,7 @@ static int ceph_tcp_recvmsg(struct socket *sock, void *buf, size_t len)
 static int ceph_tcp_recvpage(struct socket *sock, struct page *page,
 		     int page_offset, size_t length)
 {
-	struct bio_vec bvec = {
-		.bv_page = page,
-		.bv_offset = page_offset,
-		.bv_len = length
-	};
+	struct bio_vec bvec = BIO_VEC_INIT(page, length, page_offset);
 	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
 	int r;
 
-- 
2.20.1



* [PATCH v1 05/15] block: replace all bio_vec->bv_page by bvec_page()/bvec_set_page()
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (3 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 04/15] block: introduce BIO_VEC_INIT() macro to initialize bio_vec structure jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 06/15] block: convert bio_vec.bv_page to bv_pfn to store pfn and not page jglisse
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

This replaces almost all direct dereferences of the bv_page field of the
bio_vec struct with bvec_page() or bvec_set_page() (the latter when setting
the field). The motivation is to allow changing the bv_page field to carry
some context information, which requires going through a helper.

This is done using a coccinelle patch, running it with (takes ~30min):

spatch --include-headers --sp-file spfile --in-place --dir .

with spfile:
%<---------------------------------------------------------------------
@exists@
struct bio_vec BVEC;
expression E1;
identifier FN!={bvec_set_page};
@@
FN(...) {<...
-BVEC.bv_page = E1;
+bvec_set_page(&BVEC, E1);
...>}

@exists@
struct bio_vec *BVEC;
expression E1, E2;
@@
-BVEC[E1].bv_page = E2;
+bvec_set_page(&BVEC[E1], E2);

@exists@
struct bio_vec *BVEC;
expression E1;
identifier FN!={bvec_set_page};
@@
FN(...) {<...
-BVEC->bv_page = E1;
+bvec_set_page(BVEC, E1);
...>}

@exists@
struct bvec_iter_all *ITER;
expression E1;
@@
-ITER->bv.bv_page = E1;
+bvec_set_page(&ITER->bv, E1);

@exists@
struct request *req;
expression E1;
@@
-req->special_vec.bv_page = E1;
+bvec_set_page(&req->special_vec, E1);

@exists@
struct bio *BIO;
expression E1;
@@
-BIO->bi_io_vec->bv_page = E1;
+bvec_set_page(BIO->bi_io_vec, E1);

@exists@
struct rbd_obj_request *req;
expression E1, E2;
@@
-req->copyup_bvecs[E1].bv_page = E2;
+bvec_set_page(&req->copyup_bvecs[E1], E2);

@exists@
struct pending_block *block;
expression E1, E2;
@@
-block->vecs[E1].bv_page = E2;
+bvec_set_page(&block->vecs[E1], E2);

@exists@
struct stripe_head *sh;
expression E1, E2;
@@
-sh->dev[E1].vec.bv_page = E2;
+bvec_set_page(&sh->dev[E1].vec, E2);

@exists@
struct io_mapped_ubuf *imu;
expression E1, E2;
@@
-imu->bvec[E1].bv_page = E2;
+bvec_set_page(&imu->bvec[E1], E2);

@exists@
struct afs_call *call;
expression E1, E2;
@@
-call->bvec[E1].bv_page = E2;
+bvec_set_page(&call->bvec[E1], E2);

@exists@
struct xdr_buf *buf;
expression E1, E2;
@@
-buf->bvec[E1].bv_page = E2;
+bvec_set_page(&buf->bvec[E1], E2);

@exists@
expression E1, E2;
@@
-bio_first_bvec_all(E1)->bv_page = E2;
+bvec_set_page(bio_first_bvec_all(E1), E2);

@exists@
struct bio_vec BVEC;
identifier FN!={bvec_set_page,bvec_page};
@@
FN(...) {<...
-BVEC.bv_page
+bvec_page(&BVEC)
...>}

@exists@
struct bio_vec *BVEC;
expression E1;
@@
-BVEC[E1].bv_page
+bvec_page(&BVEC[E1])

@exists@
struct bio_vec *BVEC;
identifier FN!={bvec_set_page,bvec_page};
@@
FN(...) {<...
-BVEC->bv_page
+bvec_page(BVEC)
...>}

@exists@
struct bvec_iter_all *ITER;
@@
-ITER->bv.bv_page
+bvec_page(&ITER->bv)

@exists@
struct request *req;
@@
-req->special_vec.bv_page
+bvec_page(&req->special_vec)

@exists@
struct rbd_obj_request *req;
expression E1;
@@
-req->copyup_bvecs[E1].bv_page
+bvec_page(&req->copyup_bvecs[E1])

@exists@
struct pending_block *block;
expression E1;
@@
-block->vecs[E1].bv_page
+bvec_page(&block->vecs[E1])

@exists@
struct stripe_head *sh;
expression E1;
@@
-sh->dev[E1].vec.bv_page
+bvec_page(&sh->dev[E1].vec)

@exists@
struct io_mapped_ubuf *imu;
expression E1;
@@
-imu->bvec[E1].bv_page
+bvec_page(&imu->bvec[E1])

@exists@
struct afs_call *call;
expression E1;
@@
-call->bvec[E1].bv_page
+bvec_page(&call->bvec[E1])

@exists@
struct xdr_buf *buf;
expression E1;
@@
-buf->bvec[E1].bv_page
+bvec_page(&buf->bvec[E1])

@exists@
struct bio_integrity_payload *bip;
@@
-bip->bip_vec->bv_page
+bvec_page(bip->bip_vec)

@exists@
struct bio *BIO;
@@
-BIO->bi_io_vec->bv_page
+bvec_page(BIO->bi_io_vec)

@exists@
struct bio *BIO;
@@
-BIO->bi_io_vec[0].bv_page
+bvec_page(&BIO->bi_io_vec[0])

@exists@
struct nvme_tcp_request *req;
@@
-req->iter.bvec->bv_page
+bvec_page(req->iter.bvec)

@exists@
expression E1;
@@
-bio_first_bvec_all(E1)->bv_page
+bvec_page(bio_first_bvec_all(E1))

@exists@
struct cache *ca;
@@
-ca->sb_bio.bi_inline_vecs[0].bv_page
+bvec_page(ca->sb_bio.bi_inline_vecs)

@exists@
struct nvm_rq *rqd;
expression E1;
@@
-rqd->bio->bi_io_vec[E1].bv_page
+bvec_page(&rqd->bio->bi_io_vec[E1])

@exists@
struct msghdr *msg;
@@
-msg->msg_iter.bvec->bv_page
+bvec_page(msg->msg_iter.bvec)
--------------------------------------------------------------------->%

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 arch/m68k/emu/nfblock.c             |  2 +-
 arch/um/drivers/ubd_kern.c          |  2 +-
 arch/xtensa/platforms/iss/simdisk.c |  2 +-
 block/bio-integrity.c               |  8 +++---
 block/bio.c                         | 44 ++++++++++++++---------------
 block/blk-core.c                    |  2 +-
 block/blk-integrity.c               |  3 +-
 block/blk-lib.c                     |  2 +-
 block/blk-merge.c                   |  7 +++--
 block/blk.h                         |  4 +--
 block/bounce.c                      | 24 ++++++++--------
 block/t10-pi.c                      |  4 +--
 drivers/block/aoe/aoecmd.c          |  4 +--
 drivers/block/brd.c                 |  2 +-
 drivers/block/drbd/drbd_bitmap.c    |  2 +-
 drivers/block/drbd/drbd_main.c      |  4 +--
 drivers/block/drbd/drbd_receiver.c  |  4 +--
 drivers/block/drbd/drbd_worker.c    |  2 +-
 drivers/block/floppy.c              |  4 +--
 drivers/block/loop.c                | 16 +++++------
 drivers/block/null_blk_main.c       |  6 ++--
 drivers/block/ps3disk.c             |  2 +-
 drivers/block/ps3vram.c             |  2 +-
 drivers/block/rbd.c                 | 10 +++----
 drivers/block/rsxx/dma.c            |  3 +-
 drivers/block/umem.c                |  2 +-
 drivers/block/virtio_blk.c          |  4 +--
 drivers/block/zram/zram_drv.c       | 22 +++++++--------
 drivers/lightnvm/pblk-core.c        |  7 ++---
 drivers/lightnvm/pblk-read.c        |  6 ++--
 drivers/md/bcache/debug.c           |  4 +--
 drivers/md/bcache/request.c         |  4 +--
 drivers/md/bcache/super.c           |  6 ++--
 drivers/md/bcache/util.c            | 11 ++++----
 drivers/md/dm-crypt.c               | 16 +++++++----
 drivers/md/dm-integrity.c           | 18 ++++++------
 drivers/md/dm-io.c                  |  2 +-
 drivers/md/dm-log-writes.c          | 12 ++++----
 drivers/md/dm-verity-target.c       |  4 +--
 drivers/md/raid5.c                  | 10 ++++---
 drivers/nvdimm/blk.c                |  6 ++--
 drivers/nvdimm/btt.c                |  5 ++--
 drivers/nvdimm/pmem.c               |  4 +--
 drivers/nvme/host/core.c            |  4 +--
 drivers/nvme/host/tcp.c             |  2 +-
 drivers/nvme/target/io-cmd-file.c   |  2 +-
 drivers/s390/block/dasd_diag.c      |  2 +-
 drivers/s390/block/dasd_eckd.c      | 14 ++++-----
 drivers/s390/block/dasd_fba.c       |  6 ++--
 drivers/s390/block/dcssblk.c        |  2 +-
 drivers/s390/block/scm_blk.c        |  2 +-
 drivers/s390/block/xpram.c          |  2 +-
 drivers/scsi/sd.c                   | 25 ++++++++--------
 drivers/staging/erofs/data.c        |  2 +-
 drivers/staging/erofs/unzip_vle.c   |  2 +-
 drivers/target/target_core_file.c   |  6 ++--
 drivers/xen/biomerge.c              |  4 +--
 fs/9p/vfs_addr.c                    |  2 +-
 fs/afs/fsclient.c                   |  2 +-
 fs/afs/rxrpc.c                      |  4 +--
 fs/afs/yfsclient.c                  |  2 +-
 fs/block_dev.c                      |  8 +++---
 fs/btrfs/check-integrity.c          |  4 +--
 fs/btrfs/compression.c              | 12 ++++----
 fs/btrfs/disk-io.c                  |  4 +--
 fs/btrfs/extent_io.c                |  8 +++---
 fs/btrfs/file-item.c                |  8 +++---
 fs/btrfs/inode.c                    | 20 +++++++------
 fs/btrfs/raid56.c                   |  4 +--
 fs/buffer.c                         |  2 +-
 fs/ceph/file.c                      |  6 ++--
 fs/cifs/misc.c                      |  6 ++--
 fs/cifs/smb2ops.c                   |  2 +-
 fs/cifs/smbdirect.c                 |  2 +-
 fs/cifs/transport.c                 |  2 +-
 fs/crypto/bio.c                     |  2 +-
 fs/direct-io.c                      |  2 +-
 fs/ext4/page-io.c                   |  2 +-
 fs/ext4/readpage.c                  |  2 +-
 fs/f2fs/data.c                      | 10 +++----
 fs/gfs2/lops.c                      |  4 +--
 fs/gfs2/meta_io.c                   |  2 +-
 fs/io_uring.c                       |  4 +--
 fs/iomap.c                          |  4 +--
 fs/mpage.c                          |  2 +-
 fs/splice.c                         |  2 +-
 fs/xfs/xfs_aops.c                   |  6 ++--
 include/linux/bio.h                 |  6 ++--
 include/linux/bvec.h                | 10 +++----
 net/ceph/messenger.c                |  4 +--
 net/sunrpc/xdr.c                    |  2 +-
 net/sunrpc/xprtsock.c               |  4 +--
 92 files changed, 282 insertions(+), 267 deletions(-)

diff --git a/arch/m68k/emu/nfblock.c b/arch/m68k/emu/nfblock.c
index 40712e49381b..79b90d62e916 100644
--- a/arch/m68k/emu/nfblock.c
+++ b/arch/m68k/emu/nfblock.c
@@ -73,7 +73,7 @@ static blk_qc_t nfhd_make_request(struct request_queue *queue, struct bio *bio)
 		len = bvec.bv_len;
 		len >>= 9;
 		nfhd_read_write(dev->id, 0, dir, sec >> shift, len >> shift,
-				page_to_phys(bvec.bv_page) + bvec.bv_offset);
+				page_to_phys(bvec_page(&bvec)) + bvec.bv_offset);
 		sec += len;
 	}
 	bio_endio(bio);
diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index aca09be2373e..da0f0229e2e9 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -1328,7 +1328,7 @@ static int ubd_queue_one_vec(struct blk_mq_hw_ctx *hctx, struct request *req,
 	io_req->error = 0;
 
 	if (bvec != NULL) {
-		io_req->buffer = page_address(bvec->bv_page) + bvec->bv_offset;
+		io_req->buffer = page_address(bvec_page(bvec)) + bvec->bv_offset;
 		io_req->length = bvec->bv_len;
 	} else {
 		io_req->buffer = NULL;
diff --git a/arch/xtensa/platforms/iss/simdisk.c b/arch/xtensa/platforms/iss/simdisk.c
index 026211e7ab09..bc792023bd92 100644
--- a/arch/xtensa/platforms/iss/simdisk.c
+++ b/arch/xtensa/platforms/iss/simdisk.c
@@ -109,7 +109,7 @@ static blk_qc_t simdisk_make_request(struct request_queue *q, struct bio *bio)
 	sector_t sector = bio->bi_iter.bi_sector;
 
 	bio_for_each_segment(bvec, bio, iter) {
-		char *buffer = kmap_atomic(bvec.bv_page) + bvec.bv_offset;
+		char *buffer = kmap_atomic(bvec_page(&bvec)) + bvec.bv_offset;
 		unsigned len = bvec.bv_len >> SECTOR_SHIFT;
 
 		simdisk_transfer(dev, sector, len, buffer,
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 1b633a3526d4..adcbae6ac6f4 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -108,7 +108,7 @@ static void bio_integrity_free(struct bio *bio)
 	struct bio_set *bs = bio->bi_pool;
 
 	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
-		kfree(page_address(bip->bip_vec->bv_page) +
+		kfree(page_address(bvec_page(bip->bip_vec)) +
 		      bip->bip_vec->bv_offset);
 
 	if (bs && mempool_initialized(&bs->bio_integrity_pool)) {
@@ -150,7 +150,7 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 			     &bip->bip_vec[bip->bip_vcnt - 1], offset))
 		return 0;
 
-	iv->bv_page = page;
+	bvec_set_page(iv, page);
 	iv->bv_len = len;
 	iv->bv_offset = offset;
 	bip->bip_vcnt++;
@@ -174,7 +174,7 @@ static blk_status_t bio_integrity_process(struct bio *bio,
 	struct bio_vec bv;
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 	blk_status_t ret = BLK_STS_OK;
-	void *prot_buf = page_address(bip->bip_vec->bv_page) +
+	void *prot_buf = page_address(bvec_page(bip->bip_vec)) +
 		bip->bip_vec->bv_offset;
 
 	iter.disk_name = bio->bi_disk->disk_name;
@@ -183,7 +183,7 @@ static blk_status_t bio_integrity_process(struct bio *bio,
 	iter.prot_buf = prot_buf;
 
 	__bio_for_each_segment(bv, bio, bviter, *proc_iter) {
-		void *kaddr = kmap_atomic(bv.bv_page);
+		void *kaddr = kmap_atomic(bvec_page(&bv));
 
 		iter.data_buf = kaddr + bv.bv_offset;
 		iter.data_size = bv.bv_len;
diff --git a/block/bio.c b/block/bio.c
index 716510ecd7ff..c73ac2120ca0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -541,7 +541,7 @@ void zero_fill_bio_iter(struct bio *bio, struct bvec_iter start)
 	__bio_for_each_segment(bv, bio, iter, start) {
 		char *data = bvec_kmap_irq(&bv, &flags);
 		memset(data, 0, bv.bv_len);
-		flush_dcache_page(bv.bv_page);
+		flush_dcache_page(bvec_page(&bv));
 		bvec_kunmap_irq(data, &flags);
 	}
 }
@@ -685,7 +685,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
-		if (page == prev->bv_page &&
+		if (page == bvec_page(prev) &&
 		    offset == prev->bv_offset + prev->bv_len) {
 			prev->bv_len += len;
 			bio->bi_iter.bi_size += len;
@@ -708,7 +708,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 	 * cannot add the page
 	 */
 	bvec = &bio->bi_io_vec[bio->bi_vcnt];
-	bvec->bv_page = page;
+	bvec_set_page(bvec, page);
 	bvec->bv_len = len;
 	bvec->bv_offset = offset;
 	bio->bi_vcnt++;
@@ -737,7 +737,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 	return len;
 
  failed:
-	bvec->bv_page = NULL;
+	bvec_set_page(bvec, NULL);
 	bvec->bv_len = 0;
 	bvec->bv_offset = 0;
 	bio->bi_vcnt--;
@@ -770,7 +770,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
-		phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
+		phys_addr_t vec_end_addr = page_to_phys(bvec_page(bv)) +
 			bv->bv_offset + bv->bv_len - 1;
 		phys_addr_t page_addr = page_to_phys(page);
 
@@ -805,7 +805,7 @@ void __bio_add_page(struct bio *bio, struct page *page,
 	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
 	WARN_ON_ONCE(bio_full(bio));
 
-	bv->bv_page = page;
+	bvec_set_page(bv, page);
 	bv->bv_offset = off;
 	bv->bv_len = len;
 
@@ -846,7 +846,7 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
 		return -EINVAL;
 
 	len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
-	size = bio_add_page(bio, bv->bv_page, len,
+	size = bio_add_page(bio, bvec_page(bv), len,
 				bv->bv_offset + iter->iov_offset);
 	if (size == len) {
 		if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
@@ -1022,8 +1022,8 @@ void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
 
 		bytes = min(src_bv.bv_len, dst_bv.bv_len);
 
-		src_p = kmap_atomic(src_bv.bv_page);
-		dst_p = kmap_atomic(dst_bv.bv_page);
+		src_p = kmap_atomic(bvec_page(&src_bv));
+		dst_p = kmap_atomic(bvec_page(&dst_bv));
 
 		memcpy(dst_p + dst_bv.bv_offset,
 		       src_p + src_bv.bv_offset,
@@ -1032,7 +1032,7 @@ void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
 		kunmap_atomic(dst_p);
 		kunmap_atomic(src_p);
 
-		flush_dcache_page(dst_bv.bv_page);
+		flush_dcache_page(bvec_page(&dst_bv));
 
 		bio_advance_iter(src, src_iter, bytes);
 		bio_advance_iter(dst, dst_iter, bytes);
@@ -1134,7 +1134,7 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
 		ssize_t ret;
 
-		ret = copy_page_from_iter(bvec->bv_page,
+		ret = copy_page_from_iter(bvec_page(bvec),
 					  bvec->bv_offset,
 					  bvec->bv_len,
 					  iter);
@@ -1166,7 +1166,7 @@ static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter)
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
 		ssize_t ret;
 
-		ret = copy_page_to_iter(bvec->bv_page,
+		ret = copy_page_to_iter(bvec_page(bvec),
 					bvec->bv_offset,
 					bvec->bv_len,
 					&iter);
@@ -1188,7 +1188,7 @@ void bio_free_pages(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all)
-		__free_page(bvec->bv_page);
+		__free_page(bvec_page(bvec));
 }
 EXPORT_SYMBOL(bio_free_pages);
 
@@ -1433,7 +1433,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 
  out_unmap:
 	bio_for_each_segment_all(bvec, bio, j, iter_all) {
-		put_page(bvec->bv_page);
+		put_page(bvec_page(bvec));
 	}
 	bio_put(bio);
 	return ERR_PTR(ret);
@@ -1450,9 +1450,9 @@ static void __bio_unmap_user(struct bio *bio)
 	 */
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
 		if (bio_data_dir(bio) == READ)
-			set_page_dirty_lock(bvec->bv_page);
+			set_page_dirty_lock(bvec_page(bvec));
 
-		put_page(bvec->bv_page);
+		put_page(bvec_page(bvec));
 	}
 
 	bio_put(bio);
@@ -1543,7 +1543,7 @@ static void bio_copy_kern_endio_read(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
+		memcpy(p, page_address(bvec_page(bvec)), bvec->bv_len);
 		p += bvec->bv_len;
 	}
 
@@ -1654,8 +1654,8 @@ void bio_set_pages_dirty(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		if (!PageCompound(bvec->bv_page))
-			set_page_dirty_lock(bvec->bv_page);
+		if (!PageCompound(bvec_page(bvec)))
+			set_page_dirty_lock(bvec_page(bvec));
 	}
 }
 
@@ -1666,7 +1666,7 @@ static void bio_release_pages(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all)
-		put_page(bvec->bv_page);
+		put_page(bvec_page(bvec));
 }
 
 /*
@@ -1716,7 +1716,7 @@ void bio_check_pages_dirty(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		if (!PageDirty(bvec->bv_page) && !PageCompound(bvec->bv_page))
+		if (!PageDirty(bvec_page(bvec)) && !PageCompound(bvec_page(bvec)))
 			goto defer;
 	}
 
@@ -1789,7 +1789,7 @@ void bio_flush_dcache_pages(struct bio *bi)
 	struct bvec_iter iter;
 
 	bio_for_each_segment(bvec, bi, iter)
-		flush_dcache_page(bvec.bv_page);
+		flush_dcache_page(bvec_page(&bvec));
 }
 EXPORT_SYMBOL(bio_flush_dcache_pages);
 #endif
diff --git a/block/blk-core.c b/block/blk-core.c
index 4673ebe42255..ad6b3d4d3880 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1535,7 +1535,7 @@ void rq_flush_dcache_pages(struct request *rq)
 	struct bio_vec bvec;
 
 	rq_for_each_segment(bvec, rq, iter)
-		flush_dcache_page(bvec.bv_page);
+		flush_dcache_page(bvec_page(&bvec));
 }
 EXPORT_SYMBOL_GPL(rq_flush_dcache_pages);
 #endif
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 916a5406649d..7148e2a134fb 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -106,7 +106,8 @@ int blk_rq_map_integrity_sg(struct request_queue *q, struct bio *bio,
 				sg = sg_next(sg);
 			}
 
-			sg_set_page(sg, iv.bv_page, iv.bv_len, iv.bv_offset);
+			sg_set_page(sg, bvec_page(&iv), iv.bv_len,
+				    iv.bv_offset);
 			segments++;
 		}
 
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5f2c429d4378..02a0b398566d 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -158,7 +158,7 @@ static int __blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
 		bio->bi_vcnt = 1;
-		bio->bi_io_vec->bv_page = page;
+		bvec_set_page(bio->bi_io_vec, page);
 		bio->bi_io_vec->bv_offset = 0;
 		bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);
 		bio_set_op_attrs(bio, REQ_OP_WRITE_SAME, 0);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index c355fb9e9e8e..35f8c76e5448 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -498,7 +498,7 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
 
 		offset = (total + bvec->bv_offset) % PAGE_SIZE;
 		idx = (total + bvec->bv_offset) / PAGE_SIZE;
-		pg = bvec_nth_page(bvec->bv_page, idx);
+		pg = bvec_nth_page(bvec_page(bvec), idx);
 
 		sg_set_page(*sg, pg, seg_size, offset);
 
@@ -529,7 +529,8 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 new_segment:
 		if (bvec->bv_offset + bvec->bv_len <= PAGE_SIZE) {
 			*sg = blk_next_sg(sg, sglist);
-			sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
+			sg_set_page(*sg, bvec_page(bvec), nbytes,
+				    bvec->bv_offset);
 			(*nsegs) += 1;
 		} else
 			(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
@@ -541,7 +542,7 @@ static inline int __blk_bvec_map_sg(struct request_queue *q, struct bio_vec bv,
 		struct scatterlist *sglist, struct scatterlist **sg)
 {
 	*sg = sglist;
-	sg_set_page(*sg, bv.bv_page, bv.bv_len, bv.bv_offset);
+	sg_set_page(*sg, bvec_page(&bv), bv.bv_len, bv.bv_offset);
 	return 1;
 }
 
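[ On the two cases in __blk_segment_map_sg() above: a bvec that fits in
  a single page becomes one scatterlist entry, while blk_bvec_map_sg()
  walks a multi-page bvec page by page. The index arithmetic is plain
  division: with 4K pages, bv_offset = 512 and total = 8192 bytes
  already mapped, idx = (8192 + 512) / 4096 = 2 and
  offset = (8192 + 512) % 4096 = 512, i.e. the next segment starts 512
  bytes into the third page of the bvec, which is what
  bvec_nth_page(bvec_page(bvec), idx) hands to sg_set_page(). ]
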
diff --git a/block/blk.h b/block/blk.h
index 5d636ee41663..8276ce4b9b3c 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -70,8 +70,8 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 		struct bio_vec *vec1, struct bio_vec *vec2)
 {
 	unsigned long mask = queue_segment_boundary(q);
-	phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset;
-	phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;
+	phys_addr_t addr1 = page_to_phys(bvec_page(vec1)) + vec1->bv_offset;
+	phys_addr_t addr2 = page_to_phys(bvec_page(vec2)) + vec2->bv_offset;
 
 	if (addr1 + vec1->bv_len != addr2)
 		return false;
diff --git a/block/bounce.c b/block/bounce.c
index d6ba1cac969f..63529ec8ffe1 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -77,7 +77,7 @@ static void bounce_copy_vec(struct bio_vec *to, unsigned char *vfrom)
 {
 	unsigned char *vto;
 
-	vto = kmap_atomic(to->bv_page);
+	vto = kmap_atomic(bvec_page(to));
 	memcpy(vto + to->bv_offset, vfrom, to->bv_len);
 	kunmap_atomic(vto);
 }
@@ -143,17 +143,17 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 
 	bio_for_each_segment(tovec, to, iter) {
 		fromvec = bio_iter_iovec(from, from_iter);
-		if (tovec.bv_page != fromvec.bv_page) {
+		if (bvec_page(&tovec) != bvec_page(&fromvec)) {
 			/*
 			 * fromvec->bv_offset and fromvec->bv_len might have
 			 * been modified by the block layer, so use the original
 			 * copy, bounce_copy_vec already uses tovec->bv_len
 			 */
-			vfrom = page_address(fromvec.bv_page) +
+			vfrom = page_address(bvec_page(&fromvec)) +
 				tovec.bv_offset;
 
 			bounce_copy_vec(&tovec, vfrom);
-			flush_dcache_page(tovec.bv_page);
+			flush_dcache_page(bvec_page(&tovec));
 		}
 		bio_advance_iter(from, &from_iter, tovec.bv_len);
 	}
@@ -172,9 +172,9 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool)
 	 */
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
 		orig_vec = bio_iter_iovec(bio_orig, orig_iter);
-		if (bvec->bv_page != orig_vec.bv_page) {
-			dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
-			mempool_free(bvec->bv_page, pool);
+		if (bvec_page(bvec) != bvec_page(&orig_vec)) {
+			dec_zone_page_state(bvec_page(bvec), NR_BOUNCE);
+			mempool_free(bvec_page(bvec), pool);
 		}
 		bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
 	}
@@ -299,7 +299,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	bio_for_each_segment(from, *bio_orig, iter) {
 		if (i++ < BIO_MAX_PAGES)
 			sectors += from.bv_len >> 9;
-		if (page_to_pfn(from.bv_page) > q->limits.bounce_pfn)
+		if (page_to_pfn(bvec_page(&from)) > q->limits.bounce_pfn)
 			bounce = true;
 	}
 	if (!bounce)
@@ -320,20 +320,20 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	 * because the 'bio' is single-page bvec.
 	 */
 	for (i = 0, to = bio->bi_io_vec; i < bio->bi_vcnt; to++, i++) {
-		struct page *page = to->bv_page;
+		struct page *page = bvec_page(to);
 
 		if (page_to_pfn(page) <= q->limits.bounce_pfn)
 			continue;
 
-		to->bv_page = mempool_alloc(pool, q->bounce_gfp);
-		inc_zone_page_state(to->bv_page, NR_BOUNCE);
+		bvec_set_page(to, mempool_alloc(pool, q->bounce_gfp));
+		inc_zone_page_state(bvec_page(to), NR_BOUNCE);
 
 		if (rw == WRITE) {
 			char *vto, *vfrom;
 
 			flush_dcache_page(page);
 
-			vto = page_address(to->bv_page) + to->bv_offset;
+			vto = page_address(bvec_page(to)) + to->bv_offset;
 			vfrom = kmap_atomic(page) + to->bv_offset;
 			memcpy(vto, vfrom, to->bv_len);
 			kunmap_atomic(vfrom);
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 62aed77d0bb9..b0894d2012ff 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -221,7 +221,7 @@ void t10_pi_prepare(struct request *rq, u8 protection_type)
 			void *p, *pmap;
 			unsigned int j;
 
-			pmap = kmap_atomic(iv.bv_page);
+			pmap = kmap_atomic(bvec_page(&iv));
 			p = pmap + iv.bv_offset;
 			for (j = 0; j < iv.bv_len; j += tuple_sz) {
 				struct t10_pi_tuple *pi = p;
@@ -276,7 +276,7 @@ void t10_pi_complete(struct request *rq, u8 protection_type,
 			void *p, *pmap;
 			unsigned int j;
 
-			pmap = kmap_atomic(iv.bv_page);
+			pmap = kmap_atomic(bvec_page(&iv));
 			p = pmap + iv.bv_offset;
 			for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
 				struct t10_pi_tuple *pi = p;
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 3cf9bc5d8d95..b73af6e22b90 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -300,7 +300,7 @@ skb_fillup(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter)
 	struct bio_vec bv;
 
 	__bio_for_each_segment(bv, bio, iter, iter)
-		skb_fill_page_desc(skb, frag++, bv.bv_page,
+		skb_fill_page_desc(skb, frag++, bvec_page(&bv),
 				   bv.bv_offset, bv.bv_len);
 }
 
@@ -1028,7 +1028,7 @@ bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter iter, long cnt)
 	iter.bi_size = cnt;
 
 	__bio_for_each_segment(bv, bio, iter, iter) {
-		char *p = kmap_atomic(bv.bv_page) + bv.bv_offset;
+		char *p = kmap_atomic(bvec_page(&bv)) + bv.bv_offset;
 		skb_copy_bits(skb, soff, p, bv.bv_len);
 		kunmap_atomic(p);
 		soff += bv.bv_len;
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c18586fccb6f..bf64e7bbe5ab 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -295,7 +295,7 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
 		unsigned int len = bvec.bv_len;
 		int err;
 
-		err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
+		err = brd_do_bvec(brd, bvec_page(&bvec), len, bvec.bv_offset,
 				  bio_op(bio), sector);
 		if (err)
 			goto io_error;
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 11a85b740327..e567bc234781 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -977,7 +977,7 @@ static void drbd_bm_endio(struct bio *bio)
 	bm_page_unlock_io(device, idx);
 
 	if (ctx->flags & BM_AIO_COPY_PAGES)
-		mempool_free(bio->bi_io_vec[0].bv_page, &drbd_md_io_page_pool);
+		mempool_free(bvec_page(bio->bi_io_vec), &drbd_md_io_page_pool);
 
 	bio_put(bio);
 
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 714eb64fabfd..02d2e087226f 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1605,7 +1605,7 @@ static int _drbd_send_bio(struct drbd_peer_device *peer_device, struct bio *bio)
 	bio_for_each_segment(bvec, bio, iter) {
 		int err;
 
-		err = _drbd_no_send_page(peer_device, bvec.bv_page,
+		err = _drbd_no_send_page(peer_device, bvec_page(&bvec),
 					 bvec.bv_offset, bvec.bv_len,
 					 bio_iter_last(bvec, iter)
 					 ? 0 : MSG_MORE);
@@ -1627,7 +1627,7 @@ static int _drbd_send_zc_bio(struct drbd_peer_device *peer_device, struct bio *b
 	bio_for_each_segment(bvec, bio, iter) {
 		int err;
 
-		err = _drbd_send_page(peer_device, bvec.bv_page,
+		err = _drbd_send_page(peer_device, bvec_page(&bvec),
 				      bvec.bv_offset, bvec.bv_len,
 				      bio_iter_last(bvec, iter) ? 0 : MSG_MORE);
 		if (err)
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index c7ad88d91a09..ee7c77445456 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -2044,10 +2044,10 @@ static int recv_dless_read(struct drbd_peer_device *peer_device, struct drbd_req
 	D_ASSERT(peer_device->device, sector == bio->bi_iter.bi_sector);
 
 	bio_for_each_segment(bvec, bio, iter) {
-		void *mapped = kmap(bvec.bv_page) + bvec.bv_offset;
+		void *mapped = kmap(bvec_page(&bvec)) + bvec.bv_offset;
 		expect = min_t(int, data_size, bvec.bv_len);
 		err = drbd_recv_all_warn(peer_device->connection, mapped, expect);
-		kunmap(bvec.bv_page);
+		kunmap(bvec_page(&bvec));
 		if (err)
 			return err;
 		data_size -= expect;
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 268ef0c5d4ab..2fa4304f07af 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -339,7 +339,7 @@ void drbd_csum_bio(struct crypto_shash *tfm, struct bio *bio, void *digest)
 	bio_for_each_segment(bvec, bio, iter) {
 		u8 *src;
 
-		src = kmap_atomic(bvec.bv_page);
+		src = kmap_atomic(bvec_page(&bvec));
 		crypto_shash_update(desc, src + bvec.bv_offset, bvec.bv_len);
 		kunmap_atomic(src);
 
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 95f608d1a098..6201106cb7e3 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -2372,7 +2372,7 @@ static int buffer_chain_size(void)
 	size = 0;
 
 	rq_for_each_segment(bv, current_req, iter) {
-		if (page_address(bv.bv_page) + bv.bv_offset != base + size)
+		if (page_address(bvec_page(&bv)) + bv.bv_offset != base + size)
 			break;
 
 		size += bv.bv_len;
@@ -2442,7 +2442,7 @@ static void copy_buffer(int ssize, int max_sector, int max_sector_2)
 		size = bv.bv_len;
 		SUPBOUND(size, remaining);
 
-		buffer = page_address(bv.bv_page) + bv.bv_offset;
+		buffer = page_address(bvec_page(&bv)) + bv.bv_offset;
 		if (dma_buffer + size >
 		    floppy_track_buffer + (max_buffer_sectors << 10) ||
 		    dma_buffer < floppy_track_buffer) {
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bf1c61cab8eb..d9fd8b2a6b14 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -321,12 +321,12 @@ static int lo_write_transfer(struct loop_device *lo, struct request *rq,
 		return -ENOMEM;
 
 	rq_for_each_segment(bvec, rq, iter) {
-		ret = lo_do_transfer(lo, WRITE, page, 0, bvec.bv_page,
-			bvec.bv_offset, bvec.bv_len, pos >> 9);
+		ret = lo_do_transfer(lo, WRITE, page, 0, bvec_page(&bvec),
+				     bvec.bv_offset, bvec.bv_len, pos >> 9);
 		if (unlikely(ret))
 			break;
 
-		b.bv_page = page;
+		bvec_set_page(&b, page);
 		b.bv_offset = 0;
 		b.bv_len = bvec.bv_len;
 		ret = lo_write_bvec(lo->lo_backing_file, &b, &pos);
@@ -352,7 +352,7 @@ static int lo_read_simple(struct loop_device *lo, struct request *rq,
 		if (len < 0)
 			return len;
 
-		flush_dcache_page(bvec.bv_page);
+		flush_dcache_page(bvec_page(&bvec));
 
 		if (len != bvec.bv_len) {
 			struct bio *bio;
@@ -384,7 +384,7 @@ static int lo_read_transfer(struct loop_device *lo, struct request *rq,
 	rq_for_each_segment(bvec, rq, iter) {
 		loff_t offset = pos;
 
-		b.bv_page = page;
+		bvec_set_page(&b, page);
 		b.bv_offset = 0;
 		b.bv_len = bvec.bv_len;
 
@@ -395,12 +395,12 @@ static int lo_read_transfer(struct loop_device *lo, struct request *rq,
 			goto out_free_page;
 		}
 
-		ret = lo_do_transfer(lo, READ, page, 0, bvec.bv_page,
-			bvec.bv_offset, len, offset >> 9);
+		ret = lo_do_transfer(lo, READ, page, 0, bvec_page(&bvec),
+				     bvec.bv_offset, len, offset >> 9);
 		if (ret)
 			goto out_free_page;
 
-		flush_dcache_page(bvec.bv_page);
+		flush_dcache_page(bvec_page(&bvec));
 
 		if (len != bvec.bv_len) {
 			struct bio *bio;
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 417a9f15c116..d917826108a4 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1067,7 +1067,8 @@ static int null_handle_rq(struct nullb_cmd *cmd)
 	spin_lock_irq(&nullb->lock);
 	rq_for_each_segment(bvec, rq, iter) {
 		len = bvec.bv_len;
-		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
+		err = null_transfer(nullb, bvec_page(&bvec), len,
+				     bvec.bv_offset,
 				     op_is_write(req_op(rq)), sector,
 				     req_op(rq) & REQ_FUA);
 		if (err) {
@@ -1102,7 +1103,8 @@ static int null_handle_bio(struct nullb_cmd *cmd)
 	spin_lock_irq(&nullb->lock);
 	bio_for_each_segment(bvec, bio, iter) {
 		len = bvec.bv_len;
-		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
+		err = null_transfer(nullb, bvec_page(&bvec), len,
+				     bvec.bv_offset,
 				     op_is_write(bio_op(bio)), sector,
 				     bio->bi_opf & REQ_FUA);
 		if (err) {
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 4e1d9b31f60c..da3e33ede30f 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -113,7 +113,7 @@ static void ps3disk_scatter_gather(struct ps3_storage_device *dev,
 		else
 			memcpy(buf, dev->bounce_buf+offset, size);
 		offset += size;
-		flush_kernel_dcache_page(bvec.bv_page);
+		flush_kernel_dcache_page(bvec_page(&bvec));
 		bvec_kunmap_irq(buf, &flags);
 		i++;
 	}
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c0c50816a10b..881f90a13472 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -547,7 +547,7 @@ static struct bio *ps3vram_do_bio(struct ps3_system_bus_device *dev,
 
 	bio_for_each_segment(bvec, bio, iter) {
 		/* PS3 is ppc64, so we don't handle highmem */
-		char *ptr = page_address(bvec.bv_page) + bvec.bv_offset;
+		char *ptr = page_address(bvec_page(&bvec)) + bvec.bv_offset;
 		size_t len = bvec.bv_len, retlen;
 
 		dev_dbg(&dev->core, "    %s %zu bytes at offset %llu\n", op,
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index aa3b82be5946..4cb84c2507f3 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1284,7 +1284,7 @@ static void zero_bvec(struct bio_vec *bv)
 
 	buf = bvec_kmap_irq(bv, &flags);
 	memset(buf, 0, bv->bv_len);
-	flush_dcache_page(bv->bv_page);
+	flush_dcache_page(bvec_page(bv));
 	bvec_kunmap_irq(buf, &flags);
 }
 
@@ -1587,8 +1587,8 @@ static void rbd_obj_request_destroy(struct kref *kref)
 	kfree(obj_request->img_extents);
 	if (obj_request->copyup_bvecs) {
 		for (i = 0; i < obj_request->copyup_bvec_count; i++) {
-			if (obj_request->copyup_bvecs[i].bv_page)
-				__free_page(obj_request->copyup_bvecs[i].bv_page);
+			if (bvec_page(&obj_request->copyup_bvecs[i]))
+				__free_page(bvec_page(&obj_request->copyup_bvecs[i]));
 		}
 		kfree(obj_request->copyup_bvecs);
 	}
@@ -2595,8 +2595,8 @@ static int setup_copyup_bvecs(struct rbd_obj_request *obj_req, u64 obj_overlap)
 	for (i = 0; i < obj_req->copyup_bvec_count; i++) {
 		unsigned int len = min(obj_overlap, (u64)PAGE_SIZE);
 
-		obj_req->copyup_bvecs[i].bv_page = alloc_page(GFP_NOIO);
-		if (!obj_req->copyup_bvecs[i].bv_page)
+		bvec_set_page(&obj_req->copyup_bvecs[i], alloc_page(GFP_NOIO));
+		if (!bvec_page(&obj_req->copyup_bvecs[i]))
 			return -ENOMEM;
 
 		obj_req->copyup_bvecs[i].bv_offset = 0;
diff --git a/drivers/block/rsxx/dma.c b/drivers/block/rsxx/dma.c
index af9cf0215164..699fa8c02bac 100644
--- a/drivers/block/rsxx/dma.c
+++ b/drivers/block/rsxx/dma.c
@@ -737,7 +737,8 @@ blk_status_t rsxx_dma_queue_bio(struct rsxx_cardinfo *card,
 				st = rsxx_queue_dma(card, &dma_list[tgt],
 							bio_data_dir(bio),
 							dma_off, dma_len,
-							laddr, bvec.bv_page,
+							laddr,
+							bvec_page(&bvec),
 							bv_off, cb, cb_data);
 				if (st)
 					goto bvec_err;
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index aa035cf8a51d..ba093a7bef6a 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -364,7 +364,7 @@ static int add_bio(struct cardinfo *card)
 	vec = bio_iter_iovec(bio, card->current_iter);
 
 	dma_handle = dma_map_page(&card->dev->dev,
-				  vec.bv_page,
+				  bvec_page(&vec),
 				  vec.bv_offset,
 				  vec.bv_len,
 				  bio_op(bio) == REQ_OP_READ ?
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4bc083b7c9b5..20671b0ca92b 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -198,7 +198,7 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
 		n++;
 	}
 
-	req->special_vec.bv_page = virt_to_page(range);
+	bvec_set_page(&req->special_vec, virt_to_page(range));
 	req->special_vec.bv_offset = offset_in_page(range);
 	req->special_vec.bv_len = sizeof(*range) * segments;
 	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -211,7 +211,7 @@ static inline void virtblk_request_done(struct request *req)
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
 
 	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
-		kfree(page_address(req->special_vec.bv_page) +
+		kfree(page_address(bvec_page(&req->special_vec)) +
 		      req->special_vec.bv_offset);
 	}
 
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d58a359a6622..04fb864b16f5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -596,7 +596,7 @@ static int read_from_bdev_async(struct zram *zram, struct bio_vec *bvec,
 
 	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
 	bio_set_dev(bio, zram->bdev);
-	if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len, bvec->bv_offset)) {
+	if (!bio_add_page(bio, bvec_page(bvec), bvec->bv_len, bvec->bv_offset)) {
 		bio_put(bio);
 		return -EIO;
 	}
@@ -656,7 +656,7 @@ static ssize_t writeback_store(struct device *dev,
 	for (index = 0; index < nr_pages; index++) {
 		struct bio_vec bvec;
 
-		bvec.bv_page = page;
+		bvec_set_page(&bvec, page);
 		bvec.bv_len = PAGE_SIZE;
 		bvec.bv_offset = 0;
 
@@ -712,7 +712,7 @@ static ssize_t writeback_store(struct device *dev,
 		bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
 		bio.bi_opf = REQ_OP_WRITE | REQ_SYNC;
 
-		bio_add_page(&bio, bvec.bv_page, bvec.bv_len,
+		bio_add_page(&bio, bvec_page(&bvec), bvec.bv_len,
 				bvec.bv_offset);
 		/*
 		 * XXX: A single page IO would be inefficient for write
@@ -1223,7 +1223,7 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 
 		zram_slot_unlock(zram, index);
 
-		bvec.bv_page = page;
+		bvec_set_page(&bvec, page);
 		bvec.bv_len = PAGE_SIZE;
 		bvec.bv_offset = 0;
 		return read_from_bdev(zram, &bvec,
@@ -1276,7 +1276,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 	int ret;
 	struct page *page;
 
-	page = bvec->bv_page;
+	page = bvec_page(bvec);
 	if (is_partial_io(bvec)) {
 		/* Use a temporary buffer to decompress the page */
 		page = alloc_page(GFP_NOIO|__GFP_HIGHMEM);
@@ -1289,7 +1289,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 		goto out;
 
 	if (is_partial_io(bvec)) {
-		void *dst = kmap_atomic(bvec->bv_page);
+		void *dst = kmap_atomic(bvec_page(bvec));
 		void *src = kmap_atomic(page);
 
 		memcpy(dst + bvec->bv_offset, src + offset, bvec->bv_len);
@@ -1312,7 +1312,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	unsigned int comp_len = 0;
 	void *src, *dst, *mem;
 	struct zcomp_strm *zstrm;
-	struct page *page = bvec->bv_page;
+	struct page *page = bvec_page(bvec);
 	unsigned long element = 0;
 	enum zram_pageflags flags = 0;
 
@@ -1442,13 +1442,13 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		if (ret)
 			goto out;
 
-		src = kmap_atomic(bvec->bv_page);
+		src = kmap_atomic(bvec_page(bvec));
 		dst = kmap_atomic(page);
 		memcpy(dst + offset, src + bvec->bv_offset, bvec->bv_len);
 		kunmap_atomic(dst);
 		kunmap_atomic(src);
 
-		vec.bv_page = page;
+		bvec_set_page(&vec, page);
 		vec.bv_len = PAGE_SIZE;
 		vec.bv_offset = 0;
 	}
@@ -1516,7 +1516,7 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 	if (!op_is_write(op)) {
 		atomic64_inc(&zram->stats.num_reads);
 		ret = zram_bvec_read(zram, bvec, index, offset, bio);
-		flush_dcache_page(bvec->bv_page);
+		flush_dcache_page(bvec_page(bvec));
 	} else {
 		atomic64_inc(&zram->stats.num_writes);
 		ret = zram_bvec_write(zram, bvec, index, offset, bio);
@@ -1643,7 +1643,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	index = sector >> SECTORS_PER_PAGE_SHIFT;
 	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
 
-	bv.bv_page = page;
+	bvec_set_page(&bv, page);
 	bv.bv_len = PAGE_SIZE;
 	bv.bv_offset = 0;
 
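[ zram builds single-page bio_vecs on the stack in several hunks above,
  each time as a bvec_set_page() call followed by bv_len/bv_offset
  assignments. Purely as an illustration of where the accessor
  conversion could lead, and not something this patch adds, such call
  sites could later collapse into one initializer:

	/* Hypothetical helper, not introduced by this series. */
	static inline void bvec_init_page(struct bio_vec *bv,
					  struct page *page,
					  unsigned int len,
					  unsigned int offset)
	{
		bvec_set_page(bv, page);
		bv->bv_len = len;
		bv->bv_offset = offset;
	}

  so that, for instance, zram_rw_page() above would reduce to a single
  bvec_init_page(&bv, page, PAGE_SIZE, 0) call. ]
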
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6ca868868fee..6ddb1e8a7223 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -330,7 +330,7 @@ void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,
 
 	for (i = off; i < nr_pages + off; i++) {
 		bv = bio->bi_io_vec[i];
-		mempool_free(bv.bv_page, &pblk->page_bio_pool);
+		mempool_free(bvec_page(&bv), &pblk->page_bio_pool);
 	}
 }
 
@@ -2188,8 +2188,8 @@ void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd)
 		/* We need to reuse last page of request (packed metadata)
 		 * in similar way as traditional oob metadata
 		 */
-		buffer = page_to_virt(
-			rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+		buffer = page_to_virt(
+			bvec_page(&rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1]));
 	}
 
 	return buffer;
@@ -2204,7 +2204,8 @@ void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
 	if (pblk_is_oob_meta_supported(pblk))
 		return;
 
-	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	page = page_to_virt(
+		bvec_page(&rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1]));
 	/* We need to fill oob meta buffer with data from packed metadata */
 	for (; i < rqd->nr_ppas; i++)
 		memcpy(pblk_get_meta(pblk, meta_list, i),
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 3789185144da..20486aa4b25e 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -270,8 +270,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		src_bv = new_bio->bi_io_vec[i++];
 		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
 
-		src_p = kmap_atomic(src_bv.bv_page);
-		dst_p = kmap_atomic(dst_bv.bv_page);
+		src_p = kmap_atomic(bvec_page(&src_bv));
+		dst_p = kmap_atomic(bvec_page(&dst_bv));
 
 		memcpy(dst_p + dst_bv.bv_offset,
 			src_p + src_bv.bv_offset,
@@ -280,7 +280,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		kunmap_atomic(src_p);
 		kunmap_atomic(dst_p);
 
-		mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
+		mempool_free(bvec_page(&src_bv), &pblk->page_bio_pool);
 
 		hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
 	} while (hole < nr_secs);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 8b123be05254..5ee5aa937589 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -127,11 +127,11 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 
 	citer.bi_size = UINT_MAX;
 	bio_for_each_segment(bv, bio, iter) {
-		void *p1 = kmap_atomic(bv.bv_page);
+		void *p1 = kmap_atomic(bvec_page(&bv));
 		void *p2;
 
 		cbv = bio_iter_iovec(check, citer);
-		p2 = page_address(cbv.bv_page);
+		p2 = page_address(bvec_page(&cbv));
 
 		cache_set_err_on(memcmp(p1 + bv.bv_offset,
 					p2 + bv.bv_offset,
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index f101bfe8657a..a9262f9e49ab 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -44,10 +44,10 @@ static void bio_csum(struct bio *bio, struct bkey *k)
 	uint64_t csum = 0;
 
 	bio_for_each_segment(bv, bio, iter) {
-		void *d = kmap(bv.bv_page) + bv.bv_offset;
+		void *d = kmap(bvec_page(&bv)) + bv.bv_offset;
 
 		csum = bch_crc64_update(csum, d, bv.bv_len);
-		kunmap(bv.bv_page);
+		kunmap(bvec_page(&bv));
 	}
 
 	k->ptr[KEY_PTRS(k)] = csum & (~0ULL >> 1);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index a697a3a923cd..7631065e193f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1293,7 +1293,7 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
 	dc->bdev->bd_holder = dc;
 
 	bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
-	bio_first_bvec_all(&dc->sb_bio)->bv_page = sb_page;
+	bvec_set_page(bio_first_bvec_all(&dc->sb_bio), sb_page);
 	get_page(sb_page);
 
 
@@ -2036,7 +2036,7 @@ void bch_cache_release(struct kobject *kobj)
 	for (i = 0; i < RESERVE_NR; i++)
 		free_fifo(&ca->free[i]);
 
-	if (ca->sb_bio.bi_inline_vecs[0].bv_page)
+	if (bvec_page(ca->sb_bio.bi_inline_vecs))
 		put_page(bio_first_page_all(&ca->sb_bio));
 
 	if (!IS_ERR_OR_NULL(ca->bdev))
@@ -2171,7 +2171,7 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
 	ca->bdev->bd_holder = ca;
 
 	bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
-	bio_first_bvec_all(&ca->sb_bio)->bv_page = sb_page;
+	bvec_set_page(bio_first_bvec_all(&ca->sb_bio), sb_page);
 	get_page(sb_page);
 
 	if (blk_queue_discard(bdev_get_queue(bdev)))
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 62fb917f7a4f..c28bf9162184 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -244,9 +244,8 @@ void bch_bio_map(struct bio *bio, void *base)
 start:		bv->bv_len	= min_t(size_t, PAGE_SIZE - bv->bv_offset,
 					size);
 		if (base) {
-			bv->bv_page = is_vmalloc_addr(base)
-				? vmalloc_to_page(base)
-				: virt_to_page(base);
+			bvec_set_page(bv, is_vmalloc_addr(base) ?
+				vmalloc_to_page(base) : virt_to_page(base));
 
 			base += bv->bv_len;
 		}
@@ -275,10 +274,10 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
 	 * bvec table directly.
 	 */
 	for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
-		bv->bv_page = alloc_page(gfp_mask);
-		if (!bv->bv_page) {
+		bvec_set_page(bv, alloc_page(gfp_mask));
+		if (!bvec_page(bv)) {
 			while (--bv >= bio->bi_io_vec)
-				__free_page(bv->bv_page);
+				__free_page(bvec_page(bv));
 			return -ENOMEM;
 		}
 	}
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index dd6565798778..ef7896c50814 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1107,13 +1107,15 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
 	sg_init_table(dmreq->sg_in, 4);
 	sg_set_buf(&dmreq->sg_in[0], sector, sizeof(uint64_t));
 	sg_set_buf(&dmreq->sg_in[1], org_iv, cc->iv_size);
-	sg_set_page(&dmreq->sg_in[2], bv_in.bv_page, cc->sector_size, bv_in.bv_offset);
+	sg_set_page(&dmreq->sg_in[2], bvec_page(&bv_in), cc->sector_size,
+		    bv_in.bv_offset);
 	sg_set_buf(&dmreq->sg_in[3], tag, cc->integrity_tag_size);
 
 	sg_init_table(dmreq->sg_out, 4);
 	sg_set_buf(&dmreq->sg_out[0], sector, sizeof(uint64_t));
 	sg_set_buf(&dmreq->sg_out[1], org_iv, cc->iv_size);
-	sg_set_page(&dmreq->sg_out[2], bv_out.bv_page, cc->sector_size, bv_out.bv_offset);
+	sg_set_page(&dmreq->sg_out[2], bvec_page(&bv_out), cc->sector_size,
+		    bv_out.bv_offset);
 	sg_set_buf(&dmreq->sg_out[3], tag, cc->integrity_tag_size);
 
 	if (cc->iv_gen_ops) {
@@ -1196,10 +1198,12 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	sg_out = &dmreq->sg_out[0];
 
 	sg_init_table(sg_in, 1);
-	sg_set_page(sg_in, bv_in.bv_page, cc->sector_size, bv_in.bv_offset);
+	sg_set_page(sg_in, bvec_page(&bv_in), cc->sector_size,
+		    bv_in.bv_offset);
 
 	sg_init_table(sg_out, 1);
-	sg_set_page(sg_out, bv_out.bv_page, cc->sector_size, bv_out.bv_offset);
+	sg_set_page(sg_out, bvec_page(&bv_out), cc->sector_size,
+		    bv_out.bv_offset);
 
 	if (cc->iv_gen_ops) {
 		/* For READs use IV stored in integrity metadata */
@@ -1450,8 +1454,8 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bv, clone, i, iter_all) {
-		BUG_ON(!bv->bv_page);
-		mempool_free(bv->bv_page, &cc->page_pool);
+		BUG_ON(!bvec_page(bv));
+		mempool_free(bvec_page(bv), &cc->page_pool);
 	}
 }
 
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index d57d997a52c8..3c6873303190 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -1352,7 +1352,7 @@ static void integrity_metadata(struct work_struct *w)
 			char *mem, *checksums_ptr;
 
 again:
-			mem = (char *)kmap_atomic(bv.bv_page) + bv.bv_offset;
+			mem = (char *)kmap_atomic(bvec_page(&bv)) + bv.bv_offset;
 			pos = 0;
 			checksums_ptr = checksums;
 			do {
@@ -1404,8 +1404,8 @@ static void integrity_metadata(struct work_struct *w)
 				unsigned char *tag;
 				unsigned this_len;
 
-				BUG_ON(PageHighMem(biv.bv_page));
-				tag = lowmem_page_address(biv.bv_page) + biv.bv_offset;
+				BUG_ON(PageHighMem(bvec_page(&biv)));
+				tag = lowmem_page_address(bvec_page(&biv)) + biv.bv_offset;
 				this_len = min(biv.bv_len, data_to_process);
 				r = dm_integrity_rw_tag(ic, tag, &dio->metadata_block, &dio->metadata_offset,
 							this_len, !dio->write ? TAG_READ : TAG_WRITE);
@@ -1525,9 +1525,9 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
 		n_sectors -= bv.bv_len >> SECTOR_SHIFT;
 		bio_advance_iter(bio, &bio->bi_iter, bv.bv_len);
 retry_kmap:
-		mem = kmap_atomic(bv.bv_page);
+		mem = kmap_atomic(bvec_page(&bv));
 		if (likely(dio->write))
-			flush_dcache_page(bv.bv_page);
+			flush_dcache_page(bvec_page(&bv));
 
 		do {
 			struct journal_entry *je = access_journal_entry(ic, journal_section, journal_entry);
@@ -1538,7 +1538,7 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
 				unsigned s;
 
 				if (unlikely(journal_entry_is_inprogress(je))) {
-					flush_dcache_page(bv.bv_page);
+					flush_dcache_page(bvec_page(&bv));
 					kunmap_atomic(mem);
 
 					__io_wait_event(ic->copy_to_journal_wait, !journal_entry_is_inprogress(je));
@@ -1577,8 +1577,8 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
 					struct bio_vec biv = bvec_iter_bvec(bip->bip_vec, bip->bip_iter);
 					unsigned tag_now = min(biv.bv_len, tag_todo);
 					char *tag_addr;
-					BUG_ON(PageHighMem(biv.bv_page));
-					tag_addr = lowmem_page_address(biv.bv_page) + biv.bv_offset;
+					BUG_ON(PageHighMem(bvec_page(&biv)));
+					tag_addr = lowmem_page_address(bvec_page(&biv)) + biv.bv_offset;
 					if (likely(dio->write))
 						memcpy(tag_ptr, tag_addr, tag_now);
 					else
@@ -1629,7 +1629,7 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
 		} while (bv.bv_len -= ic->sectors_per_block << SECTOR_SHIFT);
 
 		if (unlikely(!dio->write))
-			flush_dcache_page(bv.bv_page);
+			flush_dcache_page(bvec_page(&bv));
 		kunmap_atomic(mem);
 	} while (n_sectors);
 
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 81ffc59d05c9..81a346f9de17 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -211,7 +211,7 @@ static void bio_get_page(struct dpages *dp, struct page **p,
 	struct bio_vec bvec = bvec_iter_bvec((struct bio_vec *)dp->context_ptr,
 					     dp->context_bi);
 
-	*p = bvec.bv_page;
+	*p = bvec_page(&bvec);
 	*len = bvec.bv_len;
 	*offset = bvec.bv_offset;
 
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 9ea2b0291f20..e403fcb5c30a 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -190,8 +190,8 @@ static void free_pending_block(struct log_writes_c *lc,
 	int i;
 
 	for (i = 0; i < block->vec_cnt; i++) {
-		if (block->vecs[i].bv_page)
-			__free_page(block->vecs[i].bv_page);
+		if (bvec_page(&block->vecs[i]))
+			__free_page(bvec_page(&block->vecs[i]));
 	}
 	kfree(block->data);
 	kfree(block);
@@ -370,7 +370,7 @@ static int log_one_block(struct log_writes_c *lc,
 		 * The page offset is always 0 because we allocate a new page
 		 * for every bvec in the original bio for simplicity sake.
 		 */
-		ret = bio_add_page(bio, block->vecs[i].bv_page,
+		ret = bio_add_page(bio, bvec_page(&block->vecs[i]),
 				   block->vecs[i].bv_len, 0);
 		if (ret != block->vecs[i].bv_len) {
 			atomic_inc(&lc->io_blocks);
@@ -387,7 +387,7 @@ static int log_one_block(struct log_writes_c *lc,
 			bio->bi_private = lc;
 			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 
-			ret = bio_add_page(bio, block->vecs[i].bv_page,
+			ret = bio_add_page(bio, bvec_page(&block->vecs[i]),
 					   block->vecs[i].bv_len, 0);
 			if (ret != block->vecs[i].bv_len) {
 				DMERR("Couldn't add page on new bio?");
@@ -746,12 +746,12 @@ static int log_writes_map(struct dm_target *ti, struct bio *bio)
 			return DM_MAPIO_KILL;
 		}
 
-		src = kmap_atomic(bv.bv_page);
+		src = kmap_atomic(bvec_page(&bv));
 		dst = kmap_atomic(page);
 		memcpy(dst, src + bv.bv_offset, bv.bv_len);
 		kunmap_atomic(dst);
 		kunmap_atomic(src);
-		block->vecs[i].bv_page = page;
+		bvec_set_page(&block->vecs[i], page);
 		block->vecs[i].bv_len = bv.bv_len;
 		block->vec_cnt++;
 		i++;
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index f4c31ffaa88e..cafa738f3b57 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -388,7 +388,7 @@ static int verity_for_io_block(struct dm_verity *v, struct dm_verity_io *io,
 		 * until you consider the typical block size is 4,096B.
 		 * Going through this loops twice should be very rare.
 		 */
-		sg_set_page(&sg, bv.bv_page, len, bv.bv_offset);
+		sg_set_page(&sg, bvec_page(&bv), len, bv.bv_offset);
 		ahash_request_set_crypt(req, &sg, NULL, len);
 		r = crypto_wait_req(crypto_ahash_update(req), wait);
 
@@ -423,7 +423,7 @@ int verity_for_bv_block(struct dm_verity *v, struct dm_verity_io *io,
 		unsigned len;
 		struct bio_vec bv = bio_iter_iovec(bio, *iter);
 
-		page = kmap_atomic(bv.bv_page);
+		page = kmap_atomic(bvec_page(&bv));
 		len = bv.bv_len;
 
 		if (likely(len >= todo))
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c033bfcb209e..4913cdbd18eb 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1129,9 +1129,11 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 				 * must be preparing for prexor in rmw; read
 				 * the data into orig_page
 				 */
-				sh->dev[i].vec.bv_page = sh->dev[i].orig_page;
+				bvec_set_page(&sh->dev[i].vec,
+				              sh->dev[i].orig_page);
 			else
-				sh->dev[i].vec.bv_page = sh->dev[i].page;
+				bvec_set_page(&sh->dev[i].vec,
+				              sh->dev[i].page);
 			bi->bi_vcnt = 1;
 			bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
 			bi->bi_io_vec[0].bv_offset = 0;
@@ -1185,7 +1187,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 						  + rrdev->data_offset);
 			if (test_bit(R5_SkipCopy, &sh->dev[i].flags))
 				WARN_ON(test_bit(R5_UPTODATE, &sh->dev[i].flags));
-			sh->dev[i].rvec.bv_page = sh->dev[i].page;
+			bvec_set_page(&sh->dev[i].rvec, sh->dev[i].page);
 			rbi->bi_vcnt = 1;
 			rbi->bi_io_vec[0].bv_len = STRIPE_SIZE;
 			rbi->bi_io_vec[0].bv_offset = 0;
@@ -1267,7 +1269,7 @@ async_copy_data(int frombio, struct bio *bio, struct page **page,
 
 		if (clen > 0) {
 			b_offset += bvl.bv_offset;
-			bio_page = bvl.bv_page;
+			bio_page = bvec_page(&bvl);
 			if (frombio) {
 				if (sh->raid_conf->skip_copy &&
 				    b_offset == 0 && page_offset == 0 &&
diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index db45c6bbb7bb..61ddf27726d1 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -97,7 +97,7 @@ static int nd_blk_rw_integrity(struct nd_namespace_blk *nsblk,
 		 */
 
 		cur_len = min(len, bv.bv_len);
-		iobuf = kmap_atomic(bv.bv_page);
+		iobuf = kmap_atomic(bvec_page(&bv));
 		err = ndbr->do_io(ndbr, dev_offset, iobuf + bv.bv_offset,
 				cur_len, rw);
 		kunmap_atomic(iobuf);
@@ -191,8 +191,8 @@ static blk_qc_t nd_blk_make_request(struct request_queue *q, struct bio *bio)
 		unsigned int len = bvec.bv_len;
 
 		BUG_ON(len > PAGE_SIZE);
-		err = nsblk_do_bvec(nsblk, bip, bvec.bv_page, len,
-				bvec.bv_offset, rw, iter.bi_sector);
+		err = nsblk_do_bvec(nsblk, bip, bvec_page(&bvec), len,
+				    bvec.bv_offset, rw, iter.bi_sector);
 		if (err) {
 			dev_dbg(&nsblk->common.dev,
 					"io error in %s sector %lld, len %d,\n",
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 4671776f5623..f3e7be5bdb25 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1171,7 +1171,7 @@ static int btt_rw_integrity(struct btt *btt, struct bio_integrity_payload *bip,
 		 */
 
 		cur_len = min(len, bv.bv_len);
-		mem = kmap_atomic(bv.bv_page);
+		mem = kmap_atomic(bvec_page(&bv));
 		if (rw)
 			ret = arena_write_bytes(arena, meta_nsoff,
 					mem + bv.bv_offset, cur_len,
@@ -1472,7 +1472,8 @@ static blk_qc_t btt_make_request(struct request_queue *q, struct bio *bio)
 			break;
 		}
 
-		err = btt_do_bvec(btt, bip, bvec.bv_page, len, bvec.bv_offset,
+		err = btt_do_bvec(btt, bip, bvec_page(&bvec), len,
+				  bvec.bv_offset,
 				  bio_op(bio), iter.bi_sector);
 		if (err) {
 			dev_err(&btt->nd_btt->dev,
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..04a6932fdd69 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -205,8 +205,8 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
-		rc = pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len,
-				bvec.bv_offset, bio_op(bio), iter.bi_sector);
+		rc = pmem_do_bvec(pmem, bvec_page(&bvec), bvec.bv_len,
+				  bvec.bv_offset, bio_op(bio), iter.bi_sector);
 		if (rc) {
 			bio->bi_status = rc;
 			break;
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 470601980794..942287e38b14 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -600,7 +600,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	cmnd->dsm.nr = cpu_to_le32(segments - 1);
 	cmnd->dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);
 
-	req->special_vec.bv_page = virt_to_page(range);
+	bvec_set_page(&req->special_vec, virt_to_page(range));
 	req->special_vec.bv_offset = offset_in_page(range);
 	req->special_vec.bv_len = sizeof(*range) * segments;
 	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -691,7 +691,7 @@ void nvme_cleanup_cmd(struct request *req)
 	}
 	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
 		struct nvme_ns *ns = req->rq_disk->private_data;
-		struct page *page = req->special_vec.bv_page;
+		struct page *page = bvec_page(&req->special_vec);
 
 		if (page == ns->ctrl->discard_page)
 			clear_bit_unlock(0, &ns->ctrl->discard_page_busy);
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 68c49dd67210..0023d564d9fd 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -175,7 +175,7 @@ static inline bool nvme_tcp_has_inline_data(struct nvme_tcp_request *req)
 
 static inline struct page *nvme_tcp_req_cur_page(struct nvme_tcp_request *req)
 {
-	return req->iter.bvec->bv_page;
+	return bvec_page(req->iter.bvec);
 }
 
 static inline size_t nvme_tcp_req_cur_offset(struct nvme_tcp_request *req)
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index bc6ebb51b0bf..24f95424d814 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -77,7 +77,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 
 static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
 {
-	bv->bv_page = sg_page(sg);
+	bvec_set_page(bv, sg_page(sg));
 	bv->bv_offset = sg->offset;
 	bv->bv_len = sg->length;
 }
diff --git a/drivers/s390/block/dasd_diag.c b/drivers/s390/block/dasd_diag.c
index e1fe02477ea8..65ee8b2a4953 100644
--- a/drivers/s390/block/dasd_diag.c
+++ b/drivers/s390/block/dasd_diag.c
@@ -546,7 +546,7 @@ static struct dasd_ccw_req *dasd_diag_build_cp(struct dasd_device *memdev,
 	dbio = dreq->bio;
 	recid = first_rec;
 	rq_for_each_segment(bv, req, iter) {
-		dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 		for (off = 0; off < bv.bv_len; off += blksize) {
 			memset(dbio, 0, sizeof (struct dasd_diag_bio));
 			dbio->type = rw_cmd;
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index 6e294b4d3635..35948cc5c618 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -3078,7 +3078,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_cmd_single(
 			/* Eckd can only do full blocks. */
 			return ERR_PTR(-EINVAL);
 		count += bv.bv_len >> (block->s2b_shift + 9);
-		if (idal_is_needed (page_address(bv.bv_page), bv.bv_len))
+		if (idal_is_needed (page_address(bvec_page(&bv)), bv.bv_len))
 			cidaw += bv.bv_len >> (block->s2b_shift + 9);
 	}
 	/* Paranoia. */
@@ -3149,7 +3149,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_cmd_single(
 			      last_rec - recid + 1, cmd, basedev, blksize);
 	}
 	rq_for_each_segment(bv, req, iter) {
-		dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 		if (dasd_page_cache) {
 			char *copy = kmem_cache_alloc(dasd_page_cache,
 						      GFP_DMA | __GFP_NOWARN);
@@ -3308,7 +3308,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_cmd_track(
 	idaw_dst = NULL;
 	idaw_len = 0;
 	rq_for_each_segment(bv, req, iter) {
-		dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 		seg_len = bv.bv_len;
 		while (seg_len) {
 			if (new_track) {
@@ -3646,7 +3646,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_tpm_track(
 		new_track = 1;
 		recid = first_rec;
 		rq_for_each_segment(bv, req, iter) {
-			dst = page_address(bv.bv_page) + bv.bv_offset;
+			dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 			seg_len = bv.bv_len;
 			while (seg_len) {
 				if (new_track) {
@@ -3679,7 +3679,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_tpm_track(
 		}
 	} else {
 		rq_for_each_segment(bv, req, iter) {
-			dst = page_address(bv.bv_page) + bv.bv_offset;
+			dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 			last_tidaw = itcw_add_tidaw(itcw, 0x00,
 						    dst, bv.bv_len);
 			if (IS_ERR(last_tidaw)) {
@@ -3907,7 +3907,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_raw(struct dasd_device *startdev,
 			idaws = idal_create_words(idaws, rawpadpage, PAGE_SIZE);
 	}
 	rq_for_each_segment(bv, req, iter) {
-		dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 		seg_len = bv.bv_len;
 		if (cmd == DASD_ECKD_CCW_READ_TRACK)
 			memset(dst, 0, seg_len);
@@ -3968,7 +3968,7 @@ dasd_eckd_free_cp(struct dasd_ccw_req *cqr, struct request *req)
 	if (private->uses_cdl == 0 || recid > 2*blk_per_trk)
 		ccw++;
 	rq_for_each_segment(bv, req, iter) {
-		dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 		for (off = 0; off < bv.bv_len; off += blksize) {
 			/* Skip locate record. */
 			if (private->uses_cdl && recid <= 2*blk_per_trk)
diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c
index 56007a3e7f11..2c65118fa15e 100644
--- a/drivers/s390/block/dasd_fba.c
+++ b/drivers/s390/block/dasd_fba.c
@@ -471,7 +471,7 @@ static struct dasd_ccw_req *dasd_fba_build_cp_regular(
 			/* Fba can only do full blocks. */
 			return ERR_PTR(-EINVAL);
 		count += bv.bv_len >> (block->s2b_shift + 9);
-		if (idal_is_needed (page_address(bv.bv_page), bv.bv_len))
+		if (idal_is_needed (page_address(bvec_page(&bv)), bv.bv_len))
 			cidaw += bv.bv_len / blksize;
 	}
 	/* Paranoia. */
@@ -509,7 +509,7 @@ static struct dasd_ccw_req *dasd_fba_build_cp_regular(
 	}
 	recid = first_rec;
 	rq_for_each_segment(bv, req, iter) {
-		dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 		if (dasd_page_cache) {
 			char *copy = kmem_cache_alloc(dasd_page_cache,
 						      GFP_DMA | __GFP_NOWARN);
@@ -591,7 +591,7 @@ dasd_fba_free_cp(struct dasd_ccw_req *cqr, struct request *req)
 	if (private->rdc_data.mode.bits.data_chain != 0)
 		ccw++;
 	rq_for_each_segment(bv, req, iter) {
-		dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = page_address(bvec_page(&bv)) + bv.bv_offset;
 		for (off = 0; off < bv.bv_len; off += blksize) {
 			/* Skip locate record. */
 			if (private->rdc_data.mode.bits.data_chain == 0)
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 4e8aedd50cb0..77228e43c415 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -893,7 +893,7 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
 	index = (bio->bi_iter.bi_sector >> 3);
 	bio_for_each_segment(bvec, bio, iter) {
 		page_addr = (unsigned long)
-			page_address(bvec.bv_page) + bvec.bv_offset;
+			page_address(bvec_page(&bvec)) + bvec.bv_offset;
 		source_addr = dev_info->start + (index<<12) + bytes_done;
 		if (unlikely((page_addr & 4095) != 0) || (bvec.bv_len & 4095) != 0)
 			// More paranoia.
diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c
index e01889394c84..ea9afb84ae5b 100644
--- a/drivers/s390/block/scm_blk.c
+++ b/drivers/s390/block/scm_blk.c
@@ -201,7 +201,7 @@ static int scm_request_prepare(struct scm_request *scmrq)
 	rq_for_each_segment(bv, req, iter) {
 		WARN_ON(bv.bv_offset);
 		msb->blk_count += bv.bv_len >> 12;
-		aidaw->data_addr = (u64) page_address(bv.bv_page);
+		aidaw->data_addr = (u64) page_address(bvec_page(&bv));
 		aidaw++;
 	}
 
diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
index 3df5d68d09f0..3f86f269e89e 100644
--- a/drivers/s390/block/xpram.c
+++ b/drivers/s390/block/xpram.c
@@ -205,7 +205,7 @@ static blk_qc_t xpram_make_request(struct request_queue *q, struct bio *bio)
 	index = (bio->bi_iter.bi_sector >> 3) + xdev->offset;
 	bio_for_each_segment(bvec, bio, iter) {
 		page_addr = (unsigned long)
-			kmap(bvec.bv_page) + bvec.bv_offset;
+			kmap(bvec_page(&bvec)) + bvec.bv_offset;
 		bytes = bvec.bv_len;
 		if ((page_addr & 4095) != 0 || (bytes & 4095) != 0)
 			/* More paranoia. */
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2b2bc4b49d78..c355aff34ebf 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -828,10 +828,11 @@ static blk_status_t sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
 	unsigned int data_len = 24;
 	char *buf;
 
-	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
-	if (!rq->special_vec.bv_page)
+	bvec_set_page(&rq->special_vec,
+		      mempool_alloc(sd_page_pool, GFP_ATOMIC));
+	if (!bvec_page(&rq->special_vec))
 		return BLK_STS_RESOURCE;
-	clear_highpage(rq->special_vec.bv_page);
+	clear_highpage(bvec_page(&rq->special_vec));
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -840,7 +841,7 @@ static blk_status_t sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
 	cmd->cmnd[0] = UNMAP;
 	cmd->cmnd[8] = 24;
 
-	buf = page_address(rq->special_vec.bv_page);
+	buf = page_address(bvec_page(&rq->special_vec));
 	put_unaligned_be16(6 + 16, &buf[0]);
 	put_unaligned_be16(16, &buf[2]);
 	put_unaligned_be64(lba, &buf[8]);
@@ -862,10 +863,11 @@ static blk_status_t sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd,
 	u32 nr_blocks = sectors_to_logical(sdp, blk_rq_sectors(rq));
 	u32 data_len = sdp->sector_size;
 
-	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
-	if (!rq->special_vec.bv_page)
+	bvec_set_page(&rq->special_vec,
+		      mempool_alloc(sd_page_pool, GFP_ATOMIC));
+	if (!bvec_page(&rq->special_vec))
 		return BLK_STS_RESOURCE;
-	clear_highpage(rq->special_vec.bv_page);
+	clear_highpage(bvec_page(&rq->special_vec));
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -893,10 +895,11 @@ static blk_status_t sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd,
 	u32 nr_blocks = sectors_to_logical(sdp, blk_rq_sectors(rq));
 	u32 data_len = sdp->sector_size;
 
-	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
-	if (!rq->special_vec.bv_page)
+	bvec_set_page(&rq->special_vec,
+		      mempool_alloc(sd_page_pool, GFP_ATOMIC));
+	if (!bvec_page(&rq->special_vec))
 		return BLK_STS_RESOURCE;
-	clear_highpage(rq->special_vec.bv_page);
+	clear_highpage(bvec_page(&rq->special_vec));
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -1304,7 +1307,7 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 	u8 *cmnd;
 
 	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
-		mempool_free(rq->special_vec.bv_page, sd_page_pool);
+		mempool_free(bvec_page(&rq->special_vec), sd_page_pool);
 
 	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
 		cmnd = SCpnt->cmnd;
diff --git a/drivers/staging/erofs/data.c b/drivers/staging/erofs/data.c
index 526e0dbea5b5..ba467ba414ff 100644
--- a/drivers/staging/erofs/data.c
+++ b/drivers/staging/erofs/data.c
@@ -23,7 +23,7 @@ static inline void read_endio(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 
 		/* page is already locked */
 		DBG_BUGON(PageUptodate(page));
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 31eef8395774..11aa0c6f1994 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -852,7 +852,7 @@ static inline void z_erofs_vle_read_endio(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 		bool cachemngd = false;
 
 		DBG_BUGON(PageUptodate(page));
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 49b110d1b972..9a3e3bc1101e 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -296,7 +296,7 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	}
 
 	for_each_sg(sgl, sg, sgl_nents, i) {
-		bvec[i].bv_page = sg_page(sg);
+		bvec_set_page(&bvec[i], sg_page(sg));
 		bvec[i].bv_len = sg->length;
 		bvec[i].bv_offset = sg->offset;
 
@@ -346,7 +346,7 @@ static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
 	}
 
 	for_each_sg(sgl, sg, sgl_nents, i) {
-		bvec[i].bv_page = sg_page(sg);
+		bvec_set_page(&bvec[i], sg_page(sg));
 		bvec[i].bv_len = sg->length;
 		bvec[i].bv_offset = sg->offset;
 
@@ -483,7 +483,7 @@ fd_execute_write_same(struct se_cmd *cmd)
 		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 
 	for (i = 0; i < nolb; i++) {
-		bvec[i].bv_page = sg_page(&cmd->t_data_sg[0]);
+		bvec_set_page(&bvec[i], sg_page(&cmd->t_data_sg[0]));
 		bvec[i].bv_len = cmd->t_data_sg[0].length;
 		bvec[i].bv_offset = cmd->t_data_sg[0].offset;
 
diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index f3fbb700f569..fed2a4883817 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -8,8 +8,8 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 			       const struct bio_vec *vec2)
 {
 #if XEN_PAGE_SIZE == PAGE_SIZE
-	unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
-	unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
+	unsigned long bfn1 = pfn_to_bfn(page_to_pfn(bvec_page(vec1)));
+	unsigned long bfn2 = pfn_to_bfn(page_to_pfn(bvec_page(vec2)));
 
 	return bfn1 + PFN_DOWN(vec1->bv_offset + vec1->bv_len) == bfn2;
 #else
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index b626b28f0ce9..5f581ba51a5a 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -172,7 +172,7 @@ static int v9fs_vfs_writepage_locked(struct page *page)
 	else
 		len = PAGE_SIZE;
 
-	bvec.bv_page = page;
+	bvec_set_page(&bvec, page);
 	bvec.bv_offset = 0;
 	bvec.bv_len = len;
 	iov_iter_bvec(&from, WRITE, &bvec, 1, len);
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 0b37867b5c20..af7cdb8accf4 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -521,7 +521,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 			size = req->remain;
 		call->bvec[0].bv_len = size;
 		call->bvec[0].bv_offset = req->offset;
-		call->bvec[0].bv_page = req->pages[req->index];
+		bvec_set_page(&call->bvec[0], req->pages[req->index]);
 		iov_iter_bvec(&call->iter, READ, call->bvec, 1, size);
 		ASSERTCMP(size, <=, PAGE_SIZE);
 
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 2c588f9bbbda..85caafeb9131 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -303,7 +303,7 @@ static void afs_load_bvec(struct afs_call *call, struct msghdr *msg,
 			to = call->last_to;
 			msg->msg_flags &= ~MSG_MORE;
 		}
-		bv[i].bv_page = pages[i];
+		bvec_set_page(&bv[i], pages[i]);
 		bv[i].bv_len = to - offset;
 		bv[i].bv_offset = offset;
 		bytes += to - offset;
@@ -349,7 +349,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 		ret = rxrpc_kernel_send_data(call->net->socket, call->rxcall, msg,
 					     bytes, afs_notify_end_request_tx);
 		for (loop = 0; loop < nr; loop++)
-			put_page(bv[loop].bv_page);
+			put_page(bvec_page(&bv[loop]));
 		if (ret < 0)
 			break;
 
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index 6e97a42d24d1..e05fb959b13e 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -567,7 +567,7 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 			size = req->remain;
 		call->bvec[0].bv_len = size;
 		call->bvec[0].bv_offset = req->offset;
-		call->bvec[0].bv_page = req->pages[req->index];
+		bvec_set_page(&call->bvec[0], req->pages[req->index]);
 		iov_iter_bvec(&call->iter, READ, call->bvec, 1, size);
 		ASSERTCMP(size, <=, PAGE_SIZE);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 78d3257435c0..7304fc309326 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -262,9 +262,9 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	__set_current_state(TASK_RUNNING);
 
 	bio_for_each_segment_all(bvec, &bio, i, iter_all) {
-		if (should_dirty && !PageCompound(bvec->bv_page))
-			set_page_dirty_lock(bvec->bv_page);
-		put_page(bvec->bv_page);
+		if (should_dirty && !PageCompound(bvec_page(bvec)))
+			set_page_dirty_lock(bvec_page(bvec));
+		put_page(bvec_page(bvec));
 	}
 
 	if (unlikely(bio.bi_status))
@@ -342,7 +342,7 @@ static void blkdev_bio_end_io(struct bio *bio)
 			int i;
 
 			bio_for_each_segment_all(bvec, bio, i, iter_all)
-				put_page(bvec->bv_page);
+				put_page(bvec_page(bvec));
 		}
 		bio_put(bio);
 	}
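
[ The __blkdev_direct_IO_simple() hunk above repeats the dirty-then-
  release idiom already converted in bio_set_pages_dirty() and
  bio_release_pages() earlier in this patch. As a sketch of the shared
  pattern only, not a helper this patch adds, the per-segment work
  amounts to:

	/* Illustration of the idiom; not added by this patch. */
	static inline void bvec_dirty_and_put(struct bio_vec *bv,
					      bool should_dirty)
	{
		struct page *page = bvec_page(bv);

		if (should_dirty && !PageCompound(page))
			set_page_dirty_lock(page);
		put_page(page);
	}

  __blkdev_direct_IO_simple() passes should_dirty, bio_release_pages()
  effectively passes false; both reduce to this shape. ]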
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index b0c8094528d1..c5ee3ac73930 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2824,7 +2824,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 
 		bio_for_each_segment(bvec, bio, iter) {
 			BUG_ON(bvec.bv_len != PAGE_SIZE);
-			mapped_datav[i] = kmap(bvec.bv_page);
+			mapped_datav[i] = kmap(bvec_page(&bvec));
 			i++;
 
 			if (dev_state->state->print_mask &
@@ -2838,7 +2838,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 					      bio, &bio_is_patched,
 					      NULL, bio->bi_opf);
 		bio_for_each_segment(bvec, bio, iter)
-			kunmap(bvec.bv_page);
+			kunmap(bvec_page(&bvec));
 		kfree(mapped_datav);
 	} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
 		if (dev_state->state->print_mask &
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 4f2a8ae0aa42..fcedb69c4d7a 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -170,7 +170,7 @@ static void end_compressed_bio_read(struct bio *bio)
 		 */
 		ASSERT(!bio_flagged(bio, BIO_CLONED));
 		bio_for_each_segment_all(bvec, cb->orig_bio, i, iter_all)
-			SetPageChecked(bvec->bv_page);
+			SetPageChecked(bvec_page(bvec));
 
 		bio_endio(cb->orig_bio);
 	}
@@ -398,7 +398,7 @@ static u64 bio_end_offset(struct bio *bio)
 {
 	struct bio_vec *last = bio_last_bvec_all(bio);
 
-	return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
+	return page_offset(bvec_page(last)) + last->bv_len + last->bv_offset;
 }
 
 static noinline int add_ra_bio_pages(struct inode *inode,
@@ -1105,7 +1105,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 	 * start byte is the first byte of the page we're currently
 	 * copying into relative to the start of the compressed data.
 	 */
-	start_byte = page_offset(bvec.bv_page) - disk_start;
+	start_byte = page_offset(bvec_page(&bvec)) - disk_start;
 
 	/* we haven't yet hit data corresponding to this page */
 	if (total_out <= start_byte)
@@ -1129,10 +1129,10 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 				PAGE_SIZE - buf_offset);
 		bytes = min(bytes, working_bytes);
 
-		kaddr = kmap_atomic(bvec.bv_page);
+		kaddr = kmap_atomic(bvec_page(&bvec));
 		memcpy(kaddr + bvec.bv_offset, buf + buf_offset, bytes);
 		kunmap_atomic(kaddr);
-		flush_dcache_page(bvec.bv_page);
+		flush_dcache_page(bvec_page(&bvec));
 
 		buf_offset += bytes;
 		working_bytes -= bytes;
@@ -1144,7 +1144,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 			return 0;
 		bvec = bio_iter_iovec(bio, bio->bi_iter);
 		prev_start_byte = start_byte;
-		start_byte = page_offset(bvec.bv_page) - disk_start;
+		start_byte = page_offset(bvec_page(&bvec)) - disk_start;
 
 		/*
 		 * We need to make sure we're only adjusting
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6fe9197f6ee4..490d734f73bc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -837,8 +837,8 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
-		ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
+		root = BTRFS_I(bvec_page(bvec)->mapping->host)->root;
+		ret = csum_dirty_buffer(root->fs_info, bvec_page(bvec));
 		if (ret)
 			break;
 	}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ca8b8e785cf3..7485910fdff0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -157,7 +157,7 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 	u64 start;
 
 	mp_bvec_last_segment(bvec, &bv);
-	start = page_offset(bv.bv_page) + bv.bv_offset;
+	start = page_offset(bvec_page(&bv)) + bv.bv_offset;
 
 	bio->bi_private = NULL;
 
@@ -2456,7 +2456,7 @@ static void end_bio_extent_writepage(struct bio *bio)
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 
@@ -2528,7 +2528,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		bool data_inode = btrfs_ino(BTRFS_I(inode))
@@ -3648,7 +3648,7 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 
 		eb = (struct extent_buffer *)page->private;
 		BUG_ON(!eb);
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 920bf3b4b0ef..419c70021617 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -208,7 +208,7 @@ static blk_status_t __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio
 			goto next;
 
 		if (!dio)
-			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
+			offset = page_offset(bvec_page(&bvec)) + bvec.bv_offset;
 		count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
 					       (u32 *)csum, nblocks);
 		if (count)
@@ -446,14 +446,14 @@ blk_status_t btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
 
 	bio_for_each_segment(bvec, bio, iter) {
 		if (!contig)
-			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
+			offset = page_offset(bvec_page(&bvec)) + bvec.bv_offset;
 
 		if (!ordered) {
 			ordered = btrfs_lookup_ordered_extent(inode, offset);
 			BUG_ON(!ordered); /* Logic error */
 		}
 
-		data = kmap_atomic(bvec.bv_page);
+		data = kmap_atomic(bvec_page(&bvec));
 
 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,
 						 bvec.bv_len + fs_info->sectorsize
@@ -483,7 +483,7 @@ blk_status_t btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
 					+ total_bytes;
 				index = 0;
 
-				data = kmap_atomic(bvec.bv_page);
+				data = kmap_atomic(bvec_page(&bvec));
 			}
 
 			sums->sums[index] = ~(u32)0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 82fdda8ff5ab..90e216d67b50 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7843,7 +7843,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, i, iter_all)
 		clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
-				 io_tree, done->start, bvec->bv_page,
+				 io_tree, done->start, bvec_page(bvec),
 				 btrfs_ino(BTRFS_I(inode)), 0);
 end:
 	complete(&done->done);
@@ -7880,10 +7880,10 @@ static blk_status_t __btrfs_correct_data_nocsum(struct inode *inode,
 		done.start = start;
 		init_completion(&done.done);
 
-		ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
-				pgoff, start, start + sectorsize - 1,
-				io_bio->mirror_num,
-				btrfs_retry_endio_nocsum, &done);
+		ret = dio_read_error(inode, &io_bio->bio, bvec_page(&bvec),
+				     pgoff, start, start + sectorsize - 1,
+				     io_bio->mirror_num,
+				     btrfs_retry_endio_nocsum, &done);
 		if (ret) {
 			err = ret;
 			goto next;
@@ -7935,13 +7935,14 @@ static void btrfs_retry_endio(struct bio *bio)
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		ret = __readpage_endio_check(inode, io_bio, i, bvec->bv_page,
+		ret = __readpage_endio_check(inode, io_bio, i,
+					     bvec_page(bvec),
 					     bvec->bv_offset, done->start,
 					     bvec->bv_len);
 		if (!ret)
 			clean_io_failure(BTRFS_I(inode)->root->fs_info,
 					 failure_tree, io_tree, done->start,
-					 bvec->bv_page,
+					 bvec_page(bvec),
 					 btrfs_ino(BTRFS_I(inode)),
 					 bvec->bv_offset);
 		else
@@ -7987,7 +7988,8 @@ static blk_status_t __btrfs_subio_endio_read(struct inode *inode,
 		if (uptodate) {
 			csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset);
 			ret = __readpage_endio_check(inode, io_bio, csum_pos,
-					bvec.bv_page, pgoff, start, sectorsize);
+					bvec_page(&bvec), pgoff, start,
+					sectorsize);
 			if (likely(!ret))
 				goto next;
 		}
@@ -7996,7 +7998,7 @@ static blk_status_t __btrfs_subio_endio_read(struct inode *inode,
 		done.start = start;
 		init_completion(&done.done);
 
-		status = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
+		status = dio_read_error(inode, &io_bio->bio, bvec_page(&bvec),
 					pgoff, start, start + sectorsize - 1,
 					io_bio->mirror_num, btrfs_retry_endio,
 					&done);
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 67a6f7d47402..f02532ef34f0 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1160,7 +1160,7 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 			bio->bi_iter = btrfs_io_bio(bio)->iter;
 
 		bio_for_each_segment(bvec, bio, iter) {
-			rbio->bio_pages[page_index + i] = bvec.bv_page;
+			rbio->bio_pages[page_index + i] = bvec_page(&bvec);
 			i++;
 		}
 	}
@@ -1448,7 +1448,7 @@ static void set_bio_pages_uptodate(struct bio *bio)
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all)
-		SetPageUptodate(bvec->bv_page);
+		SetPageUptodate(bvec_page(bvec));
 }
 
 /*
diff --git a/fs/buffer.c b/fs/buffer.c
index ce357602f471..91c4bfde03e5 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3043,7 +3043,7 @@ void guard_bio_eod(int op, struct bio *bio)
 		struct bio_vec bv;
 
 		mp_bvec_last_segment(bvec, &bv);
-		zero_user(bv.bv_page, bv.bv_offset + bv.bv_len,
+		zero_user(bvec_page(&bv), bv.bv_offset + bv.bv_len,
 				truncated_bytes);
 	}
 }
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index d3c8035335a2..5183f545b90a 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -160,10 +160,10 @@ static void put_bvecs(struct bio_vec *bvecs, int num_bvecs, bool should_dirty)
 	int i;
 
 	for (i = 0; i < num_bvecs; i++) {
-		if (bvecs[i].bv_page) {
+		if (bvec_page(&bvecs[i])) {
 			if (should_dirty)
-				set_page_dirty_lock(bvecs[i].bv_page);
-			put_page(bvecs[i].bv_page);
+				set_page_dirty_lock(bvec_page(&bvecs[i]));
+			put_page(bvec_page(&bvecs[i]));
 		}
 	}
 	kvfree(bvecs);
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 9bc0d17a9d77..4b6a6317f125 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -802,8 +802,8 @@ cifs_aio_ctx_release(struct kref *refcount)
 
 		for (i = 0; i < ctx->npages; i++) {
 			if (ctx->should_dirty)
-				set_page_dirty(ctx->bv[i].bv_page);
-			put_page(ctx->bv[i].bv_page);
+				set_page_dirty(bvec_page(&ctx->bv[i]));
+			put_page(bvec_page(&ctx->bv[i]));
 		}
 		kvfree(ctx->bv);
 	}
@@ -885,7 +885,7 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
 
 		for (i = 0; i < cur_npages; i++) {
 			len = rc > PAGE_SIZE ? PAGE_SIZE : rc;
-			bv[npages + i].bv_page = pages[i];
+			bvec_set_page(&bv[npages + i], pages[i]);
 			bv[npages + i].bv_offset = start;
 			bv[npages + i].bv_len = len - start;
 			rc -= len;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 00225e699d03..a61a16fb9d2f 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -3456,7 +3456,7 @@ init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size,
 		return -ENOMEM;
 
 	for (i = 0; i < npages; i++) {
-		bvec[i].bv_page = pages[i];
+		bvec_set_page(&bvec[i], pages[i]);
 		bvec[i].bv_offset = (i == 0) ? cur_off : 0;
 		bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size);
 		data_size -= bvec[i].bv_len;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index b943b74cd246..08658dab6ee7 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2070,7 +2070,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 		break;
 
 	case ITER_BVEC:
-		page = msg->msg_iter.bvec->bv_page;
+		page = bvec_page(msg->msg_iter.bvec);
 		page_offset = msg->msg_iter.bvec->bv_offset;
 		to_read = msg->msg_iter.bvec->bv_len;
 		rc = smbd_recv_page(info, page, page_offset, to_read);
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 1de8e996e566..1f18ae00fdcd 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -369,7 +369,7 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
 		for (i = 0; i < rqst[j].rq_npages; i++) {
 			struct bio_vec bvec;
 
-			bvec.bv_page = rqst[j].rq_pages[i];
+			bvec_set_page(&bvec, rqst[j].rq_pages[i]);
 			rqst_page_get_length(&rqst[j], i, &bvec.bv_len,
 					     &bvec.bv_offset);
 
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 5759bcd018cd..51763b09a11b 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -33,7 +33,7 @@ static void __fscrypt_decrypt_bio(struct bio *bio, bool done)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bv, bio, i, iter_all) {
-		struct page *page = bv->bv_page;
+		struct page *page = bvec_page(bv);
 		int ret = fscrypt_decrypt_page(page->mapping->host, page,
 				PAGE_SIZE, 0, page->index);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 52a18858e3e7..e9f3b79048ae 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -554,7 +554,7 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
 		struct bvec_iter_all iter_all;
 
 		bio_for_each_segment_all(bvec, bio, i, iter_all) {
-			struct page *page = bvec->bv_page;
+			struct page *page = bvec_page(bvec);
 
 			if (dio->op == REQ_OP_READ && !PageCompound(page) &&
 					dio->should_dirty)
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 3e9298e6a705..4cd321328c18 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -66,7 +66,7 @@ static void ext4_finish_bio(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 #ifdef CONFIG_FS_ENCRYPTION
 		struct page *data_page = NULL;
 #endif
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 3adadf461825..84222b89da52 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -83,7 +83,7 @@ static void mpage_end_io(struct bio *bio)
 		}
 	}
 	bio_for_each_segment_all(bv, bio, i, iter_all) {
-		struct page *page = bv->bv_page;
+		struct page *page = bvec_page(bv);
 
 		if (!bio->bi_status) {
 			SetPageUptodate(page);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9727944139f2..51bf04ba2599 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -90,7 +90,7 @@ static void __read_end_io(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bv, bio, i, iter_all) {
-		page = bv->bv_page;
+		page = bvec_page(bv);
 
 		/* PG_error was set if any post_read step failed */
 		if (bio->bi_status || PageError(page)) {
@@ -173,7 +173,7 @@ static void f2fs_write_end_io(struct bio *bio)
 	}
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 		enum count_type type = WB_DATA_TYPE(page);
 
 		if (IS_DUMMY_WRITTEN_PAGE(page)) {
@@ -360,10 +360,10 @@ static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode,
 
 	bio_for_each_segment_all(bvec, io->bio, i, iter_all) {
 
-		if (bvec->bv_page->mapping)
-			target = bvec->bv_page;
+		if (bvec_page(bvec)->mapping)
+			target = bvec_page(bvec);
 		else
-			target = fscrypt_control_page(bvec->bv_page);
+			target = fscrypt_control_page(bvec_page(bvec));
 
 		if (inode && inode == target->mapping->host)
 			return true;
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 8722c60b11fe..e0523ef8421e 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -173,7 +173,7 @@ static void gfs2_end_log_write_bh(struct gfs2_sbd *sdp,
 				  blk_status_t error)
 {
 	struct buffer_head *bh, *next;
-	struct page *page = bvec->bv_page;
+	struct page *page = bvec_page(bvec);
 	unsigned size;
 
 	bh = page_buffers(page);
@@ -217,7 +217,7 @@ static void gfs2_end_log_write(struct bio *bio)
 	}
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		page = bvec->bv_page;
+		page = bvec_page(bvec);
 		if (page_has_buffers(page))
 			gfs2_end_log_write_bh(sdp, bvec, bio->bi_status);
 		else
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 3201342404a7..a7e645d08942 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -193,7 +193,7 @@ static void gfs2_meta_read_endio(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec_page(bvec);
 		struct buffer_head *bh = page_buffers(page);
 		unsigned int len = bvec->bv_len;
 
diff --git a/fs/io_uring.c b/fs/io_uring.c
index bbdbd56cf2ac..32f4b4ddd20b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2346,7 +2346,7 @@ static int io_sqe_buffer_unregister(struct io_ring_ctx *ctx)
 		struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
 
 		for (j = 0; j < imu->nr_bvecs; j++)
-			put_page(imu->bvec[j].bv_page);
+			put_page(bvec_page(&imu->bvec[j]));
 
 		if (ctx->account_mem)
 			io_unaccount_mem(ctx->user, imu->nr_bvecs);
@@ -2504,7 +2504,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 			size_t vec_len;
 
 			vec_len = min_t(size_t, size, PAGE_SIZE - off);
-			imu->bvec[j].bv_page = pages[j];
+			bvec_set_page(&imu->bvec[j], pages[j]);
 			imu->bvec[j].bv_len = vec_len;
 			imu->bvec[j].bv_offset = off;
 			off = 0;
diff --git a/fs/iomap.c b/fs/iomap.c
index abdd18e404f8..ed5f249cf0d4 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -235,7 +235,7 @@ iomap_read_finish(struct iomap_page *iop, struct page *page)
 static void
 iomap_read_page_end_io(struct bio_vec *bvec, int error)
 {
-	struct page *page = bvec->bv_page;
+	struct page *page = bvec_page(bvec);
 	struct iomap_page *iop = to_iomap_page(page);
 
 	if (unlikely(error)) {
@@ -1595,7 +1595,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 			int i;
 
 			bio_for_each_segment_all(bvec, bio, i, iter_all)
-				put_page(bvec->bv_page);
+				put_page(bvec_page(bvec));
 		}
 		bio_put(bio);
 	}
diff --git a/fs/mpage.c b/fs/mpage.c
index 3f19da75178b..e234c9a8802d 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -51,7 +51,7 @@ static void mpage_end_io(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bv, bio, i, iter_all) {
-		struct page *page = bv->bv_page;
+		struct page *page = bvec_page(bv);
 		page_endio(page, bio_op(bio),
 			   blk_status_to_errno(bio->bi_status));
 	}
diff --git a/fs/splice.c b/fs/splice.c
index 3ee7e82df48f..4a0b522a0cb4 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -735,7 +735,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 				goto done;
 			}
 
-			array[n].bv_page = buf->page;
+			bvec_set_page(&array[n], buf->page);
 			array[n].bv_len = this_len;
 			array[n].bv_offset = buf->offset;
 			left -= this_len;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3619e9e8d359..d152d1ab2ad1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -66,10 +66,10 @@ xfs_finish_page_writeback(
 	struct bio_vec	*bvec,
 	int			error)
 {
-	struct iomap_page	*iop = to_iomap_page(bvec->bv_page);
+	struct iomap_page	*iop = to_iomap_page(bvec_page(bvec));
 
 	if (error) {
-		SetPageError(bvec->bv_page);
+		SetPageError(bvec_page(bvec));
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
@@ -77,7 +77,7 @@ xfs_finish_page_writeback(
 	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
-		end_page_writeback(bvec->bv_page);
+		end_page_writeback(bvec_page(bvec));
 }
 
 /*
diff --git a/include/linux/bio.h b/include/linux/bio.h
index bb6090aa165d..6ac4f6b192e6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -280,7 +280,7 @@ static inline struct bio_vec *bio_first_bvec_all(struct bio *bio)
 
 static inline struct page *bio_first_page_all(struct bio *bio)
 {
-	return bio_first_bvec_all(bio)->bv_page;
+	return bvec_page(bio_first_bvec_all(bio));
 }
 
 static inline struct bio_vec *bio_last_bvec_all(struct bio *bio)
@@ -544,7 +544,7 @@ static inline char *bvec_kmap_irq(struct bio_vec *bvec, unsigned long *flags)
 	 * balancing is a lot nicer this way
 	 */
 	local_irq_save(*flags);
-	addr = (unsigned long) kmap_atomic(bvec->bv_page);
+	addr = (unsigned long) kmap_atomic(bvec_page(bvec));
 
 	BUG_ON(addr & ~PAGE_MASK);
 
@@ -562,7 +562,7 @@ static inline void bvec_kunmap_irq(char *buffer, unsigned long *flags)
 #else
 static inline char *bvec_kmap_irq(struct bio_vec *bvec, unsigned long *flags)
 {
-	return page_address(bvec->bv_page) + bvec->bv_offset;
+	return page_address(bvec_page(bvec)) + bvec->bv_offset;
 }
 
 static inline void bvec_kunmap_irq(char *buffer, unsigned long *flags)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 8f8fb528ce53..d701cd968f13 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -157,7 +157,7 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
 
 static inline struct bio_vec *bvec_init_iter_all(struct bvec_iter_all *iter_all)
 {
-	iter_all->bv.bv_page = NULL;
+	bvec_set_page(&iter_all->bv, NULL);
 	iter_all->done = 0;
 
 	return &iter_all->bv;
@@ -168,11 +168,11 @@ static inline void mp_bvec_next_segment(const struct bio_vec *bvec,
 {
 	struct bio_vec *bv = &iter_all->bv;
 
-	if (bv->bv_page) {
-		bv->bv_page = nth_page(bv->bv_page, 1);
+	if (bvec_page(bv)) {
+		bvec_set_page(bv, nth_page(bvec_page(bv), 1));
 		bv->bv_offset = 0;
 	} else {
-		bv->bv_page = bvec->bv_page;
+		bvec_set_page(bv, bvec_page(bvec));
 		bv->bv_offset = bvec->bv_offset;
 	}
 	bv->bv_len = min_t(unsigned int, PAGE_SIZE - bv->bv_offset,
@@ -189,7 +189,7 @@ static inline void mp_bvec_last_segment(const struct bio_vec *bvec,
 	unsigned total = bvec->bv_offset + bvec->bv_len;
 	unsigned last_page = (total - 1) / PAGE_SIZE;
 
-	seg->bv_page = bvec_nth_page(bvec->bv_page, last_page);
+	bvec_set_page(seg, bvec_nth_page(bvec_page(bvec), last_page));
 
 	/* the whole segment is inside the last page */
 	if (bvec->bv_offset >= last_page * PAGE_SIZE) {
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 3e16187491d8..26eedf080df7 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -829,7 +829,7 @@ static struct page *ceph_msg_data_bio_next(struct ceph_msg_data_cursor *cursor,
 
 	*page_offset = bv.bv_offset;
 	*length = bv.bv_len;
-	return bv.bv_page;
+	return bvec_page(&bv);
 }
 
 static bool ceph_msg_data_bio_advance(struct ceph_msg_data_cursor *cursor,
@@ -890,7 +890,7 @@ static struct page *ceph_msg_data_bvecs_next(struct ceph_msg_data_cursor *cursor
 
 	*page_offset = bv.bv_offset;
 	*length = bv.bv_len;
-	return bv.bv_page;
+	return bvec_page(&bv);
 }
 
 static bool ceph_msg_data_bvecs_advance(struct ceph_msg_data_cursor *cursor,
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index aa8177ddcbda..93f1c6e2891b 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -148,7 +148,7 @@ xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp)
 		if (!buf->bvec)
 			return -ENOMEM;
 		for (i = 0; i < n; i++) {
-			buf->bvec[i].bv_page = buf->pages[i];
+			bvec_set_page(&buf->bvec[i], buf->pages[i]);
 			buf->bvec[i].bv_len = PAGE_SIZE;
 			buf->bvec[i].bv_offset = 0;
 		}
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 732d4b57411a..373c5a4bbc97 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -334,7 +334,7 @@ xs_alloc_sparse_pages(struct xdr_buf *buf, size_t want, gfp_t gfp)
 	for (i = 0; i < n; i++) {
 		if (buf->pages[i])
 			continue;
-		buf->bvec[i].bv_page = buf->pages[i] = alloc_page(gfp);
+		bvec_set_page(&buf->bvec[i], buf->pages[i] = alloc_page(gfp));
 		if (!buf->pages[i]) {
 			i *= PAGE_SIZE;
 			return i > buf->page_base ? i - buf->page_base : 0;
@@ -389,7 +389,7 @@ xs_flush_bvec(const struct bio_vec *bvec, size_t count, size_t seek)
 
 	bvec_iter_advance(bvec, &bi, seek & PAGE_MASK);
 	for_each_bvec(bv, bvec, bi, bi)
-		flush_dcache_page(bv.bv_page);
+		flush_dcache_page(bvec_page(&bv));
 }
 #else
 static inline void
-- 
2.20.1



* [PATCH v1 06/15] block: convert bio_vec.bv_page to bv_pfn to store pfn and not page
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (4 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 05/15] block: replace all bio_vec->bv_page by bvec_page()/bvec_set_page() jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 07/15] block: add bvec_put_page_dirty*() to replace put_page(bvec_page()) jglisse
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

To be able to store flags with each bio_vec, store the pfn value
instead of the page pointer. This leaves us with a couple of upper
bits we can later use for flags.
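
For illustration, a minimal sketch of the round trip through the new
representation, using the helpers this patch introduces (-1UL is the
sentinel that encodes a NULL page):

	struct bio_vec bv;

	bvec_set_page(&bv, page);	/* stores page_to_pfn(page) */
	/* ... */
	put_page(bvec_page(&bv));	/* pfn_to_page() recovers the page */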

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 Documentation/block/biodoc.txt |  7 +++++--
 include/linux/bvec.h           | 29 +++++++++++++++++++++--------
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index ac18b488cb5e..c673d4285781 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -410,7 +410,7 @@ mapped to bio structures.
 2.2 The bio struct
 
 The bio structure uses a vector representation pointing to an array of tuples
-of <page, offset, len> to describe the i/o buffer, and has various other
+of <pfn, offset, len> to describe the i/o buffer, and has various other
 fields describing i/o parameters and state that needs to be maintained for
 performing the i/o.
 
@@ -418,11 +418,14 @@ Notice that this representation means that a bio has no virtual address
 mapping at all (unlike buffer heads).
 
 struct bio_vec {
-       struct page     *bv_page;
+       unsigned long   bv_pfn;
        unsigned short  bv_len;
        unsigned short  bv_offset;
 };
 
+You should not access the bv_pfn field directly; use the helpers to get the
+corresponding struct page, as bv_pfn can encode more than just the page pfn.
+
 /*
  * main unit of I/O for the block layer and lower layers (ie drivers)
  */
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index d701cd968f13..ac84ac66a333 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -29,7 +29,7 @@
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
 struct bio_vec {
-	struct page	*bv_page;
+	unsigned long	bv_pfn;
 	unsigned int	bv_len;
 	unsigned int	bv_offset;
 };
@@ -51,14 +51,19 @@ struct bvec_iter_all {
 	unsigned	done;
 };
 
+static inline unsigned long page_to_bvec_pfn(struct page *page)
+{
+	return page ? page_to_pfn(page) : -1UL;
+}
+
 static inline struct page *bvec_page(const struct bio_vec *bvec)
 {
-	return bvec->bv_page;
+	return bvec->bv_pfn == -1UL ? NULL : pfn_to_page(bvec->bv_pfn);
 }
 
 static inline void bvec_set_page(struct bio_vec *bvec, struct page *page)
 {
-	bvec->bv_page = page;
+	bvec->bv_pfn = page_to_bvec_pfn(page);
 }
 
 static inline struct page *bvec_nth_page(struct page *page, int idx)
@@ -70,11 +75,15 @@ static inline struct page *bvec_nth_page(struct page *page, int idx)
  * various member access, note that bio_data should of course not be used
  * on highmem page vectors
  */
-#define BIO_VEC_INIT(p, l, o) {.bv_page = (p), .bv_len = (l), .bv_offset = (o)}
+#define BIO_VEC_INIT(p, l, o) {.bv_pfn = page_to_bvec_pfn(p), \
+				.bv_len = (l), .bv_offset = (o)}
 
 #define __bvec_iter_bvec(bvec, iter)	(&(bvec)[(iter).bi_idx])
 
 /* multi-page (mp_bvec) helpers */
+#define mp_bvec_iter_pfn(bvec, iter)				\
+	((__bvec_iter_bvec((bvec), (iter)))->bv_pfn)
+
 #define mp_bvec_iter_page(bvec, iter)				\
 	(bvec_page(__bvec_iter_bvec((bvec), (iter))))
 
@@ -90,7 +99,7 @@ static inline struct page *bvec_nth_page(struct page *page, int idx)
 
 #define mp_bvec_iter_bvec(bvec, iter)				\
 ((struct bio_vec) {						\
-	.bv_page	= mp_bvec_iter_page((bvec), (iter)),	\
+	.bv_pfn		= mp_bvec_iter_pfn((bvec), (iter)),	\
 	.bv_len		= mp_bvec_iter_len((bvec), (iter)),	\
 	.bv_offset	= mp_bvec_iter_offset((bvec), (iter)),	\
 })
@@ -100,16 +109,20 @@ static inline struct page *bvec_nth_page(struct page *page, int idx)
 	(mp_bvec_iter_offset((bvec), (iter)) % PAGE_SIZE)
 
 #define bvec_iter_len(bvec, iter)				\
-	min_t(unsigned, mp_bvec_iter_len((bvec), (iter)),		\
+	min_t(unsigned, mp_bvec_iter_len((bvec), (iter)),	\
 	      PAGE_SIZE - bvec_iter_offset((bvec), (iter)))
 
 #define bvec_iter_page(bvec, iter)				\
-	bvec_nth_page(mp_bvec_iter_page((bvec), (iter)),		\
+	bvec_nth_page(mp_bvec_iter_page((bvec), (iter)),	\
 		      mp_bvec_iter_page_idx((bvec), (iter)))
 
+#define bvec_iter_pfn(bvec, iter)				\
+	(mp_bvec_iter_pfn((bvec), (iter)) +			\
+	 mp_bvec_iter_page_idx((bvec), (iter)))
+
 #define bvec_iter_bvec(bvec, iter)				\
 ((struct bio_vec) {						\
-	.bv_page	= bvec_iter_page((bvec), (iter)),	\
+	.bv_pfn		= bvec_iter_pfn((bvec), (iter)),	\
 	.bv_len		= bvec_iter_len((bvec), (iter)),	\
 	.bv_offset	= bvec_iter_offset((bvec), (iter)),	\
 })
-- 
2.20.1



* [PATCH v1 07/15] block: add bvec_put_page_dirty*() to replace put_page(bvec_page())
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (5 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 06/15] block: convert bio_vec.bv_page to bv_pfn to store pfn and not page jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 08/15] block: use bvec_put_page() instead of put_page(bvec_page()) jglisse
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

For the page in a bio_vec we need to use the appropriate put function:
put_user_page() if the page reference was taken through GUP (any of the
get_user_page*() variants), or the regular put_page() otherwise.

To distinguish between the two we store a flag in the top bit of the pfn
value; on all architectures we have at least one bit available there.

We also take care of dirtiness, ie calling set_page_dirty*().
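
For illustration, a sketch of how the top bit separates the two cases
(helpers as added by this patch):

	bvec_set_page(&bv, page);	/* regular page, top bit clear */
	bvec_set_gup_page(&bv, page);	/* GUP page, bv_pfn |= BVEC_PFN_GUP */
	/* ... */
	bvec_put_page(&bv);		/* put_user_page() if BVEC_PFN_GUP is
					 * set, put_page() otherwise */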

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 include/linux/bvec.h | 52 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index ac84ac66a333..a1e464c708fb 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -20,6 +20,7 @@
 #ifndef __LINUX_BVEC_ITER_H
 #define __LINUX_BVEC_ITER_H
 
+#include <asm/bitsperlong.h>
 #include <linux/kernel.h>
 #include <linux/bug.h>
 #include <linux/errno.h>
@@ -34,6 +35,9 @@ struct bio_vec {
 	unsigned int	bv_offset;
 };
 
+#define BVEC_PFN_GUP (1UL << (BITS_PER_LONG - 1))
+#define BVEC_PFN_MASK (~BVEC_PFN_GUP)
+
 struct bvec_iter {
 	sector_t		bi_sector;	/* device address in 512 byte
 						   sectors */
@@ -58,7 +62,13 @@ static inline unsigned long page_to_bvec_pfn(struct page *page)
 
 static inline struct page *bvec_page(const struct bio_vec *bvec)
 {
-	return bvec->bv_pfn == -1UL ? NULL : pfn_to_page(bvec->bv_pfn);
+	return bvec->bv_pfn == -1UL ? NULL :
+		pfn_to_page(bvec->bv_pfn & BVEC_PFN_MASK);
+}
+
+static inline void bvec_set_gup_page(struct bio_vec *bvec, struct page *page)
+{
+	bvec->bv_pfn = page_to_bvec_pfn(page) | BVEC_PFN_GUP;
 }
 
 static inline void bvec_set_page(struct bio_vec *bvec, struct page *page)
@@ -71,6 +81,46 @@ static inline struct page *bvec_nth_page(struct page *page, int idx)
 	return idx == 0 ? page : nth_page(page, idx);
 }
 
+static inline void bvec_put_page(const struct bio_vec *bvec)
+{
+	struct page *page = bvec_page(bvec);
+
+	if (page == NULL)
+		return;
+
+	if (bvec->bv_pfn & BVEC_PFN_GUP)
+		put_user_page(page);
+	else
+		put_page(page);
+}
+
+static inline void bvec_put_page_dirty(const struct bio_vec *bvec, bool dirty)
+{
+	struct page *page = bvec_page(bvec);
+
+	if (page == NULL)
+		return;
+
+	if (dirty)
+		set_page_dirty(page);
+
+	bvec_put_page(bvec);
+}
+
+static inline void bvec_put_page_dirty_lock(const struct bio_vec *bvec,
+					    bool dirty)
+{
+	struct page *page = bvec_page(bvec);
+
+	if (page == NULL)
+		return;
+
+	if (dirty)
+		set_page_dirty_lock(page);
+
+	bvec_put_page(bvec);
+}
+
 /*
  * various member access, note that bio_data should of course not be used
  * on highmem page vectors
-- 
2.20.1



* [PATCH v1 08/15] block: use bvec_put_page() instead of put_page(bvec_page())
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (6 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 07/15] block: add bvec_put_page_dirty*() to replace put_page(bvec_page()) jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 09/15] block: bvec_put_page_dirty* instead of set_page_dirty* and bvec_put_page jglisse
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

Replace all put_page(bvec_page()) calls with bvec_put_page() so that the
proper put function is used (ie either put_page() or put_user_page());
an example of the transformation follows the semantic patch below.

This is done using a coccinelle patch and running it with:

spatch --sp-file spfile --in-place --dir .

with spfile:
%<---------------------------------------------------------------------
@exists@
expression E1;
@@
-put_page(bvec_page(E1));
+bvec_put_page(E1);
--------------------------------------------------------------------->%
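
For instance, a call site like

	put_page(bvec_page(bvec));

simply becomes

	bvec_put_page(bvec);

as the hunks below show; bvec_put_page() then picks put_page() or
put_user_page() based on the BVEC_PFN_GUP bit.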

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 block/bio.c    | 6 +++---
 fs/afs/rxrpc.c | 2 +-
 fs/block_dev.c | 4 ++--
 fs/ceph/file.c | 2 +-
 fs/cifs/misc.c | 2 +-
 fs/io_uring.c  | 2 +-
 fs/iomap.c     | 2 +-
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index c73ac2120ca0..b74b81085f3a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1433,7 +1433,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 
  out_unmap:
 	bio_for_each_segment_all(bvec, bio, j, iter_all) {
-		put_page(bvec_page(bvec));
+		bvec_put_page(bvec);
 	}
 	bio_put(bio);
 	return ERR_PTR(ret);
@@ -1452,7 +1452,7 @@ static void __bio_unmap_user(struct bio *bio)
 		if (bio_data_dir(bio) == READ)
 			set_page_dirty_lock(bvec_page(bvec));
 
-		put_page(bvec_page(bvec));
+		bvec_put_page(bvec);
 	}
 
 	bio_put(bio);
@@ -1666,7 +1666,7 @@ static void bio_release_pages(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, i, iter_all)
-		put_page(bvec_page(bvec));
+		bvec_put_page(bvec);
 }
 
 /*
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 85caafeb9131..08386ddf7185 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -349,7 +349,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 		ret = rxrpc_kernel_send_data(call->net->socket, call->rxcall, msg,
 					     bytes, afs_notify_end_request_tx);
 		for (loop = 0; loop < nr; loop++)
-			put_page(bvec_page(&bv[loop]));
+			bvec_put_page(&bv[loop]);
 		if (ret < 0)
 			break;
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7304fc309326..9761f7943774 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -264,7 +264,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	bio_for_each_segment_all(bvec, &bio, i, iter_all) {
 		if (should_dirty && !PageCompound(bvec_page(bvec)))
 			set_page_dirty_lock(bvec_page(bvec));
-		put_page(bvec_page(bvec));
+		bvec_put_page(bvec);
 	}
 
 	if (unlikely(bio.bi_status))
@@ -342,7 +342,7 @@ static void blkdev_bio_end_io(struct bio *bio)
 			int i;
 
 			bio_for_each_segment_all(bvec, bio, i, iter_all)
-				put_page(bvec_page(bvec));
+				bvec_put_page(bvec);
 		}
 		bio_put(bio);
 	}
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5183f545b90a..6a39347f4956 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -163,7 +163,7 @@ static void put_bvecs(struct bio_vec *bvecs, int num_bvecs, bool should_dirty)
 		if (bvec_page(&bvecs[i])) {
 			if (should_dirty)
 				set_page_dirty_lock(bvec_page(&bvecs[i]));
-			put_page(bvec_page(&bvecs[i]));
+			bvec_put_page(&bvecs[i]);
 		}
 	}
 	kvfree(bvecs);
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 4b6a6317f125..86d78f297526 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -803,7 +803,7 @@ cifs_aio_ctx_release(struct kref *refcount)
 		for (i = 0; i < ctx->npages; i++) {
 			if (ctx->should_dirty)
 				set_page_dirty(bvec_page(&ctx->bv[i]));
-			put_page(bvec_page(&ctx->bv[i]));
+			bvec_put_page(&ctx->bv[i]);
 		}
 		kvfree(ctx->bv);
 	}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 32f4b4ddd20b..349f0e58ee5c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2346,7 +2346,7 @@ static int io_sqe_buffer_unregister(struct io_ring_ctx *ctx)
 		struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
 
 		for (j = 0; j < imu->nr_bvecs; j++)
-			put_page(bvec_page(&imu->bvec[j]));
+			bvec_put_page(&imu->bvec[j]);
 
 		if (ctx->account_mem)
 			io_unaccount_mem(ctx->user, imu->nr_bvecs);
diff --git a/fs/iomap.c b/fs/iomap.c
index ed5f249cf0d4..ab578054ebe9 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1595,7 +1595,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 			int i;
 
 			bio_for_each_segment_all(bvec, bio, i, iter_all)
-				put_page(bvec_page(bvec));
+				bvec_put_page(bvec);
 		}
 		bio_put(bio);
 	}
-- 
2.20.1



* [PATCH v1 09/15] block: bvec_put_page_dirty* instead of set_page_dirty* and bvec_put_page
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (7 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 08/15] block: use bvec_put_page() instead of put_page(bvec_page()) jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() jglisse
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

Use bvec_put_page_dirty*() instead of set_page_dirty*() followed by a call
to bvec_put_page(). With this change we can use the proper put_user_page*()
helpers.
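
For instance, an end-of-IO path that did

	if (should_dirty)
		set_page_dirty_lock(bvec_page(bvec));
	bvec_put_page(bvec);

collapses to

	bvec_put_page_dirty_lock(bvec, should_dirty);

so the dirtying and the put happen in one helper that knows whether the
page came from GUP.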

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 block/bio.c    | 8 ++------
 fs/block_dev.c | 8 +++-----
 fs/ceph/file.c | 6 +-----
 fs/cifs/misc.c | 8 +++-----
 4 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index b74b81085f3a..efd254c90974 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1448,12 +1448,8 @@ static void __bio_unmap_user(struct bio *bio)
 	/*
 	 * make sure we dirty pages we wrote to
 	 */
-	bio_for_each_segment_all(bvec, bio, i, iter_all) {
-		if (bio_data_dir(bio) == READ)
-			set_page_dirty_lock(bvec_page(bvec));
-
-		bvec_put_page(bvec);
-	}
+	bio_for_each_segment_all(bvec, bio, i, iter_all)
+		bvec_put_page_dirty_lock(bvec, bio_data_dir(bio) == READ);
 
 	bio_put(bio);
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9761f7943774..16a17fae6694 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -261,11 +261,9 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	}
 	__set_current_state(TASK_RUNNING);
 
-	bio_for_each_segment_all(bvec, &bio, i, iter_all) {
-		if (should_dirty && !PageCompound(bvec_page(bvec)))
-			set_page_dirty_lock(bvec_page(bvec));
-		bvec_put_page(bvec);
-	}
+	bio_for_each_segment_all(bvec, &bio, i, iter_all)
+		bvec_put_page_dirty_lock(bvec, should_dirty &&
+				!PageCompound(bvec_page(bvec)));
 
 	if (unlikely(bio.bi_status))
 		ret = blk_status_to_errno(bio.bi_status);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 6a39347f4956..d5561662b902 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -160,11 +160,7 @@ static void put_bvecs(struct bio_vec *bvecs, int num_bvecs, bool should_dirty)
 	int i;
 
 	for (i = 0; i < num_bvecs; i++) {
-		if (bvec_page(&bvecs[i])) {
-			if (should_dirty)
-				set_page_dirty_lock(bvec_page(&bvecs[i]));
-			bvec_put_page(&bvecs[i]);
-		}
+		bvec_put_page_dirty_lock(&bvecs[i], should_dirty);
 	}
 	kvfree(bvecs);
 }
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 86d78f297526..bc77a4a5f1af 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -800,11 +800,9 @@ cifs_aio_ctx_release(struct kref *refcount)
 	if (ctx->bv) {
 		unsigned i;
 
-		for (i = 0; i < ctx->npages; i++) {
-			if (ctx->should_dirty)
-				set_page_dirty(bvec_page(&ctx->bv[i]));
-			bvec_put_page(&ctx->bv[i]);
-		}
+		for (i = 0; i < ctx->npages; i++)
+			bvec_put_page_dirty_lock(&ctx->bv[i],
+					  ctx->should_dirty);
 		kvfree(ctx->bv);
 	}
 
-- 
2.20.1



* [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (8 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 09/15] block: bvec_put_page_dirty* instead of set_page_dirty* and bvec_put_page jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-15 14:59   ` Jan Kara
  2019-04-11 21:08 ` [PATCH v1 11/15] block: make sure bio_add_page*() knows page that are coming from GUP jglisse
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

We want to keep track of how we got a reference on each page added to a
bio_vec, ie whether the page was referenced through GUP (get_user_page*)
or not. So add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
to that effect; an example of the conversion follows the semantic patch
below.

This is done using a coccinelle patch and running it with:

spatch --sp-file spfile --in-place --include-headers --dir .

with spfile:
%<---------------------------------------------------------------------
@@
identifier I1, I2, I3, I4;
@@
void __bio_add_page(struct bio *I1, struct page *I2, unsigned I3,
unsigned I4
+, bool is_gup
 ) {...}

@@
identifier I1, I2, I3, I4;
@@
void __bio_add_page(struct bio *I1, struct page *I2, unsigned I3,
unsigned I4
+, bool is_gup
 );

@@
identifier I1, I2, I3, I4;
@@
int bio_add_page(struct bio *I1, struct page *I2, unsigned I3,
unsigned I4
+, bool is_gup
 ) {...}

@@
@@
int bio_add_page(struct bio *, struct page *, unsigned, unsigned
+, bool is_gup
 );

@@
identifier I1, I2, I3, I4, I5;
@@
int bio_add_pc_page(struct request_queue *I1, struct bio *I2,
struct page *I3, unsigned I4, unsigned I5
+, bool is_gup
 ) {...}

@@
@@
int bio_add_pc_page(struct request_queue *, struct bio *,
struct page *, unsigned, unsigned
+, bool is_gup
 );

@@
expression E1, E2, E3, E4;
@@
__bio_add_page(E1, E2, E3, E4
+, false
 )

@@
expression E1, E2, E3, E4;
@@
bio_add_page(E1, E2, E3, E4
+, false
 )

@@
expression E1, E2, E3, E4, E5;
@@
bio_add_pc_page(E1, E2, E3, E4, E5
+, false
 )
--------------------------------------------------------------------->%
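
For instance, an existing call such as

	bio_add_page(bio, page, len, offset);

mechanically becomes

	bio_add_page(bio, page, len, offset, false);

here; call sites that actually add GUP pages are switched over
separately (see patch 11 in this series).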

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 block/bio.c                         | 20 ++++++++++----------
 block/blk-lib.c                     |  3 ++-
 drivers/block/drbd/drbd_actlog.c    |  2 +-
 drivers/block/drbd/drbd_bitmap.c    |  2 +-
 drivers/block/drbd/drbd_receiver.c  |  2 +-
 drivers/block/floppy.c              |  2 +-
 drivers/block/pktcdvd.c             |  4 ++--
 drivers/block/xen-blkback/blkback.c |  2 +-
 drivers/block/zram/zram_drv.c       |  4 ++--
 drivers/lightnvm/core.c             |  2 +-
 drivers/lightnvm/pblk-core.c        |  5 +++--
 drivers/lightnvm/pblk-rb.c          |  2 +-
 drivers/md/dm-bufio.c               |  2 +-
 drivers/md/dm-crypt.c               |  2 +-
 drivers/md/dm-io.c                  |  5 +++--
 drivers/md/dm-log-writes.c          |  8 ++++----
 drivers/md/dm-writecache.c          |  3 ++-
 drivers/md/dm-zoned-metadata.c      |  6 +++---
 drivers/md/md.c                     |  4 ++--
 drivers/md/raid1-10.c               |  2 +-
 drivers/md/raid1.c                  |  4 ++--
 drivers/md/raid10.c                 |  4 ++--
 drivers/md/raid5-cache.c            |  7 ++++---
 drivers/md/raid5-ppl.c              |  6 +++---
 drivers/nvme/target/io-cmd-bdev.c   |  2 +-
 drivers/staging/erofs/data.c        |  4 ++--
 drivers/staging/erofs/unzip_vle.c   |  2 +-
 drivers/target/target_core_iblock.c |  4 ++--
 drivers/target/target_core_pscsi.c  |  2 +-
 fs/btrfs/check-integrity.c          |  2 +-
 fs/btrfs/compression.c              | 10 +++++-----
 fs/btrfs/extent_io.c                |  8 ++++----
 fs/btrfs/raid56.c                   |  4 ++--
 fs/btrfs/scrub.c                    | 10 +++++-----
 fs/buffer.c                         |  2 +-
 fs/crypto/bio.c                     |  2 +-
 fs/direct-io.c                      |  2 +-
 fs/ext4/page-io.c                   |  2 +-
 fs/ext4/readpage.c                  |  2 +-
 fs/f2fs/data.c                      | 10 +++++-----
 fs/gfs2/lops.c                      |  4 ++--
 fs/gfs2/meta_io.c                   |  2 +-
 fs/gfs2/ops_fstype.c                |  2 +-
 fs/hfsplus/wrapper.c                |  3 ++-
 fs/iomap.c                          |  6 +++---
 fs/jfs/jfs_logmgr.c                 |  4 ++--
 fs/jfs/jfs_metapage.c               |  6 +++---
 fs/mpage.c                          |  4 ++--
 fs/nfs/blocklayout/blocklayout.c    |  2 +-
 fs/nilfs2/segbuf.c                  |  3 ++-
 fs/ocfs2/cluster/heartbeat.c        |  2 +-
 fs/xfs/xfs_aops.c                   |  2 +-
 fs/xfs/xfs_buf.c                    |  2 +-
 include/linux/bio.h                 |  7 ++++---
 kernel/power/swap.c                 |  2 +-
 mm/page_io.c                        |  2 +-
 56 files changed, 116 insertions(+), 108 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index efd254c90974..73227ede9a0a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -663,7 +663,7 @@ EXPORT_SYMBOL(bio_clone_fast);
  *	This should only be used by REQ_PC bios.
  */
 int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
-		    *page, unsigned int len, unsigned int offset)
+		    *page, unsigned int len, unsigned int offset, bool is_gup)
 {
 	int retried_segments = 0;
 	struct bio_vec *bvec;
@@ -798,7 +798,7 @@ EXPORT_SYMBOL_GPL(__bio_try_merge_page);
  * that @bio has space for another bvec.
  */
 void __bio_add_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int off)
+		unsigned int len, unsigned int off, bool is_gup)
 {
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];
 
@@ -825,12 +825,12 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
  *	if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
  */
 int bio_add_page(struct bio *bio, struct page *page,
-		 unsigned int len, unsigned int offset)
+		 unsigned int len, unsigned int offset, bool is_gup)
 {
 	if (!__bio_try_merge_page(bio, page, len, offset, false)) {
 		if (bio_full(bio))
 			return 0;
-		__bio_add_page(bio, page, len, offset);
+		__bio_add_page(bio, page, len, offset, false);
 	}
 	return len;
 }
@@ -847,7 +847,7 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
 
 	len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
 	size = bio_add_page(bio, bvec_page(bv), len,
-				bv->bv_offset + iter->iov_offset);
+				bv->bv_offset + iter->iov_offset, false);
 	if (size == len) {
 		if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
 			struct page *page;
@@ -902,7 +902,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		struct page *page = pages[i];
 
 		len = min_t(size_t, PAGE_SIZE - offset, left);
-		if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset) != len))
+		if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset, false) != len))
 			return -EINVAL;
 		offset = 0;
 	}
@@ -1298,7 +1298,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 			}
 		}
 
-		if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) {
+		if (bio_add_pc_page(q, bio, page, bytes, offset, false) < bytes) {
 			if (!map_data)
 				__free_page(page);
 			break;
@@ -1393,7 +1393,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 				if (n > bytes)
 					n = bytes;
 
-				if (!bio_add_pc_page(q, bio, page, n, offs))
+				if (!bio_add_pc_page(q, bio, page, n, offs, false))
 					break;
 
 				/*
@@ -1509,7 +1509,7 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
 			bytes = len;
 
 		if (bio_add_pc_page(q, bio, virt_to_page(data), bytes,
-				    offset) < bytes) {
+				    offset, false) < bytes) {
 			/* we don't support partial mappings */
 			bio_put(bio);
 			return ERR_PTR(-EINVAL);
@@ -1592,7 +1592,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
 		if (!reading)
 			memcpy(page_address(page), p, bytes);
 
-		if (bio_add_pc_page(q, bio, page, bytes, 0) < bytes)
+		if (bio_add_pc_page(q, bio, page, bytes, 0, false) < bytes)
 			break;
 
 		len -= bytes;
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 02a0b398566d..0ccb8ea980f5 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -289,7 +289,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
 
 		while (nr_sects != 0) {
 			sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
-			bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
+			bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0,
+					       false);
 			nr_sects -= bi_size >> 9;
 			sector += bi_size >> 9;
 			if (bi_size < sz)
diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 5f0eaee8c8a7..532c783667c2 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -154,7 +154,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
 	bio_set_dev(bio, bdev->md_bdev);
 	bio->bi_iter.bi_sector = sector;
 	err = -EIO;
-	if (bio_add_page(bio, device->md_io.page, size, 0) != size)
+	if (bio_add_page(bio, device->md_io.page, size, 0, false) != size)
 		goto out;
 	bio->bi_private = device;
 	bio->bi_end_io = drbd_md_endio;
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index e567bc234781..558c331342f1 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -1024,7 +1024,7 @@ static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, int page_nr) __must_ho
 	bio->bi_iter.bi_sector = on_disk_sector;
 	/* bio_add_page of a single page to an empty bio will always succeed,
 	 * according to api.  Do we want to assert that? */
-	bio_add_page(bio, page, len, 0);
+	bio_add_page(bio, page, len, 0, false);
 	bio->bi_private = ctx;
 	bio->bi_end_io = drbd_bm_endio;
 	bio_set_op_attrs(bio, op, 0);
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index ee7c77445456..802565c28905 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1716,7 +1716,7 @@ int drbd_submit_peer_request(struct drbd_device *device,
 
 	page_chain_for_each(page) {
 		unsigned len = min_t(unsigned, data_size, PAGE_SIZE);
-		if (!bio_add_page(bio, page, len, 0))
+		if (!bio_add_page(bio, page, len, 0, false))
 			goto next_bio;
 		data_size -= len;
 		sector += len >> 9;
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 6201106cb7e3..11e77f88ac39 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4131,7 +4131,7 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
 
 	bio_init(&bio, &bio_vec, 1);
 	bio_set_dev(&bio, bdev);
-	bio_add_page(&bio, page, size, 0);
+	bio_add_page(&bio, page, size, 0, false);
 
 	bio.bi_iter.bi_sector = 0;
 	bio.bi_flags |= (1 << BIO_QUIET);
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index f5a71023f76c..cb5b9b4a7091 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1037,7 +1037,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
 		pkt_dbg(2, pd, "Adding frame %d, page:%p offs:%d\n",
 			f, pkt->pages[p], offset);
-		if (!bio_add_page(bio, pkt->pages[p], CD_FRAMESIZE, offset))
+		if (!bio_add_page(bio, pkt->pages[p], CD_FRAMESIZE, offset, false))
 			BUG();
 
 		atomic_inc(&pkt->io_wait);
@@ -1277,7 +1277,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 		struct page *page = pkt->pages[(f * CD_FRAMESIZE) / PAGE_SIZE];
 		unsigned offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
 
-		if (!bio_add_page(pkt->w_bio, page, CD_FRAMESIZE, offset))
+		if (!bio_add_page(pkt->w_bio, page, CD_FRAMESIZE, offset, false))
 			BUG();
 	}
 	pkt_dbg(2, pd, "vcnt=%d\n", pkt->w_bio->bi_vcnt);
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..886e2e3202a7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1362,7 +1362,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 		       (bio_add_page(bio,
 				     pages[i]->page,
 				     seg[i].nsec << 9,
-				     seg[i].offset) == 0)) {
+				     seg[i].offset, false) == 0)) {
 
 			int nr_iovecs = min_t(int, (nseg-i), BIO_MAX_PAGES);
 			bio = bio_alloc(GFP_KERNEL, nr_iovecs);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 04fb864b16f5..a0734408db2f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -596,7 +596,7 @@ static int read_from_bdev_async(struct zram *zram, struct bio_vec *bvec,
 
 	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
 	bio_set_dev(bio, zram->bdev);
-	if (!bio_add_page(bio, bvec_page(bvec), bvec->bv_len, bvec->bv_offset)) {
+	if (!bio_add_page(bio, bvec_page(bvec), bvec->bv_len, bvec->bv_offset, false)) {
 		bio_put(bio);
 		return -EIO;
 	}
@@ -713,7 +713,7 @@ static ssize_t writeback_store(struct device *dev,
 		bio.bi_opf = REQ_OP_WRITE | REQ_SYNC;
 
 		bio_add_page(&bio, bvec_page(&bvec), bvec.bv_len,
-				bvec.bv_offset);
+				bvec.bv_offset, false);
 		/*
 		 * XXX: A single page IO would be inefficient for write
 		 * but it would be not bad as starter.
diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 5f82036fe322..cc08485dc36a 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -807,7 +807,7 @@ static int nvm_bb_chunk_sense(struct nvm_dev *dev, struct ppa_addr ppa)
 		return -ENOMEM;
 
 	bio_init(&bio, &bio_vec, 1);
-	bio_add_page(&bio, page, PAGE_SIZE, 0);
+	bio_add_page(&bio, page, PAGE_SIZE, 0, false);
 	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
 
 	rqd.bio = &bio;
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6ddb1e8a7223..2f374275b638 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -344,7 +344,8 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
 	for (i = 0; i < nr_pages; i++) {
 		page = mempool_alloc(&pblk->page_bio_pool, flags);
 
-		ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
+		ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0,
+				      false);
 		if (ret != PBLK_EXPOSED_PAGE_SIZE) {
 			pblk_err(pblk, "could not add page to bio\n");
 			mempool_free(page, &pblk->page_bio_pool);
@@ -605,7 +606,7 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 			goto out;
 		}
 
-		ret = bio_add_pc_page(dev->q, bio, page, PAGE_SIZE, 0);
+		ret = bio_add_pc_page(dev->q, bio, page, PAGE_SIZE, 0, false);
 		if (ret != PAGE_SIZE) {
 			pblk_err(pblk, "could not add page to bio\n");
 			bio_put(bio);
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 03c241b340ea..986d9d308176 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -596,7 +596,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
 			return NVM_IO_ERR;
 		}
 
-		if (bio_add_pc_page(q, bio, page, rb->seg_size, 0) !=
+		if (bio_add_pc_page(q, bio, page, rb->seg_size, 0, false) !=
 								rb->seg_size) {
 			pblk_err(pblk, "could not add page to write bio\n");
 			flags &= ~PBLK_WRITTEN_DATA;
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 1ecef76225a1..4c77e2a7c2d8 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -598,7 +598,7 @@ static void use_bio(struct dm_buffer *b, int rw, sector_t sector,
 	do {
 		unsigned this_step = min((unsigned)(PAGE_SIZE - offset_in_page(ptr)), len);
 		if (!bio_add_page(bio, virt_to_page(ptr), this_step,
-				  offset_in_page(ptr))) {
+				  offset_in_page(ptr), false)) {
 			bio_put(bio);
 			goto dmio;
 		}
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ef7896c50814..29006bdc6753 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1429,7 +1429,7 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
 
 		len = (remaining_size > PAGE_SIZE) ? PAGE_SIZE : remaining_size;
 
-		bio_add_page(clone, page, len, 0);
+		bio_add_page(clone, page, len, 0, false);
 
 		remaining_size -= len;
 	}
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 81a346f9de17..1d47565b49c3 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -361,7 +361,8 @@ static void do_region(int op, int op_flags, unsigned region,
 			 * WRITE SAME only uses a single page.
 			 */
 			dp->get_page(dp, &page, &len, &offset);
-			bio_add_page(bio, page, logical_block_size, offset);
+			bio_add_page(bio, page, logical_block_size, offset,
+				     false);
 			num_sectors = min_t(sector_t, special_cmd_max_sectors, remaining);
 			bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
 
@@ -374,7 +375,7 @@ static void do_region(int op, int op_flags, unsigned region,
 			 */
 			dp->get_page(dp, &page, &len, &offset);
 			len = min(len, to_bytes(remaining));
-			if (!bio_add_page(bio, page, len, offset))
+			if (!bio_add_page(bio, page, len, offset, false))
 				break;
 
 			offset = 0;
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index e403fcb5c30a..4d42de63c85e 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -234,7 +234,7 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
 	       lc->sectorsize - entrylen - datalen);
 	kunmap_atomic(ptr);
 
-	ret = bio_add_page(bio, page, lc->sectorsize, 0);
+	ret = bio_add_page(bio, page, lc->sectorsize, 0, false);
 	if (ret != lc->sectorsize) {
 		DMERR("Couldn't add page to the log block");
 		goto error_bio;
@@ -294,7 +294,7 @@ static int write_inline_data(struct log_writes_c *lc, void *entry,
 				memset(ptr + pg_datalen, 0, pg_sectorlen - pg_datalen);
 			kunmap_atomic(ptr);
 
-			ret = bio_add_page(bio, page, pg_sectorlen, 0);
+			ret = bio_add_page(bio, page, pg_sectorlen, 0, false);
 			if (ret != pg_sectorlen) {
 				DMERR("Couldn't add page of inline data");
 				__free_page(page);
@@ -371,7 +371,7 @@ static int log_one_block(struct log_writes_c *lc,
 		 * for every bvec in the original bio for simplicity sake.
 		 */
 		ret = bio_add_page(bio, bvec_page(&block->vecs[i]),
-				   block->vecs[i].bv_len, 0);
+				   block->vecs[i].bv_len, 0, false);
 		if (ret != block->vecs[i].bv_len) {
 			atomic_inc(&lc->io_blocks);
 			submit_bio(bio);
@@ -388,7 +388,7 @@ static int log_one_block(struct log_writes_c *lc,
 			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 
 			ret = bio_add_page(bio, bvec_page(&block->vecs[i]),
-					   block->vecs[i].bv_len, 0);
+					   block->vecs[i].bv_len, 0, false);
 			if (ret != block->vecs[i].bv_len) {
 				DMERR("Couldn't add page on new bio?");
 				bio_put(bio);
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index f7822875589e..2fff48b5479a 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -1440,7 +1440,8 @@ static bool wc_add_block(struct writeback_struct *wb, struct wc_entry *e, gfp_t
 
 	persistent_memory_flush_cache(address, block_size);
 	return bio_add_page(&wb->bio, persistent_memory_page(address),
-			    block_size, persistent_memory_page_offset(address)) != 0;
+			    block_size,
+			    persistent_memory_page_offset(address), false) != 0;
 }
 
 struct writeback_list {
diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index fa68336560c3..70fbf77bc396 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -438,7 +438,7 @@ static struct dmz_mblock *dmz_get_mblock_slow(struct dmz_metadata *zmd,
 	bio->bi_private = mblk;
 	bio->bi_end_io = dmz_mblock_bio_end_io;
 	bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO);
-	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
+	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0, false);
 	submit_bio(bio);
 
 	return mblk;
@@ -588,7 +588,7 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
 	bio->bi_private = mblk;
 	bio->bi_end_io = dmz_mblock_bio_end_io;
 	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
-	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
+	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0, false);
 	submit_bio(bio);
 }
 
@@ -608,7 +608,7 @@ static int dmz_rdwr_block(struct dmz_metadata *zmd, int op, sector_t block,
 	bio->bi_iter.bi_sector = dmz_blk2sect(block);
 	bio_set_dev(bio, zmd->dev->bdev);
 	bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO);
-	bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0);
+	bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0, false);
 	ret = submit_bio_wait(bio);
 	bio_put(bio);
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 05ffffb8b769..585016563ec1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -817,7 +817,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
 
 	bio_set_dev(bio, rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev);
 	bio->bi_iter.bi_sector = sector;
-	bio_add_page(bio, page, size, 0);
+	bio_add_page(bio, page, size, 0, false);
 	bio->bi_private = rdev;
 	bio->bi_end_io = super_written;
 
@@ -859,7 +859,7 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 		bio->bi_iter.bi_sector = sector + rdev->new_data_offset;
 	else
 		bio->bi_iter.bi_sector = sector + rdev->data_offset;
-	bio_add_page(bio, page, size, 0);
+	bio_add_page(bio, page, size, 0, false);
 
 	submit_bio_wait(bio);
 
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 400001b815db..f79c87b3d2bb 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -76,7 +76,7 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
 		 * won't fail because the vec table is big
 		 * enough to hold all these pages
 		 */
-		bio_add_page(bio, page, len, 0);
+		bio_add_page(bio, page, len, 0, false);
 		size -= len;
 	} while (idx++ < RESYNC_PAGES && size > 0);
 }
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index fdf451aac369..a9e736ef1b33 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1112,7 +1112,7 @@ static void alloc_behind_master_bio(struct r1bio *r1_bio,
 		if (unlikely(!page))
 			goto free_pages;
 
-		bio_add_page(behind_bio, page, len, 0);
+		bio_add_page(behind_bio, page, len, 0, false);
 
 		size -= len;
 		i++;
@@ -2854,7 +2854,7 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 				 * won't fail because the vec table is big
 				 * enough to hold all these pages
 				 */
-				bio_add_page(bio, page, len, 0);
+				bio_add_page(bio, page, len, 0, false);
 			}
 		}
 		nr_sectors += len>>9;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3b6880dd648d..e172fd3666d7 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3449,7 +3449,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			 * won't fail because the vec table is big enough
 			 * to hold all these pages
 			 */
-			bio_add_page(bio, page, len, 0);
+			bio_add_page(bio, page, len, 0, false);
 		}
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
@@ -4659,7 +4659,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 			 * won't fail because the vec table is big enough
 			 * to hold all these pages
 			 */
-			bio_add_page(bio, page, len, 0);
+			bio_add_page(bio, page, len, 0, false);
 		}
 		sector_nr += len >> 9;
 		nr_sectors += len >> 9;
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index cbbe6b6535be..b62806564760 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -804,7 +804,7 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
 	io->current_bio = r5l_bio_alloc(log);
 	io->current_bio->bi_end_io = r5l_log_endio;
 	io->current_bio->bi_private = io;
-	bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
+	bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0, false);
 
 	r5_reserve_log_entry(log, io);
 
@@ -864,7 +864,7 @@ static void r5l_append_payload_page(struct r5l_log *log, struct page *page)
 		io->need_split_bio = false;
 	}
 
-	if (!bio_add_page(io->current_bio, page, PAGE_SIZE, 0))
+	if (!bio_add_page(io->current_bio, page, PAGE_SIZE, 0, false))
 		BUG();
 
 	r5_reserve_log_entry(log, io);
@@ -1699,7 +1699,8 @@ static int r5l_recovery_fetch_ra_pool(struct r5l_log *log,
 
 	while (ctx->valid_pages < ctx->total_pages) {
 		bio_add_page(ctx->ra_bio,
-			     ctx->ra_pool[ctx->valid_pages], PAGE_SIZE, 0);
+			     ctx->ra_pool[ctx->valid_pages], PAGE_SIZE, 0,
+			     false);
 		ctx->valid_pages += 1;
 
 		offset = r5l_ring_add(log, offset, BLOCK_SECTORS);
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index 17e9e7d51097..12003f091465 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -476,7 +476,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
 	bio->bi_opf = REQ_OP_WRITE | REQ_FUA;
 	bio_set_dev(bio, log->rdev->bdev);
 	bio->bi_iter.bi_sector = log->next_io_sector;
-	bio_add_page(bio, io->header_page, PAGE_SIZE, 0);
+	bio_add_page(bio, io->header_page, PAGE_SIZE, 0, false);
 	bio->bi_write_hint = ppl_conf->write_hint;
 
 	pr_debug("%s: log->current_io_sector: %llu\n", __func__,
@@ -501,7 +501,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
 		if (test_bit(STRIPE_FULL_WRITE, &sh->state))
 			continue;
 
-		if (!bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0)) {
+		if (!bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0, false)) {
 			struct bio *prev = bio;
 
 			bio = bio_alloc_bioset(GFP_NOIO, BIO_MAX_PAGES,
@@ -510,7 +510,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
 			bio->bi_write_hint = prev->bi_write_hint;
 			bio_copy_dev(bio, prev);
 			bio->bi_iter.bi_sector = bio_end_sector(prev);
-			bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0);
+			bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0, false);
 
 			bio_chain(bio, prev);
 			ppl_submit_iounit_bio(io, prev);
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index a065dbfc43b1..6ba1fd806394 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -144,7 +144,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	bio_set_op_attrs(bio, op, op_flags);
 
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
-		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
+		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset, false)
 				!= sg->length) {
 			struct bio *prev = bio;
 
diff --git a/drivers/staging/erofs/data.c b/drivers/staging/erofs/data.c
index ba467ba414ff..4fb84db9d5b4 100644
--- a/drivers/staging/erofs/data.c
+++ b/drivers/staging/erofs/data.c
@@ -70,7 +70,7 @@ struct page *__erofs_get_meta_page(struct super_block *sb,
 			goto err_out;
 		}
 
-		err = bio_add_page(bio, page, PAGE_SIZE, 0);
+		err = bio_add_page(bio, page, PAGE_SIZE, 0, false);
 		if (unlikely(err != PAGE_SIZE)) {
 			err = -EFAULT;
 			goto err_out;
@@ -290,7 +290,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio,
 		}
 	}
 
-	err = bio_add_page(bio, page, PAGE_SIZE, 0);
+	err = bio_add_page(bio, page, PAGE_SIZE, 0, false);
 	/* out of the extent or bio is full */
 	if (err < PAGE_SIZE)
 		goto submit_bio_retry;
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 11aa0c6f1994..3cecd109324e 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -1453,7 +1453,7 @@ static bool z_erofs_vle_submit_all(struct super_block *sb,
 			++nr_bios;
 		}
 
-		err = bio_add_page(bio, page, PAGE_SIZE, 0);
+		err = bio_add_page(bio, page, PAGE_SIZE, 0, false);
 		if (err < PAGE_SIZE)
 			goto submit_bio_retry;
 
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index b5ed9c377060..9dc0d3712241 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -501,7 +501,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
 	refcount_set(&ibr->pending, 1);
 
 	while (sectors) {
-		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
+		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset, false)
 				!= sg->length) {
 
 			bio = iblock_get_bio(cmd, block_lba, 1, REQ_OP_WRITE,
@@ -753,7 +753,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 		 *	length of the S/G list entry this will cause and
 		 *	endless loop.  Better hope no driver uses huge pages.
 		 */
-		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
+		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset, false)
 				!= sg->length) {
 			if (cmd->prot_type && dev->dev_attrib.pi_prot_type) {
 				rc = iblock_alloc_bip(cmd, bio, &prot_miter);
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index b5388a106567..570ef259d78d 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -916,7 +916,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 				page, len, off);
 
 			rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
-					bio, page, bytes, off);
+					bio, page, bytes, off, false);
 			pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
 				bio_segments(bio), nr_vecs);
 			if (rc != bytes) {
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index c5ee3ac73930..d1bdddf3299a 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1633,7 +1633,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
 
 		for (j = i; j < num_pages; j++) {
 			ret = bio_add_page(bio, block_ctx->pagev[j],
-					   PAGE_SIZE, 0);
+					   PAGE_SIZE, 0, false);
 			if (PAGE_SIZE != ret)
 				break;
 		}
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index fcedb69c4d7a..3e28a0c01a60 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -337,7 +337,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 							  0);
 
 		page->mapping = NULL;
-		if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) <
+		if (submit || bio_add_page(bio, page, PAGE_SIZE, 0, false) <
 		    PAGE_SIZE) {
 			/*
 			 * inc the count before we submit the bio so
@@ -365,7 +365,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			bio->bi_opf = REQ_OP_WRITE | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
-			bio_add_page(bio, page, PAGE_SIZE, 0);
+			bio_add_page(bio, page, PAGE_SIZE, 0, false);
 		}
 		if (bytes_left < PAGE_SIZE) {
 			btrfs_info(fs_info,
@@ -491,7 +491,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		}
 
 		ret = bio_add_page(cb->orig_bio, page,
-				   PAGE_SIZE, 0);
+				   PAGE_SIZE, 0, false);
 
 		if (ret == PAGE_SIZE) {
 			nr_pages++;
@@ -616,7 +616,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 							  comp_bio, 0);
 
 		page->mapping = NULL;
-		if (submit || bio_add_page(comp_bio, page, PAGE_SIZE, 0) <
+		if (submit || bio_add_page(comp_bio, page, PAGE_SIZE, 0, false) <
 		    PAGE_SIZE) {
 			ret = btrfs_bio_wq_end_io(fs_info, comp_bio,
 						  BTRFS_WQ_ENDIO_DATA);
@@ -649,7 +649,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			comp_bio->bi_private = cb;
 			comp_bio->bi_end_io = end_compressed_bio_read;
 
-			bio_add_page(comp_bio, page, PAGE_SIZE, 0);
+			bio_add_page(comp_bio, page, PAGE_SIZE, 0, false);
 		}
 		cur_disk_byte += PAGE_SIZE;
 	}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7485910fdff0..e3ddfff82c12 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2042,7 +2042,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 	}
 	bio_set_dev(bio, dev->bdev);
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
-	bio_add_page(bio, page, length, pg_offset);
+	bio_add_page(bio, page, length, pg_offset, false);
 
 	if (btrfsic_submit_bio_wait(bio)) {
 		/* try to remap that extent elsewhere? */
@@ -2357,7 +2357,7 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 		       csum_size);
 	}
 
-	bio_add_page(bio, page, failrec->len, pg_offset);
+	bio_add_page(bio, page, failrec->len, pg_offset, false);
 
 	return bio;
 }
@@ -2775,7 +2775,7 @@ static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
 
 		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
 		    force_bio_submit ||
-		    bio_add_page(bio, page, page_size, pg_offset) < page_size) {
+		    bio_add_page(bio, page, page_size, pg_offset, false) < page_size) {
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;
@@ -2790,7 +2790,7 @@ static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
 	}
 
 	bio = btrfs_bio_alloc(bdev, offset);
-	bio_add_page(bio, page, page_size, pg_offset);
+	bio_add_page(bio, page, page_size, pg_offset, false);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
 	bio->bi_write_hint = page->mapping->host->i_write_hint;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index f02532ef34f0..5d2a3b8cf45c 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1097,7 +1097,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 		    !last->bi_status &&
 		    last->bi_disk == stripe->dev->bdev->bd_disk &&
 		    last->bi_partno == stripe->dev->bdev->bd_partno) {
-			ret = bio_add_page(last, page, PAGE_SIZE, 0);
+			ret = bio_add_page(last, page, PAGE_SIZE, 0, false);
 			if (ret == PAGE_SIZE)
 				return 0;
 		}
@@ -1109,7 +1109,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 	bio_set_dev(bio, stripe->dev->bdev);
 	bio->bi_iter.bi_sector = disk_start >> 9;
 
-	bio_add_page(bio, page, PAGE_SIZE, 0);
+	bio_add_page(bio, page, PAGE_SIZE, 0, false);
 	bio_list_add(bio_list, bio);
 	return 0;
 }
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a99588536c79..2b63d595e9f6 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1433,7 +1433,7 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
 		struct scrub_page *page = sblock->pagev[page_num];
 
 		WARN_ON(!page->page);
-		bio_add_page(bio, page->page, PAGE_SIZE, 0);
+		bio_add_page(bio, page->page, PAGE_SIZE, 0, false);
 	}
 
 	if (scrub_submit_raid56_bio_wait(fs_info, bio, first_page)) {
@@ -1486,7 +1486,7 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 		bio = btrfs_io_bio_alloc(1);
 		bio_set_dev(bio, page->dev->bdev);
 
-		bio_add_page(bio, page->page, PAGE_SIZE, 0);
+		bio_add_page(bio, page->page, PAGE_SIZE, 0, false);
 		bio->bi_iter.bi_sector = page->physical >> 9;
 		bio->bi_opf = REQ_OP_READ;
 
@@ -1569,7 +1569,7 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
 		bio->bi_iter.bi_sector = page_bad->physical >> 9;
 		bio->bi_opf = REQ_OP_WRITE;
 
-		ret = bio_add_page(bio, page_good->page, PAGE_SIZE, 0);
+		ret = bio_add_page(bio, page_good->page, PAGE_SIZE, 0, false);
 		if (PAGE_SIZE != ret) {
 			bio_put(bio);
 			return -EIO;
@@ -1670,7 +1670,7 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 		goto again;
 	}
 
-	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0);
+	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0, false);
 	if (ret != PAGE_SIZE) {
 		if (sbio->page_count < 1) {
 			bio_put(sbio->bio);
@@ -2071,7 +2071,7 @@ static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
 	}
 
 	sbio->pagev[sbio->page_count] = spage;
-	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0);
+	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0, false);
 	if (ret != PAGE_SIZE) {
 		if (sbio->page_count < 1) {
 			bio_put(sbio->bio);
diff --git a/fs/buffer.c b/fs/buffer.c
index 91c4bfde03e5..74aae2aa69c4 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3075,7 +3075,7 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 	bio_set_dev(bio, bh->b_bdev);
 	bio->bi_write_hint = write_hint;
 
-	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
+	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh), false);
 	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
 
 	bio->bi_end_io = end_bio_bh_io_sync;
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 51763b09a11b..604766e24a46 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -131,7 +131,7 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
 			pblk << (inode->i_sb->s_blocksize_bits - 9);
 		bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 		ret = bio_add_page(bio, ciphertext_page,
-					inode->i_sb->s_blocksize, 0);
+					inode->i_sb->s_blocksize, 0, false);
 		if (ret != inode->i_sb->s_blocksize) {
 			/* should never happen! */
 			WARN_ON(1);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e9f3b79048ae..b8b5d8e31aeb 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -761,7 +761,7 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
 	int ret;
 
 	ret = bio_add_page(sdio->bio, sdio->cur_page,
-			sdio->cur_page_len, sdio->cur_page_offset);
+			sdio->cur_page_len, sdio->cur_page_offset, false);
 	if (ret == sdio->cur_page_len) {
 		/*
 		 * Decrement count only, if we are done with this page
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4cd321328c18..a76ce3346705 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -402,7 +402,7 @@ static int io_submit_add_bh(struct ext4_io_submit *io,
 			return ret;
 		io->io_bio->bi_write_hint = inode->i_write_hint;
 	}
-	ret = bio_add_page(io->io_bio, page, bh->b_size, bh_offset(bh));
+	ret = bio_add_page(io->io_bio, page, bh->b_size, bh_offset(bh), false);
 	if (ret != bh->b_size)
 		goto submit_and_retry;
 	wbc_account_io(io->io_wbc, page, bh->b_size);
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 84222b89da52..90ee8263d266 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -264,7 +264,7 @@ int ext4_mpage_readpages(struct address_space *mapping,
 		}
 
 		length = first_hole << blkbits;
-		if (bio_add_page(bio, page, length, 0) < length)
+		if (bio_add_page(bio, page, length, 0, false) < length)
 			goto submit_and_realloc;
 
 		if (((map.m_flags & EXT4_MAP_BOUNDARY) &&
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 51bf04ba2599..24353c9c8a41 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -308,7 +308,7 @@ static inline void __submit_bio(struct f2fs_sb_info *sbi,
 			SetPagePrivate(page);
 			set_page_private(page, (unsigned long)DUMMY_WRITTEN_PAGE);
 			lock_page(page);
-			if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE)
+			if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE)
 				f2fs_bug_on(sbi, 1);
 		}
 		/*
@@ -461,7 +461,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 	bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
 				1, is_read_io(fio->op), fio->type, fio->temp);
 
-	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+	if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
 		bio_put(bio);
 		return -EFAULT;
 	}
@@ -530,7 +530,7 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
 		io->fio = *fio;
 	}
 
-	if (bio_add_page(io->bio, bio_page, PAGE_SIZE, 0) < PAGE_SIZE) {
+	if (bio_add_page(io->bio, bio_page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
 		__submit_merged_bio(io);
 		goto alloc_new;
 	}
@@ -598,7 +598,7 @@ static int f2fs_submit_page_read(struct inode *inode, struct page *page,
 	/* wait for GCed page writeback via META_MAPPING */
 	f2fs_wait_on_block_writeback(inode, blkaddr);
 
-	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+	if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
 		bio_put(bio);
 		return -EFAULT;
 	}
@@ -1621,7 +1621,7 @@ static int f2fs_mpage_readpages(struct address_space *mapping,
 		 */
 		f2fs_wait_on_block_writeback(inode, block_nr);
 
-		if (bio_add_page(bio, page, blocksize, 0) < blocksize)
+		if (bio_add_page(bio, page, blocksize, 0, false) < blocksize)
 			goto submit_and_realloc;
 
 		inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index e0523ef8421e..3dca16f510b7 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -334,11 +334,11 @@ void gfs2_log_write(struct gfs2_sbd *sdp, struct page *page,
 
 	bio = gfs2_log_get_bio(sdp, blkno, &sdp->sd_log_bio, REQ_OP_WRITE,
 			       gfs2_end_log_write, false);
-	ret = bio_add_page(bio, page, size, offset);
+	ret = bio_add_page(bio, page, size, offset, false);
 	if (ret == 0) {
 		bio = gfs2_log_get_bio(sdp, blkno, &sdp->sd_log_bio,
 				       REQ_OP_WRITE, gfs2_end_log_write, true);
-		ret = bio_add_page(bio, page, size, offset);
+		ret = bio_add_page(bio, page, size, offset, false);
 		WARN_ON(ret == 0);
 	}
 }
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index a7e645d08942..c7db0f249002 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -225,7 +225,7 @@ static void gfs2_submit_bhs(int op, int op_flags, struct buffer_head *bhs[],
 		bio_set_dev(bio, bh->b_bdev);
 		while (num > 0) {
 			bh = *bhs;
-			if (!bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh))) {
+			if (!bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh), false)) {
 				BUG_ON(bio->bi_iter.bi_size == 0);
 				break;
 			}
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index b041cb8ae383..cdd52e6c02f7 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -243,7 +243,7 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector, int silent)
 	bio = bio_alloc(GFP_NOFS, 1);
 	bio->bi_iter.bi_sector = sector * (sb->s_blocksize >> 9);
 	bio_set_dev(bio, sb->s_bdev);
-	bio_add_page(bio, page, PAGE_SIZE, 0);
+	bio_add_page(bio, page, PAGE_SIZE, 0, false);
 
 	bio->bi_end_io = end_bio_io_page;
 	bio->bi_private = page;
diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c
index 08c1580bdf7a..3eff6b4dcb69 100644
--- a/fs/hfsplus/wrapper.c
+++ b/fs/hfsplus/wrapper.c
@@ -77,7 +77,8 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t sector,
 		unsigned int len = min_t(unsigned int, PAGE_SIZE - page_offset,
 					 io_size);
 
-		ret = bio_add_page(bio, virt_to_page(buf), len, page_offset);
+		ret = bio_add_page(bio, virt_to_page(buf), len, page_offset,
+				   false);
 		if (ret != len) {
 			ret = -EIO;
 			goto out;
diff --git a/fs/iomap.c b/fs/iomap.c
index ab578054ebe9..c706fd2b0f6e 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -356,7 +356,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		ctx->bio->bi_end_io = iomap_read_end_io;
 	}
 
-	bio_add_page(ctx->bio, page, plen, poff);
+	bio_add_page(ctx->bio, page, plen, poff, false);
 done:
 	/*
 	 * Move the caller beyond our range so that it keeps making progress.
@@ -624,7 +624,7 @@ iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
 	bio.bi_opf = REQ_OP_READ;
 	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
 	bio_set_dev(&bio, iomap->bdev);
-	__bio_add_page(&bio, page, plen, poff);
+	__bio_add_page(&bio, page, plen, poff, false);
 	return submit_bio_wait(&bio);
 }
 
@@ -1616,7 +1616,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
 	get_page(page);
-	__bio_add_page(bio, page, len, 0);
+	__bio_add_page(bio, page, len, 0, false);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
 	iomap_dio_submit_bio(dio, iomap, bio);
 }
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 6b68df395892..42a8c1a8fb77 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1997,7 +1997,7 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
 	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
 	bio_set_dev(bio, log->bdev);
 
-	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset, false);
 	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
 	bio->bi_end_io = lbmIODone;
@@ -2141,7 +2141,7 @@ static void lbmStartIO(struct lbuf * bp)
 	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
 	bio_set_dev(bio, log->bdev);
 
-	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset, false);
 	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
 	bio->bi_end_io = lbmIODone;
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index fa2c6824c7f2..6f66f0a15768 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -401,7 +401,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
 				continue;
 			}
 			/* Not contiguous */
-			if (bio_add_page(bio, page, bio_bytes, bio_offset) <
+			if (bio_add_page(bio, page, bio_bytes, bio_offset, false) <
 			    bio_bytes)
 				goto add_failed;
 			/*
@@ -444,7 +444,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
 		next_block = lblock + len;
 	}
 	if (bio) {
-		if (bio_add_page(bio, page, bio_bytes, bio_offset) < bio_bytes)
+		if (bio_add_page(bio, page, bio_bytes, bio_offset, false) < bio_bytes)
 				goto add_failed;
 		if (!bio->bi_iter.bi_size)
 			goto dump_bio;
@@ -518,7 +518,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
 			bio_set_op_attrs(bio, REQ_OP_READ, 0);
 			len = xlen << inode->i_blkbits;
 			offset = block_offset << inode->i_blkbits;
-			if (bio_add_page(bio, page, len, offset) < len)
+			if (bio_add_page(bio, page, len, offset, false) < len)
 				goto add_failed;
 			block_offset += xlen;
 		} else
diff --git a/fs/mpage.c b/fs/mpage.c
index e234c9a8802d..67e6d1dda984 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -313,7 +313,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	}
 
 	length = first_hole << blkbits;
-	if (bio_add_page(args->bio, page, length, 0) < length) {
+	if (bio_add_page(args->bio, page, length, 0, false) < length) {
 		args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio);
 		goto alloc_new;
 	}
@@ -650,7 +650,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
 	 */
 	wbc_account_io(wbc, page, PAGE_SIZE);
 	length = first_unmapped << blkbits;
-	if (bio_add_page(bio, page, length, 0) < length) {
+	if (bio_add_page(bio, page, length, 0, false) < length) {
 		bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio);
 		goto alloc_new;
 	}
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 690221747b47..fb58bf7bc06f 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -182,7 +182,7 @@ do_add_page_to_bio(struct bio *bio, int npg, int rw, sector_t isect,
 			return ERR_PTR(-ENOMEM);
 		bio_set_op_attrs(bio, rw, 0);
 	}
-	if (bio_add_page(bio, page, *len, offset) < *len) {
+	if (bio_add_page(bio, page, *len, offset, false) < *len) {
 		bio = bl_submit_bio(bio);
 		goto retry;
 	}
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 20c479b5e41b..64ecdab529c7 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -424,7 +424,8 @@ static int nilfs_segbuf_submit_bh(struct nilfs_segment_buffer *segbuf,
 			return -ENOMEM;
 	}
 
-	len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh));
+	len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh),
+			   false);
 	if (len == bh->b_size) {
 		wi->end++;
 		return 0;
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index f3c20b279eb2..e8c209c2e348 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -569,7 +569,7 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
 		mlog(ML_HB_BIO, "page %d, vec_len = %u, vec_start = %u\n",
 		     current_page, vec_len, vec_start);
 
-		len = bio_add_page(bio, page, vec_len, vec_start);
+		len = bio_add_page(bio, page, vec_len, vec_start, false);
 		if (len != vec_len) break;
 
 		cs += vec_len / (PAGE_SIZE/spp);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index d152d1ab2ad1..085ccd01e059 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -667,7 +667,7 @@ xfs_add_to_ioend(
 			atomic_inc(&iop->write_count);
 		if (bio_full(wpc->ioend->io_bio))
 			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
-		bio_add_page(wpc->ioend->io_bio, page, len, poff);
+		bio_add_page(wpc->ioend->io_bio, page, len, poff, false);
 	}
 
 	wpc->ioend->io_size += len;
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 548344e25128..2b981cf8d2af 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1389,7 +1389,7 @@ xfs_buf_ioapply_map(
 			nbytes = size;
 
 		rbytes = bio_add_page(bio, bp->b_pages[page_index], nbytes,
-				      offset);
+				      offset, false);
 		if (rbytes < nbytes)
 			break;
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6ac4f6b192e6..05fcc5227d0e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -429,13 +429,14 @@ extern void bio_uninit(struct bio *);
 extern void bio_reset(struct bio *);
 void bio_chain(struct bio *, struct bio *);
 
-extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
+extern int bio_add_page(struct bio *, struct page *, unsigned int,
+			unsigned int, bool is_gup);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
-			   unsigned int, unsigned int);
+			   unsigned int, unsigned int, bool is_gup);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int off);
+		unsigned int len, unsigned int off, bool is_gup);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index d7f6c1a288d3..ca5e0e1576e3 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -274,7 +274,7 @@ static int hib_submit_io(int op, int op_flags, pgoff_t page_off, void *addr,
 	bio_set_dev(bio, hib_resume_bdev);
 	bio_set_op_attrs(bio, op, op_flags);
 
-	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+	if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
 		pr_err("Adding page to bio failed at %llu\n",
 		       (unsigned long long)bio->bi_iter.bi_sector);
 		bio_put(bio);
diff --git a/mm/page_io.c b/mm/page_io.c
index 6b3be0445c61..c36bfe4ba317 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -42,7 +42,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 		bio->bi_end_io = end_io;
 
 		for (i = 0; i < nr; i++)
-			bio_add_page(bio, page + i, PAGE_SIZE, 0);
+			bio_add_page(bio, page + i, PAGE_SIZE, 0, false);
 		VM_BUG_ON(bio->bi_iter.bi_size != PAGE_SIZE * nr);
 	}
 	return bio;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 11/15] block: make sure bio_add_page*() knows pages that are coming from GUP
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (9 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 12/15] fs/direct-io: keep track of whether a page is coming from GUP or not jglisse
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

When we get a page reference through get_user_page*() we want to keep
track of that, so pass that information down to bio_add_page*().
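
For illustration, a minimal sketch of a caller of the new API
(hypothetical driver code, not from this series; uaddr is a user
address, kbuf a kernel buffer, error handling elided):

	struct page *upage;
	struct bio *bio = bio_alloc(GFP_KERNEL, 2);

	/* A page looked up through GUP is flagged as such ... */
	if (get_user_pages(uaddr, 1, FOLL_WRITE, &upage, NULL) == 1)
		bio_add_page(bio, upage, PAGE_SIZE, 0, true);

	/* ... while kernel-allocated pages keep passing false. */
	bio_add_page(bio, virt_to_page(kbuf), PAGE_SIZE, 0, false);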

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 block/bio.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 73227ede9a0a..197b70426aa6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -708,7 +708,10 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 	 * cannot add the page
 	 */
 	bvec = &bio->bi_io_vec[bio->bi_vcnt];
-	bvec_set_page(bvec, page);
+	if (is_gup)
+		bvec_set_gup_page(bvec, page);
+	else
+		bvec_set_page(bvec, page);
 	bvec->bv_len = len;
 	bvec->bv_offset = offset;
 	bio->bi_vcnt++;
@@ -793,6 +796,7 @@ EXPORT_SYMBOL_GPL(__bio_try_merge_page);
  * @page: page to add
  * @len: length of the data to add
  * @off: offset of the data in @page
+ * @is_gup: was the page referenced through GUP (get_user_page*)
  *
  * Add the data at @page + @off to @bio as a new bvec.  The caller must ensure
  * that @bio has space for another bvec.
@@ -805,7 +809,10 @@ void __bio_add_page(struct bio *bio, struct page *page,
 	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
 	WARN_ON_ONCE(bio_full(bio));
 
-	bvec_set_page(bv, page);
+	if (is_gup)
+		bvec_set_gup_page(bv, page);
+	else
+		bvec_set_page(bv, page);
 	bv->bv_offset = off;
 	bv->bv_len = len;
 
@@ -820,6 +827,7 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
  *	@page: page to add
  *	@len: vec entry length
  *	@offset: vec entry offset
+ *	@is_gup: was the page referenced through GUP (get_user_page*)
  *
  *	Attempt to add a page to the bio_vec maplist. This will only fail
  *	if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
@@ -830,7 +838,7 @@ int bio_add_page(struct bio *bio, struct page *page,
 	if (!__bio_try_merge_page(bio, page, len, offset, false)) {
 		if (bio_full(bio))
 			return 0;
-		__bio_add_page(bio, page, len, offset, false);
+		__bio_add_page(bio, page, len, offset, is_gup);
 	}
 	return len;
 }
@@ -885,6 +893,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	ssize_t size, left;
 	unsigned len, i;
 	size_t offset;
+	bool gup;
 
 	/*
 	 * Move page array up in the allocated memory for the bio vecs as far as
@@ -894,6 +903,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
+	/* Is iov_iter_get_pages() using GUP? */
+	gup = iov_iter_get_pages_use_gup(iter);
 	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
@@ -902,7 +913,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		struct page *page = pages[i];
 
 		len = min_t(size_t, PAGE_SIZE - offset, left);
-		if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset, false) != len))
+		if (WARN_ON_ONCE(bio_add_page(bio, page, len,
+					      offset, gup) != len))
 			return -EINVAL;
 		offset = 0;
 	}
@@ -1372,6 +1384,10 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 		ssize_t bytes;
 		size_t offs, added = 0;
 		int npages;
+		bool gup;
+
+		/* Is iov_iter_get_pages() using GUP? */
+		gup = iov_iter_get_pages_use_gup(iter);
 
 		bytes = iov_iter_get_pages_alloc(iter, &pages, LONG_MAX, &offs);
 		if (unlikely(bytes <= 0)) {
@@ -1393,7 +1409,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 				if (n > bytes)
 					n = bytes;
 
-				if (!bio_add_pc_page(q, bio, page, n, offs, false))
+				if (!bio_add_pc_page(q, bio, page, n, offs, gup))
 					break;
 
 				/*
@@ -1412,8 +1428,12 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 		/*
 		 * release the pages we didn't map into the bio, if any
 		 */
-		while (j < npages)
-			put_page(pages[j++]);
+		while (j < npages) {
+			if (gup)
+				put_user_page(pages[j++]);
+			else
+				put_page(pages[j++]);
+		}
 		kvfree(pages);
 		/* couldn't stuff something into bio? */
 		if (bytes)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 12/15] fs/direct-io: keep track of whether a page is coming from GUP or not
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (10 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 11/15] block: make sure bio_add_page*() knows pages that are coming from GUP jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 23:14   ` Dave Chinner
  2019-04-11 21:08 ` [PATCH v1 13/15] fs/splice: use put_user_page() when appropriate jglisse
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox,
	Ernesto A . Fernández, Jeff Moyer

From: Jérôme Glisse <jglisse@redhat.com>

We want to keep track of how we got a reference on a page when doing DIO,
ie whether the page was referenced through GUP (get_user_page*) or not.
To that end this patch reworks the way page references are taken and
handed over between the DIO code and the BIO. Instead of taking an extra
reference for each page that has been successfully added to a BIO, we
let the BIO steal the reference we already hold from looking up the page
(either through GUP or for ZERO_PAGE).

This patch therefore keeps track of whether the reference has been
stolen by the BIO or not. This avoids a bunch of get_page()/put_page()
pairs and so limits the number of atomic operations.
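
Condensed, the per-page flow after this patch looks as follows (a
sketch reusing the names from the diff below, not literal code):

	bool stolen = false;

	page = dio->pages[sdio->head++];  /* ref from iov_iter_get_pages() */
	if (!submit_page_section(dio, sdio, page, from, len,
				 sdio->next_block_for_io, map_bh, dio->gup))
		stolen = true;            /* the BIO now owns the reference */
	dio_put_page(dio, stolen, page);  /* no-op when the ref was stolen */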

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
---
 fs/direct-io.c | 82 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index b8b5d8e31aeb..ef9fc7703a78 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -100,6 +100,7 @@ struct dio_submit {
 	unsigned cur_page_len;		/* Nr of bytes at cur_page_offset */
 	sector_t cur_page_block;	/* Where it starts */
 	loff_t cur_page_fs_offset;	/* Offset in file */
+	bool cur_page_from_gup;		/* Current page is coming from GUP */
 
 	struct iov_iter *iter;
 	/*
@@ -148,6 +149,8 @@ struct dio {
 		struct page *pages[DIO_PAGES];	/* page buffer */
 		struct work_struct complete_work;/* deferred AIO completion */
 	};
+
+	bool gup;			/* pages are coming from GUP */
 } ____cacheline_aligned_in_smp;
 
 static struct kmem_cache *dio_cache __read_mostly;
@@ -167,6 +170,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
 	ssize_t ret;
 
+	dio->gup = iov_iter_get_pages_use_gup(sdio->iter);
 	ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
 				&sdio->from);
 
@@ -181,6 +185,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 			dio->page_errors = ret;
 		get_page(page);
 		dio->pages[0] = page;
+		dio->gup = false;
 		sdio->head = 0;
 		sdio->tail = 1;
 		sdio->from = 0;
@@ -490,8 +495,12 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
  */
 static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
 {
-	while (sdio->head < sdio->tail)
-		put_page(dio->pages[sdio->head++]);
+	while (sdio->head < sdio->tail) {
+		if (dio->gup)
+			put_user_page(dio->pages[sdio->head++]);
+		else
+			put_page(dio->pages[sdio->head++]);
+	}
 }
 
 /*
@@ -760,15 +769,19 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
 {
 	int ret;
 
-	ret = bio_add_page(sdio->bio, sdio->cur_page,
-			sdio->cur_page_len, sdio->cur_page_offset, false);
+	/*
+	 * The bio steals the page reference, which is fine: a page is added
+	 * only once, ie when dio_send_cur_page() is called, and each
+	 * successful call to dio_send_cur_page() clears cur_page.
+	 */
+	ret = bio_add_page(sdio->bio, sdio->cur_page, sdio->cur_page_len,
+			 sdio->cur_page_offset, sdio->cur_page_from_gup);
 	if (ret == sdio->cur_page_len) {
 		/*
 		 * Decrement count only, if we are done with this page
 		 */
 		if ((sdio->cur_page_len + sdio->cur_page_offset) == PAGE_SIZE)
 			sdio->pages_in_io--;
-		get_page(sdio->cur_page);
 		sdio->final_block_in_bio = sdio->cur_page_block +
 			(sdio->cur_page_len >> sdio->blkbits);
 		ret = 0;
@@ -828,9 +841,14 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
 		ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
 		if (ret == 0) {
 			ret = dio_bio_add_page(sdio);
+			if (!ret)
+				/* Clear the current page. */
+				sdio->cur_page = NULL;
 			BUG_ON(ret != 0);
 		}
-	}
+	} else
+		/* Clear the current page. */
+		sdio->cur_page = NULL;
 out:
 	return ret;
 }
@@ -855,7 +873,7 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
 static inline int
 submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 		    unsigned offset, unsigned len, sector_t blocknr,
-		    struct buffer_head *map_bh)
+		    struct buffer_head *map_bh, bool gup)
 {
 	int ret = 0;
 
@@ -882,14 +900,13 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 	 */
 	if (sdio->cur_page) {
 		ret = dio_send_cur_page(dio, sdio, map_bh);
-		put_page(sdio->cur_page);
-		sdio->cur_page = NULL;
 		if (ret)
 			return ret;
 	}
 
-	get_page(page);		/* It is in dio */
+	/* Steal page reference and GUP flag */
 	sdio->cur_page = page;
+	sdio->cur_page_from_gup = gup;
 	sdio->cur_page_offset = offset;
 	sdio->cur_page_len = len;
 	sdio->cur_page_block = blocknr;
@@ -903,8 +920,6 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 		ret = dio_send_cur_page(dio, sdio, map_bh);
 		if (sdio->bio)
 			dio_bio_submit(dio, sdio);
-		put_page(sdio->cur_page);
-		sdio->cur_page = NULL;
 	}
 	return ret;
 }
@@ -946,13 +961,29 @@ static inline void dio_zero_block(struct dio *dio, struct dio_submit *sdio,
 	this_chunk_bytes = this_chunk_blocks << sdio->blkbits;
 
 	page = ZERO_PAGE(0);
+	get_page(page);
 	if (submit_page_section(dio, sdio, page, 0, this_chunk_bytes,
-				sdio->next_block_for_io, map_bh))
+				sdio->next_block_for_io, map_bh, false)) {
+		put_page(page);
 		return;
+	}
 
 	sdio->next_block_for_io += this_chunk_blocks;
 }
 
+static inline void dio_put_page(const struct dio *dio, bool stolen,
+				struct page *page)
+{
+	/* If page reference was stolen then nothing to do. */
+	if (stolen)
+		return;
+
+	if (dio->gup)
+		put_user_page(page);
+	else
+		put_page(page);
+}
+
 /*
  * Walk the user pages, and the file, mapping blocks to disk and generating
  * a sequence of (page,offset,len,block) mappings.  These mappings are injected
@@ -977,6 +1008,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 	int ret = 0;
 
 	while (sdio->block_in_file < sdio->final_block_in_request) {
+		bool stolen = false;
 		struct page *page;
 		size_t from, to;
 
@@ -1003,7 +1035,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 				ret = get_more_blocks(dio, sdio, map_bh);
 				if (ret) {
-					put_page(page);
+					dio_put_page(dio, stolen, page);
 					goto out;
 				}
 				if (!buffer_mapped(map_bh))
@@ -1048,7 +1080,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 				/* AKPM: eargh, -ENOTBLK is a hack */
 				if (dio->op == REQ_OP_WRITE) {
-					put_page(page);
+					dio_put_page(dio, stolen, page);
 					return -ENOTBLK;
 				}
 
@@ -1061,7 +1093,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				if (sdio->block_in_file >=
 						i_size_aligned >> blkbits) {
 					/* We hit eof */
-					put_page(page);
+					dio_put_page(dio, stolen, page);
 					goto out;
 				}
 				zero_user(page, from, 1 << blkbits);
@@ -1099,11 +1131,13 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 						  from,
 						  this_chunk_bytes,
 						  sdio->next_block_for_io,
-						  map_bh);
+						  map_bh, dio->gup);
 			if (ret) {
-				put_page(page);
+				dio_put_page(dio, stolen, page);
 				goto out;
-			}
+			} else
+				/* The page reference has been stolen ... */
+				stolen = true;
 			sdio->next_block_for_io += this_chunk_blocks;
 
 			sdio->block_in_file += this_chunk_blocks;
@@ -1117,7 +1151,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 		}
 
 		/* Drop the ref which was taken in get_user_pages() */
-		put_page(page);
+		dio_put_page(dio, stolen, page);
 	}
 out:
 	return ret;
@@ -1356,8 +1390,12 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
 		if (retval == 0)
 			retval = ret2;
-		put_page(sdio.cur_page);
-		sdio.cur_page = NULL;
+		else {
+			if (sdio.cur_page_from_gup)
+				put_user_page(sdio.cur_page);
+			else
+				put_page(sdio.cur_page);
+		}
 	}
 	if (sdio.bio)
 		dio_bio_submit(dio, &sdio);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 13/15] fs/splice: use put_user_page() when appropriate
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (11 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 12/15] fs/direct-io: keep track of whether a page is coming from GUP or not jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 14/15] fs: use bvec_set_gup_page() where appropriate jglisse
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

From: Jérôme Glisse <jglisse@redhat.com>

Use put_user_page() when the page reference was taken through GUP
(get_user_page*), and keep using put_page() otherwise.
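
In isolation, the pairing rule this patch applies is (sketch; the real
code is in the diff below):

	if (gup)	/* iov_iter_get_pages_use_gup(&to) returned true */
		put_user_page(pages[i]);
	else
		put_page(pages[i]);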

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
---
 fs/splice.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 4a0b522a0cb4..c9c350d37912 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -371,6 +371,7 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 	unsigned int nr_pages;
 	size_t offset, base, copied = 0;
 	ssize_t res;
+	bool gup;
 	int i;
 
 	if (pipe->nrbufs == pipe->buffers)
@@ -383,7 +384,7 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 	offset = *ppos & ~PAGE_MASK;
 
 	iov_iter_pipe(&to, READ, pipe, len + offset);
-
+	gup = iov_iter_get_pages_use_gup(&to);
 	res = iov_iter_get_pages_alloc(&to, &pages, len + offset, &base);
 	if (res <= 0)
 		return -ENOMEM;
@@ -419,8 +420,12 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 	if (vec != __vec)
 		kfree(vec);
 out:
-	for (i = 0; i < nr_pages; i++)
-		put_page(pages[i]);
+	for (i = 0; i < nr_pages; i++) {
+		if (gup)
+			put_user_page(pages[i]);
+		else
+			put_page(pages[i]);
+	}
 	kvfree(pages);
 	iov_iter_advance(&to, copied);	/* truncates and discards */
 	return res;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 14/15] fs: use bvec_set_gup_page() where appropriate
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (12 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 13/15] fs/splice: use put_user_page() when appropriate jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-11 21:08 ` [PATCH v1 15/15] ceph: use put_user_pages() instead of ceph_put_page_vector() jglisse
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox, Steve French,
	linux-cifs, samba-technical, Ilya Dryomov, Sage Weil, Alex Elder,
	ceph-devel

From: Jérôme Glisse <jglisse@redhat.com>

When we get a page reference through get_user_page*() we want to keep
track of that, and the bvec now has the ability to do so. Convert code
to use bvec_set_gup_page() where appropriate.
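
For reference, the bvec helpers this patch relies on could look roughly
like the sketch below, assuming the pfn-plus-flags bv_pfn encoding
introduced earlier in the series (BVEC_PFN_GUP is a hypothetical flag
name; the real definitions live in the bio_vec conversion patches):

	static inline void bvec_set_page(struct bio_vec *bv, struct page *page)
	{
		bv->bv_pfn = page_to_pfn(page);
	}

	static inline void bvec_set_gup_page(struct bio_vec *bv, struct page *page)
	{
		/* Remember that this page's reference came from GUP. */
		bv->bv_pfn = page_to_pfn(page) | BVEC_PFN_GUP;
	}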

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Sage Weil <sage@redhat.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
---
 fs/ceph/file.c | 3 +++
 fs/cifs/misc.c | 6 +++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index d5561662b902..6c5b85f01721 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -104,6 +104,9 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize,
 				min_t(int, bytes, PAGE_SIZE - start),
 				start);
 
+			/* Is iov_iter_get_pages() using GUP? */
+			if (iov_iter_get_pages_use_gup(iter))
+				bvec_set_gup_page(&bv, pages[idx]);
 			bvecs[bvec_idx] = bv;
 			bytes -= bv.bv_len;
 			start = 0;
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index bc77a4a5f1af..e10d9f0f5874 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -883,7 +883,11 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
 
 		for (i = 0; i < cur_npages; i++) {
 			len = rc > PAGE_SIZE ? PAGE_SIZE : rc;
-			bvec_set_page(&bv[npages + i], pages[i]);
+			/* Is iov_iter_get_pages() using GUP? */
+			if (iov_iter_get_pages_use_gup(iter))
+				bvec_set_gup_page(&bv[npages + i], pages[i]);
+			else
+				bvec_set_page(&bv[npages + i], pages[i]);
 			bv[npages + i].bv_offset = start;
 			bv[npages + i].bv_len = len - start;
 			rc -= len;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 15/15] ceph: use put_user_pages() instead of ceph_put_page_vector()
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (13 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 14/15] fs: use bvec_set_gup_page() where appropriate jglisse
@ 2019-04-11 21:08 ` jglisse
  2019-04-15  7:46   ` Yan, Zheng
  2019-04-16  0:00 ` [PATCH v1 00/15] Keep track of GUPed pages in fs and block Dave Chinner
       [not found] ` <2c124cc4-b97e-ee28-2926-305bc6bc74bd@plexistor.com>
  16 siblings, 1 reply; 47+ messages in thread
From: jglisse @ 2019-04-11 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jérôme Glisse, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox, Yan Zheng,
	Sage Weil, Ilya Dryomov, ceph-devel

From: Jérôme Glisse <jglisse@redhat.com>

When page references were taken through GUP (get_user_page*()) we need
to drop them with put_user_pages().
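
For reference, put_user_pages() from the prerequisite series [2] is
expected to be roughly equivalent to the following sketch (during the
transition it boils down to put_page() on each page):

	void put_user_pages(struct page **pages, unsigned long npages)
	{
		unsigned long i;

		/* Release pages that were obtained through GUP. */
		for (i = 0; i < npages; i++)
			put_user_page(pages[i]);
	}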

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yan Zheng <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
---
 fs/ceph/file.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 6c5b85f01721..5842ad3a4218 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -667,7 +667,8 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 			} else {
 				iov_iter_advance(to, 0);
 			}
-			ceph_put_page_vector(pages, num_pages, false);
+			/* iov_iter_get_pages_alloc() did call GUP */
+			put_user_pages(pages, num_pages);
 		} else {
 			int idx = 0;
 			size_t left = ret > 0 ? ret : 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 12/15] fs/direct-io: keep track of whether a page is coming from GUP or not
  2019-04-11 21:08 ` [PATCH v1 12/15] fs/direct-io: keep track of whether a page is coming from GUP or not jglisse
@ 2019-04-11 23:14   ` Dave Chinner
  2019-04-12  0:08     ` Jerome Glisse
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2019-04-11 23:14 UTC (permalink / raw)
  To: jglisse
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Jan Kara, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Ernesto A . Fernández, Jeff Moyer

On Thu, Apr 11, 2019 at 05:08:31PM -0400, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> We want to keep track of how we got a reference on a page when doing DIO,
> i.e. whether the page was referenced through GUP (get_user_page*) or not.
> For that this patch reworks the way the page reference is taken and handed
> over between the DIO code and the BIO. Instead of taking a reference for a
> page that has been successfully added to a BIO, we just steal the reference
> we already hold from looking up the page (either through GUP or for the
> ZERO_PAGE).
> 
> So this patch keeps track of whether the reference has been stolen by the
> BIO or not. This avoids a bunch of get_page()/put_page() calls and so
> limits the number of atomic operations.

Is the same set of changes appropriate for the fs/iomap.c direct IO
path (i.e. XFS)?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 12/15] fs/direct-io: keep track of whether a page is coming from GUP or not
  2019-04-11 23:14   ` Dave Chinner
@ 2019-04-12  0:08     ` Jerome Glisse
  0 siblings, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-12  0:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Jan Kara, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Ernesto A . Fernández, Jeff Moyer

On Fri, Apr 12, 2019 at 09:14:43AM +1000, Dave Chinner wrote:
> On Thu, Apr 11, 2019 at 05:08:31PM -0400, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > We want to keep track of how we got a reference on a page when doing DIO,
> > i.e. whether the page was referenced through GUP (get_user_page*) or not.
> > For that this patch reworks the way the page reference is taken and handed
> > over between the DIO code and the BIO. Instead of taking a reference for a
> > page that has been successfully added to a BIO, we just steal the reference
> > we already hold from looking up the page (either through GUP or for the
> > ZERO_PAGE).
> > 
> > So this patch keeps track of whether the reference has been stolen by the
> > BIO or not. This avoids a bunch of get_page()/put_page() calls and so
> > limits the number of atomic operations.
> 
> Is the same set of changes appropriate for the fs/iomap.c direct IO
> path (i.e. XFS)?

Yes, and it is part of this patchset AFAICT: iomap uses bio_iov_iter_get_pages(),
which is updated to pass down whether pages are coming from GUP or not. The
bio you get out of that is then released through iomap_dio_bio_end_io(), which
calls bvec_put_page(), which will use put_user_page() for GUPed pages; see
the sketch below.
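
For illustration, bvec_put_page() can be sketched like this, assuming the
GUP flag added to the bvec earlier in the series (bvec_test_gup() is a
placeholder name for the flag test; the exact helper name may differ):

	static inline void bvec_put_page(const struct bio_vec *bv)
	{
		/* Pick the release function matching how the page
		 * reference was obtained when the bvec was filled. */
		if (bvec_test_gup(bv))
			put_user_page(bvec_page(bv));
		else
			put_page(bvec_page(bv));
	}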

I may have missed a case; reviews are welcome.

Note that while the conversion is happening, put_user_page() is exactly the
same as put_page(); in fact the implementation just calls put_page() and
nothing else.
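
In other words, a sketch of the transitional helper, per [2]:

	/* Transitional implementation: identical to put_page() until
	 * GUP pages get their own reference accounting. */
	static inline void put_user_page(struct page *page)
	{
		put_page(page);
	}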

The tricky part is making sure that, before we diverge with a put_user_page()
that does something other than put_page(), we have not left a path that does
GUP but calls put_page() instead of put_user_page(). We have some plan to
catch that in debug builds.

In any case I believe we will be very careful when the time comes to change
put_user_page() to something different.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 15/15] ceph: use put_user_pages() instead of ceph_put_page_vector()
  2019-04-11 21:08 ` [PATCH v1 15/15] ceph: use put_user_pages() instead of ceph_put_page_vector() jglisse
@ 2019-04-15  7:46   ` Yan, Zheng
  2019-04-15 15:11     ` Jerome Glisse
  0 siblings, 1 reply; 47+ messages in thread
From: Yan, Zheng @ 2019-04-15  7:46 UTC (permalink / raw)
  To: jglisse, linux-kernel
  Cc: linux-fsdevel, linux-block, linux-mm, John Hubbard, Jan Kara,
	Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox, Sage Weil, Ilya Dryomov,
	ceph-devel

On 4/12/19 5:08 AM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> When page references were taken through GUP (get_user_page*()) we need
> to drop them with put_user_pages().
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-block@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Yan Zheng <zyan@redhat.com>
> Cc: Sage Weil <sage@redhat.com>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> ---
>   fs/ceph/file.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 6c5b85f01721..5842ad3a4218 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -667,7 +667,8 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>   			} else {
>   				iov_iter_advance(to, 0);
>   			}
> -			ceph_put_page_vector(pages, num_pages, false);
> +			/* iov_iter_get_pages_alloc() did call GUP */
> +			put_user_pages(pages, num_pages);

The pages in the pipe were not obtained from get_user_pages(). Am I missing
anything?

Regards
Yan, Zheng

>   		} else {
>   			int idx = 0;
>   			size_t left = ret > 0 ? ret : 0;
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-11 21:08 ` [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() jglisse
@ 2019-04-15 14:59   ` Jan Kara
  2019-04-15 15:24     ` Jerome Glisse
  2019-04-16  0:22     ` Jerome Glisse
  0 siblings, 2 replies; 47+ messages in thread
From: Jan Kara @ 2019-04-15 14:59 UTC (permalink / raw)
  To: jglisse
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Jan Kara, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox

Hi Jerome!

On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> We want to keep track of how we got a reference on a page added to a
> bio_vec, i.e. whether the page was referenced through GUP (get_user_page*)
> or not. So add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
> to that effect.

Thanks for writing this patch set! Looking through patches like this one,
I'm a bit concerned. With so many bio_add_page() callers it's difficult to
get things right and not regress in the future. I'm wondering whether things
wouldn't be less error-prone if we required that all page references from a
bio are gup-like (not necessarily taken by GUP; if the creator of the bio
gets to the struct page it needs via some other means (e.g. a page cache
lookup), it could just use a get_gup_pin() helper we'd provide).  After all,
a page reference in a bio means that the page is pinned for the duration of
the IO and can be DMAed to/from, so it even makes some sense to track the
reference like that. Then bio_put() would just unconditionally do
put_user_page() and we won't have to propagate the information in the bio.
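
A rough sketch of the idea, for illustration only (get_gup_pin() is just a
suggested name; nothing like it exists yet):

	/* Hypothetical helper: take a GUP-like reference on a page
	 * obtained by other means (e.g. a page cache lookup), so that
	 * bio_put() can always release bvec pages via put_user_page(). */
	static inline void get_gup_pin(struct page *page)
	{
		get_page(page);	/* later: however GUP tracks its pins */
	}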

Do you think this would be workable and easier?

								Honza

> 
> This is done using a coccinelle patch and running it with:
> 
> spatch --sp-file spfile --in-place --include-headers --dir .
> 
> with spfile:
> %<---------------------------------------------------------------------
> @@
> identifier I1, I2, I3, I4;
> @@
> void __bio_add_page(struct bio *I1, struct page *I2, unsigned I3,
> unsigned I4
> +, bool is_gup
>  ) {...}
> 
> @@
> identifier I1, I2, I3, I4;
> @@
> void __bio_add_page(struct bio *I1, struct page *I2, unsigned I3,
> unsigned I4
> +, bool is_gup
>  );
> 
> @@
> identifier I1, I2, I3, I4;
> @@
> int bio_add_page(struct bio *I1, struct page *I2, unsigned I3,
> unsigned I4
> +, bool is_gup
>  ) {...}
> 
> @@
> @@
> int bio_add_page(struct bio *, struct page *, unsigned, unsigned
> +, bool is_gup
>  );
> 
> @@
> identifier I1, I2, I3, I4, I5;
> @@
> int bio_add_pc_page(struct request_queue *I1, struct bio *I2,
> struct page *I3, unsigned I4, unsigned I5
> +, bool is_gup
>  ) {...}
> 
> @@
> @@
> int bio_add_pc_page(struct request_queue *, struct bio *,
> struct page *, unsigned, unsigned
> +, bool is_gup
>  );
> 
> @@
> expression E1, E2, E3, E4;
> @@
> __bio_add_page(E1, E2, E3, E4
> +, false
>  )
> 
> @@
> expression E1, E2, E3, E4;
> @@
> bio_add_page(E1, E2, E3, E4
> +, false
>  )
> 
> @@
> expression E1, E2, E3, E4, E5;
> @@
> bio_add_pc_page(E1, E2, E3, E4, E5
> +, false
>  )
> --------------------------------------------------------------------->%
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-block@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Matthew Wilcox <willy@infradead.org>
> ---
>  block/bio.c                         | 20 ++++++++++----------
>  block/blk-lib.c                     |  3 ++-
>  drivers/block/drbd/drbd_actlog.c    |  2 +-
>  drivers/block/drbd/drbd_bitmap.c    |  2 +-
>  drivers/block/drbd/drbd_receiver.c  |  2 +-
>  drivers/block/floppy.c              |  2 +-
>  drivers/block/pktcdvd.c             |  4 ++--
>  drivers/block/xen-blkback/blkback.c |  2 +-
>  drivers/block/zram/zram_drv.c       |  4 ++--
>  drivers/lightnvm/core.c             |  2 +-
>  drivers/lightnvm/pblk-core.c        |  5 +++--
>  drivers/lightnvm/pblk-rb.c          |  2 +-
>  drivers/md/dm-bufio.c               |  2 +-
>  drivers/md/dm-crypt.c               |  2 +-
>  drivers/md/dm-io.c                  |  5 +++--
>  drivers/md/dm-log-writes.c          |  8 ++++----
>  drivers/md/dm-writecache.c          |  3 ++-
>  drivers/md/dm-zoned-metadata.c      |  6 +++---
>  drivers/md/md.c                     |  4 ++--
>  drivers/md/raid1-10.c               |  2 +-
>  drivers/md/raid1.c                  |  4 ++--
>  drivers/md/raid10.c                 |  4 ++--
>  drivers/md/raid5-cache.c            |  7 ++++---
>  drivers/md/raid5-ppl.c              |  6 +++---
>  drivers/nvme/target/io-cmd-bdev.c   |  2 +-
>  drivers/staging/erofs/data.c        |  4 ++--
>  drivers/staging/erofs/unzip_vle.c   |  2 +-
>  drivers/target/target_core_iblock.c |  4 ++--
>  drivers/target/target_core_pscsi.c  |  2 +-
>  fs/btrfs/check-integrity.c          |  2 +-
>  fs/btrfs/compression.c              | 10 +++++-----
>  fs/btrfs/extent_io.c                |  8 ++++----
>  fs/btrfs/raid56.c                   |  4 ++--
>  fs/btrfs/scrub.c                    | 10 +++++-----
>  fs/buffer.c                         |  2 +-
>  fs/crypto/bio.c                     |  2 +-
>  fs/direct-io.c                      |  2 +-
>  fs/ext4/page-io.c                   |  2 +-
>  fs/ext4/readpage.c                  |  2 +-
>  fs/f2fs/data.c                      | 10 +++++-----
>  fs/gfs2/lops.c                      |  4 ++--
>  fs/gfs2/meta_io.c                   |  2 +-
>  fs/gfs2/ops_fstype.c                |  2 +-
>  fs/hfsplus/wrapper.c                |  3 ++-
>  fs/iomap.c                          |  6 +++---
>  fs/jfs/jfs_logmgr.c                 |  4 ++--
>  fs/jfs/jfs_metapage.c               |  6 +++---
>  fs/mpage.c                          |  4 ++--
>  fs/nfs/blocklayout/blocklayout.c    |  2 +-
>  fs/nilfs2/segbuf.c                  |  3 ++-
>  fs/ocfs2/cluster/heartbeat.c        |  2 +-
>  fs/xfs/xfs_aops.c                   |  2 +-
>  fs/xfs/xfs_buf.c                    |  2 +-
>  include/linux/bio.h                 |  7 ++++---
>  kernel/power/swap.c                 |  2 +-
>  mm/page_io.c                        |  2 +-
>  56 files changed, 116 insertions(+), 108 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index efd254c90974..73227ede9a0a 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -663,7 +663,7 @@ EXPORT_SYMBOL(bio_clone_fast);
>   *	This should only be used by REQ_PC bios.
>   */
>  int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
> -		    *page, unsigned int len, unsigned int offset)
> +		    *page, unsigned int len, unsigned int offset, bool is_gup)
>  {
>  	int retried_segments = 0;
>  	struct bio_vec *bvec;
> @@ -798,7 +798,7 @@ EXPORT_SYMBOL_GPL(__bio_try_merge_page);
>   * that @bio has space for another bvec.
>   */
>  void __bio_add_page(struct bio *bio, struct page *page,
> -		unsigned int len, unsigned int off)
> +		unsigned int len, unsigned int off, bool is_gup)
>  {
>  	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];
>  
> @@ -825,12 +825,12 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
>   *	if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
>   */
>  int bio_add_page(struct bio *bio, struct page *page,
> -		 unsigned int len, unsigned int offset)
> +		 unsigned int len, unsigned int offset, bool is_gup)
>  {
>  	if (!__bio_try_merge_page(bio, page, len, offset, false)) {
>  		if (bio_full(bio))
>  			return 0;
> -		__bio_add_page(bio, page, len, offset);
> +		__bio_add_page(bio, page, len, offset, is_gup);
>  	}
>  	return len;
>  }
> @@ -847,7 +847,7 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
>  
>  	len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
>  	size = bio_add_page(bio, bvec_page(bv), len,
> -				bv->bv_offset + iter->iov_offset);
> +				bv->bv_offset + iter->iov_offset, false);
>  	if (size == len) {
>  		if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
>  			struct page *page;
> @@ -902,7 +902,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  		struct page *page = pages[i];
>  
>  		len = min_t(size_t, PAGE_SIZE - offset, left);
> -		if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset) != len))
> +		if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset, false) != len))
>  			return -EINVAL;
>  		offset = 0;
>  	}
> @@ -1298,7 +1298,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
>  			}
>  		}
>  
> -		if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) {
> +		if (bio_add_pc_page(q, bio, page, bytes, offset, false) < bytes) {
>  			if (!map_data)
>  				__free_page(page);
>  			break;
> @@ -1393,7 +1393,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
>  				if (n > bytes)
>  					n = bytes;
>  
> -				if (!bio_add_pc_page(q, bio, page, n, offs))
> +				if (!bio_add_pc_page(q, bio, page, n, offs, false))
>  					break;
>  
>  				/*
> @@ -1509,7 +1509,7 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
>  			bytes = len;
>  
>  		if (bio_add_pc_page(q, bio, virt_to_page(data), bytes,
> -				    offset) < bytes) {
> +				    offset, false) < bytes) {
>  			/* we don't support partial mappings */
>  			bio_put(bio);
>  			return ERR_PTR(-EINVAL);
> @@ -1592,7 +1592,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
>  		if (!reading)
>  			memcpy(page_address(page), p, bytes);
>  
> -		if (bio_add_pc_page(q, bio, page, bytes, 0) < bytes)
> +		if (bio_add_pc_page(q, bio, page, bytes, 0, false) < bytes)
>  			break;
>  
>  		len -= bytes;
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 02a0b398566d..0ccb8ea980f5 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -289,7 +289,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
>  
>  		while (nr_sects != 0) {
>  			sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
> -			bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
> +			bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0,
> +					       false);
>  			nr_sects -= bi_size >> 9;
>  			sector += bi_size >> 9;
>  			if (bi_size < sz)
> diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
> index 5f0eaee8c8a7..532c783667c2 100644
> --- a/drivers/block/drbd/drbd_actlog.c
> +++ b/drivers/block/drbd/drbd_actlog.c
> @@ -154,7 +154,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
>  	bio_set_dev(bio, bdev->md_bdev);
>  	bio->bi_iter.bi_sector = sector;
>  	err = -EIO;
> -	if (bio_add_page(bio, device->md_io.page, size, 0) != size)
> +	if (bio_add_page(bio, device->md_io.page, size, 0, false) != size)
>  		goto out;
>  	bio->bi_private = device;
>  	bio->bi_end_io = drbd_md_endio;
> diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
> index e567bc234781..558c331342f1 100644
> --- a/drivers/block/drbd/drbd_bitmap.c
> +++ b/drivers/block/drbd/drbd_bitmap.c
> @@ -1024,7 +1024,7 @@ static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, int page_nr) __must_ho
>  	bio->bi_iter.bi_sector = on_disk_sector;
>  	/* bio_add_page of a single page to an empty bio will always succeed,
>  	 * according to api.  Do we want to assert that? */
> -	bio_add_page(bio, page, len, 0);
> +	bio_add_page(bio, page, len, 0, false);
>  	bio->bi_private = ctx;
>  	bio->bi_end_io = drbd_bm_endio;
>  	bio_set_op_attrs(bio, op, 0);
> diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
> index ee7c77445456..802565c28905 100644
> --- a/drivers/block/drbd/drbd_receiver.c
> +++ b/drivers/block/drbd/drbd_receiver.c
> @@ -1716,7 +1716,7 @@ int drbd_submit_peer_request(struct drbd_device *device,
>  
>  	page_chain_for_each(page) {
>  		unsigned len = min_t(unsigned, data_size, PAGE_SIZE);
> -		if (!bio_add_page(bio, page, len, 0))
> +		if (!bio_add_page(bio, page, len, 0, false))
>  			goto next_bio;
>  		data_size -= len;
>  		sector += len >> 9;
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 6201106cb7e3..11e77f88ac39 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -4131,7 +4131,7 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
>  
>  	bio_init(&bio, &bio_vec, 1);
>  	bio_set_dev(&bio, bdev);
> -	bio_add_page(&bio, page, size, 0);
> +	bio_add_page(&bio, page, size, 0, false);
>  
>  	bio.bi_iter.bi_sector = 0;
>  	bio.bi_flags |= (1 << BIO_QUIET);
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index f5a71023f76c..cb5b9b4a7091 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -1037,7 +1037,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
>  		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
>  		pkt_dbg(2, pd, "Adding frame %d, page:%p offs:%d\n",
>  			f, pkt->pages[p], offset);
> -		if (!bio_add_page(bio, pkt->pages[p], CD_FRAMESIZE, offset))
> +		if (!bio_add_page(bio, pkt->pages[p], CD_FRAMESIZE, offset, false))
>  			BUG();
>  
>  		atomic_inc(&pkt->io_wait);
> @@ -1277,7 +1277,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
>  		struct page *page = pkt->pages[(f * CD_FRAMESIZE) / PAGE_SIZE];
>  		unsigned offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
>  
> -		if (!bio_add_page(pkt->w_bio, page, CD_FRAMESIZE, offset))
> +		if (!bio_add_page(pkt->w_bio, page, CD_FRAMESIZE, offset, false))
>  			BUG();
>  	}
>  	pkt_dbg(2, pd, "vcnt=%d\n", pkt->w_bio->bi_vcnt);
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index fd1e19f1a49f..886e2e3202a7 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1362,7 +1362,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>  		       (bio_add_page(bio,
>  				     pages[i]->page,
>  				     seg[i].nsec << 9,
> -				     seg[i].offset) == 0)) {
> +				     seg[i].offset, false) == 0)) {
>  
>  			int nr_iovecs = min_t(int, (nseg-i), BIO_MAX_PAGES);
>  			bio = bio_alloc(GFP_KERNEL, nr_iovecs);
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 04fb864b16f5..a0734408db2f 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -596,7 +596,7 @@ static int read_from_bdev_async(struct zram *zram, struct bio_vec *bvec,
>  
>  	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
>  	bio_set_dev(bio, zram->bdev);
> -	if (!bio_add_page(bio, bvec_page(bvec), bvec->bv_len, bvec->bv_offset)) {
> +	if (!bio_add_page(bio, bvec_page(bvec), bvec->bv_len, bvec->bv_offset, false)) {
>  		bio_put(bio);
>  		return -EIO;
>  	}
> @@ -713,7 +713,7 @@ static ssize_t writeback_store(struct device *dev,
>  		bio.bi_opf = REQ_OP_WRITE | REQ_SYNC;
>  
>  		bio_add_page(&bio, bvec_page(&bvec), bvec.bv_len,
> -				bvec.bv_offset);
> +				bvec.bv_offset, false);
>  		/*
>  		 * XXX: A single page IO would be inefficient for write
>  		 * but it would be not bad as starter.
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index 5f82036fe322..cc08485dc36a 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -807,7 +807,7 @@ static int nvm_bb_chunk_sense(struct nvm_dev *dev, struct ppa_addr ppa)
>  		return -ENOMEM;
>  
>  	bio_init(&bio, &bio_vec, 1);
> -	bio_add_page(&bio, page, PAGE_SIZE, 0);
> +	bio_add_page(&bio, page, PAGE_SIZE, 0, false);
>  	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
>  
>  	rqd.bio = &bio;
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 6ddb1e8a7223..2f374275b638 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -344,7 +344,8 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
>  	for (i = 0; i < nr_pages; i++) {
>  		page = mempool_alloc(&pblk->page_bio_pool, flags);
>  
> -		ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
> +		ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0,
> +				      false);
>  		if (ret != PBLK_EXPOSED_PAGE_SIZE) {
>  			pblk_err(pblk, "could not add page to bio\n");
>  			mempool_free(page, &pblk->page_bio_pool);
> @@ -605,7 +606,7 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
>  			goto out;
>  		}
>  
> -		ret = bio_add_pc_page(dev->q, bio, page, PAGE_SIZE, 0);
> +		ret = bio_add_pc_page(dev->q, bio, page, PAGE_SIZE, 0, false);
>  		if (ret != PAGE_SIZE) {
>  			pblk_err(pblk, "could not add page to bio\n");
>  			bio_put(bio);
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index 03c241b340ea..986d9d308176 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -596,7 +596,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>  			return NVM_IO_ERR;
>  		}
>  
> -		if (bio_add_pc_page(q, bio, page, rb->seg_size, 0) !=
> +		if (bio_add_pc_page(q, bio, page, rb->seg_size, 0, false) !=
>  								rb->seg_size) {
>  			pblk_err(pblk, "could not add page to write bio\n");
>  			flags &= ~PBLK_WRITTEN_DATA;
> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> index 1ecef76225a1..4c77e2a7c2d8 100644
> --- a/drivers/md/dm-bufio.c
> +++ b/drivers/md/dm-bufio.c
> @@ -598,7 +598,7 @@ static void use_bio(struct dm_buffer *b, int rw, sector_t sector,
>  	do {
>  		unsigned this_step = min((unsigned)(PAGE_SIZE - offset_in_page(ptr)), len);
>  		if (!bio_add_page(bio, virt_to_page(ptr), this_step,
> -				  offset_in_page(ptr))) {
> +				  offset_in_page(ptr), false)) {
>  			bio_put(bio);
>  			goto dmio;
>  		}
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index ef7896c50814..29006bdc6753 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -1429,7 +1429,7 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
>  
>  		len = (remaining_size > PAGE_SIZE) ? PAGE_SIZE : remaining_size;
>  
> -		bio_add_page(clone, page, len, 0);
> +		bio_add_page(clone, page, len, 0, false);
>  
>  		remaining_size -= len;
>  	}
> diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
> index 81a346f9de17..1d47565b49c3 100644
> --- a/drivers/md/dm-io.c
> +++ b/drivers/md/dm-io.c
> @@ -361,7 +361,8 @@ static void do_region(int op, int op_flags, unsigned region,
>  			 * WRITE SAME only uses a single page.
>  			 */
>  			dp->get_page(dp, &page, &len, &offset);
> -			bio_add_page(bio, page, logical_block_size, offset);
> +			bio_add_page(bio, page, logical_block_size, offset,
> +				     false);
>  			num_sectors = min_t(sector_t, special_cmd_max_sectors, remaining);
>  			bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
>  
> @@ -374,7 +375,7 @@ static void do_region(int op, int op_flags, unsigned region,
>  			 */
>  			dp->get_page(dp, &page, &len, &offset);
>  			len = min(len, to_bytes(remaining));
> -			if (!bio_add_page(bio, page, len, offset))
> +			if (!bio_add_page(bio, page, len, offset, false))
>  				break;
>  
>  			offset = 0;
> diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
> index e403fcb5c30a..4d42de63c85e 100644
> --- a/drivers/md/dm-log-writes.c
> +++ b/drivers/md/dm-log-writes.c
> @@ -234,7 +234,7 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
>  	       lc->sectorsize - entrylen - datalen);
>  	kunmap_atomic(ptr);
>  
> -	ret = bio_add_page(bio, page, lc->sectorsize, 0);
> +	ret = bio_add_page(bio, page, lc->sectorsize, 0, false);
>  	if (ret != lc->sectorsize) {
>  		DMERR("Couldn't add page to the log block");
>  		goto error_bio;
> @@ -294,7 +294,7 @@ static int write_inline_data(struct log_writes_c *lc, void *entry,
>  				memset(ptr + pg_datalen, 0, pg_sectorlen - pg_datalen);
>  			kunmap_atomic(ptr);
>  
> -			ret = bio_add_page(bio, page, pg_sectorlen, 0);
> +			ret = bio_add_page(bio, page, pg_sectorlen, 0, false);
>  			if (ret != pg_sectorlen) {
>  				DMERR("Couldn't add page of inline data");
>  				__free_page(page);
> @@ -371,7 +371,7 @@ static int log_one_block(struct log_writes_c *lc,
>  		 * for every bvec in the original bio for simplicity sake.
>  		 */
>  		ret = bio_add_page(bio, bvec_page(&block->vecs[i]),
> -				   block->vecs[i].bv_len, 0);
> +				   block->vecs[i].bv_len, 0, false);
>  		if (ret != block->vecs[i].bv_len) {
>  			atomic_inc(&lc->io_blocks);
>  			submit_bio(bio);
> @@ -388,7 +388,7 @@ static int log_one_block(struct log_writes_c *lc,
>  			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  
>  			ret = bio_add_page(bio, bvec_page(&block->vecs[i]),
> -					   block->vecs[i].bv_len, 0);
> +					   block->vecs[i].bv_len, 0, false);
>  			if (ret != block->vecs[i].bv_len) {
>  				DMERR("Couldn't add page on new bio?");
>  				bio_put(bio);
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index f7822875589e..2fff48b5479a 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -1440,7 +1440,8 @@ static bool wc_add_block(struct writeback_struct *wb, struct wc_entry *e, gfp_t
>  
>  	persistent_memory_flush_cache(address, block_size);
>  	return bio_add_page(&wb->bio, persistent_memory_page(address),
> -			    block_size, persistent_memory_page_offset(address)) != 0;
> +			    block_size,
> +			    persistent_memory_page_offset(address), false) != 0;
>  }
>  
>  struct writeback_list {
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index fa68336560c3..70fbf77bc396 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -438,7 +438,7 @@ static struct dmz_mblock *dmz_get_mblock_slow(struct dmz_metadata *zmd,
>  	bio->bi_private = mblk;
>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>  	bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO);
> -	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
> +	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0, false);
>  	submit_bio(bio);
>  
>  	return mblk;
> @@ -588,7 +588,7 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
>  	bio->bi_private = mblk;
>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>  	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
> -	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
> +	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0, false);
>  	submit_bio(bio);
>  }
>  
> @@ -608,7 +608,7 @@ static int dmz_rdwr_block(struct dmz_metadata *zmd, int op, sector_t block,
>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
>  	bio_set_dev(bio, zmd->dev->bdev);
>  	bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO);
> -	bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0);
> +	bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0, false);
>  	ret = submit_bio_wait(bio);
>  	bio_put(bio);
>  
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 05ffffb8b769..585016563ec1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -817,7 +817,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
>  
>  	bio_set_dev(bio, rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev);
>  	bio->bi_iter.bi_sector = sector;
> -	bio_add_page(bio, page, size, 0);
> +	bio_add_page(bio, page, size, 0, false);
>  	bio->bi_private = rdev;
>  	bio->bi_end_io = super_written;
>  
> @@ -859,7 +859,7 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
>  		bio->bi_iter.bi_sector = sector + rdev->new_data_offset;
>  	else
>  		bio->bi_iter.bi_sector = sector + rdev->data_offset;
> -	bio_add_page(bio, page, size, 0);
> +	bio_add_page(bio, page, size, 0, false);
>  
>  	submit_bio_wait(bio);
>  
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index 400001b815db..f79c87b3d2bb 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -76,7 +76,7 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
>  		 * won't fail because the vec table is big
>  		 * enough to hold all these pages
>  		 */
> -		bio_add_page(bio, page, len, 0);
> +		bio_add_page(bio, page, len, 0, false);
>  		size -= len;
>  	} while (idx++ < RESYNC_PAGES && size > 0);
>  }
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index fdf451aac369..a9e736ef1b33 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1112,7 +1112,7 @@ static void alloc_behind_master_bio(struct r1bio *r1_bio,
>  		if (unlikely(!page))
>  			goto free_pages;
>  
> -		bio_add_page(behind_bio, page, len, 0);
> +		bio_add_page(behind_bio, page, len, 0, false);
>  
>  		size -= len;
>  		i++;
> @@ -2854,7 +2854,7 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
>  				 * won't fail because the vec table is big
>  				 * enough to hold all these pages
>  				 */
> -				bio_add_page(bio, page, len, 0);
> +				bio_add_page(bio, page, len, 0, false);
>  			}
>  		}
>  		nr_sectors += len>>9;
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3b6880dd648d..e172fd3666d7 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3449,7 +3449,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  			 * won't fail because the vec table is big enough
>  			 * to hold all these pages
>  			 */
> -			bio_add_page(bio, page, len, 0);
> +			bio_add_page(bio, page, len, 0, false);
>  		}
>  		nr_sectors += len>>9;
>  		sector_nr += len>>9;
> @@ -4659,7 +4659,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>  			 * won't fail because the vec table is big enough
>  			 * to hold all these pages
>  			 */
> -			bio_add_page(bio, page, len, 0);
> +			bio_add_page(bio, page, len, 0, false);
>  		}
>  		sector_nr += len >> 9;
>  		nr_sectors += len >> 9;
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index cbbe6b6535be..b62806564760 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -804,7 +804,7 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
>  	io->current_bio = r5l_bio_alloc(log);
>  	io->current_bio->bi_end_io = r5l_log_endio;
>  	io->current_bio->bi_private = io;
> -	bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
> +	bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0, false);
>  
>  	r5_reserve_log_entry(log, io);
>  
> @@ -864,7 +864,7 @@ static void r5l_append_payload_page(struct r5l_log *log, struct page *page)
>  		io->need_split_bio = false;
>  	}
>  
> -	if (!bio_add_page(io->current_bio, page, PAGE_SIZE, 0))
> +	if (!bio_add_page(io->current_bio, page, PAGE_SIZE, 0, false))
>  		BUG();
>  
>  	r5_reserve_log_entry(log, io);
> @@ -1699,7 +1699,8 @@ static int r5l_recovery_fetch_ra_pool(struct r5l_log *log,
>  
>  	while (ctx->valid_pages < ctx->total_pages) {
>  		bio_add_page(ctx->ra_bio,
> -			     ctx->ra_pool[ctx->valid_pages], PAGE_SIZE, 0);
> +			     ctx->ra_pool[ctx->valid_pages], PAGE_SIZE, 0,
> +			     false);
>  		ctx->valid_pages += 1;
>  
>  		offset = r5l_ring_add(log, offset, BLOCK_SECTORS);
> diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
> index 17e9e7d51097..12003f091465 100644
> --- a/drivers/md/raid5-ppl.c
> +++ b/drivers/md/raid5-ppl.c
> @@ -476,7 +476,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
>  	bio->bi_opf = REQ_OP_WRITE | REQ_FUA;
>  	bio_set_dev(bio, log->rdev->bdev);
>  	bio->bi_iter.bi_sector = log->next_io_sector;
> -	bio_add_page(bio, io->header_page, PAGE_SIZE, 0);
> +	bio_add_page(bio, io->header_page, PAGE_SIZE, 0, false);
>  	bio->bi_write_hint = ppl_conf->write_hint;
>  
>  	pr_debug("%s: log->current_io_sector: %llu\n", __func__,
> @@ -501,7 +501,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
>  		if (test_bit(STRIPE_FULL_WRITE, &sh->state))
>  			continue;
>  
> -		if (!bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0)) {
> +		if (!bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0, false)) {
>  			struct bio *prev = bio;
>  
>  			bio = bio_alloc_bioset(GFP_NOIO, BIO_MAX_PAGES,
> @@ -510,7 +510,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
>  			bio->bi_write_hint = prev->bi_write_hint;
>  			bio_copy_dev(bio, prev);
>  			bio->bi_iter.bi_sector = bio_end_sector(prev);
> -			bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0);
> +			bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0, false);
>  
>  			bio_chain(bio, prev);
>  			ppl_submit_iounit_bio(io, prev);
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index a065dbfc43b1..6ba1fd806394 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -144,7 +144,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  	bio_set_op_attrs(bio, op, op_flags);
>  
>  	for_each_sg(req->sg, sg, req->sg_cnt, i) {
> -		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
> +		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset, false)
>  				!= sg->length) {
>  			struct bio *prev = bio;
>  
> diff --git a/drivers/staging/erofs/data.c b/drivers/staging/erofs/data.c
> index ba467ba414ff..4fb84db9d5b4 100644
> --- a/drivers/staging/erofs/data.c
> +++ b/drivers/staging/erofs/data.c
> @@ -70,7 +70,7 @@ struct page *__erofs_get_meta_page(struct super_block *sb,
>  			goto err_out;
>  		}
>  
> -		err = bio_add_page(bio, page, PAGE_SIZE, 0);
> +		err = bio_add_page(bio, page, PAGE_SIZE, 0, false);
>  		if (unlikely(err != PAGE_SIZE)) {
>  			err = -EFAULT;
>  			goto err_out;
> @@ -290,7 +290,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio,
>  		}
>  	}
>  
> -	err = bio_add_page(bio, page, PAGE_SIZE, 0);
> +	err = bio_add_page(bio, page, PAGE_SIZE, 0, false);
>  	/* out of the extent or bio is full */
>  	if (err < PAGE_SIZE)
>  		goto submit_bio_retry;
> diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
> index 11aa0c6f1994..3cecd109324e 100644
> --- a/drivers/staging/erofs/unzip_vle.c
> +++ b/drivers/staging/erofs/unzip_vle.c
> @@ -1453,7 +1453,7 @@ static bool z_erofs_vle_submit_all(struct super_block *sb,
>  			++nr_bios;
>  		}
>  
> -		err = bio_add_page(bio, page, PAGE_SIZE, 0);
> +		err = bio_add_page(bio, page, PAGE_SIZE, 0, false);
>  		if (err < PAGE_SIZE)
>  			goto submit_bio_retry;
>  
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index b5ed9c377060..9dc0d3712241 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -501,7 +501,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
>  	refcount_set(&ibr->pending, 1);
>  
>  	while (sectors) {
> -		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
> +		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset, false)
>  				!= sg->length) {
>  
>  			bio = iblock_get_bio(cmd, block_lba, 1, REQ_OP_WRITE,
> @@ -753,7 +753,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  		 *	length of the S/G list entry this will cause and
>  		 *	endless loop.  Better hope no driver uses huge pages.
>  		 */
> -		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
> +		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset, false)
>  				!= sg->length) {
>  			if (cmd->prot_type && dev->dev_attrib.pi_prot_type) {
>  				rc = iblock_alloc_bip(cmd, bio, &prot_miter);
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index b5388a106567..570ef259d78d 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -916,7 +916,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  				page, len, off);
>  
>  			rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
> -					bio, page, bytes, off);
> +					bio, page, bytes, off, false);
>  			pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
>  				bio_segments(bio), nr_vecs);
>  			if (rc != bytes) {
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index c5ee3ac73930..d1bdddf3299a 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -1633,7 +1633,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
>  
>  		for (j = i; j < num_pages; j++) {
>  			ret = bio_add_page(bio, block_ctx->pagev[j],
> -					   PAGE_SIZE, 0);
> +					   PAGE_SIZE, 0, false);
>  			if (PAGE_SIZE != ret)
>  				break;
>  		}
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index fcedb69c4d7a..3e28a0c01a60 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -337,7 +337,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
>  							  0);
>  
>  		page->mapping = NULL;
> -		if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) <
> +		if (submit || bio_add_page(bio, page, PAGE_SIZE, 0, false) <
>  		    PAGE_SIZE) {
>  			/*
>  			 * inc the count before we submit the bio so
> @@ -365,7 +365,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
>  			bio->bi_opf = REQ_OP_WRITE | write_flags;
>  			bio->bi_private = cb;
>  			bio->bi_end_io = end_compressed_bio_write;
> -			bio_add_page(bio, page, PAGE_SIZE, 0);
> +			bio_add_page(bio, page, PAGE_SIZE, 0, false);
>  		}
>  		if (bytes_left < PAGE_SIZE) {
>  			btrfs_info(fs_info,
> @@ -491,7 +491,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  		}
>  
>  		ret = bio_add_page(cb->orig_bio, page,
> -				   PAGE_SIZE, 0);
> +				   PAGE_SIZE, 0, false);
>  
>  		if (ret == PAGE_SIZE) {
>  			nr_pages++;
> @@ -616,7 +616,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  							  comp_bio, 0);
>  
>  		page->mapping = NULL;
> -		if (submit || bio_add_page(comp_bio, page, PAGE_SIZE, 0) <
> +		if (submit || bio_add_page(comp_bio, page, PAGE_SIZE, 0, false) <
>  		    PAGE_SIZE) {
>  			ret = btrfs_bio_wq_end_io(fs_info, comp_bio,
>  						  BTRFS_WQ_ENDIO_DATA);
> @@ -649,7 +649,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  			comp_bio->bi_private = cb;
>  			comp_bio->bi_end_io = end_compressed_bio_read;
>  
> -			bio_add_page(comp_bio, page, PAGE_SIZE, 0);
> +			bio_add_page(comp_bio, page, PAGE_SIZE, 0, false);
>  		}
>  		cur_disk_byte += PAGE_SIZE;
>  	}
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7485910fdff0..e3ddfff82c12 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2042,7 +2042,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
>  	}
>  	bio_set_dev(bio, dev->bdev);
>  	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
> -	bio_add_page(bio, page, length, pg_offset);
> +	bio_add_page(bio, page, length, pg_offset, false);
>  
>  	if (btrfsic_submit_bio_wait(bio)) {
>  		/* try to remap that extent elsewhere? */
> @@ -2357,7 +2357,7 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
>  		       csum_size);
>  	}
>  
> -	bio_add_page(bio, page, failrec->len, pg_offset);
> +	bio_add_page(bio, page, failrec->len, pg_offset, false);
>  
>  	return bio;
>  }
> @@ -2775,7 +2775,7 @@ static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
>  
>  		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
>  		    force_bio_submit ||
> -		    bio_add_page(bio, page, page_size, pg_offset) < page_size) {
> +		    bio_add_page(bio, page, page_size, pg_offset, false) < page_size) {
>  			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
>  			if (ret < 0) {
>  				*bio_ret = NULL;
> @@ -2790,7 +2790,7 @@ static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
>  	}
>  
>  	bio = btrfs_bio_alloc(bdev, offset);
> -	bio_add_page(bio, page, page_size, pg_offset);
> +	bio_add_page(bio, page, page_size, pg_offset, false);
>  	bio->bi_end_io = end_io_func;
>  	bio->bi_private = tree;
>  	bio->bi_write_hint = page->mapping->host->i_write_hint;
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index f02532ef34f0..5d2a3b8cf45c 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1097,7 +1097,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
>  		    !last->bi_status &&
>  		    last->bi_disk == stripe->dev->bdev->bd_disk &&
>  		    last->bi_partno == stripe->dev->bdev->bd_partno) {
> -			ret = bio_add_page(last, page, PAGE_SIZE, 0);
> +			ret = bio_add_page(last, page, PAGE_SIZE, 0, false);
>  			if (ret == PAGE_SIZE)
>  				return 0;
>  		}
> @@ -1109,7 +1109,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
>  	bio_set_dev(bio, stripe->dev->bdev);
>  	bio->bi_iter.bi_sector = disk_start >> 9;
>  
> -	bio_add_page(bio, page, PAGE_SIZE, 0);
> +	bio_add_page(bio, page, PAGE_SIZE, 0, false);
>  	bio_list_add(bio_list, bio);
>  	return 0;
>  }
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index a99588536c79..2b63d595e9f6 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1433,7 +1433,7 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
>  		struct scrub_page *page = sblock->pagev[page_num];
>  
>  		WARN_ON(!page->page);
> -		bio_add_page(bio, page->page, PAGE_SIZE, 0);
> +		bio_add_page(bio, page->page, PAGE_SIZE, 0, false);
>  	}
>  
>  	if (scrub_submit_raid56_bio_wait(fs_info, bio, first_page)) {
> @@ -1486,7 +1486,7 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
>  		bio = btrfs_io_bio_alloc(1);
>  		bio_set_dev(bio, page->dev->bdev);
>  
> -		bio_add_page(bio, page->page, PAGE_SIZE, 0);
> +		bio_add_page(bio, page->page, PAGE_SIZE, 0, false);
>  		bio->bi_iter.bi_sector = page->physical >> 9;
>  		bio->bi_opf = REQ_OP_READ;
>  
> @@ -1569,7 +1569,7 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
>  		bio->bi_iter.bi_sector = page_bad->physical >> 9;
>  		bio->bi_opf = REQ_OP_WRITE;
>  
> -		ret = bio_add_page(bio, page_good->page, PAGE_SIZE, 0);
> +		ret = bio_add_page(bio, page_good->page, PAGE_SIZE, 0, false);
>  		if (PAGE_SIZE != ret) {
>  			bio_put(bio);
>  			return -EIO;
> @@ -1670,7 +1670,7 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
>  		goto again;
>  	}
>  
> -	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0);
> +	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0, false);
>  	if (ret != PAGE_SIZE) {
>  		if (sbio->page_count < 1) {
>  			bio_put(sbio->bio);
> @@ -2071,7 +2071,7 @@ static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
>  	}
>  
>  	sbio->pagev[sbio->page_count] = spage;
> -	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0);
> +	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0, false);
>  	if (ret != PAGE_SIZE) {
>  		if (sbio->page_count < 1) {
>  			bio_put(sbio->bio);
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 91c4bfde03e5..74aae2aa69c4 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3075,7 +3075,7 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
>  	bio_set_dev(bio, bh->b_bdev);
>  	bio->bi_write_hint = write_hint;
>  
> -	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
> +	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh), false);
>  	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
>  
>  	bio->bi_end_io = end_bio_bh_io_sync;
> diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
> index 51763b09a11b..604766e24a46 100644
> --- a/fs/crypto/bio.c
> +++ b/fs/crypto/bio.c
> @@ -131,7 +131,7 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
>  			pblk << (inode->i_sb->s_blocksize_bits - 9);
>  		bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  		ret = bio_add_page(bio, ciphertext_page,
> -					inode->i_sb->s_blocksize, 0);
> +					inode->i_sb->s_blocksize, 0, false);
>  		if (ret != inode->i_sb->s_blocksize) {
>  			/* should never happen! */
>  			WARN_ON(1);
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index e9f3b79048ae..b8b5d8e31aeb 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -761,7 +761,7 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
>  	int ret;
>  
>  	ret = bio_add_page(sdio->bio, sdio->cur_page,
> -			sdio->cur_page_len, sdio->cur_page_offset);
> +			sdio->cur_page_len, sdio->cur_page_offset, false);
>  	if (ret == sdio->cur_page_len) {
>  		/*
>  		 * Decrement count only, if we are done with this page
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 4cd321328c18..a76ce3346705 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -402,7 +402,7 @@ static int io_submit_add_bh(struct ext4_io_submit *io,
>  			return ret;
>  		io->io_bio->bi_write_hint = inode->i_write_hint;
>  	}
> -	ret = bio_add_page(io->io_bio, page, bh->b_size, bh_offset(bh));
> +	ret = bio_add_page(io->io_bio, page, bh->b_size, bh_offset(bh), false);
>  	if (ret != bh->b_size)
>  		goto submit_and_retry;
>  	wbc_account_io(io->io_wbc, page, bh->b_size);
> diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
> index 84222b89da52..90ee8263d266 100644
> --- a/fs/ext4/readpage.c
> +++ b/fs/ext4/readpage.c
> @@ -264,7 +264,7 @@ int ext4_mpage_readpages(struct address_space *mapping,
>  		}
>  
>  		length = first_hole << blkbits;
> -		if (bio_add_page(bio, page, length, 0) < length)
> +		if (bio_add_page(bio, page, length, 0, false) < length)
>  			goto submit_and_realloc;
>  
>  		if (((map.m_flags & EXT4_MAP_BOUNDARY) &&
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 51bf04ba2599..24353c9c8a41 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -308,7 +308,7 @@ static inline void __submit_bio(struct f2fs_sb_info *sbi,
>  			SetPagePrivate(page);
>  			set_page_private(page, (unsigned long)DUMMY_WRITTEN_PAGE);
>  			lock_page(page);
> -			if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE)
> +			if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE)
>  				f2fs_bug_on(sbi, 1);
>  		}
>  		/*
> @@ -461,7 +461,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
>  	bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
>  				1, is_read_io(fio->op), fio->type, fio->temp);
>  
> -	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
> +	if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
>  		bio_put(bio);
>  		return -EFAULT;
>  	}
> @@ -530,7 +530,7 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
>  		io->fio = *fio;
>  	}
>  
> -	if (bio_add_page(io->bio, bio_page, PAGE_SIZE, 0) < PAGE_SIZE) {
> +	if (bio_add_page(io->bio, bio_page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
>  		__submit_merged_bio(io);
>  		goto alloc_new;
>  	}
> @@ -598,7 +598,7 @@ static int f2fs_submit_page_read(struct inode *inode, struct page *page,
>  	/* wait for GCed page writeback via META_MAPPING */
>  	f2fs_wait_on_block_writeback(inode, blkaddr);
>  
> -	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
> +	if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
>  		bio_put(bio);
>  		return -EFAULT;
>  	}
> @@ -1621,7 +1621,7 @@ static int f2fs_mpage_readpages(struct address_space *mapping,
>  		 */
>  		f2fs_wait_on_block_writeback(inode, block_nr);
>  
> -		if (bio_add_page(bio, page, blocksize, 0) < blocksize)
> +		if (bio_add_page(bio, page, blocksize, 0, false) < blocksize)
>  			goto submit_and_realloc;
>  
>  		inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA);
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index e0523ef8421e..3dca16f510b7 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -334,11 +334,11 @@ void gfs2_log_write(struct gfs2_sbd *sdp, struct page *page,
>  
>  	bio = gfs2_log_get_bio(sdp, blkno, &sdp->sd_log_bio, REQ_OP_WRITE,
>  			       gfs2_end_log_write, false);
> -	ret = bio_add_page(bio, page, size, offset);
> +	ret = bio_add_page(bio, page, size, offset, false);
>  	if (ret == 0) {
>  		bio = gfs2_log_get_bio(sdp, blkno, &sdp->sd_log_bio,
>  				       REQ_OP_WRITE, gfs2_end_log_write, true);
> -		ret = bio_add_page(bio, page, size, offset);
> +		ret = bio_add_page(bio, page, size, offset, false);
>  		WARN_ON(ret == 0);
>  	}
>  }
> diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
> index a7e645d08942..c7db0f249002 100644
> --- a/fs/gfs2/meta_io.c
> +++ b/fs/gfs2/meta_io.c
> @@ -225,7 +225,7 @@ static void gfs2_submit_bhs(int op, int op_flags, struct buffer_head *bhs[],
>  		bio_set_dev(bio, bh->b_bdev);
>  		while (num > 0) {
>  			bh = *bhs;
> -			if (!bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh))) {
> +			if (!bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh), false)) {
>  				BUG_ON(bio->bi_iter.bi_size == 0);
>  				break;
>  			}
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index b041cb8ae383..cdd52e6c02f7 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -243,7 +243,7 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector, int silent)
>  	bio = bio_alloc(GFP_NOFS, 1);
>  	bio->bi_iter.bi_sector = sector * (sb->s_blocksize >> 9);
>  	bio_set_dev(bio, sb->s_bdev);
> -	bio_add_page(bio, page, PAGE_SIZE, 0);
> +	bio_add_page(bio, page, PAGE_SIZE, 0, false);
>  
>  	bio->bi_end_io = end_bio_io_page;
>  	bio->bi_private = page;
> diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c
> index 08c1580bdf7a..3eff6b4dcb69 100644
> --- a/fs/hfsplus/wrapper.c
> +++ b/fs/hfsplus/wrapper.c
> @@ -77,7 +77,8 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t sector,
>  		unsigned int len = min_t(unsigned int, PAGE_SIZE - page_offset,
>  					 io_size);
>  
> -		ret = bio_add_page(bio, virt_to_page(buf), len, page_offset);
> +		ret = bio_add_page(bio, virt_to_page(buf), len, page_offset,
> +				   false);
>  		if (ret != len) {
>  			ret = -EIO;
>  			goto out;
> diff --git a/fs/iomap.c b/fs/iomap.c
> index ab578054ebe9..c706fd2b0f6e 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -356,7 +356,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		ctx->bio->bi_end_io = iomap_read_end_io;
>  	}
>  
> -	bio_add_page(ctx->bio, page, plen, poff);
> +	bio_add_page(ctx->bio, page, plen, poff, false);
>  done:
>  	/*
>  	 * Move the caller beyond our range so that it keeps making progress.
> @@ -624,7 +624,7 @@ iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
>  	bio.bi_opf = REQ_OP_READ;
>  	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
>  	bio_set_dev(&bio, iomap->bdev);
> -	__bio_add_page(&bio, page, plen, poff);
> +	__bio_add_page(&bio, page, plen, poff, false);
>  	return submit_bio_wait(&bio);
>  }
>  
> @@ -1616,7 +1616,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
>  	bio->bi_end_io = iomap_dio_bio_end_io;
>  
>  	get_page(page);
> -	__bio_add_page(bio, page, len, 0);
> +	__bio_add_page(bio, page, len, 0, false);
>  	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
>  	iomap_dio_submit_bio(dio, iomap, bio);
>  }
> diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
> index 6b68df395892..42a8c1a8fb77 100644
> --- a/fs/jfs/jfs_logmgr.c
> +++ b/fs/jfs/jfs_logmgr.c
> @@ -1997,7 +1997,7 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
>  	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
>  	bio_set_dev(bio, log->bdev);
>  
> -	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
> +	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset, false);
>  	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
>  
>  	bio->bi_end_io = lbmIODone;
> @@ -2141,7 +2141,7 @@ static void lbmStartIO(struct lbuf * bp)
>  	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
>  	bio_set_dev(bio, log->bdev);
>  
> -	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
> +	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset, false);
>  	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
>  
>  	bio->bi_end_io = lbmIODone;
> diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
> index fa2c6824c7f2..6f66f0a15768 100644
> --- a/fs/jfs/jfs_metapage.c
> +++ b/fs/jfs/jfs_metapage.c
> @@ -401,7 +401,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
>  				continue;
>  			}
>  			/* Not contiguous */
> -			if (bio_add_page(bio, page, bio_bytes, bio_offset) <
> +			if (bio_add_page(bio, page, bio_bytes, bio_offset, false) <
>  			    bio_bytes)
>  				goto add_failed;
>  			/*
> @@ -444,7 +444,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
>  		next_block = lblock + len;
>  	}
>  	if (bio) {
> -		if (bio_add_page(bio, page, bio_bytes, bio_offset) < bio_bytes)
> +		if (bio_add_page(bio, page, bio_bytes, bio_offset, false) < bio_bytes)
>  				goto add_failed;
>  		if (!bio->bi_iter.bi_size)
>  			goto dump_bio;
> @@ -518,7 +518,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
>  			bio_set_op_attrs(bio, REQ_OP_READ, 0);
>  			len = xlen << inode->i_blkbits;
>  			offset = block_offset << inode->i_blkbits;
> -			if (bio_add_page(bio, page, len, offset) < len)
> +			if (bio_add_page(bio, page, len, offset, false) < len)
>  				goto add_failed;
>  			block_offset += xlen;
>  		} else
> diff --git a/fs/mpage.c b/fs/mpage.c
> index e234c9a8802d..67e6d1dda984 100644
> --- a/fs/mpage.c
> +++ b/fs/mpage.c
> @@ -313,7 +313,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
>  	}
>  
>  	length = first_hole << blkbits;
> -	if (bio_add_page(args->bio, page, length, 0) < length) {
> +	if (bio_add_page(args->bio, page, length, 0, false) < length) {
>  		args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio);
>  		goto alloc_new;
>  	}
> @@ -650,7 +650,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
>  	 */
>  	wbc_account_io(wbc, page, PAGE_SIZE);
>  	length = first_unmapped << blkbits;
> -	if (bio_add_page(bio, page, length, 0) < length) {
> +	if (bio_add_page(bio, page, length, 0, false) < length) {
>  		bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio);
>  		goto alloc_new;
>  	}
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 690221747b47..fb58bf7bc06f 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -182,7 +182,7 @@ do_add_page_to_bio(struct bio *bio, int npg, int rw, sector_t isect,
>  			return ERR_PTR(-ENOMEM);
>  		bio_set_op_attrs(bio, rw, 0);
>  	}
> -	if (bio_add_page(bio, page, *len, offset) < *len) {
> +	if (bio_add_page(bio, page, *len, offset, false) < *len) {
>  		bio = bl_submit_bio(bio);
>  		goto retry;
>  	}
> diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
> index 20c479b5e41b..64ecdab529c7 100644
> --- a/fs/nilfs2/segbuf.c
> +++ b/fs/nilfs2/segbuf.c
> @@ -424,7 +424,8 @@ static int nilfs_segbuf_submit_bh(struct nilfs_segment_buffer *segbuf,
>  			return -ENOMEM;
>  	}
>  
> -	len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh));
> +	len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh),
> +			   false);
>  	if (len == bh->b_size) {
>  		wi->end++;
>  		return 0;
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index f3c20b279eb2..e8c209c2e348 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -569,7 +569,7 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>  		mlog(ML_HB_BIO, "page %d, vec_len = %u, vec_start = %u\n",
>  		     current_page, vec_len, vec_start);
>  
> -		len = bio_add_page(bio, page, vec_len, vec_start);
> +		len = bio_add_page(bio, page, vec_len, vec_start, false);
>  		if (len != vec_len) break;
>  
>  		cs += vec_len / (PAGE_SIZE/spp);
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index d152d1ab2ad1..085ccd01e059 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -667,7 +667,7 @@ xfs_add_to_ioend(
>  			atomic_inc(&iop->write_count);
>  		if (bio_full(wpc->ioend->io_bio))
>  			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
> -		bio_add_page(wpc->ioend->io_bio, page, len, poff);
> +		bio_add_page(wpc->ioend->io_bio, page, len, poff, false);
>  	}
>  
>  	wpc->ioend->io_size += len;
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 548344e25128..2b981cf8d2af 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1389,7 +1389,7 @@ xfs_buf_ioapply_map(
>  			nbytes = size;
>  
>  		rbytes = bio_add_page(bio, bp->b_pages[page_index], nbytes,
> -				      offset);
> +				      offset, false);
>  		if (rbytes < nbytes)
>  			break;
>  
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 6ac4f6b192e6..05fcc5227d0e 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -429,13 +429,14 @@ extern void bio_uninit(struct bio *);
>  extern void bio_reset(struct bio *);
>  void bio_chain(struct bio *, struct bio *);
>  
> -extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
> +extern int bio_add_page(struct bio *, struct page *, unsigned int,
> +			unsigned int, bool is_gup);
>  extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
> -			   unsigned int, unsigned int);
> +			   unsigned int, unsigned int, bool is_gup);
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
>  		unsigned int len, unsigned int off, bool same_page);
>  void __bio_add_page(struct bio *bio, struct page *page,
> -		unsigned int len, unsigned int off);
> +		unsigned int len, unsigned int off, bool is_gup);
>  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
>  struct rq_map_data;
>  extern struct bio *bio_map_user_iov(struct request_queue *,
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index d7f6c1a288d3..ca5e0e1576e3 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -274,7 +274,7 @@ static int hib_submit_io(int op, int op_flags, pgoff_t page_off, void *addr,
>  	bio_set_dev(bio, hib_resume_bdev);
>  	bio_set_op_attrs(bio, op, op_flags);
>  
> -	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
> +	if (bio_add_page(bio, page, PAGE_SIZE, 0, false) < PAGE_SIZE) {
>  		pr_err("Adding page to bio failed at %llu\n",
>  		       (unsigned long long)bio->bi_iter.bi_sector);
>  		bio_put(bio);
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 6b3be0445c61..c36bfe4ba317 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -42,7 +42,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
>  		bio->bi_end_io = end_io;
>  
>  		for (i = 0; i < nr; i++)
> -			bio_add_page(bio, page + i, PAGE_SIZE, 0);
> +			bio_add_page(bio, page + i, PAGE_SIZE, 0, false);
>  		VM_BUG_ON(bio->bi_iter.bi_size != PAGE_SIZE * nr);
>  	}
>  	return bio;
> -- 
> 2.20.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v1 15/15] ceph: use put_user_pages() instead of ceph_put_page_vector()
  2019-04-15  7:46   ` Yan, Zheng
@ 2019-04-15 15:11     ` Jerome Glisse
  0 siblings, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-15 15:11 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Jan Kara, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox, Sage Weil, Ilya Dryomov,
	ceph-devel

On Mon, Apr 15, 2019 at 03:46:59PM +0800, Yan, Zheng wrote:
> On 4/12/19 5:08 AM, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > When page references were taken through GUP (get_user_page*()) we need
> > to drop them with put_user_pages().
> > 
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-block@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Johannes Thumshirn <jthumshirn@suse.de>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Jens Axboe <axboe@kernel.dk>
> > Cc: Ming Lei <ming.lei@redhat.com>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Yan Zheng <zyan@redhat.com>
> > Cc: Sage Weil <sage@redhat.com>
> > Cc: Ilya Dryomov <idryomov@gmail.com>
> > Cc: ceph-devel@vger.kernel.org
> > ---
> >   fs/ceph/file.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 6c5b85f01721..5842ad3a4218 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -667,7 +667,8 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> >   			} else {
> >   				iov_iter_advance(to, 0);
> >   			}
> > -			ceph_put_page_vector(pages, num_pages, false);
> > +			/* iov_iter_get_pages_alloc() did call GUP */
> > +			put_user_pages(pages, num_pages);
> 
> pages in pipe were not from get_user_pages(). Am I missing anything?

Oh, my mistake: I misread iov_iter_is_pipe() and missed the special case
for pipes in iov_iter_get_pages_alloc() before iterate_all_kinds(). Thanks
for catching that.

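A minimal sketch of the corrected cleanup (assuming the put_user_page*()
series is applied; this is not the posted patch):

	if (iov_iter_is_pipe(to))
		/* pipe pages come from the pipe, not from GUP */
		ceph_put_page_vector(pages, num_pages, false);
	else
		/* iov_iter_get_pages_alloc() used GUP for user memory */
		put_user_pages(pages, num_pages);
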
Cheers,
Jérôme


* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-15 14:59   ` Jan Kara
@ 2019-04-15 15:24     ` Jerome Glisse
  2019-04-16 16:46       ` Jan Kara
  2019-04-16  0:22     ` Jerome Glisse
  1 sibling, 1 reply; 47+ messages in thread
From: Jerome Glisse @ 2019-04-15 15:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox

On Mon, Apr 15, 2019 at 04:59:52PM +0200, Jan Kara wrote:
> Hi Jerome!
> 
> On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > We want to keep track of how we got a reference on a page added to a
> > bio_vec, ie whether the page was referenced through GUP (get_user_page*)
> > or not. So
> > add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() to that
> > effect.
> 
> Thanks for writing this patch set! Looking through patches like this one,
> I'm a bit concerned. With so many bio_add_page() callers it's difficult to
> get things right and not regress in the future. I'm wondering whether
> things won't be less error-prone if we required that all page references
> from a bio be gup-like (not necessarily taken by GUP: if the creator of
> the bio gets the struct page it needs via some other means (e.g. a page
> cache lookup), it could just use a get_gup_pin() helper we'd provide).
> After all, a page reference in a bio means that the page is pinned for
> the duration of IO and
> can be DMAed to/from so it even makes some sense to track the reference
> like that. Then bio_put() would just unconditionally do put_user_page() and
> we won't have to propagate the information in the bio.
> 
> Do you think this would be workable and easier?

It might be workable but I am not sure it is any simpler. bio_add_page*()
does not take a page reference; it is up to the caller to take the proper
page reference, so the complexity would just be pushed to a different place,
and I don't think that would be any simpler. It means we would have to
update more code than this patchset does.

The present patch is just a coccinelle semantic patch, and even if it is
scary to see that many call sites, only a few of them need to worry about
the GUP parameter, and they are all in patches 11, 12, 13 and 14.

So I believe this patchset is simpler than converting everyone to take a
GUP-like page reference. Doing so would also lose the information about
GUP, which kind of defeats the purpose. So I believe it would be better
to limit the special reference to GUP pages only.

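For illustration, a minimal sketch of the caller pattern under this series
(function shape and names are hypothetical; assumes the put_user_page*()
series):

	/* add user pages to a bio, flagging them as GUP references */
	static int add_user_pages_to_bio(struct bio *bio, unsigned long uaddr,
					 int n, struct page **pages)
	{
		int i, npages;

		npages = get_user_pages_fast(uaddr, n, 1 /* write */, pages);
		if (npages <= 0)
			return -EFAULT;
		for (i = 0; i < npages; i++)
			/* is_gup == true: the bvec remembers the GUP origin */
			if (bio_add_page(bio, pages[i], PAGE_SIZE, 0, true) <
			    PAGE_SIZE)
				put_user_page(pages[i]);
		return npages;
	}

Pages obtained some other way (e.g. a page cache lookup) keep passing
false, which is what the coccinelle conversion does everywhere.
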
Cheers,
Jérôme


* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
                   ` (14 preceding siblings ...)
  2019-04-11 21:08 ` [PATCH v1 15/15] ceph: use put_user_pages() instead of ceph_put_page_vector() jglisse
@ 2019-04-16  0:00 ` Dave Chinner
       [not found] ` <2c124cc4-b97e-ee28-2926-305bc6bc74bd@plexistor.com>
  16 siblings, 0 replies; 47+ messages in thread
From: Dave Chinner @ 2019-04-16  0:00 UTC (permalink / raw)
  To: jglisse
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Jan Kara, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, samba-technical,
	Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder, ceph-devel,
	Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, Kent Overstreet, linux-bcache,
	Ernesto A. Fernández

On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> This patchset depends on various small fixes [1] and also on patchset
> which introduce put_user_page*() [2] and thus is 5.3 material as those
> pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> so that it can get review and comments on how and what should be done
> to test things.
> 
> For various reasons [2] [3] we want to track page reference through GUP
> differently than "regular" page reference. Thus we need to keep track
> of how we got a page within the block and fs layer. To do so this patch-
> set change the bio_bvec struct to store a pfn and flags instead of a
> direct pointer to a page. This way we can flag page that are coming from
> GUP.
> 
> This patchset is divided as follow:
>     - First part of the patchset is just small cleanup i believe they
>       can go in as his assuming people are ok with them.
>     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
>       done in multi-step, first we replace all direct dereference of
>       the field by call to inline helper, then we introduce macro for
>       bio_bvec that are initialized on the stack. Finaly we change the
>       bv_page field to bv_pfn.
>     - Third part replace put_page(bv_page(bio_vec)) with a new helper
>       which will use put_user_page() when the page in the bio_vec is
>       coming from GUP.
>     - Fourth part update BIO to use bv_set_user_page() for page that
>       are coming from GUP this means updating bio_add_page*() to pass
>       down the origin of the page (GUP or not).
>     - Fith part convert few more places that directly use bvec_io or
>       BIO.
> 
> Note that after this patchset they are still places in the kernel where
> we should use put_user_page*(). The intention is to separate that task
> in chewable chunk (driver by driver, sub-system by sub-system).
> 
> 
> I have only lightly tested this patchset (branch [4]) on my desktop and
> have not seen anything obviously wrong but i might have miss something.
> What kind of test suite should i run to stress test the vfs/block layer
> around DIO and BIO ?

Such widespread changes need full correctness tests run on them. I'd
suggest fstests (auto group) be run on all the filesystems it
supports that are affected by the changes in the patchset. Given you
touched bio_add_page() here, that's probably all of them....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-15 14:59   ` Jan Kara
  2019-04-15 15:24     ` Jerome Glisse
@ 2019-04-16  0:22     ` Jerome Glisse
  2019-04-16 16:52       ` Jan Kara
  1 sibling, 1 reply; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16  0:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox

On Mon, Apr 15, 2019 at 04:59:52PM +0200, Jan Kara wrote:
> Hi Jerome!
> 
> On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > We want to keep track of how we got a reference on page added to bio_vec
> > ie wether the page was reference through GUP (get_user_page*) or not. So
> > add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() to that
> > effect.
> 
> Thanks for writing this patch set! Looking through patches like this one,
> I'm a bit concerned. With so many bio_add_page() callers it's difficult to
> get things right and not regress in the future. I'm wondering whether the
> things won't be less error-prone if we required that all page reference
> from bio are gup-like (not necessarily taken by GUP, if creator of the bio
> gets to struct page he needs via some other means (e.g. page cache lookup),
> he could just use get_gup_pin() helper we'd provide).  After all, a page
> reference in bio means that the page is pinned for the duration of IO and
> can be DMAed to/from so it even makes some sense to track the reference
> like that. Then bio_put() would just unconditionally do put_user_page() and
> we won't have to propagate the information in the bio.
> 
> Do you think this would be workable and easier?

Thinking about this again, I can drop that patch and just add a new
bio_add_page_from_gup(); then it would be much more obvious that only very
few places need the new version, and they are mostly obvious places: it is
usually GUP followed right away by adding the pages to a bio or bvec. See
the sketch below.

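A minimal sketch of what such a helper could look like (the name is from
this proposal; the implementation is assumed, not posted code):

	static inline int bio_add_page_from_gup(struct bio *bio,
						struct page *page,
						unsigned int len,
						unsigned int off)
	{
		/* like bio_add_page() but flags the bvec as GUP-pinned */
		return bio_add_page(bio, page, len, off, true);
	}
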
We can probably add documentation around GUP explaining that if you want
to build a bio or bvec from GUP pages, you must pay attention to which
function you use.

Also, pages going into a bio are not necessarily written to; they can be
used as a source (writing to the block device) or as a destination (reading
from it). So giving all of them the GUP refcount bias would muddy the
waters some more between pages we can no longer clean (ie GUPed pages) and
those that are just being used in a regular read or write operation.

Cheers,
Jérôme


* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-15 15:24     ` Jerome Glisse
@ 2019-04-16 16:46       ` Jan Kara
  2019-04-16 16:54         ` Dan Williams
  2019-04-16 17:07         ` Jerome Glisse
  0 siblings, 2 replies; 47+ messages in thread
From: Jan Kara @ 2019-04-16 16:46 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Jan Kara, linux-kernel, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox

On Mon 15-04-19 11:24:33, Jerome Glisse wrote:
> On Mon, Apr 15, 2019 at 04:59:52PM +0200, Jan Kara wrote:
> > Hi Jerome!
> > 
> > On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> > > From: Jérôme Glisse <jglisse@redhat.com>
> > > 
> > > We want to keep track of how we got a reference on page added to bio_vec
> > > ie wether the page was reference through GUP (get_user_page*) or not. So
> > > add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() to that
> > > effect.
> > 
> > Thanks for writing this patch set! Looking through patches like this one,
> > I'm a bit concerned. With so many bio_add_page() callers it's difficult to
> > get things right and not regress in the future. I'm wondering whether the
> > things won't be less error-prone if we required that all page reference
> > from bio are gup-like (not necessarily taken by GUP, if creator of the bio
> > gets to struct page he needs via some other means (e.g. page cache lookup),
> > he could just use get_gup_pin() helper we'd provide).  After all, a page
> > reference in bio means that the page is pinned for the duration of IO and
> > can be DMAed to/from so it even makes some sense to track the reference
> > like that. Then bio_put() would just unconditionally do put_user_page() and
> > we won't have to propagate the information in the bio.
> > 
> > Do you think this would be workable and easier?
> 
> It might be workable but i am not sure it is any simpler. bio_add_page*()
> does not take page reference it is up to the caller to take the proper
> page reference so the complexity would be push there (just in a different
> place) so i don't think it would be any simpler. This means that we would
> have to update more code than this patchset does.

I agree that the amount of work in this patch set is about the same
(although you would not have to pass the reference type in the biovec, so
you save the complexity there). But for the future, the rule that "bio
references must be gup-pins" is IMO easier to grasp for
developers and you can reasonably assert it in bio_add_page().

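For example, a hedged sketch of such an assertion (page_gup_pinned() is a
hypothetical helper; the rule assumes every page added to a bio already
carries a gup-pin style reference):

	static inline void bio_check_page(struct page *page)
	{
		/* under the proposed rule every bio page must be gup-pinned */
		WARN_ON_ONCE(!page_gup_pinned(page));
	}
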
> This present patch is just a coccinelle semantic patch and even if it
> is scary to see that many call site, they are not that many that need
> to worry about the GUP parameter and they all are in patch 11, 12, 13
> and 14.
> 
> So i believe this patchset is simpler than converting everyone to take
> a GUP like page reference. Also doing so means we loose the information
> about GUP kind of defeat the purpose. So i believe it would be better
> to limit special reference to GUP only pages.

So what's the difference whether the page reference has been acquired via
GUP or via some other means? I don't think that really matters. If, say,
infiniband introduced a new ioctl() that takes a file descriptor, offset,
and length, and just takes pages from the page cache and attaches them to
its RDMA scatter-gather lists, then it would need to use 'pin' references
anyway...

Then why do we work on differentiating between GUP pins and other page
references? Because it matters what the reference is going to be used for
and what its lifetime is. And generally GUP references are used to do IO
to/from a page and may even be controlled by userspace, so that's why we
need to make them different. But in principle the 'gup-pin' reference is
not about the fact that it was obtained from GUP but about the fact
that it is used to do IO. Hence I think that the rule "bio references must
be gup-pins" makes some sense.

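For concreteness, a sketch of the get_gup_pin() helper mentioned above
(the name is from this proposal; the bias-based implementation is an
assumption, roughly in the direction of the put_user_page*() series):

	/*
	 * Take a gup-pin style reference on a page obtained by other means,
	 * e.g. a page cache lookup. GUP_PIN_BIAS is assumed to match the
	 * bias that put_user_page() drops.
	 */
	static inline void get_gup_pin(struct page *page)
	{
		page_ref_add(page, GUP_PIN_BIAS);
	}
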
								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-16  0:22     ` Jerome Glisse
@ 2019-04-16 16:52       ` Jan Kara
  2019-04-16 18:32         ` Jerome Glisse
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2019-04-16 16:52 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Jan Kara, linux-kernel, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox

On Mon 15-04-19 20:22:04, Jerome Glisse wrote:
> On Mon, Apr 15, 2019 at 04:59:52PM +0200, Jan Kara wrote:
> > Hi Jerome!
> > 
> > On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> > > From: Jérôme Glisse <jglisse@redhat.com>
> > > 
> > > We want to keep track of how we got a reference on page added to bio_vec
> > > ie wether the page was reference through GUP (get_user_page*) or not. So
> > > add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() to that
> > > effect.
> > 
> > Thanks for writing this patch set! Looking through patches like this one,
> > I'm a bit concerned. With so many bio_add_page() callers it's difficult to
> > get things right and not regress in the future. I'm wondering whether the
> > things won't be less error-prone if we required that all page reference
> > from bio are gup-like (not necessarily taken by GUP, if creator of the bio
> > gets to struct page he needs via some other means (e.g. page cache lookup),
> > he could just use get_gup_pin() helper we'd provide).  After all, a page
> > reference in bio means that the page is pinned for the duration of IO and
> > can be DMAed to/from so it even makes some sense to track the reference
> > like that. Then bio_put() would just unconditionally do put_user_page() and
> > we won't have to propagate the information in the bio.
> > 
> > Do you think this would be workable and easier?
> 
> Thinking again on this, i can drop that patch and just add a new
> bio_add_page_from_gup() and then it would be much more obvious that
> only very few places need to use that new version and they are mostly
> obvious places. It is usualy GUP then right away add the pages to bio
> or bvec.

Yes, that's another option. Probably second preferred by me after my own
proposal ;)

> We can probably add documentation around GUP explaining that if you
> want to build a bio or bvec from GUP you must pay attention to which
> function you use.

Yes, although we both know how careful people are in reading
documentation...

> Also pages going in a bio are not necessarily written too, they can
> be use as source (writting to block) or as destination (reading from
> block). So having all of them with refcount bias as GUP would muddy
> the water somemore between pages we can no longer clean (ie GUPed)
> and those that are just being use in regular read or write operation.

Why would the difference matter here?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-16 16:46       ` Jan Kara
@ 2019-04-16 16:54         ` Dan Williams
  2019-04-16 17:07         ` Jerome Glisse
  1 sibling, 0 replies; 47+ messages in thread
From: Dan Williams @ 2019-04-16 16:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jerome Glisse, Linux Kernel Mailing List, linux-fsdevel,
	linux-block, Linux MM, John Hubbard, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Dave Chinner, Jason Gunthorpe, Matthew Wilcox

On Tue, Apr 16, 2019 at 9:47 AM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 15-04-19 11:24:33, Jerome Glisse wrote:
> > On Mon, Apr 15, 2019 at 04:59:52PM +0200, Jan Kara wrote:
> > > Hi Jerome!
> > >
> > > On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > >
> > > > We want to keep track of how we got a reference on page added to bio_vec
> > > > ie wether the page was reference through GUP (get_user_page*) or not. So
> > > > add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() to that
> > > > effect.
> > >
> > > Thanks for writing this patch set! Looking through patches like this one,
> > > I'm a bit concerned. With so many bio_add_page() callers it's difficult to
> > > get things right and not regress in the future. I'm wondering whether the
> > > things won't be less error-prone if we required that all page reference
> > > from bio are gup-like (not necessarily taken by GUP, if creator of the bio
> > > gets to struct page he needs via some other means (e.g. page cache lookup),
> > > he could just use get_gup_pin() helper we'd provide).  After all, a page
> > > reference in bio means that the page is pinned for the duration of IO and
> > > can be DMAed to/from so it even makes some sense to track the reference
> > > like that. Then bio_put() would just unconditionally do put_user_page() and
> > > we won't have to propagate the information in the bio.
> > >
> > > Do you think this would be workable and easier?
> >
> > It might be workable but i am not sure it is any simpler. bio_add_page*()
> > does not take page reference it is up to the caller to take the proper
> > page reference so the complexity would be push there (just in a different
> > place) so i don't think it would be any simpler. This means that we would
> > have to update more code than this patchset does.
>
> I agree that the amount of work in this patch set is about the same
> (although you don't have to pass the information about reference type in
> the biovec so you save the complexities there). But for the future the
> rule that "bio references must be gup-pins" is IMO easier to grasp for
> developers and you can reasonably assert it in bio_add_page().
>
> > This present patch is just a coccinelle semantic patch and even if it
> > is scary to see that many call site, they are not that many that need
> > to worry about the GUP parameter and they all are in patch 11, 12, 13
> > and 14.
> >
> > So i believe this patchset is simpler than converting everyone to take
> > a GUP like page reference. Also doing so means we loose the information
> > about GUP kind of defeat the purpose. So i believe it would be better
> > to limit special reference to GUP only pages.
>
> So what's the difference whether the page reference has been acquired via
> GUP or via some other means? I don't think that really matters. If say
> infiniband introduced new ioctl() that takes file descriptor, offset, and
> length and just takes pages from page cache and attaches them to their RDMA
> scatter-gather lists, then they'd need to use 'pin' references anyway...
>
> Then why do we work on differentiating between GUP pins and other page
> references?  Because it matters what the reference is going to be used for
> and what is it's lifetime. And generally GUP references are used to do IO
> to/from page and may even be controlled by userspace so that's why we need
> to make them different. But in principle the 'gup-pin' reference is not about
> the fact that the reference has been obtained from GUP but about the fact
> that it is used to do IO. Hence I think that the rule "bio references must
> be gup-pins" makes some sense.

+1 to this idea. I don't see the need to preserve the concept that
some biovecs carry non-GUP pages.


* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-16 16:46       ` Jan Kara
  2019-04-16 16:54         ` Dan Williams
@ 2019-04-16 17:07         ` Jerome Glisse
  1 sibling, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16 17:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox

On Tue, Apr 16, 2019 at 06:46:58PM +0200, Jan Kara wrote:
> On Mon 15-04-19 11:24:33, Jerome Glisse wrote:
> > On Mon, Apr 15, 2019 at 04:59:52PM +0200, Jan Kara wrote:
> > > Hi Jerome!
> > > 
> > > On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > 
> > > > We want to keep track of how we got a reference on page added to bio_vec
> > > > ie wether the page was reference through GUP (get_user_page*) or not. So
> > > > add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() to that
> > > > effect.
> > > 
> > > Thanks for writing this patch set! Looking through patches like this one,
> > > I'm a bit concerned. With so many bio_add_page() callers it's difficult to
> > > get things right and not regress in the future. I'm wondering whether the
> > > things won't be less error-prone if we required that all page reference
> > > from bio are gup-like (not necessarily taken by GUP, if creator of the bio
> > > gets to struct page he needs via some other means (e.g. page cache lookup),
> > > he could just use get_gup_pin() helper we'd provide).  After all, a page
> > > reference in bio means that the page is pinned for the duration of IO and
> > > can be DMAed to/from so it even makes some sense to track the reference
> > > like that. Then bio_put() would just unconditionally do put_user_page() and
> > > we won't have to propagate the information in the bio.
> > > 
> > > Do you think this would be workable and easier?
> > 
> > It might be workable but i am not sure it is any simpler. bio_add_page*()
> > does not take page reference it is up to the caller to take the proper
> > page reference so the complexity would be push there (just in a different
> > place) so i don't think it would be any simpler. This means that we would
> > have to update more code than this patchset does.
> 
> I agree that the amount of work in this patch set is about the same
> (although you don't have to pass the information about reference type in
> the biovec so you save the complexities there). But for the future the
> rule that "bio references must be gup-pins" is IMO easier to grasp for
> developers and you can reasonably assert it in bio_add_page().
> 
> > This present patch is just a coccinelle semantic patch and even if it
> > is scary to see that many call site, they are not that many that need
> > to worry about the GUP parameter and they all are in patch 11, 12, 13
> > and 14.
> > 
> > So i believe this patchset is simpler than converting everyone to take
> > a GUP like page reference. Also doing so means we loose the information
> > about GUP kind of defeat the purpose. So i believe it would be better
> > to limit special reference to GUP only pages.
> 
> So what's the difference whether the page reference has been acquired via
> GUP or via some other means? I don't think that really matters. If say
> infiniband introduced new ioctl() that takes file descriptor, offset, and
> length and just takes pages from page cache and attaches them to their RDMA
> scatter-gather lists, then they'd need to use 'pin' references anyway...
> 
> Then why do we work on differentiating between GUP pins and other page
> references?  Because it matters what the reference is going to be used for
> and what is it's lifetime. And generally GUP references are used to do IO
> to/from page and may even be controlled by userspace so that's why we need
> to make them different. But in principle the 'gup-pin' reference is not about
> the fact that the reference has been obtained from GUP but about the fact
> that it is used to do IO. Hence I think that the rule "bio references must
> be gup-pins" makes some sense.

It will break things like the page protection I am working on (KSM for
file-backed pages). Pages can go through a bio for mundane reasons (crypto,
network, gpu, ...) that have nothing to do with fs I/O and do not have to
block any fs operation. If we GUP-bias all those pages then we effectively
make the situation worse: pages will have a high likelihood of always
looking GUPed while they are just going through some bio for one of those
mundane reasons (and the page is not being written to, just used as a
source).

I understand why it looks appealing conceptually, but we would be losing
information here. I really want to be able to determine whether a page is
GUPed or not. If we GUP-bias every page in a bio then we lose that.

Also, I want to point out that the complexity of biasing every page in a
bio is much bigger than this patchset; it would require changes to every
call site of bio_add_page*() at the very least.

Cheers,
Jérôme


* Re: [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page()
  2019-04-16 16:52       ` Jan Kara
@ 2019-04-16 18:32         ` Jerome Glisse
  0 siblings, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16 18:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Dave Chinner,
	Jason Gunthorpe, Matthew Wilcox

On Tue, Apr 16, 2019 at 06:52:06PM +0200, Jan Kara wrote:
> On Mon 15-04-19 20:22:04, Jerome Glisse wrote:
> > On Mon, Apr 15, 2019 at 04:59:52PM +0200, Jan Kara wrote:
> > > Hi Jerome!
> > > 
> > > On Thu 11-04-19 17:08:29, jglisse@redhat.com wrote:
> > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > 
> > > > We want to keep track of how we got a reference on page added to bio_vec
> > > > ie wether the page was reference through GUP (get_user_page*) or not. So
> > > > add a flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() to that
> > > > effect.
> > > 
> > > Thanks for writing this patch set! Looking through patches like this one,
> > > I'm a bit concerned. With so many bio_add_page() callers it's difficult to
> > > get things right and not regress in the future. I'm wondering whether the
> > > things won't be less error-prone if we required that all page reference
> > > from bio are gup-like (not necessarily taken by GUP, if creator of the bio
> > > gets to struct page he needs via some other means (e.g. page cache lookup),
> > > he could just use get_gup_pin() helper we'd provide).  After all, a page
> > > reference in bio means that the page is pinned for the duration of IO and
> > > can be DMAed to/from so it even makes some sense to track the reference
> > > like that. Then bio_put() would just unconditionally do put_user_page() and
> > > we won't have to propagate the information in the bio.
> > > 
> > > Do you think this would be workable and easier?
> > 
> > Thinking again on this, i can drop that patch and just add a new
> > bio_add_page_from_gup() and then it would be much more obvious that
> > only very few places need to use that new version and they are mostly
> > obvious places. It is usualy GUP then right away add the pages to bio
> > or bvec.
> 
> Yes, that's another option. Probably second preferred by me after my own
> proposal ;)
> 
> > We can probably add documentation around GUP explaining that if you
> > want to build a bio or bvec from GUP you must pay attention to which
> > function you use.
> 
> Yes, although we both know how careful people are in reading
> documentation...

Yes, I know it is a sad state of affairs, but if enough people see comments
in enough places we should end up with more eyes aware of the gotcha, and
hopefully increase the likelihood of catching any new user.

> 
> > Also pages going in a bio are not necessarily written too, they can
> > be use as source (writting to block) or as destination (reading from
> > block). So having all of them with refcount bias as GUP would muddy
> > the water somemore between pages we can no longer clean (ie GUPed)
> > and those that are just being use in regular read or write operation.
> 
> Why would the difference matter here?

Restricting GUP-like status to GUP ensures that we only ever back off
because of GUP and not because of some innocuous I/O.

I am working on a v2 that just adds a new page-add variant, but I will
have to run (x)fstests before re-posting.

I also have the scatterlist conversion mostly ready:

https://cgit.freedesktop.org/~glisse/linux/log/?h=gup-scatterlist-v1

After that, GUP is mostly isolated to individual drivers and much easier
to track and update.

Cheers,
Jérôme


* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
       [not found] ` <2c124cc4-b97e-ee28-2926-305bc6bc74bd@plexistor.com>
@ 2019-04-16 18:47   ` Jerome Glisse
  2019-04-16 18:59   ` Kent Overstreet
  1 sibling, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16 18:47 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: linux-kernel, linux-fsdevel, linux-block, linux-mm, John Hubbard,
	Jan Kara, Dan Williams, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, samba-technical,
	Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder, ceph-devel,
	Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, Kent Overstreet, linux-bcache,
	Ernesto A. Fernández

On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > This patchset depends on various small fixes [1] and also on patchset
> > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > so that it can get review and comments on how and what should be done
> > to test things.
> > 
> > For various reasons [2] [3] we want to track page reference through GUP
> > differently than "regular" page reference. Thus we need to keep track
> > of how we got a page within the block and fs layer. To do so this patch-
> > set change the bio_bvec struct to store a pfn and flags instead of a
> > direct pointer to a page. This way we can flag page that are coming from
> > GUP.
> > 
> > This patchset is divided as follow:
> >     - First part of the patchset is just small cleanup i believe they
> >       can go in as his assuming people are ok with them.
> 
> 
> >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> >       done in multi-step, first we replace all direct dereference of
> >       the field by call to inline helper, then we introduce macro for
> >       bio_bvec that are initialized on the stack. Finaly we change the
> >       bv_page field to bv_pfn.
> 
> Why do we need a bv_pfn? Why not just use the lowest bit of the page-ptr
> as a flag (the pointer is always aligned to 64 bytes in our case)?
> 
> So yes, we need an inline helper to reference the page, but is it not
> clearer to assume a page* and not any kind of pfn?
> It will not be the first place using low bits of a pointer for flags.

Yes, I can use the low bit of the struct page pointer; it should be safe
on all architectures. I wanted to change the bv_page field name to make
sure we catch anyone doing a direct dereference. Do you prefer keeping a
page pointer there?

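A minimal sketch of that encoding (field and helper names are hypothetical,
not the posted series):

	/* bit 0 of the word holds the GUP flag; struct page pointers are
	 * at least word aligned so the bit is otherwise always zero */
	#define BVEC_PAGE_GUP	1UL

	static inline struct page *bvec_page(const struct bio_vec *bv)
	{
		return (struct page *)(bv->bv_page_val & ~BVEC_PAGE_GUP);
	}

	static inline void bvec_put_page(const struct bio_vec *bv)
	{
		if (bv->bv_page_val & BVEC_PAGE_GUP)
			put_user_page(bvec_page(bv));
		else
			put_page(bvec_page(bv));
	}

where bv_page_val would be an unsigned long replacing bv_page.
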
> 
> That said, why do we need it at all? I mean, why not have it as a bio
> flag? If it ever happens that a user has GUP and non-GUP pages to IO in
> the same request, he/she can just submit them as two separate BIOs
> (chained at the block layer).
> 
> Many users just submit one-page bios and let the elevator merge them
> anyway.

The issue is that bio_vec is used, on its own, outside of bios, and for
those use cases I need to track the GUP status within the bio_vec. It is
thus easier to use the same mechanism for bios too, as adding a flag to
the bio would mean that I also have to audit every code path that can
merge bios. While I believe that should be restricted to block/blk-merge.c,
it seems some block drivers and some filesystems have grown their own
custom bio manipulation (md comes to mind). So using the same mechanism
for bio_vec and bio seems like a safer and easier course of action.

Cheers,
Jérôme


* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
       [not found] ` <2c124cc4-b97e-ee28-2926-305bc6bc74bd@plexistor.com>
  2019-04-16 18:47   ` Jerome Glisse
@ 2019-04-16 18:59   ` Kent Overstreet
  2019-04-16 19:12     ` Dan Williams
  1 sibling, 1 reply; 47+ messages in thread
From: Kent Overstreet @ 2019-04-16 18:59 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: jglisse, linux-kernel, linux-fsdevel, linux-block, linux-mm,
	John Hubbard, Jan Kara, Dan Williams, Alexander Viro,
	Johannes Thumshirn, Christoph Hellwig, Jens Axboe, Ming Lei,
	Jason Gunthorpe, Matthew Wilcox, Steve French, linux-cifs,
	samba-technical, Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder,
	ceph-devel, Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, linux-bcache, Ernesto A. Fernández

On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > This patchset depends on various small fixes [1] and also on patchset
> > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > so that it can get review and comments on how and what should be done
> > to test things.
> > 
> > For various reasons [2] [3] we want to track page reference through GUP
> > differently than "regular" page reference. Thus we need to keep track
> > of how we got a page within the block and fs layer. To do so this patch-
> > set change the bio_bvec struct to store a pfn and flags instead of a
> > direct pointer to a page. This way we can flag page that are coming from
> > GUP.
> > 
> > This patchset is divided as follow:
> >     - First part of the patchset is just small cleanup i believe they
> >       can go in as his assuming people are ok with them.
> 
> 
> >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> >       done in multi-step, first we replace all direct dereference of
> >       the field by call to inline helper, then we introduce macro for
> >       bio_bvec that are initialized on the stack. Finaly we change the
> >       bv_page field to bv_pfn.
> 
> Why do we need a bv_pfn. Why not just use the lowest bit of the page-ptr
> as a flag (pointer always aligned to 64 bytes in our case).
> 
> So yes we need an inline helper for reference of the page but is it not clearer
> that we assume a page* and not any kind of pfn ?
> It will not be the first place using low bits of a pointer for flags.
> 
> That said. Why we need it at all? I mean why not have it as a bio flag. If it exist
> at all that a user has a GUP and none-GUP pages to IO at the same request he/she
> can just submit them as two separate BIOs (chained at the block layer).
> 
> Many users just submit one page bios and let elevator merge them any way.

Let's please not add additional flags and weirdness to struct bio - "if this
flag is set interpret one way, if not interpret another" - or eventually bios
will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.

Question though - why do we need a flag for whether a page is a GUP page or not?
Couldn't the needed information just be determined by what range the pfn is in
(i.e. whether or not it has a struct page associated with it)?


* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-16 18:59   ` Kent Overstreet
@ 2019-04-16 19:12     ` Dan Williams
  2019-04-16 19:49       ` Jerome Glisse
       [not found]       ` <ccac6c5a-7120-0455-88de-ca321b01e825@plexistor.com>
  0 siblings, 2 replies; 47+ messages in thread
From: Dan Williams @ 2019-04-16 19:12 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Boaz Harrosh, Jérôme Glisse, Linux Kernel Mailing List,
	linux-fsdevel, linux-block, Linux MM, John Hubbard, Jan Kara,
	Alexander Viro, Johannes Thumshirn, Christoph Hellwig,
	Jens Axboe, Ming Lei, Jason Gunthorpe, Matthew Wilcox,
	Steve French, linux-cifs, samba-technical, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg, devel,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
<kent.overstreet@gmail.com> wrote:
>
> On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > From: Jérôme Glisse <jglisse@redhat.com>
> > >
> > > This patchset depends on various small fixes [1] and also on patchset
> > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > so that it can get review and comments on how and what should be done
> > > to test things.
> > >
> > > For various reasons [2] [3] we want to track page reference through GUP
> > > differently than "regular" page reference. Thus we need to keep track
> > > of how we got a page within the block and fs layer. To do so this patch-
> > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > direct pointer to a page. This way we can flag page that are coming from
> > > GUP.
> > >
> > > This patchset is divided as follow:
> > >     - First part of the patchset is just small cleanup i believe they
> > >       can go in as his assuming people are ok with them.
> >
> >
> > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > >       done in multi-step, first we replace all direct dereference of
> > >       the field by call to inline helper, then we introduce macro for
> > >       bio_bvec that are initialized on the stack. Finaly we change the
> > >       bv_page field to bv_pfn.
> >
> > Why do we need a bv_pfn. Why not just use the lowest bit of the page-ptr
> > as a flag (pointer always aligned to 64 bytes in our case).
> >
> > So yes we need an inline helper for reference of the page but is it not clearer
> > that we assume a page* and not any kind of pfn ?
> > It will not be the first place using low bits of a pointer for flags.
> >
> > That said. Why we need it at all? I mean why not have it as a bio flag. If it exist
> > at all that a user has a GUP and none-GUP pages to IO at the same request he/she
> > can just submit them as two separate BIOs (chained at the block layer).
> >
> > Many users just submit one page bios and let elevator merge them any way.
>
> Let's please not add additional flags and weirdness to struct bio - "if this
> flag is set interpret one way, if not interpret another" - or eventually bios
> will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.

This all reminds me of the failed attempt to teach the block layer to
operate without pages:

https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/

>
> Question though - why do we need a flag for whether a page is a GUP page or not?
> Couldn't the needed information just be determined by what range the pfn is not
> (i.e. whether or not it has a struct page associated with it)?

That amounts to a pfn_valid() check which is a bit heavier than if we
can store a flag in the bv_pfn entry directly.

I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
an 'unsigned long'.

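A sketch of that direction (PFN_GUP is a hypothetical new flag, following
the existing high-bit pfn_t flags in include/linux/pfn_t.h):

	/* hypothetical: next free high bit after PFN_SG_CHAIN, PFN_SG_LAST,
	 * PFN_DEV and PFN_MAP */
	#define PFN_GUP		(1ULL << (BITS_PER_LONG_LONG - 5))

	static inline bool bvec_is_gup(const struct bio_vec *bv)
	{
		return bv->bv_pfn.val & PFN_GUP;
	}
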
That said, I'm still in favor of Jan's proposal to just make the
bv_page semantics uniform. Otherwise we're complicating this core
infrastructure for some yet to be implemented GPU memory management
capabilities with yet to be determined value. Circle back when that
value is clear, but in the meantime fix the GUP bug.


* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-16 19:12     ` Dan Williams
@ 2019-04-16 19:49       ` Jerome Glisse
  2019-04-17 21:53         ` Dan Williams
       [not found]       ` <ccac6c5a-7120-0455-88de-ca321b01e825@plexistor.com>
  1 sibling, 1 reply; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16 19:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: Kent Overstreet, Boaz Harrosh, Linux Kernel Mailing List,
	linux-fsdevel, linux-block, Linux MM, John Hubbard, Jan Kara,
	Alexander Viro, Johannes Thumshirn, Christoph Hellwig,
	Jens Axboe, Ming Lei, Jason Gunthorpe, Matthew Wilcox,
	Steve French, linux-cifs, samba-technical, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg, devel,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> >
> > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > >
> > > > This patchset depends on various small fixes [1] and also on patchset
> > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > so that it can get review and comments on how and what should be done
> > > > to test things.
> > > >
> > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > differently than "regular" page reference. Thus we need to keep track
> > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > direct pointer to a page. This way we can flag page that are coming from
> > > > GUP.
> > > >
> > > > This patchset is divided as follow:
> > > >     - First part of the patchset is just small cleanup i believe they
> > > >       can go in as his assuming people are ok with them.
> > >
> > >
> > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > >       done in multi-step, first we replace all direct dereference of
> > > >       the field by call to inline helper, then we introduce macro for
> > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > >       bv_page field to bv_pfn.
> > >
> > > Why do we need a bv_pfn. Why not just use the lowest bit of the page-ptr
> > > as a flag (pointer always aligned to 64 bytes in our case).
> > >
> > > So yes we need an inline helper for reference of the page but is it not clearer
> > > that we assume a page* and not any kind of pfn ?
> > > It will not be the first place using low bits of a pointer for flags.
> > >
> > > That said. Why we need it at all? I mean why not have it as a bio flag. If it exist
> > > at all that a user has a GUP and none-GUP pages to IO at the same request he/she
> > > can just submit them as two separate BIOs (chained at the block layer).
> > >
> > > Many users just submit one page bios and let elevator merge them any way.
> >
> > Let's please not add additional flags and weirdness to struct bio - "if this
> > flag is set interpret one way, if not interpret another" - or eventually bios
> > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> 
> This all reminds of the failed attempt to teach the block layer to
> operate without pages:
> 
> https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> 
> >
> > Question though - why do we need a flag for whether a page is a GUP page or not?
> > Couldn't the needed information just be determined by what range the pfn is not
> > (i.e. whether or not it has a struct page associated with it)?
> 
> That amounts to a pfn_valid() check which is a bit heavier than if we
> can store a flag in the bv_pfn entry directly.
> 
> I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> an 'unsigned long'.
> 
> That said, I'm still in favor of Jan's proposal to just make the
> bv_page semantics uniform. Otherwise we're complicating this core
> infrastructure for some yet to be implemented GPU memory management
> capabilities with yet to be determined value. Circle back when that
> value is clear, but in the meantime fix the GUP bug.

This has nothing to do with GPUs; what makes you think so? Here I am
trying to solve GUP while keeping the value of knowing whether a page has
been GUPed or not. I argue that if we bias every page in every bio then
we lose that information, and thus the value.

I gave the page protection mechanism as an example of something that would
be impacted, but it is not the only one. Knowing whether a page has been
GUPed can be useful for memory reclaim, compaction, NUMA balancing,
...

Also, a page that is going through a bio in one thread might be under
some other fs-specific operation in another thread, one that would be
blocked by GUP but does not need to be blocked by I/O (ie the fs can
either wait on the I/O or knows that it is safe to proceed even while the
page is under I/O).

Hence I believe that by making every page look the same we lose valuable
information. Moreover, the complexity of giving every page in a bio a
reference count bias is much bigger than the changes needed to keep track
of whether the page came from GUP or not.

Cheers,
Jérôme


* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
       [not found]       ` <ccac6c5a-7120-0455-88de-ca321b01e825@plexistor.com>
@ 2019-04-16 19:57         ` Jerome Glisse
       [not found]           ` <41e2d7e1-104b-a006-2824-015ca8c76cc8@gmail.com>
  2019-04-17 21:54         ` Dan Williams
  1 sibling, 1 reply; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16 19:57 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Dan Williams, Kent Overstreet, Linux Kernel Mailing List,
	linux-fsdevel, linux-block, Linux MM, John Hubbard, Jan Kara,
	Alexander Viro, Johannes Thumshirn, Christoph Hellwig,
	Jens Axboe, Ming Lei, Jason Gunthorpe, Matthew Wilcox,
	Steve French, linux-cifs, Yan Zheng, Sage Weil, Ilya Dryomov,
	Alex Elder, ceph-devel, Eric Van Hensbergen, Latchesar Ionkov,
	Mike Marshall, Martin Brandenburg, devel, Dominique Martinet,
	v9fs-developer, Coly Li, linux-bcache, Ernesto A. Fernández

On Tue, Apr 16, 2019 at 10:28:40PM +0300, Boaz Harrosh wrote:
> On 16/04/19 22:12, Dan Williams wrote:
> > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > <kent.overstreet@gmail.com> wrote:
> <>
> > This all reminds of the failed attempt to teach the block layer to
> > operate without pages:
> > 
> > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> > 
> 
> Exactly why I want to make sure it is just a [pointer | flag] and not any kind of pfn
> type. Let us please not go there again?
> 
> >>
> >> Question though - why do we need a flag for whether a page is a GUP page or not?
> >> Couldn't the needed information just be determined by what range the pfn is not
> >> (i.e. whether or not it has a struct page associated with it)?
> > 
> > That amounts to a pfn_valid() check which is a bit heavier than if we
> > can store a flag in the bv_pfn entry directly.
> > 
> > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > an 'unsigned long'.
> > 
> 
> No, please please not. This is not a pfn and not a pfn_t. It is a page-ptr
> and a flag that says where/how to put_page it. IE I did a GUP on this page
> please do a PUP on this page instead of regular put_page. So no where do I mean
> pfn or pfn_t in this code. Then why?
> 
> > That said, I'm still in favor of Jan's proposal to just make the
> > bv_page semantics uniform. Otherwise we're complicating this core
> > infrastructure for some yet to be implemented GPU memory management
> > capabilities with yet to be determined value. Circle back when that
> > value is clear, but in the meantime fix the GUP bug.
> > 
> 
> I agree there are simpler ways to solve the bugs at hand then
> to system wide separate get_user_page from get_page and force all put_user
> callers to remember what to do. Is there some Document explaining the
> all design of where this is going?
> 

A very long thread on this:

https://lkml.org/lkml/2018/12/3/1128

especially all the replies to this first one

There is also:

https://lkml.org/lkml/2019/3/26/1395
https://lwn.net/Articles/753027/

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
       [not found]           ` <41e2d7e1-104b-a006-2824-015ca8c76cc8@gmail.com>
@ 2019-04-16 23:16             ` Jerome Glisse
       [not found]               ` <fa00a2ff-3664-3165-7af8-9d9c53238245@plexistor.com>
  2019-04-16 23:34             ` Jerome Glisse
  1 sibling, 1 reply; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16 23:16 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Boaz Harrosh, Dan Williams, Kent Overstreet,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Jan Kara, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Wed, Apr 17, 2019 at 01:09:22AM +0300, Boaz Harrosh wrote:
> On 16/04/19 22:57, Jerome Glisse wrote:
> <>
> > 
> > A very long thread on this:
> > 
> > https://lkml.org/lkml/2018/12/3/1128
> > 
> > especially all the replies to this first one
> > 
> > There is also:
> > 
> > https://lkml.org/lkml/2019/3/26/1395
> > https://lwn.net/Articles/753027/
> > 
> 
> OK I have re-read this patchset and a little bit of the threads above (not all)
> 
> As I understand the long term plan is to keep two separate ref-counts one
> for GUP-ref and one for the regular page-state/ownership ref.
> Currently looking at page-ref we do not know if we have a GUP currently held.
> With the new plan we can (Still not sure what's the full plan with this new info)
> 
> But if you make it such that the first GUP-ref also takes a page_ref and the
> last GUP-dec also does put_page, then all of this becomes a matter of
> matching every call to get_user_pages or iov_iter_get_pages() with a new
> put_user_pages or iov_iter_put_pages().
> 
> Then if much below us an LLD takes a get_page() say an skb below the iscsi
> driver, and so on. We do not care and we keep doing a put_page because we know
> the GUP-ref holds the page for us.
> 
> The current block layer is transparent to any page-ref it does not take any
> nor put_page any. It is only the higher users that have done GUP that take care of that.
> 
> The patterns I see are:
> 
>   iov_iter_get_pages()
> 
> 	IO(sync)
> 
>   for(numpages)
> 	put_page()
> 
> Or
> 
>   iov_iter_get_pages()
> 
> 	IO (async)
> 		->	foo_end_io()
> 				put_page
> 
> (Same with get_user_pages)
> (IO need not be block layer. It can be networking and so on like in NFS or CIFS
>  and so on)

There is also other code that passes around a bio_vec where the code that
fills it is disconnected from the code that releases the pages, and they
can mix and match GUP and non GUP AFAICT.

On the fs side there is also code that fills either a bio or a bio_vec and
uses some extra mechanism other than bio_end to submit io through a
workqueue and then release pages (cifs for instance). Again i believe
they can mix and match GUP and non GUP (i have not spotted anything
obvious indicating otherwise).

> 
> The first pattern is easy: just add the proper new api for
> it, so for every iov_iter_get_pages() you have an iov_iter_put_pages() and remove
> lots of cooked up for loops. Also the whole iov_iter_get_pages_use_gup() just drops.
> (Same at get_user_pages sites use put_user_pages)

Yes, this patchset already converts some of this first pattern.
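
For illustration, the first pattern with a balanced API could look like
this (iov_iter_put_pages() is the proposed name from above, it does not
exist yet; error handling elided):

    ssize_t got;
    size_t start;
    int npages;

    got = iov_iter_get_pages(iter, pages, maxsize, maxpages, &start);
    npages = DIV_ROUND_UP(got + start, PAGE_SIZE);

    /* ... do the synchronous IO on pages[0..npages-1] ... */

    iov_iter_put_pages(pages, npages);  /* instead of a put_page() loop */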

> The second pattern is a bit harder because it is possible that the foo_end_io()
> is currently used for GUP as well as non-GUP cases. This is easy to fix. But the
> even harder case is if the same foo_end_io() call has some pages GUPed and some not
> in the same call.
> 
> staring at this patchset and the call sites I did not see any such places. Do you know
> of any?
> (We can always force such mixed-case users to always GUP-ref the pages and code
>  foo_end_io() to GUP-dec)

I believe direct-io.c is such an example, though in that case i believe it
can only be the ZERO_PAGE so this might be easily detectable. There are also a
lot of fs functions taking an iterator and then using iov_iter_get_pages*()
to fill a bio. AFAICT those functions can be called with a pipe iterator or an
iovec iterator and probably also with other iterator types. But it is all
common code afterward (the bi_end_io function is the same no matter the
iterator).

Though that can probably be solved this way:

From:
    foo_bi_end_io(struct bio *bio) {
        ...
        for (i = 0; i < npages; ++i) {
            put_page(pages[i]);
        }
    }

To:
    foo_bi_end_io_common(struct bio *bio) {
        ...
    }

    foo_bi_end_io_normal(struct bio *bio) {
        foo_bi_end_io_common(bio);
        for (i = 0; i < npages; ++i) {
            put_page(pages[i]);
        }
    }

    foo_bi_end_io_gup(struct bio *bio) {
        foo_bi_end_io_common(bio);
        for (i = 0; i < npages; ++i) {
            put_user_page(pages[i]);
        }
    }

Then when filling in the bio i either pick foo_bi_end_io_normal() or
foo_bi_end_io_gup(). I am assuming that bios with different bi_end_io
functions never get merged.
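
On the submit side that would be something like (sketch only; from_gup
stands for however we know the pages were obtained through GUP):

    bio = bio_alloc(GFP_KERNEL, npages);
    ...
    bio->bi_end_io = from_gup ? foo_bi_end_io_gup : foo_bi_end_io_normal;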

The issue is that some bio_add_page*() call sites are disconnected
from where the bio is allocated and initialized (and also where the
bi_end_io function is set). This makes it quite hard to ascertain
that GUPed pages and non GUP pages can not co-exist in the same bio.

Also in some cases it is not clear that the same iter is used to
fill the same bio, ie it might be possible that some code paths fill
the same bio from different iterators (and thus some pages might
be coming from GUP and others not).

It would certainly seem to require more careful review from the
maintainers of such fs. I tend to believe that putting the burden
on the reviewer is a harder sell :)

From a quick glance:
   - nilfs segment thing
   - direct-io: the same bio accumulates pages over multiple calls but
     it should always be from the same iterator and thus either always
     be from GUP or non GUP. Also the ZERO_PAGE case should be easy
     to catch.
   - fs/nfs/blocklayout/blocklayout.c
   - gfs2 log buffer, that should never be pages from GUP but i could
     not ascertain that easily from a quick review

This is not exhaustive, i was just grepping for bio_add_page() and
there are 2 other variants to check, and i tended to discard places
where the bio is allocated in the same function as bio_add_page() but this
might not be a valid assumption either. Some bio might be allocated
only if there is no default bio already, and then set as the default
bio which might be used later on with a different iterator.

> 
> So with very careful coding I think you need not touch the block / scatter-list layers
> nor any LLD drivers. The only code affected is the code around get_user_pages and friends.
> Changing the API will surface all those.
> (IE. introduce a new API, convert one by one, remove the old API)
> 
> Am I smoking?

No, i thought about it; it seemed more dangerous and harder to get right
because some code adds pages in one place and sets up the bio in another. I
can dig some more on that front but this still leaves the non-bio users
of bio_vec and those IIRC also suffer from the same disconnect issue.

> 
> BTW: Are you aware of the users of iov_iter_get_pages_alloc()? Do they need fixing too?

Yeah and that patchset should address those already, i do not think
i missed any.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
       [not found]           ` <41e2d7e1-104b-a006-2824-015ca8c76cc8@gmail.com>
  2019-04-16 23:16             ` Jerome Glisse
@ 2019-04-16 23:34             ` Jerome Glisse
  1 sibling, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-16 23:34 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Boaz Harrosh, Dan Williams, Kent Overstreet,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Jan Kara, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Wed, Apr 17, 2019 at 01:09:22AM +0300, Boaz Harrosh wrote:
> On 16/04/19 22:57, Jerome Glisse wrote:
> <>
> > 
> > A very long thread on this:
> > 
> > https://lkml.org/lkml/2018/12/3/1128
> > 
> > especially all the replies to this first one
> > 
> > There is also:
> > 
> > https://lkml.org/lkml/2019/3/26/1395
> > https://lwn.net/Articles/753027/
> > 
> 
> OK I have re-read this patchset and a little bit of the threads above (not all)
> 
> As I understand the long term plan is to keep two separate ref-counts one
> for GUP-ref and one for the regular page-state/ownership ref.
> Currently looking at page-ref we do not know if we have a GUP currently held.
> With the new plan we can (Still not sure what's the full plan with this new info)
> 
> But if you make it such that the first GUP-ref also takes a page_ref and the
> last GUP-dec also does put_page, then all of this becomes a matter of
> matching every call to get_user_pages or iov_iter_get_pages() with a new
> put_user_pages or iov_iter_put_pages().

So sorry, i forgot to answer that part. So the idea is to do:
    GUP() {
        ...
-       page_ref_inc(page);
+       page_ref_add(page, GUP_BIAS);
        ...
    }

with GUP_BIAS = 1024 or something big, but not too big, to avoid any risk of
overflow by GUP. Then put_user_page() just does a ref_sub of the same amount
instead of a ref_dec.
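
The release side would then be something along these lines (a sketch
assuming the GUP_BIAS above; the real helper might differ):

    static inline void put_user_page(struct page *page)
    {
        /* Drop the whole bias taken by GUP, free on last reference. */
        if (page_ref_sub_and_test(page, GUP_BIAS))
            __put_page(page);
    }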

We can have a false GUP positive if a page is mapped or referenced so
many times that its refcount reaches the GUP_BIAS value, but considering
such a page as GUPed should not be too harmful (no more harmful than what
we do with GUPed pages).

So we want to call put_user_page() for GUPed pages and only GUPed pages so
that we keep the reference count properly balanced.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
       [not found]               ` <fa00a2ff-3664-3165-7af8-9d9c53238245@plexistor.com>
@ 2019-04-17  2:03                 ` Jerome Glisse
  2019-04-17 21:19                   ` Jerome Glisse
  0 siblings, 1 reply; 47+ messages in thread
From: Jerome Glisse @ 2019-04-17  2:03 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Boaz Harrosh, Dan Williams, Kent Overstreet,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Jan Kara, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Wed, Apr 17, 2019 at 04:11:03AM +0300, Boaz Harrosh wrote:
> On 17/04/19 02:16, Jerome Glisse wrote:
> > On Wed, Apr 17, 2019 at 01:09:22AM +0300, Boaz Harrosh wrote:
> >> On 16/04/19 22:57, Jerome Glisse wrote:
> >> <>
> >>>
> >>> A very long thread on this:
> >>>
> >>> https://lkml.org/lkml/2018/12/3/1128
> >>>
> >>> especially all the replies to this first one
> >>>
> >>> There is also:
> >>>
> >>> https://lkml.org/lkml/2019/3/26/1395
> >>> https://lwn.net/Articles/753027/
> >>>
> >>
> >> OK I have re-read this patchset and a little bit of the threads above (not all)
> >>
> >> As I understand the long term plan is to keep two separate ref-counts one
> >> for GUP-ref and one for the regular page-state/ownership ref.
> >> Currently looking at page-ref we do not know if we have a GUP currently held.
> >> With the new plan we can (Still not sure what's the full plan with this new info)
> >>
> >> But if you make it such that the first GUP-ref also takes a page_ref and the
> >> last GUP-dec also does put_page, then all of this becomes a matter of
> >> matching every call to get_user_pages or iov_iter_get_pages() with a new
> >> put_user_pages or iov_iter_put_pages().
> >>
> >> Then if much below us an LLD takes a get_page() say an skb below the iscsi
> >> driver, and so on. We do not care and we keep doing a put_page because we know
> >> the GUP-ref holds the page for us.
> >>
> >> The current block layer is transparent to any page-ref it does not take any
> >> nor put_page any. It is only the higher users that have done GUP that take care of that.
> >>
> >> The patterns I see are:
> >>
> >>   iov_iter_get_pages()
> >>
> >> 	IO(sync)
> >>
> >>   for(numpages)
> >> 	put_page()
> >>
> >> Or
> >>
> >>   iov_iter_get_pages()
> >>
> >> 	IO (async)
> >> 		->	foo_end_io()
> >> 				put_page
> >>
> >> (Same with get_user_pages)
> >> (IO need not be block layer. It can be networking and so on like in NFS or CIFS
> >>  and so on)
> > 
> > There is also other code that passes around a bio_vec where the code that
> > fills it is disconnected from the code that releases the pages, and they
> > can mix and match GUP and non GUP AFAICT.
> > 
> > On the fs side there is also code that fills either a bio or a bio_vec and
> > uses some extra mechanism other than bio_end to submit io through a
> > workqueue and then release pages (cifs for instance). Again i believe
> > they can mix and match GUP and non GUP (i have not spotted anything
> > obvious indicating otherwise).
> > 
> 
> But what I meant is why do we care at all? block layer does not inc page nor put
> page in any of bio or bio_vec. It is agnostic to the page-refs.
> 
> Users register an end_io and know if pages are getted or not.
> So the balanced put is up to the user.
> 
> >>
> >> The first pattern is easy: just add the proper new api for
> >> it, so for every iov_iter_get_pages() you have an iov_iter_put_pages() and remove
> >> lots of cooked up for loops. Also the whole iov_iter_get_pages_use_gup() just drops.
> >> (Same at get_user_pages sites use put_user_pages)
> > 
> > Yes, this patchset already converts some of this first pattern.
> > 
> 
> Right!
> 
> >> The second pattern is a bit harder because it is possible that the foo_end_io()
> >> is currently used for GUP as well as non-GUP cases. This is easy to fix. But the
> >> even harder case is if the same foo_end_io() call has some pages GUPed and some not
> >> in the same call.
> >>
> >> staring at this patchset and the call sites I did not see any such places. Do you know
> >> of any?
> >> (We can always force such mixed-case users to always GUP-ref the pages and code
> >>  foo_end_io() to GUP-dec)
> > 
> > I believe direct-io.c is such an example, though in that case i believe it
> > can only be the ZERO_PAGE so this might be easily detectable. There are also a
> > lot of fs functions taking an iterator and then using iov_iter_get_pages*()
> > to fill a bio. AFAICT those functions can be called with a pipe iterator or an
> > iovec iterator and probably also with other iterator types. But it is all
> > common code afterward (the bi_end_io function is the same no matter the
> > iterator).
> > 
> > Though that can probably be solved this way:
> > 
> > From:
> >     foo_bi_end_io(struct bio *bio) {
> >         ...
> >         for (i = 0; i < npages; ++i) {
> >             put_page(pages[i]);
> >         }
> >     }
> > 
> > To:
> >     foo_bi_end_io_common(struct bio *bio) {
> >         ...
> >     }
> > 
> >     foo_bi_end_io_normal(struct bio *bio) {
> >         foo_bi_end_io_common(bio);
> >         for (i = 0; i < npages; ++i) {
> >             put_page(pages[i]);
> >         }
> >     }
> > 
> >     foo_bi_end_io_gup(struct bio *bio) {
> >         foo_bi_end_io_common(bio);
> >         for (i = 0; i < npages; ++i) {
> >             put_user_page(pages[i]);
> >         }
> >     }
> > 
> 
> Yes or when foo_bi_end_io_common is more complicated, then just make it
> 
>      foo_bi_end_io_common(struct bio *bio, bool gup) {
>          ...
>      }
> 
>      foo_bi_end_io_normal(struct bio *bio) {
> 	foo_bi_end_io_common(bio, false);
>      }
>  
>      foo_bi_end_io_gup(struct bio *bio) {
> 	foo_bi_end_io_common(bio, true);
>      }
> 
> Less risky coding of code we do not know?

Yes, whatever is more appropriate for each end_io func.

> 
> > Then when filling in the bio i either pick foo_bi_end_io_normal() or
> > foo_bi_end_io_gup(). I am assuming that bios with different bi_end_io
> > functions never get merged.
> > 
> 
> Exactly
> 
> > The issue is that some bio_add_page*() call sites are disconnected
> > from where the bio is allocated and initialized (and also where the
> > bi_end_io function is set). This makes it quite hard to ascertain
> > that GUPed pages and non GUP pages can not co-exist in the same bio.
> > 
> 
> Two questions: if they always do a put_page at end IO, who takes a page_ref
> in the non GUP case? page-cache? VFS? a mechanical get_page?

It depends: some take a page-cache reference (find_get_page), others just
get_page on a page that is coming from an unknown origin (ie it is not
clear from within the function where the page is coming from), and others
allocate a page and transfer the page reference to the bio, ie the
page will get freed once the bio end function is called, as it
calls put_page on the pages within the bio.

So there is not one simple story here. Hence why it looks harder to
me (still likely do-able).
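
Roughly, the three origins i have in mind look like (illustrative only):

    page = find_get_page(mapping, index);   /* ref from the page-cache */

    get_page(page);                         /* extra ref, origin unknown */

    page = alloc_page(GFP_KERNEL);          /* ref handed over to the bio, */
    bio_add_page(bio, page, PAGE_SIZE, 0);  /* bi_end_io does the put_page */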

> > Also in some cases it is not clear that the same iter is used to
> > fill the same bio, ie it might be possible that some code paths fill
> > the same bio from different iterators (and thus some pages might
> > be coming from GUP and others not).
> > 
> 
> This one is hard to believe for me. 
> one iter may produce multiple iter_get_pages() and many more bios.
> But the opposite?
> 
> I know, never say never. Do you know of a specific example?
> I would like to stare at it.

No, i do not know of any such example but reading through a lot of code
i could not convince myself of that fact. I agree with your feeling on
the matter, i just don't have proof. I can spend more time digging through
all the code paths and ascertaining things, but this will inevitably be a
snapshot in time and something that people might break in the future.

> 
> > It would certainly seem to require more careful review from the
> > maintainers of such fs. I tend to believe that putting the burden
> > on the reviewer is a harder sell :)
> > 
> 
> I think a couple of carefully placed WARN_ONs in the PUT path can
> detect any leakage of refs. And help debug these cases.

Yes, we do plan on catching things like underflow (an easy one to catch)
or a put_page that brings the refcount just below the bias value ... and
in any case the only harm that can come of this should be a memory leak.

So i believe that even if we miss some weird corner case we should be
able to catch it eventually and nothing harmful should ever happen,
famous last words i will regret when i get burnt at the stake for missing
that one case ;)
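
For instance something like this in the put path (a sketch assuming the
GUP_BIAS scheme from earlier in the thread; the helper name is made up):

    static inline void check_user_page_ref(struct page *page)
    {
        /*
         * Fewer references than the bias means more put_user_page()
         * calls than GUPs, or a stray put_page() ate into the bias.
         */
        WARN_ON_ONCE(page_ref_count(page) < GUP_BIAS);
    }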

> 
> > From a quick glance:
> >    - nilfs segment thing
> >    - direct-io: the same bio accumulates pages over multiple calls but
> >      it should always be from the same iterator and thus either always
> >      be from GUP or non GUP. Also the ZERO_PAGE case should be easy
> >      to catch.
> 
> Yes. Or we can always take a GUP-ref on the ZERO_PAGE as well
> 
> >    - fs/nfs/blocklayout/blocklayout.c
> 
> This one is an example of "please do not touch": if you look at the code
> it currently does not do any put_page at all. Though yes, it does bio_add_page.
> 
> The pages are GETed and PUTed in nfs/direct.c and references are balanced there.
> 
> This is the trivial case: for every iov_iter_get_pages[_alloc]() there is
> a newly defined iov_iter_put_pages[_alloc]()
> 
> So this is an example of extra, unneeded code changes in your approach
> 
> >    - gfs2 log buffer, that should never be pages from GUP but i could
> >      not ascertain that easily from a quick review
> 
> 	Same as NFS maybe? didn't look.
> 
> > 
> > This is not exhaustive, i was just grepping for bio_add_page() and
> > there are 2 other variants to check, and i tended to discard places
> > where the bio is allocated in the same function as bio_add_page() but this
> > might not be a valid assumption either. Some bio might be allocated
> > only if there is no default bio already, and then set as the default
> > bio which might be used later on with a different iterator.
> > 
> 
> I think we do not care at all about any of the bio_add_page() or bio_alloc
> places. All we care about is the call to iov_iter_get_pages* and where in the
> code these puts are balanced.
> 
> If we need to split the endio case at those sites then we can do as above.
> Or in the worst case, when pages are really mixed, always take a GUP ref also
> on the non GUP case.
> (I would like to see where this happens)
> 
> >>
> >> So with very careful coding I think you need not touch the block / scatter-list layers
> >> nor any LLD drivers. The only code affected is the code around get_user_pages and friends.
> >> Changing the API will surface all those.
> >> (IE. introduce a new API, convert one by one, remove the old API)
> >>
> >> Am I smoking?
> > 
> > No, i thought about it; it seemed more dangerous and harder to get right
> > because some code adds pages in one place and sets up the bio in another. I
> > can dig some more on that front but this still leaves the non-bio users
> > of bio_vec and those IIRC also suffer from the same disconnect issue.
> > 
> 
> Again, I should not care about bio_vec. I only need to trace the balancing of the
> ref taken at the GUP call sites. Let me help you in those places where it is not
> obvious to you.
> 
> >>
> >> BTW: Are you aware of the users of iov_iter_get_pages_alloc()? Do they need fixing too?
> > 
> > Yeah and that patchset should address those already, i do not think
> > i missed any.
> > 
> 
> I could not find a patch for nfs/direct.c where a put_page is called
> to balance the iov_iter_get_pages_alloc(), which takes care of, for example,
> the blocklayout.c pages' state.
> 
> So I think the deep audit needs to be of iov_iter_get_pages and get_user_pages
> and the balancing of that. And all of bio_alloc and bio_add_page should stay
> agnostic to any page-ref taking/putting.

Will try to get started on that and see if i hit any roadblock. I will
report once i get my feet wet, or at least before i drown ;)

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-17  2:03                 ` Jerome Glisse
@ 2019-04-17 21:19                   ` Jerome Glisse
  0 siblings, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-17 21:19 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Boaz Harrosh, Dan Williams, Kent Overstreet,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Jan Kara, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Tue, Apr 16, 2019 at 10:03:45PM -0400, Jerome Glisse wrote:
> On Wed, Apr 17, 2019 at 04:11:03AM +0300, Boaz Harrosh wrote:
> > On 17/04/19 02:16, Jerome Glisse wrote:
> > > On Wed, Apr 17, 2019 at 01:09:22AM +0300, Boaz Harrosh wrote:
> > >> On 16/04/19 22:57, Jerome Glisse wrote:

[...]

> > >>
> > >> BTW: Are you aware of the users of iov_iter_get_pages_alloc()? Do they need fixing too?
> > > 
> > > Yeah and that patchset should address those already, i do not think
> > > i missed any.
> > > 
> > 
> > I could not find a patch for nfs/direct.c where a put_page is called
> > to balance the iov_iter_get_pages_alloc(), which takes care of, for example,
> > the blocklayout.c pages' state.
> > 
> > So I think the deep audit needs to be of iov_iter_get_pages and get_user_pages
> > and the balancing of that. And all of bio_alloc and bio_add_page should stay
> > agnostic to any page-ref taking/putting.
> 
> Will try to get started on that and see if i hit any roadblock. I will
> report once i get my feet wet, or at least before i drown ;)

So far it does not look too bad:

https://cgit.freedesktop.org/~glisse/linux/log/?h=gup-bio-v2

there are a few things that will be harder to fit in, like splice
and pipes that are populated from GUP.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-16 19:49       ` Jerome Glisse
@ 2019-04-17 21:53         ` Dan Williams
  2019-04-17 22:28           ` Jerome Glisse
  0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2019-04-17 21:53 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Kent Overstreet, Boaz Harrosh, Linux Kernel Mailing List,
	linux-fsdevel, linux-block, Linux MM, John Hubbard, Jan Kara,
	Alexander Viro, Johannes Thumshirn, Christoph Hellwig,
	Jens Axboe, Ming Lei, Jason Gunthorpe, Matthew Wilcox,
	Steve French, linux-cifs, samba-technical, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg, devel,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Tue, Apr 16, 2019 at 12:50 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > <kent.overstreet@gmail.com> wrote:
> > >
> > > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > >
> > > > > This patchset depends on various small fixes [1] and also on patchset
> > > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > > so that it can get review and comments on how and what should be done
> > > > > to test things.
> > > > >
> > > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > > differently than "regular" page reference. Thus we need to keep track
> > > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > > direct pointer to a page. This way we can flag page that are coming from
> > > > > GUP.
> > > > >
> > > > > This patchset is divided as follow:
> > > > >     - First part of the patchset is just small cleanup i believe they
> > > > >       can go in as his assuming people are ok with them.
> > > >
> > > >
> > > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > > >       done in multi-step, first we replace all direct dereference of
> > > > >       the field by call to inline helper, then we introduce macro for
> > > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > > >       bv_page field to bv_pfn.
> > > >
> > > > Why do we need a bv_pfn. Why not just use the lowest bit of the page-ptr
> > > > as a flag (pointer always aligned to 64 bytes in our case).
> > > >
> > > > So yes we need an inline helper for reference of the page but is it not clearer
> > > > that we assume a page* and not any kind of pfn ?
> > > > It will not be the first place using low bits of a pointer for flags.
> > > >
> > > > That said. Why we need it at all? I mean why not have it as a bio flag. If it exist
> > > > at all that a user has a GUP and none-GUP pages to IO at the same request he/she
> > > > can just submit them as two separate BIOs (chained at the block layer).
> > > >
> > > > Many users just submit one page bios and let elevator merge them any way.
> > >
> > > Let's please not add additional flags and weirdness to struct bio - "if this
> > > flag is set interpret one way, if not interpret another" - or eventually bios
> > > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> >
> > This all reminds of the failed attempt to teach the block layer to
> > operate without pages:
> >
> > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> >
> > >
> > > Question though - why do we need a flag for whether a page is a GUP page or not?
> > > Couldn't the needed information just be determined by what range the pfn is not
> > > (i.e. whether or not it has a struct page associated with it)?
> >
> > That amounts to a pfn_valid() check which is a bit heavier than if we
> > can store a flag in the bv_pfn entry directly.
> >
> > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > an 'unsigned long'.
> >
> > That said, I'm still in favor of Jan's proposal to just make the
> > bv_page semantics uniform. Otherwise we're complicating this core
> > infrastructure for some yet to be implemented GPU memory management
> > capabilities with yet to be determined value. Circle back when that
> > value is clear, but in the meantime fix the GUP bug.
>
> This has nothing to do with GPUs, what makes you think so? Here i am
> trying to solve GUP and to keep the value of knowing whether a page
> has been GUPed or not. I argue that if we bias every page in every bio
> then we lose that information and thus the value.
>
> I gave the page protection mechanisms as an example that would be
> impacted but it is not the only one. Knowing if a page has been GUPed
> can be useful for memory reclamation, compaction, NUMA balancing,

Right, this is what I was reacting to in your pushback to Jan's
proposal. You're claiming value for not doing the simple thing for
some future "may be useful in these contexts". To my knowledge those
things are not broken today. You're asking for the complexity to be
carried today for some future benefit, and I'm asking for the
simplicity to be maintained as much as possible today and let the
value of future changes stand on their own to push for more complexity
later.

Effectively don't use this bug fix to push complexity for a future
agenda where the value has yet to be quantified.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
       [not found]       ` <ccac6c5a-7120-0455-88de-ca321b01e825@plexistor.com>
  2019-04-16 19:57         ` Jerome Glisse
@ 2019-04-17 21:54         ` Dan Williams
  1 sibling, 0 replies; 47+ messages in thread
From: Dan Williams @ 2019-04-17 21:54 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Kent Overstreet, Jérôme Glisse,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Jan Kara, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg, devel,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Tue, Apr 16, 2019 at 12:28 PM Boaz Harrosh <boaz@plexistor.com> wrote:
>
> On 16/04/19 22:12, Dan Williams wrote:
> > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > <kent.overstreet@gmail.com> wrote:
> <>
> > This all reminds of the failed attempt to teach the block layer to
> > operate without pages:
> >
> > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> >
>
> Exactly why I want to make sure it is just a [pointer | flag] and not any kind of pfn
> type. Let us please not go there again?
>
> >>
> >> Question though - why do we need a flag for whether a page is a GUP page or not?
> >> Couldn't the needed information just be determined by what range the pfn is not
> >> (i.e. whether or not it has a struct page associated with it)?
> >
> > That amounts to a pfn_valid() check which is a bit heavier than if we
> > can store a flag in the bv_pfn entry directly.
> >
> > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > an 'unsigned long'.
> >
>
> No, please please not. This is not a pfn and not a pfn_t. It is a page-ptr
> and a flag that says where/how to put_page it. IE I did a GUP on this page
> please do a PUP on this page instead of regular put_page. So no where do I mean
> pfn or pfn_t in this code. Then why?

If it's not a pfn then it shouldn't be an unsigned long named "bv_pfn".

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-17 21:53         ` Dan Williams
@ 2019-04-17 22:28           ` Jerome Glisse
  2019-04-17 23:32             ` Dan Williams
  2019-04-18 10:42             ` Jan Kara
  0 siblings, 2 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-17 22:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Kent Overstreet, Boaz Harrosh, Linux Kernel Mailing List,
	linux-fsdevel, linux-block, Linux MM, John Hubbard, Jan Kara,
	Alexander Viro, Johannes Thumshirn, Christoph Hellwig,
	Jens Axboe, Ming Lei, Jason Gunthorpe, Matthew Wilcox,
	Steve French, linux-cifs, samba-technical, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg, devel,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Wed, Apr 17, 2019 at 02:53:28PM -0700, Dan Williams wrote:
> On Tue, Apr 16, 2019 at 12:50 PM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> > > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > > <kent.overstreet@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > > >
> > > > > > This patchset depends on various small fixes [1] and also on patchset
> > > > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > > > so that it can get review and comments on how and what should be done
> > > > > > to test things.
> > > > > >
> > > > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > > > differently than "regular" page reference. Thus we need to keep track
> > > > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > > > direct pointer to a page. This way we can flag page that are coming from
> > > > > > GUP.
> > > > > >
> > > > > > This patchset is divided as follow:
> > > > > >     - First part of the patchset is just small cleanup i believe they
> > > > > >       can go in as his assuming people are ok with them.
> > > > >
> > > > >
> > > > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > > > >       done in multi-step, first we replace all direct dereference of
> > > > > >       the field by call to inline helper, then we introduce macro for
> > > > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > > > >       bv_page field to bv_pfn.
> > > > >
> > > > > Why do we need a bv_pfn. Why not just use the lowest bit of the page-ptr
> > > > > as a flag (pointer always aligned to 64 bytes in our case).
> > > > >
> > > > > So yes we need an inline helper for reference of the page but is it not clearer
> > > > > that we assume a page* and not any kind of pfn ?
> > > > > It will not be the first place using low bits of a pointer for flags.
> > > > >
> > > > > That said. Why we need it at all? I mean why not have it as a bio flag. If it exist
> > > > > at all that a user has a GUP and none-GUP pages to IO at the same request he/she
> > > > > can just submit them as two separate BIOs (chained at the block layer).
> > > > >
> > > > > Many users just submit one page bios and let elevator merge them any way.
> > > >
> > > > Let's please not add additional flags and weirdness to struct bio - "if this
> > > > flag is set interpret one way, if not interpret another" - or eventually bios
> > > > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> > >
> > > This all reminds of the failed attempt to teach the block layer to
> > > operate without pages:
> > >
> > > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> > >
> > > >
> > > > Question though - why do we need a flag for whether a page is a GUP page or not?
> > > > Couldn't the needed information just be determined by what range the pfn is not
> > > > (i.e. whether or not it has a struct page associated with it)?
> > >
> > > That amounts to a pfn_valid() check which is a bit heavier than if we
> > > can store a flag in the bv_pfn entry directly.
> > >
> > > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > > an 'unsigned long'.
> > >
> > > That said, I'm still in favor of Jan's proposal to just make the
> > > bv_page semantics uniform. Otherwise we're complicating this core
> > > infrastructure for some yet to be implemented GPU memory management
> > > capabilities with yet to be determined value. Circle back when that
> > > value is clear, but in the meantime fix the GUP bug.
> >
> > This has nothing to do with GPUs, what makes you think so? Here i am
> > trying to solve GUP and to keep the value of knowing whether a page
> > has been GUPed or not. I argue that if we bias every page in every bio
> > then we lose that information and thus the value.
> >
> > I gave the page protection mechanisms as an example that would be
> > impacted but it is not the only one. Knowing if a page has been GUPed
> > can be useful for memory reclamation, compaction, NUMA balancing,
> 
> Right, this is what I was reacting to in your pushback to Jan's
> proposal. You're claiming value for not doing the simple thing for
> some future "may be useful in these contexts". To my knowledge those
> things are not broken today. You're asking for the complexity to be
> carried today for some future benefit, and I'm asking for the
> simplicity to be maintained as much as possible today and let the
> value of future changes stand on their own to push for more complexity
> later.
> 
> Effectively don't use this bug fix to push complexity for a future
> agenda where the value has yet to be quantified.

Except that this solution (biasing every page in every bio) would be _more
complex_; it is only conceptually appealing. The changes are, on the other
hand, much deeper and much riskier, but you decided to ignore that and focus
on something i was just giving as an example.

Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-17 22:28           ` Jerome Glisse
@ 2019-04-17 23:32             ` Dan Williams
  2019-04-18 10:42             ` Jan Kara
  1 sibling, 0 replies; 47+ messages in thread
From: Dan Williams @ 2019-04-17 23:32 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Kent Overstreet, Boaz Harrosh, Linux Kernel Mailing List,
	linux-fsdevel, linux-block, Linux MM, John Hubbard, Jan Kara,
	Alexander Viro, Johannes Thumshirn, Christoph Hellwig,
	Jens Axboe, Ming Lei, Jason Gunthorpe, Matthew Wilcox,
	Steve French, linux-cifs, samba-technical, Yan Zheng, Sage Weil,
	Ilya Dryomov, Alex Elder, ceph-devel, Eric Van Hensbergen,
	Latchesar Ionkov, Mike Marshall, Martin Brandenburg, devel,
	Dominique Martinet, v9fs-developer, Coly Li, linux-bcache,
	Ernesto A. Fernández

On Wed, Apr 17, 2019 at 3:29 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Wed, Apr 17, 2019 at 02:53:28PM -0700, Dan Williams wrote:
> > On Tue, Apr 16, 2019 at 12:50 PM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> > > > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > > > <kent.overstreet@gmail.com> wrote:
> > > > >
> > > > > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > > > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > > > >
> > > > > > > This patchset depends on various small fixes [1] and also on patchset
> > > > > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > > > > so that it can get review and comments on how and what should be done
> > > > > > > to test things.
> > > > > > >
> > > > > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > > > > differently than "regular" page reference. Thus we need to keep track
> > > > > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > > > > direct pointer to a page. This way we can flag page that are coming from
> > > > > > > GUP.
> > > > > > >
> > > > > > > This patchset is divided as follow:
> > > > > > >     - First part of the patchset is just small cleanup i believe they
> > > > > > >       can go in as his assuming people are ok with them.
> > > > > >
> > > > > >
> > > > > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > > > > >       done in multi-step, first we replace all direct dereference of
> > > > > > >       the field by call to inline helper, then we introduce macro for
> > > > > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > > > > >       bv_page field to bv_pfn.
> > > > > >
> > > > > > Why do we need a bv_pfn. Why not just use the lowest bit of the page-ptr
> > > > > > as a flag (pointer always aligned to 64 bytes in our case).
> > > > > >
> > > > > > So yes we need an inline helper for reference of the page but is it not clearer
> > > > > > that we assume a page* and not any kind of pfn ?
> > > > > > It will not be the first place using low bits of a pointer for flags.
> > > > > >
> > > > > > That said. Why we need it at all? I mean why not have it as a bio flag. If it exist
> > > > > > at all that a user has a GUP and none-GUP pages to IO at the same request he/she
> > > > > > can just submit them as two separate BIOs (chained at the block layer).
> > > > > >
> > > > > > Many users just submit one page bios and let elevator merge them any way.
> > > > >
> > > > > Let's please not add additional flags and weirdness to struct bio - "if this
> > > > > flag is set interpret one way, if not interpret another" - or eventually bios
> > > > > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> > > >
> > > > This all reminds of the failed attempt to teach the block layer to
> > > > operate without pages:
> > > >
> > > > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > >
> > > > >
> > > > > Question though - why do we need a flag for whether a page is a GUP page or not?
> > > > > Couldn't the needed information just be determined by what range the pfn is not
> > > > > (i.e. whether or not it has a struct page associated with it)?
> > > >
> > > > That amounts to a pfn_valid() check which is a bit heavier than if we
> > > > can store a flag in the bv_pfn entry directly.
> > > >
> > > > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > > > an 'unsigned long'.
> > > >
> > > > That said, I'm still in favor of Jan's proposal to just make the
> > > > bv_page semantics uniform. Otherwise we're complicating this core
> > > > infrastructure for some yet to be implemented GPU memory management
> > > > capabilities with yet to be determined value. Circle back when that
> > > > value is clear, but in the meantime fix the GUP bug.
> > >
> > > This has nothing to do with GPUs, what makes you think so? Here i am
> > > trying to solve GUP and to keep the value of knowing whether a page
> > > has been GUPed or not. I argue that if we bias every page in every bio
> > > then we lose that information and thus the value.
> > >
> > > I gave the page protection mechanisms as an example that would be
> > > impacted but it is not the only one. Knowing if a page has been GUPed
> > > can be useful for memory reclamation, compaction, NUMA balancing,
> >
> > Right, this is what I was reacting to in your pushback to Jan's
> > proposal. You're claiming value for not doing the simple thing for
> > some future "may be useful in these contexts". To my knowledge those
> > things are not broken today. You're asking for the complexity to be
> > carried today for some future benefit, and I'm asking for the
> > simplicity to be maintained as much as possible today and let the
> > value of future changes stand on their own to push for more complexity
> > later.
> >
> > Effectively don't use this bug fix to push complexity for a future
> > agenda where the value has yet to be quantified.
>
> Except that this solution (biasing every page in every bio) would be _more
> complex_; it is only conceptually appealing. The changes are, on the other
> hand, much deeper and much riskier, but you decided to ignore that and focus
> on something i was just giving as an example.

Not ignoring, asking for more clarification on the complexity it
introduces independent of potential future uses.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-17 22:28           ` Jerome Glisse
  2019-04-17 23:32             ` Dan Williams
@ 2019-04-18 10:42             ` Jan Kara
  2019-04-18 14:27               ` Jerome Glisse
  2019-04-18 18:03               ` Dan Williams
  1 sibling, 2 replies; 47+ messages in thread
From: Jan Kara @ 2019-04-18 10:42 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Dan Williams, Kent Overstreet, Boaz Harrosh,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Jan Kara, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, samba-technical,
	Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder, ceph-devel,
	Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, linux-bcache, Ernesto A. Fernández

On Wed 17-04-19 18:28:58, Jerome Glisse wrote:
> On Wed, Apr 17, 2019 at 02:53:28PM -0700, Dan Williams wrote:
> > On Tue, Apr 16, 2019 at 12:50 PM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> > > > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > > > <kent.overstreet@gmail.com> wrote:
> > > > >
> > > > > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > > > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > > > >
> > > > > > > This patchset depends on various small fixes [1] and also on patchset
> > > > > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > > > > so that it can get review and comments on how and what should be done
> > > > > > > to test things.
> > > > > > >
> > > > > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > > > > differently than "regular" page reference. Thus we need to keep track
> > > > > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > > > > direct pointer to a page. This way we can flag page that are coming from
> > > > > > > GUP.
> > > > > > >
> > > > > > > This patchset is divided as follow:
> > > > > > >     - First part of the patchset is just small cleanup i believe they
> > > > > > >       can go in as his assuming people are ok with them.
> > > > > >
> > > > > >
> > > > > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > > > > >       done in multi-step, first we replace all direct dereference of
> > > > > > >       the field by call to inline helper, then we introduce macro for
> > > > > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > > > > >       bv_page field to bv_pfn.
> > > > > >
> > > > > > Why do we need a bv_pfn. Why not just use the lowest bit of the page-ptr
> > > > > > as a flag (pointer always aligned to 64 bytes in our case).
> > > > > >
> > > > > > So yes we need an inline helper for reference of the page but is it not clearer
> > > > > > that we assume a page* and not any kind of pfn ?
> > > > > > It will not be the first place using low bits of a pointer for flags.
> > > > > >
> > > > > > That said. Why we need it at all? I mean why not have it as a bio flag. If it exist
> > > > > > at all that a user has a GUP and none-GUP pages to IO at the same request he/she
> > > > > > can just submit them as two separate BIOs (chained at the block layer).
> > > > > >
> > > > > > Many users just submit one page bios and let elevator merge them any way.
> > > > >
> > > > > Let's please not add additional flags and weirdness to struct bio - "if this
> > > > > flag is set interpret one way, if not interpret another" - or eventually bios
> > > > > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> > > >
> > > > This all reminds of the failed attempt to teach the block layer to
> > > > operate without pages:
> > > >
> > > > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > >
> > > > >
> > > > > Question though - why do we need a flag for whether a page is a GUP page or not?
> > > > > Couldn't the needed information just be determined by what range the pfn is not
> > > > > (i.e. whether or not it has a struct page associated with it)?
> > > >
> > > > That amounts to a pfn_valid() check which is a bit heavier than if we
> > > > can store a flag in the bv_pfn entry directly.
> > > >
> > > > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > > > an 'unsigned long'.
> > > >
> > > > That said, I'm still in favor of Jan's proposal to just make the
> > > > bv_page semantics uniform. Otherwise we're complicating this core
> > > > infrastructure for some yet to be implemented GPU memory management
> > > > capabilities with yet to be determined value. Circle back when that
> > > > value is clear, but in the meantime fix the GUP bug.
> > >
> > > This has nothing to do with GPUs, what makes you think so? Here i am
> > > trying to solve GUP and to keep the value of knowing whether a page
> > > has been GUPed or not. I argue that if we bias every page in every bio
> > > then we lose that information and thus the value.
> > >
> > > I gave the page protection mechanisms as an example that would be
> > > impacted but it is not the only one. Knowing if a page has been GUPed
> > > can be useful for memory reclamation, compaction, NUMA balancing,
> > 
> > Right, this is what I was reacting to in your pushback to Jan's
> > proposal. You're claiming value for not doing the simple thing for
> > some future "may be useful in these contexts". To my knowledge those
> > things are not broken today. You're asking for the complexity to be
> > carried today for some future benefit, and I'm asking for the
> > simplicity to be maintained as much as possible today and let the
> > value of future changes stand on their own to push for more complexity
> > later.
> > 
> > Effectively don't use this bug fix to push complexity for a future
> > agenda where the value has yet to be quantified.
> 
> Except that this solution (biasing every page in every bio) would be _more
> complex_; it is only conceptually appealing. The changes are, on the other
> hand, much deeper and much riskier, but you decided to ignore that and focus
> on something i was just giving as an example.

Yeah, after going and reading several places like fs/iomap.c, fs/mpage.c,
drivers/md/dm-io.c I agree with you. The places that are not doing direct
IO usually just don't hold any page reference that could be directly
attributed to the bio (and they don't drop it when the bio finishes). They
rather use other means (like PageLocked, PageWriteback) to make sure the
page stays alive, so mandating a gup-pin reference for all pages attached to
a bio would require a lot of reworking of places that are not related to our
problem and currently work just fine. So I withdraw my suggestion. Nice in
theory, too much work in practice ;).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-18 10:42             ` Jan Kara
@ 2019-04-18 14:27               ` Jerome Glisse
  2019-04-18 15:30                 ` Jan Kara
  2019-04-18 18:03               ` Dan Williams
  1 sibling, 1 reply; 47+ messages in thread
From: Jerome Glisse @ 2019-04-18 14:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dan Williams, Kent Overstreet, Boaz Harrosh,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, samba-technical,
	Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder, ceph-devel,
	Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, linux-bcache, Ernesto A. Fernández

On Thu, Apr 18, 2019 at 12:42:05PM +0200, Jan Kara wrote:
> On Wed 17-04-19 18:28:58, Jerome Glisse wrote:
> > On Wed, Apr 17, 2019 at 02:53:28PM -0700, Dan Williams wrote:
> > > On Tue, Apr 16, 2019 at 12:50 PM Jerome Glisse <jglisse@redhat.com> wrote:
> > > >
> > > > On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> > > > > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > > > > <kent.overstreet@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > > > > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > > > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > > > > >
> > > > > > > > This patchset depends on various small fixes [1] and also on patchset
> > > > > > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > > > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > > > > > so that it can get review and comments on how and what should be done
> > > > > > > > to test things.
> > > > > > > >
> > > > > > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > > > > > differently than "regular" page reference. Thus we need to keep track
> > > > > > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > > > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > > > > > direct pointer to a page. This way we can flag page that are coming from
> > > > > > > > GUP.
> > > > > > > >
> > > > > > > > This patchset is divided as follow:
> > > > > > > >     - First part of the patchset is just small cleanup i believe they
> > > > > > > >       can go in as his assuming people are ok with them.
> > > > > > >
> > > > > > >
> > > > > > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > > > > > >       done in multi-step, first we replace all direct dereference of
> > > > > > > >       the field by call to inline helper, then we introduce macro for
> > > > > > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > > > > > >       bv_page field to bv_pfn.
> > > > > > >
> > > > > > > Why do we need a bv_pfn? Why not just use the lowest bit of the page-ptr
> > > > > > > as a flag (the pointer is always aligned to 64 bytes in our case)?
> > > > > > >
> > > > > > > So yes, we need an inline helper to reference the page, but is it not clearer
> > > > > > > that we assume a page* and not any kind of pfn?
> > > > > > > It will not be the first place using low bits of a pointer for flags.
> > > > > > >
> > > > > > > That said, why do we need it at all? I mean, why not have it as a bio flag? If it
> > > > > > > ever happens that a user has GUP and non-GUP pages to IO in the same request, he/she
> > > > > > > can just submit them as two separate BIOs (chained at the block layer).
> > > > > > >
> > > > > > > Many users just submit one-page bios and let the elevator merge them anyway.
> > > > > >
> > > > > > Let's please not add additional flags and weirdness to struct bio - "if this
> > > > > > flag is set interpret one way, if not interpret another" - or eventually bios
> > > > > > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> > > > >
> > > > > This all reminds me of the failed attempt to teach the block layer to
> > > > > operate without pages:
> > > > >
> > > > > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > > >
> > > > > >
> > > > > > Question though - why do we need a flag for whether a page is a GUP page or not?
> > > > > > Couldn't the needed information just be determined by what range the pfn is in
> > > > > > (i.e. whether or not it has a struct page associated with it)?
> > > > >
> > > > > That amounts to a pfn_valid() check which is a bit heavier than if we
> > > > > can store a flag in the bv_pfn entry directly.
> > > > >
> > > > > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > > > > an 'unsigned long'.
> > > > >
> > > > > That said, I'm still in favor of Jan's proposal to just make the
> > > > > bv_page semantics uniform. Otherwise we're complicating this core
> > > > > infrastructure for some yet to be implemented GPU memory management
> > > > > capabilities with yet to be determined value. Circle back when that
> > > > > value is clear, but in the meantime fix the GUP bug.
> > > >
> > > > This has nothing to do with GPU, what makes you think so? Here i am
> > > > trying to solve GUP and to keep the value of knowing whether a page
> > > > has been GUP or not. I argue that if we bias every page in every bio
> > > > then we lose that information and thus the value.
> > > >
> > > > I gave the page protection mechanisms as an example that would be
> > > > impacted but it is not the only one. Knowing if a page has been GUP
> > > > can be useful for memory reclamation, compaction, NUMA balancing,
> > > 
> > > Right, this is what I was reacting to in your pushback to Jan's
> > > proposal. You're claiming value for not doing the simple thing for
> > > some future "may be useful in these contexts". To my knowledge those
> > > things are not broken today. You're asking for the complexity to be
> > > carried today for some future benefit, and I'm asking for the
> > > simplicity to be maintained as much as possible today and let the
> > > value of future changes stand on their own to push for more complexity
> > > later.
> > > 
> > > Effectively don't use this bug fix to push complexity for a future
> > > agenda where the value has yet to be quantified.
> > 
> > Except that this solution (biasing everyone in bio) would be _more
> > complex_; it is only conceptually appealing. The changes are on the
> > other hand much deeper and much riskier, but you decided to ignore
> > that and focus on something i was just giving as an example.
> 
> Yeah, after going and reading several places like fs/iomap.c, fs/mpage.c,
> drivers/md/dm-io.c I agree with you. The places that are not doing direct
> IO usually just don't hold any page reference that could be directly
> attributed to the bio (and they don't drop it when bio finishes). They
> rather use other means (like PageLocked, PageWriteback) to make sure the
> page stays alive so mandating gup-pin reference for all pages attached to a
> bio would require a lot of reworking of places that are not related to our
> problem and currently work just fine. So I withdraw my suggestion. Nice in
> theory, too much work in practice ;).

Have you seen Boaz's proposal? I have started on it and it does not look
too bad (but you know, taste and color :)) You can take a peek:

https://cgit.freedesktop.org/~glisse/linux/log/?h=gup-bio-v2

I need to finish that and run fstests on a bunch of different filesystems
before posting. Dunno if i will have enough time to do that before LSF/MM.
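
For reference, the core of Boaz's idea is just pointer tagging; here is
a minimal sketch (bvec_page()/bvec_set_gup_page()/bvec_put_page() match
the helper names used in this series, the bit layout is an assumption):

	/* stash the "page came from GUP" flag in bit 0 of the page
	 * pointer; struct page is always sufficiently aligned */
	#define BVEC_GUP_BIT	1UL

	static inline struct page *bvec_page(const struct bio_vec *bv)
	{
		return (struct page *)((unsigned long)bv->bv_page & ~BVEC_GUP_BIT);
	}

	static inline bool bvec_page_is_gup(const struct bio_vec *bv)
	{
		return (unsigned long)bv->bv_page & BVEC_GUP_BIT;
	}

	static inline void bvec_set_gup_page(struct bio_vec *bv, struct page *page)
	{
		bv->bv_page = (struct page *)((unsigned long)page | BVEC_GUP_BIT);
	}

	static inline void bvec_put_page(const struct bio_vec *bv)
	{
		if (bvec_page_is_gup(bv))
			put_user_page(bvec_page(bv));
		else
			put_page(bvec_page(bv));
	}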

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-18 14:27               ` Jerome Glisse
@ 2019-04-18 15:30                 ` Jan Kara
  2019-04-18 15:36                   ` Jerome Glisse
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2019-04-18 15:30 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Jan Kara, Dan Williams, Kent Overstreet, Boaz Harrosh,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, samba-technical,
	Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder, ceph-devel,
	Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, linux-bcache, Ernesto A. Fernández

On Thu 18-04-19 10:27:29, Jerome Glisse wrote:
> On Thu, Apr 18, 2019 at 12:42:05PM +0200, Jan Kara wrote:
> > On Wed 17-04-19 18:28:58, Jerome Glisse wrote:
> > > On Wed, Apr 17, 2019 at 02:53:28PM -0700, Dan Williams wrote:
> > > > On Tue, Apr 16, 2019 at 12:50 PM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > >
> > > > > On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> > > > > > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > > > > > <kent.overstreet@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > > > > > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > > > > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > > > > > >
> > > > > > > > > This patchset depends on various small fixes [1] and also on patchset
> > > > > > > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > > > > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > > > > > > so that it can get review and comments on how and what should be done
> > > > > > > > > to test things.
> > > > > > > > >
> > > > > > > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > > > > > > differently than "regular" page reference. Thus we need to keep track
> > > > > > > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > > > > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > > > > > > direct pointer to a page. This way we can flag page that are coming from
> > > > > > > > > GUP.
> > > > > > > > >
> > > > > > > > > This patchset is divided as follow:
> > > > > > > > >     - First part of the patchset is just small cleanup i believe they
> > > > > > > > >       can go in as his assuming people are ok with them.
> > > > > > > >
> > > > > > > >
> > > > > > > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > > > > > > >       done in multi-step, first we replace all direct dereference of
> > > > > > > > >       the field by call to inline helper, then we introduce macro for
> > > > > > > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > > > > > > >       bv_page field to bv_pfn.
> > > > > > > >
> > > > > > > > Why do we need a bv_pfn? Why not just use the lowest bit of the page-ptr
> > > > > > > > as a flag (the pointer is always aligned to 64 bytes in our case)?
> > > > > > > >
> > > > > > > > So yes, we need an inline helper to reference the page, but is it not clearer
> > > > > > > > that we assume a page* and not any kind of pfn?
> > > > > > > > It will not be the first place using low bits of a pointer for flags.
> > > > > > > >
> > > > > > > > That said, why do we need it at all? I mean, why not have it as a bio flag? If it
> > > > > > > > ever happens that a user has GUP and non-GUP pages to IO in the same request, he/she
> > > > > > > > can just submit them as two separate BIOs (chained at the block layer).
> > > > > > > >
> > > > > > > > Many users just submit one-page bios and let the elevator merge them anyway.
> > > > > > >
> > > > > > > Let's please not add additional flags and weirdness to struct bio - "if this
> > > > > > > flag is set interpret one way, if not interpret another" - or eventually bios
> > > > > > > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> > > > > >
> > > > > > This all reminds me of the failed attempt to teach the block layer to
> > > > > > operate without pages:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > > > >
> > > > > > >
> > > > > > > Question though - why do we need a flag for whether a page is a GUP page or not?
> > > > > > > Couldn't the needed information just be determined by what range the pfn is in
> > > > > > > (i.e. whether or not it has a struct page associated with it)?
> > > > > >
> > > > > > That amounts to a pfn_valid() check which is a bit heavier than if we
> > > > > > can store a flag in the bv_pfn entry directly.
> > > > > >
> > > > > > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > > > > > an 'unsigned long'.
> > > > > >
> > > > > > That said, I'm still in favor of Jan's proposal to just make the
> > > > > > bv_page semantics uniform. Otherwise we're complicating this core
> > > > > > infrastructure for some yet to be implemented GPU memory management
> > > > > > capabilities with yet to be determined value. Circle back when that
> > > > > > value is clear, but in the meantime fix the GUP bug.
> > > > >
> > > > > This has nothing to do with GPU, what makes you think so? Here i am
> > > > > trying to solve GUP and to keep the value of knowing whether a page
> > > > > has been GUP or not. I argue that if we bias every page in every bio
> > > > > then we lose that information and thus the value.
> > > > >
> > > > > I gave the page protection mechanisms as an example that would be
> > > > > impacted but it is not the only one. Knowing if a page has been GUP
> > > > > can be useful for memory reclamation, compaction, NUMA balancing,
> > > > 
> > > > Right, this is what I was reacting to in your pushback to Jan's
> > > > proposal. You're claiming value for not doing the simple thing for
> > > > some future "may be useful in these contexts". To my knowledge those
> > > > things are not broken today. You're asking for the complexity to be
> > > > carried today for some future benefit, and I'm asking for the
> > > > simplicity to be maintained as much as possible today and let the
> > > > value of future changes stand on their own to push for more complexity
> > > > later.
> > > > 
> > > > Effectively don't use this bug fix to push complexity for a future
> > > > agenda where the value has yet to be quantified.
> > > 
> > > Except that this solution (biasing everyone in bio) would be _more
> > > complex_; it is only conceptually appealing. The changes are on the
> > > other hand much deeper and much riskier, but you decided to ignore
> > > that and focus on something i was just giving as an example.
> > 
> > Yeah, after going and reading several places like fs/iomap.c, fs/mpage.c,
> > drivers/md/dm-io.c I agree with you. The places that are not doing direct
> > IO usually just don't hold any page reference that could be directly
> > attributed to the bio (and they don't drop it when bio finishes). They
> > rather use other means (like PageLocked, PageWriteback) to make sure the
> > page stays alive so mandating gup-pin reference for all pages attached to a
> > bio would require a lot of reworking of places that are not related to our
> > problem and currently work just fine. So I withdraw my suggestion. Nice in
> > theory, too much work in practice ;).
> 
> Have you seen Boaz's proposal? I have started on it and it does not look
> too bad (but you know, taste and color :)) You can take a peek:
>
> https://cgit.freedesktop.org/~glisse/linux/log/?h=gup-bio-v2
>
> I need to finish that and run fstests on a bunch of different filesystems
> before posting. Dunno if i will have enough time to do that before LSF/MM.

Yes, I've seen it. I just wasn't sure what the result would look like. What
you have in your tree looks pretty clean so far. BTW (I know I'm repeating
myself ;) what if we made iov_iter_get_pages() & iov_iter_get_pages_alloc()
always return gup-pinned references? That would get rid of the need for two
ioend handlers for each call site...
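
A rough sketch of what that would buy (the handler names and the
pages_are_gup flag below are made up for illustration):

	/* today: each call site must pick a completion handler based
	 * on how its pages were obtained */
	bio->bi_end_io = pages_are_gup ? dio_end_io_gup : dio_end_io;

	/* if iov_iter_get_pages*() always gup-pinned its pages, the
	 * distinction disappears and a single handler can always
	 * release pages with put_user_page() */
	bio->bi_end_io = dio_end_io_gup;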

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-18 15:30                 ` Jan Kara
@ 2019-04-18 15:36                   ` Jerome Glisse
  0 siblings, 0 replies; 47+ messages in thread
From: Jerome Glisse @ 2019-04-18 15:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dan Williams, Kent Overstreet, Boaz Harrosh,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, samba-technical,
	Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder, ceph-devel,
	Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, linux-bcache, Ernesto A. Fernández

On Thu, Apr 18, 2019 at 05:30:47PM +0200, Jan Kara wrote:
> On Thu 18-04-19 10:27:29, Jerome Glisse wrote:
> > On Thu, Apr 18, 2019 at 12:42:05PM +0200, Jan Kara wrote:
> > > On Wed 17-04-19 18:28:58, Jerome Glisse wrote:
> > > > On Wed, Apr 17, 2019 at 02:53:28PM -0700, Dan Williams wrote:
> > > > > On Tue, Apr 16, 2019 at 12:50 PM Jerome Glisse <jglisse@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Apr 16, 2019 at 12:12:27PM -0700, Dan Williams wrote:
> > > > > > > On Tue, Apr 16, 2019 at 11:59 AM Kent Overstreet
> > > > > > > <kent.overstreet@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Apr 16, 2019 at 09:35:04PM +0300, Boaz Harrosh wrote:
> > > > > > > > > On Thu, Apr 11, 2019 at 05:08:19PM -0400, jglisse@redhat.com wrote:
> > > > > > > > > > From: Jérôme Glisse <jglisse@redhat.com>
> > > > > > > > > >
> > > > > > > > > > This patchset depends on various small fixes [1] and also on patchset
> > > > > > > > > > which introduce put_user_page*() [2] and thus is 5.3 material as those
> > > > > > > > > > pre-requisite will get in 5.2 at best. Nonetheless i am posting it now
> > > > > > > > > > so that it can get review and comments on how and what should be done
> > > > > > > > > > to test things.
> > > > > > > > > >
> > > > > > > > > > For various reasons [2] [3] we want to track page reference through GUP
> > > > > > > > > > differently than "regular" page reference. Thus we need to keep track
> > > > > > > > > > of how we got a page within the block and fs layer. To do so this patch-
> > > > > > > > > > set change the bio_bvec struct to store a pfn and flags instead of a
> > > > > > > > > > direct pointer to a page. This way we can flag page that are coming from
> > > > > > > > > > GUP.
> > > > > > > > > >
> > > > > > > > > > This patchset is divided as follow:
> > > > > > > > > >     - First part of the patchset is just small cleanup i believe they
> > > > > > > > > >       can go in as his assuming people are ok with them.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >     - Second part convert bio_vec->bv_page to bio_vec->bv_pfn this is
> > > > > > > > > >       done in multi-step, first we replace all direct dereference of
> > > > > > > > > >       the field by call to inline helper, then we introduce macro for
> > > > > > > > > >       bio_bvec that are initialized on the stack. Finaly we change the
> > > > > > > > > >       bv_page field to bv_pfn.
> > > > > > > > >
> > > > > > > > > Why do we need a bv_pfn? Why not just use the lowest bit of the page-ptr
> > > > > > > > > as a flag (the pointer is always aligned to 64 bytes in our case)?
> > > > > > > > >
> > > > > > > > > So yes, we need an inline helper to reference the page, but is it not clearer
> > > > > > > > > that we assume a page* and not any kind of pfn?
> > > > > > > > > It will not be the first place using low bits of a pointer for flags.
> > > > > > > > >
> > > > > > > > > That said, why do we need it at all? I mean, why not have it as a bio flag? If it
> > > > > > > > > ever happens that a user has GUP and non-GUP pages to IO in the same request, he/she
> > > > > > > > > can just submit them as two separate BIOs (chained at the block layer).
> > > > > > > > >
> > > > > > > > > Many users just submit one-page bios and let the elevator merge them anyway.
> > > > > > > >
> > > > > > > > Let's please not add additional flags and weirdness to struct bio - "if this
> > > > > > > > flag is set interpret one way, if not interpret another" - or eventually bios
> > > > > > > > will be as bad as skbuffs. I would much prefer just changing bv_page to bv_pfn.
> > > > > > >
> > > > > > > This all reminds me of the failed attempt to teach the block layer to
> > > > > > > operate without pages:
> > > > > > >
> > > > > > > https://lore.kernel.org/lkml/20150316201640.33102.33761.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > > > > >
> > > > > > > >
> > > > > > > > Question though - why do we need a flag for whether a page is a GUP page or not?
> > > > > > > > Couldn't the needed information just be determined by what range the pfn is in
> > > > > > > > (i.e. whether or not it has a struct page associated with it)?
> > > > > > >
> > > > > > > That amounts to a pfn_valid() check which is a bit heavier than if we
> > > > > > > can store a flag in the bv_pfn entry directly.
> > > > > > >
> > > > > > > I'd say create a new PFN_* flag, and make bv_pfn a 'pfn_t' rather than
> > > > > > > an 'unsigned long'.
> > > > > > >
> > > > > > > That said, I'm still in favor of Jan's proposal to just make the
> > > > > > > bv_page semantics uniform. Otherwise we're complicating this core
> > > > > > > infrastructure for some yet to be implemented GPU memory management
> > > > > > > capabilities with yet to be determined value. Circle back when that
> > > > > > > value is clear, but in the meantime fix the GUP bug.
> > > > > >
> > > > > > This has nothing to do with GPU, what makes you think so? Here i am
> > > > > > trying to solve GUP and to keep the value of knowing whether a page
> > > > > > has been GUP or not. I argue that if we bias every page in every bio
> > > > > > then we lose that information and thus the value.
> > > > > >
> > > > > > I gave the page protection mechanisms as an example that would be
> > > > > > impacted but it is not the only one. Knowing if a page has been GUP
> > > > > > can be useful for memory reclamation, compaction, NUMA balancing,
> > > > > 
> > > > > Right, this is what I was reacting to in your pushback to Jan's
> > > > > proposal. You're claiming value for not doing the simple thing for
> > > > > some future "may be useful in these contexts". To my knowledge those
> > > > > things are not broken today. You're asking for the complexity to be
> > > > > carried today for some future benefit, and I'm asking for the
> > > > > simplicity to be maintained as much as possible today and let the
> > > > > value of future changes stand on their own to push for more complexity
> > > > > later.
> > > > > 
> > > > > Effectively don't use this bug fix to push complexity for a future
> > > > > agenda where the value has yet to be quantified.
> > > > 
> > > > Except that this solution (biasing everyone in bio) would be _more
> > > > complex_; it is only conceptually appealing. The changes are on the
> > > > other hand much deeper and much riskier, but you decided to ignore
> > > > that and focus on something i was just giving as an example.
> > > 
> > > Yeah, after going and reading several places like fs/iomap.c, fs/mpage.c,
> > > drivers/md/dm-io.c I agree with you. The places that are not doing direct
> > > IO usually just don't hold any page reference that could be directly
> > > attributed to the bio (and they don't drop it when bio finishes). They
> > > rather use other means (like PageLocked, PageWriteback) to make sure the
> > > page stays alive so mandating gup-pin reference for all pages attached to a
> > > bio would require a lot of reworking of places that are not related to our
> > > problem and currently work just fine. So I withdraw my suggestion. Nice in
> > > theory, too much work in practice ;).
> > 
> > Have you seen Boaz's proposal? I have started on it and it does not look
> > too bad (but you know, taste and color :)) You can take a peek:
> >
> > https://cgit.freedesktop.org/~glisse/linux/log/?h=gup-bio-v2
> >
> > I need to finish that and run fstests on a bunch of different filesystems
> > before posting. Dunno if i will have enough time to do that before LSF/MM.
> 
> Yes, I've seen it. I just wasn't sure what the result would look like. What
> you have in your tree looks pretty clean so far. BTW (I know I'm repeating
> myself ;) what if we made iov_iter_get_pages() & iov_iter_get_pages_alloc()
> always return gup-pinned references? That would get rid of the need for two
> ioend handlers for each call site...

No, it would not, at least not everywhere: some of the call sites have
more ways to fill the bio than just iov_iter_get_pages*(), so i expect
a good chunk of the patches i have would stay the same because of that.
So in the end it might not simplify much, maybe in a couple of places.
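
For example, a submission path that pads a bio with a kernel page ends
up with mixed page origins in a single bio anyway; a made-up sketch,
using the gup flag that bio_add_page() grows in this series:

	size_t start;
	ssize_t got = iov_iter_get_pages(iter, pages, LONG_MAX,
					 max_pages, &start);

	/* user pages: carry a GUP reference, flag them as such */
	bio_add_page(bio, pages[0],
		     min_t(size_t, got, PAGE_SIZE - start), start, true);

	/* pad with a kernel page: no GUP reference to return */
	bio_add_page(bio, ZERO_PAGE(0), pad_len, 0, false);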

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v1 00/15] Keep track of GUPed pages in fs and block
  2019-04-18 10:42             ` Jan Kara
  2019-04-18 14:27               ` Jerome Glisse
@ 2019-04-18 18:03               ` Dan Williams
  1 sibling, 0 replies; 47+ messages in thread
From: Dan Williams @ 2019-04-18 18:03 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jerome Glisse, Kent Overstreet, Boaz Harrosh,
	Linux Kernel Mailing List, linux-fsdevel, linux-block, Linux MM,
	John Hubbard, Alexander Viro, Johannes Thumshirn,
	Christoph Hellwig, Jens Axboe, Ming Lei, Jason Gunthorpe,
	Matthew Wilcox, Steve French, linux-cifs, samba-technical,
	Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder, ceph-devel,
	Eric Van Hensbergen, Latchesar Ionkov, Mike Marshall,
	Martin Brandenburg, devel, Dominique Martinet, v9fs-developer,
	Coly Li, linux-bcache, Ernesto A. Fernández

On Thu, Apr 18, 2019 at 3:42 AM Jan Kara <jack@suse.cz> wrote:
> > Except that this solution (biasing everyone in bio) would be _more
> > complex_; it is only conceptually appealing. The changes are on the
> > other hand much deeper and much riskier, but you decided to ignore
> > that and focus on something i was just giving as an example.
>
> Yeah, after going and reading several places like fs/iomap.c, fs/mpage.c,
> drivers/md/dm-io.c I agree with you. The places that are not doing direct
> IO usually just don't hold any page reference that could be directly
> attributed to the bio (and they don't drop it when bio finishes). They
> rather use other means (like PageLocked, PageWriteback) to make sure the
> page stays alive so mandating gup-pin reference for all pages attached to a
> bio would require a lot of reworking of places that are not related to our
> problem and currently work just fine. So I withdraw my suggestion. Nice in
> theory, too much work in practice ;).

Is it though? We already have BIO_NO_PAGE_REF, so it seems it would be
a useful cleanup to have all locations that don't participate in page
references use that existing flag and then teach all other locations
to use gup-pinned pages.
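
Concretely, page release at bio completion could then collapse to
something like this (a simplified sketch, not code from the series):

	static void bio_release_pages(struct bio *bio)
	{
		struct bio_vec *bvec;
		struct bvec_iter_all iter_all;

		/* the submitter keeps these pages alive by other means
		 * (PageLocked, PageWriteback, ...), no reference held */
		if (bio_flagged(bio, BIO_NO_PAGE_REF))
			return;

		/* everything else is gup-pinned by the submitter */
		bio_for_each_segment_all(bvec, bio, iter_all)
			put_user_page(bvec->bv_page);
	}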

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2019-04-18 18:28 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-11 21:08 [PATCH v1 00/15] Keep track of GUPed pages in fs and block jglisse
2019-04-11 21:08 ` [PATCH v1 01/15] fs/direct-io: fix trailing whitespace issues jglisse
2019-04-11 21:08 ` [PATCH v1 02/15] iov_iter: add helper to test if an iter would use GUP jglisse
2019-04-11 21:08 ` [PATCH v1 03/15] block: introduce bvec_page()/bvec_set_page() to get/set bio_vec.bv_page jglisse
2019-04-11 21:08 ` [PATCH v1 04/15] block: introduce BIO_VEC_INIT() macro to initialize bio_vec structure jglisse
2019-04-11 21:08 ` [PATCH v1 05/15] block: replace all bio_vec->bv_page by bvec_page()/bvec_set_page() jglisse
2019-04-11 21:08 ` [PATCH v1 06/15] block: convert bio_vec.bv_page to bv_pfn to store pfn and not page jglisse
2019-04-11 21:08 ` [PATCH v1 07/15] block: add bvec_put_page_dirty*() to replace put_page(bvec_page()) jglisse
2019-04-11 21:08 ` [PATCH v1 08/15] block: use bvec_put_page() instead of put_page(bvec_page()) jglisse
2019-04-11 21:08 ` [PATCH v1 09/15] block: bvec_put_page_dirty* instead of set_page_dirty* and bvec_put_page jglisse
2019-04-11 21:08 ` [PATCH v1 10/15] block: add gup flag to bio_add_page()/bio_add_pc_page()/__bio_add_page() jglisse
2019-04-15 14:59   ` Jan Kara
2019-04-15 15:24     ` Jerome Glisse
2019-04-16 16:46       ` Jan Kara
2019-04-16 16:54         ` Dan Williams
2019-04-16 17:07         ` Jerome Glisse
2019-04-16  0:22     ` Jerome Glisse
2019-04-16 16:52       ` Jan Kara
2019-04-16 18:32         ` Jerome Glisse
2019-04-11 21:08 ` [PATCH v1 11/15] block: make sure bio_add_page*() knows page that are coming from GUP jglisse
2019-04-11 21:08 ` [PATCH v1 12/15] fs/direct-io: keep track of wether a page is coming from GUP or not jglisse
2019-04-11 23:14   ` Dave Chinner
2019-04-12  0:08     ` Jerome Glisse
2019-04-11 21:08 ` [PATCH v1 13/15] fs/splice: use put_user_page() when appropriate jglisse
2019-04-11 21:08 ` [PATCH v1 14/15] fs: use bvec_set_gup_page() where appropriate jglisse
2019-04-11 21:08 ` [PATCH v1 15/15] ceph: use put_user_pages() instead of ceph_put_page_vector() jglisse
2019-04-15  7:46   ` Yan, Zheng
2019-04-15 15:11     ` Jerome Glisse
2019-04-16  0:00 ` [PATCH v1 00/15] Keep track of GUPed pages in fs and block Dave Chinner
     [not found] ` <2c124cc4-b97e-ee28-2926-305bc6bc74bd@plexistor.com>
2019-04-16 18:47   ` Jerome Glisse
2019-04-16 18:59   ` Kent Overstreet
2019-04-16 19:12     ` Dan Williams
2019-04-16 19:49       ` Jerome Glisse
2019-04-17 21:53         ` Dan Williams
2019-04-17 22:28           ` Jerome Glisse
2019-04-17 23:32             ` Dan Williams
2019-04-18 10:42             ` Jan Kara
2019-04-18 14:27               ` Jerome Glisse
2019-04-18 15:30                 ` Jan Kara
2019-04-18 15:36                   ` Jerome Glisse
2019-04-18 18:03               ` Dan Williams
     [not found]       ` <ccac6c5a-7120-0455-88de-ca321b01e825@plexistor.com>
2019-04-16 19:57         ` Jerome Glisse
     [not found]           ` <41e2d7e1-104b-a006-2824-015ca8c76cc8@gmail.com>
2019-04-16 23:16             ` Jerome Glisse
     [not found]               ` <fa00a2ff-3664-3165-7af8-9d9c53238245@plexistor.com>
2019-04-17  2:03                 ` Jerome Glisse
2019-04-17 21:19                   ` Jerome Glisse
2019-04-16 23:34             ` Jerome Glisse
2019-04-17 21:54         ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).