All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] btrfs: remove buffer heads form superblock handling
@ 2020-02-05 14:38 Johannes Thumshirn
  2020-02-05 14:38 ` [PATCH v4 1/5] btrfs: use the page-cache for super block reading Johannes Thumshirn
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-05 14:38 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, Josef Bacik, linux-btrfs @ vger . kernel . org,
	Johannes Thumshirn

This patch series removes the use of buffer_heads from btrfs' super block read
and write paths. It also converts the integrity-checking code to only work
with pages and BIOs.

Compared to buffer heads, this gives us a leaner call path, as the
buffer_head code wraps around getting pages from the page-cache and adding
them to BIOs to submit.

The first patch removes buffer_heads from superblock reading.  The second
removes it from super_block writing and the subsequent patches remove the
buffer_heads from the integrity check code.

It's based on misc-next from Wednesday February 5, and doesn't show any
regressions in xfstests to the baseline.

This is more or less a consolidation submission, as I lost track what changes
have been requested.

Changes to v3:
- Incroporated feedback from Christoph

Changes to v2:
- Removed patch #1 again
- Added Reviews from Josef
- Re-visited page locking, but not changes, it retains the same locking scheme
  the buffer_heads had
- Incroptorated comments from David regarding open-coding functions
- For more details see the idividual patches.

Changes to v1:
- Added patch #1
- Converted sb reading and integrity checking to use the page cache
- Added rationale behind the conversion to the commit messages.
- For more details see the idividual patches.


Johannes Thumshirn (5):
  btrfs: use the page-cache for super block reading
  btrfs: use BIOs instead of buffer_heads from superblock writeout
  btrfs: remove btrfsic_submit_bh()
  btrfs: remove buffer_heads from btrfsic_process_written_block()
  btrfs: remove buffer_heads form superblock mirror integrity checking

 fs/btrfs/check-integrity.c | 218 +++++++++++--------------------------
 fs/btrfs/check-integrity.h |   2 -
 fs/btrfs/disk-io.c         | 195 +++++++++++++++++++--------------
 fs/btrfs/disk-io.h         |   4 +-
 fs/btrfs/volumes.c         |  57 +++++-----
 fs/btrfs/volumes.h         |   2 -
 6 files changed, 210 insertions(+), 268 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-05 14:38 [PATCH v4 0/5] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
@ 2020-02-05 14:38 ` Johannes Thumshirn
  2020-02-05 16:53   ` Christoph Hellwig
  2020-02-05 14:38 ` [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout Johannes Thumshirn
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-05 14:38 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, Josef Bacik, linux-btrfs @ vger . kernel . org,
	Johannes Thumshirn

Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
some drawbacks, like not being able to propagate errors from the lower
layers.

Directly use the page cache for reading the super-blocks from disk or
invalidating an on-disk super-block. We have to use the page-cache so to
avoid races between mkfs and udev. See also 6f60cbd3ae44 ("btrfs: access
superblock via pagecache in scan_one_device").

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

---
Changes to v3:
- Use read_cache_pages() and write_one_page() for IO (hch)
- Changed subject (David)
- Dropped Josef's R-b due to change

Changes to v2:
- open-code kunmap() + put_page() (David)
- fix double kunmap() (David)
- don't use bi_set_op_attrs() (David)

Changes to v1:
- move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
- switch to using pagecahce instead of alloc_pages() (Nikolay, David)
---
 fs/btrfs/disk-io.c | 78 +++++++++++++++++++++++++---------------------
 fs/btrfs/disk-io.h |  4 +--
 fs/btrfs/volumes.c | 57 +++++++++++++++++----------------
 fs/btrfs/volumes.h |  2 --
 4 files changed, 76 insertions(+), 65 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 28622de9e642..bc14ef1aadda 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2617,11 +2617,12 @@ int __cold open_ctree(struct super_block *sb,
 	u64 features;
 	u16 csum_type;
 	struct btrfs_key location;
-	struct buffer_head *bh;
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_root *tree_root;
 	struct btrfs_root *chunk_root;
+	struct page *super_page;
+	u8 *superblock;
 	int ret;
 	int err = -EINVAL;
 	int clear_free_space_tree = 0;
@@ -2815,28 +2816,33 @@ int __cold open_ctree(struct super_block *sb,
 	/*
 	 * Read super block and check the signature bytes only
 	 */
-	bh = btrfs_read_dev_super(fs_devices->latest_bdev);
-	if (IS_ERR(bh)) {
-		err = PTR_ERR(bh);
+	ret = btrfs_read_dev_super(fs_devices->latest_bdev, &super_page);
+	if (ret) {
+		err = ret;
 		goto fail_alloc;
 	}
 
+	superblock = kmap(super_page);
 	/*
 	 * Verify the type first, if that or the the checksum value are
 	 * corrupted, we'll find out
 	 */
-	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)bh->b_data);
+	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)
+					  superblock);
 	if (!btrfs_supported_super_csum(csum_type)) {
 		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
 			  csum_type);
 		err = -EINVAL;
-		brelse(bh);
+		kunmap(super_page);
+		put_page(super_page);
 		goto fail_alloc;
 	}
 
 	ret = btrfs_init_csum_hash(fs_info, csum_type);
 	if (ret) {
 		err = ret;
+		kunmap(super_page);
+		put_page(super_page);
 		goto fail_alloc;
 	}
 
@@ -2844,10 +2850,11 @@ int __cold open_ctree(struct super_block *sb,
 	 * We want to check superblock checksum, the type is stored inside.
 	 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
 	 */
-	if (btrfs_check_super_csum(fs_info, bh->b_data)) {
+	if (btrfs_check_super_csum(fs_info, superblock)) {
 		btrfs_err(fs_info, "superblock checksum mismatch");
 		err = -EINVAL;
-		brelse(bh);
+		kunmap(super_page);
+		put_page(super_page);
 		goto fail_csum;
 	}
 
@@ -2856,8 +2863,9 @@ int __cold open_ctree(struct super_block *sb,
 	 * following bytes up to INFO_SIZE, the checksum is calculated from
 	 * the whole block of INFO_SIZE
 	 */
-	memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
-	brelse(bh);
+	memcpy(fs_info->super_copy, superblock, sizeof(*fs_info->super_copy));
+	kunmap(super_page);
+	put_page(super_page);
 
 	disk_super = fs_info->super_copy;
 
@@ -3355,40 +3363,40 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
 }
 
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
-			struct buffer_head **bh_ret)
+			struct page **super_page)
 {
-	struct buffer_head *bh;
 	struct btrfs_super_block *super;
+	struct page *page;
 	u64 bytenr;
+	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	gfp_t gfp_mask;
 
 	bytenr = btrfs_sb_offset(copy_num);
 	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
 		return -EINVAL;
 
-	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
-	/*
-	 * If we fail to read from the underlying devices, as of now
-	 * the best option we have is to mark it EIO.
-	 */
-	if (!bh)
-		return -EIO;
+	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
+	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
+	if (IS_ERR_OR_NULL(page))
+		return -ENOMEM;
 
-	super = (struct btrfs_super_block *)bh->b_data;
+	super = kmap(page);
 	if (btrfs_super_bytenr(super) != bytenr ||
 		    btrfs_super_magic(super) != BTRFS_MAGIC) {
-		brelse(bh);
+		kunmap(page);
+		put_page(page);
 		return -EINVAL;
 	}
+	kunmap(page);
 
-	*bh_ret = bh;
+	*super_page = page;
 	return 0;
 }
 
 
-struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
+int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
 {
-	struct buffer_head *bh;
-	struct buffer_head *latest = NULL;
+	struct page *latest = NULL;
 	struct btrfs_super_block *super;
 	int i;
 	u64 transid = 0;
@@ -3400,25 +3408,25 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
 	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 	 */
 	for (i = 0; i < 1; i++) {
-		ret = btrfs_read_dev_one_super(bdev, i, &bh);
+		ret = btrfs_read_dev_one_super(bdev, i, page);
 		if (ret)
 			continue;
 
-		super = (struct btrfs_super_block *)bh->b_data;
+		super = kmap(*page);
 
 		if (!latest || btrfs_super_generation(super) > transid) {
-			brelse(latest);
-			latest = bh;
+			if (latest) {
+				kunmap(latest);
+				put_page(latest);
+			}
+			latest = *page;
 			transid = btrfs_super_generation(super);
-		} else {
-			brelse(bh);
 		}
-	}
 
-	if (!latest)
-		return ERR_PTR(ret);
+		kunmap(*page);
+	}
 
-	return latest;
+	return ret;
 }
 
 /*
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 8add2e14aab1..a89283ce8ca2 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -54,9 +54,9 @@ int __cold open_ctree(struct super_block *sb,
 	       char *options);
 void __cold close_ctree(struct btrfs_fs_info *fs_info);
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors);
-struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
+int btrfs_read_dev_super(struct block_device *bdev, struct page **super_page);
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
-			struct buffer_head **bh_ret);
+			     struct page **super_page);
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 					struct btrfs_key *key);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0b3167bd9f35..d381225aaff0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6,7 +6,6 @@
 #include <linux/sched.h>
 #include <linux/bio.h>
 #include <linux/slab.h>
-#include <linux/buffer_head.h>
 #include <linux/blkdev.h>
 #include <linux/ratelimit.h>
 #include <linux/kthread.h>
@@ -500,7 +499,7 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
 static int
 btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 		      int flush, struct block_device **bdev,
-		      struct buffer_head **bh)
+		      struct page **super_page)
 {
 	int ret;
 
@@ -519,9 +518,8 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 		goto error;
 	}
 	invalidate_bdev(*bdev);
-	*bh = btrfs_read_dev_super(*bdev);
-	if (IS_ERR(*bh)) {
-		ret = PTR_ERR(*bh);
+	ret = btrfs_read_dev_super(*bdev, super_page);
+	if (ret) {
 		blkdev_put(*bdev, flags);
 		goto error;
 	}
@@ -530,7 +528,6 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 
 error:
 	*bdev = NULL;
-	*bh = NULL;
 	return ret;
 }
 
@@ -611,7 +608,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 {
 	struct request_queue *q;
 	struct block_device *bdev;
-	struct buffer_head *bh;
+	struct page *super_page;
 	struct btrfs_super_block *disk_super;
 	u64 devid;
 	int ret;
@@ -622,17 +619,17 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 		return -EINVAL;
 
 	ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
-				    &bdev, &bh);
+				    &bdev, &super_page);
 	if (ret)
 		return ret;
 
-	disk_super = (struct btrfs_super_block *)bh->b_data;
+	disk_super = kmap(super_page);
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	if (devid != device->devid)
-		goto error_brelse;
+		goto error_free_page;
 
 	if (memcmp(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE))
-		goto error_brelse;
+		goto error_free_page;
 
 	device->generation = btrfs_super_generation(disk_super);
 
@@ -641,7 +638,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 		    BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
 			pr_err(
 		"BTRFS: Invalid seeding and uuid-changed device detected\n");
-			goto error_brelse;
+			goto error_free_page;
 		}
 
 		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
@@ -667,12 +664,14 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 		fs_devices->rw_devices++;
 		list_add_tail(&device->dev_alloc_list, &fs_devices->alloc_list);
 	}
-	brelse(bh);
+	kunmap(super_page);
+	put_page(super_page);
 
 	return 0;
 
-error_brelse:
-	brelse(bh);
+error_free_page:
+	kunmap(super_page);
+	put_page(super_page);
 	blkdev_put(bdev, flags);
 
 	return -EINVAL;
@@ -2209,14 +2208,15 @@ static struct btrfs_device *btrfs_find_device_by_path(
 	u64 devid;
 	u8 *dev_uuid;
 	struct block_device *bdev;
-	struct buffer_head *bh;
+	struct page *super_page;
 	struct btrfs_device *device;
 
 	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
-				    fs_info->bdev_holder, 0, &bdev, &bh);
+				    fs_info->bdev_holder, 0, &bdev,
+				    &super_page);
 	if (ret)
 		return ERR_PTR(ret);
-	disk_super = (struct btrfs_super_block *)bh->b_data;
+	disk_super = kmap(super_page);
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	dev_uuid = disk_super->dev_item.uuid;
 	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
@@ -2226,7 +2226,8 @@ static struct btrfs_device *btrfs_find_device_by_path(
 		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
 					   disk_super->fsid, true);
 
-	brelse(bh);
+	kunmap(super_page);
+	put_page(super_page);
 	if (!device)
 		device = ERR_PTR(-ENOENT);
 	blkdev_put(bdev, FMODE_READ);
@@ -7319,7 +7320,6 @@ int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
 
 void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path)
 {
-	struct buffer_head *bh;
 	struct btrfs_super_block *disk_super;
 	int copy_num;
 
@@ -7328,16 +7328,21 @@ void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_pat
 
 	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX;
 		copy_num++) {
+		u64 bytenr = btrfs_sb_offset(copy_num);
+		struct page *page;
 
-		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
+		if (btrfs_read_dev_one_super(bdev, copy_num, &page))
 			continue;
 
-		disk_super = (struct btrfs_super_block *)bh->b_data;
-
+		disk_super = kmap(page) + offset_in_page(bytenr);
 		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
-		set_buffer_dirty(bh);
-		sync_dirty_buffer(bh);
-		brelse(bh);
+		kunmap(page);
+
+		set_page_dirty(page);
+		lock_page(page); /* write_on_page() unlocks the page */
+		write_one_page(page);
+		put_page(page);
+
 	}
 
 	/* Notify udev that device has changed */
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 690d4f5a0653..3b8eb2a14960 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -17,8 +17,6 @@ extern struct mutex uuid_mutex;
 
 #define BTRFS_STRIPE_LEN	SZ_64K
 
-struct buffer_head;
-
 struct btrfs_io_geometry {
 	/* remaining bytes before crossing a stripe */
 	u64 len;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout
  2020-02-05 14:38 [PATCH v4 0/5] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
  2020-02-05 14:38 ` [PATCH v4 1/5] btrfs: use the page-cache for super block reading Johannes Thumshirn
@ 2020-02-05 14:38 ` Johannes Thumshirn
  2020-02-05 18:16   ` Christoph Hellwig
  2020-02-05 14:38 ` [PATCH v4 3/5] btrfs: remove btrfsic_submit_bh() Johannes Thumshirn
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-05 14:38 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, Josef Bacik, linux-btrfs @ vger . kernel . org,
	Johannes Thumshirn

Similar to the superblock read path, change the write path to using BIOs
and pages instead of buffer_heads. This allows us to skip over the
buffer_head code, for writing the superblock to disk.

This is based on a patch originally authored by Nikolay Borisov.

Co-developed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>

---
Changes to v2:
- Don't use bi_set_op_attrs() (David)

Changes to v1:
- Remove left-over buffer_head.h include (David)
---
 fs/btrfs/disk-io.c | 117 +++++++++++++++++++++++++++------------------
 1 file changed, 70 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bc14ef1aadda..f5343a35ac2f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -7,7 +7,6 @@
 #include <linux/blkdev.h>
 #include <linux/radix-tree.h>
 #include <linux/writeback.h>
-#include <linux/buffer_head.h>
 #include <linux/workqueue.h>
 #include <linux/kthread.h>
 #include <linux/slab.h>
@@ -3341,25 +3340,33 @@ int __cold open_ctree(struct super_block *sb,
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 
-static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
+static void btrfs_end_super_write(struct bio *bio)
 {
-	if (uptodate) {
-		set_buffer_uptodate(bh);
-	} else {
-		struct btrfs_device *device = (struct btrfs_device *)
-			bh->b_private;
-
-		btrfs_warn_rl_in_rcu(device->fs_info,
-				"lost page write due to IO error on %s",
-					  rcu_str_deref(device->name));
-		/* note, we don't set_buffer_write_io_error because we have
-		 * our own ways of dealing with the IO errors
-		 */
-		clear_buffer_uptodate(bh);
-		btrfs_dev_stat_inc_and_print(device, BTRFS_DEV_STAT_WRITE_ERRS);
+	struct btrfs_device *device = bio->bi_private;
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+	struct page *page;
+
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		page = bvec->bv_page;
+
+		if (blk_status_to_errno(bio->bi_status)) {
+			btrfs_warn_rl_in_rcu(device->fs_info,
+					     "lost page write due to IO error on %s",
+					     rcu_str_deref(device->name));
+			ClearPageUptodate(page);
+			SetPageError(page);
+			btrfs_dev_stat_inc_and_print(device,
+						     BTRFS_DEV_STAT_WRITE_ERRS);
+		} else {
+			SetPageUptodate(page);
+		}
+
+		put_page(page);
+		unlock_page(page);
 	}
-	unlock_buffer(bh);
-	put_bh(bh);
+
+	bio_put(bio);
 }
 
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
@@ -3437,16 +3444,16 @@ int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
  * the expected device size at commit time. Note that max_mirrors must be
  * same for write and wait phases.
  *
- * Return number of errors when buffer head is not found or submission fails.
+ * Return number of errors when page is not found or submission fails.
  */
 static int write_dev_supers(struct btrfs_device *device,
 			    struct btrfs_super_block *sb, int max_mirrors)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
+	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
-	struct buffer_head *bh;
+	gfp_t gfp_mask;
 	int i;
-	int ret;
 	int errors = 0;
 	u64 bytenr;
 	int op_flags;
@@ -3456,7 +3463,13 @@ static int write_dev_supers(struct btrfs_device *device,
 
 	shash->tfm = fs_info->csum_shash;
 
+	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
+
 	for (i = 0; i < max_mirrors; i++) {
+		struct page *page;
+		struct bio *bio;
+		u8 *ptr;
+
 		bytenr = btrfs_sb_offset(i);
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
 		    device->commit_total_bytes)
@@ -3469,26 +3482,22 @@ static int write_dev_supers(struct btrfs_device *device,
 				    BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
 		crypto_shash_final(shash, sb->csum);
 
-		/* One reference for us, and we leave it for the caller */
-		bh = __getblk(device->bdev, bytenr / BTRFS_BDEV_BLOCKSIZE,
-			      BTRFS_SUPER_INFO_SIZE);
-		if (!bh) {
+		page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT,
+					   gfp_mask);
+		if (!page) {
 			btrfs_err(device->fs_info,
-			    "couldn't get super buffer head for bytenr %llu",
+			    "couldn't get superblock page for bytenr %llu",
 			    bytenr);
 			errors++;
 			continue;
 		}
 
-		memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);
+		/* Bump the refcount for wait_dev_supers() */
+		get_page(page);
 
-		/* one reference for submit_bh */
-		get_bh(bh);
-
-		set_buffer_uptodate(bh);
-		lock_buffer(bh);
-		bh->b_end_io = btrfs_end_buffer_write_sync;
-		bh->b_private = device;
+		ptr = kmap(page);
+		memcpy(ptr, sb, BTRFS_SUPER_INFO_SIZE);
+		kunmap(page);
 
 		/*
 		 * we fua the first super.  The others we allow
@@ -3497,9 +3506,23 @@ static int write_dev_supers(struct btrfs_device *device,
 		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
 		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
 			op_flags |= REQ_FUA;
-		ret = btrfsic_submit_bh(REQ_OP_WRITE, op_flags, bh);
-		if (ret)
-			errors++;
+
+		/*
+		 * Directly use BIOs here instead of relying on the page-cache
+		 * to do I/O, so we don't loose the ability to do integrity
+		 * checking.
+		 */
+		bio = bio_alloc(gfp_mask, 1);
+		bio_set_dev(bio, device->bdev);
+		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
+		bio->bi_private = device;
+		bio->bi_end_io = btrfs_end_super_write;
+		bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
+			     offset_in_page(bytenr));
+
+		bio->bi_opf = REQ_OP_WRITE | op_flags;
+
+		btrfsic_submit_bio(bio);
 	}
 	return errors < i ? 0 : -1;
 }
@@ -3508,12 +3531,11 @@ static int write_dev_supers(struct btrfs_device *device,
  * Wait for write completion of superblocks done by write_dev_supers,
  * @max_mirrors same for write and wait phases.
  *
- * Return number of errors when buffer head is not found or not marked up to
+ * Return number of errors when page is not found or not marked up to
  * date.
  */
 static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 {
-	struct buffer_head *bh;
 	int i;
 	int errors = 0;
 	bool primary_failed = false;
@@ -3523,32 +3545,33 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		max_mirrors = BTRFS_SUPER_MIRROR_MAX;
 
 	for (i = 0; i < max_mirrors; i++) {
+		struct page *page;
+
 		bytenr = btrfs_sb_offset(i);
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
 		    device->commit_total_bytes)
 			break;
 
-		bh = __find_get_block(device->bdev,
-				      bytenr / BTRFS_BDEV_BLOCKSIZE,
-				      BTRFS_SUPER_INFO_SIZE);
-		if (!bh) {
+		page = find_get_page(device->bdev->bd_inode->i_mapping,
+				     bytenr >> PAGE_SHIFT);
+		if (!page) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 			continue;
 		}
-		wait_on_buffer(bh);
-		if (!buffer_uptodate(bh)) {
+		wait_on_page_locked(page);
+		if (PageError(page)) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 		}
 
 		/* drop our reference */
-		brelse(bh);
+		put_page(page);
 
 		/* drop the reference from the writing run */
-		brelse(bh);
+		put_page(page);
 	}
 
 	/* log error, force error return */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 3/5] btrfs: remove btrfsic_submit_bh()
  2020-02-05 14:38 [PATCH v4 0/5] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
  2020-02-05 14:38 ` [PATCH v4 1/5] btrfs: use the page-cache for super block reading Johannes Thumshirn
  2020-02-05 14:38 ` [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout Johannes Thumshirn
@ 2020-02-05 14:38 ` Johannes Thumshirn
  2020-02-05 14:38 ` [PATCH v4 4/5] btrfs: remove buffer_heads from btrfsic_process_written_block() Johannes Thumshirn
  2020-02-05 14:38 ` [PATCH v4 5/5] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
  4 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-05 14:38 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, Josef Bacik, linux-btrfs @ vger . kernel . org,
	Johannes Thumshirn

Now that the last use of btrfsic_submit_bh() is gone, remove the function
as well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/check-integrity.c | 57 --------------------------------------
 fs/btrfs/check-integrity.h |  2 --
 2 files changed, 59 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index a0ce69f2d27c..e7507985435e 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2730,63 +2730,6 @@ static struct btrfsic_dev_state *btrfsic_dev_state_lookup(dev_t dev)
 						  &btrfsic_dev_state_hashtable);
 }
 
-int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
-{
-	struct btrfsic_dev_state *dev_state;
-
-	if (!btrfsic_is_initialized)
-		return submit_bh(op, op_flags, bh);
-
-	mutex_lock(&btrfsic_mutex);
-	/* since btrfsic_submit_bh() might also be called before
-	 * btrfsic_mount(), this might return NULL */
-	dev_state = btrfsic_dev_state_lookup(bh->b_bdev->bd_dev);
-
-	/* Only called to write the superblock (incl. FLUSH/FUA) */
-	if (NULL != dev_state &&
-	    (op == REQ_OP_WRITE) && bh->b_size > 0) {
-		u64 dev_bytenr;
-
-		dev_bytenr = BTRFS_BDEV_BLOCKSIZE * bh->b_blocknr;
-		if (dev_state->state->print_mask &
-		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
-			pr_info("submit_bh(op=0x%x,0x%x, blocknr=%llu (bytenr %llu), size=%zu, data=%p, bdev=%p)\n",
-			       op, op_flags, (unsigned long long)bh->b_blocknr,
-			       dev_bytenr, bh->b_size, bh->b_data, bh->b_bdev);
-		btrfsic_process_written_block(dev_state, dev_bytenr,
-					      &bh->b_data, 1, NULL,
-					      NULL, bh, op_flags);
-	} else if (NULL != dev_state && (op_flags & REQ_PREFLUSH)) {
-		if (dev_state->state->print_mask &
-		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
-			pr_info("submit_bh(op=0x%x,0x%x FLUSH, bdev=%p)\n",
-			       op, op_flags, bh->b_bdev);
-		if (!dev_state->dummy_block_for_bio_bh_flush.is_iodone) {
-			if ((dev_state->state->print_mask &
-			     (BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH |
-			      BTRFSIC_PRINT_MASK_VERBOSE)))
-				pr_info("btrfsic_submit_bh(%s) with FLUSH but dummy block already in use (ignored)!\n",
-				       dev_state->name);
-		} else {
-			struct btrfsic_block *const block =
-				&dev_state->dummy_block_for_bio_bh_flush;
-
-			block->is_iodone = 0;
-			block->never_written = 0;
-			block->iodone_w_error = 0;
-			block->flush_gen = dev_state->last_flush_gen + 1;
-			block->submit_bio_bh_rw = op_flags;
-			block->orig_bio_bh_private = bh->b_private;
-			block->orig_bio_bh_end_io.bh = bh->b_end_io;
-			block->next_in_same_bio = NULL;
-			bh->b_private = block;
-			bh->b_end_io = btrfsic_bh_end_io;
-		}
-	}
-	mutex_unlock(&btrfsic_mutex);
-	return submit_bh(op, op_flags, bh);
-}
-
 static void __btrfsic_submit_bio(struct bio *bio)
 {
 	struct btrfsic_dev_state *dev_state;
diff --git a/fs/btrfs/check-integrity.h b/fs/btrfs/check-integrity.h
index 9bf4359cc44c..bcc730a06cb5 100644
--- a/fs/btrfs/check-integrity.h
+++ b/fs/btrfs/check-integrity.h
@@ -7,11 +7,9 @@
 #define BTRFS_CHECK_INTEGRITY_H
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
-int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh);
 void btrfsic_submit_bio(struct bio *bio);
 int btrfsic_submit_bio_wait(struct bio *bio);
 #else
-#define btrfsic_submit_bh submit_bh
 #define btrfsic_submit_bio submit_bio
 #define btrfsic_submit_bio_wait submit_bio_wait
 #endif
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 4/5] btrfs: remove buffer_heads from btrfsic_process_written_block()
  2020-02-05 14:38 [PATCH v4 0/5] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2020-02-05 14:38 ` [PATCH v4 3/5] btrfs: remove btrfsic_submit_bh() Johannes Thumshirn
@ 2020-02-05 14:38 ` Johannes Thumshirn
  2020-02-05 14:38 ` [PATCH v4 5/5] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
  4 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-05 14:38 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, Josef Bacik, linux-btrfs @ vger . kernel . org,
	Johannes Thumshirn

Now that the last caller of btrfsic_process_written_block() with
buffer_heads is gone, remove the buffer_head processing path from it as
well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/check-integrity.c | 103 +++++++++----------------------------
 1 file changed, 25 insertions(+), 78 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index e7507985435e..4f6db2fe482a 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -152,11 +152,8 @@ struct btrfsic_block {
 	struct list_head ref_to_list;	/* list */
 	struct list_head ref_from_list;	/* list */
 	struct btrfsic_block *next_in_same_bio;
-	void *orig_bio_bh_private;
-	union {
-		bio_end_io_t *bio;
-		bh_end_io_t *bh;
-	} orig_bio_bh_end_io;
+	void *orig_bio_private;
+	bio_end_io_t *orig_bio_end_io;
 	int submit_bio_bh_rw;
 	u64 flush_gen; /* only valid if !never_written */
 };
@@ -325,14 +322,12 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 					  u64 dev_bytenr, char **mapped_datav,
 					  unsigned int num_pages,
 					  struct bio *bio, int *bio_is_patched,
-					  struct buffer_head *bh,
 					  int submit_bio_bh_rw);
 static int btrfsic_process_written_superblock(
 		struct btrfsic_state *state,
 		struct btrfsic_block *const block,
 		struct btrfs_super_block *const super_hdr);
 static void btrfsic_bio_end_io(struct bio *bp);
-static void btrfsic_bh_end_io(struct buffer_head *bh, int uptodate);
 static int btrfsic_is_block_ref_by_superblock(const struct btrfsic_state *state,
 					      const struct btrfsic_block *block,
 					      int recursion_level);
@@ -399,8 +394,8 @@ static void btrfsic_block_init(struct btrfsic_block *b)
 	b->never_written = 0;
 	b->mirror_num = 0;
 	b->next_in_same_bio = NULL;
-	b->orig_bio_bh_private = NULL;
-	b->orig_bio_bh_end_io.bio = NULL;
+	b->orig_bio_private = NULL;
+	b->orig_bio_end_io = NULL;
 	INIT_LIST_HEAD(&b->collision_resolving_node);
 	INIT_LIST_HEAD(&b->all_blocks_node);
 	INIT_LIST_HEAD(&b->ref_to_list);
@@ -1743,7 +1738,6 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 					  u64 dev_bytenr, char **mapped_datav,
 					  unsigned int num_pages,
 					  struct bio *bio, int *bio_is_patched,
-					  struct buffer_head *bh,
 					  int submit_bio_bh_rw)
 {
 	int is_metadata;
@@ -1902,9 +1896,9 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 				block->is_iodone = 0;
 				BUG_ON(NULL == bio_is_patched);
 				if (!*bio_is_patched) {
-					block->orig_bio_bh_private =
+					block->orig_bio_private =
 					    bio->bi_private;
-					block->orig_bio_bh_end_io.bio =
+					block->orig_bio_end_io =
 					    bio->bi_end_io;
 					block->next_in_same_bio = NULL;
 					bio->bi_private = block;
@@ -1916,25 +1910,17 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 					    bio->bi_private;
 
 					BUG_ON(NULL == chained_block);
-					block->orig_bio_bh_private =
-					    chained_block->orig_bio_bh_private;
-					block->orig_bio_bh_end_io.bio =
-					    chained_block->orig_bio_bh_end_io.
-					    bio;
+					block->orig_bio_private =
+					    chained_block->orig_bio_private;
+					block->orig_bio_end_io =
+					    chained_block->orig_bio_end_io;
 					block->next_in_same_bio = chained_block;
 					bio->bi_private = block;
 				}
-			} else if (NULL != bh) {
-				block->is_iodone = 0;
-				block->orig_bio_bh_private = bh->b_private;
-				block->orig_bio_bh_end_io.bh = bh->b_end_io;
-				block->next_in_same_bio = NULL;
-				bh->b_private = block;
-				bh->b_end_io = btrfsic_bh_end_io;
 			} else {
 				block->is_iodone = 1;
-				block->orig_bio_bh_private = NULL;
-				block->orig_bio_bh_end_io.bio = NULL;
+				block->orig_bio_private = NULL;
+				block->orig_bio_end_io = NULL;
 				block->next_in_same_bio = NULL;
 			}
 		}
@@ -2042,8 +2028,8 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 			block->is_iodone = 0;
 			BUG_ON(NULL == bio_is_patched);
 			if (!*bio_is_patched) {
-				block->orig_bio_bh_private = bio->bi_private;
-				block->orig_bio_bh_end_io.bio = bio->bi_end_io;
+				block->orig_bio_private = bio->bi_private;
+				block->orig_bio_end_io = bio->bi_end_io;
 				block->next_in_same_bio = NULL;
 				bio->bi_private = block;
 				bio->bi_end_io = btrfsic_bio_end_io;
@@ -2054,24 +2040,17 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 				    bio->bi_private;
 
 				BUG_ON(NULL == chained_block);
-				block->orig_bio_bh_private =
-				    chained_block->orig_bio_bh_private;
-				block->orig_bio_bh_end_io.bio =
-				    chained_block->orig_bio_bh_end_io.bio;
+				block->orig_bio_private =
+				    chained_block->orig_bio_private;
+				block->orig_bio_end_io =
+				    chained_block->orig_bio_end_io;
 				block->next_in_same_bio = chained_block;
 				bio->bi_private = block;
 			}
-		} else if (NULL != bh) {
-			block->is_iodone = 0;
-			block->orig_bio_bh_private = bh->b_private;
-			block->orig_bio_bh_end_io.bh = bh->b_end_io;
-			block->next_in_same_bio = NULL;
-			bh->b_private = block;
-			bh->b_end_io = btrfsic_bh_end_io;
 		} else {
 			block->is_iodone = 1;
-			block->orig_bio_bh_private = NULL;
-			block->orig_bio_bh_end_io.bio = NULL;
+			block->orig_bio_private = NULL;
+			block->orig_bio_end_io = NULL;
 			block->next_in_same_bio = NULL;
 		}
 		if (state->print_mask & BTRFSIC_PRINT_MASK_VERBOSE)
@@ -2112,8 +2091,8 @@ static void btrfsic_bio_end_io(struct bio *bp)
 		iodone_w_error = 1;
 
 	BUG_ON(NULL == block);
-	bp->bi_private = block->orig_bio_bh_private;
-	bp->bi_end_io = block->orig_bio_bh_end_io.bio;
+	bp->bi_private = block->orig_bio_private;
+	bp->bi_end_io = block->orig_bio_end_io;
 
 	do {
 		struct btrfsic_block *next_block;
@@ -2146,38 +2125,6 @@ static void btrfsic_bio_end_io(struct bio *bp)
 	bp->bi_end_io(bp);
 }
 
-static void btrfsic_bh_end_io(struct buffer_head *bh, int uptodate)
-{
-	struct btrfsic_block *block = (struct btrfsic_block *)bh->b_private;
-	int iodone_w_error = !uptodate;
-	struct btrfsic_dev_state *dev_state;
-
-	BUG_ON(NULL == block);
-	dev_state = block->dev_state;
-	if ((dev_state->state->print_mask & BTRFSIC_PRINT_MASK_END_IO_BIO_BH))
-		pr_info("bh_end_io(error=%d) for %c @%llu (%s/%llu/%d)\n",
-		       iodone_w_error,
-		       btrfsic_get_block_type(dev_state->state, block),
-		       block->logical_bytenr, block->dev_state->name,
-		       block->dev_bytenr, block->mirror_num);
-
-	block->iodone_w_error = iodone_w_error;
-	if (block->submit_bio_bh_rw & REQ_PREFLUSH) {
-		dev_state->last_flush_gen++;
-		if ((dev_state->state->print_mask &
-		     BTRFSIC_PRINT_MASK_END_IO_BIO_BH))
-			pr_info("bh_end_io() new %s flush_gen=%llu\n",
-			       dev_state->name, dev_state->last_flush_gen);
-	}
-	if (block->submit_bio_bh_rw & REQ_FUA)
-		block->flush_gen = 0; /* FUA completed means block is on disk */
-
-	bh->b_private = block->orig_bio_bh_private;
-	bh->b_end_io = block->orig_bio_bh_end_io.bh;
-	block->is_iodone = 1; /* for FLUSH, this releases the block */
-	bh->b_end_io(bh, uptodate);
-}
-
 static int btrfsic_process_written_superblock(
 		struct btrfsic_state *state,
 		struct btrfsic_block *const superblock,
@@ -2781,7 +2728,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 		btrfsic_process_written_block(dev_state, dev_bytenr,
 					      mapped_datav, segs,
 					      bio, &bio_is_patched,
-					      NULL, bio->bi_opf);
+					      bio->bi_opf);
 		bio_for_each_segment(bvec, bio, iter)
 			kunmap(bvec.bv_page);
 		kfree(mapped_datav);
@@ -2805,8 +2752,8 @@ static void __btrfsic_submit_bio(struct bio *bio)
 			block->iodone_w_error = 0;
 			block->flush_gen = dev_state->last_flush_gen + 1;
 			block->submit_bio_bh_rw = bio->bi_opf;
-			block->orig_bio_bh_private = bio->bi_private;
-			block->orig_bio_bh_end_io.bio = bio->bi_end_io;
+			block->orig_bio_private = bio->bi_private;
+			block->orig_bio_end_io = bio->bi_end_io;
 			block->next_in_same_bio = NULL;
 			bio->bi_private = block;
 			bio->bi_end_io = btrfsic_bio_end_io;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 5/5] btrfs: remove buffer_heads form superblock mirror integrity checking
  2020-02-05 14:38 [PATCH v4 0/5] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2020-02-05 14:38 ` [PATCH v4 4/5] btrfs: remove buffer_heads from btrfsic_process_written_block() Johannes Thumshirn
@ 2020-02-05 14:38 ` Johannes Thumshirn
  4 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-05 14:38 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, Josef Bacik, linux-btrfs @ vger . kernel . org,
	Johannes Thumshirn

The integrity checking code for the superblock mirrors is the last remaining
user of buffer_heads in BTRFS, change it to using plain BIOs as well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>

---
Changes to v2:
- Open-code kunmap() + put_page() (David)
- Remove __GFP_NOFAIL from allocation (Josef)
- Merge error paths (David)

Changes to v1:
- Convert from alloc_page() to find_or_create_page()
---
 fs/btrfs/check-integrity.c | 58 +++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 4f6db2fe482a..d6ab6d3ca413 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -77,7 +77,6 @@
 
 #include <linux/sched.h>
 #include <linux/slab.h>
-#include <linux/buffer_head.h>
 #include <linux/mutex.h>
 #include <linux/genhd.h>
 #include <linux/blkdev.h>
@@ -762,29 +761,48 @@ static int btrfsic_process_superblock_dev_mirror(
 	struct btrfs_fs_info *fs_info = state->fs_info;
 	struct btrfs_super_block *super_tmp;
 	u64 dev_bytenr;
-	struct buffer_head *bh;
 	struct btrfsic_block *superblock_tmp;
 	int pass;
 	struct block_device *const superblock_bdev = device->bdev;
+	struct page *page;
+	struct bio bio;
+	struct bio_vec bio_vec;
+	struct address_space *mapping = superblock_bdev->bd_inode->i_mapping;
+	gfp_t gfp_mask;
+	int ret;
 
 	/* super block bytenr is always the unmapped device bytenr */
 	dev_bytenr = btrfs_sb_offset(superblock_mirror_num);
 	if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->commit_total_bytes)
 		return -1;
-	bh = __bread(superblock_bdev, dev_bytenr / BTRFS_BDEV_BLOCKSIZE,
-		     BTRFS_SUPER_INFO_SIZE);
-	if (NULL == bh)
+
+	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS);
+
+	page = find_or_create_page(mapping, dev_bytenr >> PAGE_SHIFT, gfp_mask);
+	if (!page)
+		return -1;
+
+	bio_init(&bio, &bio_vec, 1);
+	bio.bi_iter.bi_sector = dev_bytenr >> SECTOR_SHIFT;
+	bio_set_dev(&bio, superblock_bdev);
+	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
+	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
+
+	ret = submit_bio_wait(&bio);
+	if (ret)
 		return -1;
-	super_tmp = (struct btrfs_super_block *)
-	    (bh->b_data + (dev_bytenr & (BTRFS_BDEV_BLOCKSIZE - 1)));
+
+	unlock_page(page);
+
+	super_tmp = kmap(page);
 
 	if (btrfs_super_bytenr(super_tmp) != dev_bytenr ||
 	    btrfs_super_magic(super_tmp) != BTRFS_MAGIC ||
 	    memcmp(device->uuid, super_tmp->dev_item.uuid, BTRFS_UUID_SIZE) ||
 	    btrfs_super_nodesize(super_tmp) != state->metablock_size ||
 	    btrfs_super_sectorsize(super_tmp) != state->datablock_size) {
-		brelse(bh);
-		return 0;
+		ret = 0;
+		goto out_unmap;
 	}
 
 	superblock_tmp =
@@ -795,8 +813,8 @@ static int btrfsic_process_superblock_dev_mirror(
 		superblock_tmp = btrfsic_block_alloc();
 		if (NULL == superblock_tmp) {
 			pr_info("btrfsic: error, kmalloc failed!\n");
-			brelse(bh);
-			return -1;
+			ret = -1;
+			goto out_unmap;
 		}
 		/* for superblock, only the dev_bytenr makes sense */
 		superblock_tmp->dev_bytenr = dev_bytenr;
@@ -880,8 +898,8 @@ static int btrfsic_process_superblock_dev_mirror(
 					      mirror_num)) {
 				pr_info("btrfsic: btrfsic_map_block(bytenr @%llu, mirror %d) failed!\n",
 				       next_bytenr, mirror_num);
-				brelse(bh);
-				return -1;
+				ret = -1;
+				goto out_unmap;
 			}
 
 			next_block = btrfsic_block_lookup_or_add(
@@ -890,8 +908,8 @@ static int btrfsic_process_superblock_dev_mirror(
 					mirror_num, NULL);
 			if (NULL == next_block) {
 				btrfsic_release_block_ctx(&tmp_next_block_ctx);
-				brelse(bh);
-				return -1;
+				ret = -1;
+				goto out_unmap;
 			}
 
 			next_block->disk_key = tmp_disk_key;
@@ -902,16 +920,18 @@ static int btrfsic_process_superblock_dev_mirror(
 					BTRFSIC_GENERATION_UNKNOWN);
 			btrfsic_release_block_ctx(&tmp_next_block_ctx);
 			if (NULL == l) {
-				brelse(bh);
-				return -1;
+				ret = -1;
+				goto out_unmap;
 			}
 		}
 	}
 	if (state->print_mask & BTRFSIC_PRINT_MASK_INITIAL_ALL_TREES)
 		btrfsic_dump_tree_sub(state, superblock_tmp, 0);
 
-	brelse(bh);
-	return 0;
+out_unmap:
+	kunmap(page);
+	put_page(page);
+	return ret;
 }
 
 static struct btrfsic_stack_frame *btrfsic_stack_frame_alloc(void)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-05 14:38 ` [PATCH v4 1/5] btrfs: use the page-cache for super block reading Johannes Thumshirn
@ 2020-02-05 16:53   ` Christoph Hellwig
  2020-02-06  8:17     ` Johannes Thumshirn
  2020-02-06  9:29     ` Johannes Thumshirn
  0 siblings, 2 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-05 16:53 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On Wed, Feb 05, 2020 at 11:38:27PM +0900, Johannes Thumshirn wrote:
> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
> some drawbacks, like not being able to propagate errors from the lower
> layers.
> 
> Directly use the page cache for reading the super-blocks from disk or
> invalidating an on-disk super-block. We have to use the page-cache so to
> avoid races between mkfs and udev. See also 6f60cbd3ae44 ("btrfs: access
> superblock via pagecache in scan_one_device").
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> ---
> Changes to v3:
> - Use read_cache_pages() and write_one_page() for IO (hch)
> - Changed subject (David)
> - Dropped Josef's R-b due to change
> 
> Changes to v2:
> - open-code kunmap() + put_page() (David)
> - fix double kunmap() (David)
> - don't use bi_set_op_attrs() (David)
> 
> Changes to v1:
> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
> - switch to using pagecahce instead of alloc_pages() (Nikolay, David)
> ---
>  fs/btrfs/disk-io.c | 78 +++++++++++++++++++++++++---------------------
>  fs/btrfs/disk-io.h |  4 +--
>  fs/btrfs/volumes.c | 57 +++++++++++++++++----------------
>  fs/btrfs/volumes.h |  2 --
>  4 files changed, 76 insertions(+), 65 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 28622de9e642..bc14ef1aadda 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2617,11 +2617,12 @@ int __cold open_ctree(struct super_block *sb,
>  	u64 features;
>  	u16 csum_type;
>  	struct btrfs_key location;
> -	struct buffer_head *bh;
>  	struct btrfs_super_block *disk_super;
>  	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>  	struct btrfs_root *tree_root;
>  	struct btrfs_root *chunk_root;
> +	struct page *super_page;
> +	u8 *superblock;

I thought you agree to turn this into a struct btrfs_super_block
pointer?

>  	bytenr = btrfs_sb_offset(copy_num);
>  	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>  		return -EINVAL;
>  
> -	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
> -	/*
> -	 * If we fail to read from the underlying devices, as of now
> -	 * the best option we have is to mark it EIO.
> -	 */
> -	if (!bh)
> -		return -EIO;
> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> +	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
> +	if (IS_ERR_OR_NULL(page))
> +		return -ENOMEM;

Why do you need the __GFP_NOFAIL given that failures are handled
properly here?  Also I think instead of using mapping_gfp_constraint you
can use GFP_NOFS directly here.

>  
> -	super = (struct btrfs_super_block *)bh->b_data;
> +	super = kmap(page);
>  	if (btrfs_super_bytenr(super) != bytenr ||
>  		    btrfs_super_magic(super) != BTRFS_MAGIC) {
> -		brelse(bh);
> +		kunmap(page);
> +		put_page(page);
>  		return -EINVAL;
>  	}
> +	kunmap(page);

Also last time I wondered why we can't leave the page mapped for the
caller and also return the virtual address?  That would keep the
callers a little cleaner.  Note that you don't need to pass the
struct page in that case as the unmap helper can use kmap_to_page (and
I think a helper would be really nice for the unmap and put anyway).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout
  2020-02-05 14:38 ` [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout Johannes Thumshirn
@ 2020-02-05 18:16   ` Christoph Hellwig
  2020-02-06  8:20     ` Johannes Thumshirn
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-05 18:16 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On Wed, Feb 05, 2020 at 11:38:28PM +0900, Johannes Thumshirn wrote:
> +static void btrfs_end_super_write(struct bio *bio)
>  {
> +	struct btrfs_device *device = bio->bi_private;
> +	struct bio_vec *bvec;
> +	struct bvec_iter_all iter_all;
> +	struct page *page;
> +
> +	bio_for_each_segment_all(bvec, bio, iter_all) {
> +		page = bvec->bv_page;
> +
> +		if (blk_status_to_errno(bio->bi_status)) {

Nit: this could simply check bio->bi_status without a conversion.

> +			btrfs_warn_rl_in_rcu(device->fs_info,
> +					     "lost page write due to IO error on %s",
> +					     rcu_str_deref(device->name));

But maybe you want to print the error here?

> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;

Same comment on the ask as in the previous patch.

> +		u8 *ptr;

I'd use a typed pointer here again..

> +		ptr = kmap(page);
> +		memcpy(ptr, sb, BTRFS_SUPER_INFO_SIZE);

With which you could do a struct assignment here and very slightly
improve type safety.

> @@ -3497,9 +3506,23 @@ static int write_dev_supers(struct btrfs_device *device,
>  		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
>  		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
>  			op_flags |= REQ_FUA;

Question on the existing code:  why is it safe to not use FUA for the
subsequent superblocks?

> +
> +		/*
> +		 * Directly use BIOs here instead of relying on the page-cache
> +		 * to do I/O, so we don't loose the ability to do integrity
> +		 * checking.
> +		 */
> +		bio = bio_alloc(gfp_mask, 1);
> +		bio_set_dev(bio, device->bdev);
> +		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +		bio->bi_private = device;
> +		bio->bi_end_io = btrfs_end_super_write;
> +		bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
> +			     offset_in_page(bytenr));

Missing return value check.  But given that it is a single page and
can't error out please switch to __bio_add_page here.

> +		bio->bi_opf = REQ_OP_WRITE | op_flags;

You could kill the op_flags variable and just assign everything directly
to bio->bi_opf.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-05 16:53   ` Christoph Hellwig
@ 2020-02-06  8:17     ` Johannes Thumshirn
  2020-02-06 14:57       ` Christoph Hellwig
  2020-02-06  9:29     ` Johannes Thumshirn
  1 sibling, 1 reply; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-06  8:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On 05/02/2020 17:53, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 11:38:27PM +0900, Johannes Thumshirn wrote:
>> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
>> some drawbacks, like not being able to propagate errors from the lower
>> layers.
>>
>> Directly use the page cache for reading the super-blocks from disk or
>> invalidating an on-disk super-block. We have to use the page-cache so to
>> avoid races between mkfs and udev. See also 6f60cbd3ae44 ("btrfs: access
>> superblock via pagecache in scan_one_device").
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> ---
>> Changes to v3:
>> - Use read_cache_pages() and write_one_page() for IO (hch)
>> - Changed subject (David)
>> - Dropped Josef's R-b due to change
>>
>> Changes to v2:
>> - open-code kunmap() + put_page() (David)
>> - fix double kunmap() (David)
>> - don't use bi_set_op_attrs() (David)
>>
>> Changes to v1:
>> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
>> - switch to using pagecahce instead of alloc_pages() (Nikolay, David)
>> ---
>>   fs/btrfs/disk-io.c | 78 +++++++++++++++++++++++++---------------------
>>   fs/btrfs/disk-io.h |  4 +--
>>   fs/btrfs/volumes.c | 57 +++++++++++++++++----------------
>>   fs/btrfs/volumes.h |  2 --
>>   4 files changed, 76 insertions(+), 65 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 28622de9e642..bc14ef1aadda 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2617,11 +2617,12 @@ int __cold open_ctree(struct super_block *sb,
>>   	u64 features;
>>   	u16 csum_type;
>>   	struct btrfs_key location;
>> -	struct buffer_head *bh;
>>   	struct btrfs_super_block *disk_super;
>>   	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>>   	struct btrfs_root *tree_root;
>>   	struct btrfs_root *chunk_root;
>> +	struct page *super_page;
>> +	u8 *superblock;
> 
> I thought you agree to turn this into a struct btrfs_super_block
> pointer?

As stated in the cover letter, I lost track of the TODOs ;-)

>>   	bytenr = btrfs_sb_offset(copy_num);
>>   	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>>   		return -EINVAL;
>>   
>> -	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
>> -	/*
>> -	 * If we fail to read from the underlying devices, as of now
>> -	 * the best option we have is to mark it EIO.
>> -	 */
>> -	if (!bh)
>> -		return -EIO;
>> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
>> +	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
>> +	if (IS_ERR_OR_NULL(page))
>> +		return -ENOMEM;
> 
> Why do you need the __GFP_NOFAIL given that failures are handled
> properly here?  Also I think instead of using mapping_gfp_constraint you
> can use GFP_NOFS directly here.

OK

>>   
>> -	super = (struct btrfs_super_block *)bh->b_data;
>> +	super = kmap(page);
>>   	if (btrfs_super_bytenr(super) != bytenr ||
>>   		    btrfs_super_magic(super) != BTRFS_MAGIC) {
>> -		brelse(bh);
>> +		kunmap(page);
>> +		put_page(page);
>>   		return -EINVAL;
>>   	}
>> +	kunmap(page);
> 
> Also last time I wondered why we can't leave the page mapped for the
> caller and also return the virtual address?  That would keep the
> callers a little cleaner.  Note that you don't need to pass the
> struct page in that case as the unmap helper can use kmap_to_page (and
> I think a helper would be really nice for the unmap and put anyway).
> 

There's btrfs_release_disk_super() but David didn't like the use of it 
in v2 of this series. But when using a 'struct btrfs_disk_super' instead 
of a 'struct page' I think he could be ok.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout
  2020-02-05 18:16   ` Christoph Hellwig
@ 2020-02-06  8:20     ` Johannes Thumshirn
  2020-02-06 14:59       ` Christoph Hellwig
  2020-02-07 16:08       ` David Sterba
  0 siblings, 2 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-06  8:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On 05/02/2020 19:16, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 11:38:28PM +0900, Johannes Thumshirn wrote:
>> +static void btrfs_end_super_write(struct bio *bio)
>>   {
>> +	struct btrfs_device *device = bio->bi_private;
>> +	struct bio_vec *bvec;
>> +	struct bvec_iter_all iter_all;
>> +	struct page *page;
>> +
>> +	bio_for_each_segment_all(bvec, bio, iter_all) {
>> +		page = bvec->bv_page;
>> +
>> +		if (blk_status_to_errno(bio->bi_status)) {
> 
> Nit: this could simply check bio->bi_status without a conversion.
> 
>> +			btrfs_warn_rl_in_rcu(device->fs_info,
>> +					     "lost page write due to IO error on %s",
>> +					     rcu_str_deref(device->name));
> 
> But maybe you want to print the error here?
> 
>> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> 
> Same comment on the ask as in the previous patch.
> 
>> +		u8 *ptr;
> 
> I'd use a typed pointer here again..
> 
>> +		ptr = kmap(page);
>> +		memcpy(ptr, sb, BTRFS_SUPER_INFO_SIZE);
> 
> With which you could do a struct assignment here and very slightly
> improve type safety.
> 
>> @@ -3497,9 +3506,23 @@ static int write_dev_supers(struct btrfs_device *device,
>>   		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
>>   		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
>>   			op_flags |= REQ_FUA;
> 
> Question on the existing code:  why is it safe to not use FUA for the
> subsequent superblocks?
> 
>> +
>> +		/*
>> +		 * Directly use BIOs here instead of relying on the page-cache
>> +		 * to do I/O, so we don't loose the ability to do integrity
>> +		 * checking.
>> +		 */
>> +		bio = bio_alloc(gfp_mask, 1);
>> +		bio_set_dev(bio, device->bdev);
>> +		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
>> +		bio->bi_private = device;
>> +		bio->bi_end_io = btrfs_end_super_write;
>> +		bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
>> +			     offset_in_page(bytenr));
> 
> Missing return value check.  But given that it is a single page and
> can't error out please switch to __bio_add_page here.

Good question, I guess it's saver to always FUA the SBs

> 
>> +		bio->bi_opf = REQ_OP_WRITE | op_flags;
> 
> You could kill the op_flags variable and just assign everything directly
> to bio->bi_opf.
> 

Then I'd still have to deal with the NOBARRIER mount option here so 
op_flags is nice to have for readability.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-05 16:53   ` Christoph Hellwig
  2020-02-06  8:17     ` Johannes Thumshirn
@ 2020-02-06  9:29     ` Johannes Thumshirn
  1 sibling, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-06  9:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On 05/02/2020 17:53, Christoph Hellwig wrote:
[...]
>> +	struct page *super_page;
>> +	u8 *superblock;
> 
> I thought you agree to turn this into a struct btrfs_super_block
> pointer?
> 

I'll do this in an add-on patch, otherwise the diff will get messy with 
lots of unrelated changes.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-06  8:17     ` Johannes Thumshirn
@ 2020-02-06 14:57       ` Christoph Hellwig
  2020-02-06 15:29         ` Johannes Thumshirn
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-06 14:57 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On Thu, Feb 06, 2020 at 08:17:20AM +0000, Johannes Thumshirn wrote:
> >> +	super = kmap(page);
> >>   	if (btrfs_super_bytenr(super) != bytenr ||
> >>   		    btrfs_super_magic(super) != BTRFS_MAGIC) {
> >> -		brelse(bh);
> >> +		kunmap(page);
> >> +		put_page(page);
> >>   		return -EINVAL;
> >>   	}
> >> +	kunmap(page);
> > 
> > Also last time I wondered why we can't leave the page mapped for the
> > caller and also return the virtual address?  That would keep the
> > callers a little cleaner.  Note that you don't need to pass the
> > struct page in that case as the unmap helper can use kmap_to_page (and
> > I think a helper would be really nice for the unmap and put anyway).
> > 
> 
> There's btrfs_release_disk_super() but David didn't like the use of it 
> in v2 of this series. But when using a 'struct btrfs_disk_super' instead 
> of a 'struct page' I think he could be ok.

Also I just noticed don't even need the kmap/kunmap at all given that the
block device mapping is never in highmem.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout
  2020-02-06  8:20     ` Johannes Thumshirn
@ 2020-02-06 14:59       ` Christoph Hellwig
  2020-02-06 15:18         ` Johannes Thumshirn
  2020-02-07 16:08       ` David Sterba
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-06 14:59 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On Thu, Feb 06, 2020 at 08:20:16AM +0000, Johannes Thumshirn wrote:
> >>   		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
> >>   		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
> >>   			op_flags |= REQ_FUA;
> > 
> >> +		bio->bi_opf = REQ_OP_WRITE | op_flags;
> > 
> > You could kill the op_flags variable and just assign everything directly
> > to bio->bi_opf.
> > 
> 
> Then I'd still have to deal with the NOBARRIER mount option here so 
> op_flags is nice to have for readability.

I tend to disagree.  The simple version is:

		bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO;
		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
			bio->bi_opf |= REQ_FUA;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout
  2020-02-06 14:59       ` Christoph Hellwig
@ 2020-02-06 15:18         ` Johannes Thumshirn
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 15:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On 06/02/2020 15:59, Christoph Hellwig wrote:
> I tend to disagree.  The simple version is:
> 
> 		bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO;
> 		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
> 			bio->bi_opf |= REQ_FUA;
> 

Well let's use that one then.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-06 14:57       ` Christoph Hellwig
@ 2020-02-06 15:29         ` Johannes Thumshirn
  2020-02-07 16:13           ` David Sterba
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 15:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On 06/02/2020 15:58, Christoph Hellwig wrote:
> Also I just noticed don't even need the kmap/kunmap at all given that the
> block device mapping is never in highmem.
> 

This potentially touches more places, I'll cover that in a dedicated 
patchset.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout
  2020-02-06  8:20     ` Johannes Thumshirn
  2020-02-06 14:59       ` Christoph Hellwig
@ 2020-02-07 16:08       ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-02-07 16:08 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On Thu, Feb 06, 2020 at 08:20:16AM +0000, Johannes Thumshirn wrote:
> >> @@ -3497,9 +3506,23 @@ static int write_dev_supers(struct btrfs_device *device,
> >>   		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
> >>   		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
> >>   			op_flags |= REQ_FUA;
> > 
> > Question on the existing code:  why is it safe to not use FUA for the
> > subsequent superblocks?
> > 
> >> +
> >>C +		/*
> >> +		 * Directly use BIOs here instead of relying on the page-cache
> >> +		 * to do I/O, so we don't loose the ability to do integrity
> >> +		 * checking.
> >> +		 */
> >> +		bio = bio_alloc(gfp_mask, 1);
> >> +		bio_set_dev(bio, device->bdev);
> >> +		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> >> +		bio->bi_private = device;
> >> +		bio->bi_end_io = btrfs_end_super_write;
> >> +		bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
> >> +			     offset_in_page(bytenr));
> > 
> > Missing return value check.  But given that it is a single page and
> > can't error out please switch to __bio_add_page here.
> IR
> Good question, I guess it's saver to always FUA the SBs

That is a performance optimization IIRC, only the primary superblock
does FUA the backup superblocks don't as this would add 2 more flushes
that are considered expensive.

The trade-off is optimistic because the backup superblocks are almost
never necessary. For the common power-fail situation primary will be
there or not atomically, the non-FUA writes of secondary superblocks
will be perhaps delayed a bit. The scenario where the primary sb is
unexpectedly damaged would have to happen in the short window between
primary FUA and backup writes, so the current version of sb is not
available. Something like that:

  write primary sb
1 FUA

  write backup copy 1
  other writes
  write backup copy 2
  other writes
2 FUA (or equvalent flushing the copies to device)

The window is between 1 and 2, and if some divine force kills primary
sb, the backup copies are not permanently stored yet. Which makes
recovery of the last transaction tricky, but there are still the backup
superblocks with previous intact version.

With FUA after each backup, the window would be shortened, with only 2
blocks written, allowing to access the latest transaction, or possibly
the previous one too given where exactly the write sequence is
interrupted.

The above describes possible scenario but I consider it quite rare to
hit in practice, also it depends on the device that should not just skip
writes or FUAs. So the performance optimization is IMO justified.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-06 15:29         ` Johannes Thumshirn
@ 2020-02-07 16:13           ` David Sterba
  2020-02-10  7:16             ` Johannes Thumshirn
  0 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-02-07 16:13 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, David Sterba, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On Thu, Feb 06, 2020 at 03:29:57PM +0000, Johannes Thumshirn wrote:
> On 06/02/2020 15:58, Christoph Hellwig wrote:
> > Also I just noticed don't even need the kmap/kunmap at all given that the
> > block device mapping is never in highmem.
> > 
> 
> This potentially touches more places, I'll cover that in a dedicated 
> patchset.

Are the kmap/kunmaps anywhere on the buffer_head call paths? I can't
find it anywhere, and given that the mapping does not contain highmem
pages we could rather avoid adding it from the beginning.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/5] btrfs: use the page-cache for super block reading
  2020-02-07 16:13           ` David Sterba
@ 2020-02-10  7:16             ` Johannes Thumshirn
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2020-02-10  7:16 UTC (permalink / raw)
  To: dsterba
  Cc: Christoph Hellwig, Nikolay Borisov, Josef Bacik,
	linux-btrfs @ vger . kernel . org

On 07/02/2020 17:14, David Sterba wrote:
> On Thu, Feb 06, 2020 at 03:29:57PM +0000, Johannes Thumshirn wrote:
>> On 06/02/2020 15:58, Christoph Hellwig wrote:
>>> Also I just noticed don't even need the kmap/kunmap at all given that the
>>> block device mapping is never in highmem.
>>>
>>
>> This potentially touches more places, I'll cover that in a dedicated
>> patchset.
> 
> Are the kmap/kunmaps anywhere on the buffer_head call paths? I can't
> find it anywhere, and given that the mapping does not contain highmem
> pages we could rather avoid adding it from the beginning.


There's at least one more in btrfs_read_disk_super(), but yes I can 
avoid them in the BH path and remove the ones I find in the next go.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-02-10  7:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05 14:38 [PATCH v4 0/5] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
2020-02-05 14:38 ` [PATCH v4 1/5] btrfs: use the page-cache for super block reading Johannes Thumshirn
2020-02-05 16:53   ` Christoph Hellwig
2020-02-06  8:17     ` Johannes Thumshirn
2020-02-06 14:57       ` Christoph Hellwig
2020-02-06 15:29         ` Johannes Thumshirn
2020-02-07 16:13           ` David Sterba
2020-02-10  7:16             ` Johannes Thumshirn
2020-02-06  9:29     ` Johannes Thumshirn
2020-02-05 14:38 ` [PATCH v4 2/5] btrfs: use BIOs instead of buffer_heads from superblock writeout Johannes Thumshirn
2020-02-05 18:16   ` Christoph Hellwig
2020-02-06  8:20     ` Johannes Thumshirn
2020-02-06 14:59       ` Christoph Hellwig
2020-02-06 15:18         ` Johannes Thumshirn
2020-02-07 16:08       ` David Sterba
2020-02-05 14:38 ` [PATCH v4 3/5] btrfs: remove btrfsic_submit_bh() Johannes Thumshirn
2020-02-05 14:38 ` [PATCH v4 4/5] btrfs: remove buffer_heads from btrfsic_process_written_block() Johannes Thumshirn
2020-02-05 14:38 ` [PATCH v4 5/5] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.