All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling
@ 2020-01-23  8:18 Johannes Thumshirn
  2020-01-23  8:18 ` [PATCH v2 1/6] btrfs: export btrfs_release_disk_super Johannes Thumshirn
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23  8:18 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org, Johannes Thumshirn

This patch series removes the use of buffer_heads from btrfs' super block read
and write paths. It also converts the integrity-checking code to only work
with pages and BIOs.

Compared to buffer heads, this gives us a leaner call path, as the
buffer_head code wraps around getting pages from the page-cache and adding
them to BIOs to submit.

The first patch is from Nikolay and exports btrfs_release_disk_super(), so it
can be used in the patchews follwing in this series.
The second patch removes buffer_heads from superblock reading.
The third from super_block writing and the subsequent patches remove the
buffer_heads from the integrity check code.

It's based on misc-next from Wednesday January 22, and doesn't show any
regressions in xfstests to the baseline.

Cahnges to v1:
- Added patch #1
- Converted sb reading and integrity checking to use the page cache
- Added rationale behind the conversion to the commit messages.
- For more details see the idividual patches.

Johannes Thumshirn (5):
  btrfs: remove buffer heads from super block reading
  btrfs: remove use of buffer_heads from superblock writeout
  btrfs: remove btrfsic_submit_bh()
  btrfs: remove buffer_heads from btrfsic_process_written_block()
  btrfs: remove buffer_heads form superblock mirror integrity checking

Nikolay Borisov (1):
  btrfs: export btrfs_release_disk_super

 fs/btrfs/check-integrity.c | 204 ++++++++++---------------------------
 fs/btrfs/check-integrity.h |   2 -
 fs/btrfs/disk-io.c         | 194 +++++++++++++++++++++--------------
 fs/btrfs/disk-io.h         |   4 +-
 fs/btrfs/volumes.c         |  64 +++++++-----
 fs/btrfs/volumes.h         |   3 +-
 6 files changed, 216 insertions(+), 255 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/6] btrfs: export btrfs_release_disk_super
  2020-01-23  8:18 [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
@ 2020-01-23  8:18 ` Johannes Thumshirn
  2020-01-23 13:42   ` Josef Bacik
  2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23  8:18 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org, Johannes Thumshirn

From: Nikolay Borisov <nborisov@suse.com>

Preparatory patch for removal of buffer_head usage in btrfs.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/volumes.c | 2 +-
 fs/btrfs/volumes.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9cfc668f91f4..7a480a2bdf51 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1247,7 +1247,7 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 	return ret;
 }
 
-static void btrfs_release_disk_super(struct page *page)
+void btrfs_release_disk_super(struct page *page)
 {
 	kunmap(page);
 	put_page(page);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 690d4f5a0653..b7f2edbc6581 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -483,6 +483,7 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
 int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
 struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
 				       u64 logical, u64 length);
+void btrfs_release_disk_super(struct page *page);
 
 static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 				      int index)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23  8:18 [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
  2020-01-23  8:18 ` [PATCH v2 1/6] btrfs: export btrfs_release_disk_super Johannes Thumshirn
@ 2020-01-23  8:18 ` Johannes Thumshirn
  2020-01-23 13:51   ` Josef Bacik
                     ` (4 more replies)
  2020-01-23  8:18 ` [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout Johannes Thumshirn
                   ` (3 subsequent siblings)
  5 siblings, 5 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23  8:18 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org, Johannes Thumshirn

Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
some drawbacks, like not being able to propagate errors from the lower
layers.

Change the buffer_heads to BIOs and utilize the page cache for the page
allocation. Compared to buffer_heads using BIOs are more lightweight and
we skip several layers of buffer_head code until we either reach the page
cache or build a BIO and submit it to read the blocks from disk.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

---
Changes to v1:
- move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
- switch to using pagecahce instead of alloc_pages() (Nikolay, David)
---
 fs/btrfs/disk-io.c | 83 ++++++++++++++++++++++++++++++----------------
 fs/btrfs/disk-io.h |  4 +--
 fs/btrfs/volumes.c | 62 ++++++++++++++++++++--------------
 fs/btrfs/volumes.h |  2 --
 4 files changed, 94 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index aea48d6ddc0c..b111f32108cc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2635,11 +2635,12 @@ int __cold open_ctree(struct super_block *sb,
 	u64 features;
 	u16 csum_type;
 	struct btrfs_key location;
-	struct buffer_head *bh;
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_root *tree_root;
 	struct btrfs_root *chunk_root;
+	struct page *super_page;
+	u8 *superblock;
 	int ret;
 	int err = -EINVAL;
 	int clear_free_space_tree = 0;
@@ -2832,28 +2833,31 @@ int __cold open_ctree(struct super_block *sb,
 	/*
 	 * Read super block and check the signature bytes only
 	 */
-	bh = btrfs_read_dev_super(fs_devices->latest_bdev);
-	if (IS_ERR(bh)) {
-		err = PTR_ERR(bh);
+	ret = btrfs_read_dev_super(fs_devices->latest_bdev, &super_page);
+	if (ret) {
+		err = ret;
 		goto fail_alloc;
 	}
 
+	superblock = kmap(super_page);
 	/*
 	 * Verify the type first, if that or the the checksum value are
 	 * corrupted, we'll find out
 	 */
-	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)bh->b_data);
+	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)
+					  superblock);
 	if (!btrfs_supported_super_csum(csum_type)) {
 		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
 			  csum_type);
 		err = -EINVAL;
-		brelse(bh);
+		btrfs_release_disk_super(super_page);
 		goto fail_alloc;
 	}
 
 	ret = btrfs_init_csum_hash(fs_info, csum_type);
 	if (ret) {
 		err = ret;
+		btrfs_release_disk_super(super_page);
 		goto fail_alloc;
 	}
 
@@ -2861,10 +2865,10 @@ int __cold open_ctree(struct super_block *sb,
 	 * We want to check superblock checksum, the type is stored inside.
 	 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
 	 */
-	if (btrfs_check_super_csum(fs_info, bh->b_data)) {
+	if (btrfs_check_super_csum(fs_info, superblock)) {
 		btrfs_err(fs_info, "superblock checksum mismatch");
 		err = -EINVAL;
-		brelse(bh);
+		btrfs_release_disk_super(super_page);
 		goto fail_csum;
 	}
 
@@ -2873,8 +2877,8 @@ int __cold open_ctree(struct super_block *sb,
 	 * following bytes up to INFO_SIZE, the checksum is calculated from
 	 * the whole block of INFO_SIZE
 	 */
-	memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
-	brelse(bh);
+	memcpy(fs_info->super_copy, superblock, sizeof(*fs_info->super_copy));
+	btrfs_release_disk_super(super_page);
 
 	disk_super = fs_info->super_copy;
 
@@ -3374,40 +3378,60 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
 }
 
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
-			struct buffer_head **bh_ret)
+			struct page **super_page)
 {
-	struct buffer_head *bh;
 	struct btrfs_super_block *super;
+	struct bio_vec bio_vec;
+	struct bio bio;
+	struct page *page;
 	u64 bytenr;
+	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	gfp_t gfp_mask;
+	int ret;
 
 	bytenr = btrfs_sb_offset(copy_num);
 	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
 		return -EINVAL;
 
-	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
+	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
+	page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
+	if (!page)
+		return -ENOMEM;
+
+	bio_init(&bio, &bio_vec, 1);
+	bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
+	bio_set_dev(&bio, bdev);
+	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
+	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
+		     offset_in_page(bytenr));
+
+	ret = submit_bio_wait(&bio);
+	unlock_page(page);
 	/*
 	 * If we fail to read from the underlying devices, as of now
 	 * the best option we have is to mark it EIO.
 	 */
-	if (!bh)
+	if (ret) {
+		put_page(page);
 		return -EIO;
+	}
 
-	super = (struct btrfs_super_block *)bh->b_data;
+	super = kmap(page);
 	if (btrfs_super_bytenr(super) != bytenr ||
 		    btrfs_super_magic(super) != BTRFS_MAGIC) {
-		brelse(bh);
+		btrfs_release_disk_super(page);
 		return -EINVAL;
 	}
+	kunmap(page);
 
-	*bh_ret = bh;
+	*super_page = page;
 	return 0;
 }
 
 
-struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
+int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
 {
-	struct buffer_head *bh;
-	struct buffer_head *latest = NULL;
+	struct page *latest = NULL;
 	struct btrfs_super_block *super;
 	int i;
 	u64 transid = 0;
@@ -3419,25 +3443,28 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
 	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 	 */
 	for (i = 0; i < 1; i++) {
-		ret = btrfs_read_dev_one_super(bdev, i, &bh);
+		ret = btrfs_read_dev_one_super(bdev, i, page);
 		if (ret)
 			continue;
 
-		super = (struct btrfs_super_block *)bh->b_data;
+		super = kmap(*page);
 
 		if (!latest || btrfs_super_generation(super) > transid) {
-			brelse(latest);
-			latest = bh;
+			if (latest)
+				btrfs_release_disk_super(latest);
+			latest = *page;
 			transid = btrfs_super_generation(super);
 		} else {
-			brelse(bh);
+			btrfs_release_disk_super(*page);
 		}
+
+		kunmap(*page);
 	}
 
-	if (!latest)
-		return ERR_PTR(ret);
+	if (ret)
+		return ret;
 
-	return latest;
+	return 0;
 }
 
 /*
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 8c2d6cf1ce59..e04b233c436a 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -54,9 +54,9 @@ int __cold open_ctree(struct super_block *sb,
 	       char *options);
 void __cold close_ctree(struct btrfs_fs_info *fs_info);
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors);
-struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
+int btrfs_read_dev_super(struct block_device *bdev, struct page **super_page);
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
-			struct buffer_head **bh_ret);
+			     struct page **super_page);
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
 				      struct btrfs_key *location);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7a480a2bdf51..f4a6ee518f0c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6,7 +6,6 @@
 #include <linux/sched.h>
 #include <linux/bio.h>
 #include <linux/slab.h>
-#include <linux/buffer_head.h>
 #include <linux/blkdev.h>
 #include <linux/ratelimit.h>
 #include <linux/kthread.h>
@@ -500,7 +499,7 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
 static int
 btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 		      int flush, struct block_device **bdev,
-		      struct buffer_head **bh)
+		      struct page **super_page)
 {
 	int ret;
 
@@ -519,9 +518,8 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 		goto error;
 	}
 	invalidate_bdev(*bdev);
-	*bh = btrfs_read_dev_super(*bdev);
-	if (IS_ERR(*bh)) {
-		ret = PTR_ERR(*bh);
+	ret = btrfs_read_dev_super(*bdev, super_page);
+	if (ret) {
 		blkdev_put(*bdev, flags);
 		goto error;
 	}
@@ -530,7 +528,6 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 
 error:
 	*bdev = NULL;
-	*bh = NULL;
 	return ret;
 }
 
@@ -611,7 +608,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 {
 	struct request_queue *q;
 	struct block_device *bdev;
-	struct buffer_head *bh;
+	struct page *super_page;
 	struct btrfs_super_block *disk_super;
 	u64 devid;
 	int ret;
@@ -622,17 +619,17 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 		return -EINVAL;
 
 	ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
-				    &bdev, &bh);
+				    &bdev, &super_page);
 	if (ret)
 		return ret;
 
-	disk_super = (struct btrfs_super_block *)bh->b_data;
+	disk_super = kmap(super_page);
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	if (devid != device->devid)
-		goto error_brelse;
+		goto error_free_page;
 
 	if (memcmp(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE))
-		goto error_brelse;
+		goto error_free_page;
 
 	device->generation = btrfs_super_generation(disk_super);
 
@@ -641,7 +638,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 		    BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
 			pr_err(
 		"BTRFS: Invalid seeding and uuid-changed device detected\n");
-			goto error_brelse;
+			goto error_free_page;
 		}
 
 		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
@@ -667,12 +664,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 		fs_devices->rw_devices++;
 		list_add_tail(&device->dev_alloc_list, &fs_devices->alloc_list);
 	}
-	brelse(bh);
+	btrfs_release_disk_super(super_page);
 
 	return 0;
 
-error_brelse:
-	brelse(bh);
+error_free_page:
+	btrfs_release_disk_super(super_page);
 	blkdev_put(bdev, flags);
 
 	return -EINVAL;
@@ -2209,14 +2206,15 @@ static struct btrfs_device *btrfs_find_device_by_path(
 	u64 devid;
 	u8 *dev_uuid;
 	struct block_device *bdev;
-	struct buffer_head *bh;
+	struct page *super_page;
 	struct btrfs_device *device;
 
 	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
-				    fs_info->bdev_holder, 0, &bdev, &bh);
+				    fs_info->bdev_holder, 0, &bdev,
+				    &super_page);
 	if (ret)
 		return ERR_PTR(ret);
-	disk_super = (struct btrfs_super_block *)bh->b_data;
+	disk_super = kmap(super_page);
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	dev_uuid = disk_super->dev_item.uuid;
 	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
@@ -2226,7 +2224,7 @@ static struct btrfs_device *btrfs_find_device_by_path(
 		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
 					   disk_super->fsid, true);
 
-	brelse(bh);
+	btrfs_release_disk_super(super_page);
 	if (!device)
 		device = ERR_PTR(-ENOENT);
 	blkdev_put(bdev, FMODE_READ);
@@ -7319,25 +7317,39 @@ int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
 
 void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path)
 {
-	struct buffer_head *bh;
+	struct bio_vec bio_vec;
+	struct bio bio;
 	struct btrfs_super_block *disk_super;
 	int copy_num;
 
 	if (!bdev)
 		return;
 
+	bio_init(&bio, &bio_vec, 1);
 	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX;
 		copy_num++) {
+		u64 bytenr = btrfs_sb_offset(copy_num);
+		struct page *page;
 
-		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
+		if (btrfs_read_dev_one_super(bdev, copy_num, &page))
 			continue;
 
-		disk_super = (struct btrfs_super_block *)bh->b_data;
+		disk_super = kmap(page) + offset_in_page(bytenr);
 
 		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
-		set_buffer_dirty(bh);
-		sync_dirty_buffer(bh);
-		brelse(bh);
+
+		bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
+		bio_set_dev(&bio, bdev);
+		bio_set_op_attrs(&bio, REQ_OP_WRITE, 0);
+		bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
+			     offset_in_page(bytenr));
+
+		lock_page(page);
+		submit_bio_wait(&bio);
+		unlock_page(page);
+		btrfs_release_disk_super(page);
+		bio_reset(&bio);
+
 	}
 
 	/* Notify udev that device has changed */
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b7f2edbc6581..1e4ebe6d6368 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -17,8 +17,6 @@ extern struct mutex uuid_mutex;
 
 #define BTRFS_STRIPE_LEN	SZ_64K
 
-struct buffer_head;
-
 struct btrfs_io_geometry {
 	/* remaining bytes before crossing a stripe */
 	u64 len;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout
  2020-01-23  8:18 [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
  2020-01-23  8:18 ` [PATCH v2 1/6] btrfs: export btrfs_release_disk_super Johannes Thumshirn
  2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
@ 2020-01-23  8:18 ` Johannes Thumshirn
  2020-01-23 13:59   ` Josef Bacik
                     ` (2 more replies)
  2020-01-23  8:18 ` [PATCH v2 4/6] btrfs: remove btrfsic_submit_bh() Johannes Thumshirn
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23  8:18 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org, Johannes Thumshirn

Similar to the superblock read path, change the write path to using BIOs
and pages instead of buffer_heads. This allows us to skip over the
buffer_head code, for writing the superblock to disk.

This is based on a patch originally authored by Nikolay Borisov.

Co-developed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

---
Changes to v1:
- Remove left-over buffer_head.h include (David)
---
 fs/btrfs/disk-io.c | 111 ++++++++++++++++++++++++++-------------------
 1 file changed, 64 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b111f32108cc..ac1755793596 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -7,7 +7,6 @@
 #include <linux/blkdev.h>
 #include <linux/radix-tree.h>
 #include <linux/writeback.h>
-#include <linux/buffer_head.h>
 #include <linux/workqueue.h>
 #include <linux/kthread.h>
 #include <linux/slab.h>
@@ -3356,25 +3355,33 @@ int __cold open_ctree(struct super_block *sb,
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 
-static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
+static void btrfs_end_super_write(struct bio *bio)
 {
-	if (uptodate) {
-		set_buffer_uptodate(bh);
-	} else {
-		struct btrfs_device *device = (struct btrfs_device *)
-			bh->b_private;
-
-		btrfs_warn_rl_in_rcu(device->fs_info,
-				"lost page write due to IO error on %s",
-					  rcu_str_deref(device->name));
-		/* note, we don't set_buffer_write_io_error because we have
-		 * our own ways of dealing with the IO errors
-		 */
-		clear_buffer_uptodate(bh);
-		btrfs_dev_stat_inc_and_print(device, BTRFS_DEV_STAT_WRITE_ERRS);
+	struct btrfs_device *device = bio->bi_private;
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+	struct page *page;
+
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		page = bvec->bv_page;
+
+		if (blk_status_to_errno(bio->bi_status)) {
+			btrfs_warn_rl_in_rcu(device->fs_info,
+					     "lost page write due to IO error on %s",
+					     rcu_str_deref(device->name));
+			ClearPageUptodate(page);
+			SetPageError(page);
+			btrfs_dev_stat_inc_and_print(device,
+						     BTRFS_DEV_STAT_WRITE_ERRS);
+		} else {
+			SetPageUptodate(page);
+		}
+
+		put_page(page);
+		unlock_page(page);
 	}
-	unlock_buffer(bh);
-	put_bh(bh);
+
+	bio_put(bio);
 }
 
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
@@ -3475,16 +3482,16 @@ int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
  * the expected device size at commit time. Note that max_mirrors must be
  * same for write and wait phases.
  *
- * Return number of errors when buffer head is not found or submission fails.
+ * Return number of errors when page is not found or submission fails.
  */
 static int write_dev_supers(struct btrfs_device *device,
 			    struct btrfs_super_block *sb, int max_mirrors)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
+	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
-	struct buffer_head *bh;
+	gfp_t gfp_mask;
 	int i;
-	int ret;
 	int errors = 0;
 	u64 bytenr;
 	int op_flags;
@@ -3494,7 +3501,13 @@ static int write_dev_supers(struct btrfs_device *device,
 
 	shash->tfm = fs_info->csum_shash;
 
+	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
+
 	for (i = 0; i < max_mirrors; i++) {
+		struct page *page;
+		struct bio *bio;
+		u8 *ptr;
+
 		bytenr = btrfs_sb_offset(i);
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
 		    device->commit_total_bytes)
@@ -3507,26 +3520,22 @@ static int write_dev_supers(struct btrfs_device *device,
 				    BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
 		crypto_shash_final(shash, sb->csum);
 
-		/* One reference for us, and we leave it for the caller */
-		bh = __getblk(device->bdev, bytenr / BTRFS_BDEV_BLOCKSIZE,
-			      BTRFS_SUPER_INFO_SIZE);
-		if (!bh) {
+		page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT,
+					   gfp_mask);
+		if (!page) {
 			btrfs_err(device->fs_info,
-			    "couldn't get super buffer head for bytenr %llu",
+			    "couldn't get superblock page for bytenr %llu",
 			    bytenr);
 			errors++;
 			continue;
 		}
 
-		memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);
-
-		/* one reference for submit_bh */
-		get_bh(bh);
+		/* Bump the refcount for wait_dev_supers() */
+		get_page(page);
 
-		set_buffer_uptodate(bh);
-		lock_buffer(bh);
-		bh->b_end_io = btrfs_end_buffer_write_sync;
-		bh->b_private = device;
+		ptr = kmap(page);
+		memcpy(ptr, sb, BTRFS_SUPER_INFO_SIZE);
+		kunmap(page);
 
 		/*
 		 * we fua the first super.  The others we allow
@@ -3535,9 +3544,17 @@ static int write_dev_supers(struct btrfs_device *device,
 		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
 		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
 			op_flags |= REQ_FUA;
-		ret = btrfsic_submit_bh(REQ_OP_WRITE, op_flags, bh);
-		if (ret)
-			errors++;
+
+		bio = bio_alloc(gfp_mask, 1);
+		bio_set_dev(bio, device->bdev);
+		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
+		bio->bi_private = device;
+		bio->bi_end_io = btrfs_end_super_write;
+		bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
+			     offset_in_page(bytenr));
+
+		bio_set_op_attrs(bio, REQ_OP_WRITE, op_flags);
+		btrfsic_submit_bio(bio);
 	}
 	return errors < i ? 0 : -1;
 }
@@ -3546,12 +3563,11 @@ static int write_dev_supers(struct btrfs_device *device,
  * Wait for write completion of superblocks done by write_dev_supers,
  * @max_mirrors same for write and wait phases.
  *
- * Return number of errors when buffer head is not found or not marked up to
+ * Return number of errors when page is not found or not marked up to
  * date.
  */
 static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 {
-	struct buffer_head *bh;
 	int i;
 	int errors = 0;
 	bool primary_failed = false;
@@ -3561,32 +3577,33 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		max_mirrors = BTRFS_SUPER_MIRROR_MAX;
 
 	for (i = 0; i < max_mirrors; i++) {
+		struct page *page;
+
 		bytenr = btrfs_sb_offset(i);
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
 		    device->commit_total_bytes)
 			break;
 
-		bh = __find_get_block(device->bdev,
-				      bytenr / BTRFS_BDEV_BLOCKSIZE,
-				      BTRFS_SUPER_INFO_SIZE);
-		if (!bh) {
+		page = find_get_page(device->bdev->bd_inode->i_mapping,
+				     bytenr >> PAGE_SHIFT);
+		if (!page) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 			continue;
 		}
-		wait_on_buffer(bh);
-		if (!buffer_uptodate(bh)) {
+		wait_on_page_locked(page);
+		if (PageError(page)) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 		}
 
 		/* drop our reference */
-		brelse(bh);
+		put_page(page);
 
 		/* drop the reference from the writing run */
-		brelse(bh);
+		put_page(page);
 	}
 
 	/* log error, force error return */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/6] btrfs: remove btrfsic_submit_bh()
  2020-01-23  8:18 [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2020-01-23  8:18 ` [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout Johannes Thumshirn
@ 2020-01-23  8:18 ` Johannes Thumshirn
  2020-01-23 14:00   ` Josef Bacik
  2020-01-23  8:18 ` [PATCH v2 5/6] btrfs: remove buffer_heads from btrfsic_process_written_block() Johannes Thumshirn
  2020-01-23  8:18 ` [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
  5 siblings, 1 reply; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23  8:18 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org, Johannes Thumshirn

Now that the last use of btrfsic_submit_bh() is gone, remove the function
as well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/check-integrity.c | 57 --------------------------------------
 fs/btrfs/check-integrity.h |  2 --
 2 files changed, 59 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index a0ce69f2d27c..e7507985435e 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2730,63 +2730,6 @@ static struct btrfsic_dev_state *btrfsic_dev_state_lookup(dev_t dev)
 						  &btrfsic_dev_state_hashtable);
 }
 
-int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
-{
-	struct btrfsic_dev_state *dev_state;
-
-	if (!btrfsic_is_initialized)
-		return submit_bh(op, op_flags, bh);
-
-	mutex_lock(&btrfsic_mutex);
-	/* since btrfsic_submit_bh() might also be called before
-	 * btrfsic_mount(), this might return NULL */
-	dev_state = btrfsic_dev_state_lookup(bh->b_bdev->bd_dev);
-
-	/* Only called to write the superblock (incl. FLUSH/FUA) */
-	if (NULL != dev_state &&
-	    (op == REQ_OP_WRITE) && bh->b_size > 0) {
-		u64 dev_bytenr;
-
-		dev_bytenr = BTRFS_BDEV_BLOCKSIZE * bh->b_blocknr;
-		if (dev_state->state->print_mask &
-		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
-			pr_info("submit_bh(op=0x%x,0x%x, blocknr=%llu (bytenr %llu), size=%zu, data=%p, bdev=%p)\n",
-			       op, op_flags, (unsigned long long)bh->b_blocknr,
-			       dev_bytenr, bh->b_size, bh->b_data, bh->b_bdev);
-		btrfsic_process_written_block(dev_state, dev_bytenr,
-					      &bh->b_data, 1, NULL,
-					      NULL, bh, op_flags);
-	} else if (NULL != dev_state && (op_flags & REQ_PREFLUSH)) {
-		if (dev_state->state->print_mask &
-		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
-			pr_info("submit_bh(op=0x%x,0x%x FLUSH, bdev=%p)\n",
-			       op, op_flags, bh->b_bdev);
-		if (!dev_state->dummy_block_for_bio_bh_flush.is_iodone) {
-			if ((dev_state->state->print_mask &
-			     (BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH |
-			      BTRFSIC_PRINT_MASK_VERBOSE)))
-				pr_info("btrfsic_submit_bh(%s) with FLUSH but dummy block already in use (ignored)!\n",
-				       dev_state->name);
-		} else {
-			struct btrfsic_block *const block =
-				&dev_state->dummy_block_for_bio_bh_flush;
-
-			block->is_iodone = 0;
-			block->never_written = 0;
-			block->iodone_w_error = 0;
-			block->flush_gen = dev_state->last_flush_gen + 1;
-			block->submit_bio_bh_rw = op_flags;
-			block->orig_bio_bh_private = bh->b_private;
-			block->orig_bio_bh_end_io.bh = bh->b_end_io;
-			block->next_in_same_bio = NULL;
-			bh->b_private = block;
-			bh->b_end_io = btrfsic_bh_end_io;
-		}
-	}
-	mutex_unlock(&btrfsic_mutex);
-	return submit_bh(op, op_flags, bh);
-}
-
 static void __btrfsic_submit_bio(struct bio *bio)
 {
 	struct btrfsic_dev_state *dev_state;
diff --git a/fs/btrfs/check-integrity.h b/fs/btrfs/check-integrity.h
index 9bf4359cc44c..bcc730a06cb5 100644
--- a/fs/btrfs/check-integrity.h
+++ b/fs/btrfs/check-integrity.h
@@ -7,11 +7,9 @@
 #define BTRFS_CHECK_INTEGRITY_H
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
-int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh);
 void btrfsic_submit_bio(struct bio *bio);
 int btrfsic_submit_bio_wait(struct bio *bio);
 #else
-#define btrfsic_submit_bh submit_bh
 #define btrfsic_submit_bio submit_bio
 #define btrfsic_submit_bio_wait submit_bio_wait
 #endif
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 5/6] btrfs: remove buffer_heads from btrfsic_process_written_block()
  2020-01-23  8:18 [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2020-01-23  8:18 ` [PATCH v2 4/6] btrfs: remove btrfsic_submit_bh() Johannes Thumshirn
@ 2020-01-23  8:18 ` Johannes Thumshirn
  2020-01-23 14:00   ` Josef Bacik
  2020-01-23  8:18 ` [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
  5 siblings, 1 reply; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23  8:18 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org, Johannes Thumshirn

Now that the last caller of btrfsic_process_written_block() with
buffer_heads is gone, remove the buffer_head processing path from it as
well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/check-integrity.c | 103 +++++++++----------------------------
 1 file changed, 25 insertions(+), 78 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index e7507985435e..4f6db2fe482a 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -152,11 +152,8 @@ struct btrfsic_block {
 	struct list_head ref_to_list;	/* list */
 	struct list_head ref_from_list;	/* list */
 	struct btrfsic_block *next_in_same_bio;
-	void *orig_bio_bh_private;
-	union {
-		bio_end_io_t *bio;
-		bh_end_io_t *bh;
-	} orig_bio_bh_end_io;
+	void *orig_bio_private;
+	bio_end_io_t *orig_bio_end_io;
 	int submit_bio_bh_rw;
 	u64 flush_gen; /* only valid if !never_written */
 };
@@ -325,14 +322,12 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 					  u64 dev_bytenr, char **mapped_datav,
 					  unsigned int num_pages,
 					  struct bio *bio, int *bio_is_patched,
-					  struct buffer_head *bh,
 					  int submit_bio_bh_rw);
 static int btrfsic_process_written_superblock(
 		struct btrfsic_state *state,
 		struct btrfsic_block *const block,
 		struct btrfs_super_block *const super_hdr);
 static void btrfsic_bio_end_io(struct bio *bp);
-static void btrfsic_bh_end_io(struct buffer_head *bh, int uptodate);
 static int btrfsic_is_block_ref_by_superblock(const struct btrfsic_state *state,
 					      const struct btrfsic_block *block,
 					      int recursion_level);
@@ -399,8 +394,8 @@ static void btrfsic_block_init(struct btrfsic_block *b)
 	b->never_written = 0;
 	b->mirror_num = 0;
 	b->next_in_same_bio = NULL;
-	b->orig_bio_bh_private = NULL;
-	b->orig_bio_bh_end_io.bio = NULL;
+	b->orig_bio_private = NULL;
+	b->orig_bio_end_io = NULL;
 	INIT_LIST_HEAD(&b->collision_resolving_node);
 	INIT_LIST_HEAD(&b->all_blocks_node);
 	INIT_LIST_HEAD(&b->ref_to_list);
@@ -1743,7 +1738,6 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 					  u64 dev_bytenr, char **mapped_datav,
 					  unsigned int num_pages,
 					  struct bio *bio, int *bio_is_patched,
-					  struct buffer_head *bh,
 					  int submit_bio_bh_rw)
 {
 	int is_metadata;
@@ -1902,9 +1896,9 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 				block->is_iodone = 0;
 				BUG_ON(NULL == bio_is_patched);
 				if (!*bio_is_patched) {
-					block->orig_bio_bh_private =
+					block->orig_bio_private =
 					    bio->bi_private;
-					block->orig_bio_bh_end_io.bio =
+					block->orig_bio_end_io =
 					    bio->bi_end_io;
 					block->next_in_same_bio = NULL;
 					bio->bi_private = block;
@@ -1916,25 +1910,17 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 					    bio->bi_private;
 
 					BUG_ON(NULL == chained_block);
-					block->orig_bio_bh_private =
-					    chained_block->orig_bio_bh_private;
-					block->orig_bio_bh_end_io.bio =
-					    chained_block->orig_bio_bh_end_io.
-					    bio;
+					block->orig_bio_private =
+					    chained_block->orig_bio_private;
+					block->orig_bio_end_io =
+					    chained_block->orig_bio_end_io;
 					block->next_in_same_bio = chained_block;
 					bio->bi_private = block;
 				}
-			} else if (NULL != bh) {
-				block->is_iodone = 0;
-				block->orig_bio_bh_private = bh->b_private;
-				block->orig_bio_bh_end_io.bh = bh->b_end_io;
-				block->next_in_same_bio = NULL;
-				bh->b_private = block;
-				bh->b_end_io = btrfsic_bh_end_io;
 			} else {
 				block->is_iodone = 1;
-				block->orig_bio_bh_private = NULL;
-				block->orig_bio_bh_end_io.bio = NULL;
+				block->orig_bio_private = NULL;
+				block->orig_bio_end_io = NULL;
 				block->next_in_same_bio = NULL;
 			}
 		}
@@ -2042,8 +2028,8 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 			block->is_iodone = 0;
 			BUG_ON(NULL == bio_is_patched);
 			if (!*bio_is_patched) {
-				block->orig_bio_bh_private = bio->bi_private;
-				block->orig_bio_bh_end_io.bio = bio->bi_end_io;
+				block->orig_bio_private = bio->bi_private;
+				block->orig_bio_end_io = bio->bi_end_io;
 				block->next_in_same_bio = NULL;
 				bio->bi_private = block;
 				bio->bi_end_io = btrfsic_bio_end_io;
@@ -2054,24 +2040,17 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
 				    bio->bi_private;
 
 				BUG_ON(NULL == chained_block);
-				block->orig_bio_bh_private =
-				    chained_block->orig_bio_bh_private;
-				block->orig_bio_bh_end_io.bio =
-				    chained_block->orig_bio_bh_end_io.bio;
+				block->orig_bio_private =
+				    chained_block->orig_bio_private;
+				block->orig_bio_end_io =
+				    chained_block->orig_bio_end_io;
 				block->next_in_same_bio = chained_block;
 				bio->bi_private = block;
 			}
-		} else if (NULL != bh) {
-			block->is_iodone = 0;
-			block->orig_bio_bh_private = bh->b_private;
-			block->orig_bio_bh_end_io.bh = bh->b_end_io;
-			block->next_in_same_bio = NULL;
-			bh->b_private = block;
-			bh->b_end_io = btrfsic_bh_end_io;
 		} else {
 			block->is_iodone = 1;
-			block->orig_bio_bh_private = NULL;
-			block->orig_bio_bh_end_io.bio = NULL;
+			block->orig_bio_private = NULL;
+			block->orig_bio_end_io = NULL;
 			block->next_in_same_bio = NULL;
 		}
 		if (state->print_mask & BTRFSIC_PRINT_MASK_VERBOSE)
@@ -2112,8 +2091,8 @@ static void btrfsic_bio_end_io(struct bio *bp)
 		iodone_w_error = 1;
 
 	BUG_ON(NULL == block);
-	bp->bi_private = block->orig_bio_bh_private;
-	bp->bi_end_io = block->orig_bio_bh_end_io.bio;
+	bp->bi_private = block->orig_bio_private;
+	bp->bi_end_io = block->orig_bio_end_io;
 
 	do {
 		struct btrfsic_block *next_block;
@@ -2146,38 +2125,6 @@ static void btrfsic_bio_end_io(struct bio *bp)
 	bp->bi_end_io(bp);
 }
 
-static void btrfsic_bh_end_io(struct buffer_head *bh, int uptodate)
-{
-	struct btrfsic_block *block = (struct btrfsic_block *)bh->b_private;
-	int iodone_w_error = !uptodate;
-	struct btrfsic_dev_state *dev_state;
-
-	BUG_ON(NULL == block);
-	dev_state = block->dev_state;
-	if ((dev_state->state->print_mask & BTRFSIC_PRINT_MASK_END_IO_BIO_BH))
-		pr_info("bh_end_io(error=%d) for %c @%llu (%s/%llu/%d)\n",
-		       iodone_w_error,
-		       btrfsic_get_block_type(dev_state->state, block),
-		       block->logical_bytenr, block->dev_state->name,
-		       block->dev_bytenr, block->mirror_num);
-
-	block->iodone_w_error = iodone_w_error;
-	if (block->submit_bio_bh_rw & REQ_PREFLUSH) {
-		dev_state->last_flush_gen++;
-		if ((dev_state->state->print_mask &
-		     BTRFSIC_PRINT_MASK_END_IO_BIO_BH))
-			pr_info("bh_end_io() new %s flush_gen=%llu\n",
-			       dev_state->name, dev_state->last_flush_gen);
-	}
-	if (block->submit_bio_bh_rw & REQ_FUA)
-		block->flush_gen = 0; /* FUA completed means block is on disk */
-
-	bh->b_private = block->orig_bio_bh_private;
-	bh->b_end_io = block->orig_bio_bh_end_io.bh;
-	block->is_iodone = 1; /* for FLUSH, this releases the block */
-	bh->b_end_io(bh, uptodate);
-}
-
 static int btrfsic_process_written_superblock(
 		struct btrfsic_state *state,
 		struct btrfsic_block *const superblock,
@@ -2781,7 +2728,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 		btrfsic_process_written_block(dev_state, dev_bytenr,
 					      mapped_datav, segs,
 					      bio, &bio_is_patched,
-					      NULL, bio->bi_opf);
+					      bio->bi_opf);
 		bio_for_each_segment(bvec, bio, iter)
 			kunmap(bvec.bv_page);
 		kfree(mapped_datav);
@@ -2805,8 +2752,8 @@ static void __btrfsic_submit_bio(struct bio *bio)
 			block->iodone_w_error = 0;
 			block->flush_gen = dev_state->last_flush_gen + 1;
 			block->submit_bio_bh_rw = bio->bi_opf;
-			block->orig_bio_bh_private = bio->bi_private;
-			block->orig_bio_bh_end_io.bio = bio->bi_end_io;
+			block->orig_bio_private = bio->bi_private;
+			block->orig_bio_end_io = bio->bi_end_io;
 			block->next_in_same_bio = NULL;
 			bio->bi_private = block;
 			bio->bi_end_io = btrfsic_bio_end_io;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking
  2020-01-23  8:18 [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
                   ` (4 preceding siblings ...)
  2020-01-23  8:18 ` [PATCH v2 5/6] btrfs: remove buffer_heads from btrfsic_process_written_block() Johannes Thumshirn
@ 2020-01-23  8:18 ` Johannes Thumshirn
  2020-01-23 14:03   ` Josef Bacik
                     ` (2 more replies)
  5 siblings, 3 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23  8:18 UTC (permalink / raw)
  To: David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org, Johannes Thumshirn

The integrity checking code for the superblock mirrors is the last remaining
user of buffer_heads in BTRFS, change it to using plain BIOs as well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

---
Changes to v1:
- Convert from alloc_page() to find_or_create_page()
---
 fs/btrfs/check-integrity.c | 44 +++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 4f6db2fe482a..45b88bcd6cbb 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -77,7 +77,6 @@
 
 #include <linux/sched.h>
 #include <linux/slab.h>
-#include <linux/buffer_head.h>
 #include <linux/mutex.h>
 #include <linux/genhd.h>
 #include <linux/blkdev.h>
@@ -762,28 +761,47 @@ static int btrfsic_process_superblock_dev_mirror(
 	struct btrfs_fs_info *fs_info = state->fs_info;
 	struct btrfs_super_block *super_tmp;
 	u64 dev_bytenr;
-	struct buffer_head *bh;
 	struct btrfsic_block *superblock_tmp;
 	int pass;
 	struct block_device *const superblock_bdev = device->bdev;
+	struct page *page;
+	struct bio bio;
+	struct bio_vec bio_vec;
+	struct address_space *mapping = superblock_bdev->bd_inode->i_mapping;
+	gfp_t gfp_mask;
+	int ret;
 
 	/* super block bytenr is always the unmapped device bytenr */
 	dev_bytenr = btrfs_sb_offset(superblock_mirror_num);
 	if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->commit_total_bytes)
 		return -1;
-	bh = __bread(superblock_bdev, dev_bytenr / BTRFS_BDEV_BLOCKSIZE,
-		     BTRFS_SUPER_INFO_SIZE);
-	if (NULL == bh)
+
+	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
+
+	page = find_or_create_page(mapping, dev_bytenr >> PAGE_SHIFT, gfp_mask);
+	if (!page)
+		return -1;
+
+	bio_init(&bio, &bio_vec, 1);
+	bio.bi_iter.bi_sector = dev_bytenr >> SECTOR_SHIFT;
+	bio_set_dev(&bio, superblock_bdev);
+	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
+	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
+
+	ret = submit_bio_wait(&bio);
+	if (ret)
 		return -1;
-	super_tmp = (struct btrfs_super_block *)
-	    (bh->b_data + (dev_bytenr & (BTRFS_BDEV_BLOCKSIZE - 1)));
+
+	unlock_page(page);
+
+	super_tmp = kmap(page);
 
 	if (btrfs_super_bytenr(super_tmp) != dev_bytenr ||
 	    btrfs_super_magic(super_tmp) != BTRFS_MAGIC ||
 	    memcmp(device->uuid, super_tmp->dev_item.uuid, BTRFS_UUID_SIZE) ||
 	    btrfs_super_nodesize(super_tmp) != state->metablock_size ||
 	    btrfs_super_sectorsize(super_tmp) != state->datablock_size) {
-		brelse(bh);
+		btrfs_release_disk_super(page);
 		return 0;
 	}
 
@@ -795,7 +813,7 @@ static int btrfsic_process_superblock_dev_mirror(
 		superblock_tmp = btrfsic_block_alloc();
 		if (NULL == superblock_tmp) {
 			pr_info("btrfsic: error, kmalloc failed!\n");
-			brelse(bh);
+			btrfs_release_disk_super(page);
 			return -1;
 		}
 		/* for superblock, only the dev_bytenr makes sense */
@@ -880,7 +898,7 @@ static int btrfsic_process_superblock_dev_mirror(
 					      mirror_num)) {
 				pr_info("btrfsic: btrfsic_map_block(bytenr @%llu, mirror %d) failed!\n",
 				       next_bytenr, mirror_num);
-				brelse(bh);
+				btrfs_release_disk_super(page);
 				return -1;
 			}
 
@@ -890,7 +908,7 @@ static int btrfsic_process_superblock_dev_mirror(
 					mirror_num, NULL);
 			if (NULL == next_block) {
 				btrfsic_release_block_ctx(&tmp_next_block_ctx);
-				brelse(bh);
+				btrfs_release_disk_super(page);
 				return -1;
 			}
 
@@ -902,7 +920,7 @@ static int btrfsic_process_superblock_dev_mirror(
 					BTRFSIC_GENERATION_UNKNOWN);
 			btrfsic_release_block_ctx(&tmp_next_block_ctx);
 			if (NULL == l) {
-				brelse(bh);
+				btrfs_release_disk_super(page);
 				return -1;
 			}
 		}
@@ -910,7 +928,7 @@ static int btrfsic_process_superblock_dev_mirror(
 	if (state->print_mask & BTRFSIC_PRINT_MASK_INITIAL_ALL_TREES)
 		btrfsic_dump_tree_sub(state, superblock_tmp, 0);
 
-	brelse(bh);
+	btrfs_release_disk_super(page);
 	return 0;
 }
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/6] btrfs: export btrfs_release_disk_super
  2020-01-23  8:18 ` [PATCH v2 1/6] btrfs: export btrfs_release_disk_super Johannes Thumshirn
@ 2020-01-23 13:42   ` Josef Bacik
  0 siblings, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2020-01-23 13:42 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 1/23/20 3:18 AM, Johannes Thumshirn wrote:
> From: Nikolay Borisov <nborisov@suse.com>
> 
> Preparatory patch for removal of buffer_head usage in btrfs.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
@ 2020-01-23 13:51   ` Josef Bacik
  2020-01-23 13:53     ` Johannes Thumshirn
  2020-01-23 14:13   ` Nikolay Borisov
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2020-01-23 13:51 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 1/23/20 3:18 AM, Johannes Thumshirn wrote:
> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
> some drawbacks, like not being able to propagate errors from the lower
> layers.
> 
> Change the buffer_heads to BIOs and utilize the page cache for the page
> allocation. Compared to buffer_heads using BIOs are more lightweight and
> we skip several layers of buffer_head code until we either reach the page
> cache or build a BIO and submit it to read the blocks from disk.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> ---
> Changes to v1:
> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
> - switch to using pagecahce instead of alloc_pages() (Nikolay, David)
> ---
>   fs/btrfs/disk-io.c | 83 ++++++++++++++++++++++++++++++----------------
>   fs/btrfs/disk-io.h |  4 +--
>   fs/btrfs/volumes.c | 62 ++++++++++++++++++++--------------
>   fs/btrfs/volumes.h |  2 --
>   4 files changed, 94 insertions(+), 57 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index aea48d6ddc0c..b111f32108cc 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2635,11 +2635,12 @@ int __cold open_ctree(struct super_block *sb,
>   	u64 features;
>   	u16 csum_type;
>   	struct btrfs_key location;
> -	struct buffer_head *bh;
>   	struct btrfs_super_block *disk_super;
>   	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>   	struct btrfs_root *tree_root;
>   	struct btrfs_root *chunk_root;
> +	struct page *super_page;
> +	u8 *superblock;
>   	int ret;
>   	int err = -EINVAL;
>   	int clear_free_space_tree = 0;
> @@ -2832,28 +2833,31 @@ int __cold open_ctree(struct super_block *sb,
>   	/*
>   	 * Read super block and check the signature bytes only
>   	 */
> -	bh = btrfs_read_dev_super(fs_devices->latest_bdev);
> -	if (IS_ERR(bh)) {
> -		err = PTR_ERR(bh);
> +	ret = btrfs_read_dev_super(fs_devices->latest_bdev, &super_page);
> +	if (ret) {
> +		err = ret;
>   		goto fail_alloc;
>   	}
>   
> +	superblock = kmap(super_page);
>   	/*
>   	 * Verify the type first, if that or the the checksum value are
>   	 * corrupted, we'll find out
>   	 */
> -	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)bh->b_data);
> +	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)
> +					  superblock);
>   	if (!btrfs_supported_super_csum(csum_type)) {
>   		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
>   			  csum_type);
>   		err = -EINVAL;
> -		brelse(bh);
> +		btrfs_release_disk_super(super_page);
>   		goto fail_alloc;
>   	}
>   
>   	ret = btrfs_init_csum_hash(fs_info, csum_type);
>   	if (ret) {
>   		err = ret;
> +		btrfs_release_disk_super(super_page);
>   		goto fail_alloc;
>   	}
>   
> @@ -2861,10 +2865,10 @@ int __cold open_ctree(struct super_block *sb,
>   	 * We want to check superblock checksum, the type is stored inside.
>   	 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
>   	 */
> -	if (btrfs_check_super_csum(fs_info, bh->b_data)) {
> +	if (btrfs_check_super_csum(fs_info, superblock)) {
>   		btrfs_err(fs_info, "superblock checksum mismatch");
>   		err = -EINVAL;
> -		brelse(bh);
> +		btrfs_release_disk_super(super_page);
>   		goto fail_csum;
>   	}
>   
> @@ -2873,8 +2877,8 @@ int __cold open_ctree(struct super_block *sb,
>   	 * following bytes up to INFO_SIZE, the checksum is calculated from
>   	 * the whole block of INFO_SIZE
>   	 */
> -	memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
> -	brelse(bh);
> +	memcpy(fs_info->super_copy, superblock, sizeof(*fs_info->super_copy));
> +	btrfs_release_disk_super(super_page);
>   
>   	disk_super = fs_info->super_copy;
>   
> @@ -3374,40 +3378,60 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
>   }
>   
>   int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> -			struct buffer_head **bh_ret)
> +			struct page **super_page)
>   {
> -	struct buffer_head *bh;
>   	struct btrfs_super_block *super;
> +	struct bio_vec bio_vec;
> +	struct bio bio;
> +	struct page *page;
>   	u64 bytenr;
> +	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	gfp_t gfp_mask;
> +	int ret;
>   
>   	bytenr = btrfs_sb_offset(copy_num);
>   	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>   		return -EINVAL;
>   
> -	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> +	page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
> +	if (!page)
> +		return -ENOMEM;
> +
> +	bio_init(&bio, &bio_vec, 1);
> +	bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +	bio_set_dev(&bio, bdev);
> +	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
> +		     offset_in_page(bytenr));
> +
> +	ret = submit_bio_wait(&bio);
> +	unlock_page(page);
>   	/*
>   	 * If we fail to read from the underlying devices, as of now
>   	 * the best option we have is to mark it EIO.
>   	 */
> -	if (!bh)
> +	if (ret) {
> +		put_page(page);
>   		return -EIO;
> +	}
>   
> -	super = (struct btrfs_super_block *)bh->b_data;
> +	super = kmap(page);
>   	if (btrfs_super_bytenr(super) != bytenr ||
>   		    btrfs_super_magic(super) != BTRFS_MAGIC) {
> -		brelse(bh);
> +		btrfs_release_disk_super(page);
>   		return -EINVAL;
>   	}
> +	kunmap(page);
>   
> -	*bh_ret = bh;
> +	*super_page = page;
>   	return 0;
>   }
>   
>   
> -struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
> +int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
>   {
> -	struct buffer_head *bh;
> -	struct buffer_head *latest = NULL;
> +	struct page *latest = NULL;
>   	struct btrfs_super_block *super;
>   	int i;
>   	u64 transid = 0;
> @@ -3419,25 +3443,28 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>   	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>   	 */
>   	for (i = 0; i < 1; i++) {
> -		ret = btrfs_read_dev_one_super(bdev, i, &bh);
> +		ret = btrfs_read_dev_one_super(bdev, i, page);
>   		if (ret)
>   			continue;
>   
> -		super = (struct btrfs_super_block *)bh->b_data;
> +		super = kmap(*page);
>   
>   		if (!latest || btrfs_super_generation(super) > transid) {
> -			brelse(latest);
> -			latest = bh;
> +			if (latest)
> +				btrfs_release_disk_super(latest);
> +			latest = *page;
>   			transid = btrfs_super_generation(super);
>   		} else {
> -			brelse(bh);
> +			btrfs_release_disk_super(*page);
>   		}
> +
> +		kunmap(*page);
>   	}
>   
> -	if (!latest)
> -		return ERR_PTR(ret);
> +	if (ret)
> +		return ret;
>   
> -	return latest;
> +	return 0;
>   }
>   
>   /*
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 8c2d6cf1ce59..e04b233c436a 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -54,9 +54,9 @@ int __cold open_ctree(struct super_block *sb,
>   	       char *options);
>   void __cold close_ctree(struct btrfs_fs_info *fs_info);
>   int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors);
> -struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
> +int btrfs_read_dev_super(struct block_device *bdev, struct page **super_page);
>   int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> -			struct buffer_head **bh_ret);
> +			     struct page **super_page);
>   int btrfs_commit_super(struct btrfs_fs_info *fs_info);
>   struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
>   				      struct btrfs_key *location);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7a480a2bdf51..f4a6ee518f0c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6,7 +6,6 @@
>   #include <linux/sched.h>
>   #include <linux/bio.h>
>   #include <linux/slab.h>
> -#include <linux/buffer_head.h>
>   #include <linux/blkdev.h>
>   #include <linux/ratelimit.h>
>   #include <linux/kthread.h>
> @@ -500,7 +499,7 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
>   static int
>   btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
>   		      int flush, struct block_device **bdev,
> -		      struct buffer_head **bh)
> +		      struct page **super_page)
>   {
>   	int ret;
>   
> @@ -519,9 +518,8 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
>   		goto error;
>   	}
>   	invalidate_bdev(*bdev);
> -	*bh = btrfs_read_dev_super(*bdev);
> -	if (IS_ERR(*bh)) {
> -		ret = PTR_ERR(*bh);
> +	ret = btrfs_read_dev_super(*bdev, super_page);
> +	if (ret) {
>   		blkdev_put(*bdev, flags);
>   		goto error;
>   	}
> @@ -530,7 +528,6 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
>   
>   error:
>   	*bdev = NULL;
> -	*bh = NULL;
>   	return ret;
>   }
>   
> @@ -611,7 +608,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>   {
>   	struct request_queue *q;
>   	struct block_device *bdev;
> -	struct buffer_head *bh;
> +	struct page *super_page;
>   	struct btrfs_super_block *disk_super;
>   	u64 devid;
>   	int ret;
> @@ -622,17 +619,17 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>   		return -EINVAL;
>   
>   	ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
> -				    &bdev, &bh);
> +				    &bdev, &super_page);
>   	if (ret)
>   		return ret;
>   
> -	disk_super = (struct btrfs_super_block *)bh->b_data;
> +	disk_super = kmap(super_page);
>   	devid = btrfs_stack_device_id(&disk_super->dev_item);
>   	if (devid != device->devid)
> -		goto error_brelse;
> +		goto error_free_page;
>   
>   	if (memcmp(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE))
> -		goto error_brelse;
> +		goto error_free_page;
>   
>   	device->generation = btrfs_super_generation(disk_super);
>   
> @@ -641,7 +638,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>   		    BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
>   			pr_err(
>   		"BTRFS: Invalid seeding and uuid-changed device detected\n");
> -			goto error_brelse;
> +			goto error_free_page;
>   		}
>   
>   		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> @@ -667,12 +664,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>   		fs_devices->rw_devices++;
>   		list_add_tail(&device->dev_alloc_list, &fs_devices->alloc_list);
>   	}
> -	brelse(bh);
> +	btrfs_release_disk_super(super_page);
>   
>   	return 0;
>   
> -error_brelse:
> -	brelse(bh);
> +error_free_page:
> +	btrfs_release_disk_super(super_page);
>   	blkdev_put(bdev, flags);
>   
>   	return -EINVAL;
> @@ -2209,14 +2206,15 @@ static struct btrfs_device *btrfs_find_device_by_path(
>   	u64 devid;
>   	u8 *dev_uuid;
>   	struct block_device *bdev;
> -	struct buffer_head *bh;
> +	struct page *super_page;
>   	struct btrfs_device *device;
>   
>   	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
> -				    fs_info->bdev_holder, 0, &bdev, &bh);
> +				    fs_info->bdev_holder, 0, &bdev,
> +				    &super_page);
>   	if (ret)
>   		return ERR_PTR(ret);
> -	disk_super = (struct btrfs_super_block *)bh->b_data;
> +	disk_super = kmap(super_page);
>   	devid = btrfs_stack_device_id(&disk_super->dev_item);
>   	dev_uuid = disk_super->dev_item.uuid;
>   	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
> @@ -2226,7 +2224,7 @@ static struct btrfs_device *btrfs_find_device_by_path(
>   		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
>   					   disk_super->fsid, true);
>   
> -	brelse(bh);
> +	btrfs_release_disk_super(super_page);
>   	if (!device)
>   		device = ERR_PTR(-ENOENT);
>   	blkdev_put(bdev, FMODE_READ);
> @@ -7319,25 +7317,39 @@ int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
>   
>   void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path)
>   {
> -	struct buffer_head *bh;
> +	struct bio_vec bio_vec;
> +	struct bio bio;
>   	struct btrfs_super_block *disk_super;
>   	int copy_num;
>   
>   	if (!bdev)
>   		return;
>   
> +	bio_init(&bio, &bio_vec, 1);
>   	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX;
>   		copy_num++) {
> +		u64 bytenr = btrfs_sb_offset(copy_num);
> +		struct page *page;
>   
> -		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
> +		if (btrfs_read_dev_one_super(bdev, copy_num, &page))
>   			continue;
>   
> -		disk_super = (struct btrfs_super_block *)bh->b_data;
> +		disk_super = kmap(page) + offset_in_page(bytenr);
>   
>   		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
> -		set_buffer_dirty(bh);
> -		sync_dirty_buffer(bh);
> -		brelse(bh);
> +
> +		bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +		bio_set_dev(&bio, bdev);
> +		bio_set_op_attrs(&bio, REQ_OP_WRITE, 0);
nit: We're losing REQ_SYNC here, not that it matters but if you have to re-roll 
it may be nice to have.  Otherwise

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23 13:51   ` Josef Bacik
@ 2020-01-23 13:53     ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23 13:53 UTC (permalink / raw)
  To: Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 23/01/2020, 14:51, "Josef Bacik" <josef@toxicpanda.com> wrote:
[...]
    > -		brelse(bh);
    > +
    > +		bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
    > +		bio_set_dev(&bio, bdev);
    > +		bio_set_op_attrs(&bio, REQ_OP_WRITE, 0);
    nit: We're losing REQ_SYNC here, not that it matters but if you have to re-roll 
    it may be nice to have.  Otherwise
    
No actually we don't. submnit_bio_wait() implies REQ_SYNC, so we're all good.

    Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,
	Johannes
   


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout
  2020-01-23  8:18 ` [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout Johannes Thumshirn
@ 2020-01-23 13:59   ` Josef Bacik
  2020-01-23 14:19   ` Nikolay Borisov
  2020-01-24 14:15   ` David Sterba
  2 siblings, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2020-01-23 13:59 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 1/23/20 3:18 AM, Johannes Thumshirn wrote:
> Similar to the superblock read path, change the write path to using BIOs
> and pages instead of buffer_heads. This allows us to skip over the
> buffer_head code, for writing the superblock to disk.
> 
> This is based on a patch originally authored by Nikolay Borisov.
> 
> Co-developed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/6] btrfs: remove btrfsic_submit_bh()
  2020-01-23  8:18 ` [PATCH v2 4/6] btrfs: remove btrfsic_submit_bh() Johannes Thumshirn
@ 2020-01-23 14:00   ` Josef Bacik
  0 siblings, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2020-01-23 14:00 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 1/23/20 3:18 AM, Johannes Thumshirn wrote:
> Now that the last use of btrfsic_submit_bh() is gone, remove the function
> as well.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/6] btrfs: remove buffer_heads from btrfsic_process_written_block()
  2020-01-23  8:18 ` [PATCH v2 5/6] btrfs: remove buffer_heads from btrfsic_process_written_block() Johannes Thumshirn
@ 2020-01-23 14:00   ` Josef Bacik
  0 siblings, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2020-01-23 14:00 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 1/23/20 3:18 AM, Johannes Thumshirn wrote:
> Now that the last caller of btrfsic_process_written_block() with
> buffer_heads is gone, remove the buffer_head processing path from it as
> well.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking
  2020-01-23  8:18 ` [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
@ 2020-01-23 14:03   ` Josef Bacik
  2020-01-23 14:23   ` Nikolay Borisov
  2020-01-24 14:22   ` David Sterba
  2 siblings, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2020-01-23 14:03 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 1/23/20 3:18 AM, Johannes Thumshirn wrote:
> The integrity checking code for the superblock mirrors is the last remaining
> user of buffer_heads in BTRFS, change it to using plain BIOs as well.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> ---
> Changes to v1:
> - Convert from alloc_page() to find_or_create_page()
> ---
>   fs/btrfs/check-integrity.c | 44 +++++++++++++++++++++++++++-----------
>   1 file changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index 4f6db2fe482a..45b88bcd6cbb 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -77,7 +77,6 @@
>   
>   #include <linux/sched.h>
>   #include <linux/slab.h>
> -#include <linux/buffer_head.h>
>   #include <linux/mutex.h>
>   #include <linux/genhd.h>
>   #include <linux/blkdev.h>
> @@ -762,28 +761,47 @@ static int btrfsic_process_superblock_dev_mirror(
>   	struct btrfs_fs_info *fs_info = state->fs_info;
>   	struct btrfs_super_block *super_tmp;
>   	u64 dev_bytenr;
> -	struct buffer_head *bh;
>   	struct btrfsic_block *superblock_tmp;
>   	int pass;
>   	struct block_device *const superblock_bdev = device->bdev;
> +	struct page *page;
> +	struct bio bio;
> +	struct bio_vec bio_vec;
> +	struct address_space *mapping = superblock_bdev->bd_inode->i_mapping;
> +	gfp_t gfp_mask;
> +	int ret;
>   
>   	/* super block bytenr is always the unmapped device bytenr */
>   	dev_bytenr = btrfs_sb_offset(superblock_mirror_num);
>   	if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->commit_total_bytes)
>   		return -1;
> -	bh = __bread(superblock_bdev, dev_bytenr / BTRFS_BDEV_BLOCKSIZE,
> -		     BTRFS_SUPER_INFO_SIZE);
> -	if (NULL == bh)
> +
> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> +

I don't think we need __GFP_NOFAIL here, it's just check integrity.  Other than 
that you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
  2020-01-23 13:51   ` Josef Bacik
@ 2020-01-23 14:13   ` Nikolay Borisov
  2020-01-23 16:45     ` Johannes Thumshirn
  2020-01-23 22:25   ` David Sterba
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2020-01-23 14:13 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba; +Cc: linux-btrfs @ vger . kernel . org



On 23.01.20 г. 10:18 ч., Johannes Thumshirn wrote:

<snip>

> @@ -3374,40 +3378,60 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
>  }
>  
>  int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> -			struct buffer_head **bh_ret)
> +			struct page **super_page)
>  {
> -	struct buffer_head *bh;
>  	struct btrfs_super_block *super;
> +	struct bio_vec bio_vec;
> +	struct bio bio;
> +	struct page *page;
>  	u64 bytenr;
> +	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	gfp_t gfp_mask;
> +	int ret;
>  
>  	bytenr = btrfs_sb_offset(copy_num);
>  	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>  		return -EINVAL;
>  
> -	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> +	page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
> +	if (!page)
> +		return -ENOMEM;
> +
> +	bio_init(&bio, &bio_vec, 1);
> +	bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +	bio_set_dev(&bio, bdev);
> +	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
> +		     offset_in_page(bytenr));
> +
> +	ret = submit_bio_wait(&bio);
> +	unlock_page(page);

So this is based on my code where the page is unlocked as soon as it's
read. However, as per out off-list discussion, I think this page needs
to be unlocked once the caller of
btrfs_read_dev_one_super/btrfs_read_dev_super is finished with it.

>  	/*
>  	 * If we fail to read from the underlying devices, as of now
>  	 * the best option we have is to mark it EIO.
>  	 */
> -	if (!bh)
> +	if (ret) {
> +		put_page(page);
>  		return -EIO;
> +	}
>  
> -	super = (struct btrfs_super_block *)bh->b_data;
> +	super = kmap(page);
>  	if (btrfs_super_bytenr(super) != bytenr ||
>  		    btrfs_super_magic(super) != BTRFS_MAGIC) {
> -		brelse(bh);
> +		btrfs_release_disk_super(page);
>  		return -EINVAL;
>  	}
> +	kunmap(page);
>  
> -	*bh_ret = bh;
> +	*super_page = page;
>  	return 0;
>  }
>  
>  
> -struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
> +int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
>  {
> -	struct buffer_head *bh;
> -	struct buffer_head *latest = NULL;
> +	struct page *latest = NULL;
>  	struct btrfs_super_block *super;
>  	int i;
>  	u64 transid = 0;

<snip>

>  
>  void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path)
>  {
> -	struct buffer_head *bh;
> +	struct bio_vec bio_vec;
> +	struct bio bio;
>  	struct btrfs_super_block *disk_super;
>  	int copy_num;
>  
>  	if (!bdev)
>  		return;
>  
> +	bio_init(&bio, &bio_vec, 1);
>  	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX;
>  		copy_num++) {
> +		u64 bytenr = btrfs_sb_offset(copy_num);
> +		struct page *page;
>  
> -		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
> +		if (btrfs_read_dev_one_super(bdev, copy_num, &page))
>  			continue;
>  
> -		disk_super = (struct btrfs_super_block *)bh->b_data;
> +		disk_super = kmap(page) + offset_in_page(bytenr);
>  
>  		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
> -		set_buffer_dirty(bh);
> -		sync_dirty_buffer(bh);
> -		brelse(bh);
> +
> +		bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +		bio_set_dev(&bio, bdev);
> +		bio_set_op_attrs(&bio, REQ_OP_WRITE, 0);
> +		bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
> +			     offset_in_page(bytenr));
> +
> +		lock_page(page);
> +		submit_bio_wait(&bio);
> +		unlock_page(page);
> +		btrfs_release_disk_super(page);
> +		bio_reset(&bio);

For example in this code if btrfs_read_dev_one_super held the page
locked you would have unlocked after submit_bio_wait is has returned.
But in this case what you do is take the page unlocked and now you have
a race window between btrfs_read_dev_one_super and lock_page(page) in
this function? btrfs_scratch_superblocks is used in replace context so
what if it races with transaction commit? If my worries are moot and I
have missed anything then at the very least safety should be documented
in the changelog of the patch.

> +
>  	}
>  
>  	/* Notify udev that device has changed */
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index b7f2edbc6581..1e4ebe6d6368 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -17,8 +17,6 @@ extern struct mutex uuid_mutex;
>  
>  #define BTRFS_STRIPE_LEN	SZ_64K
>  
> -struct buffer_head;
> -
>  struct btrfs_io_geometry {
>  	/* remaining bytes before crossing a stripe */
>  	u64 len;
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout
  2020-01-23  8:18 ` [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout Johannes Thumshirn
  2020-01-23 13:59   ` Josef Bacik
@ 2020-01-23 14:19   ` Nikolay Borisov
  2020-01-24 14:15   ` David Sterba
  2 siblings, 0 replies; 24+ messages in thread
From: Nikolay Borisov @ 2020-01-23 14:19 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba; +Cc: linux-btrfs @ vger . kernel . org



On 23.01.20 г. 10:18 ч., Johannes Thumshirn wrote:
> Similar to the superblock read path, change the write path to using BIOs
> and pages instead of buffer_heads. This allows us to skip over the
> buffer_head code, for writing the superblock to disk.
> 
> This is based on a patch originally authored by Nikolay Borisov.
> 
> Co-developed-by: Nikolay Borisov <nborisov@suse.com>

As per Documentation/process/5.Posting.rst:

Every Co-developed-by: must be immediately followed by a Signed-off-by:
of the associated co-author.

FWIW I'm giving mine (I guess david can fix this during merge):

Signed-off-by: Nikolay Borisov <nborisov@suse.com>

> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 

Additionally,

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking
  2020-01-23  8:18 ` [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
  2020-01-23 14:03   ` Josef Bacik
@ 2020-01-23 14:23   ` Nikolay Borisov
  2020-01-24 14:22   ` David Sterba
  2 siblings, 0 replies; 24+ messages in thread
From: Nikolay Borisov @ 2020-01-23 14:23 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba; +Cc: linux-btrfs @ vger . kernel . org



On 23.01.20 г. 10:18 ч., Johannes Thumshirn wrote:
> The integrity checking code for the superblock mirrors is the last remaining
> user of buffer_heads in BTRFS, change it to using plain BIOs as well.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> ---
> Changes to v1:
> - Convert from alloc_page() to find_or_create_page()
> ---
>  fs/btrfs/check-integrity.c | 44 +++++++++++++++++++++++++++-----------
>  1 file changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index 4f6db2fe482a..45b88bcd6cbb 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -77,7 +77,6 @@
>  
>  #include <linux/sched.h>
>  #include <linux/slab.h>
> -#include <linux/buffer_head.h>
>  #include <linux/mutex.h>
>  #include <linux/genhd.h>
>  #include <linux/blkdev.h>
> @@ -762,28 +761,47 @@ static int btrfsic_process_superblock_dev_mirror(
>  	struct btrfs_fs_info *fs_info = state->fs_info;
>  	struct btrfs_super_block *super_tmp;
>  	u64 dev_bytenr;
> -	struct buffer_head *bh;
>  	struct btrfsic_block *superblock_tmp;
>  	int pass;
>  	struct block_device *const superblock_bdev = device->bdev;
> +	struct page *page;
> +	struct bio bio;
> +	struct bio_vec bio_vec;
> +	struct address_space *mapping = superblock_bdev->bd_inode->i_mapping;
> +	gfp_t gfp_mask;
> +	int ret;
>  
>  	/* super block bytenr is always the unmapped device bytenr */
>  	dev_bytenr = btrfs_sb_offset(superblock_mirror_num);
>  	if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->commit_total_bytes)
>  		return -1;
> -	bh = __bread(superblock_bdev, dev_bytenr / BTRFS_BDEV_BLOCKSIZE,
> -		     BTRFS_SUPER_INFO_SIZE);
> -	if (NULL == bh)
> +
> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> +
> +	page = find_or_create_page(mapping, dev_bytenr >> PAGE_SHIFT, gfp_mask);
> +	if (!page)
> +		return -1;
> +
> +	bio_init(&bio, &bio_vec, 1);
> +	bio.bi_iter.bi_sector = dev_bytenr >> SECTOR_SHIFT;
> +	bio_set_dev(&bio, superblock_bdev);
> +	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
> +
> +	ret = submit_bio_wait(&bio);
> +	if (ret)
>  		return -1;
> -	super_tmp = (struct btrfs_super_block *)
> -	    (bh->b_data + (dev_bytenr & (BTRFS_BDEV_BLOCKSIZE - 1)));
> +
> +	unlock_page(page);

This is safe since it's part of the integrity code which gets called
during mount so presumably we can't have a transaction commit while this
is running. I'd prefer an explicit mention of that in the changelog.

<snip>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23 14:13   ` Nikolay Borisov
@ 2020-01-23 16:45     ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-23 16:45 UTC (permalink / raw)
  To: Nikolay Borisov, David Sterba; +Cc: linux-btrfs @ vger . kernel . org

[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]

On 23/01/2020 15:14, Nikolay Borisov wrote:
> 
> 
> On 23.01.20 г. 10:18 ч., Johannes Thumshirn wrote:
> 
> <snip>
> 
>> @@ -3374,40 +3378,60 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
>>   }
>>   
>>   int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>> -			struct buffer_head **bh_ret)
>> +			struct page **super_page)
>>   {
>> -	struct buffer_head *bh;
>>   	struct btrfs_super_block *super;
>> +	struct bio_vec bio_vec;
>> +	struct bio bio;
>> +	struct page *page;
>>   	u64 bytenr;
>> +	struct address_space *mapping = bdev->bd_inode->i_mapping;
>> +	gfp_t gfp_mask;
>> +	int ret;
>>   
>>   	bytenr = btrfs_sb_offset(copy_num);
>>   	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>>   		return -EINVAL;
>>   
>> -	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
>> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
>> +	page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
>> +	if (!page)
>> +		return -ENOMEM;
>> +
>> +	bio_init(&bio, &bio_vec, 1);
>> +	bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
>> +	bio_set_dev(&bio, bdev);
>> +	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
>> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
>> +		     offset_in_page(bytenr));
>> +
>> +	ret = submit_bio_wait(&bio);
>> +	unlock_page(page);
> 
> So this is based on my code where the page is unlocked as soon as it's
> read. However, as per out off-list discussion, I think this page needs
> to be unlocked once the caller of
> btrfs_read_dev_one_super/btrfs_read_dev_super is finished with it.

As I don't have an answer to that yet, I just gave it a try (see 
attached diff) and it instantly deadlocks on mount. So either I did 
something wrong or holding the lock until the callers have finished 
processing it is a bad idea.

Thanks,
	Johannes

[-- Attachment #2: disk-io.c.patch --]
[-- Type: text/plain, Size: 2406 bytes --]

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ac1755793596..7afac96be17f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2856,6 +2856,7 @@ int __cold open_ctree(struct super_block *sb,
 	ret = btrfs_init_csum_hash(fs_info, csum_type);
 	if (ret) {
 		err = ret;
+		unlock_page(super_page);
 		btrfs_release_disk_super(super_page);
 		goto fail_alloc;
 	}
@@ -2867,6 +2868,7 @@ int __cold open_ctree(struct super_block *sb,
 	if (btrfs_check_super_csum(fs_info, superblock)) {
 		btrfs_err(fs_info, "superblock checksum mismatch");
 		err = -EINVAL;
+		unlock_page(super_page);
 		btrfs_release_disk_super(super_page);
 		goto fail_csum;
 	}
@@ -2877,6 +2879,7 @@ int __cold open_ctree(struct super_block *sb,
 	 * the whole block of INFO_SIZE
 	 */
 	memcpy(fs_info->super_copy, superblock, sizeof(*fs_info->super_copy));
+	unlock_page(super_page);
 	btrfs_release_disk_super(super_page);
 
 	disk_super = fs_info->super_copy;
@@ -3413,7 +3416,6 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
 		     offset_in_page(bytenr));
 
 	ret = submit_bio_wait(&bio);
-	unlock_page(page);
 	/*
 	 * If we fail to read from the underlying devices, as of now
 	 * the best option we have is to mark it EIO.
@@ -3426,6 +3428,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
 	super = kmap(page);
 	if (btrfs_super_bytenr(super) != bytenr ||
 		    btrfs_super_magic(super) != BTRFS_MAGIC) {
+		unlock_page(page);
 		btrfs_release_disk_super(page);
 		return -EINVAL;
 	}
@@ -3457,11 +3460,14 @@ int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
 		super = kmap(*page);
 
 		if (!latest || btrfs_super_generation(super) > transid) {
-			if (latest)
+			if (latest) {
+				unlock_page(latest);
 				btrfs_release_disk_super(latest);
+			}
 			latest = *page;
 			transid = btrfs_super_generation(super);
 		} else {
+			unlock_page(*page);
 			btrfs_release_disk_super(*page);
 		}
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4a6ee518f0c..8bb2f9dead63 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7344,7 +7344,6 @@ void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_pat
 		bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
 			     offset_in_page(bytenr));
 
-		lock_page(page);
 		submit_bio_wait(&bio);
 		unlock_page(page);
 		btrfs_release_disk_super(page);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
  2020-01-23 13:51   ` Josef Bacik
  2020-01-23 14:13   ` Nikolay Borisov
@ 2020-01-23 22:25   ` David Sterba
  2020-01-24 13:54   ` David Sterba
  2020-01-24 14:03   ` David Sterba
  4 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2020-01-23 22:25 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Nikolay Borisov, linux-btrfs @ vger . kernel . org

On Thu, Jan 23, 2020 at 05:18:45PM +0900, Johannes Thumshirn wrote:
> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
> some drawbacks, like not being able to propagate errors from the lower
> layers.
> 
> Change the buffer_heads to BIOs and utilize the page cache for the page
> allocation. Compared to buffer_heads using BIOs are more lightweight and
> we skip several layers of buffer_head code until we either reach the page
> cache or build a BIO and submit it to read the blocks from disk.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> ---
> Changes to v1:
> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
> - switch to using pagecahce instead of alloc_pages() (Nikolay, David)
> ---
>  fs/btrfs/disk-io.c | 83 ++++++++++++++++++++++++++++++----------------
>  fs/btrfs/disk-io.h |  4 +--
>  fs/btrfs/volumes.c | 62 ++++++++++++++++++++--------------
>  fs/btrfs/volumes.h |  2 --
>  4 files changed, 94 insertions(+), 57 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index aea48d6ddc0c..b111f32108cc 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2635,11 +2635,12 @@ int __cold open_ctree(struct super_block *sb,
>  	u64 features;
>  	u16 csum_type;
>  	struct btrfs_key location;
> -	struct buffer_head *bh;
>  	struct btrfs_super_block *disk_super;
>  	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>  	struct btrfs_root *tree_root;
>  	struct btrfs_root *chunk_root;
> +	struct page *super_page;
> +	u8 *superblock;
>  	int ret;
>  	int err = -EINVAL;
>  	int clear_free_space_tree = 0;
> @@ -2832,28 +2833,31 @@ int __cold open_ctree(struct super_block *sb,
>  	/*
>  	 * Read super block and check the signature bytes only
>  	 */
> -	bh = btrfs_read_dev_super(fs_devices->latest_bdev);
> -	if (IS_ERR(bh)) {
> -		err = PTR_ERR(bh);
> +	ret = btrfs_read_dev_super(fs_devices->latest_bdev, &super_page);
> +	if (ret) {
> +		err = ret;
>  		goto fail_alloc;
>  	}
>  
> +	superblock = kmap(super_page);

Page mapped here

>  	/*
>  	 * Verify the type first, if that or the the checksum value are
>  	 * corrupted, we'll find out
>  	 */
> -	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)bh->b_data);
> +	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)
> +					  superblock);
>  	if (!btrfs_supported_super_csum(csum_type)) {
>  		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
>  			  csum_type);
>  		err = -EINVAL;
> -		brelse(bh);
> +		btrfs_release_disk_super(super_page);

and the pairing kunmap is hidden in this helper with mismatching name.
It's correct but the common cleanup code at the fail_alloc label can be
used to avoid calling btrfs_release_disk_super at each error branch.

The helper is trival, kunmap(), put_page(). I'd rather keep it opencoded
in case the kmap happens in the function so it can be easily seen what's
the span of the mapping.

>  		goto fail_alloc;
>  	}
>  
>  	ret = btrfs_init_csum_hash(fs_info, csum_type);
>  	if (ret) {
>  		err = ret;
> +		btrfs_release_disk_super(super_page);
>  		goto fail_alloc;
>  	}
>  
> @@ -2861,10 +2865,10 @@ int __cold open_ctree(struct super_block *sb,
>  	 * We want to check superblock checksum, the type is stored inside.
>  	 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
>  	 */
> -	if (btrfs_check_super_csum(fs_info, bh->b_data)) {
> +	if (btrfs_check_super_csum(fs_info, superblock)) {
>  		btrfs_err(fs_info, "superblock checksum mismatch");
>  		err = -EINVAL;
> -		brelse(bh);
> +		btrfs_release_disk_super(super_page);
>  		goto fail_csum;
>  	}
>  
> @@ -2873,8 +2877,8 @@ int __cold open_ctree(struct super_block *sb,
>  	 * following bytes up to INFO_SIZE, the checksum is calculated from
>  	 * the whole block of INFO_SIZE
>  	 */
> -	memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
> -	brelse(bh);
> +	memcpy(fs_info->super_copy, superblock, sizeof(*fs_info->super_copy));
> +	btrfs_release_disk_super(super_page);
>  
>  	disk_super = fs_info->super_copy;
>  
> @@ -3374,40 +3378,60 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
>  }
>  
>  int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> -			struct buffer_head **bh_ret)
> +			struct page **super_page)
>  {
> -	struct buffer_head *bh;
>  	struct btrfs_super_block *super;
> +	struct bio_vec bio_vec;
> +	struct bio bio;
> +	struct page *page;
>  	u64 bytenr;
> +	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	gfp_t gfp_mask;
> +	int ret;
>  
>  	bytenr = btrfs_sb_offset(copy_num);
>  	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>  		return -EINVAL;
>  
> -	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> +	page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
> +	if (!page)
> +		return -ENOMEM;
> +
> +	bio_init(&bio, &bio_vec, 1);
> +	bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +	bio_set_dev(&bio, bdev);
> +	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
> +		     offset_in_page(bytenr));
> +
> +	ret = submit_bio_wait(&bio);
> +	unlock_page(page);
>  	/*
>  	 * If we fail to read from the underlying devices, as of now
>  	 * the best option we have is to mark it EIO.
>  	 */
> -	if (!bh)
> +	if (ret) {
> +		put_page(page);
>  		return -EIO;
> +	}
>  
> -	super = (struct btrfs_super_block *)bh->b_data;
> +	super = kmap(page);
>  	if (btrfs_super_bytenr(super) != bytenr ||
>  		    btrfs_super_magic(super) != BTRFS_MAGIC) {
> -		brelse(bh);
> +		btrfs_release_disk_super(page);
>  		return -EINVAL;
>  	}
> +	kunmap(page);

Like here

>  
> -	*bh_ret = bh;
> +	*super_page = page;
>  	return 0;
>  }
>  
>  
> -struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
> +int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
>  {
> -	struct buffer_head *bh;
> -	struct buffer_head *latest = NULL;
> +	struct page *latest = NULL;
>  	struct btrfs_super_block *super;
>  	int i;
>  	u64 transid = 0;
> @@ -3419,25 +3443,28 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */
>  	for (i = 0; i < 1; i++) {
> -		ret = btrfs_read_dev_one_super(bdev, i, &bh);
> +		ret = btrfs_read_dev_one_super(bdev, i, page);
>  		if (ret)
>  			continue;
>  
> -		super = (struct btrfs_super_block *)bh->b_data;
> +		super = kmap(*page);
>  
>  		if (!latest || btrfs_super_generation(super) > transid) {
> -			brelse(latest);
> -			latest = bh;
> +			if (latest)
> +				btrfs_release_disk_super(latest);
> +			latest = *page;
>  			transid = btrfs_super_generation(super);
>  		} else {
> -			brelse(bh);
> +			btrfs_release_disk_super(*page);
>  		}
> +
> +		kunmap(*page);

And here.

>  	}
>  
> -	if (!latest)
> -		return ERR_PTR(ret);
> +	if (ret)
> +		return ret;
>  
> -	return latest;
> +	return 0;
>  }
>  
>  /*
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 8c2d6cf1ce59..e04b233c436a 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -54,9 +54,9 @@ int __cold open_ctree(struct super_block *sb,
>  	       char *options);
>  void __cold close_ctree(struct btrfs_fs_info *fs_info);
>  int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors);
> -struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
> +int btrfs_read_dev_super(struct block_device *bdev, struct page **super_page);
>  int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> -			struct buffer_head **bh_ret);
> +			     struct page **super_page);
>  int btrfs_commit_super(struct btrfs_fs_info *fs_info);
>  struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
>  				      struct btrfs_key *location);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7a480a2bdf51..f4a6ee518f0c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6,7 +6,6 @@
>  #include <linux/sched.h>
>  #include <linux/bio.h>
>  #include <linux/slab.h>
> -#include <linux/buffer_head.h>
>  #include <linux/blkdev.h>
>  #include <linux/ratelimit.h>
>  #include <linux/kthread.h>
> @@ -500,7 +499,7 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
>  static int
>  btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
>  		      int flush, struct block_device **bdev,
> -		      struct buffer_head **bh)
> +		      struct page **super_page)
>  {
>  	int ret;
>  
> @@ -519,9 +518,8 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
>  		goto error;
>  	}
>  	invalidate_bdev(*bdev);
> -	*bh = btrfs_read_dev_super(*bdev);
> -	if (IS_ERR(*bh)) {
> -		ret = PTR_ERR(*bh);
> +	ret = btrfs_read_dev_super(*bdev, super_page);
> +	if (ret) {
>  		blkdev_put(*bdev, flags);
>  		goto error;
>  	}
> @@ -530,7 +528,6 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
>  
>  error:
>  	*bdev = NULL;
> -	*bh = NULL;
>  	return ret;
>  }
>  
> @@ -611,7 +608,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  {
>  	struct request_queue *q;
>  	struct block_device *bdev;
> -	struct buffer_head *bh;
> +	struct page *super_page;
>  	struct btrfs_super_block *disk_super;
>  	u64 devid;
>  	int ret;
> @@ -622,17 +619,17 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  		return -EINVAL;
>  
>  	ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
> -				    &bdev, &bh);
> +				    &bdev, &super_page);
>  	if (ret)
>  		return ret;
>  
> -	disk_super = (struct btrfs_super_block *)bh->b_data;
> +	disk_super = kmap(super_page);
>  	devid = btrfs_stack_device_id(&disk_super->dev_item);
>  	if (devid != device->devid)
> -		goto error_brelse;
> +		goto error_free_page;
>  
>  	if (memcmp(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE))
> -		goto error_brelse;
> +		goto error_free_page;
>  
>  	device->generation = btrfs_super_generation(disk_super);
>  
> @@ -641,7 +638,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  		    BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
>  			pr_err(
>  		"BTRFS: Invalid seeding and uuid-changed device detected\n");
> -			goto error_brelse;
> +			goto error_free_page;
>  		}
>  
>  		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> @@ -667,12 +664,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  		fs_devices->rw_devices++;
>  		list_add_tail(&device->dev_alloc_list, &fs_devices->alloc_list);
>  	}
> -	brelse(bh);
> +	btrfs_release_disk_super(super_page);

kummap here

>  
>  	return 0;
>  
> -error_brelse:
> -	brelse(bh);
> +error_free_page:
> +	btrfs_release_disk_super(super_page);
>  	blkdev_put(bdev, flags);
>  
>  	return -EINVAL;
> @@ -2209,14 +2206,15 @@ static struct btrfs_device *btrfs_find_device_by_path(
>  	u64 devid;
>  	u8 *dev_uuid;
>  	struct block_device *bdev;
> -	struct buffer_head *bh;
> +	struct page *super_page;
>  	struct btrfs_device *device;
>  
>  	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
> -				    fs_info->bdev_holder, 0, &bdev, &bh);
> +				    fs_info->bdev_holder, 0, &bdev,
> +				    &super_page);
>  	if (ret)
>  		return ERR_PTR(ret);
> -	disk_super = (struct btrfs_super_block *)bh->b_data;
> +	disk_super = kmap(super_page);
>  	devid = btrfs_stack_device_id(&disk_super->dev_item);
>  	dev_uuid = disk_super->dev_item.uuid;
>  	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
> @@ -2226,7 +2224,7 @@ static struct btrfs_device *btrfs_find_device_by_path(
>  		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
>  					   disk_super->fsid, true);
>  
> -	brelse(bh);
> +	btrfs_release_disk_super(super_page);

and here

>  	if (!device)
>  		device = ERR_PTR(-ENOENT);
>  	blkdev_put(bdev, FMODE_READ);
> @@ -7319,25 +7317,39 @@ int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
>  
>  void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path)
>  {
> -	struct buffer_head *bh;
> +	struct bio_vec bio_vec;
> +	struct bio bio;
>  	struct btrfs_super_block *disk_super;
>  	int copy_num;
>  
>  	if (!bdev)
>  		return;
>  
> +	bio_init(&bio, &bio_vec, 1);
>  	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX;
>  		copy_num++) {
> +		u64 bytenr = btrfs_sb_offset(copy_num);
> +		struct page *page;
>  
> -		if (btrfs_read_dev_one_super(bdev, copy_num, &bh))
> +		if (btrfs_read_dev_one_super(bdev, copy_num, &page))
>  			continue;
>  
> -		disk_super = (struct btrfs_super_block *)bh->b_data;
> +		disk_super = kmap(page) + offset_in_page(bytenr);
>  
>  		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
> -		set_buffer_dirty(bh);
> -		sync_dirty_buffer(bh);
> -		brelse(bh);
> +
> +		bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +		bio_set_dev(&bio, bdev);
> +		bio_set_op_attrs(&bio, REQ_OP_WRITE, 0);
> +		bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
> +			     offset_in_page(bytenr));
> +
> +		lock_page(page);
> +		submit_bio_wait(&bio);
> +		unlock_page(page);
> +		btrfs_release_disk_super(page);

And the last one

> +		bio_reset(&bio);
> +
>  	}
>  
>  	/* Notify udev that device has changed */
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index b7f2edbc6581..1e4ebe6d6368 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -17,8 +17,6 @@ extern struct mutex uuid_mutex;
>  
>  #define BTRFS_STRIPE_LEN	SZ_64K
>  
> -struct buffer_head;
> -
>  struct btrfs_io_geometry {
>  	/* remaining bytes before crossing a stripe */
>  	u64 len;
> -- 
> 2.24.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
                     ` (2 preceding siblings ...)
  2020-01-23 22:25   ` David Sterba
@ 2020-01-24 13:54   ` David Sterba
  2020-01-24 14:03   ` David Sterba
  4 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2020-01-24 13:54 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Nikolay Borisov, linux-btrfs @ vger . kernel . org

On Thu, Jan 23, 2020 at 05:18:45PM +0900, Johannes Thumshirn wrote:
> -	if (!latest)
> -		return ERR_PTR(ret);
> +	if (ret)
> +		return ret;
>  
> -	return latest;
> +	return 0;

This could be simpliried to 'return ret', no?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/6] btrfs: remove buffer heads from super block reading
  2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
                     ` (3 preceding siblings ...)
  2020-01-24 13:54   ` David Sterba
@ 2020-01-24 14:03   ` David Sterba
  4 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2020-01-24 14:03 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Nikolay Borisov, linux-btrfs @ vger . kernel . org

On Thu, Jan 23, 2020 at 05:18:45PM +0900, Johannes Thumshirn wrote:
> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
> some drawbacks, like not being able to propagate errors from the lower
> layers.
> 
> Change the buffer_heads to BIOs and utilize the page cache for the page
> allocation. Compared to buffer_heads using BIOs are more lightweight and
> we skip several layers of buffer_head code until we either reach the page
> cache or build a BIO and submit it to read the blocks from disk.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> ---
> Changes to v1:
> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
> - switch to using pagecahce instead of alloc_pages() (Nikolay, David)
> ---
>  fs/btrfs/disk-io.c | 83 ++++++++++++++++++++++++++++++----------------
>  fs/btrfs/disk-io.h |  4 +--
>  fs/btrfs/volumes.c | 62 ++++++++++++++++++++--------------
>  fs/btrfs/volumes.h |  2 --
>  4 files changed, 94 insertions(+), 57 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index aea48d6ddc0c..b111f32108cc 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2635,11 +2635,12 @@ int __cold open_ctree(struct super_block *sb,
>  	u64 features;
>  	u16 csum_type;
>  	struct btrfs_key location;
> -	struct buffer_head *bh;
>  	struct btrfs_super_block *disk_super;
>  	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>  	struct btrfs_root *tree_root;
>  	struct btrfs_root *chunk_root;
> +	struct page *super_page;
> +	u8 *superblock;
>  	int ret;
>  	int err = -EINVAL;
>  	int clear_free_space_tree = 0;
> @@ -2832,28 +2833,31 @@ int __cold open_ctree(struct super_block *sb,
>  	/*
>  	 * Read super block and check the signature bytes only
>  	 */
> -	bh = btrfs_read_dev_super(fs_devices->latest_bdev);
> -	if (IS_ERR(bh)) {
> -		err = PTR_ERR(bh);
> +	ret = btrfs_read_dev_super(fs_devices->latest_bdev, &super_page);
> +	if (ret) {
> +		err = ret;
>  		goto fail_alloc;
>  	}
>  
> +	superblock = kmap(super_page);
>  	/*
>  	 * Verify the type first, if that or the the checksum value are
>  	 * corrupted, we'll find out
>  	 */
> -	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)bh->b_data);
> +	csum_type = btrfs_super_csum_type((struct btrfs_super_block *)
> +					  superblock);
>  	if (!btrfs_supported_super_csum(csum_type)) {
>  		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
>  			  csum_type);
>  		err = -EINVAL;
> -		brelse(bh);
> +		btrfs_release_disk_super(super_page);
>  		goto fail_alloc;
>  	}
>  
>  	ret = btrfs_init_csum_hash(fs_info, csum_type);
>  	if (ret) {
>  		err = ret;
> +		btrfs_release_disk_super(super_page);
>  		goto fail_alloc;
>  	}
>  
> @@ -2861,10 +2865,10 @@ int __cold open_ctree(struct super_block *sb,
>  	 * We want to check superblock checksum, the type is stored inside.
>  	 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
>  	 */
> -	if (btrfs_check_super_csum(fs_info, bh->b_data)) {
> +	if (btrfs_check_super_csum(fs_info, superblock)) {
>  		btrfs_err(fs_info, "superblock checksum mismatch");
>  		err = -EINVAL;
> -		brelse(bh);
> +		btrfs_release_disk_super(super_page);
>  		goto fail_csum;
>  	}
>  
> @@ -2873,8 +2877,8 @@ int __cold open_ctree(struct super_block *sb,
>  	 * following bytes up to INFO_SIZE, the checksum is calculated from
>  	 * the whole block of INFO_SIZE
>  	 */
> -	memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
> -	brelse(bh);
> +	memcpy(fs_info->super_copy, superblock, sizeof(*fs_info->super_copy));
> +	btrfs_release_disk_super(super_page);
>  
>  	disk_super = fs_info->super_copy;
>  
> @@ -3374,40 +3378,60 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
>  }
>  
>  int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> -			struct buffer_head **bh_ret)
> +			struct page **super_page)
>  {
> -	struct buffer_head *bh;
>  	struct btrfs_super_block *super;
> +	struct bio_vec bio_vec;
> +	struct bio bio;
> +	struct page *page;
>  	u64 bytenr;
> +	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	gfp_t gfp_mask;
> +	int ret;
>  
>  	bytenr = btrfs_sb_offset(copy_num);
>  	if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>  		return -EINVAL;
>  
> -	bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
> +	gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> +	page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
> +	if (!page)
> +		return -ENOMEM;
> +
> +	bio_init(&bio, &bio_vec, 1);
> +	bio.bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +	bio_set_dev(&bio, bdev);
> +	bio_set_op_attrs(&bio, REQ_OP_READ, 0);

The comment in blk_types.h says:

390 /* obsolete, don't use in new code */
391 static inline void bio_set_op_attrs(struct bio *bio, unsigned op,
392                 unsigned op_flags)

please open code it. Other uses of bio_set_op_attrs have been cleaned up
already.

> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE,
> +		     offset_in_page(bytenr));

For a fresh bio this never fails right?

> +
> +	ret = submit_bio_wait(&bio);
> +	unlock_page(page);
>  	/*
>  	 * If we fail to read from the underlying devices, as of now
>  	 * the best option we have is to mark it EIO.
>  	 */
> -	if (!bh)
> +	if (ret) {
> +		put_page(page);
>  		return -EIO;
> +	}
>  
> -	super = (struct btrfs_super_block *)bh->b_data;
> +	super = kmap(page);
>  	if (btrfs_super_bytenr(super) != bytenr ||
>  		    btrfs_super_magic(super) != BTRFS_MAGIC) {
> -		brelse(bh);
> +		btrfs_release_disk_super(page);
>  		return -EINVAL;
>  	}
> +	kunmap(page);
>  
> -	*bh_ret = bh;
> +	*super_page = page;
>  	return 0;
>  }
>  
>  
> -struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
> +int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
>  {
> -	struct buffer_head *bh;
> -	struct buffer_head *latest = NULL;
> +	struct page *latest = NULL;
>  	struct btrfs_super_block *super;
>  	int i;
>  	u64 transid = 0;
> @@ -3419,25 +3443,28 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */
>  	for (i = 0; i < 1; i++) {
> -		ret = btrfs_read_dev_one_super(bdev, i, &bh);
> +		ret = btrfs_read_dev_one_super(bdev, i, page);
>  		if (ret)
>  			continue;
>  
> -		super = (struct btrfs_super_block *)bh->b_data;
> +		super = kmap(*page);
>  
>  		if (!latest || btrfs_super_generation(super) > transid) {
> -			brelse(latest);
> -			latest = bh;
> +			if (latest)
> +				btrfs_release_disk_super(latest);
> +			latest = *page;
>  			transid = btrfs_super_generation(super);
>  		} else {
> -			brelse(bh);
> +			btrfs_release_disk_super(*page);
>  		}
> +
> +		kunmap(*page);

This looks like double unmap, once in btrfs_release_disk_super and then
unconditinally as kunmap.

The first part of the if/else block also calls kunmap, but depending on
latest.

So another point for opencoding btrfs_release_disk_super, this kind of
mistakes is too easy.

>  	}
>  
> -	if (!latest)
> -		return ERR_PTR(ret);
> +	if (ret)
> +		return ret;
>  
> -	return latest;
> +	return 0;
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout
  2020-01-23  8:18 ` [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout Johannes Thumshirn
  2020-01-23 13:59   ` Josef Bacik
  2020-01-23 14:19   ` Nikolay Borisov
@ 2020-01-24 14:15   ` David Sterba
  2 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2020-01-24 14:15 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Nikolay Borisov, linux-btrfs @ vger . kernel . org

On Thu, Jan 23, 2020 at 05:18:46PM +0900, Johannes Thumshirn wrote:
> @@ -3507,26 +3520,22 @@ static int write_dev_supers(struct btrfs_device *device,
>  				    BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
>  		crypto_shash_final(shash, sb->csum);
>  
> -		/* One reference for us, and we leave it for the caller */
> -		bh = __getblk(device->bdev, bytenr / BTRFS_BDEV_BLOCKSIZE,
> -			      BTRFS_SUPER_INFO_SIZE);
> -		if (!bh) {
> +		page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT,
> +					   gfp_mask);

Note the difference: bh used BDEV_BLOCKSIZE which is fixed value of 4kb,
while bios use page size

With bh the bh_data always point to the requested 4KiB multiple, while
for the page it could be different. Now this will not break in practice
because all offsets of superblocks are multiples of 64KiB which is
probably the largest page size available on arches today.

> +		if (!page) {
>  			btrfs_err(device->fs_info,
> -			    "couldn't get super buffer head for bytenr %llu",
> +			    "couldn't get superblock page for bytenr %llu",
>  			    bytenr);
>  			errors++;
>  			continue;
>  		}
>  
> -		memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);
> -
> -		/* one reference for submit_bh */
> -		get_bh(bh);
> +		/* Bump the refcount for wait_dev_supers() */
> +		get_page(page);
>  
> -		set_buffer_uptodate(bh);
> -		lock_buffer(bh);
> -		bh->b_end_io = btrfs_end_buffer_write_sync;
> -		bh->b_private = device;
> +		ptr = kmap(page);
> +		memcpy(ptr, sb, BTRFS_SUPER_INFO_SIZE);

Copy superblock buffer to the beginning of the page, that's fine.

> +		kunmap(page);
>  
>  		/*
>  		 * we fua the first super.  The others we allow
> @@ -3535,9 +3544,17 @@ static int write_dev_supers(struct btrfs_device *device,
>  		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
>  		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
>  			op_flags |= REQ_FUA;
> -		ret = btrfsic_submit_bh(REQ_OP_WRITE, op_flags, bh);
> -		if (ret)
> -			errors++;
> +
> +		bio = bio_alloc(gfp_mask, 1);
> +		bio_set_dev(bio, device->bdev);
> +		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> +		bio->bi_private = device;
> +		bio->bi_end_io = btrfs_end_super_write;
> +		bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
> +			     offset_in_page(bytenr));

Now bytenr offset in page will be always 0 due to the 64K alignment, and
this also means that the correct data from sb are going to be written.

What bothers me is the implicit behaviour and dependence on the
alignment, either it should be offset_in_page everywhere or nowhere with
asserts to catch future 128KiB page capable CPUs.

> +
> +		bio_set_op_attrs(bio, REQ_OP_WRITE, op_flags);
> +		btrfsic_submit_bio(bio);
>  	}
>  	return errors < i ? 0 : -1;
>  }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking
  2020-01-23  8:18 ` [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
  2020-01-23 14:03   ` Josef Bacik
  2020-01-23 14:23   ` Nikolay Borisov
@ 2020-01-24 14:22   ` David Sterba
  2020-01-24 14:59     ` Johannes Thumshirn
  2 siblings, 1 reply; 24+ messages in thread
From: David Sterba @ 2020-01-24 14:22 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Nikolay Borisov, linux-btrfs @ vger . kernel . org

On Thu, Jan 23, 2020 at 05:18:49PM +0900, Johannes Thumshirn wrote:
>  	if (btrfs_super_bytenr(super_tmp) != dev_bytenr ||
>  	    btrfs_super_magic(super_tmp) != BTRFS_MAGIC ||
>  	    memcmp(device->uuid, super_tmp->dev_item.uuid, BTRFS_UUID_SIZE) ||
>  	    btrfs_super_nodesize(super_tmp) != state->metablock_size ||
>  	    btrfs_super_sectorsize(super_tmp) != state->datablock_size) {
> -		brelse(bh);
> +		btrfs_release_disk_super(page);
>  		return 0;
>  	}
>  
> @@ -795,7 +813,7 @@ static int btrfsic_process_superblock_dev_mirror(
>  		superblock_tmp = btrfsic_block_alloc();
>  		if (NULL == superblock_tmp) {
>  			pr_info("btrfsic: error, kmalloc failed!\n");
> -			brelse(bh);
> +			btrfs_release_disk_super(page);
>  			return -1;
>  		}
>  		/* for superblock, only the dev_bytenr makes sense */
> @@ -880,7 +898,7 @@ static int btrfsic_process_superblock_dev_mirror(
>  					      mirror_num)) {
>  				pr_info("btrfsic: btrfsic_map_block(bytenr @%llu, mirror %d) failed!\n",
>  				       next_bytenr, mirror_num);
> -				brelse(bh);
> +				btrfs_release_disk_super(page);
>  				return -1;
>  			}
>  
> @@ -890,7 +908,7 @@ static int btrfsic_process_superblock_dev_mirror(
>  					mirror_num, NULL);
>  			if (NULL == next_block) {
>  				btrfsic_release_block_ctx(&tmp_next_block_ctx);
> -				brelse(bh);
> +				btrfs_release_disk_super(page);
>  				return -1;
>  			}
>  
> @@ -902,7 +920,7 @@ static int btrfsic_process_superblock_dev_mirror(
>  					BTRFSIC_GENERATION_UNKNOWN);
>  			btrfsic_release_block_ctx(&tmp_next_block_ctx);
>  			if (NULL == l) {
> -				brelse(bh);
> +				btrfs_release_disk_super(page);
>  				return -1;
>  			}
>  		}
> @@ -910,7 +928,7 @@ static int btrfsic_process_superblock_dev_mirror(
>  	if (state->print_mask & BTRFSIC_PRINT_MASK_INITIAL_ALL_TREES)
>  		btrfsic_dump_tree_sub(state, superblock_tmp, 0);
>  
> -	brelse(bh);
> +	btrfs_release_disk_super(page);

This could be the cleaned up to merge all error exits to jump to this
common block. Integrity checker is an old code so nobody cared enough to
clean it up, but it would be good to do it now when there are pople
looking at the code.

As mentioned before, btrfs_release_disk_super should be opencoded so
this would make it more straightfowrard to have only one place instad of
each error exit brelse replaced by put_page/kunmap.

>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking
  2020-01-24 14:22   ` David Sterba
@ 2020-01-24 14:59     ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-01-24 14:59 UTC (permalink / raw)
  To: dsterba; +Cc: David Sterba, Nikolay Borisov, linux-btrfs @ vger . kernel . org

On 24/01/2020 15:22, David Sterba wrote:
[...]
>>   
>> -	brelse(bh);
>> +	btrfs_release_disk_super(page);
> 
> This could be the cleaned up to merge all error exits to jump to this
> common block. Integrity checker is an old code so nobody cared enough to
> clean it up, but it would be good to do it now when there are pople
> looking at the code.
> 
> As mentioned before, btrfs_release_disk_super should be opencoded so
> this would make it more straightfowrard to have only one place instad of
> each error exit brelse replaced by put_page/kunmap.

Sure no problem. I already have a internal branch with 
put_page()/kunmap(), but before sending it out I want to address 
Nikolay's comments regarding the page locking and this still causes 
deadlocks here.

Thanks,
	Johannes

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-01-24 14:59 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-23  8:18 [PATCH v2 0/6] btrfs: remove buffer heads form superblock handling Johannes Thumshirn
2020-01-23  8:18 ` [PATCH v2 1/6] btrfs: export btrfs_release_disk_super Johannes Thumshirn
2020-01-23 13:42   ` Josef Bacik
2020-01-23  8:18 ` [PATCH v2 2/6] btrfs: remove buffer heads from super block reading Johannes Thumshirn
2020-01-23 13:51   ` Josef Bacik
2020-01-23 13:53     ` Johannes Thumshirn
2020-01-23 14:13   ` Nikolay Borisov
2020-01-23 16:45     ` Johannes Thumshirn
2020-01-23 22:25   ` David Sterba
2020-01-24 13:54   ` David Sterba
2020-01-24 14:03   ` David Sterba
2020-01-23  8:18 ` [PATCH v2 3/6] btrfs: remove use of buffer_heads from superblock writeout Johannes Thumshirn
2020-01-23 13:59   ` Josef Bacik
2020-01-23 14:19   ` Nikolay Borisov
2020-01-24 14:15   ` David Sterba
2020-01-23  8:18 ` [PATCH v2 4/6] btrfs: remove btrfsic_submit_bh() Johannes Thumshirn
2020-01-23 14:00   ` Josef Bacik
2020-01-23  8:18 ` [PATCH v2 5/6] btrfs: remove buffer_heads from btrfsic_process_written_block() Johannes Thumshirn
2020-01-23 14:00   ` Josef Bacik
2020-01-23  8:18 ` [PATCH v2 6/6] btrfs: remove buffer_heads form superblock mirror integrity checking Johannes Thumshirn
2020-01-23 14:03   ` Josef Bacik
2020-01-23 14:23   ` Nikolay Borisov
2020-01-24 14:22   ` David Sterba
2020-01-24 14:59     ` Johannes Thumshirn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.