linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Remove extent_map::bdev
@ 2019-10-07 19:37 David Sterba
  2019-10-07 19:37 ` [PATCH 1/5] btrfs: assert extent_map bdevs and lookup_map and split David Sterba
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: David Sterba @ 2019-10-07 19:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The extent_map::bdev is unused and and can be removed, but it's not
straightforward so there are several steps. The first patch decouples
bdev from map_lookup. The remaining patches clean up use of the bdev,
removing a few bio_set_dev on the way. In the end, submit_extent_page is
one parameter lighter.

This has survived several fstests runs

David Sterba (5):
  btrfs: assert extent_map bdevs and lookup_map and split
  btrfs: get bdev from latest_dev for dio bh_result
  btrfs: drop bio_set_dev where not needed
  btrfs: remove extent_map::bdev
  btrfs: drop bdev argument from submit_extent_page

 fs/btrfs/compression.c | 10 ----------
 fs/btrfs/disk-io.c     |  3 ---
 fs/btrfs/extent_io.c   | 15 +++------------
 fs/btrfs/extent_map.c  |  6 +++++-
 fs/btrfs/extent_map.h  | 11 ++---------
 fs/btrfs/file-item.c   |  1 -
 fs/btrfs/file.c        |  3 ---
 fs/btrfs/inode.c       | 14 ++++----------
 fs/btrfs/relocation.c  |  2 --
 9 files changed, 14 insertions(+), 51 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] btrfs: assert extent_map bdevs and lookup_map and split
  2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
@ 2019-10-07 19:37 ` David Sterba
  2019-10-07 19:37 ` [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result David Sterba
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-07 19:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This is a preparatory patch for removing extent_map::bdev. There's some
history behind the code so this is only precaution to catch if things
break before the actual removal happens.

Logically, comparing a raw low-level block device (bdev) does not make
sense for extent maps (high-level objects). This had no effect in
practice but was quite confusing in the code.  The lookup_map is set iff
EXTENT_FLAG_FS_MAPPING is set.

The two pointers were stored in the same bytes and used potentially in
two meanings. Now they're split, so the asserts are in place to check
that the condition will not change.

The lookup map pointer misused bdev, this has been changed in commit
95617d69326c ("btrfs: cleanup, stop casting for extent_map->lookup
everywhere") to the explicit type. But the semantics hasn't changed and
bdev was not actually used to decide if maps are mergeable.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_map.c | 9 ++++++++-
 fs/btrfs/extent_map.h | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 9d30acca55e1..9f99dccbc3ca 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -214,9 +214,16 @@ static int mergable_maps(struct extent_map *prev, struct extent_map *next)
 	ASSERT(next->block_start != EXTENT_MAP_DELALLOC &&
 	       prev->block_start != EXTENT_MAP_DELALLOC);
 
+	if (prev->map_lookup || next->map_lookup)
+		ASSERT(test_bit(EXTENT_FLAG_FS_MAPPING, &prev->flags) &&
+		       test_bit(EXTENT_FLAG_FS_MAPPING, &next->flags));
+
+	if (prev->bdev || next->bdev)
+		ASSERT(prev->bdev == next->bdev);
+
 	if (extent_map_end(prev) == next->start &&
 	    prev->flags == next->flags &&
-	    prev->bdev == next->bdev &&
+	    prev->map_lookup == next->map_lookup &&
 	    ((next->block_start == EXTENT_MAP_HOLE &&
 	      prev->block_start == EXTENT_MAP_HOLE) ||
 	     (next->block_start == EXTENT_MAP_INLINE &&
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 473f039fcd7c..3eb9c596b445 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -42,7 +42,7 @@ struct extent_map {
 	u64 block_len;
 	u64 generation;
 	unsigned long flags;
-	union {
+	struct {
 		struct block_device *bdev;
 
 		/*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result
  2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
  2019-10-07 19:37 ` [PATCH 1/5] btrfs: assert extent_map bdevs and lookup_map and split David Sterba
@ 2019-10-07 19:37 ` David Sterba
  2019-10-09 10:42   ` Nikolay Borisov
  2019-10-07 19:37 ` [PATCH 3/5] btrfs: drop bio_set_dev where not needed David Sterba
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2019-10-07 19:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

To remove use of extent_map::bdev we need to find a replacement, and the
latest_bdev is the only one we can use here, because inode::i_bdev and
superblock::s_bdev are NULL.

The only thing that DIO code uses from the bdev is the blocksize to
perform alignment checks in do_blockdev_direct_IO, but we do them in
btrfs code before any call to DIO. We can't pass NULL because there are
dereferences of bdev in __blockdev_direct_IO, even though it's not going
to be used later.

So it's safe to pass any valid bdev that's used within the filesystem.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/inode.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 067cbd6e3923..8e085d21c3c5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7603,6 +7603,8 @@ static int btrfs_get_blocks_direct_read(struct extent_map *em,
 					struct inode *inode,
 					u64 start, u64 len)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
 	if (em->block_start == EXTENT_MAP_HOLE ||
 			test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
 		return -ENOENT;
@@ -7612,7 +7614,7 @@ static int btrfs_get_blocks_direct_read(struct extent_map *em,
 	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
 		inode->i_blkbits;
 	bh_result->b_size = len;
-	bh_result->b_bdev = em->bdev;
+	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
 	set_buffer_mapped(bh_result);
 
 	return 0;
@@ -7695,7 +7697,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
 		inode->i_blkbits;
 	bh_result->b_size = len;
-	bh_result->b_bdev = em->bdev;
+	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
 	set_buffer_mapped(bh_result);
 
 	if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] btrfs: drop bio_set_dev where not needed
  2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
  2019-10-07 19:37 ` [PATCH 1/5] btrfs: assert extent_map bdevs and lookup_map and split David Sterba
  2019-10-07 19:37 ` [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result David Sterba
@ 2019-10-07 19:37 ` David Sterba
  2019-10-07 19:37 ` [PATCH 4/5] btrfs: remove extent_map::bdev David Sterba
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-07 19:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

bio_set_dev sets a bdev to a bio and is not only setting a pointer bug
also changing some state bits if there was a different bdev set before.
This is one thing that's not needed.

Another thing is that setting a bdev at bio allocation time is too early
and actually does not work with plain redundancy profiles, where each
time we submit a bio to a device, the bdev is set correctly.

In many places the bio bdev is set to latest_bdev that seems to serve as
a stub pointer "just to put something to bio". But we don't have to do
that.

Where do we know which bdev to set:

* for regular IO: submit_stripe_bio that's called by btrfs_map_bio

* repair IO: repair_io_failure, read or write from specific device

* super block write (using buffer_heads but uses raw bdev) and barriers

* scrub: this does not use all regular IO paths as it needs to reach all
  copies, verify and fixup eventually, and for that all bdev management
  is independent

* raid56: rbio_add_io_page, for the RMW write

* integrity-checker: does it's own low-level block tracking

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/compression.c | 10 ----------
 fs/btrfs/extent_io.c   |  2 --
 2 files changed, 12 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b05b361e2062..5e66da73ab34 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -320,7 +320,6 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 	int pg_index = 0;
 	struct page *page;
 	u64 first_byte = disk_start;
-	struct block_device *bdev;
 	blk_status_t ret;
 	int skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
 
@@ -339,10 +338,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 	cb->orig_bio = NULL;
 	cb->nr_pages = nr_pages;
 
-	bdev = fs_info->fs_devices->latest_bdev;
-
 	bio = btrfs_bio_alloc(first_byte);
-	bio_set_dev(bio, bdev);
 	bio->bi_opf = REQ_OP_WRITE | write_flags;
 	bio->bi_private = cb;
 	bio->bi_end_io = end_compressed_bio_write;
@@ -385,7 +381,6 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			}
 
 			bio = btrfs_bio_alloc(first_byte);
-			bio_set_dev(bio, bdev);
 			bio->bi_opf = REQ_OP_WRITE | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
@@ -553,7 +548,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	unsigned long nr_pages;
 	unsigned long pg_index;
 	struct page *page;
-	struct block_device *bdev;
 	struct bio *comp_bio;
 	u64 cur_disk_byte = (u64)bio->bi_iter.bi_sector << 9;
 	u64 em_len;
@@ -604,8 +598,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	if (!cb->compressed_pages)
 		goto fail1;
 
-	bdev = fs_info->fs_devices->latest_bdev;
-
 	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
 		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS |
 							      __GFP_HIGHMEM);
@@ -624,7 +616,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	cb->len = bio->bi_iter.bi_size;
 
 	comp_bio = btrfs_bio_alloc(cur_disk_byte);
-	bio_set_dev(comp_bio, bdev);
 	comp_bio->bi_opf = REQ_OP_READ;
 	comp_bio->bi_private = cb;
 	comp_bio->bi_end_io = end_compressed_bio_read;
@@ -675,7 +666,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			}
 
 			comp_bio = btrfs_bio_alloc(cur_disk_byte);
-			bio_set_dev(comp_bio, bdev);
 			comp_bio->bi_opf = REQ_OP_READ;
 			comp_bio->bi_private = cb;
 			comp_bio->bi_end_io = end_compressed_bio_read;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0f1d917c8bb3..5baade1bc1e1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2544,7 +2544,6 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 	bio = btrfs_io_bio_alloc(1);
 	bio->bi_end_io = endio_func;
 	bio->bi_iter.bi_sector = failrec->logical >> 9;
-	bio_set_dev(bio, fs_info->fs_devices->latest_bdev);
 	bio->bi_iter.bi_size = 0;
 	bio->bi_private = data;
 
@@ -2987,7 +2986,6 @@ static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
 	}
 
 	bio = btrfs_bio_alloc(offset);
-	bio_set_dev(bio, bdev);
 	bio_add_page(bio, page, page_size, pg_offset);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] btrfs: remove extent_map::bdev
  2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
                   ` (2 preceding siblings ...)
  2019-10-07 19:37 ` [PATCH 3/5] btrfs: drop bio_set_dev where not needed David Sterba
@ 2019-10-07 19:37 ` David Sterba
  2019-10-07 19:37 ` [PATCH 5/5] btrfs: drop bdev argument from submit_extent_page David Sterba
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-07 19:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

We can now remove the bdev from extent_map. Previous patches made sure
that bio_set_dev is correctly in all places and that we don't need to
grab it from latest_bdev or pass it around inside the extent map.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c    |  3 ---
 fs/btrfs/extent_io.c  |  2 --
 fs/btrfs/extent_map.c |  3 ---
 fs/btrfs/extent_map.h | 11 ++---------
 fs/btrfs/file-item.c  |  1 -
 fs/btrfs/file.c       |  3 ---
 fs/btrfs/inode.c      |  8 --------
 fs/btrfs/relocation.c |  2 --
 8 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 16dc60b4966d..7a7e59baa0d6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -205,7 +205,6 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
 		struct page *page, size_t pg_offset, u64 start, u64 len,
 		int create)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct extent_map_tree *em_tree = &inode->extent_tree;
 	struct extent_map *em;
 	int ret;
@@ -213,7 +212,6 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
 	read_lock(&em_tree->lock);
 	em = lookup_extent_mapping(em_tree, start, len);
 	if (em) {
-		em->bdev = fs_info->fs_devices->latest_bdev;
 		read_unlock(&em_tree->lock);
 		goto out;
 	}
@@ -228,7 +226,6 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
 	em->len = (u64)-1;
 	em->block_len = (u64)-1;
 	em->block_start = 0;
-	em->bdev = fs_info->fs_devices->latest_bdev;
 
 	write_lock(&em_tree->lock);
 	ret = add_extent_mapping(em_tree, em, 0);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5baade1bc1e1..d3dda8388f93 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3150,7 +3150,6 @@ static int __do_readpage(struct extent_io_tree *tree,
 			offset = em->block_start + extent_offset;
 			disk_io_size = iosize;
 		}
-		bdev = em->bdev;
 		block_start = em->block_start;
 		if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
 			block_start = EXTENT_MAP_HOLE;
@@ -3486,7 +3485,6 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
 		iosize = min(em_end - cur, end - cur + 1);
 		iosize = ALIGN(iosize, blocksize);
 		offset = em->block_start + extent_offset;
-		bdev = em->bdev;
 		block_start = em->block_start;
 		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
 		free_extent_map(em);
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 9f99dccbc3ca..6f417ff68980 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -218,9 +218,6 @@ static int mergable_maps(struct extent_map *prev, struct extent_map *next)
 		ASSERT(test_bit(EXTENT_FLAG_FS_MAPPING, &prev->flags) &&
 		       test_bit(EXTENT_FLAG_FS_MAPPING, &next->flags));
 
-	if (prev->bdev || next->bdev)
-		ASSERT(prev->bdev == next->bdev);
-
 	if (extent_map_end(prev) == next->start &&
 	    prev->flags == next->flags &&
 	    prev->map_lookup == next->map_lookup &&
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 3eb9c596b445..8e217337dff9 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -42,15 +42,8 @@ struct extent_map {
 	u64 block_len;
 	u64 generation;
 	unsigned long flags;
-	struct {
-		struct block_device *bdev;
-
-		/*
-		 * used for chunk mappings
-		 * flags & EXTENT_FLAG_FS_MAPPING must be set
-		 */
-		struct map_lookup *map_lookup;
-	};
+	/* Used for chunk mappings, flag EXTENT_FLAG_FS_MAPPING must be set */
+	struct map_lookup *map_lookup;
 	refcount_t refs;
 	unsigned int compress_type;
 	struct list_head list;
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 1a599f50837b..3270a40b0777 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -945,7 +945,6 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 	u8 type = btrfs_file_extent_type(leaf, fi);
 	int compress_type = btrfs_file_extent_compression(leaf, fi);
 
-	em->bdev = fs_info->fs_devices->latest_bdev;
 	btrfs_item_key_to_cpu(leaf, &key, slot);
 	extent_start = key.offset;
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 3d151f788177..888871fb7aa3 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -667,7 +667,6 @@ void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
 			}
 
 			split->generation = gen;
-			split->bdev = em->bdev;
 			split->flags = flags;
 			split->compress_type = em->compress_type;
 			replace_extent_mapping(em_tree, em, split, modified);
@@ -680,7 +679,6 @@ void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
 
 			split->start = start + len;
 			split->len = em->start + em->len - (start + len);
-			split->bdev = em->bdev;
 			split->flags = flags;
 			split->compress_type = em->compress_type;
 			split->generation = gen;
@@ -2363,7 +2361,6 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 		hole_em->block_start = EXTENT_MAP_HOLE;
 		hole_em->block_len = 0;
 		hole_em->orig_block_len = 0;
-		hole_em->bdev = fs_info->fs_devices->latest_bdev;
 		hole_em->compress_type = BTRFS_COMPRESS_NONE;
 		hole_em->generation = trans->transid;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8e085d21c3c5..23178fef4fc6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5132,7 +5132,6 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 			hole_em->block_len = 0;
 			hole_em->orig_block_len = 0;
 			hole_em->ram_bytes = hole_size;
-			hole_em->bdev = fs_info->fs_devices->latest_bdev;
 			hole_em->compress_type = BTRFS_COMPRESS_NONE;
 			hole_em->generation = fs_info->generation;
 
@@ -6910,8 +6909,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 
 	read_lock(&em_tree->lock);
 	em = lookup_extent_mapping(em_tree, start, len);
-	if (em)
-		em->bdev = fs_info->fs_devices->latest_bdev;
 	read_unlock(&em_tree->lock);
 
 	if (em) {
@@ -6927,7 +6924,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 		err = -ENOMEM;
 		goto out;
 	}
-	em->bdev = fs_info->fs_devices->latest_bdev;
 	em->start = EXTENT_MAP_HOLE;
 	em->orig_start = EXTENT_MAP_HOLE;
 	em->len = (u64)-1;
@@ -7186,7 +7182,6 @@ struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 			err = -ENOMEM;
 			goto out;
 		}
-		em->bdev = NULL;
 
 		ASSERT(hole_em);
 		/*
@@ -7546,7 +7541,6 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len,
 {
 	struct extent_map_tree *em_tree;
 	struct extent_map *em;
-	struct btrfs_root *root = BTRFS_I(inode)->root;
 	int ret;
 
 	ASSERT(type == BTRFS_ORDERED_PREALLOC ||
@@ -7564,7 +7558,6 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len,
 	em->len = len;
 	em->block_len = block_len;
 	em->block_start = block_start;
-	em->bdev = root->fs_info->fs_devices->latest_bdev;
 	em->orig_block_len = orig_block_len;
 	em->ram_bytes = ram_bytes;
 	em->generation = -1;
@@ -10410,7 +10403,6 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		em->block_len = ins.offset;
 		em->orig_block_len = ins.offset;
 		em->ram_bytes = ins.offset;
-		em->bdev = fs_info->fs_devices->latest_bdev;
 		set_bit(EXTENT_FLAG_PREALLOC, &em->flags);
 		em->generation = trans->transid;
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00504657b602..ad9d8d99c651 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3195,7 +3195,6 @@ static noinline_for_stack
 int setup_extent_mapping(struct inode *inode, u64 start, u64 end,
 			 u64 block_start)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
 	struct extent_map *em;
 	int ret = 0;
@@ -3208,7 +3207,6 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end,
 	em->len = end + 1 - start;
 	em->block_len = em->len;
 	em->block_start = block_start;
-	em->bdev = fs_info->fs_devices->latest_bdev;
 	set_bit(EXTENT_FLAG_PINNED, &em->flags);
 
 	lock_extent(&BTRFS_I(inode)->io_tree, start, end);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] btrfs: drop bdev argument from submit_extent_page
  2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
                   ` (3 preceding siblings ...)
  2019-10-07 19:37 ` [PATCH 4/5] btrfs: remove extent_map::bdev David Sterba
@ 2019-10-07 19:37 ` David Sterba
  2019-10-22 14:00 ` [PATCH 0/5] Remove extent_map::bdev David Sterba
  2019-11-18 15:41 ` Josef Bacik
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-07 19:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

After previous patches removing bdev being passed around to set it to
bio, it has become unused in submit_extent_page. So it now has "only" 13
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d3dda8388f93..f426a0a772b3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2929,7 +2929,6 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
  *              a contiguous page to the previous one
  * @size:	portion of page that we want to write
  * @offset:	starting offset in the page
- * @bdev:	attach newly created bios to this bdev
  * @bio_ret:	must be valid pointer, newly allocated bio will be stored there
  * @end_io_func:     end_io callback for new bio
  * @mirror_num:	     desired mirror to read/write
@@ -2940,7 +2939,6 @@ static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
 			      struct writeback_control *wbc,
 			      struct page *page, u64 offset,
 			      size_t size, unsigned long pg_offset,
-			      struct block_device *bdev,
 			      struct bio **bio_ret,
 			      bio_end_io_t end_io_func,
 			      int mirror_num,
@@ -3073,7 +3071,6 @@ static int __do_readpage(struct extent_io_tree *tree,
 	u64 block_start;
 	u64 cur_end;
 	struct extent_map *em;
-	struct block_device *bdev;
 	int ret = 0;
 	int nr = 0;
 	size_t pg_offset = 0;
@@ -3239,7 +3236,7 @@ static int __do_readpage(struct extent_io_tree *tree,
 
 		ret = submit_extent_page(REQ_OP_READ | read_flags, tree, NULL,
 					 page, offset, disk_io_size,
-					 pg_offset, bdev, bio,
+					 pg_offset, bio,
 					 end_bio_extent_readpage, mirror_num,
 					 *bio_flags,
 					 this_bio_flag,
@@ -3427,7 +3424,6 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
 	u64 block_start;
 	u64 iosize;
 	struct extent_map *em;
-	struct block_device *bdev;
 	size_t pg_offset = 0;
 	size_t blocksize;
 	int ret = 0;
@@ -3526,7 +3522,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
 
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, tree, wbc,
 					 page, offset, iosize, pg_offset,
-					 bdev, &epd->bio,
+					 &epd->bio,
 					 end_bio_extent_writepage,
 					 0, 0, 0, false);
 		if (ret) {
@@ -3855,7 +3851,6 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			struct extent_page_data *epd)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct block_device *bdev = fs_info->fs_devices->latest_bdev;
 	struct extent_io_tree *tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
 	u64 offset = eb->start;
 	u32 nritems;
@@ -3890,7 +3885,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 		clear_page_dirty_for_io(p);
 		set_page_writeback(p);
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, tree, wbc,
-					 p, offset, PAGE_SIZE, 0, bdev,
+					 p, offset, PAGE_SIZE, 0,
 					 &epd->bio,
 					 end_bio_extent_buffer_writepage,
 					 0, 0, 0, false);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result
  2019-10-07 19:37 ` [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result David Sterba
@ 2019-10-09 10:42   ` Nikolay Borisov
  2019-10-11 18:26     ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2019-10-09 10:42 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 7.10.19 г. 22:37 ч., David Sterba wrote:
> To remove use of extent_map::bdev we need to find a replacement, and the
> latest_bdev is the only one we can use here, because inode::i_bdev and
> superblock::s_bdev are NULL.
> 
> The only thing that DIO code uses from the bdev is the blocksize to
> perform alignment checks in do_blockdev_direct_IO, but we do them in
> btrfs code before any call to DIO. We can't pass NULL because there are

nit: This is not entirely correct. In fact map_bh in
do_blockdev_direct_IO gets filled in :

do_direct_IO
  get_more_blocks
   sdio->get_block() <-- this is btrfs_get_blocks_direct

Subsequently the map_bh->b_dev member is used in
clean_bdev_aliases and dio_new_bio to set the bio's bdev to that of the
buffer_head. However, because we have provided a submit function
dio_bio_submit calls our submission function and ignores the bdev.

Furthermore __blockdev_direct_IO already uses latest_bdev as per
btrfs_direct_IO.


> dereferences of bdev in __blockdev_direct_IO, even though it's not going
> to be used later.
> 
> So it's safe to pass any valid bdev that's used within the filesystem.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

The code is ok so :

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/inode.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 067cbd6e3923..8e085d21c3c5 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7603,6 +7603,8 @@ static int btrfs_get_blocks_direct_read(struct extent_map *em,
>  					struct inode *inode,
>  					u64 start, u64 len)
>  {
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +
>  	if (em->block_start == EXTENT_MAP_HOLE ||
>  			test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
>  		return -ENOENT;
> @@ -7612,7 +7614,7 @@ static int btrfs_get_blocks_direct_read(struct extent_map *em,
>  	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
>  		inode->i_blkbits;
>  	bh_result->b_size = len;
> -	bh_result->b_bdev = em->bdev;
> +	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
>  	set_buffer_mapped(bh_result);
>  
>  	return 0;
> @@ -7695,7 +7697,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>  	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
>  		inode->i_blkbits;
>  	bh_result->b_size = len;
> -	bh_result->b_bdev = em->bdev;
> +	bh_result->b_bdev = fs_info->fs_devices->latest_bdev;
>  	set_buffer_mapped(bh_result);
>  
>  	if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result
  2019-10-09 10:42   ` Nikolay Borisov
@ 2019-10-11 18:26     ` David Sterba
  2019-10-11 18:54       ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2019-10-11 18:26 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: David Sterba, linux-btrfs

On Wed, Oct 09, 2019 at 01:42:00PM +0300, Nikolay Borisov wrote:
> 
> 
> On 7.10.19 г. 22:37 ч., David Sterba wrote:
> > To remove use of extent_map::bdev we need to find a replacement, and the
> > latest_bdev is the only one we can use here, because inode::i_bdev and
> > superblock::s_bdev are NULL.
> > 
> > The only thing that DIO code uses from the bdev is the blocksize to
> > perform alignment checks in do_blockdev_direct_IO, but we do them in
> > btrfs code before any call to DIO. We can't pass NULL because there are
> 
> nit: This is not entirely correct. In fact map_bh in
> do_blockdev_direct_IO gets filled in :
> 
> do_direct_IO
>   get_more_blocks
>    sdio->get_block() <-- this is btrfs_get_blocks_direct
> 
> Subsequently the map_bh->b_dev member is used in
> clean_bdev_aliases and dio_new_bio to set the bio's bdev to that of the
> buffer_head. However, because we have provided a submit function
> dio_bio_submit calls our submission function and ignores the bdev.

You're right, and actually I got crashes in clean_bdev_aliases when I
supplied a NULL bdev, so I'll add it to the changelog. Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result
  2019-10-11 18:26     ` David Sterba
@ 2019-10-11 18:54       ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-11 18:54 UTC (permalink / raw)
  To: dsterba, Nikolay Borisov, David Sterba, linux-btrfs

On Fri, Oct 11, 2019 at 08:26:42PM +0200, David Sterba wrote:
> You're right, and actually I got crashes in clean_bdev_aliases when I
> supplied a NULL bdev, so I'll add it to the changelog. Thanks.

Unless there are further comments, I won't resend the whole patchset.
The changelog in this patch will be:

---
    btrfs: get bdev from latest_dev for dio bh_result

    To remove use of extent_map::bdev we need to find a replacement, and the
    latest_bdev is the only one we can use here, because inode::i_bdev and
    superblock::s_bdev are NULL.

    The DIO code uses bdev in two places:

    * to read blocksize to perform alignment checks in
      do_blockdev_direct_IO, but we do them in btrfs code before any call to
      DIO

    * in the following call chain:

      do_direct_IO
        get_more_blocks
         sdio->get_block() <-- this is btrfs_get_blocks_direct

      subsequently the map_bh->b_dev member is used in clean_bdev_aliases
      and dio_new_bio to set the bio's bdev to that of the buffer_head.
      However, because we have provided a submit function dio_bio_submit
      calls our submission function and ignores the bdev.

    We can't pass NULL because there are dereferences of bdev in
    __blockdev_direct_IO, even though it's not going to be used later.

    So it's safe to pass any valid bdev that's used within the filesystem.
---

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] Remove extent_map::bdev
  2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
                   ` (4 preceding siblings ...)
  2019-10-07 19:37 ` [PATCH 5/5] btrfs: drop bdev argument from submit_extent_page David Sterba
@ 2019-10-22 14:00 ` David Sterba
  2019-11-18 15:41 ` Josef Bacik
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-22 14:00 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Mon, Oct 07, 2019 at 09:37:40PM +0200, David Sterba wrote:
> The extent_map::bdev is unused and and can be removed, but it's not
> straightforward so there are several steps. The first patch decouples
> bdev from map_lookup. The remaining patches clean up use of the bdev,
> removing a few bio_set_dev on the way. In the end, submit_extent_page is
> one parameter lighter.
> 
> This has survived several fstests runs
> 
> David Sterba (5):
>   btrfs: assert extent_map bdevs and lookup_map and split
>   btrfs: get bdev from latest_dev for dio bh_result
>   btrfs: drop bio_set_dev where not needed
>   btrfs: remove extent_map::bdev
>   btrfs: drop bdev argument from submit_extent_page

Moved from topic branch to misc-next.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] Remove extent_map::bdev
  2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
                   ` (5 preceding siblings ...)
  2019-10-22 14:00 ` [PATCH 0/5] Remove extent_map::bdev David Sterba
@ 2019-11-18 15:41 ` Josef Bacik
  2019-11-18 15:49   ` David Sterba
  2019-11-18 21:57   ` David Sterba
  6 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2019-11-18 15:41 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Mon, Oct 07, 2019 at 09:37:40PM +0200, David Sterba wrote:
> The extent_map::bdev is unused and and can be removed, but it's not
> straightforward so there are several steps. The first patch decouples
> bdev from map_lookup. The remaining patches clean up use of the bdev,
> removing a few bio_set_dev on the way. In the end, submit_extent_page is
> one parameter lighter.
> 
> This has survived several fstests runs
> 
> David Sterba (5):
>   btrfs: assert extent_map bdevs and lookup_map and split
>   btrfs: get bdev from latest_dev for dio bh_result
>   btrfs: drop bio_set_dev where not needed
>   btrfs: remove extent_map::bdev
>   btrfs: drop bdev argument from submit_extent_page
> 
>  fs/btrfs/compression.c | 10 ----------
>  fs/btrfs/disk-io.c     |  3 ---
>  fs/btrfs/extent_io.c   | 15 +++------------
>  fs/btrfs/extent_map.c  |  6 +++++-
>  fs/btrfs/extent_map.h  | 11 ++---------
>  fs/btrfs/file-item.c   |  1 -
>  fs/btrfs/file.c        |  3 ---
>  fs/btrfs/inode.c       | 14 ++++----------
>  fs/btrfs/relocation.c  |  2 --
>  9 files changed, 14 insertions(+), 51 deletions(-)
> 

This series needs to be dropped from misc-next, it causes any box with cgroups
enabled to crash on boot.  The bio requires having a bio->bi_disk set when we do
wbc_init_bio, which we no longer have with this patch.

To fix this you would need to do something similar to what we do with
compression, save the wbc css in the bbio and then call the associate_blkg at
submit time.

However, this is problematic right now because we don't get notified when the
bbio is free'd.  We have no way to free the ref on the css we save, which means
infrastructure needs to be added to biosets so we can get called at free time
for every bio in a bioset.  Or we can add back the bi_css to the bio, but that
seems like a less good idea.

Either way this needs to be dropped until this is addressed.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] Remove extent_map::bdev
  2019-11-18 15:41 ` Josef Bacik
@ 2019-11-18 15:49   ` David Sterba
  2019-11-18 21:57   ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-11-18 15:49 UTC (permalink / raw)
  To: Josef Bacik; +Cc: David Sterba, linux-btrfs

On Mon, Nov 18, 2019 at 10:41:54AM -0500, Josef Bacik wrote:
> On Mon, Oct 07, 2019 at 09:37:40PM +0200, David Sterba wrote:
> > The extent_map::bdev is unused and and can be removed, but it's not
> > straightforward so there are several steps. The first patch decouples
> > bdev from map_lookup. The remaining patches clean up use of the bdev,
> > removing a few bio_set_dev on the way. In the end, submit_extent_page is
> > one parameter lighter.
> > 
> > This has survived several fstests runs
> > 
> > David Sterba (5):
> >   btrfs: assert extent_map bdevs and lookup_map and split
> >   btrfs: get bdev from latest_dev for dio bh_result
> >   btrfs: drop bio_set_dev where not needed
> >   btrfs: remove extent_map::bdev
> >   btrfs: drop bdev argument from submit_extent_page
> > 
> >  fs/btrfs/compression.c | 10 ----------
> >  fs/btrfs/disk-io.c     |  3 ---
> >  fs/btrfs/extent_io.c   | 15 +++------------
> >  fs/btrfs/extent_map.c  |  6 +++++-
> >  fs/btrfs/extent_map.h  | 11 ++---------
> >  fs/btrfs/file-item.c   |  1 -
> >  fs/btrfs/file.c        |  3 ---
> >  fs/btrfs/inode.c       | 14 ++++----------
> >  fs/btrfs/relocation.c  |  2 --
> >  9 files changed, 14 insertions(+), 51 deletions(-)
> > 
> 
> This series needs to be dropped from misc-next, it causes any box with cgroups
> enabled to crash on boot.  The bio requires having a bio->bi_disk set when we do
> wbc_init_bio, which we no longer have with this patch.

Do you have the stack trace of the crash?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] Remove extent_map::bdev
  2019-11-18 15:41 ` Josef Bacik
  2019-11-18 15:49   ` David Sterba
@ 2019-11-18 21:57   ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-11-18 21:57 UTC (permalink / raw)
  To: Josef Bacik; +Cc: David Sterba, linux-btrfs

On Mon, Nov 18, 2019 at 10:41:54AM -0500, Josef Bacik wrote:
> On Mon, Oct 07, 2019 at 09:37:40PM +0200, David Sterba wrote:
> > The extent_map::bdev is unused and and can be removed, but it's not
> > straightforward so there are several steps. The first patch decouples
> > bdev from map_lookup. The remaining patches clean up use of the bdev,
> > removing a few bio_set_dev on the way. In the end, submit_extent_page is
> > one parameter lighter.
> > 
> > This has survived several fstests runs
> > 
> > David Sterba (5):
> >   btrfs: assert extent_map bdevs and lookup_map and split
> >   btrfs: get bdev from latest_dev for dio bh_result
> >   btrfs: drop bio_set_dev where not needed
> >   btrfs: remove extent_map::bdev
> >   btrfs: drop bdev argument from submit_extent_page
> > 
> >  fs/btrfs/compression.c | 10 ----------
> >  fs/btrfs/disk-io.c     |  3 ---
> >  fs/btrfs/extent_io.c   | 15 +++------------
> >  fs/btrfs/extent_map.c  |  6 +++++-
> >  fs/btrfs/extent_map.h  | 11 ++---------
> >  fs/btrfs/file-item.c   |  1 -
> >  fs/btrfs/file.c        |  3 ---
> >  fs/btrfs/inode.c       | 14 ++++----------
> >  fs/btrfs/relocation.c  |  2 --
> >  9 files changed, 14 insertions(+), 51 deletions(-)
> > 
> 
> This series needs to be dropped from misc-next, it causes any box with cgroups
> enabled to crash on boot.  The bio requires having a bio->bi_disk set when we do
> wbc_init_bio, which we no longer have with this patch.
> 
> To fix this you would need to do something similar to what we do with
> compression, save the wbc css in the bbio and then call the associate_blkg at
> submit time.
> 
> However, this is problematic right now because we don't get notified when the
> bbio is free'd.  We have no way to free the ref on the css we save, which means
> infrastructure needs to be added to biosets so we can get called at free time
> for every bio in a bioset.  Or we can add back the bi_css to the bio, but that
> seems like a less good idea.
> 
> Either way this needs to be dropped until this is addressed.  Thanks,

I found a simple fix that does not need the reverts.

The bdev passed to submit_extent_bio is the latest_bdev, that's what the
cleanup series got rid of, pointlessly passing around a known pointer.

For equivalent change, the bdev can be pulled out of the page in a
series of 5 dereferences.

@@ -2987,13 +2987,16 @@ static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
        }
 
        bio = btrfs_bio_alloc(offset);
-       bio_set_dev(bio, bdev);
        bio_add_page(bio, page, page_size, pg_offset);
        bio->bi_end_io = end_io_func;
        bio->bi_private = tree;
        bio->bi_write_hint = page->mapping->host->i_write_hint;
        bio->bi_opf = opf;
        if (wbc) {
+               struct block_device *bdev;
+
+               bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
+               bio_set_dev(bio, bdev);
                wbc_init_bio(wbc, bio);
                wbc_account_cgroup_owner(wbc, page, page_size);
        }
---

To avoid problems with bisection, I'll put the series to after this patch but
otherwise I don't see any reasons to revert.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-11-18 21:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-07 19:37 [PATCH 0/5] Remove extent_map::bdev David Sterba
2019-10-07 19:37 ` [PATCH 1/5] btrfs: assert extent_map bdevs and lookup_map and split David Sterba
2019-10-07 19:37 ` [PATCH 2/5] btrfs: get bdev from latest_dev for dio bh_result David Sterba
2019-10-09 10:42   ` Nikolay Borisov
2019-10-11 18:26     ` David Sterba
2019-10-11 18:54       ` David Sterba
2019-10-07 19:37 ` [PATCH 3/5] btrfs: drop bio_set_dev where not needed David Sterba
2019-10-07 19:37 ` [PATCH 4/5] btrfs: remove extent_map::bdev David Sterba
2019-10-07 19:37 ` [PATCH 5/5] btrfs: drop bdev argument from submit_extent_page David Sterba
2019-10-22 14:00 ` [PATCH 0/5] Remove extent_map::bdev David Sterba
2019-11-18 15:41 ` Josef Bacik
2019-11-18 15:49   ` David Sterba
2019-11-18 21:57   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).