linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time
@ 2019-10-10  2:39 Qu Wenruo
  2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Qu Wenruo @ 2019-10-10  2:39 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from:
https://github.com/adam900710/linux/tree/bg_tree
Which is based on v5.4-rc1 tag.

This patchset will hugely reduce mount time of large fs by putting all
block group items into its own tree.

The old behavior will try to read out all block group items at mount
time, however due to the key of block group items are scattered across
tons of extent items, we must call btrfs_search_slot() for each block
group.

It works fine for small fs, but when number of block groups goes beyond
200, such tree search will become a random read, causing obvious slow
down.

On the other hand, btrfs_read_chunk_tree() is still very fast, since we
put CHUNK_ITEMS into their own tree and package them next to each other.

Following this idea, we could do the same thing for block group items,
so instead of triggering btrfs_search_slot() for each block group, we
just call btrfs_next_item() and under most case we could finish in
memory, and hugely speed up mount (see BENCHMARK below).

The only disadvantage is, this method introduce an incompatible feature,
so existing fs can't use this feature directly.
Either specify it at mkfs time, or use btrfs-progs offline convert tool.

[[Benchmark]]
Since I have upgraded my rig to all NVME storage, there is no HDD
test result.

Physical device:	NVMe SSD
VM device:		VirtIO block device, backup by sparse file
Nodesize:		4K  (to bump up tree height)
Extent data size:	4M
Fs size used:		1T

All file extents on disk is in 4M size, preallocated to reduce space usage
(as the VM uses loopback block device backed by sparse file)

Without patchset:
Use ftrace function graph:

 7)               |  open_ctree [btrfs]() {
 7)               |    btrfs_read_block_groups [btrfs]() {
 7) @ 805851.8 us |    }
 7) @ 911890.2 us |  }

 btrfs_read_block_groups() takes 88% of the total mount time,

With patchset, and use -O bg-tree mkfs option:

 6)               |  open_ctree [btrfs]() {
 6)               |    btrfs_read_block_groups [btrfs]() {
 6) * 91204.69 us |    }
 6) @ 192039.5 us |  }

  open_ctree() time is only 21% of original mount time.
  And btrfs_read_block_groups() only takes 47% of total open_ctree()
  execution time.

The reason is pretty obvious when considering how many tree blocks needs
to be read from disk:
- Original extent tree:
  nodes:	55
  leaves:	1025
  total:	1080
- Block group tree:
  nodes:	1
  leaves:	13
  total:	14

Not to mention all the tree blocks readahead works pretty fine for bg
tree, as we will read every item.
While readahead for extent tree will just be a diaster, as all block
groups are scatter across the whole extent tree.

Changelog:
v2:
- Rebase to v5.4-rc1
  Minor conflicts due to code moved to block-group.c
- Fix a bug where some block groups will not be loaded at mount time
  It's a bug in that refactor patch, not exposed by previous round of
  tests.
- Add a new patch to remove a dead check
- Update benchmark to NVMe based result
  Hardware upgrade is not always a good thing for benchmark.

Changelog:
v3:
- Add a separate patch to fix possible memory leak
- Add Reviewed-by tag for the refactor patch
- Reword the refactor patch to mention the change of use
  btrfs_fs_incompat()

Qu Wenruo (3):
  btrfs: block-group: Fix a memory leak due to missing
    btrfs_put_block_group()
  btrfs: block-group: Refactor btrfs_read_block_groups()
  btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time

 fs/btrfs/block-group.c          | 306 ++++++++++++++++++++------------
 fs/btrfs/ctree.h                |   5 +-
 fs/btrfs/disk-io.c              |  13 ++
 fs/btrfs/sysfs.c                |   2 +
 include/uapi/linux/btrfs.h      |   1 +
 include/uapi/linux/btrfs_tree.h |   3 +
 6 files changed, 212 insertions(+), 118 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
  2019-10-10  2:39 [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
@ 2019-10-10  2:39 ` Qu Wenruo
  2019-10-10  2:51   ` Anand Jain
                     ` (2 more replies)
  2019-10-10  2:39 ` [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 17+ messages in thread
From: Qu Wenruo @ 2019-10-10  2:39 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_read_block_groups(), if we have an invalid block group which
has mixed type (DATA|METADATA) while the fs doesn't have MIX_BGS
feature, we error out without freeing the block group cache.

This patch will add the missing btrfs_put_block_group() to prevent
memory leak.

Fixes: 49303381f19a ("Btrfs: bail out if block group has different mixed flag")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bf7e3f23bba7..c906a2b6c2cf 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1762,6 +1762,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 "bg %llu is a mixed block group but filesystem hasn't enabled mixed block groups",
 				  cache->key.objectid);
 			ret = -EINVAL;
+			btrfs_put_block_group(cache);
 			goto error;
 		}
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups()
  2019-10-10  2:39 [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
  2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
@ 2019-10-10  2:39 ` Qu Wenruo
  2019-10-10  2:52   ` Anand Jain
                     ` (2 more replies)
  2019-10-10  2:39 ` [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
  2019-10-10  2:40 ` [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce " Qu WenRuo
  3 siblings, 3 replies; 17+ messages in thread
From: Qu Wenruo @ 2019-10-10  2:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn, Anand Jain

Refactor the work inside the loop of btrfs_read_block_groups() into one
separate function, read_one_block_group().

This allows read_one_block_group to be reused for later BG_TREE feature.

The refactor does the following extra fix:
- Use btrfs_fs_incompat() to replace open-coded feature check

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/block-group.c | 215 +++++++++++++++++++++--------------------
 1 file changed, 108 insertions(+), 107 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c906a2b6c2cf..0c5eef0610fa 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1687,6 +1687,109 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+static int read_one_block_group(struct btrfs_fs_info *info,
+				struct btrfs_path *path,
+				int need_clear)
+{
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_block_group_cache *cache;
+	struct btrfs_space_info *space_info;
+	struct btrfs_key key;
+	int mixed = btrfs_fs_incompat(info, MIXED_GROUPS);
+	int slot = path->slots[0];
+	int ret;
+
+	btrfs_item_key_to_cpu(leaf, &key, slot);
+	ASSERT(key.type == BTRFS_BLOCK_GROUP_ITEM_KEY);
+
+	cache = btrfs_create_block_group_cache(info, key.objectid,
+					       key.offset);
+	if (!cache)
+		return -ENOMEM;
+
+	if (need_clear) {
+		/*
+		 * When we mount with old space cache, we need to
+		 * set BTRFS_DC_CLEAR and set dirty flag.
+		 *
+		 * a) Setting 'BTRFS_DC_CLEAR' makes sure that we
+		 *    truncate the old free space cache inode and
+		 *    setup a new one.
+		 * b) Setting 'dirty flag' makes sure that we flush
+		 *    the new space cache info onto disk.
+		 */
+		if (btrfs_test_opt(info, SPACE_CACHE))
+			cache->disk_cache_state = BTRFS_DC_CLEAR;
+	}
+	read_extent_buffer(leaf, &cache->item,
+			   btrfs_item_ptr_offset(leaf, slot),
+			   sizeof(cache->item));
+	cache->flags = btrfs_block_group_flags(&cache->item);
+	if (!mixed && ((cache->flags & BTRFS_BLOCK_GROUP_METADATA) &&
+	    (cache->flags & BTRFS_BLOCK_GROUP_DATA))) {
+			btrfs_err(info,
+"bg %llu is a mixed block group but filesystem hasn't enabled mixed block groups",
+				  cache->key.objectid);
+			ret = -EINVAL;
+			goto error;
+	}
+
+	/*
+	 * We need to exclude the super stripes now so that the space info has
+	 * super bytes accounted for, otherwise we'll think we have more space
+	 * than we actually do.
+	 */
+	ret = exclude_super_stripes(cache);
+	if (ret) {
+		/* We may have excluded something, so call this just in case. */
+		btrfs_free_excluded_extents(cache);
+		goto error;
+	}
+
+	/*
+	 * Check for two cases, either we are full, and therefore don't need
+	 * to bother with the caching work since we won't find any space, or we
+	 * are empty, and we can just add all the space in and be done with it.
+	 * This saves us _a_lot_ of time, particularly in the full case.
+	 */
+	if (key.offset == btrfs_block_group_used(&cache->item)) {
+		cache->last_byte_to_unpin = (u64)-1;
+		cache->cached = BTRFS_CACHE_FINISHED;
+		btrfs_free_excluded_extents(cache);
+	} else if (btrfs_block_group_used(&cache->item) == 0) {
+		cache->last_byte_to_unpin = (u64)-1;
+		cache->cached = BTRFS_CACHE_FINISHED;
+		add_new_free_space(cache, key.objectid,
+				   key.objectid + key.offset);
+		btrfs_free_excluded_extents(cache);
+	}
+	ret = btrfs_add_block_group_cache(info, cache);
+	if (ret) {
+		btrfs_remove_free_space_cache(cache);
+		goto error;
+	}
+	trace_btrfs_add_block_group(info, cache, 0);
+	btrfs_update_space_info(info, cache->flags, key.offset,
+				btrfs_block_group_used(&cache->item),
+				cache->bytes_super, &space_info);
+
+	cache->space_info = space_info;
+
+	link_block_group(cache);
+
+	set_avail_alloc_bits(info, cache->flags);
+	if (btrfs_chunk_readonly(info, cache->key.objectid)) {
+		inc_block_group_ro(cache, 1);
+	} else if (btrfs_block_group_used(&cache->item) == 0) {
+		ASSERT(list_empty(&cache->bg_list));
+		btrfs_mark_bg_unused(cache);
+	}
+	return 0;
+error:
+	btrfs_put_block_group(cache);
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -1694,15 +1797,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	struct btrfs_block_group_cache *cache;
 	struct btrfs_space_info *space_info;
 	struct btrfs_key key;
-	struct btrfs_key found_key;
-	struct extent_buffer *leaf;
 	int need_clear = 0;
 	u64 cache_gen;
-	u64 feature;
-	int mixed;
-
-	feature = btrfs_super_incompat_flags(info->super_copy);
-	mixed = !!(feature & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS);
 
 	key.objectid = 0;
 	key.offset = 0;
@@ -1726,108 +1822,13 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 		if (ret != 0)
 			goto error;
 
-		leaf = path->nodes[0];
-		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
-
-		cache = btrfs_create_block_group_cache(info, found_key.objectid,
-						       found_key.offset);
-		if (!cache) {
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		if (need_clear) {
-			/*
-			 * When we mount with old space cache, we need to
-			 * set BTRFS_DC_CLEAR and set dirty flag.
-			 *
-			 * a) Setting 'BTRFS_DC_CLEAR' makes sure that we
-			 *    truncate the old free space cache inode and
-			 *    setup a new one.
-			 * b) Setting 'dirty flag' makes sure that we flush
-			 *    the new space cache info onto disk.
-			 */
-			if (btrfs_test_opt(info, SPACE_CACHE))
-				cache->disk_cache_state = BTRFS_DC_CLEAR;
-		}
-
-		read_extent_buffer(leaf, &cache->item,
-				   btrfs_item_ptr_offset(leaf, path->slots[0]),
-				   sizeof(cache->item));
-		cache->flags = btrfs_block_group_flags(&cache->item);
-		if (!mixed &&
-		    ((cache->flags & BTRFS_BLOCK_GROUP_METADATA) &&
-		    (cache->flags & BTRFS_BLOCK_GROUP_DATA))) {
-			btrfs_err(info,
-"bg %llu is a mixed block group but filesystem hasn't enabled mixed block groups",
-				  cache->key.objectid);
-			ret = -EINVAL;
-			btrfs_put_block_group(cache);
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		ret = read_one_block_group(info, path, need_clear);
+		if (ret < 0)
 			goto error;
-		}
-
-		key.objectid = found_key.objectid + found_key.offset;
+		key.objectid += key.offset;
+		key.offset = 0;
 		btrfs_release_path(path);
-
-		/*
-		 * We need to exclude the super stripes now so that the space
-		 * info has super bytes accounted for, otherwise we'll think
-		 * we have more space than we actually do.
-		 */
-		ret = exclude_super_stripes(cache);
-		if (ret) {
-			/*
-			 * We may have excluded something, so call this just in
-			 * case.
-			 */
-			btrfs_free_excluded_extents(cache);
-			btrfs_put_block_group(cache);
-			goto error;
-		}
-
-		/*
-		 * Check for two cases, either we are full, and therefore
-		 * don't need to bother with the caching work since we won't
-		 * find any space, or we are empty, and we can just add all
-		 * the space in and be done with it.  This saves us _a_lot_ of
-		 * time, particularly in the full case.
-		 */
-		if (found_key.offset == btrfs_block_group_used(&cache->item)) {
-			cache->last_byte_to_unpin = (u64)-1;
-			cache->cached = BTRFS_CACHE_FINISHED;
-			btrfs_free_excluded_extents(cache);
-		} else if (btrfs_block_group_used(&cache->item) == 0) {
-			cache->last_byte_to_unpin = (u64)-1;
-			cache->cached = BTRFS_CACHE_FINISHED;
-			add_new_free_space(cache, found_key.objectid,
-					   found_key.objectid +
-					   found_key.offset);
-			btrfs_free_excluded_extents(cache);
-		}
-
-		ret = btrfs_add_block_group_cache(info, cache);
-		if (ret) {
-			btrfs_remove_free_space_cache(cache);
-			btrfs_put_block_group(cache);
-			goto error;
-		}
-
-		trace_btrfs_add_block_group(info, cache, 0);
-		btrfs_update_space_info(info, cache->flags, found_key.offset,
-					btrfs_block_group_used(&cache->item),
-					cache->bytes_super, &space_info);
-
-		cache->space_info = space_info;
-
-		link_block_group(cache);
-
-		set_avail_alloc_bits(info, cache->flags);
-		if (btrfs_chunk_readonly(info, cache->key.objectid)) {
-			inc_block_group_ro(cache, 1);
-		} else if (btrfs_block_group_used(&cache->item) == 0) {
-			ASSERT(list_empty(&cache->bg_list));
-			btrfs_mark_bg_unused(cache);
-		}
 	}
 
 	list_for_each_entry_rcu(space_info, &info->space_info, list) {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time
  2019-10-10  2:39 [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
  2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
  2019-10-10  2:39 ` [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
@ 2019-10-10  2:39 ` Qu Wenruo
  2019-10-10  5:21   ` Naohiro Aota
                     ` (2 more replies)
  2019-10-10  2:40 ` [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce " Qu WenRuo
  3 siblings, 3 replies; 17+ messages in thread
From: Qu Wenruo @ 2019-10-10  2:39 UTC (permalink / raw)
  To: linux-btrfs

The overall idea of the new BG_TREE is pretty simple:
Put BLOCK_GROUP_ITEMS into a separate tree.

This brings one obvious enhancement:
- Reduce mount time of large fs

Although it could be possible to accept BLOCK_GROUP_ITEMS in either
trees (extent root or bg root), I'll leave that kernel convert as
alternatives to offline convert, as next step if there are a lot of
interests in that.

So for now, if an existing fs want to take advantage of BG_TREE feature,
btrfs-progs will provide offline convertion tool.

[[Benchmark]]
Physical device:	NVMe SSD
VM device:		VirtIO block device, backup by sparse file
Nodesize:		4K  (to bump up tree height)
Extent data size:	4M
Fs size used:		1T

All file extents on disk is in 4M size, preallocated to reduce space usage
(as the VM uses loopback block device backed by sparse file)

Without patchset:
Use ftrace function graph:

 7)               |  open_ctree [btrfs]() {
 7)               |    btrfs_read_block_groups [btrfs]() {
 7) @ 805851.8 us |    }
 7) @ 911890.2 us |  }

 btrfs_read_block_groups() takes 88% of the total mount time,

With patchset, and use -O bg-tree mkfs option:

 6)               |  open_ctree [btrfs]() {
 6)               |    btrfs_read_block_groups [btrfs]() {
 6) * 91204.69 us |    }
 6) @ 192039.5 us |  }

  open_ctree() time is only 21% of original mount time.
  And btrfs_read_block_groups() only takes 47% of total open_ctree()
  execution time.

The reason is pretty obvious when considering how many tree blocks needs
to be read from disk:
- Original extent tree:
  nodes:	55
  leaves:	1025
  total:	1080
- Block group tree:
  nodes:	1
  leaves:	13
  total:	14

Not to mention all the tree blocks readahead works pretty fine for bg
tree, as we will read every item.
While readahead for extent tree will just be a diaster, as all block
groups are scatter across the whole extent tree.

The reduction of mount time is already obvious even on super fast NVMe
disk with memory cache.
It would be even more obvious if the fs is on spinning rust.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c          | 104 ++++++++++++++++++++++++++------
 fs/btrfs/ctree.h                |   5 +-
 fs/btrfs/disk-io.c              |  13 ++++
 fs/btrfs/sysfs.c                |   2 +
 include/uapi/linux/btrfs.h      |   1 +
 include/uapi/linux/btrfs_tree.h |   3 +
 6 files changed, 110 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 0c5eef0610fa..0101fd42b586 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -861,7 +861,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 			     u64 group_start, struct extent_map *em)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_root *root;
 	struct btrfs_path *path;
 	struct btrfs_block_group_cache *block_group;
 	struct btrfs_free_cluster *cluster;
@@ -880,6 +880,11 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	BUG_ON(!block_group);
 	BUG_ON(!block_group->ro);
 
+	if (btrfs_fs_incompat(fs_info, BG_TREE))
+		root = fs_info->bg_root;
+	else
+		root = fs_info->extent_root;
+
 	trace_btrfs_remove_block_group(block_group);
 	/*
 	 * Free the reserved super bytes from this block group before
@@ -1790,6 +1795,56 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	return ret;
 }
 
+static int read_block_group_tree(struct btrfs_fs_info *info, int need_clear)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret;
+
+	key.objectid = 0;
+	key.offset = 0;
+	key.type = 0;
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(NULL, info->bg_root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+	if (ret == 0) {
+		btrfs_err(info,
+			  "found invalid block group bytenr %llu len %llu",
+			  key.objectid, key.offset);
+		ret = -EUCLEAN;
+		goto out;
+	}
+
+	while (1) {
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		if (key.type != BTRFS_BLOCK_GROUP_ITEM_KEY) {
+			btrfs_err(info,
+		"found invalid key (%llu, %u, %llu) in block group tree",
+				  key.objectid, key.type, key.offset);
+			ret = -EUCLEAN;
+			goto out;
+		}
+
+		ret = read_one_block_group(info, path, need_clear);
+		if (ret < 0)
+			goto out;
+		ret = btrfs_next_item(info->bg_root, path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -1815,20 +1870,26 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	if (btrfs_test_opt(info, CLEAR_CACHE))
 		need_clear = 1;
 
-	while (1) {
-		ret = find_first_block_group(info, path, &key);
-		if (ret > 0)
-			break;
-		if (ret != 0)
-			goto error;
-
-		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-		ret = read_one_block_group(info, path, need_clear);
+	if (btrfs_fs_incompat(info, BG_TREE)) {
+		ret = read_block_group_tree(info, need_clear);
 		if (ret < 0)
 			goto error;
-		key.objectid += key.offset;
-		key.offset = 0;
-		btrfs_release_path(path);
+	} else {
+		while (1) {
+			ret = find_first_block_group(info, path, &key);
+			if (ret > 0)
+				break;
+			if (ret != 0)
+				goto error;
+
+			btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+			ret = read_one_block_group(info, path, need_clear);
+			if (ret < 0)
+				goto error;
+			key.objectid += key.offset;
+			key.offset = 0;
+			btrfs_release_path(path);
+		}
 	}
 
 	list_for_each_entry_rcu(space_info, &info->space_info, list) {
@@ -1863,7 +1924,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_block_group_cache *block_group;
-	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_root *root;
 	struct btrfs_block_group_item item;
 	struct btrfs_key key;
 	int ret = 0;
@@ -1871,6 +1932,11 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 	if (!trans->can_flush_pending_bgs)
 		return;
 
+	if (btrfs_fs_incompat(fs_info, BG_TREE))
+		root = fs_info->bg_root;
+	else
+		root = fs_info->extent_root;
+
 	while (!list_empty(&trans->new_bgs)) {
 		block_group = list_first_entry(&trans->new_bgs,
 					       struct btrfs_block_group_cache,
@@ -1883,7 +1949,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 		memcpy(&key, &block_group->key, sizeof(key));
 		spin_unlock(&block_group->lock);
 
-		ret = btrfs_insert_item(trans, extent_root, &key, &item,
+		ret = btrfs_insert_item(trans, root, &key, &item,
 					sizeof(item));
 		if (ret)
 			btrfs_abort_transaction(trans, ret);
@@ -2119,11 +2185,15 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int ret;
-	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_root *root;
 	unsigned long bi;
 	struct extent_buffer *leaf;
 
-	ret = btrfs_search_slot(trans, extent_root, &cache->key, path, 0, 1);
+	if (btrfs_fs_incompat(fs_info, BG_TREE))
+		root = fs_info->bg_root;
+	else
+		root = fs_info->extent_root;
+	ret = btrfs_search_slot(trans, root, &cache->key, path, 0, 1);
 	if (ret) {
 		if (ret > 0)
 			ret = -ENOENT;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 19d669d12ca1..c98d008fb42f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -291,7 +291,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
+	 BTRFS_FEATURE_INCOMPAT_BG_TREE)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -538,6 +539,7 @@ struct btrfs_fs_info {
 	struct btrfs_root *quota_root;
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *free_space_root;
+	struct btrfs_root *bg_root;
 
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
@@ -2661,6 +2663,7 @@ static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 	kfree(fs_info->quota_root);
 	kfree(fs_info->uuid_root);
 	kfree(fs_info->free_space_root);
+	kfree(fs_info->bg_root);
 	kfree(fs_info->super_copy);
 	kfree(fs_info->super_for_commit);
 	kvfree(fs_info);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 044981cf6df9..1c5728e6a660 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1570,6 +1570,8 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 	if (location->objectid == BTRFS_FREE_SPACE_TREE_OBJECTID)
 		return fs_info->free_space_root ? fs_info->free_space_root :
 						  ERR_PTR(-ENOENT);
+	if (location->objectid == BTRFS_BLOCK_GROUP_TREE_OBJECTID)
+		return fs_info->bg_root ? fs_info->bg_root: ERR_PTR(-ENOENT);
 again:
 	root = btrfs_lookup_fs_root(fs_info, location->objectid);
 	if (root) {
@@ -2041,6 +2043,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
 	if (chunk_root)
 		free_root_extent_buffers(info->chunk_root);
 	free_root_extent_buffers(info->free_space_root);
+	free_root_extent_buffers(info->bg_root);
 }
 
 void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info)
@@ -2333,6 +2336,16 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
 	set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
 	fs_info->extent_root = root;
 
+	if (btrfs_fs_incompat(fs_info, BG_TREE)) {
+		location.objectid = BTRFS_BLOCK_GROUP_TREE_OBJECTID;
+		root = btrfs_read_tree_root(tree_root, &location);
+		if (IS_ERR(root)) {
+			ret = PTR_ERR(root);
+			goto out;
+		}
+		set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+		fs_info->bg_root = root;
+	}
 	location.objectid = BTRFS_DEV_TREE_OBJECTID;
 	root = btrfs_read_tree_root(tree_root, &location);
 	if (IS_ERR(root)) {
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index f6d3c80f2e28..799f7e18d6ee 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -257,6 +257,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56);
 BTRFS_FEAT_ATTR_INCOMPAT(skinny_metadata, SKINNY_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
+BTRFS_FEAT_ATTR_INCOMPAT(bg_tree, BG_TREE);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 
 static struct attribute *btrfs_supported_feature_attrs[] = {
@@ -272,6 +273,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(no_holes),
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
+	BTRFS_FEAT_ATTR_PTR(bg_tree),
 	NULL
 };
 
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 3ee0678c0a83..0f91ebc07064 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -270,6 +270,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_BG_TREE		(1ULL << 11)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index b65c7ee75bc7..1d1e50c42d0e 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -48,6 +48,9 @@
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* sotre BLOCK_GROUP_ITEMs in a seprate tree */
+#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
+
 /* device stats in the device tree */
 #define BTRFS_DEV_STATS_OBJECTID 0ULL
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time
  2019-10-10  2:39 [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-10-10  2:39 ` [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
@ 2019-10-10  2:40 ` Qu WenRuo
  3 siblings, 0 replies; 17+ messages in thread
From: Qu WenRuo @ 2019-10-10  2:40 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 4254 bytes --]



On 2019/10/10 上午10:39, Qu Wenruo wrote:
> This patchset can be fetched from:
> https://github.com/adam900710/linux/tree/bg_tree
> Which is based on v5.4-rc1 tag.
> 
> This patchset will hugely reduce mount time of large fs by putting all
> block group items into its own tree.
> 
> The old behavior will try to read out all block group items at mount
> time, however due to the key of block group items are scattered across
> tons of extent items, we must call btrfs_search_slot() for each block
> group.
> 
> It works fine for small fs, but when number of block groups goes beyond
> 200, such tree search will become a random read, causing obvious slow
> down.
> 
> On the other hand, btrfs_read_chunk_tree() is still very fast, since we
> put CHUNK_ITEMS into their own tree and package them next to each other.
> 
> Following this idea, we could do the same thing for block group items,
> so instead of triggering btrfs_search_slot() for each block group, we
> just call btrfs_next_item() and under most case we could finish in
> memory, and hugely speed up mount (see BENCHMARK below).
> 
> The only disadvantage is, this method introduce an incompatible feature,
> so existing fs can't use this feature directly.
> Either specify it at mkfs time, or use btrfs-progs offline convert tool.
> 
> [[Benchmark]]
> Since I have upgraded my rig to all NVME storage, there is no HDD
> test result.
> 
> Physical device:	NVMe SSD
> VM device:		VirtIO block device, backup by sparse file
> Nodesize:		4K  (to bump up tree height)
> Extent data size:	4M
> Fs size used:		1T
> 
> All file extents on disk is in 4M size, preallocated to reduce space usage
> (as the VM uses loopback block device backed by sparse file)
> 
> Without patchset:
> Use ftrace function graph:
> 
>  7)               |  open_ctree [btrfs]() {
>  7)               |    btrfs_read_block_groups [btrfs]() {
>  7) @ 805851.8 us |    }
>  7) @ 911890.2 us |  }
> 
>  btrfs_read_block_groups() takes 88% of the total mount time,
> 
> With patchset, and use -O bg-tree mkfs option:
> 
>  6)               |  open_ctree [btrfs]() {
>  6)               |    btrfs_read_block_groups [btrfs]() {
>  6) * 91204.69 us |    }
>  6) @ 192039.5 us |  }
> 
>   open_ctree() time is only 21% of original mount time.
>   And btrfs_read_block_groups() only takes 47% of total open_ctree()
>   execution time.
> 
> The reason is pretty obvious when considering how many tree blocks needs
> to be read from disk:
> - Original extent tree:
>   nodes:	55
>   leaves:	1025
>   total:	1080
> - Block group tree:
>   nodes:	1
>   leaves:	13
>   total:	14
> 
> Not to mention all the tree blocks readahead works pretty fine for bg
> tree, as we will read every item.
> While readahead for extent tree will just be a diaster, as all block
> groups are scatter across the whole extent tree.
> 
> Changelog:
> v2:
> - Rebase to v5.4-rc1
>   Minor conflicts due to code moved to block-group.c
> - Fix a bug where some block groups will not be loaded at mount time
>   It's a bug in that refactor patch, not exposed by previous round of
>   tests.
> - Add a new patch to remove a dead check
> - Update benchmark to NVMe based result
>   Hardware upgrade is not always a good thing for benchmark.
> 
> Changelog:
> v3:
> - Add a separate patch to fix possible memory leak
> - Add Reviewed-by tag for the refactor patch
> - Reword the refactor patch to mention the change of use
>   btrfs_fs_incompat()
Forgot one:

- Remove one wrong patch which could break usebackuproot mount option.

Thanks,
Qu
> 
> Qu Wenruo (3):
>   btrfs: block-group: Fix a memory leak due to missing
>     btrfs_put_block_group()
>   btrfs: block-group: Refactor btrfs_read_block_groups()
>   btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time
> 
>  fs/btrfs/block-group.c          | 306 ++++++++++++++++++++------------
>  fs/btrfs/ctree.h                |   5 +-
>  fs/btrfs/disk-io.c              |  13 ++
>  fs/btrfs/sysfs.c                |   2 +
>  include/uapi/linux/btrfs.h      |   1 +
>  include/uapi/linux/btrfs_tree.h |   3 +
>  6 files changed, 212 insertions(+), 118 deletions(-)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
  2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
@ 2019-10-10  2:51   ` Anand Jain
  2019-10-10  7:20   ` Johannes Thumshirn
  2019-10-11 19:23   ` David Sterba
  2 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2019-10-10  2:51 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 10/10/19 10:39 AM, Qu Wenruo wrote:
> In btrfs_read_block_groups(), if we have an invalid block group which
> has mixed type (DATA|METADATA) while the fs doesn't have MIX_BGS
> feature, we error out without freeing the block group cache.
> 
> This patch will add the missing btrfs_put_block_group() to prevent
> memory leak.
> 
> Fixes: 49303381f19a ("Btrfs: bail out if block group has different mixed flag")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups()
  2019-10-10  2:39 ` [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
@ 2019-10-10  2:52   ` Anand Jain
  2019-10-30  4:59   ` Qu WenRuo
  2019-11-04 19:55   ` David Sterba
  2 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2019-10-10  2:52 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Johannes Thumshirn

On 10/10/19 10:39 AM, Qu Wenruo wrote:
> Refactor the work inside the loop of btrfs_read_block_groups() into one
> separate function, read_one_block_group().
> 
> This allows read_one_block_group to be reused for later BG_TREE feature.
> 
> The refactor does the following extra fix:
> - Use btrfs_fs_incompat() to replace open-coded feature check
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
> ---
>   fs/btrfs/block-group.c | 215 +++++++++++++++++++++--------------------
>   1 file changed, 108 insertions(+), 107 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index c906a2b6c2cf..0c5eef0610fa 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1687,6 +1687,109 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>   	return ret;
>   }
>   
> +static int read_one_block_group(struct btrfs_fs_info *info,
> +				struct btrfs_path *path,
> +				int need_clear)
> +{
> +	struct extent_buffer *leaf = path->nodes[0];
> +	struct btrfs_block_group_cache *cache;
> +	struct btrfs_space_info *space_info;
> +	struct btrfs_key key;
> +	int mixed = btrfs_fs_incompat(info, MIXED_GROUPS);

  nit:  mixed can be bool. (David can change while integrating.)

Thanks, Anand

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time
  2019-10-10  2:39 ` [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
@ 2019-10-10  5:21   ` Naohiro Aota
  2019-10-11 13:23   ` Josef Bacik
  2019-10-14  9:08   ` Anand Jain
  2 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2019-10-10  5:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 10, 2019 at 10:39:28AM +0800, Qu Wenruo wrote:
>diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
>index b65c7ee75bc7..1d1e50c42d0e 100644
>--- a/include/uapi/linux/btrfs_tree.h
>+++ b/include/uapi/linux/btrfs_tree.h
>@@ -48,6 +48,9 @@
> /* tracks free space in block groups. */
> #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
>
>+/* sotre BLOCK_GROUP_ITEMs in a seprate tree */
>+#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
>+

nit: "store" and "separate"?

> /* device stats in the device tree */
> #define BTRFS_DEV_STATS_OBJECTID 0ULL
>
>-- 
>2.23.0
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
  2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
  2019-10-10  2:51   ` Anand Jain
@ 2019-10-10  7:20   ` Johannes Thumshirn
  2019-10-11 19:23   ` David Sterba
  2 siblings, 0 replies; 17+ messages in thread
From: Johannes Thumshirn @ 2019-10-10  7:20 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 247165, AG München)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time
  2019-10-10  2:39 ` [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
  2019-10-10  5:21   ` Naohiro Aota
@ 2019-10-11 13:23   ` Josef Bacik
  2019-10-14  9:08   ` Anand Jain
  2 siblings, 0 replies; 17+ messages in thread
From: Josef Bacik @ 2019-10-11 13:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 10, 2019 at 10:39:28AM +0800, Qu Wenruo wrote:
> The overall idea of the new BG_TREE is pretty simple:
> Put BLOCK_GROUP_ITEMS into a separate tree.
> 
> This brings one obvious enhancement:
> - Reduce mount time of large fs
> 
> Although it could be possible to accept BLOCK_GROUP_ITEMS in either
> trees (extent root or bg root), I'll leave that kernel convert as
> alternatives to offline convert, as next step if there are a lot of
> interests in that.
> 
> So for now, if an existing fs want to take advantage of BG_TREE feature,
> btrfs-progs will provide offline convertion tool.
> 
> [[Benchmark]]
> Physical device:	NVMe SSD
> VM device:		VirtIO block device, backup by sparse file
> Nodesize:		4K  (to bump up tree height)
> Extent data size:	4M
> Fs size used:		1T
> 
> All file extents on disk is in 4M size, preallocated to reduce space usage
> (as the VM uses loopback block device backed by sparse file)
> 
> Without patchset:
> Use ftrace function graph:
> 
>  7)               |  open_ctree [btrfs]() {
>  7)               |    btrfs_read_block_groups [btrfs]() {
>  7) @ 805851.8 us |    }
>  7) @ 911890.2 us |  }
> 
>  btrfs_read_block_groups() takes 88% of the total mount time,
> 
> With patchset, and use -O bg-tree mkfs option:
> 
>  6)               |  open_ctree [btrfs]() {
>  6)               |    btrfs_read_block_groups [btrfs]() {
>  6) * 91204.69 us |    }
>  6) @ 192039.5 us |  }
> 
>   open_ctree() time is only 21% of original mount time.
>   And btrfs_read_block_groups() only takes 47% of total open_ctree()
>   execution time.
> 
> The reason is pretty obvious when considering how many tree blocks needs
> to be read from disk:
> - Original extent tree:
>   nodes:	55
>   leaves:	1025
>   total:	1080
> - Block group tree:
>   nodes:	1
>   leaves:	13
>   total:	14
> 
> Not to mention all the tree blocks readahead works pretty fine for bg
> tree, as we will read every item.
> While readahead for extent tree will just be a diaster, as all block
> groups are scatter across the whole extent tree.
> 
> The reduction of mount time is already obvious even on super fast NVMe
> disk with memory cache.
> It would be even more obvious if the fs is on spinning rust.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

You need to add

fs_info->bg_root->block_rsv = &fs_info->delayed_refs_rsv;

to btrfs_init_global_block_rsv, otherwise bad things will happen.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
  2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
  2019-10-10  2:51   ` Anand Jain
  2019-10-10  7:20   ` Johannes Thumshirn
@ 2019-10-11 19:23   ` David Sterba
  2 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2019-10-11 19:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 10, 2019 at 10:39:26AM +0800, Qu Wenruo wrote:
> In btrfs_read_block_groups(), if we have an invalid block group which
> has mixed type (DATA|METADATA) while the fs doesn't have MIX_BGS
> feature, we error out without freeing the block group cache.
> 
> This patch will add the missing btrfs_put_block_group() to prevent
> memory leak.
> 
> Fixes: 49303381f19a ("Btrfs: bail out if block group has different mixed flag")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Please send independent fixes out of feature patchsets so it does not
get lost due to priorities fixes vs features. I'll add it to misc-next
and queue for 5.4. Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time
  2019-10-10  2:39 ` [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
  2019-10-10  5:21   ` Naohiro Aota
  2019-10-11 13:23   ` Josef Bacik
@ 2019-10-14  9:08   ` Anand Jain
  2 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2019-10-14  9:08 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs





> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 044981cf6df9..1c5728e6a660 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1570,6 +1570,8 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>   	if (location->objectid == BTRFS_FREE_SPACE_TREE_OBJECTID)
>   		return fs_info->free_space_root ? fs_info->free_space_root :
>   						  ERR_PTR(-ENOENT);
> +	if (location->objectid == BTRFS_BLOCK_GROUP_TREE_OBJECTID)


> +		return fs_info->bg_root ? fs_info->bg_root: ERR_PTR(-ENOENT);

  An explicit check is better than implicit check for example:
------------
return btrfs_fs_incompat(fs_info, BG_TREE) ? \
	fs_info->bg_root : ERR_PTR(-ENOENT);
------------


Thanks, Anand

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups()
  2019-10-10  2:39 ` [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
  2019-10-10  2:52   ` Anand Jain
@ 2019-10-30  4:59   ` Qu WenRuo
  2019-11-04 19:53     ` David Sterba
  2019-11-04 19:55   ` David Sterba
  2 siblings, 1 reply; 17+ messages in thread
From: Qu WenRuo @ 2019-10-30  4:59 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn, Anand Jain


[-- Attachment #1.1: Type: text/plain, Size: 9489 bytes --]



On 2019/10/10 上午10:39, Qu Wenruo wrote:
> Refactor the work inside the loop of btrfs_read_block_groups() into one
> separate function, read_one_block_group().
> 
> This allows read_one_block_group to be reused for later BG_TREE feature.
> 
> The refactor does the following extra fix:
> - Use btrfs_fs_incompat() to replace open-coded feature check
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>

Hi David,

Mind to add this patch to for-next branch?

Considering the recent changes to struct btrfs_block_group_cache, there
is some considerable conflicts.

It would be much easier to solve them sooner than later.
If needed I could send a newer version based on latest for-next branch.

Thanks,
Qu

> ---
>  fs/btrfs/block-group.c | 215 +++++++++++++++++++++--------------------
>  1 file changed, 108 insertions(+), 107 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index c906a2b6c2cf..0c5eef0610fa 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1687,6 +1687,109 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>  	return ret;
>  }
>  
> +static int read_one_block_group(struct btrfs_fs_info *info,
> +				struct btrfs_path *path,
> +				int need_clear)
> +{
> +	struct extent_buffer *leaf = path->nodes[0];
> +	struct btrfs_block_group_cache *cache;
> +	struct btrfs_space_info *space_info;
> +	struct btrfs_key key;
> +	int mixed = btrfs_fs_incompat(info, MIXED_GROUPS);
> +	int slot = path->slots[0];
> +	int ret;
> +
> +	btrfs_item_key_to_cpu(leaf, &key, slot);
> +	ASSERT(key.type == BTRFS_BLOCK_GROUP_ITEM_KEY);
> +
> +	cache = btrfs_create_block_group_cache(info, key.objectid,
> +					       key.offset);
> +	if (!cache)
> +		return -ENOMEM;
> +
> +	if (need_clear) {
> +		/*
> +		 * When we mount with old space cache, we need to
> +		 * set BTRFS_DC_CLEAR and set dirty flag.
> +		 *
> +		 * a) Setting 'BTRFS_DC_CLEAR' makes sure that we
> +		 *    truncate the old free space cache inode and
> +		 *    setup a new one.
> +		 * b) Setting 'dirty flag' makes sure that we flush
> +		 *    the new space cache info onto disk.
> +		 */
> +		if (btrfs_test_opt(info, SPACE_CACHE))
> +			cache->disk_cache_state = BTRFS_DC_CLEAR;
> +	}
> +	read_extent_buffer(leaf, &cache->item,
> +			   btrfs_item_ptr_offset(leaf, slot),
> +			   sizeof(cache->item));
> +	cache->flags = btrfs_block_group_flags(&cache->item);
> +	if (!mixed && ((cache->flags & BTRFS_BLOCK_GROUP_METADATA) &&
> +	    (cache->flags & BTRFS_BLOCK_GROUP_DATA))) {
> +			btrfs_err(info,
> +"bg %llu is a mixed block group but filesystem hasn't enabled mixed block groups",
> +				  cache->key.objectid);
> +			ret = -EINVAL;
> +			goto error;
> +	}
> +
> +	/*
> +	 * We need to exclude the super stripes now so that the space info has
> +	 * super bytes accounted for, otherwise we'll think we have more space
> +	 * than we actually do.
> +	 */
> +	ret = exclude_super_stripes(cache);
> +	if (ret) {
> +		/* We may have excluded something, so call this just in case. */
> +		btrfs_free_excluded_extents(cache);
> +		goto error;
> +	}
> +
> +	/*
> +	 * Check for two cases, either we are full, and therefore don't need
> +	 * to bother with the caching work since we won't find any space, or we
> +	 * are empty, and we can just add all the space in and be done with it.
> +	 * This saves us _a_lot_ of time, particularly in the full case.
> +	 */
> +	if (key.offset == btrfs_block_group_used(&cache->item)) {
> +		cache->last_byte_to_unpin = (u64)-1;
> +		cache->cached = BTRFS_CACHE_FINISHED;
> +		btrfs_free_excluded_extents(cache);
> +	} else if (btrfs_block_group_used(&cache->item) == 0) {
> +		cache->last_byte_to_unpin = (u64)-1;
> +		cache->cached = BTRFS_CACHE_FINISHED;
> +		add_new_free_space(cache, key.objectid,
> +				   key.objectid + key.offset);
> +		btrfs_free_excluded_extents(cache);
> +	}
> +	ret = btrfs_add_block_group_cache(info, cache);
> +	if (ret) {
> +		btrfs_remove_free_space_cache(cache);
> +		goto error;
> +	}
> +	trace_btrfs_add_block_group(info, cache, 0);
> +	btrfs_update_space_info(info, cache->flags, key.offset,
> +				btrfs_block_group_used(&cache->item),
> +				cache->bytes_super, &space_info);
> +
> +	cache->space_info = space_info;
> +
> +	link_block_group(cache);
> +
> +	set_avail_alloc_bits(info, cache->flags);
> +	if (btrfs_chunk_readonly(info, cache->key.objectid)) {
> +		inc_block_group_ro(cache, 1);
> +	} else if (btrfs_block_group_used(&cache->item) == 0) {
> +		ASSERT(list_empty(&cache->bg_list));
> +		btrfs_mark_bg_unused(cache);
> +	}
> +	return 0;
> +error:
> +	btrfs_put_block_group(cache);
> +	return ret;
> +}
> +
>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  {
>  	struct btrfs_path *path;
> @@ -1694,15 +1797,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  	struct btrfs_block_group_cache *cache;
>  	struct btrfs_space_info *space_info;
>  	struct btrfs_key key;
> -	struct btrfs_key found_key;
> -	struct extent_buffer *leaf;
>  	int need_clear = 0;
>  	u64 cache_gen;
> -	u64 feature;
> -	int mixed;
> -
> -	feature = btrfs_super_incompat_flags(info->super_copy);
> -	mixed = !!(feature & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS);
>  
>  	key.objectid = 0;
>  	key.offset = 0;
> @@ -1726,108 +1822,13 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  		if (ret != 0)
>  			goto error;
>  
> -		leaf = path->nodes[0];
> -		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> -
> -		cache = btrfs_create_block_group_cache(info, found_key.objectid,
> -						       found_key.offset);
> -		if (!cache) {
> -			ret = -ENOMEM;
> -			goto error;
> -		}
> -
> -		if (need_clear) {
> -			/*
> -			 * When we mount with old space cache, we need to
> -			 * set BTRFS_DC_CLEAR and set dirty flag.
> -			 *
> -			 * a) Setting 'BTRFS_DC_CLEAR' makes sure that we
> -			 *    truncate the old free space cache inode and
> -			 *    setup a new one.
> -			 * b) Setting 'dirty flag' makes sure that we flush
> -			 *    the new space cache info onto disk.
> -			 */
> -			if (btrfs_test_opt(info, SPACE_CACHE))
> -				cache->disk_cache_state = BTRFS_DC_CLEAR;
> -		}
> -
> -		read_extent_buffer(leaf, &cache->item,
> -				   btrfs_item_ptr_offset(leaf, path->slots[0]),
> -				   sizeof(cache->item));
> -		cache->flags = btrfs_block_group_flags(&cache->item);
> -		if (!mixed &&
> -		    ((cache->flags & BTRFS_BLOCK_GROUP_METADATA) &&
> -		    (cache->flags & BTRFS_BLOCK_GROUP_DATA))) {
> -			btrfs_err(info,
> -"bg %llu is a mixed block group but filesystem hasn't enabled mixed block groups",
> -				  cache->key.objectid);
> -			ret = -EINVAL;
> -			btrfs_put_block_group(cache);
> +		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +		ret = read_one_block_group(info, path, need_clear);
> +		if (ret < 0)
>  			goto error;
> -		}
> -
> -		key.objectid = found_key.objectid + found_key.offset;
> +		key.objectid += key.offset;
> +		key.offset = 0;
>  		btrfs_release_path(path);
> -
> -		/*
> -		 * We need to exclude the super stripes now so that the space
> -		 * info has super bytes accounted for, otherwise we'll think
> -		 * we have more space than we actually do.
> -		 */
> -		ret = exclude_super_stripes(cache);
> -		if (ret) {
> -			/*
> -			 * We may have excluded something, so call this just in
> -			 * case.
> -			 */
> -			btrfs_free_excluded_extents(cache);
> -			btrfs_put_block_group(cache);
> -			goto error;
> -		}
> -
> -		/*
> -		 * Check for two cases, either we are full, and therefore
> -		 * don't need to bother with the caching work since we won't
> -		 * find any space, or we are empty, and we can just add all
> -		 * the space in and be done with it.  This saves us _a_lot_ of
> -		 * time, particularly in the full case.
> -		 */
> -		if (found_key.offset == btrfs_block_group_used(&cache->item)) {
> -			cache->last_byte_to_unpin = (u64)-1;
> -			cache->cached = BTRFS_CACHE_FINISHED;
> -			btrfs_free_excluded_extents(cache);
> -		} else if (btrfs_block_group_used(&cache->item) == 0) {
> -			cache->last_byte_to_unpin = (u64)-1;
> -			cache->cached = BTRFS_CACHE_FINISHED;
> -			add_new_free_space(cache, found_key.objectid,
> -					   found_key.objectid +
> -					   found_key.offset);
> -			btrfs_free_excluded_extents(cache);
> -		}
> -
> -		ret = btrfs_add_block_group_cache(info, cache);
> -		if (ret) {
> -			btrfs_remove_free_space_cache(cache);
> -			btrfs_put_block_group(cache);
> -			goto error;
> -		}
> -
> -		trace_btrfs_add_block_group(info, cache, 0);
> -		btrfs_update_space_info(info, cache->flags, found_key.offset,
> -					btrfs_block_group_used(&cache->item),
> -					cache->bytes_super, &space_info);
> -
> -		cache->space_info = space_info;
> -
> -		link_block_group(cache);
> -
> -		set_avail_alloc_bits(info, cache->flags);
> -		if (btrfs_chunk_readonly(info, cache->key.objectid)) {
> -			inc_block_group_ro(cache, 1);
> -		} else if (btrfs_block_group_used(&cache->item) == 0) {
> -			ASSERT(list_empty(&cache->bg_list));
> -			btrfs_mark_bg_unused(cache);
> -		}
>  	}
>  
>  	list_for_each_entry_rcu(space_info, &info->space_info, list) {
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups()
  2019-10-30  4:59   ` Qu WenRuo
@ 2019-11-04 19:53     ` David Sterba
  2019-11-04 21:44       ` David Sterba
  0 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2019-11-04 19:53 UTC (permalink / raw)
  To: Qu WenRuo; +Cc: linux-btrfs, Johannes Thumshirn, Anand Jain

On Wed, Oct 30, 2019 at 04:59:17AM +0000, Qu WenRuo wrote:
> 
> 
> On 2019/10/10 上午10:39, Qu Wenruo wrote:
> > Refactor the work inside the loop of btrfs_read_block_groups() into one
> > separate function, read_one_block_group().
> > 
> > This allows read_one_block_group to be reused for later BG_TREE feature.
> > 
> > The refactor does the following extra fix:
> > - Use btrfs_fs_incompat() to replace open-coded feature check
> > 
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> > Reviewed-by: Anand Jain <anand.jain@oracle.com>
> 
> Hi David,
> 
> Mind to add this patch to for-next branch?
> 
> Considering the recent changes to struct btrfs_block_group_cache, there
> is some considerable conflicts.

I see, as the patch is idependent I'll add it.

> It would be much easier to solve them sooner than later.
> If needed I could send a newer version based on latest for-next branch.

I've fixed the conflicts, but please have a look anyway. The change was
cache->item to local block group item and rename of found_key to key in
read_one_block_group.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups()
  2019-10-10  2:39 ` [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
  2019-10-10  2:52   ` Anand Jain
  2019-10-30  4:59   ` Qu WenRuo
@ 2019-11-04 19:55   ` David Sterba
  2 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2019-11-04 19:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Johannes Thumshirn, Anand Jain

On Thu, Oct 10, 2019 at 10:39:27AM +0800, Qu Wenruo wrote:
> +static int read_one_block_group(struct btrfs_fs_info *info,
> +				struct btrfs_path *path,
> +				int need_clear)
> +{
> +	struct extent_buffer *leaf = path->nodes[0];
> +	struct btrfs_block_group_cache *cache;
> +	struct btrfs_space_info *space_info;
> +	struct btrfs_key key;
> +	int mixed = btrfs_fs_incompat(info, MIXED_GROUPS);
> +	int slot = path->slots[0];
> +	int ret;
> +
> +	btrfs_item_key_to_cpu(leaf, &key, slot);

The first thing done here is the same as in the caller:

> +		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +		ret = read_one_block_group(info, path, need_clear);

The key can be passed by pointer so it's not on stack and the conversion
can be removed. I left it in the patch, please send a followup. Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups()
  2019-11-04 19:53     ` David Sterba
@ 2019-11-04 21:44       ` David Sterba
  2019-11-05  0:47         ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2019-11-04 21:44 UTC (permalink / raw)
  To: dsterba, Qu WenRuo, linux-btrfs, Johannes Thumshirn, Anand Jain

On Mon, Nov 04, 2019 at 08:53:52PM +0100, David Sterba wrote:
> On Wed, Oct 30, 2019 at 04:59:17AM +0000, Qu WenRuo wrote:
> > 
> > 
> > On 2019/10/10 上午10:39, Qu Wenruo wrote:
> > > Refactor the work inside the loop of btrfs_read_block_groups() into one
> > > separate function, read_one_block_group().
> > > 
> > > This allows read_one_block_group to be reused for later BG_TREE feature.
> > > 
> > > The refactor does the following extra fix:
> > > - Use btrfs_fs_incompat() to replace open-coded feature check
> > > 
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> > > Reviewed-by: Anand Jain <anand.jain@oracle.com>
> > 
> > Hi David,
> > 
> > Mind to add this patch to for-next branch?
> > 
> > Considering the recent changes to struct btrfs_block_group_cache, there
> > is some considerable conflicts.
> 
> I see, as the patch is idependent I'll add it.
> 
> > It would be much easier to solve them sooner than later.
> > If needed I could send a newer version based on latest for-next branch.
> 
> I've fixed the conflicts, but please have a look anyway. The change was
> cache->item to local block group item and rename of found_key to key in
> read_one_block_group.

And it crashes during the self-tests, the patch is in branch
misc-next-with-bg-refactoring in my github tree, please have a look.
I've removed it from misc-next for now as I need to test for-next, but
it's probably going to be some trivial typo so the patch will be added
once it's found. Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups()
  2019-11-04 21:44       ` David Sterba
@ 2019-11-05  0:47         ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2019-11-05  0:47 UTC (permalink / raw)
  To: dsterba, Qu WenRuo, linux-btrfs, Johannes Thumshirn, Anand Jain


[-- Attachment #1.1: Type: text/plain, Size: 2001 bytes --]



On 2019/11/5 上午5:44, David Sterba wrote:
> On Mon, Nov 04, 2019 at 08:53:52PM +0100, David Sterba wrote:
>> On Wed, Oct 30, 2019 at 04:59:17AM +0000, Qu WenRuo wrote:
>>>
>>>
>>> On 2019/10/10 上午10:39, Qu Wenruo wrote:
>>>> Refactor the work inside the loop of btrfs_read_block_groups() into one
>>>> separate function, read_one_block_group().
>>>>
>>>> This allows read_one_block_group to be reused for later BG_TREE feature.
>>>>
>>>> The refactor does the following extra fix:
>>>> - Use btrfs_fs_incompat() to replace open-coded feature check
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>>>> Reviewed-by: Anand Jain <anand.jain@oracle.com>
>>>
>>> Hi David,
>>>
>>> Mind to add this patch to for-next branch?
>>>
>>> Considering the recent changes to struct btrfs_block_group_cache, there
>>> is some considerable conflicts.
>>
>> I see, as the patch is idependent I'll add it.
>>
>>> It would be much easier to solve them sooner than later.
>>> If needed I could send a newer version based on latest for-next branch.
>>
>> I've fixed the conflicts, but please have a look anyway. The change was
>> cache->item to local block group item and rename of found_key to key in
>> read_one_block_group.
> 
> And it crashes during the self-tests, the patch is in branch
> misc-next-with-bg-refactoring in my github tree, please have a look.
> I've removed it from misc-next for now as I need to test for-next, but
> it's probably going to be some trivial typo so the patch will be added
> once it's found. Thanks.
>

Found the cause.

It's missing cache->flags assignment.
There is a line in the original patch:

cache->flags = btrfs_block_group_flags(&cache->item);

But not in the rebased one.

Exactly the same bug I hit when developing the skinny-bg-tree feature.

I'll send out the fix for it, with the removal of unneeded
btrfs_item_key_to_cpu() call.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-11-05  0:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-10  2:39 [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
2019-10-10  2:51   ` Anand Jain
2019-10-10  7:20   ` Johannes Thumshirn
2019-10-11 19:23   ` David Sterba
2019-10-10  2:39 ` [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
2019-10-10  2:52   ` Anand Jain
2019-10-30  4:59   ` Qu WenRuo
2019-11-04 19:53     ` David Sterba
2019-11-04 21:44       ` David Sterba
2019-11-05  0:47         ` Qu Wenruo
2019-11-04 19:55   ` David Sterba
2019-10-10  2:39 ` [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
2019-10-10  5:21   ` Naohiro Aota
2019-10-11 13:23   ` Josef Bacik
2019-10-14  9:08   ` Anand Jain
2019-10-10  2:40 ` [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce " Qu WenRuo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).