* [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize.
@ 2022-08-08  5:45 Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 1/7] btrfs: scrub: use pointer array to replace @sblocks_for_recheck Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

[Changelog]
v3:
- Move various members from scrub_sector to scrub_block
  This reduces memory usage, as those members are now per-block instead
  of per-sector.

- Enlarge the blocksize for data extent scrub
  This greatly benefits from the above change.

v2:
- Rebased to latest misc-next
  The conflicts are mostly from the member renaming.
  Re-ran the tests for both the scrub and replace groups, on aarch64
  (64K page size) and on x86_64.


Although btrfs scrub has worked for subpage from day one, it has a small
pitfall:

  Scrub will always allocate a full page for each sector.

This causes increased memory usage; although not a big deal, it's still
not ideal.

The patchset will change the behavior by integrating all pages into
scrub_block::pages[], instead of using scrub_sector::page.

Now scrub_sector no longer holds a page pointer, but uses its logical
bytenr to calculate which page and which range inside that page it
should use.
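
For reference, a rough sketch of how that lookup works after this series
(using the helpers added in patches 4-6, not verbatim code):

	/* Which page of sblock->pages[] backs this sector */
	page_index = (sector_logical - sblock->logical) >> PAGE_SHIFT;
	page = sblock->pages[page_index];

	/* And the byte offset of the sector inside that page */
	pgoff = offset_in_page(sector_logical - sblock->logical);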

And since we're here, also move several members from scrub_sector to
scrub_block, and further enlarge the blocksize for data extent scrub.

[MEMORY USAGE CHANGE]

For 4K page size systems, the memory usage for scrubbing a 16KiB
metadata block is:

Old:	240 + 4 * 128 = 752
New:	408 + 4 * 96  = 792
Diff:			+5.3%

This is a slight memory usage increase due to the extra page pointer
array.

The memory usage for scrubbing a 64KiB data extent is (the best case):
Old:   16 * (240 + 128) = 5888
New:   480 + 16 * 96    = 1944
Diff:			  -67.0%

In this case, the reduction in memory usage is substantial, cutting
almost 2/3 of the memory.

The memory usage for scrubbing a 4KiB data extent is (the worst case):
Old:   240 + 128 = 368
New:   408 + 96  = 504
Diff:		   +37.0%

The memory usage for scrubbing an 8KiB data extent is:
Old:   2 * (240 + 128) = 736
New:   480 + 2 * 96    = 672
Diff:			 -8.7%

So as long as the extent is larger than 4KiB, the new code uses less
memory, and the larger the extent, the better.

This is especially good as btrfs by default inlines small (<=2K) data
extents, thus we have a much higher chance of hitting larger data
extents.
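
A rough back-of-envelope check, using the figures above: the old code
costs about 368 bytes per data sector (one scrub_block plus one
scrub_sector each), while the new code pays a one-time scrub_block
allocation (roughly 400~480 bytes in the examples above) plus about 96
bytes per sector.  Roughly:

	old(n)  ~= n * 368
	new(n)  ~= 480 + n * 96

So any data extent covering two or more sectors already comes out ahead,
which matches the 8KiB example above.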

[PATCHSET STRUCTURE]
The first 3 patches are just cleanups, mostly to make scrub_sector
allocation much easier.

The 4th patch introduces the new page array for sblock, and the 5th one
completely removes the usage of scrub_sector::page.

The last two are optimizations for both regular and subpage cases.

Qu Wenruo (7):
  btrfs: scrub: use pointer array to replace @sblocks_for_recheck
  btrfs: extract the initialization of scrub_block into a helper
    function
  btrfs: extract the allocation and initialization of scrub_sector into
    a helper
  btrfs: scrub: introduce scrub_block::pages for more efficient memory
    usage for subpage
  btrfs: scrub: remove scrub_sector::page and use scrub_block::pages
    instead
  btrfs: scrub: move logical/physical/dev/mirror_num from scrub_sector
    to scrub_block
  btrfs: use larger blocksize for data extent scrub

 fs/btrfs/scrub.c | 546 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 353 insertions(+), 193 deletions(-)

-- 
2.37.0


* [PATCH v3 1/7] btrfs: scrub: use pointer array to replace @sblocks_for_recheck
  2022-08-08  5:45 [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize Qu Wenruo
@ 2022-08-08  5:45 ` Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 2/7] btrfs: extract the initialization of scrub_block into a helper function Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

In function scrub_handle_errored_block(), we use the @sblocks_for_recheck
pointer to hold one scrub_block for each mirror, and use kcalloc() to
allocate the array.

But using one pointer for a whole array is not really reader friendly.

Just change this pointer to a struct scrub_block *[BTRFS_MAX_MIRRORS]
array; this will slightly increase the stack memory usage.

Since function scrub_handle_errored_block() won't be called iteratively,
this extra cost is completely acceptable.

And since we're here, also set sblock->refs and use scrub_block_put() to
clean them up, as later patches will add extra members to scrub_block
which need scrub_block_put() for proper cleanup.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 102 ++++++++++++++++++++++++++---------------------
 1 file changed, 56 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d05025034b0a..9beb293ac4b2 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -203,7 +203,7 @@ struct full_stripe_lock {
 };
 
 static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
-				     struct scrub_block *sblocks_for_recheck);
+				     struct scrub_block *sblocks_for_recheck[]);
 static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 				struct scrub_block *sblock,
 				int retry_failed_mirror);
@@ -817,7 +817,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	unsigned int failed_mirror_index;
 	unsigned int is_metadata;
 	unsigned int have_csum;
-	struct scrub_block *sblocks_for_recheck; /* holds one for each mirror */
+	/* One scrub_block for each mirror */
+	struct scrub_block *sblocks_for_recheck[BTRFS_MAX_MIRRORS] = { 0 };
 	struct scrub_block *sblock_bad;
 	int ret;
 	int mirror_index;
@@ -910,17 +911,29 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	 * repaired area is verified in order to correctly maintain
 	 * the statistics.
 	 */
-
-	sblocks_for_recheck = kcalloc(BTRFS_MAX_MIRRORS,
-				      sizeof(*sblocks_for_recheck), GFP_KERNEL);
-	if (!sblocks_for_recheck) {
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.malloc_errors++;
-		sctx->stat.read_errors++;
-		sctx->stat.uncorrectable_errors++;
-		spin_unlock(&sctx->stat_lock);
-		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS);
-		goto out;
+	for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS; mirror_index++) {
+		sblocks_for_recheck[mirror_index] =
+			kzalloc(sizeof(struct scrub_block), GFP_KERNEL);
+		if (!sblocks_for_recheck[mirror_index]) {
+			spin_lock(&sctx->stat_lock);
+			sctx->stat.malloc_errors++;
+			sctx->stat.read_errors++;
+			sctx->stat.uncorrectable_errors++;
+			spin_unlock(&sctx->stat_lock);
+			btrfs_dev_stat_inc_and_print(dev,
+						     BTRFS_DEV_STAT_READ_ERRS);
+			goto out;
+		}
+		/*
+		 * note: the two members refs and outstanding_sectors
+		 * are not used in the blocks that are used for the recheck
+		 * procedure.
+		 *
+		 * But to make the cleanup easier, we just put one ref for
+		 * each sblocks.
+		 */
+		refcount_set(&sblocks_for_recheck[mirror_index]->refs, 1);
+		sblocks_for_recheck[mirror_index]->sctx = sctx;
 	}
 
 	/* Setup the context, map the logical blocks and alloc the sectors */
@@ -934,7 +947,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 		goto out;
 	}
 	BUG_ON(failed_mirror_index >= BTRFS_MAX_MIRRORS);
-	sblock_bad = sblocks_for_recheck + failed_mirror_index;
+	sblock_bad = sblocks_for_recheck[failed_mirror_index];
 
 	/* build and submit the bios for the failed mirror, check checksums */
 	scrub_recheck_block(fs_info, sblock_bad, 1);
@@ -1019,21 +1032,21 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 		if (!scrub_is_page_on_raid56(sblock_bad->sectors[0])) {
 			if (mirror_index >= BTRFS_MAX_MIRRORS)
 				break;
-			if (!sblocks_for_recheck[mirror_index].sector_count)
+			if (!sblocks_for_recheck[mirror_index]->sector_count)
 				break;
 
-			sblock_other = sblocks_for_recheck + mirror_index;
+			sblock_other = sblocks_for_recheck[mirror_index];
 		} else {
 			struct scrub_recover *r = sblock_bad->sectors[0]->recover;
 			int max_allowed = r->bioc->num_stripes - r->bioc->num_tgtdevs;
 
 			if (mirror_index >= max_allowed)
 				break;
-			if (!sblocks_for_recheck[1].sector_count)
+			if (!sblocks_for_recheck[1]->sector_count)
 				break;
 
 			ASSERT(failed_mirror_index == 0);
-			sblock_other = sblocks_for_recheck + 1;
+			sblock_other = sblocks_for_recheck[1];
 			sblock_other->sectors[0]->mirror_num = 1 + mirror_index;
 		}
 
@@ -1105,12 +1118,11 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 			/* Try to find no-io-error sector in mirrors */
 			for (mirror_index = 0;
 			     mirror_index < BTRFS_MAX_MIRRORS &&
-			     sblocks_for_recheck[mirror_index].sector_count > 0;
+			     sblocks_for_recheck[mirror_index]->sector_count > 0;
 			     mirror_index++) {
-				if (!sblocks_for_recheck[mirror_index].
+				if (!sblocks_for_recheck[mirror_index]->
 				    sectors[sector_num]->io_error) {
-					sblock_other = sblocks_for_recheck +
-						       mirror_index;
+					sblock_other = sblocks_for_recheck[mirror_index];
 					break;
 				}
 			}
@@ -1184,25 +1196,28 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	}
 
 out:
-	if (sblocks_for_recheck) {
-		for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS;
-		     mirror_index++) {
-			struct scrub_block *sblock = sblocks_for_recheck +
-						     mirror_index;
-			struct scrub_recover *recover;
-			int i;
-
-			for (i = 0; i < sblock->sector_count; i++) {
-				sblock->sectors[i]->sblock = NULL;
-				recover = sblock->sectors[i]->recover;
-				if (recover) {
-					scrub_put_recover(fs_info, recover);
-					sblock->sectors[i]->recover = NULL;
-				}
-				scrub_sector_put(sblock->sectors[i]);
+	for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS; mirror_index++) {
+		struct scrub_block *sblock = sblocks_for_recheck[mirror_index];
+		struct scrub_recover *recover;
+		int sector_index;
+
+		/* Not allocated, continue checking the next mirror */
+		if (!sblock)
+			continue;
+
+		for (sector_index = 0; sector_index < sblock->sector_count;
+		     sector_index++) {
+			/*
+			 * Here we just cleanup the recover, each sector will be
+			 * properly cleaned up by later scrub_block_put()
+			 */
+			recover = sblock->sectors[sector_index]->recover;
+			if (recover) {
+				scrub_put_recover(fs_info, recover);
+				sblock->sectors[sector_index]->recover = NULL;
 			}
 		}
-		kfree(sblocks_for_recheck);
+		scrub_block_put(sblock);
 	}
 
 	ret = unlock_full_stripe(fs_info, logical, full_stripe_locked);
@@ -1252,7 +1267,7 @@ static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type,
 }
 
 static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
-				     struct scrub_block *sblocks_for_recheck)
+				     struct scrub_block *sblocks_for_recheck[])
 {
 	struct scrub_ctx *sctx = original_sblock->sctx;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
@@ -1272,11 +1287,6 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 	int nmirrors;
 	int ret;
 
-	/*
-	 * Note: the two members refs and outstanding_sectors are not used (and
-	 * not set) in the blocks that are used for the recheck procedure.
-	 */
-
 	while (length > 0) {
 		sublen = min_t(u64, length, fs_info->sectorsize);
 		mapped_length = sublen;
@@ -1315,7 +1325,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			struct scrub_block *sblock;
 			struct scrub_sector *sector;
 
-			sblock = sblocks_for_recheck + mirror_index;
+			sblock = sblocks_for_recheck[mirror_index];
 			sblock->sctx = sctx;
 
 			sector = kzalloc(sizeof(*sector), GFP_NOFS);
-- 
2.37.0


* [PATCH v3 2/7] btrfs: extract the initialization of scrub_block into a helper function
  2022-08-08  5:45 [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 1/7] btrfs: scrub: use pointer array to replace @sblocks_for_recheck Qu Wenruo
@ 2022-08-08  5:45 ` Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 3/7] btrfs: extract the allocation and initialization of scrub_sector into a helper Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

Although there are only two callers, we are going to add some members
to scrub_block in the upcoming patches.

Extracting the initialization code will make later expansion easier.

One thing to note is, even though scrub_handle_errored_block() doesn't
utilize scrub_block::refs, we still use alloc_scrub_block() to initialize
sblock::refs, allowing us to use scrub_block_put() to do the cleanup.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 49 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9beb293ac4b2..51d8e88a3486 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -202,6 +202,19 @@ struct full_stripe_lock {
 	struct mutex mutex;
 };
 
+static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx)
+{
+	struct scrub_block *sblock;
+
+	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
+	if (!sblock)
+		return NULL;
+	refcount_set(&sblock->refs, 1);
+	sblock->sctx = sctx;
+	sblock->no_io_error_seen = 1;
+	return sblock;
+}
+
 static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 				     struct scrub_block *sblocks_for_recheck[]);
 static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
@@ -912,8 +925,15 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	 * the statistics.
 	 */
 	for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS; mirror_index++) {
-		sblocks_for_recheck[mirror_index] =
-			kzalloc(sizeof(struct scrub_block), GFP_KERNEL);
+		/*
+		 * Note: the two members refs and outstanding_sectors
+		 * are not used in the blocks that are used for the recheck
+		 * procedure.
+		 *
+		 * But alloc_scrub_block() will initialize sblock::ref anyway,
+		 * so we can use scrub_block_put() to clean them up.
+		 */
+		sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx);
 		if (!sblocks_for_recheck[mirror_index]) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -924,16 +944,6 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 						     BTRFS_DEV_STAT_READ_ERRS);
 			goto out;
 		}
-		/*
-		 * note: the two members refs and outstanding_sectors
-		 * are not used in the blocks that are used for the recheck
-		 * procedure.
-		 *
-		 * But to make the cleanup easier, we just put one ref for
-		 * each sblocks.
-		 */
-		refcount_set(&sblocks_for_recheck[mirror_index]->refs, 1);
-		sblocks_for_recheck[mirror_index]->sctx = sctx;
 	}
 
 	/* Setup the context, map the logical blocks and alloc the sectors */
@@ -2226,7 +2236,7 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 	const u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
-	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
+	sblock = alloc_scrub_block(sctx);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2234,12 +2244,6 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 		return -ENOMEM;
 	}
 
-	/* one ref inside this function, plus one for each page added to
-	 * a bio later on */
-	refcount_set(&sblock->refs, 1);
-	sblock->sctx = sctx;
-	sblock->no_io_error_seen = 1;
-
 	for (index = 0; len > 0; index++) {
 		struct scrub_sector *sector;
 		/*
@@ -2579,7 +2583,7 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 
 	ASSERT(IS_ALIGNED(len, sectorsize));
 
-	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
+	sblock = alloc_scrub_block(sctx);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2587,11 +2591,6 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 		return -ENOMEM;
 	}
 
-	/* one ref inside this function, plus one for each page added to
-	 * a bio later on */
-	refcount_set(&sblock->refs, 1);
-	sblock->sctx = sctx;
-	sblock->no_io_error_seen = 1;
 	sblock->sparity = sparity;
 	scrub_parity_get(sparity);
 
-- 
2.37.0


* [PATCH v3 3/7] btrfs: extract the allocation and initialization of scrub_sector into a helper
  2022-08-08  5:45 [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 1/7] btrfs: scrub: use pointer array to replace @sblocks_for_recheck Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 2/7] btrfs: extract the initialization of scrub_block into a helper function Qu Wenruo
@ 2022-08-08  5:45 ` Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 4/7] btrfs: scrub: introduce scrub_block::pages for more efficient memory usage for subpage Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

The allocation and initialization are shared by 3 call sites, and we're
going to change the initialization of some members in the upcoming
patches.

So extract the allocation and initialization of scrub_sector into a
helper, alloc_scrub_sector(), which will do the following work:

- Allocate the memory for scrub_sector

- Allocate a page for scrub_sector::page

- Initialize scrub_sector::refs to 1

- Attach the allocated scrub_sector to scrub_block
  The attachment is bidirectional, which means scrub_block::sectors[]
  will be updated and scrub_sector::sblock will also be updated.

- Update scrub_block::sector_count and do an extra sanity check on it
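
With the helper in place, each call site boils down to roughly the
following (illustrative, the error handling details differ per caller):

	sector = alloc_scrub_sector(sblock, GFP_KERNEL);
	if (!sector) {
		/* bump sctx->stat.malloc_errors, release sblock, bail out */
		return -ENOMEM;
	}
	/* Only the per-sector members are filled in by the caller */
	sector->flags = flags;
	sector->generation = gen;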

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 63 ++++++++++++++++++++++++------------------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 51d8e88a3486..d51925403eef 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -215,6 +215,33 @@ static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx)
 	return sblock;
 }
 
+/* Allocate a new scrub sector and attach it to @sblock */
+static struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock,
+					       gfp_t gfp)
+{
+	struct scrub_sector *ssector;
+
+	ssector = kzalloc(sizeof(*ssector), gfp);
+	if (!ssector)
+		return NULL;
+	ssector->page = alloc_page(gfp);
+	if (!ssector->page) {
+		kfree(ssector);
+		return NULL;
+	}
+	atomic_set(&ssector->refs, 1);
+	ssector->sblock = sblock;
+	/* This sector to be added should not be used */
+	ASSERT(sblock->sectors[sblock->sector_count] == NULL);
+	/* And the sector count should be smaller than the limit */
+	ASSERT(sblock->sector_count < SCRUB_MAX_SECTORS_PER_BLOCK);
+
+	sblock->sectors[sblock->sector_count] = ssector;
+	sblock->sector_count++;
+
+	return ssector;
+}
+
 static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 				     struct scrub_block *sblocks_for_recheck[]);
 static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
@@ -1338,18 +1365,14 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			sblock = sblocks_for_recheck[mirror_index];
 			sblock->sctx = sctx;
 
-			sector = kzalloc(sizeof(*sector), GFP_NOFS);
+			sector = alloc_scrub_sector(sblock, GFP_NOFS);
 			if (!sector) {
-leave_nomem:
 				spin_lock(&sctx->stat_lock);
 				sctx->stat.malloc_errors++;
 				spin_unlock(&sctx->stat_lock);
 				scrub_put_recover(fs_info, recover);
 				return -ENOMEM;
 			}
-			scrub_sector_get(sector);
-			sblock->sectors[sector_index] = sector;
-			sector->sblock = sblock;
 			sector->flags = flags;
 			sector->generation = generation;
 			sector->logical = logical;
@@ -1375,13 +1398,8 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			sector->physical_for_dev_replace =
 				original_sblock->sectors[sector_index]->
 				physical_for_dev_replace;
-			/* For missing devices, dev->bdev is NULL */
+			/* for missing devices, dev->bdev is NULL */
 			sector->mirror_num = mirror_index + 1;
-			sblock->sector_count++;
-			sector->page = alloc_page(GFP_NOFS);
-			if (!sector->page)
-				goto leave_nomem;
-
 			scrub_get_recover(recover);
 			sector->recover = recover;
 		}
@@ -2253,19 +2271,14 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 		 */
 		u32 l = min(sectorsize, len);
 
-		sector = kzalloc(sizeof(*sector), GFP_KERNEL);
+		sector = alloc_scrub_sector(sblock, GFP_KERNEL);
 		if (!sector) {
-leave_nomem:
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
 			spin_unlock(&sctx->stat_lock);
 			scrub_block_put(sblock);
 			return -ENOMEM;
 		}
-		ASSERT(index < SCRUB_MAX_SECTORS_PER_BLOCK);
-		scrub_sector_get(sector);
-		sblock->sectors[index] = sector;
-		sector->sblock = sblock;
 		sector->dev = dev;
 		sector->flags = flags;
 		sector->generation = gen;
@@ -2279,10 +2292,6 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 		} else {
 			sector->have_csum = 0;
 		}
-		sblock->sector_count++;
-		sector->page = alloc_page(GFP_KERNEL);
-		if (!sector->page)
-			goto leave_nomem;
 		len -= l;
 		logical += l;
 		physical += l;
@@ -2597,23 +2606,18 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 	for (index = 0; len > 0; index++) {
 		struct scrub_sector *sector;
 
-		sector = kzalloc(sizeof(*sector), GFP_KERNEL);
+		sector = alloc_scrub_sector(sblock, GFP_KERNEL);
 		if (!sector) {
-leave_nomem:
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
 			spin_unlock(&sctx->stat_lock);
 			scrub_block_put(sblock);
 			return -ENOMEM;
 		}
-		ASSERT(index < SCRUB_MAX_SECTORS_PER_BLOCK);
-		/* For scrub block */
-		scrub_sector_get(sector);
 		sblock->sectors[index] = sector;
 		/* For scrub parity */
 		scrub_sector_get(sector);
 		list_add_tail(&sector->list, &sparity->sectors_list);
-		sector->sblock = sblock;
 		sector->dev = dev;
 		sector->flags = flags;
 		sector->generation = gen;
@@ -2626,11 +2630,6 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 		} else {
 			sector->have_csum = 0;
 		}
-		sblock->sector_count++;
-		sector->page = alloc_page(GFP_KERNEL);
-		if (!sector->page)
-			goto leave_nomem;
-
 
 		/* Iterate over the stripe range in sectorsize steps */
 		len -= sectorsize;
-- 
2.37.0


* [PATCH v3 4/7] btrfs: scrub: introduce scrub_block::pages for more efficient memory usage for subpage
  2022-08-08  5:45 [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-08-08  5:45 ` [PATCH v3 3/7] btrfs: extract the allocation and initialization of scrub_sector into a helper Qu Wenruo
@ 2022-08-08  5:45 ` Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 5/7] btrfs: scrub: remove scrub_sector::page and use scrub_block::pages instead Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

[BACKGROUND]
Currently for scrub, we allocate one page for each sector; this is fine
for PAGE_SIZE == sectorsize support, but wastes extra memory for subpage
support.

[CODE CHANGE]
This patch makes scrub_block contain all the pages, so if we're
scrubbing a 64K sized extent and our page size is also 64K, we only need
to allocate one page.

[LIFESPAN CHANGE]
Since scrub_sector no longer holds a page, but uses scrub_block::pages[]
instead, we have to ensure scrub_block has a long enough lifespan for
the write bio.

(The lifespan for the read bio is already long enough.)

Now scrub_block will only be released after the write bio has finished.

[COMING NEXT]
This patch only adds scrub_block::pages[] for this purpose, while
scrub_sector still utilizes the old scrub_sector::page.

The switch will happen in the next patch.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 137 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 115 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d51925403eef..8e4ea78da1b1 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -54,6 +54,8 @@ struct scrub_ctx;
  */
 #define SCRUB_MAX_SECTORS_PER_BLOCK	(BTRFS_MAX_METADATA_BLOCKSIZE / SZ_4K)
 
+#define SCRUB_MAX_PAGES			(DIV_ROUND_UP(BTRFS_MAX_METADATA_BLOCKSIZE, PAGE_SIZE))
+
 struct scrub_recover {
 	refcount_t		refs;
 	struct btrfs_io_context	*bioc;
@@ -94,8 +96,16 @@ struct scrub_bio {
 };
 
 struct scrub_block {
+	/*
+	 * Each page will has its page::private used to record the logical
+	 * bytenr.
+	 */
+	struct page		*pages[SCRUB_MAX_PAGES];
 	struct scrub_sector	*sectors[SCRUB_MAX_SECTORS_PER_BLOCK];
+	u64			logical; /* Logical bytenr of the sblock */
+	u32			len; /* The length of sblock in bytes */
 	int			sector_count;
+
 	atomic_t		outstanding_sectors;
 	refcount_t		refs; /* free mem on transition to zero */
 	struct scrub_ctx	*sctx;
@@ -202,7 +212,46 @@ struct full_stripe_lock {
 	struct mutex mutex;
 };
 
-static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx)
+#ifndef CONFIG_64BIT
+/* This structure is for archtectures whose (void *) is smaller than u64 */
+struct scrub_page_private {
+	u64 logical;
+};
+#endif
+
+static int attach_scrub_page_private(struct page *page, u64 logical)
+{
+#ifdef CONFIG_64BIT
+	attach_page_private(page, (void *)logical);
+	return 0;
+#else
+	struct scrub_page_private *spp;
+
+	spp = kmalloc(sizeof(*spp), GFP_KERNEL);
+	if (!spp)
+		return -ENOMEM;
+	spp->logical = logical;
+	attach_page_private(page, (void *)spp);
+	return 0;
+#endif
+}
+
+static void detach_scrub_page_private(struct page *page)
+{
+#ifdef CONFIG_64BIT
+	detach_page_private(page);
+	return;
+#else
+	struct scrub_page_private *spp;
+
+	spp = detach_page_private(page);
+	kfree(spp);
+	return;
+#endif
+}
+
+static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
+					     u64 logical)
 {
 	struct scrub_block *sblock;
 
@@ -211,28 +260,55 @@ static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx)
 		return NULL;
 	refcount_set(&sblock->refs, 1);
 	sblock->sctx = sctx;
+	sblock->logical = logical;
 	sblock->no_io_error_seen = 1;
+	/*
+	 * Scrub_block::pages will be allocated at alloc_scrub_sector() when
+	 * the corresponding page is not allocated.
+	 */
 	return sblock;
 }
 
-/* Allocate a new scrub sector and attach it to @sblock */
+/*
+ * Allocate a new scrub sector and attach it to @sblock.
+ *
+ * Will also allocate new pages for @sblock if needed.
+ */
 static struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock,
-					       gfp_t gfp)
+					       u64 logical, gfp_t gfp)
 {
+	const int page_index = (logical - sblock->logical) >> PAGE_SHIFT;
 	struct scrub_sector *ssector;
 
 	ssector = kzalloc(sizeof(*ssector), gfp);
 	if (!ssector)
 		return NULL;
-	ssector->page = alloc_page(gfp);
-	if (!ssector->page) {
-		kfree(ssector);
-		return NULL;
+
+	/* Allocate a new page if the slot is not allocated*/
+	if (!sblock->pages[page_index]) {
+		int ret;
+
+		sblock->pages[page_index] = alloc_page(gfp);
+		if (!sblock->pages[page_index]) {
+			kfree(ssector);
+			return NULL;
+		}
+		ret = attach_scrub_page_private(sblock->pages[page_index],
+				sblock->logical + (page_index << PAGE_SHIFT));
+		if (ret < 0) {
+			kfree(ssector);
+			__free_page(sblock->pages[page_index]);
+			sblock->pages[page_index] = NULL;
+			return NULL;
+		}
 	}
+
 	atomic_set(&ssector->refs, 1);
 	ssector->sblock = sblock;
 	/* This sector to be added should not be used */
 	ASSERT(sblock->sectors[sblock->sector_count] == NULL);
+	ssector->logical = logical;
+
 	/* And the sector count should be smaller than the limit */
 	ASSERT(sblock->sector_count < SCRUB_MAX_SECTORS_PER_BLOCK);
 
@@ -960,7 +1036,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 		 * But alloc_scrub_block() will initialize sblock::ref anyway,
 		 * so we can use scrub_block_put() to clean them up.
 		 */
-		sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx);
+		sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx,
+								      logical);
 		if (!sblocks_for_recheck[mirror_index]) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -1365,7 +1442,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			sblock = sblocks_for_recheck[mirror_index];
 			sblock->sctx = sctx;
 
-			sector = alloc_scrub_sector(sblock, GFP_NOFS);
+			sector = alloc_scrub_sector(sblock, logical, GFP_NOFS);
 			if (!sector) {
 				spin_lock(&sctx->stat_lock);
 				sctx->stat.malloc_errors++;
@@ -1375,7 +1452,6 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			}
 			sector->flags = flags;
 			sector->generation = generation;
-			sector->logical = logical;
 			sector->have_csum = have_csum;
 			if (have_csum)
 				memcpy(sector->csum,
@@ -1654,6 +1730,11 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
 	return ret;
 }
 
+static void scrub_block_get(struct scrub_block *sblock)
+{
+	refcount_inc(&sblock->refs);
+}
+
 static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 				      struct scrub_sector *sector)
 {
@@ -1714,6 +1795,13 @@ static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 
 	sbio->sectors[sbio->sector_count] = sector;
 	scrub_sector_get(sector);
+	/*
+	 * Since ssector no longer holds a page, but uses sblock::pages, we
+	 * have to ensure the sblock didn't get freed before our write bio
+	 * finished.
+	 */
+	scrub_block_get(sector->sblock);
+
 	sbio->sector_count++;
 	if (sbio->sector_count == sctx->sectors_per_bio)
 		scrub_wr_submit(sctx);
@@ -1775,8 +1863,14 @@ static void scrub_wr_bio_end_io_worker(struct work_struct *work)
 		}
 	}
 
-	for (i = 0; i < sbio->sector_count; i++)
+	/*
+	 * In scrub_add_sector_to_wr_bio() we grab extra ref for sblock,
+	 * now in endio we should put the sblock.
+	 */
+	for (i = 0; i < sbio->sector_count; i++) {
+		scrub_block_put(sbio->sectors[i]->sblock);
 		scrub_sector_put(sbio->sectors[i]);
+	}
 
 	bio_put(sbio->bio);
 	kfree(sbio);
@@ -1950,11 +2044,6 @@ static int scrub_checksum_super(struct scrub_block *sblock)
 	return fail_cor + fail_gen;
 }
 
-static void scrub_block_get(struct scrub_block *sblock)
-{
-	refcount_inc(&sblock->refs);
-}
-
 static void scrub_block_put(struct scrub_block *sblock)
 {
 	if (refcount_dec_and_test(&sblock->refs)) {
@@ -1965,6 +2054,12 @@ static void scrub_block_put(struct scrub_block *sblock)
 
 		for (i = 0; i < sblock->sector_count; i++)
 			scrub_sector_put(sblock->sectors[i]);
+		for (i = 0; i < DIV_ROUND_UP(sblock->len, PAGE_SIZE); i++) {
+			if (sblock->pages[i]) {
+				detach_scrub_page_private(sblock->pages[i]);
+				__free_page(sblock->pages[i]);
+			}
+		}
 		kfree(sblock);
 	}
 }
@@ -2254,7 +2349,7 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 	const u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
-	sblock = alloc_scrub_block(sctx);
+	sblock = alloc_scrub_block(sctx, logical);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2271,7 +2366,7 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 		 */
 		u32 l = min(sectorsize, len);
 
-		sector = alloc_scrub_sector(sblock, GFP_KERNEL);
+		sector = alloc_scrub_sector(sblock, logical, GFP_KERNEL);
 		if (!sector) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -2282,7 +2377,6 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 		sector->dev = dev;
 		sector->flags = flags;
 		sector->generation = gen;
-		sector->logical = logical;
 		sector->physical = physical;
 		sector->physical_for_dev_replace = physical_for_dev_replace;
 		sector->mirror_num = mirror_num;
@@ -2592,7 +2686,7 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 
 	ASSERT(IS_ALIGNED(len, sectorsize));
 
-	sblock = alloc_scrub_block(sctx);
+	sblock = alloc_scrub_block(sctx, logical);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2606,7 +2700,7 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 	for (index = 0; len > 0; index++) {
 		struct scrub_sector *sector;
 
-		sector = alloc_scrub_sector(sblock, GFP_KERNEL);
+		sector = alloc_scrub_sector(sblock, logical, GFP_KERNEL);
 		if (!sector) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -2621,7 +2715,6 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 		sector->dev = dev;
 		sector->flags = flags;
 		sector->generation = gen;
-		sector->logical = logical;
 		sector->physical = physical;
 		sector->mirror_num = mirror_num;
 		if (csum) {
-- 
2.37.0


* [PATCH v3 5/7] btrfs: scrub: remove scrub_sector::page and use scrub_block::pages instead
  2022-08-08  5:45 [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-08-08  5:45 ` [PATCH v3 4/7] btrfs: scrub: introduce scrub_block::pages for more efficient memory usage for subpage Qu Wenruo
@ 2022-08-08  5:45 ` Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 6/7] btrfs: scrub: move logical/physical/dev/mirror_num from scrub_sector to scrub_block Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 7/7] btrfs: use larger blocksize for data extent scrub Qu Wenruo
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

Although scrub currently works for subpage (PAGE_SIZE > sectorsize)
cases, it allocates one page for each scrub_sector, which causes extra
unnecessary memory usage.

This patch utilizes scrub_block::pages[] instead of allocating a page
for each scrub_sector, which allows us to integrate larger extents while
using less memory.

For example, if our page size is 64K, sectorsize is 4K, and we get a
32K sized extent, we only need to allocate one page for the scrub_block,
and all 8 scrub_sectors will point into that page.

To do that properly, here we introduce several small helpers:

- scrub_page_get_logical()
  Get the logical bytenr of a page.
  We store the logical bytenr of the page range in page::private.
  But on 32bit systems, (void *) is not large enough to hold a u64, so
  in that case we need to allocate extra memory for it.

  On 64bit systems, we can use page::private directly.

- scrub_block_get_logical()
  Just get the logical bytenr of the first page.

- scrub_sector_get_page()
  Return the page which the scrub_sector points to.

- scrub_sector_get_page_offset()
  Return the offset inside the page which the scrub_sector points to.

- scrub_sector_get_kaddr()
  Return the address which the scrub_sector points to.
  Just a wrapper around scrub_sector_get_page() and
  scrub_sector_get_page_offset().

- bio_add_scrub_sector()
  Add a scrub_sector to a bio, a wrapper around bio_add_page() using the
  above two helpers.

Please note that, even with this patch, we're still allocating one page
for one sector for data extents.

This is because in scrub_extent() we split the data extent using
sectorsize.

Reducing the memory usage there will need extra work to make scrub
behave like the data read path and only use the needed sector(s).

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 97 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 65 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 8e4ea78da1b1..f359be9b42a7 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -64,7 +64,6 @@ struct scrub_recover {
 
 struct scrub_sector {
 	struct scrub_block	*sblock;
-	struct page		*page;
 	struct btrfs_device	*dev;
 	struct list_head	list;
 	u64			flags;  /* extent flags */
@@ -314,10 +313,59 @@ static struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock,
 
 	sblock->sectors[sblock->sector_count] = ssector;
 	sblock->sector_count++;
+	sblock->len += sblock->sctx->fs_info->sectorsize;
 
 	return ssector;
 }
 
+static struct page *scrub_sector_get_page(struct scrub_sector *ssector)
+{
+	struct scrub_block *sblock = ssector->sblock;
+	int index;
+	/*
+	 * When calling this function, ssector should have been attached to
+	 * the parent sblock.
+	 */
+	ASSERT(sblock);
+
+	/* The range should be inside the sblock range */
+	ASSERT(ssector->logical - sblock->logical < sblock->len);
+
+	index = (ssector->logical - sblock->logical) >> PAGE_SHIFT;
+	ASSERT(index < SCRUB_MAX_PAGES);
+	ASSERT(sblock->pages[index]);
+	ASSERT(PagePrivate(sblock->pages[index]));
+	return sblock->pages[index];
+}
+
+static unsigned int scrub_sector_get_page_offset(struct scrub_sector *ssector)
+{
+	struct scrub_block *sblock = ssector->sblock;
+	/*
+	 * When calling this function, ssector should have been attached to
+	 * the parent sblock.
+	 */
+	ASSERT(sblock);
+
+	/* The range should be inside the sblock range */
+	ASSERT(ssector->logical - sblock->logical < sblock->len);
+
+	return offset_in_page(ssector->logical - sblock->logical);
+}
+
+static char *scrub_sector_get_kaddr(struct scrub_sector *ssector)
+{
+	return page_address(scrub_sector_get_page(ssector)) +
+	       scrub_sector_get_page_offset(ssector);
+}
+
+static int bio_add_scrub_sector(struct bio *bio, struct scrub_sector *ssector,
+				 unsigned int len)
+{
+	return bio_add_page(bio, scrub_sector_get_page(ssector), len,
+			   scrub_sector_get_page_offset(ssector));
+}
+
 static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 				     struct scrub_block *sblocks_for_recheck[]);
 static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
@@ -649,10 +697,8 @@ static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx)
 	if (sctx->curr != -1) {
 		struct scrub_bio *sbio = sctx->bios[sctx->curr];
 
-		for (i = 0; i < sbio->sector_count; i++) {
-			WARN_ON(!sbio->sectors[i]->page);
+		for (i = 0; i < sbio->sector_count; i++)
 			scrub_block_put(sbio->sectors[i]->sblock);
-		}
 		bio_put(sbio->bio);
 	}
 
@@ -1526,8 +1572,7 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
 	for (i = 0; i < sblock->sector_count; i++) {
 		struct scrub_sector *sector = sblock->sectors[i];
 
-		WARN_ON(!sector->page);
-		bio_add_page(bio, sector->page, PAGE_SIZE, 0);
+		bio_add_scrub_sector(bio, sector, fs_info->sectorsize);
 	}
 
 	if (scrub_submit_raid56_bio_wait(fs_info, bio, first_sector)) {
@@ -1577,9 +1622,8 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 			continue;
 		}
 
-		WARN_ON(!sector->page);
 		bio_init(&bio, sector->dev->bdev, &bvec, 1, REQ_OP_READ);
-		bio_add_page(&bio, sector->page, fs_info->sectorsize, 0);
+		bio_add_scrub_sector(&bio, sector, fs_info->sectorsize);
 		bio.bi_iter.bi_sector = sector->physical >> 9;
 
 		btrfsic_check_bio(&bio);
@@ -1643,8 +1687,6 @@ static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad,
 	struct btrfs_fs_info *fs_info = sblock_bad->sctx->fs_info;
 	const u32 sectorsize = fs_info->sectorsize;
 
-	BUG_ON(sector_bad->page == NULL);
-	BUG_ON(sector_good->page == NULL);
 	if (force_write || sblock_bad->header_error ||
 	    sblock_bad->checksum_error || sector_bad->io_error) {
 		struct bio bio;
@@ -1659,7 +1701,7 @@ static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad,
 
 		bio_init(&bio, sector_bad->dev->bdev, &bvec, 1, REQ_OP_WRITE);
 		bio.bi_iter.bi_sector = sector_bad->physical >> 9;
-		__bio_add_page(&bio, sector_good->page, sectorsize, 0);
+		ret = bio_add_scrub_sector(&bio, sector_good, sectorsize);
 
 		btrfsic_check_bio(&bio);
 		ret = submit_bio_wait(&bio);
@@ -1699,11 +1741,11 @@ static void scrub_write_block_to_dev_replace(struct scrub_block *sblock)
 
 static int scrub_write_sector_to_dev_replace(struct scrub_block *sblock, int sector_num)
 {
+	const u32 sectorsize = sblock->sctx->fs_info->sectorsize;
 	struct scrub_sector *sector = sblock->sectors[sector_num];
 
-	BUG_ON(sector->page == NULL);
 	if (sector->io_error)
-		clear_page(page_address(sector->page));
+		memset(scrub_sector_get_kaddr(sector), 0, sectorsize);
 
 	return scrub_add_sector_to_wr_bio(sblock->sctx, sector);
 }
@@ -1781,7 +1823,7 @@ static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 		goto again;
 	}
 
-	ret = bio_add_page(sbio->bio, sector->page, sectorsize, 0);
+	ret = bio_add_scrub_sector(sbio->bio, sector, sectorsize);
 	if (ret != sectorsize) {
 		if (sbio->sector_count < 1) {
 			bio_put(sbio->bio);
@@ -1925,15 +1967,11 @@ static int scrub_checksum_data(struct scrub_block *sblock)
 	if (!sector->have_csum)
 		return 0;
 
-	kaddr = page_address(sector->page);
+	kaddr = scrub_sector_get_kaddr(sector);
 
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
 
-	/*
-	 * In scrub_sectors() and scrub_sectors_for_parity() we ensure each sector
-	 * only contains one sector of data.
-	 */
 	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
 
 	if (memcmp(csum, sector->csum, fs_info->csum_size))
@@ -1966,7 +2004,7 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	ASSERT(sblock->sector_count == num_sectors);
 
 	sector = sblock->sectors[0];
-	kaddr = page_address(sector->page);
+	kaddr = scrub_sector_get_kaddr(sector);
 	h = (struct btrfs_header *)kaddr;
 	memcpy(on_disk_csum, h->csum, sctx->fs_info->csum_size);
 
@@ -1996,7 +2034,7 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
 			    sectorsize - BTRFS_CSUM_SIZE);
 
 	for (i = 1; i < num_sectors; i++) {
-		kaddr = page_address(sblock->sectors[i]->page);
+		kaddr = scrub_sector_get_kaddr(sblock->sectors[i]);
 		crypto_shash_update(shash, kaddr, sectorsize);
 	}
 
@@ -2021,7 +2059,7 @@ static int scrub_checksum_super(struct scrub_block *sblock)
 
 	BUG_ON(sblock->sector_count < 1);
 	sector = sblock->sectors[0];
-	kaddr = page_address(sector->page);
+	kaddr = scrub_sector_get_kaddr(sector);
 	s = (struct btrfs_super_block *)kaddr;
 
 	if (sector->logical != btrfs_super_bytenr(s))
@@ -2071,11 +2109,8 @@ static void scrub_sector_get(struct scrub_sector *sector)
 
 static void scrub_sector_put(struct scrub_sector *sector)
 {
-	if (atomic_dec_and_test(&sector->refs)) {
-		if (sector->page)
-			__free_page(sector->page);
+	if (atomic_dec_and_test(&sector->refs))
 		kfree(sector);
-	}
 }
 
 /*
@@ -2201,7 +2236,7 @@ static int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx,
 	}
 
 	sbio->sectors[sbio->sector_count] = sector;
-	ret = bio_add_page(sbio->bio, sector->page, sectorsize, 0);
+	ret = bio_add_scrub_sector(sbio->bio, sector, sectorsize);
 	if (ret != sectorsize) {
 		if (sbio->sector_count < 1) {
 			bio_put(sbio->bio);
@@ -2317,11 +2352,9 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	for (i = 0; i < sblock->sector_count; i++) {
 		struct scrub_sector *sector = sblock->sectors[i];
 
-		/*
-		 * For now, our scrub is still one page per sector, so pgoff
-		 * is always 0.
-		 */
-		raid56_add_scrub_pages(rbio, sector->page, 0, sector->logical);
+		raid56_add_scrub_pages(rbio, scrub_sector_get_page(sector),
+				       scrub_sector_get_page_offset(sector),
+				       sector->logical);
 	}
 
 	INIT_WORK(&sblock->work, scrub_missing_raid56_worker);
-- 
2.37.0


* [PATCH v3 6/7] btrfs: scrub: move logical/physical/dev/mirror_num from scrub_sector to scrub_block
  2022-08-08  5:45 [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-08-08  5:45 ` [PATCH v3 5/7] btrfs: scrub: remove scrub_sector::page and use scrub_block::pages instead Qu Wenruo
@ 2022-08-08  5:45 ` Qu Wenruo
  2022-08-08  5:45 ` [PATCH v3 7/7] btrfs: use larger blocksize for data extent scrub Qu Wenruo
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

Currently we store the following members in scrub_sector:

- logical
- physical
- physical_for_dev_replace
- dev
- mirror_num

However the current scrub code has ensured that a scrub_block never
crosses a stripe boundary.
This is guaranteed by the entrance functions (scrub_simple_mirror,
scrub_simple_stripe), thus every scrub_block will not cross a stripe
boundary.

This makes it possible to move those members into scrub_block rather
than keeping them in scrub_sector.

This should save quite some memory, as a scrub_block can contain as many
as 64 sectors; even for metadata it's 16 sectors by default.
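
In other words, the per-sector values are now derived from the block,
something like (matching the hunks below):

	sector_logical  = sblock->logical  + sector->offset;
	sector_physical = sblock->physical + sector->offset;

so a scrub_sector only needs to store a single u32 offset instead of
several u64 values plus a device pointer.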

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 164 ++++++++++++++++++++++++++---------------------
 1 file changed, 92 insertions(+), 72 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f359be9b42a7..6c5be7cf00d0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -64,15 +64,11 @@ struct scrub_recover {
 
 struct scrub_sector {
 	struct scrub_block	*sblock;
-	struct btrfs_device	*dev;
 	struct list_head	list;
 	u64			flags;  /* extent flags */
 	u64			generation;
-	u64			logical;
-	u64			physical;
-	u64			physical_for_dev_replace;
+	u32			offset; /* offset in bytes to @sblock. */
 	atomic_t		refs;
-	u8			mirror_num;
 	unsigned int		have_csum:1;
 	unsigned int		io_error:1;
 	u8			csum[BTRFS_CSUM_SIZE];
@@ -101,9 +97,13 @@ struct scrub_block {
 	 */
 	struct page		*pages[SCRUB_MAX_PAGES];
 	struct scrub_sector	*sectors[SCRUB_MAX_SECTORS_PER_BLOCK];
+	struct btrfs_device	*dev;
 	u64			logical; /* Logical bytenr of the sblock */
+	u64			physical;
+	u64			physical_for_dev_replace;
 	u32			len; /* The length of sblock in bytes */
 	int			sector_count;
+	int			mirror_num;
 
 	atomic_t		outstanding_sectors;
 	refcount_t		refs; /* free mem on transition to zero */
@@ -250,7 +250,10 @@ static void detach_scrub_page_private(struct page *page)
 }
 
 static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
-					     u64 logical)
+					     struct btrfs_device *dev,
+					     u64 logical, u64 physical,
+					     u64 physical_for_dev_replace,
+					     int mirror_num)
 {
 	struct scrub_block *sblock;
 
@@ -260,6 +263,10 @@ static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
 	refcount_set(&sblock->refs, 1);
 	sblock->sctx = sctx;
 	sblock->logical = logical;
+	sblock->physical = physical;
+	sblock->physical_for_dev_replace = physical_for_dev_replace;
+	sblock->dev = dev;
+	sblock->mirror_num = mirror_num;
 	sblock->no_io_error_seen = 1;
 	/*
 	 * Scrub_block::pages will be allocated at alloc_scrub_sector() when
@@ -279,6 +286,9 @@ static struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock,
 	const int page_index = (logical - sblock->logical) >> PAGE_SHIFT;
 	struct scrub_sector *ssector;
 
+	/* We should never have scrub_block exceed U32_MAX in size.*/
+	ASSERT(logical - sblock->logical < U32_MAX);
+
 	ssector = kzalloc(sizeof(*ssector), gfp);
 	if (!ssector)
 		return NULL;
@@ -306,7 +316,7 @@ static struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock,
 	ssector->sblock = sblock;
 	/* This sector to be added should not be used */
 	ASSERT(sblock->sectors[sblock->sector_count] == NULL);
-	ssector->logical = logical;
+	ssector->offset = logical - sblock->logical;
 
 	/* And the sector count should be smaller than the limit */
 	ASSERT(sblock->sector_count < SCRUB_MAX_SECTORS_PER_BLOCK);
@@ -329,9 +339,9 @@ static struct page *scrub_sector_get_page(struct scrub_sector *ssector)
 	ASSERT(sblock);
 
 	/* The range should be inside the sblock range */
-	ASSERT(ssector->logical - sblock->logical < sblock->len);
+	ASSERT(ssector->offset < sblock->len);
 
-	index = (ssector->logical - sblock->logical) >> PAGE_SHIFT;
+	index = ssector->offset >> PAGE_SHIFT;
 	ASSERT(index < SCRUB_MAX_PAGES);
 	ASSERT(sblock->pages[index]);
 	ASSERT(PagePrivate(sblock->pages[index]));
@@ -348,9 +358,9 @@ static unsigned int scrub_sector_get_page_offset(struct scrub_sector *ssector)
 	ASSERT(sblock);
 
 	/* The range should be inside the sblock range */
-	ASSERT(ssector->logical - sblock->logical < sblock->len);
+	ASSERT(ssector->offset < sblock->len);
 
-	return offset_in_page(ssector->logical - sblock->logical);
+	return offset_in_page(ssector->offset);
 }
 
 static char *scrub_sector_get_kaddr(struct scrub_sector *ssector)
@@ -888,22 +898,22 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
 	int ret;
 
 	WARN_ON(sblock->sector_count < 1);
-	dev = sblock->sectors[0]->dev;
+	dev = sblock->dev;
 	fs_info = sblock->sctx->fs_info;
 
 	/* Super block error, no need to search extent tree. */
 	if (sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER) {
 		btrfs_warn_in_rcu(fs_info, "%s on device %s, physical %llu",
 			errstr, rcu_str_deref(dev->name),
-			sblock->sectors[0]->physical);
+			sblock->physical);
 		return;
 	}
 	path = btrfs_alloc_path();
 	if (!path)
 		return;
 
-	swarn.physical = sblock->sectors[0]->physical;
-	swarn.logical = sblock->sectors[0]->logical;
+	swarn.physical = sblock->physical;
+	swarn.logical = sblock->logical;
 	swarn.errstr = errstr;
 	swarn.dev = NULL;
 
@@ -973,7 +983,7 @@ static inline void scrub_put_recover(struct btrfs_fs_info *fs_info,
 static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 {
 	struct scrub_ctx *sctx = sblock_to_check->sctx;
-	struct btrfs_device *dev = sblock_to_check->sectors[0]->dev;
+	struct btrfs_device *dev = sblock_to_check->dev;
 	struct btrfs_fs_info *fs_info;
 	u64 logical;
 	unsigned int failed_mirror_index;
@@ -1006,9 +1016,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_CORRUPTION_ERRS);
 		return 0;
 	}
-	logical = sblock_to_check->sectors[0]->logical;
-	BUG_ON(sblock_to_check->sectors[0]->mirror_num < 1);
-	failed_mirror_index = sblock_to_check->sectors[0]->mirror_num - 1;
+	logical = sblock_to_check->logical;
+	ASSERT(sblock_to_check->mirror_num);
+	failed_mirror_index = sblock_to_check->mirror_num - 1;
 	is_metadata = !(sblock_to_check->sectors[0]->flags &
 			BTRFS_EXTENT_FLAG_DATA);
 	have_csum = sblock_to_check->sectors[0]->have_csum;
@@ -1081,9 +1091,13 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 		 *
 		 * But alloc_scrub_block() will initialize sblock::ref anyway,
 		 * so we can use scrub_block_put() to clean them up.
+		 *
+		 * And here we don't setup the physical/dev for the sblock yet,
+		 * they will be correctly initialized in
+		 * scrub_setup_recheck_block().
 		 */
-		sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx,
-								      logical);
+		sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx, NULL,
+							logical, 0, 0, mirror_index);
 		if (!sblocks_for_recheck[mirror_index]) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -1207,7 +1221,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 
 			ASSERT(failed_mirror_index == 0);
 			sblock_other = sblocks_for_recheck[1];
-			sblock_other->sectors[0]->mirror_num = 1 + mirror_index;
+			sblock_other->mirror_num = 1 + mirror_index;
 		}
 
 		/* build and submit the bios, check checksums */
@@ -1431,8 +1445,8 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 {
 	struct scrub_ctx *sctx = original_sblock->sctx;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	u64 logical = original_sblock->logical;
 	u64 length = original_sblock->sector_count << fs_info->sectorsize_bits;
-	u64 logical = original_sblock->sectors[0]->logical;
 	u64 generation = original_sblock->sectors[0]->generation;
 	u64 flags = original_sblock->sectors[0]->flags;
 	u64 have_csum = original_sblock->sectors[0]->have_csum;
@@ -1512,16 +1526,20 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 						      mirror_index,
 						      &stripe_index,
 						      &stripe_offset);
-			sector->physical = bioc->stripes[stripe_index].physical +
-					 stripe_offset;
-			sector->dev = bioc->stripes[stripe_index].dev;
+			/*
+			 * We're at the first sector, also populate @sblock
+			 * physical and dev.
+			 */
+			if (sector_index == 0) {
+				sblock->physical =
+					bioc->stripes[stripe_index].physical +
+					stripe_offset;
+				sblock->dev = bioc->stripes[stripe_index].dev;
+				sblock->physical_for_dev_replace =
+					original_sblock->physical_for_dev_replace;
+			}
 
 			BUG_ON(sector_index >= original_sblock->sector_count);
-			sector->physical_for_dev_replace =
-				original_sblock->sectors[sector_index]->
-				physical_for_dev_replace;
-			/* for missing devices, dev->bdev is NULL */
-			sector->mirror_num = mirror_index + 1;
 			scrub_get_recover(recover);
 			sector->recover = recover;
 		}
@@ -1545,11 +1563,12 @@ static int scrub_submit_raid56_bio_wait(struct btrfs_fs_info *fs_info,
 {
 	DECLARE_COMPLETION_ONSTACK(done);
 
-	bio->bi_iter.bi_sector = sector->logical >> 9;
+	bio->bi_iter.bi_sector = (sector->offset + sector->sblock->logical) >>
+				 SECTOR_SHIFT;
 	bio->bi_private = &done;
 	bio->bi_end_io = scrub_bio_wait_endio;
 	raid56_parity_recover(bio, sector->recover->bioc,
-			      sector->sblock->sectors[0]->mirror_num, false);
+			      sector->sblock->mirror_num, false);
 
 	wait_for_completion_io(&done);
 	return blk_status_to_errno(bio->bi_status);
@@ -1563,11 +1582,11 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
 	int i;
 
 	/* All sectors in sblock belong to the same stripe on the same device. */
-	ASSERT(first_sector->dev);
-	if (!first_sector->dev->bdev)
+	ASSERT(sblock->dev);
+	if (!sblock->dev->bdev)
 		goto out;
 
-	bio = bio_alloc(first_sector->dev->bdev, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
+	bio = bio_alloc(sblock->dev->bdev, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
 
 	for (i = 0; i < sblock->sector_count; i++) {
 		struct scrub_sector *sector = sblock->sectors[i];
@@ -1616,15 +1635,16 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 		struct bio bio;
 		struct bio_vec bvec;
 
-		if (sector->dev->bdev == NULL) {
+		if (sblock->dev->bdev == NULL) {
 			sector->io_error = 1;
 			sblock->no_io_error_seen = 0;
 			continue;
 		}
 
-		bio_init(&bio, sector->dev->bdev, &bvec, 1, REQ_OP_READ);
+		bio_init(&bio, sblock->dev->bdev, &bvec, 1, REQ_OP_READ);
 		bio_add_scrub_sector(&bio, sector, fs_info->sectorsize);
-		bio.bi_iter.bi_sector = sector->physical >> 9;
+		bio.bi_iter.bi_sector = (sblock->physical + sector->offset) >>
+					SECTOR_SHIFT;
 
 		btrfsic_check_bio(&bio);
 		if (submit_bio_wait(&bio)) {
@@ -1641,7 +1661,7 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 
 static inline int scrub_check_fsid(u8 fsid[], struct scrub_sector *sector)
 {
-	struct btrfs_fs_devices *fs_devices = sector->dev->fs_devices;
+	struct btrfs_fs_devices *fs_devices = sector->sblock->dev->fs_devices;
 	int ret;
 
 	ret = memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
@@ -1693,14 +1713,15 @@ static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad,
 		struct bio_vec bvec;
 		int ret;
 
-		if (!sector_bad->dev->bdev) {
+		if (!sblock_bad->dev->bdev) {
 			btrfs_warn_rl(fs_info,
 				"scrub_repair_page_from_good_copy(bdev == NULL) is unexpected");
 			return -EIO;
 		}
 
-		bio_init(&bio, sector_bad->dev->bdev, &bvec, 1, REQ_OP_WRITE);
-		bio.bi_iter.bi_sector = sector_bad->physical >> 9;
+		bio_init(&bio, sblock_bad->dev->bdev, &bvec, 1, REQ_OP_WRITE);
+		bio.bi_iter.bi_sector = (sblock_bad->physical +
+					 sector_bad->offset) >> SECTOR_SHIFT;
 		ret = bio_add_scrub_sector(&bio, sector_good, sectorsize);
 
 		btrfsic_check_bio(&bio);
@@ -1708,7 +1729,7 @@ static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad,
 		bio_uninit(&bio);
 
 		if (ret) {
-			btrfs_dev_stat_inc_and_print(sector_bad->dev,
+			btrfs_dev_stat_inc_and_print(sblock_bad->dev,
 				BTRFS_DEV_STAT_WRITE_ERRS);
 			atomic64_inc(&fs_info->dev_replace.num_write_errors);
 			return -EIO;
@@ -1780,6 +1801,7 @@ static void scrub_block_get(struct scrub_block *sblock)
 static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 				      struct scrub_sector *sector)
 {
+	struct scrub_block *sblock = sector->sblock;
 	struct scrub_bio *sbio;
 	int ret;
 	const u32 sectorsize = sctx->fs_info->sectorsize;
@@ -1798,14 +1820,16 @@ static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 	}
 	sbio = sctx->wr_curr_bio;
 	if (sbio->sector_count == 0) {
-		ret = fill_writer_pointer_gap(sctx, sector->physical_for_dev_replace);
+		ret = fill_writer_pointer_gap(sctx, sector->offset +
+				sblock->physical_for_dev_replace);
 		if (ret) {
 			mutex_unlock(&sctx->wr_lock);
 			return ret;
 		}
 
-		sbio->physical = sector->physical_for_dev_replace;
-		sbio->logical = sector->logical;
+		sbio->physical = sblock->physical_for_dev_replace +
+				 sector->offset;
+		sbio->logical = sblock->logical + sector->offset;
 		sbio->dev = sctx->wr_tgtdev;
 		if (!sbio->bio) {
 			sbio->bio = bio_alloc(sbio->dev->bdev, sctx->sectors_per_bio,
@@ -1816,9 +1840,9 @@ static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 		sbio->bio->bi_iter.bi_sector = sbio->physical >> 9;
 		sbio->status = 0;
 	} else if (sbio->physical + sbio->sector_count * sectorsize !=
-		   sector->physical_for_dev_replace ||
+		   sblock->physical_for_dev_replace + sector->offset ||
 		   sbio->logical + sbio->sector_count * sectorsize !=
-		   sector->logical) {
+		   sblock->logical + sector->offset) {
 		scrub_wr_submit(sctx);
 		goto again;
 	}
@@ -2013,7 +2037,7 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	 * a) don't have an extent buffer and
 	 * b) the page is already kmapped
 	 */
-	if (sector->logical != btrfs_stack_header_bytenr(h))
+	if (sblock->logical != btrfs_stack_header_bytenr(h))
 		sblock->header_error = 1;
 
 	if (sector->generation != btrfs_stack_header_generation(h)) {
@@ -2062,7 +2086,7 @@ static int scrub_checksum_super(struct scrub_block *sblock)
 	kaddr = scrub_sector_get_kaddr(sector);
 	s = (struct btrfs_super_block *)kaddr;
 
-	if (sector->logical != btrfs_super_bytenr(s))
+	if (sblock->logical != btrfs_super_bytenr(s))
 		++fail_cor;
 
 	if (sector->generation != btrfs_super_generation(s))
@@ -2215,9 +2239,9 @@ static int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx,
 	}
 	sbio = sctx->bios[sctx->curr];
 	if (sbio->sector_count == 0) {
-		sbio->physical = sector->physical;
-		sbio->logical = sector->logical;
-		sbio->dev = sector->dev;
+		sbio->physical = sblock->physical + sector->offset;
+		sbio->logical = sblock->logical + sector->offset;
+		sbio->dev = sblock->dev;
 		if (!sbio->bio) {
 			sbio->bio = bio_alloc(sbio->dev->bdev, sctx->sectors_per_bio,
 					      REQ_OP_READ, GFP_NOFS);
@@ -2227,10 +2251,10 @@ static int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx,
 		sbio->bio->bi_iter.bi_sector = sbio->physical >> 9;
 		sbio->status = 0;
 	} else if (sbio->physical + sbio->sector_count * sectorsize !=
-		   sector->physical ||
+		   sblock->physical + sector->offset ||
 		   sbio->logical + sbio->sector_count * sectorsize !=
-		   sector->logical ||
-		   sbio->dev != sector->dev) {
+		   sblock->logical + sector->offset ||
+		   sbio->dev != sblock->dev) {
 		scrub_submit(sctx);
 		goto again;
 	}
@@ -2277,8 +2301,8 @@ static void scrub_missing_raid56_worker(struct work_struct *work)
 	u64 logical;
 	struct btrfs_device *dev;
 
-	logical = sblock->sectors[0]->logical;
-	dev = sblock->sectors[0]->dev;
+	logical = sblock->logical;
+	dev = sblock->dev;
 
 	if (sblock->no_io_error_seen)
 		scrub_recheck_block_checksum(sblock);
@@ -2316,7 +2340,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	struct scrub_ctx *sctx = sblock->sctx;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
 	u64 length = sblock->sector_count << fs_info->sectorsize_bits;
-	u64 logical = sblock->sectors[0]->logical;
+	u64 logical = sblock->logical;
 	struct btrfs_io_context *bioc = NULL;
 	struct bio *bio;
 	struct btrfs_raid_bio *rbio;
@@ -2354,7 +2378,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 
 		raid56_add_scrub_pages(rbio, scrub_sector_get_page(sector),
 				       scrub_sector_get_page_offset(sector),
-				       sector->logical);
+				       sector->offset + sector->sblock->logical);
 	}
 
 	INIT_WORK(&sblock->work, scrub_missing_raid56_worker);
@@ -2382,7 +2406,8 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 	const u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
-	sblock = alloc_scrub_block(sctx, logical);
+	sblock = alloc_scrub_block(sctx, dev, logical, physical,
+				   physical_for_dev_replace, mirror_num);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2407,12 +2432,8 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 			scrub_block_put(sblock);
 			return -ENOMEM;
 		}
-		sector->dev = dev;
 		sector->flags = flags;
 		sector->generation = gen;
-		sector->physical = physical;
-		sector->physical_for_dev_replace = physical_for_dev_replace;
-		sector->mirror_num = mirror_num;
 		if (csum) {
 			sector->have_csum = 1;
 			memcpy(sector->csum, csum, sctx->fs_info->csum_size);
@@ -2564,8 +2585,9 @@ static void scrub_block_complete(struct scrub_block *sblock)
 	}
 
 	if (sblock->sparity && corrupted && !sblock->data_corrected) {
-		u64 start = sblock->sectors[0]->logical;
-		u64 end = sblock->sectors[sblock->sector_count - 1]->logical +
+		u64 start = sblock->logical;
+		u64 end = sblock->logical +
+			  sblock->sectors[sblock->sector_count - 1]->offset +
 			  sblock->sctx->fs_info->sectorsize;
 
 		ASSERT(end - start <= U32_MAX);
@@ -2719,7 +2741,8 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 
 	ASSERT(IS_ALIGNED(len, sectorsize));
 
-	sblock = alloc_scrub_block(sctx, logical);
+	sblock = alloc_scrub_block(sctx, dev, logical, physical, physical,
+				   mirror_num);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2745,11 +2768,8 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 		/* For scrub parity */
 		scrub_sector_get(sector);
 		list_add_tail(&sector->list, &sparity->sectors_list);
-		sector->dev = dev;
 		sector->flags = flags;
 		sector->generation = gen;
-		sector->physical = physical;
-		sector->mirror_num = mirror_num;
 		if (csum) {
 			sector->have_csum = 1;
 			memcpy(sector->csum, csum, sctx->fs_info->csum_size);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 7/7] btrfs: use larger blocksize for data extent scrub
  2022-08-08  5:45 [PATCH v3 0/7] btrfs: scrub: changes to reduce memory usage for both regular and subpage sectorsize Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-08-08  5:45 ` [PATCH v3 6/7] btrfs: scrub: move logical/physical/dev/mirror_num from scrub_sector to scrub_block Qu Wenruo
@ 2022-08-08  5:45 ` Qu Wenruo
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-08  5:45 UTC (permalink / raw)
  To: linux-btrfs

[PROBLEM]
The existing scrub code for data extents always limits the blocksize to
sectorsize.

This causes quite a few extra scrub_blocks to be allocated (here for a
data extent at logical bytenr 298844160, length 64KiB):

 alloc_scrub_block: new block: logical=298844160 physical=298844160 mirror=1
 alloc_scrub_block: new block: logical=298848256 physical=298848256 mirror=1
 alloc_scrub_block: new block: logical=298852352 physical=298852352 mirror=1
 alloc_scrub_block: new block: logical=298856448 physical=298856448 mirror=1
 alloc_scrub_block: new block: logical=298860544 physical=298860544 mirror=1
 alloc_scrub_block: new block: logical=298864640 physical=298864640 mirror=1
 alloc_scrub_block: new block: logical=298868736 physical=298868736 mirror=1
 alloc_scrub_block: new block: logical=298872832 physical=298872832 mirror=1
 alloc_scrub_block: new block: logical=298876928 physical=298876928 mirror=1
 alloc_scrub_block: new block: logical=298881024 physical=298881024 mirror=1
 alloc_scrub_block: new block: logical=298885120 physical=298885120 mirror=1
 alloc_scrub_block: new block: logical=298889216 physical=298889216 mirror=1
 alloc_scrub_block: new block: logical=298893312 physical=298893312 mirror=1
 alloc_scrub_block: new block: logical=298897408 physical=298897408 mirror=1
 alloc_scrub_block: new block: logical=298901504 physical=298901504 mirror=1
 alloc_scrub_block: new block: logical=298905600 physical=298905600 mirror=1
 ...
 scrub_block_put: free block: logical=298844160 physical=298844160 len=4096 mirror=1
 scrub_block_put: free block: logical=298848256 physical=298848256 len=4096 mirror=1
 scrub_block_put: free block: logical=298852352 physical=298852352 len=4096 mirror=1
 scrub_block_put: free block: logical=298856448 physical=298856448 len=4096 mirror=1
 scrub_block_put: free block: logical=298860544 physical=298860544 len=4096 mirror=1
 scrub_block_put: free block: logical=298864640 physical=298864640 len=4096 mirror=1
 scrub_block_put: free block: logical=298868736 physical=298868736 len=4096 mirror=1
 scrub_block_put: free block: logical=298872832 physical=298872832 len=4096 mirror=1
 scrub_block_put: free block: logical=298876928 physical=298876928 len=4096 mirror=1
 scrub_block_put: free block: logical=298881024 physical=298881024 len=4096 mirror=1
 scrub_block_put: free block: logical=298885120 physical=298885120 len=4096 mirror=1
 scrub_block_put: free block: logical=298889216 physical=298889216 len=4096 mirror=1
 scrub_block_put: free block: logical=298893312 physical=298893312 len=4096 mirror=1
 scrub_block_put: free block: logical=298897408 physical=298897408 len=4096 mirror=1
 scrub_block_put: free block: logical=298901504 physical=298901504 len=4096 mirror=1
 scrub_block_put: free block: logical=298905600 physical=298905600 len=4096 mirror=1

This behavior wastes a lot of memory, especially now that quite a few
members have been moved from scrub_sector to scrub_block.

[FIX]
To reduce the number of scrub_block allocations, and thus the memory
usage, use BTRFS_STRIPE_LEN instead of sectorsize as the blocksize when
scrubbing data extents.

This results in only one scrub_block being allocated for the above data
extent:
 alloc_scrub_block: new block: logical=298844160 physical=298844160 mirror=1
 scrub_block_put: free block: logical=298844160 physical=298844160 len=65536 mirror=1

This greatly reduces the memory usage (even if it is only transient)
when scrubbing larger data extents.

For the above example, the memory usage would be:

Old: num_sectors * (sizeof(scrub_block) + sizeof(scrub_sector))
     16          * (408                 + 96)  = 8064

New: sizeof(scrub_block) + num_sectors * sizeof(scrub_sector)
     408                 + 16          * 96    = 1944

A good reduction of 75.9%.
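
(Editor's note: the arithmetic above can be verified with a minimal
standalone userspace sketch; the SIZEOF_* constants below are
illustrative stand-ins for the 408/96 struct sizes quoted in this
changelog, not real kernel symbols.)

#include <stdio.h>

/* Struct sizes as quoted in the changelog; illustrative only. */
#define SIZEOF_SCRUB_BLOCK	408u
#define SIZEOF_SCRUB_SECTOR	96u

int main(void)
{
	const unsigned int num_sectors = 16;	/* 64KiB extent / 4KiB sectorsize */

	/* Old: one scrub_block (and one scrub_sector) allocated per sector. */
	unsigned int old_bytes = num_sectors *
				 (SIZEOF_SCRUB_BLOCK + SIZEOF_SCRUB_SECTOR);

	/* New: a single scrub_block shared by all sectors of the extent. */
	unsigned int new_bytes = SIZEOF_SCRUB_BLOCK +
				 num_sectors * SIZEOF_SCRUB_SECTOR;

	printf("old=%u new=%u reduction=%.1f%%\n",
	       old_bytes, new_bytes,
	       100.0 * (old_bytes - new_bytes) / old_bytes);
	return 0;
}

Compiled and run, this prints old=8064 new=1944 reduction=75.9%,
matching the figures above.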

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 6c5be7cf00d0..ff251b7fbd47 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2671,11 +2671,17 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
 	u8 csum[BTRFS_CSUM_SIZE];
 	u32 blocksize;
 
+	/*
+	 * Block size determines how many scrub_block will be allocated.
+	 * Here we use BTRFS_STRIPE_LEN (64KiB) as default limit, so we
+	 * won't allocate too many scrub_block, while still won't cause
+	 * too large bios for large extents.
+	 */
 	if (flags & BTRFS_EXTENT_FLAG_DATA) {
 		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
 			blocksize = map->stripe_len;
 		else
-			blocksize = sctx->fs_info->sectorsize;
+			blocksize = BTRFS_STRIPE_LEN;
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.data_extents_scrubbed++;
 		sctx->stat.data_bytes_scrubbed += len;
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

