From: "Darrick J. Wong" <djwong@kernel.org>
To: djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Subject: [PATCHSET v29.2 8/8] xfs: reduce refcount repair memory usage
Date: Thu, 01 Feb 2024 11:39:50 -0800
Message-ID: <170681337865.1608752.14424093781022631293.stgit@frogsfrogsfrogs>

Hi all,

The refcountbt repair code has serious memory usage problems when the
block sharing factor of the filesystem is very high.  This can happen if
a deduplication tool has been run against the filesystem, or if the fs
stores reflinked VM images that have been aging for a long time.

Recall that the original reference counting algorithm walks the reverse
mapping records of the filesystem to generate reference counts.  For any
given block in the AG, the rmap bag structure contains all the rmap
records that cover that block; the refcount is the size of that bag.
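A minimal Python sketch of that sweep follows, assuming each rmap record has already been reduced to a hypothetical (startblock, blockcount) pair within one AG. The function name and record layout are illustrative only, not the kernel's, and this sketch reports only extents that two or more mappings share.

```python
from typing import List, Tuple

# Hypothetical excerpted rmap record: (startblock, blockcount).
Rmap = Tuple[int, int]
# Output extent: (startblock, blockcount, refcount).
Refc = Tuple[int, int, int]


def compute_refcounts(rmaps: List[Rmap]) -> List[Refc]:
    """Sweep the rmap records in block order while maintaining a bag of the
    records that cover the current block; the refcount is len(bag)."""
    rmaps = sorted(rmaps)
    # Every record start and end is a place where the bag may change.
    points = sorted({s for s, _ in rmaps} | {s + n for s, n in rmaps})
    bag: List[Rmap] = []
    out: List[Refc] = []
    i = 0
    for cur, nxt in zip(points, points[1:]):
        # Remove records that end at or before this block.
        bag = [r for r in bag if r[0] + r[1] > cur]
        # Add records that start at this block.
        while i < len(rmaps) and rmaps[i][0] == cur:
            bag.append(rmaps[i])
            i += 1
        refcount = len(bag)
        if refcount < 2:
            continue  # this sketch only reports shared extents
        if out and out[-1][0] + out[-1][1] == cur and out[-1][2] == refcount:
            # Extend the previous extent if the refcount did not change.
            start, length, _ = out[-1]
            out[-1] = (start, length + (nxt - cur), refcount)
        else:
            out.append((cur, nxt - cur, refcount))
    return out


print(compute_refcounts([(0, 10), (5, 10), (5, 3)]))  # [(5, 3, 3), (8, 2, 2)]
```

Between two consecutive boundaries the bag cannot change, so each interval gets a single refcount; adjacent intervals with equal counts are merged.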

For online repair, the bag doesn't need the owner, offset, or state flag
information, so it discards those.  This halves the record size, but the
bag structure still stores one excerpted record for each reverse
mapping.  If the sharing count is high, this will use a LOT of memory
storing redundant records.  In the extreme case, 100k mappings to the
same piece of space will consume 100k * 16 bytes = 1.6MB of memory.
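To make the arithmetic concrete, here is a small Python sketch. RmapRec, excerpt() and bag_bytes() are hypothetical names, and the 16-byte excerpt size is taken from the figure above rather than from any struct definition.

```python
from typing import List, NamedTuple, Tuple


class RmapRec(NamedTuple):
    """Hypothetical full reverse-mapping record."""
    startblock: int
    blockcount: int
    owner: int
    offset: int
    flags: int


# Size of one excerpted (startblock, blockcount) record, per the text above.
EXCERPT_BYTES = 16


def excerpt(recs: List[RmapRec]) -> List[Tuple[int, int]]:
    """Drop owner, offset and flags, but keep one excerpt per mapping."""
    return [(r.startblock, r.blockcount) for r in recs]


def bag_bytes(bag: List[Tuple[int, int]]) -> int:
    """Memory used by a bag that stores every excerpt separately."""
    return len(bag) * EXCERPT_BYTES


# 100k mappings of the same extent, each owned by a different inode.
recs = [RmapRec(100, 8, owner, 0, 0) for owner in range(100_000)]
bag = excerpt(recs)
print(bag_bytes(bag))   # 1600000 bytes, i.e. 1.6MB
print(len(set(bag)))    # 1 distinct extent
```

Every one of the 100k excerpts is identical once the owner is dropped, which is exactly the redundancy described above.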

For offline repair, the bag stores the owner values so that we know
which inodes need to be marked as being reflink inodes.  If a
deduplication tool has been run and there are many blocks within a file
pointing to the same physical space, this will still use a lot of memory
to store redundant records.
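The owner-tracking case can be sketched the same way. This assumes, for illustration only, that the bag keeps just the extent and the owner (no file offset), and that every owner in a bag covering a block more than once gets the reflink flag; reflink_owners() is a hypothetical helper, not xfs_repair code.

```python
from typing import List, Set, Tuple

# Hypothetical offline-repair bag entry: (startblock, blockcount, owner).
Entry = Tuple[int, int, int]


def reflink_owners(bag: List[Entry]) -> Set[int]:
    """If more than one mapping covers this block, the block is shared and
    every owner in the bag needs the reflink inode flag."""
    if len(bag) < 2:
        return set()
    return {owner for _, _, owner in bag}


# Inode 131 maps 50,000 file blocks onto one physical block (as a dedupe
# tool might); inode 132 maps it once.  This sketch keeps no file offsets.
bag = [(100, 1, 131)] * 50_000 + [(100, 1, 132)]
print(len(bag))                     # 50001 entries stored
print(sorted(reflink_owners(bag)))  # [131, 132]
```

Only two distinct owners matter for the reflink decision, yet the bag holds 50,001 entries.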

The solution to this problem is to deduplicate the bag records when
possible by adding a reference count to the bag record, and changing the
bag add function to bump the refcount of an existing matching record.  In
the above example, the 100k mappings will now use a single 24-byte record.
These lookups can be done efficiently with a btree, so we create a new
refcount bag btree type (inside of online repair).  This is why we
refactored the btree code in the previous patchset.
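Here is a rough Python sketch of the deduplicating add, assuming records are keyed by (startblock, blockcount). DedupBag is a hypothetical name, and a dict is swapped in for the in-memory btree purely to keep the example short.

```python
from typing import Dict, Tuple


class DedupBag:
    """Bag that stores each distinct extent once, with a reference count.
    A dict stands in for the in-memory btree used by the real code."""

    def __init__(self) -> None:
        self.recs: Dict[Tuple[int, int], int] = {}

    def add(self, startblock: int, blockcount: int) -> None:
        key = (startblock, blockcount)
        # Bump the count of a matching record instead of storing a copy.
        self.recs[key] = self.recs.get(key, 0) + 1

    def nr_records(self) -> int:
        """Number of records actually stored."""
        return len(self.recs)

    def size(self) -> int:
        """Bag size as the original algorithm saw it: the sum of counts."""
        return sum(self.recs.values())


bag = DedupBag()
for _ in range(100_000):
    bag.add(100, 8)
print(bag.nr_records(), bag.size())   # 1 100000
```

A dict only answers exact-match lookups; an ordered structure such as a btree can also find records by block position, which the deletion step needs.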

The btree conversion also dramatically reduces the runtime of the
refcount generation algorithm, because the code to delete all bag
records that end at a given agblock now only has to delete one record
instead of (using the example above) 100k records.  As an added benefit,
record deletion now gives back the unused xfile space, which it did not
do previously.
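The difference in deletion work can be illustrated with a Python sketch comparing the two layouts. Both helper functions are hypothetical and only count deletions; they do not model xfile space.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # hypothetical (startblock, blockcount) record


def naive_delete_ending_at(bag: List[Extent], agbno: int) -> Tuple[List[Extent], int]:
    """Old layout: every duplicate record ending at agbno is deleted one by one."""
    keep = [r for r in bag if r[0] + r[1] != agbno]
    return keep, len(bag) - len(keep)


def dedup_delete_ending_at(bag: Dict[Extent, int], agbno: int) -> int:
    """Deduplicated layout: one record (plus a refcount) per distinct extent,
    so a heavily shared extent ending at agbno costs a single deletion.
    This sketch scans every key; a suitably keyed btree could seek instead."""
    victims = [k for k in bag if k[0] + k[1] == agbno]
    for k in victims:
        del bag[k]
    return len(victims)


old = [(100, 8)] * 100_000 + [(50, 70)]
old, n_old = naive_delete_ending_at(old, 108)
new = {(100, 8): 100_000, (50, 70): 1}
n_new = dedup_delete_ending_at(new, 108)
print(n_old, n_new)   # 100000 1
```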

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-refcount-scalability

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-refcount-scalability
---
Commits in this patchset:
 * xfs: define an in-memory btree for storing refcount bag info during repairs
 * xfs: create refcount bag structure for btree repairs
 * xfs: port refcount repair to the new refcount bag structure
---
 fs/xfs/Makefile                |    2 
 fs/xfs/scrub/rcbag.c           |  307 +++++++++++++++++++++++++++++++++
 fs/xfs/scrub/rcbag.h           |   28 +++
 fs/xfs/scrub/rcbag_btree.c     |  370 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/rcbag_btree.h     |   81 +++++++++
 fs/xfs/scrub/refcount.c        |   12 +
 fs/xfs/scrub/refcount_repair.c |  164 ++++++------------
 fs/xfs/scrub/repair.h          |    2 
 fs/xfs/xfs_stats.c             |    3 
 fs/xfs/xfs_stats.h             |    1 
 fs/xfs/xfs_super.c             |   10 +
 11 files changed, 872 insertions(+), 108 deletions(-)
 create mode 100644 fs/xfs/scrub/rcbag.c
 create mode 100644 fs/xfs/scrub/rcbag.h
 create mode 100644 fs/xfs/scrub/rcbag_btree.c
 create mode 100644 fs/xfs/scrub/rcbag_btree.h


Thread overview: 7+ messages
2024-02-01 19:39 Darrick J. Wong [this message]
2024-02-01 20:00 ` [PATCH 1/3] xfs: define an in-memory btree for storing refcount bag info during repairs Darrick J. Wong
2024-02-02  6:30   ` Christoph Hellwig
2024-02-01 20:00 ` [PATCH 2/3] xfs: create refcount bag structure for btree repairs Darrick J. Wong
2024-02-02  6:31   ` Christoph Hellwig
2024-02-01 20:00 ` [PATCH 3/3] xfs: port refcount repair to the new refcount bag structure Darrick J. Wong
2024-02-02  6:31   ` Christoph Hellwig
