* [PATCH 0/8] Block group caching fixes
@ 2020-10-23 13:58 Josef Bacik
  2020-10-23 13:58 ` [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups Josef Bacik
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

The lockdep fix I posted about protecting ->caching_block_groups got feedback
that I needed to document how we're using the ->commit_root_sem to protect
->last_byte_to_unpin, which forced me to take a real hard look at how we're
coordinating the caching threads.  This led to the discovery of a few pretty
significant problems, which resulted in this patchset.  There are a few
individual fixes, but the bulk of these patches are around making all caching of
block groups asynchronous.

Previously we would load the v1 space cache synchronously at caching time,
instead of doing it asynchronously.  This was for speed, but also because
there's a delicate balance that needs to be maintained between unpinning and
the v1 space cache.  With the slow caching we keep track of our caching
progress, so we can unpin anything prior to that progress.  However, since the
space cache only knows the state of the block group at that time, we have to
load the cache in its entirety so we are able to unpin ranges in that block
group properly.  Thus we loaded the space cache on demand, inline with the
caller.
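
For reference, here is a minimal sketch of how the unpin path consults
->last_byte_to_unpin, condensed from unpin_extent_range() as it looks after
patch 1 of this series (the comments are mine):

        len = cache->start + cache->length - start;
        len = min(len, end + 1 - start);

        if (start < cache->last_byte_to_unpin && return_free_space) {
                /* only hand back what the caching thread has already scanned */
                u64 add_len = min(len, cache->last_byte_to_unpin - start);

                btrfs_add_free_space(cache, start, add_len);
        }
        /* len still covers the whole range for the pinned counters */
        start += len;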

This inline, on-demand loading resulted in a few weird special cases for the
space cache:

1) We don't actually use the ->commit_root_sem when doing the free space cache
   lookups.  This is incorrect, and can actually race with a transaction commit.
   This in practice doesn't mean much, but is still functionally wrong and an
   outlier.
2) Instead of using the ->commit_root_sem and ->search_commit_root when looking
   up the extents for the free space cache inode, we use ->recurse and allow
   recursive read locks on nodes we may already hold write locks on.  This
   happens in the case where we are modifying the tree_root when we need to
   cache the block group: we may already be holding the write lock on an upper
   node and then subsequently need to take the read lock on that node to read
   the extents for the free space cache inode.  This is the only place we allow
   recursion, and we don't actually need it, because we could easily use
   ->search_commit_root instead (a small sketch follows this list).
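
As a small sketch, these are the two options in btrfs_get_extent() for the
free space inode; the second form is what patch 6 of this series switches to:

        /* old: allow recursive read locks on nodes we may hold write locks on */
        path->recurse = btrfs_is_free_space_inode(inode);

        /* new: read the extents from the commit root, no tree locking needed */
        if (btrfs_is_free_space_inode(inode)) {
                path->search_commit_root = 1;
                path->skip_locking = 1;
        }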

However we can't actually use ->search_commit_root in this path, because it
causes the lockdep splat that I fix in the last patch of this series.  This
means we need to move the loading of the v1 space cache to an async thread, so
we can take the proper locks for searching the commit root safely.  This
allows us to unify our modification of ->last_byte_to_unpin, and to clean up
the locking for everything else so it's consistent.
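
Roughly, the caching thread ends up with the following shape once the v1 load
is asynchronous (condensed from caching_thread() in patch 7):

        mutex_lock(&caching_ctl->mutex);
        down_read(&fs_info->commit_root_sem);

        if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
                ret = load_free_space_cache(block_group);
                if (ret == 1) {
                        ret = 0;
                        goto done;
                }
                /*
                 * On failure the real code flips to BTRFS_CACHE_STARTED and
                 * falls back to the slow caching below.
                 */
        }

        if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
                ret = load_free_space_tree(caching_ctl);
        else
                ret = load_extent_tree_free(caching_ctl);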

The other major fix is how we update ->last_byte_to_unpin.  Previously we were
recording this value, and then later switching the commit roots.  This could
result in a gap where we would not unpin a range, thus leaking the space.  This
leaked space would then end up in the space cache as we use the in-memory cache
to write out the on disk space cache.  This explains how sometimes we would see
messages indicating that the space cache wasn't right and needed to be rebuilt.
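
Concretely, the fix in patch 2 updates ->last_byte_to_unpin in the same
commit_root_sem critical section that swaps the commit roots, roughly:

        down_write(&fs_info->commit_root_sem);
        /* ... swap the commit roots ... */
        list_for_each_entry_safe(caching_ctl, next,
                                 &fs_info->caching_block_groups, list) {
                struct btrfs_block_group *cache = caching_ctl->block_group;

                if (btrfs_block_group_done(cache)) {
                        cache->last_byte_to_unpin = (u64)-1;
                        list_del_init(&caching_ctl->list);
                        btrfs_put_caching_control(caching_ctl);
                } else {
                        cache->last_byte_to_unpin = caching_ctl->progress;
                }
        }
        up_write(&fs_info->commit_root_sem);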

The main fixes are
  
  btrfs: do not shorten unpin len for caching block groups
  btrfs: update last_byte_to_unpin in switch_commit_roots
  btrfs: protect the fs_info->caching_block_groups differently

And the work to make space cache async is in the following patches

  btrfs: cleanup btrfs_discard_update_discardable usage
  btrfs: load free space cache into a temporary ctl
  btrfs: load the free space cache inode extents from commit root
  btrfs: async load free space cache

Thanks,

Josef

Josef Bacik (8):
  btrfs: do not shorten unpin len for caching block groups
  btrfs: update last_byte_to_unpin in switch_commit_roots
  btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range
  btrfs: cleanup btrfs_discard_update_discardable usage
  btrfs: load free space cache into a temporary ctl
  btrfs: load the free space cache inode extents from commit root
  btrfs: async load free space cache
  btrfs: protect the fs_info->caching_block_groups differently

 fs/btrfs/block-group.c       | 164 +++++++++++++--------------------
 fs/btrfs/ctree.h             |   1 -
 fs/btrfs/discard.c           |   7 +-
 fs/btrfs/discard.h           |   3 +-
 fs/btrfs/extent-tree.c       |  35 ++------
 fs/btrfs/free-space-cache.c  | 169 +++++++++++++++--------------------
 fs/btrfs/free-space-cache.h  |   3 +-
 fs/btrfs/inode.c             |  10 ++-
 fs/btrfs/tests/btrfs-tests.c |   2 +-
 fs/btrfs/transaction.c       |  43 ++++++++-
 10 files changed, 198 insertions(+), 239 deletions(-)

-- 
2.26.2


* [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 13:38   ` Filipe Manana
  2020-10-23 13:58 ` [PATCH 2/8] btrfs: update last_byte_to_unpin in switch_commit_roots Josef Bacik
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While fixing up our ->last_byte_to_unpin locking I noticed that we will
shorten len based on ->last_byte_to_unpin if we're caching when we're
adding back the free space.  This is correct for the free space, as we
cannot unpin more than ->last_byte_to_unpin; however, we use len to
adjust the ->bytes_pinned counters and such, which need to track the
actual pinned usage.  This could result in
WARN_ON(space_info->bytes_pinned) triggering at unmount time.  Fix this
by using a local variable for the amount to add to the free space cache,
and leave len untouched in this case.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5fd60b13f4f8..a98f484a2fc1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,10 +2816,10 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		len = cache->start + cache->length - start;
 		len = min(len, end + 1 - start);
 
-		if (start < cache->last_byte_to_unpin) {
-			len = min(len, cache->last_byte_to_unpin - start);
-			if (return_free_space)
-				btrfs_add_free_space(cache, start, len);
+		if (start < cache->last_byte_to_unpin && return_free_space) {
+			u64 add_len = min(len,
+					  cache->last_byte_to_unpin - start);
+			btrfs_add_free_space(cache, start, add_len);
 		}
 
 		start += len;
-- 
2.26.2


* [PATCH 2/8] btrfs: update last_byte_to_unpin in switch_commit_roots
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
  2020-10-23 13:58 ` [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 15:15   ` Filipe Manana
  2020-10-23 13:58 ` [PATCH 3/8] btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range Josef Bacik
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While writing an explanation for the need of the commit_root_sem for
btrfs_prepare_extent_commit, I realized we have a slight hole that could
result in leaked space if we have to do the old-style caching.  Consider
the following scenario:

 commit root
 +----+----+----+----+----+----+----+
 |\\\\|    |\\\\|\\\\|    |\\\\|\\\\|
 +----+----+----+----+----+----+----+
 0    1    2    3    4    5    6    7

 new commit root
 +----+----+----+----+----+----+----+
 |    |    |    |\\\\|    |    |\\\\|
 +----+----+----+----+----+----+----+
 0    1    2    3    4    5    6    7

Prior to this patch, we run btrfs_prepare_extent_commit, which updates
the last_byte_to_unpin, and then we subsequently run
switch_commit_roots.  In this example let's assume that
caching_ctl->progress == 1 at btrfs_prepare_extent_commit() time, which
means that cache->last_byte_to_unpin == 1.  Then we go and do the
switch_commit_roots(), but in the meantime the caching thread has made
some more progress, because we dropped the commit_root_sem and
re-acquired it.  Now caching_ctl->progress == 3.  We swap out the commit
root and carry on to unpin.

In the unpin code we have last_byte_to_unpin == 1, so we unpin [0,1),
but do not unpin [2,3).  However, because caching_ctl->progress == 3 we
do not see the newly freed section of [2,3), and thus do not add it to
our free space cache.  This results in us missing a chunk of free space
in memory.

Fix this by making sure the ->last_byte_to_unpin is set at the same time
that we swap the commit roots; this ensures that we will always be
consistent.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  1 -
 fs/btrfs/extent-tree.c | 25 -------------------------
 fs/btrfs/transaction.c | 41 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8a83bce3225c..41c76db65c8e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2592,7 +2592,6 @@ int btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
 			       u64 start, u64 len, int delalloc);
 int btrfs_pin_reserved_extent(struct btrfs_trans_handle *trans, u64 start,
 			      u64 len);
-void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info);
 int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans);
 int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_ref *generic_ref);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a98f484a2fc1..ee7bceace8b3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2730,31 +2730,6 @@ btrfs_inc_block_group_reservations(struct btrfs_block_group *bg)
 	atomic_inc(&bg->reservations);
 }
 
-void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_caching_control *next;
-	struct btrfs_caching_control *caching_ctl;
-	struct btrfs_block_group *cache;
-
-	down_write(&fs_info->commit_root_sem);
-
-	list_for_each_entry_safe(caching_ctl, next,
-				 &fs_info->caching_block_groups, list) {
-		cache = caching_ctl->block_group;
-		if (btrfs_block_group_done(cache)) {
-			cache->last_byte_to_unpin = (u64)-1;
-			list_del_init(&caching_ctl->list);
-			btrfs_put_caching_control(caching_ctl);
-		} else {
-			cache->last_byte_to_unpin = caching_ctl->progress;
-		}
-	}
-
-	up_write(&fs_info->commit_root_sem);
-
-	btrfs_update_global_block_rsv(fs_info);
-}
-
 /*
  * Returns the free cluster for the given space info and sets empty_cluster to
  * what it should be based on the mount options.
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 52ada47aff50..9ef6cba1eb59 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -155,6 +155,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
 	struct btrfs_transaction *cur_trans = trans->transaction;
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *root, *tmp;
+	struct btrfs_caching_control *caching_ctl, *next;
 
 	down_write(&fs_info->commit_root_sem);
 	list_for_each_entry_safe(root, tmp, &cur_trans->switch_commits,
@@ -180,6 +181,44 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
 		spin_lock(&cur_trans->dropped_roots_lock);
 	}
 	spin_unlock(&cur_trans->dropped_roots_lock);
+
+	/*
+	 * We have to update the last_byte_to_unpin under the commit_root_sem,
+	 * at the same time we swap out the commit roots.
+	 *
+	 * This is because we must have a real view of the last spot the caching
+	 * kthreads were while caching.  Consider the following views of the
+	 * extent tree for a block group
+	 *
+	 * commit root
+	 * +----+----+----+----+----+----+----+
+	 * |\\\\|    |\\\\|\\\\|    |\\\\|\\\\|
+	 * +----+----+----+----+----+----+----+
+	 * 0    1    2    3    4    5    6    7
+	 *
+	 * new commit root
+	 * +----+----+----+----+----+----+----+
+	 * |    |    |    |\\\\|    |    |\\\\|
+	 * +----+----+----+----+----+----+----+
+	 * 0    1    2    3    4    5    6    7
+	 *
+	 * If the cache_ctl->progress was at 3, then we are only allowed to
+	 * unpin [0,1) and [2,3], because the caching thread has already
+	 * processed those extents.  We are not allowed to unpin [5,6), because
+	 * the caching thread will re-start it's search from 3, and thus find
+	 * the hole from [4,6) to add to the free space cache.
+	 */
+	list_for_each_entry_safe(caching_ctl, next,
+				 &fs_info->caching_block_groups, list) {
+		struct btrfs_block_group *cache = caching_ctl->block_group;
+		if (btrfs_block_group_done(cache)) {
+			cache->last_byte_to_unpin = (u64)-1;
+			list_del_init(&caching_ctl->list);
+			btrfs_put_caching_control(caching_ctl);
+		} else {
+			cache->last_byte_to_unpin = caching_ctl->progress;
+		}
+	}
 	up_write(&fs_info->commit_root_sem);
 }
 
@@ -2293,8 +2332,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto unlock_tree_log;
 	}
 
-	btrfs_prepare_extent_commit(fs_info);
-
 	cur_trans = fs_info->running_transaction;
 
 	btrfs_set_root_node(&fs_info->tree_root->root_item,
-- 
2.26.2


* [PATCH 3/8] btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
  2020-10-23 13:58 ` [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups Josef Bacik
  2020-10-23 13:58 ` [PATCH 2/8] btrfs: update last_byte_to_unpin in switch_commit_roots Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 15:36   ` Filipe Manana
  2020-10-23 13:58 ` [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage Josef Bacik
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Currently unpin_extent_range happens in the transaction commit context,
so we are protected from ->last_byte_to_unpin changing while we're
unpinning, because any new transactions would have to wait for us to
complete before modifying ->last_byte_to_unpin.

However, in the future we may want to change how this works, for instance
with async unpinning or other such TODO items.  To prepare for that
future, explicitly protect ->last_byte_to_unpin with the commit_root_sem
so we are sure it won't change while we're doing our work.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ee7bceace8b3..5d3564b077bf 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2791,11 +2791,13 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		len = cache->start + cache->length - start;
 		len = min(len, end + 1 - start);
 
+		down_read(&fs_info->commit_root_sem);
 		if (start < cache->last_byte_to_unpin && return_free_space) {
 			u64 add_len = min(len,
 					  cache->last_byte_to_unpin - start);
 			btrfs_add_free_space(cache, start, add_len);
 		}
+		up_read(&fs_info->commit_root_sem);
 
 		start += len;
 		total_unpinned += len;
-- 
2.26.2


* [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
                   ` (2 preceding siblings ...)
  2020-10-23 13:58 ` [PATCH 3/8] btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 15:54   ` Filipe Manana
  2020-10-23 13:58 ` [PATCH 5/8] btrfs: load free space cache into a temporary ctl Josef Bacik
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This passes in both the block_group and the free_space_ctl, but we can
get the latter from the block group itself.  Part of this is because we call it
from __load_free_space_cache, which can be called for the inode cache as
well.  Move that call into the block group specific load section, wrap
it in the right lock that we need, and fix up the arguments to only take
the block group.  Add a lockdep_assert as well for good measure to make
sure we don't mess up the locking again.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/discard.c          |  7 ++++---
 fs/btrfs/discard.h          |  3 +--
 fs/btrfs/free-space-cache.c | 14 ++++++++------
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 741c7e19c32f..5a88b584276f 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -563,15 +563,14 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
 /**
  * btrfs_discard_update_discardable - propagate discard counters
  * @block_group: block_group of interest
- * @ctl: free_space_ctl of @block_group
  *
  * This propagates deltas of counters up to the discard_ctl.  It maintains a
  * current counter and a previous counter passing the delta up to the global
  * stat.  Then the current counter value becomes the previous counter value.
  */
-void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
-				      struct btrfs_free_space_ctl *ctl)
+void btrfs_discard_update_discardable(struct btrfs_block_group *block_group)
 {
+	struct btrfs_free_space_ctl *ctl;
 	struct btrfs_discard_ctl *discard_ctl;
 	s32 extents_delta;
 	s64 bytes_delta;
@@ -581,8 +580,10 @@ void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
 	    !btrfs_is_block_group_data_only(block_group))
 		return;
 
+	ctl = block_group->free_space_ctl;
 	discard_ctl = &block_group->fs_info->discard_ctl;
 
+	lockdep_assert_held(&ctl->tree_lock);
 	extents_delta = ctl->discardable_extents[BTRFS_STAT_CURR] -
 			ctl->discardable_extents[BTRFS_STAT_PREV];
 	if (extents_delta) {
diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index 353228d62f5a..57b9202f427f 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -28,8 +28,7 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl);
 
 /* Update operations */
 void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl);
-void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
-				      struct btrfs_free_space_ctl *ctl);
+void btrfs_discard_update_discardable(struct btrfs_block_group *block_group);
 
 /* Setup/cleanup operations */
 void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 5ea36a06e514..0787339c7b93 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -828,7 +828,6 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 	merge_space_tree(ctl);
 	ret = 1;
 out:
-	btrfs_discard_update_discardable(ctl->private, ctl);
 	io_ctl_free(&io_ctl);
 	return ret;
 free_cache:
@@ -929,6 +928,9 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
 			   block_group->start);
 	}
 
+	spin_lock(&ctl->tree_lock);
+	btrfs_discard_update_discardable(block_group);
+	spin_unlock(&ctl->tree_lock);
 	iput(inode);
 	return ret;
 }
@@ -2508,7 +2510,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 	if (ret)
 		kmem_cache_free(btrfs_free_space_cachep, info);
 out:
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 
 	if (ret) {
@@ -2643,7 +2645,7 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group,
 		goto again;
 	}
 out_lock:
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 out:
 	return ret;
@@ -2779,7 +2781,7 @@ void __btrfs_remove_free_space_cache(struct btrfs_free_space_ctl *ctl)
 	spin_lock(&ctl->tree_lock);
 	__btrfs_remove_free_space_cache_locked(ctl);
 	if (ctl->private)
-		btrfs_discard_update_discardable(ctl->private, ctl);
+		btrfs_discard_update_discardable(ctl->private);
 	spin_unlock(&ctl->tree_lock);
 }
 
@@ -2801,7 +2803,7 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group *block_group)
 		cond_resched_lock(&ctl->tree_lock);
 	}
 	__btrfs_remove_free_space_cache_locked(ctl);
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 
 }
@@ -2885,7 +2887,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
 			link_free_space(ctl, entry);
 	}
 out:
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 
 	if (align_gap_len)
-- 
2.26.2


* [PATCH 5/8] btrfs: load free space cache into a temporary ctl
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
                   ` (3 preceding siblings ...)
  2020-10-23 13:58 ` [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 16:20   ` Filipe Manana
  2020-10-23 13:58 ` [PATCH 6/8] btrfs: load the free space cache inode extents from commit root Josef Bacik
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The free space cache has been special in that we would load it right
away instead of farming the work off to a worker thread.  This resulted
in some weirdness that had to be accounted for, namely that if we ever
found a block group being cached the fast way we had to wait for it to
finish, because we could get the cache before it had been validated and
we may throw the cache away.

To handle this particular case, instead create a temporary
btrfs_free_space_ctl to load the free space cache into.  Then once we've
validated that it makes sense, copy its contents into the actual
block_group->free_space_ctl.  This allows us to avoid the problem of
needing to wait for the caching to complete; we can also clean up the
discard extent handling stuff in __load_free_space_cache, and we no
longer need to do the merge_space_tree(), because the space is added one
by one into the real free_space_ctl.  This will allow further reworks of
how we handle loading the free space cache.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c       |  29 +------
 fs/btrfs/free-space-cache.c  | 155 +++++++++++++++--------------------
 fs/btrfs/free-space-cache.h  |   3 +-
 fs/btrfs/tests/btrfs-tests.c |   2 +-
 4 files changed, 70 insertions(+), 119 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bb6685711824..adbd18dc08a1 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -695,33 +695,6 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
 	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
 
 	spin_lock(&cache->lock);
-	/*
-	 * This should be a rare occasion, but this could happen I think in the
-	 * case where one thread starts to load the space cache info, and then
-	 * some other thread starts a transaction commit which tries to do an
-	 * allocation while the other thread is still loading the space cache
-	 * info.  The previous loop should have kept us from choosing this block
-	 * group, but if we've moved to the state where we will wait on caching
-	 * block groups we need to first check if we're doing a fast load here,
-	 * so we can wait for it to finish, otherwise we could end up allocating
-	 * from a block group who's cache gets evicted for one reason or
-	 * another.
-	 */
-	while (cache->cached == BTRFS_CACHE_FAST) {
-		struct btrfs_caching_control *ctl;
-
-		ctl = cache->caching_ctl;
-		refcount_inc(&ctl->count);
-		prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
-		spin_unlock(&cache->lock);
-
-		schedule();
-
-		finish_wait(&ctl->wait, &wait);
-		btrfs_put_caching_control(ctl);
-		spin_lock(&cache->lock);
-	}
-
 	if (cache->cached != BTRFS_CACHE_NO) {
 		spin_unlock(&cache->lock);
 		kfree(caching_ctl);
@@ -1805,7 +1778,7 @@ static struct btrfs_block_group *btrfs_create_block_group_cache(
 	INIT_LIST_HEAD(&cache->discard_list);
 	INIT_LIST_HEAD(&cache->dirty_list);
 	INIT_LIST_HEAD(&cache->io_list);
-	btrfs_init_free_space_ctl(cache);
+	btrfs_init_free_space_ctl(cache, cache->free_space_ctl);
 	atomic_set(&cache->frozen, 0);
 	mutex_init(&cache->free_space_lock);
 	btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0787339c7b93..58bd2d3e54db 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -33,8 +33,6 @@ struct btrfs_trim_range {
 	struct list_head list;
 };
 
-static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl,
-				struct btrfs_free_space *bitmap_info);
 static int link_free_space(struct btrfs_free_space_ctl *ctl,
 			   struct btrfs_free_space *info);
 static void unlink_free_space(struct btrfs_free_space_ctl *ctl,
@@ -43,6 +41,14 @@ static int btrfs_wait_cache_io_root(struct btrfs_root *root,
 			     struct btrfs_trans_handle *trans,
 			     struct btrfs_io_ctl *io_ctl,
 			     struct btrfs_path *path);
+static int search_bitmap(struct btrfs_free_space_ctl *ctl,
+			 struct btrfs_free_space *bitmap_info, u64 *offset,
+			 u64 *bytes, bool for_alloc);
+static void free_bitmap(struct btrfs_free_space_ctl *ctl,
+			struct btrfs_free_space *bitmap_info);
+static void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl,
+			      struct btrfs_free_space *info, u64 offset,
+			      u64 bytes);
 
 static struct inode *__lookup_free_space_inode(struct btrfs_root *root,
 					       struct btrfs_path *path,
@@ -625,44 +631,6 @@ static int io_ctl_read_bitmap(struct btrfs_io_ctl *io_ctl,
 	return 0;
 }
 
-/*
- * Since we attach pinned extents after the fact we can have contiguous sections
- * of free space that are split up in entries.  This poses a problem with the
- * tree logging stuff since it could have allocated across what appears to be 2
- * entries since we would have merged the entries when adding the pinned extents
- * back to the free space cache.  So run through the space cache that we just
- * loaded and merge contiguous entries.  This will make the log replay stuff not
- * blow up and it will make for nicer allocator behavior.
- */
-static void merge_space_tree(struct btrfs_free_space_ctl *ctl)
-{
-	struct btrfs_free_space *e, *prev = NULL;
-	struct rb_node *n;
-
-again:
-	spin_lock(&ctl->tree_lock);
-	for (n = rb_first(&ctl->free_space_offset); n; n = rb_next(n)) {
-		e = rb_entry(n, struct btrfs_free_space, offset_index);
-		if (!prev)
-			goto next;
-		if (e->bitmap || prev->bitmap)
-			goto next;
-		if (prev->offset + prev->bytes == e->offset) {
-			unlink_free_space(ctl, prev);
-			unlink_free_space(ctl, e);
-			prev->bytes += e->bytes;
-			kmem_cache_free(btrfs_free_space_cachep, e);
-			link_free_space(ctl, prev);
-			prev = NULL;
-			spin_unlock(&ctl->tree_lock);
-			goto again;
-		}
-next:
-		prev = e;
-	}
-	spin_unlock(&ctl->tree_lock);
-}
-
 static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 				   struct btrfs_free_space_ctl *ctl,
 				   struct btrfs_path *path, u64 offset)
@@ -753,16 +721,6 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 			goto free_cache;
 		}
 
-		/*
-		 * Sync discard ensures that the free space cache is always
-		 * trimmed.  So when reading this in, the state should reflect
-		 * that.  We also do this for async as a stop gap for lack of
-		 * persistence.
-		 */
-		if (btrfs_test_opt(fs_info, DISCARD_SYNC) ||
-		    btrfs_test_opt(fs_info, DISCARD_ASYNC))
-			e->trim_state = BTRFS_TRIM_STATE_TRIMMED;
-
 		if (!e->bytes) {
 			kmem_cache_free(btrfs_free_space_cachep, e);
 			goto free_cache;
@@ -816,16 +774,9 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 		ret = io_ctl_read_bitmap(&io_ctl, e);
 		if (ret)
 			goto free_cache;
-		e->bitmap_extents = count_bitmap_extents(ctl, e);
-		if (!btrfs_free_space_trimmed(e)) {
-			ctl->discardable_extents[BTRFS_STAT_CURR] +=
-				e->bitmap_extents;
-			ctl->discardable_bytes[BTRFS_STAT_CURR] += e->bytes;
-		}
 	}
 
 	io_ctl_drop_pages(&io_ctl);
-	merge_space_tree(ctl);
 	ret = 1;
 out:
 	io_ctl_free(&io_ctl);
@@ -836,16 +787,59 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 	goto out;
 }
 
+static int copy_free_space_cache(struct btrfs_block_group *block_group,
+				 struct btrfs_free_space_ctl *ctl)
+{
+	struct btrfs_free_space *info;
+	struct rb_node *n;
+	int ret = 0;
+
+	while (!ret && (n = rb_first(&ctl->free_space_offset)) != NULL) {
+		info = rb_entry(n, struct btrfs_free_space, offset_index);
+		if (!info->bitmap) {
+			unlink_free_space(ctl, info);
+			ret = btrfs_add_free_space(block_group, info->offset,
+						   info->bytes);
+			kmem_cache_free(btrfs_free_space_cachep, info);
+		} else {
+			u64 offset = info->offset;
+			u64 bytes = ctl->unit;
+
+			while (search_bitmap(ctl, info, &offset, &bytes,
+					     false) == 0) {
+				ret = btrfs_add_free_space(block_group, offset,
+							   bytes);
+				if (ret)
+					break;
+				bitmap_clear_bits(ctl, info, offset, bytes);
+				offset = info->offset;
+				bytes = ctl->unit;
+			}
+			free_bitmap(ctl, info);
+		}
+		cond_resched();
+	}
+	return ret;
+}
+
 int load_free_space_cache(struct btrfs_block_group *block_group)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_free_space_ctl tmp_ctl = {};
 	struct inode *inode;
 	struct btrfs_path *path;
 	int ret = 0;
 	bool matched;
 	u64 used = block_group->used;
 
+	/*
+	 * Because we could potentially discard our loaded free space, we want
+	 * to load everything into a temporary structure first, and then if it's
+	 * valid copy it all into the actual free space ctl.
+	 */
+	btrfs_init_free_space_ctl(block_group, &tmp_ctl);
+
 	/*
 	 * If this block group has been marked to be cleared for one reason or
 	 * another then we can't trust the on disk cache, so just return.
@@ -897,19 +891,25 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
 	}
 	spin_unlock(&block_group->lock);
 
-	ret = __load_free_space_cache(fs_info->tree_root, inode, ctl,
+	ret = __load_free_space_cache(fs_info->tree_root, inode, &tmp_ctl,
 				      path, block_group->start);
 	btrfs_free_path(path);
 	if (ret <= 0)
 		goto out;
 
-	spin_lock(&ctl->tree_lock);
-	matched = (ctl->free_space == (block_group->length - used -
-				       block_group->bytes_super));
-	spin_unlock(&ctl->tree_lock);
+	matched = (tmp_ctl.free_space == (block_group->length - used -
+					  block_group->bytes_super));
 
-	if (!matched) {
-		__btrfs_remove_free_space_cache(ctl);
+	if (matched) {
+		ret = copy_free_space_cache(block_group, &tmp_ctl);
+		/*
+		 * ret == 1 means we successfully loaded the free space cache,
+		 * so we need to re-set it here.
+		 */
+		if (ret == 0)
+			ret = 1;
+	} else {
+		__btrfs_remove_free_space_cache(&tmp_ctl);
 		btrfs_warn(fs_info,
 			   "block group %llu has wrong amount of free space",
 			   block_group->start);
@@ -1914,29 +1914,6 @@ find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes,
 	return NULL;
 }
 
-static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl,
-				struct btrfs_free_space *bitmap_info)
-{
-	struct btrfs_block_group *block_group = ctl->private;
-	u64 bytes = bitmap_info->bytes;
-	unsigned int rs, re;
-	int count = 0;
-
-	if (!block_group || !bytes)
-		return count;
-
-	bitmap_for_each_set_region(bitmap_info->bitmap, rs, re, 0,
-				   BITS_PER_BITMAP) {
-		bytes -= (rs - re) * ctl->unit;
-		count++;
-
-		if (!bytes)
-			break;
-	}
-
-	return count;
-}
-
 static void add_new_bitmap(struct btrfs_free_space_ctl *ctl,
 			   struct btrfs_free_space *info, u64 offset)
 {
@@ -2676,10 +2653,10 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group,
 		   "%d blocks of free space at or bigger than bytes is", count);
 }
 
-void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group)
+void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
+			       struct btrfs_free_space_ctl *ctl)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
-	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 
 	spin_lock_init(&ctl->tree_lock);
 	ctl->unit = fs_info->sectorsize;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index e3d5e0ad8f8e..bf8d127d2407 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -109,7 +109,8 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 			      struct btrfs_path *path,
 			      struct inode *inode);
 
-void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group);
+void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
+			       struct btrfs_free_space_ctl *ctl);
 int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 			   struct btrfs_free_space_ctl *ctl,
 			   u64 bytenr, u64 size,
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index 999c14e5d0bd..8519f7746b2e 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -224,7 +224,7 @@ btrfs_alloc_dummy_block_group(struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&cache->list);
 	INIT_LIST_HEAD(&cache->cluster_list);
 	INIT_LIST_HEAD(&cache->bg_list);
-	btrfs_init_free_space_ctl(cache);
+	btrfs_init_free_space_ctl(cache, cache->free_space_ctl);
 	mutex_init(&cache->free_space_lock);
 
 	return cache;
-- 
2.26.2


* [PATCH 6/8] btrfs: load the free space cache inode extents from commit root
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
                   ` (4 preceding siblings ...)
  2020-10-23 13:58 ` [PATCH 5/8] btrfs: load free space cache into a temporary ctl Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 16:27   ` Filipe Manana
  2020-10-23 13:58 ` [PATCH 7/8] btrfs: async load free space cache Josef Bacik
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Historically we've allowed recursive locking specifically for the free
space inode.  This is because we are only doing reads and know that it's
safe.  However we don't actually need this feature, we can get away with
reading the commit root for the extents.  In fact if we want to allow
asynchronous loading of the free space cache we have to use the commit
root, otherwise we will deadlock.

Switch to using the commit root for the file extents.  These are only
read at load time, and are replaced as soon as we start writing the
cache out to disk.  The cache is never read again, so this is
legitimate.  This matches what we do for the inode itself, as we read
that from the commit root as well.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1dcccd212809..53d6a66670d3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6577,7 +6577,15 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	 */
 	path->leave_spinning = 1;
 
-	path->recurse = btrfs_is_free_space_inode(inode);
+	/*
+	 * The same explanation in load_free_space_cache applies here as well,
+	 * we only read when we're loading the free space cache, and at that
+	 * point the commit_root has everything we need.
+	 */
+	if (btrfs_is_free_space_inode(inode)) {
+		path->search_commit_root = 1;
+		path->skip_locking = 1;
+	}
 
 	ret = btrfs_lookup_file_extent(NULL, root, path, objectid, start, 0);
 	if (ret < 0) {
-- 
2.26.2


* [PATCH 7/8] btrfs: async load free space cache
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
                   ` (5 preceding siblings ...)
  2020-10-23 13:58 ` [PATCH 6/8] btrfs: load the free space cache inode extents from commit root Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 18:02   ` Filipe Manana
  2020-10-23 13:58 ` [PATCH 8/8] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
  2020-11-05 14:27 ` [PATCH 0/8] Block group caching fixes David Sterba
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While documenting the usage of the commit_root_sem, I noticed that we do
not actually take the commit_root_sem in the case of the free space
cache.  This is problematic because we're supposed to hold that sem
while we're reading the commit roots, which is what we do for the free
space cache.

The reason I did it inline when I originally wrote the code was that
there's the case of unpinning, where we need to make sure that the free
space cache is loaded if we're going to use the free space cache.  But
we can accomplish the same thing by simply waiting for the cache to be
loaded.

Rework this code to load the free space cache asynchronously.  This
allows us to greatly clean up the caching code, because now it's all
shared by the various caching methods.  We are also now in a position to
have the commit_root_sem held while we're loading the free space cache.
And finally, our modification of ->last_byte_to_unpin is removed, because
it can be handled in the proper way on commit.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 123 ++++++++++++++++++-----------------------
 1 file changed, 53 insertions(+), 70 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index adbd18dc08a1..ba6564f67d9a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -424,6 +424,23 @@ int btrfs_wait_block_group_cache_done(struct btrfs_block_group *cache)
 	return ret;
 }
 
+static bool space_cache_v1_done(struct btrfs_block_group *cache)
+{
+	bool ret;
+
+	spin_lock(&cache->lock);
+	ret = cache->cached != BTRFS_CACHE_FAST;
+	spin_unlock(&cache->lock);
+
+	return ret;
+}
+
+static void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
+				struct btrfs_caching_control *caching_ctl)
+{
+	wait_event(caching_ctl->wait, space_cache_v1_done(cache));
+}
+
 #ifdef CONFIG_BTRFS_DEBUG
 static void fragment_free_space(struct btrfs_block_group *block_group)
 {
@@ -639,11 +656,28 @@ static noinline void caching_thread(struct btrfs_work *work)
 	mutex_lock(&caching_ctl->mutex);
 	down_read(&fs_info->commit_root_sem);
 
+	if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
+		ret = load_free_space_cache(block_group);
+		if (ret == 1) {
+			ret = 0;
+			goto done;
+		}
+
+		/*
+		 * We failed to load the space cache, set ourselves to
+		 * CACHE_STARTED and carry on.
+		 */
+		spin_lock(&block_group->lock);
+		block_group->cached = BTRFS_CACHE_STARTED;
+		spin_unlock(&block_group->lock);
+		wake_up(&caching_ctl->wait);
+	}
+
 	if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
 		ret = load_free_space_tree(caching_ctl);
 	else
 		ret = load_extent_tree_free(caching_ctl);
-
+done:
 	spin_lock(&block_group->lock);
 	block_group->caching_ctl = NULL;
 	block_group->cached = ret ? BTRFS_CACHE_ERROR : BTRFS_CACHE_FINISHED;
@@ -679,7 +713,7 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
 {
 	DEFINE_WAIT(wait);
 	struct btrfs_fs_info *fs_info = cache->fs_info;
-	struct btrfs_caching_control *caching_ctl;
+	struct btrfs_caching_control *caching_ctl = NULL;
 	int ret = 0;
 
 	caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS);
@@ -691,84 +725,28 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
 	init_waitqueue_head(&caching_ctl->wait);
 	caching_ctl->block_group = cache;
 	caching_ctl->progress = cache->start;
-	refcount_set(&caching_ctl->count, 1);
+	refcount_set(&caching_ctl->count, 2);
 	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
 
 	spin_lock(&cache->lock);
 	if (cache->cached != BTRFS_CACHE_NO) {
-		spin_unlock(&cache->lock);
 		kfree(caching_ctl);
-		return 0;
+
+		caching_ctl = cache->caching_ctl;
+		if (caching_ctl)
+			refcount_inc(&caching_ctl->count);
+		spin_unlock(&cache->lock);
+		goto out;
 	}
 	WARN_ON(cache->caching_ctl);
 	cache->caching_ctl = caching_ctl;
-	cache->cached = BTRFS_CACHE_FAST;
+	if (btrfs_test_opt(fs_info, SPACE_CACHE))
+		cache->cached = BTRFS_CACHE_FAST;
+	else
+		cache->cached = BTRFS_CACHE_STARTED;
+	cache->has_caching_ctl = 1;
 	spin_unlock(&cache->lock);
 
-	if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
-		mutex_lock(&caching_ctl->mutex);
-		ret = load_free_space_cache(cache);
-
-		spin_lock(&cache->lock);
-		if (ret == 1) {
-			cache->caching_ctl = NULL;
-			cache->cached = BTRFS_CACHE_FINISHED;
-			cache->last_byte_to_unpin = (u64)-1;
-			caching_ctl->progress = (u64)-1;
-		} else {
-			if (load_cache_only) {
-				cache->caching_ctl = NULL;
-				cache->cached = BTRFS_CACHE_NO;
-			} else {
-				cache->cached = BTRFS_CACHE_STARTED;
-				cache->has_caching_ctl = 1;
-			}
-		}
-		spin_unlock(&cache->lock);
-#ifdef CONFIG_BTRFS_DEBUG
-		if (ret == 1 &&
-		    btrfs_should_fragment_free_space(cache)) {
-			u64 bytes_used;
-
-			spin_lock(&cache->space_info->lock);
-			spin_lock(&cache->lock);
-			bytes_used = cache->length - cache->used;
-			cache->space_info->bytes_used += bytes_used >> 1;
-			spin_unlock(&cache->lock);
-			spin_unlock(&cache->space_info->lock);
-			fragment_free_space(cache);
-		}
-#endif
-		mutex_unlock(&caching_ctl->mutex);
-
-		wake_up(&caching_ctl->wait);
-		if (ret == 1) {
-			btrfs_put_caching_control(caching_ctl);
-			btrfs_free_excluded_extents(cache);
-			return 0;
-		}
-	} else {
-		/*
-		 * We're either using the free space tree or no caching at all.
-		 * Set cached to the appropriate value and wakeup any waiters.
-		 */
-		spin_lock(&cache->lock);
-		if (load_cache_only) {
-			cache->caching_ctl = NULL;
-			cache->cached = BTRFS_CACHE_NO;
-		} else {
-			cache->cached = BTRFS_CACHE_STARTED;
-			cache->has_caching_ctl = 1;
-		}
-		spin_unlock(&cache->lock);
-		wake_up(&caching_ctl->wait);
-	}
-
-	if (load_cache_only) {
-		btrfs_put_caching_control(caching_ctl);
-		return 0;
-	}
-
 	down_write(&fs_info->commit_root_sem);
 	refcount_inc(&caching_ctl->count);
 	list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
@@ -777,6 +755,11 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
 	btrfs_get_block_group(cache);
 
 	btrfs_queue_work(fs_info->caching_workers, &caching_ctl->work);
+out:
+	if (load_cache_only && caching_ctl)
+		btrfs_wait_space_cache_v1_finished(cache, caching_ctl);
+	if (caching_ctl)
+		btrfs_put_caching_control(caching_ctl);
 
 	return ret;
 }
-- 
2.26.2


* [PATCH 8/8] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
                   ` (6 preceding siblings ...)
  2020-10-23 13:58 ` [PATCH 7/8] btrfs: async load free space cache Josef Bacik
@ 2020-10-23 13:58 ` Josef Bacik
  2020-11-04 18:27   ` Filipe Manana
  2020-11-05 14:27 ` [PATCH 0/8] Block group caching fixes David Sterba
  8 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-10-23 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I got the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #101 Not tainted
------------------------------------------------------
btrfs-cleaner/3445 is trying to acquire lock:
ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170

but task is already holding lock:
ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
       down_write+0x3d/0x70
       btrfs_cache_block_group+0x2d5/0x510
       find_free_extent+0xb6e/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&space_info->groups_sem){++++}-{3:3}:
       down_read+0x40/0x130
       find_free_extent+0x2ed/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x1167/0x2150
       lock_acquire+0xb9/0x3d0
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x32/0x170
       __btrfs_read_lock_root_node+0x3a/0x50
       btrfs_search_slot+0x614/0x9d0
       btrfs_find_root+0x35/0x1b0
       btrfs_read_tree_root+0x61/0x120
       btrfs_get_root_ref+0x14b/0x600
       find_parent_nodes+0x3e6/0x1b30
       btrfs_find_all_roots_safe+0xb4/0x130
       btrfs_find_all_roots+0x60/0x80
       btrfs_qgroup_trace_extent_post+0x27/0x40
       btrfs_add_delayed_data_ref+0x3fd/0x460
       btrfs_free_extent+0x42/0x100
       __btrfs_mod_ref+0x1d7/0x2f0
       walk_up_proc+0x11c/0x400
       walk_up_tree+0xf0/0x180
       btrfs_drop_snapshot+0x1c7/0x780
       btrfs_clean_one_deleted_snapshot+0xfb/0x110
       cleaner_kthread+0xd4/0x140
       kthread+0x13a/0x150
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->commit_root_sem);
                               lock(&space_info->groups_sem);
                               lock(&fs_info->commit_root_sem);
  lock(btrfs-root-00);

 *** DEADLOCK ***

3 locks held by btrfs-cleaner/3445:
 #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
 #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
 #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

stack backtrace:
CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 __lock_acquire+0x1167/0x2150
 ? __bfs+0x42/0x210
 lock_acquire+0xb9/0x3d0
 ? __btrfs_tree_read_lock+0x32/0x170
 down_read_nested+0x43/0x130
 ? __btrfs_tree_read_lock+0x32/0x170
 __btrfs_tree_read_lock+0x32/0x170
 __btrfs_read_lock_root_node+0x3a/0x50
 btrfs_search_slot+0x614/0x9d0
 ? find_held_lock+0x2b/0x80
 btrfs_find_root+0x35/0x1b0
 ? do_raw_spin_unlock+0x4b/0xa0
 btrfs_read_tree_root+0x61/0x120
 btrfs_get_root_ref+0x14b/0x600
 find_parent_nodes+0x3e6/0x1b30
 btrfs_find_all_roots_safe+0xb4/0x130
 btrfs_find_all_roots+0x60/0x80
 btrfs_qgroup_trace_extent_post+0x27/0x40
 btrfs_add_delayed_data_ref+0x3fd/0x460
 btrfs_free_extent+0x42/0x100
 __btrfs_mod_ref+0x1d7/0x2f0
 walk_up_proc+0x11c/0x400
 walk_up_tree+0xf0/0x180
 btrfs_drop_snapshot+0x1c7/0x780
 ? btrfs_clean_one_deleted_snapshot+0x73/0x110
 btrfs_clean_one_deleted_snapshot+0xfb/0x110
 cleaner_kthread+0xd4/0x140
 ? btrfs_alloc_root+0x50/0x50
 kthread+0x13a/0x150
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30

while testing another lockdep fix.  This happens because we're using the
commit_root_sem to protect fs_info->caching_block_groups, which creates a
groups_sem -> commit_root_sem dependency.  This is problematic because we
will allocate blocks while holding tree root locks.  Fix this by making
the list itself protected by the fs_info->block_group_cache_lock.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 12 ++++++------
 fs/btrfs/transaction.c |  2 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index ba6564f67d9a..f19fabae4754 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -747,10 +747,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
 	cache->has_caching_ctl = 1;
 	spin_unlock(&cache->lock);
 
-	down_write(&fs_info->commit_root_sem);
+	spin_lock(&fs_info->block_group_cache_lock);
 	refcount_inc(&caching_ctl->count);
 	list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
-	up_write(&fs_info->commit_root_sem);
+	spin_unlock(&fs_info->block_group_cache_lock);
 
 	btrfs_get_block_group(cache);
 
@@ -999,7 +999,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	if (block_group->cached == BTRFS_CACHE_STARTED)
 		btrfs_wait_block_group_cache_done(block_group);
 	if (block_group->has_caching_ctl) {
-		down_write(&fs_info->commit_root_sem);
+		spin_lock(&fs_info->block_group_cache_lock);
 		if (!caching_ctl) {
 			struct btrfs_caching_control *ctl;
 
@@ -1013,7 +1013,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		}
 		if (caching_ctl)
 			list_del_init(&caching_ctl->list);
-		up_write(&fs_info->commit_root_sem);
+		spin_unlock(&fs_info->block_group_cache_lock);
 		if (caching_ctl) {
 			/* Once for the caching bgs list and once for us. */
 			btrfs_put_caching_control(caching_ctl);
@@ -3311,14 +3311,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 	struct btrfs_caching_control *caching_ctl;
 	struct rb_node *n;
 
-	down_write(&info->commit_root_sem);
+	spin_lock(&info->block_group_cache_lock);
 	while (!list_empty(&info->caching_block_groups)) {
 		caching_ctl = list_entry(info->caching_block_groups.next,
 					 struct btrfs_caching_control, list);
 		list_del(&caching_ctl->list);
 		btrfs_put_caching_control(caching_ctl);
 	}
-	up_write(&info->commit_root_sem);
+	spin_unlock(&info->block_group_cache_lock);
 
 	spin_lock(&info->unused_bgs_lock);
 	while (!list_empty(&info->unused_bgs)) {
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 9ef6cba1eb59..a0cf0e0c4085 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -208,6 +208,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
 	 * the caching thread will re-start it's search from 3, and thus find
 	 * the hole from [4,6) to add to the free space cache.
 	 */
+	spin_lock(&fs_info->block_group_cache_lock);
 	list_for_each_entry_safe(caching_ctl, next,
 				 &fs_info->caching_block_groups, list) {
 		struct btrfs_block_group *cache = caching_ctl->block_group;
@@ -219,6 +220,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
 			cache->last_byte_to_unpin = caching_ctl->progress;
 		}
 	}
+	spin_unlock(&fs_info->block_group_cache_lock);
 	up_write(&fs_info->commit_root_sem);
 }
 
-- 
2.26.2


* Re: [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups
  2020-10-23 13:58 ` [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups Josef Bacik
@ 2020-11-04 13:38   ` Filipe Manana
  0 siblings, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 13:38 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 7:16 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> While fixing up our ->last_byte_to_unpin locking I noticed that we will
> shorten len based on ->last_byte_to_unpin if we're caching when we're
> adding back the free space.  This is correct for the free space, as we
> cannot unpin more than ->last_byte_to_unpin, however we use len to
> adjust the ->bytes_pinned counters and such, which need to track the
> actual pinned usage.  This could result in
> WARN_ON(space_info->bytes_pinned) triggering at unmount time.  Fix this
> by using a local variable for the amount to add to free space cache, and
> leave len untouched in this case.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/extent-tree.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5fd60b13f4f8..a98f484a2fc1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2816,10 +2816,10 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
>                 len = cache->start + cache->length - start;
>                 len = min(len, end + 1 - start);
>
> -               if (start < cache->last_byte_to_unpin) {
> -                       len = min(len, cache->last_byte_to_unpin - start);
> -                       if (return_free_space)
> -                               btrfs_add_free_space(cache, start, len);
> +               if (start < cache->last_byte_to_unpin && return_free_space) {
> +                       u64 add_len = min(len,
> +                                         cache->last_byte_to_unpin - start);
> +                       btrfs_add_free_space(cache, start, add_len);
>                 }
>
>                 start += len;
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

* Re: [PATCH 2/8] btrfs: update last_byte_to_unpin in switch_commit_roots
  2020-10-23 13:58 ` [PATCH 2/8] btrfs: update last_byte_to_unpin in switch_commit_roots Josef Bacik
@ 2020-11-04 15:15   ` Filipe Manana
  0 siblings, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 15:15 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> While writing an explanation for the need of the commit_root_sem for
> btrfs_prepare_extent_commit, I realized we have a slight hole that could
> result in leaked space if we have to do the old style caching.  Consider
> the following scenario
>
>  commit root
>  +----+----+----+----+----+----+----+
>  |\\\\|    |\\\\|\\\\|    |\\\\|\\\\|
>  +----+----+----+----+----+----+----+
>  0    1    2    3    4    5    6    7
>
>  new commit root
>  +----+----+----+----+----+----+----+
>  |    |    |    |\\\\|    |    |\\\\|
>  +----+----+----+----+----+----+----+
>  0    1    2    3    4    5    6    7
>
> Prior to this patch, we run btrfs_prepare_extent_commit, which updates
> the last_byte_to_unpin, and then we subsequently run
> switch_commit_roots.  In this example let's assume that
> caching_ctl->progress == 1 at btrfs_prepare_extent_commit() time, which
> means that cache->last_byte_to_unpin == 1.  Then we go and do the
> switch_commit_roots(), but in the meantime the caching thread has made
> some more progress, because we dropped the commit_root_sem and re-acquired
> it.  Now caching_ctl->progress == 3.  We swap out the commit root and
> carry on to unpin.

Ok, to unpin at btrfs_finish_extent_commit().

So it took me a while to see the race:

1) The caching thread was running using the old commit root when it
found the extent for [2, 3);

2) Then it released the commit_root_sem because it was at the last
item of a leaf and the semaphore was contended, and set ->progress to
3 (the value of 'last'), as the last extent item in that leaf was for
the range [2, 3);

3) The next time it takes the commit_root_sem it starts using the new
commit root and searches for a key with offset 3, so it never finds
the hole for [2, 3).

So the caching thread never saw [2, 3) as free space in any of the
commit roots, and by the time finish_extent_commit() was called for
the range [0, 3), ->last_byte_to_unpin was 1, so it only returned the
subrange [0, 1) to the free space cache, skipping [2, 3).
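
A tiny model of that window, in plain userspace C (nothing below is btrfs
code): anything freed inside [last_byte_to_unpin, progress) is skipped both by
the unpin, which stops at the recorded cut-off, and by the caching thread,
which resumes past it in the new root.

#include <stdio.h>

int main(void)
{
    long long progress = 1;          /* where the caching thread is */
    long long last_byte_to_unpin;

    /* old order: record the cut-off before switching commit roots */
    last_byte_to_unpin = progress;   /* == 1 */

    /* commit_root_sem is dropped and re-taken, caching advances */
    progress = 3;

    /*
     * unpin returns only [0, last_byte_to_unpin) to the free space
     * cache; the caching thread resumes at 'progress' in the new root,
     * so anything freed inside the window below is lost, here the
     * [2, 3) extent from the example.
     */
    printf("old order, lost window: [%lld, %lld)\n",
           last_byte_to_unpin, progress);

    /*
     * new order: the snapshot is taken at switch_commit_roots() time,
     * under the same commit_root_sem hold as the root switch, so it
     * can never lag behind progress.
     */
    last_byte_to_unpin = progress;
    printf("new order, lost window: [%lld, %lld)\n",
           last_byte_to_unpin, progress);
    return 0;
}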

>
> In the unpin code we have last_byte_to_unpin == 1, so we unpin [0,1),
> but do not unpin [2,3).
> However because caching_ctl->progress == 3 we
> do not see the newly free'd section of [2,3), and thus do not add it to
> our free space cache.  This results in us missing a chunk of free space
> in memory.

In memory and on disk too, unless we have a power failure before
writing the free space cache to disk.

>
> Fix this by making sure the ->last_byte_to_unpin is set at the same time
> that we swap the commit roots, this ensures that we will always be
> consistent.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/ctree.h       |  1 -
>  fs/btrfs/extent-tree.c | 25 -------------------------
>  fs/btrfs/transaction.c | 41 +++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 39 insertions(+), 28 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 8a83bce3225c..41c76db65c8e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2592,7 +2592,6 @@ int btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
>                                u64 start, u64 len, int delalloc);
>  int btrfs_pin_reserved_extent(struct btrfs_trans_handle *trans, u64 start,
>                               u64 len);
> -void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info);
>  int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans);
>  int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>                          struct btrfs_ref *generic_ref);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a98f484a2fc1..ee7bceace8b3 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2730,31 +2730,6 @@ btrfs_inc_block_group_reservations(struct btrfs_block_group *bg)
>         atomic_inc(&bg->reservations);
>  }
>
> -void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
> -{
> -       struct btrfs_caching_control *next;
> -       struct btrfs_caching_control *caching_ctl;
> -       struct btrfs_block_group *cache;
> -
> -       down_write(&fs_info->commit_root_sem);
> -
> -       list_for_each_entry_safe(caching_ctl, next,
> -                                &fs_info->caching_block_groups, list) {
> -               cache = caching_ctl->block_group;
> -               if (btrfs_block_group_done(cache)) {
> -                       cache->last_byte_to_unpin = (u64)-1;
> -                       list_del_init(&caching_ctl->list);
> -                       btrfs_put_caching_control(caching_ctl);
> -               } else {
> -                       cache->last_byte_to_unpin = caching_ctl->progress;
> -               }
> -       }
> -
> -       up_write(&fs_info->commit_root_sem);
> -
> -       btrfs_update_global_block_rsv(fs_info);
> -}
> -
>  /*
>   * Returns the free cluster for the given space info and sets empty_cluster to
>   * what it should be based on the mount options.
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 52ada47aff50..9ef6cba1eb59 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -155,6 +155,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
>         struct btrfs_transaction *cur_trans = trans->transaction;
>         struct btrfs_fs_info *fs_info = trans->fs_info;
>         struct btrfs_root *root, *tmp;
> +       struct btrfs_caching_control *caching_ctl, *next;
>
>         down_write(&fs_info->commit_root_sem);
>         list_for_each_entry_safe(root, tmp, &cur_trans->switch_commits,
> @@ -180,6 +181,44 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
>                 spin_lock(&cur_trans->dropped_roots_lock);
>         }
>         spin_unlock(&cur_trans->dropped_roots_lock);
> +
> +       /*
> +        * We have to update the last_byte_to_unpin under the commit_root_sem,
> +        * at the same time we swap out the commit roots.
> +        *
> +        * This is because we must have a real view of the last spot the caching
> +        * kthreads were while caching.  Consider the following views of the
> +        * extent tree for a block group
> +        *
> +        * commit root
> +        * +----+----+----+----+----+----+----+
> +        * |\\\\|    |\\\\|\\\\|    |\\\\|\\\\|
> +        * +----+----+----+----+----+----+----+
> +        * 0    1    2    3    4    5    6    7
> +        *
> +        * new commit root
> +        * +----+----+----+----+----+----+----+
> +        * |    |    |    |\\\\|    |    |\\\\|
> +        * +----+----+----+----+----+----+----+
> +        * 0    1    2    3    4    5    6    7
> +        *
> +        * If the cache_ctl->progress was at 3, then we are only allowed to
> +        * unpin [0,1) and [2,3], because the caching thread has already
> +        * processed those extents.  We are not allowed to unpin [5,6), because
> +        * the caching thread will re-start it's search from 3, and thus find
> +        * the hole from [4,6) to add to the free space cache.
> +        */
> +       list_for_each_entry_safe(caching_ctl, next,
> +                                &fs_info->caching_block_groups, list) {
> +               struct btrfs_block_group *cache = caching_ctl->block_group;
> +               if (btrfs_block_group_done(cache)) {
> +                       cache->last_byte_to_unpin = (u64)-1;
> +                       list_del_init(&caching_ctl->list);
> +                       btrfs_put_caching_control(caching_ctl);
> +               } else {
> +                       cache->last_byte_to_unpin = caching_ctl->progress;
> +               }
> +       }
>         up_write(&fs_info->commit_root_sem);
>  }
>
> @@ -2293,8 +2332,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>                 goto unlock_tree_log;
>         }
>
> -       btrfs_prepare_extent_commit(fs_info);
> -
>         cur_trans = fs_info->running_transaction;
>
>         btrfs_set_root_node(&fs_info->tree_root->root_item,
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range
  2020-10-23 13:58 ` [PATCH 3/8] btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range Josef Bacik
@ 2020-11-04 15:36   ` Filipe Manana
  0 siblings, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 15:36 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Currently unpin_extent_range happens in the transaction commit context,
> so we are protected from ->last_byte_to_unpin changing while we're
> unpinning, because any new transactions would have to wait for us to
> complete before modifying ->last_byte_to_unpin.
>
> However in the future we may want to change how this works, for instance
> with async unpinning or other such TODO items.  To prepare for that
> future explicitly protect ->last_byte_to_unpin with the commit_root_sem
> so we are sure it won't change while we're doing our work.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Ok, looks good, thanks.
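
The pattern being established can be sketched in userspace with a pthread
rwlock standing in for the commit_root_sem (illustrative only, not btrfs
code): the cut-off is read under the read side for the whole compare-and-add,
and written only under the write side in the commit path.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t commit_root_sem = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long long last_byte_to_unpin;

/* unpin path: reader, mirrors the hunk below */
static void unpin_range(unsigned long long start, unsigned long long len)
{
    pthread_rwlock_rdlock(&commit_root_sem);
    if (start < last_byte_to_unpin) {
        unsigned long long add_len = last_byte_to_unpin - start;

        if (add_len > len)
            add_len = len;
        printf("return [%llu, %llu) to the free space cache\n",
               start, start + add_len);
    }
    pthread_rwlock_unlock(&commit_root_sem);
}

/* commit path: writer, what switch_commit_roots() does after patch 2 */
static void update_cutoff(unsigned long long new_progress)
{
    pthread_rwlock_wrlock(&commit_root_sem);
    last_byte_to_unpin = new_progress;
    pthread_rwlock_unlock(&commit_root_sem);
}

int main(void)
{
    update_cutoff(3);
    unpin_range(0, 4);
    return 0;
}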

> ---
>  fs/btrfs/extent-tree.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index ee7bceace8b3..5d3564b077bf 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2791,11 +2791,13 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
>                 len = cache->start + cache->length - start;
>                 len = min(len, end + 1 - start);
>
> +               down_read(&fs_info->commit_root_sem);
>                 if (start < cache->last_byte_to_unpin && return_free_space) {
>                         u64 add_len = min(len,
>                                           cache->last_byte_to_unpin - start);
>                         btrfs_add_free_space(cache, start, add_len);
>                 }
> +               up_read(&fs_info->commit_root_sem);
>
>                 start += len;
>                 total_unpinned += len;
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage
  2020-10-23 13:58 ` [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage Josef Bacik
@ 2020-11-04 15:54   ` Filipe Manana
  2020-11-04 17:36     ` Amy Parker
  2020-11-04 18:21     ` Josef Bacik
  0 siblings, 2 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 15:54 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> This passes in the block_group and the free_space_ctl, but we can get
> this from the block group itself.  Part of this is because we call it
> from __load_free_space_cache, which can be called for the inode cache as
> well.  Move that call into the block group specific load section, wrap
> it in the right lock that we need, and fix up the arguments to only take
> the block group.  Add a lockdep_assert as well for good measure to make
> sure we don't mess up the locking again.

So this is actually 2 different things in one patch:

1) A cleanup to remove an unnecessary argument to
btrfs_discard_update_discardable();

2) A bug because btrfs_discard_update_discardable() is not being
called with the lock ->tree_lock held in one specific context.

Shouldn't we really have this split into two patches?

Other than that, it looks good. Thanks.

>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/discard.c          |  7 ++++---
>  fs/btrfs/discard.h          |  3 +--
>  fs/btrfs/free-space-cache.c | 14 ++++++++------
>  3 files changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index 741c7e19c32f..5a88b584276f 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -563,15 +563,14 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
>  /**
>   * btrfs_discard_update_discardable - propagate discard counters
>   * @block_group: block_group of interest
> - * @ctl: free_space_ctl of @block_group
>   *
>   * This propagates deltas of counters up to the discard_ctl.  It maintains a
>   * current counter and a previous counter passing the delta up to the global
>   * stat.  Then the current counter value becomes the previous counter value.
>   */
> -void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
> -                                     struct btrfs_free_space_ctl *ctl)
> +void btrfs_discard_update_discardable(struct btrfs_block_group *block_group)
>  {
> +       struct btrfs_free_space_ctl *ctl;
>         struct btrfs_discard_ctl *discard_ctl;
>         s32 extents_delta;
>         s64 bytes_delta;
> @@ -581,8 +580,10 @@ void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
>             !btrfs_is_block_group_data_only(block_group))
>                 return;
>
> +       ctl = block_group->free_space_ctl;
>         discard_ctl = &block_group->fs_info->discard_ctl;
>
> +       lockdep_assert_held(&ctl->tree_lock);
>         extents_delta = ctl->discardable_extents[BTRFS_STAT_CURR] -
>                         ctl->discardable_extents[BTRFS_STAT_PREV];
>         if (extents_delta) {
> diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
> index 353228d62f5a..57b9202f427f 100644
> --- a/fs/btrfs/discard.h
> +++ b/fs/btrfs/discard.h
> @@ -28,8 +28,7 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl);
>
>  /* Update operations */
>  void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl);
> -void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
> -                                     struct btrfs_free_space_ctl *ctl);
> +void btrfs_discard_update_discardable(struct btrfs_block_group *block_group);
>
>  /* Setup/cleanup operations */
>  void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info);
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 5ea36a06e514..0787339c7b93 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -828,7 +828,6 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
>         merge_space_tree(ctl);
>         ret = 1;
>  out:
> -       btrfs_discard_update_discardable(ctl->private, ctl);
>         io_ctl_free(&io_ctl);
>         return ret;
>  free_cache:
> @@ -929,6 +928,9 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
>                            block_group->start);
>         }
>
> +       spin_lock(&ctl->tree_lock);
> +       btrfs_discard_update_discardable(block_group);
> +       spin_unlock(&ctl->tree_lock);
>         iput(inode);
>         return ret;
>  }
> @@ -2508,7 +2510,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
>         if (ret)
>                 kmem_cache_free(btrfs_free_space_cachep, info);
>  out:
> -       btrfs_discard_update_discardable(block_group, ctl);
> +       btrfs_discard_update_discardable(block_group);
>         spin_unlock(&ctl->tree_lock);
>
>         if (ret) {
> @@ -2643,7 +2645,7 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group,
>                 goto again;
>         }
>  out_lock:
> -       btrfs_discard_update_discardable(block_group, ctl);
> +       btrfs_discard_update_discardable(block_group);
>         spin_unlock(&ctl->tree_lock);
>  out:
>         return ret;
> @@ -2779,7 +2781,7 @@ void __btrfs_remove_free_space_cache(struct btrfs_free_space_ctl *ctl)
>         spin_lock(&ctl->tree_lock);
>         __btrfs_remove_free_space_cache_locked(ctl);
>         if (ctl->private)
> -               btrfs_discard_update_discardable(ctl->private, ctl);
> +               btrfs_discard_update_discardable(ctl->private);
>         spin_unlock(&ctl->tree_lock);
>  }
>
> @@ -2801,7 +2803,7 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group *block_group)
>                 cond_resched_lock(&ctl->tree_lock);
>         }
>         __btrfs_remove_free_space_cache_locked(ctl);
> -       btrfs_discard_update_discardable(block_group, ctl);
> +       btrfs_discard_update_discardable(block_group);
>         spin_unlock(&ctl->tree_lock);
>
>  }
> @@ -2885,7 +2887,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
>                         link_free_space(ctl, entry);
>         }
>  out:
> -       btrfs_discard_update_discardable(block_group, ctl);
> +       btrfs_discard_update_discardable(block_group);
>         spin_unlock(&ctl->tree_lock);
>
>         if (align_gap_len)
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/8] btrfs: load free space cache into a temporary ctl
  2020-10-23 13:58 ` [PATCH 5/8] btrfs: load free space cache into a temporary ctl Josef Bacik
@ 2020-11-04 16:20   ` Filipe Manana
  0 siblings, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 16:20 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> The free space cache has been special in that we would load it right
> away instead of farming the work off to a worker thread.  This resulted
> in some weirdness that had to be taken into account for this fact,
> namely that if we ever found a block group being cached the fast way we
> had to wait for it to finish, because we could get the cache before it
> had been validated and we may throw the cache away.
>
> To handle this particular case, instead create a temporary
> btrfs_free_space_ctl to load the free space cache into.  Then once we've
> validated that it makes sense, copy its contents into the actual
> block_group->free_space_ctl.  This allows us to avoid the problems of
> needing to wait for the caching to complete, we can clean up the discard
> extent handling stuff in __load_free_space_cache, and we no longer need
> to do the merge_space_tree() because the space is added one by one into
> the real free_space_ctl.  This will allow further reworks of how we
> handle loading the free space cache.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.
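
Reduced to a userspace sketch (the struct and helper names are made up, and
the real code re-adds entries one by one in copy_free_space_cache() rather
than doing a memcpy), the shape of the change is: load into a scratch copy,
validate against what the block group says should be free, and only then
publish it.

#include <stdio.h>
#include <string.h>

struct space_ctl {
    unsigned long long free_space;
    /* the real ctl also has an rb-tree of free space entries */
};

/* stand-in for __load_free_space_cache(); may read a stale cache */
static int load_from_disk(struct space_ctl *ctl)
{
    ctl->free_space = 1024;
    return 0;
}

static int load_cache(struct space_ctl *live, unsigned long long expected)
{
    struct space_ctl tmp = { 0 };

    if (load_from_disk(&tmp))
        return -1;

    /*
     * Only publish when the totals add up, otherwise drop the scratch
     * copy and let the slow caching path rebuild everything.
     */
    if (tmp.free_space != expected) {
        fprintf(stderr, "wrong amount of free space, ignoring cache\n");
        return 0;
    }

    memcpy(live, &tmp, sizeof(tmp));   /* copy_free_space_cache() analogue */
    return 1;
}

int main(void)
{
    struct space_ctl ctl = { 0 };

    if (load_cache(&ctl, 1024) == 1)
        printf("loaded: %llu bytes free\n", ctl.free_space);
    return 0;
}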

> ---
>  fs/btrfs/block-group.c       |  29 +------
>  fs/btrfs/free-space-cache.c  | 155 +++++++++++++++--------------------
>  fs/btrfs/free-space-cache.h  |   3 +-
>  fs/btrfs/tests/btrfs-tests.c |   2 +-
>  4 files changed, 70 insertions(+), 119 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index bb6685711824..adbd18dc08a1 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -695,33 +695,6 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>         btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
>
>         spin_lock(&cache->lock);
> -       /*
> -        * This should be a rare occasion, but this could happen I think in the
> -        * case where one thread starts to load the space cache info, and then
> -        * some other thread starts a transaction commit which tries to do an
> -        * allocation while the other thread is still loading the space cache
> -        * info.  The previous loop should have kept us from choosing this block
> -        * group, but if we've moved to the state where we will wait on caching
> -        * block groups we need to first check if we're doing a fast load here,
> -        * so we can wait for it to finish, otherwise we could end up allocating
> -        * from a block group who's cache gets evicted for one reason or
> -        * another.
> -        */
> -       while (cache->cached == BTRFS_CACHE_FAST) {
> -               struct btrfs_caching_control *ctl;
> -
> -               ctl = cache->caching_ctl;
> -               refcount_inc(&ctl->count);
> -               prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
> -               spin_unlock(&cache->lock);
> -
> -               schedule();
> -
> -               finish_wait(&ctl->wait, &wait);
> -               btrfs_put_caching_control(ctl);
> -               spin_lock(&cache->lock);
> -       }
> -
>         if (cache->cached != BTRFS_CACHE_NO) {
>                 spin_unlock(&cache->lock);
>                 kfree(caching_ctl);
> @@ -1805,7 +1778,7 @@ static struct btrfs_block_group *btrfs_create_block_group_cache(
>         INIT_LIST_HEAD(&cache->discard_list);
>         INIT_LIST_HEAD(&cache->dirty_list);
>         INIT_LIST_HEAD(&cache->io_list);
> -       btrfs_init_free_space_ctl(cache);
> +       btrfs_init_free_space_ctl(cache, cache->free_space_ctl);
>         atomic_set(&cache->frozen, 0);
>         mutex_init(&cache->free_space_lock);
>         btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root);
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 0787339c7b93..58bd2d3e54db 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -33,8 +33,6 @@ struct btrfs_trim_range {
>         struct list_head list;
>  };
>
> -static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl,
> -                               struct btrfs_free_space *bitmap_info);
>  static int link_free_space(struct btrfs_free_space_ctl *ctl,
>                            struct btrfs_free_space *info);
>  static void unlink_free_space(struct btrfs_free_space_ctl *ctl,
> @@ -43,6 +41,14 @@ static int btrfs_wait_cache_io_root(struct btrfs_root *root,
>                              struct btrfs_trans_handle *trans,
>                              struct btrfs_io_ctl *io_ctl,
>                              struct btrfs_path *path);
> +static int search_bitmap(struct btrfs_free_space_ctl *ctl,
> +                        struct btrfs_free_space *bitmap_info, u64 *offset,
> +                        u64 *bytes, bool for_alloc);
> +static void free_bitmap(struct btrfs_free_space_ctl *ctl,
> +                       struct btrfs_free_space *bitmap_info);
> +static void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl,
> +                             struct btrfs_free_space *info, u64 offset,
> +                             u64 bytes);
>
>  static struct inode *__lookup_free_space_inode(struct btrfs_root *root,
>                                                struct btrfs_path *path,
> @@ -625,44 +631,6 @@ static int io_ctl_read_bitmap(struct btrfs_io_ctl *io_ctl,
>         return 0;
>  }
>
> -/*
> - * Since we attach pinned extents after the fact we can have contiguous sections
> - * of free space that are split up in entries.  This poses a problem with the
> - * tree logging stuff since it could have allocated across what appears to be 2
> - * entries since we would have merged the entries when adding the pinned extents
> - * back to the free space cache.  So run through the space cache that we just
> - * loaded and merge contiguous entries.  This will make the log replay stuff not
> - * blow up and it will make for nicer allocator behavior.
> - */
> -static void merge_space_tree(struct btrfs_free_space_ctl *ctl)
> -{
> -       struct btrfs_free_space *e, *prev = NULL;
> -       struct rb_node *n;
> -
> -again:
> -       spin_lock(&ctl->tree_lock);
> -       for (n = rb_first(&ctl->free_space_offset); n; n = rb_next(n)) {
> -               e = rb_entry(n, struct btrfs_free_space, offset_index);
> -               if (!prev)
> -                       goto next;
> -               if (e->bitmap || prev->bitmap)
> -                       goto next;
> -               if (prev->offset + prev->bytes == e->offset) {
> -                       unlink_free_space(ctl, prev);
> -                       unlink_free_space(ctl, e);
> -                       prev->bytes += e->bytes;
> -                       kmem_cache_free(btrfs_free_space_cachep, e);
> -                       link_free_space(ctl, prev);
> -                       prev = NULL;
> -                       spin_unlock(&ctl->tree_lock);
> -                       goto again;
> -               }
> -next:
> -               prev = e;
> -       }
> -       spin_unlock(&ctl->tree_lock);
> -}
> -
>  static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
>                                    struct btrfs_free_space_ctl *ctl,
>                                    struct btrfs_path *path, u64 offset)
> @@ -753,16 +721,6 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
>                         goto free_cache;
>                 }
>
> -               /*
> -                * Sync discard ensures that the free space cache is always
> -                * trimmed.  So when reading this in, the state should reflect
> -                * that.  We also do this for async as a stop gap for lack of
> -                * persistence.
> -                */
> -               if (btrfs_test_opt(fs_info, DISCARD_SYNC) ||
> -                   btrfs_test_opt(fs_info, DISCARD_ASYNC))
> -                       e->trim_state = BTRFS_TRIM_STATE_TRIMMED;
> -
>                 if (!e->bytes) {
>                         kmem_cache_free(btrfs_free_space_cachep, e);
>                         goto free_cache;
> @@ -816,16 +774,9 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
>                 ret = io_ctl_read_bitmap(&io_ctl, e);
>                 if (ret)
>                         goto free_cache;
> -               e->bitmap_extents = count_bitmap_extents(ctl, e);
> -               if (!btrfs_free_space_trimmed(e)) {
> -                       ctl->discardable_extents[BTRFS_STAT_CURR] +=
> -                               e->bitmap_extents;
> -                       ctl->discardable_bytes[BTRFS_STAT_CURR] += e->bytes;
> -               }
>         }
>
>         io_ctl_drop_pages(&io_ctl);
> -       merge_space_tree(ctl);
>         ret = 1;
>  out:
>         io_ctl_free(&io_ctl);
> @@ -836,16 +787,59 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
>         goto out;
>  }
>
> +static int copy_free_space_cache(struct btrfs_block_group *block_group,
> +                                struct btrfs_free_space_ctl *ctl)
> +{
> +       struct btrfs_free_space *info;
> +       struct rb_node *n;
> +       int ret = 0;
> +
> +       while (!ret && (n = rb_first(&ctl->free_space_offset)) != NULL) {
> +               info = rb_entry(n, struct btrfs_free_space, offset_index);
> +               if (!info->bitmap) {
> +                       unlink_free_space(ctl, info);
> +                       ret = btrfs_add_free_space(block_group, info->offset,
> +                                                  info->bytes);
> +                       kmem_cache_free(btrfs_free_space_cachep, info);
> +               } else {
> +                       u64 offset = info->offset;
> +                       u64 bytes = ctl->unit;
> +
> +                       while (search_bitmap(ctl, info, &offset, &bytes,
> +                                            false) == 0) {
> +                               ret = btrfs_add_free_space(block_group, offset,
> +                                                          bytes);
> +                               if (ret)
> +                                       break;
> +                               bitmap_clear_bits(ctl, info, offset, bytes);
> +                               offset = info->offset;
> +                               bytes = ctl->unit;
> +                       }
> +                       free_bitmap(ctl, info);
> +               }
> +               cond_resched();
> +       }
> +       return ret;
> +}
> +
>  int load_free_space_cache(struct btrfs_block_group *block_group)
>  {
>         struct btrfs_fs_info *fs_info = block_group->fs_info;
>         struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
> +       struct btrfs_free_space_ctl tmp_ctl = {};
>         struct inode *inode;
>         struct btrfs_path *path;
>         int ret = 0;
>         bool matched;
>         u64 used = block_group->used;
>
> +       /*
> +        * Because we could potentially discard our loaded free space, we want
> +        * to load everything into a temporary structure first, and then if it's
> +        * valid copy it all into the actual free space ctl.
> +        */
> +       btrfs_init_free_space_ctl(block_group, &tmp_ctl);
> +
>         /*
>          * If this block group has been marked to be cleared for one reason or
>          * another then we can't trust the on disk cache, so just return.
> @@ -897,19 +891,25 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
>         }
>         spin_unlock(&block_group->lock);
>
> -       ret = __load_free_space_cache(fs_info->tree_root, inode, ctl,
> +       ret = __load_free_space_cache(fs_info->tree_root, inode, &tmp_ctl,
>                                       path, block_group->start);
>         btrfs_free_path(path);
>         if (ret <= 0)
>                 goto out;
>
> -       spin_lock(&ctl->tree_lock);
> -       matched = (ctl->free_space == (block_group->length - used -
> -                                      block_group->bytes_super));
> -       spin_unlock(&ctl->tree_lock);
> +       matched = (tmp_ctl.free_space == (block_group->length - used -
> +                                         block_group->bytes_super));
>
> -       if (!matched) {
> -               __btrfs_remove_free_space_cache(ctl);
> +       if (matched) {
> +               ret = copy_free_space_cache(block_group, &tmp_ctl);
> +               /*
> +                * ret == 1 means we successfully loaded the free space cache,
> +                * so we need to re-set it here.
> +                */
> +               if (ret == 0)
> +                       ret = 1;
> +       } else {
> +               __btrfs_remove_free_space_cache(&tmp_ctl);
>                 btrfs_warn(fs_info,
>                            "block group %llu has wrong amount of free space",
>                            block_group->start);
> @@ -1914,29 +1914,6 @@ find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes,
>         return NULL;
>  }
>
> -static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl,
> -                               struct btrfs_free_space *bitmap_info)
> -{
> -       struct btrfs_block_group *block_group = ctl->private;
> -       u64 bytes = bitmap_info->bytes;
> -       unsigned int rs, re;
> -       int count = 0;
> -
> -       if (!block_group || !bytes)
> -               return count;
> -
> -       bitmap_for_each_set_region(bitmap_info->bitmap, rs, re, 0,
> -                                  BITS_PER_BITMAP) {
> -               bytes -= (rs - re) * ctl->unit;
> -               count++;
> -
> -               if (!bytes)
> -                       break;
> -       }
> -
> -       return count;
> -}
> -
>  static void add_new_bitmap(struct btrfs_free_space_ctl *ctl,
>                            struct btrfs_free_space *info, u64 offset)
>  {
> @@ -2676,10 +2653,10 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group,
>                    "%d blocks of free space at or bigger than bytes is", count);
>  }
>
> -void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group)
> +void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
> +                              struct btrfs_free_space_ctl *ctl)
>  {
>         struct btrfs_fs_info *fs_info = block_group->fs_info;
> -       struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
>
>         spin_lock_init(&ctl->tree_lock);
>         ctl->unit = fs_info->sectorsize;
> diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> index e3d5e0ad8f8e..bf8d127d2407 100644
> --- a/fs/btrfs/free-space-cache.h
> +++ b/fs/btrfs/free-space-cache.h
> @@ -109,7 +109,8 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>                               struct btrfs_path *path,
>                               struct inode *inode);
>
> -void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group);
> +void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
> +                              struct btrfs_free_space_ctl *ctl);
>  int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
>                            struct btrfs_free_space_ctl *ctl,
>                            u64 bytenr, u64 size,
> diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
> index 999c14e5d0bd..8519f7746b2e 100644
> --- a/fs/btrfs/tests/btrfs-tests.c
> +++ b/fs/btrfs/tests/btrfs-tests.c
> @@ -224,7 +224,7 @@ btrfs_alloc_dummy_block_group(struct btrfs_fs_info *fs_info,
>         INIT_LIST_HEAD(&cache->list);
>         INIT_LIST_HEAD(&cache->cluster_list);
>         INIT_LIST_HEAD(&cache->bg_list);
> -       btrfs_init_free_space_ctl(cache);
> +       btrfs_init_free_space_ctl(cache, cache->free_space_ctl);
>         mutex_init(&cache->free_space_lock);
>
>         return cache;
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/8] btrfs: load the free space cache inode extents from commit root
  2020-10-23 13:58 ` [PATCH 6/8] btrfs: load the free space cache inode extents from commit root Josef Bacik
@ 2020-11-04 16:27   ` Filipe Manana
  0 siblings, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 16:27 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 5:14 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Historically we've allowed recursive locking specifically for the free
> space inode.  This is because we are only doing reads and know that it's
> safe.  However we don't actually need this feature, we can get away with
> reading the commit root for the extents.  In fact if we want to allow
> asynchronous loading of the free space cache we have to use the commit
> root, otherwise we will deadlock.
>
> Switch to using the commit root for the file extents.  These are only
> read at load time, and are replaced as soon as we start writing the
> cache out to disk.  The cache is never read again, so this is
> legitimate.  This matches what we do for the inode itself, as we read
> that from the commit root as well.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/inode.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 1dcccd212809..53d6a66670d3 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6577,7 +6577,15 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
>          */
>         path->leave_spinning = 1;
>
> -       path->recurse = btrfs_is_free_space_inode(inode);
> +       /*
> +        * The same explanation in load_free_space_cache applies here as well,
> +        * we only read when we're loading the free space cache, and at that
> +        * point the commit_root has everything we need.
> +        */
> +       if (btrfs_is_free_space_inode(inode)) {
> +               path->search_commit_root = 1;
> +               path->skip_locking = 1;
> +       }
>
>         ret = btrfs_lookup_file_extent(NULL, root, path, objectid, start, 0);
>         if (ret < 0) {
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage
  2020-11-04 15:54   ` Filipe Manana
@ 2020-11-04 17:36     ` Amy Parker
  2020-11-04 18:21     ` Josef Bacik
  1 sibling, 0 replies; 22+ messages in thread
From: Amy Parker @ 2020-11-04 17:36 UTC (permalink / raw)
  To: fdmanana; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Wed, Nov 4, 2020 at 7:56 AM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > This passes in the block_group and the free_space_ctl, but we can get
> > this from the block group itself.  Part of this is because we call it
> > from __load_free_space_cache, which can be called for the inode cache as
> > well.  Move that call into the block group specific load section, wrap
> > it in the right lock that we need, and fix up the arguments to only take
> > the block group.  Add a lockdep_assert as well for good measure to make
> > sure we don't mess up the locking again.
>
> So this is actually 2 different things in one patch:
>
> 1) A cleanup to remove an unnecessary argument to
> btrfs_discard_update_discardable();
>
> 2) A bug because btrfs_discard_update_discardable() is not being
> called with the lock ->tree_lock held in one specific context.
>
> Shouldn't we really have this split in two patches?

Absolutely, yes. We should split this into two patches.

>
> Other than that, it looks good. Thanks.
> [...]

Best regards,
Amy Parker
(they/them)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/8] btrfs: async load free space cache
  2020-10-23 13:58 ` [PATCH 7/8] btrfs: async load free space cache Josef Bacik
@ 2020-11-04 18:02   ` Filipe Manana
  0 siblings, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 18:02 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 5:13 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> While documenting the usage of the commit_root_sem, I noticed that we do
> not actually take the commit_root_sem in the case of the free space
> cache.  This is problematic because we're supposed to hold that sem
> while we're reading the commit roots, which is what we do for the free
> space cache.
>
> The reason I did it inline when I originally wrote the code was because
> there's the case of unpinning where we need to make sure that the free
> space cache is loaded if we're going to use the free space cache.  But
> we can accomplish the same thing by simply waiting for the cache to be
> loaded.
>
> Rework this code to load the free space cache asynchronously.  This
> allows us to greatly cleanup the caching code because now it's all
> shared by the various caching methods.  We also are now in a position to
> have the commit_root semaphore held while we're loading the free space
> cache.  And finally our modification of ->last_byte_to_unpin is removed
> because it can be handled in the proper way on commit.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.
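
The waiting side can be pictured with plain pthreads (the state names follow
the patch, everything else is made up, and the wait condition is simplified to
also cover the not-yet-started case): callers that used to load the cache
inline now just wait until the block group moves past CACHE_FAST.

#include <pthread.h>
#include <stdio.h>

enum { CACHE_NO, CACHE_FAST, CACHE_STARTED, CACHE_FINISHED };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int cached = CACHE_NO;

static void set_cached(int state)
{
    pthread_mutex_lock(&lock);
    cached = state;
    pthread_cond_broadcast(&cond);     /* wake_up(&caching_ctl->wait) */
    pthread_mutex_unlock(&lock);
}

/* the async worker: try the v1 cache, then fall back to slow caching */
static void *caching_thread(void *arg)
{
    (void)arg;
    set_cached(CACHE_FAST);
    /* ... load_free_space_cache() under commit_root_sem here ... */
    set_cached(CACHE_FINISHED);        /* or CACHE_STARTED on failure */
    return NULL;
}

/* btrfs_wait_space_cache_v1_finished() analogue */
static void wait_space_cache_v1_finished(void)
{
    pthread_mutex_lock(&lock);
    while (cached == CACHE_NO || cached == CACHE_FAST)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, caching_thread, NULL);
    wait_space_cache_v1_finished();    /* load_cache_only callers block here */
    printf("cached = %d\n", cached);
    pthread_join(t, NULL);
    return 0;
}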

> ---
>  fs/btrfs/block-group.c | 123 ++++++++++++++++++-----------------------
>  1 file changed, 53 insertions(+), 70 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index adbd18dc08a1..ba6564f67d9a 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -424,6 +424,23 @@ int btrfs_wait_block_group_cache_done(struct btrfs_block_group *cache)
>         return ret;
>  }
>
> +static bool space_cache_v1_done(struct btrfs_block_group *cache)
> +{
> +       bool ret;
> +
> +       spin_lock(&cache->lock);
> +       ret = cache->cached != BTRFS_CACHE_FAST;
> +       spin_unlock(&cache->lock);
> +
> +       return ret;
> +}
> +
> +static void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
> +                               struct btrfs_caching_control *caching_ctl)
> +{
> +       wait_event(caching_ctl->wait, space_cache_v1_done(cache));
> +}
> +
>  #ifdef CONFIG_BTRFS_DEBUG
>  static void fragment_free_space(struct btrfs_block_group *block_group)
>  {
> @@ -639,11 +656,28 @@ static noinline void caching_thread(struct btrfs_work *work)
>         mutex_lock(&caching_ctl->mutex);
>         down_read(&fs_info->commit_root_sem);
>
> +       if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
> +               ret = load_free_space_cache(block_group);
> +               if (ret == 1) {
> +                       ret = 0;
> +                       goto done;
> +               }
> +
> +               /*
> +                * We failed to load the space cache, set ourselves to
> +                * CACHE_STARTED and carry on.
> +                */
> +               spin_lock(&block_group->lock);
> +               block_group->cached = BTRFS_CACHE_STARTED;
> +               spin_unlock(&block_group->lock);
> +               wake_up(&caching_ctl->wait);
> +       }
> +
>         if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
>                 ret = load_free_space_tree(caching_ctl);
>         else
>                 ret = load_extent_tree_free(caching_ctl);
> -
> +done:
>         spin_lock(&block_group->lock);
>         block_group->caching_ctl = NULL;
>         block_group->cached = ret ? BTRFS_CACHE_ERROR : BTRFS_CACHE_FINISHED;
> @@ -679,7 +713,7 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>  {
>         DEFINE_WAIT(wait);
>         struct btrfs_fs_info *fs_info = cache->fs_info;
> -       struct btrfs_caching_control *caching_ctl;
> +       struct btrfs_caching_control *caching_ctl = NULL;
>         int ret = 0;
>
>         caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS);
> @@ -691,84 +725,28 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>         init_waitqueue_head(&caching_ctl->wait);
>         caching_ctl->block_group = cache;
>         caching_ctl->progress = cache->start;
> -       refcount_set(&caching_ctl->count, 1);
> +       refcount_set(&caching_ctl->count, 2);
>         btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
>
>         spin_lock(&cache->lock);
>         if (cache->cached != BTRFS_CACHE_NO) {
> -               spin_unlock(&cache->lock);
>                 kfree(caching_ctl);
> -               return 0;
> +
> +               caching_ctl = cache->caching_ctl;
> +               if (caching_ctl)
> +                       refcount_inc(&caching_ctl->count);
> +               spin_unlock(&cache->lock);
> +               goto out;
>         }
>         WARN_ON(cache->caching_ctl);
>         cache->caching_ctl = caching_ctl;
> -       cache->cached = BTRFS_CACHE_FAST;
> +       if (btrfs_test_opt(fs_info, SPACE_CACHE))
> +               cache->cached = BTRFS_CACHE_FAST;
> +       else
> +               cache->cached = BTRFS_CACHE_STARTED;
> +       cache->has_caching_ctl = 1;
>         spin_unlock(&cache->lock);
>
> -       if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
> -               mutex_lock(&caching_ctl->mutex);
> -               ret = load_free_space_cache(cache);
> -
> -               spin_lock(&cache->lock);
> -               if (ret == 1) {
> -                       cache->caching_ctl = NULL;
> -                       cache->cached = BTRFS_CACHE_FINISHED;
> -                       cache->last_byte_to_unpin = (u64)-1;
> -                       caching_ctl->progress = (u64)-1;
> -               } else {
> -                       if (load_cache_only) {
> -                               cache->caching_ctl = NULL;
> -                               cache->cached = BTRFS_CACHE_NO;
> -                       } else {
> -                               cache->cached = BTRFS_CACHE_STARTED;
> -                               cache->has_caching_ctl = 1;
> -                       }
> -               }
> -               spin_unlock(&cache->lock);
> -#ifdef CONFIG_BTRFS_DEBUG
> -               if (ret == 1 &&
> -                   btrfs_should_fragment_free_space(cache)) {
> -                       u64 bytes_used;
> -
> -                       spin_lock(&cache->space_info->lock);
> -                       spin_lock(&cache->lock);
> -                       bytes_used = cache->length - cache->used;
> -                       cache->space_info->bytes_used += bytes_used >> 1;
> -                       spin_unlock(&cache->lock);
> -                       spin_unlock(&cache->space_info->lock);
> -                       fragment_free_space(cache);
> -               }
> -#endif
> -               mutex_unlock(&caching_ctl->mutex);
> -
> -               wake_up(&caching_ctl->wait);
> -               if (ret == 1) {
> -                       btrfs_put_caching_control(caching_ctl);
> -                       btrfs_free_excluded_extents(cache);
> -                       return 0;
> -               }
> -       } else {
> -               /*
> -                * We're either using the free space tree or no caching at all.
> -                * Set cached to the appropriate value and wakeup any waiters.
> -                */
> -               spin_lock(&cache->lock);
> -               if (load_cache_only) {
> -                       cache->caching_ctl = NULL;
> -                       cache->cached = BTRFS_CACHE_NO;
> -               } else {
> -                       cache->cached = BTRFS_CACHE_STARTED;
> -                       cache->has_caching_ctl = 1;
> -               }
> -               spin_unlock(&cache->lock);
> -               wake_up(&caching_ctl->wait);
> -       }
> -
> -       if (load_cache_only) {
> -               btrfs_put_caching_control(caching_ctl);
> -               return 0;
> -       }
> -
>         down_write(&fs_info->commit_root_sem);
>         refcount_inc(&caching_ctl->count);
>         list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
> @@ -777,6 +755,11 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>         btrfs_get_block_group(cache);
>
>         btrfs_queue_work(fs_info->caching_workers, &caching_ctl->work);
> +out:
> +       if (load_cache_only && caching_ctl)
> +               btrfs_wait_space_cache_v1_finished(cache, caching_ctl);
> +       if (caching_ctl)
> +               btrfs_put_caching_control(caching_ctl);
>
>         return ret;
>  }
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage
  2020-11-04 15:54   ` Filipe Manana
  2020-11-04 17:36     ` Amy Parker
@ 2020-11-04 18:21     ` Josef Bacik
  2020-11-04 18:28       ` Filipe Manana
  1 sibling, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2020-11-04 18:21 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team

On 11/4/20 10:54 AM, Filipe Manana wrote:
> On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> This passes in the block_group and the free_space_ctl, but we can get
>> this from the block group itself.  Part of this is because we call it
>> from __load_free_space_cache, which can be called for the inode cache as
>> well.  Move that call into the block group specific load section, wrap
>> it in the right lock that we need, and fix up the arguments to only take
>> the block group.  Add a lockdep_assert as well for good measure to make
>> sure we don't mess up the locking again.
> 
> So this is actually 2 different things in one patch:
> 
> 1) A cleanup to remove an unnecessary argument to
> btrfs_discard_update_discardable();
> 
> 2) A bug because btrfs_discard_update_discardable() is not being
> called with the lock ->tree_lock held in one specific context.

Yeah, but the specific context is the load path, so we won't have concurrent
modifiers of the tree until _after_ the cache is successfully loaded.  Of course
this patchset changes that, so it's important now, but prior to this we didn't
strictly need the lock, so it's not really a bug fix, just an adjustment.

However, I'm always happy to inflate my patch count; it makes me look good at
performance review time ;).  I'm happy to respin with it broken out.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/8] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-23 13:58 ` [PATCH 8/8] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
@ 2020-11-04 18:27   ` Filipe Manana
  0 siblings, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 18:27 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 5:13 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> I got the following lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0+ #101 Not tainted
> ------------------------------------------------------
> btrfs-cleaner/3445 is trying to acquire lock:
> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>
> but task is already holding lock:
> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
>        down_write+0x3d/0x70
>        btrfs_cache_block_group+0x2d5/0x510
>        find_free_extent+0xb6e/0x12f0
>        btrfs_reserve_extent+0xb3/0x1b0
>        btrfs_alloc_tree_block+0xb1/0x330
>        alloc_tree_block_no_bg_flush+0x4f/0x60
>        __btrfs_cow_block+0x11d/0x580
>        btrfs_cow_block+0x10c/0x220
>        commit_cowonly_roots+0x47/0x2e0
>        btrfs_commit_transaction+0x595/0xbd0
>        sync_filesystem+0x74/0x90
>        generic_shutdown_super+0x22/0x100
>        kill_anon_super+0x14/0x30
>        btrfs_kill_super+0x12/0x20
>        deactivate_locked_super+0x36/0xa0
>        cleanup_mnt+0x12d/0x190
>        task_work_run+0x5c/0xa0
>        exit_to_user_mode_prepare+0x1df/0x200
>        syscall_exit_to_user_mode+0x54/0x280
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #1 (&space_info->groups_sem){++++}-{3:3}:
>        down_read+0x40/0x130
>        find_free_extent+0x2ed/0x12f0
>        btrfs_reserve_extent+0xb3/0x1b0
>        btrfs_alloc_tree_block+0xb1/0x330
>        alloc_tree_block_no_bg_flush+0x4f/0x60
>        __btrfs_cow_block+0x11d/0x580
>        btrfs_cow_block+0x10c/0x220
>        commit_cowonly_roots+0x47/0x2e0
>        btrfs_commit_transaction+0x595/0xbd0
>        sync_filesystem+0x74/0x90
>        generic_shutdown_super+0x22/0x100
>        kill_anon_super+0x14/0x30
>        btrfs_kill_super+0x12/0x20
>        deactivate_locked_super+0x36/0xa0
>        cleanup_mnt+0x12d/0x190
>        task_work_run+0x5c/0xa0
>        exit_to_user_mode_prepare+0x1df/0x200
>        syscall_exit_to_user_mode+0x54/0x280
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (btrfs-root-00){++++}-{3:3}:
>        __lock_acquire+0x1167/0x2150
>        lock_acquire+0xb9/0x3d0
>        down_read_nested+0x43/0x130
>        __btrfs_tree_read_lock+0x32/0x170
>        __btrfs_read_lock_root_node+0x3a/0x50
>        btrfs_search_slot+0x614/0x9d0
>        btrfs_find_root+0x35/0x1b0
>        btrfs_read_tree_root+0x61/0x120
>        btrfs_get_root_ref+0x14b/0x600
>        find_parent_nodes+0x3e6/0x1b30
>        btrfs_find_all_roots_safe+0xb4/0x130
>        btrfs_find_all_roots+0x60/0x80
>        btrfs_qgroup_trace_extent_post+0x27/0x40
>        btrfs_add_delayed_data_ref+0x3fd/0x460
>        btrfs_free_extent+0x42/0x100
>        __btrfs_mod_ref+0x1d7/0x2f0
>        walk_up_proc+0x11c/0x400
>        walk_up_tree+0xf0/0x180
>        btrfs_drop_snapshot+0x1c7/0x780
>        btrfs_clean_one_deleted_snapshot+0xfb/0x110
>        cleaner_kthread+0xd4/0x140
>        kthread+0x13a/0x150
>        ret_from_fork+0x1f/0x30
>
> other info that might help us debug this:
>
> Chain exists of:
>   btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&fs_info->commit_root_sem);
>                                lock(&space_info->groups_sem);
>                                lock(&fs_info->commit_root_sem);
>   lock(btrfs-root-00);
>
>  *** DEADLOCK ***
>
> 3 locks held by btrfs-cleaner/3445:
>  #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
>  #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
>  #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> stack backtrace:
> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>  dump_stack+0x8b/0xb0
>  check_noncircular+0xcf/0xf0
>  __lock_acquire+0x1167/0x2150
>  ? __bfs+0x42/0x210
>  lock_acquire+0xb9/0x3d0
>  ? __btrfs_tree_read_lock+0x32/0x170
>  down_read_nested+0x43/0x130
>  ? __btrfs_tree_read_lock+0x32/0x170
>  __btrfs_tree_read_lock+0x32/0x170
>  __btrfs_read_lock_root_node+0x3a/0x50
>  btrfs_search_slot+0x614/0x9d0
>  ? find_held_lock+0x2b/0x80
>  btrfs_find_root+0x35/0x1b0
>  ? do_raw_spin_unlock+0x4b/0xa0
>  btrfs_read_tree_root+0x61/0x120
>  btrfs_get_root_ref+0x14b/0x600
>  find_parent_nodes+0x3e6/0x1b30
>  btrfs_find_all_roots_safe+0xb4/0x130
>  btrfs_find_all_roots+0x60/0x80
>  btrfs_qgroup_trace_extent_post+0x27/0x40
>  btrfs_add_delayed_data_ref+0x3fd/0x460
>  btrfs_free_extent+0x42/0x100
>  __btrfs_mod_ref+0x1d7/0x2f0
>  walk_up_proc+0x11c/0x400
>  walk_up_tree+0xf0/0x180
>  btrfs_drop_snapshot+0x1c7/0x780
>  ? btrfs_clean_one_deleted_snapshot+0x73/0x110
>  btrfs_clean_one_deleted_snapshot+0xfb/0x110
>  cleaner_kthread+0xd4/0x140
>  ? btrfs_alloc_root+0x50/0x50
>  kthread+0x13a/0x150
>  ? kthread_create_worker_on_cpu+0x40/0x40
>  ret_from_fork+0x1f/0x30
>
> while testing another lockdep fix.  This happens because we're using the
> commit_root_sem to protect fs_info->caching_block_groups, which creates
> a groups_sem -> commit_root_sem dependency.  This is problematic because
> we allocate blocks while holding tree root locks.  Fix this by making
> the list itself protected by fs_info->block_group_cache_lock instead.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/block-group.c | 12 ++++++------
>  fs/btrfs/transaction.c |  2 ++
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index ba6564f67d9a..f19fabae4754 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -747,10 +747,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>         cache->has_caching_ctl = 1;
>         spin_unlock(&cache->lock);
>
> -       down_write(&fs_info->commit_root_sem);
> +       spin_lock(&fs_info->block_group_cache_lock);
>         refcount_inc(&caching_ctl->count);
>         list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
> -       up_write(&fs_info->commit_root_sem);
> +       spin_unlock(&fs_info->block_group_cache_lock);
>
>         btrfs_get_block_group(cache);
>
> @@ -999,7 +999,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>         if (block_group->cached == BTRFS_CACHE_STARTED)
>                 btrfs_wait_block_group_cache_done(block_group);
>         if (block_group->has_caching_ctl) {
> -               down_write(&fs_info->commit_root_sem);
> +               spin_lock(&fs_info->block_group_cache_lock);
>                 if (!caching_ctl) {
>                         struct btrfs_caching_control *ctl;
>
> @@ -1013,7 +1013,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>                 }
>                 if (caching_ctl)
>                         list_del_init(&caching_ctl->list);
> -               up_write(&fs_info->commit_root_sem);
> +               spin_unlock(&fs_info->block_group_cache_lock);
>                 if (caching_ctl) {
>                         /* Once for the caching bgs list and once for us. */
>                         btrfs_put_caching_control(caching_ctl);
> @@ -3311,14 +3311,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>         struct btrfs_caching_control *caching_ctl;
>         struct rb_node *n;
>
> -       down_write(&info->commit_root_sem);
> +       spin_lock(&info->block_group_cache_lock);
>         while (!list_empty(&info->caching_block_groups)) {
>                 caching_ctl = list_entry(info->caching_block_groups.next,
>                                          struct btrfs_caching_control, list);
>                 list_del(&caching_ctl->list);
>                 btrfs_put_caching_control(caching_ctl);
>         }
> -       up_write(&info->commit_root_sem);
> +       spin_unlock(&info->block_group_cache_lock);
>
>         spin_lock(&info->unused_bgs_lock);
>         while (!list_empty(&info->unused_bgs)) {
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 9ef6cba1eb59..a0cf0e0c4085 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -208,6 +208,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
>          * the caching thread will re-start it's search from 3, and thus find
>          * the hole from [4,6) to add to the free space cache.
>          */
> +       spin_lock(&fs_info->block_group_cache_lock);
>         list_for_each_entry_safe(caching_ctl, next,
>                                  &fs_info->caching_block_groups, list) {
>                 struct btrfs_block_group *cache = caching_ctl->block_group;
> @@ -219,6 +220,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
>                         cache->last_byte_to_unpin = caching_ctl->progress;
>                 }
>         }
> +       spin_unlock(&fs_info->block_group_cache_lock);
>         up_write(&fs_info->commit_root_sem);
>  }
>
> --
> 2.26.2
>
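
For reference, the dependency reported above boils down to two call paths
taking the same locks in opposite order.  A rough sketch, condensed from the
stack traces (not verbatim kernel code, names as they appear in the traces):

  /* Path A: transaction commit COWing a tree block; the caller already
   * holds write locks on tree nodes (btrfs-root-00). */
  btrfs_cow_block()
    btrfs_alloc_tree_block()
      find_free_extent()            /* down_read(&space_info->groups_sem) */
        btrfs_cache_block_group()   /* down_write(&fs_info->commit_root_sem) */

  /* Path B: cleaner thread resolving backrefs for qgroups. */
  btrfs_find_all_roots()            /* down_read(&fs_info->commit_root_sem) */
    find_parent_nodes()
      btrfs_read_tree_root()
        btrfs_search_slot()         /* read lock on btrfs-root-00 */

Path A gives tree lock -> groups_sem -> commit_root_sem, path B gives
commit_root_sem -> tree lock, which closes the cycle.  With the patch, the
caching_block_groups list is manipulated under the block_group_cache_lock
spinlock, held only around the list operations themselves, so this
allocation path no longer acquires the commit_root_sem at that point.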


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage
  2020-11-04 18:21     ` Josef Bacik
@ 2020-11-04 18:28       ` Filipe Manana
  2020-11-05 13:22         ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: Filipe Manana @ 2020-11-04 18:28 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Nov 4, 2020 at 6:21 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 11/4/20 10:54 AM, Filipe Manana wrote:
> > On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> This passes in the block_group and the free_space_ctl, but we can get
> >> this from the block group itself.  Part of this is because we call it
> >> from __load_free_space_cache, which can be called for the inode cache as
> >> well.  Move that call into the block group specific load section, wrap
> >> it in the right lock that we need, and fix up the arguments to only take
> >> the block group.  Add a lockdep_assert as well for good measure to make
> >> sure we don't mess up the locking again.
> >
> > So this is actually 2 different things in one patch:
> >
> > 1) A cleanup to remove an unnecessary argument to
> > btrfs_discard_update_discardable();
> >
> > 2) A bug because btrfs_discard_update_discardable() is not being
> > called with the lock ->tree_lock held in one specific context.
>
> Yeah but the specific context is on load, so we won't have concurrent modifiers
> to the tree until _after_ the cache is successfully loaded.  Of course this
> patchset changes that so it's important now, but prior to this we didn't
> necessarily need the lock, so it's not really a bug fix, just an adjustment.
>
> However I'm always happy to inflate my patch counts, makes me look good at
> performance review time ;).  I'm happy to respin with it broken out.  Thanks,

Then make it 3! One more just to add the assertion!

I'm fine with it as it is, maybe just add an explicit note that we
can't have concurrent access in the load case, so it's not fixing any
bug, but just prevention.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
> Josef
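
To make the cleanup concrete, the shape of the change being discussed is
roughly the following (a sketch based on the changelog, not the exact patch;
the body of the helper is elided):

  /* After the cleanup the helper takes only the block group and asserts
   * that the caller holds the free space ctl's tree_lock. */
  void btrfs_discard_update_discardable(struct btrfs_block_group *block_group)
  {
          struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;

          lockdep_assert_held(&ctl->tree_lock);
          /* ... update the discardable extents/bytes counters ... */
  }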



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage
  2020-11-04 18:28       ` Filipe Manana
@ 2020-11-05 13:22         ` David Sterba
  0 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2020-11-05 13:22 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Wed, Nov 04, 2020 at 06:28:13PM +0000, Filipe Manana wrote:
> On Wed, Nov 4, 2020 at 6:21 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On 11/4/20 10:54 AM, Filipe Manana wrote:
> > > On Fri, Oct 23, 2020 at 5:12 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >>
> > >> This passes in the block_group and the free_space_ctl, but we can get
> > >> this from the block group itself.  Part of this is because we call it
> > >> from __load_free_space_cache, which can be called for the inode cache as
> > >> well.  Move that call into the block group specific load section, wrap
> > >> it in the right lock that we need, and fix up the arguments to only take
> > >> the block group.  Add a lockdep_assert as well for good measure to make
> > >> sure we don't mess up the locking again.
> > >
> > > So this is actually 2 different things in one patch:
> > >
> > > 1) A cleanup to remove an unnecessary argument to
> > > btrfs_discard_update_discardable();
> > >
> > > 2) A bug because btrfs_discard_update_discardable() is not being
> > > called with the lock ->tree_lock held in one specific context.
> >
> > Yeah but the specific context is on load, so we won't have concurrent modifiers
> > to the tree until _after_ the cache is successfully loaded.  Of course this
> > patchset changes that so it's important now, but prior to this we didn't
> > necessarily need the lock, so it's not really a bug fix, just an adjustment.
> >
> > However I'm always happy to inflate my patch counts, makes me look good at
> > performance review time ;).  I'm happy to respin with it broken out.  Thanks,
> 
> Then make it 3! One more just to add the assertion!
> 
> I'm fine with it as it is, maybe just add an explicit note that we
> can't have concurrent access in the load case, so it's not fixing any
> bug, but just prevention.

Changelog updated to reflect that:

This passes in the block_group and the free_space_ctl, but we can get
this from the block group itself.  Part of this is because we call it
from __load_free_space_cache, which can be called for the inode cache as
well.

Move that call into the block group specific load section, wrap it in
the right lock that we need for the assertion (but otherwise this is
safe without the lock because this happens in single-thread context).

Fix up the arguments to only take the block group.  Add a lockdep_assert
as well for good measure to make sure we don't mess up the locking
again.
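
In other words, the call site in the block group load path ends up looking
roughly like this (a sketch of the intent, not the exact hunk):

  /* Block group specific load section: there are no concurrent users of
   * the free space ctl yet, so the lock is taken mainly to satisfy the
   * lockdep assertion in the helper. */
  spin_lock(&block_group->free_space_ctl->tree_lock);
  btrfs_discard_update_discardable(block_group);
  spin_unlock(&block_group->free_space_ctl->tree_lock);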

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/8] Block group caching fixes
  2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
                   ` (7 preceding siblings ...)
  2020-10-23 13:58 ` [PATCH 8/8] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
@ 2020-11-05 14:27 ` David Sterba
  8 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2020-11-05 14:27 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Oct 23, 2020 at 09:58:03AM -0400, Josef Bacik wrote:
> The main fixes are
>   
>   btrfs: do not shorten unpin len for caching block groups
>   btrfs: update last_byte_to_unpin in switch_commit_roots
>   btrfs: protect the fs_info->caching_block_groups differently
> 
> And the work to make space cache async is in the following patches
> 
>   btrfs: cleanup btrfs_discard_update_discardable usage
>   btrfs: load free space cache into a temporary ctl
>   btrfs: load the free space cache inode extents from commit root
>   btrfs: async load free space cache
> 
> Thanks,
> 
> Josef
> 
> Josef Bacik (8):
>   btrfs: do not shorten unpin len for caching block groups
>   btrfs: update last_byte_to_unpin in switch_commit_roots
>   btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range
>   btrfs: cleanup btrfs_discard_update_discardable usage
>   btrfs: load free space cache into a temporary ctl
>   btrfs: load the free space cache inode extents from commit root
>   btrfs: async load free space cache
>   btrfs: protect the fs_info->caching_block_groups differently

Added to misc-next, thanks. Some of the fixes look like stable candidates,
but e.g. "btrfs: protect the fs_info->caching_block_groups differently"
depends on the other patches and as a whole is not a simple fix.

Patches 1 and 2 apply to 5.4 and 5.8 respectively, but for older versions
the conflicts are not trivial, so that's all.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-11-05 14:29 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-23 13:58 [PATCH 0/8] Block group caching fixes Josef Bacik
2020-10-23 13:58 ` [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups Josef Bacik
2020-11-04 13:38   ` Filipe Manana
2020-10-23 13:58 ` [PATCH 2/8] btrfs: update last_byte_to_unpin in switch_commit_roots Josef Bacik
2020-11-04 15:15   ` Filipe Manana
2020-10-23 13:58 ` [PATCH 3/8] btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range Josef Bacik
2020-11-04 15:36   ` Filipe Manana
2020-10-23 13:58 ` [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage Josef Bacik
2020-11-04 15:54   ` Filipe Manana
2020-11-04 17:36     ` Amy Parker
2020-11-04 18:21     ` Josef Bacik
2020-11-04 18:28       ` Filipe Manana
2020-11-05 13:22         ` David Sterba
2020-10-23 13:58 ` [PATCH 5/8] btrfs: load free space cache into a temporary ctl Josef Bacik
2020-11-04 16:20   ` Filipe Manana
2020-10-23 13:58 ` [PATCH 6/8] btrfs: load the free space cache inode extents from commit root Josef Bacik
2020-11-04 16:27   ` Filipe Manana
2020-10-23 13:58 ` [PATCH 7/8] btrfs: async load free space cache Josef Bacik
2020-11-04 18:02   ` Filipe Manana
2020-10-23 13:58 ` [PATCH 8/8] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
2020-11-04 18:27   ` Filipe Manana
2020-11-05 14:27 ` [PATCH 0/8] Block group caching fixes David Sterba
