From: Kent Overstreet <>
Subject: bcachefs update: New allocator has been merged
Date: Sun, 13 Mar 2022 20:45:13 -0400
Message-ID: <Yi6QGfZ5WeV1TOBs@moria.home.lan>

Just finished a big new update: the big allocator rewrite is finished and
merged.

It's a mandatory disk format upgrade; when switching to the new version on an
existing filesystem you'll see it initialize the freespace btree when you mount.

What's changed: we've got some new persistent data structures that replace code
that used to periodically walk all the buckets in the filesystem, kept in an
in-memory array - and now that we don't need to do that anymore, the in-memory
bucket array is gone, too. Specifically, we've got:

 - A new hash table for buckets awaiting journal commit before they can be
   reused, using cuckoo hashing (this one was rolled out a while ago)

 - An extents-style freespace btree, to replace the code in the old allocator
   threads that periodically walked the arrays of buckets to build up freelists

 - A btree of buckets that need discarding before being moved to the freespace
   btree

 - A new LRU btree, for buckets containing cached data - replacing code in the
   allocator threads that would scan buckets and build up a heap of buckets to
   be reused.
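
As a rough illustration of the first item, here is a minimal cuckoo hash table
sketch in Python. It is a toy model, not the bcachefs implementation: the class
name, the two hash functions, the kick limit and the bucket_is_reusable()
helper are all assumptions made for this example. Each bucket has two candidate
slots; an insert that finds both occupied displaces an existing entry to its
alternate slot, so a lookup never probes more than two slots.

```python
from typing import Optional


class CuckooTable:
    """Toy cuckoo hash table: bucket number -> journal sequence number."""

    MAX_KICKS = 32  # after this many displacements, grow the table instead

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size  # each slot is None or (bucket, seq)

    def _positions(self, bucket):
        # Two illustrative hash functions (assumed, not taken from bcachefs).
        h1 = (bucket * 0x9E3779B1) % self.size
        h2 = (((bucket ^ 0x5BD1E995) * 0x85EBCA6B) >> 7) % self.size
        return h1, h2

    def lookup(self, bucket) -> Optional[int]:
        # A key can only live in one of its two slots.
        for pos in self._positions(bucket):
            entry = self.slots[pos]
            if entry is not None and entry[0] == bucket:
                return entry[1]
        return None

    def insert(self, bucket, seq):
        # If the bucket is already present, keep the newest sequence number.
        for pos in self._positions(bucket):
            entry = self.slots[pos]
            if entry is not None and entry[0] == bucket:
                self.slots[pos] = (bucket, max(seq, entry[1]))
                return
        item = (bucket, seq)
        pos = self._positions(bucket)[0]
        for _ in range(self.MAX_KICKS):
            # Take an empty candidate slot if there is one.
            for p in self._positions(item[0]):
                if self.slots[p] is None:
                    self.slots[p] = item
                    return
            # Otherwise evict the occupant and retry with the evicted entry,
            # aiming at that entry's alternate slot.
            item, self.slots[pos] = self.slots[pos], item
            a, b = self._positions(item[0])
            pos = b if pos == a else a
        self._grow(item)

    def _grow(self, pending):
        # Double the table and reinsert everything, including the pending entry.
        old = [e for e in self.slots if e is not None] + [pending]
        self.size *= 2
        self.slots = [None] * self.size
        for bucket, seq in old:
            self.insert(bucket, seq)


def bucket_is_reusable(table, bucket, flushed_seq):
    """A bucket may be reused once the journal has flushed past its entry."""
    seq = table.lookup(bucket)
    return seq is None or seq <= flushed_seq
```

With this model, a bucket recorded at journal sequence 105 is reported as
reusable only once flushed_seq reaches 105; buckets that were never recorded
are always reusable.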

The old allocator threads are completely gone - and the code that replaces them
is all transactional b-tree code, much of it trigger based, that's _way_ easier
to debug and reason about. This fixes weird performance corner cases and
scalability issues - in particular, the allocator threads were prone to using
excessive CPU when the filesystem was nearly full. Also, we've got a new and
much improved discard implementation! Previously, we'd only issue discards
shortly prior to reusing/writing to a bucket again - now, we'll issue discards
right after buckets become empty.
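
To make the new bucket lifecycle concrete, here is a small Python model of the
flow described above: a trigger fires when a bucket's last sectors are
released, the bucket is queued for discard, and it only returns to the free
space index after the discard has been issued. Everything here is a
hypothetical simplification (plain sets instead of btrees, no transactions),
and the names BucketAllocator, run_discards and free_extents are made up for
the sketch. free_extents() shows the "extents-style" idea: runs of adjacent
free buckets are reported as half-open (start, end) ranges rather than one
entry per bucket.

```python
class BucketAllocator:
    """Toy model: free set + need_discard set, updated by a trigger."""

    def __init__(self, nbuckets):
        self.sectors = [0] * nbuckets      # live sectors per bucket
        self.free = set(range(nbuckets))   # stands in for the freespace btree
        self.need_discard = set()          # stands in for the need-discard btree

    def allocate(self):
        # Hand out the lowest-numbered free bucket.
        bucket = min(self.free)
        self.free.remove(bucket)
        return bucket

    def add_sectors(self, bucket, n):
        self.sectors[bucket] += n

    def release_sectors(self, bucket, n):
        self.sectors[bucket] -= n
        # "Trigger": as soon as the bucket is empty, queue it for discard.
        if self.sectors[bucket] == 0:
            self.need_discard.add(bucket)

    def run_discards(self, issue_discard):
        # Discard each queued bucket, then make it allocatable again.
        for bucket in sorted(self.need_discard):
            issue_discard(bucket)
            self.free.add(bucket)
        self.need_discard.clear()

    def free_extents(self):
        # Coalesce adjacent free buckets into half-open (start, end) ranges.
        extents = []
        for b in sorted(self.free):
            if extents and extents[-1][1] == b:
                extents[-1] = (extents[-1][0], b + 1)
            else:
                extents.append((b, b + 1))
        return extents
```

The point of the model is the ordering: an emptied bucket is never visible in
the free space index until its discard has gone out, and nothing ever has to
scan all buckets to find work.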

Exciting stuff - this was the biggest and most invasive change in quite a while,
and I'm pretty happy with how it turned out.

Next big change is going to be the addition of backpointers to fix copygc
scanning, and a rebalance-work btree to fix rebalance thread scanning, and then
we'll be pretty much set for major scalability work.

Other recent changes/improvements: a lot of assorted debuggability improvements.

 - list_journal improvements: now, when going emergency read only, we finish
   writing everything we have pending to the journal - we just mark them as
   noflush writes, so they'll never be used by recovery, but list_journal can
   still see them. This means when we detect an inconsistency, we can see all
   the updates leading up to it in the journal (along with what transactions
   were doing them), making it much easier to work backwards to what went wrong.

   We've been doing a lot of debugging lately with just list_journal and grep -
   yay for grep debugging!

 - A bunch of printbuf and to_text() method improvements, which make it easy to
   write good log messages when something goes wrong

 - Started moving some internal state used for debugging from sysfs to debugfs,
   where we can be much more verbose (yay for grep debugging!)

 - Fixed some snapshots bugs - figured out a major cause of the transaction path
   overflow bugs we've been seeing.
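
The list_journal item above can be pictured with a short Python sketch. This is
a deliberately simplified model, not the on-disk journal format or the real
recovery logic: the JournalEntry fields, entries_for_recovery() and
grep_journal() are all hypothetical names invented for the example. The only
idea it shows is that entries written as noflush are skipped by recovery but
remain visible to a listing tool, so the updates leading up to an inconsistency
can still be searched.

```python
from dataclasses import dataclass


@dataclass
class JournalEntry:
    seq: int
    transaction: str       # hypothetical label for the transaction doing the update
    update: str            # hypothetical text form of the update
    noflush: bool = False  # written while going emergency read only


def entries_for_recovery(journal):
    # Simplification: recovery ignores every noflush entry.
    return [e for e in journal if not e.noflush]


def grep_journal(journal, pattern):
    # A listing tool sees everything, flushed or not.
    return [e for e in journal
            if pattern in e.update or pattern in e.transaction]


journal = [
    JournalEntry(1, "write_extent", "alloc bucket 42"),
    JournalEntry(2, "copygc", "move bucket 42 -> 57", noflush=True),
    JournalEntry(3, "discard", "free bucket 42", noflush=True),
]

# Recovery replays only seq 1; grep still finds all three updates to bucket 42.
replayed = [e.seq for e in entries_for_recovery(journal)]
matches = [e.seq for e in grep_journal(journal, "bucket 42")]
```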

And, big thanks to all the people who put up with and test my crappy code and
help with finding all the bugs and beating it into shape :)
