From: Coly Li <colyli@suse.de>
To: Nix <nix@esperi.org.uk>
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org,
	Nikhil Kshirsagar <nkshirsagar@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket
Date: Thu, 9 Jun 2022 21:54:55 +0800
Message-ID: <2FDA62C4-AFB2-4215-AD71-2EC6E14A4F5D@suse.de>
In-Reply-To: <8735geanp8.fsf@esperi.org.uk>



> On Jun 9, 2022, at 04:45, Nix <nix@esperi.org.uk> wrote:
> 
> On 21 May 2022, Coly Li spake thusly:
> 
>> When all journal buckets are fully filled by active jsets under heavy
>> write I/O load, the cache set registration (after a reboot) will load
>> all active jsets and insert them into the btree again (which is
>> called journal replay). If a journaled bkey is inserted into a btree
>> node and results in a btree node split, a new journal request might be
>> triggered. For example, if the btree grows one more level after the
>> node split, the root node record in the cache device super block will
>> be updated by bch_journal_meta() from bch_btree_set_root(). But there
>> is no space in the journal buckets, so the journal replay has to wait
>> for a new journal bucket to be reclaimed after at least one journal
>> bucket is replayed. This is one example of how the journal no-space
>> deadlock happens.
>> 
>> The solution to avoid the deadlock is to reserve 1 journal bucket in
> 
> It seems to me that this could happen more than once in a single journal
> replay (multiple nodes might be split, etc). Is one bucket actually
> always enough, or is it merely enough nearly all the time?

It is possible for multiple leaf nodes to split during journal replay, but bch_journal_meta() only gets called when the root node is updated.
When the new bkey of a newly split node is inserted into the root node, it doesn’t go into the journal, because the journal only records bkeys inserted into leaf nodes. Only when a btree node split causes the root node to split does the new root node location (bkey) have to be recorded in a journal set.

Therefore, almost all the time the btree root node splits at most once during journal replay. Two root node splits are separated by a very large number of bkey insertions, so it is very rare for the oldest journal entry to still be unreplayed when the second split happens; that is almost impossible in real practice. So reserving 8K of journal space is indeed enough for the no-space deadlock situation.
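As a rough illustration of the argument above, here is a toy model in plain C. It is not the bcache code: every name in it (toy_journal, toy_replay_root_split(), and so on) is hypothetical, and it assumes for simplicity that each journal write consumes a whole bucket. It only shows why a reserve that runtime writes cannot touch lets one root-node update through during replay, and why a second one has to wait until a replayed bucket is reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names come from the bcache source. */
struct toy_journal {
	size_t nr_buckets;   /* total journal buckets */
	size_t nr_used;      /* buckets holding active (unreplayed) jsets */
	size_t nr_reserved;  /* buckets that runtime writes must not use */
};

/* Runtime writes stop short of the reserve; replay-time writes may use it. */
static bool toy_journal_has_space(const struct toy_journal *j, bool in_replay)
{
	size_t limit;

	assert(j->nr_reserved <= j->nr_buckets);
	limit = in_replay ? j->nr_buckets : j->nr_buckets - j->nr_reserved;
	return j->nr_used < limit;
}

/* Heavy write load before the reboot: fill every bucket runtime may use. */
static size_t toy_fill_at_runtime(struct toy_journal *j)
{
	while (toy_journal_has_space(j, false))
		j->nr_used++;
	return j->nr_used;
}

/*
 * A root node split during replay needs one journal write.
 * Returns false where the real system would wait forever (the deadlock).
 */
static bool toy_replay_root_split(struct toy_journal *j)
{
	if (!toy_journal_has_space(j, true))
		return false;
	j->nr_used++;
	return true;
}

/* The oldest bucket has been fully replayed and can be reused. */
static void toy_reclaim_oldest(struct toy_journal *j)
{
	if (j->nr_used > 0)
		j->nr_used--;
}
```

With nr_reserved set to 0, toy_replay_root_split() fails right after toy_fill_at_runtime(); with nr_reserved set to 1, the first call succeeds, and a second call succeeds only after toy_reclaim_oldest() has run in between.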

The default bucket size is much larger than 8K, so we don’t have to worry about the reserved journal space being exhausted, even with a much larger number of journal buckets.
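To put a number on "much larger", the helper below does the division. It is only a sketch: the helper name is hypothetical, it treats 8K as the space one root-update journal write may need (which is how the text above reads), and the 512 KiB bucket size in the usage note is an assumed example value, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* The 8K figure from the text; the macro name is hypothetical. */
#define TOY_META_RESERVE_BYTES ((size_t)8 * 1024)

/* How many 8K journal writes fit into one reserved bucket of this size. */
static size_t toy_meta_writes_per_bucket(size_t bucket_bytes)
{
	assert(bucket_bytes >= TOY_META_RESERVE_BYTES);
	return bucket_bytes / TOY_META_RESERVE_BYTES;
}
```

For an assumed 512 KiB bucket, toy_meta_writes_per_bucket(512 * 1024) returns 64, so one reserved bucket holds many times the 8K that the argument needs.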

Indeed, my initial effort was to reserve 8K of space within a journal bucket if the bucket size is > 8KB. But there were too many places that needed careful handling, the logic of the patch was complicated, and the total change set was 200+ lines. Then I found that if I reserve a whole bucket, the change set is only 30+ lines. So I finally decided to reserve a whole journal bucket, because the change is much simpler and easier to understand.

Coly Li

