* [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) @ 2022-05-24 10:23 Coly Li 2022-05-24 10:23 ` [PATCH 1/4] bcache: improve multithreaded bch_btree_check() Coly Li ` (4 more replies) 0 siblings, 5 replies; 10+ messages in thread From: Coly Li @ 2022-05-24 10:23 UTC (permalink / raw) To: axboe; +Cc: linux-bcache, linux-block, Coly Li Hi Jens, Thank you for taking the late arrived series, they are all for bcache fixes when I work on the bcache journal no-space deadlock issue. It spent me quite long time to fix because other issues interfered my debug and analysis. When all the depending issues were fixed and my fix for the journal no-space deadlock is verified, this submission is late for Linux v5.19 submission. But it is still worthy to take them into v5.19 because real issues are fixed by this series. The bcache has 4 patches for Linux v5.19 merge window, all from me. - The first 2 patches are code clean up and potential bug fixes for multi- threaded btree nodes check (for cache device) and dirty sectors counting (for backing device), although no report from mailing list for them, it is good to have the fixes. - The 3rd patch removes incremental dirty sectors counting because it is conflicted with multithreaded dirty sectors counting and the latter one is 10x times faster. - The last patch fixes a journal no-space deadlock during cache device registration, it always reserves one journal bucket and only uses it in registration time, so the no-spance condition won't happen anymore. There are still 2 fixes are still under the long time I/O pressure testing, once they are accomplished, I will submit to you in later RC cycles. The v2 series fixed previously detectd oversize stack frame issue, in my test I don't observed the stack frame oversize warning and normal bcache operations work as expected. Thank you in advance. Coly Li --- Coly Li (4): bcache: improve multithreaded bch_btree_check() bcache: improve multithreaded bch_sectors_dirty_init() bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() bcache: avoid journal no-space deadlock by reserving 1 journal bucket drivers/md/bcache/btree.c | 58 +++++++++---------- drivers/md/bcache/btree.h | 2 +- drivers/md/bcache/journal.c | 31 +++++++++-- drivers/md/bcache/journal.h | 2 + drivers/md/bcache/super.c | 1 + drivers/md/bcache/writeback.c | 101 +++++++++++++--------------------- drivers/md/bcache/writeback.h | 2 +- 7 files changed, 94 insertions(+), 103 deletions(-) -- 2.35.3 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/4] bcache: improve multithreaded bch_btree_check() 2022-05-24 10:23 [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Coly Li @ 2022-05-24 10:23 ` Coly Li 2022-05-24 10:23 ` [PATCH 2/4] bcache: improve multithreaded bch_sectors_dirty_init() Coly Li ` (3 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: Coly Li @ 2022-05-24 10:23 UTC (permalink / raw) To: axboe; +Cc: linux-bcache, linux-block, Coly Li, stable Commit 8e7102273f59 ("bcache: make bch_btree_check() to be multithreaded") makes bch_btree_check() to be much faster when checking all btree nodes during cache device registration. But it isn't in ideal shap yet, still can be improved. This patch does the following thing to improve current parallel btree nodes check by multiple threads in bch_btree_check(), - Add read lock to root node while checking all the btree nodes with multiple threads. Although currently it is not mandatory but it is good to have a read lock in code logic. - Remove local variable 'char name[32]', and generate kernel thread name string directly when calling kthread_run(). - Allocate local variable "struct btree_check_state check_state" on the stack and avoid unnecessary dynamic memory allocation for it. - Reduce BCH_BTR_CHKTHREAD_MAX from 64 to 12 which is enough indeed. - Increase check_state->started to count created kernel thread after it succeeds to create. - When wait for all checking kernel threads to finish, use wait_event() to replace wait_event_interruptible(). With this change, the code is more clear, and some potential error conditions are avoided. Fixes: 8e7102273f59 ("bcache: make bch_btree_check() to be multithreaded") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org --- drivers/md/bcache/btree.c | 58 ++++++++++++++++++--------------------- drivers/md/bcache/btree.h | 2 +- 2 files changed, 27 insertions(+), 33 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index ad9f16689419..2362bb8ef6d1 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -2006,8 +2006,7 @@ int bch_btree_check(struct cache_set *c) int i; struct bkey *k = NULL; struct btree_iter iter; - struct btree_check_state *check_state; - char name[32]; + struct btree_check_state check_state; /* check and mark root node keys */ for_each_key_filter(&c->root->keys, k, &iter, bch_ptr_invalid) @@ -2018,63 +2017,58 @@ int bch_btree_check(struct cache_set *c) if (c->root->level == 0) return 0; - check_state = kzalloc(sizeof(struct btree_check_state), GFP_KERNEL); - if (!check_state) - return -ENOMEM; - - check_state->c = c; - check_state->total_threads = bch_btree_chkthread_nr(); - check_state->key_idx = 0; - spin_lock_init(&check_state->idx_lock); - atomic_set(&check_state->started, 0); - atomic_set(&check_state->enough, 0); - init_waitqueue_head(&check_state->wait); + check_state.c = c; + check_state.total_threads = bch_btree_chkthread_nr(); + check_state.key_idx = 0; + spin_lock_init(&check_state.idx_lock); + atomic_set(&check_state.started, 0); + atomic_set(&check_state.enough, 0); + init_waitqueue_head(&check_state.wait); + rw_lock(0, c->root, c->root->level); /* * Run multiple threads to check btree nodes in parallel, - * if check_state->enough is non-zero, it means current + * if check_state.enough is non-zero, it means current * running check threads are enough, unncessary to create * more. */ - for (i = 0; i < check_state->total_threads; i++) { - /* fetch latest check_state->enough earlier */ + for (i = 0; i < check_state.total_threads; i++) { + /* fetch latest check_state.enough earlier */ smp_mb__before_atomic(); - if (atomic_read(&check_state->enough)) + if (atomic_read(&check_state.enough)) break; - check_state->infos[i].result = 0; - check_state->infos[i].state = check_state; - snprintf(name, sizeof(name), "bch_btrchk[%u]", i); - atomic_inc(&check_state->started); + check_state.infos[i].result = 0; + check_state.infos[i].state = &check_state; - check_state->infos[i].thread = + check_state.infos[i].thread = kthread_run(bch_btree_check_thread, - &check_state->infos[i], - name); - if (IS_ERR(check_state->infos[i].thread)) { + &check_state.infos[i], + "bch_btrchk[%d]", i); + if (IS_ERR(check_state.infos[i].thread)) { pr_err("fails to run thread bch_btrchk[%d]\n", i); for (--i; i >= 0; i--) - kthread_stop(check_state->infos[i].thread); + kthread_stop(check_state.infos[i].thread); ret = -ENOMEM; goto out; } + atomic_inc(&check_state.started); } /* * Must wait for all threads to stop. */ - wait_event_interruptible(check_state->wait, - atomic_read(&check_state->started) == 0); + wait_event(check_state.wait, atomic_read(&check_state.started) == 0); - for (i = 0; i < check_state->total_threads; i++) { - if (check_state->infos[i].result) { - ret = check_state->infos[i].result; + for (i = 0; i < check_state.total_threads; i++) { + if (check_state.infos[i].result) { + ret = check_state.infos[i].result; goto out; } } out: - kfree(check_state); + rw_unlock(0, c->root); return ret; } diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h index 50482107134f..1b5fdbc0d83e 100644 --- a/drivers/md/bcache/btree.h +++ b/drivers/md/bcache/btree.h @@ -226,7 +226,7 @@ struct btree_check_info { int result; }; -#define BCH_BTR_CHKTHREAD_MAX 64 +#define BCH_BTR_CHKTHREAD_MAX 12 struct btree_check_state { struct cache_set *c; int total_threads; -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/4] bcache: improve multithreaded bch_sectors_dirty_init() 2022-05-24 10:23 [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Coly Li 2022-05-24 10:23 ` [PATCH 1/4] bcache: improve multithreaded bch_btree_check() Coly Li @ 2022-05-24 10:23 ` Coly Li 2022-05-24 10:23 ` [PATCH 3/4] bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() Coly Li ` (2 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: Coly Li @ 2022-05-24 10:23 UTC (permalink / raw) To: axboe; +Cc: linux-bcache, linux-block, Coly Li, stable Commit b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded") makes bch_sectors_dirty_init() to be much faster when counting dirty sectors by iterating all dirty keys in the btree. But it isn't in ideal shape yet, still can be improved. This patch does the following changes to improve current parallel dirty keys iteration on the btree, - Add read lock to root node when multiple threads iterating the btree, to prevent the root node gets split by I/Os from other registered bcache devices. - Remove local variable "char name[32]" and generate kernel thread name string directly when calling kthread_run(). - Allocate "struct bch_dirty_init_state state" directly on stack and avoid the unnecessary dynamic memory allocation for it. - Decrease BCH_DIRTY_INIT_THRD_MAX from 64 to 12 which is enough indeed. - Increase &state->started to count created kernel thread after it succeeds to create. - When wait for all dirty key counting threads to finish, use wait_event() to replace wait_event_interruptible(). With the above changes, the code is more clear, and some potential error conditions are avoided. Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org --- drivers/md/bcache/writeback.c | 62 ++++++++++++++--------------------- drivers/md/bcache/writeback.h | 2 +- 2 files changed, 26 insertions(+), 38 deletions(-) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 9ee0005874cd..d24c09490f8e 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -948,10 +948,10 @@ void bch_sectors_dirty_init(struct bcache_device *d) struct btree_iter iter; struct sectors_dirty_init op; struct cache_set *c = d->c; - struct bch_dirty_init_state *state; - char name[32]; + struct bch_dirty_init_state state; /* Just count root keys if no leaf node */ + rw_lock(0, c->root, c->root->level); if (c->root->level == 0) { bch_btree_op_init(&op.op, -1); op.inode = d->id; @@ -961,54 +961,42 @@ void bch_sectors_dirty_init(struct bcache_device *d) for_each_key_filter(&c->root->keys, k, &iter, bch_ptr_invalid) sectors_dirty_init_fn(&op.op, c->root, k); + rw_unlock(0, c->root); return; } - state = kzalloc(sizeof(struct bch_dirty_init_state), GFP_KERNEL); - if (!state) { - pr_warn("sectors dirty init failed: cannot allocate memory\n"); - return; - } - - state->c = c; - state->d = d; - state->total_threads = bch_btre_dirty_init_thread_nr(); - state->key_idx = 0; - spin_lock_init(&state->idx_lock); - atomic_set(&state->started, 0); - atomic_set(&state->enough, 0); - init_waitqueue_head(&state->wait); - - for (i = 0; i < state->total_threads; i++) { - /* Fetch latest state->enough earlier */ + state.c = c; + state.d = d; + state.total_threads = bch_btre_dirty_init_thread_nr(); + state.key_idx = 0; + spin_lock_init(&state.idx_lock); + atomic_set(&state.started, 0); + atomic_set(&state.enough, 0); + init_waitqueue_head(&state.wait); + + for (i = 0; i < state.total_threads; i++) { + /* Fetch latest state.enough earlier */ smp_mb__before_atomic(); - if (atomic_read(&state->enough)) + if (atomic_read(&state.enough)) break; - state->infos[i].state = state; - atomic_inc(&state->started); - snprintf(name, sizeof(name), "bch_dirty_init[%d]", i); - - state->infos[i].thread = - kthread_run(bch_dirty_init_thread, - &state->infos[i], - name); - if (IS_ERR(state->infos[i].thread)) { + state.infos[i].state = &state; + state.infos[i].thread = + kthread_run(bch_dirty_init_thread, &state.infos[i], + "bch_dirtcnt[%d]", i); + if (IS_ERR(state.infos[i].thread)) { pr_err("fails to run thread bch_dirty_init[%d]\n", i); for (--i; i >= 0; i--) - kthread_stop(state->infos[i].thread); + kthread_stop(state.infos[i].thread); goto out; } + atomic_inc(&state.started); } - /* - * Must wait for all threads to stop. - */ - wait_event_interruptible(state->wait, - atomic_read(&state->started) == 0); - out: - kfree(state); + /* Must wait for all threads to stop. */ + wait_event(state.wait, atomic_read(&state.started) == 0); + rw_unlock(0, c->root); } void bch_cached_dev_writeback_init(struct cached_dev *dc) diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h index 02b2f9df73f6..31df716951f6 100644 --- a/drivers/md/bcache/writeback.h +++ b/drivers/md/bcache/writeback.h @@ -20,7 +20,7 @@ #define BCH_WRITEBACK_FRAGMENT_THRESHOLD_MID 57 #define BCH_WRITEBACK_FRAGMENT_THRESHOLD_HIGH 64 -#define BCH_DIRTY_INIT_THRD_MAX 64 +#define BCH_DIRTY_INIT_THRD_MAX 12 /* * 14 (16384ths) is chosen here as something that each backing device * should be a reasonable fraction of the share, and not to blow up -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/4] bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() 2022-05-24 10:23 [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Coly Li 2022-05-24 10:23 ` [PATCH 1/4] bcache: improve multithreaded bch_btree_check() Coly Li 2022-05-24 10:23 ` [PATCH 2/4] bcache: improve multithreaded bch_sectors_dirty_init() Coly Li @ 2022-05-24 10:23 ` Coly Li 2022-05-24 10:23 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li 2022-05-24 12:19 ` [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Jens Axboe 4 siblings, 0 replies; 10+ messages in thread From: Coly Li @ 2022-05-24 10:23 UTC (permalink / raw) To: axboe; +Cc: linux-bcache, linux-block, Coly Li, stable After making bch_sectors_dirty_init() being multithreaded, the existing incremental dirty sector counting in bch_root_node_dirty_init() doesn't release btree occupation after iterating 500000 (INIT_KEYS_EACH_TIME) bkeys. Because a read lock is added on btree root node to prevent the btree to be split during the dirty sectors counting, other I/O requester has no chance to gain the write lock even restart bcache_btree(). That is to say, the incremental dirty sectors counting is incompatible to the multhreaded bch_sectors_dirty_init(). We have to choose one and drop another one. In my testing, with 512 bytes random writes, I generate 1.2T dirty data and a btree with 400K nodes. With single thread and incremental dirty sectors counting, it takes 30+ minites to register the backing device. And with multithreaded dirty sectors counting, the backing device registration can be accomplished within 2 minutes. The 30+ minutes V.S. 2- minutes difference makes me decide to keep multithreaded bch_sectors_dirty_init() and drop the incremental dirty sectors counting. This is what this patch does. But INIT_KEYS_EACH_TIME is kept, in sectors_dirty_init_fn() the CPU will be released by cond_resched() after every INIT_KEYS_EACH_TIME keys iterated. This is to avoid the watchdog reports a bogus soft lockup warning. Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org --- drivers/md/bcache/writeback.c | 41 +++++++++++------------------------ 1 file changed, 13 insertions(+), 28 deletions(-) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index d24c09490f8e..75b71199800d 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -805,13 +805,11 @@ static int bch_writeback_thread(void *arg) /* Init */ #define INIT_KEYS_EACH_TIME 500000 -#define INIT_KEYS_SLEEP_MS 100 struct sectors_dirty_init { struct btree_op op; unsigned int inode; size_t count; - struct bkey start; }; static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b, @@ -827,11 +825,8 @@ static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b, KEY_START(k), KEY_SIZE(k)); op->count++; - if (atomic_read(&b->c->search_inflight) && - !(op->count % INIT_KEYS_EACH_TIME)) { - bkey_copy_key(&op->start, k); - return -EAGAIN; - } + if (!(op->count % INIT_KEYS_EACH_TIME)) + cond_resched(); return MAP_CONTINUE; } @@ -846,24 +841,16 @@ static int bch_root_node_dirty_init(struct cache_set *c, bch_btree_op_init(&op.op, -1); op.inode = d->id; op.count = 0; - op.start = KEY(op.inode, 0, 0); - - do { - ret = bcache_btree(map_keys_recurse, - k, - c->root, - &op.op, - &op.start, - sectors_dirty_init_fn, - 0); - if (ret == -EAGAIN) - schedule_timeout_interruptible( - msecs_to_jiffies(INIT_KEYS_SLEEP_MS)); - else if (ret < 0) { - pr_warn("sectors dirty init failed, ret=%d!\n", ret); - break; - } - } while (ret == -EAGAIN); + + ret = bcache_btree(map_keys_recurse, + k, + c->root, + &op.op, + &KEY(op.inode, 0, 0), + sectors_dirty_init_fn, + 0); + if (ret < 0) + pr_warn("sectors dirty init failed, ret=%d!\n", ret); return ret; } @@ -907,7 +894,6 @@ static int bch_dirty_init_thread(void *arg) goto out; } skip_nr--; - cond_resched(); } if (p) { @@ -917,7 +903,6 @@ static int bch_dirty_init_thread(void *arg) p = NULL; prev_idx = cur_idx; - cond_resched(); } out: @@ -956,11 +941,11 @@ void bch_sectors_dirty_init(struct bcache_device *d) bch_btree_op_init(&op.op, -1); op.inode = d->id; op.count = 0; - op.start = KEY(op.inode, 0, 0); for_each_key_filter(&c->root->keys, k, &iter, bch_ptr_invalid) sectors_dirty_init_fn(&op.op, c->root, k); + rw_unlock(0, c->root); return; } -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket 2022-05-24 10:23 [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Coly Li ` (2 preceding siblings ...) 2022-05-24 10:23 ` [PATCH 3/4] bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() Coly Li @ 2022-05-24 10:23 ` Coly Li 2022-05-24 12:19 ` [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Jens Axboe 4 siblings, 0 replies; 10+ messages in thread From: Coly Li @ 2022-05-24 10:23 UTC (permalink / raw) To: axboe; +Cc: linux-bcache, linux-block, Coly Li, Nikhil Kshirsagar, stable The journal no-space deadlock was reported time to time. Such deadlock can happen in the following situation. When all journal buckets are fully filled by active jset with heavy write I/O load, the cache set registration (after a reboot) will load all active jsets and inserting them into the btree again (which is called journal replay). If a journaled bkey is inserted into a btree node and results btree node split, new journal request might be triggered. For example, the btree grows one more level after the node split, then the root node record in cache device super block will be upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no space in journal buckets, the journal replay has to wait for new journal bucket to be reclaimed after at least one journal bucket replayed. This is one example that how the journal no-space deadlock happens. The solution to avoid the deadlock is to reserve 1 journal bucket in run time, and only permit the reserved journal bucket to be used during cache set registration procedure for things like journal replay. Then the journal space will never be fully filled, there is no chance for journal no-space deadlock to happen anymore. This patch adds a new member "bool do_reserve" in struct journal, it is inititalized to 0 (false) when struct journal is allocated, and set to 1 (true) by bch_journal_space_reserve() when all initialization done in run_cache_set(). In the run time when journal_reclaim() tries to allocate a new journal bucket, free_journal_buckets() is called to check whether there are enough free journal buckets to use. If there is only 1 free journal bucket and journal->do_reserve is 1 (true), the last bucket is reserved and free_journal_buckets() will return 0 to indicate no free journal bucket. Then journal_reclaim() will give up, and try next time to see whetheer there is free journal bucket to allocate. By this method, there is always 1 jouranl bucket reserved in run time. During the cache set registration, journal->do_reserve is 0 (false), so the reserved journal bucket can be used to avoid the no-space deadlock. Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org --- drivers/md/bcache/journal.c | 31 ++++++++++++++++++++++++++----- drivers/md/bcache/journal.h | 2 ++ drivers/md/bcache/super.c | 1 + 3 files changed, 29 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index df5347ea450b..e5da469a4235 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -405,6 +405,11 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list) return ret; } +void bch_journal_space_reserve(struct journal *j) +{ + j->do_reserve = true; +} + /* Journalling */ static void btree_flush_write(struct cache_set *c) @@ -621,12 +626,30 @@ static void do_journal_discard(struct cache *ca) } } +static unsigned int free_journal_buckets(struct cache_set *c) +{ + struct journal *j = &c->journal; + struct cache *ca = c->cache; + struct journal_device *ja = &c->cache->journal; + unsigned int n; + + /* In case njournal_buckets is not power of 2 */ + if (ja->cur_idx >= ja->discard_idx) + n = ca->sb.njournal_buckets + ja->discard_idx - ja->cur_idx; + else + n = ja->discard_idx - ja->cur_idx; + + if (n > (1 + j->do_reserve)) + return n - (1 + j->do_reserve); + + return 0; +} + static void journal_reclaim(struct cache_set *c) { struct bkey *k = &c->journal.key; struct cache *ca = c->cache; uint64_t last_seq; - unsigned int next; struct journal_device *ja = &ca->journal; atomic_t p __maybe_unused; @@ -649,12 +672,10 @@ static void journal_reclaim(struct cache_set *c) if (c->journal.blocks_free) goto out; - next = (ja->cur_idx + 1) % ca->sb.njournal_buckets; - /* No space available on this device */ - if (next == ja->discard_idx) + if (!free_journal_buckets(c)) goto out; - ja->cur_idx = next; + ja->cur_idx = (ja->cur_idx + 1) % ca->sb.njournal_buckets; k->ptr[0] = MAKE_PTR(0, bucket_to_sector(c, ca->sb.d[ja->cur_idx]), ca->sb.nr_this_dev); diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h index f2ea34d5f431..cd316b4a1e95 100644 --- a/drivers/md/bcache/journal.h +++ b/drivers/md/bcache/journal.h @@ -105,6 +105,7 @@ struct journal { spinlock_t lock; spinlock_t flush_write_lock; bool btree_flushing; + bool do_reserve; /* used when waiting because the journal was full */ struct closure_waitlist wait; struct closure io; @@ -182,5 +183,6 @@ int bch_journal_replay(struct cache_set *c, struct list_head *list); void bch_journal_free(struct cache_set *c); int bch_journal_alloc(struct cache_set *c); +void bch_journal_space_reserve(struct journal *j); #endif /* _BCACHE_JOURNAL_H */ diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index bf3de149d3c9..2bb55278d22d 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2128,6 +2128,7 @@ static int run_cache_set(struct cache_set *c) flash_devs_run(c); + bch_journal_space_reserve(&c->journal); set_bit(CACHE_SET_RUNNING, &c->flags); return 0; err: -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) 2022-05-24 10:23 [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Coly Li ` (3 preceding siblings ...) 2022-05-24 10:23 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li @ 2022-05-24 12:19 ` Jens Axboe 4 siblings, 0 replies; 10+ messages in thread From: Jens Axboe @ 2022-05-24 12:19 UTC (permalink / raw) To: colyli; +Cc: linux-bcache, linux-block On Tue, 24 May 2022 18:23:32 +0800, Coly Li wrote: > Thank you for taking the late arrived series, they are all for bcache > fixes when I work on the bcache journal no-space deadlock issue. It > spent me quite long time to fix because other issues interfered my debug > and analysis. When all the depending issues were fixed and my fix for > the journal no-space deadlock is verified, this submission is late for > Linux v5.19 submission. But it is still worthy to take them into v5.19 > because real issues are fixed by this series. > > [...] Applied, thanks! [1/4] bcache: improve multithreaded bch_btree_check() commit: 622536443b6731ec82c563aae7807165adbe9178 [2/4] bcache: improve multithreaded bch_sectors_dirty_init() commit: 4dc34ae1b45fe26e772a44379f936c72623dd407 [3/4] bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() commit: 80db4e4707e78cb22287da7d058d7274bd4cb370 [4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket commit: 32feee36c30ea06e38ccb8ae6e5c44c6eec790a6 Best regards, -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/4] bcache patches for Linux v5.19 (1st wave) @ 2022-05-22 17:07 Coly Li 2022-05-22 17:07 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li 0 siblings, 1 reply; 10+ messages in thread From: Coly Li @ 2022-05-22 17:07 UTC (permalink / raw) To: axboe; +Cc: linux-bcache, linux-block, Coly Li Hi Jens, The bcache has 4 patches for Linux v5.19 merge window, all from me. - The first 2 patches are code clean up and potential bug fixes for multi- threaded btree nodes check (for cache device) and dirty sectors counting (for backing device), although no report from mailing list for them, it is good to have the fixes. - The 3rd patch removes incremental dirty sectors counting because it is conflicted with multithreaded dirty sectors counting and the latter one is 10x times faster. - The last patch fixes a journal no-space deadlock during cache device registration, it always reserves one journal bucket and only uses it in registration time, so the no-spance condition won't happen anymore. There are still 2 fixes are still under the long time I/O pressure testing, once they are accomplished, I will submit to you in later RC cycles. Please take them, and thanks in advance. Coly Li --- Coly Li (4): bcache: improve multithreaded bch_btree_check() bcache: improve multithreaded bch_sectors_dirty_init() bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() bcache: avoid journal no-space deadlock by reserving 1 journal bucket drivers/md/bcache/btree.c | 58 +++++++++---------- drivers/md/bcache/journal.c | 31 +++++++++-- drivers/md/bcache/journal.h | 2 + drivers/md/bcache/super.c | 1 + drivers/md/bcache/writeback.c | 101 +++++++++++++--------------------- 5 files changed, 92 insertions(+), 101 deletions(-) -- 2.35.3 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket 2022-05-22 17:07 [PATCH 0/4] bcache patches " Coly Li @ 2022-05-22 17:07 ` Coly Li 0 siblings, 0 replies; 10+ messages in thread From: Coly Li @ 2022-05-22 17:07 UTC (permalink / raw) To: axboe; +Cc: linux-bcache, linux-block, Coly Li, Nikhil Kshirsagar, stable The journal no-space deadlock was reported time to time. Such deadlock can happen in the following situation. When all journal buckets are fully filled by active jset with heavy write I/O load, the cache set registration (after a reboot) will load all active jsets and inserting them into the btree again (which is called journal replay). If a journaled bkey is inserted into a btree node and results btree node split, new journal request might be triggered. For example, the btree grows one more level after the node split, then the root node record in cache device super block will be upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no space in journal buckets, the journal replay has to wait for new journal bucket to be reclaimed after at least one journal bucket replayed. This is one example that how the journal no-space deadlock happens. The solution to avoid the deadlock is to reserve 1 journal bucket in run time, and only permit the reserved journal bucket to be used during cache set registration procedure for things like journal replay. Then the journal space will never be fully filled, there is no chance for journal no-space deadlock to happen anymore. This patch adds a new member "bool do_reserve" in struct journal, it is inititalized to 0 (false) when struct journal is allocated, and set to 1 (true) by bch_journal_space_reserve() when all initialization done in run_cache_set(). In the run time when journal_reclaim() tries to allocate a new journal bucket, free_journal_buckets() is called to check whether there are enough free journal buckets to use. If there is only 1 free journal bucket and journal->do_reserve is 1 (true), the last bucket is reserved and free_journal_buckets() will return 0 to indicate no free journal bucket. Then journal_reclaim() will give up, and try next time to see whetheer there is free journal bucket to allocate. By this method, there is always 1 jouranl bucket reserved in run time. During the cache set registration, journal->do_reserve is 0 (false), so the reserved journal bucket can be used to avoid the no-space deadlock. Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org --- drivers/md/bcache/journal.c | 31 ++++++++++++++++++++++++++----- drivers/md/bcache/journal.h | 2 ++ drivers/md/bcache/super.c | 1 + 3 files changed, 29 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index df5347ea450b..e5da469a4235 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -405,6 +405,11 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list) return ret; } +void bch_journal_space_reserve(struct journal *j) +{ + j->do_reserve = true; +} + /* Journalling */ static void btree_flush_write(struct cache_set *c) @@ -621,12 +626,30 @@ static void do_journal_discard(struct cache *ca) } } +static unsigned int free_journal_buckets(struct cache_set *c) +{ + struct journal *j = &c->journal; + struct cache *ca = c->cache; + struct journal_device *ja = &c->cache->journal; + unsigned int n; + + /* In case njournal_buckets is not power of 2 */ + if (ja->cur_idx >= ja->discard_idx) + n = ca->sb.njournal_buckets + ja->discard_idx - ja->cur_idx; + else + n = ja->discard_idx - ja->cur_idx; + + if (n > (1 + j->do_reserve)) + return n - (1 + j->do_reserve); + + return 0; +} + static void journal_reclaim(struct cache_set *c) { struct bkey *k = &c->journal.key; struct cache *ca = c->cache; uint64_t last_seq; - unsigned int next; struct journal_device *ja = &ca->journal; atomic_t p __maybe_unused; @@ -649,12 +672,10 @@ static void journal_reclaim(struct cache_set *c) if (c->journal.blocks_free) goto out; - next = (ja->cur_idx + 1) % ca->sb.njournal_buckets; - /* No space available on this device */ - if (next == ja->discard_idx) + if (!free_journal_buckets(c)) goto out; - ja->cur_idx = next; + ja->cur_idx = (ja->cur_idx + 1) % ca->sb.njournal_buckets; k->ptr[0] = MAKE_PTR(0, bucket_to_sector(c, ca->sb.d[ja->cur_idx]), ca->sb.nr_this_dev); diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h index f2ea34d5f431..cd316b4a1e95 100644 --- a/drivers/md/bcache/journal.h +++ b/drivers/md/bcache/journal.h @@ -105,6 +105,7 @@ struct journal { spinlock_t lock; spinlock_t flush_write_lock; bool btree_flushing; + bool do_reserve; /* used when waiting because the journal was full */ struct closure_waitlist wait; struct closure io; @@ -182,5 +183,6 @@ int bch_journal_replay(struct cache_set *c, struct list_head *list); void bch_journal_free(struct cache_set *c); int bch_journal_alloc(struct cache_set *c); +void bch_journal_space_reserve(struct journal *j); #endif /* _BCACHE_JOURNAL_H */ diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index bf3de149d3c9..2bb55278d22d 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2128,6 +2128,7 @@ static int run_cache_set(struct cache_set *c) flash_devs_run(c); + bch_journal_space_reserve(&c->journal); set_bit(CACHE_SET_RUNNING, &c->flags); return 0; err: -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 1/4] bcache: improve multithreaded bch_btree_check() @ 2022-05-21 17:04 Coly Li 2022-05-21 17:05 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li 0 siblings, 1 reply; 10+ messages in thread From: Coly Li @ 2022-05-21 17:04 UTC (permalink / raw) To: linux-bcache; +Cc: linux-block, Coly Li, stable Commit 8e7102273f59 ("bcache: make bch_btree_check() to be multithreaded") makes bch_btree_check() to be much faster when checking all btree nodes during cache device registration. But it isn't in ideal shap yet, still can be improved. This patch does the following thing to improve current parallel btree nodes check by multiple threads in bch_btree_check(), - Add read lock to root node while checking all the btree nodes with multiple threads. Although currently it is not mandatory but it is good to have a read lock in code logic. - Remove local variable 'char name[32]', and generate kernel thread name string directly when calling kthread_run(). - Allocate local variable "struct btree_check_state check_state" on the stack and avoid unnecessary dynamic memory allocation for it. - Increase check_state->started to count created kernel thread after it succeeds to create. - When wait for all checking kernel threads to finish, use wait_event() to replace wait_event_interruptible(). With this change, the code is more clear, and some potential error conditions are avoided. Fixes: 8e7102273f59 ("bcache: make bch_btree_check() to be multithreaded") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org --- drivers/md/bcache/btree.c | 58 ++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 32 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index ad9f16689419..2362bb8ef6d1 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -2006,8 +2006,7 @@ int bch_btree_check(struct cache_set *c) int i; struct bkey *k = NULL; struct btree_iter iter; - struct btree_check_state *check_state; - char name[32]; + struct btree_check_state check_state; /* check and mark root node keys */ for_each_key_filter(&c->root->keys, k, &iter, bch_ptr_invalid) @@ -2018,63 +2017,58 @@ int bch_btree_check(struct cache_set *c) if (c->root->level == 0) return 0; - check_state = kzalloc(sizeof(struct btree_check_state), GFP_KERNEL); - if (!check_state) - return -ENOMEM; - - check_state->c = c; - check_state->total_threads = bch_btree_chkthread_nr(); - check_state->key_idx = 0; - spin_lock_init(&check_state->idx_lock); - atomic_set(&check_state->started, 0); - atomic_set(&check_state->enough, 0); - init_waitqueue_head(&check_state->wait); + check_state.c = c; + check_state.total_threads = bch_btree_chkthread_nr(); + check_state.key_idx = 0; + spin_lock_init(&check_state.idx_lock); + atomic_set(&check_state.started, 0); + atomic_set(&check_state.enough, 0); + init_waitqueue_head(&check_state.wait); + rw_lock(0, c->root, c->root->level); /* * Run multiple threads to check btree nodes in parallel, - * if check_state->enough is non-zero, it means current + * if check_state.enough is non-zero, it means current * running check threads are enough, unncessary to create * more. */ - for (i = 0; i < check_state->total_threads; i++) { - /* fetch latest check_state->enough earlier */ + for (i = 0; i < check_state.total_threads; i++) { + /* fetch latest check_state.enough earlier */ smp_mb__before_atomic(); - if (atomic_read(&check_state->enough)) + if (atomic_read(&check_state.enough)) break; - check_state->infos[i].result = 0; - check_state->infos[i].state = check_state; - snprintf(name, sizeof(name), "bch_btrchk[%u]", i); - atomic_inc(&check_state->started); + check_state.infos[i].result = 0; + check_state.infos[i].state = &check_state; - check_state->infos[i].thread = + check_state.infos[i].thread = kthread_run(bch_btree_check_thread, - &check_state->infos[i], - name); - if (IS_ERR(check_state->infos[i].thread)) { + &check_state.infos[i], + "bch_btrchk[%d]", i); + if (IS_ERR(check_state.infos[i].thread)) { pr_err("fails to run thread bch_btrchk[%d]\n", i); for (--i; i >= 0; i--) - kthread_stop(check_state->infos[i].thread); + kthread_stop(check_state.infos[i].thread); ret = -ENOMEM; goto out; } + atomic_inc(&check_state.started); } /* * Must wait for all threads to stop. */ - wait_event_interruptible(check_state->wait, - atomic_read(&check_state->started) == 0); + wait_event(check_state.wait, atomic_read(&check_state.started) == 0); - for (i = 0; i < check_state->total_threads; i++) { - if (check_state->infos[i].result) { - ret = check_state->infos[i].result; + for (i = 0; i < check_state.total_threads; i++) { + if (check_state.infos[i].result) { + ret = check_state.infos[i].result; goto out; } } out: - kfree(check_state); + rw_unlock(0, c->root); return ret; } -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket 2022-05-21 17:04 [PATCH 1/4] bcache: improve multithreaded bch_btree_check() Coly Li @ 2022-05-21 17:05 ` Coly Li 2022-06-08 20:45 ` Nix 0 siblings, 1 reply; 10+ messages in thread From: Coly Li @ 2022-05-21 17:05 UTC (permalink / raw) To: linux-bcache; +Cc: linux-block, Coly Li, Nikhil Kshirsagar, stable The journal no-space deadlock was reported time to time. Such deadlock can happen in the following situation. When all journal buckets are fully filled by active jset with heavy write I/O load, the cache set registration (after a reboot) will load all active jsets and inserting them into the btree again (which is called journal replay). If a journaled bkey is inserted into a btree node and results btree node split, new journal request might be triggered. For example, the btree grows one more level after the node split, then the root node record in cache device super block will be upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no space in journal buckets, the journal replay has to wait for new journal bucket to be reclaimed after at least one journal bucket replayed. This is one example that how the journal no-space deadlock happens. The solution to avoid the deadlock is to reserve 1 journal bucket in run time, and only permit the reserved journal bucket to be used during cache set registration procedure for things like journal replay. Then the journal space will never be fully filled, there is no chance for journal no-space deadlock to happen anymore. This patch adds a new member "bool do_reserve" in struct journal, it is inititalized to 0 (false) when struct journal is allocated, and set to 1 (true) by bch_journal_space_reserve() when all initialization done in run_cache_set(). In the run time when journal_reclaim() tries to allocate a new journal bucket, free_journal_buckets() is called to check whether there are enough free journal buckets to use. If there is only 1 free journal bucket and journal->do_reserve is 1 (true), the last bucket is reserved and free_journal_buckets() will return 0 to indicate no free journal bucket. Then journal_reclaim() will give up, and try next time to see whetheer there is free journal bucket to allocate. By this method, there is always 1 jouranl bucket reserved in run time. During the cache set registration, journal->do_reserve is 0 (false), so the reserved journal bucket can be used to avoid the no-space deadlock. Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org --- drivers/md/bcache/journal.c | 31 ++++++++++++++++++++++++++----- drivers/md/bcache/journal.h | 2 ++ drivers/md/bcache/super.c | 1 + 3 files changed, 29 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index df5347ea450b..e5da469a4235 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -405,6 +405,11 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list) return ret; } +void bch_journal_space_reserve(struct journal *j) +{ + j->do_reserve = true; +} + /* Journalling */ static void btree_flush_write(struct cache_set *c) @@ -621,12 +626,30 @@ static void do_journal_discard(struct cache *ca) } } +static unsigned int free_journal_buckets(struct cache_set *c) +{ + struct journal *j = &c->journal; + struct cache *ca = c->cache; + struct journal_device *ja = &c->cache->journal; + unsigned int n; + + /* In case njournal_buckets is not power of 2 */ + if (ja->cur_idx >= ja->discard_idx) + n = ca->sb.njournal_buckets + ja->discard_idx - ja->cur_idx; + else + n = ja->discard_idx - ja->cur_idx; + + if (n > (1 + j->do_reserve)) + return n - (1 + j->do_reserve); + + return 0; +} + static void journal_reclaim(struct cache_set *c) { struct bkey *k = &c->journal.key; struct cache *ca = c->cache; uint64_t last_seq; - unsigned int next; struct journal_device *ja = &ca->journal; atomic_t p __maybe_unused; @@ -649,12 +672,10 @@ static void journal_reclaim(struct cache_set *c) if (c->journal.blocks_free) goto out; - next = (ja->cur_idx + 1) % ca->sb.njournal_buckets; - /* No space available on this device */ - if (next == ja->discard_idx) + if (!free_journal_buckets(c)) goto out; - ja->cur_idx = next; + ja->cur_idx = (ja->cur_idx + 1) % ca->sb.njournal_buckets; k->ptr[0] = MAKE_PTR(0, bucket_to_sector(c, ca->sb.d[ja->cur_idx]), ca->sb.nr_this_dev); diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h index f2ea34d5f431..cd316b4a1e95 100644 --- a/drivers/md/bcache/journal.h +++ b/drivers/md/bcache/journal.h @@ -105,6 +105,7 @@ struct journal { spinlock_t lock; spinlock_t flush_write_lock; bool btree_flushing; + bool do_reserve; /* used when waiting because the journal was full */ struct closure_waitlist wait; struct closure io; @@ -182,5 +183,6 @@ int bch_journal_replay(struct cache_set *c, struct list_head *list); void bch_journal_free(struct cache_set *c); int bch_journal_alloc(struct cache_set *c); +void bch_journal_space_reserve(struct journal *j); #endif /* _BCACHE_JOURNAL_H */ diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index bf3de149d3c9..2bb55278d22d 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2128,6 +2128,7 @@ static int run_cache_set(struct cache_set *c) flash_devs_run(c); + bch_journal_space_reserve(&c->journal); set_bit(CACHE_SET_RUNNING, &c->flags); return 0; err: -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket 2022-05-21 17:05 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li @ 2022-06-08 20:45 ` Nix 2022-06-09 13:54 ` Coly Li 0 siblings, 1 reply; 10+ messages in thread From: Nix @ 2022-06-08 20:45 UTC (permalink / raw) To: Coly Li; +Cc: linux-bcache, linux-block, Nikhil Kshirsagar, stable On 21 May 2022, Coly Li spake thusly: > When all journal buckets are fully filled by active jset with heavy > write I/O load, the cache set registration (after a reboot) will load > all active jsets and inserting them into the btree again (which is > called journal replay). If a journaled bkey is inserted into a btree > node and results btree node split, new journal request might be > triggered. For example, the btree grows one more level after the node > split, then the root node record in cache device super block will be > upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no > space in journal buckets, the journal replay has to wait for new journal > bucket to be reclaimed after at least one journal bucket replayed. This > is one example that how the journal no-space deadlock happens. > > The solution to avoid the deadlock is to reserve 1 journal bucket in It seems to me that this could happen more than once in a single journal replay (multiple nodes might be split, etc). Is one bucket actually always enough, or is it merely enough nearly all the time? -- NULL && (void) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket 2022-06-08 20:45 ` Nix @ 2022-06-09 13:54 ` Coly Li 0 siblings, 0 replies; 10+ messages in thread From: Coly Li @ 2022-06-09 13:54 UTC (permalink / raw) To: Nix; +Cc: linux-bcache, linux-block, Nikhil Kshirsagar, stable > 2022年6月9日 04:45,Nix <nix@esperi.org.uk> 写道: > > On 21 May 2022, Coly Li spake thusly: > >> When all journal buckets are fully filled by active jset with heavy >> write I/O load, the cache set registration (after a reboot) will load >> all active jsets and inserting them into the btree again (which is >> called journal replay). If a journaled bkey is inserted into a btree >> node and results btree node split, new journal request might be >> triggered. For example, the btree grows one more level after the node >> split, then the root node record in cache device super block will be >> upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no >> space in journal buckets, the journal replay has to wait for new journal >> bucket to be reclaimed after at least one journal bucket replayed. This >> is one example that how the journal no-space deadlock happens. >> >> The solution to avoid the deadlock is to reserve 1 journal bucket in > > It seems to me that this could happen more than once in a single journal > replay (multiple nodes might be split, etc). Is one bucket actually > always enough, or is it merely enough nearly all the time? It is possible that multiple leaf nodes split during journal replay, but the journal_meta() only gets called when the root node is updated. For the new bkey of the new split node inserting into root node, it doesn’t go into journal because journal only records inserting bkeys for leaf nodes. Only when the btree node split causes root node split, the new root node location (bkey) has to be recored in journal set. Therefore almost all the time that btree root node only splits once during journal replay, it is very rare that between two root node splits (that means very large number of bkeys inserted) the oldest journal entry doesn’t get replayed, that is almost impossible in real practice. So reserving 8K journal space is indeed enough for the no-space deadlock situation. The default bucket size is much larger than 8K, so we don’t have to worry about the reserved journal space exhausted even with a much larger journal buckets number. Indeed my initial effort was to reserve 8K space within a journal bucket if the bucket size > 8KB. But there are too many locations should be careful, and the logic of the patch is complicated and total change set is 200+ lines. And I find if I reserve a whole bucket, the change set is only 30+ lines. So finally I decide to reserve a whole journal bucket, because the change is much simpler and easier to be understood. Coly Li ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-06-09 13:55 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-24 10:23 [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Coly Li 2022-05-24 10:23 ` [PATCH 1/4] bcache: improve multithreaded bch_btree_check() Coly Li 2022-05-24 10:23 ` [PATCH 2/4] bcache: improve multithreaded bch_sectors_dirty_init() Coly Li 2022-05-24 10:23 ` [PATCH 3/4] bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() Coly Li 2022-05-24 10:23 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li 2022-05-24 12:19 ` [Resend PATCH v2 0/4] bcache fixes for Linux v5.19 (1st wave) Jens Axboe -- strict thread matches above, loose matches on Subject: below -- 2022-05-22 17:07 [PATCH 0/4] bcache patches " Coly Li 2022-05-22 17:07 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li 2022-05-21 17:04 [PATCH 1/4] bcache: improve multithreaded bch_btree_check() Coly Li 2022-05-21 17:05 ` [PATCH 4/4] bcache: avoid journal no-space deadlock by reserving 1 journal bucket Coly Li 2022-06-08 20:45 ` Nix 2022-06-09 13:54 ` Coly Li
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.