* [PATCH 1/2] bcache: fixup btree_cache_wait list damage
@ 2022-04-01 12:27 mingzhe.zou
  2022-04-01 12:27 ` [PATCH 2/2] bcache: check return in the register process and handle error mingzhe.zou
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: mingzhe.zou @ 2022-04-01 12:27 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: zoumingzhe, ZouMingzhe

From: ZouMingzhe <mingzhe.zou@easystack.cn>

We get a kernel crash about "list_add corruption. next->prev should be
prev (ffff9c801bc01210), but was ffff9c77b688237c. (next=ffffae586d8afe68)."

crash> struct list_head 0xffff9c801bc01210
struct list_head {
  next = 0xffffae586d8afe68,
  prev = 0xffffae586d8afe68
}
crash> struct list_head 0xffff9c77b688237c
struct list_head {
  next = 0x0,
  prev = 0x0
}
crash> struct list_head 0xffffae586d8afe68
struct list_head struct: invalid kernel virtual address: ffffae586d8afe68  type: "gdb_readmem_callback"
Cannot access memory at address 0xffffae586d8afe68

[230469.019492] Call Trace:
[230469.032041]  prepare_to_wait+0x8a/0xb0
[230469.044363]  ? bch_btree_keys_free+0x6c/0xc0 [escache]
[230469.056533]  mca_cannibalize_lock+0x72/0x90 [escache]
[230469.068788]  mca_alloc+0x2ae/0x450 [escache]
[230469.080790]  bch_btree_node_get+0x136/0x2d0 [escache]
[230469.092681]  bch_btree_check_thread+0x1e1/0x260 [escache]
[230469.104382]  ? finish_wait+0x80/0x80
[230469.115884]  ? bch_btree_check_recurse+0x1a0/0x1a0 [escache]
[230469.127259]  kthread+0x112/0x130
[230469.138448]  ? kthread_flush_work_fn+0x10/0x10
[230469.149477]  ret_from_fork+0x35/0x40

bch_btree_check_thread() and bch_dirty_init_thread() may call
mca_cannibalize() to cannibalize other cached btree nodes. Only
one thread can do this at a time, so the op of any other thread
is added to the btree_cache_wait list.

We must call finish_wait() to remove the op from btree_cache_wait
before freeing its memory. Otherwise, the list will be damaged.
We should also call bch_cannibalize_unlock() to release the
btree_cache_alloc_lock and wake up the other waiters.
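The hazard can be shown in a userspace miniature (a sketch only; wnode, enqueue_waiter() and finish_waiter() are illustrative stand-ins for the kernel's list_head, prepare_to_wait() and finish_wait(), not the real API). An on-stack node linked into a global wait list must be unlinked before its stack frame dies, or the list is left pointing at dead memory, exactly like the crash above:

```c
#include <assert.h>
#include <stddef.h>

struct wnode {
	struct wnode *next, *prev;
};

/* Circular list head, analogous to the kernel's list_head. */
static struct wnode wait_list = { &wait_list, &wait_list };

static void enqueue_waiter(struct wnode *w)	/* like prepare_to_wait() */
{
	w->next = wait_list.next;
	w->prev = &wait_list;
	wait_list.next->prev = w;
	wait_list.next = w;
}

static void finish_waiter(struct wnode *w)	/* like finish_wait() */
{
	w->prev->next = w->next;
	w->next->prev = w->prev;
	w->next = w->prev = w;
}

/* Correct pattern: the on-stack entry is unlinked before scope exit. */
static int correct_thread_body(void)
{
	struct wnode op;		/* lives on this stack frame */

	enqueue_waiter(&op);
	/* ... do work ... */
	finish_waiter(&op);		/* must happen before returning */
	return 0;
}

/* After the correct pattern, the list is empty and consistent again. */
static int list_is_consistent(void)
{
	return wait_list.next == &wait_list && wait_list.prev == &wait_list;
}
```

If correct_thread_body() returned without calling finish_waiter(), wait_list.next would point into a dead stack frame, which is the list_add corruption the crash report shows.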

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
---
 drivers/md/bcache/btree.c     | 10 +++++++++-
 drivers/md/bcache/btree.h     |  2 ++
 drivers/md/bcache/writeback.c |  8 ++++++++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ad9f16689419..f8e6f5c7c736 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -885,7 +885,7 @@ static struct btree *mca_cannibalize(struct cache_set *c, struct btree_op *op,
  * cannibalize_bucket() will take. This means every time we unlock the root of
  * the btree, we need to release this lock if we have it held.
  */
-static void bch_cannibalize_unlock(struct cache_set *c)
+void bch_cannibalize_unlock(struct cache_set *c)
 {
 	spin_lock(&c->btree_cannibalize_lock);
 	if (c->btree_cache_alloc_lock == current) {
@@ -1968,6 +1968,14 @@ static int bch_btree_check_thread(void *arg)
 			c->gc_stats.nodes++;
 			bch_btree_op_init(&op, 0);
 			ret = bcache_btree(check_recurse, p, c->root, &op);
+			/* The op may be added to cache_set's btree_cache_wait
+			 * in mca_cannibalize(). We must ensure it is removed
+			 * from the list and release btree_cache_alloc_lock
+			 * before freeing the op memory.
+			 * Otherwise, btree_cache_wait will be damaged.
+			 */
+			bch_cannibalize_unlock(c);
+			finish_wait(&c->btree_cache_wait, &op.wait);
 			if (ret)
 				goto out;
 		}
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index 50482107134f..435e82574ac3 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -365,6 +365,8 @@ static inline void force_wake_up_gc(struct cache_set *c)
 	_r;                                                             \
 })
 
+void bch_cannibalize_unlock(struct cache_set *c);
+
 #define MAP_DONE	0
 #define MAP_CONTINUE	1
 
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 9ee0005874cd..5b828555bca8 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -865,6 +865,14 @@ static int bch_root_node_dirty_init(struct cache_set *c,
 		}
 	} while (ret == -EAGAIN);
 
+	/* The op may be added to cache_set's btree_cache_wait
+	 * in mca_cannibalize(). We must ensure it is removed
+	 * from the list and release btree_cache_alloc_lock
+	 * before freeing the op memory.
+	 * Otherwise, btree_cache_wait will be damaged.
+	 */
+	bch_cannibalize_unlock(c);
+	finish_wait(&c->btree_cache_wait, &op.op.wait);
 	return ret;
 }
 
-- 
2.17.1



* [PATCH 2/2] bcache: check return in the register process and handle error
  2022-04-01 12:27 [PATCH 1/2] bcache: fixup btree_cache_wait list damage mingzhe.zou
@ 2022-04-01 12:27 ` mingzhe.zou
  2022-04-07 16:24   ` Coly Li
  2022-04-07 15:53 ` [PATCH 1/2] bcache: fixup btree_cache_wait list damage Coly Li
  2022-04-11  3:04 ` [PATCH v2 1/3] bcache: bch_sectors_dirty_init() check each thread result and return error mingzhe.zou
  2 siblings, 1 reply; 12+ messages in thread
From: mingzhe.zou @ 2022-04-01 12:27 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: zoumingzhe, ZouMingzhe

From: ZouMingzhe <mingzhe.zou@easystack.cn>

The bcache device registration process may return errors. However,
some functions either have no return value or leave it unchecked,
which allows a problematic device to be registered successfully.

We must check for these errors and handle them properly.
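The pattern this series moves toward can be sketched as a minimal userspace example (step(), cleanup() and register_device() are made-up stand-ins, not the real bcache functions): each registration step returns a status, and any failure unwinds through a single error label, the usual kernel goto-cleanup idiom:

```c
#include <assert.h>

enum { STEP_OK = 0, STEP_FAIL = -1 };

static int cleanup_calls;	/* counts how often the unwind path ran */

static int step(int should_fail)
{
	return should_fail ? STEP_FAIL : STEP_OK;
}

static void cleanup(void)
{
	cleanup_calls++;
}

/* fail_at: 0 = no failure, 1..2 = which step fails */
static int register_device(int fail_at)
{
	int ret;

	ret = step(fail_at == 1);	/* e.g. a dirty-init style step */
	if (ret)
		goto err;

	ret = step(fail_at == 2);	/* e.g. a run-device style step */
	if (ret)
		goto err;

	return 0;

err:
	cleanup();			/* single shared unwind path */
	return ret;
}
```

Every step's status is propagated, so a failed step can never leave the device looking successfully registered.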

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
---
 drivers/md/bcache/btree.c     | 30 ++++++++++-----
 drivers/md/bcache/super.c     | 69 +++++++++++++++++++++++------------
 drivers/md/bcache/writeback.c | 27 +++++++++++---
 drivers/md/bcache/writeback.h |  3 +-
 4 files changed, 91 insertions(+), 38 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index f8e6f5c7c736..7525d8c2406c 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1032,7 +1032,7 @@ struct btree *bch_btree_node_get(struct cache_set *c, struct btree_op *op,
 	return b;
 }
 
-static void btree_node_prefetch(struct btree *parent, struct bkey *k)
+static int btree_node_prefetch(struct btree *parent, struct bkey *k)
 {
 	struct btree *b;
 
@@ -1040,11 +1040,16 @@ static void btree_node_prefetch(struct btree *parent, struct bkey *k)
 	b = mca_alloc(parent->c, NULL, k, parent->level - 1);
 	mutex_unlock(&parent->c->bucket_lock);
 
-	if (!IS_ERR_OR_NULL(b)) {
-		b->parent = parent;
-		bch_btree_node_read(b);
-		rw_unlock(true, b);
-	}
+	if (IS_ERR(b))
+		return PTR_ERR(b);
+
+	if (!b)
+		return -ENOMEM;
+
+	b->parent = parent;
+	bch_btree_node_read(b);
+	rw_unlock(true, b);
+	return 0;
 }
 
 /* Btree alloc */
@@ -1888,7 +1893,7 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op)
 			k = bch_btree_iter_next_filter(&iter, &b->keys,
 						       bch_ptr_bad);
 			if (k) {
-				btree_node_prefetch(b, k);
+				ret = btree_node_prefetch(b, k);
 				/*
 				 * initiallize c->gc_stats.nodes
 				 * for incremental GC
@@ -1896,7 +1901,7 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op)
 				b->c->gc_stats.nodes++;
 			}
 
-			if (p)
+			if (p && !ret)
 				ret = bcache_btree(check_recurse, p, b, op);
 
 			p = k;
@@ -1965,6 +1970,8 @@ static int bch_btree_check_thread(void *arg)
 			struct btree_op op;
 
-			btree_node_prefetch(c->root, p);
+			ret = btree_node_prefetch(c->root, p);
+			if (ret)
+				goto out;
 			c->gc_stats.nodes++;
 			bch_btree_op_init(&op, 0);
 			ret = bcache_btree(check_recurse, p, c->root, &op);
@@ -2064,16 +2072,20 @@ int bch_btree_check(struct cache_set *c)
 			for (--i; i >= 0; i--)
 				kthread_stop(check_state->infos[i].thread);
 			ret = -ENOMEM;
-			goto out;
+			goto out_wait;
 		}
 	}
 
 	/*
 	 * Must wait for all threads to stop.
 	 */
+out_wait:
 	wait_event_interruptible(check_state->wait,
 				 atomic_read(&check_state->started) == 0);
 
+	if (ret)
+		goto out;
+
 	for (i = 0; i < check_state->total_threads; i++) {
 		if (check_state->infos[i].result) {
 			ret = check_state->infos[i].result;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index bf3de149d3c9..a6b062a87acc 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1300,21 +1300,17 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 		bch_writeback_queue(dc);
 	}
 
-	bch_sectors_dirty_init(&dc->disk);
+	ret = bch_sectors_dirty_init(&dc->disk);
+	if (ret) {
+		pr_err("Fails in sectors dirty init for %s\n",
+		       dc->disk.disk->disk_name);
+		goto err;
+	}
 
 	ret = bch_cached_dev_run(dc);
 	if (ret && (ret != -EBUSY)) {
-		up_write(&dc->writeback_lock);
-		/*
-		 * bch_register_lock is held, bcache_device_stop() is not
-		 * able to be directly called. The kthread and kworker
-		 * created previously in bch_cached_dev_writeback_start()
-		 * have to be stopped manually here.
-		 */
-		kthread_stop(dc->writeback_thread);
-		cancel_writeback_rate_update_dwork(dc);
 		pr_err("Couldn't run cached device %pg\n", dc->bdev);
-		return ret;
+		goto err;
 	}
 
 	bcache_device_link(&dc->disk, c, "bdev");
@@ -1334,6 +1330,18 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 		dc->disk.disk->disk_name,
 		dc->disk.c->set_uuid);
 	return 0;
+
+err:
+	up_write(&dc->writeback_lock);
+	/*
+	 * bch_register_lock is held, bcache_device_stop() is not
+	 * able to be directly called. The kthread and kworker
+	 * created previously in bch_cached_dev_writeback_start()
+	 * have to be stopped manually here.
+	 */
+	kthread_stop(dc->writeback_thread);
+	cancel_writeback_rate_update_dwork(dc);
+	return ret;
 }
 
 /* when dc->disk.kobj released */
@@ -1472,8 +1480,12 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
 
 	list_add(&dc->list, &uncached_devices);
 	/* attach to a matched cache set if it exists */
-	list_for_each_entry(c, &bch_cache_sets, list)
-		bch_cached_dev_attach(dc, c, NULL);
+	err = "failed to attach cached device";
+	list_for_each_entry(c, &bch_cache_sets, list) {
+		ret = bch_cached_dev_attach(dc, c, NULL);
+		if (ret)
+			goto err;
+	}
 
 	if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE ||
 	    BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) {
@@ -1526,7 +1538,7 @@ static void flash_dev_flush(struct closure *cl)
 
 static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
 {
-	int err = -ENOMEM;
+	int ret = -ENOMEM;
 	struct bcache_device *d = kzalloc(sizeof(struct bcache_device),
 					  GFP_KERNEL);
 	if (!d)
@@ -1542,14 +1554,19 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
 		goto err;
 
 	bcache_device_attach(d, c, u - c->uuids);
-	bch_sectors_dirty_init(d);
+
+	ret = bch_sectors_dirty_init(d);
+	if (ret)
+		goto err;
+
 	bch_flash_dev_request_init(d);
-	err = add_disk(d->disk);
-	if (err)
+
+	ret = add_disk(d->disk);
+	if (ret)
 		goto err;
 
-	err = kobject_add(&d->kobj, &disk_to_dev(d->disk)->kobj, "bcache");
-	if (err)
+	ret = kobject_add(&d->kobj, &disk_to_dev(d->disk)->kobj, "bcache");
+	if (ret)
 		goto err;
 
 	bcache_device_link(d, c, "volume");
@@ -1564,7 +1581,7 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
 err:
 	kobject_put(&d->kobj);
 err_ret:
-	return err;
+	return ret;
 }
 
 static int flash_devs_run(struct cache_set *c)
@@ -1971,6 +1988,8 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 
 static int run_cache_set(struct cache_set *c)
 {
+	int ret = -EIO;
+
 	const char *err = "cannot allocate memory";
 	struct cached_dev *dc, *t;
 	struct cache *ca = c->cache;
@@ -2123,8 +2142,12 @@ static int run_cache_set(struct cache_set *c)
 	if (bch_has_feature_obso_large_bucket(&c->cache->sb))
 		pr_err("Detect obsoleted large bucket layout, all attached bcache device will be read-only\n");
 
-	list_for_each_entry_safe(dc, t, &uncached_devices, list)
-		bch_cached_dev_attach(dc, c, NULL);
+	err = "failed to attach cached device";
+	list_for_each_entry_safe(dc, t, &uncached_devices, list) {
+		ret = bch_cached_dev_attach(dc, c, NULL);
+		if (ret)
+			goto err;
+	}
 
 	flash_devs_run(c);
 
@@ -2141,7 +2164,7 @@ static int run_cache_set(struct cache_set *c)
 
 	bch_cache_set_error(c, "%s", err);
 
-	return -EIO;
+	return ret;
 }
 
 static const char *register_cache_set(struct cache *ca)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 5b828555bca8..f61cb71ca0c7 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -878,6 +878,7 @@ static int bch_root_node_dirty_init(struct cache_set *c,
 
 static int bch_dirty_init_thread(void *arg)
 {
+	int ret = 0;
 	struct dirty_init_thrd_info *info = arg;
 	struct bch_dirty_init_state *state = info->state;
 	struct cache_set *c = state->c;
@@ -919,7 +920,8 @@ static int bch_dirty_init_thread(void *arg)
 		}
 
 		if (p) {
-			if (bch_root_node_dirty_init(c, state->d, p) < 0)
+			ret = bch_root_node_dirty_init(c, state->d, p);
+			if (ret < 0)
 				goto out;
 		}
 
@@ -929,6 +931,7 @@ static int bch_dirty_init_thread(void *arg)
 	}
 
 out:
+	info->result = ret;
 	/* In order to wake up state->wait in time */
 	smp_mb__before_atomic();
 	if (atomic_dec_and_test(&state->started))
@@ -949,8 +952,9 @@ static int bch_btre_dirty_init_thread_nr(void)
 	return n;
 }
 
-void bch_sectors_dirty_init(struct bcache_device *d)
+int bch_sectors_dirty_init(struct bcache_device *d)
 {
+	int ret = 0;
 	int i;
 	struct bkey *k = NULL;
 	struct btree_iter iter;
@@ -969,13 +973,13 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 		for_each_key_filter(&c->root->keys,
 				    k, &iter, bch_ptr_invalid)
 			sectors_dirty_init_fn(&op.op, c->root, k);
-		return;
+		return 0;
 	}
 
 	state = kzalloc(sizeof(struct bch_dirty_init_state), GFP_KERNEL);
 	if (!state) {
 		pr_warn("sectors dirty init failed: cannot allocate memory\n");
-		return;
+		return -ENOMEM;
 	}
 
 	state->c = c;
@@ -1005,18 +1009,31 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 			pr_err("fails to run thread bch_dirty_init[%d]\n", i);
 			for (--i; i >= 0; i--)
 				kthread_stop(state->infos[i].thread);
-			goto out;
+			ret = -ENOMEM;
+			goto out_wait;
 		}
 	}
 
 	/*
 	 * Must wait for all threads to stop.
 	 */
+out_wait:
 	wait_event_interruptible(state->wait,
 		 atomic_read(&state->started) == 0);
 
+	if (ret)
+		goto out;
+
+	for (i = 0; i < state->total_threads; i++) {
+		if (state->infos[i].result) {
+			ret = state->infos[i].result;
+			goto out;
+		}
+	}
+
 out:
 	kfree(state);
+	return ret;
 }
 
 void bch_cached_dev_writeback_init(struct cached_dev *dc)
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index 02b2f9df73f6..0e42b3de5be6 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -32,6 +32,7 @@ struct bch_dirty_init_state;
 struct dirty_init_thrd_info {
 	struct bch_dirty_init_state	*state;
 	struct task_struct		*thread;
+	int				result;
 };
 
 struct bch_dirty_init_state {
@@ -148,7 +149,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
 void bcache_dev_sectors_dirty_add(struct cache_set *c, unsigned int inode,
 				  uint64_t offset, int nr_sectors);
 
-void bch_sectors_dirty_init(struct bcache_device *d);
+int bch_sectors_dirty_init(struct bcache_device *d);
 void bch_cached_dev_writeback_init(struct cached_dev *dc);
 int bch_cached_dev_writeback_start(struct cached_dev *dc);
 
-- 
2.17.1



* Re: [PATCH 1/2] bcache: fixup btree_cache_wait list damage
  2022-04-01 12:27 [PATCH 1/2] bcache: fixup btree_cache_wait list damage mingzhe.zou
  2022-04-01 12:27 ` [PATCH 2/2] bcache: check return in the register process and handle error mingzhe.zou
@ 2022-04-07 15:53 ` Coly Li
  2022-04-08  7:22   ` Zou Mingzhe
  2022-04-11  3:04 ` [PATCH v2 1/3] bcache: bch_sectors_dirty_init() check each thread result and return error mingzhe.zou
  2 siblings, 1 reply; 12+ messages in thread
From: Coly Li @ 2022-04-07 15:53 UTC (permalink / raw)
  To: mingzhe.zou; +Cc: zoumingzhe, linux-bcache

On 4/1/22 8:27 PM, mingzhe.zou@easystack.cn wrote:
> From: ZouMingzhe <mingzhe.zou@easystack.cn>
>
> We get a kernel crash about "list_add corruption. next->prev should be
> prev (ffff9c801bc01210), but was ffff9c77b688237c. (next=ffffae586d8afe68)."
>
> crash> struct list_head 0xffff9c801bc01210
> struct list_head {
>    next = 0xffffae586d8afe68,
>    prev = 0xffffae586d8afe68
> }
> crash> struct list_head 0xffff9c77b688237c
> struct list_head {
>    next = 0x0,
>    prev = 0x0
> }
> crash> struct list_head 0xffffae586d8afe68
> struct list_head struct: invalid kernel virtual address: ffffae586d8afe68  type: "gdb_readmem_callback"
> Cannot access memory at address 0xffffae586d8afe68
>
> [230469.019492] Call Trace:
> [230469.032041]  prepare_to_wait+0x8a/0xb0
> [230469.044363]  ? bch_btree_keys_free+0x6c/0xc0 [escache]
> [230469.056533]  mca_cannibalize_lock+0x72/0x90 [escache]
> [230469.068788]  mca_alloc+0x2ae/0x450 [escache]
> [230469.080790]  bch_btree_node_get+0x136/0x2d0 [escache]
> [230469.092681]  bch_btree_check_thread+0x1e1/0x260 [escache]
> [230469.104382]  ? finish_wait+0x80/0x80
> [230469.115884]  ? bch_btree_check_recurse+0x1a0/0x1a0 [escache]
> [230469.127259]  kthread+0x112/0x130
> [230469.138448]  ? kthread_flush_work_fn+0x10/0x10
> [230469.149477]  ret_from_fork+0x35/0x40
>
> bch_btree_check_thread() and bch_dirty_init_thread() may call
> mca_cannibalize() to cannibalize other cached btree nodes. Only
> one thread can do this at a time, so the op of any other thread
> is added to the btree_cache_wait list.
>
> We must call finish_wait() to remove the op from btree_cache_wait
> before freeing its memory. Otherwise, the list will be damaged.
> We should also call bch_cannibalize_unlock() to release the
> btree_cache_alloc_lock and wake up the other waiters.
>
> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>


Thank you for this fix, it is really cool to find such a defect in the
cannibalize lock/unlock. It took me a little time to understand how it
happens, hence the late reply.

I feel the root cause is not where you patched in bch_btree_check()
and bch_root_node_dirty_init(); something is really fishy in how
mca_cannibalize_lock() is used:

843 static int mca_cannibalize_lock(struct cache_set *c, struct btree_op *op)
844 {
845         spin_lock(&c->btree_cannibalize_lock);
846         if (likely(c->btree_cache_alloc_lock == NULL)) {
847                 c->btree_cache_alloc_lock = current;
848         } else if (c->btree_cache_alloc_lock != current) {
849                 if (op)
850                         prepare_to_wait(&c->btree_cache_wait, &op->wait,
851                                         TASK_UNINTERRUPTIBLE);
852                 spin_unlock(&c->btree_cannibalize_lock);
853                 return -EINTR;
854         }
855         spin_unlock(&c->btree_cannibalize_lock);
856
857         return 0;
858 }

In lines 849-851, if the cannibalize locking fails, the current
op->wait is inserted into c->btree_cache_wait. Then at line 853, -EINTR
is returned to indicate that the caller should retry. But it seems no
caller checks whether the return value is -EINTR and handles it properly.

Your patch should work, but I feel the issue in mca_cannibalize_lock()
is not solved yet. Maybe we should work on handling the -EINTR returned
from mca_cannibalize_lock(), IMHO.


BTW, when you observed the panic, what was the hardware configuration:

- CPU cores

- Memory size

- Cache size

- Number of keys on the btree root node

Thanks.


Coly Li



> ---
>   drivers/md/bcache/btree.c     | 10 +++++++++-
>   drivers/md/bcache/btree.h     |  2 ++
>   drivers/md/bcache/writeback.c |  8 ++++++++
>   3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index ad9f16689419..f8e6f5c7c736 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -885,7 +885,7 @@ static struct btree *mca_cannibalize(struct cache_set *c, struct btree_op *op,
>    * cannibalize_bucket() will take. This means every time we unlock the root of
>    * the btree, we need to release this lock if we have it held.
>    */
> -static void bch_cannibalize_unlock(struct cache_set *c)
> +void bch_cannibalize_unlock(struct cache_set *c)
>   {
>   	spin_lock(&c->btree_cannibalize_lock);
>   	if (c->btree_cache_alloc_lock == current) {
> @@ -1968,6 +1968,14 @@ static int bch_btree_check_thread(void *arg)
>   			c->gc_stats.nodes++;
>   			bch_btree_op_init(&op, 0);
>   			ret = bcache_btree(check_recurse, p, c->root, &op);
> +			/* The op may be added to cache_set's btree_cache_wait
> +			* in mca_cannibalize(), must ensure it is removed from
> +			* the list and release btree_cache_alloc_lock before
> +			* free op memory.
> +			* Otherwise, the btree_cache_wait will be damaged.
> +			*/
> +			bch_cannibalize_unlock(c);
> +			finish_wait(&c->btree_cache_wait, &(&op)->wait);
>   			if (ret)
>   				goto out;
>   		}
> diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
> index 50482107134f..435e82574ac3 100644
> --- a/drivers/md/bcache/btree.h
> +++ b/drivers/md/bcache/btree.h
> @@ -365,6 +365,8 @@ static inline void force_wake_up_gc(struct cache_set *c)
>   	_r;                                                             \
>   })
>   
> +void bch_cannibalize_unlock(struct cache_set *c);
> +
>   #define MAP_DONE	0
>   #define MAP_CONTINUE	1
>   
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 9ee0005874cd..5b828555bca8 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -865,6 +865,14 @@ static int bch_root_node_dirty_init(struct cache_set *c,
>   		}
>   	} while (ret == -EAGAIN);
>   
> +	/* The op may be added to cache_set's btree_cache_wait
> +	 * in mca_cannibalize(), must ensure it is removed from
> +	 * the list and release btree_cache_alloc_lock before
> +	 * free op memory.
> +	 * Otherwise, the btree_cache_wait will be damaged.
> +	 */
> +	bch_cannibalize_unlock(c);
> +	finish_wait(&c->btree_cache_wait, &(&op.op)->wait);
>   	return ret;
>   }
>   




* Re: [PATCH 2/2] bcache: check return in the register process and handle error
  2022-04-01 12:27 ` [PATCH 2/2] bcache: check return in the register process and handle error mingzhe.zou
@ 2022-04-07 16:24   ` Coly Li
  0 siblings, 0 replies; 12+ messages in thread
From: Coly Li @ 2022-04-07 16:24 UTC (permalink / raw)
  To: mingzhe.zou; +Cc: zoumingzhe, linux-bcache

On 4/1/22 8:27 PM, mingzhe.zou@easystack.cn wrote:
> From: ZouMingzhe <mingzhe.zou@easystack.cn>
>
> The bcache device registration process may return errors. However,
> some functions either have no return value or leave it unchecked,
> which allows a problematic device to be registered successfully.
>
> We must check for these errors and handle them properly.
>
> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
> ---
>   drivers/md/bcache/btree.c     | 30 ++++++++++-----
>   drivers/md/bcache/super.c     | 69 +++++++++++++++++++++++------------
>   drivers/md/bcache/writeback.c | 27 +++++++++++---
>   drivers/md/bcache/writeback.h |  3 +-
>   4 files changed, 91 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index f8e6f5c7c736..7525d8c2406c 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -1032,7 +1032,7 @@ struct btree *bch_btree_node_get(struct cache_set *c, struct btree_op *op,
>   	return b;
>   }
>   
> -static void btree_node_prefetch(struct btree *parent, struct bkey *k)
> +static int btree_node_prefetch(struct btree *parent, struct bkey *k)
>   {
>   	struct btree *b;
>   
> @@ -1040,11 +1040,16 @@ static void btree_node_prefetch(struct btree *parent, struct bkey *k)
>   	b = mca_alloc(parent->c, NULL, k, parent->level - 1);
>   	mutex_unlock(&parent->c->bucket_lock);
>   
> -	if (!IS_ERR_OR_NULL(b)) {
> -		b->parent = parent;
> -		bch_btree_node_read(b);
> -		rw_unlock(true, b);
> -	}
> +	if (IS_ERR(b))
> +		return PTR_ERR(b);
> +
> +	if (!b)
> +		return -ENOMEM;
> +
> +	b->parent = parent;
> +	bch_btree_node_read(b);
> +	rw_unlock(true, b);
> +	return 0;
>   }

Hi Mingzhe,

btree_node_prefetch() is a best-effort operation; it may fail and is
not required to succeed all the time. If it fails, the following code
will allocate the in-memory cache object for the btree node later.

So don't add a mandatory return value and check for it.


>   
>   /* Btree alloc */
> @@ -1888,7 +1893,7 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op)
>   			k = bch_btree_iter_next_filter(&iter, &b->keys,
>   						       bch_ptr_bad);
>   			if (k) {
> -				btree_node_prefetch(b, k);
> +				ret = btree_node_prefetch(b, k);
>   				/*
>   				 * initiallize c->gc_stats.nodes
>   				 * for incremental GC
> @@ -1896,7 +1901,7 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op)
>   				b->c->gc_stats.nodes++;
>   			}
>   
> -			if (p)
> +			if (p && !ret)
>   				ret = bcache_btree(check_recurse, p, b, op);

For example, in the above code, if the previous btree_node_prefetch()
succeeds, the bcache_btree() iteration will not allocate the in-memory
cache object for the btree node pointed to by k again; otherwise, a
cache object for that btree node will be allocated at iteration time.



>   
>   			p = k;
> @@ -1965,6 +1970,9 @@ static int bch_btree_check_thread(void *arg)
>   			struct btree_op op;
>   
>   			btree_node_prefetch(c->root, p);
> +			ret = btree_node_prefetch(c->root, p);
> +			if (ret)
> +				goto out;

Again, it is unnecessary to check btree_node_prefetch() here either.


>   			c->gc_stats.nodes++;
>   			bch_btree_op_init(&op, 0);
>   			ret = bcache_btree(check_recurse, p, c->root, &op);
> @@ -2064,16 +2072,20 @@ int bch_btree_check(struct cache_set *c)
>   			for (--i; i >= 0; i--)
>   				kthread_stop(check_state->infos[i].thread);
>   			ret = -ENOMEM;
> -			goto out;
> +			goto out_wait;
>   		}
>   	}
>   
>   	/*
>   	 * Must wait for all threads to stop.
>   	 */
> +out_wait:
>   	wait_event_interruptible(check_state->wait,
>   				 atomic_read(&check_state->started) == 0);
>   
> +	if (ret)
> +		goto out;
> +
>   	for (i = 0; i < check_state->total_threads; i++) {
>   		if (check_state->infos[i].result) {
>   			ret = check_state->infos[i].result;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index bf3de149d3c9..a6b062a87acc 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1300,21 +1300,17 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
>   		bch_writeback_queue(dc);
>   	}
>   
> -	bch_sectors_dirty_init(&dc->disk);
> +	ret = bch_sectors_dirty_init(&dc->disk);
> +	if (ret) {
> +		pr_err("Fails in sectors dirty init for %s\n",
> +		       dc->disk.disk->disk_name);
> +		goto err;
> +	}
>   
>   	ret = bch_cached_dev_run(dc);
>   	if (ret && (ret != -EBUSY)) {
> -		up_write(&dc->writeback_lock);
> -		/*
> -		 * bch_register_lock is held, bcache_device_stop() is not
> -		 * able to be directly called. The kthread and kworker
> -		 * created previously in bch_cached_dev_writeback_start()
> -		 * have to be stopped manually here.
> -		 */
> -		kthread_stop(dc->writeback_thread);
> -		cancel_writeback_rate_update_dwork(dc);
>   		pr_err("Couldn't run cached device %pg\n", dc->bdev);
> -		return ret;
> +		goto err;
>   	}
>   
>   	bcache_device_link(&dc->disk, c, "bdev");
> @@ -1334,6 +1330,18 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
>   		dc->disk.disk->disk_name,
>   		dc->disk.c->set_uuid);
>   	return 0;
> +
> +err:
> +	up_write(&dc->writeback_lock);
> +	/*
> +	 * bch_register_lock is held, bcache_device_stop() is not
> +	 * able to be directly called. The kthread and kworker
> +	 * created previously in bch_cached_dev_writeback_start()
> +	 * have to be stopped manually here.
> +	 */
> +	kthread_stop(dc->writeback_thread);
> +	cancel_writeback_rate_update_dwork(dc);
> +	return ret;
>   }

I agree it makes sense to check for failure from bch_sectors_dirty_init().


>   
>   /* when dc->disk.kobj released */
> @@ -1472,8 +1480,12 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
>   
>   	list_add(&dc->list, &uncached_devices);
>   	/* attach to a matched cache set if it exists */
> -	list_for_each_entry(c, &bch_cache_sets, list)
> -		bch_cached_dev_attach(dc, c, NULL);
> +	err = "failed to attach cached device";
> +	list_for_each_entry(c, &bch_cache_sets, list) {
> +		ret = bch_cached_dev_attach(dc, c, NULL);
> +		if (ret)
> +			goto err;
> +	}
>   


NO, all the candidate cache sets should be tried; you cannot give up
on the remaining cache sets when encountering an error here.
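The point in miniature (attach_sim() and attach_any() are invented names, not bcache functions): one failed candidate must not end the scan; every remaining cache set is still tried.

```c
#include <assert.h>

/* Only the candidate matching `target` attaches successfully. */
static int attach_sim(int candidate, int target)
{
	return candidate == target ? 0 : -1;
}

/* Try every candidate; succeed if any attach succeeds. */
static int attach_any(const int *candidates, int n, int target)
{
	int i, ret = -1;

	for (i = 0; i < n; i++) {
		/* An early `return` or `goto` on failure here would
		 * skip the candidates that could still match. */
		if (attach_sim(candidates[i], target) == 0)
			ret = 0;
	}
	return ret;
}
```

Bailing out on the first non-zero attach would be wrong precisely because earlier candidates are expected to fail for devices that belong to a later cache set.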


>   	if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE ||
>   	    BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) {
> @@ -1526,7 +1538,7 @@ static void flash_dev_flush(struct closure *cl)
>   
>   static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
>   {
> -	int err = -ENOMEM;
> +	int ret = -ENOMEM;
>   	struct bcache_device *d = kzalloc(sizeof(struct bcache_device),
>   					  GFP_KERNEL);
>   	if (!d)
> @@ -1542,14 +1554,19 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
>   		goto err;
>   
>   	bcache_device_attach(d, c, u - c->uuids);
> -	bch_sectors_dirty_init(d);
> +
> +	ret = bch_sectors_dirty_init(d);
> +	if (ret)
> +		goto err;

The above check might make sense.


> +
>   	bch_flash_dev_request_init(d);
> -	err = add_disk(d->disk);
> -	if (err)
> +
> +	ret = add_disk(d->disk);
> +	if (ret)
>   		goto err;
>   
> -	err = kobject_add(&d->kobj, &disk_to_dev(d->disk)->kobj, "bcache");
> -	if (err)
> +	ret = kobject_add(&d->kobj, &disk_to_dev(d->disk)->kobj, "bcache");
> +	if (ret)
>   		goto err;
>   
>   	bcache_device_link(d, c, "volume");
> @@ -1564,7 +1581,7 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
>   err:
>   	kobject_put(&d->kobj);
>   err_ret:
> -	return err;
> +	return ret;
>   }

Why not keep using int err? Except for the return value check for
bch_sectors_dirty_init(), the rest of the changes are only renaming.



>   
>   static int flash_devs_run(struct cache_set *c)
> @@ -1971,6 +1988,8 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
>   
>   static int run_cache_set(struct cache_set *c)
>   {
> +	int ret = -EIO;
> +
>   	const char *err = "cannot allocate memory";
>   	struct cached_dev *dc, *t;
>   	struct cache *ca = c->cache;
> @@ -2123,8 +2142,12 @@ static int run_cache_set(struct cache_set *c)
>   	if (bch_has_feature_obso_large_bucket(&c->cache->sb))
>   		pr_err("Detect obsoleted large bucket layout, all attached bcache device will be read-only\n");
>   
> -	list_for_each_entry_safe(dc, t, &uncached_devices, list)
> -		bch_cached_dev_attach(dc, c, NULL);
> +	err = "failed to attach cached device";
> +	list_for_each_entry_safe(dc, t, &uncached_devices, list) {
> +		ret = bch_cached_dev_attach(dc, c, NULL);
> +		if (ret)
> +			goto err;
> +	}
>   

This is wrong; all the uncached devices should be iterated.


>   	flash_devs_run(c);
>   
> @@ -2141,7 +2164,7 @@ static int run_cache_set(struct cache_set *c)
>   
>   	bch_cache_set_error(c, "%s", err);
>   
> -	return -EIO;
> +	return ret;
>   }
>   
>   static const char *register_cache_set(struct cache *ca)
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c


At first glance, the rest of the changes make sense. Could you please
split this big patch into multiple ones, with each patch containing
only a specific change?


Thanks for the fix up.


Coly Li


> index 5b828555bca8..f61cb71ca0c7 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -878,6 +878,7 @@ static int bch_root_node_dirty_init(struct cache_set *c,
>   
>   static int bch_dirty_init_thread(void *arg)
>   {
> +	int ret = 0;
>   	struct dirty_init_thrd_info *info = arg;
>   	struct bch_dirty_init_state *state = info->state;
>   	struct cache_set *c = state->c;
> @@ -919,7 +920,8 @@ static int bch_dirty_init_thread(void *arg)
>   		}
>   
>   		if (p) {
> -			if (bch_root_node_dirty_init(c, state->d, p) < 0)
> +			ret = bch_root_node_dirty_init(c, state->d, p);
> +			if (ret < 0)
>   				goto out;
>   		}
>   
> @@ -929,6 +931,7 @@ static int bch_dirty_init_thread(void *arg)
>   	}
>   
>   out:
> +	info->result = ret;
>   	/* In order to wake up state->wait in time */
>   	smp_mb__before_atomic();
>   	if (atomic_dec_and_test(&state->started))
> @@ -949,8 +952,9 @@ static int bch_btre_dirty_init_thread_nr(void)
>   	return n;
>   }
>   
> -void bch_sectors_dirty_init(struct bcache_device *d)
> +int bch_sectors_dirty_init(struct bcache_device *d)
>   {
> +	int ret = 0;
>   	int i;
>   	struct bkey *k = NULL;
>   	struct btree_iter iter;
> @@ -969,13 +973,13 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>   		for_each_key_filter(&c->root->keys,
>   				    k, &iter, bch_ptr_invalid)
>   			sectors_dirty_init_fn(&op.op, c->root, k);
> -		return;
> +		return 0;
>   	}
>   
>   	state = kzalloc(sizeof(struct bch_dirty_init_state), GFP_KERNEL);
>   	if (!state) {
>   		pr_warn("sectors dirty init failed: cannot allocate memory\n");
> -		return;
> +		return -ENOMEM;
>   	}
>   
>   	state->c = c;
> @@ -1005,18 +1009,31 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>   			pr_err("fails to run thread bch_dirty_init[%d]\n", i);
>   			for (--i; i >= 0; i--)
>   				kthread_stop(state->infos[i].thread);
> -			goto out;
> +			ret = -ENOMEM;
> +			goto out_wait;
>   		}
>   	}
>   
>   	/*
>   	 * Must wait for all threads to stop.
>   	 */
> +out_wait:
>   	wait_event_interruptible(state->wait,
>   		 atomic_read(&state->started) == 0);
>   
> +	if (ret)
> +		goto out;
> +
> +	for (i = 0; i < state->total_threads; i++) {
> +		if (state->infos[i].result) {
> +			ret = state->infos[i].result;
> +			goto out;
> +		}
> +	}
> +
>   out:
>   	kfree(state);
> +	return ret;
>   }
>   
>   void bch_cached_dev_writeback_init(struct cached_dev *dc)
> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
> index 02b2f9df73f6..0e42b3de5be6 100644
> --- a/drivers/md/bcache/writeback.h
> +++ b/drivers/md/bcache/writeback.h
> @@ -32,6 +32,7 @@ struct bch_dirty_init_state;
>   struct dirty_init_thrd_info {
>   	struct bch_dirty_init_state	*state;
>   	struct task_struct		*thread;
> +	int				result;
>   };
>   
>   struct bch_dirty_init_state {
> @@ -148,7 +149,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
>   void bcache_dev_sectors_dirty_add(struct cache_set *c, unsigned int inode,
>   				  uint64_t offset, int nr_sectors);
>   
> -void bch_sectors_dirty_init(struct bcache_device *d);
> +int bch_sectors_dirty_init(struct bcache_device *d);
>   void bch_cached_dev_writeback_init(struct cached_dev *dc);
>   int bch_cached_dev_writeback_start(struct cached_dev *dc);
>   



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] bcache: fixup btree_cache_wait list damage
  2022-04-07 15:53 ` [PATCH 1/2] bcache: fixup btree_cache_wait list damage Coly Li
@ 2022-04-08  7:22   ` Zou Mingzhe
  2022-04-08  8:39     ` Coly Li
  0 siblings, 1 reply; 12+ messages in thread
From: Zou Mingzhe @ 2022-04-08  7:22 UTC (permalink / raw)
  To: Coly Li; +Cc: zoumingzhe, linux-bcache


On 2022/4/7 23:53, Coly Li wrote:
> On 4/1/22 8:27 PM, mingzhe.zou@easystack.cn wrote:
>> From: ZouMingzhe <mingzhe.zou@easystack.cn>
>>
>> We get a kernel crash about "list_add corruption. next->prev should be
>> prev (ffff9c801bc01210), but was ffff9c77b688237c. 
>> (next=ffffae586d8afe68)."
>>
>> crash> struct list_head 0xffff9c801bc01210
>> struct list_head {
>>    next = 0xffffae586d8afe68,
>>    prev = 0xffffae586d8afe68
>> }
>> crash> struct list_head 0xffff9c77b688237c
>> struct list_head {
>>    next = 0x0,
>>    prev = 0x0
>> }
>> crash> struct list_head 0xffffae586d8afe68
>> struct list_head struct: invalid kernel virtual address: 
>> ffffae586d8afe68  type: "gdb_readmem_callback"
>> Cannot access memory at address 0xffffae586d8afe68
>>
>> [230469.019492] Call Trace:
>> [230469.032041]  prepare_to_wait+0x8a/0xb0
>> [230469.044363]  ? bch_btree_keys_free+0x6c/0xc0 [escache]
>> [230469.056533]  mca_cannibalize_lock+0x72/0x90 [escache]
>> [230469.068788]  mca_alloc+0x2ae/0x450 [escache]
>> [230469.080790]  bch_btree_node_get+0x136/0x2d0 [escache]
>> [230469.092681]  bch_btree_check_thread+0x1e1/0x260 [escache]
>> [230469.104382]  ? finish_wait+0x80/0x80
>> [230469.115884]  ? bch_btree_check_recurse+0x1a0/0x1a0 [escache]
>> [230469.127259]  kthread+0x112/0x130
>> [230469.138448]  ? kthread_flush_work_fn+0x10/0x10
>> [230469.149477]  ret_from_fork+0x35/0x40
>>
>> bch_btree_check_thread() and bch_dirty_init_thread() maybe call
>> mca_cannibalize() to cannibalize other cached btree nodes. Only
>> one thread can do it at a time, so the op of other threads will
>> be added to the btree_cache_wait list.
>>
>> We must call finish_wait() to remove op from btree_cache_wait
>> before free it's memory address. Otherwise, the list will be
>> damaged. Also should call bch_cannibalize_unlock() to release
>> the btree_cache_alloc_lock and wake_up other waiters.
>>
>> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
>
>
> Thank you for this fix, it is really cool to find such defect on 
> cannibalize lock/unlock. It spent me a little time to understand how 
> it happens, and reply you late.
>
> I feel the root cause is not from where you patched in 
> bch_btree_check() and bch_root_node_dirty_init(), something really 
> fishing when using mca_cannibalize_lock(),
>
> 843 static int mca_cannibalize_lock(struct cache_set *c, struct 
> btree_op *op)
>  844 {
>  845         spin_lock(&c->btree_cannibalize_lock);
>  846         if (likely(c->btree_cache_alloc_lock == NULL)) {
>  847                 c->btree_cache_alloc_lock = current;
>  848         } else if (c->btree_cache_alloc_lock != current) {
>  849                 if (op)
>  850 prepare_to_wait(&c->btree_cache_wait, &op->wait,
>  851 TASK_UNINTERRUPTIBLE);
>  852 spin_unlock(&c->btree_cannibalize_lock);
>  853                 return -EINTR;
>  854         }
>  855         spin_unlock(&c->btree_cannibalize_lock);
>  856
>  857         return 0;
>  858 }
>
> In line 849-851, if the cannibalized locking failed, insert current 
> op->wait into c->btree_cache_wait. Then at line 852, return -EINTR to 
> indicate the caller should retry. But it seems no caller checks 
> whether the return value is -EINTR and handles it properly.
>
> Your patch should work, but I feel the issue of bch_cannibalize_lock() 
> is not solved yet. Maybe we should work on handling -EINTR returned 
> from mca_cannibalize_lock() IMHO.
>
Patch 2 handles the return value.
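The retry contract Coly describes can be modeled in userspace (a
hypothetical single-threaded sketch, not the kernel code):
mca_cannibalize_lock() queues the caller and returns -EINTR when another
task owns btree_cache_alloc_lock, so the caller must retry the allocation
path rather than free its op while still queued.

```c
#include <stddef.h>

struct cache_set_model {
	void *alloc_lock_owner; /* NULL when the lock is free */
};

/* Model of mca_cannibalize_lock(): -EINTR when owned by someone else. */
static int cannibalize_trylock(struct cache_set_model *c, void *me)
{
	if (c->alloc_lock_owner == NULL) {
		c->alloc_lock_owner = me;
		return 0;
	}
	return (c->alloc_lock_owner == me) ? 0 : -4; /* -EINTR */
}

/* Model of bch_cannibalize_unlock(). */
static void cannibalize_unlock(struct cache_set_model *c, void *me)
{
	if (c->alloc_lock_owner == me)
		c->alloc_lock_owner = NULL;
}

/* Caller side: treat -EINTR as "retry"; in the kernel the wait on
 * btree_cache_wait ends when the owner unlocks and wakes waiters. */
static int alloc_with_retry(struct cache_set_model *c, void *me,
			    int *retries)
{
	int ret;

	*retries = 0;
	while ((ret = cannibalize_trylock(c, me)) == -4) {
		(*retries)++;
		/* simulate the current owner releasing the lock */
		c->alloc_lock_owner = NULL;
	}
	return ret;
}
```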
>
> BTW, when you observe the panic, how are the hardware configurations 
> about,
>
> - CPU cores
0-39, 40 CPUs in total
>
> - Memory size
memory status:
crash> kmem -i
                  PAGES        TOTAL      PERCENTAGE
     TOTAL MEM  32919429     125.6 GB         ----
          FREE   638133       2.4 GB    1% of TOTAL MEM
          USED  32281296     123.1 GB   98% of TOTAL MEM
        SHARED  1353791       5.2 GB    4% of TOTAL MEM
       BUFFERS   131366     513.1 MB    0% of TOTAL MEM
        CACHED  2022521       7.7 GB    6% of TOTAL MEM
          SLAB   590919       2.3 GB    1% of TOTAL MEM

    TOTAL HUGE        0            0         ----
     HUGE FREE        0            0    0% of TOTAL HUGE

    TOTAL SWAP        0            0         ----
     SWAP USED        0            0    0% of TOTAL SWAP
     SWAP FREE        0            0    0% of TOTAL SWAP

  COMMIT LIMIT  16459714      62.8 GB         ----
     COMMITTED  67485109     257.4 GB  410% of TOTAL LIMIT
>
> - Cache size

cache disk 460G

>
> -  Number of keys on the btree root node

c->root->keys->set->data info:

crash> btree 0xffff9c6bd873cc00|grep data
         data = 0xffff9c6bda6c0000
         data = 0xffff9c6bda6cd000
         data = 0x0
         data = 0x0
         data = {
       data = {
crash> bset 0xffff9c6bda6c0000
struct bset {
   csum = 4228267359687445853,
   magic = 15660900678624291974,
   seq = 15025931623832980119,
   version = 1,
   keys = 6621,
   {
     start = 0xffff9c6bda6c0020,
     d = 0xffff9c6bda6c0020
   }
}
crash> bset 0xffff9c6bda6cd000
struct bset {
   csum = 38040912,
   magic = 15660900678624291974,
   seq = 15025931623832980119,
   version = 0,
   keys = 0,
   {
     start = 0xffff9c6bda6cd020,
     d = 0xffff9c6bda6cd020
   }
}


>
> Thanks.
>
>
> Coly Li
>
>
>
>> ---
>>   drivers/md/bcache/btree.c     | 10 +++++++++-
>>   drivers/md/bcache/btree.h     |  2 ++
>>   drivers/md/bcache/writeback.c |  8 ++++++++
>>   3 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>> index ad9f16689419..f8e6f5c7c736 100644
>> --- a/drivers/md/bcache/btree.c
>> +++ b/drivers/md/bcache/btree.c
>> @@ -885,7 +885,7 @@ static struct btree *mca_cannibalize(struct 
>> cache_set *c, struct btree_op *op,
>>    * cannibalize_bucket() will take. This means every time we unlock 
>> the root of
>>    * the btree, we need to release this lock if we have it held.
>>    */
>> -static void bch_cannibalize_unlock(struct cache_set *c)
>> +void bch_cannibalize_unlock(struct cache_set *c)
>>   {
>>       spin_lock(&c->btree_cannibalize_lock);
>>       if (c->btree_cache_alloc_lock == current) {
>> @@ -1968,6 +1968,14 @@ static int bch_btree_check_thread(void *arg)
>>               c->gc_stats.nodes++;
>>               bch_btree_op_init(&op, 0);
>>               ret = bcache_btree(check_recurse, p, c->root, &op);
>> +            /* The op may be added to cache_set's btree_cache_wait
>> +            * in mca_cannibalize(), must ensure it is removed from
>> +            * the list and release btree_cache_alloc_lock before
>> +            * free op memory.
>> +            * Otherwise, the btree_cache_wait will be damaged.
>> +            */
>> +            bch_cannibalize_unlock(c);
>> +            finish_wait(&c->btree_cache_wait, &(&op)->wait);
>>               if (ret)
>>                   goto out;
>>           }
>> diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
>> index 50482107134f..435e82574ac3 100644
>> --- a/drivers/md/bcache/btree.h
>> +++ b/drivers/md/bcache/btree.h
>> @@ -365,6 +365,8 @@ static inline void force_wake_up_gc(struct 
>> cache_set *c)
>> _r; \
>>   })
>>   +void bch_cannibalize_unlock(struct cache_set *c);
>> +
>>   #define MAP_DONE    0
>>   #define MAP_CONTINUE    1
>>   diff --git a/drivers/md/bcache/writeback.c 
>> b/drivers/md/bcache/writeback.c
>> index 9ee0005874cd..5b828555bca8 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -865,6 +865,14 @@ static int bch_root_node_dirty_init(struct 
>> cache_set *c,
>>           }
>>       } while (ret == -EAGAIN);
>>   +    /* The op may be added to cache_set's btree_cache_wait
>> +     * in mca_cannibalize(), must ensure it is removed from
>> +     * the list and release btree_cache_alloc_lock before
>> +     * free op memory.
>> +     * Otherwise, the btree_cache_wait will be damaged.
>> +     */
>> +    bch_cannibalize_unlock(c);
>> +    finish_wait(&c->btree_cache_wait, &(&op.op)->wait);
>>       return ret;
>>   }
>
>
>


* Re: [PATCH 1/2] bcache: fixup btree_cache_wait list damage
  2022-04-08  7:22   ` Zou Mingzhe
@ 2022-04-08  8:39     ` Coly Li
  0 siblings, 0 replies; 12+ messages in thread
From: Coly Li @ 2022-04-08  8:39 UTC (permalink / raw)
  To: Zou Mingzhe; +Cc: zoumingzhe, linux-bcache

On 4/8/22 3:22 PM, Zou Mingzhe wrote:
>
> On 2022/4/7 23:53, Coly Li wrote:
>> On 4/1/22 8:27 PM, mingzhe.zou@easystack.cn wrote:
>>> From: ZouMingzhe <mingzhe.zou@easystack.cn>
>>>
>>> We get a kernel crash about "list_add corruption. next->prev should be
>>> prev (ffff9c801bc01210), but was ffff9c77b688237c. 
>>> (next=ffffae586d8afe68)."
>>>
>>> crash> struct list_head 0xffff9c801bc01210
>>> struct list_head {
>>>    next = 0xffffae586d8afe68,
>>>    prev = 0xffffae586d8afe68
>>> }
>>> crash> struct list_head 0xffff9c77b688237c
>>> struct list_head {
>>>    next = 0x0,
>>>    prev = 0x0
>>> }
>>> crash> struct list_head 0xffffae586d8afe68
>>> struct list_head struct: invalid kernel virtual address: 
>>> ffffae586d8afe68  type: "gdb_readmem_callback"
>>> Cannot access memory at address 0xffffae586d8afe68
>>>
>>> [230469.019492] Call Trace:
>>> [230469.032041]  prepare_to_wait+0x8a/0xb0
>>> [230469.044363]  ? bch_btree_keys_free+0x6c/0xc0 [escache]
>>> [230469.056533]  mca_cannibalize_lock+0x72/0x90 [escache]
>>> [230469.068788]  mca_alloc+0x2ae/0x450 [escache]
>>> [230469.080790]  bch_btree_node_get+0x136/0x2d0 [escache]
>>> [230469.092681]  bch_btree_check_thread+0x1e1/0x260 [escache]
>>> [230469.104382]  ? finish_wait+0x80/0x80
>>> [230469.115884]  ? bch_btree_check_recurse+0x1a0/0x1a0 [escache]
>>> [230469.127259]  kthread+0x112/0x130
>>> [230469.138448]  ? kthread_flush_work_fn+0x10/0x10
>>> [230469.149477]  ret_from_fork+0x35/0x40
>>>
>>> bch_btree_check_thread() and bch_dirty_init_thread() maybe call
>>> mca_cannibalize() to cannibalize other cached btree nodes. Only
>>> one thread can do it at a time, so the op of other threads will
>>> be added to the btree_cache_wait list.
>>>
>>> We must call finish_wait() to remove op from btree_cache_wait
>>> before free it's memory address. Otherwise, the list will be
>>> damaged. Also should call bch_cannibalize_unlock() to release
>>> the btree_cache_alloc_lock and wake_up other waiters.
>>>
>>> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
>>
>>
>> Thank you for this fix, it is really cool to find such defect on 
>> cannibalize lock/unlock. It spent me a little time to understand how 
>> it happens, and reply you late.
>>
>> I feel the root cause is not from where you patched in 
>> bch_btree_check() and bch_root_node_dirty_init(), something really 
>> fishing when using mca_cannibalize_lock(),
>>
>> 843 static int mca_cannibalize_lock(struct cache_set *c, struct 
>> btree_op *op)
>>  844 {
>>  845         spin_lock(&c->btree_cannibalize_lock);
>>  846         if (likely(c->btree_cache_alloc_lock == NULL)) {
>>  847                 c->btree_cache_alloc_lock = current;
>>  848         } else if (c->btree_cache_alloc_lock != current) {
>>  849                 if (op)
>>  850 prepare_to_wait(&c->btree_cache_wait, &op->wait,
>>  851 TASK_UNINTERRUPTIBLE);
>>  852 spin_unlock(&c->btree_cannibalize_lock);
>>  853                 return -EINTR;
>>  854         }
>>  855         spin_unlock(&c->btree_cannibalize_lock);
>>  856
>>  857         return 0;
>>  858 }
>>
>> In line 849-851, if the cannibalized locking failed, insert current 
>> op->wait into c->btree_cache_wait. Then at line 852, return -EINTR to 
>> indicate the caller should retry. But it seems no caller checks 
>> whether the return value is -EINTR and handles it properly.
>>
>> Your patch should work, but I feel the issue of 
>> bch_cannibalize_lock() is not solved yet. Maybe we should work on 
>> handling -EINTR returned from mca_cannibalize_lock() IMHO.
>>
> The patch 2 handle the return value.
>>
>> BTW, when you observe the panic, how are the hardware configurations 
>> about,
>>
>> - CPU cores
> 0-39, total 40 cpus
>>
>> - Memory size
> memory status:
> crash> kmem -i
>                  PAGES        TOTAL      PERCENTAGE
>     TOTAL MEM  32919429     125.6 GB         ----
>          FREE   638133       2.4 GB    1% of TOTAL MEM
>          USED  32281296     123.1 GB   98% of TOTAL MEM
>        SHARED  1353791       5.2 GB    4% of TOTAL MEM
>       BUFFERS   131366     513.1 MB    0% of TOTAL MEM
>        CACHED  2022521       7.7 GB    6% of TOTAL MEM
>          SLAB   590919       2.3 GB    1% of TOTAL MEM
>
>    TOTAL HUGE        0            0         ----
>     HUGE FREE        0            0    0% of TOTAL HUGE
>
>    TOTAL SWAP        0            0         ----
>     SWAP USED        0            0    0% of TOTAL SWAP
>     SWAP FREE        0            0    0% of TOTAL SWAP
>
>  COMMIT LIMIT  16459714      62.8 GB         ----
>     COMMITTED  67485109     257.4 GB  410% of TOTAL LIMIT
>>
>> - Cache size
>
> cache disk 460G
>
>>
>> -  Number of keys on the btree root node
>
> c->root->keys->set->data info:
>
> crash> btree 0xffff9c6bd873cc00|grep data
>         data = 0xffff9c6bda6c0000
>         data = 0xffff9c6bda6cd000
>         data = 0x0
>         data = 0x0
>         data = {
>       data = {
> crash> bset 0xffff9c6bda6c0000
> struct bset {
>   csum = 4228267359687445853,
>   magic = 15660900678624291974,
>   seq = 15025931623832980119,
>   version = 1,
>   keys = 6621,
>   {
>     start = 0xffff9c6bda6c0020,
>     d = 0xffff9c6bda6c0020
>   }
> }
> crash> bset 0xffff9c6bda6cd000
> struct bset {
>   csum = 38040912,
>   magic = 15660900678624291974,
>   seq = 15025931623832980119,
>   version = 0,
>   keys = 0,
>   {
>     start = 0xffff9c6bda6cd020,
>     d = 0xffff9c6bda6cd020
>   }
> } 


Thanks for the information. It seems a corner case of the cannibalize 
lock code is triggered by the parallel btree checking during boot 
time. Nice catch.

Coly Li



* [PATCH v2 1/3] bcache: bch_sectors_dirty_init() check each thread result and return error
  2022-04-01 12:27 [PATCH 1/2] bcache: fixup btree_cache_wait list damage mingzhe.zou
  2022-04-01 12:27 ` [PATCH 2/2] bcache: check return in the register process and handle error mingzhe.zou
  2022-04-07 15:53 ` [PATCH 1/2] bcache: fixup btree_cache_wait list damage Coly Li
@ 2022-04-11  3:04 ` mingzhe.zou
  2022-04-11  3:04   ` [PATCH v2 2/3] bcache: check bch_sectors_dirty_init() return value mingzhe.zou
  2022-04-11  3:04   ` [PATCH v2 3/3] bcache: check bch_cached_dev_attach() " mingzhe.zou
  2 siblings, 2 replies; 12+ messages in thread
From: mingzhe.zou @ 2022-04-11  3:04 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: zoumingzhe, ZouMingzhe

From: ZouMingzhe <mingzhe.zou@easystack.cn>

1. bch_dirty_init_thread() doesn't check the error returned by
   bch_root_node_dirty_init(); add a result field to struct dirty_init_thrd_info.
2. bch_sectors_dirty_init() doesn't check each thread's result and returns void;
   it should check each thread and return an error.
3. bch_btree_check() and bch_sectors_dirty_init() must wait for all threads to
   stop, and cannot return an error immediately.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
---
 drivers/md/bcache/btree.c     |  6 +++++-
 drivers/md/bcache/writeback.c | 27 ++++++++++++++++++++++-----
 drivers/md/bcache/writeback.h |  3 ++-
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index f8e6f5c7c736..f5f2718e03e5 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2064,16 +2064,20 @@ int bch_btree_check(struct cache_set *c)
 			for (--i; i >= 0; i--)
 				kthread_stop(check_state->infos[i].thread);
 			ret = -ENOMEM;
-			goto out;
+			goto out_wait;
 		}
 	}
 
 	/*
 	 * Must wait for all threads to stop.
 	 */
+out_wait:
 	wait_event_interruptible(check_state->wait,
 				 atomic_read(&check_state->started) == 0);
 
+	if (ret)
+		goto out;
+
 	for (i = 0; i < check_state->total_threads; i++) {
 		if (check_state->infos[i].result) {
 			ret = check_state->infos[i].result;
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 5b828555bca8..f61cb71ca0c7 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -878,6 +878,7 @@ static int bch_root_node_dirty_init(struct cache_set *c,
 
 static int bch_dirty_init_thread(void *arg)
 {
+	int ret = 0;
 	struct dirty_init_thrd_info *info = arg;
 	struct bch_dirty_init_state *state = info->state;
 	struct cache_set *c = state->c;
@@ -919,7 +920,8 @@ static int bch_dirty_init_thread(void *arg)
 		}
 
 		if (p) {
-			if (bch_root_node_dirty_init(c, state->d, p) < 0)
+			ret = bch_root_node_dirty_init(c, state->d, p);
+			if (ret < 0)
 				goto out;
 		}
 
@@ -929,6 +931,7 @@ static int bch_dirty_init_thread(void *arg)
 	}
 
 out:
+	info->result = ret;
 	/* In order to wake up state->wait in time */
 	smp_mb__before_atomic();
 	if (atomic_dec_and_test(&state->started))
@@ -949,8 +952,9 @@ static int bch_btre_dirty_init_thread_nr(void)
 	return n;
 }
 
-void bch_sectors_dirty_init(struct bcache_device *d)
+int bch_sectors_dirty_init(struct bcache_device *d)
 {
+	int ret = 0;
 	int i;
 	struct bkey *k = NULL;
 	struct btree_iter iter;
@@ -969,13 +973,13 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 		for_each_key_filter(&c->root->keys,
 				    k, &iter, bch_ptr_invalid)
 			sectors_dirty_init_fn(&op.op, c->root, k);
-		return;
+		return 0;
 	}
 
 	state = kzalloc(sizeof(struct bch_dirty_init_state), GFP_KERNEL);
 	if (!state) {
 		pr_warn("sectors dirty init failed: cannot allocate memory\n");
-		return;
+		return -ENOMEM;
 	}
 
 	state->c = c;
@@ -1005,18 +1009,31 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 			pr_err("fails to run thread bch_dirty_init[%d]\n", i);
 			for (--i; i >= 0; i--)
 				kthread_stop(state->infos[i].thread);
-			goto out;
+			ret = -ENOMEM;
+			goto out_wait;
 		}
 	}
 
 	/*
 	 * Must wait for all threads to stop.
 	 */
+out_wait:
 	wait_event_interruptible(state->wait,
 		 atomic_read(&state->started) == 0);
 
+	if (ret)
+		goto out;
+
+	for (i = 0; i < state->total_threads; i++) {
+		if (state->infos[i].result) {
+			ret = state->infos[i].result;
+			goto out;
+		}
+	}
+
 out:
 	kfree(state);
+	return ret;
 }
 
 void bch_cached_dev_writeback_init(struct cached_dev *dc)
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index 02b2f9df73f6..0e42b3de5be6 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -32,6 +32,7 @@ struct bch_dirty_init_state;
 struct dirty_init_thrd_info {
 	struct bch_dirty_init_state	*state;
 	struct task_struct		*thread;
+	int				result;
 };
 
 struct bch_dirty_init_state {
@@ -148,7 +149,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
 void bcache_dev_sectors_dirty_add(struct cache_set *c, unsigned int inode,
 				  uint64_t offset, int nr_sectors);
 
-void bch_sectors_dirty_init(struct bcache_device *d);
+int bch_sectors_dirty_init(struct bcache_device *d);
 void bch_cached_dev_writeback_init(struct cached_dev *dc);
 int bch_cached_dev_writeback_start(struct cached_dev *dc);
 
-- 
2.17.1



* [PATCH v2 2/3] bcache: check bch_sectors_dirty_init() return value
  2022-04-11  3:04 ` [PATCH v2 1/3] bcache: bch_sectors_dirty_init() check each thread result and return error mingzhe.zou
@ 2022-04-11  3:04   ` mingzhe.zou
  2022-04-11  3:04   ` [PATCH v2 3/3] bcache: check bch_cached_dev_attach() " mingzhe.zou
  1 sibling, 0 replies; 12+ messages in thread
From: mingzhe.zou @ 2022-04-11  3:04 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: zoumingzhe, ZouMingzhe

From: ZouMingzhe <mingzhe.zou@easystack.cn>

Handle the error when calling the bch_sectors_dirty_init() function.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
---
 drivers/md/bcache/super.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index bf3de149d3c9..e4a53c849fa6 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1300,21 +1300,17 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 		bch_writeback_queue(dc);
 	}
 
-	bch_sectors_dirty_init(&dc->disk);
+	ret = bch_sectors_dirty_init(&dc->disk);
+	if (ret) {
+		pr_err("Fails in sectors dirty init for %s\n",
+		       dc->disk.disk->disk_name);
+		goto err;
+	}
 
 	ret = bch_cached_dev_run(dc);
 	if (ret && (ret != -EBUSY)) {
-		up_write(&dc->writeback_lock);
-		/*
-		 * bch_register_lock is held, bcache_device_stop() is not
-		 * able to be directly called. The kthread and kworker
-		 * created previously in bch_cached_dev_writeback_start()
-		 * have to be stopped manually here.
-		 */
-		kthread_stop(dc->writeback_thread);
-		cancel_writeback_rate_update_dwork(dc);
 		pr_err("Couldn't run cached device %pg\n", dc->bdev);
-		return ret;
+		goto err;
 	}
 
 	bcache_device_link(&dc->disk, c, "bdev");
@@ -1334,6 +1330,18 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 		dc->disk.disk->disk_name,
 		dc->disk.c->set_uuid);
 	return 0;
+
+err:
+	up_write(&dc->writeback_lock);
+	/*
+	 * bch_register_lock is held, bcache_device_stop() is not
+	 * able to be directly called. The kthread and kworker
+	 * created previously in bch_cached_dev_writeback_start()
+	 * have to be stopped manually here.
+	 */
+	kthread_stop(dc->writeback_thread);
+	cancel_writeback_rate_update_dwork(dc);
+	return ret;
 }
 
 /* when dc->disk.kobj released */
@@ -1542,7 +1550,9 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
 		goto err;
 
 	bcache_device_attach(d, c, u - c->uuids);
-	bch_sectors_dirty_init(d);
+	err = bch_sectors_dirty_init(d);
+	if (err)
+		goto err;
 	bch_flash_dev_request_init(d);
 	err = add_disk(d->disk);
 	if (err)
-- 
2.17.1



* [PATCH v2 3/3] bcache: check bch_cached_dev_attach() return value
  2022-04-11  3:04 ` [PATCH v2 1/3] bcache: bch_sectors_dirty_init() check each thread result and return error mingzhe.zou
  2022-04-11  3:04   ` [PATCH v2 2/3] bcache: check bch_sectors_dirty_init() return value mingzhe.zou
@ 2022-04-11  3:04   ` mingzhe.zou
  2022-04-13 12:36     ` [PATCH v3] " mingzhe.zou
  2022-04-13 12:42     ` [PATCH v2 3/3] " Zou Mingzhe
  1 sibling, 2 replies; 12+ messages in thread
From: mingzhe.zou @ 2022-04-11  3:04 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: zoumingzhe, ZouMingzhe

From: ZouMingzhe <mingzhe.zou@easystack.cn>

Handle the error when calling the bch_cached_dev_attach() function.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
---
 drivers/md/bcache/super.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e4a53c849fa6..940eea5f94de 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1460,7 +1460,7 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
 {
 	const char *err = "cannot allocate memory";
 	struct cache_set *c;
-	int ret = -ENOMEM;
+	int ret = -ENOMEM, ret_tmp;
 
 	memcpy(&dc->sb, sb, sizeof(struct cache_sb));
 	dc->bdev = bdev;
@@ -1480,8 +1480,14 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
 
 	list_add(&dc->list, &uncached_devices);
 	/* attach to a matched cache set if it exists */
-	list_for_each_entry(c, &bch_cache_sets, list)
-		bch_cached_dev_attach(dc, c, NULL);
+	err = "failed to attach cached device";
+	list_for_each_entry(c, &bch_cache_sets, list) {
+		ret_tmp = bch_cached_dev_attach(dc, c, NULL);
+		if (ret_tmp)
+			ret = ret_tmp;
+	}
+	if (ret)
+		goto err;
 
 	if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE ||
 	    BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) {
@@ -1981,6 +1987,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 
 static int run_cache_set(struct cache_set *c)
 {
+	int ret = -EIO, ret_tmp;
 	const char *err = "cannot allocate memory";
 	struct cached_dev *dc, *t;
 	struct cache *ca = c->cache;
@@ -2133,8 +2140,14 @@ static int run_cache_set(struct cache_set *c)
 	if (bch_has_feature_obso_large_bucket(&c->cache->sb))
 		pr_err("Detect obsoleted large bucket layout, all attached bcache device will be read-only\n");
 
-	list_for_each_entry_safe(dc, t, &uncached_devices, list)
-		bch_cached_dev_attach(dc, c, NULL);
+	err = "failed to attach cached device";
+	list_for_each_entry_safe(dc, t, &uncached_devices, list) {
+		ret_tmp = bch_cached_dev_attach(dc, c, NULL);
+		if (ret_tmp)
+			ret = ret_tmp;
+	}
+	if (ret)
+		goto err;
 
 	flash_devs_run(c);
 
@@ -2151,7 +2164,7 @@ static int run_cache_set(struct cache_set *c)
 
 	bch_cache_set_error(c, "%s", err);
 
-	return -EIO;
+	return ret;
 }
 
 static const char *register_cache_set(struct cache *ca)
-- 
2.17.1



* [PATCH v3] bcache: check bch_cached_dev_attach() return value
  2022-04-11  3:04   ` [PATCH v2 3/3] bcache: check bch_cached_dev_attach() " mingzhe.zou
@ 2022-04-13 12:36     ` mingzhe.zou
  2022-04-13 12:42     ` [PATCH v2 3/3] " Zou Mingzhe
  1 sibling, 0 replies; 12+ messages in thread
From: mingzhe.zou @ 2022-04-13 12:36 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: zoumingzhe, ZouMingzhe

From: ZouMingzhe <mingzhe.zou@easystack.cn>

Handle the error when calling the bch_cached_dev_attach() function.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
---
 drivers/md/bcache/super.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e4a53c849fa6..82be0ec83e1f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1460,7 +1460,7 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
 {
 	const char *err = "cannot allocate memory";
 	struct cache_set *c;
-	int ret = -ENOMEM;
+	int ret = -ENOMEM, ret_tmp;
 
 	memcpy(&dc->sb, sb, sizeof(struct cache_sb));
 	dc->bdev = bdev;
@@ -1480,8 +1480,15 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
 
 	list_add(&dc->list, &uncached_devices);
 	/* attach to a matched cache set if it exists */
-	list_for_each_entry(c, &bch_cache_sets, list)
-		bch_cached_dev_attach(dc, c, NULL);
+	ret = 0;
+	err = "failed to attach cached device";
+	list_for_each_entry(c, &bch_cache_sets, list) {
+		ret_tmp = bch_cached_dev_attach(dc, c, NULL);
+		if (ret_tmp)
+			ret = ret_tmp;
+	}
+	if (ret)
+		goto err;
 
 	if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE ||
 	    BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) {
@@ -1981,6 +1988,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 
 static int run_cache_set(struct cache_set *c)
 {
+	int ret = -EIO, ret_tmp;
 	const char *err = "cannot allocate memory";
 	struct cached_dev *dc, *t;
 	struct cache *ca = c->cache;
@@ -2133,8 +2141,15 @@ static int run_cache_set(struct cache_set *c)
 	if (bch_has_feature_obso_large_bucket(&c->cache->sb))
 		pr_err("Detect obsoleted large bucket layout, all attached bcache device will be read-only\n");
 
-	list_for_each_entry_safe(dc, t, &uncached_devices, list)
-		bch_cached_dev_attach(dc, c, NULL);
+	ret = 0;
+	err = "failed to attach cached device";
+	list_for_each_entry_safe(dc, t, &uncached_devices, list) {
+		ret_tmp = bch_cached_dev_attach(dc, c, NULL);
+		if (ret_tmp)
+			ret = ret_tmp;
+	}
+	if (ret)
+		goto err;
 
 	flash_devs_run(c);
 
@@ -2151,7 +2166,7 @@ static int run_cache_set(struct cache_set *c)
 
 	bch_cache_set_error(c, "%s", err);
 
-	return -EIO;
+	return ret;
 }
 
 static const char *register_cache_set(struct cache *ca)
-- 
2.17.1



* Re: [PATCH v2 3/3] bcache: check bch_cached_dev_attach() return value
  2022-04-11  3:04   ` [PATCH v2 3/3] bcache: check bch_cached_dev_attach() " mingzhe.zou
  2022-04-13 12:36     ` [PATCH v3] " mingzhe.zou
@ 2022-04-13 12:42     ` Zou Mingzhe
  2022-04-13 15:10       ` Coly Li
  1 sibling, 1 reply; 12+ messages in thread
From: Zou Mingzhe @ 2022-04-13 12:42 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: zoumingzhe


On 2022/4/11 11:04, mingzhe.zou@easystack.cn wrote:
> From: ZouMingzhe <mingzhe.zou@easystack.cn>
>
> Handle errors returned by bch_cached_dev_attach().
>
> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
> ---
>   drivers/md/bcache/super.c | 25 +++++++++++++++++++------
>   1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index e4a53c849fa6..940eea5f94de 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1460,7 +1460,7 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
>   {
>   	const char *err = "cannot allocate memory";
>   	struct cache_set *c;
> -	int ret = -ENOMEM;
> +	int ret = -ENOMEM, ret_tmp;
>   
>   	memcpy(&dc->sb, sb, sizeof(struct cache_sb));
>   	dc->bdev = bdev;
> @@ -1480,8 +1480,14 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
>   
>   	list_add(&dc->list, &uncached_devices);
>   	/* attach to a matched cache set if it exists */
> -	list_for_each_entry(c, &bch_cache_sets, list)
> -		bch_cached_dev_attach(dc, c, NULL);
> +	err = "failed to attach cached device";
> +	list_for_each_entry(c, &bch_cache_sets, list) {
> +		ret_tmp = bch_cached_dev_attach(dc, c, NULL);
> +		if (ret_tmp)
> +			ret = ret_tmp;
> +	}
> +	if (ret)
> +		goto err;

Hi, coly

Wrong here.

I have send v3.

mingzhe

>   
>   	if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE ||
>   	    BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) {
> @@ -1981,6 +1987,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
>   
>   static int run_cache_set(struct cache_set *c)
>   {
> +	int ret = -EIO, ret_tmp;
>   	const char *err = "cannot allocate memory";
>   	struct cached_dev *dc, *t;
>   	struct cache *ca = c->cache;
> @@ -2133,8 +2140,14 @@ static int run_cache_set(struct cache_set *c)
>   	if (bch_has_feature_obso_large_bucket(&c->cache->sb))
>   		pr_err("Detect obsoleted large bucket layout, all attached bcache device will be read-only\n");
>   
> -	list_for_each_entry_safe(dc, t, &uncached_devices, list)
> -		bch_cached_dev_attach(dc, c, NULL);
> +	err = "failed to attach cached device";
> +	list_for_each_entry_safe(dc, t, &uncached_devices, list) {
> +		ret_tmp = bch_cached_dev_attach(dc, c, NULL);
> +		if (ret_tmp)
> +			ret = ret_tmp;
> +	}
> +	if (ret)
> +		goto err;
>   
>   	flash_devs_run(c);
>   
> @@ -2151,7 +2164,7 @@ static int run_cache_set(struct cache_set *c)
>   
>   	bch_cache_set_error(c, "%s", err);
>   
> -	return -EIO;
> +	return ret;
>   }
>   
>   static const char *register_cache_set(struct cache *ca)

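[Editor's note, not part of the original thread: the "Wrong here" above refers to the v2 hunk in register_bdev(), where ret is initialized to -ENOMEM and never reset before the attach loop, so the function jumps to the error path even when every attach succeeds. A minimal sketch of the bug and the v3-style fix, with simplified stand-in functions (every attach is assumed to succeed):]

```c
/* v2 behavior: ret keeps its -ENOMEM initializer because no attach
 * fails, so the caller would still take "goto err". */
#include <errno.h>

static int buggy_attach_all(int n)
{
	int ret = -ENOMEM, ret_tmp;

	for (int i = 0; i < n; i++) {
		ret_tmp = 0;		/* every attach succeeds */
		if (ret_tmp)
			ret = ret_tmp;
	}
	return ret;			/* still -ENOMEM: spurious error */
}

/* v3 behavior: reset ret to 0 before the loop, so only a real attach
 * failure is reported. */
static int fixed_attach_all(int n)
{
	int ret = 0, ret_tmp;

	for (int i = 0; i < n; i++) {
		ret_tmp = 0;
		if (ret_tmp)
			ret = ret_tmp;
	}
	return ret;
}
```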
^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] bcache: check bch_cached_dev_attach() return value
  2022-04-13 12:42     ` [PATCH v2 3/3] " Zou Mingzhe
@ 2022-04-13 15:10       ` Coly Li
  0 siblings, 0 replies; 12+ messages in thread
From: Coly Li @ 2022-04-13 15:10 UTC (permalink / raw)
  To: Zou Mingzhe; +Cc: zoumingzhe, linux-bcache

On 4/13/22 8:42 PM, Zou Mingzhe wrote:
>
> On 2022/4/11 11:04, mingzhe.zou@easystack.cn wrote:
>> From: ZouMingzhe <mingzhe.zou@easystack.cn>
>>
>> Handle errors returned by bch_cached_dev_attach().
>>
>> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
>> ---
>>   drivers/md/bcache/super.c | 25 +++++++++++++++++++------
>>   1 file changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index e4a53c849fa6..940eea5f94de 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -1460,7 +1460,7 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
>>   {
>>       const char *err = "cannot allocate memory";
>>       struct cache_set *c;
>> -    int ret = -ENOMEM;
>> +    int ret = -ENOMEM, ret_tmp;
>>         memcpy(&dc->sb, sb, sizeof(struct cache_sb));
>>       dc->bdev = bdev;
>> @@ -1480,8 +1480,14 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
>>         list_add(&dc->list, &uncached_devices);
>>       /* attach to a matched cache set if it exists */
>> -    list_for_each_entry(c, &bch_cache_sets, list)
>> -        bch_cached_dev_attach(dc, c, NULL);
>> +    err = "failed to attach cached device";
>> +    list_for_each_entry(c, &bch_cache_sets, list) {
>> +        ret_tmp = bch_cached_dev_attach(dc, c, NULL);
>> +        if (ret_tmp)
>> +            ret = ret_tmp;
>> +    }
>> +    if (ret)
>> +        goto err;
>
> Hi, coly
>
> Wrong here.
>
> I have send v3.
>
> mingzhe


Sure. BTW, next time it would be more convenient for me if you post the new 
version in a separate email thread. Otherwise there is a chance that I 
will confuse it with the previous patches.

Thanks.

Coly Li


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-04-13 15:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-01 12:27 [PATCH 1/2] bcache: fixup btree_cache_wait list damage mingzhe.zou
2022-04-01 12:27 ` [PATCH 2/2] bcache: check return in the register process and handle error mingzhe.zou
2022-04-07 16:24   ` Coly Li
2022-04-07 15:53 ` [PATCH 1/2] bcache: fixup btree_cache_wait list damage Coly Li
2022-04-08  7:22   ` Zou Mingzhe
2022-04-08  8:39     ` Coly Li
2022-04-11  3:04 ` [PATCH v2 1/3] bcache: bch_sectors_dirty_init() check each thread result and return error mingzhe.zou
2022-04-11  3:04   ` [PATCH v2 2/3] bcache: check bch_sectors_dirty_init() return value mingzhe.zou
2022-04-11  3:04   ` [PATCH v2 3/3] bcache: check bch_cached_dev_attach() " mingzhe.zou
2022-04-13 12:36     ` [PATCH v3] " mingzhe.zou
2022-04-13 12:42     ` [PATCH v2 3/3] " Zou Mingzhe
2022-04-13 15:10       ` Coly Li
