Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/3] bcache: accelerate device registration speed 
@ 2020-02-15  8:28 Coly Li
  2020-02-15  8:28 ` [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h Coly Li
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Coly Li @ 2020-02-15  8:28 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Hi folks,

This series is an effort to accelerate bcache device registration
speed. With this series the registration speed can be x3 ~ 7x times
faster in my tests.

Since Linux v5.6-rc1, the bcache code may survive more than 24+ horus
on my testing machine (Lenovo SR650, 48 Cores, both NVMe SSDs as
cache device and cached device). Before Linux v5.3, bcache code may
only survive around 40 minutes then deadlock detected. After Linux v5.3,
bcache code may survive 12+ hours then out-of-memory happens and whole
system huang. Now in my testing bcache can survive from 24+ hours high
randome write I/Os, and generate a very large internal btree. After a
reboot, registring a single cached device and cache device may take
55 minutes on my testing machine, this is too long time IMHO.

I don't realize such problem because bcache could not work for so long
time before. After look into the bcache registration code, I realize
the bcache registration time is consumed by 3 steps,
1) Check btree nodes
   To check each key of all the btree nodes are valid and good.
2) Journal replay
   Replay all valid journal entries and insert them back to btree.
3) Count dirty sectors
   For a specific bcache device, iterate all btree keys and count
   dirty sectors (or stripes) for it.

The step 2) is strictly linear order, no chance to speed up. But the
step 1) and 3) accesses btree nodes in read-only more, they can be
speed up by parallelized threads. This series try to create multiple
threads to check btrees and count dirty sectors, from my testing
it can speed up x3 ~ x7 times for bcache registration.

It will be helpful if you may test the patches and feed back me the
result or problem. Currently I target these patchs for next merge
window of Linux v5.7.

Thanks in advance.

Coly Li
---

Coly Li (3):
  bcache: move macro btree() and btree_root() into btree.h
  bcache: make bch_btree_check() to be multiple threads
  bcache: make bch_sectors_dirty_init() to be multiple threads

 drivers/md/bcache/btree.c     | 229 ++++++++++++++++++++++++++++++------------
 drivers/md/bcache/btree.h     |  88 ++++++++++++++++
 drivers/md/bcache/writeback.c | 156 +++++++++++++++++++++++++++-
 drivers/md/bcache/writeback.h |  19 ++++
 4 files changed, 427 insertions(+), 65 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h
  2020-02-15  8:28 [RFC PATCH 0/3] bcache: accelerate device registration speed Coly Li
@ 2020-02-15  8:28 ` Coly Li
  2020-02-19 16:29   ` Christoph Hellwig
  2020-02-15  8:28 ` [PATCH 2/3] bcache: make bch_btree_check() to be multiple threads Coly Li
  2020-02-15  8:28 ` [PATCH 3/3] bcache: make bch_sectors_dirty_init() " Coly Li
  2 siblings, 1 reply; 8+ messages in thread
From: Coly Li @ 2020-02-15  8:28 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

In order to accelerate bcache registration speed, the macro btree()
and btree_root() will be referenced out of btree.c. This patch moves
them from btree.c into btree.h with other relative function declaration
in btree.h, for the following changes.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/btree.c | 60 +-----------------------------------------
 drivers/md/bcache/btree.h | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 59 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index fa872df4e770..99cb201809af 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -101,64 +101,6 @@
 
 #define insert_lock(s, b)	((b)->level <= (s)->lock)
 
-/*
- * These macros are for recursing down the btree - they handle the details of
- * locking and looking up nodes in the cache for you. They're best treated as
- * mere syntax when reading code that uses them.
- *
- * op->lock determines whether we take a read or a write lock at a given depth.
- * If you've got a read lock and find that you need a write lock (i.e. you're
- * going to have to split), set op->lock and return -EINTR; btree_root() will
- * call you again and you'll have the correct lock.
- */
-
-/**
- * btree - recurse down the btree on a specified key
- * @fn:		function to call, which will be passed the child node
- * @key:	key to recurse on
- * @b:		parent btree node
- * @op:		pointer to struct btree_op
- */
-#define btree(fn, key, b, op, ...)					\
-({									\
-	int _r, l = (b)->level - 1;					\
-	bool _w = l <= (op)->lock;					\
-	struct btree *_child = bch_btree_node_get((b)->c, op, key, l,	\
-						  _w, b);		\
-	if (!IS_ERR(_child)) {						\
-		_r = bch_btree_ ## fn(_child, op, ##__VA_ARGS__);	\
-		rw_unlock(_w, _child);					\
-	} else								\
-		_r = PTR_ERR(_child);					\
-	_r;								\
-})
-
-/**
- * btree_root - call a function on the root of the btree
- * @fn:		function to call, which will be passed the child node
- * @c:		cache set
- * @op:		pointer to struct btree_op
- */
-#define btree_root(fn, c, op, ...)					\
-({									\
-	int _r = -EINTR;						\
-	do {								\
-		struct btree *_b = (c)->root;				\
-		bool _w = insert_lock(op, _b);				\
-		rw_lock(_w, _b, _b->level);				\
-		if (_b == (c)->root &&					\
-		    _w == insert_lock(op, _b)) {			\
-			_r = bch_btree_ ## fn(_b, op, ##__VA_ARGS__);	\
-		}							\
-		rw_unlock(_w, _b);					\
-		bch_cannibalize_unlock(c);				\
-		if (_r == -EINTR)					\
-			schedule();					\
-	} while (_r == -EINTR);						\
-									\
-	finish_wait(&(c)->btree_cache_wait, &(op)->wait);		\
-	_r;								\
-})
 
 static inline struct bset *write_block(struct btree *b)
 {
@@ -2422,7 +2364,7 @@ int __bch_btree_map_nodes(struct btree_op *op, struct cache_set *c,
 	return btree_root(map_nodes_recurse, c, op, from, fn, flags);
 }
 
-static int bch_btree_map_keys_recurse(struct btree *b, struct btree_op *op,
+int bch_btree_map_keys_recurse(struct btree *b, struct btree_op *op,
 				      struct bkey *from, btree_map_keys_fn *fn,
 				      int flags)
 {
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index f4dcca449391..f37153db3f6c 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -260,6 +260,13 @@ void bch_initial_gc_finish(struct cache_set *c);
 void bch_moving_gc(struct cache_set *c);
 int bch_btree_check(struct cache_set *c);
 void bch_initial_mark_key(struct cache_set *c, int level, struct bkey *k);
+typedef int (btree_map_keys_fn)(struct btree_op *op, struct btree *b,
+				struct bkey *k);
+int bch_btree_map_keys_recurse(struct btree *b, struct btree_op *op,
+			       struct bkey *from, btree_map_keys_fn *fn,
+			       int flags);
+int bch_btree_map_keys(struct btree_op *op, struct cache_set *c,
+		       struct bkey *from, btree_map_keys_fn *fn, int flags);
 
 static inline void wake_up_gc(struct cache_set *c)
 {
@@ -284,6 +291,65 @@ static inline void force_wake_up_gc(struct cache_set *c)
 	wake_up_gc(c);
 }
 
+/*
+ * These macros are for recursing down the btree - they handle the details of
+ * locking and looking up nodes in the cache for you. They're best treated as
+ * mere syntax when reading code that uses them.
+ *
+ * op->lock determines whether we take a read or a write lock at a given depth.
+ * If you've got a read lock and find that you need a write lock (i.e. you're
+ * going to have to split), set op->lock and return -EINTR; btree_root() will
+ * call you again and you'll have the correct lock.
+ */
+
+/**
+ * btree - recurse down the btree on a specified key
+ * @fn:		function to call, which will be passed the child node
+ * @key:	key to recurse on
+ * @b:		parent btree node
+ * @op:		pointer to struct btree_op
+ */
+#define btree(fn, key, b, op, ...)					\
+({									\
+	int _r, l = (b)->level - 1;					\
+	bool _w = l <= (op)->lock;					\
+	struct btree *_child = bch_btree_node_get((b)->c, op, key, l,	\
+						  _w, b);		\
+	if (!IS_ERR(_child)) {						\
+		_r = bch_btree_ ## fn(_child, op, ##__VA_ARGS__);	\
+		rw_unlock(_w, _child);					\
+	} else								\
+		_r = PTR_ERR(_child);					\
+	_r;								\
+})
+
+/**
+ * btree_root - call a function on the root of the btree
+ * @fn:		function to call, which will be passed the child node
+ * @c:		cache set
+ * @op:		pointer to struct btree_op
+ */
+#define btree_root(fn, c, op, ...)					\
+({									\
+	int _r = -EINTR;						\
+	do {								\
+		struct btree *_b = (c)->root;				\
+		bool _w = insert_lock(op, _b);				\
+		rw_lock(_w, _b, _b->level);				\
+		if (_b == (c)->root &&					\
+		    _w == insert_lock(op, _b)) {			\
+			_r = bch_btree_ ## fn(_b, op, ##__VA_ARGS__);	\
+		}							\
+		rw_unlock(_w, _b);					\
+		bch_cannibalize_unlock(c);                              \
+		if (_r == -EINTR)                                       \
+			schedule();                                     \
+	} while (_r == -EINTR);                                         \
+									\
+	finish_wait(&(c)->btree_cache_wait, &(op)->wait);               \
+	_r;                                                             \
+})
+
 #define MAP_DONE	0
 #define MAP_CONTINUE	1
 
-- 
2.16.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/3] bcache: make bch_btree_check() to be multiple threads
  2020-02-15  8:28 [RFC PATCH 0/3] bcache: accelerate device registration speed Coly Li
  2020-02-15  8:28 ` [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h Coly Li
@ 2020-02-15  8:28 ` Coly Li
  2020-02-19 16:30   ` Christoph Hellwig
  2020-02-15  8:28 ` [PATCH 3/3] bcache: make bch_sectors_dirty_init() " Coly Li
  2 siblings, 1 reply; 8+ messages in thread
From: Coly Li @ 2020-02-15  8:28 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

When registering a cache device, bch_btree_check() is called to check
all btree node, to make sure the btree is consistency and no corruption.

bch_btree_check() is recursively executed in single thread, when there
are a lot of data cached and the btree is huge, it may take very long
time to check all the btree nodes. In my testing, I observed it took
around 50 minutes to finish bch_btree_check().

When checking the bcache btree nodes, the cache set is not running yet,
and indeed the whole tree is in read-only state, it is safe to create
multiple threads to check the btree in parallel.

This patch tries to create multiple threads, and each thread tries to
one-by-one check the sub-tree indexed by a key from the btree root node.
The parallel thread number depends on how many keys in the btree root
node. At most BCH_BTR_CHKTHREAD_MAX (64) threads can be created, but in
practice is should be min(cpu-number/2, root-node-keys-number).

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/btree.c | 169 +++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/bcache/btree.h |  22 ++++++
 2 files changed, 188 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 99cb201809af..ce0289f58d38 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1897,13 +1897,176 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op)
 	return ret;
 }
 
+
+static int bch_btree_check_thread(void *arg)
+{
+	int ret;
+	struct btree_check_info *info = arg;
+	struct btree_check_state *check_state = info->state;
+	struct cache_set *c = check_state->c;
+	struct btree_iter iter;
+	struct bkey *k, *p;
+	int cur_idx, prev_idx, skip_nr;
+	int i, n;
+
+	k = p = NULL;
+	i = n = 0;
+	cur_idx = prev_idx = 0;
+	ret = 0;
+
+	/* root node keys are checked before thread created */
+	bch_btree_iter_init(&c->root->keys, &iter, NULL);
+	k = bch_btree_iter_next_filter(&iter, &c->root->keys, bch_ptr_bad);
+	BUG_ON(!k);
+
+	p = k;
+	while (k) {
+		/*
+		 * Fetch a root node key index, skip the keys which
+		 * should be fetched by other threads, then check the
+		 * sub-tree indexed by the fetched key.
+		 */
+		spin_lock(&check_state->idx_lock);
+		cur_idx = check_state->key_idx;
+		check_state->key_idx++;
+		spin_unlock(&check_state->idx_lock);
+
+		skip_nr = cur_idx - prev_idx;
+
+		while (skip_nr) {
+			k = bch_btree_iter_next_filter(&iter,
+						       &c->root->keys,
+						       bch_ptr_bad);
+			if (k)
+				p = k;
+			else {
+				/*
+				 * No more keys to check in root node,
+				 * current checking threads are enough,
+				 * stop creating more.
+				 */
+				atomic_set(&check_state->enough, 1);
+				/* Update check_state->enough earlier */
+				smp_mb();
+				goto out;
+			}
+			skip_nr--;
+			cond_resched();
+		}
+
+		if (p) {
+			struct btree_op op;
+
+			btree_node_prefetch(c->root, p);
+			c->gc_stats.nodes++;
+			bch_btree_op_init(&op, 0);
+			ret = btree(check_recurse, p, c->root, &op);
+			if (ret)
+				goto out;
+		}
+		p = NULL;
+		prev_idx = cur_idx;
+		cond_resched();
+	}
+
+out:
+	info->result = ret;
+	/* update check_state->started among all CPUs */
+	smp_mb();
+	if (atomic_dec_and_test(&check_state->started))
+		wake_up(&check_state->wait);
+
+	return ret;
+}
+
+
+
+static int bch_btree_chkthread_nr(void)
+{
+	int n = num_online_cpus()/2;
+
+	if (n == 0)
+		n = 1;
+	else if (n > BCH_BTR_CHKTHREAD_MAX)
+		n = BCH_BTR_CHKTHREAD_MAX;
+
+	return n;
+}
+
 int bch_btree_check(struct cache_set *c)
 {
-	struct btree_op op;
+	int ret = 0;
+	int i;
+	struct bkey *k = NULL;
+	struct btree_iter iter;
+	struct btree_check_state *check_state;
+	char name[32];
 
-	bch_btree_op_init(&op, SHRT_MAX);
+	/* check and mark root node keys */
+	for_each_key_filter(&c->root->keys, k, &iter, bch_ptr_invalid)
+		bch_initial_mark_key(c, c->root->level, k);
+
+	bch_initial_mark_key(c, c->root->level + 1, &c->root->key);
+
+	if (c->root->level == 0)
+		return 0;
+
+	check_state = kzalloc(sizeof(struct btree_check_state), GFP_KERNEL);
+	if (!check_state)
+		return -ENOMEM;
+
+	check_state->c = c;
+	check_state->total_threads = bch_btree_chkthread_nr();
+	check_state->key_idx = 0;
+	spin_lock_init(&check_state->idx_lock);
+	atomic_set(&check_state->started, 0);
+	atomic_set(&check_state->enough, 0);
+	init_waitqueue_head(&check_state->wait);
 
-	return btree_root(check_recurse, c, &op);
+	/*
+	 * Run multiple threads to check btree nodes in parallel,
+	 * if check_state->enough is non-zero, it means current
+	 * running check threads are enough, unncessary to create
+	 * more.
+	 */
+	for (i = 0; i < check_state->total_threads; i++) {
+		/* fetch latest check_state->enough earlier */
+		smp_mb();
+		if (atomic_read(&check_state->enough))
+			break;
+
+		check_state->infos[i].result = 0;
+		check_state->infos[i].state = check_state;
+		snprintf(name, sizeof(name), "bch_btrchk[%u]", i);
+		atomic_inc(&check_state->started);
+
+		check_state->infos[i].thread =
+			kthread_run(bch_btree_check_thread,
+				    &check_state->infos[i],
+				    name);
+		if (IS_ERR(check_state->infos[i].thread)) {
+			pr_err("fails to run thread bch_btrchk[%d]", i);
+			for (--i; i >= 0; i--)
+				kthread_stop(check_state->infos[i].thread);
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
+	wait_event_interruptible(check_state->wait,
+				 atomic_read(&check_state->started) == 0 ||
+				  test_bit(CACHE_SET_IO_DISABLE, &c->flags));
+
+	for (i = 0; i < check_state->total_threads; i++) {
+		if (check_state->infos[i].result) {
+			ret = check_state->infos[i].result;
+			goto out;
+		}
+	}
+
+out:
+	kfree(check_state);
+	return ret;
 }
 
 void bch_initial_gc_finish(struct cache_set *c)
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index f37153db3f6c..38ca4ab08f2f 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -145,6 +145,9 @@ struct btree {
 	struct bio		*bio;
 };
 
+
+
+
 #define BTREE_FLAG(flag)						\
 static inline bool btree_node_ ## flag(struct btree *b)			\
 {	return test_bit(BTREE_NODE_ ## flag, &b->flags); }		\
@@ -216,6 +219,25 @@ struct btree_op {
 	unsigned int		insert_collision:1;
 };
 
+struct btree_check_state;
+struct btree_check_info {
+	struct btree_check_state	*state;
+	struct task_struct		*thread;
+	int				result;
+};
+
+#define BCH_BTR_CHKTHREAD_MAX	64
+struct btree_check_state {
+	struct cache_set		*c;
+	int				total_threads;
+	int				key_idx;
+	spinlock_t			idx_lock;
+	atomic_t			started;
+	atomic_t			enough;
+	wait_queue_head_t		wait;
+	struct btree_check_info		infos[BCH_BTR_CHKTHREAD_MAX];
+};
+
 static inline void bch_btree_op_init(struct btree_op *op, int write_lock_level)
 {
 	memset(op, 0, sizeof(struct btree_op));
-- 
2.16.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/3] bcache: make bch_sectors_dirty_init() to be multiple threads
  2020-02-15  8:28 [RFC PATCH 0/3] bcache: accelerate device registration speed Coly Li
  2020-02-15  8:28 ` [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h Coly Li
  2020-02-15  8:28 ` [PATCH 2/3] bcache: make bch_btree_check() to be multiple threads Coly Li
@ 2020-02-15  8:28 ` " Coly Li
  2 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2020-02-15  8:28 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

When attaching a cached device (a.k.a backing device) to a cache
device, bch_sectors_dirty_init() is called to count dirty sectors
and stripes (see what bcache_dev_sectors_dirty_add() does) on the
cache device.

The counting is done by a single thread recursive function
bch_btree_map_keys() to iterate all the bcache btree nodes.
If the btree has huge number of nodes, bch_sectors_dirty_init() will
take quite long time. In my testing, if the registering cache set has
a existed UUID which matches a already registered cached device, the
automatical attachment during the registration may take more than
55 minutes. This is too long for waiting the bcache to work in real
deployment.

Fortunately when bch_sectors_dirty_init() is called, no other thread
will access the btree yet, it is safe to do a read-only parallelized
dirty sectors counting by multiple threads.

This patch tries to create multiple threads, and each thread tries to
one-by-one count dirty sectors from the sub-tree indexed by a root
node key which the thread fetched. After the sub-tree is counted, the
counting thread will continue to fetch another root node key, until
the fetched key is NULL. How many threads in parallel depends on
the number of keys from the btree root node, and the number of online
CPU core. The thread number will be the less number but no more than
BCH_DIRTY_INIT_THRD_MAX. If there are only 2 keys in root node, it
can only be 2x times faster by this patch. But if there are 10 keys
in the root node, with this patch it can be 10x times faster.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/writeback.c | 158 +++++++++++++++++++++++++++++++++++++++++-
 drivers/md/bcache/writeback.h |  19 +++++
 2 files changed, 174 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4a40f9eadeaf..84a1e65a21bf 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -785,7 +785,9 @@ static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b,
 	return MAP_CONTINUE;
 }
 
-void bch_sectors_dirty_init(struct bcache_device *d)
+static int bch_root_node_dirty_init(struct cache_set *c,
+				     struct bcache_device *d,
+				     struct bkey *k)
 {
 	struct sectors_dirty_init op;
 	int ret;
@@ -796,8 +798,13 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 	op.start = KEY(op.inode, 0, 0);
 
 	do {
-		ret = bch_btree_map_keys(&op.op, d->c, &op.start,
-					 sectors_dirty_init_fn, 0);
+		ret = btree(map_keys_recurse,
+			    k,
+			    c->root,
+			    &op.op,
+			    &op.start,
+			    sectors_dirty_init_fn,
+			    0);
 		if (ret == -EAGAIN)
 			schedule_timeout_interruptible(
 				msecs_to_jiffies(INIT_KEYS_SLEEP_MS));
@@ -806,6 +813,151 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 			break;
 		}
 	} while (ret == -EAGAIN);
+
+	return ret;
+}
+
+static int bch_dirty_init_thread(void *arg)
+{
+	struct dirty_init_thrd_info *info = arg;
+	struct bch_dirty_init_state *state = info->state;
+	struct cache_set *c = state->c;
+	struct btree_iter iter;
+	struct bkey *k, *p;
+	int cur_idx, prev_idx, skip_nr;
+	int i;
+
+	k = p = NULL;
+	i = 0;
+	cur_idx = prev_idx = 0;
+
+	bch_btree_iter_init(&c->root->keys, &iter, NULL);
+	k = bch_btree_iter_next_filter(&iter, &c->root->keys, bch_ptr_bad);
+	BUG_ON(!k);
+
+	p = k;
+
+	while (k) {
+		spin_lock(&state->idx_lock);
+		cur_idx = state->key_idx;
+		state->key_idx++;
+		spin_unlock(&state->idx_lock);
+
+		skip_nr = cur_idx - prev_idx;
+
+		while (skip_nr) {
+			k = bch_btree_iter_next_filter(&iter,
+						       &c->root->keys,
+						       bch_ptr_bad);
+			if (k)
+				p = k;
+			else {
+				atomic_set(&state->enough, 1);
+				/* Update state->enough earlier */
+				smp_mb();
+				goto out;
+			}
+			skip_nr--;
+			cond_resched();
+		}
+
+		if (p) {
+			if (bch_root_node_dirty_init(c, state->d, p) < 0)
+				goto out;
+		}
+
+		p = NULL;
+		prev_idx = cur_idx;
+		cond_resched();
+	}
+
+out:
+	/* In order to wake up state->wait in time */
+	smp_mb();
+	if (atomic_dec_and_test(&state->started))
+		wake_up(&state->wait);
+
+	return 0;
+}
+
+static int bch_btre_dirty_init_thread_nr(void)
+{
+	int n = num_online_cpus()/2;
+
+	if (n == 0)
+		n = 1;
+	else if (n > BCH_DIRTY_INIT_THRD_MAX)
+		n = BCH_DIRTY_INIT_THRD_MAX;
+
+	return n;
+}
+
+void bch_sectors_dirty_init(struct bcache_device *d)
+{
+	int i;
+	struct bkey *k = NULL;
+	struct btree_iter iter;
+	struct sectors_dirty_init op;
+	struct cache_set *c = d->c;
+	struct bch_dirty_init_state *state;
+	char name[32];
+
+	/* Just count root keys if no leaf node */
+	if (c->root->level == 0) {
+		bch_btree_op_init(&op.op, -1);
+		op.inode = d->id;
+		op.count = 0;
+		op.start = KEY(op.inode, 0, 0);
+
+		for_each_key_filter(&c->root->keys,
+				    k, &iter, bch_ptr_invalid)
+			sectors_dirty_init_fn(&op.op, c->root, k);
+		return;
+	}
+
+	state = kzalloc(sizeof(struct bch_dirty_init_state), GFP_KERNEL);
+	if (!state) {
+		pr_warn("sectors dirty init failed: cannot allocate memory");
+		return;
+	}
+
+	state->c = c;
+	state->d = d;
+	state->total_threads = bch_btre_dirty_init_thread_nr();
+	state->key_idx = 0;
+	spin_lock_init(&state->idx_lock);
+	atomic_set(&state->started, 0);
+	atomic_set(&state->enough, 0);
+	init_waitqueue_head(&state->wait);
+
+	for (i = 0; i < state->total_threads; i++) {
+		/* Fetch latest state->enough earlier */
+		smp_mb();
+		if (atomic_read(&state->enough))
+			break;
+
+		state->infos[i].state = state;
+		atomic_inc(&state->started);
+		snprintf(name, sizeof(name), "bch_dirty_init[%d]", i);
+
+		state->infos[i].thread =
+			kthread_run(bch_dirty_init_thread,
+				    &state->infos[i],
+				    name);
+		if (IS_ERR(state->infos[i].thread)) {
+			pr_err("fails to run thread bch_dirty_init[%d]", i);
+			for (--i; i >= 0; i--)
+				kthread_stop(state->infos[i].thread);
+			goto out;
+		}
+	}
+
+	wait_event_interruptible(state->wait,
+		 atomic_read(&state->started) == 0 ||
+		 test_bit(CACHE_SET_IO_DISABLE, &c->flags));
+
+out:
+	kfree(state);
 }
 
 void bch_cached_dev_writeback_init(struct cached_dev *dc)
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index 4e4c6810dc3c..b029843ce5b6 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -16,6 +16,7 @@
 
 #define BCH_AUTO_GC_DIRTY_THRESHOLD	50
 
+#define BCH_DIRTY_INIT_THRD_MAX	64
 /*
  * 14 (16384ths) is chosen here as something that each backing device
  * should be a reasonable fraction of the share, and not to blow up
@@ -23,6 +24,24 @@
  */
 #define WRITEBACK_SHARE_SHIFT   14
 
+struct bch_dirty_init_state;
+struct dirty_init_thrd_info {
+	struct bch_dirty_init_state	*state;
+	struct task_struct		*thread;
+};
+
+struct bch_dirty_init_state {
+	struct cache_set		*c;
+	struct bcache_device		*d;
+	int				total_threads;
+	int				key_idx;
+	spinlock_t			idx_lock;
+	atomic_t			started;
+	atomic_t			enough;
+	wait_queue_head_t		wait;
+	struct dirty_init_thrd_info	infos[BCH_DIRTY_INIT_THRD_MAX];
+};
+
 static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
 {
 	uint64_t i, ret = 0;
-- 
2.16.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h
  2020-02-15  8:28 ` [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h Coly Li
@ 2020-02-19 16:29   ` Christoph Hellwig
  2020-02-20 13:21     ` Coly Li
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2020-02-19 16:29 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block

On Sat, Feb 15, 2020 at 04:28:56PM +0800, Coly Li wrote:
> In order to accelerate bcache registration speed, the macro btree()
> and btree_root() will be referenced out of btree.c. This patch moves
> them from btree.c into btree.h with other relative function declaration
> in btree.h, for the following changes.

Can you give them a bcache_ prefix?  That names are awfully generic
and bound to clash sooner or later.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] bcache: make bch_btree_check() to be multiple threads
  2020-02-15  8:28 ` [PATCH 2/3] bcache: make bch_btree_check() to be multiple threads Coly Li
@ 2020-02-19 16:30   ` Christoph Hellwig
  2020-02-20 13:21     ` Coly Li
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2020-02-19 16:30 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block

maybe s/be multiple threads/multithreaded/ in the subject?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h
  2020-02-19 16:29   ` Christoph Hellwig
@ 2020-02-20 13:21     ` Coly Li
  0 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2020-02-20 13:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-bcache, linux-block

On 2020/2/20 12:29 上午, Christoph Hellwig wrote:
> On Sat, Feb 15, 2020 at 04:28:56PM +0800, Coly Li wrote:
>> In order to accelerate bcache registration speed, the macro btree()
>> and btree_root() will be referenced out of btree.c. This patch moves
>> them from btree.c into btree.h with other relative function declaration
>> in btree.h, for the following changes.
> 
> Can you give them a bcache_ prefix?  That names are awfully generic
> and bound to clash sooner or later.
> 

Sure I will do it in next version.

Thanks.

-- 

Coly Li

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] bcache: make bch_btree_check() to be multiple threads
  2020-02-19 16:30   ` Christoph Hellwig
@ 2020-02-20 13:21     ` Coly Li
  0 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2020-02-20 13:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-bcache, linux-block

On 2020/2/20 12:30 上午, Christoph Hellwig wrote:
> maybe s/be multiple threads/multithreaded/ in the subject?
> 

Sure, I will do it in next version.

Thanks.

-- 

Coly Li

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-15  8:28 [RFC PATCH 0/3] bcache: accelerate device registration speed Coly Li
2020-02-15  8:28 ` [PATCH 1/3] bcache: move macro btree() and btree_root() into btree.h Coly Li
2020-02-19 16:29   ` Christoph Hellwig
2020-02-20 13:21     ` Coly Li
2020-02-15  8:28 ` [PATCH 2/3] bcache: make bch_btree_check() to be multiple threads Coly Li
2020-02-19 16:30   ` Christoph Hellwig
2020-02-20 13:21     ` Coly Li
2020-02-15  8:28 ` [PATCH 3/3] bcache: make bch_sectors_dirty_init() " Coly Li

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git