linux-block.vger.kernel.org archive mirror
* [PATCH v2 00/12] bcache patches for Linux v5.5
@ 2019-11-13  8:03 Coly Li
  2019-11-13  8:03 ` [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp() Coly Li
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

Hi Jens,

This second submission adds two previously missed patches from Christoph
Hellwig.

These are the bcache patches for Linux v5.5. They have been tested for a
while during my current development and are ready to be merged.

There are still other patches under test; I will submit them in later
rounds once I feel they are solid enough.

Thanks for taking care of this.

Coly Li

---

Andrea Righi (1):
  bcache: fix deadlock in bcache_allocator

Christoph Hellwig (2):
  bcache: remove the extra cflags for request.o
  bcache: don't export symbols

Coly Li (8):
  bcache: fix fifo index swapping condition in journal_pin_cmp()
  bcache: fix static checker warning in bcache_device_free()
  bcache: add more accurate error messages in read_super()
  bcache: deleted code comments for dead code in bch_data_insert_keys()
  bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()
  bcache: add code comments in bch_btree_leaf_dirty()
  bcache: add idle_max_writeback_rate sysfs interface
  bcache: at least try to shrink 1 node in bch_mca_scan()

Guoju Fang (1):
  bcache: fix a lost wake-up problem caused by mca_cannibalize_lock

 drivers/md/bcache/Makefile    |  2 --
 drivers/md/bcache/alloc.c     |  5 +++-
 drivers/md/bcache/bcache.h    |  4 +++-
 drivers/md/bcache/bset.c      | 17 ++-----------
 drivers/md/bcache/btree.c     | 45 ++++++++++++++++++++++++++++++----
 drivers/md/bcache/closure.c   |  7 ------
 drivers/md/bcache/journal.h   |  4 ----
 drivers/md/bcache/request.c   | 12 ----------
 drivers/md/bcache/super.c     | 56 +++++++++++++++++++++++++++++++------------
 drivers/md/bcache/sysfs.c     |  7 ++++++
 drivers/md/bcache/writeback.c |  4 ++++
 11 files changed, 102 insertions(+), 61 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp()
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-18 15:28   ` Coly Li
  2019-11-13  8:03 ` [PATCH 02/12] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock Coly Li
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

The FIFO structure journal.pin is implemented as a circular buffer: when
the back index reaches the highest slot of the buffer, it wraps around
to 0. Once that wrap happens, a smaller fifo index may be associated
with a newer journal entry, so bch_btree_leaf_dirty() may fail to keep
the oldest journal entry referenced by the dirty B+tree leaf node. As a
consequence the bcache journal may not protect the unflushed, oldest
dirty B+tree leaf node across a power failure, and that leaf node may be
inconsistent after rebooting from the power failure.

This patch fixes the fifo index comparison logic in journal_pin_cmp() to
avoid potential B+tree leaf node corruption when the back index of
journal.pin wraps around.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/btree.c   | 26 ++++++++++++++++++++++++++
 drivers/md/bcache/journal.h |  4 ----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ba434d9ac720..00523cd1db80 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -528,6 +528,32 @@ static void btree_node_write_work(struct work_struct *w)
 	mutex_unlock(&b->write_lock);
 }
 
+/* return true if journal pin 'l' is newer than 'r' */
+static bool journal_pin_cmp(struct cache_set *c,
+			    atomic_t *l,
+			    atomic_t *r)
+{
+	int l_idx, r_idx, f_idx, b_idx;
+	bool ret = false;
+
+	l_idx = fifo_idx(&(c)->journal.pin, (l));
+	r_idx = fifo_idx(&(c)->journal.pin, (r));
+	f_idx = (c)->journal.pin.front;
+	b_idx = (c)->journal.pin.back;
+
+	if (l_idx > r_idx)
+		ret = true;
+	/* in case fifo back pointer is swapped */
+	if (b_idx < f_idx) {
+		if (l_idx <= b_idx && r_idx >= f_idx)
+			ret = true;
+		else if (l_idx >= f_idx && r_idx <= b_idx)
+			ret = false;
+	}
+
+	return ret;
+}
+
 static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 {
 	struct bset *i = btree_bset_last(b);
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index f2ea34d5f431..06b3eaab7d16 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -157,10 +157,6 @@ struct journal_device {
 };
 
 #define BTREE_FLUSH_NR	8
-
-#define journal_pin_cmp(c, l, r)				\
-	(fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r)))
-
 #define JOURNAL_PIN	20000
 
 #define journal_full(j)						\
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 02/12] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
  2019-11-13  8:03 ` [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp() Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-17  3:32   ` Eric Wheeler
  2019-11-13  8:03 ` [PATCH 03/12] bcache: fix static checker warning in bcache_device_free() Coly Li
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Guoju Fang, Coly Li

From: Guoju Fang <fangguoju@gmail.com>

This patch fixes a lost wake-up problem caused by the race between
mca_cannibalize_lock() and bch_cannibalize_unlock().

Consider two processes, A and B. Process A is executing
mca_cannibalize_lock() while process B, which holds
c->btree_cache_alloc_lock, is executing bch_cannibalize_unlock(). The
problem occurs in the window after process A's cmpxchg fails but before
it calls prepare_to_wait(): in that timeslice process B issues its
wake_up(), and only afterwards does process A call prepare_to_wait()
and set its state to TASK_UNINTERRUPTIBLE. Process A then goes to
sleep, but nothing will ever wake it up. This problem may leave the
bcache device hung.
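
A minimal timeline of the race, reconstructed from the description above
and the functions touched below; it is an illustration, not an excerpt
from the driver:

	1. A: cmpxchg(&c->btree_cache_alloc_lock, NULL, current) fails
	      because B still holds the lock.
	2. B: c->btree_cache_alloc_lock = NULL;
	      wake_up(&c->btree_cache_wait);  /* A is not on the wait list yet */
	3. A: prepare_to_wait(&c->btree_cache_wait, &op->wait,
	                      TASK_UNINTERRUPTIBLE);
	      return -EINTR;                  /* the caller goes to sleep */
	4. Nothing ever wakes A up: the only wake_up() for this lock already
	   fired in step 2.

Serializing both sides with the new btree_cannibalize_lock spinlock, as
the diff below does, closes the window between steps 1 and 3.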

Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bcache.h |  1 +
 drivers/md/bcache/btree.c  | 12 ++++++++----
 drivers/md/bcache/super.c  |  1 +
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 013e35a9e317..3653faf3bf48 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -582,6 +582,7 @@ struct cache_set {
 	 */
 	wait_queue_head_t	btree_cache_wait;
 	struct task_struct	*btree_cache_alloc_lock;
+	spinlock_t		btree_cannibalize_lock;
 
 	/*
 	 * When we free a btree node, we increment the gen of the bucket the
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 00523cd1db80..39d7fc1ef1ee 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -910,15 +910,17 @@ static struct btree *mca_find(struct cache_set *c, struct bkey *k)
 
 static int mca_cannibalize_lock(struct cache_set *c, struct btree_op *op)
 {
-	struct task_struct *old;
-
-	old = cmpxchg(&c->btree_cache_alloc_lock, NULL, current);
-	if (old && old != current) {
+	spin_lock(&c->btree_cannibalize_lock);
+	if (likely(c->btree_cache_alloc_lock == NULL)) {
+		c->btree_cache_alloc_lock = current;
+	} else if (c->btree_cache_alloc_lock != current) {
 		if (op)
 			prepare_to_wait(&c->btree_cache_wait, &op->wait,
 					TASK_UNINTERRUPTIBLE);
+		spin_unlock(&c->btree_cannibalize_lock);
 		return -EINTR;
 	}
+	spin_unlock(&c->btree_cannibalize_lock);
 
 	return 0;
 }
@@ -953,10 +955,12 @@ static struct btree *mca_cannibalize(struct cache_set *c, struct btree_op *op,
  */
 static void bch_cannibalize_unlock(struct cache_set *c)
 {
+	spin_lock(&c->btree_cannibalize_lock);
 	if (c->btree_cache_alloc_lock == current) {
 		c->btree_cache_alloc_lock = NULL;
 		wake_up(&c->btree_cache_wait);
 	}
+	spin_unlock(&c->btree_cannibalize_lock);
 }
 
 static struct btree *mca_alloc(struct cache_set *c, struct btree_op *op,
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 20ed838e9413..ebb854ed05a4 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1769,6 +1769,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 	sema_init(&c->sb_write_mutex, 1);
 	mutex_init(&c->bucket_lock);
 	init_waitqueue_head(&c->btree_cache_wait);
+	spin_lock_init(&c->btree_cannibalize_lock);
 	init_waitqueue_head(&c->bucket_wait);
 	init_waitqueue_head(&c->gc_wait);
 	sema_init(&c->uuid_write_mutex, 1);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 03/12] bcache: fix static checker warning in bcache_device_free()
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
  2019-11-13  8:03 ` [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp() Coly Li
  2019-11-13  8:03 ` [PATCH 02/12] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13  8:03 ` [PATCH 04/12] bcache: add more accurate error messages in read_super() Coly Li
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

Commit cafe56359144 ("bcache: A block layer cache") leads to the
following static checker warning:

    ./drivers/md/bcache/super.c:770 bcache_device_free()
    warn: variable dereferenced before check 'd->disk' (see line 766)

drivers/md/bcache/super.c
   762  static void bcache_device_free(struct bcache_device *d)
   763  {
   764          lockdep_assert_held(&bch_register_lock);
   765
   766          pr_info("%s stopped", d->disk->disk_name);
                                      ^^^^^^^^^
Unchecked dereference.

   767
   768          if (d->c)
   769                  bcache_device_detach(d);
   770          if (d->disk && d->disk->flags & GENHD_FL_UP)
                    ^^^^^^^
Check too late.

   771                  del_gendisk(d->disk);
   772          if (d->disk && d->disk->queue)
   773                  blk_cleanup_queue(d->disk->queue);
   774          if (d->disk) {
   775                  ida_simple_remove(&bcache_device_idx,
   776                                    first_minor_to_idx(d->disk->first_minor));
   777                  put_disk(d->disk);
   778          }
   779

It is not guaranteed that the gendisk of a bcache device is always
present; the warning makes sense for the case where something goes
wrong in the block core.

This patch removes the static checker warning by checking d->disk
before it is dereferenced, avoiding a potential NULL pointer
dereference.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index ebb854ed05a4..7beccede5360 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -761,20 +761,28 @@ static inline int idx_to_first_minor(int idx)
 
 static void bcache_device_free(struct bcache_device *d)
 {
+	struct gendisk *disk = d->disk;
+
 	lockdep_assert_held(&bch_register_lock);
 
-	pr_info("%s stopped", d->disk->disk_name);
+	if (disk)
+		pr_info("%s stopped", disk->disk_name);
+	else
+		pr_err("bcache device (NULL gendisk) stopped");
 
 	if (d->c)
 		bcache_device_detach(d);
-	if (d->disk && d->disk->flags & GENHD_FL_UP)
-		del_gendisk(d->disk);
-	if (d->disk && d->disk->queue)
-		blk_cleanup_queue(d->disk->queue);
-	if (d->disk) {
+
+	if (disk) {
+		if (disk->flags & GENHD_FL_UP)
+			del_gendisk(disk);
+
+		if (disk->queue)
+			blk_cleanup_queue(disk->queue);
+
 		ida_simple_remove(&bcache_device_idx,
-				  first_minor_to_idx(d->disk->first_minor));
-		put_disk(d->disk);
+				  first_minor_to_idx(disk->first_minor));
+		put_disk(disk);
 	}
 
 	bioset_exit(&d->bio_split);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/12] bcache: add more accurate error messages in read_super()
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (2 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 03/12] bcache: fix static checker warning in bcache_device_free() Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13  8:03 ` [PATCH 05/12] bcache: deleted code comments for dead code in bch_data_insert_keys() Coly Li
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

The previous code returned only "Not a bcache superblock" for both a bad
superblock offset and a bad magic number. This patch adds more accurate
error messages:
- for super block unmatched offset:
  "Not a bcache superblock (bad offset)"
- for super block unmatched magic number:
  "Not a bcache superblock (bad magic)"

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 7beccede5360..623fdaf10c4c 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -92,10 +92,11 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 	pr_debug("read sb version %llu, flags %llu, seq %llu, journal size %u",
 		 sb->version, sb->flags, sb->seq, sb->keys);
 
-	err = "Not a bcache superblock";
+	err = "Not a bcache superblock (bad offset)";
 	if (sb->offset != SB_SECTOR)
 		goto err;
 
+	err = "Not a bcache superblock (bad magic)";
 	if (memcmp(sb->magic, bcache_magic, 16))
 		goto err;
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 05/12] bcache: deleted code comments for dead code in bch_data_insert_keys()
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (3 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 04/12] bcache: add more accurate error messages in read_super() Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13  8:03 ` [PATCH 06/12] bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front() Coly Li
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

In request.c:bch_data_insert_keys() there is a code comment for a piece
of dead code. This patch deletes the dead code and its comment, since
they are useless in practice.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/request.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 41adcd1546f1..73478a91a342 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -62,18 +62,6 @@ static void bch_data_insert_keys(struct closure *cl)
 	struct bkey *replace_key = op->replace ? &op->replace_key : NULL;
 	int ret;
 
-	/*
-	 * If we're looping, might already be waiting on
-	 * another journal write - can't wait on more than one journal write at
-	 * a time
-	 *
-	 * XXX: this looks wrong
-	 */
-#if 0
-	while (atomic_read(&s->cl.remaining) & CLOSURE_WAITING)
-		closure_sync(&s->cl);
-#endif
-
 	if (!op->replace)
 		journal_ref = bch_journal(op->c, &op->insert_keys,
 					  op->flush_journal ? cl : NULL);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 06/12] bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (4 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 05/12] bcache: deleted code comments for dead code in bch_data_insert_keys() Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13  8:03 ` [PATCH 07/12] bcache: fix deadlock in bcache_allocator Coly Li
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

This patch adds simple code comments for bch_keylist_pop() and
bch_keylist_pop_front() in bset.c, to make the code easier to
understand.
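
A rough sketch of the keylist layout the two new comments refer to,
inferred from the helpers themselves; the key names and count are
illustrative:

	/*
	 * A keylist stores bkeys contiguously, first-added key at the bottom:
	 *
	 *   l->keys                              l->top
	 *      |                                    |
	 *      v                                    v
	 *      [ key0 ][ key1 ][ key2 ]...........(unused space)
	 *
	 * bch_keylist_pop():       drop the newest (top) key by moving l->top
	 *                          back to the start of the last key, and
	 *                          return it.
	 * bch_keylist_pop_front(): drop the bottom key (key0) by shifting the
	 *                          remaining keys down to l->keys and lowering
	 *                          l->top_p by bkey_u64s(key0).
	 */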

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c
index 08768796b543..f37a429f093d 100644
--- a/drivers/md/bcache/bset.c
+++ b/drivers/md/bcache/bset.c
@@ -155,6 +155,7 @@ int __bch_keylist_realloc(struct keylist *l, unsigned int u64s)
 	return 0;
 }
 
+/* Pop the top key of keylist by pointing l->top to its previous key */
 struct bkey *bch_keylist_pop(struct keylist *l)
 {
 	struct bkey *k = l->keys;
@@ -168,6 +169,7 @@ struct bkey *bch_keylist_pop(struct keylist *l)
 	return l->top = k;
 }
 
+/* Pop the bottom key of keylist and update l->top_p */
 void bch_keylist_pop_front(struct keylist *l)
 {
 	l->top_p -= bkey_u64s(l->keys);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 07/12] bcache: fix deadlock in bcache_allocator
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (5 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 06/12] bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front() Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-17  3:33   ` Eric Wheeler
  2019-11-13  8:03 ` [PATCH 08/12] bcache: add code comments in bch_btree_leaf_dirty() Coly Li
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Andrea Righi, Coly Li

From: Andrea Righi <andrea.righi@canonical.com>

bcache_allocator can call the following:

 bch_allocator_thread()
  -> bch_prio_write()
     -> bch_bucket_alloc()
        -> wait on &ca->set->bucket_wait

But the wake up event on bucket_wait is supposed to come from
bch_allocator_thread() itself => deadlock:

[ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
[ 1158.495929]       Not tainted 5.3.0-050300rc3-generic #201908042232
[ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1158.504413] bcache_allocato D    0 15861      2 0x80004000
[ 1158.504419] Call Trace:
[ 1158.504429]  __schedule+0x2a8/0x670
[ 1158.504432]  schedule+0x2d/0x90
[ 1158.504448]  bch_bucket_alloc+0xe5/0x370 [bcache]
[ 1158.504453]  ? wait_woken+0x80/0x80
[ 1158.504466]  bch_prio_write+0x1dc/0x390 [bcache]
[ 1158.504476]  bch_allocator_thread+0x233/0x490 [bcache]
[ 1158.504491]  kthread+0x121/0x140
[ 1158.504503]  ? invalidate_buckets+0x890/0x890 [bcache]
[ 1158.504506]  ? kthread_park+0xb0/0xb0
[ 1158.504510]  ret_from_fork+0x35/0x40

Fix by making the call to bch_prio_write() non-blocking, so that
bch_allocator_thread() never waits on itself.

Moreover, make sure to wake up the garbage collector thread when
bch_prio_write() is failing to allocate buckets.

BugLink: https://bugs.launchpad.net/bugs/1784665
BugLink: https://bugs.launchpad.net/bugs/1796292
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/alloc.c  |  5 ++++-
 drivers/md/bcache/bcache.h |  2 +-
 drivers/md/bcache/super.c  | 27 +++++++++++++++++++++------
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 6f776823b9ba..a1df0d95151c 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
 			if (!fifo_full(&ca->free_inc))
 				goto retry_invalidate;
 
-			bch_prio_write(ca);
+			if (bch_prio_write(ca, false) < 0) {
+				ca->invalidate_needs_gc = 1;
+				wake_up_gc(ca->set);
+			}
 		}
 	}
 out:
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 3653faf3bf48..50241e045c70 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -978,7 +978,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
 __printf(2, 3)
 bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
 
-void bch_prio_write(struct cache *ca);
+int bch_prio_write(struct cache *ca, bool wait);
 void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 623fdaf10c4c..d1352fcc6ff2 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -530,12 +530,29 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
 	closure_sync(cl);
 }
 
-void bch_prio_write(struct cache *ca)
+int bch_prio_write(struct cache *ca, bool wait)
 {
 	int i;
 	struct bucket *b;
 	struct closure cl;
 
+	pr_debug("free_prio=%zu, free_none=%zu, free_inc=%zu",
+		 fifo_used(&ca->free[RESERVE_PRIO]),
+		 fifo_used(&ca->free[RESERVE_NONE]),
+		 fifo_used(&ca->free_inc));
+
+	/*
+	 * Pre-check if there are enough free buckets. In the non-blocking
+	 * scenario it's better to fail early rather than starting to allocate
+	 * buckets and do a cleanup later in case of failure.
+	 */
+	if (!wait) {
+		size_t avail = fifo_used(&ca->free[RESERVE_PRIO]) +
+			       fifo_used(&ca->free[RESERVE_NONE]);
+		if (prio_buckets(ca) > avail)
+			return -ENOMEM;
+	}
+
 	closure_init_stack(&cl);
 
 	lockdep_assert_held(&ca->set->bucket_lock);
@@ -545,9 +562,6 @@ void bch_prio_write(struct cache *ca)
 	atomic_long_add(ca->sb.bucket_size * prio_buckets(ca),
 			&ca->meta_sectors_written);
 
-	//pr_debug("free %zu, free_inc %zu, unused %zu", fifo_used(&ca->free),
-	//	 fifo_used(&ca->free_inc), fifo_used(&ca->unused));
-
 	for (i = prio_buckets(ca) - 1; i >= 0; --i) {
 		long bucket;
 		struct prio_set *p = ca->disk_buckets;
@@ -565,7 +579,7 @@ void bch_prio_write(struct cache *ca)
 		p->magic	= pset_magic(&ca->sb);
 		p->csum		= bch_crc64(&p->magic, bucket_bytes(ca) - 8);
 
-		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
+		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
 		BUG_ON(bucket == -1);
 
 		mutex_unlock(&ca->set->bucket_lock);
@@ -594,6 +608,7 @@ void bch_prio_write(struct cache *ca)
 
 		ca->prio_last_buckets[i] = ca->prio_buckets[i];
 	}
+	return 0;
 }
 
 static void prio_read(struct cache *ca, uint64_t bucket)
@@ -1964,7 +1979,7 @@ static int run_cache_set(struct cache_set *c)
 
 		mutex_lock(&c->bucket_lock);
 		for_each_cache(ca, c, i)
-			bch_prio_write(ca);
+			bch_prio_write(ca, true);
 		mutex_unlock(&c->bucket_lock);
 
 		err = "cannot allocate new UUID bucket";
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 08/12] bcache: add code comments in bch_btree_leaf_dirty()
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (6 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 07/12] bcache: fix deadlock in bcache_allocator Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13  8:03 ` [PATCH 09/12] bcache: add idle_max_writeback_rate sysfs interface Coly Li
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

This patch adds code comments in bch_btree_leaf_dirty() to explain why
w->journal should always reference the oldest journal pin of all the
bkeys being written into the btree node, making the bcache journal code
easier to understand.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/btree.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 39d7fc1ef1ee..48e33ee0d876 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -569,6 +569,11 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 
 	set_btree_node_dirty(b);
 
+	/*
+	 * w->journal is always the oldest journal pin of all bkeys
+	 * in the leaf node, to make sure the oldest jset seq won't
+	 * be increased before this btree node is flushed.
+	 */
 	if (journal_ref) {
 		if (w->journal &&
 		    journal_pin_cmp(b->c, w->journal, journal_ref)) {
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 09/12] bcache: add idle_max_writeback_rate sysfs interface
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (7 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 08/12] bcache: add code comments in bch_btree_leaf_dirty() Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-17  3:34   ` Eric Wheeler
  2019-11-13  8:03 ` [PATCH 10/12] bcache: at least try to shrink 1 node in bch_mca_scan() Coly Li
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

In writeback mode, if there is no regular I/O request for a while, the
writeback rate is set to the maximum value (1 TB/s for now). This is
good for most storage workloads, but some people still do not want the
maximum writeback rate during I/O idle time.

This patch adds a sysfs interface file, idle_max_writeback_rate, to
permit people to disable the maximum writeback rate. The minimum
writeback rate can then be advised via writeback_rate_minimum in the
bcache device's sysfs interface.
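
A minimal userspace sketch of toggling the new knob. The sysfs path is
an assumption (cache-set internal attributes normally live under
/sys/fs/bcache/<cache-set-uuid>/internal/), and the placeholder UUID
must be replaced before use:

	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		/* assumed path; <cache-set-uuid> is a placeholder */
		int fd = open("/sys/fs/bcache/<cache-set-uuid>/internal/"
			      "idle_max_writeback_rate", O_WRONLY);

		if (fd < 0)
			return 1;

		/* write "0" to disable the idle maximum writeback rate,
		 * "1" (the default) to re-enable it */
		if (write(fd, "0", 1) != 1) {
			close(fd);
			return 1;
		}

		close(fd);
		return 0;
	}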

Reported-by: Christian Balzer <chibi@gol.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bcache.h    | 1 +
 drivers/md/bcache/super.c     | 1 +
 drivers/md/bcache/sysfs.c     | 7 +++++++
 drivers/md/bcache/writeback.c | 4 ++++
 4 files changed, 13 insertions(+)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 50241e045c70..9198c1b480d9 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -724,6 +724,7 @@ struct cache_set {
 	unsigned int		gc_always_rewrite:1;
 	unsigned int		shrinker_disabled:1;
 	unsigned int		copy_gc_enabled:1;
+	unsigned int		idle_max_writeback_rate_enabled:1;
 
 #define BUCKET_HASH_BITS	12
 	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d1352fcc6ff2..77e9869345e7 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1834,6 +1834,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 	c->congested_read_threshold_us	= 2000;
 	c->congested_write_threshold_us	= 20000;
 	c->error_limit	= DEFAULT_IO_ERROR_LIMIT;
+	c->idle_max_writeback_rate_enabled = 1;
 	WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));
 
 	return c;
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 627dcea0f5b6..733e2ddf3c78 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -134,6 +134,7 @@ rw_attribute(expensive_debug_checks);
 rw_attribute(cache_replacement_policy);
 rw_attribute(btree_shrinker_disabled);
 rw_attribute(copy_gc_enabled);
+rw_attribute(idle_max_writeback_rate);
 rw_attribute(gc_after_writeback);
 rw_attribute(size);
 
@@ -747,6 +748,8 @@ SHOW(__bch_cache_set)
 	sysfs_printf(gc_always_rewrite,		"%i", c->gc_always_rewrite);
 	sysfs_printf(btree_shrinker_disabled,	"%i", c->shrinker_disabled);
 	sysfs_printf(copy_gc_enabled,		"%i", c->copy_gc_enabled);
+	sysfs_printf(idle_max_writeback_rate,	"%i",
+		     c->idle_max_writeback_rate_enabled);
 	sysfs_printf(gc_after_writeback,	"%i", c->gc_after_writeback);
 	sysfs_printf(io_disable,		"%i",
 		     test_bit(CACHE_SET_IO_DISABLE, &c->flags));
@@ -864,6 +867,9 @@ STORE(__bch_cache_set)
 	sysfs_strtoul_bool(gc_always_rewrite,	c->gc_always_rewrite);
 	sysfs_strtoul_bool(btree_shrinker_disabled, c->shrinker_disabled);
 	sysfs_strtoul_bool(copy_gc_enabled,	c->copy_gc_enabled);
+	sysfs_strtoul_bool(idle_max_writeback_rate,
+			   c->idle_max_writeback_rate_enabled);
+
 	/*
 	 * write gc_after_writeback here may overwrite an already set
 	 * BCH_DO_AUTO_GC, it doesn't matter because this flag will be
@@ -954,6 +960,7 @@ static struct attribute *bch_cache_set_internal_files[] = {
 	&sysfs_gc_always_rewrite,
 	&sysfs_btree_shrinker_disabled,
 	&sysfs_copy_gc_enabled,
+	&sysfs_idle_max_writeback_rate,
 	&sysfs_gc_after_writeback,
 	&sysfs_io_disable,
 	&sysfs_cutoff_writeback,
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index d60268fe49e1..4a40f9eadeaf 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -122,6 +122,10 @@ static void __update_writeback_rate(struct cached_dev *dc)
 static bool set_at_max_writeback_rate(struct cache_set *c,
 				       struct cached_dev *dc)
 {
+	/* Don't set max writeback rate if it is disabled */
+	if (!c->idle_max_writeback_rate_enabled)
+		return false;
+
 	/* Don't set max writeback rate if gc is running */
 	if (!c->gc_mark_valid)
 		return false;
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 10/12] bcache: at least try to shrink 1 node in bch_mca_scan()
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (8 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 09/12] bcache: add idle_max_writeback_rate sysfs interface Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13  8:03 ` [PATCH 11/12] bcache: remove the extra cflags for request.o Coly Li
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

In bch_mca_scan(), the number of btree nodes to shrink is calculated by
code like this,

	unsigned long nr = sc->nr_to_scan;

	nr /= c->btree_pages;
	nr = min_t(unsigned long, nr, mca_can_free(c));

The variable sc->nr_to_scan is the number of objects (here, bcache
B+tree nodes) to shrink, and the pointer sc is passed in from the
memory management code as a callback parameter.

If sc->nr_to_scan is smaller than c->btree_pages, the above calculation
makes 'nr' 0 and nothing is shrunk. It is frequently observed that
sc->nr_to_scan is only 1 or 2, which makes nr zero; bch_mca_scan() then
does nothing more than acquire and release the mutex c->bucket_lock.

This patch checks whether nr is 0 after the above calculation; if it
is, nr is set to 1 so that bch_mca_scan() will at least try to shrink a
single B+tree node.
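
A small worked example of why the clamp matters; the numbers are
illustrative, not taken from the patch:

	unsigned long nr = 2;		/* example: sc->nr_to_scan from the shrinker core */
	unsigned long btree_pages = 8;	/* example value for c->btree_pages */

	nr /= btree_pages;		/* integer division: 2 / 8 == 0, nothing shrunk */
	if (nr == 0)
		nr = 1;			/* the fix: try to shrink at least one node */
	/* nr is then still clamped by mca_can_free() as in the existing code */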

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/btree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 48e33ee0d876..3df5fa4a501c 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -754,6 +754,8 @@ static unsigned long bch_mca_scan(struct shrinker *shrink,
 	 * IO can always make forward progress:
 	 */
 	nr /= c->btree_pages;
+	if (nr == 0)
+		nr = 1;
 	nr = min_t(unsigned long, nr, mca_can_free(c));
 
 	i = 0;
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/12] bcache: remove the extra cflags for request.o
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (9 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 10/12] bcache: at least try to shrink 1 node in bch_mca_scan() Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13  8:03 ` [PATCH 12/12] bcache: don't export symbols Coly Li
  2019-11-13 22:43 ` [PATCH v2 00/12] bcache patches for Linux v5.5 Jens Axboe
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Christoph Hellwig, Coly Li

From: Christoph Hellwig <hch@lst.de>

This file no longer needs any includes from the block directory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index d26b35195825..fd714628da6a 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -5,5 +5,3 @@ obj-$(CONFIG_BCACHE)	+= bcache.o
 bcache-y		:= alloc.o bset.o btree.o closure.o debug.o extents.o\
 	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
 	util.o writeback.o
-
-CFLAGS_request.o	+= -Iblock
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 12/12] bcache: don't export symbols
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (10 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 11/12] bcache: remove the extra cflags for request.o Coly Li
@ 2019-11-13  8:03 ` Coly Li
  2019-11-13 22:43 ` [PATCH v2 00/12] bcache patches for Linux v5.5 Jens Axboe
  12 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-13  8:03 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Christoph Hellwig, Coly Li

From: Christoph Hellwig <hch@lst.de>

None of the exported bcache symbols are actually used anywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bset.c    | 15 ---------------
 drivers/md/bcache/closure.c |  7 -------
 2 files changed, 22 deletions(-)

diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c
index f37a429f093d..cffcdc9feefb 100644
--- a/drivers/md/bcache/bset.c
+++ b/drivers/md/bcache/bset.c
@@ -311,7 +311,6 @@ void bch_btree_keys_free(struct btree_keys *b)
 	t->tree = NULL;
 	t->data = NULL;
 }
-EXPORT_SYMBOL(bch_btree_keys_free);
 
 int bch_btree_keys_alloc(struct btree_keys *b,
 			 unsigned int page_order,
@@ -344,7 +343,6 @@ int bch_btree_keys_alloc(struct btree_keys *b,
 	bch_btree_keys_free(b);
 	return -ENOMEM;
 }
-EXPORT_SYMBOL(bch_btree_keys_alloc);
 
 void bch_btree_keys_init(struct btree_keys *b, const struct btree_keys_ops *ops,
 			 bool *expensive_debug_checks)
@@ -363,7 +361,6 @@ void bch_btree_keys_init(struct btree_keys *b, const struct btree_keys_ops *ops,
 	 * any more.
 	 */
 }
-EXPORT_SYMBOL(bch_btree_keys_init);
 
 /* Binary tree stuff for auxiliary search trees */
 
@@ -680,7 +677,6 @@ void bch_bset_init_next(struct btree_keys *b, struct bset *i, uint64_t magic)
 
 	bch_bset_build_unwritten_tree(b);
 }
-EXPORT_SYMBOL(bch_bset_init_next);
 
 /*
  * Build auxiliary binary tree 'struct bset_tree *t', this tree is used to
@@ -734,7 +730,6 @@ void bch_bset_build_written_tree(struct btree_keys *b)
 	     j = inorder_next(j, t->size))
 		make_bfloat(t, j);
 }
-EXPORT_SYMBOL(bch_bset_build_written_tree);
 
 /* Insert */
 
@@ -782,7 +777,6 @@ fix_right:	do {
 			j = j * 2 + 1;
 		} while (j < t->size);
 }
-EXPORT_SYMBOL(bch_bset_fix_invalidated_key);
 
 static void bch_bset_fix_lookup_table(struct btree_keys *b,
 				      struct bset_tree *t,
@@ -857,7 +851,6 @@ bool bch_bkey_try_merge(struct btree_keys *b, struct bkey *l, struct bkey *r)
 
 	return b->ops->key_merge(b, l, r);
 }
-EXPORT_SYMBOL(bch_bkey_try_merge);
 
 void bch_bset_insert(struct btree_keys *b, struct bkey *where,
 		     struct bkey *insert)
@@ -877,7 +870,6 @@ void bch_bset_insert(struct btree_keys *b, struct bkey *where,
 	bkey_copy(where, insert);
 	bch_bset_fix_lookup_table(b, t, where);
 }
-EXPORT_SYMBOL(bch_bset_insert);
 
 unsigned int bch_btree_insert_key(struct btree_keys *b, struct bkey *k,
 			      struct bkey *replace_key)
@@ -933,7 +925,6 @@ copy:	bkey_copy(m, k);
 merged:
 	return status;
 }
-EXPORT_SYMBOL(bch_btree_insert_key);
 
 /* Lookup */
 
@@ -1079,7 +1070,6 @@ struct bkey *__bch_bset_search(struct btree_keys *b, struct bset_tree *t,
 
 	return i.l;
 }
-EXPORT_SYMBOL(__bch_bset_search);
 
 /* Btree iterator */
 
@@ -1134,7 +1124,6 @@ struct bkey *bch_btree_iter_init(struct btree_keys *b,
 {
 	return __bch_btree_iter_init(b, iter, search, b->set);
 }
-EXPORT_SYMBOL(bch_btree_iter_init);
 
 static inline struct bkey *__bch_btree_iter_next(struct btree_iter *iter,
 						 btree_iter_cmp_fn *cmp)
@@ -1167,7 +1156,6 @@ struct bkey *bch_btree_iter_next(struct btree_iter *iter)
 	return __bch_btree_iter_next(iter, btree_iter_cmp);
 
 }
-EXPORT_SYMBOL(bch_btree_iter_next);
 
 struct bkey *bch_btree_iter_next_filter(struct btree_iter *iter,
 					struct btree_keys *b, ptr_filter_fn fn)
@@ -1198,7 +1186,6 @@ int bch_bset_sort_state_init(struct bset_sort_state *state,
 
 	return mempool_init_page_pool(&state->pool, 1, page_order);
 }
-EXPORT_SYMBOL(bch_bset_sort_state_init);
 
 static void btree_mergesort(struct btree_keys *b, struct bset *out,
 			    struct btree_iter *iter,
@@ -1315,7 +1302,6 @@ void bch_btree_sort_partial(struct btree_keys *b, unsigned int start,
 
 	EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize);
 }
-EXPORT_SYMBOL(bch_btree_sort_partial);
 
 void bch_btree_sort_and_fix_extents(struct btree_keys *b,
 				    struct btree_iter *iter,
@@ -1368,7 +1354,6 @@ void bch_btree_sort_lazy(struct btree_keys *b, struct bset_sort_state *state)
 out:
 	bch_bset_build_written_tree(b);
 }
-EXPORT_SYMBOL(bch_btree_sort_lazy);
 
 void bch_btree_keys_stats(struct btree_keys *b, struct bset_stats *stats)
 {
diff --git a/drivers/md/bcache/closure.c b/drivers/md/bcache/closure.c
index c12cd809ab19..0164a1fe94a9 100644
--- a/drivers/md/bcache/closure.c
+++ b/drivers/md/bcache/closure.c
@@ -45,7 +45,6 @@ void closure_sub(struct closure *cl, int v)
 {
 	closure_put_after_sub(cl, atomic_sub_return(v, &cl->remaining));
 }
-EXPORT_SYMBOL(closure_sub);
 
 /*
  * closure_put - decrement a closure's refcount
@@ -54,7 +53,6 @@ void closure_put(struct closure *cl)
 {
 	closure_put_after_sub(cl, atomic_dec_return(&cl->remaining));
 }
-EXPORT_SYMBOL(closure_put);
 
 /*
  * closure_wake_up - wake up all closures on a wait list, without memory barrier
@@ -76,7 +74,6 @@ void __closure_wake_up(struct closure_waitlist *wait_list)
 		closure_sub(cl, CLOSURE_WAITING + 1);
 	}
 }
-EXPORT_SYMBOL(__closure_wake_up);
 
 /**
  * closure_wait - add a closure to a waitlist
@@ -96,7 +93,6 @@ bool closure_wait(struct closure_waitlist *waitlist, struct closure *cl)
 
 	return true;
 }
-EXPORT_SYMBOL(closure_wait);
 
 struct closure_syncer {
 	struct task_struct	*task;
@@ -131,7 +127,6 @@ void __sched __closure_sync(struct closure *cl)
 
 	__set_current_state(TASK_RUNNING);
 }
-EXPORT_SYMBOL(__closure_sync);
 
 #ifdef CONFIG_BCACHE_CLOSURES_DEBUG
 
@@ -149,7 +144,6 @@ void closure_debug_create(struct closure *cl)
 	list_add(&cl->all, &closure_list);
 	spin_unlock_irqrestore(&closure_list_lock, flags);
 }
-EXPORT_SYMBOL(closure_debug_create);
 
 void closure_debug_destroy(struct closure *cl)
 {
@@ -162,7 +156,6 @@ void closure_debug_destroy(struct closure *cl)
 	list_del(&cl->all);
 	spin_unlock_irqrestore(&closure_list_lock, flags);
 }
-EXPORT_SYMBOL(closure_debug_destroy);
 
 static struct dentry *closure_debug;
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 00/12] bcache patches for Linux v5.5
  2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
                   ` (11 preceding siblings ...)
  2019-11-13  8:03 ` [PATCH 12/12] bcache: don't export symbols Coly Li
@ 2019-11-13 22:43 ` Jens Axboe
  12 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2019-11-13 22:43 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block

On 11/13/19 1:03 AM, Coly Li wrote:
> Hi Jens,
> 
> This second submission adds two previously missed patches from Christoph
> Hellwig.
> 
> These are the bcache patches for Linux v5.5. They have been tested for a
> while during my current development and are ready to be merged.
> 
> There are still other patches under test; I will submit them in later
> rounds once I feel they are solid enough.
> 
> Thanks for taking care of this.

Applied, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 02/12] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
  2019-11-13  8:03 ` [PATCH 02/12] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock Coly Li
@ 2019-11-17  3:32   ` Eric Wheeler
  2019-11-18 15:40     ` Coly Li
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wheeler @ 2019-11-17  3:32 UTC (permalink / raw)
  To: Coly Li; +Cc: axboe, linux-bcache, linux-block, Guoju Fang

On Wed, 13 Nov 2019, Coly Li wrote:

> From: Guoju Fang <fangguoju@gmail.com>
> 
> This patch fixes a lost wake-up problem caused by the race between
> mca_cannibalize_lock() and bch_cannibalize_unlock().
> 
> Consider two processes, A and B. Process A is executing
> mca_cannibalize_lock() while process B, which holds
> c->btree_cache_alloc_lock, is executing bch_cannibalize_unlock(). The
> problem occurs in the window after process A's cmpxchg fails but before
> it calls prepare_to_wait(): in that timeslice process B issues its
> wake_up(), and only afterwards does process A call prepare_to_wait()
> and set its state to TASK_UNINTERRUPTIBLE. Process A then goes to
> sleep, but nothing will ever wake it up. This problem may leave the
> bcache device hung.
> 
> Signed-off-by: Guoju Fang <fangguoju@gmail.com>
> Signed-off-by: Coly Li <colyli@suse.de>

Add cc stable?

-Eric


> ---
>  drivers/md/bcache/bcache.h |  1 +
>  drivers/md/bcache/btree.c  | 12 ++++++++----
>  drivers/md/bcache/super.c  |  1 +
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 013e35a9e317..3653faf3bf48 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -582,6 +582,7 @@ struct cache_set {
>  	 */
>  	wait_queue_head_t	btree_cache_wait;
>  	struct task_struct	*btree_cache_alloc_lock;
> +	spinlock_t		btree_cannibalize_lock;
>  
>  	/*
>  	 * When we free a btree node, we increment the gen of the bucket the
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 00523cd1db80..39d7fc1ef1ee 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -910,15 +910,17 @@ static struct btree *mca_find(struct cache_set *c, struct bkey *k)
>  
>  static int mca_cannibalize_lock(struct cache_set *c, struct btree_op *op)
>  {
> -	struct task_struct *old;
> -
> -	old = cmpxchg(&c->btree_cache_alloc_lock, NULL, current);
> -	if (old && old != current) {
> +	spin_lock(&c->btree_cannibalize_lock);
> +	if (likely(c->btree_cache_alloc_lock == NULL)) {
> +		c->btree_cache_alloc_lock = current;
> +	} else if (c->btree_cache_alloc_lock != current) {
>  		if (op)
>  			prepare_to_wait(&c->btree_cache_wait, &op->wait,
>  					TASK_UNINTERRUPTIBLE);
> +		spin_unlock(&c->btree_cannibalize_lock);
>  		return -EINTR;
>  	}
> +	spin_unlock(&c->btree_cannibalize_lock);
>  
>  	return 0;
>  }
> @@ -953,10 +955,12 @@ static struct btree *mca_cannibalize(struct cache_set *c, struct btree_op *op,
>   */
>  static void bch_cannibalize_unlock(struct cache_set *c)
>  {
> +	spin_lock(&c->btree_cannibalize_lock);
>  	if (c->btree_cache_alloc_lock == current) {
>  		c->btree_cache_alloc_lock = NULL;
>  		wake_up(&c->btree_cache_wait);
>  	}
> +	spin_unlock(&c->btree_cannibalize_lock);
>  }
>  
>  static struct btree *mca_alloc(struct cache_set *c, struct btree_op *op,
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 20ed838e9413..ebb854ed05a4 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1769,6 +1769,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
>  	sema_init(&c->sb_write_mutex, 1);
>  	mutex_init(&c->bucket_lock);
>  	init_waitqueue_head(&c->btree_cache_wait);
> +	spin_lock_init(&c->btree_cannibalize_lock);
>  	init_waitqueue_head(&c->bucket_wait);
>  	init_waitqueue_head(&c->gc_wait);
>  	sema_init(&c->uuid_write_mutex, 1);
> -- 
> 2.16.4
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 07/12] bcache: fix deadlock in bcache_allocator
  2019-11-13  8:03 ` [PATCH 07/12] bcache: fix deadlock in bcache_allocator Coly Li
@ 2019-11-17  3:33   ` Eric Wheeler
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Wheeler @ 2019-11-17  3:33 UTC (permalink / raw)
  To: Coly Li; +Cc: axboe, linux-bcache, linux-block, Andrea Righi

On Wed, 13 Nov 2019, Coly Li wrote:

> From: Andrea Righi <andrea.righi@canonical.com>
> 
> bcache_allocator can call the following:
> 
>  bch_allocator_thread()
>   -> bch_prio_write()
>      -> bch_bucket_alloc()
>         -> wait on &ca->set->bucket_wait
> 
> But the wake up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself => deadlock:
> 
> [ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
> [ 1158.495929]       Not tainted 5.3.0-050300rc3-generic #201908042232
> [ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1158.504413] bcache_allocato D    0 15861      2 0x80004000
> [ 1158.504419] Call Trace:
> [ 1158.504429]  __schedule+0x2a8/0x670
> [ 1158.504432]  schedule+0x2d/0x90
> [ 1158.504448]  bch_bucket_alloc+0xe5/0x370 [bcache]
> [ 1158.504453]  ? wait_woken+0x80/0x80
> [ 1158.504466]  bch_prio_write+0x1dc/0x390 [bcache]
> [ 1158.504476]  bch_allocator_thread+0x233/0x490 [bcache]
> [ 1158.504491]  kthread+0x121/0x140
> [ 1158.504503]  ? invalidate_buckets+0x890/0x890 [bcache]
> [ 1158.504506]  ? kthread_park+0xb0/0xb0
> [ 1158.504510]  ret_from_fork+0x35/0x40
> 
> Fix by making the call to bch_prio_write() non-blocking, so that
> bch_allocator_thread() never waits on itself.
> 
> Moreover, make sure to wake up the garbage collector thread when
> bch_prio_write() is failing to allocate buckets.
> 
> BugLink: https://bugs.launchpad.net/bugs/1784665
> BugLink: https://bugs.launchpad.net/bugs/1796292
> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
> Signed-off-by: Coly Li <colyli@suse.de>

Add cc stable?

-Eric


> ---
>  drivers/md/bcache/alloc.c  |  5 ++++-
>  drivers/md/bcache/bcache.h |  2 +-
>  drivers/md/bcache/super.c  | 27 +++++++++++++++++++++------
>  3 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
> index 6f776823b9ba..a1df0d95151c 100644
> --- a/drivers/md/bcache/alloc.c
> +++ b/drivers/md/bcache/alloc.c
> @@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
>  			if (!fifo_full(&ca->free_inc))
>  				goto retry_invalidate;
>  
> -			bch_prio_write(ca);
> +			if (bch_prio_write(ca, false) < 0) {
> +				ca->invalidate_needs_gc = 1;
> +				wake_up_gc(ca->set);
> +			}
>  		}
>  	}
>  out:
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 3653faf3bf48..50241e045c70 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -978,7 +978,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
>  __printf(2, 3)
>  bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
>  
> -void bch_prio_write(struct cache *ca);
> +int bch_prio_write(struct cache *ca, bool wait);
>  void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
>  
>  extern struct workqueue_struct *bcache_wq;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 623fdaf10c4c..d1352fcc6ff2 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -530,12 +530,29 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
>  	closure_sync(cl);
>  }
>  
> -void bch_prio_write(struct cache *ca)
> +int bch_prio_write(struct cache *ca, bool wait)
>  {
>  	int i;
>  	struct bucket *b;
>  	struct closure cl;
>  
> +	pr_debug("free_prio=%zu, free_none=%zu, free_inc=%zu",
> +		 fifo_used(&ca->free[RESERVE_PRIO]),
> +		 fifo_used(&ca->free[RESERVE_NONE]),
> +		 fifo_used(&ca->free_inc));
> +
> +	/*
> +	 * Pre-check if there are enough free buckets. In the non-blocking
> +	 * scenario it's better to fail early rather than starting to allocate
> +	 * buckets and do a cleanup later in case of failure.
> +	 */
> +	if (!wait) {
> +		size_t avail = fifo_used(&ca->free[RESERVE_PRIO]) +
> +			       fifo_used(&ca->free[RESERVE_NONE]);
> +		if (prio_buckets(ca) > avail)
> +			return -ENOMEM;
> +	}
> +
>  	closure_init_stack(&cl);
>  
>  	lockdep_assert_held(&ca->set->bucket_lock);
> @@ -545,9 +562,6 @@ void bch_prio_write(struct cache *ca)
>  	atomic_long_add(ca->sb.bucket_size * prio_buckets(ca),
>  			&ca->meta_sectors_written);
>  
> -	//pr_debug("free %zu, free_inc %zu, unused %zu", fifo_used(&ca->free),
> -	//	 fifo_used(&ca->free_inc), fifo_used(&ca->unused));
> -
>  	for (i = prio_buckets(ca) - 1; i >= 0; --i) {
>  		long bucket;
>  		struct prio_set *p = ca->disk_buckets;
> @@ -565,7 +579,7 @@ void bch_prio_write(struct cache *ca)
>  		p->magic	= pset_magic(&ca->sb);
>  		p->csum		= bch_crc64(&p->magic, bucket_bytes(ca) - 8);
>  
> -		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
> +		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
>  		BUG_ON(bucket == -1);
>  
>  		mutex_unlock(&ca->set->bucket_lock);
> @@ -594,6 +608,7 @@ void bch_prio_write(struct cache *ca)
>  
>  		ca->prio_last_buckets[i] = ca->prio_buckets[i];
>  	}
> +	return 0;
>  }
>  
>  static void prio_read(struct cache *ca, uint64_t bucket)
> @@ -1964,7 +1979,7 @@ static int run_cache_set(struct cache_set *c)
>  
>  		mutex_lock(&c->bucket_lock);
>  		for_each_cache(ca, c, i)
> -			bch_prio_write(ca);
> +			bch_prio_write(ca, true);
>  		mutex_unlock(&c->bucket_lock);
>  
>  		err = "cannot allocate new UUID bucket";
> -- 
> 2.16.4
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 09/12] bcache: add idle_max_writeback_rate sysfs interface
  2019-11-13  8:03 ` [PATCH 09/12] bcache: add idle_max_writeback_rate sysfs interface Coly Li
@ 2019-11-17  3:34   ` Eric Wheeler
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Wheeler @ 2019-11-17  3:34 UTC (permalink / raw)
  To: Coly Li; +Cc: axboe, linux-bcache, linux-block

On Wed, 13 Nov 2019, Coly Li wrote:

> In writeback mode, if there is no regular I/O request for a while, the
> writeback rate is set to the maximum value (1 TB/s for now). This is
> good for most storage workloads, but some people still do not want the
> maximum writeback rate during I/O idle time.
> 
> This patch adds a sysfs interface file, idle_max_writeback_rate, to
> permit people to disable the maximum writeback rate. The minimum
> writeback rate can then be advised via writeback_rate_minimum in the
> bcache device's sysfs interface.
> 
> Reported-by: Christian Balzer <chibi@gol.com>
> Signed-off-by: Coly Li <colyli@suse.de>

This fixes a bug for Christian; add cc stable?

-Eric


--
Eric Wheeler



> ---
>  drivers/md/bcache/bcache.h    | 1 +
>  drivers/md/bcache/super.c     | 1 +
>  drivers/md/bcache/sysfs.c     | 7 +++++++
>  drivers/md/bcache/writeback.c | 4 ++++
>  4 files changed, 13 insertions(+)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 50241e045c70..9198c1b480d9 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -724,6 +724,7 @@ struct cache_set {
>  	unsigned int		gc_always_rewrite:1;
>  	unsigned int		shrinker_disabled:1;
>  	unsigned int		copy_gc_enabled:1;
> +	unsigned int		idle_max_writeback_rate_enabled:1;
>  
>  #define BUCKET_HASH_BITS	12
>  	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index d1352fcc6ff2..77e9869345e7 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1834,6 +1834,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
>  	c->congested_read_threshold_us	= 2000;
>  	c->congested_write_threshold_us	= 20000;
>  	c->error_limit	= DEFAULT_IO_ERROR_LIMIT;
> +	c->idle_max_writeback_rate_enabled = 1;
>  	WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));
>  
>  	return c;
> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
> index 627dcea0f5b6..733e2ddf3c78 100644
> --- a/drivers/md/bcache/sysfs.c
> +++ b/drivers/md/bcache/sysfs.c
> @@ -134,6 +134,7 @@ rw_attribute(expensive_debug_checks);
>  rw_attribute(cache_replacement_policy);
>  rw_attribute(btree_shrinker_disabled);
>  rw_attribute(copy_gc_enabled);
> +rw_attribute(idle_max_writeback_rate);
>  rw_attribute(gc_after_writeback);
>  rw_attribute(size);
>  
> @@ -747,6 +748,8 @@ SHOW(__bch_cache_set)
>  	sysfs_printf(gc_always_rewrite,		"%i", c->gc_always_rewrite);
>  	sysfs_printf(btree_shrinker_disabled,	"%i", c->shrinker_disabled);
>  	sysfs_printf(copy_gc_enabled,		"%i", c->copy_gc_enabled);
> +	sysfs_printf(idle_max_writeback_rate,	"%i",
> +		     c->idle_max_writeback_rate_enabled);
>  	sysfs_printf(gc_after_writeback,	"%i", c->gc_after_writeback);
>  	sysfs_printf(io_disable,		"%i",
>  		     test_bit(CACHE_SET_IO_DISABLE, &c->flags));
> @@ -864,6 +867,9 @@ STORE(__bch_cache_set)
>  	sysfs_strtoul_bool(gc_always_rewrite,	c->gc_always_rewrite);
>  	sysfs_strtoul_bool(btree_shrinker_disabled, c->shrinker_disabled);
>  	sysfs_strtoul_bool(copy_gc_enabled,	c->copy_gc_enabled);
> +	sysfs_strtoul_bool(idle_max_writeback_rate,
> +			   c->idle_max_writeback_rate_enabled);
> +
>  	/*
>  	 * write gc_after_writeback here may overwrite an already set
>  	 * BCH_DO_AUTO_GC, it doesn't matter because this flag will be
> @@ -954,6 +960,7 @@ static struct attribute *bch_cache_set_internal_files[] = {
>  	&sysfs_gc_always_rewrite,
>  	&sysfs_btree_shrinker_disabled,
>  	&sysfs_copy_gc_enabled,
> +	&sysfs_idle_max_writeback_rate,
>  	&sysfs_gc_after_writeback,
>  	&sysfs_io_disable,
>  	&sysfs_cutoff_writeback,
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index d60268fe49e1..4a40f9eadeaf 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -122,6 +122,10 @@ static void __update_writeback_rate(struct cached_dev *dc)
>  static bool set_at_max_writeback_rate(struct cache_set *c,
>  				       struct cached_dev *dc)
>  {
> +	/* Don't set max writeback rate if it is disabled */
> +	if (!c->idle_max_writeback_rate_enabled)
> +		return false;
> +
>  	/* Don't set max writeback rate if gc is running */
>  	if (!c->gc_mark_valid)
>  		return false;
> -- 
> 2.16.4
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp()
  2019-11-13  8:03 ` [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp() Coly Li
@ 2019-11-18 15:28   ` Coly Li
  2019-11-18 15:36     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Coly Li @ 2019-11-18 15:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block

On 2019/11/13 4:03 PM, Coly Li wrote:
> The FIFO structure journal.pin is implemented as a circular buffer: when
> the back index reaches the highest slot of the buffer, it wraps around
> to 0. Once that wrap happens, a smaller fifo index may be associated
> with a newer journal entry, so bch_btree_leaf_dirty() may fail to keep
> the oldest journal entry referenced by the dirty B+tree leaf node. As a
> consequence the bcache journal may not protect the unflushed, oldest
> dirty B+tree leaf node across a power failure, and that leaf node may be
> inconsistent after rebooting from the power failure.
> 
> This patch fixes the fifo index comparison logic in journal_pin_cmp() to
> avoid potential B+tree leaf node corruption when the back index of
> journal.pin wraps around.
> 
> Signed-off-by: Coly Li <colyli@suse.de>

Hi Jens,

Guoju Fang talked to me today; he told me this change is unnecessary
and that I over-thought it.

Then I realized that fifo_idx() uses a mask to handle the array index
overflow condition, so the index wrap described in the patch cannot
actually happen in journal_pin_cmp(). And yes, Guoju and Kent are
correct.
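
A minimal sketch of why a masked, front-relative index is wrap-safe. It
assumes fifo_idx() returns an entry's offset from the FIFO front masked
by the buffer size, which is the semantics described above; it is an
illustration, not the macro from util.h:

	struct fifo {
		unsigned int front, back;
		unsigned int mask;	/* size - 1, with size a power of two */
	};

	/*
	 * Logical age of slot 'slot': its distance from the current front,
	 * modulo the buffer size. Older entries sit closer to the front, so a
	 * larger value always means a newer entry, even after 'back' has
	 * wrapped past the end of the array.
	 */
	static unsigned int fifo_logical_idx(const struct fifo *f,
					     unsigned int slot)
	{
		return (slot - f->front) & f->mask;
	}

Comparing two such values, as the original journal_pin_cmp() macro does
with fifo_idx(), therefore orders journal pins correctly without any
special wraparound handling.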

Since you have already applied this patch, could you please remove it
from your for-next branch? This single patch does not break anything,
but it is unnecessary at this moment.

Thanks.

Coly Li

> ---
>  drivers/md/bcache/btree.c   | 26 ++++++++++++++++++++++++++
>  drivers/md/bcache/journal.h |  4 ----
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index ba434d9ac720..00523cd1db80 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -528,6 +528,32 @@ static void btree_node_write_work(struct work_struct *w)
>  	mutex_unlock(&b->write_lock);
>  }
>  
> +/* return true if journal pin 'l' is newer than 'r' */
> +static bool journal_pin_cmp(struct cache_set *c,
> +			    atomic_t *l,
> +			    atomic_t *r)
> +{
> +	int l_idx, r_idx, f_idx, b_idx;
> +	bool ret = false;
> +
> +	l_idx = fifo_idx(&(c)->journal.pin, (l));
> +	r_idx = fifo_idx(&(c)->journal.pin, (r));
> +	f_idx = (c)->journal.pin.front;
> +	b_idx = (c)->journal.pin.back;
> +
> +	if (l_idx > r_idx)
> +		ret = true;
> +	/* in case fifo back pointer is swapped */
> +	if (b_idx < f_idx) {
> +		if (l_idx <= b_idx && r_idx >= f_idx)
> +			ret = true;
> +		else if (l_idx >= f_idx && r_idx <= b_idx)
> +			ret = false;
> +	}
> +
> +	return ret;
> +}
> +
>  static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
>  {
>  	struct bset *i = btree_bset_last(b);
> diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
> index f2ea34d5f431..06b3eaab7d16 100644
> --- a/drivers/md/bcache/journal.h
> +++ b/drivers/md/bcache/journal.h
> @@ -157,10 +157,6 @@ struct journal_device {
>  };
>  
>  #define BTREE_FLUSH_NR	8
> -
> -#define journal_pin_cmp(c, l, r)				\
> -	(fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r)))
> -
>  #define JOURNAL_PIN	20000
>  
>  #define journal_full(j)						\
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp()
  2019-11-18 15:28   ` Coly Li
@ 2019-11-18 15:36     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2019-11-18 15:36 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block

On 11/18/19 8:28 AM, Coly Li wrote:
> On 2019/11/13 4:03 PM, Coly Li wrote:
>> The FIFO structure journal.pin is implemented as a circular buffer: when
>> the back index reaches the highest slot of the buffer, it wraps around
>> to 0. Once that wrap happens, a smaller fifo index may be associated
>> with a newer journal entry, so bch_btree_leaf_dirty() may fail to keep
>> the oldest journal entry referenced by the dirty B+tree leaf node. As a
>> consequence the bcache journal may not protect the unflushed, oldest
>> dirty B+tree leaf node across a power failure, and that leaf node may be
>> inconsistent after rebooting from the power failure.
>>
>> This patch fixes the fifo index comparison logic in journal_pin_cmp() to
>> avoid potential B+tree leaf node corruption when the back index of
>> journal.pin wraps around.
>>
>> Signed-off-by: Coly Li <colyli@suse.de>
> 
> Hi Jens,
> 
> Guoju Fang talked to me today; he told me this change is unnecessary
> and that I over-thought it.
> 
> Then I realized that fifo_idx() uses a mask to handle the array index
> overflow condition, so the index wrap described in the patch cannot
> actually happen in journal_pin_cmp(). And yes, Guoju and Kent are
> correct.
> 
> Since you have already applied this patch, could you please remove it
> from your for-next branch? This single patch does not break anything,
> but it is unnecessary at this moment.

Sure, done.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 02/12] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
  2019-11-17  3:32   ` Eric Wheeler
@ 2019-11-18 15:40     ` Coly Li
  0 siblings, 0 replies; 20+ messages in thread
From: Coly Li @ 2019-11-18 15:40 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: axboe, linux-bcache, linux-block, Guoju Fang

On 2019/11/17 11:32 AM, Eric Wheeler wrote:
> On Wed, 13 Nov 2019, Coly Li wrote:
> 
>> From: Guoju Fang <fangguoju@gmail.com>
>>
>> This patch fixes a lost wake-up problem caused by the race between
>> mca_cannibalize_lock() and bch_cannibalize_unlock().
>>
>> Consider two processes, A and B. Process A is executing
>> mca_cannibalize_lock() while process B, which holds
>> c->btree_cache_alloc_lock, is executing bch_cannibalize_unlock(). The
>> problem occurs in the window after process A's cmpxchg fails but before
>> it calls prepare_to_wait(): in that timeslice process B issues its
>> wake_up(), and only afterwards does process A call prepare_to_wait()
>> and set its state to TASK_UNINTERRUPTIBLE. Process A then goes to
>> sleep, but nothing will ever wake it up. This problem may leave the
>> bcache device hung.
>>
>> Signed-off-by: Guoju Fang <fangguoju@gmail.com>
>> Signed-off-by: Coly Li <colyli@suse.de>
> 
> Add cc stable?
> 

Yes, I agree. Now that these patches have been applied by Jens, how
about explicitly sending them to linux-stable after they go upstream?

Thanks.

Coly Li


> -Eric
> 
> 
>> ---
>>  drivers/md/bcache/bcache.h |  1 +
>>  drivers/md/bcache/btree.c  | 12 ++++++++----
>>  drivers/md/bcache/super.c  |  1 +
>>  3 files changed, 10 insertions(+), 4 deletions(-)
>>
[snip]


-- 

Coly Li

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-11-18 15:40 UTC | newest]

Thread overview: 20+ messages
-- links below jump to the message on this page --
2019-11-13  8:03 [PATCH v2 00/12] bcache patches for Linux v5.5 Coly Li
2019-11-13  8:03 ` [PATCH 01/12] bcache: fix fifo index swapping condition in journal_pin_cmp() Coly Li
2019-11-18 15:28   ` Coly Li
2019-11-18 15:36     ` Jens Axboe
2019-11-13  8:03 ` [PATCH 02/12] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock Coly Li
2019-11-17  3:32   ` Eric Wheeler
2019-11-18 15:40     ` Coly Li
2019-11-13  8:03 ` [PATCH 03/12] bcache: fix static checker warning in bcache_device_free() Coly Li
2019-11-13  8:03 ` [PATCH 04/12] bcache: add more accurate error messages in read_super() Coly Li
2019-11-13  8:03 ` [PATCH 05/12] bcache: deleted code comments for dead code in bch_data_insert_keys() Coly Li
2019-11-13  8:03 ` [PATCH 06/12] bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front() Coly Li
2019-11-13  8:03 ` [PATCH 07/12] bcache: fix deadlock in bcache_allocator Coly Li
2019-11-17  3:33   ` Eric Wheeler
2019-11-13  8:03 ` [PATCH 08/12] bcache: add code comments in bch_btree_leaf_dirty() Coly Li
2019-11-13  8:03 ` [PATCH 09/12] bcache: add idle_max_writeback_rate sysfs interface Coly Li
2019-11-17  3:34   ` Eric Wheeler
2019-11-13  8:03 ` [PATCH 10/12] bcache: at least try to shrink 1 node in bch_mca_scan() Coly Li
2019-11-13  8:03 ` [PATCH 11/12] bcache: remove the extra cflags for request.o Coly Li
2019-11-13  8:03 ` [PATCH 12/12] bcache: don't export symbols Coly Li
2019-11-13 22:43 ` [PATCH v2 00/12] bcache patches for Linux v5.5 Jens Axboe
