linux-bcache.vger.kernel.org archive mirror
* Fix degraded system performance due to workqueue overload
@ 2021-01-27 13:23 Kai Krakow
  2021-01-27 13:23 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-27 13:23 UTC (permalink / raw)
  To: linux-bcache

In the past months (and looking back, even years), I have been seeing
system performance and latency degrade badly whenever bcache is active.

Finally, with kernel 5.10, I was able to locate the problem:

[250336.887598] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 72s!
[250336.887606] Showing busy workqueues and worker pools:
[250336.887607] workqueue events: flags=0x0
[250336.887608]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=3/256 refcnt=4
[250336.887611]     pending: psi_avgs_work, psi_avgs_work, psi_avgs_work
[250336.887619]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=15/256 refcnt=16
[250336.887621]     in-flight: 3760137:psi_avgs_work
[250336.887624]     pending: psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work
[250336.887637]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[250336.887639]     pending: psi_avgs_work
[250336.887643] workqueue events_power_efficient: flags=0x80
[250336.887644]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[250336.887646]     pending: do_cache_clean
[250336.887651] workqueue mm_percpu_wq: flags=0x8
[250336.887651]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
[250336.887653]     pending: lru_add_drain_per_cpu BAR(60), vmstat_update
[250336.887666] workqueue bcache: flags=0x8
[250336.887667]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[250336.887668]     pending: cached_dev_nodata
[250336.887681] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=72s workers=2 idle: 3760136

I was able to track that back to the following commit:
56b30770b27d54d68ad51eccc6d888282b568cee ("bcache: Kill btree_io_wq")
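
For reference, one way to track a behavior change back to a commit like
this (assuming a kernel git checkout) is git's pickaxe search, which
lists commits that added or removed occurrences of a given string:

```shell
# Find commits that touched the btree_io_wq symbol in the bcache driver;
# -S lists commits where the occurrence count of the string changed.
git log --oneline -S btree_io_wq -- drivers/md/bcache/
```

When the offending commit is entirely unknown, a git bisect between a
known-good and a known-bad kernel is the more systematic route.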

Reverting that commit (with some adjustments for later code changes)
improved my desktop latency a lot, really a lot. The system was
finally able to handle somewhat higher loads without stalling for
several seconds and without the load average spiking into the hundreds
while doing a lot of write IO.

So I dug a little deeper and found that the assumption behind this old
commit may no longer hold: bcache simply overwhelms the system_wq
with too many, or too long-running, workers. The system_wq should only
be used for workers that can do their work almost instantly, and it
should not be flooded with lots of workers, which bcache seems to do
(look at how many kthreads it creates from workers):

# ps aux | grep 'kworker/.*bc' | wc -l
131

And this is on a mostly idle system; under load it may easily reach
700+. Also, with my patches in place, that number seems to be lower
overall.
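
As a side note, that kworker count can be broken down per CPU with a
small pipeline like the following (a rough diagnostic sketch; it
assumes the usual kworker/<cpu>:<id>-<workqueue> thread naming, and the
'bc' match is as loose as in the command above):

```shell
# Count bcache-related kworker threads per CPU; a heavy skew towards
# one CPU matches the per-CPU pool congestion in the lockup report.
ps -e -o comm= \
  | grep 'kworker/.*bc' \
  | sed -E 's#^kworker/([0-9]+):.*#\1#' \
  | sort | uniq -c | sort -rn
```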

So I added another commit (patch 2) to move the journal work over to a
dedicated workqueue as well ("bcache: Move journal work to new
background wq").

I tested this by overloading my desktop system with the following
parallel load:

  * A big download at 1 Gbit/s, resulting in 60+ MB/s write
  * Active IPFS daemon
  * Watching a YouTube video
  * Fully syncing 4 IMAP accounts with MailSpring
  * Running a Gentoo system update (compiling packages)
  * Browsing the web
  * Running a Windows VM (Qemu) with Outlook and defragmentation
  * Starting and closing several applications and clicking in them

IO setup: 4x HDD (2+2+4+4 TB) btrfs RAID-0 with 850 GB SSD bcache
Kernel 5.10.10

Without the patches, the system would have ground to a halt, probably
without ever recovering (last time I tried, a clean shutdown took more
than an hour). With the patches, the system easily survives and feels
smooth overall, with only a small perceivable lag.

Boot times are more consistent, too, and faster when bcache is mostly
cold due to a previous system update.

Write rates are smoother now, too, and the system can easily sustain a
constant load of 200-300 MB/s, while previously I would see long stalls
followed by vastly reduced write performance (down to 5-20 MB/s).
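
For reference, sustained rates like these can be derived from two
samples of the sectors-written counter in /proc/diskstats (field 10,
512-byte sectors); a minimal sketch, where the rate_mb_s helper and
the bcache0 device name are illustrative:

```shell
# Compute MB/s from two sectors-written samples (512-byte sectors).
rate_mb_s() {
  # $1,$2: sectors-written before/after; $3: interval in seconds
  echo $(( ($2 - $1) * 512 / $3 / 1048576 ))
}
# The samples would come from something like:
#   awk -v d=bcache0 '$3==d {print $10}' /proc/diskstats
rate_mb_s 0 3072000 5   # 3072000 sectors in 5 s -> prints 300
```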

I'm not sure whether my patches have side effects that I'm not aware
of, but they work great for me: all write-related desktop stalling is
gone.

-- 
Regards,
Kai




* [PATCH 1/2] Revert "bcache: Kill btree_io_wq"
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
@ 2021-01-27 13:23 ` Kai Krakow
  2021-01-27 16:28   ` Kai Krakow
  2021-01-27 13:23 ` [PATCH 2/2] bcache: Move journal work to new background wq Kai Krakow
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-27 13:23 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow

This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.

With the btree using the system_wq, I seem to see a lot more desktop
latency than I should. So let's revert this.
---
 drivers/md/bcache/bcache.h |  2 ++
 drivers/md/bcache/btree.c  | 21 +++++++++++++++++++--
 drivers/md/bcache/super.c  |  4 ++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 1d57f48307e6..b1ed16c7a534 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1042,5 +1042,7 @@ void bch_debug_exit(void);
 void bch_debug_init(void);
 void bch_request_exit(void);
 int bch_request_init(void);
+void bch_btree_exit(void);
+int bch_btree_init(void);
 
 #endif /* _BCACHE_H */
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 910df242c83d..952f022db5a5 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -99,6 +99,8 @@
 #define PTR_HASH(c, k)							\
 	(((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
 
+static struct workqueue_struct *btree_io_wq;
+
 #define insert_lock(s, b)	((b)->level <= (s)->lock)
 
 
@@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
 	btree_complete_write(b, w);
 
 	if (btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	closure_return_with_destructor(cl, btree_node_write_unlock);
 }
@@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 	BUG_ON(!i->keys);
 
 	if (!btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	set_btree_node_dirty(b);
 
@@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
 	spin_lock_init(&buf->lock);
 	array_allocator_init(&buf->freelist);
 }
+
+void bch_btree_exit(void)
+{
+	if (btree_io_wq)
+		destroy_workqueue(btree_io_wq);
+}
+
+int __init bch_btree_init(void)
+{
+	btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+	if (!btree_io_wq)
+		return -ENOMEM;
+
+	return 0;
+}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 2047a9cccdb5..dc4fe7eeda81 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2815,6 +2815,7 @@ static void bcache_exit(void)
 {
 	bch_debug_exit();
 	bch_request_exit();
+	bch_btree_exit();
 	if (bcache_kobj)
 		kobject_put(bcache_kobj);
 	if (bcache_wq)
@@ -2880,6 +2881,9 @@ static int __init bcache_init(void)
 	if (!bcache_wq)
 		goto err;
 
+	if (bch_btree_init())
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2



* [PATCH 2/2] bcache: Move journal work to new background wq
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
  2021-01-27 13:23 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-27 13:23 ` Kai Krakow
  2021-01-27 16:28   ` Kai Krakow
  2021-01-27 15:27 ` Fix degraded system performance due to workqueue overload Coly Li
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-27 13:23 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow

This work is potentially long-running and not latency-sensitive; let's
get it out of the way of other latency-sensitive events.
---
 drivers/md/bcache/bcache.h  | 1 +
 drivers/md/bcache/journal.c | 4 ++--
 drivers/md/bcache/super.c   | 7 +++++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index b1ed16c7a534..70565ed5732d 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
 extern struct workqueue_struct *bch_journal_wq;
+extern struct workqueue_struct *bch_background_wq;
 extern struct mutex bch_register_lock;
 extern struct list_head bch_cache_sets;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index aefbdb7e003b..942e97dd1755 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
 		journal_try_write(c);
 	} else if (!w->dirty) {
 		w->dirty = true;
-		schedule_delayed_work(&c->journal.work,
-				      msecs_to_jiffies(c->journal_delay_ms));
+		queue_delayed_work(bch_background_wq, &c->journal.work,
+				   msecs_to_jiffies(c->journal_delay_ms));
 		spin_unlock(&c->journal.lock);
 	} else {
 		spin_unlock(&c->journal.lock);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index dc4fe7eeda81..9e1481917ce6 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -49,6 +49,7 @@ static int bcache_major;
 static DEFINE_IDA(bcache_device_idx);
 static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
+struct workqueue_struct *bch_background_wq;
 struct workqueue_struct *bch_journal_wq;
 
 
@@ -2822,6 +2823,8 @@ static void bcache_exit(void)
 		destroy_workqueue(bcache_wq);
 	if (bch_journal_wq)
 		destroy_workqueue(bch_journal_wq);
+	if (bch_background_wq)
+		destroy_workqueue(bch_background_wq);
 
 	if (bcache_major)
 		unregister_blkdev(bcache_major, "bcache");
@@ -2884,6 +2887,10 @@ static int __init bcache_init(void)
 	if (bch_btree_init())
 		goto err;
 
+	bch_background_wq = alloc_workqueue("bch_background", 0, 0);
+	if (!bch_background_wq)
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2



* Re: Fix degraded system performance due to workqueue overload
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
  2021-01-27 13:23 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
  2021-01-27 13:23 ` [PATCH 2/2] bcache: Move journal work to new background wq Kai Krakow
@ 2021-01-27 15:27 ` Coly Li
  2021-01-27 16:39 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Coly Li @ 2021-01-27 15:27 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-bcache

On 1/27/21 9:23 PM, Kai Krakow wrote:
> In the past months (and looking back, even years), I have been seeing
> system performance and latency degrade badly whenever bcache is active.
> 
> Finally, with kernel 5.10, I was able to locate the problem:
> 
> [250336.887598] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 72s!
> [250336.887606] Showing busy workqueues and worker pools:
> [250336.887607] workqueue events: flags=0x0
> [250336.887608]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=3/256 refcnt=4
> [250336.887611]     pending: psi_avgs_work, psi_avgs_work, psi_avgs_work
> [250336.887619]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=15/256 refcnt=16
> [250336.887621]     in-flight: 3760137:psi_avgs_work
> [250336.887624]     pending: psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work
> [250336.887637]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [250336.887639]     pending: psi_avgs_work
> [250336.887643] workqueue events_power_efficient: flags=0x80
> [250336.887644]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [250336.887646]     pending: do_cache_clean
> [250336.887651] workqueue mm_percpu_wq: flags=0x8
> [250336.887651]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
> [250336.887653]     pending: lru_add_drain_per_cpu BAR(60), vmstat_update
> [250336.887666] workqueue bcache: flags=0x8
> [250336.887667]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [250336.887668]     pending: cached_dev_nodata
> [250336.887681] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=72s workers=2 idle: 3760136
> 
> I was able to track that back to the following commit:
> 56b30770b27d54d68ad51eccc6d888282b568cee ("bcache: Kill btree_io_wq")
> 
> Reverting that commit (with some adjustments for later code changes)
> improved my desktop latency a lot, really a lot. The system was
> finally able to handle somewhat higher loads without stalling for
> several seconds and without the load average spiking into the hundreds
> while doing a lot of write IO.
> 
> So I dug a little deeper and found that the assumption behind this old
> commit may no longer hold: bcache simply overwhelms the system_wq
> with too many, or too long-running, workers. The system_wq should only
> be used for workers that can do their work almost instantly, and it
> should not be flooded with lots of workers, which bcache seems to do
> (look at how many kthreads it creates from workers):
> 
> # ps aux | grep 'kworker/.*bc' | wc -l
> 131
> 
> And this is on a mostly idle system; under load it may easily reach
> 700+. Also, with my patches in place, that number seems to be lower
> overall.
> 
> So I added another commit (patch 2) to move the journal work over to a
> dedicated workqueue as well ("bcache: Move journal work to new
> background wq").
> 
> I tested this by overloading my desktop system with the following
> parallel load:
> 
>   * A big download at 1 Gbit/s, resulting in 60+ MB/s write
>   * Active IPFS daemon
>   * Watching a YouTube video
>   * Fully syncing 4 IMAP accounts with MailSpring
>   * Running a Gentoo system update (compiling packages)
>   * Browsing the web
>   * Running a Windows VM (Qemu) with Outlook and defragmentation
>   * Starting and closing several applications and clicking in them
> 
> IO setup: 4x HDD (2+2+4+4 TB) btrfs RAID-0 with 850 GB SSD bcache
> Kernel 5.10.10
> 
> Without the patches, the system would have ground to a halt, probably
> without ever recovering (last time I tried, a clean shutdown took more
> than an hour). With the patches, the system easily survives and feels
> smooth overall, with only a small perceivable lag.
> 
> Boot times are more consistent, too, and faster when bcache is mostly
> cold due to a previous system update.
> 
> Write rates are smoother now, too, and the system can easily sustain a
> constant load of 200-300 MB/s, while previously I would see long stalls
> followed by vastly reduced write performance (down to 5-20 MB/s).
> 
> I'm not sure whether my patches have side effects that I'm not aware
> of, but they work great for me: all write-related desktop stalling is
> gone.
> 


Hi Kai,

Overall I am OK with this series; it makes sense IMHO. Let's wait for
a response from Kent; if there is no comment from him, I will add these
two patches to my v5.12 for-next series.

Thanks for the fix-up.

Coly Li


* Re: [PATCH 2/2] bcache: Move journal work to new background wq
  2021-01-27 13:23 ` [PATCH 2/2] bcache: Move journal work to new background wq Kai Krakow
@ 2021-01-27 16:28   ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-27 16:28 UTC (permalink / raw)
  To: linux-bcache

Please ignore this patch (missing IRP); I will re-post in reply to the intro mail.

Am Mi., 27. Jan. 2021 um 14:24 Uhr schrieb Kai Krakow <kai@kaishome.de>:
>
> This work is potentially long-running and not latency-sensitive; let's
> get it out of the way of other latency-sensitive events.
> ---
>  drivers/md/bcache/bcache.h  | 1 +
>  drivers/md/bcache/journal.c | 4 ++--
>  drivers/md/bcache/super.c   | 7 +++++++
>  3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index b1ed16c7a534..70565ed5732d 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
>
>  extern struct workqueue_struct *bcache_wq;
>  extern struct workqueue_struct *bch_journal_wq;
> +extern struct workqueue_struct *bch_background_wq;
>  extern struct mutex bch_register_lock;
>  extern struct list_head bch_cache_sets;
>
> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
> index aefbdb7e003b..942e97dd1755 100644
> --- a/drivers/md/bcache/journal.c
> +++ b/drivers/md/bcache/journal.c
> @@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
>                 journal_try_write(c);
>         } else if (!w->dirty) {
>                 w->dirty = true;
> -               schedule_delayed_work(&c->journal.work,
> -                                     msecs_to_jiffies(c->journal_delay_ms));
> +               queue_delayed_work(bch_background_wq, &c->journal.work,
> +                                  msecs_to_jiffies(c->journal_delay_ms));
>                 spin_unlock(&c->journal.lock);
>         } else {
>                 spin_unlock(&c->journal.lock);
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index dc4fe7eeda81..9e1481917ce6 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -49,6 +49,7 @@ static int bcache_major;
>  static DEFINE_IDA(bcache_device_idx);
>  static wait_queue_head_t unregister_wait;
>  struct workqueue_struct *bcache_wq;
> +struct workqueue_struct *bch_background_wq;
>  struct workqueue_struct *bch_journal_wq;
>
>
> @@ -2822,6 +2823,8 @@ static void bcache_exit(void)
>                 destroy_workqueue(bcache_wq);
>         if (bch_journal_wq)
>                 destroy_workqueue(bch_journal_wq);
> +       if (bch_background_wq)
> +               destroy_workqueue(bch_background_wq);
>
>         if (bcache_major)
>                 unregister_blkdev(bcache_major, "bcache");
> @@ -2884,6 +2887,10 @@ static int __init bcache_init(void)
>         if (bch_btree_init())
>                 goto err;
>
> +       bch_background_wq = alloc_workqueue("bch_background", 0, 0);
> +       if (!bch_background_wq)
> +               goto err;
> +
>         bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
>         if (!bch_journal_wq)
>                 goto err;
> --
> 2.26.2
>


* Re: [PATCH 1/2] Revert "bcache: Kill btree_io_wq"
  2021-01-27 13:23 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-27 16:28   ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-27 16:28 UTC (permalink / raw)
  To: linux-bcache

Please ignore this patch (missing IRP); I will re-post in reply to the intro mail.

Am Mi., 27. Jan. 2021 um 14:24 Uhr schrieb Kai Krakow <kai@kaishome.de>:
>
> This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.
>
> With the btree using the system_wq, I seem to see a lot more desktop
> latency than I should. So let's revert this.
> ---
>  drivers/md/bcache/bcache.h |  2 ++
>  drivers/md/bcache/btree.c  | 21 +++++++++++++++++++--
>  drivers/md/bcache/super.c  |  4 ++++
>  3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 1d57f48307e6..b1ed16c7a534 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -1042,5 +1042,7 @@ void bch_debug_exit(void);
>  void bch_debug_init(void);
>  void bch_request_exit(void);
>  int bch_request_init(void);
> +void bch_btree_exit(void);
> +int bch_btree_init(void);
>
>  #endif /* _BCACHE_H */
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 910df242c83d..952f022db5a5 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -99,6 +99,8 @@
>  #define PTR_HASH(c, k)                                                 \
>         (((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
>
> +static struct workqueue_struct *btree_io_wq;
> +
>  #define insert_lock(s, b)      ((b)->level <= (s)->lock)
>
>
> @@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
>         btree_complete_write(b, w);
>
>         if (btree_node_dirty(b))
> -               schedule_delayed_work(&b->work, 30 * HZ);
> +               queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
>
>         closure_return_with_destructor(cl, btree_node_write_unlock);
>  }
> @@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
>         BUG_ON(!i->keys);
>
>         if (!btree_node_dirty(b))
> -               schedule_delayed_work(&b->work, 30 * HZ);
> +               queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
>
>         set_btree_node_dirty(b);
>
> @@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
>         spin_lock_init(&buf->lock);
>         array_allocator_init(&buf->freelist);
>  }
> +
> +void bch_btree_exit(void)
> +{
> +       if (btree_io_wq)
> +               destroy_workqueue(btree_io_wq);
> +}
> +
> +int __init bch_btree_init(void)
> +{
> +       btree_io_wq = create_singlethread_workqueue("bch_btree_io");
> +       if (!btree_io_wq)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 2047a9cccdb5..dc4fe7eeda81 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2815,6 +2815,7 @@ static void bcache_exit(void)
>  {
>         bch_debug_exit();
>         bch_request_exit();
> +       bch_btree_exit();
>         if (bcache_kobj)
>                 kobject_put(bcache_kobj);
>         if (bcache_wq)
> @@ -2880,6 +2881,9 @@ static int __init bcache_init(void)
>         if (!bcache_wq)
>                 goto err;
>
> +       if (bch_btree_init())
> +               goto err;
> +
>         bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
>         if (!bch_journal_wq)
>                 goto err;
> --
> 2.26.2
>


* [PATCH 1/2] Revert "bcache: Kill btree_io_wq"
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
                   ` (2 preceding siblings ...)
  2021-01-27 15:27 ` Fix degraded system performance due to workqueue overload Coly Li
@ 2021-01-27 16:39 ` Kai Krakow
  2021-01-27 16:39   ` [PATCH 2/2] bcache: Move journal work to new background wq Kai Krakow
  2021-01-28 10:09 ` Fix degraded system performance due to workqueue overload Kai Krakow
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-27 16:39 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.

With the btree using the system_wq, I seem to see a lot more desktop
latency than I should.

After some more investigation, it looks like the original assumption
of 56b3077 no longer holds, and bcache has a very high potential of
congesting the system_wq. In turn, this introduces laggy desktop
performance, IO stalls (at least with btrfs), and delayed input
events.

So let's revert this.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h |  2 ++
 drivers/md/bcache/btree.c  | 21 +++++++++++++++++++--
 drivers/md/bcache/super.c  |  4 ++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 1d57f48307e66..b1ed16c7a5341 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1042,5 +1042,7 @@ void bch_debug_exit(void);
 void bch_debug_init(void);
 void bch_request_exit(void);
 int bch_request_init(void);
+void bch_btree_exit(void);
+int bch_btree_init(void);
 
 #endif /* _BCACHE_H */
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 910df242c83df..952f022db5a5f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -99,6 +99,8 @@
 #define PTR_HASH(c, k)							\
 	(((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
 
+static struct workqueue_struct *btree_io_wq;
+
 #define insert_lock(s, b)	((b)->level <= (s)->lock)
 
 
@@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
 	btree_complete_write(b, w);
 
 	if (btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	closure_return_with_destructor(cl, btree_node_write_unlock);
 }
@@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 	BUG_ON(!i->keys);
 
 	if (!btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	set_btree_node_dirty(b);
 
@@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
 	spin_lock_init(&buf->lock);
 	array_allocator_init(&buf->freelist);
 }
+
+void bch_btree_exit(void)
+{
+	if (btree_io_wq)
+		destroy_workqueue(btree_io_wq);
+}
+
+int __init bch_btree_init(void)
+{
+	btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+	if (!btree_io_wq)
+		return -ENOMEM;
+
+	return 0;
+}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 2047a9cccdb5d..dc4fe7eeda815 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2815,6 +2815,7 @@ static void bcache_exit(void)
 {
 	bch_debug_exit();
 	bch_request_exit();
+	bch_btree_exit();
 	if (bcache_kobj)
 		kobject_put(bcache_kobj);
 	if (bcache_wq)
@@ -2880,6 +2881,9 @@ static int __init bcache_init(void)
 	if (!bcache_wq)
 		goto err;
 
+	if (bch_btree_init())
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2



* [PATCH 2/2] bcache: Move journal work to new background wq
  2021-01-27 16:39 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-27 16:39   ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-27 16:39 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

This work is potentially long-running and not latency-sensitive; let's
get it out of the way of other latency-sensitive events.

As observed in the previous commit, the system_wq is easily congested
by bcache; this fixes a few more stalls I was observing every once in
a while.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h  | 1 +
 drivers/md/bcache/journal.c | 4 ++--
 drivers/md/bcache/super.c   | 7 +++++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index b1ed16c7a5341..70565ed5732d7 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
 extern struct workqueue_struct *bch_journal_wq;
+extern struct workqueue_struct *bch_background_wq;
 extern struct mutex bch_register_lock;
 extern struct list_head bch_cache_sets;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index aefbdb7e003bc..942e97dd17554 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
 		journal_try_write(c);
 	} else if (!w->dirty) {
 		w->dirty = true;
-		schedule_delayed_work(&c->journal.work,
-				      msecs_to_jiffies(c->journal_delay_ms));
+		queue_delayed_work(bch_background_wq, &c->journal.work,
+				   msecs_to_jiffies(c->journal_delay_ms));
 		spin_unlock(&c->journal.lock);
 	} else {
 		spin_unlock(&c->journal.lock);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index dc4fe7eeda815..9e1481917ce6a 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -49,6 +49,7 @@ static int bcache_major;
 static DEFINE_IDA(bcache_device_idx);
 static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
+struct workqueue_struct *bch_background_wq;
 struct workqueue_struct *bch_journal_wq;
 
 
@@ -2822,6 +2823,8 @@ static void bcache_exit(void)
 		destroy_workqueue(bcache_wq);
 	if (bch_journal_wq)
 		destroy_workqueue(bch_journal_wq);
+	if (bch_background_wq)
+		destroy_workqueue(bch_background_wq);
 
 	if (bcache_major)
 		unregister_blkdev(bcache_major, "bcache");
@@ -2884,6 +2887,10 @@ static int __init bcache_init(void)
 	if (bch_btree_init())
 		goto err;
 
+	bch_background_wq = alloc_workqueue("bch_background", 0, 0);
+	if (!bch_background_wq)
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2



* Re: Fix degraded system performance due to workqueue overload
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
                   ` (3 preceding siblings ...)
  2021-01-27 16:39 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-28 10:09 ` Kai Krakow
  2021-01-28 10:50 ` [PATCH v2 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 10:09 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

Hi Kent!

Any objections to the following patches? Coly wanted to wait for your
confirmation.

Thanks.

Am Mi., 27. Jan. 2021 um 14:24 Uhr schrieb Kai Krakow <kai@kaishome.de>:
>
> In the past months (and looking back, even years), I have been seeing
> system performance and latency degrade badly whenever bcache is active.
>
> Finally, with kernel 5.10, I was able to locate the problem:
>
> [250336.887598] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 72s!
> [250336.887606] Showing busy workqueues and worker pools:
> [250336.887607] workqueue events: flags=0x0
> [250336.887608]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=3/256 refcnt=4
> [250336.887611]     pending: psi_avgs_work, psi_avgs_work, psi_avgs_work
> [250336.887619]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=15/256 refcnt=16
> [250336.887621]     in-flight: 3760137:psi_avgs_work
> [250336.887624]     pending: psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work
> [250336.887637]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [250336.887639]     pending: psi_avgs_work
> [250336.887643] workqueue events_power_efficient: flags=0x80
> [250336.887644]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [250336.887646]     pending: do_cache_clean
> [250336.887651] workqueue mm_percpu_wq: flags=0x8
> [250336.887651]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
> [250336.887653]     pending: lru_add_drain_per_cpu BAR(60), vmstat_update
> [250336.887666] workqueue bcache: flags=0x8
> [250336.887667]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [250336.887668]     pending: cached_dev_nodata
> [250336.887681] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=72s workers=2 idle: 3760136
>
> I was able to track that back to the following commit:
> 56b30770b27d54d68ad51eccc6d888282b568cee ("bcache: Kill btree_io_wq")
>
> Reverting that commit (with some adjustments due to later code changes)
> improved my desktop latency a lot, I mean really a lot. The system was
> finally able to handle somewhat higher loads without stalling for
> several seconds and without spiking load into the hundreds while doing a
> lot of write IO.
>
> So I dug a little deeper and found that the assumption of this old
> commit may no longer be true: bcache simply overwhelms the system_wq
> with too many or too-long-running workers. The system_wq should really
> only be used for workers that can do their work almost instantly, and
> it should not be spammed with lots of workers, which bcache seems to
> do (look at how many kthreads it creates from workers):
>
> # ps aux | grep 'kworker/.*bc' | wc -l
> 131
>
> And this is with a mostly idle system; it may easily reach 700+. Also,
> with my patches in place, that number seems to be lower overall.
>
> So I added another commit (patch 2) to move the journal work over to
> a dedicated workqueue ("bcache: Move journal work to new
> background wq").
>
> I tested this by overloading my desktop system with the following
> parallel load:
>
>   * A big download at 1 Gbit/s, resulting in 60+ MB/s write
>   * Active IPFS daemon
>   * Watching a YouTube video
>   * Fully syncing 4 IMAP accounts with MailSpring
>   * Running a Gentoo system update (compiling packages)
>   * Browsing the web
>   * Running a Windows VM (Qemu) with Outlook and defragmentation
>   * Starting and closing several applications and clicking in them
>
> IO setup: 4x HDD (2+2+4+4 TB) btrfs RAID-0 with 850 GB SSD bcache
> Kernel 5.10.10
>
> Without the patches, the system would have come to a stop, probably not
> recovering from it (last time I tried, a clean shutdown took 1+ hour).
> With the patches, the system easily survives and feels overall smooth
> with only a small perceivable lag.
>
> Boot times are more consistent, too, and faster when bcache is mostly
> cold due to a previous system update.
>
> Write rates of the system are smoother now and can easily sustain a
> constant load of 200-300 MB/s, while previously I would see long stalls
> followed by vastly reduced write performance (down to 5-20 MB/s).
>
> I'm not sure if there are side effects of my patches that I cannot know
> of, but they work great for me: all write-related desktop stalling is
> gone.
>
> --
> Regards,
> Kai
>
>
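
As a quick sanity check of the worker-counting pipeline quoted above, it can be run against canned `ps` output (hedged: the sample process lines below are fabricated for illustration, not taken from a real system):

```shell
# Count worker threads whose names suggest they serve a bcache workqueue.
# The pattern matches e.g. kworker/2:1-bcache or kworker/5:0-bch_btree_io,
# but not unrelated workers such as kworker/0:2-events.
printf '%s\n' \
  'root 101 [kworker/2:1-bcache]' \
  'root 102 [kworker/5:0-bch_btree_io]' \
  'root 103 [kworker/0:2-events]' \
  | grep -c 'kworker/.*bc'
```

On a live system, replace the printf with `ps aux` as in the message above; kworker names are reused across workqueues, so the count is only a rough indicator.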

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/2] Revert "bcache: Kill btree_io_wq"
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
                   ` (4 preceding siblings ...)
  2021-01-28 10:09 ` Fix degraded system performance due to workqueue overload Kai Krakow
@ 2021-01-28 10:50 ` Kai Krakow
  2021-01-28 10:50   ` [PATCH v2 2/2] bcache: Move journal work to new background wq Kai Krakow
  2021-01-28 23:28 ` [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq" Kai Krakow
  2021-01-29 16:40 ` [PATCH v4 " Kai Krakow
  7 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 10:50 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.

With the btree using the system_wq, I seem to see a lot more desktop
latency than I should.

After some more investigation, it looks like the original assumption
of 56b3077 is no longer true, and bcache has a very high potential of
congesting the system_wq. In turn, this introduces laggy desktop
performance, IO stalls (at least with btrfs), and delayed input
events.

So let's revert this.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h |  2 ++
 drivers/md/bcache/btree.c  | 21 +++++++++++++++++++--
 drivers/md/bcache/super.c  |  4 ++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 1d57f48307e66..b1ed16c7a5341 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1042,5 +1042,7 @@ void bch_debug_exit(void);
 void bch_debug_init(void);
 void bch_request_exit(void);
 int bch_request_init(void);
+void bch_btree_exit(void);
+int bch_btree_init(void);
 
 #endif /* _BCACHE_H */
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 910df242c83df..952f022db5a5f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -99,6 +99,8 @@
 #define PTR_HASH(c, k)							\
 	(((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
 
+static struct workqueue_struct *btree_io_wq;
+
 #define insert_lock(s, b)	((b)->level <= (s)->lock)
 
 
@@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
 	btree_complete_write(b, w);
 
 	if (btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	closure_return_with_destructor(cl, btree_node_write_unlock);
 }
@@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 	BUG_ON(!i->keys);
 
 	if (!btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	set_btree_node_dirty(b);
 
@@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
 	spin_lock_init(&buf->lock);
 	array_allocator_init(&buf->freelist);
 }
+
+void bch_btree_exit(void)
+{
+	if (btree_io_wq)
+		destroy_workqueue(btree_io_wq);
+}
+
+int __init bch_btree_init(void)
+{
+	btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+	if (!btree_io_wq)
+		return -ENOMEM;
+
+	return 0;
+}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 2047a9cccdb5d..dc4fe7eeda815 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2815,6 +2815,7 @@ static void bcache_exit(void)
 {
 	bch_debug_exit();
 	bch_request_exit();
+	bch_btree_exit();
 	if (bcache_kobj)
 		kobject_put(bcache_kobj);
 	if (bcache_wq)
@@ -2880,6 +2881,9 @@ static int __init bcache_init(void)
 	if (!bcache_wq)
 		goto err;
 
+	if (bch_btree_init())
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/2] bcache: Move journal work to new background wq
  2021-01-28 10:50 ` [PATCH v2 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-28 10:50   ` Kai Krakow
  2021-01-28 16:37     ` Kai Krakow
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 10:50 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

This work is potentially long-running and not latency-sensitive; let's
get it out of the way of other latency-sensitive events.

As observed in the previous commit, the system_wq easily becomes
congested by bcache, and this fixes a few more stalls I was observing
every once in a while.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h  | 1 +
 drivers/md/bcache/journal.c | 4 ++--
 drivers/md/bcache/super.c   | 7 +++++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index b1ed16c7a5341..70565ed5732d7 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
 extern struct workqueue_struct *bch_journal_wq;
+extern struct workqueue_struct *bch_background_wq;
 extern struct mutex bch_register_lock;
 extern struct list_head bch_cache_sets;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index aefbdb7e003bc..942e97dd17554 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
 		journal_try_write(c);
 	} else if (!w->dirty) {
 		w->dirty = true;
-		schedule_delayed_work(&c->journal.work,
-				      msecs_to_jiffies(c->journal_delay_ms));
+		queue_delayed_work(bch_background_wq, &c->journal.work,
+				   msecs_to_jiffies(c->journal_delay_ms));
 		spin_unlock(&c->journal.lock);
 	} else {
 		spin_unlock(&c->journal.lock);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index dc4fe7eeda815..5206c037c13f3 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -49,6 +49,7 @@ static int bcache_major;
 static DEFINE_IDA(bcache_device_idx);
 static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
+struct workqueue_struct *bch_background_wq;
 struct workqueue_struct *bch_journal_wq;
 
 
@@ -2822,6 +2823,8 @@ static void bcache_exit(void)
 		destroy_workqueue(bcache_wq);
 	if (bch_journal_wq)
 		destroy_workqueue(bch_journal_wq);
+	if (bch_background_wq)
+		destroy_workqueue(bch_background_wq);
 
 	if (bcache_major)
 		unregister_blkdev(bcache_major, "bcache");
@@ -2884,6 +2887,10 @@ static int __init bcache_init(void)
 	if (bch_btree_init())
 		goto err;
 
+	bch_background_wq = alloc_workqueue("bch_background", WQ_MEM_RECLAIM, 0);
+	if (!bch_background_wq)
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] bcache: Move journal work to new background wq
  2021-01-28 10:50   ` [PATCH v2 2/2] bcache: Move journal work to new background wq Kai Krakow
@ 2021-01-28 16:37     ` Kai Krakow
  2021-01-28 16:41       ` Kai Krakow
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 16:37 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache

Hi Coly!

With v2, I added WQ_MEM_RECLAIM to the background wq. But with it, I
consistently get 50% worse boot times (20s -> 30s); when I set the
flags back to 0, boot times improve again. The desktop also reacts to
clicks earlier right after boot.

Is there memory reclaim pressure during boot?

As far as I understand the code, this would then trigger an immediate
journal flush under memory reclaim, because this background wq is only
used to reschedule the journal flush some time in the future (100 ms?)
if there's nothing to write just now:

>         } else if (!w->dirty) {
>                 w->dirty = true;
> -               schedule_delayed_work(&c->journal.work,
> -                                     msecs_to_jiffies(c->journal_delay_ms));
> +               queue_delayed_work(bch_background_wq, &c->journal.work,
> +                                  msecs_to_jiffies(c->journal_delay_ms));
>                 spin_unlock(&c->journal.lock);
>         } else {

Do you have an explanation? Would it be safe to return to the previous
behavior and use no WQ_MEM_RECLAIM for the background wq? Like this:

-       bch_background_wq = alloc_workqueue("bch_background", WQ_MEM_RECLAIM, 0);
+       bch_background_wq = alloc_workqueue("bch_background", 0, 0);

Maybe we should find a better name for this - not bch_background?

Thanks,
Kai

On Thu, Jan 28, 2021 at 11:50 AM Kai Krakow <kai@kaishome.de> wrote:
>
> This work is potentially long-running and not latency-sensitive; let's
> get it out of the way of other latency-sensitive events.
>
> As observed in the previous commit, the system_wq easily becomes
> congested by bcache, and this fixes a few more stalls I was observing
> every once in a while.
>
> Cc: Coly Li <colyli@suse.de>
> Signed-off-by: Kai Krakow <kai@kaishome.de>
> ---
>  drivers/md/bcache/bcache.h  | 1 +
>  drivers/md/bcache/journal.c | 4 ++--
>  drivers/md/bcache/super.c   | 7 +++++++
>  3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index b1ed16c7a5341..70565ed5732d7 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
>
>  extern struct workqueue_struct *bcache_wq;
>  extern struct workqueue_struct *bch_journal_wq;
> +extern struct workqueue_struct *bch_background_wq;
>  extern struct mutex bch_register_lock;
>  extern struct list_head bch_cache_sets;
>
> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
> index aefbdb7e003bc..942e97dd17554 100644
> --- a/drivers/md/bcache/journal.c
> +++ b/drivers/md/bcache/journal.c
> @@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
>                 journal_try_write(c);
>         } else if (!w->dirty) {
>                 w->dirty = true;
> -               schedule_delayed_work(&c->journal.work,
> -                                     msecs_to_jiffies(c->journal_delay_ms));
> +               queue_delayed_work(bch_background_wq, &c->journal.work,
> +                                  msecs_to_jiffies(c->journal_delay_ms));
>                 spin_unlock(&c->journal.lock);
>         } else {
>                 spin_unlock(&c->journal.lock);
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index dc4fe7eeda815..5206c037c13f3 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -49,6 +49,7 @@ static int bcache_major;
>  static DEFINE_IDA(bcache_device_idx);
>  static wait_queue_head_t unregister_wait;
>  struct workqueue_struct *bcache_wq;
> +struct workqueue_struct *bch_background_wq;
>  struct workqueue_struct *bch_journal_wq;
>
>
> @@ -2822,6 +2823,8 @@ static void bcache_exit(void)
>                 destroy_workqueue(bcache_wq);
>         if (bch_journal_wq)
>                 destroy_workqueue(bch_journal_wq);
> +       if (bch_background_wq)
> +               destroy_workqueue(bch_background_wq);
>
>         if (bcache_major)
>                 unregister_blkdev(bcache_major, "bcache");
> @@ -2884,6 +2887,10 @@ static int __init bcache_init(void)
>         if (bch_btree_init())
>                 goto err;
>
> +       bch_background_wq = alloc_workqueue("bch_background", WQ_MEM_RECLAIM, 0);
> +       if (!bch_background_wq)
> +               goto err;
> +
>         bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
>         if (!bch_journal_wq)
>                 goto err;
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] bcache: Move journal work to new background wq
  2021-01-28 16:37     ` Kai Krakow
@ 2021-01-28 16:41       ` Kai Krakow
       [not found]         ` <988ba514-c607-688b-555d-18fbbb069f48@suse.de>
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 16:41 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache

> As far as I understand the code, this would then trigger an immediate
> journal flush under memory reclaim, because this background wq is only
> used to reschedule the journal flush some time in the future (100 ms?)
> if there's nothing to write just now:

Thinking about this again, the code looks like it tries to coalesce
journal flushes into windows of 100ms, and the dirty flag is just used
as an indicator to know whether the flush has already been queued. Is
that true?

> >         } else if (!w->dirty) {
> >                 w->dirty = true;
> > -               schedule_delayed_work(&c->journal.work,
> > -                                     msecs_to_jiffies(c->journal_delay_ms));
> > +               queue_delayed_work(bch_background_wq, &c->journal.work,
> > +                                  msecs_to_jiffies(c->journal_delay_ms));
> >                 spin_unlock(&c->journal.lock);
> >         } else {

This would mean we start performing worse under memory reclaim...

Thanks,
Kai

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq"
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
                   ` (5 preceding siblings ...)
  2021-01-28 10:50 ` [PATCH v2 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-28 23:28 ` Kai Krakow
  2021-01-28 23:28   ` [PATCH v3 2/3] bcache: Give btree_io_wq correct semantics again Kai Krakow
                     ` (2 more replies)
  2021-01-29 16:40 ` [PATCH v4 " Kai Krakow
  7 siblings, 3 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 23:28 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.

With the btree using the `system_wq`, I seem to see a lot more desktop
latency than I should.

After some more investigation, it looks like the original assumption
of 56b3077 is no longer true, and bcache has a very high potential of
congesting the `system_wq`. In turn, this introduces laggy desktop
performance, IO stalls (at least with btrfs), and delayed input
events.

So let's revert this. It's important to note that, to keep the same
assumptions as the previous `system_wq` usage, `btree_io_wq` should be
created before and destroyed after the other bcache wqs.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h |  2 ++
 drivers/md/bcache/btree.c  | 21 +++++++++++++++++++--
 drivers/md/bcache/super.c  |  4 ++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 1d57f48307e66..b1ed16c7a5341 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1042,5 +1042,7 @@ void bch_debug_exit(void);
 void bch_debug_init(void);
 void bch_request_exit(void);
 int bch_request_init(void);
+void bch_btree_exit(void);
+int bch_btree_init(void);
 
 #endif /* _BCACHE_H */
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 910df242c83df..952f022db5a5f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -99,6 +99,8 @@
 #define PTR_HASH(c, k)							\
 	(((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
 
+static struct workqueue_struct *btree_io_wq;
+
 #define insert_lock(s, b)	((b)->level <= (s)->lock)
 
 
@@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
 	btree_complete_write(b, w);
 
 	if (btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	closure_return_with_destructor(cl, btree_node_write_unlock);
 }
@@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 	BUG_ON(!i->keys);
 
 	if (!btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	set_btree_node_dirty(b);
 
@@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
 	spin_lock_init(&buf->lock);
 	array_allocator_init(&buf->freelist);
 }
+
+void bch_btree_exit(void)
+{
+	if (btree_io_wq)
+		destroy_workqueue(btree_io_wq);
+}
+
+int __init bch_btree_init(void)
+{
+	btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+	if (!btree_io_wq)
+		return -ENOMEM;
+
+	return 0;
+}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 2047a9cccdb5d..77c5d8b6d4316 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2821,6 +2821,7 @@ static void bcache_exit(void)
 		destroy_workqueue(bcache_wq);
 	if (bch_journal_wq)
 		destroy_workqueue(bch_journal_wq);
+	bch_btree_exit();
 
 	if (bcache_major)
 		unregister_blkdev(bcache_major, "bcache");
@@ -2876,6 +2877,9 @@ static int __init bcache_init(void)
 		return bcache_major;
 	}
 
+	if (bch_btree_init())
+		goto err;
+
 	bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM, 0);
 	if (!bcache_wq)
 		goto err;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/3] bcache: Give btree_io_wq correct semantics again
  2021-01-28 23:28 ` [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-28 23:28   ` Kai Krakow
  2021-01-28 23:28   ` [PATCH v3 3/3] bcache: Move journal work to new background wq Kai Krakow
       [not found]   ` <4fe07714-e5bf-4be3-6023-74b507ee54be@suse.de>
  2 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 23:28 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

Before killing `btree_io_wq`, the queue was allocated using
`create_singlethread_workqueue()`, which implies `WQ_MEM_RECLAIM`.
After killing it, the work moved to `system_wq`, which lacks this
property but is not single-threaded.

Let's combine both worlds and make the queue multi-threaded but able
to reclaim memory.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/btree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 952f022db5a5f..fe6dce125aba2 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2775,7 +2775,7 @@ void bch_btree_exit(void)
 
 int __init bch_btree_init(void)
 {
-	btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+	btree_io_wq = alloc_workqueue("bch_btree_io", WQ_MEM_RECLAIM, 0);
 	if (!btree_io_wq)
 		return -ENOMEM;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 3/3] bcache: Move journal work to new background wq
  2021-01-28 23:28 ` [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq" Kai Krakow
  2021-01-28 23:28   ` [PATCH v3 2/3] bcache: Give btree_io_wq correct semantics again Kai Krakow
@ 2021-01-28 23:28   ` Kai Krakow
       [not found]     ` <a52b9107-7e84-0fea-6095-84a9576d7cc4@suse.de>
       [not found]   ` <4fe07714-e5bf-4be3-6023-74b507ee54be@suse.de>
  2 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2021-01-28 23:28 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

This work is potentially long-running and not latency-sensitive; let's
get it out of the way of other latency-sensitive events.

As observed in the previous commit, the `system_wq` easily becomes
congested by bcache, and this fixes a few more stalls I was observing
every once in a while.

Let's not give this queue `WQ_MEM_RECLAIM`, as that was shown to reduce
the performance of boot and file system operations in my tests. Also,
without `WQ_MEM_RECLAIM`, I no longer see desktop stalls. This matches
the previous behavior, as `system_wq` also does no memory reclaim:

> // workqueue.c:
> system_wq = alloc_workqueue("events", 0, 0);

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h  | 1 +
 drivers/md/bcache/journal.c | 4 ++--
 drivers/md/bcache/super.c   | 7 +++++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index b1ed16c7a5341..e8bf4f752e8be 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
 extern struct workqueue_struct *bch_journal_wq;
+extern struct workqueue_struct *bch_flush_wq;
 extern struct mutex bch_register_lock;
 extern struct list_head bch_cache_sets;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index aefbdb7e003bc..c6613e8173337 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
 		journal_try_write(c);
 	} else if (!w->dirty) {
 		w->dirty = true;
-		schedule_delayed_work(&c->journal.work,
-				      msecs_to_jiffies(c->journal_delay_ms));
+		queue_delayed_work(bch_flush_wq, &c->journal.work,
+				   msecs_to_jiffies(c->journal_delay_ms));
 		spin_unlock(&c->journal.lock);
 	} else {
 		spin_unlock(&c->journal.lock);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 77c5d8b6d4316..817b36c39b4fc 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -49,6 +49,7 @@ static int bcache_major;
 static DEFINE_IDA(bcache_device_idx);
 static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
+struct workqueue_struct *bch_flush_wq;
 struct workqueue_struct *bch_journal_wq;
 
 
@@ -2821,6 +2822,8 @@ static void bcache_exit(void)
 		destroy_workqueue(bcache_wq);
 	if (bch_journal_wq)
 		destroy_workqueue(bch_journal_wq);
+	if (bch_flush_wq)
+		destroy_workqueue(bch_flush_wq);
 	bch_btree_exit();
 
 	if (bcache_major)
@@ -2884,6 +2887,10 @@ static int __init bcache_init(void)
 	if (!bcache_wq)
 		goto err;
 
+	bch_flush_wq = alloc_workqueue("bch_flush", 0, 0);
+	if (!bch_flush_wq)
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] bcache: Move journal work to new background wq
       [not found]         ` <988ba514-c607-688b-555d-18fbbb069f48@suse.de>
@ 2021-01-29 16:36           ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-29 16:36 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache

On Fri, Jan 29, 2021 at 5:01 PM Coly Li <colyli@suse.de> wrote:
>
> On 1/29/21 12:41 AM, Kai Krakow wrote:
> >> As far as I understand the code, this would trigger an immediate
> >> journal flush then under memory reclaim because this background wq is
> >> only used to reschedule journal flush some time in the future (100ms?)

> For a typical 1000HZ jiffies, 100ms is extended 1 jiffy by
> msecs_to_jiffies().

Ah, you mean in the sense of lagging at least 1 jiffy behind because
work is dispatched asynchronously?

BTW: I'm using a 300 Hz kernel for my desktop; that's usually good
enough and maybe even a better choice for 60 Hz applications, as 300
divides evenly by typical refresh rates (25, 30, 50, 60). But this is
useful information for the xpadneo driver I'm developing. Thanks.

> >>>         } else if (!w->dirty) {
> >>>                 w->dirty = true;
> >>> -               schedule_delayed_work(&c->journal.work,
> >>> -                                     msecs_to_jiffies(c->journal_delay_ms));
> >>> +               queue_delayed_work(bch_background_wq, &c->journal.work,
> >>> +                                  msecs_to_jiffies(c->journal_delay_ms));
> >>>                 spin_unlock(&c->journal.lock);
> >>>         } else {
> >
> > This would mean we start performing worse under memory reclaim...
>
> A journal write buffer is 8 pages; with a 4 KB kernel page size, that
> won't be a large footprint.

As far as I can see, the called routine would only spawn the actual
writes in a closure anyway. So if this were used for memory reclaim,
the effect would lag behind anyway.

Still, I'm seeing a huge difference if this queue gets allocated with
`WQ_MEM_RECLAIM`. It works fine for most filesystems, but for btrfs
there are probably at least twice that many outstanding requests.

But I don't think we need to discuss whether it should run under
memory reclaim, when the original implementation using `system_wq`
didn't do that in the first place. I was just curious and wanted to
understand the context better.

I think it's important to design carefully so we don't get vastly
different behavior on a 100 Hz versus a 1000 Hz kernel. For example,
my server builds usually run a 100 Hz kernel.

Thanks,
Kai

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/3] bcache: Move journal work to new background wq
       [not found]     ` <a52b9107-7e84-0fea-6095-84a9576d7cc4@suse.de>
@ 2021-01-29 16:37       ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-29 16:37 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache

On Fri, Jan 29, 2021 at 5:06 PM Coly Li <colyli@suse.de> wrote:
>
> On 1/29/21 7:28 AM, Kai Krakow wrote:
> > This work is potentially long-running and not latency-sensitive;
> > let's get it out of the way of other latency-sensitive events.
> >
> > As observed in the previous commit, the `system_wq` easily becomes
> > congested by bcache, and this fixes a few more stalls I was observing
> > every once in a while.
> >
> > Let's not give this queue `WQ_MEM_RECLAIM`, as that was shown to
> > reduce the performance of boot and file system operations in my
> > tests. Also, without `WQ_MEM_RECLAIM`, I no longer see desktop
> > stalls. This matches the previous behavior, as `system_wq` also does
> > no memory reclaim:
> >
> >> // workqueue.c:
> >> system_wq = alloc_workqueue("events", 0, 0);
> >
>
> Your text is convincing. I am OK with this patch. One more request:
> could you please add a comment at alloc_workqueue() to explain why the
> flags are set to 0 with no WQ_MEM_RECLAIM? It should be helpful to
> others who don't read our discussion.

I'll post a v4.

> > Cc: Coly Li <colyli@suse.de>
> > Signed-off-by: Kai Krakow <kai@kaishome.de>
> > ---
> >  drivers/md/bcache/bcache.h  | 1 +
> >  drivers/md/bcache/journal.c | 4 ++--
> >  drivers/md/bcache/super.c   | 7 +++++++
> >  3 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> > index b1ed16c7a5341..e8bf4f752e8be 100644
> > --- a/drivers/md/bcache/bcache.h
> > +++ b/drivers/md/bcache/bcache.h
> > @@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
> >
> >  extern struct workqueue_struct *bcache_wq;
> >  extern struct workqueue_struct *bch_journal_wq;
> > +extern struct workqueue_struct *bch_flush_wq;
> >  extern struct mutex bch_register_lock;
> >  extern struct list_head bch_cache_sets;
> >
> > diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
> > index aefbdb7e003bc..c6613e8173337 100644
> > --- a/drivers/md/bcache/journal.c
> > +++ b/drivers/md/bcache/journal.c
> > @@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
> >               journal_try_write(c);
> >       } else if (!w->dirty) {
> >               w->dirty = true;
> > -             schedule_delayed_work(&c->journal.work,
> > -                                   msecs_to_jiffies(c->journal_delay_ms));
> > +             queue_delayed_work(bch_flush_wq, &c->journal.work,
> > +                                msecs_to_jiffies(c->journal_delay_ms));
> >               spin_unlock(&c->journal.lock);
> >       } else {
> >               spin_unlock(&c->journal.lock);
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index 77c5d8b6d4316..817b36c39b4fc 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -49,6 +49,7 @@ static int bcache_major;
> >  static DEFINE_IDA(bcache_device_idx);
> >  static wait_queue_head_t unregister_wait;
> >  struct workqueue_struct *bcache_wq;
> > +struct workqueue_struct *bch_flush_wq;
> >  struct workqueue_struct *bch_journal_wq;
> >
> >
> > @@ -2821,6 +2822,8 @@ static void bcache_exit(void)
> >               destroy_workqueue(bcache_wq);
> >       if (bch_journal_wq)
> >               destroy_workqueue(bch_journal_wq);
> > +     if (bch_flush_wq)
> > +             destroy_workqueue(bch_flush_wq);
> >       bch_btree_exit();
> >
> >       if (bcache_major)
> > @@ -2884,6 +2887,10 @@ static int __init bcache_init(void)
> >       if (!bcache_wq)
> >               goto err;
> >
> > +     bch_flush_wq = alloc_workqueue("bch_flush", 0, 0);
> > +     if (!bch_flush_wq)
> > +             goto err;
> > +
> >       bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
> >       if (!bch_journal_wq)
> >               goto err;
> >
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 1/3] Revert "bcache: Kill btree_io_wq"
  2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
                   ` (6 preceding siblings ...)
  2021-01-28 23:28 ` [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq" Kai Krakow
@ 2021-01-29 16:40 ` Kai Krakow
  2021-01-29 16:40   ` [PATCH v4 2/3] bcache: Give btree_io_wq correct semantics again Kai Krakow
  2021-01-29 16:40   ` [PATCH v4 3/3] bcache: Move journal work to new flush wq Kai Krakow
  7 siblings, 2 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-29 16:40 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.

With the btree using the `system_wq`, I see a lot more desktop
latency than I should.

After some more investigation, it looks like the original assumption
of 56b3077 no longer holds: bcache can easily congest the
`system_wq`. In turn, this introduces laggy desktop performance,
IO stalls (at least with btrfs), and delayed input events.

So let's revert this. Note that the previous use of `system_wq`
implies that `btree_io_wq` should be created before and destroyed
after the other bcache workqueues, to keep the same ordering
assumptions.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h |  2 ++
 drivers/md/bcache/btree.c  | 21 +++++++++++++++++++--
 drivers/md/bcache/super.c  |  4 ++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 1d57f48307e66..b1ed16c7a5341 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1042,5 +1042,7 @@ void bch_debug_exit(void);
 void bch_debug_init(void);
 void bch_request_exit(void);
 int bch_request_init(void);
+void bch_btree_exit(void);
+int bch_btree_init(void);
 
 #endif /* _BCACHE_H */
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 910df242c83df..952f022db5a5f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -99,6 +99,8 @@
 #define PTR_HASH(c, k)							\
 	(((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
 
+static struct workqueue_struct *btree_io_wq;
+
 #define insert_lock(s, b)	((b)->level <= (s)->lock)
 
 
@@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
 	btree_complete_write(b, w);
 
 	if (btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	closure_return_with_destructor(cl, btree_node_write_unlock);
 }
@@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 	BUG_ON(!i->keys);
 
 	if (!btree_node_dirty(b))
-		schedule_delayed_work(&b->work, 30 * HZ);
+		queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
 
 	set_btree_node_dirty(b);
 
@@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
 	spin_lock_init(&buf->lock);
 	array_allocator_init(&buf->freelist);
 }
+
+void bch_btree_exit(void)
+{
+	if (btree_io_wq)
+		destroy_workqueue(btree_io_wq);
+}
+
+int __init bch_btree_init(void)
+{
+	btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+	if (!btree_io_wq)
+		return -ENOMEM;
+
+	return 0;
+}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 2047a9cccdb5d..77c5d8b6d4316 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2821,6 +2821,7 @@ static void bcache_exit(void)
 		destroy_workqueue(bcache_wq);
 	if (bch_journal_wq)
 		destroy_workqueue(bch_journal_wq);
+	bch_btree_exit();
 
 	if (bcache_major)
 		unregister_blkdev(bcache_major, "bcache");
@@ -2876,6 +2877,9 @@ static int __init bcache_init(void)
 		return bcache_major;
 	}
 
+	if (bch_btree_init())
+		goto err;
+
 	bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM, 0);
 	if (!bcache_wq)
 		goto err;
-- 
2.26.2



* [PATCH v4 2/3] bcache: Give btree_io_wq correct semantics again
  2021-01-29 16:40 ` [PATCH v4 " Kai Krakow
@ 2021-01-29 16:40   ` Kai Krakow
  2021-01-29 16:40   ` [PATCH v4 3/3] bcache: Move journal work to new flush wq Kai Krakow
  1 sibling, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-29 16:40 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

Before killing `btree_io_wq`, the queue was allocated using
`create_singlethread_workqueue()`, which implies `WQ_MEM_RECLAIM`.
After killing it, the work moved to `system_wq`, which does no
memory reclaim but is multi-threaded.

Let's combine both worlds and make the queue multi-threaded but
still able to reclaim memory.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/btree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 952f022db5a5f..fe6dce125aba2 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2775,7 +2775,7 @@ void bch_btree_exit(void)
 
 int __init bch_btree_init(void)
 {
-	btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+	btree_io_wq = alloc_workqueue("bch_btree_io", WQ_MEM_RECLAIM, 0);
 	if (!btree_io_wq)
 		return -ENOMEM;
 
-- 
2.26.2


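For context, the two allocation flavors discussed above differ roughly as
follows; this is a from-memory sketch of the workqueue API, not a verbatim
kernel excerpt:

```c
/* Legacy helper: an ordered (single in-flight item) queue that also
 * carries WQ_MEM_RECLAIM, i.e. a rescuer thread guaranteeing forward
 * progress under memory pressure: */
struct workqueue_struct *wq1 = create_singlethread_workqueue("bch_btree_io");

/* The replacement keeps the reclaim guarantee but drops the ordering,
 * so btree writes may run concurrently: */
struct workqueue_struct *wq2 = alloc_workqueue("bch_btree_io", WQ_MEM_RECLAIM, 0);
```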

* [PATCH v4 3/3] bcache: Move journal work to new flush wq
  2021-01-29 16:40 ` [PATCH v4 " Kai Krakow
  2021-01-29 16:40   ` [PATCH v4 2/3] bcache: Give btree_io_wq correct semantics again Kai Krakow
@ 2021-01-29 16:40   ` Kai Krakow
  1 sibling, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-29 16:40 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kai Krakow, Coly Li

Journal flushing is potentially long-running and not latency
sensitive; let's get it out of the way of other, latency-sensitive
events.

As observed in the previous commit, the `system_wq` is easily
congested by bcache, and this fixes a few more stalls I was
observing every once in a while.

Let's not make this `WQ_MEM_RECLAIM`, as it reduced the performance
of boot and file-system operations in my tests. Also, without
`WQ_MEM_RECLAIM`, I no longer see desktop stalls. This matches the
previous behavior, as `system_wq` also does no memory reclaim:

> // workqueue.c:
> system_wq = alloc_workqueue("events", 0, 0);

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 drivers/md/bcache/bcache.h  |  1 +
 drivers/md/bcache/journal.c |  4 ++--
 drivers/md/bcache/super.c   | 16 ++++++++++++++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index b1ed16c7a5341..e8bf4f752e8be 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1001,6 +1001,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
 extern struct workqueue_struct *bch_journal_wq;
+extern struct workqueue_struct *bch_flush_wq;
 extern struct mutex bch_register_lock;
 extern struct list_head bch_cache_sets;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index aefbdb7e003bc..c6613e8173337 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
 		journal_try_write(c);
 	} else if (!w->dirty) {
 		w->dirty = true;
-		schedule_delayed_work(&c->journal.work,
-				      msecs_to_jiffies(c->journal_delay_ms));
+		queue_delayed_work(bch_flush_wq, &c->journal.work,
+				   msecs_to_jiffies(c->journal_delay_ms));
 		spin_unlock(&c->journal.lock);
 	} else {
 		spin_unlock(&c->journal.lock);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 77c5d8b6d4316..7457ec160c9a1 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -49,6 +49,7 @@ static int bcache_major;
 static DEFINE_IDA(bcache_device_idx);
 static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
+struct workqueue_struct *bch_flush_wq;
 struct workqueue_struct *bch_journal_wq;
 
 
@@ -2821,6 +2822,8 @@ static void bcache_exit(void)
 		destroy_workqueue(bcache_wq);
 	if (bch_journal_wq)
 		destroy_workqueue(bch_journal_wq);
+	if (bch_flush_wq)
+		destroy_workqueue(bch_flush_wq);
 	bch_btree_exit();
 
 	if (bcache_major)
@@ -2884,6 +2887,19 @@ static int __init bcache_init(void)
 	if (!bcache_wq)
 		goto err;
 
+	/*
+	 * Let's not make this `WQ_MEM_RECLAIM` for the following reasons:
+	 *
+	 * 1. It used `system_wq` before which also does no memory reclaim.
+	 * 2. With `WQ_MEM_RECLAIM` desktop stalls, increased boot times, and
+	 *    reduced throughput can be observed.
+	 *
+	 * We still want to use our own queue to avoid congesting the `system_wq`.
+	 */
+	bch_flush_wq = alloc_workqueue("bch_flush", 0, 0);
+	if (!bch_flush_wq)
+		goto err;
+
 	bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
 	if (!bch_journal_wq)
 		goto err;
-- 
2.26.2


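For reference, `schedule_delayed_work()` is merely a shorthand for
queueing on the shared `system_wq`, roughly like this (a from-memory
sketch, not a verbatim kernel excerpt):

```c
static inline bool schedule_delayed_work(struct delayed_work *dwork,
					 unsigned long delay)
{
	return queue_delayed_work(system_wq, dwork, delay);
}
```

So every `schedule_delayed_work()` caller in the system competes for the
same pool, which is why moving bcache's journal work onto its own
`bch_flush_wq` relieves the congestion.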

* Re: [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq"
       [not found]   ` <4fe07714-e5bf-4be3-6023-74b507ee54be@suse.de>
@ 2021-01-29 16:59     ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2021-01-29 16:59 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache

On Fri, Jan 29, 2021 at 4:30 PM Coly Li <colyli@suse.de> wrote:
>
> On 1/29/21 7:28 AM, Kai Krakow wrote:
> > This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.
> >
> > With the btree using the `system_wq`, I see a lot more desktop
> > latency than I should.
> >
> > After some more investigation, it looks like the original assumption
> > of 56b3077 no longer holds: bcache can easily congest the
> > `system_wq`. In turn, this introduces laggy desktop performance,
> > IO stalls (at least with btrfs), and delayed input events.
> >
> > So let's revert this. Note that the previous use of `system_wq`
> > implies that `btree_io_wq` should be created before and destroyed
> > after the other bcache workqueues, to keep the same ordering
> > assumptions.
> >
> > Cc: Coly Li <colyli@suse.de>
> > Signed-off-by: Kai Krakow <kai@kaishome.de>
>
> The patch is OK to me in general. I just feel it might be
> unnecessary to use an ordered workqueue. The out-of-order
> `system_wq` has been used for many years and works well with the
> bcache journal.

This is why, in v3 and later, I migrated this to an unordered
queue. I just wanted to keep the revert as-is and let the follow-up
patch fix this. Is that okay, or should I squash both commits?

Thanks,
Kai

> > ---
> >  drivers/md/bcache/bcache.h |  2 ++
> >  drivers/md/bcache/btree.c  | 21 +++++++++++++++++++--
> >  drivers/md/bcache/super.c  |  4 ++++
> >  3 files changed, 25 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> > index 1d57f48307e66..b1ed16c7a5341 100644
> > --- a/drivers/md/bcache/bcache.h
> > +++ b/drivers/md/bcache/bcache.h
> > @@ -1042,5 +1042,7 @@ void bch_debug_exit(void);
> >  void bch_debug_init(void);
> >  void bch_request_exit(void);
> >  int bch_request_init(void);
> > +void bch_btree_exit(void);
> > +int bch_btree_init(void);
> >
> >  #endif /* _BCACHE_H */
> > diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> > index 910df242c83df..952f022db5a5f 100644
> > --- a/drivers/md/bcache/btree.c
> > +++ b/drivers/md/bcache/btree.c
> > @@ -99,6 +99,8 @@
> >  #define PTR_HASH(c, k)                                                       \
> >       (((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
> >
> > +static struct workqueue_struct *btree_io_wq;
> > +
> >  #define insert_lock(s, b)    ((b)->level <= (s)->lock)
> >
> >
> > @@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
> >       btree_complete_write(b, w);
> >
> >       if (btree_node_dirty(b))
> > -             schedule_delayed_work(&b->work, 30 * HZ);
> > +             queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
> >
> >       closure_return_with_destructor(cl, btree_node_write_unlock);
> >  }
> > @@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
> >       BUG_ON(!i->keys);
> >
> >       if (!btree_node_dirty(b))
> > -             schedule_delayed_work(&b->work, 30 * HZ);
> > +             queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
> >
> >       set_btree_node_dirty(b);
> >
> > @@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
> >       spin_lock_init(&buf->lock);
> >       array_allocator_init(&buf->freelist);
> >  }
> > +
> > +void bch_btree_exit(void)
> > +{
> > +     if (btree_io_wq)
> > +             destroy_workqueue(btree_io_wq);
> > +}
> > +
> > +int __init bch_btree_init(void)
> > +{
> > +     btree_io_wq = create_singlethread_workqueue("bch_btree_io");
> > +     if (!btree_io_wq)
> > +             return -ENOMEM;
> > +
> > +     return 0;
> > +}
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index 2047a9cccdb5d..77c5d8b6d4316 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -2821,6 +2821,7 @@ static void bcache_exit(void)
> >               destroy_workqueue(bcache_wq);
> >       if (bch_journal_wq)
> >               destroy_workqueue(bch_journal_wq);
> > +     bch_btree_exit();
> >
> >       if (bcache_major)
> >               unregister_blkdev(bcache_major, "bcache");
> > @@ -2876,6 +2877,9 @@ static int __init bcache_init(void)
> >               return bcache_major;
> >       }
> >
> > +     if (bch_btree_init())
> > +             goto err;
> > +
> >       bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM, 0);
> >       if (!bcache_wq)
> >               goto err;
> >
>



Thread overview: 22+ messages
2021-01-27 13:23 Fix degraded system performance due to workqueue overload Kai Krakow
2021-01-27 13:23 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
2021-01-27 16:28   ` Kai Krakow
2021-01-27 13:23 ` [PATCH 2/2] bcache: Move journal work to new background wq Kai Krakow
2021-01-27 16:28   ` Kai Krakow
2021-01-27 15:27 ` Fix degraded system performance due to workqueue overload Coly Li
2021-01-27 16:39 ` [PATCH 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
2021-01-27 16:39   ` [PATCH 2/2] bcache: Move journal work to new background wq Kai Krakow
2021-01-28 10:09 ` Fix degraded system performance due to workqueue overload Kai Krakow
2021-01-28 10:50 ` [PATCH v2 1/2] Revert "bcache: Kill btree_io_wq" Kai Krakow
2021-01-28 10:50   ` [PATCH v2 2/2] bcache: Move journal work to new background wq Kai Krakow
2021-01-28 16:37     ` Kai Krakow
2021-01-28 16:41       ` Kai Krakow
     [not found]         ` <988ba514-c607-688b-555d-18fbbb069f48@suse.de>
2021-01-29 16:36           ` Kai Krakow
2021-01-28 23:28 ` [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq" Kai Krakow
2021-01-28 23:28   ` [PATCH v3 2/3] bcache: Give btree_io_wq correct semantics again Kai Krakow
2021-01-28 23:28   ` [PATCH v3 3/3] bcache: Move journal work to new background wq Kai Krakow
     [not found]     ` <a52b9107-7e84-0fea-6095-84a9576d7cc4@suse.de>
2021-01-29 16:37       ` Kai Krakow
     [not found]   ` <4fe07714-e5bf-4be3-6023-74b507ee54be@suse.de>
2021-01-29 16:59     ` [PATCH v3 1/3] Revert "bcache: Kill btree_io_wq" Kai Krakow
2021-01-29 16:40 ` [PATCH v4 " Kai Krakow
2021-01-29 16:40   ` [PATCH v4 2/3] bcache: Give btree_io_wq correct semantics again Kai Krakow
2021-01-29 16:40   ` [PATCH v4 3/3] bcache: Move journal work to new flush wq Kai Krakow
