linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9] block: cancel all throttled bios in del_gendisk()
@ 2022-02-10 11:56 Yu Kuai
  2022-02-25  7:18 ` yukuai (C)
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Yu Kuai @ 2022-02-10 11:56 UTC (permalink / raw)
  To: ming.lei, tj, axboe; +Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

Throttled bios can't be issued after del_gendisk() is done, thus
it's better to cancel them immediately rather than waiting for
throttle is done.

For example, if user thread is throttled with low bps while it's
issuing large io, and the device is deleted. The user thread will
wait for a long time for io to return.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
Changes in v9:
 - some minor changes as suggested by Ming.
Changes in v8:
 - fold two patches into one
Changes in v7:
 - use the new solution as suggested by Ming.

 block/blk-throttle.c | 44 +++++++++++++++++++++++++++++++++++++++++---
 block/blk-throttle.h |  2 ++
 block/genhd.c        |  2 ++
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7c462c006b26..ca92e5fa2769 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -43,8 +43,12 @@
 static struct workqueue_struct *kthrotld_workqueue;
 
 enum tg_state_flags {
-	THROTL_TG_PENDING	= 1 << 0,	/* on parent's pending tree */
-	THROTL_TG_WAS_EMPTY	= 1 << 1,	/* bio_lists[] became non-empty */
+	/* on parent's pending tree */
+	THROTL_TG_PENDING	= 1 << 0,
+	/* bio_lists[] became non-empty */
+	THROTL_TG_WAS_EMPTY	= 1 << 1,
+	/* starts to cancel all bios, will be set if the disk is deleted */
+	THROTL_TG_CANCELING	= 1 << 2,
 };
 
 #define rb_entry_tg(node)	rb_entry((node), struct throtl_grp, rb_node)
@@ -871,7 +875,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	       bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
 
 	/* If tg->bps = -1, then BW is unlimited */
-	if (bps_limit == U64_MAX && iops_limit == UINT_MAX) {
+	if ((bps_limit == U64_MAX && iops_limit == UINT_MAX) ||
+	    tg->flags & THROTL_TG_CANCELING) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -1763,6 +1768,39 @@ static bool throtl_hierarchy_can_upgrade(struct throtl_grp *tg)
 	return false;
 }
 
+void blk_throtl_cancel_bios(struct request_queue *q)
+{
+	struct cgroup_subsys_state *pos_css;
+	struct blkcg_gq *blkg;
+
+	spin_lock_irq(&q->queue_lock);
+	/*
+	 * queue_lock is held, rcu lock is not needed here technically.
+	 * However, rcu lock is still held to emphasize that following
+	 * path need RCU protection and to prevent warning from lockdep.
+	 */
+	rcu_read_lock();
+	blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
+		struct throtl_grp *tg = blkg_to_tg(blkg);
+		struct throtl_service_queue *sq = &tg->service_queue;
+
+		/*
+		 * Set the flag to make sure throtl_pending_timer_fn() won't
+		 * stop until all throttled bios are dispatched.
+		 */
+		blkg_to_tg(blkg)->flags |= THROTL_TG_CANCELING;
+		/*
+		 * Update disptime after setting the above flag to make sure
+		 * throtl_select_dispatch() won't exit without dispatching.
+		 */
+		tg_update_disptime(tg);
+
+		throtl_schedule_pending_timer(sq, jiffies + 1);
+	}
+	rcu_read_unlock();
+	spin_unlock_irq(&q->queue_lock);
+}
+
 static bool throtl_can_upgrade(struct throtl_data *td,
 	struct throtl_grp *this_tg)
 {
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 175f03abd9e4..2ae467ac17ea 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -160,12 +160,14 @@ static inline void blk_throtl_exit(struct request_queue *q) { }
 static inline void blk_throtl_register_queue(struct request_queue *q) { }
 static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
 static inline bool blk_throtl_bio(struct bio *bio) { return false; }
+static inline void blk_throtl_cancel_bios(struct request_queue *q) { }
 #else /* CONFIG_BLK_DEV_THROTTLING */
 int blk_throtl_init(struct request_queue *q);
 void blk_throtl_exit(struct request_queue *q);
 void blk_throtl_register_queue(struct request_queue *q);
 void blk_throtl_charge_bio_split(struct bio *bio);
 bool __blk_throtl_bio(struct bio *bio);
+void blk_throtl_cancel_bios(struct request_queue *q);
 static inline bool blk_throtl_bio(struct bio *bio)
 {
 	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);
diff --git a/block/genhd.c b/block/genhd.c
index 9589d1d59afa..6acc98cd0365 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -29,6 +29,7 @@
 #include "blk.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
+#include "blk-throttle.h"
 
 static struct kobject *block_depr;
 
@@ -625,6 +626,7 @@ void del_gendisk(struct gendisk *disk)
 
 	blk_mq_freeze_queue_wait(q);
 
+	blk_throtl_cancel_bios(disk->queue);
 	rq_qos_exit(q);
 	blk_sync_queue(q);
 	blk_flush_integrity();
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-10 11:56 [PATCH v9] block: cancel all throttled bios in del_gendisk() Yu Kuai
@ 2022-02-25  7:18 ` yukuai (C)
  2022-02-25  8:11 ` Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: yukuai (C) @ 2022-02-25  7:18 UTC (permalink / raw)
  To: ming.lei, tj, axboe; +Cc: cgroups, linux-block, linux-kernel, yi.zhang

friendly ping ...

在 2022/02/10 19:56, Yu Kuai 写道:
> Throttled bios can't be issued after del_gendisk() is done, thus
> it's better to cancel them immediately rather than waiting for
> throttle is done.
> 
> For example, if user thread is throttled with low bps while it's
> issuing large io, and the device is deleted. The user thread will
> wait for a long time for io to return.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> Changes in v9:
>   - some minor changes as suggested by Ming.
> Changes in v8:
>   - fold two patches into one
> Changes in v7:
>   - use the new solution as suggested by Ming.
> 
>   block/blk-throttle.c | 44 +++++++++++++++++++++++++++++++++++++++++---
>   block/blk-throttle.h |  2 ++
>   block/genhd.c        |  2 ++
>   3 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 7c462c006b26..ca92e5fa2769 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -43,8 +43,12 @@
>   static struct workqueue_struct *kthrotld_workqueue;
>   
>   enum tg_state_flags {
> -	THROTL_TG_PENDING	= 1 << 0,	/* on parent's pending tree */
> -	THROTL_TG_WAS_EMPTY	= 1 << 1,	/* bio_lists[] became non-empty */
> +	/* on parent's pending tree */
> +	THROTL_TG_PENDING	= 1 << 0,
> +	/* bio_lists[] became non-empty */
> +	THROTL_TG_WAS_EMPTY	= 1 << 1,
> +	/* starts to cancel all bios, will be set if the disk is deleted */
> +	THROTL_TG_CANCELING	= 1 << 2,
>   };
>   
>   #define rb_entry_tg(node)	rb_entry((node), struct throtl_grp, rb_node)
> @@ -871,7 +875,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
>   	       bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
>   
>   	/* If tg->bps = -1, then BW is unlimited */
> -	if (bps_limit == U64_MAX && iops_limit == UINT_MAX) {
> +	if ((bps_limit == U64_MAX && iops_limit == UINT_MAX) ||
> +	    tg->flags & THROTL_TG_CANCELING) {
>   		if (wait)
>   			*wait = 0;
>   		return true;
> @@ -1763,6 +1768,39 @@ static bool throtl_hierarchy_can_upgrade(struct throtl_grp *tg)
>   	return false;
>   }
>   
> +void blk_throtl_cancel_bios(struct request_queue *q)
> +{
> +	struct cgroup_subsys_state *pos_css;
> +	struct blkcg_gq *blkg;
> +
> +	spin_lock_irq(&q->queue_lock);
> +	/*
> +	 * queue_lock is held, rcu lock is not needed here technically.
> +	 * However, rcu lock is still held to emphasize that following
> +	 * path need RCU protection and to prevent warning from lockdep.
> +	 */
> +	rcu_read_lock();
> +	blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
> +		struct throtl_grp *tg = blkg_to_tg(blkg);
> +		struct throtl_service_queue *sq = &tg->service_queue;
> +
> +		/*
> +		 * Set the flag to make sure throtl_pending_timer_fn() won't
> +		 * stop until all throttled bios are dispatched.
> +		 */
> +		blkg_to_tg(blkg)->flags |= THROTL_TG_CANCELING;
> +		/*
> +		 * Update disptime after setting the above flag to make sure
> +		 * throtl_select_dispatch() won't exit without dispatching.
> +		 */
> +		tg_update_disptime(tg);
> +
> +		throtl_schedule_pending_timer(sq, jiffies + 1);
> +	}
> +	rcu_read_unlock();
> +	spin_unlock_irq(&q->queue_lock);
> +}
> +
>   static bool throtl_can_upgrade(struct throtl_data *td,
>   	struct throtl_grp *this_tg)
>   {
> diff --git a/block/blk-throttle.h b/block/blk-throttle.h
> index 175f03abd9e4..2ae467ac17ea 100644
> --- a/block/blk-throttle.h
> +++ b/block/blk-throttle.h
> @@ -160,12 +160,14 @@ static inline void blk_throtl_exit(struct request_queue *q) { }
>   static inline void blk_throtl_register_queue(struct request_queue *q) { }
>   static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
>   static inline bool blk_throtl_bio(struct bio *bio) { return false; }
> +static inline void blk_throtl_cancel_bios(struct request_queue *q) { }
>   #else /* CONFIG_BLK_DEV_THROTTLING */
>   int blk_throtl_init(struct request_queue *q);
>   void blk_throtl_exit(struct request_queue *q);
>   void blk_throtl_register_queue(struct request_queue *q);
>   void blk_throtl_charge_bio_split(struct bio *bio);
>   bool __blk_throtl_bio(struct bio *bio);
> +void blk_throtl_cancel_bios(struct request_queue *q);
>   static inline bool blk_throtl_bio(struct bio *bio)
>   {
>   	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);
> diff --git a/block/genhd.c b/block/genhd.c
> index 9589d1d59afa..6acc98cd0365 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -29,6 +29,7 @@
>   #include "blk.h"
>   #include "blk-mq-sched.h"
>   #include "blk-rq-qos.h"
> +#include "blk-throttle.h"
>   
>   static struct kobject *block_depr;
>   
> @@ -625,6 +626,7 @@ void del_gendisk(struct gendisk *disk)
>   
>   	blk_mq_freeze_queue_wait(q);
>   
> +	blk_throtl_cancel_bios(disk->queue);
>   	rq_qos_exit(q);
>   	blk_sync_queue(q);
>   	blk_flush_integrity();
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-10 11:56 [PATCH v9] block: cancel all throttled bios in del_gendisk() Yu Kuai
  2022-02-25  7:18 ` yukuai (C)
@ 2022-02-25  8:11 ` Ming Lei
  2022-02-25 14:27 ` Jens Axboe
  2022-02-27 17:16 ` Christoph Hellwig
  3 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2022-02-25  8:11 UTC (permalink / raw)
  To: Yu Kuai; +Cc: tj, axboe, cgroups, linux-block, linux-kernel, yi.zhang

On Thu, Feb 10, 2022 at 07:56:37PM +0800, Yu Kuai wrote:
> Throttled bios can't be issued after del_gendisk() is done, thus
> it's better to cancel them immediately rather than waiting for
> throttle is done.
> 
> For example, if user thread is throttled with low bps while it's
> issuing large io, and the device is deleted. The user thread will
> wait for a long time for io to return.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> Changes in v9:
>  - some minor changes as suggested by Ming.

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-10 11:56 [PATCH v9] block: cancel all throttled bios in del_gendisk() Yu Kuai
  2022-02-25  7:18 ` yukuai (C)
  2022-02-25  8:11 ` Ming Lei
@ 2022-02-25 14:27 ` Jens Axboe
  2022-02-27 17:16 ` Christoph Hellwig
  3 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2022-02-25 14:27 UTC (permalink / raw)
  To: Yu Kuai, ming.lei, tj; +Cc: cgroups, linux-block, linux-kernel, yi.zhang

On 2/10/22 4:56 AM, Yu Kuai wrote:
> Throttled bios can't be issued after del_gendisk() is done, thus
> it's better to cancel them immediately rather than waiting for
> throttle is done.
> 
> For example, if user thread is throttled with low bps while it's
> issuing large io, and the device is deleted. The user thread will
> wait for a long time for io to return.

I hand applied this for 5.18 as it's conflicting with other
changes. Please double check the end result.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-10 11:56 [PATCH v9] block: cancel all throttled bios in del_gendisk() Yu Kuai
                   ` (2 preceding siblings ...)
  2022-02-25 14:27 ` Jens Axboe
@ 2022-02-27 17:16 ` Christoph Hellwig
  2022-02-27 21:51   ` Jens Axboe
  2022-02-28  6:11   ` Ming Lei
  3 siblings, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-02-27 17:16 UTC (permalink / raw)
  To: Yu Kuai; +Cc: ming.lei, tj, axboe, cgroups, linux-block, linux-kernel, yi.zhang

On Thu, Feb 10, 2022 at 07:56:37PM +0800, Yu Kuai wrote:
> Throttled bios can't be issued after del_gendisk() is done, thus
> it's better to cancel them immediately rather than waiting for
> throttle is done.
> 
> For example, if user thread is throttled with low bps while it's
> issuing large io, and the device is deleted. The user thread will
> wait for a long time for io to return.

FYI, this crashed left rigt and center when running xfstests with
traces pointing to throtl_pending_timer_fn.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-27 17:16 ` Christoph Hellwig
@ 2022-02-27 21:51   ` Jens Axboe
  2022-02-28  6:11   ` Ming Lei
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2022-02-27 21:51 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: ming.lei, tj, cgroups, linux-block, linux-kernel, yi.zhang

On 2/27/22 10:16 AM, Christoph Hellwig wrote:
> On Thu, Feb 10, 2022 at 07:56:37PM +0800, Yu Kuai wrote:
>> Throttled bios can't be issued after del_gendisk() is done, thus
>> it's better to cancel them immediately rather than waiting for
>> throttle is done.
>>
>> For example, if user thread is throttled with low bps while it's
>> issuing large io, and the device is deleted. The user thread will
>> wait for a long time for io to return.
> 
> FYI, this crashed left rigt and center when running xfstests with
> traces pointing to throtl_pending_timer_fn.

Dropped for now.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-27 17:16 ` Christoph Hellwig
  2022-02-27 21:51   ` Jens Axboe
@ 2022-02-28  6:11   ` Ming Lei
  2022-02-28  9:40     ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Ming Lei @ 2022-02-28  6:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, tj, axboe, cgroups, linux-block, linux-kernel, yi.zhang

Hi Christoph,

On Sun, Feb 27, 2022 at 09:16:54AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 10, 2022 at 07:56:37PM +0800, Yu Kuai wrote:
> > Throttled bios can't be issued after del_gendisk() is done, thus
> > it's better to cancel them immediately rather than waiting for
> > throttle is done.
> > 
> > For example, if user thread is throttled with low bps while it's
> > issuing large io, and the device is deleted. The user thread will
> > wait for a long time for io to return.
> 
> FYI, this crashed left rigt and center when running xfstests with
> traces pointing to throtl_pending_timer_fn.

Can you share the exact xfstests test(fs, test)? Or panic log?

I can't reproduce it when running './check -g auto' on XFS, meantime
tracking throtl_pending_timer_fn().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-28  6:11   ` Ming Lei
@ 2022-02-28  9:40     ` Christoph Hellwig
  2022-03-01 10:29       ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2022-02-28  9:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Yu Kuai, tj, axboe, cgroups, linux-block,
	linux-kernel, yi.zhang

On Mon, Feb 28, 2022 at 02:11:30PM +0800, Ming Lei wrote:
> > FYI, this crashed left rigt and center when running xfstests with
> > traces pointing to throtl_pending_timer_fn.
> 
> Can you share the exact xfstests test(fs, test)? Or panic log?
> 
> I can't reproduce it when running './check -g auto' on XFS, meantime
> tracking throtl_pending_timer_fn().

From a quick run using f2fs:

generic/081 files ... [  316.487861] run fstests generic/081 at 2022-02-28 09:38:40
[  318.291133] F2FS-fs (dm-3): Found nat_bits in checkpoint
[  318.298016] F2FS-fs (dm-3): Mounted with checkpoint version = 526422b7
[  318.363888] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exceptio.
[  318.540023] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6I
[  318.541556] CPU: 0 PID: 13947 Comm: dmsetup Not tainted 5.17.0-rc2+ #1074
[  318.542514] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/24
[  318.543695] RIP: 0010:__lock_acquire+0x5b4/0x1de0
[  318.544351] Code: 89 84 24 40 09 00 00 0f 87 2a 05 00 00 3b 05 17 c7 b2 03 41 bf 01 00 03
[  318.546881] RSP: 0000:ffffc90000003cf8 EFLAGS: 00010002
[  318.547610] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[  318.548556] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6bf3
[  318.549497] RBP: 6b6b6b6b6b6b6bf3 R08: 0000000000000001 R09: 0000000000000001
[  318.550438] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8880117c8040
[  318.551378] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  318.552323] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[  318.553380] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  318.554119] CR2: 00007f54883c9028 CR3: 0000000012a52000 CR4: 00000000000006f0
[  318.555036] Call Trace:
[  318.555360]  <IRQ>
[  318.555630]  lock_acquire+0xd6/0x300
[  318.556099]  ? throtl_pending_timer_fn+0x69/0xa50
[  318.556709]  ? _raw_spin_lock_irq+0x4f/0x60
[  318.557252]  ? throtl_pd_offline+0x60/0x60
[  318.557783]  _raw_spin_lock_irq+0x40/0x60
[  318.558303]  ? throtl_pending_timer_fn+0x69/0xa50
[  318.558915]  throtl_pending_timer_fn+0x69/0xa50
[  318.559485]  ? throtl_pd_offline+0x60/0x60
[  318.560003]  ? throtl_pd_offline+0x60/0x60
[  318.560520]  call_timer_fn+0x9f/0x2c0
[  318.560985]  __run_timers.part.0+0x1fc/0x2f0
[  318.561524]  ? lock_is_held_type+0xe4/0x140
[  318.562055]  run_timer_softirq+0x2c/0x60
[  318.562550]  __do_softirq+0x174/0x512
[  318.563016]  __irq_exit_rcu+0xdf/0x130
[  318.563491]  irq_exit_rcu+0x5/0x20
[  318.563926]  sysvec_apic_timer_interrupt+0xa2/0xd0
[  318.564529]  </IRQ>
[  318.564800]  <TASK>
[  318.565072]  asm_sysvec_apic_timer_interrupt+0x12/0x20

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-02-28  9:40     ` Christoph Hellwig
@ 2022-03-01 10:29       ` Ming Lei
  2022-03-01 13:54         ` yukuai (C)
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2022-03-01 10:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, tj, axboe, cgroups, linux-block, linux-kernel, yi.zhang

On Mon, Feb 28, 2022 at 01:40:53AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 28, 2022 at 02:11:30PM +0800, Ming Lei wrote:
> > > FYI, this crashed left rigt and center when running xfstests with
> > > traces pointing to throtl_pending_timer_fn.
> > 
> > Can you share the exact xfstests test(fs, test)? Or panic log?
> > 
> > I can't reproduce it when running './check -g auto' on XFS, meantime
> > tracking throtl_pending_timer_fn().
> 
> From a quick run using f2fs:
> 
> generic/081 files ... [  316.487861] run fstests generic/081 at 2022-02-28 09:38:40

Thanks for providing the reproducer.

The reason is that the pending timer is deleted in blkg's release
handler, so the timer can still be live after request queue is released.

The patch of 'block: cancel all throttled bios in del_gendisk()' should just
make it easier to trigger.

After patch of "block: move blkcg initialization/destroy into disk allocation/
release handler" lands, the issue can be fixed easily by:

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index fa063c6c0338..e8d4be5e1de3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -82,6 +82,7 @@ static void blkg_free(struct blkcg_gq *blkg)
 		if (blkg->pd[i])
 			blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
 
+	blk_put_queue(blkg->q);
 	free_percpu(blkg->iostat_cpu);
 	percpu_ref_exit(&blkg->refcnt);
 	kfree(blkg);
@@ -297,9 +298,10 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	blkg->online = true;
 	spin_unlock(&blkcg->lock);
 
-	if (!ret)
+	if (!ret && blk_get_queue(q))
 		return blkg;
-
+	else if (!ret)
+		ret = -ENODEV;
 	/* @blkg failed fully initialized, use the usual release path */
 	blkg_put(blkg);
 	return ERR_PTR(ret);


Thanks,
Ming


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-03-01 10:29       ` Ming Lei
@ 2022-03-01 13:54         ` yukuai (C)
  2022-03-02  0:51           ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: yukuai (C) @ 2022-03-01 13:54 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: tj, axboe, cgroups, linux-block, linux-kernel, yi.zhang

在 2022/03/01 18:29, Ming Lei 写道:
> On Mon, Feb 28, 2022 at 01:40:53AM -0800, Christoph Hellwig wrote:
>> On Mon, Feb 28, 2022 at 02:11:30PM +0800, Ming Lei wrote:
>>>> FYI, this crashed left rigt and center when running xfstests with
>>>> traces pointing to throtl_pending_timer_fn.
>>>
>>> Can you share the exact xfstests test(fs, test)? Or panic log?
>>>
>>> I can't reproduce it when running './check -g auto' on XFS, meantime
>>> tracking throtl_pending_timer_fn().
>>
>>  From a quick run using f2fs:
>>
>> generic/081 files ... [  316.487861] run fstests generic/081 at 2022-02-28 09:38:40
> 
> Thanks for providing the reproducer.
> 
> The reason is that the pending timer is deleted in blkg's release
> handler, so the timer can still be live after request queue is released.
> 
> The patch of 'block: cancel all throttled bios in del_gendisk()' should just
> make it easier to trigger.
> 
> After patch of "block: move blkcg initialization/destroy into disk allocation/
> release handler" lands, the issue can be fixed easily by:

Hi,

Thanks for locating this problem,

Perhaps this patch should wait for the problem to be solved.

Thanks,
Kuai
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index fa063c6c0338..e8d4be5e1de3 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -82,6 +82,7 @@ static void blkg_free(struct blkcg_gq *blkg)
>   		if (blkg->pd[i])
>   			blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
>   
> +	blk_put_queue(blkg->q);
>   	free_percpu(blkg->iostat_cpu);
>   	percpu_ref_exit(&blkg->refcnt);
>   	kfree(blkg);
> @@ -297,9 +298,10 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
>   	blkg->online = true;
>   	spin_unlock(&blkcg->lock);
>   
> -	if (!ret)
> +	if (!ret && blk_get_queue(q))
>   		return blkg;
> -
> +	else if (!ret)
> +		ret = -ENODEV;
>   	/* @blkg failed fully initialized, use the usual release path */
>   	blkg_put(blkg);
>   	return ERR_PTR(ret);
> 
> 
> Thanks,
> Ming
> 
> .
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-03-01 13:54         ` yukuai (C)
@ 2022-03-02  0:51           ` Ming Lei
  2022-03-17 11:22             ` yukuai (C)
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2022-03-02  0:51 UTC (permalink / raw)
  To: yukuai (C)
  Cc: Christoph Hellwig, tj, axboe, cgroups, linux-block, linux-kernel,
	yi.zhang

On Tue, Mar 01, 2022 at 09:54:28PM +0800, yukuai (C) wrote:
> 在 2022/03/01 18:29, Ming Lei 写道:
> > On Mon, Feb 28, 2022 at 01:40:53AM -0800, Christoph Hellwig wrote:
> > > On Mon, Feb 28, 2022 at 02:11:30PM +0800, Ming Lei wrote:
> > > > > FYI, this crashed left rigt and center when running xfstests with
> > > > > traces pointing to throtl_pending_timer_fn.
> > > > 
> > > > Can you share the exact xfstests test(fs, test)? Or panic log?
> > > > 
> > > > I can't reproduce it when running './check -g auto' on XFS, meantime
> > > > tracking throtl_pending_timer_fn().
> > > 
> > >  From a quick run using f2fs:
> > > 
> > > generic/081 files ... [  316.487861] run fstests generic/081 at 2022-02-28 09:38:40
> > 
> > Thanks for providing the reproducer.
> > 
> > The reason is that the pending timer is deleted in blkg's release
> > handler, so the timer can still be live after request queue is released.
> > 
> > The patch of 'block: cancel all throttled bios in del_gendisk()' should just
> > make it easier to trigger.
> > 
> > After patch of "block: move blkcg initialization/destroy into disk allocation/
> > release handler" lands, the issue can be fixed easily by:
> 
> Hi,
> 
> Thanks for locating this problem,
> 
> Perhaps this patch should wait for the problem to be solved.

BTW, please see the top 3 patches in the following tree:

https://github.com/ming1/linux/commits/my_v5.18-pre-rc

xfstests generic/081 can run for hours without problems, without the fix,
the crash can be triggered in 10 minutes.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-03-02  0:51           ` Ming Lei
@ 2022-03-17 11:22             ` yukuai (C)
  2022-03-18  7:04               ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: yukuai (C) @ 2022-03-17 11:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, tj, axboe, cgroups, linux-block, linux-kernel,
	yi.zhang

在 2022/03/02 8:51, Ming Lei 写道:
> On Tue, Mar 01, 2022 at 09:54:28PM +0800, yukuai (C) wrote:
>> 在 2022/03/01 18:29, Ming Lei 写道:
>>> On Mon, Feb 28, 2022 at 01:40:53AM -0800, Christoph Hellwig wrote:
>>>> On Mon, Feb 28, 2022 at 02:11:30PM +0800, Ming Lei wrote:
>>>>>> FYI, this crashed left rigt and center when running xfstests with
>>>>>> traces pointing to throtl_pending_timer_fn.
>>>>>
>>>>> Can you share the exact xfstests test(fs, test)? Or panic log?
>>>>>
>>>>> I can't reproduce it when running './check -g auto' on XFS, meantime
>>>>> tracking throtl_pending_timer_fn().
>>>>
>>>>   From a quick run using f2fs:
>>>>
>>>> generic/081 files ... [  316.487861] run fstests generic/081 at 2022-02-28 09:38:40
>>>
>>> Thanks for providing the reproducer.
>>>
>>> The reason is that the pending timer is deleted in blkg's release
>>> handler, so the timer can still be live after request queue is released.
>>>
>>> The patch of 'block: cancel all throttled bios in del_gendisk()' should just
>>> make it easier to trigger.
>>>
>>> After patch of "block: move blkcg initialization/destroy into disk allocation/
>>> release handler" lands, the issue can be fixed easily by:

Hi, Ming

Now that the above patch is landed in linux-next, do you intend to fix
the above problem? Or I can take over if you don't mind.

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9] block: cancel all throttled bios in del_gendisk()
  2022-03-17 11:22             ` yukuai (C)
@ 2022-03-18  7:04               ` Ming Lei
  0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2022-03-18  7:04 UTC (permalink / raw)
  To: yukuai (C)
  Cc: Christoph Hellwig, tj, axboe, cgroups, linux-block, linux-kernel,
	yi.zhang

On Thu, Mar 17, 2022 at 07:22:04PM +0800, yukuai (C) wrote:
> 在 2022/03/02 8:51, Ming Lei 写道:
> > On Tue, Mar 01, 2022 at 09:54:28PM +0800, yukuai (C) wrote:
> > > 在 2022/03/01 18:29, Ming Lei 写道:
> > > > On Mon, Feb 28, 2022 at 01:40:53AM -0800, Christoph Hellwig wrote:
> > > > > On Mon, Feb 28, 2022 at 02:11:30PM +0800, Ming Lei wrote:
> > > > > > > FYI, this crashed left rigt and center when running xfstests with
> > > > > > > traces pointing to throtl_pending_timer_fn.
> > > > > > 
> > > > > > Can you share the exact xfstests test(fs, test)? Or panic log?
> > > > > > 
> > > > > > I can't reproduce it when running './check -g auto' on XFS, meantime
> > > > > > tracking throtl_pending_timer_fn().
> > > > > 
> > > > >   From a quick run using f2fs:
> > > > > 
> > > > > generic/081 files ... [  316.487861] run fstests generic/081 at 2022-02-28 09:38:40
> > > > 
> > > > Thanks for providing the reproducer.
> > > > 
> > > > The reason is that the pending timer is deleted in blkg's release
> > > > handler, so the timer can still be live after request queue is released.
> > > > 
> > > > The patch of 'block: cancel all throttled bios in del_gendisk()' should just
> > > > make it easier to trigger.
> > > > 
> > > > After patch of "block: move blkcg initialization/destroy into disk allocation/
> > > > release handler" lands, the issue can be fixed easily by:
> 
> Hi, Ming
> 
> Now that the above patch is landed in linux-next, do you intend to fix
> the above problem? Or I can take over if you don't mind.

Hi Yu Kuai, 

I will send one fix and your V9 later.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-03-18  7:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-10 11:56 [PATCH v9] block: cancel all throttled bios in del_gendisk() Yu Kuai
2022-02-25  7:18 ` yukuai (C)
2022-02-25  8:11 ` Ming Lei
2022-02-25 14:27 ` Jens Axboe
2022-02-27 17:16 ` Christoph Hellwig
2022-02-27 21:51   ` Jens Axboe
2022-02-28  6:11   ` Ming Lei
2022-02-28  9:40     ` Christoph Hellwig
2022-03-01 10:29       ` Ming Lei
2022-03-01 13:54         ` yukuai (C)
2022-03-02  0:51           ` Ming Lei
2022-03-17 11:22             ` yukuai (C)
2022-03-18  7:04               ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).