* Re: [PATCH v5] block: fix null pointer dereference in blk_mq_rq_timed_out()
2019-09-27 8:19 [PATCH v5] block: fix null pointer dereference in blk_mq_rq_timed_out() Yufen Yu
@ 2019-09-27 8:16 ` Ming Lei
2019-09-27 12:52 ` Bob Liu
2019-09-27 13:01 ` Jens Axboe
2 siblings, 0 replies; 4+ messages in thread
From: Ming Lei @ 2019-09-27 8:16 UTC
To: Yufen Yu; +Cc: axboe, linux-block, hch, keith.busch, bvanassche
On Fri, Sep 27, 2019 at 04:19:55PM +0800, Yufen Yu wrote:
> We hit a NULL pointer dereference in blk_mq_rq_timed_out(),
> as follows:
>
> [ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 108.827059] PGD 0 P4D 0
> [ 108.827313] Oops: 0000 [#1] SMP PTI
> [ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
> [ 108.829503] Workqueue: kblockd blk_mq_timeout_work
> [ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
> [ 108.838191] Call Trace:
> [ 108.838406] bt_iter+0x74/0x80
> [ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
> [ 108.839074] ? __switch_to_asm+0x34/0x70
> [ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
> [ 108.840732] blk_mq_timeout_work+0x74/0x200
> [ 108.841151] process_one_work+0x297/0x680
> [ 108.841550] worker_thread+0x29c/0x6f0
> [ 108.841926] ? rescuer_thread+0x580/0x580
> [ 108.842344] kthread+0x16a/0x1a0
> [ 108.842666] ? kthread_flush_work+0x170/0x170
> [ 108.843100] ret_from_fork+0x35/0x40
>
> The bug is caused by a race between the timeout handler and completion
> of the flush request.
>
> When the timeout handler blk_mq_rq_timed_out() tries to read
> 'req->q->mq_ops', 'req' has already completed and been reinitialized by
> the next flush request, which calls blk_rq_init() to zero 'req'.
>
> Since commit 12f5b93145 ("blk-mq: Remove generation seqeunce"), the
> lifetime of a normal request is protected by a refcount. The request
> cannot be freed until 'rq->ref' drops to zero, so it cannot be reused
> before the timeout handler finishes.
>
> However, the flush request defines .end_io, and rq->end_io() is still
> called even if 'rq->ref' has not dropped to zero. After that, 'flush_rq'
> can be reused by the next flush request, resulting in the NULL pointer
> dereference.
>
> Fix this by covering the flush request with 'rq->ref'. If the refcount
> is not zero, flush_end_io() returns and leaves it to the last holder to
> call it again. To record the request status, add a new field
> 'rq_status', which is used in flush_end_io().
>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org # v4.18+
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>
> -------
> v2:
> - move rq_status from struct request to struct blk_flush_queue
> v3:
> - remove unnecessary '{}' pair.
> v4:
> - protect 'fq->rq_status' with the spinlock
> v5:
> - move rq_status after flush_running_idx member of struct blk_flush_queue
> ---
>  block/blk-flush.c | 10 ++++++++++
>  block/blk-mq.c    |  5 ++++-
>  block/blk.h       |  7 +++++++
>  3 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index aedd9320e605..1eec9cbe5a0a 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -214,6 +214,16 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
>
>  	/* release the tag's ownership to the req cloned from */
>  	spin_lock_irqsave(&fq->mq_flush_lock, flags);
> +
> +	if (!refcount_dec_and_test(&flush_rq->ref)) {
> +		fq->rq_status = error;
> +		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
> +		return;
> +	}
> +
> +	if (fq->rq_status != BLK_STS_OK)
> +		error = fq->rq_status;
> +
>  	hctx = flush_rq->mq_hctx;
>  	if (!q->elevator) {
>  		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 20a49be536b5..e04fa9ab5574 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -912,7 +912,10 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
>  	 */
>  	if (blk_mq_req_expired(rq, next))
>  		blk_mq_rq_timed_out(rq, reserved);
> -	if (refcount_dec_and_test(&rq->ref))
> +
> +	if (is_flush_rq(rq, hctx))
> +		rq->end_io(rq, 0);
> +	else if (refcount_dec_and_test(&rq->ref))
>  		__blk_mq_free_request(rq);
>
>  	return true;
> diff --git a/block/blk.h b/block/blk.h
> index ed347f7a97b1..2d8cdafee799 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -19,6 +19,7 @@ struct blk_flush_queue {
>  	unsigned int flush_queue_delayed:1;
>  	unsigned int flush_pending_idx:1;
>  	unsigned int flush_running_idx:1;
> +	blk_status_t rq_status;
>  	unsigned long flush_pending_since;
>  	struct list_head flush_queue[2];
>  	struct list_head flush_data_in_flight;
> @@ -47,6 +48,12 @@ static inline void __blk_get_queue(struct request_queue *q)
>  	kobject_get(&q->kobj);
>  }
>
> +static inline bool
> +is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx)
> +{
> +	return hctx->fq->flush_rq == req;
> +}
> +
>  struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
>  		int node, int cmd_size, gfp_t flags);
>  void blk_free_flush_queue(struct blk_flush_queue *q);
> --
> 2.17.2
>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
thanks,
Ming
* [PATCH v5] block: fix null pointer dereference in blk_mq_rq_timed_out()
From: Yufen Yu @ 2019-09-27 8:19 UTC
To: axboe; +Cc: linux-block, ming.lei, hch, keith.busch, bvanassche
We hit a NULL pointer dereference in blk_mq_rq_timed_out(),
as follows:

[ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 108.827059] PGD 0 P4D 0
[ 108.827313] Oops: 0000 [#1] SMP PTI
[ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[ 108.829503] Workqueue: kblockd blk_mq_timeout_work
[ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[ 108.838191] Call Trace:
[ 108.838406] bt_iter+0x74/0x80
[ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
[ 108.839074] ? __switch_to_asm+0x34/0x70
[ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
[ 108.840732] blk_mq_timeout_work+0x74/0x200
[ 108.841151] process_one_work+0x297/0x680
[ 108.841550] worker_thread+0x29c/0x6f0
[ 108.841926] ? rescuer_thread+0x580/0x580
[ 108.842344] kthread+0x16a/0x1a0
[ 108.842666] ? kthread_flush_work+0x170/0x170
[ 108.843100] ret_from_fork+0x35/0x40

The bug is caused by a race between the timeout handler and completion
of the flush request.

When the timeout handler blk_mq_rq_timed_out() tries to read
'req->q->mq_ops', 'req' has already completed and been reinitialized by
the next flush request, which calls blk_rq_init() to zero 'req'.

Since commit 12f5b93145 ("blk-mq: Remove generation seqeunce"), the
lifetime of a normal request is protected by a refcount. The request
cannot be freed until 'rq->ref' drops to zero, so it cannot be reused
before the timeout handler finishes.

However, the flush request defines .end_io, and rq->end_io() is still
called even if 'rq->ref' has not dropped to zero. After that, 'flush_rq'
can be reused by the next flush request, resulting in the NULL pointer
dereference.

Fix this by covering the flush request with 'rq->ref'. If the refcount
is not zero, flush_end_io() returns and leaves it to the last holder to
call it again. To record the request status, add a new field
'rq_status', which is used in flush_end_io().

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
-------
v2:
- move rq_status from struct request to struct blk_flush_queue
v3:
- remove unnecessary '{}' pair.
v4:
- protect 'fq->rq_status' with the spinlock
v5:
- move rq_status after flush_running_idx member of struct blk_flush_queue
---
 block/blk-flush.c | 10 ++++++++++
 block/blk-mq.c    |  5 ++++-
 block/blk.h       |  7 +++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index aedd9320e605..1eec9cbe5a0a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -214,6 +214,16 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 
 	/* release the tag's ownership to the req cloned from */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
+
+	if (!refcount_dec_and_test(&flush_rq->ref)) {
+		fq->rq_status = error;
+		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
+		return;
+	}
+
+	if (fq->rq_status != BLK_STS_OK)
+		error = fq->rq_status;
+
 	hctx = flush_rq->mq_hctx;
 	if (!q->elevator) {
 		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20a49be536b5..e04fa9ab5574 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -912,7 +912,10 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	 */
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
-	if (refcount_dec_and_test(&rq->ref))
+
+	if (is_flush_rq(rq, hctx))
+		rq->end_io(rq, 0);
+	else if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
 
 	return true;
diff --git a/block/blk.h b/block/blk.h
index ed347f7a97b1..2d8cdafee799 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -19,6 +19,7 @@ struct blk_flush_queue {
 	unsigned int flush_queue_delayed:1;
 	unsigned int flush_pending_idx:1;
 	unsigned int flush_running_idx:1;
+	blk_status_t rq_status;
 	unsigned long flush_pending_since;
 	struct list_head flush_queue[2];
 	struct list_head flush_data_in_flight;
@@ -47,6 +48,12 @@ static inline void __blk_get_queue(struct request_queue *q)
 	kobject_get(&q->kobj);
 }
 
+static inline bool
+is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx)
+{
+	return hctx->fq->flush_rq == req;
+}
+
 struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
 		int node, int cmd_size, gfp_t flags);
 void blk_free_flush_queue(struct blk_flush_queue *q);
--
2.17.2
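
The refcount handshake described in the commit message can be modelled
in user space. The following is only a sketch under stated assumptions,
not kernel code: 'struct model_rq', the 'model_*' functions and the
'MODEL_STS_*' values are hypothetical stand-ins, C11 atomics replace
refcount_t, and fq->mq_flush_lock is left out because the model is meant
to be driven from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BLK_STS_OK and an error status. */
enum { MODEL_STS_OK = 0, MODEL_STS_IOERR = 10 };

/* Hypothetical model of the flush request plus the status kept in fq. */
struct model_rq {
	atomic_int ref;		/* models rq->ref */
	int saved_status;	/* models fq->rq_status */
	int final_status;	/* status seen by the real completion */
	bool completed;		/* set once the completion work has run */
};

void model_rq_start(struct model_rq *rq)
{
	atomic_init(&rq->ref, 1);	/* reference held by the in-flight request */
	rq->saved_status = MODEL_STS_OK;
	rq->final_status = MODEL_STS_OK;
	rq->completed = false;
}

/* Models refcount_inc_not_zero(): take a reference unless it already hit 0. */
bool model_ref_get_not_zero(struct model_rq *rq)
{
	int old = atomic_load(&rq->ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Models the patched flush_end_io(); fq->mq_flush_lock is omitted here. */
void model_end_io(struct model_rq *rq, int error)
{
	/* atomic_fetch_sub() returns the old value; 1 means last reference. */
	if (atomic_fetch_sub(&rq->ref, 1) != 1) {
		rq->saved_status = error;	/* remember status for the last holder */
		return;
	}

	if (rq->saved_status != MODEL_STS_OK)
		error = rq->saved_status;

	assert(!rq->completed);		/* completion work must run exactly once */
	rq->final_status = error;
	rq->completed = true;
}

/* Models the tail of blk_mq_check_expired() for a flush request. */
void model_timeout_put(struct model_rq *rq)
{
	model_end_io(rq, MODEL_STS_OK);
}
```

In this model, if the timeout path takes its reference with
model_ref_get_not_zero() before the driver completes the request,
model_end_io() only records the status and returns. The completion work
runs later, when model_timeout_put() drops the last reference, so the
request cannot be reinitialized while the timeout path still uses it.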
* Re: [PATCH v5] block: fix null pointer dereference in blk_mq_rq_timed_out()
From: Bob Liu @ 2019-09-27 12:52 UTC
To: Yufen Yu, axboe; +Cc: linux-block, ming.lei, hch, keith.busch, bvanassche
On 9/27/19 4:19 PM, Yufen Yu wrote:
> We hit a NULL pointer dereference in blk_mq_rq_timed_out(),
> as follows:
>
> [ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 108.827059] PGD 0 P4D 0
> [ 108.827313] Oops: 0000 [#1] SMP PTI
> [ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
> [ 108.829503] Workqueue: kblockd blk_mq_timeout_work
> [ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
> [ 108.838191] Call Trace:
> [ 108.838406] bt_iter+0x74/0x80
> [ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
> [ 108.839074] ? __switch_to_asm+0x34/0x70
> [ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
> [ 108.840732] blk_mq_timeout_work+0x74/0x200
> [ 108.841151] process_one_work+0x297/0x680
> [ 108.841550] worker_thread+0x29c/0x6f0
> [ 108.841926] ? rescuer_thread+0x580/0x580
> [ 108.842344] kthread+0x16a/0x1a0
> [ 108.842666] ? kthread_flush_work+0x170/0x170
> [ 108.843100] ret_from_fork+0x35/0x40
>
> The bug is caused by a race between the timeout handler and completion
> of the flush request.
>
> When the timeout handler blk_mq_rq_timed_out() tries to read
> 'req->q->mq_ops', 'req' has already completed and been reinitialized by
> the next flush request, which calls blk_rq_init() to zero 'req'.
>
> Since commit 12f5b93145 ("blk-mq: Remove generation seqeunce"), the
> lifetime of a normal request is protected by a refcount. The request
> cannot be freed until 'rq->ref' drops to zero, so it cannot be reused
> before the timeout handler finishes.
>
> However, the flush request defines .end_io, and rq->end_io() is still
> called even if 'rq->ref' has not dropped to zero. After that, 'flush_rq'
> can be reused by the next flush request, resulting in the NULL pointer
> dereference.
>
> Fix this by covering the flush request with 'rq->ref'. If the refcount
> is not zero, flush_end_io() returns and leaves it to the last holder to
> call it again. To record the request status, add a new field
> 'rq_status', which is used in flush_end_io().
>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org # v4.18+
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>
> -------
> v2:
> - move rq_status from struct request to struct blk_flush_queue
> v3:
> - remove unnecessary '{}' pair.
> v4:
> - protect 'fq->rq_status' with the spinlock
> v5:
> - move rq_status after flush_running_idx member of struct blk_flush_queue
> ---
>  block/blk-flush.c | 10 ++++++++++
>  block/blk-mq.c    |  5 ++++-
>  block/blk.h       |  7 +++++++
>  3 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index aedd9320e605..1eec9cbe5a0a 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -214,6 +214,16 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
>
>  	/* release the tag's ownership to the req cloned from */
>  	spin_lock_irqsave(&fq->mq_flush_lock, flags);
> +
> +	if (!refcount_dec_and_test(&flush_rq->ref)) {
> +		fq->rq_status = error;
> +		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
> +		return;
> +	}
> +
> +	if (fq->rq_status != BLK_STS_OK)
> +		error = fq->rq_status;
> +
>  	hctx = flush_rq->mq_hctx;
>  	if (!q->elevator) {
>  		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 20a49be536b5..e04fa9ab5574 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -912,7 +912,10 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
>  	 */
>  	if (blk_mq_req_expired(rq, next))
>  		blk_mq_rq_timed_out(rq, reserved);
> -	if (refcount_dec_and_test(&rq->ref))
> +
> +	if (is_flush_rq(rq, hctx))
> +		rq->end_io(rq, 0);
> +	else if (refcount_dec_and_test(&rq->ref))
>  		__blk_mq_free_request(rq);
>
>  	return true;
> diff --git a/block/blk.h b/block/blk.h
> index ed347f7a97b1..2d8cdafee799 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -19,6 +19,7 @@ struct blk_flush_queue {
>  	unsigned int flush_queue_delayed:1;
>  	unsigned int flush_pending_idx:1;
>  	unsigned int flush_running_idx:1;
> +	blk_status_t rq_status;
>  	unsigned long flush_pending_since;
>  	struct list_head flush_queue[2];
>  	struct list_head flush_data_in_flight;
> @@ -47,6 +48,12 @@ static inline void __blk_get_queue(struct request_queue *q)
>  	kobject_get(&q->kobj);
>  }
>
> +static inline bool
> +is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx)
> +{
> +	return hctx->fq->flush_rq == req;
> +}
> +
>  struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
>  		int node, int cmd_size, gfp_t flags);
>  void blk_free_flush_queue(struct blk_flush_queue *q);
>
Looks good to me.
Reviewed-by: Bob Liu <bob.liu@oracle.com>
* Re: [PATCH v5] block: fix null pointer dereference in blk_mq_rq_timed_out()
From: Jens Axboe @ 2019-09-27 13:01 UTC
To: Yufen Yu; +Cc: linux-block, ming.lei, hch, keith.busch, bvanassche
On 9/27/19 10:19 AM, Yufen Yu wrote:
> We hit a NULL pointer dereference in blk_mq_rq_timed_out(),
> as follows:
>
> [ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 108.827059] PGD 0 P4D 0
> [ 108.827313] Oops: 0000 [#1] SMP PTI
> [ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
> [ 108.829503] Workqueue: kblockd blk_mq_timeout_work
> [ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
> [ 108.838191] Call Trace:
> [ 108.838406] bt_iter+0x74/0x80
> [ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
> [ 108.839074] ? __switch_to_asm+0x34/0x70
> [ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
> [ 108.840732] blk_mq_timeout_work+0x74/0x200
> [ 108.841151] process_one_work+0x297/0x680
> [ 108.841550] worker_thread+0x29c/0x6f0
> [ 108.841926] ? rescuer_thread+0x580/0x580
> [ 108.842344] kthread+0x16a/0x1a0
> [ 108.842666] ? kthread_flush_work+0x170/0x170
> [ 108.843100] ret_from_fork+0x35/0x40
>
> The bug is caused by a race between the timeout handler and completion
> of the flush request.
>
> When the timeout handler blk_mq_rq_timed_out() tries to read
> 'req->q->mq_ops', 'req' has already completed and been reinitialized by
> the next flush request, which calls blk_rq_init() to zero 'req'.
>
> Since commit 12f5b93145 ("blk-mq: Remove generation seqeunce"), the
> lifetime of a normal request is protected by a refcount. The request
> cannot be freed until 'rq->ref' drops to zero, so it cannot be reused
> before the timeout handler finishes.
>
> However, the flush request defines .end_io, and rq->end_io() is still
> called even if 'rq->ref' has not dropped to zero. After that, 'flush_rq'
> can be reused by the next flush request, resulting in the NULL pointer
> dereference.
>
> Fix this by covering the flush request with 'rq->ref'. If the refcount
> is not zero, flush_end_io() returns and leaves it to the last holder to
> call it again. To record the request status, add a new field
> 'rq_status', which is used in flush_end_io().
>
Thanks, applied.
--
Jens Axboe