linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
@ 2019-07-22 17:30 Guenter Roeck
  2019-07-22 20:29 ` Guenter Roeck
  2019-07-28 15:19 ` Guenter Roeck
  0 siblings, 2 replies; 10+ messages in thread
From: Guenter Roeck @ 2019-07-22 17:30 UTC (permalink / raw)
  To: Paolo Valente, Jens Axboe
  Cc: linux-block, linux-kernel, Guenter Roeck, Hsin-Yi Wang,
	Nicolas Boichat, Doug Anderson

In bfq_insert_request(), bfqq is initialized with:
	bfqq = bfq_init_rq(rq);
In bfq_init_rq(), we find:
	if (unlikely(!rq->elv.icq))
		return NULL;
Indeed, rq->elv.icq can be NULL if the memory allocation in
create_task_io_context() failed.

A comment in bfq_insert_request() suggests that bfqq is supposed to be
non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
debugging and practical experience shows, this is not the case in the
above situation.

This results in the following crash.

Unable to handle kernel NULL pointer dereference
	at virtual address 00000000000001b0
...
Call trace:
 bfq_setup_cooperator+0x44/0x134
 bfq_insert_requests+0x10c/0x630
 blk_mq_sched_insert_requests+0x60/0xb4
 blk_mq_flush_plug_list+0x290/0x2d4
 blk_flush_plug_list+0xe0/0x230
 blk_finish_plug+0x30/0x40
 generic_writepages+0x60/0x94
 blkdev_writepages+0x24/0x30
 do_writepages+0x74/0xac
 __filemap_fdatawrite_range+0x94/0xc8
 file_write_and_wait_range+0x44/0xa0
 blkdev_fsync+0x38/0x68
 vfs_fsync_range+0x68/0x80
 do_fsync+0x44/0x80

The problem is relatively easy to reproduce by running an image with
failslab enabled, such as with:

cd /sys/kernel/debug/failslab
echo 10 > probability
echo 300 > times

Avoid the problem by checking if bfqq is NULL before using it. With the
NULL check in place, requests with missing io context are queued
immediately, and the crash is no longer seen.

Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
Cc: Hsin-Yi Wang <hsinyi@google.com>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Doug Anderson <dianders@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 72860325245a..56f3f4227010 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	spin_lock_irq(&bfqd->lock);
 	bfqq = bfq_init_rq(rq);
-	if (at_head || blk_rq_is_passthrough(rq)) {
+	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
 		if (at_head)
 			list_add(&rq->queuelist, &bfqd->dispatch);
 		else
-- 
2.7.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-22 17:30 [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request Guenter Roeck
@ 2019-07-22 20:29 ` Guenter Roeck
  2019-07-22 23:36   ` Bob Liu
  2019-07-28 15:19 ` Guenter Roeck
  1 sibling, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2019-07-22 20:29 UTC (permalink / raw)
  To: Paolo Valente, Jens Axboe
  Cc: linux-block, linux-kernel, Hsin-Yi Wang, Nicolas Boichat, Doug Anderson

On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
> In bfq_insert_request(), bfqq is initialized with:
> 	bfqq = bfq_init_rq(rq);
> In bfq_init_rq(), we find:
> 	if (unlikely(!rq->elv.icq))
> 		return NULL;
> Indeed, rq->elv.icq can be NULL if the memory allocation in
> create_task_io_context() failed.
> 

The above function should have been ioc_create_icq(), sorry.

Guenter

> A comment in bfq_insert_request() suggests that bfqq is supposed to be
> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
> debugging and practical experience shows, this is not the case in the
> above situation.
> 
> This results in the following crash.
> 
> Unable to handle kernel NULL pointer dereference
> 	at virtual address 00000000000001b0
> ...
> Call trace:
>  bfq_setup_cooperator+0x44/0x134
>  bfq_insert_requests+0x10c/0x630
>  blk_mq_sched_insert_requests+0x60/0xb4
>  blk_mq_flush_plug_list+0x290/0x2d4
>  blk_flush_plug_list+0xe0/0x230
>  blk_finish_plug+0x30/0x40
>  generic_writepages+0x60/0x94
>  blkdev_writepages+0x24/0x30
>  do_writepages+0x74/0xac
>  __filemap_fdatawrite_range+0x94/0xc8
>  file_write_and_wait_range+0x44/0xa0
>  blkdev_fsync+0x38/0x68
>  vfs_fsync_range+0x68/0x80
>  do_fsync+0x44/0x80
> 
> The problem is relatively easy to reproduce by running an image with
> failslab enabled, such as with:
> 
> cd /sys/kernel/debug/failslab
> echo 10 > probability
> echo 300 > times
> 
> Avoid the problem by checking if bfqq is NULL before using it. With the
> NULL check in place, requests with missing io context are queued
> immediately, and the crash is no longer seen.
> 
> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
> Cc: Hsin-Yi Wang <hsinyi@google.com>
> Cc: Nicolas Boichat <drinkcat@chromium.org>
> Cc: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  block/bfq-iosched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 72860325245a..56f3f4227010 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	spin_lock_irq(&bfqd->lock);
>  	bfqq = bfq_init_rq(rq);
> -	if (at_head || blk_rq_is_passthrough(rq)) {
> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>  		if (at_head)
>  			list_add(&rq->queuelist, &bfqd->dispatch);
>  		else
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-22 20:29 ` Guenter Roeck
@ 2019-07-22 23:36   ` Bob Liu
  2019-07-23  0:06     ` Guenter Roeck
  0 siblings, 1 reply; 10+ messages in thread
From: Bob Liu @ 2019-07-22 23:36 UTC (permalink / raw)
  To: Guenter Roeck, Paolo Valente, Jens Axboe
  Cc: linux-block, linux-kernel, Hsin-Yi Wang, Nicolas Boichat, Doug Anderson

On 7/23/19 4:29 AM, Guenter Roeck wrote:
> On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
>> In bfq_insert_request(), bfqq is initialized with:
>> 	bfqq = bfq_init_rq(rq);
>> In bfq_init_rq(), we find:
>> 	if (unlikely(!rq->elv.icq))
>> 		return NULL;
>> Indeed, rq->elv.icq can be NULL if the memory allocation in
>> create_task_io_context() failed.
>>
> 
> The above function should have been ioc_create_icq(), sorry.
> 
> Guenter
> 
>> A comment in bfq_insert_request() suggests that bfqq is supposed to be
>> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
>> debugging and practical experience shows, this is not the case in the
>> above situation.
>>
>> This results in the following crash.
>>
>> Unable to handle kernel NULL pointer dereference
>> 	at virtual address 00000000000001b0
>> ...
>> Call trace:
>>  bfq_setup_cooperator+0x44/0x134
>>  bfq_insert_requests+0x10c/0x630
>>  blk_mq_sched_insert_requests+0x60/0xb4
>>  blk_mq_flush_plug_list+0x290/0x2d4
>>  blk_flush_plug_list+0xe0/0x230
>>  blk_finish_plug+0x30/0x40
>>  generic_writepages+0x60/0x94
>>  blkdev_writepages+0x24/0x30
>>  do_writepages+0x74/0xac
>>  __filemap_fdatawrite_range+0x94/0xc8
>>  file_write_and_wait_range+0x44/0xa0
>>  blkdev_fsync+0x38/0x68
>>  vfs_fsync_range+0x68/0x80
>>  do_fsync+0x44/0x80
>>
>> The problem is relatively easy to reproduce by running an image with
>> failslab enabled, such as with:
>>
>> cd /sys/kernel/debug/failslab
>> echo 10 > probability
>> echo 300 > times
>>
>> Avoid the problem by checking if bfqq is NULL before using it. With the
>> NULL check in place, requests with missing io context are queued
>> immediately, and the crash is no longer seen.
>>


What about other place use blk_init_rq()?
E.g in bfq_request_merged():
1897                 struct bfq_queue *bfqq = bfq_init_rq(req);
1898                 struct bfq_data *bfqd = bfqq->bfqd;

Which may have same Null-pointer bug?

-Bob

>> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
>> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
>> Cc: Hsin-Yi Wang <hsinyi@google.com>
>> Cc: Nicolas Boichat <drinkcat@chromium.org>
>> Cc: Doug Anderson <dianders@chromium.org>
>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> ---
>>  block/bfq-iosched.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 72860325245a..56f3f4227010 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>  
>>  	spin_lock_irq(&bfqd->lock);
>>  	bfqq = bfq_init_rq(rq);
>> -	if (at_head || blk_rq_is_passthrough(rq)) {
>> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>  		if (at_head)
>>  			list_add(&rq->queuelist, &bfqd->dispatch);
>>  		else
>> -- 
>> 2.7.4
>>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-22 23:36   ` Bob Liu
@ 2019-07-23  0:06     ` Guenter Roeck
  0 siblings, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2019-07-23  0:06 UTC (permalink / raw)
  To: Bob Liu
  Cc: Paolo Valente, Jens Axboe, linux-block, linux-kernel,
	Hsin-Yi Wang, Nicolas Boichat, Doug Anderson

On Tue, Jul 23, 2019 at 07:36:16AM +0800, Bob Liu wrote:
> On 7/23/19 4:29 AM, Guenter Roeck wrote:
> > On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
> >> In bfq_insert_request(), bfqq is initialized with:
> >> 	bfqq = bfq_init_rq(rq);
> >> In bfq_init_rq(), we find:
> >> 	if (unlikely(!rq->elv.icq))
> >> 		return NULL;
> >> Indeed, rq->elv.icq can be NULL if the memory allocation in
> >> create_task_io_context() failed.
> >>
> > 
> > The above function should have been ioc_create_icq(), sorry.
> > 
> > Guenter
> > 
> >> A comment in bfq_insert_request() suggests that bfqq is supposed to be
> >> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
> >> debugging and practical experience shows, this is not the case in the
> >> above situation.
> >>
> >> This results in the following crash.
> >>
> >> Unable to handle kernel NULL pointer dereference
> >> 	at virtual address 00000000000001b0
> >> ...
> >> Call trace:
> >>  bfq_setup_cooperator+0x44/0x134
> >>  bfq_insert_requests+0x10c/0x630
> >>  blk_mq_sched_insert_requests+0x60/0xb4
> >>  blk_mq_flush_plug_list+0x290/0x2d4
> >>  blk_flush_plug_list+0xe0/0x230
> >>  blk_finish_plug+0x30/0x40
> >>  generic_writepages+0x60/0x94
> >>  blkdev_writepages+0x24/0x30
> >>  do_writepages+0x74/0xac
> >>  __filemap_fdatawrite_range+0x94/0xc8
> >>  file_write_and_wait_range+0x44/0xa0
> >>  blkdev_fsync+0x38/0x68
> >>  vfs_fsync_range+0x68/0x80
> >>  do_fsync+0x44/0x80
> >>
> >> The problem is relatively easy to reproduce by running an image with
> >> failslab enabled, such as with:
> >>
> >> cd /sys/kernel/debug/failslab
> >> echo 10 > probability
> >> echo 300 > times
> >>
> >> Avoid the problem by checking if bfqq is NULL before using it. With the
> >> NULL check in place, requests with missing io context are queued
> >> immediately, and the crash is no longer seen.
> >>
> 
> 
> What about other place use blk_init_rq()?
> E.g in bfq_request_merged():
> 1897                 struct bfq_queue *bfqq = bfq_init_rq(req);
> 1898                 struct bfq_data *bfqd = bfqq->bfqd;
> 
> Which may have same Null-pointer bug?
> 
Good question. Doug asked it as well. My understanding is that the code
won't hit those places since bfq is essentially bypassed. Of course,
I may be completely wrong, so we should wait for Paolo to confirm
if I my understanding is correct (or wrong).

Thanks,
Guenter

> -Bob
> 
> >> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
> >> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
> >> Cc: Hsin-Yi Wang <hsinyi@google.com>
> >> Cc: Nicolas Boichat <drinkcat@chromium.org>
> >> Cc: Doug Anderson <dianders@chromium.org>
> >> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> >> ---
> >>  block/bfq-iosched.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> >> index 72860325245a..56f3f4227010 100644
> >> --- a/block/bfq-iosched.c
> >> +++ b/block/bfq-iosched.c
> >> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >>  
> >>  	spin_lock_irq(&bfqd->lock);
> >>  	bfqq = bfq_init_rq(rq);
> >> -	if (at_head || blk_rq_is_passthrough(rq)) {
> >> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
> >>  		if (at_head)
> >>  			list_add(&rq->queuelist, &bfqd->dispatch);
> >>  		else
> >> -- 
> >> 2.7.4
> >>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-22 17:30 [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request Guenter Roeck
  2019-07-22 20:29 ` Guenter Roeck
@ 2019-07-28 15:19 ` Guenter Roeck
  2019-07-30  8:55   ` Paolo Valente
  1 sibling, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2019-07-28 15:19 UTC (permalink / raw)
  To: Paolo Valente, Jens Axboe
  Cc: linux-block, linux-kernel, Hsin-Yi Wang, Nicolas Boichat, Doug Anderson

ping ... just in case this patch got lost in Paolo's queue.

Guenter

On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
> In bfq_insert_request(), bfqq is initialized with:
> 	bfqq = bfq_init_rq(rq);
> In bfq_init_rq(), we find:
> 	if (unlikely(!rq->elv.icq))
> 		return NULL;
> Indeed, rq->elv.icq can be NULL if the memory allocation in
> create_task_io_context() failed.
> 
> A comment in bfq_insert_request() suggests that bfqq is supposed to be
> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
> debugging and practical experience shows, this is not the case in the
> above situation.
> 
> This results in the following crash.
> 
> Unable to handle kernel NULL pointer dereference
> 	at virtual address 00000000000001b0
> ...
> Call trace:
>  bfq_setup_cooperator+0x44/0x134
>  bfq_insert_requests+0x10c/0x630
>  blk_mq_sched_insert_requests+0x60/0xb4
>  blk_mq_flush_plug_list+0x290/0x2d4
>  blk_flush_plug_list+0xe0/0x230
>  blk_finish_plug+0x30/0x40
>  generic_writepages+0x60/0x94
>  blkdev_writepages+0x24/0x30
>  do_writepages+0x74/0xac
>  __filemap_fdatawrite_range+0x94/0xc8
>  file_write_and_wait_range+0x44/0xa0
>  blkdev_fsync+0x38/0x68
>  vfs_fsync_range+0x68/0x80
>  do_fsync+0x44/0x80
> 
> The problem is relatively easy to reproduce by running an image with
> failslab enabled, such as with:
> 
> cd /sys/kernel/debug/failslab
> echo 10 > probability
> echo 300 > times
> 
> Avoid the problem by checking if bfqq is NULL before using it. With the
> NULL check in place, requests with missing io context are queued
> immediately, and the crash is no longer seen.
> 
> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
> Cc: Hsin-Yi Wang <hsinyi@google.com>
> Cc: Nicolas Boichat <drinkcat@chromium.org>
> Cc: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  block/bfq-iosched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 72860325245a..56f3f4227010 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	spin_lock_irq(&bfqd->lock);
>  	bfqq = bfq_init_rq(rq);
> -	if (at_head || blk_rq_is_passthrough(rq)) {
> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>  		if (at_head)
>  			list_add(&rq->queuelist, &bfqd->dispatch);
>  		else
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-28 15:19 ` Guenter Roeck
@ 2019-07-30  8:55   ` Paolo Valente
  2019-07-30 13:35     ` Guenter Roeck
  2019-07-30 17:06     ` Guenter Roeck
  0 siblings, 2 replies; 10+ messages in thread
From: Paolo Valente @ 2019-07-30  8:55 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jens Axboe, linux-block, kernel list, Hsin-Yi Wang,
	Nicolas Boichat, Doug Anderson

Hi Guenter,
sorry for the delay (Dolomiti's fault).

I didn't consider that rq->elv-icq might have been NULL also
because of OOM.  Thanks for spotting this issue.

As for the other places where the return value of bfq_init_rq is used,
unfortunately I think they matter too.  Those other places are related
to request merging, which is the alternative destiny of requests
(instead of being just inserted).  But, regardless of whether a
request is to be merged or inserted, that request may be destined to a
bfq_queue (possibly merged with a request already in a bfq_queue), and
a NULL return value by bfq_init_rq leads to a crash.  I guess you can
reproduce your failure also for the merge case, by generating
sequential, direct I/O with queue depth > 1, and of course by enabling
failslab.

If my considerations above are correct, do you want to propose a
complete fix yourself?

Thanks,
Paolo

> Il giorno 28 lug 2019, alle ore 17:19, Guenter Roeck <linux@roeck-us.net> ha scritto:
> 
> ping ... just in case this patch got lost in Paolo's queue.
> 
> Guenter
> 
> On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
>> In bfq_insert_request(), bfqq is initialized with:
>> 	bfqq = bfq_init_rq(rq);
>> In bfq_init_rq(), we find:
>> 	if (unlikely(!rq->elv.icq))
>> 		return NULL;
>> Indeed, rq->elv.icq can be NULL if the memory allocation in
>> create_task_io_context() failed.
>> 
>> A comment in bfq_insert_request() suggests that bfqq is supposed to be
>> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
>> debugging and practical experience shows, this is not the case in the
>> above situation.
>> 
>> This results in the following crash.
>> 
>> Unable to handle kernel NULL pointer dereference
>> 	at virtual address 00000000000001b0
>> ...
>> Call trace:
>> bfq_setup_cooperator+0x44/0x134
>> bfq_insert_requests+0x10c/0x630
>> blk_mq_sched_insert_requests+0x60/0xb4
>> blk_mq_flush_plug_list+0x290/0x2d4
>> blk_flush_plug_list+0xe0/0x230
>> blk_finish_plug+0x30/0x40
>> generic_writepages+0x60/0x94
>> blkdev_writepages+0x24/0x30
>> do_writepages+0x74/0xac
>> __filemap_fdatawrite_range+0x94/0xc8
>> file_write_and_wait_range+0x44/0xa0
>> blkdev_fsync+0x38/0x68
>> vfs_fsync_range+0x68/0x80
>> do_fsync+0x44/0x80
>> 
>> The problem is relatively easy to reproduce by running an image with
>> failslab enabled, such as with:
>> 
>> cd /sys/kernel/debug/failslab
>> echo 10 > probability
>> echo 300 > times
>> 
>> Avoid the problem by checking if bfqq is NULL before using it. With the
>> NULL check in place, requests with missing io context are queued
>> immediately, and the crash is no longer seen.
>> 
>> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
>> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
>> Cc: Hsin-Yi Wang <hsinyi@google.com>
>> Cc: Nicolas Boichat <drinkcat@chromium.org>
>> Cc: Doug Anderson <dianders@chromium.org>
>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> ---
>> block/bfq-iosched.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 72860325245a..56f3f4227010 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>> 
>> 	spin_lock_irq(&bfqd->lock);
>> 	bfqq = bfq_init_rq(rq);
>> -	if (at_head || blk_rq_is_passthrough(rq)) {
>> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>> 		if (at_head)
>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>> 		else
>> -- 
>> 2.7.4
>> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-30  8:55   ` Paolo Valente
@ 2019-07-30 13:35     ` Guenter Roeck
  2019-07-31 10:11       ` Paolo Valente
  2019-07-30 17:06     ` Guenter Roeck
  1 sibling, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2019-07-30 13:35 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, kernel list, Hsin-Yi Wang,
	Nicolas Boichat, Doug Anderson

On 7/30/19 1:55 AM, Paolo Valente wrote:
> Hi Guenter,
> sorry for the delay (Dolomiti's fault).
> 
> I didn't consider that rq->elv-icq might have been NULL also
> because of OOM.  Thanks for spotting this issue.
> 
> As for the other places where the return value of bfq_init_rq is used,
> unfortunately I think they matter too.  Those other places are related
> to request merging, which is the alternative destiny of requests
> (instead of being just inserted).  But, regardless of whether a
> request is to be merged or inserted, that request may be destined to a
> bfq_queue (possibly merged with a request already in a bfq_queue), and
> a NULL return value by bfq_init_rq leads to a crash.  I guess you can
> reproduce your failure also for the merge case, by generating
> sequential, direct I/O with queue depth > 1, and of course by enabling
> failslab.
> 
My assumption was that requests would only be merged if they are associated
with the same io context. In that case, that IO context isn't reallocated
with ioc_create_icq() but reused, and icq would thus never be NULL.
I guess that assumption was wrong.

> If my considerations above are correct, do you want to propose a
> complete fix yourself?
> 

Sure, I'll send an updated patch.

Thanks,
Guenter

> Thanks,
> Paolo
> 
>> Il giorno 28 lug 2019, alle ore 17:19, Guenter Roeck <linux@roeck-us.net> ha scritto:
>>
>> ping ... just in case this patch got lost in Paolo's queue.
>>
>> Guenter
>>
>> On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
>>> In bfq_insert_request(), bfqq is initialized with:
>>> 	bfqq = bfq_init_rq(rq);
>>> In bfq_init_rq(), we find:
>>> 	if (unlikely(!rq->elv.icq))
>>> 		return NULL;
>>> Indeed, rq->elv.icq can be NULL if the memory allocation in
>>> create_task_io_context() failed.
>>>
>>> A comment in bfq_insert_request() suggests that bfqq is supposed to be
>>> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
>>> debugging and practical experience shows, this is not the case in the
>>> above situation.
>>>
>>> This results in the following crash.
>>>
>>> Unable to handle kernel NULL pointer dereference
>>> 	at virtual address 00000000000001b0
>>> ...
>>> Call trace:
>>> bfq_setup_cooperator+0x44/0x134
>>> bfq_insert_requests+0x10c/0x630
>>> blk_mq_sched_insert_requests+0x60/0xb4
>>> blk_mq_flush_plug_list+0x290/0x2d4
>>> blk_flush_plug_list+0xe0/0x230
>>> blk_finish_plug+0x30/0x40
>>> generic_writepages+0x60/0x94
>>> blkdev_writepages+0x24/0x30
>>> do_writepages+0x74/0xac
>>> __filemap_fdatawrite_range+0x94/0xc8
>>> file_write_and_wait_range+0x44/0xa0
>>> blkdev_fsync+0x38/0x68
>>> vfs_fsync_range+0x68/0x80
>>> do_fsync+0x44/0x80
>>>
>>> The problem is relatively easy to reproduce by running an image with
>>> failslab enabled, such as with:
>>>
>>> cd /sys/kernel/debug/failslab
>>> echo 10 > probability
>>> echo 300 > times
>>>
>>> Avoid the problem by checking if bfqq is NULL before using it. With the
>>> NULL check in place, requests with missing io context are queued
>>> immediately, and the crash is no longer seen.
>>>
>>> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
>>> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
>>> Cc: Hsin-Yi Wang <hsinyi@google.com>
>>> Cc: Nicolas Boichat <drinkcat@chromium.org>
>>> Cc: Doug Anderson <dianders@chromium.org>
>>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>>> ---
>>> block/bfq-iosched.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index 72860325245a..56f3f4227010 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>
>>> 	spin_lock_irq(&bfqd->lock);
>>> 	bfqq = bfq_init_rq(rq);
>>> -	if (at_head || blk_rq_is_passthrough(rq)) {
>>> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>> 		if (at_head)
>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>> 		else
>>> -- 
>>> 2.7.4
>>>
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-30  8:55   ` Paolo Valente
  2019-07-30 13:35     ` Guenter Roeck
@ 2019-07-30 17:06     ` Guenter Roeck
  1 sibling, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2019-07-30 17:06 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, kernel list, Hsin-Yi Wang,
	Nicolas Boichat, Doug Anderson

Paolo,

On Tue, Jul 30, 2019 at 10:55:24AM +0200, Paolo Valente wrote:
> Hi Guenter,
> sorry for the delay (Dolomiti's fault).
> 
> I didn't consider that rq->elv-icq might have been NULL also
> because of OOM.  Thanks for spotting this issue.
> 
> As for the other places where the return value of bfq_init_rq is used,
> unfortunately I think they matter too.  Those other places are related
> to request merging, which is the alternative destiny of requests
> (instead of being just inserted).  But, regardless of whether a
> request is to be merged or inserted, that request may be destined to a
> bfq_queue (possibly merged with a request already in a bfq_queue), and
> a NULL return value by bfq_init_rq leads to a crash.  I guess you can
> reproduce your failure also for the merge case, by generating
> sequential, direct I/O with queue depth > 1, and of course by enabling
> failslab.
> 
> If my considerations above are correct, do you want to propose a
> complete fix yourself?
> 

I had another look into the code. Unfortunately, both bfq_request_merged()
and bfq_requests_merged() simply assume that bfq_init_rq() never returns
NULL, and don't give me an idea for a path of action if it returns NULL
after all. I'll have to pass the problem off to you for a fix.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-30 13:35     ` Guenter Roeck
@ 2019-07-31 10:11       ` Paolo Valente
  2019-07-31 13:20         ` Guenter Roeck
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Valente @ 2019-07-31 10:11 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jens Axboe, linux-block, kernel list, Hsin-Yi Wang,
	Nicolas Boichat, Doug Anderson



> Il giorno 30 lug 2019, alle ore 15:35, Guenter Roeck <linux@roeck-us.net> ha scritto:
> 
> On 7/30/19 1:55 AM, Paolo Valente wrote:
>> Hi Guenter,
>> sorry for the delay (Dolomiti's fault).
>> I didn't consider that rq->elv-icq might have been NULL also
>> because of OOM.  Thanks for spotting this issue.
>> As for the other places where the return value of bfq_init_rq is used,
>> unfortunately I think they matter too.  Those other places are related
>> to request merging, which is the alternative destiny of requests
>> (instead of being just inserted).  But, regardless of whether a
>> request is to be merged or inserted, that request may be destined to a
>> bfq_queue (possibly merged with a request already in a bfq_queue), and
>> a NULL return value by bfq_init_rq leads to a crash.  I guess you can
>> reproduce your failure also for the merge case, by generating
>> sequential, direct I/O with queue depth > 1, and of course by enabling
>> failslab.
> My assumption was that requests would only be merged if they are associated
> with the same io context. In that case, that IO context isn't reallocated
> with ioc_create_icq() but reused, and icq would thus never be NULL.
> I guess that assumption was wrong.

I don't remember such a filtering.  I had a look again, but didn't
find anything relevant.  However, more competent people see these
emails.  Maybe someone can give us better advice.  Otherwise, to stay
on the safe side, I'd propose to handle any possible NULL return.

And I'll manage it, as per your request.

Thanks,
Paolo

> 
>> If my considerations above are correct, do you want to propose a
>> complete fix yourself?
> 
> Sure, I'll send an updated patch.
> 
> Thanks,
> Guenter
> 
>> Thanks,
>> Paolo
>>> Il giorno 28 lug 2019, alle ore 17:19, Guenter Roeck <linux@roeck-us.net> ha scritto:
>>> 
>>> ping ... just in case this patch got lost in Paolo's queue.
>>> 
>>> Guenter
>>> 
>>> On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
>>>> In bfq_insert_request(), bfqq is initialized with:
>>>> 	bfqq = bfq_init_rq(rq);
>>>> In bfq_init_rq(), we find:
>>>> 	if (unlikely(!rq->elv.icq))
>>>> 		return NULL;
>>>> Indeed, rq->elv.icq can be NULL if the memory allocation in
>>>> create_task_io_context() failed.
>>>> 
>>>> A comment in bfq_insert_request() suggests that bfqq is supposed to be
>>>> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
>>>> debugging and practical experience shows, this is not the case in the
>>>> above situation.
>>>> 
>>>> This results in the following crash.
>>>> 
>>>> Unable to handle kernel NULL pointer dereference
>>>> 	at virtual address 00000000000001b0
>>>> ...
>>>> Call trace:
>>>> bfq_setup_cooperator+0x44/0x134
>>>> bfq_insert_requests+0x10c/0x630
>>>> blk_mq_sched_insert_requests+0x60/0xb4
>>>> blk_mq_flush_plug_list+0x290/0x2d4
>>>> blk_flush_plug_list+0xe0/0x230
>>>> blk_finish_plug+0x30/0x40
>>>> generic_writepages+0x60/0x94
>>>> blkdev_writepages+0x24/0x30
>>>> do_writepages+0x74/0xac
>>>> __filemap_fdatawrite_range+0x94/0xc8
>>>> file_write_and_wait_range+0x44/0xa0
>>>> blkdev_fsync+0x38/0x68
>>>> vfs_fsync_range+0x68/0x80
>>>> do_fsync+0x44/0x80
>>>> 
>>>> The problem is relatively easy to reproduce by running an image with
>>>> failslab enabled, such as with:
>>>> 
>>>> cd /sys/kernel/debug/failslab
>>>> echo 10 > probability
>>>> echo 300 > times
>>>> 
>>>> Avoid the problem by checking if bfqq is NULL before using it. With the
>>>> NULL check in place, requests with missing io context are queued
>>>> immediately, and the crash is no longer seen.
>>>> 
>>>> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
>>>> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
>>>> Cc: Hsin-Yi Wang <hsinyi@google.com>
>>>> Cc: Nicolas Boichat <drinkcat@chromium.org>
>>>> Cc: Doug Anderson <dianders@chromium.org>
>>>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>>>> ---
>>>> block/bfq-iosched.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>> index 72860325245a..56f3f4227010 100644
>>>> --- a/block/bfq-iosched.c
>>>> +++ b/block/bfq-iosched.c
>>>> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>> 
>>>> 	spin_lock_irq(&bfqd->lock);
>>>> 	bfqq = bfq_init_rq(rq);
>>>> -	if (at_head || blk_rq_is_passthrough(rq)) {
>>>> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>>> 		if (at_head)
>>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>>> 		else
>>>> -- 
>>>> 2.7.4
>>>> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request
  2019-07-31 10:11       ` Paolo Valente
@ 2019-07-31 13:20         ` Guenter Roeck
  0 siblings, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2019-07-31 13:20 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, kernel list, Hsin-Yi Wang,
	Nicolas Boichat, Doug Anderson

On 7/31/19 3:11 AM, Paolo Valente wrote:
> 
> 
>> Il giorno 30 lug 2019, alle ore 15:35, Guenter Roeck <linux@roeck-us.net> ha scritto:
>>
>> On 7/30/19 1:55 AM, Paolo Valente wrote:
>>> Hi Guenter,
>>> sorry for the delay (Dolomiti's fault).
>>> I didn't consider that rq->elv-icq might have been NULL also
>>> because of OOM.  Thanks for spotting this issue.
>>> As for the other places where the return value of bfq_init_rq is used,
>>> unfortunately I think they matter too.  Those other places are related
>>> to request merging, which is the alternative destiny of requests
>>> (instead of being just inserted).  But, regardless of whether a
>>> request is to be merged or inserted, that request may be destined to a
>>> bfq_queue (possibly merged with a request already in a bfq_queue), and
>>> a NULL return value by bfq_init_rq leads to a crash.  I guess you can
>>> reproduce your failure also for the merge case, by generating
>>> sequential, direct I/O with queue depth > 1, and of course by enabling
>>> failslab.
>> My assumption was that requests would only be merged if they are associated
>> with the same io context. In that case, that IO context isn't reallocated
>> with ioc_create_icq() but reused, and icq would thus never be NULL.
>> I guess that assumption was wrong.
> 
> I don't remember such a filtering.  I had a look again, but didn't
> find anything relevant.  However, more competent people see these

Me not either, when I had a closer look yesterday. My conclusion was
that your analysis is correct.

Thanks,
Guenter

> emails.  Maybe someone can give us better advice.  Otherwise, to stay
> on the safe side, I'd propose to handle any possible NULL return.
> 
> And I'll manage it, as per your request.
> 
> Thanks,
> Paolo
> 
>>
>>> If my considerations above are correct, do you want to propose a
>>> complete fix yourself?
>>
>> Sure, I'll send an updated patch.
>>
>> Thanks,
>> Guenter
>>
>>> Thanks,
>>> Paolo
>>>> Il giorno 28 lug 2019, alle ore 17:19, Guenter Roeck <linux@roeck-us.net> ha scritto:
>>>>
>>>> ping ... just in case this patch got lost in Paolo's queue.
>>>>
>>>> Guenter
>>>>
>>>> On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
>>>>> In bfq_insert_request(), bfqq is initialized with:
>>>>> 	bfqq = bfq_init_rq(rq);
>>>>> In bfq_init_rq(), we find:
>>>>> 	if (unlikely(!rq->elv.icq))
>>>>> 		return NULL;
>>>>> Indeed, rq->elv.icq can be NULL if the memory allocation in
>>>>> create_task_io_context() failed.
>>>>>
>>>>> A comment in bfq_insert_request() suggests that bfqq is supposed to be
>>>>> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
>>>>> debugging and practical experience shows, this is not the case in the
>>>>> above situation.
>>>>>
>>>>> This results in the following crash.
>>>>>
>>>>> Unable to handle kernel NULL pointer dereference
>>>>> 	at virtual address 00000000000001b0
>>>>> ...
>>>>> Call trace:
>>>>> bfq_setup_cooperator+0x44/0x134
>>>>> bfq_insert_requests+0x10c/0x630
>>>>> blk_mq_sched_insert_requests+0x60/0xb4
>>>>> blk_mq_flush_plug_list+0x290/0x2d4
>>>>> blk_flush_plug_list+0xe0/0x230
>>>>> blk_finish_plug+0x30/0x40
>>>>> generic_writepages+0x60/0x94
>>>>> blkdev_writepages+0x24/0x30
>>>>> do_writepages+0x74/0xac
>>>>> __filemap_fdatawrite_range+0x94/0xc8
>>>>> file_write_and_wait_range+0x44/0xa0
>>>>> blkdev_fsync+0x38/0x68
>>>>> vfs_fsync_range+0x68/0x80
>>>>> do_fsync+0x44/0x80
>>>>>
>>>>> The problem is relatively easy to reproduce by running an image with
>>>>> failslab enabled, such as with:
>>>>>
>>>>> cd /sys/kernel/debug/failslab
>>>>> echo 10 > probability
>>>>> echo 300 > times
>>>>>
>>>>> Avoid the problem by checking if bfqq is NULL before using it. With the
>>>>> NULL check in place, requests with missing io context are queued
>>>>> immediately, and the crash is no longer seen.
>>>>>
>>>>> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
>>>>> Reported-by: Hsin-Yi Wang  <hsinyi@google.com>
>>>>> Cc: Hsin-Yi Wang <hsinyi@google.com>
>>>>> Cc: Nicolas Boichat <drinkcat@chromium.org>
>>>>> Cc: Doug Anderson <dianders@chromium.org>
>>>>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>>>>> ---
>>>>> block/bfq-iosched.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>>> index 72860325245a..56f3f4227010 100644
>>>>> --- a/block/bfq-iosched.c
>>>>> +++ b/block/bfq-iosched.c
>>>>> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>>>
>>>>> 	spin_lock_irq(&bfqd->lock);
>>>>> 	bfqq = bfq_init_rq(rq);
>>>>> -	if (at_head || blk_rq_is_passthrough(rq)) {
>>>>> +	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>>>> 		if (at_head)
>>>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>>>> 		else
>>>>> -- 
>>>>> 2.7.4
>>>>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-07-31 13:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-22 17:30 [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request Guenter Roeck
2019-07-22 20:29 ` Guenter Roeck
2019-07-22 23:36   ` Bob Liu
2019-07-23  0:06     ` Guenter Roeck
2019-07-28 15:19 ` Guenter Roeck
2019-07-30  8:55   ` Paolo Valente
2019-07-30 13:35     ` Guenter Roeck
2019-07-31 10:11       ` Paolo Valente
2019-07-31 13:20         ` Guenter Roeck
2019-07-30 17:06     ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).