linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
@ 2021-08-26  3:00 luojiaxing
  2021-08-26  7:29 ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: luojiaxing @ 2021-08-26  3:00 UTC (permalink / raw)
  To: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	john.garry, ming.lei
  Cc: luojiaxing

Dear all:


I meet some problem when test hisi_sas driver(under SCSI) based on 
5.14-rc4 kernel, it's found that error handle can not be triggered after

abnormal IO occur in some test with a low probability. For example, 
circularly run disk hardreset or disable all local phy of expander when 
running fio.


We add some tracepoint and print to see what happen, and we got the 
following information:

(1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is 
running now.

<4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
<4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
<4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
<4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
<4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
<4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
<4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1

The preceding information show that rq with tag[2808] is found in 
different hctx by bt_tags_iter() and with different pointer saved in 
tags->rqs[].

And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. 
It's wrong because our driver share tag between all hctx, so it's not 
possible

to allocate one tag to different rq.


(2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and 
blk_mq_put_tag() to see where this tag is come from.

     Line 1322969:            <...>-20189   [013] .... 893.427707: 
blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
     Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814: 
blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
     Line 1331257:            <...>-20189   [013] .... 893.462663: 
blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
     Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785: 
blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
     Line 1338493:            <...>-20189   [013] .... 893.493519: 
blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100

As we can see this rq is allocated to tag[2808] once, and finially come 
to tag[2921], but rqs[2808] still save the pointer.

There will be no problem until we encounter a rare situation.

For example, tag[2808] is reassigned to another hctx for execution, then 
some IO meet some error.

Before waking up the error handle thread, SCSI compares the values of 
scsi_host_busy() and shost->host_failed.

If the values are different, SCSI waits for the completion of some I/Os. 
According to the print provided by (1), the return value of 
scsi_host_busy() should be 7 for tag [2808] is calculated twice,

and the value of shost->host_failed is 6. As a result, this two values 
are never equal, and error handle cannot be triggered.


A temporary workaround is provided and can solve the problem as:

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731..e3dc773 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct 
blk_mq_ctx *ctx,
                 BUG_ON(tag >= tags->nr_reserved_tags);
                 sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
         }
+       tags->rqs[tag] = NULL;
  }


Since we did not encounter this problem in some previous kernel 
versions, we wondered if the community already knew about the problem or 
could provide some solutions.


Thanks

Jiaxing




^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
  2021-08-26  3:00 rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered luojiaxing
@ 2021-08-26  7:29 ` Ming Lei
  2021-08-31  2:27   ` luojiaxing
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2021-08-26  7:29 UTC (permalink / raw)
  To: luojiaxing
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen, john.garry

On Thu, Aug 26, 2021 at 11:00:34AM +0800, luojiaxing wrote:
> Dear all:
> 
> 
> I meet some problem when test hisi_sas driver(under SCSI) based on 5.14-rc4
> kernel, it's found that error handle can not be triggered after
> 
> abnormal IO occur in some test with a low probability. For example,
> circularly run disk hardreset or disable all local phy of expander when
> running fio.
> 
> 
> We add some tracepoint and print to see what happen, and we got the
> following information:
> 
> (1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is
> running now.
> 
> <4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
> <4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
> <4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
> <4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
> <4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
> <4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
> <4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1
> 
> The preceding information show that rq with tag[2808] is found in different
> hctx by bt_tags_iter() and with different pointer saved in tags->rqs[].
> 
> And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. It's
> wrong because our driver share tag between all hctx, so it's not possible

What is your io scheduler? I guess it is deadline, and can you observe
such issue by switching to none?

The tricky thing is that one request dumped may be re-allocated to other tag
after returning from bt_tags_iter().

> 
> to allocate one tag to different rq.
> 
> 
> (2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and
> blk_mq_put_tag() to see where this tag is come from.
> 
>     Line 1322969:            <...>-20189   [013] .... 893.427707:
> blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
>     Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814:
> blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
>     Line 1331257:            <...>-20189   [013] .... 893.462663:
> blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
>     Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785:
> blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
>     Line 1338493:            <...>-20189   [013] .... 893.493519:
> blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100
> 
> As we can see this rq is allocated to tag[2808] once, and finially come to
> tag[2921], but rqs[2808] still save the pointer.

Yeah, we know this kind of handling, but not see it as issue.

> 
> There will be no problem until we encounter a rare situation.
> 
> For example, tag[2808] is reassigned to another hctx for execution, then
> some IO meet some error.

I guess the race is triggered when 2808 is just assigned, meantime
->rqs[] isn't updated.

> 
> Before waking up the error handle thread, SCSI compares the values of
> scsi_host_busy() and shost->host_failed.
> 
> If the values are different, SCSI waits for the completion of some I/Os.
> According to the print provided by (1), the return value of scsi_host_busy()
> should be 7 for tag [2808] is calculated twice,
> 
> and the value of shost->host_failed is 6. As a result, this two values are
> never equal, and error handle cannot be triggered.
> 
> 
> A temporary workaround is provided and can solve the problem as:
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 2a37731..e3dc773 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct
> blk_mq_ctx *ctx,
>                 BUG_ON(tag >= tags->nr_reserved_tags);
>                 sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
>         }
> +       tags->rqs[tag] = NULL;
>  }
> 
> 
> Since we did not encounter this problem in some previous kernel versions, we
> wondered if the community already knew about the problem or could provide
> some solutions.

Can you try the following patch?

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 86f87346232a..97557ba0737f 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -301,7 +301,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 		return true;
 
 	if (!(iter_data->flags & BT_TAG_ITER_STARTED) ||
-	    blk_mq_request_started(rq))
+	    (blk_mq_request_started(rq) && rq->tag == bitnr))
 		ret = iter_data->fn(rq, iter_data->data, reserved);
 	if (!iter_static_rqs)
 		blk_mq_put_rq_ref(rq);



Thanks,
Ming


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
  2021-08-26  7:29 ` Ming Lei
@ 2021-08-31  2:27   ` luojiaxing
  2021-08-31  8:37     ` Ming Lei
  2021-09-01  9:36     ` Ming Lei
  0 siblings, 2 replies; 8+ messages in thread
From: luojiaxing @ 2021-08-31  2:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen, john.garry

Hi, Ming


Sorry to reply so late, This issue occur in low probability,

so it take some time to confirm.


On 2021/8/26 15:29, Ming Lei wrote:
> On Thu, Aug 26, 2021 at 11:00:34AM +0800, luojiaxing wrote:
>> Dear all:
>>
>>
>> I meet some problem when test hisi_sas driver(under SCSI) based on 5.14-rc4
>> kernel, it's found that error handle can not be triggered after
>>
>> abnormal IO occur in some test with a low probability. For example,
>> circularly run disk hardreset or disable all local phy of expander when
>> running fio.
>>
>>
>> We add some tracepoint and print to see what happen, and we got the
>> following information:
>>
>> (1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is
>> running now.
>>
>> <4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
>> <4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
>> <4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
>> <4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
>> <4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
>> <4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
>> <4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1
>>
>> The preceding information show that rq with tag[2808] is found in different
>> hctx by bt_tags_iter() and with different pointer saved in tags->rqs[].
>>
>> And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. It's
>> wrong because our driver share tag between all hctx, so it's not possible
> What is your io scheduler? I guess it is deadline,


yes


>   and can you observe
> such issue by switching to none?


Yes, it happen when switched to none


>
> The tricky thing is that one request dumped may be re-allocated to other tag
> after returning from bt_tags_iter().
>
>> to allocate one tag to different rq.
>>
>>
>> (2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and
>> blk_mq_put_tag() to see where this tag is come from.
>>
>>      Line 1322969:            <...>-20189   [013] .... 893.427707:
>> blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
>>      Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814:
>> blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
>>      Line 1331257:            <...>-20189   [013] .... 893.462663:
>> blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
>>      Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785:
>> blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
>>      Line 1338493:            <...>-20189   [013] .... 893.493519:
>> blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100
>>
>> As we can see this rq is allocated to tag[2808] once, and finially come to
>> tag[2921], but rqs[2808] still save the pointer.
> Yeah, we know this kind of handling, but not see it as issue.
>
>> There will be no problem until we encounter a rare situation.
>>
>> For example, tag[2808] is reassigned to another hctx for execution, then
>> some IO meet some error.
> I guess the race is triggered when 2808 is just assigned, meantime
> ->rqs[] isn't updated.


As we shared tag between hctx, so if 2808 was assinged to other hctx.

So previous hctx's rqs will not updated。


>> Before waking up the error handle thread, SCSI compares the values of
>> scsi_host_busy() and shost->host_failed.
>>
>> If the values are different, SCSI waits for the completion of some I/Os.
>> According to the print provided by (1), the return value of scsi_host_busy()
>> should be 7 for tag [2808] is calculated twice,
>>
>> and the value of shost->host_failed is 6. As a result, this two values are
>> never equal, and error handle cannot be triggered.
>>
>>
>> A temporary workaround is provided and can solve the problem as:
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 2a37731..e3dc773 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct
>> blk_mq_ctx *ctx,
>>                  BUG_ON(tag >= tags->nr_reserved_tags);
>>                  sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
>>          }
>> +       tags->rqs[tag] = NULL;
>>   }
>>
>>
>> Since we did not encounter this problem in some previous kernel versions, we
>> wondered if the community already knew about the problem or could provide
>> some solutions.
> Can you try the following patch?


I tested it. it can fix the bug.


However, if there is still a problem in the following scenario? For 
example, driver tag 0 is assigned

to rq0 in hctx0, and reclaimed after rq completed. Next time driver tag 
0 is still assigned to rq0 but

in hctx1. So at this time,  bt_tags_iter will still got two rqs.


Thanks

Jiaxing


>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 86f87346232a..97557ba0737f 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -301,7 +301,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>   		return true;
>   
>   	if (!(iter_data->flags & BT_TAG_ITER_STARTED) ||
> -	    blk_mq_request_started(rq))
> +	    (blk_mq_request_started(rq) && rq->tag == bitnr))
>   		ret = iter_data->fn(rq, iter_data->data, reserved);
>   	if (!iter_static_rqs)
>   		blk_mq_put_rq_ref(rq);
>
>
>
> Thanks,
> Ming
>
>
> .
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
  2021-08-31  2:27   ` luojiaxing
@ 2021-08-31  8:37     ` Ming Lei
  2021-09-01  7:07       ` luojiaxing
  2021-09-01  9:36     ` Ming Lei
  1 sibling, 1 reply; 8+ messages in thread
From: Ming Lei @ 2021-08-31  8:37 UTC (permalink / raw)
  To: luojiaxing
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen, john.garry

On Tue, Aug 31, 2021 at 10:27:28AM +0800, luojiaxing wrote:
> Hi, Ming
> 
> 
> Sorry to reply so late, This issue occur in low probability,
> 
> so it take some time to confirm.
> 
> 
> On 2021/8/26 15:29, Ming Lei wrote:
> > On Thu, Aug 26, 2021 at 11:00:34AM +0800, luojiaxing wrote:
> > > Dear all:
> > > 
> > > 
> > > I meet some problem when test hisi_sas driver(under SCSI) based on 5.14-rc4
> > > kernel, it's found that error handle can not be triggered after
> > > 
> > > abnormal IO occur in some test with a low probability. For example,
> > > circularly run disk hardreset or disable all local phy of expander when
> > > running fio.
> > > 
> > > 
> > > We add some tracepoint and print to see what happen, and we got the
> > > following information:
> > > 
> > > (1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is
> > > running now.
> > > 
> > > <4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
> > > <4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
> > > <4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
> > > <4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
> > > <4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
> > > <4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
> > > <4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1
> > > 
> > > The preceding information show that rq with tag[2808] is found in different
> > > hctx by bt_tags_iter() and with different pointer saved in tags->rqs[].
> > > 
> > > And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. It's
> > > wrong because our driver share tag between all hctx, so it's not possible
> > What is your io scheduler? I guess it is deadline,
> 
> 
> yes
> 
> 
> >   and can you observe
> > such issue by switching to none?
> 
> 
> Yes, it happen when switched to none
> 
> 
> > 
> > The tricky thing is that one request dumped may be re-allocated to other tag
> > after returning from bt_tags_iter().
> > 
> > > to allocate one tag to different rq.
> > > 
> > > 
> > > (2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and
> > > blk_mq_put_tag() to see where this tag is come from.
> > > 
> > >      Line 1322969:            <...>-20189   [013] .... 893.427707:
> > > blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
> > >      Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814:
> > > blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
> > >      Line 1331257:            <...>-20189   [013] .... 893.462663:
> > > blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
> > >      Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785:
> > > blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
> > >      Line 1338493:            <...>-20189   [013] .... 893.493519:
> > > blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100
> > > 
> > > As we can see this rq is allocated to tag[2808] once, and finially come to
> > > tag[2921], but rqs[2808] still save the pointer.
> > Yeah, we know this kind of handling, but not see it as issue.
> > 
> > > There will be no problem until we encounter a rare situation.
> > > 
> > > For example, tag[2808] is reassigned to another hctx for execution, then
> > > some IO meet some error.
> > I guess the race is triggered when 2808 is just assigned, meantime
> > ->rqs[] isn't updated.
> 
> 
> As we shared tag between hctx, so if 2808 was assinged to other hctx.
> 
> So previous hctx's rqs will not updated。

request->state is updated before releasing the tag, and we always grab
request refcount which prevents the tag from being re-used.

scsi_host_busy() counts the transient host state, and it is fine to see
same request reused from different tags, but when ->fn() is calling, the
request's state is started, as you observed in the log 1.

> 
> 
> > > Before waking up the error handle thread, SCSI compares the values of
> > > scsi_host_busy() and shost->host_failed.
> > > 
> > > If the values are different, SCSI waits for the completion of some I/Os.
> > > According to the print provided by (1), the return value of scsi_host_busy()
> > > should be 7 for tag [2808] is calculated twice,
> > > 
> > > and the value of shost->host_failed is 6. As a result, this two values are
> > > never equal, and error handle cannot be triggered.
> > > 
> > > 
> > > A temporary workaround is provided and can solve the problem as:
> > > 
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 2a37731..e3dc773 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct
> > > blk_mq_ctx *ctx,
> > >                  BUG_ON(tag >= tags->nr_reserved_tags);
> > >                  sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
> > >          }
> > > +       tags->rqs[tag] = NULL;
> > >   }
> > > 
> > > 
> > > Since we did not encounter this problem in some previous kernel versions, we
> > > wondered if the community already knew about the problem or could provide
> > > some solutions.
> > Can you try the following patch?
> 
> 
> I tested it. it can fix the bug.
> 
> 
> However, if there is still a problem in the following scenario? For example,
> driver tag 0 is assigned
> 
> to rq0 in hctx0, and reclaimed after rq completed. Next time driver tag 0 is
> still assigned to rq0 but
> 
> in hctx1. So at this time,  bt_tags_iter will still got two rqs.

Looks I missed that you were talking about shared sbitmap, maybe you
need the following change? Otherwise you may get duplicated counting
since all tags points to single same&shared tags.

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 86f87346232a..36fc696e83d9 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -380,8 +380,10 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	int i;
+	int queues = blk_mq_is_sbitmap_shared(tagset->flags) ? 1:
+		tagset->nr_hw_queues;
 
-	for (i = 0; i < tagset->nr_hw_queues; i++) {
+	for (i = 0; i < queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
 					      BT_TAG_ITER_STARTED);


Thanks,
Ming


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
  2021-08-31  8:37     ` Ming Lei
@ 2021-09-01  7:07       ` luojiaxing
  0 siblings, 0 replies; 8+ messages in thread
From: luojiaxing @ 2021-09-01  7:07 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen, john.garry


On 2021/8/31 16:37, Ming Lei wrote:
> On Tue, Aug 31, 2021 at 10:27:28AM +0800, luojiaxing wrote:
>> Hi, Ming
>>
>>
>> Sorry to reply so late, This issue occur in low probability,
>>
>> so it take some time to confirm.
>>
>>
>> On 2021/8/26 15:29, Ming Lei wrote:
>>> On Thu, Aug 26, 2021 at 11:00:34AM +0800, luojiaxing wrote:
>>>> Dear all:
>>>>
>>>>
>>>> I meet some problem when test hisi_sas driver(under SCSI) based on 5.14-rc4
>>>> kernel, it's found that error handle can not be triggered after
>>>>
>>>> abnormal IO occur in some test with a low probability. For example,
>>>> circularly run disk hardreset or disable all local phy of expander when
>>>> running fio.
>>>>
>>>>
>>>> We add some tracepoint and print to see what happen, and we got the
>>>> following information:
>>>>
>>>> (1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is
>>>> running now.
>>>>
>>>> <4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
>>>> <4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
>>>> <4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
>>>> <4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
>>>> <4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
>>>> <4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
>>>> <4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1
>>>>
>>>> The preceding information show that rq with tag[2808] is found in different
>>>> hctx by bt_tags_iter() and with different pointer saved in tags->rqs[].
>>>>
>>>> And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. It's
>>>> wrong because our driver share tag between all hctx, so it's not possible
>>> What is your io scheduler? I guess it is deadline,
>>
>> yes
>>
>>
>>>    and can you observe
>>> such issue by switching to none?
>>
>> Yes, it happen when switched to none
>>
>>
>>> The tricky thing is that one request dumped may be re-allocated to other tag
>>> after returning from bt_tags_iter().
>>>
>>>> to allocate one tag to different rq.
>>>>
>>>>
>>>> (2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and
>>>> blk_mq_put_tag() to see where this tag is come from.
>>>>
>>>>       Line 1322969:            <...>-20189   [013] .... 893.427707:
>>>> blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
>>>>       Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814:
>>>> blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
>>>>       Line 1331257:            <...>-20189   [013] .... 893.462663:
>>>> blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
>>>>       Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785:
>>>> blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
>>>>       Line 1338493:            <...>-20189   [013] .... 893.493519:
>>>> blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100
>>>>
>>>> As we can see this rq is allocated to tag[2808] once, and finially come to
>>>> tag[2921], but rqs[2808] still save the pointer.
>>> Yeah, we know this kind of handling, but not see it as issue.
>>>
>>>> There will be no problem until we encounter a rare situation.
>>>>
>>>> For example, tag[2808] is reassigned to another hctx for execution, then
>>>> some IO meet some error.
>>> I guess the race is triggered when 2808 is just assigned, meantime
>>> ->rqs[] isn't updated.
>>
>> As we shared tag between hctx, so if 2808 was assinged to other hctx.
>>
>> So previous hctx's rqs will not updated。
> request->state is updated before releasing the tag, and we always grab
> request refcount which prevents the tag from being re-used.
>
> scsi_host_busy() counts the transient host state, and it is fine to see
> same request reused from different tags, but when ->fn() is calling, the
> request's state is started, as you observed in the log 1.
>
>>
>>>> Before waking up the error handle thread, SCSI compares the values of
>>>> scsi_host_busy() and shost->host_failed.
>>>>
>>>> If the values are different, SCSI waits for the completion of some I/Os.
>>>> According to the print provided by (1), the return value of scsi_host_busy()
>>>> should be 7 for tag [2808] is calculated twice,
>>>>
>>>> and the value of shost->host_failed is 6. As a result, this two values are
>>>> never equal, and error handle cannot be triggered.
>>>>
>>>>
>>>> A temporary workaround is provided and can solve the problem as:
>>>>
>>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>>> index 2a37731..e3dc773 100644
>>>> --- a/block/blk-mq-tag.c
>>>> +++ b/block/blk-mq-tag.c
>>>> @@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct
>>>> blk_mq_ctx *ctx,
>>>>                   BUG_ON(tag >= tags->nr_reserved_tags);
>>>>                   sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
>>>>           }
>>>> +       tags->rqs[tag] = NULL;
>>>>    }
>>>>
>>>>
>>>> Since we did not encounter this problem in some previous kernel versions, we
>>>> wondered if the community already knew about the problem or could provide
>>>> some solutions.
>>> Can you try the following patch?
>>
>> I tested it. it can fix the bug.
>>
>>
>> However, if there is still a problem in the following scenario? For example,
>> driver tag 0 is assigned
>>
>> to rq0 in hctx0, and reclaimed after rq completed. Next time driver tag 0 is
>> still assigned to rq0 but
>>
>> in hctx1. So at this time,  bt_tags_iter will still got two rqs.
> Looks I missed that you were talking about shared sbitmap, maybe you
> need the following change? Otherwise you may get duplicated counting
> since all tags points to single same&shared tags.


I test the following change, it can not fix the issue.

BTW, even if my HW queue shared sbitmap, but each queue didn't shared 
blk_mq_tags. so the following change look strange.


I wonder why not clear the rq pointer saved in tags->rqs[], I don't see 
any benefit to keep these pointer after tag is put.


Thanks

Jiaxing


>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 86f87346232a..36fc696e83d9 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -380,8 +380,10 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
>   		busy_tag_iter_fn *fn, void *priv)
>   {
>   	int i;
> +	int queues = blk_mq_is_sbitmap_shared(tagset->flags) ? 1:
> +		tagset->nr_hw_queues;
>   
> -	for (i = 0; i < tagset->nr_hw_queues; i++) {
> +	for (i = 0; i < queues; i++) {
>   		if (tagset->tags && tagset->tags[i])
>   			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
>   					      BT_TAG_ITER_STARTED);
>
>
> Thanks,
> Ming
>
>
> .
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
  2021-08-31  2:27   ` luojiaxing
  2021-08-31  8:37     ` Ming Lei
@ 2021-09-01  9:36     ` Ming Lei
  2021-09-01  9:50       ` Ming Lei
  2021-09-06  8:16       ` luojiaxing
  1 sibling, 2 replies; 8+ messages in thread
From: Ming Lei @ 2021-09-01  9:36 UTC (permalink / raw)
  To: luojiaxing
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen, john.garry

On Tue, Aug 31, 2021 at 10:27:28AM +0800, luojiaxing wrote:
> Hi, Ming
> 
> 
> Sorry to reply so late, This issue occur in low probability,
> 
> so it take some time to confirm.
> 
> 
> On 2021/8/26 15:29, Ming Lei wrote:
> > On Thu, Aug 26, 2021 at 11:00:34AM +0800, luojiaxing wrote:
> > > Dear all:
> > > 
> > > 
> > > I meet some problem when test hisi_sas driver(under SCSI) based on 5.14-rc4
> > > kernel, it's found that error handle can not be triggered after
> > > 
> > > abnormal IO occur in some test with a low probability. For example,
> > > circularly run disk hardreset or disable all local phy of expander when
> > > running fio.
> > > 
> > > 
> > > We add some tracepoint and print to see what happen, and we got the
> > > following information:
> > > 
> > > (1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is
> > > running now.
> > > 
> > > <4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
> > > <4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
> > > <4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
> > > <4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
> > > <4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
> > > <4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
> > > <4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1
> > > 
> > > The preceding information show that rq with tag[2808] is found in different
> > > hctx by bt_tags_iter() and with different pointer saved in tags->rqs[].
> > > 
> > > And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. It's
> > > wrong because our driver share tag between all hctx, so it's not possible
> > What is your io scheduler? I guess it is deadline,
> 
> 
> yes
> 
> 
> >   and can you observe
> > such issue by switching to none?
> 
> 
> Yes, it happen when switched to none
> 
> 
> > 
> > The tricky thing is that one request dumped may be re-allocated to other tag
> > after returning from bt_tags_iter().
> > 
> > > to allocate one tag to different rq.
> > > 
> > > 
> > > (2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and
> > > blk_mq_put_tag() to see where this tag is come from.
> > > 
> > >      Line 1322969:            <...>-20189   [013] .... 893.427707:
> > > blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
> > >      Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814:
> > > blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
> > >      Line 1331257:            <...>-20189   [013] .... 893.462663:
> > > blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
> > >      Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785:
> > > blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
> > >      Line 1338493:            <...>-20189   [013] .... 893.493519:
> > > blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100
> > > 
> > > As we can see this rq is allocated to tag[2808] once, and finially come to
> > > tag[2921], but rqs[2808] still save the pointer.
> > Yeah, we know this kind of handling, but not see it as issue.
> > 
> > > There will be no problem until we encounter a rare situation.
> > > 
> > > For example, tag[2808] is reassigned to another hctx for execution, then
> > > some IO meet some error.
> > I guess the race is triggered when 2808 is just assigned, meantime
> > ->rqs[] isn't updated.
> 
> 
> As we shared tag between hctx, so if 2808 was assinged to other hctx.
> 
> So previous hctx's rqs will not updated。
> 
> 
> > > Before waking up the error handle thread, SCSI compares the values of
> > > scsi_host_busy() and shost->host_failed.
> > > 
> > > If the values are different, SCSI waits for the completion of some I/Os.
> > > According to the print provided by (1), the return value of scsi_host_busy()
> > > should be 7 for tag [2808] is calculated twice,
> > > 
> > > and the value of shost->host_failed is 6. As a result, this two values are
> > > never equal, and error handle cannot be triggered.
> > > 
> > > 
> > > A temporary workaround is provided and can solve the problem as:
> > > 
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 2a37731..e3dc773 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct
> > > blk_mq_ctx *ctx,
> > >                  BUG_ON(tag >= tags->nr_reserved_tags);
> > >                  sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
> > >          }
> > > +       tags->rqs[tag] = NULL;
> > >   }
> > > 
> > > 
> > > Since we did not encounter this problem in some previous kernel versions, we
> > > wondered if the community already knew about the problem or could provide
> > > some solutions.
> > Can you try the following patch?
> 
> 
> I tested it. it can fix the bug.
> 
> 
> However, if there is still a problem in the following scenario? For example,
> driver tag 0 is assigned
> 
> to rq0 in hctx0, and reclaimed after rq completed. Next time driver tag 0 is
> still assigned to rq0 but
> 
> in hctx1. So at this time,  bt_tags_iter will still got two rqs.

Each hctx has its own rq pool so far, so no such issue you worried.

John's patch works towards sharing rq pool among hctxs in case of
shared sbitmap, not merged yet, but ->rqs[] should be shared too, still
no such issue.

Follows the revised patch for handling the stale request in ->rqs[] issue:


diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 86f87346232a..ff5caeb82542 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -208,7 +208,7 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
 
 	spin_lock_irqsave(&tags->lock, flags);
 	rq = tags->rqs[bitnr];
-	if (!rq || !refcount_inc_not_zero(&rq->ref))
+	if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
 		rq = NULL;
 	spin_unlock_irqrestore(&tags->lock, flags);
 	return rq;


Thanks,
Ming


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
  2021-09-01  9:36     ` Ming Lei
@ 2021-09-01  9:50       ` Ming Lei
  2021-09-06  8:16       ` luojiaxing
  1 sibling, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-09-01  9:50 UTC (permalink / raw)
  To: luojiaxing
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen, john.garry

On Wed, Sep 01, 2021 at 05:36:55PM +0800, Ming Lei wrote:
> On Tue, Aug 31, 2021 at 10:27:28AM +0800, luojiaxing wrote:
> > Hi, Ming
> > 
> > 
> > Sorry to reply so late, This issue occur in low probability,
> > 
> > so it take some time to confirm.
> > 
> > 
> > On 2021/8/26 15:29, Ming Lei wrote:
> > > On Thu, Aug 26, 2021 at 11:00:34AM +0800, luojiaxing wrote:
> > > > Dear all:
> > > > 
> > > > 
> > > > I meet some problem when test hisi_sas driver(under SCSI) based on 5.14-rc4
> > > > kernel, it's found that error handle can not be triggered after
> > > > 
> > > > abnormal IO occur in some test with a low probability. For example,
> > > > circularly run disk hardreset or disable all local phy of expander when
> > > > running fio.
> > > > 
> > > > 
> > > > We add some tracepoint and print to see what happen, and we got the
> > > > following information:
> > > > 
> > > > (1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is
> > > > running now.
> > > > 
> > > > <4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
> > > > <4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
> > > > <4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
> > > > <4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
> > > > <4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
> > > > <4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
> > > > <4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1
> > > > 
> > > > The preceding information show that rq with tag[2808] is found in different
> > > > hctx by bt_tags_iter() and with different pointer saved in tags->rqs[].
> > > > 
> > > > And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. It's
> > > > wrong because our driver share tag between all hctx, so it's not possible
> > > What is your io scheduler? I guess it is deadline,
> > 
> > 
> > yes
> > 
> > 
> > >   and can you observe
> > > such issue by switching to none?
> > 
> > 
> > Yes, it happen when switched to none
> > 
> > 
> > > 
> > > The tricky thing is that one request dumped may be re-allocated to other tag
> > > after returning from bt_tags_iter().
> > > 
> > > > to allocate one tag to different rq.
> > > > 
> > > > 
> > > > (2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and
> > > > blk_mq_put_tag() to see where this tag is come from.
> > > > 
> > > >      Line 1322969:            <...>-20189   [013] .... 893.427707:
> > > > blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
> > > >      Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814:
> > > > blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
> > > >      Line 1331257:            <...>-20189   [013] .... 893.462663:
> > > > blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
> > > >      Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785:
> > > > blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
> > > >      Line 1338493:            <...>-20189   [013] .... 893.493519:
> > > > blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100
> > > > 
> > > > As we can see this rq is allocated to tag[2808] once, and finially come to
> > > > tag[2921], but rqs[2808] still save the pointer.
> > > Yeah, we know this kind of handling, but not see it as issue.
> > > 
> > > > There will be no problem until we encounter a rare situation.
> > > > 
> > > > For example, tag[2808] is reassigned to another hctx for execution, then
> > > > some IO meet some error.
> > > I guess the race is triggered when 2808 is just assigned, meantime
> > > ->rqs[] isn't updated.
> > 
> > 
> > As we shared tag between hctx, so if 2808 was assinged to other hctx.
> > 
> > So previous hctx's rqs will not updated。
> > 
> > 
> > > > Before waking up the error handle thread, SCSI compares the values of
> > > > scsi_host_busy() and shost->host_failed.
> > > > 
> > > > If the values are different, SCSI waits for the completion of some I/Os.
> > > > According to the print provided by (1), the return value of scsi_host_busy()
> > > > should be 7 for tag [2808] is calculated twice,
> > > > 
> > > > and the value of shost->host_failed is 6. As a result, this two values are
> > > > never equal, and error handle cannot be triggered.
> > > > 
> > > > 
> > > > A temporary workaround is provided and can solve the problem as:
> > > > 
> > > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > > index 2a37731..e3dc773 100644
> > > > --- a/block/blk-mq-tag.c
> > > > +++ b/block/blk-mq-tag.c
> > > > @@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct
> > > > blk_mq_ctx *ctx,
> > > >                  BUG_ON(tag >= tags->nr_reserved_tags);
> > > >                  sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
> > > >          }
> > > > +       tags->rqs[tag] = NULL;
> > > >   }
> > > > 
> > > > 
> > > > Since we did not encounter this problem in some previous kernel versions, we
> > > > wondered if the community already knew about the problem or could provide
> > > > some solutions.
> > > Can you try the following patch?
> > 
> > 
> > I tested it. it can fix the bug.
> > 
> > 
> > However, if there is still a problem in the following scenario? For example,
> > driver tag 0 is assigned
> > 
> > to rq0 in hctx0, and reclaimed after rq completed. Next time driver tag 0 is
> > still assigned to rq0 but
> > 
> > in hctx1. So at this time,  bt_tags_iter will still got two rqs.
> 
> Each hctx has its own rq pool so far, so no such issue you worried.
> 
> John's patch works towards sharing rq pool among hctxs in case of
> shared sbitmap, not merged yet, but ->rqs[] should be shared too, still
> no such issue.
> 
> Follows the revised patch for handling the stale request in ->rqs[] issue:
> 
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 86f87346232a..ff5caeb82542 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -208,7 +208,7 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>  
>  	spin_lock_irqsave(&tags->lock, flags);
>  	rq = tags->rqs[bitnr];
> -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> +	if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
>  		rq = NULL;
>  	spin_unlock_irqrestore(&tags->lock, flags);
>  	return rq;

Explain the issue a bit in detail:

scsi_host_check_in_flight() is used to counting scsi in-flight requests
after scsi host is blocked, so no new scsi command can be marked as
SCMD_STATE_INFLIGHT. However, driver tag allocation still can be run at
that time by blk-mq core.

The issue is in blk_mq_tagset_busy_iter(). One request is in-flight, but
this request may be kept in another slot of ->rqs[], meantime the slot
can be allocated out but ->rqs[] isn't updated yet. Then the in-flight
request is counted twice as SCMD_STATE_INFLIGHT.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered
  2021-09-01  9:36     ` Ming Lei
  2021-09-01  9:50       ` Ming Lei
@ 2021-09-06  8:16       ` luojiaxing
  1 sibling, 0 replies; 8+ messages in thread
From: luojiaxing @ 2021-09-06  8:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen, john.garry


On 2021/9/1 17:36, Ming Lei wrote:
> On Tue, Aug 31, 2021 at 10:27:28AM +0800, luojiaxing wrote:
>> Hi, Ming
>>
>>
>> Sorry to reply so late, This issue occur in low probability,
>>
>> so it take some time to confirm.
>>
>>
>> On 2021/8/26 15:29, Ming Lei wrote:
>>> On Thu, Aug 26, 2021 at 11:00:34AM +0800, luojiaxing wrote:
>>>> Dear all:
>>>>
>>>>
>>>> I meet some problem when test hisi_sas driver(under SCSI) based on 5.14-rc4
>>>> kernel, it's found that error handle can not be triggered after
>>>>
>>>> abnormal IO occur in some test with a low probability. For example,
>>>> circularly run disk hardreset or disable all local phy of expander when
>>>> running fio.
>>>>
>>>>
>>>> We add some tracepoint and print to see what happen, and we got the
>>>> following information:
>>>>
>>>> (1).print rq and rq_state at bt_tags_iter() to confirm how many IOs is
>>>> running now.
>>>>
>>>> <4>[  897.431182] bt_tags_iter: rqs[2808]: 0xffff202007bd3000; rq_state: 1
>>>> <4>[  897.437514] bt_tags_iter: rqs[3185]: 0xffff0020c5261e00; rq_state: 1
>>>> <4>[  897.443841] bt_tags_iter: rqs[3612]: 0xffff00212f242a00; rq_state: 1
>>>> <4>[  897.450167] bt_tags_iter: rqs[2808]: 0xffff00211d208100; rq_state: 1
>>>> <4>[  897.456492] bt_tags_iter: rqs[2921]: 0xffff00211d208100; rq_state: 1
>>>> <4>[  897.462818] bt_tags_iter: rqs[1214]: 0xffff002151d21b00; rq_state: 1
>>>> <4>[  897.469143] bt_tags_iter: rqs[2648]: 0xffff0020c4bfa200; rq_state: 1
>>>>
>>>> The preceding information show that rq with tag[2808] is found in different
>>>> hctx by bt_tags_iter() and with different pointer saved in tags->rqs[].
>>>>
>>>> And tag[2808] own the same pointer value saved in rqs[] with tag[2921]. It's
>>>> wrong because our driver share tag between all hctx, so it's not possible
>>> What is your io scheduler? I guess it is deadline,
>>
>> yes
>>
>>
>>>    and can you observe
>>> such issue by switching to none?
>>
>> Yes, it happen when switched to none
>>
>>
>>> The tricky thing is that one request dumped may be re-allocated to other tag
>>> after returning from bt_tags_iter().
>>>
>>>> to allocate one tag to different rq.
>>>>
>>>>
>>>> (2).check tracepoints(temporarily add) in blk_mq_get_driver_tag() and
>>>> blk_mq_put_tag() to see where this tag is come from.
>>>>
>>>>       Line 1322969:            <...>-20189   [013] .... 893.427707:
>>>> blk_mq_get_driver_tag: rqs[2808]: 0xffff00211d208100
>>>>       Line 1322997:  irq/1161-hisi_s-7602    [012] d..1 893.427814:
>>>> blk_mq_put_tag_in_free_request: rqs[2808]: 0xffff00211d208100
>>>>       Line 1331257:            <...>-20189   [013] .... 893.462663:
>>>> blk_mq_get_driver_tag: rqs[2860]: 0xffff00211d208100
>>>>       Line 1331289:  irq/1161-hisi_s-7602    [012] d..1 893.462785:
>>>> blk_mq_put_tag_in_free_request: rqs[2860]: 0xffff00211d208100
>>>>       Line 1338493:            <...>-20189   [013] .... 893.493519:
>>>> blk_mq_get_driver_tag: rqs[2921]: 0xffff00211d208100
>>>>
>>>> As we can see this rq is allocated to tag[2808] once, and finially come to
>>>> tag[2921], but rqs[2808] still save the pointer.
>>> Yeah, we know this kind of handling, but not see it as issue.
>>>
>>>> There will be no problem until we encounter a rare situation.
>>>>
>>>> For example, tag[2808] is reassigned to another hctx for execution, then
>>>> some IO meet some error.
>>> I guess the race is triggered when 2808 is just assigned, meantime
>>> ->rqs[] isn't updated.
>>
>> As we shared tag between hctx, so if 2808 was assinged to other hctx.
>>
>> So previous hctx's rqs will not updated。
>>
>>
>>>> Before waking up the error handle thread, SCSI compares the values of
>>>> scsi_host_busy() and shost->host_failed.
>>>>
>>>> If the values are different, SCSI waits for the completion of some I/Os.
>>>> According to the print provided by (1), the return value of scsi_host_busy()
>>>> should be 7 for tag [2808] is calculated twice,
>>>>
>>>> and the value of shost->host_failed is 6. As a result, this two values are
>>>> never equal, and error handle cannot be triggered.
>>>>
>>>>
>>>> A temporary workaround is provided and can solve the problem as:
>>>>
>>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>>> index 2a37731..e3dc773 100644
>>>> --- a/block/blk-mq-tag.c
>>>> +++ b/block/blk-mq-tag.c
>>>> @@ -190,6 +190,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct
>>>> blk_mq_ctx *ctx,
>>>>                   BUG_ON(tag >= tags->nr_reserved_tags);
>>>>                   sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
>>>>           }
>>>> +       tags->rqs[tag] = NULL;
>>>>    }
>>>>
>>>>
>>>> Since we did not encounter this problem in some previous kernel versions, we
>>>> wondered if the community already knew about the problem or could provide
>>>> some solutions.
>>> Can you try the following patch?
>>
>> I tested it. it can fix the bug.
>>
>>
>> However, if there is still a problem in the following scenario? For example,
>> driver tag 0 is assigned
>>
>> to rq0 in hctx0, and reclaimed after rq completed. Next time driver tag 0 is
>> still assigned to rq0 but
>>
>> in hctx1. So at this time,  bt_tags_iter will still got two rqs.
> Each hctx has its own rq pool so far, so no such issue you worried.
>
> John's patch works towards sharing rq pool among hctxs in case of
> shared sbitmap, not merged yet, but ->rqs[] should be shared too, still
> no such issue.
>
> Follows the revised patch for handling the stale request in ->rqs[] issue:


Following patch can fix the issue.

Tested-by: Luo Jiaxing <luojiaxing@huawei.com>


Thanks

Jiaxing


>
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 86f87346232a..ff5caeb82542 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -208,7 +208,7 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>   
>   	spin_lock_irqsave(&tags->lock, flags);
>   	rq = tags->rqs[bitnr];
> -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> +	if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
>   		rq = NULL;
>   	spin_unlock_irqrestore(&tags->lock, flags);
>   	return rq;
>
>
> Thanks,
> Ming
>
>
> .
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-06  8:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-26  3:00 rq pointer in tags->rqs[] is not cleared in time and make SCSI error handle can not be triggered luojiaxing
2021-08-26  7:29 ` Ming Lei
2021-08-31  2:27   ` luojiaxing
2021-08-31  8:37     ` Ming Lei
2021-09-01  7:07       ` luojiaxing
2021-09-01  9:36     ` Ming Lei
2021-09-01  9:50       ` Ming Lei
2021-09-06  8:16       ` luojiaxing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).