linux-block.vger.kernel.org archive mirror
* fsync hangs after scsi rejected a request
@ 2019-01-21 21:01 Florian Stecker
  2019-01-22  3:22 ` Ming Lei
  2019-01-25  4:05 ` Bart Van Assche
  0 siblings, 2 replies; 16+ messages in thread
From: Florian Stecker @ 2019-01-21 21:01 UTC (permalink / raw)
  To: linux-block

Hi everyone,

on my laptop, I am experiencing occasional hangs of applications during 
fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS 
file system which spans two partitions on the same SSD (one of them used 
to contain a Windows installation, but I removed it and added the 
partition to the BTRFS volume instead). Also, the problem only occurs 
when an I/O scheduler (mq-deadline) is in use. I'm running kernel 
version 4.20.3.

From what I understand so far, what happens is that a sync request 
fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a 
"non-NCQ command" and cannot be queued together with other commands. 
This propagates up into blk_mq_dispatch_rq_list(), where the call

ret = q->mq_ops->queue_rq(hctx, &bd);

returns BLK_STS_DEV_RESOURCE. Later in blk_mq_dispatch_rq_list(), there 
is the piece of code

needs_restart = blk_mq_sched_needs_restart(hctx);
if (!needs_restart ||
	(no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
	blk_mq_run_hw_queue(hctx, true);
else if (needs_restart && (ret == BLK_STS_RESOURCE))
	blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);

which restarts the queue after a delay if BLK_STS_RESOURCE was returned, 
but somehow not for BLK_STS_DEV_RESOURCE. Instead, nothing happens and 
fsync() seems to hang until some other process wants to do I/O.

So if I do

- else if (needs_restart && (ret == BLK_STS_RESOURCE))
+ else if (needs_restart && (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE))

it fixes my problem. But was there a reason why BLK_STS_DEV_RESOURCE was 
treated differently than BLK_STS_RESOURCE here?
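For reference, the branch in question can be reduced to a small user-space sketch (plain C with simplified types; this is a model of the 4.20-era logic quoted above, not the kernel code itself). It makes explicit that BLK_STS_DEV_RESOURCE falls through to "do nothing" whenever a restart is pending:

```c
#include <assert.h>
#include <stdbool.h>

enum blk_status { BLK_STS_OK, BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE };
enum action { RUN_NOW, RUN_DELAYED, DO_NOTHING };

/* Mirrors the tail of blk_mq_dispatch_rq_list() as quoted above,
 * with the surrounding state reduced to booleans. */
enum action restart_decision(bool needs_restart, bool no_tag,
                             bool dispatch_wait_empty,
                             enum blk_status ret)
{
    if (!needs_restart || (no_tag && dispatch_wait_empty))
        return RUN_NOW;        /* blk_mq_run_hw_queue(hctx, true) */
    else if (needs_restart && ret == BLK_STS_RESOURCE)
        return RUN_DELAYED;    /* blk_mq_delay_run_hw_queue(...) */
    return DO_NOTHING;         /* BLK_STS_DEV_RESOURCE lands here */
}
```

With a pending restart, only plain BLK_STS_RESOURCE gets the delayed rerun; DEV_RESOURCE relies entirely on someone else rerunning the queue later.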

In any case, it seems wrong to me that ret is used here at all, as it 
just contains the return value of the last request in the list; 
shouldn't the decision to rerun the queue depend on more than just the 
last request?

Can any of the experts tell me whether this makes sense, or whether I 
got something completely wrong?

Best,
Florian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: fsync hangs after scsi rejected a request
  2019-01-21 21:01 fsync hangs after scsi rejected a request Florian Stecker
@ 2019-01-22  3:22 ` Ming Lei
  2019-01-25  0:45   ` Florian Stecker
  2019-01-25  4:05 ` Bart Van Assche
  1 sibling, 1 reply; 16+ messages in thread
From: Ming Lei @ 2019-01-22  3:22 UTC (permalink / raw)
  To: Florian Stecker, Linux SCSI List, Martin K. Petersen, Bart Van Assche
  Cc: linux-block

On Tue, Jan 22, 2019 at 5:13 AM Florian Stecker <m19@florianstecker.de> wrote:
>
> Hi everyone,
>
> on my laptop, I am experiencing occasional hangs of applications during
> fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS
> which spans two partitions on the same SSD (one of them used to contain
> a Windows, but I removed it and added the partition to the BTRFS volume
> instead). Also, the problem only occurs when an I/O scheduler
> (mq-deadline) is in use. I'm running kernel version 4.20.3.
>
>  From what I understand so far, what happens is that a sync request
> fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a
> "Non-NCQ command" and can not be queued together with other commands.
> This propagates up into blk_mq_dispatch_rq_list(), where the call
>
> ret = q->mq_ops->queue_rq(hctx, &bd);
>
> returns BLK_STS_DEV_RESOURCE. Later in blk_mq_dispatch_rq_list(), there
> is the piece of code
>
> needs_restart = blk_mq_sched_needs_restart(hctx);
> if (!needs_restart ||
>         (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>         blk_mq_run_hw_queue(hctx, true);
> else if (needs_restart && (ret == BLK_STS_RESOURCE))
>         blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
>
> which restarts the queue after a delay if BLK_STS_RESOURCE was returned,
> but somehow not for BLK_STS_DEV_RESOURCE. Instead, nothing happens and
> fsync() seems to hang until some other process wants to do I/O.
>
> So if I do
>
> - else if (needs_restart && (ret == BLK_STS_RESOURCE))
> + else if (needs_restart && (ret == BLK_STS_RESOURCE || ret ==
> BLK_STS_DEV_RESOURCE))
>
> it fixes my problem. But was there a reason why BLK_STS_DEV_RESOURCE was
> treated differently that BLK_STS_RESOURCE here?

Please see the comment:

/*
 * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
 * device related resources are unavailable, but the driver can guarantee
 * that the queue will be rerun in the future once resources become
 * available again. This is typically the case for device specific
 * resources that are consumed for IO. If the driver fails allocating these
 * resources, we know that inflight (or pending) IO will free these
 * resource upon completion.
 *
 * This is different from BLK_STS_RESOURCE in that it explicitly references
 * a device specific resource. For resources of wider scope, allocation
 * failure can happen without having pending IO. This means that we can't
 * rely on request completions freeing these resources, as IO may not be in
 * flight. Examples of that are kernel memory allocations, DMA mappings, or
 * any other system wide resources.
 */
#define BLK_STS_DEV_RESOURCE    ((__force blk_status_t)13)

>
> In any case, it seems wrong to me that ret is used here at all, as it
> just contains the return value of the last request in the list, and
> whether we rerun the queue should probably not only depend on the last
> request?
>
> Can anyone of the experts tell me whether this makes sense or I got
> something completely wrong?

Sounds like a bug in the SCSI or ATA driver.

I remember there is a hole in SCSI w.r.t. returning BLK_STS_DEV_RESOURCE,
but I have never been lucky enough to reproduce it.

scsi_queue_rq():
        ......
        case BLK_STS_RESOURCE:
                if (atomic_read(&sdev->device_busy) ||
                    scsi_device_blocked(sdev))
                        ret = BLK_STS_DEV_RESOURCE;

All in-flight requests may complete between reading 'sdev->device_busy'
and setting ret to 'BLK_STS_DEV_RESOURCE'; then this IO hang may
be triggered.
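The status selection itself is easy to model in isolation (a user-space C sketch with simplified types, not the driver code): the upgrade to BLK_STS_DEV_RESOURCE hinges entirely on a momentary read of device_busy, which is what makes the window possible.

```c
#include <assert.h>
#include <stdbool.h>

enum blk_status { BLK_STS_OK, BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE };

/* Models the "case BLK_STS_RESOURCE" branch of scsi_queue_rq() quoted
 * above: if the device looks busy (or blocked) at this instant, the
 * driver promises a future rerun by returning DEV_RESOURCE. */
enum blk_status classify_busy(int device_busy, bool device_blocked)
{
    enum blk_status ret = BLK_STS_RESOURCE;

    if (device_busy || device_blocked)
        ret = BLK_STS_DEV_RESOURCE;  /* promise: a completion will rerun */
    return ret;
}
```

The promise is only kept if at least one in-flight command is still around to complete after the caller has parked the rejected request.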

Thanks,
Ming Lei


* Re: fsync hangs after scsi rejected a request
  2019-01-22  3:22 ` Ming Lei
@ 2019-01-25  0:45   ` Florian Stecker
  2019-01-25  1:33     ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Stecker @ 2019-01-25  0:45 UTC (permalink / raw)
  To: Ming Lei, Linux SCSI List, Martin K. Petersen, Bart Van Assche
  Cc: linux-block



On 1/22/19 4:22 AM, Ming Lei wrote:
> On Tue, Jan 22, 2019 at 5:13 AM Florian Stecker <m19@florianstecker.de> wrote:
>>
>> Hi everyone,
>>
>> on my laptop, I am experiencing occasional hangs of applications during
>> fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS
>> which spans two partitions on the same SSD (one of them used to contain
>> a Windows, but I removed it and added the partition to the BTRFS volume
>> instead). Also, the problem only occurs when an I/O scheduler
>> (mq-deadline) is in use. I'm running kernel version 4.20.3.
>>
>>   From what I understand so far, what happens is that a sync request
>> fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a
>> "Non-NCQ command" and can not be queued together with other commands.
>> This propagates up into blk_mq_dispatch_rq_list(), where the call
>>
>> ret = q->mq_ops->queue_rq(hctx, &bd);
>>
>> returns BLK_STS_DEV_RESOURCE. Later in blk_mq_dispatch_rq_list(), there
>> is the piece of code
>>
>> needs_restart = blk_mq_sched_needs_restart(hctx);
>> if (!needs_restart ||
>>          (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>>          blk_mq_run_hw_queue(hctx, true);
>> else if (needs_restart && (ret == BLK_STS_RESOURCE))
>>          blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
>>
>> which restarts the queue after a delay if BLK_STS_RESOURCE was returned,
>> but somehow not for BLK_STS_DEV_RESOURCE. Instead, nothing happens and
>> fsync() seems to hang until some other process wants to do I/O.
>>
>> So if I do
>>
>> - else if (needs_restart && (ret == BLK_STS_RESOURCE))
>> + else if (needs_restart && (ret == BLK_STS_RESOURCE || ret ==
>> BLK_STS_DEV_RESOURCE))
>>
>> it fixes my problem. But was there a reason why BLK_STS_DEV_RESOURCE was
>> treated differently that BLK_STS_RESOURCE here?
> 
> Please see the comment:
> 
> /*
>   * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
>   * device related resources are unavailable, but the driver can guarantee
>   * that the queue will be rerun in the future once resources become
>   * available again. This is typically the case for device specific
>   * resources that are consumed for IO. If the driver fails allocating these
>   * resources, we know that inflight (or pending) IO will free these
>   * resource upon completion.
>   *
>   * This is different from BLK_STS_RESOURCE in that it explicitly references
>   * a device specific resource. For resources of wider scope, allocation
>   * failure can happen without having pending IO. This means that we can't
>   * rely on request completions freeing these resources, as IO may not be in
>   * flight. Examples of that are kernel memory allocations, DMA mappings, or
>   * any other system wide resources.
>   */
> #define BLK_STS_DEV_RESOURCE    ((__force blk_status_t)13)
> 
>>
>> In any case, it seems wrong to me that ret is used here at all, as it
>> just contains the return value of the last request in the list, and
>> whether we rerun the queue should probably not only depend on the last
>> request?
>>
>> Can anyone of the experts tell me whether this makes sense or I got
>> something completely wrong?
> 
> Sounds a bug in SCSI or ata driver.
> 
> I remember there is hole in SCSI wrt. returning BLK_STS_DEV_RESOURCE,
> but I never get lucky to reproduce it.
> 
> scsi_queue_rq():
>          ......
>          case BLK_STS_RESOURCE:
>                  if (atomic_read(&sdev->device_busy) ||
>                      scsi_device_blocked(sdev))
>                          ret = BLK_STS_DEV_RESOURCE;
> 

OK, please tell me if I understand this correctly now. What is supposed 
to happen is:

- request 1 is being processed by the device
- request 2 (sync) fails with STS_DEV_RESOURCE because request 1 is 
still being processed
- request 2 is put into hctx->dispatch inside blk_mq_dispatch_rq_list *
- queue does not get rerun because of STS_DEV_RESOURCE
- request 1 finishes
- scsi_end_request calls blk_mq_run_hw_queues
- blk_mq_run_hw_queue finds request 2 in hctx->dispatch **
- runs request 2

However, what happens for me is that request 1 finishes earlier, so that 
** is executed before *, and finds an empty hctx->dispatch list.

At least this is consistent with what I see.

 > All in-flight request may complete between reading 'sdev->device_busy'
 > and setting ret as 'BLK_STS_DEV_RESOURCE', then this IO hang may
 > be triggered.
 >

More accurately, the window is between reading sdev->device_busy and 
doing list_splice_init(list, &hctx->dispatch) inside blk_mq_dispatch_rq_list.
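The two orderings can be made concrete with a sequential user-space model (plain C; the race is serialized by hand rather than using real concurrency, and all names are simplified stand-ins for the kernel structures):

```c
#include <assert.h>
#include <stdbool.h>

struct state {
    int  device_busy;      /* in-flight commands, as scsi_queue_rq() sees them */
    bool parked;           /* request 2 sits on hctx->dispatch */
    bool rerun_scheduled;  /* someone will run the hw queue again */
};

/* scsi_end_request() side: completes request 1 and reruns the queues,
 * which only helps if request 2 is already parked on hctx->dispatch. */
static void complete_request_1(struct state *s)
{
    s->device_busy = 0;
    if (s->parked)
        s->rerun_scheduled = true;
}

/* Returns true if the fsync hang occurs.  'early' chooses whether
 * request 1 completes before request 2 is parked (the bad ordering)
 * or after it (the intended one). */
static bool hang_occurs(bool early)
{
    struct state s = { .device_busy = 1 };

    /* scsi_queue_rq(): device busy -> BLK_STS_DEV_RESOURCE */
    assert(s.device_busy > 0);

    if (early)
        complete_request_1(&s);

    /* blk_mq_dispatch_rq_list(): list_splice_init() parks request 2;
     * for DEV_RESOURCE the dispatch side schedules no rerun itself. */
    s.parked = true;

    if (!early)
        complete_request_1(&s);

    return s.parked && !s.rerun_scheduled;
}
```

Only the ordering where the completion overtakes the park leaves a parked request with nobody scheduled to rerun the queue.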

I'm not sure how the SCSI driver can do anything about this except 
returning BLK_STS_RESOURCE instead (which I guess would not be a great 
solution), as basically everything else happens inside blk-mq. Or what 
do you have in mind?


* Re: fsync hangs after scsi rejected a request
  2019-01-25  0:45   ` Florian Stecker
@ 2019-01-25  1:33     ` Ming Lei
  2019-01-25  3:49       ` jianchao.wang
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2019-01-25  1:33 UTC (permalink / raw)
  To: Florian Stecker
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block

On Fri, Jan 25, 2019 at 8:45 AM Florian Stecker <m19@florianstecker.de> wrote:
>
>
>
> On 1/22/19 4:22 AM, Ming Lei wrote:
> > On Tue, Jan 22, 2019 at 5:13 AM Florian Stecker <m19@florianstecker.de> wrote:
> >>
> >> Hi everyone,
> >>
> >> on my laptop, I am experiencing occasional hangs of applications during
> >> fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS
> >> which spans two partitions on the same SSD (one of them used to contain
> >> a Windows, but I removed it and added the partition to the BTRFS volume
> >> instead). Also, the problem only occurs when an I/O scheduler
> >> (mq-deadline) is in use. I'm running kernel version 4.20.3.
> >>
> >>   From what I understand so far, what happens is that a sync request
> >> fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a
> >> "Non-NCQ command" and can not be queued together with other commands.
> >> This propagates up into blk_mq_dispatch_rq_list(), where the call
> >>
> >> ret = q->mq_ops->queue_rq(hctx, &bd);
> >>
> >> returns BLK_STS_DEV_RESOURCE. Later in blk_mq_dispatch_rq_list(), there
> >> is the piece of code
> >>
> >> needs_restart = blk_mq_sched_needs_restart(hctx);
> >> if (!needs_restart ||
> >>          (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> >>          blk_mq_run_hw_queue(hctx, true);
> >> else if (needs_restart && (ret == BLK_STS_RESOURCE))
> >>          blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
> >>
> >> which restarts the queue after a delay if BLK_STS_RESOURCE was returned,
> >> but somehow not for BLK_STS_DEV_RESOURCE. Instead, nothing happens and
> >> fsync() seems to hang until some other process wants to do I/O.
> >>
> >> So if I do
> >>
> >> - else if (needs_restart && (ret == BLK_STS_RESOURCE))
> >> + else if (needs_restart && (ret == BLK_STS_RESOURCE || ret ==
> >> BLK_STS_DEV_RESOURCE))
> >>
> >> it fixes my problem. But was there a reason why BLK_STS_DEV_RESOURCE was
> >> treated differently that BLK_STS_RESOURCE here?
> >
> > Please see the comment:
> >
> > /*
> >   * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
> >   * device related resources are unavailable, but the driver can guarantee
> >   * that the queue will be rerun in the future once resources become
> >   * available again. This is typically the case for device specific
> >   * resources that are consumed for IO. If the driver fails allocating these
> >   * resources, we know that inflight (or pending) IO will free these
> >   * resource upon completion.
> >   *
> >   * This is different from BLK_STS_RESOURCE in that it explicitly references
> >   * a device specific resource. For resources of wider scope, allocation
> >   * failure can happen without having pending IO. This means that we can't
> >   * rely on request completions freeing these resources, as IO may not be in
> >   * flight. Examples of that are kernel memory allocations, DMA mappings, or
> >   * any other system wide resources.
> >   */
> > #define BLK_STS_DEV_RESOURCE    ((__force blk_status_t)13)
> >
> >>
> >> In any case, it seems wrong to me that ret is used here at all, as it
> >> just contains the return value of the last request in the list, and
> >> whether we rerun the queue should probably not only depend on the last
> >> request?
> >>
> >> Can anyone of the experts tell me whether this makes sense or I got
> >> something completely wrong?
> >
> > Sounds a bug in SCSI or ata driver.
> >
> > I remember there is hole in SCSI wrt. returning BLK_STS_DEV_RESOURCE,
> > but I never get lucky to reproduce it.
> >
> > scsi_queue_rq():
> >          ......
> >          case BLK_STS_RESOURCE:
> >                  if (atomic_read(&sdev->device_busy) ||
> >                      scsi_device_blocked(sdev))
> >                          ret = BLK_STS_DEV_RESOURCE;
> >
>
> OK, please tell me if I understand this right now. What is supposed to
> happen is
>
> - request 1 is being processed by the device
> - request 2 (sync) fails with STS_DEV_RESOURCE because request 1 is
> still being processed
> - request 2 is put into hctx->dispatch inside blk_mq_dispatch_rq_list *
> - queue does not get rerun because of STS_DEV_RESOURCE
> - request 1 finishes
> - scsi_end_request calls blk_mq_run_hw_queues
> - blk_mq_run_hw_queue finds request 2 in hctx->dispatch **
> - runs request 2

Right, it is a typical case when the hw queue depth is 2.

>
> However, what happens for me is that request 1 finishes earlier, so that
> ** is executed before *, and finds an empty hctx->dispatch list.
>
> At least this is consistent with what I see.
>
>  > All in-flight request may complete between reading 'sdev->device_busy'
>  > and setting ret as 'BLK_STS_DEV_RESOURCE', then this IO hang may
>  > be triggered.
>  >
>
> More accurate would be: between reading sdev->device_busy and doing
> list_splice_init(list,&hctx->dispatch) inside blk_mq_dispatch_rq_list.

Exactly.

>
> I'm not sure how the SCSI driver can do anything about this except
> returning BLK_STS_RESOURCE instead (which I guess would not be a great
> solution), as basically everything else happens inside blk-mq. Or what
> do you have in mind?

I don't have a better idea yet, :-(

thanks,
Ming Lei


* Re: fsync hangs after scsi rejected a request
  2019-01-25  1:33     ` Ming Lei
@ 2019-01-25  3:49       ` jianchao.wang
  2019-01-25  8:56         ` Florian Stecker
  0 siblings, 1 reply; 16+ messages in thread
From: jianchao.wang @ 2019-01-25  3:49 UTC (permalink / raw)
  To: Ming Lei, Florian Stecker
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/25/19 9:33 AM, Ming Lei wrote:
> On Fri, Jan 25, 2019 at 8:45 AM Florian Stecker <m19@florianstecker.de> wrote:
>>
>>
>>
>> On 1/22/19 4:22 AM, Ming Lei wrote:
>>> On Tue, Jan 22, 2019 at 5:13 AM Florian Stecker <m19@florianstecker.de> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> on my laptop, I am experiencing occasional hangs of applications during
>>>> fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS
>>>> which spans two partitions on the same SSD (one of them used to contain
>>>> a Windows, but I removed it and added the partition to the BTRFS volume
>>>> instead). Also, the problem only occurs when an I/O scheduler
>>>> (mq-deadline) is in use. I'm running kernel version 4.20.3.
>>>>
>>>>   From what I understand so far, what happens is that a sync request
>>>> fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a
>>>> "Non-NCQ command" and can not be queued together with other commands.
>>>> This propagates up into blk_mq_dispatch_rq_list(), where the call
>>>>
>>>> ret = q->mq_ops->queue_rq(hctx, &bd);
>>>>
>>>> returns BLK_STS_DEV_RESOURCE. Later in blk_mq_dispatch_rq_list(), there
>>>> is the piece of code
>>>>
>>>> needs_restart = blk_mq_sched_needs_restart(hctx);
>>>> if (!needs_restart ||
>>>>          (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>>>>          blk_mq_run_hw_queue(hctx, true);
>>>> else if (needs_restart && (ret == BLK_STS_RESOURCE))
>>>>          blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
>>>>
>>>> which restarts the queue after a delay if BLK_STS_RESOURCE was returned,
>>>> but somehow not for BLK_STS_DEV_RESOURCE. Instead, nothing happens and
>>>> fsync() seems to hang until some other process wants to do I/O.
>>>>
>>>> So if I do
>>>>
>>>> - else if (needs_restart && (ret == BLK_STS_RESOURCE))
>>>> + else if (needs_restart && (ret == BLK_STS_RESOURCE || ret ==
>>>> BLK_STS_DEV_RESOURCE))
>>>>
>>>> it fixes my problem. But was there a reason why BLK_STS_DEV_RESOURCE was
>>>> treated differently that BLK_STS_RESOURCE here?
>>>
>>> Please see the comment:
>>>
>>> /*
>>>   * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
>>>   * device related resources are unavailable, but the driver can guarantee
>>>   * that the queue will be rerun in the future once resources become
>>>   * available again. This is typically the case for device specific
>>>   * resources that are consumed for IO. If the driver fails allocating these
>>>   * resources, we know that inflight (or pending) IO will free these
>>>   * resource upon completion.
>>>   *
>>>   * This is different from BLK_STS_RESOURCE in that it explicitly references
>>>   * a device specific resource. For resources of wider scope, allocation
>>>   * failure can happen without having pending IO. This means that we can't
>>>   * rely on request completions freeing these resources, as IO may not be in
>>>   * flight. Examples of that are kernel memory allocations, DMA mappings, or
>>>   * any other system wide resources.
>>>   */
>>> #define BLK_STS_DEV_RESOURCE    ((__force blk_status_t)13)
>>>
>>>>
>>>> In any case, it seems wrong to me that ret is used here at all, as it
>>>> just contains the return value of the last request in the list, and
>>>> whether we rerun the queue should probably not only depend on the last
>>>> request?
>>>>
>>>> Can anyone of the experts tell me whether this makes sense or I got
>>>> something completely wrong?
>>>
>>> Sounds a bug in SCSI or ata driver.
>>>
>>> I remember there is hole in SCSI wrt. returning BLK_STS_DEV_RESOURCE,
>>> but I never get lucky to reproduce it.
>>>
>>> scsi_queue_rq():
>>>          ......
>>>          case BLK_STS_RESOURCE:
>>>                  if (atomic_read(&sdev->device_busy) ||
>>>                      scsi_device_blocked(sdev))
>>>                          ret = BLK_STS_DEV_RESOURCE;
>>>
>>
>> OK, please tell me if I understand this right now. What is supposed to
>> happen is
>>
>> - request 1 is being processed by the device
>> - request 2 (sync) fails with STS_DEV_RESOURCE because request 1 is
>> still being processed
>> - request 2 is put into hctx->dispatch inside blk_mq_dispatch_rq_list *
>> - queue does not get rerun because of STS_DEV_RESOURCE
>> - request 1 finishes
>> - scsi_end_request calls blk_mq_run_hw_queues
>> - blk_mq_run_hw_queue finds request 2 in hctx->dispatch **
>> - runs request 2
> 
> Right, it is one typical case if the hw queue depth is 2.
> 
>>
>> However, what happens for me is that request 1 finishes earlier, so that
>> ** is executed before *, and finds an empty hctx->dispatch list.
>>
>> At least this is consistent with what I see.
>>
>>  > All in-flight request may complete between reading 'sdev->device_busy'
>>  > and setting ret as 'BLK_STS_DEV_RESOURCE', then this IO hang may
>>  > be triggered.
>>  >
>>
>> More accurate would be: between reading sdev->device_busy and doing
>> list_splice_init(list,&hctx->dispatch) inside blk_mq_dispatch_rq_list.
> 
> Exactly.

It sounds like it is not so easy to trigger.

blk_mq_dispatch_rq_list
  scsi_queue_rq
     if (atomic_read(&sdev->device_busy) ||
       scsi_device_blocked(sdev))
       ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
                                                 __blk_mq_end_request
                                                   blk_mq_sched_restart // clear RESTART
                                                     blk_mq_run_hw_queue
                                                 blk_mq_run_hw_queues
 
  list_splice_init(list, &hctx->dispatch)
  needs_restart = blk_mq_sched_needs_restart(hctx)

The 'needs_restart' will be false, so the queue will be rerun.
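This interleaving can likewise be serialized into a user-space sketch (plain C, simplified; the boolean parameter is a hypothetical switch distinguishing a completion that goes through blk_mq_sched_restart() from one that only calls blk_mq_run_hw_queues()):

```c
#include <assert.h>
#include <stdbool.h>

/* Serialized model of the interleaving above.  The completion of the
 * in-flight request is assumed to land between scsi_queue_rq()'s
 * device_busy read and the list_splice_init(). */
bool queue_rerun(bool completion_clears_restart)
{
    bool restart_flag = true;  /* set by the scheduler path before dispatch */
    bool parked = false;       /* request on hctx->dispatch */
    bool rerun = false;

    /* completion side, racing ahead of the park below */
    if (completion_clears_restart)
        restart_flag = false;  /* blk_mq_sched_restart() clears RESTART */
    rerun = parked;            /* run_hw_queues() finds an empty list */

    /* dispatch side: list_splice_init(list, &hctx->dispatch) */
    parked = true;

    /* tail of blk_mq_dispatch_rq_list() for BLK_STS_DEV_RESOURCE */
    bool needs_restart = restart_flag;  /* blk_mq_sched_needs_restart() */
    if (!needs_restart)
        rerun = true;                   /* blk_mq_run_hw_queue(hctx, true) */

    return rerun;
}
```

As long as the racing completion clears the RESTART flag, the dispatch side notices and reruns the queue itself; if the flag survives, the parked request is stranded.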

Thanks
Jianchao

> 
>>
>> I'm not sure how the SCSI driver can do anything about this except
>> returning BLK_STS_RESOURCE instead (which I guess would not be a great
>> solution), as basically everything else happens inside blk-mq. Or what
>> do you have in mind?
> 
> Not have a better idea yet, :-(
> 
> thanks,
> Ming Lei
> 


* Re: fsync hangs after scsi rejected a request
  2019-01-21 21:01 fsync hangs after scsi rejected a request Florian Stecker
  2019-01-22  3:22 ` Ming Lei
@ 2019-01-25  4:05 ` Bart Van Assche
  1 sibling, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2019-01-25  4:05 UTC (permalink / raw)
  To: Florian Stecker, linux-block

On 1/21/19 1:01 PM, Florian Stecker wrote:
> on my laptop, I am experiencing occasional hangs of applications during 
> fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS 
> which spans two partitions on the same SSD (one of them used to contain 
> a Windows, but I removed it and added the partition to the BTRFS volume 
> instead). Also, the problem only occurs when an I/O scheduler 
> (mq-deadline) is in use. I'm running kernel version 4.20.3.
> 
> From what I understand so far, what happens is that a sync request 
> fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a 
> "Non-NCQ command" and can not be queued together with other commands. 
> This propagates up into blk_mq_dispatch_rq_list(), where the call

Can you have a look at 
https://bugzilla.kernel.org/show_bug.cgi?id=202353 and see whether that 
issue is related to what you encountered?

Thanks,

Bart.


* Re: fsync hangs after scsi rejected a request
  2019-01-25  3:49       ` jianchao.wang
@ 2019-01-25  8:56         ` Florian Stecker
  2019-01-25  9:05           ` jianchao.wang
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Stecker @ 2019-01-25  8:56 UTC (permalink / raw)
  To: jianchao.wang, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block

On 1/25/19 4:49 AM, jianchao.wang wrote:
> 
> It sounds like not so easy to trigger.
> 
> blk_mq_dispatch_rq_list
>    scsi_queue_rq
>       if (atomic_read(&sdev->device_busy) ||
>         scsi_device_blocked(sdev))
>         ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
>                                                   __blk_mq_end_request
>                                                     blk_mq_sched_restart // clear RESTART
>                                                       blk_mq_run_hw_queue
>                                                   blk_mq_run_hw_queues
>   
>    list_splice_init(list, &hctx->dispatch)
>    needs_restart = blk_mq_sched_needs_restart(hctx)
> 
> The 'needs_restart' will be false, so the queue would be rerun.
> 
> Thanks
> Jianchao

Good point. So the RESTART flag is supposed to protect against this? Now 
I see, this is also sort of what the lengthy comment in 
blk_mq_dispatch_rq_list is saying.

May I complain that this is very unintuitive (the queue gets rerun when 
the RESTART flag is _not_ set) and also unreliable? Not every caller of 
blk_mq_dispatch_rq_list seems to set the flag, and it does not always 
get cleared in __blk_mq_end_request.

__blk_mq_end_request does the following:

	if (rq->end_io) {
		rq_qos_done(rq->q, rq);
		rq->end_io(rq, error);
	} else {
		if (unlikely(blk_bidi_rq(rq)))
			blk_mq_free_request(rq->next_rq);
		blk_mq_free_request(rq);
	}

and blk_mq_free_request then calls blk_mq_sched_restart, which clears 
the flag. But in my case, rq->end_io is non-NULL, so blk_mq_free_request 
is never called.
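To spell that out, the if/else quoted above can be collapsed into a tiny user-space model (plain C, simplified; not kernel code) of which completion path reaches blk_mq_sched_restart():

```c
#include <assert.h>
#include <stdbool.h>

/* Models the two paths of __blk_mq_end_request() quoted above: only
 * the branch that frees the request reaches blk_mq_sched_restart(),
 * so a request completed via rq->end_io (e.g. the flush machinery)
 * leaves the RESTART flag set. */
bool restart_flag_cleared(bool has_end_io)
{
    if (has_end_io)
        return false;  /* rq->end_io(rq, error): no blk_mq_free_request() */

    /* blk_mq_free_request() -> blk_mq_sched_restart() clears the flag */
    return true;
}
```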

On 1/25/19 5:05 AM, Bart Van Assche wrote:
 >
 > Can you have a look at
 > https://bugzilla.kernel.org/show_bug.cgi?id=202353 and see whether that
 > issue is related to what you encountered?
 >
 > Thanks,
 >
 > Bart.

I don't know. My hangs are only up to 30 sec (but that's because BTRFS 
does a transaction every 30s; I don't know what would happen with ext4), 
and for me only one process blocks while everything else still works 
flawlessly. In particular, programs which do not fsync are not affected 
at all. If I find some time, I can also try downgrading my kernel to 
4.18 and see if the problem persists.



* Re: fsync hangs after scsi rejected a request
  2019-01-25  8:56         ` Florian Stecker
@ 2019-01-25  9:05           ` jianchao.wang
  2019-01-25 22:10             ` Florian Stecker
  0 siblings, 1 reply; 16+ messages in thread
From: jianchao.wang @ 2019-01-25  9:05 UTC (permalink / raw)
  To: Florian Stecker, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/25/19 4:56 PM, Florian Stecker wrote:
> On 1/25/19 4:49 AM, jianchao.wang wrote:
>>
>> It sounds like not so easy to trigger.
>>
>> blk_mq_dispatch_rq_list
>>    scsi_queue_rq
>>       if (atomic_read(&sdev->device_busy) ||
>>         scsi_device_blocked(sdev))
>>         ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
>>                                                   __blk_mq_end_request
>>                                                     blk_mq_sched_restart // clear RESTART
>>                                                       blk_mq_run_hw_queue
>>                                                   blk_mq_run_hw_queues
>>      list_splice_init(list, &hctx->dispatch)
>>    needs_restart = blk_mq_sched_needs_restart(hctx)
>>
>> The 'needs_restart' will be false, so the queue would be rerun.
>>
>> Thanks
>> Jianchao
> 
> Good point. So the RESTART flag is supposed to protect against this? Now I see, this is also sort of what the lengthy comment in blk_mq_dispatch_rq_list is saying.
> 
> May I complain that this is very unintuitive (the queue gets rerun when the RESTART flag is _not_ set) and also unreliable, as not every caller of blk_mq_dispatch_rq_list seems to set the flag, and also it does not always get cleared in __blk_mq_end_request?
> 
> __blk_mq_end_request does the following:
> 
>     if (rq->end_io) {
>         rq_qos_done(rq->q, rq);
>         rq->end_io(rq, error);
>     } else {
>         if (unlikely(blk_bidi_rq(rq)))
>             blk_mq_free_request(rq->next_rq);
>         blk_mq_free_request(rq);
>     }
> 
> and blk_mq_free_request then calls blk_mq_sched_restart, which clears the flag. But in my case, rq->end_io != 0, so blk_mq_free_request is never called.

So what is your rq->end_io?
flush_end_io, mq_flush_data_end_io, or something else?
In the normal case, blk_mq_end_request should eventually be invoked.

Did you ever try the BFQ I/O scheduler instead of mq-deadline?

Can you share your dmesg and config file here?

Thanks
Jianchao


* Re: fsync hangs after scsi rejected a request
  2019-01-25  9:05           ` jianchao.wang
@ 2019-01-25 22:10             ` Florian Stecker
  2019-01-26  1:18               ` jianchao.wang
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Stecker @ 2019-01-25 22:10 UTC (permalink / raw)
  To: jianchao.wang, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/25/19 10:05 AM, jianchao.wang wrote:
> 
> 
> On 1/25/19 4:56 PM, Florian Stecker wrote:
>> On 1/25/19 4:49 AM, jianchao.wang wrote:
>>>
>>> It sounds like not so easy to trigger.
>>>
>>> blk_mq_dispatch_rq_list
>>>     scsi_queue_rq
>>>        if (atomic_read(&sdev->device_busy) ||
>>>          scsi_device_blocked(sdev))
>>>          ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
>>>                                                    __blk_mq_end_request
>>>                                                      blk_mq_sched_restart // clear RESTART
>>>                                                        blk_mq_run_hw_queue
>>>                                                    blk_mq_run_hw_queues
>>>       list_splice_init(list, &hctx->dispatch)
>>>     needs_restart = blk_mq_sched_needs_restart(hctx)
>>>
>>> The 'needs_restart' will be false, so the queue would be rerun.
>>>
>>> Thanks
>>> Jianchao
>>
>> Good point. So the RESTART flag is supposed to protect against this? Now I see, this is also sort of what the lengthy comment in blk_mq_dispatch_rq_list is saying.
>>
>> May I complain that this is very unintuitive (the queue gets rerun when the RESTART flag is _not_ set) and also unreliable, as not every caller of blk_mq_dispatch_rq_list seems to set the flag, and also it does not always get cleared in __blk_mq_end_request?
>>
>> __blk_mq_end_request does the following:
>>
>>      if (rq->end_io) {
>>          rq_qos_done(rq->q, rq);
>>          rq->end_io(rq, error);
>>      } else {
>>          if (unlikely(blk_bidi_rq(rq)))
>>              blk_mq_free_request(rq->next_rq);
>>          blk_mq_free_request(rq);
>>      }
>>
>> and blk_mq_free_request then calls blk_mq_sched_restart, which clears the flag. But in my case, rq->end_io != 0, so blk_mq_free_request is never called.
> 
> So what is your rq->end_io ?
> flush_end_io ? or mq_flush_data_end_io ? or other ?
> In normal case, the blk_mq_end_request should be finally invoked.
> 
> Did you ever try the bfq io scheduler instead of mq-deadline ?
> 
> Can you share your dmesg and config file here ?

Sure.

dmesg: https://florianstecker.de/dmesg-2019-01-25.txt
Note that no I/O scheduler is in use here; I deactivated them in a udev 
rule because I still want to be able to use my laptop. When I test this, 
I just reactivate mq-deadline manually.

Config is the default in Arch Linux: 
https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux

The problem also appears with BFQ.

And rq->end_io is set to mq_flush_data_end_io when this happens. The 
only point I see where this function could invoke blk_mq_end_request is 
via blk_flush_complete_seq, but it does so only if seq == REQ_FSEQ_DONE. 
However, seq is REQ_FSEQ_POSTFLUSH for me (i.e. it just transitioned 
from REQ_FSEQ_DATA to REQ_FSEQ_POSTFLUSH).

> 
> Thanks
> Jianchao
> 


* Re: fsync hangs after scsi rejected a request
  2019-01-25 22:10             ` Florian Stecker
@ 2019-01-26  1:18               ` jianchao.wang
  2019-01-26  1:25                 ` jianchao.wang
  0 siblings, 1 reply; 16+ messages in thread
From: jianchao.wang @ 2019-01-26  1:18 UTC (permalink / raw)
  To: Florian Stecker, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/26/19 6:10 AM, Florian Stecker wrote:
> 
> 
> On 1/25/19 10:05 AM, jianchao.wang wrote:
>>
>>
>> On 1/25/19 4:56 PM, Florian Stecker wrote:
>>> On 1/25/19 4:49 AM, jianchao.wang wrote:
>>>>
>>>> It sounds like not so easy to trigger.
>>>>
>>>> blk_mq_dispatch_rq_list
>>>>     scsi_queue_rq
>>>>        if (atomic_read(&sdev->device_busy) ||
>>>>          scsi_device_blocked(sdev))
>>>>          ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
>>>>                                                    __blk_mq_end_request
>>>>                                                      blk_mq_sched_restart // clear RESTART
>>>>                                                        blk_mq_run_hw_queue
>>>>                                                    blk_mq_run_hw_queues
>>>>       list_splice_init(list, &hctx->dispatch)
>>>>     needs_restart = blk_mq_sched_needs_restart(hctx)
>>>>
>>>> The 'needs_restart' will be false, so the queue would be rerun.
>>>>
>>>> Thanks
>>>> Jianchao
>>>
>>> Good point. So the RESTART flag is supposed to protect against this? Now I see, this is also sort of what the lengthy comment in blk_mq_dispatch_rq_list is saying.
>>>
>>> May I complain that this is very unintuitive (the queue gets rerun when the RESTART flag is _not_ set) and also unreliable, as not every caller of blk_mq_dispatch_rq_list seems to set the flag, and also it does not always get cleared in __blk_mq_end_request?
>>>
>>> __blk_mq_end_request does the following:
>>>
>>>      if (rq->end_io) {
>>>          rq_qos_done(rq->q, rq);
>>>          rq->end_io(rq, error);
>>>      } else {
>>>          if (unlikely(blk_bidi_rq(rq)))
>>>              blk_mq_free_request(rq->next_rq);
>>>          blk_mq_free_request(rq);
>>>      }
>>>
>>> and blk_mq_free_request then calls blk_mq_sched_restart, which clears the flag. But in my case, rq->end_io != 0, so blk_mq_free_request is never called.
>>
>> So what is your rq->end_io ?
>> flush_end_io ? or mq_flush_data_end_io ? or other ?
>> In normal case, the blk_mq_end_request should be finally invoked.
>>
>> Did you ever try the bfq io scheduler instead of mq-deadline ?
>>
>> Can you share your dmesg and config file here ?
> 
> Sure.
> 
> dmesg: https://florianstecker.de/dmesg-2019-01-25.txt
> Note that no I/O scheduler is in use here, I deactivated them in a udev rule because I still want to be able to use my laptop. When I test this stuff I just reactivate mq-deadline manually.
> 
> Config is the default in Arch Linux: https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
> 
> The problem also appears with BFQ.
> 
> And rq->end_io is set to mq_flush_data_end_io when this happens. The only point I see where this function could invoke blk_mq_end_request is via blk_flush_complete_seq, but it does so only if seq == REQ_FSEQ_DONE. However, seq is REQ_FSEQ_POSTFLUSH for me (i.e. it just transitioned from REQ_FSEQ_DATA to REQ_FSEQ_POSTFLUSH).


Could it be this scenario?

data + post flush

blk_flush_complete_seq          a flush
  case REQ_FSEQ_DATA
  blk_flush_queue_rq
  
  finally issued to driver      blk_mq_dispatch_rq_list
                                try to issue a flush req
                                failed due to NON-NCQ command
                                due to RESTART is set, do nothing
  req complete
  req->end_io() // doesn't check RESTART
  mq_flush_data_end_io
   blk_flush_complete_seq
     case REQ_FSEQ_POSTFLUSH
     blk_kick_flush
       do nothing because previous flush
       has not been completed, so 
       flush_pending_idx != flush_running_idx

Thanks
Jianchao


* Re: fsync hangs after scsi rejected a request
  2019-01-26  1:18               ` jianchao.wang
@ 2019-01-26  1:25                 ` jianchao.wang
  2019-01-26 15:37                   ` Florian Stecker
  0 siblings, 1 reply; 16+ messages in thread
From: jianchao.wang @ 2019-01-26  1:25 UTC (permalink / raw)
  To: Florian Stecker, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/26/19 9:18 AM, jianchao.wang wrote:
> 
> 
> On 1/26/19 6:10 AM, Florian Stecker wrote:
>>
>>
>> On 1/25/19 10:05 AM, jianchao.wang wrote:
>>>
>>>
>>> On 1/25/19 4:56 PM, Florian Stecker wrote:
>>>> On 1/25/19 4:49 AM, jianchao.wang wrote:
>>>>>
>>>>> It sounds like not so easy to trigger.
>>>>>
>>>>> blk_mq_dispatch_rq_list
>>>>>     scsi_queue_rq
>>>>>        if (atomic_read(&sdev->device_busy) ||
>>>>>          scsi_device_blocked(sdev))
>>>>>          ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
>>>>>                                                    __blk_mq_end_request
>>>>>                                                      blk_mq_sched_restart // clear RESTART
>>>>>                                                        blk_mq_run_hw_queue
>>>>>                                                    blk_mq_run_hw_queues
>>>>>       list_splice_init(list, &hctx->dispatch)
>>>>>     needs_restart = blk_mq_sched_needs_restart(hctx)
>>>>>
>>>>> The 'needs_restart' will be false, so the queue would be rerun.
>>>>>
>>>>> Thanks
>>>>> Jianchao
>>>>
>>>> Good point. So the RESTART flag is supposed to protect against this? Now I see, this is also sort of what the lengthy comment in blk_mq_dispatch_rq_list is saying.
>>>>
>>>> May I complain that this is very unintuitive (the queue gets rerun when the RESTART flag is _not_ set) and also unreliable, as not every caller of blk_mq_dispatch_rq_list seems to set the flag, and also it does not always get cleared in __blk_mq_end_request?
>>>>
>>>> __blk_mq_end_request does the following:
>>>>
>>>>      if (rq->end_io) {
>>>>          rq_qos_done(rq->q, rq);
>>>>          rq->end_io(rq, error);
>>>>      } else {
>>>>          if (unlikely(blk_bidi_rq(rq)))
>>>>              blk_mq_free_request(rq->next_rq);
>>>>          blk_mq_free_request(rq);
>>>>      }
>>>>
>>>> and blk_mq_free_request then calls blk_mq_sched_restart, which clears the flag. But in my case, rq->end_io != 0, so blk_mq_free_request is never called.
>>>
>>> So what is your rq->end_io ?
>>> flush_end_io ? or mq_flush_data_end_io ? or other ?
>>> In normal case, the blk_mq_end_request should be finally invoked.
>>>
>>> Did you ever try the bfq io scheduler instead of mq-deadline ?
>>>
>>> Can you share your dmesg and config file here ?
>>
>> Sure.
>>
>> dmesg: https://florianstecker.de/dmesg-2019-01-25.txt
>> Note that no I/O scheduler is in use here, I deactivated them in a udev rule because I still want to be able to use my laptop. When I test this stuff I just reactivate mq-deadline manually.
>>
>> Config is the default in Arch Linux: https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
>>
>> The problem also appears with BFQ.
>>
>> And rq->end_io is set to mq_flush_data_end_io when this happens. The only point I see where this function could invoke blk_mq_end_request is via blk_flush_complete_seq, but it does so only if seq == REQ_FSEQ_DONE. However, seq is REQ_FSEQ_POSTFLUSH for me (i.e. it just transitioned from REQ_FSEQ_DATA to REQ_FSEQ_POSTFLUSH).
> 
> 
> Could it be this scenario ?
> 
> data + post flush
> 
> blk_flush_complete_seq          a flush
>   case REQ_FSEQ_DATA
>   blk_flush_queue_rq
>   
>   finally issued to driver      blk_mq_dispatch_rq_list
>                                 try to issue a flush req
>                                 failed due to NON-NCQ command
>                                 due to RESTART is set, do nothing
>   req complete
>   req->end_io() // doesn't check RESTART
>   mq_flush_data_end_io
>    blk_flush_complete_seq
>      case REQ_FSEQ_POSTFLUSH
>      blk_kick_flush
>        do nothing because previous flush
>        has not been completed, so 
>        flush_pending_idx != flush_running_idx
> 

If that is right, maybe we could try the following:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ba37b9..3411b6a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -549,6 +549,7 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
        if (rq->end_io) {
                rq_qos_done(rq->q, rq);
                rq->end_io(rq, error);
+               blk_mq_sched_restart(hctx);
        } else {
                if (unlikely(blk_bidi_rq(rq)))
                        blk_mq_free_request(rq->next_rq);





* Re: fsync hangs after scsi rejected a request
  2019-01-26  1:25                 ` jianchao.wang
@ 2019-01-26 15:37                   ` Florian Stecker
  2019-01-28  2:42                     ` jianchao.wang
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Stecker @ 2019-01-26 15:37 UTC (permalink / raw)
  To: jianchao.wang, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/26/19 2:25 AM, jianchao.wang wrote:
> 
> 
> On 1/26/19 9:18 AM, jianchao.wang wrote:
>>
>>
>> On 1/26/19 6:10 AM, Florian Stecker wrote:
>>>
>>>
>>> On 1/25/19 10:05 AM, jianchao.wang wrote:
>>>>
>>>>
>>>> On 1/25/19 4:56 PM, Florian Stecker wrote:
>>>>> On 1/25/19 4:49 AM, jianchao.wang wrote:
>>>>>>
>>>>>> It sounds like not so easy to trigger.
>>>>>>
>>>>>> blk_mq_dispatch_rq_list
>>>>>>      scsi_queue_rq
>>>>>>         if (atomic_read(&sdev->device_busy) ||
>>>>>>           scsi_device_blocked(sdev))
>>>>>>           ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
>>>>>>                                                     __blk_mq_end_request
>>>>>>                                                       blk_mq_sched_restart // clear RESTART
>>>>>>                                                         blk_mq_run_hw_queue
>>>>>>                                                     blk_mq_run_hw_queues
>>>>>>        list_splice_init(list, &hctx->dispatch)
>>>>>>      needs_restart = blk_mq_sched_needs_restart(hctx)
>>>>>>
>>>>>> The 'needs_restart' will be false, so the queue would be rerun.
>>>>>>
>>>>>> Thanks
>>>>>> Jianchao
>>>>>
>>>>> Good point. So the RESTART flag is supposed to protect against this? Now I see, this is also sort of what the lengthy comment in blk_mq_dispatch_rq_list is saying.
>>>>>
>>>>> May I complain that this is very unintuitive (the queue gets rerun when the RESTART flag is _not_ set) and also unreliable, as not every caller of blk_mq_dispatch_rq_list seems to set the flag, and also it does not always get cleared in __blk_mq_end_request?
>>>>>
>>>>> __blk_mq_end_request does the following:
>>>>>
>>>>>       if (rq->end_io) {
>>>>>           rq_qos_done(rq->q, rq);
>>>>>           rq->end_io(rq, error);
>>>>>       } else {
>>>>>           if (unlikely(blk_bidi_rq(rq)))
>>>>>               blk_mq_free_request(rq->next_rq);
>>>>>           blk_mq_free_request(rq);
>>>>>       }
>>>>>
>>>>> and blk_mq_free_request then calls blk_mq_sched_restart, which clears the flag. But in my case, rq->end_io != 0, so blk_mq_free_request is never called.
>>>>
>>>> So what is your rq->end_io ?
>>>> flush_end_io ? or mq_flush_data_end_io ? or other ?
>>>> In normal case, the blk_mq_end_request should be finally invoked.
>>>>
>>>> Did you ever try the bfq io scheduler instead of mq-deadline ?
>>>>
>>>> Can you share your dmesg and config file here ?
>>>
>>> Sure.
>>>
>>> dmesg: https://florianstecker.de/dmesg-2019-01-25.txt
>>> Note that no I/O scheduler is in use here, I deactivated them in a udev rule because I still want to be able to use my laptop. When I test this stuff I just reactivate mq-deadline manually.
>>>
>>> Config is the default in Arch Linux: https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
>>>
>>> The problem also appears with BFQ.
>>>
>>> And rq->end_io is set to mq_flush_data_end_io when this happens. The only point I see where this function could invoke blk_mq_end_request is via blk_flush_complete_seq, but it does so only if seq == REQ_FSEQ_DONE. However, seq is REQ_FSEQ_POSTFLUSH for me (i.e. it just transitioned from REQ_FSEQ_DATA to REQ_FSEQ_POSTFLUSH).
>>
>>
>> Could it be this scenario ?
>>
>> data + post flush
>>
>> blk_flush_complete_seq          a flush
>>    case REQ_FSEQ_DATA
>>    blk_flush_queue_rq
>>    
>>    finally issued to driver      blk_mq_dispatch_rq_list
>>                                  try to issue a flush req
>>                                  failed due to NON-NCQ command
>>                                  due to RESTART is set, do nothing
>>    req complete
>>    req->end_io() // doesn't check RESTART
>>    mq_flush_data_end_io
>>    blk_flush_complete_seq
>>       case REQ_FSEQ_POSTFLUSH
>>       blk_kick_flush
>>         do nothing because previous flush
>>         has not been completed, so
>>         flush_pending_idx != flush_running_idx
>>
> 
> If it is right, maybe we could try following
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3ba37b9..3411b6a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -549,6 +549,7 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
>          if (rq->end_io) {
>                  rq_qos_done(rq->q, rq);
>                  rq->end_io(rq, error);
> +               blk_mq_sched_restart(hctx);
>          } else {
>                  if (unlikely(blk_bidi_rq(rq)))
>                          blk_mq_free_request(rq->next_rq);
> 
> 
> 

I'm not sure if I understand every detail, but your scenario sounds 
roughly the same as what I imagine. The patch solves the problem for me 
(except for the fact that hctx is not defined in that function).

Actually, it doesn't seem to me as if any of the other users of 
blk_mq_free_request require or even expect that it calls 
blk_mq_sched_restart (I could be wrong though). If that is true, how 
about we just call it from __blk_mq_end_request, like this (also tested 
successfully on my laptop)?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a7566244de3..9f3d3456c4af 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -481,7 +481,6 @@ static void __blk_mq_free_request(struct request *rq)
                 blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
         if (sched_tag != -1)
                 blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
-       blk_mq_sched_restart(hctx);
         blk_queue_exit(q);
  }

@@ -522,6 +521,7 @@ EXPORT_SYMBOL_GPL(blk_mq_free_request);
  inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
  {
         u64 now = ktime_get_ns();
+       struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);

         if (rq->rq_flags & RQF_STATS) {
                 blk_mq_poll_stats_start(rq->q);
@@ -541,6 +541,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
                         blk_mq_free_request(rq->next_rq);
                 blk_mq_free_request(rq);
         }
+
+       blk_mq_sched_restart(hctx);
  }
  EXPORT_SYMBOL(__blk_mq_end_request);

Thanks,
Florian


* Re: fsync hangs after scsi rejected a request
  2019-01-26 15:37                   ` Florian Stecker
@ 2019-01-28  2:42                     ` jianchao.wang
  2019-01-30  1:13                       ` jianchao.wang
  0 siblings, 1 reply; 16+ messages in thread
From: jianchao.wang @ 2019-01-28  2:42 UTC (permalink / raw)
  To: Florian Stecker, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/26/19 11:37 PM, Florian Stecker wrote:
> 
> 
> On 1/26/19 2:25 AM, jianchao.wang wrote:
...
>>>>
>>>> dmesg: https://florianstecker.de/dmesg-2019-01-25.txt
>>>> Note that no I/O scheduler is in use here, I deactivated them in a udev rule because I still want to be able to use my laptop. When I test this stuff I just reactivate mq-deadline manually.
>>>>
>>>> Config is the default in Arch Linux: https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
>>>>
>>>> The problem also appears with BFQ.
>>>>
>>>> And rq->end_io is set to mq_flush_data_end_io when this happens. The only point I see where this function could invoke blk_mq_end_request is via blk_flush_complete_seq, but it does so only if seq == REQ_FSEQ_DONE. However, seq is REQ_FSEQ_POSTFLUSH for me (i.e. it just transitioned from REQ_FSEQ_DATA to REQ_FSEQ_POSTFLUSH).
>>>
>>>
>>> Could it be this scenario ?
>>>
>>> data + post flush
>>>
>>> blk_flush_complete_seq          a flush
>>>    case REQ_FSEQ_DATA
>>>    blk_flush_queue_rq
>>>       finally issued to driver      blk_mq_dispatch_rq_list
>>>                                  try to issue a flush req
>>>                                  failed due to NON-NCQ command
>>>                                  due to RESTART is set, do nothing
>>>    req complete
>>>    req->end_io() // doesn't check RESTART
>>>    mq_flush_data_end_io
>>>    blk_flush_complete_seq
>>>       case REQ_FSEQ_POSTFLUSH
>>>       blk_kick_flush
>>>         do nothing because previous flush
>>>         has not been completed, so
>>>         flush_pending_idx != flush_running_idx
>>>

This should be corrected as follows:
 data + post flush
 blk_flush_complete_seq          a flush
   case REQ_FSEQ_DATA
   blk_flush_queue_rq
      finally issued to driver   blk_mq_dispatch_rq_list
                                 try to issue a flush req
                                 failed due to NON-NCQ command
                                 return BLK_STS_DEV_RESOURCE

 req complete
 req->end_io() // doesn't check RESTART
 mq_flush_data_end_io
 case REQ_FSEQ_POSTFLUSH
    blk_kick_flush
    do nothing because previous flush
    has not been completed, so
    flush_pending_idx != flush_running_idx
 blk_mq_mq_run_hw_queue
                                 insert rq to hctx->dispatch_list
                                 due to RESTART is set, do nothing

...
> I'm not sure if I understand every detail, but your scenario sounds roughly the same as what I imagine. The patch solves the problem for me (except for the fact that hctx is not defined in that function).
> 
> Actually, it doesn't seem to me as if any of the other users of blk_mq_free_request require or even expect that it calls blk_mq_sched_restart (I could be wrong though). If that is true, how about we just call it from __blk_mq_end_request, like this (also tested successfully on my laptop)?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6a7566244de3..9f3d3456c4af 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -481,7 +481,6 @@ static void __blk_mq_free_request(struct request *rq)
>                 blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
>         if (sched_tag != -1)
>                 blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
> -       blk_mq_sched_restart(hctx);
>         blk_queue_exit(q);
>  }
> 
> @@ -522,6 +521,7 @@ EXPORT_SYMBOL_GPL(blk_mq_free_request);
>  inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
>  {
>         u64 now = ktime_get_ns();
> +       struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
> 
>         if (rq->rq_flags & RQF_STATS) {
>                 blk_mq_poll_stats_start(rq->q);
> @@ -541,6 +541,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
>                         blk_mq_free_request(rq->next_rq);
>                 blk_mq_free_request(rq);
>         }
> +
> +       blk_mq_sched_restart(hctx);
>  }
>  EXPORT_SYMBOL(__blk_mq_end_request);

We should invoke blk_mq_sched_restart before blk_queue_exit is invoked.

How about this one?

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a3fc7191..6e0f2d9 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -335,7 +335,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
        blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
        spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 
-       blk_mq_run_hw_queue(hctx, true);
+       blk_mq_sched_restart(hctx);
 }

Thanks
Jianchao


* Re: fsync hangs after scsi rejected a request
  2019-01-28  2:42                     ` jianchao.wang
@ 2019-01-30  1:13                       ` jianchao.wang
  2019-01-30  1:13                         ` Florian Stecker
  0 siblings, 1 reply; 16+ messages in thread
From: jianchao.wang @ 2019-01-30  1:13 UTC (permalink / raw)
  To: Florian Stecker, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block

Hi Florian

On 1/28/19 10:42 AM, jianchao.wang wrote:
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index a3fc7191..6e0f2d9 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -335,7 +335,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
>         blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
>         spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
>  
> -       blk_mq_run_hw_queue(hctx, true);
> +       blk_mq_sched_restart(hctx);
>  }

Does this patch work?

Thanks
Jianchao


* Re: fsync hangs after scsi rejected a request
  2019-01-30  1:13                       ` jianchao.wang
@ 2019-01-30  1:13                         ` Florian Stecker
  2019-01-30  1:21                           ` jianchao.wang
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Stecker @ 2019-01-30  1:13 UTC (permalink / raw)
  To: jianchao.wang, Florian Stecker, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block

Hi Jianchao,

I just tried it out - it seems to work nicely.

Best,
Florian


On January 30, 2019 2:13:14 AM GMT+01:00, "jianchao.wang" <jianchao.w.wang@oracle.com> wrote:
>Hi Florian
>
>On 1/28/19 10:42 AM, jianchao.wang wrote:
>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>> index a3fc7191..6e0f2d9 100644
>> --- a/block/blk-flush.c
>> +++ b/block/blk-flush.c
>> @@ -335,7 +335,7 @@ static void mq_flush_data_end_io(struct request
>*rq, blk_status_t error)
>>         blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
>>         spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
>>  
>> -       blk_mq_run_hw_queue(hctx, true);
>> +       blk_mq_sched_restart(hctx);
>>  }
>
>Does this patch work ?
>
>Thanks
>Jianchao


* Re: fsync hangs after scsi rejected a request
  2019-01-30  1:13                         ` Florian Stecker
@ 2019-01-30  1:21                           ` jianchao.wang
  0 siblings, 0 replies; 16+ messages in thread
From: jianchao.wang @ 2019-01-30  1:21 UTC (permalink / raw)
  To: Florian Stecker, Florian Stecker, Ming Lei
  Cc: Linux SCSI List, Martin K. Petersen, Bart Van Assche, linux-block



On 1/30/19 9:13 AM, Florian Stecker wrote:
> Hi Jianchao,
> 
> I just tried it out - it seems to work nicely.

Thanks so much for your confirmation.
May I have the honor of adding your Reported-by and Tested-by tags?

Thanks
Jianchao
> 
> Best,
> Florian
> 
> 
> On January 30, 2019 2:13:14 AM GMT+01:00, "jianchao.wang" <jianchao.w.wang@oracle.com> wrote:
>> Hi Florian
>>
>> On 1/28/19 10:42 AM, jianchao.wang wrote:
>>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>>> index a3fc7191..6e0f2d9 100644
>>> --- a/block/blk-flush.c
>>> +++ b/block/blk-flush.c
>>> @@ -335,7 +335,7 @@ static void mq_flush_data_end_io(struct request
>> *rq, blk_status_t error)
>>>         blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
>>>         spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
>>>  
>>> -       blk_mq_run_hw_queue(hctx, true);
>>> +       blk_mq_sched_restart(hctx);
>>>  }
>>
>> Does this patch work ?
>>
>> Thanks
>> Jianchao
> 


end of thread, other threads:[~2019-01-30  1:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-21 21:01 fsync hangs after scsi rejected a request Florian Stecker
2019-01-22  3:22 ` Ming Lei
2019-01-25  0:45   ` Florian Stecker
2019-01-25  1:33     ` Ming Lei
2019-01-25  3:49       ` jianchao.wang
2019-01-25  8:56         ` Florian Stecker
2019-01-25  9:05           ` jianchao.wang
2019-01-25 22:10             ` Florian Stecker
2019-01-26  1:18               ` jianchao.wang
2019-01-26  1:25                 ` jianchao.wang
2019-01-26 15:37                   ` Florian Stecker
2019-01-28  2:42                     ` jianchao.wang
2019-01-30  1:13                       ` jianchao.wang
2019-01-30  1:13                         ` Florian Stecker
2019-01-30  1:21                           ` jianchao.wang
2019-01-25  4:05 ` Bart Van Assche
