blkdev_issue_discard() hangs forever if the underlying storage device is removed

* blkdev_issue_discard() hangs forever if the underlying storage device is removed
@ 2011-08-27  6:11 Bart Van Assche
  2011-08-29 11:56 ` Lukas Czerner
  2011-09-22 17:26 ` Bart Van Assche
  0 siblings, 2 replies; 13+ messages in thread
From: Bart Van Assche @ 2011-08-27  6:11 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Lukas Czerner; +Cc: LKML

Apparently blkdev_issue_discard() never times out, not even if the
device has been removed. This is what appeared in the kernel log after
device removal (triggered by running mkfs.ext4 on an SRP SCSI device
node):

sd 15:0:0:0: [sdb] Attached SCSI disk
scsi host15: SRP abort called
scsi host15: SRP reset_device called
scsi host15: ib_srp: SRP reset_host called
scsi host15: ib_srp: connection closed
scsi host15: ib_srp: Got failed path rec status -110
scsi host15: ib_srp: Path record query failed
scsi host15: ib_srp: reconnect failed (-110), removing target port.
sd 15:0:0:0: Device offlined - not ready after error recovery
INFO: task mkfs.ext4:4304 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mkfs.ext4       D 0000000000000000     0  4304   3649 0x00000000
 ffff88006c313b98 0000000000000046 ffffffff813e3038 ffffffff81e6b580
 0000000000000082 000000010003cfdc ffff88006c313fd8 ffff880070fbcbc0
 00000000001d1f40 ffff88006c313fd8 ffff88006c312000 ffff88006c312000
Call Trace:
 [<ffffffff813e3038>] ? schedule+0x628/0x830
 [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310
 [<ffffffff810805de>] ? put_lock_stats+0xe/0x40
 [<ffffffff81080e05>] ? lock_release_holdtime+0xb5/0x160
 [<ffffffff813e6ac0>] ? _raw_spin_unlock_irq+0x30/0x60
 [<ffffffff8103f7d9>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffff813e28e0>] wait_for_common+0x110/0x160
 [<ffffffff810425f0>] ? try_to_wake_up+0x2c0/0x2c0
 [<ffffffff813e2a0d>] wait_for_completion+0x1d/0x20
 [<ffffffff811de93a>] blkdev_issue_discard+0x27a/0x2c0
 [<ffffffff813e2806>] ? wait_for_common+0x36/0x160
 [<ffffffff811df371>] blkdev_ioctl+0x701/0x760
 [<ffffffff8112b7bf>] ? kmem_cache_free+0x6f/0x160
 [<ffffffff811755b7>] block_ioctl+0x47/0x50
 [<ffffffff81151b78>] do_vfs_ioctl+0x98/0x570
 [<ffffffff813e76dc>] ? sysret_check+0x27/0x62
 [<ffffffff8115209f>] sys_ioctl+0x4f/0x80
 [<ffffffff813e76ab>] system_call_fastpath+0x16/0x1b
no locks held by mkfs.ext4/4304.

The above message kept repeating forever until system reboot.

Kernel version:
$ git show | head -n 1
commit ed8f37370d83e695c0a4fa5d5fc7a83ecb947526
$ git describe
v3.0-7216-ged8f373

I consider this a bug because the state described above makes it
impossible to kill the mkfs process and also makes it impossible to remove the
kernel module ib_srp. That's why I also reported this as
https://bugzilla.kernel.org/show_bug.cgi?id=40472.

Any opinions ?

Thanks,

Bart.


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-27  6:11 blkdev_issue_discard() hangs forever if the underlying storage device is removed Bart Van Assche
@ 2011-08-29 11:56 ` Lukas Czerner
  2011-08-29 14:15   ` Lin Ming
  2011-08-29 17:56   ` Bart Van Assche
  2011-09-22 17:26 ` Bart Van Assche
  1 sibling, 2 replies; 13+ messages in thread
From: Lukas Czerner @ 2011-08-29 11:56 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, Mike Snitzer, Lukas Czerner, LKML

On Sat, 27 Aug 2011, Bart Van Assche wrote:

> Apparently blkdev_issue_discard() never times out, not even if the
> device has been removed. This is what appeared in the kernel log after
> device removal (triggered by running mkfs.ext4 on an SRP SCSI device
> node):
> 
> sd 15:0:0:0: [sdb] Attached SCSI disk
> scsi host15: SRP abort called
> scsi host15: SRP reset_device called
> scsi host15: ib_srp: SRP reset_host called
> scsi host15: ib_srp: connection closed
> scsi host15: ib_srp: Got failed path rec status -110
> scsi host15: ib_srp: Path record query failed
> scsi host15: ib_srp: reconnect failed (-110), removing target port.
> sd 15:0:0:0: Device offlined - not ready after error recovery
> INFO: task mkfs.ext4:4304 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> mkfs.ext4       D 0000000000000000     0  4304   3649 0x00000000
>  ffff88006c313b98 0000000000000046 ffffffff813e3038 ffffffff81e6b580
>  0000000000000082 000000010003cfdc ffff88006c313fd8 ffff880070fbcbc0
>  00000000001d1f40 ffff88006c313fd8 ffff88006c312000 ffff88006c312000
> Call Trace:
>  [<ffffffff813e3038>] ? schedule+0x628/0x830
>  [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310
>  [<ffffffff810805de>] ? put_lock_stats+0xe/0x40
>  [<ffffffff81080e05>] ? lock_release_holdtime+0xb5/0x160
>  [<ffffffff813e6ac0>] ? _raw_spin_unlock_irq+0x30/0x60
>  [<ffffffff8103f7d9>] ? sub_preempt_count+0xa9/0xe0
>  [<ffffffff813e28e0>] wait_for_common+0x110/0x160
>  [<ffffffff810425f0>] ? try_to_wake_up+0x2c0/0x2c0
>  [<ffffffff813e2a0d>] wait_for_completion+0x1d/0x20
>  [<ffffffff811de93a>] blkdev_issue_discard+0x27a/0x2c0
>  [<ffffffff813e2806>] ? wait_for_common+0x36/0x160
>  [<ffffffff811df371>] blkdev_ioctl+0x701/0x760
>  [<ffffffff8112b7bf>] ? kmem_cache_free+0x6f/0x160
>  [<ffffffff811755b7>] block_ioctl+0x47/0x50
>  [<ffffffff81151b78>] do_vfs_ioctl+0x98/0x570
>  [<ffffffff813e76dc>] ? sysret_check+0x27/0x62
>  [<ffffffff8115209f>] sys_ioctl+0x4f/0x80
>  [<ffffffff813e76ab>] system_call_fastpath+0x16/0x1b
> no locks held by mkfs.ext4/4304.
> 
> The above message kept repeating forever until system reboot.
> 
> Kernel version:
> $ git show | head -n 1
> commit ed8f37370d83e695c0a4fa5d5fc7a83ecb947526
> $ git describe
> v3.0-7216-ged8f373
> 
> I'm considering this as a bug because the state described above makes it
> impossible to kill the mkfs process and also makes it impossible to remove the
> kernel module ib_srp. That's why I also reported this as
> https://bugzilla.kernel.org/show_bug.cgi?id=40472.
> 
> Any opinions ?
> 
> Thanks,
> 
> Bart.

Thanks for reporting this! The problem looks odd to me. I am trying to
find some race condition that would cause the problem in
blkdev_issue_discard(), however I cannot see anything.

The backtrace shows that blkdev_issue_discard() is waiting in
wait_for_completion(). That means that the last bio issued from that
function has not yet completed.

In blkdev_issue_discard() we do:

atomic_set(&bb.done, 1);
...
...
while (nr_sects) {
...
	atomic_inc(&bb.done);
	submit_bio(type, bio);
}

and after all bios have been submitted it does:

if (!atomic_dec_and_test(&bb.done))
	wait_for_completion(&wait);

Then the bio completion callback does:

if (atomic_dec_and_test(&bb->done))
	complete(bb->wait);

The only reason for this to happen that I can see is that the last bio
has not completed yet (i.e. the bio_batch_end_io() callback has not been
called for the last submitted bio). Do bios have some sort of timeout
after which they are failed? Is it possible that we can lose a bio like that?
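
The counting scheme described above can be sketched in userspace. This is
only a model under stated assumptions: C11 atomics stand in for the
kernel's atomic_t, a boolean stands in for complete(), and the names
batch_model, dec_and_test, batch_end_io and batch_would_hang are
hypothetical, not taken from the kernel source.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of the reference counting only; not the kernel code. */
struct batch_model {
	atomic_int done;   /* 1 for the submitter plus 1 per bio in flight */
	bool completed;    /* stands in for complete() having been called */
};

/* Same contract as the kernel's atomic_dec_and_test(): true when the
 * decrement brings the counter to zero. */
static bool dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/* Models the completion callback: the last reference holder signals. */
static void batch_end_io(struct batch_model *bb)
{
	if (dec_and_test(&bb->done))
		bb->completed = true;
}

/*
 * Submit nr_bios, let `early` of them complete before the submitter drops
 * its own reference and `late` of them afterwards. Returns true if the
 * submitter would still be asleep in wait_for_completion() at the end.
 */
static bool batch_would_hang(int nr_bios, int early, int late)
{
	struct batch_model bb = { .completed = false };
	bool must_wait;
	int i;

	assert(early >= 0 && late >= 0 && early + late <= nr_bios);

	atomic_init(&bb.done, 1);               /* atomic_set(&bb.done, 1) */
	for (i = 0; i < nr_bios; i++)
		atomic_fetch_add(&bb.done, 1);  /* atomic_inc() before submit_bio() */
	for (i = 0; i < early; i++)
		batch_end_io(&bb);

	must_wait = !dec_and_test(&bb.done);    /* submitter drops its reference */

	for (i = 0; i < late; i++)
		batch_end_io(&bb);

	return must_wait && !bb.completed;
}
```

In this model the waiter is released for any interleaving in which every
submitted bio eventually completes, e.g. batch_would_hang(3, 1, 2) is
false. It stays blocked only when at least one completion never arrives,
e.g. batch_would_hang(3, 1, 1) is true, which matches the reasoning that
the hang points at a bio that was never completed.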

Regarding the atomic operations I do not think that implicit memory
barriers are needed here as atomic_dec_and_test() implies memory
barrier, atomic_set() is out of the scope of our interest (and it would
not cause the problem like that anyway) and reordering atomic_inc()
would not cause problem like this as well.

So I do not think that the problem is in blkdev_issue_discard().
Any thoughts ?

Thanks!
-Lukas


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 11:56 ` Lukas Czerner
@ 2011-08-29 14:15   ` Lin Ming
  2011-08-29 14:42     ` Lukas Czerner
  2011-08-29 17:56   ` Bart Van Assche
  1 sibling, 1 reply; 13+ messages in thread
From: Lin Ming @ 2011-08-29 14:15 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Bart Van Assche, Jens Axboe, Mike Snitzer, LKML

On Mon, Aug 29, 2011 at 7:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
>
> Regarding the atomic operations I do not think that implicit memory
> barriers are needed here as atomic_dec_and_test() implies memory

Which implicit memory barrier are you talking about?

> barrier, atomic_set() is out of the scope of our interest (and it would
> not cause the problem like that anyway) and reordering atomic_inc()
> would not cause problem like this as well.


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 14:15   ` Lin Ming
@ 2011-08-29 14:42     ` Lukas Czerner
  2011-08-29 15:11       ` Lin Ming
  0 siblings, 1 reply; 13+ messages in thread
From: Lukas Czerner @ 2011-08-29 14:42 UTC (permalink / raw)
  To: Lin Ming; +Cc: Lukas Czerner, Bart Van Assche, Jens Axboe, Mike Snitzer, LKML

On Mon, 29 Aug 2011, Lin Ming wrote:

> On Mon, Aug 29, 2011 at 7:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> >
> > Regarding the atomic operations I do not think that implicit memory
> > barriers are needed here as atomic_dec_and_test() implies memory
> 
> Which implicit memory barrier you are talking about?

smp_mb() on both sides of the operation, as documented in
Documentation/memory-barriers.txt:

[citation]
Any atomic operation that modifies some state in memory and returns
information about the state (old or new) implies an SMP-conditional
general memory barrier (smp_mb()) on each side of the actual operation
(with the exception of explicit lock operations, described later).
These include:

	xchg();
	cmpxchg();
	atomic_cmpxchg();
	atomic_inc_return();
	atomic_dec_return();
	atomic_add_return();
	atomic_sub_return();
	atomic_inc_and_test();
	atomic_dec_and_test();
	atomic_sub_and_test();
	atomic_add_negative();
	atomic_add_unless();
	test_and_set_bit();
	test_and_clear_bit();
	test_and_change_bit();

[/citation]
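
A rough userspace analogue of why no explicit barrier is needed around a
"drop the last reference" pattern is sketched below. It assumes POSIX
threads and C11 atomics; atomic_fetch_sub() defaults to
memory_order_seq_cst, which is used here as a stand-in for the fully
ordered kernel operation and is not the kernel primitive itself. The names
shared, worker and run_last_ref_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_WORKERS 8

struct shared {
	atomic_int refs;            /* one reference per worker */
	int payload[MAX_WORKERS];   /* plain, non-atomic data */
	int nr;
	int sum_seen_by_last;       /* written only by whoever drops the last ref */
};

struct worker_arg {
	struct shared *s;
	int idx;
};

static void *worker(void *p)
{
	struct worker_arg *a = p;
	struct shared *s = a->s;
	int i, sum = 0;

	s->payload[a->idx] = a->idx + 1;   /* plain store before the atomic RMW */

	/* Value-returning RMW, sequentially consistent by default: the thread
	 * that observes the count reaching zero also observes every store made
	 * by the other workers before their own decrement. */
	if (atomic_fetch_sub(&s->refs, 1) == 1) {
		for (i = 0; i < s->nr; i++)
			sum += s->payload[i];
		s->sum_seen_by_last = sum;
	}
	return NULL;
}

/* Runs nr workers and returns the payload sum seen by the last one. */
static int run_last_ref_demo(int nr)
{
	struct shared s = { .nr = nr, .sum_seen_by_last = -1 };
	struct worker_arg args[MAX_WORKERS];
	pthread_t tid[MAX_WORKERS];
	int i, rc;

	assert(nr >= 1 && nr <= MAX_WORKERS);
	atomic_init(&s.refs, nr);

	for (i = 0; i < nr; i++) {
		args[i].s = &s;
		args[i].idx = i;
		rc = pthread_create(&tid[i], NULL, worker, &args[i]);
		assert(rc == 0);   /* sketch only: no recovery path */
		(void)rc;
	}
	for (i = 0; i < nr; i++)
		pthread_join(tid[i], NULL);

	return s.sum_seen_by_last;
}
```

With four workers the last decrementer should always see the complete sum
1 + 2 + 3 + 4, because each worker's plain store is ordered before its own
decrement and the final decrement is ordered before the reads.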

-Lukas

> 
> > barrier, atomic_set() is out of the scope of our interest (and it would
> > not cause the problem like that anyway) and reordering atomic_inc()
> > would not cause problem like this as well.
> 


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 14:42     ` Lukas Czerner
@ 2011-08-29 15:11       ` Lin Ming
  2011-08-29 15:16         ` Lukas Czerner
  0 siblings, 1 reply; 13+ messages in thread
From: Lin Ming @ 2011-08-29 15:11 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Bart Van Assche, Jens Axboe, Mike Snitzer, LKML

On Mon, Aug 29, 2011 at 10:42 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Mon, 29 Aug 2011, Lin Ming wrote:
>
>> On Mon, Aug 29, 2011 at 7:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
>> >
>> > Regarding the atomic operations I do not think that implicit memory
>> > barriers are needed here as atomic_dec_and_test() implies memory
>>
>> Which implicit memory barrier you are talking about?
>
> smp_mb() at both side of the operation as documented here in
> Documentation/memory-barriers.txt

Thanks for the info.

But I don't follow you ... why are those implicit memory barriers NOT needed?


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 15:11       ` Lin Ming
@ 2011-08-29 15:16         ` Lukas Czerner
  2011-08-29 15:26           ` Lin Ming
  0 siblings, 1 reply; 13+ messages in thread
From: Lukas Czerner @ 2011-08-29 15:16 UTC (permalink / raw)
  To: Lin Ming; +Cc: Lukas Czerner, Bart Van Assche, Jens Axboe, Mike Snitzer, LKML

On Mon, 29 Aug 2011, Lin Ming wrote:

> On Mon, Aug 29, 2011 at 10:42 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> > On Mon, 29 Aug 2011, Lin Ming wrote:
> >
> >> On Mon, Aug 29, 2011 at 7:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> >> >
> >> > Regarding the atomic operations I do not think that implicit memory
> >> > barriers are needed here as atomic_dec_and_test() implies memory
> >>
> >> Which implicit memory barrier you are talking about?
> >
> > smp_mb() at both side of the operation as documented here in
> > Documentation/memory-barriers.txt
> 
> Thanks for the info.
> 
> But I don't follow you ... why that implicit memory barriers are NOT needed?
> 

Oh, I am sorry, I actually wanted to say that *explicit* memory
barriers are not needed in that case. Sorry for the confusion!

Thanks!
-Lukas


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 15:16         ` Lukas Czerner
@ 2011-08-29 15:26           ` Lin Ming
  0 siblings, 0 replies; 13+ messages in thread
From: Lin Ming @ 2011-08-29 15:26 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Bart Van Assche, Jens Axboe, Mike Snitzer, LKML

On Mon, Aug 29, 2011 at 11:16 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Mon, 29 Aug 2011, Lin Ming wrote:
>
>> On Mon, Aug 29, 2011 at 10:42 PM, Lukas Czerner <lczerner@redhat.com> wrote:
>> > On Mon, 29 Aug 2011, Lin Ming wrote:
>> >
>> >> On Mon, Aug 29, 2011 at 7:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
>> >> >
>> >> > Regarding the atomic operations I do not think that implicit memory
>> >> > barriers are needed here as atomic_dec_and_test() implies memory
>> >>
>> >> Which implicit memory barrier you are talking about?
>> >
>> > smp_mb() at both side of the operation as documented here in
>> > Documentation/memory-barriers.txt
>>
>> Thanks for the info.
>>
>> But I don't follow you ... why that implicit memory barriers are NOT needed?
>>
>
> Oh, I am sorry I have actually wanted to say that *explicit* memory
> barriers are no needed in that case. Sorry for the confusion!

Makes sense now :)

Thanks.


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 11:56 ` Lukas Czerner
  2011-08-29 14:15   ` Lin Ming
@ 2011-08-29 17:56   ` Bart Van Assche
  2011-08-30  2:01     ` Dave Chinner
  2011-08-30 10:34     ` Lukas Czerner
  1 sibling, 2 replies; 13+ messages in thread
From: Bart Van Assche @ 2011-08-29 17:56 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Jens Axboe, Mike Snitzer, LKML

On Mon, Aug 29, 2011 at 1:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Sat, 27 Aug 2011, Bart Van Assche wrote:
>
>> Apparently blkdev_issue_discard() never times out, not even if the
>> device has been removed. This is what appeared in the kernel log after
>> device removal (triggered by running mkfs.ext4 on an SRP SCSI device
>> node):
>>
>> sd 15:0:0:0: [sdb] Attached SCSI disk
>> scsi host15: SRP abort called
>> scsi host15: SRP reset_device called
>> scsi host15: ib_srp: SRP reset_host called
>> scsi host15: ib_srp: connection closed
>> scsi host15: ib_srp: Got failed path rec status -110
>> scsi host15: ib_srp: Path record query failed
>> scsi host15: ib_srp: reconnect failed (-110), removing target port.
>> sd 15:0:0:0: Device offlined - not ready after error recovery
>> INFO: task mkfs.ext4:4304 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> mkfs.ext4       D 0000000000000000     0 4304 3649 0x00000000
>>  ffff88006c313b98 0000000000000046 ffffffff813e3038 ffffffff81e6b580
>>  0000000000000082 000000010003cfdc ffff88006c313fd8 ffff880070fbcbc0
>>  00000000001d1f40 ffff88006c313fd8 ffff88006c312000 ffff88006c312000
>> Call Trace:
>>  [<ffffffff813e3038>] ? schedule+0x628/0x830
>>  [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310
>>  [<ffffffff810805de>] ? put_lock_stats+0xe/0x40
>>  [<ffffffff81080e05>] ? lock_release_holdtime+0xb5/0x160
>>  [<ffffffff813e6ac0>] ? _raw_spin_unlock_irq+0x30/0x60
>>  [<ffffffff8103f7d9>] ? sub_preempt_count+0xa9/0xe0
>>  [<ffffffff813e28e0>] wait_for_common+0x110/0x160
>>  [<ffffffff810425f0>] ? try_to_wake_up+0x2c0/0x2c0
>>  [<ffffffff813e2a0d>] wait_for_completion+0x1d/0x20
>>  [<ffffffff811de93a>] blkdev_issue_discard+0x27a/0x2c0
>>  [<ffffffff813e2806>] ? wait_for_common+0x36/0x160
>>  [<ffffffff811df371>] blkdev_ioctl+0x701/0x760
>>  [<ffffffff8112b7bf>] ? kmem_cache_free+0x6f/0x160
>>  [<ffffffff811755b7>] block_ioctl+0x47/0x50
>>  [<ffffffff81151b78>] do_vfs_ioctl+0x98/0x570
>>  [<ffffffff813e76dc>] ? sysret_check+0x27/0x62
>>  [<ffffffff8115209f>] sys_ioctl+0x4f/0x80
>>  [<ffffffff813e76ab>] system_call_fastpath+0x16/0x1b
>> no locks held by mkfs.ext4/4304.
>>
>> The above message kept repeating forever until system reboot.
>>
>> Kernel version:
>> $ git show | head -n 1
>> commit ed8f37370d83e695c0a4fa5d5fc7a83ecb947526
>> $ git describe
>> v3.0-7216-ged8f373
>>
>> I'm considering this as a bug because the state described above makes it
>> impossible to kill the mkfs process and also makes it impossible to remove the
>> kernel module ib_srp. That's why I also reported this as
>> https://bugzilla.kernel.org/show_bug.cgi?id=40472.
>
> I am trying to find some race condition that would cause the problem in
> blkdev_issue_discard(), however I can not see anything.

I'm not sure why you are looking for a race condition - this looks
like a plain deadlock to me.

> The only reason for this to happen I can see is that the last bio was
> not completed yet (e.g. the bio_batch_end_io() callback has not been
> called by the last submitted bio). Does bios have some sort of timeout
> after it dies out? Is it possible that we can lose bio like that ?

A key fact here is that the block device to which the discard request
was issued is gone, so the discard request will never finish
successfully. Do all relevant error paths guarantee that
blkdev_issue_discard() will finish in a finite time ?

Bart.


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 17:56   ` Bart Van Assche
@ 2011-08-30  2:01     ` Dave Chinner
  2011-08-30 10:38       ` Bart Van Assche
  2011-09-27 17:57       ` Bart Van Assche
  2011-08-30 10:34     ` Lukas Czerner
  1 sibling, 2 replies; 13+ messages in thread
From: Dave Chinner @ 2011-08-30  2:01 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Lukas Czerner, Jens Axboe, Mike Snitzer, LKML

On Mon, Aug 29, 2011 at 07:56:33PM +0200, Bart Van Assche wrote:
> On Mon, Aug 29, 2011 at 1:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> > On Sat, 27 Aug 2011, Bart Van Assche wrote:
> >
> >> Apparently blkdev_issue_discard() never times out, not even if the
> >> device has been removed. This is what appeared in the kernel log after
> >> device removal (triggered by running mkfs.ext4 on an SRP SCSI device
> >> node):
> >>
> >> sd 15:0:0:0: [sdb] Attached SCSI disk
> >> scsi host15: SRP abort called
> >> scsi host15: SRP reset_device called
> >> scsi host15: ib_srp: SRP reset_host called
> >> scsi host15: ib_srp: connection closed
> >> scsi host15: ib_srp: Got failed path rec status -110
> >> scsi host15: ib_srp: Path record query failed
> >> scsi host15: ib_srp: reconnect failed (-110), removing target port.
> >> sd 15:0:0:0: Device offlined - not ready after error recovery
> >> INFO: task mkfs.ext4:4304 blocked for more than 120 seconds.
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> mkfs.ext4       D 0000000000000000     0 4304 3649 0x00000000
> >>  ffff88006c313b98 0000000000000046 ffffffff813e3038 ffffffff81e6b580
> >>  0000000000000082 000000010003cfdc ffff88006c313fd8 ffff880070fbcbc0
> >>  00000000001d1f40 ffff88006c313fd8 ffff88006c312000 ffff88006c312000
> >> Call Trace:
> >>  [<ffffffff813e3038>] ? schedule+0x628/0x830
> >>  [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310
> >>  [<ffffffff810805de>] ? put_lock_stats+0xe/0x40
> >>  [<ffffffff81080e05>] ? lock_release_holdtime+0xb5/0x160
> >>  [<ffffffff813e6ac0>] ? _raw_spin_unlock_irq+0x30/0x60
> >>  [<ffffffff8103f7d9>] ? sub_preempt_count+0xa9/0xe0
> >>  [<ffffffff813e28e0>] wait_for_common+0x110/0x160
> >>  [<ffffffff810425f0>] ? try_to_wake_up+0x2c0/0x2c0
> >>  [<ffffffff813e2a0d>] wait_for_completion+0x1d/0x20
> >>  [<ffffffff811de93a>] blkdev_issue_discard+0x27a/0x2c0
> >>  [<ffffffff813e2806>] ? wait_for_common+0x36/0x160
> >>  [<ffffffff811df371>] blkdev_ioctl+0x701/0x760
> >>  [<ffffffff8112b7bf>] ? kmem_cache_free+0x6f/0x160
> >>  [<ffffffff811755b7>] block_ioctl+0x47/0x50
> >>  [<ffffffff81151b78>] do_vfs_ioctl+0x98/0x570
> >>  [<ffffffff813e76dc>] ? sysret_check+0x27/0x62
> >>  [<ffffffff8115209f>] sys_ioctl+0x4f/0x80
> >>  [<ffffffff813e76ab>] system_call_fastpath+0x16/0x1b
> >> no locks held by mkfs.ext4/4304.
> >>
> >> The above message kept repeating forever until system reboot.
> >>
> >> Kernel version:
> >> $ git show | head -n 1
> >> commit ed8f37370d83e695c0a4fa5d5fc7a83ecb947526
> >> $ git describe
> >> v3.0-7216-ged8f373
> >>
> >> I'm considering this as a bug because the state described above makes it
> >> impossible to kill the mkfs process and also makes it impossible to remove the
> >> kernel module ib_srp. That's why I also reported this as
> >> https://bugzilla.kernel.org/show_bug.cgi?id=40472.
> >
> > I am trying to find some race condition that would cause the problem in
> > blkdev_issue_discard(), however I can not see anything.
> 
> I'm not sure why you are looking for a race condition - this looks
> like a plain deadlock to me.
> 
> > The only reason for this to happen I can see is that the last bio was
> > not completed yet (e.g. the bio_batch_end_io() callback has not been
> > called by the last submitted bio). Does bios have some sort of timeout
> > after it dies out? Is it possible that we can lose bio like that ?
> 
> A key fact here is that the block device to which the discard request
> was issued is gone, so the discard request will never finish
> successfully. Do all relevant error paths guarantee that
> blkdev_issue_discard() will finish in a finite time ?

The underlying block device driver is supposed to handle timing out
of lost IOs and causing them to be completed with an error.
blkdev_issue_discard() is simply waiting for that error to be
delivered.

If the driver has not detected that an outstanding request has not
completed then that is a driver bug, not a bug in
blkdev_issue_discard(). IOWs, you should be asking the ib_srp people
why the in flight bio was not timed out or errored out when the
block device abort was run....
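
The division of responsibility described above can be sketched as a small
single-threaded model. Everything here is an assumption made for
illustration: sim_driver, sim_request, sim_timeout_scan and sim_fail_all
are hypothetical names, time is a plain integer, and the error codes are
the userspace errno values rather than anything a particular driver is
known to return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_REQS 4

struct sim_request {
	bool in_flight;
	bool completed;
	long deadline;   /* time after which the driver gives up on the request */
	int status;      /* 0 or a negative errno, valid once completed */
};

struct sim_driver {
	struct sim_request reqs[SIM_MAX_REQS];
};

/* The point where a real driver would invoke the request's end_io callback. */
static void sim_complete(struct sim_request *rq, int status)
{
	rq->status = status;
	rq->in_flight = false;
	rq->completed = true;
}

static void sim_submit(struct sim_driver *d, size_t slot, long now, long timeout)
{
	assert(slot < SIM_MAX_REQS);
	d->reqs[slot] = (struct sim_request){ .in_flight = true,
					      .deadline = now + timeout };
}

/* Timeout handling: fail every request whose deadline has passed. */
static int sim_timeout_scan(struct sim_driver *d, long now)
{
	int failed = 0;

	for (size_t i = 0; i < SIM_MAX_REQS; i++) {
		struct sim_request *rq = &d->reqs[i];

		if (rq->in_flight && now > rq->deadline) {
			sim_complete(rq, -ETIMEDOUT);
			failed++;
		}
	}
	return failed;
}

/* Device removal: nothing in flight can succeed any more, so fail it all. */
static void sim_fail_all(struct sim_driver *d, int status)
{
	for (size_t i = 0; i < SIM_MAX_REQS; i++)
		if (d->reqs[i].in_flight)
			sim_complete(&d->reqs[i], status);
}

/* Status of a request submitted at time 0 that the device never answers,
 * or 1 if the waiter would still be blocked after the scan. */
static int sim_lost_request_status(long timeout, long scan_time, bool removed)
{
	struct sim_driver d = { 0 };

	sim_submit(&d, 0, 0, timeout);
	if (removed)
		sim_fail_all(&d, -ENODEV);
	sim_timeout_scan(&d, scan_time);
	return d.reqs[0].completed ? d.reqs[0].status : 1;
}
```

In this model a waiter such as blkdev_issue_discard() is released only
through sim_complete(). If neither the timeout scan nor the removal path
reaches a request, it stays pending forever, which is the situation the
hung task report shows.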

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-30  2:01     ` Dave Chinner
@ 2011-08-30 10:38       ` Bart Van Assche
  2011-09-27 17:57       ` Bart Van Assche
  1 sibling, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2011-08-30 10:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Lukas Czerner, Jens Axboe, Mike Snitzer, LKML

On Tue, Aug 30, 2011 at 4:01 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Aug 29, 2011 at 07:56:33PM +0200, Bart Van Assche wrote:
>> A key fact here is that the block device to which the discard request
>> was issued is gone, so the discard request will never finish
>> successfully. Do all relevant error paths guarantee that
>> blkdev_issue_discard() will finish in a finite time ?
>
> The underlying block device driver is supposed to handle timing out
> of lost IOs and causing them to be completed with an error.
> blkdev_issue_discard() is simply waiting for that error to be
> delivered.
>
> If the driver has not detected that an outstanding request has not
> completed then that is a driver bug, not a bug in
> blkdev_issue_discard(). IOWs, you should be asking the ib_srp people
> why the in flight bio was not timed out or errored out when the
> block device abort was run....

Sorry, but the above doesn't make sense. This issue has nothing to do
with the ib_srp implementation: blkdev_issue_discard() was invoked by
mkfs after ib_srp had removed the SCSI host associated with the block
device. So none of the code in ib_srp has been invoked while
processing this blkdev_issue_discard() call.

Bart.


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-30  2:01     ` Dave Chinner
  2011-08-30 10:38       ` Bart Van Assche
@ 2011-09-27 17:57       ` Bart Van Assche
  1 sibling, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2011-09-27 17:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Lukas Czerner, Jens Axboe, Mike Snitzer, LKML

On Tue, Aug 30, 2011 at 4:01 AM, Dave Chinner <david@fromorbit.com> wrote:
> The underlying block device driver is supposed to handle timing out
> of lost IOs and causing them to be completed with an error.
> blkdev_issue_discard() is simply waiting for that error to be
> delivered.

Hi Dave,

You're probably right that it's not a block layer issue. This patch
fixed it for me: http://marc.info/?l=linux-scsi&m=131680195721932&w=2

Bart.


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-29 17:56   ` Bart Van Assche
  2011-08-30  2:01     ` Dave Chinner
@ 2011-08-30 10:34     ` Lukas Czerner
  1 sibling, 0 replies; 13+ messages in thread
From: Lukas Czerner @ 2011-08-30 10:34 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Lukas Czerner, Jens Axboe, Mike Snitzer, LKML

On Mon, 29 Aug 2011, Bart Van Assche wrote:

> On Mon, Aug 29, 2011 at 1:56 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> > On Sat, 27 Aug 2011, Bart Van Assche wrote:
> >
> >> Apparently blkdev_issue_discard() never times out, not even if the
> >> device has been removed. This is what appeared in the kernel log after
> >> device removal (triggered by running mkfs.ext4 on an SRP SCSI device
> >> node):
> >>
> >> sd 15:0:0:0: [sdb] Attached SCSI disk
> >> scsi host15: SRP abort called
> >> scsi host15: SRP reset_device called
> >> scsi host15: ib_srp: SRP reset_host called
> >> scsi host15: ib_srp: connection closed
> >> scsi host15: ib_srp: Got failed path rec status -110
> >> scsi host15: ib_srp: Path record query failed
> >> scsi host15: ib_srp: reconnect failed (-110), removing target port.
> >> sd 15:0:0:0: Device offlined - not ready after error recovery
> >> INFO: task mkfs.ext4:4304 blocked for more than 120 seconds.
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> mkfs.ext4       D 0000000000000000     0 4304 3649 0x00000000
> >>  ffff88006c313b98 0000000000000046 ffffffff813e3038 ffffffff81e6b580
> >>  0000000000000082 000000010003cfdc ffff88006c313fd8 ffff880070fbcbc0
> >>  00000000001d1f40 ffff88006c313fd8 ffff88006c312000 ffff88006c312000
> >> Call Trace:
> >>  [<ffffffff813e3038>] ? schedule+0x628/0x830
> >>  [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310
> >>  [<ffffffff810805de>] ? put_lock_stats+0xe/0x40
> >>  [<ffffffff81080e05>] ? lock_release_holdtime+0xb5/0x160
> >>  [<ffffffff813e6ac0>] ? _raw_spin_unlock_irq+0x30/0x60
> >>  [<ffffffff8103f7d9>] ? sub_preempt_count+0xa9/0xe0
> >>  [<ffffffff813e28e0>] wait_for_common+0x110/0x160
> >>  [<ffffffff810425f0>] ? try_to_wake_up+0x2c0/0x2c0
> >>  [<ffffffff813e2a0d>] wait_for_completion+0x1d/0x20
> >>  [<ffffffff811de93a>] blkdev_issue_discard+0x27a/0x2c0
> >>  [<ffffffff813e2806>] ? wait_for_common+0x36/0x160
> >>  [<ffffffff811df371>] blkdev_ioctl+0x701/0x760
> >>  [<ffffffff8112b7bf>] ? kmem_cache_free+0x6f/0x160
> >>  [<ffffffff811755b7>] block_ioctl+0x47/0x50
> >>  [<ffffffff81151b78>] do_vfs_ioctl+0x98/0x570
> >>  [<ffffffff813e76dc>] ? sysret_check+0x27/0x62
> >>  [<ffffffff8115209f>] sys_ioctl+0x4f/0x80
> >>  [<ffffffff813e76ab>] system_call_fastpath+0x16/0x1b
> >> no locks held by mkfs.ext4/4304.
> >>
> >> The above message kept repeating forever until system reboot.
> >>
> >> Kernel version:
> >> $ git show | head -n 1
> >> commit ed8f37370d83e695c0a4fa5d5fc7a83ecb947526
> >> $ git describe
> >> v3.0-7216-ged8f373
> >>
> >> I'm considering this as a bug because the state described above makes it
> >> impossible to kill the mkfs process and also makes it impossible to remove the
> >> kernel module ib_srp. That's why I also reported this as
> >> https://bugzilla.kernel.org/show_bug.cgi?id=40472.
> >
> > I am trying to find some race condition that would cause the problem in
> > blkdev_issue_discard(), however I can not see anything.
> 
> I'm not sure why you are looking for a race condition - this looks
> like a plain deadlock to me.
> 
> > The only reason for this to happen I can see is that the last bio was
> > not completed yet (e.g. the bio_batch_end_io() callback has not been
> > called by the last submitted bio). Does bios have some sort of timeout
> > after it dies out? Is it possible that we can lose bio like that ?
> 
> A key fact here is that the block device to which the discard request
> was issued is gone, so the discard request will never finish
> successfully. Do all relevant error paths guarantee that
> blkdev_issue_discard() will finish in a finite time ?
> 
> Bart.

Not really, it is not a deadlock. We are simply waiting for the bios in
flight to finish, and they should finish (successfully or not) at some point.
But apparently the driver either "lost" the bio or did not notice that
the device disappeared and is still waiting for the command to finish.

I was looking for a race condition to see whether I could blame
blkdev_issue_discard() for not calling complete() properly. But since I do
not see any flaw there, I believe that the problem is in the block
device driver.

Thanks!
-Lukas


* Re: blkdev_issue_discard() hangs forever if the underlying storage device is removed
  2011-08-27  6:11 blkdev_issue_discard() hangs forever if the underlying storage device is removed Bart Van Assche
  2011-08-29 11:56 ` Lukas Czerner
@ 2011-09-22 17:26 ` Bart Van Assche
  1 sibling, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2011-09-22 17:26 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Lukas Czerner; +Cc: LKML

On Sat, Aug 27, 2011 at 8:11 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> Apparently blkdev_issue_discard() never times out, not even if the
> device has been removed. This is what appeared in the kernel log after
> device removal (triggered by running mkfs.ext4 on an SRP SCSI device
> node):
>
> [ ... ]

In case anyone is interested, I ran into a similar call stack with
3.1-rc6 for the truncate_inode_pages() call. I/O was started while the
SRP connection was fully operational and the call stack was reported
after ib_srp had invoked scsi_remove_host(). That excludes the ib_srp
driver as a potential cause of this hang, doesn't it?

INFO: task fio:17621 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio             D 000000010003baef     0 17621  17606 0x00000004
 ffff8800952498c8 0000000000000046 ffffffff813d81ef ffffffff81082bee
 ffff880000000000 ffff880095249fd8 ffff880095249fd8 ffff880095249fd8
 ffff8801a8bf4ce0 ffff880095249fd8 ffff880095249fd8 ffff880095248000
Call Trace:
 [<ffffffff813d81ef>] ? __schedule+0x66f/0x7d0
 [<ffffffff81082bee>] ? mark_held_locks+0x6e/0x130
 [<ffffffff810e04e0>] ? __lock_page+0x70/0x70
 [<ffffffff8103d21f>] schedule+0x3f/0x60
 [<ffffffff813d8440>] io_schedule+0x60/0x80
 [<ffffffff810e04ee>] sleep_on_page+0xe/0x20
 [<ffffffff813d8a4a>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff810e2f6f>] ? find_get_pages+0x10f/0x1c0
 [<ffffffff810e2e60>] ? filemap_fault+0x4b0/0x4b0
 [<ffffffff810e04d7>] __lock_page+0x67/0x70
 [<ffffffff81069d10>] ? autoremove_wake_function+0x50/0x50
 [<ffffffff810ee793>] truncate_inode_pages_range+0x493/0x4a0
 [<ffffffff810ee7b5>] truncate_inode_pages+0x15/0x20
 [<ffffffff8116ff07>] kill_bdev+0x37/0x40
 [<ffffffff81170da4>] __blkdev_put+0x74/0x1c0
 [<ffffffff81170f50>] blkdev_put+0x60/0x190
 [<ffffffff811710a4>] blkdev_close+0x24/0x30
 [<ffffffff8113c138>] fput+0xf8/0x230
 [<ffffffff811381d6>] filp_close+0x66/0x90
 [<ffffffff81049302>] put_files_struct+0xf2/0x1d0
 [<ffffffff81049248>] ? put_files_struct+0x38/0x1d0
 [<ffffffff810494a2>] exit_files+0x52/0x60
 [<ffffffff81049978>] do_exit+0x158/0x850
 [<ffffffff8105b2ee>] ? get_signal_to_deliver+0xee/0x5d0
 [<ffffffff813dacb7>] ? _raw_spin_lock_irq+0x17/0x60
 [<ffffffff813db500>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff8104a30c>] do_group_exit+0x5c/0xd0
 [<ffffffff8105b430>] get_signal_to_deliver+0x230/0x5d0
 [<ffffffff8100219b>] do_signal+0x6b/0x750
 [<ffffffff8106dd02>] ? hrtimer_cancel+0x22/0x30
 [<ffffffff813d9db4>] ? do_nanosleep+0xa4/0xd0
 [<ffffffff8106eb4c>] ? hrtimer_nanosleep+0xac/0x150
 [<ffffffff813e36b1>] ? sysret_signal+0x5/0x3d
 [<ffffffff810028fd>] do_notify_resume+0x5d/0x70
 [<ffffffff811ebe2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff813e38cb>] int_signal+0x12/0x17
1 lock held by fio/17621:
 #0:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81170d6f>] __blkdev_put+0x3f/0x1c0

Bart.
