* exit_aio() hang after I/O failure
@ 2012-01-22 12:48 Bart Van Assche
  2012-01-23 16:15 ` Jeff Moyer
From: Bart Van Assche @ 2012-01-22 12:48 UTC
  To: Jens Axboe, Tejun Heo, LKML

Hi,

Apparently processes can hang in exit_aio() after an I/O failure, at
least with kernel 3.2.1. Has anyone seen this before?

This occurred after a SCSI device had been removed entirely (and hence
after all I/O requests were killed by scsi_remove_host()).

From sysrq-t:

fio             D ffff88006546ab40     0  8966   8965 0x00000004
 ffff88011dab1b88 0000000000000046 ffff880100000000 ffff88012fa11d80
 ffff88006546a880 ffff88011dab1fd8 ffff88011dab1fd8 ffff88011dab1fd8
 ffff880128b89440 ffff88006546a880 ffff88011dab1ce8 0000000181039871
Call Trace:
 [<ffffffff813b360f>] schedule+0x3f/0x60
 [<ffffffff813b368f>] io_schedule+0x5f/0x80
 [<ffffffff811719e0>] wait_for_all_aios+0xc0/0x100
 [<ffffffff8103c210>] ? try_to_wake_up+0x270/0x270
 [<ffffffff81172955>] exit_aio+0x55/0xc0
 [<ffffffff8104119d>] mmput+0x2d/0x110
 [<ffffffff810478cd>] exit_mm+0x10d/0x130
 [<ffffffff81047f4c>] do_exit+0x65c/0x850
 [<ffffffff810484a4>] do_group_exit+0x44/0xb0
 [<ffffffff81057cb8>] get_signal_to_deliver+0x218/0x5a0
 [<ffffffff81002065>] do_signal+0x65/0x700
 [<ffffffff8106925a>] ? hrtimer_try_to_cancel+0x4a/0xb0
 [<ffffffff810692e2>] ? hrtimer_cancel+0x22/0x30
 [<ffffffff813b43c4>] ? do_nanosleep+0xa4/0xd0
 [<ffffffff813b4326>] ? do_nanosleep+0x6/0xd0
 [<ffffffff81069ea0>] ? hrtimer_nanosleep+0xa0/0x150
 [<ffffffff81002775>] do_notify_resume+0x55/0x70
 [<ffffffff811dcd6e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff813bcff3>] int_signal+0x12/0x17
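
The "D" in the task line above marks uninterruptible sleep. A minimal sketch for spotting such processes without a full sysrq-t dump, assuming a Linux /proc filesystem; the helper names parse_stat() and blocked_tasks() are hypothetical and not part of the original report:

```python
# Hypothetical helper (an illustration, not from the original report):
# lists processes in uninterruptible sleep (state "D") via /proc/<pid>/stat.
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

This only reports the state; it does not show where the task is blocked, so the sysrq-t call trace is still needed to confirm a wait in wait_for_all_aios().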

Bart.

* Re: exit_aio() hang after I/O failure
  2012-01-22 12:48 exit_aio() hang after I/O failure Bart Van Assche
@ 2012-01-23 16:15 ` Jeff Moyer
  2012-01-23 16:47   ` Bart Van Assche
From: Jeff Moyer @ 2012-01-23 16:15 UTC
  To: Bart Van Assche; +Cc: Jens Axboe, Tejun Heo, LKML

Bart Van Assche <bvanassche@acm.org> writes:

> Hi,
>
> Apparently processes can hang in exit_aio() with at least kernel 3.2.1
> after an I/O failure. Has anyone seen this before ?
>
> This occurred after a SCSI device had been removed entirely (and hence
> after all I/O requests were killed by scsi_remove_host()).

Fixed here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69e4747ee9727d660b88d7e1efe0f4afcb35db1b

Cheers,
Jeff

* Re: exit_aio() hang after I/O failure
  2012-01-23 16:15 ` Jeff Moyer
@ 2012-01-23 16:47   ` Bart Van Assche
  2012-02-11 18:30     ` Bart Van Assche
From: Bart Van Assche @ 2012-01-23 16:47 UTC
  To: Jeff Moyer; +Cc: Jens Axboe, Tejun Heo, LKML

On Mon, Jan 23, 2012 at 4:15 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Bart Van Assche <bvanassche@acm.org> writes:
> > Apparently processes can hang in exit_aio() with at least kernel 3.2.1
> > after an I/O failure. Has anyone seen this before ?
> >
> > This occurred after a SCSI device had been removed entirely (and hence
> > after all I/O requests were killed by scsi_remove_host()).
>
> Fixed here:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69e4747ee9727d660b88d7e1efe0f4afcb35db1b

Thanks a lot for the feedback - I'll give this patch a try.

Bart.

* Re: exit_aio() hang after I/O failure
  2012-01-23 16:47   ` Bart Van Assche
@ 2012-02-11 18:30     ` Bart Van Assche
  2012-02-13 14:11       ` Jeff Moyer
From: Bart Van Assche @ 2012-02-11 18:30 UTC
  To: Jeff Moyer; +Cc: Jens Axboe, Tejun Heo, LKML

On Mon, Jan 23, 2012 at 4:47 PM, Bart Van Assche <bvanassche@acm.org> wrote:
> On Mon, Jan 23, 2012 at 4:15 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Bart Van Assche <bvanassche@acm.org> writes:
> > > Apparently processes can hang in exit_aio() with at least kernel 3.2.1
> > > after an I/O failure. Has anyone seen this before ?
> > >
> > > This occurred after a SCSI device had been removed entirely (and hence
> > > after all I/O requests were killed by scsi_remove_host()).
> >
> > Fixed here:
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69e4747ee9727d660b88d7e1efe0f4afcb35db1b
>
> Thanks a lot for the feedback - I'll give this patch a try.

Bad news: I've been able to reproduce exactly the same call stack with
kernel 3.2.5. That kernel version includes the aforementioned commit.

# echo t >/proc/sysrq-trigger
[ ... ]
fio             D 0000000000000001     0 25052  25008 0x00000004
 ffff88001c32fb88 0000000000000046 ffff880000000000 ffff88007d949bc8
 ffff88001c8e14d0 ffff88001c32ffd8 ffff88001c32ffd8 ffff88001c32ffd8
 ffff880128b894d0 ffff88001c8e14d0 ffff88001c32fb88 000000018106f24d
Call Trace:
 [<ffffffff813b683f>] schedule+0x3f/0x60
 [<ffffffff813b68ef>] io_schedule+0x8f/0xd0
 [<ffffffff81174410>] wait_for_all_aios+0xc0/0x100
 [<ffffffff8103c3c0>] ? try_to_wake_up+0x270/0x270
 [<ffffffff81175385>] exit_aio+0x55/0xc0
 [<ffffffff810413cd>] mmput+0x2d/0x110
 [<ffffffff81047c1d>] exit_mm+0x10d/0x130
 [<ffffffff810482b1>] do_exit+0x671/0x860
 [<ffffffff81033c1e>] ? finish_task_switch+0x4e/0xe0
 [<ffffffff81048804>] do_group_exit+0x44/0xb0
 [<ffffffff81058018>] get_signal_to_deliver+0x218/0x5a0
 [<ffffffff81002065>] do_signal+0x65/0x700
 [<ffffffff811740d0>] ? aio_read_evt+0x150/0x150
 [<ffffffff8103c3c0>] ? try_to_wake_up+0x270/0x270
 [<ffffffff81002785>] do_notify_resume+0x65/0x80
 [<ffffffff811df84e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff813c0333>] int_signal+0x12/0x17
[ ... ]

Bart.

* Re: exit_aio() hang after I/O failure
  2012-02-11 18:30     ` Bart Van Assche
@ 2012-02-13 14:11       ` Jeff Moyer
From: Jeff Moyer @ 2012-02-13 14:11 UTC
  To: Bart Van Assche; +Cc: Jens Axboe, Tejun Heo, LKML

Bart Van Assche <bvanassche@acm.org> writes:

> On Mon, Jan 23, 2012 at 4:47 PM, Bart Van Assche <bvanassche@acm.org> wrote:
>> On Mon, Jan 23, 2012 at 4:15 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> > Bart Van Assche <bvanassche@acm.org> writes:
>> > > Apparently processes can hang in exit_aio() with at least kernel 3.2.1
>> > > after an I/O failure. Has anyone seen this before ?
>> > >
>> > > This occurred after a SCSI device had been removed entirely (and hence
>> > > after all I/O requests were killed by scsi_remove_host()).
>> >
>> > Fixed here:
>> >
>> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69e4747ee9727d660b88d7e1efe0f4afcb35db1b
>>
>> Thanks a lot for the feedback - I'll give this patch a try.
>
> Bad news: I've been able to reproduce exactly the same call stack with
> kernel 3.2.5. That kernel version includes the aforementioned commit.

OK, thanks for testing.  I'll try to reproduce this.

Cheers,
Jeff
