* Hang in 9p/virtio
@ 2016-07-30 21:42 Vegard Nossum
2016-08-02 9:03 ` Cornelia Huck
0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-07-30 21:42 UTC
To: Eric Van Hensbergen, Michael S. Tsirkin
Cc: Cornelia Huck, Aneesh Kumar K.V, v9fs-developer, LKML
Hi,
With fault injection triggering an allocation failure for the
alloc_indirect() call in virtqueue_add() I'm seeing a hang in
p9_virtio_zc_request() -- it seems to be waiting here indefinitely
(i.e. at least 120 seconds):
    err = wait_event_interruptible(*req->wq,
                                   req->status >= REQ_STATUS_RCVD);
Maybe somebody who is already familiar with the code could have a look?
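To make the wait condition concrete, here is a minimal userspace sketch of which request states would end the wait. The enum ordering is an assumption (recalled from enum p9_req_status_t in include/net/9p/client.h, so check it against the actual tree), and every model_/MODEL_ name is hypothetical.

```c
#include <stdbool.h>

/*
 * Assumed ordering of the 9p request states. This mirrors what
 * include/net/9p/client.h is believed to define; it is not copied
 * from the tree and should be verified there.
 */
enum model_req_status {
    MODEL_REQ_STATUS_IDLE,
    MODEL_REQ_STATUS_ALLOC,
    MODEL_REQ_STATUS_UNSENT,
    MODEL_REQ_STATUS_SENT,
    MODEL_REQ_STATUS_RCVD,
    MODEL_REQ_STATUS_FLSHD,
    MODEL_REQ_STATUS_ERROR
};

/*
 * The predicate the sleeping task re-evaluates on every wakeup:
 * true means the wait is over, false means it goes back to sleep.
 */
static bool model_wait_done(enum model_req_status status)
{
    return status >= MODEL_REQ_STATUS_RCVD;
}
```

Under that assumption, a request that was queued but never completed stays at SENT, so model_wait_done() keeps returning false. Only something that moves the status to RCVD or beyond (for example a completion callback) and then wakes the queue can end the wait.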
Stack trace for the memory allocation failure:
CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffffff84354a78 ffff88010594f2e8 ffffffff81d72f91 ffffffff84354a60
1ffff10020b29e62 ffff88010594f398 ffffffff81e07df7 00007faad2003fff
0000000000000064 ffffffffffffffff 0000000041b58ab3 ffffffff840a481c
Call Trace:
[...]
[<ffffffff81473886>] __kmalloc+0x66/0x2e0
[<ffffffff81f7c6b4>] alloc_indirect.isra.8+0x24/0xa0
[<ffffffff81f7d37f>] virtqueue_add_sgs+0x41f/0xc90
[<ffffffff836eb281>] p9_virtio_zc_request+0x531/0xdb0
[<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
[<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
[<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
[<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
[<ffffffff814b6be9>] do_readv_writev+0x359/0x660
[<ffffffff814babc7>] vfs_readv+0x67/0xa0
[<ffffffff814bacd8>] do_readv+0xd8/0x270
Stack trace for the stuck call:
NMI backtrace for cpu 2
CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8801174f5b00 task.stack: ffff880105948000
RIP: 0010:[<ffffffff810b02a0>] [<ffffffff810b02a0>] __default_send_IPI_dest_field+0xe0/0x130
Call Trace:
[...]
[<ffffffff811d584e>] prepare_to_wait_event+0x19e/0x410
[<ffffffff836eb790>] p9_virtio_zc_request+0xa40/0xdb0
[<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
[<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
[<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
[<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
[<ffffffff814b6be9>] do_readv_writev+0x359/0x660
[<ffffffff814babc7>] vfs_readv+0x67/0xa0
[<ffffffff814bacd8>] do_readv+0xd8/0x270
Vegard
* Re: Hang in 9p/virtio
2016-07-30 21:42 Hang in 9p/virtio Vegard Nossum
@ 2016-08-02 9:03 ` Cornelia Huck
2016-08-02 9:13 ` Vegard Nossum
0 siblings, 1 reply; 6+ messages in thread
From: Cornelia Huck @ 2016-08-02 9:03 UTC
To: Vegard Nossum
Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
v9fs-developer, LKML
On Sat, 30 Jul 2016 23:42:18 +0200
Vegard Nossum <vegard.nossum@oracle.com> wrote:
> Hi,
>
> With fault injection triggering an allocation failure for the
> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> (i.e. at least 120 seconds):
>
> err = wait_event_interruptible(*req->wq,
> req->status >= REQ_STATUS_RCVD);
>
> Maybe somebody who is already familiar with the code could have a look?
>
> Stack trace for the memory allocation failure:
>
> CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> ffffffff84354a78 ffff88010594f2e8 ffffffff81d72f91 ffffffff84354a60
> 1ffff10020b29e62 ffff88010594f398 ffffffff81e07df7 00007faad2003fff
> 0000000000000064 ffffffffffffffff 0000000041b58ab3 ffffffff840a481c
> Call Trace:
> [...]
> [<ffffffff81473886>] __kmalloc+0x66/0x2e0
> [<ffffffff81f7c6b4>] alloc_indirect.isra.8+0x24/0xa0
> [<ffffffff81f7d37f>] virtqueue_add_sgs+0x41f/0xc90
> [<ffffffff836eb281>] p9_virtio_zc_request+0x531/0xdb0
> [<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
> [<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
> [<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
> [<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
> [<ffffffff814b6be9>] do_readv_writev+0x359/0x660
> [<ffffffff814babc7>] vfs_readv+0x67/0xa0
> [<ffffffff814bacd8>] do_readv+0xd8/0x270
>
> Stack trace for the stuck call:
>
> NMI backtrace for cpu 2
> CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> task: ffff8801174f5b00 task.stack: ffff880105948000
> RIP: 0010:[<ffffffff810b02a0>] [<ffffffff810b02a0>]
> __default_send_IPI_dest_field+0xe0/0x130
> Call Trace:
> [...]
> [<ffffffff811d584e>] prepare_to_wait_event+0x19e/0x410
> [<ffffffff836eb790>] p9_virtio_zc_request+0xa40/0xdb0
> [<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
> [<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
> [<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
> [<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
> [<ffffffff814b6be9>] do_readv_writev+0x359/0x660
> [<ffffffff814babc7>] vfs_readv+0x67/0xa0
> [<ffffffff814bacd8>] do_readv+0xd8/0x270
What happens is that the code falls back to direct virtio addressing
(after indirect addressing failed) - and this should work.
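As a rough illustration of that fallback, here is a simplified userspace model of the decision made when adding a buffer. It is a sketch written from memory of the logic in virtqueue_add(), not the kernel code itself. All model_ names are hypothetical, and the inject_failure flag stands in for fault injection.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a virtqueue; only these two fields matter here. */
struct model_vq {
    unsigned int num_free; /* free slots in the main descriptor ring */
    bool indirect;         /* host offered indirect descriptors */
};

/* Models alloc_indirect(); inject_failure mimics an injected allocation failure. */
static void *model_alloc_indirect(unsigned int total_sg, bool inject_failure)
{
    if (inject_failure)
        return NULL;
    return malloc((size_t)total_sg * 16); /* one 16-byte descriptor per entry */
}

/*
 * Returns the number of main-ring slots consumed, or -ENOSPC if the ring
 * cannot hold the request. *used_indirect reports which path was taken.
 */
static int model_virtqueue_add(struct model_vq *vq, unsigned int total_sg,
                               bool inject_failure, bool *used_indirect)
{
    void *table = NULL;
    unsigned int descs_used;

    /* Try the indirect path only when it is offered and worthwhile. */
    if (vq->indirect && total_sg > 1 && vq->num_free)
        table = model_alloc_indirect(total_sg, inject_failure);

    if (table) {
        descs_used = 1;        /* one ring slot points at the whole table */
        *used_indirect = true;
    } else {
        descs_used = total_sg; /* fallback: one ring slot per buffer */
        *used_indirect = false;
    }

    if (vq->num_free < descs_used) {
        free(table);
        return -ENOSPC;
    }

    vq->num_free -= descs_used;
    free(table); /* the model does not keep the table around */
    return (int)descs_used;
}
```

In this model a failed allocation is not an error by itself: the request is still queued, it just consumes total_sg ring slots instead of one. The device then sees a chained (direct) descriptor layout rather than an indirect table, which is the path suspected above.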
I'm more inclined to suspect a qemu instead of a kernel bug, as your
qemu version is quite old and there have been fixes in the virtio
buffer handling and virtio-9p in the meantime. (I'm suspecting
"virtio-9p: fix any_layout".)
Could you retry with a more recent qemu (at least version 2.4)?
* Re: Hang in 9p/virtio
2016-08-02 9:03 ` Cornelia Huck
@ 2016-08-02 9:13 ` Vegard Nossum
2016-08-02 13:35 ` Vegard Nossum
0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-08-02 9:13 UTC
To: Cornelia Huck
Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
v9fs-developer, LKML
On 08/02/2016 11:03 AM, Cornelia Huck wrote:
> On Sat, 30 Jul 2016 23:42:18 +0200
> Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
>> Hi,
>>
>> With fault injection triggering an allocation failure for the
>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
>> (i.e. at least 120 seconds):
>>
[...]
> What happens is that the code falls back to direct virtio addressing
> (after indirect addressing failed) - and this should work.
>
> I'm more inclined to suspect a qemu instead of a kernel bug, as your
> qemu version is quite old and there have been fixes in the virtio
> buffer handling and virtio-9p in the meantime. (I'm suspecting
> "virtio-9p: fix any_layout".)
>
> Could you retry with a more recent qemu (at least version 2.4)?
I think maybe the version number in the stack trace is a bit misleading,
this is the full/actual version:
$ kvm --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright (c) 2003-2008 Fabrice Bellard
I'll still try to get qemu from git and see if it makes a difference.
Thanks,
Vegard
* Re: Hang in 9p/virtio
2016-08-02 9:13 ` Vegard Nossum
@ 2016-08-02 13:35 ` Vegard Nossum
2016-08-02 16:35 ` Cornelia Huck
0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-08-02 13:35 UTC
To: Cornelia Huck
Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
v9fs-developer, LKML
On 08/02/2016 11:13 AM, Vegard Nossum wrote:
> On 08/02/2016 11:03 AM, Cornelia Huck wrote:
>> On Sat, 30 Jul 2016 23:42:18 +0200
>> Vegard Nossum <vegard.nossum@oracle.com> wrote:
>>
>>> Hi,
>>>
>>> With fault injection triggering an allocation failure for the
>>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
>>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
>>> (i.e. at least 120 seconds):
>>>
> [...]
>
>> What happens is that the code falls back to direct virtio addressing
>> (after indirect addressing failed) - and this should work.
>>
>> I'm more inclined to suspect a qemu instead of a kernel bug, as your
>> qemu version is quite old and there have been fixes in the virtio
>> buffer handling and virtio-9p in the meantime. (I'm suspecting
>> "virtio-9p: fix any_layout".)
>>
>> Could you retry with a more recent qemu (at least version 2.4)?
>
> I think maybe the version number in the stack trace is a bit misleading,
> this is the full/actual version:
>
> $ kvm --version
> QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright
> (c) 2003-2008 Fabrice Bellard
>
> I'll still try to get qemu from git and see if it makes a difference.
> Thanks,
I still seem to get it:
$ qemu-system-x86_64 --version
QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright (c) 2003-2008 Fabrice Bellard
INFO: task trinity-c2:26510 blocked for more than 120 seconds.
Not tainted 4.7.0+ #71
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
trinity-c2 D ffff88010509fcf0 26600 26510 1238 0x10080004
ffff88010509fcf0 ffff880119ea1080 ffff880119ea1098 ffff88011ada06a0
ffff88011ada06c8 ffff88011ad9fd58 ffffffff844d3060 ffff880119d0db00
ffff880119ea0000 ffff880105098000 ffffed0020a13001 ffff880105098008
Call Trace:
[<ffffffff838a01fa>] schedule+0x9a/0x1c0
[<ffffffff838a0373>] schedule_preempt_disabled+0x13/0x20
[<ffffffff838a4046>] mutex_lock_nested+0x2d6/0x7d0
[<ffffffff81512244>] ? __fdget_pos+0x84/0xb0
[<ffffffff838a3d70>] ? ww_mutex_unlock+0x260/0x260
[<ffffffff814bb510>] ? do_pwritev+0x170/0x170
[<ffffffff81512244>] __fdget_pos+0x84/0xb0
[<ffffffff814bad19>] do_readv+0x79/0x270
[<ffffffff814baca0>] ? vfs_readv+0xa0/0xa0
[<ffffffff81dd1fd3>] ? __this_cpu_preempt_check+0x13/0x20
[<ffffffff814bb510>] ? do_pwritev+0x170/0x170
[<ffffffff814bb51b>] SyS_readv+0xb/0x10
[<ffffffff81005391>] do_syscall_64+0x1a1/0x460
[<ffffffff8137335a>] ? __context_tracking_enter+0xaa/0x200
[<ffffffff838adc6a>] entry_SYSCALL64_slow_path+0x25/0x25
1 lock held by trinity-c2/26510:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
Showing all locks held in the system:
2 locks held by khungtaskd/505:
#0: (rcu_read_lock){......}, at: [<ffffffff812baf3f>] watchdog+0xff/0x840
#1: (tasklist_lock){......}, at: [<ffffffff811e1f80>] debug_show_all_locks+0x70/0x280
1 lock held by trinity-c1/26123:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
1 lock held by trinity-c2/26510:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
1 lock held by trinity-c0/29159:
#0: (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
=============================================
...
Kernel panic - not syncing: hung_task: blocked tasks
CPU: 0 PID: 505 Comm: khungtaskd Not tainted 4.7.0+ #71
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Vegard
* Re: Hang in 9p/virtio
2016-08-02 13:35 ` Vegard Nossum
@ 2016-08-02 16:35 ` Cornelia Huck
2016-08-02 16:49 ` Michael S. Tsirkin
0 siblings, 1 reply; 6+ messages in thread
From: Cornelia Huck @ 2016-08-02 16:35 UTC
To: Vegard Nossum
Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
v9fs-developer, LKML
On Tue, 2 Aug 2016 15:35:34 +0200
Vegard Nossum <vegard.nossum@oracle.com> wrote:
> On 08/02/2016 11:13 AM, Vegard Nossum wrote:
> > On 08/02/2016 11:03 AM, Cornelia Huck wrote:
> >> On Sat, 30 Jul 2016 23:42:18 +0200
> >> Vegard Nossum <vegard.nossum@oracle.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> With fault injection triggering an allocation failure for the
> >>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> >>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> >>> (i.e. at least 120 seconds):
> >>>
> > [...]
> >
> >> What happens is that the code falls back to direct virtio addressing
> >> (after indirect addressing failed) - and this should work.
> >>
> >> I'm more inclined to suspect a qemu instead of a kernel bug, as your
> >> qemu version is quite old and there have been fixes in the virtio
> >> buffer handling and virtio-9p in the meantime. (I'm suspecting
> >> "virtio-9p: fix any_layout".)
> >>
> >> Could you retry with a more recent qemu (at least version 2.4)?
> >
> > I think maybe the version number in the stack trace is a bit misleading,
> > this is the full/actual version:
> >
> > $ kvm --version
> > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright
> > (c) 2003-2008 Fabrice Bellard
> >
> > I'll still try to get qemu from git and see if it makes a difference.
> > Thanks,
>
> I still seem to get it:
>
> $ qemu-system-x86_64 --version
> QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright
> (c) 2003-2008 Fabrice Bellard
:(
Sorry, no good immediate idea.
One thing would be to check whether you get notified by qemu after the
request was queued (i.e., whether vring_interrupt() ever gets called
with 9p's req_done() after the alloc failure was injected). This would
help to suggest whether to continue debugging here or in qemu.
I still think the root of this error is some failure of the virtio 9p
code to deal with non-indirect buffers, either in the driver or in qemu.
* Re: Hang in 9p/virtio
2016-08-02 16:35 ` Cornelia Huck
@ 2016-08-02 16:49 ` Michael S. Tsirkin
0 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2016-08-02 16:49 UTC
To: Cornelia Huck
Cc: Vegard Nossum, Eric Van Hensbergen, Aneesh Kumar K.V,
v9fs-developer, LKML
On Tue, Aug 02, 2016 at 06:35:02PM +0200, Cornelia Huck wrote:
> On Tue, 2 Aug 2016 15:35:34 +0200
> Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
> > On 08/02/2016 11:13 AM, Vegard Nossum wrote:
> > > On 08/02/2016 11:03 AM, Cornelia Huck wrote:
> > >> On Sat, 30 Jul 2016 23:42:18 +0200
> > >> Vegard Nossum <vegard.nossum@oracle.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> With fault injection triggering an allocation failure for the
> > >>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> > >>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> > >>> (i.e. at least 120 seconds):
> > >>>
> > > [...]
> > >
> > >> What happens is that the code falls back to direct virtio addressing
> > >> (after indirect addressing failed) - and this should work.
> > >>
> > >> I'm more inclined to suspect a qemu instead of a kernel bug, as your
> > >> qemu version is quite old and there have been fixes in the virtio
> > >> buffer handling and virtio-9p in the meantime. (I'm suspecting
> > >> "virtio-9p: fix any_layout".)
> > >>
> > >> Could you retry with a more recent qemu (at least version 2.4)?
> > >
> > > I think maybe the version number in the stack trace is a bit misleading,
> > > this is the full/actual version:
> > >
> > > $ kvm --version
> > > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright
> > > (c) 2003-2008 Fabrice Bellard
> > >
> > > I'll still try to get qemu from git and see if it makes a difference.
> > > Thanks,
> >
> > I still seem to get it:
> >
> > $ qemu-system-x86_64 --version
> > QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright
> > (c) 2003-2008 Fabrice Bellard
>
> :(
>
> Sorry, no good immediate idea.
>
> One thing would be to check whether you get notified by qemu after the
> request was queued (i.e., whether vring_interrupt() ever gets called
> with 9p's req_done() after the alloc failure was injected). This would
> help to suggest whether to continue debugging here or in qemu.
>
> I still think the root of this error is some failure of the virtio 9p
> code to deal with non-indirect buffers, either in the driver or in qemu.
It might be interesting to just disable indirect buffers on qemu
command line by specifying indirect_desc=off.
This way you avoid using error paths.
--
MST