* Hang in 9p/virtio
@ 2016-07-30 21:42 Vegard Nossum
  2016-08-02  9:03 ` Cornelia Huck
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-07-30 21:42 UTC
  To: Eric Van Hensbergen, Michael S. Tsirkin
  Cc: Cornelia Huck, Aneesh Kumar K.V, v9fs-developer, LKML

Hi,

With fault injection triggering an allocation failure for the
alloc_indirect() call in virtqueue_add() I'm seeing a hang in
p9_virtio_zc_request() -- it seems to be waiting here indefinitely
(i.e. at least 120 seconds):

         err = wait_event_interruptible(*req->wq,
                                        req->status >= REQ_STATUS_RCVD);

Maybe somebody who is already familiar with the code could have a look?

Stack trace for the memory allocation failure:

CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  ffffffff84354a78 ffff88010594f2e8 ffffffff81d72f91 ffffffff84354a60
  1ffff10020b29e62 ffff88010594f398 ffffffff81e07df7 00007faad2003fff
  0000000000000064 ffffffffffffffff 0000000041b58ab3 ffffffff840a481c
Call Trace:
  [...]
  [<ffffffff81473886>] __kmalloc+0x66/0x2e0
  [<ffffffff81f7c6b4>] alloc_indirect.isra.8+0x24/0xa0
  [<ffffffff81f7d37f>] virtqueue_add_sgs+0x41f/0xc90
  [<ffffffff836eb281>] p9_virtio_zc_request+0x531/0xdb0
  [<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
  [<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
  [<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
  [<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
  [<ffffffff814b6be9>] do_readv_writev+0x359/0x660
  [<ffffffff814babc7>] vfs_readv+0x67/0xa0
  [<ffffffff814bacd8>] do_readv+0xd8/0x270

Stack trace for the stuck call:

NMI backtrace for cpu 2
CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8801174f5b00 task.stack: ffff880105948000
RIP: 0010:[<ffffffff810b02a0>]  [<ffffffff810b02a0>] __default_send_IPI_dest_field+0xe0/0x130
Call Trace:
  [...]
  [<ffffffff811d584e>] prepare_to_wait_event+0x19e/0x410
  [<ffffffff836eb790>] p9_virtio_zc_request+0xa40/0xdb0
  [<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
  [<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
  [<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
  [<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
  [<ffffffff814b6be9>] do_readv_writev+0x359/0x660
  [<ffffffff814babc7>] vfs_readv+0x67/0xa0
  [<ffffffff814bacd8>] do_readv+0xd8/0x270


Vegard

* Re: Hang in 9p/virtio
  2016-07-30 21:42 Hang in 9p/virtio Vegard Nossum
@ 2016-08-02  9:03 ` Cornelia Huck
  2016-08-02  9:13   ` Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Cornelia Huck @ 2016-08-02  9:03 UTC
  To: Vegard Nossum
  Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
	v9fs-developer, LKML

On Sat, 30 Jul 2016 23:42:18 +0200
Vegard Nossum <vegard.nossum@oracle.com> wrote:

> Hi,
> 
> With fault injection triggering an allocation failure for the
> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> (i.e. at least 120 seconds):
> 
>          err = wait_event_interruptible(*req->wq,
>                                         req->status >= REQ_STATUS_RCVD);
> 
> Maybe somebody who is already familiar with the code could have a look?
> 
> Stack trace for the memory allocation failure:
> 
> CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
>   ffffffff84354a78 ffff88010594f2e8 ffffffff81d72f91 ffffffff84354a60
>   1ffff10020b29e62 ffff88010594f398 ffffffff81e07df7 00007faad2003fff
>   0000000000000064 ffffffffffffffff 0000000041b58ab3 ffffffff840a481c
> Call Trace:
>   [...]
>   [<ffffffff81473886>] __kmalloc+0x66/0x2e0
>   [<ffffffff81f7c6b4>] alloc_indirect.isra.8+0x24/0xa0
>   [<ffffffff81f7d37f>] virtqueue_add_sgs+0x41f/0xc90
>   [<ffffffff836eb281>] p9_virtio_zc_request+0x531/0xdb0
>   [<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
>   [<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
>   [<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
>   [<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
>   [<ffffffff814b6be9>] do_readv_writev+0x359/0x660
>   [<ffffffff814babc7>] vfs_readv+0x67/0xa0
>   [<ffffffff814bacd8>] do_readv+0xd8/0x270
> 
> Stack trace for the stuck call:
> 
> NMI backtrace for cpu 2
> CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> task: ffff8801174f5b00 task.stack: ffff880105948000
> RIP: 0010:[<ffffffff810b02a0>]  [<ffffffff810b02a0>] __default_send_IPI_dest_field+0xe0/0x130
> Call Trace:
>   [...]
>   [<ffffffff811d584e>] prepare_to_wait_event+0x19e/0x410
>   [<ffffffff836eb790>] p9_virtio_zc_request+0xa40/0xdb0
>   [<ffffffff836d6ecf>] p9_client_zc_rpc.constprop.14+0x23f/0xe80
>   [<ffffffff836db77c>] p9_client_read+0x4bc/0x8d0
>   [<ffffffff8193f0a3>] v9fs_file_read_iter+0xd3/0x190
>   [<ffffffff814b4b62>] do_iter_readv_writev+0x212/0x490
>   [<ffffffff814b6be9>] do_readv_writev+0x359/0x660
>   [<ffffffff814babc7>] vfs_readv+0x67/0xa0
>   [<ffffffff814bacd8>] do_readv+0xd8/0x270

What happens is that the code falls back to direct virtio addressing
(after indirect addressing failed) - and this should work.

I'm more inclined to suspect a qemu instead of a kernel bug, as your
qemu version is quite old and there have been fixes in the virtio
buffer handling and virtio-9p in the meantime. (I'm suspecting
"virtio-9p: fix any_layout".)

Could you retry with a more recent qemu (at least version 2.4)?
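
The fallback described above (the indirect table allocation fails, so the
buffers are placed in the ring directly) can be modelled in user space
roughly as follows. This is only a sketch, not the kernel code: the names
ring_model, ring_model_alloc_indirect() and ring_model_add() are
hypothetical, and the fail_next_alloc flag merely stands in for fault
injection.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model of a virtqueue; not a kernel structure. */
struct ring_model {
	bool indirect_supported;  /* host offered indirect descriptors */
	unsigned int num_free;    /* free slots left in the ring */
	bool fail_next_alloc;     /* stand-in for an injected allocation failure */
};

/* Stand-in for the indirect table allocation; NULL means it failed. */
static void *ring_model_alloc_indirect(struct ring_model *vq,
				       unsigned int total_sg)
{
	if (vq->fail_next_alloc) {
		vq->fail_next_alloc = false;
		return NULL;
	}
	return calloc(total_sg, 16);  /* assume 16 bytes per descriptor */
}

/*
 * Returns the number of ring slots consumed, or -ENOSPC if the ring
 * cannot hold the request. *used_indirect reports which path was taken.
 */
static int ring_model_add(struct ring_model *vq, unsigned int total_sg,
			  bool *used_indirect)
{
	void *table = NULL;
	unsigned int descs_used;

	assert(total_sg > 0);

	/* Try the indirect path only when it can help and is available. */
	if (vq->indirect_supported && total_sg > 1 && vq->num_free)
		table = ring_model_alloc_indirect(vq, total_sg);

	if (table) {
		*used_indirect = true;
		descs_used = 1;         /* one slot points at the table */
	} else {
		*used_indirect = false;
		descs_used = total_sg;  /* direct: one slot per buffer */
	}

	if (vq->num_free < descs_used) {
		free(table);
		return -ENOSPC;
	}

	vq->num_free -= descs_used;
	free(table);  /* model only: a real ring keeps the table until completion */
	return (int)descs_used;
}
```

In this model a four-buffer request consumes one slot on the indirect path
and four slots after a failed allocation, so the direct path can return
-ENOSPC on a ring that would still have accepted the indirect request.
Whether that difference plays any role in the hang reported here is not
established in this thread.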

* Re: Hang in 9p/virtio
  2016-08-02  9:03 ` Cornelia Huck
@ 2016-08-02  9:13   ` Vegard Nossum
  2016-08-02 13:35     ` Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-08-02  9:13 UTC
  To: Cornelia Huck
  Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
	v9fs-developer, LKML

On 08/02/2016 11:03 AM, Cornelia Huck wrote:
> On Sat, 30 Jul 2016 23:42:18 +0200
> Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
>> Hi,
>>
>> With fault injection triggering an allocation failure for the
>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
>> (i.e. at least 120 seconds):
>>
[...]

> What happens is that the code falls back to direct virtio addressing
> (after indirect addressing failed) - and this should work.
>
> I'm more inclined to suspect a qemu instead of a kernel bug, as your
> qemu version is quite old and there have been fixes in the virtio
> buffer handling and virtio-9p in the meantime. (I'm suspecting
> "virtio-9p: fix any_layout".)
>
> Could you retry with a more recent qemu (at least version 2.4)?

I think the version number in the stack trace may be a bit misleading;
this is the full/actual version:

$ kvm --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright (c) 2003-2008 Fabrice Bellard

I'll still try to get qemu from git and see if it makes a difference.
Thanks,


Vegard

* Re: Hang in 9p/virtio
  2016-08-02  9:13   ` Vegard Nossum
@ 2016-08-02 13:35     ` Vegard Nossum
  2016-08-02 16:35       ` Cornelia Huck
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-08-02 13:35 UTC
  To: Cornelia Huck
  Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
	v9fs-developer, LKML

On 08/02/2016 11:13 AM, Vegard Nossum wrote:
> On 08/02/2016 11:03 AM, Cornelia Huck wrote:
>> On Sat, 30 Jul 2016 23:42:18 +0200
>> Vegard Nossum <vegard.nossum@oracle.com> wrote:
>>
>>> Hi,
>>>
>>> With fault injection triggering an allocation failure for the
>>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
>>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
>>> (i.e. at least 120 seconds):
>>>
> [...]
>
>> What happens is that the code falls back to direct virtio addressing
>> (after indirect addressing failed) - and this should work.
>>
>> I'm more inclined to suspect a qemu instead of a kernel bug, as your
>> qemu version is quite old and there have been fixes in the virtio
>> buffer handling and virtio-9p in the meantime. (I'm suspecting
>> "virtio-9p: fix any_layout".)
>>
>> Could you retry with a more recent qemu (at least version 2.4)?
>
> I think maybe the version number in the stack trace is a bit misleading,
> this is the full/actual version:
>
> $ kvm --version
> QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright (c) 2003-2008 Fabrice Bellard
>
> I'll still try to get qemu from git and see if it makes a difference.
> Thanks,

I still seem to get it:

$ qemu-system-x86_64 --version
QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright (c) 2003-2008 Fabrice Bellard

INFO: task trinity-c2:26510 blocked for more than 120 seconds.
       Not tainted 4.7.0+ #71
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
trinity-c2      D ffff88010509fcf0 26600 26510   1238 0x10080004
  ffff88010509fcf0 ffff880119ea1080 ffff880119ea1098 ffff88011ada06a0
  ffff88011ada06c8 ffff88011ad9fd58 ffffffff844d3060 ffff880119d0db00
  ffff880119ea0000 ffff880105098000 ffffed0020a13001 ffff880105098008
Call Trace:
  [<ffffffff838a01fa>] schedule+0x9a/0x1c0
  [<ffffffff838a0373>] schedule_preempt_disabled+0x13/0x20
  [<ffffffff838a4046>] mutex_lock_nested+0x2d6/0x7d0
  [<ffffffff81512244>] ? __fdget_pos+0x84/0xb0
  [<ffffffff838a3d70>] ? ww_mutex_unlock+0x260/0x260
  [<ffffffff814bb510>] ? do_pwritev+0x170/0x170
  [<ffffffff81512244>] __fdget_pos+0x84/0xb0
  [<ffffffff814bad19>] do_readv+0x79/0x270
  [<ffffffff814baca0>] ? vfs_readv+0xa0/0xa0
  [<ffffffff81dd1fd3>] ? __this_cpu_preempt_check+0x13/0x20
  [<ffffffff814bb510>] ? do_pwritev+0x170/0x170
  [<ffffffff814bb51b>] SyS_readv+0xb/0x10
  [<ffffffff81005391>] do_syscall_64+0x1a1/0x460
  [<ffffffff8137335a>] ? __context_tracking_enter+0xaa/0x200
  [<ffffffff838adc6a>] entry_SYSCALL64_slow_path+0x25/0x25
1 lock held by trinity-c2/26510:
  #0:  (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0

Showing all locks held in the system:
2 locks held by khungtaskd/505:
  #0:  (rcu_read_lock){......}, at: [<ffffffff812baf3f>] watchdog+0xff/0x840
  #1:  (tasklist_lock){......}, at: [<ffffffff811e1f80>] debug_show_all_locks+0x70/0x280
1 lock held by trinity-c1/26123:
  #0:  (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
1 lock held by trinity-c2/26510:
  #0:  (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0
1 lock held by trinity-c0/29159:
  #0:  (&f->f_pos_lock){......}, at: [<ffffffff81512244>] __fdget_pos+0x84/0xb0

=============================================

...

Kernel panic - not syncing: hung_task: blocked tasks
CPU: 0 PID: 505 Comm: khungtaskd Not tainted 4.7.0+ #71
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014


Vegard

* Re: Hang in 9p/virtio
  2016-08-02 13:35     ` Vegard Nossum
@ 2016-08-02 16:35       ` Cornelia Huck
  2016-08-02 16:49         ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Cornelia Huck @ 2016-08-02 16:35 UTC
  To: Vegard Nossum
  Cc: Eric Van Hensbergen, Michael S. Tsirkin, Aneesh Kumar K.V,
	v9fs-developer, LKML

On Tue, 2 Aug 2016 15:35:34 +0200
Vegard Nossum <vegard.nossum@oracle.com> wrote:

> On 08/02/2016 11:13 AM, Vegard Nossum wrote:
> > On 08/02/2016 11:03 AM, Cornelia Huck wrote:
> >> On Sat, 30 Jul 2016 23:42:18 +0200
> >> Vegard Nossum <vegard.nossum@oracle.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> With fault injection triggering an allocation failure for the
> >>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> >>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> >>> (i.e. at least 120 seconds):
> >>>
> > [...]
> >
> >> What happens is that the code falls back to direct virtio addressing
> >> (after indirect addressing failed) - and this should work.
> >>
> >> I'm more inclined to suspect a qemu instead of a kernel bug, as your
> >> qemu version is quite old and there have been fixes in the virtio
> >> buffer handling and virtio-9p in the meantime. (I'm suspecting
> >> "virtio-9p: fix any_layout".)
> >>
> >> Could you retry with a more recent qemu (at least version 2.4)?
> >
> > I think maybe the version number in the stack trace is a bit misleading,
> > this is the full/actual version:
> >
> > $ kvm --version
> > > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright (c) 2003-2008 Fabrice Bellard
> >
> > I'll still try to get qemu from git and see if it makes a difference.
> > Thanks,
> 
> I still seem to get it:
> 
> $ qemu-system-x86_64 --version
> QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright (c) 2003-2008 Fabrice Bellard

:(

Sorry, no good immediate idea.

One thing would be to check whether you get notified by qemu after the
request was queued (i.e., whether vring_interrupt() ever gets called
with 9p's req_done() after the alloc failure was injected). This would
help to suggest whether to continue debugging here or in qemu.

I still think the root of this error is some failure of the virtio 9p
code to deal with non-indirect buffers, either in the driver or in qemu.
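
The check suggested above (does the completion callback ever run after the
request was queued?) amounts to counting callback invocations and seeing
whether the wait condition can become true. A minimal user-space sketch of
that idea follows; it is not the kernel code, and notify_model,
notify_model_interrupt() and the reduced status enum are hypothetical
stand-ins for the virtqueue, vring_interrupt()/req_done() and the 9p
request states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, assumed set of request states; only the ordering matters here. */
enum notify_req_status {
	NOTIFY_REQ_SENT,
	NOTIFY_REQ_RCVD,
	NOTIFY_REQ_ERROR,
};

struct notify_req {
	enum notify_req_status status;
};

struct notify_model {
	struct notify_req *inflight;   /* request waiting for completion */
	unsigned long callbacks_seen;  /* instrumentation counter */
};

/* Plays the role of the completion callback: marks the request received. */
static void notify_model_req_done(struct notify_model *vq)
{
	assert(vq != NULL);
	vq->callbacks_seen++;
	if (vq->inflight) {
		vq->inflight->status = NOTIFY_REQ_RCVD;
		vq->inflight = NULL;
	}
}

/* Plays the role of the interrupt handler; runs only if the device notifies. */
static void notify_model_interrupt(struct notify_model *vq)
{
	notify_model_req_done(vq);
}

/* The condition the waiter sleeps on (status >= received). */
static bool notify_model_wait_condition(const struct notify_req *req)
{
	return req->status >= NOTIFY_REQ_RCVD;
}
```

If callbacks_seen stays at zero after the request is queued, the waiter's
condition can never become true in this model, which would point at the
notification side (the device) rather than at the waiter. A non-zero count
with the condition still false would point back at the driver.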

* Re: Hang in 9p/virtio
  2016-08-02 16:35       ` Cornelia Huck
@ 2016-08-02 16:49         ` Michael S. Tsirkin
  0 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2016-08-02 16:49 UTC
  To: Cornelia Huck
  Cc: Vegard Nossum, Eric Van Hensbergen, Aneesh Kumar K.V,
	v9fs-developer, LKML

On Tue, Aug 02, 2016 at 06:35:02PM +0200, Cornelia Huck wrote:
> On Tue, 2 Aug 2016 15:35:34 +0200
> Vegard Nossum <vegard.nossum@oracle.com> wrote:
> 
> > On 08/02/2016 11:13 AM, Vegard Nossum wrote:
> > > On 08/02/2016 11:03 AM, Cornelia Huck wrote:
> > >> On Sat, 30 Jul 2016 23:42:18 +0200
> > >> Vegard Nossum <vegard.nossum@oracle.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> With fault injection triggering an allocation failure for the
> > >>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> > >>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> > >>> (i.e. at least 120 seconds):
> > >>>
> > > [...]
> > >
> > >> What happens is that the code falls back to direct virtio addressing
> > >> (after indirect addressing failed) - and this should work.
> > >>
> > >> I'm more inclined to suspect a qemu instead of a kernel bug, as your
> > >> qemu version is quite old and there have been fixes in the virtio
> > >> buffer handling and virtio-9p in the meantime. (I'm suspecting
> > >> "virtio-9p: fix any_layout".)
> > >>
> > >> Could you retry with a more recent qemu (at least version 2.4)?
> > >
> > > I think maybe the version number in the stack trace is a bit misleading,
> > > this is the full/actual version:
> > >
> > > $ kvm --version
> > > > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright (c) 2003-2008 Fabrice Bellard
> > >
> > > I'll still try to get qemu from git and see if it makes a difference.
> > > Thanks,
> > 
> > I still seem to get it:
> > 
> > $ qemu-system-x86_64 --version
> > QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright (c) 2003-2008 Fabrice Bellard
> 
> :(
> 
> Sorry, no good immediate idea.
> 
> One thing would be to check whether you get notified by qemu after the
> request was queued (i.e., whether vring_interrupt() ever gets called
> with 9p's req_done() after the alloc failure was injected). This would
> help to suggest whether to continue debugging here or in qemu.
> 
> I still think the root of this error is some failure of the virtio 9p
> code to deal with non-indirect buffers, either in the driver or in qemu.

It might be interesting to just disable indirect buffers on the qemu
command line by specifying indirect_desc=off for the device.
That way you avoid going through the error paths at all.

-- 
MST
