* Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover. [1]
       [not found] <efd45fba-5724-0036-8473-0274b5816ae9@redhat.com>
@ 2017-11-13 15:54 ` David Hill
       [not found]   ` <CALapVYHmf7gG25nA-5LkoaTDR8gB0xQ1Ro_FyyCQNbzrfSp+aQ@mail.gmail.com>
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-13 15:54 UTC (permalink / raw)
  To: kvm


Hi guys,

    Starting with kernel 4.14-rc1, my CI failed to completely shut down one
of the VMs: it got stuck "in shutdown" while the kernel logged this message:

[ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more than 120 
seconds.
[ 7496.552987]       Tainted: G          I 
4.14.0-0.rc1.git3.1.fc28.x86_64 #1
[ 7496.552996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
[ 7496.553024] Call Trace:
[ 7496.553044]  __schedule+0x2dc/0xbb0
[ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
[ 7496.553074]  schedule+0x3d/0x90
[ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
[ 7496.553100]  ? finish_wait+0x90/0x90
[ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
[ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
[ 7496.553166]  SyS_ioctl+0x79/0x90
[ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 7496.553190] RIP: 0033:0x7fa1ea0e1817
[ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010
[ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 
00007fa1ea0e1817
[ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI: 
0000000000000021
[ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09: 
000055e330245d92
[ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12: 
000055e33351a000
[ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15: 
0000000000000000
[ 7496.553284]
                Showing all locks held in the system:
[ 7496.553313] 1 lock held by khungtaskd/161:
[ 7496.553319]  #0:  (tasklist_lock){.+.+}, at: [<ffffffff8111740d>] 
debug_show_all_locks+0x3d/0x1a0
[ 7496.553373] 1 lock held by in:imklog/1194:
[ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at: [<ffffffff8130ecfc>] 
__fdget_pos+0x4c/0x60
[ 7496.553541] 1 lock held by qemu-system-x86/5978:
[ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at: [<ffffffffc077e498>] 
vhost_net_ioctl+0x358/0x910 [vhost_net]
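
For reference, the hung-task watchdog that prints this warning can be tuned
through sysctl; a minimal sketch (120 seconds is just the current threshold):

$ sysctl kernel.hung_task_timeout_secs                 # show the current threshold
$ sudo sysctl -w kernel.hung_task_timeout_secs=300     # raise it
$ sudo sysctl -w kernel.hung_task_timeout_secs=0       # silence the warning entirely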



I'm currently bisecting to figure out which commit breaks this, but for
some reason, when hitting this commit:

# bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 
'wireless-drivers-next-for-davem-2017-08-07' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63

the host will not let sshd establish a new session, and when starting a
KVM guest the host will hard lock.  I'm still bisecting, but I marked that
commit as bad even though it may actually be good.  Hopefully this commit
really was bad and my bisection will pinpoint the commit that broke the
kernel.  If you have an idea of which commit might break the system,
please let me know which one I should test first.
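
(Side note: a commit that can't be tested because the host won't boot can be
skipped rather than guessed at; a sketch of the relevant git commands:)

$ git bisect skip      # mark the currently checked-out commit as untestable
$ git bisect log       # review the good/bad/skip decisions made so far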

Thank you very much,
David Hill

[1] https://bugzilla.kernel.org/show_bug.cgi?id=197861


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover. [1]
       [not found]   ` <CALapVYHmf7gG25nA-5LkoaTDR8gB0xQ1Ro_FyyCQNbzrfSp+aQ@mail.gmail.com>
@ 2017-11-15 21:08     ` David Hill
  2017-11-22 18:22       ` Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-15 21:08 UTC (permalink / raw)
  To: kvm

Hi guys,

     The same behavior is seen with 4.15-rc0 :

$ sudo virsh list
  Id    Name                           State
----------------------------------------------------
  2     undercloud-0-pike              in shutdown

$ uname -a
Linux zappa.orion 4.15.0-0.rc0.git1.1.fc28.x86_64 #1 SMP Mon Nov 13 
19:54:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


$ ps aufxg | grep qemu

qemu      8595 81.6 12.8 22831200 16961240 ?   D    11:44 214:26 
/usr/bin/qemu-system-x86_64 -machine accel=kvm -name 
guest=undercloud-0-pike,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/master-key.aes 
-machine pc-i440fx-2.10,accel=kvm,usb=off,vmport=off,dump-guest-core=off 
-cpu Westmere -m 16384 -realtime mlock=off -smp 
4,sockets=4,cores=1,threads=1 -uuid 0de0918d-f67e-426a-91c1-5e38e86c96b3 
-no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/monitor.sock,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc 
base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet 
-no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 
-boot menu=on,strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 
-device 
ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 
-device 
ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
file=/var/lib/jenkins/VMs/undercloud-0-pike.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 
-drive if=none,id=drive-ide0-0-0,readonly=on -device 
ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=3 
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:79:8e:41,bus=pci.0,addr=0x3,bootindex=1 
-netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:b8:8b:a5,bus=pci.0,addr=0x8 
-chardev pty,id=charserial0 -device 
isa-serial,chardev=charserial0,id=serial0 -chardev 
spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 
-spice 
port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on 
-device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev 
spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
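
For what it's worth, the qemu process above is stuck in D state; its in-kernel
stack can be dumped with something like this (run as root, pid taken from the
ps output above):

# cat /proc/8595/stack
# echo w > /proc/sysrq-trigger    # dump all blocked (uninterruptible) tasks to dmesg
                                  # (needs sysrq enabled)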

Thank you very much,

David Hill


On 2017-11-14 10:44 AM, Dave Hill wrote:
> I'm not able to bisect further as the system wont boot properly when I 
> reach that commit.
>
> On Mon, Nov 13, 2017 at 10:54 AM, David Hill <dhill@redhat.com 
> <mailto:dhill@redhat.com>> wrote:
>
>
>     Hi guys,
>
>        Starting with kernel 4.14-rc1, my CI failed to completely
>     shutdown one of the
>     VMs and it stuck in "in shutdown" while sending this kernel message:
>
>     [ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more
>     than 120 seconds.
>     [ 7496.552987]       Tainted: G          I
>     4.14.0-0.rc1.git3.1.fc28.x86_64 #1
>     [ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs"
>     disables this message.
>     [ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
>     [ 7496.553024] Call Trace:
>     [ 7496.553044]  __schedule+0x2dc/0xbb0
>     [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>     [ 7496.553074]  schedule+0x3d/0x90
>     [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>     [ 7496.553100]  ? finish_wait+0x90/0x90
>     [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>     [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>     [ 7496.553166]  SyS_ioctl+0x79/0x90
>     [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>     [ 7496.553190] RIP: 0033:0x7fa1ea0e1817
>     [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246
>     ORIG_RAX: 0000000000000010
>     [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX:
>     00007fa1ea0e1817
>     [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI:
>     0000000000000021
>     [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09:
>     000055e330245d92
>     [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12:
>     000055e33351a000
>     [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15:
>     0000000000000000
>     [ 7496.553284]
>                    Showing all locks held in the system:
>     [ 7496.553313] 1 lock held by khungtaskd/161:
>     [ 7496.553319]  #0:  (tasklist_lock){.+.+}, at:
>     [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
>     [ 7496.553373] 1 lock held by in:imklog/1194:
>     [ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at:
>     [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
>     [ 7496.553541] 1 lock held by qemu-system-x86/5978:
>     [ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at:
>     [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]
>
>
>
>     I'm currently bisecting to figure out which commit breaks this but
>     for some reasons, when hitting this commit:
>
>     # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag
>     'wireless-drivers-next-for-davem-2017-08-07' of
>     git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
>     <http://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next>
>     git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
>
>     the host will not allow SSHd to establish a new session and when
>     starting a KVM guest, the host will hard lock.   I'm still
>     bisecting but I marked that commit as bad even though perhaps it
>     would be good.
>     Hopefully, this commit was a bad one and my bisection will
>     pinpoint which
>     commit broke the kernel.    If you have an idea of which commit
>     might break
>     the system, please let me know which one I should test first.
>
>     Thank you very much,
>     David Hill
>
>     [1] https://bugzilla.kernel.org/show_bug.cgi?id=197861
>     <https://bugzilla.kernel.org/show_bug.cgi?id=197861>
>
>
>


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-15 21:08     ` David Hill
@ 2017-11-22 18:22       ` David Hill
  2017-11-23 23:48         ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-22 18:22 UTC (permalink / raw)
  To: kvm

Has anyone had time to take a look at the kernel trace?


On 2017-11-15 04:08 PM, David Hill wrote:
> Hi guys,
>
>     The same behavior is seen with 4.15-rc0 :
>
> $ sudo virsh list
>  Id    Name                           State
> ----------------------------------------------------
>  2     undercloud-0-pike              in shutdown
>
> $ uname -a
> Linux zappa.orion 4.15.0-0.rc0.git1.1.fc28.x86_64 #1 SMP Mon Nov 13 
> 19:54:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
>
> $ ps aufxg | grep qemu
>
> qemu      8595 81.6 12.8 22831200 16961240 ?   D    11:44 214:26 
> /usr/bin/qemu-system-x86_64 -machine accel=kvm -name 
> guest=undercloud-0-pike,debug-threads=on -S -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/master-key.aes 
> -machine 
> pc-i440fx-2.10,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu 
> Westmere -m 16384 -realtime mlock=off -smp 
> 4,sockets=4,cores=1,threads=1 -uuid 
> 0de0918d-f67e-426a-91c1-5e38e86c96b3 -no-user-config -nodefaults 
> -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/monitor.sock,server,nowait 
> -mon chardev=charmonitor,id=monitor,mode=control -rtc 
> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet 
> -no-shutdown -global PIIX4_PM.disable_s3=1 -global 
> PIIX4_PM.disable_s4=1 -boot menu=on,strict=on -device 
> ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device 
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 
> -device 
> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 
> -device 
> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
> file=/var/lib/jenkins/VMs/undercloud-0-pike.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 
> -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 
> -drive if=none,id=drive-ide0-0-0,readonly=on -device 
> ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=3 
> -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:79:8e:41,bus=pci.0,addr=0x3,bootindex=1 
> -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device 
> virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:b8:8b:a5,bus=pci.0,addr=0x8 
> -chardev pty,id=charserial0 -device 
> isa-serial,chardev=charserial0,id=serial0 -chardev 
> spicevmc,id=charchannel0,name=vdagent -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 
> -spice 
> port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on 
> -device 
> qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 
> -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
> hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev 
> spicevmc,id=charredir0,name=usbredir -device 
> usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev 
> spicevmc,id=charredir1,name=usbredir -device 
> usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>
> Thank you very much,
>
> David Hill
>
>
> On 2017-11-14 10:44 AM, Dave Hill wrote:
>> I'm not able to bisect further as the system wont boot properly when 
>> I reach that commit.
>>
>> On Mon, Nov 13, 2017 at 10:54 AM, David Hill <dhill@redhat.com 
>> <mailto:dhill@redhat.com>> wrote:
>>
>>
>>     Hi guys,
>>
>>        Starting with kernel 4.14-rc1, my CI failed to completely
>>     shutdown one of the
>>     VMs and it stuck in "in shutdown" while sending this kernel message:
>>
>>     [ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more
>>     than 120 seconds.
>>     [ 7496.552987]       Tainted: G          I
>>     4.14.0-0.rc1.git3.1.fc28.x86_64 #1
>>     [ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs"
>>     disables this message.
>>     [ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
>>     [ 7496.553024] Call Trace:
>>     [ 7496.553044]  __schedule+0x2dc/0xbb0
>>     [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>     [ 7496.553074]  schedule+0x3d/0x90
>>     [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>     [ 7496.553100]  ? finish_wait+0x90/0x90
>>     [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>     [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>     [ 7496.553166]  SyS_ioctl+0x79/0x90
>>     [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>     [ 7496.553190] RIP: 0033:0x7fa1ea0e1817
>>     [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246
>>     ORIG_RAX: 0000000000000010
>>     [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX:
>>     00007fa1ea0e1817
>>     [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI:
>>     0000000000000021
>>     [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09:
>>     000055e330245d92
>>     [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12:
>>     000055e33351a000
>>     [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15:
>>     0000000000000000
>>     [ 7496.553284]
>>                    Showing all locks held in the system:
>>     [ 7496.553313] 1 lock held by khungtaskd/161:
>>     [ 7496.553319]  #0:  (tasklist_lock){.+.+}, at:
>>     [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
>>     [ 7496.553373] 1 lock held by in:imklog/1194:
>>     [ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at:
>>     [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
>>     [ 7496.553541] 1 lock held by qemu-system-x86/5978:
>>     [ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at:
>>     [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]
>>
>>
>>
>>     I'm currently bisecting to figure out which commit breaks this but
>>     for some reasons, when hitting this commit:
>>
>>     # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag
>>     'wireless-drivers-next-for-davem-2017-08-07' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
>> <http://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next>
>>     git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
>>
>>     the host will not allow SSHd to establish a new session and when
>>     starting a KVM guest, the host will hard lock.   I'm still
>>     bisecting but I marked that commit as bad even though perhaps it
>>     would be good.
>>     Hopefully, this commit was a bad one and my bisection will
>>     pinpoint which
>>     commit broke the kernel.    If you have an idea of which commit
>>     might break
>>     the system, please let me know which one I should test first.
>>
>>     Thank you very much,
>>     David Hill
>>
>>     [1] https://bugzilla.kernel.org/show_bug.cgi?id=197861
>>     <https://bugzilla.kernel.org/show_bug.cgi?id=197861>
>>
>>
>>
>


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-22 18:22       ` Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover David Hill
@ 2017-11-23 23:48         ` Paolo Bonzini
  2017-11-24  3:11           ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2017-11-23 23:48 UTC (permalink / raw)
  To: David Hill, kvm, Jason Wang (jasowang@redhat.com)

Jason, any ideas?

Thanks,

Paolo

On 22/11/2017 19:22, David Hill wrote:
> ore than 120 seconds.
>     [ 7496.552987]       Tainted: G          I
>     4.14.0-0.rc1.git3.1.fc28.x86_64 #1
>     [ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs"
>     disables this message.
>     [ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
>     [ 7496.553024] Call Trace:
>     [ 7496.553044]  __schedule+0x2dc/0xbb0
>     [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>     [ 7496.553074]  schedule+0x3d/0x90
>     [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>     [ 7496.553100]  ? finish_wait+0x90/0x90
>     [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>     [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>     [ 7496.553166]  SyS_ioctl+0x79/0x90
>     [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>     [ 7496.553190] RIP: 0033:0x7fa1ea0e1817
>     [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246
>     ORIG_RAX: 0000000000000010
>     [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX:
>     00007fa1ea0e1817
>     [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI:
>     0000000000000021
>     [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09:
>     000055e330245d92
>     [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12:
>     000055e33351a000
>     [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15:
>     0000000000000000
>     [ 7496.553284]
>                    Showing all locks held in the system:
>     [ 7496.553313] 1 lock held by khungtaskd/161:
>     [ 7496.553319]  #0:  (tasklist_lock){.+.+}, at:
>     [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
>     [ 7496.553373] 1 lock held by in:imklog/1194:
>     [ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at:
>     [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
>     [ 7496.553541] 1 lock held by qemu-system-x86/5978:
>     [ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at:
>     [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-23 23:48         ` Paolo Bonzini
@ 2017-11-24  3:11           ` Jason Wang
  2017-11-24 16:19             ` David Hill
  2017-11-24 16:22             ` David Hill
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Wang @ 2017-11-24  3:11 UTC (permalink / raw)
  To: Paolo Bonzini, David Hill, kvm



On 2017-11-24 07:48, Paolo Bonzini wrote:
> Jason, any ideas?
>
> Thanks,
>
> Paolo
>
> On 22/11/2017 19:22, David Hill wrote:
>> ore than 120 seconds.
>>      [ 7496.552987]       Tainted: G          I
>>      4.14.0-0.rc1.git3.1.fc28.x86_64 #1
>>      [ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs"
>>      disables this message.
>>      [ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
>>      [ 7496.553024] Call Trace:
>>      [ 7496.553044]  __schedule+0x2dc/0xbb0
>>      [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>      [ 7496.553074]  schedule+0x3d/0x90
>>      [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>      [ 7496.553100]  ? finish_wait+0x90/0x90
>>      [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>      [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>      [ 7496.553166]  SyS_ioctl+0x79/0x90
>>      [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>      [ 7496.553190] RIP: 0033:0x7fa1ea0e1817
>>      [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246
>>      ORIG_RAX: 0000000000000010
>>      [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX:
>>      00007fa1ea0e1817
>>      [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI:
>>      0000000000000021
>>      [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09:
>>      000055e330245d92
>>      [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12:
>>      000055e33351a000
>>      [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15:
>>      0000000000000000
>>      [ 7496.553284]
>>                     Showing all locks held in the system:
>>      [ 7496.553313] 1 lock held by khungtaskd/161:
>>      [ 7496.553319]  #0:  (tasklist_lock){.+.+}, at:
>>      [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
>>      [ 7496.553373] 1 lock held by in:imklog/1194:
>>      [ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at:
>>      [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
>>      [ 7496.553541] 1 lock held by qemu-system-x86/5978:
>>      [ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at:
>>      [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]

Hi:

The backtrace shows that a zero-copy skb was not sent for a long while for
some reason. This could be a bug either in vhost_net or somewhere else on
the host (the driver, the qdiscs or something else).

What's your network setup on the host (e.g. the qdiscs or the network
driver)? Can you still hit the issue if you switch to another type of
ethernet driver/card? Is this still reproducible with net.git
(https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/)?
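
(For reference, that information can be gathered on the host with something
like the following; eno1 and vnet0 are example names for the physical NIC
and a guest tap device:)

$ ethtool -i eno1           # driver/firmware of the physical NIC
$ tc qdisc show dev eno1    # qdisc attached to the physical NIC
$ tc qdisc show dev vnet0   # qdisc attached to the guest tap device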

Will try to reproduce this locally.

Thanks


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-24  3:11           ` Jason Wang
@ 2017-11-24 16:19             ` David Hill
  2017-11-24 16:22             ` David Hill
  1 sibling, 0 replies; 31+ messages in thread
From: David Hill @ 2017-11-24 16:19 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm

It's actually really hard to reproduce, because in order to hit this
issue I must do the following:

1) Create an undercloud VM (tripleo openstack)

2) Update it, install an undercloud

3) Create 7 other VMs

4) Deploy an overcloud on these 7 VMs

5) Stop those VMs

6) Try stopping the undercloud VM; it gets stuck "in shutdown"


It breaks here ... I tried reproducing this by simply starting/stopping VMs
and I wasn't able to hit it. It takes a good 2-3 hours each time I test this.
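
(A plain start/stop loop of that sort, for reference; the domain name is just
an example:)

for i in $(seq 1 20); do
    virsh start undercloud-0-pike;    sleep 120
    virsh shutdown undercloud-0-pike; sleep 120
done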

I get this issue only with 4.14 and later and I wasn't able to reproduce 
this with 4.13 and older.


On 2017-11-23 10:11 PM, Jason Wang wrote:
>
>
> On 2017-11-24 07:48, Paolo Bonzini wrote:
>> Jason, any ideas?
>>
>> Thanks,
>>
>> Paolo
>>
>> On 22/11/2017 19:22, David Hill wrote:
>>> ore than 120 seconds.
>>>      [ 7496.552987]       Tainted: G          I
>>>      4.14.0-0.rc1.git3.1.fc28.x86_64 #1
>>>      [ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs"
>>>      disables this message.
>>>      [ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
>>>      [ 7496.553024] Call Trace:
>>>      [ 7496.553044]  __schedule+0x2dc/0xbb0
>>>      [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>>      [ 7496.553074]  schedule+0x3d/0x90
>>>      [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>>      [ 7496.553100]  ? finish_wait+0x90/0x90
>>>      [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>>      [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>>      [ 7496.553166]  SyS_ioctl+0x79/0x90
>>>      [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>      [ 7496.553190] RIP: 0033:0x7fa1ea0e1817
>>>      [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246
>>>      ORIG_RAX: 0000000000000010
>>>      [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX:
>>>      00007fa1ea0e1817
>>>      [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI:
>>>      0000000000000021
>>>      [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09:
>>>      000055e330245d92
>>>      [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12:
>>>      000055e33351a000
>>>      [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15:
>>>      0000000000000000
>>>      [ 7496.553284]
>>>                     Showing all locks held in the system:
>>>      [ 7496.553313] 1 lock held by khungtaskd/161:
>>>      [ 7496.553319]  #0:  (tasklist_lock){.+.+}, at:
>>>      [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
>>>      [ 7496.553373] 1 lock held by in:imklog/1194:
>>>      [ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at:
>>>      [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
>>>      [ 7496.553541] 1 lock held by qemu-system-x86/5978:
>>>      [ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at:
>>>      [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]
>
> Hi:
>
> The backtrace shows zero copied skb was not sent for a long while for 
> some reason. This could be either a bug in vhost_net or somewhere in 
> the host driver, qdiscs or others.
>
> What's your network setups in host (e.g the qdiscs or network driver)? 
> Can you still hit the issue if you switch to use another type of 
> ethernet driver/cards? Can this still be reproducible in net.git 
> (https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/).
>
> Will try to reproduce this locally.
>
> Thanks


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-24  3:11           ` Jason Wang
  2017-11-24 16:19             ` David Hill
@ 2017-11-24 16:22             ` David Hill
  2017-11-27  3:44               ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-24 16:22 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm

The VMs all have 2 vNICs ... and this is the hypervisor:

[root@zappa ~]# brctl show
bridge name    bridge id        STP enabled    interfaces
virbr0        8000.525400914858    yes        virbr0-nic
                             vnet0
                             vnet1


1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
group default qlen 1000
     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
        valid_lft 48749sec preferred_lft 48749sec
     inet6 fe80::862b:2bff:fe13:f291/64 scope link
        valid_lft forever preferred_lft forever
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
group default qlen 1000
     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
        valid_lft forever preferred_lft forever
     inet6 fe80::862b:2bff:fe13:f292/64 scope link
        valid_lft forever preferred_lft forever
4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP group default qlen 1000
     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0

        valid_lft forever preferred_lft forever
     inet 192.168.122.10/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.11/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.12/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.15/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.16/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.17/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.18/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.31/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.32/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.33/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.34/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.35/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.36/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.37/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.45/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.46/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.47/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.48/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.49/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.50/32 scope global virbr0
        valid_lft forever preferred_lft forever
     inet 192.168.122.51/32 scope global virbr0
        valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master 
virbr0 state DOWN group default qlen 1000
     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc 
fq_codel state UNKNOWN group default qlen 100
     link/none
     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
        valid_lft forever preferred_lft forever
     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
        valid_lft forever preferred_lft forever
402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
master virbr0 state UNKNOWN group default qlen 1000
     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::fc54:ff:fe09:2739/64 scope link
        valid_lft forever preferred_lft forever
403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
master virbr0 state UNKNOWN group default qlen 1000
     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::fc54:ff:feea:6b18/64 scope link
        valid_lft forever preferred_lft forever


On 2017-11-23 10:11 PM, Jason Wang wrote:
>
>
> On 2017-11-24 07:48, Paolo Bonzini wrote:
>> Jason, any ideas?
>>
>> Thanks,
>>
>> Paolo
>>
>> On 22/11/2017 19:22, David Hill wrote:
>>> ore than 120 seconds.
>>>      [ 7496.552987]       Tainted: G          I
>>>      4.14.0-0.rc1.git3.1.fc28.x86_64 #1
>>>      [ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs"
>>>      disables this message.
>>>      [ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
>>>      [ 7496.553024] Call Trace:
>>>      [ 7496.553044]  __schedule+0x2dc/0xbb0
>>>      [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>>      [ 7496.553074]  schedule+0x3d/0x90
>>>      [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>>      [ 7496.553100]  ? finish_wait+0x90/0x90
>>>      [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>>      [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>>      [ 7496.553166]  SyS_ioctl+0x79/0x90
>>>      [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>      [ 7496.553190] RIP: 0033:0x7fa1ea0e1817
>>>      [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246
>>>      ORIG_RAX: 0000000000000010
>>>      [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX:
>>>      00007fa1ea0e1817
>>>      [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI:
>>>      0000000000000021
>>>      [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09:
>>>      000055e330245d92
>>>      [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12:
>>>      000055e33351a000
>>>      [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15:
>>>      0000000000000000
>>>      [ 7496.553284]
>>>                     Showing all locks held in the system:
>>>      [ 7496.553313] 1 lock held by khungtaskd/161:
>>>      [ 7496.553319]  #0:  (tasklist_lock){.+.+}, at:
>>>      [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
>>>      [ 7496.553373] 1 lock held by in:imklog/1194:
>>>      [ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at:
>>>      [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
>>>      [ 7496.553541] 1 lock held by qemu-system-x86/5978:
>>>      [ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at:
>>>      [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]
>
> Hi:
>
> The backtrace shows zero copied skb was not sent for a long while for 
> some reason. This could be either a bug in vhost_net or somewhere in 
> the host driver, qdiscs or others.
>
> What's your network setups in host (e.g the qdiscs or network driver)? 
> Can you still hit the issue if you switch to use another type of 
> ethernet driver/cards? Can this still be reproducible in net.git 
> (https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/).
>
> Will try to reproduce this locally.
>
> Thanks


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-24 16:22             ` David Hill
@ 2017-11-27  3:44               ` Jason Wang
  2017-11-27 19:38                 ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2017-11-27  3:44 UTC (permalink / raw)
  To: David Hill, Paolo Bonzini, kvm



On 2017-11-25 00:22, David Hill wrote:
> The VMs all have 2 vNICs ... and this is the hypervisor:
>
> [root@zappa ~]# brctl show
> bridge name    bridge id        STP enabled    interfaces
> virbr0        8000.525400914858    yes        virbr0-nic
>                             vnet0
>                             vnet1
>
>
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
> group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
> group default qlen 1000
>     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
>     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
>        valid_lft 48749sec preferred_lft 48749sec
>     inet6 fe80::862b:2bff:fe13:f291/64 scope link
>        valid_lft forever preferred_lft forever
> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
> group default qlen 1000
>     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
>        valid_lft forever preferred_lft forever
>     inet6 fe80::862b:2bff:fe13:f292/64 scope link
>        valid_lft forever preferred_lft forever
> 4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
> state UP group default qlen 1000
>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.10/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.11/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.12/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.15/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.16/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.17/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.18/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.31/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.32/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.33/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.34/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.35/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.36/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.37/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.45/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.46/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.47/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.48/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.49/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.50/32 scope global virbr0
>        valid_lft forever preferred_lft forever
>     inet 192.168.122.51/32 scope global virbr0
>        valid_lft forever preferred_lft forever
> 5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master 
> virbr0 state DOWN group default qlen 1000
>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
> 125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc 
> fq_codel state UNKNOWN group default qlen 100
>     link/none
>     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
>        valid_lft forever preferred_lft forever
>     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
>        valid_lft forever preferred_lft forever
> 402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
> master virbr0 state UNKNOWN group default qlen 1000
>     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::fc54:ff:fe09:2739/64 scope link
>        valid_lft forever preferred_lft forever
> 403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
> master virbr0 state UNKNOWN group default qlen 1000
>     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::fc54:ff:feea:6b18/64 scope link
>        valid_lft forever preferred_lft forever
>

I could not reproduce this locally by simply running netperf through an
mlx4 card. Some more questions:

- What kind of workload did you run in the guest?
- Did you hit this issue with a specific type of network card (I guess
Broadcom is used in this case)?
- virbr0 looks like a bridge created by libvirt that does NAT and other
things; can you still hit this issue if you don't use virbr0?

And more importantly, zerocopy is known to have issues; for production
environments it needs to be disabled through the vhost_net module
parameters.
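
(For reference, a sketch of checking and disabling it; this assumes vhost_net
can be unloaded, i.e. no running VM is using it at that moment:)

# cat /sys/module/vhost_net/parameters/experimental_zcopytx
# echo "options vhost_net experimental_zcopytx=0" > /etc/modprobe.d/vhost-net.conf
# modprobe -r vhost_net && modprobe vhost_net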

Thanks


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-27  3:44               ` Jason Wang
@ 2017-11-27 19:38                 ` David Hill
  2017-11-28 18:00                   ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-27 19:38 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-11-26 10:44 PM, Jason Wang wrote:
>
>
> On 2017-11-25 00:22, David Hill wrote:
>> The VMs all have 2 vNICs ... and this is the hypervisor:
>>
>> [root@zappa ~]# brctl show
>> bridge name    bridge id        STP enabled    interfaces
>> virbr0        8000.525400914858    yes        virbr0-nic
>>                             vnet0
>>                             vnet1
>>
>>
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
>> group default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
>> group default qlen 1000
>>     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
>>     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
>>        valid_lft 48749sec preferred_lft 48749sec
>>     inet6 fe80::862b:2bff:fe13:f291/64 scope link
>>        valid_lft forever preferred_lft forever
>> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
>> group default qlen 1000
>>     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::862b:2bff:fe13:f292/64 scope link
>>        valid_lft forever preferred_lft forever
>> 4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
>> state UP group default qlen 1000
>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>>
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.10/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.11/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.12/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.15/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.16/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.17/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.18/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.31/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.32/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.33/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.34/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.35/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.36/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.37/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.45/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.46/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.47/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.48/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.49/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.50/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>>     inet 192.168.122.51/32 scope global virbr0
>>        valid_lft forever preferred_lft forever
>> 5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master 
>> virbr0 state DOWN group default qlen 1000
>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>> 125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc 
>> fq_codel state UNKNOWN group default qlen 100
>>     link/none
>>     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
>>        valid_lft forever preferred_lft forever
>> 402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
>> master virbr0 state UNKNOWN group default qlen 1000
>>     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::fc54:ff:fe09:2739/64 scope link
>>        valid_lft forever preferred_lft forever
>> 403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
>> master virbr0 state UNKNOWN group default qlen 1000
>>     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::fc54:ff:feea:6b18/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>
> I could not reproduce this locally by simply running netperf through a 
> mlx4 card. Some more questions:
>
> - What kind of workloads did you run in guest?
> - Did you meet this issue in a specific type of network card (I guess 
> broadcom is used in this case)?
> - Virbr0 looks like a bridge created by libvirt that did NAT and other 
> stuffs, can you still hit this issue if you don't use virbr0?
>
> And what's more important, zerocopy is known to have issues, for 
> production environment, need to disable it through vhost_net module 
> parameters.
>
> Thanks

I'm deploying an overcloud through an undercloud virtual machine... The
VM has 4 vCPUs and 16 GB of RAM as well as two virtio NICs, so I'm using
only virtual hardware here.
I spawn 7 VMs on the hypervisor and deploy an overcloud on them using
tripleo ... everything's virtual, and if I remove the bridge I'll have to
configure each VM differently.
The load is quite high on the VM that won't shut down, but when I shut it
down it's doing nothing ... This is a hard bug to troubleshoot, and I
can't bisect the kernel because at some point the system simply won't boot
properly.


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-27 19:38                 ` David Hill
@ 2017-11-28 18:00                   ` David Hill
  2017-11-29  1:52                     ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-28 18:00 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-11-27 02:38 PM, David Hill wrote:
>
>
> On 2017-11-26 10:44 PM, Jason Wang wrote:
>>
>>
>> On 2017-11-25 00:22, David Hill wrote:
>>> The VMs all have 2 vNICs ... and this is the hypervisor:
>>>
>>> [root@zappa ~]# brctl show
>>> bridge name    bridge id        STP enabled    interfaces
>>> virbr0        8000.525400914858    yes        virbr0-nic
>>>                             vnet0
>>>                             vnet1
>>>
>>>
>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
>>> group default qlen 1000
>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>     inet 127.0.0.1/8 scope host lo
>>>        valid_lft forever preferred_lft forever
>>>     inet6 ::1/128 scope host
>>>        valid_lft forever preferred_lft forever
>>> 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>> UP group default qlen 1000
>>>     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
>>>     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
>>>        valid_lft 48749sec preferred_lft 48749sec
>>>     inet6 fe80::862b:2bff:fe13:f291/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>> UP group default qlen 1000
>>>     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
>>>     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::862b:2bff:fe13:f292/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
>>> state UP group default qlen 1000
>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>>>
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.10/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.11/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.12/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.15/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.16/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.17/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.18/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.31/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.32/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.33/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.34/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.35/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.36/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.37/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.45/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.46/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.47/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.48/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.49/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.50/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>>     inet 192.168.122.51/32 scope global virbr0
>>>        valid_lft forever preferred_lft forever
>>> 5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master 
>>> virbr0 state DOWN group default qlen 1000
>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>> 125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc 
>>> fq_codel state UNKNOWN group default qlen 100
>>>     link/none
>>>     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
>>>        valid_lft forever preferred_lft forever
>>> 402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::fc54:ff:fe09:2739/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::fc54:ff:feea:6b18/64 scope link
>>>        valid_lft forever preferred_lft forever
>>>
>>
>> I could not reproduce this locally by simply running netperf through 
>> a mlx4 card. Some more questions:
>>
>> - What kind of workloads did you run in guest?
>> - Did you meet this issue in a specific type of network card (I guess 
>> broadcom is used in this case)?
>> - Virbr0 looks like a bridge created by libvirt that did NAT and 
>> other stuffs, can you still hit this issue if you don't use virbr0?
>>
>> And what's more important, zerocopy is known to have issues, for 
>> production environment, need to disable it through vhost_net module 
>> parameters.
>>
>> Thanks
>
> I'm deploying an overcloud through a undercloud virtual machine... The 
> VM has 4vCPUs and 16GB of RAM as well as to virtio nics so I'm using 
> only virtual hardware here.
> I spawn 7 VMs on the hypervisor and deploy an overcloud using tripleo 
> on them ... everything's virtual and if I remove the bridge, then I'll 
> have to configure each VMs differently.
> The load is quite high on the VM that won't shutdown but when I shut 
> it down, it's doing nothing ...   This is a hard bug to troubleshoot 
> and I can't bisect the kernel because at some
> point the system simply won't boot properly.

I've disabled zerocopy with the following:

[root@zappa modprobe.d]# cat vhost-net.conf
options vhost_net  experimental_zcopytx=0
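
(Once vhost_net has been reloaded or the host rebooted, the following should
report 0; the path is the usual module-parameter location under sysfs:)

[root@zappa modprobe.d]# cat /sys/module/vhost_net/parameters/experimental_zcopytx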


And I haven't reproduced this issue so far.  The problem I have right now
is that experimental_zcopytx has been enabled by default since this
commit:

commit f9611c43ab0ddaf547b395c90fb842f55959334c
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Thu Dec 6 14:56:00 2012 +0200

     vhost-net: enable zerocopy tx by default

     Zero copy TX has been around for a while now.
     We seem to be down to eliminating theoretical bugs
     and performance tuning at this point:
     it's probably time to enable it by default so that
     most users get the benefit.

     Keep the flag around meanwhile so users can experiment
     with disabling this if they experience regressions.
     I expect that we will remove it in the future.

     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

I'll make a few more attempts at reproducing this issue and I'll keep you posted.

Thank you very much,

David Hill


* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-28 18:00                   ` David Hill
@ 2017-11-29  1:52                     ` Jason Wang
  2017-11-29  2:52                       ` Dave Hill
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2017-11-29  1:52 UTC (permalink / raw)
  To: David Hill, Paolo Bonzini, kvm



On 2017-11-29 02:00, David Hill wrote:
>
>
> On 2017-11-27 02:38 PM, David Hill wrote:
>>
>>
>> On 2017-11-26 10:44 PM, Jason Wang wrote:
>>>
>>>
>>> On 2017-11-25 00:22, David Hill wrote:
>>>> The VMs all have 2 vNICs ... and this is the hypervisor:
>>>>
>>>> [root@zappa ~]# brctl show
>>>> bridge name    bridge id        STP enabled    interfaces
>>>> virbr0        8000.525400914858    yes        virbr0-nic
>>>>                             vnet0
>>>>                             vnet1
>>>>
>>>>
>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
>>>> group default qlen 1000
>>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>>     inet 127.0.0.1/8 scope host lo
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 ::1/128 scope host
>>>>        valid_lft forever preferred_lft forever
>>>> 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>>> UP group default qlen 1000
>>>>     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
>>>>     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
>>>>        valid_lft 48749sec preferred_lft 48749sec
>>>>     inet6 fe80::862b:2bff:fe13:f291/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>>> UP group default qlen 1000
>>>>     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
>>>>     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::862b:2bff:fe13:f292/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
>>>> state UP group default qlen 1000
>>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>>>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>>>>
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.10/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.11/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.12/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.15/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.16/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.17/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.18/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.31/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.32/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.33/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.34/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.35/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.36/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.37/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.45/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.46/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.47/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.48/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.49/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.50/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 192.168.122.51/32 scope global virbr0
>>>>        valid_lft forever preferred_lft forever
>>>> 5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master 
>>>> virbr0 state DOWN group default qlen 1000
>>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>>> 125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc 
>>>> fq_codel state UNKNOWN group default qlen 100
>>>>     link/none
>>>>     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
>>>>        valid_lft forever preferred_lft forever
>>>> 402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>>     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::fc54:ff:fe09:2739/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>>     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::fc54:ff:feea:6b18/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>
>>> I could not reproduce this locally by simply running netperf through 
>>> a mlx4 card. Some more questions:
>>>
>>> - What kind of workloads did you run in guest?
>>> - Did you meet this issue in a specific type of network card (I 
>>> guess broadcom is used in this case)?
>>> - Virbr0 looks like a bridge created by libvirt that did NAT and 
>>> other stuffs, can you still hit this issue if you don't use virbr0?
>>>
>>> And what's more important, zerocopy is known to have issues, for 
>>> production environment, need to disable it through vhost_net module 
>>> parameters.
>>>
>>> Thanks
>>
>> I'm deploying an overcloud through a undercloud virtual machine... 
>> The VM has 4vCPUs and 16GB of RAM as well as to virtio nics so I'm 
>> using only virtual hardware here.
>> I spawn 7 VMs on the hypervisor and deploy an overcloud using tripleo 
>> on them ... everything's virtual and if I remove the bridge, then 
>> I'll have to configure each VMs differently.
>> The load is quite high on the VM that won't shutdown but when I shut 
>> it down, it's doing nothing ...   This is a hard bug to troubleshoot 
>> and I can't bisect the kernel because at some
>> point the system simply won't boot properly.
>
> I've disabled zerocopy with the following:
>
> [root@zappa modprobe.d]# cat vhost-net.conf
> options vhost_net  experimental_zcopytx=0
>
>
> And I haven't reproduce this issue so far.   The problem I have right 
> now is that experimental_zcopytx has been enabled by default with this 
> commit:
>
> commit f9611c43ab0ddaf547b395c90fb842f55959334c
> Author: Michael S. Tsirkin <mst@redhat.com>
> Date:   Thu Dec 6 14:56:00 2012 +0200
>
>     vhost-net: enable zerocopy tx by default
>
>     Zero copy TX has been around for a while now.
>     We seem to be down to eliminating theoretical bugs
>     and performance tuning at this point:
>     it's probably time to enable it by default so that
>     most users get the benefit.
>
>     Keep the flag around meanwhile so users can experiment
>     with disabling this if they experience regressions.
>     I expect that we will remove it in the future.
>
>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> I'll try some more pass in producing this issue and I'll keep you posted.
>
> Thank you very much,
>
> David Hill
>

Thanks. Zerocopy is disabled by several distributions by default. For 
upstream, the only reason to leave it on is the hope that more developers 
can help fix the issues.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-29  1:52                     ` Jason Wang
@ 2017-11-29  2:52                       ` Dave Hill
  2017-11-29  5:15                         ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hill @ 2017-11-29  2:52 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 11/28/2017 8:52 PM, Jason Wang wrote:
>
>
> On 2017年11月29日 02:00, David Hill wrote:
>>
>>
>> On 2017-11-27 02:38 PM, David Hill wrote:
>>>
>>>
>>> On 2017-11-26 10:44 PM, Jason Wang wrote:
>>>>
>>>>
>>>> On 2017年11月25日 00:22, David Hill wrote:
>>>>> The VMs all have 2 vNICs ... and this is the hypervisor:
>>>>>
>>>>> [root@zappa ~]# brctl show
>>>>> bridge name    bridge id        STP enabled    interfaces
>>>>> virbr0        8000.525400914858    yes        virbr0-nic
>>>>>                             vnet0
>>>>>                             vnet1
>>>>>
>>>>>
>>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state 
>>>>> UNKNOWN group default qlen 1000
>>>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>>>     inet 127.0.0.1/8 scope host lo
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet6 ::1/128 scope host
>>>>>        valid_lft forever preferred_lft forever
>>>>> 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>>>> UP group default qlen 1000
>>>>>     link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
>>>>>     inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
>>>>>        valid_lft 48749sec preferred_lft 48749sec
>>>>>     inet6 fe80::862b:2bff:fe13:f291/64 scope link
>>>>>        valid_lft forever preferred_lft forever
>>>>> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state 
>>>>> UP group default qlen 1000
>>>>>     link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
>>>>>     inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet6 fe80::862b:2bff:fe13:f292/64 scope link
>>>>>        valid_lft forever preferred_lft forever
>>>>> 4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>>>> noqueue state UP group default qlen 1000
>>>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>>>>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>>>>>
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.10/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.11/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.12/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.15/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.16/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.17/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.18/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.31/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.32/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.33/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.34/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.35/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.36/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.37/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.45/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.46/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.47/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.48/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.49/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.50/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet 192.168.122.51/32 scope global virbr0
>>>>>        valid_lft forever preferred_lft forever
>>>>> 5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel 
>>>>> master virbr0 state DOWN group default qlen 1000
>>>>>     link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
>>>>> 125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 
>>>>> qdisc fq_codel state UNKNOWN group default qlen 100
>>>>>     link/none
>>>>>     inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
>>>>>        valid_lft forever preferred_lft forever
>>>>> 402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>>>     link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
>>>>>     inet6 fe80::fc54:ff:fe09:2739/64 scope link
>>>>>        valid_lft forever preferred_lft forever
>>>>> 403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
>>>>> fq_codel master virbr0 state UNKNOWN group default qlen 1000
>>>>>     link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
>>>>>     inet6 fe80::fc54:ff:feea:6b18/64 scope link
>>>>>        valid_lft forever preferred_lft forever
>>>>>
>>>>
>>>> I could not reproduce this locally by simply running netperf 
>>>> through a mlx4 card. Some more questions:
>>>>
>>>> - What kind of workloads did you run in guest?
>>>> - Did you meet this issue in a specific type of network card (I 
>>>> guess broadcom is used in this case)?
>>>> - Virbr0 looks like a bridge created by libvirt that did NAT and 
>>>> other stuffs, can you still hit this issue if you don't use virbr0?
>>>>
>>>> And what's more important, zerocopy is known to have issues, for 
>>>> production environment, need to disable it through vhost_net module 
>>>> parameters.
>>>>
>>>> Thanks
>>>
>>> I'm deploying an overcloud through a undercloud virtual machine... 
>>> The VM has 4vCPUs and 16GB of RAM as well as to virtio nics so I'm 
>>> using only virtual hardware here.
>>> I spawn 7 VMs on the hypervisor and deploy an overcloud using 
>>> tripleo on them ... everything's virtual and if I remove the bridge, 
>>> then I'll have to configure each VMs differently.
>>> The load is quite high on the VM that won't shutdown but when I shut 
>>> it down, it's doing nothing ...   This is a hard bug to troubleshoot 
>>> and I can't bisect the kernel because at some
>>> point the system simply won't boot properly.
>>
>> I've disabled zerocopy with the following:
>>
>> [root@zappa modprobe.d]# cat vhost-net.conf
>> options vhost_net  experimental_zcopytx=0
>>
>>
>> And I haven't reproduce this issue so far.   The problem I have right 
>> now is that experimental_zcopytx has been enabled by default with 
>> this commit:
>>
>> commit f9611c43ab0ddaf547b395c90fb842f55959334c
>> Author: Michael S. Tsirkin <mst@redhat.com>
>> Date:   Thu Dec 6 14:56:00 2012 +0200
>>
>>     vhost-net: enable zerocopy tx by default
>>
>>     Zero copy TX has been around for a while now.
>>     We seem to be down to eliminating theoretical bugs
>>     and performance tuning at this point:
>>     it's probably time to enable it by default so that
>>     most users get the benefit.
>>
>>     Keep the flag around meanwhile so users can experiment
>>     with disabling this if they experience regressions.
>>     I expect that we will remove it in the future.
>>
>>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> I'll try some more pass in producing this issue and I'll keep you 
>> posted.
>>
>> Thank you very much,
>>
>> David Hill
>>
>
> Thanks. Zerocopy is disabled by several distribution by default. For 
> upstream, the only reason to let it on is to hope more developers can 
> help and fix the issues.
>
>
So I never hit this issue with previous kernels; it only started 
happening with the v4.14-rc series.   I'm using Rawhide, so perhaps that 
is why it isn't disabled by default, but I should mention this host was 
upgraded from FC25 up to FC28 and it never got disabled.
Perhaps it should be disabled in Fedora too if that's not already the 
case... I'm not sure this is the place to discuss this ... is it?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-29  2:52                       ` Dave Hill
@ 2017-11-29  5:15                         ` Jason Wang
  2017-11-29 19:13                           ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2017-11-29  5:15 UTC (permalink / raw)
  To: Dave Hill, Paolo Bonzini, kvm



On 2017年11月29日 10:52, Dave Hill wrote:
>>>
>>
>> Thanks. Zerocopy is disabled by several distribution by default. For 
>> upstream, the only reason to let it on is to hope more developers can 
>> help and fix the issues.
>>
>>
> So I never hit this issue with previous kernel and this issue started 
> happening with the v4.14-rc series.


Right, it still needs to be investigated whether this was introduced recently.

Looking at the git history, the only suspect commit for 4.14 is

commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62
Author: Willem de Bruijn <willemb@google.com>
Date:   Fri Oct 6 13:22:31 2017 -0400

     vhost_net: do not stall on zerocopy depletion

Maybe you can try to revert it and see.
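Something like the following should be enough to test it (a sketch only, 
assuming you are building from a git checkout of the affected tree):

git revert 1e6f74536de08b5e50cf0e37e735911c2cef7c62
make -j$(nproc) && make modules_install install

Then reboot into the freshly built kernel and retry the shutdown.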

If it does not solve your issue, I suspect there's a bug elsewhere that 
causes a packet to be held for a very long time.

>   I'm using rawhide so perhaps this is why it isn't disabled by 
> default but I have to mention it's an update of FC25 up to FC28 and it 
> never got disabled.
> Perhaps it should be disabled in Fedora too if it's not the case... 
> I'm not sure this is the place to discuss this ... is it? 

Probably not, but I guess Fedora tries to use new technology aggressively.

Thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-29  5:15                         ` Jason Wang
@ 2017-11-29 19:13                           ` David Hill
  2017-11-30  2:42                             ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-29 19:13 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-11-29 12:15 AM, Jason Wang wrote:
>
>
> On 2017年11月29日 10:52, Dave Hill wrote:
>>>>
>>>
>>> Thanks. Zerocopy is disabled by several distribution by default. For 
>>> upstream, the only reason to let it on is to hope more developers 
>>> can help and fix the issues.
>>>
>>>
>> So I never hit this issue with previous kernel and this issue started 
>> happening with the v4.14-rc series.
>
>
> Right, this still need to be investigated if it was introduced recently.
>
> Looking at git history, the only suspected commit is for 4.14 is
>
> commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62
> Author: Willem de Bruijn <willemb@google.com>
> Date:   Fri Oct 6 13:22:31 2017 -0400
>
>     vhost_net: do not stall on zerocopy depletion
>
> Maybe you can try to revert it and see.
>
> If it does not solve your issue, I suspect there's bug elsewhere that 
> cause a packet to be held for very long time.
>
>>   I'm using rawhide so perhaps this is why it isn't disabled by 
>> default but I have to mention it's an update of FC25 up to FC28 and 
>> it never got disabled.
>> Perhaps it should be disabled in Fedora too if it's not the case... 
>> I'm not sure this is the place to discuss this ... is it? 
>
> Probably not, but I guess Fedora tries to use new technology 
> aggressively.
>
> Thanks

I can revert that commit in 4.15-rc1 but I can't find it in 4.14.2 ...  
Is there another commit that could affect this?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-29 19:13                           ` David Hill
@ 2017-11-30  2:42                             ` Jason Wang
  2017-11-30 20:52                               ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2017-11-30  2:42 UTC (permalink / raw)
  To: David Hill, Paolo Bonzini, kvm



On 2017年11月30日 03:13, David Hill wrote:
>
>
> On 2017-11-29 12:15 AM, Jason Wang wrote:
>>
>>
>> On 2017年11月29日 10:52, Dave Hill wrote:
>>>>>
>>>>
>>>> Thanks. Zerocopy is disabled by several distribution by default. 
>>>> For upstream, the only reason to let it on is to hope more 
>>>> developers can help and fix the issues.
>>>>
>>>>
>>> So I never hit this issue with previous kernel and this issue 
>>> started happening with the v4.14-rc series.
>>
>>
>> Right, this still need to be investigated if it was introduced recently.
>>
>> Looking at git history, the only suspected commit is for 4.14 is
>>
>> commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62
>> Author: Willem de Bruijn <willemb@google.com>
>> Date:   Fri Oct 6 13:22:31 2017 -0400
>>
>>     vhost_net: do not stall on zerocopy depletion
>>
>> Maybe you can try to revert it and see.
>>
>> If it does not solve your issue, I suspect there's bug elsewhere that 
>> cause a packet to be held for very long time.
>>
>>>   I'm using rawhide so perhaps this is why it isn't disabled by 
>>> default but I have to mention it's an update of FC25 up to FC28 and 
>>> it never got disabled.
>>> Perhaps it should be disabled in Fedora too if it's not the case... 
>>> I'm not sure this is the place to discuss this ... is it? 
>>
>> Probably not, but I guess Fedora tries to use new technology 
>> aggressively.
>>
>> Thanks
>
> I can revert that commit in 4.15-rc1 but I can't find it in 4.14.2 
> ...  Is there another commit that could affect this ?

My bad, the suspects are then:

1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct 
ubuf_info)->refcnt to refcount_t")

Thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-30  2:42                             ` Jason Wang
@ 2017-11-30 20:52                               ` David Hill
  2017-11-30 20:59                                 ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-11-30 20:52 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-11-29 09:42 PM, Jason Wang wrote:
>
>
> On 2017年11月30日 03:13, David Hill wrote:
>>
>>
>> On 2017-11-29 12:15 AM, Jason Wang wrote:
>>>
>>>
>>> On 2017年11月29日 10:52, Dave Hill wrote:
>>>>>>
>>>>>
>>>>> Thanks. Zerocopy is disabled by several distribution by default. 
>>>>> For upstream, the only reason to let it on is to hope more 
>>>>> developers can help and fix the issues.
>>>>>
>>>>>
>>>> So I never hit this issue with previous kernel and this issue 
>>>> started happening with the v4.14-rc series.
>>>
>>>
>>> Right, this still need to be investigated if it was introduced 
>>> recently.
>>>
>>> Looking at git history, the only suspected commit is for 4.14 is
>>>
>>> commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62
>>> Author: Willem de Bruijn <willemb@google.com>
>>> Date:   Fri Oct 6 13:22:31 2017 -0400
>>>
>>>     vhost_net: do not stall on zerocopy depletion
>>>
>>> Maybe you can try to revert it and see.
>>>
>>> If it does not solve your issue, I suspect there's bug elsewhere 
>>> that cause a packet to be held for very long time.
>>>
>>>>   I'm using rawhide so perhaps this is why it isn't disabled by 
>>>> default but I have to mention it's an update of FC25 up to FC28 and 
>>>> it never got disabled.
>>>> Perhaps it should be disabled in Fedora too if it's not the case... 
>>>> I'm not sure this is the place to discuss this ... is it? 
>>>
>>> Probably not, but I guess Fedora tries to use new technology 
>>> aggressively.
>>>
>>> Thanks
>>
>> I can revert that commit in 4.15-rc1 but I can't find it in 4.14.2 
>> ...  Is there another commit that could affect this ?
>
> My bad, the suspicious is then:
>
> 1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
> c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct 
> ubuf_info)->refcnt to refcount_t")
>
> Thanks
>

Reverting those two commits breaks kernel compilation:

net/core/dev.c: In function ‘dev_queue_xmit_nit’:
net/core/dev.c:1952:8: error: implicit declaration of function 
‘skb_orphan_frags_rx’; did you mean ‘skb_orphan_frags’? 
[-Werror=implicit-function-declaration]
    if (!skb_orphan_frags_rx(skb2, GFP_ATOMIC))
         ^~~~~~~~~~~~~~~~~~~
         skb_orphan_frags


I changed skb_orphan_frags_rx to skb_orphan_frags and it compiled, but 
will everything blow up?

Thanks,
Dave

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-30 20:52                               ` David Hill
@ 2017-11-30 20:59                                 ` David Hill
  2017-12-01 16:38                                   ` David Hill
                                                     ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: David Hill @ 2017-11-30 20:59 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-11-30 03:52 PM, David Hill wrote:
>
>
> On 2017-11-29 09:42 PM, Jason Wang wrote:
>>
>>
>> On 2017年11月30日 03:13, David Hill wrote:
>>>
>>>
>>> On 2017-11-29 12:15 AM, Jason Wang wrote:
>>>>
>>>>
>>>> On 2017年11月29日 10:52, Dave Hill wrote:
>>>>>>>
>>>>>>
>>>>>> Thanks. Zerocopy is disabled by several distribution by default. 
>>>>>> For upstream, the only reason to let it on is to hope more 
>>>>>> developers can help and fix the issues.
>>>>>>
>>>>>>
>>>>> So I never hit this issue with previous kernel and this issue 
>>>>> started happening with the v4.14-rc series.
>>>>
>>>>
>>>> Right, this still need to be investigated if it was introduced 
>>>> recently.
>>>>
>>>> Looking at git history, the only suspected commit is for 4.14 is
>>>>
>>>> commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62
>>>> Author: Willem de Bruijn <willemb@google.com>
>>>> Date:   Fri Oct 6 13:22:31 2017 -0400
>>>>
>>>>     vhost_net: do not stall on zerocopy depletion
>>>>
>>>> Maybe you can try to revert it and see.
>>>>
>>>> If it does not solve your issue, I suspect there's bug elsewhere 
>>>> that cause a packet to be held for very long time.
>>>>
>>>>>   I'm using rawhide so perhaps this is why it isn't disabled by 
>>>>> default but I have to mention it's an update of FC25 up to FC28 
>>>>> and it never got disabled.
>>>>> Perhaps it should be disabled in Fedora too if it's not the 
>>>>> case... I'm not sure this is the place to discuss this ... is it? 
>>>>
>>>> Probably not, but I guess Fedora tries to use new technology 
>>>> aggressively.
>>>>
>>>> Thanks
>>>
>>> I can revert that commit in 4.15-rc1 but I can't find it in 4.14.2 
>>> ...  Is there another commit that could affect this ?
>>
>> My bad, the suspicious is then:
>>
>> 1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
>> c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct 
>> ubuf_info)->refcnt to refcount_t")
>>
>> Thanks
>>
>
> Reverting those two commits breaks kernel compilation:
>
> net/core/dev.c: In function ‘dev_queue_xmit_nit’:
> net/core/dev.c:1952:8: error: implicit declaration of function 
> ‘skb_orphan_frags_rx’; did you mean ‘skb_orphan_frags’? 
> [-Werror=implicit-function-declaration]
>    if (!skb_orphan_frags_rx(skb2, GFP_ATOMIC))
>         ^~~~~~~~~~~~~~~~~~~
>         skb_orphan_frags
>
>
> I changed skb_orphan_frags_rx to skb_orphan_frags and it compiled but 
> will everything blow up?
>
> Thanks,
> Dave

Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too ... 
compiling and I'll keep you posted.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-30 20:59                                 ` David Hill
@ 2017-12-01 16:38                                   ` David Hill
  2017-12-04  4:08                                     ` Jason Wang
  2017-12-02 12:16                                   ` Harald Moeller
  2017-12-02 16:37                                   ` Harald Moeller
  2 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-12-01 16:38 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-11-30 03:59 PM, David Hill wrote:
>
>
> On 2017-11-30 03:52 PM, David Hill wrote:
>>
>>
>> On 2017-11-29 09:42 PM, Jason Wang wrote:
>>>
>>>
>>> On 2017年11月30日 03:13, David Hill wrote:
>>>>
>>>>
>>>> On 2017-11-29 12:15 AM, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 2017年11月29日 10:52, Dave Hill wrote:
>>>>>>>>
>>>>>>>
>>>>>>> Thanks. Zerocopy is disabled by several distribution by default. 
>>>>>>> For upstream, the only reason to let it on is to hope more 
>>>>>>> developers can help and fix the issues.
>>>>>>>
>>>>>>>
>>>>>> So I never hit this issue with previous kernel and this issue 
>>>>>> started happening with the v4.14-rc series.
>>>>>
>>>>>
>>>>> Right, this still need to be investigated if it was introduced 
>>>>> recently.
>>>>>
>>>>> Looking at git history, the only suspected commit is for 4.14 is
>>>>>
>>>>> commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62
>>>>> Author: Willem de Bruijn <willemb@google.com>
>>>>> Date:   Fri Oct 6 13:22:31 2017 -0400
>>>>>
>>>>>     vhost_net: do not stall on zerocopy depletion
>>>>>
>>>>> Maybe you can try to revert it and see.
>>>>>
>>>>> If it does not solve your issue, I suspect there's bug elsewhere 
>>>>> that cause a packet to be held for very long time.
>>>>>
>>>>>>   I'm using rawhide so perhaps this is why it isn't disabled by 
>>>>>> default but I have to mention it's an update of FC25 up to FC28 
>>>>>> and it never got disabled.
>>>>>> Perhaps it should be disabled in Fedora too if it's not the 
>>>>>> case... I'm not sure this is the place to discuss this ... is it? 
>>>>>
>>>>> Probably not, but I guess Fedora tries to use new technology 
>>>>> aggressively.
>>>>>
>>>>> Thanks
>>>>
>>>> I can revert that commit in 4.15-rc1 but I can't find it in 4.14.2 
>>>> ...  Is there another commit that could affect this ?
>>>
>>> My bad, the suspicious is then:
>>>
>>> 1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
>>> c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct 
>>> ubuf_info)->refcnt to refcount_t")
>>>
>>> Thanks
>>>
>>
>> Reverting those two commits breaks kernel compilation:
>>
>> net/core/dev.c: In function ‘dev_queue_xmit_nit’:
>> net/core/dev.c:1952:8: error: implicit declaration of function 
>> ‘skb_orphan_frags_rx’; did you mean ‘skb_orphan_frags’? 
>> [-Werror=implicit-function-declaration]
>>    if (!skb_orphan_frags_rx(skb2, GFP_ATOMIC))
>>         ^~~~~~~~~~~~~~~~~~~
>>         skb_orphan_frags
>>
>>
>> I changed skb_orphan_frags_rx to skb_orphan_frags and it compiled but 
>> will everything blow up?
>>
>> Thanks,
>> Dave
>
> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too ... 
> compiling and I'll keep you posted.

So I'm still able to reproduce this issue even after reverting these 3 
commits.  Do you have any other suspect commits?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-30 20:59                                 ` David Hill
  2017-12-01 16:38                                   ` David Hill
@ 2017-12-02 12:16                                   ` Harald Moeller
  2017-12-02 16:37                                   ` Harald Moeller
  2 siblings, 0 replies; 31+ messages in thread
From: Harald Moeller @ 2017-12-02 12:16 UTC (permalink / raw)
  To: linux-kernel

Hello, my name is Harry and this is my first post here, hope I'm doing 
this the right way, sorry if not ...

I'm not a subscriber to the full list yet, so I understand I should ask 
to be personally CCed.

I am following this thread as I experience the same (or a very similar) issue 
with 4.14.2.

My setup is simpler: just an oVirt host shutting down some VMs. It 
doesn't happen every time, but I'd say around 3 times out of 10.

This is what I see (slightly different from David):

Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 
blocked for more than 120 seconds.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:       Tainted: G          
I     4.14.2-1.el7.hakimo.x86_64 #4
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm        D 0  
1173      1 0x00000084
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: __schedule+0x28d/0x880
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  schedule+0x36/0x80
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: 
vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  ? remove_wait_queue+0x60/0x60
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: vhost_net_ioctl+0x317/0x8e0 
[vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_vfs_ioctl+0xa7/0x5f0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  SyS_ioctl+0x79/0x90
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_syscall_64+0x67/0x1b0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: 
entry_SYSCALL64_slow_path+0x25/0x25
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 
EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 
000055abaa2d29c0 RCX: 00007fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 
000000004008af30 RDI: 0000000000000028
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 
000055aba805e10f R09: 00000000ffffffff
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 
0000000000000246 R12: 000055ababf32510
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 
000055ababf32498 R15: 000055abaa2a0b40

This is still happening after reverting the three suggested commits

1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")

c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct 
ubuf_info)->refcnt to refcount_t")

581fe0ea61584d88072527ae9fb9dcb9d1f2783e {"net: orphan frags on 
stand-alone ptype in dev_queue_xmit_nit"}

Is there anything I could do to help solve this? Any more info I 
could provide?

Harry

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-11-30 20:59                                 ` David Hill
  2017-12-01 16:38                                   ` David Hill
  2017-12-02 12:16                                   ` Harald Moeller
@ 2017-12-02 16:37                                   ` Harald Moeller
  2017-12-07  2:44                                     ` David Hill
  2 siblings, 1 reply; 31+ messages in thread
From: Harald Moeller @ 2017-12-02 16:37 UTC (permalink / raw)
  To: kvm

Hello, my name is Harry and this is my first post here, hope I'm doing 
this the right way, sorry if not ...

I'm not a subscriber to the full list yet, so I understand I should ask 
to be personally CCed.

I am following this thread as I experience the same (or a very similar) issue 
with 4.14.2.

My setup is simpler: just an oVirt host shutting down some VMs. It 
doesn't happen every time, but I'd say around 3 times out of 10.

This is what I see (slightly different from David):

Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 
blocked for more than 120 seconds.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:       Tainted: G          
I     4.14.2-1.el7.hakimo.x86_64 #4
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm        D 0 
1173      1 0x00000084
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: __schedule+0x28d/0x880
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  schedule+0x36/0x80
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: 
vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  ? remove_wait_queue+0x60/0x60
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: vhost_net_ioctl+0x317/0x8e0 
[vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_vfs_ioctl+0xa7/0x5f0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  SyS_ioctl+0x79/0x90
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_syscall_64+0x67/0x1b0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: 
entry_SYSCALL64_slow_path+0x25/0x25
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 
EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 
000055abaa2d29c0 RCX: 00007fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 
000000004008af30 RDI: 0000000000000028
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 
000055aba805e10f R09: 00000000ffffffff
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 
0000000000000246 R12: 000055ababf32510
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 
000055ababf32498 R15: 000055abaa2a0b40

This is still happening after reverting the three suggested commits

1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")

c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct 
ubuf_info)->refcnt to refcount_t")

581fe0ea61584d88072527ae9fb9dcb9d1f2783e {"net: orphan frags on 
stand-alone ptype in dev_queue_xmit_nit"}

Is there anything I could do to help solve this? Any more info I 
could provide?

Harry

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-01 16:38                                   ` David Hill
@ 2017-12-04  4:08                                     ` Jason Wang
  2017-12-04 19:51                                       ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2017-12-04  4:08 UTC (permalink / raw)
  To: David Hill, Paolo Bonzini, kvm



On 2017年12月02日 00:38, David Hill wrote:
>>
>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too ... 
>> compiling and I'll keep you posted.
>
> So I'm still able to reproduce this issue even with reverting these 3 
> commits.  Would you have other suspect commits ? 

Thanks for the testing. No, I don't have other suspect commits.

Looks like somebody else is hitting your issue too (see 
https://www.spinics.net/lists/netdev/msg468319.html).

But he claims the issue was fixed by using qemu 2.10.1.

So you may:

-try to see if qemu 2.10.1 solves your issue
-if not, try to see if commit 2ddf71e23cc246e95af72a6deed67b4a50a7b81c 
("net: add notifier hooks for devmap bpf map") is the first bad commit
-if not, maybe you can continue your bisection through git bisect skip

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-04  4:08                                     ` Jason Wang
@ 2017-12-04 19:51                                       ` David Hill
  2017-12-07  4:34                                         ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-12-04 19:51 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm


On 2017-12-03 11:08 PM, Jason Wang wrote:
>
>
> On 2017年12月02日 00:38, David Hill wrote:
>>>
>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too ... 
>>> compiling and I'll keep you posted.
>>
>> So I'm still able to reproduce this issue even with reverting these 3 
>> commits.  Would you have other suspect commits ? 
>
> Thanks for the testing. No, I don't have other suspect commits.
>
> Looks like somebody else it hitting your issue too (see 
> https://www.spinics.net/lists/netdev/msg468319.html)
>
> But he claims the issue were fixed by using qemu 2.10.1.
>
> So you may:
>
> -try to see if qemu 2.10.1 solves your issue
It didn't solve it for him... it's only harder to reproduce. [1]
> -if not, try to see if commit 2ddf71e23cc246e95af72a6deed67b4a50a7b81c 
> ("net: add notifier hooks for devmap bpf map") is the first bad commit
I'll try to see what I can do here
> -if not, maybe you can continue your bisection through git bisect skip
>
Some commits are so broken that the system won't boot ...  What I fear 
is that if I git bisect skip those commits, I'll also skip the commit 
that is the culprit of my original problem.

[1] https://www.spinics.net/lists/netdev/msg469887.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-02 16:37                                   ` Harald Moeller
@ 2017-12-07  2:44                                     ` David Hill
  0 siblings, 0 replies; 31+ messages in thread
From: David Hill @ 2017-12-07  2:44 UTC (permalink / raw)
  To: Harald Moeller, kvm

Have you tried adding this:


cat<<EOF>/etc/modprobe.d/vhost-net.conf
options vhost_net  experimental_zcopytx=0
EOF

reboot
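
If a reboot is inconvenient, reloading the module should also pick up the 
option (a sketch; it assumes no running VM is still using vhost_net, 
otherwise the module cannot be unloaded):

modprobe -r vhost_net && modprobe vhost_net
cat /sys/module/vhost_net/parameters/experimental_zcopytx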


Other than this, you can try bisecting, but in my case the system won't 
boot when reaching a given commit.
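
For completeness, the skip workflow looks roughly like this (a sketch only; 
the good/bad endpoints are whatever your own testing established):

git bisect start <bad-commit> <good-commit>
# build and boot the kernel at each step; if it will not even boot:
git bisect skip
# otherwise mark the result and continue:
git bisect good    # or: git bisect bad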



On 2017-12-02 11:37 AM, Harald Moeller wrote:
> Hello, my name is Harry and this is my first post here, hope I'm doing 
> this the right way, sorry if not ...
>
> I'm not a subscriber to the full list yet so I understand I shall ask 
> you to be personally CCed.
>
> I am following this as I do experience the same (or sort-a same) issue 
> with 4.14.2.
>
> My setup is more simple, just an oVirt host shutting down some VMs. 
> Doesn't happen all the time but I'd say around 3 from 10.
>
> This is what I see (slightly different from David):
>
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 
> blocked for more than 120 seconds.
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:       Tainted: G          
> I     4.14.2-1.el7.hakimo.x86_64 #4
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm        D 0 
> 1173      1 0x00000084
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: __schedule+0x28d/0x880
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  schedule+0x36/0x80
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: 
> vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  ? 
> remove_wait_queue+0x60/0x60
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: 
> vhost_net_ioctl+0x317/0x8e0 [vhost_net]
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_vfs_ioctl+0xa7/0x5f0
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  SyS_ioctl+0x79/0x90
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_syscall_64+0x67/0x1b0
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: 
> entry_SYSCALL64_slow_path+0x25/0x25
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 
> EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 
> 000055abaa2d29c0 RCX: 00007fb8862d1107
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 
> 000000004008af30 RDI: 0000000000000028
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 
> 000055aba805e10f R09: 00000000ffffffff
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 
> 0000000000000246 R12: 000055ababf32510
> Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 
> 000055ababf32498 R15: 000055abaa2a0b40
>
> This is still happening after reverting the three suggested commits
>
> 1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
>
> c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct 
> ubuf_info)->refcnt to refcount_t")
>
> 581fe0ea61584d88072527ae9fb9dcb9d1f2783e {"net: orphan frags on 
> stand-alone ptype in dev_queue_xmit_nit"}
>
> Anything I could be helpful with trying to solve this? Any more info I 
> could provide?
>
> Harry
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-04 19:51                                       ` David Hill
@ 2017-12-07  4:34                                         ` David Hill
  2017-12-07  4:42                                           ` David Hill
  2017-12-07  5:12                                           ` Jason Wang
  0 siblings, 2 replies; 31+ messages in thread
From: David Hill @ 2017-12-07  4:34 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-12-04 02:51 PM, David Hill wrote:
>
> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>
>>
>> On 2017年12月02日 00:38, David Hill wrote:
>>>>
>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too 
>>>> ... compiling and I'll keep you posted.
>>>
>>> So I'm still able to reproduce this issue even with reverting these 
>>> 3 commits.  Would you have other suspect commits ? 
>>
>> Thanks for the testing. No, I don't have other suspect commits.
>>
>> Looks like somebody else it hitting your issue too (see 
>> https://www.spinics.net/lists/netdev/msg468319.html)
>>
>> But he claims the issue were fixed by using qemu 2.10.1.
>>
>> So you may:
>>
>> -try to see if qemu 2.10.1 solves your issue
> It didn't solve it for him... it's only harder to reproduce. [1]
>> -if not, try to see if commit 
>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier hooks 
>> for devmap bpf map") is the first bad commit
> I'll try to see what I can do here
I'm looking at that commit and, if I'm not mistaken, it was introduced 
before v4.13, while this issue appeared between v4.13 and v4.14-rc1.  
Between those two releases there are 1352 commits.
Is there a way to quickly find which commits touch vhost-net or 
zerocopy?


[ 7496.553044]  __schedule+0x2dc/0xbb0
[ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
[ 7496.553074]  schedule+0x3d/0x90
[ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
[ 7496.553100]  ? finish_wait+0x90/0x90
[ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
[ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
[ 7496.553166]  SyS_ioctl+0x79/0x90
[ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe

>> -if not, maybe you can continue your bisection through git bisect skip
>>
> Some commits are so broken that the system won't boot ...  What I fear 
> is that if I git bisect skip those commits, I'll also skip the commit 
> culprit of my original problem
>
> [1] https://www.spinics.net/lists/netdev/msg469887.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-07  4:34                                         ` David Hill
@ 2017-12-07  4:42                                           ` David Hill
  2017-12-07  5:13                                             ` Jason Wang
  2017-12-07  5:12                                           ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: David Hill @ 2017-12-07  4:42 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-12-06 11:34 PM, David Hill wrote:
>
>
> On 2017-12-04 02:51 PM, David Hill wrote:
>>
>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>
>>>
>>> On 2017年12月02日 00:38, David Hill wrote:
>>>>>
>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too 
>>>>> ... compiling and I'll keep you posted.
>>>>
>>>> So I'm still able to reproduce this issue even with reverting these 
>>>> 3 commits.  Would you have other suspect commits ? 
>>>
>>> Thanks for the testing. No, I don't have other suspect commits.
>>>
>>> Looks like somebody else it hitting your issue too (see 
>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>
>>> But he claims the issue were fixed by using qemu 2.10.1.
>>>
>>> So you may:
>>>
>>> -try to see if qemu 2.10.1 solves your issue
>> It didn't solve it for him... it's only harder to reproduce. [1]
>>> -if not, try to see if commit 
>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier hooks 
>>> for devmap bpf map") is the first bad commit
>> I'll try to see what I can do here
> I'm looking at that commit and it's been introduced before v4.13 if 
> I'm not mistaken while this issue appeared between v4.13 and v4.14-rc1 
> .  Between those two releases, there're  1352 commits.
> Is there a way to quickly know which commits are touching vhost-net, 
> zerocopy ?
>
>
> [ 7496.553044]  __schedule+0x2dc/0xbb0
> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
> [ 7496.553074]  schedule+0x3d/0x90
> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
> [ 7496.553100]  ? finish_wait+0x90/0x90
> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
> [ 7496.553166]  SyS_ioctl+0x79/0x90
> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe

That vhost_net_ubuf_put_and_wait call was changed in this commit, which 
has the following commit message:

commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Thu Feb 13 11:42:05 2014 +0200

     vhost: fix ref cnt checking deadlock

     vhost checked the counter within the refcnt before decrementing.  It
     really wanted to know that it is the one that has the last 
reference, as
     a way to batch freeing resources a bit more efficiently.

     Note: we only let refcount go to 0 on device release.

     This works well but we now access the ref counter twice so there's a
     race: all users might see a high count and decide to defer freeing
     resources.
     In the end no one initiates freeing resources until the last reference
     is gone (which is on VM shotdown so might happen after a looooong 
time).

     Let's do what we probably should have done straight away:
     switch from kref to plain atomic, documenting the
     semantics, return the refcount value atomically after decrement,
     then use that to avoid the deadlock.

     Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
     Acked-by: Jason Wang <jasowang@redhat.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>



So at this point, are we hitting a deadlock when using 
experimental_zcopytx?


>
>>> -if not, maybe you can continue your bisection through git bisect skip
>>>
>> Some commits are so broken that the system won't boot ...  What I 
>> fear is that if I git bisect skip those commits, I'll also skip the 
>> commit culprit of my original problem
>>
>> [1] https://www.spinics.net/lists/netdev/msg469887.html
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-07  4:34                                         ` David Hill
  2017-12-07  4:42                                           ` David Hill
@ 2017-12-07  5:12                                           ` Jason Wang
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Wang @ 2017-12-07  5:12 UTC (permalink / raw)
  To: David Hill, Paolo Bonzini, kvm



On 2017年12月07日 12:34, David Hill wrote:
>
>
> On 2017-12-04 02:51 PM, David Hill wrote:
>>
>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>
>>>
>>> On 2017年12月02日 00:38, David Hill wrote:
>>>>>
>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too 
>>>>> ... compiling and I'll keep you posted.
>>>>
>>>> So I'm still able to reproduce this issue even with reverting these 
>>>> 3 commits.  Would you have other suspect commits ? 
>>>
>>> Thanks for the testing. No, I don't have other suspect commits.
>>>
>>> Looks like somebody else it hitting your issue too (see 
>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>
>>> But he claims the issue were fixed by using qemu 2.10.1.
>>>
>>> So you may:
>>>
>>> -try to see if qemu 2.10.1 solves your issue
>> It didn't solve it for him... it's only harder to reproduce. [1]
>>> -if not, try to see if commit 
>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier hooks 
>>> for devmap bpf map") is the first bad commit
>> I'll try to see what I can do here
> I'm looking at that commit and it's been introduced before v4.13 if 
> I'm not mistaken while this issue appeared between v4.13 and v4.14-rc1 
> .  Between those two releases, there're  1352 commits.
> Is there a way to quickly know which commits are touching vhost-net, 
> zerocopy ?
>
>
> [ 7496.553044]  __schedule+0x2dc/0xbb0
> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
> [ 7496.553074]  schedule+0x3d/0x90
> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
> [ 7496.553100]  ? finish_wait+0x90/0x90
> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
> [ 7496.553166]  SyS_ioctl+0x79/0x90
> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe

e.g. you can do:

#git log --oneline v4.13..v4.14-rc1 drivers/vhost/net.c
8b949be vhost_net: correctly check tx avail during rx busy polling
c1d1b43 net: convert (struct ubuf_info)->refcnt to refcount_t
1f8b977 sock: enable MSG_ZEROCOPY
7a68ada Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

If I understand it correctly, you can still hit the issue before 
1f8b977? If so, you can probably bisect between 7a68ada and 1f8b977.
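
If that is the case, the bisection could be started roughly like this (a 
sketch; it assumes 7a68ada behaves well and 1f8b977 is already bad in your 
testing):

git bisect start 1f8b977 7a68ada
# then build, test the VM shutdown, and mark each step good or bad as usual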

Thanks

>
>>> -if not, maybe you can continue your bisection through git bisect skip
>>>
>> Some commits are so broken that the system won't boot ...  What I 
>> fear is that if I git bisect skip those commits, I'll also skip the 
>> commit culprit of my original problem
>>
>> [1] https://www.spinics.net/lists/netdev/msg469887.html
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-07  4:42                                           ` David Hill
@ 2017-12-07  5:13                                             ` Jason Wang
  2017-12-08 18:03                                               ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2017-12-07  5:13 UTC (permalink / raw)
  To: David Hill, Paolo Bonzini, kvm



On 2017年12月07日 12:42, David Hill wrote:
>
>
> On 2017-12-06 11:34 PM, David Hill wrote:
>>
>>
>> On 2017-12-04 02:51 PM, David Hill wrote:
>>>
>>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>>
>>>>
>>>> On 2017年12月02日 00:38, David Hill wrote:
>>>>>>
>>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too 
>>>>>> ... compiling and I'll keep you posted.
>>>>>
>>>>> So I'm still able to reproduce this issue even with reverting 
>>>>> these 3 commits.  Would you have other suspect commits ? 
>>>>
>>>> Thanks for the testing. No, I don't have other suspect commits.
>>>>
>>>> Looks like somebody else it hitting your issue too (see 
>>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>>
>>>> But he claims the issue were fixed by using qemu 2.10.1.
>>>>
>>>> So you may:
>>>>
>>>> -try to see if qemu 2.10.1 solves your issue
>>> It didn't solve it for him... it's only harder to reproduce. [1]
>>>> -if not, try to see if commit 
>>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier hooks 
>>>> for devmap bpf map") is the first bad commit
>>> I'll try to see what I can do here
>> I'm looking at that commit and it's been introduced before v4.13 if 
>> I'm not mistaken while this issue appeared between v4.13 and 
>> v4.14-rc1 .  Between those two releases, there're  1352 commits.
>> Is there a way to quickly know which commits are touching vhost-net, 
>> zerocopy ?
>>
>>
>> [ 7496.553044]  __schedule+0x2dc/0xbb0
>> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>> [ 7496.553074]  schedule+0x3d/0x90
>> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>> [ 7496.553100]  ? finish_wait+0x90/0x90
>> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>> [ 7496.553166]  SyS_ioctl+0x79/0x90
>> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> That vhost_net_ubuf_put_and)wait call has been changed in this commit 
> with the following comment:
>
> commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
> Author: Michael S. Tsirkin <mst@redhat.com>
> Date:   Thu Feb 13 11:42:05 2014 +0200
>
>     vhost: fix ref cnt checking deadlock
>
>     vhost checked the counter within the refcnt before decrementing.  It
>     really wanted to know that it is the one that has the last 
> reference, as
>     a way to batch freeing resources a bit more efficiently.
>
>     Note: we only let refcount go to 0 on device release.
>
>     This works well but we now access the ref counter twice so there's a
>     race: all users might see a high count and decide to defer freeing
>     resources.
>     In the end no one initiates freeing resources until the last 
> reference
>     is gone (which is on VM shotdown so might happen after a looooong 
> time).
>
>     Let's do what we probably should have done straight away:
>     switch from kref to plain atomic, documenting the
>     semantics, return the refcount value atomically after decrement,
>     then use that to avoid the deadlock.
>
>     Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>     Acked-by: Jason Wang <jasowang@redhat.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
>
> So at this point, are we hitting a deadlock when using 
> experimental_zcopytx ? 

Yes. But there could be another possibility: it was not caused by 
vhost_net itself but by other places that hold a packet.
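
As an additional check, you could confirm whether zerocopy TX is actually
enabled on your host and retry with it turned off. A rough sketch, assuming
no guest is using vhost while the module is reloaded:

#cat /sys/module/vhost_net/parameters/experimental_zcopytx
#modprobe -r vhost_net
#modprobe vhost_net experimental_zcopytx=0

If the hang no longer shows up with experimental_zcopytx=0, that points at
the zerocopy path rather than the rest of vhost_net.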

Thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-07  5:13                                             ` Jason Wang
@ 2017-12-08 18:03                                               ` David Hill
  2017-12-12  3:53                                                 ` David Hill
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-12-08 18:03 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-12-07 12:13 AM, Jason Wang wrote:
>
>
> On 2017-12-07 12:42, David Hill wrote:
>>
>>
>> On 2017-12-06 11:34 PM, David Hill wrote:
>>>
>>>
>>> On 2017-12-04 02:51 PM, David Hill wrote:
>>>>
>>>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 2017-12-02 00:38, David Hill wrote:
>>>>>>>
>>>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too 
>>>>>>> ... compiling and I'll keep you posted.
>>>>>>
>>>>>> So I'm still able to reproduce this issue even with reverting 
>>>>>> these 3 commits.  Would you have other suspect commits ? 
>>>>>
>>>>> Thanks for the testing. No, I don't have other suspect commits.
>>>>>
>>>>> Looks like somebody else is hitting your issue too (see 
>>>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>>>
>>>>> But he claims the issue was fixed by using qemu 2.10.1.
>>>>>
>>>>> So you may:
>>>>>
>>>>> -try to see if qemu 2.10.1 solves your issue
>>>> It didn't solve it for him... it's only harder to reproduce. [1]
>>>>> -if not, try to see if commit 
>>>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier hooks 
>>>>> for devmap bpf map") is the first bad commit
>>>> I'll try to see what I can do here
>>> I'm looking at that commit and, if I'm not mistaken, it was introduced 
>>> before v4.13, while this issue appeared between v4.13 and v4.14-rc1.
>>> Between those two releases, there are 1352 commits.
>>> Is there a way to quickly know which commits touch vhost-net or 
>>> zerocopy?
>>>
>>>
>>> [ 7496.553044]  __schedule+0x2dc/0xbb0
>>> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>> [ 7496.553074]  schedule+0x3d/0x90
>>> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>> [ 7496.553100]  ? finish_wait+0x90/0x90
>>> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>> [ 7496.553166]  SyS_ioctl+0x79/0x90
>>> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>
>> That vhost_net_ubuf_put_and_wait call has been changed in this commit 
>> with the following comment:
>>
>> commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
>> Author: Michael S. Tsirkin <mst@redhat.com>
>> Date:   Thu Feb 13 11:42:05 2014 +0200
>>
>>     vhost: fix ref cnt checking deadlock
>>
>>     vhost checked the counter within the refcnt before decrementing.  It
>>     really wanted to know that it is the one that has the last 
>> reference, as
>>     a way to batch freeing resources a bit more efficiently.
>>
>>     Note: we only let refcount go to 0 on device release.
>>
>>     This works well but we now access the ref counter twice so there's a
>>     race: all users might see a high count and decide to defer freeing
>>     resources.
>>     In the end no one initiates freeing resources until the last 
>> reference
>>     is gone (which is on VM shotdown so might happen after a looooong 
>> time).
>>
>>     Let's do what we probably should have done straight away:
>>     switch from kref to plain atomic, documenting the
>>     semantics, return the refcount value atomically after decrement,
>>     then use that to avoid the deadlock.
>>
>>     Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
>>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>     Acked-by: Jason Wang <jasowang@redhat.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>>
>>
>> So at this point, are we hitting a deadlock when using 
>> experimental_zcopytx ? 
>
> Yes. But there could be another possibility: it was not caused by 
> vhost_net itself but by other places that hold a packet.
>
> Thanks

While bisecting, when I reach commit 
46d4b68f891bee5d83a32508bfbd9778be6b1b63, the system kernel panics when I 
run virt-customize:

Message from syslogd@zappa at Dec  8 12:52:06 ...
  kernel:[  350.016376] Kernel panic - not syncing: Fatal exception in 
interrupt

I marked that commit as bad again.   Will continue bisecting!
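
If that panic turns out to be unrelated to this bug, another option (as was
suggested earlier in the thread) is to leave such commits untested instead
of marking them bad, roughly:

#git bisect skip

git then picks a nearby commit to test instead; if the real culprit ends up
inside a skipped range, bisect reports a set of possible first bad commits
rather than a single one.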

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-08 18:03                                               ` David Hill
@ 2017-12-12  3:53                                                 ` David Hill
  2017-12-19  3:36                                                   ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: David Hill @ 2017-12-12  3:53 UTC (permalink / raw)
  To: Jason Wang, Paolo Bonzini, kvm



On 2017-12-08 01:03 PM, David Hill wrote:
>
>
> On 2017-12-07 12:13 AM, Jason Wang wrote:
>>
>>
>> On 2017-12-07 12:42, David Hill wrote:
>>>
>>>
>>> On 2017-12-06 11:34 PM, David Hill wrote:
>>>>
>>>>
>>>> On 2017-12-04 02:51 PM, David Hill wrote:
>>>>>
>>>>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2017-12-02 00:38, David Hill wrote:
>>>>>>>>
>>>>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e 
>>>>>>>> too ... compiling and I'll keep you posted.
>>>>>>>
>>>>>>> So I'm still able to reproduce this issue even with reverting 
>>>>>>> these 3 commits.  Would you have other suspect commits ? 
>>>>>>
>>>>>> Thanks for the testing. No, I don't have other suspect commits.
>>>>>>
>>>>>> Looks like somebody else is hitting your issue too (see 
>>>>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>>>>
>>>>>> But he claims the issue was fixed by using qemu 2.10.1.
>>>>>>
>>>>>> So you may:
>>>>>>
>>>>>> -try to see if qemu 2.10.1 solves your issue
>>>>> It didn't solve it for him... it's only harder to reproduce. [1]
>>>>>> -if not, try to see if commit 
>>>>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier 
>>>>>> hooks for devmap bpf map") is the first bad commit
>>>>> I'll try to see what I can do here
>>>> I'm looking at that commit and, if I'm not mistaken, it was introduced 
>>>> before v4.13, while this issue appeared between v4.13 and v4.14-rc1.
>>>> Between those two releases, there are 1352 commits.
>>>> Is there a way to quickly know which commits touch vhost-net or 
>>>> zerocopy?
>>>>
>>>>
>>>> [ 7496.553044]  __schedule+0x2dc/0xbb0
>>>> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>>> [ 7496.553074]  schedule+0x3d/0x90
>>>> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>>> [ 7496.553100]  ? finish_wait+0x90/0x90
>>>> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>>> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>>> [ 7496.553166]  SyS_ioctl+0x79/0x90
>>>> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>
>>> That vhost_net_ubuf_put_and_wait call has been changed in this 
>>> commit with the following comment:
>>>
>>> commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
>>> Author: Michael S. Tsirkin <mst@redhat.com>
>>> Date:   Thu Feb 13 11:42:05 2014 +0200
>>>
>>>     vhost: fix ref cnt checking deadlock
>>>
>>>     vhost checked the counter within the refcnt before 
>>> decrementing.  It
>>>     really wanted to know that it is the one that has the last 
>>> reference, as
>>>     a way to batch freeing resources a bit more efficiently.
>>>
>>>     Note: we only let refcount go to 0 on device release.
>>>
>>>     This works well but we now access the ref counter twice so 
>>> there's a
>>>     race: all users might see a high count and decide to defer freeing
>>>     resources.
>>>     In the end no one initiates freeing resources until the last 
>>> reference
>>>     is gone (which is on VM shotdown so might happen after a 
>>> looooong time).
>>>
>>>     Let's do what we probably should have done straight away:
>>>     switch from kref to plain atomic, documenting the
>>>     semantics, return the refcount value atomically after decrement,
>>>     then use that to avoid the deadlock.
>>>
>>>     Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
>>>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>     Acked-by: Jason Wang <jasowang@redhat.com>
>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>>
>>>
>>> So at this point, are we hitting a deadlock when using 
>>> experimental_zcopytx ? 
>>
>> Yes. But there could be another possibility: it was not caused by 
>> vhost_net itself but by other places that hold a packet.
>>
>> Thanks
>
> While bisecting, when I reach commit 
> 46d4b68f891bee5d83a32508bfbd9778be6b1b63, the system kernel panics when 
> I run virt-customize:
>
> Message from syslogd@zappa at Dec  8 12:52:06 ...
>  kernel:[  350.016376] Kernel panic - not syncing: Fatal exception in 
> interrupt
>
> I marked that commit as bad again.   Will continue bisecting!
>

It looks like the first bad commit would be the following:

[jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
commit 3ece782693c4b64d588dd217868558ab9a19bfe7
Author: Willem de Bruijn <willemb@google.com>
Date:   Thu Aug 3 16:29:38 2017 -0400

     sock: skb_copy_ubufs support for compound pages

     Refine skb_copy_ubufs to support compound pages. With upcoming TCP
     zerocopy sendmsg, such fragments may appear.

     The existing code replaces each page one for one. Splitting each
     compound page into an independent number of regular pages can result
     in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.

     Instead, fill all destination pages but the last to PAGE_SIZE.
     Split the existing alloc + copy loop into separate stages:
     1. compute bytelength and minimum number of pages to store this.
     2. allocate
     3. copy, filling each page except the last to PAGE_SIZE bytes
     4. update skb frag array

     Signed-off-by: Willem de Bruijn <willemb@google.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 
6ecf86d9f06a2d98946f531f1e4cf803de071b10 M    include
:040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 
4fc8384362693e4619fab39b0a945f6f2349226b M    net

Here is the bisect log:

[root@zappa linux-stable-new]# git bisect log
git bisect start
# bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
# good: [e87c13993f16549e77abce9744af844c55154349] Linux 4.13.16
git bisect good e87c13993f16549e77abce9744af844c55154349
# good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
# good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
# bad: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad aae3dbb4776e7916b6cd442d00159bea27a695c1
# good: [bf1d6b2c76eda86159519bf5c427b1fa8f51f733] Merge tag 
'staging-4.14-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good bf1d6b2c76eda86159519bf5c427b1fa8f51f733
# bad: [e833251ad813168253fef9915aaf6a8c883337b0] rxrpc: Add 
notification of end-of-Tx phase
git bisect bad e833251ad813168253fef9915aaf6a8c883337b0
# bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 
'wireless-drivers-next-for-davem-2017-08-07' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
# good: [cf6c6ea352faadb15d1373d890bf857080b218a4] iwlwifi: mvm: fix the 
FIFO numbers in A000 devices
git bisect good cf6c6ea352faadb15d1373d890bf857080b218a4
# good: [65205cc465e9b37abbdbb3d595c46081b97e35bc] sctp: remove the 
typedef sctp_addiphdr_t
git bisect good 65205cc465e9b37abbdbb3d595c46081b97e35bc
# bad: [ecbd87b8430419199cc9dd91598d5552a180f558] phylink: add support 
for MII ioctl access to Clause 45 PHYs
git bisect bad ecbd87b8430419199cc9dd91598d5552a180f558
# bad: [52267790ef52d7513879238ca9fac22c1733e0e3] sock: add MSG_ZEROCOPY
git bisect bad 52267790ef52d7513879238ca9fac22c1733e0e3
# good: [04b1d4e50e82536c12da00ee04a77510c459c844] net: core: Make the 
FIB notification chain generic
git bisect good 04b1d4e50e82536c12da00ee04a77510c459c844
# good: [9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c] ipv6: Regenerate host 
route according to node pointer upon loopback up
git bisect good 9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c
# good: [0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549] mlxsw: 
spectrum_router: Add support for route replace
git bisect good 0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549
# good: [84b7187ca2338832e3af58eb5123c02bb6921e4e] Merge branch 
'mlxsw-Support-for-IPv6-UC-router'
git bisect good 84b7187ca2338832e3af58eb5123c02bb6921e4e
# bad: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs 
support for compound pages
git bisect bad 3ece782693c4b64d588dd217868558ab9a19bfe7
# good: [98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f] sock: allocate skbs 
from optmem
git bisect good 98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f
# first bad commit: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: 
skb_copy_ubufs support for compound pages
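
To double check the result, one option is to revert just that commit on top
of v4.14-rc1 and re-run the reproducer. A rough sketch (the revert may
conflict with the later zerocopy changes that build on it):

#git checkout v4.14-rc1
#git revert 3ece782693c4b64d588dd217868558ab9a19bfe7
#make -j$(nproc) && make modules_install && make install

If the hang no longer reproduces with the revert in place, that would
confirm the bisect result.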

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-12  3:53                                                 ` David Hill
@ 2017-12-19  3:36                                                   ` Jason Wang
  2017-12-19 16:19                                                     ` Willem de Bruijn
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2017-12-19  3:36 UTC (permalink / raw)
  To: David Hill, Paolo Bonzini, kvm; +Cc: Willem de Bruijn, netdev



On 2017-12-12 11:53, David Hill wrote:
>
>
> On 2017-12-08 01:03 PM, David Hill wrote:
>>
>>
>> On 2017-12-07 12:13 AM, Jason Wang wrote:
>>>
>>>
>>> On 2017-12-07 12:42, David Hill wrote:
>>>>
>>>>
>>>> On 2017-12-06 11:34 PM, David Hill wrote:
>>>>>
>>>>>
>>>>> On 2017-12-04 02:51 PM, David Hill wrote:
>>>>>>
>>>>>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017-12-02 00:38, David Hill wrote:
>>>>>>>>>
>>>>>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e 
>>>>>>>>> too ... compiling and I'll keep you posted.
>>>>>>>>
>>>>>>>> So I'm still able to reproduce this issue even with reverting 
>>>>>>>> these 3 commits.  Would you have other suspect commits ? 
>>>>>>>
>>>>>>> Thanks for the testing. No, I don't have other suspect commits.
>>>>>>>
>>>>>>> Looks like somebody else is hitting your issue too (see 
>>>>>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>>>>>
>>>>>>> But he claims the issue was fixed by using qemu 2.10.1.
>>>>>>>
>>>>>>> So you may:
>>>>>>>
>>>>>>> -try to see if qemu 2.10.1 solves your issue
>>>>>> It didn't solve it for him... it's only harder to reproduce. [1]
>>>>>>> -if not, try to see if commit 
>>>>>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier 
>>>>>>> hooks for devmap bpf map") is the first bad commit
>>>>>> I'll try to see what I can do here
>>>>> I'm looking at that commit and, if I'm not mistaken, it was introduced 
>>>>> before v4.13, while this issue appeared between v4.13 and v4.14-rc1.
>>>>> Between those two releases, there are 1352 commits.
>>>>> Is there a way to quickly know which commits touch vhost-net or 
>>>>> zerocopy?
>>>>>
>>>>>
>>>>> [ 7496.553044]  __schedule+0x2dc/0xbb0
>>>>> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>>>> [ 7496.553074]  schedule+0x3d/0x90
>>>>> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>>>> [ 7496.553100]  ? finish_wait+0x90/0x90
>>>>> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>>>> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>>>> [ 7496.553166]  SyS_ioctl+0x79/0x90
>>>>> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>>
>>>> That vhost_net_ubuf_put_and_wait call has been changed in this 
>>>> commit with the following comment:
>>>>
>>>> commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
>>>> Author: Michael S. Tsirkin <mst@redhat.com>
>>>> Date:   Thu Feb 13 11:42:05 2014 +0200
>>>>
>>>>     vhost: fix ref cnt checking deadlock
>>>>
>>>>     vhost checked the counter within the refcnt before 
>>>> decrementing.  It
>>>>     really wanted to know that it is the one that has the last 
>>>> reference, as
>>>>     a way to batch freeing resources a bit more efficiently.
>>>>
>>>>     Note: we only let refcount go to 0 on device release.
>>>>
>>>>     This works well but we now access the ref counter twice so 
>>>> there's a
>>>>     race: all users might see a high count and decide to defer freeing
>>>>     resources.
>>>>     In the end no one initiates freeing resources until the last 
>>>> reference
>>>>     is gone (which is on VM shotdown so might happen after a 
>>>> looooong time).
>>>>
>>>>     Let's do what we probably should have done straight away:
>>>>     switch from kref to plain atomic, documenting the
>>>>     semantics, return the refcount value atomically after decrement,
>>>>     then use that to avoid the deadlock.
>>>>
>>>>     Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
>>>>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>     Acked-by: Jason Wang <jasowang@redhat.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>>
>>>>
>>>> So at this point, are we hitting a deadlock when using 
>>>> experimental_zcopytx ? 
>>>
>>> Yes. But there could be another possibility: it was not caused by 
>>> vhost_net itself but by other places that hold a packet.
>>>
>>> Thanks
>>
>> While bisecting, when I reach commit 
>> 46d4b68f891bee5d83a32508bfbd9778be6b1b63, the system kernel panics 
>> when I run virt-customize:
>>
>> Message from syslogd@zappa at Dec  8 12:52:06 ...
>>  kernel:[  350.016376] Kernel panic - not syncing: Fatal exception in 
>> interrupt
>>
>> I marked that commit as bad again.   Will continue bisecting!
>>
>
> It looks like the first bad commit would be the following:
>
> [jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
> 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
> commit 3ece782693c4b64d588dd217868558ab9a19bfe7
> Author: Willem de Bruijn <willemb@google.com>
> Date:   Thu Aug 3 16:29:38 2017 -0400
>
>     sock: skb_copy_ubufs support for compound pages
>
>     Refine skb_copy_ubufs to support compound pages. With upcoming TCP
>     zerocopy sendmsg, such fragments may appear.
>
>     The existing code replaces each page one for one. Splitting each
>     compound page into an independent number of regular pages can result
>     in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
>
>     Instead, fill all destination pages but the last to PAGE_SIZE.
>     Split the existing alloc + copy loop into separate stages:
>     1. compute bytelength and minimum number of pages to store this.
>     2. allocate
>     3. copy, filling each page except the last to PAGE_SIZE bytes
>     4. update skb frag array
>
>     Signed-off-by: Willem de Bruijn <willemb@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 
> 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M    include
> :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 
> 4fc8384362693e4619fab39b0a945f6f2349226b M    net
>
> Here is the bisect log:

Thanks for the hard bisecting.

Cc netdev and Willem.


>
> [root@zappa linux-stable-new]# git bisect log
> git bisect start
> # bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
> git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
> # good: [e87c13993f16549e77abce9744af844c55154349] Linux 4.13.16
> git bisect good e87c13993f16549e77abce9744af844c55154349
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # bad: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad aae3dbb4776e7916b6cd442d00159bea27a695c1
> # good: [bf1d6b2c76eda86159519bf5c427b1fa8f51f733] Merge tag 
> 'staging-4.14-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good bf1d6b2c76eda86159519bf5c427b1fa8f51f733
> # bad: [e833251ad813168253fef9915aaf6a8c883337b0] rxrpc: Add 
> notification of end-of-Tx phase
> git bisect bad e833251ad813168253fef9915aaf6a8c883337b0
> # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 
> 'wireless-drivers-next-for-davem-2017-08-07' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
> git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
> # good: [cf6c6ea352faadb15d1373d890bf857080b218a4] iwlwifi: mvm: fix 
> the FIFO numbers in A000 devices
> git bisect good cf6c6ea352faadb15d1373d890bf857080b218a4
> # good: [65205cc465e9b37abbdbb3d595c46081b97e35bc] sctp: remove the 
> typedef sctp_addiphdr_t
> git bisect good 65205cc465e9b37abbdbb3d595c46081b97e35bc
> # bad: [ecbd87b8430419199cc9dd91598d5552a180f558] phylink: add support 
> for MII ioctl access to Clause 45 PHYs
> git bisect bad ecbd87b8430419199cc9dd91598d5552a180f558
> # bad: [52267790ef52d7513879238ca9fac22c1733e0e3] sock: add MSG_ZEROCOPY
> git bisect bad 52267790ef52d7513879238ca9fac22c1733e0e3
> # good: [04b1d4e50e82536c12da00ee04a77510c459c844] net: core: Make the 
> FIB notification chain generic
> git bisect good 04b1d4e50e82536c12da00ee04a77510c459c844
> # good: [9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c] ipv6: Regenerate 
> host route according to node pointer upon loopback up
> git bisect good 9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c
> # good: [0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549] mlxsw: 
> spectrum_router: Add support for route replace
> git bisect good 0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549
> # good: [84b7187ca2338832e3af58eb5123c02bb6921e4e] Merge branch 
> 'mlxsw-Support-for-IPv6-UC-router'
> git bisect good 84b7187ca2338832e3af58eb5123c02bb6921e4e
> # bad: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs 
> support for compound pages
> git bisect bad 3ece782693c4b64d588dd217868558ab9a19bfe7
> # good: [98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f] sock: allocate skbs 
> from optmem
> git bisect good 98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f
> # first bad commit: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: 
> skb_copy_ubufs support for compound pages
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
  2017-12-19  3:36                                                   ` Jason Wang
@ 2017-12-19 16:19                                                     ` Willem de Bruijn
  0 siblings, 0 replies; 31+ messages in thread
From: Willem de Bruijn @ 2017-12-19 16:19 UTC (permalink / raw)
  To: Jason Wang; +Cc: David Hill, Paolo Bonzini, kvm, Willem de Bruijn, netdev

>> It looks like the first bad commit would be the following:
>>
>> [jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
>> 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
>> commit 3ece782693c4b64d588dd217868558ab9a19bfe7
>> Author: Willem de Bruijn <willemb@google.com>
>> Date:   Thu Aug 3 16:29:38 2017 -0400
>>
>>     sock: skb_copy_ubufs support for compound pages
>>
>>     Refine skb_copy_ubufs to support compound pages. With upcoming TCP
>>     zerocopy sendmsg, such fragments may appear.
>>
>>     The existing code replaces each page one for one. Splitting each
>>     compound page into an independent number of regular pages can result
>>     in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
>>
>>     Instead, fill all destination pages but the last to PAGE_SIZE.
>>     Split the existing alloc + copy loop into separate stages:
>>     1. compute bytelength and minimum number of pages to store this.
>>     2. allocate
>>     3. copy, filling each page except the last to PAGE_SIZE bytes
>>     4. update skb frag array
>>
>>     Signed-off-by: Willem de Bruijn <willemb@google.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2
>> 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M    include
>> :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb
>> 4fc8384362693e4619fab39b0a945f6f2349226b M    net
>>
>> Here is the bisect log:
>
>
> Thanks for the hard bisecting.
>
> Cc netdev and Willem.

This is being discussed in

http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>

David also previously reported this at

  https://bugzilla.kernel.org/show_bug.cgi?id=197861

which has a pointer to the above thread, too. Let's discuss this in a
single thread. I have suggested a fix there.

Thanks for bisecting. Please also test the patch in the above thread
if possible.
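
Roughly, that would mean saving the patch mail locally and applying it on
top of the tree you are testing, for example (assuming it is saved as
fix.patch; the file name is only an example):

#git checkout v4.14-rc1
#git am fix.patch
#make -j$(nproc) && make modules_install && make install

and then re-running the VM shutdown test with experimental_zcopytx enabled.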

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-12-19 16:20 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <efd45fba-5724-0036-8473-0274b5816ae9@redhat.com>
2017-11-13 15:54 ` Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover. [1] David Hill
     [not found]   ` <CALapVYHmf7gG25nA-5LkoaTDR8gB0xQ1Ro_FyyCQNbzrfSp+aQ@mail.gmail.com>
2017-11-15 21:08     ` David Hill
2017-11-22 18:22       ` Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover David Hill
2017-11-23 23:48         ` Paolo Bonzini
2017-11-24  3:11           ` Jason Wang
2017-11-24 16:19             ` David Hill
2017-11-24 16:22             ` David Hill
2017-11-27  3:44               ` Jason Wang
2017-11-27 19:38                 ` David Hill
2017-11-28 18:00                   ` David Hill
2017-11-29  1:52                     ` Jason Wang
2017-11-29  2:52                       ` Dave Hill
2017-11-29  5:15                         ` Jason Wang
2017-11-29 19:13                           ` David Hill
2017-11-30  2:42                             ` Jason Wang
2017-11-30 20:52                               ` David Hill
2017-11-30 20:59                                 ` David Hill
2017-12-01 16:38                                   ` David Hill
2017-12-04  4:08                                     ` Jason Wang
2017-12-04 19:51                                       ` David Hill
2017-12-07  4:34                                         ` David Hill
2017-12-07  4:42                                           ` David Hill
2017-12-07  5:13                                             ` Jason Wang
2017-12-08 18:03                                               ` David Hill
2017-12-12  3:53                                                 ` David Hill
2017-12-19  3:36                                                   ` Jason Wang
2017-12-19 16:19                                                     ` Willem de Bruijn
2017-12-07  5:12                                           ` Jason Wang
2017-12-02 12:16                                   ` Harald Moeller
2017-12-02 16:37                                   ` Harald Moeller
2017-12-07  2:44                                     ` David Hill
