KVM Archive on lore.kernel.org
* [Bug 209253] New: Loss of connectivity on guest after important host <-> guest traffic
From: bugzilla-daemon @ 2020-09-13 18:39 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

            Bug ID: 209253
           Summary: Loss of connectivity on guest after important host <->
                    guest traffic
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.8.0-1-amd64
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: aubincleme@gmail.com
        Regression: No

Created attachment 292501
  --> https://bugzilla.kernel.org/attachment.cgi?id=292501&action=edit
Packet capture

I have a hypervisor running one guest VM. This guest VM has one virtual
network card, configured to use a NATed network with the host.

Upon guest startup, the guest can ping both the host and a server on the
internet. However, when heavy traffic starts between the guest and the host,
the host kernel issues the following trace, and the VM loses its network
connectivity upon its next restart. I'm also attaching a pcap file
showing the traffic on the host virtual NIC, which slowly degrades, starting
from frames 21 to 23.

Information about the host:
Linux 5.8.0-1-amd64 #1 SMP Debian 5.8.7-1 (2020-09-05) x86_64 GNU/Linux
QEMU emulator version 5.1.0 (Debian 1:5.1+dfsg-4)

Information about the guest VM:
Linux 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux

Network configuration for the guest:
<interface type="network">
  <mac address="52:54:00:f2:29:56"/>
  <source network="bridge-vm"/>
  <model type="virtio"/>
  <link state="up"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
<network>
  <name>bridge-vm</name>
  <uuid>159f7f26-391c-44f7-8e6e-dc1b213837a6</uuid>
  <forward mode="nat">
    <nat>
      <port start="1024" end="65535"/>
    </nat>
  </forward>
  <bridge name="virbr0" stp="on" delay="0"/>
  <mac address="52:54:00:45:68:56"/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254"/>
    </dhcp>
  </ip>
</network>
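
For anyone comparing setups, the network definition above can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it parses a trimmed copy of the <network> element (uuid and mac omitted) with the standard library and reports the forward mode, bridge name, subnet, and the number of addresses in the DHCP range. The name summarize_network is made up for illustration.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Trimmed copy of the <network> definition from the report.
NETWORK_XML = """
<network>
  <name>bridge-vm</name>
  <forward mode="nat">
    <nat>
      <port start="1024" end="65535"/>
    </nat>
  </forward>
  <bridge name="virbr0" stp="on" delay="0"/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254"/>
    </dhcp>
  </ip>
</network>
"""


def summarize_network(xml_text):
    """Return the key facts of a libvirt <network> definition as a dict."""
    root = ET.fromstring(xml_text.strip())
    ip = root.find("ip")
    dhcp_range = ip.find("dhcp/range")
    # strict=False because the address given is the gateway, not the network.
    subnet = ipaddress.ip_network(
        f"{ip.get('address')}/{ip.get('netmask')}", strict=False
    )
    start = ipaddress.ip_address(dhcp_range.get("start"))
    end = ipaddress.ip_address(dhcp_range.get("end"))
    return {
        "name": root.findtext("name"),
        "forward_mode": root.find("forward").get("mode"),
        "bridge": root.find("bridge").get("name"),
        "subnet": str(subnet),
        "dhcp_leases": int(end) - int(start) + 1,
    }


if __name__ == "__main__":
    for key, value in summarize_network(NETWORK_XML).items():
        print(f"{key}: {value}")
```
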

Kernel trace on the host:
[ 1492.533631] ------------[ cut here ]------------
[ 1492.533637] WARNING: CPU: 2 PID: 3835 at fs/eventfd.c:74
eventfd_signal+0x88/0xa0
[ 1492.533638] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth
jitterentropy_rng drbg ansi_cprng ecdh_generic ecc cfg80211 macvtap macvlan
vhost_net vhost tap vhost_iotlb nf_conntrack_netlink xfrm_user xfrm_algo
xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 xt_tcpudp nft_compat nft_counter nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_tables nfnetlink bridge
stp llc tun overlay binfmt_misc intel_rapl_msr intel_rapl_common
x86_pkg_temp_thermal intel_powerclamp coretemp joydev kvm_intel
snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic
ledtrig_audio snd_hda_intel snd_intel_dspcfg rapl snd_hda_codec eeepc_wmi
intel_cstate snd_hda_core intel_uncore snd_hwdep asus_wmi snd_pcm pcspkr
battery snd_timer iTCO_wdt sparse_keymap intel_pmc_bxt rfkill xpad snd
serio_raw iTCO_vendor_support ff_memless mei_me intel_wmi_thunderbolt wmi_bmof
soundcore sg watchdog mei acpi_pad evdev parport_pc ppdev lp
[ 1492.533657]  parport nfsd vfio_pci vfio_virqfd vfio_iommu_type1 vfio
auth_rpcgss irqbypass nfs_acl lockd grace sunrpc ip_tables x_tables autofs4
ext4 crc16 mbcache jbd2 crc32c_generic dm_crypt dm_mod sd_mod t10_pi crc_t10dif
crct10dif_generic hid_generic usbhid hid i915 crct10dif_pclmul crct10dif_common
crc32_pclmul crc32c_intel ghash_clmulni_intel i2c_algo_bit drm_kms_helper ahci
libahci e1000e aesni_intel mxm_wmi psmouse libaes crypto_simd cryptd
glue_helper libata cec xhci_pci ptp xhci_hcd pps_core drm i2c_i801 i2c_smbus
usbcore scsi_mod usb_common wmi video button
[ 1492.533674] CPU: 2 PID: 3835 Comm: vhost-3826 Not tainted 5.8.0-1-amd64 #1
Debian 5.8.7-1
[ 1492.533674] Hardware name: System manufacturer System Product Name/Z170-A,
BIOS 3802 03/15/2018
[ 1492.533676] RIP: 0010:eventfd_signal+0x88/0xa0
[ 1492.533677] Code: 03 00 00 00 4c 89 f7 e8 e6 a8 dc ff 65 ff 0d 0f 20 92 6a
4c 89 ee 4c 89 f7 e8 34 6a 52 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
[ 1492.533678] RSP: 0018:ffffb27ac06afd40 EFLAGS: 00010202
[ 1492.533679] RAX: 0000000000000001 RBX: ffff94c12c87f000 RCX:
0000000000000000
[ 1492.533680] RDX: 000000000000ad5c RSI: 0000000000000001 RDI:
ffff94c0dd9e81c0
[ 1492.533680] RBP: 0000000000000000 R08: 0000006e000000a8 R09:
ffff94c12c87f348
[ 1492.533681] R10: 0000000000000000 R11: 0000000000000020 R12:
ffff94c10d3f01d0
[ 1492.533681] R13: ffff94c10d3f0000 R14: ffff94c10d3f00c8 R15:
ffffb27ac06afe18
[ 1492.533682] FS:  0000000000000000(0000) GS:ffff94c14ed00000(0000)
knlGS:0000000000000000
[ 1492.533683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1492.533683] CR2: 0000000000000000 CR3: 0000000bc2300003 CR4:
00000000003626e0
[ 1492.533684] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1492.533685] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1492.533685] Call Trace:
[ 1492.533691]  handle_rx+0xbe/0x9f0 [vhost_net]
[ 1492.533694]  vhost_worker+0x88/0xd0 [vhost]
[ 1492.533696]  ? vhost_exceeds_weight+0x50/0x50 [vhost]
[ 1492.533698]  kthread+0x119/0x140
[ 1492.533699]  ? __kthread_bind_mask+0x60/0x60
[ 1492.533701]  ret_from_fork+0x22/0x30
[ 1492.533703] ---[ end trace a62bb924e0497bb1 ]---
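
To compare traces between reports, it helps to pull out the warning site and the reliable frames. Below is a rough Python sketch, assuming the mail client's line wraps have been undone first. The excerpt is the host trace above with wrapped lines rejoined and the module list and register dump left out. The name summarize_trace and the regular expressions are illustrative, not an existing tool.

```python
import re

# Excerpt of the host trace, with wrapped lines rejoined.
TRACE = """\
[ 1492.533637] WARNING: CPU: 2 PID: 3835 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0
[ 1492.533674] CPU: 2 PID: 3835 Comm: vhost-3826 Not tainted 5.8.0-1-amd64 #1 Debian 5.8.7-1
[ 1492.533685] Call Trace:
[ 1492.533691]  handle_rx+0xbe/0x9f0 [vhost_net]
[ 1492.533694]  vhost_worker+0x88/0xd0 [vhost]
[ 1492.533696]  ? vhost_exceeds_weight+0x50/0x50 [vhost]
[ 1492.533698]  kthread+0x119/0x140
[ 1492.533699]  ? __kthread_bind_mask+0x60/0x60
[ 1492.533701]  ret_from_fork+0x22/0x30
[ 1492.533703] ---[ end trace a62bb924e0497bb1 ]---
"""

STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")
WARNING = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)
FRAME = re.compile(r"^\s*(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?$")


def summarize_trace(text):
    """Return (warning_site, frames) for the first call-trace block in text."""
    site = None
    frames = []
    in_trace = False
    for raw in text.splitlines():
        line = STAMP.sub("", raw)  # drop the dmesg timestamp, if any
        match = WARNING.search(line)
        if match:
            site = {
                "file": match.group(1),
                "line": int(match.group(2)),
                "symbol": match.group(3),
                "offset": int(match.group(4), 16),
                "size": int(match.group(5), 16),
            }
        elif line.strip() == "Call Trace:":
            in_trace = True
        elif in_trace:
            if line.strip() in ("<IRQ>", "</IRQ>"):
                continue  # interrupt-context markers, not frames
            frame = FRAME.match(line)
            if frame is None:
                in_trace = False  # register dump, end marker, and so on
            elif not frame.group(1):
                # Keep only frames without the "? " prefix.
                frames.append((frame.group(2), frame.group(3)))
    return site, frames


if __name__ == "__main__":
    site, frames = summarize_trace(TRACE)
    print(site)
    for symbol, module in frames:
        print(symbol, module or "(core kernel)")
```

In symbol+0x88/0xa0, the first number is the offset into the function and the second is the function's size. Frames prefixed with "?" are addresses found on the stack that the unwinder could not confirm, so the sketch skips them. It also stops at the first non-frame line, so any frames printed after a register dump are not collected.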

Thanks in advance for your time!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
From: bugzilla-daemon @ 2020-10-12  0:07 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Ian Pilcher (arequipeno@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arequipeno@gmail.com

--- Comment #1 from Ian Pilcher (arequipeno@gmail.com) ---
I am seeing a very similar crash, but the device in my case is an NVIDIA GPU,
passed through to a Windows guest for video processing.

[ 6094.567434] WARNING: CPU: 7 PID: 2524 at fs/eventfd.c:74
eventfd_signal+0x88/0xa0
[ 6094.567464] Modules linked in: vhost_net vhost tap vhost_iotlb tun
nft_chain_nat 8021q garp mrp stp llc sch_ingress bonding openvswitch nsh
nf_conncount nf_nat nft_counter ipt_REJECT ip6t_REJECT nf_reject_ipv4
nf_reject_ipv6 xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nft_compat nf_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac
x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp raid10 kvm_intel iTCO_wdt
intel_pmc_bxt kvm iTCO_vendor_support gpio_ich ipmi_ssif rapl ixgbe
intel_cstate joydev i2c_i801 igb intel_uncore ioatdma mei_me acpi_ipmi
i2c_smbus mdio intel_pch_thermal mei dca lpc_ich ipmi_si ipmi_devintf
ipmi_msghandler acpi_pad vfat fat ip_tables xfs ast drm_vram_helper
drm_ttm_helper i2c_algo_bit drm_kms_helper cec ttm mxm_wmi drm crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel wmi vfio_pci irqbypass
vfio_virqfd vfio_iommu_type1 vfio
[ 6094.567794] CPU: 7 PID: 2524 Comm: CPU 3/KVM Not tainted
5.8.13-200.fc32.x86_64 #1
[ 6094.567834] Hardware name: Supermicro SYS-5028D-TN4T/X10SDV-TLN4F, BIOS 2.1
11/22/2019
[ 6094.567868] RIP: 0010:eventfd_signal+0x88/0xa0
[ 6094.567889] Code: 03 00 00 00 4c 89 f7 e8 26 16 db ff 65 ff 0d 3f f3 ca 43
4c 89 ee 4c 89 f7 e8 34 8e 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
[ 6094.567974] RSP: 0018:ffffac8780b97bb0 EFLAGS: 00010286
[ 6094.567996] RAX: 00000000ffffffff RBX: ffff9b01c8ed0000 RCX:
0000000000000004
[ 6094.568021] RDX: 00000000c8088704 RSI: 0000000000000001 RDI:
ffff9b11ef445240
[ 6094.568050] RBP: ffffac8780b97c18 R08: ffff9b11ef4cdf40 R09:
00000000c8088708
[ 6094.568080] R10: 0000000000000000 R11: 0000000000000190 R12:
0000000000000002
[ 6094.568105] R13: ffff9b11ef4bbb00 R14: ffff9b11ef4cdf40 R15:
ffff9b11ef4bbb00
[ 6094.568145] FS:  00007f30b8b78700(0000) GS:ffff9b123fbc0000(0000)
knlGS:000000ad76006000
[ 6094.568178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6094.568202] CR2: 000001d9b5100000 CR3: 0000001fb0ed2003 CR4:
00000000003626e0
[ 6094.568232] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 6094.568261] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 6094.568290] Call Trace:
[ 6094.568336]  ioeventfd_write+0x51/0x80 [kvm]
[ 6094.568385]  __kvm_io_bus_write+0x88/0xb0 [kvm]
[ 6094.568417]  kvm_io_bus_write+0x43/0x60 [kvm]
[ 6094.568454]  write_mmio+0x70/0xf0 [kvm]
[ 6094.568488]  emulator_read_write_onepage+0x11e/0x330 [kvm]
[ 6094.568527]  emulator_read_write+0xca/0x180 [kvm]
[ 6094.568564]  segmented_write.isra.0+0x4a/0x60 [kvm]
[ 6094.568601]  x86_emulate_insn+0x850/0xe60 [kvm]
[ 6094.568636]  x86_emulate_instruction+0x2c7/0x780 [kvm]
[ 6094.568680]  ? kvm_io_bus_write+0x43/0x60 [kvm]
[ 6094.569821]  kvm_arch_vcpu_ioctl_run+0xeb9/0x1770 [kvm]
[ 6094.570963]  kvm_vcpu_ioctl+0x209/0x590 [kvm]
[ 6094.572099]  ksys_ioctl+0x82/0xc0
[ 6094.573208]  __x64_sys_ioctl+0x16/0x20
[ 6094.574294]  do_syscall_64+0x4d/0x90
[ 6094.575348]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 6094.576388] RIP: 0033:0x7f30c2bac3bb
[ 6094.577422] Code: 0f 1e fa 48 8b 05 dd aa 0c 00 64 c7 00 26 00 00 00 48 c7
c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d ad aa 0c 00 f7 d8 64 89 01 48
[ 6094.579603] RSP: 002b:00007f30b8b77668 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 6094.580713] RAX: ffffffffffffffda RBX: 000056277c8e4ae0 RCX:
00007f30c2bac3bb
[ 6094.581835] RDX: 0000000000000000 RSI: 000000000000ae80 RDI:
000000000000001e
[ 6094.582951] RBP: 00007f30c010a000 R08: 000056277aebebf0 R09:
0000000000000000
[ 6094.584028] R10: 0000000000000001 R11: 0000000000000246 R12:
0000000000000001
[ 6094.585078] R13: 00007f30c010b001 R14: 0000000000000000 R15:
000056277b358a00
[ 6094.586103] ---[ end trace dab8395baf5baf8c ]---


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
From: bugzilla-daemon @ 2020-10-17 15:59 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Martin (kdev@mb706.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kdev@mb706.com

--- Comment #2 from Martin (kdev@mb706.com) ---
I am having problems possibly related to Clement's, and likely related to
Ian's. 
I am running KVM on a dual Nvidia GPU machine, passing one GPU through to the
KVM guest (Ubuntu 20.04.1).
The setup ran stable for quite a while on Fedora 31 (5.7.15-100). After
upgrading to Fedora 32 (5.8.13), the system tends to run well for a few hours
after rebooting, and then produces oopses (below). After the oops, VirtIO
drives, VirtIO network cards, and PCIe passthrough tend to hang indefinitely
within minutes of rebooting the guest, usually making the guest inoperable
(unless only non-VirtIO devices and no GPU passthrough are used). Rebooting the
host makes things work again for a few hours until the next oops happens. I am
on 5.8.14 now with the same problem.

Oops 1 (I saw this twice, once on 5.8.13-200 and once on 5.8.14-200):

WARNING: CPU: 28 PID: 17651 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0
Modules linked in: vhost_net vhost tap vhost_iotlb v4l2loopback(OE) xt_nat
xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nft_objref
nf_conntrack_tftp tun bridge stp llc evdi(OE) vboxnetadp(OE) vboxnetflt(OE)
vboxdrv(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables
ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter sunrpc ucsi_ccg typec_ucsi
nvidia_drm(POE) typec nvidia_modeset(POE) snd_hda_codec_realtek nvidia_uvm(OE)
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb edac_mce_amd btrtl
btbcm snd_hda_intel uvcvideo iwlmvm snd_intel_dspcfg kvm_amd btintel
snd_usb_audio snd_hda_codec videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib
mac80211
 nvidia(POE) kvm bluetooth snd_hda_core videobuf2_v4l2 snd_rawmidi snd_hwdep
videobuf2_common libarc4 snd_seq iwlwifi videodev joydev rapl snd_seq_device
ecdh_generic wmi_bmof pcspkr cfg80211 mc ecc snd_pcm drm_kms_helper snd_timer
sp5100_tco k10temp snd i2c_piix4 rfkill soundcore cec i2c_nvidia_gpu gpio_amdpt
gpio_generic acpi_cpufreq drm ip_tables dm_crypt hid_lenovo mxm_wmi
crct10dif_pclmul crc32_pclmul crc32c_intel nvme ghash_clmulni_intel nvme_core
igb wacom ccp uas dca usb_storage i2c_algo_bit wmi pinctrl_amd vfio_pci
irqbypass vfio_virqfd vfio_iommu_type1 vfio fuse
CPU: 28 PID: 17651 Comm: CPU 5/KVM Tainted: P           OE    
5.8.13-200.fc32.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE
EX-CF, BIOS F12 12/11/2019
RIP: 0010:eventfd_signal+0x88/0xa0
Code: 03 00 00 00 4c 89 f7 e8 26 16 db ff 65 ff 0d 3f f3 ca 4b 4c 89 ee 4c 89
f7 e8 34 8e 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 e4 5b 5d 4c
89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
RSP: 0018:ffffab10c8db7bb0 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff9a71e16b8000 RCX: 0000000000000004
RDX: 00000000c8088704 RSI: 0000000000000001 RDI: ffff9a8335656580
RBP: ffffab10c8db7c18 R08: ffff9a72f7d120a0 R09: 00000000c8088708
R10: 0000000000000000 R11: 0000000000000014 R12: 0000000000000001
R13: ffff9a72a3153448 R14: ffff9a72f7d120a0 R15: ffff9a72a3153448
FS:  0000000000000000(0000) GS:ffff9a7e7f280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64e403f024 CR3: 000000041b5f4000 CR4: 00000000003406e0
Call Trace:
 ioeventfd_write+0x51/0x80 [kvm]
 __kvm_io_bus_write+0x88/0xb0 [kvm]
 kvm_io_bus_write+0x43/0x60 [kvm]
 write_mmio+0x70/0xf0 [kvm]
 emulator_read_write_onepage+0x11e/0x330 [kvm]
 emulator_read_write+0xca/0x180 [kvm]
 segmented_write.isra.0+0x4a/0x60 [kvm]
 x86_emulate_insn+0x850/0xe60 [kvm]
 x86_emulate_instruction+0x2c7/0x780 [kvm]
 ? kvm_set_cr8+0x1e/0x40 [kvm]
 kvm_arch_vcpu_ioctl_run+0xeb9/0x1770 [kvm]
 ? x86_pmu_enable+0x106/0x2f0
 ? __switch_to_xtra+0x495/0x500
 kvm_vcpu_ioctl+0x209/0x590 [kvm]
 ksys_ioctl+0x82/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x4d/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f5f6a84f3bb
Code: 0f 1e fa 48 8b 05 dd aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
c3 48 8b 0d ad aa 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f5f527fb668 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055c459b6f1f0 RCX: 00007f5f6a84f3bb
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000022
RBP: 00007f5f6bcca000 R08: 000055c45750abf0 R09: 000000003b9aca00
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f5f6bccb004 R14: 0000000000000000 R15: 000055c4579a4a00

Oops 2 (saw this once on 5.8.14-200):

WARNING: CPU: 24 PID: 0 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0
Modules linked in: v4l2loopback(OE) nfnetlink_queue nfnetlink_log vhost_net
vhost tap vhost_iotlb xt_nat xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc vboxnetadp(OE)
vboxnetflt(OE) vboxdrv(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat
nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter sunrpc nvidia_drm(POE)
nvidia_modeset(POE) iwlmvm nvidia_uvm(OE) snd_hda_codec_realtek ucsi_ccg
typec_ucsi mac80211 typec edac_mce_amd snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi uvcvideo btusb btrtl btbcm nvidia(POE) snd_hda_intel
videobuf2_vmalloc libarc4 kvm_amd btintel videobuf2_memops snd_intel_dspcfg
 snd_hda_codec videobuf2_v4l2 kvm bluetooth videobuf2_common snd_usb_audio
iwlwifi snd_hda_core videodev snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi
joydev rapl snd_seq_device ecdh_generic mc pcspkr wmi_bmof ecc cfg80211 snd_pcm
drm_kms_helper snd_timer snd sp5100_tco i2c_piix4 k10temp rfkill soundcore cec
i2c_nvidia_gpu gpio_amdpt gpio_generic acpi_cpufreq drm ip_tables dm_crypt
mxm_wmi crct10dif_pclmul crc32_pclmul crc32c_intel nvme ghash_clmulni_intel igb
nvme_core wacom uas dca hid_lenovo ccp usb_storage i2c_algo_bit wmi pinctrl_amd
vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio fuse
CPU: 24 PID: 0 Comm: swapper/24 Tainted: P           OE    
5.8.14-200.fc32.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE
EX-CF, BIOS F12 12/11/2019
RIP: 0010:eventfd_signal+0x88/0xa0
Code: 03 00 00 00 4c 89 f7 e8 a6 14 db ff 65 ff 0d bf f1 ca 78 4c 89 ee 4c 89
f7 e8 b4 9c 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 e4 5b 5d 4c
89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
RSP: 0018:ffffb5e2c6d2cf38 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff894a2a1f1480 RCX: 000000000000001f
RDX: ffff89423920ce00 RSI: 0000000000000001 RDI: ffff894929afc580
RBP: ffff89423920cea4 R08: ffffb5e2c6d2cff8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000a1
R13: 0000000000000000 R14: ffffb5e2c6d2cfb4 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff89423f180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ae44aa8990 CR3: 0000000797204000 CR4: 00000000003406e0
Call Trace:
 <IRQ>
 vfio_msihandler+0x12/0x20 [vfio_pci]
 __handle_irq_event_percpu+0x42/0x180
 handle_irq_event+0x47/0x8a
 handle_edge_irq+0x87/0x220
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 common_interrupt+0xb2/0x140
 asm_common_interrupt+0x1e/0x40
RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
Code: 90 a5 6b 78 e8 5b be 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 2c d7 7b ff
80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00
00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
RSP: 0018:ffffb5e2c0337e88 EFLAGS: 00000246
RAX: ffff89423f1aa2c0 RBX: ffff89423366e400 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000002abf3055 RDI: 0000000000000000
RBP: ffffffff88b78940 R08: 00000a86556fd237 R09: 0000000000000018
R10: 0000000000002358 R11: 0000000000000781 R12: 0000000000000002
R13: ffff89423366e400 R14: 0000000000000002 R15: 00000a86556fd237
 ? cpuidle_enter_state+0xa4/0x3f0
 cpuidle_enter+0x29/0x40
 do_idle+0x1d5/0x2a0
 cpu_startup_entry+0x19/0x20
 start_secondary+0x144/0x170
 secondary_startup_64+0xb6/0xc0


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
From: bugzilla-daemon @ 2020-10-20 14:28 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #3 from Ian Pilcher (arequipeno@gmail.com) ---
Thanks for the note about 5.7.15.  I've built it for F32, and my GPU *seems*
stable so far (been running a video upscaling job for almost 24 hours).

(I did have an issue where the hypervisor (libvirtd?) seemed to lock up when I
tried to start the guest with virt-manager over SSH; I had to start it with
'virsh start' locally.)

This indicates that it *may* be possible to bisect this issue, although the
lack of a time-bound reproducer means that it will take a long time.
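
To put a rough number on "a long time": a bisection over N candidate commits needs about log2(N) build-and-test rounds, and each round has to run long enough to trust a "good" verdict. The following is a back-of-the-envelope Python sketch. The commit count and soak time are made-up placeholders, not measured values.

```python
def bisect_steps(commits):
    """Worst-case number of kernels to build and test for `commits` candidates.

    (commits - 1).bit_length() equals ceil(log2(commits)) for commits >= 1.
    """
    if commits < 1:
        raise ValueError("need at least one candidate commit")
    return (commits - 1).bit_length()


def bisect_days(commits, soak_hours):
    """Total wall-clock days if every step soaks for `soak_hours` hours."""
    return bisect_steps(commits) * soak_hours / 24


if __name__ == "__main__":
    commits = 2000    # hypothetical size of the good..bad range
    soak_hours = 24   # hypothetical run time before calling a kernel "good"
    print(bisect_steps(commits), "steps,", bisect_days(commits, soak_hours), "days")
```

Without a reliable reproducer, a "good" result only means the warning did not show up within the soak time, so a single wrong verdict can send the bisection down the wrong half.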

@Clément - Can you test 5.7.15?


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
From: bugzilla-daemon @ 2020-10-22 16:14 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Justin Gatzen (justin.gatzen@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |justin.gatzen@gmail.com

--- Comment #4 from Justin Gatzen (justin.gatzen@gmail.com) ---
Seeing the same issue here on 5.9.1 and 5.8.14. I did not notice any trouble on
5.8.2 for about two weeks of usage, but that is just an anecdote. I have not
attempted to bisect this because the bug takes quite a while to trigger.
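
The version reports so far can be narrowed into a window. Below is a small Python sketch under two assumptions that are mine rather than anyone's confirmed finding: that all reports in this thread are the same bug, and that the original report's Debian host kernel (5.8.7-1) counts as a bad version. The 5.8.2 data point is anecdotal, as noted above.

```python
def parse_version(text):
    """Turn '5.8.13' into (5, 8, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Versions as reported in this thread (see the assumptions in the lead-in).
GOOD = ["5.7.15", "5.8.2"]
BAD = ["5.8.7", "5.8.13", "5.8.14", "5.9.1"]


def regression_window(good, bad):
    """Return (newest good version, oldest bad version)."""
    newest_good = max(good, key=parse_version)
    oldest_bad = min(bad, key=parse_version)
    if parse_version(newest_good) >= parse_version(oldest_bad):
        raise ValueError("good and bad reports overlap; no clean window")
    return newest_good, oldest_bad


if __name__ == "__main__":
    newest_good, oldest_bad = regression_window(GOOD, BAD)
    print(f"last reported good: {newest_good}, first reported bad: {oldest_bad}")
```

Comparing the strings directly would sort "5.8.13" before "5.8.2", which is why the versions are converted to integer tuples first.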


Stack trace from a VFIO-enabled VM with an SR-IOV NIC, NVMe passthrough, GPU
passthrough, and a Windows guest:

[83372.203651] ------------[ cut here ]------------
[83372.203659] WARNING: CPU: 16 PID: 0 at fs/eventfd.c:74
eventfd_signal+0x89/0xa0
[83372.203661] Modules linked in: vhost_net vhost tap vhost_iotlb tun vfio_pci
vfio_virqfd vfio_iommu_type1 vfio ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter ip_tables nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nft_counter uinput nft_meta_bridge nf_tables nfnetlink xpad hid_microsoft
ff_memless mlx5_ib ib_uverbs nouveau ib_core edac_mce_amd mlx5_core iwlmvm
kvm_amd mac80211 kvm snd_hda_codec_realtek irqbypass sp5100_tco
snd_hda_codec_generic pcspkr rapl k10temp wmi_bmof i2c_piix4 mlxfw
ledtrig_audio snd_hda_codec_hdmi joydev pci_hyperv_intf snd_hda_intel bridge
snd_intel_dspcfg libarc4 snd_hda_codec btusb btrtl iwlwifi snd_hda_core btbcm
mxm_wmi stp btintel llc video snd_hwdep bluetooth snd_pcm ttm snd_timer
cfg80211 drm_kms_helper veth igb snd ecdh_generic ecc cec i2c_algo_bit
soundcore r8169 rfkill dca acpi_cpufreq vfat fat drm essiv authenc dm_crypt xfs
crct10dif_pclmul ccp crc32_pclmul crc32c_intel nvme
[83372.203687]  ghash_clmulni_intel nvme_core wmi pinctrl_amd
[83372.203702] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.9.1-dirty #49
[83372.203704] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS
MASTER/X570 AORUS MASTER, BIOS F30 09/07/2020
[83372.203707] RIP: 0010:eventfd_signal+0x89/0xa0
[83372.203709] Code: 03 00 00 00 4c 89 f7 e8 b5 3c db ff 65 ff 0d 3e 26 cb 52
4c 89 ee 4c 89 f7 e8 b3 65 7e 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 66 0f 1f 44 00
[83372.203712] RSP: 0018:ffffa2cdc0554f28 EFLAGS: 00010002
[83372.203714] RAX: 0000000000000001 RBX: ffff8ceaec0d4880 RCX:
000000000000001f
[83372.203716] RDX: ffff8ceaed3b7200 RSI: 0000000000000001 RDI:
ffff8ce6eaa58340
[83372.203717] RBP: ffff8ceaed3b7200 R08: 00004bd39a951b00 R09:
0000000000000000
[83372.203719] R10: 0000000000000000 R11: 0000000000000000 R12:
00000000000000d9
[83372.203720] R13: 0000000000000000 R14: ffffa2cdc0554fa4 R15:
0000000000000000
[83372.203722] FS:  0000000000000000(0000) GS:ffff8cf5bee00000(0000)
knlGS:0000000000000000
[83372.203724] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83372.203726] CR2: 00005562c8109c68 CR3: 00000001694fc000 CR4:
0000000000350ee0
[83372.203727] Call Trace:
[83372.203729]  <IRQ>
[83372.203734]  vfio_msihandler+0x12/0x20 [vfio_pci]
[83372.203738]  __handle_irq_event_percpu+0x42/0x180
[83372.203740]  handle_irq_event_percpu+0x21/0x60
[83372.203742]  handle_irq_event+0x36/0x53
[83372.203744]  handle_edge_irq+0x83/0x190
[83372.203747]  asm_call_irq_on_stack+0x12/0x20
[83372.203749]  </IRQ>
[83372.203751]  common_interrupt+0xb5/0x130
[83372.203753]  asm_common_interrupt+0x1e/0x40
[83372.203756] RIP: 0010:cpuidle_enter_state+0xdf/0x390
[83372.203758] Code: e8 76 49 7d ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00
f6 c4 02 0f 85 8f 02 00 00 31 ff e8 28 b3 83 ff fb 66 0f 1f 44 00 00 <45> 85 f6
0f 88 20 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d
[83372.203761] RSP: 0018:ffffa2cdc01c7e88 EFLAGS: 00000246
[83372.203762] RAX: ffff8cf5bee2a300 RBX: ffff8cf3b21d9400 RCX:
000000000000001f
[83372.203764] RDX: 0000000000000000 RSI: 000000002491bed3 RDI:
0000000000000000
[83372.203765] RBP: ffffffffaeb7f180 R08: 00004bd39a95195c R09:
0000000000001206
[83372.203767] R10: 0000000000004852 R11: ffff8cf5bee29124 R12:
00004bd39a95195c
[83372.203768] R13: ffffffffaeb7f268 R14: 0000000000000002 R15:
ffff8cf3b21d9400
[83372.203772]  ? cpuidle_enter_state+0xba/0x390
[83372.203774]  cpuidle_enter+0x29/0x40
[83372.203776]  do_idle+0x1b8/0x240
[83372.203778]  cpu_startup_entry+0x19/0x20
[83372.203780]  start_secondary+0x103/0x130
[83372.203783]  secondary_startup_64+0xb6/0xc0
[83372.203785] ---[ end trace a1dd3ff1b3977640 ]---


And from a non-VFIO, run-of-the-mill Linux guest:

[37342.189129] ------------[ cut here ]------------
[37342.189137] WARNING: CPU: 21 PID: 1137 at fs/eventfd.c:74
eventfd_signal+0x89/0xa0
[37342.189139] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio
vhost_net vhost tap vhost_iotlb tun nouveau ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 mlx5_ib nft_counter nft_meta_bridge nf_tables ib_uverbs uinput
nfnetlink ib_core iwlmvm xpad hid_microsoft ff_memless mac80211 mlx5_core
snd_hda_codec_realtek joydev libarc4 snd_hda_codec_generic snd_hda_codec_hdmi
ledtrig_audio iwlwifi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_hda_codec
kvm_amd bridge btusb snd_hda_core btrtl kvm btbcm video btintel snd_hwdep
bluetooth ttm cfg80211 snd_pcm drm_kms_helper igb stp llc snd_timer irqbypass
veth rapl snd mlxfw pcspkr wmi_bmof mxm_wmi sp5100_tco k10temp i2c_piix4
i2c_algo_bit pci_hyperv_intf ecdh_generic cec r8169 dca soundcore ecc rfkill
drm acpi_cpufreq vfat fat essiv authenc dm_crypt xfs nvme nvme_core
crct10dif_pclmul ccp crc32_pclmul
[37342.189162]  crc32c_intel ghash_clmulni_intel wmi pinctrl_amd
[37342.189174] CPU: 21 PID: 1137 Comm: vhost-1125 Not tainted 5.9.1-dirty #50
[37342.189175] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS
MASTER/X570 AORUS MASTER, BIOS F30 09/07/2020
[37342.189178] RIP: 0010:eventfd_signal+0x89/0xa0
[37342.189180] Code: 03 00 00 00 4c 89 f7 e8 b5 3c db ff 65 ff 0d 3e 26 cb 79
4c 89 ee 4c 89 f7 e8 b3 65 7e 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 66 0f 1f 44 00
[37342.189182] RSP: 0018:ffffadbcc1233d60 EFLAGS: 00010202
[37342.189183] RAX: 0000000000000001 RBX: ffff986bb47746b8 RCX:
00000000c3160001
[37342.189185] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff986948f6c9c0
[37342.189186] RBP: ffff986bb4770000 R08: 0000000000000001 R09:
0000000000000101
[37342.189187] R10: 000000002b30008a R11: 0000000022308000 R12:
0000000000000001
[37342.189189] R13: ffffadbcc1233e18 R14: ffff98630de8f300 R15:
ffff986bb47746b8
[37342.189190] FS:  0000000000000000(0000) GS:ffff986bbef40000(0000)
knlGS:0000000000000000
[37342.189192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37342.189193] CR2: 00007f77c69ee890 CR3: 0000001029214000 CR4:
0000000000350ee0
[37342.189194] Call Trace:
[37342.189199]  vhost_tx_batch.isra.0+0x7d/0xc0 [vhost_net]
[37342.189202]  handle_tx_copy+0x15c/0x550 [vhost_net]
[37342.189204]  handle_tx+0xa5/0xe0 [vhost_net]
[37342.189207]  vhost_worker+0x8d/0xd0 [vhost]
[37342.189209]  ? vhost_vring_call_reset+0x40/0x40 [vhost]
[37342.189212]  kthread+0xfe/0x140
[37342.189214]  ? kthread_park+0x90/0x90
[37342.189216]  ret_from_fork+0x22/0x30
[37342.189218] ---[ end trace ae48714189db9592 ]---


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
From: bugzilla-daemon @ 2020-10-22 22:26 UTC
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #5 from Ian Pilcher (arequipeno@gmail.com) ---
Based on my git bisect, it looks like this commit is triggering the WARNING.

commit c49fa6397b6d29ce10c0ae5b2528bb004a14691f
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Mon Aug 17 11:08:18 2020 -0600

    vfio-pci: Avoid recursive read-lock usage

    [ Upstream commit bc93b9ae0151ae5ad5b8504cdc598428ea99570b ]

    A down_read on memory_lock is held when performing read/write accesses
    to MMIO BAR space, including across the copy_to/from_user() callouts
    which may fault.  If the user buffer for these copies resides in an
    mmap of device MMIO space, the mmap fault handler will acquire a
    recursive read-lock on memory_lock.  Avoid this by reducing the lock
    granularity.  Sequential accesses requiring multiple ioread/iowrite
    cycles are expected to be rare, therefore typical accesses should not
    see additional overhead.

    VGA MMIO accesses are expected to be non-fatal regardless of the PCI
    memory enable bit to allow legacy probing, this behavior remains with
    a comment added.  ioeventfds are now included in memory access testing,
    with writes dropped while memory space is disabled.

    Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
    Reported-by: Zhiyi Guo <zhguo@redhat.com>
    Tested-by: Zhiyi Guo <zhguo@redhat.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 drivers/vfio/pci/vfio_pci_private.h |   2 +
 drivers/vfio/pci/vfio_pci_rdwr.c    | 120 ++++++++++++++++++++++++++++--------
 2 files changed, 98 insertions(+), 24 deletions(-)

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bc93b9ae0151ae5ad5b8504cdc598428ea99570b

