kvm.vger.kernel.org archive mirror
* [Bug 209253] New: Loss of connectivity on guest after important host <-> guest traffic
@ 2020-09-13 18:39 bugzilla-daemon
  2020-10-12  0:07 ` [Bug 209253] " bugzilla-daemon
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: bugzilla-daemon @ 2020-09-13 18:39 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

            Bug ID: 209253
           Summary: Loss of connectivity on guest after important host <->
                    guest traffic
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.8.0-1-amd64
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: aubincleme@gmail.com
        Regression: No

Created attachment 292501
  --> https://bugzilla.kernel.org/attachment.cgi?id=292501&action=edit
Packet capture

I have a hypervisor running one guest VM. This guest VM has one virtual
network card, configured to use a NATed network with the host.

Upon guest startup, the guest can ping both the host and a server on the
internet. However, when heavy traffic starts between the guest and the host,
the host kernel issues the following trace, and the VM loses its network
connectivity upon its next restart. I'm also attaching to this issue a pcap
file showing the traffic on the host virtual NIC, which slowly degrades,
starting from frames 21 to 23.

Information about the host:
Linux 5.8.0-1-amd64 #1 SMP Debian 5.8.7-1 (2020-09-05) x86_64 GNU/Linux
QEMU emulator version 5.1.0 (Debian 1:5.1+dfsg-4)

Information about the guest VM:
Linux 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux

Network configuration for the guest:
<interface type="network">
  <mac address="52:54:00:f2:29:56"/>
  <source network="bridge-vm"/>
  <model type="virtio"/>
  <link state="up"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
<network>
  <name>bridge-vm</name>
  <uuid>159f7f26-391c-44f7-8e6e-dc1b213837a6</uuid>
  <forward mode="nat">
    <nat>
      <port start="1024" end="65535"/>
    </nat>
  </forward>
  <bridge name="virbr0" stp="on" delay="0"/>
  <mac address="52:54:00:45:68:56"/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254"/>
    </dhcp>
  </ip>
</network>

Kernel trace on the host:
[ 1492.533631] ------------[ cut here ]------------
[ 1492.533637] WARNING: CPU: 2 PID: 3835 at fs/eventfd.c:74
eventfd_signal+0x88/0xa0
[ 1492.533638] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth
jitterentropy_rng drbg ansi_cprng ecdh_generic ecc cfg80211 macvtap macvlan
vhost_net vhost tap vhost_iotlb nf_conntrack_netlink xfrm_user xfrm_algo
xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 xt_tcpudp nft_compat nft_counter nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_tables nfnetlink bridge
stp llc tun overlay binfmt_misc intel_rapl_msr intel_rapl_common
x86_pkg_temp_thermal intel_powerclamp coretemp joydev kvm_intel
snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic
ledtrig_audio snd_hda_intel snd_intel_dspcfg rapl snd_hda_codec eeepc_wmi
intel_cstate snd_hda_core intel_uncore snd_hwdep asus_wmi snd_pcm pcspkr
battery snd_timer iTCO_wdt sparse_keymap intel_pmc_bxt rfkill xpad snd
serio_raw iTCO_vendor_support ff_memless mei_me intel_wmi_thunderbolt wmi_bmof
soundcore sg watchdog mei acpi_pad evdev parport_pc ppdev lp
[ 1492.533657]  parport nfsd vfio_pci vfio_virqfd vfio_iommu_type1 vfio
auth_rpcgss irqbypass nfs_acl lockd grace sunrpc ip_tables x_tables autofs4
ext4 crc16 mbcache jbd2 crc32c_generic dm_crypt dm_mod sd_mod t10_pi crc_t10dif
crct10dif_generic hid_generic usbhid hid i915 crct10dif_pclmul crct10dif_common
crc32_pclmul crc32c_intel ghash_clmulni_intel i2c_algo_bit drm_kms_helper ahci
libahci e1000e aesni_intel mxm_wmi psmouse libaes crypto_simd cryptd
glue_helper libata cec xhci_pci ptp xhci_hcd pps_core drm i2c_i801 i2c_smbus
usbcore scsi_mod usb_common wmi video button
[ 1492.533674] CPU: 2 PID: 3835 Comm: vhost-3826 Not tainted 5.8.0-1-amd64 #1
Debian 5.8.7-1
[ 1492.533674] Hardware name: System manufacturer System Product Name/Z170-A,
BIOS 3802 03/15/2018
[ 1492.533676] RIP: 0010:eventfd_signal+0x88/0xa0
[ 1492.533677] Code: 03 00 00 00 4c 89 f7 e8 e6 a8 dc ff 65 ff 0d 0f 20 92 6a
4c 89 ee 4c 89 f7 e8 34 6a 52 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
[ 1492.533678] RSP: 0018:ffffb27ac06afd40 EFLAGS: 00010202
[ 1492.533679] RAX: 0000000000000001 RBX: ffff94c12c87f000 RCX:
0000000000000000
[ 1492.533680] RDX: 000000000000ad5c RSI: 0000000000000001 RDI:
ffff94c0dd9e81c0
[ 1492.533680] RBP: 0000000000000000 R08: 0000006e000000a8 R09:
ffff94c12c87f348
[ 1492.533681] R10: 0000000000000000 R11: 0000000000000020 R12:
ffff94c10d3f01d0
[ 1492.533681] R13: ffff94c10d3f0000 R14: ffff94c10d3f00c8 R15:
ffffb27ac06afe18
[ 1492.533682] FS:  0000000000000000(0000) GS:ffff94c14ed00000(0000)
knlGS:0000000000000000
[ 1492.533683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1492.533683] CR2: 0000000000000000 CR3: 0000000bc2300003 CR4:
00000000003626e0
[ 1492.533684] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1492.533685] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1492.533685] Call Trace:
[ 1492.533691]  handle_rx+0xbe/0x9f0 [vhost_net]
[ 1492.533694]  vhost_worker+0x88/0xd0 [vhost]
[ 1492.533696]  ? vhost_exceeds_weight+0x50/0x50 [vhost]
[ 1492.533698]  kthread+0x119/0x140
[ 1492.533699]  ? __kthread_bind_mask+0x60/0x60
[ 1492.533701]  ret_from_fork+0x22/0x30
[ 1492.533703] ---[ end trace a62bb924e0497bb1 ]---

Thanks in advance for your time!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
  2020-09-13 18:39 [Bug 209253] New: Loss of connectivity on guest after important host <-> guest traffic bugzilla-daemon
@ 2020-10-12  0:07 ` bugzilla-daemon
  2020-10-17 15:59 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2020-10-12  0:07 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Ian Pilcher (arequipeno@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arequipeno@gmail.com

--- Comment #1 from Ian Pilcher (arequipeno@gmail.com) ---
I am seeing a very similar crash, but the device in my case is an NVIDIA GPU,
passed through to a Windows guest for video processing.

[ 6094.567434] WARNING: CPU: 7 PID: 2524 at fs/eventfd.c:74
eventfd_signal+0x88/0xa0
[ 6094.567464] Modules linked in: vhost_net vhost tap vhost_iotlb tun
nft_chain_nat 8021q garp mrp stp llc sch_ingress bonding openvswitch nsh
nf_conncount nf_nat nft_counter ipt_REJECT ip6t_REJECT nf_reject_ipv4
nf_reject_ipv6 xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nft_compat nf_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac
x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp raid10 kvm_intel iTCO_wdt
intel_pmc_bxt kvm iTCO_vendor_support gpio_ich ipmi_ssif rapl ixgbe
intel_cstate joydev i2c_i801 igb intel_uncore ioatdma mei_me acpi_ipmi
i2c_smbus mdio intel_pch_thermal mei dca lpc_ich ipmi_si ipmi_devintf
ipmi_msghandler acpi_pad vfat fat ip_tables xfs ast drm_vram_helper
drm_ttm_helper i2c_algo_bit drm_kms_helper cec ttm mxm_wmi drm crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel wmi vfio_pci irqbypass
vfio_virqfd vfio_iommu_type1 vfio
[ 6094.567794] CPU: 7 PID: 2524 Comm: CPU 3/KVM Not tainted
5.8.13-200.fc32.x86_64 #1
[ 6094.567834] Hardware name: Supermicro SYS-5028D-TN4T/X10SDV-TLN4F, BIOS 2.1
11/22/2019
[ 6094.567868] RIP: 0010:eventfd_signal+0x88/0xa0
[ 6094.567889] Code: 03 00 00 00 4c 89 f7 e8 26 16 db ff 65 ff 0d 3f f3 ca 43
4c 89 ee 4c 89 f7 e8 34 8e 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
[ 6094.567974] RSP: 0018:ffffac8780b97bb0 EFLAGS: 00010286
[ 6094.567996] RAX: 00000000ffffffff RBX: ffff9b01c8ed0000 RCX:
0000000000000004
[ 6094.568021] RDX: 00000000c8088704 RSI: 0000000000000001 RDI:
ffff9b11ef445240
[ 6094.568050] RBP: ffffac8780b97c18 R08: ffff9b11ef4cdf40 R09:
00000000c8088708
[ 6094.568080] R10: 0000000000000000 R11: 0000000000000190 R12:
0000000000000002
[ 6094.568105] R13: ffff9b11ef4bbb00 R14: ffff9b11ef4cdf40 R15:
ffff9b11ef4bbb00
[ 6094.568145] FS:  00007f30b8b78700(0000) GS:ffff9b123fbc0000(0000)
knlGS:000000ad76006000
[ 6094.568178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6094.568202] CR2: 000001d9b5100000 CR3: 0000001fb0ed2003 CR4:
00000000003626e0
[ 6094.568232] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 6094.568261] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 6094.568290] Call Trace:
[ 6094.568336]  ioeventfd_write+0x51/0x80 [kvm]
[ 6094.568385]  __kvm_io_bus_write+0x88/0xb0 [kvm]
[ 6094.568417]  kvm_io_bus_write+0x43/0x60 [kvm]
[ 6094.568454]  write_mmio+0x70/0xf0 [kvm]
[ 6094.568488]  emulator_read_write_onepage+0x11e/0x330 [kvm]
[ 6094.568527]  emulator_read_write+0xca/0x180 [kvm]
[ 6094.568564]  segmented_write.isra.0+0x4a/0x60 [kvm]
[ 6094.568601]  x86_emulate_insn+0x850/0xe60 [kvm]
[ 6094.568636]  x86_emulate_instruction+0x2c7/0x780 [kvm]
[ 6094.568680]  ? kvm_io_bus_write+0x43/0x60 [kvm]
[ 6094.569821]  kvm_arch_vcpu_ioctl_run+0xeb9/0x1770 [kvm]
[ 6094.570963]  kvm_vcpu_ioctl+0x209/0x590 [kvm]
[ 6094.572099]  ksys_ioctl+0x82/0xc0
[ 6094.573208]  __x64_sys_ioctl+0x16/0x20
[ 6094.574294]  do_syscall_64+0x4d/0x90
[ 6094.575348]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 6094.576388] RIP: 0033:0x7f30c2bac3bb
[ 6094.577422] Code: 0f 1e fa 48 8b 05 dd aa 0c 00 64 c7 00 26 00 00 00 48 c7
c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d ad aa 0c 00 f7 d8 64 89 01 48
[ 6094.579603] RSP: 002b:00007f30b8b77668 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 6094.580713] RAX: ffffffffffffffda RBX: 000056277c8e4ae0 RCX:
00007f30c2bac3bb
[ 6094.581835] RDX: 0000000000000000 RSI: 000000000000ae80 RDI:
000000000000001e
[ 6094.582951] RBP: 00007f30c010a000 R08: 000056277aebebf0 R09:
0000000000000000
[ 6094.584028] R10: 0000000000000001 R11: 0000000000000246 R12:
0000000000000001
[ 6094.585078] R13: 00007f30c010b001 R14: 0000000000000000 R15:
000056277b358a00
[ 6094.586103] ---[ end trace dab8395baf5baf8c ]---


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
  2020-09-13 18:39 [Bug 209253] New: Loss of connectivity on guest after important host <-> guest traffic bugzilla-daemon
  2020-10-12  0:07 ` [Bug 209253] " bugzilla-daemon
@ 2020-10-17 15:59 ` bugzilla-daemon
  2020-10-20 14:28 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2020-10-17 15:59 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Martin (kdev@mb706.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kdev@mb706.com

--- Comment #2 from Martin (kdev@mb706.com) ---
I am having problems possibly related to Clément's, and likely related to
Ian's.
I am running KVM on a dual Nvidia GPU machine, passing one GPU through to the
KVM guest (Ubuntu 20.04.1).
The setup ran stably for quite a while on Fedora 31 (5.7.15-100). After
upgrading to Fedora 32 (5.8.13), the system tends to run well for a few hours
after rebooting, and then produces oopses (below). After the oops, VirtIO
drives, VirtIO network cards, and PCIe passthrough tend to hang indefinitely
within minutes of rebooting the guest, usually making the guest inoperable
(unless only non-VirtIO devices and no GPU passthrough are used). Rebooting the
host makes things work again for a few hours until the next oops happens. I am
on 5.8.14 now with the same problem.

Oops 1 (I saw this twice, once on 5.8.13-200 and once on 5.8.14-200):

WARNING: CPU: 28 PID: 17651 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0
Modules linked in: vhost_net vhost tap vhost_iotlb v4l2loopback(OE) xt_nat
xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nft_objref
nf_conntrack_tftp tun bridge stp llc evdi(OE) vboxnetadp(OE) vboxnetflt(OE)
vboxdrv(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables
ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter sunrpc ucsi_ccg typec_ucsi
nvidia_drm(POE) typec nvidia_modeset(POE) snd_hda_codec_realtek nvidia_uvm(OE)
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb edac_mce_amd btrtl
btbcm snd_hda_intel uvcvideo iwlmvm snd_intel_dspcfg kvm_amd btintel
snd_usb_audio snd_hda_codec videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib
mac80211
 nvidia(POE) kvm bluetooth snd_hda_core videobuf2_v4l2 snd_rawmidi snd_hwdep
videobuf2_common libarc4 snd_seq iwlwifi videodev joydev rapl snd_seq_device
ecdh_generic wmi_bmof pcspkr cfg80211 mc ecc snd_pcm drm_kms_helper snd_timer
sp5100_tco k10temp snd i2c_piix4 rfkill soundcore cec i2c_nvidia_gpu gpio_amdpt
gpio_generic acpi_cpufreq drm ip_tables dm_crypt hid_lenovo mxm_wmi
crct10dif_pclmul crc32_pclmul crc32c_intel nvme ghash_clmulni_intel nvme_core
igb wacom ccp uas dca usb_storage i2c_algo_bit wmi pinctrl_amd vfio_pci
irqbypass vfio_virqfd vfio_iommu_type1 vfio fuse
CPU: 28 PID: 17651 Comm: CPU 5/KVM Tainted: P           OE    
5.8.13-200.fc32.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE
EX-CF, BIOS F12 12/11/2019
RIP: 0010:eventfd_signal+0x88/0xa0
Code: 03 00 00 00 4c 89 f7 e8 26 16 db ff 65 ff 0d 3f f3 ca 4b 4c 89 ee 4c 89
f7 e8 34 8e 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 e4 5b 5d 4c
89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
RSP: 0018:ffffab10c8db7bb0 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff9a71e16b8000 RCX: 0000000000000004
RDX: 00000000c8088704 RSI: 0000000000000001 RDI: ffff9a8335656580
RBP: ffffab10c8db7c18 R08: ffff9a72f7d120a0 R09: 00000000c8088708
R10: 0000000000000000 R11: 0000000000000014 R12: 0000000000000001
R13: ffff9a72a3153448 R14: ffff9a72f7d120a0 R15: ffff9a72a3153448
FS:  0000000000000000(0000) GS:ffff9a7e7f280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64e403f024 CR3: 000000041b5f4000 CR4: 00000000003406e0
Call Trace:
 ioeventfd_write+0x51/0x80 [kvm]
 __kvm_io_bus_write+0x88/0xb0 [kvm]
 kvm_io_bus_write+0x43/0x60 [kvm]
 write_mmio+0x70/0xf0 [kvm]
 emulator_read_write_onepage+0x11e/0x330 [kvm]
 emulator_read_write+0xca/0x180 [kvm]
 segmented_write.isra.0+0x4a/0x60 [kvm]
 x86_emulate_insn+0x850/0xe60 [kvm]
 x86_emulate_instruction+0x2c7/0x780 [kvm]
 ? kvm_set_cr8+0x1e/0x40 [kvm]
 kvm_arch_vcpu_ioctl_run+0xeb9/0x1770 [kvm]
 ? x86_pmu_enable+0x106/0x2f0
 ? __switch_to_xtra+0x495/0x500
 kvm_vcpu_ioctl+0x209/0x590 [kvm]
 ksys_ioctl+0x82/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x4d/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f5f6a84f3bb
Code: 0f 1e fa 48 8b 05 dd aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
c3 48 8b 0d ad aa 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f5f527fb668 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055c459b6f1f0 RCX: 00007f5f6a84f3bb
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000022
RBP: 00007f5f6bcca000 R08: 000055c45750abf0 R09: 000000003b9aca00
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f5f6bccb004 R14: 0000000000000000 R15: 000055c4579a4a00

Oops 2 (saw this once on 5.8.14-200):

WARNING: CPU: 24 PID: 0 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0
Modules linked in: v4l2loopback(OE) nfnetlink_queue nfnetlink_log vhost_net
vhost tap vhost_iotlb xt_nat xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc vboxnetadp(OE)
vboxnetflt(OE) vboxdrv(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat
nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter sunrpc nvidia_drm(POE)
nvidia_modeset(POE) iwlmvm nvidia_uvm(OE) snd_hda_codec_realtek ucsi_ccg
typec_ucsi mac80211 typec edac_mce_amd snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi uvcvideo btusb btrtl btbcm nvidia(POE) snd_hda_intel
videobuf2_vmalloc libarc4 kvm_amd btintel videobuf2_memops snd_intel_dspcfg
 snd_hda_codec videobuf2_v4l2 kvm bluetooth videobuf2_common snd_usb_audio
iwlwifi snd_hda_core videodev snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi
joydev rapl snd_seq_device ecdh_generic mc pcspkr wmi_bmof ecc cfg80211 snd_pcm
drm_kms_helper snd_timer snd sp5100_tco i2c_piix4 k10temp rfkill soundcore cec
i2c_nvidia_gpu gpio_amdpt gpio_generic acpi_cpufreq drm ip_tables dm_crypt
mxm_wmi crct10dif_pclmul crc32_pclmul crc32c_intel nvme ghash_clmulni_intel igb
nvme_core wacom uas dca hid_lenovo ccp usb_storage i2c_algo_bit wmi pinctrl_amd
vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio fuse
CPU: 24 PID: 0 Comm: swapper/24 Tainted: P           OE    
5.8.14-200.fc32.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE
EX-CF, BIOS F12 12/11/2019
RIP: 0010:eventfd_signal+0x88/0xa0
Code: 03 00 00 00 4c 89 f7 e8 a6 14 db ff 65 ff 0d bf f1 ca 78 4c 89 ee 4c 89
f7 e8 b4 9c 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 e4 5b 5d 4c
89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
RSP: 0018:ffffb5e2c6d2cf38 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff894a2a1f1480 RCX: 000000000000001f
RDX: ffff89423920ce00 RSI: 0000000000000001 RDI: ffff894929afc580
RBP: ffff89423920cea4 R08: ffffb5e2c6d2cff8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000a1
R13: 0000000000000000 R14: ffffb5e2c6d2cfb4 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff89423f180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ae44aa8990 CR3: 0000000797204000 CR4: 00000000003406e0
Call Trace:
 <IRQ>
 vfio_msihandler+0x12/0x20 [vfio_pci]
 __handle_irq_event_percpu+0x42/0x180
 handle_irq_event+0x47/0x8a
 handle_edge_irq+0x87/0x220
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 common_interrupt+0xb2/0x140
 asm_common_interrupt+0x1e/0x40
RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
Code: 90 a5 6b 78 e8 5b be 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 2c d7 7b ff
80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00
00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
RSP: 0018:ffffb5e2c0337e88 EFLAGS: 00000246
RAX: ffff89423f1aa2c0 RBX: ffff89423366e400 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000002abf3055 RDI: 0000000000000000
RBP: ffffffff88b78940 R08: 00000a86556fd237 R09: 0000000000000018
R10: 0000000000002358 R11: 0000000000000781 R12: 0000000000000002
R13: ffff89423366e400 R14: 0000000000000002 R15: 00000a86556fd237
 ? cpuidle_enter_state+0xa4/0x3f0
 cpuidle_enter+0x29/0x40
 do_idle+0x1d5/0x2a0
 cpu_startup_entry+0x19/0x20
 start_secondary+0x144/0x170
 secondary_startup_64+0xb6/0xc0


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
  2020-09-13 18:39 [Bug 209253] New: Loss of connectivity on guest after important host <-> guest traffic bugzilla-daemon
  2020-10-12  0:07 ` [Bug 209253] " bugzilla-daemon
  2020-10-17 15:59 ` bugzilla-daemon
@ 2020-10-20 14:28 ` bugzilla-daemon
  2020-10-22 16:14 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2020-10-20 14:28 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #3 from Ian Pilcher (arequipeno@gmail.com) ---
Thanks for the note about 5.7.15.  I've built it for F32, and my GPU *seems*
stable so far (I've been running a video upscaling job for almost 24 hours).

(I did have an issue where the hypervisor (libvirtd?) seemed to lock up if I
tried to start the guest with virt-manager over SSH; I had to start it with
'virsh start' locally.)

This indicates that it *may* be possible to bisect this issue, although the
lack of a time-bound reproducer means that it will take a long time.

@Clément - Can you test 5.7.15?


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
  2020-09-13 18:39 [Bug 209253] New: Loss of connectivity on guest after important host <-> guest traffic bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-10-20 14:28 ` bugzilla-daemon
@ 2020-10-22 16:14 ` bugzilla-daemon
  2020-10-22 22:26 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2020-10-22 16:14 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Justin Gatzen (justin.gatzen@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |justin.gatzen@gmail.com

--- Comment #4 from Justin Gatzen (justin.gatzen@gmail.com) ---
Seeing the same issue here on 5.9.1 and 5.8.14. I did not notice any trouble on
5.8.2 for about two weeks of usage, but that is just an anecdote. I have not
attempted to bisect this because the bug takes quite a while to trigger.


Stack trace from a VFIO-enabled VM with an SR-IOV NIC, NVMe passthrough, GPU
passthrough, and a Windows guest:

[83372.203651] ------------[ cut here ]------------
[83372.203659] WARNING: CPU: 16 PID: 0 at fs/eventfd.c:74
eventfd_signal+0x89/0xa0
[83372.203661] Modules linked in: vhost_net vhost tap vhost_iotlb tun vfio_pci
vfio_virqfd vfio_iommu_type1 vfio ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter ip_tables nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nft_counter uinput nft_meta_bridge nf_tables nfnetlink xpad hid_microsoft
ff_memless mlx5_ib ib_uverbs nouveau ib_core edac_mce_amd mlx5_core iwlmvm
kvm_amd mac80211 kvm snd_hda_codec_realtek irqbypass sp5100_tco
snd_hda_codec_generic pcspkr rapl k10temp wmi_bmof i2c_piix4 mlxfw
ledtrig_audio snd_hda_codec_hdmi joydev pci_hyperv_intf snd_hda_intel bridge
snd_intel_dspcfg libarc4 snd_hda_codec btusb btrtl iwlwifi snd_hda_core btbcm
mxm_wmi stp btintel llc video snd_hwdep bluetooth snd_pcm ttm snd_timer
cfg80211 drm_kms_helper veth igb snd ecdh_generic ecc cec i2c_algo_bit
soundcore r8169 rfkill dca acpi_cpufreq vfat fat drm essiv authenc dm_crypt xfs
crct10dif_pclmul ccp crc32_pclmul crc32c_intel nvme
[83372.203687]  ghash_clmulni_intel nvme_core wmi pinctrl_amd
[83372.203702] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.9.1-dirty #49
[83372.203704] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS
MASTER/X570 AORUS MASTER, BIOS F30 09/07/2020
[83372.203707] RIP: 0010:eventfd_signal+0x89/0xa0
[83372.203709] Code: 03 00 00 00 4c 89 f7 e8 b5 3c db ff 65 ff 0d 3e 26 cb 52
4c 89 ee 4c 89 f7 e8 b3 65 7e 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 66 0f 1f 44 00
[83372.203712] RSP: 0018:ffffa2cdc0554f28 EFLAGS: 00010002
[83372.203714] RAX: 0000000000000001 RBX: ffff8ceaec0d4880 RCX:
000000000000001f
[83372.203716] RDX: ffff8ceaed3b7200 RSI: 0000000000000001 RDI:
ffff8ce6eaa58340
[83372.203717] RBP: ffff8ceaed3b7200 R08: 00004bd39a951b00 R09:
0000000000000000
[83372.203719] R10: 0000000000000000 R11: 0000000000000000 R12:
00000000000000d9
[83372.203720] R13: 0000000000000000 R14: ffffa2cdc0554fa4 R15:
0000000000000000
[83372.203722] FS:  0000000000000000(0000) GS:ffff8cf5bee00000(0000)
knlGS:0000000000000000
[83372.203724] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83372.203726] CR2: 00005562c8109c68 CR3: 00000001694fc000 CR4:
0000000000350ee0
[83372.203727] Call Trace:
[83372.203729]  <IRQ>
[83372.203734]  vfio_msihandler+0x12/0x20 [vfio_pci]
[83372.203738]  __handle_irq_event_percpu+0x42/0x180
[83372.203740]  handle_irq_event_percpu+0x21/0x60
[83372.203742]  handle_irq_event+0x36/0x53
[83372.203744]  handle_edge_irq+0x83/0x190
[83372.203747]  asm_call_irq_on_stack+0x12/0x20
[83372.203749]  </IRQ>
[83372.203751]  common_interrupt+0xb5/0x130
[83372.203753]  asm_common_interrupt+0x1e/0x40
[83372.203756] RIP: 0010:cpuidle_enter_state+0xdf/0x390
[83372.203758] Code: e8 76 49 7d ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00
f6 c4 02 0f 85 8f 02 00 00 31 ff e8 28 b3 83 ff fb 66 0f 1f 44 00 00 <45> 85 f6
0f 88 20 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d
[83372.203761] RSP: 0018:ffffa2cdc01c7e88 EFLAGS: 00000246
[83372.203762] RAX: ffff8cf5bee2a300 RBX: ffff8cf3b21d9400 RCX:
000000000000001f
[83372.203764] RDX: 0000000000000000 RSI: 000000002491bed3 RDI:
0000000000000000
[83372.203765] RBP: ffffffffaeb7f180 R08: 00004bd39a95195c R09:
0000000000001206
[83372.203767] R10: 0000000000004852 R11: ffff8cf5bee29124 R12:
00004bd39a95195c
[83372.203768] R13: ffffffffaeb7f268 R14: 0000000000000002 R15:
ffff8cf3b21d9400
[83372.203772]  ? cpuidle_enter_state+0xba/0x390
[83372.203774]  cpuidle_enter+0x29/0x40
[83372.203776]  do_idle+0x1b8/0x240
[83372.203778]  cpu_startup_entry+0x19/0x20
[83372.203780]  start_secondary+0x103/0x130
[83372.203783]  secondary_startup_64+0xb6/0xc0
[83372.203785] ---[ end trace a1dd3ff1b3977640 ]---


And from a non-VFIO, run-of-the-mill Linux guest:

[37342.189129] ------------[ cut here ]------------
[37342.189137] WARNING: CPU: 21 PID: 1137 at fs/eventfd.c:74
eventfd_signal+0x89/0xa0
[37342.189139] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio
vhost_net vhost tap vhost_iotlb tun nouveau ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 mlx5_ib nft_counter nft_meta_bridge nf_tables ib_uverbs uinput
nfnetlink ib_core iwlmvm xpad hid_microsoft ff_memless mac80211 mlx5_core
snd_hda_codec_realtek joydev libarc4 snd_hda_codec_generic snd_hda_codec_hdmi
ledtrig_audio iwlwifi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_hda_codec
kvm_amd bridge btusb snd_hda_core btrtl kvm btbcm video btintel snd_hwdep
bluetooth ttm cfg80211 snd_pcm drm_kms_helper igb stp llc snd_timer irqbypass
veth rapl snd mlxfw pcspkr wmi_bmof mxm_wmi sp5100_tco k10temp i2c_piix4
i2c_algo_bit pci_hyperv_intf ecdh_generic cec r8169 dca soundcore ecc rfkill
drm acpi_cpufreq vfat fat essiv authenc dm_crypt xfs nvme nvme_core
crct10dif_pclmul ccp crc32_pclmul
[37342.189162]  crc32c_intel ghash_clmulni_intel wmi pinctrl_amd
[37342.189174] CPU: 21 PID: 1137 Comm: vhost-1125 Not tainted 5.9.1-dirty #50
[37342.189175] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS
MASTER/X570 AORUS MASTER, BIOS F30 09/07/2020
[37342.189178] RIP: 0010:eventfd_signal+0x89/0xa0
[37342.189180] Code: 03 00 00 00 4c 89 f7 e8 b5 3c db ff 65 ff 0d 3e 26 cb 79
4c 89 ee 4c 89 f7 e8 b3 65 7e 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45
31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e c3 66 0f 1f 44 00
[37342.189182] RSP: 0018:ffffadbcc1233d60 EFLAGS: 00010202
[37342.189183] RAX: 0000000000000001 RBX: ffff986bb47746b8 RCX:
00000000c3160001
[37342.189185] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff986948f6c9c0
[37342.189186] RBP: ffff986bb4770000 R08: 0000000000000001 R09:
0000000000000101
[37342.189187] R10: 000000002b30008a R11: 0000000022308000 R12:
0000000000000001
[37342.189189] R13: ffffadbcc1233e18 R14: ffff98630de8f300 R15:
ffff986bb47746b8
[37342.189190] FS:  0000000000000000(0000) GS:ffff986bbef40000(0000)
knlGS:0000000000000000
[37342.189192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37342.189193] CR2: 00007f77c69ee890 CR3: 0000001029214000 CR4:
0000000000350ee0
[37342.189194] Call Trace:
[37342.189199]  vhost_tx_batch.isra.0+0x7d/0xc0 [vhost_net]
[37342.189202]  handle_tx_copy+0x15c/0x550 [vhost_net]
[37342.189204]  handle_tx+0xa5/0xe0 [vhost_net]
[37342.189207]  vhost_worker+0x8d/0xd0 [vhost]
[37342.189209]  ? vhost_vring_call_reset+0x40/0x40 [vhost]
[37342.189212]  kthread+0xfe/0x140
[37342.189214]  ? kthread_park+0x90/0x90
[37342.189216]  ret_from_fork+0x22/0x30
[37342.189218] ---[ end trace ae48714189db9592 ]---


* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
  2020-09-13 18:39 [Bug 209253] New: Loss of connectivity on guest after important host <-> guest traffic bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-10-22 16:14 ` bugzilla-daemon
@ 2020-10-22 22:26 ` bugzilla-daemon
  2020-10-23  8:40 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2020-10-22 22:26 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #5 from Ian Pilcher (arequipeno@gmail.com) ---
Based on my git bisect, it looks like this commit is triggering the WARNING.

commit c49fa6397b6d29ce10c0ae5b2528bb004a14691f
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Mon Aug 17 11:08:18 2020 -0600

    vfio-pci: Avoid recursive read-lock usage

    [ Upstream commit bc93b9ae0151ae5ad5b8504cdc598428ea99570b ]

    A down_read on memory_lock is held when performing read/write accesses
    to MMIO BAR space, including across the copy_to/from_user() callouts
    which may fault.  If the user buffer for these copies resides in an
    mmap of device MMIO space, the mmap fault handler will acquire a
    recursive read-lock on memory_lock.  Avoid this by reducing the lock
    granularity.  Sequential accesses requiring multiple ioread/iowrite
    cycles are expected to be rare, therefore typical accesses should not
    see additional overhead.

    VGA MMIO accesses are expected to be non-fatal regardless of the PCI
    memory enable bit to allow legacy probing, this behavior remains with
    a comment added.  ioeventfds are now included in memory access testing,
    with writes dropped while memory space is disabled.

    Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on
disabled memory")
    Reported-by: Zhiyi Guo <zhguo@redhat.com>
    Tested-by: Zhiyi Guo <zhguo@redhat.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 drivers/vfio/pci/vfio_pci_private.h |   2 +
 drivers/vfio/pci/vfio_pci_rdwr.c    | 120 ++++++++++++++++++++++++++++--------
 2 files changed, 98 insertions(+), 24 deletions(-)

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bc93b9ae0151ae5ad5b8504cdc598428ea99570b

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-23  8:40 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-23  8:40 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #6 from Justin Gatzen (justin.gatzen@gmail.com) ---
Thanks, I will revert that commit and report back. This commit was backported
to kernel 5.8.4, which would jibe with my (unproven) suspicion that 5.8.2 was
unaffected.

Also, to clarify, the second stack trace in my previous post (Comment #4) could
in fact have been from a VFIO VM. Several VMs were running with vhost/virtio-net,
communicating with each other, and it was not clear which VM triggered it.

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-23 18:23 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-23 18:23 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #7 from Alex Williamson (alex.williamson@redhat.com) ---
Color me suspicious, but there are backtraces from two configurations in the
comments here that have no vfio devices: the original post and Justin's second
trace.  The identified commit can only affect vfio configurations.

All of the backtraces seem to be from triggering this warning:

__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
{
        unsigned long flags;

        /*
         * Deadlock or stack overflow issues can happen if we recurse here
         * through waitqueue wakeup handlers. If the caller users potentially
         * nested waitqueues with custom wakeup handlers, then it should
         * check eventfd_signal_count() before calling this function. If
         * it returns true, the eventfd_signal() call should be deferred to a
         * safe context.
         */
        if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
                return 0;

This cpu-local counter is only incremented while handling the wait queue, with
a spinlock held and IRQs disabled.

It's not obvious to me how the backtraces shown can lead to recursive eventfd
signals.  I've set up a configuration for stress testing, but any detailed
description of a reliable reproducer would be appreciated.

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-23 19:05 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-23 19:05 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #8 from Ian Pilcher (arequipeno@gmail.com) ---
Created attachment 293159
  --> https://bugzilla.kernel.org/attachment.cgi?id=293159&action=edit
Guest XML

Attached is the XML of my VFIO guest.  This is a Windows 10 Pro VM, which I use
to run Topaz Labs Video Enhance AI.

  https://topazlabs.com/video-enhance-ai/

Topaz Labs offers a free 30-day trial.  I have been using it to upscale
DVD-quality video (720x480) by 225% to 1620x1080, using the Artemis-LQ model.
This typically triggers the warning within a few minutes.  (During the bisect I
considered a one-hour run without hitting the warning to be a "success.")

Here is the information about my GPU (and associated audio device):

05:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P1000]
(rev a1) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company Device 11bc
        Physical Slot: 2
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 89
        NUMA node: 0
        Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 383fe0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 383ff0000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at e000 [size=128]
        Expansion ROM at fb000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00a78  Data: 0000
        Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr-
TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
Latency L0s <1us, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, NROPrPrP-,
LTR-
                         10BitTagComp-, 10BitTagReq-, OBFF Via message,
ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported,
EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+,
LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [420 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn-
ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1
Len=024 <?>
        Capabilities: [900 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau

05:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio
Controller (rev a1)
        Subsystem: Hewlett-Packard Company Device 11bc
        Physical Slot: 2
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx+
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin B routed to IRQ 17
        NUMA node: 0
        Region 0: Memory at fb080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
SlotPowerLimit 26.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr-
TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
Latency L0s <1us, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, NROPrPrP-,
LTR-
                         10BitTagComp-, 10BitTagReq-, OBFF Via message,
ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported,
EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn-
ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-23 22:28 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-23 22:28 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #9 from Ian Pilcher (arequipeno@gmail.com) ---
I am testing 5.9.1 with commit c49fa6397b6d29ce10c0ae5b2528bb004a14691f
reverted, and it seems stable (before reverting, I hit the warning within a few
minutes).

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-27 12:51 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-27 12:51 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #10 from Ian Pilcher (arequipeno@gmail.com) ---
(In reply to Alex Williamson from comment #7)
> Color me suspicious, but there are backtraces from two configurations in the
> comments here that have no vfio devices, the original post and Justin's
> second trace.  The identified commit can only affect vfio configurations.
> 
> All of the backtraces seem to be from triggering this warning:
> 
> __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
> {
>         unsigned long flags;
> 
>         /*
>          * Deadlock or stack overflow issues can happen if we recurse here
>          * through waitqueue wakeup handlers. If the caller users potentially
>          * nested waitqueues with custom wakeup handlers, then it should
>          * check eventfd_signal_count() before calling this function. If
>          * it returns true, the eventfd_signal() call should be deferred to a
>          * safe context.
>          */
>         if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
>                 return 0;

It's quite possible that some of the backtraces in this bug have different root
cause(s).  That doesn't change the fact that commit
c49fa6397b6d29ce10c0ae5b2528bb004a14691f does reliably trigger the WARNING for
some of us.

> It's not obvious to me how the backtraces shown can lead to recursive
> eventfd signals.  I've set up a configuration for stress testing, but any
> detailed description of a reliable reproducer would be appreciated.

Is the VM XML and other information sufficient for you to reproduce?  (If not,
I can set up access to my hypervisor.)

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-28 17:12 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-28 17:12 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #11 from Alex Williamson (alex.williamson@redhat.com) ---
I have reproduced the issue.  I don't know the solution yet, but I think I have
a workaround, at least for occurrences like those shown in comment 1.  These
seem to be generated by the GeForce MSI quirk, which uses an ioeventfd to
re-arm the MSI.  Perhaps we haven't unwound out of the previous MMIO write
before another is triggered, but we should be running with IRQs disabled... I'm
not sure.  We can disable this ioeventfd in QEMU, though, at a slight interrupt
latency cost while further debugging.  libvirt XML changes:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
...
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='vfio-pci.x-no-vfio-ioeventfd=on'/>
  </qemu:commandline>
</domain>

IOW, replace the first line to include the schema (otherwise libvirt will
ignore and remove the addition), then add the <qemu:commandline> stanza at the
end of the <domain> definition.  If using the QEMU command line directly, you
can similarly just add "-global vfio-pci.x-no-vfio-ioeventfd=on".
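Concretely, the first-line change is just adding the qemu namespace shown above
(the rest of the domain XML stays unchanged):

```xml
<!-- before -->
<domain type='kvm'>

<!-- after -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
```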

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-28 20:35 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-28 20:35 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #12 from Alex Williamson (alex.williamson@redhat.com) ---
Created attachment 293281
  --> https://bugzilla.kernel.org/attachment.cgi?id=293281&action=edit
Test fix for ioeventfd_write traces

For those experiencing the issue described by Ian in comment 1 and the first
issue from Martin in comment 2, please try this patch if you're able.
vfio_pci_ioeventfd_handler() is called in a spinlock context with interrupts
disabled and tries to acquire a read lock on the memory semaphore to verify
that the device memory is enabled.  The down_read() call can sleep, and should
therefore instead be made from a thread context if there is contention.  TBH,
I've never seen it contended, so in practice the thread is never really used,
but this seems to solve the specific case Ian has identified, and the
regression would have been introduced by the commit noted in comment 5.

This change should be specific to configurations with NVIDIA GPUs assigned. 
I'm curious about the traces with vfio_msihandler as well, if anyone can
provide a reproducer of that it would be appreciated.
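The shape of the fix described above is "try the lock without sleeping; if
contended, push the work to a thread."  A rough userspace analogy in Python
(illustrative only; memory_lock and the handler below are stand-ins, not the
kernel code):

```python
import threading

memory_lock = threading.Lock()   # stand-in for the device memory semaphore
deferred = []                    # stand-in for work queued to a thread

def ioeventfd_handler(do_write):
    """Runs in a context that must not sleep (like the spinlock path)."""
    if memory_lock.acquire(blocking=False):   # try-lock, never block
        try:
            do_write()
        finally:
            memory_lock.release()
        return "inline"
    # Contended: a real driver would hand this to a sleepable thread context.
    deferred.append(do_write)
    return "deferred"

writes = []
r1 = ioeventfd_handler(lambda: writes.append("w1"))   # uncontended -> inline

memory_lock.acquire()                                 # simulate contention
r2 = ioeventfd_handler(lambda: writes.append("w2"))   # contended -> deferred
memory_lock.release()

print(r1, r2, writes, len(deferred))   # inline deferred ['w1'] 1
```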

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-28 20:37 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-28 20:37 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #13 from Alex Williamson (alex.williamson@redhat.com) ---
PS - Don't forget to undo the workaround in comment 11 if applied before
testing the above patch.

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-28 20:52 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-28 20:52 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #14 from Justin Gatzen (justin.gatzen@gmail.com) ---
I have been running 5.9.1 with bc93b9ae0151ae5ad5b8504cdc598428ea99570b
reverted for 5 days now and have not had any recurrences.  I can also confirm
that two of my VMs have GeForce GPUs (GTX 1060, GT 710) passed through, along
with virtio-net in the same VMs.

Going to give the new patch from comment 12 a shot now.  Thanks for looking
into this, Alex!

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-28 23:41 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-28 23:41 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #15 from Ian Pilcher (arequipeno@gmail.com) ---
(In reply to Alex Williamson from comment #12)
> Created attachment 293281 [details]
> Test fix for ioeventfd_write traces

Stable for 75 minutes so far with this patch applied.

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-30 22:13 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-30 22:13 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #18 from Alex Williamson (alex.williamson@redhat.com) ---
Patch posted earlier today:

https://lore.kernel.org/lkml/160407008986.9986.83949368176304529.stgit@gimli.home/

I intend to get this in for v5.10-rc3 next week, and it should get backported
to affected kernels from there.

Justin, I'll add a Tested-by for you as well.  Interesting that you reported
different backtraces, yet this still appears to resolve them.

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-10-30 22:53 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-10-30 22:53 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #19 from Justin Gatzen (justin.gatzen@gmail.com) ---
Agreed, it's less clear for my stack traces.  I also didn't have a good
reproducer other than my regular daily usage and waiting.  If it recurs and
this was a false positive, I'll let you know.

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-11-10 18:35 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-11-10 18:35 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #20 from Ian Pilcher (arequipeno@gmail.com) ---
(In reply to Alex Williamson from comment #18)
> I'll intend to get this in for v5.10-rc3 next week and it should get
> backported to affected kernels from there.

I see that this has gone into Linus's tree, but it looks like you didn't cc
stable@vger.kernel.org.  (I'm not an expert on the process, though, so maybe
that isn't needed anymore?)

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-11-10 18:52 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-11-10 18:52 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #21 from Alex Williamson (alex.williamson@redhat.com) ---
(In reply to Ian Pilcher from comment #20)
> I see that this has gone into Linus's tree, but it looks like you didn't cc
> stable@vger.kernel.org.  (I'm not an expert on the process, though, so maybe
> that isn't needed anymore?)

The Fixes: tag serves the same purpose and is more explicit; it's already in
the 5.9-stable queue.

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-11-10 22:43 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-11-10 22:43 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

--- Comment #22 from Ian Pilcher (arequipeno@gmail.com) ---
(In reply to Alex Williamson from comment #21)
> The Fixes: tag serves the same purpose and is more explicit, it's already in
> the 5.9-stable queue.

Gotcha.  Sorry about the noise.

BTW, any reason not to close this?

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-11-16 11:21 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-11-16 11:21 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Mair O'Nire (maironire@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maironire@gmail.com

--- Comment #23 from Mair O'Nire (maironire@gmail.com) ---
Hi all.
I'm new here; I registered specifically to track this bug.
I recently got a new laptop, an HP Omen 2020 with an AMD Ryzen 4800H and an
NVIDIA 1660 Ti, and while trying to set up a KVM virtualized system I ran into
the same issue.  During my investigation I found Ian's report, and then this
one, too.
The backtraces were very similar to the others posted above, all at
fs/eventfd.c:74, but sometimes with a different origin.  So I decided to give
Alex's patch a try and built a mainline kernel with the patch applied.  My
host kernels were 5.8.6, 5.8.17, and 5.8.18 on Fedora 33, and all of them seem
to be affected.  The guest is Windows 10.
So, I can confirm the patch works: the system became stable, with no crashes
for about a week, neither under regular workload nor while running benchmarks.
Before the fix, guest uptime varied from 2 to 10 minutes before a crash, and I
also had to reboot the host to get the NVIDIA card working in the VM again.
Thanks Alex, Ian, and others; you guys are doing great work, in case you don't
know it already :)

* [Bug 209253] Loss of connectivity on guest after important host <-> guest traffic
@ 2020-12-04 22:49 ` bugzilla-daemon
From: bugzilla-daemon @ 2020-12-04 22:49 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Clément Aubin (aubincleme@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #24 from Clément Aubin (aubincleme@gmail.com) ---
Hi everyone,

I recently tested with kernel 5.9.12, and the issue is indeed fixed on my
side.  Huge thanks to everyone participating in this issue, especially @Alex
and @Ian for the diagnosis and the fix!

I'm closing the issue; don't hesitate to re-open it if needed.
