linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BUG: Virtual machine fails to start on 6.0-rc2
@ 2022-08-25 23:42 Laba, SlawomirX
  2022-08-26  0:08 ` Alex Williamson
  0 siblings, 1 reply; 3+ messages in thread
From: Laba, SlawomirX @ 2022-08-25 23:42 UTC (permalink / raw)
  To: linux-pci, netdev, linux-kernel, regressions, bhelgaas,
	alex.williamson, kvm, linux-mm, songmuchun, mike.kravetz
  Cc: Brandeburg, Jesse, Nguyen, Anthony L

We were testing the changes for our VF devices and noticed an issue when starting VMs with passthrough VFs. We then moved back to mainline kernel and reproduced the issue on 6.0-rc2

We noticed that the startup of the KVM hangs.

Steps to reproduce:
Create a VF from the PF interface.
Configure VM XML with the VF PCI.
Start the KVM.

To isolate the issue we moved back to kernel 5.19 and it was working fine.

Working tag v5.19
Tested failing commit 4c612826bec1

[root@localhost sl]# uname -r
6.0.0-rc2-00159-g4c612826bec1
[root@localhost sl]# echo 1 > /sys/class/net/ens785f3/device/sriov_numvfs
[root@localhost sl]# virsh start rhel_9_0_first
^C
[root@localhost sl]# virsh list --all
 Id   Name             State
-------------------------------
 1    rhel_9_0_first   paused	

Dmesg:
[  +0.042400] iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver
[  +0.000004] Copyright (c) 2013 - 2018 Intel Corporation.
[  +0.000309] iavf 0000:18:19.0: enabling device (0000 -> 0002)
[  +0.073471] iavf 0000:18:19.0: Invalid MAC address 00:00:00:00:00:00, using random
[  +0.000674] iavf 0000:18:19.0: Multiqueue Enabled: Queue pair count = 16
[  +0.000466] iavf 0000:18:19.0: MAC address: 5a:0c:b5:f7:4f:0b
[  +0.000003] iavf 0000:18:19.0: GRO is enabled
[  +0.005941] iavf 0000:18:19.0 ens785f3v0: renamed from eth0
[  +0.179174] IPv6: ADDRCONF(NETDEV_CHANGE): ens785f3v0: link becomes ready
[  +0.000040] iavf 0000:18:19.0 ens785f3v0: NIC Link is Up Speed is 25 Gbps Full Duplex
[ +26.408503] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[  +0.399621] VFIO - User Level meta-driver version: 0.3
[  +0.151579] iavf 0000:18:19.0: Remove device
[  +0.292158] ice 0000:18:00.3 ens785f3: Setting MAC 52:54:00:9f:ea:de on VF 0. VF driver will be reinitialized
[  +0.083676] ice 0000:18:00.3: Clearing port VLAN on VF 0
[  +0.155905] tun: Universal TUN/TAP device driver, 1.6
[  +0.000976] virbr0: port 1(vnet0) entered blocking state
[  +0.000017] virbr0: port 1(vnet0) entered disabled state
[  +0.000052] device vnet0 entered promiscuous mode
[  +0.000244] virbr0: port 1(vnet0) entered blocking state
[  +0.000003] virbr0: port 1(vnet0) entered listening state
[  +2.019924] virbr0: port 1(vnet0) entered learning state
[  +2.047997] virbr0: port 1(vnet0) entered forwarding state
[  +0.000018] virbr0: topology change detected, propagating
[Aug25 19:12] INFO: task khugepaged:507 blocked for more than 122 seconds.
[  +0.000016]       Tainted: G        W I        6.0.0-rc2-00159-g4c612826bec1 #1
[  +0.000010] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000010] task:khugepaged      state:D stack:    0 pid:  507 ppid:     2 flags:0x00004000
[  +0.000004] Call Trace:
[  +0.000001]  <TASK>
[  +0.000003]  __schedule+0x1bc/0x550
[  +0.000006]  ? osq_unlock+0xf/0x90
[  +0.000005]  schedule+0x5d/0xd0
[  +0.000002]  rwsem_down_write_slowpath+0x2c9/0x5e0
[  +0.000004]  ? find_vma+0x64/0x70
[  +0.000004]  collapse_huge_page+0x1f8/0x8a0
[  +0.000004]  ? _raw_spin_unlock+0x14/0x30
[  +0.000002]  ? preempt_count_add+0x70/0xa0
[  +0.000005]  ? _raw_spin_lock_irqsave+0x21/0x30
[  +0.000001]  ? lock_timer_base+0x61/0x80
[  +0.000005]  khugepaged_scan_pmd+0x33d/0x7b0
[  +0.000003]  khugepaged_scan_mm_slot+0x155/0x440
[  +0.000003]  khugepaged+0x189/0x3e0
[  +0.000002]  ? preempt_count_add+0x70/0xa0
[  +0.000002]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
[  +0.000002]  ? khugepaged_scan_mm_slot+0x440/0x440
[  +0.000001]  kthread+0xf0/0x120
[  +0.000003]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000003]  ret_from_fork+0x1f/0x30
[  +0.000006]  </TASK>

On the working version there is a line that enables the interface for the VF which is missing on non-working one:
[  +0.911730] vfio-pci 0000:18:19.0: enabling device (0000 -> 0002)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: BUG: Virtual machine fails to start on 6.0-rc2
  2022-08-25 23:42 BUG: Virtual machine fails to start on 6.0-rc2 Laba, SlawomirX
@ 2022-08-26  0:08 ` Alex Williamson
  2022-08-26 18:41   ` Laba, SlawomirX
  0 siblings, 1 reply; 3+ messages in thread
From: Alex Williamson @ 2022-08-26  0:08 UTC (permalink / raw)
  To: Laba, SlawomirX
  Cc: linux-pci, netdev, linux-kernel, regressions, bhelgaas, kvm,
	linux-mm, songmuchun, mike.kravetz, Brandeburg, Jesse, Nguyen,
	Anthony L, Andrew Morton

On Thu, 25 Aug 2022 23:42:47 +0000
"Laba, SlawomirX" <slawomirx.laba@intel.com> wrote:

> We were testing the changes for our VF devices and noticed an issue
> when starting VMs with passthrough VFs. We then moved back to
> mainline kernel and reproduced the issue on 6.0-rc2
> 
> We noticed that the startup of the KVM hangs.
> 
> Steps to reproduce:
> Create a VF from the PF interface.
> Configure VM XML with the VF PCI.
> Start the KVM.
> 
> To isolate the issue we moved back to kernel 5.19 and it was working
> fine.
> 
> Working tag v5.19
> Tested failing commit 4c612826bec1

Does this resolve the issue?

https://lore.kernel.org/all/166015037385.760108.16881097713975517242.stgit@omen/

This is currently in Andrew's mm-hotfixes-unstable branch and we're
waiting for it to be merged to mainline.  Thanks,

Alex
 
> [root@localhost sl]# uname -r
> 6.0.0-rc2-00159-g4c612826bec1
> [root@localhost sl]# echo 1 >
> /sys/class/net/ens785f3/device/sriov_numvfs [root@localhost sl]#
> virsh start rhel_9_0_first ^C
> [root@localhost sl]# virsh list --all
>  Id   Name             State
> -------------------------------
>  1    rhel_9_0_first   paused	
> 
> Dmesg:
> [  +0.042400] iavf: Intel(R) Ethernet Adaptive Virtual Function
> Network Driver [  +0.000004] Copyright (c) 2013 - 2018 Intel
> Corporation. [  +0.000309] iavf 0000:18:19.0: enabling device (0000
> -> 0002) [  +0.073471] iavf 0000:18:19.0: Invalid MAC address
> 00:00:00:00:00:00, using random [  +0.000674] iavf 0000:18:19.0:
> Multiqueue Enabled: Queue pair count = 16 [  +0.000466] iavf
> 0000:18:19.0: MAC address: 5a:0c:b5:f7:4f:0b [  +0.000003] iavf
> 0000:18:19.0: GRO is enabled [  +0.005941] iavf 0000:18:19.0
> ens785f3v0: renamed from eth0 [  +0.179174] IPv6:
> ADDRCONF(NETDEV_CHANGE): ens785f3v0: link becomes ready [  +0.000040]
> iavf 0000:18:19.0 ens785f3v0: NIC Link is Up Speed is 25 Gbps Full
> Duplex [ +26.408503] bridge: filtering via arp/ip/ip6tables is no
> longer available by default. Update your scripts to load br_netfilter
> if you need this. [  +0.399621] VFIO - User Level meta-driver
> version: 0.3 [  +0.151579] iavf 0000:18:19.0: Remove device [
> +0.292158] ice 0000:18:00.3 ens785f3: Setting MAC 52:54:00:9f:ea:de
> on VF 0. VF driver will be reinitialized [  +0.083676] ice
> 0000:18:00.3: Clearing port VLAN on VF 0 [  +0.155905] tun: Universal
> TUN/TAP device driver, 1.6 [  +0.000976] virbr0: port 1(vnet0)
> entered blocking state [  +0.000017] virbr0: port 1(vnet0) entered
> disabled state [  +0.000052] device vnet0 entered promiscuous mode [
> +0.000244] virbr0: port 1(vnet0) entered blocking state [  +0.000003]
> virbr0: port 1(vnet0) entered listening state [  +2.019924] virbr0:
> port 1(vnet0) entered learning state [  +2.047997] virbr0: port
> 1(vnet0) entered forwarding state [  +0.000018] virbr0: topology
> change detected, propagating [Aug25 19:12] INFO: task khugepaged:507
> blocked for more than 122 seconds. [  +0.000016]       Tainted: G
>    W I        6.0.0-rc2-00159-g4c612826bec1 #1 [  +0.000010] "echo 0
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [
> > +0.000010] task:khugepaged      state:D stack:    0 pid:  507 ppid:
> >     2 flags:0x00004000 [  +0.000004] Call Trace:
> [  +0.000001]  <TASK>
> [  +0.000003]  __schedule+0x1bc/0x550
> [  +0.000006]  ? osq_unlock+0xf/0x90
> [  +0.000005]  schedule+0x5d/0xd0
> [  +0.000002]  rwsem_down_write_slowpath+0x2c9/0x5e0
> [  +0.000004]  ? find_vma+0x64/0x70
> [  +0.000004]  collapse_huge_page+0x1f8/0x8a0
> [  +0.000004]  ? _raw_spin_unlock+0x14/0x30
> [  +0.000002]  ? preempt_count_add+0x70/0xa0
> [  +0.000005]  ? _raw_spin_lock_irqsave+0x21/0x30
> [  +0.000001]  ? lock_timer_base+0x61/0x80
> [  +0.000005]  khugepaged_scan_pmd+0x33d/0x7b0
> [  +0.000003]  khugepaged_scan_mm_slot+0x155/0x440
> [  +0.000003]  khugepaged+0x189/0x3e0
> [  +0.000002]  ? preempt_count_add+0x70/0xa0
> [  +0.000002]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
> [  +0.000002]  ? khugepaged_scan_mm_slot+0x440/0x440
> [  +0.000001]  kthread+0xf0/0x120
> [  +0.000003]  ? kthread_complete_and_exit+0x20/0x20
> [  +0.000003]  ret_from_fork+0x1f/0x30
> [  +0.000006]  </TASK>
> 
> On the working version there is a line that enables the interface for
> the VF which is missing on non-working one: [  +0.911730] vfio-pci
> 0000:18:19.0: enabling device (0000 -> 0002)
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: BUG: Virtual machine fails to start on 6.0-rc2
  2022-08-26  0:08 ` Alex Williamson
@ 2022-08-26 18:41   ` Laba, SlawomirX
  0 siblings, 0 replies; 3+ messages in thread
From: Laba, SlawomirX @ 2022-08-26 18:41 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-pci, netdev, linux-kernel, regressions, bhelgaas, kvm,
	linux-mm, songmuchun, mike.kravetz, Brandeburg, Jesse, Nguyen,
	Anthony L, Andrew Morton

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, August 26, 2022 2:08 AM
> To: Laba, SlawomirX <slawomirx.laba@intel.com>
> Cc: linux-pci@vger.kernel.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; regressions@lists.linux.dev; bhelgaas@google.com;
> kvm@vger.kernel.org; linux-mm@kvack.org; songmuchun@bytedance.com;
> mike.kravetz@oracle.com; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Andrew Morton <akpm@linux-
> foundation.org>
> Subject: Re: BUG: Virtual machine fails to start on 6.0-rc2
> 
> On Thu, 25 Aug 2022 23:42:47 +0000
> "Laba, SlawomirX" <slawomirx.laba@intel.com> wrote:
> 
> > We were testing the changes for our VF devices and noticed an issue
> > when starting VMs with passthrough VFs. We then moved back to mainline
> > kernel and reproduced the issue on 6.0-rc2
> >
> > We noticed that the startup of the KVM hangs.
> >
> > Steps to reproduce:
> > Create a VF from the PF interface.
> > Configure VM XML with the VF PCI.
> > Start the KVM.
> >
> > To isolate the issue we moved back to kernel 5.19 and it was working
> > fine.
> >
> > Working tag v5.19
> > Tested failing commit 4c612826bec1
> 
> Does this resolve the issue?
> 
> https://lore.kernel.org/all/166015037385.760108.16881097713975517242.stgit
> @omen/
> 
> This is currently in Andrew's mm-hotfixes-unstable branch and we're waiting
> for it to be merged to mainline.  Thanks,
> 
> Alex

With the patch you linked, the issue is gone.

Thanks,
Slawek

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-08-26 18:46 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 23:42 BUG: Virtual machine fails to start on 6.0-rc2 Laba, SlawomirX
2022-08-26  0:08 ` Alex Williamson
2022-08-26 18:41   ` Laba, SlawomirX

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).