* Write to srvio_numvfs triggers kernel panic
@ 2022-05-04 19:56 Volodymyr Babchuk
  2022-05-06 20:17 ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread

From: Volodymyr Babchuk @ 2022-05-04 19:56 UTC (permalink / raw)
To: linux-pci

Hello,

I have encountered an issue when the PCI code tries to use both fields in

	union {
		struct pci_sriov *sriov;	/* PF: SR-IOV info */
		struct pci_dev *physfn;		/* VF: related PF */
	};

(which are part of struct pci_dev) at the same time.

Symptoms are the following:

# echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs

pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit]
pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs)
Unable to handle kernel paging request at virtual address 0001000200000010
Mem abort info:
  ESR = 0x96000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
[0001000200000010] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
 nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24
 pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6
CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233
Hardware name: XENVM-4.17 (DT)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : pcie_aspm_get_link+0x90/0xcc
lr : pcie_aspm_get_link+0x8c/0xcc
sp : ffff8000130d39c0
x29: ffff8000130d39c0 x28: 00000000000001a4
x27: 00000000ffffee4b x26: ffff80001164f560
x25: 0000000000000000 x24: 0000000000000000
x23: ffff80001164f660 x22: 0000000000000000
x23: ffff80001164f660 x22: 0000000000000000
x21: ffff000003f08000 x20: ffff800010db37d8
x19: ffff000004b8e780 x18: ffffffffffffffff
x17: 0000000000000000 x16: 00000000deadbeef
x15: ffff8000930d36c7 x14: 0000000000000006
x13: ffff8000115c2710 x12: 000000000000081c
x11: 00000000000002b4 x10: ffff8000115c2710
x9 : ffff8000115c2710 x8 : 00000000ffffefff
x7 : ffff80001161a710 x6 : ffff80001161a710
x5 : ffff00003fdad900 x4 : 0000000000000000
x3 : 0000000000000000 x2 : 0000000000000000
x1 : ffff000003c51c80 x0 : 0001000200000000
Call trace:
 pcie_aspm_get_link+0x90/0xcc
 aspm_ctrl_attrs_are_visible+0x30/0xc0
 internal_create_group+0xd0/0x3cc
 internal_create_groups.part.0+0x4c/0xc0
 sysfs_create_groups+0x20/0x34
 device_add+0x2b4/0x760
 pci_device_add+0x814/0x854
 pci_iov_add_virtfn+0x240/0x2f0
 sriov_enable+0x1f8/0x474
 pci_sriov_configure_simple+0x38/0x90
 sriov_numvfs_store+0xa4/0x1a0
 dev_attr_store+0x1c/0x30
 sysfs_kf_write+0x48/0x60
 kernfs_fop_write_iter+0x118/0x1ac
 new_sync_write+0xe8/0x184
 vfs_write+0x23c/0x2a0
 ksys_write+0x68/0xf4
 __arm64_sys_write+0x20/0x2c
 el0_svc_common.constprop.0+0x78/0x1a0
 do_el0_svc+0x28/0x94
 el0_svc+0x14/0x20
 el0_sync_handler+0xa4/0x130
 el0_sync+0x180/0x1c0
Code: d0002120 9133e000 97ffef8e f9400a60 (f9400813)

Debugging showed the following:

pci_iov_add_virtfn() allocates a new struct pci_dev:

	virtfn = pci_alloc_dev(bus);

and sets physfn:

	virtfn->is_virtfn = 1;
	virtfn->physfn = pci_dev_get(dev);

then we get into sriov_init() via the following call path:

pci_device_add(virtfn, virtfn->bus);
  pci_init_capabilities(dev);
    pci_iov_init(dev);
      sriov_init(dev, pos);

sriov_init() overwrites the value in the union:

	dev->sriov = iov;	<<<<<---- There
	dev->is_physfn = 1;

Next, we get into the function that causes the panic:

pci_device_add(virtfn, virtfn->bus);
  device_add(&dev->dev)
    sysfs_create_groups()
      internal_create_group()
        aspm_ctrl_attrs_are_visible()
          pcie_aspm_get_link()
            pci_upstream_bridge()
              pci_physfn()

which is

static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
{
#ifdef CONFIG_PCI_IOV
	if (dev->is_virtfn)
		dev = dev->physfn;
#endif
	return dev;
}

As is_virtfn == 1 and dev->physfn was overwritten by the write to dev->sriov, pci_physfn() will return a pointer to the struct pci_sriov allocated by sriov_init().
And then pci_upstream_bridge() will cause a panic by accessing it as if it were a pointer to struct pci_dev.

I encountered this issue on ARM64, Linux 5.10.41. I tried to test on the master branch, but it is quite difficult to rebase the platform code onto master. However, I compared all relevant parts of the PCI code and didn't find any differences.

It looks like I am missing something, because how can SR-IOV be so broken? But what exactly?

--
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-04 19:56 Write to srvio_numvfs triggers kernel panic Volodymyr Babchuk @ 2022-05-06 20:17 ` Bjorn Helgaas 2022-05-07 1:34 ` Jason Gunthorpe 2022-05-07 10:22 ` Volodymyr Babchuk 0 siblings, 2 replies; 14+ messages in thread From: Bjorn Helgaas @ 2022-05-06 20:17 UTC (permalink / raw) To: Volodymyr Babchuk Cc: linux-pci, Alex Williamson, Leon Romanovsky, Jason Gunthorpe [+cc Alex, Leon, Jason] On Wed, May 04, 2022 at 07:56:01PM +0000, Volodymyr Babchuk wrote: > > Hello, > > I have encountered issue when PCI code tries to use both fields in > > union { > struct pci_sriov *sriov; /* PF: SR-IOV info */ > struct pci_dev *physfn; /* VF: related PF */ > }; > > (which are part of struct pci_dev) at the same time. > > Symptoms are following: > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs > > pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit] > pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs) > Unable to handle kernel paging request at virtual address 0001000200000010 > Mem abort info: > ESR = 0x96000004 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > Data abort info: > ISV = 0, ISS = 0x00000004 > CM = 0, WnR = 0 > [0001000200000010] address between user and kernel address ranges > Internal error: Oops: 96000004 [#1] PREEMPT SMP > Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24 > pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6 > CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233 > Hardware name: XENVM-4.17 (DT) > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) > pc : pcie_aspm_get_link+0x90/0xcc > lr : pcie_aspm_get_link+0x8c/0xcc > sp : ffff8000130d39c0 > x29: ffff8000130d39c0 x28: 00000000000001a4 > x27: 00000000ffffee4b x26: ffff80001164f560 > x25: 0000000000000000 x24: 
0000000000000000 > x23: ffff80001164f660 x22: 0000000000000000 > x23: ffff80001164f660 x22: 0000000000000000 > x21: ffff000003f08000 x20: ffff800010db37d8 > x19: ffff000004b8e780 x18: ffffffffffffffff > x17: 0000000000000000 x16: 00000000deadbeef > x15: ffff8000930d36c7 x14: 0000000000000006 > x13: ffff8000115c2710 x12: 000000000000081c > x11: 00000000000002b4 x10: ffff8000115c2710 > x9 : ffff8000115c2710 x8 : 00000000ffffefff > x7 : ffff80001161a710 x6 : ffff80001161a710 > x5 : ffff00003fdad900 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : ffff000003c51c80 x0 : 0001000200000000 > Call trace: > pcie_aspm_get_link+0x90/0xcc > aspm_ctrl_attrs_are_visible+0x30/0xc0 > internal_create_group+0xd0/0x3cc > internal_create_groups.part.0+0x4c/0xc0 > sysfs_create_groups+0x20/0x34 > device_add+0x2b4/0x760 > pci_device_add+0x814/0x854 > pci_iov_add_virtfn+0x240/0x2f0 > sriov_enable+0x1f8/0x474 > pci_sriov_configure_simple+0x38/0x90 > sriov_numvfs_store+0xa4/0x1a0 > dev_attr_store+0x1c/0x30 > sysfs_kf_write+0x48/0x60 > kernfs_fop_write_iter+0x118/0x1ac > new_sync_write+0xe8/0x184 > vfs_write+0x23c/0x2a0 > ksys_write+0x68/0xf4 > __arm64_sys_write+0x20/0x2c > el0_svc_common.constprop.0+0x78/0x1a0 > do_el0_svc+0x28/0x94 > el0_svc+0x14/0x20 > el0_sync_handler+0xa4/0x130 > el0_sync+0x180/0x1c0 > Code: d0002120 9133e000 97ffef8e f9400a60 (f9400813) > > > Debugging showed the following: > > pci_iov_add_virtfn() allocates new struct pci_dev: > > virtfn = pci_alloc_dev(bus); > and sets physfn: > virtfn->is_virtfn = 1; > virtfn->physfn = pci_dev_get(dev); > > then we will get into sriov_init() via the following call path: > > pci_device_add(virtfn, virtfn->bus); > pci_init_capabilities(dev); > pci_iov_init(dev); > sriov_init(dev, pos); We called pci_device_add() with the VF. 
pci_iov_init() only calls sriov_init() if it finds an SR-IOV capability on the device:

  pci_iov_init(struct pci_dev *dev)
    pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
    if (pos)
      return sriov_init(dev, pos);

So this means the VF must have an SR-IOV capability, which sounds a little dubious. From PCIe r6.0:

  9.3.3 SR-IOV Extended Capability

  The SR-IOV Extended Capability defined here is a PCIe extended
  capability that must be implemented in each PF that supports SR-IOV.
  This Capability is used to describe and control a PF's SR-IOV
  Capabilities.

  For Multi-Function Devices, each PF that supports SR-IOV shall
  provide the Capability structure defined in this section. This
  Capability structure may be present in any Function with a Type 0
  Configuration Space Header. This Capability must not be present in
  Functions with a Type 1 Configuration Space Header.

Can you supply the output of "sudo lspci -vv" for your system? It could be that the device has an SR-IOV capability when it shouldn't. But even if it does, Linux could tolerate that better than it does today.

> sriov_init() overwrites value in the union:
> 	dev->sriov = iov;	<<<<<---- There
> 	dev->is_physfn = 1;
>
> Next, we will get into function that causes panic:
>
> pci_device_add(virtfn, virtfn->bus);
>   device_add(&dev->dev)
>     sysfs_create_groups()
>       internal_create_group()
>         aspm_ctrl_attrs_are_visible()
>           pcie_aspm_get_link()
>             pci_upstream_bridge()
>               pci_physfn()
>
> which is
>
> static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> {
> #ifdef CONFIG_PCI_IOV
> 	if (dev->is_virtfn)
> 		dev = dev->physfn;
> #endif
> 	return dev;
> }
>
> as is_virtfn == 1 and dev->physfn was overwritten via write to
> dev->sriov, pci_physfn() will return pointer to struct pci_sriov
> allocated by sriov_init(). And then pci_upstream_bridge() will
> cause panic by acessing it as if it were pointer to struct pci_dev
>
> I encountered this issue on ARM64, Linux 5.10.41.
Tried to test > on master branch, but it is quite difficult to rebase platform > code on the master. But I compared all relevant parts of PCI code > and didn't found any differences. > > Looks like I am missing following, because how SR-IOV can be so broken? > But what exactly? > > -- > Volodymyr Babchuk at EPAM ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic
  2022-05-06 20:17 ` Bjorn Helgaas
@ 2022-05-07  1:34   ` Jason Gunthorpe
  2022-05-07 10:25     ` Volodymyr Babchuk
  2022-05-07 10:22   ` Volodymyr Babchuk
  1 sibling, 1 reply; 14+ messages in thread

From: Jason Gunthorpe @ 2022-05-07 1:34 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Volodymyr Babchuk, linux-pci, Alex Williamson, Leon Romanovsky

On Fri, May 06, 2022 at 03:17:22PM -0500, Bjorn Helgaas wrote:

> > Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> > nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24
> > pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6
> > CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233
> > Hardware name: XENVM-4.17 (DT)
                   ^^^^^^^^^^^^^^^^^
> So this means the VF must have an SR-IOV capability, which sounds a
> little dubious. From PCIe r6.0:

Enabling SRIOV from within a VM is "exciting" - I would not be
surprised if there were some wonky bugs here.

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-07 1:34 ` Jason Gunthorpe @ 2022-05-07 10:25 ` Volodymyr Babchuk 2022-05-08 11:19 ` Leon Romanovsky 0 siblings, 1 reply; 14+ messages in thread From: Volodymyr Babchuk @ 2022-05-07 10:25 UTC (permalink / raw) To: Jason Gunthorpe Cc: Bjorn Helgaas, linux-pci, Alex Williamson, Leon Romanovsky Jason Gunthorpe <jgg@ziepe.ca> writes: > On Fri, May 06, 2022 at 03:17:22PM -0500, Bjorn Helgaas wrote: > >> > Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 >> > nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24 >> > pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6 >> > CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233 >> > Hardware name: XENVM-4.17 (DT) > ^^^^^^^^^^^^^^^^^ > >> So this means the VF must have an SR-IOV capability, which sounds a >> little dubious. From PCIe r6.0: > > Enabling SRIOV from within a VM is "exciting" - I would not be > surprised if there was some wonky bugs here. Well, yes. But in this case, this VM has direct access to the PCIe controller. So it should not cause any troubles. I'll try baremetal setup, though. -- Volodymyr Babchuk at EPAM ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-07 10:25 ` Volodymyr Babchuk @ 2022-05-08 11:19 ` Leon Romanovsky 2022-05-09 18:22 ` Keith Busch 0 siblings, 1 reply; 14+ messages in thread From: Leon Romanovsky @ 2022-05-08 11:19 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, Alex Williamson On Sat, May 07, 2022 at 10:25:16AM +0000, Volodymyr Babchuk wrote: > > > Jason Gunthorpe <jgg@ziepe.ca> writes: > > > On Fri, May 06, 2022 at 03:17:22PM -0500, Bjorn Helgaas wrote: > > > >> > Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 > >> > nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24 > >> > pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6 > >> > CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233 > >> > Hardware name: XENVM-4.17 (DT) > > ^^^^^^^^^^^^^^^^^ > > > >> So this means the VF must have an SR-IOV capability, which sounds a > >> little dubious. From PCIe r6.0: > > > > Enabling SRIOV from within a VM is "exciting" - I would not be > > surprised if there was some wonky bugs here. > > Well, yes. But in this case, this VM has direct access to the PCIe > controller. So it should not cause any troubles. I'll try baremetal > setup, though. If I remember correctly, the VM software that runs on hypervisor needs to filter PCI capabilities even for pass-through devices. The vfio kernel module did this for QEMU, so something similar I would expect from XEN too. Thanks > > > -- > Volodymyr Babchuk at EPAM ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-08 11:19 ` Leon Romanovsky @ 2022-05-09 18:22 ` Keith Busch 0 siblings, 0 replies; 14+ messages in thread From: Keith Busch @ 2022-05-09 18:22 UTC (permalink / raw) To: Leon Romanovsky Cc: Volodymyr Babchuk, Jason Gunthorpe, Bjorn Helgaas, linux-pci, Alex Williamson On Sun, May 08, 2022 at 02:19:56PM +0300, Leon Romanovsky wrote: > On Sat, May 07, 2022 at 10:25:16AM +0000, Volodymyr Babchuk wrote: > > Jason Gunthorpe <jgg@ziepe.ca> writes: > > > On Fri, May 06, 2022 at 03:17:22PM -0500, Bjorn Helgaas wrote: > > > > > >> > Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 > > >> > nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24 > > >> > pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6 > > >> > CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233 > > >> > Hardware name: XENVM-4.17 (DT) > > > ^^^^^^^^^^^^^^^^^ > > > > > >> So this means the VF must have an SR-IOV capability, which sounds a > > >> little dubious. From PCIe r6.0: > > > > > > Enabling SRIOV from within a VM is "exciting" - I would not be > > > surprised if there was some wonky bugs here. > > > > Well, yes. But in this case, this VM has direct access to the PCIe > > controller. So it should not cause any troubles. I'll try baremetal > > setup, though. > > If I remember correctly, the VM software that runs on hypervisor needs > to filter PCI capabilities even for pass-through devices. > > The vfio kernel module did this for QEMU, so something similar I would > expect from XEN too. AIUI, some kind of translation from GPA to HPA needs to happen, but I'm not sure how that would occur for the VF's enabled within the guest when the host doesn't know about those functions. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-06 20:17 ` Bjorn Helgaas 2022-05-07 1:34 ` Jason Gunthorpe @ 2022-05-07 10:22 ` Volodymyr Babchuk 2022-05-07 15:41 ` Bjorn Helgaas 1 sibling, 1 reply; 14+ messages in thread From: Volodymyr Babchuk @ 2022-05-07 10:22 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-pci, Alex Williamson, Leon Romanovsky, Jason Gunthorpe Hello Bjorn, Bjorn Helgaas <helgaas@kernel.org> writes: > [+cc Alex, Leon, Jason] > > On Wed, May 04, 2022 at 07:56:01PM +0000, Volodymyr Babchuk wrote: >> >> Hello, >> >> I have encountered issue when PCI code tries to use both fields in >> >> union { >> struct pci_sriov *sriov; /* PF: SR-IOV info */ >> struct pci_dev *physfn; /* VF: related PF */ >> }; >> >> (which are part of struct pci_dev) at the same time. >> >> Symptoms are following: >> >> # echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs >> >> pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit] >> pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs) >> Unable to handle kernel paging request at virtual address 0001000200000010 >> Mem abort info: >> ESR = 0x96000004 >> EC = 0x25: DABT (current EL), IL = 32 bits >> SET = 0, FnV = 0 >> EA = 0, S1PTW = 0 >> Data abort info: >> ISV = 0, ISS = 0x00000004 >> CM = 0, WnR = 0 >> [0001000200000010] address between user and kernel address ranges >> Internal error: Oops: 96000004 [#1] PREEMPT SMP >> Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 >> nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24 >> pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6 >> CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233 >> Hardware name: XENVM-4.17 (DT) >> pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) >> pc : pcie_aspm_get_link+0x90/0xcc >> lr : pcie_aspm_get_link+0x8c/0xcc >> sp : ffff8000130d39c0 >> x29: ffff8000130d39c0 x28: 00000000000001a4 >> x27: 00000000ffffee4b x26: 
ffff80001164f560 >> x25: 0000000000000000 x24: 0000000000000000 >> x23: ffff80001164f660 x22: 0000000000000000 >> x23: ffff80001164f660 x22: 0000000000000000 >> x21: ffff000003f08000 x20: ffff800010db37d8 >> x19: ffff000004b8e780 x18: ffffffffffffffff >> x17: 0000000000000000 x16: 00000000deadbeef >> x15: ffff8000930d36c7 x14: 0000000000000006 >> x13: ffff8000115c2710 x12: 000000000000081c >> x11: 00000000000002b4 x10: ffff8000115c2710 >> x9 : ffff8000115c2710 x8 : 00000000ffffefff >> x7 : ffff80001161a710 x6 : ffff80001161a710 >> x5 : ffff00003fdad900 x4 : 0000000000000000 >> x3 : 0000000000000000 x2 : 0000000000000000 >> x1 : ffff000003c51c80 x0 : 0001000200000000 >> Call trace: >> pcie_aspm_get_link+0x90/0xcc >> aspm_ctrl_attrs_are_visible+0x30/0xc0 >> internal_create_group+0xd0/0x3cc >> internal_create_groups.part.0+0x4c/0xc0 >> sysfs_create_groups+0x20/0x34 >> device_add+0x2b4/0x760 >> pci_device_add+0x814/0x854 >> pci_iov_add_virtfn+0x240/0x2f0 >> sriov_enable+0x1f8/0x474 >> pci_sriov_configure_simple+0x38/0x90 >> sriov_numvfs_store+0xa4/0x1a0 >> dev_attr_store+0x1c/0x30 >> sysfs_kf_write+0x48/0x60 >> kernfs_fop_write_iter+0x118/0x1ac >> new_sync_write+0xe8/0x184 >> vfs_write+0x23c/0x2a0 >> ksys_write+0x68/0xf4 >> __arm64_sys_write+0x20/0x2c >> el0_svc_common.constprop.0+0x78/0x1a0 >> do_el0_svc+0x28/0x94 >> el0_svc+0x14/0x20 >> el0_sync_handler+0xa4/0x130 >> el0_sync+0x180/0x1c0 >> Code: d0002120 9133e000 97ffef8e f9400a60 (f9400813) >> >> >> Debugging showed the following: >> >> pci_iov_add_virtfn() allocates new struct pci_dev: >> >> virtfn = pci_alloc_dev(bus); >> and sets physfn: >> virtfn->is_virtfn = 1; >> virtfn->physfn = pci_dev_get(dev); >> >> then we will get into sriov_init() via the following call path: >> >> pci_device_add(virtfn, virtfn->bus); >> pci_init_capabilities(dev); >> pci_iov_init(dev); >> sriov_init(dev, pos); > > We called pci_device_add() with the VF. 
pci_iov_init() only calls > sriov_init() if it finds an SR-IOV capability on the device: > > pci_iov_init(struct pci_dev *dev) > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV); > if (pos) > return sriov_init(dev, pos); > > So this means the VF must have an SR-IOV capability, which sounds a > little dubious. From PCIe r6.0: [...] Yes, I dived into debugging and came to the same conclusions. I'm still investigating this, but looks like my PCIe controller (DesignWare-based) incorrectly reads configuration space for VF. Looks like instead of providing access VF config space, it reads PF's one. > > Can you supply the output of "sudo lspci -vv" for your system? Sure: root@spider:~# lspci -vv 00:00.0 PCI bridge: Renesas Technology Corp. Device 0031 (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 189 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: [disabled] Memory behind bridge: 30000000-301fffff [size=2M] Prefetchable memory behind bridge: [disabled] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable+ Count=128/128 Maskable+ 64bit+ Address: 0000000004030040 Data: 0000 Masking: fffffffe Pending: 00000000 Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0 ExtTag+ RBE+ DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 
bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (ok), Width x2 (ok) TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt- RootCap: CRSVisible- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Not Supported, TimeoutDis+, NROPrPrP+, LTR+ 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd- AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd- AtomicOpsCtl: ReqEn- EgressBlck- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn- MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 RootCmd: CERptEn- NFERptEn- FERptEn- 
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd- FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0 ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000 Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00 Capabilities: [158 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn-, PerformEqu- LaneErrStat: 0 Capabilities: [178 v1] Physical Layer 16.0 GT/s <?> Capabilities: [19c v1] Lane Margining at the Receiver <?> Capabilities: [1bc v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=14us L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ T_CommonMode=0us LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us Capabilities: [1cc v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?> Capabilities: [2cc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?> Capabilities: [304 v1] Data Link Feature <?> Capabilities: [310 v1] Precision Time Measurement PTMCap: Requester:+ Responder:+ Root:+ PTMClockGranularity: 16ns PTMControl: Enabled:- RootSelected:- PTMEffectiveGranularity: Unknown Capabilities: [31c v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?> Kernel driver in use: pcieport Kernel modules: pci_endpoint_test 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express]) Subsystem: Samsung Electronics Co Ltd Device a809 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 0 NUMA node: 0 Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K] Expansion ROM at 30000000 [virtual] [disabled] [size=64K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [70] Express (v2) Endpoint, MSI 
00 [8/5710] DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR- 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS-, TPHComp-, ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn- LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [b0] MSI-X: Enable+ Count=64 Masked- Vector table: BAR=0 offset=00004000 PBA: BAR=0 offset=00003000 Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First 
Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00 Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Capabilities: [178 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn-, PerformEqu- LaneErrStat: 0 Capabilities: [198 v1] Physical Layer 16.0 GT/s <?> Capabilities: [1c0 v1] Lane Margining at the Receiver <?> Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration-, Interrupt Message Number: 000 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- IOVSta: Migration- Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00 VF offset: 2, stride: 1, Device ID: a824 Supported Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable) VF Migration: offset: 00000000, BIR: 0 Capabilities: [3a4 v1] Data Link Feature <?> Kernel driver in use: nvme Kernel modules: nvme > It could be that the device has an SR-IOV capability when it > shouldn't. But even if it does, Linux could tolerate that better > than it does today. > Agree there. I can create simple patch that checks for is_virtfn in sriov_init(). But what to do if it is set? >> sriov_init() overwrites value in the union: >> dev->sriov = iov; <<<<<---- There >> dev->is_physfn = 1; >> -- Volodymyr Babchuk at EPAM ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-07 10:22 ` Volodymyr Babchuk @ 2022-05-07 15:41 ` Bjorn Helgaas 2022-05-08 11:07 ` Volodymyr Babchuk 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2022-05-07 15:41 UTC (permalink / raw) To: Volodymyr Babchuk Cc: linux-pci, Alex Williamson, Leon Romanovsky, Jason Gunthorpe On Sat, May 07, 2022 at 10:22:32AM +0000, Volodymyr Babchuk wrote: > Bjorn Helgaas <helgaas@kernel.org> writes: > > On Wed, May 04, 2022 at 07:56:01PM +0000, Volodymyr Babchuk wrote: > >> > >> I have encountered issue when PCI code tries to use both fields in > >> > >> union { > >> struct pci_sriov *sriov; /* PF: SR-IOV info */ > >> struct pci_dev *physfn; /* VF: related PF */ > >> }; > >> > >> (which are part of struct pci_dev) at the same time. > >> > >> Symptoms are following: > >> > >> # echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs > >> > >> pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit] > >> pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs) > >> Unable to handle kernel paging request at virtual address 0001000200000010 > >> Debugging showed the following: > >> > >> pci_iov_add_virtfn() allocates new struct pci_dev: > >> > >> virtfn = pci_alloc_dev(bus); > >> and sets physfn: > >> virtfn->is_virtfn = 1; > >> virtfn->physfn = pci_dev_get(dev); > >> > >> then we will get into sriov_init() via the following call path: > >> > >> pci_device_add(virtfn, virtfn->bus); > >> pci_init_capabilities(dev); > >> pci_iov_init(dev); > >> sriov_init(dev, pos); > > > > We called pci_device_add() with the VF. pci_iov_init() only calls > > sriov_init() if it finds an SR-IOV capability on the device: > > > > pci_iov_init(struct pci_dev *dev) > > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV); > > if (pos) > > return sriov_init(dev, pos); > > > > So this means the VF must have an SR-IOV capability, which sounds a > > little dubious. 
From PCIe r6.0: > > [...] > > Yes, I dived into debugging and came to the same conclusions. I'm still > investigating this, but looks like my PCIe controller (DesignWare-based) > incorrectly reads configuration space for VF. Looks like instead of > providing access VF config space, it reads PF's one. > > > Can you supply the output of "sudo lspci -vv" for your system? > > Sure: > > root@spider:~# lspci -vv > 00:00.0 PCI bridge: Renesas Technology Corp. Device 0031 (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin A routed to IRQ 189 > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 > I/O behind bridge: [disabled] > Memory behind bridge: 30000000-301fffff [size=2M] > Prefetchable memory behind bridge: [disabled] > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- > BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [50] MSI: Enable+ Count=128/128 Maskable+ 64bit+ > Address: 0000000004030040 Data: 0000 > Masking: fffffffe Pending: 00000000 > Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0 > ExtTag+ RBE+ > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- > LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us > ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 
bytes Disabled- CommClk- > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 5GT/s (ok), Width x2 (ok) > TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt- > RootCap: CRSVisible- > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible- > RootSta: PME ReqID 0000, PMEStatus- PMEPending- > DevCap2: Completion Timeout: Not Supported, TimeoutDis+, NROPrPrP+, LTR+ > 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- > EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- > FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd- > AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd- > AtomicOpsCtl: ReqEn- EgressBlck- > LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ > AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn- > MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- > HeaderLog: 00000000 00000000 00000000 00000000 > RootCmd: CERptEn- NFERptEn- FERptEn- > RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd- > FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0 > ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000 > Capabilities: [148 v1] Device 
Serial Number 00-00-00-00-00-00-00-00 > Capabilities: [158 v1] Secondary PCI Express > LnkCtl3: LnkEquIntrruptEn-, PerformEqu- > LaneErrStat: 0 > Capabilities: [178 v1] Physical Layer 16.0 GT/s <?> > Capabilities: [19c v1] Lane Margining at the Receiver <?> > Capabilities: [1bc v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > PortCommonModeRestoreTime=10us PortTPowerOnTime=14us > L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > T_CommonMode=0us LTR1.2_Threshold=0ns > L1SubCtl2: T_PwrOn=10us > Capabilities: [1cc v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?> > Capabilities: [2cc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?> > Capabilities: [304 v1] Data Link Feature <?> > Capabilities: [310 v1] Precision Time Measurement > PTMCap: Requester:+ Responder:+ Root:+ > PTMClockGranularity: 16ns > PTMControl: Enabled:- RootSelected:- > PTMEffectiveGranularity: Unknown > Capabilities: [31c v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?> > Kernel driver in use: pcieport > Kernel modules: pci_endpoint_test > > 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express]) > Subsystem: Samsung Electronics Co Ltd Device a809 > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin A routed to IRQ 0 > NUMA node: 0 > Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K] > Expansion ROM at 30000000 [virtual] [disabled] [size=64K] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [70] Express (v2) Endpoint, MSI 00 [8/5710] > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited > 
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- > LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded) > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR- > 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- > EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- > FRS-, TPHComp-, ExtTPHComp- > AtomicOpsCap: 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled > AtomicOpsCtl: ReqEn- > LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > Capabilities: [b0] MSI-X: Enable+ Count=64 Masked- > Vector table: BAR=0 offset=00004000 > PBA: BAR=0 offset=00003000 > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ > AERCap: First Error Pointer: 00, ECRCGenCap+ 
ECRCGenEn- ECRCChkCap+ ECRCChkEn- > MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- > HeaderLog: 00000000 00000000 00000000 00000000 > Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00 > Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) > ARICap: MFVC- ACS-, Next Function: 0 > ARICtl: MFVC- ACS-, Function Group: 0 > Capabilities: [178 v1] Secondary PCI Express > LnkCtl3: LnkEquIntrruptEn-, PerformEqu- > LaneErrStat: 0 > Capabilities: [198 v1] Physical Layer 16.0 GT/s <?> > Capabilities: [1c0 v1] Lane Margining at the Receiver <?> > Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV) > IOVCap: Migration-, Interrupt Message Number: 000 > IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- > IOVSta: Migration- > Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00 > VF offset: 2, stride: 1, Device ID: a824 > Supported Page Size: 00000553, System Page Size: 00000001 > Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable) > VF Migration: offset: 00000000, BIR: 0 > Capabilities: [3a4 v1] Data Link Feature <?> > Kernel driver in use: nvme > Kernel modules: nvme I guess this is before enabling SR-IOV on 01:00.0, so it doesn't show the VFs themselves. > > It could be that the device has an SR-IOV capability when it > > shouldn't. But even if it does, Linux could tolerate that better > > than it does today. > > Agree there. I can create simple patch that checks for is_virtfn > in sriov_init(). But what to do if it is set? Maybe something like this? It makes no sense to me that a VF would have an SR-IOV capability, but ... If the below avoids the problem, maybe collect another "lspci -vv" output including the VF(s). 
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 952217572113..9c5184384a45 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -901,6 +901,10 @@ int pci_iov_init(struct pci_dev *dev)
 	if (!pci_is_pcie(dev))
 		return -ENODEV;
 
+	/* Some devices include SR-IOV cap on VFs as well as PFs */
+	if (dev->is_virtfn)
+		return -ENODEV;
+
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
 	if (pos)
 		return sriov_init(dev, pos);
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-07 15:41 ` Bjorn Helgaas @ 2022-05-08 11:07 ` Volodymyr Babchuk 2022-05-09 16:49 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Volodymyr Babchuk @ 2022-05-08 11:07 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-pci, Alex Williamson, Leon Romanovsky, Jason Gunthorpe Hello Bjorn, Bjorn Helgaas <helgaas@kernel.org> writes: > On Sat, May 07, 2022 at 10:22:32AM +0000, Volodymyr Babchuk wrote: >> Bjorn Helgaas <helgaas@kernel.org> writes: >> > On Wed, May 04, 2022 at 07:56:01PM +0000, Volodymyr Babchuk wrote: >> >> >> >> I have encountered issue when PCI code tries to use both fields in >> >> >> >> union { >> >> struct pci_sriov *sriov; /* PF: SR-IOV info */ >> >> struct pci_dev *physfn; /* VF: related PF */ >> >> }; >> >> >> >> (which are part of struct pci_dev) at the same time. >> >> >> >> Symptoms are following: >> >> >> >> # echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs >> >> >> >> pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit] >> >> pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs) >> >> Unable to handle kernel paging request at virtual address 0001000200000010 > >> >> Debugging showed the following: >> >> >> >> pci_iov_add_virtfn() allocates new struct pci_dev: >> >> >> >> virtfn = pci_alloc_dev(bus); >> >> and sets physfn: >> >> virtfn->is_virtfn = 1; >> >> virtfn->physfn = pci_dev_get(dev); >> >> >> >> then we will get into sriov_init() via the following call path: >> >> >> >> pci_device_add(virtfn, virtfn->bus); >> >> pci_init_capabilities(dev); >> >> pci_iov_init(dev); >> >> sriov_init(dev, pos); >> > >> > We called pci_device_add() with the VF. 
pci_iov_init() only calls >> > sriov_init() if it finds an SR-IOV capability on the device: >> > >> > pci_iov_init(struct pci_dev *dev) >> > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV); >> > if (pos) >> > return sriov_init(dev, pos); >> > >> > So this means the VF must have an SR-IOV capability, which sounds a >> > little dubious. From PCIe r6.0: >> >> [...] >> >> Yes, I dived into debugging and came to the same conclusions. I'm still >> investigating this, but looks like my PCIe controller (DesignWare-based) >> incorrectly reads configuration space for VF. Looks like instead of >> providing access VF config space, it reads PF's one. >> >> > Can you supply the output of "sudo lspci -vv" for your system? >> >> Sure: >> >> root@spider:~# lspci -vv >> 00:00.0 PCI bridge: Renesas Technology Corp. Device 0031 (prog-if 00 [Normal decode]) >> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- >> Latency: 0 >> Interrupt: pin A routed to IRQ 189 >> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 >> I/O behind bridge: [disabled] >> Memory behind bridge: 30000000-301fffff [size=2M] >> Prefetchable memory behind bridge: [disabled] >> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- >> BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B- >> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- >> Capabilities: [40] Power Management version 3 >> Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold+) >> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- >> Capabilities: [50] MSI: Enable+ Count=128/128 Maskable+ 64bit+ >> Address: 0000000004030040 Data: 0000 >> Masking: fffffffe Pending: 00000000 >> Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00 >> DevCap: MaxPayload 256 bytes, PhantFunc 0 >> ExtTag+ RBE+ >> 
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ >> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ >> MaxPayload 128 bytes, MaxReadReq 512 bytes >> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- >> LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us >> ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+ >> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- >> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- >> LnkSta: Speed 5GT/s (ok), Width x2 (ok) >> TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt- >> RootCap: CRSVisible- >> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible- >> RootSta: PME ReqID 0000, PMEStatus- PMEPending- >> DevCap2: Completion Timeout: Not Supported, TimeoutDis+, NROPrPrP+, LTR+ >> 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- >> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- >> FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd- >> AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS- >> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd- >> AtomicOpsCtl: ReqEn- EgressBlck- >> LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- >> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- >> Compliance De-emphasis: -6dB >> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- >> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- >> Capabilities: [100 v2] Advanced Error Reporting >> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- >> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- >> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- 
AdvNonFatalErr+ >> AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn- >> MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- >> HeaderLog: 00000000 00000000 00000000 00000000 >> RootCmd: CERptEn- NFERptEn- FERptEn- >> RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd- >> FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0 >> ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000 >> Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00 >> Capabilities: [158 v1] Secondary PCI Express >> LnkCtl3: LnkEquIntrruptEn-, PerformEqu- >> LaneErrStat: 0 >> Capabilities: [178 v1] Physical Layer 16.0 GT/s <?> >> Capabilities: [19c v1] Lane Margining at the Receiver <?> >> Capabilities: [1bc v1] L1 PM Substates >> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ >> PortCommonModeRestoreTime=10us PortTPowerOnTime=14us >> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ >> T_CommonMode=0us LTR1.2_Threshold=0ns >> L1SubCtl2: T_PwrOn=10us >> Capabilities: [1cc v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?> >> Capabilities: [2cc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?> >> Capabilities: [304 v1] Data Link Feature <?> >> Capabilities: [310 v1] Precision Time Measurement >> PTMCap: Requester:+ Responder:+ Root:+ >> PTMClockGranularity: 16ns >> PTMControl: Enabled:- RootSelected:- >> PTMEffectiveGranularity: Unknown >> Capabilities: [31c v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?> >> Kernel driver in use: pcieport >> Kernel modules: pci_endpoint_test >> >> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express]) >> Subsystem: Samsung Electronics Co Ltd Device a809 >> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- >> Latency: 0 >> Interrupt: pin A routed to IRQ 0 >> NUMA node: 0 
>> Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K] >> Expansion ROM at 30000000 [virtual] [disabled] [size=64K] >> Capabilities: [40] Power Management version 3 >> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) >> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- >> Capabilities: [70] Express (v2) Endpoint, MSI 00 [8/5710] >> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited >> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W >> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- >> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- >> MaxPayload 128 bytes, MaxReadReq 512 bytes >> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- >> LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported >> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ >> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- >> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- >> LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded) >> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- >> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR- >> 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- >> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- >> FRS-, TPHComp-, ExtTPHComp- >> AtomicOpsCap: 32bit- 64bit- 128bitCAS- >> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled >> AtomicOpsCtl: ReqEn- >> LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis- >> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- >> Compliance De-emphasis: -6dB >> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- >> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- >> Capabilities: [b0] MSI-X: Enable+ Count=64 Masked- >> Vector table: BAR=0 offset=00004000 >> PBA: BAR=0 offset=00003000 >> Capabilities: [100 v2] Advanced Error 
Reporting >> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- >> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- >> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ >> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- >> MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- >> HeaderLog: 00000000 00000000 00000000 00000000 >> Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00 >> Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) >> ARICap: MFVC- ACS-, Next Function: 0 >> ARICtl: MFVC- ACS-, Function Group: 0 >> Capabilities: [178 v1] Secondary PCI Express >> LnkCtl3: LnkEquIntrruptEn-, PerformEqu- >> LaneErrStat: 0 >> Capabilities: [198 v1] Physical Layer 16.0 GT/s <?> >> Capabilities: [1c0 v1] Lane Margining at the Receiver <?> >> Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV) >> IOVCap: Migration-, Interrupt Message Number: 000 >> IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- >> IOVSta: Migration- >> Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00 >> VF offset: 2, stride: 1, Device ID: a824 >> Supported Page Size: 00000553, System Page Size: 00000001 >> Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable) >> VF Migration: offset: 00000000, BIR: 0 >> Capabilities: [3a4 v1] Data Link Feature <?> >> Kernel driver in use: nvme >> Kernel modules: nvme > > I guess this is before enabling SR-IOV on 01:00.0, so it doesn't show > the VFs themselves. Yes. Because kernel crashed without your suggested patch. >> > It could be that the device has an SR-IOV capability when it >> > shouldn't. But even if it does, Linux could tolerate that better >> > than it does today. 
>> >> Agree there. I can create simple patch that checks for is_virtfn >> in sriov_init(). But what to do if it is set? > > Maybe something like this? It makes no sense to me that a VF would > have an SR-IOV capability, but ... > > If the below avoids the problem, maybe collect another "lspci -vv" > output including the VF(s). > I had another crash in nvme_pci_enable(), for which I made quick workaround. And now yeah, it looks like I have some issues with my root complex HW: [skipping bridge info] 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express]) Subsystem: Samsung Electronics Co Ltd Device a809 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 0 NUMA node: 0 Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K] Expansion ROM at 30000000 [virtual] [disabled] [size=64K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, 
TimeoutDis+, NROPrPrP-, LTR- 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS-, TPHComp-, ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn- LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [b0] MSI-X: Enable- Count=64 Masked- Vector table: BAR=0 offset=00004000 PBA: BAR=0 offset=00003000 Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00 Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Capabilities: [178 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn-, PerformEqu- LaneErrStat: 0 Capabilities: [198 v1] Physical Layer 16.0 GT/s <?> Capabilities: [1c0 v1] Lane Margining at the Receiver <?> Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration-, Interrupt Message Number: 000 IOVCtl: Enable+ Migration- Interrupt- MSE+ 
ARIHierarchy- IOVSta: Migration- Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00 VF offset: 2, stride: 1, Device ID: a824 Supported Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable) VF Migration: offset: 00000000, BIR: 0 Capabilities: [3a4 v1] Data Link Feature <?> Kernel driver in use: nvme Kernel modules: nvme 01:00.2 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express]) Subsystem: Samsung Electronics Co Ltd Device a809 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 0 NUMA node: 0 Region 0: Memory at 30018000 (64-bit, non-prefetchable) [size=32K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR- 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- EmergencyPowerReduction Not Supported, 
EmergencyPowerReductionInit- FRS-, TPHComp-, ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn- LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [b0] MSI-X: Enable- Count=64 Masked- Vector table: BAR=0 offset=00004000 PBA: BAR=0 offset=00003000 Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00 Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Capabilities: [178 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn-, PerformEqu- LaneErrStat: 0 Capabilities: [198 v1] Physical Layer 16.0 GT/s <?> Capabilities: [1c0 v1] Lane Margining at the Receiver <?> Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration-, Interrupt Message Number: 000 IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy- IOVSta: Migration- Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00 VF offset: 2, stride: 1, Device ID: a824 Supported Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable) VF Migration: offset: 00000000, BIR: 
0 Capabilities: [3a4 v1] Data Link Feature <?> Kernel modules: nvme

As you can see, the output for func 0 and func 2 is identical, so yeah, it looks like my system reads the config space of func 0 in both cases. Now at least I know where to look. Thank you for your help.

> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 952217572113..9c5184384a45 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -901,6 +901,10 @@ int pci_iov_init(struct pci_dev *dev)
> 	if (!pci_is_pcie(dev))
> 		return -ENODEV;
> 
> +	/* Some devices include SR-IOV cap on VFs as well as PFs */
> +	if (dev->is_virtfn)
> +		return -ENODEV;
> +
> 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> 	if (pos)
> 		return sriov_init(dev, pos);

Thanks, this patch helped. You can have my

Tested-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

if you are going to include it in the kernel.

On the other hand, I'm wondering if it is correct to have both is_virtfn and is_physfn in the first place, as there can be 4 combinations and only two (or three?) of them are valid. Maybe it is worth replacing them with an enum?

-- 
Volodymyr Babchuk at EPAM
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-08 11:07 ` Volodymyr Babchuk @ 2022-05-09 16:49 ` Bjorn Helgaas 2022-05-09 16:58 ` Alex Williamson 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2022-05-09 16:49 UTC (permalink / raw) To: Volodymyr Babchuk Cc: linux-pci, Alex Williamson, Leon Romanovsky, Jason Gunthorpe On Sun, May 08, 2022 at 11:07:40AM +0000, Volodymyr Babchuk wrote: > I had another crash in nvme_pci_enable(), for which I made quick > workaround. And now yeah, it looks like I have some issues with > my root complex HW: Please point to the root complex issue you see. > 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express]) > Subsystem: Samsung Electronics Co Ltd Device a809 > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin A routed to IRQ 0 > NUMA node: 0 > Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K] > Expansion ROM at 30000000 [virtual] [disabled] [size=64K] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [70] Express (v2) Endpoint, MSI 00 > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- > LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- > ExtSynch- ClockPM- AutWidDis- BWInt- 
AutBWInt- > LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded) > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR- > 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- > EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- > FRS-, TPHComp-, ExtTPHComp- > AtomicOpsCap: 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled > AtomicOpsCtl: ReqEn- > LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > Capabilities: [b0] MSI-X: Enable- Count=64 Masked- > Vector table: BAR=0 offset=00004000 > PBA: BAR=0 offset=00003000 > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ > AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- > MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- > HeaderLog: 00000000 00000000 00000000 00000000 > Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00 > Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) > ARICap: MFVC- ACS-, Next Function: 0 > ARICtl: MFVC- ACS-, Function Group: 0 > Capabilities: [178 v1] Secondary PCI Express > LnkCtl3: LnkEquIntrruptEn-, PerformEqu- > LaneErrStat: 0 > Capabilities: [198 v1] Physical Layer 
16.0 GT/s <?> > Capabilities: [1c0 v1] Lane Margining at the Receiver <?> > Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV) > IOVCap: Migration-, Interrupt Message Number: 000 > IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy- > IOVSta: Migration- > Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00 > VF offset: 2, stride: 1, Device ID: a824 > Supported Page Size: 00000553, System Page Size: 00000001 > Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable) > VF Migration: offset: 00000000, BIR: 0 > Capabilities: [3a4 v1] Data Link Feature <?> > Kernel driver in use: nvme > Kernel modules: nvme > > 01:00.2 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express]) > Subsystem: Samsung Electronics Co Ltd Device a809 > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin A routed to IRQ 0 > NUMA node: 0 > Region 0: Memory at 30018000 (64-bit, non-prefetchable) [size=32K] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [70] Express (v2) Endpoint, MSI 00 > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- > LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > 
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded) > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR- > 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- > EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- > FRS-, TPHComp-, ExtTPHComp- > AtomicOpsCap: 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled > AtomicOpsCtl: ReqEn- > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > Capabilities: [b0] MSI-X: Enable- Count=64 Masked- > Vector table: BAR=0 offset=00004000 > PBA: BAR=0 offset=00003000 > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ > AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- > MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap- > HeaderLog: 00000000 00000000 00000000 00000000 > Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00 > Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) > ARICap: MFVC- ACS-, Next Function: 0 > ARICtl: MFVC- ACS-, Function Group: 0 > Capabilities: [178 v1] Secondary PCI Express > LnkCtl3: LnkEquIntrruptEn-, PerformEqu- > LaneErrStat: 0 > Capabilities: [198 v1] Physical Layer 16.0 GT/s <?> > Capabilities: [1c0 v1] Lane Margining at the Receiver <?> > Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV) > IOVCap: Migration-, Interrupt Message Number: 000 > 
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy- > IOVSta: Migration- > Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00 > VF offset: 2, stride: 1, Device ID: a824 > Supported Page Size: 00000553, System Page Size: 00000001 > Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable) > VF Migration: offset: 00000000, BIR: 0 > Capabilities: [3a4 v1] Data Link Feature <?> > Kernel modules: nvme > > As you can see, output for func 0 and func 2 is identical, so yeah, > looks like my system reads config space for func 0 in both cases. They are not identical: 01:00.0 Non-Volatile memory controller Region 0: Memory at 30010000 01:00.2 Non-Volatile memory controller Region 0: Memory at 30018000 > On other hand, I'm wondering if it is correct to have both is_virtfn and > is_physfn in the first place, as there can 4 combinations and only two > (or three?) of them are valid. Maybe it is worth to replace them with > enum? Good question. I think there was a reason, but I can't remember it right now. Bjorn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-09 16:49 ` Bjorn Helgaas @ 2022-05-09 16:58 ` Alex Williamson 2022-05-10 6:39 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Alex Williamson @ 2022-05-09 16:58 UTC (permalink / raw) To: Bjorn Helgaas Cc: Volodymyr Babchuk, linux-pci, Leon Romanovsky, Jason Gunthorpe On Mon, 9 May 2022 11:49:29 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > On Sun, May 08, 2022 at 11:07:40AM +0000, Volodymyr Babchuk wrote: > > > > As you can see, output for func 0 and func 2 is identical, so yeah, > > looks like my system reads config space for func 0 in both cases. > > They are not identical: > > 01:00.0 Non-Volatile memory controller > Region 0: Memory at 30010000 > > 01:00.2 Non-Volatile memory controller > Region 0: Memory at 30018000 > > > On other hand, I'm wondering if it is correct to have both is_virtfn and > > is_physfn in the first place, as there can 4 combinations and only two > > (or three?) of them are valid. Maybe it is worth to replace them with > > enum? > > Good question. I think there was a reason, but I can't remember it > right now. AIUI, three combinations are valid: is_physfn = 0, is_virtfn = 0: A non-SR-IOV function is_physfn = 1, is_virtfn = 0: An SR-IOV PF is_physfn = 0, is_virtfn = 1: An SR-IOV VF As implemented with bit fields this is 2 bits, which is more space efficient than an enum. Thanks, Alex ^ permalink raw reply [flat|nested] 14+ messages in thread
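[Editorial note: the three valid flag combinations Alex lists can be sketched in standalone C. This is illustrative only — the enum and function names below are invented for the example and are not actual kernel identifiers.]

```c
#include <assert.h>

/* Illustrative sketch (not kernel code): the three valid SR-IOV roles
 * decoded from the two one-bit flags. The fourth combination, both
 * flags set, has no valid meaning and is reported as -1. */
enum pci_fn_role { PCI_FN_NORMAL, PCI_FN_PF, PCI_FN_VF };

static int sriov_role(unsigned is_physfn, unsigned is_virtfn)
{
    if (is_physfn && is_virtfn)
        return -1;            /* invalid combination */
    if (is_physfn)
        return PCI_FN_PF;     /* SR-IOV physical function */
    if (is_virtfn)
        return PCI_FN_VF;     /* SR-IOV virtual function */
    return PCI_FN_NORMAL;     /* plain, non-SR-IOV function */
}
```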
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-09 16:58 ` Alex Williamson @ 2022-05-10 6:39 ` Christoph Hellwig 2022-05-10 17:37 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2022-05-10 6:39 UTC (permalink / raw) To: Alex Williamson Cc: Bjorn Helgaas, Volodymyr Babchuk, linux-pci, Leon Romanovsky, Jason Gunthorpe On Mon, May 09, 2022 at 10:58:57AM -0600, Alex Williamson wrote: > is_physfn = 0, is_virtfn = 0: A non-SR-IOV function > is_physfn = 1, is_virtfn = 0: An SR-IOV PF > is_physfn = 0, is_virtfn = 1: An SR-IOV VF > > As implemented with bit fields this is 2 bits, which is more space > efficient than an enum. Thanks, A two-bit bitfield with explicit constants for the values would probably still be much easier to understand. And there is some code that seems to interpret is_physfn a bit oddly, e.g.: arch/powerpc/kernel/eeh_sysfs.c: np = pci_device_to_OF_node(pdev->is_physfn ? pdev : pdev->physfn); arch/powerpc/kernel/eeh_sysfs.c: np = pci_device_to_OF_node(pdev->is_physfn ? pdev : pdev->physfn); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-10 6:39 ` Christoph Hellwig @ 2022-05-10 17:37 ` Bjorn Helgaas 2022-05-12 7:18 ` Volodymyr Babchuk 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2022-05-10 17:37 UTC (permalink / raw) To: Christoph Hellwig Cc: Alex Williamson, Volodymyr Babchuk, linux-pci, Leon Romanovsky, Jason Gunthorpe On Mon, May 09, 2022 at 11:39:30PM -0700, Christoph Hellwig wrote: > On Mon, May 09, 2022 at 10:58:57AM -0600, Alex Williamson wrote: > > is_physfn = 0, is_virtfn = 0: A non-SR-IOV function > > is_physfn = 1, is_virtfn = 0: An SR-IOV PF > > is_physfn = 0, is_virtfn = 1: An SR-IOV VF > > > > As implemented with bit fields this is 2 bits, which is more space > > efficient than an enum. Thanks, > > A two-bit bitfield with explicit constants for the values would probably > still much eaiser to understand. > > And there is some code that seems to intepret is_physfn a bit odd, e.g.: > > arch/powerpc/kernel/eeh_sysfs.c: np = pci_device_to_OF_node(pdev->is_physfn ? pdev : pdev->physfn); > arch/powerpc/kernel/eeh_sysfs.c: np = pci_device_to_OF_node(pdev->is_physfn ? pdev : pdev->physfn); "dev->sriov != NULL" and "dev->is_physfn" are basically the same and many of the dev->is_physfn uses in drivers/pci would end up being simpler if replaced with dev->sriov, e.g., int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id) { if (!dev->is_physfn) return -EINVAL; return dev->bus->number + ((dev->devfn + dev->sriov->offset + dev->sriov->stride * vf_id) >> 8); } would be more obvious as: if (dev->sriov) return dev->bus->number + ((dev->devfn + dev->sriov->offset + dev->sriov->stride * vf_id) >> 8); return -EINVAL; ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write to srvio_numvfs triggers kernel panic 2022-05-10 17:37 ` Bjorn Helgaas @ 2022-05-12 7:18 ` Volodymyr Babchuk 0 siblings, 0 replies; 14+ messages in thread From: Volodymyr Babchuk @ 2022-05-12 7:18 UTC (permalink / raw) To: Bjorn Helgaas Cc: Christoph Hellwig, Alex Williamson, linux-pci, Leon Romanovsky, Jason Gunthorpe Hello Bjorn, Bjorn Helgaas <helgaas@kernel.org> writes: > On Mon, May 09, 2022 at 11:39:30PM -0700, Christoph Hellwig wrote: >> On Mon, May 09, 2022 at 10:58:57AM -0600, Alex Williamson wrote: >> > is_physfn = 0, is_virtfn = 0: A non-SR-IOV function >> > is_physfn = 1, is_virtfn = 0: An SR-IOV PF >> > is_physfn = 0, is_virtfn = 1: An SR-IOV VF >> > >> > As implemented with bit fields this is 2 bits, which is more space >> > efficient than an enum. Thanks, >> >> A two-bit bitfield with explicit constants for the values would probably >> still much eaiser to understand. >> >> And there is some code that seems to intepret is_physfn a bit odd, e.g.: >> >> arch/powerpc/kernel/eeh_sysfs.c: np = pci_device_to_OF_node(pdev->is_physfn ? pdev : pdev->physfn); >> arch/powerpc/kernel/eeh_sysfs.c: np = pci_device_to_OF_node(pdev->is_physfn ? pdev : pdev->physfn); > > "dev->sriov != NULL" and "dev->is_physfn" are basically the same and > many of the dev->is_physfn uses in drivers/pci would end up being > simpler if replaced with dev->sriov, e.g., > > int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id) > { > if (!dev->is_physfn) > return -EINVAL; > return dev->bus->number + ((dev->devfn + dev->sriov->offset + > dev->sriov->stride * vf_id) >> 8); > } > > would be more obvious as: > > if (dev->sriov) > return dev->bus->number + ((dev->devfn + dev->sriov->offset + > dev->sriov->stride * vf_id) >> 8); > return -EINVAL; I'm not sure that this will work because dev->sriov and dev->physfn are stored in the same union. -- Volodymyr Babchuk at EPAM ^ permalink raw reply [flat|nested] 14+ messages in thread
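[Editorial note: the hazard Volodymyr points out — that dev->sriov and dev->physfn share storage — can be demonstrated with a minimal stand-in for the union in struct pci_dev. The fake_* types below are invented for the demonstration; only the union layout mirrors the kernel structure.]

```c
#include <assert.h>

/* Minimal stand-in for the union inside struct pci_dev: sriov and
 * physfn occupy the same bytes, so writing one clobbers the other. */
struct fake_sriov { int total_vfs; };

struct fake_pci_dev {
    unsigned is_virtfn:1;
    unsigned is_physfn:1;
    union {
        struct fake_sriov   *sriov;  /* PF: SR-IOV info */
        struct fake_pci_dev *physfn; /* VF: related PF */
    };
};
```

Setting physfn first (as pci_iov_add_virtfn() does) and then writing sriov (as sriov_init() did in the reported panic) leaves physfn pointing at the struct fake_sriov, which is exactly the corruption pci_physfn() then dereferenced.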
end of thread, other threads:[~2022-05-12 7:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-04 19:56 Write to srvio_numvfs triggers kernel panic Volodymyr Babchuk
2022-05-06 20:17 ` Bjorn Helgaas
2022-05-07  1:34   ` Jason Gunthorpe
2022-05-07 10:25     ` Volodymyr Babchuk
2022-05-08 11:19       ` Leon Romanovsky
2022-05-09 18:22       ` Keith Busch
2022-05-07 10:22   ` Volodymyr Babchuk
2022-05-07 15:41     ` Bjorn Helgaas
2022-05-08 11:07       ` Volodymyr Babchuk
2022-05-09 16:49         ` Bjorn Helgaas
2022-05-09 16:58           ` Alex Williamson
2022-05-10  6:39             ` Christoph Hellwig
2022-05-10 17:37               ` Bjorn Helgaas
2022-05-12  7:18                 ` Volodymyr Babchuk