linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: Write to srvio_numvfs triggers kernel panic
Date: Fri, 6 May 2022 15:17:22 -0500	[thread overview]
Message-ID: <20220506201722.GA555374@bhelgaas> (raw)
In-Reply-To: <87a6bxm5vm.fsf@epam.com>

[+cc Alex, Leon, Jason]

On Wed, May 04, 2022 at 07:56:01PM +0000, Volodymyr Babchuk wrote:
> 
> Hello,
> 
> I have encountered issue when PCI code tries to use both fields in
> 
>         union {
> 		struct pci_sriov	*sriov;		/* PF: SR-IOV info */
> 		struct pci_dev		*physfn;	/* VF: related PF */
> 	};
> 
> (which are part of struct pci_dev) at the same time.
> 
> Symptoms are following:
> 
> # echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> 
> pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit]
> pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs)
>  Unable to handle kernel paging request at virtual address 0001000200000010
>  Mem abort info:
>    ESR = 0x96000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>  Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
>  [0001000200000010] address between user and kernel address ranges
>  Internal error: Oops: 96000004 [#1] PREEMPT SMP
> Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24
> pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6
>  CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233
>  Hardware name: XENVM-4.17 (DT)
>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
>  pc : pcie_aspm_get_link+0x90/0xcc
>  lr : pcie_aspm_get_link+0x8c/0xcc
>  sp : ffff8000130d39c0
>  x29: ffff8000130d39c0 x28: 00000000000001a4
>  x27: 00000000ffffee4b x26: ffff80001164f560
>  x25: 0000000000000000 x24: 0000000000000000
>  x23: ffff80001164f660 x22: 0000000000000000
>  x23: ffff80001164f660 x22: 0000000000000000
>  x21: ffff000003f08000 x20: ffff800010db37d8
>  x19: ffff000004b8e780 x18: ffffffffffffffff
>  x17: 0000000000000000 x16: 00000000deadbeef
>  x15: ffff8000930d36c7 x14: 0000000000000006
>  x13: ffff8000115c2710 x12: 000000000000081c
>  x11: 00000000000002b4 x10: ffff8000115c2710
>  x9 : ffff8000115c2710 x8 : 00000000ffffefff
>  x7 : ffff80001161a710 x6 : ffff80001161a710
>  x5 : ffff00003fdad900 x4 : 0000000000000000
>  x3 : 0000000000000000 x2 : 0000000000000000
>  x1 : ffff000003c51c80 x0 : 0001000200000000
>  Call trace:
>   pcie_aspm_get_link+0x90/0xcc
>   aspm_ctrl_attrs_are_visible+0x30/0xc0
>   internal_create_group+0xd0/0x3cc
>   internal_create_groups.part.0+0x4c/0xc0
>   sysfs_create_groups+0x20/0x34
>   device_add+0x2b4/0x760
>   pci_device_add+0x814/0x854
>   pci_iov_add_virtfn+0x240/0x2f0
>   sriov_enable+0x1f8/0x474
>   pci_sriov_configure_simple+0x38/0x90
>   sriov_numvfs_store+0xa4/0x1a0
>   dev_attr_store+0x1c/0x30
>   sysfs_kf_write+0x48/0x60
>   kernfs_fop_write_iter+0x118/0x1ac
>   new_sync_write+0xe8/0x184
>   vfs_write+0x23c/0x2a0
>   ksys_write+0x68/0xf4
>   __arm64_sys_write+0x20/0x2c
>   el0_svc_common.constprop.0+0x78/0x1a0
>   do_el0_svc+0x28/0x94
>   el0_svc+0x14/0x20
>   el0_sync_handler+0xa4/0x130
>   el0_sync+0x180/0x1c0
> Code: d0002120 9133e000 97ffef8e f9400a60 (f9400813)
> 
> 
> Debugging showed the following:
> 
> pci_iov_add_virtfn() allocates new struct pci_dev:
> 
> 	virtfn = pci_alloc_dev(bus);
> and sets physfn:
> 	virtfn->is_virtfn = 1;
> 	virtfn->physfn = pci_dev_get(dev);
> 
> then we will get into sriov_init() via the following call path:
> 
> pci_device_add(virtfn, virtfn->bus);
>   pci_init_capabilities(dev);
>     pci_iov_init(dev);
>       sriov_init(dev, pos);

We called pci_device_add() with the VF.  pci_iov_init() only calls
sriov_init() if it finds an SR-IOV capability on the device:

  pci_iov_init(struct pci_dev *dev)
    pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
    if (pos)
      return sriov_init(dev, pos);

So this means the VF must have an SR-IOV capability, which sounds a
little dubious.  From PCIe r6.0:

  9.3.3 SR-IOV Extended Capability

  The SR-IOV Extended Capability defined here is a PCIe extended
  capability that must be implemented in each PF that supports SR-IOV.
  This Capability is used to describe and control a PF’s SR-IOV
  Capabilities.

  For Multi-Function Devices, each PF that supports SR-IOV shall
  provide the Capability structure defined in this section.  This
  Capability structure may be present in any Function with a Type 0
  Configuration Space Header. This Capability must not be present in
  Functions with a Type 1 Configuration Space Header.

Can you supply the output of "sudo lspci -vv" for your system?

It could be that the device has an SR-IOV capability when it
shouldn't.  But even if it does, Linux could tolerate that better
than it does today.

> sriov_init() overwrites value in the union:
> 	dev->sriov = iov; <<<<<---- There
> 	dev->is_physfn = 1;
> 
> Next, we will get into function that causes panic:
> 
> pci_device_add(virtfn, virtfn->bus);
>   device_add(&dev->dev)
>     sysfs_create_groups()
>       internal_create_group()
>         aspm_ctrl_attrs_are_visible()
>           pcie_aspm_get_link()
>             pci_upstream_bridge()
>               pci_physfn()
> 
> which is
> 
> static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> {
> #ifdef CONFIG_PCI_IOV
> 	if (dev->is_virtfn)
> 		dev = dev->physfn;
> #endif
> 	return dev;
> }
> 
> 
> as is_virtfn == 1 and dev->physfn was overwritten via write to
> dev->sriov, pci_physfn() will return pointer to struct pci_sriov
> allocated by sriov_init(). And then pci_upstream_bridge() will
> cause panic by acessing it as if it were pointer to struct pci_dev
> 
> I encountered this issue on ARM64, Linux 5.10.41. Tried to test
> on master branch, but it is quite difficult to rebase platform
> code on the master. But I compared all relevant parts of PCI code
> and didn't found any differences.
> 
> Looks like I am missing following, because how SR-IOV can be so broken?
> But what exactly?
> 
> -- 
> Volodymyr Babchuk at EPAM

  reply	other threads:[~2022-05-06 20:17 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-04 19:56 Write to srvio_numvfs triggers kernel panic Volodymyr Babchuk
2022-05-06 20:17 ` Bjorn Helgaas [this message]
2022-05-07  1:34   ` Jason Gunthorpe
2022-05-07 10:25     ` Volodymyr Babchuk
2022-05-08 11:19       ` Leon Romanovsky
2022-05-09 18:22         ` Keith Busch
2022-05-07 10:22   ` Volodymyr Babchuk
2022-05-07 15:41     ` Bjorn Helgaas
2022-05-08 11:07       ` Volodymyr Babchuk
2022-05-09 16:49         ` Bjorn Helgaas
2022-05-09 16:58           ` Alex Williamson
2022-05-10  6:39             ` Christoph Hellwig
2022-05-10 17:37               ` Bjorn Helgaas
2022-05-12  7:18                 ` Volodymyr Babchuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220506201722.GA555374@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=alex.williamson@redhat.com \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).