From: Thomas Monjalon <thomas@monjalon.net>
To: David Marchand <david.marchand@redhat.com>,
	dpdk stable <stable@dpdk.org>
Cc: Matan Azrad <matan@mellanox.com>, dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v2] bus/pci: fix driver detach clear
Date: Wed, 20 Nov 2019 14:51:36 +0100
Message-ID: <2607286.WWoMLU17V0@xps>
In-Reply-To: <CAJFAV8wL=zv7kV+86jdzYT+c340iZAwFcPs1RJvrVoZ66FTFjg@mail.gmail.com>

20/11/2019 14:03, David Marchand:
> On Wed, Nov 20, 2019 at 10:48 AM Matan Azrad <matan@mellanox.com> wrote:
> >
> > When a rte_device is unplugged, the driver should be detached from the
> > device.
> >
> > The PCI detach driver operation wrongly did not clear the driver from
> > the device structure, which leaves the device in the probed state from
> > the EAL point of view.
> >
> > For example, when a device is removed twice using rte_dev_remove, it
> > causes a crash in EAL.
> 
> I can see a crash when using port detach in testpmd with a virtio pci device.
> 
> testpmd> port attach 0000:07:00.0
> Attaching a new port...
> EAL: PCI device 0000:07:00.0 on NUMA socket -1
> EAL:   Invalid NUMA socket, default to 0
> EAL:   probe driver: 1af4:1041 net_virtio
> Port 1 is attached. Now total ports is 2
> Done
> testpmd> port close 1
> Closing ports...
> EAL: Releasing pci mapped resource for 0000:07:00.0
> EAL: Calling pci_unmap_resource for 0000:07:00.0 at 0x2200006000
> Done
> testpmd> port detach 1
> Removing a device...
> 
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315        if (dev->bus->unplug == NULL) {
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-292.el7.x86_64 libgcc-4.8.5-39.el7.x86_64
> libpcap-1.5.3-11.el7.x86_64 numactl-libs-2.0.12-3.el7.x86_64
> (gdb) p *dev
> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x1cf8078
> "0000:07:00.0", driver = 0x16c68f0 <rte_virtio_pmd+16>, bus =
> 0x16b2640 <rte_pci_bus>, numa_node = 0, devargs = 0x1cf8060}
> (gdb) c
> Continuing.
> Device of port 1 is detached
> Now total ports is 1
> Done
> 
> 
> On the first detach, the pci bus frees the rte_pci_device which embeds
> the rte_device object.
> 
> static int
> pci_unplug(struct rte_device *dev)
> {
>         struct rte_pci_device *pdev;
>         int ret;
> 
>         pdev = RTE_DEV_TO_PCI(dev);
>         ret = rte_pci_detach_dev(pdev);
>         if (ret == 0) {
>                 rte_pci_remove_device(pdev);
>                 rte_devargs_remove(dev->devargs);
>                 free(pdev);
>         }
>         return ret;
> }
> 
> 
> 
> testpmd> port detach 1
> Removing a device...
> 
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315        if (dev->bus->unplug == NULL) {
> (gdb) p *dev
> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0xa <Address 0xa
> out of bounds>, driver = 0x0, bus = 0x4637, numa_node = 1, devargs =
> 0x40000002e040018}
> (gdb) c
> Continuing.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000007c1ddd in local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315        if (dev->bus->unplug == NULL) {
> 
> 
> On the second detach, testpmd passes the same rte_device pointer it
> extracts from rte_eth_devices, but the malloc'd location has been
> reused (with watchpoint on the location, I found somewhere around
> rte_mp_request_sync/opendir()), and then *crunch* on dev->bus.
> 
> 
> From my pov:
> - testpmd is wrongly reusing a pointer coming from rte_eth_devices[],
> without caring about the port state (this is what your second patch
> fixes),
> - testpmd is directly kicking pointers in rte_eth_devices[] (setting
> ->device = NULL for its own logic), which is bad too,
> - this patch just hides the reuse of a freed pointer,

I agree with most of your analysis.
So we agree that patch 2 is a real fix.
We also agree that testpmd should be fixed in the next release so that
it no longer updates the .device pointer. But let's keep that for now,
as it may be a workaround for some drivers (this needs deeper analysis).

But about this patch 1, it is resetting rte_device.driver,
which is used by the function rte_dev_is_probed().
Clearing it says the rte_device has no rte_driver attached anymore.
This patch follows the same idea as
391797f04208 ("drivers/bus: move driver assignment to end of probing")
So I consider this a real fix.
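
To illustrate the point about rte_device.driver, here is a minimal
sketch, not the actual DPDK code. The sketch_* types and functions are
hypothetical stand-ins that only model the idea: "probed" means the
driver pointer is non-NULL, and detach clears it. It assumes the device
object stays allocated, so it says nothing about the freed-pointer reuse
discussed in the quoted analysis.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for rte_driver / rte_device. */
struct sketch_driver {
	const char *name;
};

struct sketch_device {
	const char *name;
	const struct sketch_driver *driver; /* non-NULL while probed */
};

/* Models the idea behind rte_dev_is_probed(): a driver is attached. */
static int
sketch_dev_is_probed(const struct sketch_device *dev)
{
	return dev->driver != NULL;
}

/*
 * Models a bus detach operation. Clearing dev->driver at the end is the
 * step under discussion: without it the device still looks probed and a
 * second detach would run the removal path again.
 */
static int
sketch_detach(struct sketch_device *dev)
{
	if (!sketch_dev_is_probed(dev))
		return -ENOENT; /* nothing attached, refuse cleanly */

	/* A driver-specific remove callback would run here. */

	dev->driver = NULL; /* device is no longer probed */
	return 0;
}
```

With a device initialized with a non-NULL driver, the first call to
sketch_detach() returns 0 and the second returns -ENOENT instead of
running the removal path twice.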



