linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yinghai Lu <yinghai@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>,
	Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>,
	Matthew Wilcox <matthew@wil.cx>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: [PATCH 1/7] PCI: Make sriov work with hotplug remove
Date: Mon, 23 Jan 2012 14:30:41 -0800	[thread overview]
Message-ID: <CAE9FiQW5VZZrdQmuKF6_4dvTDUO9xPL=uqo0KaohnJ4xboDwSA@mail.gmail.com> (raw)
In-Reply-To: <CAE9FiQWt7j+Eoo_xhFmaGGh7hA-j2x991KZajmberf8Kod5G1g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

On Mon, Jan 23, 2012 at 1:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Jan 23, 2012 at 11:44 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Mon, Jan 23, 2012 at 11:36 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>
>>> Maybe we can put VF and PF in bus->devices like:
>>> VF come first than PF?
>>
>> Ugh. Ok, so that's a disgusting hack, but it's better than messing up
>> the generic PCI subsystem. At least it's a disgusting hack in the IOV
>> code.
>
> just tested, the patch with this hack works.
>
>>
>> I still would prefer to just do the virtual devices right instead. Or
>> even just make the removal loop inherently robust, rather than have
>> that insane knowledge of virtual function devices that were done so
>> horribly wrong.

for_each_prev_safe is working...

please check attached patch...

Thanks

Yinghai

[-- Attachment #2: pci_001_debug_sriov_hot_remove_7.patch --]
[-- Type: text/x-patch, Size: 5702 bytes --]

Subject: [PATCH -v5] PCI: Make sriov work with hotplug remove

When hot remove pci express module that have pcie switch and support SRIOV, got

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]--
 |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
 |           |                               |            +-00.1 Device
 |           |                               |            +-00.2 Device
 |           |                               |            \-00.3 Device
 |           |                               \-03.0-[b3]--+-00.0 Device
 |           |                                            +-00.1 Device
 |           |                                            +-00.2 Device
 |           |                                            \-00.3 Device

root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
                end devices  are b2:00.0 and b3.00.0.
                VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3

Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
remove the fn, so
	list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.

Solution is just replacing list_for_each_safe() with list_for_each().
	pci_stop_bus_device(dev) will not remove dev from bus->devices list,
	so We only need use for_each here.

During reviewing the patch, Bjorn said:
|   The PCI hot-remove path calls pci_stop_bus_devices() via
|   pci_remove_bus_device().
|
|   pci_stop_bus_devices() traverses the bus->devices list (point A below),
|   stopping each device in turn, which calls the driver remove() method.  When
|   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
|   also uses pci_remove_bus_device() to remove the VF devices from the
|   bus->devices list (point B).
|
|       pci_remove_bus_device
|         pci_stop_bus_device
|           pci_stop_bus_devices(subordinate)
|             list_for_each(bus->devices)             <-- A
|               pci_stop_bus_device(PF)
|                 ...
|                   driver->remove
|                     pci_disable_sriov
|                       ...
|                         pci_remove_bus_device(VF)
|                             <remove from bus_list>  <-- B
|
|   At B, we're changing the same list we're iterating through at A, so when
|   the driver remove() method returns, the pci_stop_bus_devices() iterator has
|   a pointer to a list entry that has already been freed.

Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141

-v5: According to Linus to make remove more robust, Change to list_for_each_prev_safe
	instead. That is more reasonable, because those devices are added to tail of
	the list before.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/remove.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -128,7 +128,15 @@ static void pci_stop_bus_devices(struct
 {
 	struct list_head *l, *n;
 
-	list_for_each_safe(l, n, &bus->devices) {
+	/*
+	 * VFs could be removed by pci_remove_bus_device() in the
+	 *  pci_stop_bus_devices() code path for PF.
+	 *  aka, bus->devices get updated in the process.
+	 * but VFs are inserted after PFs when SRIOV is enabled for PF,
+	 * We can iterate the list backwards to get prev valid PF instead
+	 *  of removed VF.
+	 */
+	list_for_each_prev_safe(l, n, &bus->devices) {
 		struct pci_dev *dev = pci_dev_b(l);
 		pci_stop_bus_device(dev);
 	}

  reply	other threads:[~2012-01-23 22:30 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-21  9:52 [PATCH 0/7] PCI: pcie hotplug related patch Yinghai Lu
2012-01-21  9:52 ` [PATCH 1/7] PCI: Make sriov work with hotplug remove Yinghai Lu
2012-01-23 16:06   ` Linus Torvalds
2012-01-23 18:30     ` Yinghai Lu
2012-01-23 18:45       ` Linus Torvalds
2012-01-23 19:34         ` Linus Torvalds
2012-01-23 19:59           ` Yinghai Lu
2012-01-23 20:48             ` Yinghai Lu
2012-01-23 22:35               ` Linus Torvalds
2012-01-24  4:34           ` Benjamin Herrenschmidt
2012-01-23 19:36         ` Yinghai Lu
2012-01-23 19:44           ` Linus Torvalds
2012-01-23 21:34             ` Yinghai Lu
2012-01-23 22:30               ` Yinghai Lu [this message]
2012-01-23 22:38                 ` Linus Torvalds
2012-01-21  9:52 ` [PATCH 2/7] PCI: Separate pci_bus_read_dev_vendor_id from pci_scan_device Yinghai Lu
2012-01-21  9:52 ` [PATCH 3/7] pciehp: Checking pci conf reading to new added device instead of sleep 1s Yinghai Lu
2012-01-21  9:52 ` [PATCH 4/7] pciehp: print out link status when dlla get active Yinghai Lu
2012-01-21  9:52 ` [PATCH 5/7] pciehp: Add pcie_wait_link_not_active() Yinghai Lu
2012-01-21  9:52 ` [PATCH 6/7] pciehp: Add Disable/enable link functions Yinghai Lu
2012-01-23 16:13   ` Linus Torvalds
2012-01-24  5:36     ` Yinghai Lu
2012-01-21  9:52 ` [PATCH 7/7] pciehp: Disable/enable link during slot power off/on Yinghai Lu
2012-01-21 10:26 ` [PATCH 0/7] PCI: pcie hotplug related patch Yinghai Lu
2012-01-27 18:36   ` Jesse Barnes
2012-01-27 18:58     ` Yinghai Lu
2012-01-30  3:42       ` Kenji Kaneshige
2012-01-27 18:55 [PATCH -v2 " Yinghai Lu
2012-01-27 18:55 ` [PATCH 1/7] PCI: Make sriov work with hotplug remove Yinghai Lu
2012-01-27 19:43   ` Jesse Barnes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAE9FiQW5VZZrdQmuKF6_4dvTDUO9xPL=uqo0KaohnJ4xboDwSA@mail.gmail.com' \
    --to=yinghai@kernel.org \
    --cc=jbarnes@virtuousgeek.org \
    --cc=kaneshige.kenji@jp.fujitsu.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=matthew@wil.cx \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).