All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
@ 2013-07-27 15:11 Yinghai Lu
  2013-07-27 15:11 ` [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that Yinghai Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-07-27 15:11 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yinghai Lu, Yijing Wang, Jiang Liu

commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
broke the pcie native hotplug with SRIOV enabled: PF is not freed
during hot-remove, and later hot-add do not work as old pci_dev
is still around, and can not create new pci_dev.

That commit need VFs to be removed via pci_disable_sriov/virtfn_remove
to make sure ref to PF is put back.

So we can not call stop_and_remove for VF before PF, that will
make VF get removed directly before PF's driver try to use
pci_disable_sriov/virtfn_remove to do it.

Solution is separating stop and remove in two iterations.

In first iteration, we stop VF driver at first with iterating devices
reversely, and later during stop PF driver, disable_sriov will use
virtfn_remove to remove VFs.

Also some driver (like mlx4_core) need VF's driver get stopped before PF's.

Need this one for v3.11.

-v2: separate VGA checking to another patch according to Bjorn.
     and after patches that make pciehp to be built-in

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>

---
 drivers/pci/hotplug/pciehp_pci.c |   15 +++++++++++++--
 drivers/pci/pci.h                |    3 +++
 drivers/pci/remove.c             |    4 ++--
 3 files changed, 18 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
@@ -109,6 +109,13 @@ int pciehp_unconfigure_device(struct slo
 	}
 
 	/*
+	 * Now VF need to be removed via virtfn_remove to make
+	 * sure ref to PF is put back. Some driver (mlx4_core) need
+	 * VF's driver get stopped before PF.
+	 * So we need to stop VF driver at first, that means
+	 * loop reversely, and later during stop PF driver,
+	 * disable_sriov will use virtfn_remove to remove VFs.
+	 *
 	 * Stopping an SR-IOV PF device removes all the associated VFs,
 	 * which will update the bus->devices list and confuse the
 	 * iterator.  Therefore, iterate in reverse so we remove the VFs
@@ -116,8 +123,7 @@ int pciehp_unconfigure_device(struct slo
 	 */
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
 					 bus_list) {
-		pci_dev_get(dev);
-		pci_stop_and_remove_bus_device(dev);
+		pci_stop_bus_device(dev);
 		/*
 		 * Ensure that no new Requests will be generated from
 		 * the device.
@@ -128,6 +134,11 @@ int pciehp_unconfigure_device(struct slo
 			command |= PCI_COMMAND_INTX_DISABLE;
 			pci_write_config_word(dev, PCI_COMMAND, command);
 		}
+	}
+
+	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
+		pci_dev_get(dev);
+		pci_remove_bus_device(dev);
 		pci_dev_put(dev);
 	}
 
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_remove_bus);
 
-static void pci_stop_bus_device(struct pci_dev *dev)
+void pci_stop_bus_device(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->subordinate;
 	struct pci_dev *child, *tmp;
@@ -76,7 +76,7 @@ static void pci_stop_bus_device(struct p
 	pci_stop_dev(dev);
 }
 
-static void pci_remove_bus_device(struct pci_dev *dev)
+void pci_remove_bus_device(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->subordinate;
 	struct pci_dev *child, *tmp;
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -317,4 +317,7 @@ static inline int pci_dev_specific_reset
 }
 #endif
 
+void pci_stop_bus_device(struct pci_dev *dev);
+void pci_remove_bus_device(struct pci_dev *dev);
+
 #endif /* DRIVERS_PCI_H */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-27 15:11 [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Yinghai Lu
@ 2013-07-27 15:11 ` Yinghai Lu
  2013-07-29 19:58   ` Bjorn Helgaas
  2013-07-27 15:11 ` [PATCH v2 1/3] PCI/pciehp: Separate VGA checking out from loop Yinghai Lu
  2013-08-07  4:01 ` [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Bjorn Helgaas
  2 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-07-27 15:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Yinghai Lu, Jiang Liu, Alexander Duyck, Donald Dutile,
	Greg Rose

After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
VF need to be removed via virtfn_remove to make sure ref to PF
is put back.

Some driver (like ixgbe) does not call pci_disable_sriov() if
sriov is enabled via /sys/.../sriov_numvfs setting.
ixgbe does allow driver for PF get detached, but still have VFs
around.

But how about PF get removed via /sys or pciehp finally?

During hot-remove, VF will still hold one ref to PF and it
prevent PF to be removed.
That make the next hot-add fails, as old PF dev struct is still around.

We need to add pci_disable_sriov() calling during stop PF .

Need this one for v3.11

-v2: Accoring to Bjorn, move that calling to pci_stop_dev.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Greg Rose <gregory.v.rose@intel.com>

---
 drivers/pci/remove.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
 		device_del(&dev->dev);
+		/* remove VF, if PF driver skip that */
+		pci_disable_sriov(dev);
 		dev->is_added = 0;
 	}
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/3] PCI/pciehp: Separate VGA checking out from loop
  2013-07-27 15:11 [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Yinghai Lu
  2013-07-27 15:11 ` [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that Yinghai Lu
@ 2013-07-27 15:11 ` Yinghai Lu
  2013-08-07  4:01 ` [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Bjorn Helgaas
  2 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-07-27 15:11 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yinghai Lu

Separate VGA checking out and do that at first,
if there is VGA in the chain, do even try to stop or remove any device
under that bus.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/hotplug/pciehp_pci.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
@@ -92,6 +92,22 @@ int pciehp_unconfigure_device(struct slo
 	if (ret)
 		presence = 0;
 
+	/* check if VGA is around */
+	if (presence) {
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+				pci_read_config_byte(dev, PCI_BRIDGE_CONTROL,
+							&bctl);
+				if (bctl & PCI_BRIDGE_CTL_VGA) {
+					ctrl_err(ctrl,
+						 "Cannot remove display device %s\n",
+						 pci_name(dev));
+					return -EINVAL;
+				}
+			}
+		}
+	}
+
 	/*
 	 * Stopping an SR-IOV PF device removes all the associated VFs,
 	 * which will update the bus->devices list and confuse the
@@ -101,17 +117,6 @@ int pciehp_unconfigure_device(struct slo
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
 					 bus_list) {
 		pci_dev_get(dev);
-		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
-			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
-			if (bctl & PCI_BRIDGE_CTL_VGA) {
-				ctrl_err(ctrl,
-					 "Cannot remove display device %s\n",
-					 pci_name(dev));
-				pci_dev_put(dev);
-				rc = -EINVAL;
-				break;
-			}
-		}
 		pci_stop_and_remove_bus_device(dev);
 		/*
 		 * Ensure that no new Requests will be generated from

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-27 15:11 ` [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that Yinghai Lu
@ 2013-07-29 19:58   ` Bjorn Helgaas
  2013-07-29 20:32     ` Don Dutile
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2013-07-29 19:58 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-pci, Jiang Liu, Alexander Duyck, Donald Dutile, Greg Rose

On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> VF need to be removed via virtfn_remove to make sure ref to PF
> is put back.
>
> Some driver (like ixgbe) does not call pci_disable_sriov() if
> sriov is enabled via /sys/.../sriov_numvfs setting.
> ixgbe does allow driver for PF get detached, but still have VFs
> around.

Is this something ixgbe should be doing differently?

I'm not 100% sold on the idea of the VFs staying active after the
driver releases the PF.  It seems asymmetric because I think the
driver has to claim the PF to *enable* the VFs, but we don't disable
them when releasing the PF.

What's the use case for detaching the PF driver while the VFs are active?

> But how about PF get removed via /sys or pciehp finally?
>
> During hot-remove, VF will still hold one ref to PF and it
> prevent PF to be removed.
> That make the next hot-add fails, as old PF dev struct is still around.
>
> We need to add pci_disable_sriov() calling during stop PF .
>
> Need this one for v3.11

Don had a concern that there might be a regression here ... I'm a bit
confused on the details, but you guys need to come to agreement that
this doesn't make things worse.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jiang Liu <liuj97@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Donald Dutile <ddutile@redhat.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
>
> ---
>  drivers/pci/remove.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>                 pci_proc_detach_device(dev);
>                 pci_remove_sysfs_dev_files(dev);
>                 device_del(&dev->dev);
> +               /* remove VF, if PF driver skip that */
> +               pci_disable_sriov(dev);
>                 dev->is_added = 0;
>         }
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 19:58   ` Bjorn Helgaas
@ 2013-07-29 20:32     ` Don Dutile
  2013-07-29 21:07       ` Alexander Duyck
  2013-08-01 20:16       ` Bjorn Helgaas
  0 siblings, 2 replies; 25+ messages in thread
From: Don Dutile @ 2013-07-29 20:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, linux-pci, Jiang Liu, Alexander Duyck, Greg Rose

On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  wrote:
>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> VF need to be removed via virtfn_remove to make sure ref to PF
>> is put back.
>>
>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>> sriov is enabled via /sys/.../sriov_numvfs setting.
>> ixgbe does allow driver for PF get detached, but still have VFs
>> around.
>
> Is this something ixgbe should be doing differently?
>
> I'm not 100% sold on the idea of the VFs staying active after the
> driver releases the PF.  It seems asymmetric because I think the
> driver has to claim the PF to *enable* the VFs, but we don't disable
> them when releasing the PF.
>
> What's the use case for detaching the PF driver while the VFs are active?
>
VF's assigned to (KVM) guest (via device-assignment).
Virtually, it's as if the enet cable is unplugged to the VF in the guest --
the device is still there (the PF wasn't unplugged, just the driver de-configured).

Pre-sysfs-based configuration, the std way to configure the VFs into
a system was to unload the PF driver, and reload it with a vf-enabling parameter
(like max_vfs=<n> in the case of ixgbe, igb).
Now, if someone unloaded the PF driver in the host, the unplanned removal
of the PF enabled the VF to crash the host (maybe AlexD can provide the
details how that occurred).
So, the solution was to 'pause' the VF operation and let packets drop
in the guest, and re-enable the VF operation if the PF driver was re-configured.

So, as I stated in previous postings, this patch is acceptable if
it doesn't cause a guest crash when a VF is assigned to a KVM guest.
If you tested this case, then please state as such in the posting.
If not, then can AlexD test this case ?

- Don
>> But how about PF get removed via /sys or pciehp finally?
>>
>> During hot-remove, VF will still hold one ref to PF and it
>> prevent PF to be removed.
>> That make the next hot-add fails, as old PF dev struct is still around.
>>
>> We need to add pci_disable_sriov() calling during stop PF .
>>
>> Need this one for v3.11
>
> Don had a concern that there might be a regression here ... I'm a bit
> confused on the details, but you guys need to come to agreement that
> this doesn't make things worse.
>
>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>> Cc: Jiang Liu<liuj97@gmail.com>
>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>> Cc: Donald Dutile<ddutile@redhat.com>
>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>
>> ---
>>   drivers/pci/remove.c |    2 ++
>>   1 file changed, 2 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>>                  pci_proc_detach_device(dev);
>>                  pci_remove_sysfs_dev_files(dev);
>>                  device_del(&dev->dev);
>> +               /* remove VF, if PF driver skip that */
>> +               pci_disable_sriov(dev);
>>                  dev->is_added = 0;
>>          }
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 20:32     ` Don Dutile
@ 2013-07-29 21:07       ` Alexander Duyck
  2013-07-29 21:31         ` Don Dutile
  2013-08-01 20:16       ` Bjorn Helgaas
  1 sibling, 1 reply; 25+ messages in thread
From: Alexander Duyck @ 2013-07-29 21:07 UTC (permalink / raw)
  To: Don Dutile
  Cc: Bjorn Helgaas, Yinghai Lu, linux-pci, Jiang Liu, Greg Rose,
	Kirsher, Jeffrey T

On 07/29/2013 01:32 PM, Don Dutile wrote:
> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  wrote:
>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>> (PCI: Simplify IOV implementation and fix reference count races)
>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>> is put back.
>>>
>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>> ixgbe does allow driver for PF get detached, but still have VFs
>>> around.
>>
>> Is this something ixgbe should be doing differently?
>>
>> I'm not 100% sold on the idea of the VFs staying active after the
>> driver releases the PF.  It seems asymmetric because I think the
>> driver has to claim the PF to *enable* the VFs, but we don't disable
>> them when releasing the PF.
>>
>> What's the use case for detaching the PF driver while the VFs are
>> active?
>>
> VF's assigned to (KVM) guest (via device-assignment).
> Virtually, it's as if the enet cable is unplugged to the VF in the
> guest --
> the device is still there (the PF wasn't unplugged, just the driver
> de-configured).
>
> Pre-sysfs-based configuration, the std way to configure the VFs into
> a system was to unload the PF driver, and reload it with a vf-enabling
> parameter
> (like max_vfs=<n> in the case of ixgbe, igb).
> Now, if someone unloaded the PF driver in the host, the unplanned removal
> of the PF enabled the VF to crash the host (maybe AlexD can provide the
> details how that occurred).
> So, the solution was to 'pause' the VF operation and let packets drop
> in the guest, and re-enable the VF operation if the PF driver was
> re-configured.
>
> So, as I stated in previous postings, this patch is acceptable if
> it doesn't cause a guest crash when a VF is assigned to a KVM guest.
> If you tested this case, then please state as such in the posting.
> If not, then can AlexD test this case ?
>
> - Don

I actually haven't done much with direct assignment to guests.  I
believe it was Greg who did that work to fix this issue.

I'm adding Jeff Kirsher to the CC.  Perhaps he can pull this patch into
a copy of the net tree and submit it to our validation team for testing
to see if they end up being able to reproduce the kernel panic issue
that was originally addressed by allowing the VFs to be persistent.

Thanks,

Alex

>>> But how about PF get removed via /sys or pciehp finally?
>>>
>>> During hot-remove, VF will still hold one ref to PF and it
>>> prevent PF to be removed.
>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>
>>> We need to add pci_disable_sriov() calling during stop PF .
>>>
>>> Need this one for v3.11
>>
>> Don had a concern that there might be a regression here ... I'm a bit
>> confused on the details, but you guys need to come to agreement that
>> this doesn't make things worse.
>>
>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>> Cc: Jiang Liu<liuj97@gmail.com>
>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>>> Cc: Donald Dutile<ddutile@redhat.com>
>>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>>
>>> ---
>>>   drivers/pci/remove.c |    2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> Index: linux-2.6/drivers/pci/remove.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/pci/remove.c
>>> +++ linux-2.6/drivers/pci/remove.c
>>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>>>                  pci_proc_detach_device(dev);
>>>                  pci_remove_sysfs_dev_files(dev);
>>>                  device_del(&dev->dev);
>>> +               /* remove VF, if PF driver skip that */
>>> +               pci_disable_sriov(dev);
>>>                  dev->is_added = 0;
>>>          }
>>>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 21:07       ` Alexander Duyck
@ 2013-07-29 21:31         ` Don Dutile
  2013-07-29 21:52           ` Jeff Kirsher
  0 siblings, 1 reply; 25+ messages in thread
From: Don Dutile @ 2013-07-29 21:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Bjorn Helgaas, Yinghai Lu, linux-pci, Jiang Liu, Greg Rose,
	Kirsher, Jeffrey T

On 07/29/2013 05:07 PM, Alexander Duyck wrote:
> On 07/29/2013 01:32 PM, Don Dutile wrote:
>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>> is put back.
>>>>
>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>> around.
>>>
>>> Is this something ixgbe should be doing differently?
>>>
>>> I'm not 100% sold on the idea of the VFs staying active after the
>>> driver releases the PF.  It seems asymmetric because I think the
>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>> them when releasing the PF.
>>>
>>> What's the use case for detaching the PF driver while the VFs are
>>> active?
>>>
>> VF's assigned to (KVM) guest (via device-assignment).
>> Virtually, it's as if the enet cable is unplugged to the VF in the
>> guest --
>> the device is still there (the PF wasn't unplugged, just the driver
>> de-configured).
>>
>> Pre-sysfs-based configuration, the std way to configure the VFs into
>> a system was to unload the PF driver, and reload it with a vf-enabling
>> parameter
>> (like max_vfs=<n>  in the case of ixgbe, igb).
>> Now, if someone unloaded the PF driver in the host, the unplanned removal
>> of the PF enabled the VF to crash the host (maybe AlexD can provide the
>> details how that occurred).
>> So, the solution was to 'pause' the VF operation and let packets drop
>> in the guest, and re-enable the VF operation if the PF driver was
>> re-configured.
>>
>> So, as I stated in previous postings, this patch is acceptable if
>> it doesn't cause a guest crash when a VF is assigned to a KVM guest.
>> If you tested this case, then please state as such in the posting.
>> If not, then can AlexD test this case ?
>>
>> - Don
>
> I actually haven't done much with direct assignment to guests.  I
> believe it was Greg who did that work to fix this issue.
>
> I'm adding Jeff Kirsher to the CC.  Perhaps he can pull this patch into
> a copy of the net tree and submit it to our validation team for testing
> to see if they end up being able to reproduce the kernel panic issue
> that was originally addressed by allowing the VFs to be persistent.
>
I'd appreciate hearing Jeff's test results....

> Thanks,
>
> Alex
>
>>>> But how about PF get removed via /sys or pciehp finally?
>>>>
>>>> During hot-remove, VF will still hold one ref to PF and it
>>>> prevent PF to be removed.
>>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>>
>>>> We need to add pci_disable_sriov() calling during stop PF .
>>>>
>>>> Need this one for v3.11
>>>
>>> Don had a concern that there might be a regression here ... I'm a bit
>>> confused on the details, but you guys need to come to agreement that
>>> this doesn't make things worse.
>>>
>>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>>> Cc: Jiang Liu<liuj97@gmail.com>
>>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>>>> Cc: Donald Dutile<ddutile@redhat.com>
>>>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>>>
>>>> ---
>>>>    drivers/pci/remove.c |    2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> Index: linux-2.6/drivers/pci/remove.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/drivers/pci/remove.c
>>>> +++ linux-2.6/drivers/pci/remove.c
>>>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>>>>                   pci_proc_detach_device(dev);
>>>>                   pci_remove_sysfs_dev_files(dev);
>>>>                   device_del(&dev->dev);
>>>> +               /* remove VF, if PF driver skip that */
>>>> +               pci_disable_sriov(dev);
>>>>                   dev->is_added = 0;
>>>>           }
>>>>
>>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 21:31         ` Don Dutile
@ 2013-07-29 21:52           ` Jeff Kirsher
  2013-07-29 23:14             ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Kirsher @ 2013-07-29 21:52 UTC (permalink / raw)
  To: Don Dutile
  Cc: Alexander Duyck, Bjorn Helgaas, Yinghai Lu, linux-pci, Jiang Liu,
	Greg Rose

[-- Attachment #1: Type: text/plain, Size: 4493 bytes --]

On Mon, 2013-07-29 at 17:31 -0400, Don Dutile wrote:
> On 07/29/2013 05:07 PM, Alexander Duyck wrote:
> > On 07/29/2013 01:32 PM, Don Dutile wrote:
> >> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
> >>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
> >>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> >>>> (PCI: Simplify IOV implementation and fix reference count races)
> >>>> VF need to be removed via virtfn_remove to make sure ref to PF
> >>>> is put back.
> >>>>
> >>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
> >>>> sriov is enabled via /sys/.../sriov_numvfs setting.
> >>>> ixgbe does allow driver for PF get detached, but still have VFs
> >>>> around.
> >>>
> >>> Is this something ixgbe should be doing differently?
> >>>
> >>> I'm not 100% sold on the idea of the VFs staying active after the
> >>> driver releases the PF.  It seems asymmetric because I think the
> >>> driver has to claim the PF to *enable* the VFs, but we don't disable
> >>> them when releasing the PF.
> >>>
> >>> What's the use case for detaching the PF driver while the VFs are
> >>> active?
> >>>
> >> VF's assigned to (KVM) guest (via device-assignment).
> >> Virtually, it's as if the enet cable is unplugged to the VF in the
> >> guest --
> >> the device is still there (the PF wasn't unplugged, just the driver
> >> de-configured).
> >>
> >> Pre-sysfs-based configuration, the std way to configure the VFs into
> >> a system was to unload the PF driver, and reload it with a vf-enabling
> >> parameter
> >> (like max_vfs=<n>  in the case of ixgbe, igb).
> >> Now, if someone unloaded the PF driver in the host, the unplanned removal
> >> of the PF enabled the VF to crash the host (maybe AlexD can provide the
> >> details how that occurred).
> >> So, the solution was to 'pause' the VF operation and let packets drop
> >> in the guest, and re-enable the VF operation if the PF driver was
> >> re-configured.
> >>
> >> So, as I stated in previous postings, this patch is acceptable if
> >> it doesn't cause a guest crash when a VF is assigned to a KVM guest.
> >> If you tested this case, then please state as such in the posting.
> >> If not, then can AlexD test this case ?
> >>
> >> - Don
> >
> > I actually haven't done much with direct assignment to guests.  I
> > believe it was Greg who did that work to fix this issue.
> >
> > I'm adding Jeff Kirsher to the CC.  Perhaps he can pull this patch into
> > a copy of the net tree and submit it to our validation team for testing
> > to see if they end up being able to reproduce the kernel panic issue
> > that was originally addressed by allowing the VFs to be persistent.
> >
> I'd appreciate hearing Jeff's test results....

Can I apply this patch as a standalone? or will I need the entire 3
patch series?

> 
> > Thanks,
> >
> > Alex
> >
> >>>> But how about PF get removed via /sys or pciehp finally?
> >>>>
> >>>> During hot-remove, VF will still hold one ref to PF and it
> >>>> prevent PF to be removed.
> >>>> That make the next hot-add fails, as old PF dev struct is still around.
> >>>>
> >>>> We need to add pci_disable_sriov() calling during stop PF .
> >>>>
> >>>> Need this one for v3.11
> >>>
> >>> Don had a concern that there might be a regression here ... I'm a bit
> >>> confused on the details, but you guys need to come to agreement that
> >>> this doesn't make things worse.
> >>>
> >>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
> >>>> Cc: Jiang Liu<liuj97@gmail.com>
> >>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
> >>>> Cc: Donald Dutile<ddutile@redhat.com>
> >>>> Cc: Greg Rose<gregory.v.rose@intel.com>
> >>>>
> >>>> ---
> >>>>    drivers/pci/remove.c |    2 ++
> >>>>    1 file changed, 2 insertions(+)
> >>>>
> >>>> Index: linux-2.6/drivers/pci/remove.c
> >>>> ===================================================================
> >>>> --- linux-2.6.orig/drivers/pci/remove.c
> >>>> +++ linux-2.6/drivers/pci/remove.c
> >>>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
> >>>>                   pci_proc_detach_device(dev);
> >>>>                   pci_remove_sysfs_dev_files(dev);
> >>>>                   device_del(&dev->dev);
> >>>> +               /* remove VF, if PF driver skip that */
> >>>> +               pci_disable_sriov(dev);
> >>>>                   dev->is_added = 0;
> >>>>           }
> >>>>
> >>
> >
> 



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 21:52           ` Jeff Kirsher
@ 2013-07-29 23:14             ` Yinghai Lu
  2013-07-29 23:23               ` Jeff Kirsher
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-07-29 23:14 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Don Dutile, Alexander Duyck, Bjorn Helgaas, linux-pci, Jiang Liu,
	Greg Rose

On Mon, Jul 29, 2013 at 2:52 PM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> Can I apply this patch as a standalone? or will I need the entire 3
> patch series?

would be better if you can test those three on top of pci/for-linus branch.
http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 23:14             ` Yinghai Lu
@ 2013-07-29 23:23               ` Jeff Kirsher
  2013-07-30 15:04                 ` Don Dutile
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Kirsher @ 2013-07-29 23:23 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Don Dutile, Alexander Duyck, Bjorn Helgaas, linux-pci, Jiang Liu,
	Greg Rose

[-- Attachment #1: Type: text/plain, Size: 516 bytes --]

On Mon, 2013-07-29 at 16:14 -0700, Yinghai Lu wrote:
> On Mon, Jul 29, 2013 at 2:52 PM, Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
> > Can I apply this patch as a standalone? or will I need the entire 3
> > patch series?
> 
> would be better if you can test those three on top of pci/for-linus branch.
> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus

I will have to do that, because I was only able to get patch 3 of the
series to apply to David Miller's net tree.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 23:23               ` Jeff Kirsher
@ 2013-07-30 15:04                 ` Don Dutile
  0 siblings, 0 replies; 25+ messages in thread
From: Don Dutile @ 2013-07-30 15:04 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: Yinghai Lu, Alexander Duyck, Bjorn Helgaas, linux-pci, Jiang Liu,
	Greg Rose

On 07/29/2013 07:23 PM, Jeff Kirsher wrote:
> On Mon, 2013-07-29 at 16:14 -0700, Yinghai Lu wrote:
>> On Mon, Jul 29, 2013 at 2:52 PM, Jeff Kirsher
>> <jeffrey.t.kirsher@intel.com>  wrote:
>>> Can I apply this patch as a standalone? or will I need the entire 3
>>> patch series?
>>
>> would be better if you can test those three on top of pci/for-linus branch.
>> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus
>
> I will have to do that, because I was only able to get patch 3 of the
> series to apply to David Miller's net tree.
I could only find 1/3 of the other 2 patches; don't know why 2/3 is missing in my mailbox.

Given that Yinghai has requested all 3, and you agreed, sounds like you have your answer. ;-)

- Don


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-07-29 20:32     ` Don Dutile
  2013-07-29 21:07       ` Alexander Duyck
@ 2013-08-01 20:16       ` Bjorn Helgaas
  2013-08-01 21:21         ` Don Dutile
  1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2013-08-01 20:16 UTC (permalink / raw)
  To: Don Dutile; +Cc: Yinghai Lu, linux-pci, Jiang Liu, Alexander Duyck, Greg Rose

On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile <ddutile@redhat.com> wrote:
> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>
>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  wrote:
>>>
>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>> (PCI: Simplify IOV implementation and fix reference count races)
>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>> is put back.

OK, I'm lost here.  I think this is the scenario where Yinghai saw a
regression (please correct me if not):

  00:03.0 Root port to [bus 02]
  02:00.0 SR-IOV NIC (PF in slot 2)
  02:00.1 VF (for example)

  # echo -n 0 > /sys/bus/pci/slots/2/power
  02:00.0 PF is powered off
  02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
actually deallocated

  # echo -n 1 > /sys/bus/pci/slots/2/power
  pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
0000:02:00, cannot hot-add

Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
count races"), this scenario (powering the slot off then back on)
apparently worked, and hot-adding 02:00.0 worked fine.

But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
still existed even after we removed power from the PF.  But obviously
the hardware VFs are disabled when we power the slot back up.  It
doesn't make sense to me to have pci_devs for these VFs that are no
longer enabled, so maybe I'm missing something here.

>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>> ixgbe does allow driver for PF get detached, but still have VFs
>>> around.
>>
>> Is this something ixgbe should be doing differently?
>>
>> I'm not 100% sold on the idea of the VFs staying active after the
>> driver releases the PF.  It seems asymmetric because I think the
>> driver has to claim the PF to *enable* the VFs, but we don't disable
>> them when releasing the PF.
>>
>> What's the use case for detaching the PF driver while the VFs are active?
>>
> VF's assigned to (KVM) guest (via device-assignment).
> Virtually, it's as if the enet cable is unplugged to the VF in the guest --
> the device is still there (the PF wasn't unplugged, just the driver
> de-configured).

OK, but why is it necessary to detach and reattach the PF driver at
all?  Are you trying to update the PF driver or something?  Change the
number of VFs?

> Pre-sysfs-based configuration, the std way to configure the VFs into
> a system was to unload the PF driver, and reload it with a vf-enabling
> parameter
> (like max_vfs=<n> in the case of ixgbe, igb).

Yes.  But I assume the first time the PF was loaded, there were no VFs
enabled.  So disabling VFs at unload wouldn't cause any problem there.
 Then you reload the driver with "max_vfs=<n>".  The driver enables
VFs.  Is there any reason to unload the PF driver again?

> Now, if someone unloaded the PF driver in the host, the unplanned removal
> of the PF enabled the VF to crash the host (maybe AlexD can provide the
> details how that occurred).
> So, the solution was to 'pause' the VF operation and let packets drop
> in the guest, and re-enable the VF operation if the PF driver was
> re-configured.

I don't get this.  Why should we tolerate "unplanned removal of the
PF"?  If you yank the card, I would *expect* anything using it to
crash.

Bjorn

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-08-01 20:16       ` Bjorn Helgaas
@ 2013-08-01 21:21         ` Don Dutile
  2013-08-01 21:41           ` Bjorn Helgaas
  2013-08-01 21:55           ` Alexander Duyck
  0 siblings, 2 replies; 25+ messages in thread
From: Don Dutile @ 2013-08-01 21:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, linux-pci, Jiang Liu, Alexander Duyck, Greg Rose

On 08/01/2013 04:16 PM, Bjorn Helgaas wrote:
> On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile<ddutile@redhat.com>  wrote:
>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>>
>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
>>>>
>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>> is put back.
>
> OK, I'm lost here.  I think this is the scenario where Yinghai saw a
> regression (please correct me if not):
>
>    00:03.0 Root port to [bus 02]
>    02:00.0 SR-IOV NIC (PF in slot 2)
>    02:00.1 VF (for example)
>
>    # echo -n 0>  /sys/bus/pci/slots/2/power
>    02:00.0 PF is powered off
>    02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
> and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
> actually deallocated
>
>    # echo -n 1>  /sys/bus/pci/slots/2/power
>    pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
> 0000:02:00, cannot hot-add
>
> Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
> count races"), this scenario (powering the slot off then back on)
> apparently worked, and hot-adding 02:00.0 worked fine.
>
> But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
> still existed even after we removed power from the PF.  But obviously
> the hardware VFs are disabled when we power the slot back up.  It
> doesn't make sense to me to have pci_devs for these VFs that are no
> longer enabled, so maybe I'm missing something here.
>
>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>> around.
>>>
>>> Is this something ixgbe should be doing differently?
>>>
>>> I'm not 100% sold on the idea of the VFs staying active after the
>>> driver releases the PF.  It seems asymmetric because I think the
>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>> them when releasing the PF.
>>>
>>> What's the use case for detaching the PF driver while the VFs are active?
>>>
>> VF's assigned to (KVM) guest (via device-assignment).
>> Virtually, it's as if the enet cable is unplugged to the VF in the guest --
>> the device is still there (the PF wasn't unplugged, just the driver
>> de-configured).
>
> OK, but why is it necessary to detach and reattach the PF driver at
> all?  Are you trying to update the PF driver or something?  Change the
> number of VFs?
>
One case is to update the PF driver; the other is 'operator error' :-/
Want to not crash the guest w/VFs, and allow operator to 'mend their error',
i.e., re-load the PF driver back.

>> Pre-sysfs-based configuration, the std way to configure the VFs into
>> a system was to unload the PF driver, and reload it with a vf-enabling
>> parameter
>> (like max_vfs=<n>  in the case of ixgbe, igb).
>
> Yes.  But I assume the first time the PF was loaded, there were no VFs
> enabled.  So disabling VFs at unload wouldn't cause any problem there.
>   Then you reload the driver with "max_vfs=<n>".  The driver enables
> VFs.  Is there any reason to unload the PF driver again?
>
>> Now, if someone unloaded the PF driver in the host, the unplanned removal
>> of the PF enabled the VF to crash the host (maybe AlexD can provide the
>> details how that occurred).
>> So, the solution was to 'pause' the VF operation and let packets drop
>> in the guest, and re-enable the VF operation if the PF driver was
>> re-configured.
>
> I don't get this.  Why should we tolerate "unplanned removal of the
> PF"?  If you yank the card, I would *expect* anything using it to
> crash.
>
Last time I checked, PCIe handles surprise removal, electrically,
on unplanned removal -- it's capacitively coupled so crashes due to
odd signalling transition can be handled (malformed pcie packet, w/c,
which aer handles/dismisses).

Note, the i(x)gb(e) code was designed to handle operator error around
PF driver removal.  hot plug/unplug was probably not tested against
the 'save the VF state' code in the i(x)gb(e)vf drivers.
Again, try asking the driver owner(s).
  
> Bjorn


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-08-01 21:21         ` Don Dutile
@ 2013-08-01 21:41           ` Bjorn Helgaas
  2013-08-01 21:55           ` Alexander Duyck
  1 sibling, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2013-08-01 21:41 UTC (permalink / raw)
  To: Don Dutile; +Cc: Yinghai Lu, linux-pci, Jiang Liu, Alexander Duyck, Greg Rose

On Thu, Aug 1, 2013 at 3:21 PM, Don Dutile <ddutile@redhat.com> wrote:
> On 08/01/2013 04:16 PM, Bjorn Helgaas wrote:
>>
>> On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile<ddutile@redhat.com>  wrote:
>>>
>>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>>>
>>>>
>>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
>>>>>
>>>>>
>>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>>> is put back.
>>
>>
>> OK, I'm lost here.  I think this is the scenario where Yinghai saw a
>> regression (please correct me if not):
>>
>>    00:03.0 Root port to [bus 02]
>>    02:00.0 SR-IOV NIC (PF in slot 2)
>>    02:00.1 VF (for example)
>>
>>    # echo -n 0>  /sys/bus/pci/slots/2/power
>>    02:00.0 PF is powered off
>>    02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
>> and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
>> actually deallocated
>>
>>    # echo -n 1>  /sys/bus/pci/slots/2/power
>>    pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
>> 0000:02:00, cannot hot-add
>>
>> Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
>> count races"), this scenario (powering the slot off then back on)
>> apparently worked, and hot-adding 02:00.0 worked fine.
>>
>> But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
>> still existed even after we removed power from the PF.  But obviously
>> the hardware VFs are disabled when we power the slot back up.  It
>> doesn't make sense to me to have pci_devs for these VFs that are no
>> longer enabled, so maybe I'm missing something here.
>>
>>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>>> around.
>>>>
>>>>
>>>> Is this something ixgbe should be doing differently?
>>>>
>>>> I'm not 100% sold on the idea of the VFs staying active after the
>>>> driver releases the PF.  It seems asymmetric because I think the
>>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>>> them when releasing the PF.
>>>>
>>>> What's the use case for detaching the PF driver while the VFs are
>>>> active?
>>>>
>>> VF's assigned to (KVM) guest (via device-assignment).
>>> Virtually, it's as if the enet cable is unplugged to the VF in the guest
>>> --
>>> the device is still there (the PF wasn't unplugged, just the driver
>>> de-configured).
>>
>>
>> OK, but why is it necessary to detach and reattach the PF driver at
>> all?  Are you trying to update the PF driver or something?  Change the
>> number of VFs?
>>
> One case is to update the PF driver; the other is 'operator error' :-/
> Want to not crash the guest w/VFs, and allow operator to 'mend their error',
> i.e., re-load the PF driver back.

Neither of these sounds very compelling to me.

We don't support updating drivers for non-SR-IOV devices; why should
we bother doing something different just because this device has
SR-IOV enabled?

Is there a way to address the operator error issue by making rmmod
fail if there are VFs assigned to a guest?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
  2013-08-01 21:21         ` Don Dutile
  2013-08-01 21:41           ` Bjorn Helgaas
@ 2013-08-01 21:55           ` Alexander Duyck
  1 sibling, 0 replies; 25+ messages in thread
From: Alexander Duyck @ 2013-08-01 21:55 UTC (permalink / raw)
  To: Don Dutile; +Cc: Bjorn Helgaas, Yinghai Lu, linux-pci, Jiang Liu, Greg Rose

On 08/01/2013 02:21 PM, Don Dutile wrote:
> On 08/01/2013 04:16 PM, Bjorn Helgaas wrote:
>> On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile<ddutile@redhat.com>  wrote:
>>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>>>
>>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  
>>>> wrote:
>>>>>
>>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>>> is put back.
>>
>> OK, I'm lost here.  I think this is the scenario where Yinghai saw a
>> regression (please correct me if not):
>>
>>    00:03.0 Root port to [bus 02]
>>    02:00.0 SR-IOV NIC (PF in slot 2)
>>    02:00.1 VF (for example)
>>
>>    # echo -n 0>  /sys/bus/pci/slots/2/power
>>    02:00.0 PF is powered off
>>    02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
>> and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
>> actually deallocated
>>
>>    # echo -n 1>  /sys/bus/pci/slots/2/power
>>    pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
>> 0000:02:00, cannot hot-add
>>
>> Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
>> count races"), this scenario (powering the slot off then back on)
>> apparently worked, and hot-adding 02:00.0 worked fine.
>>
>> But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
>> still existed even after we removed power from the PF.  But obviously
>> the hardware VFs are disabled when we power the slot back up.  It
>> doesn't make sense to me to have pci_devs for these VFs that are no
>> longer enabled, so maybe I'm missing something here.
>>

Actually this is part of what I don't understand as well.  I would have
thought that if the device was hot-plugged that the VF devices would
have been removed as devices on the PCI bus since they all lived on the
same bus.  Why is it the VFs are still present to hold a reference to
the PF at all?

I'm not sure why we even have to disable SR-IOV in this case.  It seems
like the hot-plug should have walked though the bus already and removed
the VFs.

>>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>>> around.
>>>>
>>>> Is this something ixgbe should be doing differently?
>>>>
>>>> I'm not 100% sold on the idea of the VFs staying active after the
>>>> driver releases the PF.  It seems asymmetric because I think the
>>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>>> them when releasing the PF.
>>>>
>>>> What's the use case for detaching the PF driver while the VFs are
>>>> active?
>>>>
>>> VF's assigned to (KVM) guest (via device-assignment).
>>> Virtually, it's as if the enet cable is unplugged to the VF in the
>>> guest --
>>> the device is still there (the PF wasn't unplugged, just the driver
>>> de-configured).
>>
>> OK, but why is it necessary to detach and reattach the PF driver at
>> all?  Are you trying to update the PF driver or something?  Change the
>> number of VFs?
>>
> One case is to update the PF driver; the other is 'operator error' :-/
> Want to not crash the guest w/VFs, and allow operator to 'mend their
> error',
> i.e., re-load the PF driver back.
>

That pretty much matches my understanding of things.  Basically we
cannot modify the number of VFs until all of the VFs are unassigned.

>>> Pre-sysfs-based configuration, the std way to configure the VFs into
>>> a system was to unload the PF driver, and reload it with a vf-enabling
>>> parameter
>>> (like max_vfs=<n>  in the case of ixgbe, igb).
>>
>> Yes.  But I assume the first time the PF was loaded, there were no VFs
>> enabled.  So disabling VFs at unload wouldn't cause any problem there.
>>   Then you reload the driver with "max_vfs=<n>".  The driver enables
>> VFs.  Is there any reason to unload the PF driver again?
>>
>>> Now, if someone unloaded the PF driver in the host, the unplanned
>>> removal
>>> of the PF enabled the VF to crash the host (maybe AlexD can provide the
>>> details how that occurred).
>>> So, the solution was to 'pause' the VF operation and let packets drop
>>> in the guest, and re-enable the VF operation if the PF driver was
>>> re-configured.
>>
>> I don't get this.  Why should we tolerate "unplanned removal of the
>> PF"?  If you yank the card, I would *expect* anything using it to
>> crash.
>>
> Last time I checked, PCIe handles surprise removal, electrically,
> on unplanned removal -- it's capacitively coupled so crashes due to
> odd signalling transition can be handled (malformed pcie packet, w/c,
> which aer handles/dismisses).
>

PCIe handles the surprise removal, but the kernel doesn't do much about
it.  From what I have seen in the past surprise removal results in
completion timeouts and reads returning ~0.  The device itself doesn't
get removed from the bus by the device going away.

> Note, the i(x)gb(e) code was designed to handle operator error around
> PF driver removal.  hot plug/unplug was probably not tested against
> the 'save the VF state' code in the i(x)gb(e)vf drivers.
> Again, try asking the driver owner(s).

I'm suspecting hot-plug will probably cause the same issues the we saw
with the PF driver removal.  I'm still waiting on the response from
testing before I know one way or the other.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-07-27 15:11 [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Yinghai Lu
  2013-07-27 15:11 ` [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that Yinghai Lu
  2013-07-27 15:11 ` [PATCH v2 1/3] PCI/pciehp: Separate VGA checking out from loop Yinghai Lu
@ 2013-08-07  4:01 ` Bjorn Helgaas
  2013-08-07  6:34   ` Yinghai Lu
  2 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2013-08-07  4:01 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige

[+cc Kenji]

On Sat, Jul 27, 2013 at 08:11:06AM -0700, Yinghai Lu wrote:
> commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> broke the pcie native hotplug with SRIOV enabled: PF is not freed
> during hot-remove, and later hot-add do not work as old pci_dev
> is still around, and can not create new pci_dev.
> 
> That commit need VFs to be removed via pci_disable_sriov/virtfn_remove
> to make sure ref to PF is put back.

Huh.  I wish we didn't have virtfn_remove() at all.  I wish the
normal device removal path, i.e., pci_stop_and_remove_bus_device(),
could deal with VFs directly.  I don't know all the history there,
so maybe there's some reason that's not feasible.

> So we can not call stop_and_remove for VF before PF, that will
> make VF get removed directly before PF's driver try to use
> pci_disable_sriov/virtfn_remove to do it.
> 
> Solution is separating stop and remove in two iterations.
> 
> In first iteration, we stop VF driver at first with iterating devices
> reversely, and later during stop PF driver, disable_sriov will use
> virtfn_remove to remove VFs.
> 
> Also some driver (like mlx4_core) need VF's driver get stopped before PF's.
> 
> Need this one for v3.11.
> 
> -v2: separate VGA checking to another patch according to Bjorn.
>      and after patches that make pciehp to be built-in
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Yijing Wang <wangyijing@huawei.com>
> Cc: Jiang Liu <liuj97@gmail.com>
> 
> ---
>  drivers/pci/hotplug/pciehp_pci.c |   15 +++++++++++++--
>  drivers/pci/pci.h                |    3 +++
>  drivers/pci/remove.c             |    4 ++--
>  3 files changed, 18 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> @@ -109,6 +109,13 @@ int pciehp_unconfigure_device(struct slo
>  	}
>  
>  	/*
> +	 * Now VF need to be removed via virtfn_remove to make
> +	 * sure ref to PF is put back. Some driver (mlx4_core) need
> +	 * VF's driver get stopped before PF.
> +	 * So we need to stop VF driver at first, that means
> +	 * loop reversely, and later during stop PF driver,
> +	 * disable_sriov will use virtfn_remove to remove VFs.
> +	 *
>  	 * Stopping an SR-IOV PF device removes all the associated VFs,
>  	 * which will update the bus->devices list and confuse the
>  	 * iterator.  Therefore, iterate in reverse so we remove the VFs
> @@ -116,8 +123,7 @@ int pciehp_unconfigure_device(struct slo
>  	 */
>  	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
>  					 bus_list) {
> -		pci_dev_get(dev);
> -		pci_stop_and_remove_bus_device(dev);
> +		pci_stop_bus_device(dev);
>  		/*
>  		 * Ensure that no new Requests will be generated from
>  		 * the device.
> @@ -128,6 +134,11 @@ int pciehp_unconfigure_device(struct slo
>  			command |= PCI_COMMAND_INTX_DISABLE;
>  			pci_write_config_word(dev, PCI_COMMAND, command);

This "disable bus mastering, SERR reporting, and INTx" stuff seems
like it's in the wrong place.  I don't think it's specific to
pciehp, and it seems like it should be in the generic PCI device
removal path.  Kenji added it with 2326e2b99, "in accordance with
the specification," but I don't know specifically what he was
referring to.

But this isn't trivial and it's not part of your patch, so we can
worry about that later.

>  		}
> +	}
> +
> +	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
> +		pci_dev_get(dev);
> +		pci_remove_bus_device(dev);
>  		pci_dev_put(dev);

I don't see the point of the pci_dev_get()/pci_dev_put() here.  It
doesn't do anything useful, does it?

The pci_dev_get() doesn't help: it will keep a racing path from
removing the dev *after* we call pci_dev_get(), but that racing path
could just as easily remove the dev *before* we call pci_dev_get().

And there's no reason to hold onto a reference after we call
pci_remove_bus_device(), because we don't do anything else with the
device before we call pci_dev_put().

>  	}
>  
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
>  }
>  EXPORT_SYMBOL(pci_remove_bus);
>  
> -static void pci_stop_bus_device(struct pci_dev *dev)
> +void pci_stop_bus_device(struct pci_dev *dev)
>  {
>  	struct pci_bus *bus = dev->subordinate;
>  	struct pci_dev *child, *tmp;
> @@ -76,7 +76,7 @@ static void pci_stop_bus_device(struct p
>  	pci_stop_dev(dev);
>  }
>  
> -static void pci_remove_bus_device(struct pci_dev *dev)
> +void pci_remove_bus_device(struct pci_dev *dev)
>  {
>  	struct pci_bus *bus = dev->subordinate;
>  	struct pci_dev *child, *tmp;
> Index: linux-2.6/drivers/pci/pci.h
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pci.h
> +++ linux-2.6/drivers/pci/pci.h
> @@ -317,4 +317,7 @@ static inline int pci_dev_specific_reset
>  }
>  #endif
>  
> +void pci_stop_bus_device(struct pci_dev *dev);
> +void pci_remove_bus_device(struct pci_dev *dev);
> +
>  #endif /* DRIVERS_PCI_H */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-07  4:01 ` [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Bjorn Helgaas
@ 2013-08-07  6:34   ` Yinghai Lu
  2013-08-08  1:49     ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-08-07  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige

On Tue, Aug 6, 2013 at 9:01 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Kenji]
>
> On Sat, Jul 27, 2013 at 08:11:06AM -0700, Yinghai Lu wrote:
>> commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> broke the pcie native hotplug with SRIOV enabled: PF is not freed
>> during hot-remove, and later hot-add do not work as old pci_dev
>> is still around, and can not create new pci_dev.
>>
>> That commit need VFs to be removed via pci_disable_sriov/virtfn_remove
>> to make sure ref to PF is put back.
>
> Huh.  I wish we didn't have virtfn_remove() at all.  I wish the
> normal device removal path, i.e., pci_stop_and_remove_bus_device(),
> could deal with VFs directly.  I don't know all the history there,
> so maybe there's some reason that's not feasible.

I had one draft version, but looks more confusing.

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 21a7182..5f8d217 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -132,6 +132,24 @@ failed:
     return rc;
 }

+void virtfn_release(struct pci_dev *virtfn)
+{
+    struct pci_dev *dev;
+    struct pci_sriov *iov;
+    int locked;
+
+    if (!virtfn->is_virtfn)
+        return;
+
+    dev = virtfn->physfn;
+    iov = dev->sriov;
+    locked = mutex_trylock(&iov->dev->sriov->lock);
+    virtfn_remove_bus(dev->bus, virtfn->bus);
+    if (locked)
+        mutex_unlock(&iov->dev->sriov->lock);
+    pci_dev_put(dev);
+}
+
 static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 {
     char buf[VIRTFN_ID_LEN];
@@ -161,12 +179,10 @@ static void virtfn_remove(struct pci_dev *dev,
int id, int reset)

     mutex_lock(&iov->dev->sriov->lock);
     pci_stop_and_remove_bus_device(virtfn);
-    virtfn_remove_bus(dev->bus, virtfn->bus);
     mutex_unlock(&iov->dev->sriov->lock);

     /* balance pci_get_domain_bus_and_slot() */
     pci_dev_put(virtfn);
-    pci_dev_put(dev);
 }

 static int sriov_migration(struct pci_dev *dev)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 8a00c06..1d78e95 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -262,6 +262,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
 void pci_restore_iov_state(struct pci_dev *dev);
 int pci_iov_bus_range(struct pci_bus *bus);
+void virtfn_release(struct pci_dev *dev);

 #else
 static inline int pci_iov_init(struct pci_dev *dev)
@@ -284,6 +285,9 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 {
     return 0;
 }
+void virtfn_release(struct pci_dev *dev)
+{
+}

 #endif /* CONFIG_PCI_IOV */

diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 8fc54b7..46d9ec8 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -38,6 +38,8 @@ static void pci_destroy_dev(struct pci_dev *dev)
     list_del(&dev->bus_list);
     up_write(&pci_bus_sem);

+    virtfn_release(dev);
+
     pci_free_resources(dev);
     put_device(&dev->dev);
 }


>> +     }
>> +
>> +     list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
>> +             pci_dev_get(dev);
>> +             pci_remove_bus_device(dev);
>>               pci_dev_put(dev);
>
> I don't see the point of the pci_dev_get()/pci_dev_put() here.  It
> doesn't do anything useful, does it?
>
> The pci_dev_get() doesn't help: it will keep a racing path from
> removing the dev *after* we call pci_dev_get(), but that racing path
> could just as easily remove the dev *before* we call pci_dev_get().
>
> And there's no reason to hold onto a reference after we call
> pci_remove_bus_device(), because we don't do anything else with the
> device before we call pci_dev_put().

should be ok, just like remove one device from /sys.
device_schedule will keep the last one ref and be put back
after the put from pci_destroy_dev.

Yinghai

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-07  6:34   ` Yinghai Lu
@ 2013-08-08  1:49     ` Yinghai Lu
  2013-08-08  1:54       ` Yinghai Lu
  2013-08-09 17:04       ` Bjorn Helgaas
  0 siblings, 2 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-08-08  1:49 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige

[-- Attachment #1: Type: text/plain, Size: 878 bytes --]

On Tue, Aug 6, 2013 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Aug 6, 2013 at 9:01 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> Huh.  I wish we didn't have virtfn_remove() at all.  I wish the
>> normal device removal path, i.e., pci_stop_and_remove_bus_device(),
>> could deal with VFs directly.  I don't know all the history there,
>> so maybe there's some reason that's not feasible.
>
> I had one draft version, but looks more confusing.

please check attached that make pci_stop_and_remove_bus_device()
could be used with VF.

It could replace
https://patchwork.kernel.org/patch/2834638/
https://patchwork.kernel.org/patch/2834639/

Please choose one of the solutions.

but we still need
https://patchwork.kernel.org/patch/2834640/
as VFs that does not use ARI could be on other virtual bus.
so they will not be removed directly.

Thanks

Yinghai

[-- Attachment #2: virtfn_release.patch --]
[-- Type: application/octet-stream, Size: 4530 bytes --]

Subject: [PATCH] PCI: Release PF ref during removing VF
From: Yinghai Lu <yinghai@kernel.org>

commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
broke the pcie native hotplug with SRIOV enabled: PF is not freed
during hot-remove, and later hot-add do not work as old pci_dev
is still around, and can not create new pci_dev.

That commit need VFs to be removed via pci_disable_sriov/virtfn_remove
to make sure ref to PF is put back.

So we can not call stop_and_remove for VF before PF, that will
make VF get removed directly before PF's driver try to use
pci_disable_sriov/virtfn_remove to do it.

Solution would be
1. separating stop and remove in two iterations.
2. or make pci_stop_and_remove_bus_device could be used on VF.

This patch is second solution:
Separate PF ref releasing from virtfn_remove, all that during
pci_destroy_dev for VFs.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/iov.c    |   34 +++++++++++++++++++++++++++++-----
 drivers/pci/pci.h    |    4 ++++
 drivers/pci/remove.c |   17 +++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/pci/iov.c
===================================================================
--- linux-2.6.orig/drivers/pci/iov.c
+++ linux-2.6/drivers/pci/iov.c
@@ -132,9 +132,37 @@ failed:
 	return rc;
 }
 
-static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+void virtfn_release(struct pci_dev *virtfn)
 {
+	int i;
+	struct pci_dev *dev;
 	char buf[VIRTFN_ID_LEN];
+
+	if (!virtfn->is_virtfn)
+		return;
+
+	dev = virtfn->physfn;
+
+	/*
+	 * Remove link to virtfn for PF.
+	 * pci_disable_sriov() could be called after pci_stop_dev() is
+	 * called for PF. So check sd to avoid spurious sfsfs warning.
+	 */
+	if (dev->dev.kobj.sd)
+		for (i = 0; i < dev->sriov->num_VFs; i++)
+			if ((virtfn_bus(dev, i) == virtfn->bus->number) &&
+			    (virtfn_devfn(dev, i) == virtfn->devfn)) {
+				sprintf(buf, "virtfn%u", i);
+				sysfs_remove_link(&dev->dev.kobj, buf);
+				break;
+			}
+
+	virtfn_remove_bus(dev->bus, virtfn->bus);
+	pci_dev_put(dev);
+}
+
+static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+{
 	struct pci_dev *virtfn;
 	struct pci_sriov *iov = dev->sriov;
 
@@ -149,8 +177,6 @@ static void virtfn_remove(struct pci_dev
 		__pci_reset_function(virtfn);
 	}
 
-	sprintf(buf, "virtfn%u", id);
-	sysfs_remove_link(&dev->dev.kobj, buf);
 	/*
 	 * pci_stop_dev() could have been called for this virtfn already,
 	 * so the directory for the virtfn may have been removed before.
@@ -161,12 +187,10 @@ static void virtfn_remove(struct pci_dev
 
 	mutex_lock(&iov->dev->sriov->lock);
 	pci_stop_and_remove_bus_device(virtfn);
-	virtfn_remove_bus(dev->bus, virtfn->bus);
 	mutex_unlock(&iov->dev->sriov->lock);
 
 	/* balance pci_get_domain_bus_and_slot() */
 	pci_dev_put(virtfn);
-	pci_dev_put(dev);
 }
 
 static int sriov_migration(struct pci_dev *dev)
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -262,6 +262,7 @@ int pci_iov_resource_bar(struct pci_dev
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
 void pci_restore_iov_state(struct pci_dev *dev);
 int pci_iov_bus_range(struct pci_bus *bus);
+void virtfn_release(struct pci_dev *dev);
 
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
@@ -284,6 +285,9 @@ static inline int pci_iov_bus_range(stru
 {
 	return 0;
 }
+void virtfn_release(struct pci_dev *dev)
+{
+}
 
 #endif /* CONFIG_PCI_IOV */
 
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -38,6 +38,8 @@ static void pci_destroy_dev(struct pci_d
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);
 
+	virtfn_release(dev);
+
 	pci_free_resources(dev);
 	put_device(&dev->dev);
 }
@@ -107,8 +109,23 @@ static void pci_remove_bus_device(struct
  */
 void pci_stop_and_remove_bus_device(struct pci_dev *dev)
 {
+#ifdef CONFIG_PCI_ATS
+	struct pci_sriov *iov = NULL;
+	int locked = 0;
+
+	if (dev->is_virtfn) {
+		iov = dev->physfn->sriov;
+		locked = mutex_trylock(&iov->dev->sriov->lock);
+	}
+#endif
+
 	pci_stop_bus_device(dev);
 	pci_remove_bus_device(dev);
+
+#ifdef CONFIG_PCI_ATS
+	if (locked)
+		mutex_unlock(&iov->dev->sriov->lock);
+#endif
 }
 EXPORT_SYMBOL(pci_stop_and_remove_bus_device);
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-08  1:49     ` Yinghai Lu
@ 2013-08-08  1:54       ` Yinghai Lu
  2013-08-09 17:04       ` Bjorn Helgaas
  1 sibling, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-08-08  1:54 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige

On Wed, Aug 7, 2013 at 6:49 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Aug 6, 2013 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, Aug 6, 2013 at 9:01 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>
>>> Huh.  I wish we didn't have virtfn_remove() at all.  I wish the
>>> normal device removal path, i.e., pci_stop_and_remove_bus_device(),
>>> could deal with VFs directly.  I don't know all the history there,
>>> so maybe there's some reason that's not feasible.
>>
>> I had one draft version, but looks more confusing.
>
> please check attached that make pci_stop_and_remove_bus_device()
> could be used with VF.

If this solution could be used, we can revert

commit dfab88beda88d6c24111e5966b08ecf813c3a18a
Author: Jiang Liu <liuj97@gmail.com>
Date:   Fri May 31 12:21:31 2013 +0800

    PCI: Hide remove and rescan sysfs interfaces for SR-IOV virtual functions


>
> It could replace
> https://patchwork.kernel.org/patch/2834638/
> https://patchwork.kernel.org/patch/2834639/
>
> Please choose one of the solutions.
>
> but we still need
> https://patchwork.kernel.org/patch/2834640/
> as VFs that does not use ARI could be on other virtual bus.
> so they will not be removed directly.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-08  1:49     ` Yinghai Lu
  2013-08-08  1:54       ` Yinghai Lu
@ 2013-08-09 17:04       ` Bjorn Helgaas
  2013-08-09 23:44         ` Yinghai Lu
  1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2013-08-09 17:04 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige

On Wed, Aug 07, 2013 at 06:49:25PM -0700, Yinghai Lu wrote:
> On Tue, Aug 6, 2013 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Tue, Aug 6, 2013 at 9:01 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >> Huh.  I wish we didn't have virtfn_remove() at all.  I wish the
> >> normal device removal path, i.e., pci_stop_and_remove_bus_device(),
> >> could deal with VFs directly.  I don't know all the history there,
> >> so maybe there's some reason that's not feasible.
> >
> > I had one draft version, but looks more confusing.
> 
> please check attached that make pci_stop_and_remove_bus_device()
> could be used with VF.
> 
> It could replace
> https://patchwork.kernel.org/patch/2834638/
> https://patchwork.kernel.org/patch/2834639/
> 
> Please choose one of the solutions.
> 
> but we still need
> https://patchwork.kernel.org/patch/2834640/
> as VFs that does not use ARI could be on other virtual bus.
> so they will not be removed directly.

[I inlined your patch below for convenience]

> Subject: [PATCH] PCI: Release PF ref during removing VF
> From: Yinghai Lu <yinghai@kernel.org>
> 
> commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> broke the pcie native hotplug with SRIOV enabled: PF is not freed
> during hot-remove, and later hot-add do not work as old pci_dev
> is still around, and can not create new pci_dev.
> 
> That commit need VFs to be removed via pci_disable_sriov/virtfn_remove
> to make sure ref to PF is put back.
> 
> So we can not call stop_and_remove for VF before PF, that will
> make VF get removed directly before PF's driver try to use
> pci_disable_sriov/virtfn_remove to do it.
> 
> Solution would be
> 1. separating stop and remove in two iterations.
> 2. or make pci_stop_and_remove_bus_device could be used on VF.
> 
> This patch is second solution:
> Separate PF ref releasing from virtfn_remove, all that during
> pci_destroy_dev for VFs.

I like the second approach better, but I think the call path and
locking is way too messy:

    virtfn_remove
      mutex_lock(&iov->dev->sriov->lock)          <--
      pci_stop_and_remove_bus_device
        mutex_trylock(&iov->dev->sriov->lock)     <--
        pci_stop_bus_device
        pci_remove_bus_device
          pci_destroy_dev
            virtfn_release
              virtfn_remove_bus
              pci_dev_put

I think it could be fixed, but it would require significant SR-IOV
rework, and I'm dubious that we can get it done in time for v3.11.

What would happen if we just reverted dc087f2f6a2 for now?  Jiang?

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  drivers/pci/iov.c    |   34 +++++++++++++++++++++++++++++-----
>  drivers/pci/pci.h    |    4 ++++
>  drivers/pci/remove.c |   17 +++++++++++++++++
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/drivers/pci/iov.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/iov.c
> +++ linux-2.6/drivers/pci/iov.c
> @@ -132,9 +132,37 @@ failed:
>  	return rc;
>  }
>  
> -static void virtfn_remove(struct pci_dev *dev, int id, int reset)
> +void virtfn_release(struct pci_dev *virtfn)
>  {
> +	int i;
> +	struct pci_dev *dev;
>  	char buf[VIRTFN_ID_LEN];
> +
> +	if (!virtfn->is_virtfn)
> +		return;
> +
> +	dev = virtfn->physfn;
> +
> +	/*
> +	 * Remove link to virtfn for PF.
> +	 * pci_disable_sriov() could be called after pci_stop_dev() is
> +	 * called for PF. So check sd to avoid spurious sfsfs warning.
> +	 */
> +	if (dev->dev.kobj.sd)
> +		for (i = 0; i < dev->sriov->num_VFs; i++)
> +			if ((virtfn_bus(dev, i) == virtfn->bus->number) &&
> +			    (virtfn_devfn(dev, i) == virtfn->devfn)) {
> +				sprintf(buf, "virtfn%u", i);
> +				sysfs_remove_link(&dev->dev.kobj, buf);
> +				break;
> +			}
> +
> +	virtfn_remove_bus(dev->bus, virtfn->bus);
> +	pci_dev_put(dev);
> +}
> +
> +static void virtfn_remove(struct pci_dev *dev, int id, int reset)
> +{
>  	struct pci_dev *virtfn;
>  	struct pci_sriov *iov = dev->sriov;
>  
> @@ -149,8 +177,6 @@ static void virtfn_remove(struct pci_dev
>  		__pci_reset_function(virtfn);
>  	}
>  
> -	sprintf(buf, "virtfn%u", id);
> -	sysfs_remove_link(&dev->dev.kobj, buf);
>  	/*
>  	 * pci_stop_dev() could have been called for this virtfn already,
>  	 * so the directory for the virtfn may have been removed before.
> @@ -161,12 +187,10 @@ static void virtfn_remove(struct pci_dev
>  
>  	mutex_lock(&iov->dev->sriov->lock);
>  	pci_stop_and_remove_bus_device(virtfn);
> -	virtfn_remove_bus(dev->bus, virtfn->bus);
>  	mutex_unlock(&iov->dev->sriov->lock);
>  
>  	/* balance pci_get_domain_bus_and_slot() */
>  	pci_dev_put(virtfn);
> -	pci_dev_put(dev);
>  }
>  
>  static int sriov_migration(struct pci_dev *dev)
> Index: linux-2.6/drivers/pci/pci.h
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pci.h
> +++ linux-2.6/drivers/pci/pci.h
> @@ -262,6 +262,7 @@ int pci_iov_resource_bar(struct pci_dev
>  resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
>  void pci_restore_iov_state(struct pci_dev *dev);
>  int pci_iov_bus_range(struct pci_bus *bus);
> +void virtfn_release(struct pci_dev *dev);
>  
>  #else
>  static inline int pci_iov_init(struct pci_dev *dev)
> @@ -284,6 +285,9 @@ static inline int pci_iov_bus_range(stru
>  {
>  	return 0;
>  }
> +void virtfn_release(struct pci_dev *dev)
> +{
> +}
>  
>  #endif /* CONFIG_PCI_IOV */
>  
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -38,6 +38,8 @@ static void pci_destroy_dev(struct pci_d
>  	list_del(&dev->bus_list);
>  	up_write(&pci_bus_sem);
>  
> +	virtfn_release(dev);
> +
>  	pci_free_resources(dev);
>  	put_device(&dev->dev);
>  }
> @@ -107,8 +109,23 @@ static void pci_remove_bus_device(struct
>   */
>  void pci_stop_and_remove_bus_device(struct pci_dev *dev)
>  {
> +#ifdef CONFIG_PCI_ATS
> +	struct pci_sriov *iov = NULL;
> +	int locked = 0;
> +
> +	if (dev->is_virtfn) {
> +		iov = dev->physfn->sriov;
> +		locked = mutex_trylock(&iov->dev->sriov->lock);
> +	}
> +#endif
> +
>  	pci_stop_bus_device(dev);
>  	pci_remove_bus_device(dev);
> +
> +#ifdef CONFIG_PCI_ATS
> +	if (locked)
> +		mutex_unlock(&iov->dev->sriov->lock);
> +#endif
>  }
>  EXPORT_SYMBOL(pci_stop_and_remove_bus_device);
>  

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-09 17:04       ` Bjorn Helgaas
@ 2013-08-09 23:44         ` Yinghai Lu
  2013-08-14 19:44           ` Bjorn Helgaas
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-08-09 23:44 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige

On Fri, Aug 9, 2013 at 10:04 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:

>> Solution would be
>> 1. separating stop and remove in two iterations.
>> 2. or make pci_stop_and_remove_bus_device could be used on VF.
>>
>> This patch is second solution:
>> Separate PF ref releasing from virtfn_remove, all that during
>> pci_destroy_dev for VFs.
>
> I like the second approach better, but I think the call path and
> locking is way too messy:
>
>     virtfn_remove
>       mutex_lock(&iov->dev->sriov->lock)          <--
>       pci_stop_and_remove_bus_device
>         mutex_trylock(&iov->dev->sriov->lock)     <--

yes, original sriov remove code have that, so try to save the lock holding

so if VF is removed directly other than virtfn_remove, those lock still get
hold in the process.

>         pci_stop_bus_device
>         pci_remove_bus_device
>           pci_destroy_dev
>             virtfn_release
>               virtfn_remove_bus
>               pci_dev_put
>
> I think it could be fixed, but it would require significant SR-IOV
> rework, and I'm dubious that we can get it done in time for v3.11.
>
> What would happen if we just reverted dc087f2f6a2 for now?  Jiang?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-09 23:44         ` Yinghai Lu
@ 2013-08-14 19:44           ` Bjorn Helgaas
  2013-08-14 20:15             ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2013-08-14 19:44 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige, Don Dutile

[+cc Don]

On Fri, Aug 9, 2013 at 5:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Aug 9, 2013 at 10:04 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:

>> What would happen if we just reverted dc087f2f6a2 for now?  Jiang?

I tried to reproduce the problem you reported ("Cannot add device at
0000:02:00"), but I haven't been able to.  I have an ixgbe SR-IOV
device in an external hotplug-capable chassis, and I'm using v3.11-rc4
with a patch to add a little instrumentation.  Here's what I see:

    # echo -n 8 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
    ixgbe 0000:08:00.0 eth1: SR-IOV enabled with 8 VFs
    pci 0000:08:10.0: [8086:1515] type 00 class 0x020000        # VF1
    pci 0000:08:10.2: [8086:1515] type 00 class 0x020000        # VF2
    ...
    pci 0000:08:11.6: [8086:1515] type 00 class 0x020000        # VF8

Created 8 VFs; no problem there.

    # echo -n 0 > /sys/bus/pci/slots/4-1/power
    pciehp 0000:04:03.0:pcie24: disable_slot: physical_slot = 4-1
    pciehp 0000:04:03.0:pcie24: pciehp_unconfigure_device:
domain:bus:dev = 0000:08:00
    ixgbevf 0000:08:11.6: pci_stop_and_remove_bus_device
    pci 0000:08:11.6: pci_release_dev                   # release VF8
    ixgbevf 0000:08:11.4: pci_stop_and_remove_bus_device
    pci 0000:08:11.4: pci_release_dev                   # release VF7
    ...
    ixgbevf 0000:08:10.0: pci_stop_and_remove_bus_device
    pci 0000:08:10.0: pci_release_dev                   # release VF1
    ixgbe 0000:08:00.1: pci_stop_and_remove_bus_device
    pci 0000:08:00.1: pci_release_dev                   # release PF1
    ixgbe 0000:08:00.0: pci_stop_and_remove_bus_device  # stop/remove PF0
                                                        # PF0 not released

When I turn off power to the slot, pciehp stops and removes all the
devices, starting with the VFs.  The PF 08:00.0 is stopped but not
released, because we didn't call virtfn_remove(), so the reference
held by the VFs was never released.  This is a problem.

   # echo -n 1 > /sys/bus/pci/slots/4-1/power
    pci 0000:08:00.0: [8086:1528] type 00 class 0x020000
    pci 0000:08:00.1: [8086:1528] type 00 class 0x020000

When I turn the slot back on, we find both PFs; no apparent problem there.

On v3.10 (which does not contain dc087f2f6a), I see essentially the
same thing.  The PFs are stopped before the VFs because v3.10 doesn't
contain 29ed1f29b68, which removes devices in reverse order.  But even
in v3.10, PF0 is stopped but not released because the VFs still hold
references to it.  And there's no complaint when turning the slot back
on.

So we definitely have a problem: when we remove the PF, we don't
release it correctly because we don't release the references held by
VFs.  But this is not a regression, and doesn't seem urgent enough to
rush something for v3.11.

I think we're headed toward tearing down all VFs when removing a PF.
Obviously, we *have* to do that when powering off the device, and I
think we also need to do it when removing the PF driver just to keep
all the bookkeeping sane.  I don't know how much heartburn this will
cause Don.

So tell me if you still think we need to do something for v3.11.

Bjorn

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-14 19:44           ` Bjorn Helgaas
@ 2013-08-14 20:15             ` Yinghai Lu
  2013-08-14 20:36               ` Bjorn Helgaas
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-08-14 20:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige, Don Dutile

[-- Attachment #1: Type: text/plain, Size: 3770 bytes --]

On Wed, Aug 14, 2013 at 12:44 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Don]
>
> On Fri, Aug 9, 2013 at 5:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Aug 9, 2013 at 10:04 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
>>> What would happen if we just reverted dc087f2f6a2 for now?  Jiang?
>
> I tried to reproduce the problem you reported ("Cannot add device at
> 0000:02:00"), but I haven't been able to.  I have an ixgbe SR-IOV
> device in an external hotplug-capable chassis, and I'm using v3.11-rc4
> with a patch to add a little instrumentation.  Here's what I see:
>
>     # echo -n 8 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
>     ixgbe 0000:08:00.0 eth1: SR-IOV enabled with 8 VFs
>     pci 0000:08:10.0: [8086:1515] type 00 class 0x020000        # VF1
>     pci 0000:08:10.2: [8086:1515] type 00 class 0x020000        # VF2
>     ...
>     pci 0000:08:11.6: [8086:1515] type 00 class 0x020000        # VF8
>
> Created 8 VFs; no problem there.
>
>     # echo -n 0 > /sys/bus/pci/slots/4-1/power
>     pciehp 0000:04:03.0:pcie24: disable_slot: physical_slot = 4-1
>     pciehp 0000:04:03.0:pcie24: pciehp_unconfigure_device:
> domain:bus:dev = 0000:08:00
>     ixgbevf 0000:08:11.6: pci_stop_and_remove_bus_device
>     pci 0000:08:11.6: pci_release_dev                   # release VF8
>     ixgbevf 0000:08:11.4: pci_stop_and_remove_bus_device
>     pci 0000:08:11.4: pci_release_dev                   # release VF7
>     ...
>     ixgbevf 0000:08:10.0: pci_stop_and_remove_bus_device
>     pci 0000:08:10.0: pci_release_dev                   # release VF1
>     ixgbe 0000:08:00.1: pci_stop_and_remove_bus_device
>     pci 0000:08:00.1: pci_release_dev                   # release PF1
>     ixgbe 0000:08:00.0: pci_stop_and_remove_bus_device  # stop/remove PF0
>                                                         # PF0 not released
>
> When I turn off power to the slot, pciehp stops and removes all the
> devices, starting with the VFs.  The PF 08:00.0 is stopped but not
> released, because we didn't call virtfn_remove(), so the reference
> held by the VFs was never released.  This is a problem.
>
>    # echo -n 1 > /sys/bus/pci/slots/4-1/power
>     pci 0000:08:00.0: [8086:1528] type 00 class 0x020000
>     pci 0000:08:00.1: [8086:1528] type 00 class 0x020000
>
> When I turn the slot back on, we find both PFs; no apparent problem there.
>
> On v3.10 (which does not contain dc087f2f6a), I see essentially the
> same thing.  The PFs are stopped before the VFs because v3.10 doesn't
> contain 29ed1f29b68, which removes devices in reverse order.  But even
> in v3.10, PF0 is stopped but not released because the VFs still hold
> references to it.  And there's no complaint when turning the slot back
> on.
>
> So we definitely have a problem: when we remove the PF, we don't
> release it correctly because we don't release the references held by
> VFs.  But this is not a regression, and doesn't seem urgent enough to
> rush something for v3.11.
>
> I think we're headed toward tearing down all VFs when removing a PF.
> Obviously, we *have* to do that when powering off the device, and I
> think we also need to do it when removing the PF driver just to keep
> all the bookkeeping sane.  I don't know how much heartburn this will
> cause Don.
>
> So tell me if you still think we need to do something for v3.11.

in your test: the PF is not get freed but it is removed for bus's device list.
so pciehp_configure_device will create new one even the old is not
really released.

Please check with attached patch, that only release that from link-list in
pci_release device. I posted it a while back.
After it get applied, you will not have chance to create new device in
pciehp_configure_device.

Thanks

Yinghai

[-- Attachment #2: fix_racing_removing_6_2_before.patch --]
[-- Type: application/octet-stream, Size: 2517 bytes --]

Subject: [PATCH 2/7] PCI: move resources and bus_list releasing to pci_release_dev
From: Yinghai Lu <yinghai@kernel.org>

We should not release resource in pci_destroy that is too early
as there could be still other use hold reference.

release them or remove it from bus devices list at last
in pci_release_dev instead.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/probe.c  |   22 ++++++++++++++++++++--
 drivers/pci/remove.c |   19 -------------------
 2 files changed, 20 insertions(+), 21 deletions(-)

Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1150,6 +1150,20 @@ static void pci_release_capabilities(str
 	pci_free_cap_save_buffers(dev);
 }
 
+static void pci_free_resources(struct pci_dev *dev)
+{
+	int i;
+
+	msi_remove_pci_irq_vectors(dev);
+
+	pci_cleanup_rom(dev);
+	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+		struct resource *res = dev->resource + i;
+		if (res->parent)
+			release_resource(res);
+	}
+}
+
 /**
  * pci_release_dev - free a pci device structure when all users of it are finished.
  * @dev: device that's been disconnected
@@ -1159,9 +1173,13 @@ static void pci_release_capabilities(str
  */
 static void pci_release_dev(struct device *dev)
 {
-	struct pci_dev *pci_dev;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	down_write(&pci_bus_sem);
+	list_del(&pci_dev->bus_list);
+	up_write(&pci_bus_sem);
+	pci_free_resources(pci_dev);
 
-	pci_dev = to_pci_dev(dev);
 	pci_release_capabilities(pci_dev);
 	pci_release_of_node(pci_dev);
 	pcibios_release_device(pci_dev);
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -3,20 +3,6 @@
 #include <linux/pci-aspm.h>
 #include "pci.h"
 
-static void pci_free_resources(struct pci_dev *dev)
-{
-	int i;
-
- 	msi_remove_pci_irq_vectors(dev);
-
-	pci_cleanup_rom(dev);
-	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-		struct resource *res = dev->resource + i;
-		if (res->parent)
-			release_resource(res);
-	}
-}
-
 static void pci_stop_dev(struct pci_dev *dev)
 {
 	pci_pme_active(dev, false);
@@ -36,11 +22,6 @@ static void pci_stop_dev(struct pci_dev
 
 static void pci_destroy_dev(struct pci_dev *dev)
 {
-	down_write(&pci_bus_sem);
-	list_del(&dev->bus_list);
-	up_write(&pci_bus_sem);
-
-	pci_free_resources(dev);
 	put_device(&dev->dev);
 }
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-14 20:15             ` Yinghai Lu
@ 2013-08-14 20:36               ` Bjorn Helgaas
  2013-08-14 20:58                 ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2013-08-14 20:36 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige, Don Dutile

On Wed, Aug 14, 2013 at 01:15:35PM -0700, Yinghai Lu wrote:
> On Wed, Aug 14, 2013 at 12:44 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > [+cc Don]
> >
> > On Fri, Aug 9, 2013 at 5:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> On Fri, Aug 9, 2013 at 10:04 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >
> >>> What would happen if we just reverted dc087f2f6a2 for now?  Jiang?
> >
> > I tried to reproduce the problem you reported ("Cannot add device at
> > 0000:02:00"), but I haven't been able to.  I have an ixgbe SR-IOV
> > device in an external hotplug-capable chassis, and I'm using v3.11-rc4
> > with a patch to add a little instrumentation.  Here's what I see:
> >
> >     # echo -n 8 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
> >     ixgbe 0000:08:00.0 eth1: SR-IOV enabled with 8 VFs
> >     pci 0000:08:10.0: [8086:1515] type 00 class 0x020000        # VF1
> >     pci 0000:08:10.2: [8086:1515] type 00 class 0x020000        # VF2
> >     ...
> >     pci 0000:08:11.6: [8086:1515] type 00 class 0x020000        # VF8
> >
> > Created 8 VFs; no problem there.
> >
> >     # echo -n 0 > /sys/bus/pci/slots/4-1/power
> >     pciehp 0000:04:03.0:pcie24: disable_slot: physical_slot = 4-1
> >     pciehp 0000:04:03.0:pcie24: pciehp_unconfigure_device:
> > domain:bus:dev = 0000:08:00
> >     ixgbevf 0000:08:11.6: pci_stop_and_remove_bus_device
> >     pci 0000:08:11.6: pci_release_dev                   # release VF8
> >     ixgbevf 0000:08:11.4: pci_stop_and_remove_bus_device
> >     pci 0000:08:11.4: pci_release_dev                   # release VF7
> >     ...
> >     ixgbevf 0000:08:10.0: pci_stop_and_remove_bus_device
> >     pci 0000:08:10.0: pci_release_dev                   # release VF1
> >     ixgbe 0000:08:00.1: pci_stop_and_remove_bus_device
> >     pci 0000:08:00.1: pci_release_dev                   # release PF1
> >     ixgbe 0000:08:00.0: pci_stop_and_remove_bus_device  # stop/remove PF0
> >                                                         # PF0 not released
> >
> > When I turn off power to the slot, pciehp stops and removes all the
> > devices, starting with the VFs.  The PF 08:00.0 is stopped but not
> > released, because we didn't call virtfn_remove(), so the reference
> > held by the VFs was never released.  This is a problem.
> >
> >    # echo -n 1 > /sys/bus/pci/slots/4-1/power
> >     pci 0000:08:00.0: [8086:1528] type 00 class 0x020000
> >     pci 0000:08:00.1: [8086:1528] type 00 class 0x020000
> >
> > When I turn the slot back on, we find both PFs; no apparent problem there.
> >
> > On v3.10 (which does not contain dc087f2f6a), I see essentially the
> > same thing.  The PFs are stopped before the VFs because v3.10 doesn't
> > contain 29ed1f29b68, which removes devices in reverse order.  But even
> > in v3.10, PF0 is stopped but not released because the VFs still hold
> > references to it.  And there's no complaint when turning the slot back
> > on.
> >
> > So we definitely have a problem: when we remove the PF, we don't
> > release it correctly because we don't release the references held by
> > VFs.  But this is not a regression, and doesn't seem urgent enough to
> > rush something for v3.11.
> >
> > I think we're headed toward tearing down all VFs when removing a PF.
> > Obviously, we *have* to do that when powering off the device, and I
> > think we also need to do it when removing the PF driver just to keep
> > all the bookkeeping sane.  I don't know how much heartburn this will
> > cause Don.
> >
> > So tell me if you still think we need to do something for v3.11.
> 
> in your test: the PF is not get freed but it is removed for bus's device list.
> so pciehp_configure_device will create new one even the old is not
> really released.

Yes.  As I said above, I do see the leak of the PF pci_dev, and
we certainly do need to fix that, but it's not a regression and
doesn't seem urgent.

> Please check with attached patch, that only release that from link-list in
> pci_release device. I posted it a while back.
> After it get applied, you will not have chance to create new device in
> pciehp_configure_device.

[I inlined your patch for convenience]

I agree that the patch below looks like a good thing to do.
However, it is not in v3.11.  So if the problem only happens when
the patch below is applied, we don't need to fix the problem in
v3.11.

I've spent a LOT of time trying to reproduce this problem and
convince myself that we need to do something in v3.11.  If you're
telling me that my time was wasted because the problem does not
happen in v3.11-rc, but only happens with your out-of-tree patch,
I'm going to be very upset.

Bjorn

> Subject: [PATCH 2/7] PCI: move resources and bus_list releasing to pci_release_dev
> From: Yinghai Lu <yinghai@kernel.org>
> 
> We should not release resource in pci_destroy that is too early
> as there could be still other use hold reference.
> 
> release them or remove it from bus devices list at last
> in pci_release_dev instead.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  drivers/pci/probe.c  |   22 ++++++++++++++++++++--
>  drivers/pci/remove.c |   19 -------------------
>  2 files changed, 20 insertions(+), 21 deletions(-)
> 
> Index: linux-2.6/drivers/pci/probe.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/probe.c
> +++ linux-2.6/drivers/pci/probe.c
> @@ -1150,6 +1150,20 @@ static void pci_release_capabilities(str
>  	pci_free_cap_save_buffers(dev);
>  }
>  
> +static void pci_free_resources(struct pci_dev *dev)
> +{
> +	int i;
> +
> +	msi_remove_pci_irq_vectors(dev);
> +
> +	pci_cleanup_rom(dev);
> +	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +		struct resource *res = dev->resource + i;
> +		if (res->parent)
> +			release_resource(res);
> +	}
> +}
> +
>  /**
>   * pci_release_dev - free a pci device structure when all users of it are finished.
>   * @dev: device that's been disconnected
> @@ -1159,9 +1173,13 @@ static void pci_release_capabilities(str
>   */
>  static void pci_release_dev(struct device *dev)
>  {
> -	struct pci_dev *pci_dev;
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +
> +	down_write(&pci_bus_sem);
> +	list_del(&pci_dev->bus_list);
> +	up_write(&pci_bus_sem);
> +	pci_free_resources(pci_dev);
>  
> -	pci_dev = to_pci_dev(dev);
>  	pci_release_capabilities(pci_dev);
>  	pci_release_of_node(pci_dev);
>  	pcibios_release_device(pci_dev);
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -3,20 +3,6 @@
>  #include <linux/pci-aspm.h>
>  #include "pci.h"
>  
> -static void pci_free_resources(struct pci_dev *dev)
> -{
> -	int i;
> -
> - 	msi_remove_pci_irq_vectors(dev);
> -
> -	pci_cleanup_rom(dev);
> -	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> -		struct resource *res = dev->resource + i;
> -		if (res->parent)
> -			release_resource(res);
> -	}
> -}
> -
>  static void pci_stop_dev(struct pci_dev *dev)
>  {
>  	pci_pme_active(dev, false);
> @@ -36,11 +22,6 @@ static void pci_stop_dev(struct pci_dev
>  
>  static void pci_destroy_dev(struct pci_dev *dev)
>  {
> -	down_write(&pci_bus_sem);
> -	list_del(&dev->bus_list);
> -	up_write(&pci_bus_sem);
> -
> -	pci_free_resources(dev);
>  	put_device(&dev->dev);
>  }
>  

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp
  2013-08-14 20:36               ` Bjorn Helgaas
@ 2013-08-14 20:58                 ` Yinghai Lu
  0 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-08-14 20:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Yijing Wang, Jiang Liu, Kenji Kaneshige, Don Dutile

>> in your test: the PF is not get freed but it is removed for bus's device list.
>> so pciehp_configure_device will create new one even the old is not
>> really released.
>
> Yes.  As I said above, I do see the leak of the PF pci_dev, and
> we certainly do need to fix that, but it's not a regression and
> doesn't seem urgent.
>
>> Please check with attached patch, that only release that from link-list in
>> pci_release device. I posted it a while back.
>> After it get applied, you will not have chance to create new device in
>> pciehp_configure_device.
>
> [I inlined your patch for convenience]
>
> I agree that the patch below looks like a good thing to do.
> However, it is not in v3.11.  So if the problem only happens when
> the patch below is applied, we don't need to fix the problem in
> v3.11.

ok.

>
> I've spent a LOT of time trying to reproduce this problem and
> convince myself that we need to do something in v3.11.  If you're
> telling me that my time was wasted because the problem does not
> happen in v3.11-rc, but only happens with your out-of-tree patch,
> I'm going to be very upset.

sorry, I did not notice that. I should test without that patch.

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-08-14 20:58 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-27 15:11 [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Yinghai Lu
2013-07-27 15:11 ` [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that Yinghai Lu
2013-07-29 19:58   ` Bjorn Helgaas
2013-07-29 20:32     ` Don Dutile
2013-07-29 21:07       ` Alexander Duyck
2013-07-29 21:31         ` Don Dutile
2013-07-29 21:52           ` Jeff Kirsher
2013-07-29 23:14             ` Yinghai Lu
2013-07-29 23:23               ` Jeff Kirsher
2013-07-30 15:04                 ` Don Dutile
2013-08-01 20:16       ` Bjorn Helgaas
2013-08-01 21:21         ` Don Dutile
2013-08-01 21:41           ` Bjorn Helgaas
2013-08-01 21:55           ` Alexander Duyck
2013-07-27 15:11 ` [PATCH v2 1/3] PCI/pciehp: Separate VGA checking out from loop Yinghai Lu
2013-08-07  4:01 ` [PATCH v2 2/3] PCI: Separate stop and remove devices in pciehp Bjorn Helgaas
2013-08-07  6:34   ` Yinghai Lu
2013-08-08  1:49     ` Yinghai Lu
2013-08-08  1:54       ` Yinghai Lu
2013-08-09 17:04       ` Bjorn Helgaas
2013-08-09 23:44         ` Yinghai Lu
2013-08-14 19:44           ` Bjorn Helgaas
2013-08-14 20:15             ` Yinghai Lu
2013-08-14 20:36               ` Bjorn Helgaas
2013-08-14 20:58                 ` Yinghai Lu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.