* [PATCH V10 00/12] VF EEH on Power8
@ 2015-10-26  3:15 Wei Yang
  2015-10-26  3:15 ` [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
                   ` (12 more replies)
  0 siblings, 13 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

This patchset enables EEH on SRIOV VFs. The general idea is to create a
proper VF edev and VF PE and to handle them correctly during EEH recovery.

Unlike a Bus PE, a VF PE contains just one VF. As a result, EEH error
handling on a VF PE differs from that on a Bus PE in several ways.

First, removal and re-enumeration of a VF rely on its PF. A VF is tightly
coupled to its PF, so it is not proper to enumerate a VF with the usual scan
procedure. That's why virtfn_add/virtfn_remove are exported in this patch
set.

Second, the reset/restore of a VF is done in kernel space. FW is not aware of
the VF, which means the usual reset function done in FW will not work. One of
the patches imitates the reset/restore function in kernel space.

Third, the VF may be removed during the PF's error_detected function. In this
case, the original error_detected->slot_reset->resume sequence is not proper
for those removed VFs, since they are re-created by the PF in a fresh state. A
flag in eeh_dev is introduced to mark that the eeh_dev is in error state. By
doing so, we track whether this device needs to be reset or not.

This has been tested both on the host and in a guest on Power8 with the
latest kernel version.

v10:
   * delete the last patch "powerpc/powernv: compound PE for VFs" since after
     redesign of SRIOV, there is no compound PE for VFs now.
   * add two patches which fix problems found during tests
     powerpc/eeh: Support error recovery for VF PE
     powerpc/eeh: Handle hot removed VF when PF is EEH aware
v9:
   * split pcibios_bus_add_device() into a separate patch
   * Bjorn acked the PCI part and agreed that this patch set be merged
     through the ppc tree
   * rebased on mpe/linux.git next branch
v8:
   * fix on checking the return value of pnv_eeh_do_flr()
   * introduced a weak function pcibios_bus_add_device() to create PE for VFs
v7:
   * fix compile error when PCI_IOV is not set
v6:
   * code / commit log refactor by Gavin
v5:
   * remove the compound field, iterate on Master VF PE instead
   * some code refinement of PCI config restore and reset on VF:
     the wait time for assert and deassert
     PCI device address format
     check on edev->pcie_cap and edev->aer_cap before access them
v4:
   * refine the change logs, comment and code style
   * change pnv_pci_fixup_vf_eeh() to pnv_eeh_vf_final_fixup() and remove the
     CONFIG_PCI_IOV macro
   * reorder patch 5/6 to make the logic more reasonable
   * remove remove_dev_pci_data()
   * remove the EEH_DEV_VF flag, use edev->physfn to identify a VF EEH DEV and
     remove related CONFIG_PCI_IOV macro
   * add the option for VF reset
   * fix the pnv_eeh_cfg_blocked() logic
   * replace pnv_pci_cfg_{read,write} with eeh_ops->{read,write}_config in
     pnv_eeh_vf_restore_config()
   * rename pnv_eeh_vf_restore_config() to pnv_eeh_restore_vf_config()
   * rename pnv_pci_fixup_vf_caps() to pnv_pci_vf_header_fixup() and move it
     to arch/powerpc/platforms/powernv/pci.c
   * add a field compound in pnv_ioda_pe to link compound PEs
   * handle compound PE for VF PEs
v3:
   * add back vf_index in pci_dn to track the VF's index
   * rename ppdev in eeh_dev to physfn for consistency
   * move edev->physfn assignment before dev->dev.archdata.edev is set
   * move pnv_pci_fixup_vf_eeh() and pnv_pci_fixup_vf_caps() to eeh-powernv.c
   * clearer and more detailed commit logs and code comments
   * merge eeh_rmv_virt_device() with eeh_rmv_device()
   * move the cfg_blocked check logic from pnv_eeh_read/write_config() to
     pnv_eeh_cfg_blocked()
   * move the vf reset/restore logic into its own patch, two patches are
     created.
     powerpc/powernv: Support PCI config restore for VFs
     powerpc/powernv: Support EEH reset for VFs
   * simplify the vf reset logic
v2:
   * add prefix pci_iov_ to virtfn_add/virtfn_remove
   * use EEH_DEV_VF as a flag for a VF's eeh_dev
   * use eeh_dev instead of edev in change log
   * remove vf_index in eeh_dev, calculate it from pdn->busno and devfn
   * do eeh_add_device_late() and eeh_sysfs_add_device() both after pci_dev is
     well initialized
   * do FLR to reset a VF PE
   * imitate the restore function in FW for VF
   * remove the reverse order patch, since it is still under discussion

Gavin Shan (1):
  powerpc/eeh: Don't block PCI config on resetting VF PE

Wei Yang (11):
  PCI/IOV: Rename and export virtfn_add/virtfn_remove
  PCI: Add pcibios_bus_add_device() weak function
  powerpc/pci: Cache VF index in pci_dn
  powerpc/pci: Remove VFs prior to PF
  powerpc/eeh: Cache only BARs, not windows or IOV BARs
  powerpc/powernv: EEH device for VF
  powerpc/eeh: Create PE for VFs
  powerpc/powernv: Support EEH reset for VF PE
  powerpc/powernv: Support PCI config restore for VFs
  powerpc/eeh: Support error recovery for VF PE
  powerpc/eeh: Handle hot removed VF when PF is EEH aware

 arch/powerpc/include/asm/eeh.h               |  10 ++
 arch/powerpc/include/asm/pci-bridge.h        |   2 +
 arch/powerpc/kernel/eeh.c                    |  17 ++-
 arch/powerpc/kernel/eeh_cache.c              |   6 +-
 arch/powerpc/kernel/eeh_dev.c                |   1 +
 arch/powerpc/kernel/eeh_driver.c             | 130 ++++++++++++----
 arch/powerpc/kernel/eeh_pe.c                 |  13 +-
 arch/powerpc/kernel/pci-hotplug.c            |   2 +-
 arch/powerpc/kernel/pci_dn.c                 |  16 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 220 ++++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci.c         |  18 +++
 drivers/pci/bus.c                            |   3 +
 drivers/pci/iov.c                            |  10 +-
 include/linux/pci.h                          |   8 +
 14 files changed, 408 insertions(+), 48 deletions(-)

-- 
2.5.0



* [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-27  1:31   ` Andrew Donnellan
  2015-10-27 23:06   ` Bjorn Helgaas
  2015-10-26  3:15 ` [PATCH V10 02/12] PCI: Add pcibios_bus_add_device() weak function Wei Yang
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

During EEH recovery, hotplug is applied to devices that don't have
drivers or whose drivers don't support EEH. However, that hotplug is
implemented on a per-PCI-bus basis and can't be applied to a VF directly.

The patch renames virtfn_{add,remove}() to pci_iov_virtfn_{add,remove}()
and exports them so that they can be used in PCI hotplug during EEH
recovery.

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/iov.c   | 10 +++++-----
 include/linux/pci.h |  8 ++++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..cc941dd 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
 
-static int virtfn_add(struct pci_dev *dev, int id, int reset)
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
 {
 	int i;
 	int rc = -ENOMEM;
@@ -183,7 +183,7 @@ failed:
 	return rc;
 }
 
-static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
 {
 	char buf[VIRTFN_ID_LEN];
 	struct pci_dev *virtfn;
@@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	}
 
 	for (i = 0; i < initial; i++) {
-		rc = virtfn_add(dev, i, 0);
+		rc = pci_iov_virtfn_add(dev, i, 0);
 		if (rc)
 			goto failed;
 	}
@@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 
 failed:
 	for (j = 0; j < i; j++)
-		virtfn_remove(dev, j, 0);
+		pci_iov_virtfn_remove(dev, j, 0);
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
@@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev)
 		return;
 
 	for (i = 0; i < iov->num_VFs; i++)
-		virtfn_remove(dev, i, 0);
+		pci_iov_virtfn_remove(dev, i, 0);
 
 	pcibios_sriov_disable(dev);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 860c751..b854a5f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1669,6 +1669,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset);
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset);
 int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
@@ -1686,6 +1688,12 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void pci_disable_sriov(struct pci_dev *dev) { }
+static inline int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
+{
+	return -ENOSYS;
+}
+static inline void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
+{ }
 static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
 static inline int pci_vfs_assigned(struct pci_dev *dev)
 { return 0; }
-- 
2.5.0



* [PATCH V10 02/12] PCI: Add pcibios_bus_add_device() weak function
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
  2015-10-26  3:15 ` [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-27  5:07   ` Andrew Donnellan
  2015-10-26  3:15 ` [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn Wei Yang
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

This patch adds a weak function pcibios_bus_add_device() so that
arch-dependent code can do its own setup when a device is added. For
example, powerpc can set up EEH-related resources.
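
The weak-hook pattern can be sketched in user space as follows. This is only
an illustration, assuming a GCC- or Clang-compatible compiler that supports
__attribute__((weak)); arch_bus_add_device() and bus_add_device() are
hypothetical stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not struct pci_dev. */
struct fake_device {
	int id;
	int started;
};

/*
 * Weak default hook: does nothing. If another object file provides a
 * strong definition with the same name, the linker picks that one.
 */
__attribute__((weak)) void arch_bus_add_device(struct fake_device *dev)
{
	(void)dev;
}

/* Generic path: always give the arch hook a chance before starting. */
int bus_add_device(struct fake_device *dev)
{
	assert(dev != NULL);

	arch_bus_add_device(dev);
	dev->started = 1;
	return 0;
}
```

With only the weak default linked in, bus_add_device() behaves exactly as it
did before the hook existed, which is why architectures that do not care need
no changes.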

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/bus.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6fbd3f2..b7e30a7 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -267,6 +267,7 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
 
 void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
 
+void __weak pcibios_bus_add_device(struct pci_dev *dev) { }
 /**
  * pci_bus_add_device - start driver for a single device
  * @dev: device to add
@@ -277,6 +278,8 @@ void pci_bus_add_device(struct pci_dev *dev)
 {
 	int retval;
 
+	pcibios_bus_add_device(dev);
+
 	/*
 	 * Can not put in pci_device_add yet because resources
 	 * are not assigned yet for some devices.
-- 
2.5.0



* [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
  2015-10-26  3:15 ` [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
  2015-10-26  3:15 ` [PATCH V10 02/12] PCI: Add pcibios_bus_add_device() weak function Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-27  5:01   ` Andrew Donnellan
                     ` (2 more replies)
  2015-10-26  3:15 ` [PATCH V10 04/12] powerpc/pci: Remove VFs prior to PF Wei Yang
                   ` (9 subsequent siblings)
  12 siblings, 3 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

The patch caches the VF index in pci_dn, which can be used to calculate
the VF's bus, device and function number. That information helps to locate
the VF's PCI device instance when doing hotplug during EEH recovery, if
necessary.
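
As a rough illustration of how an index maps to a bus and devfn, the sketch
below applies the SR-IOV rule that a VF's routing ID is the PF's routing ID
plus First VF Offset plus VF Stride times the index. struct pf_info and both
helpers are hypothetical names for this sketch, and the values in the note
that follows are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a PF; not a kernel structure. */
struct pf_info {
	uint8_t  bus;     /* PF bus number */
	uint8_t  devfn;   /* PF device/function: (slot << 3) | func */
	uint16_t offset;  /* First VF Offset from the SR-IOV capability */
	uint16_t stride;  /* VF Stride from the SR-IOV capability */
};

/* Bus of VF 'index': PF bus plus the carry out of the 8-bit devfn. */
static int vf_bus(const struct pf_info *pf, int index)
{
	assert(index >= 0);
	return pf->bus + ((pf->devfn + pf->offset + pf->stride * index) >> 8);
}

/* Devfn of VF 'index': low 8 bits of the same sum. */
static int vf_devfn(const struct pf_info *pf, int index)
{
	assert(index >= 0);
	return (pf->devfn + pf->offset + pf->stride * index) & 0xff;
}
```

For a PF at bus 1, devfn 0 with offset 0x80 and stride 2, VF 0 lands on bus 1
at devfn 0x80, while VF 64 carries over to bus 2 at devfn 0.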

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h | 1 +
 arch/powerpc/kernel/pci_dn.c          | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b3a226b..3d7e537 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -210,6 +210,7 @@ struct pci_dn {
 #define IODA_INVALID_PE		(-1)
 #ifdef CONFIG_PPC_POWERNV
 	int	pe_number;
+	int     vf_index;		/* VF index in the PF */
 #ifdef CONFIG_PCI_IOV
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index b3b4df9..f771130 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
 #ifdef CONFIG_PCI_IOV
 static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 					   struct pci_dev *pdev,
+					   int vf_index,
 					   int busno, int devfn)
 {
 	struct pci_dn *pdn;
@@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 	pdn->parent = parent;
 	pdn->busno = busno;
 	pdn->devfn = devfn;
+	pdn->vf_index = vf_index;
 #ifdef CONFIG_PPC_POWERNV
 	pdn->pe_number = IODA_INVALID_PE;
 #endif
@@ -196,7 +198,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 		return NULL;
 
 	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
-		pdn = add_one_dev_pci_data(parent, NULL,
+		pdn = add_one_dev_pci_data(parent, NULL, i,
 					   pci_iov_virtfn_bus(pdev, i),
 					   pci_iov_virtfn_devfn(pdev, i));
 		if (!pdn) {
-- 
2.5.0



* [PATCH V10 04/12] powerpc/pci: Remove VFs prior to PF
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (2 preceding siblings ...)
  2015-10-26  3:15 ` [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-30  3:04   ` Alexey Kardashevskiy
  2015-10-26  3:15 ` [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs Wei Yang
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates,
VFs, which might be hooked to the same PCI bus as their PF, should be removed
before the PF. Otherwise, hot unplugging devices on that PCI bus would
cause a kernel crash.

The patch applies the above pattern to the PowerPC PCI hotplug path.
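
The effect of walking the device list in reverse can be shown with a small
user-space model. It assumes a single PF that was added to the list before
its VFs; struct fake_dev and remove_all() are hypothetical names, and
"orphaned" here simply counts VFs removed after their PF was already gone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on a bus list. */
struct fake_dev {
	int is_virtfn;  /* 1 for a VF, 0 for the PF */
	int removed;
};

/*
 * Remove every device, first-to-last or last-to-first, and return how
 * many VFs were removed only after the PF had already been removed.
 */
static int remove_all(struct fake_dev *devs, size_t n, int reverse)
{
	int orphaned = 0;
	int pf_present = 1;

	assert(devs != NULL);

	for (size_t k = 0; k < n; k++) {
		size_t i = reverse ? n - 1 - k : k;

		if (devs[i].is_virtfn && !pf_present)
			orphaned++;
		if (!devs[i].is_virtfn)
			pf_present = 0;
		devs[i].removed = 1;
	}

	return orphaned;
}
```

With the list { PF, VF, VF }, forward removal leaves both VFs orphaned, while
reverse removal handles them while the PF is still present.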

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7f9ed0c..59c4361 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -55,7 +55,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
 
 	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
 		 pci_domain_nr(bus),  bus->number);
-	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
+	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
 		pr_debug("   Removing %s...\n", pci_name(dev));
 		pci_stop_and_remove_bus_device(dev);
 	}
-- 
2.5.0



* [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (3 preceding siblings ...)
  2015-10-26  3:15 ` [PATCH V10 04/12] powerpc/pci: Remove VFs prior to PF Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-29  3:29   ` Daniel Axtens
  2015-10-30  3:22   ` Alexey Kardashevskiy
  2015-10-26  3:15 ` [PATCH V10 06/12] powerpc/powernv: EEH device for VF Wei Yang
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

The EEH address cache, which helps to locate the PCI device for a given
(physical) MMIO address, didn't cover PCI bridges. Also, it shouldn't
return the PF for an address in the PF's IOV BARs. Instead, the VFs
should be returned.

The patch restricts the address cache to the first 7 resources (the six
standard BARs and the expansion ROM), i.e. indexes 0 through
PCI_ROM_RESOURCE, for the above purposes.

As bridge windows are no longer cached, the bridge type check in
eeh_addr_cache_insert_dev() is removed as well.
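
To make the index range concrete, here is a sketch of the resource index
layout. It mirrors, from memory and in simplified form, the enum in
include/linux/pci.h with CONFIG_PCI_IOV enabled; the SK_ names and both
helpers are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Simplified resource index layout; later entries are omitted. */
enum {
	SK_STD_RESOURCES = 0,                            /* BAR 0 */
	SK_STD_RESOURCE_END = 5,                         /* BAR 5 */
	SK_ROM_RESOURCE,                                 /* 6: expansion ROM */
	SK_IOV_RESOURCES,                                /* 7: first IOV BAR */
	SK_IOV_RESOURCE_END = SK_IOV_RESOURCES + 6 - 1,  /* 12 */
	SK_BRIDGE_RESOURCES,                             /* 13: first window */
};

/* Same bound as the new loop: 0 <= idx <= ROM resource. */
static int is_cached_resource(int idx)
{
	return idx >= SK_STD_RESOURCES && idx <= SK_ROM_RESOURCE;
}

/* Count how many of the first 'limit' indexes would be cached. */
static int count_cached(int limit)
{
	int n = 0;

	assert(limit >= 0);
	for (int i = 0; i < limit; i++)
		n += is_cached_resource(i);
	return n;
}
```

Under this layout the IOV BARs and the bridge windows sit above the ROM
index, so bounding the loop at the ROM resource skips both.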

[gwshan: changelog]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_cache.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index a1e86e1..e6887f0 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
 	}
 
 	/* Walk resources on this device, poke them into the tree */
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
 		resource_size_t start = pci_resource_start(dev,i);
 		resource_size_t end = pci_resource_end(dev,i);
 		unsigned long flags = pci_resource_flags(dev,i);
@@ -222,10 +222,6 @@ void eeh_addr_cache_insert_dev(struct pci_dev *dev)
 {
 	unsigned long flags;
 
-	/* Ignore PCI bridges */
-	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
-		return;
-
 	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
 	__eeh_addr_cache_insert_dev(dev);
 	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-- 
2.5.0



* [PATCH V10 06/12] powerpc/powernv: EEH device for VF
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (4 preceding siblings ...)
  2015-10-26  3:15 ` [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-30  3:33   ` Alexey Kardashevskiy
  2015-10-26  3:15 ` [PATCH V10 07/12] powerpc/eeh: Create PE for VFs Wei Yang
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

VFs and their corresponding pci_dn instances are created and released
dynamically as their PF's SRIOV capability is enabled and disabled.
The patch creates and releases EEH devices for VFs when creating and
releasing their pci_dn instances, which means EEH devices and pci_dn
instances have the same life cycle. Also, a VF's EEH device is identified
by a non-NULL struct eeh_dev::physfn.

[gwshan: changelog and removed CONFIG_PCI_IOV]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |  1 +
 arch/powerpc/kernel/pci_dn.c   | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c5eb86f..6c383ad 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -140,6 +140,7 @@ struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	struct pci_dev *physfn;		/* Associated PF PORT		*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
 };
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f771130..f0ddde7 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 {
 #ifdef CONFIG_PCI_IOV
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pci_dn *parent, *pdn;
+	struct eeh_dev *edev;
 	int i;
 
 	/* Only support IOV for now */
@@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 				 __func__, i);
 			return NULL;
 		}
+		eeh_dev_init(pdn, hose);
+		edev = pdn_to_eeh_dev(pdn);
+		edev->physfn = pdev;
 	}
 #endif /* CONFIG_PCI_IOV */
 
@@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
 		list_for_each_entry_safe(pdn, tmp,
 			&parent->child_list, list) {
+			struct eeh_dev *edev;
 			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
 			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
 				continue;
 
+			edev = pdn_to_eeh_dev(pdn);
+			if (edev) {
+				pdn->edev = NULL;
+				kfree(edev);
+			}
+
 			if (!list_empty(&pdn->list))
 				list_del(&pdn->list);
 
-- 
2.5.0



* [PATCH V10 07/12] powerpc/eeh: Create PE for VFs
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (5 preceding siblings ...)
  2015-10-26  3:15 ` [PATCH V10 06/12] powerpc/powernv: EEH device for VF Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-30  3:46   ` Alexey Kardashevskiy
  2015-10-26  3:15 ` [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE Wei Yang
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

The current EEH recovery code assumes that every PE has a primary bus.
Unfortunately, that's not true for VF PEs, which generally contain
one or multiple VFs (for the VF group case).

The patch creates PEs for VFs in the weak function
pcibios_bus_add_device(). Those PEs are identified by the newly
introduced flag EEH_PE_VF so that they can be handled differently during
EEH recovery.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |  1 +
 arch/powerpc/kernel/eeh_pe.c                 | 10 ++++++++--
 arch/powerpc/platforms/powernv/eeh-powernv.c | 16 ++++++++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 6c383ad..ec21f8f 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -72,6 +72,7 @@ struct pci_dn;
 #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
 #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
 #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
+#define EEH_PE_VF	(1 << 4)	/* VF PE     */
 
 #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
 #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 35f0b62..260a701 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
 	 * EEH device already having associated PE, but
 	 * the direct parent EEH device doesn't have yet.
 	 */
-	pdn = pdn ? pdn->parent : NULL;
+	if (edev->physfn)
+		pdn = pci_get_pdn(edev->physfn);
+	else
+		pdn = pdn ? pdn->parent : NULL;
 	while (pdn) {
 		/* We're poking out of PCI territory */
 		parent = pdn_to_eeh_dev(pdn);
@@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 	}
 
 	/* Create a new EEH PE */
-	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
+	if (edev->physfn)
+		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
+	else
+		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
 	if (!pe) {
 		pr_err("%s: out of memory!\n", __func__);
 		return -ENOMEM;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 7cf0df8..cfd55dd 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1524,6 +1524,22 @@ static struct eeh_ops pnv_eeh_ops = {
 	.restore_config		= pnv_eeh_restore_config
 };
 
+void pcibios_bus_add_device(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+
+	if (!pdev->is_virtfn)
+		return;
+
+	/*
+	 * The following operations will fail if VF's sysfs files
+	 * aren't created or its resources aren't finalized.
+	 */
+	eeh_add_device_early(pdn);
+	eeh_add_device_late(pdev);
+	eeh_sysfs_add_device(pdev);
+}
+
 /**
  * eeh_powernv_init - Register platform dependent EEH operations
  *
-- 
2.5.0



* [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (6 preceding siblings ...)
  2015-10-26  3:15 ` [PATCH V10 07/12] powerpc/eeh: Create PE for VFs Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-30  4:11   ` Alexey Kardashevskiy
  2015-10-26  3:15 ` [PATCH V10 09/12] powerpc/powernv: Support PCI config restore for VFs Wei Yang
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

PEs for VFs don't have a primary bus, so they need their own reset
backend for use during EEH recovery. The patch implements the reset
backend for a VF PE by issuing FLR or AF FLR to the VFs contained
in the PE.
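
The fallback order (PCIe FLR first, AF FLR only when FLR is unsupported) can
be modelled in user space as below. struct fake_vf and all function names are
hypothetical; no config space is touched. worst_case_poll_ms() assumes a
pending-bit poll that sleeps 100 ms, 200 ms, 400 ms and 800 ms before giving
up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum reset_method { RESET_NONE, RESET_PCIE_FLR, RESET_AF_FLR };

/* Hypothetical stand-in for a VF; not a kernel structure. */
struct fake_vf {
	int has_pcie_flr;        /* PCIe capability advertises FLR */
	int has_af_flr;          /* AF capability advertises FLR */
	enum reset_method used;  /* which reset was issued last */
};

static int do_flr(struct fake_vf *vf)
{
	if (!vf->has_pcie_flr)
		return -ENOTTY;  /* unsupported: caller may fall back */
	vf->used = RESET_PCIE_FLR;
	return 0;
}

static int do_af_flr(struct fake_vf *vf)
{
	if (!vf->has_af_flr)
		return -ENOTTY;
	vf->used = RESET_AF_FLR;
	return 0;
}

/* Try PCIe FLR first; fall back to AF FLR only on -ENOTTY. */
static int reset_vf(struct fake_vf *vf)
{
	int ret;

	assert(vf != NULL);

	ret = do_flr(vf);
	if (ret != -ENOTTY)
		return ret;

	return do_af_flr(vf);
}

/* Total sleep if the pending bit never clears: 100 ms << i, i = 0..3. */
static int worst_case_poll_ms(void)
{
	int total = 0;

	for (int i = 0; i < 4; i++)
		total += (1 << i) * 100;
	return total;
}
```

Returning -ENOTTY only for "not supported" keeps real failures of the first
method from being masked by a second attempt.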

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |   1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c | 134 ++++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index ec21f8f..331c856 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -136,6 +136,7 @@ struct eeh_dev {
 	int pcix_cap;			/* Saved PCIx capability	*/
 	int pcie_cap;			/* Saved PCIe capability	*/
 	int aer_cap;			/* Saved AER capability		*/
+	int af_cap;			/* Saved AF capability		*/
 	struct eeh_pe *pe;		/* Associated PE		*/
 	struct list_head list;		/* Form link list in the PE	*/
 	struct pci_controller *phb;	/* Associated PHB		*/
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index cfd55dd..017cd72 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -404,6 +404,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
 	edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
 	edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
+	edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
 	if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
 		edev->mode |= EEH_DEV_BRIDGE;
 		if (edev->pcie_cap) {
@@ -893,6 +894,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
+				     u16 mask, bool af_flr_rst)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	int status, i;
+
+	/* Wait for Transaction Pending bit to be cleared */
+	for (i = 0; i < 4; i++) {
+		eeh_ops->read_config(pdn, pos, 2, &status);
+		if (!(status & mask))
+			return;
+
+		msleep((1 << i) * 100);
+	}
+
+	pr_warn("%s: Pending transaction while issuing %s FLR to "
+		"%04x:%02x:%02x.%01x\n",
+		__func__, af_flr_rst ? "AF" : "",
+		edev->phb->global_number, pdn->busno,
+		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+}
+
+static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 reg;
+
+	if (!edev->pcie_cap)
+		return -ENOTTY;
+
+	eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);
+	if (!(reg & PCI_EXP_DEVCAP_FLR))
+		return -ENOTTY;
+
+	switch (option) {
+	case EEH_RESET_HOT:
+	case EEH_RESET_FUNDAMENTAL:
+		pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
+					 PCI_EXP_DEVSTA_TRPND, false);
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     4, &reg);
+		reg |= PCI_EXP_DEVCTL_BCR_FLR;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      4, reg);
+		msleep(EEH_PE_RST_HOLD_TIME);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     4, &reg);
+		reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      4, reg);
+		msleep(EEH_PE_RST_SETTLE_TIME);
+		break;
+	}
+
+	return 0;
+}
+
+static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 cap;
+
+	if (!edev->af_cap)
+		return -ENOTTY;
+
+	eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);
+	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
+		return -ENOTTY;
+
+	switch (option) {
+	case EEH_RESET_HOT:
+	case EEH_RESET_FUNDAMENTAL:
+		/*
+		 * Wait for Transaction Pending bit to clear. A word-aligned
+		 * test is used, so we use the control offset rather than status
+		 * and shift the test bit to match.
+		 */
+		pnv_eeh_wait_for_pending(pdn, edev->af_cap + PCI_AF_CTRL,
+					 PCI_AF_STATUS_TP << 8, true);
+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL,
+				      1, PCI_AF_CTRL_FLR);
+		msleep(EEH_PE_RST_HOLD_TIME);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL, 1, 0);
+		msleep(EEH_PE_RST_SETTLE_TIME);
+		break;
+	}
+
+	return 0;
+}
+
+static int pnv_eeh_reset_vf(struct pci_dn *pdn, int option)
+{
+	int ret;
+
+	ret = pnv_eeh_do_flr(pdn, option);
+	if (ret != -ENOTTY)
+		return ret;
+
+	return pnv_eeh_do_af_flr(pdn, option);
+}
+
+static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
+{
+	struct eeh_dev *edev, *tmp;
+	struct pci_dn *pdn;
+	int ret;
+
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		pdn = eeh_dev_to_pdn(edev);
+		ret = pnv_eeh_reset_vf(pdn, option);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *hose;
@@ -968,7 +1090,9 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 		}
 
 		bus = eeh_pe_bus_get(pe);
-		if (pci_is_root_bus(bus) ||
+		if (pe->type & EEH_PE_VF)
+			ret = pnv_eeh_vf_pe_reset(pe, option);
+		else if (pci_is_root_bus(bus) ||
 			pci_is_root_bus(bus->parent))
 			ret = pnv_eeh_root_reset(hose, option);
 		else
@@ -1108,6 +1232,14 @@ static inline bool pnv_eeh_cfg_blocked(struct pci_dn *pdn)
 	if (!edev || !edev->pe)
 		return false;
 
+	/*
+	 * We will issue FLR or AF FLR to all VFs, which are contained
+	 * in VF PE. It relies on the EEH PCI config accessors. So we
+	 * can't block them during the window.
+	 */
+	if ((edev->physfn) && (edev->pe->state & EEH_PE_RESET))
+		return false;
+
 	if (edev->pe->state & EEH_PE_CFG_BLOCKED)
 		return true;
 
-- 
2.5.0



* [PATCH V10 09/12] powerpc/powernv: Support PCI config restore for VFs
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (7 preceding siblings ...)
  2015-10-26  3:15 ` [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE Wei Yang
@ 2015-10-26  3:15 ` Wei Yang
  2015-10-30  4:56   ` Alexey Kardashevskiy
  2015-10-26  3:16 ` [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE Wei Yang
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:15 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

After PE reset, OPAL API opal_pci_reinit() is called on all devices
contained in the PE to reinitialize them. However, VFs can't be seen
from skiboot firmware. We have to implement functions, similar to
those in skiboot firmware, to reinitialize VFs after reset on PE
for VFs.
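
For reference, the MPS restore step relies on the Device Control encoding
of Max_Payload_Size: bits 7:5 hold log2(bytes) - 7, so 128 bytes encodes
as 0 and 4096 bytes as 5. Below is a minimal userspace sketch of that
arithmetic. The helper names and DEVCTL_PAYLOAD_MASK are hypothetical and
exist only for this illustration (the mask has the same value as
PCI_EXP_DEVCTL_PAYLOAD); the real code goes through the EEH config
accessors.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

/* Bits 7:5 of PCIe Device Control; hypothetical name for this sketch. */
#define DEVCTL_PAYLOAD_MASK	0x00e0u

/* Encode an MPS in bytes (power of two, 128..4096) as the DEVCTL field. */
static uint16_t mps_to_devctl_field(int mps_bytes)
{
	assert(mps_bytes >= 128 && mps_bytes <= 4096);
	assert((mps_bytes & (mps_bytes - 1)) == 0);

	/* ffs(128) == 8, so 128 bytes maps to field value 0. */
	return (uint16_t)((ffs(mps_bytes) - 8) << 5);
}

/* Replace only the MPS field of a DEVCTL value, keeping all other bits. */
static uint16_t devctl_set_mps(uint16_t devctl, int mps_bytes)
{
	devctl &= (uint16_t)~DEVCTL_PAYLOAD_MASK;
	devctl |= mps_to_devctl_field(mps_bytes);
	return devctl;
}
```

The read-modify-write shape matters here: only the payload bits are
cleared, so the error reporting enables in the same register survive.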

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h        |  1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c | 70 +++++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci.c         | 18 +++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 3d7e537..e499d93 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -219,6 +219,7 @@ struct pci_dn {
 #define IODA_INVALID_M64        (-1)
 	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
+	int     mps;
 #endif
 	struct list_head child_list;
 	struct list_head list;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 017cd72..3cc3e76 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1616,6 +1616,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
 	return ret;
 }
 
+static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+	u32 devctl, cmd, cap2, aer_capctl;
+	int old_mps;
+
+	/* Restore MPS */
+	if (edev->pcie_cap) {
+		old_mps = (ffs(pdn->mps) - 8) << 5;
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				     2, &devctl);
+		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
+		devctl |= old_mps;
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				      2, devctl);
+	}
+
+	/* Disable Completion Timeout */
+	if (edev->pcie_cap) {
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
+				     4, &cap2);
+		if (cap2 & 0x10) {
+			eeh_ops->read_config(pdn,
+					edev->pcie_cap + PCI_EXP_DEVCTL2,
+					4, &cap2);
+			cap2 |= 0x10;
+			eeh_ops->write_config(pdn,
+					edev->pcie_cap + PCI_EXP_DEVCTL2,
+					4, cap2);
+		}
+	}
+
+	/* Enable SERR and parity checking */
+	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
+	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
+	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
+
+	/* Enable reporting of various errors */
+	if (edev->pcie_cap) {
+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				2, &devctl);
+		devctl &= ~PCI_EXP_DEVCTL_CERE;
+		devctl |= (PCI_EXP_DEVCTL_NFERE |
+			   PCI_EXP_DEVCTL_FERE |
+			   PCI_EXP_DEVCTL_URRE);
+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+				2, devctl);
+	}
+
+	/* Enable ECRC generation and check */
+	if (edev->pcie_cap && edev->aer_cap) {
+		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+				4, &aer_capctl);
+		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
+		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+				4, aer_capctl);
+	}
+
+	return 0;
+}
+
 static int pnv_eeh_restore_config(struct pci_dn *pdn)
 {
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -1626,7 +1687,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
 		return -EEXIST;
 
 	phb = edev->phb->private_data;
-	ret = opal_pci_reinit(phb->opal_id,
+	/*
+	 * We have to restore the PCI config space after reset since the
+	 * firmware can't see SRIOV VFs.
+	 */
+	if (edev->physfn)
+		ret = pnv_eeh_restore_vf_config(pdn);
+	else
+		ret = opal_pci_reinit(phb->opal_id,
 			      OPAL_REINIT_PCI_DEV, edev->config_addr);
 	if (ret) {
 		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 765d8ed..0e4f42e 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -788,6 +788,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	int parent_mps;
+
+	if (!pdev->is_virtfn)
+		return;
+
+	/* Synchronize MPS for VF and PF */
+	parent_mps = pcie_get_mps(pdev->physfn);
+	if ((128 << pdev->pcie_mpss) >= parent_mps)
+		pcie_set_mps(pdev, parent_mps);
+	pdn->mps = pcie_get_mps(pdev);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
+#endif /* CONFIG_PCI_IOV */
+
 void __init pnv_pci_init(void)
 {
 	struct device_node *np;
-- 
2.5.0



* [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (8 preceding siblings ...)
  2015-10-26  3:15 ` [PATCH V10 09/12] powerpc/powernv: Support PCI config restore for VFs Wei Yang
@ 2015-10-26  3:16 ` Wei Yang
  2015-10-30  5:20   ` Alexey Kardashevskiy
  2015-10-26  3:16 ` [PATCH V10 11/12] powerpc/eeh: Don't block PCI config on resetting " Wei Yang
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:16 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

Unlike a PCI bus dependent PE, a PE for VFs doesn't have a primary
bus on which PCI hotplug is implemented. The patch supports error
recovery, especially PCI hotplug, for a VF's PE.
The hotplug on VF's PE is implemented based on VFs, instead of
PCI bus any more.
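
The "in_error" flag added to eeh_dev in this patch can be modelled as a
tiny state machine. The sketch below is a userspace toy, not the kernel
code: every toy_* name is hypothetical, and the counters stand in for the
driver's slot_reset and resume callbacks. It only illustrates why a VF
that was removed after the error was reported does not get the later
callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct eeh_dev; only the fields the sketch needs. */
struct toy_edev {
	bool in_error;		/* set when the error was reported */
	int slot_reset_calls;	/* counts simulated slot_reset callbacks */
	int resume_calls;	/* counts simulated resume callbacks */
};

/* Models eeh_report_error(): mark the device as being in error. */
static void toy_report_error(struct toy_edev *e)
{
	assert(e);
	e->in_error = true;
}

/* Models eeh_report_reset(): skipped unless the error was reported. */
static void toy_report_reset(struct toy_edev *e)
{
	assert(e);
	if (!e->in_error)
		return;
	e->slot_reset_calls++;
}

/* Models eeh_report_resume(): the flag is cleared on every path. */
static void toy_report_resume(struct toy_edev *e)
{
	assert(e);
	if (e->in_error)
		e->resume_calls++;
	e->in_error = false;
}

/* Models eeh_remove_device(): a removed device starts out clean. */
static void toy_remove_device(struct toy_edev *e)
{
	assert(e);
	e->in_error = false;
}
```

A device that goes error, reset, resume sees one callback of each kind. A
device that is removed between error and reset sees none, which matches
the intent of re-creating the VF in a fresh state.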

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |   1 +
 arch/powerpc/kernel/eeh.c        |   8 ++++
 arch/powerpc/kernel/eeh_driver.c | 100 +++++++++++++++++++++++++++++++--------
 arch/powerpc/kernel/eeh_pe.c     |   3 +-
 4 files changed, 90 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 331c856..ea1f13c4 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -142,6 +142,7 @@ struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	int    in_error;		/* Error flag for eeh_dev	*/
 	struct pci_dev *physfn;		/* Associated PF PORT		*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
 };
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index af9b597..28e4d73 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1227,6 +1227,14 @@ void eeh_remove_device(struct pci_dev *dev)
 	 * from the parent PE during the BAR resotre.
 	 */
 	edev->pdev = NULL;
+
+	/*
+	 * The flag "in_error" is used to track whether the EEH device
+	 * for a VF is in error state. It's set in eeh_report_error(). If
+	 * it's not set, eeh_report_{reset,resume}() won't be called
+	 * for the VF EEH device.
+	 */
+	edev->in_error = 0;
 	dev->dev.archdata.edev = NULL;
 	if (!(edev->pe->state & EEH_PE_KEEP))
 		eeh_rmv_from_parent_pe(edev);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 89eb4bc..99868e2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
 
+	edev->in_error = 1;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->slot_reset ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		eeh_pcid_put(dev);
 		return NULL;
 	}
@@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		edev->mode &= ~EEH_DEV_NO_HANDLER;
-		eeh_pcid_put(dev);
-		return NULL;
+		goto out;
 	}
 
 	driver->err_handler->resume(dev);
 
+out:
+	edev->in_error = 0;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
 	return NULL;
 }
 
+static void *eeh_add_virt_device(void *data, void *userdata)
+{
+	struct pci_driver *driver;
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
+
+	if (!(edev->physfn)) {
+		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
+			__func__, edev->phb->global_number, pdn->busno,
+			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+		return NULL;
+	}
+
+	driver = eeh_pcid_get(dev);
+	if (driver) {
+		eeh_pcid_put(dev);
+		if (driver->err_handler)
+			return NULL;
+	}
+
+	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
+	return NULL;
+}
+
 static void *eeh_rmv_device(void *data, void *userdata)
 {
 	struct pci_driver *driver;
 	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	int *removed = (int *)userdata;
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	/*
 	 * Actually, we should remove the PCI bridges as well.
@@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (removed && driver->err_handler)
 			return NULL;
 	}
 
@@ -425,11 +455,23 @@ static void *eeh_rmv_device(void *data, void *userdata)
 		 pci_name(dev));
 	edev->bus = dev->bus;
 	edev->mode |= EEH_DEV_DISCONNECTED;
-	(*removed)++;
+	if (removed)
+		(*removed)++;
 
-	pci_lock_rescan_remove();
-	pci_stop_and_remove_bus_device(dev);
-	pci_unlock_rescan_remove();
+	if (edev->physfn) {
+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
+		edev->pdev = NULL;
+
+		/*
+		 * We have to set the VF PE number to invalid one, which is
+		 * required to plug the VF successfully.
+		 */
+		pdn->pe_number = IODA_INVALID_PE;
+	} else {
+		pci_lock_rescan_remove();
+		pci_stop_and_remove_bus_device(dev);
+		pci_unlock_rescan_remove();
+	}
 
 	return NULL;
 }
@@ -548,6 +590,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
 	struct timeval tstamp;
 	int cnt, rc, removed = 0;
+	struct eeh_dev *edev;
 
 	/* pcibios will clear the counter; save the value */
 	cnt = pe->freeze_count;
@@ -561,12 +604,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (bus) {
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(bus);
-		pci_unlock_rescan_remove();
-	} else if (frozen_bus) {
+		if (pe->type & EEH_PE_VF)
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+		else {
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(bus);
+			pci_unlock_rescan_remove();
+		}
+	} else if (frozen_bus)
 		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
-	}
 
 	/*
 	 * Reset the pci controller. (Asserts RST#; resets config space).
@@ -607,14 +653,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		 * PE. We should disconnect it so the binding can be
 		 * rebuilt when adding PCI devices.
 		 */
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(bus);
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+			pcibios_add_pci_devices(bus);
 	} else if (frozen_bus && removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
 
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(frozen_bus);
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+			pcibios_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -792,11 +846,15 @@ perm_error:
 	 * the their PCI config any more.
 	 */
 	if (frozen_bus) {
-		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
-
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(frozen_bus);
-		pci_unlock_rescan_remove();
+		if (pe->type & EEH_PE_VF) {
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+		} else {
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(frozen_bus);
+			pci_unlock_rescan_remove();
+		}
 	}
 }
 
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 260a701..5cde950 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -914,7 +914,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 	if (pe->type & EEH_PE_PHB) {
 		bus = pe->phb->bus;
 	} else if (pe->type & EEH_PE_BUS ||
-		   pe->type & EEH_PE_DEVICE) {
+		   pe->type & EEH_PE_DEVICE ||
+		   pe->type & EEH_PE_VF) {
 		if (pe->bus) {
 			bus = pe->bus;
 			goto out;
-- 
2.5.0



* [PATCH V10 11/12] powerpc/eeh: Don't block PCI config on resetting VF PE
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (9 preceding siblings ...)
  2015-10-26  3:16 ` [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE Wei Yang
@ 2015-10-26  3:16 ` Wei Yang
  2015-10-30  5:42   ` Alexey Kardashevskiy
  2015-10-26  3:16 ` [PATCH V10 12/12] powerpc/eeh: Handle hot removed VF when PF is EEH aware Wei Yang
  2015-10-27 23:11 ` [PATCH V10 00/12] VF EEH on Power8 Bjorn Helgaas
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:16 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci

From: Gavin Shan <gwshan@linux.vnet.ibm.com>

When passing through an SRIOV VF from host to guest via the VFIO PCI
infrastructure, the VF is reset by the EEH specific backend
(pcibios_set_pcie_reset_state()). We can't block PCI config access,
otherwise the reset (FLR or AF FLR), which is completed by PCI
config accesses to the VF, can't be done. Then the VF can't be
put into its initial state when passing it to the guest and returning
it back to the host.

The patch just doesn't block the VF's PCI config space when doing
the reset. It fixes EEH error caused by DMA traffic to bogus DMA
address on restarting guest after killing the QEMU process, which
includes Mellanox VF passed through from host.
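
The rule can be sketched as a small userspace model. This is an
illustration under stated assumptions only: the TOY_* flag values and
toy_* helpers are hypothetical and do not match the kernel's EEH_PE_*
definitions. The point is that asserting reset marks config space as
blocked for every PE type except a VF PE, so the FLR write can still
reach the VF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, chosen for this sketch only. */
#define TOY_PE_VF		(1u << 0)	/* PE contains a single VF */
#define TOY_PE_CFG_BLOCKED	(1u << 1)	/* config accesses are dropped */

struct toy_pe {
	unsigned int type;
	unsigned int state;
};

/* Models the hot/warm reset cases: block config unless this is a VF PE. */
static void toy_assert_reset(struct toy_pe *pe)
{
	assert(pe);
	if (!(pe->type & TOY_PE_VF))
		pe->state |= TOY_PE_CFG_BLOCKED;
}

/* A config write (such as the FLR trigger) only lands when not blocked. */
static bool toy_cfg_write_allowed(const struct toy_pe *pe)
{
	assert(pe);
	return !(pe->state & TOY_PE_CFG_BLOCKED);
}
```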

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/eeh.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 28e4d73..e1846f5 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -745,7 +745,8 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
 	case pcie_deassert_reset:
 		eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
 		eeh_unfreeze_pe(pe, false);
-		eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
+		if (!(pe->type & EEH_PE_VF))
+			eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
 		eeh_pe_dev_traverse(pe, eeh_restore_dev_state, dev);
 		eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
 		break;
@@ -753,14 +754,16 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
 		eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
 		eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
 		eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
-		eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
+		if (!(pe->type & EEH_PE_VF))
+			eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
 		eeh_ops->reset(pe, EEH_RESET_HOT);
 		break;
 	case pcie_warm_reset:
 		eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
 		eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
 		eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
-		eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
+		if (!(pe->type & EEH_PE_VF))
+			eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
 		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
 		break;
 	default:
-- 
2.5.0



* [PATCH V10 12/12] powerpc/eeh: Handle hot removed VF when PF is EEH aware
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (10 preceding siblings ...)
  2015-10-26  3:16 ` [PATCH V10 11/12] powerpc/eeh: Don't block PCI config on resetting " Wei Yang
@ 2015-10-26  3:16 ` Wei Yang
  2015-10-30  5:35   ` Alexey Kardashevskiy
  2015-10-27 23:11 ` [PATCH V10 00/12] VF EEH on Power8 Bjorn Helgaas
  12 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-26  3:16 UTC (permalink / raw)
  To: gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

When the PF is EEH aware while its VFs are not, the VFs will be removed
during EEH recovery. This is not supported in the current code, which
leads to the VFs being lost.

This patch fixes this by adding VFs back. VFs should be added back after PF
get recovered properly.
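
The bookkeeping can be sketched in userspace as follows. This is a toy
model under stated assumptions: the toy_* names and TOY_MAX_VFS are
hypothetical, and a fixed array stands in for the list_head based
eeh_rmv_data used in the patch. It shows the two phases: removed VFs are
recorded while the PE is torn down, then added back once the PF has
recovered.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_VFS 8	/* hypothetical limit for this sketch */

/* Toy stand-in for struct eeh_rmv_data: which VFs were removed. */
struct toy_rmv_data {
	int vf_index[TOY_MAX_VFS];
	int removed;
};

/* Toy PF that only tracks which of its VFs currently exist. */
struct toy_pf {
	bool vf_present[TOY_MAX_VFS];
};

/* Remove one VF; record it when the caller wants it re-added later. */
static void toy_remove_vf(struct toy_pf *pf, int idx,
			  struct toy_rmv_data *rmv)
{
	assert(idx >= 0 && idx < TOY_MAX_VFS);
	if (!pf->vf_present[idx])
		return;
	pf->vf_present[idx] = false;
	if (rmv)
		rmv->vf_index[rmv->removed++] = idx;
}

/* After the PF has recovered, add every recorded VF back and forget it. */
static void toy_readd_removed_vfs(struct toy_pf *pf,
				  struct toy_rmv_data *rmv)
{
	while (rmv->removed > 0) {
		int idx = rmv->vf_index[--rmv->removed];

		pf->vf_present[idx] = true;
	}
}
```

Passing NULL for the record mirrors the hotplug reset path, where nothing
needs to be remembered because the whole PE is rebuilt anyway.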

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/eeh.h   |  6 ++++++
 arch/powerpc/kernel/eeh_dev.c    |  1 +
 arch/powerpc/kernel/eeh_driver.c | 30 +++++++++++++++++++++++-------
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index ea1f13c4..c529a23 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -127,6 +127,11 @@ static inline bool eeh_pe_passed(struct eeh_pe *pe)
 #define EEH_DEV_SYSFS		(1 << 9)	/* Sysfs created	*/
 #define EEH_DEV_REMOVED		(1 << 10)	/* Removed permanently	*/
 
+struct eeh_rmv_data {
+	struct list_head edev_list;
+	int removed;
+};
+
 struct eeh_dev {
 	int mode;			/* EEH mode			*/
 	int class_code;			/* Class code of the device	*/
@@ -139,6 +144,7 @@ struct eeh_dev {
 	int af_cap;			/* Saved AF capability		*/
 	struct eeh_pe *pe;		/* Associated PE		*/
 	struct list_head list;		/* Form link list in the PE	*/
+	struct list_head rmv_list;	/* Record the removed edev 	*/
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index aabba94..7815095 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -67,6 +67,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
 	edev->pdn = pdn;
 	edev->phb = phb;
 	INIT_LIST_HEAD(&edev->list);
+	INIT_LIST_HEAD(&edev->rmv_list);
 
 	return NULL;
 }
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 99868e2..f2406b4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -420,7 +420,8 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	struct pci_driver *driver;
 	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	int *removed = (int *)userdata;
+	struct eeh_rmv_data *rmv_data = (struct eeh_rmv_data *)userdata;
+	int *removed = rmv_data ? &rmv_data->removed : NULL;
 	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	/*
@@ -467,6 +468,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
 		 * required to plug the VF successfully.
 		 */
 		pdn->pe_number = IODA_INVALID_PE;
+
+		if (rmv_data)
+			list_add(&edev->rmv_list, &rmv_data->edev_list);
 	} else {
 		pci_lock_rescan_remove();
 		pci_stop_and_remove_bus_device(dev);
@@ -585,11 +589,12 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
  * During the reset, udev might be invoked because those affected
  * PCI devices will be removed and then added.
  */
-static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
+static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
+				struct eeh_rmv_data *rmv_data)
 {
 	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
 	struct timeval tstamp;
-	int cnt, rc, removed = 0;
+	int cnt, rc;
 	struct eeh_dev *edev;
 
 	/* pcibios will clear the counter; save the value */
@@ -612,7 +617,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 			pci_unlock_rescan_remove();
 		}
 	} else if (frozen_bus)
-		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
+		eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
 
 	/*
 	 * Reset the pci controller. (Asserts RST#; resets config space).
@@ -659,7 +664,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 			eeh_add_virt_device(edev, NULL);
 		else
 			pcibios_add_pci_devices(bus);
-	} else if (frozen_bus && removed) {
+	} else if (frozen_bus && rmv_data->removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
 
@@ -687,8 +692,10 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 static void eeh_handle_normal_event(struct eeh_pe *pe)
 {
 	struct pci_bus *frozen_bus;
+	struct eeh_dev *edev, *tmp;
 	int rc = 0;
 	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
+	struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0};
 
 	frozen_bus = eeh_pe_bus_get(pe);
 	if (!frozen_bus) {
@@ -735,7 +742,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
 	 */
 	if (result == PCI_ERS_RESULT_NONE) {
 		pr_info("EEH: Reset with hotplug activity\n");
-		rc = eeh_reset_device(pe, frozen_bus);
+		rc = eeh_reset_device(pe, frozen_bus, NULL);
 		if (rc) {
 			pr_warn("%s: Unable to reset, err=%d\n",
 				__func__, rc);
@@ -787,7 +794,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
 	/* If any device called out for a reset, then reset the slot */
 	if (result == PCI_ERS_RESULT_NEED_RESET) {
 		pr_info("EEH: Reset without hotplug activity\n");
-		rc = eeh_reset_device(pe, NULL);
+		rc = eeh_reset_device(pe, NULL, &rmv_data);
 		if (rc) {
 			pr_warn("%s: Cannot reset, err=%d\n",
 				__func__, rc);
@@ -807,6 +814,15 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
 		goto hard_fail;
 	}
 
+	/*
+	 * For those hot removed VFs, we should add them back after the PF
+	 * has recovered properly.
+	 */
+	list_for_each_entry_safe(edev, tmp, &rmv_data.edev_list, rmv_list) {
+		eeh_add_virt_device(edev, NULL);
+		list_del(&edev->rmv_list);
+	}
+
 	/* Tell all device drivers that they can resume operations */
 	pr_info("EEH: Notify device driver to resume\n");
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
-- 
2.5.0



* Re: [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-10-26  3:15 ` [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
@ 2015-10-27  1:31   ` Andrew Donnellan
  2015-10-27 23:06   ` Bjorn Helgaas
  1 sibling, 0 replies; 50+ messages in thread
From: Andrew Donnellan @ 2015-10-27  1:31 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe, aik; +Cc: linux-pci, linuxppc-dev

On 26/10/15 14:15, Wei Yang wrote:
> During EEH recovery, hotplug is applied to the devices which don't
> have drivers or their drivers don't support EEH. However, the hotplug,
> which was implemented based on PCI bus, can't be applied to VF directly.
>
> The patch renames virtn_{add,remove}() and exports them so that they
> can be used in PCI hotplug during EEH recovery.
>
> [gwshan: changelog]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited



* Re: [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn
  2015-10-26  3:15 ` [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn Wei Yang
@ 2015-10-27  5:01   ` Andrew Donnellan
  2015-10-27 22:04   ` Daniel Axtens
  2015-10-30  2:05   ` Alexey Kardashevskiy
  2 siblings, 0 replies; 50+ messages in thread
From: Andrew Donnellan @ 2015-10-27  5:01 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe, aik; +Cc: linux-pci, linuxppc-dev

On 26/10/15 14:15, Wei Yang wrote:
> The patch caches the VF index in pci_dn, which can be used to calculate
> VF's bus, device and function number. Those information helps to locate
> the VF's PCI device instance when doing hotplug during EEH recovery if
> necessary.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited



* Re: [PATCH V10 02/12] PCI: Add pcibios_bus_add_device() weak function
  2015-10-26  3:15 ` [PATCH V10 02/12] PCI: Add pcibios_bus_add_device() weak function Wei Yang
@ 2015-10-27  5:07   ` Andrew Donnellan
  0 siblings, 0 replies; 50+ messages in thread
From: Andrew Donnellan @ 2015-10-27  5:07 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe, aik; +Cc: linux-pci, linuxppc-dev

On 26/10/15 14:15, Wei Yang wrote:
> This patch adds a weak function pcibios_bus_add_device() for arch dependent
> code could do proper setup. For example, powerpc could setup EEH related
> resources.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited



* Re: [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn
  2015-10-26  3:15 ` [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn Wei Yang
  2015-10-27  5:01   ` Andrew Donnellan
@ 2015-10-27 22:04   ` Daniel Axtens
  2015-10-28  1:45     ` Wei Yang
  2015-10-30  2:05   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 50+ messages in thread
From: Daniel Axtens @ 2015-10-27 22:04 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang


Hi,

>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index b3a226b..3d7e537 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -210,6 +210,7 @@ struct pci_dn {
>  #define IODA_INVALID_PE		(-1)
>  #ifdef CONFIG_PPC_POWERNV
>  	int	pe_number;
> +	int     vf_index;		/* VF index in the PF */

Here, vf_index is inside CONFIG_PPC_POWERNV...

>  #ifdef CONFIG_PCI_IOV
>  	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>  	u16     num_vfs;		/* number of VFs enabled*/
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index b3b4df9..f771130 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
>  #ifdef CONFIG_PCI_IOV
>  static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>  					   struct pci_dev *pdev,
> +					   int vf_index,
>  					   int busno, int devfn)
>  {
>  	struct pci_dn *pdn;
> @@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>  	pdn->parent = parent;
>  	pdn->busno = busno;
>  	pdn->devfn = devfn;
> +	pdn->vf_index = vf_index;
>  #ifdef CONFIG_PPC_POWERNV
>  	pdn->pe_number = IODA_INVALID_PE;
... but here, vf_index is outside CONFIG_PPC_POWERNV.

Otherwise, the patch looks fine to me.

I'm still trying to get my head around SR-IOV generally - once I do I
will add any more comments I have or add a reviewed-by.

Regards,
Daniel



* Re: [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-10-26  3:15 ` [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
  2015-10-27  1:31   ` Andrew Donnellan
@ 2015-10-27 23:06   ` Bjorn Helgaas
  2015-10-28  1:21     ` Wei Yang
  1 sibling, 1 reply; 50+ messages in thread
From: Bjorn Helgaas @ 2015-10-27 23:06 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, bhelgaas, mpe, aik, linuxppc-dev, linux-pci

On Mon, Oct 26, 2015 at 11:15:51AM +0800, Wei Yang wrote:
> During EEH recovery, hotplug is applied to the devices which don't
> have drivers or their drivers don't support EEH. However, the hotplug,
> which was implemented based on PCI bus, can't be applied to VF directly.
> 
> The patch renames virtn_{add,remove}() and exports them so that they
> can be used in PCI hotplug during EEH recovery.

Trivial, but write this as an imperative sentence, e.g.,

  Rename virtn_{add,remove}() and export them so they
  can be used in PCI hotplug during EEH recovery.

"The patch" doesn't add any useful information; it's obvious that the
changelog applied to this patch.

This comment also applies to at least the next patch.

Bjorn


* Re: [PATCH V10 00/12] VF EEH on Power8
  2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
                   ` (11 preceding siblings ...)
  2015-10-26  3:16 ` [PATCH V10 12/12] powerpc/eeh: Handle hot removed VF when PF is EEH aware Wei Yang
@ 2015-10-27 23:11 ` Bjorn Helgaas
  2015-10-28  1:50   ` Wei Yang
  12 siblings, 1 reply; 50+ messages in thread
From: Bjorn Helgaas @ 2015-10-27 23:11 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, bhelgaas, mpe, aik, linuxppc-dev, linux-pci

On Mon, Oct 26, 2015 at 11:15:50AM +0800, Wei Yang wrote:
> This patchset enables EEH on SRIOV VFs. The general idea is to create proper
> VF edev and VF PE and handle them properly.
> ...

> Gavin Shan (1):
>   powerpc/eeh: Don't block PCI config on resetting VF PE
> 
> Wei Yang (11):
>   PCI/IOV: Rename and export virtfn_add/virtfn_remove
>   PCI: Add pcibios_bus_add_device() weak function
>   powerpc/pci: Cache VF index in pci_dn
>   powerpc/pci: Remove VFs prior to PF
>   powerpc/eeh: Cache only BARs, not windows or IOV BARs
>   powerpc/powernv: EEH device for VF
>   powerpc/eeh: Create PE for VFs
>   powerpc/powernv: Support EEH reset for VF PE
>   powerpc/powernv: Support PCI config restore for VFs
>   powerpc/eeh: Support error recovery for VF PE
>   powerpc/eeh: Handle hot removed VF when PF is EEH aware
> 
>  arch/powerpc/include/asm/eeh.h               |  10 ++
>  arch/powerpc/include/asm/pci-bridge.h        |   2 +
>  arch/powerpc/kernel/eeh.c                    |  17 ++-
>  arch/powerpc/kernel/eeh_cache.c              |   6 +-
>  arch/powerpc/kernel/eeh_dev.c                |   1 +
>  arch/powerpc/kernel/eeh_driver.c             | 130 ++++++++++++----
>  arch/powerpc/kernel/eeh_pe.c                 |  13 +-
>  arch/powerpc/kernel/pci-hotplug.c            |   2 +-
>  arch/powerpc/kernel/pci_dn.c                 |  16 +-
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 220 ++++++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/pci.c         |  18 +++
>  drivers/pci/bus.c                            |   3 +
>  drivers/pci/iov.c                            |  10 +-
>  include/linux/pci.h                          |   8 +
>  14 files changed, 408 insertions(+), 48 deletions(-)

This really only affects powerpc, so I assume this series will go through
the powerpc tree.  Let me know if you want me to do anything else.

Bjorn

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove
  2015-10-27 23:06   ` Bjorn Helgaas
@ 2015-10-28  1:21     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-28  1:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Wei Yang, gwshan, bhelgaas, mpe, aik, linuxppc-dev, linux-pci

On Tue, Oct 27, 2015 at 06:06:54PM -0500, Bjorn Helgaas wrote:
>On Mon, Oct 26, 2015 at 11:15:51AM +0800, Wei Yang wrote:
>> During EEH recovery, hotplug is applied to the devices which don't
>> have drivers or their drivers don't support EEH. However, the hotplug,
>> which was implemented based on PCI bus, can't be applied to VF directly.
>> 
>> The patch renames virtfn_{add,remove}() and exports them so that they
>> can be used in PCI hotplug during EEH recovery.
>
>Trivial, but write this as an imperative sentence, e.g.,
>
>  Rename virtfn_{add,remove}() and export them so they
>  can be used in PCI hotplug during EEH recovery.
>
>"The patch" doesn't add any useful information; it's obvious that the
>changelog applies to this patch.

Yep, thanks, will change in next version.

>
>This comment also applies to at least the next patch.
>
>Bjorn

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn
  2015-10-27 22:04   ` Daniel Axtens
@ 2015-10-28  1:45     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-28  1:45 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Wei Yang, gwshan, bhelgaas, mpe, aik, linuxppc-dev, linux-pci

On Wed, Oct 28, 2015 at 09:04:34AM +1100, Daniel Axtens wrote:
>Hi,
>
>>
>> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>> index b3a226b..3d7e537 100644
>> --- a/arch/powerpc/include/asm/pci-bridge.h
>> +++ b/arch/powerpc/include/asm/pci-bridge.h
>> @@ -210,6 +210,7 @@ struct pci_dn {
>>  #define IODA_INVALID_PE		(-1)
>>  #ifdef CONFIG_PPC_POWERNV
>>  	int	pe_number;
>> +	int     vf_index;		/* VF index in the PF */
>
>Here, vf_index is inside CONFIG_PPC_POWERNV...
>
>>  #ifdef CONFIG_PCI_IOV
>>  	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>  	u16     num_vfs;		/* number of VFs enabled*/
>> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>> index b3b4df9..f771130 100644
>> --- a/arch/powerpc/kernel/pci_dn.c
>> +++ b/arch/powerpc/kernel/pci_dn.c
>> @@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
>>  #ifdef CONFIG_PCI_IOV
>>  static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>>  					   struct pci_dev *pdev,
>> +					   int vf_index,
>>  					   int busno, int devfn)
>>  {
>>  	struct pci_dn *pdn;
>> @@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>>  	pdn->parent = parent;
>>  	pdn->busno = busno;
>>  	pdn->devfn = devfn;
>> +	pdn->vf_index = vf_index;
>>  #ifdef CONFIG_PPC_POWERNV
>>  	pdn->pe_number = IODA_INVALID_PE;
>... but here, vf_index is outside CONFIG_PPC_POWERNV.
>

Hey, Daniel

Glad to see your comment. You are right: to be consistent, the assignment
should go inside the CONFIG_PPC_POWERNV block. Will change it in the next version.
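
The consistency point can be illustrated with a minimal userspace sketch. This
is not the kernel code: struct pci_dn_sketch is a cut-down stand-in for
struct pci_dn, and PPC_POWERNV_SKETCH is a hypothetical macro standing in for
CONFIG_PPC_POWERNV. The field and every access to it sit under the same guard,
so the file builds whether or not the macro is defined.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_PPC_POWERNV. */
#define PPC_POWERNV_SKETCH 1

#define IODA_INVALID_PE	(-1)

/* Cut-down model of struct pci_dn; only the fields discussed here. */
struct pci_dn_sketch {
	int busno;
	int devfn;
#ifdef PPC_POWERNV_SKETCH
	int pe_number;
	int vf_index;		/* VF index in the PF */
#endif
};

void init_vf_pdn(struct pci_dn_sketch *pdn, int vf_index,
		 int busno, int devfn)
{
	assert(pdn);
	pdn->busno = busno;
	pdn->devfn = devfn;
#ifdef PPC_POWERNV_SKETCH
	/* Same guard as the declaration, so both builds compile. */
	pdn->pe_number = IODA_INVALID_PE;
	pdn->vf_index = vf_index;
#else
	(void)vf_index;		/* field does not exist in this build */
#endif
}
```

With the assignment outside the guard, a build without the macro would fail
because the field would not exist.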

>Otherwise, the patch looks fine to me.
>
>I'm still trying to get my head around SR-IOV generally - once I do I
>will add any more comments I have or add a reviewed-by.
>
>Regards,
>Daniel



-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 00/12] VF EEH on Power8
  2015-10-27 23:11 ` [PATCH V10 00/12] VF EEH on Power8 Bjorn Helgaas
@ 2015-10-28  1:50   ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-28  1:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Wei Yang, gwshan, bhelgaas, mpe, aik, linuxppc-dev, linux-pci

On Tue, Oct 27, 2015 at 06:11:13PM -0500, Bjorn Helgaas wrote:
>On Mon, Oct 26, 2015 at 11:15:50AM +0800, Wei Yang wrote:
>> This patchset enables EEH on SRIOV VFs. The general idea is to create proper
>> VF edev and VF PE and handle them properly.
>> ...
>
>> Gavin Shan (1):
>>   powerpc/eeh: Don't block PCI config on resetting VF PE
>> 
>> Wei Yang (11):
>>   PCI/IOV: Rename and export virtfn_add/virtfn_remove
>>   PCI: Add pcibios_bus_add_device() weak function
>>   powerpc/pci: Cache VF index in pci_dn
>>   powerpc/pci: Remove VFs prior to PF
>>   powerpc/eeh: Cache only BARs, not windows or IOV BARs
>>   powerpc/powernv: EEH device for VF
>>   powerpc/eeh: Create PE for VFs
>>   powerpc/powernv: Support EEH reset for VF PE
>>   powerpc/powernv: Support PCI config restore for VFs
>>   powerpc/eeh: Support error recovery for VF PE
>>   powerpc/eeh: Handle hot removed VF when PF is EEH aware
>> 
>>  arch/powerpc/include/asm/eeh.h               |  10 ++
>>  arch/powerpc/include/asm/pci-bridge.h        |   2 +
>>  arch/powerpc/kernel/eeh.c                    |  17 ++-
>>  arch/powerpc/kernel/eeh_cache.c              |   6 +-
>>  arch/powerpc/kernel/eeh_dev.c                |   1 +
>>  arch/powerpc/kernel/eeh_driver.c             | 130 ++++++++++++----
>>  arch/powerpc/kernel/eeh_pe.c                 |  13 +-
>>  arch/powerpc/kernel/pci-hotplug.c            |   2 +-
>>  arch/powerpc/kernel/pci_dn.c                 |  16 +-
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 220 ++++++++++++++++++++++++++-
>>  arch/powerpc/platforms/powernv/pci.c         |  18 +++
>>  drivers/pci/bus.c                            |   3 +
>>  drivers/pci/iov.c                            |  10 +-
>>  include/linux/pci.h                          |   8 +
>>  14 files changed, 408 insertions(+), 48 deletions(-)
>
>This really only affects powerpc, so I assume this series will go through
>the powerpc tree.  Let me know if you want me to do anything else.
>

Yep, as we talked about it, this will be merged in powerpc tree.

Have a good day :-)

>Bjorn

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs
  2015-10-26  3:15 ` [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs Wei Yang
@ 2015-10-29  3:29   ` Daniel Axtens
  2015-10-29  8:57     ` Wei Yang
  2015-10-30  3:22   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 50+ messages in thread
From: Daniel Axtens @ 2015-10-29  3:29 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe, aik; +Cc: linuxppc-dev, linux-pci, Wei Yang

Wei Yang <weiyang@linux.vnet.ibm.com> writes:

> EEH address cache, which helps to locate the PCI device according to
> the given (physical) MMIO address, didn't cover PCI bridges. Also, it
> shouldn't return PF with address in PF's IOV BARs. Instead, the VFs
> should be returned.
>
> Also, by doing so, it removes the type check in
> eeh_addr_cache_insert_dev(), since bridge's window would not be cached.
>
> The patch restricts the address cache to cover first 7 BARs for the
> above purposes.
If I've understood the patch correctly, I think you want to swap the
last two paragraphs in the commit message:

"Restrict the address cache to cover the first 7 BARs...

Since the window of a bridge will now not be cached, remove the type
check..."

With regards to the actual patch, I have now got access to the PCI and
SR-IOV specs, but I'm still getting to grips with it all so let me know
if something I say doesn't make sense.

Here, you restrict the enumeration of resources to the standard and
extension ROM resources (the first 7), which excludes enumeration of
VF resources. That much I understand.

I'm having more trouble convincing myself that it's safe or desirable to
drop the test for bridges. I think I understand that the change to the
for loop means it _should_ be safe, but is there any motivation for the
change other than making the code more straightforward?

>  	/* Walk resources on this device, poke them into the tree */
This comment probably needs to be made more descriptive given the change.
> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
> +	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>  		resource_size_t start = pci_resource_start(dev,i);
>  		resource_size_t end = pci_resource_end(dev,i);
>  		unsigned long flags = pci_resource_flags(dev,i);
> @@ -222,10 +222,6 @@ void eeh_addr_cache_insert_dev(struct pci_dev *dev)
>  {

Regards,
Daniel

>  	unsigned long flags;
>  
> -	/* Ignore PCI bridges */
> -	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
> -		return;
> -
>  	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
>  	__eeh_addr_cache_insert_dev(dev);
>  	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
> -- 
> 2.5.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs
  2015-10-29  3:29   ` Daniel Axtens
@ 2015-10-29  8:57     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-29  8:57 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Wei Yang, gwshan, bhelgaas, mpe, aik, linuxppc-dev, linux-pci

On Thu, Oct 29, 2015 at 02:29:19PM +1100, Daniel Axtens wrote:
>Wei Yang <weiyang@linux.vnet.ibm.com> writes:
>
>> EEH address cache, which helps to locate the PCI device according to
>> the given (physical) MMIO address, didn't cover PCI bridges. Also, it
>> shouldn't return PF with address in PF's IOV BARs. Instead, the VFs
>> should be returned.
>>
>> Also, by doing so, it removes the type check in
>> eeh_addr_cache_insert_dev(), since bridge's window would not be cached.
>>
>> The patch restricts the address cache to cover first 7 BARs for the
>> above purposes.
>If I've understood the patch correctly, I think you want to swap the
>last two paragraphs in the commit message:
>
>"Restrict the address cache to cover the first 7 BARs...
>
>Since the window of a bridge will now not be cached, remove the type
>check..."
>

Hmm... my intent was for the last paragraph to state what the patch does and
for the second one to mention the other change in the log.

Either order is fine with me.

>With regards to the actual patch, I have now got access to the PCI and
>SR-IOV specs, but I'm still getting to grips with it all so let me know
>if something I say doesn't make sense.
>
>Here, you restrict the enumeration of resources to the standard and
>extension ROM resources (the first 7), which excludes enumeration of
>VF resources. That much I understand.
>
>I'm having more trouble convincing myself that it's safe or desirable to
>drop the test for bridges. I think I understand that the change to the
>for loop means it _should_ be safe, but is there any motivation for the
>change other than making the code more straightforward?
>

The motivation is just to make the code more straightforward.

For a bridge device, the first 7 resources are not used and the remaining ones
are no longer cached. That is why I removed the check in this patch.
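
A rough userspace model of that argument follows. The SK_* index layout is an
assumption modelled on the kernel's resource enum with CONFIG_PCI_IOV enabled,
the structs are hypothetical, and the "unused slot" test is simplified compared
with the real loop, which also looks at the resource flags.

```c
#include <assert.h>
#include <stddef.h>

/* Resource index layout, modelled on the kernel's with CONFIG_PCI_IOV=y. */
enum {
	SK_STD_RESOURCES = 0,		/* #0-5: standard BARs */
	SK_ROM_RESOURCE = 6,		/* #6: expansion ROM */
	SK_IOV_RESOURCES = 7,		/* #7-12: IOV BARs (PF only) */
	SK_BRIDGE_RESOURCES = 13,	/* #13-16: bridge windows */
	SK_NUM_RESOURCES = 17
};

struct sk_resource {
	unsigned long start;
	unsigned long end;	/* inclusive; 0/0 means the slot is unused */
};

struct sk_dev {
	struct sk_resource res[SK_NUM_RESOURCES];
};

/* Count the ranges the restricted loop would insert into the cache. */
int sk_count_cached(const struct sk_dev *dev)
{
	int i, n = 0;

	assert(dev);
	for (i = 0; i <= SK_ROM_RESOURCE; i++) {
		if (dev->res[i].start == 0 && dev->res[i].end == 0)
			continue;	/* unused slot */
		n++;
	}
	return n;
}
```

In this model a device that only has bridge windows contributes nothing to the
cache, so an explicit bridge check before the loop changes no result.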

>>  	/* Walk resources on this device, poke them into the tree *
>This comment probably needs to be made more descriptive given the change.

Right, will change it.

>> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
>> +	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>  		resource_size_t start = pci_resource_start(dev,i);
>>  		resource_size_t end = pci_resource_end(dev,i);
>>  		unsigned long flags = pci_resource_flags(dev,i);
>> @@ -222,10 +222,6 @@ void eeh_addr_cache_insert_dev(struct pci_dev *dev)
>>  {
>
>Regards,
>Daniel
>
>>  	unsigned long flags;
>>  
>> -	/* Ignore PCI bridges */
>> -	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
>> -		return;
>> -
>>  	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
>>  	__eeh_addr_cache_insert_dev(dev);
>>  	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
>> -- 
>> 2.5.0
>>



-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn
  2015-10-26  3:15 ` [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn Wei Yang
  2015-10-27  5:01   ` Andrew Donnellan
  2015-10-27 22:04   ` Daniel Axtens
@ 2015-10-30  2:05   ` Alexey Kardashevskiy
  2015-10-30  2:48     ` Wei Yang
  2 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  2:05 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:15 PM, Wei Yang wrote:
> The patch caches the VF index in pci_dn, which can be used to calculate
> VF's bus, device and function number. Those information helps to locate
> the VF's PCI device instance when doing hotplug during EEH recovery if
> necessary.


The patch does not make much sense on its own and is quite small; I'd merge it
into the one which makes use of this new vf_index.
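
For reference, the way a VF index maps to a bus and devfn can be sketched as
below. The arithmetic follows the SR-IOV routing-ID rule (PF routing ID plus
First VF Offset plus index times VF Stride); struct sk_pf and the sk_* helper
names are hypothetical and only model what pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() are used for in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a PF: its address plus two SR-IOV capability fields. */
struct sk_pf {
	uint8_t bus;
	uint8_t devfn;
	uint16_t offset;	/* First VF Offset */
	uint16_t stride;	/* VF Stride */
};

/* Bus number of VF vf_index; a carry out of devfn moves to the next bus. */
int sk_virtfn_bus(const struct sk_pf *pf, int vf_index)
{
	assert(vf_index >= 0);
	return pf->bus +
	       ((pf->devfn + pf->offset + pf->stride * vf_index) >> 8);
}

/* Device/function number of VF vf_index (low 8 bits of the routing ID). */
int sk_virtfn_devfn(const struct sk_pf *pf, int vf_index)
{
	assert(vf_index >= 0);
	return (pf->devfn + pf->offset + pf->stride * vf_index) & 0xff;
}
```

Given only a cached index, both values can be recomputed from the PF, which is
why the index alone is enough to locate the VF later.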

>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pci-bridge.h | 1 +
>   arch/powerpc/kernel/pci_dn.c          | 4 +++-
>   2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index b3a226b..3d7e537 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -210,6 +210,7 @@ struct pci_dn {
>   #define IODA_INVALID_PE		(-1)
>   #ifdef CONFIG_PPC_POWERNV
>   	int	pe_number;
> +	int     vf_index;		/* VF index in the PF */
>   #ifdef CONFIG_PCI_IOV
>   	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>   	u16     num_vfs;		/* number of VFs enabled*/
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index b3b4df9..f771130 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
>   #ifdef CONFIG_PCI_IOV
>   static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>   					   struct pci_dev *pdev,
> +					   int vf_index,
>   					   int busno, int devfn)
>   {
>   	struct pci_dn *pdn;
> @@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>   	pdn->parent = parent;
>   	pdn->busno = busno;
>   	pdn->devfn = devfn;
> +	pdn->vf_index = vf_index;
>   #ifdef CONFIG_PPC_POWERNV
>   	pdn->pe_number = IODA_INVALID_PE;
>   #endif
> @@ -196,7 +198,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>   		return NULL;
>
>   	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
> -		pdn = add_one_dev_pci_data(parent, NULL,
> +		pdn = add_one_dev_pci_data(parent, NULL, i,
>   					   pci_iov_virtfn_bus(pdev, i),
>   					   pci_iov_virtfn_devfn(pdev, i));
>   		if (!pdn) {
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn
  2015-10-30  2:05   ` Alexey Kardashevskiy
@ 2015-10-30  2:48     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  2:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 01:05:43PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>The patch caches the VF index in pci_dn, which can be used to calculate
>>VF's bus, device and function number. Those information helps to locate
>>the VF's PCI device instance when doing hotplug during EEH recovery if
>>necessary.
>
>
>The patch itself does not make much sense and quite small, I'd merge it into
>the one which makes use of this new vf_index.
>

Well, reasonable, will merge it.

>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h | 1 +
>>  arch/powerpc/kernel/pci_dn.c          | 4 +++-
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index b3a226b..3d7e537 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -210,6 +210,7 @@ struct pci_dn {
>>  #define IODA_INVALID_PE		(-1)
>>  #ifdef CONFIG_PPC_POWERNV
>>  	int	pe_number;
>>+	int     vf_index;		/* VF index in the PF */
>>  #ifdef CONFIG_PCI_IOV
>>  	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>  	u16     num_vfs;		/* number of VFs enabled*/
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index b3b4df9..f771130 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
>>  #ifdef CONFIG_PCI_IOV
>>  static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>>  					   struct pci_dev *pdev,
>>+					   int vf_index,
>>  					   int busno, int devfn)
>>  {
>>  	struct pci_dn *pdn;
>>@@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>>  	pdn->parent = parent;
>>  	pdn->busno = busno;
>>  	pdn->devfn = devfn;
>>+	pdn->vf_index = vf_index;
>>  #ifdef CONFIG_PPC_POWERNV
>>  	pdn->pe_number = IODA_INVALID_PE;
>>  #endif
>>@@ -196,7 +198,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>>  		return NULL;
>>
>>  	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
>>-		pdn = add_one_dev_pci_data(parent, NULL,
>>+		pdn = add_one_dev_pci_data(parent, NULL, i,
>>  					   pci_iov_virtfn_bus(pdev, i),
>>  					   pci_iov_virtfn_devfn(pdev, i));
>>  		if (!pdn) {
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 04/12] powerpc/pci: Remove VFs prior to PF
  2015-10-26  3:15 ` [PATCH V10 04/12] powerpc/pci: Remove VFs prior to PF Wei Yang
@ 2015-10-30  3:04   ` Alexey Kardashevskiy
  2015-10-30  6:31     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  3:04 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:15 PM, Wei Yang wrote:
> As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates,
> VFs, which might be hooked to same PCI bus as their PF should be removed

A comma is missing before "should be" (or maybe you did not need the comma
after "VFs" :) ).


> before the PF. Otherwise, the PCI hot unplugging on the PCI bus would

s/on/of/? "Unplugging on" does not make much sense to me, at least in this
context.


> cause kernel crash.
>
> The patch applies the above pattern to PowerPC PCI hotplug path.
>
> [gwshan: changelog]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/kernel/pci-hotplug.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 7f9ed0c..59c4361 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -55,7 +55,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
>
>   	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
>   		 pci_domain_nr(bus),  bus->number);
> -	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
> +	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
>   		pr_debug("   Removing %s...\n", pci_name(dev));
>   		pci_stop_and_remove_bus_device(dev);
>   	}
>
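
The effect of walking the list in reverse can be modelled in userspace as
below. This is a sketch under one assumption: VFs are appended to the bus list
after their PF, so a tail-first walk reaches them first. struct sk_node and the
sk_* helpers are hypothetical and only loosely modelled on struct list_head;
they are not the kernel list API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct sk_node {
	struct sk_node *prev, *next;
	int is_virtfn;		/* 1 for a VF, 0 for a PF */
};

void sk_list_init(struct sk_node *head)
{
	head->prev = head->next = head;
}

void sk_list_add_tail(struct sk_node *n, struct sk_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

void sk_list_del(struct sk_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/*
 * Remove every entry, tail first. The predecessor is saved before the
 * current node is unlinked, which is what makes the walk "safe".
 * Returns 1 if no PF was removed while a VF was still on the list.
 */
int sk_remove_all_reverse(struct sk_node *head)
{
	struct sk_node *n, *tmp;
	int vfs_left = 0, ok = 1;

	assert(head);
	for (n = head->next; n != head; n = n->next)
		vfs_left += n->is_virtfn;

	for (n = head->prev, tmp = n->prev; n != head; n = tmp, tmp = n->prev) {
		if (n->is_virtfn)
			vfs_left--;
		else if (vfs_left > 0)
			ok = 0;	/* a PF would go away before its VFs */
		sk_list_del(n);
	}
	return ok;
}
```

A forward walk over the same list would hit the PF first, which is the ordering
the patch is trying to avoid.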


-- 
Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs
  2015-10-26  3:15 ` [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs Wei Yang
  2015-10-29  3:29   ` Daniel Axtens
@ 2015-10-30  3:22   ` Alexey Kardashevskiy
  2015-10-30  6:37     ` Wei Yang
  1 sibling, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  3:22 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:15 PM, Wei Yang wrote:
> EEH address cache, which helps to locate the PCI device according to
> the given (physical) MMIO address, didn't cover PCI bridges. Also, it
> shouldn't return PF

"it shouldn't return" is about the cache, right? eeh_addr_cache_get_dev() - 
this guy can "return", the cache cannot.

> with address in PF's IOV BARs. Instead, the VFs
> should be returned.
>
> Also, by doing so, it removes the type check in
> eeh_addr_cache_insert_dev(), since bridge's window would not be cached.
>
> The patch restricts the address cache to cover first 7 BARs for the
> above purposes.


I'd understand something like this better :)

This restricts the EEH address cache to use only the first 7 BARs. This makes
__eeh_addr_cache_insert_dev() ignore PCI bridge windows and IOV BARs. As
a result of this change, eeh_addr_cache_get_dev() will return VFs for
addresses in the VFs' resources instead of the parent PFs.

This removes the extra check for a PCI bridge, as we limit
__eeh_addr_cache_insert_dev() to 7 BARs and this effectively excludes PCI
bridges from being cached.
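
The lookup behaviour described in that proposed changelog can be sketched as
follows. This is only a model: a flat array replaces the real cache structure,
and struct sk_cache, sk_cache_insert() and sk_cache_get_dev() are hypothetical
names, not the kernel's functions.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_RANGES 16

struct sk_range {
	unsigned long start, end;	/* inclusive */
	const char *owner;		/* device label, e.g. "PF" or "VF0" */
};

struct sk_cache {
	struct sk_range r[SK_MAX_RANGES];
	int n;
};

/* Insert one MMIO range; returns 0 on success, -1 if the table is full. */
int sk_cache_insert(struct sk_cache *c, unsigned long start,
		    unsigned long end, const char *owner)
{
	assert(start <= end);
	if (c->n >= SK_MAX_RANGES)
		return -1;
	c->r[c->n].start = start;
	c->r[c->n].end = end;
	c->r[c->n].owner = owner;
	c->n++;
	return 0;
}

/* Return the owner of the range containing addr, or NULL if none. */
const char *sk_cache_get_dev(const struct sk_cache *c, unsigned long addr)
{
	int i;

	for (i = 0; i < c->n; i++)
		if (addr >= c->r[i].start && addr <= c->r[i].end)
			return c->r[i].owner;
	return NULL;
}
```

If only the PF's ordinary BAR and the VF's own BAR are inserted, and the PF's
IOV BAR is left out, an address inside the VF's slice resolves to the VF, and
an address in the rest of the IOV BAR resolves to nothing.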


>
> [gwshan: changelog]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/kernel/eeh_cache.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index a1e86e1..e6887f0 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
>   	}
>
>   	/* Walk resources on this device, poke them into the tree */
> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
> +	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>   		resource_size_t start = pci_resource_start(dev,i);
>   		resource_size_t end = pci_resource_end(dev,i);
>   		unsigned long flags = pci_resource_flags(dev,i);
> @@ -222,10 +222,6 @@ void eeh_addr_cache_insert_dev(struct pci_dev *dev)
>   {
>   	unsigned long flags;
>
> -	/* Ignore PCI bridges */
> -	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
> -		return;
> -
>   	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
>   	__eeh_addr_cache_insert_dev(dev);
>   	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 06/12] powerpc/powernv: EEH device for VF
  2015-10-26  3:15 ` [PATCH V10 06/12] powerpc/powernv: EEH device for VF Wei Yang
@ 2015-10-30  3:33   ` Alexey Kardashevskiy
  2015-10-30  6:52     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  3:33 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:15 PM, Wei Yang wrote:
> VFs and their corresponding pci_dn instances are created and released
> dynamically as their PF's SRIOV capability is enabled and disabled.
> The patch creates and releases EEH devices for VFs when creating and
> releasing their pci_dn instances, which means EEH devices and pci_dn
> instances have same life cycle. Also, VF's EEH device is identified
> by (struct eeh_dev::physfn).


The add_dev_pci_data() helper (the one you hack) does not create pci_dn 
instances. The add_one_dev_pci_data() helper does.


>
> [gwshan: changelog and removed CONFIG_PCI_IOV]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/eeh.h |  1 +
>   arch/powerpc/kernel/pci_dn.c   | 12 ++++++++++++
>   2 files changed, 13 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index c5eb86f..6c383ad 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -140,6 +140,7 @@ struct eeh_dev {
>   	struct pci_controller *phb;	/* Associated PHB		*/
>   	struct pci_dn *pdn;		/* Associated PCI device node	*/
>   	struct pci_dev *pdev;		/* Associated PCI device	*/
> +	struct pci_dev *physfn;		/* Associated PF PORT		*/
>   	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>   };
>
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index f771130..f0ddde7 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>   struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>   {
>   #ifdef CONFIG_PCI_IOV
> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>   	struct pci_dn *parent, *pdn;
> +	struct eeh_dev *edev;
>   	int i;
>
>   	/* Only support IOV for now */
> @@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>   				 __func__, i);
>   			return NULL;
>   		}
> +		eeh_dev_init(pdn, hose);
> +		edev = pdn_to_eeh_dev(pdn);


In theory, pdn_to_eeh_dev() can return NULL. In this patch, it is not clear 
if pdn->edev gets initialized before or after add_dev_pci_data().
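
A defensive version of that assignment might look like the sketch below. The
sk_* types are hypothetical cut-down models of pci_dev, eeh_dev and pci_dn, and
the error return is an illustrative choice, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>

struct sk_pci_dev {
	int devfn;
};

struct sk_eeh_dev {
	struct sk_pci_dev *physfn;	/* associated PF */
};

struct sk_pci_dn {
	struct sk_eeh_dev *edev;	/* NULL if no EEH device was set up */
};

/* Modelled on pdn_to_eeh_dev(): tolerate a NULL pdn. */
struct sk_eeh_dev *sk_pdn_to_eeh_dev(struct sk_pci_dn *pdn)
{
	return pdn ? pdn->edev : NULL;
}

/* Record the PF in the VF's EEH device; -1 if there is no EEH device. */
int sk_set_physfn(struct sk_pci_dn *pdn, struct sk_pci_dev *pf)
{
	struct sk_eeh_dev *edev = sk_pdn_to_eeh_dev(pdn);

	assert(pf);
	if (!edev)
		return -1;	/* nothing to attach the PF to */
	edev->physfn = pf;
	return 0;
}
```

Whether such a check is needed depends on whether eeh_dev_init() can leave
pdn->edev unset, which is the open question raised above.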



> +		edev->physfn = pdev;
>   	}
>   #endif /* CONFIG_PCI_IOV */
>
> @@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
>   	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
>   		list_for_each_entry_safe(pdn, tmp,
>   			&parent->child_list, list) {
> +			struct eeh_dev *edev;
>   			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
>   			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
>   				continue;
>
> +			edev = pdn_to_eeh_dev(pdn);
> +			if (edev) {
> +				pdn->edev = NULL;
> +				kfree(edev);
> +			}
> +
>   			if (!list_empty(&pdn->list))
>   				list_del(&pdn->list);
>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 07/12] powerpc/eeh: Create PE for VFs
  2015-10-26  3:15 ` [PATCH V10 07/12] powerpc/eeh: Create PE for VFs Wei Yang
@ 2015-10-30  3:46   ` Alexey Kardashevskiy
  2015-10-30  6:59     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  3:46 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:15 PM, Wei Yang wrote:
> Current EEH recovery code works with the assumption: the PE has primary
> bus. Unfortunately, that's not true for VF PEs, which generally contains
> one or multiple VFs (for VF group case).

What is that "VF group case"? Isn't it the "compound PE" thingy which you
were removing in the "SRIOV redesign patchset"?

The patch might be ok, but the commit log above does not explain why the
existing way of allocating PEs would not work - somehow it works for a
primary bus now, so why wouldn't it work on other buses?


> The patch creates PEs for VFs in the weak function
> pcibios_bus_add_device().Those PEs for VFs are identified with newly
> introduced flag EEH_PE_VF so that we handle them differently during EEH
> recovery.
>
> [gwshan: changelog and code refactoring]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/eeh.h               |  1 +
>   arch/powerpc/kernel/eeh_pe.c                 | 10 ++++++++--
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 16 ++++++++++++++++
>   3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 6c383ad..ec21f8f 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -72,6 +72,7 @@ struct pci_dn;
>   #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>   #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>   #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
> +#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>
>   #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>   #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index 35f0b62..260a701 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>   	 * EEH device already having associated PE, but
>   	 * the direct parent EEH device doesn't have yet.
>   	 */
> -	pdn = pdn ? pdn->parent : NULL;
> +	if (edev->physfn)
> +		pdn = pci_get_pdn(edev->physfn);
> +	else
> +		pdn = pdn ? pdn->parent : NULL;
>   	while (pdn) {
>   		/* We're poking out of PCI territory */
>   		parent = pdn_to_eeh_dev(pdn);
> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>   	}
>
>   	/* Create a new EEH PE */
> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
> +	if (edev->physfn)
> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
> +	else
> +		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>   	if (!pe) {
>   		pr_err("%s: out of memory!\n", __func__);
>   		return -ENOMEM;
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 7cf0df8..cfd55dd 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -1524,6 +1524,22 @@ static struct eeh_ops pnv_eeh_ops = {
>   	.restore_config		= pnv_eeh_restore_config
>   };
>
> +void pcibios_bus_add_device(struct pci_dev *pdev)
> +{
> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> +
> +	if (!pdev->is_virtfn)
> +		return;
> +
> +	/*
> +	 * The following operations will fail if VF's sysfs files
> +	 * aren't created or its resources aren't finalized.
> +	 */
> +	eeh_add_device_early(pdn);
> +	eeh_add_device_late(pdev);
> +	eeh_sysfs_add_device(pdev);
> +}
> +
>   /**
>    * eeh_powernv_init - Register platform dependent EEH operations
>    *
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE
  2015-10-26  3:15 ` [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE Wei Yang
@ 2015-10-30  4:11   ` Alexey Kardashevskiy
  2015-10-30  7:18     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  4:11 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:15 PM, Wei Yang wrote:
> PEs for VFs don't have primary bus. So they have to have their own reset
> backend, which is used during EEH recovery. The patch implements the reset
> backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained
> in the PE.
>
> [gwshan: changelog and code refactoring]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/eeh.h               |   1 +
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 134 ++++++++++++++++++++++++++-
>   2 files changed, 134 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index ec21f8f..331c856 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -136,6 +136,7 @@ struct eeh_dev {
>   	int pcix_cap;			/* Saved PCIx capability	*/
>   	int pcie_cap;			/* Saved PCIe capability	*/
>   	int aer_cap;			/* Saved AER capability		*/
> +	int af_cap;			/* Saved AF capability		*/
>   	struct eeh_pe *pe;		/* Associated PE		*/
>   	struct list_head list;		/* Form link list in the PE	*/
>   	struct pci_controller *phb;	/* Associated PHB		*/
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index cfd55dd..017cd72 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -404,6 +404,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>   	edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
>   	edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
>   	edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
> +	edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
>   	if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
>   		edev->mode |= EEH_DEV_BRIDGE;
>   		if (edev->pcie_cap) {
> @@ -893,6 +894,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>   	return 0;
>   }
>
> +static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
> +				     u16 mask, bool af_flr_rst)
> +{
> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> +	int status, i;
> +
> +	/* Wait for Transaction Pending bit to be cleared */
> +	for (i = 0; i < 4; i++) {
> +		eeh_ops->read_config(pdn, pos, 2, &status);


gcc should have complained about using uninitialized @status here.
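
A minimal userspace sketch of the concern, not kernel code: if a config
accessor can fail without writing its output parameter, the caller ends up
testing an indeterminate value unless the local is initialized first.
mock_read_config() and its failure behaviour are hypothetical stand-ins and
say nothing about what the real eeh_ops->read_config() does on error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: returns non-zero on failure and, in that case,
 * leaves *val untouched (an assumption made for illustration only). */
int mock_read_config(int fail, uint32_t *val)
{
	assert(val != NULL);
	if (fail)
		return -1;
	*val = 0x0020;	/* pretend one status bit is set */
	return 0;
}

/* Returns 1 if the masked bit is set, 0 otherwise. The local is
 * initialized, so a failed read is seen as "bit clear" rather than
 * as whatever happened to be on the stack. */
int bit_pending(int fail, uint32_t mask)
{
	uint32_t status = 0;

	mock_read_config(fail, &status);
	return (status & mask) != 0;
}
```

With the initializer in place, bit_pending(1, mask) is well defined even
though the mock accessor never stored anything.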


> +		if (!(status & mask))
> +			return;
> +
> +		msleep((1 << i) * 100);
> +	}
> +
> +	pr_warn("%s: Pending transaction while issuing %s FLR to "
> +		"%04x:%02x:%02x.%01x\n",

Do not wrap user-visible strings.
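
A sketch of what the unwrapped form could look like, written as plain
userspace C with snprintf() standing in for pr_warn(). The helper name and
its parameters are hypothetical; the message text is taken from the hunk
above. Moving the space into "AF " is an extra tweak in this sketch (it
avoids a double space in the non-AF case), not something the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Keep the whole format string on one source line, even if it runs past
 * 80 columns, so that grepping the source for the logged text finds it. */
int format_flr_warning(char *buf, size_t len, const char *func, int af,
		       unsigned int domain, unsigned int bus,
		       unsigned int slot, unsigned int fn)
{
	assert(buf != NULL && func != NULL);
	return snprintf(buf, len,
			"%s: Pending transaction while issuing %sFLR to %04x:%02x:%02x.%01x\n",
			func, af ? "AF " : "", domain, bus, slot, fn);
}
```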


> +		__func__, af_flr_rst ? "AF" : "",
> +		edev->phb->global_number, pdn->busno,
> +		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
> +}
> +
> +static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
> +{
> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> +	u32 reg;
> +
> +	if (!edev->pcie_cap)
> +		return -ENOTTY;


Can pnv_eeh_do_flr() really be called on a non-PCIe device? Can we get that
far? WARN_ON_ONCE(), maybe?
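
To illustrate the idea only: a userspace imitation of "warn the first time,
then keep returning the error quietly". The function name, the message and
the warnings_emitted counter are hypothetical; in the kernel this would just
be WARN_ON_ONCE() around the existing check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

int warnings_emitted;	/* lets a caller observe the "once" behaviour */

/* Stand-in for the check at the top of pnv_eeh_do_flr(): still returns
 * -ENOTTY for a device without a PCIe capability, but complains the
 * first time that happens and stays silent afterwards. */
int mock_do_flr_check(int pcie_cap)
{
	static bool warned;

	if (!pcie_cap) {
		if (!warned) {
			warned = true;
			warnings_emitted++;
			fprintf(stderr, "unexpected FLR on non-PCIe device\n");
		}
		return -ENOTTY;
	}

	return 0;
}
```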


> +
> +	eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);


... and here about uninitialized @reg.


> +	if (!(reg & PCI_EXP_DEVCAP_FLR))
> +		return -ENOTTY;
> +
> +	switch (option) {
> +	case EEH_RESET_HOT:
> +	case EEH_RESET_FUNDAMENTAL:
> +		pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
> +					 PCI_EXP_DEVSTA_TRPND, false);
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				     4, &reg);
> +		reg |= PCI_EXP_DEVCTL_BCR_FLR;
> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				      4, reg);
> +		msleep(EEH_PE_RST_HOLD_TIME);
> +		break;
> +	case EEH_RESET_DEACTIVATE:
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				     4, &reg);
> +		reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				      4, reg);
> +		msleep(EEH_PE_RST_SETTLE_TIME);
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
> +{
> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> +	u32 cap;
> +
> +	if (!edev->af_cap)
> +		return -ENOTTY;
> +
> +	eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);


... and here about @cap.

> +	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
> +		return -ENOTTY;
> +
> +	switch (option) {
> +	case EEH_RESET_HOT:
> +	case EEH_RESET_FUNDAMENTAL:
> +		/*
> +		 * Wait for Transaction Pending bit to clear. A word-aligned
> +		 * test is used, so we use the conrol offset rather than status
> +		 * and shift the test bit to match.


Why word-aligned (not byte or double word)?
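
One plausible reading of the quoted comment, sketched in userspace C: the AF
status byte sits at an odd offset (PCI_AF_STATUS is 5 in pci_regs.h), so a
2-byte read starting there would be unaligned, while a 2-byte read at
PCI_AF_CTRL (offset 4) is aligned and returns the status byte in bits 15:8.
The byte array and cfg_read16() are mocks, and the sketch assumes the
capability itself starts on a 4-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and bits within the AF capability, as in pci_regs.h. */
#define PCI_AF_CTRL		4
#define PCI_AF_STATUS		5
#define PCI_AF_STATUS_TP	0x01

/* Little-endian 16-bit read from a byte array standing in for config
 * space (a mock, not a real accessor). */
uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	assert((off & 1) == 0);	/* 2-byte reads must be 2-byte aligned */
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Test Transaction Pending through an aligned read at PCI_AF_CTRL:
 * the status byte at offset 5 arrives in the upper half. */
int af_tp_pending(const uint8_t *af_cap)
{
	return (cfg_read16(af_cap, PCI_AF_CTRL) & (PCI_AF_STATUS_TP << 8)) != 0;
}
```

This only shows why the mask is shifted by 8; it does not settle whether a
1-byte read of the status register would have been simpler.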

> +		 */
> +		pnv_eeh_wait_for_pending(pdn, edev->af_cap + PCI_AF_CTRL,
> +					 PCI_AF_STATUS_TP << 8, true);
> +		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL,
> +				      1, PCI_AF_CTRL_FLR);
> +		msleep(EEH_PE_RST_HOLD_TIME);
> +		break;
> +	case EEH_RESET_DEACTIVATE:
> +		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL, 1, 0);
> +		msleep(EEH_PE_RST_SETTLE_TIME);


btw there is an unrelated issue with EEH_PE_RST_SETTLE_TIME, which is
defined as 1800 (ms), which is A LOT (on top of the 250ms from
EEH_PE_RST_HOLD_TIME, and for some reason this is actually doubled, so there
is another reset somewhere).

Booting a guest with 63 VFs takes 6 minutes or so, is there a good reason 
for such a huge timeout?
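
A back-of-the-envelope check of how much of that boot time the sleeps alone
could explain. The two constants are the values quoted above; the assumption
that VFs are reset strictly one after another, and the "passes" parameter
for the doubling, are hypothetical.

```c
/* Values in milliseconds, as quoted in the review above. */
#define EEH_PE_RST_HOLD_TIME	250
#define EEH_PE_RST_SETTLE_TIME	1800

/* Total sleep time, assuming the VFs are reset serially and each
 * reset (hold + settle) runs `passes` times per VF. */
unsigned long reset_sleep_ms(unsigned int nr_vfs, unsigned int passes)
{
	return (unsigned long)nr_vfs * passes *
	       (EEH_PE_RST_HOLD_TIME + EEH_PE_RST_SETTLE_TIME);
}
```

Under those assumptions reset_sleep_ms(63, 2) is 63 * 2 * 2050 = 258300 ms,
a bit over four minutes, so the sleeps would account for most of the
observed six.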


> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_eeh_reset_vf(struct pci_dn *pdn, int option)
> +{
> +	int ret;
> +
> +	ret = pnv_eeh_do_flr(pdn, option);
> +	if (ret != -ENOTTY)
> +		return ret;
> +
> +	return pnv_eeh_do_af_flr(pdn, option);
> +}
> +
> +static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
> +{
> +	struct eeh_dev *edev, *tmp;
> +	struct pci_dn *pdn;
> +	int ret;
> +
> +	eeh_pe_for_each_dev(pe, edev, tmp) {
> +		pdn = eeh_dev_to_pdn(edev);
> +		ret = pnv_eeh_reset_vf(pdn, option);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
>   void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>   {
>   	struct pci_controller *hose;
> @@ -968,7 +1090,9 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
>   		}
>
>   		bus = eeh_pe_bus_get(pe);
> -		if (pci_is_root_bus(bus) ||
> +		if (pe->type & EEH_PE_VF)
> +			ret = pnv_eeh_vf_pe_reset(pe, option);
> +		else if (pci_is_root_bus(bus) ||
>   			pci_is_root_bus(bus->parent))
>   			ret = pnv_eeh_root_reset(hose, option);
>   		else
> @@ -1108,6 +1232,14 @@ static inline bool pnv_eeh_cfg_blocked(struct pci_dn *pdn)
>   	if (!edev || !edev->pe)
>   		return false;
>
> +	/*
> +	 * We will issue FLR or AF FLR to all VFs, which are contained
> +	 * in VF PE. It relies on the EEH PCI config accessors. So we
> +	 * can't block them during the window.
> +	 */
> +	if ((edev->physfn) && (edev->pe->state & EEH_PE_RESET))


Extra parentheses around edev->physfn.



> +		return false;
> +
>   	if (edev->pe->state & EEH_PE_CFG_BLOCKED)
>   		return true;
>
>


-- 
Alexey


* Re: [PATCH V10 09/12] powerpc/powernv: Support PCI config restore for VFs
  2015-10-26  3:15 ` [PATCH V10 09/12] powerpc/powernv: Support PCI config restore for VFs Wei Yang
@ 2015-10-30  4:56   ` Alexey Kardashevskiy
  2015-10-30  8:17     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  4:56 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:15 PM, Wei Yang wrote:
> After PE reset, OPAL API opal_pci_reinit() is called on all devices
> contained in the PE to reinitialize them. However, VFs can't be seen
> from skiboot firmware. We have to implement the functions, similar
> those in skiboot firmware, to reinitialize VFs after reset on PE
> for VFs.
>
> [gwshan: changelog and code refactoring]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pci-bridge.h        |  1 +
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 70 +++++++++++++++++++++++++++-
>   arch/powerpc/platforms/powernv/pci.c         | 18 +++++++
>   3 files changed, 88 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index 3d7e537..e499d93 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -219,6 +219,7 @@ struct pci_dn {
>   #define IODA_INVALID_M64        (-1)
>   	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>   #endif /* CONFIG_PCI_IOV */
> +	int     mps;

int     mps; /* maximum payload size */
?


>   #endif
>   	struct list_head child_list;
>   	struct list_head list;
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 017cd72..3cc3e76 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -1616,6 +1616,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
>   	return ret;
>   }
>
> +static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)

It does not exactly restore it, it just tweaks a few bytes.


> +{
> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> +	u32 devctl, cmd, cap2, aer_capctl;
> +	int old_mps;
> +
> +	/* Restore MPS */
> +	if (edev->pcie_cap) {
> +		old_mps = (ffs(pdn->mps) - 8) << 5;
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				     2, &devctl);
> +		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
> +		devctl |= old_mps;
> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				      2, devctl);
> +	}
> +
> +	/* Disable Completion Timeout */
> +	if (edev->pcie_cap) {
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
> +				     4, &cap2);
> +		if (cap2 & 0x10) {
> +			eeh_ops->read_config(pdn,
> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
> +					4, &cap2);
> +			cap2 |= 0x10;
> +			eeh_ops->write_config(pdn,
> +					edev->pcie_cap + PCI_EXP_DEVCTL2,
> +					4, cap2);
> +		}
> +	}
> +
> +	/* Enable SERR and parity checking */
> +	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);


No complaints from gcc about uninitialized @cmd and others? Cooool...


> +	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
> +	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
> +
> +	/* Enable report various errors */
> +	if (edev->pcie_cap) {
> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				2, &devctl);
> +		devctl &= ~PCI_EXP_DEVCTL_CERE;
> +		devctl |= (PCI_EXP_DEVCTL_NFERE |
> +			   PCI_EXP_DEVCTL_FERE |
> +			   PCI_EXP_DEVCTL_URRE);
> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
> +				2, devctl);
> +	}
> +
> +	/* Enable ECRC generation and check */
> +	if (edev->pcie_cap && edev->aer_cap) {
> +		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
> +				4, &aer_capctl);
> +		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
> +		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
> +				4, aer_capctl);
> +	}
> +
> +	return 0;
> +}
> +
>   static int pnv_eeh_restore_config(struct pci_dn *pdn)
>   {
>   	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> @@ -1626,7 +1687,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
>   		return -EEXIST;
>
>   	phb = edev->phb->private_data;
> -	ret = opal_pci_reinit(phb->opal_id,
> +	/*
> +	 * We have to restore the PCI config space after reset since the
> +	 * firmware can't see SRIOV VFs.


When I see "restore config space", pci_restore_state() comes to my mind...
What you do is rather a "fixup", but for some reason you do not call this
from pnv_pci_fixup_vf_mps() (which could be more generic and call
pnv_eeh_restore_config()). Or pnv_pci_fixup_vf_mps() could be merged
into pnv_eeh_restore_config(). Having "restore" code in 2 places with
unclear execution order does not feel right.



> +	 */
> +	if (edev->physfn)
> +		ret = pnv_eeh_restore_vf_config(pdn);
> +	else
> +		ret = opal_pci_reinit(phb->opal_id,
>   			      OPAL_REINIT_PCI_DEV, edev->config_addr);
>   	if (ret) {
>   		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 765d8ed..0e4f42e 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -788,6 +788,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
>   }
>   DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
>
> +#ifdef CONFIG_PCI_IOV
> +static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
> +{
> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> +	int parent_mps;
> +
> +	if (!pdev->is_virtfn)
> +		return;
> +
> +	/* Synchronize MPS for VF and PF */
> +	parent_mps = pcie_get_mps(pdev->physfn);
> +	if ((128 << pdev->pcie_mpss) >= parent_mps)
> +		pcie_set_mps(pdev, parent_mps);


There is no mention of MPS in the commit log. What is happening here, and
why? Is this cut-n-paste? Isn't there already some code somewhere which
does the same thing for the initial init()? Can this be reused? Or
extracted to a helper and reused?
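
For reference, a userspace sketch of what the two quoted MPS snippets appear
to compute, as far as can be read from the diff alone. The helper names are
hypothetical. The first mirrors "(ffs(pdn->mps) - 8) << 5" from the restore
path (for a power-of-two MPS the loop gives the same result), the second
mirrors the condition in pnv_pci_fixup_vf_mps().

```c
#include <assert.h>

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size field, bits 7:5 */

/* Encode an MPS in bytes (128, 256, ... 4096) into the DEVCTL field:
 * 128 -> 0, 256 -> 1, and so on, shifted into bits 7:5. */
unsigned int mps_to_devctl(int mps)
{
	unsigned int shift = 0;

	assert(mps >= 128 && mps <= 4096 && (mps & (mps - 1)) == 0);
	while ((128 << shift) < mps)
		shift++;
	return shift << 5;
}

/* Adopt the PF's MPS only if the VF's supported maximum (mpss, an
 * exponent on 128 bytes) can carry it; otherwise keep the current one. */
int vf_mps(int vf_mpss, int vf_cur_mps, int pf_mps)
{
	return ((128 << vf_mpss) >= pf_mps) ? pf_mps : vf_cur_mps;
}
```

So a PF running at 256 bytes with a VF that supports 256 (mpss = 1) yields
256 for the VF, which the restore path would later write back as 0x20 in
the payload field.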



> +	pdn->mps = pcie_get_mps(pdev);
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
> +#endif /* CONFIG_PCI_IOV */
> +
>   void __init pnv_pci_init(void)
>   {
>   	struct device_node *np;
>


-- 
Alexey


* Re: [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE
  2015-10-26  3:16 ` [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE Wei Yang
@ 2015-10-30  5:20   ` Alexey Kardashevskiy
  2015-11-01  1:53     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  5:20 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:16 PM, Wei Yang wrote:
> Different from PCI bus dependent PE, PE for VFs doesn't have the

s/Different from/Unlike/


> primary bus, on which the PCI hotplug is implemented. The patch
> supports error recovery, especially the PCI hotplug for VF's PE.

The patch adds support for error recovery of what exactly?
What is "especially" about?


> The hotplug on VF's PE is implemented based on VFs, instead of
> PCI bus any more.

Needs rephrasing.

Is this patch about EEH error recovery, i.e. unplug VF, re-plug VF? Why 
does the commit log talk about PE hotplug? I thought we do VF (i.e. PCI 
device) hotplug, not PE.


>
> [gwshan: changelog and code refactoring]
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/eeh.h   |   1 +
>   arch/powerpc/kernel/eeh.c        |   8 ++++
>   arch/powerpc/kernel/eeh_driver.c | 100 +++++++++++++++++++++++++++++++--------
>   arch/powerpc/kernel/eeh_pe.c     |   3 +-
>   4 files changed, 90 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 331c856..ea1f13c4 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -142,6 +142,7 @@ struct eeh_dev {
>   	struct pci_controller *phb;	/* Associated PHB		*/
>   	struct pci_dn *pdn;		/* Associated PCI device node	*/
>   	struct pci_dev *pdev;		/* Associated PCI device	*/
> +	int    in_error;		/* Error flag for eeh_dev	*/

Make it "bool".


>   	struct pci_dev *physfn;		/* Associated PF PORT		*/
>   	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>   };
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index af9b597..28e4d73 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1227,6 +1227,14 @@ void eeh_remove_device(struct pci_dev *dev)
>   	 * from the parent PE during the BAR resotre.
>   	 */
>   	edev->pdev = NULL;
> +
> +	/*
> +	 * The flag "in_error" is used to trace EEH devices for VFs
> +	 * in error state or not. It's set in eeh_report_error(). If
> +	 * it's not set, eeh_report_{reset,resume}() won't be called
> +	 * for the VF EEH device.
> +	 */
> +	edev->in_error = 0;
>   	dev->dev.archdata.edev = NULL;
>   	if (!(edev->pe->state & EEH_PE_KEEP))
>   		eeh_rmv_from_parent_pe(edev);
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 89eb4bc..99868e2 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
>   	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
>   	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
>
> +	edev->in_error = 1;
>   	eeh_pcid_put(dev);
>   	return NULL;
>   }
> @@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
>
>   	if (!driver->err_handler ||
>   	    !driver->err_handler->slot_reset ||
> -	    (edev->mode & EEH_DEV_NO_HANDLER)) {
> +	    (edev->mode & EEH_DEV_NO_HANDLER) ||
> +	    (!edev->in_error)) {
>   		eeh_pcid_put(dev);
>   		return NULL;
>   	}
> @@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
>

bool was_in_error = edev->in_error;
edev->in_error = false;

then use was_in_error below and there is no need to replace return with 
goto, etc -> slightly simpler code.
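
A minimal sketch of that control flow in userspace C, assuming cut-down
stand-ins for the kernel structures: struct mock_edev, its fields and
mock_report_resume() are hypothetical, and the eeh_pcid_get()/eeh_pcid_put()
reference handling of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_edev {
	bool in_error;
	bool no_handler;	/* stands in for EEH_DEV_NO_HANDLER */
	int resumed;		/* counts resume() callbacks */
};

/* Latch and clear the flag up front, so every exit path leaves in_error
 * cleared without needing a shared "out:" label. */
void mock_report_resume(struct mock_edev *edev, bool has_resume_handler)
{
	bool was_in_error;

	assert(edev != NULL);
	was_in_error = edev->in_error;
	edev->in_error = false;

	if (!has_resume_handler || edev->no_handler || !was_in_error) {
		edev->no_handler = false;
		return;
	}

	edev->resumed++;	/* driver->err_handler->resume(dev) */
}
```

The early return stays as it was and the flag handling lives in one place
at the top of the function.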


>   	if (!driver->err_handler ||
>   	    !driver->err_handler->resume ||
> -	    (edev->mode & EEH_DEV_NO_HANDLER)) {
> +	    (edev->mode & EEH_DEV_NO_HANDLER) ||
> +	    (!edev->in_error)) {
>   		edev->mode &= ~EEH_DEV_NO_HANDLER;
> -		eeh_pcid_put(dev);
> -		return NULL;
> +		goto out;
>   	}
>
>   	driver->err_handler->resume(dev);
>
> +out:
> +	edev->in_error = 0;
>   	eeh_pcid_put(dev);
>   	return NULL;
>   }
> @@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
>   	return NULL;
>   }
>
> +static void *eeh_add_virt_device(void *data, void *userdata)
> +{
> +	struct pci_driver *driver;
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> +	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
> +
> +	if (!(edev->physfn)) {
> +		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
> +			__func__, edev->phb->global_number, pdn->busno,
> +			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
> +		return NULL;
> +	}
> +
> +	driver = eeh_pcid_get(dev);
> +	if (driver) {
> +		eeh_pcid_put(dev);
> +		if (driver->err_handler)
> +			return NULL;
> +	}
> +
> +	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
> +	return NULL;
> +}
> +
>   static void *eeh_rmv_device(void *data, void *userdata)
>   {
>   	struct pci_driver *driver;
>   	struct eeh_dev *edev = (struct eeh_dev *)data;
>   	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>   	int *removed = (int *)userdata;
> +	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>
>   	/*
>   	 * Actually, we should remove the PCI bridges as well.
> @@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
>   	driver = eeh_pcid_get(dev);
>   	if (driver) {
>   		eeh_pcid_put(dev);
> -		if (driver->err_handler)
> +		if (removed && driver->err_handler)
>   			return NULL;
>   	}
>
> @@ -425,11 +455,23 @@ static void *eeh_rmv_device(void *data, void *userdata)
>   		 pci_name(dev));
>   	edev->bus = dev->bus;
>   	edev->mode |= EEH_DEV_DISCONNECTED;
> -	(*removed)++;
> +	if (removed)
> +		(*removed)++;
>
> -	pci_lock_rescan_remove();
> -	pci_stop_and_remove_bus_device(dev);
> -	pci_unlock_rescan_remove();
> +	if (edev->physfn) {
> +		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
> +		edev->pdev = NULL;
> +
> +		/*
> +		 * We have to set the VF PE number to invalid one, which is
> +		 * required to plug the VF successfully.
> +		 */
> +		pdn->pe_number = IODA_INVALID_PE;
> +	} else {
> +		pci_lock_rescan_remove();
> +		pci_stop_and_remove_bus_device(dev);
> +		pci_unlock_rescan_remove();
> +	}
>
>   	return NULL;
>   }
> @@ -548,6 +590,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>   	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
>   	struct timeval tstamp;
>   	int cnt, rc, removed = 0;
> +	struct eeh_dev *edev;
>
>   	/* pcibios will clear the counter; save the value */
>   	cnt = pe->freeze_count;
> @@ -561,12 +604,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>   	 */
>   	eeh_pe_state_mark(pe, EEH_PE_KEEP);
>   	if (bus) {
> -		pci_lock_rescan_remove();
> -		pcibios_remove_pci_devices(bus);
> -		pci_unlock_rescan_remove();
> -	} else if (frozen_bus) {
> +		if (pe->type & EEH_PE_VF)
> +			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);


I believe the rule is that if one branch of "if" uses curly braces, then 
the other should have them too.


> +		else {
> +			pci_lock_rescan_remove();
> +			pcibios_remove_pci_devices(bus);
> +			pci_unlock_rescan_remove();
> +		}
> +	} else if (frozen_bus)
>   		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
> -	}
>
>   	/*
>   	 * Reset the pci controller. (Asserts RST#; resets config space).
> @@ -607,14 +653,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>   		 * PE. We should disconnect it so the binding can be
>   		 * rebuilt when adding PCI devices.
>   		 */
> +		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>   		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
> -		pcibios_add_pci_devices(bus);
> +		if (pe->type & EEH_PE_VF)

Move "edev = list_first_entry(&pe->edevs, struct eeh_dev, list)" here. You 
could actually do:

eeh_add_virt_device(list_first_entry(&pe->edevs, struct eeh_dev, list), NULL);

and drop local variable @edev. Or move it to this scope. Dunno.


> +			eeh_add_virt_device(edev, NULL);
> +		else
> +			pcibios_add_pci_devices(bus);
>   	} else if (frozen_bus && removed) {
>   		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>   		ssleep(5);
>
> +		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>   		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
> -		pcibios_add_pci_devices(frozen_bus);
> +		if (pe->type & EEH_PE_VF)


The same comment as above.

> +			eeh_add_virt_device(edev, NULL);
> +		else
> +			pcibios_add_pci_devices(frozen_bus);
>   	}
>   	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>
> @@ -792,11 +846,15 @@ perm_error:
>   	 * the their PCI config any more.
>   	 */
>   	if (frozen_bus) {
> -		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
> -
> -		pci_lock_rescan_remove();
> -		pcibios_remove_pci_devices(frozen_bus);
> -		pci_unlock_rescan_remove();
> +		if (pe->type & EEH_PE_VF) {
> +			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
> +			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
> +		} else {
> +			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
> +			pci_lock_rescan_remove();
> +			pcibios_remove_pci_devices(frozen_bus);
> +			pci_unlock_rescan_remove();
> +		}
>   	}
>   }
>
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index 260a701..5cde950 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -914,7 +914,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
>   	if (pe->type & EEH_PE_PHB) {
>   		bus = pe->phb->bus;
>   	} else if (pe->type & EEH_PE_BUS ||
> -		   pe->type & EEH_PE_DEVICE) {
> +		   pe->type & EEH_PE_DEVICE ||
> +		   pe->type & EEH_PE_VF) {
>   		if (pe->bus) {
>   			bus = pe->bus;
>   			goto out;
>


-- 
Alexey


* Re: [PATCH V10 12/12] powerpc/eeh: Handle hot removed VF when PF is EEH aware
  2015-10-26  3:16 ` [PATCH V10 12/12] powerpc/eeh: Handle hot removed VF when PF is EEH aware Wei Yang
@ 2015-10-30  5:35   ` Alexey Kardashevskiy
  2015-10-30  7:29     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  5:35 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:16 PM, Wei Yang wrote:
> When PF is EEH aware while VFs are not, VFs will be removed during EEH
> recovery. This is not supported in current code, while will leads to the VF
> lost.
>
> This patch fixes this by adding VFs back. VFs should be added back after PF
> get recovered properly.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

btw please remove my "sob" from this patchset (here and 11/12 at least) as 
I did not "sob" the upstream versions of these and I did not post them and 
there is no public tree of mine with these patches. When I agree that the 
patches are good to go, it will be "reviewed-by" or "acked-by".


> ---
>   arch/powerpc/include/asm/eeh.h   |  6 ++++++
>   arch/powerpc/kernel/eeh_dev.c    |  1 +
>   arch/powerpc/kernel/eeh_driver.c | 30 +++++++++++++++++++++++-------
>   3 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index ea1f13c4..c529a23 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -127,6 +127,11 @@ static inline bool eeh_pe_passed(struct eeh_pe *pe)
>   #define EEH_DEV_SYSFS		(1 << 9)	/* Sysfs created	*/
>   #define EEH_DEV_REMOVED		(1 << 10)	/* Removed permanently	*/
>
> +struct eeh_rmv_data {
> +	struct list_head edev_list;
> +	int removed;
> +};
> +
>   struct eeh_dev {
>   	int mode;			/* EEH mode			*/
>   	int class_code;			/* Class code of the device	*/
> @@ -139,6 +144,7 @@ struct eeh_dev {
>   	int af_cap;			/* Saved AF capability		*/
>   	struct eeh_pe *pe;		/* Associated PE		*/
>   	struct list_head list;		/* Form link list in the PE	*/
> +	struct list_head rmv_list;	/* Record the removed edev 	*/
>   	struct pci_controller *phb;	/* Associated PHB		*/
>   	struct pci_dn *pdn;		/* Associated PCI device node	*/
>   	struct pci_dev *pdev;		/* Associated PCI device	*/
> diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
> index aabba94..7815095 100644
> --- a/arch/powerpc/kernel/eeh_dev.c
> +++ b/arch/powerpc/kernel/eeh_dev.c
> @@ -67,6 +67,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
>   	edev->pdn = pdn;
>   	edev->phb = phb;
>   	INIT_LIST_HEAD(&edev->list);
> +	INIT_LIST_HEAD(&edev->rmv_list);
>
>   	return NULL;
>   }
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 99868e2..f2406b4 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -420,7 +420,8 @@ static void *eeh_rmv_device(void *data, void *userdata)
>   	struct pci_driver *driver;
>   	struct eeh_dev *edev = (struct eeh_dev *)data;
>   	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> -	int *removed = (int *)userdata;
> +	struct eeh_rmv_data *rmv_data = (struct eeh_rmv_data *)userdata;
> +	int *removed = rmv_data ? &rmv_data->removed : NULL;


You just touched @userdata/@removed in [10/12] and now you are touching it 
again.

It feels like this patch would be better merged into [10/12]: that would
reduce the noise about the userdata pointer changes passed into
eeh_pe_dev_traverse() and make more sense, as "powerpc/eeh: Support error
recovery for VF PE" without adding VFs back is pretty useless.




>   	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>
>   	/*
> @@ -467,6 +468,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
>   		 * required to plug the VF successfully.
>   		 */
>   		pdn->pe_number = IODA_INVALID_PE;
> +
> +		if (rmv_data)
> +			list_add(&edev->rmv_list, &rmv_data->edev_list);
>   	} else {
>   		pci_lock_rescan_remove();
>   		pci_stop_and_remove_bus_device(dev);
> @@ -585,11 +589,12 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
>    * During the reset, udev might be invoked because those affected
>    * PCI devices will be removed and then added.
>    */
> -static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
> +static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
> +				struct eeh_rmv_data *rmv_data)
>   {
>   	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
>   	struct timeval tstamp;
> -	int cnt, rc, removed = 0;
> +	int cnt, rc;
>   	struct eeh_dev *edev;
>
>   	/* pcibios will clear the counter; save the value */
> @@ -612,7 +617,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>   			pci_unlock_rescan_remove();
>   		}
>   	} else if (frozen_bus)
> -		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
> +		eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
>
>   	/*
>   	 * Reset the pci controller. (Asserts RST#; resets config space).
> @@ -659,7 +664,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>   			eeh_add_virt_device(edev, NULL);
>   		else
>   			pcibios_add_pci_devices(bus);
> -	} else if (frozen_bus && removed) {
> +	} else if (frozen_bus && rmv_data->removed) {
>   		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>   		ssleep(5);
>
> @@ -687,8 +692,10 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>   static void eeh_handle_normal_event(struct eeh_pe *pe)
>   {
>   	struct pci_bus *frozen_bus;
> +	struct eeh_dev *edev, *tmp;
>   	int rc = 0;
>   	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
> +	struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0};
>
>   	frozen_bus = eeh_pe_bus_get(pe);
>   	if (!frozen_bus) {
> @@ -735,7 +742,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>   	 */
>   	if (result == PCI_ERS_RESULT_NONE) {
>   		pr_info("EEH: Reset with hotplug activity\n");
> -		rc = eeh_reset_device(pe, frozen_bus);
> +		rc = eeh_reset_device(pe, frozen_bus, NULL);
>   		if (rc) {
>   			pr_warn("%s: Unable to reset, err=%d\n",
>   				__func__, rc);
> @@ -787,7 +794,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>   	/* If any device called out for a reset, then reset the slot */
>   	if (result == PCI_ERS_RESULT_NEED_RESET) {
>   		pr_info("EEH: Reset without hotplug activity\n");
> -		rc = eeh_reset_device(pe, NULL);
> +		rc = eeh_reset_device(pe, NULL, &rmv_data);
>   		if (rc) {
>   			pr_warn("%s: Cannot reset, err=%d\n",
>   				__func__, rc);
> @@ -807,6 +814,15 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>   		goto hard_fail;
>   	}
>
> +	/*
> +	 * For those hot removed VFs, we should add back them after PF get
> +	 * recovered properly.
> +	 */
> +	list_for_each_entry_safe(edev, tmp, &rmv_data.edev_list, rmv_list) {
> +		eeh_add_virt_device(edev, NULL);
> +		list_del(&edev->rmv_list);
> +	}
> +
>   	/* Tell all device drivers that they can resume operations */
>   	pr_info("EEH: Notify device driver to resume\n");
>   	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
>


-- 
Alexey


* Re: [PATCH V10 11/12] powerpc/eeh: Don't block PCI config on resetting VF PE
  2015-10-26  3:16 ` [PATCH V10 11/12] powerpc/eeh: Don't block PCI config on resetting " Wei Yang
@ 2015-10-30  5:42   ` Alexey Kardashevskiy
  2015-10-30  7:19     ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  5:42 UTC (permalink / raw)
  To: Wei Yang, gwshan, bhelgaas, mpe; +Cc: linuxppc-dev, linux-pci

On 10/26/2015 02:16 PM, Wei Yang wrote:
> From: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
> When passing through SRIOV VF from host to guest via VFIO PCI
> infrastructure, the VF is resetted by EEH specific backend
> (pcibios_set_pcie_reset_state()). We can't block the PCI config,
> otherwise, the reset (FLR or AF FLR), to be completed by PCI
> config access to the VF, can't be done. Then the VF can't be
> put into initial state when passing it to the guest and returning
> back to the host.
>
> The patch just doesn't block the VF's PCI config space when doing
> the reset. It fixes EEH error caused by DMA traffic to bogus DMA
> address on restarting guest after killing the QEMU process, which
> includes Mellanox VF passed through from host.

The patch as it is makes sense as a bugfix for our internal tree where the 
EEH VF feature was present at the time when this patch was posted but in 
this patchset is makes more sense to merge it into:

[PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE

as it is quite weird within one patchset to introduce a problem and then 
fix it in a following patch.


> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Remove "sob: aik@..." please.


> ---
>   arch/powerpc/kernel/eeh.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 28e4d73..e1846f5 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -745,7 +745,8 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
>   	case pcie_deassert_reset:
>   		eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
>   		eeh_unfreeze_pe(pe, false);
> -		eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
> +		if (!(pe->type & EEH_PE_VF))
> +			eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
>   		eeh_pe_dev_traverse(pe, eeh_restore_dev_state, dev);
>   		eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
>   		break;
> @@ -753,14 +754,16 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
>   		eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
>   		eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
>   		eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
> -		eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
> +		if (!(pe->type & EEH_PE_VF))
> +			eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
>   		eeh_ops->reset(pe, EEH_RESET_HOT);
>   		break;
>   	case pcie_warm_reset:
>   		eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
>   		eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
>   		eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
> -		eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
> +		if (!(pe->type & EEH_PE_VF))
> +			eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
>   		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
>   		break;
>   	default:
>


-- 
Alexey


* Re: [PATCH V10 04/12] powerpc/pci: Remove VFs prior to PF
  2015-10-30  3:04   ` Alexey Kardashevskiy
@ 2015-10-30  6:31     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  6:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 02:04:12PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates,
>>VFs, which might be hooked to same PCI bus as their PF should be removed
>
>A comma is missing before "should be" (or you did not need a comma after
>"VFs" may be :) ).
>

I think you are right.

>
>>before the PF. Otherwise, the PCI hot unplugging on the PCI bus would
>
>s/on/of/? "Unplugging on" does not make much sense to me in this context at
>least.
>

Sounds like I need to improve my English :-)

"on" here means those PCI devices are attached to the PCI bus. So is "of" the
correct word?

Would changing "unplugging" to "removing" be better?

>
>>cause kernel crash.
>>
>>The patch applies the above pattern to PowerPC PCI hotplug path.
>>
>>[gwshan: changelog]
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/kernel/pci-hotplug.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>>index 7f9ed0c..59c4361 100644
>>--- a/arch/powerpc/kernel/pci-hotplug.c
>>+++ b/arch/powerpc/kernel/pci-hotplug.c
>>@@ -55,7 +55,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
>>
>>  	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
>>  		 pci_domain_nr(bus),  bus->number);
>>-	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
>>+	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
>>  		pr_debug("   Removing %s...\n", pci_name(dev));
>>  		pci_stop_and_remove_bus_device(dev);
>>  	}
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs
  2015-10-30  3:22   ` Alexey Kardashevskiy
@ 2015-10-30  6:37     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  6:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 02:22:43PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>EEH address cache, which helps to locate the PCI device according to
>>the given (physical) MMIO address, didn't cover PCI bridges. Also, it
>>shouldn't return PF
>
>"it shouldn't return" is about the cache, right? eeh_addr_cache_get_dev() -
>this guy can "return", the cache cannot.
>

What I want to say here is that if we cache the PF's IOV BARs,
eeh_addr_cache_get_dev() would return the PF when the address belongs to a VF.
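
To illustrate the point with a toy model (this is not the real tree in
eeh_cache.c; `struct fake_dev`, `cache_insert_dev()` and `cache_lookup()` are
hypothetical names, and the addresses and index values are assumptions made up
for the example): if every resource of the PF is cached, the PF's IOV BAR range
covers the VF's BAR, so a lookup on a VF address can hit the PF first. Limiting
the walk to indexes 0..6, mirroring `i <= PCI_ROM_RESOURCE`, leaves only the
VF's own entry for that address.

```c
#include <stddef.h>

#define NUM_RES      13	/* assumed size of the resource array in this model */
#define ROM_RES_IDX   6	/* mirrors PCI_ROM_RESOURCE: BARs 0-5 plus the ROM */
#define IOV_RES_IDX   7	/* assumed index of the first IOV BAR */
#define CACHE_MAX    32

struct fake_dev {
	const char *name;
	unsigned long start[NUM_RES];
	unsigned long end[NUM_RES];	/* end == 0 means "resource unused" */
};

struct cache_entry {
	unsigned long start, end;
	const struct fake_dev *dev;
};

struct addr_cache {
	struct cache_entry e[CACHE_MAX];
	size_t n;
};

/* Cache resources 0..last_idx of @dev, skipping unused ones. */
static void cache_insert_dev(struct addr_cache *c, const struct fake_dev *dev,
			     int last_idx)
{
	int i;

	for (i = 0; i <= last_idx && i < NUM_RES; i++) {
		if (!dev->end[i] || c->n >= CACHE_MAX)
			continue;
		c->e[c->n].start = dev->start[i];
		c->e[c->n].end = dev->end[i];
		c->e[c->n].dev = dev;
		c->n++;
	}
}

/* Return the first cached device whose range contains @addr, or NULL. */
static const struct fake_dev *cache_lookup(const struct addr_cache *c,
					   unsigned long addr)
{
	size_t i;

	for (i = 0; i < c->n; i++)
		if (addr >= c->e[i].start && addr <= c->e[i].end)
			return c->e[i].dev;
	return NULL;
}
```

With a PF whose IOV BAR spans 0x8000-0xffff and a VF whose BAR0 is
0x8000-0x8fff, caching all resources makes a lookup of 0x8100 return the PF,
while caching only up to ROM_RES_IDX makes it return the VF.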

>>with address in PF's IOV BARs. Instead, the VFs
>>should be returned.
>>
>>Also, by doing so, it removes the type check in
>>eeh_addr_cache_insert_dev(), since bridge's window would not be cached.
>>
>>The patch restricts the address cache to cover first 7 BARs for the
>>above purposes.
>
>
>I'd better understand something like this :)
>
>This restricts the EEH address cache to use only first 7 BARs. This makes
>__eeh_addr_cache_insert_dev() ignore PCI bridge windows and IOV BARs. As the
>result of this change, eeh_addr_cache_get_dev() will return VFs from VF's
>resource addresses instead of parent PFs.
>
>This removes extra check for a PCI bridge as we limit
>__eeh_addr_cache_insert_dev() to 7 BARs and this effectively excludes PCI
>bridges from being cached.
>

Yep, I think this one is clearer. I will use it.

>
>>
>>[gwshan: changelog]
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/kernel/eeh_cache.c | 6 +-----
>>  1 file changed, 1 insertion(+), 5 deletions(-)
>>
>>diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
>>index a1e86e1..e6887f0 100644
>>--- a/arch/powerpc/kernel/eeh_cache.c
>>+++ b/arch/powerpc/kernel/eeh_cache.c
>>@@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
>>  	}
>>
>>  	/* Walk resources on this device, poke them into the tree */
>>-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
>>+	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>  		resource_size_t start = pci_resource_start(dev,i);
>>  		resource_size_t end = pci_resource_end(dev,i);
>>  		unsigned long flags = pci_resource_flags(dev,i);
>>@@ -222,10 +222,6 @@ void eeh_addr_cache_insert_dev(struct pci_dev *dev)
>>  {
>>  	unsigned long flags;
>>
>>-	/* Ignore PCI bridges */
>>-	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
>>-		return;
>>-
>>  	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
>>  	__eeh_addr_cache_insert_dev(dev);
>>  	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 06/12] powerpc/powernv: EEH device for VF
  2015-10-30  3:33   ` Alexey Kardashevskiy
@ 2015-10-30  6:52     ` Wei Yang
  2015-10-30  7:36       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-30  6:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 02:33:49PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>VFs and their corresponding pci_dn instances are created and released
>>dynamically as their PF's SRIOV capability is enabled and disabled.
>>The patch creates and releases EEH devices for VFs when creating and
>>releasing their pci_dn instances, which means EEH devices and pci_dn
>>instances have same life cycle. Also, VF's EEH device is identified
>>by (struct eeh_dev::physfn).
>
>
>The add_dev_pci_data() helper (the one you hack) does not create pci_dn
>instances. The add_one_dev_pci_data() helper does.
>

Yes, you are right. The patch here creates the edev after the pci_dn is created.

So which part of the log do you think is not accurate?

>
>>
>>[gwshan: changelog and removed CONFIG_PCI_IOV]
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/eeh.h |  1 +
>>  arch/powerpc/kernel/pci_dn.c   | 12 ++++++++++++
>>  2 files changed, 13 insertions(+)
>>
>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>index c5eb86f..6c383ad 100644
>>--- a/arch/powerpc/include/asm/eeh.h
>>+++ b/arch/powerpc/include/asm/eeh.h
>>@@ -140,6 +140,7 @@ struct eeh_dev {
>>  	struct pci_controller *phb;	/* Associated PHB		*/
>>  	struct pci_dn *pdn;		/* Associated PCI device node	*/
>>  	struct pci_dev *pdev;		/* Associated PCI device	*/
>>+	struct pci_dev *physfn;		/* Associated PF PORT		*/
>>  	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>>  };
>>
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index f771130..f0ddde7 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>>  struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>>  {
>>  #ifdef CONFIG_PCI_IOV
>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>  	struct pci_dn *parent, *pdn;
>>+	struct eeh_dev *edev;
>>  	int i;
>>
>>  	/* Only support IOV for now */
>>@@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>>  				 __func__, i);
>>  			return NULL;
>>  		}
>>+		eeh_dev_init(pdn, hose);
>>+		edev = pdn_to_eeh_dev(pdn);
>
>
>In theory, pdn_to_eeh_dev() can return NULL. In this patch, it is not clear
>if pdn->edev gets initialized before or after add_dev_pci_data().
>

Yep, the return value should be checked.

pdn->edev is initialized in eeh_dev_init(), which is called in
add_dev_pci_data(). Is the order not clear?

>
>
>>+		edev->physfn = pdev;
>>  	}
>>  #endif /* CONFIG_PCI_IOV */
>>
>>@@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
>>  	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
>>  		list_for_each_entry_safe(pdn, tmp,
>>  			&parent->child_list, list) {
>>+			struct eeh_dev *edev;
>>  			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
>>  			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
>>  				continue;
>>
>>+			edev = pdn_to_eeh_dev(pdn);
>>+			if (edev) {
>>+				pdn->edev = NULL;
>>+				kfree(edev);
>>+			}
>>+
>>  			if (!list_empty(&pdn->list))
>>  				list_del(&pdn->list);
>>
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 07/12] powerpc/eeh: Create PE for VFs
  2015-10-30  3:46   ` Alexey Kardashevskiy
@ 2015-10-30  6:59     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  6:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 02:46:35PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>Current EEH recovery code works with the assumption: the PE has primary
>>bus. Unfortunately, that's not true for VF PEs, which generally contains
>>one or multiple VFs (for VF group case).
>
>What is that "VF group case"? Is not it a "compound PE" thingy which you were
>removing in "SRIOV redesign patchset"?
>

I think you are right.

The commit log is not correct, especially after the SRIOV redesign.
I will remove this part.

>The patch might be ok but the commit log above does not explain why the
>existing way of PEs allocation would not work - somehow it works for a
>primary bus now, why would not it work on other buses?
>
>
>>The patch creates PEs for VFs in the weak function
>>pcibios_bus_add_device().Those PEs for VFs are identified with newly
>>introduced flag EEH_PE_VF so that we handle them differently during EEH
>>recovery.
>>
>>[gwshan: changelog and code refactoring]
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/eeh.h               |  1 +
>>  arch/powerpc/kernel/eeh_pe.c                 | 10 ++++++++--
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 16 ++++++++++++++++
>>  3 files changed, 25 insertions(+), 2 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>index 6c383ad..ec21f8f 100644
>>--- a/arch/powerpc/include/asm/eeh.h
>>+++ b/arch/powerpc/include/asm/eeh.h
>>@@ -72,6 +72,7 @@ struct pci_dn;
>>  #define EEH_PE_PHB	(1 << 1)	/* PHB PE    */
>>  #define EEH_PE_DEVICE 	(1 << 2)	/* Device PE */
>>  #define EEH_PE_BUS	(1 << 3)	/* Bus PE    */
>>+#define EEH_PE_VF	(1 << 4)	/* VF PE     */
>>
>>  #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
>>  #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
>>diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>>index 35f0b62..260a701 100644
>>--- a/arch/powerpc/kernel/eeh_pe.c
>>+++ b/arch/powerpc/kernel/eeh_pe.c
>>@@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>>  	 * EEH device already having associated PE, but
>>  	 * the direct parent EEH device doesn't have yet.
>>  	 */
>>-	pdn = pdn ? pdn->parent : NULL;
>>+	if (edev->physfn)
>>+		pdn = pci_get_pdn(edev->physfn);
>>+	else
>>+		pdn = pdn ? pdn->parent : NULL;
>>  	while (pdn) {
>>  		/* We're poking out of PCI territory */
>>  		parent = pdn_to_eeh_dev(pdn);
>>@@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>>  	}
>>
>>  	/* Create a new EEH PE */
>>-	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>>+	if (edev->physfn)
>>+		pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
>>+	else
>>+		pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>>  	if (!pe) {
>>  		pr_err("%s: out of memory!\n", __func__);
>>  		return -ENOMEM;
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index 7cf0df8..cfd55dd 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -1524,6 +1524,22 @@ static struct eeh_ops pnv_eeh_ops = {
>>  	.restore_config		= pnv_eeh_restore_config
>>  };
>>
>>+void pcibios_bus_add_device(struct pci_dev *pdev)
>>+{
>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>>+
>>+	if (!pdev->is_virtfn)
>>+		return;
>>+
>>+	/*
>>+	 * The following operations will fail if VF's sysfs files
>>+	 * aren't created or its resources aren't finalized.
>>+	 */
>>+	eeh_add_device_early(pdn);
>>+	eeh_add_device_late(pdev);
>>+	eeh_sysfs_add_device(pdev);
>>+}
>>+
>>  /**
>>   * eeh_powernv_init - Register platform dependent EEH operations
>>   *
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE
  2015-10-30  4:11   ` Alexey Kardashevskiy
@ 2015-10-30  7:18     ` Wei Yang
  2015-10-30  8:05       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-10-30  7:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 03:11:20PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>PEs for VFs don't have primary bus. So they have to have their own reset
>>backend, which is used during EEH recovery. The patch implements the reset
>>backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained
>>in the PE.
>>
>>[gwshan: changelog and code refactoring]
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/eeh.h               |   1 +
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 134 ++++++++++++++++++++++++++-
>>  2 files changed, 134 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>index ec21f8f..331c856 100644
>>--- a/arch/powerpc/include/asm/eeh.h
>>+++ b/arch/powerpc/include/asm/eeh.h
>>@@ -136,6 +136,7 @@ struct eeh_dev {
>>  	int pcix_cap;			/* Saved PCIx capability	*/
>>  	int pcie_cap;			/* Saved PCIe capability	*/
>>  	int aer_cap;			/* Saved AER capability		*/
>>+	int af_cap;			/* Saved AF capability		*/
>>  	struct eeh_pe *pe;		/* Associated PE		*/
>>  	struct list_head list;		/* Form link list in the PE	*/
>>  	struct pci_controller *phb;	/* Associated PHB		*/
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index cfd55dd..017cd72 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -404,6 +404,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>>  	edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
>>  	edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
>>  	edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
>>+	edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
>>  	if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
>>  		edev->mode |= EEH_DEV_BRIDGE;
>>  		if (edev->pcie_cap) {
>>@@ -893,6 +894,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>  	return 0;
>>  }
>>
>>+static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
>>+				     u16 mask, bool af_flr_rst)
>>+{
>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>+	int status, i;
>>+
>>+	/* Wait for Transaction Pending bit to be cleared */
>>+	for (i = 0; i < 4; i++) {
>>+		eeh_ops->read_config(pdn, pos, 2, &status);
>
>
>gcc should have complained on using uninitialized @status here.
>

I removed the object file and recompiled it, but did not see the warning.
I also took a look at other places where read_config() is called. The last
parameter is not initialized before the call there either.

Did you see the warning during your build?
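
For reference, a userspace sketch of the pattern in question
(`fake_read_config()` and `pending()` are made-up stand-ins, not the real
eeh_ops->read_config() path, and the bit value is arbitrary). Because the
variable's address is handed to another function, the compiler typically
assumes the callee writes it and stays silent; whether the value is really set
then depends on the accessor. Initializing the local, or checking the return
code, keeps the later test well defined even if the read fails. Treating a
failed read as "nothing pending" is only an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical accessor: fails without touching *val when @fail is set. */
static int fake_read_config(int fail, uint32_t *val)
{
	if (fail)
		return -1;
	*val = 0x20;	/* pretend the pending bit is set */
	return 0;
}

/* Returns 1 if any bit of @mask is set in the value read, 0 otherwise. */
static int pending(int fail, uint32_t mask)
{
	uint32_t status = 0;	/* defined value even if the read fails */

	if (fake_read_config(fail, &status))
		return 0;	/* sketch-only policy: failed read == not pending */

	return (status & mask) != 0;
}
```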

>
>>+		if (!(status & mask))
>>+			return;
>>+
>>+		msleep((1 << i) * 100);
>>+	}
>>+
>>+	pr_warn("%s: Pending transaction while issuing %s FLR to "
>>+		"%04x:%02x:%02x.%01x\n",
>
>Do not wrap user-visible strings.
>

Will change this.

>
>>+		__func__, af_flr_rst ? "AF" : "",
>>+		edev->phb->global_number, pdn->busno,
>>+		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>+}
>>+
>>+static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
>>+{
>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>+	u32 reg;
>>+
>>+	if (!edev->pcie_cap)
>>+		return -ENOTTY;
>
>
>Can pnv_eeh_do_flr() be really called on a non PCIe device, can we get that
>far? WARN_ON_ONCE() may be?
>

So you suggest adding a WARN_ON_ONCE() for this condition, right?

>
>>+
>>+	eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);
>
>
>... and here about uninitialized @reg.
>
>
>>+	if (!(reg & PCI_EXP_DEVCAP_FLR))
>>+		return -ENOTTY;
>>+
>>+	switch (option) {
>>+	case EEH_RESET_HOT:
>>+	case EEH_RESET_FUNDAMENTAL:
>>+		pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
>>+					 PCI_EXP_DEVSTA_TRPND, false);
>>+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				     4, &reg);
>>+		reg |= PCI_EXP_DEVCTL_BCR_FLR;
>>+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				      4, reg);
>>+		msleep(EEH_PE_RST_HOLD_TIME);
>>+		break;
>>+	case EEH_RESET_DEACTIVATE:
>>+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				     4, &reg);
>>+		reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
>>+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				      4, reg);
>>+		msleep(EEH_PE_RST_SETTLE_TIME);
>>+		break;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
>>+{
>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>+	u32 cap;
>>+
>>+	if (!edev->af_cap)
>>+		return -ENOTTY;
>>+
>>+	eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);
>
>
>... and here about @cap.
>
>>+	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
>>+		return -ENOTTY;
>>+
>>+	switch (option) {
>>+	case EEH_RESET_HOT:
>>+	case EEH_RESET_FUNDAMENTAL:
>>+		/*
>>+		 * Wait for Transaction Pending bit to clear. A word-aligned
>>+		 * test is used, so we use the conrol offset rather than status
>>+		 * and shift the test bit to match.
>
>
>Why word-aligned (not byte or double word)?
>

I copied these words from pci_af_flr(). Actually, I did not try to understand
the reason.
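
One plausible reading, sketched in userspace C (the offsets follow
include/uapi/linux/pci_regs.h as far as I recall, so please double check them;
`cfg_read16()` and `af_trans_pending()` are hypothetical helpers over a fake,
little-endian config space): the AF status register is a single byte at an odd
offset, so a 2-byte read has to start at the control register one byte below
it. The status bits then land in the upper byte of the result, which is why the
mask is shifted left by 8.

```c
#include <stdint.h>

#define PCI_AF_CTRL		4	/* control register, 1 byte */
#define PCI_AF_STATUS		5	/* status register, 1 byte */
#define PCI_AF_STATUS_TP	0x01	/* Transaction Pending */

/* Hypothetical 16-bit little-endian read, offsets relative to the AF cap. */
static uint16_t cfg_read16(const uint8_t *cap, int off)
{
	return (uint16_t)(cap[off] | (cap[off + 1] << 8));
}

/* Returns 1 if Transaction Pending is set, using one aligned 2-byte read. */
static int af_trans_pending(const uint8_t *cap)
{
	uint16_t word = cfg_read16(cap, PCI_AF_CTRL);

	/* Low byte is CTRL, high byte is STATUS, hence the shift. */
	return (word & (PCI_AF_STATUS_TP << 8)) != 0;
}
```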

>>+		 */
>>+		pnv_eeh_wait_for_pending(pdn, edev->af_cap + PCI_AF_CTRL,
>>+					 PCI_AF_STATUS_TP << 8, true);
>>+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL,
>>+				      1, PCI_AF_CTRL_FLR);
>>+		msleep(EEH_PE_RST_HOLD_TIME);
>>+		break;
>>+	case EEH_RESET_DEACTIVATE:
>>+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL, 1, 0);
>>+		msleep(EEH_PE_RST_SETTLE_TIME);
>
>
>btw there is an unrelated issue with EEH_PE_RST_SETTLE_TIME which is defined
>as 1800 which is A LOT (+250ms from EEH_PE_RST_HOLD_TIME and for some reason
>this is actually doubled so there is another reset somewhere).
>

I don't know the reason for this value. This code is kept aligned with the
other reset functions, like pnv_eeh_bridge_reset().

>Booting a guest with 63 VFs takes 6 minutes or so, is there a good reason for
>such a huge timeout?
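
Rough arithmetic behind that, as a sketch: the 1800 ms and 250 ms values are
the ones quoted above, while the assumptions that every VF pays the hold plus
settle delay twice and that the VFs are reset one after another are guesses,
not measurements (`reset_delay_ms()` is a hypothetical helper).

```c
#define EEH_PE_RST_HOLD_TIME	250	/* ms, as quoted above */
#define EEH_PE_RST_SETTLE_TIME	1800	/* ms, as quoted above */

/* Total sleep time in ms if @nr_vfs are reset serially, @resets_per_vf each. */
static unsigned long reset_delay_ms(unsigned int nr_vfs,
				    unsigned int resets_per_vf)
{
	return (unsigned long)nr_vfs * resets_per_vf *
	       (EEH_PE_RST_HOLD_TIME + EEH_PE_RST_SETTLE_TIME);
}
```

Under these assumptions reset_delay_ms(63, 2) is 258300 ms, about 4.3 minutes,
so the sleeps alone could account for most of the 6 minutes reported.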
>
>
>>+		break;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_eeh_reset_vf(struct pci_dn *pdn, int option)
>>+{
>>+	int ret;
>>+
>>+	ret = pnv_eeh_do_flr(pdn, option);
>>+	if (ret != -ENOTTY)
>>+		return ret;
>>+
>>+	return pnv_eeh_do_af_flr(pdn, option);
>>+}
>>+
>>+static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
>>+{
>>+	struct eeh_dev *edev, *tmp;
>>+	struct pci_dn *pdn;
>>+	int ret;
>>+
>>+	eeh_pe_for_each_dev(pe, edev, tmp) {
>>+		pdn = eeh_dev_to_pdn(edev);
>>+		ret = pnv_eeh_reset_vf(pdn, option);
>>+		if (ret)
>>+			return ret;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>  {
>>  	struct pci_controller *hose;
>>@@ -968,7 +1090,9 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
>>  		}
>>
>>  		bus = eeh_pe_bus_get(pe);
>>-		if (pci_is_root_bus(bus) ||
>>+		if (pe->type & EEH_PE_VF)
>>+			ret = pnv_eeh_vf_pe_reset(pe, option);
>>+		else if (pci_is_root_bus(bus) ||
>>  			pci_is_root_bus(bus->parent))
>>  			ret = pnv_eeh_root_reset(hose, option);
>>  		else
>>@@ -1108,6 +1232,14 @@ static inline bool pnv_eeh_cfg_blocked(struct pci_dn *pdn)
>>  	if (!edev || !edev->pe)
>>  		return false;
>>
>>+	/*
>>+	 * We will issue FLR or AF FLR to all VFs, which are contained
>>+	 * in VF PE. It relies on the EEH PCI config accessors. So we
>>+	 * can't block them during the window.
>>+	 */
>>+	if ((edev->physfn) && (edev->pe->state & EEH_PE_RESET))
>
>
>Extra braces around edev->physfn.
>

Will remove them.

>
>
>>+		return false;
>>+
>>  	if (edev->pe->state & EEH_PE_CFG_BLOCKED)
>>  		return true;
>>
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 11/12] powerpc/eeh: Don't block PCI config on resetting VF PE
  2015-10-30  5:42   ` Alexey Kardashevskiy
@ 2015-10-30  7:19     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  7:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 04:42:07PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:16 PM, Wei Yang wrote:
>>From: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>
>>When passing through SRIOV VF from host to guest via VFIO PCI
>>infrastructure, the VF is resetted by EEH specific backend
>>(pcibios_set_pcie_reset_state()). We can't block the PCI config,
>>otherwise, the reset (FLR or AF FLR), to be completed by PCI
>>config access to the VF, can't be done. Then the VF can't be
>>put into initial state when passing it to the guest and returning
>>back to the host.
>>
>>The patch just doesn't block the VF's PCI config space when doing
>>the reset. It fixes EEH error caused by DMA traffic to bogus DMA
>>address on restarting guest after killing the QEMU process, which
>>includes Mellanox VF passed through from host.
>
>The patch as it is makes sense as a bugfix for our internal tree where the
>EEH VF feature was present at the time when this patch was posted but in this
>patchset is makes more sense to merge it into:
>
>[PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE
>
>as it is quite weird within one patchset to introduce a problem  and then fix
>it in a following patch.
>

Sure, got it.

>
>>Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>Remove "sob: aik@..." please.
>
>
>>---
>>  arch/powerpc/kernel/eeh.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>>index 28e4d73..e1846f5 100644
>>--- a/arch/powerpc/kernel/eeh.c
>>+++ b/arch/powerpc/kernel/eeh.c
>>@@ -745,7 +745,8 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
>>  	case pcie_deassert_reset:
>>  		eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
>>  		eeh_unfreeze_pe(pe, false);
>>-		eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
>>+		if (!(pe->type & EEH_PE_VF))
>>+			eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
>>  		eeh_pe_dev_traverse(pe, eeh_restore_dev_state, dev);
>>  		eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
>>  		break;
>>@@ -753,14 +754,16 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
>>  		eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
>>  		eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
>>  		eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
>>-		eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
>>+		if (!(pe->type & EEH_PE_VF))
>>+			eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
>>  		eeh_ops->reset(pe, EEH_RESET_HOT);
>>  		break;
>>  	case pcie_warm_reset:
>>  		eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
>>  		eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
>>  		eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
>>-		eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
>>+		if (!(pe->type & EEH_PE_VF))
>>+			eeh_pe_state_mark(pe, EEH_PE_CFG_BLOCKED);
>>  		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
>>  		break;
>>  	default:
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 12/12] powerpc/eeh: Handle hot removed VF when PF is EEH aware
  2015-10-30  5:35   ` Alexey Kardashevskiy
@ 2015-10-30  7:29     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  7:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 04:35:54PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:16 PM, Wei Yang wrote:
>>When PF is EEH aware while VFs are not, VFs will be removed during EEH
>>recovery. This is not supported in current code, while will leads to the VF
>>lost.
>>
>>This patch fixes this by adding VFs back. VFs should be added back after PF
>>get recovered properly.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>btw please remove my "sob" from this patchset (here and 11/12 at least) as I
>did not "sob" the upstream versions of these and I did not post them and
>there is no public tree of mine with these patches. When I agree that the
>patches are good to go, it will be "reviewed-by" or "acked-by".
>

Sure, I will follow this rule in the future.

>
>>---
>>  arch/powerpc/include/asm/eeh.h   |  6 ++++++
>>  arch/powerpc/kernel/eeh_dev.c    |  1 +
>>  arch/powerpc/kernel/eeh_driver.c | 30 +++++++++++++++++++++++-------
>>  3 files changed, 30 insertions(+), 7 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>index ea1f13c4..c529a23 100644
>>--- a/arch/powerpc/include/asm/eeh.h
>>+++ b/arch/powerpc/include/asm/eeh.h
>>@@ -127,6 +127,11 @@ static inline bool eeh_pe_passed(struct eeh_pe *pe)
>>  #define EEH_DEV_SYSFS		(1 << 9)	/* Sysfs created	*/
>>  #define EEH_DEV_REMOVED		(1 << 10)	/* Removed permanently	*/
>>
>>+struct eeh_rmv_data {
>>+	struct list_head edev_list;
>>+	int removed;
>>+};
>>+
>>  struct eeh_dev {
>>  	int mode;			/* EEH mode			*/
>>  	int class_code;			/* Class code of the device	*/
>>@@ -139,6 +144,7 @@ struct eeh_dev {
>>  	int af_cap;			/* Saved AF capability		*/
>>  	struct eeh_pe *pe;		/* Associated PE		*/
>>  	struct list_head list;		/* Form link list in the PE	*/
>>+	struct list_head rmv_list;	/* Record the removed edev 	*/
>>  	struct pci_controller *phb;	/* Associated PHB		*/
>>  	struct pci_dn *pdn;		/* Associated PCI device node	*/
>>  	struct pci_dev *pdev;		/* Associated PCI device	*/
>>diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
>>index aabba94..7815095 100644
>>--- a/arch/powerpc/kernel/eeh_dev.c
>>+++ b/arch/powerpc/kernel/eeh_dev.c
>>@@ -67,6 +67,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
>>  	edev->pdn = pdn;
>>  	edev->phb = phb;
>>  	INIT_LIST_HEAD(&edev->list);
>>+	INIT_LIST_HEAD(&edev->rmv_list);
>>
>>  	return NULL;
>>  }
>>diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>index 99868e2..f2406b4 100644
>>--- a/arch/powerpc/kernel/eeh_driver.c
>>+++ b/arch/powerpc/kernel/eeh_driver.c
>>@@ -420,7 +420,8 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>  	struct pci_driver *driver;
>>  	struct eeh_dev *edev = (struct eeh_dev *)data;
>>  	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>>-	int *removed = (int *)userdata;
>>+	struct eeh_rmv_data *rmv_data = (struct eeh_rmv_data *)userdata;
>>+	int *removed = rmv_data ? &rmv_data->removed : NULL;
>
>
>You just touched @userdata/@removed in [10/12] and now you are touching it
>again.
>
>It feels like this patch is better to be merged into [10/12], this will
>reduce the noise about the userdata pointer changes passed into
>eeh_pe_dev_traverse() and make more sense as "powerpc/eeh: Support error
>recovery for VF PE" without adding VFs back is pretty useless.
>

Agree, will merge them.

>
>
>
>>  	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>>
>>  	/*
>>@@ -467,6 +468,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>  		 * required to plug the VF successfully.
>>  		 */
>>  		pdn->pe_number = IODA_INVALID_PE;
>>+
>>+		if (rmv_data)
>>+			list_add(&edev->rmv_list, &rmv_data->edev_list);
>>  	} else {
>>  		pci_lock_rescan_remove();
>>  		pci_stop_and_remove_bus_device(dev);
>>@@ -585,11 +589,12 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
>>   * During the reset, udev might be invoked because those affected
>>   * PCI devices will be removed and then added.
>>   */
>>-static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>+static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>+				struct eeh_rmv_data *rmv_data)
>>  {
>>  	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
>>  	struct timeval tstamp;
>>-	int cnt, rc, removed = 0;
>>+	int cnt, rc;
>>  	struct eeh_dev *edev;
>>
>>  	/* pcibios will clear the counter; save the value */
>>@@ -612,7 +617,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>  			pci_unlock_rescan_remove();
>>  		}
>>  	} else if (frozen_bus)
>>-		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
>>+		eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
>>
>>  	/*
>>  	 * Reset the pci controller. (Asserts RST#; resets config space).
>>@@ -659,7 +664,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>  			eeh_add_virt_device(edev, NULL);
>>  		else
>>  			pcibios_add_pci_devices(bus);
>>-	} else if (frozen_bus && removed) {
>>+	} else if (frozen_bus && rmv_data->removed) {
>>  		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>>  		ssleep(5);
>>
>>@@ -687,8 +692,10 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>  static void eeh_handle_normal_event(struct eeh_pe *pe)
>>  {
>>  	struct pci_bus *frozen_bus;
>>+	struct eeh_dev *edev, *tmp;
>>  	int rc = 0;
>>  	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
>>+	struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0};
>>
>>  	frozen_bus = eeh_pe_bus_get(pe);
>>  	if (!frozen_bus) {
>>@@ -735,7 +742,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>>  	 */
>>  	if (result == PCI_ERS_RESULT_NONE) {
>>  		pr_info("EEH: Reset with hotplug activity\n");
>>-		rc = eeh_reset_device(pe, frozen_bus);
>>+		rc = eeh_reset_device(pe, frozen_bus, NULL);
>>  		if (rc) {
>>  			pr_warn("%s: Unable to reset, err=%d\n",
>>  				__func__, rc);
>>@@ -787,7 +794,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>>  	/* If any device called out for a reset, then reset the slot */
>>  	if (result == PCI_ERS_RESULT_NEED_RESET) {
>>  		pr_info("EEH: Reset without hotplug activity\n");
>>-		rc = eeh_reset_device(pe, NULL);
>>+		rc = eeh_reset_device(pe, NULL, &rmv_data);
>>  		if (rc) {
>>  			pr_warn("%s: Cannot reset, err=%d\n",
>>  				__func__, rc);
>>@@ -807,6 +814,15 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
>>  		goto hard_fail;
>>  	}
>>
>>+	/*
>>+	 * For those hot removed VFs, we should add back them after PF get
>>+	 * recovered properly.
>>+	 */
>>+	list_for_each_entry_safe(edev, tmp, &rmv_data.edev_list, rmv_list) {
>>+		eeh_add_virt_device(edev, NULL);
>>+		list_del(&edev->rmv_list);
>>+	}
>>+
>>  	/* Tell all device drivers that they can resume operations */
>>  	pr_info("EEH: Notify device driver to resume\n");
>>  	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 06/12] powerpc/powernv: EEH device for VF
  2015-10-30  6:52     ` Wei Yang
@ 2015-10-30  7:36       ` Alexey Kardashevskiy
  2015-10-30  7:58         ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  7:36 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On 10/30/2015 05:52 PM, Wei Yang wrote:
> On Fri, Oct 30, 2015 at 02:33:49PM +1100, Alexey Kardashevskiy wrote:
>> On 10/26/2015 02:15 PM, Wei Yang wrote:
>>> VFs and their corresponding pci_dn instances are created and released
>>> dynamically as their PF's SRIOV capability is enabled and disabled.
>>> The patch creates and releases EEH devices for VFs when creating and
>>> releasing their pci_dn instances, which means EEH devices and pci_dn
>>> instances have same life cycle. Also, VF's EEH device is identified
>>> by (struct eeh_dev::physfn).
>>
>>
>> The add_dev_pci_data() helper (the one you hack) does not create pci_dn
>> instances. The add_one_dev_pci_data() helper does.
>>
>
> Yes, you are right. The patch here create edev after the pci_dn is created.
>
> So which part in the log you think is not accurate?


The commit log is ok, I was just thinking out loud that eeh_dev_init() could
move into add_one_dev_pci_data() to make things clearer.



>>
>>>
>>> [gwshan: changelog and removed CONFIG_PCI_IOV]
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/eeh.h |  1 +
>>>   arch/powerpc/kernel/pci_dn.c   | 12 ++++++++++++
>>>   2 files changed, 13 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>> index c5eb86f..6c383ad 100644
>>> --- a/arch/powerpc/include/asm/eeh.h
>>> +++ b/arch/powerpc/include/asm/eeh.h
>>> @@ -140,6 +140,7 @@ struct eeh_dev {
>>>   	struct pci_controller *phb;	/* Associated PHB		*/
>>>   	struct pci_dn *pdn;		/* Associated PCI device node	*/
>>>   	struct pci_dev *pdev;		/* Associated PCI device	*/
>>> +	struct pci_dev *physfn;		/* Associated PF PORT		*/
>>>   	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>>>   };
>>>
>>> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>> index f771130..f0ddde7 100644
>>> --- a/arch/powerpc/kernel/pci_dn.c
>>> +++ b/arch/powerpc/kernel/pci_dn.c
>>> @@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>>>   struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>>>   {
>>>   #ifdef CONFIG_PCI_IOV
>>> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>   	struct pci_dn *parent, *pdn;
>>> +	struct eeh_dev *edev;
>>>   	int i;
>>>
>>>   	/* Only support IOV for now */
>>> @@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>>>   				 __func__, i);
>>>   			return NULL;
>>>   		}
>>> +		eeh_dev_init(pdn, hose);
>>> +		edev = pdn_to_eeh_dev(pdn);
>>
>>
>> In theory, pdn_to_eeh_dev() can return NULL. In this patch, it is not clear
>> if pdn->edev gets initialized before or after add_dev_pci_data().
>>
>
> Yep, the return value should be checked.

Maybe BUG_ON() will be enough, up to you.
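
For illustration, a minimal userspace sketch of the checked variant. The
struct layouts and the helper name set_vf_physfn() are stand-ins invented for
this example, not the kernel definitions; in the kernel the NULL case could
equally be a BUG_ON().

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in types for this sketch only; the kernel structures are larger. */
struct pci_dev { int devfn; };
struct eeh_dev { struct pci_dev *physfn; };
struct pci_dn  { struct eeh_dev *edev; };

/* Plays the role of pdn_to_eeh_dev(): may return NULL. */
static struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
{
	return pdn ? pdn->edev : NULL;
}

/* Hypothetical helper: record the PF on the VF's eeh_dev, or fail cleanly. */
static int set_vf_physfn(struct pci_dn *pdn, struct pci_dev *pf)
{
	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);

	if (!edev)
		return -ENODEV;	/* no edev was attached to this pci_dn */

	edev->physfn = pf;
	return 0;
}
```

The caller can then bail out of the VF loop on a non-zero return instead of
dereferencing a NULL edev.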


>
> pdn->edev is initialized in eeh_dev_init() which is called in
> add_dev_pci_data(). The order is not clear?
>
>>
>>
>>> +		edev->physfn = pdev;
>>>   	}
>>>   #endif /* CONFIG_PCI_IOV */
>>>
>>> @@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
>>>   	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
>>>   		list_for_each_entry_safe(pdn, tmp,
>>>   			&parent->child_list, list) {
>>> +			struct eeh_dev *edev;
>>>   			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
>>>   			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
>>>   				continue;
>>>
>>> +			edev = pdn_to_eeh_dev(pdn);
>>> +			if (edev) {
>>> +				pdn->edev = NULL;
>>> +				kfree(edev);
>>> +			}
>>> +
>>>   			if (!list_empty(&pdn->list))
>>>   				list_del(&pdn->list);
>>>
>>>
>>


-- 
Alexey


* Re: [PATCH V10 06/12] powerpc/powernv: EEH device for VF
  2015-10-30  7:36       ` Alexey Kardashevskiy
@ 2015-10-30  7:58         ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  7:58 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 06:36:01PM +1100, Alexey Kardashevskiy wrote:
>On 10/30/2015 05:52 PM, Wei Yang wrote:
>>On Fri, Oct 30, 2015 at 02:33:49PM +1100, Alexey Kardashevskiy wrote:
>>>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>>>VFs and their corresponding pci_dn instances are created and released
>>>>dynamically as their PF's SRIOV capability is enabled and disabled.
>>>>The patch creates and releases EEH devices for VFs when creating and
>>>>releasing their pci_dn instances, which means EEH devices and pci_dn
>>>>instances have same life cycle. Also, VF's EEH device is identified
>>>>by (struct eeh_dev::physfn).
>>>
>>>
>>>The add_dev_pci_data() helper (the one you hack) does not create pci_dn
>>>instances. The add_one_dev_pci_data() helper does.
>>>
>>
>>Yes, you are right. The patch here create edev after the pci_dn is created.
>>
>>So which part in the log you think is not accurate?
>
>
>The commit log is ok, I just thought loud that eeh_dev_init() could go to
>add_one_dev_pci_data() to make things more clear.
>

I think this is a good suggestion.

My thought is that, when we don't have VFs, pci_dn and edev are two different
things. We create the pci_dn first and then initialize the edev, so mixing
their initialization together would not be that clear.

Not sure whether you agree or not.

>
>
>>>
>>>>
>>>>[gwshan: changelog and removed CONFIG_PCI_IOV]
>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/include/asm/eeh.h |  1 +
>>>>  arch/powerpc/kernel/pci_dn.c   | 12 ++++++++++++
>>>>  2 files changed, 13 insertions(+)
>>>>
>>>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>>>index c5eb86f..6c383ad 100644
>>>>--- a/arch/powerpc/include/asm/eeh.h
>>>>+++ b/arch/powerpc/include/asm/eeh.h
>>>>@@ -140,6 +140,7 @@ struct eeh_dev {
>>>>  	struct pci_controller *phb;	/* Associated PHB		*/
>>>>  	struct pci_dn *pdn;		/* Associated PCI device node	*/
>>>>  	struct pci_dev *pdev;		/* Associated PCI device	*/
>>>>+	struct pci_dev *physfn;		/* Associated PF PORT		*/
>>>>  	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>>>>  };
>>>>
>>>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>>>index f771130..f0ddde7 100644
>>>>--- a/arch/powerpc/kernel/pci_dn.c
>>>>+++ b/arch/powerpc/kernel/pci_dn.c
>>>>@@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>>>>  struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>>>>  {
>>>>  #ifdef CONFIG_PCI_IOV
>>>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>>  	struct pci_dn *parent, *pdn;
>>>>+	struct eeh_dev *edev;
>>>>  	int i;
>>>>
>>>>  	/* Only support IOV for now */
>>>>@@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
>>>>  				 __func__, i);
>>>>  			return NULL;
>>>>  		}
>>>>+		eeh_dev_init(pdn, hose);
>>>>+		edev = pdn_to_eeh_dev(pdn);
>>>
>>>
>>>In theory, pdn_to_eeh_dev() can return NULL. In this patch, it is not clear
>>>if pdn->edev gets initialized before or after add_dev_pci_data().
>>>
>>
>>Yep, the return value should be checked.
>
>May be BUG_ON will be enough, up to you.
>

Yep, thanks.

>
>>
>>pdn->edev is initialized in eeh_dev_init() which is called in
>>add_dev_pci_data(). The order is not clear?
>>
>>>
>>>
>>>>+		edev->physfn = pdev;
>>>>  	}
>>>>  #endif /* CONFIG_PCI_IOV */
>>>>
>>>>@@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
>>>>  	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
>>>>  		list_for_each_entry_safe(pdn, tmp,
>>>>  			&parent->child_list, list) {
>>>>+			struct eeh_dev *edev;
>>>>  			if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
>>>>  			    pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
>>>>  				continue;
>>>>
>>>>+			edev = pdn_to_eeh_dev(pdn);
>>>>+			if (edev) {
>>>>+				pdn->edev = NULL;
>>>>+				kfree(edev);
>>>>+			}
>>>>+
>>>>  			if (!list_empty(&pdn->list))
>>>>  				list_del(&pdn->list);
>>>>
>>>>
>>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE
  2015-10-30  7:18     ` Wei Yang
@ 2015-10-30  8:05       ` Alexey Kardashevskiy
  2015-11-02 22:45         ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-30  8:05 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On 10/30/2015 06:18 PM, Wei Yang wrote:
> On Fri, Oct 30, 2015 at 03:11:20PM +1100, Alexey Kardashevskiy wrote:
>> On 10/26/2015 02:15 PM, Wei Yang wrote:
>>> PEs for VFs don't have primary bus. So they have to have their own reset
>>> backend, which is used during EEH recovery. The patch implements the reset
>>> backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained
>>> in the PE.
>>>
>>> [gwshan: changelog and code refactoring]
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/eeh.h               |   1 +
>>>   arch/powerpc/platforms/powernv/eeh-powernv.c | 134 ++++++++++++++++++++++++++-
>>>   2 files changed, 134 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>> index ec21f8f..331c856 100644
>>> --- a/arch/powerpc/include/asm/eeh.h
>>> +++ b/arch/powerpc/include/asm/eeh.h
>>> @@ -136,6 +136,7 @@ struct eeh_dev {
>>>   	int pcix_cap;			/* Saved PCIx capability	*/
>>>   	int pcie_cap;			/* Saved PCIe capability	*/
>>>   	int aer_cap;			/* Saved AER capability		*/
>>> +	int af_cap;			/* Saved AF capability		*/
>>>   	struct eeh_pe *pe;		/* Associated PE		*/
>>>   	struct list_head list;		/* Form link list in the PE	*/
>>>   	struct pci_controller *phb;	/* Associated PHB		*/
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index cfd55dd..017cd72 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -404,6 +404,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>>>   	edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
>>>   	edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
>>>   	edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
>>> +	edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
>>>   	if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
>>>   		edev->mode |= EEH_DEV_BRIDGE;
>>>   		if (edev->pcie_cap) {
>>> @@ -893,6 +894,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>>   	return 0;
>>>   }
>>>
>>> +static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
>>> +				     u16 mask, bool af_flr_rst)

Missed this - @af_flr_rst is only used for warnings so better do:
s/bool af_flr_rst/const char *reset_type/
to make it explicit.
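
A small sketch of what that substitution could look like, written as plain
userspace C. The function name format_pending_warning() and its parameter
list are made up for this example; only the idea of passing the reset name
as a string comes from the comment above.

```c
#include <stdio.h>

/*
 * Hypothetical formatter: the caller names the reset type directly, so
 * no boolean has to be decoded into "AF" or "" inside the helper.
 */
static int format_pending_warning(char *buf, size_t len,
				  const char *reset_type,
				  unsigned int domain, unsigned int bus,
				  unsigned int slot, unsigned int func)
{
	return snprintf(buf, len,
			"Pending transaction while issuing %s to %04x:%02x:%02x.%01x",
			reset_type, domain, bus, slot, func);
}
```

Callers would pass "FLR" or "AF FLR", which also keeps the user-visible
string on one line.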


>>> +{
>>> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>> +	int status, i;
>>> +
>>> +	/* Wait for Transaction Pending bit to be cleared */
>>> +	for (i = 0; i < 4; i++) {
>>> +		eeh_ops->read_config(pdn, pos, 2, &status);
>>
>>
>> gcc should have complained on using uninitialized @status here.
>>
>
> I remove the obj file and re-compile the file, not the warning.

Hm. Does not warn me either.

> And took a look at other places where read_config() is called. The laster
> parameter is not initialized before called.

So? It does not make it right.

> You see the error during build?

Why does it matter? We have undefined behavior here, which we should not.
You could test the return value from read_config(), but since you do not, at
least initialize the local variables.
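
A minimal sketch of both options combined, assuming an accessor that leaves
its output untouched on failure. mock_read_config() and pending_bit_set() are
hypothetical names for this example and do not reflect the real
eeh_ops->read_config() signature.

```c
#include <stdint.h>

/*
 * Hypothetical accessor modelled loosely on read_config(): on failure it
 * returns non-zero and does not write *val.
 */
static int mock_read_config(int fail, uint32_t hw_value, uint32_t *val)
{
	if (fail)
		return -1;
	*val = hw_value;
	return 0;
}

/* Returns 1 if the masked bit is set, 0 if clear, -1 on a read error. */
static int pending_bit_set(int fail, uint32_t hw_value, uint32_t mask)
{
	uint32_t status = 0;	/* defined value even if the read fails */

	if (mock_read_config(fail, hw_value, &status))
		return -1;

	return (status & mask) ? 1 : 0;
}
```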


>
>>
>>> +		if (!(status & mask))
>>> +			return;
>>> +
>>> +		msleep((1 << i) * 100);
>>> +	}
>>> +
>>> +	pr_warn("%s: Pending transaction while issuing %s FLR to "
>>> +		"%04x:%02x:%02x.%01x\n",
>>
>> Do not wrap user-visible strings.
>>
>
> Will change this.
>
>>
>>> +		__func__, af_flr_rst ? "AF" : "",
>>> +		edev->phb->global_number, pdn->busno,
>>> +		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>> +}
>>> +
>>> +static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
>>> +{
>>> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>> +	u32 reg;
>>> +
>>> +	if (!edev->pcie_cap)
>>> +		return -ENOTTY;
>>
>>
>> Can pnv_eeh_do_flr() be really called on a non PCIe device, can we get that
>> far? WARN_ON_ONCE() may be?
>>
>
> So you suggest to add a WARN_ON_ONCE() in this condition, right?

I am asking a question here: does it make sense to add a WARN_ON_ONCE, or to
replace the "if" with WARN_ON_ONCE, or is it actually possible for pcie_cap
to be uninitialized in this code? Which one is it?


>
>>
>>> +
>>> +	eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);
>>
>>
>> ... and here about uninitialized @reg.
>>
>>
>>> +	if (!(reg & PCI_EXP_DEVCAP_FLR))
>>> +		return -ENOTTY;
>>> +
>>> +	switch (option) {
>>> +	case EEH_RESET_HOT:
>>> +	case EEH_RESET_FUNDAMENTAL:
>>> +		pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
>>> +					 PCI_EXP_DEVSTA_TRPND, false);
>>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				     4, &reg);
>>> +		reg |= PCI_EXP_DEVCTL_BCR_FLR;
>>> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				      4, reg);
>>> +		msleep(EEH_PE_RST_HOLD_TIME);
>>> +		break;
>>> +	case EEH_RESET_DEACTIVATE:
>>> +		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				     4, &reg);
>>> +		reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
>>> +		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>> +				      4, reg);
>>> +		msleep(EEH_PE_RST_SETTLE_TIME);
>>> +		break;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
>>> +{
>>> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>> +	u32 cap;
>>> +
>>> +	if (!edev->af_cap)
>>> +		return -ENOTTY;
>>> +
>>> +	eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);
>>
>>
>> ... and here about @cap.
>>
>>> +	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
>>> +		return -ENOTTY;
>>> +
>>> +	switch (option) {
>>> +	case EEH_RESET_HOT:
>>> +	case EEH_RESET_FUNDAMENTAL:
>>> +		/*
>>> +		 * Wait for Transaction Pending bit to clear. A word-aligned
>>> +		 * test is used, so we use the conrol offset rather than status
>>> +		 * and shift the test bit to match.
>>
>>
>> Why word-aligned (not byte or double word)?
>>
>
> I copied this words from pci_af_flr(). Actually, I don't tried to understand
> this reason.

Ok. I looked at pci_af_flr().

In this patch, a comment before pnv_eeh_wait_for_pending() is missing,
something like "pnv_eeh_wait_for_pending() uses a word-size accessor so
@pos must be word-aligned".
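
To make the "control offset plus shifted bit" trick concrete, here is a
userspace sketch over a byte array standing in for config space. The register
offsets match the AF capability layout in pci_regs.h as far as I remember it;
cfg_read16() and af_transaction_pending() are hypothetical helpers.

```c
#include <stdint.h>

#define PCI_AF_CTRL       4    /* control register, offset within AF cap */
#define PCI_AF_STATUS     5    /* status register, the following byte */
#define PCI_AF_STATUS_TP  0x01 /* Transaction Pending */

/* Little-endian 16-bit read from a config-space image. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int pos)
{
	return (uint16_t)(cfg[pos] | (cfg[pos + 1] << 8));
}

/*
 * The status byte sits at an odd offset, so read the aligned word that
 * starts at the control register and test the bit in the upper byte.
 */
static int af_transaction_pending(const uint8_t *cfg, unsigned int af_cap)
{
	uint16_t word = cfg_read16(cfg, af_cap + PCI_AF_CTRL);

	return (word & (PCI_AF_STATUS_TP << 8)) != 0;
}
```

A bit set in the control byte does not trip the test, because the mask only
covers the upper byte of the word.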


>
>>> +		 */
>>> +		pnv_eeh_wait_for_pending(pdn, edev->af_cap + PCI_AF_CTRL,
>>> +					 PCI_AF_STATUS_TP << 8, true);
>>> +		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL,
>>> +				      1, PCI_AF_CTRL_FLR);
>>> +		msleep(EEH_PE_RST_HOLD_TIME);
>>> +		break;
>>> +	case EEH_RESET_DEACTIVATE:
>>> +		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL, 1, 0);
>>> +		msleep(EEH_PE_RST_SETTLE_TIME);
>>
>>
>> btw there is an unrelated issue with EEH_PE_RST_SETTLE_TIME which is defined
>> as 1800 which is A LOT (+250ms from EEH_PE_RST_HOLD_TIME and for some reason
>> this is actually doubled so there is another reset somewhere).
>>
>
> I don't know the reason for this value. This code keeps aligned with other
> reset functions, like pnv_eeh_bridge_reset().


Are they all in POWERNV/EEH, or does generic PCI use the same values on, for
example, x86?
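
For a rough feel of the cost, a back-of-the-envelope sketch. It assumes the
VF resets are serialized and that each pass sleeps for one hold plus one
settle period; 250 and 1800 are the values mentioned in this thread, while
vf_reset_sleep_ms() is a made-up helper.

```c
/* Values quoted in this thread, in milliseconds. */
#define EEH_PE_RST_HOLD_TIME	250
#define EEH_PE_RST_SETTLE_TIME	1800

/* Total sleep time if every VF is reset one after another. */
static unsigned long vf_reset_sleep_ms(unsigned int nr_vfs,
				       unsigned int passes)
{
	unsigned long per_reset = EEH_PE_RST_HOLD_TIME +
				  EEH_PE_RST_SETTLE_TIME;

	return (unsigned long)nr_vfs * passes * per_reset;
}
```

With 63 VFs and two passes this gives 258300 ms, a bit over four minutes of
sleeping alone.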



>> Booting a guest with 63 VFs takes 6 minutes or so, is there a good reason for
>> such a huge timeout?
>>
>>
>>> +		break;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int pnv_eeh_reset_vf(struct pci_dn *pdn, int option)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = pnv_eeh_do_flr(pdn, option);
>>> +	if (ret != -ENOTTY)
>>> +		return ret;
>>> +
>>> +	return pnv_eeh_do_af_flr(pdn, option);
>>> +}
>>> +
>>> +static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
>>> +{
>>> +	struct eeh_dev *edev, *tmp;
>>> +	struct pci_dn *pdn;
>>> +	int ret;
>>> +
>>> +	eeh_pe_for_each_dev(pe, edev, tmp) {
>>> +		pdn = eeh_dev_to_pdn(edev);
>>> +		ret = pnv_eeh_reset_vf(pdn, option);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>>   {
>>>   	struct pci_controller *hose;
>>> @@ -968,7 +1090,9 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
>>>   		}
>>>
>>>   		bus = eeh_pe_bus_get(pe);
>>> -		if (pci_is_root_bus(bus) ||
>>> +		if (pe->type & EEH_PE_VF)
>>> +			ret = pnv_eeh_vf_pe_reset(pe, option);
>>> +		else if (pci_is_root_bus(bus) ||
>>>   			pci_is_root_bus(bus->parent))
>>>   			ret = pnv_eeh_root_reset(hose, option);
>>>   		else
>>> @@ -1108,6 +1232,14 @@ static inline bool pnv_eeh_cfg_blocked(struct pci_dn *pdn)
>>>   	if (!edev || !edev->pe)
>>>   		return false;
>>>
>>> +	/*
>>> +	 * We will issue FLR or AF FLR to all VFs, which are contained
>>> +	 * in VF PE. It relies on the EEH PCI config accessors. So we
>>> +	 * can't block them during the window.
>>> +	 */
>>> +	if ((edev->physfn) && (edev->pe->state & EEH_PE_RESET))
>>
>>
>> Extra braces around edev->physfn.
>>
>
> Will remove it.
>
>>
>>
>>> +		return false;
>>> +
>>>   	if (edev->pe->state & EEH_PE_CFG_BLOCKED)
>>>   		return true;
>>>
>>>


-- 
Alexey


* Re: [PATCH V10 09/12] powerpc/powernv: Support PCI config restore for VFs
  2015-10-30  4:56   ` Alexey Kardashevskiy
@ 2015-10-30  8:17     ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-10-30  8:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 03:56:12PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>After PE reset, OPAL API opal_pci_reinit() is called on all devices
>>contained in the PE to reinitialize them. However, VFs can't be seen
>>from skiboot firmware. We have to implement the functions, similar
>>those in skiboot firmware, to reinitialize VFs after reset on PE
>>for VFs.
>>
>>[gwshan: changelog and code refactoring]
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h        |  1 +
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 70 +++++++++++++++++++++++++++-
>>  arch/powerpc/platforms/powernv/pci.c         | 18 +++++++
>>  3 files changed, 88 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index 3d7e537..e499d93 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -219,6 +219,7 @@ struct pci_dn {
>>  #define IODA_INVALID_M64        (-1)
>>  	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>>  #endif /* CONFIG_PCI_IOV */
>>+	int     mps;
>
>int     mps; /* maximum payload size */
>?

You are right. Will add this comment in code.

>
>
>>  #endif
>>  	struct list_head child_list;
>>  	struct list_head list;
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index 017cd72..3cc3e76 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -1616,6 +1616,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
>>  	return ret;
>>  }
>>
>>+static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
>
>It does not exactly restore it, it just tweaks few bytes.
>
>
>>+{
>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>+	u32 devctl, cmd, cap2, aer_capctl;
>>+	int old_mps;
>>+
>>+	/* Restore MPS */
>>+	if (edev->pcie_cap) {
>>+		old_mps = (ffs(pdn->mps) - 8) << 5;
>>+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				     2, &devctl);
>>+		devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
>>+		devctl |= old_mps;
>>+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				      2, devctl);
>>+	}
>>+
>>+	/* Disable Completion Timeout */
>>+	if (edev->pcie_cap) {
>>+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
>>+				     4, &cap2);
>>+		if (cap2 & 0x10) {
>>+			eeh_ops->read_config(pdn,
>>+					edev->pcie_cap + PCI_EXP_DEVCTL2,
>>+					4, &cap2);
>>+			cap2 |= 0x10;
>>+			eeh_ops->write_config(pdn,
>>+					edev->pcie_cap + PCI_EXP_DEVCTL2,
>>+					4, cap2);
>>+		}
>>+	}
>>+
>>+	/* Enable SERR and parity checking */
>>+	eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
>
>
>No complains from gcc about uninitialized @cmd and others? Cooool...
>

No...

>
>>+	cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
>>+	eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
>>+
>>+	/* Enable report various errors */
>>+	if (edev->pcie_cap) {
>>+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				2, &devctl);
>>+		devctl &= ~PCI_EXP_DEVCTL_CERE;
>>+		devctl |= (PCI_EXP_DEVCTL_NFERE |
>>+			   PCI_EXP_DEVCTL_FERE |
>>+			   PCI_EXP_DEVCTL_URRE);
>>+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>+				2, devctl);
>>+	}
>>+
>>+	/* Enable ECRC generation and check */
>>+	if (edev->pcie_cap && edev->aer_cap) {
>>+		eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
>>+				4, &aer_capctl);
>>+		aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
>>+		eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
>>+				4, aer_capctl);
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>  static int pnv_eeh_restore_config(struct pci_dn *pdn)
>>  {
>>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>@@ -1626,7 +1687,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
>>  		return -EEXIST;
>>
>>  	phb = edev->phb->private_data;
>>-	ret = opal_pci_reinit(phb->opal_id,
>>+	/*
>>+	 * We have to restore the PCI config space after reset since the
>>+	 * firmware can't see SRIOV VFs.
>
>
>When I see "restore config space", pci_restore_state() comes to my mind...
>What you do is rather "fixup" but for some reason you do not call this from
>pnv_pci_fixup_vf_mps (which could be more generic and call
>pnv_eeh_restore_config()). Or that pnv_pci_fixup_vf_mps() could be merged
>into pnv_eeh_restore_config(). Having "restore" code in 2 places with unclear
>execution order does not feel right.
>
>
>
>>+	 */
>>+	if (edev->physfn)
>>+		ret = pnv_eeh_restore_vf_config(pdn);
>>+	else
>>+		ret = opal_pci_reinit(phb->opal_id,
>>  			      OPAL_REINIT_PCI_DEV, edev->config_addr);
>>  	if (ret) {
>>  		pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
>>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>index 765d8ed..0e4f42e 100644
>>--- a/arch/powerpc/platforms/powernv/pci.c
>>+++ b/arch/powerpc/platforms/powernv/pci.c
>>@@ -788,6 +788,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
>>  }
>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
>>
>>+#ifdef CONFIG_PCI_IOV
>>+static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
>>+{
>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>>+	int parent_mps;
>>+
>>+	if (!pdev->is_virtfn)
>>+		return;
>>+
>>+	/* Synchronize MPS for VF and PF */
>>+	parent_mps = pcie_get_mps(pdev->physfn);
>>+	if ((128 << pdev->pcie_mpss) >= parent_mps)
>>+		pcie_set_mps(pdev, parent_mps);
>
>
>There is no mentioning of MPS in the commit log. What and why is happening
>here? Is this cut-n-paste? Is not there already some code somewhere which
>does the same thing already for initial init()? Can this be reused? Or
>extracted to a helper and reused?
>

Ok, this code confused you.

pnv_eeh_restore_vf_config() is called in pnv_eeh_restore_config(). Its
purpose is to be the counterpart of opal_pci_reinit(). Put simply, this
function does in the kernel what opal_pci_reinit() does in skiboot. You may
ask why we don't just rely on opal_pci_reinit(). The reason is that skiboot
has no device to represent a VF.

As for why both pnv_pci_fixup_vf_mps() and pnv_eeh_restore_vf_config() handle
MPS: during system initialization skiboot sets each device's MPS by this
rule. Since skiboot has no idea about VFs, this step is done in the kernel.
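
A small userspace sketch of the two MPS pieces discussed here: the encoding
used when writing MPS back into DEVCTL, and the rule for syncing a VF with its
PF. The helper names are hypothetical; the arithmetic follows the quoted patch
and uses ffs() from <strings.h>.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size, bits 7:5 */

/* Encode an MPS in bytes (128, 256, ... 4096) into the DEVCTL field. */
static uint16_t mps_to_devctl_field(int mps_bytes)
{
	return (uint16_t)((ffs(mps_bytes) - 8) << 5);
}

/* Replace only the payload field, leaving the other DEVCTL bits alone. */
static uint16_t devctl_set_mps(uint16_t devctl, int mps_bytes)
{
	devctl &= (uint16_t)~PCI_EXP_DEVCTL_PAYLOAD;
	devctl |= mps_to_devctl_field(mps_bytes);
	return devctl;
}

/*
 * The VF adopts the PF's MPS only if the VF's supported maximum
 * (128 << mpss) covers it; otherwise it keeps its current value.
 */
static int vf_pick_mps(int vf_mpss, int vf_current_mps, int pf_mps)
{
	if ((128 << vf_mpss) >= pf_mps)
		return pf_mps;
	return vf_current_mps;
}
```

The value chosen at fixup time is what gets saved and later re-encoded after
a reset.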

>
>
>>+	pdn->mps = pcie_get_mps(pdev);
>>+}
>>+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
>>+#endif /* CONFIG_PCI_IOV */
>>+
>>  void __init pnv_pci_init(void)
>>  {
>>  	struct device_node *np;
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE
  2015-10-30  5:20   ` Alexey Kardashevskiy
@ 2015-11-01  1:53     ` Wei Yang
  2015-11-01 23:40       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 50+ messages in thread
From: Wei Yang @ 2015-11-01  1:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 04:20:48PM +1100, Alexey Kardashevskiy wrote:
>On 10/26/2015 02:16 PM, Wei Yang wrote:
>>Different from PCI bus dependent PE, PE for VFs doesn't have the
>
>s/Different from/Unlike/
>

Will change in next version.

>
>>primary bus, on which the PCI hotplug is implemented. The patch
>>supports error recovery, especially the PCI hotplug for VF's PE.
>
>The patch adds support for error recovery of what exactly?
>What is "especially" about?
>

PFs are enumerated on the PCI bus, while VFs are created by the PF's driver.

EEH recovery has two cases:
1. The device and driver are EEH aware: the error handlers are called.
2. The device and driver are not EEH aware: unplug the device and plug it
   again by enumerating it.

The special handling is in the second case. For a PF, we can use the PCI core
to enumerate the bus, while for a VF we need to record the VFs which are
unplugged and then plug them again.
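
The second case for VFs can be sketched roughly as below. This is a userspace
model only: struct vf_dev, struct rmv_data and both functions are stand-ins
made up for illustration (a fixed array instead of a kernel list), not the
code in the patch.

```c
#include <stddef.h>

#define MAX_VFS 8

/* Stand-in for a VF's eeh_dev; fields are illustrative only. */
struct vf_dev {
	int vf_index;
	int has_err_handler;	/* driver is EEH aware */
	int present;		/* currently plugged */
};

/* Stand-in for the bookkeeping handed to the removal pass. */
struct rmv_data {
	struct vf_dev *removed[MAX_VFS];
	size_t count;
};

/* Case 2: unplug VFs whose driver is not EEH aware and remember them. */
static void unplug_unaware_vfs(struct vf_dev *vfs, size_t n,
			       struct rmv_data *rd)
{
	for (size_t i = 0; i < n; i++) {
		if (vfs[i].has_err_handler)
			continue;	/* case 1: error handlers are used */
		if (rd->count == MAX_VFS)
			break;
		vfs[i].present = 0;
		rd->removed[rd->count++] = &vfs[i];
	}
}

/* After the PF has recovered, plug the recorded VFs back in. */
static size_t replug_recorded_vfs(struct rmv_data *rd)
{
	size_t added = 0;

	for (size_t i = 0; i < rd->count; i++) {
		rd->removed[i]->present = 1;
		added++;
	}
	rd->count = 0;
	return added;
}
```

Recording the VFs is needed because a bus rescan will not rediscover them;
only the PF side knows which VFs existed.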

>
>>The hotplug on VF's PE is implemented based on VFs, instead of
>>PCI bus any more.
>
>Needs rephrase.
>
>Is this patch about EEH error recovery, i.e. unplug VF, re-plug VF? Why does
>the commit log talk about PE hotplug? I thought we do VF (i.e. PCI device)
>hotplug, not PE.
>

Hmm... unlike the Bus PE for PFs, VF PE is dynamically created and released
when VFs are created and released.

>
>>
>>[gwshan: changelog and code refactoring]
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/eeh.h   |   1 +
>>  arch/powerpc/kernel/eeh.c        |   8 ++++
>>  arch/powerpc/kernel/eeh_driver.c | 100 +++++++++++++++++++++++++++++++--------
>>  arch/powerpc/kernel/eeh_pe.c     |   3 +-
>>  4 files changed, 90 insertions(+), 22 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>index 331c856..ea1f13c4 100644
>>--- a/arch/powerpc/include/asm/eeh.h
>>+++ b/arch/powerpc/include/asm/eeh.h
>>@@ -142,6 +142,7 @@ struct eeh_dev {
>>  	struct pci_controller *phb;	/* Associated PHB		*/
>>  	struct pci_dn *pdn;		/* Associated PCI device node	*/
>>  	struct pci_dev *pdev;		/* Associated PCI device	*/
>>+	int    in_error;		/* Error flag for eeh_dev	*/
>
>Make it "bool".
>

Will change it in next version.

>
>>  	struct pci_dev *physfn;		/* Associated PF PORT		*/
>>  	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>>  };
>>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>>index af9b597..28e4d73 100644
>>--- a/arch/powerpc/kernel/eeh.c
>>+++ b/arch/powerpc/kernel/eeh.c
>>@@ -1227,6 +1227,14 @@ void eeh_remove_device(struct pci_dev *dev)
>>  	 * from the parent PE during the BAR resotre.
>>  	 */
>>  	edev->pdev = NULL;
>>+
>>+	/*
>>+	 * The flag "in_error" is used to trace EEH devices for VFs
>>+	 * in error state or not. It's set in eeh_report_error(). If
>>+	 * it's not set, eeh_report_{reset,resume}() won't be called
>>+	 * for the VF EEH device.
>>+	 */
>>+	edev->in_error = 0;
>>  	dev->dev.archdata.edev = NULL;
>>  	if (!(edev->pe->state & EEH_PE_KEEP))
>>  		eeh_rmv_from_parent_pe(edev);
>>diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>index 89eb4bc..99868e2 100644
>>--- a/arch/powerpc/kernel/eeh_driver.c
>>+++ b/arch/powerpc/kernel/eeh_driver.c
>>@@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
>>  	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
>>  	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
>>
>>+	edev->in_error = 1;
>>  	eeh_pcid_put(dev);
>>  	return NULL;
>>  }
>>@@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
>>
>>  	if (!driver->err_handler ||
>>  	    !driver->err_handler->slot_reset ||
>>-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>>+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>>+	    (!edev->in_error)) {
>>  		eeh_pcid_put(dev);
>>  		return NULL;
>>  	}
>>@@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
>>
>
>bool was_in_error = edev->in_error;
>edev->in_error = false;
>
>then use was_in_error below and there is no need to replace return with goto,
>etc -> slightly simpler code.
>

Will change it in next version.
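
A sketch of how the suggested shape might look, modelled in userspace with
mock types. struct mock_edev, mock_driver_resume() and report_resume() are
invented for this example; the real function takes different arguments and
also handles the driver reference.

```c
#include <stdbool.h>

struct mock_edev {
	bool in_error;
	bool no_handler;	/* stands in for EEH_DEV_NO_HANDLER */
	int resumed;		/* counts calls to the driver's resume hook */
};

static void mock_driver_resume(struct mock_edev *edev)
{
	edev->resumed++;
}

static void report_resume(struct mock_edev *edev, bool has_resume_hook)
{
	bool was_in_error = edev->in_error;

	edev->in_error = false;	/* cleared on every path, no goto needed */

	if (!has_resume_hook || edev->no_handler || !was_in_error) {
		edev->no_handler = false;
		return;
	}

	mock_driver_resume(edev);
}
```

Because the flag is sampled and cleared up front, the early return stays a
plain return.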

>
>>  	if (!driver->err_handler ||
>>  	    !driver->err_handler->resume ||
>>-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>>+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>>+	    (!edev->in_error)) {
>>  		edev->mode &= ~EEH_DEV_NO_HANDLER;
>>-		eeh_pcid_put(dev);
>>-		return NULL;
>>+		goto out;
>>  	}
>>
>>  	driver->err_handler->resume(dev);
>>
>>+out:
>>+	edev->in_error = 0;
>>  	eeh_pcid_put(dev);
>>  	return NULL;
>>  }
>>@@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
>>  	return NULL;
>>  }
>>
>>+static void *eeh_add_virt_device(void *data, void *userdata)
>>+{
>>+	struct pci_driver *driver;
>>+	struct eeh_dev *edev = (struct eeh_dev *)data;
>>+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>>+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>>+
>>+	if (!(edev->physfn)) {
>>+		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
>>+			__func__, edev->phb->global_number, pdn->busno,
>>+			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>+		return NULL;
>>+	}
>>+
>>+	driver = eeh_pcid_get(dev);
>>+	if (driver) {
>>+		eeh_pcid_put(dev);
>>+		if (driver->err_handler)
>>+			return NULL;
>>+	}
>>+
>>+	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
>>+	return NULL;
>>+}
>>+
>>  static void *eeh_rmv_device(void *data, void *userdata)
>>  {
>>  	struct pci_driver *driver;
>>  	struct eeh_dev *edev = (struct eeh_dev *)data;
>>  	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>>  	int *removed = (int *)userdata;
>>+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>>
>>  	/*
>>  	 * Actually, we should remove the PCI bridges as well.
>>@@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>  	driver = eeh_pcid_get(dev);
>>  	if (driver) {
>>  		eeh_pcid_put(dev);
>>-		if (driver->err_handler)
>>+		if (removed && driver->err_handler)
>>  			return NULL;
>>  	}
>>
>>@@ -425,11 +455,23 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>  		 pci_name(dev));
>>  	edev->bus = dev->bus;
>>  	edev->mode |= EEH_DEV_DISCONNECTED;
>>-	(*removed)++;
>>+	if (removed)
>>+		(*removed)++;
>>
>>-	pci_lock_rescan_remove();
>>-	pci_stop_and_remove_bus_device(dev);
>>-	pci_unlock_rescan_remove();
>>+	if (edev->physfn) {
>>+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
>>+		edev->pdev = NULL;
>>+
>>+		/*
>>+		 * We have to set the VF PE number to invalid one, which is
>>+		 * required to plug the VF successfully.
>>+		 */
>>+		pdn->pe_number = IODA_INVALID_PE;
>>+	} else {
>>+		pci_lock_rescan_remove();
>>+		pci_stop_and_remove_bus_device(dev);
>>+		pci_unlock_rescan_remove();
>>+	}
>>
>>  	return NULL;
>>  }
>>@@ -548,6 +590,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>  	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
>>  	struct timeval tstamp;
>>  	int cnt, rc, removed = 0;
>>+	struct eeh_dev *edev;
>>
>>  	/* pcibios will clear the counter; save the value */
>>  	cnt = pe->freeze_count;
>>@@ -561,12 +604,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>  	 */
>>  	eeh_pe_state_mark(pe, EEH_PE_KEEP);
>>  	if (bus) {
>>-		pci_lock_rescan_remove();
>>-		pcibios_remove_pci_devices(bus);
>>-		pci_unlock_rescan_remove();
>>-	} else if (frozen_bus) {
>>+		if (pe->type & EEH_PE_VF)
>>+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>
>
>I believe the rule is that if one branch of "if" uses curly braces, then the
>other should have them too.
>

Thanks for the reminder, will fix it in the next version.

>
>>+		else {
>>+			pci_lock_rescan_remove();
>>+			pcibios_remove_pci_devices(bus);
>>+			pci_unlock_rescan_remove();
>>+		}
>>+	} else if (frozen_bus)
>>  		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
>>-	}
>>
>>  	/*
>>  	 * Reset the pci controller. (Asserts RST#; resets config space).
>>@@ -607,14 +653,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>  		 * PE. We should disconnect it so the binding can be
>>  		 * rebuilt when adding PCI devices.
>>  		 */
>>+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>>  		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>>-		pcibios_add_pci_devices(bus);
>>+		if (pe->type & EEH_PE_VF)
>
>Move "edev = list_first_entry(&pe->edevs, struct eeh_dev, list)" here. You
>could actually do:
>
>eeh_add_virt_device(list_first_entry(&pe->edevs, struct eeh_dev, list), NULL);
>
>and drop local variable @edev. Or move it to this scope. Dunno.
>

Hmm... as far as I know, eeh_pe_detach_dev() removes the edev from the PE's
edevs list, so the edev has to be fetched before the traverse.
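
The ordering concern can be sketched with a toy list. This is a user-space
illustration under assumptions: mock_pe, mock_edev, mock_first_edev() and
mock_detach_all() are hypothetical, and unlike the kernel's
list_first_entry(), the mock lookup returns NULL on an empty list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a PE holding a singly linked list of devices. */
struct mock_edev {
	int id;
	struct mock_edev *next;
};

struct mock_pe {
	struct mock_edev *edevs;	/* head of the device list, NULL if empty */
};

/* Return the first device, or NULL if the list is empty. */
static struct mock_edev *mock_first_edev(const struct mock_pe *pe)
{
	assert(pe != NULL);
	return pe->edevs;
}

/* Stand-in for the detach traverse: unlink every device from the PE. */
static void mock_detach_all(struct mock_pe *pe)
{
	assert(pe != NULL);
	while (pe->edevs) {
		struct mock_edev *edev = pe->edevs;

		pe->edevs = edev->next;
		edev->next = NULL;
	}
}
```

A pointer saved before mock_detach_all() stays usable, while a lookup done
afterwards finds nothing. That is the reason for fetching edev ahead of the
traverse.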

>
>>+			eeh_add_virt_device(edev, NULL);
>>+		else
>>+			pcibios_add_pci_devices(bus);
>>  	} else if (frozen_bus && removed) {
>>  		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>>  		ssleep(5);
>>
>>+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>>  		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>>-		pcibios_add_pci_devices(frozen_bus);
>>+		if (pe->type & EEH_PE_VF)
>
>
>The same comment as above.
>
>>+			eeh_add_virt_device(edev, NULL);
>>+		else
>>+			pcibios_add_pci_devices(frozen_bus);
>>  	}
>>  	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>>
>>@@ -792,11 +846,15 @@ perm_error:
>>  	 * the their PCI config any more.
>>  	 */
>>  	if (frozen_bus) {
>>-		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>-
>>-		pci_lock_rescan_remove();
>>-		pcibios_remove_pci_devices(frozen_bus);
>>-		pci_unlock_rescan_remove();
>>+		if (pe->type & EEH_PE_VF) {
>>+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>>+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>+		} else {
>>+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>+			pci_lock_rescan_remove();
>>+			pcibios_remove_pci_devices(frozen_bus);
>>+			pci_unlock_rescan_remove();
>>+		}
>>  	}
>>  }
>>
>>diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>>index 260a701..5cde950 100644
>>--- a/arch/powerpc/kernel/eeh_pe.c
>>+++ b/arch/powerpc/kernel/eeh_pe.c
>>@@ -914,7 +914,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
>>  	if (pe->type & EEH_PE_PHB) {
>>  		bus = pe->phb->bus;
>>  	} else if (pe->type & EEH_PE_BUS ||
>>-		   pe->type & EEH_PE_DEVICE) {
>>+		   pe->type & EEH_PE_DEVICE ||
>>+		   pe->type & EEH_PE_VF) {
>>  		if (pe->bus) {
>>  			bus = pe->bus;
>>  			goto out;
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE
  2015-11-01  1:53     ` Wei Yang
@ 2015-11-01 23:40       ` Alexey Kardashevskiy
  2015-11-02  9:39         ` Wei Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-01 23:40 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On 11/01/2015 12:53 PM, Wei Yang wrote:
> On Fri, Oct 30, 2015 at 04:20:48PM +1100, Alexey Kardashevskiy wrote:
>> On 10/26/2015 02:16 PM, Wei Yang wrote:
>>> Different from PCI bus dependent PE, PE for VFs doesn't have the
>>
>> s/Different from/Unlike/
>>
>
> Will change in next version.
>
>>
>>> primary bus, on which the PCI hotplug is implemented. The patch
>>> supports error recovery, especially the PCI hotplug for VF's PE.
>>
>> The patch adds support for error recovery of what exactly?
>> What is "especially" about?
>>
>
> PFs are enumerated on PCI bus, while VFs are created by PF's driver.
>
> EEH recovery has two cases.
> 1. The device and driver are EEH aware: the error handlers are called.
> 2. The device and driver are not EEH aware: unplug the device and plug it
>     again by enumerating it.
>
> The special handling is in the second case. For a PF, we can use the
> existing PCI core to enumerate the bus, while for a VF, we need to record
> the VFs that were unplugged and then plug them again.


Right. This should have been the actual commit log.
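
The case split described above can be written down as a small decision
function. This is a sketch only: the enum, struct mock_dev and
mock_pick_recovery() are hypothetical names, and the real code reaches the
same decision through eeh_rmv_device() and eeh_add_virt_device() rather than
a single helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the recovery paths described in the thread. */
enum mock_recovery {
	MOCK_CALL_HANDLERS,		/* EEH-aware driver: run its error handlers */
	MOCK_REPLUG_BY_BUS_SCAN,	/* PF, no handlers: remove, then rescan the bus */
	MOCK_REPLUG_BY_PF,		/* VF, no handlers: remove and re-add via the PF */
};

struct mock_dev {
	bool has_err_handler;	/* driver provides error handlers */
	bool is_vf;		/* device is a virtual function */
};

static enum mock_recovery mock_pick_recovery(const struct mock_dev *dev)
{
	assert(dev != NULL);

	if (dev->has_err_handler)
		return MOCK_CALL_HANDLERS;

	return dev->is_vf ? MOCK_REPLUG_BY_PF : MOCK_REPLUG_BY_BUS_SCAN;
}
```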


>>
>>> The hotplug on VF's PE is implemented based on VFs, instead of
>>> PCI bus any more.
>>
>> Needs rephrase.
>>
>> Is this patch about EEH error recovery, i.e. unplug VF, re-plug VF? Why does
>> the commit log talk about PE hotplug? I thought we do VF (i.e. PCI device)
>> hotplug, not PE.
>>
>
> Hmm... unlike the Bus PE for PFs, VF PE is dynamically created and released
> when VFs are created and released.


Sure. PEs are created/released, not plugged/unplugged (VFs are), that was 
my point.


>
>>
>>>
>>> [gwshan: changelog and code refactoring]
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/eeh.h   |   1 +
>>>   arch/powerpc/kernel/eeh.c        |   8 ++++
>>>   arch/powerpc/kernel/eeh_driver.c | 100 +++++++++++++++++++++++++++++++--------
>>>   arch/powerpc/kernel/eeh_pe.c     |   3 +-
>>>   4 files changed, 90 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>> index 331c856..ea1f13c4 100644
>>> --- a/arch/powerpc/include/asm/eeh.h
>>> +++ b/arch/powerpc/include/asm/eeh.h
>>> @@ -142,6 +142,7 @@ struct eeh_dev {
>>>   	struct pci_controller *phb;	/* Associated PHB		*/
>>>   	struct pci_dn *pdn;		/* Associated PCI device node	*/
>>>   	struct pci_dev *pdev;		/* Associated PCI device	*/
>>> +	int    in_error;		/* Error flag for eeh_dev	*/
>>
>> Make it "bool".
>>
>
> Will change it in next version.
>
>>
>>>   	struct pci_dev *physfn;		/* Associated PF PORT		*/
>>>   	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>>>   };
>>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>>> index af9b597..28e4d73 100644
>>> --- a/arch/powerpc/kernel/eeh.c
>>> +++ b/arch/powerpc/kernel/eeh.c
>>> @@ -1227,6 +1227,14 @@ void eeh_remove_device(struct pci_dev *dev)
>>>   	 * from the parent PE during the BAR resotre.
>>>   	 */
>>>   	edev->pdev = NULL;
>>> +
>>> +	/*
>>> +	 * The flag "in_error" is used to trace EEH devices for VFs
>>> +	 * in error state or not. It's set in eeh_report_error(). If
>>> +	 * it's not set, eeh_report_{reset,resume}() won't be called
>>> +	 * for the VF EEH device.
>>> +	 */
>>> +	edev->in_error = 0;
>>>   	dev->dev.archdata.edev = NULL;
>>>   	if (!(edev->pe->state & EEH_PE_KEEP))
>>>   		eeh_rmv_from_parent_pe(edev);
>>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>> index 89eb4bc..99868e2 100644
>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>> @@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
>>>   	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
>>>   	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
>>>
>>> +	edev->in_error = 1;
>>>   	eeh_pcid_put(dev);
>>>   	return NULL;
>>>   }
>>> @@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
>>>
>>>   	if (!driver->err_handler ||
>>>   	    !driver->err_handler->slot_reset ||
>>> -	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>>> +	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>>> +	    (!edev->in_error)) {
>>>   		eeh_pcid_put(dev);
>>>   		return NULL;
>>>   	}
>>> @@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
>>>
>>
>> bool was_in_error = edev->in_error;
>> edev->in_error = false;
>>
>> then use was_in_error below and there is no need to replace return with goto,
>> etc -> slightly simpler code.
>>
>
> Will change it in next version.
>
>>
>>>   	if (!driver->err_handler ||
>>>   	    !driver->err_handler->resume ||
>>> -	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>>> +	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>>> +	    (!edev->in_error)) {
>>>   		edev->mode &= ~EEH_DEV_NO_HANDLER;
>>> -		eeh_pcid_put(dev);
>>> -		return NULL;
>>> +		goto out;
>>>   	}
>>>
>>>   	driver->err_handler->resume(dev);
>>>
>>> +out:
>>> +	edev->in_error = 0;
>>>   	eeh_pcid_put(dev);
>>>   	return NULL;
>>>   }
>>> @@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
>>>   	return NULL;
>>>   }
>>>
>>> +static void *eeh_add_virt_device(void *data, void *userdata)
>>> +{
>>> +	struct pci_driver *driver;
>>> +	struct eeh_dev *edev = (struct eeh_dev *)data;
>>> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>>> +	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>>> +
>>> +	if (!(edev->physfn)) {
>>> +		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
>>> +			__func__, edev->phb->global_number, pdn->busno,
>>> +			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>> +		return NULL;
>>> +	}
>>> +
>>> +	driver = eeh_pcid_get(dev);
>>> +	if (driver) {
>>> +		eeh_pcid_put(dev);
>>> +		if (driver->err_handler)
>>> +			return NULL;
>>> +	}
>>> +
>>> +	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
>>> +	return NULL;
>>> +}
>>> +
>>>   static void *eeh_rmv_device(void *data, void *userdata)
>>>   {
>>>   	struct pci_driver *driver;
>>>   	struct eeh_dev *edev = (struct eeh_dev *)data;
>>>   	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>>>   	int *removed = (int *)userdata;
>>> +	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>>>
>>>   	/*
>>>   	 * Actually, we should remove the PCI bridges as well.
>>> @@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>   	driver = eeh_pcid_get(dev);
>>>   	if (driver) {
>>>   		eeh_pcid_put(dev);
>>> -		if (driver->err_handler)
>>> +		if (removed && driver->err_handler)
>>>   			return NULL;
>>>   	}
>>>
>>> @@ -425,11 +455,23 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>   		 pci_name(dev));
>>>   	edev->bus = dev->bus;
>>>   	edev->mode |= EEH_DEV_DISCONNECTED;
>>> -	(*removed)++;
>>> +	if (removed)
>>> +		(*removed)++;
>>>
>>> -	pci_lock_rescan_remove();
>>> -	pci_stop_and_remove_bus_device(dev);
>>> -	pci_unlock_rescan_remove();
>>> +	if (edev->physfn) {
>>> +		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
>>> +		edev->pdev = NULL;
>>> +
>>> +		/*
>>> +		 * We have to set the VF PE number to invalid one, which is
>>> +		 * required to plug the VF successfully.
>>> +		 */
>>> +		pdn->pe_number = IODA_INVALID_PE;
>>> +	} else {
>>> +		pci_lock_rescan_remove();
>>> +		pci_stop_and_remove_bus_device(dev);
>>> +		pci_unlock_rescan_remove();
>>> +	}
>>>
>>>   	return NULL;
>>>   }
>>> @@ -548,6 +590,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>>   	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
>>>   	struct timeval tstamp;
>>>   	int cnt, rc, removed = 0;
>>> +	struct eeh_dev *edev;
>>>
>>>   	/* pcibios will clear the counter; save the value */
>>>   	cnt = pe->freeze_count;
>>> @@ -561,12 +604,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>>   	 */
>>>   	eeh_pe_state_mark(pe, EEH_PE_KEEP);
>>>   	if (bus) {
>>> -		pci_lock_rescan_remove();
>>> -		pcibios_remove_pci_devices(bus);
>>> -		pci_unlock_rescan_remove();
>>> -	} else if (frozen_bus) {
>>> +		if (pe->type & EEH_PE_VF)
>>> +			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>>
>>
>> I believe the rule is that if one branch of "if" uses curly braces, then the
>> other should have them too.
>>
>
> Thanks for the reminder, will fix it in the next version.


I thought checkpatch.pl checks for it but apparently it does not.



>>
>>> +		else {
>>> +			pci_lock_rescan_remove();
>>> +			pcibios_remove_pci_devices(bus);
>>> +			pci_unlock_rescan_remove();
>>> +		}
>>> +	} else if (frozen_bus)
>>>   		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
>>> -	}
>>>
>>>   	/*
>>>   	 * Reset the pci controller. (Asserts RST#; resets config space).
>>> @@ -607,14 +653,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>>   		 * PE. We should disconnect it so the binding can be
>>>   		 * rebuilt when adding PCI devices.
>>>   		 */
>>> +		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>>>   		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>>> -		pcibios_add_pci_devices(bus);
>>> +		if (pe->type & EEH_PE_VF)
>>
>> Move "edev = list_first_entry(&pe->edevs, struct eeh_dev, list)" here. You
>> could actually do:
>>
>> eeh_add_virt_device(list_first_entry(&pe->edevs, struct eeh_dev, list), NULL);
>>
>> and drop local variable @edev. Or move it to this scope. Dunno.
>>
>
> Hmm... as far as I know, eeh_pe_detach_dev() removes the edev from the PE's
> edevs list, so the edev has to be fetched before the traverse.
>
>>
>>> +			eeh_add_virt_device(edev, NULL);
>>> +		else
>>> +			pcibios_add_pci_devices(bus);
>>>   	} else if (frozen_bus && removed) {
>>>   		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>>>   		ssleep(5);
>>>
>>> +		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>>>   		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>>> -		pcibios_add_pci_devices(frozen_bus);
>>> +		if (pe->type & EEH_PE_VF)
>>
>>
>> The same comment as above.
>>
>>> +			eeh_add_virt_device(edev, NULL);
>>> +		else
>>> +			pcibios_add_pci_devices(frozen_bus);
>>>   	}
>>>   	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>>>
>>> @@ -792,11 +846,15 @@ perm_error:
>>>   	 * the their PCI config any more.
>>>   	 */
>>>   	if (frozen_bus) {
>>> -		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>> -
>>> -		pci_lock_rescan_remove();
>>> -		pcibios_remove_pci_devices(frozen_bus);
>>> -		pci_unlock_rescan_remove();
>>> +		if (pe->type & EEH_PE_VF) {
>>> +			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>>> +			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>> +		} else {
>>> +			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>> +			pci_lock_rescan_remove();
>>> +			pcibios_remove_pci_devices(frozen_bus);
>>> +			pci_unlock_rescan_remove();
>>> +		}
>>>   	}
>>>   }
>>>
>>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>>> index 260a701..5cde950 100644
>>> --- a/arch/powerpc/kernel/eeh_pe.c
>>> +++ b/arch/powerpc/kernel/eeh_pe.c
>>> @@ -914,7 +914,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
>>>   	if (pe->type & EEH_PE_PHB) {
>>>   		bus = pe->phb->bus;
>>>   	} else if (pe->type & EEH_PE_BUS ||
>>> -		   pe->type & EEH_PE_DEVICE) {
>>> +		   pe->type & EEH_PE_DEVICE ||
>>> +		   pe->type & EEH_PE_VF) {
>>>   		if (pe->bus) {
>>>   			bus = pe->bus;
>>>   			goto out;
>>>


-- 
Alexey


* Re: [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE
  2015-11-01 23:40       ` Alexey Kardashevskiy
@ 2015-11-02  9:39         ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-11-02  9:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Mon, Nov 02, 2015 at 10:40:36AM +1100, Alexey Kardashevskiy wrote:
>On 11/01/2015 12:53 PM, Wei Yang wrote:
>>On Fri, Oct 30, 2015 at 04:20:48PM +1100, Alexey Kardashevskiy wrote:
>>>On 10/26/2015 02:16 PM, Wei Yang wrote:
>>>>Different from PCI bus dependent PE, PE for VFs doesn't have the
>>>
>>>s/Different from/Unlike/
>>>
>>
>>Will change in next version.
>>
>>>
>>>>primary bus, on which the PCI hotplug is implemented. The patch
>>>>supports error recovery, especially the PCI hotplug for VF's PE.
>>>
>>>The patch adds support for error recovery of what exactly?
>>>What is "especially" about?
>>>
>>
>>PFs are enumerated on PCI bus, while VFs are created by PF's driver.
>>
>>EEH recovery has two cases.
>>1. The device and driver are EEH aware: the error handlers are called.
>>2. The device and driver are not EEH aware: unplug the device and plug it
>>    again by enumerating it.
>>
>>The special handling is in the second case. For a PF, we can use the
>>existing PCI core to enumerate the bus, while for a VF, we need to record
>>the VFs that were unplugged and then plug them again.
>
>
>Right. This should have been the actual commit log.
>
>
>>>
>>>>The hotplug on VF's PE is implemented based on VFs, instead of
>>>>PCI bus any more.
>>>
>>>Needs rephrase.
>>>
>>>Is this patch about EEH error recovery, i.e. unplug VF, re-plug VF? Why does
>>>the commit log talk about PE hotplug? I thought we do VF (i.e. PCI device)
>>>hotplug, not PE.
>>>
>>
>>Hmm... unlike the Bus PE for PFs, VF PE is dynamically created and released
>>when VFs are created and released.
>
>
>Sure. PEs are created/released, not plugged/unplugged (VFs are), that was my
>point.
>

Thanks for the suggestion, will change it in next version.

>
>>
>>>
>>>>
>>>>[gwshan: changelog and code refactoring]
>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/include/asm/eeh.h   |   1 +
>>>>  arch/powerpc/kernel/eeh.c        |   8 ++++
>>>>  arch/powerpc/kernel/eeh_driver.c | 100 +++++++++++++++++++++++++++++++--------
>>>>  arch/powerpc/kernel/eeh_pe.c     |   3 +-
>>>>  4 files changed, 90 insertions(+), 22 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>>>index 331c856..ea1f13c4 100644
>>>>--- a/arch/powerpc/include/asm/eeh.h
>>>>+++ b/arch/powerpc/include/asm/eeh.h
>>>>@@ -142,6 +142,7 @@ struct eeh_dev {
>>>>  	struct pci_controller *phb;	/* Associated PHB		*/
>>>>  	struct pci_dn *pdn;		/* Associated PCI device node	*/
>>>>  	struct pci_dev *pdev;		/* Associated PCI device	*/
>>>>+	int    in_error;		/* Error flag for eeh_dev	*/
>>>
>>>Make it "bool".
>>>
>>
>>Will change it in next version.
>>
>>>
>>>>  	struct pci_dev *physfn;		/* Associated PF PORT		*/
>>>>  	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
>>>>  };
>>>>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>>>>index af9b597..28e4d73 100644
>>>>--- a/arch/powerpc/kernel/eeh.c
>>>>+++ b/arch/powerpc/kernel/eeh.c
>>>>@@ -1227,6 +1227,14 @@ void eeh_remove_device(struct pci_dev *dev)
>>>>  	 * from the parent PE during the BAR resotre.
>>>>  	 */
>>>>  	edev->pdev = NULL;
>>>>+
>>>>+	/*
>>>>+	 * The flag "in_error" is used to trace EEH devices for VFs
>>>>+	 * in error state or not. It's set in eeh_report_error(). If
>>>>+	 * it's not set, eeh_report_{reset,resume}() won't be called
>>>>+	 * for the VF EEH device.
>>>>+	 */
>>>>+	edev->in_error = 0;
>>>>  	dev->dev.archdata.edev = NULL;
>>>>  	if (!(edev->pe->state & EEH_PE_KEEP))
>>>>  		eeh_rmv_from_parent_pe(edev);
>>>>diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>>>index 89eb4bc..99868e2 100644
>>>>--- a/arch/powerpc/kernel/eeh_driver.c
>>>>+++ b/arch/powerpc/kernel/eeh_driver.c
>>>>@@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
>>>>  	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
>>>>  	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
>>>>
>>>>+	edev->in_error = 1;
>>>>  	eeh_pcid_put(dev);
>>>>  	return NULL;
>>>>  }
>>>>@@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
>>>>
>>>>  	if (!driver->err_handler ||
>>>>  	    !driver->err_handler->slot_reset ||
>>>>-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>>>>+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>>>>+	    (!edev->in_error)) {
>>>>  		eeh_pcid_put(dev);
>>>>  		return NULL;
>>>>  	}
>>>>@@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
>>>>
>>>
>>>bool was_in_error = edev->in_error;
>>>edev->in_error = false;
>>>
>>>then use was_in_error below and there is no need to replace return with goto,
>>>etc -> slightly simpler code.
>>>
>>
>>Will change it in next version.
>>
>>>
>>>>  	if (!driver->err_handler ||
>>>>  	    !driver->err_handler->resume ||
>>>>-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>>>>+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>>>>+	    (!edev->in_error)) {
>>>>  		edev->mode &= ~EEH_DEV_NO_HANDLER;
>>>>-		eeh_pcid_put(dev);
>>>>-		return NULL;
>>>>+		goto out;
>>>>  	}
>>>>
>>>>  	driver->err_handler->resume(dev);
>>>>
>>>>+out:
>>>>+	edev->in_error = 0;
>>>>  	eeh_pcid_put(dev);
>>>>  	return NULL;
>>>>  }
>>>>@@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
>>>>  	return NULL;
>>>>  }
>>>>
>>>>+static void *eeh_add_virt_device(void *data, void *userdata)
>>>>+{
>>>>+	struct pci_driver *driver;
>>>>+	struct eeh_dev *edev = (struct eeh_dev *)data;
>>>>+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>>>>+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>>>>+
>>>>+	if (!(edev->physfn)) {
>>>>+		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
>>>>+			__func__, edev->phb->global_number, pdn->busno,
>>>>+			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>>>+		return NULL;
>>>>+	}
>>>>+
>>>>+	driver = eeh_pcid_get(dev);
>>>>+	if (driver) {
>>>>+		eeh_pcid_put(dev);
>>>>+		if (driver->err_handler)
>>>>+			return NULL;
>>>>+	}
>>>>+
>>>>+	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
>>>>+	return NULL;
>>>>+}
>>>>+
>>>>  static void *eeh_rmv_device(void *data, void *userdata)
>>>>  {
>>>>  	struct pci_driver *driver;
>>>>  	struct eeh_dev *edev = (struct eeh_dev *)data;
>>>>  	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>>>>  	int *removed = (int *)userdata;
>>>>+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>>>>
>>>>  	/*
>>>>  	 * Actually, we should remove the PCI bridges as well.
>>>>@@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>>  	driver = eeh_pcid_get(dev);
>>>>  	if (driver) {
>>>>  		eeh_pcid_put(dev);
>>>>-		if (driver->err_handler)
>>>>+		if (removed && driver->err_handler)
>>>>  			return NULL;
>>>>  	}
>>>>
>>>>@@ -425,11 +455,23 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>>  		 pci_name(dev));
>>>>  	edev->bus = dev->bus;
>>>>  	edev->mode |= EEH_DEV_DISCONNECTED;
>>>>-	(*removed)++;
>>>>+	if (removed)
>>>>+		(*removed)++;
>>>>
>>>>-	pci_lock_rescan_remove();
>>>>-	pci_stop_and_remove_bus_device(dev);
>>>>-	pci_unlock_rescan_remove();
>>>>+	if (edev->physfn) {
>>>>+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
>>>>+		edev->pdev = NULL;
>>>>+
>>>>+		/*
>>>>+		 * We have to set the VF PE number to invalid one, which is
>>>>+		 * required to plug the VF successfully.
>>>>+		 */
>>>>+		pdn->pe_number = IODA_INVALID_PE;
>>>>+	} else {
>>>>+		pci_lock_rescan_remove();
>>>>+		pci_stop_and_remove_bus_device(dev);
>>>>+		pci_unlock_rescan_remove();
>>>>+	}
>>>>
>>>>  	return NULL;
>>>>  }
>>>>@@ -548,6 +590,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>>>  	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
>>>>  	struct timeval tstamp;
>>>>  	int cnt, rc, removed = 0;
>>>>+	struct eeh_dev *edev;
>>>>
>>>>  	/* pcibios will clear the counter; save the value */
>>>>  	cnt = pe->freeze_count;
>>>>@@ -561,12 +604,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>>>  	 */
>>>>  	eeh_pe_state_mark(pe, EEH_PE_KEEP);
>>>>  	if (bus) {
>>>>-		pci_lock_rescan_remove();
>>>>-		pcibios_remove_pci_devices(bus);
>>>>-		pci_unlock_rescan_remove();
>>>>-	} else if (frozen_bus) {
>>>>+		if (pe->type & EEH_PE_VF)
>>>>+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>>>
>>>
>>>I believe the rule is that if one branch of "if" uses curly braces, then the
>>>other should have them too.
>>>
>>
>>Thanks for the reminder, will fix it in the next version.
>
>
>I thought checkpatch.pl checks for it but apparently it does not.
>
>
>
>>>
>>>>+		else {
>>>>+			pci_lock_rescan_remove();
>>>>+			pcibios_remove_pci_devices(bus);
>>>>+			pci_unlock_rescan_remove();
>>>>+		}
>>>>+	} else if (frozen_bus)
>>>>  		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
>>>>-	}
>>>>
>>>>  	/*
>>>>  	 * Reset the pci controller. (Asserts RST#; resets config space).
>>>>@@ -607,14 +653,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>>>>  		 * PE. We should disconnect it so the binding can be
>>>>  		 * rebuilt when adding PCI devices.
>>>>  		 */
>>>>+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>>>>  		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>>>>-		pcibios_add_pci_devices(bus);
>>>>+		if (pe->type & EEH_PE_VF)
>>>
>>>Move "edev = list_first_entry(&pe->edevs, struct eeh_dev, list)" here. You
>>>could actually do:
>>>
>>>eeh_add_virt_device(list_first_entry(&pe->edevs, struct eeh_dev, list), NULL);
>>>
>>>and drop local variable @edev. Or move it to this scope. Dunno.
>>>
>>
>>Hmm... as far as I know, eeh_pe_detach_dev() removes the edev from the PE's
>>edevs list, so the edev has to be fetched before the traverse.
>>
>>>
>>>>+			eeh_add_virt_device(edev, NULL);
>>>>+		else
>>>>+			pcibios_add_pci_devices(bus);
>>>>  	} else if (frozen_bus && removed) {
>>>>  		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>>>>  		ssleep(5);
>>>>
>>>>+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>>>>  		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>>>>-		pcibios_add_pci_devices(frozen_bus);
>>>>+		if (pe->type & EEH_PE_VF)
>>>
>>>
>>>The same comment as above.
>>>
>>>>+			eeh_add_virt_device(edev, NULL);
>>>>+		else
>>>>+			pcibios_add_pci_devices(frozen_bus);
>>>>  	}
>>>>  	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>>>>
>>>>@@ -792,11 +846,15 @@ perm_error:
>>>>  	 * the their PCI config any more.
>>>>  	 */
>>>>  	if (frozen_bus) {
>>>>-		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>>>-
>>>>-		pci_lock_rescan_remove();
>>>>-		pcibios_remove_pci_devices(frozen_bus);
>>>>-		pci_unlock_rescan_remove();
>>>>+		if (pe->type & EEH_PE_VF) {
>>>>+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>>>>+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>>>+		} else {
>>>>+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>>>+			pci_lock_rescan_remove();
>>>>+			pcibios_remove_pci_devices(frozen_bus);
>>>>+			pci_unlock_rescan_remove();
>>>>+		}
>>>>  	}
>>>>  }
>>>>
>>>>diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>>>>index 260a701..5cde950 100644
>>>>--- a/arch/powerpc/kernel/eeh_pe.c
>>>>+++ b/arch/powerpc/kernel/eeh_pe.c
>>>>@@ -914,7 +914,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
>>>>  	if (pe->type & EEH_PE_PHB) {
>>>>  		bus = pe->phb->bus;
>>>>  	} else if (pe->type & EEH_PE_BUS ||
>>>>-		   pe->type & EEH_PE_DEVICE) {
>>>>+		   pe->type & EEH_PE_DEVICE ||
>>>>+		   pe->type & EEH_PE_VF) {
>>>>  		if (pe->bus) {
>>>>  			bus = pe->bus;
>>>>  			goto out;
>>>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



* Re: [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE
  2015-10-30  8:05       ` Alexey Kardashevskiy
@ 2015-11-02 22:45         ` Wei Yang
  0 siblings, 0 replies; 50+ messages in thread
From: Wei Yang @ 2015-11-02 22:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, gwshan, bhelgaas, mpe, linuxppc-dev, linux-pci

On Fri, Oct 30, 2015 at 07:05:05PM +1100, Alexey Kardashevskiy wrote:
>On 10/30/2015 06:18 PM, Wei Yang wrote:
>>On Fri, Oct 30, 2015 at 03:11:20PM +1100, Alexey Kardashevskiy wrote:
>>>On 10/26/2015 02:15 PM, Wei Yang wrote:
>>>>PEs for VFs don't have primary bus. So they have to have their own reset
>>>>backend, which is used during EEH recovery. The patch implements the reset
>>>>backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained
>>>>in the PE.
>>>>
>>>>[gwshan: changelog and code refactoring]
>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/include/asm/eeh.h               |   1 +
>>>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 134 ++++++++++++++++++++++++++-
>>>>  2 files changed, 134 insertions(+), 1 deletion(-)
>>>>
>>>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>>>index ec21f8f..331c856 100644
>>>>--- a/arch/powerpc/include/asm/eeh.h
>>>>+++ b/arch/powerpc/include/asm/eeh.h
>>>>@@ -136,6 +136,7 @@ struct eeh_dev {
>>>>  	int pcix_cap;			/* Saved PCIx capability	*/
>>>>  	int pcie_cap;			/* Saved PCIe capability	*/
>>>>  	int aer_cap;			/* Saved AER capability		*/
>>>>+	int af_cap;			/* Saved AF capability		*/
>>>>  	struct eeh_pe *pe;		/* Associated PE		*/
>>>>  	struct list_head list;		/* Form link list in the PE	*/
>>>>  	struct pci_controller *phb;	/* Associated PHB		*/
>>>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>>>index cfd55dd..017cd72 100644
>>>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>>>@@ -404,6 +404,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>>>>  	edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
>>>>  	edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
>>>>  	edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
>>>>+	edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
>>>>  	if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
>>>>  		edev->mode |= EEH_DEV_BRIDGE;
>>>>  		if (edev->pcie_cap) {
>>>>@@ -893,6 +894,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>>>  	return 0;
>>>>  }
>>>>
>>>>+static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
>>>>+				     u16 mask, bool af_flr_rst)
>
>Missed this - @af_flr_rst is only used for warnings so better do:
>s/bool af_flr_rst/const char *reset_type/
>to make it explicit.
>

Looks good, will change in next version.
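
A minimal userspace sketch of what that change could look like. It is not
the next version of the patch: format_pending_warning() is a hypothetical
stand-in for the pr_warn() call, and passing "AF " or "" as reset_type
(with the trailing space in the argument) is an assumption of the sketch.
It also keeps the user-visible string on one line.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the pr_warn() in pnv_eeh_wait_for_pending().
 * The caller names the reset type directly ("" for FLR, "AF " for AF FLR)
 * instead of passing a bool that is only used to pick a string.
 */
static int format_pending_warning(char *buf, size_t len,
				  const char *reset_type,
				  unsigned int domain, unsigned int bus,
				  unsigned int slot, unsigned int func)
{
	return snprintf(buf, len,
			"Pending transaction while issuing %sFLR to %04x:%02x:%02x.%01x",
			reset_type, domain, bus, slot, func);
}
```

Call sites then read as format_pending_warning(buf, sizeof(buf), "AF ", ...)
rather than passing true or false.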

>
>>>>+{
>>>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>>>+	int status, i;
>>>>+
>>>>+	/* Wait for Transaction Pending bit to be cleared */
>>>>+	for (i = 0; i < 4; i++) {
>>>>+		eeh_ops->read_config(pdn, pos, 2, &status);
>>>
>>>
>>>gcc should have complained on using uninitialized @status here.
>>>
>>
>>I removed the object file and recompiled the file; there is no warning.
>
>Hm. Does not warn me either.
>
>>I also took a look at other places where read_config() is called. The last
>>parameter is not initialized before the call there either.
>
>So? It does not make it right.
>
>>Do you see the error during the build?
>
>Why does it matter? We have an undefined behavior here which we should not.
>You could test the return values from read_config() but you do not so at
>least initialize local variables.
>

I believe your concern is reasonable.

I suggest a separate patch that fixes read_config() by initializing the
last parameter.
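
To make the concern concrete, here is a userspace sketch of the polling
loop with the local initialized and the return value checked. It is only
an illustration: read_cfg_fn and the three read_*() helpers are
hypothetical stand-ins for eeh_ops->read_config(), and the sleep is
replaced by a counter so the back-off (100, 200, 400, 800 ms) is visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for eeh_ops->read_config(): returns non-zero on
 * failure and, in that case, leaves *val untouched. */
typedef int (*read_cfg_fn)(int pos, int size, uint32_t *val);

static int read_clear(int pos, int size, uint32_t *val)
{
	(void)pos; (void)size;
	*val = 0;		/* pending bit already clear */
	return 0;
}

static int read_stuck(int pos, int size, uint32_t *val)
{
	(void)pos; (void)size;
	*val = 0xffff;		/* pending bit never clears */
	return 0;
}

static int read_fail(int pos, int size, uint32_t *val)
{
	(void)pos; (void)size; (void)val;
	return -1;		/* read error, *val not written */
}

/*
 * Poll up to 4 times until (status & mask) is clear.
 * Returns true once the bit is clear, false if it stays set or a read fails.
 * *slept_ms accumulates the time the kernel version would msleep().
 */
static bool wait_for_pending(read_cfg_fn read_cfg, int pos, uint32_t mask,
			     unsigned int *slept_ms)
{
	int i;

	*slept_ms = 0;
	for (i = 0; i < 4; i++) {
		uint32_t status = 0;	/* defined even if the read fails */

		if (read_cfg(pos, 2, &status))
			return false;	/* do not trust status on error */
		if (!(status & mask))
			return true;
		*slept_ms += (1u << i) * 100;
	}
	return false;
}
```

With read_stuck() the loop gives up after 1500 ms of accumulated back-off;
with read_fail() it returns immediately instead of testing a stale value.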

>
>>
>>>
>>>>+		if (!(status & mask))
>>>>+			return;
>>>>+
>>>>+		msleep((1 << i) * 100);
>>>>+	}
>>>>+
>>>>+	pr_warn("%s: Pending transaction while issuing %s FLR to "
>>>>+		"%04x:%02x:%02x.%01x\n",
>>>
>>>Do not wrap user-visible strings.
>>>
>>
>>Will change this.
>>
>>>
>>>>+		__func__, af_flr_rst ? "AF" : "",
>>>>+		edev->phb->global_number, pdn->busno,
>>>>+		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>>>+}
>>>>+
>>>>+static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
>>>>+{
>>>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>>>+	u32 reg;
>>>>+
>>>>+	if (!edev->pcie_cap)
>>>>+		return -ENOTTY;
>>>
>>>
>>>Can pnv_eeh_do_flr() really be called on a non-PCIe device? Can we get that
>>>far? WARN_ON_ONCE(), maybe?
>>>
>>
>>So you suggest to add a WARN_ON_ONCE() in this condition, right?
>
>I am asking which one it is: does it make sense to add a WARN_ON_ONCE, to
>replace the "if" with WARN_ON_ONCE, or is it actually possible for pcie_cap
>to be uninitialized in this code?
>

I think the check here is reasonable. In the body of this function, pcie_cap
is used to access the config space. If we remove the check, there is a chance
of accessing the wrong area.
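
A small userspace sketch of why the guard matters. struct fake_edev and
cfg_read32() are hypothetical and only model a config space image; the two
constants carry the values from pci_regs.h. With pcie_cap == 0, the read at
pcie_cap + PCI_EXP_DEVCAP would land on offset 4, which is the PCI
command/status register rather than Device Capabilities.

```c
#include <errno.h>
#include <stdint.h>

#define PCI_EXP_DEVCAP		4		/* Device Capabilities offset */
#define PCI_EXP_DEVCAP_FLR	0x10000000	/* Function Level Reset bit */

/* Hypothetical, simplified view of the fields used by the check. */
struct fake_edev {
	int pcie_cap;		/* 0 means no PCIe capability was found */
	uint8_t cfg[256];	/* little-endian config space image */
};

static uint32_t cfg_read32(const struct fake_edev *edev, int pos)
{
	return (uint32_t)edev->cfg[pos] |
	       (uint32_t)edev->cfg[pos + 1] << 8 |
	       (uint32_t)edev->cfg[pos + 2] << 16 |
	       (uint32_t)edev->cfg[pos + 3] << 24;
}

/* Returns 0 if FLR can be issued, -ENOTTY otherwise. */
static int flr_supported(const struct fake_edev *edev)
{
	uint32_t reg;

	if (!edev->pcie_cap)
		return -ENOTTY;	/* otherwise offset 0 + 4 would be read */

	reg = cfg_read32(edev, edev->pcie_cap + PCI_EXP_DEVCAP);
	if (!(reg & PCI_EXP_DEVCAP_FLR))
		return -ENOTTY;

	return 0;
}
```

Whether a WARN_ON_ONCE() should accompany the early return is a separate
question; the sketch only shows what the "if" protects against.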

>
>>
>>>
>>>>+
>>>>+	eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);
>>>
>>>
>>>... and here about uninitialized @reg.
>>>
>>>
>>>>+	if (!(reg & PCI_EXP_DEVCAP_FLR))
>>>>+		return -ENOTTY;
>>>>+
>>>>+	switch (option) {
>>>>+	case EEH_RESET_HOT:
>>>>+	case EEH_RESET_FUNDAMENTAL:
>>>>+		pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
>>>>+					 PCI_EXP_DEVSTA_TRPND, false);
>>>>+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>>>+				     4, &reg);
>>>>+		reg |= PCI_EXP_DEVCTL_BCR_FLR;
>>>>+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>>>+				      4, reg);
>>>>+		msleep(EEH_PE_RST_HOLD_TIME);
>>>>+		break;
>>>>+	case EEH_RESET_DEACTIVATE:
>>>>+		eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>>>+				     4, &reg);
>>>>+		reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
>>>>+		eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
>>>>+				      4, reg);
>>>>+		msleep(EEH_PE_RST_SETTLE_TIME);
>>>>+		break;
>>>>+	}
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
>>>>+{
>>>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>>>+	u32 cap;
>>>>+
>>>>+	if (!edev->af_cap)
>>>>+		return -ENOTTY;
>>>>+
>>>>+	eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);
>>>
>>>
>>>... and here about @cap.
>>>
>>>>+	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
>>>>+		return -ENOTTY;
>>>>+
>>>>+	switch (option) {
>>>>+	case EEH_RESET_HOT:
>>>>+	case EEH_RESET_FUNDAMENTAL:
>>>>+		/*
>>>>+		 * Wait for Transaction Pending bit to clear. A word-aligned
>>>>+		 * test is used, so we use the control offset rather than status
>>>>+		 * and shift the test bit to match.
>>>
>>>
>>>Why word-aligned (not byte or double word)?
>>>
>>
>>I copied these words from pci_af_flr(). Actually, I did not try to understand
>>the reason.
>
>Ok. I looked at pci_af_flr().
>
>In this patch, the comment before pnv_eeh_wait_for_pending() is missing,
>something like "pnv_eeh_wait_for_pending() uses a word-size accessor so @pos
>must be word-aligned".
>
>
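
For reference, a userspace sketch of the word-sized test discussed above.
cfg_read16() is a hypothetical helper over a little-endian config space
image, and the sketch assumes the AF capability starts on an even offset.
The constants carry the values from pci_regs.h: the status byte sits right
after the control byte, so a 16-bit read at PCI_AF_CTRL returns status in
the upper byte, and the Transaction Pending bit has to be shifted by 8.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_AF_CTRL		4	/* control byte in the AF capability */
#define PCI_AF_STATUS		5	/* status byte, right after control */
#define PCI_AF_STATUS_TP	0x01	/* Transaction Pending */

/* 16-bit little-endian read from a config space image (hypothetical). */
static uint16_t cfg_read16(const uint8_t *cfg, int pos)
{
	return (uint16_t)(cfg[pos] | (cfg[pos + 1] << 8));
}

/* Test Transaction Pending through a word read at the control offset. */
static bool af_transaction_pending(const uint8_t *cfg, int af_cap)
{
	uint16_t word = cfg_read16(cfg, af_cap + PCI_AF_CTRL);

	return (word & (PCI_AF_STATUS_TP << 8)) != 0;
}
```

A bit set in the control byte does not trip the test; only bit 0 of the
status byte does.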
>>
>>>>+		 */
>>>>+		pnv_eeh_wait_for_pending(pdn, edev->af_cap + PCI_AF_CTRL,
>>>>+					 PCI_AF_STATUS_TP << 8, true);
>>>>+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL,
>>>>+				      1, PCI_AF_CTRL_FLR);
>>>>+		msleep(EEH_PE_RST_HOLD_TIME);
>>>>+		break;
>>>>+	case EEH_RESET_DEACTIVATE:
>>>>+		eeh_ops->write_config(pdn, edev->af_cap + PCI_AF_CTRL, 1, 0);
>>>>+		msleep(EEH_PE_RST_SETTLE_TIME);
>>>
>>>
>>>btw there is an unrelated issue with EEH_PE_RST_SETTLE_TIME which is defined
>>>as 1800 which is A LOT (+250ms from EEH_PE_RST_HOLD_TIME and for some reason
>>>this is actually doubled so there is another reset somewhere).
>>>
>>
>>I don't know the reason for this value. This code keeps aligned with other
>>reset functions, like pnv_eeh_bridge_reset().
>
>
>Are they all in POWERNV/EEH, or does generic PCI use the same values on, for
>example, x86?
>
>
>
>>>Booting a guest with 63 VFs takes 6 minutes or so, is there a good reason for
>>>such a huge timeout?
>>>
>>>
>>>>+		break;
>>>>+	}
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static int pnv_eeh_reset_vf(struct pci_dn *pdn, int option)
>>>>+{
>>>>+	int ret;
>>>>+
>>>>+	ret = pnv_eeh_do_flr(pdn, option);
>>>>+	if (ret != -ENOTTY)
>>>>+		return ret;
>>>>+
>>>>+	return pnv_eeh_do_af_flr(pdn, option);
>>>>+}
>>>>+
>>>>+static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
>>>>+{
>>>>+	struct eeh_dev *edev, *tmp;
>>>>+	struct pci_dn *pdn;
>>>>+	int ret;
>>>>+
>>>>+	eeh_pe_for_each_dev(pe, edev, tmp) {
>>>>+		pdn = eeh_dev_to_pdn(edev);
>>>>+		ret = pnv_eeh_reset_vf(pdn, option);
>>>>+		if (ret)
>>>>+			return ret;
>>>>+	}
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>>>  {
>>>>  	struct pci_controller *hose;
>>>>@@ -968,7 +1090,9 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
>>>>  		}
>>>>
>>>>  		bus = eeh_pe_bus_get(pe);
>>>>-		if (pci_is_root_bus(bus) ||
>>>>+		if (pe->type & EEH_PE_VF)
>>>>+			ret = pnv_eeh_vf_pe_reset(pe, option);
>>>>+		else if (pci_is_root_bus(bus) ||
>>>>  			pci_is_root_bus(bus->parent))
>>>>  			ret = pnv_eeh_root_reset(hose, option);
>>>>  		else
>>>>@@ -1108,6 +1232,14 @@ static inline bool pnv_eeh_cfg_blocked(struct pci_dn *pdn)
>>>>  	if (!edev || !edev->pe)
>>>>  		return false;
>>>>
>>>>+	/*
>>>>+	 * We will issue FLR or AF FLR to all VFs, which are contained
>>>>+	 * in VF PE. It relies on the EEH PCI config accessors. So we
>>>>+	 * can't block them during the window.
>>>>+	 */
>>>>+	if ((edev->physfn) && (edev->pe->state & EEH_PE_RESET))
>>>
>>>
>>>Extra parentheses around edev->physfn.
>>>
>>
>>Will remove them.
>>
>>>
>>>
>>>>+		return false;
>>>>+
>>>>  	if (edev->pe->state & EEH_PE_CFG_BLOCKED)
>>>>  		return true;
>>>>
>>>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



end of thread, other threads:[~2015-11-02 22:46 UTC | newest]

Thread overview: 50+ messages
2015-10-26  3:15 [PATCH V10 00/12] VF EEH on Power8 Wei Yang
2015-10-26  3:15 ` [PATCH V10 01/12] PCI/IOV: Rename and export virtfn_add/virtfn_remove Wei Yang
2015-10-27  1:31   ` Andrew Donnellan
2015-10-27 23:06   ` Bjorn Helgaas
2015-10-28  1:21     ` Wei Yang
2015-10-26  3:15 ` [PATCH V10 02/12] PCI: Add pcibios_bus_add_device() weak function Wei Yang
2015-10-27  5:07   ` Andrew Donnellan
2015-10-26  3:15 ` [PATCH V10 03/12] powerpc/pci: Cache VF index in pci_dn Wei Yang
2015-10-27  5:01   ` Andrew Donnellan
2015-10-27 22:04   ` Daniel Axtens
2015-10-28  1:45     ` Wei Yang
2015-10-30  2:05   ` Alexey Kardashevskiy
2015-10-30  2:48     ` Wei Yang
2015-10-26  3:15 ` [PATCH V10 04/12] powerpc/pci: Remove VFs prior to PF Wei Yang
2015-10-30  3:04   ` Alexey Kardashevskiy
2015-10-30  6:31     ` Wei Yang
2015-10-26  3:15 ` [PATCH V10 05/12] powerpc/eeh: Cache only BARs, not windows or IOV BARs Wei Yang
2015-10-29  3:29   ` Daniel Axtens
2015-10-29  8:57     ` Wei Yang
2015-10-30  3:22   ` Alexey Kardashevskiy
2015-10-30  6:37     ` Wei Yang
2015-10-26  3:15 ` [PATCH V10 06/12] powerpc/powernv: EEH device for VF Wei Yang
2015-10-30  3:33   ` Alexey Kardashevskiy
2015-10-30  6:52     ` Wei Yang
2015-10-30  7:36       ` Alexey Kardashevskiy
2015-10-30  7:58         ` Wei Yang
2015-10-26  3:15 ` [PATCH V10 07/12] powerpc/eeh: Create PE for VFs Wei Yang
2015-10-30  3:46   ` Alexey Kardashevskiy
2015-10-30  6:59     ` Wei Yang
2015-10-26  3:15 ` [PATCH V10 08/12] powerpc/powernv: Support EEH reset for VF PE Wei Yang
2015-10-30  4:11   ` Alexey Kardashevskiy
2015-10-30  7:18     ` Wei Yang
2015-10-30  8:05       ` Alexey Kardashevskiy
2015-11-02 22:45         ` Wei Yang
2015-10-26  3:15 ` [PATCH V10 09/12] powerpc/powernv: Support PCI config restore for VFs Wei Yang
2015-10-30  4:56   ` Alexey Kardashevskiy
2015-10-30  8:17     ` Wei Yang
2015-10-26  3:16 ` [PATCH V10 10/12] powerpc/eeh: Support error recovery for VF PE Wei Yang
2015-10-30  5:20   ` Alexey Kardashevskiy
2015-11-01  1:53     ` Wei Yang
2015-11-01 23:40       ` Alexey Kardashevskiy
2015-11-02  9:39         ` Wei Yang
2015-10-26  3:16 ` [PATCH V10 11/12] powerpc/eeh: Don't block PCI config on resetting " Wei Yang
2015-10-30  5:42   ` Alexey Kardashevskiy
2015-10-30  7:19     ` Wei Yang
2015-10-26  3:16 ` [PATCH V10 12/12] powerpc/eeh: Handle hot removed VF when PF is EEH aware Wei Yang
2015-10-30  5:35   ` Alexey Kardashevskiy
2015-10-30  7:29     ` Wei Yang
2015-10-27 23:11 ` [PATCH V10 00/12] VF EEH on Power8 Bjorn Helgaas
2015-10-28  1:50   ` Wei Yang
