All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/21] PowerPC/PowerNV: PCI Slot Management
@ 2015-04-27  6:37 ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The series of patches intend to support PCI slot for PowerPC PowerNV platform,
which is running on top of skiboot firmware. The patchset requires corresponding
changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org for
review. The PCI slots are exposed by skiboot with device node properties, and
kernel utilizes those properties to populated PCI slots accordingly.

The original PCI infrastructure on PowerNV platform can't support hotplug because
the PE is assigned during PHB fixup time, which is called for once during system
boot time. For this, the PCI infrastructure on PowerNV platform has been reworked
for a lot. After that, the PE and its corresponding resources (IODT, M32DT, M64
segments, DMA32 and bypass window) are assigned upon updating PCI bridge's
resources, which might decide PE# assigned to the PE (e.g. M64 resources, on P8
strictly speaking). Each PE will maintain a reference count, which is (number of
child PCI devices + 1). That indicates when last child PCI device leaves the PE,
the PE and its included resources will be relased and put back into free pool
again. With this design, the PE will be released when EEH PE is released.
PATCH[1 - 8] are related to this part.

>From skiboot perspective, PCI slot is providing (hot/fundamental/complete) resets
to EEH. The kernel gets to know if skiboot supports various reset on one particular
PCI slot through device-tree node. If it does, EEH will utilize the functionality
provided by skiboot. Besides, the device-tree nodes have to change in order to
support PCI hotplug. For example, when one PCI adapter inserted to one slot,
its device-tree node should be added to the system dynamically. Conversely, the
device-tree node should be removed from the system when the PCI adapter is going
to be offline. Since pci_dn and eeh_dev have same life cyle as PCI device nodes,
they should be added/removed accordingly during PCI hotplug. Patch[9 - 20] are
doing the related work.

The last patch is the standalone PCI hotplug driver for PowerNV platform. When
removing PCI adapter from one PCI slot, which is invoked by command in userland,
the skiboot will power off the slot to save power and remove all device-tree
nodes for all PCI devices behind the slot. Conversely, the Power to the slot
is turned on, the PCI devices behind the slot is rescanned, and the device-tree
nodes for those newly detected PCI devices will be built in skiboot. For both
of cases, one message will be sent to kernel by skiboot so that the kernel
can adjust the device-tree accordingly. At the same time, the kernel also have
to deallocate or allocate PE# and its related resources (PE# and so on) for the
removed/added PCI devices.

Changelog
=========
v3:
   * Rebased to 4.1.RC0
   * PowerNV PCI infrasturcture is total refactored in order to support PCI
     hotplug. The PowerNV hotplug driver is also reworked a lot because of
     the changes in skiboot in order to support PCI hotplug.


Gavin Shan (21):
  pci: Add pcibios_setup_bridge()
  powerpc/powernv: Enable M64 on P7IOC
  powerpc/powernv: M64 support improvement
  powerpc/powernv: Improve IO and M32 mapping
  powerpc/powernv: Improve DMA32 segment assignment
  powerpc/powernv: Create PEs dynamically
  powerpc/powernv: Release PEs dynamically
  powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
  powerpc/eeh: Delay probing EEH device during hotplug
  powerpc/powernv: Use PCI slot reset infrastructure
  powerpc/powernv: Fundamental reset for PCI bus reset
  powerpc/pci: Don't scan empty slot
  powerpc/pci: Move pcibios_find_pci_bus() around
  powerpc/powernv: Introduce pnv_pci_poll()
  powerpc/powernv: Functions to get/reset PCI slot status
  powerpc/pci: Delay creating pci_dn
  powerpc/pci: Create eeh_dev while creating pci_dn
  powerpc/pci: Export traverse_pci_device_nodes()
  powerpc/pci: Update bridge windows on PCI plugging
  powerpc/powernv: Select OF_DYNAMIC
  pci/hotplug: PowerPC PowerNV PCI hotplug driver

 arch/powerpc/include/asm/eeh.h                 |    7 +-
 arch/powerpc/include/asm/opal-api.h            |    7 +-
 arch/powerpc/include/asm/opal.h                |    7 +-
 arch/powerpc/include/asm/pci-bridge.h          |    7 +-
 arch/powerpc/include/asm/pnv-pci.h             |    5 +
 arch/powerpc/include/asm/ppc-pci.h             |    7 +-
 arch/powerpc/kernel/eeh.c                      |    7 +
 arch/powerpc/kernel/eeh_dev.c                  |   20 +-
 arch/powerpc/kernel/pci-common.c               |   18 +-
 arch/powerpc/kernel/pci-hotplug.c              |   44 +-
 arch/powerpc/kernel/pci_dn.c                   |  119 +-
 arch/powerpc/platforms/maple/pci.c             |   35 +-
 arch/powerpc/platforms/pasemi/pci.c            |    3 -
 arch/powerpc/platforms/powermac/pci.c          |   39 +-
 arch/powerpc/platforms/powernv/Kconfig         |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c   |  245 ++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
 arch/powerpc/platforms/powernv/pci-ioda.c      | 1657 +++++++++++++++---------
 arch/powerpc/platforms/powernv/pci.c           |   64 +-
 arch/powerpc/platforms/powernv/pci.h           |   52 +-
 arch/powerpc/platforms/pseries/msi.c           |    4 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
 arch/powerpc/platforms/pseries/setup.c         |    9 +-
 drivers/pci/hotplug/Kconfig                    |   12 +
 drivers/pci/hotplug/Makefile                   |    4 +
 drivers/pci/hotplug/powernv_php.c              |  146 +++
 drivers/pci/hotplug/powernv_php.h              |   78 ++
 drivers/pci/hotplug/powernv_php_slot.c         |  913 +++++++++++++
 drivers/pci/setup-bus.c                        |   12 +-
 include/linux/pci.h                            |    1 +
 30 files changed, 2623 insertions(+), 935 deletions(-)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c

-- 
2.1.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 00/21] PowerPC/PowerNV: PCI Slot Management
@ 2015-04-27  6:37 ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The series of patches intend to support PCI slot for PowerPC PowerNV platform,
which is running on top of skiboot firmware. The patchset requires corresponding
changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org for
review. The PCI slots are exposed by skiboot with device node properties, and
kernel utilizes those properties to populated PCI slots accordingly.

The original PCI infrastructure on PowerNV platform can't support hotplug because
the PE is assigned during PHB fixup time, which is called for once during system
boot time. For this, the PCI infrastructure on PowerNV platform has been reworked
for a lot. After that, the PE and its corresponding resources (IODT, M32DT, M64
segments, DMA32 and bypass window) are assigned upon updating PCI bridge's
resources, which might decide PE# assigned to the PE (e.g. M64 resources, on P8
strictly speaking). Each PE will maintain a reference count, which is (number of
child PCI devices + 1). That indicates when last child PCI device leaves the PE,
the PE and its included resources will be relased and put back into free pool
again. With this design, the PE will be released when EEH PE is released.
PATCH[1 - 8] are related to this part.

>From skiboot perspective, PCI slot is providing (hot/fundamental/complete) resets
to EEH. The kernel gets to know if skiboot supports various reset on one particular
PCI slot through device-tree node. If it does, EEH will utilize the functionality
provided by skiboot. Besides, the device-tree nodes have to change in order to
support PCI hotplug. For example, when one PCI adapter inserted to one slot,
its device-tree node should be added to the system dynamically. Conversely, the
device-tree node should be removed from the system when the PCI adapter is going
to be offline. Since pci_dn and eeh_dev have same life cyle as PCI device nodes,
they should be added/removed accordingly during PCI hotplug. Patch[9 - 20] are
doing the related work.

The last patch is the standalone PCI hotplug driver for PowerNV platform. When
removing PCI adapter from one PCI slot, which is invoked by command in userland,
the skiboot will power off the slot to save power and remove all device-tree
nodes for all PCI devices behind the slot. Conversely, the Power to the slot
is turned on, the PCI devices behind the slot is rescanned, and the device-tree
nodes for those newly detected PCI devices will be built in skiboot. For both
of cases, one message will be sent to kernel by skiboot so that the kernel
can adjust the device-tree accordingly. At the same time, the kernel also have
to deallocate or allocate PE# and its related resources (PE# and so on) for the
removed/added PCI devices.

Changelog
=========
v3:
   * Rebased to 4.1.RC0
   * PowerNV PCI infrasturcture is total refactored in order to support PCI
     hotplug. The PowerNV hotplug driver is also reworked a lot because of
     the changes in skiboot in order to support PCI hotplug.


Gavin Shan (21):
  pci: Add pcibios_setup_bridge()
  powerpc/powernv: Enable M64 on P7IOC
  powerpc/powernv: M64 support improvement
  powerpc/powernv: Improve IO and M32 mapping
  powerpc/powernv: Improve DMA32 segment assignment
  powerpc/powernv: Create PEs dynamically
  powerpc/powernv: Release PEs dynamically
  powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
  powerpc/eeh: Delay probing EEH device during hotplug
  powerpc/powernv: Use PCI slot reset infrastructure
  powerpc/powernv: Fundamental reset for PCI bus reset
  powerpc/pci: Don't scan empty slot
  powerpc/pci: Move pcibios_find_pci_bus() around
  powerpc/powernv: Introduce pnv_pci_poll()
  powerpc/powernv: Functions to get/reset PCI slot status
  powerpc/pci: Delay creating pci_dn
  powerpc/pci: Create eeh_dev while creating pci_dn
  powerpc/pci: Export traverse_pci_device_nodes()
  powerpc/pci: Update bridge windows on PCI plugging
  powerpc/powernv: Select OF_DYNAMIC
  pci/hotplug: PowerPC PowerNV PCI hotplug driver

 arch/powerpc/include/asm/eeh.h                 |    7 +-
 arch/powerpc/include/asm/opal-api.h            |    7 +-
 arch/powerpc/include/asm/opal.h                |    7 +-
 arch/powerpc/include/asm/pci-bridge.h          |    7 +-
 arch/powerpc/include/asm/pnv-pci.h             |    5 +
 arch/powerpc/include/asm/ppc-pci.h             |    7 +-
 arch/powerpc/kernel/eeh.c                      |    7 +
 arch/powerpc/kernel/eeh_dev.c                  |   20 +-
 arch/powerpc/kernel/pci-common.c               |   18 +-
 arch/powerpc/kernel/pci-hotplug.c              |   44 +-
 arch/powerpc/kernel/pci_dn.c                   |  119 +-
 arch/powerpc/platforms/maple/pci.c             |   35 +-
 arch/powerpc/platforms/pasemi/pci.c            |    3 -
 arch/powerpc/platforms/powermac/pci.c          |   39 +-
 arch/powerpc/platforms/powernv/Kconfig         |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c   |  245 ++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
 arch/powerpc/platforms/powernv/pci-ioda.c      | 1657 +++++++++++++++---------
 arch/powerpc/platforms/powernv/pci.c           |   64 +-
 arch/powerpc/platforms/powernv/pci.h           |   52 +-
 arch/powerpc/platforms/pseries/msi.c           |    4 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
 arch/powerpc/platforms/pseries/setup.c         |    9 +-
 drivers/pci/hotplug/Kconfig                    |   12 +
 drivers/pci/hotplug/Makefile                   |    4 +
 drivers/pci/hotplug/powernv_php.c              |  146 +++
 drivers/pci/hotplug/powernv_php.h              |   78 ++
 drivers/pci/hotplug/powernv_php_slot.c         |  913 +++++++++++++
 drivers/pci/setup-bus.c                        |   12 +-
 include/linux/pci.h                            |    1 +
 30 files changed, 2623 insertions(+), 935 deletions(-)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c

-- 
2.1.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 01/21] pci: Add pcibios_setup_bridge()
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

Currently, PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(),
which is called for once after PCI probing and resource assignment
are completed, to allocate platform required resources for PCI devices:
PE#, IO and MMIO mapping, DMA address translation (TCE) table etc.
Obviously, it's not hotplug friendly.

The patch adds weak function pcibios_setup_bridge(), which is called
by pci_setup_bridge(). PowerPC PowerNV platform will reuse the function
to assign above platform required resources to newly added PCI devices,
in order to support PCI hotplug on PowerPC PowerNV platform.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/setup-bus.c | 12 +++++++++---
 include/linux/pci.h     |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4fd0cac..a7d0c3c 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -674,7 +674,8 @@ static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
 	pci_write_config_dword(bridge, PCI_PREF_LIMIT_UPPER32, lu);
 }
 
-static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
+
+void pci_setup_bridge_resources(struct pci_bus *bus, unsigned long type)
 {
 	struct pci_dev *bridge = bus->self;
 
@@ -693,12 +694,17 @@ static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
 
+void __weak pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+	pci_setup_bridge_resources(bus, type);
+}
+
 void pci_setup_bridge(struct pci_bus *bus)
 {
 	unsigned long type = IORESOURCE_IO | IORESOURCE_MEM |
 				  IORESOURCE_PREFETCH;
 
-	__pci_setup_bridge(bus, type);
+	pcibios_setup_bridge(bus, type);
 }
 
 
@@ -1467,7 +1473,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		/* avoiding touch the one without PREF */
 		if (type & IORESOURCE_PREFETCH)
 			type = IORESOURCE_PREFETCH;
-		__pci_setup_bridge(bus, type);
+		pci_setup_bridge_resources(bus, type);
 		/* for next child res under same bridge */
 		r->flags = old_flags;
 	}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..68c5ef9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1175,6 +1175,7 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 		  void *userdata);
 int pci_cfg_space_size(struct pci_dev *dev);
 unsigned char pci_bus_max_busnr(struct pci_bus *bus);
+void pci_setup_bridge_resources(struct pci_bus *bus, unsigned long type);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 01/21] pci: Add pcibios_setup_bridge()
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

Currently, PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(),
which is called for once after PCI probing and resource assignment
are completed, to allocate platform required resources for PCI devices:
PE#, IO and MMIO mapping, DMA address translation (TCE) table etc.
Obviously, it's not hotplug friendly.

The patch adds weak function pcibios_setup_bridge(), which is called
by pci_setup_bridge(). PowerPC PowerNV platform will reuse the function
to assign above platform required resources to newly added PCI devices,
in order to support PCI hotplug on PowerPC PowerNV platform.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/setup-bus.c | 12 +++++++++---
 include/linux/pci.h     |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4fd0cac..a7d0c3c 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -674,7 +674,8 @@ static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
 	pci_write_config_dword(bridge, PCI_PREF_LIMIT_UPPER32, lu);
 }
 
-static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
+
+void pci_setup_bridge_resources(struct pci_bus *bus, unsigned long type)
 {
 	struct pci_dev *bridge = bus->self;
 
@@ -693,12 +694,17 @@ static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
 
+void __weak pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+	pci_setup_bridge_resources(bus, type);
+}
+
 void pci_setup_bridge(struct pci_bus *bus)
 {
 	unsigned long type = IORESOURCE_IO | IORESOURCE_MEM |
 				  IORESOURCE_PREFETCH;
 
-	__pci_setup_bridge(bus, type);
+	pcibios_setup_bridge(bus, type);
 }
 
 
@@ -1467,7 +1473,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		/* avoiding touch the one without PREF */
 		if (type & IORESOURCE_PREFETCH)
 			type = IORESOURCE_PREFETCH;
-		__pci_setup_bridge(bus, type);
+		pci_setup_bridge_resources(bus, type);
 		/* for next child res under same bridge */
 		r->flags = old_flags;
 	}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..68c5ef9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1175,6 +1175,7 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 		  void *userdata);
 int pci_cfg_space_size(struct pci_dev *dev);
 unsigned char pci_bus_max_busnr(struct pci_bus *bus);
+void pci_setup_bridge_resources(struct pci_bus *bus, unsigned long type);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 02/21] powerpc/powernv: Enable M64 on P7IOC
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The patch enables M64 window on P7IOC, which has been enabled on
PHB3. Comparing to PHB3, there are 16 M64 BARs and each of them
are divided to 8 segments. So each PHB can support 128 M64 segments.
Also, P7IOC has M64DT, which helps mapping one particular M64
segment# to arbitrary PE#. However, we just provide 128 M64 (16 BARs)
segments and fixed mapping between PE# and M64 segment# in order
to keep same logic to support M64 for PHB3 and P7IOC. In turn, we
just need different phb->init_m64() hooks for P7IOC and PHB3.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 115 ++++++++++++++++++++++++++----
 1 file changed, 103 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f8bc950..646962f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -165,6 +165,67 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
 	clear_bit(pe, phb->ioda.pe_alloc);
 }
 
+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+	struct resource *r;
+	int seg;
+	s64 rc;
+
+	/* Each PHB supports 16 separate M64 BARs, each of which are
+	 * divided into 8 segments. So there are number of M64 segments
+	 * as total PE#, which is 128.
+	 */
+	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {
+		unsigned long base;
+
+		base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize;
+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
+						 OPAL_M64_WINDOW_TYPE,
+						 seg / 8,
+						 base,
+						 0, /* unused */
+						 8 * phb->ioda.m64_segsize);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Failure %lld configuring M64 BAR#%d on PHB#%d\n",
+				rc, seg / 8, phb->hose->global_number);
+			goto fail;
+		}
+
+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
+					      OPAL_M64_WINDOW_TYPE,
+					      seg / 8,
+					      OPAL_ENABLE_M64_SPLIT);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Failure %lld enabling M64 BAR#%d on PHB#%d\n",
+				rc, seg / 8, phb->hose->global_number);
+			goto fail;
+		}
+	}
+
+	/* Strip of the segment used by the reserved PE, which
+	 * is expected to be 0 or last supported PE#
+	 */
+	r = &phb->hose->mem_resources[1];
+	if (phb->ioda.reserved_pe == 0)
+		r->start += phb->ioda.m64_segsize;
+	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+		r->end -= phb->ioda.m64_segsize;
+	else
+		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
+			phb->ioda.reserved_pe);
+
+	return 0;
+
+fail:
+	for ( ; seg >= 0; seg -= 8)
+		opal_pci_phb_mmio_enable(phb->opal_id,
+					 OPAL_M64_WINDOW_TYPE,
+					 seg / 8,
+					 OPAL_DISABLE_M64);
+
+	return -EIO;
+}
+
 /* The default M64 BAR is shared by all PEs */
 static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 {
@@ -222,7 +283,7 @@ fail:
 	return -EIO;
 }
 
-static void pnv_ioda2_reserve_m64_pe(struct pnv_phb *phb)
+static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb)
 {
 	resource_size_t sgsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -248,8 +309,8 @@ static void pnv_ioda2_reserve_m64_pe(struct pnv_phb *phb)
 	}
 }
 
-static int pnv_ioda2_pick_m64_pe(struct pnv_phb *phb,
-				 struct pci_bus *bus, int all)
+static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
+				struct pci_bus *bus, int all)
 {
 	resource_size_t segsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -346,6 +407,28 @@ done:
 			pe->master = master_pe;
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
+
+		/* P7IOC supports M64DT, which helps mapping M64 segment
+		 * to one particular PE#. Unfortunately, PHB3 has fixed
+		 * mapping between M64 segment and PE#. In order for same
+		 * logic for P7IOC and PHB3, we enforce fixed mapping
+		 * between M64 segment and PE# on P7IOC.
+		 */
+		if (phb->type == PNV_PHB_IODA1) {
+			int64_t rc;
+
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+							 pe->pe_number,
+							 OPAL_M64_WINDOW_TYPE,
+							 pe->pe_number / 8,
+							 pe->pe_number % 8);
+			if (rc != OPAL_SUCCESS)
+				pr_warn("%s: Failure %lld mapping "
+					"M64 for PHB#%d-PE#%d\n",
+					__func__, rc,
+					phb->hose->global_number,
+					pe->pe_number);
+		}
 	}
 
 	kfree(pe_alloc);
@@ -360,12 +443,6 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	const u32 *r;
 	u64 pci_addr;
 
-	/* FIXME: Support M64 for P7IOC */
-	if (phb->type != PNV_PHB_IODA2) {
-		pr_info("  Not support M64 window\n");
-		return;
-	}
-
 	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
 		pr_info("  Firmware too old to support M64 window\n");
 		return;
@@ -394,9 +471,23 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 
 	/* Use last M64 BAR to cover M64 window */
 	phb->ioda.m64_bar_idx = 15;
-	phb->init_m64 = pnv_ioda2_init_m64;
-	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
-	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
+	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
+	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		phb->init_m64 = pnv_ioda1_init_m64;
+		break;
+	case PNV_PHB_IODA2:
+		phb->init_m64 = pnv_ioda2_init_m64;
+		break;
+	default:
+		phb->init_m64 = NULL;
+		phb->reserve_m64_pe = NULL;
+		phb->pick_m64_pe = NULL;
+		phb->ioda.m64_size = 0;
+		phb->ioda.m64_segsize = 0;
+		phb->ioda.m64_base = 0;
+	}
 }
 
 static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 02/21] powerpc/powernv: Enable M64 on P7IOC
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The patch enables M64 window on P7IOC, which has been enabled on
PHB3. Comparing to PHB3, there are 16 M64 BARs and each of them
are divided to 8 segments. So each PHB can support 128 M64 segments.
Also, P7IOC has M64DT, which helps mapping one particular M64
segment# to arbitrary PE#. However, we just provide 128 M64 (16 BARs)
segments and fixed mapping between PE# and M64 segment# in order
to keep same logic to support M64 for PHB3 and P7IOC. In turn, we
just need different phb->init_m64() hooks for P7IOC and PHB3.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 115 ++++++++++++++++++++++++++----
 1 file changed, 103 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f8bc950..646962f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -165,6 +165,67 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
 	clear_bit(pe, phb->ioda.pe_alloc);
 }
 
+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+	struct resource *r;
+	int seg;
+	s64 rc;
+
+	/* Each PHB supports 16 separate M64 BARs, each of which are
+	 * divided into 8 segments. So there are number of M64 segments
+	 * as total PE#, which is 128.
+	 */
+	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {
+		unsigned long base;
+
+		base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize;
+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
+						 OPAL_M64_WINDOW_TYPE,
+						 seg / 8,
+						 base,
+						 0, /* unused */
+						 8 * phb->ioda.m64_segsize);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Failure %lld configuring M64 BAR#%d on PHB#%d\n",
+				rc, seg / 8, phb->hose->global_number);
+			goto fail;
+		}
+
+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
+					      OPAL_M64_WINDOW_TYPE,
+					      seg / 8,
+					      OPAL_ENABLE_M64_SPLIT);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Failure %lld enabling M64 BAR#%d on PHB#%d\n",
+				rc, seg / 8, phb->hose->global_number);
+			goto fail;
+		}
+	}
+
+	/* Strip of the segment used by the reserved PE, which
+	 * is expected to be 0 or last supported PE#
+	 */
+	r = &phb->hose->mem_resources[1];
+	if (phb->ioda.reserved_pe == 0)
+		r->start += phb->ioda.m64_segsize;
+	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+		r->end -= phb->ioda.m64_segsize;
+	else
+		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
+			phb->ioda.reserved_pe);
+
+	return 0;
+
+fail:
+	for ( ; seg >= 0; seg -= 8)
+		opal_pci_phb_mmio_enable(phb->opal_id,
+					 OPAL_M64_WINDOW_TYPE,
+					 seg / 8,
+					 OPAL_DISABLE_M64);
+
+	return -EIO;
+}
+
 /* The default M64 BAR is shared by all PEs */
 static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 {
@@ -222,7 +283,7 @@ fail:
 	return -EIO;
 }
 
-static void pnv_ioda2_reserve_m64_pe(struct pnv_phb *phb)
+static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb)
 {
 	resource_size_t sgsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -248,8 +309,8 @@ static void pnv_ioda2_reserve_m64_pe(struct pnv_phb *phb)
 	}
 }
 
-static int pnv_ioda2_pick_m64_pe(struct pnv_phb *phb,
-				 struct pci_bus *bus, int all)
+static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
+				struct pci_bus *bus, int all)
 {
 	resource_size_t segsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -346,6 +407,28 @@ done:
 			pe->master = master_pe;
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
+
+		/* P7IOC supports M64DT, which helps mapping M64 segment
+		 * to one particular PE#. Unfortunately, PHB3 has fixed
+		 * mapping between M64 segment and PE#. In order for same
+		 * logic for P7IOC and PHB3, we enforce fixed mapping
+		 * between M64 segment and PE# on P7IOC.
+		 */
+		if (phb->type == PNV_PHB_IODA1) {
+			int64_t rc;
+
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+							 pe->pe_number,
+							 OPAL_M64_WINDOW_TYPE,
+							 pe->pe_number / 8,
+							 pe->pe_number % 8);
+			if (rc != OPAL_SUCCESS)
+				pr_warn("%s: Failure %lld mapping "
+					"M64 for PHB#%d-PE#%d\n",
+					__func__, rc,
+					phb->hose->global_number,
+					pe->pe_number);
+		}
 	}
 
 	kfree(pe_alloc);
@@ -360,12 +443,6 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	const u32 *r;
 	u64 pci_addr;
 
-	/* FIXME: Support M64 for P7IOC */
-	if (phb->type != PNV_PHB_IODA2) {
-		pr_info("  Not support M64 window\n");
-		return;
-	}
-
 	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
 		pr_info("  Firmware too old to support M64 window\n");
 		return;
@@ -394,9 +471,23 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 
 	/* Use last M64 BAR to cover M64 window */
 	phb->ioda.m64_bar_idx = 15;
-	phb->init_m64 = pnv_ioda2_init_m64;
-	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
-	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
+	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
+	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		phb->init_m64 = pnv_ioda1_init_m64;
+		break;
+	case PNV_PHB_IODA2:
+		phb->init_m64 = pnv_ioda2_init_m64;
+		break;
+	default:
+		phb->init_m64 = NULL;
+		phb->reserve_m64_pe = NULL;
+		phb->pick_m64_pe = NULL;
+		phb->ioda.m64_size = 0;
+		phb->ioda.m64_segsize = 0;
+		phb->ioda.m64_base = 0;
+	}
 }
 
 static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 03/21] powerpc/powernv: M64 support improvement
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

We're having the hardware or enforced (on P7IOC) limitation: M64
segment#x can only be assigned to PE#x. IO and M32 segment can be
mapped to arbitrary PE# via IODT and M32DT. It means the PE number
should be x if M64 segment#x has been assigned to the PE. Also, each
PE own one M64 segment at most. Currently, we are reserving PE#
according to root port's M64 window. It won't be reliable once we
extend M64 windows of root port, or the upstream port of the PCIE
switch behind root port to PHB's M64 window, in order to support
PCI hotplug in future.

The patch reserves PE# for M64 segments according to the M64 resources
of the PCI devices (not bridges) contained in the PE. Besides, it's
always worthy to trace the M64 segments consumed by the PE, which can
be released at PCI unplugging time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 190 ++++++++++++++++++------------
 arch/powerpc/platforms/powernv/pci.h      |  10 +-
 2 files changed, 122 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 646962f..a994882 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -283,28 +283,78 @@ fail:
 	return -EIO;
 }
 
-static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb)
+/* We extend the M64 window of root port, or the upstream bridge port
+ * of the PCIE switch behind root port. So we shouldn't reserve PEs
+ * for M64 resources because there are no (normal) PCI devices consuming
+ * M64 resources on the PCI buses leading from root port, or the upstream
+ * bridge port. The function returns true if the indicated PCI bus needs
+ * reserved PEs because of M64 resources in advance. Otherwise, the
+ * function returns false.
+ */
+static bool pnv_ioda_need_m64_pe(struct pnv_phb *phb,
+				 struct pci_bus *bus)
 {
-	resource_size_t sgsz = phb->ioda.m64_segsize;
+	/* Root bus */
+	if (!bus || pci_is_root_bus(bus))
+		return false;
+
+	/* Bus leading from root port. We need check what types of PCI
+	 * devices on the bus. If it's connecting PCI bridge, we don't
+	 * need reserve M64 PEs for it. Otherwise, we still need to do
+	 * that.
+	 */
+	if (pci_is_root_bus(bus->self->bus)) {
+		struct pci_dev *pdev;
+
+		list_for_each_entry(pdev, &bus->devices, bus_list) {
+			if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL)
+				return true;
+		}
+
+		return false;
+	}
+
+	/* Bus leading from the upstream bridge port on top level */
+	if (pci_is_root_bus(bus->self->bus->self->bus))
+		return false;
+
+	return true;
+}
+
+static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb,
+				    struct pci_bus *bus)
+{
+	resource_size_t segsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
 	struct resource *r;
-	int base, step, i;
+	unsigned long pe_no, limit;
+	int i;
 
-	/*
-	 * Root bus always has full M64 range and root port has
-	 * M64 range used in reality. So we're checking root port
-	 * instead of root bus.
+	if (!pnv_ioda_need_m64_pe(phb, bus))
+		return;
+
+	/* The bridge's M64 window might have been extended to the
+	 * PHB's M64 window in order to support PCI hotplug. So the
+	 * bridge's M64 window isn't reliable to be used for picking
+	 * PE# for its leading PCI bus. We have to check the M64
+	 * resources consumed by the PCI devices, which seat on the
+	 * PCI bus.
 	 */
-	list_for_each_entry(pdev, &phb->hose->bus->devices, bus_list) {
-		for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
-			r = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
-			if (!r->parent ||
-			    !pnv_pci_is_mem_pref_64(r->flags))
+	list_for_each_entry(pdev, &bus->devices, bus_list) {
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+#ifdef CONFIG_PCI_IOV
+			if (i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END)
+				continue;
+#endif
+			r = &pdev->resource[i];
+			if (!r->flags || r->start >= r->end ||
+			    !r->parent || !pnv_pci_is_mem_pref_64(r->flags))
 				continue;
 
-			base = (r->start - phb->ioda.m64_base) / sgsz;
-			for (step = 0; step < resource_size(r) / sgsz; step++)
-				pnv_ioda_reserve_pe(phb, base + step);
+			pe_no = (r->start - phb->ioda.m64_base) / segsz;
+			limit = ALIGN(r->end - phb->ioda.m64_base, segsz) / segsz;
+			for (; pe_no < limit; pe_no++)
+				pnv_ioda_reserve_pe(phb, pe_no);
 		}
 	}
 }
@@ -316,85 +366,64 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	struct pci_dev *pdev;
 	struct resource *r;
 	struct pnv_ioda_pe *master_pe, *pe;
-	unsigned long size, *pe_alloc;
-	bool found;
-	int start, i, j;
-
-	/* Root bus shouldn't use M64 */
-	if (pci_is_root_bus(bus))
-		return IODA_INVALID_PE;
-
-	/* We support only one M64 window on each bus */
-	found = false;
-	pci_bus_for_each_resource(bus, r, i) {
-		if (r && r->parent &&
-		    pnv_pci_is_mem_pref_64(r->flags)) {
-			found = true;
-			break;
-		}
-	}
+	unsigned long size, *pe_bitsmap;
+	unsigned long pe_no, limit;
+	int i;
 
-	/* No M64 window found ? */
-	if (!found)
+	if (!pnv_ioda_need_m64_pe(phb, bus))
 		return IODA_INVALID_PE;
 
-	/* Allocate bitmap */
+        /* Allocate bitmap */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
-	pe_alloc = kzalloc(size, GFP_KERNEL);
-	if (!pe_alloc) {
-		pr_warn("%s: Out of memory !\n",
-			__func__);
+	pe_bitsmap = kzalloc(size, GFP_KERNEL);
+	if (!pe_bitsmap) {
+		pr_warn("%s: Out of memory !\n", __func__);
 		return IODA_INVALID_PE;
 	}
 
-	/*
-	 * Figure out reserved PE numbers by the PE
-	 * the its child PEs.
-	 */
-	start = (r->start - phb->ioda.m64_base) / segsz;
-	for (i = 0; i < resource_size(r) / segsz; i++)
-		set_bit(start + i, pe_alloc);
-
-	if (all)
-		goto done;
-
-	/*
-	 * If the PE doesn't cover all subordinate buses,
-	 * we need subtract from reserved PEs for children.
+	/* The bridge's M64 window might be extended to PHB's M64
+	 * window by intention to support PCI hotplug. So we have
+	 * to check the M64 resources consumed by the PCI devices
+	 * on the PCI bus.
 	 */
 	list_for_each_entry(pdev, &bus->devices, bus_list) {
-		if (!pdev->subordinate)
-			continue;
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+#ifdef CONFIG_PCI_IOV
+			if (i >= PCI_IOV_RESOURCES &&
+			    i <= PCI_IOV_RESOURCE_END)
+				continue;
+#endif
+			/* Don't scan bridge's window if the PE
+			 * doesn't contain its subordinate bus.
+			 */
+			if (!all && i >= PCI_BRIDGE_RESOURCES &&
+			    i <= PCI_BRIDGE_RESOURCE_END)
+				continue;
 
-		pci_bus_for_each_resource(pdev->subordinate, r, i) {
-			if (!r || !r->parent ||
-			    !pnv_pci_is_mem_pref_64(r->flags))
+			r = &pdev->resource[i];
+			if (!r->flags || r->start >= r->end ||
+			    !r->parent || !pnv_pci_is_mem_pref_64(r->flags))
 				continue;
 
-			start = (r->start - phb->ioda.m64_base) / segsz;
-			for (j = 0; j < resource_size(r) / segsz ; j++)
-				clear_bit(start + j, pe_alloc);
-                }
-        }
+			pe_no = (r->start - phb->ioda.m64_base) / segsz;
+			limit = ALIGN(r->end - phb->ioda.m64_base, segsz) / segsz;
+			for (; pe_no < limit; pe_no++)
+				set_bit(pe_no, pe_bitsmap);
+		}
+	}
 
-	/*
-	 * the current bus might not own M64 window and that's all
-	 * contributed by its child buses. For the case, we needn't
-	 * pick M64 dependent PE#.
-	 */
-	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
-		kfree(pe_alloc);
+	/* No M64 window found ? */
+	if (bitmap_empty(pe_bitsmap, phb->ioda.total_pe)) {
+		kfree(pe_bitsmap);
 		return IODA_INVALID_PE;
 	}
 
-	/*
-	 * Figure out the master PE and put all slave PEs to master
-	 * PE's list to form compound PE.
+	/* Figure out the master PE and put all slave PEs
+	 * to master PE's list to form compound PE.
 	 */
-done:
 	master_pe = NULL;
 	i = -1;
-	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
+	while ((i = find_next_bit(pe_bitsmap, phb->ioda.total_pe, i + 1)) <
 		phb->ioda.total_pe) {
 		pe = &phb->ioda.pe_array[i];
 
@@ -408,6 +437,13 @@ done:
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
 
+		/* Pick the M64 segment, which should be available. Also,
+		 * those M64 segments consumed by slave PEs are contributed
+		 * to the master PE.
+		 */
+		BUG_ON(test_and_set_bit(pe->pe_number, phb->ioda.m64_segmap));
+		BUG_ON(test_and_set_bit(pe->pe_number, master_pe->m64_segmap));
+
 		/* P7IOC supports M64DT, which helps mapping M64 segment
 		 * to one particular PE#. Unfortunately, PHB3 has fixed
 		 * mapping between M64 segment and PE#. In order for same
@@ -431,7 +467,7 @@ done:
 		}
 	}
 
-	kfree(pe_alloc);
+	kfree(pe_bitsmap);
 	return master_pe->pe_number;
 }
 
@@ -1233,7 +1269,7 @@ static void pnv_pci_ioda_setup_PEs(void)
 
 		/* M64 layout might affect PE allocation */
 		if (phb->reserve_m64_pe)
-			phb->reserve_m64_pe(phb);
+			phb->reserve_m64_pe(phb, phb->hose->bus);
 
 		pnv_ioda_setup_PEs(hose->bus);
 	}
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 070ee88..19022cf 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -49,6 +49,13 @@ struct pnv_ioda_pe {
 	/* PE number */
 	unsigned int		pe_number;
 
+	/* IO/M32/M64 segments consumed by the PE. Each PE can
+	 * have one M64 segment at most, but M64 segments consumed
+	 * by slave PEs will be contributed to the master PE. One
+	 * PE can own multiple IO and M32 segments.
+	 */
+	unsigned long		m64_segmap[8];
+
 	/* "Weight" assigned to the PE for the sake of DMA resource
 	 * allocations
 	 */
@@ -114,7 +121,7 @@ struct pnv_phb {
 	u32 (*bdfn_to_pe)(struct pnv_phb *phb, struct pci_bus *bus, u32 devfn);
 	void (*shutdown)(struct pnv_phb *phb);
 	int (*init_m64)(struct pnv_phb *phb);
-	void (*reserve_m64_pe)(struct pnv_phb *phb);
+	void (*reserve_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus);
 	int (*pick_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus, int all);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
@@ -153,6 +160,7 @@ struct pnv_phb {
 			struct mutex		pe_alloc_mutex;
 
 			/* M32 & IO segment maps */
+			unsigned long		m64_segmap[8];
 			unsigned int		*m32_segmap;
 			unsigned int		*io_segmap;
 			struct pnv_ioda_pe	*pe_array;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 03/21] powerpc/powernv: M64 support improvement
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

We're having the hardware or enforced (on P7IOC) limitation: M64
segment#x can only be assigned to PE#x. IO and M32 segment can be
mapped to arbitrary PE# via IODT and M32DT. It means the PE number
should be x if M64 segment#x has been assigned to the PE. Also, each
PE own one M64 segment at most. Currently, we are reserving PE#
according to root port's M64 window. It won't be reliable once we
extend M64 windows of root port, or the upstream port of the PCIE
switch behind root port to PHB's M64 window, in order to support
PCI hotplug in future.

The patch reserves PE# for M64 segments according to the M64 resources
of the PCI devices (not bridges) contained in the PE. Besides, it's
always worthy to trace the M64 segments consumed by the PE, which can
be released at PCI unplugging time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 190 ++++++++++++++++++------------
 arch/powerpc/platforms/powernv/pci.h      |  10 +-
 2 files changed, 122 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 646962f..a994882 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -283,28 +283,78 @@ fail:
 	return -EIO;
 }
 
-static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb)
+/* We extend the M64 window of root port, or the upstream bridge port
+ * of the PCIE switch behind root port. So we shouldn't reserve PEs
+ * for M64 resources because there are no (normal) PCI devices consuming
+ * M64 resources on the PCI buses leading from root port, or the upstream
+ * bridge port. The function returns true if the indicated PCI bus needs
+ * reserved PEs because of M64 resources in advance. Otherwise, the
+ * function returns false.
+ */
+static bool pnv_ioda_need_m64_pe(struct pnv_phb *phb,
+				 struct pci_bus *bus)
 {
-	resource_size_t sgsz = phb->ioda.m64_segsize;
+	/* Root bus */
+	if (!bus || pci_is_root_bus(bus))
+		return false;
+
+	/* Bus leading from root port. We need check what types of PCI
+	 * devices on the bus. If it's connecting PCI bridge, we don't
+	 * need reserve M64 PEs for it. Otherwise, we still need to do
+	 * that.
+	 */
+	if (pci_is_root_bus(bus->self->bus)) {
+		struct pci_dev *pdev;
+
+		list_for_each_entry(pdev, &bus->devices, bus_list) {
+			if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL)
+				return true;
+		}
+
+		return false;
+	}
+
+	/* Bus leading from the upstream bridge port on top level */
+	if (pci_is_root_bus(bus->self->bus->self->bus))
+		return false;
+
+	return true;
+}
+
+static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb,
+				    struct pci_bus *bus)
+{
+	resource_size_t segsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
 	struct resource *r;
-	int base, step, i;
+	unsigned long pe_no, limit;
+	int i;
 
-	/*
-	 * Root bus always has full M64 range and root port has
-	 * M64 range used in reality. So we're checking root port
-	 * instead of root bus.
+	if (!pnv_ioda_need_m64_pe(phb, bus))
+		return;
+
+	/* The bridge's M64 window might have been extended to the
+	 * PHB's M64 window in order to support PCI hotplug. So the
+	 * bridge's M64 window isn't reliable to be used for picking
+	 * PE# for its leading PCI bus. We have to check the M64
+	 * resources consumed by the PCI devices, which seat on the
+	 * PCI bus.
 	 */
-	list_for_each_entry(pdev, &phb->hose->bus->devices, bus_list) {
-		for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
-			r = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
-			if (!r->parent ||
-			    !pnv_pci_is_mem_pref_64(r->flags))
+	list_for_each_entry(pdev, &bus->devices, bus_list) {
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+#ifdef CONFIG_PCI_IOV
+			if (i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END)
+				continue;
+#endif
+			r = &pdev->resource[i];
+			if (!r->flags || r->start >= r->end ||
+			    !r->parent || !pnv_pci_is_mem_pref_64(r->flags))
 				continue;
 
-			base = (r->start - phb->ioda.m64_base) / sgsz;
-			for (step = 0; step < resource_size(r) / sgsz; step++)
-				pnv_ioda_reserve_pe(phb, base + step);
+			pe_no = (r->start - phb->ioda.m64_base) / segsz;
+			limit = ALIGN(r->end - phb->ioda.m64_base, segsz) / segsz;
+			for (; pe_no < limit; pe_no++)
+				pnv_ioda_reserve_pe(phb, pe_no);
 		}
 	}
 }
@@ -316,85 +366,64 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	struct pci_dev *pdev;
 	struct resource *r;
 	struct pnv_ioda_pe *master_pe, *pe;
-	unsigned long size, *pe_alloc;
-	bool found;
-	int start, i, j;
-
-	/* Root bus shouldn't use M64 */
-	if (pci_is_root_bus(bus))
-		return IODA_INVALID_PE;
-
-	/* We support only one M64 window on each bus */
-	found = false;
-	pci_bus_for_each_resource(bus, r, i) {
-		if (r && r->parent &&
-		    pnv_pci_is_mem_pref_64(r->flags)) {
-			found = true;
-			break;
-		}
-	}
+	unsigned long size, *pe_bitsmap;
+	unsigned long pe_no, limit;
+	int i;
 
-	/* No M64 window found ? */
-	if (!found)
+	if (!pnv_ioda_need_m64_pe(phb, bus))
 		return IODA_INVALID_PE;
 
-	/* Allocate bitmap */
+        /* Allocate bitmap */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
-	pe_alloc = kzalloc(size, GFP_KERNEL);
-	if (!pe_alloc) {
-		pr_warn("%s: Out of memory !\n",
-			__func__);
+	pe_bitsmap = kzalloc(size, GFP_KERNEL);
+	if (!pe_bitsmap) {
+		pr_warn("%s: Out of memory !\n", __func__);
 		return IODA_INVALID_PE;
 	}
 
-	/*
-	 * Figure out reserved PE numbers by the PE
-	 * the its child PEs.
-	 */
-	start = (r->start - phb->ioda.m64_base) / segsz;
-	for (i = 0; i < resource_size(r) / segsz; i++)
-		set_bit(start + i, pe_alloc);
-
-	if (all)
-		goto done;
-
-	/*
-	 * If the PE doesn't cover all subordinate buses,
-	 * we need subtract from reserved PEs for children.
+	/* The bridge's M64 window might be extended to PHB's M64
+	 * window by intention to support PCI hotplug. So we have
+	 * to check the M64 resources consumed by the PCI devices
+	 * on the PCI bus.
 	 */
 	list_for_each_entry(pdev, &bus->devices, bus_list) {
-		if (!pdev->subordinate)
-			continue;
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+#ifdef CONFIG_PCI_IOV
+			if (i >= PCI_IOV_RESOURCES &&
+			    i <= PCI_IOV_RESOURCE_END)
+				continue;
+#endif
+			/* Don't scan bridge's window if the PE
+			 * doesn't contain its subordinate bus.
+			 */
+			if (!all && i >= PCI_BRIDGE_RESOURCES &&
+			    i <= PCI_BRIDGE_RESOURCE_END)
+				continue;
 
-		pci_bus_for_each_resource(pdev->subordinate, r, i) {
-			if (!r || !r->parent ||
-			    !pnv_pci_is_mem_pref_64(r->flags))
+			r = &pdev->resource[i];
+			if (!r->flags || r->start >= r->end ||
+			    !r->parent || !pnv_pci_is_mem_pref_64(r->flags))
 				continue;
 
-			start = (r->start - phb->ioda.m64_base) / segsz;
-			for (j = 0; j < resource_size(r) / segsz ; j++)
-				clear_bit(start + j, pe_alloc);
-                }
-        }
+			pe_no = (r->start - phb->ioda.m64_base) / segsz;
+			limit = ALIGN(r->end - phb->ioda.m64_base, segsz) / segsz;
+			for (; pe_no < limit; pe_no++)
+				set_bit(pe_no, pe_bitsmap);
+		}
+	}
 
-	/*
-	 * the current bus might not own M64 window and that's all
-	 * contributed by its child buses. For the case, we needn't
-	 * pick M64 dependent PE#.
-	 */
-	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
-		kfree(pe_alloc);
+	/* No M64 window found ? */
+	if (bitmap_empty(pe_bitsmap, phb->ioda.total_pe)) {
+		kfree(pe_bitsmap);
 		return IODA_INVALID_PE;
 	}
 
-	/*
-	 * Figure out the master PE and put all slave PEs to master
-	 * PE's list to form compound PE.
+	/* Figure out the master PE and put all slave PEs
+	 * to master PE's list to form compound PE.
 	 */
-done:
 	master_pe = NULL;
 	i = -1;
-	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
+	while ((i = find_next_bit(pe_bitsmap, phb->ioda.total_pe, i + 1)) <
 		phb->ioda.total_pe) {
 		pe = &phb->ioda.pe_array[i];
 
@@ -408,6 +437,13 @@ done:
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
 
+		/* Pick the M64 segment, which should be available. Also,
+		 * those M64 segments consumed by slave PEs are contributed
+		 * to the master PE.
+		 */
+		BUG_ON(test_and_set_bit(pe->pe_number, phb->ioda.m64_segmap));
+		BUG_ON(test_and_set_bit(pe->pe_number, master_pe->m64_segmap));
+
 		/* P7IOC supports M64DT, which helps mapping M64 segment
 		 * to one particular PE#. Unfortunately, PHB3 has fixed
 		 * mapping between M64 segment and PE#. In order for same
@@ -431,7 +467,7 @@ done:
 		}
 	}
 
-	kfree(pe_alloc);
+	kfree(pe_bitsmap);
 	return master_pe->pe_number;
 }
 
@@ -1233,7 +1269,7 @@ static void pnv_pci_ioda_setup_PEs(void)
 
 		/* M64 layout might affect PE allocation */
 		if (phb->reserve_m64_pe)
-			phb->reserve_m64_pe(phb);
+			phb->reserve_m64_pe(phb, phb->hose->bus);
 
 		pnv_ioda_setup_PEs(hose->bus);
 	}
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 070ee88..19022cf 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -49,6 +49,13 @@ struct pnv_ioda_pe {
 	/* PE number */
 	unsigned int		pe_number;
 
+	/* IO/M32/M64 segments consumed by the PE. Each PE can
+	 * have one M64 segment at most, but M64 segments consumed
+	 * by slave PEs will be contributed to the master PE. One
+	 * PE can own multiple IO and M32 segments.
+	 */
+	unsigned long		m64_segmap[8];
+
 	/* "Weight" assigned to the PE for the sake of DMA resource
 	 * allocations
 	 */
@@ -114,7 +121,7 @@ struct pnv_phb {
 	u32 (*bdfn_to_pe)(struct pnv_phb *phb, struct pci_bus *bus, u32 devfn);
 	void (*shutdown)(struct pnv_phb *phb);
 	int (*init_m64)(struct pnv_phb *phb);
-	void (*reserve_m64_pe)(struct pnv_phb *phb);
+	void (*reserve_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus);
 	int (*pick_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus, int all);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
@@ -153,6 +160,7 @@ struct pnv_phb {
 			struct mutex		pe_alloc_mutex;
 
 			/* M32 & IO segment maps */
+			unsigned long		m64_segmap[8];
 			unsigned int		*m32_segmap;
 			unsigned int		*io_segmap;
 			struct pnv_ioda_pe	*pe_array;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 04/21] powerpc/powernv: Improve IO and M32 mapping
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The PHB's IO or M32 window is divided evenly to segments, each of
them can be mapped to arbitrary PE# by IODT or M32DT. Current code
figures out the consumed IO and M32 segments by one particular PE
from the windows of the PE's upstream bridge. It won't be reliable
once we extend M64 windows of root port, or the upstream port of
the PCIE switch behind root port to PHB's IO or M32 window, in order
to support PCI hotplug in future.

The patch improves pnv_ioda_setup_pe_seg() to calculate PE's consumed
IO or M32 segments from its contained devices, no bridge involved any
more. Also, the logic to mapping IO and M32 segments are combined to
simplify the code. Besides, it's always worthy to trace the IO and M32
segments consumed by one PE, which can be released at PCI unplugging
time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 150 ++++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.h      |  13 +--
 2 files changed, 85 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index a994882..7e6e266 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2543,77 +2543,92 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 }
 #endif /* CONFIG_PCI_IOV */
 
-/*
- * This function is supposed to be called on basis of PE from top
- * to bottom style. So the the I/O or MMIO segment assigned to
- * parent PE could be overrided by its child PEs if necessary.
- */
-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
-				  struct pnv_ioda_pe *pe)
+static int pnv_ioda_map_pe_one_res(struct pci_controller *hose,
+				   struct pnv_ioda_pe *pe,
+				   struct resource *res)
 {
 	struct pnv_phb *phb = hose->private_data;
 	struct pci_bus_region region;
-	struct resource *res;
-	int i, index;
-	int rc;
+	unsigned int segsize, index;
+	unsigned long *segmap, *pe_segmap;
+	uint16_t win_type;
+	int64_t rc;
 
-	/*
-	 * NOTE: We only care PCI bus based PE for now. For PCI
-	 * device based PE, for example SRIOV sensitive VF should
-	 * be figured out later.
-	 */
-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+	/* Check if we need map the resource */
+	if (!res->parent || !res->flags ||
+	    res->start > res->end ||
+	    pnv_pci_is_mem_pref_64(res->flags))
+		return 0;
 
-	pci_bus_for_each_resource(pe->pbus, res, i) {
-		if (!res || !res->flags ||
-		    res->start > res->end)
-			continue;
+	if (res->flags & IORESOURCE_IO) {
+		segmap = phb->ioda.io_segmap;
+		pe_segmap = pe->io_segmap;
+		region.start = res->start - phb->ioda.io_pci_base;
+		region.end = res->end - phb->ioda.io_pci_base;
+		segsize = phb->ioda.io_segsize;
+		win_type = OPAL_IO_WINDOW_TYPE;
+	} else {
+		segmap = phb->ioda.m32_segmap;
+		pe_segmap = pe->m32_segmap;
+		region.start = res->start -
+			       hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		region.end = res->end -
+			     hose->mem_offset[0] -
+			     phb->ioda.m32_pci_base;
+		segsize = phb->ioda.m32_segsize;
+		win_type = OPAL_M32_WINDOW_TYPE;
+	}
+
+	index = region.start / segsize;
+	while (index < phb->ioda.total_pe &&
+	       region.start <= region.end) {
+		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+				pe->pe_number, win_type, 0, index);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Error %lld mapping (%d) seg#%d to PE#%d\n",
+				__func__, rc, win_type, index, pe->pe_number);
+			return -EIO;
+		}
 
-		if (res->flags & IORESOURCE_IO) {
-			region.start = res->start - phb->ioda.io_pci_base;
-			region.end   = res->end - phb->ioda.io_pci_base;
-			index = region.start / phb->ioda.io_segsize;
+		set_bit(index, segmap);
+		set_bit(index, pe_segmap);
+		region.start += segsize;
+		index++;
+	}
 
-			while (index < phb->ioda.total_pe &&
-			       region.start <= region.end) {
-				phb->ioda.io_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping IO "
-					       "segment #%d to PE#%d\n",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
+	return 0;
+}
 
-				region.start += phb->ioda.io_segsize;
-				index++;
-			}
-		} else if ((res->flags & IORESOURCE_MEM) &&
-			   !pnv_pci_is_mem_pref_64(res->flags)) {
-			region.start = res->start -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			region.end   = res->end -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			index = region.start / phb->ioda.m32_segsize;
-
-			while (index < phb->ioda.total_pe &&
-			       region.start <= region.end) {
-				phb->ioda.m32_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping M32 "
-					       "segment#%d to PE#%d",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
+static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
+				  struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *pdev;
+	struct resource *res;
+	int i;
 
-				region.start += phb->ioda.m32_segsize;
-				index++;
-			}
+	/* This function only works for bus dependent PE */
+	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+
+	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
+		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
+			res = &pdev->resource[i];
+			if (pnv_ioda_map_pe_one_res(hose, pe, res))
+				return;
+		}
+
+		/* If the PE contains all subordinate PCI buses, the
+		 * resources of the child bridges should be mapped
+		 * to the PE as well.
+		 */
+		if (!(pe->flags & PNV_IODA_PE_BUS_ALL) ||
+		    (pdev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
+			continue;
+
+		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
+			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
+			if (pnv_ioda_map_pe_one_res(hose, pe, res))
+				return;
 		}
 	}
 }
@@ -2780,7 +2795,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, pemap_off;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int len;
@@ -2865,19 +2880,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
-	m32map_off = size;
-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
-	if (phb->type == PNV_PHB_IODA1) {
-		iomap_off = size;
-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
-	}
 	pemap_off = size;
 	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
-	phb->ioda.m32_segmap = aux + m32map_off;
-	if (phb->type == PNV_PHB_IODA1)
-		phb->ioda.io_segmap = aux + iomap_off;
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 19022cf..f604bb7 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -54,6 +54,8 @@ struct pnv_ioda_pe {
 	 * by slave PEs will be contributed to the master PE. One
 	 * PE can own multiple IO and M32 segments.
 	 */
+	unsigned long		io_segmap[8];
+	unsigned long		m32_segmap[8];
 	unsigned long		m64_segmap[8];
 
 	/* "Weight" assigned to the PE for the sake of DMA resource
@@ -154,16 +156,15 @@ struct pnv_phb {
 			unsigned int		io_segsize;
 			unsigned int		io_pci_base;
 
-			/* PE allocation bitmap */
+			/* PE allocation */
+			struct pnv_ioda_pe	*pe_array;
 			unsigned long		*pe_alloc;
-			/* PE allocation mutex */
 			struct mutex		pe_alloc_mutex;
 
-			/* M32 & IO segment maps */
+			/* IO/M32/M64 segment bitmaps */
+			unsigned long		io_segmap[8];
+			unsigned long		m32_segmap[8];
 			unsigned long		m64_segmap[8];
-			unsigned int		*m32_segmap;
-			unsigned int		*io_segmap;
-			struct pnv_ioda_pe	*pe_array;
 
 			/* IRQ chip */
 			int			irq_chip_init;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 04/21] powerpc/powernv: Improve IO and M32 mapping
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The PHB's IO or M32 window is divided evenly to segments, each of
them can be mapped to arbitrary PE# by IODT or M32DT. Current code
figures out the consumed IO and M32 segments by one particular PE
from the windows of the PE's upstream bridge. It won't be reliable
once we extend M64 windows of root port, or the upstream port of
the PCIE switch behind root port to PHB's IO or M32 window, in order
to support PCI hotplug in future.

The patch improves pnv_ioda_setup_pe_seg() to calculate PE's consumed
IO or M32 segments from its contained devices, no bridge involved any
more. Also, the logic to mapping IO and M32 segments are combined to
simplify the code. Besides, it's always worthy to trace the IO and M32
segments consumed by one PE, which can be released at PCI unplugging
time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 150 ++++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.h      |  13 +--
 2 files changed, 85 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index a994882..7e6e266 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2543,77 +2543,92 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 }
 #endif /* CONFIG_PCI_IOV */
 
-/*
- * This function is supposed to be called on basis of PE from top
- * to bottom style. So the the I/O or MMIO segment assigned to
- * parent PE could be overrided by its child PEs if necessary.
- */
-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
-				  struct pnv_ioda_pe *pe)
+static int pnv_ioda_map_pe_one_res(struct pci_controller *hose,
+				   struct pnv_ioda_pe *pe,
+				   struct resource *res)
 {
 	struct pnv_phb *phb = hose->private_data;
 	struct pci_bus_region region;
-	struct resource *res;
-	int i, index;
-	int rc;
+	unsigned int segsize, index;
+	unsigned long *segmap, *pe_segmap;
+	uint16_t win_type;
+	int64_t rc;
 
-	/*
-	 * NOTE: We only care PCI bus based PE for now. For PCI
-	 * device based PE, for example SRIOV sensitive VF should
-	 * be figured out later.
-	 */
-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+	/* Check if we need map the resource */
+	if (!res->parent || !res->flags ||
+	    res->start > res->end ||
+	    pnv_pci_is_mem_pref_64(res->flags))
+		return 0;
 
-	pci_bus_for_each_resource(pe->pbus, res, i) {
-		if (!res || !res->flags ||
-		    res->start > res->end)
-			continue;
+	if (res->flags & IORESOURCE_IO) {
+		segmap = phb->ioda.io_segmap;
+		pe_segmap = pe->io_segmap;
+		region.start = res->start - phb->ioda.io_pci_base;
+		region.end = res->end - phb->ioda.io_pci_base;
+		segsize = phb->ioda.io_segsize;
+		win_type = OPAL_IO_WINDOW_TYPE;
+	} else {
+		segmap = phb->ioda.m32_segmap;
+		pe_segmap = pe->m32_segmap;
+		region.start = res->start -
+			       hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		region.end = res->end -
+			     hose->mem_offset[0] -
+			     phb->ioda.m32_pci_base;
+		segsize = phb->ioda.m32_segsize;
+		win_type = OPAL_M32_WINDOW_TYPE;
+	}
+
+	index = region.start / segsize;
+	while (index < phb->ioda.total_pe &&
+	       region.start <= region.end) {
+		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+				pe->pe_number, win_type, 0, index);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Error %lld mapping (%d) seg#%d to PE#%d\n",
+				__func__, rc, win_type, index, pe->pe_number);
+			return -EIO;
+		}
 
-		if (res->flags & IORESOURCE_IO) {
-			region.start = res->start - phb->ioda.io_pci_base;
-			region.end   = res->end - phb->ioda.io_pci_base;
-			index = region.start / phb->ioda.io_segsize;
+		set_bit(index, segmap);
+		set_bit(index, pe_segmap);
+		region.start += segsize;
+		index++;
+	}
 
-			while (index < phb->ioda.total_pe &&
-			       region.start <= region.end) {
-				phb->ioda.io_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping IO "
-					       "segment #%d to PE#%d\n",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
+	return 0;
+}
 
-				region.start += phb->ioda.io_segsize;
-				index++;
-			}
-		} else if ((res->flags & IORESOURCE_MEM) &&
-			   !pnv_pci_is_mem_pref_64(res->flags)) {
-			region.start = res->start -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			region.end   = res->end -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			index = region.start / phb->ioda.m32_segsize;
-
-			while (index < phb->ioda.total_pe &&
-			       region.start <= region.end) {
-				phb->ioda.m32_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping M32 "
-					       "segment#%d to PE#%d",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
+static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
+				  struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *pdev;
+	struct resource *res;
+	int i;
 
-				region.start += phb->ioda.m32_segsize;
-				index++;
-			}
+	/* This function only works for bus dependent PE */
+	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+
+	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
+		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
+			res = &pdev->resource[i];
+			if (pnv_ioda_map_pe_one_res(hose, pe, res))
+				return;
+		}
+
+		/* If the PE contains all subordinate PCI buses, the
+		 * resources of the child bridges should be mapped
+		 * to the PE as well.
+		 */
+		if (!(pe->flags & PNV_IODA_PE_BUS_ALL) ||
+		    (pdev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
+			continue;
+
+		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
+			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
+			if (pnv_ioda_map_pe_one_res(hose, pe, res))
+				return;
 		}
 	}
 }
@@ -2780,7 +2795,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, pemap_off;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int len;
@@ -2865,19 +2880,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
-	m32map_off = size;
-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
-	if (phb->type == PNV_PHB_IODA1) {
-		iomap_off = size;
-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
-	}
 	pemap_off = size;
 	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
-	phb->ioda.m32_segmap = aux + m32map_off;
-	if (phb->type == PNV_PHB_IODA1)
-		phb->ioda.io_segmap = aux + iomap_off;
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 19022cf..f604bb7 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -54,6 +54,8 @@ struct pnv_ioda_pe {
 	 * by slave PEs will be contributed to the master PE. One
 	 * PE can own multiple IO and M32 segments.
 	 */
+	unsigned long		io_segmap[8];
+	unsigned long		m32_segmap[8];
 	unsigned long		m64_segmap[8];
 
 	/* "Weight" assigned to the PE for the sake of DMA resource
@@ -154,16 +156,15 @@ struct pnv_phb {
 			unsigned int		io_segsize;
 			unsigned int		io_pci_base;
 
-			/* PE allocation bitmap */
+			/* PE allocation */
+			struct pnv_ioda_pe	*pe_array;
 			unsigned long		*pe_alloc;
-			/* PE allocation mutex */
 			struct mutex		pe_alloc_mutex;
 
-			/* M32 & IO segment maps */
+			/* IO/M32/M64 segment bitmaps */
+			unsigned long		io_segmap[8];
+			unsigned long		m32_segmap[8];
 			unsigned long		m64_segmap[8];
-			unsigned int		*m32_segmap;
-			unsigned int		*io_segmap;
-			struct pnv_ioda_pe	*pe_array;
 
 			/* IRQ chip */
 			int			irq_chip_init;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 05/21] powerpc/powernv: Improve DMA32 segment assignment
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

For P7IOC, the whole available DMA32 space, which is below the
MEM32 space, is evenly divided into 256MB segments. How many
continuous segments assigned to one particular PE depends on
the PE's DMA weight that is figured out from the type of each
PCI devices contained in the PE, and PHB's DMA weight which is
accumulative DMA weight of PEs contained in the PHB. It means
that the PHB's DMA weight calculation depends on existing PEs,
which works perfectly now, but not hotplug friendly. As the
whole available DMA32 space can be assigned to one PE on PHB3,
so we don't have the issue on PHB3.

The patch improves DMA32 segment assignment by removing the
dependency of existing PEs to make the piece of logic friendly
to PCI hotplug. Besides, it's always worthy to trace the DMA32
segments consumed by one PE, which can be released at PCI
unplugging time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 204 ++++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.h      |  24 +---
 2 files changed, 116 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7e6e266..9ef745e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -976,8 +976,11 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
 	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
 }
 
-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
+static unsigned int pnv_ioda_dev_dma_weight(struct pci_dev *dev)
 {
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
 	/* This is quite simplistic. The "base" weight of a device
 	 * is 10. 0 means no DMA is to be accounted for it.
 	 */
@@ -990,14 +993,33 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		return 3;
+		return 3 * phb->ioda.tce32_count;
 
 	/* Increase the weight of RAID (includes Obsidian) */
 	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-		return 15;
+		return 15 * phb->ioda.tce32_count;
 
 	/* Default */
-	return 10;
+	return 10 * phb->ioda.tce32_count;
+}
+
+static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
+{
+	unsigned int *dma_weight = data;
+
+	*dma_weight += pnv_ioda_dev_dma_weight(pdev);
+	return 0;
+}
+
+static void pnv_ioda_phb_dma_weight(struct pnv_phb *phb)
+{
+	phb->ioda.dma_weight = 0;
+	if (!phb->hose->bus)
+		return;
+
+	pci_walk_bus(phb->hose->bus,
+		     __pnv_ioda_phb_dma_weight,
+		     &phb->ioda.dma_weight);
 }
 
 #ifdef CONFIG_PCI_IOV
@@ -1156,7 +1178,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 			continue;
 		}
 		pdn->pe_number = pe->pe_number;
-		pe->dma_weight += pnv_ioda_dma_weight(dev);
+		pe->dma_weight += pnv_ioda_dev_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_same_PE(dev->subordinate, pe);
 	}
@@ -1193,7 +1215,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
-	pe->tce32_seg = -1;
 	pe->mve_number = -1;
 	pe->rid = bus->busn_res.start << 8;
 	pe->dma_weight = 0;
@@ -1223,14 +1244,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
 
-	/* Account for one DMA PE if at least one DMA capable device exist
-	 * below the bridge
-	 */
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
 	/* Link the PE */
 	pnv_ioda_link_pe_by_weight(phb, pe);
 }
@@ -1569,7 +1582,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		pe->flags = PNV_IODA_PE_VF;
 		pe->pbus = NULL;
 		pe->parent_dev = pdev;
-		pe->tce32_seg = -1;
 		pe->mve_number = -1;
 		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
 			   pci_iov_virtfn_devfn(pdev, vf_index);
@@ -1890,28 +1902,70 @@ void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 		pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
 }
 
-static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
-				      struct pnv_ioda_pe *pe, unsigned int base,
-				      unsigned int segs)
+static int pnv_pci_ioda1_dma_segment_alloc(struct pnv_phb *phb,
+					   struct pnv_ioda_pe *pe)
+{
+	unsigned int weight, base, segs;
+
+	/* We shouldn't already have 32-bits DMA associated */
+	if (WARN_ON(pe->tce32_seg_start || pe->tce32_seg_end))
+		return -EEXIST;
+
+	/* Needn't setup TCE table for non-DMA capable PE */
+	weight = pe->dma_weight;
+	if (!weight)
+		return -ENODEV;
+
+	/* Calculate the DMA segments that PE needs. It's guaranteed
+	 * that the PE will have one segment at least.
+	 */
+	if (weight < phb->ioda.dma_weight / phb->ioda.tce32_count)
+		weight = phb->ioda.dma_weight / phb->ioda.tce32_count;
+	segs = (weight * phb->ioda.tce32_count) / phb->ioda.dma_weight;
+
+	/* Reserve the DMA segments with back-off way, which should
+	 * give us one segment at least.
+	 */
+	do {
+		base = bitmap_find_next_zero_area(phb->ioda.tce32_segmap,
+						  phb->ioda.tce32_count,
+						  0, segs, 0);
+		if (base < phb->ioda.tce32_count)
+			bitmap_set(phb->ioda.tce32_segmap, base, segs);
+	} while ((base > phb->ioda.tce32_count) && (--segs));
+
+	/* There are possibly no DMA32 segments */
+	if (!segs)
+		return -ENODEV;
+
+	pe->tce32_seg_start = base;
+	pe->tce32_seg_end = base + segs;
+	return 0;
+}
+
+static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
+				       struct pnv_ioda_pe *pe)
 {
 
 	struct page *tce_mem = NULL;
 	const __be64 *swinvp;
 	struct iommu_table *tbl;
-	unsigned int i;
-	int64_t rc;
 	void *addr;
+	unsigned int base, segs, i;
+	int64_t rc;
 
 	/* XXX FIXME: Handle 64-bit only DMA devices */
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
-	/* XXX FIXME: Allocate multi-level tables on PHB3 */
 
-	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
+	/* Allocate TCE32 segments */
+	if (pnv_pci_ioda1_dma_segment_alloc(phb, pe)) {
+		pe_err(pe, " Cannot setting up 32-bits TCE table\n");
 		return;
+	}
 
-	/* Grab a 32-bit TCE table */
-	pe->tce32_seg = base;
+	/* Build a 32-bits TCE table */
+	base = pe->tce32_seg_start;
+	segs = pe->tce32_seg_end - pe->tce32_seg_start;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		(base << 28), ((base + segs) << 28) - 1);
 
@@ -1943,6 +1997,10 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 		}
 	}
 
+	/* Print some info */
+	pe_info(pe, "DMA weight %d, assigned (%d %d) DMA32 segments\n",
+		pe->dma_weight, base, segs);
+
 	/* Setup linux iommu table */
 	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
@@ -1981,8 +2039,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
+	bitmap_clear(phb->ioda.tce32_segmap, base, segs);
+	pe->tce32_seg_start = pe->tce32_seg_end = 0;
 	if (tce_mem)
 		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 }
@@ -2051,11 +2109,10 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	int64_t rc;
 
 	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
+	if (WARN_ON(pe->tce32_seg_end > pe->tce32_seg_start))
 		return;
 
 	/* The PE will reserve all possible 32-bits space */
-	pe->tce32_seg = 0;
 	end = (1 << ilog2(phb->ioda.m32_pci_base));
 	tce_table_size = (end / 0x1000) * 8;
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
@@ -2066,7 +2123,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 				   get_order(tce_table_size));
 	if (!tce_mem) {
 		pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
-		goto fail;
+		return;
 	}
 	addr = page_address(tce_mem);
 	memset(addr, 0, tce_table_size);
@@ -2079,11 +2136,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 					pe->pe_number << 1, 1, __pa(addr),
 					tce_table_size, 0x1000);
 	if (rc) {
+		__free_pages(tce_mem, get_order(tce_table_size));
 		pe_err(pe, "Failed to configure 32-bit TCE table,"
 		       " err %ld\n", rc);
-		goto fail;
+		return;
 	}
 
+	/* Print some info */
+	pe->tce32_seg_end = pe->tce32_seg_start + 1;
+	pe_info(pe, "Assigned DMA32 space\n");
+
 	/* Setup linux iommu table */
 	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
@@ -2120,76 +2182,30 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	/* Also create a bypass window */
 	if (!pnv_iommu_bypass_disabled)
 		pnv_pci_ioda2_setup_bypass_pe(phb, pe);
-
-	return;
-fail:
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
-	if (tce_mem)
-		__free_pages(tce_mem, get_order(tce_table_size));
 }
 
-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
+void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
+			       struct pnv_ioda_pe *pe)
 {
-	struct pci_controller *hose = phb->hose;
-	unsigned int residual, remaining, segs, tw, base;
-	struct pnv_ioda_pe *pe;
-
-	/* If we have more PE# than segments available, hand out one
-	 * per PE until we run out and let the rest fail. If not,
-	 * then we assign at least one segment per PE, plus more based
-	 * on the amount of devices under that PE
+	/* Recalculate the PHB's total DMA weight, which depends on
+	 * PCI devices. That means the PCI devices beneath the PHB
+	 * should have been probed successfully. Otherwise, the
+	 * calculated PHB's DMA weight won't be accurate.
 	 */
-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
-		residual = 0;
-	else
-		residual = phb->ioda.tce32_count -
-			phb->ioda.dma_pe_count;
+	pnv_ioda_phb_dma_weight(phb);
 
-	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.tce32_count);
-	pr_info("PCI: %d PE# for a total weight of %d\n",
-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
-
-	/* Walk our PE list and configure their DMA segments, hand them
-	 * out one base segment plus any residual segments based on
-	 * weight
-	 */
-	remaining = phb->ioda.tce32_count;
-	tw = phb->ioda.dma_weight;
-	base = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-		if (!pe->dma_weight)
-			continue;
-		if (!remaining) {
-			pe_warn(pe, "No DMA32 resources available\n");
-			continue;
-		}
-		segs = 1;
-		if (residual) {
-			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
-			if (segs > remaining)
-				segs = remaining;
-		}
+	if (phb->type == PNV_PHB_IODA1)
+		pnv_pci_ioda1_setup_dma_pe(phb, pe);
+	else if (phb->type == PNV_PHB_IODA2)
+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+}
 
-		/*
-		 * For IODA2 compliant PHB3, we needn't care about the weight.
-		 * The all available 32-bits DMA space will be assigned to
-		 * the specific PE.
-		 */
-		if (phb->type == PNV_PHB_IODA1) {
-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-				pe->dma_weight, segs);
-			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
-		} else {
-			pe_info(pe, "Assign DMA32 space\n");
-			segs = 0;
-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
-		}
+static void pnv_ioda_setup_dma(struct pnv_phb *phb)
+{
+	struct pnv_ioda_pe *pe;
 
-		remaining -= segs;
-		base += segs;
-	}
+	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link)
+		pnv_pci_ioda_setup_dma_pe(phb, pe);
 }
 
 #ifdef CONFIG_PCI_MSI
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index f604bb7..2784951 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -58,15 +58,11 @@ struct pnv_ioda_pe {
 	unsigned long		m32_segmap[8];
 	unsigned long		m64_segmap[8];
 
-	/* "Weight" assigned to the PE for the sake of DMA resource
-	 * allocations
-	 */
-	unsigned int		dma_weight;
-
-	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
-	int			tce32_seg;
-	int			tce32_segcount;
+	/* 32-bits DMA */
 	struct iommu_table	*tce32_table;
+	unsigned int		dma_weight;
+	unsigned int		tce32_seg_start;
+	unsigned int		tce32_seg_end;
 	phys_addr_t		tce_inval_reg_phys;
 
 	/* 64-bit TCE bypass region */
@@ -183,17 +179,9 @@ struct pnv_phb {
 			unsigned char		pe_rmap[0x10000];
 
 			/* 32-bit TCE tables allocation */
-			unsigned long		tce32_count;
-
-			/* Total "weight" for the sake of DMA resources
-			 * allocation
-			 */
 			unsigned int		dma_weight;
-			unsigned int		dma_pe_count;
-
-			/* Sorted list of used PE's, sorted at
-			 * boot for resource allocation purposes
-			 */
+			unsigned long		tce32_count;
+			unsigned long		tce32_segmap[8];
 			struct list_head	pe_dma_list;
 		} ioda;
 	};
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 05/21] powerpc/powernv: Improve DMA32 segment assignment
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

For P7IOC, the whole available DMA32 space, which is below the
MEM32 space, is evenly divided into 256MB segments. How many
continuous segments assigned to one particular PE depends on
the PE's DMA weight that is figured out from the type of each
PCI devices contained in the PE, and PHB's DMA weight which is
accumulative DMA weight of PEs contained in the PHB. It means
that the PHB's DMA weight calculation depends on existing PEs,
which works perfectly now, but not hotplug friendly. As the
whole available DMA32 space can be assigned to one PE on PHB3,
so we don't have the issue on PHB3.

The patch improves DMA32 segment assignment by removing the
dependency of existing PEs to make the piece of logic friendly
to PCI hotplug. Besides, it's always worthy to trace the DMA32
segments consumed by one PE, which can be released at PCI
unplugging time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 204 ++++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.h      |  24 +---
 2 files changed, 116 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7e6e266..9ef745e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -976,8 +976,11 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
 	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
 }
 
-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
+static unsigned int pnv_ioda_dev_dma_weight(struct pci_dev *dev)
 {
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
 	/* This is quite simplistic. The "base" weight of a device
 	 * is 10. 0 means no DMA is to be accounted for it.
 	 */
@@ -990,14 +993,33 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		return 3;
+		return 3 * phb->ioda.tce32_count;
 
 	/* Increase the weight of RAID (includes Obsidian) */
 	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-		return 15;
+		return 15 * phb->ioda.tce32_count;
 
 	/* Default */
-	return 10;
+	return 10 * phb->ioda.tce32_count;
+}
+
+static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
+{
+	unsigned int *dma_weight = data;
+
+	*dma_weight += pnv_ioda_dev_dma_weight(pdev);
+	return 0;
+}
+
+static void pnv_ioda_phb_dma_weight(struct pnv_phb *phb)
+{
+	phb->ioda.dma_weight = 0;
+	if (!phb->hose->bus)
+		return;
+
+	pci_walk_bus(phb->hose->bus,
+		     __pnv_ioda_phb_dma_weight,
+		     &phb->ioda.dma_weight);
 }
 
 #ifdef CONFIG_PCI_IOV
@@ -1156,7 +1178,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 			continue;
 		}
 		pdn->pe_number = pe->pe_number;
-		pe->dma_weight += pnv_ioda_dma_weight(dev);
+		pe->dma_weight += pnv_ioda_dev_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_same_PE(dev->subordinate, pe);
 	}
@@ -1193,7 +1215,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
-	pe->tce32_seg = -1;
 	pe->mve_number = -1;
 	pe->rid = bus->busn_res.start << 8;
 	pe->dma_weight = 0;
@@ -1223,14 +1244,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
 
-	/* Account for one DMA PE if at least one DMA capable device exist
-	 * below the bridge
-	 */
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
 	/* Link the PE */
 	pnv_ioda_link_pe_by_weight(phb, pe);
 }
@@ -1569,7 +1582,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		pe->flags = PNV_IODA_PE_VF;
 		pe->pbus = NULL;
 		pe->parent_dev = pdev;
-		pe->tce32_seg = -1;
 		pe->mve_number = -1;
 		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
 			   pci_iov_virtfn_devfn(pdev, vf_index);
@@ -1890,28 +1902,70 @@ void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 		pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
 }
 
-static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
-				      struct pnv_ioda_pe *pe, unsigned int base,
-				      unsigned int segs)
+static int pnv_pci_ioda1_dma_segment_alloc(struct pnv_phb *phb,
+					   struct pnv_ioda_pe *pe)
+{
+	unsigned int weight, base, segs;
+
+	/* We shouldn't already have 32-bits DMA associated */
+	if (WARN_ON(pe->tce32_seg_start || pe->tce32_seg_end))
+		return -EEXIST;
+
+	/* Needn't setup TCE table for non-DMA capable PE */
+	weight = pe->dma_weight;
+	if (!weight)
+		return -ENODEV;
+
+	/* Calculate the DMA segments that PE needs. It's guaranteed
+	 * that the PE will have one segment at least.
+	 */
+	if (weight < phb->ioda.dma_weight / phb->ioda.tce32_count)
+		weight = phb->ioda.dma_weight / phb->ioda.tce32_count;
+	segs = (weight * phb->ioda.tce32_count) / phb->ioda.dma_weight;
+
+	/* Reserve the DMA segments with back-off way, which should
+	 * give us one segment at least.
+	 */
+	do {
+		base = bitmap_find_next_zero_area(phb->ioda.tce32_segmap,
+						  phb->ioda.tce32_count,
+						  0, segs, 0);
+		if (base < phb->ioda.tce32_count)
+			bitmap_set(phb->ioda.tce32_segmap, base, segs);
+	} while ((base > phb->ioda.tce32_count) && (--segs));
+
+	/* There are possibly no DMA32 segments */
+	if (!segs)
+		return -ENODEV;
+
+	pe->tce32_seg_start = base;
+	pe->tce32_seg_end = base + segs;
+	return 0;
+}
+
+static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
+				       struct pnv_ioda_pe *pe)
 {
 
 	struct page *tce_mem = NULL;
 	const __be64 *swinvp;
 	struct iommu_table *tbl;
-	unsigned int i;
-	int64_t rc;
 	void *addr;
+	unsigned int base, segs, i;
+	int64_t rc;
 
 	/* XXX FIXME: Handle 64-bit only DMA devices */
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
-	/* XXX FIXME: Allocate multi-level tables on PHB3 */
 
-	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
+	/* Allocate TCE32 segments */
+	if (pnv_pci_ioda1_dma_segment_alloc(phb, pe)) {
+		pe_err(pe, " Cannot setting up 32-bits TCE table\n");
 		return;
+	}
 
-	/* Grab a 32-bit TCE table */
-	pe->tce32_seg = base;
+	/* Build a 32-bits TCE table */
+	base = pe->tce32_seg_start;
+	segs = pe->tce32_seg_end - pe->tce32_seg_start;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		(base << 28), ((base + segs) << 28) - 1);
 
@@ -1943,6 +1997,10 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 		}
 	}
 
+	/* Print some info */
+	pe_info(pe, "DMA weight %d, assigned (%d %d) DMA32 segments\n",
+		pe->dma_weight, base, segs);
+
 	/* Setup linux iommu table */
 	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
@@ -1981,8 +2039,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
+	bitmap_clear(phb->ioda.tce32_segmap, base, segs);
+	pe->tce32_seg_start = pe->tce32_seg_end = 0;
 	if (tce_mem)
 		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 }
@@ -2051,11 +2109,10 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	int64_t rc;
 
 	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
+	if (WARN_ON(pe->tce32_seg_end > pe->tce32_seg_start))
 		return;
 
 	/* The PE will reserve all possible 32-bits space */
-	pe->tce32_seg = 0;
 	end = (1 << ilog2(phb->ioda.m32_pci_base));
 	tce_table_size = (end / 0x1000) * 8;
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
@@ -2066,7 +2123,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 				   get_order(tce_table_size));
 	if (!tce_mem) {
 		pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
-		goto fail;
+		return;
 	}
 	addr = page_address(tce_mem);
 	memset(addr, 0, tce_table_size);
@@ -2079,11 +2136,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 					pe->pe_number << 1, 1, __pa(addr),
 					tce_table_size, 0x1000);
 	if (rc) {
+		__free_pages(tce_mem, get_order(tce_table_size));
 		pe_err(pe, "Failed to configure 32-bit TCE table,"
 		       " err %ld\n", rc);
-		goto fail;
+		return;
 	}
 
+	/* Print some info */
+	pe->tce32_seg_end = pe->tce32_seg_start + 1;
+	pe_info(pe, "Assigned DMA32 space\n");
+
 	/* Setup linux iommu table */
 	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
@@ -2120,76 +2182,30 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	/* Also create a bypass window */
 	if (!pnv_iommu_bypass_disabled)
 		pnv_pci_ioda2_setup_bypass_pe(phb, pe);
-
-	return;
-fail:
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
-	if (tce_mem)
-		__free_pages(tce_mem, get_order(tce_table_size));
 }
 
-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
+void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
+			       struct pnv_ioda_pe *pe)
 {
-	struct pci_controller *hose = phb->hose;
-	unsigned int residual, remaining, segs, tw, base;
-	struct pnv_ioda_pe *pe;
-
-	/* If we have more PE# than segments available, hand out one
-	 * per PE until we run out and let the rest fail. If not,
-	 * then we assign at least one segment per PE, plus more based
-	 * on the amount of devices under that PE
+	/* Recalculate the PHB's total DMA weight, which depends on
+	 * PCI devices. That means the PCI devices beneath the PHB
+	 * should have been probed successfully. Otherwise, the
+	 * calculated PHB's DMA weight won't be accurate.
 	 */
-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
-		residual = 0;
-	else
-		residual = phb->ioda.tce32_count -
-			phb->ioda.dma_pe_count;
+	pnv_ioda_phb_dma_weight(phb);
 
-	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.tce32_count);
-	pr_info("PCI: %d PE# for a total weight of %d\n",
-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
-
-	/* Walk our PE list and configure their DMA segments, hand them
-	 * out one base segment plus any residual segments based on
-	 * weight
-	 */
-	remaining = phb->ioda.tce32_count;
-	tw = phb->ioda.dma_weight;
-	base = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-		if (!pe->dma_weight)
-			continue;
-		if (!remaining) {
-			pe_warn(pe, "No DMA32 resources available\n");
-			continue;
-		}
-		segs = 1;
-		if (residual) {
-			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
-			if (segs > remaining)
-				segs = remaining;
-		}
+	if (phb->type == PNV_PHB_IODA1)
+		pnv_pci_ioda1_setup_dma_pe(phb, pe);
+	else if (phb->type == PNV_PHB_IODA2)
+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+}
 
-		/*
-		 * For IODA2 compliant PHB3, we needn't care about the weight.
-		 * The all available 32-bits DMA space will be assigned to
-		 * the specific PE.
-		 */
-		if (phb->type == PNV_PHB_IODA1) {
-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-				pe->dma_weight, segs);
-			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
-		} else {
-			pe_info(pe, "Assign DMA32 space\n");
-			segs = 0;
-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
-		}
+static void pnv_ioda_setup_dma(struct pnv_phb *phb)
+{
+	struct pnv_ioda_pe *pe;
 
-		remaining -= segs;
-		base += segs;
-	}
+	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link)
+		pnv_pci_ioda_setup_dma_pe(phb, pe);
 }
 
 #ifdef CONFIG_PCI_MSI
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index f604bb7..2784951 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -58,15 +58,11 @@ struct pnv_ioda_pe {
 	unsigned long		m32_segmap[8];
 	unsigned long		m64_segmap[8];
 
-	/* "Weight" assigned to the PE for the sake of DMA resource
-	 * allocations
-	 */
-	unsigned int		dma_weight;
-
-	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
-	int			tce32_seg;
-	int			tce32_segcount;
+	/* 32-bits DMA */
 	struct iommu_table	*tce32_table;
+	unsigned int		dma_weight;
+	unsigned int		tce32_seg_start;
+	unsigned int		tce32_seg_end;
 	phys_addr_t		tce_inval_reg_phys;
 
 	/* 64-bit TCE bypass region */
@@ -183,17 +179,9 @@ struct pnv_phb {
 			unsigned char		pe_rmap[0x10000];
 
 			/* 32-bit TCE tables allocation */
-			unsigned long		tce32_count;
-
-			/* Total "weight" for the sake of DMA resources
-			 * allocation
-			 */
 			unsigned int		dma_weight;
-			unsigned int		dma_pe_count;
-
-			/* Sorted list of used PE's, sorted at
-			 * boot for resource allocation purposes
-			 */
+			unsigned long		tce32_count;
+			unsigned long		tce32_segmap[8];
 			struct list_head	pe_dma_list;
 		} ioda;
 	};
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 06/21] powerpc/powernv: Create PEs dynamically
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

Currently, the PEs and their associated resources are assigned
in ppc_md.pcibios_fixup(). The function is called for once after
PCI probing and resources assignment are finished. Obviously, it's
not hotplug friendly. The patch creates PEs dynamically by
ppc_md.pcibios_setup_bridge(), which is called on the event during
system bootup and PCI hotplug: updating PCI bridge's windows after
resource assignment/reassignment are finished. For partial hotplug
case, where not all PCI devices belonging to the PE are unplugged
and plugged again, we just need unbinding/binding the affected
PCI devices with the corresponding PE without creating new one.

Besides, it might require addtional resources (e.g. M32) to the
windows of the PCI bridge when unplugging current adapter, and
insert a different adapter if there is one PCI slot, which is
assumed behind root port, or the downstream bridge of the PCIE
switch behind root port. The parent bridge of the newly plugged
adapter would reject the request to add more resources, leading
to hotplug failure. For the issue, the patch extends the windows
of root port, or the upstream port of the PCIe switch behind root
port to PHB's windows when ppc_md.pcibios_setup_bridge() is called.

There is no upstream bridge for root bus, so we have to reserve
PE#, which is next to the reserved PE# in advance and fixing the
PE for root bus in ppc_md.pcibios_setup_bridge().

The patch also changes the rule assigning PE#: PE# reserved for
prefetchable 64-bits memory resource and SRIOV VFs starts from
zero while PE# for dynamic allocations starts from ioda.total_pe
reversely. It's because PE# for prefetchable 64-bits memory resource,
which is ually allocated begining with the PHB's aperatus and PE#
and the resource have fixed mapping. The PE# for dynamic allocation
is quite flexible and has no limitation.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |   1 +
 arch/powerpc/kernel/pci-common.c          |  10 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 307 ++++++++++++++++++++----------
 arch/powerpc/platforms/powernv/pci.h      |   4 +-
 4 files changed, 220 insertions(+), 102 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 1811c44..5367eb3 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -29,6 +29,7 @@ struct pci_controller_ops {
 
 	/* Called during PCI resource reassignment */
 	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
+	void		(*setup_bridge)(struct pci_bus *, unsigned long);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
 };
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0d05406..01d2a84 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -134,6 +134,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
 	pci_reset_secondary_bus(dev);
 }
 
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+
+	if (hose->controller_ops.setup_bridge)
+		hose->controller_ops.setup_bridge(bus, type);
+	else
+		pci_setup_bridge_resources(bus, type);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9ef745e..910fb67 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -143,18 +143,23 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 
 static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
-	unsigned long pe;
+	unsigned long pe_no;
+	unsigned long limit = phb->ioda.total_pe - 1;
 
 	do {
-		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe, 0);
-		if (pe >= phb->ioda.total_pe)
+		pe_no = find_next_zero_bit(phb->ioda.pe_alloc,
+					   phb->ioda.total_pe, limit);
+		if (pe_no < phb->ioda.total_pe &&
+		    !test_and_set_bit(pe_no, phb->ioda.pe_alloc))
+			break;
+
+		if (--limit >= phb->ioda.total_pe)
 			return IODA_INVALID_PE;
-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
+	} while(1);
 
-	phb->ioda.pe_array[pe].phb = phb;
-	phb->ioda.pe_array[pe].pe_number = pe;
-	return pe;
+	phb->ioda.pe_array[pe_no].phb = phb;
+	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+	return pe_no;
 }
 
 static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
@@ -214,6 +219,13 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe);
 
+	/* Strip of the segment used by PE for PCI root bus,
+	 * which is last supported PE#, or one next to the
+	 * reserved PE#
+	 */
+	if (phb->ioda.root_pe_no != IODA_INVALID_PE)
+		r->end -= phb->ioda.m64_segsize;
+
 	return 0;
 
 fail:
@@ -264,13 +276,24 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 	 */
 	r = &phb->hose->mem_resources[1];
 	if (phb->ioda.reserved_pe == 0)
-		r->start += phb->ioda.m64_segsize;
+		r->start += (phb->ioda.root_pe_no != IODA_INVALID_PE ?
+			     phb->ioda.m64_segsize * 2 :
+			     phb->ioda.m64_segsize);
 	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
-		r->end -= phb->ioda.m64_segsize;
+		r->end -= (phb->ioda.root_pe_no != IODA_INVALID_PE ?
+			   phb->ioda.m64_segsize * 2 :
+			   phb->ioda.m64_segsize);
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe);
 
+	/* Strip of the segment used by PE for PCI root bus,
+	 * which is last supported PE#, or one next to the
+	 * reserved PE#
+	 */
+	if (phb->ioda.root_pe_no != IODA_INVALID_PE)
+		r->end -= phb->ioda.m64_segsize;
+
 	return 0;
 
 fail:
@@ -837,7 +860,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	/* Clear the reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = 0;
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
 
 	/* Release from all parents PELT-V */
 	while (parent) {
@@ -1172,11 +1195,18 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_dn *pdn = pci_get_pdn(dev);
 
-		if (pdn == NULL) {
-			pr_warn("%s: No device node associated with device !\n",
-				pci_name(dev));
+		if (!pdn) {
+			dev_warn(&dev->dev, "%s: No associated PCI data\n",
+				 __func__);
 			continue;
 		}
+
+		/* The PCI device might have been associated with the PE in
+		 * case of partial hotplug.
+		 */
+		if (pdn->pe_number != IODA_INVALID_PE)
+			continue;
+
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dev_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1190,15 +1220,31 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
  * subordinate PCI devices and buses. The second type of PE is normally
  * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
  */
-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
 	struct pnv_ioda_pe *pe;
 	int pe_num = IODA_INVALID_PE;
 
+	/* For partial hotplug case, the PE instance hasn't been destroyed
+	 * yet. We shouldn't allocated a new one and assign resources to
+	 * it. The existing PE instance should be reused, but we should
+	 * associate the devices to the PE.
+	 */
+	pe_num = phb->ioda.pe_rmap[bus->number << 8];
+	if (pe_num != IODA_INVALID_PE) {
+		pe = &phb->ioda.pe_array[pe_num];
+		pnv_ioda_setup_same_PE(bus, pe);
+		return NULL;
+	}
+
+	/* PE number for root bus should have been reserved */
+	if (pci_is_root_bus(bus))
+		pe_num = phb->ioda.root_pe_no;
+
 	/* Check if PE is determined by M64 */
-	if (phb->pick_m64_pe)
+	if (pe_num == IODA_INVALID_PE && phb->pick_m64_pe)
 		pe_num = phb->pick_m64_pe(phb, bus, all);
 
 	/* The PE number isn't pinned by M64 */
@@ -1208,7 +1254,7 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	if (pe_num == IODA_INVALID_PE) {
 		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
 			__func__, pci_domain_nr(bus), bus->number);
-		return;
+		return NULL;
 	}
 
 	pe = &phb->ioda.pe_array[pe_num];
@@ -1220,18 +1266,18 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	pe->dma_weight = 0;
 
 	if (all)
-		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
-			bus->busn_res.start, bus->busn_res.end, pe_num);
+		pe_info(pe, "Secondary bus %d..%d associated\n",
+			bus->busn_res.start, bus->busn_res.end);
 	else
-		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
-			bus->busn_res.start, pe_num);
+		pe_info(pe, "Secondary bus %d associated\n",
+			bus->busn_res.start);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
 		if (pe_num)
 			pnv_ioda_free_pe(phb, pe_num);
 		pe->pbus = NULL;
-		return;
+		return NULL;
 	}
 
 	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
@@ -1246,46 +1292,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 
 	/* Link the PE */
 	pnv_ioda_link_pe_by_weight(phb, pe);
-}
-
-static void pnv_ioda_setup_PEs(struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	pnv_ioda_setup_bus_PE(bus, 0);
 
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		if (dev->subordinate) {
-			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
-				pnv_ioda_setup_bus_PE(dev->subordinate, 1);
-			else
-				pnv_ioda_setup_PEs(dev->subordinate);
-		}
-	}
-}
-
-/*
- * Configure PEs so that the downstream PCI buses and devices
- * could have their associated PE#. Unfortunately, we didn't
- * figure out the way to identify the PLX bridge yet. So we
- * simply put the PCI bus and the subordinate behind the root
- * port to PE# here. The game rule here is expected to be changed
- * as soon as we can detected PLX bridge correctly.
- */
-static void pnv_pci_ioda_setup_PEs(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-
-		/* M64 layout might affect PE allocation */
-		if (phb->reserve_m64_pe)
-			phb->reserve_m64_pe(phb, phb->hose->bus);
-
-		pnv_ioda_setup_PEs(hose->bus);
-	}
+	return pe;
 }
 
 #ifdef CONFIG_PCI_IOV
@@ -2200,14 +2208,6 @@ void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
 }
 
-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
-{
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link)
-		pnv_pci_ioda_setup_dma_pe(phb, pe);
-}
-
 #ifdef CONFIG_PCI_MSI
 static void pnv_ioda2_msi_eoi(struct irq_data *d)
 {
@@ -2649,34 +2649,6 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 	}
 }
 
-static void pnv_pci_ioda_setup_seg(void)
-{
-	struct pci_controller *tmp, *hose;
-	struct pnv_phb *phb;
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			pnv_ioda_setup_pe_seg(hose, pe);
-		}
-	}
-}
-
-static void pnv_pci_ioda_setup_DMA(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		pnv_ioda_setup_dma(hose->private_data);
-
-		/* Mark the PHB initialization done */
-		phb = hose->private_data;
-		phb->initialized = 1;
-	}
-}
-
 static void pnv_pci_ioda_create_dbgfs(void)
 {
 #ifdef CONFIG_DEBUG_FS
@@ -2698,9 +2670,14 @@ static void pnv_pci_ioda_create_dbgfs(void)
 
 static void pnv_pci_ioda_fixup(void)
 {
-	pnv_pci_ioda_setup_PEs();
-	pnv_pci_ioda_setup_seg();
-	pnv_pci_ioda_setup_DMA();
+	struct pci_controller *tmp, *hose;
+	struct pnv_phb *phb;
+
+	/* Notify initialization of PHB done */
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+		phb = hose->private_data;
+		phb->initialized = 1;
+	}
 
 	pnv_pci_ioda_create_dbgfs();
 
@@ -2751,6 +2728,115 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }
 
+/*
+ * We are updating root port or the upstream bridge behind the root
+ * port with PHB's various windows in order to accomodate the changes
+ * on required resources during PCI (slot) hotplug, which is connected
+ * to either root port, or the downstream ports of PCIe switch behind
+ * the root port.
+ */
+static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
+					   unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct resource *r, *w;
+	int i;
+
+	/* Check if we need apply fixup to the bridge's resources */
+	if (!pci_is_root_bus(bridge->bus) &&
+	    !pci_is_root_bus(bridge->bus->self->bus)) {
+		pci_setup_bridge_resources(bus, type);
+		return;
+	}
+
+	/* Fixup the resoureces */
+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
+		if (!r->flags || !r->parent)
+			continue;
+
+		w = NULL;
+		if (r->flags & type & IORESOURCE_IO)
+			w = &hose->io_resource;
+		else if (pnv_pci_is_mem_pref_64(r->flags) &&
+			 (type & IORESOURCE_PREFETCH) &&
+			 phb->ioda.m64_segsize)
+			w = &hose->mem_resources[1];
+		else if (r->flags & type & IORESOURCE_MEM)
+			w = &hose->mem_resources[0];
+
+		r->start = w->start;
+		r->end = w->end;
+	}
+
+	/* Update the resources */
+	pci_setup_bridge_resources(bus, type);
+}
+
+static void pnv_pci_setup_bridge(struct pci_bus *bus,
+				 unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct pci_dev *parent;
+	struct pnv_ioda_pe *pe;
+
+	/* The PCI bus might be behind a PCIE-to-PCI bridge. For that
+	 * case, the PCI bus should have been included to one PE. So
+	 * we needn't assign PE for it again.
+	 */
+	parent = bridge->bus ? bridge->bus->self : NULL;
+	while (parent) {
+		if (pci_pcie_type(parent) == PCI_EXP_TYPE_PCI_BRIDGE)
+			return;
+
+		parent = parent->bus ? parent->bus->self : NULL;
+	}
+
+	/* Assign PE to root bus, which would be the parent PE and
+	 * should be populated prior to any other PEs.
+	 */
+	if (!phb->ioda.root_pe_populated) {
+		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, 0);
+		if (pe && phb->ioda.root_pe_no == IODA_INVALID_PE)
+			phb->ioda.root_pe_no = pe->pe_number;
+		phb->ioda.root_pe_populated = 1;
+	}
+
+	/* Extend bridge's windows if necessary */
+	pnv_pci_fixup_bridge_resources(bus, type);
+
+	/* Don't assign PE to bus, which doesn't have any subordinate
+	 * PCI devices on it.
+	 */
+	if (list_empty(&bus->devices))
+		return;
+
+	/* Reserve PEs for M64 resource */
+	if (phb->reserve_m64_pe)
+		phb->reserve_m64_pe(phb, bus);
+
+	/* Assign PE. We might run here because of partial hotplug.
+	 * For the case, we just pick up the existing PE and should
+	 * not allocate resources again.
+	 */
+	if (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
+		pe = pnv_ioda_setup_bus_PE(bus, 1);
+	else
+		pe = pnv_ioda_setup_bus_PE(bus, 0);
+	if (!pe)
+		return;
+
+	/* Setup MMIO mapping */
+	pnv_ioda_setup_pe_seg(hose, pe);
+
+	/* Setup DMA */
+	pnv_pci_ioda_setup_dma_pe(phb, pe);
+}
+
 #ifdef CONFIG_PCI_IOV
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
@@ -2901,7 +2987,22 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
+
+	/* Choose number of PE for root bus, which shouldn't consume
+	 * any M64 resource. So we avoid picking low-end PE#, which
+	 * is usually binding with 64-bits prefetchable memory resources
+	 * closely.
+	 */
+	pnv_ioda_reserve_pe(phb, phb->ioda.reserved_pe);
+	if (phb->ioda.reserved_pe == 0) {
+		phb->ioda.root_pe_no = phb->ioda.total_pe - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_no);
+	} else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1)) {
+		phb->ioda.root_pe_no = phb->ioda.reserved_pe - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_no);
+	} else {
+		phb->ioda.root_pe_no = IODA_INVALID_PE;
+	}
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
@@ -2910,6 +3011,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	/* Calculate how many 32-bit TCE segments we have */
 	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
 
+	/* Invalidate RID to PE# mapping */
+	memset(phb->ioda.pe_rmap, 0xff, sizeof(phb->ioda.pe_rmap));
+
 #if 0 /* We should really do that ... */
 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
 					 window_type,
@@ -2958,6 +3062,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 	pnv_pci_controller_ops.enable_device_hook = pnv_pci_enable_device_hook;
+	pnv_pci_controller_ops.setup_bridge = pnv_pci_setup_bridge;
 	pnv_pci_controller_ops.window_alignment = pnv_pci_window_alignment;
 	pnv_pci_controller_ops.reset_secondary_bus = pnv_pci_reset_secondary_bus;
 	hose->controller_ops = pnv_pci_controller_ops;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 2784951..1bea3a8 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -134,6 +134,8 @@ struct pnv_phb {
 			/* Global bridge info */
 			unsigned int		total_pe;
 			unsigned int		reserved_pe;
+			unsigned int		root_pe_no;
+			unsigned int		root_pe_populated;
 
 			/* 32-bit MMIO window */
 			unsigned int		m32_size;
@@ -176,7 +178,7 @@ struct pnv_phb {
 			 * we are to support more than 256 PEs, indexed
 			 * bus { bus, devfn }
 			 */
-			unsigned char		pe_rmap[0x10000];
+			unsigned int		pe_rmap[0x10000];
 
 			/* 32-bit TCE tables allocation */
 			unsigned int		dma_weight;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 06/21] powerpc/powernv: Create PEs dynamically
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

Currently, the PEs and their associated resources are assigned
in ppc_md.pcibios_fixup(). The function is called for once after
PCI probing and resources assignment are finished. Obviously, it's
not hotplug friendly. The patch creates PEs dynamically by
ppc_md.pcibios_setup_bridge(), which is called on the event during
system bootup and PCI hotplug: updating PCI bridge's windows after
resource assignment/reassignment are finished. For partial hotplug
case, where not all PCI devices belonging to the PE are unplugged
and plugged again, we just need unbinding/binding the affected
PCI devices with the corresponding PE without creating new one.

Besides, it might require addtional resources (e.g. M32) to the
windows of the PCI bridge when unplugging current adapter, and
insert a different adapter if there is one PCI slot, which is
assumed behind root port, or the downstream bridge of the PCIE
switch behind root port. The parent bridge of the newly plugged
adapter would reject the request to add more resources, leading
to hotplug failure. For the issue, the patch extends the windows
of root port, or the upstream port of the PCIe switch behind root
port to PHB's windows when ppc_md.pcibios_setup_bridge() is called.

There is no upstream bridge for root bus, so we have to reserve
PE#, which is next to the reserved PE# in advance and fixing the
PE for root bus in ppc_md.pcibios_setup_bridge().

The patch also changes the rule assigning PE#: PE# reserved for
prefetchable 64-bits memory resource and SRIOV VFs starts from
zero while PE# for dynamic allocations starts from ioda.total_pe
reversely. It's because PE# for prefetchable 64-bits memory resource,
which is ually allocated begining with the PHB's aperatus and PE#
and the resource have fixed mapping. The PE# for dynamic allocation
is quite flexible and has no limitation.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |   1 +
 arch/powerpc/kernel/pci-common.c          |  10 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 307 ++++++++++++++++++++----------
 arch/powerpc/platforms/powernv/pci.h      |   4 +-
 4 files changed, 220 insertions(+), 102 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 1811c44..5367eb3 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -29,6 +29,7 @@ struct pci_controller_ops {
 
 	/* Called during PCI resource reassignment */
 	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
+	void		(*setup_bridge)(struct pci_bus *, unsigned long);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
 };
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0d05406..01d2a84 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -134,6 +134,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
 	pci_reset_secondary_bus(dev);
 }
 
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+
+	if (hose->controller_ops.setup_bridge)
+		hose->controller_ops.setup_bridge(bus, type);
+	else
+		pci_setup_bridge_resources(bus, type);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9ef745e..910fb67 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -143,18 +143,23 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 
 static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
-	unsigned long pe;
+	unsigned long pe_no;
+	unsigned long limit = phb->ioda.total_pe - 1;
 
 	do {
-		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe, 0);
-		if (pe >= phb->ioda.total_pe)
+		pe_no = find_next_zero_bit(phb->ioda.pe_alloc,
+					   phb->ioda.total_pe, limit);
+		if (pe_no < phb->ioda.total_pe &&
+		    !test_and_set_bit(pe_no, phb->ioda.pe_alloc))
+			break;
+
+		if (--limit >= phb->ioda.total_pe)
 			return IODA_INVALID_PE;
-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
+	} while(1);
 
-	phb->ioda.pe_array[pe].phb = phb;
-	phb->ioda.pe_array[pe].pe_number = pe;
-	return pe;
+	phb->ioda.pe_array[pe_no].phb = phb;
+	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+	return pe_no;
 }
 
 static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
@@ -214,6 +219,13 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe);
 
+	/* Strip of the segment used by PE for PCI root bus,
+	 * which is last supported PE#, or one next to the
+	 * reserved PE#
+	 */
+	if (phb->ioda.root_pe_no != IODA_INVALID_PE)
+		r->end -= phb->ioda.m64_segsize;
+
 	return 0;
 
 fail:
@@ -264,13 +276,24 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 	 */
 	r = &phb->hose->mem_resources[1];
 	if (phb->ioda.reserved_pe == 0)
-		r->start += phb->ioda.m64_segsize;
+		r->start += (phb->ioda.root_pe_no != IODA_INVALID_PE ?
+			     phb->ioda.m64_segsize * 2 :
+			     phb->ioda.m64_segsize);
 	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
-		r->end -= phb->ioda.m64_segsize;
+		r->end -= (phb->ioda.root_pe_no != IODA_INVALID_PE ?
+			   phb->ioda.m64_segsize * 2 :
+			   phb->ioda.m64_segsize);
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe);
 
+	/* Strip of the segment used by PE for PCI root bus,
+	 * which is last supported PE#, or one next to the
+	 * reserved PE#
+	 */
+	if (phb->ioda.root_pe_no != IODA_INVALID_PE)
+		r->end -= phb->ioda.m64_segsize;
+
 	return 0;
 
 fail:
@@ -837,7 +860,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	/* Clear the reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = 0;
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
 
 	/* Release from all parents PELT-V */
 	while (parent) {
@@ -1172,11 +1195,18 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_dn *pdn = pci_get_pdn(dev);
 
-		if (pdn == NULL) {
-			pr_warn("%s: No device node associated with device !\n",
-				pci_name(dev));
+		if (!pdn) {
+			dev_warn(&dev->dev, "%s: No associated PCI data\n",
+				 __func__);
 			continue;
 		}
+
+		/* The PCI device might have been associated with the PE in
+		 * case of partial hotplug.
+		 */
+		if (pdn->pe_number != IODA_INVALID_PE)
+			continue;
+
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dev_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1190,15 +1220,31 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
  * subordinate PCI devices and buses. The second type of PE is normally
  * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
  */
-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
 	struct pnv_ioda_pe *pe;
 	int pe_num = IODA_INVALID_PE;
 
+	/* For partial hotplug case, the PE instance hasn't been destroyed
+	 * yet. We shouldn't allocated a new one and assign resources to
+	 * it. The existing PE instance should be reused, but we should
+	 * associate the devices to the PE.
+	 */
+	pe_num = phb->ioda.pe_rmap[bus->number << 8];
+	if (pe_num != IODA_INVALID_PE) {
+		pe = &phb->ioda.pe_array[pe_num];
+		pnv_ioda_setup_same_PE(bus, pe);
+		return NULL;
+	}
+
+	/* PE number for root bus should have been reserved */
+	if (pci_is_root_bus(bus))
+		pe_num = phb->ioda.root_pe_no;
+
 	/* Check if PE is determined by M64 */
-	if (phb->pick_m64_pe)
+	if (pe_num == IODA_INVALID_PE && phb->pick_m64_pe)
 		pe_num = phb->pick_m64_pe(phb, bus, all);
 
 	/* The PE number isn't pinned by M64 */
@@ -1208,7 +1254,7 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	if (pe_num == IODA_INVALID_PE) {
 		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
 			__func__, pci_domain_nr(bus), bus->number);
-		return;
+		return NULL;
 	}
 
 	pe = &phb->ioda.pe_array[pe_num];
@@ -1220,18 +1266,18 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	pe->dma_weight = 0;
 
 	if (all)
-		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
-			bus->busn_res.start, bus->busn_res.end, pe_num);
+		pe_info(pe, "Secondary bus %d..%d associated\n",
+			bus->busn_res.start, bus->busn_res.end);
 	else
-		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
-			bus->busn_res.start, pe_num);
+		pe_info(pe, "Secondary bus %d associated\n",
+			bus->busn_res.start);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
 		if (pe_num)
 			pnv_ioda_free_pe(phb, pe_num);
 		pe->pbus = NULL;
-		return;
+		return NULL;
 	}
 
 	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
@@ -1246,46 +1292,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 
 	/* Link the PE */
 	pnv_ioda_link_pe_by_weight(phb, pe);
-}
-
-static void pnv_ioda_setup_PEs(struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	pnv_ioda_setup_bus_PE(bus, 0);
 
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		if (dev->subordinate) {
-			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
-				pnv_ioda_setup_bus_PE(dev->subordinate, 1);
-			else
-				pnv_ioda_setup_PEs(dev->subordinate);
-		}
-	}
-}
-
-/*
- * Configure PEs so that the downstream PCI buses and devices
- * could have their associated PE#. Unfortunately, we didn't
- * figure out the way to identify the PLX bridge yet. So we
- * simply put the PCI bus and the subordinate behind the root
- * port to PE# here. The game rule here is expected to be changed
- * as soon as we can detected PLX bridge correctly.
- */
-static void pnv_pci_ioda_setup_PEs(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-
-		/* M64 layout might affect PE allocation */
-		if (phb->reserve_m64_pe)
-			phb->reserve_m64_pe(phb, phb->hose->bus);
-
-		pnv_ioda_setup_PEs(hose->bus);
-	}
+	return pe;
 }
 
 #ifdef CONFIG_PCI_IOV
@@ -2200,14 +2208,6 @@ void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
 }
 
-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
-{
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link)
-		pnv_pci_ioda_setup_dma_pe(phb, pe);
-}
-
 #ifdef CONFIG_PCI_MSI
 static void pnv_ioda2_msi_eoi(struct irq_data *d)
 {
@@ -2649,34 +2649,6 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 	}
 }
 
-static void pnv_pci_ioda_setup_seg(void)
-{
-	struct pci_controller *tmp, *hose;
-	struct pnv_phb *phb;
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			pnv_ioda_setup_pe_seg(hose, pe);
-		}
-	}
-}
-
-static void pnv_pci_ioda_setup_DMA(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		pnv_ioda_setup_dma(hose->private_data);
-
-		/* Mark the PHB initialization done */
-		phb = hose->private_data;
-		phb->initialized = 1;
-	}
-}
-
 static void pnv_pci_ioda_create_dbgfs(void)
 {
 #ifdef CONFIG_DEBUG_FS
@@ -2698,9 +2670,14 @@ static void pnv_pci_ioda_create_dbgfs(void)
 
 static void pnv_pci_ioda_fixup(void)
 {
-	pnv_pci_ioda_setup_PEs();
-	pnv_pci_ioda_setup_seg();
-	pnv_pci_ioda_setup_DMA();
+	struct pci_controller *tmp, *hose;
+	struct pnv_phb *phb;
+
+	/* Notify initialization of PHB done */
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+		phb = hose->private_data;
+		phb->initialized = 1;
+	}
 
 	pnv_pci_ioda_create_dbgfs();
 
@@ -2751,6 +2728,115 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }
 
+/*
+ * We are updating root port or the upstream bridge behind the root
+ * port with PHB's various windows in order to accomodate the changes
+ * on required resources during PCI (slot) hotplug, which is connected
+ * to either root port, or the downstream ports of PCIe switch behind
+ * the root port.
+ */
+static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
+					   unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct resource *r, *w;
+	int i;
+
+	/* Check if we need apply fixup to the bridge's resources */
+	if (!pci_is_root_bus(bridge->bus) &&
+	    !pci_is_root_bus(bridge->bus->self->bus)) {
+		pci_setup_bridge_resources(bus, type);
+		return;
+	}
+
+	/* Fixup the resoureces */
+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
+		if (!r->flags || !r->parent)
+			continue;
+
+		w = NULL;
+		if (r->flags & type & IORESOURCE_IO)
+			w = &hose->io_resource;
+		else if (pnv_pci_is_mem_pref_64(r->flags) &&
+			 (type & IORESOURCE_PREFETCH) &&
+			 phb->ioda.m64_segsize)
+			w = &hose->mem_resources[1];
+		else if (r->flags & type & IORESOURCE_MEM)
+			w = &hose->mem_resources[0];
+
+		r->start = w->start;
+		r->end = w->end;
+	}
+
+	/* Update the resources */
+	pci_setup_bridge_resources(bus, type);
+}
+
+static void pnv_pci_setup_bridge(struct pci_bus *bus,
+				 unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct pci_dev *parent;
+	struct pnv_ioda_pe *pe;
+
+	/* The PCI bus might be behind a PCIE-to-PCI bridge. For that
+	 * case, the PCI bus should have been included to one PE. So
+	 * we needn't assign PE for it again.
+	 */
+	parent = bridge->bus ? bridge->bus->self : NULL;
+	while (parent) {
+		if (pci_pcie_type(parent) == PCI_EXP_TYPE_PCI_BRIDGE)
+			return;
+
+		parent = parent->bus ? parent->bus->self : NULL;
+	}
+
+	/* Assign PE to root bus, which would be the parent PE and
+	 * should be populated prior to any other PEs.
+	 */
+	if (!phb->ioda.root_pe_populated) {
+		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, 0);
+		if (pe && phb->ioda.root_pe_no == IODA_INVALID_PE)
+			phb->ioda.root_pe_no = pe->pe_number;
+		phb->ioda.root_pe_populated = 1;
+	}
+
+	/* Extend bridge's windows if necessary */
+	pnv_pci_fixup_bridge_resources(bus, type);
+
+	/* Don't assign PE to bus, which doesn't have any subordinate
+	 * PCI devices on it.
+	 */
+	if (list_empty(&bus->devices))
+		return;
+
+	/* Reserve PEs for M64 resource */
+	if (phb->reserve_m64_pe)
+		phb->reserve_m64_pe(phb, bus);
+
+	/* Assign PE. We might run here because of partial hotplug.
+	 * For the case, we just pick up the existing PE and should
+	 * not allocate resources again.
+	 */
+	if (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
+		pe = pnv_ioda_setup_bus_PE(bus, 1);
+	else
+		pe = pnv_ioda_setup_bus_PE(bus, 0);
+	if (!pe)
+		return;
+
+	/* Setup MMIO mapping */
+	pnv_ioda_setup_pe_seg(hose, pe);
+
+	/* Setup DMA */
+	pnv_pci_ioda_setup_dma_pe(phb, pe);
+}
+
 #ifdef CONFIG_PCI_IOV
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
@@ -2901,7 +2987,22 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
+
+	/* Choose number of PE for root bus, which shouldn't consume
+	 * any M64 resource. So we avoid picking low-end PE#, which
+	 * is usually binding with 64-bits prefetchable memory resources
+	 * closely.
+	 */
+	pnv_ioda_reserve_pe(phb, phb->ioda.reserved_pe);
+	if (phb->ioda.reserved_pe == 0) {
+		phb->ioda.root_pe_no = phb->ioda.total_pe - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_no);
+	} else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1)) {
+		phb->ioda.root_pe_no = phb->ioda.reserved_pe - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_no);
+	} else {
+		phb->ioda.root_pe_no = IODA_INVALID_PE;
+	}
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
@@ -2910,6 +3011,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	/* Calculate how many 32-bit TCE segments we have */
 	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
 
+	/* Invalidate RID to PE# mapping */
+	memset(phb->ioda.pe_rmap, 0xff, sizeof(phb->ioda.pe_rmap));
+
 #if 0 /* We should really do that ... */
 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
 					 window_type,
@@ -2958,6 +3062,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 	pnv_pci_controller_ops.enable_device_hook = pnv_pci_enable_device_hook;
+	pnv_pci_controller_ops.setup_bridge = pnv_pci_setup_bridge;
 	pnv_pci_controller_ops.window_alignment = pnv_pci_window_alignment;
 	pnv_pci_controller_ops.reset_secondary_bus = pnv_pci_reset_secondary_bus;
 	hose->controller_ops = pnv_pci_controller_ops;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 2784951..1bea3a8 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -134,6 +134,8 @@ struct pnv_phb {
 			/* Global bridge info */
 			unsigned int		total_pe;
 			unsigned int		reserved_pe;
+			unsigned int		root_pe_no;
+			unsigned int		root_pe_populated;
 
 			/* 32-bit MMIO window */
 			unsigned int		m32_size;
@@ -176,7 +178,7 @@ struct pnv_phb {
 			 * we are to support more than 256 PEs, indexed
 			 * bus { bus, devfn }
 			 */
-			unsigned char		pe_rmap[0x10000];
+			unsigned int		pe_rmap[0x10000];
 
 			/* 32-bit TCE tables allocation */
 			unsigned int		dma_weight;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 07/21] powerpc/powernv: Release PEs dynamically
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The original code doesn't support releasing PEs dynamically, meaning
that PE and the associated resources (IO, M32, M64 and DMA) can't
be released when unplugging a PCI adapter from one hotpluggable slot.

The patch takes object oriented methodology, introducs reference
count to PE, which is initialized to 1 and increased with 1 when a
new PCI device joins the PE. Once the last PCI device leaves the
PE, the PE is going to be release together with its associated
(IO, M32, M64, DMA) resources.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |   3 +
 arch/powerpc/kernel/pci-hotplug.c         |   5 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 658 +++++++++++++++++++-----------
 arch/powerpc/platforms/powernv/pci.h      |   4 +-
 4 files changed, 432 insertions(+), 238 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 5367eb3..a6ad4b1 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -31,6 +31,9 @@ struct pci_controller_ops {
 	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
 	void		(*setup_bridge)(struct pci_bus *, unsigned long);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
+
+	/* Called when PCI device is released */
+	void		(*release_device)(struct pci_dev *);
 };
 
 /*
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7ed85a6..0040343 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -29,6 +29,11 @@
  */
 void pcibios_release_device(struct pci_dev *dev)
 {
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	if (hose->controller_ops.release_device)
+		hose->controller_ops.release_device(dev);
+
 	eeh_remove_device(dev);
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 910fb67..ef8c216 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -12,6 +12,8 @@
 #undef DEBUG
 
 #include <linux/kernel.h>
+#include <linux/atomic.h>
+#include <linux/kref.h>
 #include <linux/pci.h>
 #include <linux/crash_dump.h>
 #include <linux/debugfs.h>
@@ -47,6 +49,8 @@
 /* 256M DMA window, 4K TCE pages, 8 bytes TCE */
 #define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
 
+static void pnv_ioda_release_pe(struct kref *kref);
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...)
 {
@@ -123,25 +127,400 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
-static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
+static inline void pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
 {
-	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
-		pr_warn("%s: Invalid PE %d on PHB#%x\n",
-			__func__, pe_no, phb->hose->global_number);
+	if (!pe)
+		return;
+
+	kref_get(&pe->kref);
+}
+
+static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
+{
+	unsigned int count;
+
+	if (!pe)
 		return;
+
+	/*
+	 * The count is initialized to 1 and increased with 1 when
+	 * a new PCI device is bound with the PE. Once the last PCI
+	 * device is leaving from the PE, the PE is going to be
+	 * released.
+	 */
+	count = atomic_read(&pe->kref.refcount);
+	if (count == 2)
+		kref_sub(&pe->kref, 2, pnv_ioda_release_pe);
+	else
+		kref_put(&pe->kref, pnv_ioda_release_pe);
+}
+
+static void pnv_pci_release_device(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	struct pnv_ioda_pe *pe;
+
+	if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+		pe = &phb->ioda.pe_array[pdn->pe_number];
+		pnv_ioda_pe_put(pe);
+		pdn->pe_number = IODA_INVALID_PE;
 	}
+}
 
-	if (test_and_set_bit(pe_no, phb->ioda.pe_alloc)) {
-		pr_warn("%s: PE %d was assigned on PHB#%x\n",
-			__func__, pe_no, phb->hose->global_number);
+static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	int index, count;
+	unsigned long tbl_addr, tbl_size;
+
+	/* No DMA capability for slave PEs */
+	if (pe->flags & PNV_IODA_PE_SLAVE)
+		return;
+
+	/* Bypass DMA window */
+	if (phb->type == PNV_PHB_IODA2 &&
+	    pe->tce_bypass_enabled &&
+	    pe->tce32_table &&
+	    pe->tce32_table->set_bypass)
+		pe->tce32_table->set_bypass(pe->tce32_table, false);
+
+	/* 32-bits DMA window */
+	count = pe->tce32_seg_end - pe->tce32_seg_start;
+	tbl_addr = pe->tce32_table->it_base;
+	if (!count)
 		return;
+
+	/* Free IOMMU table */
+	iommu_free_table(pe->tce32_table,
+			 of_node_full_name(phb->hose->dn));
+
+	/* Deconfigure TCE table */
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		for (index = 0; index < count; index++)
+			opal_pci_map_pe_dma_window(phb->opal_id,
+						   pe->pe_number,
+						   pe->tce32_seg_start + index,
+						   1,
+						   __pa(tbl_addr) +
+						   index * TCE32_TABLE_SIZE,
+						   0,
+						   0x1000);
+		bitmap_clear(phb->ioda.tce32_segmap,
+			     pe->tce32_seg_start,
+			     count);
+		tbl_size = TCE32_TABLE_SIZE * count;
+		break;
+	case PNV_PHB_IODA2:
+		opal_pci_map_pe_dma_window(phb->opal_id,
+					   pe->pe_number,
+					   pe->pe_number << 1,
+					   1,
+					   __pa(tbl_addr),
+					   0,
+					   0x1000);
+		tbl_size = (1ul << ilog2(phb->ioda.m32_pci_base));
+		tbl_size = (tbl_size >> IOMMU_PAGE_SHIFT_4K) * 8;
+		break;
+	default:
+		pe_warn(pe, "Unsupported PHB type %d\n", phb->type);
+		return;
+	}
+
+	/* Free memory of IOMMU table */
+	free_pages(tbl_addr, get_order(tbl_size));
+	pe->tce32_table = NULL;
+	pe->tce32_seg_start = 0;
+	pe->tce32_seg_end = 0;
+}
+
+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	unsigned long *segmap = NULL, *pe_segmap = NULL;
+	int i;
+	uint16_t win, win_type[] = { OPAL_IO_WINDOW_TYPE,
+				     OPAL_M32_WINDOW_TYPE,
+				     OPAL_M64_WINDOW_TYPE };
+
+	for (win = 0; win < ARRAY_SIZE(win_type); win++) {
+		switch (win_type[win]) {
+		case OPAL_IO_WINDOW_TYPE:
+			segmap = phb->ioda.io_segmap;
+			pe_segmap = pe->io_segmap;
+			break;
+		case OPAL_M32_WINDOW_TYPE:
+			segmap = phb->ioda.m32_segmap;
+			pe_segmap = pe->m32_segmap;
+			break;
+		case OPAL_M64_WINDOW_TYPE:
+			segmap = phb->ioda.m64_segmap;
+			pe_segmap = pe->m64_segmap;
+			break;
+		}
+		i = -1;
+		while ((i = find_next_bit(pe_segmap,
+			phb->ioda.total_pe, i + 1)) < phb->ioda.total_pe) {
+			if (win_type[win] == OPAL_IO_WINDOW_TYPE ||
+			    win_type[win] == OPAL_M32_WINDOW_TYPE)
+				opal_pci_map_pe_mmio_window(phb->opal_id,
+						phb->ioda.reserved_pe,
+						win_type[win], 0, i);
+			else if (phb->type == PNV_PHB_IODA1)
+				opal_pci_map_pe_mmio_window(phb->opal_id,
+						phb->ioda.reserved_pe,
+						win_type[win],
+						i / 8, i % 8);
+
+			clear_bit(i, pe_segmap);
+			clear_bit(i, segmap);
+		}
+	}
+}
+
+static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
+				  struct pnv_ioda_pe *parent,
+				  struct pnv_ioda_pe *child,
+				  bool is_add)
+{
+	const char *desc = is_add ? "adding" : "removing";
+	uint8_t op = is_add ? OPAL_ADD_PE_TO_DOMAIN :
+			      OPAL_REMOVE_PE_FROM_DOMAIN;
+	struct pnv_ioda_pe *slave;
+	long rc;
+
+	/* Parent PE affects child PE */
+	rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
+				child->pe_number, op);
+	if (rc != OPAL_SUCCESS) {
+		pe_warn(child, "OPAL error %ld %s to parent PELTV\n",
+			rc, desc);
+		return -ENXIO;
+	}
+
+	if (!(child->flags & PNV_IODA_PE_MASTER))
+		return 0;
+
+	/* Compound case: parent PE affects slave PEs */
+	list_for_each_entry(slave, &child->slaves, list) {
+		rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
+					slave->pe_number, op);
+		if (rc != OPAL_SUCCESS) {
+			pe_warn(slave, "OPAL error %ld %s to parent PELTV\n",
+				rc, desc);
+			return -ENXIO;
+		}
+	}
+
+	return 0;
+}
+
+static int pnv_ioda_set_peltv(struct pnv_ioda_pe *pe, bool is_add)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct pnv_ioda_pe *slave;
+	struct pci_dev *pdev = NULL;
+	int ret;
+
+	/*
+	 * Clear PE frozen state. If it's master PE, we need
+	 * clear slave PE frozen state as well.
+	 */
+	opal_pci_eeh_freeze_clear(phb->opal_id,
+				  pe->pe_number,
+				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry(slave, &pe->slaves, list) {
+			opal_pci_eeh_freeze_clear(phb->opal_id,
+						  slave->pe_number,
+						  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+		}
+	}
+
+	/*
+	 * Associate PE in PELT. We need add the PE into the
+	 * corresponding PELT-V as well. Otherwise, the error
+	 * originated from the PE might contribute to other
+	 * PEs.
+	 */
+	ret = pnv_ioda_set_one_peltv(phb, pe, pe, is_add);
+	if (ret)
+		return ret;
+
+	/* For compound PEs, any one affects all of them */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry(slave, &pe->slaves, list) {
+			ret = pnv_ioda_set_one_peltv(phb, slave, pe, is_add);
+			if (ret)
+				return ret;
+		}
+	}
+
+	if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
+		pdev = pe->pbus->self;
+	else if (pe->flags & PNV_IODA_PE_DEV)
+		pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		pdev = pe->parent_dev->bus->self;
+#endif /* CONFIG_PCI_IOV */
+
+	while (pdev) {
+		struct pci_dn *pdn = pci_get_pdn(pdev);
+		struct pnv_ioda_pe *parent;
+
+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+			parent = &phb->ioda.pe_array[pdn->pe_number];
+			ret = pnv_ioda_set_one_peltv(phb, parent, pe, is_add);
+			if (ret)
+				return ret;
+		}
+
+		pdev = pdev->bus->self;
+	}
+
+	return 0;
+}
+
+static void pnv_ioda_deconfigure_pe(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct pci_dev *parent;
+	uint8_t bcomp, dcomp, fcomp;
+	long rid_end, rid;
+	int64_t rc;
+
+	/* Tear down MVE */
+	if (phb->type == PNV_PHB_IODA1 &&
+	    pe->mve_number != -1) {
+		rc = opal_pci_set_mve(phb->opal_id,
+				      pe->mve_number,
+				      phb->ioda.reserved_pe);
+		if (rc != OPAL_SUCCESS)
+			pe_warn(pe, "Error %lld unmapping MVE#%d\n",
+				rc, pe->mve_number);
+		rc = opal_pci_set_mve_enable(phb->opal_id,
+					     pe->mve_number,
+					     OPAL_DISABLE_MVE);
+		if (rc != OPAL_SUCCESS)
+			pe_warn(pe, "Error %lld disabling MVE#%d\n",
+				rc, pe->mve_number);
+		pe->mve_number = -1;
+	}
+
+	/* Unmapping PELTV */
+	pnv_ioda_set_peltv(pe, false);
+
+	/* To unmap PELTM */
+	if (pe->pbus) {
+		int count;
+
+		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
+		parent = pe->pbus->self;
+		if (pe->flags & PNV_IODA_PE_BUS_ALL)
+			count = pe->pbus->busn_res.end -
+				pe->pbus->busn_res.start + 1;
+		else
+			count = 1;
+
+		switch(count) {
+		case  1: bcomp = OpalPciBusAll;   break;
+		case  2: bcomp = OpalPciBus7Bits; break;
+		case  4: bcomp = OpalPciBus6Bits; break;
+		case  8: bcomp = OpalPciBus5Bits; break;
+		case 16: bcomp = OpalPciBus4Bits; break;
+		case 32: bcomp = OpalPciBus3Bits; break;
+		default:
+			/* Fail back to case of one bus */
+			pe_warn(pe, "Cannot support %d buses\n", count);
+			bcomp = OpalPciBusAll;
+		}
+		rid_end = pe->rid + (count << 8);
+	} else {
+#ifdef CONFIG_PCI_IOV
+		if (pe->flags & PNV_IODA_PE_VF)
+			parent = pe->parent_dev;
+		else
+#endif
+			parent = pe->pdev->bus->self;
+		bcomp = OpalPciBusAll;
+		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
+		rid_end = pe->rid + 1;
+	}
+
+	/* Clear RID mapping */
+	for (rid = pe->rid; rid < rid_end; rid++)
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
+
+	/* Unmapping PELTM */
+	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
+			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
+	if (rc)
+		pe_warn(pe, "Error %ld unmapping PELTM\n", rc);
+}
+
+static void pnv_ioda_release_pe(struct kref *kref)
+{
+	struct pnv_ioda_pe *pe = container_of(kref, struct pnv_ioda_pe, kref);
+	struct pnv_ioda_pe *tmp, *slave;
+	struct pnv_phb *phb = pe->phb;
+
+	pnv_ioda_release_pe_dma(pe);
+	pnv_ioda_release_pe_seg(pe);
+	pnv_ioda_deconfigure_pe(pe);
+
+	/* Release slave PEs for compound PE */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
+			pnv_ioda_pe_put(slave);
+	}
+
+	/* Remove the PE from various list. We need remove slave
+	 * PE from master's list.
+	 */
+	list_del(&pe->dma_link);
+	list_del(&pe->list);
+
+	/* Free PE number */
+	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
+}
+
+static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb,
+					    int pe_no)
+{
+	struct pnv_ioda_pe *pe = &phb->ioda.pe_array[pe_no];
+
+	kref_init(&pe->kref);
+	pe->phb = phb;
+	pe->pe_number = pe_no;
+	INIT_LIST_HEAD(&pe->dma_link);
+	INIT_LIST_HEAD(&pe->list);
+
+	return pe;
+}
+
+static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb,
+					       int pe_no)
+{
+	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
+		pr_warn("%s: Invalid PE %d on PHB#%x\n",
+			__func__, pe_no, phb->hose->global_number);
+		return NULL;
 	}
 
-	phb->ioda.pe_array[pe_no].phb = phb;
-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+	/*
+	 * Same PE might be reserved for multiple times, which
+	 * is out of problem actually.
+	 */
+	set_bit(pe_no, phb->ioda.pe_alloc);
+	return pnv_ioda_init_pe(phb, pe_no);
 }
 
-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe_no;
 	unsigned long limit = phb->ioda.total_pe - 1;
@@ -154,20 +533,10 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 			break;
 
 		if (--limit >= phb->ioda.total_pe)
-			return IODA_INVALID_PE;
+			return NULL;
 	} while(1);
 
-	phb->ioda.pe_array[pe_no].phb = phb;
-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
-	return pe_no;
-}
-
-static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
-{
-	WARN_ON(phb->ioda.pe_array[pe].pdev);
-
-	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
-	clear_bit(pe, phb->ioda.pe_alloc);
+	return pnv_ioda_init_pe(phb, pe_no);
 }
 
 static int pnv_ioda1_init_m64(struct pnv_phb *phb)
@@ -382,8 +751,9 @@ static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb,
 	}
 }
 
-static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
-				struct pci_bus *bus, int all)
+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
+						struct pci_bus *bus,
+						int all)
 {
 	resource_size_t segsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -394,14 +764,14 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	int i;
 
 	if (!pnv_ioda_need_m64_pe(phb, bus))
-		return IODA_INVALID_PE;
+		return NULL;
 
         /* Allocate bitmap */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
 	pe_bitsmap = kzalloc(size, GFP_KERNEL);
 	if (!pe_bitsmap) {
 		pr_warn("%s: Out of memory !\n", __func__);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/* The bridge's M64 window might be extended to PHB's M64
@@ -438,7 +808,7 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	/* No M64 window found ? */
 	if (bitmap_empty(pe_bitsmap, phb->ioda.total_pe)) {
 		kfree(pe_bitsmap);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/* Figure out the master PE and put all slave PEs
@@ -491,7 +861,7 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	}
 
 	kfree(pe_bitsmap);
-	return master_pe->pe_number;
+	return master_pe;
 }
 
 static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
@@ -695,7 +1065,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
  * but in the meantime, we need to protect them to avoid warnings
  */
 #ifdef CONFIG_PCI_MSI
-static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
+static struct pnv_ioda_pe *pnv_ioda_pci_dev_to_pe(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -709,191 +1079,6 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 }
 #endif /* CONFIG_PCI_MSI */
 
-static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
-				  struct pnv_ioda_pe *parent,
-				  struct pnv_ioda_pe *child,
-				  bool is_add)
-{
-	const char *desc = is_add ? "adding" : "removing";
-	uint8_t op = is_add ? OPAL_ADD_PE_TO_DOMAIN :
-			      OPAL_REMOVE_PE_FROM_DOMAIN;
-	struct pnv_ioda_pe *slave;
-	long rc;
-
-	/* Parent PE affects child PE */
-	rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
-				child->pe_number, op);
-	if (rc != OPAL_SUCCESS) {
-		pe_warn(child, "OPAL error %ld %s to parent PELTV\n",
-			rc, desc);
-		return -ENXIO;
-	}
-
-	if (!(child->flags & PNV_IODA_PE_MASTER))
-		return 0;
-
-	/* Compound case: parent PE affects slave PEs */
-	list_for_each_entry(slave, &child->slaves, list) {
-		rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
-					slave->pe_number, op);
-		if (rc != OPAL_SUCCESS) {
-			pe_warn(slave, "OPAL error %ld %s to parent PELTV\n",
-				rc, desc);
-			return -ENXIO;
-		}
-	}
-
-	return 0;
-}
-
-static int pnv_ioda_set_peltv(struct pnv_phb *phb,
-			      struct pnv_ioda_pe *pe,
-			      bool is_add)
-{
-	struct pnv_ioda_pe *slave;
-	struct pci_dev *pdev = NULL;
-	int ret;
-
-	/*
-	 * Clear PE frozen state. If it's master PE, we need
-	 * clear slave PE frozen state as well.
-	 */
-	if (is_add) {
-		opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
-					  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-		if (pe->flags & PNV_IODA_PE_MASTER) {
-			list_for_each_entry(slave, &pe->slaves, list)
-				opal_pci_eeh_freeze_clear(phb->opal_id,
-							  slave->pe_number,
-							  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-		}
-	}
-
-	/*
-	 * Associate PE in PELT. We need add the PE into the
-	 * corresponding PELT-V as well. Otherwise, the error
-	 * originated from the PE might contribute to other
-	 * PEs.
-	 */
-	ret = pnv_ioda_set_one_peltv(phb, pe, pe, is_add);
-	if (ret)
-		return ret;
-
-	/* For compound PEs, any one affects all of them */
-	if (pe->flags & PNV_IODA_PE_MASTER) {
-		list_for_each_entry(slave, &pe->slaves, list) {
-			ret = pnv_ioda_set_one_peltv(phb, slave, pe, is_add);
-			if (ret)
-				return ret;
-		}
-	}
-
-	if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
-		pdev = pe->pbus->self;
-	else if (pe->flags & PNV_IODA_PE_DEV)
-		pdev = pe->pdev->bus->self;
-#ifdef CONFIG_PCI_IOV
-	else if (pe->flags & PNV_IODA_PE_VF)
-		pdev = pe->parent_dev->bus->self;
-#endif /* CONFIG_PCI_IOV */
-	while (pdev) {
-		struct pci_dn *pdn = pci_get_pdn(pdev);
-		struct pnv_ioda_pe *parent;
-
-		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
-			parent = &phb->ioda.pe_array[pdn->pe_number];
-			ret = pnv_ioda_set_one_peltv(phb, parent, pe, is_add);
-			if (ret)
-				return ret;
-		}
-
-		pdev = pdev->bus->self;
-	}
-
-	return 0;
-}
-
-#ifdef CONFIG_PCI_IOV
-static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
-{
-	struct pci_dev *parent;
-	uint8_t bcomp, dcomp, fcomp;
-	int64_t rc;
-	long rid_end, rid;
-
-	/* Currently, we just deconfigure VF PE. Bus PE will always there.*/
-	if (pe->pbus) {
-		int count;
-
-		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
-		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
-		parent = pe->pbus->self;
-		if (pe->flags & PNV_IODA_PE_BUS_ALL)
-			count = pe->pbus->busn_res.end - pe->pbus->busn_res.start + 1;
-		else
-			count = 1;
-
-		switch(count) {
-		case  1: bcomp = OpalPciBusAll;         break;
-		case  2: bcomp = OpalPciBus7Bits;       break;
-		case  4: bcomp = OpalPciBus6Bits;       break;
-		case  8: bcomp = OpalPciBus5Bits;       break;
-		case 16: bcomp = OpalPciBus4Bits;       break;
-		case 32: bcomp = OpalPciBus3Bits;       break;
-		default:
-			dev_err(&pe->pbus->dev, "Number of subordinate buses %d unsupported\n",
-			        count);
-			/* Do an exact match only */
-			bcomp = OpalPciBusAll;
-		}
-		rid_end = pe->rid + (count << 8);
-	} else {
-		if (pe->flags & PNV_IODA_PE_VF)
-			parent = pe->parent_dev;
-		else
-			parent = pe->pdev->bus->self;
-		bcomp = OpalPciBusAll;
-		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
-		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
-		rid_end = pe->rid + 1;
-	}
-
-	/* Clear the reverse map */
-	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
-
-	/* Release from all parents PELT-V */
-	while (parent) {
-		struct pci_dn *pdn = pci_get_pdn(parent);
-		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
-			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
-						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-			/* XXX What to do in case of error ? */
-		}
-		parent = parent->bus->self;
-	}
-
-	opal_pci_eeh_freeze_set(phb->opal_id, pe->pe_number,
-				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-
-	/* Disassociate PE in PELT */
-	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
-				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-	if (rc)
-		pe_warn(pe, "OPAL error %ld remove self from PELTV\n", rc);
-	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
-			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
-	if (rc)
-		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
-
-	pe->pbus = NULL;
-	pe->pdev = NULL;
-	pe->parent_dev = NULL;
-
-	return 0;
-}
-#endif /* CONFIG_PCI_IOV */
-
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *parent;
@@ -953,7 +1138,7 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 	}
 
 	/* Configure PELTV */
-	pnv_ioda_set_peltv(phb, pe, true);
+	pnv_ioda_set_peltv(pe, true);
 
 	/* Setup reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
@@ -1207,6 +1392,8 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 		if (pdn->pe_number != IODA_INVALID_PE)
 			continue;
 
+		/* Increase reference count of the parent PE */
+		pnv_ioda_pe_get(pe);
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dev_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1224,7 +1411,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct pnv_ioda_pe *pe;
+	struct pnv_ioda_pe *pe = NULL;
 	int pe_num = IODA_INVALID_PE;
 
 	/* For partial hotplug case, the PE instance hasn't been destroyed
@@ -1240,24 +1427,24 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	}
 
 	/* PE number for root bus should have been reserved */
-	if (pci_is_root_bus(bus))
-		pe_num = phb->ioda.root_pe_no;
+	if (pci_is_root_bus(bus) &&
+	    phb->ioda.root_pe_no != IODA_INVALID_PE)
+		pe = &phb->ioda.pe_array[phb->ioda.root_pe_no];
 
 	/* Check if PE is determined by M64 */
-	if (pe_num == IODA_INVALID_PE && phb->pick_m64_pe)
-		pe_num = phb->pick_m64_pe(phb, bus, all);
+	if (!pe && phb->pick_m64_pe)
+		pe = phb->pick_m64_pe(phb, bus, all);
 
 	/* The PE number isn't pinned by M64 */
-	if (pe_num == IODA_INVALID_PE)
-		pe_num = pnv_ioda_alloc_pe(phb);
+	if (!pe)
+		pe = pnv_ioda_alloc_pe(phb);
 
-	if (pe_num == IODA_INVALID_PE) {
-		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
+	if (!pe) {
+		pr_warn("%s: No enough PE# available for PCI bus %04x:%02x\n",
 			__func__, pci_domain_nr(bus), bus->number);
 		return NULL;
 	}
 
-	pe = &phb->ioda.pe_array[pe_num];
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
@@ -1274,14 +1461,12 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
-		pe->pbus = NULL;
+		pnv_ioda_pe_put(pe);
 		return NULL;
 	}
 
 	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
-			GFP_KERNEL, hose->node);
+				       GFP_KERNEL, hose->node);
 	pe->tce32_table->data = pe;
 
 	/* Associate it with all child devices */
@@ -1521,9 +1706,9 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		list_del(&pe->list);
 		mutex_unlock(&phb->ioda.pe_list_mutex);
 
-		pnv_ioda_deconfigure_pe(phb, pe);
+		pnv_ioda_deconfigure_pe(pe);
 
-		pnv_ioda_free_pe(phb, pe->pe_number);
+		pnv_ioda_pe_put(pe);
 	}
 }
 
@@ -1601,9 +1786,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 		if (pnv_ioda_configure_pe(phb, pe)) {
 			/* XXX What do we do here ? */
-			if (pe_num)
-				pnv_ioda_free_pe(phb, pe_num);
-			pe->pdev = NULL;
+			pnv_ioda_pe_put(pe);
 			continue;
 		}
 
@@ -2263,7 +2446,7 @@ int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode)
 	struct pnv_ioda_pe *pe;
 	int rc;
 
-	pe = pnv_ioda_get_pe(dev);
+	pe = pnv_ioda_pci_dev_to_pe(dev);
 	if (!pe)
 		return -ENODEV;
 
@@ -2379,7 +2562,7 @@ int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 	struct pnv_ioda_pe *pe;
 	int rc;
 
-	if (!(pe = pnv_ioda_get_pe(dev)))
+	if (!(pe = pnv_ioda_pci_dev_to_pe(dev)))
 		return -ENODEV;
 
 	/* Assign XIVE to PE */
@@ -2401,7 +2584,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
 {
-	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
+	struct pnv_ioda_pe *pe = pnv_ioda_pci_dev_to_pe(dev);
 	unsigned int xive_num = hwirq - phb->msi_base;
 	__be32 data;
 	int rc;
@@ -3065,6 +3248,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	pnv_pci_controller_ops.setup_bridge = pnv_pci_setup_bridge;
 	pnv_pci_controller_ops.window_alignment = pnv_pci_window_alignment;
 	pnv_pci_controller_ops.reset_secondary_bus = pnv_pci_reset_secondary_bus;
+	pnv_pci_controller_ops.release_device = pnv_pci_release_device;
 	hose->controller_ops = pnv_pci_controller_ops;
 
 #ifdef CONFIG_PCI_IOV
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 1bea3a8..8b10f01 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -28,6 +28,7 @@ enum pnv_phb_model {
 /* Data associated with a PE, including IOMMU tracking etc.. */
 struct pnv_phb;
 struct pnv_ioda_pe {
+	struct kref		kref;
 	unsigned long		flags;
 	struct pnv_phb		*phb;
 
@@ -120,7 +121,8 @@ struct pnv_phb {
 	void (*shutdown)(struct pnv_phb *phb);
 	int (*init_m64)(struct pnv_phb *phb);
 	void (*reserve_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus);
-	int (*pick_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus, int all);
+	struct pnv_ioda_pe *(*pick_m64_pe)(struct pnv_phb *phb,
+					   struct pci_bus *bus, int all);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
 	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 07/21] powerpc/powernv: Release PEs dynamically
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The original code doesn't support releasing PEs dynamically, meaning
that PE and the associated resources (IO, M32, M64 and DMA) can't
be released when unplugging a PCI adapter from one hotpluggable slot.

The patch takes object oriented methodology, introducs reference
count to PE, which is initialized to 1 and increased with 1 when a
new PCI device joins the PE. Once the last PCI device leaves the
PE, the PE is going to be release together with its associated
(IO, M32, M64, DMA) resources.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |   3 +
 arch/powerpc/kernel/pci-hotplug.c         |   5 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 658 +++++++++++++++++++-----------
 arch/powerpc/platforms/powernv/pci.h      |   4 +-
 4 files changed, 432 insertions(+), 238 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 5367eb3..a6ad4b1 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -31,6 +31,9 @@ struct pci_controller_ops {
 	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
 	void		(*setup_bridge)(struct pci_bus *, unsigned long);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
+
+	/* Called when PCI device is released */
+	void		(*release_device)(struct pci_dev *);
 };
 
 /*
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7ed85a6..0040343 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -29,6 +29,11 @@
  */
 void pcibios_release_device(struct pci_dev *dev)
 {
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	if (hose->controller_ops.release_device)
+		hose->controller_ops.release_device(dev);
+
 	eeh_remove_device(dev);
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 910fb67..ef8c216 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -12,6 +12,8 @@
 #undef DEBUG
 
 #include <linux/kernel.h>
+#include <linux/atomic.h>
+#include <linux/kref.h>
 #include <linux/pci.h>
 #include <linux/crash_dump.h>
 #include <linux/debugfs.h>
@@ -47,6 +49,8 @@
 /* 256M DMA window, 4K TCE pages, 8 bytes TCE */
 #define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
 
+static void pnv_ioda_release_pe(struct kref *kref);
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...)
 {
@@ -123,25 +127,400 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
-static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
+static inline void pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
 {
-	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
-		pr_warn("%s: Invalid PE %d on PHB#%x\n",
-			__func__, pe_no, phb->hose->global_number);
+	if (!pe)
+		return;
+
+	kref_get(&pe->kref);
+}
+
+static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
+{
+	unsigned int count;
+
+	if (!pe)
 		return;
+
+	/*
+	 * The count is initialized to 1 and increased with 1 when
+	 * a new PCI device is bound with the PE. Once the last PCI
+	 * device is leaving from the PE, the PE is going to be
+	 * released.
+	 */
+	count = atomic_read(&pe->kref.refcount);
+	if (count == 2)
+		kref_sub(&pe->kref, 2, pnv_ioda_release_pe);
+	else
+		kref_put(&pe->kref, pnv_ioda_release_pe);
+}
+
+static void pnv_pci_release_device(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	struct pnv_ioda_pe *pe;
+
+	if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+		pe = &phb->ioda.pe_array[pdn->pe_number];
+		pnv_ioda_pe_put(pe);
+		pdn->pe_number = IODA_INVALID_PE;
 	}
+}
 
-	if (test_and_set_bit(pe_no, phb->ioda.pe_alloc)) {
-		pr_warn("%s: PE %d was assigned on PHB#%x\n",
-			__func__, pe_no, phb->hose->global_number);
+static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	int index, count;
+	unsigned long tbl_addr, tbl_size;
+
+	/* No DMA capability for slave PEs */
+	if (pe->flags & PNV_IODA_PE_SLAVE)
+		return;
+
+	/* Bypass DMA window */
+	if (phb->type == PNV_PHB_IODA2 &&
+	    pe->tce_bypass_enabled &&
+	    pe->tce32_table &&
+	    pe->tce32_table->set_bypass)
+		pe->tce32_table->set_bypass(pe->tce32_table, false);
+
+	/* 32-bits DMA window */
+	count = pe->tce32_seg_end - pe->tce32_seg_start;
+	tbl_addr = pe->tce32_table->it_base;
+	if (!count)
 		return;
+
+	/* Free IOMMU table */
+	iommu_free_table(pe->tce32_table,
+			 of_node_full_name(phb->hose->dn));
+
+	/* Deconfigure TCE table */
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		for (index = 0; index < count; index++)
+			opal_pci_map_pe_dma_window(phb->opal_id,
+						   pe->pe_number,
+						   pe->tce32_seg_start + index,
+						   1,
+						   __pa(tbl_addr) +
+						   index * TCE32_TABLE_SIZE,
+						   0,
+						   0x1000);
+		bitmap_clear(phb->ioda.tce32_segmap,
+			     pe->tce32_seg_start,
+			     count);
+		tbl_size = TCE32_TABLE_SIZE * count;
+		break;
+	case PNV_PHB_IODA2:
+		opal_pci_map_pe_dma_window(phb->opal_id,
+					   pe->pe_number,
+					   pe->pe_number << 1,
+					   1,
+					   __pa(tbl_addr),
+					   0,
+					   0x1000);
+		tbl_size = (1ul << ilog2(phb->ioda.m32_pci_base));
+		tbl_size = (tbl_size >> IOMMU_PAGE_SHIFT_4K) * 8;
+		break;
+	default:
+		pe_warn(pe, "Unsupported PHB type %d\n", phb->type);
+		return;
+	}
+
+	/* Free memory of IOMMU table */
+	free_pages(tbl_addr, get_order(tbl_size));
+	pe->tce32_table = NULL;
+	pe->tce32_seg_start = 0;
+	pe->tce32_seg_end = 0;
+}
+
+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	unsigned long *segmap = NULL, *pe_segmap = NULL;
+	int i;
+	uint16_t win, win_type[] = { OPAL_IO_WINDOW_TYPE,
+				     OPAL_M32_WINDOW_TYPE,
+				     OPAL_M64_WINDOW_TYPE };
+
+	for (win = 0; win < ARRAY_SIZE(win_type); win++) {
+		switch (win_type[win]) {
+		case OPAL_IO_WINDOW_TYPE:
+			segmap = phb->ioda.io_segmap;
+			pe_segmap = pe->io_segmap;
+			break;
+		case OPAL_M32_WINDOW_TYPE:
+			segmap = phb->ioda.m32_segmap;
+			pe_segmap = pe->m32_segmap;
+			break;
+		case OPAL_M64_WINDOW_TYPE:
+			segmap = phb->ioda.m64_segmap;
+			pe_segmap = pe->m64_segmap;
+			break;
+		}
+		i = -1;
+		while ((i = find_next_bit(pe_segmap,
+			phb->ioda.total_pe, i + 1)) < phb->ioda.total_pe) {
+			if (win_type[win] == OPAL_IO_WINDOW_TYPE ||
+			    win_type[win] == OPAL_M32_WINDOW_TYPE)
+				opal_pci_map_pe_mmio_window(phb->opal_id,
+						phb->ioda.reserved_pe,
+						win_type[win], 0, i);
+			else if (phb->type == PNV_PHB_IODA1)
+				opal_pci_map_pe_mmio_window(phb->opal_id,
+						phb->ioda.reserved_pe,
+						win_type[win],
+						i / 8, i % 8);
+
+			clear_bit(i, pe_segmap);
+			clear_bit(i, segmap);
+		}
+	}
+}
+
+static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
+				  struct pnv_ioda_pe *parent,
+				  struct pnv_ioda_pe *child,
+				  bool is_add)
+{
+	const char *desc = is_add ? "adding" : "removing";
+	uint8_t op = is_add ? OPAL_ADD_PE_TO_DOMAIN :
+			      OPAL_REMOVE_PE_FROM_DOMAIN;
+	struct pnv_ioda_pe *slave;
+	long rc;
+
+	/* Parent PE affects child PE */
+	rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
+				child->pe_number, op);
+	if (rc != OPAL_SUCCESS) {
+		pe_warn(child, "OPAL error %ld %s to parent PELTV\n",
+			rc, desc);
+		return -ENXIO;
+	}
+
+	if (!(child->flags & PNV_IODA_PE_MASTER))
+		return 0;
+
+	/* Compound case: parent PE affects slave PEs */
+	list_for_each_entry(slave, &child->slaves, list) {
+		rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
+					slave->pe_number, op);
+		if (rc != OPAL_SUCCESS) {
+			pe_warn(slave, "OPAL error %ld %s to parent PELTV\n",
+				rc, desc);
+			return -ENXIO;
+		}
+	}
+
+	return 0;
+}
+
+static int pnv_ioda_set_peltv(struct pnv_ioda_pe *pe, bool is_add)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct pnv_ioda_pe *slave;
+	struct pci_dev *pdev = NULL;
+	int ret;
+
+	/*
+	 * Clear PE frozen state. If it's master PE, we need
+	 * clear slave PE frozen state as well.
+	 */
+	opal_pci_eeh_freeze_clear(phb->opal_id,
+				  pe->pe_number,
+				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry(slave, &pe->slaves, list) {
+			opal_pci_eeh_freeze_clear(phb->opal_id,
+						  slave->pe_number,
+						  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+		}
+	}
+
+	/*
+	 * Associate PE in PELT. We need add the PE into the
+	 * corresponding PELT-V as well. Otherwise, the error
+	 * originated from the PE might contribute to other
+	 * PEs.
+	 */
+	ret = pnv_ioda_set_one_peltv(phb, pe, pe, is_add);
+	if (ret)
+		return ret;
+
+	/* For compound PEs, any one affects all of them */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry(slave, &pe->slaves, list) {
+			ret = pnv_ioda_set_one_peltv(phb, slave, pe, is_add);
+			if (ret)
+				return ret;
+		}
+	}
+
+	if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
+		pdev = pe->pbus->self;
+	else if (pe->flags & PNV_IODA_PE_DEV)
+		pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		pdev = pe->parent_dev->bus->self;
+#endif /* CONFIG_PCI_IOV */
+
+	while (pdev) {
+		struct pci_dn *pdn = pci_get_pdn(pdev);
+		struct pnv_ioda_pe *parent;
+
+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+			parent = &phb->ioda.pe_array[pdn->pe_number];
+			ret = pnv_ioda_set_one_peltv(phb, parent, pe, is_add);
+			if (ret)
+				return ret;
+		}
+
+		pdev = pdev->bus->self;
+	}
+
+	return 0;
+}
+
+static void pnv_ioda_deconfigure_pe(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct pci_dev *parent;
+	uint8_t bcomp, dcomp, fcomp;
+	long rid_end, rid;
+	int64_t rc;
+
+	/* Tear down MVE */
+	if (phb->type == PNV_PHB_IODA1 &&
+	    pe->mve_number != -1) {
+		rc = opal_pci_set_mve(phb->opal_id,
+				      pe->mve_number,
+				      phb->ioda.reserved_pe);
+		if (rc != OPAL_SUCCESS)
+			pe_warn(pe, "Error %lld unmapping MVE#%d\n",
+				rc, pe->mve_number);
+		rc = opal_pci_set_mve_enable(phb->opal_id,
+					     pe->mve_number,
+					     OPAL_DISABLE_MVE);
+		if (rc != OPAL_SUCCESS)
+			pe_warn(pe, "Error %lld disabling MVE#%d\n",
+				rc, pe->mve_number);
+		pe->mve_number = -1;
+	}
+
+	/* Unmapping PELTV */
+	pnv_ioda_set_peltv(pe, false);
+
+	/* To unmap PELTM */
+	if (pe->pbus) {
+		int count;
+
+		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
+		parent = pe->pbus->self;
+		if (pe->flags & PNV_IODA_PE_BUS_ALL)
+			count = pe->pbus->busn_res.end -
+				pe->pbus->busn_res.start + 1;
+		else
+			count = 1;
+
+		switch(count) {
+		case  1: bcomp = OpalPciBusAll;   break;
+		case  2: bcomp = OpalPciBus7Bits; break;
+		case  4: bcomp = OpalPciBus6Bits; break;
+		case  8: bcomp = OpalPciBus5Bits; break;
+		case 16: bcomp = OpalPciBus4Bits; break;
+		case 32: bcomp = OpalPciBus3Bits; break;
+		default:
+			/* Fail back to case of one bus */
+			pe_warn(pe, "Cannot support %d buses\n", count);
+			bcomp = OpalPciBusAll;
+		}
+		rid_end = pe->rid + (count << 8);
+	} else {
+#ifdef CONFIG_PCI_IOV
+		if (pe->flags & PNV_IODA_PE_VF)
+			parent = pe->parent_dev;
+		else
+#endif
+			parent = pe->pdev->bus->self;
+		bcomp = OpalPciBusAll;
+		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
+		rid_end = pe->rid + 1;
+	}
+
+	/* Clear RID mapping */
+	for (rid = pe->rid; rid < rid_end; rid++)
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
+
+	/* Unmapping PELTM */
+	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
+			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
+	if (rc)
+		pe_warn(pe, "Error %ld unmapping PELTM\n", rc);
+}
+
+static void pnv_ioda_release_pe(struct kref *kref)
+{
+	struct pnv_ioda_pe *pe = container_of(kref, struct pnv_ioda_pe, kref);
+	struct pnv_ioda_pe *tmp, *slave;
+	struct pnv_phb *phb = pe->phb;
+
+	pnv_ioda_release_pe_dma(pe);
+	pnv_ioda_release_pe_seg(pe);
+	pnv_ioda_deconfigure_pe(pe);
+
+	/* Release slave PEs for compound PE */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
+			pnv_ioda_pe_put(slave);
+	}
+
+	/* Remove the PE from various list. We need remove slave
+	 * PE from master's list.
+	 */
+	list_del(&pe->dma_link);
+	list_del(&pe->list);
+
+	/* Free PE number */
+	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
+}
+
+static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb,
+					    int pe_no)
+{
+	struct pnv_ioda_pe *pe = &phb->ioda.pe_array[pe_no];
+
+	kref_init(&pe->kref);
+	pe->phb = phb;
+	pe->pe_number = pe_no;
+	INIT_LIST_HEAD(&pe->dma_link);
+	INIT_LIST_HEAD(&pe->list);
+
+	return pe;
+}
+
+static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb,
+					       int pe_no)
+{
+	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
+		pr_warn("%s: Invalid PE %d on PHB#%x\n",
+			__func__, pe_no, phb->hose->global_number);
+		return NULL;
 	}
 
-	phb->ioda.pe_array[pe_no].phb = phb;
-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+	/*
+	 * Same PE might be reserved for multiple times, which
+	 * is out of problem actually.
+	 */
+	set_bit(pe_no, phb->ioda.pe_alloc);
+	return pnv_ioda_init_pe(phb, pe_no);
 }
 
-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe_no;
 	unsigned long limit = phb->ioda.total_pe - 1;
@@ -154,20 +533,10 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 			break;
 
 		if (--limit >= phb->ioda.total_pe)
-			return IODA_INVALID_PE;
+			return NULL;
 	} while(1);
 
-	phb->ioda.pe_array[pe_no].phb = phb;
-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
-	return pe_no;
-}
-
-static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
-{
-	WARN_ON(phb->ioda.pe_array[pe].pdev);
-
-	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
-	clear_bit(pe, phb->ioda.pe_alloc);
+	return pnv_ioda_init_pe(phb, pe_no);
 }
 
 static int pnv_ioda1_init_m64(struct pnv_phb *phb)
@@ -382,8 +751,9 @@ static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb,
 	}
 }
 
-static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
-				struct pci_bus *bus, int all)
+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
+						struct pci_bus *bus,
+						int all)
 {
 	resource_size_t segsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -394,14 +764,14 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	int i;
 
 	if (!pnv_ioda_need_m64_pe(phb, bus))
-		return IODA_INVALID_PE;
+		return NULL;
 
         /* Allocate bitmap */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
 	pe_bitsmap = kzalloc(size, GFP_KERNEL);
 	if (!pe_bitsmap) {
 		pr_warn("%s: Out of memory !\n", __func__);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/* The bridge's M64 window might be extended to PHB's M64
@@ -438,7 +808,7 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	/* No M64 window found ? */
 	if (bitmap_empty(pe_bitsmap, phb->ioda.total_pe)) {
 		kfree(pe_bitsmap);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/* Figure out the master PE and put all slave PEs
@@ -491,7 +861,7 @@ static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
 	}
 
 	kfree(pe_bitsmap);
-	return master_pe->pe_number;
+	return master_pe;
 }
 
 static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
@@ -695,7 +1065,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
  * but in the meantime, we need to protect them to avoid warnings
  */
 #ifdef CONFIG_PCI_MSI
-static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
+static struct pnv_ioda_pe *pnv_ioda_pci_dev_to_pe(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -709,191 +1079,6 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 }
 #endif /* CONFIG_PCI_MSI */
 
-static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
-				  struct pnv_ioda_pe *parent,
-				  struct pnv_ioda_pe *child,
-				  bool is_add)
-{
-	const char *desc = is_add ? "adding" : "removing";
-	uint8_t op = is_add ? OPAL_ADD_PE_TO_DOMAIN :
-			      OPAL_REMOVE_PE_FROM_DOMAIN;
-	struct pnv_ioda_pe *slave;
-	long rc;
-
-	/* Parent PE affects child PE */
-	rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
-				child->pe_number, op);
-	if (rc != OPAL_SUCCESS) {
-		pe_warn(child, "OPAL error %ld %s to parent PELTV\n",
-			rc, desc);
-		return -ENXIO;
-	}
-
-	if (!(child->flags & PNV_IODA_PE_MASTER))
-		return 0;
-
-	/* Compound case: parent PE affects slave PEs */
-	list_for_each_entry(slave, &child->slaves, list) {
-		rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
-					slave->pe_number, op);
-		if (rc != OPAL_SUCCESS) {
-			pe_warn(slave, "OPAL error %ld %s to parent PELTV\n",
-				rc, desc);
-			return -ENXIO;
-		}
-	}
-
-	return 0;
-}
-
-static int pnv_ioda_set_peltv(struct pnv_phb *phb,
-			      struct pnv_ioda_pe *pe,
-			      bool is_add)
-{
-	struct pnv_ioda_pe *slave;
-	struct pci_dev *pdev = NULL;
-	int ret;
-
-	/*
-	 * Clear PE frozen state. If it's master PE, we need
-	 * clear slave PE frozen state as well.
-	 */
-	if (is_add) {
-		opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
-					  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-		if (pe->flags & PNV_IODA_PE_MASTER) {
-			list_for_each_entry(slave, &pe->slaves, list)
-				opal_pci_eeh_freeze_clear(phb->opal_id,
-							  slave->pe_number,
-							  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-		}
-	}
-
-	/*
-	 * Associate PE in PELT. We need add the PE into the
-	 * corresponding PELT-V as well. Otherwise, the error
-	 * originated from the PE might contribute to other
-	 * PEs.
-	 */
-	ret = pnv_ioda_set_one_peltv(phb, pe, pe, is_add);
-	if (ret)
-		return ret;
-
-	/* For compound PEs, any one affects all of them */
-	if (pe->flags & PNV_IODA_PE_MASTER) {
-		list_for_each_entry(slave, &pe->slaves, list) {
-			ret = pnv_ioda_set_one_peltv(phb, slave, pe, is_add);
-			if (ret)
-				return ret;
-		}
-	}
-
-	if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
-		pdev = pe->pbus->self;
-	else if (pe->flags & PNV_IODA_PE_DEV)
-		pdev = pe->pdev->bus->self;
-#ifdef CONFIG_PCI_IOV
-	else if (pe->flags & PNV_IODA_PE_VF)
-		pdev = pe->parent_dev->bus->self;
-#endif /* CONFIG_PCI_IOV */
-	while (pdev) {
-		struct pci_dn *pdn = pci_get_pdn(pdev);
-		struct pnv_ioda_pe *parent;
-
-		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
-			parent = &phb->ioda.pe_array[pdn->pe_number];
-			ret = pnv_ioda_set_one_peltv(phb, parent, pe, is_add);
-			if (ret)
-				return ret;
-		}
-
-		pdev = pdev->bus->self;
-	}
-
-	return 0;
-}
-
-#ifdef CONFIG_PCI_IOV
-static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
-{
-	struct pci_dev *parent;
-	uint8_t bcomp, dcomp, fcomp;
-	int64_t rc;
-	long rid_end, rid;
-
-	/* Currently, we just deconfigure VF PE. Bus PE will always there.*/
-	if (pe->pbus) {
-		int count;
-
-		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
-		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
-		parent = pe->pbus->self;
-		if (pe->flags & PNV_IODA_PE_BUS_ALL)
-			count = pe->pbus->busn_res.end - pe->pbus->busn_res.start + 1;
-		else
-			count = 1;
-
-		switch(count) {
-		case  1: bcomp = OpalPciBusAll;         break;
-		case  2: bcomp = OpalPciBus7Bits;       break;
-		case  4: bcomp = OpalPciBus6Bits;       break;
-		case  8: bcomp = OpalPciBus5Bits;       break;
-		case 16: bcomp = OpalPciBus4Bits;       break;
-		case 32: bcomp = OpalPciBus3Bits;       break;
-		default:
-			dev_err(&pe->pbus->dev, "Number of subordinate buses %d unsupported\n",
-			        count);
-			/* Do an exact match only */
-			bcomp = OpalPciBusAll;
-		}
-		rid_end = pe->rid + (count << 8);
-	} else {
-		if (pe->flags & PNV_IODA_PE_VF)
-			parent = pe->parent_dev;
-		else
-			parent = pe->pdev->bus->self;
-		bcomp = OpalPciBusAll;
-		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
-		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
-		rid_end = pe->rid + 1;
-	}
-
-	/* Clear the reverse map */
-	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
-
-	/* Release from all parents PELT-V */
-	while (parent) {
-		struct pci_dn *pdn = pci_get_pdn(parent);
-		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
-			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
-						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-			/* XXX What to do in case of error ? */
-		}
-		parent = parent->bus->self;
-	}
-
-	opal_pci_eeh_freeze_set(phb->opal_id, pe->pe_number,
-				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-
-	/* Disassociate PE in PELT */
-	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
-				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-	if (rc)
-		pe_warn(pe, "OPAL error %ld remove self from PELTV\n", rc);
-	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
-			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
-	if (rc)
-		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
-
-	pe->pbus = NULL;
-	pe->pdev = NULL;
-	pe->parent_dev = NULL;
-
-	return 0;
-}
-#endif /* CONFIG_PCI_IOV */
-
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *parent;
@@ -953,7 +1138,7 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 	}
 
 	/* Configure PELTV */
-	pnv_ioda_set_peltv(phb, pe, true);
+	pnv_ioda_set_peltv(pe, true);
 
 	/* Setup reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
@@ -1207,6 +1392,8 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 		if (pdn->pe_number != IODA_INVALID_PE)
 			continue;
 
+		/* Increase reference count of the parent PE */
+		pnv_ioda_pe_get(pe);
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dev_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1224,7 +1411,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct pnv_ioda_pe *pe;
+	struct pnv_ioda_pe *pe = NULL;
 	int pe_num = IODA_INVALID_PE;
 
 	/* For partial hotplug case, the PE instance hasn't been destroyed
@@ -1240,24 +1427,24 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 	}
 
 	/* PE number for root bus should have been reserved */
-	if (pci_is_root_bus(bus))
-		pe_num = phb->ioda.root_pe_no;
+	if (pci_is_root_bus(bus) &&
+	    phb->ioda.root_pe_no != IODA_INVALID_PE)
+		pe = &phb->ioda.pe_array[phb->ioda.root_pe_no];
 
 	/* Check if PE is determined by M64 */
-	if (pe_num == IODA_INVALID_PE && phb->pick_m64_pe)
-		pe_num = phb->pick_m64_pe(phb, bus, all);
+	if (!pe && phb->pick_m64_pe)
+		pe = phb->pick_m64_pe(phb, bus, all);
 
 	/* The PE number isn't pinned by M64 */
-	if (pe_num == IODA_INVALID_PE)
-		pe_num = pnv_ioda_alloc_pe(phb);
+	if (!pe)
+		pe = pnv_ioda_alloc_pe(phb);
 
-	if (pe_num == IODA_INVALID_PE) {
-		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
+	if (!pe) {
+		pr_warn("%s: No enough PE# available for PCI bus %04x:%02x\n",
 			__func__, pci_domain_nr(bus), bus->number);
 		return NULL;
 	}
 
-	pe = &phb->ioda.pe_array[pe_num];
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
@@ -1274,14 +1461,12 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
-		pe->pbus = NULL;
+		pnv_ioda_pe_put(pe);
 		return NULL;
 	}
 
 	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
-			GFP_KERNEL, hose->node);
+				       GFP_KERNEL, hose->node);
 	pe->tce32_table->data = pe;
 
 	/* Associate it with all child devices */
@@ -1521,9 +1706,9 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		list_del(&pe->list);
 		mutex_unlock(&phb->ioda.pe_list_mutex);
 
-		pnv_ioda_deconfigure_pe(phb, pe);
+		pnv_ioda_deconfigure_pe(pe);
 
-		pnv_ioda_free_pe(phb, pe->pe_number);
+		pnv_ioda_pe_put(pe);
 	}
 }
 
@@ -1601,9 +1786,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 		if (pnv_ioda_configure_pe(phb, pe)) {
 			/* XXX What do we do here ? */
-			if (pe_num)
-				pnv_ioda_free_pe(phb, pe_num);
-			pe->pdev = NULL;
+			pnv_ioda_pe_put(pe);
 			continue;
 		}
 
@@ -2263,7 +2446,7 @@ int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode)
 	struct pnv_ioda_pe *pe;
 	int rc;
 
-	pe = pnv_ioda_get_pe(dev);
+	pe = pnv_ioda_pci_dev_to_pe(dev);
 	if (!pe)
 		return -ENODEV;
 
@@ -2379,7 +2562,7 @@ int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 	struct pnv_ioda_pe *pe;
 	int rc;
 
-	if (!(pe = pnv_ioda_get_pe(dev)))
+	if (!(pe = pnv_ioda_pci_dev_to_pe(dev)))
 		return -ENODEV;
 
 	/* Assign XIVE to PE */
@@ -2401,7 +2584,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
 {
-	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
+	struct pnv_ioda_pe *pe = pnv_ioda_pci_dev_to_pe(dev);
 	unsigned int xive_num = hwirq - phb->msi_base;
 	__be32 data;
 	int rc;
@@ -3065,6 +3248,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	pnv_pci_controller_ops.setup_bridge = pnv_pci_setup_bridge;
 	pnv_pci_controller_ops.window_alignment = pnv_pci_window_alignment;
 	pnv_pci_controller_ops.reset_secondary_bus = pnv_pci_reset_secondary_bus;
+	pnv_pci_controller_ops.release_device = pnv_pci_release_device;
 	hose->controller_ops = pnv_pci_controller_ops;
 
 #ifdef CONFIG_PCI_IOV
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 1bea3a8..8b10f01 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -28,6 +28,7 @@ enum pnv_phb_model {
 /* Data associated with a PE, including IOMMU tracking etc.. */
 struct pnv_phb;
 struct pnv_ioda_pe {
+	struct kref		kref;
 	unsigned long		flags;
 	struct pnv_phb		*phb;
 
@@ -120,7 +121,8 @@ struct pnv_phb {
 	void (*shutdown)(struct pnv_phb *phb);
 	int (*init_m64)(struct pnv_phb *phb);
 	void (*reserve_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus);
-	int (*pick_m64_pe)(struct pnv_phb *phb, struct pci_bus *bus, int all);
+	struct pnv_ioda_pe *(*pick_m64_pe)(struct pnv_phb *phb,
+					   struct pci_bus *bus, int all);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
 	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 08/21] powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

Nobody is using the this function. The patch drops it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 71 -------------------------------
 1 file changed, 71 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ef8c216..5cd8298 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1302,77 +1302,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 }
 #endif /* CONFIG_PCI_IOV */
 
-#if 0
-static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	struct pci_dn *pdn = pci_get_pdn(dev);
-	struct pnv_ioda_pe *pe;
-	int pe_num;
-
-	if (!pdn) {
-		pr_err("%s: Device tree node not associated properly\n",
-			   pci_name(dev));
-		return NULL;
-	}
-	if (pdn->pe_number != IODA_INVALID_PE)
-		return NULL;
-
-	/* PE#0 has been pre-set */
-	if (dev->bus->number == 0)
-		pe_num = 0;
-	else
-		pe_num = pnv_ioda_alloc_pe(phb);
-	if (pe_num == IODA_INVALID_PE) {
-		pr_warning("%s: Not enough PE# available, disabling device\n",
-			   pci_name(dev));
-		return NULL;
-	}
-
-	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
-	 * pointer in the PE data structure, both should be destroyed at the
-	 * same time. However, this needs to be looked at more closely again
-	 * once we actually start removing things (Hotplug, SR-IOV, ...)
-	 *
-	 * At some point we want to remove the PDN completely anyways
-	 */
-	pe = &phb->ioda.pe_array[pe_num];
-	pci_dev_get(dev);
-	pdn->pcidev = dev;
-	pdn->pe_number = pe_num;
-	pe->pdev = dev;
-	pe->pbus = NULL;
-	pe->tce32_seg = -1;
-	pe->mve_number = -1;
-	pe->rid = dev->bus->number << 8 | pdn->devfn;
-
-	pe_info(pe, "Associated device to PE\n");
-
-	if (pnv_ioda_configure_pe(phb, pe)) {
-		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
-		pdn->pe_number = IODA_INVALID_PE;
-		pe->pdev = NULL;
-		pci_dev_put(dev);
-		return NULL;
-	}
-
-	/* Assign a DMA weight to the device */
-	pe->dma_weight = pnv_ioda_dma_weight(dev);
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
-
-	return pe;
-}
-#endif /* Useful for SRIOV case */
-
 static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *dev;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 08/21] powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

Nobody is using the this function. The patch drops it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 71 -------------------------------
 1 file changed, 71 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ef8c216..5cd8298 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1302,77 +1302,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 }
 #endif /* CONFIG_PCI_IOV */
 
-#if 0
-static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	struct pci_dn *pdn = pci_get_pdn(dev);
-	struct pnv_ioda_pe *pe;
-	int pe_num;
-
-	if (!pdn) {
-		pr_err("%s: Device tree node not associated properly\n",
-			   pci_name(dev));
-		return NULL;
-	}
-	if (pdn->pe_number != IODA_INVALID_PE)
-		return NULL;
-
-	/* PE#0 has been pre-set */
-	if (dev->bus->number == 0)
-		pe_num = 0;
-	else
-		pe_num = pnv_ioda_alloc_pe(phb);
-	if (pe_num == IODA_INVALID_PE) {
-		pr_warning("%s: Not enough PE# available, disabling device\n",
-			   pci_name(dev));
-		return NULL;
-	}
-
-	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
-	 * pointer in the PE data structure, both should be destroyed at the
-	 * same time. However, this needs to be looked at more closely again
-	 * once we actually start removing things (Hotplug, SR-IOV, ...)
-	 *
-	 * At some point we want to remove the PDN completely anyways
-	 */
-	pe = &phb->ioda.pe_array[pe_num];
-	pci_dev_get(dev);
-	pdn->pcidev = dev;
-	pdn->pe_number = pe_num;
-	pe->pdev = dev;
-	pe->pbus = NULL;
-	pe->tce32_seg = -1;
-	pe->mve_number = -1;
-	pe->rid = dev->bus->number << 8 | pdn->devfn;
-
-	pe_info(pe, "Associated device to PE\n");
-
-	if (pnv_ioda_configure_pe(phb, pe)) {
-		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
-		pdn->pe_number = IODA_INVALID_PE;
-		pe->pdev = NULL;
-		pci_dev_put(dev);
-		return NULL;
-	}
-
-	/* Assign a DMA weight to the device */
-	pe->dma_weight = pnv_ioda_dma_weight(dev);
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
-
-	return pe;
-}
-#endif /* Useful for SRIOV case */
-
 static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *dev;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 09/21] powerpc/eeh: Delay probing EEH device during hotplug
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
devices in early stage, which is reasonable to pSeries platform.
However, it's wrong for PowerNV platform because the PE# isn't
determined until the resources (IO and MMIO) are assigned to
PE. So we have to delay probing EEH devices for PowerNV platform
until that point.

Fixes: 1c509148b ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c63d1eb..b5847dc 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1056,6 +1056,9 @@ void eeh_add_device_early(struct pci_dn *pdn)
 	if (!edev || !eeh_enabled())
 		return;
 
+	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))
+		return;
+
 	/* USB Bus children of PCI devices will not have BUID's */
 	phb = edev->phb;
 	if (NULL == phb ||
@@ -1105,6 +1108,10 @@ void eeh_add_device_late(struct pci_dev *dev)
 
 	pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	edev = pdn_to_eeh_dev(pdn);
+	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
+		eeh_ops->probe(pdn, NULL);
+
+	/* Check if PCI device and EEH device has been bound */
 	if (edev->pdev == dev) {
 		pr_debug("EEH: Already referenced !\n");
 		return;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 09/21] powerpc/eeh: Delay probing EEH device during hotplug
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
devices in early stage, which is reasonable to pSeries platform.
However, it's wrong for PowerNV platform because the PE# isn't
determined until the resources (IO and MMIO) are assigned to
PE. So we have to delay probing EEH devices for PowerNV platform
until that point.

Fixes: 1c509148b ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c63d1eb..b5847dc 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1056,6 +1056,9 @@ void eeh_add_device_early(struct pci_dn *pdn)
 	if (!edev || !eeh_enabled())
 		return;
 
+	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))
+		return;
+
 	/* USB Bus children of PCI devices will not have BUID's */
 	phb = edev->phb;
 	if (NULL == phb ||
@@ -1105,6 +1108,10 @@ void eeh_add_device_late(struct pci_dev *dev)
 
 	pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	edev = pdn_to_eeh_dev(pdn);
+	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
+		eeh_ops->probe(pdn, NULL);
+
+	/* Check if PCI device and EEH device has been bound */
 	if (edev->pdev == dev) {
 		pr_debug("EEH: Already referenced !\n");
 		return;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 10/21] powerpc/powernv: Use PCI slot reset infrastructure
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

For PowerNV platform, running on top of skiboot, all PE level reset
should be routed to firmware if the bridge of the PE primary bus has
device-node property "ibm,reset-by-firmware". Otherwise, the kernel
has to issue hot reset on PE's primary bus despite the requested reset
types, which is the behaviour before the firmware supports PCI slot
reset. So the changes don't depend on the PCI slot reset capability
exposed from the firmware.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |   1 +
 arch/powerpc/include/asm/opal.h              |   4 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 206 +++++++++++++--------------
 3 files changed, 102 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c5eb86f..2793d24 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -190,6 +190,7 @@ enum {
 #define EEH_RESET_DEACTIVATE	0	/* Deactivate the PE reset	*/
 #define EEH_RESET_HOT		1	/* Hot reset			*/
 #define EEH_RESET_FUNDAMENTAL	3	/* Fundamental reset		*/
+#define EEH_RESET_COMPLETE	4	/* PHB complete reset           */
 #define EEH_LOG_TEMP		1	/* EEH temporary error log	*/
 #define EEH_LOG_PERM		2	/* EEH permanent error log	*/
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..6d467df 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -129,7 +129,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint16_t dma_window_number, uint64_t pci_start_addr,
 					uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
 				   uint64_t diag_buffer_len);
@@ -145,7 +145,7 @@ int64_t opal_get_epow_status(__be64 *status);
 int64_t opal_set_system_attention_led(uint8_t led_action);
 int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
 			    __be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *val);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index ce738ab..3c01095 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -742,12 +742,12 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
+static s64 pnv_eeh_poll(uint64_t id)
 {
 	s64 rc = OPAL_HARDWARE;
 
 	while (1) {
-		rc = opal_pci_poll(phb->opal_id);
+		rc = opal_pci_poll(id, NULL);
 		if (rc <= 0)
 			break;
 
@@ -763,84 +763,38 @@ static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
+	uint8_t scope;
 	s64 rc = OPAL_HARDWARE;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
-
-	/* Issue PHB complete reset request */
-	if (option == EEH_RESET_FUNDAMENTAL ||
-	    option == EEH_RESET_HOT)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PHB_COMPLETE,
-				    OPAL_ASSERT_RESET);
-	else if (option == EEH_RESET_DEACTIVATE)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PHB_COMPLETE,
-				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
-
-	/*
-	 * Poll state of the PHB until the request is done
-	 * successfully. The PHB reset is usually PHB complete
-	 * reset followed by hot reset on root bus. So we also
-	 * need the PCI bus settlement delay.
-	 */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE) {
-		if (system_state < SYSTEM_RUNNING)
-			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
-		else
-			msleep(EEH_PE_RST_SETTLE_TIME);
+	switch (option) {
+	case EEH_RESET_HOT:
+		scope = OPAL_RESET_PCI_HOT;
+		break;
+	case EEH_RESET_FUNDAMENTAL:
+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
+		break;
+	case EEH_RESET_COMPLETE:
+		scope = OPAL_RESET_PHB_COMPLETE;
+		break;
+	case EEH_RESET_DEACTIVATE:
+		return 0;
+	default:
+		pr_warn("%s: Unsupported option %d\n",
+			__func__, option);
+		return -EINVAL;
 	}
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
-}
-
-static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
-{
-	struct pnv_phb *phb = hose->private_data;
-	s64 rc = OPAL_HARDWARE;
+	/* Issue reset and poll until it's completed */
+	rc = opal_pci_reset(phb->opal_id, scope, OPAL_ASSERT_RESET);
+	if (rc > 0)
+		rc = pnv_eeh_poll(phb->opal_id);
 
-	pr_debug("%s: Reset PHB#%x, option=%d\n",
-		 __func__, hose->global_number, option);
-
-	/*
-	 * During the reset deassert time, we needn't care
-	 * the reset scope because the firmware does nothing
-	 * for fundamental or hot reset during deassert phase.
-	 */
-	if (option == EEH_RESET_FUNDAMENTAL)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PCI_FUNDAMENTAL,
-				    OPAL_ASSERT_RESET);
-	else if (option == EEH_RESET_HOT)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PCI_HOT,
-				    OPAL_ASSERT_RESET);
-	else if (option == EEH_RESET_DEACTIVATE)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PCI_HOT,
-				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
-
-	/* Poll state of the PHB until the request is done */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE)
-		msleep(EEH_PE_RST_SETTLE_TIME);
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
-
-	return 0;
+	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
 }
 
-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 {
 	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -891,14 +845,57 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct device_node *dn = dev ? pci_device_to_OF_node(dev) : NULL;
+	uint64_t id = (0x1ul << 60);
+	uint8_t scope;
+	s64 rc;
+
+	/*
+	 * If the firmware can't handle it, we will issue hot reset
+	 * on the secondary bus despite the requested reset type
+	 */
+	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
+		return __pnv_eeh_bridge_reset(dev, option);
+
+	/* The firmware can handle the request */
+	switch (option) {
+	case EEH_RESET_HOT:
+		scope = OPAL_RESET_PCI_HOT;
+		break;
+	case EEH_RESET_FUNDAMENTAL:
+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
+		break;
+	case EEH_RESET_DEACTIVATE:
+		return 0;
+	case EEH_RESET_COMPLETE:
+	default:
+		pr_warn("%s: Unsupported option %d on device %s\n",
+			__func__, option, pci_name(dev));
+		return -EINVAL;
+	}
+
+	hose = pci_bus_to_host(dev->bus);
+	phb = hose->private_data;
+	id |= (dev->bus->number << 24) | (dev->devfn << 16) | phb->opal_id;
+	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
+	if (rc > 0)
+		rc = pnv_eeh_poll(id);
+
+	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *hose;
 
 	if (pci_is_root_bus(dev->bus)) {
 		hose = pci_bus_to_host(dev->bus);
-		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
-		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
+		pnv_eeh_phb_reset(hose, EEH_RESET_HOT);
+		pnv_eeh_phb_reset(hose, EEH_RESET_DEACTIVATE);
 	} else {
 		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
 		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
@@ -920,8 +917,9 @@ void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 {
 	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb;
 	struct pci_bus *bus;
-	int ret;
+	s64 rc;
 
 	/*
 	 * For PHB reset, we always have complete reset. For those PEs whose
@@ -937,43 +935,37 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 	 * reset. The side effect is that EEH core has to clear the frozen
 	 * state explicitly after BAR restore.
 	 */
-	if (pe->type & EEH_PE_PHB) {
-		ret = pnv_eeh_phb_reset(hose, option);
-	} else {
-		struct pnv_phb *phb;
-		s64 rc;
+	if (pe->type & EEH_PE_PHB)
+		return pnv_eeh_phb_reset(hose, EEH_RESET_COMPLETE);
 
-		/*
-		 * The frozen PE might be caused by PAPR error injection
-		 * registers, which are expected to be cleared after hitting
-		 * frozen PE as stated in the hardware spec. Unfortunately,
-		 * that's not true on P7IOC. So we have to clear it manually
-		 * to avoid recursive EEH errors during recovery.
-		 */
-		phb = hose->private_data;
-		if (phb->model == PNV_PHB_MODEL_P7IOC &&
-		    (option == EEH_RESET_HOT ||
-		    option == EEH_RESET_FUNDAMENTAL)) {
-			rc = opal_pci_reset(phb->opal_id,
-					    OPAL_RESET_PHB_ERROR,
-					    OPAL_ASSERT_RESET);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Failure %lld clearing "
-					"error injection registers\n",
-					__func__, rc);
-				return -EIO;
-			}
+	/*
+	 * The frozen PE might be caused by PAPR error injection
+	 * registers, which are expected to be cleared after hitting
+	 * frozen PE as stated in the hardware spec. Unfortunately,
+	 * that's not true on P7IOC. So we have to clear it manually
+	 * to avoid recursive EEH errors during recovery.
+	 */
+	phb = hose->private_data;
+	if (phb->model == PNV_PHB_MODEL_P7IOC &&
+	    (option == EEH_RESET_HOT ||
+	    option == EEH_RESET_FUNDAMENTAL)) {
+		rc = opal_pci_reset(phb->opal_id,
+				    OPAL_RESET_PHB_ERROR,
+				    OPAL_ASSERT_RESET);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Failure %lld clearing error "
+				"injection registers on PHB#%d\n",
+				__func__, rc, hose->global_number);
+			return -EIO;
 		}
-
-		bus = eeh_pe_bus_get(pe);
-		if (pci_is_root_bus(bus) ||
-			pci_is_root_bus(bus->parent))
-			ret = pnv_eeh_root_reset(hose, option);
-		else
-			ret = pnv_eeh_bridge_reset(bus->self, option);
 	}
 
-	return ret;
+	/* Route the reset request to PHB or upstream bridge */
+	bus = eeh_pe_bus_get(pe);
+	if (pci_is_root_bus(bus))
+		return pnv_eeh_phb_reset(hose, option);
+
+	return pnv_eeh_bridge_reset(bus->self, option);
 }
 
 /**
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 10/21] powerpc/powernv: Use PCI slot reset infrastructure
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

For PowerNV platform, running on top of skiboot, all PE level reset
should be routed to firmware if the bridge of the PE primary bus has
device-node property "ibm,reset-by-firmware". Otherwise, the kernel
has to issue hot reset on PE's primary bus despite the requested reset
types, which is the behaviour before the firmware supports PCI slot
reset. So the changes don't depend on the PCI slot reset capability
exposed from the firmware.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |   1 +
 arch/powerpc/include/asm/opal.h              |   4 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 206 +++++++++++++--------------
 3 files changed, 102 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c5eb86f..2793d24 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -190,6 +190,7 @@ enum {
 #define EEH_RESET_DEACTIVATE	0	/* Deactivate the PE reset	*/
 #define EEH_RESET_HOT		1	/* Hot reset			*/
 #define EEH_RESET_FUNDAMENTAL	3	/* Fundamental reset		*/
+#define EEH_RESET_COMPLETE	4	/* PHB complete reset           */
 #define EEH_LOG_TEMP		1	/* EEH temporary error log	*/
 #define EEH_LOG_PERM		2	/* EEH permanent error log	*/
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..6d467df 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -129,7 +129,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint16_t dma_window_number, uint64_t pci_start_addr,
 					uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
 				   uint64_t diag_buffer_len);
@@ -145,7 +145,7 @@ int64_t opal_get_epow_status(__be64 *status);
 int64_t opal_set_system_attention_led(uint8_t led_action);
 int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
 			    __be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *val);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index ce738ab..3c01095 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -742,12 +742,12 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
+static s64 pnv_eeh_poll(uint64_t id)
 {
 	s64 rc = OPAL_HARDWARE;
 
 	while (1) {
-		rc = opal_pci_poll(phb->opal_id);
+		rc = opal_pci_poll(id, NULL);
 		if (rc <= 0)
 			break;
 
@@ -763,84 +763,38 @@ static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
+	uint8_t scope;
 	s64 rc = OPAL_HARDWARE;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
-
-	/* Issue PHB complete reset request */
-	if (option == EEH_RESET_FUNDAMENTAL ||
-	    option == EEH_RESET_HOT)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PHB_COMPLETE,
-				    OPAL_ASSERT_RESET);
-	else if (option == EEH_RESET_DEACTIVATE)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PHB_COMPLETE,
-				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
-
-	/*
-	 * Poll state of the PHB until the request is done
-	 * successfully. The PHB reset is usually PHB complete
-	 * reset followed by hot reset on root bus. So we also
-	 * need the PCI bus settlement delay.
-	 */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE) {
-		if (system_state < SYSTEM_RUNNING)
-			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
-		else
-			msleep(EEH_PE_RST_SETTLE_TIME);
+	switch (option) {
+	case EEH_RESET_HOT:
+		scope = OPAL_RESET_PCI_HOT;
+		break;
+	case EEH_RESET_FUNDAMENTAL:
+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
+		break;
+	case EEH_RESET_COMPLETE:
+		scope = OPAL_RESET_PHB_COMPLETE;
+		break;
+	case EEH_RESET_DEACTIVATE:
+		return 0;
+	default:
+		pr_warn("%s: Unsupported option %d\n",
+			__func__, option);
+		return -EINVAL;
 	}
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
-}
-
-static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
-{
-	struct pnv_phb *phb = hose->private_data;
-	s64 rc = OPAL_HARDWARE;
+	/* Issue reset and poll until it's completed */
+	rc = opal_pci_reset(phb->opal_id, scope, OPAL_ASSERT_RESET);
+	if (rc > 0)
+		rc = pnv_eeh_poll(phb->opal_id);
 
-	pr_debug("%s: Reset PHB#%x, option=%d\n",
-		 __func__, hose->global_number, option);
-
-	/*
-	 * During the reset deassert time, we needn't care
-	 * the reset scope because the firmware does nothing
-	 * for fundamental or hot reset during deassert phase.
-	 */
-	if (option == EEH_RESET_FUNDAMENTAL)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PCI_FUNDAMENTAL,
-				    OPAL_ASSERT_RESET);
-	else if (option == EEH_RESET_HOT)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PCI_HOT,
-				    OPAL_ASSERT_RESET);
-	else if (option == EEH_RESET_DEACTIVATE)
-		rc = opal_pci_reset(phb->opal_id,
-				    OPAL_RESET_PCI_HOT,
-				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
-
-	/* Poll state of the PHB until the request is done */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE)
-		msleep(EEH_PE_RST_SETTLE_TIME);
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
-
-	return 0;
+	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
 }
 
-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 {
 	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -891,14 +845,57 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct device_node *dn = dev ? pci_device_to_OF_node(dev) : NULL;
+	uint64_t id = (0x1ul << 60);
+	uint8_t scope;
+	s64 rc;
+
+	/*
+	 * If the firmware can't handle it, we will issue hot reset
+	 * on the secondary bus despite the requested reset type
+	 */
+	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
+		return __pnv_eeh_bridge_reset(dev, option);
+
+	/* The firmware can handle the request */
+	switch (option) {
+	case EEH_RESET_HOT:
+		scope = OPAL_RESET_PCI_HOT;
+		break;
+	case EEH_RESET_FUNDAMENTAL:
+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
+		break;
+	case EEH_RESET_DEACTIVATE:
+		return 0;
+	case EEH_RESET_COMPLETE:
+	default:
+		pr_warn("%s: Unsupported option %d on device %s\n",
+			__func__, option, pci_name(dev));
+		return -EINVAL;
+	}
+
+	hose = pci_bus_to_host(dev->bus);
+	phb = hose->private_data;
+	id |= (dev->bus->number << 24) | (dev->devfn << 16) | phb->opal_id;
+	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
+	if (rc > 0)
+		rc = pnv_eeh_poll(id);
+
+	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *hose;
 
 	if (pci_is_root_bus(dev->bus)) {
 		hose = pci_bus_to_host(dev->bus);
-		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
-		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
+		pnv_eeh_phb_reset(hose, EEH_RESET_HOT);
+		pnv_eeh_phb_reset(hose, EEH_RESET_DEACTIVATE);
 	} else {
 		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
 		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
@@ -920,8 +917,9 @@ void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 {
 	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb;
 	struct pci_bus *bus;
-	int ret;
+	s64 rc;
 
 	/*
 	 * For PHB reset, we always have complete reset. For those PEs whose
@@ -937,43 +935,37 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 	 * reset. The side effect is that EEH core has to clear the frozen
 	 * state explicitly after BAR restore.
 	 */
-	if (pe->type & EEH_PE_PHB) {
-		ret = pnv_eeh_phb_reset(hose, option);
-	} else {
-		struct pnv_phb *phb;
-		s64 rc;
+	if (pe->type & EEH_PE_PHB)
+		return pnv_eeh_phb_reset(hose, EEH_RESET_COMPLETE);
 
-		/*
-		 * The frozen PE might be caused by PAPR error injection
-		 * registers, which are expected to be cleared after hitting
-		 * frozen PE as stated in the hardware spec. Unfortunately,
-		 * that's not true on P7IOC. So we have to clear it manually
-		 * to avoid recursive EEH errors during recovery.
-		 */
-		phb = hose->private_data;
-		if (phb->model == PNV_PHB_MODEL_P7IOC &&
-		    (option == EEH_RESET_HOT ||
-		    option == EEH_RESET_FUNDAMENTAL)) {
-			rc = opal_pci_reset(phb->opal_id,
-					    OPAL_RESET_PHB_ERROR,
-					    OPAL_ASSERT_RESET);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Failure %lld clearing "
-					"error injection registers\n",
-					__func__, rc);
-				return -EIO;
-			}
+	/*
+	 * The frozen PE might be caused by PAPR error injection
+	 * registers, which are expected to be cleared after hitting
+	 * frozen PE as stated in the hardware spec. Unfortunately,
+	 * that's not true on P7IOC. So we have to clear it manually
+	 * to avoid recursive EEH errors during recovery.
+	 */
+	phb = hose->private_data;
+	if (phb->model == PNV_PHB_MODEL_P7IOC &&
+	    (option == EEH_RESET_HOT ||
+	    option == EEH_RESET_FUNDAMENTAL)) {
+		rc = opal_pci_reset(phb->opal_id,
+				    OPAL_RESET_PHB_ERROR,
+				    OPAL_ASSERT_RESET);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Failure %lld clearing error "
+				"injection registers on PHB#%d\n",
+				__func__, rc, hose->global_number);
+			return -EIO;
 		}
-
-		bus = eeh_pe_bus_get(pe);
-		if (pci_is_root_bus(bus) ||
-			pci_is_root_bus(bus->parent))
-			ret = pnv_eeh_root_reset(hose, option);
-		else
-			ret = pnv_eeh_bridge_reset(bus->self, option);
 	}
 
-	return ret;
+	/* Route the reset request to PHB or upstream bridge */
+	bus = eeh_pe_bus_get(pe);
+	if (pci_is_root_bus(bus))
+		return pnv_eeh_phb_reset(hose, option);
+
+	return pnv_eeh_bridge_reset(bus->self, option);
 }
 
 /**
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 11/21] powerpc/powernv: Fundamental reset for PCI bus reset
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

Function pnv_pci_reset_secondary_bus() is used to reset specified
PCI bus, which is leaded by root complex or PCI bridge. That means
the function shouldn't be called on PCI root bus and the patch
removes the logic for that case.

Also, some adapters beneath the indicated PCI bus may require
fundamental reset in order to successfully reload their firmwares
after the reset. The patch translates hot reset to fundamental reset
for that case.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 35 +++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 3c01095..58e4dcf 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -888,18 +888,35 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
 }
 
-void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
 {
-	struct pci_controller *hose;
+	int *freset = data;
 
-	if (pci_is_root_bus(dev->bus)) {
-		hose = pci_bus_to_host(dev->bus);
-		pnv_eeh_phb_reset(hose, EEH_RESET_HOT);
-		pnv_eeh_phb_reset(hose, EEH_RESET_DEACTIVATE);
-	} else {
-		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
-		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
+	/*
+	 * Stop the iteration immediately if there is any
+	 * one PCI device requesting fundamental reset
+	 */
+	*freset |= pdev->needs_freset;
+	return *freset;
+}
+
+void pnv_pci_reset_secondary_bus(struct pci_dev *pdev)
+{
+	int option = EEH_RESET_HOT;
+	int freset = 0;
+
+	/* Check if there're any PCI devices asking for fundamental reset */
+	if (pdev->subordinate) {
+		pci_walk_bus(pdev->subordinate,
+			     pnv_pci_dev_reset_type,
+			     &freset);
+		if (freset)
+			option = EEH_RESET_FUNDAMENTAL;
 	}
+
+	/* Issue the requested type of reset */
+	pnv_eeh_bridge_reset(pdev, option);
+	pnv_eeh_bridge_reset(pdev, EEH_RESET_DEACTIVATE);
 }
 
 /**
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 11/21] powerpc/powernv: Fundamental reset for PCI bus reset
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

Function pnv_pci_reset_secondary_bus() is used to reset specified
PCI bus, which is leaded by root complex or PCI bridge. That means
the function shouldn't be called on PCI root bus and the patch
removes the logic for that case.

Also, some adapters beneath the indicated PCI bus may require
fundamental reset in order to successfully reload their firmwares
after the reset. The patch translates hot reset to fundamental reset
for that case.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 35 +++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 3c01095..58e4dcf 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -888,18 +888,35 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
 }
 
-void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
 {
-	struct pci_controller *hose;
+	int *freset = data;
 
-	if (pci_is_root_bus(dev->bus)) {
-		hose = pci_bus_to_host(dev->bus);
-		pnv_eeh_phb_reset(hose, EEH_RESET_HOT);
-		pnv_eeh_phb_reset(hose, EEH_RESET_DEACTIVATE);
-	} else {
-		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
-		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
+	/*
+	 * Stop the iteration immediately if there is any
+	 * one PCI device requesting fundamental reset
+	 */
+	*freset |= pdev->needs_freset;
+	return *freset;
+}
+
+void pnv_pci_reset_secondary_bus(struct pci_dev *pdev)
+{
+	int option = EEH_RESET_HOT;
+	int freset = 0;
+
+	/* Check if there're any PCI devices asking for fundamental reset */
+	if (pdev->subordinate) {
+		pci_walk_bus(pdev->subordinate,
+			     pnv_pci_dev_reset_type,
+			     &freset);
+		if (freset)
+			option = EEH_RESET_FUNDAMENTAL;
 	}
+
+	/* Issue the requested type of reset */
+	pnv_eeh_bridge_reset(pdev, option);
+	pnv_eeh_bridge_reset(pdev, EEH_RESET_DEACTIVATE);
 }
 
 /**
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 12/21] powerpc/pci: Don't scan empty slot
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

In hotplug case, function pcibios_add_pci_devices() is called to
rescan the specified PCI bus, which might not have any child devices.
Access to the PCI bus's child device node will cause kernel crash
without exception. The patch adds condition of skipping scanning
PCI bus without child devices, in order to avoid kernel crash.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 0040343..651a866a 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -92,7 +92,8 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 	if (mode == PCI_PROBE_DEVTREE) {
 		/* use ofdt-based probe */
 		of_rescan_bus(dn, bus);
-	} else if (mode == PCI_PROBE_NORMAL) {
+	} else if (mode == PCI_PROBE_NORMAL &&
+		   dn->child && PCI_DN(dn->child)) {
 		/*
 		 * Use legacy probe. In the partial hotplug case, we
 		 * probably have grandchildren devices unplugged. So
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 12/21] powerpc/pci: Don't scan empty slot
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

In hotplug case, function pcibios_add_pci_devices() is called to
rescan the specified PCI bus, which might not have any child devices.
Access to the PCI bus's child device node will cause kernel crash
without exception. The patch adds condition of skipping scanning
PCI bus without child devices, in order to avoid kernel crash.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 0040343..651a866a 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -92,7 +92,8 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 	if (mode == PCI_PROBE_DEVTREE) {
 		/* use ofdt-based probe */
 		of_rescan_bus(dn, bus);
-	} else if (mode == PCI_PROBE_NORMAL) {
+	} else if (mode == PCI_PROBE_NORMAL &&
+		   dn->child && PCI_DN(dn->child)) {
 		/*
 		 * Use legacy probe. In the partial hotplug case, we
 		 * probably have grandchildren devices unplugged. So
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 13/21] powerpc/pci: Move pcibios_find_pci_bus() around
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The patch moves pcibios_find_pci_bus() to PPC kerenl directory so
that it can be reused by hotplug code for pSeries and PowerNV
platform at the same time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/pci-hotplug.c          | 36 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pci_dlpar.c | 32 --------------------------
 2 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 651a866a..67094fd 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,42 @@
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+static struct pci_bus *find_pci_bus(struct pci_bus *bus,
+				    struct device_node *dn)
+{
+	struct pci_bus *tmp, *child = NULL;
+	struct device_node *busdn;
+
+	busdn = pci_bus_to_OF_node(bus);
+	if (busdn == dn)
+		return bus;
+
+	list_for_each_entry(tmp, &bus->children, node) {
+		child = find_pci_bus(tmp, dn);
+		if (child)
+			break;
+	}
+
+	return child;
+}
+
+/**
+ * pcibios_find_pci_bus - find PCI bus according to the given device node
+ * @dn: Device node
+ *
+ * Find the corresponding PCI bus according to the given device node.
+ */
+struct pci_bus *pcibios_find_pci_bus(struct device_node *dn)
+{
+	struct pci_dn *pdn = PCI_DN(dn);
+
+	if (!pdn  || !pdn->phb || !pdn->phb->bus)
+		return NULL;
+
+	return find_pci_bus(pdn->phb->bus, dn);
+}
+EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
+
 /**
  * pcibios_release_device - release PCI device
  * @dev: PCI device
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 5d4a3df..906dbaa 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,38 +34,6 @@
 
 #include "pseries.h"
 
-static struct pci_bus *
-find_bus_among_children(struct pci_bus *bus,
-                        struct device_node *dn)
-{
-	struct pci_bus *child = NULL;
-	struct pci_bus *tmp;
-	struct device_node *busdn;
-
-	busdn = pci_bus_to_OF_node(bus);
-	if (busdn == dn)
-		return bus;
-
-	list_for_each_entry(tmp, &bus->children, node) {
-		child = find_bus_among_children(tmp, dn);
-		if (child)
-			break;
-	};
-	return child;
-}
-
-struct pci_bus *
-pcibios_find_pci_bus(struct device_node *dn)
-{
-	struct pci_dn *pdn = dn->data;
-
-	if (!pdn  || !pdn->phb || !pdn->phb->bus)
-		return NULL;
-
-	return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 13/21] powerpc/pci: Move pcibios_find_pci_bus() around
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The patch moves pcibios_find_pci_bus() to PPC kerenl directory so
that it can be reused by hotplug code for pSeries and PowerNV
platform at the same time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/pci-hotplug.c          | 36 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pci_dlpar.c | 32 --------------------------
 2 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 651a866a..67094fd 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,42 @@
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+static struct pci_bus *find_pci_bus(struct pci_bus *bus,
+				    struct device_node *dn)
+{
+	struct pci_bus *tmp, *child = NULL;
+	struct device_node *busdn;
+
+	busdn = pci_bus_to_OF_node(bus);
+	if (busdn == dn)
+		return bus;
+
+	list_for_each_entry(tmp, &bus->children, node) {
+		child = find_pci_bus(tmp, dn);
+		if (child)
+			break;
+	}
+
+	return child;
+}
+
+/**
+ * pcibios_find_pci_bus - find PCI bus according to the given device node
+ * @dn: Device node
+ *
+ * Find the corresponding PCI bus according to the given device node.
+ */
+struct pci_bus *pcibios_find_pci_bus(struct device_node *dn)
+{
+	struct pci_dn *pdn = PCI_DN(dn);
+
+	if (!pdn  || !pdn->phb || !pdn->phb->bus)
+		return NULL;
+
+	return find_pci_bus(pdn->phb->bus, dn);
+}
+EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
+
 /**
  * pcibios_release_device - release PCI device
  * @dev: PCI device
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 5d4a3df..906dbaa 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,38 +34,6 @@
 
 #include "pseries.h"
 
-static struct pci_bus *
-find_bus_among_children(struct pci_bus *bus,
-                        struct device_node *dn)
-{
-	struct pci_bus *child = NULL;
-	struct pci_bus *tmp;
-	struct device_node *busdn;
-
-	busdn = pci_bus_to_OF_node(bus);
-	if (busdn == dn)
-		return bus;
-
-	list_for_each_entry(tmp, &bus->children, node) {
-		child = find_bus_among_children(tmp, dn);
-		if (child)
-			break;
-	};
-	return child;
-}
-
-struct pci_bus *
-pcibios_find_pci_bus(struct device_node *dn)
-{
-	struct pci_dn *pdn = dn->data;
-
-	if (!pdn  || !pdn->phb || !pdn->phb->bus)
-		return NULL;
-
-	return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 14/21] powerpc/powernv: Introduce pnv_pci_poll()
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

We might not get some PCI slot information (e.g. power status)
immediately by OPAL API. Instead, opal_pci_poll() need to be called
for the required information.

The patch introduces pnv_pci_poll(), which bases on original
pnv_eeh_poll(), to cover the above case

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 28 ++--------------------------
 arch/powerpc/platforms/powernv/pci.c         | 16 ++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h         |  1 +
 3 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 58e4dcf..9253b9e 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -742,24 +742,6 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_poll(uint64_t id)
-{
-	s64 rc = OPAL_HARDWARE;
-
-	while (1) {
-		rc = opal_pci_poll(id, NULL);
-		if (rc <= 0)
-			break;
-
-		if (system_state < SYSTEM_RUNNING)
-			udelay(1000 * rc);
-		else
-			msleep(rc);
-	}
-
-	return rc;
-}
-
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
@@ -788,10 +770,7 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 
 	/* Issue reset and poll until it's completed */
 	rc = opal_pci_reset(phb->opal_id, scope, OPAL_ASSERT_RESET);
-	if (rc > 0)
-		rc = pnv_eeh_poll(phb->opal_id);
-
-	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+	return pnv_pci_poll(phb->opal_id, rc, NULL);
 }
 
 static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
@@ -882,10 +861,7 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	phb = hose->private_data;
 	id |= (dev->bus->number << 24) | (dev->devfn << 16) | phb->opal_id;
 	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
-	if (rc > 0)
-		rc = pnv_eeh_poll(id);
-
-	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+	return pnv_pci_poll(id, rc, NULL);
 }
 
 static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index bca2aeb..a2da9a3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -44,6 +44,22 @@
 #define cfg_dbg(fmt...)	do { } while(0)
 //#define cfg_dbg(fmt...)	printk(fmt)
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
+{
+	while (rval > 0) {
+		if (system_state < SYSTEM_RUNNING)
+			udelay(1000 * rval);
+		else
+			msleep(rval);
+
+		rval = opal_pci_poll(id, pval);
+		if (rval == OPAL_SUCCESS && pval)
+			rval = opal_pci_poll(id, pval);
+	}
+
+	return rval ? -EIO : 0;
+}
+
 #ifdef CONFIG_PCI_MSI
 static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 8b10f01..82c5539 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -202,6 +202,7 @@ struct pnv_phb {
 
 extern struct pci_ops pnv_pci_ops;
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval);
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
 				unsigned char *log_buff);
 int pnv_pci_cfg_read(struct pci_dn *pdn,
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 14/21] powerpc/powernv: Introduce pnv_pci_poll()
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

We might not get some PCI slot information (e.g. power status)
immediately by OPAL API. Instead, opal_pci_poll() need to be called
for the required information.

The patch introduces pnv_pci_poll(), which bases on original
pnv_eeh_poll(), to cover the above case

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 28 ++--------------------------
 arch/powerpc/platforms/powernv/pci.c         | 16 ++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h         |  1 +
 3 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 58e4dcf..9253b9e 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -742,24 +742,6 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_poll(uint64_t id)
-{
-	s64 rc = OPAL_HARDWARE;
-
-	while (1) {
-		rc = opal_pci_poll(id, NULL);
-		if (rc <= 0)
-			break;
-
-		if (system_state < SYSTEM_RUNNING)
-			udelay(1000 * rc);
-		else
-			msleep(rc);
-	}
-
-	return rc;
-}
-
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
@@ -788,10 +770,7 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 
 	/* Issue reset and poll until it's completed */
 	rc = opal_pci_reset(phb->opal_id, scope, OPAL_ASSERT_RESET);
-	if (rc > 0)
-		rc = pnv_eeh_poll(phb->opal_id);
-
-	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+	return pnv_pci_poll(phb->opal_id, rc, NULL);
 }
 
 static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
@@ -882,10 +861,7 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	phb = hose->private_data;
 	id |= (dev->bus->number << 24) | (dev->devfn << 16) | phb->opal_id;
 	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
-	if (rc > 0)
-		rc = pnv_eeh_poll(id);
-
-	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+	return pnv_pci_poll(id, rc, NULL);
 }
 
 static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index bca2aeb..a2da9a3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -44,6 +44,22 @@
 #define cfg_dbg(fmt...)	do { } while(0)
 //#define cfg_dbg(fmt...)	printk(fmt)
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
+{
+	while (rval > 0) {
+		if (system_state < SYSTEM_RUNNING)
+			udelay(1000 * rval);
+		else
+			msleep(rval);
+
+		rval = opal_pci_poll(id, pval);
+		if (rval == OPAL_SUCCESS && pval)
+			rval = opal_pci_poll(id, pval);
+	}
+
+	return rval ? -EIO : 0;
+}
+
 #ifdef CONFIG_PCI_MSI
 static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 8b10f01..82c5539 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -202,6 +202,7 @@ struct pnv_phb {
 
 extern struct pci_ops pnv_pci_ops;
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval);
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
 				unsigned char *log_buff);
 int pnv_pci_cfg_read(struct pci_dn *pdn,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 15/21] powerpc/powernv: Functions to get/reset PCI slot status
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The patch exports 3 functions, which base on corresponding OPAL
APIs to get or set PCI slot status. Those functions are going to
be used by PCI hotplug module in subsequent patches:

   pnv_pci_get_presence_status()  opal_pci_get_presence_status()
   pnv_pci_get_power_status()     opal_pci_get_power_status()
   pnv_pci_set_power_status()     opal_pci_set_power_status()

Besides, the patch also exports pnv_pci_hotplug_notifier() to allow
registering PCI hotplug notifier, which will be used to receive PCI
hotplug message from skiboot firmware.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h            |  7 +++-
 arch/powerpc/include/asm/opal.h                |  3 ++
 arch/powerpc/include/asm/pnv-pci.h             |  5 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  3 ++
 arch/powerpc/platforms/powernv/pci.c           | 45 ++++++++++++++++++++++++++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..29b407d 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,10 @@
 #define OPAL_FLASH_READ				110
 #define OPAL_FLASH_WRITE			111
 #define OPAL_FLASH_ERASE			112
-#define OPAL_LAST				112
+#define OPAL_PCI_GET_PRESENCE_STATUS		116
+#define OPAL_PCI_GET_POWER_STATUS		117
+#define OPAL_PCI_SET_POWER_STATUS		118
+#define OPAL_LAST				118
 
 /* Device tree flags */
 
@@ -352,6 +355,8 @@ enum opal_msg_type {
 	OPAL_MSG_SHUTDOWN,		/* params[0] = 1 reboot, 0 shutdown */
 	OPAL_MSG_HMI_EVT,
 	OPAL_MSG_DPO,
+	OPAL_MSG_PRD,
+	OPAL_MSG_PCI_HOTPLUG,
 	OPAL_MSG_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 6d467df..a0eb206 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -200,6 +200,9 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
 		uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
 		uint64_t token);
+int64_t opal_pci_get_presence_status(uint64_t id, uint8_t *status);
+int64_t opal_pci_get_power_status(uint64_t id, uint8_t *status);
+int64_t opal_pci_set_power_status(uint64_t id, uint8_t status);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index f9b4982..50d92a4 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,11 @@
 #include <linux/pci.h>
 #include <misc/cxl.h>
 
+extern int pnv_pci_get_presence_status(uint64_t id, uint8_t *status);
+extern int pnv_pci_get_power_status(uint64_t id, uint8_t *status);
+extern int pnv_pci_set_power_status(uint64_t id, uint8_t status);
+extern int pnv_pci_hotplug_notifier(struct notifier_block *nb, bool enable);
+
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 			   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a7ade94..aa95dcb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,6 @@ OPAL_CALL(opal_i2c_request,			OPAL_I2C_REQUEST);
 OPAL_CALL(opal_flash_read,			OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write,			OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase,			OPAL_FLASH_ERASE);
+OPAL_CALL(opal_pci_get_presence_status,		OPAL_PCI_GET_PRESENCE_STATUS);
+OPAL_CALL(opal_pci_get_power_status,		OPAL_PCI_GET_POWER_STATUS);
+OPAL_CALL(opal_pci_set_power_status,		OPAL_PCI_SET_POWER_STATUS);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index a2da9a3..60e6d65 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -60,6 +60,51 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
 	return rval ? -EIO : 0;
 }
 
+int pnv_pci_get_presence_status(uint64_t id, uint8_t *status)
+{
+	long rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_get_presence_status(id, status);
+	return pnv_pci_poll(id, rc, status);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_presence_status);
+
+int pnv_pci_get_power_status(uint64_t id, uint8_t *status)
+{
+	long rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_POWER_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_get_power_status(id, status);
+	return pnv_pci_poll(id, rc, status);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_power_status);
+
+int pnv_pci_set_power_status(uint64_t id, uint8_t status)
+{
+	long rc;
+
+	if (!opal_check_token(OPAL_PCI_SET_POWER_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_set_power_status(id, status);
+	return pnv_pci_poll(id, rc, NULL);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_set_power_status);
+
+int pnv_pci_hotplug_notifier(struct notifier_block *nb, bool enable)
+{
+	if (enable)
+		return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
+
+	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier);
+
 #ifdef CONFIG_PCI_MSI
 static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 15/21] powerpc/powernv: Functions to get/reset PCI slot status
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The patch exports 3 functions, which base on corresponding OPAL
APIs to get or set PCI slot status. Those functions are going to
be used by PCI hotplug module in subsequent patches:

   pnv_pci_get_presence_status()  opal_pci_get_presence_status()
   pnv_pci_get_power_status()     opal_pci_get_power_status()
   pnv_pci_set_power_status()     opal_pci_set_power_status()

Besides, the patch also exports pnv_pci_hotplug_notifier() to allow
registering PCI hotplug notifier, which will be used to receive PCI
hotplug message from skiboot firmware.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h            |  7 +++-
 arch/powerpc/include/asm/opal.h                |  3 ++
 arch/powerpc/include/asm/pnv-pci.h             |  5 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  3 ++
 arch/powerpc/platforms/powernv/pci.c           | 45 ++++++++++++++++++++++++++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..29b407d 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,10 @@
 #define OPAL_FLASH_READ				110
 #define OPAL_FLASH_WRITE			111
 #define OPAL_FLASH_ERASE			112
-#define OPAL_LAST				112
+#define OPAL_PCI_GET_PRESENCE_STATUS		116
+#define OPAL_PCI_GET_POWER_STATUS		117
+#define OPAL_PCI_SET_POWER_STATUS		118
+#define OPAL_LAST				118
 
 /* Device tree flags */
 
@@ -352,6 +355,8 @@ enum opal_msg_type {
 	OPAL_MSG_SHUTDOWN,		/* params[0] = 1 reboot, 0 shutdown */
 	OPAL_MSG_HMI_EVT,
 	OPAL_MSG_DPO,
+	OPAL_MSG_PRD,
+	OPAL_MSG_PCI_HOTPLUG,
 	OPAL_MSG_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 6d467df..a0eb206 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -200,6 +200,9 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
 		uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
 		uint64_t token);
+int64_t opal_pci_get_presence_status(uint64_t id, uint8_t *status);
+int64_t opal_pci_get_power_status(uint64_t id, uint8_t *status);
+int64_t opal_pci_set_power_status(uint64_t id, uint8_t status);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index f9b4982..50d92a4 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,11 @@
 #include <linux/pci.h>
 #include <misc/cxl.h>
 
+extern int pnv_pci_get_presence_status(uint64_t id, uint8_t *status);
+extern int pnv_pci_get_power_status(uint64_t id, uint8_t *status);
+extern int pnv_pci_set_power_status(uint64_t id, uint8_t status);
+extern int pnv_pci_hotplug_notifier(struct notifier_block *nb, bool enable);
+
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 			   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a7ade94..aa95dcb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,6 @@ OPAL_CALL(opal_i2c_request,			OPAL_I2C_REQUEST);
 OPAL_CALL(opal_flash_read,			OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write,			OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase,			OPAL_FLASH_ERASE);
+OPAL_CALL(opal_pci_get_presence_status,		OPAL_PCI_GET_PRESENCE_STATUS);
+OPAL_CALL(opal_pci_get_power_status,		OPAL_PCI_GET_POWER_STATUS);
+OPAL_CALL(opal_pci_set_power_status,		OPAL_PCI_SET_POWER_STATUS);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index a2da9a3..60e6d65 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -60,6 +60,51 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
 	return rval ? -EIO : 0;
 }
 
+int pnv_pci_get_presence_status(uint64_t id, uint8_t *status)
+{
+	long rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_get_presence_status(id, status);
+	return pnv_pci_poll(id, rc, status);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_presence_status);
+
+int pnv_pci_get_power_status(uint64_t id, uint8_t *status)
+{
+	long rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_POWER_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_get_power_status(id, status);
+	return pnv_pci_poll(id, rc, status);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_power_status);
+
+int pnv_pci_set_power_status(uint64_t id, uint8_t status)
+{
+	long rc;
+
+	if (!opal_check_token(OPAL_PCI_SET_POWER_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_set_power_status(id, status);
+	return pnv_pci_poll(id, rc, NULL);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_set_power_status);
+
+int pnv_pci_hotplug_notifier(struct notifier_block *nb, bool enable)
+{
+	if (enable)
+		return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
+
+	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier);
+
 #ifdef CONFIG_PCI_MSI
 static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 16/21] powerpc/pci: Delay creating pci_dn
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The pci_dn instances are allocated from memblock or bootmem when
creating PCI controller (hoses) in setup_arch(). The PCI hotplug,
which will be supported by proceeding patches, will release PCI
device nodes and their corresponding pci_dn on unplugging event.
The pci_dn instance memory chunks alloed from memblock or bootmem
are hard to reused after being released.

The patch delay creating pci_dn so that they can be allocated from
slab. In turn, the memory chunks for them can be reused after being
released without problem. The creation of eeh_dev instances, which
depends on pci_dn, is delayed a bit as well.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/ppc-pci.h     |  1 -
 arch/powerpc/kernel/eeh_dev.c          |  2 +-
 arch/powerpc/kernel/pci_dn.c           | 40 +++++++++++++++++++---------------
 arch/powerpc/platforms/maple/pci.c     | 35 +++++++++++++++++------------
 arch/powerpc/platforms/pasemi/pci.c    |  3 ---
 arch/powerpc/platforms/powermac/pci.c  | 39 ++++++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.c   |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 8 files changed, 68 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 4122a86..7388316 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -40,7 +40,6 @@ void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
 
-extern void pci_devs_phb_init(void);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
 /* From rtas_pci.h */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index aabba94..f33ce5b 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -110,4 +110,4 @@ static int __init eeh_dev_phb_init(void)
 	return 0;
 }
 
-core_initcall(eeh_dev_phb_init);
+core_initcall_sync(eeh_dev_phb_init);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index b3b4df9..d3833af 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -277,7 +277,7 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	struct device_node *parent;
 	struct pci_dn *pdn;
 
-	pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL);
+	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
 	if (pdn == NULL)
 		return NULL;
 	dn->data = pdn;
@@ -442,33 +442,37 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	traverse_pci_devices(dn, update_dn_pci_info, phb);
 }
 
-/** 
+static void pci_dev_pdn_setup(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn;
+
+	if (pdev->dev.archdata.pci_data)
+		return;
+
+	/* Setup the fast path */
+	pdn = pci_get_pdn(pdev);
+	pdev->dev.archdata.pci_data = pdn;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);
+
+/*
  * pci_devs_phb_init - Initialize phbs and pci devs under them.
- * 
- * This routine walks over all phb's (pci-host bridges) on the
- * system, and sets up assorted pci-related structures 
+ *
+ * This routine walks over all phb's (pci-host bridges) on
+ * the system, and sets up assorted pci-related structures
  * (including pci info in the device node structs) for each
  * pci device found underneath.  This routine runs once,
  * early in the boot sequence.
  */
-void __init pci_devs_phb_init(void)
+static int __init pci_devs_phb_init(void)
 {
 	struct pci_controller *phb, *tmp;
 
 	/* This must be done first so the device nodes have valid pci info! */
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		pci_devs_phb_init_dynamic(phb);
-}
-
-static void pci_dev_pdn_setup(struct pci_dev *pdev)
-{
-	struct pci_dn *pdn;
 
-	if (pdev->dev.archdata.pci_data)
-		return;
-
-	/* Setup the fast path */
-	pdn = pci_get_pdn(pdev);
-	pdev->dev.archdata.pci_data = pdn;
+	return 0;
 }
-DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);
+
+core_initcall(pci_devs_phb_init);
diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
index a923230..04a69a8 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -568,6 +568,26 @@ void maple_pci_irq_fixup(struct pci_dev *dev)
 	DBG(" <- maple_pci_irq_fixup\n");
 }
 
+static int maple_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions hopefully.
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+
 void __init maple_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -605,20 +625,7 @@ void __init maple_pci_init(void)
 	if (ht && maple_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */ 
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions hopefully.
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
-
+	ppc_md.pcibios_root_bridge_prepare = maple_pci_root_bridge_prepare;
 	/* Tell pci.c to not change any resource allocations.  */
 	pci_add_flags(PCI_PROBE_ONLY);
 }
diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
index f3a68a0..10c4e8f 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -229,9 +229,6 @@ void __init pas_pci_init(void)
 			of_node_get(np);
 
 	of_node_put(root);
-
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
 }
 
 void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
index 59ab16f..368716f 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -878,6 +878,29 @@ void pmac_pci_irq_fixup(struct pci_dev *dev)
 #endif /* CONFIG_PPC32 */
 }
 
+#ifdef CONFIG_PPC64
+static int pmac_pci_root_bridge_prepare(struct pci_hot_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions for now. We should do something better in the
+	 * future though
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+#endif /* CONFIG_PPC64 */
+
 void __init pmac_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -914,22 +937,8 @@ void __init pmac_pci_init(void)
 	if (ht && pmac_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions for now. We should do something better in the
-	 * future though
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
 	/* pmac_check_ht_link(); */
-
+	ppc_md.pcibios_root_bridge_prepare = pmac_pci_root_bridge_prepare;
 #else /* CONFIG_PPC64 */
 	init_p2pbridge();
 	init_second_ohare();
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 60e6d65..21a4eb3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -819,9 +819,6 @@ void __init pnv_pci_init(void)
 	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
 		pnv_pci_init_ioda2_phb(np);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
 	/* Configure IOMMU DMA hooks */
 	ppc_md.tce_build = pnv_tce_build_vm;
 	ppc_md.tce_free = pnv_tce_free_vm;
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index df6a704..5f80758 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -482,7 +482,6 @@ static void __init find_and_init_phbs(void)
 	}
 
 	of_node_put(root);
-	pci_devs_phb_init();
 
 	/*
 	 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 16/21] powerpc/pci: Delay creating pci_dn
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The pci_dn instances are allocated from memblock or bootmem when
creating PCI controller (hoses) in setup_arch(). The PCI hotplug,
which will be supported by proceeding patches, will release PCI
device nodes and their corresponding pci_dn on unplugging event.
The pci_dn instance memory chunks alloed from memblock or bootmem
are hard to reused after being released.

The patch delay creating pci_dn so that they can be allocated from
slab. In turn, the memory chunks for them can be reused after being
released without problem. The creation of eeh_dev instances, which
depends on pci_dn, is delayed a bit as well.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/ppc-pci.h     |  1 -
 arch/powerpc/kernel/eeh_dev.c          |  2 +-
 arch/powerpc/kernel/pci_dn.c           | 40 +++++++++++++++++++---------------
 arch/powerpc/platforms/maple/pci.c     | 35 +++++++++++++++++------------
 arch/powerpc/platforms/pasemi/pci.c    |  3 ---
 arch/powerpc/platforms/powermac/pci.c  | 39 ++++++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.c   |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 8 files changed, 68 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 4122a86..7388316 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -40,7 +40,6 @@ void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
 
-extern void pci_devs_phb_init(void);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
 /* From rtas_pci.h */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index aabba94..f33ce5b 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -110,4 +110,4 @@ static int __init eeh_dev_phb_init(void)
 	return 0;
 }
 
-core_initcall(eeh_dev_phb_init);
+core_initcall_sync(eeh_dev_phb_init);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index b3b4df9..d3833af 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -277,7 +277,7 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	struct device_node *parent;
 	struct pci_dn *pdn;
 
-	pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL);
+	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
 	if (pdn == NULL)
 		return NULL;
 	dn->data = pdn;
@@ -442,33 +442,37 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	traverse_pci_devices(dn, update_dn_pci_info, phb);
 }
 
-/** 
+static void pci_dev_pdn_setup(struct pci_dev *pdev)
+{
+	struct pci_dn *pdn;
+
+	if (pdev->dev.archdata.pci_data)
+		return;
+
+	/* Setup the fast path */
+	pdn = pci_get_pdn(pdev);
+	pdev->dev.archdata.pci_data = pdn;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);
+
+/*
  * pci_devs_phb_init - Initialize phbs and pci devs under them.
- * 
- * This routine walks over all phb's (pci-host bridges) on the
- * system, and sets up assorted pci-related structures 
+ *
+ * This routine walks over all phb's (pci-host bridges) on
+ * the system, and sets up assorted pci-related structures
  * (including pci info in the device node structs) for each
  * pci device found underneath.  This routine runs once,
  * early in the boot sequence.
  */
-void __init pci_devs_phb_init(void)
+static int __init pci_devs_phb_init(void)
 {
 	struct pci_controller *phb, *tmp;
 
 	/* This must be done first so the device nodes have valid pci info! */
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		pci_devs_phb_init_dynamic(phb);
-}
-
-static void pci_dev_pdn_setup(struct pci_dev *pdev)
-{
-	struct pci_dn *pdn;
 
-	if (pdev->dev.archdata.pci_data)
-		return;
-
-	/* Setup the fast path */
-	pdn = pci_get_pdn(pdev);
-	pdev->dev.archdata.pci_data = pdn;
+	return 0;
 }
-DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);
+
+core_initcall(pci_devs_phb_init);
diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
index a923230..04a69a8 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -568,6 +568,26 @@ void maple_pci_irq_fixup(struct pci_dev *dev)
 	DBG(" <- maple_pci_irq_fixup\n");
 }
 
+static int maple_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions hopefully.
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+
 void __init maple_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -605,20 +625,7 @@ void __init maple_pci_init(void)
 	if (ht && maple_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */ 
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions hopefully.
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
-
+	ppc_md.pcibios_root_bridge_prepare = maple_pci_root_bridge_prepare;
 	/* Tell pci.c to not change any resource allocations.  */
 	pci_add_flags(PCI_PROBE_ONLY);
 }
diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
index f3a68a0..10c4e8f 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -229,9 +229,6 @@ void __init pas_pci_init(void)
 			of_node_get(np);
 
 	of_node_put(root);
-
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
 }
 
 void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
index 59ab16f..368716f 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -878,6 +878,29 @@ void pmac_pci_irq_fixup(struct pci_dev *dev)
 #endif /* CONFIG_PPC32 */
 }
 
+#ifdef CONFIG_PPC64
+static int pmac_pci_root_bridge_prepare(struct pci_hot_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions for now. We should do something better in the
+	 * future though
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+#endif /* CONFIG_PPC64 */
+
 void __init pmac_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -914,22 +937,8 @@ void __init pmac_pci_init(void)
 	if (ht && pmac_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions for now. We should do something better in the
-	 * future though
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
 	/* pmac_check_ht_link(); */
-
+	ppc_md.pcibios_root_bridge_prepare = pmac_pci_root_bridge_prepare;
 #else /* CONFIG_PPC64 */
 	init_p2pbridge();
 	init_second_ohare();
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 60e6d65..21a4eb3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -819,9 +819,6 @@ void __init pnv_pci_init(void)
 	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
 		pnv_pci_init_ioda2_phb(np);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
 	/* Configure IOMMU DMA hooks */
 	ppc_md.tce_build = pnv_tce_build_vm;
 	ppc_md.tce_free = pnv_tce_free_vm;
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index df6a704..5f80758 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -482,7 +482,6 @@ static void __init find_and_init_phbs(void)
 	}
 
 	of_node_put(root);
-	pci_devs_phb_init();
 
 	/*
 	 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 17/21] powerpc/pci: Create eeh_dev while creating pci_dn
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The eeh_dev is always created based on pci_dn, but with initcall
supported by core_initcall_sync(). The patch creates eeh_dev
when pci_dn is created, indicating they have same life cycle.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h         |  6 ++++--
 arch/powerpc/kernel/eeh_dev.c          | 18 ++++--------------
 arch/powerpc/kernel/pci_dn.c           | 12 ++++++++++++
 arch/powerpc/platforms/pseries/setup.c |  6 +-----
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 2793d24..4ed88f6 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -269,7 +269,8 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
-void *eeh_dev_init(struct pci_dn *pdn, void *data);
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn,
+			     struct pci_controller *phb);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
@@ -322,7 +323,8 @@ static inline int eeh_init(void)
 	return 0;
 }
 
-static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
+static inline struct eeh_dev *eeh_dev_init(struct pci_dn *pdn,
+					   struct pci_controller *phb)
 {
 	return NULL;
 }
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index f33ce5b..7486932 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -44,14 +44,14 @@
 /**
  * eeh_dev_init - Create EEH device according to OF node
  * @pdn: PCI device node
- * @data: PHB
+ * @phb: PCI controller
  *
  * It will create EEH device according to the given OF node. The function
  * might be called by PCI emunation, DR, PHB hotplug.
  */
-void *eeh_dev_init(struct pci_dn *pdn, void *data)
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn,
+			     struct pci_controller *phb)
 {
-	struct pci_controller *phb = data;
 	struct eeh_dev *edev;
 
 	/* Allocate EEH device */
@@ -68,7 +68,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
 	edev->phb = phb;
 	INIT_LIST_HEAD(&edev->list);
 
-	return NULL;
+	return edev;
 }
 
 /**
@@ -80,16 +80,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
  */
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
 {
-	struct pci_dn *root = phb->pci_data;
-
 	/* EEH PE for PHB */
 	eeh_phb_pe_create(phb);
-
-	/* EEH device for PHB */
-	eeh_dev_init(root, phb);
-
-	/* EEH devices for children OF nodes */
-	traverse_pci_dn(root, eeh_dev_init, phb);
 }
 
 /**
@@ -105,8 +97,6 @@ static int __init eeh_dev_phb_init(void)
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		eeh_dev_phb_init_dynamic(phb);
 
-	pr_info("EEH: devices created\n");
-
 	return 0;
 }
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index d3833af..abc81fa 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -276,6 +276,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	const __be32 *regs;
 	struct device_node *parent;
 	struct pci_dn *pdn;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev;
+#endif
 
 	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
 	if (pdn == NULL)
@@ -306,6 +309,15 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	/* Extended config space */
 	pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1);
 
+	/* Initialize EEH device */
+#ifdef CONFIG_EEH
+	edev = eeh_dev_init(pdn, phb);
+	if (!edev) {
+		kfree(pdn);
+		return NULL;
+	}
+#endif
+
 	/* Attach to parent node */
 	INIT_LIST_HEAD(&pdn->child_list);
 	INIT_LIST_HEAD(&pdn->list);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 5f80758..92974aa 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -261,12 +261,8 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	switch (action) {
 	case OF_RECONFIG_ATTACH_NODE:
 		pci = np->parent->data;
-		if (pci) {
+		if (pci)
 			update_dn_pci_info(np, pci->phb);
-
-			/* Create EEH device for the OF node */
-			eeh_dev_init(PCI_DN(np), pci->phb);
-		}
 		break;
 	default:
 		err = NOTIFY_DONE;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 17/21] powerpc/pci: Create eeh_dev while creating pci_dn
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The eeh_dev is always created based on pci_dn, but with initcall
supported by core_initcall_sync(). The patch creates eeh_dev
when pci_dn is created, indicating they have same life cycle.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h         |  6 ++++--
 arch/powerpc/kernel/eeh_dev.c          | 18 ++++--------------
 arch/powerpc/kernel/pci_dn.c           | 12 ++++++++++++
 arch/powerpc/platforms/pseries/setup.c |  6 +-----
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 2793d24..4ed88f6 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -269,7 +269,8 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
-void *eeh_dev_init(struct pci_dn *pdn, void *data);
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn,
+			     struct pci_controller *phb);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
@@ -322,7 +323,8 @@ static inline int eeh_init(void)
 	return 0;
 }
 
-static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
+static inline struct eeh_dev *eeh_dev_init(struct pci_dn *pdn,
+					   struct pci_controller *phb)
 {
 	return NULL;
 }
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index f33ce5b..7486932 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -44,14 +44,14 @@
 /**
  * eeh_dev_init - Create EEH device according to OF node
  * @pdn: PCI device node
- * @data: PHB
+ * @phb: PCI controller
  *
  * It will create EEH device according to the given OF node. The function
  * might be called by PCI emunation, DR, PHB hotplug.
  */
-void *eeh_dev_init(struct pci_dn *pdn, void *data)
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn,
+			     struct pci_controller *phb)
 {
-	struct pci_controller *phb = data;
 	struct eeh_dev *edev;
 
 	/* Allocate EEH device */
@@ -68,7 +68,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
 	edev->phb = phb;
 	INIT_LIST_HEAD(&edev->list);
 
-	return NULL;
+	return edev;
 }
 
 /**
@@ -80,16 +80,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
  */
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
 {
-	struct pci_dn *root = phb->pci_data;
-
 	/* EEH PE for PHB */
 	eeh_phb_pe_create(phb);
-
-	/* EEH device for PHB */
-	eeh_dev_init(root, phb);
-
-	/* EEH devices for children OF nodes */
-	traverse_pci_dn(root, eeh_dev_init, phb);
 }
 
 /**
@@ -105,8 +97,6 @@ static int __init eeh_dev_phb_init(void)
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		eeh_dev_phb_init_dynamic(phb);
 
-	pr_info("EEH: devices created\n");
-
 	return 0;
 }
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index d3833af..abc81fa 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -276,6 +276,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	const __be32 *regs;
 	struct device_node *parent;
 	struct pci_dn *pdn;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev;
+#endif
 
 	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
 	if (pdn == NULL)
@@ -306,6 +309,15 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	/* Extended config space */
 	pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1);
 
+	/* Initialize EEH device */
+#ifdef CONFIG_EEH
+	edev = eeh_dev_init(pdn, phb);
+	if (!edev) {
+		kfree(pdn);
+		return NULL;
+	}
+#endif
+
 	/* Attach to parent node */
 	INIT_LIST_HEAD(&pdn->child_list);
 	INIT_LIST_HEAD(&pdn->list);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 5f80758..92974aa 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -261,12 +261,8 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	switch (action) {
 	case OF_RECONFIG_ATTACH_NODE:
 		pci = np->parent->data;
-		if (pci) {
+		if (pci)
 			update_dn_pci_info(np, pci->phb);
-
-			/* Create EEH device for the OF node */
-			eeh_dev_init(PCI_DN(np), pci->phb);
-		}
 		break;
 	default:
 		err = NOTIFY_DONE;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 18/21] powerpc/pci: Export traverse_pci_device_nodes()
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The patch exports following functions, which are derived from their
original implementation, so that the PCI hotplug logic can reuse
the functions to add or remove pci_dn for all device nodes under
specified PCI slot.

   traverse_pci_device_nodes()     traverse_pci_devices()
   add_pci_device_node_info()      update_dn_pci_info()
   remove_pci_device_node_info()   newly added

The patch also releases eeh_dev when its corresponding pci_dn
is released, indicating they have same life cycle.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h  |  3 +-
 arch/powerpc/include/asm/ppc-pci.h     |  6 +--
 arch/powerpc/kernel/pci_dn.c           | 67 +++++++++++++++++++++++++++++-----
 arch/powerpc/platforms/pseries/msi.c   |  4 +-
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 5 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index a6ad4b1..e0cb114 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -222,7 +222,8 @@ extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
 extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
-extern void *update_dn_pci_info(struct device_node *dn, void *data);
+extern void *add_pci_device_node_info(struct device_node *dn, void *data);
+extern void remove_pci_device_node_info(struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 7388316..3f0874e 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
 struct device_node;
 struct pci_dn;
 
-typedef void *(*traverse_func)(struct device_node *me, void *data);
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data);
+void *traverse_pci_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *),
+				void *data);
 void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index abc81fa..6bd9d8c 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -265,11 +265,15 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 #endif /* CONFIG_PCI_IOV */
 }
 
-/*
- * Traverse_func that inits the PCI fields of the device node.
- * NOTE: this *must* be done before read/write config to the device.
+/**
+ * add_pci_device_node_info - Add pci_dn for PCI device node
+ * @dn: PCI device node
+ * @data: additonal argument
+ *
+ * Add pci_dn for the indicated PCI device node. The newly created
+ * pci_dn will be put into that one of the parent device node.
  */
-void *update_dn_pci_info(struct device_node *dn, void *data)
+void *add_pci_device_node_info(struct device_node *dn, void *data)
 {
 	struct pci_controller *phb = data;
 	const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL);
@@ -328,8 +332,48 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 
 	return NULL;
 }
+EXPORT_SYMBOL(add_pci_device_node_info);
 
-/*
+/**
+ * remove_pci_device_node_info - Remove pci_dn from PCI device node
+ * @dn: PCI device node
+ *
+ * Remove pci_dn from PCI device node. The pci_dn is also removed
+ * from the child list of the parent pci_dn.
+ */
+void remove_pci_device_node_info(struct device_node *np)
+{
+	struct pci_dn *pdn = np ? PCI_DN(np) : NULL;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+#endif
+
+	if (!pdn)
+		return;
+
+#ifdef CONFIG_EEH
+	if (edev) {
+		pdn->edev = NULL;
+		kfree(edev);
+	}
+#endif
+
+	BUG_ON(!list_empty(&pdn->child_list));
+	list_del(&pdn->list);
+	if (pdn->parent)
+		of_node_put(pdn->parent->node);
+
+	np->data = NULL;
+	kfree(pdn);
+}
+EXPORT_SYMBOL(remove_pci_device_node_info);
+
+/**
+ * traverse_pci_device_nodes - Traverse children of indicated device node
+ * @start: indicated device node
+ * @pre: callback
+ * @data: additional parameter to the callback
+ *
  * Traverse a device tree stopping each PCI device in the tree.
  * This is done depth first.  As each node is processed, a "pre"
  * function is called and the children are processed recursively.
@@ -347,8 +391,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
  * one of these nodes we also assume its siblings are non-pci for
  * performance.
  */
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data)
+void *traverse_pci_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *data),
+				void *data)
 {
 	struct device_node *dn, *nextdn;
 	void *ret;
@@ -363,7 +408,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 		if (classp)
 			class = of_read_number(classp, 1);
 
-		if (pre && ((ret = pre(dn, data)) != NULL))
+		if (fn && ((ret = fn(dn, data)) != NULL))
 			return ret;
 
 		/* If we are a PCI bridge, go down */
@@ -384,8 +429,10 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 			nextdn = dn->sibling;
 		}
 	}
+
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(traverse_pci_device_nodes);
 
 static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
 				      struct pci_dn *pdn)
@@ -441,7 +488,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	struct pci_dn *pdn;
 
 	/* PHB nodes themselves must not match */
-	update_dn_pci_info(dn, phb);
+	add_pci_device_node_info(dn, phb);
 	pdn = dn->data;
 	if (pdn) {
 		pdn->devfn = pdn->busno = -1;
@@ -451,7 +498,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	}
 
 	/* Update dn->phb ptrs for new phb and children devices */
-	traverse_pci_devices(dn, update_dn_pci_info, phb);
+	traverse_pci_device_nodes(dn, add_pci_device_node_info, phb);
 }
 
 static void pci_dev_pdn_setup(struct pci_dev *pdev)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index c8d24f9..9ebbd19 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -303,7 +303,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	memset(&counts, 0, sizeof(struct msi_counts));
 
 	/* Work out how many devices we have below this PE */
-	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
+	traverse_pci_device_nodes(pe_dn, count_non_bridge_devices, &counts);
 
 	if (counts.num_devices == 0) {
 		pr_err("rtas_msi: found 0 devices under PE for %s\n",
@@ -318,7 +318,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	/* else, we have some more calculating to do */
 	counts.requestor = pci_device_to_OF_node(dev);
 	counts.request = request;
-	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
+	traverse_pci_device_nodes(pe_dn, count_spare_msis, &counts);
 
 	/* If the quota isn't an integer multiple of the total, we can
 	 * use the remainder as spare MSIs for anyone that wants them. */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 92974aa..ed8c894 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -262,7 +262,7 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	case OF_RECONFIG_ATTACH_NODE:
 		pci = np->parent->data;
 		if (pci)
-			update_dn_pci_info(np, pci->phb);
+			add_pci_device_node_info(np, pci->phb);
 		break;
 	default:
 		err = NOTIFY_DONE;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 18/21] powerpc/pci: Export traverse_pci_device_nodes()
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The patch exports following functions, which are derived from their
original implementation, so that the PCI hotplug logic can reuse
the functions to add or remove pci_dn for all device nodes under
specified PCI slot.

   traverse_pci_device_nodes()     traverse_pci_devices()
   add_pci_device_node_info()      update_dn_pci_info()
   remove_pci_device_node_info()   newly added

The patch also releases eeh_dev when its corresponding pci_dn
is released, indicating they have same life cycle.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h  |  3 +-
 arch/powerpc/include/asm/ppc-pci.h     |  6 +--
 arch/powerpc/kernel/pci_dn.c           | 67 +++++++++++++++++++++++++++++-----
 arch/powerpc/platforms/pseries/msi.c   |  4 +-
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 5 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index a6ad4b1..e0cb114 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -222,7 +222,8 @@ extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
 extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
-extern void *update_dn_pci_info(struct device_node *dn, void *data);
+extern void *add_pci_device_node_info(struct device_node *dn, void *data);
+extern void remove_pci_device_node_info(struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 7388316..3f0874e 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
 struct device_node;
 struct pci_dn;
 
-typedef void *(*traverse_func)(struct device_node *me, void *data);
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data);
+void *traverse_pci_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *),
+				void *data);
 void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index abc81fa..6bd9d8c 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -265,11 +265,15 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 #endif /* CONFIG_PCI_IOV */
 }
 
-/*
- * Traverse_func that inits the PCI fields of the device node.
- * NOTE: this *must* be done before read/write config to the device.
+/**
+ * add_pci_device_node_info - Add pci_dn for PCI device node
+ * @dn: PCI device node
+ * @data: additonal argument
+ *
+ * Add pci_dn for the indicated PCI device node. The newly created
+ * pci_dn will be put into that one of the parent device node.
  */
-void *update_dn_pci_info(struct device_node *dn, void *data)
+void *add_pci_device_node_info(struct device_node *dn, void *data)
 {
 	struct pci_controller *phb = data;
 	const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL);
@@ -328,8 +332,48 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 
 	return NULL;
 }
+EXPORT_SYMBOL(add_pci_device_node_info);
 
-/*
+/**
+ * remove_pci_device_node_info - Remove pci_dn from PCI device node
+ * @dn: PCI device node
+ *
+ * Remove pci_dn from PCI device node. The pci_dn is also removed
+ * from the child list of the parent pci_dn.
+ */
+void remove_pci_device_node_info(struct device_node *np)
+{
+	struct pci_dn *pdn = np ? PCI_DN(np) : NULL;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+#endif
+
+	if (!pdn)
+		return;
+
+#ifdef CONFIG_EEH
+	if (edev) {
+		pdn->edev = NULL;
+		kfree(edev);
+	}
+#endif
+
+	BUG_ON(!list_empty(&pdn->child_list));
+	list_del(&pdn->list);
+	if (pdn->parent)
+		of_node_put(pdn->parent->node);
+
+	np->data = NULL;
+	kfree(pdn);
+}
+EXPORT_SYMBOL(remove_pci_device_node_info);
+
+/**
+ * traverse_pci_device_nodes - Traverse children of indicated device node
+ * @start: indicated device node
+ * @pre: callback
+ * @data: additional parameter to the callback
+ *
  * Traverse a device tree stopping each PCI device in the tree.
  * This is done depth first.  As each node is processed, a "pre"
  * function is called and the children are processed recursively.
@@ -347,8 +391,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
  * one of these nodes we also assume its siblings are non-pci for
  * performance.
  */
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data)
+void *traverse_pci_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *data),
+				void *data)
 {
 	struct device_node *dn, *nextdn;
 	void *ret;
@@ -363,7 +408,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 		if (classp)
 			class = of_read_number(classp, 1);
 
-		if (pre && ((ret = pre(dn, data)) != NULL))
+		if (fn && ((ret = fn(dn, data)) != NULL))
 			return ret;
 
 		/* If we are a PCI bridge, go down */
@@ -384,8 +429,10 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 			nextdn = dn->sibling;
 		}
 	}
+
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(traverse_pci_device_nodes);
 
 static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
 				      struct pci_dn *pdn)
@@ -441,7 +488,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	struct pci_dn *pdn;
 
 	/* PHB nodes themselves must not match */
-	update_dn_pci_info(dn, phb);
+	add_pci_device_node_info(dn, phb);
 	pdn = dn->data;
 	if (pdn) {
 		pdn->devfn = pdn->busno = -1;
@@ -451,7 +498,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	}
 
 	/* Update dn->phb ptrs for new phb and children devices */
-	traverse_pci_devices(dn, update_dn_pci_info, phb);
+	traverse_pci_device_nodes(dn, add_pci_device_node_info, phb);
 }
 
 static void pci_dev_pdn_setup(struct pci_dev *pdev)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index c8d24f9..9ebbd19 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -303,7 +303,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	memset(&counts, 0, sizeof(struct msi_counts));
 
 	/* Work out how many devices we have below this PE */
-	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
+	traverse_pci_device_nodes(pe_dn, count_non_bridge_devices, &counts);
 
 	if (counts.num_devices == 0) {
 		pr_err("rtas_msi: found 0 devices under PE for %s\n",
@@ -318,7 +318,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	/* else, we have some more calculating to do */
 	counts.requestor = pci_device_to_OF_node(dev);
 	counts.request = request;
-	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
+	traverse_pci_device_nodes(pe_dn, count_spare_msis, &counts);
 
 	/* If the quota isn't an integer multiple of the total, we can
 	 * use the remainder as spare MSIs for anyone that wants them. */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 92974aa..ed8c894 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -262,7 +262,7 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	case OF_RECONFIG_ATTACH_NODE:
 		pci = np->parent->data;
 		if (pci)
-			update_dn_pci_info(np, pci->phb);
+			add_pci_device_node_info(np, pci->phb);
 		break;
 	default:
 		err = NOTIFY_DONE;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 19/21] powerpc/pci: Update bridge windows on PCI plugging
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

During the PCI plugging event, the PCI devices are rescanned and
their IO and MMIO resources are reassigned. However, the PowerNV
platform will assign PE# based on that, which depends on updating
to window of bridge of the PE's primary bus.

The patch updates the windows of bridge of PE's primary bus if
we have valid bridge. Otherwise, we assume it's root bus or SRIOV
virtual bus and PE won't be assigned during PCI plugging time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 01d2a84..cb0bb3f 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1473,8 +1473,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
 	/* Allocate bus and devices resources */
 	pcibios_allocate_bus_resources(bus);
 	pcibios_claim_one_bus(bus);
-	if (!pci_has_flag(PCI_PROBE_ONLY))
-		pci_assign_unassigned_bus_resources(bus);
+	if (!pci_has_flag(PCI_PROBE_ONLY)) {
+		if (bus->self)
+			pci_assign_unassigned_bridge_resources(bus->self);
+		else
+			pci_assign_unassigned_bus_resources(bus);
+	}
 
 	/* Fixup EEH */
 	eeh_add_device_tree_late(bus);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 19/21] powerpc/pci: Update bridge windows on PCI plugging
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

During the PCI plugging event, the PCI devices are rescanned and
their IO and MMIO resources are reassigned. However, the PowerNV
platform will assign PE# based on that, which depends on updating
to window of bridge of the PE's primary bus.

The patch updates the windows of bridge of PE's primary bus if
we have valid bridge. Otherwise, we assume it's root bus or SRIOV
virtual bus and PE won't be assigned during PCI plugging time.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 01d2a84..cb0bb3f 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1473,8 +1473,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
 	/* Allocate bus and devices resources */
 	pcibios_allocate_bus_resources(bus);
 	pcibios_claim_one_bus(bus);
-	if (!pci_has_flag(PCI_PROBE_ONLY))
-		pci_assign_unassigned_bus_resources(bus);
+	if (!pci_has_flag(PCI_PROBE_ONLY)) {
+		if (bus->self)
+			pci_assign_unassigned_bridge_resources(bus->self);
+		else
+			pci_assign_unassigned_bus_resources(bus);
+	}
 
 	/* Fixup EEH */
 	eeh_add_device_tree_late(bus);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 20/21] powerpc/powernv: Select OF_DYNAMIC
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The device tree nodes will be changed dynamically on PCI hotplug
events on PowerNV platform. The patch selects OF_DYNAMIC on the
platform to support PCI hotplug.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 4b044d8..9c62631 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -18,4 +18,5 @@ config PPC_POWERNV
 	select CPU_FREQ_GOV_ONDEMAND
 	select CPU_FREQ_GOV_CONSERVATIVE
 	select PPC_DOORBELL
+	select OF_DYNAMIC
 	default y
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 20/21] powerpc/powernv: Select OF_DYNAMIC
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The device tree nodes will be changed dynamically on PCI hotplug
events on PowerNV platform. The patch selects OF_DYNAMIC on the
platform to support PCI hotplug.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 4b044d8..9c62631 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -18,4 +18,5 @@ config PPC_POWERNV
 	select CPU_FREQ_GOV_ONDEMAND
 	select CPU_FREQ_GOV_CONSERVATIVE
 	select PPC_DOORBELL
+	select OF_DYNAMIC
 	default y
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 21/21] pci/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-04-27  6:37 ` Gavin Shan
@ 2015-04-27  6:37   ` Gavin Shan
  -1 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-pci, benh, bhelgaas, Gavin Shan

The patch intends to add standalone driver to support PCI hotplug
for PowerPC PowerNV platform, which runs on top of skiboot firmware.
The firmware identified hotpluggable slots and marked their device
tree node with proper "ibm,slot-pluggable" and "ibm,reset-by-firmware".
The driver simply scans device-tree to create/register PCI hotplug slot
accordingly.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have property "ibm,reset-by-firmware". In
that case, none of valid PCI slots will be detected from device tree.
The skiboot firmware doesn't export the capability to access attention
LEDs yet and it's something for TBD.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/hotplug/Kconfig            |  12 +
 drivers/pci/hotplug/Makefile           |   4 +
 drivers/pci/hotplug/powernv_php.c      | 146 ++++++
 drivers/pci/hotplug/powernv_php.h      |  78 +++
 drivers/pci/hotplug/powernv_php_slot.c | 913 +++++++++++++++++++++++++++++++++
 5 files changed, 1153 insertions(+)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..ef55dae 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+	tristate "PowerPC PowerNV PCI Hotplug driver"
+	depends on PPC_POWERNV && EEH
+	help
+	  Say Y here if you run PowerPC PowerNV platform that supports
+          PCI Hotplug
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called powernv-php.
+
+	  When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 	tristate "RPA PCI Hotplug driver"
 	depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index 4a9aa08..a69665e 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= powernv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
@@ -50,6 +51,9 @@ ibmphp-objs		:=	ibmphp_core.o	\
 acpiphp-objs		:=	acpiphp_core.o	\
 				acpiphp_glue.o
 
+powernv-php-objs	:=	powernv_php.o	\
+				powernv_php_slot.o
+
 rpaphp-objs		:=	rpaphp_core.o	\
 				rpaphp_pci.o	\
 				rpaphp_slot.o
diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
new file mode 100644
index 0000000..5cf9e717
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.c
@@ -0,0 +1,146 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+
+#include "powernv_php.h"
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
+
+static struct notifier_block php_msg_nb = {
+	.notifier_call	= powernv_php_msg_handler,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int powernv_php_register_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+	const __be32 *prop32;
+	int ret;
+
+	/* Check if it's hotpluggable slot */
+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return 0;
+
+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return 0;
+
+	/* Allocate slot */
+	slot = powernv_php_slot_alloc(dn);
+	if (!slot)
+		return -ENODEV;
+
+	/* Register it */
+	ret = powernv_php_slot_register(slot);
+	if (ret) {
+		powernv_php_slot_put(slot);
+		return ret;
+	}
+
+	return powernv_php_slot_enable(slot->php_slot, false, false);
+}
+
+int powernv_php_register(struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	/*
+	 * The parent slots should be registered before their
+	 * child slots.
+	 */
+	for_each_child_of_node(dn, child) {
+		ret = powernv_php_register_one(child);
+		if (ret)
+			break;
+
+		powernv_php_register(child);
+	}
+
+	return ret;
+}
+
+static void powernv_php_unregister_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+
+	slot = powernv_php_slot_find(dn);
+	if (!slot)
+		return;
+
+	pci_hp_deregister(slot->php_slot);
+}
+
+void powernv_php_unregister(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/* The child slots should go before their parent slots */
+	for_each_child_of_node(dn, child) {
+		powernv_php_unregister(child);
+		powernv_php_unregister_one(child);
+	}
+}
+
+static int __init powernv_php_init(void)
+{
+	struct device_node *dn;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	/* Register hotplug message handler */
+	if (pnv_pci_hotplug_notifier(&php_msg_nb, true)) {
+		pr_warn("%s: Cannot register hotplug message notifier\n",
+			__func__);
+		return -EIO;
+	}
+
+	/* Scan PHB nodes and their children */
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_register(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_register(dn);
+
+	return 0;
+}
+
+static void __exit powernv_php_exit(void)
+{
+	struct device_node *dn;
+
+	pnv_pci_hotplug_notifier(&php_msg_nb, false);
+
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_unregister(dn);
+}
+
+module_init(powernv_php_init);
+module_exit(powernv_php_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/pci/hotplug/powernv_php.h b/drivers/pci/hotplug/powernv_php.h
new file mode 100644
index 0000000..87ba0d0
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.h
@@ -0,0 +1,78 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _POWERNV_PHP_H
+#define _POWERNV_PHP_H
+
+/* Slot power status */
+#define POWERNV_PHP_SLOT_POWER_OFF	0
+#define POWERNV_PHP_SLOT_POWER_ON	1
+
+/* Slot presence status */
+#define POWERNV_PHP_SLOT_EMPTY		0
+#define POWERNV_PHP_SLOT_PRESENT	1
+
+/* Slot attention status */
+#define POWERNV_PHP_SLOT_ATTEN_OFF	0
+#define POWERNV_PHP_SLOT_ATTEN_ON	1
+#define POWERNV_PHP_SLOT_ATTEN_IND	2
+#define POWERNV_PHP_SLOT_ATTEN_ACT	3
+
+struct powernv_php_slot {
+	struct kref		kref;
+	int			state;
+#define POWERNV_PHP_SLOT_STATE_INIT		0x0
+#define POWERNV_PHP_SLOT_STATE_REGISTER		0x1
+#define POWERNV_PHP_SLOT_STATE_POPULATED	0x2
+	char			*name;
+	struct device_node	*dn;
+	struct pci_bus		*bus;
+	uint64_t		id;
+	int			slot_no;
+	int			check_power_status;
+	int			status_confirmed;
+	struct opal_msg		*msg;
+	struct work_struct	work;
+	wait_queue_head_t	queue;
+	struct hotplug_slot	*php_slot;
+	struct powernv_php_slot	*parent;
+	void (*release)(struct kref *kref);
+	struct list_head	children;
+	struct list_head	link;
+};
+
+#define to_powernv_php_slot(kref) container_of(kref, struct powernv_php_slot, kref)
+
+static inline void powernv_php_slot_get(struct powernv_php_slot *slot)
+{
+	if (slot)
+		kref_get(&slot->kref);
+}
+
+static inline int powernv_php_slot_put(struct powernv_php_slot *slot)
+{
+	if (slot)
+		return kref_put(&slot->kref, slot->release);
+
+	return 0;
+}
+
+int powernv_php_msg_handler(struct notifier_block *nb,
+			    unsigned long type, void *message);
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn);
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn);
+int powernv_php_slot_register(struct powernv_php_slot *slot);
+int powernv_php_slot_enable(struct hotplug_slot *php_slot,
+			    bool rescan_bus, bool rescan_slot);
+int powernv_php_register(struct device_node *dn);
+void powernv_php_unregister(struct device_node *dn);
+
+#endif /* !_POWERNV_PHP_H */
diff --git a/drivers/pci/hotplug/powernv_php_slot.c b/drivers/pci/hotplug/powernv_php_slot.c
new file mode 100644
index 0000000..847a84e
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php_slot.c
@@ -0,0 +1,913 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+#include <asm/ppc-pci.h>
+
+#include "powernv_php.h"
+
+static LIST_HEAD(php_slot_list);
+static DEFINE_SPINLOCK(php_slot_lock);
+
+/*
+ * Release firmware data for all child device nodes of the
+ * indicated one.
+ */
+static void release_device_nodes_info(struct device_node *np)
+{
+	struct device_node *child;
+
+	for_each_child_of_node(np, child) {
+		/* In depth first */
+		release_device_nodes_info(child);
+
+		remove_pci_device_node_info(child);
+	}
+}
+
+/*
+ * Release all subordinate device nodes of the indicated one.
+ * Those device nodes in deepest path should be released firstly.
+ */
+static int release_device_nodes(struct device_node *parent)
+{
+	struct device_node *np, *child;
+	int ret = 0;
+
+	/* If the device node has children, remove them firstly */
+	for_each_child_of_node(parent, np) {
+		ret = release_device_nodes(np);
+		if (ret)
+			return ret;
+
+		/* The device shouldn't have alive children */
+		child = of_get_next_child(np, NULL);
+		if (child) {
+			of_node_put(child);
+			of_node_put(np);
+			pr_err("%s: Alive children of node <%s>\n",
+			       __func__, of_node_full_name(np));
+			return -EBUSY;
+		}
+
+		/* Detach the device node */
+		of_detach_node(np);
+		of_node_put(np);
+	}
+
+	return 0;
+}
+
+/*
+ * The function processes the message sent by firmware
+ * to remove all device tree nodes beneath the slot's
+ * nodes, and the associated auxillary data.
+ */
+static void slot_power_off_handler(struct powernv_php_slot *slot)
+{
+	int ret;
+
+	/* Release the firmware data for the child device nodes */
+	release_device_nodes_info(slot->dn);
+
+	/* Release the child device nodes */
+	ret = release_device_nodes(slot->dn);
+	if (ret) {
+		pr_warn("%s: Error %d releasing children of <%s>\n",
+			__func__, ret, of_node_full_name(slot->dn));
+		return;
+	}
+
+	/* The event is handled completely */
+	slot->status_confirmed = 1;
+	wake_up_interruptible(&slot->queue);
+}
+
+/*
+ * Locate the prefix for device node or property, which are [NODE]
+ * and [PROP] separately. Due to arbitrary combination included
+ * in the property value, string searching functions aren't
+ * reliable. So we have to check characters.
+ */
+static char *get_next_prefix(char *start, char *end, const char *needle)
+{
+	char *p = start;
+	int i;
+
+	while (end - p >= strlen(needle)) {
+		for (i = 0; i < strlen(needle); i++)
+			if (p[i] != needle[i])
+				break;
+
+		/* Matched pattern */
+		if (i >= strlen(needle))
+			return p;
+
+		/* Try next one */
+		p++;
+	}
+
+	return NULL;
+}
+
+/*
+ * Get property name, value and its length. Besides, the starting
+ * position of next valid property or the ending position of the
+ * device node will be returned.
+ */
+static char *get_next_property(char *start, char *end,
+			       char **name, char **val, int *len)
+{
+	char *next, *p = start, *tmp;
+
+	/* Property prefix */
+	next = get_next_prefix(p, end, "[PROP] ");
+	if (!next)
+		goto no_property;
+	p = next + 7;
+
+	/* Property name */
+	next = strchr(p, ' ');
+	if (!next || next >= end)
+		goto no_property;
+	*next = '\0';
+	if (!(*name = kstrdup(p, GFP_KERNEL)))
+		goto no_property;
+
+	/* Property length */
+	p = next + 1;
+	next = strchr(p, ' ');
+	if (!next || next >= end) {
+		kfree(*name);
+		goto no_property;
+	}
+	*next = '\0';
+	*len = -1;
+	*len = simple_strtoul(p, &tmp, 10);
+	if (*len == -1 || *tmp != '\0' || tmp >= end) {
+		kfree(*name);
+		goto no_property;
+	}
+
+	/* Property value */
+	p = next + 1;
+	*val = p;
+
+	return p + *len + 1;
+
+no_property:
+	*name = NULL;
+	*val = NULL;
+	*len = 0;
+	return NULL;
+}
+
+/*
+ * Get full name of device node. Besides, the starting position
+ * of property and the ending position of device node will be
+ * returned if there is one valid device node. Otherwise, NULL
+ * will be returned.
+ */
+static char *get_next_node(char *start, char *end,
+			   char **path, char **node_end)
+{
+	char *p, *next;
+
+	/* Find current node */
+	p = start;
+	p = get_next_prefix(p, end, "[NODE] ");
+	if (!p)
+		goto no_node;
+
+	/* Get the path */
+	p += 7;
+	next = strchr(p, ' ');
+	if (!next || next >= end)
+		goto no_node;
+	*next = '\0';
+	*path = p;
+
+	/* Search for the end of properties */
+	p = next + 1;
+	next = get_next_prefix(p, end, "[NODE] ");
+	if (!next)
+		*node_end = end;
+	else
+		*node_end = next;
+
+	/* Start of the properties */
+	return p;
+
+no_node:
+	*path = NULL;
+	*node_end = NULL;
+	return NULL;
+}
+
+/* Helper function to free indicated device node */
+static void free_device_node(struct device_node *np)
+{
+	struct property *prop;
+
+	if (!np)
+		return;
+
+	/* Free properties */
+	while ((prop = np->properties)) {
+		np->properties = prop->next;
+
+		if (prop->value)
+			kfree(prop->value);
+		if (np->name)
+			kfree(prop->name);
+		kfree(prop);
+	}
+
+	/* Dereferencing the parent node */
+	if (np->parent)
+		of_node_put(np->parent);
+
+	/* Free full name the node itself */
+	if (np->full_name)
+		kfree(np->full_name);
+	kfree(np);
+}
+
+static int add_device_nodes(char *start, char *end)
+{
+	struct device_node *np;
+	char *n_start, *n_end, *path, *tmp;
+	struct property *prop;
+	char *p_start, *p_name, *p_val;
+	int p_len, ret = -ENOMEM;
+
+	n_start = start;
+	while ((n_start = get_next_node(n_start, end, &path, &n_end))) {
+		/* Allocate device node */
+		np = kzalloc(sizeof(*np), GFP_KERNEL);
+		if (!np) {
+			pr_warn("%s: Cannot allocate new node\n",
+				__func__);
+			return -ENOMEM;
+		}
+
+		np->full_name = kstrdup(path, GFP_KERNEL);
+		if (!np->full_name) {
+			pr_warn("%s: Cannot duplicate node name <%s>\n",
+				__func__, path);
+			goto error;
+		}
+
+		/* Find the parent */
+		tmp = strrchr(path, '/');
+		if (tmp) {
+			*tmp = '\0';
+			np->parent = of_find_node_by_path(path);
+		}
+		if (!np->parent) {
+			pr_warn("%s: No parent node <%s>\n",
+				__func__, path);
+			goto error;
+		}
+
+		/* Properties */
+		p_start = n_start;
+		while ((p_start = get_next_property(p_start, n_end,
+					&p_name, &p_val, &p_len))) {
+			/* Property */
+			prop = kzalloc(sizeof(*prop), GFP_KERNEL);
+			if (!prop) {
+				pr_warn("%s: No mem for property %s::%s\n",
+					__func__, of_node_full_name(np),
+					p_name);
+				goto error;
+			}
+
+			/* Put to the list of node */
+			prop->next = np->properties;
+			np->properties = prop;
+
+			/* Property name */
+			prop->name = kstrdup(p_name, GFP_KERNEL);
+			if (!prop->name) {
+				pr_warn("%s: No mem for property name %s::%s\n",
+					__func__, of_node_full_name(np),
+					p_name);
+				goto error;
+			}
+
+			/* Proper value */
+			prop->length = p_len;
+			prop->value = kmemdup(p_val, prop->length, GFP_KERNEL);
+			if (!prop->value) {
+				pr_warn("%s: No mem for property val %s::%s\n",
+					__func__, of_node_full_name(np),
+					p_name);
+				goto error;
+			}
+
+			/* Mark the property as dynamic */
+			of_property_set_flag(prop, OF_DYNAMIC);
+		}
+
+		/* Mark the node as dynamic */
+		of_node_set_flag(np, OF_DYNAMIC);
+
+		/* Initialize the device node */
+		of_node_init(np);
+
+		/* Attach device node */
+		ret = of_attach_node(np);
+		if (ret) {
+			pr_warn("%s: Error %d attaching <%s>\n",
+				__func__, ret, of_node_full_name(np));
+			goto error;
+		}
+
+		/* For next node */
+		of_node_put(np->parent);
+		n_start = n_end;
+	}
+
+	return 0;
+error:
+	free_device_node(np);
+	return -ENOMEM;
+}
+
+static void slot_power_on_handler(struct powernv_php_slot *slot)
+{
+	struct opal_msg *msg = slot->msg;
+	unsigned long phys = be64_to_cpu(msg->params[2]);
+	unsigned long len = be64_to_cpu(msg->params[3]);
+	char *virt = (phys && len > 7) ? __va(phys) : NULL;
+	char *buf = NULL;
+	int ret;
+
+	/* There might have nothing behind the slot yet */
+	if (!virt)
+		goto out;
+
+	/* Copy the buffer over */
+	buf = kzalloc(len, GFP_KERNEL);
+	if (!buf) {
+		pr_warn("%s: Out of memory!\n", __func__);
+		goto out;
+	}
+	memcpy(buf, virt, len);
+
+	/* Add the device nodes */
+	ret = add_device_nodes(buf, buf + len);
+	if (ret) {
+		kfree(buf);
+		pr_warn("%s: Error %d adding child nodes for <%s>\n",
+			__func__, ret, of_node_full_name(slot->dn));
+		goto out;
+	}
+
+	/* Add device node firmware data */
+	traverse_pci_device_nodes(slot->dn,
+				  add_pci_device_node_info,
+				  pci_bus_to_host(slot->bus));
+
+	/* Confirm status change */
+out:
+	slot->status_confirmed = 1;
+	wake_up_interruptible(&slot->queue);
+}
+
+static void powernv_php_slot_work(struct work_struct *data)
+{
+	struct powernv_php_slot *slot = container_of(data,
+						     struct powernv_php_slot,
+						     work);
+	uint64_t php_event = be64_to_cpu(slot->msg->params[0]);
+
+	switch (php_event) {
+	case 0: /* Slot power off */
+		slot_power_off_handler(slot);
+		break;
+	case 1: /* Slot power on */
+		slot_power_on_handler(slot);
+		break;
+	default:
+		pr_warn("%s: Unsupported hotplug event %lld\n",
+			__func__, php_event);
+	}
+
+	of_node_put(slot->dn);
+}
+
+int powernv_php_msg_handler(struct notifier_block *nb,
+			    unsigned long type, void *message)
+{
+	phandle h;
+	struct device_node *np;
+	struct powernv_php_slot *slot;
+	struct opal_msg *msg = message;
+
+	/* Check the message type */
+	if (type != OPAL_MSG_PCI_HOTPLUG) {
+		pr_warn("%s: Wrong message type %ld received!\n",
+			__func__, type);
+		return 0;
+	}
+
+	/* Find the device node */
+	h = (phandle)be64_to_cpu(msg->params[1]);
+	np = of_find_node_by_phandle(h);
+	if (!np) {
+		pr_warn("%s: No device node for phandle 0x%08x\n",
+			__func__, h);
+		return 0;
+	}
+
+	/* Find the slot */
+	slot = powernv_php_slot_find(np);
+	if (!slot) {
+		pr_warn("%s: No slot found for node <%s>\n",
+			__func__, of_node_full_name(np));
+		of_node_put(np);
+		return 0;
+	}
+
+	/* Schedule the work */
+	slot->msg = msg;
+	schedule_work(&slot->work);
+	return 0;
+}
+
+static int set_power_status(struct hotplug_slot *php_slot, u8 val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	int ret;
+
+	/* Set power status */
+	slot->status_confirmed = 0;
+	ret = pnv_pci_set_power_status(slot->id, val);
+	if (ret) {
+		pr_warn("%s: Error %d powering %s slot %016llx\n",
+			__func__, ret, val ? "on" : "off", slot->id);
+		return ret;
+	}
+
+	/* Waiting until the device tree is updated */
+	ret = wait_event_timeout(slot->queue,
+				 !slot->status_confirmed,
+				 10 * HZ);
+	if (ret) {
+		pr_warn("%s: Error %d completing power-%s slot %016llx\n",
+			__func__, ret, val ? "on" : "off", slot->id);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int get_power_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/*
+	 * Retrieve power status from firmware. If we fail
+	 * getting that, the power status fails back to
+	 * be on.
+	 */
+	ret = pnv_pci_get_power_status(slot->id, &state);
+	if (ret) {
+		*val = POWERNV_PHP_SLOT_POWER_ON;
+		pr_warn("%s: Error %d getting power status of slot %016llx\n",
+			__func__, ret, slot->id);
+	} else {
+		*val = state ? POWERNV_PHP_SLOT_POWER_ON :
+			       POWERNV_PHP_SLOT_POWER_OFF;
+		php_slot->info->power_status = *val;
+	}
+
+	return 0;
+}
+
+static int get_adapter_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/*
+	 * Retrieve presence status from firmware. If we can't
+	 * get that, it will fail back to be empty.
+	 */
+	ret = pnv_pci_get_presence_status(slot->id, &state);
+	if (ret >= 0) {
+                *val = state ? POWERNV_PHP_SLOT_PRESENT :
+                               POWERNV_PHP_SLOT_EMPTY;
+                php_slot->info->adapter_status = *val;
+	} else {
+		*val = POWERNV_PHP_SLOT_EMPTY;
+		pr_warn("%s: Error %d getting presence of slot %016llx\n",
+			__func__, ret, slot->id);
+	}
+
+	return ret < 0 ? ret : 0;
+}
+
+static int set_attention_status(struct hotplug_slot *php_slot, u8 val)
+{
+	/* The default operation would to turn on the attention */
+	switch (val) {
+	case POWERNV_PHP_SLOT_ATTEN_OFF:
+	case POWERNV_PHP_SLOT_ATTEN_ON:
+	case POWERNV_PHP_SLOT_ATTEN_IND:
+	case POWERNV_PHP_SLOT_ATTEN_ACT:
+		break;
+	default:
+		val = POWERNV_PHP_SLOT_ATTEN_ON;
+	}
+
+	/* FIXME: Make it real once firmware supports it */
+	php_slot->info->attention_status = val;
+
+	return 0;
+}
+
+int powernv_php_slot_enable(struct hotplug_slot *php_slot,
+			    bool rescan_bus, bool rescan_slot)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t presence, power_status;
+	int ret;
+
+	/* Check if the slot has been configured */
+	if (slot->state != POWERNV_PHP_SLOT_STATE_REGISTER)
+		return 0;
+
+	/* Retrieve slot presence status */
+	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
+	if (ret) {
+		pr_warn("%s: Error %d getting presence of slot %016llx\n",
+			__func__, ret, slot->id);
+		return ret;
+	}
+
+	/* Proceed if there have nothing behind the slot */
+	if (presence == POWERNV_PHP_SLOT_EMPTY)
+		goto scan;
+
+	/*
+	 * If we don't detect something behind the slot, we need
+	 * make sure the power suply to the slot is on. Otherwise,
+	 * the slot downstream PCIe linkturn should be down.
+	 *
+	 * On the first time, we don't change the power status to
+	 * boost system boot with assumption that the firmware
+	 * supplies consistent slot power status: empty slot always
+	 * has its power off and non-empty slot has its power on.
+	 */
+	if (!slot->check_power_status) {
+		slot->check_power_status = 1;
+		goto scan;
+	}
+
+	/* Check the power status. Scan the slot if that's already on */
+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
+	if (ret) {
+		pr_warn("%s: Error %d getting power status of slot %016llx\n",
+			__func__, ret, slot->id);
+		return ret;
+	}
+	if (power_status == POWERNV_PHP_SLOT_POWER_ON)
+		goto scan;
+
+	/* Power is off, turn it on and then scan the slot */
+	ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_ON);
+	if (ret) {
+		pr_warn("%s: Error %d powering on slot %016llx\n",
+			__func__, ret, slot->id);
+		return ret;
+	}
+
+scan:
+	switch (presence) {
+	case POWERNV_PHP_SLOT_PRESENT:
+		if (rescan_bus) {
+			pci_lock_rescan_remove();
+			pcibios_add_pci_devices(slot->bus);
+			pci_unlock_rescan_remove();
+		}
+
+		/* Rescan for child hotpluggable slots */
+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
+		if (rescan_slot)
+			powernv_php_register(slot->dn);
+		break;
+	case POWERNV_PHP_SLOT_EMPTY:
+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
+		break;
+	default:
+		pr_warn("%s: Invalid presence status %d of slot %016llx\n",
+			__func__, presence, slot->id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int enable_slot(struct hotplug_slot *php_slot)
+{
+	return powernv_php_slot_enable(php_slot, true, true);
+}
+
+static int disable_slot(struct hotplug_slot *php_slot)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t power_status;
+	int ret;
+
+	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
+		return 0;
+
+	/* Remove all devices behind the slot */
+	pci_lock_rescan_remove();
+	pcibios_remove_pci_devices(slot->bus);
+	pci_unlock_rescan_remove();
+
+	/* Detach the child hotpluggable slots */
+	powernv_php_unregister(slot->dn);
+
+	/*
+	 * Check the power status and turn it off if necessary. If we
+	 * fail to get the power status, the power will be forced to
+	 * be off.
+	 */
+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
+	if (ret || power_status == POWERNV_PHP_SLOT_POWER_ON) {
+		ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_OFF);
+		if (ret)
+			pr_warn("%s: Error %d powering off slot %016llx\n",
+				__func__, ret, slot->id);
+	}
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
+
+static struct hotplug_slot_ops php_slot_ops = {
+	.get_power_status	= get_power_status,
+	.get_adapter_status	= get_adapter_status,
+	.set_attention_status	= set_attention_status,
+	.enable_slot		= enable_slot,
+	.disable_slot		= disable_slot,
+};
+
+static struct powernv_php_slot *php_slot_match(struct device_node *dn,
+					       struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *target, *tmp;
+
+	if (slot->dn == dn)
+		return slot;
+
+	list_for_each_entry(tmp, &slot->children, link) {
+		target = php_slot_match(dn, tmp);
+		if (target)
+			return target;
+	}
+
+	return NULL;
+}
+
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn)
+{
+	struct powernv_php_slot *slot, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&php_slot_lock, flags);
+	list_for_each_entry(tmp, &php_slot_list, link) {
+		slot = php_slot_match(dn, tmp);
+		if (slot) {
+			spin_unlock_irqrestore(&php_slot_lock, flags);
+			return slot;
+		}
+	}
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	return NULL;
+}
+
+static void php_slot_free(struct kref *kref)
+{
+	struct powernv_php_slot *slot = to_powernv_php_slot(kref);
+
+	WARN_ON(!list_empty(&slot->children));
+	kfree(slot->name);
+	kfree(slot);
+}
+
+static void php_slot_release(struct hotplug_slot *hp_slot)
+{
+	struct powernv_php_slot *slot = hp_slot->private;
+	unsigned long flags;
+
+	/* Remove from global or child list */
+	spin_lock_irqsave(&php_slot_lock, flags);
+	list_del(&slot->link);
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	/* Detach from parent */
+	powernv_php_slot_put(slot);
+	powernv_php_slot_put(slot->parent);
+}
+
+static bool php_slot_get_id(struct device_node *dn,
+			    uint64_t *id)
+{
+	struct device_node *parent = dn;
+	const __be64 *prop64;
+	const __be32 *prop32;
+
+	/*
+	 * The hotpluggable slot always has a compound Id, which
+	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
+	 * number, and compound indicator
+	 */
+	*id = (0x1ul << 63);
+
+	/* Bus/Slot/Function number */
+	prop32 = of_get_property(dn, "reg", NULL);
+	if (!prop32)
+		return false;
+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
+
+	/* PHB Id */
+	while ((parent = of_get_parent(parent))) {
+		if (!PCI_DN(parent)) {
+			of_node_put(parent);
+			break;
+		}
+
+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
+			of_node_put(parent);
+			continue;
+		}
+
+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
+		if (!prop64) {
+			of_node_put(parent);
+			return false;
+		}
+
+		*id |= be64_to_cpup(prop64);
+		of_node_put(parent);
+		return true;
+	}
+
+        return false;
+}
+
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn)
+{
+	struct pci_bus *bus;
+	struct powernv_php_slot *slot;
+	const char *label;
+	uint64_t id;
+	int slot_no;
+	size_t size;
+	void *pmem;
+
+	/* Slot name */
+	label = of_get_property(dn, "ibm,slot-label", NULL);
+	if (!label)
+		return NULL;
+
+	/* Slot indentifier */
+	if (!php_slot_get_id(dn, &id))
+		return NULL;
+
+	/* PCI bus */
+	bus = pcibios_find_pci_bus(dn);
+	if (!bus)
+		return NULL;
+
+	/* Slot number */
+	if (dn->child && PCI_DN(dn->child))
+		slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	else
+		slot_no = -1;
+
+	/* Allocate slot */
+	size = sizeof(struct powernv_php_slot) +
+	       sizeof(struct hotplug_slot) +
+	       sizeof(struct hotplug_slot_info);
+	pmem = kzalloc(size, GFP_KERNEL);
+	if (!pmem) {
+		pr_warn("%s: Cannot allocate slot for node %s\n",
+			__func__, dn->full_name);
+		return NULL;
+	}
+
+	/* Assign memory blocks */
+	slot = pmem;
+	slot->php_slot = pmem + sizeof(struct powernv_php_slot);
+	slot->php_slot->info = pmem + sizeof(struct powernv_php_slot) +
+			      sizeof(struct hotplug_slot);
+	slot->name = kstrdup(label, GFP_KERNEL);
+	if (!slot->name) {
+		pr_warn("%s: Cannot populate name for node %s\n",
+			__func__, dn->full_name);
+		kfree(pmem);
+		return NULL;
+	}
+
+	/* Initialize slot */
+	kref_init(&slot->kref);
+	slot->state = POWERNV_PHP_SLOT_STATE_INIT;
+	slot->dn = dn;
+	slot->bus = bus;
+	slot->id = id;
+	slot->slot_no = slot_no;
+	INIT_WORK(&slot->work, powernv_php_slot_work);
+	init_waitqueue_head(&slot->queue);
+	slot->check_power_status = 0;
+	slot->status_confirmed = 0;
+	slot->release = php_slot_free;
+	slot->php_slot->ops = &php_slot_ops;
+	slot->php_slot->release = php_slot_release;
+	slot->php_slot->private = slot;
+	INIT_LIST_HEAD(&slot->children);
+	INIT_LIST_HEAD(&slot->link);
+
+	return slot;
+}
+
+int powernv_php_slot_register(struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *parent;
+	struct device_node *dn = slot->dn;
+	unsigned long flags;
+	int ret;
+
+	/* Avoid register same slot for twice */
+	if (powernv_php_slot_find(slot->dn))
+		return -EEXIST;
+
+	/* Register slot */
+	ret = pci_hp_register(slot->php_slot, slot->bus,
+			      slot->slot_no, slot->name);
+	if (ret) {
+		pr_warn("%s: Cannot register slot %s (%d)\n",
+			__func__, slot->name, ret);
+		return ret;
+	}
+
+	/* Put into global or parent list */
+	while ((dn = of_get_parent(dn))) {
+		if (!PCI_DN(dn)) {
+			of_node_put(dn);
+			break;
+		}
+
+		parent = powernv_php_slot_find(dn);
+		if (parent) {
+			of_node_put(dn);
+			break;
+		}
+	}
+
+	spin_lock_irqsave(&php_slot_lock, flags);
+	if (parent) {
+		powernv_php_slot_get(parent);
+		slot->parent = parent;
+		list_add_tail(&slot->link, &parent->children);
+	} else {
+		list_add_tail(&slot->link, &php_slot_list);
+	}
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 21/21] pci/hotplug: PowerPC PowerNV PCI hotplug driver
@ 2015-04-27  6:37   ` Gavin Shan
  0 siblings, 0 replies; 44+ messages in thread
From: Gavin Shan @ 2015-04-27  6:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bhelgaas, linux-pci, Gavin Shan

The patch intends to add standalone driver to support PCI hotplug
for PowerPC PowerNV platform, which runs on top of skiboot firmware.
The firmware identified hotpluggable slots and marked their device
tree node with proper "ibm,slot-pluggable" and "ibm,reset-by-firmware".
The driver simply scans device-tree to create/register PCI hotplug slot
accordingly.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have property "ibm,reset-by-firmware". In
that case, none of valid PCI slots will be detected from device tree.
The skiboot firmware doesn't export the capability to access attention
LEDs yet and it's something for TBD.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/hotplug/Kconfig            |  12 +
 drivers/pci/hotplug/Makefile           |   4 +
 drivers/pci/hotplug/powernv_php.c      | 146 ++++++
 drivers/pci/hotplug/powernv_php.h      |  78 +++
 drivers/pci/hotplug/powernv_php_slot.c | 913 +++++++++++++++++++++++++++++++++
 5 files changed, 1153 insertions(+)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..ef55dae 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+	tristate "PowerPC PowerNV PCI Hotplug driver"
+	depends on PPC_POWERNV && EEH
+	help
+	  Say Y here if you run PowerPC PowerNV platform that supports
+          PCI Hotplug
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called powernv-php.
+
+	  When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 	tristate "RPA PCI Hotplug driver"
 	depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index 4a9aa08..a69665e 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= powernv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
@@ -50,6 +51,9 @@ ibmphp-objs		:=	ibmphp_core.o	\
 acpiphp-objs		:=	acpiphp_core.o	\
 				acpiphp_glue.o
 
+powernv-php-objs	:=	powernv_php.o	\
+				powernv_php_slot.o
+
 rpaphp-objs		:=	rpaphp_core.o	\
 				rpaphp_pci.o	\
 				rpaphp_slot.o
diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
new file mode 100644
index 0000000..5cf9e717
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.c
@@ -0,0 +1,146 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+
+#include "powernv_php.h"
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
+
+static struct notifier_block php_msg_nb = {
+	.notifier_call	= powernv_php_msg_handler,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int powernv_php_register_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+	const __be32 *prop32;
+	int ret;
+
+	/* Check if it's hotpluggable slot */
+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return 0;
+
+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return 0;
+
+	/* Allocate slot */
+	slot = powernv_php_slot_alloc(dn);
+	if (!slot)
+		return -ENODEV;
+
+	/* Register it */
+	ret = powernv_php_slot_register(slot);
+	if (ret) {
+		powernv_php_slot_put(slot);
+		return ret;
+	}
+
+	return powernv_php_slot_enable(slot->php_slot, false, false);
+}
+
+int powernv_php_register(struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	/*
+	 * The parent slots should be registered before their
+	 * child slots.
+	 */
+	for_each_child_of_node(dn, child) {
+		ret = powernv_php_register_one(child);
+		if (ret)
+			break;
+
+		powernv_php_register(child);
+	}
+
+	return ret;
+}
+
+static void powernv_php_unregister_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+
+	slot = powernv_php_slot_find(dn);
+	if (!slot)
+		return;
+
+	pci_hp_deregister(slot->php_slot);
+}
+
+void powernv_php_unregister(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/* The child slots should go before their parent slots */
+	for_each_child_of_node(dn, child) {
+		powernv_php_unregister(child);
+		powernv_php_unregister_one(child);
+	}
+}
+
+static int __init powernv_php_init(void)
+{
+	struct device_node *dn;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	/* Register hotplug message handler */
+	if (pnv_pci_hotplug_notifier(&php_msg_nb, true)) {
+		pr_warn("%s: Cannot register hotplug message notifier\n",
+			__func__);
+		return -EIO;
+	}
+
+	/* Scan PHB nodes and their children */
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_register(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_register(dn);
+
+	return 0;
+}
+
+static void __exit powernv_php_exit(void)
+{
+	struct device_node *dn;
+
+	pnv_pci_hotplug_notifier(&php_msg_nb, false);
+
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_unregister(dn);
+}
+
+module_init(powernv_php_init);
+module_exit(powernv_php_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/pci/hotplug/powernv_php.h b/drivers/pci/hotplug/powernv_php.h
new file mode 100644
index 0000000..87ba0d0
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.h
@@ -0,0 +1,78 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _POWERNV_PHP_H
+#define _POWERNV_PHP_H
+
+/* Slot power status */
+#define POWERNV_PHP_SLOT_POWER_OFF	0
+#define POWERNV_PHP_SLOT_POWER_ON	1
+
+/* Slot presence status */
+#define POWERNV_PHP_SLOT_EMPTY		0
+#define POWERNV_PHP_SLOT_PRESENT	1
+
+/* Slot attention status */
+#define POWERNV_PHP_SLOT_ATTEN_OFF	0
+#define POWERNV_PHP_SLOT_ATTEN_ON	1
+#define POWERNV_PHP_SLOT_ATTEN_IND	2
+#define POWERNV_PHP_SLOT_ATTEN_ACT	3
+
+struct powernv_php_slot {
+	struct kref		kref;
+	int			state;
+#define POWERNV_PHP_SLOT_STATE_INIT		0x0
+#define POWERNV_PHP_SLOT_STATE_REGISTER		0x1
+#define POWERNV_PHP_SLOT_STATE_POPULATED	0x2
+	char			*name;
+	struct device_node	*dn;
+	struct pci_bus		*bus;
+	uint64_t		id;
+	int			slot_no;
+	int			check_power_status;
+	int			status_confirmed;
+	struct opal_msg		*msg;
+	struct work_struct	work;
+	wait_queue_head_t	queue;
+	struct hotplug_slot	*php_slot;
+	struct powernv_php_slot	*parent;
+	void (*release)(struct kref *kref);
+	struct list_head	children;
+	struct list_head	link;
+};
+
+#define to_powernv_php_slot(kref) container_of(kref, struct powernv_php_slot, kref)
+
+static inline void powernv_php_slot_get(struct powernv_php_slot *slot)
+{
+	if (slot)
+		kref_get(&slot->kref);
+}
+
+static inline int powernv_php_slot_put(struct powernv_php_slot *slot)
+{
+	if (slot)
+		return kref_put(&slot->kref, slot->release);
+
+	return 0;
+}
+
+int powernv_php_msg_handler(struct notifier_block *nb,
+			    unsigned long type, void *message);
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn);
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn);
+int powernv_php_slot_register(struct powernv_php_slot *slot);
+int powernv_php_slot_enable(struct hotplug_slot *php_slot,
+			    bool rescan_bus, bool rescan_slot);
+int powernv_php_register(struct device_node *dn);
+void powernv_php_unregister(struct device_node *dn);
+
+#endif /* !_POWERNV_PHP_H */
diff --git a/drivers/pci/hotplug/powernv_php_slot.c b/drivers/pci/hotplug/powernv_php_slot.c
new file mode 100644
index 0000000..847a84e
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php_slot.c
@@ -0,0 +1,913 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+#include <asm/ppc-pci.h>
+
+#include "powernv_php.h"
+
+static LIST_HEAD(php_slot_list);
+static DEFINE_SPINLOCK(php_slot_lock);
+
+/*
+ * Release firmware data for all child device nodes of the
+ * indicated one.
+ */
+static void release_device_nodes_info(struct device_node *np)
+{
+	struct device_node *child;
+
+	for_each_child_of_node(np, child) {
+		/* In depth first */
+		release_device_nodes_info(child);
+
+		remove_pci_device_node_info(child);
+	}
+}
+
+/*
+ * Release all subordinate device nodes of the indicated one.
+ * Those device nodes in deepest path should be released firstly.
+ */
+static int release_device_nodes(struct device_node *parent)
+{
+	struct device_node *np, *child;
+	int ret = 0;
+
+	/* If the device node has children, remove them firstly */
+	for_each_child_of_node(parent, np) {
+		ret = release_device_nodes(np);
+		if (ret)
+			return ret;
+
+		/* The device shouldn't have alive children */
+		child = of_get_next_child(np, NULL);
+		if (child) {
+			of_node_put(child);
+			of_node_put(np);
+			pr_err("%s: Alive children of node <%s>\n",
+			       __func__, of_node_full_name(np));
+			return -EBUSY;
+		}
+
+		/* Detach the device node */
+		of_detach_node(np);
+		of_node_put(np);
+	}
+
+	return 0;
+}
+
+/*
+ * The function processes the message sent by firmware
+ * to remove all device tree nodes beneath the slot's
+ * nodes, and the associated auxillary data.
+ */
+static void slot_power_off_handler(struct powernv_php_slot *slot)
+{
+	int ret;
+
+	/* Release the firmware data for the child device nodes */
+	release_device_nodes_info(slot->dn);
+
+	/* Release the child device nodes */
+	ret = release_device_nodes(slot->dn);
+	if (ret) {
+		pr_warn("%s: Error %d releasing children of <%s>\n",
+			__func__, ret, of_node_full_name(slot->dn));
+		return;
+	}
+
+	/* The event is handled completely */
+	slot->status_confirmed = 1;
+	wake_up_interruptible(&slot->queue);
+}
+
+/*
+ * Locate the prefix for device node or property, which are [NODE]
+ * and [PROP] separately. Due to arbitrary combination included
+ * in the property value, string searching functions aren't
+ * reliable. So we have to check characters.
+ */
+static char *get_next_prefix(char *start, char *end, const char *needle)
+{
+	char *p = start;
+	int i;
+
+	while (end - p >= strlen(needle)) {
+		for (i = 0; i < strlen(needle); i++)
+			if (p[i] != needle[i])
+				break;
+
+		/* Matched pattern */
+		if (i >= strlen(needle))
+			return p;
+
+		/* Try next one */
+		p++;
+	}
+
+	return NULL;
+}
+
+/*
+ * Get property name, value and its length. Besides, the starting
+ * position of next valid property or the ending position of the
+ * device node will be returned.
+ */
+static char *get_next_property(char *start, char *end,
+			       char **name, char **val, int *len)
+{
+	char *next, *p = start, *tmp;
+
+	/* Property prefix */
+	next = get_next_prefix(p, end, "[PROP] ");
+	if (!next)
+		goto no_property;
+	p = next + 7;
+
+	/* Property name */
+	next = strchr(p, ' ');
+	if (!next || next >= end)
+		goto no_property;
+	*next = '\0';
+	if (!(*name = kstrdup(p, GFP_KERNEL)))
+		goto no_property;
+
+	/* Property length */
+	p = next + 1;
+	next = strchr(p, ' ');
+	if (!next || next >= end) {
+		kfree(*name);
+		goto no_property;
+	}
+	*next = '\0';
+	*len = -1;
+	*len = simple_strtoul(p, &tmp, 10);
+	if (*len == -1 || *tmp != '\0' || tmp >= end) {
+		kfree(*name);
+		goto no_property;
+	}
+
+	/* Property value */
+	p = next + 1;
+	*val = p;
+
+	return p + *len + 1;
+
+no_property:
+	*name = NULL;
+	*val = NULL;
+	*len = 0;
+	return NULL;
+}
+
+/*
+ * Get full name of device node. Besides, the starting position
+ * of property and the ending position of device node will be
+ * returned if there is one valid device node. Otherwise, NULL
+ * will be returned.
+ */
+static char *get_next_node(char *start, char *end,
+			   char **path, char **node_end)
+{
+	char *p, *next;
+
+	/* Find current node */
+	p = start;
+	p = get_next_prefix(p, end, "[NODE] ");
+	if (!p)
+		goto no_node;
+
+	/* Get the path */
+	p += 7;
+	next = strchr(p, ' ');
+	if (!next || next >= end)
+		goto no_node;
+	*next = '\0';
+	*path = p;
+
+	/* Search for the end of properties */
+	p = next + 1;
+	next = get_next_prefix(p, end, "[NODE] ");
+	if (!next)
+		*node_end = end;
+	else
+		*node_end = next;
+
+	/* Start of the properties */
+	return p;
+
+no_node:
+	*path = NULL;
+	*node_end = NULL;
+	return NULL;
+}
+
+/* Helper function to free indicated device node */
+static void free_device_node(struct device_node *np)
+{
+	struct property *prop;
+
+	if (!np)
+		return;
+
+	/* Free properties */
+	while ((prop = np->properties)) {
+		np->properties = prop->next;
+
+		if (prop->value)
+			kfree(prop->value);
+		if (np->name)
+			kfree(prop->name);
+		kfree(prop);
+	}
+
+	/* Dereferencing the parent node */
+	if (np->parent)
+		of_node_put(np->parent);
+
+	/* Free full name the node itself */
+	if (np->full_name)
+		kfree(np->full_name);
+	kfree(np);
+}
+
+static int add_device_nodes(char *start, char *end)
+{
+	struct device_node *np;
+	char *n_start, *n_end, *path, *tmp;
+	struct property *prop;
+	char *p_start, *p_name, *p_val;
+	int p_len, ret = -ENOMEM;
+
+	n_start = start;
+	while ((n_start = get_next_node(n_start, end, &path, &n_end))) {
+		/* Allocate device node */
+		np = kzalloc(sizeof(*np), GFP_KERNEL);
+		if (!np) {
+			pr_warn("%s: Cannot allocate new node\n",
+				__func__);
+			return -ENOMEM;
+		}
+
+		np->full_name = kstrdup(path, GFP_KERNEL);
+		if (!np->full_name) {
+			pr_warn("%s: Cannot duplicate node name <%s>\n",
+				__func__, path);
+			goto error;
+		}
+
+		/* Find the parent */
+		tmp = strrchr(path, '/');
+		if (tmp) {
+			*tmp = '\0';
+			np->parent = of_find_node_by_path(path);
+		}
+		if (!np->parent) {
+			pr_warn("%s: No parent node <%s>\n",
+				__func__, path);
+			goto error;
+		}
+
+		/* Properties */
+		p_start = n_start;
+		while ((p_start = get_next_property(p_start, n_end,
+					&p_name, &p_val, &p_len))) {
+			/* Property */
+			prop = kzalloc(sizeof(*prop), GFP_KERNEL);
+			if (!prop) {
+				pr_warn("%s: No mem for property %s::%s\n",
+					__func__, of_node_full_name(np),
+					p_name);
+				goto error;
+			}
+
+			/* Put to the list of node */
+			prop->next = np->properties;
+			np->properties = prop;
+
+			/* Property name */
+			prop->name = kstrdup(p_name, GFP_KERNEL);
+			if (!prop->name) {
+				pr_warn("%s: No mem for property name %s::%s\n",
+					__func__, of_node_full_name(np),
+					p_name);
+				goto error;
+			}
+
+			/* Proper value */
+			prop->length = p_len;
+			prop->value = kmemdup(p_val, prop->length, GFP_KERNEL);
+			if (!prop->value) {
+				pr_warn("%s: No mem for property val %s::%s\n",
+					__func__, of_node_full_name(np),
+					p_name);
+				goto error;
+			}
+
+			/* Mark the property as dynamic */
+			of_property_set_flag(prop, OF_DYNAMIC);
+		}
+
+		/* Mark the node as dynamic */
+		of_node_set_flag(np, OF_DYNAMIC);
+
+		/* Initialize the device node */
+		of_node_init(np);
+
+		/* Attach device node */
+		ret = of_attach_node(np);
+		if (ret) {
+			pr_warn("%s: Error %d attaching <%s>\n",
+				__func__, ret, of_node_full_name(np));
+			goto error;
+		}
+
+		/* For next node */
+		of_node_put(np->parent);
+		n_start = n_end;
+	}
+
+	return 0;
+error:
+	free_device_node(np);
+	return -ENOMEM;
+}
+
+static void slot_power_on_handler(struct powernv_php_slot *slot)
+{
+	struct opal_msg *msg = slot->msg;
+	unsigned long phys = be64_to_cpu(msg->params[2]);
+	unsigned long len = be64_to_cpu(msg->params[3]);
+	char *virt = (phys && len > 7) ? __va(phys) : NULL;
+	char *buf = NULL;
+	int ret;
+
+	/* There might have nothing behind the slot yet */
+	if (!virt)
+		goto out;
+
+	/* Copy the buffer over */
+	buf = kzalloc(len, GFP_KERNEL);
+	if (!buf) {
+		pr_warn("%s: Out of memory!\n", __func__);
+		goto out;
+	}
+	memcpy(buf, virt, len);
+
+	/* Add the device nodes */
+	ret = add_device_nodes(buf, buf + len);
+	if (ret) {
+		kfree(buf);
+		pr_warn("%s: Error %d adding child nodes for <%s>\n",
+			__func__, ret, of_node_full_name(slot->dn));
+		goto out;
+	}
+
+	/* Add device node firmware data */
+	traverse_pci_device_nodes(slot->dn,
+				  add_pci_device_node_info,
+				  pci_bus_to_host(slot->bus));
+
+	/* Confirm status change */
+out:
+	slot->status_confirmed = 1;
+	wake_up_interruptible(&slot->queue);
+}
+
+static void powernv_php_slot_work(struct work_struct *data)
+{
+	struct powernv_php_slot *slot = container_of(data,
+						     struct powernv_php_slot,
+						     work);
+	uint64_t php_event = be64_to_cpu(slot->msg->params[0]);
+
+	switch (php_event) {
+	case 0: /* Slot power off */
+		slot_power_off_handler(slot);
+		break;
+	case 1: /* Slot power on */
+		slot_power_on_handler(slot);
+		break;
+	default:
+		pr_warn("%s: Unsupported hotplug event %lld\n",
+			__func__, php_event);
+	}
+
+	of_node_put(slot->dn);
+}
+
+int powernv_php_msg_handler(struct notifier_block *nb,
+			    unsigned long type, void *message)
+{
+	phandle h;
+	struct device_node *np;
+	struct powernv_php_slot *slot;
+	struct opal_msg *msg = message;
+
+	/* Check the message type */
+	if (type != OPAL_MSG_PCI_HOTPLUG) {
+		pr_warn("%s: Wrong message type %ld received!\n",
+			__func__, type);
+		return 0;
+	}
+
+	/* Find the device node */
+	h = (phandle)be64_to_cpu(msg->params[1]);
+	np = of_find_node_by_phandle(h);
+	if (!np) {
+		pr_warn("%s: No device node for phandle 0x%08x\n",
+			__func__, h);
+		return 0;
+	}
+
+	/* Find the slot */
+	slot = powernv_php_slot_find(np);
+	if (!slot) {
+		pr_warn("%s: No slot found for node <%s>\n",
+			__func__, of_node_full_name(np));
+		of_node_put(np);
+		return 0;
+	}
+
+	/* Schedule the work */
+	slot->msg = msg;
+	schedule_work(&slot->work);
+	return 0;
+}
+
+static int set_power_status(struct hotplug_slot *php_slot, u8 val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	int ret;
+
+	/* Set power status */
+	slot->status_confirmed = 0;
+	ret = pnv_pci_set_power_status(slot->id, val);
+	if (ret) {
+		pr_warn("%s: Error %d powering %s slot %016llx\n",
+			__func__, ret, val ? "on" : "off", slot->id);
+		return ret;
+	}
+
+	/* Waiting until the device tree is updated */
+	ret = wait_event_timeout(slot->queue,
+				 !slot->status_confirmed,
+				 10 * HZ);
+	if (ret) {
+		pr_warn("%s: Error %d completing power-%s slot %016llx\n",
+			__func__, ret, val ? "on" : "off", slot->id);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int get_power_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/*
+	 * Retrieve power status from firmware. If we fail
+	 * getting that, the power status fails back to
+	 * be on.
+	 */
+	ret = pnv_pci_get_power_status(slot->id, &state);
+	if (ret) {
+		*val = POWERNV_PHP_SLOT_POWER_ON;
+		pr_warn("%s: Error %d getting power status of slot %016llx\n",
+			__func__, ret, slot->id);
+	} else {
+		*val = state ? POWERNV_PHP_SLOT_POWER_ON :
+			       POWERNV_PHP_SLOT_POWER_OFF;
+		php_slot->info->power_status = *val;
+	}
+
+	return 0;
+}
+
+static int get_adapter_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/*
+	 * Retrieve presence status from firmware. If we can't
+	 * get that, it will fail back to be empty.
+	 */
+	ret = pnv_pci_get_presence_status(slot->id, &state);
+	if (ret >= 0) {
+                *val = state ? POWERNV_PHP_SLOT_PRESENT :
+                               POWERNV_PHP_SLOT_EMPTY;
+                php_slot->info->adapter_status = *val;
+	} else {
+		*val = POWERNV_PHP_SLOT_EMPTY;
+		pr_warn("%s: Error %d getting presence of slot %016llx\n",
+			__func__, ret, slot->id);
+	}
+
+	return ret < 0 ? ret : 0;
+}
+
+static int set_attention_status(struct hotplug_slot *php_slot, u8 val)
+{
+	/* The default operation would to turn on the attention */
+	switch (val) {
+	case POWERNV_PHP_SLOT_ATTEN_OFF:
+	case POWERNV_PHP_SLOT_ATTEN_ON:
+	case POWERNV_PHP_SLOT_ATTEN_IND:
+	case POWERNV_PHP_SLOT_ATTEN_ACT:
+		break;
+	default:
+		val = POWERNV_PHP_SLOT_ATTEN_ON;
+	}
+
+	/* FIXME: Make it real once firmware supports it */
+	php_slot->info->attention_status = val;
+
+	return 0;
+}
+
+int powernv_php_slot_enable(struct hotplug_slot *php_slot,
+			    bool rescan_bus, bool rescan_slot)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t presence, power_status;
+	int ret;
+
+	/* Check if the slot has been configured */
+	if (slot->state != POWERNV_PHP_SLOT_STATE_REGISTER)
+		return 0;
+
+	/* Retrieve slot presence status */
+	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
+	if (ret) {
+		pr_warn("%s: Error %d getting presence of slot %016llx\n",
+			__func__, ret, slot->id);
+		return ret;
+	}
+
+	/* Proceed if there have nothing behind the slot */
+	if (presence == POWERNV_PHP_SLOT_EMPTY)
+		goto scan;
+
+	/*
+	 * If we don't detect something behind the slot, we need
+	 * make sure the power suply to the slot is on. Otherwise,
+	 * the slot downstream PCIe linkturn should be down.
+	 *
+	 * On the first time, we don't change the power status to
+	 * boost system boot with assumption that the firmware
+	 * supplies consistent slot power status: empty slot always
+	 * has its power off and non-empty slot has its power on.
+	 */
+	if (!slot->check_power_status) {
+		slot->check_power_status = 1;
+		goto scan;
+	}
+
+	/* Check the power status. Scan the slot if that's already on */
+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
+	if (ret) {
+		pr_warn("%s: Error %d getting power status of slot %016llx\n",
+			__func__, ret, slot->id);
+		return ret;
+	}
+	if (power_status == POWERNV_PHP_SLOT_POWER_ON)
+		goto scan;
+
+	/* Power is off, turn it on and then scan the slot */
+	ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_ON);
+	if (ret) {
+		pr_warn("%s: Error %d powering on slot %016llx\n",
+			__func__, ret, slot->id);
+		return ret;
+	}
+
+scan:
+	switch (presence) {
+	case POWERNV_PHP_SLOT_PRESENT:
+		if (rescan_bus) {
+			pci_lock_rescan_remove();
+			pcibios_add_pci_devices(slot->bus);
+			pci_unlock_rescan_remove();
+		}
+
+		/* Rescan for child hotpluggable slots */
+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
+		if (rescan_slot)
+			powernv_php_register(slot->dn);
+		break;
+	case POWERNV_PHP_SLOT_EMPTY:
+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
+		break;
+	default:
+		pr_warn("%s: Invalid presence status %d of slot %016llx\n",
+			__func__, presence, slot->id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int enable_slot(struct hotplug_slot *php_slot)
+{
+	return powernv_php_slot_enable(php_slot, true, true);
+}
+
+static int disable_slot(struct hotplug_slot *php_slot)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t power_status;
+	int ret;
+
+	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
+		return 0;
+
+	/* Remove all devices behind the slot */
+	pci_lock_rescan_remove();
+	pcibios_remove_pci_devices(slot->bus);
+	pci_unlock_rescan_remove();
+
+	/* Detach the child hotpluggable slots */
+	powernv_php_unregister(slot->dn);
+
+	/*
+	 * Check the power status and turn it off if necessary. If we
+	 * fail to get the power status, the power will be forced to
+	 * be off.
+	 */
+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
+	if (ret || power_status == POWERNV_PHP_SLOT_POWER_ON) {
+		ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_OFF);
+		if (ret)
+			pr_warn("%s: Error %d powering off slot %016llx\n",
+				__func__, ret, slot->id);
+	}
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
+
+static struct hotplug_slot_ops php_slot_ops = {
+	.get_power_status	= get_power_status,
+	.get_adapter_status	= get_adapter_status,
+	.set_attention_status	= set_attention_status,
+	.enable_slot		= enable_slot,
+	.disable_slot		= disable_slot,
+};
+
+static struct powernv_php_slot *php_slot_match(struct device_node *dn,
+					       struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *target, *tmp;
+
+	if (slot->dn == dn)
+		return slot;
+
+	list_for_each_entry(tmp, &slot->children, link) {
+		target = php_slot_match(dn, tmp);
+		if (target)
+			return target;
+	}
+
+	return NULL;
+}
+
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn)
+{
+	struct powernv_php_slot *slot, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&php_slot_lock, flags);
+	list_for_each_entry(tmp, &php_slot_list, link) {
+		slot = php_slot_match(dn, tmp);
+		if (slot) {
+			spin_unlock_irqrestore(&php_slot_lock, flags);
+			return slot;
+		}
+	}
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	return NULL;
+}
+
+static void php_slot_free(struct kref *kref)
+{
+	struct powernv_php_slot *slot = to_powernv_php_slot(kref);
+
+	WARN_ON(!list_empty(&slot->children));
+	kfree(slot->name);
+	kfree(slot);
+}
+
+static void php_slot_release(struct hotplug_slot *hp_slot)
+{
+	struct powernv_php_slot *slot = hp_slot->private;
+	unsigned long flags;
+
+	/* Remove from global or child list */
+	spin_lock_irqsave(&php_slot_lock, flags);
+	list_del(&slot->link);
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	/* Detach from parent */
+	powernv_php_slot_put(slot);
+	powernv_php_slot_put(slot->parent);
+}
+
+static bool php_slot_get_id(struct device_node *dn,
+			    uint64_t *id)
+{
+	struct device_node *parent = dn;
+	const __be64 *prop64;
+	const __be32 *prop32;
+
+	/*
+	 * The hotpluggable slot always has a compound Id, which
+	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
+	 * number, and compound indicator
+	 */
+	*id = (0x1ul << 63);
+
+	/* Bus/Slot/Function number */
+	prop32 = of_get_property(dn, "reg", NULL);
+	if (!prop32)
+		return false;
+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
+
+	/* PHB Id */
+	while ((parent = of_get_parent(parent))) {
+		if (!PCI_DN(parent)) {
+			of_node_put(parent);
+			break;
+		}
+
+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
+			of_node_put(parent);
+			continue;
+		}
+
+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
+		if (!prop64) {
+			of_node_put(parent);
+			return false;
+		}
+
+		*id |= be64_to_cpup(prop64);
+		of_node_put(parent);
+		return true;
+	}
+
+        return false;
+}
+
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn)
+{
+	struct pci_bus *bus;
+	struct powernv_php_slot *slot;
+	const char *label;
+	uint64_t id;
+	int slot_no;
+	size_t size;
+	void *pmem;
+
+	/* Slot name */
+	label = of_get_property(dn, "ibm,slot-label", NULL);
+	if (!label)
+		return NULL;
+
+	/* Slot indentifier */
+	if (!php_slot_get_id(dn, &id))
+		return NULL;
+
+	/* PCI bus */
+	bus = pcibios_find_pci_bus(dn);
+	if (!bus)
+		return NULL;
+
+	/* Slot number */
+	if (dn->child && PCI_DN(dn->child))
+		slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	else
+		slot_no = -1;
+
+	/* Allocate slot */
+	size = sizeof(struct powernv_php_slot) +
+	       sizeof(struct hotplug_slot) +
+	       sizeof(struct hotplug_slot_info);
+	pmem = kzalloc(size, GFP_KERNEL);
+	if (!pmem) {
+		pr_warn("%s: Cannot allocate slot for node %s\n",
+			__func__, dn->full_name);
+		return NULL;
+	}
+
+	/* Assign memory blocks */
+	slot = pmem;
+	slot->php_slot = pmem + sizeof(struct powernv_php_slot);
+	slot->php_slot->info = pmem + sizeof(struct powernv_php_slot) +
+			      sizeof(struct hotplug_slot);
+	slot->name = kstrdup(label, GFP_KERNEL);
+	if (!slot->name) {
+		pr_warn("%s: Cannot populate name for node %s\n",
+			__func__, dn->full_name);
+		kfree(pmem);
+		return NULL;
+	}
+
+	/* Initialize slot */
+	kref_init(&slot->kref);
+	slot->state = POWERNV_PHP_SLOT_STATE_INIT;
+	slot->dn = dn;
+	slot->bus = bus;
+	slot->id = id;
+	slot->slot_no = slot_no;
+	INIT_WORK(&slot->work, powernv_php_slot_work);
+	init_waitqueue_head(&slot->queue);
+	slot->check_power_status = 0;
+	slot->status_confirmed = 0;
+	slot->release = php_slot_free;
+	slot->php_slot->ops = &php_slot_ops;
+	slot->php_slot->release = php_slot_release;
+	slot->php_slot->private = slot;
+	INIT_LIST_HEAD(&slot->children);
+	INIT_LIST_HEAD(&slot->link);
+
+	return slot;
+}
+
+int powernv_php_slot_register(struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *parent;
+	struct device_node *dn = slot->dn;
+	unsigned long flags;
+	int ret;
+
+	/* Avoid register same slot for twice */
+	if (powernv_php_slot_find(slot->dn))
+		return -EEXIST;
+
+	/* Register slot */
+	ret = pci_hp_register(slot->php_slot, slot->bus,
+			      slot->slot_no, slot->name);
+	if (ret) {
+		pr_warn("%s: Cannot register slot %s (%d)\n",
+			__func__, slot->name, ret);
+		return ret;
+	}
+
+	/* Put into global or parent list */
+	while ((dn = of_get_parent(dn))) {
+		if (!PCI_DN(dn)) {
+			of_node_put(dn);
+			break;
+		}
+
+		parent = powernv_php_slot_find(dn);
+		if (parent) {
+			of_node_put(dn);
+			break;
+		}
+	}
+
+	spin_lock_irqsave(&php_slot_lock, flags);
+	if (parent) {
+		powernv_php_slot_get(parent);
+		slot->parent = parent;
+		list_add_tail(&slot->link, &parent->children);
+	} else {
+		list_add_tail(&slot->link, &php_slot_list);
+	}
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2015-04-27  6:39 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-27  6:37 [PATCH v3 00/21] PowerPC/PowerNV: PCI Slot Management Gavin Shan
2015-04-27  6:37 ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 01/21] pci: Add pcibios_setup_bridge() Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 02/21] powerpc/powernv: Enable M64 on P7IOC Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 03/21] powerpc/powernv: M64 support improvement Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 04/21] powerpc/powernv: Improve IO and M32 mapping Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 05/21] powerpc/powernv: Improve DMA32 segment assignment Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 06/21] powerpc/powernv: Create PEs dynamically Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 07/21] powerpc/powernv: Release " Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 08/21] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 09/21] powerpc/eeh: Delay probing EEH device during hotplug Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 10/21] powerpc/powernv: Use PCI slot reset infrastructure Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 11/21] powerpc/powernv: Fundamental reset for PCI bus reset Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 12/21] powerpc/pci: Don't scan empty slot Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 13/21] powerpc/pci: Move pcibios_find_pci_bus() around Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 14/21] powerpc/powernv: Introduce pnv_pci_poll() Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 15/21] powerpc/powernv: Functions to get/reset PCI slot status Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 16/21] powerpc/pci: Delay creating pci_dn Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 17/21] powerpc/pci: Create eeh_dev while " Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 18/21] powerpc/pci: Export traverse_pci_device_nodes() Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 19/21] powerpc/pci: Update bridge windows on PCI plugging Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 20/21] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
2015-04-27  6:37   ` Gavin Shan
2015-04-27  6:37 ` [PATCH v3 21/21] pci/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
2015-04-27  6:37   ` Gavin Shan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.