* [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
@ 2016-02-17  3:43 Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 01/45] PCI: Add pcibios_setup_bridge() Gavin Shan
                   ` (39 more replies)
  0 siblings, 40 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This series is based on the powerpc/next branch, plus the additional patches
listed below:

   <This series of patches>
   <Followup 3 patches from Gavin on SRIOV EEH, which aren't posted>
   https://patchwork.ozlabs.org/patch/581315/	(PATCH[1/9] Richard's SRIOV EEH)
   https://patchwork.ozlabs.org/patch/582639/	(PATCH[1/1] Gavin's EEH fix)
   https://patchwork.ozlabs.org/patch/582093/	(PATCH[1/1] Gavin's EEH fix)
   https://patchwork.ozlabs.org/patch/580626/	(PATCH[1/4] Gavin's PCI fix)
   https://patchwork.ozlabs.org/patch/580153/	(PATCH[1/1] Andrew's EEH minor fix)
   https://patchwork.ozlabs.org/patch/566827/	(PATCH[1/1] Russell's P5IOC2 removal)
   https://patchwork.ozlabs.org/patch/534154/	(PATCH[1/7] Richard's SRIOV rework)
   commit 388f7b1 ("Linux 4.5-rc3")
   
This series adds PCI slot support for the PowerPC PowerNV platform, which
runs on top of skiboot firmware. The patchset requires corresponding changes
in skiboot, which have been sent to skiboot@lists.ozlabs.org for review. The
PCI slots are exposed by skiboot through device node properties, and the
kernel uses those properties to populate PCI slots accordingly.

The original PCI infrastructure on PowerNV platform can't support hotplug
because the PE is assigned during PHB fixup time, which is called only once
during system boot. For this reason, the PCI infrastructure on PowerNV platform
has been reworked substantially. After the rework, the PE and its corresponding
resources (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon
updating PCI bridge's resources, which might decide PE# assigned to the PE (e.g.
M64 resources, on P8 strictly speaking). Each PE maintains a reference count,
which is (number of child PCI devices + 1). When the last child PCI device
leaves the PE, the PE and its resources are released and put back into the free
pool. With this design, the PE is released when the EEH PE is released.
PATCH[1 - 23] are related to this part.

From skiboot's perspective, a PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns whether skiboot supports the various resets on
a particular PCI slot through the device-tree node. If it does, EEH will use the
functionality provided by skiboot. Besides, the device-tree nodes have to change
in order to support PCI hotplug. For example, when a PCI adapter is inserted
into a slot, its device-tree node should be added to the system dynamically.
Conversely, the device-tree node should be removed from the system when the PCI
adapter is going offline. Since pci_dn and eeh_dev have the same life cycle as
PCI device nodes, they should be added/removed accordingly during PCI hotplug.
PATCH[24 - 39] do the related work.

The OF driver is changed to support unflattening an FDT blob for a sub-tree,
which is covered by PATCH[40 - 44].

The last one, PATCH[45], is the standalone PCI hotplug driver for PowerPC PowerNV
platform.
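
The PE reference counting described above (count = child devices + 1, PE
released once the last child leaves) can be modelled in a few lines of
user-space C. This is only a sketch of the stated rule, not code from the
series: struct pe_model and the pe_model_*() helpers are hypothetical names,
and treating the extra "+ 1" as a base reference that is dropped together with
the last device is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PE; not the kernel's struct pnv_ioda_pe. */
struct pe_model {
	int  refcount;	/* number of child devices + 1 while the PE is live */
	bool released;	/* set once resources would return to the free pool */
};

/* A freshly set-up PE holds only its base reference. */
void pe_model_init(struct pe_model *pe)
{
	pe->refcount = 1;
	pe->released = false;
}

/* Each child PCI device joining the PE takes one reference. */
void pe_model_add_dev(struct pe_model *pe)
{
	pe->refcount++;
}

/*
 * A child device leaving drops its reference. When only the base
 * reference is left, the last child has gone, so the base reference is
 * dropped as well and the PE is marked released (assumed behaviour).
 */
void pe_model_del_dev(struct pe_model *pe)
{
	if (pe->released || pe->refcount <= 1)
		return;		/* no device reference to drop */

	if (--pe->refcount == 1) {
		pe->refcount = 0;
		pe->released = true;
	}
}
```

With two devices attached the count is 3; removing both leaves the PE marked
as released.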

=======
Testing
=======
1. Unplug adapters behind non-empty slot, then plug them.

   1.1 Check status
   # cat /sys/bus/pci/slots/C10/address 
   0003:09:00
   # cat /sys/bus/pci/slots/C10/adapter 
   1
   # cat /sys/bus/pci/slots/C10/power 
   1
   # lspci
   0003:09:00.0 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   0003:09:00.1 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   0003:09:00.2 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   0003:09:00.3 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   # lspci -t
   -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
    |                                           +-08.0-[04-08]--
    |                                           +-09.0-[09]--+-00.0
    |                                           |            +-00.1
    |                                           |            +-00.2
    |                                           |            \-00.3
    |                                           +-10.0-[0a-0e]--
    |                                           \-11.0-[0f-13]--

   1.2 Unplug adapter 0003:09:00.x
   # echo 0 > /sys/bus/pci/slots/C10/power 
   # lspci -t
   -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
    |                                           +-08.0-[04-08]--
    |                                           +-09.0-[09]--
    |                                           +-10.0-[0a-0e]--
    |                                           \-11.0-[0f-13]--

   1.3 Plug adapter 0003:09:00.x
   # echo 1 > /sys/bus/pci/slots/C10/power 
   # lspci -t
   -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
    |                                           +-08.0-[04-08]--
    |                                           +-09.0-[09]--+-00.0
    |                                           |            +-00.1
    |                                           |            +-00.2
    |                                           |            \-00.3
    |                                           +-10.0-[0a-0e]--
    |                                           \-11.0-[0f-13]--
 

   1.4 Inject EEH error to adapter 0003:09:00.x, which is recovered.
   # cat /sys/bus/pci/devices/0003:09:00.0/eeh_pe_config_addr 
   0x1
   # echo 1:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0003/err_injct
   # lspci -ns 0003:09:00.0
   # dmesg | grep EEH
   EEH: Frozen PHB#3-PE#1 detected
   EEH: PE location: U78C9.001.WZS00CF-P1-C10, PHB location: N/A
   EEH: Detected PCI bus error on PHB#3-PE#1
   EEH: This PCI device has failed 1 times in the last hour
   EEH: Notify device drivers to shutdown
   EEH: Collect temporary log
   EEH: Reset without hotplug activity
   EEH: Notify device drivers the completion of reset
   EEH: Notify device driver to resume

2. Plug adapter and then unplug it. This requires a hack in skiboot
   to skip probing the adapters behind the target slot (C12 in this
   test) once.

   2.1 Check status
   # cat /sys/bus/pci/slots/C12/address 
   0001:06
   # cat /sys/bus/pci/slots/C12/power 
   0
   # cat /sys/bus/pci/slots/C12/adapter 
   1
   # lspci -t
   +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
                                               +-08.0-[05]----00.0
                                               \-09.0-[06-0a]--

   2.2 Plug adapter 0001:06:00.x
   # echo 1 > /sys/bus/pci/slots/C12/power
   # lspci -t
   +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
                                               +-08.0-[05]----00.0
                                               \-09.0-[06-0a]--+-00.0
                                                               \-00.1
   # lspci
   0001:06:00.0 Ethernet controller: \
   Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
   0001:06:00.1 Ethernet controller: \
   Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

   2.3 Inject EEH error to adapter 0001:06:00.x, which is recovered
   # cat /sys/bus/pci/devices/0001:06:00.0/eeh_pe_config_addr 
   0x2
   # echo 2:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0001/err_injct
   # dmesg | grep EEH
   EEH: Frozen PHB#1-PE#2 detected
   EEH: PE location: U78C9.001.WZS00CF-P1-C12, PHB location: N/A
   EEH: Detected PCI bus error on PHB#1-PE#2
   EEH: This PCI device has failed 1 times in the last hour
   EEH: Notify device drivers to shutdown
   EEH: Collect temporary log
   EEH: Reset without hotplug activity
   EEH: Notify device drivers the completion of reset
   EEH: Notify device driver to resume

   2.4 Unplug adapter 0001:06:00.x
   # echo 0 > /sys/bus/pci/slots/C12/power
   # lspci -t
   +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
                                               +-08.0-[05]----00.0
                                               \-09.0-[06-0a]--
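
The slot "address" attribute read in the tests above comes in two shapes:
domain:bus:device (0003:09:00 for C10) and domain:bus only (0001:06 for C12).
As a rough illustration, the C sketch below parses both. struct slot_addr and
parse_slot_addr() are hypothetical helper names, not part of this series; the
fields are taken to be hexadecimal, as in lspci output.

```c
#include <stdio.h>

/* Hypothetical container for a parsed slot address. */
struct slot_addr {
	unsigned int domain;
	unsigned int bus;
	int device;	/* -1 when the attribute carries no device field */
};

/*
 * Parse "DDDD:BB:dd" or "DDDD:BB" as printed by
 * /sys/bus/pci/slots/<slot>/address. A trailing newline is tolerated.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_slot_addr(const char *s, struct slot_addr *out)
{
	unsigned int domain, bus, dev;
	int n = sscanf(s, "%x:%x:%x", &domain, &bus, &dev);

	if (n < 2)
		return -1;

	out->domain = domain;
	out->bus = bus;
	out->device = (n == 3) ? (int)dev : -1;
	return 0;
}
```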

=========
Changelog
=========
v8:
   * Rebased to linux-powerpc next branch.
   * Resolve comments from Alexey and Daniel on PCI part
   * Resolve comments from Rob on fdt.c
   * Retested (refer to the "Testing" section)
v7:
   * Reworked revision to some extent.
   * Rebased to powerpc/next repository.
   * Reordered/split/merged/dropped patches accordingly - Alexey.
   * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
   * Merged 3 files to one for the hotplug driver - Alexey.
   * As part of OPAL API, defined macros for PCI slot power state, hotplug
     message type. Defined macros for PCI slot power confirmed state in
     hotplug driver.
   * Misc comments from Alexey.
   * Reworked unflatten_dt_node() to avoid recursive function calls.
   * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
v6:
   * Patch reorder, split, squash - Alexey.
   * Minor coding style - Alexey.
   * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
   * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
   * Concurrent depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
   * Replace overlay with of_changeset - Grant
v5:
   * Rebased to 4.1.rc6 and some unmerged patches as below:
     Alexey's DDW patchset (v11);
     Gavin's EEH error injection support (in mpe's next branch);
     Richard's EEH cleanup patches (in mpe's next branch);
     Richard's EEH support for VF (v7);
     Gavin's misc EEH fixes for 4.2;
   * The revision bases on skiboot corresponding patches (v7):
     https://patchwork.ozlabs.org/patch/480437/
   * Utilize OF overlay to update device-tree with help of newly introduced
     OPAL API opal_get_overlay_dt().
   * Split patches for easy review according to aik's comments.
   * Fix coding style from checkpatch.pl as pointed out by aik.
   * Code cleanup and misc fixup according to aik's input.
v4:
   * Rebased to 4.1.RC1
   * Added API to unflatten FDT blob to device node sub-tree, which is attached
     to the indicated parent device node. The original mechanism based on
     formatted string stream has been dropped.
   * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
     was picked up and sent to linux-ppc@ separately for review as Richard's
     "VF EEH Support" depends on that.
v3:
   * Rebased to 4.1.RC0
   * PowerNV PCI infrastructure is totally refactored in order to support PCI
     hotplug. The PowerNV hotplug driver is also reworked a lot because of
     the changes in skiboot in order to support PCI hotplug.

Gavin Shan (45):
  PCI: Add pcibios_setup_bridge()
  powerpc/pci: Override pcibios_setup_bridge()
  powerpc/pci: Cleanup on struct pci_controller_ops
  powerpc/powernv: Cleanup on pci_controller_ops instances
  powerpc/powernv: Drop phb->bdfn_to_pe()
  powerpc/powernv: Reorder fields in struct pnv_phb
  powerpc/powernv: Rename PE# fields in struct pnv_phb
  powerpc/powernv: Fix initial IO and M32 segmap
  powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  powerpc/powernv: IO and M32 mapping based on PCI device resources
  powerpc/powernv: Track M64 segment consumption
  powerpc/powernv: Rename M64 related functions
  powerpc/powernv/ioda1: M64 support on P7IOC
  powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
  powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
  powerpc/powernv: Remove DMA32 PE list
  powerpc/powernv/ioda1: Improve DMA32 segment track
  powerpc/powernv: Increase PE# capacity
  powerpc/powernv: Use PE instead of number during setup and release
  powerpc/powernv: Allocate PE# in reverse order
  powerpc/powernv: Create PEs at PCI hot plugging time
  powerpc/powernv/ioda1: Support releasing IODA1 TCE table
  powerpc/powernv: Dynamically release PEs
  powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  powerpc/pci: Rename pcibios_find_pci_bus()
  powerpc/pci: Move pci_find_bus_by_node() around
  powerpc/pci: Export pci_add_device_node_info()
  powerpc/pci: Introduce pci_remove_device_node_info()
  powerpc/pci: Export pci_traverse_device_nodes()
  powerpc/pci: Delay populating pdn
  powerpc/pci: Don't scan empty slot
  powerpc/pci: Update bridge windows on PCI plug
  powerpc/powernv: Simplify pnv_eeh_reset()
  powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
  powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  powerpc/powernv: Support PCI slot ID
  powerpc/powernv: Use firmware PCI slot reset infrastructure
  powerpc/powernv: Functions to get/set PCI slot status
  powerpc/powernv: Select OF_DYNAMIC
  drivers/of: Split unflatten_dt_node()
  drivers/of: Avoid recursively calling unflatten_dt_node()
  drivers/of: Rename unflatten_dt_node()
  drivers/of: Specify parent node in of_fdt_unflatten_tree()
  drivers/of: Return allocated memory from of_fdt_unflatten_tree()
  PCI/hotplug: PowerPC PowerNV PCI hotplug driver

 arch/powerpc/include/asm/eeh.h                 |    2 +-
 arch/powerpc/include/asm/opal-api.h            |   17 +-
 arch/powerpc/include/asm/opal.h                |    8 +-
 arch/powerpc/include/asm/pci-bridge.h          |   25 +-
 arch/powerpc/include/asm/pnv-pci.h             |    7 +
 arch/powerpc/include/asm/ppc-pci.h             |    8 +-
 arch/powerpc/kernel/eeh_dev.c                  |   17 +-
 arch/powerpc/kernel/eeh_driver.c               |   12 +-
 arch/powerpc/kernel/pci-common.c               |   16 +-
 arch/powerpc/kernel/pci-hotplug.c              |   47 +-
 arch/powerpc/kernel/pci_dn.c                   |   89 +-
 arch/powerpc/platforms/maple/pci.c             |   34 +-
 arch/powerpc/platforms/pasemi/pci.c            |    3 -
 arch/powerpc/platforms/powermac/pci.c          |   38 +-
 arch/powerpc/platforms/powernv/Kconfig         |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c   |  179 ++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
 arch/powerpc/platforms/powernv/pci-ioda.c      | 1243 +++++++++++++++---------
 arch/powerpc/platforms/powernv/pci.c           |   92 +-
 arch/powerpc/platforms/powernv/pci.h           |   60 +-
 arch/powerpc/platforms/pseries/msi.c           |    4 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
 arch/powerpc/platforms/pseries/setup.c         |    8 +-
 drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c   |    2 +-
 drivers/of/fdt.c                               |  372 ++++---
 drivers/of/unittest.c                          |    2 +-
 drivers/pci/hotplug/Kconfig                    |   12 +
 drivers/pci/hotplug/Makefile                   |    3 +
 drivers/pci/hotplug/pnv_php.c                  |  870 +++++++++++++++++
 drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
 drivers/pci/hotplug/rpaphp_core.c              |    4 +-
 drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
 drivers/pci/setup-bus.c                        |    5 +
 include/linux/of_fdt.h                         |    5 +-
 include/linux/pci.h                            |    1 +
 35 files changed, 2360 insertions(+), 874 deletions(-)
 create mode 100644 drivers/pci/hotplug/pnv_php.c

-- 
2.1.0

^ permalink raw reply	[flat|nested] 174+ messages in thread

* [PATCH v8 01/45] PCI: Add pcibios_setup_bridge()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                   ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

Currently, PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(),
which is called once after PCI probing and resource assignment
are completed, to allocate platform required resources for PCI devices:
PE#, IO and MMIO mapping, DMA address translation (TCE) table etc.
Obviously, it's not hotplug friendly.

This adds weak function pcibios_setup_bridge(), which is called by
pci_setup_bridge(). PowerPC PowerNV platform will reuse the function
to assign above platform required resources to newly plugged PCI devices
during PCI hotplug in subsequent patches.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-bus.c | 5 +++++
 include/linux/pci.h     | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7796d0a..acda514 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -696,11 +696,16 @@ static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
 
+void __weak pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+}
+
 void pci_setup_bridge(struct pci_bus *bus)
 {
 	unsigned long type = IORESOURCE_IO | IORESOURCE_MEM |
 				  IORESOURCE_PREFETCH;
 
+	pcibios_setup_bridge(bus, type);
 	__pci_setup_bridge(bus, type);
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index bc435d62..8161c79 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -842,6 +842,7 @@ void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
 void pci_stop_root_bus(struct pci_bus *bus);
 void pci_remove_root_bus(struct pci_bus *bus);
 void pci_setup_cardbus(struct pci_bus *bus);
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type);
 void pci_sort_breadthfirst(void);
 #define dev_is_pci(d) ((d)->bus == &pci_bus_type)
 #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
-- 
2.1.0
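
The hunk above follows the usual weak-symbol pattern: generic code provides an
empty default, and a platform may supply a strong definition of the same name
that the linker picks instead. A stand-alone sketch of that flow, assuming GCC
or Clang's __attribute__((weak)) (which the kernel's __weak expands to). The
weak_demo_* names and the step counters are hypothetical and exist only to make
the call order visible; the real default hook is empty.

```c
/* Step counters used only to observe the order of calls. */
int weak_demo_step;
int weak_demo_hook_at;
int weak_demo_windows_at;

/*
 * Default hook. A strong definition of this symbol in another object
 * file would replace it at link time without touching the caller.
 */
__attribute__((weak)) void weak_demo_pcibios_setup_bridge(unsigned long type)
{
	(void)type;
	weak_demo_hook_at = ++weak_demo_step;
}

/* Stand-in for __pci_setup_bridge(), which programs the bridge windows. */
void weak_demo_program_windows(unsigned long type)
{
	(void)type;
	weak_demo_windows_at = ++weak_demo_step;
}

/* Mirrors pci_setup_bridge(): platform hook first, then the windows. */
void weak_demo_pci_setup_bridge(void)
{
	unsigned long type = 0x7;	/* placeholder for the IORESOURCE_* mask */

	weak_demo_pcibios_setup_bridge(type);
	weak_demo_program_windows(type);
}
```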

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 01/45] PCI: Add pcibios_setup_bridge() Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
       [not found]   ` <1455680668-23298-3-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-02-17  3:43 ` [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
                   ` (37 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This overrides pcibios_setup_bridge() that is called to update PCI
bridge windows when PCI resource assignment is completed, to assign
PE and setup various (resource) mapping for the PE in subsequent
patches.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h | 2 ++
 arch/powerpc/kernel/pci-common.c      | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 9f165e8..b688d04 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -33,6 +33,8 @@ struct pci_controller_ops {
 
 	/* Called during PCI resource reassignment */
 	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
+	void		(*setup_bridge)(struct pci_bus *bus,
+					unsigned long type);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
 
 #ifdef CONFIG_PCI_MSI
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..40df3a5 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -124,6 +124,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 	return 1;
 }
 
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+
+	if (hose->controller_ops.setup_bridge)
+		hose->controller_ops.setup_bridge(bus, type);
+}
+
 void pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *phb = pci_bus_to_host(dev->bus);
-- 
2.1.0
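
The override above is an instance of an optional per-controller callback: the
generic entry point looks up the controller and calls setup_bridge only when
the platform has filled it in. A user-space sketch of that dispatch follows.
Every ops_demo_* name is hypothetical, and the bus-to-controller lookup is
reduced to a plain pointer.

```c
#include <stddef.h>

struct ops_demo_bus;

/* Cut-down analogue of struct pci_controller_ops. */
struct ops_demo_controller_ops {
	void (*setup_bridge)(struct ops_demo_bus *bus, unsigned long type);
};

struct ops_demo_controller {
	struct ops_demo_controller_ops ops;
};

struct ops_demo_bus {
	struct ops_demo_controller *hose; /* stands in for pci_bus_to_host() */
	int setup_calls;		  /* bumped by the sample callback */
};

/* Generic entry point: a NULL callback means "nothing to do". */
void ops_demo_pcibios_setup_bridge(struct ops_demo_bus *bus,
				   unsigned long type)
{
	struct ops_demo_controller *hose = bus->hose;

	if (hose->ops.setup_bridge)
		hose->ops.setup_bridge(bus, type);
}

/* Sample platform callback; a real one would set up the PE here. */
void ops_demo_platform_setup_bridge(struct ops_demo_bus *bus,
				    unsigned long type)
{
	(void)type;
	bus->setup_calls++;
}
```

Controllers that leave the pointer NULL keep their existing behaviour, which is
why the hook can be added to the shared struct without touching other
platforms.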

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 01/45] PCI: Add pcibios_setup_bridge() Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-02-17  4:18   ` Andrew Donnellan
       [not found]   ` <1455680668-23298-4-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-02-17  3:43 ` [PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances Gavin Shan
                   ` (36 subsequent siblings)
  39 siblings, 2 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

Each PHB has one instance of "struct pci_controller_ops", which
includes various callbacks called by PCI subsystem. In the definition
of this struct, some callbacks have explicit names for their arguments,
but the rest don't.

This adds explicit argument names to all callbacks in
"struct pci_controller_ops" so that the code looks consistent.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/include/asm/pci-bridge.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b688d04..4dd6ef4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -21,18 +21,19 @@ struct pci_controller_ops {
 	void		(*dma_dev_setup)(struct pci_dev *dev);
 	void		(*dma_bus_setup)(struct pci_bus *bus);
 
-	int		(*probe_mode)(struct pci_bus *);
+	int		(*probe_mode)(struct pci_bus *bus);
 
 	/* Called when pci_enable_device() is called. Returns true to
 	 * allow assignment/enabling of the device. */
-	bool		(*enable_device_hook)(struct pci_dev *);
+	bool		(*enable_device_hook)(struct pci_dev *dev);
 
-	void		(*disable_device)(struct pci_dev *);
+	void		(*disable_device)(struct pci_dev *dev);
 
-	void		(*release_device)(struct pci_dev *);
+	void		(*release_device)(struct pci_dev *dev);
 
 	/* Called during PCI resource reassignment */
-	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
+	resource_size_t (*window_alignment)(struct pci_bus *bus,
+					    unsigned long type);
 	void		(*setup_bridge)(struct pci_bus *bus,
 					unsigned long type);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
@@ -46,7 +47,7 @@ struct pci_controller_ops {
 	int             (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
 	u64		(*dma_get_required_mask)(struct pci_dev *dev);
 
-	void		(*shutdown)(struct pci_controller *);
+	void		(*shutdown)(struct pci_controller *hose);
 };
 
 /*
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (2 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-02-17  4:38   ` Andrew Donnellan
  2016-02-17  3:43 ` [PATCH v8 06/45] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
                   ` (35 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This cleans up the data struct instances below to use tabs instead of
spaces for indentation, to avoid complaints from scripts/checkpatch.pl.
No logical changes introduced.

  @pnv_pci_ioda_controller_ops
  @pnv_npu_ioda_controller_ops

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 36 +++++++++++++++----------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c5baaf3..524c9c7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3210,31 +3210,31 @@ static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 }
 
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
-       .dma_dev_setup = pnv_pci_dma_dev_setup,
-       .dma_bus_setup = pnv_pci_dma_bus_setup,
+	.dma_dev_setup		= pnv_pci_dma_dev_setup,
+	.dma_bus_setup		= pnv_pci_dma_bus_setup,
 #ifdef CONFIG_PCI_MSI
-       .setup_msi_irqs = pnv_setup_msi_irqs,
-       .teardown_msi_irqs = pnv_teardown_msi_irqs,
+	.setup_msi_irqs		= pnv_setup_msi_irqs,
+	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
 #endif
-       .enable_device_hook = pnv_pci_enable_device_hook,
-       .window_alignment = pnv_pci_window_alignment,
-       .reset_secondary_bus = pnv_pci_reset_secondary_bus,
-       .dma_set_mask = pnv_pci_ioda_dma_set_mask,
-       .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask,
-       .shutdown = pnv_pci_ioda_shutdown,
+	.enable_device_hook	= pnv_pci_enable_device_hook,
+	.window_alignment	= pnv_pci_window_alignment,
+	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
+	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
+	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
+	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
 static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
-	.dma_dev_setup = pnv_pci_dma_dev_setup,
+	.dma_dev_setup		= pnv_pci_dma_dev_setup,
 #ifdef CONFIG_PCI_MSI
-	.setup_msi_irqs = pnv_setup_msi_irqs,
-	.teardown_msi_irqs = pnv_teardown_msi_irqs,
+	.setup_msi_irqs		= pnv_setup_msi_irqs,
+	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
 #endif
-	.enable_device_hook = pnv_pci_enable_device_hook,
-	.window_alignment = pnv_pci_window_alignment,
-	.reset_secondary_bus = pnv_pci_reset_secondary_bus,
-	.dma_set_mask = pnv_npu_dma_set_mask,
-	.shutdown = pnv_pci_ioda_shutdown,
+	.enable_device_hook	= pnv_pci_enable_device_hook,
+	.window_alignment	= pnv_pci_window_alignment,
+	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
+	.dma_set_mask		= pnv_npu_dma_set_mask,
+	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 05/45] powerpc/powernv: Drop phb->bdfn_to_pe()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:43     ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                       ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This drops struct pnv_phb::bdfn_to_pe() as nobody uses it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 ---------
 arch/powerpc/platforms/powernv/pci.h      | 1 -
 2 files changed, 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 524c9c7..10ecd97 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3195,12 +3195,6 @@ static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
 	return true;
 }
 
-static u32 pnv_ioda_bdfn_to_pe(struct pnv_phb *phb, struct pci_bus *bus,
-			       u32 devfn)
-{
-	return phb->ioda.pe_rmap[(bus->number << 8) | devfn];
-}
-
 static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 {
 	struct pnv_phb *phb = hose->private_data;
@@ -3377,9 +3371,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->freeze_pe = pnv_ioda_freeze_pe;
 	phb->unfreeze_pe = pnv_ioda_unfreeze_pe;
 
-	/* Setup RID -> PE mapping function */
-	phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
-
 	/* Setup TCEs */
 	phb->dma_dev_setup = pnv_pci_ioda_dma_dev_setup;
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3f814f3..78f035e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -110,7 +110,6 @@ struct pnv_phb {
 			 unsigned int is_64, struct msi_msg *msg);
 	void (*dma_dev_setup)(struct pnv_phb *phb, struct pci_dev *pdev);
 	void (*fixup_phb)(struct pci_controller *hose);
-	u32 (*bdfn_to_pe)(struct pnv_phb *phb, struct pci_bus *bus, u32 devfn);
 	int (*init_m64)(struct pnv_phb *phb);
 	void (*reserve_m64_pe)(struct pci_bus *bus,
 			       unsigned long *pe_bitmap, bool all);
-- 
2.1.0
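
For reference, the helper removed above indexed pe_rmap[] by the 16-bit
requester ID: bus number in the high byte, devfn in the low byte. A small
stand-alone sketch of that index computation follows. demo_devfn() and
demo_rid() are hypothetical names; demo_devfn() packs device and function the
same way as the kernel's PCI_DEVFN() macro.

```c
#include <stdint.h>

/* Device (5 bits) and function (3 bits) packed into one byte. */
uint8_t demo_devfn(unsigned int dev, unsigned int fn)
{
	return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Requester ID as used for the pe_rmap[] index: (bus << 8) | devfn. */
uint16_t demo_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((unsigned int)bus << 8) | devfn);
}
```

For function 0003:09:00.3 from the cover letter's test log, bus 0x09 and
devfn 0x03 give index 0x0903.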

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 06/45] powerpc/powernv: Reorder fields in struct pnv_phb
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (3 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-04-13  5:56   ` Alexey Kardashevskiy
  2016-02-17  3:43 ` [PATCH v8 07/45] powerpc/powernv: Rename PE# " Gavin Shan
                   ` (34 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This groups the fields in struct pnv_phb that are related to PE
allocation together. No logical change.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 78f035e..f2a1452 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -140,15 +140,14 @@ struct pnv_phb {
 		unsigned int		io_segsize;
 		unsigned int		io_pci_base;
 
-		/* PE allocation bitmap */
-		unsigned long		*pe_alloc;
-		/* PE allocation mutex */
+		/* PE allocation */
 		struct mutex		pe_alloc_mutex;
+		unsigned long		*pe_alloc;
+		struct pnv_ioda_pe	*pe_array;
 
 		/* M32 & IO segment maps */
 		unsigned int		*m32_segmap;
 		unsigned int		*io_segmap;
-		struct pnv_ioda_pe	*pe_array;
 
 		/* IRQ chip */
 		int			irq_chip_init;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 07/45] powerpc/powernv: Rename PE# fields in struct pnv_phb
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (4 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 06/45] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-04-13  5:57   ` Alexey Kardashevskiy
  2016-02-17  3:43 ` [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap Gavin Shan
                   ` (33 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames the fields related to PE numbers in "struct pnv_phb"
to better reflect their usage, as Alexey suggested. No logical
changes are introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c    | 58 ++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.c         |  2 +-
 arch/powerpc/platforms/powernv/pci.h         |  4 +-
 4 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 950b3e5..69e41ce 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -75,7 +75,7 @@ static int pnv_eeh_init(void)
 		 * and P7IOC separately. So we should regard
 		 * PE#0 as valid for PHB3 and P7IOC.
 		 */
-		if (phb->ioda.reserved_pe != 0)
+		if (phb->ioda.reserved_pe_idx != 0)
 			eeh_add_flag(EEH_VALID_PE_ZERO);
 
 		break;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 10ecd97..1d2514f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -124,7 +124,7 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 
 static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
-	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
+	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
 		pr_warn("%s: Invalid PE %d on PHB#%x\n",
 			__func__, pe_no, phb->hose->global_number);
 		return;
@@ -144,8 +144,8 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 
 	do {
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe, 0);
-		if (pe >= phb->ioda.total_pe)
+					phb->ioda.total_pe_num, 0);
+		if (pe >= phb->ioda.total_pe_num)
 			return IODA_INVALID_PE;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
@@ -199,13 +199,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 	 * expected to be 0 or last one of PE capabicity.
 	 */
 	r = &phb->hose->mem_resources[1];
-	if (phb->ioda.reserved_pe == 0)
+	if (phb->ioda.reserved_pe_idx == 0)
 		r->start += phb->ioda.m64_segsize;
-	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
 		r->end -= phb->ioda.m64_segsize;
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
-			phb->ioda.reserved_pe);
+			phb->ioda.reserved_pe_idx);
 
 	return 0;
 
@@ -274,7 +274,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 		return IODA_INVALID_PE;
 
 	/* Allocate bitmap */
-	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	pe_alloc = kzalloc(size, GFP_KERNEL);
 	if (!pe_alloc) {
 		pr_warn("%s: Out of memory !\n",
@@ -290,7 +290,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 	 * contributed by its child buses. For the case, we needn't
 	 * pick M64 dependent PE#.
 	 */
-	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
+	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
 		kfree(pe_alloc);
 		return IODA_INVALID_PE;
 	}
@@ -301,8 +301,8 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 	 */
 	master_pe = NULL;
 	i = -1;
-	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
-		phb->ioda.total_pe) {
+	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
+		phb->ioda.total_pe_num) {
 		pe = &phb->ioda.pe_array[i];
 
 		if (!master_pe) {
@@ -355,7 +355,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	hose->mem_offset[1] = res->start - pci_addr;
 
 	phb->ioda.m64_size = resource_size(res);
-	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
+	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
 	phb->ioda.m64_base = pci_addr;
 
 	pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
@@ -456,7 +456,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
 	s64 rc;
 
 	/* Sanity check on PE number */
-	if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
+	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
 		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
 
 	/*
@@ -1088,7 +1088,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 	 * same GPU get assigned the same PE.
 	 */
 	gpu_pdev = pnv_pci_get_gpu_dev(npu_pdev);
-	for (pe_num = 0; pe_num < phb->ioda.total_pe; pe_num++) {
+	for (pe_num = 0; pe_num < phb->ioda.total_pe_num; pe_num++) {
 		pe = &phb->ioda.pe_array[pe_num];
 		if (!pe->pdev)
 			continue;
@@ -1537,9 +1537,9 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		} else {
 			mutex_lock(&phb->ioda.pe_alloc_mutex);
 			*pdn->pe_num_map = bitmap_find_next_zero_area(
-				phb->ioda.pe_alloc, phb->ioda.total_pe,
+				phb->ioda.pe_alloc, phb->ioda.total_pe_num,
 				0, num_vfs, 0);
-			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
+			if (*pdn->pe_num_map >= phb->ioda.total_pe_num) {
 				mutex_unlock(&phb->ioda.pe_alloc_mutex);
 				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
 				kfree(pdn->pe_num_map);
@@ -2858,7 +2858,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	pdn->m64_single_mode = false;
 
 	total_vfs = pci_sriov_get_totalvfs(pdev);
-	mul = phb->ioda.total_pe;
+	mul = phb->ioda.total_pe_num;
 	total_vf_bar_sz = 0;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
@@ -2960,7 +2960,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 			region.end   = res->end - phb->ioda.io_pci_base;
 			index = region.start / phb->ioda.io_segsize;
 
-			while (index < phb->ioda.total_pe &&
+			while (index < phb->ioda.total_pe_num &&
 			       region.start <= region.end) {
 				phb->ioda.io_segmap[index] = pe->pe_number;
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
@@ -2985,7 +2985,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 				       phb->ioda.m32_pci_base;
 			index = region.start / phb->ioda.m32_segsize;
 
-			while (index < phb->ioda.total_pe &&
+			while (index < phb->ioda.total_pe_num &&
 			       region.start <= region.end) {
 				phb->ioda.m32_segmap[index] = pe->pe_number;
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
@@ -3300,13 +3300,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		pr_err("  Failed to map registers !\n");
 
 	/* Initialize more IODA stuff */
-	phb->ioda.total_pe = 1;
+	phb->ioda.total_pe_num = 1;
 	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
 	if (prop32)
-		phb->ioda.total_pe = be32_to_cpup(prop32);
+		phb->ioda.total_pe_num = be32_to_cpup(prop32);
 	prop32 = of_get_property(np, "ibm,opal-reserved-pe", NULL);
 	if (prop32)
-		phb->ioda.reserved_pe = be32_to_cpup(prop32);
+		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
 
 	/* Parse 64-bit MMIO range */
 	pnv_ioda_parse_m64_window(phb);
@@ -3315,29 +3315,29 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	/* FW Has already off top 64k of M32 space (MSI space) */
 	phb->ioda.m32_size += 0x10000;
 
-	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe;
+	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe_num;
 	phb->ioda.m32_pci_base = hose->mem_resources[0].start - hose->mem_offset[0];
 	phb->ioda.io_size = hose->pci_io_size;
-	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe;
+	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
 	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
-	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	m32map_off = size;
-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
+	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
 	if (phb->type == PNV_PHB_IODA1) {
 		iomap_off = size;
-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
+		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
 	}
 	pemap_off = size;
-	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
+	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.m32_segmap = aux + m32map_off;
 	if (phb->type == PNV_PHB_IODA1)
 		phb->ioda.io_segmap = aux + iomap_off;
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
+	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
@@ -3356,7 +3356,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 #endif
 
 	pr_info("  %03d (%03d) PE's M32: 0x%x [segment=0x%x]\n",
-		phb->ioda.total_pe, phb->ioda.reserved_pe,
+		phb->ioda.total_pe_num, phb->ioda.reserved_pe_idx,
 		phb->ioda.m32_size, phb->ioda.m32_segsize);
 	if (phb->ioda.m64_size)
 		pr_info("                 M64: 0x%lx [segment=0x%lx]\n",
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index f838fcf..a53e4c8 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -380,7 +380,7 @@ static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 	 */
 	pe_no = pdn->pe_number;
 	if (pe_no == IODA_INVALID_PE) {
-		pe_no = phb->ioda.reserved_pe;
+		pe_no = phb->ioda.reserved_pe_idx;
 	}
 
 	/*
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index f2a1452..784882a 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -120,8 +120,8 @@ struct pnv_phb {
 
 	struct {
 		/* Global bridge info */
-		unsigned int		total_pe;
-		unsigned int		reserved_pe;
+		unsigned int		total_pe_num;
+		unsigned int		reserved_pe_idx;
 
 		/* 32-bit MMIO window */
 		unsigned int		m32_size;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (5 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 07/45] powerpc/powernv: Rename PE# " Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-04-13  6:21   ` Alexey Kardashevskiy
  2016-02-17  3:43 ` [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
                   ` (32 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

There are two arrays for IO and M32 segment maps on every PHB.
The arrays are indexed by segment number, and the value stored in
each element is the number of the PE that the segment is assigned
to. Initially, all elements in those two arrays are zero, which
means all segments are assigned to PE#0. That is wrong.

This fixes the initial value of every element in those two arrays
to IODA_INVALID_PE, meaning no segment is assigned to any PE yet.
In order to use IODA_INVALID_PE (-1) to represent an invalid PE
number, the element types of those two arrays are changed from
"unsigned int" to "int".
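
The idea can be illustrated outside the kernel with a minimal sketch. The
names INVALID_PE, segmap_alloc() and segmap_count() are made up for this
illustration and are not part of the patch; only the "-1 means unassigned"
convention comes from the description above.

```c
#include <stdlib.h>

#define INVALID_PE	(-1)	/* illustrative stand-in for IODA_INVALID_PE */

/* Allocate a segment map in which no segment is assigned to any PE. */
static int *segmap_alloc(unsigned int nr_segs)
{
	int *map = malloc(nr_segs * sizeof(*map));
	unsigned int i;

	if (!map)
		return NULL;

	/* A zero-filled map would claim that PE#0 owns every segment. */
	for (i = 0; i < nr_segs; i++)
		map[i] = INVALID_PE;

	return map;
}

/* Count the segments whose map entry equals 'pe'. */
static unsigned int segmap_count(const int *map, unsigned int nr_segs, int pe)
{
	unsigned int i, count = 0;

	for (i = 0; i < nr_segs; i++)
		if (map[i] == pe)
			count++;

	return count;
}
```

With a signed element type, the sentinel compares cleanly against a real
PE number, which is why the arrays have to move away from "unsigned int".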

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++++--
 arch/powerpc/platforms/powernv/pci.h      | 4 ++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d2514f..44cc5f3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
 	const __be64 *prop64;
 	const __be32 *prop32;
-	int len;
+	int i, len;
 	u64 phb_id;
 	void *aux;
 	long rc;
@@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.m32_segmap = aux + m32map_off;
-	if (phb->type == PNV_PHB_IODA1)
+	for (i = 0; i < phb->ioda.total_pe_num; i++)
+		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+	if (phb->type == PNV_PHB_IODA1) {
 		phb->ioda.io_segmap = aux + iomap_off;
+		for (i = 0; i < phb->ioda.total_pe_num; i++)
+			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+	}
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 784882a..36c4965 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,8 +146,8 @@ struct pnv_phb {
 		struct pnv_ioda_pe	*pe_array;
 
 		/* M32 & IO segment maps */
-		unsigned int		*m32_segmap;
-		unsigned int		*io_segmap;
+		int			*m32_segmap;
+		int			*io_segmap;
 
 		/* IRQ chip */
 		int			irq_chip_init;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (6 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-04-13  6:45   ` Alexey Kardashevskiy
  2016-02-17  3:43 ` [PATCH v8 10/45] powerpc/powernv: IO and M32 mapping based on PCI device resources Gavin Shan
                   ` (31 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

The original implementation of pnv_ioda_setup_pe_seg() configures
IO and M32 segments with separate code paths, which can be merged
by caching @segmap, @segsize and @win in advance. This shouldn't
cause any behavioural changes.
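
A rough user-space sketch of that shape follows. The types and helpers
(struct seg_window, struct phb_windows, pick_window(), map_range()) are
hypothetical and only model the bookkeeping; the firmware call made by
the real function is left out, and the range is assumed to start on a
segment boundary.

```c
#include <stdint.h>

enum win_type { WIN_IO, WIN_M32 };	/* stand-ins for the OPAL window types */

struct seg_window {
	unsigned int segsize;	/* bytes covered by one segment */
	int *segmap;		/* segment index -> PE number */
	unsigned int nr_segs;	/* number of entries in segmap */
};

struct phb_windows {
	struct seg_window io;
	struct seg_window m32;
};

/* Select the per-window parameters once, up front. */
static struct seg_window *pick_window(struct phb_windows *p, enum win_type t)
{
	return t == WIN_IO ? &p->io : &p->m32;
}

/*
 * One loop shared by both window types: walk from the segment that
 * contains 'start' in segsize steps while start <= end, recording
 * 'pe' as the owner. Returns the number of segments recorded.
 */
static unsigned int map_range(struct seg_window *w, uint64_t start,
			      uint64_t end, int pe)
{
	unsigned int index = (unsigned int)(start / w->segsize);
	unsigned int mapped = 0;

	while (index < w->nr_segs && start <= end) {
		w->segmap[index] = pe;
		start += w->segsize;
		index++;
		mapped++;
	}

	return mapped;
}
```

Only the parameter selection differs between the IO and M32 cases, so
the loop body needs to exist once.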

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
 1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 44cc5f3..fd7d382 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2940,8 +2940,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 	struct pnv_phb *phb = hose->private_data;
 	struct pci_bus_region region;
 	struct resource *res;
-	int i, index;
-	int rc;
+	unsigned int segsize;
+	int *segmap, index, i;
+	uint16_t win;
+	int64_t rc;
 
 	/*
 	 * NOTE: We only care PCI bus based PE for now. For PCI
@@ -2958,23 +2960,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 		if (res->flags & IORESOURCE_IO) {
 			region.start = res->start - phb->ioda.io_pci_base;
 			region.end   = res->end - phb->ioda.io_pci_base;
-			index = region.start / phb->ioda.io_segsize;
-
-			while (index < phb->ioda.total_pe_num &&
-			       region.start <= region.end) {
-				phb->ioda.io_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping IO "
-					       "segment #%d to PE#%d\n",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
-
-				region.start += phb->ioda.io_segsize;
-				index++;
-			}
+			segsize      = phb->ioda.io_segsize;
+			segmap       = phb->ioda.io_segmap;
+			win          = OPAL_IO_WINDOW_TYPE;
 		} else if ((res->flags & IORESOURCE_MEM) &&
 			   !pnv_pci_is_mem_pref_64(res->flags)) {
 			region.start = res->start -
@@ -2983,23 +2971,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 			region.end   = res->end -
 				       hose->mem_offset[0] -
 				       phb->ioda.m32_pci_base;
-			index = region.start / phb->ioda.m32_segsize;
-
-			while (index < phb->ioda.total_pe_num &&
-			       region.start <= region.end) {
-				phb->ioda.m32_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping M32 "
-					       "segment#%d to PE#%d",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
+			segsize      = phb->ioda.m32_segsize;
+			segmap       = phb->ioda.m32_segmap;
+			win          = OPAL_M32_WINDOW_TYPE;
+		} else {
+			continue;
+		}
 
-				region.start += phb->ioda.m32_segsize;
-				index++;
+		index = region.start / segsize;
+		while (index < phb->ioda.total_pe_num &&
+		       region.start <= region.end) {
+			segmap[index] = pe->pe_number;
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+					pe->pe_number, win, 0, index);
+			if (rc != OPAL_SUCCESS) {
+				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+					__func__, rc, win, index,
+					pe->phb->hose->global_number,
+					pe->pe_number);
+				break;
 			}
+
+			region.start += segsize;
+			index++;
 		}
 	}
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 10/45] powerpc/powernv: IO and M32 mapping based on PCI device resources
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (7 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption Gavin Shan
                   ` (30 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

Currently, the IO and M32 segments are mapped to the corresponding
PE based on the windows of the parent bridge of the PE's primary bus.
That won't work once the windows of the root port, or of the upstream
port of the PCIe switch behind the root port, are extended to the PHB's
apertures in order to support hotplug in a subsequent patch.

This fixes the issue by mapping IO and M32 segments based on the
resources of the PCI devices included in the PE, instead of the
windows of the parent bridge of the PE's primary bus.
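
Because a device resource need not start or end on a segment boundary,
the range has to be widened to whole segments before it is walked. A
small sketch of that arithmetic, assuming a power-of-two segment size;
the macro and function names are invented for this example:

```c
#include <stdint.h>

/* Power-of-two alignment helpers, assumed here for illustration. */
#define ALIGN_DOWN_P2(x, a)	((x) & ~((uint64_t)(a) - 1))
#define ALIGN_UP_P2(x, a)	(((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/*
 * Widen [start, end] to whole segments. Stores the index of the first
 * segment in *first and returns how many segments the widened range
 * spans.
 */
static unsigned int seg_span(uint64_t start, uint64_t end, uint64_t segsize,
			     unsigned int *first)
{
	uint64_t s = ALIGN_DOWN_P2(start, segsize);
	uint64_t e = ALIGN_UP_P2(end, segsize);

	*first = (unsigned int)(s / segsize);
	return (unsigned int)((e - s) / segsize);
}
```

For example, with 4KB segments a resource at 0x1800..0x27ff starts in
segment 1 and spans two segments, even though it is only 4KB long.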

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 127 +++++++++++++++++-------------
 1 file changed, 71 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fd7d382..7330a73 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2929,71 +2929,86 @@ truncate_iov:
 }
 #endif /* CONFIG_PCI_IOV */
 
-/*
- * This function is supposed to be called on basis of PE from top
- * to bottom style. So the the I/O or MMIO segment assigned to
- * parent PE could be overrided by its child PEs if necessary.
- */
-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
-				  struct pnv_ioda_pe *pe)
+static void pnv_ioda_setup_one_res(struct pnv_ioda_pe *pe,
+				   struct resource *res)
 {
-	struct pnv_phb *phb = hose->private_data;
+	struct pnv_phb *phb = pe->phb;
 	struct pci_bus_region region;
-	struct resource *res;
-	unsigned int segsize;
-	int *segmap, index, i;
+	unsigned int index, segsize;
+	int *segmap;
 	uint16_t win;
 	int64_t rc;
 
-	/*
-	 * NOTE: We only care PCI bus based PE for now. For PCI
-	 * device based PE, for example SRIOV sensitive VF should
-	 * be figured out later.
-	 */
-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+	if (!res->parent || !res->flags  || res->start > res->end)
+		return;
+	if (!(res->flags & (IORESOURCE_IO | IORESOURCE_MEM)) ||
+	    pnv_pci_is_mem_pref_64(res->flags))
+		return;
 
-	pci_bus_for_each_resource(pe->pbus, res, i) {
-		if (!res || !res->flags ||
-		    res->start > res->end)
-			continue;
+	if (res->flags & IORESOURCE_IO) {
+		region.start = res->start - phb->ioda.io_pci_base;
+		region.end   = res->end - phb->ioda.io_pci_base;
+		segsize      = phb->ioda.io_segsize;
+		segmap       = phb->ioda.io_segmap;
+		win          = OPAL_IO_WINDOW_TYPE;
+	} else {
+		region.start = res->start -
+			       phb->hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		region.end   = res->end -
+			       phb->hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		segsize      = phb->ioda.m32_segsize;
+		segmap       = phb->ioda.m32_segmap;
+		win          = OPAL_M32_WINDOW_TYPE;
+	}
+
+	region.start = _ALIGN_DOWN(region.start, segsize);
+	region.end   = _ALIGN_UP(region.end, segsize);
+	index = region.start / segsize;
+	while (index < phb->ioda.total_pe_num && region.start < region.end) {
+		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+				pe->pe_number, win, 0, index);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+				__func__, rc, win, index,
+				phb->hose->global_number,
+				pe->pe_number);
+			return;
+		}
 
-		if (res->flags & IORESOURCE_IO) {
-			region.start = res->start - phb->ioda.io_pci_base;
-			region.end   = res->end - phb->ioda.io_pci_base;
-			segsize      = phb->ioda.io_segsize;
-			segmap       = phb->ioda.io_segmap;
-			win          = OPAL_IO_WINDOW_TYPE;
-		} else if ((res->flags & IORESOURCE_MEM) &&
-			   !pnv_pci_is_mem_pref_64(res->flags)) {
-			region.start = res->start -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			region.end   = res->end -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			segsize      = phb->ioda.m32_segsize;
-			segmap       = phb->ioda.m32_segmap;
-			win          = OPAL_M32_WINDOW_TYPE;
-		} else {
-			continue;
+		segmap[index] = pe->pe_number;
+		region.start += segsize;
+		index++;
+	}
+}
+
+static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *pdev;
+	struct resource *res;
+	int i;
+
+	/* This function only works for bus dependent PE */
+	WARN_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+
+	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
+		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
+			res = &pdev->resource[i];
+			pnv_ioda_setup_one_res(pe, res);
 		}
 
-		index = region.start / segsize;
-		while (index < phb->ioda.total_pe_num &&
-		       region.start <= region.end) {
-			segmap[index] = pe->pe_number;
-			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, win, 0, index);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
-					__func__, rc, win, index,
-					pe->phb->hose->global_number,
-					pe->pe_number);
-				break;
-			}
+		/*
+		 * If the PE contains all subordinate PCI buses, the
+		 * windows of the child bridges should be mapped to
+		 * the PE as well.
+		 */
+		if (!(pe->flags & PNV_IODA_PE_BUS_ALL && pci_is_bridge(pdev)))
+			continue;
 
-			region.start += segsize;
-			index++;
+		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
+			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
+			pnv_ioda_setup_one_res(pe, res);
 		}
 	}
 }
@@ -3012,7 +3027,7 @@ static void pnv_pci_ioda_setup_seg(void)
 			continue;
 
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			pnv_ioda_setup_pe_seg(hose, pe);
+			pnv_ioda_setup_pe_seg(pe);
 		}
 	}
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (8 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 10/45] powerpc/powernv: IO and M32 mapping based on PCI device resources Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-04-13  7:09   ` Alexey Kardashevskiy
  2016-02-17  3:43 ` [PATCH v8 12/45] powerpc/powernv: Rename M64 related functions Gavin Shan
                   ` (29 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

When PCI devices are unplugged, their parent PEs might go offline.
The M64 resources consumed by those PEs should be released at that
time. As is already done for M32 segment consumption, this adds an
array to the PHB to track the mapping between M64 segments and
PE numbers.
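
This patch only adds the tracking array. As a hedged sketch of how such
an array could later be consulted when a PE goes away, the hypothetical
helper below (not part of this patch) hands back every segment owned by
one PE:

```c
#define INVALID_PE	(-1)	/* illustrative stand-in for IODA_INVALID_PE */

/*
 * Mark every segment owned by 'pe' as unassigned again and return
 * how many segments were released.
 */
static unsigned int release_pe_segments(int *segmap, unsigned int nr_segs,
					int pe)
{
	unsigned int i, released = 0;

	for (i = 0; i < nr_segs; i++) {
		if (segmap[i] != pe)
			continue;

		segmap[i] = INVALID_PE;
		released++;
	}

	return released;
}
```

Segments owned by other PEs are left untouched, so the helper can be
called once per PE being torn down.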

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++--
 arch/powerpc/platforms/powernv/pci.h      |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7330a73..fc0374a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -305,6 +305,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 		phb->ioda.total_pe_num) {
 		pe = &phb->ioda.pe_array[i];
 
+		phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
 		if (!master_pe) {
 			pe->flags |= PNV_IODA_PE_MASTER;
 			INIT_LIST_HEAD(&pe->slaves);
@@ -3245,7 +3246,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int i, len;
@@ -3332,6 +3333,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
+	m64map_off = size;
+	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
 	m32map_off = size;
 	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
 	if (phb->type == PNV_PHB_IODA1) {
@@ -3342,9 +3345,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
+	phb->ioda.m64_segmap = aux + m64map_off;
 	phb->ioda.m32_segmap = aux + m32map_off;
-	for (i = 0; i < phb->ioda.total_pe_num; i++)
+	for (i = 0; i < phb->ioda.total_pe_num; i++) {
+		phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
 		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+	}
 	if (phb->type == PNV_PHB_IODA1) {
 		phb->ioda.io_segmap = aux + iomap_off;
 		for (i = 0; i < phb->ioda.total_pe_num; i++)
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 36c4965..866a5ea 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,6 +146,7 @@ struct pnv_phb {
 		struct pnv_ioda_pe	*pe_array;
 
 		/* M32 & IO segment maps */
+		int			*m64_segmap;
 		int			*m32_segmap;
 		int			*io_segmap;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 12/45] powerpc/powernv: Rename M64 related functions
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (9 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
       [not found]   ` <1455680668-23298-13-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-02-17  3:43 ` [PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe() Gavin Shan
                   ` (28 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames the functions that pick a PE number based on consumed
M64 segments and map M64 segments to PEs, as those functions are
going to be shared by IODA1 and IODA2 in the next patch. No logical
changes are introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fc0374a..1dc663a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -219,7 +219,7 @@ fail:
 	return -EIO;
 }
 
-static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
+static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
 					 unsigned long *pe_bitmap)
 {
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
@@ -246,22 +246,22 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
 	}
 }
 
-static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
-				     unsigned long *pe_bitmap,
-				     bool all)
+static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
+				    unsigned long *pe_bitmap,
+				    bool all)
 {
 	struct pci_dev *pdev;
 
 	list_for_each_entry(pdev, &bus->devices, bus_list) {
-		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
+		pnv_ioda_reserve_dev_m64_pe(pdev, pe_bitmap);
 
 		if (all && pdev->subordinate)
-			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
-						 pe_bitmap, all);
+			pnv_ioda_reserve_m64_pe(pdev->subordinate,
+						pe_bitmap, all);
 	}
 }
 
-static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
+static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -283,7 +283,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 	}
 
 	/* Figure out reserved PE numbers by the PE */
-	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
+	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
 
 	/*
 	 * the current bus might not own M64 window and that's all
@@ -365,8 +365,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	/* Use last M64 BAR to cover M64 window */
 	phb->ioda.m64_bar_idx = 15;
 	phb->init_m64 = pnv_ioda2_init_m64;
-	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
-	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
+	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
+	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
 }
 
 static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:43     ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                       ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This enables the M64 window on P7IOC, as is already done on PHB3.
PHB3 supports 16 M64 BARs, each of which can either be owned
exclusively by one particular PE# or be divided evenly into 256
segments. In contrast, every P7IOC PHB has 16 M64 BARs, each of
which is divided into 8 segments, so every P7IOC PHB supports 128
M64 segments in total. P7IOC has M64DT, which allows mapping one
particular M64 segment# to an arbitrary PE#. PHB3 doesn't have
M64DT, so each M64 segment is pinned to a fixed PE#. In order to
share the same code for M64 support on P7IOC and PHB3, we provide
128 M64 segments on every P7IOC PHB and pin each of them to a fixed
PE#, bypassing the flexibility of M64DT. As a result, P7IOC and PHB3
only need different phb->init_m64() implementations to support M64.
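
The fixed mapping described above can be sketched as a small standalone
model. This is only an illustration, not the kernel code: the helper
names are hypothetical, the two constants mirror PNV_IODA1_M64_NUM and
PNV_IODA1_M64_SEGS from the patch, and the segment address calculation
assumes that the segments inside one BAR are laid out in ascending
order.

```c
#include <assert.h>
#include <stdint.h>

#define M64_NUM		16	/* M64 BARs per P7IOC PHB (PNV_IODA1_M64_NUM) */
#define M64_SEGS	8	/* segments per M64 BAR (PNV_IODA1_M64_SEGS) */

/* Hypothetical helper: index of the M64 BAR whose segment is pinned to a PE#. */
static unsigned int m64_bar_of_pe(unsigned int pe_number)
{
	assert(pe_number < M64_NUM * M64_SEGS);
	return pe_number / M64_SEGS;
}

/* Hypothetical helper: segment index inside that BAR for the same PE#. */
static unsigned int m64_seg_of_pe(unsigned int pe_number)
{
	assert(pe_number < M64_NUM * M64_SEGS);
	return pe_number % M64_SEGS;
}

/* Start address of one BAR; each BAR covers M64_SEGS consecutive segments. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int bar)
{
	assert(bar < M64_NUM);
	return m64_base + (uint64_t)bar * M64_SEGS * segsz;
}

/*
 * Start address of the segment pinned to a PE#, assuming ascending
 * segment order inside the BAR. This works out to
 * m64_base + pe_number * segsz.
 */
static uint64_t m64_seg_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int pe_number)
{
	return m64_bar_base(m64_base, segsz, m64_bar_of_pe(pe_number)) +
	       (uint64_t)m64_seg_of_pe(pe_number) * segsz;
}
```

With this layout, PE#9 lands on BAR#1, segment#1, and the last PE#
(127) lands on BAR#15, segment#7, which matches the division and
modulo passed to opal_pci_map_pe_mmio_window() in the patch.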

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
 arch/powerpc/platforms/powernv/pci.h      |  3 ++
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1dc663a..8488238 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
 	}
 }
 
+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+	struct resource *r;
+	int index;
+
+	/*
+	 * There are 16 M64 BARs, each of which has 8 segments. So
+	 * there are as many M64 segments as the maximum number of
+	 * PEs, which is 128.
+	 */
+	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
+		unsigned long base, segsz = phb->ioda.m64_segsize;
+		int64_t rc;
+
+		base = phb->ioda.m64_base +
+		       index * PNV_IODA1_M64_SEGS * segsz;
+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
+				OPAL_M64_WINDOW_TYPE, index, base, 0,
+				PNV_IODA1_M64_SEGS * segsz);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
+				rc, phb->hose->global_number, index);
+			goto fail;
+		}
+
+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
+				OPAL_M64_WINDOW_TYPE, index,
+				OPAL_ENABLE_M64_SPLIT);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
+				rc, phb->hose->global_number, index);
+			goto fail;
+		}
+	}
+
+	/*
+	 * Exclude the segment used by the reserved PE, which
+	 * is expected to be 0 or last supported PE#.
+	 */
+	r = &phb->hose->mem_resources[1];
+	if (phb->ioda.reserved_pe_idx == 0)
+		r->start += phb->ioda.m64_segsize;
+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
+		r->end -= phb->ioda.m64_segsize;
+	else
+		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
+			phb->ioda.reserved_pe_idx);
+
+	return 0;
+
+fail:
+	for ( ; index >= 0; index--)
+		opal_pci_phb_mmio_enable(phb->opal_id,
+			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
+
+	return -EIO;
+}
+
 static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
 				    unsigned long *pe_bitmap,
 				    bool all)
@@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 			pe->master = master_pe;
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
+
+		/*
+		 * P7IOC supports M64DT, which helps mapping M64 segment
+		 * to one particular PE#. However, PHB3 has fixed mapping
+		 * between M64 segment and PE#. In order to have same logic
+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
+		 * segment and PE# on P7IOC.
+		 */
+		if (phb->type == PNV_PHB_IODA1) {
+			int64_t rc;
+
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+					pe->pe_number, OPAL_M64_WINDOW_TYPE,
+					pe->pe_number / PNV_IODA1_M64_SEGS,
+					pe->pe_number % PNV_IODA1_M64_SEGS);
+			if (rc != OPAL_SUCCESS)
+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
+					__func__, rc, phb->hose->global_number,
+					pe->pe_number);
+		}
 	}
 
 	kfree(pe_alloc);
@@ -329,8 +407,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	const u32 *r;
 	u64 pci_addr;
 
-	/* FIXME: Support M64 for P7IOC */
-	if (phb->type != PNV_PHB_IODA2) {
+	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
 		pr_info("  Not support M64 window\n");
 		return;
 	}
@@ -364,7 +441,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 
 	/* Use last M64 BAR to cover M64 window */
 	phb->ioda.m64_bar_idx = 15;
-	phb->init_m64 = pnv_ioda2_init_m64;
+	if (phb->type == PNV_PHB_IODA1)
+		phb->init_m64 = pnv_ioda1_init_m64;
+	else
+		phb->init_m64 = pnv_ioda2_init_m64;
 	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
 	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 866a5ea..00539ff 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -82,6 +82,9 @@ struct pnv_ioda_pe {
 	struct list_head	list;
 };
 
+#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
+#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
+
 #define PNV_PHB_FLAG_EEH	(1 << 0)
 
 struct pnv_phb {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (10 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 12/45] powerpc/powernv: Rename M64 related functions Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-04-13  7:36   ` Alexey Kardashevskiy
  2016-02-17  3:43 ` [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list Gavin Shan
                   ` (27 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe()
as it's the counterpart of IODA2's pnv_pci_ioda2_setup_dma_pe().
No logical changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8488238..d18b95e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2026,9 +2026,10 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.free = pnv_ioda2_table_free,
 };
 
-static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
-				      struct pnv_ioda_pe *pe, unsigned int base,
-				      unsigned int segs)
+static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
+				       struct pnv_ioda_pe *pe,
+				       unsigned int base,
+				       unsigned int segs)
 {
 
 	struct page *tce_mem = NULL;
@@ -2616,7 +2617,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 		if (phb->type == PNV_PHB_IODA1) {
 			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
 				pe->dma_weight, segs);
-			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
+			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
 		} else if (phb->type == PNV_PHB_IODA2) {
 			pe_info(pe, "Assign DMA32 space\n");
 			segs = 0;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:43     ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                       ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

Currently, there is one macro (TCE32_TABLE_SIZE) representing the
TCE table size for one DMA32 segment, but the constant representing
the DMA32 segment size (1 << 28) is still open-coded.

This defines PNV_IODA1_DMA32_SEGSIZE to represent the size of one
DMA32 segment. The TCE table size can be calculated from it because
the page size is fixed at 4KB, so all the related calculations
depend on a single macro (PNV_IODA1_DMA32_SEGSIZE). No logical
changes introduced.
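
As a sanity check of the arithmetic, the following standalone sketch
compares the removed TCE32_TABLE_SIZE expression with the new
shift-based one. The function names are hypothetical, and the constants
are copied from the patch rather than taken from kernel headers
(12 stands in for IOMMU_PAGE_SHIFT_4K).

```c
#include <assert.h>
#include <stdint.h>

#define DMA32_SEGSIZE	0x10000000u	/* 256MB, PNV_IODA1_DMA32_SEGSIZE */
#define PAGE_SHIFT_4K	12		/* stands in for IOMMU_PAGE_SHIFT_4K */
#define TCE_SHIFT	3		/* each TCE entry occupies 8 bytes */

/* New form: bytes of TCE table needed to cover one DMA32 segment. */
static uint32_t tce32_segsz_new(void)
{
	return DMA32_SEGSIZE >> (PAGE_SHIFT_4K - TCE_SHIFT);
}

/* Old form: the body of the removed TCE32_TABLE_SIZE macro. */
static uint32_t tce32_segsz_old(void)
{
	return (0x10000000u / 0x1000u) * 8u;
}

/* Number of DMA32 segments that fit below the M32 PCI base address. */
static uint32_t tce32_count(uint64_t m32_pci_base)
{
	return (uint32_t)(m32_pci_base / DMA32_SEGSIZE);
}
```

Both forms yield 0x80000 bytes (512KB) per segment: 65536 TCE entries
of 8 bytes each. For an illustrative M32 PCI base of 0x80000000, the
division gives 8 DMA32 segments, the same result as the former ">> 28".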

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.h      |  1 +
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d18b95e..e60cff6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -48,9 +48,6 @@
 #include "powernv.h"
 #include "pci.h"
 
-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
-
 #define POWERNV_IOMMU_DEFAULT_LEVELS	1
 #define POWERNV_IOMMU_MAX_LEVELS	5
 
@@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 
 	struct page *tce_mem = NULL;
 	struct iommu_table *tbl;
-	unsigned int i;
+	unsigned int tce32_segsz, i;
 	int64_t rc;
 	void *addr;
 
@@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	/* Grab a 32-bit TCE table */
 	pe->tce32_seg = base;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
-		(base << 28), ((base + segs) << 28) - 1);
+		base * PNV_IODA1_DMA32_SEGSIZE,
+		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
 
 	/* XXX Currently, we allocate one big contiguous table for the
 	 * TCEs. We only really need one chunk per 256M of TCE space
 	 * (ie per segment) but that's an optimization for later, it
 	 * requires some added smarts with our get/put_tce implementation
+	 *
+	 * Each TCE page is 4KB in size and each TCE entry occupies 8
+	 * bytes
 	 */
+	tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);
 	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-				   get_order(TCE32_TABLE_SIZE * segs));
+				   get_order(tce32_segsz * segs));
 	if (!tce_mem) {
 		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
 		goto fail;
 	}
 	addr = page_address(tce_mem);
-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
+	memset(addr, 0, tce32_segsz * segs);
 
 	/* Configure HW */
 	for (i = 0; i < segs; i++) {
 		rc = opal_pci_map_pe_dma_window(phb->opal_id,
 					      pe->pe_number,
 					      base + i, 1,
-					      __pa(addr) + TCE32_TABLE_SIZE * i,
-					      TCE32_TABLE_SIZE, 0x1000);
+					      __pa(addr) + tce32_segsz * i,
+					      tce32_segsz, 0x1000);
 		if (rc) {
 			pe_err(pe, " Failed to configure 32-bit TCE table,"
 			       " err %ld\n", rc);
@@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	}
 
 	/* Setup linux iommu table */
-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
-				  base << 28, IOMMU_PAGE_SHIFT_4K);
+	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
+				  base * PNV_IODA1_DMA32_SEGSIZE,
+				  IOMMU_PAGE_SHIFT_4K);
 
 	/* OPAL variant of P7IOC SW invalidated TCEs */
 	if (phb->ioda.tce_inval_reg)
@@ -2116,7 +2119,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	if (pe->tce32_seg >= 0)
 		pe->tce32_seg = -1;
 	if (tce_mem)
-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
+		__free_pages(tce_mem, get_order(tce32_segsz * segs));
 	if (tbl) {
 		pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
 		iommu_free_table(tbl, "pnv");
@@ -3445,7 +3448,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	mutex_init(&phb->ioda.pe_list_mutex);
 
 	/* Calculate how many 32-bit TCE segments we have */
-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
+	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
+				PNV_IODA1_DMA32_SEGSIZE;
 
 #if 0 /* We should really do that ... */
 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 00539ff..1d8e775 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -84,6 +84,7 @@ struct pnv_ioda_pe {
 
 #define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
 #define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
+#define PNV_IODA1_DMA32_SEGSIZE	0x10000000
 
 #define PNV_PHB_FLAG_EEH	(1 << 0)
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (11 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe() Gavin Shan
@ 2016-02-17  3:43 ` Gavin Shan
  2016-04-13  8:59   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity Gavin Shan
                   ` (26 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:43 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

PEs are put into the PHB DMA32 list (phb->ioda.pe_dma_list) according
to their DMA32 weight. The PEs on the list are iterated to set up
their TCE32 tables at system boot time. The list is used only once,
so there is no reason to keep it.

This moves the logic calculating the DMA32 weight of the PHB and PEs
to pnv_ioda_setup_dma() and drops the PHB's DMA32 list. Also, the
per-PE fields tracking the consumed DMA32 segments (@tce32_seg and
@tce32_segcount) are useless, so they're removed.
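
The weighting rule can be modelled outside the kernel as follows. This
is a simplified sketch: struct dev_model and both function names are
hypothetical stand-ins for struct pci_dev and the helpers in the patch,
and the class codes are written out as the numeric values of the
corresponding PCI_CLASS_* constants.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_TYPE_NORMAL	0x00		/* PCI_HEADER_TYPE_NORMAL */
#define CLASS_USB_UHCI	0x0c0300	/* PCI_CLASS_SERIAL_USB_UHCI */
#define CLASS_USB_OHCI	0x0c0310	/* PCI_CLASS_SERIAL_USB_OHCI */
#define CLASS_USB_EHCI	0x0c0320	/* PCI_CLASS_SERIAL_USB_EHCI */
#define CLASS_RAID	0x0104		/* PCI_CLASS_STORAGE_RAID */

/* Hypothetical stand-in for the two struct pci_dev fields that matter. */
struct dev_model {
	unsigned int hdr_type;
	unsigned int class_code;	/* 24-bit class, subclass, prog-if */
};

/* Weight of one device: 0 for bridges, 3 for slow USB, 15 for RAID, else 10. */
static unsigned int dev_dma_weight(const struct dev_model *dev)
{
	if (dev->hdr_type != HDR_TYPE_NORMAL)
		return 0;

	if (dev->class_code == CLASS_USB_UHCI ||
	    dev->class_code == CLASS_USB_OHCI ||
	    dev->class_code == CLASS_USB_EHCI)
		return 3;

	if ((dev->class_code >> 8) == CLASS_RAID)
		return 15;

	return 10;
}

/* Weight of a PE: the sum over every device that belongs to it. */
static unsigned int pe_dma_weight(const struct dev_model *devs, size_t n)
{
	unsigned int weight = 0;
	size_t i;

	for (i = 0; i < n; i++)
		weight += dev_dma_weight(&devs[i]);

	return weight;
}
```

A PE holding an Ethernet adapter, a PCI bridge, an EHCI controller and
a RAID controller would get 10 + 0 + 3 + 15 = 28. Computing the weight
on demand like this is what lets the patch drop the stored per-PE
weight and the sorted list.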

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 168 +++++++++++++-----------------
 arch/powerpc/platforms/powernv/pci.h      |  19 ----
 2 files changed, 75 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e60cff6..0fc2309 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -886,44 +886,6 @@ out:
 	return 0;
 }
 
-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
-				       struct pnv_ioda_pe *pe)
-{
-	struct pnv_ioda_pe *lpe;
-
-	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
-		if (lpe->dma_weight < pe->dma_weight) {
-			list_add_tail(&pe->dma_link, &lpe->dma_link);
-			return;
-		}
-	}
-	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
-}
-
-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
-{
-	/* This is quite simplistic. The "base" weight of a device
-	 * is 10. 0 means no DMA is to be accounted for it.
-	 */
-
-	/* If it's a bridge, no DMA */
-	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-		return 0;
-
-	/* Reduce the weight of slow USB controllers */
-	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
-	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
-	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		return 3;
-
-	/* Increase the weight of RAID (includes Obsidian) */
-	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-		return 15;
-
-	/* Default */
-	return 10;
-}
-
 #ifdef CONFIG_PCI_IOV
 static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 {
@@ -1028,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	pe->flags = PNV_IODA_PE_DEV;
 	pe->pdev = dev;
 	pe->pbus = NULL;
-	pe->tce32_seg = -1;
 	pe->mve_number = -1;
 	pe->rid = dev->bus->number << 8 | pdn->devfn;
 
@@ -1044,16 +1005,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 		return NULL;
 	}
 
-	/* Assign a DMA weight to the device */
-	pe->dma_weight = pnv_ioda_dma_weight(dev);
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
-
 	return pe;
 }
 
@@ -1071,7 +1022,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 		}
 		pdn->pcidev = dev;
 		pdn->pe_number = pe->pe_number;
-		pe->dma_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_same_PE(dev->subordinate, pe);
 	}
@@ -1108,10 +1058,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
-	pe->tce32_seg = -1;
 	pe->mve_number = -1;
 	pe->rid = bus->busn_res.start << 8;
-	pe->dma_weight = 0;
 
 	if (all)
 		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
@@ -1133,17 +1081,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
-
-	/* Account for one DMA PE if at least one DMA capable device exist
-	 * below the bridge
-	 */
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
 }
 
 static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
@@ -1184,7 +1121,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 			rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
 			npu_pdn->pcidev = npu_pdev;
 			npu_pdn->pe_number = pe_num;
-			pe->dma_weight += pnv_ioda_dma_weight(npu_pdev);
 			phb->ioda.pe_rmap[rid] = pe->pe_number;
 
 			/* Map the PE to this link */
@@ -1532,7 +1468,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		pe->flags = PNV_IODA_PE_VF;
 		pe->pbus = NULL;
 		pe->parent_dev = pdev;
-		pe->tce32_seg = -1;
 		pe->mve_number = -1;
 		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
 			   pci_iov_virtfn_devfn(pdev, vf_index);
@@ -2023,6 +1958,54 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.free = pnv_ioda2_table_free,
 };
 
+static int pnv_pci_ioda_dev_dma_weight(struct pci_dev *dev, void *data)
+{
+	unsigned int *weight = (unsigned int *)data;
+
+	/* This is quite simplistic. The "base" weight of a device
+	 * is 10. 0 means no DMA is to be accounted for it.
+	 */
+	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
+		return 0;
+
+	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
+	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
+	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
+		*weight += 3;
+	else if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
+		*weight += 15;
+	else
+		*weight += 10;
+
+	return 0;
+}
+
+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
+{
+	unsigned int weight = 0;
+
+	if ((pe->flags & PNV_IODA_PE_DEV) && pe->pdev) {
+		pnv_pci_ioda_dev_dma_weight(pe->pdev, &weight);
+	} else if ((pe->flags & PNV_IODA_PE_BUS) && pe->pbus) {
+		struct pci_dev *pdev;
+
+		list_for_each_entry(pdev, &pe->pbus->devices, bus_list)
+			pnv_pci_ioda_dev_dma_weight(pdev, &weight);
+	} else if ((pe->flags & PNV_IODA_PE_BUS_ALL) && pe->pbus) {
+		pci_walk_bus(pe->pbus, pnv_pci_ioda_dev_dma_weight, &weight);
+	}
+
+	return weight;
+}
+
+static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
+{
+	unsigned int weight = 0;
+
+	pci_walk_bus(phb->hose->bus, pnv_pci_ioda_dev_dma_weight, &weight);
+	return weight;
+}
+
 static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 				       struct pnv_ioda_pe *pe,
 				       unsigned int base,
@@ -2039,17 +2022,12 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
 
-	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
-		return;
-
 	tbl = pnv_pci_table_alloc(phb->hose->node);
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
 			pe->pe_number);
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
 
 	/* Grab a 32-bit TCE table */
-	pe->tce32_seg = base;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		base * PNV_IODA1_DMA32_SEGSIZE,
 		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
@@ -2116,8 +2094,6 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
 	if (tce_mem)
 		__free_pages(tce_mem, get_order(tce32_segsz * segs));
 	if (tbl) {
@@ -2528,10 +2504,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 {
 	int64_t rc;
 
-	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
-		return;
-
 	/* TVE #1 is selected by PCI address bit 59 */
 	pe->tce_bypass_base = 1ull << 59;
 
@@ -2539,7 +2511,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 			pe->pe_number);
 
 	/* The PE will reserve all possible 32-bits space */
-	pe->tce32_seg = 0;
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 		phb->ioda.m32_pci_base);
 
@@ -2555,11 +2526,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 #endif
 
 	rc = pnv_pci_ioda2_setup_default_config(pe);
-	if (rc) {
-		if (pe->tce32_seg >= 0)
-			pe->tce32_seg = -1;
+	if (rc)
 		return;
-	}
 
 	if (pe->flags & PNV_IODA_PE_DEV)
 		iommu_add_device(&pe->pdev->dev);
@@ -2570,24 +2538,32 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 {
 	struct pci_controller *hose = phb->hose;
-	unsigned int residual, remaining, segs, tw, base;
+	unsigned int weight, total_weight, dma_pe_count;
+	unsigned int residual, remaining, segs, base;
 	struct pnv_ioda_pe *pe;
 
+	total_weight = pnv_pci_ioda_total_dma_weight(phb);
+	dma_pe_count = 0;
+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+		weight = pnv_pci_ioda_pe_dma_weight(pe);
+		if (weight > 0)
+			dma_pe_count++;
+	}
+
 	/* If we have more PE# than segments available, hand out one
 	 * per PE until we run out and let the rest fail. If not,
 	 * then we assign at least one segment per PE, plus more based
 	 * on the amount of devices under that PE
 	 */
-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
+	if (dma_pe_count > phb->ioda.tce32_count)
 		residual = 0;
 	else
-		residual = phb->ioda.tce32_count -
-			phb->ioda.dma_pe_count;
+		residual = phb->ioda.tce32_count - dma_pe_count;
 
 	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
 		hose->global_number, phb->ioda.tce32_count);
 	pr_info("PCI: %d PE# for a total weight of %d\n",
-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
+		dma_pe_count, total_weight);
 
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
@@ -2596,18 +2572,20 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 	 * weight
 	 */
 	remaining = phb->ioda.tce32_count;
-	tw = phb->ioda.dma_weight;
 	base = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-		if (!pe->dma_weight)
+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+		weight = pnv_pci_ioda_pe_dma_weight(pe);
+		if (!weight)
 			continue;
+
 		if (!remaining) {
 			pe_warn(pe, "No DMA32 resources available\n");
 			continue;
 		}
 		segs = 1;
 		if (residual) {
-			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
+			segs += ((weight * residual) + (total_weight / 2)) /
+				total_weight;
 			if (segs > remaining)
 				segs = remaining;
 		}
@@ -2619,7 +2597,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 		 */
 		if (phb->type == PNV_PHB_IODA1) {
 			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-				pe->dma_weight, segs);
+				weight, segs);
 			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
 		} else if (phb->type == PNV_PHB_IODA2) {
 			pe_info(pe, "Assign DMA32 space\n");
@@ -3156,13 +3134,18 @@ static void pnv_npu_ioda_fixup(void)
 	struct pci_controller *hose, *tmp;
 	struct pnv_phb *phb;
 	struct pnv_ioda_pe *pe;
+	unsigned int weight;
 
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
 		if (phb->type != PNV_PHB_NPU)
 			continue;
 
-		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
+		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+			weight = pnv_pci_ioda_pe_dma_weight(pe);
+			if (!weight)
+				continue;
+
 			enable_bypass = dma_get_mask(&pe->pdev->dev) ==
 				DMA_BIT_MASK(64);
 			pnv_npu_init_dma_pe(pe);
@@ -3443,7 +3426,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
-	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
 	mutex_init(&phb->ioda.pe_list_mutex);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 1d8e775..e90bcbe 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -53,14 +53,7 @@ struct pnv_ioda_pe {
 	/* PE number */
 	unsigned int		pe_number;
 
-	/* "Weight" assigned to the PE for the sake of DMA resource
-	 * allocations
-	 */
-	unsigned int		dma_weight;
-
 	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
-	int			tce32_seg;
-	int			tce32_segcount;
 	struct iommu_table_group table_group;
 
 	/* 64-bit TCE bypass region */
@@ -78,7 +71,6 @@ struct pnv_ioda_pe {
 	struct list_head	slaves;
 
 	/* Link in list of PE#s */
-	struct list_head	dma_link;
 	struct list_head	list;
 };
 
@@ -173,17 +165,6 @@ struct pnv_phb {
 		/* 32-bit TCE tables allocation */
 		unsigned long		tce32_count;
 
-		/* Total "weight" for the sake of DMA resources
-		 * allocation
-		 */
-		unsigned int		dma_weight;
-		unsigned int		dma_pe_count;
-
-		/* Sorted list of used PE's, sorted at
-		 * boot for resource allocation purposes
-		 */
-		struct list_head	pe_dma_list;
-
 		/* TCE cache invalidate registers (physical and
 		 * remapped)
 		 */
-- 
2.1.0
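
For readers following the weight arithmetic in the patch above, here is a
minimal standalone sketch of the two calculations it touches: the per-device
DMA weight, and the "one base segment plus a rounded share of the residual"
formula in pnv_ioda_setup_dma(). The demo_* names are hypothetical, the class
code constants are assumed to mirror the kernel's pci_ids.h, and the guard
against a zero total weight is an addition of this sketch, not part of the
patch.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's pci_ids.h / pci_regs.h values. */
#define DEMO_CLASS_USB_UHCI	0x0c0300u
#define DEMO_CLASS_USB_OHCI	0x0c0310u
#define DEMO_CLASS_USB_EHCI	0x0c0320u
#define DEMO_CLASS_STORAGE_RAID	0x0104u
#define DEMO_HDR_TYPE_NORMAL	0u

/*
 * Weight of one device: 0 for anything that is not a normal (type 0)
 * function, 3 for USB host controllers, 15 for RAID, 10 otherwise.
 */
static unsigned int demo_dev_dma_weight(uint32_t class_code, uint8_t hdr_type)
{
	if (hdr_type != DEMO_HDR_TYPE_NORMAL)
		return 0;

	if (class_code == DEMO_CLASS_USB_UHCI ||
	    class_code == DEMO_CLASS_USB_OHCI ||
	    class_code == DEMO_CLASS_USB_EHCI)
		return 3;

	if ((class_code >> 8) == DEMO_CLASS_STORAGE_RAID)
		return 15;

	return 10;
}

/*
 * Segments for one PE: one base segment, plus a share of the residual
 * segments proportional to weight / total_weight, rounded to nearest,
 * capped at what remains. The total_weight check is this sketch's own
 * guard against dividing by zero.
 */
static unsigned int demo_segs_for_pe(unsigned int weight,
				     unsigned int residual,
				     unsigned int total_weight,
				     unsigned int remaining)
{
	unsigned int segs = 1;

	if (residual && total_weight) {
		segs += (weight * residual + total_weight / 2) / total_weight;
		if (segs > remaining)
			segs = remaining;
	}

	return segs;
}
```

With a weight of 10 out of a total of 20 and 6 residual segments, this yields
1 + (60 + 10) / 20 = 4 segments, provided at least 4 remain.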

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:44     ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                       ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

In the current implementation, the DMA32 segments required by a specific
PE are not calculated from the information held in that PE alone. This
conflicts with the PE-centric PCI hotplug design, under which a PE's
DMA32 segments should be calculated from the information held in the PE
itself.

This introduces a per-PHB array (@dma32_segmap) to track DMA32 segment
usage. It also moves the logic that calculates a PE's DMA32 segments
into pnv_pci_ioda1_setup_dma_pe(), so that the segments are calculated
and allocated from the information held in the PE (its DMA32 weight).
The logic is improved as well: we try to allocate as many DMA32 segments
as we can, and it is acceptable to end up with fewer segments than the
expected number.
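
The allocation strategy described above can be sketched in isolation as
follows. This is only an illustration under stated assumptions: the demo_*
names and DEMO_INVALID_PE are hypothetical stand-ins for the kernel's
structures and IODA_INVALID_PE, the clamp of the request to the map size is
an addition of this sketch, and unlike the patch (which records ownership
only after the TCE table has been set up) the sketch claims the segments as
soon as a free run is found.

```c
#include <stddef.h>

#define DEMO_INVALID_PE	(-1)	/* stand-in for IODA_INVALID_PE */

/* Expected share: weight / total_weight of all segments, at least one. */
static unsigned int demo_expected_segs(unsigned int weight,
				       unsigned int total_weight,
				       unsigned int count)
{
	unsigned int segs;

	if (!total_weight)
		return 0;

	segs = (weight * count) / total_weight;
	return segs ? segs : 1;
}

/*
 * First-fit search for a contiguous run of free entries in
 * segmap[0..count). Starts with 'want' segments and retries with one
 * segment fewer after each failed pass. Returns the number of segments
 * claimed (0 if none) and stores the first index in *base_out.
 */
static unsigned int demo_alloc_dma32_segs(int *segmap, unsigned int count,
					  unsigned int want, int pe_number,
					  unsigned int *base_out)
{
	unsigned int segs, base, i;

	/* Keep 'count - segs' below from wrapping around. */
	if (want > count)
		want = count;

	for (segs = want; segs; segs--) {
		for (base = 0; base <= count - segs; base++) {
			for (i = base; i < base + segs; i++) {
				if (segmap[i] != DEMO_INVALID_PE)
					break;
			}

			/* The whole run [base, base + segs) was free */
			if (i == base + segs) {
				for (i = base; i < base + segs; i++)
					segmap[i] = pe_number;
				*base_out = base;
				return segs;
			}
		}
	}

	return 0;
}
```

For a map of 8 segments where entries 2 and 6 are already owned, a request
for 4 segments finds no run of 4, falls back to 3, and lands at base 3.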

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 111 +++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.h      |   7 +-
 2 files changed, 66 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0fc2309..59782fba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2007,20 +2007,54 @@ static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
 }
 
 static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
-				       struct pnv_ioda_pe *pe,
-				       unsigned int base,
-				       unsigned int segs)
+				       struct pnv_ioda_pe *pe)
 {
 
 	struct page *tce_mem = NULL;
 	struct iommu_table *tbl;
-	unsigned int tce32_segsz, i;
+	unsigned int weight, total_weight;
+	unsigned int tce32_segsz, base, segs, i;
 	int64_t rc;
 	void *addr;
 
 	/* XXX FIXME: Handle 64-bit only DMA devices */
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
+	total_weight = pnv_pci_ioda_total_dma_weight(phb);
+	weight = pnv_pci_ioda_pe_dma_weight(pe);
+
+	segs = (weight * phb->ioda.dma32_count) / total_weight;
+	if (!segs)
+		segs = 1;
+
+	/*
+	 * Allocate contiguous DMA32 segments. We begin with the expected
+	 * number of segments. With one more attempt, the number of DMA32
+	 * segments to be allocated is decreased by one until one segment
+	 * is allocated successfully.
+	 */
+	while (segs) {
+		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
+			for (i = base; i < base + segs; i++) {
+				if (phb->ioda.dma32_segmap[i] !=
+				    IODA_INVALID_PE)
+					break;
+			}
+
+			if (i >= base + segs)
+				break;
+		}
+
+		if (i >= base + segs)
+			break;
+
+		segs--;
+	}
+
+	if (!segs) {
+		pe_warn(pe, "No available DMA32 segments\n");
+		return;
+	}
 
 	tbl = pnv_pci_table_alloc(phb->hose->node);
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
@@ -2028,6 +2062,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
 
 	/* Grab a 32-bit TCE table */
+	pe_info(pe, "DMA weight %d (%d), assigned (%d) %d DMA32 segments\n",
+		weight, total_weight, base, segs);
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		base * PNV_IODA1_DMA32_SEGSIZE,
 		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
@@ -2064,6 +2100,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 		}
 	}
 
+	/* Setup DMA32 segment mapping */
+	for (i = base; i < base + segs; i++)
+		phb->ioda.dma32_segmap[i] = pe->pe_number;
+
 	/* Setup linux iommu table */
 	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
 				  base * PNV_IODA1_DMA32_SEGSIZE,
@@ -2538,70 +2578,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 {
 	struct pci_controller *hose = phb->hose;
-	unsigned int weight, total_weight, dma_pe_count;
-	unsigned int residual, remaining, segs, base;
 	struct pnv_ioda_pe *pe;
-
-	total_weight = pnv_pci_ioda_total_dma_weight(phb);
-	dma_pe_count = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-		weight = pnv_pci_ioda_pe_dma_weight(pe);
-		if (weight > 0)
-			dma_pe_count++;
-	}
+	unsigned int weight;
 
 	/* If we have more PE# than segments available, hand out one
 	 * per PE until we run out and let the rest fail. If not,
 	 * then we assign at least one segment per PE, plus more based
 	 * on the amount of devices under that PE
 	 */
-	if (dma_pe_count > phb->ioda.tce32_count)
-		residual = 0;
-	else
-		residual = phb->ioda.tce32_count - dma_pe_count;
-
 	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.tce32_count);
-	pr_info("PCI: %d PE# for a total weight of %d\n",
-		dma_pe_count, total_weight);
+		hose->global_number, phb->ioda.dma32_count);
 
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
-	/* Walk our PE list and configure their DMA segments, hand them
-	 * out one base segment plus any residual segments based on
-	 * weight
-	 */
-	remaining = phb->ioda.tce32_count;
-	base = 0;
+	/* Walk our PE list and configure their DMA segments */
 	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
 		weight = pnv_pci_ioda_pe_dma_weight(pe);
 		if (!weight)
 			continue;
 
-		if (!remaining) {
-			pe_warn(pe, "No DMA32 resources available\n");
-			continue;
-		}
-		segs = 1;
-		if (residual) {
-			segs += ((weight * residual) + (total_weight / 2)) /
-				total_weight;
-			if (segs > remaining)
-				segs = remaining;
-		}
-
 		/*
 		 * For IODA2 compliant PHB3, we needn't care about the weight.
 		 * The all available 32-bits DMA space will be assigned to
 		 * the specific PE.
 		 */
 		if (phb->type == PNV_PHB_IODA1) {
-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-				weight, segs);
-			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
+			pnv_pci_ioda1_setup_dma_pe(phb, pe);
 		} else if (phb->type == PNV_PHB_IODA2) {
 			pe_info(pe, "Assign DMA32 space\n");
-			segs = 0;
 			pnv_pci_ioda2_setup_dma_pe(phb, pe);
 		} else if (phb->type == PNV_PHB_NPU) {
 			/*
@@ -2611,9 +2615,6 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 			 * as the PHB3 TVT.
 			 */
 		}
-
-		remaining -= segs;
-		base += segs;
 	}
 }
 
@@ -3313,7 +3314,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, m64map_off, m32map_off, pemap_off;
+	unsigned long iomap_off = 0, dma32map_off = 0;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int i, len;
@@ -3398,6 +3400,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
 	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
 
+	/* Calculate how many 32-bit TCE segments we have */
+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
+				PNV_IODA1_DMA32_SEGSIZE;
+
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	m64map_off = size;
@@ -3407,6 +3413,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (phb->type == PNV_PHB_IODA1) {
 		iomap_off = size;
 		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
+		dma32map_off = size;
+		size += phb->ioda.dma32_count *
+			sizeof(phb->ioda.dma32_segmap[0]);
 	}
 	pemap_off = size;
 	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
@@ -3422,6 +3431,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		phb->ioda.io_segmap = aux + iomap_off;
 		for (i = 0; i < phb->ioda.total_pe_num; i++)
 			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+
+		phb->ioda.dma32_segmap = aux + dma32map_off;
+		for (i = 0; i < phb->ioda.dma32_count; i++)
+			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
 	}
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
@@ -3430,7 +3443,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	mutex_init(&phb->ioda.pe_list_mutex);
 
 	/* Calculate how many 32-bit TCE segments we have */
-	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
 				PNV_IODA1_DMA32_SEGSIZE;
 
 #if 0 /* We should really do that ... */
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index e90bcbe..350e630 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,6 +146,10 @@ struct pnv_phb {
 		int			*m32_segmap;
 		int			*io_segmap;
 
+		/* DMA32 segment maps - IODA1 only */
+		unsigned long		dma32_count;
+		int			*dma32_segmap;
+
 		/* IRQ chip */
 		int			irq_chip_init;
 		struct irq_chip		irq_chip;
@@ -162,9 +166,6 @@ struct pnv_phb {
 		 */
 		unsigned char		pe_rmap[0x10000];
 
-		/* 32-bit TCE tables allocation */
-		unsigned long		tce32_count;
-
 		/* TCE cache invalidate registers (physical and
 		 * remapped)
 		 */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track
@ 2016-02-17  3:44     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

In the current implementation, the DMA32 segments required by a specific
PE are not calculated from the information held in that PE alone. This
conflicts with the PE-centric PCI hotplug design, under which a PE's
DMA32 segments should be calculated from the information held in the PE
itself.

This introduces a per-PHB array (@dma32_segmap) to track DMA32 segment
usage. It also moves the logic that calculates a PE's DMA32 segments
into pnv_pci_ioda1_setup_dma_pe(), so that the segments are calculated
and allocated from the information held in the PE (its DMA32 weight).
The logic is improved as well: we try to allocate as many DMA32 segments
as we can, and it is acceptable to end up with fewer segments than the
expected number.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 111 +++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.h      |   7 +-
 2 files changed, 66 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0fc2309..59782fba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2007,20 +2007,54 @@ static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
 }
 
 static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
-				       struct pnv_ioda_pe *pe,
-				       unsigned int base,
-				       unsigned int segs)
+				       struct pnv_ioda_pe *pe)
 {
 
 	struct page *tce_mem = NULL;
 	struct iommu_table *tbl;
-	unsigned int tce32_segsz, i;
+	unsigned int weight, total_weight;
+	unsigned int tce32_segsz, base, segs, i;
 	int64_t rc;
 	void *addr;
 
 	/* XXX FIXME: Handle 64-bit only DMA devices */
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
+	total_weight = pnv_pci_ioda_total_dma_weight(phb);
+	weight = pnv_pci_ioda_pe_dma_weight(pe);
+
+	segs = (weight * phb->ioda.dma32_count) / total_weight;
+	if (!segs)
+		segs = 1;
+
+	/*
+	 * Allocate contiguous DMA32 segments. We begin with the expected
+	 * number of segments. With one more attempt, the number of DMA32
+	 * segments to be allocated is decreased by one until one segment
+	 * is allocated successfully.
+	 */
+	while (segs) {
+		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
+			for (i = base; i < base + segs; i++) {
+				if (phb->ioda.dma32_segmap[i] !=
+				    IODA_INVALID_PE)
+					break;
+			}
+
+			if (i >= base + segs)
+				break;
+		}
+
+		if (i >= base + segs)
+			break;
+
+		segs--;
+	}
+
+	if (!segs) {
+		pe_warn(pe, "No available DMA32 segments\n");
+		return;
+	}
 
 	tbl = pnv_pci_table_alloc(phb->hose->node);
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
@@ -2028,6 +2062,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
 
 	/* Grab a 32-bit TCE table */
+	pe_info(pe, "DMA weight %d (%d), assigned (%d) %d DMA32 segments\n",
+		weight, total_weight, base, segs);
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		base * PNV_IODA1_DMA32_SEGSIZE,
 		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
@@ -2064,6 +2100,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 		}
 	}
 
+	/* Setup DMA32 segment mapping */
+	for (i = base; i < base + segs; i++)
+		phb->ioda.dma32_segmap[i] = pe->pe_number;
+
 	/* Setup linux iommu table */
 	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
 				  base * PNV_IODA1_DMA32_SEGSIZE,
@@ -2538,70 +2578,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 {
 	struct pci_controller *hose = phb->hose;
-	unsigned int weight, total_weight, dma_pe_count;
-	unsigned int residual, remaining, segs, base;
 	struct pnv_ioda_pe *pe;
-
-	total_weight = pnv_pci_ioda_total_dma_weight(phb);
-	dma_pe_count = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-		weight = pnv_pci_ioda_pe_dma_weight(pe);
-		if (weight > 0)
-			dma_pe_count++;
-	}
+	unsigned int weight;
 
 	/* If we have more PE# than segments available, hand out one
 	 * per PE until we run out and let the rest fail. If not,
 	 * then we assign at least one segment per PE, plus more based
 	 * on the amount of devices under that PE
 	 */
-	if (dma_pe_count > phb->ioda.tce32_count)
-		residual = 0;
-	else
-		residual = phb->ioda.tce32_count - dma_pe_count;
-
 	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.tce32_count);
-	pr_info("PCI: %d PE# for a total weight of %d\n",
-		dma_pe_count, total_weight);
+		hose->global_number, phb->ioda.dma32_count);
 
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
-	/* Walk our PE list and configure their DMA segments, hand them
-	 * out one base segment plus any residual segments based on
-	 * weight
-	 */
-	remaining = phb->ioda.tce32_count;
-	base = 0;
+	/* Walk our PE list and configure their DMA segments */
 	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
 		weight = pnv_pci_ioda_pe_dma_weight(pe);
 		if (!weight)
 			continue;
 
-		if (!remaining) {
-			pe_warn(pe, "No DMA32 resources available\n");
-			continue;
-		}
-		segs = 1;
-		if (residual) {
-			segs += ((weight * residual) + (total_weight / 2)) /
-				total_weight;
-			if (segs > remaining)
-				segs = remaining;
-		}
-
 		/*
 		 * For IODA2 compliant PHB3, we needn't care about the weight.
 		 * The all available 32-bits DMA space will be assigned to
 		 * the specific PE.
 		 */
 		if (phb->type == PNV_PHB_IODA1) {
-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-				weight, segs);
-			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
+			pnv_pci_ioda1_setup_dma_pe(phb, pe);
 		} else if (phb->type == PNV_PHB_IODA2) {
 			pe_info(pe, "Assign DMA32 space\n");
-			segs = 0;
 			pnv_pci_ioda2_setup_dma_pe(phb, pe);
 		} else if (phb->type == PNV_PHB_NPU) {
 			/*
@@ -2611,9 +2615,6 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 			 * as the PHB3 TVT.
 			 */
 		}
-
-		remaining -= segs;
-		base += segs;
 	}
 }
 
@@ -3313,7 +3314,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, m64map_off, m32map_off, pemap_off;
+	unsigned long iomap_off = 0, dma32map_off = 0;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int i, len;
@@ -3398,6 +3400,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
 	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
 
+	/* Calculate how many 32-bit TCE segments we have */
+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
+				PNV_IODA1_DMA32_SEGSIZE;
+
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	m64map_off = size;
@@ -3407,6 +3413,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (phb->type == PNV_PHB_IODA1) {
 		iomap_off = size;
 		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
+		dma32map_off = size;
+		size += phb->ioda.dma32_count *
+			sizeof(phb->ioda.dma32_segmap[0]);
 	}
 	pemap_off = size;
 	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
@@ -3422,6 +3431,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		phb->ioda.io_segmap = aux + iomap_off;
 		for (i = 0; i < phb->ioda.total_pe_num; i++)
 			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+
+		phb->ioda.dma32_segmap = aux + dma32map_off;
+		for (i = 0; i < phb->ioda.dma32_count; i++)
+			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
 	}
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
@@ -3430,7 +3443,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	mutex_init(&phb->ioda.pe_list_mutex);
 
 	/* Calculate how many 32-bit TCE segments we have */
-	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
 				PNV_IODA1_DMA32_SEGSIZE;
 
 #if 0 /* We should really do that ... */
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index e90bcbe..350e630 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,6 +146,10 @@ struct pnv_phb {
 		int			*m32_segmap;
 		int			*io_segmap;
 
+		/* DMA32 segment maps - IODA1 only */
+		unsigned long		dma32_count;
+		int			*dma32_segmap;
+
 		/* IRQ chip */
 		int			irq_chip_init;
 		struct irq_chip		irq_chip;
@@ -162,9 +166,6 @@ struct pnv_phb {
 		 */
 		unsigned char		pe_rmap[0x10000];
 
-		/* 32-bit TCE tables allocation */
-		unsigned long		tce32_count;
-
 		/* TCE cache invalidate registers (physical and
 		 * remapped)
 		 */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (12 preceding siblings ...)
  2016-02-17  3:43 ` [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  2:02   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 19/45] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
                   ` (25 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

Each PHB maintains an array that translates a 2-byte Request ID (RID)
to a PE#, on the assumption that the PE# fits in one byte, meaning
that we can't have more than 256 PEs. However, pci_dn->pe_number
already uses 4 bytes for the PE#.

This extends the PE# capacity for every PHB: each PE number in the
array is now represented by a 4-byte value. That lets us reuse
IODA_INVALID_PE to check whether the PE# in phb->ioda.pe_rmap[] is
valid.
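
As an illustration of why the wider entry type matters, the sketch below
models the reverse map as an array of int with an explicit invalid marker.
The demo_* names are hypothetical, and DEMO_INVALID_PE being -1 is an
assumption about the value of IODA_INVALID_PE. With a one-byte entry cleared
to 0, an unmapped RID cannot be told apart from a RID that belongs to PE#0;
with int entries and a sentinel, it can.

```c
#include <stddef.h>

#define DEMO_INVALID_PE	(-1)		/* assumed value of IODA_INVALID_PE */
#define DEMO_RID_COUNT	0x10000		/* one entry per 16-bit RID */

static int demo_pe_rmap[DEMO_RID_COUNT];

/* RID layout: bus number in the high byte, devfn in the low byte. */
static unsigned int demo_rid(unsigned int bus, unsigned int devfn)
{
	return ((bus & 0xff) << 8) | (devfn & 0xff);
}

/* Mark every RID as unmapped, as the patch does at PHB init time. */
static void demo_rmap_init(void)
{
	size_t i;

	for (i = 0; i < DEMO_RID_COUNT; i++)
		demo_pe_rmap[i] = DEMO_INVALID_PE;
}

static void demo_rmap_set(unsigned int rid, int pe_number)
{
	demo_pe_rmap[rid & 0xffff] = pe_number;
}

/* Returns the owning PE#, or DEMO_INVALID_PE if the RID is unmapped. */
static int demo_rmap_lookup(unsigned int rid)
{
	return demo_pe_rmap[rid & 0xffff];
}

/* Unmap the RIDs in [start, end), as done when a PE is deconfigured. */
static void demo_rmap_clear_range(unsigned int start, unsigned int end)
{
	unsigned int rid;

	for (rid = start; rid < end && rid < DEMO_RID_COUNT; rid++)
		demo_pe_rmap[rid] = DEMO_INVALID_PE;
}
```

A lookup that returns 0 now unambiguously means PE#0, and PE numbers above
255 fit without truncation.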

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++-
 arch/powerpc/platforms/powernv/pci.h      | 7 ++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 59782fba..7800897 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -757,7 +757,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	/* Clear the reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = 0;
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
 
 	/* Release from all parents PELT-V */
 	while (parent) {
@@ -3387,6 +3387,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (prop32)
 		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
 
+	/* Invalidate RID to PE# mapping */
+	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
+		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
+
 	/* Parse 64-bit MMIO range */
 	pnv_ioda_parse_m64_window(phb);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 350e630..928cf81 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -160,11 +160,8 @@ struct pnv_phb {
 		struct list_head	pe_list;
 		struct mutex            pe_list_mutex;
 
-		/* Reverse map of PEs, will have to extend if
-		 * we are to support more than 256 PEs, indexed
-		 * bus { bus, devfn }
-		 */
-		unsigned char		pe_rmap[0x10000];
+		/* Reverse map of PEs, indexed by {bus, devfn} */
+		int			pe_rmap[0x10000];
 
 		/* TCE cache invalidate registers (physical and
 		 * remapped)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 19/45] powerpc/powernv: Use PE instead of number during setup and release
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (13 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  2:50   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order Gavin Shan
                   ` (24 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

In the current implementation, the PEs that are allocated or picked
from the reserved list are identified by PE number, and the PE instance
eventually has to be looked up from that number. The same issue exists
when a PE is released.

This makes pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe() return the
PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
or reserved PE instance directly. pnv_ioda_setup_bus_PE() also
returns the reserved/allocated PE instance, to be used in subsequent
patches. Likewise, pnv_ioda_free_pe() now takes the PE instance
(not the number) as its argument. No logical changes introduced.
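
The interface change can be pictured with the simplified model below. It is
a sketch under assumptions, not the kernel code: the demo_* types are
hypothetical, and a plain bool array stands in for the pe_alloc bitmap and
its atomic bit operations. One detail worth noting when freeing by
instance: the PE number has to be read before the structure is zeroed. As
the hunk below is written, clear_bit() reads pe->pe_number after the
memset(), so the sketch saves the number in a local first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_TOTAL_PE	4

struct demo_phb;

struct demo_pe {
	struct demo_phb *phb;
	unsigned int pe_number;
	void *pdev;
};

struct demo_phb {
	bool pe_alloc[DEMO_TOTAL_PE];	/* stand-in for the pe_alloc bitmap */
	struct demo_pe pe_array[DEMO_TOTAL_PE];
};

/* Fill in the back-pointer and number, then hand out the instance. */
static struct demo_pe *demo_init_pe(struct demo_phb *phb, unsigned int pe_no)
{
	phb->pe_array[pe_no].phb = phb;
	phb->pe_array[pe_no].pe_number = pe_no;

	return &phb->pe_array[pe_no];
}

/* Returns the first free PE instance, or NULL when all are in use. */
static struct demo_pe *demo_alloc_pe(struct demo_phb *phb)
{
	unsigned int i;

	for (i = 0; i < DEMO_TOTAL_PE; i++) {
		if (!phb->pe_alloc[i]) {
			phb->pe_alloc[i] = true;
			return demo_init_pe(phb, i);
		}
	}

	return NULL;
}

/* Release by instance: save what we need before wiping the structure. */
static void demo_free_pe(struct demo_pe *pe)
{
	struct demo_phb *phb = pe->phb;
	unsigned int pe_no = pe->pe_number;

	memset(pe, 0, sizeof(*pe));
	phb->pe_alloc[pe_no] = false;
}
```

Callers then test the returned pointer against NULL instead of comparing a
number with IODA_INVALID_PE.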

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 104 +++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.h      |   2 +-
 2 files changed, 59 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7800897..f182ca7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -119,6 +119,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
+{
+	phb->ioda.pe_array[pe_no].phb = phb;
+	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+
+	return &phb->ioda.pe_array[pe_no];
+}
+
 static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
 	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
@@ -131,11 +139,10 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 		pr_debug("%s: PE %d was reserved on PHB#%x\n",
 			 __func__, pe_no, phb->hose->global_number);
 
-	phb->ioda.pe_array[pe_no].phb = phb;
-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+	pnv_ioda_init_pe(phb, pe_no);
 }
 
-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe;
 
@@ -143,20 +150,20 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
 					phb->ioda.total_pe_num, 0);
 		if (pe >= phb->ioda.total_pe_num)
-			return IODA_INVALID_PE;
+			return NULL;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
-	phb->ioda.pe_array[pe].phb = phb;
-	phb->ioda.pe_array[pe].pe_number = pe;
-	return pe;
+	return pnv_ioda_init_pe(phb, pe);
 }
 
-static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
 {
-	WARN_ON(phb->ioda.pe_array[pe].pdev);
+	struct pnv_phb *phb = pe->phb;
 
-	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
-	clear_bit(pe, phb->ioda.pe_alloc);
+	WARN_ON(pe->pdev);
+
+	memset(pe, 0, sizeof(struct pnv_ioda_pe));
+	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
 }
 
 /* The default M64 BAR is shared by all PEs */
@@ -316,7 +323,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
 	}
 }
 
-static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -326,7 +333,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 
 	/* Root bus shouldn't use M64 */
 	if (pci_is_root_bus(bus))
-		return IODA_INVALID_PE;
+		return NULL;
 
 	/* Allocate bitmap */
 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
@@ -334,7 +341,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	if (!pe_alloc) {
 		pr_warn("%s: Out of memory !\n",
 			__func__);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/* Figure out reserved PE numbers by the PE */
@@ -347,7 +354,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	 */
 	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
 		kfree(pe_alloc);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/*
@@ -393,7 +400,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	}
 
 	kfree(pe_alloc);
-	return master_pe->pe_number;
+	return master_pe;
 }
 
 static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
@@ -959,7 +966,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	struct pnv_phb *phb = hose->private_data;
 	struct pci_dn *pdn = pci_get_pdn(dev);
 	struct pnv_ioda_pe *pe;
-	int pe_num;
 
 	if (!pdn) {
 		pr_err("%s: Device tree node not associated properly\n",
@@ -969,8 +975,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	if (pdn->pe_number != IODA_INVALID_PE)
 		return NULL;
 
-	pe_num = pnv_ioda_alloc_pe(phb);
-	if (pe_num == IODA_INVALID_PE) {
+	pe = pnv_ioda_alloc_pe(phb);
+	if (!pe) {
 		pr_warning("%s: Not enough PE# available, disabling device\n",
 			   pci_name(dev));
 		return NULL;
@@ -983,10 +989,9 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	 *
 	 * At some point we want to remove the PDN completely anyways
 	 */
-	pe = &phb->ioda.pe_array[pe_num];
 	pci_dev_get(dev);
 	pdn->pcidev = dev;
-	pdn->pe_number = pe_num;
+	pdn->pe_number = pe->pe_number;
 	pe->flags = PNV_IODA_PE_DEV;
 	pe->pdev = dev;
 	pe->pbus = NULL;
@@ -997,8 +1002,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
+		pnv_ioda_free_pe(pe);
 		pdn->pe_number = IODA_INVALID_PE;
 		pe->pdev = NULL;
 		pci_dev_put(dev);
@@ -1033,28 +1037,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
  * subordinate PCI devices and buses. The second type of PE is normally
  * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
  */
-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct pnv_ioda_pe *pe;
-	int pe_num = IODA_INVALID_PE;
+	struct pnv_ioda_pe *pe = NULL;
 
 	/* Check if PE is determined by M64 */
 	if (phb->pick_m64_pe)
-		pe_num = phb->pick_m64_pe(bus, all);
+		pe = phb->pick_m64_pe(bus, all);
 
 	/* The PE number isn't pinned by M64 */
-	if (pe_num == IODA_INVALID_PE)
-		pe_num = pnv_ioda_alloc_pe(phb);
+	if (!pe)
+		pe = pnv_ioda_alloc_pe(phb);
 
-	if (pe_num == IODA_INVALID_PE) {
+	if (!pe) {
 		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
 			__func__, pci_domain_nr(bus), bus->number);
-		return;
+		return NULL;
 	}
 
-	pe = &phb->ioda.pe_array[pe_num];
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
@@ -1063,17 +1065,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	if (all)
 		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
-			bus->busn_res.start, bus->busn_res.end, pe_num);
+			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
 	else
 		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
-			bus->busn_res.start, pe_num);
+			bus->busn_res.start, pe->pe_number);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
+		pnv_ioda_free_pe(pe);
 		pe->pbus = NULL;
-		return;
+		return NULL;
 	}
 
 	/* Associate it with all child devices */
@@ -1081,6 +1082,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
+
+	return pe;
 }
 
 static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
@@ -1392,7 +1395,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 
 		pnv_ioda_deconfigure_pe(phb, pe);
 
-		pnv_ioda_free_pe(phb, pe->pe_number);
+		pnv_ioda_free_pe(pe);
 	}
 }
 
@@ -1401,6 +1404,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
+	struct pnv_ioda_pe    *pe;
 	struct pci_dn         *pdn;
 	struct pci_sriov      *iov;
 	u16                    num_vfs, i;
@@ -1425,8 +1429,11 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 		/* Release PE numbers */
 		if (pdn->m64_single_mode) {
 			for (i = 0; i < num_vfs; i++) {
-				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
-					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+				if (pdn->pe_num_map[i] == IODA_INVALID_PE)
+					continue;
+
+				pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
+				pnv_ioda_free_pe(pe);
 			}
 		} else
 			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
@@ -1479,8 +1486,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 		if (pnv_ioda_configure_pe(phb, pe)) {
 			/* XXX What do we do here ? */
-			if (pe_num)
-				pnv_ioda_free_pe(phb, pe_num);
+			pnv_ioda_free_pe(pe);
 			pe->pdev = NULL;
 			continue;
 		}
@@ -1499,6 +1505,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
+	struct pnv_ioda_pe    *pe;
 	struct pci_dn         *pdn;
 	int                    ret;
 	u16                    i;
@@ -1541,11 +1548,13 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		/* Calculate available PE for required VFs */
 		if (pdn->m64_single_mode) {
 			for (i = 0; i < num_vfs; i++) {
-				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
-				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
+				pe = pnv_ioda_alloc_pe(phb);
+				if (!pe) {
 					ret = -EBUSY;
 					goto m64_failed;
 				}
+
+				pdn->pe_num_map[i] = pe->pe_number;
 			}
 		} else {
 			mutex_lock(&phb->ioda.pe_alloc_mutex);
@@ -1590,8 +1599,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 m64_failed:
 	if (pdn->m64_single_mode) {
 		for (i = 0; i < num_vfs; i++) {
-			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
-				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+			if (pdn->pe_num_map[i] == IODA_INVALID_PE)
+				continue;
+
+			pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
+			pnv_ioda_free_pe(pe);
 		}
 	} else
 		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 928cf81..ef9924a 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -109,7 +109,7 @@ struct pnv_phb {
 	int (*init_m64)(struct pnv_phb *phb);
 	void (*reserve_m64_pe)(struct pci_bus *bus,
 			       unsigned long *pe_bitmap, bool all);
-	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
+	struct pnv_ioda_pe *(*pick_m64_pe)(struct pci_bus *bus, bool all);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
 	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (14 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 19/45] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  3:07   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
                   ` (23 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

The PE number for a particular PE is either allocated dynamically
or reserved according to the M64 (64-bit prefetchable) segments
consumed by the PE. The M64 resources, and hence their segments
and PE numbers, are assigned/reserved in ascending order. PE
numbers are currently allocated dynamically in ascending order as
well. That isn't a problem as long as the PE numbers are reserved
and then allocated all at once, in a well-defined order. However,
it will introduce conflicts once PCI hotplug is supported: the PE
number to be reserved for a newly added PE might already have been
allocated dynamically.

To resolve these conflicts, this forces PE numbers to be allocated
dynamically in reverse order. With this patch applied, PE numbers
are reserved in ascending order, but allocated dynamically in
descending order.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
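The effect of the two directions can be modelled in user space. This
is only a sketch under stated assumptions: struct demo_pe_map,
DEMO_TOTAL_PE, demo_reserve_pe() and demo_alloc_pe_reverse() are
hypothetical names, and a bool array stands in for the pe_alloc
bitmap. Reservations claim specific low numbers, while dynamic
allocation scans down from the top, so a later reservation of a low
number is less likely to find it already taken. The sketch uses a
signed loop index so that the "pe >= 0" condition can become false.

```c
#include <stdbool.h>

#define DEMO_TOTAL_PE 8	/* hypothetical PE capacity for this model */

struct demo_pe_map {
	bool used[DEMO_TOTAL_PE];	/* stands in for the bitmap */
};

/* Reserve one specific PE number; fails if out of range or taken. */
static bool demo_reserve_pe(struct demo_pe_map *m, int pe_no)
{
	if (pe_no < 0 || pe_no >= DEMO_TOTAL_PE || m->used[pe_no])
		return false;

	m->used[pe_no] = true;
	return true;
}

/* Allocate dynamically from the top down; -1 when exhausted. */
static int demo_alloc_pe_reverse(struct demo_pe_map *m)
{
	int pe;	/* signed, so the loop ends after index 0 */

	for (pe = DEMO_TOTAL_PE - 1; pe >= 0; pe--) {
		if (!m->used[pe]) {
			m->used[pe] = true;
			return pe;
		}
	}

	return -1;
}
```

Starting from an empty map, reserving PE#0 and PE#1 and then
allocating twice yields PE#7 and PE#6, and a later reservation of
PE#2 still succeeds.
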
 arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f182ca7..565725b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -144,16 +144,14 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 
 static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
-	unsigned long pe;
+	long pe;
 
-	do {
-		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe_num, 0);
-		if (pe >= phb->ioda.total_pe_num)
-			return NULL;
-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
+	for (pe = phb->ioda.total_pe_num - 1; pe >= 0; pe--) {
+		if (!test_and_set_bit(pe, phb->ioda.pe_alloc))
+			return pnv_ioda_init_pe(phb, pe);
+	}
 
-	return pnv_ioda_init_pe(phb, pe);
+	return NULL;
 }
 
 static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (15 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  4:16   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table Gavin Shan
                   ` (22 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

Currently, the PEs and their associated resources are assigned
in ppc_md.pcibios_fixup(), except those used by SRIOV VFs. The
function is called only once, after PCI probing and resource
assignment are completed, so it isn't hotplug friendly.

This creates PEs dynamically in ppc_md.pcibios_setup_bridge(), which
is called both during system bootup and on PCI hotplug, when a PCI
bridge's windows are updated after resource assignment/reassignment
is done. For the partial hotplug case, where not all PCI devices
belonging to the PE are unplugged and plugged again, we just need to
unbind/bind the affected PCI devices from/to the corresponding PE
without creating a new one.

As there is no upstream bridge for the root bus that needs to be
covered by a PE, we have to create the PE for the root bus in
ppc_md.pcibios_setup_bridge() before any other PE can be created,
as the PE for the root bus is the ancestor of all other PEs.

Also, the windows of the root port, or of the upstream port of the
PCIe switch behind the root port, are extended to the PHB's apertures
to accommodate the additional resources needed by newly plugged
devices. This relies on the fact that a hotpluggable slot is behind
the root port or a downstream port of the PCIe switch behind the
root port. The extension of those PCI bridges' windows is done in
ppc_md.pcibios_setup_bridge() as well.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
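Two small rules in this patch are easy to check in isolation: the
root bus PE number is chosen next to the reserved PE, and the M64
window is shortened by two segments (reserved PE plus root bus PE)
at the matching end. The following is a user-space sketch only:
DEMO_INVALID_PE, struct demo_window, demo_pick_root_pe() and
demo_trim_m64() are hypothetical names, and the total_pe_num < 2
guard is an addition of the sketch, not something the patch does.

```c
#include <stdbool.h>

#define DEMO_INVALID_PE (-1)	/* stand-in for IODA_INVALID_PE */

struct demo_window {
	unsigned long long start;	/* first address of the window */
	unsigned long long end;		/* last address of the window */
};

/*
 * Pick the root bus PE number adjacent to the reserved PE, which is
 * expected to be the first or the last PE number.
 */
static int demo_pick_root_pe(unsigned int total_pe_num,
			     unsigned int reserved_pe_idx)
{
	/* Sketch-only guard: there must be room for two PEs */
	if (total_pe_num < 2)
		return DEMO_INVALID_PE;

	if (reserved_pe_idx == 0)
		return 1;
	if (reserved_pe_idx == total_pe_num - 1)
		return (int)reserved_pe_idx - 1;

	return DEMO_INVALID_PE;
}

/*
 * Exclude the two segments used by the reserved and root bus PEs
 * from the M64 window. Returns false when the reserved PE is
 * neither first nor last, in which case nothing is changed.
 */
static bool demo_trim_m64(struct demo_window *w,
			  unsigned long long segsize,
			  unsigned int total_pe_num,
			  unsigned int reserved_pe_idx)
{
	if (reserved_pe_idx == 0)
		w->start += 2 * segsize;
	else if (reserved_pe_idx == total_pe_num - 1)
		w->end -= 2 * segsize;
	else
		return false;

	return true;
}
```

For example, with 256 PEs and the reserved PE at index 0, the sketch
picks PE#1 for the root bus and moves the window start up by two
segments; with the reserved PE at index 255, it picks PE#254 and
pulls the window end down instead.
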
 arch/powerpc/platforms/powernv/pci-ioda.c | 294 +++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.h      |   2 +
 2 files changed, 168 insertions(+), 128 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 565725b..d360607 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -197,14 +197,14 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 	set_bit(phb->ioda.m64_bar_idx, &phb->ioda.m64_bar_alloc);
 
 	/*
-	 * Strip off the segment used by the reserved PE, which is
-	 * expected to be 0 or last one of PE capabicity.
+	 * Exclude the segments for reserved and root bus PE, which
+	 * are first or last two PEs.
 	 */
 	r = &phb->hose->mem_resources[1];
 	if (phb->ioda.reserved_pe_idx == 0)
-		r->start += phb->ioda.m64_segsize;
+		r->start += (2 * phb->ioda.m64_segsize);
 	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
-		r->end -= phb->ioda.m64_segsize;
+		r->end -= (2 * phb->ioda.m64_segsize);
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe_idx);
@@ -284,14 +284,14 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 	}
 
 	/*
-	 * Exclude the segment used by the reserved PE, which
-	 * is expected to be 0 or last supported PE#.
+	 * Exclude the segments for reserved and root bus PE, which
+	 * are first or last two PEs.
 	 */
 	r = &phb->hose->mem_resources[1];
 	if (phb->ioda.reserved_pe_idx == 0)
-		r->start += phb->ioda.m64_segsize;
+		r->start += (2 * phb->ioda.m64_segsize);
 	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
-		r->end -= phb->ioda.m64_segsize;
+		r->end -= (2 * phb->ioda.m64_segsize);
 	else
 		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe_idx);
@@ -1022,6 +1022,15 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 				pci_name(dev));
 			continue;
 		}
+
+		/*
+		 * In partial hotplug case, the PCI device might be still
+		 * associated with the PE and needn't be attached to the
+		 * PE again.
+		 */
+		if (pdn->pe_number != IODA_INVALID_PE)
+			continue;
+
 		pdn->pcidev = dev;
 		pdn->pe_number = pe->pe_number;
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1040,9 +1049,26 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
 	struct pnv_ioda_pe *pe = NULL;
+	int pe_num;
+
+	/*
+	 * In partial hotplug case, the PE instance might be still alive.
+	 * We should reuse it instead of allocating a new one.
+	 */
+	pe_num = phb->ioda.pe_rmap[bus->number << 8];
+	if (pe_num != IODA_INVALID_PE) {
+		pe = &phb->ioda.pe_array[pe_num];
+		pnv_ioda_setup_same_PE(bus, pe);
+		return NULL;
+	}
+
+	/* PE number for root bus should have been reserved */
+	if (pci_is_root_bus(bus) &&
+	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
+		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
 
 	/* Check if PE is determined by M64 */
-	if (phb->pick_m64_pe)
+	if (!pe && phb->pick_m64_pe)
 		pe = phb->pick_m64_pe(bus, all);
 
 	/* The PE number isn't pinned by M64 */
@@ -1154,30 +1180,6 @@ static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
 		pnv_ioda_setup_npu_PE(pdev);
 }
 
-static void pnv_ioda_setup_PEs(struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	pnv_ioda_setup_bus_PE(bus, false);
-
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		if (dev->subordinate) {
-			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
-				pnv_ioda_setup_bus_PE(dev->subordinate, true);
-			else
-				pnv_ioda_setup_PEs(dev->subordinate);
-		}
-	}
-}
-
-/*
- * Configure PEs so that the downstream PCI buses and devices
- * could have their associated PE#. Unfortunately, we didn't
- * figure out the way to identify the PLX bridge yet. So we
- * simply put the PCI bus and the subordinate behind the root
- * port to PE# here. The game rule here is expected to be changed
- * as soon as we can detected PLX bridge correctly.
- */
 static void pnv_pci_ioda_setup_PEs(void)
 {
 	struct pci_controller *hose, *tmp;
@@ -1185,22 +1187,12 @@ static void pnv_pci_ioda_setup_PEs(void)
 
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
+		if (phb->type != PNV_PHB_NPU)
+			continue;
 
-		/* M64 layout might affect PE allocation */
-		if (phb->reserve_m64_pe)
-			phb->reserve_m64_pe(hose->bus, NULL, true);
-
-		/*
-		 * On NPU PHB, we expect separate PEs for individual PCI
-		 * functions. PCI bus dependent PEs are required for the
-		 * remaining types of PHBs.
-		 */
-		if (phb->type == PNV_PHB_NPU) {
-			/* PE#0 is needed for error reporting */
-			pnv_ioda_reserve_pe(phb, 0);
-			pnv_ioda_setup_npu_PEs(hose->bus);
-		} else
-			pnv_ioda_setup_PEs(hose->bus);
+		/* PE#0 is needed for error reporting */
+		pnv_ioda_reserve_pe(phb, 0);
+		pnv_ioda_setup_npu_PEs(hose->bus);
 	}
 }
 
@@ -2552,8 +2544,13 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 				       struct pnv_ioda_pe *pe)
 {
+	unsigned int weight;
 	int64_t rc;
 
+	weight = pnv_pci_ioda_pe_dma_weight(pe);
+	if (!weight)
+		return;
+
 	/* TVE #1 is selected by PCI address bit 59 */
 	pe->tce_bypass_base = 1ull << 59;
 
@@ -2585,49 +2582,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 }
 
-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
-{
-	struct pci_controller *hose = phb->hose;
-	struct pnv_ioda_pe *pe;
-	unsigned int weight;
-
-	/* If we have more PE# than segments available, hand out one
-	 * per PE until we run out and let the rest fail. If not,
-	 * then we assign at least one segment per PE, plus more based
-	 * on the amount of devices under that PE
-	 */
-	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.dma32_count);
-
-	pnv_pci_ioda_setup_opal_tce_kill(phb);
-
-	/* Walk our PE list and configure their DMA segments */
-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-		weight = pnv_pci_ioda_pe_dma_weight(pe);
-		if (!weight)
-			continue;
-
-		/*
-		 * For IODA2 compliant PHB3, we needn't care about the weight.
-		 * The all available 32-bits DMA space will be assigned to
-		 * the specific PE.
-		 */
-		if (phb->type == PNV_PHB_IODA1) {
-			pnv_pci_ioda1_setup_dma_pe(phb, pe);
-		} else if (phb->type == PNV_PHB_IODA2) {
-			pe_info(pe, "Assign DMA32 space\n");
-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
-		} else if (phb->type == PNV_PHB_NPU) {
-			/*
-			 * We initialise the DMA space for an NPU PHB
-			 * after setup of the PHB is complete as we
-			 * point the NPU TVT to the the same location
-			 * as the PHB3 TVT.
-			 */
-		}
-	}
-}
-
 #ifdef CONFIG_PCI_MSI
 static void pnv_ioda2_msi_eoi(struct irq_data *d)
 {
@@ -3087,39 +3041,6 @@ static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
 	}
 }
 
-static void pnv_pci_ioda_setup_seg(void)
-{
-	struct pci_controller *tmp, *hose;
-	struct pnv_phb *phb;
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-
-		/* NPU PHB does not support IO or MMIO segmentation */
-		if (phb->type == PNV_PHB_NPU)
-			continue;
-
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			pnv_ioda_setup_pe_seg(pe);
-		}
-	}
-}
-
-static void pnv_pci_ioda_setup_DMA(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		pnv_ioda_setup_dma(hose->private_data);
-
-		/* Mark the PHB initialization done */
-		phb = hose->private_data;
-		phb->initialized = 1;
-	}
-}
-
 static void pnv_pci_ioda_create_dbgfs(void)
 {
 #ifdef CONFIG_DEBUG_FS
@@ -3130,6 +3051,9 @@ static void pnv_pci_ioda_create_dbgfs(void)
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
 
+		/* Notify initialization of PHB done */
+		phb->initialized = 1;
+
 		sprintf(name, "PCI%04x", hose->global_number);
 		phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
 		if (!phb->dbgfs)
@@ -3168,9 +3092,6 @@ static void pnv_npu_ioda_fixup(void)
 static void pnv_pci_ioda_fixup(void)
 {
 	pnv_pci_ioda_setup_PEs();
-	pnv_pci_ioda_setup_seg();
-	pnv_pci_ioda_setup_DMA();
-
 	pnv_pci_ioda_create_dbgfs();
 
 #ifdef CONFIG_EEH
@@ -3223,6 +3144,104 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }
 
+/*
+ * We are updating root port or the upstream port of the
+ * bridge behind the root port with PHB's windows in order
+ * to accommodate the changes on required resources during
+ * PCI (slot) hotplug, which is connected to either root
+ * port or the downstream ports of PCIe switch behind the
+ * root port.
+ */
+static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
+					   unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct resource *r, *w;
+	int i;
+
+	/* Check if we need apply fixup to the bridge's windows */
+	if (!pci_is_root_bus(bridge->bus) &&
+	    !pci_is_root_bus(bridge->bus->self->bus))
+		return;
+
+	/* Fixup the resources */
+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
+		if (!r->flags || !r->parent)
+			continue;
+
+		w = NULL;
+		if (r->flags & type & IORESOURCE_IO)
+			w = &hose->io_resource;
+		else if (pnv_pci_is_mem_pref_64(r->flags) &&
+			 (type & IORESOURCE_PREFETCH) &&
+			 phb->ioda.m64_segsize)
+			w = &hose->mem_resources[1];
+		else if (r->flags & type & IORESOURCE_MEM)
+			w = &hose->mem_resources[0];
+
+		r->start = w->start;
+		r->end = w->end;
+	}
+}
+
+static void pnv_pci_setup_bridge(struct pci_bus *bus,
+				 unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct pnv_ioda_pe *pe;
+	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
+
+	/* The PE for root bus should be realized before any other PE */
+	if (!phb->ioda.root_pe_populated) {
+		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
+		if (pe) {
+			phb->ioda.root_pe_idx = pe->pe_number;
+			phb->ioda.root_pe_populated = true;
+		}
+	}
+
+	/* Extend bridge's windows if necessary */
+	pnv_pci_fixup_bridge_resources(bus, type);
+
+	/* Don't assign PE to PCI bus, which doesn't have subordinate devices */
+	if (list_empty(&bus->devices))
+		return;
+
+	/* Reserve PEs according to used M64 resources */
+	if (phb->reserve_m64_pe)
+		phb->reserve_m64_pe(bus, NULL, all);
+
+	/*
+	 * Assign PE. We might run here because of partial hotplug.
+	 * For the case, we just pick up the existing PE and should
+	 * not allocate resources again.
+	 */
+	pe = pnv_ioda_setup_bus_PE(bus, all);
+	if (!pe)
+		return;
+
+	/* Setup MMIO mapping */
+	pnv_ioda_setup_pe_seg(pe);
+
+	/* Setup DMA */
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		pnv_pci_ioda1_setup_dma_pe(phb, pe);
+		break;
+	case PNV_PHB_IODA2:
+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+		break;
+	default:
+		pr_warn("%s: No DMA for PHB#%d (type %d)\n",
+			__func__, phb->hose->global_number, phb->type);
+	}
+}
+
 #ifdef CONFIG_PCI_IOV
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
@@ -3300,6 +3319,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 #endif
 	.enable_device_hook	= pnv_pci_enable_device_hook,
 	.window_alignment	= pnv_pci_window_alignment,
+	.setup_bridge		= pnv_pci_setup_bridge,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
 	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
@@ -3388,6 +3408,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (phb->regs == NULL)
 		pr_err("  Failed to map registers !\n");
 
+	/* Initialize TCE kill register */
+	pnv_pci_ioda_setup_opal_tce_kill(phb);
+
 	/* Initialize more IODA stuff */
 	phb->ioda.total_pe_num = 1;
 	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
@@ -3451,7 +3474,22 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
 	}
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
+
+	/*
+	 * Choose PE number for root bus, which shouldn't have
+	 * M64 resources consumed by its child devices. Pick the
+	 * PE number adjacent to the reserved one if possible.
+	 */
+	pnv_ioda_reserve_pe(phb, phb->ioda.reserved_pe_idx);
+	if (phb->ioda.reserved_pe_idx == 0) {
+		phb->ioda.root_pe_idx = 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
+	} else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1)) {
+		phb->ioda.root_pe_idx = phb->ioda.reserved_pe_idx - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
+	} else {
+		phb->ioda.root_pe_idx = IODA_INVALID_PE;
+	}
 
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
 	mutex_init(&phb->ioda.pe_list_mutex);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index ef9924a..01f2428 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -118,6 +118,8 @@ struct pnv_phb {
 		/* Global bridge info */
 		unsigned int		total_pe_num;
 		unsigned int		reserved_pe_idx;
+		unsigned int		root_pe_idx;
+		bool			root_pe_populated;
 
 		/* 32-bit MMIO window */
 		unsigned int		m32_size;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (16 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  4:28   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 23/45] powerpc/powernv: Dynamically release PEs Gavin Shan
                   ` (21 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

pnv_pci_ioda_table_free_pages() can be reused to release the IODA1
TCE table when releasing IODA1 PE in subsequent patches.

This renames the following functions to support releasing IODA1 TCE
table: pnv_pci_ioda2_table_free_pages() to pnv_pci_ioda_table_free_pages(),
pnv_pci_ioda2_table_do_free_pages() to pnv_pci_ioda_table_do_free_pages().
No logical changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d360607..077f9db 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -51,7 +51,7 @@
 #define POWERNV_IOMMU_DEFAULT_LEVELS	1
 #define POWERNV_IOMMU_MAX_LEVELS	5
 
-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl);
 
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...)
@@ -1352,7 +1352,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
 		iommu_group_put(pe->table_group.group);
 		BUG_ON(pe->table_group.group);
 	}
-	pnv_pci_ioda2_table_free_pages(tbl);
+	pnv_pci_ioda_table_free_pages(tbl);
 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
 }
 
@@ -1946,7 +1946,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
 
 static void pnv_ioda2_table_free(struct iommu_table *tbl)
 {
-	pnv_pci_ioda2_table_free_pages(tbl);
+	pnv_pci_ioda_table_free_pages(tbl);
 	iommu_free_table(tbl, "pnv");
 }
 
@@ -2448,7 +2448,7 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift,
 	return addr;
 }
 
-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
 		unsigned long size, unsigned level);
 
 static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
@@ -2487,7 +2487,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	 * release partially allocated table.
 	 */
 	if (offset < tce_table_size) {
-		pnv_pci_ioda2_table_do_free_pages(addr,
+		pnv_pci_ioda_table_do_free_pages(addr,
 				1ULL << (level_shift - 3), levels - 1);
 		return -ENOMEM;
 	}
@@ -2505,7 +2505,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	return 0;
 }
 
-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
 		unsigned long size, unsigned level)
 {
 	const unsigned long addr_ul = (unsigned long) addr &
@@ -2521,7 +2521,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
 			if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
 				continue;
 
-			pnv_pci_ioda2_table_do_free_pages(__va(hpa), size,
+			pnv_pci_ioda_table_do_free_pages(__va(hpa), size,
 					level - 1);
 		}
 	}
@@ -2529,7 +2529,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
 	free_pages(addr_ul, get_order(size << 3));
 }
 
-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl)
 {
 	const unsigned long size = tbl->it_indirect_levels ?
 			tbl->it_level_size : tbl->it_size;
@@ -2537,7 +2537,7 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
 	if (!tbl->it_size)
 		return;
 
-	pnv_pci_ioda2_table_do_free_pages((__be64 *)tbl->it_base, size,
+	pnv_pci_ioda_table_do_free_pages((__be64 *)tbl->it_base, size,
 			tbl->it_indirect_levels);
 }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 23/45] powerpc/powernv: Dynamically release PEs
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (17 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  5:19   ` Alexey Kardashevskiy
  2016-02-17  3:44   ` [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
                   ` (20 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This supports releasing PEs dynamically. Firstly, this moves
pnv_pci_ioda2_release_dma_pe() around, which is called to
release DMA resources when an IODA2 PE is released. Secondly,
several functions are implemented to release the resources
consumed by the PE when it is released:

   * pnv_pci_ioda1_unset_window() to unset TVEs for the PE.
   * pnv_pci_ioda1_release_dma_pe() to unset TVEs for the PE and
     destroy the IOMMU table.
   * pnv_ioda_release_pe_seg() to release the IO/M32/M64
     segments consumed by the PE.
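
The segment release can be pictured as a scan over a per-window
segment map that hands back every entry owned by the PE. What follows
is a minimal user-space sketch of that scan, not the kernel code: it
assumes a plain int array as the map and a hypothetical INVALID_PE
marker in place of IODA_INVALID_PE, and it leaves out the OPAL call
that remaps each segment to the reserved PE.

```c
#include <stddef.h>

#define INVALID_PE (-1)	/* hypothetical stand-in for IODA_INVALID_PE */

/*
 * Mark every segment owned by pe_number as unused and return how many
 * segments were released. The real code also remaps each segment to
 * the reserved PE through OPAL before clearing the map entry.
 */
static int release_pe_segments(int *segmap, size_t nsegs, int pe_number)
{
	int released = 0;

	for (size_t i = 0; i < nsegs; i++) {
		if (segmap[i] != pe_number)
			continue;

		segmap[i] = INVALID_PE;
		released++;
	}

	return released;
}
```

Calling it a second time for the same PE releases nothing, since the
first pass already cleared every entry the PE owned.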

Lastly, this adds a reference count to the PE, representing the number
of PCI devices associated with the PE. The count is increased when a
PCI device joins the PE and decreased when a PCI device leaves the PE
in pnv_pci_release_device(). When the count becomes zero, the
resources consumed by the PE are released by the functions mentioned
above. Note that the count isn't accessed concurrently, so a plain
"int" counter is enough here.
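
The counting scheme can be sketched in user space as below. This is an
illustration only: struct pe, pe_device_join() and pe_device_leave()
are hypothetical names, the "released" flag stands in for the call to
pnv_ioda_release_pe(), and the sketch assumes that joins and leaves
are serialized by the caller, so no atomics or locks are used.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down PE: only the fields the sketch needs. */
struct pe {
	int device_count;	/* number of devices attached to the PE */
	bool released;		/* set once the PE's resources are torn down */
};

/* A device joins the PE: take a reference. */
static void pe_device_join(struct pe *pe)
{
	pe->device_count++;
}

/*
 * A device leaves the PE: drop a reference and release the PE when the
 * last device is gone. Returns true if this call released the PE.
 * No locking or atomics: callers are assumed to be serialized.
 */
static bool pe_device_leave(struct pe *pe)
{
	if (pe->device_count <= 0)
		return false;	/* unbalanced leave; the patch emits a warning */

	if (--pe->device_count > 0)
		return false;

	pe->released = true;	/* stands in for pnv_ioda_release_pe() */
	return true;
}
```

With two devices attached, the first leave only drops the count and
the second one triggers the release.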

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 236 ++++++++++++++++++++++++++----
 arch/powerpc/platforms/powernv/pci.h      |   1 +
 2 files changed, 209 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 077f9db..fa428a8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -119,6 +119,158 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe);
+static long pnv_pci_ioda1_unset_window(struct iommu_table_group *table_group,
+				       int num);
+static void pnv_pci_ioda1_release_dma_pe(struct pnv_ioda_pe *pe)
+{
+	struct iommu_table *tbl;
+	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
+	int64_t rc;
+
+	if (!weight)
+		return;
+
+	tbl = pe->table_group.tables[0];
+	rc = pnv_pci_ioda1_unset_window(&pe->table_group, 0);
+	if (rc)
+		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
+
+	if (pe->table_group.group) {
+		iommu_group_put(pe->table_group.group);
+		WARN_ON(pe->table_group.group);
+	}
+
+	pnv_pci_ioda_table_free_pages(tbl);
+	iommu_free_table(tbl, "pnv");
+}
+
+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
+				       int num);
+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
+static void pnv_pci_ioda2_release_dma_pe(struct pnv_ioda_pe *pe)
+{
+	struct iommu_table *tbl;
+	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
+	int64_t rc;
+
+	if (!weight)
+		return;
+
+	tbl = pe->table_group.tables[0];
+	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
+	if (rc)
+		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
+
+	pnv_pci_ioda2_set_bypass(pe, false);
+	if (pe->table_group.group) {
+		iommu_group_put(pe->table_group.group);
+		WARN_ON(pe->table_group.group);
+	}
+
+	pnv_pci_ioda_table_free_pages(tbl);
+	iommu_free_table(tbl, "pnv");
+}
+
+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	int win, index, *segmap = NULL;
+	int64_t rc;
+
+	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++) {
+		if (phb->type == PNV_PHB_IODA2 &&
+		    (win == OPAL_IO_WINDOW_TYPE || win == OPAL_M64_WINDOW_TYPE))
+			continue;
+
+		switch (win) {
+		case OPAL_IO_WINDOW_TYPE:
+			segmap = phb->ioda.io_segmap;
+			break;
+		case OPAL_M32_WINDOW_TYPE:
+			segmap = phb->ioda.m32_segmap;
+			break;
+		case OPAL_M64_WINDOW_TYPE:
+			segmap = phb->ioda.m64_segmap;
+			break;
+		}
+
+		for (index = 0; index < phb->ioda.total_pe_num; index++) {
+			if (segmap[index] != pe->pe_number)
+				continue;
+
+			if (win == OPAL_M64_WINDOW_TYPE)
+				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+						phb->ioda.reserved_pe_idx, win,
+						index / PNV_IODA1_M64_SEGS,
+						index % PNV_IODA1_M64_SEGS);
+			else
+				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+						phb->ioda.reserved_pe_idx, win,
+						0, index);
+			if (rc != OPAL_SUCCESS)
+				pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
+					rc, win, index);
+
+			segmap[index] = IODA_INVALID_PE;
+		}
+	}
+}
+
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb,
+				   struct pnv_ioda_pe *pe);
+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe);
+static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct pnv_ioda_pe *tmp, *slave;
+
+	/* Release slave PEs in compound PE */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
+			pnv_ioda_release_pe(slave);
+	}
+
+	/* Remove the PE from the list */
+	list_del(&pe->list);
+
+	/* Release DMA segments */
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		pnv_pci_ioda1_release_dma_pe(pe);
+		break;
+	case PNV_PHB_IODA2:
+		pnv_pci_ioda2_release_dma_pe(pe);
+		break;
+	default:
+		WARN_ON(1);
+	}
+
+	pnv_ioda_release_pe_seg(pe);
+	pnv_ioda_deconfigure_pe(pe->phb, pe);
+
+	pnv_ioda_free_pe(pe);
+}
+
+static void pnv_pci_release_device(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	struct pnv_ioda_pe *pe;
+
+	if (pdev->is_virtfn)
+		return;
+
+	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
+		return;
+
+	pe = &phb->ioda.pe_array[pdn->pe_number];
+	WARN_ON(--pe->device_count < 0);
+	if (pe->device_count == 0)
+		pnv_ioda_release_pe(pe);
+}
+
 static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
 {
 	phb->ioda.pe_array[pe_no].phb = phb;
@@ -715,7 +867,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 	return 0;
 }
 
-#ifdef CONFIG_PCI_IOV
 static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *parent;
@@ -750,9 +901,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 		}
 		rid_end = pe->rid + (count << 8);
 	} else {
+#ifdef CONFIG_PCI_IOV
 		if (pe->flags & PNV_IODA_PE_VF)
 			parent = pe->parent_dev;
 		else
+#endif
 			parent = pe->pdev->bus->self;
 		bcomp = OpalPciBusAll;
 		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
@@ -790,11 +943,12 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	pe->pbus = NULL;
 	pe->pdev = NULL;
+#ifdef CONFIG_PCI_IOV
 	pe->parent_dev = NULL;
+#endif
 
 	return 0;
 }
-#endif /* CONFIG_PCI_IOV */
 
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
@@ -1031,6 +1185,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 		if (pdn->pe_number != IODA_INVALID_PE)
 			continue;
 
+		pe->device_count++;
 		pdn->pcidev = dev;
 		pdn->pe_number = pe->pe_number;
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1095,9 +1250,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 			bus->busn_res.start, pe->pe_number);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
-		/* XXX What do we do here ? */
-		pnv_ioda_free_pe(pe);
 		pe->pbus = NULL;
+		pnv_ioda_release_pe(pe);
 		return NULL;
 	}
 
@@ -1333,29 +1487,6 @@ m64_failed:
 	return -EBUSY;
 }
 
-static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
-		int num);
-static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
-
-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
-{
-	struct iommu_table    *tbl;
-	int64_t               rc;
-
-	tbl = pe->table_group.tables[0];
-	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
-	if (rc)
-		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
-
-	pnv_pci_ioda2_set_bypass(pe, false);
-	if (pe->table_group.group) {
-		iommu_group_put(pe->table_group.group);
-		BUG_ON(pe->table_group.group);
-	}
-	pnv_pci_ioda_table_free_pages(tbl);
-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
-}
-
 static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 {
 	struct pci_bus        *bus;
@@ -1376,7 +1507,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 		if (pe->parent_dev != pdev)
 			continue;
 
-		pnv_pci_ioda2_release_dma_pe(pdev, pe);
+		pnv_pci_ioda2_release_dma_pe(pe);
 
 		/* Remove from list */
 		mutex_lock(&phb->ioda.pe_list_mutex);
@@ -1780,6 +1911,16 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
 	 */
 }
 
+static void pnv_pci_ioda1_tce_invalidate_entire(struct pnv_ioda_pe *pe)
+{
+	struct iommu_table *tbl = pe->table_group.tables[0];
+
+	if (!tbl)
+		return;
+
+	pnv_pci_ioda1_tce_invalidate(tbl, tbl->it_offset, tbl->it_size, false);
+}
+
 static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
 		long npages, unsigned long uaddr,
 		enum dma_data_direction direction,
@@ -2144,6 +2285,44 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	}
 }
 
+static long pnv_pci_ioda1_unset_window(struct iommu_table_group *table_group,
+				       int num)
+{
+	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+					      table_group);
+	struct pnv_phb *phb = pe->phb;
+	int start, count, i;
+	long rc = OPAL_SUCCESS;
+
+	pe_info(pe, "Removing DMA window #%d\n", num);
+
+	/* Search the used DMA32 segments */
+	start = -1;
+	count = 0;
+	for (i = 0; i < phb->ioda.dma32_count; i++) {
+		if (phb->ioda.dma32_segmap[i] != pe->pe_number)
+			continue;
+
+		if (count++ == 0)
+			start = i;
+	}
+
+	if (!count)
+		return OPAL_SUCCESS;
+
+	for (i = start; i < start + count; i++)
+		rc |= opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
+						 i, 0, 0ul, 0ul, 0ul);
+	if (rc)
+		pe_warn(pe, "Failure %ld unmapping TVEs\n", rc);
+	else
+		pnv_pci_ioda1_tce_invalidate_entire(pe);
+
+	pnv_pci_unlink_table_and_group(table_group->tables[num], table_group);
+
+	return rc;
+}
+
 static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
 		int num, struct iommu_table *tbl)
 {
@@ -3318,6 +3497,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
 #endif
 	.enable_device_hook	= pnv_pci_enable_device_hook,
+	.release_device		= pnv_pci_release_device,
 	.window_alignment	= pnv_pci_window_alignment,
 	.setup_bridge		= pnv_pci_setup_bridge,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 01f2428..0cddde3 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -31,6 +31,7 @@ struct pnv_phb;
 struct pnv_ioda_pe {
 	unsigned long		flags;
 	struct pnv_phb		*phb;
+	int			device_count;
 
 #define PNV_IODA_MAX_PEER_PES	8
 	struct pnv_ioda_pe	*peers[PNV_IODA_MAX_PEER_PES];
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:44   ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                     ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames pcibios_{add,remove}_pci_devices() to
pci_{add,remove}_pci_devices() to avoid conflicts with the names of
the weak functions in the PCI subsystem, which have the prefix
"pcibios". No logical changes introduced.
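
For readers unfamiliar with weak functions, the sketch below shows the
general pattern that the "pcibios" prefix is associated with: a default
implementation marked weak that another object file may override with
a strong definition. It assumes the GCC/Clang __attribute__((weak))
extension, and hook_setup_bridge() and scan_bridge() are hypothetical
names, not kernel functions.

```c
/*
 * Generic default, marked weak: the linker uses it only when no other
 * object file provides a strong definition with the same name.
 * hook_setup_bridge() is a hypothetical name, not a kernel function.
 */
__attribute__((weak)) int hook_setup_bridge(int bus)
{
	(void)bus;	/* nothing to do by default */
	return 0;
}

/* Core code calls the hook without knowing which definition it gets. */
static int scan_bridge(int bus)
{
	return hook_setup_bridge(bus);
}
```

Since no strong definition exists in this sketch, scan_bridge() ends up
in the weak default.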

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |  4 ++--
 arch/powerpc/kernel/eeh_driver.c      | 12 ++++++------
 arch/powerpc/kernel/pci-hotplug.c     | 15 +++++++--------
 drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
 drivers/pci/hotplug/rpaphp_core.c     |  4 ++--
 drivers/pci/hotplug/rpaphp_pci.c      |  2 +-
 6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 4dd6ef4..c817f38 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -263,10 +263,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
 extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
-extern void pcibios_remove_pci_devices(struct pci_bus *bus);
+extern void pci_remove_pci_devices(struct pci_bus *bus);
 
 /** Discover new pci devices under this bus, and add them */
-extern void pcibios_add_pci_devices(struct pci_bus *bus);
+extern void pci_add_pci_devices(struct pci_bus *bus);
 
 
 extern void isa_bridge_find_early(struct pci_controller *hose);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index fb6207d..59e53fe 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -621,7 +621,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	 * We don't remove the corresponding PE instances because
 	 * we need the information afterwords. The attached EEH
 	 * devices are expected to be attached soon when calling
-	 * into pcibios_add_pci_devices().
+	 * into pci_add_pci_devices().
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (bus) {
@@ -630,7 +630,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 		} else {
 			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
 			pci_lock_rescan_remove();
-			pcibios_remove_pci_devices(bus);
+			pci_remove_pci_devices(bus);
 			pci_unlock_rescan_remove();
 		}
 	} else if (frozen_bus) {
@@ -681,7 +681,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 		if (pe->type & EEH_PE_VF)
 			eeh_add_virt_device(edev, NULL);
 		else
-			pcibios_add_pci_devices(bus);
+			pci_add_pci_devices(bus);
 	} else if (frozen_bus && rmv_data->removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
@@ -691,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 		if (pe->type & EEH_PE_VF)
 			eeh_add_virt_device(edev, NULL);
 		else
-			pcibios_add_pci_devices(frozen_bus);
+			pci_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -896,7 +896,7 @@ perm_error:
 			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 
 			pci_lock_rescan_remove();
-			pcibios_remove_pci_devices(frozen_bus);
+			pci_remove_pci_devices(frozen_bus);
 			pci_unlock_rescan_remove();
 		}
 	}
@@ -981,7 +981,7 @@ static void eeh_handle_special_event(void)
 				bus = eeh_pe_bus_get(phb_pe);
 				eeh_pe_dev_traverse(pe,
 					eeh_report_failure, NULL);
-				pcibios_remove_pci_devices(bus);
+				pci_remove_pci_devices(bus);
 			}
 			pci_unlock_rescan_remove();
 		}
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 59c4361..78bf2a1 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -38,20 +38,20 @@ void pcibios_release_device(struct pci_dev *dev)
 }
 
 /**
- * pcibios_remove_pci_devices - remove all devices under this bus
+ * pci_remove_pci_devices - remove all devices under this bus
  * @bus: the indicated PCI bus
  *
  * Remove all of the PCI devices under this bus both from the
  * linux pci device tree, and from the powerpc EEH address cache.
  */
-void pcibios_remove_pci_devices(struct pci_bus *bus)
+void pci_remove_pci_devices(struct pci_bus *bus)
 {
 	struct pci_dev *dev, *tmp;
 	struct pci_bus *child_bus;
 
 	/* First go down child busses */
 	list_for_each_entry(child_bus, &bus->children, node)
-		pcibios_remove_pci_devices(child_bus);
+		pci_remove_pci_devices(child_bus);
 
 	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
 		 pci_domain_nr(bus),  bus->number);
@@ -60,11 +60,10 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
 		pci_stop_and_remove_bus_device(dev);
 	}
 }
-
-EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
+EXPORT_SYMBOL_GPL(pci_remove_pci_devices);
 
 /**
- * pcibios_add_pci_devices - adds new pci devices to bus
+ * pci_add_pci_devices - adds new pci devices to bus
  * @bus: the indicated PCI bus
  *
  * This routine will find and fixup new pci devices under
@@ -74,7 +73,7 @@ EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
  * is how this routine differs from other, similar pcibios
  * routines.)
  */
-void pcibios_add_pci_devices(struct pci_bus * bus)
+void pci_add_pci_devices(struct pci_bus *bus)
 {
 	int slotno, mode, pass, max;
 	struct pci_dev *dev;
@@ -114,4 +113,4 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 	}
 	pcibios_finish_adding_to_bus(bus);
 }
-EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
+EXPORT_SYMBOL_GPL(pci_add_pci_devices);
diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index b46b57d..730982b 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -380,7 +380,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
 	}
 
 	/* Remove all devices below slot */
-	pcibios_remove_pci_devices(bus);
+	pci_remove_pci_devices(bus);
 
 	/* Unmap PCI IO space */
 	if (pcibios_unmap_io_space(bus)) {
diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
index 611f605..bba07b3 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -404,7 +404,7 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
 
 	if (state == PRESENT) {
 		pci_lock_rescan_remove();
-		pcibios_add_pci_devices(slot->bus);
+		pci_add_pci_devices(slot->bus);
 		pci_unlock_rescan_remove();
 		slot->state = CONFIGURED;
 	} else if (state == EMPTY) {
@@ -426,7 +426,7 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
 		return -EINVAL;
 
 	pci_lock_rescan_remove();
-	pcibios_remove_pci_devices(slot->bus);
+	pci_remove_pci_devices(slot->bus);
 	pci_unlock_rescan_remove();
 	vm_unmap_aliases();
 
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index 7836d69..1099b38 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -116,7 +116,7 @@ int rpaphp_enable_slot(struct slot *slot)
 		}
 
 		if (list_empty(&bus->devices))
-			pcibios_add_pci_devices(bus);
+			pci_add_pci_devices(bus);
 
 		if (!list_empty(&bus->devices)) {
 			info->adapter_status = CONFIGURED;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 25/45] powerpc/pci: Rename pcibios_find_pci_bus()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (19 preceding siblings ...)
  2016-02-17  3:44   ` [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  5:31   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 26/45] powerpc/pci: Move pci_find_bus_by_node() around Gavin Shan
                   ` (18 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to
avoid conflicts with the names of the PCI subsystem's weak functions,
which have the prefix "pcibios". No logical changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h      | 2 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c | 5 ++---
 drivers/pci/hotplug/rpadlpar_core.c        | 6 +++---
 drivers/pci/hotplug/rpaphp_pci.c           | 2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c817f38..03f4ee7 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -260,7 +260,7 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
 #endif
 
 /** Find the bus corresponding to the indicated device node */
-extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
+extern struct pci_bus *pci_find_bus_by_node(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
 extern void pci_remove_pci_devices(struct pci_bus *bus);
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 5d4a3df..aee22b4 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -54,8 +54,7 @@ find_bus_among_children(struct pci_bus *bus,
 	return child;
 }
 
-struct pci_bus *
-pcibios_find_pci_bus(struct device_node *dn)
+struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
 {
 	struct pci_dn *pdn = dn->data;
 
@@ -64,7 +63,7 @@ pcibios_find_pci_bus(struct device_node *dn)
 
 	return find_bus_among_children(pdn->phb->bus, dn);
 }
-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
+EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
 
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index 730982b..acbf041 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -175,7 +175,7 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn)
 	struct pci_dev *dev;
 	struct pci_controller *phb;
 
-	if (pcibios_find_pci_bus(dn))
+	if (pci_find_bus_by_node(dn))
 		return -EINVAL;
 
 	/* Add pci bus */
@@ -212,7 +212,7 @@ static int dlpar_remove_phb(char *drc_name, struct device_node *dn)
 	struct pci_dn *pdn;
 	int rc = 0;
 
-	if (!pcibios_find_pci_bus(dn))
+	if (!pci_find_bus_by_node(dn))
 		return -EINVAL;
 
 	/* If pci slot is hotpluggable, use hotplug to remove it */
@@ -356,7 +356,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
 
 	pci_lock_rescan_remove();
 
-	bus = pcibios_find_pci_bus(dn);
+	bus = pci_find_bus_by_node(dn);
 	if (!bus) {
 		ret = -EINVAL;
 		goto out;
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index 1099b38..a9180bb 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -93,7 +93,7 @@ int rpaphp_enable_slot(struct slot *slot)
 	if (rc)
 		return rc;
 
-	bus = pcibios_find_pci_bus(slot->dn);
+	bus = pci_find_bus_by_node(slot->dn);
 	if (!bus) {
 		err("%s: no pci_bus for dn %s\n", __func__, slot->dn->full_name);
 		return -EINVAL;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 26/45] powerpc/pci: Move pci_find_bus_by_node() around
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (20 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 25/45] powerpc/pci: Rename pcibios_find_pci_bus() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-02-17  3:44 ` [PATCH v8 27/45] powerpc/pci: Export pci_add_device_node_info() Gavin Shan
                   ` (17 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This moves pci_find_bus_by_node() from
arch/powerpc/platforms/pseries/pci_dlpar.c to
arch/powerpc/kernel/pci-hotplug.c so that the function can be used
by both the pSeries and PowerNV platforms. The cleanups below are
applied as well. No functional changes introduced.

   * Remove variable "busdn" in find_bus_among_children()
   * Use PCI_DN() to convert device node to pci_dn

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
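Note for readers (not part of the commit): the lookup moved by this
patch is a depth-first search over the bus hierarchy. Below is a
minimal userspace sketch of that idea, not the kernel code. The names
struct bus, of_node, first_child, next_sibling and find_bus() are
hypothetical stand-ins for struct pci_bus, pci_bus_to_OF_node() and
the kernel's list helpers; the NULL check on the bus is an addition
of the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_bus: each bus remembers the
 * device-tree node it is bound to and links to its child buses. */
struct bus {
	const void *of_node;		/* stand-in for pci_bus_to_OF_node(bus) */
	struct bus *first_child;
	struct bus *next_sibling;
};

/* Depth-first search: return the bus bound to @dn, or NULL if none. */
static struct bus *find_bus(struct bus *bus, const void *dn)
{
	struct bus *tmp, *child;

	if (!bus)
		return NULL;

	/* The bus itself matches. */
	if (bus->of_node == dn)
		return bus;

	/* Otherwise look through every child subtree in turn. */
	for (tmp = bus->first_child; tmp; tmp = tmp->next_sibling) {
		child = find_bus(tmp, dn);
		if (child)
			return child;
	}

	return NULL;
}
```

In the patch itself the search starts at pdn->phb->bus, so a node is
only ever looked up below its own PHB.
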
 arch/powerpc/kernel/pci-hotplug.c          | 29 ++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pci_dlpar.c | 31 ------------------------------
 2 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 78bf2a1..7929a1c 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,35 @@
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+static struct pci_bus *find_bus_among_children(struct pci_bus *bus,
+					       struct device_node *dn)
+{
+	struct pci_bus *child = NULL;
+	struct pci_bus *tmp;
+
+	if (pci_bus_to_OF_node(bus) == dn)
+		return bus;
+
+	list_for_each_entry(tmp, &bus->children, node) {
+		child = find_bus_among_children(tmp, dn);
+		if (child)
+			break;
+	}
+
+	return child;
+}
+
+struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
+{
+	struct pci_dn *pdn = PCI_DN(dn);
+
+	if (!pdn  || !pdn->phb || !pdn->phb->bus)
+		return NULL;
+
+	return find_bus_among_children(pdn->phb->bus, dn);
+}
+EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
+
 /**
  * pcibios_release_device - release PCI device
  * @dev: PCI device
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index aee22b4..906dbaa 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,37 +34,6 @@
 
 #include "pseries.h"
 
-static struct pci_bus *
-find_bus_among_children(struct pci_bus *bus,
-                        struct device_node *dn)
-{
-	struct pci_bus *child = NULL;
-	struct pci_bus *tmp;
-	struct device_node *busdn;
-
-	busdn = pci_bus_to_OF_node(bus);
-	if (busdn == dn)
-		return bus;
-
-	list_for_each_entry(tmp, &bus->children, node) {
-		child = find_bus_among_children(tmp, dn);
-		if (child)
-			break;
-	};
-	return child;
-}
-
-struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
-{
-	struct pci_dn *pdn = dn->data;
-
-	if (!pdn  || !pdn->phb || !pdn->phb->bus)
-		return NULL;
-
-	return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
-- 
2.1.0


* [PATCH v8 27/45] powerpc/pci: Export pci_add_device_node_info()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (21 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 26/45] powerpc/pci: Move pci_find_bus_by_node() around Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  5:35   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info() Gavin Shan
                   ` (16 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames update_dn_pci_info() to pci_add_device_node_info(),
adjusts its parameter and return types accordingly, and exports it.
The function creates the pdn (struct pci_dn) for the indicated
device node. A new function, add_pdn(), is a thin wrapper around
pci_add_device_node_info() that serves as the callback passed to
traverse_pci_devices(). No logical changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
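Note for readers (not part of the commit): the split between a typed
helper and a small callback wrapper can be sketched in userspace as
follows. This is an illustration under assumptions, not the kernel
code: struct node, struct info, add_info(), add_info_cb() and the
walk_failed sentinel are hypothetical names, and the sentinel's
address stands in for ERR_PTR(-ENOMEM).

```c
#include <stddef.h>
#include <stdlib.h>

struct node { void *data; };			/* stand-in for struct device_node */
struct info { struct node *node; void *hose; };	/* stand-in for struct pci_dn */

/* Typed helper: returns the new info, or NULL on allocation failure. */
static struct info *add_info(void *hose, struct node *n)
{
	struct info *i = calloc(1, sizeof(*i));

	if (!i)
		return NULL;

	i->node = n;
	i->hose = hose;
	n->data = i;
	return i;
}

/* Its address plays the role of ERR_PTR(-ENOMEM) in this sketch. */
static char walk_failed;

/* Callback form: NULL means "keep walking", non-NULL stops the walk. */
static void *add_info_cb(struct node *n, void *data)
{
	return add_info(data, n) ? NULL : &walk_failed;
}
```

Direct callers get the typed pointer back, while the tree walk only
needs the continue/stop answer from the wrapper.
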
 arch/powerpc/include/asm/pci-bridge.h  |  3 ++-
 arch/powerpc/kernel/pci_dn.c           | 30 +++++++++++++++++++-----------
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 03f4ee7..72a9d4e 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -238,7 +238,8 @@ extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
 extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
-extern void *update_dn_pci_info(struct device_node *dn, void *data);
+extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
+					       struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 38102cb..0a249ff 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -282,13 +282,9 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 #endif /* CONFIG_PCI_IOV */
 }
 
-/*
- * Traverse_func that inits the PCI fields of the device node.
- * NOTE: this *must* be done before read/write config to the device.
- */
-void *update_dn_pci_info(struct device_node *dn, void *data)
+struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
+					struct device_node *dn)
 {
-	struct pci_controller *phb = data;
 	const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL);
 	const __be32 *regs;
 	struct device_node *parent;
@@ -299,7 +295,7 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 		return NULL;
 	dn->data = pdn;
 	pdn->node = dn;
-	pdn->phb = phb;
+	pdn->phb = hose;
 #ifdef CONFIG_PPC_POWERNV
 	pdn->pe_number = IODA_INVALID_PE;
 #endif
@@ -331,8 +327,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	if (pdn->parent)
 		list_add_tail(&pdn->list, &pdn->parent->child_list);
 
-	return NULL;
+	return pdn;
 }
+EXPORT_SYMBOL_GPL(pci_add_device_node_info);
 
 /*
  * Traverse a device tree stopping each PCI device in the tree.
@@ -432,6 +429,18 @@ void *traverse_pci_dn(struct pci_dn *root,
 	return NULL;
 }
 
+static void *add_pdn(struct device_node *dn, void *data)
+{
+	struct pci_controller *hose = data;
+	struct pci_dn *pdn;
+
+	pdn = pci_add_device_node_info(hose, dn);
+	if (!pdn)
+		return ERR_PTR(-ENOMEM);
+
+	return NULL;
+}
+
 /** 
  * pci_devs_phb_init_dynamic - setup pci devices under this PHB
  * phb: pci-to-host bridge (top-level bridge connecting to cpu)
@@ -446,8 +455,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	struct pci_dn *pdn;
 
 	/* PHB nodes themselves must not match */
-	update_dn_pci_info(dn, phb);
-	pdn = dn->data;
+	pdn = pci_add_device_node_info(phb, dn);
 	if (pdn) {
 		pdn->devfn = pdn->busno = -1;
 		pdn->vendor_id = pdn->device_id = pdn->class_code = 0;
@@ -456,7 +464,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	}
 
 	/* Update dn->phb ptrs for new phb and children devices */
-	traverse_pci_devices(dn, update_dn_pci_info, phb);
+	traverse_pci_devices(dn, add_pdn, phb);
 }
 
 /** 
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 36df46e..6f8d020 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -265,7 +265,7 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 		pdn = parent ? PCI_DN(parent) : NULL;
 		if (pdn) {
 			/* Create pdn and EEH device */
-			update_dn_pci_info(np, pdn->phb);
+			pci_add_device_node_info(pdn->phb, np);
 			eeh_dev_init(PCI_DN(np), pdn->phb);
 		}
 
-- 
2.1.0


* [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (22 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 27/45] powerpc/pci: Export pci_add_device_node_info() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  5:48   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
                   ` (15 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This implements and exports pci_remove_device_node_info(), which
removes the pdn (struct pci_dn) for the indicated device node. The
function is going to be used by the PowerNV PCI hotplug driver.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
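Note for readers (not part of the commit): the teardown order used
here (check that no children remain, unlink from the parent, clear
the back-pointer, free) can be sketched in userspace as below. All
names (struct info, struct node, remove_info()) are hypothetical, a
singly linked child list replaces the kernel's list_head, assert()
replaces WARN_ON(), and the EEH and of_node_put() handling is left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct info {
	struct info *parent;
	struct info *first_child;
	struct info *next_sibling;
};

struct node { struct info *data; };	/* stand-in for struct device_node */

static void remove_info(struct node *n)
{
	struct info *i = n ? n->data : NULL;
	struct info **link;

	if (!i)
		return;

	/* Stand-in for the WARN_ON(): children must be gone already. */
	assert(i->first_child == NULL);

	/* Unlink from the parent's child list, if there is a parent. */
	if (i->parent) {
		for (link = &i->parent->first_child; *link;
		     link = &(*link)->next_sibling) {
			if (*link == i) {
				*link = i->next_sibling;
				break;
			}
		}
	}

	/* Clear the back-pointer before freeing so nobody sees stale data. */
	n->data = NULL;
	free(i);
}
```

Calling remove_info() on a node without data, or on NULL, is a no-op,
which matches the early return in the patch.
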
 arch/powerpc/include/asm/pci-bridge.h |  1 +
 arch/powerpc/kernel/pci_dn.c          | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 72a9d4e..c6310e2 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -240,6 +240,7 @@ extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
 extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 					       struct device_node *dn);
+extern void pci_remove_device_node_info(struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 0a249ff..ce10281 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -331,6 +331,29 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 }
 EXPORT_SYMBOL_GPL(pci_add_device_node_info);
 
+void pci_remove_device_node_info(struct device_node *dn)
+{
+	struct pci_dn *pdn = dn ? PCI_DN(dn) : NULL;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+
+	if (edev)
+		edev->pdn = NULL;
+#endif
+
+	if (!pdn)
+		return;
+
+	WARN_ON(!list_empty(&pdn->child_list));
+	list_del(&pdn->list);
+	if (pdn->parent)
+		of_node_put(pdn->parent->node);
+
+	dn->data = NULL;
+	kfree(pdn);
+}
+EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
+
 /*
  * Traverse a device tree stopping each PCI device in the tree.
  * This is done depth first.  As each node is processed, a "pre"
-- 
2.1.0


* [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (23 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
       [not found]   ` <1455680668-23298-30-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-02-17  3:44 ` [PATCH v8 30/45] powerpc/pci: Delay populating pdn Gavin Shan
                   ` (14 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames traverse_pci_devices() to pci_traverse_device_nodes().
The function traverses all subordinate device nodes of the specified
one. The cleanups below are applied to the function as well. No
logical changes introduced.

   * Rename "pre" to "fn".
   * Avoid the assignment in an if condition, as reported by
     checkpatch.pl.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
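Note for readers (not part of the commit): the callback contract of
the walk (a non-NULL return stops the traversal and is handed back to
the caller) can be sketched in userspace as below. This is a
simplified recursive version with hypothetical names (struct node,
walk_nodes(), count_node(), match_id()); the kernel function is
iterative and only descends into PCI bridges, which the sketch omits.

```c
#include <stddef.h>

struct node {
	struct node *child;
	struct node *sibling;
	int id;
};

typedef void *(*node_fn)(struct node *, void *);

/* Visit every node below @start (not @start itself), depth first. */
static void *walk_nodes(struct node *start, node_fn fn, void *data)
{
	struct node *n;
	void *ret;

	for (n = start->child; n; n = n->sibling) {
		/* Call first, then test: no assignment inside the if. */
		if (fn) {
			ret = fn(n, data);
			if (ret)
				return ret;
		}

		ret = walk_nodes(n, fn, data);
		if (ret)
			return ret;
	}

	return NULL;
}

/* Example callback: count the visited nodes and never stop. */
static void *count_node(struct node *n, void *data)
{
	(void)n;
	++*(int *)data;
	return NULL;
}

/* Example callback: stop at the node whose id matches *data. */
static void *match_id(struct node *n, void *data)
{
	return n->id == *(int *)data ? n : NULL;
}
```

The two callbacks show both uses of the return value: count_node()
always continues, match_id() ends the walk at the first hit.
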
 arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
 arch/powerpc/kernel/pci_dn.c         | 15 ++++++++++-----
 arch/powerpc/platforms/pseries/msi.c |  4 ++--
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index ca0c5bf..8753e4e 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
 struct device_node;
 struct pci_dn;
 
-typedef void *(*traverse_func)(struct device_node *me, void *data);
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data);
+void *pci_traverse_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *),
+				void *data);
 void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index ce10281..ecdccce 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -372,8 +372,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
  * one of these nodes we also assume its siblings are non-pci for
  * performance.
  */
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data)
+void *pci_traverse_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *),
+				void *data)
 {
 	struct device_node *dn, *nextdn;
 	void *ret;
@@ -388,8 +389,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 		if (classp)
 			class = of_read_number(classp, 1);
 
-		if (pre && ((ret = pre(dn, data)) != NULL))
-			return ret;
+		if (fn) {
+			ret = fn(dn, data);
+			if (ret)
+				return ret;
+		}
 
 		/* If we are a PCI bridge, go down */
 		if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
@@ -411,6 +415,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 	}
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);
 
 static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
 				      struct pci_dn *pdn)
@@ -487,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	}
 
 	/* Update dn->phb ptrs for new phb and children devices */
-	traverse_pci_devices(dn, add_pdn, phb);
+	pci_traverse_device_nodes(dn, add_pdn, phb);
 }
 
 /** 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 272e9ec..543a638 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	memset(&counts, 0, sizeof(struct msi_counts));
 
 	/* Work out how many devices we have below this PE */
-	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
+	pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
 
 	if (counts.num_devices == 0) {
 		pr_err("rtas_msi: found 0 devices under PE for %s\n",
@@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	/* else, we have some more calculating to do */
 	counts.requestor = pci_device_to_OF_node(dev);
 	counts.request = request;
-	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
+	pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
 
 	/* If the quota isn't an integer multiple of the total, we can
 	 * use the remainder as spare MSIs for anyone that wants them. */
-- 
2.1.0


* [PATCH v8 30/45] powerpc/pci: Delay populating pdn
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (24 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  8:19   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 31/45] powerpc/pci: Don't scan empty slot Gavin Shan
                   ` (13 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

The pdn (struct pci_dn) instances are allocated from memblock or
bootmem when creating the PCI controllers (hoses) in setup_arch().
PCI hotplug, which will be supported by subsequent patches, releases
PCI device nodes and their corresponding pdn on an unplug event.
Memory chunks for pdn instances that were allocated from memblock or
bootmem are hard to reuse after being released.

This delays creating pdn by pci_devs_phb_init() from setup_arch()
to core_initcall() so that they are allocated from slab. The memory
consumed by pdn can then be released to the system without problem
at PCI unplug time. It also means that pci_dn is unavailable in
setup_arch(), so the fixup on pdn (like AGP's) can't be carried out
at that time. We have to do that in
ppc_md.pcibios_root_bridge_prepare() on maple/pasemi/powermac
platforms, where the pdn is available.

Meanwhile, the EEH device is created when the pdn is populated,
meaning the pdn and the EEH device have the same life cycle. In
turn, we needn't call eeh_dev_init() to create the EEH device
explicitly.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
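Note for readers (not part of the commit): tying the EEH device to
the pdn means one constructor creates both and rolls back if the
second allocation fails. A minimal userspace sketch of that pattern
follows. All names (struct info, struct edev, info_create(),
info_destroy()) are hypothetical, the fail_edev flag exists only to
exercise the error path, and clearing the node's back-pointer on
failure and freeing the edev in info_destroy() are choices of the
sketch rather than something this patch shows.

```c
#include <stddef.h>
#include <stdlib.h>

struct info;
struct edev { struct info *owner; };	/* stand-in for struct eeh_dev */
struct info { struct edev *edev; };	/* stand-in for struct pci_dn */
struct node { struct info *data; };	/* stand-in for struct device_node */

/* Create the info and its companion edev together, or nothing at all. */
static struct info *info_create(struct node *n, int fail_edev)
{
	struct info *i = calloc(1, sizeof(*i));

	if (!i)
		return NULL;
	n->data = i;

	i->edev = fail_edev ? NULL : calloc(1, sizeof(*i->edev));
	if (!i->edev) {
		/* Roll back so the node never points at freed memory. */
		n->data = NULL;
		free(i);
		return NULL;
	}

	i->edev->owner = i;
	return i;
}

/* Release both objects; safe to call on a node without data. */
static void info_destroy(struct node *n)
{
	if (!n->data)
		return;

	free(n->data->edev);
	free(n->data);
	n->data = NULL;
}
```

Because both objects come from the regular allocator, they can be
handed back at unplug time, which is the point of moving the setup
out of setup_arch().
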
 arch/powerpc/include/asm/eeh.h         |  2 +-
 arch/powerpc/include/asm/ppc-pci.h     |  2 --
 arch/powerpc/kernel/eeh_dev.c          | 17 +++------------
 arch/powerpc/kernel/pci_dn.c           | 23 ++++++++++++++++----
 arch/powerpc/platforms/maple/pci.c     | 34 ++++++++++++++++++------------
 arch/powerpc/platforms/pasemi/pci.c    |  3 ---
 arch/powerpc/platforms/powermac/pci.c  | 38 +++++++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.c   |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  6 +-----
 9 files changed, 69 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index fb9f376..8721580 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -274,7 +274,7 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
-void *eeh_dev_init(struct pci_dn *pdn, void *data);
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 8753e4e..0f73de0 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -39,8 +39,6 @@ void *pci_traverse_device_nodes(struct device_node *start,
 void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
-
-extern void pci_devs_phb_init(void);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
 /* From rtas_pci.h */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index 7815095..d6b2ca7 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -44,14 +44,13 @@
 /**
  * eeh_dev_init - Create EEH device according to OF node
  * @pdn: PCI device node
- * @data: PHB
  *
  * It will create EEH device according to the given OF node. The function
  * might be called by PCI emunation, DR, PHB hotplug.
  */
-void *eeh_dev_init(struct pci_dn *pdn, void *data)
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
 {
-	struct pci_controller *phb = data;
+	struct pci_controller *phb = pdn->phb;
 	struct eeh_dev *edev;
 
 	/* Allocate EEH device */
@@ -69,7 +68,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
 	INIT_LIST_HEAD(&edev->list);
 	INIT_LIST_HEAD(&edev->rmv_list);
 
-	return NULL;
+	return edev;
 }
 
 /**
@@ -81,16 +80,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
  */
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
 {
-	struct pci_dn *root = phb->pci_data;
-
 	/* EEH PE for PHB */
 	eeh_phb_pe_create(phb);
-
-	/* EEH device for PHB */
-	eeh_dev_init(root, phb);
-
-	/* EEH devices for children OF nodes */
-	traverse_pci_dn(root, eeh_dev_init, phb);
 }
 
 /**
@@ -106,8 +97,6 @@ static int __init eeh_dev_phb_init(void)
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		eeh_dev_phb_init_dynamic(phb);
 
-	pr_info("EEH: devices created\n");
-
 	return 0;
 }
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index ecdccce..9cbf95a 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -209,8 +209,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 		}
 
 		/* Create the EEH device for the VF */
-		eeh_dev_init(pdn, pci_bus_to_host(pdev->bus));
-		edev = pdn_to_eeh_dev(pdn);
+		edev = eeh_dev_init(pdn);
 		BUG_ON(!edev);
 		edev->physfn = pdev;
 	}
@@ -289,8 +288,11 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 	const __be32 *regs;
 	struct device_node *parent;
 	struct pci_dn *pdn;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev;
+#endif
 
-	pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL);
+	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
 	if (pdn == NULL)
 		return NULL;
 	dn->data = pdn;
@@ -319,6 +321,15 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 	/* Extended config space */
 	pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1);
 
+	/* Create EEH device */
+#ifdef CONFIG_EEH
+	edev = eeh_dev_init(pdn);
+	if (!edev) {
+		kfree(pdn);
+		return NULL;
+	}
+#endif
+
 	/* Attach to parent node */
 	INIT_LIST_HEAD(&pdn->child_list);
 	INIT_LIST_HEAD(&pdn->list);
@@ -504,15 +515,19 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
  * pci device found underneath.  This routine runs once,
  * early in the boot sequence.
  */
-void __init pci_devs_phb_init(void)
+static int __init pci_devs_phb_init(void)
 {
 	struct pci_controller *phb, *tmp;
 
 	/* This must be done first so the device nodes have valid pci info! */
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		pci_devs_phb_init_dynamic(phb);
+
+	return 0;
 }
 
+core_initcall(pci_devs_phb_init);
+
 static void pci_dev_pdn_setup(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn;
diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
index a923230..a2f89e6 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -568,6 +568,26 @@ void maple_pci_irq_fixup(struct pci_dev *dev)
 	DBG(" <- maple_pci_irq_fixup\n");
 }
 
+static int maple_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions hopefully.
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+
 void __init maple_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -605,19 +625,7 @@ void __init maple_pci_init(void)
 	if (ht && maple_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */ 
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions hopefully.
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
+	ppc_md.pcibios_root_bridge_prepare = maple_pci_root_bridge_prepare;
 
 	/* Tell pci.c to not change any resource allocations.  */
 	pci_add_flags(PCI_PROBE_ONLY);
diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
index f3a68a0..10c4e8f 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -229,9 +229,6 @@ void __init pas_pci_init(void)
 			of_node_get(np);
 
 	of_node_put(root);
-
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
 }
 
 void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
index 59ab16f..6e06c3b 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -878,6 +878,29 @@ void pmac_pci_irq_fixup(struct pci_dev *dev)
 #endif /* CONFIG_PPC32 */
 }
 
+#ifdef CONFIG_PPC64
+static int pmac_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions for now. We should do something better in the
+	 * future though
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+#endif /* CONFIG_PPC64 */
+
 void __init pmac_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -914,20 +937,7 @@ void __init pmac_pci_init(void)
 	if (ht && pmac_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions for now. We should do something better in the
-	 * future though
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
+	ppc_md.pcibios_root_bridge_prepare = pmac_pci_root_bridge_prepare;
 	/* pmac_check_ht_link(); */
 
 #else /* CONFIG_PPC64 */
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index a53e4c8..b87a315 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -816,9 +816,6 @@ void __init pnv_pci_init(void)
 	for_each_compatible_node(np, NULL, "ibm,ioda2-npu-phb")
 		pnv_pci_init_npu_phb(np);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
 	/* Configure IOMMU DMA hooks */
 	set_pci_dma_ops(&dma_iommu_ops);
 }
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 6f8d020..5fbc312 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -263,11 +263,8 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	case OF_RECONFIG_ATTACH_NODE:
 		parent = of_get_parent(np);
 		pdn = parent ? PCI_DN(parent) : NULL;
-		if (pdn) {
-			/* Create pdn and EEH device */
+		if (pdn)
 			pci_add_device_node_info(pdn->phb, np);
-			eeh_dev_init(PCI_DN(np), pdn->phb);
-		}
 
 		of_node_put(parent);
 		break;
@@ -490,7 +487,6 @@ static void __init find_and_init_phbs(void)
 	}
 
 	of_node_put(root);
-	pci_devs_phb_init();
 
 	/*
 	 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
-- 
2.1.0


* [PATCH v8 31/45] powerpc/pci: Don't scan empty slot
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (25 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 30/45] powerpc/pci: Delay populating pdn Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  8:19   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 32/45] powerpc/pci: Update bridge windows on PCI plug Gavin Shan
                   ` (12 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

In the hotplug case, pci_add_pci_devices() is called to rescan the
specified PCI bus, which might not have any child devices. Accessing
the child device node of such a bus crashes the kernel.

This adds one more check to skip scanning a PCI bus that doesn't
have any subordinate devices in the device tree, in order to avoid
the kernel crash.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
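Note for readers (not part of the commit): the added condition boils
down to a short-circuit test that the bus node has a first child and
that the child carries per-device data. A userspace sketch, with the
hypothetical names struct node and slot_is_populated(), and with an
extra NULL check on the node itself that the patch does not have:

```c
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *child;
	void *data;		/* stand-in for the pdn attached to a node */
};

/* True only if the bus node has a first child with data attached. */
static bool slot_is_populated(const struct node *dn)
{
	/* && short-circuits, so dn->child->data is read only when safe. */
	return dn && dn->child && dn->child->data;
}
```
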
 arch/powerpc/kernel/pci-hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7929a1c..3628c38 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -120,7 +120,8 @@ void pci_add_pci_devices(struct pci_bus *bus)
 	if (mode == PCI_PROBE_DEVTREE) {
 		/* use ofdt-based probe */
 		of_rescan_bus(dn, bus);
-	} else if (mode == PCI_PROBE_NORMAL) {
+	} else if (mode == PCI_PROBE_NORMAL &&
+		   dn->child && PCI_DN(dn->child)) {
 		/*
 		 * Use legacy probe. In the partial hotplug case, we
 		 * probably have grandchildren devices unplugged. So
-- 
2.1.0


* [PATCH v8 32/45] powerpc/pci: Update bridge windows on PCI plug
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (26 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 31/45] powerpc/pci: Don't scan empty slot Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  8:47   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
                   ` (11 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

On a PCI plug event, the PCI slot's subordinate devices are scanned
and their (IO and MMIO) resources are assigned. Platform-dependent
resources (PE#, IO/MMIO/DMA windows) are allocated or created when
the windows of the slot's upstream bridge are updated.

This updates the windows of the hot-plugged slot's upstream bridge
in pcibios_finish_adding_to_bus() so that the platform resources
(PE#, IO/MMIO/DMA segments) are allocated or created accordingly.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
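Note for readers (not part of the commit): the decision added to
pcibios_finish_adding_to_bus() can be read as a three-way choice. The
userspace sketch below only models that choice. struct bus, struct
bridge, enum assign_scope and pick_assign_scope() are hypothetical
names, and no resources are actually assigned.

```c
#include <stddef.h>

struct bridge { int id; };
struct bus { struct bridge *self; };	/* self == NULL for a root bus */

enum assign_scope { ASSIGN_NONE, ASSIGN_BUS, ASSIGN_BRIDGE };

/* Decide which level unassigned resources are handled at. */
static enum assign_scope pick_assign_scope(const struct bus *bus,
					   int probe_only)
{
	/* With probe-only set, firmware assignments are left alone. */
	if (probe_only)
		return ASSIGN_NONE;

	/* A bus behind a bridge goes through the bridge, so that the
	 * bridge windows get updated as well. */
	return bus->self ? ASSIGN_BRIDGE : ASSIGN_BUS;
}
```
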
 arch/powerpc/kernel/pci-common.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 40df3a5..be9e515 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1444,8 +1444,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
 	/* Allocate bus and devices resources */
 	pcibios_allocate_bus_resources(bus);
 	pcibios_claim_one_bus(bus);
-	if (!pci_has_flag(PCI_PROBE_ONLY))
-		pci_assign_unassigned_bus_resources(bus);
+	if (!pci_has_flag(PCI_PROBE_ONLY)) {
+		if (bus->self)
+			pci_assign_unassigned_bridge_resources(bus->self);
+		else
+			pci_assign_unassigned_bus_resources(bus);
+	}
 
 	/* Fixup EEH */
 	eeh_add_device_tree_late(bus);
-- 
2.1.0


* [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (27 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 32/45] powerpc/pci: Update bridge windows on PCI plug Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-02-17  4:35   ` Andrew Donnellan
  2016-04-19  8:49   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 34/45] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
                   ` (10 subsequent siblings)
  39 siblings, 2 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This drops unnecessary nested if statements in pnv_eeh_reset() to
improve code readability. After the changes, the local variable
"ret" is no longer needed and is dropped as well. No functional
changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 67 +++++++++++++---------------
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 69e41ce..9226df1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1009,8 +1009,9 @@ static int pnv_eeh_reset_vf_pe(struct eeh_pe *pe, int option)
 static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 {
 	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb;
 	struct pci_bus *bus;
-	int ret;
+	int64_t rc;
 
 	/*
 	 * For PHB reset, we always have complete reset. For those PEs whose
@@ -1026,45 +1027,39 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 	 * reset. The side effect is that EEH core has to clear the frozen
 	 * state explicitly after BAR restore.
 	 */
-	if (pe->type & EEH_PE_PHB) {
-		ret = pnv_eeh_phb_reset(hose, option);
-	} else {
-		struct pnv_phb *phb;
-		s64 rc;
+	if (pe->type & EEH_PE_PHB)
+		return pnv_eeh_phb_reset(hose, option);
 
-		/*
-		 * The frozen PE might be caused by PAPR error injection
-		 * registers, which are expected to be cleared after hitting
-		 * frozen PE as stated in the hardware spec. Unfortunately,
-		 * that's not true on P7IOC. So we have to clear it manually
-		 * to avoid recursive EEH errors during recovery.
-		 */
-		phb = hose->private_data;
-		if (phb->model == PNV_PHB_MODEL_P7IOC &&
-		    (option == EEH_RESET_HOT ||
-		    option == EEH_RESET_FUNDAMENTAL)) {
-			rc = opal_pci_reset(phb->opal_id,
-					    OPAL_RESET_PHB_ERROR,
-					    OPAL_ASSERT_RESET);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Failure %lld clearing "
-					"error injection registers\n",
-					__func__, rc);
-				return -EIO;
-			}
+	/*
+	 * The frozen PE might be caused by PAPR error injection
+	 * registers, which are expected to be cleared after hitting
+	 * frozen PE as stated in the hardware spec. Unfortunately,
+	 * that's not true on P7IOC. So we have to clear it manually
+	 * to avoid recursive EEH errors during recovery.
+	 */
+	phb = hose->private_data;
+	if (phb->model == PNV_PHB_MODEL_P7IOC &&
+	    (option == EEH_RESET_HOT ||
+	     option == EEH_RESET_FUNDAMENTAL)) {
+		rc = opal_pci_reset(phb->opal_id,
+				    OPAL_RESET_PHB_ERROR,
+				    OPAL_ASSERT_RESET);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Failure %lld clearing error injection registers\n",
+				__func__, rc);
+			return -EIO;
 		}
-
-		bus = eeh_pe_bus_get(pe);
-		if (pe->type & EEH_PE_VF)
-			ret = pnv_eeh_reset_vf_pe(pe, option);
-		else if (pci_is_root_bus(bus) ||
-			pci_is_root_bus(bus->parent))
-			ret = pnv_eeh_root_reset(hose, option);
-		else
-			ret = pnv_eeh_bridge_reset(bus->self, option);
 	}
 
-	return ret;
+	bus = eeh_pe_bus_get(pe);
+	if (pe->type & EEH_PE_VF)
+		return pnv_eeh_reset_vf_pe(pe, option);
+
+	if (pci_is_root_bus(bus) ||
+	    pci_is_root_bus(bus->parent))
+		return pnv_eeh_root_reset(hose, option);
+
+	return pnv_eeh_bridge_reset(bus->self, option);
 }
 
 /**
-- 
2.1.0
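
The control flow that remains after the cleanup is a chain of early
returns. The sketch below models only the dispatch order; it deliberately
leaves out the P7IOC error-injection clearing. The names struct mock_pe,
PE_PHB, PE_VF, bus_depth and pick_reset() are hypothetical and stand in
for struct eeh_pe, the EEH_PE_* flags and the pci_is_root_bus() checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PE type flags; the values are illustrative only. */
#define PE_PHB	(1 << 0)
#define PE_VF	(1 << 1)

enum reset_target { RESET_PHB, RESET_VF, RESET_ROOT, RESET_BRIDGE };

struct mock_pe {
	int type;	/* bitmask of PE_* flags */
	int bus_depth;	/* 0 = root bus, 1 = child of the root bus, ... */
};

/* Each case returns as soon as it matches, so no "ret" is needed. */
enum reset_target pick_reset(const struct mock_pe *pe)
{
	assert(pe != NULL && pe->bus_depth >= 0);

	if (pe->type & PE_PHB)
		return RESET_PHB;

	if (pe->type & PE_VF)
		return RESET_VF;

	/* Root bus, or a bus whose parent is the root bus. */
	if (pe->bus_depth <= 1)
		return RESET_ROOT;

	return RESET_BRIDGE;
}
```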

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 34/45] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (28 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  8:57   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 35/45] powerpc/powernv: Fundamental reset " Gavin Shan
                   ` (9 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

The function pnv_pci_reset_secondary_bus() is only reached through the
call chain shown below, so it can never be called on the root bus. It
is therefore safe to remove the root bus case from the function. No
functional changes introduced.

   pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus()
   pci_reset_bridge_secondary_bus()
   pcibios_reset_secondary_bus()
   pnv_pci_reset_secondary_bus()

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 9226df1..593b8dc 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -868,16 +868,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
-	struct pci_controller *hose;
-
-	if (pci_is_root_bus(dev->bus)) {
-		hose = pci_bus_to_host(dev->bus);
-		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
-		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
-	} else {
-		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
-		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
-	}
+	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
+	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
 
 static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, const char *type,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 35/45] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (29 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 34/45] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
       [not found]   ` <1455680668-23298-36-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-02-17  3:44 ` [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID Gavin Shan
                   ` (8 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

In pnv_pci_reset_secondary_bus(), we should issue a fundamental reset
if any subordinate device of the specified bus requests it.
Otherwise, the device might not come up after the reset.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 593b8dc..c7454ba 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -866,9 +866,28 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
+{
+	int *freset = data;
+
+	/*
+	 * Stop the iteration immediately if any PCI device
+	 * requests a fundamental reset.
+	 */
+	*freset |= pdev->needs_freset;
+	return *freset;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
-	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
+	int option, freset = 0;
+
+	if (dev->subordinate)
+		pci_walk_bus(dev->subordinate,
+			     pnv_pci_dev_reset_type, &freset);
+
+	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
+	pnv_eeh_bridge_reset(dev, option);
 	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
 
-- 
2.1.0
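
The callback relies on the bus walk stopping as soon as a callback returns
non-zero. A minimal user-space sketch of that pattern follows; mock_walk()
is a hypothetical stand-in for pci_walk_bus() that walks a flat array
instead of a bus hierarchy, and struct mock_dev, dev_reset_type() and
choose_reset() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

struct mock_dev {
	unsigned int needs_freset : 1;
};

typedef int (*walk_cb)(struct mock_dev *dev, void *data);

/*
 * Hypothetical stand-in for pci_walk_bus(): visit each device and stop
 * at the first callback that returns non-zero. Returns the number of
 * devices visited so the early exit can be observed.
 */
size_t mock_walk(struct mock_dev *devs, size_t n, walk_cb cb, void *data)
{
	size_t visited = 0;

	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		visited++;
		if (cb(&devs[i], data))
			break;
	}
	return visited;
}

/* Same shape as pnv_pci_dev_reset_type() in the patch. */
int dev_reset_type(struct mock_dev *dev, void *data)
{
	int *freset = data;

	*freset |= dev->needs_freset;
	return *freset;
}

enum reset_option { RESET_HOT, RESET_FUNDAMENTAL };

/* One device asking for a fundamental reset decides for the whole bus. */
enum reset_option choose_reset(struct mock_dev *devs, size_t n, size_t *visited)
{
	int freset = 0;
	size_t count = mock_walk(devs, n, dev_reset_type, &freset);

	if (visited)
		*visited = count;
	return freset ? RESET_FUNDAMENTAL : RESET_HOT;
}
```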

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (30 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 35/45] powerpc/powernv: Fundamental reset " Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
       [not found]   ` <1455680668-23298-37-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-02-17  3:44 ` [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure Gavin Shan
                   ` (7 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

PowerNV platforms run on top of skiboot firmware, which includes
changes to support PCI slots. A PCI slot is identified either by the
PHB's ID or by the combination of the PHB's ID and the PCI slot ID.

This changes the EEH PowerNV backend to support PCI slots:

   * Rename arguments of opal_pci_reset() and opal_pci_poll().
   * Add one more argument (the PCI slot's state) to opal_pci_poll().
   * Drop pnv_eeh_phb_poll() and introduce a similar, enhanced
     function pnv_pci_poll() that will be used by the PowerNV hotplug
     backends.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h              |  4 +--
 arch/powerpc/platforms/powernv/eeh-powernv.c | 42 ++++++----------------------
 arch/powerpc/platforms/powernv/pci.c         | 21 ++++++++++++++
 arch/powerpc/platforms/powernv/pci.h         |  1 +
 4 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 07a99e6..9e0039f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -131,7 +131,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint16_t dma_window_number, uint64_t pci_start_addr,
 					uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
 				   uint64_t diag_buffer_len);
@@ -148,7 +148,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
 int64_t opal_set_system_attention_led(uint8_t led_action);
 int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
 			    __be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *state);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index c7454ba..e23b063 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -717,28 +717,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
-{
-	s64 rc = OPAL_HARDWARE;
-
-	while (1) {
-		rc = opal_pci_poll(phb->opal_id);
-		if (rc <= 0)
-			break;
-
-		if (system_state < SYSTEM_RUNNING)
-			udelay(1000 * rc);
-		else
-			msleep(rc);
-	}
-
-	return rc;
-}
-
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
 	s64 rc = OPAL_HARDWARE;
+	int ret;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
@@ -753,8 +736,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 		rc = opal_pci_reset(phb->opal_id,
 				    OPAL_RESET_PHB_COMPLETE,
 				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
 
 	/*
 	 * Poll state of the PHB until the request is done
@@ -762,24 +743,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 	 * reset followed by hot reset on root bus. So we also
 	 * need the PCI bus settlement delay.
 	 */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE) {
+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+	if (option == EEH_RESET_DEACTIVATE && !ret) {
 		if (system_state < SYSTEM_RUNNING)
 			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
 		else
 			msleep(EEH_PE_RST_SETTLE_TIME);
 	}
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
+	return ret;
 }
 
 static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
 	s64 rc = OPAL_HARDWARE;
+	int ret;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
@@ -801,18 +780,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 		rc = opal_pci_reset(phb->opal_id,
 				    OPAL_RESET_PCI_HOT,
 				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
 
 	/* Poll state of the PHB until the request is done */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE)
+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+	if (option == EEH_RESET_DEACTIVATE && !ret)
 		msleep(EEH_PE_RST_SETTLE_TIME);
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
+	return ret;
 }
 
 static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index b87a315..a458703 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -42,6 +42,27 @@
 #define cfg_dbg(fmt...)	do { } while(0)
 //#define cfg_dbg(fmt...)	printk(fmt)
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
+{
+	while (rval > 0) {
+		if (system_state < SYSTEM_RUNNING)
+			udelay(1000 * rval);
+		else
+			msleep(rval);
+
+		rval = opal_pci_poll(id, state);
+	}
+
+	/*
+	 * The caller expects to retrieve additional
+	 * information if the last argument isn't NULL.
+	 */
+	if (rval == OPAL_SUCCESS && state)
+		rval = opal_pci_poll(id, state);
+
+	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
+}
+
 #ifdef CONFIG_PCI_MSI
 int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 0cddde3..6857703 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -192,6 +192,7 @@ extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
 		unsigned long *hpa, enum dma_data_direction *direction);
 extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state);
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
 				unsigned char *log_buff);
 int pnv_pci_cfg_read(struct pci_dn *pdn,
-- 
2.1.0
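
The polling contract of pnv_pci_poll() can be tried out in user space
with a scripted firmware. As the diff reads, a positive return value is a
delay in milliseconds before the next poll, zero is success, and anything
negative is an error. In this sketch struct mock_fw, mock_poll() and
poll_until_done() are hypothetical names, the delay is only accumulated
rather than slept, and MOCK_EIO is a local constant standing in for EIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_SUCCESS	0	/* plays the role of OPAL_SUCCESS */
#define MOCK_EIO	5	/* local stand-in for EIO */

/* Scripted firmware: each poll returns the next value from script[]. */
struct mock_fw {
	const int64_t *script;
	size_t len, pos;
	uint64_t waited_ms;	/* total delay the caller would have slept */
	uint8_t final_state;	/* value reported through the state pointer */
};

int64_t mock_poll(struct mock_fw *fw, uint8_t *state)
{
	assert(fw->pos < fw->len);
	if (state)
		*state = fw->final_state;
	return fw->script[fw->pos++];
}

/* Same loop shape as pnv_pci_poll(); rval is the result of the request. */
int poll_until_done(struct mock_fw *fw, int64_t rval, uint8_t *state)
{
	while (rval > 0) {
		fw->waited_ms += (uint64_t)rval;	/* kernel: udelay()/msleep() */
		rval = mock_poll(fw, state);
	}

	/* One more poll fetches the state when the caller asked for it. */
	if (rval == MOCK_SUCCESS && state)
		rval = mock_poll(fw, state);

	return (rval == MOCK_SUCCESS) ? 0 : -MOCK_EIO;
}
```

Passing the initial return value in lets callers drop their own
"if (rc < 0) goto out" checks, since an error simply skips the loop.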

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (31 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  9:34   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
                   ` (6 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

The skiboot firmware might provide the PCI slot reset capability,
which is identified by the property "ibm,reset-by-firmware" on the
device node associated with the PCI slot.

This checks the property. If it exists, the reset request is routed
to firmware. Otherwise, the reset is done by the kernel as before.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e23b063..c8a5217 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -789,7 +789,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 	return ret;
 }
 
-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 {
 	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -840,6 +840,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
+	uint64_t id = (0x1ul << 60);
+	uint8_t scope;
+	int64_t rc;
+
+	/*
+	 * If the firmware can't handle it, we will issue hot reset
+	 * on the secondary bus despite the requested reset type.
+	 */
+	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
+		return __pnv_eeh_bridge_reset(pdev, option);
+
+	/* The firmware can handle the request */
+	switch (option) {
+	case EEH_RESET_HOT:
+		scope = OPAL_RESET_PCI_HOT;
+		break;
+	case EEH_RESET_FUNDAMENTAL:
+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
+		break;
+	case EEH_RESET_DEACTIVATE:
+		return 0;
+	default:
+		dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
+			 __func__, option);
+		return -EINVAL;
+	}
+
+	hose = pci_bus_to_host(pdev->bus);
+	phb = hose->private_data;
+	id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
+	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
+	return pnv_pci_poll(id, rc, NULL);
+}
+
 static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
 {
 	int *freset = data;
-- 
2.1.0
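
The slot ID passed to opal_pci_reset() packs several fields into one
64-bit value. The sketch below reproduces the layout used in the diff:
bit 60 set, bus number at bit 24, devfn at bit 16, PHB ID in the low
bits. slot_id() is a hypothetical helper, and the assertion that the PHB
ID fits in 16 bits is an assumption of this sketch, not something the
patch states. The explicit casts to uint64_t are also an addition: if
bus->number is promoted to int before the shift, as it appears to be in
the diff, a bus number of 0x80 or above would shift into the sign bit and
could sign-extend into the upper half of the ID.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a slot ID with the layout used by the patch:
 *   bit 60      marks the ID as a PCI slot rather than a bare PHB
 *   bits 31..24 bus number
 *   bits 23..16 devfn
 *   low bits    PHB ID (assumed here to fit in 16 bits)
 */
uint64_t slot_id(uint64_t phb_opal_id, uint8_t bus, uint8_t devfn)
{
	assert(phb_opal_id <= 0xffff);

	return (UINT64_C(1) << 60) |
	       ((uint64_t)bus << 24) |
	       ((uint64_t)devfn << 16) |
	       phb_opal_id;
}
```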

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status
@ 2016-02-17  3:44     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This exports 4 functions, which are based on the corresponding OPAL
APIs, to get/set PCI slot status. These functions are going to be
used by the PowerNV PCI hotplug driver:

   pnv_pci_get_device_tree()    opal_get_device_tree()
   pnv_pci_get_presence_state() opal_pci_get_presence_state()
   pnv_pci_get_power_state()    opal_pci_get_power_state()
   pnv_pci_set_power_state()    opal_pci_set_power_state()

In addition, the patch exports pnv_pci_hotplug_notifier_{register,
unregister}() to allow registration and unregistration of a PCI hotplug
notifier, which the PowerNV PCI hotplug driver will use to receive PCI
hotplug messages from skiboot firmware.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h            | 17 ++++++-
 arch/powerpc/include/asm/opal.h                |  4 ++
 arch/powerpc/include/asm/pnv-pci.h             |  7 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
 arch/powerpc/platforms/powernv/pci.c           | 66 ++++++++++++++++++++++++++
 5 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index f8faaae..a6af338 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -158,7 +158,11 @@
 #define OPAL_LEDS_SET_INDICATOR			115
 #define OPAL_CEC_REBOOT2			116
 #define OPAL_CONSOLE_FLUSH			117
-#define OPAL_LAST				117
+#define OPAL_GET_DEVICE_TREE			118
+#define OPAL_PCI_GET_PRESENCE_STATE		119
+#define OPAL_PCI_GET_POWER_STATE		120
+#define OPAL_PCI_SET_POWER_STATE		121
+#define OPAL_LAST				121
 
 /* Device tree flags */
 
@@ -344,6 +348,16 @@ enum OpalPciResetState {
 	OPAL_ASSERT_RESET   = 1
 };
 
+enum OpalPciSlotPresentenceState {
+	OPAL_PCI_SLOT_EMPTY	= 0,
+	OPAL_PCI_SLOT_PRESENT	= 1
+};
+
+enum OpalPciSlotPowerState {
+	OPAL_PCI_SLOT_POWER_OFF	= 0,
+	OPAL_PCI_SLOT_POWER_ON	= 1
+};
+
 enum OpalSlotLedType {
 	OPAL_SLOT_LED_TYPE_ID = 0,	/* IDENTIFY LED */
 	OPAL_SLOT_LED_TYPE_FAULT = 1,	/* FAULT LED */
@@ -378,6 +392,7 @@ enum opal_msg_type {
 	OPAL_MSG_DPO,
 	OPAL_MSG_PRD,
 	OPAL_MSG_OCC,
+	OPAL_MSG_PCI_HOTPLUG,
 	OPAL_MSG_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9e0039f..899bcb941 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
 		uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
 		uint64_t token);
+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
+int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 6f77f71..d9d095b 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,13 @@
 #include <linux/pci.h>
 #include <misc/cxl-base.h>
 
+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
+extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
+
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 			   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e45b88a..3ea1a855 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg,				OPAL_PRD_MSG);
 OPAL_CALL(opal_leds_get_ind,			OPAL_LEDS_GET_INDICATOR);
 OPAL_CALL(opal_leds_set_ind,			OPAL_LEDS_SET_INDICATOR);
 OPAL_CALL(opal_console_flush,			OPAL_CONSOLE_FLUSH);
+OPAL_CALL(opal_get_device_tree,			OPAL_GET_DEVICE_TREE);
+OPAL_CALL(opal_pci_get_presence_state,		OPAL_PCI_GET_PRESENCE_STATE);
+OPAL_CALL(opal_pci_get_power_state,		OPAL_PCI_GET_POWER_STATE);
+OPAL_CALL(opal_pci_set_power_state,		OPAL_PCI_SET_POWER_STATE);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index a458703..206385f 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -63,6 +63,72 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
 	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
 }
 
+int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_GET_DEVICE_TREE))
+		return -ENXIO;
+
+	rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_device_tree);
+
+int pnv_pci_get_presence_state(uint64_t id, uint8_t *state)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATE))
+		return -ENXIO;
+
+	rc = opal_pci_get_presence_state(id, state);
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_presence_state);
+
+int pnv_pci_get_power_state(uint64_t id, uint8_t *state)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_POWER_STATE))
+		return -ENXIO;
+
+	rc = opal_pci_get_power_state(id, state);
+	return pnv_pci_poll(id, rc, state);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_power_state);
+
+int pnv_pci_set_power_state(uint64_t id, uint8_t state)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_SET_POWER_STATE))
+		return -ENXIO;
+
+	rc = opal_pci_set_power_state(id, state);
+	return pnv_pci_poll(id, rc, NULL);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_set_power_state);
+
+int pnv_pci_hotplug_notifier_register(struct notifier_block *nb)
+{
+	return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_register);
+
+int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb)
+{
+	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_unregister);
+
 #ifdef CONFIG_PCI_MSI
 int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (32 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-04-19  9:42   ` Alexey Kardashevskiy
  2016-02-17  3:44 ` [PATCH v8 40/45] drivers/of: Split unflatten_dt_node() Gavin Shan
                   ` (5 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

The PowerNV PCI hotplug driver changes the device tree dynamically.
This selects CONFIG_OF_DYNAMIC to support that.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 604190c..e7b1ad7 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -18,6 +18,7 @@ config PPC_POWERNV
 	select CPU_FREQ_GOV_ONDEMAND
 	select CPU_FREQ_GOV_CONSERVATIVE
 	select PPC_DOORBELL
+	select OF_DYNAMIC
 	default y
 
 config OPAL_PRD
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 40/45] drivers/of: Split unflatten_dt_node()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (33 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-02-17 14:30   ` Rob Herring
  2016-02-17  3:44 ` [PATCH v8 41/45] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
                   ` (4 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

The function unflatten_dt_node() is called recursively to unflatten
device nodes and properties in the FDT blob. It is complicated and
hard to understand.

This splits the function into 3 functions: populate_properties(),
populate_node() and unflatten_dt_node(). populate_properties(),
which is called by populate_node(), creates the properties of the
indicated device node. populate_node() creates a device node from
the FDT blob. unflatten_dt_node() calls populate_node() and then
gets the offset of the next device node in the FDT blob. No logical
changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 249 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 147 insertions(+), 102 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 655f79d..3c69002 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -161,39 +161,127 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
 	return res;
 }
 
-/**
- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
- * @blob: The parent device tree blob
- * @mem: Memory chunk to use for allocating device nodes and properties
- * @poffset: pointer to node in flat tree
- * @dad: Parent struct device_node
- * @nodepp: The device_node tree created by the call
- * @fpsize: Size of the node path up at the current depth.
- * @dryrun: If true, do not allocate device nodes but still calculate needed
- * memory size
- */
-static void * unflatten_dt_node(const void *blob,
-				void *mem,
-				int *poffset,
-				struct device_node *dad,
-				struct device_node **nodepp,
-				unsigned long fpsize,
+static void populate_properties(const void *blob,
+				int offset,
+				void **mem,
+				struct device_node *np,
+				const char *nodename,
 				bool dryrun)
 {
-	const __be32 *p;
+	struct property *pp, **pprev = NULL;
+	int cur;
+	bool has_name = false;
+
+	pprev = &np->properties;
+	for (cur = fdt_first_property_offset(blob, offset);
+	     cur >= 0;
+	     cur = fdt_next_property_offset(blob, cur)) {
+		const __be32 *val;
+		const char *pname;
+		u32 sz;
+
+		val = fdt_getprop_by_offset(blob, cur, &pname, &sz);
+		if (!val) {
+			pr_warn("%s: Cannot locate property at 0x%x\n",
+				__func__, cur);
+			continue;
+		}
+
+		if (!pname) {
+			pr_warn("%s: Cannot find property name at 0x%x\n",
+				__func__, cur);
+			continue;
+		}
+
+		if (!strcmp(pname, "name"))
+			has_name = true;
+
+		pp = unflatten_dt_alloc(mem, sizeof(struct property),
+					__alignof__(struct property));
+		if (dryrun)
+			continue;
+
+		/* We accept flattened tree phandles either in
+		 * ePAPR-style "phandle" properties, or the
+		 * legacy "linux,phandle" properties.  If both
+		 * appear and have different values, things
+		 * will get weird. Don't do that.
+		 */
+		if (!strcmp(pname, "phandle") ||
+		    !strcmp(pname, "linux,phandle")) {
+			if (!np->phandle)
+				np->phandle = be32_to_cpup(val);
+		}
+
+		/* And we process the "ibm,phandle" property
+		 * used in pSeries dynamic device tree
+		 * stuff
+		 */
+		if (!strcmp(pname, "ibm,phandle"))
+			np->phandle = be32_to_cpup(val);
+
+		pp->name   = (char *)pname;
+		pp->length = sz;
+		pp->value  = (__be32 *)val;
+		*pprev     = pp;
+		pprev      = &pp->next;
+	}
+
+	/* With version 0x10 we may not have the name property,
+	 * recreate it here from the unit name if absent
+	 */
+	if (!has_name) {
+		const char *p = nodename, *ps = p, *pa = NULL;
+		int len;
+
+		while (*p) {
+			if ((*p) == '@')
+				pa = p;
+			else if ((*p) == '/')
+				ps = p + 1;
+			p++;
+		}
+
+		if (pa < ps)
+			pa = p;
+		len = (pa - ps) + 1;
+		pp = unflatten_dt_alloc(mem, sizeof(struct property) + len,
+					__alignof__(struct property));
+		if (!dryrun) {
+			pp->name   = "name";
+			pp->length = len;
+			pp->value  = pp + 1;
+			*pprev     = pp;
+			pprev      = &pp->next;
+			memcpy(pp->value, ps, len - 1);
+			((char *)pp->value)[len - 1] = 0;
+			pr_debug("fixed up name for %s -> %s\n",
+				 nodename, (char *)pp->value);
+		}
+	}
+
+	if (!dryrun)
+		*pprev = NULL;
+}
+
+static unsigned long populate_node(const void *blob,
+				   int offset,
+				   void **mem,
+				   struct device_node *dad,
+				   unsigned long fpsize,
+				   struct device_node **pnp,
+				   bool dryrun)
+{
 	struct device_node *np;
-	struct property *pp, **prev_pp = NULL;
 	const char *pathp;
 	unsigned int l, allocl;
-	static int depth;
-	int old_depth;
-	int offset;
-	int has_name = 0;
 	int new_format = 0;
 
-	pathp = fdt_get_name(blob, *poffset, &l);
-	if (!pathp)
-		return mem;
+	pathp = fdt_get_name(blob, offset, &l);
+	if (!pathp) {
+		*pnp = NULL;
+		return 0;
+	}
 
 	allocl = ++l;
 
@@ -223,7 +311,7 @@ static void * unflatten_dt_node(const void *blob,
 		}
 	}
 
-	np = unflatten_dt_alloc(&mem, sizeof(struct device_node) + allocl,
+	np = unflatten_dt_alloc(mem, sizeof(struct device_node) + allocl,
 				__alignof__(struct device_node));
 	if (!dryrun) {
 		char *fn;
@@ -246,89 +334,15 @@ static void * unflatten_dt_node(const void *blob,
 		}
 		memcpy(fn, pathp, l);
 
-		prev_pp = &np->properties;
 		if (dad != NULL) {
 			np->parent = dad;
 			np->sibling = dad->child;
 			dad->child = np;
 		}
 	}
-	/* process properties */
-	for (offset = fdt_first_property_offset(blob, *poffset);
-	     (offset >= 0);
-	     (offset = fdt_next_property_offset(blob, offset))) {
-		const char *pname;
-		u32 sz;
 
-		if (!(p = fdt_getprop_by_offset(blob, offset, &pname, &sz))) {
-			offset = -FDT_ERR_INTERNAL;
-			break;
-		}
-
-		if (pname == NULL) {
-			pr_info("Can't find property name in list !\n");
-			break;
-		}
-		if (strcmp(pname, "name") == 0)
-			has_name = 1;
-		pp = unflatten_dt_alloc(&mem, sizeof(struct property),
-					__alignof__(struct property));
-		if (!dryrun) {
-			/* We accept flattened tree phandles either in
-			 * ePAPR-style "phandle" properties, or the
-			 * legacy "linux,phandle" properties.  If both
-			 * appear and have different values, things
-			 * will get weird.  Don't do that. */
-			if ((strcmp(pname, "phandle") == 0) ||
-			    (strcmp(pname, "linux,phandle") == 0)) {
-				if (np->phandle == 0)
-					np->phandle = be32_to_cpup(p);
-			}
-			/* And we process the "ibm,phandle" property
-			 * used in pSeries dynamic device tree
-			 * stuff */
-			if (strcmp(pname, "ibm,phandle") == 0)
-				np->phandle = be32_to_cpup(p);
-			pp->name = (char *)pname;
-			pp->length = sz;
-			pp->value = (__be32 *)p;
-			*prev_pp = pp;
-			prev_pp = &pp->next;
-		}
-	}
-	/* with version 0x10 we may not have the name property, recreate
-	 * it here from the unit name if absent
-	 */
-	if (!has_name) {
-		const char *p1 = pathp, *ps = pathp, *pa = NULL;
-		int sz;
-
-		while (*p1) {
-			if ((*p1) == '@')
-				pa = p1;
-			if ((*p1) == '/')
-				ps = p1 + 1;
-			p1++;
-		}
-		if (pa < ps)
-			pa = p1;
-		sz = (pa - ps) + 1;
-		pp = unflatten_dt_alloc(&mem, sizeof(struct property) + sz,
-					__alignof__(struct property));
-		if (!dryrun) {
-			pp->name = "name";
-			pp->length = sz;
-			pp->value = pp + 1;
-			*prev_pp = pp;
-			prev_pp = &pp->next;
-			memcpy(pp->value, ps, sz - 1);
-			((char *)pp->value)[sz - 1] = 0;
-			pr_debug("fixed up name for %s -> %s\n", pathp,
-				(char *)pp->value);
-		}
-	}
+	populate_properties(blob, offset, mem, np, pathp, dryrun);
 	if (!dryrun) {
-		*prev_pp = NULL;
 		np->name = of_get_property(np, "name", NULL);
 		np->type = of_get_property(np, "device_type", NULL);
 
@@ -338,6 +352,37 @@ static void * unflatten_dt_node(const void *blob,
 			np->type = "<NULL>";
 	}
 
+	*pnp = np;
+	return fpsize;
+}
+
+/**
+ * unflatten_dt_node - Alloc and populate a device_node from the flat tree
+ * @blob: The parent device tree blob
+ * @mem: Memory chunk to use for allocating device nodes and properties
+ * @poffset: pointer to node in flat tree
+ * @dad: Parent struct device_node
+ * @nodepp: The device_node tree created by the call
+ * @fpsize: Size of the node path up at the current depth.
+ * @dryrun: If true, do not allocate device nodes but still calculate needed
+ * memory size
+ */
+static void *unflatten_dt_node(const void *blob,
+			       void *mem,
+			       int *poffset,
+			       struct device_node *dad,
+			       struct device_node **nodepp,
+			       unsigned long fpsize,
+			       bool dryrun)
+{
+	struct device_node *np;
+	static int depth;
+	int old_depth;
+
+	fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
+	if (!fpsize)
+		return mem;
+
 	old_depth = depth;
 	*poffset = fdt_next_node(blob, *poffset, &depth);
 	if (depth < 0)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread
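
For readers following the split in 40/45: the fallback at the end of
populate_properties() rebuilds a missing "name" property from the node path
by taking the last path component and cutting it at '@'. Below is a minimal
user-space sketch of that string handling. name_from_path() and its buffer
interface are hypothetical and not part of the patch; the explicit NULL check
on pa is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: derive the node name from a
 * path such as "/soc/serial@1000" and copy it into buf, following the
 * !has_name branch of populate_properties().
 * Returns the length including the terminating NUL, or 0 if buf is too small.
 */
static size_t name_from_path(const char *path, char *buf, size_t buflen)
{
	const char *p = path, *ps = path, *pa = NULL;
	size_t len;

	assert(path && buf);

	while (*p) {
		if (*p == '@')
			pa = p;		/* last '@' seen so far */
		else if (*p == '/')
			ps = p + 1;	/* start of the last path component */
		p++;
	}

	/* No '@' inside the last component: the name runs to end of string */
	if (pa == NULL || pa < ps)
		pa = p;

	len = (size_t)(pa - ps) + 1;
	if (len > buflen)
		return 0;

	memcpy(buf, ps, len - 1);
	buf[len - 1] = '\0';
	return len;
}
```

With this sketch, "/soc/serial@1000" yields "serial", and a path whose only
'@' sits in an earlier component, such as "/pci@0/bridge", yields "bridge".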

* [PATCH v8 41/45] drivers/of: Avoid recursively calling unflatten_dt_node()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (34 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 40/45] drivers/of: Split unflatten_dt_node() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-02-17 14:53     ` Rob Herring
  2016-02-17  3:44 ` [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
                   ` (3 subsequent siblings)
  39 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

In the current implementation, unflatten_dt_node() is called recursively
to unflatten device nodes in the FDT blob. This stresses the limited stack
capacity, especially when the function is adapted to unflatten a device
sub-tree that possibly has multiple root nodes. In that case, we run out
of stack and the system can't boot up successfully.

In order to reuse the function to unflatten a device sub-tree, this avoids
calling the function recursively, meaning the device nodes are unflattened
in one call to unflatten_dt_node(): two arrays are introduced to track the
parent path size and the device node at the current level of depth, which
are used by the device node at the next level of depth to be unflattened.
Device nodes nested deeper than the limit of 64 levels are dropped and,
hopefully, the system can boot up successfully with the partial device tree.

Also, the parameters "poffset" and "fpsize" are unused and dropped, and the
parameter "dryrun" is derived from "mem == NULL". Besides, the return value
of the function is changed to indicate the size of memory consumed by the
unflattened device tree, or an error code.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 122 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 74 insertions(+), 48 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 3c69002..667a5b2 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -356,63 +356,90 @@ static unsigned long populate_node(const void *blob,
 	return fpsize;
 }
 
+static void reverse_nodes(struct device_node *parent)
+{
+	struct device_node *child, *next;
+
+	/* In-depth first */
+	child = parent->child;
+	while (child) {
+		reverse_nodes(child);
+
+		child = child->sibling;
+	}
+
+	/* Reverse the nodes in the child list */
+	child = parent->child;
+	parent->child = NULL;
+	while (child) {
+		next = child->sibling;
+
+		child->sibling = parent->child;
+		parent->child = child;
+		child = next;
+	}
+}
+
 /**
  * unflatten_dt_node - Alloc and populate a device_node from the flat tree
  * @blob: The parent device tree blob
  * @mem: Memory chunk to use for allocating device nodes and properties
- * @poffset: pointer to node in flat tree
  * @dad: Parent struct device_node
  * @nodepp: The device_node tree created by the call
- * @fpsize: Size of the node path up at the current depth.
- * @dryrun: If true, do not allocate device nodes but still calculate needed
- * memory size
+ *
+ * It returns the size of unflattened device tree or error code
  */
-static void *unflatten_dt_node(const void *blob,
-			       void *mem,
-			       int *poffset,
-			       struct device_node *dad,
-			       struct device_node **nodepp,
-			       unsigned long fpsize,
-			       bool dryrun)
+static int unflatten_dt_node(const void *blob,
+			     void *mem,
+			     struct device_node *dad,
+			     struct device_node **nodepp)
 {
-	struct device_node *np;
-	static int depth;
-	int old_depth;
+	struct device_node *root;
+	int offset = 0, depth = 0;
+#define FDT_MAX_DEPTH	64
+	unsigned long fpsizes[FDT_MAX_DEPTH];
+	struct device_node *nps[FDT_MAX_DEPTH];
+	void *base = mem;
+	bool dryrun = !base;
 
-	fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
-	if (!fpsize)
-		return mem;
+	if (nodepp)
+		*nodepp = NULL;
+
+	root = dad;
+	fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
+	nps[depth++] = dad;
+	for (offset = 0;
+	     offset >= 0;
+	     offset = fdt_next_node(blob, offset, &depth)) {
+		if (WARN_ON_ONCE(depth >= FDT_MAX_DEPTH))
+			continue;
 
-	old_depth = depth;
-	*poffset = fdt_next_node(blob, *poffset, &depth);
-	if (depth < 0)
-		depth = 0;
-	while (*poffset > 0 && depth > old_depth)
-		mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
-					fpsize, dryrun);
+		fpsizes[depth] = populate_node(blob, offset, &mem,
+					       nps[depth - 1],
+					       fpsizes[depth - 1],
+					       &nps[depth], dryrun);
+		if (!fpsizes[depth])
+			return mem - base;
+
+		if (!dryrun && nodepp && !*nodepp)
+			*nodepp = nps[depth];
+		if (!dryrun && !root)
+			root = nps[depth];
+	}
 
-	if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
-		pr_err("unflatten: error %d processing FDT\n", *poffset);
+	if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
+		pr_err("%s: Error %d processing FDT\n", __func__, offset);
+		return -EINVAL;
+	}
 
 	/*
 	 * Reverse the child list. Some drivers assumes node order matches .dts
 	 * node order
 	 */
-	if (!dryrun && np->child) {
-		struct device_node *child = np->child;
-		np->child = NULL;
-		while (child) {
-			struct device_node *next = child->sibling;
-			child->sibling = np->child;
-			np->child = child;
-			child = next;
-		}
-	}
-
-	if (nodepp)
-		*nodepp = np;
+	if (!dryrun)
+		reverse_nodes(root);
 
-	return mem;
+	return mem - base;
 }
 
 /**
@@ -431,8 +458,7 @@ static void __unflatten_device_tree(const void *blob,
 			     struct device_node **mynodes,
 			     void * (*dt_alloc)(u64 size, u64 align))
 {
-	unsigned long size;
-	int start;
+	int size;
 	void *mem;
 
 	pr_debug(" -> unflatten_device_tree()\n");
@@ -453,11 +479,12 @@ static void __unflatten_device_tree(const void *blob,
 	}
 
 	/* First pass, scan for size */
-	start = 0;
-	size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
-	size = ALIGN(size, 4);
+	size = unflatten_dt_node(blob, NULL, NULL, NULL);
+	if (size < 0)
+		return;
 
-	pr_debug("  size is %lx, allocating...\n", size);
+	size = ALIGN(size, 4);
+	pr_debug("  size is %d, allocating...\n", size);
 
 	/* Allocate memory for the expanded device tree */
 	mem = dt_alloc(size + 4, __alignof__(struct device_node));
@@ -468,8 +495,7 @@ static void __unflatten_device_tree(const void *blob,
 	pr_debug("  unflattening %p...\n", mem);
 
 	/* Second pass, do actual unflattening */
-	start = 0;
-	unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
+	unflatten_dt_node(blob, mem, NULL, mynodes);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread
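
The loop introduced in 41/45 can be illustrated outside the kernel. The
sketch below assumes a hypothetical pre-order input of (depth, name) pairs in
place of fdt_next_node(), and a caller-provided node pool in place of
unflatten_dt_alloc(). struct node, struct flat_entry, build_tree() and
reverse_children() are illustration names only. Like the patch, it keeps one
"most recent node" slot per depth, prepends each node to its parent's child
list, and restores source order afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64	/* plays the role of FDT_MAX_DEPTH */

struct node {
	const char *name;
	struct node *parent, *child, *sibling;
};

/* Hypothetical flattened input: nodes in pre-order, root at depth 1 */
struct flat_entry {
	int depth;
	const char *name;
};

/* Restore source order of every child list, as reverse_nodes() does */
static void reverse_children(struct node *parent)
{
	struct node *child, *next;

	for (child = parent->child; child; child = child->sibling)
		reverse_children(child);

	child = parent->child;
	parent->child = NULL;
	while (child) {
		next = child->sibling;
		child->sibling = parent->child;
		parent->child = child;
		child = next;
	}
}

/*
 * Build the tree in a single pass. nps[d] holds the most recent node at
 * depth d, so a node at depth d finds its parent in nps[d - 1].
 * pool must provide one struct node per input entry.
 * Returns the first root, or NULL if the input is empty or malformed.
 */
static struct node *build_tree(const struct flat_entry *in, size_t n,
			       struct node *pool)
{
	struct node *nps[MAX_DEPTH] = { NULL };
	struct node *root = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		int depth = in[i].depth;
		struct node *np = &pool[i];
		struct node *dad;

		if (depth < 1)
			return NULL;
		if (depth >= MAX_DEPTH)
			continue;	/* too deep: drop the node */

		dad = nps[depth - 1];
		if (depth > 1 && !dad)
			return NULL;	/* child without a parent */

		np->name = in[i].name;
		np->parent = dad;
		np->child = NULL;
		np->sibling = NULL;
		if (dad) {
			/* Prepend: the list is in reverse order for now */
			np->sibling = dad->child;
			dad->child = np;
		}

		nps[depth] = np;
		if (!root)
			root = np;
	}

	if (root)
		reverse_children(root);
	return root;
}
```

Note that the reversal step is still recursive, both here and in
reverse_nodes(), but it carries far less state per level than the original
unflatten_dt_node() frame did.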

* [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:44     ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                       ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This renames unflatten_dt_node() to unflatten_dt_nodes() as it
populates multiple device nodes from FDT blob. No logical changes
introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 667a5b2..3fc9a30 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -381,7 +381,7 @@ static void reverse_nodes(struct device_node *parent)
 }
 
 /**
- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
+ * unflatten_dt_nodes - Alloc and populate a device_node from the flat tree
  * @blob: The parent device tree blob
  * @mem: Memory chunk to use for allocating device nodes and properties
  * @dad: Parent struct device_node
@@ -389,10 +389,10 @@ static void reverse_nodes(struct device_node *parent)
  *
  * It returns the size of unflattened device tree or error code
  */
-static int unflatten_dt_node(const void *blob,
-			     void *mem,
-			     struct device_node *dad,
-			     struct device_node **nodepp)
+static int unflatten_dt_nodes(const void *blob,
+			      void *mem,
+			      struct device_node *dad,
+			      struct device_node **nodepp)
 {
 	struct device_node *root;
 	int offset = 0, depth = 0;
@@ -479,7 +479,7 @@ static void __unflatten_device_tree(const void *blob,
 	}
 
 	/* First pass, scan for size */
-	size = unflatten_dt_node(blob, NULL, NULL, NULL);
+	size = unflatten_dt_nodes(blob, NULL, NULL, NULL);
 	if (size < 0)
 		return;
 
@@ -495,7 +495,7 @@ static void __unflatten_device_tree(const void *blob,
 	pr_debug("  unflattening %p...\n", mem);
 
 	/* Second pass, do actual unflattening */
-	unflatten_dt_node(blob, mem, NULL, mynodes);
+	unflatten_dt_nodes(blob, mem, NULL, mynodes);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (35 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 41/45] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
  2016-02-17 15:00   ` Rob Herring
  2016-02-17 15:58     ` Jyri Sarha
  2016-02-17  3:44 ` [PATCH v8 44/45] drivers/of: Return allocated memory from of_fdt_unflatten_tree() Gavin Shan
                   ` (2 subsequent siblings)
  39 siblings, 2 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan, Jyri Sarha

This adds one more argument to of_fdt_unflatten_tree() to specify
the parent node of the FDT blob that is going to be unflattened.
As a result, the function can be used in the PowerNV PCI hotplug
driver to unflatten an FDT blob that represents a device sub-tree.

Cc: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c |  2 +-
 drivers/of/fdt.c                             | 14 ++++++++++----
 drivers/of/unittest.c                        |  2 +-
 include/linux/of_fdt.h                       |  1 +
 4 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c b/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
index 106679b..f9c79da 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
@@ -157,7 +157,7 @@ struct device_node * __init tilcdc_get_overlay(struct kfree_table *kft)
 	if (!overlay_data || kfree_table_add(kft, overlay_data))
 		return NULL;
 
-	of_fdt_unflatten_tree(overlay_data, &overlay);
+	of_fdt_unflatten_tree(overlay_data, NULL, &overlay);
 	if (!overlay) {
 		pr_warn("%s: Unfattening overlay tree failed\n", __func__);
 		return NULL;
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 3fc9a30..16a1ba5 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -450,11 +450,13 @@ static int unflatten_dt_nodes(const void *blob,
  * pointers of the nodes so the normal device-tree walking functions
  * can be used.
  * @blob: The blob to expand
+ * @dad: Parent device node
  * @mynodes: The device_node tree created by the call
  * @dt_alloc: An allocator that provides a virtual address to memory
  * for the resulting tree
  */
 static void __unflatten_device_tree(const void *blob,
+			     struct device_node *dad,
 			     struct device_node **mynodes,
 			     void * (*dt_alloc)(u64 size, u64 align))
 {
@@ -479,7 +481,7 @@ static void __unflatten_device_tree(const void *blob,
 	}
 
 	/* First pass, scan for size */
-	size = unflatten_dt_nodes(blob, NULL, NULL, NULL);
+	size = unflatten_dt_nodes(blob, NULL, dad, NULL);
 	if (size < 0)
 		return;
 
@@ -495,7 +497,7 @@ static void __unflatten_device_tree(const void *blob,
 	pr_debug("  unflattening %p...\n", mem);
 
 	/* Second pass, do actual unflattening */
-	unflatten_dt_nodes(blob, mem, NULL, mynodes);
+	unflatten_dt_nodes(blob, mem, dad, mynodes);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
@@ -512,6 +514,9 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
 
 /**
  * of_fdt_unflatten_tree - create tree of device_nodes from flat blob
+ * @blob: Flat device tree blob
+ * @dad: Parent device node
+ * @mynodes: The device tree created by the call
  *
  * unflattens the device-tree passed by the firmware, creating the
  * tree of struct device_node. It also fills the "name" and "type"
@@ -519,10 +524,11 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
  * can be used.
  */
 void of_fdt_unflatten_tree(const unsigned long *blob,
+			struct device_node *dad,
 			struct device_node **mynodes)
 {
 	mutex_lock(&of_fdt_unflatten_mutex);
-	__unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
+	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
 	mutex_unlock(&of_fdt_unflatten_mutex);
 }
 EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
@@ -1180,7 +1186,7 @@ bool __init early_init_dt_scan(void *params)
  */
 void __init unflatten_device_tree(void)
 {
-	__unflatten_device_tree(initial_boot_params, &of_root,
+	__unflatten_device_tree(initial_boot_params, NULL, &of_root,
 				early_init_dt_alloc_memory_arch);
 
 	/* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 979b6e4..ec36f93 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -921,7 +921,7 @@ static int __init unittest_data_add(void)
 			"not running tests\n", __func__);
 		return -ENOMEM;
 	}
-	of_fdt_unflatten_tree(unittest_data, &unittest_data_node);
+	of_fdt_unflatten_tree(unittest_data, NULL, &unittest_data_node);
 	if (!unittest_data_node) {
 		pr_warn("%s: No tree to attach; not running tests\n", __func__);
 		return -ENODATA;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index df9ef38..3644960 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -38,6 +38,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
 extern int of_fdt_match(const void *blob, unsigned long node,
 			const char *const *compat);
 extern void of_fdt_unflatten_tree(const unsigned long *blob,
+			       struct device_node *dad,
 			       struct device_node **mynodes);
 
 /* TBD: Temporary export of fdt globals - remove when code fully merged */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread
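
The context lines in the hunks above show the two-pass scheme used by
__unflatten_device_tree(): a sizing pass with mem == NULL, an allocation of
size + 4 bytes, the real pass, and a check of the end marker. A minimal
user-space sketch of the same idea follows, packing strings instead of device
nodes. pack_strings() and pack_checked() are hypothetical names, malloc()
stands in for dt_alloc(), and the marker is stored in host byte order rather
than big-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define END_MARKER 0xdeadbeefu

/*
 * Copy n strings into mem, each starting on a 4-byte boundary.
 * With mem == NULL nothing is written and only the size is computed
 * (the "dry run"). Returns the number of bytes needed or consumed.
 */
static size_t pack_strings(void *mem, const char *const *strs, size_t n)
{
	size_t off = 0, i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(strs[i]) + 1;

		off = (off + 3) & ~(size_t)3;	/* align the record */
		if (mem)
			memcpy((char *)mem + off, strs[i], len);
		off += len;
	}
	return off;
}

/*
 * Two passes: compute the size, allocate with room for a marker, fill,
 * then verify the marker was not overwritten. The chunk is returned so
 * the caller can free() it later. Returns NULL on failure.
 */
static void *pack_checked(const char *const *strs, size_t n, size_t *out_size)
{
	const uint32_t marker = END_MARKER;
	uint32_t check;
	size_t size;
	char *mem;

	size = pack_strings(NULL, strs, n);	/* first pass: size only */
	size = (size + 3) & ~(size_t)3;

	mem = malloc(size + sizeof(marker));
	if (!mem)
		return NULL;
	memcpy(mem + size, &marker, sizeof(marker));

	pack_strings(mem, strs, n);		/* second pass: real work */

	memcpy(&check, mem + size, sizeof(check));
	if (check != marker) {
		free(mem);
		return NULL;
	}

	*out_size = size;
	return mem;
}
```

Because both passes run the same code path, the size computed in the first
pass cannot drift from what the second pass writes, and the marker catches
the case where it does anyway.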

* [PATCH v8 44/45] drivers/of: Return allocated memory from of_fdt_unflatten_tree()
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (36 preceding siblings ...)
  2016-02-17  3:44 ` [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
@ 2016-02-17  3:44 ` Gavin Shan
       [not found] ` <1455680668-23298-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-04-13  7:28 ` [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Alexey Kardashevskiy
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This returns the allocated memory chunk, which stores the unflattened
device tree, from of_fdt_unflatten_tree() so that the memory chunk can
be released on demand in the PowerNV PCI hotplug driver.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 drivers/of/fdt.c       | 33 ++++++++++++++++++++++-----------
 include/linux/of_fdt.h |  6 +++---
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 16a1ba5..47ec278 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -454,11 +454,14 @@ static int unflatten_dt_nodes(const void *blob,
  * @mynodes: The device_node tree created by the call
  * @dt_alloc: An allocator that provides a virtual address to memory
  * for the resulting tree
+ *
+ * Returns NULL on failure or the memory chunk containing the unflattened
+ * device tree on success.
  */
-static void __unflatten_device_tree(const void *blob,
-			     struct device_node *dad,
-			     struct device_node **mynodes,
-			     void * (*dt_alloc)(u64 size, u64 align))
+static void *__unflatten_device_tree(const void *blob,
+				     struct device_node *dad,
+				     struct device_node **mynodes,
+				     void *(*dt_alloc)(u64 size, u64 align))
 {
 	int size;
 	void *mem;
@@ -467,7 +470,7 @@ static void __unflatten_device_tree(const void *blob,
 
 	if (!blob) {
 		pr_debug("No device tree pointer\n");
-		return;
+		return NULL;
 	}
 
 	pr_debug("Unflattening device tree:\n");
@@ -477,13 +480,13 @@ static void __unflatten_device_tree(const void *blob,
 
 	if (fdt_check_header(blob)) {
 		pr_err("Invalid device tree blob header\n");
-		return;
+		return NULL;
 	}
 
 	/* First pass, scan for size */
 	size = unflatten_dt_nodes(blob, NULL, dad, NULL);
 	if (size < 0)
-		return;
+		return NULL;
 
 	size = ALIGN(size, 4);
 	pr_debug("  size is %d, allocating...\n", size);
@@ -503,6 +506,7 @@ static void __unflatten_device_tree(const void *blob,
 			   be32_to_cpup(mem + size));
 
 	pr_debug(" <- unflatten_device_tree()\n");
+	return mem;
 }
 
 static void *kernel_tree_alloc(u64 size, u64 align)
@@ -522,14 +526,21 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
  * tree of struct device_node. It also fills the "name" and "type"
  * pointers of the nodes so the normal device-tree walking functions
  * can be used.
+ *
+ * Returns NULL on failure or the memory chunk containing the unflattened
+ * device tree on success.
  */
-void of_fdt_unflatten_tree(const unsigned long *blob,
-			struct device_node *dad,
-			struct device_node **mynodes)
+void *of_fdt_unflatten_tree(const unsigned long *blob,
+			    struct device_node *dad,
+			    struct device_node **mynodes)
 {
+	void *mem;
+
 	mutex_lock(&of_fdt_unflatten_mutex);
-	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
+	mem = __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
 	mutex_unlock(&of_fdt_unflatten_mutex);
+
+	return mem;
 }
 EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
 
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 3644960..b87b26a7 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -37,9 +37,9 @@ extern bool of_fdt_is_big_endian(const void *blob,
 				 unsigned long node);
 extern int of_fdt_match(const void *blob, unsigned long node,
 			const char *const *compat);
-extern void of_fdt_unflatten_tree(const unsigned long *blob,
-			       struct device_node *dad,
-			       struct device_node **mynodes);
+extern void *of_fdt_unflatten_tree(const unsigned long *blob,
+				   struct device_node *dad,
+				   struct device_node **mynodes);
 
 /* TBD: Temporary export of fdt globals - remove when code fully merged */
 extern int __initdata dt_root_addr_cells;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2016-02-17  3:44     ` Gavin Shan
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                       ` (38 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This adds a standalone driver to support PCI hotplug on the PowerPC
PowerNV platform, which runs on top of skiboot firmware. The firmware
identifies hotpluggable slots and marks their device tree nodes with the
"ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The driver
scans the device tree nodes to create and register PCI hotplug slots
accordingly.

The PCI slots are organized as a tree: a PCI slot may have a parent PCI
slot, and a parent PCI slot may contain multiple child PCI slots. At
plug time, the parent PCI slot is populated before its children. At
removal time, the child PCI slots are removed before their parent PCI
slot can be removed from the system.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have the "ibm,reset-by-firmware" property. In
that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't yet export the capability to access
attention LEDs; that support is still to be done.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/Kconfig   |  12 +
 drivers/pci/hotplug/Makefile  |   3 +
 drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 885 insertions(+)
 create mode 100644 drivers/pci/hotplug/pnv_php.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..167c8ce 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+	tristate "PowerPC PowerNV PCI Hotplug driver"
+	depends on PPC_POWERNV && EEH
+	help
+	  Say Y here if you run a PowerPC PowerNV platform that supports
+	  PCI Hotplug.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called pnv-php.
+
+	  When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 	tristate "RPA PCI Hotplug driver"
 	depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index b616e75..e33cdda 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
@@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
 acpiphp-objs		:=	acpiphp_core.o	\
 				acpiphp_glue.o
 
+pnv-php-objs		:=	pnv_php.o
+
 rpaphp-objs		:=	rpaphp_core.o	\
 				rpaphp_pci.o	\
 				rpaphp_slot.o
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
new file mode 100644
index 0000000..364ec36
--- /dev/null
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -0,0 +1,870 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/libfdt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+#include <asm/ppc-pci.h>
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
+
+struct pnv_php_slot {
+	struct hotplug_slot		slot;
+	struct hotplug_slot_info	slot_info;
+	uint64_t			id;
+	char				*name;
+	int				slot_no;
+	struct kref			kref;
+#define PNV_PHP_STATE_INITIALIZED	0
+#define PNV_PHP_STATE_REGISTERED	1
+#define PNV_PHP_STATE_POPULATED		2
+	int				state;
+	struct device_node		*dn;
+	struct pci_dev			*pdev;
+	struct pci_bus			*bus;
+	bool				power_state_check;
+	int				power_state_confirmed;
+#define PNV_PHP_POWER_CONFIRMED_INVALID	0
+#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
+#define PNV_PHP_POWER_CONFIRMED_FAIL	2
+	struct opal_msg			*msg;
+	void				*fdt;
+	void				*dt;
+	struct of_changeset		ocs;
+	struct work_struct		work;
+	wait_queue_head_t		queue;
+	struct pnv_php_slot		*parent;
+	struct list_head		children;
+	struct list_head		link;
+};
+
+static LIST_HEAD(pnv_php_slot_list);
+static DEFINE_SPINLOCK(pnv_php_lock);
+
+static void pnv_php_register(struct device_node *dn);
+static void pnv_php_unregister_one(struct device_node *dn);
+static void pnv_php_unregister(struct device_node *dn);
+
+static void pnv_php_free_slot(struct kref *kref)
+{
+	struct pnv_php_slot *php_slot = container_of(kref,
+						     struct pnv_php_slot,
+						     kref);
+
+	WARN_ON(!list_empty(&php_slot->children));
+	kfree(php_slot->name);
+	kfree(php_slot);
+}
+
+static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
+{
+	if (!php_slot)
+		return;
+
+	kref_put(&php_slot->kref, pnv_php_free_slot);
+}
+
+static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
+					  struct pnv_php_slot *php_slot)
+{
+	struct pnv_php_slot *target, *tmp;
+
+	if (php_slot->dn == dn) {
+		kref_get(&php_slot->kref);
+		return php_slot;
+	}
+
+	list_for_each_entry(tmp, &php_slot->children, link) {
+		target = pnv_php_match(dn, tmp);
+		if (target)
+			return target;
+	}
+
+	return NULL;
+}
+
+static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
+		php_slot = pnv_php_match(dn, tmp);
+		if (php_slot) {
+			spin_unlock_irqrestore(&pnv_php_lock, flags);
+			return php_slot;
+		}
+	}
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	return NULL;
+}
+
+/*
+ * Remove pdn for all children of the indicated device node.
+ * The function should remove pdn in a depth-first manner.
+ */
+static void pnv_php_rmv_pdns(struct device_node *dn)
+{
+	struct device_node *child;
+
+	for_each_child_of_node(dn, child) {
+		pnv_php_rmv_pdns(child);
+
+		pci_remove_device_node_info(child);
+	}
+}
+
+/*
+ * Remove all child nodes of the indicated device nodes. The
+ * function should remove device nodes in depth-first manner.
+ */
+static int pnv_php_rmv_device_nodes(struct device_node *parent)
+{
+	struct device_node *dn, *child;
+	int ret = 0;
+
+	for_each_child_of_node(parent, dn) {
+		ret = pnv_php_rmv_device_nodes(dn);
+		if (ret)
+			return ret;
+
+		child = of_get_next_child(dn, NULL);
+		if (child) {
+			of_node_put(child);
+			of_node_put(dn);
+			pr_err("%s: Alive children of node <%s>\n",
+			       __func__, of_node_full_name(dn));
+			return -EBUSY;
+		}
+
+		of_detach_node(dn);
+		of_node_put(dn);
+	}
+
+	return 0;
+}
+
+/*
+ * The function processes the message sent by firmware
+ * to remove all device tree nodes beneath the slot's
+ * nodes and the associated auxiliary data.
+ */
+static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
+{
+	int ret;
+
+	pnv_php_rmv_pdns(php_slot->dn);
+
+	/*
+	 * If the device sub-tree was created from an OF changeset, simply
+	 * revert that. Otherwise, the device nodes in the sub-tree
+	 * need to be iterated and detached.
+	 */
+	if (php_slot->fdt) {
+		of_changeset_destroy(&php_slot->ocs);
+		kfree(php_slot->dt);
+		kfree(php_slot->fdt);
+		php_slot->dt        = NULL;
+		php_slot->dn->child = NULL;
+		php_slot->fdt       = NULL;
+		php_slot->power_state_confirmed =
+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
+		wake_up_interruptible(&php_slot->queue);
+		return;
+	}
+
+	ret = pnv_php_rmv_device_nodes(php_slot->dn);
+	if (!ret) {
+		php_slot->power_state_confirmed =
+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
+	} else {
+		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
+		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
+	}
+
+	wake_up_interruptible(&php_slot->queue);
+}
+
+static int pnv_php_populate_changeset(struct of_changeset *ocs,
+				      struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	for_each_child_of_node(dn, child) {
+		ret = of_changeset_attach_node(ocs, child);
+		if (ret)
+			break;
+
+		ret = pnv_php_populate_changeset(ocs, child);
+	}
+
+	return ret;
+}
+
+static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
+{
+	struct pci_controller *hose = (struct pci_controller *)data;
+	struct pci_dn *pdn;
+
+	pdn = pci_add_device_node_info(hose, dn);
+	if (!pdn)
+		return ERR_PTR(-ENOMEM);
+
+	return NULL;
+}
+
+static void pnv_php_add_pdns(struct pnv_php_slot *slot)
+{
+	struct pci_controller *hose = pci_bus_to_host(slot->bus);
+
+	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
+}
+
+static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
+{
+	void *fdt, *fdt1, *dt;
+	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
+	int ret;
+
+	/* We don't know the FDT blob size. We fetch it into a
+	 * maximum-size memory chunk and then copy it to another chunk
+	 * that fits the real size.
+	 */
+	fdt1 = kzalloc(0x10000, GFP_KERNEL);
+	if (!fdt1)
+		goto error;
+
+	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
+	if (ret)
+		goto free_fdt1;
+
+	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
+	if (!fdt)
+		goto free_fdt1;
+
+	/* Unflatten device tree blob */
+	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
+	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
+	if (!dt) {
+		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
+		goto free_fdt;
+	}
+
+	/* Initialize and apply the changeset */
+	of_changeset_init(&php_slot->ocs);
+	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
+			 ret);
+		goto free_dt;
+	}
+
+	php_slot->dn->child = NULL;
+	ret = of_changeset_apply(&php_slot->ocs);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
+			 ret);
+		goto destroy_changeset;
+	}
+
+	/* Add device node firmware data */
+	pnv_php_add_pdns(php_slot);
+	php_slot->fdt = fdt;
+	php_slot->dt  = dt;
+	goto out;
+
+destroy_changeset:
+	of_changeset_destroy(&php_slot->ocs);
+free_dt:
+	kfree(dt);
+	php_slot->dn->child = NULL;
+free_fdt:
+	kfree(fdt);
+free_fdt1:
+	kfree(fdt1);
+error:
+	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
+out:
+	/* Confirm status change */
+	php_slot->power_state_confirmed = confirm;
+	wake_up_interruptible(&php_slot->queue);
+}
+
+static void pnv_php_work(struct work_struct *data)
+{
+	struct pnv_php_slot *php_slot = container_of(data,
+						     struct pnv_php_slot,
+						     work);
+	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
+
+	if (event == OPAL_PCI_SLOT_POWER_OFF)
+		pnv_php_handle_poweroff(php_slot);
+	else
+		pnv_php_handle_poweron(php_slot);
+
+	pnv_php_put_slot(php_slot);
+}
+
+static int pnv_php_handle_msg(struct notifier_block *nb,
+			      unsigned long type,
+			      void *message)
+{
+	phandle h;
+	struct device_node *dn;
+	struct pnv_php_slot *php_slot;
+	struct opal_msg *msg = message;
+
+	if (type != OPAL_MSG_PCI_HOTPLUG) {
+		pr_warn("%s: Invalid message %ld received!\n",
+			__func__, type);
+		return NOTIFY_DONE;
+	}
+
+	h = (phandle)be64_to_cpu(msg->params[1]);
+	dn = of_find_node_by_phandle(h);
+	if (!dn) {
+		pr_warn("%s: No device node for phandle 0x%x\n",
+			__func__, h);
+		return NOTIFY_DONE;
+	}
+
+	php_slot = pnv_php_find_slot(dn);
+	if (!php_slot) {
+		pr_warn("%s: No slot found for node <%s>\n",
+			__func__, of_node_full_name(dn));
+		of_node_put(dn);
+		return NOTIFY_DONE;
+	}
+
+	of_node_put(dn);
+	php_slot->msg = msg;
+	schedule_work(&php_slot->work);
+	return NOTIFY_OK;
+}
+
+static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	int ret;
+
+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
+	ret = pnv_pci_set_power_state(php_slot->id, state);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
+			 ret, state ? "on" : "off");
+		return ret;
+	}
+
+	/* Continue to PCI probing after finalized device-tree. The
+	 * device-tree might have been updated completely at this
+	 * point. Thus we don't have to wait forever.
+	 */
+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
+		return 0;
+
+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
+		return -EBUSY;
+
+	/* Wait for firmware to add or remove device sub-tree. When it's done,
+	 * one signal is received from firmware.
+	 */
+	ret = wait_event_timeout(php_slot->queue,
+				 php_slot->power_state_confirmed, 10 * HZ);
+	if (!ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
+			 ret, state ? "on" : "off");
+		return -EBUSY;
+	}
+
+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
+		return 0;
+
+	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
+		 php_slot->power_state_confirmed, state ? "on" : "off");
+	return -EBUSY;
+}
+
+static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	uint8_t power_state;
+	int ret;
+
+	/*
+	 * Retrieve power status from firmware. If we fail
+	 * to get that, the power status falls back to
+	 * on.
+	 */
+	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
+	if (ret) {
+		*state = OPAL_PCI_SLOT_POWER_ON;
+		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
+			 ret);
+	} else {
+		*state = power_state;
+		slot->info->power_status = power_state;
+	}
+
+	return 0;
+}
+
+static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	uint8_t presence;
+	int ret;
+
+	/*
+	 * Retrieve presence status from firmware. If we can't
+	 * get that, it falls back to empty.
+	 */
+	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
+	if (ret >= 0) {
+		*state = presence;
+		slot->info->adapter_status = presence;
+		ret = 0;
+	} else {
+		*state = OPAL_PCI_SLOT_EMPTY;
+		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
+			 ret);
+	}
+
+	return ret;
+}
+
+static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
+{
+	/* FIXME: Make it real once firmware supports it */
+	slot->info->attention_status = state;
+
+	return 0;
+}
+
+static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
+{
+	struct hotplug_slot *slot = &php_slot->slot;
+	uint8_t presence, power_status;
+	int ret;
+
+	/* Check if the slot has been configured */
+	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
+		return 0;
+
+	/* Retrieve slot presence status */
+	ret = pnv_php_get_adapter_state(slot, &presence);
+	if (ret)
+		return ret;
+
+	/* Proceed if there is nothing behind the slot */
+	if (presence == OPAL_PCI_SLOT_EMPTY)
+		goto scan;
+
+	/*
+	 * If the power supply to the slot is off, we can't detect
+	 * adapter presence state. That means we have to turn the
+	 * slot on before going to probe slot's presence state.
+	 *
+	 * On the first time, we don't change the power status to
+	 * boost system boot with assumption that the firmware
+	 * supplies consistent slot power status: empty slot always
+	 * has its power off and non-empty slot has its power on.
+	 */
+	if (!php_slot->power_state_check) {
+		php_slot->power_state_check = true;
+
+		ret = pnv_php_get_power_state(slot, &power_status);
+		if (ret)
+			return ret;
+
+		if (power_status != OPAL_PCI_SLOT_POWER_ON)
+			return 0;
+	}
+
+	/* Check the power status. Scan the slot if that's already on */
+	ret = pnv_php_get_power_state(slot, &power_status);
+	if (ret)
+		return ret;
+
+	if (power_status == OPAL_PCI_SLOT_POWER_ON)
+		goto scan;
+
+	/* Power is off, turn it on and then scan the slot */
+	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+	if (ret)
+		return ret;
+
+scan:
+	if (presence == OPAL_PCI_SLOT_PRESENT) {
+		if (rescan) {
+			pci_lock_rescan_remove();
+			pci_add_pci_devices(php_slot->bus);
+			pci_unlock_rescan_remove();
+		}
+
+		/* Rescan for child hotpluggable slots */
+		php_slot->state = PNV_PHP_STATE_POPULATED;
+		if (rescan)
+			pnv_php_register(php_slot->dn);
+	} else {
+		php_slot->state = PNV_PHP_STATE_POPULATED;
+	}
+
+	return 0;
+}
+
+static int pnv_php_enable_slot(struct hotplug_slot *slot)
+{
+	struct pnv_php_slot *php_slot = container_of(slot,
+						     struct pnv_php_slot, slot);
+
+	return pnv_php_enable(php_slot, true);
+}
+
+static int pnv_php_disable_slot(struct hotplug_slot *slot)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	uint8_t power_state;
+	int ret;
+
+	if (php_slot->state != PNV_PHP_STATE_POPULATED)
+		return 0;
+
+	/* Remove all devices behind the slot */
+	pci_lock_rescan_remove();
+	pci_remove_pci_devices(php_slot->bus);
+	pci_unlock_rescan_remove();
+
+	/* Detach the child hotpluggable slots */
+	pnv_php_unregister(php_slot->dn);
+
+	/*
+	 * Check the power status and turn it off if necessary. If we
+	 * fail to get the power status, the power will be forced to
+	 * be off.
+	 */
+	ret = pnv_php_get_power_state(slot, &power_state);
+	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
+		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
+		if (ret)
+			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",
+				 ret);
+	}
+
+	/* Update slot state */
+	php_slot->state = PNV_PHP_STATE_REGISTERED;
+	return 0;
+}
+
+static struct hotplug_slot_ops php_slot_ops = {
+	.get_power_status	= pnv_php_get_power_state,
+	.get_adapter_status	= pnv_php_get_adapter_state,
+	.set_attention_status	= pnv_php_set_attention_state,
+	.enable_slot		= pnv_php_enable_slot,
+	.disable_slot		= pnv_php_disable_slot,
+};
+
+static void pnv_php_release(struct hotplug_slot *slot)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	unsigned long flags;
+
+	/* Remove from global or child list */
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	list_del(&php_slot->link);
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	/* Detach from parent */
+	pnv_php_put_slot(php_slot);
+	pnv_php_put_slot(php_slot->parent);
+}
+
+static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
+{
+	struct device_node *parent = dn;
+	const __be64 *prop64;
+	const __be32 *prop32;
+
+	/*
+	 * The hotpluggable slot always has a compound Id, which
+	 * consists of a 16-bit PHB Id, a 16-bit bus/slot/function
+	 * number, and a compound indicator.
+	 */
+	*id = (0x1ul << 63);
+
+	/* Bus/Slot/Function number */
+	prop32 = of_get_property(dn, "reg", NULL);
+	if (!prop32)
+		return -ENXIO;
+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
+
+	/* PHB Id */
+	while ((parent = of_get_parent(parent))) {
+		if (!PCI_DN(parent)) {
+			of_node_put(parent);
+			break;
+		}
+
+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
+			of_node_put(parent);
+			continue;
+		}
+
+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
+		if (!prop64) {
+			of_node_put(parent);
+			return -ENXIO;
+		}
+
+		*id |= be64_to_cpup(prop64);
+		of_node_put(parent);
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot;
+	struct pci_bus *bus;
+	const char *label;
+	uint64_t id;
+
+	label = of_get_property(dn, "ibm,slot-label", NULL);
+	if (!label)
+		return NULL;
+
+	if (pnv_php_get_slot_id(dn, &id))
+		return NULL;
+
+	bus = pci_find_bus_by_node(dn);
+	if (!bus)
+		return NULL;
+
+	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
+	if (!php_slot)
+		return NULL;
+
+	php_slot->name = kstrdup(label, GFP_KERNEL);
+	if (!php_slot->name) {
+		kfree(php_slot);
+		return NULL;
+	}
+
+	if (dn->child && PCI_DN(dn->child))
+		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	else
+		php_slot->slot_no = -1;   /* Placeholder slot */
+
+	kref_init(&php_slot->kref);
+	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
+	php_slot->dn	                = dn;
+	php_slot->pdev	                = bus->self;
+	php_slot->bus	                = bus;
+	php_slot->id	                = id;
+	php_slot->power_state_check     = false;
+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
+	php_slot->slot.ops              = &php_slot_ops;
+	php_slot->slot.info             = &php_slot->slot_info;
+	php_slot->slot.release          = pnv_php_release;
+	php_slot->slot.private          = php_slot;
+
+	INIT_WORK(&php_slot->work, pnv_php_work);
+	init_waitqueue_head(&php_slot->queue);
+	INIT_LIST_HEAD(&php_slot->children);
+	INIT_LIST_HEAD(&php_slot->link);
+
+	return php_slot;
+}
+
+static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
+{
+	struct pnv_php_slot *parent;
+	struct device_node *dn = php_slot->dn;
+	unsigned long flags;
+	int ret;
+
+	/* Check if the slot is registered or not */
+	parent = pnv_php_find_slot(php_slot->dn);
+	if (parent) {
+		pnv_php_put_slot(parent);
+		return -EEXIST;
+	}
+
+	/* Register PCI slot */
+	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
+			      php_slot->slot_no, php_slot->name);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
+			 ret);
+		return ret;
+	}
+
+	/* Attach to the parent's child list or global list */
+	while ((dn = of_get_parent(dn))) {
+		if (!PCI_DN(dn)) {
+			of_node_put(dn);
+			break;
+		}
+
+		parent = pnv_php_find_slot(dn);
+		if (parent) {
+			of_node_put(dn);
+			break;
+		}
+
+		of_node_put(dn);
+	}
+
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	php_slot->parent = parent;
+	if (parent)
+		list_add_tail(&php_slot->link, &parent->children);
+	else
+		list_add_tail(&php_slot->link, &pnv_php_slot_list);
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	php_slot->state = PNV_PHP_STATE_REGISTERED;
+	return 0;
+}
+
+static int pnv_php_register_one(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot;
+	const __be32 *prop32;
+	int ret;
+
+	/* Check if it's hotpluggable slot */
+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	php_slot = pnv_php_alloc_slot(dn);
+	if (!php_slot)
+		return -ENODEV;
+
+	ret = pnv_php_register_slot(php_slot);
+	if (ret)
+		goto free_slot;
+
+	ret = pnv_php_enable(php_slot, false);
+	if (ret)
+		goto unregister_slot;
+
+	return 0;
+
+unregister_slot:
+	pnv_php_unregister_one(php_slot->dn);
+free_slot:
+	pnv_php_put_slot(php_slot);
+	return ret;
+}
+
+static void pnv_php_register(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/*
+	 * The parent slots should be registered before their
+	 * child slots.
+	 */
+	for_each_child_of_node(dn, child) {
+		pnv_php_register_one(child);
+		pnv_php_register(child);
+	}
+}
+
+static void pnv_php_unregister_one(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot;
+
+	php_slot = pnv_php_find_slot(dn);
+	if (!php_slot)
+		return;
+
+	pnv_php_put_slot(php_slot);
+	pci_hp_deregister(&php_slot->slot);
+}
+
+static void pnv_php_unregister(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/* The child slots should go before their parent slots */
+	for_each_child_of_node(dn, child) {
+		pnv_php_unregister(child);
+		pnv_php_unregister_one(child);
+	}
+}
+
+static struct notifier_block php_msg_nb = {
+	.notifier_call	= pnv_php_handle_msg,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int __init pnv_php_init(void)
+{
+	struct device_node *dn;
+	int ret;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	/* Register hotplug message handler */
+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
+	if (ret) {
+		pr_warn("%s: Error %d registering hotplug notifier\n",
+			__func__, ret);
+		return ret;
+	}
+
+	/* Scan PHB nodes and their children */
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		pnv_php_register(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		pnv_php_register(dn);
+
+	return 0;
+}
+
+static void __exit pnv_php_exit(void)
+{
+	struct device_node *dn;
+
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		pnv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		pnv_php_unregister(dn);
+
+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
+}
+
+module_init(pnv_php_init);
+module_exit(pnv_php_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
@ 2016-02-17  3:44     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-02-17  3:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely, Gavin Shan

This adds a standalone driver to support PCI hotplug on the PowerPC
PowerNV platform, which runs on top of skiboot firmware. The firmware
identifies hotpluggable slots and marks their device tree nodes with the
"ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The driver
scans the device tree nodes to create and register PCI hotplug slots
accordingly.

The PCI slots are organized as a tree: a PCI slot may have a parent PCI
slot, and a parent PCI slot may contain multiple child PCI slots. At
plug time, the parent PCI slot is populated before its children. At
removal time, the child PCI slots are removed before their parent PCI
slot can be removed from the system.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have the "ibm,reset-by-firmware" property. In
that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't yet export the capability to access
attention LEDs; that support is still to be done.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/Kconfig   |  12 +
 drivers/pci/hotplug/Makefile  |   3 +
 drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 885 insertions(+)
 create mode 100644 drivers/pci/hotplug/pnv_php.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..167c8ce 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+	tristate "PowerPC PowerNV PCI Hotplug driver"
+	depends on PPC_POWERNV && EEH
+	help
+	  Say Y here if you run a PowerPC PowerNV platform that supports
+	  PCI Hotplug.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called pnv-php.
+
+	  When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 	tristate "RPA PCI Hotplug driver"
 	depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index b616e75..e33cdda 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
@@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
 acpiphp-objs		:=	acpiphp_core.o	\
 				acpiphp_glue.o
 
+pnv-php-objs		:=	pnv_php.o
+
 rpaphp-objs		:=	rpaphp_core.o	\
 				rpaphp_pci.o	\
 				rpaphp_slot.o
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
new file mode 100644
index 0000000..364ec36
--- /dev/null
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -0,0 +1,870 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/libfdt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+#include <asm/ppc-pci.h>
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
+
+struct pnv_php_slot {
+	struct hotplug_slot		slot;
+	struct hotplug_slot_info	slot_info;
+	uint64_t			id;
+	char				*name;
+	int				slot_no;
+	struct kref			kref;
+#define PNV_PHP_STATE_INITIALIZED	0
+#define PNV_PHP_STATE_REGISTERED	1
+#define PNV_PHP_STATE_POPULATED		2
+	int				state;
+	struct device_node		*dn;
+	struct pci_dev			*pdev;
+	struct pci_bus			*bus;
+	bool				power_state_check;
+	int				power_state_confirmed;
+#define PNV_PHP_POWER_CONFIRMED_INVALID	0
+#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
+#define PNV_PHP_POWER_CONFIRMED_FAIL	2
+	struct opal_msg			*msg;
+	void				*fdt;
+	void				*dt;
+	struct of_changeset		ocs;
+	struct work_struct		work;
+	wait_queue_head_t		queue;
+	struct pnv_php_slot		*parent;
+	struct list_head		children;
+	struct list_head		link;
+};
+
+static LIST_HEAD(pnv_php_slot_list);
+static DEFINE_SPINLOCK(pnv_php_lock);
+
+static void pnv_php_register(struct device_node *dn);
+static void pnv_php_unregister_one(struct device_node *dn);
+static void pnv_php_unregister(struct device_node *dn);
+
+static void pnv_php_free_slot(struct kref *kref)
+{
+	struct pnv_php_slot *php_slot = container_of(kref,
+						     struct pnv_php_slot,
+						     kref);
+
+	WARN_ON(!list_empty(&php_slot->children));
+	kfree(php_slot->name);
+	kfree(php_slot);
+}
+
+static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
+{
+	if (!php_slot)
+		return;
+
+	kref_put(&php_slot->kref, pnv_php_free_slot);
+}
+
+static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
+					  struct pnv_php_slot *php_slot)
+{
+	struct pnv_php_slot *target, *tmp;
+
+	if (php_slot->dn == dn) {
+		kref_get(&php_slot->kref);
+		return php_slot;
+	}
+
+	list_for_each_entry(tmp, &php_slot->children, link) {
+		target = pnv_php_match(dn, tmp);
+		if (target)
+			return target;
+	}
+
+	return NULL;
+}
+
+static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
+		php_slot = pnv_php_match(dn, tmp);
+		if (php_slot) {
+			spin_unlock_irqrestore(&pnv_php_lock, flags);
+			return php_slot;
+		}
+	}
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	return NULL;
+}
+
+/*
+ * Remove pdn for all children of the indicated device node.
+ * The function should remove pdn in a depth-first manner.
+ */
+static void pnv_php_rmv_pdns(struct device_node *dn)
+{
+	struct device_node *child;
+
+	for_each_child_of_node(dn, child) {
+		pnv_php_rmv_pdns(child);
+
+		pci_remove_device_node_info(child);
+	}
+}
+
+/*
+ * Remove all child nodes of the indicated device nodes. The
+ * function should remove device nodes in depth-first manner.
+ */
+static int pnv_php_rmv_device_nodes(struct device_node *parent)
+{
+	struct device_node *dn, *child;
+	int ret = 0;
+
+	for_each_child_of_node(parent, dn) {
+		ret = pnv_php_rmv_device_nodes(dn);
+		if (ret)
+			return ret;
+
+		child = of_get_next_child(dn, NULL);
+		if (child) {
+			of_node_put(child);
+			of_node_put(dn);
+			pr_err("%s: Alive children of node <%s>\n",
+			       __func__, of_node_full_name(dn));
+			return -EBUSY;
+		}
+
+		of_detach_node(dn);
+		of_node_put(dn);
+	}
+
+	return 0;
+}
+
+/*
+ * The function processes the message sent by firmware
+ * to remove all device tree nodes beneath the slot's
+ * nodes and the associated auxiliary data.
+ */
+static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
+{
+	int ret;
+
+	pnv_php_rmv_pdns(php_slot->dn);
+
+	/*
+	 * If the device sub-tree was created from an OF changeset, simply
+	 * revert that. Otherwise, the device nodes in the sub-tree
+	 * need to be iterated and detached.
+	 */
+	if (php_slot->fdt) {
+		of_changeset_destroy(&php_slot->ocs);
+		kfree(php_slot->dt);
+		kfree(php_slot->fdt);
+		php_slot->dt        = NULL;
+		php_slot->dn->child = NULL;
+		php_slot->fdt       = NULL;
+		php_slot->power_state_confirmed =
+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
+		wake_up_interruptible(&php_slot->queue);
+		return;
+	}
+
+	ret = pnv_php_rmv_device_nodes(php_slot->dn);
+	if (!ret) {
+		php_slot->power_state_confirmed =
+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
+	} else {
+		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
+		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
+	}
+
+	wake_up_interruptible(&php_slot->queue);
+}
+
+static int pnv_php_populate_changeset(struct of_changeset *ocs,
+				      struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	for_each_child_of_node(dn, child) {
+		ret = of_changeset_attach_node(ocs, child);
+		if (ret)
+			break;
+
+		ret = pnv_php_populate_changeset(ocs, child);
+	}
+
+	return ret;
+}
+
+static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
+{
+	struct pci_controller *hose = (struct pci_controller *)data;
+	struct pci_dn *pdn;
+
+	pdn = pci_add_device_node_info(hose, dn);
+	if (!pdn)
+		return ERR_PTR(-ENOMEM);
+
+	return NULL;
+}
+
+static void pnv_php_add_pdns(struct pnv_php_slot *slot)
+{
+	struct pci_controller *hose = pci_bus_to_host(slot->bus);
+
+	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
+}
+
+static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
+{
+	void *fdt, *fdt1, *dt;
+	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
+	int ret;
+
+	/* We don't know the FDT blob size in advance. Fetch it into a
+	 * maximally sized buffer first and then copy it to another buffer
+	 * that fits the real size.
+	 */
+	fdt1 = kzalloc(0x10000, GFP_KERNEL);
+	if (!fdt1)
+		goto error;
+
+	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
+	if (ret)
+		goto free_fdt1;
+
+	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
+	if (!fdt)
+		goto free_fdt1;
+
+	/* Unflatten device tree blob */
+	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
+	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
+	if (!dt) {
+		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
+		goto free_fdt;
+	}
+
+	/* Initialize and apply the changeset */
+	of_changeset_init(&php_slot->ocs);
+	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
+			 ret);
+		goto free_dt;
+	}
+
+	php_slot->dn->child = NULL;
+	ret = of_changeset_apply(&php_slot->ocs);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
+			 ret);
+		goto destroy_changeset;
+	}
+
+	/* Add device node firmware data */
+	pnv_php_add_pdns(php_slot);
+	php_slot->fdt = fdt;
+	php_slot->dt  = dt;
+	goto out;
+
+destroy_changeset:
+	of_changeset_destroy(&php_slot->ocs);
+free_dt:
+	kfree(dt);
+	php_slot->dn->child = NULL;
+free_fdt:
+	kfree(fdt);
+free_fdt1:
+	kfree(fdt1);
+error:
+	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
+out:
+	/* Confirm status change */
+	php_slot->power_state_confirmed = confirm;
+	wake_up_interruptible(&php_slot->queue);
+}
+
+static void pnv_php_work(struct work_struct *data)
+{
+	struct pnv_php_slot *php_slot = container_of(data,
+						     struct pnv_php_slot,
+						     work);
+	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
+
+	if (event == OPAL_PCI_SLOT_POWER_OFF)
+		pnv_php_handle_poweroff(php_slot);
+	else
+		pnv_php_handle_poweron(php_slot);
+
+	pnv_php_put_slot(php_slot);
+}
+
+static int pnv_php_handle_msg(struct notifier_block *nb,
+			      unsigned long type,
+			      void *message)
+{
+	phandle h;
+	struct device_node *dn;
+	struct pnv_php_slot *php_slot;
+	struct opal_msg *msg = message;
+
+	if (type != OPAL_MSG_PCI_HOTPLUG) {
+		pr_warn("%s: Invalid message %ld received!\n",
+			__func__, type);
+		return NOTIFY_DONE;
+	}
+
+	h = (phandle)be64_to_cpu(msg->params[1]);
+	dn = of_find_node_by_phandle(h);
+	if (!dn) {
+		pr_warn("%s: No device node for phandle 0x%x\n",
+			__func__, h);
+		return NOTIFY_DONE;
+	}
+
+	php_slot = pnv_php_find_slot(dn);
+	if (!php_slot) {
+		pr_warn("%s: No slot found for node <%s>\n",
+			__func__, of_node_full_name(dn));
+		of_node_put(dn);
+		return NOTIFY_DONE;
+	}
+
+	of_node_put(dn);
+	php_slot->msg = msg;
+	schedule_work(&php_slot->work);
+	return NOTIFY_OK;
+}
+
+static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	int ret;
+
+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
+	ret = pnv_pci_set_power_state(php_slot->id, state);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
+			 ret, state ? "on" : "off");
+		return ret;
+	}
+
+	/* Continue to PCI probing once the device-tree is finalized. The
+	 * device-tree might already have been updated completely at this
+	 * point, in which case we don't have to wait.
+	 */
+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
+		return 0;
+
+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
+		return -EBUSY;
+
+	/* Wait for firmware to add or remove device sub-tree. When it's done,
+	 * a message is received from firmware.
+	 */
+	ret = wait_event_timeout(php_slot->queue,
+				 php_slot->power_state_confirmed, 10 * HZ);
+	if (!ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
+			 ret, state ? "on" : "off");
+		return -EBUSY;
+	}
+
+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
+		return 0;
+
+	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
+		 php_slot->power_state_confirmed, state ? "on" : "off");
+	return -EBUSY;
+}
+
+static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	uint8_t power_state;
+	int ret;
+
+	/*
+	 * Retrieve power status from firmware. If we fail
+	 * to get it, the power status falls back to
+	 * on.
+	 */
+	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
+	if (ret) {
+		*state = OPAL_PCI_SLOT_POWER_ON;
+		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
+			 ret);
+	} else {
+		*state = power_state;
+		slot->info->power_status = power_state;
+	}
+
+	return 0;
+}
+
+static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	uint8_t presence;
+	int ret;
+
+	/*
+	 * Retrieve presence status from firmware. If we can't
+	 * get that, it will fall back to empty.
+	 */
+	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
+	if (ret >= 0) {
+		*state = presence;
+		slot->info->adapter_status = presence;
+		ret = 0;
+	} else {
+		*state = OPAL_PCI_SLOT_EMPTY;
+		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
+			 ret);
+	}
+
+	return ret;
+}
+
+static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
+{
+	/* FIXME: Make it real once firmware supports it */
+	slot->info->attention_status = state;
+
+	return 0;
+}
+
+static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
+{
+	struct hotplug_slot *slot = &php_slot->slot;
+	uint8_t presence, power_status;
+	int ret;
+
+	/* Check if the slot has been configured */
+	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
+		return 0;
+
+	/* Retrieve slot presence status */
+	ret = pnv_php_get_adapter_state(slot, &presence);
+	if (ret)
+		return ret;
+
+	/* Proceed if there is nothing behind the slot */
+	if (presence == OPAL_PCI_SLOT_EMPTY)
+		goto scan;
+
+	/*
+	 * If the power supply to the slot is off, we can't detect
+	 * adapter presence state. That means we have to turn the
+	 * slot on before going to probe slot's presence state.
+	 *
+	 * The first time through, we don't change the power status,
+	 * to speed up system boot, on the assumption that the firmware
+	 * supplies consistent slot power status: an empty slot always
+	 * has its power off and a non-empty slot has its power on.
+	 */
+	if (!php_slot->power_state_check) {
+		php_slot->power_state_check = true;
+
+		ret = pnv_php_get_power_state(slot, &power_status);
+		if (ret)
+			return ret;
+
+		if (power_status != OPAL_PCI_SLOT_POWER_ON)
+			return 0;
+	}
+
+	/* Check the power status. Scan the slot if that's already on */
+	ret = pnv_php_get_power_state(slot, &power_status);
+	if (ret)
+		return ret;
+
+	if (power_status == OPAL_PCI_SLOT_POWER_ON)
+		goto scan;
+
+	/* Power is off, turn it on and then scan the slot */
+	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+	if (ret)
+		return ret;
+
+scan:
+	if (presence == OPAL_PCI_SLOT_PRESENT) {
+		if (rescan) {
+			pci_lock_rescan_remove();
+			pci_add_pci_devices(php_slot->bus);
+			pci_unlock_rescan_remove();
+		}
+
+		/* Rescan for child hotpluggable slots */
+		php_slot->state = PNV_PHP_STATE_POPULATED;
+		if (rescan)
+			pnv_php_register(php_slot->dn);
+	} else {
+		php_slot->state = PNV_PHP_STATE_POPULATED;
+	}
+
+	return 0;
+}
+
+static int pnv_php_enable_slot(struct hotplug_slot *slot)
+{
+	struct pnv_php_slot *php_slot = container_of(slot,
+						     struct pnv_php_slot, slot);
+
+	return pnv_php_enable(php_slot, true);
+}
+
+static int pnv_php_disable_slot(struct hotplug_slot *slot)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	uint8_t power_state;
+	int ret;
+
+	if (php_slot->state != PNV_PHP_STATE_POPULATED)
+		return 0;
+
+	/* Remove all devices behind the slot */
+	pci_lock_rescan_remove();
+	pci_remove_pci_devices(php_slot->bus);
+	pci_unlock_rescan_remove();
+
+	/* Detach the child hotpluggable slots */
+	pnv_php_unregister(php_slot->dn);
+
+	/*
+	 * Check the power status and turn it off if necessary. If we
+	 * fail to get the power status, the power will be forced to
+	 * be off.
+	 */
+	ret = pnv_php_get_power_state(slot, &power_state);
+	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
+		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
+		if (ret)
+			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",
+				 ret);
+	}
+
+	/* Update slot state */
+	php_slot->state = PNV_PHP_STATE_REGISTERED;
+	return 0;
+}
+
+static struct hotplug_slot_ops php_slot_ops = {
+	.get_power_status	= pnv_php_get_power_state,
+	.get_adapter_status	= pnv_php_get_adapter_state,
+	.set_attention_status	= pnv_php_set_attention_state,
+	.enable_slot		= pnv_php_enable_slot,
+	.disable_slot		= pnv_php_disable_slot,
+};
+
+static void pnv_php_release(struct hotplug_slot *slot)
+{
+	struct pnv_php_slot *php_slot = slot->private;
+	unsigned long flags;
+
+	/* Remove from global or child list */
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	list_del(&php_slot->link);
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	/* Detach from parent */
+	pnv_php_put_slot(php_slot);
+	pnv_php_put_slot(php_slot->parent);
+}
+
+static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
+{
+	struct device_node *parent = dn;
+	const __be64 *prop64;
+	const __be32 *prop32;
+
+	/*
+	 * The hotpluggable slot always has a compound Id, which
+	 * consists of a 16-bit PHB Id, a 16-bit bus/slot/function
+	 * number, and a compound indicator.
+	 */
+	*id = (0x1ul << 63);
+
+	/* Bus/Slot/Function number */
+	prop32 = of_get_property(dn, "reg", NULL);
+	if (!prop32)
+		return -ENXIO;
+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
+
+	/* PHB Id */
+	while ((parent = of_get_parent(parent))) {
+		if (!PCI_DN(parent)) {
+			of_node_put(parent);
+			break;
+		}
+
+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
+			of_node_put(parent);
+			continue;
+		}
+
+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
+		if (!prop64) {
+			of_node_put(parent);
+			return -ENXIO;
+		}
+
+		*id |= be64_to_cpup(prop64);
+		of_node_put(parent);
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot;
+	struct pci_bus *bus;
+	const char *label;
+	uint64_t id;
+
+	label = of_get_property(dn, "ibm,slot-label", NULL);
+	if (!label)
+		return NULL;
+
+	if (pnv_php_get_slot_id(dn, &id))
+		return NULL;
+
+	bus = pci_find_bus_by_node(dn);
+	if (!bus)
+		return NULL;
+
+	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
+	if (!php_slot)
+		return NULL;
+
+	php_slot->name = kstrdup(label, GFP_KERNEL);
+	if (!php_slot->name) {
+		kfree(php_slot);
+		return NULL;
+	}
+
+	if (dn->child && PCI_DN(dn->child))
+		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	else
+		php_slot->slot_no = -1;   /* Placeholder slot */
+
+	kref_init(&php_slot->kref);
+	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
+	php_slot->dn	                = dn;
+	php_slot->pdev	                = bus->self;
+	php_slot->bus	                = bus;
+	php_slot->id	                = id;
+	php_slot->power_state_check     = false;
+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
+	php_slot->slot.ops              = &php_slot_ops;
+	php_slot->slot.info             = &php_slot->slot_info;
+	php_slot->slot.release          = pnv_php_release;
+	php_slot->slot.private          = php_slot;
+
+	INIT_WORK(&php_slot->work, pnv_php_work);
+	init_waitqueue_head(&php_slot->queue);
+	INIT_LIST_HEAD(&php_slot->children);
+	INIT_LIST_HEAD(&php_slot->link);
+
+	return php_slot;
+}
+
+static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
+{
+	struct pnv_php_slot *parent;
+	struct device_node *dn = php_slot->dn;
+	unsigned long flags;
+	int ret;
+
+	/* Check if the slot is registered or not */
+	parent = pnv_php_find_slot(php_slot->dn);
+	if (parent) {
+		pnv_php_put_slot(parent);
+		return -EEXIST;
+	}
+
+	/* Register PCI slot */
+	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
+			      php_slot->slot_no, php_slot->name);
+	if (ret) {
+		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
+			 ret);
+		return ret;
+	}
+
+	/* Attach to the parent's child list or global list */
+	while ((dn = of_get_parent(dn))) {
+		if (!PCI_DN(dn)) {
+			of_node_put(dn);
+			break;
+		}
+
+		parent = pnv_php_find_slot(dn);
+		if (parent) {
+			of_node_put(dn);
+			break;
+		}
+
+		of_node_put(dn);
+	}
+
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	php_slot->parent = parent;
+	if (parent)
+		list_add_tail(&php_slot->link, &parent->children);
+	else
+		list_add_tail(&php_slot->link, &pnv_php_slot_list);
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	php_slot->state = PNV_PHP_STATE_REGISTERED;
+	return 0;
+}
+
+static int pnv_php_register_one(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot;
+	const __be32 *prop32;
+	int ret;
+
+	/* Check if it's a hotpluggable slot */
+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	php_slot = pnv_php_alloc_slot(dn);
+	if (!php_slot)
+		return -ENODEV;
+
+	ret = pnv_php_register_slot(php_slot);
+	if (ret)
+		goto free_slot;
+
+	ret = pnv_php_enable(php_slot, false);
+	if (ret)
+		goto unregister_slot;
+
+	return 0;
+
+unregister_slot:
+	pnv_php_unregister_one(php_slot->dn);
+free_slot:
+	pnv_php_put_slot(php_slot);
+	return ret;
+}
+
+static void pnv_php_register(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/*
+	 * The parent slots should be registered before their
+	 * child slots.
+	 */
+	for_each_child_of_node(dn, child) {
+		pnv_php_register_one(child);
+		pnv_php_register(child);
+	}
+}
+
+static void pnv_php_unregister_one(struct device_node *dn)
+{
+	struct pnv_php_slot *php_slot;
+
+	php_slot = pnv_php_find_slot(dn);
+	if (!php_slot)
+		return;
+
+	pnv_php_put_slot(php_slot);
+	pci_hp_deregister(&php_slot->slot);
+}
+
+static void pnv_php_unregister(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/* The child slots should go before their parent slots */
+	for_each_child_of_node(dn, child) {
+		pnv_php_unregister(child);
+		pnv_php_unregister_one(child);
+	}
+}
+
+static struct notifier_block php_msg_nb = {
+	.notifier_call	= pnv_php_handle_msg,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int __init pnv_php_init(void)
+{
+	struct device_node *dn;
+	int ret;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	/* Register hotplug message handler */
+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
+	if (ret) {
+		pr_warn("%s: Error %d registering hotplug notifier\n",
+			__func__, ret);
+		return ret;
+	}
+
+	/* Scan PHB nodes and their children */
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		pnv_php_register(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		pnv_php_register(dn);
+
+	return 0;
+}
+
+static void __exit pnv_php_exit(void)
+{
+	struct device_node *dn;
+
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		pnv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		pnv_php_unregister(dn);
+
+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
+}
+
+module_init(pnv_php_init);
+module_exit(pnv_php_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
2.1.0
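
For readers who want to check the compound slot ID that pnv_php_get_slot_id()
builds in the patch above, here is a small user-space sketch. It is not part
of the patch: slot_id_compose() and its parameter names are made up for
illustration, and it assumes the first "reg" cell and the "ibm,opal-phbid"
value have already been converted to CPU byte order.

```c
#include <stdint.h>

/* Bit 63 marks the ID as a compound (PHB + slot) identifier. */
#define SLOT_ID_COMPOUND	(UINT64_C(1) << 63)

/*
 * Hypothetical helper mirroring the arithmetic in pnv_php_get_slot_id():
 *   reg_cell0: first cell of the slot node's "reg" property
 *   phb_id:    value of the PHB's "ibm,opal-phbid" property
 */
static inline uint64_t slot_id_compose(uint32_t reg_cell0, uint64_t phb_id)
{
	uint64_t id = SLOT_ID_COMPOUND;

	/*
	 * Bus/device/function sit in bits 8..23 of the reg cell.
	 * Mask everything else off and move them up to bits 16..31.
	 */
	id |= (uint64_t)(reg_cell0 & 0x00ffff00u) << 8;

	/* The PHB Id is OR-ed into the low bits unchanged. */
	id |= phb_id;

	return id;
}
```

For example, a reg cell of 0x00010800 (bus 1, device 1, function 0) under
PHB 2 yields 0x8000000001080002. Bits outside the bus/device/function field
of the reg cell do not affect the result.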


* Re: [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops
  2016-02-17  3:43 ` [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
@ 2016-02-17  4:18   ` Andrew Donnellan
       [not found]   ` <1455680668-23298-4-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  1 sibling, 0 replies; 174+ messages in thread
From: Andrew Donnellan @ 2016-02-17  4:18 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: devicetree, aik, linux-pci, grant.likely, robherring2, bhelgaas, dja

On 17/02/16 14:43, Gavin Shan wrote:
> Each PHB has one instance of "struct pci_controller_ops", which
> includes various callbacks called by the PCI subsystem. In the definition
> of this struct, some callbacks have explicit names for their arguments,
> but the rest don't.
>
> This adds explicit names for all arguments of the callbacks in
> "struct pci_controller_ops" so that the code looks consistent.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Reviewed-by: Daniel Axtens <dja@axtens.net>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited

* Re: [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset()
  2016-02-17  3:44 ` [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
@ 2016-02-17  4:35   ` Andrew Donnellan
  2016-04-19  8:49   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 174+ messages in thread
From: Andrew Donnellan @ 2016-02-17  4:35 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: devicetree, aik, linux-pci, grant.likely, robherring2, bhelgaas, dja

On 17/02/16 14:44, Gavin Shan wrote:
> This drops unnecessary nested if statements in pnv_eeh_reset() to
> improve the code readability. After the changes, the unused local
> variable "ret" is dropped as well. No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

This looks good to me.

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited

* Re: [PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances
  2016-02-17  3:43 ` [PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances Gavin Shan
@ 2016-02-17  4:38   ` Andrew Donnellan
  0 siblings, 0 replies; 174+ messages in thread
From: Andrew Donnellan @ 2016-02-17  4:38 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: devicetree, aik, linux-pci, grant.likely, robherring2, bhelgaas, dja

On 17/02/16 14:43, Gavin Shan wrote:
> This cleans up the data struct instances below to use tabs instead of
> spaces for indentation, to avoid complaints from scripts/checkpatch.pl.
> No logical changes introduced.
>
>    @pnv_pci_ioda_controller_ops
>    @pnv_npu_ioda_controller_ops
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Reviewed-by: Daniel Axtens <dja@axtens.net>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited

* Re: [PATCH v8 40/45] drivers/of: Split unflatten_dt_node()
  2016-02-17  3:44 ` [PATCH v8 40/45] drivers/of: Split unflatten_dt_node() Gavin Shan
@ 2016-02-17 14:30   ` Rob Herring
  2016-04-20  2:38     ` Gavin Shan
  2016-05-02  2:02     ` Gavin Shan
  0 siblings, 2 replies; 174+ messages in thread
From: Rob Herring @ 2016-02-17 14:30 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, Benjamin Herrenschmidt,
	Michael Ellerman, aik, dja, Bjorn Helgaas, Grant Likely

On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> The function unflatten_dt_node() is called recursively to unflatten
> device nodes and properties in the FDT blob. It is complicated
> and hard to understand.
>
> This splits the function into 3 functions: populate_properties(),
> populate_node() and unflatten_dt_node(). populate_properties(),
> which is called by populate_node(), creates properties for the
> indicated device node. The latter creates the device nodes
> from the FDT blob. unflatten_dt_node() gets the offset in the FDT blob
> for the next device node and then calls populate_node(). No logical
> changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c | 249 ++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 147 insertions(+), 102 deletions(-)

One nit, otherwise:

Acked-by: Rob Herring <robh@kernel.org>

[...]

> +               /* And we process the "ibm,phandle" property
> +                * used in pSeries dynamic device tree
> +                * stuff
> +                */
> +               if (!strcmp(pname, "ibm,phandle"))
> +                       np->phandle = be32_to_cpup(val);
> +
> +               pp->name   = (char *)pname;
> +               pp->length = sz;
> +               pp->value  = (__be32 *)val;

This cast should not be needed.

> +               *pprev     = pp;
> +               pprev      = &pp->next;
> +       }

* Re: [PATCH v8 41/45] drivers/of: Avoid recursively calling unflatten_dt_node()
  2016-02-17  3:44 ` [PATCH v8 41/45] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
@ 2016-02-17 14:53     ` Rob Herring
  0 siblings, 0 replies; 174+ messages in thread
From: Rob Herring @ 2016-02-17 14:53 UTC (permalink / raw)
  To: Gavin Shan
  Cc: devicetree, aik, linux-pci, Grant Likely, Bjorn Helgaas,
	linuxppc-dev, dja

On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> In the current implementation, unflatten_dt_node() is called recursively
> to unflatten device nodes in the FDT blob. This stresses the limited stack
> capacity, especially when the function is adapted to unflatten a device
> sub-tree that possibly has multiple root nodes. In that case, we run out
> of stack and the system can't boot up successfully.
>
> In order to reuse the function to unflatten a device sub-tree, this avoids
> calling the function recursively, meaning the device nodes are unflattened
> in one call to unflatten_dt_node(): two arrays are introduced to track the
> parent path size and the device node at the current level of depth, which
> will be used by the device node on the next level of depth to be
> unflattened. All device nodes deeper than 64 levels are dropped and,
> hopefully, the system can still boot up successfully with the partial
> device-tree.
>
> Also, the parameters "poffset" and "fpsize" are unused and dropped, and the
> parameter "dryrun" is derived from "mem == NULL". Besides, the return
> value of the function is changed to indicate the size of memory consumed by
> the unflattened device tree, or an error code.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c | 122 +++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 74 insertions(+), 48 deletions(-)

Acked-by: Rob Herring <robh@kernel.org>
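
The depth-indexed bookkeeping described in the quoted commit message can be
illustrated with a small stand-alone sketch. This is not the drivers/of/fdt.c
code: the BEGIN/END token stream, the struct names and unflatten() are all
hypothetical, and MAX_DEPTH is set to 4 rather than 64 to keep it short. It
only shows how two per-depth arrays can replace recursion, and how nodes that
are nested too deeply get dropped while the rest of the tree is kept.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH	4	/* the commit message uses 64 */
#define MAX_NODES	16	/* arbitrary capacity for callers of this sketch */

enum tok_type { TOK_BEGIN, TOK_END };

struct tok {
	enum tok_type type;
	const char *name;	/* used by TOK_BEGIN only */
};

struct node {
	const char *name;
	int parent;		/* index into the output array, -1 for a root */
	size_t path_len;	/* length of the full path, e.g. "/a/b" is 4 */
};

/*
 * Walk a flat BEGIN/END token stream in a single loop, without recursion.
 * Returns the number of nodes stored, or -1 if the stream is unbalanced
 * or the output array is too small.
 */
static inline int unflatten(const struct tok *toks, size_t ntoks,
			    struct node *nodes, size_t max_nodes)
{
	int node_at[MAX_DEPTH];		/* stored node index per depth */
	size_t plen_at[MAX_DEPTH];	/* full path length per depth */
	size_t count = 0;
	int depth = 0;			/* number of currently open nodes */

	for (size_t i = 0; i < ntoks; i++) {
		if (toks[i].type == TOK_END) {
			if (depth == 0)
				return -1;	/* END without BEGIN */
			depth--;
			continue;
		}

		/* Too deep: drop the node but keep tracking the nesting. */
		if (depth >= MAX_DEPTH) {
			depth++;
			continue;
		}

		if (count >= max_nodes)
			return -1;

		/* The parent is whatever was last stored one level up. */
		nodes[count].name = toks[i].name;
		nodes[count].parent = depth ? node_at[depth - 1] : -1;
		nodes[count].path_len = (depth ? plen_at[depth - 1] : 0) +
					1 + strlen(toks[i].name);

		node_at[depth] = (int)count;
		plen_at[depth] = nodes[count].path_len;
		count++;
		depth++;
	}

	return depth == 0 ? (int)count : -1;
}
```

Because a new depth-0 BEGIN simply overwrites slot 0 of both arrays, the same
loop handles a stream with several root nodes, which is the sub-tree case the
commit message mentions.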

* Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()
  2016-02-17  3:44     ` Gavin Shan
@ 2016-02-17 14:59         ` Rob Herring
  -1 siblings, 0 replies; 174+ messages in thread
From: Rob Herring @ 2016-02-17 14:59 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, Benjamin Herrenschmidt,
	Michael Ellerman, aik, dja, Bjorn Helgaas, Grant Likely

On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> This renames unflatten_dt_node() to unflatten_dt_nodes() as it
> populates multiple device nodes from FDT blob. No logical changes
> introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)

Acked-by: Rob Herring <robh@kernel.org>

I'm happy to take patches 40-42 for 4.6 if the rest of the series
doesn't go in given they fix a separate problem. I just need to know
soon (or at least they need to go into -next soon).

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree()
  2016-02-17  3:44 ` [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
@ 2016-02-17 15:00   ` Rob Herring
  2016-02-17 15:58     ` Jyri Sarha
  1 sibling, 0 replies; 174+ messages in thread
From: Rob Herring @ 2016-02-17 15:00 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, Benjamin Herrenschmidt,
	Michael Ellerman, aik, dja, Bjorn Helgaas, Grant Likely,
	Jyri Sarha

On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> This adds one more argument to of_fdt_unflatten_tree() to specify
> the parent node of the FDT blob that is going to be unflattened.
> As a result, the function can be used to unflatten an FDT blob that
> represents a device sub-tree in the PowerNV PCI hotplug driver.
>
> Cc: Jyri Sarha <jsarha@ti.com>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c |  2 +-
>  drivers/of/fdt.c                             | 14 ++++++++++----
>  drivers/of/unittest.c                        |  2 +-
>  include/linux/of_fdt.h                       |  1 +
>  4 files changed, 13 insertions(+), 6 deletions(-)

Acked-by: Rob Herring <robh@kernel.org>
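
The effect of the extra argument can be sketched outside the kernel roughly
as below. This is an illustration under stated assumptions rather than the
patch itself: struct device_node here is a two-field stand-in,
unflatten_pass() and unflatten_tree() are hypothetical names, the "blob" is
just a NULL-terminated list of node names, the allocator takes only a size,
and the end marker is stored in native byte order. What it shows is the
parent pointer ("dad") being threaded through both the sizing pass and the
building pass, with NULL keeping the old stand-alone behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define END_MARKER 0xdeadbeefu

/* Hypothetical, much-reduced stand-in for struct device_node. */
struct device_node {
	const char *name;
	struct device_node *parent;
};

/*
 * Hypothetical stand-in for unflatten_dt_nodes(): creates one node per
 * entry of a NULL-terminated name list, each parented to dad.
 * With mem == NULL it only reports the number of bytes needed.
 */
static long unflatten_pass(const char *const *names, void *mem,
			   struct device_node *dad,
			   struct device_node **mynodes)
{
	struct device_node *np = mem;
	long size = 0;

	for (; *names; names++, size += sizeof(*np)) {
		if (!mem)
			continue;	/* dry run: count only */
		np->name = *names;
		np->parent = dad;	/* NULL means a stand-alone tree */
		np++;
	}
	if (mem && mynodes)
		*mynodes = mem;
	return size;
}

/*
 * Two-pass wrapper in the spirit of __unflatten_device_tree().
 * Returns 0 on success, -1 on failure. On success *mynodes points at
 * the start of the buffer obtained from alloc().
 */
static int unflatten_tree(const char *const *names,
			  struct device_node *dad,
			  struct device_node **mynodes,
			  void *(*alloc)(size_t size))
{
	const uint32_t marker = END_MARKER;
	long size;
	char *mem;

	assert(names && mynodes && alloc);
	*mynodes = NULL;

	/* First pass, scan for size */
	size = unflatten_pass(names, NULL, dad, NULL);
	if (size <= 0)
		return -1;

	/* Room for the nodes plus a trailing marker */
	mem = alloc((size_t)size + sizeof(marker));
	if (!mem)
		return -1;
	memcpy(mem + size, &marker, sizeof(marker));

	/* Second pass, do actual unflattening with the same parent */
	unflatten_pass(names, mem, dad, mynodes);

	/* The kernel only warns here; this sketch reports an error instead */
	if (memcmp(mem + size, &marker, sizeof(marker)) != 0)
		return -1;
	return 0;
}
```

Existing callers that want the old behaviour pass NULL for the parent, as
the tilcdc and unittest hunks in the patch do; a hotplug-style caller would
pass the node under which the sub-tree should hang.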

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree()
  2016-02-17  3:44 ` [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
@ 2016-02-17 15:58     ` Jyri Sarha
  2016-02-17 15:58     ` Jyri Sarha
  1 sibling, 0 replies; 174+ messages in thread
From: Jyri Sarha @ 2016-02-17 15:58 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, dja, bhelgaas,
	robherring2, grant.likely

On 02/17/16 05:44, Gavin Shan wrote:
> This adds one more argument to of_fdt_unflatten_tree() to specify
> the parent node of the FDT blob that is going to be unflattened.
> As a result, the function can be used to unflatten an FDT blob that
> represents a device sub-tree in the PowerNV PCI hotplug driver.
>
> Cc: Jyri Sarha <jsarha@ti.com>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c |  2 +-

Acked-by: Jyri Sarha <jsarha@ti.com>

>   drivers/of/fdt.c                             | 14 ++++++++++----
>   drivers/of/unittest.c                        |  2 +-
>   include/linux/of_fdt.h                       |  1 +
>   4 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c b/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
> index 106679b..f9c79da 100644
> --- a/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
> @@ -157,7 +157,7 @@ struct device_node * __init tilcdc_get_overlay(struct kfree_table *kft)
>   	if (!overlay_data || kfree_table_add(kft, overlay_data))
>   		return NULL;
>
> -	of_fdt_unflatten_tree(overlay_data, &overlay);
> +	of_fdt_unflatten_tree(overlay_data, NULL, &overlay);
>   	if (!overlay) {
>   		pr_warn("%s: Unfattening overlay tree failed\n", __func__);
>   		return NULL;
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 3fc9a30..16a1ba5 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -450,11 +450,13 @@ static int unflatten_dt_nodes(const void *blob,
>    * pointers of the nodes so the normal device-tree walking functions
>    * can be used.
>    * @blob: The blob to expand
> + * @dad: Parent device node
>    * @mynodes: The device_node tree created by the call
>    * @dt_alloc: An allocator that provides a virtual address to memory
>    * for the resulting tree
>    */
>   static void __unflatten_device_tree(const void *blob,
> +			     struct device_node *dad,
>   			     struct device_node **mynodes,
>   			     void * (*dt_alloc)(u64 size, u64 align))
>   {
> @@ -479,7 +481,7 @@ static void __unflatten_device_tree(const void *blob,
>   	}
>
>   	/* First pass, scan for size */
> -	size = unflatten_dt_nodes(blob, NULL, NULL, NULL);
> +	size = unflatten_dt_nodes(blob, NULL, dad, NULL);
>   	if (size < 0)
>   		return;
>
> @@ -495,7 +497,7 @@ static void __unflatten_device_tree(const void *blob,
>   	pr_debug("  unflattening %p...\n", mem);
>
>   	/* Second pass, do actual unflattening */
> -	unflatten_dt_nodes(blob, mem, NULL, mynodes);
> +	unflatten_dt_nodes(blob, mem, dad, mynodes);
>   	if (be32_to_cpup(mem + size) != 0xdeadbeef)
>   		pr_warning("End of tree marker overwritten: %08x\n",
>   			   be32_to_cpup(mem + size));
> @@ -512,6 +514,9 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
>
>   /**
>    * of_fdt_unflatten_tree - create tree of device_nodes from flat blob
> + * @blob: Flat device tree blob
> + * @dad: Parent device node
> + * @mynodes: The device tree created by the call
>    *
>    * unflattens the device-tree passed by the firmware, creating the
>    * tree of struct device_node. It also fills the "name" and "type"
> @@ -519,10 +524,11 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
>    * can be used.
>    */
>   void of_fdt_unflatten_tree(const unsigned long *blob,
> +			struct device_node *dad,
>   			struct device_node **mynodes)
>   {
>   	mutex_lock(&of_fdt_unflatten_mutex);
> -	__unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
> +	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
>   	mutex_unlock(&of_fdt_unflatten_mutex);
>   }
>   EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
> @@ -1180,7 +1186,7 @@ bool __init early_init_dt_scan(void *params)
>    */
>   void __init unflatten_device_tree(void)
>   {
> -	__unflatten_device_tree(initial_boot_params, &of_root,
> +	__unflatten_device_tree(initial_boot_params, NULL, &of_root,
>   				early_init_dt_alloc_memory_arch);
>
>   	/* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> index 979b6e4..ec36f93 100644
> --- a/drivers/of/unittest.c
> +++ b/drivers/of/unittest.c
> @@ -921,7 +921,7 @@ static int __init unittest_data_add(void)
>   			"not running tests\n", __func__);
>   		return -ENOMEM;
>   	}
> -	of_fdt_unflatten_tree(unittest_data, &unittest_data_node);
> +	of_fdt_unflatten_tree(unittest_data, NULL, &unittest_data_node);
>   	if (!unittest_data_node) {
>   		pr_warn("%s: No tree to attach; not running tests\n", __func__);
>   		return -ENODATA;
> diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
> index df9ef38..3644960 100644
> --- a/include/linux/of_fdt.h
> +++ b/include/linux/of_fdt.h
> @@ -38,6 +38,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
>   extern int of_fdt_match(const void *blob, unsigned long node,
>   			const char *const *compat);
>   extern void of_fdt_unflatten_tree(const unsigned long *blob,
> +			       struct device_node *dad,
>   			       struct device_node **mynodes);
>
>   /* TBD: Temporary export of fdt globals - remove when code fully merged */
>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()
  2016-02-17 14:59         ` Rob Herring
  (?)
@ 2016-02-19  3:16         ` Gavin Shan
  2016-03-02  2:40             ` Rob Herring
  -1 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-02-19  3:16 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree,
	Benjamin Herrenschmidt, Michael Ellerman, aik, dja,
	Bjorn Helgaas, Grant Likely

On Wed, Feb 17, 2016 at 08:59:53AM -0600, Rob Herring wrote:
>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> This renames unflatten_dt_node() to unflatten_dt_nodes() as it
>> populates multiple device nodes from FDT blob. No logical changes
>> introduced.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/fdt.c | 14 +++++++-------
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>
>Acked-by: Rob Herring <robh@kernel.org>
>
>I'm happy to take patches 40-42 for 4.6 if the rest of the series
>doesn't go in given they fix a separate problem. I just need to know
>soon (or at least they need to go into -next soon).
>

Thanks for the quick response, Rob. It depends on how many comments I
receive for the powerpc/powernv part. Apart from that, all parts including
this one have been ack'ed. I can discuss it with Michael Ellerman.
By the way, how soon do you need the decision to merge 40-42? If it's
within one or two weeks, I don't think the review of the whole series
can be done by then.

Also, I think you can probably merge 40-44, as they're all about
fdt.c. If they can be merged together, I needn't bother (cc)
you again if I need to send an updated revision. Thanks for your
review.

Thanks,
Gavin

>Rob
>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()
  2016-02-19  3:16         ` Gavin Shan
@ 2016-03-02  2:40             ` Rob Herring
  0 siblings, 0 replies; 174+ messages in thread
From: Rob Herring @ 2016-03-02  2:40 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, Benjamin Herrenschmidt,
	Michael Ellerman, aik, dja, Bjorn Helgaas, Grant Likely

On Thu, Feb 18, 2016 at 9:16 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> On Wed, Feb 17, 2016 at 08:59:53AM -0600, Rob Herring wrote:
>>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>> This renames unflatten_dt_node() to unflatten_dt_nodes() as it
>>> populates multiple device nodes from FDT blob. No logical changes
>>> introduced.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>  drivers/of/fdt.c | 14 +++++++-------
>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>
>>Acked-by: Rob Herring <robh@kernel.org>
>>
>>I'm happy to take patches 40-42 for 4.6 if the rest of the series
>>doesn't go in given they fix a separate problem. I just need to know
>>soon (or at least they need to go into -next soon).
>>
>
> Thanks for the quick response, Rob. It depends on how many comments I
> receive for the powerpc/powernv part. Apart from that, all parts including
> this one have been ack'ed. I can discuss it with Michael Ellerman.
> By the way, how soon do you need the decision to merge 40-42? If it's
> within one or two weeks, I don't think the review of the whole series
> can be done by then.

Well, it's been 2 weeks now. I need to know this week.

> Also, I think you can probably merge 40-44, as they're all about
> fdt.c. If they can be merged together, I needn't bother (cc)
> you again if I need to send an updated revision. Thanks for your
> review.

I did not include 43 and 44 as they are only needed for the rest of your series.

Rob

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()
  2016-03-02  2:40             ` Rob Herring
  (?)
@ 2016-03-08  0:56             ` Gavin Shan
  2016-03-17 13:31               ` Rob Herring
  -1 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-03-08  0:56 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree,
	Benjamin Herrenschmidt, Michael Ellerman, aik, dja,
	Bjorn Helgaas, Grant Likely

On Tue, Mar 01, 2016 at 08:40:12PM -0600, Rob Herring wrote:
>On Thu, Feb 18, 2016 at 9:16 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Wed, Feb 17, 2016 at 08:59:53AM -0600, Rob Herring wrote:
>>>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>>> This renames unflatten_dt_node() to unflatten_dt_nodes() as it
>>>> populates multiple device nodes from FDT blob. No logical changes
>>>> introduced.
>>>>
>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> ---
>>>>  drivers/of/fdt.c | 14 +++++++-------
>>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>>
>>>Acked-by: Rob Herring <robh@kernel.org>
>>>
>>>I'm happy to take patches 40-42 for 4.6 if the rest of the series
>>>doesn't go in given they fix a separate problem. I just need to know
>>>soon (or at least they need to go into -next soon).
>>>
>>
>> Thanks for the quick response, Rob. It depends on how many comments I
>> receive for the powerpc/powernv part. Apart from that, all parts including
>> this one have been ack'ed. I can discuss it with Michael Ellerman.
>> By the way, how soon do you need the decision to merge 40-42? If it's
>> within one or two weeks, I don't think the review of the whole series
>> can be done by then.
>
>Well, it's been 2 weeks now. I need to know this week.
>
>> Also, I think you can probably merge 40-44, as they're all about
>> fdt.c. If they can be merged together, I needn't bother (cc)
>> you again if I need to send an updated revision. Thanks for your
>> review.
>
>I did not include 43 and 44 as they are only needed for the rest of your series.
>

Rob, sorry for the late response. I really expected this series to be merged for 4.6 and
I was checking reviewers' bandwidth to review it. Unfortunately, I haven't received
any comments except yours so far. That means this series has to miss 4.6. Please
pick/merge 41 and 42 if nobody objects. Thanks again for your time on this.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()
  2016-03-08  0:56             ` Gavin Shan
@ 2016-03-17 13:31               ` Rob Herring
  2016-03-17 22:44                 ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Rob Herring @ 2016-03-17 13:31 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, Benjamin Herrenschmidt,
	Michael Ellerman, aik, dja, Bjorn Helgaas, Grant Likely

On Mon, Mar 7, 2016 at 6:56 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> On Tue, Mar 01, 2016 at 08:40:12PM -0600, Rob Herring wrote:
>>On Thu, Feb 18, 2016 at 9:16 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>> On Wed, Feb 17, 2016 at 08:59:53AM -0600, Rob Herring wrote:
>>>>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>>>> This renames unflatten_dt_node() to unflatten_dt_nodes() as it
>>>>> populates multiple device nodes from FDT blob. No logical changes
>>>>> introduced.
>>>>>
>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>> ---
>>>>>  drivers/of/fdt.c | 14 +++++++-------
>>>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>>Acked-by: Rob Herring <robh@kernel.org>
>>>>
>>>>I'm happy to take patches 40-42 for 4.6 if the rest of the series
>>>>doesn't go in given they fix a separate problem. I just need to know
>>>>soon (or at least they need to go into -next soon).
>>>>
>>>
>>> Thanks for the quick response, Rob. It depends on how many comments I
>>> receive for the powerpc/powernv part. Apart from that, all parts including
>>> this one have been ack'ed. I can discuss it with Michael Ellerman.
>>> By the way, how soon do you need the decision to merge 40-42? If it's
>>> within one or two weeks, I don't think the review of the whole series
>>> can be done by then.
>>
>>Well, it's been 2 weeks now. I need to know this week.
>>
>>> Also, I think you can probably merge 40-44, as they're all about
>>> fdt.c. If they can be merged together, I needn't bother (cc)
>>> you again if I need to send an updated revision. Thanks for your
>>> review.
>>
>>I did not include 43 and 44 as they are only needed for the rest of your series.
>>
>
> Rob, sorry for the late response. I really expected this series to be merged for 4.6 and
> I was checking reviewers' bandwidth to review it. Unfortunately, I haven't received
> any comments except yours so far. That means this series has to miss 4.6. Please
> pick/merge 41 and 42 if nobody objects. Thanks again for your time on this.

Too late for 4.6.

Rob

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()
  2016-03-17 13:31               ` Rob Herring
@ 2016-03-17 22:44                 ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-03-17 22:44 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree,
	Benjamin Herrenschmidt, Michael Ellerman, aik, dja,
	Bjorn Helgaas, Grant Likely

On Thu, Mar 17, 2016 at 08:31:16AM -0500, Rob Herring wrote:
>On Mon, Mar 7, 2016 at 6:56 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Tue, Mar 01, 2016 at 08:40:12PM -0600, Rob Herring wrote:
>>>On Thu, Feb 18, 2016 at 9:16 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>>> On Wed, Feb 17, 2016 at 08:59:53AM -0600, Rob Herring wrote:
>>>>>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>>>>> This renames unflatten_dt_node() to unflatten_dt_nodes() as it
>>>>>> populates multiple device nodes from FDT blob. No logical changes
>>>>>> introduced.
>>>>>>
>>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  drivers/of/fdt.c | 14 +++++++-------
>>>>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>>>>
>>>>>Acked-by: Rob Herring <robh@kernel.org>
>>>>>
>>>>>I'm happy to take patches 40-42 for 4.6 if the rest of the series
>>>>>doesn't go in given they fix a separate problem. I just need to know
>>>>>soon (or at least they need to go into -next soon).
>>>>>
>>>>
>>>> Thanks for the quick response, Rob. It depends on how many comments I
>>>> receive for the powerpc/powernv part. Apart from that, all parts including
>>>> this one have been ack'ed. I can discuss it with Michael Ellerman.
>>>> By the way, how soon do you need the decision to merge 40-42? If it's
>>>> within one or two weeks, I don't think the review of the whole series
>>>> can be done by then.
>>>
>>>Well, it's been 2 weeks now. I need to know this week.
>>>
>>>> Also, I think you can probably merge 40-44, as they're all about
>>>> fdt.c. If they can be merged together, I needn't bother (cc)
>>>> you again if I need to send an updated revision. Thanks for your
>>>> review.
>>>
>>>I did not include 43 and 44 as they are only needed for the rest of your series.
>>>
>>
>> Rob, sorry for the late response. I really expected this series to be merged for 4.6 and
>> I was checking reviewers' bandwidth to review it. Unfortunately, I haven't received
>> any comments except yours so far. That means this series has to miss 4.6. Please
>> pick/merge 41 and 42 if nobody objects. Thanks again for your time on this.
>
>Too late for 4.6.
>

Yeah, sorry about that, Rob.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops
  2016-02-17  3:43 ` [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
@ 2016-04-13  5:52       ` Alexey Kardashevskiy
       [not found]   ` <1455680668-23298-4-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  1 sibling, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  5:52 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> Each PHB has one instance of "struct pci_controller_ops", which
> includes various callbacks called by the PCI subsystem. In the definition
> of this struct, some callbacks have explicit names for their arguments,
> but the rest don't.
>
> This adds explicit names for all arguments of the callbacks in
> "struct pci_controller_ops" so that the code looks consistent.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Reviewed-by: Daniel Axtens <dja@axtens.net>

With tiny nit below,

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>   arch/powerpc/include/asm/pci-bridge.h | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index b688d04..4dd6ef4 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -21,18 +21,19 @@ struct pci_controller_ops {
>   	void		(*dma_dev_setup)(struct pci_dev *dev);
>   	void		(*dma_bus_setup)(struct pci_bus *bus);
>
> -	int		(*probe_mode)(struct pci_bus *);
> +	int		(*probe_mode)(struct pci_bus *bus);
>
>   	/* Called when pci_enable_device() is called. Returns true to
>   	 * allow assignment/enabling of the device. */
> -	bool		(*enable_device_hook)(struct pci_dev *);
> +	bool		(*enable_device_hook)(struct pci_dev *dev);


"pdev" is slightly better as it is of the "pci_dev" type (4130 occurrences 
of "pci_dev *pdev" and just 2833 of "pci_dev *dev" in the current kernel), 
"dev" is for "struct device".




>
> -	void		(*disable_device)(struct pci_dev *);
> +	void		(*disable_device)(struct pci_dev *dev);
>
> -	void		(*release_device)(struct pci_dev *);
> +	void		(*release_device)(struct pci_dev *dev);
>
>   	/* Called during PCI resource reassignment */
> -	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
> +	resource_size_t (*window_alignment)(struct pci_bus *bus,
> +					    unsigned long type);
>   	void		(*setup_bridge)(struct pci_bus *bus,
>   					unsigned long type);
>   	void		(*reset_secondary_bus)(struct pci_dev *dev);
> @@ -46,7 +47,7 @@ struct pci_controller_ops {
>   	int             (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
>   	u64		(*dma_get_required_mask)(struct pci_dev *dev);
>
> -	void		(*shutdown)(struct pci_controller *);
> +	void		(*shutdown)(struct pci_controller *hose);
>   };
>
>   /*
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge()
  2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
@ 2016-04-13  5:52       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  5:52 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This overrides pcibios_setup_bridge(), which is called to update PCI
> bridge windows when PCI resource assignment is completed, so that
> subsequent patches can assign the PE and set up the various (resource)
> mappings for the PE.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/include/asm/pci-bridge.h | 2 ++
>   arch/powerpc/kernel/pci-common.c      | 8 ++++++++
>   2 files changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index 9f165e8..b688d04 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -33,6 +33,8 @@ struct pci_controller_ops {
>
>   	/* Called during PCI resource reassignment */
>   	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
> +	void		(*setup_bridge)(struct pci_bus *bus,
> +					unsigned long type);
>   	void		(*reset_secondary_bus)(struct pci_dev *dev);
>
>   #ifdef CONFIG_PCI_MSI
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 0f7a60f..40df3a5 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -124,6 +124,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>   	return 1;
>   }
>
> +void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bus);
> +
> +	if (hose->controller_ops.setup_bridge)
> +		hose->controller_ops.setup_bridge(bus, type);
> +}
> +
>   void pcibios_reset_secondary_bus(struct pci_dev *dev)
>   {
>   	struct pci_controller *phb = pci_bus_to_host(dev->bus);
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 05/45] powerpc/powernv: Drop phb->bdfn_to_pe()
  2016-02-17  3:43     ` Gavin Shan
  (?)
@ 2016-04-13  5:53     ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  5:53 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This drops struct pnv_phb::bdfn_to_pe() as nobody uses it.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 9 ---------
>   arch/powerpc/platforms/powernv/pci.h      | 1 -
>   2 files changed, 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 524c9c7..10ecd97 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3195,12 +3195,6 @@ static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
>   	return true;
>   }
>
> -static u32 pnv_ioda_bdfn_to_pe(struct pnv_phb *phb, struct pci_bus *bus,
> -			       u32 devfn)
> -{
> -	return phb->ioda.pe_rmap[(bus->number << 8) | devfn];
> -}
> -
>   static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>   {
>   	struct pnv_phb *phb = hose->private_data;
> @@ -3377,9 +3371,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	phb->freeze_pe = pnv_ioda_freeze_pe;
>   	phb->unfreeze_pe = pnv_ioda_unfreeze_pe;
>
> -	/* Setup RID -> PE mapping function */
> -	phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
> -
>   	/* Setup TCEs */
>   	phb->dma_dev_setup = pnv_pci_ioda_dma_dev_setup;
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 3f814f3..78f035e 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -110,7 +110,6 @@ struct pnv_phb {
>   			 unsigned int is_64, struct msi_msg *msg);
>   	void (*dma_dev_setup)(struct pnv_phb *phb, struct pci_dev *pdev);
>   	void (*fixup_phb)(struct pci_controller *hose);
> -	u32 (*bdfn_to_pe)(struct pnv_phb *phb, struct pci_bus *bus, u32 devfn);
>   	int (*init_m64)(struct pnv_phb *phb);
>   	void (*reserve_m64_pe)(struct pci_bus *bus,
>   			       unsigned long *pe_bitmap, bool all);
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 06/45] powerpc/powernv: Reorder fields in struct pnv_phb
  2016-02-17  3:43 ` [PATCH v8 06/45] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
@ 2016-04-13  5:56   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  5:56 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This groups together the fields in struct pnv_phb that are related
> to PE allocation. No logical change.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>   arch/powerpc/platforms/powernv/pci.h | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 78f035e..f2a1452 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -140,15 +140,14 @@ struct pnv_phb {
>   		unsigned int		io_segsize;
>   		unsigned int		io_pci_base;
>
> -		/* PE allocation bitmap */
> -		unsigned long		*pe_alloc;
> -		/* PE allocation mutex */
> +		/* PE allocation */
>   		struct mutex		pe_alloc_mutex;
> +		unsigned long		*pe_alloc;
> +		struct pnv_ioda_pe	*pe_array;
>
>   		/* M32 & IO segment maps */
>   		unsigned int		*m32_segmap;
>   		unsigned int		*io_segmap;
> -		struct pnv_ioda_pe	*pe_array;
>
>   		/* IRQ chip */
>   		int			irq_chip_init;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 07/45] powerpc/powernv: Rename PE# fields in struct pnv_phb
  2016-02-17  3:43 ` [PATCH v8 07/45] powerpc/powernv: Rename PE# " Gavin Shan
@ 2016-04-13  5:57   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  5:57 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This renames the fields related to PE number in "struct pnv_phb"
> to better reflect their usage, as Alexey suggested. No logical
> changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
>   arch/powerpc/platforms/powernv/pci-ioda.c    | 58 ++++++++++++++--------------
>   arch/powerpc/platforms/powernv/pci.c         |  2 +-
>   arch/powerpc/platforms/powernv/pci.h         |  4 +-
>   4 files changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 950b3e5..69e41ce 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -75,7 +75,7 @@ static int pnv_eeh_init(void)
>   		 * and P7IOC separately. So we should regard
>   		 * PE#0 as valid for PHB3 and P7IOC.
>   		 */
> -		if (phb->ioda.reserved_pe != 0)
> +		if (phb->ioda.reserved_pe_idx != 0)
>   			eeh_add_flag(EEH_VALID_PE_ZERO);
>
>   		break;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 10ecd97..1d2514f 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -124,7 +124,7 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>
>   static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>   {
> -	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
> +	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
>   		pr_warn("%s: Invalid PE %d on PHB#%x\n",
>   			__func__, pe_no, phb->hose->global_number);
>   		return;
> @@ -144,8 +144,8 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>
>   	do {
>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
> -					phb->ioda.total_pe, 0);
> -		if (pe >= phb->ioda.total_pe)
> +					phb->ioda.total_pe_num, 0);
> +		if (pe >= phb->ioda.total_pe_num)
>   			return IODA_INVALID_PE;
>   	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>
> @@ -199,13 +199,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>   	 * expected to be 0 or last one of PE capabicity.
>   	 */
>   	r = &phb->hose->mem_resources[1];
> -	if (phb->ioda.reserved_pe == 0)
> +	if (phb->ioda.reserved_pe_idx == 0)
>   		r->start += phb->ioda.m64_segsize;
> -	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>   		r->end -= phb->ioda.m64_segsize;
>   	else
>   		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
> -			phb->ioda.reserved_pe);
> +			phb->ioda.reserved_pe_idx);
>
>   	return 0;
>
> @@ -274,7 +274,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   		return IODA_INVALID_PE;
>
>   	/* Allocate bitmap */
> -	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
> +	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>   	pe_alloc = kzalloc(size, GFP_KERNEL);
>   	if (!pe_alloc) {
>   		pr_warn("%s: Out of memory !\n",
> @@ -290,7 +290,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   	 * contributed by its child buses. For the case, we needn't
>   	 * pick M64 dependent PE#.
>   	 */
> -	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
> +	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>   		kfree(pe_alloc);
>   		return IODA_INVALID_PE;
>   	}
> @@ -301,8 +301,8 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   	 */
>   	master_pe = NULL;
>   	i = -1;
> -	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
> -		phb->ioda.total_pe) {
> +	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
> +		phb->ioda.total_pe_num) {
>   		pe = &phb->ioda.pe_array[i];
>
>   		if (!master_pe) {
> @@ -355,7 +355,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>   	hose->mem_offset[1] = res->start - pci_addr;
>
>   	phb->ioda.m64_size = resource_size(res);
> -	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
> +	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
>   	phb->ioda.m64_base = pci_addr;
>
>   	pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
> @@ -456,7 +456,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
>   	s64 rc;
>
>   	/* Sanity check on PE number */
> -	if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
> +	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
>   		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
>
>   	/*
> @@ -1088,7 +1088,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>   	 * same GPU get assigned the same PE.
>   	 */
>   	gpu_pdev = pnv_pci_get_gpu_dev(npu_pdev);
> -	for (pe_num = 0; pe_num < phb->ioda.total_pe; pe_num++) {
> +	for (pe_num = 0; pe_num < phb->ioda.total_pe_num; pe_num++) {
>   		pe = &phb->ioda.pe_array[pe_num];
>   		if (!pe->pdev)
>   			continue;
> @@ -1537,9 +1537,9 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   		} else {
>   			mutex_lock(&phb->ioda.pe_alloc_mutex);
>   			*pdn->pe_num_map = bitmap_find_next_zero_area(
> -				phb->ioda.pe_alloc, phb->ioda.total_pe,
> +				phb->ioda.pe_alloc, phb->ioda.total_pe_num,
>   				0, num_vfs, 0);
> -			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
> +			if (*pdn->pe_num_map >= phb->ioda.total_pe_num) {
>   				mutex_unlock(&phb->ioda.pe_alloc_mutex);
>   				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
>   				kfree(pdn->pe_num_map);
> @@ -2858,7 +2858,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   	pdn->m64_single_mode = false;
>
>   	total_vfs = pci_sriov_get_totalvfs(pdev);
> -	mul = phb->ioda.total_pe;
> +	mul = phb->ioda.total_pe_num;
>   	total_vf_bar_sz = 0;
>
>   	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> @@ -2960,7 +2960,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   			region.end   = res->end - phb->ioda.io_pci_base;
>   			index = region.start / phb->ioda.io_segsize;
>
> -			while (index < phb->ioda.total_pe &&
> +			while (index < phb->ioda.total_pe_num &&
>   			       region.start <= region.end) {
>   				phb->ioda.io_segmap[index] = pe->pe_number;
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> @@ -2985,7 +2985,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   				       phb->ioda.m32_pci_base;
>   			index = region.start / phb->ioda.m32_segsize;
>
> -			while (index < phb->ioda.total_pe &&
> +			while (index < phb->ioda.total_pe_num &&
>   			       region.start <= region.end) {
>   				phb->ioda.m32_segmap[index] = pe->pe_number;
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> @@ -3300,13 +3300,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   		pr_err("  Failed to map registers !\n");
>
>   	/* Initialize more IODA stuff */
> -	phb->ioda.total_pe = 1;
> +	phb->ioda.total_pe_num = 1;
>   	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
>   	if (prop32)
> -		phb->ioda.total_pe = be32_to_cpup(prop32);
> +		phb->ioda.total_pe_num = be32_to_cpup(prop32);
>   	prop32 = of_get_property(np, "ibm,opal-reserved-pe", NULL);
>   	if (prop32)
> -		phb->ioda.reserved_pe = be32_to_cpup(prop32);
> +		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>
>   	/* Parse 64-bit MMIO range */
>   	pnv_ioda_parse_m64_window(phb);
> @@ -3315,29 +3315,29 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	/* FW Has already off top 64k of M32 space (MSI space) */
>   	phb->ioda.m32_size += 0x10000;
>
> -	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe;
> +	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe_num;
>   	phb->ioda.m32_pci_base = hose->mem_resources[0].start - hose->mem_offset[0];
>   	phb->ioda.io_size = hose->pci_io_size;
> -	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe;
> +	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
>   	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
>
>   	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
> -	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
> +	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>   	m32map_off = size;
> -	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
> +	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
>   	if (phb->type == PNV_PHB_IODA1) {
>   		iomap_off = size;
> -		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
> +		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
>   	}
>   	pemap_off = size;
> -	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
> +	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>   	aux = memblock_virt_alloc(size, 0);
>   	phb->ioda.pe_alloc = aux;
>   	phb->ioda.m32_segmap = aux + m32map_off;
>   	if (phb->type == PNV_PHB_IODA1)
>   		phb->ioda.io_segmap = aux + iomap_off;
>   	phb->ioda.pe_array = aux + pemap_off;
> -	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
> +	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>
>   	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
>   	INIT_LIST_HEAD(&phb->ioda.pe_list);
> @@ -3356,7 +3356,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   #endif
>
>   	pr_info("  %03d (%03d) PE's M32: 0x%x [segment=0x%x]\n",
> -		phb->ioda.total_pe, phb->ioda.reserved_pe,
> +		phb->ioda.total_pe_num, phb->ioda.reserved_pe_idx,
>   		phb->ioda.m32_size, phb->ioda.m32_segsize);
>   	if (phb->ioda.m64_size)
>   		pr_info("                 M64: 0x%lx [segment=0x%lx]\n",
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index f838fcf..a53e4c8 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -380,7 +380,7 @@ static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
>   	 */
>   	pe_no = pdn->pe_number;
>   	if (pe_no == IODA_INVALID_PE) {
> -		pe_no = phb->ioda.reserved_pe;
> +		pe_no = phb->ioda.reserved_pe_idx;
>   	}
>
>   	/*
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index f2a1452..784882a 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -120,8 +120,8 @@ struct pnv_phb {
>
>   	struct {
>   		/* Global bridge info */
> -		unsigned int		total_pe;
> -		unsigned int		reserved_pe;
> +		unsigned int		total_pe_num;
> +		unsigned int		reserved_pe_idx;
>
>   		/* 32-bit MMIO window */
>   		unsigned int		m32_size;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap
  2016-02-17  3:43 ` [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap Gavin Shan
@ 2016-04-13  6:21   ` Alexey Kardashevskiy
  2016-04-13  7:53       ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  6:21 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> There are two arrays for IO and M32 segment maps on every PHB.
> The arrays are indexed by segment number, and the value stored
> in each element is the number of the PE that the segment is
> assigned to. Initially, all elements in those two arrays are
> zero, meaning all segments are assigned to PE#0, which is wrong.
>
> This fixes the initial values of the elements of those two arrays
> to IODA_INVALID_PE, meaning that no segment is assigned to any
> PE.

This is ok.

> In order to use IODA_INVALID_PE (-1) to represent invalid PE
> number, the types of those two arrays are changed from "unsigned int"
> to "int".

"unsigned" can carry (-1) perfectly fine, just add a type cast to 
IODA_INVALID_PE:

#define IODA_INVALID_PE    (unsigned int)(-1)

Using a signed type for indexes that cannot be negative does not make much
sense: in addition to checking the upper boundary, you have to check for
"< 0" too.

OPAL uses unsigned types for the PE number (uint64_t, uint32_t or uint16_t,
which is quite funny).

pnv_ioda_pe::pe_number is "unsigned", and as far as I can see in
pnv_ioda_setup_dev_PE() this pe_number is the same thing.

Some printk() calls print the PE number as "%x" (which implies "unsigned").


I suggest changing the pci_dn::pe_number type from "int" to "unsigned int" 
to match pnv_ioda_pe::pe_number, in a separate patch. Or do not touch types 
for now.
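
A minimal standalone sketch of the unsigned approach, not the kernel code. The sentinel follows the define suggested above (with outer parentheses added); TOTAL_PE_NUM, pe_is_valid() and segmap_init() are made-up names for illustration only. With an unsigned PE number, the single upper-bound comparison also rejects the (-1) sentinel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel as suggested above; outer parentheses keep it safe in expressions. */
#define IODA_INVALID_PE	((unsigned int)(-1))

/* Hypothetical PE count, standing in for phb->ioda.total_pe_num. */
#define TOTAL_PE_NUM	8u

/* One comparison is enough: IODA_INVALID_PE is UINT_MAX, so it fails too. */
static bool pe_is_valid(unsigned int pe_no)
{
	return pe_no < TOTAL_PE_NUM;
}

/* Mark every segment as unassigned instead of leaving it at zero (PE#0). */
static void segmap_init(unsigned int *segmap, size_t nr_segs)
{
	size_t i;

	assert(segmap != NULL);
	for (i = 0; i < nr_segs; i++)
		segmap[i] = IODA_INVALID_PE;
}
```

After segmap_init(), every element fails pe_is_valid(), while PE#0 remains a perfectly valid owner.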


> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++++--
>   arch/powerpc/platforms/powernv/pci.h      | 4 ++--
>   2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1d2514f..44cc5f3 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>   	const __be64 *prop64;
>   	const __be32 *prop32;
> -	int len;
> +	int i, len;
>   	u64 phb_id;
>   	void *aux;
>   	long rc;
> @@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	aux = memblock_virt_alloc(size, 0);
>   	phb->ioda.pe_alloc = aux;
>   	phb->ioda.m32_segmap = aux + m32map_off;
> -	if (phb->type == PNV_PHB_IODA1)
> +	for (i = 0; i < phb->ioda.total_pe_num; i++)
> +		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
> +	if (phb->type == PNV_PHB_IODA1) {
>   		phb->ioda.io_segmap = aux + iomap_off;
> +		for (i = 0; i < phb->ioda.total_pe_num; i++)
> +			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
> +	}
>   	phb->ioda.pe_array = aux + pemap_off;
>   	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 784882a..36c4965 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -146,8 +146,8 @@ struct pnv_phb {
>   		struct pnv_ioda_pe	*pe_array;
>
>   		/* M32 & IO segment maps */
> -		unsigned int		*m32_segmap;
> -		unsigned int		*io_segmap;
> +		int			*m32_segmap;
> +		int			*io_segmap;
>
>   		/* IRQ chip */
>   		int			irq_chip_init;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2016-02-17  3:43 ` [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
@ 2016-04-13  6:45   ` Alexey Kardashevskiy
  2016-04-20  0:04     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  6:45 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> The original implementation of pnv_ioda_setup_pe_seg() configures
> IO and M32 segments with separate logic, which can be merged by
> caching @segmap, @seg_size, @win in advance. This shouldn't
> cause any behavioural changes.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
>   1 file changed, 28 insertions(+), 34 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 44cc5f3..fd7d382 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2940,8 +2940,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   	struct pnv_phb *phb = hose->private_data;
>   	struct pci_bus_region region;
>   	struct resource *res;
> -	int i, index;
> -	int rc;
> +	unsigned int segsize;
> +	int *segmap, index, i;
> +	uint16_t win;
> +	int64_t rc;
>
>   	/*
>   	 * NOTE: We only care PCI bus based PE for now. For PCI
> @@ -2958,23 +2960,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   		if (res->flags & IORESOURCE_IO) {
>   			region.start = res->start - phb->ioda.io_pci_base;
>   			region.end   = res->end - phb->ioda.io_pci_base;
> -			index = region.start / phb->ioda.io_segsize;
> -
> -			while (index < phb->ioda.total_pe_num &&
> -			       region.start <= region.end) {
> -				phb->ioda.io_segmap[index] = pe->pe_number;
> -				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> -					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
> -				if (rc != OPAL_SUCCESS) {
> -					pr_err("%s: OPAL error %d when mapping IO "
> -					       "segment #%d to PE#%d\n",
> -					       __func__, rc, index, pe->pe_number);
> -					break;
> -				}
> -
> -				region.start += phb->ioda.io_segsize;
> -				index++;
> -			}
> +			segsize      = phb->ioda.io_segsize;
> +			segmap       = phb->ioda.io_segmap;
> +			win          = OPAL_IO_WINDOW_TYPE;
>   		} else if ((res->flags & IORESOURCE_MEM) &&
>   			   !pnv_pci_is_mem_pref_64(res->flags)) {
>   			region.start = res->start -
> @@ -2983,23 +2971,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   			region.end   = res->end -
>   				       hose->mem_offset[0] -
>   				       phb->ioda.m32_pci_base;
> -			index = region.start / phb->ioda.m32_segsize;
> -
> -			while (index < phb->ioda.total_pe_num &&
> -			       region.start <= region.end) {
> -				phb->ioda.m32_segmap[index] = pe->pe_number;
> -				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> -					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
> -				if (rc != OPAL_SUCCESS) {
> -					pr_err("%s: OPAL error %d when mapping M32 "
> -					       "segment#%d to PE#%d",
> -					       __func__, rc, index, pe->pe_number);
> -					break;
> -				}
> +			segsize      = phb->ioda.m32_segsize;
> +			segmap       = phb->ioda.m32_segmap;
> +			win          = OPAL_M32_WINDOW_TYPE;
> +		} else {
> +			continue;
> +		}
>
> -				region.start += phb->ioda.m32_segsize;
> -				index++;
> +		index = region.start / segsize;
> +		while (index < phb->ioda.total_pe_num &&
> +		       region.start <= region.end) {
> +			segmap[index] = pe->pe_number;
> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +					pe->pe_number, win, 0, index);
> +			if (rc != OPAL_SUCCESS) {
> +				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
> +					__func__, rc, win, index,
> +					pe->phb->hose->global_number,
> +					pe->pe_number);
> +				break;

Please move this loop to a helper and stop caching segsize/segmap/win; this 
will make the code easier to read and the next patch will look much cleaner 
as it will not have to move this exact loop.
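
A rough sketch of what such a helper could look like, written as standalone C rather than kernel code. The names map_region_segs(), map_seg_fn and fake_map_seg() are made up for illustration, and the callback merely stands in for opal_pci_map_pe_mmio_window(); this is an assumption about the shape of the helper, not the actual follow-up patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_pci_map_pe_mmio_window(); 0 means success. */
typedef int64_t (*map_seg_fn)(unsigned int pe_no, uint16_t win,
			      unsigned int seg);

/* Stub that always succeeds, only so the helper can be tried out of tree. */
static int64_t fake_map_seg(unsigned int pe_no, uint16_t win, unsigned int seg)
{
	(void)pe_no;
	(void)win;
	(void)seg;
	return 0;
}

/*
 * Assign every segment covered by [start, end] to pe_no and record the
 * assignment in segmap. Stops at the first mapping error, as the loop
 * in the patch does. Returns the number of segments mapped successfully.
 */
static unsigned int map_region_segs(unsigned int *segmap, unsigned int nr_segs,
				    uint64_t start, uint64_t end,
				    unsigned int segsize, uint16_t win,
				    unsigned int pe_no, map_seg_fn map_seg)
{
	unsigned int index, mapped = 0;

	assert(segsize != 0);
	index = (unsigned int)(start / segsize);
	while (index < nr_segs && start <= end) {
		segmap[index] = pe_no;
		if (map_seg(pe_no, win, index) != 0)
			break;
		start += segsize;
		index++;
		mapped++;
	}
	return mapped;
}
```

The IO branch would then pass phb->ioda.io_segmap, phb->ioda.io_segsize and OPAL_IO_WINDOW_TYPE, and the M32 branch the m32 equivalents, so neither branch has to cache anything.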


>   			}
> +
> +			region.start += segsize;
> +			index++;
>   		}
>   	}
>   }
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption
  2016-02-17  3:43 ` [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption Gavin Shan
@ 2016-04-13  7:09   ` Alexey Kardashevskiy
  2016-04-20  0:05     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  7:09 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> When unplugging PCI devices, their parent PEs might be offline.
> The M64 resources consumed by those PEs should be released at that
> time. In the same way as M32 segment consumption is tracked, this
> introduces an array in the PHB to track the mapping between M64
> segments and PE numbers.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

That said, it would help to mention in the commit log why the M64 segments
are not tracked and set up by pnv_ioda_setup_one_res(), which already
exists at this point in the series.
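
To make the purpose of the new array concrete, here is a rough user-space sketch of how a segment map of this kind could be walked when a PE goes away. The helper name release_m64_segments() is hypothetical and the value of INVALID_PE is an assumption standing in for IODA_INVALID_PE; a real implementation would also have to ask firmware to unmap each segment, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for IODA_INVALID_PE; the value -1 is an assumption. */
#define INVALID_PE	(-1)

/*
 * Hypothetical helper: hand back every M64 segment currently owned by
 * pe_number by resetting its slot in the segment map.
 * Returns the number of segments released.
 */
static unsigned int release_m64_segments(int *m64_segmap, size_t total_pe_num,
					 int pe_number)
{
	unsigned int released = 0;
	size_t i;

	assert(m64_segmap != NULL);

	for (i = 0; i < total_pe_num; i++) {
		if (m64_segmap[i] != pe_number)
			continue;
		/* The firmware unmap call would go here in real code. */
		m64_segmap[i] = INVALID_PE;
		released++;
	}
	return released;
}
```

The point of the sketch is only that a per-segment owner array makes the release path a simple scan, the same way m32_segmap does for M32.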


> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++--
>   arch/powerpc/platforms/powernv/pci.h      |  1 +
>   2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 7330a73..fc0374a 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -305,6 +305,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   		phb->ioda.total_pe_num) {
>   		pe = &phb->ioda.pe_array[i];
>
> +		phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
>   		if (!master_pe) {
>   			pe->flags |= PNV_IODA_PE_MASTER;
>   			INIT_LIST_HEAD(&pe->slaves);
> @@ -3245,7 +3246,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   {
>   	struct pci_controller *hose;
>   	struct pnv_phb *phb;
> -	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
> +	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>   	const __be64 *prop64;
>   	const __be32 *prop32;
>   	int i, len;
> @@ -3332,6 +3333,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>
>   	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>   	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
> +	m64map_off = size;
> +	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
>   	m32map_off = size;
>   	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
>   	if (phb->type == PNV_PHB_IODA1) {
> @@ -3342,9 +3345,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>   	aux = memblock_virt_alloc(size, 0);
>   	phb->ioda.pe_alloc = aux;
> +	phb->ioda.m64_segmap = aux + m64map_off;
>   	phb->ioda.m32_segmap = aux + m32map_off;
> -	for (i = 0; i < phb->ioda.total_pe_num; i++)
> +	for (i = 0; i < phb->ioda.total_pe_num; i++) {
> +		phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
>   		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
> +	}
>   	if (phb->type == PNV_PHB_IODA1) {
>   		phb->ioda.io_segmap = aux + iomap_off;
>   		for (i = 0; i < phb->ioda.total_pe_num; i++)
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 36c4965..866a5ea 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -146,6 +146,7 @@ struct pnv_phb {
>   		struct pnv_ioda_pe	*pe_array;
>
>   		/* M32 & IO segment maps */
> +		int			*m64_segmap;
>   		int			*m32_segmap;
>   		int			*io_segmap;
>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 12/45] powerpc/powernv: Rename M64 related functions
  2016-02-17  3:43 ` [PATCH v8 12/45] powerpc/powernv: Rename M64 related functions Gavin Shan
@ 2016-04-13  7:20       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  7:20 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This renames those functions picking PE number based on consumed
> M64 segments, mapping M64 segments to PEs as those functions are
> going to be shared by IODA1/IODA2 in next patch. No logical changes
> introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>




> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index fc0374a..1dc663a 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -219,7 +219,7 @@ fail:
>   	return -EIO;
>   }
>
> -static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
> +static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>   					 unsigned long *pe_bitmap)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> @@ -246,22 +246,22 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
>   	}
>   }
>
> -static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
> -				     unsigned long *pe_bitmap,
> -				     bool all)
> +static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
> +				    unsigned long *pe_bitmap,
> +				    bool all)
>   {
>   	struct pci_dev *pdev;
>
>   	list_for_each_entry(pdev, &bus->devices, bus_list) {
> -		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
> +		pnv_ioda_reserve_dev_m64_pe(pdev, pe_bitmap);
>
>   		if (all && pdev->subordinate)
> -			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
> -						 pe_bitmap, all);
> +			pnv_ioda_reserve_m64_pe(pdev->subordinate,
> +						pe_bitmap, all);
>   	}
>   }
>
> -static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
> +static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> @@ -283,7 +283,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   	}
>
>   	/* Figure out reserved PE numbers by the PE */
> -	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
> +	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
>
>   	/*
>   	 * the current bus might not own M64 window and that's all
> @@ -365,8 +365,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>   	/* Use last M64 BAR to cover M64 window */
>   	phb->ioda.m64_bar_idx = 15;
>   	phb->init_m64 = pnv_ioda2_init_m64;
> -	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
> -	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
> +	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
> +	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>   }
>
>   static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (38 preceding siblings ...)
       [not found] ` <1455680668-23298-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2016-04-13  7:28 ` Alexey Kardashevskiy
  2016-04-13  7:42   ` Gavin Shan
  39 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  7:28 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This series of patches rebases on powerpc/next branch, plus below additional
> patches:
>
>     <This series of patches>
>     <Followup 3 patches from Gavin on SRIOV EEH, which aren't posted>
>     https://patchwork.ozlabs.org/patch/581315/	(PATCH[1/9] Richard's SRIOV EEH)
>     https://patchwork.ozlabs.org/patch/582639/	(PATCH[1/1] Gavin's EEH fix)
>     https://patchwork.ozlabs.org/patch/582093/	(PATCH[1/1] Gavin's EEH fix)
>     https://patchwork.ozlabs.org/patch/580626/	(PATCH[1/4] Gavin's PCI fix)
>     https://patchwork.ozlabs.org/patch/580153/	(PATCH[1/1] Andrew's EEH minor fix)
>     https://patchwork.ozlabs.org/patch/566827/	(PATCH[1/1] Russell's P5IOC2 removal)
>     https://patchwork.ozlabs.org/patch/534154/	(PATCH[1/7] Richard's SRIOV rework)
>     commit 388f7b1 ("Linux 4.5-rc3")
>
> The series of patches intend to support PCI slot for PowerPC PowerNV platform,
> which is running on top of skiboot firmware. The patchset requires corresponding
> changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
> for review. The PCI slots are exposed by skiboot with device node properties,
> and kernel utilizes those properties to populate PCI slots accordingly.
>
> The original PCI infrastructure on PowerNV platform can't support hotplug
> because the PE is assigned during PHB fixup time, which is called for once
> during system boot time. For this, the PCI infrastructure on PowerNV platform
> has been reworked for a lot. After that, the PE and its corresponding resources
> (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
> PCI bridge's resources, which might decide PE# assigned to the PE (e.g. M64
> resources, on P8 strictly speaking). Each PE will maintain a reference count,
> which is (number of child PCI devices + 1). That indicates when last child PCI
> device leaves the PE, the PE and its included resources will be released and put
> back into free pool again. With this design, the PE will be released when EEH PE
> is released. PATCH[1 - 23] are related to this part.
>
>  From skiboot perspective, PCI slot is providing (hot/fundamental/complete)
> resets to EEH. The kernel gets to know if skiboot supports various reset on one
> particular PCI slot through device-tree node. If it does, EEH will utilize the
> functionality provided by skiboot. Besides, the device-tree nodes have to change
> in order to support PCI hotplug. For example, when one PCI adapter inserted to
> one slot, its device-tree node should be added to the system dynamically. Conversely,
> the device-tree node should be removed from the system when the PCI adapter is going
> to be offline. Since pci_dn and eeh_dev have same life cycle as PCI device nodes,
> they should be added/removed accordingly during PCI hotplug. PATCH[24 - 39] are
> doing the related work.
>
> The OF driver is changed to support unflattening FDT blob for sub-tree, which
> is covered by PATCH[40 - 44].
>
> The last one, PATCH[45], is the standalone PCI hotplug driver for PowerPC PowerNV
> platform.
>
> =======
> Testing
> =======
> 1. Unplug adapters behind non-empty slot, then plug them.
>
>     1.1 Check status
>     # cat /sys/bus/pci/slots/C10/address
>     0003:09:00
>     # cat /sys/bus/pci/slots/C10/adapter
>     1
>     # cat /sys/bus/pci/slots/C10/power
>     1
>     # lspci
>     0003:09:00.0 Ethernet controller: \
>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>     0003:09:00.1 Ethernet controller: \
>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>     0003:09:00.2 Ethernet controller: \
>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>     0003:09:00.3 Ethernet controller: \
>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>     # lspci -t
>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>      |                                           +-08.0-[04-08]--
>      |                                           +-09.0-[09]--+-00.0
>      |                                           |            +-00.1
>      |                                           |            +-00.2
>      |                                           |            \-00.3
>      |                                           +-10.0-[0a-0e]--
>      |                                           \-11.0-[0f-13]--
>
>     1.2 Unplug adapter 0003:09:00.x
>     # echo 0 > /sys/bus/pci/slots/C10/power
>     # lspci -t
>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>      |                                           +-08.0-[04-08]--
>      |                                           +-09.0-[09]--
>      |                                           +-10.0-[0a-0e]--
>      |                                           \-11.0-[0f-13]--
>
>     1.3 Plug adapter 0003:09:00.x
>     # echo 1 > /sys/bus/pci/slots/C10/power


Do I understand correctly that the adapter was not physically moved in/out 
of the slot between 1.2 and 1.3?



>     # lspci -t
>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>      |                                           +-08.0-[04-08]--
>      |                                           +-09.0-[09]--+-00.0
>      |                                           |            +-00.1
>      |                                           |            +-00.2
>      |                                           |            \-00.3
>      |                                           +-10.0-[0a-0e]--
>      |                                           \-11.0-[0f-13]--
>
>
>     1.4 Inject EEH error to adapter 0003:09:00.x, which is recovered.

I am confused - why is this needed to test hotplug?




>     # cat /sys/bus/pci/devices/0003:09:00.0/eeh_pe_config_addr
>     0x1
>     # echo 1:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0003/err_injct
>     # lspci -ns 0003:09:00.0
>     # dmesg | grep EEH
>     EEH: Frozen PHB#3-PE#1 detected
>     EEH: PE location: U78C9.001.WZS00CF-P1-C10, PHB location: N/A
>     EEH: Detected PCI bus error on PHB#3-PE#1
>     EEH: This PCI device has failed 1 times in the last hour
>     EEH: Notify device drivers to shutdown
>     EEH: Collect temporary log
>     EEH: Reset without hotplug activity
>     EEH: Notify device drivers the completion of reset
>     EEH: Notify device driver to resume
>
> 2. Plug adapter and then unplug it. This requires hack in skiboot
>     to skip probing the adapters behind the target (C12 in the
>     testing) for once.
>
>     2.1 Check status
>     # cat /sys/bus/pci/slots/C12/address
>     0001:06
>     # cat /sys/bus/pci/slots/C12/power
>     0
>     # cat /sys/bus/pci/slots/C12/adapter
>     1
>     # lspci -t
>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>                                                 +-08.0-[05]----00.0
>                                                 \-09.0-[06-0a]--
>
>     2.2 Plug adapter 0001:06:00.x
>     # echo 1 > /sys/bus/pci/slots/C12/power
>     # lspci -t
>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>                                                 +-08.0-[05]----00.0
>                                                 \-09.0-[06-0a]--+-00.0
>                                                                 \-00.1
>     # lspci
>     0001:06:00.0 Ethernet controller: \
>     Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>     0001:06:00.1 Ethernet controller: \
>     Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>
>     2.3 Inject EEH error to adapter 0001:06:00.x, which is recovered
>     # cat /sys/bus/pci/devices/0001:06:00.0/eeh_pe_config_addr
>     0x2
>     # echo 2:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0001/err_injct
>     # dmesg | grep EEH
>     EEH: Frozen PHB#1-PE#2 detected
>     EEH: PE location: U78C9.001.WZS00CF-P1-C12, PHB location: N/A
>     EEH: Detected PCI bus error on PHB#1-PE#2
>     EEH: This PCI device has failed 1 times in the last hour
>     EEH: Notify device drivers to shutdown
>     EEH: Collect temporary log
>     EEH: Reset without hotplug activity
>     EEH: Notify device drivers the completion of reset
>     EEH: Notify device driver to resume
>
>     2.4 Unplug adapter 0001:06:00.x
>     # echo 0 > /sys/bus/pci/slots/C12/power
>     # lspci -t
>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>                                                 +-08.0-[05]----00.0
>                                                 \-09.0-[06-0a]--
>
> =========
> Changelog
> =========
> v8:
>     * Rebased to linux-powerpc next branch.
>     * Resolve comments from Alexey and Daniel on PCI part
>     * Resolve comments from Rob on fdt.c
>     * Retested (refer to the "Testing section")
> v7:
>     * Reworked revision to some extent.
>     * Rebased to powerpc/next repository.
>     * Reorder/split/merge/drop according - Alexey.
>     * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
>     * Merged 3 files to one for the hotplug driver - Alexey.
>     * As part of OPAL API, defined macros for PCI slot power state, hotplug
>       message type. Defined macros for PCI slot power confirmed state in
>       hotplug driver.
>     * Misc comments from Alexey.
>     * Reworked unflatten_dt_node() to avoid recursive function calls.
>     * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
> v6:
>     * Patch reorder, split, squash - Alexey.
>     * Minor coding style - Alexey.
>     * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
>     * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
>     * Concurrent depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
>     * Replace overlay with of_changeset - Grant
> v5:
>     * Rebased to 4.1.rc6 and some unmerged patches as below:
>       Alexey's DDW patchset (v11);
>       Gavin's EEH error injection support (in mpe's next branch);
>       Richard's EEH cleanup patches (in mpe's next branch);
>       Richard's EEH support for VF (v7);
>       Gavin's misc EEH fixes for 4.2;
>     * The revision bases on skiboot corresponding patches (v7):
>       https://patchwork.ozlabs.org/patch/480437/
>     * Utilize OF overlay to update device-tree with help of newly introduced
>       OPAL API opal_get_overlay_dt().
>     * Split patches for easy review according to aik's comments.
>     * Fix coding style from checkpatch.pl as pointed out by aik.
>     * Code cleanup and misc fixup according to aik's input.
> v4:
>     * Rebased to 4.1.RC1
>     * Added API to unflatten FDT blob to device node sub-tree, which is attached
>       the indicated parent device node. The original mechanism based on formatted
>       string stream has been dropped.
>     * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
>       was picked up sent to linux-ppc@ separately for review as Richard's "VF EEH
>       Support" depends on that.
> v3:
>     * Rebased to 4.1.RC0
>     * PowerNV PCI infrastructure is totally refactored in order to support PCI
>       hotplug. The PowerNV hotplug driver is also reworked a lot because of
>       the changes in skiboot in order to support PCI hotplug.
>
> Gavin Shan (45):
>    PCI: Add pcibios_setup_bridge()
>    powerpc/pci: Override pcibios_setup_bridge()
>    powerpc/pci: Cleanup on struct pci_controller_ops
>    powerpc/powernv: Cleanup on pci_controller_ops instances
>    powerpc/powernv: Drop phb->bdfn_to_pe()
>    powerpc/powernv: Reorder fields in struct pnv_phb
>    powerpc/powernv: Rename PE# fields in struct pnv_phb
>    powerpc/powernv: Fix initial IO and M32 segmap
>    powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
>    powerpc/powernv: IO and M32 mapping based on PCI device resources
>    powerpc/powernv: Track M64 segment consumption
>    powerpc/powernv: Rename M64 related functions
>    powerpc/powernv/ioda1: M64 support on P7IOC
>    powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
>    powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
>    powerpc/powernv: Remove DMA32 PE list
>    powerpc/powernv/ioda1: Improve DMA32 segment track
>    powerpc/powernv: Increase PE# capacity
>    powerpc/powernv: Use PE instead of number during setup and release
>    powerpc/powernv: Allocate PE# in reverse order
>    powerpc/powernv: Create PEs at PCI hot plugging time
>    powerpc/powernv/ioda1: Support releasing IODA1 TCE table
>    powerpc/powernv: Dynamically release PEs
>    powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
>    powerpc/pci: Rename pcibios_find_pci_bus()
>    powerpc/pci: Move pci_find_bus_by_node() around
>    powerpc/pci: Export pci_add_device_node_info()
>    powerpc/pci: Introduce pci_remove_device_node_info()
>    powerpc/pci: Export pci_traverse_device_nodes()
>    powerpc/pci: Delay populating pdn
>    powerpc/pci: Don't scan empty slot
>    powerpc/pci: Update bridge windows on PCI plug
>    powerpc/powernv: Simplify pnv_eeh_reset()
>    powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
>    powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
>    powerpc/powernv: Support PCI slot ID
>    powerpc/powernv: Use firmware PCI slot reset infrastructure
>    powerpc/powernv: Functions to get/set PCI slot status
>    powerpc/powernv: Select OF_DYNAMIC
>    drivers/of: Split unflatten_dt_node()
>    drivers/of: Avoid recursively calling unflatten_dt_node()
>    drivers/of: Rename unflatten_dt_node()
>    drivers/of: Specify parent node in of_fdt_unflatten_tree()
>    drivers/of: Return allocated memory from of_fdt_unflatten_tree()
>    PCI/hotplug: PowerPC PowerNV PCI hotplug driver
>
>   arch/powerpc/include/asm/eeh.h                 |    2 +-
>   arch/powerpc/include/asm/opal-api.h            |   17 +-
>   arch/powerpc/include/asm/opal.h                |    8 +-
>   arch/powerpc/include/asm/pci-bridge.h          |   25 +-
>   arch/powerpc/include/asm/pnv-pci.h             |    7 +
>   arch/powerpc/include/asm/ppc-pci.h             |    8 +-
>   arch/powerpc/kernel/eeh_dev.c                  |   17 +-
>   arch/powerpc/kernel/eeh_driver.c               |   12 +-
>   arch/powerpc/kernel/pci-common.c               |   16 +-
>   arch/powerpc/kernel/pci-hotplug.c              |   47 +-
>   arch/powerpc/kernel/pci_dn.c                   |   89 +-
>   arch/powerpc/platforms/maple/pci.c             |   34 +-
>   arch/powerpc/platforms/pasemi/pci.c            |    3 -
>   arch/powerpc/platforms/powermac/pci.c          |   38 +-
>   arch/powerpc/platforms/powernv/Kconfig         |    1 +
>   arch/powerpc/platforms/powernv/eeh-powernv.c   |  179 ++--
>   arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
>   arch/powerpc/platforms/powernv/pci-ioda.c      | 1243 +++++++++++++++---------
>   arch/powerpc/platforms/powernv/pci.c           |   92 +-
>   arch/powerpc/platforms/powernv/pci.h           |   60 +-
>   arch/powerpc/platforms/pseries/msi.c           |    4 +-
>   arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
>   arch/powerpc/platforms/pseries/setup.c         |    8 +-
>   drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c   |    2 +-
>   drivers/of/fdt.c                               |  372 ++++---
>   drivers/of/unittest.c                          |    2 +-
>   drivers/pci/hotplug/Kconfig                    |   12 +
>   drivers/pci/hotplug/Makefile                   |    3 +
>   drivers/pci/hotplug/pnv_php.c                  |  870 +++++++++++++++++
>   drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
>   drivers/pci/hotplug/rpaphp_core.c              |    4 +-
>   drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
>   drivers/pci/setup-bus.c                        |    5 +
>   include/linux/of_fdt.h                         |    5 +-
>   include/linux/pci.h                            |    1 +
>   35 files changed, 2360 insertions(+), 874 deletions(-)
>   create mode 100644 drivers/pci/hotplug/pnv_php.c
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
  2016-02-17  3:43 ` [PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe() Gavin Shan
@ 2016-04-13  7:36   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  7:36 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe()
> as it's the counter-part of IODA2's pnv_pci_ioda2_setup_dma_pe().
> No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 8488238..d18b95e 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2026,9 +2026,10 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>   	.free = pnv_ioda2_table_free,
>   };
>
> -static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> -				      struct pnv_ioda_pe *pe, unsigned int base,
> -				      unsigned int segs)
> +static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
> +				       struct pnv_ioda_pe *pe,
> +				       unsigned int base,
> +				       unsigned int segs)
>   {
>
>   	struct page *tce_mem = NULL;
> @@ -2616,7 +2617,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   		if (phb->type == PNV_PHB_IODA1) {
>   			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>   				pe->dma_weight, segs);
> -			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
> +			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>   		} else if (phb->type == PNV_PHB_IODA2) {
>   			pe_info(pe, "Assign DMA32 space\n");
>   			segs = 0;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-13  7:28 ` [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Alexey Kardashevskiy
@ 2016-04-13  7:42   ` Gavin Shan
  2016-04-13  9:14       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-13  7:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>This series of patches rebases on powerpc/next branch, plus below additional
>>patches:
>>
>>    <This series of patches>
>>    <Followup 3 patches from Gavin on SRIOV EEH, which aren't posted>
>>    https://patchwork.ozlabs.org/patch/581315/	(PATCH[1/9] Richard's SRIOV EEH)
>>    https://patchwork.ozlabs.org/patch/582639/	(PATCH[1/1] Gavin's EEH fix)
>>    https://patchwork.ozlabs.org/patch/582093/	(PATCH[1/1] Gavin's EEH fix)
>>    https://patchwork.ozlabs.org/patch/580626/	(PATCH[1/4] Gavin's PCI fix)
>>    https://patchwork.ozlabs.org/patch/580153/	(PATCH[1/1] Andrew's EEH minor fix)
>>    https://patchwork.ozlabs.org/patch/566827/	(PATCH[1/1] Russell's P5IOC2 removal)
>>    https://patchwork.ozlabs.org/patch/534154/	(PATCH[1/7] Richard's SRIOV rework)
>>    commit 388f7b1 ("Linux 4.5-rc3")
>>
>>The series of patches intend to support PCI slot for PowerPC PowerNV platform,
>>which is running on top of skiboot firmware. The patchset requires corresponding
>>changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
>>for review. The PCI slots are exposed by skiboot with device node properties,
>>and kernel utilizes those properties to populate PCI slots accordingly.
>>
>>The original PCI infrastructure on PowerNV platform can't support hotplug
>>because the PE is assigned during PHB fixup time, which is called for once
>>during system boot time. For this, the PCI infrastructure on PowerNV platform
>>has been reworked for a lot. After that, the PE and its corresponding resources
>>(IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
>>PCI bridge's resources, which might decide PE# assigned to the PE (e.g. M64
>>resources, on P8 strictly speaking). Each PE will maintain a reference count,
>>which is (number of child PCI devices + 1). That indicates when last child PCI
>>device leaves the PE, the PE and its included resources will be released and put
>>back into free pool again. With this design, the PE will be released when EEH PE
>>is released. PATCH[1 - 23] are related to this part.
>>
>> From skiboot perspective, PCI slot is providing (hot/fundamental/complete)
>>resets to EEH. The kernel gets to know if skiboot supports various reset on one
>>particular PCI slot through device-tree node. If it does, EEH will utilize the
>>functionality provided by skiboot. Besides, the device-tree nodes have to change
>>in order to support PCI hotplug. For example, when one PCI adapter inserted to
>>one slot, its device-tree node should be added to the system dynamically. Conversely,
>>the device-tree node should be removed from the system when the PCI adapter is going
>>to be offline. Since pci_dn and eeh_dev have same life cycle as PCI device nodes,
>>they should be added/removed accordingly during PCI hotplug. PATCH[24 - 39] are
>>doing the related work.
>>
>>The OF driver is changed to support unflattening FDT blob for sub-tree, which
>>is covered by PATCH[40 - 44].
>>
>>The last one, PATCH[45], is the standalone PCI hotplug driver for PowerPC PowerNV
>>platform.
>>
>>=======
>>Testing
>>=======
>>1. Unplug adapters behind non-empty slot, then plug them.
>>
>>    1.1 Check status
>>    # cat /sys/bus/pci/slots/C10/address
>>    0003:09:00
>>    # cat /sys/bus/pci/slots/C10/adapter
>>    1
>>    # cat /sys/bus/pci/slots/C10/power
>>    1
>>    # lspci
>>    0003:09:00.0 Ethernet controller: \
>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>    0003:09:00.1 Ethernet controller: \
>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>    0003:09:00.2 Ethernet controller: \
>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>    0003:09:00.3 Ethernet controller: \
>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>    # lspci -t
>>    -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>     |                                           +-08.0-[04-08]--
>>     |                                           +-09.0-[09]--+-00.0
>>     |                                           |            +-00.1
>>     |                                           |            +-00.2
>>     |                                           |            \-00.3
>>     |                                           +-10.0-[0a-0e]--
>>     |                                           \-11.0-[0f-13]--
>>
>>    1.2 Unplug adapter 0003:09:00.x
>>    # echo 0 > /sys/bus/pci/slots/C10/power
>>    # lspci -t
>>    -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>     |                                           +-08.0-[04-08]--
>>     |                                           +-09.0-[09]--
>>     |                                           +-10.0-[0a-0e]--
>>     |                                           \-11.0-[0f-13]--
>>
>>    1.3 Plug adapter 0003:09:00.x
>>    # echo 1 > /sys/bus/pci/slots/C10/power
>
>
>Do I understand correctly that the adapter was not physically moved in/out of
>the slot between 1.2 and 1.3?
>

Correct.

>
>
>>    # lspci -t
>>    -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>     |                                           +-08.0-[04-08]--
>>     |                                           +-09.0-[09]--+-00.0
>>     |                                           |            +-00.1
>>     |                                           |            +-00.2
>>     |                                           |            \-00.3
>>     |                                           +-10.0-[0a-0e]--
>>     |                                           \-11.0-[0f-13]--
>>
>>
>>    1.4 Inject EEH error to adapter 0003:09:00.x, which is recovered.
>
>I am confused - why is this needed to test hotplug?
>

Without the series, the EEH reset is always done by the kernel. With the
series applied, the EEH reset can be done in skiboot. That's the
major change introduced by the series from EEH's perspective. Also,
the EEH code was touched by the series.

>>    # cat /sys/bus/pci/devices/0003:09:00.0/eeh_pe_config_addr
>>    0x1
>>    # echo 1:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0003/err_injct
>>    # lspci -ns 0003:09:00.0
>>    # dmesg | grep EEH
>>    EEH: Frozen PHB#3-PE#1 detected
>>    EEH: PE location: U78C9.001.WZS00CF-P1-C10, PHB location: N/A
>>    EEH: Detected PCI bus error on PHB#3-PE#1
>>    EEH: This PCI device has failed 1 times in the last hour
>>    EEH: Notify device drivers to shutdown
>>    EEH: Collect temporary log
>>    EEH: Reset without hotplug activity
>>    EEH: Notify device drivers the completion of reset
>>    EEH: Notify device driver to resume
>>
>>2. Plug adapter and then unplug it. This requires hack in skiboot
>>    to skip probing the adapters behind the target (C12 in the
>>    testing) for once.
>>
>>    2.1 Check status
>>    # cat /sys/bus/pci/slots/C12/address
>>    0001:06
>>    # cat /sys/bus/pci/slots/C12/power
>>    0
>>    # cat /sys/bus/pci/slots/C12/adapter
>>    1
>>    # lspci -t
>>    +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>                                                +-08.0-[05]----00.0
>>                                                \-09.0-[06-0a]--
>>
>>    2.2 Plug adapter 0001:06:00.x
>>    # echo 1 > /sys/bus/pci/slots/C12/power
>>    # lspci -t
>>    +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>                                                +-08.0-[05]----00.0
>>                                                \-09.0-[06-0a]--+-00.0
>>                                                                \-00.1
>>    # lspci
>>    0001:06:00.0 Ethernet controller: \
>>    Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>    0001:06:00.1 Ethernet controller: \
>>    Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>
>>    2.3 Inject EEH error to adapter 0001:06:00.x, which is recovered
>>    # cat /sys/bus/pci/devices/0001:06:00.0/eeh_pe_config_addr
>>    0x2
>>    # echo 2:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0001/err_injct
>>    # dmesg | grep EEH
>>    EEH: Frozen PHB#1-PE#2 detected
>>    EEH: PE location: U78C9.001.WZS00CF-P1-C12, PHB location: N/A
>>    EEH: Detected PCI bus error on PHB#1-PE#2
>>    EEH: This PCI device has failed 1 times in the last hour
>>    EEH: Notify device drivers to shutdown
>>    EEH: Collect temporary log
>>    EEH: Reset without hotplug activity
>>    EEH: Notify device drivers the completion of reset
>>    EEH: Notify device driver to resume
>>
>>    2.4 Unplug adapter 0001:06:00.x
>>    # echo 0 > /sys/bus/pci/slots/C12/power
>>    # lspci -t
>>    +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>                                                +-08.0-[05]----00.0
>>                                                \-09.0-[06-0a]--
>>
>>=========
>>Changelog
>>=========
>>v8:
>>    * Rebased to linux-powerpc next branch.
>>    * Resolve comments from Alexey and Daniel on PCI part
>>    * Resolve comments from Rob on fdt.c
>>    * Retested (refer to the "Testing section")
>>v7:
>>    * Reworked revision to some extent.
>>    * Rebased to powerpc/next repository.
>>    * Reorder/split/merge/drop according - Alexey.
>>    * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
>>    * Merged 3 files to one for the hotplug driver - Alexey.
>>    * As part of OPAL API, defined macros for PCI slot power state, hotplug
>>      message type. Defined macros for PCI slot power confirmed state in
>>      hotplug driver.
>>    * Misc comments from Alexey.
>>    * Reworked unflatten_dt_node() to avoid recursive function calls.
>>    * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
>>v6:
>>    * Patch reorder, split, squash - Alexey.
>>    * Minor coding style - Alexey.
>>    * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
>>    * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
>>    * Concurrent depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
>>    * Replace overlay with of_changeset - Grant
>>v5:
>>    * Rebased to 4.1.rc6 and some unmerged patches as below:
>>      Alexey's DDW patchset (v11);
>>      Gavin's EEH error injection support (in mpe's next branch);
>>      Richard's EEH cleanup patches (in mpe's next branch);
>>      Richard's EEH support for VF (v7);
>>      Gavin's misc EEH fixes for 4.2;
>>    * The revision bases on skiboot corresponding patches (v7):
>>      https://patchwork.ozlabs.org/patch/480437/
>>    * Utilize OF overlay to update device-tree with help of newly introduced
>>      OPAL API opal_get_overlay_dt().
>>    * Split patches for easy review according to aik's comments.
>>    * Fix coding style from checkpatch.pl as pointed by aik.
>>    * Code cleanup and misc fixup according to aik's input.
>>v4:
>>    * Rebased to 4.1.RC1
>>    * Added API to unflatten FDT blob to device node sub-tree, which is attached
>>      the indicated parent device node. The original mechanism based on formatted
>>      string stream has been dropped.
>>    * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
>>      was picked up and sent to linux-ppc@ separately for review as Richard's "VF EEH
>>      Support" depends on that.
>>v3:
>>    * Rebased to 4.1.RC0
>>    * PowerNV PCI infrastructure is totally refactored in order to support PCI
>>      hotplug. The PowerNV hotplug driver is also reworked a lot because of
>>      the changes in skiboot in order to support PCI hotplug.
>>
>>Gavin Shan (45):
>>   PCI: Add pcibios_setup_bridge()
>>   powerpc/pci: Override pcibios_setup_bridge()
>>   powerpc/pci: Cleanup on struct pci_controller_ops
>>   powerpc/powernv: Cleanup on pci_controller_ops instances
>>   powerpc/powernv: Drop phb->bdfn_to_pe()
>>   powerpc/powernv: Reorder fields in struct pnv_phb
>>   powerpc/powernv: Rename PE# fields in struct pnv_phb
>>   powerpc/powernv: Fix initial IO and M32 segmap
>>   powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
>>   powerpc/powernv: IO and M32 mapping based on PCI device resources
>>   powerpc/powernv: Track M64 segment consumption
>>   powerpc/powernv: Rename M64 related functions
>>   powerpc/powernv/ioda1: M64 support on P7IOC
>>   powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
>>   powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
>>   powerpc/powernv: Remove DMA32 PE list
>>   powerpc/powernv/ioda1: Improve DMA32 segment track
>>   powerpc/powernv: Increase PE# capacity
>>   powerpc/powernv: Use PE instead of number during setup and release
>>   powerpc/powernv: Allocate PE# in reverse order
>>   powerpc/powernv: Create PEs at PCI hot plugging time
>>   powerpc/powernv/ioda1: Support releasing IODA1 TCE table
>>   powerpc/powernv: Dynamically release PEs
>>   powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
>>   powerpc/pci: Rename pcibios_find_pci_bus()
>>   powerpc/pci: Move pci_find_bus_by_node() around
>>   powerpc/pci: Export pci_add_device_node_info()
>>   powerpc/pci: Introduce pci_remove_device_node_info()
>>   powerpc/pci: Export pci_traverse_device_nodes()
>>   powerpc/pci: Delay populating pdn
>>   powerpc/pci: Don't scan empty slot
>>   powerpc/pci: Update bridge windows on PCI plug
>>   powerpc/powernv: Simplify pnv_eeh_reset()
>>   powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
>>   powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
>>   powerpc/powernv: Support PCI slot ID
>>   powerpc/powernv: Use firmware PCI slot reset infrastructure
>>   powerpc/powernv: Functions to get/set PCI slot status
>>   powerpc/powernv: Select OF_DYNAMIC
>>   drivers/of: Split unflatten_dt_node()
>>   drivers/of: Avoid recursively calling unflatten_dt_node()
>>   drivers/of: Rename unflatten_dt_node()
>>   drivers/of: Specify parent node in of_fdt_unflatten_tree()
>>   drivers/of: Return allocated memory from of_fdt_unflatten_tree()
>>   PCI/hotplug: PowerPC PowerNV PCI hotplug driver
>>
>>  arch/powerpc/include/asm/eeh.h                 |    2 +-
>>  arch/powerpc/include/asm/opal-api.h            |   17 +-
>>  arch/powerpc/include/asm/opal.h                |    8 +-
>>  arch/powerpc/include/asm/pci-bridge.h          |   25 +-
>>  arch/powerpc/include/asm/pnv-pci.h             |    7 +
>>  arch/powerpc/include/asm/ppc-pci.h             |    8 +-
>>  arch/powerpc/kernel/eeh_dev.c                  |   17 +-
>>  arch/powerpc/kernel/eeh_driver.c               |   12 +-
>>  arch/powerpc/kernel/pci-common.c               |   16 +-
>>  arch/powerpc/kernel/pci-hotplug.c              |   47 +-
>>  arch/powerpc/kernel/pci_dn.c                   |   89 +-
>>  arch/powerpc/platforms/maple/pci.c             |   34 +-
>>  arch/powerpc/platforms/pasemi/pci.c            |    3 -
>>  arch/powerpc/platforms/powermac/pci.c          |   38 +-
>>  arch/powerpc/platforms/powernv/Kconfig         |    1 +
>>  arch/powerpc/platforms/powernv/eeh-powernv.c   |  179 ++--
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
>>  arch/powerpc/platforms/powernv/pci-ioda.c      | 1243 +++++++++++++++---------
>>  arch/powerpc/platforms/powernv/pci.c           |   92 +-
>>  arch/powerpc/platforms/powernv/pci.h           |   60 +-
>>  arch/powerpc/platforms/pseries/msi.c           |    4 +-
>>  arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
>>  arch/powerpc/platforms/pseries/setup.c         |    8 +-
>>  drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c   |    2 +-
>>  drivers/of/fdt.c                               |  372 ++++---
>>  drivers/of/unittest.c                          |    2 +-
>>  drivers/pci/hotplug/Kconfig                    |   12 +
>>  drivers/pci/hotplug/Makefile                   |    3 +
>>  drivers/pci/hotplug/pnv_php.c                  |  870 +++++++++++++++++
>>  drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
>>  drivers/pci/hotplug/rpaphp_core.c              |    4 +-
>>  drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
>>  drivers/pci/setup-bus.c                        |    5 +
>>  include/linux/of_fdt.h                         |    5 +-
>>  include/linux/pci.h                            |    1 +
>>  35 files changed, 2360 insertions(+), 874 deletions(-)
>>  create mode 100644 drivers/pci/hotplug/pnv_php.c
>>
>
>
>-- 
>Alexey
>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC
  2016-02-17  3:43     ` Gavin Shan
  (?)
@ 2016-04-13  7:47     ` Alexey Kardashevskiy
  2016-04-20  0:22       ` Gavin Shan
  -1 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  7:47 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> This enables M64 window on P7IOC, which has been enabled on PHB3.
> Different from PHB3 where 16 M64 BARs are supported and each of
> them can be owned by one particular PE# exclusively or divided
> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
> of them are divided to 8 segments. So every P7IOC PHB supports
> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
> M64DT, indicating that one M64 segment can only be pinned to the
> fixed PE#. In order to have same code to support M64 on P7IOC and
> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
> of them is pinned to the fixed PE# by bypassing the function of
> M64DT. In turn, we just need different phb->init_m64() for P7IOC
> and PHB3 to support M64.

The comment is not quite correct - in addition to pnv_ioda1_init_m64(), you 
also need to hack pnv_ioda_pick_m64_pe().


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>   arch/powerpc/platforms/powernv/pci.h      |  3 ++
>   2 files changed, 86 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1dc663a..8488238 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>   	}
>   }
>
> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
> +{
> +	struct resource *r;
> +	int index;
> +
> +	/*
> +	 * There are 16 M64 BARs, each of which has 8 segments. So
> +	 * there are as many M64 segments as the maximum number of
> +	 * PEs, which is 128.
> +	 */
> +	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
> +		unsigned long base, segsz = phb->ioda.m64_segsize;
> +		int64_t rc;
> +
> +		base = phb->ioda.m64_base +
> +		       index * PNV_IODA1_M64_SEGS * segsz;
> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
> +				OPAL_M64_WINDOW_TYPE, index, base, 0,
> +				PNV_IODA1_M64_SEGS * segsz);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, index);
> +			goto fail;
> +		}
> +
> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
> +				OPAL_M64_WINDOW_TYPE, index,
> +				OPAL_ENABLE_M64_SPLIT);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, index);
> +			goto fail;
> +		}
> +	}
> +
> +	/*
> +	 * Exclude the segment used by the reserved PE, which
> +	 * is expected to be 0 or last supported PE#.
> +	 */
> +	r = &phb->hose->mem_resources[1];
> +	if (phb->ioda.reserved_pe_idx == 0)
> +		r->start += phb->ioda.m64_segsize;
> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
> +		r->end -= phb->ioda.m64_segsize;
> +	else
> +		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
> +			phb->ioda.reserved_pe_idx);
> +
> +	return 0;
> +
> +fail:
> +	for ( ; index >= 0; index--)
> +		opal_pci_phb_mmio_enable(phb->opal_id,
> +			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
> +
> +	return -EIO;
> +}
> +
>   static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>   				    unsigned long *pe_bitmap,
>   				    bool all)
> @@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   			pe->master = master_pe;
>   			list_add_tail(&pe->list, &master_pe->slaves);
>   		}
> +
> +		/*
> +		 * P7IOC supports M64DT, which helps mapping M64 segment
> +		 * to one particular PE#. However, PHB3 has fixed mapping
> +		 * between M64 segment and PE#. In order to have same logic
> +		 * for P7IOC and PHB3, we enforce fixed mapping between M64
> +		 * segment and PE# on P7IOC.
> +		 */
> +		if (phb->type == PNV_PHB_IODA1) {
> +			int64_t rc;
> +
> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +					pe->pe_number, OPAL_M64_WINDOW_TYPE,
> +					pe->pe_number / PNV_IODA1_M64_SEGS,
> +					pe->pe_number % PNV_IODA1_M64_SEGS);
> +			if (rc != OPAL_SUCCESS)
> +				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
> +					__func__, rc, phb->hose->global_number,
> +					pe->pe_number);
> +		}


Can't this go into pnv_ioda1_init_m64()? From the commit log I understood
that this setup is supposed to be static, so it can be done once. Or is it
a sort of PE enable/disable? Then make it a helper and call it
ioda1_pe_enable() or something.
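
For illustration only, the fixed mapping described in the commit log can be
written as plain arithmetic. This is a standalone sketch, not kernel code:
the struct and function names (m64_slot, ioda1_pe_to_m64_slot,
ioda1_m64_bar_base) are hypothetical, and only the two constants come from
the patch. It assumes, as the patch does, that PE#n always owns M64
segment n.

```c
#include <stdint.h>

#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs (from the patch)   */
#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR (from the patch) */

/* Hypothetical helper type: which BAR and which segment a PE# lands in */
struct m64_slot {
	unsigned int bar;	/* M64 BAR index, 0..15 */
	unsigned int seg;	/* segment within that BAR, 0..7 */
};

/* Fixed mapping: PE#n always owns M64 segment n */
static struct m64_slot ioda1_pe_to_m64_slot(unsigned int pe_number)
{
	struct m64_slot slot = {
		.bar = pe_number / PNV_IODA1_M64_SEGS,
		.seg = pe_number % PNV_IODA1_M64_SEGS,
	};

	return slot;
}

/* Start address of a BAR when every BAR covers 8 equally sized segments */
static uint64_t ioda1_m64_bar_base(uint64_t m64_base, uint64_t segsz,
				   unsigned int bar)
{
	return m64_base + (uint64_t)bar * PNV_IODA1_M64_SEGS * segsz;
}
```

Because the result depends only on the PE number, the same values could be
computed in a loop over all 128 PEs at init time or on demand when a PE is
picked; the sketch does not take a side on which is preferable.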


>   	}
>
>   	kfree(pe_alloc);
> @@ -329,8 +407,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>   	const u32 *r;
>   	u64 pci_addr;
>
> -	/* FIXME: Support M64 for P7IOC */
> -	if (phb->type != PNV_PHB_IODA2) {
> +	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
>   		pr_info("  Not support M64 window\n");
>   		return;
>   	}
> @@ -364,7 +441,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>
>   	/* Use last M64 BAR to cover M64 window */
>   	phb->ioda.m64_bar_idx = 15;
> -	phb->init_m64 = pnv_ioda2_init_m64;
> +	if (phb->type == PNV_PHB_IODA1)
> +		phb->init_m64 = pnv_ioda1_init_m64;
> +	else
> +		phb->init_m64 = pnv_ioda2_init_m64;
>   	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>   	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>   }
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 866a5ea..00539ff 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -82,6 +82,9 @@ struct pnv_ioda_pe {
>   	struct list_head	list;
>   };
>
> +#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
> +#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
> +

Why here and not at the beginning of
arch/powerpc/platforms/powernv/pci-ioda.c? This exposes the symbols, but
nothing uses them except pci-ioda.c, and code browsing gets a bit more
inconvenient.



>   #define PNV_PHB_FLAG_EEH	(1 << 0)
>
>   struct pnv_phb {
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap
  2016-04-13  6:21   ` Alexey Kardashevskiy
@ 2016-04-13  7:53       ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-13  7:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: devicetree, Gavin Shan, grant.likely, robherring2, linux-pci,
	bhelgaas, linuxppc-dev, dja

On Wed, Apr 13, 2016 at 04:21:07PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>There are two arrays for IO and M32 segment maps on every PHB.
>>The index of the arrays are segment number and the value stored
>>in the corresponding element is PE number, indicating the segment
>>is assigned to the PE. Initially, all elements in those two arrays
>>are zeroes, meaning all segments are assigned to PE#0. It's wrong.
>>
>>This fixes the initial values in the elements of those two arrays
>>to IODA_INVALID_PE, meaning all segments aren't assigned to any
>>PE.
>
>This is ok.
>
>>In order to use IODA_INVALID_PE (-1) to represent invalid PE
>>number, the types of those two arrays are changed from "unsigned int"
>>to "int".
>
>"unsigned" can carry (-1) perfectly fine, just add a type cast to
>IODA_INVALID_PE:
>
>#define IODA_INVALID_PE    (unsigned int)(-1)
>
>Using "signed" type for indexes which cannot be negative does not make much
>sense - instead of checking for the upper boundary, you have to check for "<
>0" too.
>
>OPAL uses unsigned type for PE (uint64_t or uint32_t or uint16_t - this is
>quite funny).
>
>pnv_ioda_pe::pe_number is "unsigned" and this pe_number is the same thing as
>I can see in pnv_ioda_setup_dev_PE().
>
>Some printk() print the PE number as "%x" (which implies "unsigned").
>

Yes. I can simply use something like the below once the PE number and the
segment index are both represented by "unsigned int" values, right?

#define IODA_INVALID_PE		0xffffffff
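
A minimal standalone sketch of the unsigned sentinel, assuming a 32-bit
"unsigned int". The helper names (segmap_init, segmap_owner_valid) are made
up for illustration and are not part of the patch.

```c
#include <stddef.h>

/* Same value as (unsigned int)(-1) when unsigned int is 32 bits wide */
#define IODA_INVALID_PE		0xffffffff

/* Mark every segment as not assigned to any PE */
static void segmap_init(unsigned int *segmap, size_t nr_segs)
{
	size_t i;

	for (i = 0; i < nr_segs; i++)
		segmap[i] = IODA_INVALID_PE;
}

/*
 * With an unsigned PE number a single upper-bound check is enough:
 * IODA_INVALID_PE is the largest unsigned int value, so it can never
 * be below total_pe_num and no separate "< 0" test is needed.
 */
static int segmap_owner_valid(unsigned int pe, unsigned int total_pe_num)
{
	return pe < total_pe_num;
}
```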

>
>I suggest changing the pci_dn::pe_number type from "int" to "unsigned int" to
>match pnv_ioda_pe::pe_number, in a separate patch. Or do not touch types for
>now.
>

Yes, I will have a separate patch right before this one to address it.

>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++++--
>>  arch/powerpc/platforms/powernv/pci.h      | 4 ++--
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 1d2514f..44cc5f3 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>  	const __be64 *prop64;
>>  	const __be32 *prop32;
>>-	int len;
>>+	int i, len;
>>  	u64 phb_id;
>>  	void *aux;
>>  	long rc;
>>@@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	aux = memblock_virt_alloc(size, 0);
>>  	phb->ioda.pe_alloc = aux;
>>  	phb->ioda.m32_segmap = aux + m32map_off;
>>-	if (phb->type == PNV_PHB_IODA1)
>>+	for (i = 0; i < phb->ioda.total_pe_num; i++)
>>+		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
>>+	if (phb->type == PNV_PHB_IODA1) {
>>  		phb->ioda.io_segmap = aux + iomap_off;
>>+		for (i = 0; i < phb->ioda.total_pe_num; i++)
>>+			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
>>+	}
>>  	phb->ioda.pe_array = aux + pemap_off;
>>  	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 784882a..36c4965 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -146,8 +146,8 @@ struct pnv_phb {
>>  		struct pnv_ioda_pe	*pe_array;
>>
>>  		/* M32 & IO segment maps */
>>-		unsigned int		*m32_segmap;
>>-		unsigned int		*io_segmap;
>>+		int			*m32_segmap;
>>+		int			*io_segmap;
>>
>>  		/* IRQ chip */
>>  		int			irq_chip_init;
>>
>
>
>-- 
>Alexey
>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
  2016-02-17  3:43     ` Gavin Shan
  (?)
@ 2016-04-13  8:29     ` Alexey Kardashevskiy
  2016-04-13 23:54       ` Gavin Shan
  -1 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  8:29 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> Currently, there is one macro (TCE32_TABLE_SIZE) representing the
> TCE table size for one DMA32 segment. The constant representing
> the DMA32 segment size (1 << 28) is still used in the code.
>
> This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
> segment size. the TCE table size can be calcualted when the page

s/calcualted/calculated/


> has fixed 4KB size. So all the related calculation depends on one
> macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.

Please move PNV_IODA1_DMA32_SEGSIZE where TCE32_TABLE_SIZE was.


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++++++++++++++-------------
>   arch/powerpc/platforms/powernv/pci.h      |  1 +
>   2 files changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index d18b95e..e60cff6 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -48,9 +48,6 @@
>   #include "powernv.h"
>   #include "pci.h"
>
> -/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
> -#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
> -
>   #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>   #define POWERNV_IOMMU_MAX_LEVELS	5
>
> @@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>
>   	struct page *tce_mem = NULL;
>   	struct iommu_table *tbl;
> -	unsigned int i;
> +	unsigned int tce32_segsz, i;


PNV_IODA1_DMA32_SEGSIZE is a segment size in bytes. The name @tce32_segsz 
also suggests that it is a segment size in bytes (otherwise it would be 
tce32_seg_entries or something like this) but it is not, it is a number of 
TCE entries (arch/powerpc/kernel/iommu.c uses "entry" for these). And 
tce32_segsz never changes. So:

const unsigned int entries = PNV_IODA1_DMA32_SEGSIZE >> 
(IOMMU_PAGE_SHIFT_4K - 3);
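
The units are easy to mix up here, so a standalone check of the arithmetic
may help. In this sketch the segment size is taken from the commit log
(1 << 28), the page shift is the usual 4KB value, and TCE_ENTRY_SHIFT and
both function names are hypothetical. By this arithmetic the expression
with "(IOMMU_PAGE_SHIFT_4K - 3)" evaluates to 524288, which matches the old
TCE32_TABLE_SIZE (a byte count), while the number of 8-byte entries per
segment is 65536.

```c
#define PNV_IODA1_DMA32_SEGSIZE	0x10000000UL	/* 256MB per DMA32 segment */
#define IOMMU_PAGE_SHIFT_4K	12		/* 4KB TCE pages */
#define TCE_ENTRY_SHIFT		3		/* hypothetical name: 8 bytes per TCE */

/* Number of TCE entries needed to map one DMA32 segment */
static unsigned long tce32_seg_entries(void)
{
	return PNV_IODA1_DMA32_SEGSIZE >> IOMMU_PAGE_SHIFT_4K;
}

/* Bytes of TCE table backing one DMA32 segment */
static unsigned long tce32_seg_table_bytes(void)
{
	return PNV_IODA1_DMA32_SEGSIZE >>
	       (IOMMU_PAGE_SHIFT_4K - TCE_ENTRY_SHIFT);
}
```

Whichever name is chosen for the local variable, it is the byte count that
get_order(), memset() and the table-size argument in the quoted hunks
receive.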




>   	int64_t rc;
>   	void *addr;
>
> @@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	/* Grab a 32-bit TCE table */
>   	pe->tce32_seg = base;
>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
> -		(base << 28), ((base + segs) << 28) - 1);
> +		base * PNV_IODA1_DMA32_SEGSIZE,
> +		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>
>   	/* XXX Currently, we allocate one big contiguous table for the
>   	 * TCEs. We only really need one chunk per 256M of TCE space
>   	 * (ie per segment) but that's an optimization for later, it
>   	 * requires some added smarts with our get/put_tce implementation
> +	 *
> +	 * Each TCE page is 4KB in size and each TCE entry occupies 8
> +	 * bytes
>   	 */
> +	tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);

>   	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
> -				   get_order(TCE32_TABLE_SIZE * segs));
> +				   get_order(tce32_segsz * segs));
>   	if (!tce_mem) {
>   		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
>   		goto fail;
>   	}
>   	addr = page_address(tce_mem);
> -	memset(addr, 0, TCE32_TABLE_SIZE * segs);
> +	memset(addr, 0, tce32_segsz * segs);
>
>   	/* Configure HW */
>   	for (i = 0; i < segs; i++) {
>   		rc = opal_pci_map_pe_dma_window(phb->opal_id,
>   					      pe->pe_number,
>   					      base + i, 1,
> -					      __pa(addr) + TCE32_TABLE_SIZE * i,
> -					      TCE32_TABLE_SIZE, 0x1000);
> +					      __pa(addr) + tce32_segsz * i,
> +					      tce32_segsz, 0x1000);


As you started using IOMMU_PAGE_SHIFT_4K and you are also touching this 
piece of code -

s/0x1000/IOMMU_PAGE_SIZE_4K/ (this argument is the TCE page size, not the 
shift)


>   		if (rc) {
>   			pe_err(pe, " Failed to configure 32-bit TCE table,"
>   			       " err %ld\n", rc);
> @@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	}
>
>   	/* Setup linux iommu table */
> -	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
> -				  base << 28, IOMMU_PAGE_SHIFT_4K);
> +	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
> +				  base * PNV_IODA1_DMA32_SEGSIZE,
> +				  IOMMU_PAGE_SHIFT_4K);
>
>   	/* OPAL variant of P7IOC SW invalidated TCEs */
>   	if (phb->ioda.tce_inval_reg)
> @@ -2116,7 +2119,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	if (pe->tce32_seg >= 0)
>   		pe->tce32_seg = -1;
>   	if (tce_mem)
> -		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
> +		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>   	if (tbl) {
>   		pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
>   		iommu_free_table(tbl, "pnv");
> @@ -3445,7 +3448,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	mutex_init(&phb->ioda.pe_list_mutex);
>
>   	/* Calculate how many 32-bit TCE segments we have */
> -	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
> +	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
> +				PNV_IODA1_DMA32_SEGSIZE;
>
>   #if 0 /* We should really do that ... */
>   	rc = opal_pci_set_phb_mem_window(opal->phb_id,
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 00539ff..1d8e775 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -84,6 +84,7 @@ struct pnv_ioda_pe {
>
>   #define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>   #define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
> +#define PNV_IODA1_DMA32_SEGSIZE	0x10000000
>
>   #define PNV_PHB_FLAG_EEH	(1 << 0)
>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list
  2016-02-17  3:43 ` [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list Gavin Shan
@ 2016-04-13  8:59   ` Alexey Kardashevskiy
  2016-04-20  0:34     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  8:59 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:43 PM, Gavin Shan wrote:
> PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according
> to their DMA32 weight. The PEs on the list are iterated to setup
> their TCE32 tables at system booting time. The list is used for
> once and there is for keep having it.

Maybe "there is no need to keep it"?


>
> This moves the logic calculating DMA32 weight of PHB and PE to
> pnv_ioda_setup_dma() to drop PHB's DMA32 list. Also, @tce32_seg
> and @tce32_segcount, which every PE uses to track its consumed
> DMA32 segments, are useless and are removed.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

with a few comments below...

> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 168 +++++++++++++-----------------
>   arch/powerpc/platforms/powernv/pci.h      |  19 ----
>   2 files changed, 75 insertions(+), 112 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e60cff6..0fc2309 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -886,44 +886,6 @@ out:
>   	return 0;
>   }
>
> -static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
> -				       struct pnv_ioda_pe *pe)
> -{
> -	struct pnv_ioda_pe *lpe;
> -
> -	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
> -		if (lpe->dma_weight < pe->dma_weight) {
> -			list_add_tail(&pe->dma_link, &lpe->dma_link);
> -			return;
> -		}
> -	}
> -	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
> -}
> -
> -static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
> -{
> -	/* This is quite simplistic. The "base" weight of a device
> -	 * is 10. 0 means no DMA is to be accounted for it.
> -	 */
> -
> -	/* If it's a bridge, no DMA */
> -	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> -		return 0;
> -
> -	/* Reduce the weight of slow USB controllers */
> -	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
> -	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
> -	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -		return 3;
> -
> -	/* Increase the weight of RAID (includes Obsidian) */
> -	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
> -		return 15;
> -
> -	/* Default */
> -	return 10;
> -}
> -
>   #ifdef CONFIG_PCI_IOV
>   static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>   {
> @@ -1028,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   	pe->flags = PNV_IODA_PE_DEV;
>   	pe->pdev = dev;
>   	pe->pbus = NULL;
> -	pe->tce32_seg = -1;
>   	pe->mve_number = -1;
>   	pe->rid = dev->bus->number << 8 | pdn->devfn;
>
> @@ -1044,16 +1005,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   		return NULL;
>   	}
>
> -	/* Assign a DMA weight to the device */
> -	pe->dma_weight = pnv_ioda_dma_weight(dev);
> -	if (pe->dma_weight != 0) {
> -		phb->ioda.dma_weight += pe->dma_weight;
> -		phb->ioda.dma_pe_count++;
> -	}
> -
> -	/* Link the PE */
> -	pnv_ioda_link_pe_by_weight(phb, pe);
> -
>   	return pe;
>   }
>
> @@ -1071,7 +1022,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   		}
>   		pdn->pcidev = dev;
>   		pdn->pe_number = pe->pe_number;
> -		pe->dma_weight += pnv_ioda_dma_weight(dev);
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>   			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>   	}
> @@ -1108,10 +1058,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>   	pe->pbus = bus;
>   	pe->pdev = NULL;
> -	pe->tce32_seg = -1;
>   	pe->mve_number = -1;
>   	pe->rid = bus->busn_res.start << 8;
> -	pe->dma_weight = 0;
>
>   	if (all)
>   		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
> @@ -1133,17 +1081,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	/* Put PE to the list */
>   	list_add_tail(&pe->list, &phb->ioda.pe_list);
> -
> -	/* Account for one DMA PE if at least one DMA capable device exist
> -	 * below the bridge
> -	 */
> -	if (pe->dma_weight != 0) {
> -		phb->ioda.dma_weight += pe->dma_weight;
> -		phb->ioda.dma_pe_count++;
> -	}
> -
> -	/* Link the PE */
> -	pnv_ioda_link_pe_by_weight(phb, pe);
>   }
>
>   static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> @@ -1184,7 +1121,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>   			rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
>   			npu_pdn->pcidev = npu_pdev;
>   			npu_pdn->pe_number = pe_num;
> -			pe->dma_weight += pnv_ioda_dma_weight(npu_pdev);
>   			phb->ioda.pe_rmap[rid] = pe->pe_number;
>
>   			/* Map the PE to this link */
> @@ -1532,7 +1468,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   		pe->flags = PNV_IODA_PE_VF;
>   		pe->pbus = NULL;
>   		pe->parent_dev = pdev;
> -		pe->tce32_seg = -1;
>   		pe->mve_number = -1;
>   		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>   			   pci_iov_virtfn_devfn(pdev, vf_index);
> @@ -2023,6 +1958,54 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>   	.free = pnv_ioda2_table_free,
>   };
>
> +static int pnv_pci_ioda_dev_dma_weight(struct pci_dev *dev, void *data)
> +{
> +	unsigned int *weight = (unsigned int *)data;
> +
> +	/* This is quite simplistic. The "base" weight of a device
> +	 * is 10. 0 means no DMA is to be accounted for it.
> +	 */
> +	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> +		return 0;
> +
> +	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
> +	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
> +	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> +		*weight += 3;
> +	else if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
> +		*weight += 15;
> +	else
> +		*weight += 10;
> +
> +	return 0;
> +}
> +
> +static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
> +{
> +	unsigned int weight = 0;
> +
> +	if ((pe->flags & PNV_IODA_PE_DEV) && pe->pdev) {
> +		pnv_pci_ioda_dev_dma_weight(pe->pdev, &weight);
> +	} else if ((pe->flags & PNV_IODA_PE_BUS) && pe->pbus) {
> +		struct pci_dev *pdev;
> +
> +		list_for_each_entry(pdev, &pe->pbus->devices, bus_list)
> +			pnv_pci_ioda_dev_dma_weight(pdev, &weight);
> +	} else if ((pe->flags & PNV_IODA_PE_BUS_ALL) && pe->pbus) {
> +		pci_walk_bus(pe->pbus, pnv_pci_ioda_dev_dma_weight, &weight);
> +	}
> +
> +	return weight;
> +}
> +
> +static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)


s/pnv_pci_ioda_total_dma_weight/pnv_pci_ioda1_phb_dma_weight/ ? "total" 
does not say much. Or just merge it into pnv_pci_ioda1_setup_dma_pe() as it 
is useless for anything but IODA1.




> +{
> +	unsigned int weight = 0;
> +
> +	pci_walk_bus(phb->hose->bus, pnv_pci_ioda_dev_dma_weight, &weight);
> +	return weight;
> +}
> +
>   static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   				       struct pnv_ioda_pe *pe,
>   				       unsigned int base,
> @@ -2039,17 +2022,12 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>   	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>
> -	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->tce32_seg >= 0))
> -		return;
> -
>   	tbl = pnv_pci_table_alloc(phb->hose->node);
>   	iommu_register_group(&pe->table_group, phb->hose->global_number,
>   			pe->pe_number);
>   	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>
>   	/* Grab a 32-bit TCE table */
> -	pe->tce32_seg = base;
>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>   		base * PNV_IODA1_DMA32_SEGSIZE,
>   		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
> @@ -2116,8 +2094,6 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	return;
>    fail:
>   	/* XXX Failure: Try to fallback to 64-bit only ? */
> -	if (pe->tce32_seg >= 0)
> -		pe->tce32_seg = -1;
>   	if (tce_mem)
>   		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>   	if (tbl) {
> @@ -2528,10 +2504,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   {
>   	int64_t rc;
>
> -	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->tce32_seg >= 0))
> -		return;
> -
>   	/* TVE #1 is selected by PCI address bit 59 */
>   	pe->tce_bypass_base = 1ull << 59;
>
> @@ -2539,7 +2511,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   			pe->pe_number);
>
>   	/* The PE will reserve all possible 32-bits space */
> -	pe->tce32_seg = 0;
>   	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>   		phb->ioda.m32_pci_base);
>
> @@ -2555,11 +2526,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   #endif
>
>   	rc = pnv_pci_ioda2_setup_default_config(pe);
> -	if (rc) {
> -		if (pe->tce32_seg >= 0)
> -			pe->tce32_seg = -1;
> +	if (rc)
>   		return;
> -	}
>
>   	if (pe->flags & PNV_IODA_PE_DEV)
>   		iommu_add_device(&pe->pdev->dev);
> @@ -2570,24 +2538,32 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   {
>   	struct pci_controller *hose = phb->hose;
> -	unsigned int residual, remaining, segs, tw, base;
> +	unsigned int weight, total_weight, dma_pe_count;
> +	unsigned int residual, remaining, segs, base;
>   	struct pnv_ioda_pe *pe;
>
> +	total_weight = pnv_pci_ioda_total_dma_weight(phb);
> +	dma_pe_count = 0;
> +	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> +		weight = pnv_pci_ioda_pe_dma_weight(pe);
> +		if (weight > 0)
> +			dma_pe_count++;
> +	}
> +
>   	/* If we have more PE# than segments available, hand out one
>   	 * per PE until we run out and let the rest fail. If not,
>   	 * then we assign at least one segment per PE, plus more based
>   	 * on the amount of devices under that PE
>   	 */
> -	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
> +	if (dma_pe_count > phb->ioda.tce32_count)
>   		residual = 0;
>   	else
> -		residual = phb->ioda.tce32_count -
> -			phb->ioda.dma_pe_count;
> +		residual = phb->ioda.tce32_count - dma_pe_count;
>
>   	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>   		hose->global_number, phb->ioda.tce32_count);
>   	pr_info("PCI: %d PE# for a total weight of %d\n",
> -		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
> +		dma_pe_count, total_weight);
>
>   	pnv_pci_ioda_setup_opal_tce_kill(phb);
>
> @@ -2596,18 +2572,20 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   	 * weight
>   	 */
>   	remaining = phb->ioda.tce32_count;
> -	tw = phb->ioda.dma_weight;
>   	base = 0;
> -	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> -		if (!pe->dma_weight)
> +	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> +		weight = pnv_pci_ioda_pe_dma_weight(pe);
> +		if (!weight)
>   			continue;
> +
>   		if (!remaining) {
>   			pe_warn(pe, "No DMA32 resources available\n");
>   			continue;
>   		}
>   		segs = 1;
>   		if (residual) {
> -			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
> +			segs += ((weight * residual) + (total_weight / 2)) /
> +				total_weight;
>   			if (segs > remaining)
>   				segs = remaining;
>   		}
> @@ -2619,7 +2597,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   		 */
>   		if (phb->type == PNV_PHB_IODA1) {
>   			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
> -				pe->dma_weight, segs);
> +				weight, segs);
>   			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>   		} else if (phb->type == PNV_PHB_IODA2) {
>   			pe_info(pe, "Assign DMA32 space\n");
> @@ -3156,13 +3134,18 @@ static void pnv_npu_ioda_fixup(void)
>   	struct pci_controller *hose, *tmp;
>   	struct pnv_phb *phb;
>   	struct pnv_ioda_pe *pe;
> +	unsigned int weight;
>
>   	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>   		phb = hose->private_data;
>   		if (phb->type != PNV_PHB_NPU)
>   			continue;
>
> -		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> +		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> +			weight = pnv_pci_ioda_pe_dma_weight(pe);
> +			if (!weight)
> +				continue;

Is it even possible for an NPU PE to get weight==0? WARN_ON()? BUG_ON()?



> +
>   			enable_bypass = dma_get_mask(&pe->pdev->dev) ==
>   				DMA_BIT_MASK(64);
>   			pnv_npu_init_dma_pe(pe);
> @@ -3443,7 +3426,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	phb->ioda.pe_array = aux + pemap_off;
>   	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>
> -	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
>   	INIT_LIST_HEAD(&phb->ioda.pe_list);
>   	mutex_init(&phb->ioda.pe_list_mutex);
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 1d8e775..e90bcbe 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -53,14 +53,7 @@ struct pnv_ioda_pe {
>   	/* PE number */
>   	unsigned int		pe_number;
>
> -	/* "Weight" assigned to the PE for the sake of DMA resource
> -	 * allocations
> -	 */
> -	unsigned int		dma_weight;
> -
>   	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
> -	int			tce32_seg;
> -	int			tce32_segcount;
>   	struct iommu_table_group table_group;
>
>   	/* 64-bit TCE bypass region */
> @@ -78,7 +71,6 @@ struct pnv_ioda_pe {
>   	struct list_head	slaves;
>
>   	/* Link in list of PE#s */
> -	struct list_head	dma_link;
>   	struct list_head	list;
>   };
>
> @@ -173,17 +165,6 @@ struct pnv_phb {
>   		/* 32-bit TCE tables allocation */
>   		unsigned long		tce32_count;
>
> -		/* Total "weight" for the sake of DMA resources
> -		 * allocation
> -		 */
> -		unsigned int		dma_weight;
> -		unsigned int		dma_pe_count;
> -
> -		/* Sorted list of used PE's, sorted at
> -		 * boot for resource allocation purposes
> -		 */
> -		struct list_head	pe_dma_list;
> -
>   		/* TCE cache invalidate registers (physical and
>   		 * remapped)
>   		 */
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-13  7:42   ` Gavin Shan
@ 2016-04-13  9:14       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  9:14 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/13/2016 05:42 PM, Gavin Shan wrote:
> On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>> This series of patches rebases on powerpc/next branch, plus below additional
>>> patches:
>>>
>>>     <This series of patches>
>>>     <Followup 3 patches from Gavin on SRIOV EEH, which aren't posted>
>>>     https://patchwork.ozlabs.org/patch/581315/	(PATCH[1/9] Richard's SRIOV EEH)
>>>     https://patchwork.ozlabs.org/patch/582639/	(PATCH[1/1] Gavin's EEH fix)
>>>     https://patchwork.ozlabs.org/patch/582093/	(PATCH[1/1] Gavin's EEH fix)
>>>     https://patchwork.ozlabs.org/patch/580626/	(PATCH[1/4] Gavin's PCI fix)
>>>     https://patchwork.ozlabs.org/patch/580153/	(PATCH[1/1] Andrew's EEH minor fix)
>>>     https://patchwork.ozlabs.org/patch/566827/	(PATCH[1/1] Russell's P5IOC2 removal)
>>>     https://patchwork.ozlabs.org/patch/534154/	(PATCH[1/7] Richard's SRIOV rework)
>>>     commit 388f7b1 ("Linux 4.5-rc3")
>>>
>>> The series of patches intend to support PCI slot for PowerPC PowerNV platform,
>>> which is running on top of skiboot firmware. The patchset requires corresponding
>>> changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
>>> for review. The PCI slots are exposed by skiboot with device node properties,
>>> and kernel utilizes those properties to populate PCI slots accordingly.
>>>
>>> The original PCI infrastructure on PowerNV platform can't support hotplug
>>> because the PE is assigned during PHB fixup time, which is called for once
>>> during system boot time. For this, the PCI infrastructure on PowerNV platform
>>> has been reworked a lot. After that, the PE and its corresponding resources
>>> (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
>>> PCI bridge's resources, which might decide PE# assigned to the PE (e.g. M64
>>> resources, on P8 strictly speaking). Each PE will maintain a reference count,
>>> which is (number of child PCI devices + 1). That indicates when last child PCI
>>> device leaves the PE, the PE and its included resources will be released and put
>>> back into free pool again. With this design, the PE will be released when EEH PE
>>> is released. PATCH[1 - 23] are related to this part.
>>>
>>>  From skiboot perspective, PCI slot is providing (hot/fundamental/complete)
>>> resets to EEH. The kernel gets to know if skiboot supports various reset on one
>>> particular PCI slot through device-tree node. If it does, EEH will utilize the
>>> functionality provided by skiboot. Besides, the device-tree nodes have to change
>>> in order to support PCI hotplug. For example, when one PCI adapter inserted to
>>> one slot, its device-tree node should be added to the system dynamically. Conversely,
>>> the device-tree node should be removed from the system when the PCI adapter is going
>>> to be offline. Since pci_dn and eeh_dev have the same life cycle as PCI device nodes,
>>> they should be added/removed accordingly during PCI hotplug. PATCH[24 - 39] are
>>> doing the related work.
>>>
>>> The OF driver is changed to support unflattening FDT blob for sub-tree, which
>>> is covered by PATCH[40 - 44].
>>>
>>> The last one, PATCH[45], is the standalone PCI hotplug driver for PowerPC PowerNV
>>> platform.
>>>
>>> =======
>>> Testing
>>> =======
>>> 1. Unplug adapters behind non-empty slot, then plug them.
>>>
>>>     1.1 Check status
>>>     # cat /sys/bus/pci/slots/C10/address
>>>     0003:09:00
>>>     # cat /sys/bus/pci/slots/C10/adapter
>>>     1
>>>     # cat /sys/bus/pci/slots/C10/power
>>>     1
>>>     # lspci
>>>     0003:09:00.0 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     0003:09:00.1 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     0003:09:00.2 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     0003:09:00.3 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     # lspci -t
>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>      |                                           +-08.0-[04-08]--
>>>      |                                           +-09.0-[09]--+-00.0
>>>      |                                           |            +-00.1
>>>      |                                           |            +-00.2
>>>      |                                           |            \-00.3
>>>      |                                           +-10.0-[0a-0e]--
>>>      |                                           \-11.0-[0f-13]--
>>>
>>>     1.2 Unplug adapter 0003:09:00.x
>>>     # echo 0 > /sys/bus/pci/slots/C10/power
>>>     # lspci -t
>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>      |                                           +-08.0-[04-08]--
>>>      |                                           +-09.0-[09]--
>>>      |                                           +-10.0-[0a-0e]--
>>>      |                                           \-11.0-[0f-13]--
>>>
>>>     1.3 Plug adapter 0003:09:00.x
>>>     # echo 1 > /sys/bus/pci/slots/C10/power
>>
>>
>> Do I understand correctly that the adapter was not physically moved in/out of
>> the slot between 1.2 and 1.3?
>>
>
> Correct.


Then this is not really a hotplug test... Someone should try physically 
removing and re-inserting the adapter, on both P7 and P8.



>
>>
>>
>>>     # lspci -t
>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>      |                                           +-08.0-[04-08]--
>>>      |                                           +-09.0-[09]--+-00.0
>>>      |                                           |            +-00.1
>>>      |                                           |            +-00.2
>>>      |                                           |            \-00.3
>>>      |                                           +-10.0-[0a-0e]--
>>>      |                                           \-11.0-[0f-13]--
>>>
>>>
>>>     1.4 Inject EEH error to adapter 0003:09:00.x, which is recovered.
>>
>> I am confused - why is this needed to test hotplug?
>>
>
> Without the series, the EEH reset is always done by kernel. With the
> series applied, the EEH reset could be done in skiboot.


Why exactly can't the EEH reset changes go into a smaller separate patchset 
(before hotplug)?



> That's the
> major change introduced by the series from EEH's perspective. Also,
> the EEH code was touched.
>
>>>     # cat /sys/bus/pci/devices/0003:09:00.0/eeh_pe_config_addr
>>>     0x1
>>>     # echo 1:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0003/err_injct
>>>     # lspci -ns 0003:09:00.0
>>>     # dmesg | grep EEH
>>>     EEH: Frozen PHB#3-PE#1 detected
>>>     EEH: PE location: U78C9.001.WZS00CF-P1-C10, PHB location: N/A
>>>     EEH: Detected PCI bus error on PHB#3-PE#1
>>>     EEH: This PCI device has failed 1 times in the last hour
>>>     EEH: Notify device drivers to shutdown
>>>     EEH: Collect temporary log
>>>     EEH: Reset without hotplug activity
>>>     EEH: Notify device drivers the completion of reset
>>>     EEH: Notify device driver to resume
>>>
>>> 2. Plug adapter and then unplug it. This requires hack in skiboot
>>>     to skip probing the adapters behind the target (C12 in the
>>>     testing) for once.
>>>
>>>     2.1 Check status
>>>     # cat /sys/bus/pci/slots/C12/address
>>>     0001:06
>>>     # cat /sys/bus/pci/slots/C12/power
>>>     0
>>>     # cat /sys/bus/pci/slots/C12/adapter
>>>     1
>>>     # lspci -t
>>>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>                                                 +-08.0-[05]----00.0
>>>                                                 \-09.0-[06-0a]--
>>>
>>>     2.2 Plug adapter 0001:06:00.x
>>>     # echo 1 > /sys/bus/pci/slots/C12/power
>>>     # lspci -t
>>>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>                                                 +-08.0-[05]----00.0
>>>                                                 \-09.0-[06-0a]--+-00.0
>>>                                                                 \-00.1
>>>     # lspci
>>>     0001:06:00.0 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>>     0001:06:00.1 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>>
>>>     2.3 Inject EEH error to adapter 0001:06:00.x, which is recovered
>>>     # cat /sys/bus/pci/devices/0001:06:00.0/eeh_pe_config_addr
>>>     0x2
>>>     # echo 2:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0001/err_injct
>>>     # dmesg | grep EEH
>>>     EEH: Frozen PHB#1-PE#2 detected
>>>     EEH: PE location: U78C9.001.WZS00CF-P1-C12, PHB location: N/A
>>>     EEH: Detected PCI bus error on PHB#1-PE#2
>>>     EEH: This PCI device has failed 1 times in the last hour
>>>     EEH: Notify device drivers to shutdown
>>>     EEH: Collect temporary log
>>>     EEH: Reset without hotplug activity
>>>     EEH: Notify device drivers the completion of reset
>>>     EEH: Notify device driver to resume
>>>
>>>     2.4 Unplug adapter 0001:06:00.x
>>>     # echo 0 > /sys/bus/pci/slots/C12/power
>>>     # lspci -t
>>>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>                                                 +-08.0-[05]----00.0
>>>                                                 \-09.0-[06-0a]--
>>>
>>> =========
>>> Changelog
>>> =========
>>> v8:
>>>     * Rebased to linux-powerpc next branch.
>>>     * Resolve comments from Alexey and Daniel on PCI part
>>>     * Resolve comments from Rob on fdt.c
>>>     * Retested (refer to the "Testing section")
>>> v7:
>>>     * Reworked revision to some extent.
>>>     * Rebased to powerpc/next repository.
>>>     * Reorder/split/merge/drop according - Alexey.
>>>     * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
>>>     * Merged 3 files to one for the hotplug driver - Alexey.
>>>     * As part of OPAL API, defined macros for PCI slot power state, hotplug
>>>       message type. Defined macros for PCI slot power confirmed state in
>>>       hotplug driver.
>>>     * Misc comments from Alexey.
>>>     * Reworked unflatten_dt_node() to avoid recursive function calls.
>>>     * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
>>> v6:
>>>     * Patch reorder, split, squash - Alexey.
>>>     * Minor coding style - Alexey.
>>>     * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
>>>     * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
>>>     * Concurrent depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
>>>     * Replace overlay with of_changeset - Grant
>>> v5:
>>>     * Rebased to 4.1.rc6 and some unmerged patches as below:
>>>       Alexey's DDW patchset (v11);
>>>       Gavin's EEH error injection support (in mpe's next branch);
>>>       Richard's EEH cleanup patches (in mpe's next branch);
>>>       Richard's EEH support for VF (v7);
>>>       Gavin's misc EEH fixes for 4.2;
>>>     * The revision bases on skiboot corresponding patches (v7):
>>>       https://patchwork.ozlabs.org/patch/480437/
>>>     * Utilize OF overlay to update device-tree with help of newly introduced
>>>       OPAL API opal_get_overlay_dt().
>>>     * Split patches for easy review according to aik's comments.
>>>     * Fix coding style issues from checkpatch.pl as pointed out by aik.
>>>     * Code cleanup and misc fixup according to aik's input.
>>> v4:
>>>     * Rebased to 4.1.RC1
>>>     * Added API to unflatten FDT blob to device node sub-tree, which is attached
>>>       the indicated parent device node. The original mechanism based on formatted
>>>       string stream has been dropped.
>>>     * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
>>>       was picked up and sent to linux-ppc@ separately for review as Richard's "VF EEH
>>>       Support" depends on that.
>>> v3:
>>>     * Rebased to 4.1.RC0
>>>     * PowerNV PCI infrastructure is totally refactored in order to support PCI
>>>       hotplug. The PowerNV hotplug driver is also reworked a lot because of
>>>       the changes in skiboot in order to support PCI hotplug.
>>>
>>> Gavin Shan (45):
>>>    PCI: Add pcibios_setup_bridge()
>>>    powerpc/pci: Override pcibios_setup_bridge()
>>>    powerpc/pci: Cleanup on struct pci_controller_ops
>>>    powerpc/powernv: Cleanup on pci_controller_ops instances
>>>    powerpc/powernv: Drop phb->bdfn_to_pe()
>>>    powerpc/powernv: Reorder fields in struct pnv_phb
>>>    powerpc/powernv: Rename PE# fields in struct pnv_phb
>>>    powerpc/powernv: Fix initial IO and M32 segmap
>>>    powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
>>>    powerpc/powernv: IO and M32 mapping based on PCI device resources
>>>    powerpc/powernv: Track M64 segment consumption
>>>    powerpc/powernv: Rename M64 related functions
>>>    powerpc/powernv/ioda1: M64 support on P7IOC
>>>    powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
>>>    powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
>>>    powerpc/powernv: Remove DMA32 PE list
>>>    powerpc/powernv/ioda1: Improve DMA32 segment track
>>>    powerpc/powernv: Increase PE# capacity
>>>    powerpc/powernv: Use PE instead of number during setup and release
>>>    powerpc/powernv: Allocate PE# in reverse order
>>>    powerpc/powernv: Create PEs at PCI hot plugging time
>>>    powerpc/powernv/ioda1: Support releasing IODA1 TCE table
>>>    powerpc/powernv: Dynamically release PEs
>>>    powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
>>>    powerpc/pci: Rename pcibios_find_pci_bus()
>>>    powerpc/pci: Move pci_find_bus_by_node() around
>>>    powerpc/pci: Export pci_add_device_node_info()
>>>    powerpc/pci: Introduce pci_remove_device_node_info()
>>>    powerpc/pci: Export pci_traverse_device_nodes()
>>>    powerpc/pci: Delay populating pdn
>>>    powerpc/pci: Don't scan empty slot
>>>    powerpc/pci: Update bridge windows on PCI plug
>>>    powerpc/powernv: Simplify pnv_eeh_reset()
>>>    powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
>>>    powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
>>>    powerpc/powernv: Support PCI slot ID
>>>    powerpc/powernv: Use firmware PCI slot reset infrastructure
>>>    powerpc/powernv: Functions to get/set PCI slot status
>>>    powerpc/powernv: Select OF_DYNAMIC
>>>    drivers/of: Split unflatten_dt_node()
>>>    drivers/of: Avoid recursively calling unflatten_dt_node()
>>>    drivers/of: Rename unflatten_dt_node()
>>>    drivers/of: Specify parent node in of_fdt_unflatten_tree()
>>>    drivers/of: Return allocated memory from of_fdt_unflatten_tree()
>>>    PCI/hotplug: PowerPC PowerNV PCI hotplug driver
>>>
>>>   arch/powerpc/include/asm/eeh.h                 |    2 +-
>>>   arch/powerpc/include/asm/opal-api.h            |   17 +-
>>>   arch/powerpc/include/asm/opal.h                |    8 +-
>>>   arch/powerpc/include/asm/pci-bridge.h          |   25 +-
>>>   arch/powerpc/include/asm/pnv-pci.h             |    7 +
>>>   arch/powerpc/include/asm/ppc-pci.h             |    8 +-
>>>   arch/powerpc/kernel/eeh_dev.c                  |   17 +-
>>>   arch/powerpc/kernel/eeh_driver.c               |   12 +-
>>>   arch/powerpc/kernel/pci-common.c               |   16 +-
>>>   arch/powerpc/kernel/pci-hotplug.c              |   47 +-
>>>   arch/powerpc/kernel/pci_dn.c                   |   89 +-
>>>   arch/powerpc/platforms/maple/pci.c             |   34 +-
>>>   arch/powerpc/platforms/pasemi/pci.c            |    3 -
>>>   arch/powerpc/platforms/powermac/pci.c          |   38 +-
>>>   arch/powerpc/platforms/powernv/Kconfig         |    1 +
>>>   arch/powerpc/platforms/powernv/eeh-powernv.c   |  179 ++--
>>>   arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
>>>   arch/powerpc/platforms/powernv/pci-ioda.c      | 1243 +++++++++++++++---------
>>>   arch/powerpc/platforms/powernv/pci.c           |   92 +-
>>>   arch/powerpc/platforms/powernv/pci.h           |   60 +-
>>>   arch/powerpc/platforms/pseries/msi.c           |    4 +-
>>>   arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
>>>   arch/powerpc/platforms/pseries/setup.c         |    8 +-
>>>   drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c   |    2 +-
>>>   drivers/of/fdt.c                               |  372 ++++---
>>>   drivers/of/unittest.c                          |    2 +-
>>>   drivers/pci/hotplug/Kconfig                    |   12 +
>>>   drivers/pci/hotplug/Makefile                   |    3 +
>>>   drivers/pci/hotplug/pnv_php.c                  |  870 +++++++++++++++++
>>>   drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
>>>   drivers/pci/hotplug/rpaphp_core.c              |    4 +-
>>>   drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
>>>   drivers/pci/setup-bus.c                        |    5 +
>>>   include/linux/of_fdt.h                         |    5 +-
>>>   include/linux/pci.h                            |    1 +
>>>   35 files changed, 2360 insertions(+), 874 deletions(-)
>>>   create mode 100644 drivers/pci/hotplug/pnv_php.c
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
@ 2016-04-13  9:14       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  9:14 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/13/2016 05:42 PM, Gavin Shan wrote:
> On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>> This series of patches rebases on powerpc/next branch, plus below additional
>>> patches:
>>>
>>>     <This series of patches>
>>>     <Followup 3 patches from Gavin on SRIOV EEH, which aren't posted>
>>>     https://patchwork.ozlabs.org/patch/581315/	(PATCH[1/9] Richard's SRIOV EEH)
>>>     https://patchwork.ozlabs.org/patch/582639/	(PATCH[1/1] Gavin's EEH fix)
>>>     https://patchwork.ozlabs.org/patch/582093/	(PATCH[1/1] Gavin's EEH fix)
>>>     https://patchwork.ozlabs.org/patch/580626/	(PATCH[1/4] Gavin's PCI fix)
>>>     https://patchwork.ozlabs.org/patch/580153/	(PATCH[1/1] Andrew's EEH minor fix)
>>>     https://patchwork.ozlabs.org/patch/566827/	(PATCH[1/1] Russell's P5IOC2 removal)
>>>     https://patchwork.ozlabs.org/patch/534154/	(PATCH[1/7] Richard's SRIOV rework)
>>>     commit 388f7b1 ("Linux 4.5-rc3")
>>>
>>> The series of patches intend to support PCI slot for PowerPC PowerNV platform,
>>> which is running on top of skiboot firmware. The patchset requires corresponding
>>> changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
>>> for review. The PCI slots are exposed by skiboot with device node properties,
>>> and kernel utilizes those properties to populate PCI slots accordingly.
>>>
>>> The original PCI infrastructure on PowerNV platform can't support hotplug
>>> because the PE is assigned during PHB fixup time, which is called for once
>>> during system boot time. For this, the PCI infrastructure on PowerNV platform
>>> has been reworked for a lot. After that, the PE and its corresponding resources
>>> (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
>>> PCI bridge's resources, which might decide PE# assigned to the PE (e.g. M64
>>> resources, on P8 strictly speaking). Each PE will maintain a reference count,
>>> which is (number of child PCI devices + 1). That indicates when last child PCI
>>> device leaves the PE, the PE and its included resources will be released and put
>>> back into free pool again. With this design, the PE will be released when EEH PE
>>> is released. PATCH[1 - 23] are related to this part.
>>>
>>>  From skiboot perspective, PCI slot is providing (hot/fundamental/complete)
>>> resets to EEH. The kernel gets to know if skiboot supports various reset on one
>>> particular PCI slot through device-tree node. If it does, EEH will utilize the
>>> functionality provided by skiboot. Besides, the device-tree nodes have to change
>>> in order to support PCI hotplug. For example, when one PCI adapter inserted to
>>> one slot, its device-tree node should be added to the system dynamically. Conversely,
>>> the device-tree node should be removed from the system when the PCI adapter is going
>>> to be offline. Since pci_dn and eeh_dev have the same life cycle as PCI device nodes,
>>> they should be added/removed accordingly during PCI hotplug. PATCH[24 - 39] are
>>> doing the related work.
>>>
>>> The OF driver is changed to support unflattening FDT blob for sub-tree, which
>>> is covered by PATCH[40 - 44].
>>>
>>> The last one, PATCH[45], is the standalone PCI hotplug driver for PowerPC PowerNV
>>> platform.
>>>
>>> =======
>>> Testing
>>> =======
>>> 1. Unplug adapters behind non-empty slot, then plug them.
>>>
>>>     1.1 Check status
>>>     # cat /sys/bus/pci/slots/C10/address
>>>     0003:09:00
>>>     # cat /sys/bus/pci/slots/C10/adapter
>>>     1
>>>     # cat /sys/bus/pci/slots/C10/power
>>>     1
>>>     # lspci
>>>     0003:09:00.0 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     0003:09:00.1 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     0003:09:00.2 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     0003:09:00.3 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>     # lspci -t
>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>      |                                           +-08.0-[04-08]--
>>>      |                                           +-09.0-[09]--+-00.0
>>>      |                                           |            +-00.1
>>>      |                                           |            +-00.2
>>>      |                                           |            \-00.3
>>>      |                                           +-10.0-[0a-0e]--
>>>      |                                           \-11.0-[0f-13]--
>>>
>>>     1.2 Unplug adapter 0003:09:00.x
>>>     # echo 0 > /sys/bus/pci/slots/C10/power
>>>     # lspci -t
>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>      |                                           +-08.0-[04-08]--
>>>      |                                           +-09.0-[09]--
>>>      |                                           +-10.0-[0a-0e]--
>>>      |                                           \-11.0-[0f-13]--
>>>
>>>     1.3 Plug adapter 0003:09:00.x
>>>     # echo 1 > /sys/bus/pci/slots/C10/power
>>
>>
>> Do I understand correctly that the adapter was not physically moved in/out of
>> the slot between 1.2 and 1.3?
>>
>
> Correct.


This is not right then... Someone should try it, on both P7 and P8.



>
>>
>>
>>>     # lspci -t
>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>      |                                           +-08.0-[04-08]--
>>>      |                                           +-09.0-[09]--+-00.0
>>>      |                                           |            +-00.1
>>>      |                                           |            +-00.2
>>>      |                                           |            \-00.3
>>>      |                                           +-10.0-[0a-0e]--
>>>      |                                           \-11.0-[0f-13]--
>>>
>>>
>>>     1.4 Inject EEH error to adapter 0003:09:00.x, which is recovered.
>>
>> I am confused - why is this needed to test hotplug?
>>
>
> Without the series, the EEH reset is always done by kernel. With the
> series applied, the EEH reset could be done in skiboot.


Why exactly cannot EEH reset changes go to a smaller separate patchset 
(before hotplug)?



> That's the
> major change introduced by the series from EEH's perspective. Also,
> the EEH code was touched.
>
>>>     # cat /sys/bus/pci/devices/0003:09:00.0/eeh_pe_config_addr
>>>     0x1
>>>     # echo 1:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0003/err_injct
>>>     # lspci -ns 0003:09:00.0
>>>     # dmesg | grep EEH
>>>     EEH: Frozen PHB#3-PE#1 detected
>>>     EEH: PE location: U78C9.001.WZS00CF-P1-C10, PHB location: N/A
>>>     EEH: Detected PCI bus error on PHB#3-PE#1
>>>     EEH: This PCI device has failed 1 times in the last hour
>>>     EEH: Notify device drivers to shutdown
>>>     EEH: Collect temporary log
>>>     EEH: Reset without hotplug activity
>>>     EEH: Notify device drivers the completion of reset
>>>     EEH: Notify device driver to resume
>>>
>>> 2. Plug adapter and then unplug it. This requires hack in skiboot
>>>     to skip probing the adapters behind the target (C12 in the
>>>     testing) for once.
>>>
>>>     2.1 Check status
>>>     # cat /sys/bus/pci/slots/C12/address
>>>     0001:06
>>>     # cat /sys/bus/pci/slots/C12/power
>>>     0
>>>     # cat /sys/bus/pci/slots/C12/adapter
>>>     1
>>>     # lspci -t
>>>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>                                                 +-08.0-[05]----00.0
>>>                                                 \-09.0-[06-0a]--
>>>
>>>     2.2 Plug adapter 0001:06:00.x
>>>     # echo 1 > /sys/bus/pci/slots/C12/power
>>>     # lspci -t
>>>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>                                                 +-08.0-[05]----00.0
>>>                                                 \-09.0-[06-0a]--+-00.0
>>>                                                                 \-00.1
>>>     # lspci
>>>     0001:06:00.0 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>>     0001:06:00.1 Ethernet controller: \
>>>     Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>>
>>>     2.3 Inject EEH error to adapter 0001:06:00.x, which is recovered
>>>     # cat /sys/bus/pci/devices/0001:06:00.0/eeh_pe_config_addr
>>>     0x2
>>>     # echo 2:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0001/err_injct
>>>     # dmesg | grep EEH
>>>     EEH: Frozen PHB#1-PE#2 detected
>>>     EEH: PE location: U78C9.001.WZS00CF-P1-C12, PHB location: N/A
>>>     EEH: Detected PCI bus error on PHB#1-PE#2
>>>     EEH: This PCI device has failed 1 times in the last hour
>>>     EEH: Notify device drivers to shutdown
>>>     EEH: Collect temporary log
>>>     EEH: Reset without hotplug activity
>>>     EEH: Notify device drivers the completion of reset
>>>     EEH: Notify device driver to resume
>>>
>>>     2.4 Unplug adapter 0001:06:00.x
>>>     # echo 0 > /sys/bus/pci/slots/C12/power
>>>     # lspci -t
>>>     +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>                                                 +-08.0-[05]----00.0
>>>                                                 \-09.0-[06-0a]--
>>>
>>> =========
>>> Changelog
>>> =========
>>> v8:
>>>     * Rebased to linux-powerpc next branch.
>>>     * Resolve comments from Alexey and Daniel on PCI part
>>>     * Resolve comments from Rob on fdt.c
>>>     * Retested (refer to the "Testing section")
>>> v7:
>>>     * Reworked revision to some extent.
>>>     * Rebased to powerpc/next repository.
>>>     * Reorder/split/merge/drop according - Alexey.
>>>     * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
>>>     * Merged 3 files to one for the hotplug driver - Alexey.
>>>     * As part of OPAL API, defined macros for PCI slot power state, hotplug
>>>       message type. Defined macros for PCI slot power confirmed state in
>>>       hotplug driver.
>>>     * Misc comments from Alexey.
>>>     * Reworked unflatten_dt_node() to avoid recursive function calls.
>>>     * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
>>> v6:
>>>     * Patch reorder, split, squash - Alexey.
>>>     * Minor coding style - Alexey.
>>>     * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
>>>     * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
>>>     * Concurrent depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
>>>     * Replace overlay with of_changeset - Grant
>>> v5:
>>>     * Rebased to 4.1.rc6 and some unmerged patches as below:
>>>       Alexey's DDW patchset (v11);
>>>       Gavin's EEH error injection support (in mpe's next branch);
>>>       Richard's EEH cleanup patches (in mpe's next branch);
>>>       Richard's EEH support for VF (v7);
>>>       Gavin's misc EEH fixes for 4.2;
>>>     * The revision bases on skiboot corresponding patches (v7):
>>>       https://patchwork.ozlabs.org/patch/480437/
>>>     * Utilize OF overlay to update device-tree with help of newly introduced
>>>       OPAL API opal_get_overlay_dt().
>>>     * Split patches for easy review according to aik's comments.
>>>     * Fix coding style from checkpatch.pl as pointed out by aik.
>>>     * Code cleanup and misc fixup according to aik's input.
>>> v4:
>>>     * Rebased to 4.1.RC1
>>>     * Added API to unflatten FDT blob to device node sub-tree, which is attached
>>>       the indicated parent device node. The original mechanism based on formatted
>>>       string stream has been dropped.
>>>     * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
>>>       was picked up and sent to linux-ppc@ separately for review as Richard's "VF EEH
>>>       Support" depends on that.
>>> v3:
>>>     * Rebased to 4.1.RC0
>>>     * PowerNV PCI infrastructure is totally refactored in order to support PCI
>>>       hotplug. The PowerNV hotplug driver is also reworked a lot because of
>>>       the changes in skiboot in order to support PCI hotplug.
>>>
>>> Gavin Shan (45):
>>>    PCI: Add pcibios_setup_bridge()
>>>    powerpc/pci: Override pcibios_setup_bridge()
>>>    powerpc/pci: Cleanup on struct pci_controller_ops
>>>    powerpc/powernv: Cleanup on pci_controller_ops instances
>>>    powerpc/powernv: Drop phb->bdfn_to_pe()
>>>    powerpc/powernv: Reorder fields in struct pnv_phb
>>>    powerpc/powernv: Rename PE# fields in struct pnv_phb
>>>    powerpc/powernv: Fix initial IO and M32 segmap
>>>    powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
>>>    powerpc/powernv: IO and M32 mapping based on PCI device resources
>>>    powerpc/powernv: Track M64 segment consumption
>>>    powerpc/powernv: Rename M64 related functions
>>>    powerpc/powernv/ioda1: M64 support on P7IOC
>>>    powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
>>>    powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
>>>    powerpc/powernv: Remove DMA32 PE list
>>>    powerpc/powernv/ioda1: Improve DMA32 segment track
>>>    powerpc/powernv: Increase PE# capacity
>>>    powerpc/powernv: Use PE instead of number during setup and release
>>>    powerpc/powernv: Allocate PE# in reverse order
>>>    powerpc/powernv: Create PEs at PCI hot plugging time
>>>    powerpc/powernv/ioda1: Support releasing IODA1 TCE table
>>>    powerpc/powernv: Dynamically release PEs
>>>    powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
>>>    powerpc/pci: Rename pcibios_find_pci_bus()
>>>    powerpc/pci: Move pci_find_bus_by_node() around
>>>    powerpc/pci: Export pci_add_device_node_info()
>>>    powerpc/pci: Introduce pci_remove_device_node_info()
>>>    powerpc/pci: Export pci_traverse_device_nodes()
>>>    powerpc/pci: Delay populating pdn
>>>    powerpc/pci: Don't scan empty slot
>>>    powerpc/pci: Update bridge windows on PCI plug
>>>    powerpc/powernv: Simplify pnv_eeh_reset()
>>>    powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
>>>    powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
>>>    powerpc/powernv: Support PCI slot ID
>>>    powerpc/powernv: Use firmware PCI slot reset infrastructure
>>>    powerpc/powernv: Functions to get/set PCI slot status
>>>    powerpc/powernv: Select OF_DYNAMIC
>>>    drivers/of: Split unflatten_dt_node()
>>>    drivers/of: Avoid recursively calling unflatten_dt_node()
>>>    drivers/of: Rename unflatten_dt_node()
>>>    drivers/of: Specify parent node in of_fdt_unflatten_tree()
>>>    drivers/of: Return allocated memory from of_fdt_unflatten_tree()
>>>    PCI/hotplug: PowerPC PowerNV PCI hotplug driver
>>>
>>>   arch/powerpc/include/asm/eeh.h                 |    2 +-
>>>   arch/powerpc/include/asm/opal-api.h            |   17 +-
>>>   arch/powerpc/include/asm/opal.h                |    8 +-
>>>   arch/powerpc/include/asm/pci-bridge.h          |   25 +-
>>>   arch/powerpc/include/asm/pnv-pci.h             |    7 +
>>>   arch/powerpc/include/asm/ppc-pci.h             |    8 +-
>>>   arch/powerpc/kernel/eeh_dev.c                  |   17 +-
>>>   arch/powerpc/kernel/eeh_driver.c               |   12 +-
>>>   arch/powerpc/kernel/pci-common.c               |   16 +-
>>>   arch/powerpc/kernel/pci-hotplug.c              |   47 +-
>>>   arch/powerpc/kernel/pci_dn.c                   |   89 +-
>>>   arch/powerpc/platforms/maple/pci.c             |   34 +-
>>>   arch/powerpc/platforms/pasemi/pci.c            |    3 -
>>>   arch/powerpc/platforms/powermac/pci.c          |   38 +-
>>>   arch/powerpc/platforms/powernv/Kconfig         |    1 +
>>>   arch/powerpc/platforms/powernv/eeh-powernv.c   |  179 ++--
>>>   arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
>>>   arch/powerpc/platforms/powernv/pci-ioda.c      | 1243 +++++++++++++++---------
>>>   arch/powerpc/platforms/powernv/pci.c           |   92 +-
>>>   arch/powerpc/platforms/powernv/pci.h           |   60 +-
>>>   arch/powerpc/platforms/pseries/msi.c           |    4 +-
>>>   arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
>>>   arch/powerpc/platforms/pseries/setup.c         |    8 +-
>>>   drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c   |    2 +-
>>>   drivers/of/fdt.c                               |  372 ++++---
>>>   drivers/of/unittest.c                          |    2 +-
>>>   drivers/pci/hotplug/Kconfig                    |   12 +
>>>   drivers/pci/hotplug/Makefile                   |    3 +
>>>   drivers/pci/hotplug/pnv_php.c                  |  870 +++++++++++++++++
>>>   drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
>>>   drivers/pci/hotplug/rpaphp_core.c              |    4 +-
>>>   drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
>>>   drivers/pci/setup-bus.c                        |    5 +
>>>   include/linux/of_fdt.h                         |    5 +-
>>>   include/linux/pci.h                            |    1 +
>>>   35 files changed, 2360 insertions(+), 874 deletions(-)
>>>   create mode 100644 drivers/pci/hotplug/pnv_php.c
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap
  2016-04-13  7:53       ` Gavin Shan
  (?)
@ 2016-04-13  9:53       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-13  9:53 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/13/2016 05:53 PM, Gavin Shan wrote:
> On Wed, Apr 13, 2016 at 04:21:07PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>> There are two arrays for IO and M32 segment maps on every PHB.
>>> The index of the arrays are segment number and the value stored
>>> in the corresponding element is PE number, indicating the segment
>>> is assigned to the PE. Initially, all elements in those two arrays
>>> are zeroes, meaning all segments are assigned to PE#0. It's wrong.
>>>
>>> This fixes the initial values in the elements of those two arrays
>>> to IODA_INVALID_PE, meaning all segments aren't assigned to any
>>> PE.
>>
>> This is ok.
>>
>>> In order to use IODA_INVALID_PE (-1) to represent invalid PE
>>> number, the types of those two arrays are changed from "unsigned int"
>>> to "int".
>>
>> "unsigned" can carry (-1) perfectly fine, just add a type cast to
>> IODA_INVALID_PE:
>>
>> #define IODA_INVALID_PE    (unsigned int)(-1)
>>
>> Using "signed" type for indexes which cannot be negative does not make much
>> sense - instead of checking for the upper boundary, you have to check for "<
>> 0" too.
>>
>> OPAL uses unsigned type for PE (uint64_t or uint32_t or uint16_t - this is
>> quite funny).
>>
>> pnv_ioda_pe::pe_number is "unsigned" and this pe_number is the same thing as
>> I can see in pnv_ioda_setup_dev_PE().
>>
>> Some printk() print the PE number as "%x" (which implies "unsigned").
>>
>
> Yes, I can simply have something like below when PE number as well as
> segment index are represented by "unsigned int" values, right?
>
> #define IODA_INVALID_PE		0xffffffff


This will work too, yes.
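
To illustrate the point about keeping the PE number unsigned, here is a
minimal sketch, assuming a 32-bit "unsigned int". The helpers segmap_init()
and pe_is_valid() are hypothetical names used only for illustration; they
are not part of the patch or of pci-ioda.c.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel meaning "segment not assigned to any PE" (value from the thread). */
#define IODA_INVALID_PE	0xffffffff

/* Hypothetical helper, not from the patch: mark all segments unassigned. */
static void segmap_init(unsigned int *segmap, unsigned int nr_segs)
{
	unsigned int i;

	assert(segmap != NULL);
	for (i = 0; i < nr_segs; i++)
		segmap[i] = IODA_INVALID_PE;
}

/*
 * Hypothetical helper, not from the patch: with an unsigned PE number a
 * single upper-bound comparison also rejects the sentinel, so no separate
 * "< 0" check is needed.
 */
static int pe_is_valid(unsigned int pe, unsigned int total_pe_num)
{
	return pe < total_pe_num;
}
```

With this, both spellings of the sentinel compare equal, and any segment
still holding IODA_INVALID_PE fails the same range check that catches an
out-of-range PE number.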

>
>>
>> I suggest changing the pci_dn::pe_number type from "int" to "unsigned int" to
>> match pnv_ioda_pe::pe_number, in a separate patch. Or do not touch types for
>> now.
>>
>
> Yes, I will have a separate patch right before this one to address it.
>
>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++++--
>>>   arch/powerpc/platforms/powernv/pci.h      | 4 ++--
>>>   2 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 1d2514f..44cc5f3 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>>   	const __be64 *prop64;
>>>   	const __be32 *prop32;
>>> -	int len;
>>> +	int i, len;
>>>   	u64 phb_id;
>>>   	void *aux;
>>>   	long rc;
>>> @@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	aux = memblock_virt_alloc(size, 0);
>>>   	phb->ioda.pe_alloc = aux;
>>>   	phb->ioda.m32_segmap = aux + m32map_off;
>>> -	if (phb->type == PNV_PHB_IODA1)
>>> +	for (i = 0; i < phb->ioda.total_pe_num; i++)
>>> +		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
>>> +	if (phb->type == PNV_PHB_IODA1) {
>>>   		phb->ioda.io_segmap = aux + iomap_off;
>>> +		for (i = 0; i < phb->ioda.total_pe_num; i++)
>>> +			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
>>> +	}
>>>   	phb->ioda.pe_array = aux + pemap_off;
>>>   	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 784882a..36c4965 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -146,8 +146,8 @@ struct pnv_phb {
>>>   		struct pnv_ioda_pe	*pe_array;
>>>
>>>   		/* M32 & IO segment maps */
>>> -		unsigned int		*m32_segmap;
>>> -		unsigned int		*io_segmap;
>>> +		int			*m32_segmap;
>>> +		int			*io_segmap;
>>>
>>>   		/* IRQ chip */
>>>   		int			irq_chip_init;
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-13  9:14       ` Alexey Kardashevskiy
  (?)
@ 2016-04-13 23:42       ` Gavin Shan
  2016-04-13 23:57         ` Alistair Popple
  2016-04-14  3:26         ` Alexey Kardashevskiy
  -1 siblings, 2 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-13 23:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 07:14:59PM +1000, Alexey Kardashevskiy wrote:
>On 04/13/2016 05:42 PM, Gavin Shan wrote:
>>On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:
>>>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>>>This series of patches rebases on powerpc/next branch, plus below additional
>>>>patches:
>>>>
>>>>    <This series of patches>
>>>>    <Followup 3 patches from Gavin on SRIOV EEH, which aren't posted>
>>>>    https://patchwork.ozlabs.org/patch/581315/	(PATCH[1/9] Richard's SRIOV EEH)
>>>>    https://patchwork.ozlabs.org/patch/582639/	(PATCH[1/1] Gavin's EEH fix)
>>>>    https://patchwork.ozlabs.org/patch/582093/	(PATCH[1/1] Gavin's EEH fix)
>>>>    https://patchwork.ozlabs.org/patch/580626/	(PATCH[1/4] Gavin's PCI fix)
>>>>    https://patchwork.ozlabs.org/patch/580153/	(PATCH[1/1] Andrew's EEH minor fix)
>>>>    https://patchwork.ozlabs.org/patch/566827/	(PATCH[1/1] Russell's P5IOC2 removal)
>>>>    https://patchwork.ozlabs.org/patch/534154/	(PATCH[1/7] Richard's SRIOV rework)
>>>>    commit 388f7b1 ("Linux 4.5-rc3")
>>>>
>>>>The series of patches intend to support PCI slot for PowerPC PowerNV platform,
>>>>which is running on top of skiboot firmware. The patchset requires corresponding
>>>>changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
>>>>for review. The PCI slots are exposed by skiboot with device node properties,
>>>>and kernel utilizes those properties to populate PCI slots accordingly.
>>>>
>>>>The original PCI infrastructure on PowerNV platform can't support hotplug
>>>>because the PE is assigned during PHB fixup time, which is called for once
>>>>during system boot time. For this, the PCI infrastructure on PowerNV platform
>>>>has been reworked for a lot. After that, the PE and its corresponding resources
>>>>(IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
>>>>PCI bridge's resources, which might decide PE# assigned to the PE (e.g. M64
>>>>resources, on P8 strictly speaking). Each PE will maintain a reference count,
>>>>which is (number of child PCI devices + 1). That indicates when last child PCI
>>>>device leaves the PE, the PE and its included resources will be released and put
>>>>back into free pool again. With this design, the PE will be released when EEH PE
>>>>is released. PATCH[1 - 23] are related to this part.
>>>>
>>>> From skiboot perspective, PCI slot is providing (hot/fundamental/complete)
>>>>resets to EEH. The kernel gets to know if skiboot supports various reset on one
>>>>particular PCI slot through device-tree node. If it does, EEH will utilize the
>>>>functionality provided by skiboot. Besides, the device-tree nodes have to change
>>>>in order to support PCI hotplug. For example, when one PCI adapter inserted to
>>>>one slot, its device-tree node should be added to the system dynamically. Conversely,
>>>>the device-tree node should be removed from the system when the PCI adapter is going
>>>>to be offline. Since pci_dn and eeh_dev have the same life cycle as PCI device nodes,
>>>>they should be added/removed accordingly during PCI hotplug. PATCH[24 - 39] are
>>>>doing the related work.
>>>>
>>>>The OF driver is changed to support unflattening FDT blob for sub-tree, which
>>>>is covered by PATCH[40 - 44].
>>>>
>>>>The last one, PATCH[45], is the standalone PCI hotplug driver for PowerPC PowerNV
>>>>platform.
>>>>
>>>>=======
>>>>Testing
>>>>=======
>>>>1. Unplug adapters behind non-empty slot, then plug them.
>>>>
>>>>    1.1 Check status
>>>>    # cat /sys/bus/pci/slots/C10/address
>>>>    0003:09:00
>>>>    # cat /sys/bus/pci/slots/C10/adapter
>>>>    1
>>>>    # cat /sys/bus/pci/slots/C10/power
>>>>    1
>>>>    # lspci
>>>>    0003:09:00.0 Ethernet controller: \
>>>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>    0003:09:00.1 Ethernet controller: \
>>>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>    0003:09:00.2 Ethernet controller: \
>>>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>    0003:09:00.3 Ethernet controller: \
>>>>    Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>    # lspci -t
>>>>    -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>>     |                                           +-08.0-[04-08]--
>>>>     |                                           +-09.0-[09]--+-00.0
>>>>     |                                           |            +-00.1
>>>>     |                                           |            +-00.2
>>>>     |                                           |            \-00.3
>>>>     |                                           +-10.0-[0a-0e]--
>>>>     |                                           \-11.0-[0f-13]--
>>>>
>>>>    1.2 Unplug adapter 0003:09:00.x
>>>>    # echo 0 > /sys/bus/pci/slots/C10/power
>>>>    # lspci -t
>>>>    -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>>     |                                           +-08.0-[04-08]--
>>>>     |                                           +-09.0-[09]--
>>>>     |                                           +-10.0-[0a-0e]--
>>>>     |                                           \-11.0-[0f-13]--
>>>>
>>>>    1.3 Plug adapter 0003:09:00.x
>>>>    # echo 1 > /sys/bus/pci/slots/C10/power
>>>
>>>
>>>Do I understand correctly that the adapter was not physically moved in/out of
>>>the slot between 1.2 and 1.3?
>>>
>>
>>Correct.
>
>
>This is not right then... Someone should try it, on both P7 and P8.
>

Do you mean physically pulling the adapter out and inserting the same
adapter back? What's the point of that test case?

>>
>>>
>>>
>>>>    # lspci -t
>>>>    -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>>     |                                           +-08.0-[04-08]--
>>>>     |                                           +-09.0-[09]--+-00.0
>>>>     |                                           |            +-00.1
>>>>     |                                           |            +-00.2
>>>>     |                                           |            \-00.3
>>>>     |                                           +-10.0-[0a-0e]--
>>>>     |                                           \-11.0-[0f-13]--
>>>>
>>>>
>>>>    1.4 Inject EEH error to adapter 0003:09:00.x, which is recovered.
>>>
>>>I am confused - why is this needed to test hotplug?
>>>
>>
>>Without the series, the EEH reset is always done by the kernel. With the
>>series applied, the EEH reset could be done in skiboot.
>
>
>Why exactly cannot EEH reset changes go to a smaller separate patchset
>(before hotplug)?
>

As I explained before, the patchset's order is: PCI generic part,
PowerNV PCI related, EEH related, device-tree part and hotplug driver.

The EEH reset change is included in PATCH[37/45]. There is no point
in reordering the patches.

>>That's the
>>major change introduced by the series from EEH's perspective. Also,
>>the EEH code was touched.
>>
>>>>    # cat /sys/bus/pci/devices/0003:09:00.0/eeh_pe_config_addr
>>>>    0x1
>>>>    # echo 1:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0003/err_injct
>>>>    # lspci -ns 0003:09:00.0
>>>>    # dmesg | grep EEH
>>>>    EEH: Frozen PHB#3-PE#1 detected
>>>>    EEH: PE location: U78C9.001.WZS00CF-P1-C10, PHB location: N/A
>>>>    EEH: Detected PCI bus error on PHB#3-PE#1
>>>>    EEH: This PCI device has failed 1 times in the last hour
>>>>    EEH: Notify device drivers to shutdown
>>>>    EEH: Collect temporary log
>>>>    EEH: Reset without hotplug activity
>>>>    EEH: Notify device drivers the completion of reset
>>>>    EEH: Notify device driver to resume
>>>>
>>>>2. Plug adapter and then unplug it. This requires hack in skiboot
>>>>    to skip probing the adapters behind the target (C12 in the
>>>>    testing) for once.
>>>>
>>>>    2.1 Check status
>>>>    # cat /sys/bus/pci/slots/C12/address
>>>>    0001:06
>>>>    # cat /sys/bus/pci/slots/C12/power
>>>>    0
>>>>    # cat /sys/bus/pci/slots/C12/adapter
>>>>    1
>>>>    # lspci -t
>>>>    +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>>                                                +-08.0-[05]----00.0
>>>>                                                \-09.0-[06-0a]--
>>>>
>>>>    2.2 Plug adapter 0001:06:00.x
>>>>    # echo 1 > /sys/bus/pci/slots/C12/power
>>>>    # lspci -t
>>>>    +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>>                                                +-08.0-[05]----00.0
>>>>                                                \-09.0-[06-0a]--+-00.0
>>>>                                                                \-00.1
>>>>    # lspci
>>>>    0001:06:00.0 Ethernet controller: \
>>>>    Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>>>    0001:06:00.1 Ethernet controller: \
>>>>    Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
>>>>
>>>>    2.3 Inject EEH error to adapter 0001:06:00.x, which is recovered
>>>>    # cat /sys/bus/pci/devices/0001:06:00.0/eeh_pe_config_addr
>>>>    0x2
>>>>    # echo 2:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0001/err_injct
>>>>    # dmesg | grep EEH
>>>>    EEH: Frozen PHB#1-PE#2 detected
>>>>    EEH: PE location: U78C9.001.WZS00CF-P1-C12, PHB location: N/A
>>>>    EEH: Detected PCI bus error on PHB#1-PE#2
>>>>    EEH: This PCI device has failed 1 times in the last hour
>>>>    EEH: Notify device drivers to shutdown
>>>>    EEH: Collect temporary log
>>>>    EEH: Reset without hotplug activity
>>>>    EEH: Notify device drivers the completion of reset
>>>>    EEH: Notify device driver to resume
>>>>
>>>>    2.4 Unplug adapter 0001:06:00.x
>>>>    # echo 0 > /sys/bus/pci/slots/C12/power
>>>>    # lspci -t
>>>>    +-[0001:00]---00.0-[01-0a]----00.0-[02-0a]--+-01.0-[03-04]----00.0-[04]----00.0
>>>>                                                +-08.0-[05]----00.0
>>>>                                                \-09.0-[06-0a]--
>>>>
>>>>=========
>>>>Changelog
>>>>=========
>>>>v8:
>>>>    * Rebased to linux-powerpc next branch.
>>>>    * Resolve comments from Alexey and Daniel on PCI part
>>>>    * Resolve comments from Rob on fdt.c
>>>>    * Retested (refer to the "Testing section")
>>>>v7:
>>>>    * Reworked revision to some extent.
>>>>    * Rebased to powerpc/next repository.
>>>>    * Reorder/split/merge/drop patches accordingly - Alexey.
>>>>    * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
>>>>    * Merged 3 files to one for the hotplug driver - Alexey.
>>>>    * As part of OPAL API, defined macros for PCI slot power state, hotplug
>>>>      message type. Defined macros for PCI slot power confirmed state in
>>>>      hotplug driver.
>>>>    * Misc comments from Alexey.
>>>>    * Reworked unflatten_dt_node() to avoid recursive function calls.
>>>>    * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
>>>>v6:
>>>>    * Patch reorder, split, squash - Alexey.
>>>>    * Minor coding style - Alexey.
>>>>    * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
>>>>    * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
>>>>    * Concurrent depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
>>>>    * Replace overlay with of_changeset - Grant
>>>>v5:
>>>>    * Rebased to 4.1.rc6 and some unmerged patches as below:
>>>>      Alexey's DDW patchset (v11);
>>>>      Gavin's EEH error injection support (in mpe's next branch);
>>>>      Richard's EEH cleanup patches (in mpe's next branch);
>>>>      Richard's EEH support for VF (v7);
>>>>      Gavin's misc EEH fixes for 4.2;
>>>>    * The revision bases on skiboot corresponding patches (v7):
>>>>      https://patchwork.ozlabs.org/patch/480437/
>>>>    * Utilize OF overlay to update device-tree with help of newly introduced
>>>>      OPAL API opal_get_overlay_dt().
>>>>    * Split patches for easy review according to aik's comments.
>>>>    * Fix coding style issues reported by checkpatch.pl, as pointed out by aik.
>>>>    * Code cleanup and misc fixup according to aik's input.
>>>>v4:
>>>>    * Rebased to 4.1.RC1
>>>>    * Added API to unflatten FDT blob to device node sub-tree, which is attached
>>>>      the indicated parent device node. The original mechanism based on formatted
>>>>      string stream has been dropped.
>>>>    * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
>>>>      was picked up and sent to linux-ppc@ separately for review, as Richard's
>>>>      "VF EEH Support" depends on it.
>>>>v3:
>>>>    * Rebased to 4.1.RC0
>>>>    * PowerNV PCI infrastructure is totally refactored in order to support PCI
>>>>      hotplug. The PowerNV hotplug driver is also reworked a lot because of
>>>>      the changes in skiboot in order to support PCI hotplug.
>>>>
>>>>Gavin Shan (45):
>>>>   PCI: Add pcibios_setup_bridge()
>>>>   powerpc/pci: Override pcibios_setup_bridge()
>>>>   powerpc/pci: Cleanup on struct pci_controller_ops
>>>>   powerpc/powernv: Cleanup on pci_controller_ops instances
>>>>   powerpc/powernv: Drop phb->bdfn_to_pe()
>>>>   powerpc/powernv: Reorder fields in struct pnv_phb
>>>>   powerpc/powernv: Rename PE# fields in struct pnv_phb
>>>>   powerpc/powernv: Fix initial IO and M32 segmap
>>>>   powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
>>>>   powerpc/powernv: IO and M32 mapping based on PCI device resources
>>>>   powerpc/powernv: Track M64 segment consumption
>>>>   powerpc/powernv: Rename M64 related functions
>>>>   powerpc/powernv/ioda1: M64 support on P7IOC
>>>>   powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()
>>>>   powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
>>>>   powerpc/powernv: Remove DMA32 PE list
>>>>   powerpc/powernv/ioda1: Improve DMA32 segment track
>>>>   powerpc/powernv: Increase PE# capacity
>>>>   powerpc/powernv: Use PE instead of number during setup and release
>>>>   powerpc/powernv: Allocate PE# in reverse order
>>>>   powerpc/powernv: Create PEs at PCI hot plugging time
>>>>   powerpc/powernv/ioda1: Support releasing IODA1 TCE table
>>>>   powerpc/powernv: Dynamically release PEs
>>>>   powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
>>>>   powerpc/pci: Rename pcibios_find_pci_bus()
>>>>   powerpc/pci: Move pci_find_bus_by_node() around
>>>>   powerpc/pci: Export pci_add_device_node_info()
>>>>   powerpc/pci: Introduce pci_remove_device_node_info()
>>>>   powerpc/pci: Export pci_traverse_device_nodes()
>>>>   powerpc/pci: Delay populating pdn
>>>>   powerpc/pci: Don't scan empty slot
>>>>   powerpc/pci: Update bridge windows on PCI plug
>>>>   powerpc/powernv: Simplify pnv_eeh_reset()
>>>>   powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
>>>>   powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
>>>>   powerpc/powernv: Support PCI slot ID
>>>>   powerpc/powernv: Use firmware PCI slot reset infrastructure
>>>>   powerpc/powernv: Functions to get/set PCI slot status
>>>>   powerpc/powernv: Select OF_DYNAMIC
>>>>   drivers/of: Split unflatten_dt_node()
>>>>   drivers/of: Avoid recursively calling unflatten_dt_node()
>>>>   drivers/of: Rename unflatten_dt_node()
>>>>   drivers/of: Specify parent node in of_fdt_unflatten_tree()
>>>>   drivers/of: Return allocated memory from of_fdt_unflatten_tree()
>>>>   PCI/hotplug: PowerPC PowerNV PCI hotplug driver
>>>>
>>>>  arch/powerpc/include/asm/eeh.h                 |    2 +-
>>>>  arch/powerpc/include/asm/opal-api.h            |   17 +-
>>>>  arch/powerpc/include/asm/opal.h                |    8 +-
>>>>  arch/powerpc/include/asm/pci-bridge.h          |   25 +-
>>>>  arch/powerpc/include/asm/pnv-pci.h             |    7 +
>>>>  arch/powerpc/include/asm/ppc-pci.h             |    8 +-
>>>>  arch/powerpc/kernel/eeh_dev.c                  |   17 +-
>>>>  arch/powerpc/kernel/eeh_driver.c               |   12 +-
>>>>  arch/powerpc/kernel/pci-common.c               |   16 +-
>>>>  arch/powerpc/kernel/pci-hotplug.c              |   47 +-
>>>>  arch/powerpc/kernel/pci_dn.c                   |   89 +-
>>>>  arch/powerpc/platforms/maple/pci.c             |   34 +-
>>>>  arch/powerpc/platforms/pasemi/pci.c            |    3 -
>>>>  arch/powerpc/platforms/powermac/pci.c          |   38 +-
>>>>  arch/powerpc/platforms/powernv/Kconfig         |    1 +
>>>>  arch/powerpc/platforms/powernv/eeh-powernv.c   |  179 ++--
>>>>  arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c      | 1243 +++++++++++++++---------
>>>>  arch/powerpc/platforms/powernv/pci.c           |   92 +-
>>>>  arch/powerpc/platforms/powernv/pci.h           |   60 +-
>>>>  arch/powerpc/platforms/pseries/msi.c           |    4 +-
>>>>  arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
>>>>  arch/powerpc/platforms/pseries/setup.c         |    8 +-
>>>>  drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c   |    2 +-
>>>>  drivers/of/fdt.c                               |  372 ++++---
>>>>  drivers/of/unittest.c                          |    2 +-
>>>>  drivers/pci/hotplug/Kconfig                    |   12 +
>>>>  drivers/pci/hotplug/Makefile                   |    3 +
>>>>  drivers/pci/hotplug/pnv_php.c                  |  870 +++++++++++++++++
>>>>  drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
>>>>  drivers/pci/hotplug/rpaphp_core.c              |    4 +-
>>>>  drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
>>>>  drivers/pci/setup-bus.c                        |    5 +
>>>>  include/linux/of_fdt.h                         |    5 +-
>>>>  include/linux/pci.h                            |    1 +
>>>>  35 files changed, 2360 insertions(+), 874 deletions(-)
>>>>  create mode 100644 drivers/pci/hotplug/pnv_php.c
>>>>
>>>
>>>
>>>--
>>>Alexey
>>>
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
  2016-04-13  8:29     ` Alexey Kardashevskiy
@ 2016-04-13 23:54       ` Gavin Shan
  2016-04-14  3:36         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-13 23:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 06:29:42PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>Currently, there is one macro (TCE32_TABLE_SIZE) representing the
>>TCE table size for one DMA32 segment. The constant representing
>>the DMA32 segment size (1 << 28) is still used in the code.
>>
>>This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
>>segment size. the TCE table size can be calcualted when the page
>
>s/calcualted/calculated/
>
>
>>has fixed 4KB size. So all the related calculation depends on one
>>macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.
>
>Please move PNV_IODA1_DMA32_SEGSIZE where TCE32_TABLE_SIZE was.
>
>
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++++++++++++++-------------
>>  arch/powerpc/platforms/powernv/pci.h      |  1 +
>>  2 files changed, 18 insertions(+), 13 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index d18b95e..e60cff6 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -48,9 +48,6 @@
>>  #include "powernv.h"
>>  #include "pci.h"
>>
>>-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>>-
>>  #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>>  #define POWERNV_IOMMU_MAX_LEVELS	5
>>
>>@@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>
>>  	struct page *tce_mem = NULL;
>>  	struct iommu_table *tbl;
>>-	unsigned int i;
>>+	unsigned int tce32_segsz, i;
>
>
>PNV_IODA1_DMA32_SEGSIZE is a segment size in bytes. The name @tce32_segsz
>also suggests that it is a segment size in bytes (otherwise it would be
>tce32_seg_entries or something like this) but it is not, it is a number of
>TCE entries (arch/powerpc/kernel/iommu.c uses "entry" for these). And
>tce32_segsz never changes. So:
>
>const unsigned int entries = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K
>- 3);
>

Are you sure @tce32_segsz and the equation you gave are for the number of TCE
entries, not the size of memory required for the DMA32 segment TCE table?

>>  	int64_t rc;
>>  	void *addr;
>>
>>@@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	/* Grab a 32-bit TCE table */
>>  	pe->tce32_seg = base;
>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>-		(base << 28), ((base + segs) << 28) - 1);
>>+		base * PNV_IODA1_DMA32_SEGSIZE,
>>+		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>
>>  	/* XXX Currently, we allocate one big contiguous table for the
>>  	 * TCEs. We only really need one chunk per 256M of TCE space
>>  	 * (ie per segment) but that's an optimization for later, it
>>  	 * requires some added smarts with our get/put_tce implementation
>>+	 *
>>+	 * Each TCE page is 4KB in size and each TCE entry occupies 8
>>+	 * bytes
>>  	 */
>>+	tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);
>
>>  	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>>-				   get_order(TCE32_TABLE_SIZE * segs));
>>+				   get_order(tce32_segsz * segs));
>>  	if (!tce_mem) {
>>  		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
>>  		goto fail;
>>  	}
>>  	addr = page_address(tce_mem);
>>-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
>>+	memset(addr, 0, tce32_segsz * segs);
>>
>>  	/* Configure HW */
>>  	for (i = 0; i < segs; i++) {
>>  		rc = opal_pci_map_pe_dma_window(phb->opal_id,
>>  					      pe->pe_number,
>>  					      base + i, 1,
>>-					      __pa(addr) + TCE32_TABLE_SIZE * i,
>>-					      TCE32_TABLE_SIZE, 0x1000);
>>+					      __pa(addr) + tce32_segsz * i,
>>+					      tce32_segsz, 0x1000);
>
>
>As you started using IOMMU_PAGE_SHIFT_4K and you are also touching this piece
>of code -
>
>s/0x1000/IOMMU_PAGE_SHIFT_4K/
>

Is 0x1000 equal to IOMMU_PAGE_SHIFT_4K? I guess you probably meant to
suggest IOMMU_PAGE_SIZE_4K instead?

>>  		if (rc) {
>>  			pe_err(pe, " Failed to configure 32-bit TCE table,"
>>  			       " err %ld\n", rc);
>>@@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	}
>>
>>  	/* Setup linux iommu table */
>>-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
>>-				  base << 28, IOMMU_PAGE_SHIFT_4K);
>>+	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>>+				  base * PNV_IODA1_DMA32_SEGSIZE,
>>+				  IOMMU_PAGE_SHIFT_4K);
>>
>>  	/* OPAL variant of P7IOC SW invalidated TCEs */
>>  	if (phb->ioda.tce_inval_reg)
>>@@ -2116,7 +2119,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	if (pe->tce32_seg >= 0)
>>  		pe->tce32_seg = -1;
>>  	if (tce_mem)
>>-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>+		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>>  	if (tbl) {
>>  		pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
>>  		iommu_free_table(tbl, "pnv");
>>@@ -3445,7 +3448,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	mutex_init(&phb->ioda.pe_list_mutex);
>>
>>  	/* Calculate how many 32-bit TCE segments we have */
>>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
>>+	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
>>+				PNV_IODA1_DMA32_SEGSIZE;
>>
>>  #if 0 /* We should really do that ... */
>>  	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 00539ff..1d8e775 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -84,6 +84,7 @@ struct pnv_ioda_pe {
>>
>>  #define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>  #define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>+#define PNV_IODA1_DMA32_SEGSIZE	0x10000000
>>
>>  #define PNV_PHB_FLAG_EEH	(1 << 0)
>>
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-13 23:42       ` Gavin Shan
@ 2016-04-13 23:57         ` Alistair Popple
  2016-04-14  1:30           ` Gavin Shan
  2016-04-14  3:26         ` Alexey Kardashevskiy
  1 sibling, 1 reply; 174+ messages in thread
From: Alistair Popple @ 2016-04-13 23:57 UTC (permalink / raw)
  To: linuxppc-dev, Gavin Shan
  Cc: Alexey Kardashevskiy, devicetree, grant.likely, robherring2,
	linux-pci, bhelgaas, dja

Hi Gavin,

<snip>

> >Why exactly cannot EEH reset changes go to a smaller separate patchset
> >(before hotplug)?
> >
> 
> As I explained before, the patchset's order is: PCI generic part,
> PowerNV PCI related, EEH related, device-tree part and hotplug driver.
> 
> The EEH reset change is included in PATCH[37/45]. There is no point
> to reorder the patches.
 
I don't understand all of the dependencies but if possible splitting the 
series up into a set of smaller self-contained patch series makes things 
easier to review and may make it easier for you to get this functionality 
reviewed and accepted into upstream.

Regards,

Alistair


* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-13 23:57         ` Alistair Popple
@ 2016-04-14  1:30           ` Gavin Shan
  2016-04-14  3:38             ` Alexey Kardashevskiy
  2016-04-15 16:10             ` Rob Herring
  0 siblings, 2 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-14  1:30 UTC (permalink / raw)
  To: Alistair Popple
  Cc: linuxppc-dev, Gavin Shan, Alexey Kardashevskiy, devicetree,
	grant.likely, robherring2, linux-pci, bhelgaas, dja

On Thu, Apr 14, 2016 at 09:57:32AM +1000, Alistair Popple wrote:
>Hi Gavin,
>
><snip>
>
>> >Why exactly cannot EEH reset changes go to a smaller separate patchset
>> >(before hotplug)?
>> >
>> 
>> As I explained before, the patchset's order is: PCI generic part,
>> PowerNV PCI related, EEH related, device-tree part and hotplug driver.
>> 
>> The EEH reset change is included in PATCH[37/45]. There is no point
>> to reorder the patches.
>
>I don't understand all of the dependencies but if possible splitting the 
>series up into a set of smaller self-contained patch series makes things 
>easier to review and may make it easier for you to get this functionality 
>reviewed and accepted into upstream.
>

Thanks, Alistair. I will move the cleanup/refactoring patches into a
separate series, which is expected to be merged first. That will help
the reviewers focus on the patches with complicated changes, as you
suggested. Alexey, please let me know whether that is what you would
like to see.

Thanks,
Gavin

>Regards,
>
>Alistair
>


* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-13 23:42       ` Gavin Shan
  2016-04-13 23:57         ` Alistair Popple
@ 2016-04-14  3:26         ` Alexey Kardashevskiy
  2016-04-14  5:25           ` Gavin Shan
  1 sibling, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-14  3:26 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/14/2016 09:42 AM, Gavin Shan wrote:
> On Wed, Apr 13, 2016 at 07:14:59PM +1000, Alexey Kardashevskiy wrote:
>> On 04/13/2016 05:42 PM, Gavin Shan wrote:
>>> On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:
>>>> On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>>>> This series of patches rebases on powerpc/next branch, plus below additional
>>>>> patches:
>>>>>
>>>>>     <This series of patches>
>>>>>     <Followup 3 patches from Gavin on SRIOV EEH, which aren't posted>
>>>>>     https://patchwork.ozlabs.org/patch/581315/	(PATCH[1/9] Richard's SRIOV EEH)
>>>>>     https://patchwork.ozlabs.org/patch/582639/	(PATCH[1/1] Gavin's EEH fix)
>>>>>     https://patchwork.ozlabs.org/patch/582093/	(PATCH[1/1] Gavin's EEH fix)
>>>>>     https://patchwork.ozlabs.org/patch/580626/	(PATCH[1/4] Gavin's PCI fix)
>>>>>     https://patchwork.ozlabs.org/patch/580153/	(PATCH[1/1] Andrew's EEH minor fix)
>>>>>     https://patchwork.ozlabs.org/patch/566827/	(PATCH[1/1] Russell's P5IOC2 removal)
>>>>>     https://patchwork.ozlabs.org/patch/534154/	(PATCH[1/7] Richard's SRIOV rework)
>>>>>     commit 388f7b1 ("Linux 4.5-rc3")
>>>>>
>>>>> The series of patches intend to support PCI slot for PowerPC PowerNV platform,
>>>>> which is running on top of skiboot firmware. The patchset requires corresponding
>>>>> changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
>>>>> for review. The PCI slots are exposed by skiboot with device node properties,
>>>>> and the kernel utilizes those properties to populate PCI slots accordingly.
>>>>>
>>>>> The original PCI infrastructure on PowerNV platform can't support hotplug
>>>>> because the PE is assigned during PHB fixup time, which is called only once
>>>>> during system boot time. For this, the PCI infrastructure on PowerNV platform
>>>>> has been reworked a lot. After that, the PE and its corresponding resources
>>>>> (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
>>>>> PCI bridge's resources, which might decide PE# assigned to the PE (e.g. M64
>>>>> resources, on P8 strictly speaking). Each PE will maintain a reference count,
>>>>> which is (number of child PCI devices + 1). That indicates when last child PCI
>>>>> device leaves the PE, the PE and its included resources will be released and put
>>>>> back into free pool again. With this design, the PE will be released when EEH PE
>>>>> is released. PATCH[1 - 23] are related to this part.
>>>>>
>>>>>  From skiboot perspective, PCI slot is providing (hot/fundamental/complete)
>>>>> resets to EEH. The kernel gets to know if skiboot supports various reset on one
>>>>> particular PCI slot through device-tree node. If it does, EEH will utilize the
>>>>> functionality provided by skiboot. Besides, the device-tree nodes have to change
>>>>> in order to support PCI hotplug. For example, when one PCI adapter is inserted into
>>>>> one slot, its device-tree node should be added to the system dynamically. Conversely,
>>>>> the device-tree node should be removed from the system when the PCI adapter is going
>>>>> to be offline. Since pci_dn and eeh_dev have the same life cycle as PCI device nodes,
>>>>> they should be added/removed accordingly during PCI hotplug. PATCH[24 - 39] are
>>>>> doing the related work.
>>>>>
>>>>> The OF driver is changed to support unflattening an FDT blob for a sub-tree, which
>>>>> is covered by PATCH[40 - 44].
>>>>>
>>>>> The last one, PATCH[45], is the standalone PCI hotplug driver for PowerPC PowerNV
>>>>> platform.
>>>>>
>>>>> =======
>>>>> Testing
>>>>> =======
>>>>> 1. Unplug adapters behind non-empty slot, then plug them.
>>>>>
>>>>>     1.1 Check status
>>>>>     # cat /sys/bus/pci/slots/C10/address
>>>>>     0003:09:00
>>>>>     # cat /sys/bus/pci/slots/C10/adapter
>>>>>     1
>>>>>     # cat /sys/bus/pci/slots/C10/power
>>>>>     1
>>>>>     # lspci
>>>>>     0003:09:00.0 Ethernet controller: \
>>>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>>     0003:09:00.1 Ethernet controller: \
>>>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>>     0003:09:00.2 Ethernet controller: \
>>>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>>     0003:09:00.3 Ethernet controller: \
>>>>>     Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>>>>     # lspci -t
>>>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>>>      |                                           +-08.0-[04-08]--
>>>>>      |                                           +-09.0-[09]--+-00.0
>>>>>      |                                           |            +-00.1
>>>>>      |                                           |            +-00.2
>>>>>      |                                           |            \-00.3
>>>>>      |                                           +-10.0-[0a-0e]--
>>>>>      |                                           \-11.0-[0f-13]--
>>>>>
>>>>>     1.2 Unplug adapter 0003:09:00.x
>>>>>     # echo 0 > /sys/bus/pci/slots/C10/power
>>>>>     # lspci -t
>>>>>     -+-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
>>>>>      |                                           +-08.0-[04-08]--
>>>>>      |                                           +-09.0-[09]--
>>>>>      |                                           +-10.0-[0a-0e]--
>>>>>      |                                           \-11.0-[0f-13]--
>>>>>
>>>>>     1.3 Plug adapter 0003:09:00.x
>>>>>     # echo 1 > /sys/bus/pci/slots/C10/power
>>>>
>>>>
>>>> Do I understand correctly that the adapter was not physically moved in/out of
>>>> the slot between 1.2 and 1.3?
>>>>
>>>
>>> Correct.
>>
>>
>> This is not right then... Someone should try it, on both P7 and P8.
>>
>
> Do you mean physically pull the adapter out and insert the same
> adapter back? What's the point for the test case?


Because this is what the patchset is for: replacing a physical device on
a physical machine. Powering the slots on/off via sysfs is just an
approximation (which is fine when you are debugging). With a real swap
something can go wrong and require some work, and you will not know that
for sure until you try it.



-- 
Alexey


* Re: [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
  2016-04-13 23:54       ` Gavin Shan
@ 2016-04-14  3:36         ` Alexey Kardashevskiy
  2016-04-20  0:25           ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-14  3:36 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/14/2016 09:54 AM, Gavin Shan wrote:
> On Wed, Apr 13, 2016 at 06:29:42PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>> Currently, there is one macro (TCE32_TABLE_SIZE) representing the
>>> TCE table size for one DMA32 segment. The constant representing
>>> the DMA32 segment size (1 << 28) is still used in the code.
>>>
>>> This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
>>> segment size. the TCE table size can be calcualted when the page
>>
>> s/calcualted/calculated/
>>
>>
>>> has fixed 4KB size. So all the related calculation depends on one
>>> macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.
>>
>> Please move PNV_IODA1_DMA32_SEGSIZE where TCE32_TABLE_SIZE was.
>>
>>
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++++++++++++++-------------
>>>   arch/powerpc/platforms/powernv/pci.h      |  1 +
>>>   2 files changed, 18 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index d18b95e..e60cff6 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -48,9 +48,6 @@
>>>   #include "powernv.h"
>>>   #include "pci.h"
>>>
>>> -/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>> -#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>>> -
>>>   #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>>>   #define POWERNV_IOMMU_MAX_LEVELS	5
>>>
>>> @@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>
>>>   	struct page *tce_mem = NULL;
>>>   	struct iommu_table *tbl;
>>> -	unsigned int i;
>>> +	unsigned int tce32_segsz, i;
>>
>>
>> PNV_IODA1_DMA32_SEGSIZE is a segment size in bytes. The name @tce32_segsz
>> also suggests that it is a segment size in bytes (otherwise it would be
>> tce32_seg_entries or something like this) but it is not, it is a number of
>> TCE entries (arch/powerpc/kernel/iommu.c uses "entry" for these). And
>> tce32_segsz never changes. So:
>>
>> const unsigned int entries = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K
>> - 3);
>>
>
> Are you sure @tce32_segsz and equation you gave are for number of TCE entries,
> not the size of meory required for the DMA32 segment TCE table?

No, I am not :) The "- 3" makes it a table size in bytes, so the name
should rather be tablesz then.
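
To spell out the arithmetic behind that conclusion, here is a minimal
standalone sketch (not kernel code). The two constants mirror the values
used in the patch; TCE_ENTRY_SHIFT and both helper names are made up for
illustration only. Shifting the segment size right by the page shift gives
the number of TCE entries, while shifting by (page shift - 3) multiplies
that count by 8 bytes per entry and so gives the table size in bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Values as used in the patch; redefined here so the sketch stands alone. */
#define PNV_IODA1_DMA32_SEGSIZE	0x10000000u	/* 256MB per DMA32 segment */
#define IOMMU_PAGE_SHIFT_4K	12		/* 4KB TCE pages */
#define TCE_ENTRY_SHIFT		3		/* hypothetical name: 8 bytes per TCE */

/* Number of TCE entries needed to map one DMA32 segment. */
static inline uint32_t seg_tce_entries(void)
{
	return PNV_IODA1_DMA32_SEGSIZE >> IOMMU_PAGE_SHIFT_4K;
}

/* Bytes of TCE table needed for one DMA32 segment (entries * 8). */
static inline uint32_t seg_tce_table_bytes(void)
{
	return PNV_IODA1_DMA32_SEGSIZE >>
	       (IOMMU_PAGE_SHIFT_4K - TCE_ENTRY_SHIFT);
}

/* The new expression must equal the removed TCE32_TABLE_SIZE macro. */
static_assert((PNV_IODA1_DMA32_SEGSIZE >>
	       (IOMMU_PAGE_SHIFT_4K - TCE_ENTRY_SHIFT)) ==
	      ((0x10000000u / 0x1000u) * 8u),
	      "table size in bytes matches the old TCE32_TABLE_SIZE");
```

With these values one segment needs 65536 entries and a 512KB table, which
is the value TCE32_TABLE_SIZE had before the patch.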


>
>>>   	int64_t rc;
>>>   	void *addr;
>>>
>>> @@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>   	/* Grab a 32-bit TCE table */
>>>   	pe->tce32_seg = base;
>>>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>> -		(base << 28), ((base + segs) << 28) - 1);
>>> +		base * PNV_IODA1_DMA32_SEGSIZE,
>>> +		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>>
>>>   	/* XXX Currently, we allocate one big contiguous table for the
>>>   	 * TCEs. We only really need one chunk per 256M of TCE space
>>>   	 * (ie per segment) but that's an optimization for later, it
>>>   	 * requires some added smarts with our get/put_tce implementation
>>> +	 *
>>> +	 * Each TCE page is 4KB in size and each TCE entry occupies 8
>>> +	 * bytes
>>>   	 */
>>> +	tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);
>>
>>>   	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>>> -				   get_order(TCE32_TABLE_SIZE * segs));
>>> +				   get_order(tce32_segsz * segs));
>>>   	if (!tce_mem) {
>>>   		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
>>>   		goto fail;
>>>   	}
>>>   	addr = page_address(tce_mem);
>>> -	memset(addr, 0, TCE32_TABLE_SIZE * segs);
>>> +	memset(addr, 0, tce32_segsz * segs);
>>>
>>>   	/* Configure HW */
>>>   	for (i = 0; i < segs; i++) {
>>>   		rc = opal_pci_map_pe_dma_window(phb->opal_id,
>>>   					      pe->pe_number,
>>>   					      base + i, 1,
>>> -					      __pa(addr) + TCE32_TABLE_SIZE * i,
>>> -					      TCE32_TABLE_SIZE, 0x1000);
>>> +					      __pa(addr) + tce32_segsz * i,
>>> +					      tce32_segsz, 0x1000);
>>
>>
>> As you started using IOMMU_PAGE_SHIFT_4K and you are also touching this piece
>> of code -
>>
>> s/0x1000/IOMMU_PAGE_SHIFT_4K/
>>
>
> Is 0x1000 equal to IOMMU_PAGE_SHIFT_4K? I guess you probably meant to
> suggest IOMMU_PAGE_SIZE_4K instead?


Ah, my bad, should have been IOMMU_PAGE_SIZE_4K. I'll pay more attention to 
the details, sorry.
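
A small sketch of the difference between the two macros, assuming they
mirror the powerpc IOMMU header (shift 12, size 1 << 12). The checking
helper is hypothetical and only shows why passing the shift where a page
size is expected would be wrong.

```c
#include <assert.h>

/* Assumed to mirror the powerpc IOMMU header: shift is 12, size is 1 << 12. */
#define IOMMU_PAGE_SHIFT_4K 12
#define IOMMU_PAGE_SIZE_4K  (1UL << IOMMU_PAGE_SHIFT_4K)

/*
 * Hypothetical helper: nonzero when 'v' looks like a TCE page size,
 * i.e. a power of two that is at least 4K.
 */
int is_valid_tce_page_size(unsigned long v)
{
	return v >= 0x1000UL && (v & (v - 1)) == 0;
}
```

The literal 0x1000 in the opal_pci_map_pe_dma_window() call is a size, so
IOMMU_PAGE_SIZE_4K is the matching replacement.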


>
>>>   		if (rc) {
>>>   			pe_err(pe, " Failed to configure 32-bit TCE table,"
>>>   			       " err %ld\n", rc);
>>> @@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>   	}
>>>
>>>   	/* Setup linux iommu table */
>>> -	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
>>> -				  base << 28, IOMMU_PAGE_SHIFT_4K);
>>> +	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>>> +				  base * PNV_IODA1_DMA32_SEGSIZE,
>>> +				  IOMMU_PAGE_SHIFT_4K);
>>>
>>>   	/* OPAL variant of P7IOC SW invalidated TCEs */
>>>   	if (phb->ioda.tce_inval_reg)
>>> @@ -2116,7 +2119,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>   	if (pe->tce32_seg >= 0)
>>>   		pe->tce32_seg = -1;
>>>   	if (tce_mem)
>>> -		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>> +		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>>>   	if (tbl) {
>>>   		pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
>>>   		iommu_free_table(tbl, "pnv");
>>> @@ -3445,7 +3448,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	mutex_init(&phb->ioda.pe_list_mutex);
>>>
>>>   	/* Calculate how many 32-bit TCE segments we have */
>>> -	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
>>> +	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
>>> +				PNV_IODA1_DMA32_SEGSIZE;
>>>
>>>   #if 0 /* We should really do that ... */
>>>   	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 00539ff..1d8e775 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -84,6 +84,7 @@ struct pnv_ioda_pe {
>>>
>>>   #define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>>   #define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>> +#define PNV_IODA1_DMA32_SEGSIZE	0x10000000
>>>
>>>   #define PNV_PHB_FLAG_EEH	(1 << 0)
>>>
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey


* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-14  1:30           ` Gavin Shan
@ 2016-04-14  3:38             ` Alexey Kardashevskiy
  2016-04-15 16:10             ` Rob Herring
  1 sibling, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-14  3:38 UTC (permalink / raw)
  To: Gavin Shan, Alistair Popple
  Cc: linuxppc-dev, devicetree, grant.likely, robherring2, linux-pci,
	bhelgaas, dja

On 04/14/2016 11:30 AM, Gavin Shan wrote:
> On Thu, Apr 14, 2016 at 09:57:32AM +1000, Alistair Popple wrote:
>> Hi Gavin,
>>
>> <snip>
>>
>>>> Why exactly cannot EEH reset changes go to a smaller separate patchset
>>>> (before hotplug)?
>>>>
>>>
>>> As I explained before, the patchset's order is: PCI generic part,
>>> PowerNV PCI related, EEH related, device-tree part and hotplug driver.
>>>
>>> The EEH reset change is included in PATCH[37/45]. There is no point
>>> to reorder the patches.
>>
>> I don't understand all of the dependencies but if possible splitting the
>> series up into a set of smaller self-contained patch series makes things
>> easier to review and may make it easier for you to get this functionality
>> reviewed and accepted into upstream.
>>
>
> Thanks, Alistair. I will move those cleanup/refactor related patches
> to form a separate series which is expected to be merged first. That
> will help the reviewers to focus on the patches with complicated
> changes as you suggested. Alexey, please let me know whether that is
> what you would like to see.

I do not know yet, I have not finished reviewing this version. Maybe the
EEH reset patch depends on 1/45..36/45; or it only makes sense when 45/45
is applied - this is all unclear.

If 37/45 has no dependencies and is good just by itself, you could have
posted it separately a few months ago. It would have reached upstream by
now, this patchset would be at least one patch shorter, and you would not
have to rebase all 45 patches over and over again on top of the current
upstream tree...



-- 
Alexey


* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-14  3:26         ` Alexey Kardashevskiy
@ 2016-04-14  5:25           ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-14  5:25 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Thu, Apr 14, 2016 at 01:26:51PM +1000, Alexey Kardashevskiy wrote:

.../...

>>
>>Do you mean physically pull the adapter out and insert the same
>>adapter back? What's the point for the test case?
>
>
>Because this is what the patchset is for - to replace a physical device on a
>physical machine. Powering on/off the slots via sysfs is just an
>approximation (which is fine when you are debugging), something can go wrong
>and require some work but you do not know it for sure.
>

Yes, it's absolutely worth covering with the test cases, though case (2)
covers part of that. Anyway, I'll test it in the next revision. Thanks
for your review.

>
>
>-- 
>Alexey
>


* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-02-17  3:44     ` Gavin Shan
  (?)
@ 2016-04-15  0:47     ` Alistair Popple
  2016-04-15  1:39       ` Gavin Shan
  -1 siblings, 1 reply; 174+ messages in thread
From: Alistair Popple @ 2016-04-15  0:47 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Gavin Shan, devicetree, aik, linux-pci, grant.likely,
	robherring2, bhelgaas, dja

Hi Gavin,

I was reading through this to understand how it all works and noticed a couple
of things, comments below.

- Alistair

On Wed, 17 Feb 2016 14:44:28 Gavin Shan wrote:

<snip>

> +
> +static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
> +{
> +	void *fdt, *fdt1, *dt;
> +	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +	int ret;
> +
> +	/* We don't know the FDT blob size. We try to get it through
> +	 * maximal memory chunk and then copy it to another chunk that
> +	 * fits the real size.
> +	 */
> +	fdt1 = kzalloc(0x10000, GFP_KERNEL);
> +	if (!fdt1)
> +		goto error;
> +
> +	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
> +	if (ret)
> +		goto free_fdt1;
> +
> +	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
> +	if (!fdt)
> +		goto free_fdt1;
> +
> +	/* Unflatten device tree blob */
> +	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
> +	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
> +	if (!dt) {
> +		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
> +		goto free_fdt;
> +	}
> +
> +	/* Initialize and apply the changeset */
> +	of_changeset_init(&php_slot->ocs);
> +	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
> +			 ret);
> +		goto free_dt;
> +	}
> +
> +	php_slot->dn->child = NULL;
> +	ret = of_changeset_apply(&php_slot->ocs);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
> +			 ret);
> +		goto destroy_changeset;
> +	}
> +
> +	/* Add device node firmware data */
> +	pnv_php_add_pdns(php_slot);
> +	php_slot->fdt = fdt;
> +	php_slot->dt  = dt;
> +	goto out;

Doesn't this leak memory from fdt1? I can't see where it gets freed in this
case.
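
For illustration, a user-space sketch of the same two-buffer pattern in
which the scratch buffer is released on every path, including success. It
uses calloc()/free() in place of kzalloc()/kfree(); struct slot_model and
slot_load_fdt() are hypothetical names, and the first memcpy() only stands
in for the firmware call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 0x10000	/* same 64KB upper bound as the quoted code */

/* Hypothetical stand-in for the slot: keeps only the right-sized copy. */
struct slot_model {
	void *fdt;
	size_t fdt_size;
};

/*
 * Copy 'real_size' bytes of 'src' through a maximal scratch buffer into a
 * right-sized buffer. Returns 0 on success, -1 on failure.
 */
int slot_load_fdt(struct slot_model *slot, const void *src, size_t real_size)
{
	void *scratch, *fdt;
	int ret = -1;

	if (real_size == 0 || real_size > SCRATCH_SIZE)
		return -1;

	scratch = calloc(1, SCRATCH_SIZE);
	if (!scratch)
		return -1;
	memcpy(scratch, src, real_size);	/* models the firmware call */

	fdt = calloc(1, real_size);
	if (!fdt)
		goto free_scratch;
	memcpy(fdt, scratch, real_size);

	slot->fdt = fdt;
	slot->fdt_size = real_size;
	ret = 0;
	/* Fall through: the scratch buffer is not needed after the copy. */
free_scratch:
	free(scratch);
	return ret;
}
```

Letting the success path fall through the free_scratch label keeps a single
release point for the scratch buffer, so no separate kfree() is needed
before the final "goto out".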

> +destroy_changeset:
> +	of_changeset_destroy(&php_slot->ocs);
> +free_dt:
> +	kfree(dt);
> +	php_slot->dn->child = NULL;
> +free_fdt:
> +	kfree(fdt);
> +free_fdt1:
> +	kfree(fdt1);
> +error:
> +	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
> +out:
> +	/* Confirm status change */
> +	php_slot->power_state_confirmed = confirm;
> +	wake_up_interruptible(&php_slot->queue);
> +}
> +

<snip>

> +
> +static void __exit pnv_php_exit(void)
> +{
> +	struct device_node *dn;
> +
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		pnv_php_unregister(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		pnv_php_unregister(dn);
> +
> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);

Do you flush the workqueues anywhere? Usually you would stop work being queued 
and call something like flush_workqueue() to ensure no work is still
running/queued before unloading the module.
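
The ordering I have in mind can be modelled in user space as below. This is
a single-threaded sketch and not the kernel workqueue API; every name in it
is hypothetical. In the driver the first step would correspond to
unregistering the notifier so nothing new is queued, and the second to
flush_workqueue() or destroy_workqueue() on whichever queue the driver
uses.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WORK 8

/* Hypothetical single-threaded model of a work queue; not the kernel API. */
struct work_model {
	int pending[MAX_WORK];
	int count;		/* items queued but not yet handled */
	int handled;		/* items already processed */
	bool accepting;		/* false once shutdown has started */
};

void work_model_init(struct work_model *wq)
{
	wq->count = 0;
	wq->handled = 0;
	wq->accepting = true;
}

/* Queue an event; refused once shutdown has started or the queue is full. */
bool work_model_queue(struct work_model *wq, int event)
{
	if (!wq->accepting || wq->count == MAX_WORK)
		return false;
	wq->pending[wq->count++] = event;
	return true;
}

/* Process everything that is still pending. */
void work_model_flush(struct work_model *wq)
{
	int i;

	for (i = 0; i < wq->count; i++)
		wq->handled++;	/* a real handler would act on pending[i] */
	wq->count = 0;
}

/* Module-exit ordering: stop new work first, then drain what is left. */
void work_model_exit(struct work_model *wq)
{
	wq->accepting = false;
	work_model_flush(wq);
}
```

Doing it in the opposite order would leave a window in which an event
arrives after the flush and runs after the module text is gone.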

- Alistair

> +}
> +
> +module_init(pnv_php_init);
> +module_exit(pnv_php_exit);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> 


* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-04-15  0:47     ` Alistair Popple
@ 2016-04-15  1:39       ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-15  1:39 UTC (permalink / raw)
  To: Alistair Popple
  Cc: linuxppc-dev, Gavin Shan, devicetree, aik, linux-pci,
	grant.likely, robherring2, bhelgaas, dja

On Fri, Apr 15, 2016 at 10:47:52AM +1000, Alistair Popple wrote:
>Hi Gavin,
>
>I was reading through this to understand how it all works and noticed a couple
>of things, comments below.
>

Alistair, thanks for taking the time to review.

>
>On Wed, 17 Feb 2016 14:44:28 Gavin Shan wrote:
>
><snip>
>
>> +
>> +static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
>> +{
>> +	void *fdt, *fdt1, *dt;
>> +	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>> +	int ret;
>> +
>> +	/* We don't know the FDT blob size. We try to get it through
>> +	 * maximal memory chunk and then copy it to another chunk that
>> +	 * fits the real size.
>> +	 */
>> +	fdt1 = kzalloc(0x10000, GFP_KERNEL);
>> +	if (!fdt1)
>> +		goto error;
>> +
>> +	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
>> +	if (ret)
>> +		goto free_fdt1;
>> +
>> +	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
>> +	if (!fdt)
>> +		goto free_fdt1;
>> +
>> +	/* Unflatten device tree blob */
>> +	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
>> +	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
>> +	if (!dt) {
>> +		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
>> +		goto free_fdt;
>> +	}
>> +
>> +	/* Initialize and apply the changeset */
>> +	of_changeset_init(&php_slot->ocs);
>> +	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
>> +	if (ret) {
>> +		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
>> +			 ret);
>> +		goto free_dt;
>> +	}
>> +
>> +	php_slot->dn->child = NULL;
>> +	ret = of_changeset_apply(&php_slot->ocs);
>> +	if (ret) {
>> +		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
>> +			 ret);
>> +		goto destroy_changeset;
>> +	}
>> +
>> +	/* Add device node firmware data */
>> +	pnv_php_add_pdns(php_slot);
>> +	php_slot->fdt = fdt;
>> +	php_slot->dt  = dt;
>> +	goto out;
>
>Doesn't this leak memory from fdt1? I can't see where it gets freed in this
>case.
>

You're right that @fdt1 should be released here. I'll fix it in the
next revision.

>> +destroy_changeset:
>> +	of_changeset_destroy(&php_slot->ocs);
>> +free_dt:
>> +	kfree(dt);
>> +	php_slot->dn->child = NULL;
>> +free_fdt:
>> +	kfree(fdt);
>> +free_fdt1:
>> +	kfree(fdt1);
>> +error:
>> +	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
>> +out:
>> +	/* Confirm status change */
>> +	php_slot->power_state_confirmed = confirm;
>> +	wake_up_interruptible(&php_slot->queue);
>> +}
>> +
>
><snip>
>
>> +
>> +static void __exit pnv_php_exit(void)
>> +{
>> +	struct device_node *dn;
>> +
>> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>> +		pnv_php_unregister(dn);
>> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>> +		pnv_php_unregister(dn);
>> +
>> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
>
>Do you flush the workqueues anywhere? Usually you would stop work being queued 
>and call something like flush_workqueue() to ensure no work is still
>running/queued before unloading the module.
>

Good question. Yeah, I'll flush the workqueue before the module is
unloaded.

Thanks,
Gavin

>- Alistair
>
>> +}
>> +
>> +module_init(pnv_php_init);
>> +module_exit(pnv_php_exit);
>> +
>> +MODULE_VERSION(DRIVER_VERSION);
>> +MODULE_LICENSE("GPL v2");
>> +MODULE_AUTHOR(DRIVER_AUTHOR);
>> +MODULE_DESCRIPTION(DRIVER_DESC);
>> 
>


* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-14  1:30           ` Gavin Shan
  2016-04-14  3:38             ` Alexey Kardashevskiy
@ 2016-04-15 16:10             ` Rob Herring
  2016-04-20  2:40               ` Gavin Shan
  1 sibling, 1 reply; 174+ messages in thread
From: Rob Herring @ 2016-04-15 16:10 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Alistair Popple, linuxppc-dev, Alexey Kardashevskiy, devicetree,
	Grant Likely, linux-pci, Bjorn Helgaas, dja

On Wed, Apr 13, 2016 at 8:30 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> On Thu, Apr 14, 2016 at 09:57:32AM +1000, Alistair Popple wrote:
>>Hi Gavin,
>>
>><snip>
>>
>>> >Why exactly cannot EEH reset changes go to a smaller separate patchset
>>> >(before hotplug)?
>>> >
>>>
>>> As I explained before, the patchset's order is: PCI generic part,
>>> PowerNV PCI related, EEH related, device-tree part and hotplug driver.
>>>
>>> The EEH reset change is included in PATCH[37/45]. There is no point
>>> to reorder the patches.
>>
>>I don't understand all of the dependencies but if possible splitting the
>>series up into a set of smaller self-contained patch series makes things
>>easier to review and may make it easier for you to get this functionality
>>reviewed and accepted into upstream.
>>
>
> Thanks, Alistair. I will move those cleanup/refactor related patches
> to form a separate series which is expected to be merged first. That
> will help the reviewers to focus on the patches with complicated
> changes as you suggested. Alexey, please let me know whether that is
> what you would like to see.

As I said last cycle, I'll happily take the DT refactoring patches
separately, but you have to tell me if you want me to apply them and
it has to be well before the merge window.

Rob


* Re: [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track
  2016-02-17  3:44     ` Gavin Shan
  (?)
@ 2016-04-19  1:50     ` Alexey Kardashevskiy
  2016-04-20  0:49       ` Gavin Shan
  -1 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  1:50 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> In the current implementation, the DMA32 segments required by one specific
> PE aren't calculated from the information held in the PE independently.
> This conflicts with the PCI hotplug design: PE centralized, meaning the
> PE's DMA32 segments should be calculated from the information held in
> the PE independently.
>
> This introduces an array (@dma32_segmap) for every PHB to track the
> DMA32 segment usage. Besides, this moves the logic calculating PE's
> consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's
> DMA32 segments are calculated/allocated from the information held in
> the PE (DMA32 weight). Also the logic is improved: we try to allocate
> as many DMA32 segments as we can. It's acceptable that fewer DMA32
> segments than the expected number are allocated.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


This DMA segments business was the reason why I have not even tried 
implementing DDW for POWER7 - it is way too different from POWER8 and there 
is no chance that anyone outside Ozlabs will ever try using this in 
practice; the same applies to PCI hotplug on POWER7.

I am suggesting ditching all IODA1 changes from this patchset, as this code
will hang around (unused) for maybe a year or so and then will be gone, like
p5ioc2.



> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 111 +++++++++++++++++-------------
>   arch/powerpc/platforms/powernv/pci.h      |   7 +-
>   2 files changed, 66 insertions(+), 52 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 0fc2309..59782fba 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2007,20 +2007,54 @@ static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
>   }
>
>   static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
> -				       struct pnv_ioda_pe *pe,
> -				       unsigned int base,
> -				       unsigned int segs)
> +				       struct pnv_ioda_pe *pe)
>   {
>
>   	struct page *tce_mem = NULL;
>   	struct iommu_table *tbl;
> -	unsigned int tce32_segsz, i;
> +	unsigned int weight, total_weight;
> +	unsigned int tce32_segsz, base, segs, i;
>   	int64_t rc;
>   	void *addr;
>
>   	/* XXX FIXME: Handle 64-bit only DMA devices */
>   	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>   	/* XXX FIXME: Allocate multi-level tables on PHB3 */
> +	total_weight = pnv_pci_ioda_total_dma_weight(phb);
> +	weight = pnv_pci_ioda_pe_dma_weight(pe);
> +
> +	segs = (weight * phb->ioda.dma32_count) / total_weight;
> +	if (!segs)
> +		segs = 1;
> +
> +	/*
> +	 * Allocate contiguous DMA32 segments. We begin with the expected
> +	 * number of segments. With one more attempt, the number of DMA32
> +	 * segments to be allocated is decreased by one until one segment
> +	 * is allocated successfully.
> +	 */
> +	while (segs) {
> +		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
> +			for (i = base; i < base + segs; i++) {
> +				if (phb->ioda.dma32_segmap[i] !=
> +				    IODA_INVALID_PE)
> +					break;
> +			}
> +
> +			if (i >= base + segs)
> +				break;
> +		}
> +
> +		if (i >= base + segs)
> +			break;
> +
> +		segs--;
> +	}
> +
> +	if (!segs) {
> +		pe_warn(pe, "No available DMA32 segments\n");
> +		return;
> +	}
>
>   	tbl = pnv_pci_table_alloc(phb->hose->node);
>   	iommu_register_group(&pe->table_group, phb->hose->global_number,
> @@ -2028,6 +2062,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>
>   	/* Grab a 32-bit TCE table */
> +	pe_info(pe, "DMA weight %d (%d), assigned (%d) %d DMA32 segments\n",
> +		weight, total_weight, base, segs);
>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>   		base * PNV_IODA1_DMA32_SEGSIZE,
>   		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
> @@ -2064,6 +2100,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   		}
>   	}
>
> +	/* Setup DMA32 segment mapping */
> +	for (i = base; i < base + segs; i++)
> +		phb->ioda.dma32_segmap[i] = pe->pe_number;
> +
>   	/* Setup linux iommu table */
>   	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>   				  base * PNV_IODA1_DMA32_SEGSIZE,
> @@ -2538,70 +2578,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   {
>   	struct pci_controller *hose = phb->hose;
> -	unsigned int weight, total_weight, dma_pe_count;
> -	unsigned int residual, remaining, segs, base;
>   	struct pnv_ioda_pe *pe;
> -
> -	total_weight = pnv_pci_ioda_total_dma_weight(phb);
> -	dma_pe_count = 0;
> -	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> -		weight = pnv_pci_ioda_pe_dma_weight(pe);
> -		if (weight > 0)
> -			dma_pe_count++;
> -	}
> +	unsigned int weight;
>
>   	/* If we have more PE# than segments available, hand out one
>   	 * per PE until we run out and let the rest fail. If not,
>   	 * then we assign at least one segment per PE, plus more based
>   	 * on the amount of devices under that PE
>   	 */
> -	if (dma_pe_count > phb->ioda.tce32_count)
> -		residual = 0;
> -	else
> -		residual = phb->ioda.tce32_count - dma_pe_count;
> -
>   	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
> -		hose->global_number, phb->ioda.tce32_count);
> -	pr_info("PCI: %d PE# for a total weight of %d\n",
> -		dma_pe_count, total_weight);
> +		hose->global_number, phb->ioda.dma32_count);
>
>   	pnv_pci_ioda_setup_opal_tce_kill(phb);
>
> -	/* Walk our PE list and configure their DMA segments, hand them
> -	 * out one base segment plus any residual segments based on
> -	 * weight
> -	 */
> -	remaining = phb->ioda.tce32_count;
> -	base = 0;
> +	/* Walk our PE list and configure their DMA segments */
>   	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>   		weight = pnv_pci_ioda_pe_dma_weight(pe);
>   		if (!weight)
>   			continue;
>
> -		if (!remaining) {
> -			pe_warn(pe, "No DMA32 resources available\n");
> -			continue;
> -		}
> -		segs = 1;
> -		if (residual) {
> -			segs += ((weight * residual) + (total_weight / 2)) /
> -				total_weight;
> -			if (segs > remaining)
> -				segs = remaining;
> -		}
> -
>   		/*
>   		 * For IODA2 compliant PHB3, we needn't care about the weight.
>   		 * The all available 32-bits DMA space will be assigned to
>   		 * the specific PE.
>   		 */
>   		if (phb->type == PNV_PHB_IODA1) {
> -			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
> -				weight, segs);
> -			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
> +			pnv_pci_ioda1_setup_dma_pe(phb, pe);
>   		} else if (phb->type == PNV_PHB_IODA2) {
>   			pe_info(pe, "Assign DMA32 space\n");
> -			segs = 0;
>   			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>   		} else if (phb->type == PNV_PHB_NPU) {
>   			/*
> @@ -2611,9 +2615,6 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   			 * as the PHB3 TVT.
>   			 */
>   		}
> -
> -		remaining -= segs;
> -		base += segs;
>   	}
>   }
>
> @@ -3313,7 +3314,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   {
>   	struct pci_controller *hose;
>   	struct pnv_phb *phb;
> -	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
> +	unsigned long size, m64map_off, m32map_off, pemap_off;
> +	unsigned long iomap_off = 0, dma32map_off = 0;
>   	const __be64 *prop64;
>   	const __be32 *prop32;
>   	int i, len;
> @@ -3398,6 +3400,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
>   	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
>
> +	/* Calculate how many 32-bit TCE segments we have */
> +	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
> +				PNV_IODA1_DMA32_SEGSIZE;
> +
>   	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>   	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>   	m64map_off = size;
> @@ -3407,6 +3413,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	if (phb->type == PNV_PHB_IODA1) {
>   		iomap_off = size;
>   		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
> +		dma32map_off = size;
> +		size += phb->ioda.dma32_count *
> +			sizeof(phb->ioda.dma32_segmap[0]);
>   	}
>   	pemap_off = size;
>   	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
> @@ -3422,6 +3431,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   		phb->ioda.io_segmap = aux + iomap_off;
>   		for (i = 0; i < phb->ioda.total_pe_num; i++)
>   			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
> +
> +		phb->ioda.dma32_segmap = aux + dma32map_off;
> +		for (i = 0; i < phb->ioda.dma32_count; i++)
> +			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>   	}
>   	phb->ioda.pe_array = aux + pemap_off;
>   	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
> @@ -3430,7 +3443,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	mutex_init(&phb->ioda.pe_list_mutex);
>
>   	/* Calculate how many 32-bit TCE segments we have */
> -	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
> +	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
>   				PNV_IODA1_DMA32_SEGSIZE;
>
>   #if 0 /* We should really do that ... */
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index e90bcbe..350e630 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -146,6 +146,10 @@ struct pnv_phb {
>   		int			*m32_segmap;
>   		int			*io_segmap;
>
> +		/* DMA32 segment maps - IODA1 only */
> +		unsigned long		dma32_count;
> +		int			*dma32_segmap;
> +
>   		/* IRQ chip */
>   		int			irq_chip_init;
>   		struct irq_chip		irq_chip;
> @@ -162,9 +166,6 @@ struct pnv_phb {
>   		 */
>   		unsigned char		pe_rmap[0x10000];
>
> -		/* 32-bit TCE tables allocation */
> -		unsigned long		tce32_count;
> -
>   		/* TCE cache invalidate registers (physical and
>   		 * remapped)
>   		 */
>
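
For reference, the weight-based segment count and the shrinking first-fit
search from the quoted hunk can be modelled in user space roughly as
follows. This is a sketch under assumptions: INVALID_PE stands in for
IODA_INVALID_PE, total_weight is assumed to be nonzero, the helper names are
hypothetical, and the segment map is claimed immediately instead of after
the hardware has been configured.

```c
#include <assert.h>

#define INVALID_PE (-1)	/* assumed sentinel, standing in for IODA_INVALID_PE */

/* Expected segment count for a PE: its weight share, but at least one. */
unsigned int dma32_expected_segs(unsigned int weight, unsigned int total_weight,
				 unsigned int seg_count)
{
	unsigned int segs = (weight * seg_count) / total_weight;

	return segs ? segs : 1;
}

/*
 * First-fit search for 'want' contiguous free segments, shrinking the
 * request by one after each failed pass. On success the run is claimed
 * for 'pe', its first index is stored in *base and the run length is
 * returned; 0 means no free segment was found.
 */
unsigned int dma32_alloc_segs(int *segmap, unsigned int seg_count,
			      unsigned int want, int pe, unsigned int *base)
{
	unsigned int segs, start, i;

	if (want > seg_count)
		want = seg_count;

	for (segs = want; segs > 0; segs--) {
		for (start = 0; start + segs <= seg_count; start++) {
			for (i = start; i < start + segs; i++)
				if (segmap[i] != INVALID_PE)
					break;
			if (i < start + segs)
				continue;	/* run interrupted, try next start */

			for (i = start; i < start + segs; i++)
				segmap[i] = pe;
			*base = start;
			return segs;
		}
	}
	return 0;
}
```

With 8 segments of which indexes 2 and 5 are taken, a request for 4 segments
falls back to 2 and lands at base 0; the next request for 2 lands at base 3.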


-- 
Alexey


* Re: [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity
  2016-02-17  3:44 ` [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity Gavin Shan
@ 2016-04-19  2:02   ` Alexey Kardashevskiy
  2016-04-20  0:52     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  2:02 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> Each PHB maintains an array helping to translate the 2-byte Request
> ID (RID) to PE# with the assumption that PE# takes one byte, meaning
> that we can't have more than 256 PEs. However, pci_dn->pe_number
> already has 4 bytes for the PE#.
>
> This extends the PE# capacity for every PHB. After that, the PE number
> is represented by a 4-byte value. Then we can reuse IODA_INVALID_PE to
> check whether the PE# in phb->pe_rmap[] is valid or not.


This should be merged into "[PATCH v8 21/45] powerpc/powernv: Create PEs at
PCI hot plugging time" as it does not make sense alone: this patch does the
initialization, but the default value is only checked 3 patches later,
which makes it hard to review.
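
To show what the wider entry type buys, here is a user-space sketch of a
RID to PE# reverse map with an explicit invalid value. It assumes the
sentinel is negative (INVALID_PE stands in for IODA_INVALID_PE), and the
helper names are hypothetical.

```c
#include <assert.h>

#define INVALID_PE (-1)		/* assumed sentinel, standing in for IODA_INVALID_PE */
#define RID_COUNT  0x10000	/* one slot per 16-bit {bus, devfn} */

/* int entries: room for PE numbers above 255 and for the sentinel. */
static int pe_rmap[RID_COUNT];

/* Build the 16-bit Request ID from a bus number and a devfn. */
unsigned int rid_of(unsigned int bus, unsigned int devfn)
{
	return ((bus & 0xff) << 8) | (devfn & 0xff);
}

/* Mark every RID as unmapped, as the init hunk does. */
void pe_rmap_invalidate_all(void)
{
	unsigned int i;

	for (i = 0; i < RID_COUNT; i++)
		pe_rmap[i] = INVALID_PE;
}

void pe_rmap_set(unsigned int rid, int pe)
{
	pe_rmap[rid & 0xffff] = pe;
}

int pe_rmap_lookup(unsigned int rid)
{
	return pe_rmap[rid & 0xffff];
}
```

With the old unsigned char entries, clearing an entry to 0 could not be
told apart from a mapping to PE#0, and a PE number such as 300 would not
fit at all.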



> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Reviewed-by: Daniel Axtens <dja@axtens.net>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++-
>   arch/powerpc/platforms/powernv/pci.h      | 7 ++-----
>   2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 59782fba..7800897 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -757,7 +757,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>
>   	/* Clear the reverse map */
>   	for (rid = pe->rid; rid < rid_end; rid++)
> -		phb->ioda.pe_rmap[rid] = 0;
> +		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>
>   	/* Release from all parents PELT-V */
>   	while (parent) {
> @@ -3387,6 +3387,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	if (prop32)
>   		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>
> +	/* Invalidate RID to PE# mapping */
> +	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
> +		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
> +
>   	/* Parse 64-bit MMIO range */
>   	pnv_ioda_parse_m64_window(phb);
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 350e630..928cf81 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -160,11 +160,8 @@ struct pnv_phb {
>   		struct list_head	pe_list;
>   		struct mutex            pe_list_mutex;
>
> -		/* Reverse map of PEs, will have to extend if
> -		 * we are to support more than 256 PEs, indexed
> -		 * bus { bus, devfn }
> -		 */
> -		unsigned char		pe_rmap[0x10000];
> +		/* Reverse map of PEs, indexed by {bus, devfn} */
> +		int			pe_rmap[0x10000];
>
>   		/* TCE cache invalidate registers (physical and
>   		 * remapped)
>


-- 
Alexey


* Re: [PATCH v8 19/45] powerpc/powernv: Use PE instead of number during setup and release
  2016-02-17  3:44 ` [PATCH v8 19/45] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
@ 2016-04-19  2:50   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  2:50 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> In the current implementation, the PEs that are allocated or picked
> from the reserved list are identified by PE number. The PE instance
> has to be picked according to the PE number eventually. We have the
> same issue when a PE is released.
>
> pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe() now return the
> PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
> or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
> returns the reserved/allocated PE instance to be used in subsequent
> patches. On the other hand, pnv_ioda_free_pe() takes the PE instance
> (not the number) as its argument. No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
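
A user-space sketch of the pointer-based allocate/free pair, for
illustration only. The struct and function names are hypothetical and a
byte array models the pe_alloc bitmap. The free helper reads the PE number
before zeroing the structure, since pe->pe_number reads as 0 once the
memset() has run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOTAL_PE 8	/* hypothetical small PHB */

struct phb_model;

struct pe_model {
	struct phb_model *phb;
	int pe_number;
};

struct phb_model {
	unsigned char pe_alloc[TOTAL_PE];	/* 1 = in use; models the bitmap */
	struct pe_model pe_array[TOTAL_PE];
};

/* Return the first free PE, initialised, or NULL when none is left. */
struct pe_model *pe_model_alloc(struct phb_model *phb)
{
	int i;

	for (i = 0; i < TOTAL_PE; i++) {
		if (phb->pe_alloc[i])
			continue;
		phb->pe_alloc[i] = 1;
		phb->pe_array[i].phb = phb;
		phb->pe_array[i].pe_number = i;
		return &phb->pe_array[i];
	}
	return NULL;
}

/* Release a PE by instance; no PHB or number argument is needed. */
void pe_model_free(struct pe_model *pe)
{
	struct phb_model *phb = pe->phb;
	int pe_no = pe->pe_number;	/* save before the memset clears it */

	memset(pe, 0, sizeof(*pe));
	phb->pe_alloc[pe_no] = 0;
}
```

Callers test for NULL instead of comparing against IODA_INVALID_PE, which
is the same change the patch makes for pnv_ioda_pick_m64_pe() and
pnv_ioda_alloc_pe().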


> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 104 +++++++++++++++++-------------
>   arch/powerpc/platforms/powernv/pci.h      |   2 +-
>   2 files changed, 59 insertions(+), 47 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 7800897..f182ca7 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -119,6 +119,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>   }
>
> +static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
> +{
> +	phb->ioda.pe_array[pe_no].phb = phb;
> +	phb->ioda.pe_array[pe_no].pe_number = pe_no;
> +
> +	return &phb->ioda.pe_array[pe_no];
> +}
> +
>   static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>   {
>   	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
> @@ -131,11 +139,10 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>   		pr_debug("%s: PE %d was reserved on PHB#%x\n",
>   			 __func__, pe_no, phb->hose->global_number);
>
> -	phb->ioda.pe_array[pe_no].phb = phb;
> -	phb->ioda.pe_array[pe_no].pe_number = pe_no;
> +	pnv_ioda_init_pe(phb, pe_no);
>   }
>
> -static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
> +static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   {
>   	unsigned long pe;
>
> @@ -143,20 +150,20 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>   					phb->ioda.total_pe_num, 0);
>   		if (pe >= phb->ioda.total_pe_num)
> -			return IODA_INVALID_PE;
> +			return NULL;
>   	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>
> -	phb->ioda.pe_array[pe].phb = phb;
> -	phb->ioda.pe_array[pe].pe_number = pe;
> -	return pe;
> +	return pnv_ioda_init_pe(phb, pe);
>   }
>
> -static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
> +static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>   {
> -	WARN_ON(phb->ioda.pe_array[pe].pdev);
> +	struct pnv_phb *phb = pe->phb;
>
> -	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
> -	clear_bit(pe, phb->ioda.pe_alloc);
> +	WARN_ON(pe->pdev);
> +
> +	memset(pe, 0, sizeof(struct pnv_ioda_pe));
> +	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
>   }
>
>   /* The default M64 BAR is shared by all PEs */
> @@ -316,7 +323,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>   	}
>   }
>
> -static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
> +static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> @@ -326,7 +333,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>
>   	/* Root bus shouldn't use M64 */
>   	if (pci_is_root_bus(bus))
> -		return IODA_INVALID_PE;
> +		return NULL;
>
>   	/* Allocate bitmap */
>   	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
> @@ -334,7 +341,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	if (!pe_alloc) {
>   		pr_warn("%s: Out of memory !\n",
>   			__func__);
> -		return IODA_INVALID_PE;
> +		return NULL;
>   	}
>
>   	/* Figure out reserved PE numbers by the PE */
> @@ -347,7 +354,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	 */
>   	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>   		kfree(pe_alloc);
> -		return IODA_INVALID_PE;
> +		return NULL;
>   	}
>
>   	/*
> @@ -393,7 +400,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	}
>
>   	kfree(pe_alloc);
> -	return master_pe->pe_number;
> +	return master_pe;
>   }
>
>   static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
> @@ -959,7 +966,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   	struct pnv_phb *phb = hose->private_data;
>   	struct pci_dn *pdn = pci_get_pdn(dev);
>   	struct pnv_ioda_pe *pe;
> -	int pe_num;
>
>   	if (!pdn) {
>   		pr_err("%s: Device tree node not associated properly\n",
> @@ -969,8 +975,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   	if (pdn->pe_number != IODA_INVALID_PE)
>   		return NULL;
>
> -	pe_num = pnv_ioda_alloc_pe(phb);
> -	if (pe_num == IODA_INVALID_PE) {
> +	pe = pnv_ioda_alloc_pe(phb);
> +	if (!pe) {
>   		pr_warning("%s: Not enough PE# available, disabling device\n",
>   			   pci_name(dev));
>   		return NULL;
> @@ -983,10 +989,9 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   	 *
>   	 * At some point we want to remove the PDN completely anyways
>   	 */
> -	pe = &phb->ioda.pe_array[pe_num];
>   	pci_dev_get(dev);
>   	pdn->pcidev = dev;
> -	pdn->pe_number = pe_num;
> +	pdn->pe_number = pe->pe_number;
>   	pe->flags = PNV_IODA_PE_DEV;
>   	pe->pdev = dev;
>   	pe->pbus = NULL;
> @@ -997,8 +1002,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>
>   	if (pnv_ioda_configure_pe(phb, pe)) {
>   		/* XXX What do we do here ? */
> -		if (pe_num)
> -			pnv_ioda_free_pe(phb, pe_num);
> +		pnv_ioda_free_pe(pe);
>   		pdn->pe_number = IODA_INVALID_PE;
>   		pe->pdev = NULL;
>   		pci_dev_put(dev);
> @@ -1033,28 +1037,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>    * subordinate PCI devices and buses. The second type of PE is normally
>    * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
>    */
> -static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
> +static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> -	struct pnv_ioda_pe *pe;
> -	int pe_num = IODA_INVALID_PE;
> +	struct pnv_ioda_pe *pe = NULL;
>
>   	/* Check if PE is determined by M64 */
>   	if (phb->pick_m64_pe)
> -		pe_num = phb->pick_m64_pe(bus, all);
> +		pe = phb->pick_m64_pe(bus, all);
>
>   	/* The PE number isn't pinned by M64 */
> -	if (pe_num == IODA_INVALID_PE)
> -		pe_num = pnv_ioda_alloc_pe(phb);
> +	if (!pe)
> +		pe = pnv_ioda_alloc_pe(phb);
>
> -	if (pe_num == IODA_INVALID_PE) {
> +	if (!pe) {
>   		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
>   			__func__, pci_domain_nr(bus), bus->number);
> -		return;
> +		return NULL;
>   	}
>
> -	pe = &phb->ioda.pe_array[pe_num];
>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>   	pe->pbus = bus;
>   	pe->pdev = NULL;
> @@ -1063,17 +1065,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	if (all)
>   		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
> -			bus->busn_res.start, bus->busn_res.end, pe_num);
> +			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
>   	else
>   		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
> -			bus->busn_res.start, pe_num);
> +			bus->busn_res.start, pe->pe_number);
>
>   	if (pnv_ioda_configure_pe(phb, pe)) {
>   		/* XXX What do we do here ? */
> -		if (pe_num)
> -			pnv_ioda_free_pe(phb, pe_num);
> +		pnv_ioda_free_pe(pe);
>   		pe->pbus = NULL;
> -		return;
> +		return NULL;
>   	}
>
>   	/* Associate it with all child devices */
> @@ -1081,6 +1082,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	/* Put PE to the list */
>   	list_add_tail(&pe->list, &phb->ioda.pe_list);
> +
> +	return pe;
>   }
>
>   static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> @@ -1392,7 +1395,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>
>   		pnv_ioda_deconfigure_pe(phb, pe);
>
> -		pnv_ioda_free_pe(phb, pe->pe_number);
> +		pnv_ioda_free_pe(pe);
>   	}
>   }
>
> @@ -1401,6 +1404,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>   	struct pci_bus        *bus;
>   	struct pci_controller *hose;
>   	struct pnv_phb        *phb;
> +	struct pnv_ioda_pe    *pe;
>   	struct pci_dn         *pdn;
>   	struct pci_sriov      *iov;
>   	u16                    num_vfs, i;
> @@ -1425,8 +1429,11 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>   		/* Release PE numbers */
>   		if (pdn->m64_single_mode) {
>   			for (i = 0; i < num_vfs; i++) {
> -				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
> -					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
> +				if (pdn->pe_num_map[i] == IODA_INVALID_PE)
> +					continue;
> +
> +				pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
> +				pnv_ioda_free_pe(pe);
>   			}
>   		} else
>   			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
> @@ -1479,8 +1486,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>
>   		if (pnv_ioda_configure_pe(phb, pe)) {
>   			/* XXX What do we do here ? */
> -			if (pe_num)
> -				pnv_ioda_free_pe(phb, pe_num);
> +			pnv_ioda_free_pe(pe);
>   			pe->pdev = NULL;
>   			continue;
>   		}
> @@ -1499,6 +1505,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   	struct pci_bus        *bus;
>   	struct pci_controller *hose;
>   	struct pnv_phb        *phb;
> +	struct pnv_ioda_pe    *pe;
>   	struct pci_dn         *pdn;
>   	int                    ret;
>   	u16                    i;
> @@ -1541,11 +1548,13 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   		/* Calculate available PE for required VFs */
>   		if (pdn->m64_single_mode) {
>   			for (i = 0; i < num_vfs; i++) {
> -				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
> -				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
> +				pe = pnv_ioda_alloc_pe(phb);
> +				if (!pe) {
>   					ret = -EBUSY;
>   					goto m64_failed;
>   				}
> +
> +				pdn->pe_num_map[i] = pe->pe_number;
>   			}
>   		} else {
>   			mutex_lock(&phb->ioda.pe_alloc_mutex);
> @@ -1590,8 +1599,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   m64_failed:
>   	if (pdn->m64_single_mode) {
>   		for (i = 0; i < num_vfs; i++) {
> -			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
> -				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
> +			if (pdn->pe_num_map[i] == IODA_INVALID_PE)
> +				continue;
> +
> +			pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
> +			pnv_ioda_free_pe(pe);
>   		}
>   	} else
>   		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 928cf81..ef9924a 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -109,7 +109,7 @@ struct pnv_phb {
>   	int (*init_m64)(struct pnv_phb *phb);
>   	void (*reserve_m64_pe)(struct pci_bus *bus,
>   			       unsigned long *pe_bitmap, bool all);
> -	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
> +	struct pnv_ioda_pe *(*pick_m64_pe)(struct pci_bus *bus, bool all);
>   	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
>   	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
>   	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
>
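
One detail in the new pnv_ioda_free_pe() may be worth double-checking: the
structure is zeroed by memset() before pe->pe_number is read for clear_bit(),
so the bit that ends up cleared would be bit 0 rather than the PE's own bit.
A minimal userspace sketch of the free-by-instance pattern with the number
saved first is below. The names (struct pe, struct phb, alloc_map, pe_alloc,
pe_free) are simplified stand-ins for illustration, not the kernel structures.

```c
#include <assert.h>
#include <string.h>

#define TOTAL_PE 8

struct phb;

struct pe {
	struct phb *phb;
	int pe_number;
	void *pdev;
};

struct phb {
	unsigned long alloc_map;        /* one bit per PE, 1 = in use */
	struct pe pe_array[TOTAL_PE];
};

/* Fill in the back-pointer and number, return the instance. */
static struct pe *pe_init(struct phb *phb, int pe_no)
{
	phb->pe_array[pe_no].phb = phb;
	phb->pe_array[pe_no].pe_number = pe_no;
	return &phb->pe_array[pe_no];
}

/* Return the lowest free PE instance, or NULL when none is left. */
static struct pe *pe_alloc(struct phb *phb)
{
	for (int i = 0; i < TOTAL_PE; i++) {
		if (!(phb->alloc_map & (1UL << i))) {
			phb->alloc_map |= 1UL << i;
			return pe_init(phb, i);
		}
	}
	return NULL;
}

/* Free by instance: both fields are read before the struct is zeroed. */
static void pe_free(struct pe *pe)
{
	struct phb *phb = pe->phb;
	int pe_no = pe->pe_number;

	assert(pe->pdev == NULL);
	memset(pe, 0, sizeof(*pe));
	phb->alloc_map &= ~(1UL << pe_no);
}
```

With this ordering, freeing PE#1 leaves PE#0 marked as in use and makes PE#1
available again.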


-- 
Alexey

* Re: [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order
  2016-02-17  3:44 ` [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order Gavin Shan
@ 2016-04-19  3:07   ` Alexey Kardashevskiy
  2016-04-20  1:04     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  3:07 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> PE number for one particular PE can be allocated dynamically or
> reserved according to the consumed M64 (64-bits prefetchable)
> segments of the PE. The M64 resources, and hence their segments
> and PE number are assigned/reserved in ascending order. The PE
> numbers are allocated dynamically in ascending order as well.
> It's not a problem as the PE numbers are reserved and then
> allocated all at once in fine order. However, it will introduce
> conflicts when PCI hotplug is supported: the PE number to be
> reserved for newly added PE might have been assigned.
>
> To resolve above conflicts, this forces the PE number to be
> allocated dynamically in reverse order. With this patch applied,
> the PE numbers are reserved in ascending order, but allocated
> dynamically in reverse order.


The patch itself is probably OK, but the commit log is not - I do not follow
it. Some PEs are reserved (for what? why does the absolute PE number matter?
Put that in the commit log), which means the corresponding bits in
pe_alloc[] should already be set, so when you allocate PEs for a just
plugged device you won't pick them, you will pick free ones, and the order
should not matter. I would think that "reservation" happens once at boot
time, so you set the "used" bits for the reserved PEs then, and after that
the dynamic allocator skips them.
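
To make the two models concrete, here is a small userspace sketch. One
reading of the commit log, which is an assumption here, is that a specific
PE number can also be reserved later, at hotplug time, and that such a
reservation fails if the dynamic allocator has already handed that number
out. Allocating from the top only makes that collision less likely; it does
not rule it out. The names (pe_map, pe_reserve, pe_alloc_reverse) are
hypothetical, and the loop index is deliberately signed, since with an
unsigned index the "pe >= 0" test could never end the loop.

```c
#include <assert.h>

#define TOTAL_PE 16

static unsigned long pe_map;    /* bit n set = PE#n reserved or allocated */

/*
 * Pin one specific PE number, e.g. one dictated by a segment index.
 * Returns 0 on success, -1 if the number is already taken.
 */
static int pe_reserve(int pe_no)
{
	assert(pe_no >= 0 && pe_no < TOTAL_PE);
	if (pe_map & (1UL << pe_no))
		return -1;
	pe_map |= 1UL << pe_no;
	return 0;
}

/* Hand out the highest free PE number, or -1 when the map is full. */
static int pe_alloc_reverse(void)
{
	for (int pe = TOTAL_PE - 1; pe >= 0; pe--) {
		if (!(pe_map & (1UL << pe))) {
			pe_map |= 1UL << pe;
			return pe;
		}
	}
	return -1;
}
```

In this sketch, reserving PE#0 and PE#1 at boot and then allocating twice
dynamically yields PE#15 and PE#14, so a later reservation of PE#2 still
succeeds, while a later reservation of PE#15 would not.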


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++--------
>   1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f182ca7..565725b 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -144,16 +144,14 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>
>   static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   {
> -	unsigned long pe;
> +	unsigned long pe = phb->ioda.total_pe_num - 1;
>
> -	do {
> -		pe = find_next_zero_bit(phb->ioda.pe_alloc,
> -					phb->ioda.total_pe_num, 0);
> -		if (pe >= phb->ioda.total_pe_num)
> -			return NULL;
> -	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
> +	for (pe = phb->ioda.total_pe_num - 1; pe >= 0; pe--) {
> +		if (!test_and_set_bit(pe, phb->ioda.pe_alloc))
> +			return pnv_ioda_init_pe(phb, pe);
> +	}
>
> -	return pnv_ioda_init_pe(phb, pe);
> +	return NULL;
>   }
>
>   static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>


-- 
Alexey

* Re: [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time
  2016-02-17  3:44 ` [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
@ 2016-04-19  4:16   ` Alexey Kardashevskiy
  2016-04-20  1:12     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  4:16 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> Currently, the PEs and their associated resources are assigned
> in ppc_md.pcibios_fixup() except those used by SRIOV VFs.

But this new code does not affect IOV, so VF PEs will still be created
somewhere other than pnv_pci_setup_bridge()?


> The
> function is called once after PCI probing and resource
> assignment are completed. So it isn't hotplug friendly.
>
> This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
> is called on the event during system bootup and PCI hotplug: updating
> PCI bridge's windows after resource assignment/reassignment are done.
> For partial hotplug case, where not all PCI devices belonging to the
> PE are unplugged and plugged again, we just need unbinding/binding
> the affected PCI devices with the corresponding PE without creating
> new one.
>
> As there is no upstream bridge for root bus that needs to be covered
> by PE, we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
> before any other PEs can be created, as PE for root bus is the ancestor
> to anyone else.

We did not need a root bus PE before? What is the other PE reserved for? 
Comments only say "reserved"...

>
> Also, the windows of root port or the upstream port of PCIe switch behind
> root port are extended to be PHB's apertures to accommodate the additional
> resources needed by newly plugged devices based on the fact: hotpluggable
> slot is behind root port or downstream port of the PCIe switch behind
> root port. The extension for those PCI bridges' windows is done in
> ppc_md.pcibios_setup_bridge() as well.


This patch seems to be doing way too many things, which makes it hard to
follow.

Could you please split the patch into smaller chunks? For example (you can
do it totally differently):
- move pnv_pci_ioda_setup_opal_tce_kill()
- move PE creation from pnv_pci_ioda_fixup() to pnv_pci_setup_bridge();
- add pnv_pci_fixup_bridge_resources()
- add an extra reserved PE for the root bus (and all this magic with 
root_pe_idx/root_pe_populated)
- ...




-- 
Alexey

* Re: [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table
  2016-02-17  3:44 ` [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table Gavin Shan
@ 2016-04-19  4:28   ` Alexey Kardashevskiy
  2016-04-20  1:15     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  4:28 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> pnv_pci_ioda_table_free_pages() can be reused to release the IODA1
> TCE table when releasing IODA1 PE in subsequent patches.
>
> This renames the following functions to support releasing IODA1 TCE
> table: pnv_pci_ioda2_table_free_pages() to pnv_pci_ioda_table_free_pages(),
> pnv_pci_ioda2_table_do_free_pages() to pnv_pci_ioda_table_do_free_pages().
> No logical changes introduced.

I can only see renaming here but it seems (from 
IODA_architecture_04-14-2008.pdf) that IODA1 does not support multi-level 
TCE tables in the way IODA2 does.
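
For reference, the renamed helper walks the table recursively, and with zero
indirect levels it reduces to a single free of the flat table, which is
presumably why it can be shared with IODA1. A userspace sketch of that shape
is below. All names are hypothetical, calloc'ed arrays stand in for pages,
and the low two bits of each slot stand in for the TCE_PCI_READ and
TCE_PCI_WRITE permission bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for TCE_PCI_READ | TCE_PCI_WRITE in the low bits of a slot. */
#define ENTRY_VALID ((uintptr_t)0x3)

static unsigned int blocks_freed;

/* Allocate a zeroed table with `entries` slots. */
static uintptr_t *table_new(size_t entries)
{
	uintptr_t *t = calloc(entries, sizeof(*t));

	assert(t != NULL);
	return t;
}

/*
 * Hook a child table into slot `idx` of an upper-level table. This relies
 * on calloc() returning memory aligned to at least 4 bytes, so the low two
 * bits of the pointer are free to carry the valid marker.
 */
static void table_link(uintptr_t *parent, size_t idx, uintptr_t *child)
{
	parent[idx] = (uintptr_t)child | ENTRY_VALID;
}

/* Free a table; for level > 0 the valid slots point at child tables. */
static void table_free(uintptr_t *table, size_t entries, unsigned int level)
{
	if (level) {
		for (size_t i = 0; i < entries; i++) {
			if (!(table[i] & ENTRY_VALID))
				continue;
			table_free((uintptr_t *)(table[i] & ~ENTRY_VALID),
				   entries, level - 1);
		}
	}
	free(table);
	blocks_freed++;
}
```

Called with level 0 the loop is skipped entirely, so a single-level table is
handled by the same function without any special casing.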


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++++++++---------
>   1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index d360607..077f9db 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -51,7 +51,7 @@
>   #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>   #define POWERNV_IOMMU_MAX_LEVELS	5
>
> -static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
> +static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl);
>
>   static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>   			    const char *fmt, ...)
> @@ -1352,7 +1352,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>   		iommu_group_put(pe->table_group.group);
>   		BUG_ON(pe->table_group.group);
>   	}
> -	pnv_pci_ioda2_table_free_pages(tbl);
> +	pnv_pci_ioda_table_free_pages(tbl);
>   	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>   }
>
> @@ -1946,7 +1946,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
>
>   static void pnv_ioda2_table_free(struct iommu_table *tbl)
>   {
> -	pnv_pci_ioda2_table_free_pages(tbl);
> +	pnv_pci_ioda_table_free_pages(tbl);
>   	iommu_free_table(tbl, "pnv");
>   }
>
> @@ -2448,7 +2448,7 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift,
>   	return addr;
>   }
>
> -static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
> +static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>   		unsigned long size, unsigned level);
>
>   static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> @@ -2487,7 +2487,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>   	 * release partially allocated table.
>   	 */
>   	if (offset < tce_table_size) {
> -		pnv_pci_ioda2_table_do_free_pages(addr,
> +		pnv_pci_ioda_table_do_free_pages(addr,
>   				1ULL << (level_shift - 3), levels - 1);
>   		return -ENOMEM;
>   	}
> @@ -2505,7 +2505,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>   	return 0;
>   }
>
> -static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
> +static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>   		unsigned long size, unsigned level)
>   {
>   	const unsigned long addr_ul = (unsigned long) addr &
> @@ -2521,7 +2521,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>   			if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
>   				continue;
>
> -			pnv_pci_ioda2_table_do_free_pages(__va(hpa), size,
> +			pnv_pci_ioda_table_do_free_pages(__va(hpa), size,
>   					level - 1);
>   		}
>   	}
> @@ -2529,7 +2529,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>   	free_pages(addr_ul, get_order(size << 3));
>   }
>
> -static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
> +static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl)
>   {
>   	const unsigned long size = tbl->it_indirect_levels ?
>   			tbl->it_level_size : tbl->it_size;
> @@ -2537,7 +2537,7 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>   	if (!tbl->it_size)
>   		return;
>
> -	pnv_pci_ioda2_table_do_free_pages((__be64 *)tbl->it_base, size,
> +	pnv_pci_ioda_table_do_free_pages((__be64 *)tbl->it_base, size,
>   			tbl->it_indirect_levels);
>   }
>
>


-- 
Alexey

* Re: [PATCH v8 23/45] powerpc/powernv: Dynamically release PEs
  2016-02-17  3:44 ` [PATCH v8 23/45] powerpc/powernv: Dynamically release PEs Gavin Shan
@ 2016-04-19  5:19   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  5:19 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This supports releasing PEs dynamically. Firstly, this moves
> pnv_pci_ioda2_release_dma_pe() around, which is called to
> release DMA resource on releasing IODA2 PE.


IMHO the move would only make sense if it let us get rid of the forward
declarations, but that is not the case.


> Secondly, several
> functions are implemented to release the consumed resources
> on releasing the PE:
>
>     * pnv_pci_ioda1_unset_window() to unset TVEs for the PE.
>     * pnv_pci_ioda1_release_dma_pe() to unset TVEs for the PE and
>       destroy the IOMMU table.
>     * pnv_ioda_release_pe_seg() releases the consumed IO/M32/M64
>       segments by the PE.
>
> Lastly, this adds a reference count of PE, representing the number
> of PCI devices associated with the PE. The reference count is
> increased when PCI device joins the PE. It's decreased when PCI
> device leaves the PE in pnv_pci_release_device(). When the count
> becomes zero, its consumed resources are released by functions
> as mentioned above. Note that the count isn't accessed concurrently.
> So a "counter" with "int" type is enough here.
>
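
The counting scheme described above can be sketched in a few lines. This is
a userspace illustration with hypothetical names (pe_device_join,
pe_device_leave, pe_release), and a plain int counter like this is only safe
if joins and leaves are serialized, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

struct pe {
	int device_count;       /* devices currently bound to this PE */
	bool released;          /* set once the PE's resources are torn down */
};

/* Stand-in for tearing down DMA windows, segments and the PE itself. */
static void pe_release(struct pe *pe)
{
	pe->released = true;
}

/* A device joins the PE. */
static void pe_device_join(struct pe *pe)
{
	pe->device_count++;
}

/* A device leaves the PE; the last one out triggers the release. */
static void pe_device_leave(struct pe *pe)
{
	assert(pe->device_count > 0);
	if (--pe->device_count == 0)
		pe_release(pe);
}
```

With two devices bound, the first leave only drops the count and the second
one releases the PE.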
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 236 ++++++++++++++++++++++++++----
>   arch/powerpc/platforms/powernv/pci.h      |   1 +
>   2 files changed, 209 insertions(+), 28 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 077f9db..fa428a8 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -119,6 +119,158 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>   }
>
> +static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe);
> +static long pnv_pci_ioda1_unset_window(struct iommu_table_group *table_group,
> +				       int num);
> +static void pnv_pci_ioda1_release_dma_pe(struct pnv_ioda_pe *pe)
> +{
> +	struct iommu_table *tbl;
> +	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
> +	int64_t rc;
> +
> +	if (!weight)
> +		return;
> +
> +	tbl = pe->table_group.tables[0];
> +	rc = pnv_pci_ioda1_unset_window(&pe->table_group, 0);
> +	if (rc)
> +		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
> +
> +	if (pe->table_group.group) {
> +		iommu_group_put(pe->table_group.group);
> +		WARN_ON(pe->table_group.group);
> +	}
> +
> +	pnv_pci_ioda_table_free_pages(tbl);
> +	iommu_free_table(tbl, "pnv");
> +}
> +
> +static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
> +				       int num);
> +static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
> +static void pnv_pci_ioda2_release_dma_pe(struct pnv_ioda_pe *pe)


If you left this code in its old location, it would be a lot more obvious
what you silently changed in this function (the check for weight). Please
either do not move the code (this is preferred, as I am hacking on the same
chunk in "[PATCH kernel 0/2] powerpc/powernv: Fix crash on PF unbind when VF
is passed" and I'd like to reduce conflicts) or split the move into a
separate patch.


> +{
> +	struct iommu_table *tbl;
> +	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
> +	int64_t rc;
> +
> +	if (!weight)
> +		return;
> +
> +	tbl = pe->table_group.tables[0];
> +	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> +	if (rc)
> +		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
> +
> +	pnv_pci_ioda2_set_bypass(pe, false);
> +	if (pe->table_group.group) {
> +		iommu_group_put(pe->table_group.group);
> +		WARN_ON(pe->table_group.group);
> +	}
> +
> +	pnv_pci_ioda_table_free_pages(tbl);
> +	iommu_free_table(tbl, "pnv");
> +}
> +
> +static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +	int win, index, *segmap = NULL;
> +	int64_t rc;
> +
> +	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++) {


In "Re: [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs" I
suggested a shorter and cleaner pnv_ioda_release_window(); what was wrong
with it?
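
The v7 suggestion itself is not quoted in this message, so the following is
only a guess at the general shape of a per-window helper, not a copy of it:
the caller passes the window type and the matching segment map, which
removes the need for the switch inside the loop. All names are hypothetical,
and unmap_segment() is a stub standing in for the firmware call.

```c
#include <assert.h>

#define INVALID_PE (-1)
#define NUM_SEGS   8

static int unmap_calls;

/* Stub for the firmware call that points a segment back at the reserved PE. */
static int unmap_segment(int win, int seg)
{
	(void)win;
	(void)seg;
	unmap_calls++;
	return 0;
}

/* Release every segment of one window that is owned by `pe_no`. */
static void release_window(int pe_no, int win, int *segmap, int nsegs)
{
	for (int i = 0; i < nsegs; i++) {
		if (segmap[i] != pe_no)
			continue;

		if (unmap_segment(win, i) != 0) {
			/* a real caller would print a warning here */
		}

		segmap[i] = INVALID_PE;
	}
}
```

The caller would then invoke the helper once per window type that applies to
the PHB, passing the corresponding map each time.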



> +		if (phb->type == PNV_PHB_IODA2 &&
> +		    (win == OPAL_IO_WINDOW_TYPE || win == OPAL_M64_WINDOW_TYPE))
> +			continue;
> +
> +		switch (win) {
> +		case OPAL_IO_WINDOW_TYPE:
> +			segmap = phb->ioda.io_segmap;
> +			break;
> +		case OPAL_M32_WINDOW_TYPE:
> +			segmap = phb->ioda.m32_segmap;
> +			break;
> +		case OPAL_M64_WINDOW_TYPE:
> +			segmap = phb->ioda.m64_segmap;
> +			break;
> +		}
> +
> +		for (index = 0; index < phb->ioda.total_pe_num; index++) {
> +			if (segmap[index] != pe->pe_number)
> +				continue;
> +
> +			if (win == OPAL_M64_WINDOW_TYPE)
> +				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +						phb->ioda.reserved_pe_idx, win,
> +						index / PNV_IODA1_M64_SEGS,
> +						index % PNV_IODA1_M64_SEGS);
> +			else
> +				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +						phb->ioda.reserved_pe_idx, win,
> +						0, index);
> +			if (rc != OPAL_SUCCESS)
> +				pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
> +					rc, win, index);
> +
> +			segmap[index] = IODA_INVALID_PE;
> +		}
> +	}
> +}
> +
> +static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb,
> +				   struct pnv_ioda_pe *pe);
> +static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe);
> +static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +	struct pnv_ioda_pe *tmp, *slave;
> +
> +	/* Release slave PEs in compound PE */
> +	if (pe->flags & PNV_IODA_PE_MASTER) {
> +		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
> +			pnv_ioda_release_pe(slave);
> +	}
> +
> +	/* Remove the PE from the list */
> +	list_del(&pe->list);
> +
> +	/* Release DMA segments */
> +	switch (phb->type) {
> +	case PNV_PHB_IODA1:
> +		pnv_pci_ioda1_release_dma_pe(pe);
> +		break;
> +	case PNV_PHB_IODA2:
> +		pnv_pci_ioda2_release_dma_pe(pe);
> +		break;
> +	default:
> +		WARN_ON(1);
> +	}
> +
> +	pnv_ioda_release_pe_seg(pe);
> +	pnv_ioda_deconfigure_pe(pe->phb, pe);
> +
> +	pnv_ioda_free_pe(pe);
> +}
> +
> +static void pnv_pci_release_device(struct pci_dev *pdev)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> +	struct pnv_ioda_pe *pe;
> +
> +	if (pdev->is_virtfn)
> +		return;
> +
> +	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
> +		return;
> +
> +	pe = &phb->ioda.pe_array[pdn->pe_number];
> +	WARN_ON(--pe->device_count < 0);
> +	if (pe->device_count == 0)
> +		pnv_ioda_release_pe(pe);
> +}
> +
>   static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>   {
>   	phb->ioda.pe_array[pe_no].phb = phb;
> @@ -715,7 +867,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
>   	return 0;
>   }
>
> -#ifdef CONFIG_PCI_IOV
>   static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   {
>   	struct pci_dev *parent;
> @@ -750,9 +901,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   		}
>   		rid_end = pe->rid + (count << 8);
>   	} else {
> +#ifdef CONFIG_PCI_IOV
>   		if (pe->flags & PNV_IODA_PE_VF)
>   			parent = pe->parent_dev;
>   		else
> +#endif
>   			parent = pe->pdev->bus->self;
>   		bcomp = OpalPciBusAll;
>   		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
> @@ -790,11 +943,12 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>
>   	pe->pbus = NULL;
>   	pe->pdev = NULL;
> +#ifdef CONFIG_PCI_IOV
>   	pe->parent_dev = NULL;
> +#endif
>
>   	return 0;
>   }
> -#endif /* CONFIG_PCI_IOV */
>
>   static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   {
> @@ -1031,6 +1185,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   		if (pdn->pe_number != IODA_INVALID_PE)
>   			continue;
>
> +		pe->device_count++;
>   		pdn->pcidev = dev;
>   		pdn->pe_number = pe->pe_number;
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
> @@ -1095,9 +1250,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   			bus->busn_res.start, pe->pe_number);
>
>   	if (pnv_ioda_configure_pe(phb, pe)) {
> -		/* XXX What do we do here ? */
> -		pnv_ioda_free_pe(pe);
>   		pe->pbus = NULL;
> +		pnv_ioda_release_pe(pe);
>   		return NULL;
>   	}
>
> @@ -1333,29 +1487,6 @@ m64_failed:
>   	return -EBUSY;
>   }
>
> -static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
> -		int num);
> -static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
> -
> -static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
> -{
> -	struct iommu_table    *tbl;
> -	int64_t               rc;
> -
> -	tbl = pe->table_group.tables[0];
> -	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> -	if (rc)
> -		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
> -
> -	pnv_pci_ioda2_set_bypass(pe, false);
> -	if (pe->table_group.group) {
> -		iommu_group_put(pe->table_group.group);
> -		BUG_ON(pe->table_group.group);
> -	}
> -	pnv_pci_ioda_table_free_pages(tbl);
> -	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> -}
> -
>   static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>   {
>   	struct pci_bus        *bus;
> @@ -1376,7 +1507,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>   		if (pe->parent_dev != pdev)
>   			continue;
>
> -		pnv_pci_ioda2_release_dma_pe(pdev, pe);
> +		pnv_pci_ioda2_release_dma_pe(pe);
>
>   		/* Remove from list */
>   		mutex_lock(&phb->ioda.pe_list_mutex);
> @@ -1780,6 +1911,16 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
>   	 */
>   }
>
> +static void pnv_pci_ioda1_tce_invalidate_entire(struct pnv_ioda_pe *pe)
> +{
> +	struct iommu_table *tbl = pe->table_group.tables[0];
> +
> +	if (!tbl)
> +		return;
> +
> +	pnv_pci_ioda1_tce_invalidate(tbl, tbl->it_offset, tbl->it_size, false);
> +}
> +
>   static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
>   		long npages, unsigned long uaddr,
>   		enum dma_data_direction direction,
> @@ -2144,6 +2285,44 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	}
>   }
>
> +static long pnv_pci_ioda1_unset_window(struct iommu_table_group *table_group,
> +				       int num)
> +{
> +	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> +					      table_group);
> +	struct pnv_phb *phb = pe->phb;
> +	int start, count, i;
> +	long rc = OPAL_SUCCESS;
> +
> +	pe_info(pe, "Removing DMA window #%d\n", num);
> +
> +	/* Search the used DMA32 segments */
> +	start = -1;
> +	count = 0;
> +	for (i = 0; i < phb->ioda.dma32_count; i++) {
> +		if (phb->ioda.dma32_segmap[i] != pe->pe_number)
> +			continue;
> +
> +		if (count++ == 0)
> +			start = i;
> +	}
> +
> +	if (!count)
> +		return OPAL_SUCCESS;
> +
> +	for (i = start; i < start + count; i++)
> +		rc |= opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
> +						 i, 0, 0ul, 0ul, 0ul);

If two of these calls fail with different rc values, OR-ing them together
will produce a completely unrelated error code.
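
A small standalone illustration of the problem, assuming two's complement
integers. The error values below are illustrative negative codes in the
style of firmware return values, and the helper names are made up for the
example. Keeping only the first failure is one possible alternative, not
necessarily the right one for this function.

```c
#include <assert.h>

/* Illustrative negative error codes, not taken from any header. */
#define ERR_SUCCESS     0
#define ERR_PARAMETER (-1)
#define ERR_BUSY      (-2)
#define ERR_CLOSED    (-5)

/* What "rc |= call()" does across two failing calls. */
static long combine_or(long a, long b)
{
	return a | b;
}

/* Alternative: remember only the first failure. */
static long combine_first(long rc, long next)
{
	return rc != ERR_SUCCESS ? rc : next;
}
```

Here -2 is ...11111110 and -5 is ...11111011, so the OR is ...11111111,
that is -1: a "busy" and a "closed" failure would be reported as a
"parameter" error.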


> +	if (rc)
> +		pe_warn(pe, "Failure %ld unmapping TVEs\n");
> +	else
> +		pnv_pci_ioda1_tce_invalidate_entire(pe);
> +
> +	pnv_pci_unlink_table_and_group(table_group->tables[num], table_group);
> +
> +	return rc;
> +}
> +
>   static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>   		int num, struct iommu_table *tbl)
>   {
> @@ -3318,6 +3497,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>   	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
>   #endif
>   	.enable_device_hook	= pnv_pci_enable_device_hook,
> +	.release_device		= pnv_pci_release_device,
>   	.window_alignment	= pnv_pci_window_alignment,
>   	.setup_bridge		= pnv_pci_setup_bridge,
>   	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 01f2428..0cddde3 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -31,6 +31,7 @@ struct pnv_phb;
>   struct pnv_ioda_pe {
>   	unsigned long		flags;
>   	struct pnv_phb		*phb;
> +	int			device_count;
>
>   #define PNV_IODA_MAX_PEER_PES	8
>   	struct pnv_ioda_pe	*peers[PNV_IODA_MAX_PEER_PES];
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  2016-02-17  3:44   ` [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
  (?)
@ 2016-04-19  5:28   ` Alexey Kardashevskiy
  2016-04-20  1:23     ` Gavin Shan
  -1 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  5:28 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
> with names of the weak functions in PCI subsystem, which have the
> prefix "pcibios". No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pci-bridge.h |  4 ++--
>   arch/powerpc/kernel/eeh_driver.c      | 12 ++++++------
>   arch/powerpc/kernel/pci-hotplug.c     | 15 +++++++--------
>   drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
>   drivers/pci/hotplug/rpaphp_core.c     |  4 ++--
>   drivers/pci/hotplug/rpaphp_pci.c      |  2 +-
>   6 files changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index 4dd6ef4..c817f38 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -263,10 +263,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
>   extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
>
>   /** Remove all of the PCI devices under this bus */
> -extern void pcibios_remove_pci_devices(struct pci_bus *bus);
> +extern void pci_remove_pci_devices(struct pci_bus *bus);


pci_lala_pci_lala() ("pci" is used twice) looks weird; if the prefix is
"pci", what other device types can they handle?

Maybe pcihp_add_devices() and pcihp_remove_devices(), as these are defined in
pci-hotplug.c?


>
>   /** Discover new pci devices under this bus, and add them */
> -extern void pcibios_add_pci_devices(struct pci_bus *bus);
> +extern void pci_add_pci_devices(struct pci_bus *bus);
>
>
>   extern void isa_bridge_find_early(struct pci_controller *hose);
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index fb6207d..59e53fe 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -621,7 +621,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>   	 * We don't remove the corresponding PE instances because
>   	 * we need the information afterwords. The attached EEH
>   	 * devices are expected to be attached soon when calling
> -	 * into pcibios_add_pci_devices().
> +	 * into pci_add_pci_devices().
>   	 */
>   	eeh_pe_state_mark(pe, EEH_PE_KEEP);
>   	if (bus) {
> @@ -630,7 +630,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>   		} else {
>   			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
>   			pci_lock_rescan_remove();
> -			pcibios_remove_pci_devices(bus);
> +			pci_remove_pci_devices(bus);
>   			pci_unlock_rescan_remove();
>   		}
>   	} else if (frozen_bus) {
> @@ -681,7 +681,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>   		if (pe->type & EEH_PE_VF)
>   			eeh_add_virt_device(edev, NULL);
>   		else
> -			pcibios_add_pci_devices(bus);
> +			pci_add_pci_devices(bus);
>   	} else if (frozen_bus && rmv_data->removed) {
>   		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>   		ssleep(5);
> @@ -691,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>   		if (pe->type & EEH_PE_VF)
>   			eeh_add_virt_device(edev, NULL);
>   		else
> -			pcibios_add_pci_devices(frozen_bus);
> +			pci_add_pci_devices(frozen_bus);
>   	}
>   	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>
> @@ -896,7 +896,7 @@ perm_error:
>   			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>
>   			pci_lock_rescan_remove();
> -			pcibios_remove_pci_devices(frozen_bus);
> +			pci_remove_pci_devices(frozen_bus);
>   			pci_unlock_rescan_remove();
>   		}
>   	}
> @@ -981,7 +981,7 @@ static void eeh_handle_special_event(void)
>   				bus = eeh_pe_bus_get(phb_pe);
>   				eeh_pe_dev_traverse(pe,
>   					eeh_report_failure, NULL);
> -				pcibios_remove_pci_devices(bus);
> +				pci_remove_pci_devices(bus);
>   			}
>   			pci_unlock_rescan_remove();
>   		}
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 59c4361..78bf2a1 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -38,20 +38,20 @@ void pcibios_release_device(struct pci_dev *dev)
>   }
>
>   /**
> - * pcibios_remove_pci_devices - remove all devices under this bus
> + * pci_remove_pci_devices - remove all devices under this bus
>    * @bus: the indicated PCI bus
>    *
>    * Remove all of the PCI devices under this bus both from the
>    * linux pci device tree, and from the powerpc EEH address cache.
>    */
> -void pcibios_remove_pci_devices(struct pci_bus *bus)
> +void pci_remove_pci_devices(struct pci_bus *bus)
>   {
>   	struct pci_dev *dev, *tmp;
>   	struct pci_bus *child_bus;
>
>   	/* First go down child busses */
>   	list_for_each_entry(child_bus, &bus->children, node)
> -		pcibios_remove_pci_devices(child_bus);
> +		pci_remove_pci_devices(child_bus);
>
>   	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
>   		 pci_domain_nr(bus),  bus->number);
> @@ -60,11 +60,10 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
>   		pci_stop_and_remove_bus_device(dev);
>   	}
>   }
> -
> -EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
> +EXPORT_SYMBOL_GPL(pci_remove_pci_devices);
>
>   /**
> - * pcibios_add_pci_devices - adds new pci devices to bus
> + * pci_add_pci_devices - adds new pci devices to bus
>    * @bus: the indicated PCI bus
>    *
>    * This routine will find and fixup new pci devices under
> @@ -74,7 +73,7 @@ EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
>    * is how this routine differs from other, similar pcibios
>    * routines.)
>    */
> -void pcibios_add_pci_devices(struct pci_bus * bus)
> +void pci_add_pci_devices(struct pci_bus *bus)
>   {
>   	int slotno, mode, pass, max;
>   	struct pci_dev *dev;
> @@ -114,4 +113,4 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
>   	}
>   	pcibios_finish_adding_to_bus(bus);
>   }
> -EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
> +EXPORT_SYMBOL_GPL(pci_add_pci_devices);
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
> index b46b57d..730982b 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -380,7 +380,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
>   	}
>
>   	/* Remove all devices below slot */
> -	pcibios_remove_pci_devices(bus);
> +	pci_remove_pci_devices(bus);
>
>   	/* Unmap PCI IO space */
>   	if (pcibios_unmap_io_space(bus)) {
> diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
> index 611f605..bba07b3 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -404,7 +404,7 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
>
>   	if (state == PRESENT) {
>   		pci_lock_rescan_remove();
> -		pcibios_add_pci_devices(slot->bus);
> +		pci_add_pci_devices(slot->bus);
>   		pci_unlock_rescan_remove();
>   		slot->state = CONFIGURED;
>   	} else if (state == EMPTY) {
> @@ -426,7 +426,7 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
>   		return -EINVAL;
>
>   	pci_lock_rescan_remove();
> -	pcibios_remove_pci_devices(slot->bus);
> +	pci_remove_pci_devices(slot->bus);
>   	pci_unlock_rescan_remove();
>   	vm_unmap_aliases();
>
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
> index 7836d69..1099b38 100644
> --- a/drivers/pci/hotplug/rpaphp_pci.c
> +++ b/drivers/pci/hotplug/rpaphp_pci.c
> @@ -116,7 +116,7 @@ int rpaphp_enable_slot(struct slot *slot)
>   		}
>
>   		if (list_empty(&bus->devices))
> -			pcibios_add_pci_devices(bus);
> +			pci_add_pci_devices(bus);
>
>   		if (!list_empty(&bus->devices)) {
>   			info->adapter_status = CONFIGURED;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 25/45] powerpc/pci: Rename pcibios_find_pci_bus()
  2016-02-17  3:44 ` [PATCH v8 25/45] powerpc/pci: Rename pcibios_find_pci_bus() Gavin Shan
@ 2016-04-19  5:31   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  5:31 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to
> avoid conflicts with those PCI subsystem weak function names, which
> have prefix "pcibios". No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/include/asm/pci-bridge.h      | 2 +-
>   arch/powerpc/platforms/pseries/pci_dlpar.c | 5 ++---
>   drivers/pci/hotplug/rpadlpar_core.c        | 6 +++---
>   drivers/pci/hotplug/rpaphp_pci.c           | 2 +-
>   4 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index c817f38..03f4ee7 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -260,7 +260,7 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
>   #endif
>
>   /** Find the bus corresponding to the indicated device node */
> -extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
> +extern struct pci_bus *pci_find_bus_by_node(struct device_node *dn);
>
>   /** Remove all of the PCI devices under this bus */
>   extern void pci_remove_pci_devices(struct pci_bus *bus);
> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
> index 5d4a3df..aee22b4 100644
> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
> @@ -54,8 +54,7 @@ find_bus_among_children(struct pci_bus *bus,
>   	return child;
>   }
>
> -struct pci_bus *
> -pcibios_find_pci_bus(struct device_node *dn)
> +struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
>   {
>   	struct pci_dn *pdn = dn->data;
>
> @@ -64,7 +63,7 @@ pcibios_find_pci_bus(struct device_node *dn)
>
>   	return find_bus_among_children(pdn->phb->bus, dn);
>   }
> -EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
> +EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
>
>   struct pci_controller *init_phb_dynamic(struct device_node *dn)
>   {
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
> index 730982b..acbf041 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -175,7 +175,7 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn)
>   	struct pci_dev *dev;
>   	struct pci_controller *phb;
>
> -	if (pcibios_find_pci_bus(dn))
> +	if (pci_find_bus_by_node(dn))
>   		return -EINVAL;
>
>   	/* Add pci bus */
> @@ -212,7 +212,7 @@ static int dlpar_remove_phb(char *drc_name, struct device_node *dn)
>   	struct pci_dn *pdn;
>   	int rc = 0;
>
> -	if (!pcibios_find_pci_bus(dn))
> +	if (!pci_find_bus_by_node(dn))
>   		return -EINVAL;
>
>   	/* If pci slot is hotpluggable, use hotplug to remove it */
> @@ -356,7 +356,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
>
>   	pci_lock_rescan_remove();
>
> -	bus = pcibios_find_pci_bus(dn);
> +	bus = pci_find_bus_by_node(dn);
>   	if (!bus) {
>   		ret = -EINVAL;
>   		goto out;
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
> index 1099b38..a9180bb 100644
> --- a/drivers/pci/hotplug/rpaphp_pci.c
> +++ b/drivers/pci/hotplug/rpaphp_pci.c
> @@ -93,7 +93,7 @@ int rpaphp_enable_slot(struct slot *slot)
>   	if (rc)
>   		return rc;
>
> -	bus = pcibios_find_pci_bus(slot->dn);
> +	bus = pci_find_bus_by_node(slot->dn);
>   	if (!bus) {
>   		err("%s: no pci_bus for dn %s\n", __func__, slot->dn->full_name);
>   		return -EINVAL;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 27/45] powerpc/pci: Export pci_add_device_node_info()
  2016-02-17  3:44 ` [PATCH v8 27/45] powerpc/pci: Export pci_add_device_node_info() Gavin Shan
@ 2016-04-19  5:35   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  5:35 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This renames update_dn_pci_info() to pci_add_device_node_info()
> with corresponding adjustment on the parameter type and exports it.
> The function is used to create pdn (struct pci_dn) for the indicated
> device node. Another function, add_pdn(), which is a thin wrapper around
> pci_add_device_node_info(), is added for use with traverse_pci_devices().
> No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
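
For readers following the add_pdn() part: the pattern is a small adapter that
turns a typed helper into the void * callback the traversal expects, where a
NULL return means "continue" and a non-NULL return stops the walk. A
simplified user-space sketch, with toy types and names that are illustrative
only (the real walk skips the start node and the real adapter returns
ERR_PTR(-ENOMEM); here the walk includes the start node and the adapter
returns the failing node instead):

```c
#include <stddef.h>

/* Toy stand-ins for the kernel types; all names here are illustrative. */
struct node {
	struct node *child;
	struct node *sibling;
	void *data;
};

struct info {
	int owner;
};

static struct info pool[8];
static size_t pool_used;

/* Typed helper: returns the new info, or NULL when the pool is exhausted. */
static struct info *add_info(int owner, struct node *n)
{
	struct info *inf;

	if (pool_used == sizeof(pool) / sizeof(pool[0]))
		return NULL;
	inf = &pool[pool_used++];
	inf->owner = owner;
	n->data = inf;
	return inf;
}

/* Depth-first walk; a non-NULL return from fn stops the walk. */
static void *traverse(struct node *start,
		      void *(*fn)(struct node *, void *), void *data)
{
	struct node *n;
	void *ret;

	for (n = start; n; n = n->sibling) {
		ret = fn(n, data);
		if (ret)
			return ret;
		ret = traverse(n->child, fn, data);
		if (ret)
			return ret;
	}
	return NULL;
}

/* Adapter: success maps to NULL (continue), failure to non-NULL (stop). */
static void *add_info_cb(struct node *n, void *data)
{
	int *owner = data;

	return add_info(*owner, n) ? NULL : n;
}
```

The typed helper stays usable on its own for callers that want the returned
pointer, which is what pci_devs_phb_init_dynamic() does in the quoted hunk.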


> ---
>   arch/powerpc/include/asm/pci-bridge.h  |  3 ++-
>   arch/powerpc/kernel/pci_dn.c           | 30 +++++++++++++++++++-----------
>   arch/powerpc/platforms/pseries/setup.c |  2 +-
>   3 files changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index 03f4ee7..72a9d4e 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -238,7 +238,8 @@ extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
>   extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
>   extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
>   extern void remove_dev_pci_data(struct pci_dev *pdev);
> -extern void *update_dn_pci_info(struct device_node *dn, void *data);
> +extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
> +					       struct device_node *dn);
>
>   static inline int pci_device_from_OF_node(struct device_node *np,
>   					  u8 *bus, u8 *devfn)
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 38102cb..0a249ff 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -282,13 +282,9 @@ void remove_dev_pci_data(struct pci_dev *pdev)
>   #endif /* CONFIG_PCI_IOV */
>   }
>
> -/*
> - * Traverse_func that inits the PCI fields of the device node.
> - * NOTE: this *must* be done before read/write config to the device.
> - */
> -void *update_dn_pci_info(struct device_node *dn, void *data)
> +struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
> +					struct device_node *dn)
>   {
> -	struct pci_controller *phb = data;
>   	const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL);
>   	const __be32 *regs;
>   	struct device_node *parent;
> @@ -299,7 +295,7 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
>   		return NULL;
>   	dn->data = pdn;
>   	pdn->node = dn;
> -	pdn->phb = phb;
> +	pdn->phb = hose;
>   #ifdef CONFIG_PPC_POWERNV
>   	pdn->pe_number = IODA_INVALID_PE;
>   #endif
> @@ -331,8 +327,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
>   	if (pdn->parent)
>   		list_add_tail(&pdn->list, &pdn->parent->child_list);
>
> -	return NULL;
> +	return pdn;
>   }
> +EXPORT_SYMBOL_GPL(pci_add_device_node_info);
>
>   /*
>    * Traverse a device tree stopping each PCI device in the tree.
> @@ -432,6 +429,18 @@ void *traverse_pci_dn(struct pci_dn *root,
>   	return NULL;
>   }
>
> +static void *add_pdn(struct device_node *dn, void *data)
> +{
> +	struct pci_controller *hose = data;
> +	struct pci_dn *pdn;
> +
> +	pdn = pci_add_device_node_info(hose, dn);
> +	if (!pdn)
> +		return ERR_PTR(-ENOMEM);
> +
> +	return NULL;
> +}
> +
>   /**
>    * pci_devs_phb_init_dynamic - setup pci devices under this PHB
>    * phb: pci-to-host bridge (top-level bridge connecting to cpu)
> @@ -446,8 +455,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>   	struct pci_dn *pdn;
>
>   	/* PHB nodes themselves must not match */
> -	update_dn_pci_info(dn, phb);
> -	pdn = dn->data;
> +	pdn = pci_add_device_node_info(phb, dn);
>   	if (pdn) {
>   		pdn->devfn = pdn->busno = -1;
>   		pdn->vendor_id = pdn->device_id = pdn->class_code = 0;
> @@ -456,7 +464,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>   	}
>
>   	/* Update dn->phb ptrs for new phb and children devices */
> -	traverse_pci_devices(dn, update_dn_pci_info, phb);
> +	traverse_pci_devices(dn, add_pdn, phb);
>   }
>
>   /**
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 36df46e..6f8d020 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -265,7 +265,7 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
>   		pdn = parent ? PCI_DN(parent) : NULL;
>   		if (pdn) {
>   			/* Create pdn and EEH device */
> -			update_dn_pci_info(np, pdn->phb);
> +			pci_add_device_node_info(pdn->phb, np);
>   			eeh_dev_init(PCI_DN(np), pdn->phb);
>   		}
>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info()
  2016-02-17  3:44 ` [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info() Gavin Shan
@ 2016-04-19  5:48   ` Alexey Kardashevskiy
  2016-04-20  1:25     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  5:48 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This implements and exports pci_remove_device_node_info(). It's
> used to remove the pdn (struct pci_dn) for the indicated device
> node. The function is going to be used by PowerNV PCI hotplug
> driver.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Kind of strange that there is no such helper for pseries, is there?


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
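
As far as I can tell from the quoted code, the removal path is meant to undo
what pci_add_device_node_info() sets up: unlink from the parent, drop the
reference held on the parent's node, clear dn->data, free the pdn. A toy
user-space model of that symmetry, with made-up struct and function names
(not the kernel API) and a plain counter in place of the child list:

```c
#include <stdlib.h>

struct info;

/* Toy model; the struct and function names are illustrative, not kernel API. */
struct dn {
	int refcount;
	struct info *data;
};

struct info {
	struct dn *node;
	struct info *parent;
	int nr_children;
};

/* Attach: allocate info, link it to the node, pin the parent's node. */
static struct info *info_add(struct dn *dn, struct info *parent)
{
	struct info *inf = calloc(1, sizeof(*inf));

	if (!inf)
		return NULL;
	inf->node = dn;
	inf->parent = parent;
	if (parent) {
		parent->nr_children++;
		parent->node->refcount++;
	}
	dn->data = inf;
	return inf;
}

/* Detach: undo each step of info_add(), then free the info. */
static void info_remove(struct dn *dn)
{
	struct info *inf = dn ? dn->data : NULL;

	if (!inf)
		return;
	if (inf->parent) {
		inf->parent->nr_children--;
		inf->parent->node->refcount--;
	}
	dn->data = NULL;
	free(inf);
}
```

Children have to be removed before their parent in this model, which matches
the WARN_ON(!list_empty(&pdn->child_list)) in the quoted function.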


> ---
>   arch/powerpc/include/asm/pci-bridge.h |  1 +
>   arch/powerpc/kernel/pci_dn.c          | 23 +++++++++++++++++++++++
>   2 files changed, 24 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index 72a9d4e..c6310e2 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -240,6 +240,7 @@ extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
>   extern void remove_dev_pci_data(struct pci_dev *pdev);
>   extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>   					       struct device_node *dn);
> +extern void pci_remove_device_node_info(struct device_node *dn);
>
>   static inline int pci_device_from_OF_node(struct device_node *np,
>   					  u8 *bus, u8 *devfn)
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 0a249ff..ce10281 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -331,6 +331,29 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>   }
>   EXPORT_SYMBOL_GPL(pci_add_device_node_info);
>
> +void pci_remove_device_node_info(struct device_node *dn)
> +{
> +	struct pci_dn *pdn = dn ? PCI_DN(dn) : NULL;
> +#ifdef CONFIG_EEH
> +	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> +
> +	if (edev)
> +		edev->pdn = NULL;
> +#endif
> +
> +	if (!pdn)
> +		return;
> +
> +	WARN_ON(!list_empty(&pdn->child_list));
> +	list_del(&pdn->list);
> +	if (pdn->parent)
> +		of_node_put(pdn->parent->node);
> +
> +	dn->data = NULL;
> +	kfree(pdn);
> +}
> +EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
> +
>   /*
>    * Traverse a device tree stopping each PCI device in the tree.
>    * This is done depth first.  As each node is processed, a "pre"
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes()
  2016-02-17  3:44 ` [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
@ 2016-04-19  5:51       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  5:51 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This renames traverse_pci_devices() to pci_traverse_device_nodes().
> The function traverses all subordinate device nodes of the specified
> one. The following cleanups are also applied to the function. No logical
> changes introduced.
>
>     * Rename "pre" to "fn".
>     * Avoid assignment in if condition, as reported by checkpatch.pl.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
>   arch/powerpc/kernel/pci_dn.c         | 15 ++++++++++-----
>   arch/powerpc/platforms/pseries/msi.c |  4 ++--
>   3 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
> index ca0c5bf..8753e4e 100644
> --- a/arch/powerpc/include/asm/ppc-pci.h
> +++ b/arch/powerpc/include/asm/ppc-pci.h
> @@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
>   struct device_node;
>   struct pci_dn;
>
> -typedef void *(*traverse_func)(struct device_node *me, void *data);



Why remove this typedef? Typedefs are good.

Anyway,


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
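
On the typedef question: the two spellings declare the same parameter type,
so the change is purely cosmetic for callers. A minimal sketch, where
walk_a(), walk_b() and count_cb() are hypothetical names used only for this
illustration:

```c
#include <stddef.h>

struct device_node;	/* opaque here; only pointers are used */

/* The typedef the patch removes. */
typedef void *(*traverse_func)(struct device_node *me, void *data);

/* Prototype spelled with the typedef ... */
void *walk_a(struct device_node *start, traverse_func pre, void *data);

/* ... and with the function pointer written out in full. */
void *walk_b(struct device_node *start,
	     void *(*fn)(struct device_node *, void *),
	     void *data);

/* A callback that satisfies both; it just counts its invocations. */
static void *count_cb(struct device_node *dn, void *data)
{
	(void)dn;
	++*(int *)data;
	return NULL;
}

void *walk_a(struct device_node *start, traverse_func pre, void *data)
{
	return pre ? pre(start, data) : NULL;
}

void *walk_b(struct device_node *start,
	     void *(*fn)(struct device_node *, void *),
	     void *data)
{
	return fn ? fn(start, data) : NULL;
}
```

The same callback can be passed to either function without a cast, so whether
to keep the typedef is a readability choice rather than a functional one.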




> -void *traverse_pci_devices(struct device_node *start, traverse_func pre,
> -		void *data);
> +void *pci_traverse_device_nodes(struct device_node *start,
> +				void *(*fn)(struct device_node *, void *),
> +				void *data);
>   void *traverse_pci_dn(struct pci_dn *root,
>   		      void *(*fn)(struct pci_dn *, void *),
>   		      void *data);
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index ce10281..ecdccce 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -372,8 +372,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>    * one of these nodes we also assume its siblings are non-pci for
>    * performance.
>    */
> -void *traverse_pci_devices(struct device_node *start, traverse_func pre,
> -		void *data)
> +void *pci_traverse_device_nodes(struct device_node *start,
> +				void *(*fn)(struct device_node *, void *),
> +				void *data)
>   {
>   	struct device_node *dn, *nextdn;
>   	void *ret;
> @@ -388,8 +389,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>   		if (classp)
>   			class = of_read_number(classp, 1);
>
> -		if (pre && ((ret = pre(dn, data)) != NULL))
> -			return ret;
> +		if (fn) {
> +			ret = fn(dn, data);
> +			if (ret)
> +				return ret;
> +		}
>
>   		/* If we are a PCI bridge, go down */
>   		if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
> @@ -411,6 +415,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>   	}
>   	return NULL;
>   }
> +EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);
>
>   static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
>   				      struct pci_dn *pdn)
> @@ -487,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>   	}
>
>   	/* Update dn->phb ptrs for new phb and children devices */
> -	traverse_pci_devices(dn, add_pdn, phb);
> +	pci_traverse_device_nodes(dn, add_pdn, phb);
>   }
>
>   /**
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index 272e9ec..543a638 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>   	memset(&counts, 0, sizeof(struct msi_counts));
>
>   	/* Work out how many devices we have below this PE */
> -	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
> +	pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
>
>   	if (counts.num_devices == 0) {
>   		pr_err("rtas_msi: found 0 devices under PE for %s\n",
> @@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>   	/* else, we have some more calculating to do */
>   	counts.requestor = pci_device_to_OF_node(dev);
>   	counts.request = request;
> -	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
> +	pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
>
>   	/* If the quota isn't an integer multiple of the total, we can
>   	 * use the remainder as spare MSIs for anyone that wants them. */
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 30/45] powerpc/pci: Delay populating pdn
  2016-02-17  3:44 ` [PATCH v8 30/45] powerpc/pci: Delay populating pdn Gavin Shan
@ 2016-04-19  8:19   ` Alexey Kardashevskiy
  2016-04-20  2:13     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  8:19 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> The pdn (struct pci_dn) instances are allocated from memblock or
> bootmem when creating PCI controller (hoses) in setup_arch(). PCI
> hotplug, which will be supported by subsequent patches, releases
> PCI device nodes and their corresponding pdn on unplugging event.
> The memory chunks for pdn instances allocated from memblock or
> bootmem are hard to reuse after being released.
>
> This delays creating pdn by pci_devs_phb_init() from setup_arch()
> to core_initcall() so that they are allocated from slab. The memory
> consumed by pdn can be released to system without problem during
> PCI unplugging time. It indicates that pci_dn is unavailable in
> setup_arch() and the fixup on pdn (like AGP's) can't be carried
> out at that time. We have to do that in ppc_md.pcibios_root_bridge_prepare()
> on maple/pasemi/powermac platforms where/when the pdn is available.
>
> Meanwhile, the EEH device is created when pdn is populated,
> meaning pdn and EEH device have the same life cycle. In turn, we needn't
> call eeh_dev_init() to create EEH device explicitly.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Uff. It would not hurt to mention that pcibios_root_bridge_prepare() is
called from subsys_initcall(), which is executed after core_initcall(), so
the code flow does not change.

Have you checked if there is anything in between
core_initcall(pci_devs_phb_init) and subsys_initcall(pcibios_init) which
might need device tree nodes? For example, subsys_initcall(pcibios_init)
calls (eventually) pnv_pci_ioda_fixup(). If we are unlucky and
pcibios_init() (and therefore pnv_pci_ioda_fixup() or what pseries/others
do) is called before pci_devs_phb_init() - won't we crash or something?
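
A minimal userspace sketch of the ordering argument above, assuming only
that initcalls run level by level with core (level 1) ahead of subsys
(level 4). All names prefixed with mock_ or MOCK_, the demo_ordering()
helper and the level table are illustrative stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two initcall levels; lower values run first. */
enum mock_level { MOCK_LEVEL_CORE = 1, MOCK_LEVEL_SUBSYS = 4 };

struct mock_initcall {
	enum mock_level level;
	int (*fn)(void);
};

static int pdn_ready;		/* set once the pdn instances exist */
static int fixup_saw_pdn;	/* what the later fixup observed */

/* Stands in for pci_devs_phb_init(): populates the pdn instances. */
static int mock_pci_devs_phb_init(void)
{
	pdn_ready = 1;
	return 0;
}

/* Stands in for pcibios_init(): records whether pdn was available. */
static int mock_pcibios_init(void)
{
	fixup_saw_pdn = pdn_ready;
	return 0;
}

/* Run every registered call in ascending level order. */
static void run_initcalls(const struct mock_initcall *calls, size_t n)
{
	for (int level = 0; level <= 7; level++) {
		for (size_t i = 0; i < n; i++) {
			assert(calls[i].fn != NULL);
			if ((int)calls[i].level == level)
				calls[i].fn();
		}
	}
}

/* Returns 1 if the subsys-level call saw the pdn set up by the core-level call. */
static int demo_ordering(void)
{
	/* Registered in reverse order on purpose: only the level decides. */
	const struct mock_initcall calls[] = {
		{ MOCK_LEVEL_SUBSYS, mock_pcibios_init },
		{ MOCK_LEVEL_CORE, mock_pci_devs_phb_init },
	};

	pdn_ready = 0;
	fixup_saw_pdn = 0;
	run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
	return fixup_saw_pdn;
}
```

The sketch only covers ordering between levels. It says nothing about code
that runs outside the initcall mechanism, which is the part the question
above is really about.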




-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 31/45] powerpc/pci: Don't scan empty slot
  2016-02-17  3:44 ` [PATCH v8 31/45] powerpc/pci: Don't scan empty slot Gavin Shan
@ 2016-04-19  8:19   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  8:19 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> In the hotplug case, pci_add_pci_devices() is called to rescan
> the specified PCI bus, which might not have any child devices. Access
> to the PCI bus's child device node will cause a kernel crash without
> exception.
>
> This adds one more check to skip scanning a PCI bus that doesn't have
> any subordinate devices in the device tree, in order to avoid the
> kernel crash.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/kernel/pci-hotplug.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 7929a1c..3628c38 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -120,7 +120,8 @@ void pci_add_pci_devices(struct pci_bus *bus)
>   	if (mode == PCI_PROBE_DEVTREE) {
>   		/* use ofdt-based probe */
>   		of_rescan_bus(dn, bus);
> -	} else if (mode == PCI_PROBE_NORMAL) {
> +	} else if (mode == PCI_PROBE_NORMAL &&
> +		   dn->child && PCI_DN(dn->child)) {
>   		/*
>   		 * Use legacy probe. In the partial hotplug case, we
>   		 * probably have grandchildren devices unplugged. So
>
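
A small standalone sketch of the guard added in the hunk above, assuming
that PCI_DN() simply casts the node's data pointer. The mock_node and
mock_pdn types and should_probe_normal() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dn and struct device_node. */
struct mock_pdn {
	int devfn;
};

struct mock_node {
	struct mock_node *child;	/* first child node, or NULL */
	void *data;			/* per-node PCI data, or NULL */
};

/* Assumed to mirror PCI_DN(): interpret the node's data as PCI data. */
#define MOCK_PCI_DN(dn) ((struct mock_pdn *)(dn)->data)

/*
 * The legacy probe is only worthwhile when the bus node has a child
 * and that child carries PCI data; otherwise the scan is skipped.
 */
static bool should_probe_normal(const struct mock_node *dn)
{
	return dn && dn->child && MOCK_PCI_DN(dn->child);
}
```

The short-circuit order matters: the child pointer is tested before it is
dereferenced to look at its data.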


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 32/45] powerpc/pci: Update bridge windows on PCI plug
  2016-02-17  3:44 ` [PATCH v8 32/45] powerpc/pci: Update bridge windows on PCI plug Gavin Shan
@ 2016-04-19  8:47   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  8:47 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> On the PCI plugging event, PCI slot's subordinate devices are
> scanned and their (IO and MMIO) resources are assigned. Platform
> dependent resources (PE#, IO/MMIO/DMA windows) are allocated or
> created on updating windows of the slot's upstream bridge.
>
> This updates the windows of the hot plugged slot's upstream bridge
> in pcibios_finish_adding_to_bus() so that the platform resources
> (PE#, IO/MMIO/DMA segments) are allocated or created accordingly.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


To my very limited knowledge of the common PCI code, looks good.


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>




> ---
>   arch/powerpc/kernel/pci-common.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 40df3a5..be9e515 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1444,8 +1444,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
>   	/* Allocate bus and devices resources */
>   	pcibios_allocate_bus_resources(bus);
>   	pcibios_claim_one_bus(bus);
> -	if (!pci_has_flag(PCI_PROBE_ONLY))
> -		pci_assign_unassigned_bus_resources(bus);
> +	if (!pci_has_flag(PCI_PROBE_ONLY)) {
> +		if (bus->self)
> +			pci_assign_unassigned_bridge_resources(bus->self);
> +		else
> +			pci_assign_unassigned_bus_resources(bus);
> +	}
>
>   	/* Fixup EEH */
>   	eeh_add_device_tree_late(bus);
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset()
  2016-02-17  3:44 ` [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
  2016-02-17  4:35   ` Andrew Donnellan
@ 2016-04-19  8:49   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  8:49 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This drops unnecessary nested if statements in pnv_eeh_reset() to
> improve the code readability. After the changes, the unused local
> variable "ret" is dropped as well. No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 67 +++++++++++++---------------
>   1 file changed, 31 insertions(+), 36 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 69e41ce..9226df1 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -1009,8 +1009,9 @@ static int pnv_eeh_reset_vf_pe(struct eeh_pe *pe, int option)
>   static int pnv_eeh_reset(struct eeh_pe *pe, int option)
>   {
>   	struct pci_controller *hose = pe->phb;
> +	struct pnv_phb *phb;
>   	struct pci_bus *bus;
> -	int ret;
> +	int64_t rc;
>
>   	/*
>   	 * For PHB reset, we always have complete reset. For those PEs whose
> @@ -1026,45 +1027,39 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
>   	 * reset. The side effect is that EEH core has to clear the frozen
>   	 * state explicitly after BAR restore.
>   	 */
> -	if (pe->type & EEH_PE_PHB) {
> -		ret = pnv_eeh_phb_reset(hose, option);
> -	} else {
> -		struct pnv_phb *phb;
> -		s64 rc;
> +	if (pe->type & EEH_PE_PHB)
> +		return pnv_eeh_phb_reset(hose, option);
>
> -		/*
> -		 * The frozen PE might be caused by PAPR error injection
> -		 * registers, which are expected to be cleared after hitting
> -		 * frozen PE as stated in the hardware spec. Unfortunately,
> -		 * that's not true on P7IOC. So we have to clear it manually
> -		 * to avoid recursive EEH errors during recovery.
> -		 */
> -		phb = hose->private_data;
> -		if (phb->model == PNV_PHB_MODEL_P7IOC &&
> -		    (option == EEH_RESET_HOT ||
> -		    option == EEH_RESET_FUNDAMENTAL)) {
> -			rc = opal_pci_reset(phb->opal_id,
> -					    OPAL_RESET_PHB_ERROR,
> -					    OPAL_ASSERT_RESET);
> -			if (rc != OPAL_SUCCESS) {
> -				pr_warn("%s: Failure %lld clearing "
> -					"error injection registers\n",
> -					__func__, rc);
> -				return -EIO;
> -			}
> +	/*
> +	 * The frozen PE might be caused by PAPR error injection
> +	 * registers, which are expected to be cleared after hitting
> +	 * frozen PE as stated in the hardware spec. Unfortunately,
> +	 * that's not true on P7IOC. So we have to clear it manually
> +	 * to avoid recursive EEH errors during recovery.
> +	 */
> +	phb = hose->private_data;
> +	if (phb->model == PNV_PHB_MODEL_P7IOC &&
> +	    (option == EEH_RESET_HOT ||
> +	     option == EEH_RESET_FUNDAMENTAL)) {
> +		rc = opal_pci_reset(phb->opal_id,
> +				    OPAL_RESET_PHB_ERROR,
> +				    OPAL_ASSERT_RESET);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("%s: Failure %lld clearing error injection registers\n",
> +				__func__, rc);
> +			return -EIO;
>   		}
> -
> -		bus = eeh_pe_bus_get(pe);
> -		if (pe->type & EEH_PE_VF)
> -			ret = pnv_eeh_reset_vf_pe(pe, option);
> -		else if (pci_is_root_bus(bus) ||
> -			pci_is_root_bus(bus->parent))
> -			ret = pnv_eeh_root_reset(hose, option);
> -		else
> -			ret = pnv_eeh_bridge_reset(bus->self, option);
>   	}
>
> -	return ret;
> +	bus = eeh_pe_bus_get(pe);
> +	if (pe->type & EEH_PE_VF)
> +		return pnv_eeh_reset_vf_pe(pe, option);
> +
> +	if (pci_is_root_bus(bus) ||
> +	    pci_is_root_bus(bus->parent))
> +		return pnv_eeh_root_reset(hose, option);
> +
> +	return pnv_eeh_bridge_reset(bus->self, option);
>   }
>
>   /**
>
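
The "no logical changes" claim for this kind of refactor can be checked
mechanically. The sketch below is an assumption-laden model, not the kernel
function: the MOCK_ constants, the integer path codes and both helper
functions are made up, and only the branch structure follows the hunk above.

```c
#include <assert.h>

/* Hypothetical PE type bits and result codes naming the path taken. */
enum { MOCK_PE_PHB = 1, MOCK_PE_VF = 2 };
enum { MOCK_PATH_PHB = 10, MOCK_PATH_VF = 20, MOCK_PATH_ROOT = 30,
       MOCK_PATH_BRIDGE = 40 };

/* Shape of the code before the patch: nested branches, single exit. */
static int reset_nested(int type, int on_root_bus)
{
	int ret;

	if (type & MOCK_PE_PHB) {
		ret = MOCK_PATH_PHB;
	} else {
		if (type & MOCK_PE_VF)
			ret = MOCK_PATH_VF;
		else if (on_root_bus)
			ret = MOCK_PATH_ROOT;
		else
			ret = MOCK_PATH_BRIDGE;
	}

	return ret;
}

/* Shape of the code after the patch: early returns, no "ret". */
static int reset_flat(int type, int on_root_bus)
{
	if (type & MOCK_PE_PHB)
		return MOCK_PATH_PHB;

	if (type & MOCK_PE_VF)
		return MOCK_PATH_VF;

	if (on_root_bus)
		return MOCK_PATH_ROOT;

	return MOCK_PATH_BRIDGE;
}
```

Comparing the two functions over every combination of inputs gives the same
result, which is what the commit message asserts for the real code.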


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 34/45] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
  2016-02-17  3:44 ` [PATCH v8 34/45] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
@ 2016-04-19  8:57   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  8:57 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> The function pnv_pci_reset_secondary_bus() is called as shown below.
> It's impossible to call the function on the root bus, so it's safe
> to remove the root bus case in the function. No functional changes
> introduced.
>
>     pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus()
>     pci_reset_bridge_secondary_bus()
>     pcibios_reset_secondary_bus()
>     pnv_pci_reset_secondary_bus()
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Reviewed-by: Daniel Axtens <dja@axtens.net>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++----------
>   1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 9226df1..593b8dc 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -868,16 +868,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>
>   void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>   {
> -	struct pci_controller *hose;
> -
> -	if (pci_is_root_bus(dev->bus)) {
> -		hose = pci_bus_to_host(dev->bus);
> -		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
> -		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
> -	} else {
> -		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> -		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
> -	}
> +	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> +	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>   }
>
>   static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, const char *type,
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 35/45] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
@ 2016-04-19  9:04       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  9:04 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> In pnv_pci_reset_secondary_bus(), we should issue fundamental reset
> if any one subordinate device of the specified bus is requesting that.
> Otherwise, the device might not come up after the reset.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


Out of curiosity - what does "fundamental" reset actually do?


> ---
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 21 ++++++++++++++++++++-
>   1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 593b8dc..c7454ba 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -866,9 +866,28 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>   	return 0;
>   }
>
> +static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
> +{
> +	int *freset = data;
> +
> +	/*
> +	 * Stop the iteration immediately if there has any one
> +	 * PCI device requesting fundamental reset.
> +	 */
> +	*freset |= pdev->needs_freset;
> +	return *freset;
> +}
> +
>   void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>   {
> -	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> +	int option, freset = 0;
> +
> +	if (dev->subordinate)
> +		pci_walk_bus(dev->subordinate,
> +			     pnv_pci_dev_reset_type, &freset);
> +
> +	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
> +	pnv_eeh_bridge_reset(dev, option);
>   	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>   }
>
>
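
A standalone sketch of the "stop at the first device that needs a
fundamental reset" pattern from the hunk above. It assumes the walker stops
as soon as the callback returns non-zero, and it walks a flat array rather
than a bus hierarchy. The mock_ names and the MOCK_RESET_ values are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct pci_dev used here. */
struct mock_dev {
	unsigned int needs_freset;
};

enum { MOCK_RESET_HOT = 1, MOCK_RESET_FUNDAMENTAL = 3 };

typedef int (*mock_walk_cb)(struct mock_dev *dev, void *data);

static int visited;	/* how many devices the last walk looked at */

/* Visit devices in order; a non-zero callback result ends the walk. */
static void mock_walk(struct mock_dev *devs, size_t n,
		      mock_walk_cb cb, void *data)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		visited++;
		if (cb(&devs[i], data))
			break;
	}
}

/* Same shape as the callback in the patch: accumulate, then report. */
static int mock_reset_type(struct mock_dev *dev, void *data)
{
	int *freset = data;

	*freset |= dev->needs_freset;
	return *freset;
}

/* Pick the reset type for a set of subordinate devices. */
static int pick_reset(struct mock_dev *devs, size_t n)
{
	int freset = 0;

	visited = 0;
	mock_walk(devs, n, mock_reset_type, &freset);
	return freset ? MOCK_RESET_FUNDAMENTAL : MOCK_RESET_HOT;
}
```

With three devices where only the second sets needs_freset, the walk visits
two devices and a fundamental reset is chosen.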


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID
@ 2016-04-19  9:28       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  9:28 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> PowerNV platforms run on top of skiboot firmware that includes
> changes to support PCI slots. PCI slots are identified by PHB's
> ID or the combo of that and PCI slot ID.
>
> This changes the EEH PowerNV backend to support PCI slots:
>
>     * Rename arguments of opal_pci_reset() and opal_pci_poll().
>     * One more argument (PCI slot's state) added to opal_pci_poll().
>     * Drop pnv_eeh_phb_poll() and introduce a similar, enhanced
>       function pnv_pci_poll() that will be used by PowerNV hotplug
>       backends.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/opal.h              |  4 +--
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 42 ++++++----------------------
>   arch/powerpc/platforms/powernv/pci.c         | 21 ++++++++++++++
>   arch/powerpc/platforms/powernv/pci.h         |  1 +
>   4 files changed, 32 insertions(+), 36 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 07a99e6..9e0039f 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -131,7 +131,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
>   int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
>   					uint16_t dma_window_number, uint64_t pci_start_addr,
>   					uint64_t pci_mem_size);
> -int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
> +int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
>
>   int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
>   				   uint64_t diag_buffer_len);
> @@ -148,7 +148,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
>   int64_t opal_set_system_attention_led(uint8_t led_action);
>   int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
>   			    __be16 *pci_error_type, __be16 *severity);
> -int64_t opal_pci_poll(uint64_t phb_id);
> +int64_t opal_pci_poll(uint64_t id, uint8_t *state);
>   int64_t opal_return_cpu(void);
>   int64_t opal_check_token(uint64_t token);
>   int64_t opal_reinit_cpus(uint64_t flags);
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index c7454ba..e23b063 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -717,28 +717,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
>   	return ret;
>   }
>
> -static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
> -{
> -	s64 rc = OPAL_HARDWARE;
> -
> -	while (1) {
> -		rc = opal_pci_poll(phb->opal_id);
> -		if (rc <= 0)
> -			break;
> -
> -		if (system_state < SYSTEM_RUNNING)
> -			udelay(1000 * rc);
> -		else
> -			msleep(rc);
> -	}
> -
> -	return rc;
> -}
> -
>   int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>   {
>   	struct pnv_phb *phb = hose->private_data;
>   	s64 rc = OPAL_HARDWARE;
> +	int ret;
>
>   	pr_debug("%s: Reset PHB#%x, option=%d\n",
>   		 __func__, hose->global_number, option);
> @@ -753,8 +736,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>   		rc = opal_pci_reset(phb->opal_id,
>   				    OPAL_RESET_PHB_COMPLETE,
>   				    OPAL_DEASSERT_RESET);
> -	if (rc < 0)
> -		goto out;
>
>   	/*
>   	 * Poll state of the PHB until the request is done
> @@ -762,24 +743,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>   	 * reset followed by hot reset on root bus. So we also
>   	 * need the PCI bus settlement delay.
>   	 */
> -	rc = pnv_eeh_phb_poll(phb);
> -	if (option == EEH_RESET_DEACTIVATE) {
> +	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
> +	if (option == EEH_RESET_DEACTIVATE && !ret) {
>   		if (system_state < SYSTEM_RUNNING)
>   			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
>   		else
>   			msleep(EEH_PE_RST_SETTLE_TIME);
>   	}
> -out:
> -	if (rc != OPAL_SUCCESS)
> -		return -EIO;
>
> -	return 0;
> +	return ret;
>   }
>
>   static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>   {
>   	struct pnv_phb *phb = hose->private_data;
>   	s64 rc = OPAL_HARDWARE;
> +	int ret;
>
>   	pr_debug("%s: Reset PHB#%x, option=%d\n",
>   		 __func__, hose->global_number, option);
> @@ -801,18 +780,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>   		rc = opal_pci_reset(phb->opal_id,
>   				    OPAL_RESET_PCI_HOT,
>   				    OPAL_DEASSERT_RESET);
> -	if (rc < 0)
> -		goto out;
>
>   	/* Poll state of the PHB until the request is done */
> -	rc = pnv_eeh_phb_poll(phb);
> -	if (option == EEH_RESET_DEACTIVATE)
> +	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
> +	if (option == EEH_RESET_DEACTIVATE && !ret)
>   		msleep(EEH_PE_RST_SETTLE_TIME);
> -out:
> -	if (rc != OPAL_SUCCESS)
> -		return -EIO;
>
> -	return 0;
> +	return ret;
>   }
>
>   static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index b87a315..a458703 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -42,6 +42,27 @@
>   #define cfg_dbg(fmt...)	do { } while(0)
>   //#define cfg_dbg(fmt...)	printk(fmt)
>
> +int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
> +{
> +	while (rval > 0) {
> +		if (system_state < SYSTEM_RUNNING)
> +			udelay(1000 * rval);
> +		else
> +			msleep(rval);
> +
> +		rval = opal_pci_poll(id, state);
> +	}
> +
> +	/*
> +	 * The caller expects to retrieve additional
> +	 * information if the last argument isn't NULL.
> +	 */
> +	if (rval == OPAL_SUCCESS && state)
> +		rval = opal_pci_poll(id, state);


Old OPAL won't touch @state, so whatever garbage was there will stay there:
the only caller passing a non-NULL pointer is pnv_php_get_power_state(),
and it does not initialize @power_state (it is in
"[PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver").


btw how will new OPAL react if an old kernel is running, i.e. not passing
@state at all? If it is initialized to NULL somewhere - fine, but what
exactly does this initialization and makes sure that OPAL won't see garbage
as the second parameter?

When an ABI like this changes, I expect to see opal_pci_poll2() or
opal_pci_poll_ex() rather than just an additional parameter to
opal_pci_poll()...
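
One way to make the first concern concrete is a caller that seeds the state
with a sentinel before polling. This is only a sketch under assumptions:
mock_old_opal_pci_poll() models firmware that ignores the new argument,
MOCK_STATE_UNKNOWN is a made-up sentinel, and mock_get_power_state() is not
the function from patch 45.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_OPAL_SUCCESS	0
#define MOCK_STATE_UNKNOWN	0xff	/* hypothetical "not reported" sentinel */

/* Models firmware that predates the extra argument: *state is never written. */
static int64_t mock_old_opal_pci_poll(uint64_t id, uint8_t *state)
{
	(void)id;
	(void)state;
	return MOCK_OPAL_SUCCESS;
}

/*
 * Returns 0 and fills *out on success, -1 if the poll failed, and -2 if
 * the firmware returned success without reporting any state.
 */
static int mock_get_power_state(uint64_t id, uint8_t *out)
{
	uint8_t state = MOCK_STATE_UNKNOWN;	/* initialise before the call */

	assert(out != NULL);
	if (mock_old_opal_pci_poll(id, &state) != MOCK_OPAL_SUCCESS)
		return -1;

	if (state == MOCK_STATE_UNKNOWN)
		return -2;

	*out = state;
	return 0;
}
```

The sentinel only works if the firmware can never report that value as a
real state, which would need confirming against the firmware interface.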



> +
> +	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
> +}
> +
>   #ifdef CONFIG_PCI_MSI
>   int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>   {
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 0cddde3..6857703 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -192,6 +192,7 @@ extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
>   		unsigned long *hpa, enum dma_data_direction *direction);
>   extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>
> +int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state);
>   void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
>   				unsigned char *log_buff);
>   int pnv_pci_cfg_read(struct pci_dn *pdn,
>
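
The return convention that pnv_pci_poll() relies on in the hunk above
(positive value: wait that many milliseconds and poll again, zero: done,
negative: error) can be modelled in userspace. The script-driven firmware
mock, mock_load() and the slept_ms counter are assumptions made for the
sketch; the real function also takes the id and state arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MOCK_OPAL_SUCCESS	0
#define MOCK_SCRIPT_MAX		8

/* Scripted firmware replies and a counter replacing real delays. */
static int64_t script[MOCK_SCRIPT_MAX];
static int script_len, script_pos;
static int64_t slept_ms;

/* Load the replies the mock firmware will hand out, in order. */
static void mock_load(const int64_t *vals, int n)
{
	assert(n >= 0);
	script_len = n > MOCK_SCRIPT_MAX ? MOCK_SCRIPT_MAX : n;
	for (int i = 0; i < script_len; i++)
		script[i] = vals[i];
	script_pos = 0;
	slept_ms = 0;
}

/* Next scripted reply, or success once the script is exhausted. */
static int64_t mock_opal_pci_poll(void)
{
	return script_pos < script_len ? script[script_pos++]
				       : MOCK_OPAL_SUCCESS;
}

/* Same loop shape as the patch: wait while rval is positive, then map. */
static int mock_pci_poll(int64_t rval)
{
	while (rval > 0) {
		slept_ms += rval;	/* stands in for udelay()/msleep() */
		rval = mock_opal_pci_poll();
	}

	return rval == MOCK_OPAL_SUCCESS ? 0 : -EIO;
}
```

Note that an initial negative value skips the loop entirely and is mapped
straight to -EIO, which is why the callers in the patch no longer need
their own "rc < 0" check.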


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure
  2016-02-17  3:44 ` [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure Gavin Shan
@ 2016-04-19  9:34   ` Alexey Kardashevskiy
  2016-04-20  2:33     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  9:34 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> The skiboot firmware might provide the PCI slot reset capability
> which is identified by property "ibm,reset-by-firmware" on the
> PCI slot associated device node.
>
> This checks the property. If it exists, the reset request is routed
> to firmware. Otherwise, the reset is done by kernel as before.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++++++++++++++++++++++++++-
>   1 file changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index e23b063..c8a5217 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -789,7 +789,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>   	return ret;
>   }
>
> -static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
> +static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>   {
>   	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>   	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
> @@ -840,6 +840,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>   	return 0;
>   }
>
> +static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
> +{
> +	struct pci_controller *hose;
> +	struct pnv_phb *phb;
> +	struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
> +	uint64_t id = (0x1ul << 60);


What is this 1<<60 for?


> +	uint8_t scope;
> +	int64_t rc;
> +
> +	/*
> +	 * If the firmware can't handle it, we will issue hot reset
> +	 * on the secondary bus despite the requested reset type.
> +	 */
> +	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
> +		return __pnv_eeh_bridge_reset(pdev, option);
> +
> +	/* The firmware can handle the request */
> +	switch (option) {
> +	case EEH_RESET_HOT:
> +		scope = OPAL_RESET_PCI_HOT;
> +		break;
> +	case EEH_RESET_FUNDAMENTAL:
> +		scope = OPAL_RESET_PCI_FUNDAMENTAL;
> +		break;
> +	case EEH_RESET_DEACTIVATE:
> +		return 0;
> +	default:
> +		dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
> +			 __func__, option);


Can userspace trigger this case (via VFIO-EEH) and flood dmesg?
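
If it can, the usual answer in the kernel is a rate-limited print such as
dev_warn_ratelimited(). The idea behind it can be sketched in plain
userspace C as below. struct ratelimit and ratelimit_allow() are
hypothetical names for this example only, and the window/burst scheme is a
simplification, not the kernel's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ratelimit {
	uint64_t interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* tick at which the current window began */
	unsigned int emitted;	/* messages already let through this window */
};

/* Return true if a message may be printed at time 'now'. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now)
{
	assert(rl && rl->interval > 0);

	/* Start a fresh window once the previous one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->emitted = 0;
	}

	if (rl->emitted >= rl->burst)
		return false;

	rl->emitted++;
	return true;
}
```

A caller would wrap the warning in "if (ratelimit_allow(&rl, now))", so a
loop of invalid reset requests produces at most 'burst' lines per window.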



> +		return -EINVAL;
> +	}
> +
> +	hose = pci_bus_to_host(pdev->bus);
> +	phb = hose->private_data;
> +	id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
> +	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
> +	return pnv_pci_poll(id, rc, NULL);
> +}
> +
>   static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
>   {
>   	int *freset = data;
>


-- 
Alexey

* Re: [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status
  2016-02-17  3:44     ` Gavin Shan
  (?)
@ 2016-04-19  9:39     ` Alexey Kardashevskiy
  2016-04-20  2:36       ` Gavin Shan
  -1 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  9:39 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This exports 4 functins, which base on the corresponding OPAL


s/functins/functions/


> APIs to get/set PCI slot status. Those functions are going to
> be used by PowerNV PCI hotplug driver:
>
>     pnv_pci_get_device_tree()    opal_get_device_tree()
>     pnv_pci_get_presence_state() opal_pci_get_presence_state()
>     pnv_pci_get_power_state()    opal_pci_get_power_state()
>     pnv_pci_set_power_state()    opal_pci_set_power_state()
>
> Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
> unregister}() to allow registration and unregistration of PCI hotplug
> notifier, which will be used to receive PCI hotplug message from
> skiboot firmware in PowerNV PCI hotplug driver.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/opal-api.h            | 17 ++++++-
>   arch/powerpc/include/asm/opal.h                |  4 ++
>   arch/powerpc/include/asm/pnv-pci.h             |  7 +++
>   arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
>   arch/powerpc/platforms/powernv/pci.c           | 66 ++++++++++++++++++++++++++
>   5 files changed, 97 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index f8faaae..a6af338 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -158,7 +158,11 @@
>   #define OPAL_LEDS_SET_INDICATOR			115
>   #define OPAL_CEC_REBOOT2			116
>   #define OPAL_CONSOLE_FLUSH			117
> -#define OPAL_LAST				117
> +#define OPAL_GET_DEVICE_TREE			118
> +#define OPAL_PCI_GET_PRESENCE_STATE		119
> +#define OPAL_PCI_GET_POWER_STATE		120
> +#define OPAL_PCI_SET_POWER_STATE		121
> +#define OPAL_LAST				121
>
>   /* Device tree flags */
>
> @@ -344,6 +348,16 @@ enum OpalPciResetState {
>   	OPAL_ASSERT_RESET   = 1
>   };
>
> +enum OpalPciSlotPresentenceState {
> +	OPAL_PCI_SLOT_EMPTY	= 0,
> +	OPAL_PCI_SLOT_PRESENT	= 1
> +};
> +
> +enum OpalPciSlotPowerState {
> +	OPAL_PCI_SLOT_POWER_OFF	= 0,
> +	OPAL_PCI_SLOT_POWER_ON	= 1
> +};
> +
>   enum OpalSlotLedType {
>   	OPAL_SLOT_LED_TYPE_ID = 0,	/* IDENTIFY LED */
>   	OPAL_SLOT_LED_TYPE_FAULT = 1,	/* FAULT LED */
> @@ -378,6 +392,7 @@ enum opal_msg_type {
>   	OPAL_MSG_DPO,
>   	OPAL_MSG_PRD,
>   	OPAL_MSG_OCC,
> +	OPAL_MSG_PCI_HOTPLUG,
>   	OPAL_MSG_TYPE_MAX,
>   };
>
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 9e0039f..899bcb941 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
>   		uint64_t size, uint64_t token);
>   int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
>   		uint64_t token);
> +int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
> +int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
> +int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
> +int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);
>
>   /* Internal functions */
>   extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
> index 6f77f71..d9d095b 100644
> --- a/arch/powerpc/include/asm/pnv-pci.h
> +++ b/arch/powerpc/include/asm/pnv-pci.h
> @@ -13,6 +13,13 @@
>   #include <linux/pci.h>
>   #include <misc/cxl-base.h>
>
> +extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
> +extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
> +extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
> +extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
> +extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
> +extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
> +
>   int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
>   int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
>   			   unsigned int virq);
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index e45b88a..3ea1a855 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg,				OPAL_PRD_MSG);
>   OPAL_CALL(opal_leds_get_ind,			OPAL_LEDS_GET_INDICATOR);
>   OPAL_CALL(opal_leds_set_ind,			OPAL_LEDS_SET_INDICATOR);
>   OPAL_CALL(opal_console_flush,			OPAL_CONSOLE_FLUSH);
> +OPAL_CALL(opal_get_device_tree,			OPAL_GET_DEVICE_TREE);
> +OPAL_CALL(opal_pci_get_presence_state,		OPAL_PCI_GET_PRESENCE_STATE);
> +OPAL_CALL(opal_pci_get_power_state,		OPAL_PCI_GET_POWER_STATE);
> +OPAL_CALL(opal_pci_set_power_state,		OPAL_PCI_SET_POWER_STATE);
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index a458703..206385f 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -63,6 +63,72 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
>   	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
>   }
>
> +int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len)
> +{
> +	int64_t rc;
> +
> +	if (!opal_check_token(OPAL_GET_DEVICE_TREE))
> +		return -ENXIO;
> +
> +	rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
> +	if (rc != OPAL_SUCCESS)
> +		return -EIO;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pnv_pci_get_device_tree);
> +
> +int pnv_pci_get_presence_state(uint64_t id, uint8_t *state)
> +{
> +	int64_t rc;
> +
> +	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATE))
> +		return -ENXIO;
> +
> +	rc = opal_pci_get_presence_state(id, state);
> +	if (rc != OPAL_SUCCESS)
> +		return -EIO;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pnv_pci_get_presence_state);
> +
> +int pnv_pci_get_power_state(uint64_t id, uint8_t *state)
> +{
> +	int64_t rc;
> +
> +	if (!opal_check_token(OPAL_PCI_GET_POWER_STATE))
> +		return -ENXIO;
> +
> +	rc = opal_pci_get_power_state(id, state);


Out of curiosity - if rc==OPAL_SUCCESS, @state should already contain the 
correct state and you do not have to call pnv_pci_poll() (which will call 
opal_pci_poll() immediately), is that correct?

Anyway, looks correct.


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>




> +	return pnv_pci_poll(id, rc, state);
> +}
> +EXPORT_SYMBOL_GPL(pnv_pci_get_power_state);
> +
> +int pnv_pci_set_power_state(uint64_t id, uint8_t state)
> +{
> +	int64_t rc;
> +
> +	if (!opal_check_token(OPAL_PCI_SET_POWER_STATE))
> +		return -ENXIO;
> +
> +	rc = opal_pci_set_power_state(id, state);
> +	return pnv_pci_poll(id, rc, NULL);
> +}
> +EXPORT_SYMBOL_GPL(pnv_pci_set_power_state);
> +
> +int pnv_pci_hotplug_notifier_register(struct notifier_block *nb)
> +{
> +	return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
> +}
> +EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_register);
> +
> +int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb)
> +{
> +	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
> +}
> +EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_unregister);
> +
>   #ifdef CONFIG_PCI_MSI
>   int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>   {
>


-- 
Alexey

* Re: [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC
  2016-02-17  3:44 ` [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
@ 2016-04-19  9:42   ` Alexey Kardashevskiy
  2016-04-20  2:38     ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19  9:42 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> The device tree will change dynamically in PowerNV PCI hotplug
> driver. This enables CONFIG_OF_DYNAMIC to support that.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/Kconfig | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
> index 604190c..e7b1ad7 100644
> --- a/arch/powerpc/platforms/powernv/Kconfig
> +++ b/arch/powerpc/platforms/powernv/Kconfig
> @@ -18,6 +18,7 @@ config PPC_POWERNV
>   	select CPU_FREQ_GOV_ONDEMAND
>   	select CPU_FREQ_GOV_CONSERVATIVE
>   	select PPC_DOORBELL
> +	select OF_DYNAMIC


Why not enable it in 45/45 under config HOTPLUG_PCI_POWERNV? Is there 
any benefit in having it always on if HOTPLUG_PCI_POWERNV is not enabled?


>   	default y
>
>   config OPAL_PRD
>


-- 
Alexey

* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-02-17  3:44     ` Gavin Shan
@ 2016-04-19 10:36         ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19 10:36 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev, Alistair Popple
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This adds standalone driver to support PCI hotplug for PowerPC PowerNV
> platform that runs on top of skiboot firmware. The firmware identifies
> hotpluggable slots and marked their device tree node with proper
> "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans
> device tree nodes to create/register PCI hotplug slot accordingly.
>
> The PCI slots are organized in fashion of tree, which means one
> PCI slot might have parent PCI slot and parent PCI slot possibly
> contains multiple child PCI slots. At the plugging time, the parent
> PCI slot is populated before its children. The child PCI slots are
> removed before their parent PCI slot can be removed from the system.
>
> If the skiboot firmware doesn't support slot status retrieval, the PCI
> slot device node shouldn't have property "ibm,reset-by-firmware". In
> that case, none of valid PCI slots will be detected from device tree.
> The skiboot firmware doesn't export the capability to access attention
> LEDs yet and it's something for TBD.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   drivers/pci/hotplug/Kconfig   |  12 +
>   drivers/pci/hotplug/Makefile  |   3 +
>   drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 885 insertions(+)
>   create mode 100644 drivers/pci/hotplug/pnv_php.c
>
> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index df8caec..167c8ce 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>
>   	  When in doubt, say N.
>
> +config HOTPLUG_PCI_POWERNV
> +	tristate "PowerPC PowerNV PCI Hotplug driver"
> +	depends on PPC_POWERNV && EEH
> +	help
> +	  Say Y here if you run PowerPC PowerNV platform that supports
> +	  PCI Hotplug
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called pnv-php.
> +
> +	  When in doubt, say N.
> +
>   config HOTPLUG_PCI_RPA
>   	tristate "RPA PCI Hotplug driver"
>   	depends on PPC_PSERIES && EEH
> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> index b616e75..e33cdda 100644
> --- a/drivers/pci/hotplug/Makefile
> +++ b/drivers/pci/hotplug/Makefile
> @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
> +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
> @@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>   acpiphp-objs		:=	acpiphp_core.o	\
>   				acpiphp_glue.o
>
> +pnv-php-objs		:=	pnv_php.o
> +
>   rpaphp-objs		:=	rpaphp_core.o	\
>   				rpaphp_pci.o	\
>   				rpaphp_slot.o
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> new file mode 100644
> index 0000000..364ec36
> --- /dev/null
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -0,0 +1,870 @@
> +/*
> + * PCI Hotplug Driver for PowerPC PowerNV platform.
> + *
> + * Copyright Gavin Shan, IBM Corporation 2015.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/libfdt.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
> +
> +#include <asm/opal.h>
> +#include <asm/pnv-pci.h>
> +#include <asm/ppc-pci.h>
> +
> +#define DRIVER_VERSION	"0.1"
> +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
> +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
> +
> +struct pnv_php_slot {
> +	struct hotplug_slot		slot;
> +	struct hotplug_slot_info	slot_info;
> +	uint64_t			id;
> +	char				*name;
> +	int				slot_no;
> +	struct kref			kref;
> +#define PNV_PHP_STATE_INITIALIZED	0
> +#define PNV_PHP_STATE_REGISTERED	1
> +#define PNV_PHP_STATE_POPULATED		2
> +	int				state;
> +	struct device_node		*dn;
> +	struct pci_dev			*pdev;
> +	struct pci_bus			*bus;
> +	bool				power_state_check;
> +	int				power_state_confirmed;
> +#define PNV_PHP_POWER_CONFIRMED_INVALID	0
> +#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
> +#define PNV_PHP_POWER_CONFIRMED_FAIL	2
> +	struct opal_msg			*msg;
> +	void				*fdt;
> +	void				*dt;
> +	struct of_changeset		ocs;
> +	struct work_struct		work;
> +	wait_queue_head_t		queue;
> +	struct pnv_php_slot		*parent;
> +	struct list_head		children;
> +	struct list_head		link;
> +};
> +
> +static LIST_HEAD(pnv_php_slot_list);
> +static DEFINE_SPINLOCK(pnv_php_lock);
> +
> +static void pnv_php_register(struct device_node *dn);
> +static void pnv_php_unregister_one(struct device_node *dn);
> +static void pnv_php_unregister(struct device_node *dn);


The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(), 
pnv_php_unregister_children() instead.


Alistair, what do you reckon?


> +
> +static void pnv_php_free_slot(struct kref *kref)
> +{
> +	struct pnv_php_slot *php_slot = container_of(kref,
> +						     struct pnv_php_slot,
> +						     kref);
> +
> +	WARN_ON(!list_empty(&php_slot->children));
> +	kfree(php_slot->name);
> +	kfree(php_slot);
> +}
> +
> +static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
> +{
> +	if (!php_slot)


BUG_ON()?

> +		return;
> +
> +	kref_put(&php_slot->kref, pnv_php_free_slot);
> +}
> +
> +static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
> +					  struct pnv_php_slot *php_slot)
> +{
> +	struct pnv_php_slot *target, *tmp;
> +
> +	if (php_slot->dn == dn) {
> +		kref_get(&php_slot->kref);
> +		return php_slot;
> +	}
> +
> +	list_for_each_entry(tmp, &php_slot->children, link) {
> +		target = pnv_php_match(dn, tmp);
> +		if (target)
> +			return target;
> +	}
> +
> +	return NULL;
> +}
> +
> +static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot, *tmp;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
> +		php_slot = pnv_php_match(dn, tmp);
> +		if (php_slot) {
> +			spin_unlock_irqrestore(&pnv_php_lock, flags);
> +			return php_slot;
> +		}
> +	}
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	return NULL;
> +}
> +
> +/*
> + * Remove pdn for all children of the indicated device node.
> + * The function should remove pdn in a depth-first manner.
> + */
> +static void pnv_php_rmv_pdns(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_rmv_pdns(child);
> +
> +		pci_remove_device_node_info(child);
> +	}
> +}
> +
> +/*
> + * Remove all child nodes of the indicated device nodes. The
> + * function should remove device nodes in depth-first manner.
> + */
> +static int pnv_php_rmv_device_nodes(struct device_node *parent)
> +{
> +	struct device_node *dn, *child;
> +	int ret = 0;
> +
> +	for_each_child_of_node(parent, dn) {
> +		ret = pnv_php_rmv_device_nodes(dn);
> +		if (ret)
> +			return ret;
> +
> +		child = of_get_next_child(dn, NULL);
> +		if (child) {
> +			of_node_put(child);
> +			of_node_put(dn);
> +			pr_err("%s: Alive children of node <%s>\n",
> +			       __func__, of_node_full_name(dn));
> +			return -EBUSY;
> +		}
> +
> +		of_detach_node(dn);
> +		of_node_put(dn);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * The function processes the message sent by firmware
> + * to remove all device tree nodes beneath the slot's
> + * nodes and the associated auxiliary data.
> + */
> +static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
> +{
> +	int ret;
> +
> +	pnv_php_rmv_pdns(php_slot->dn);
> +
> +	/*
> +	 * If the device sub-tree was created from OF changeset, simply
> +	 * to revert that. Otherwise, the device nodes in the sub-tree
> +	 * need to be iterated and detached.
> +	 */
> +	if (php_slot->fdt) {
> +		of_changeset_destroy(&php_slot->ocs);
> +		kfree(php_slot->dt);
> +		kfree(php_slot->fdt);
> +		php_slot->dt        = NULL;
> +		php_slot->dn->child = NULL;
> +		php_slot->fdt       = NULL;
> +		php_slot->power_state_confirmed =
> +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +		wake_up_interruptible(&php_slot->queue);
> +		return;
> +	}
> +
> +	ret = pnv_php_rmv_device_nodes(php_slot->dn);
> +	if (!ret) {
> +		php_slot->power_state_confirmed =
> +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +	} else {
> +		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
> +		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
> +	}
> +
> +	wake_up_interruptible(&php_slot->queue);


I liked one wake_up_interruptible() better...
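
What I mean by a single wake-up, as a userspace model rather than a
replacement patch. struct slot_model, notify_waiters(), remove_ok() and
remove_busy() are hypothetical stand-ins for the slot, the wake-up call and
the node removal; only the control flow is the point:

```c
#include <assert.h>
#include <stdbool.h>

enum confirm { CONFIRM_INVALID = 0, CONFIRM_SUCCESS = 1, CONFIRM_FAIL = 2 };

struct slot_model {
	bool from_changeset;	/* sub-tree was created from a changeset */
	enum confirm confirmed;
	unsigned int wakeups;	/* counts notifications, for illustration */
};

/* Stand-ins for the node removal step: one succeeds, one fails. */
static int remove_ok(void)   { return 0; }
static int remove_busy(void) { return -16; }

static void notify_waiters(struct slot_model *s)
{
	s->wakeups++;
}

static void handle_poweroff(struct slot_model *s, int (*remove_nodes)(void))
{
	int ret = 0;

	assert(s && remove_nodes);
	if (s->from_changeset)
		s->from_changeset = false;	/* revert the changeset */
	else
		ret = remove_nodes();		/* detach nodes one by one */

	/* Both paths fall through to one status update and one wake-up. */
	s->confirmed = ret ? CONFIRM_FAIL : CONFIRM_SUCCESS;
	notify_waiters(s);
}
```

Both branches compute a result and then share the status update and the
notification, so there is exactly one place where waiters are woken.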



> +}
> +
> +static int pnv_php_populate_changeset(struct of_changeset *ocs,
> +				      struct device_node *dn)
> +{
> +	struct device_node *child;
> +	int ret = 0;
> +
> +	for_each_child_of_node(dn, child) {
> +		ret = of_changeset_attach_node(ocs, child);
> +		if (ret)
> +			break;
> +
> +		ret = pnv_php_populate_changeset(ocs, child);


I asked in v7 - maybe add "if (ret) break;" here?
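
The reason it matters: without the check, an error returned for a
grandchild is overwritten by the next sibling's successful attach, and the
caller sees 0. A self-contained userspace sketch of the same loop shape
follows. struct node, attach_node() and populate() are hypothetical and
only mimic the structure of pnv_php_populate_changeset():

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int id;
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/* Hypothetical attach step: fails for one chosen node id. */
static int attach_node(const struct node *n, int fail_id, int *attached)
{
	if (n->id == fail_id)
		return -1;

	(*attached)++;
	return 0;
}

/* Attach every descendant of 'parent', stopping at the first error. */
static int populate(const struct node *parent, int fail_id, int *attached)
{
	const struct node *child;
	int ret = 0;

	assert(parent && attached);
	for (child = parent->child; child; child = child->sibling) {
		ret = attach_node(child, fail_id, attached);
		if (ret)
			break;

		ret = populate(child, fail_id, attached);
		if (ret)	/* otherwise the next sibling overwrites ret */
			break;
	}

	return ret;
}
```

With a tree root -> {a -> {a1}, b} and a1 failing, this version returns the
error and never attaches b. Drop the second check and it would attach b and
return 0.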


> +	}
> +
> +	return ret;
> +}
> +
> +static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
> +{
> +	struct pci_controller *hose = (struct pci_controller *)data;
> +	struct pci_dn *pdn;
> +
> +	pdn = pci_add_device_node_info(hose, dn);
> +	if (!pdn)
> +		return ERR_PTR(-ENOMEM);
> +
> +	return NULL;
> +}
> +
> +static void pnv_php_add_pdns(struct pnv_php_slot *slot)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(slot->bus);
> +
> +	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
> +}
> +
> +static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
> +{
> +	void *fdt, *fdt1, *dt;
> +	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +	int ret;
> +
> +	/* We don't know the FDT blob size. We try to get it through
> +	 * maximal memory chunk and then copy it to another chunk that
> +	 * fits the real size.
> +	 */
> +	fdt1 = kzalloc(0x10000, GFP_KERNEL);
> +	if (!fdt1)
> +		goto error;
> +
> +	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
> +	if (ret)
> +		goto free_fdt1;
> +
> +	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
> +	if (!fdt)
> +		goto free_fdt1;
> +
> +	/* Unflatten device tree blob */
> +	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
> +	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
> +	if (!dt) {
> +		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
> +		goto free_fdt;
> +	}
> +
> +	/* Initialize and apply the changeset */
> +	of_changeset_init(&php_slot->ocs);
> +	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
> +			 ret);
> +		goto free_dt;
> +	}
> +
> +	php_slot->dn->child = NULL;
> +	ret = of_changeset_apply(&php_slot->ocs);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
> +			 ret);
> +		goto destroy_changeset;
> +	}
> +
> +	/* Add device node firmware data */
> +	pnv_php_add_pdns(php_slot);
> +	php_slot->fdt = fdt;
> +	php_slot->dt  = dt;
> +	goto out;
> +
> +destroy_changeset:
> +	of_changeset_destroy(&php_slot->ocs);
> +free_dt:
> +	kfree(dt);
> +	php_slot->dn->child = NULL;
> +free_fdt:
> +	kfree(fdt);
> +free_fdt1:
> +	kfree(fdt1);
> +error:
> +	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
> +out:
> +	/* Confirm status change */
> +	php_slot->power_state_confirmed = confirm;
> +	wake_up_interruptible(&php_slot->queue);
> +}
> +
> +static void pnv_php_work(struct work_struct *data)
> +{
> +	struct pnv_php_slot *php_slot = container_of(data,
> +						     struct pnv_php_slot,
> +						     work);
> +	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
> +
> +	if (event == OPAL_PCI_SLOT_POWER_OFF)
> +		pnv_php_handle_poweroff(php_slot);
> +	else
> +		pnv_php_handle_poweron(php_slot);
> +
> +	pnv_php_put_slot(php_slot);
> +}
> +
> +static int pnv_php_handle_msg(struct notifier_block *nb,
> +			      unsigned long type,
> +			      void *message)
> +{
> +	phandle h;
> +	struct device_node *dn;
> +	struct pnv_php_slot *php_slot;
> +	struct opal_msg *msg = message;
> +
> +	if (type != OPAL_MSG_PCI_HOTPLUG) {
> +		pr_warn("%s: Invalid message %ld received!\n",
> +			__func__, type);
> +		return NOTIFY_DONE;
> +	}
> +
> +	h = (phandle)be64_to_cpu(msg->params[1]);
> +	dn = of_find_node_by_phandle(h);
> +	if (!dn) {
> +		pr_warn("%s: No device node for phandle 0x%x\n",
> +			__func__, h);
> +		return NOTIFY_DONE;
> +	}
> +
> +	php_slot = pnv_php_find_slot(dn);
> +	if (!php_slot) {
> +		pr_warn("%s: No slot found for node <%s>\n",
> +			__func__, of_node_full_name(dn));
> +		of_node_put(dn);
> +		return NOTIFY_DONE;
> +	}
> +
> +	of_node_put(dn);
> +	php_slot->msg = msg;
> +	schedule_work(&php_slot->work);
> +	return NOTIFY_OK;
> +}
> +
> +static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	int ret;
> +
> +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> +	ret = pnv_pci_set_power_state(php_slot->id, state);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
> +			 ret, state ? "on" : "off");
> +		return ret;
> +	}
> +
> +	/* Continue to PCI probing after finalized device-tree. The
> +	 * device-tree might have been updated completely at this
> +	 * point. Thus we don't have to wait forever.
> +	 */
> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> +		return 0;
> +
> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
> +		return -EBUSY;
> +
> +	/* Wait for firmware to add or remove device sub-tree. When it's done,
> +	 * one signal is received from firmware.
> +	 */
> +	ret = wait_event_timeout(php_slot->queue,
> +				 php_slot->power_state_confirmed, 10 * HZ);
> +	if (!ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
> +			 ret, state ? "on" : "off");
> +		return -EBUSY;
> +	}
> +
> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> +		return 0;
> +
> +	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
> +		 php_slot->power_state_confirmed, state ? "on" : "off");
> +	return -EBUSY;
> +}
> +
> +static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	uint8_t power_state;


Uninitialized variable.


> +	int ret;
> +
> +	/*
> +	 * Retrieve power status from firmware. If we fail
> +	 * getting that, the power status fails back to
> +	 * be on.
> +	 */
> +	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
> +	if (ret) {
> +		*state = OPAL_PCI_SLOT_POWER_ON;
> +		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
> +			 ret);
> +	} else {
> +		*state = power_state;
> +		slot->info->power_status = power_state;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	uint8_t presence;

Uninitialized variable.


> +	int ret;
> +
> +	/*
> +	 * Retrieve presence status from firmware. If we can't
> +	 * get that, it will fail back to be empty.
> +	 */
> +	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
> +	if (ret >= 0) {
> +		*state = presence;
> +		slot->info->adapter_status = presence;
> +		ret = 0;
> +	} else {
> +		*state = OPAL_PCI_SLOT_EMPTY;
> +		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
> +			 ret);
> +	}
> +
> +	return ret;
> +}
> +
> +static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
> +{
> +	/* FIXME: Make it real once firmware supports it */

It still does not?


> +	slot->info->attention_status = state;
> +
> +	return 0;
> +}
> +
> +static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
> +{
> +	struct hotplug_slot *slot = &php_slot->slot;
> +	uint8_t presence, power_status;


Uninitialized variables.


> +	int ret;
> +
> +	/* Check if the slot has been configured */
> +	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
> +		return 0;
> +
> +	/* Retrieve slot presence status */
> +	ret = pnv_php_get_adapter_state(slot, &presence);
> +	if (ret)
> +		return ret;
> +
> +	/* Proceed if there have nothing behind the slot */
> +	if (presence == OPAL_PCI_SLOT_EMPTY)
> +		goto scan;
> +
> +	/*
> +	 * If the power suply to the slot is off, we can't detect

s/suply/supply/


> +	 * adapter presence state. That means we have to turn the
> +	 * slot on before going to probe slot's presence state.
> +	 *
> +	 * On the first time, we don't change the power status to
> +	 * boost system boot with assumption that the firmware
> +	 * supplies consistent slot power status: empty slot always
> +	 * has its power off and non-empty slot has its power on.
> +	 */
> +	if (!php_slot->power_state_check) {
> +		php_slot->power_state_check = true;
> +
> +		ret = pnv_php_get_power_state(slot, &power_status);
> +		if (ret)
> +			return ret;
> +
> +		if (power_status != OPAL_PCI_SLOT_POWER_ON)
> +			return 0;
> +	}
> +
> +	/* Check the power status. Scan the slot if that's already on */


s/that's/it is/


> +	ret = pnv_php_get_power_state(slot, &power_status);
> +	if (ret)
> +		return ret;
> +
> +	if (power_status == OPAL_PCI_SLOT_POWER_ON)
> +		goto scan;
> +
> +	/* Power is off, turn it on and then scan the slot */
> +	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
> +	if (ret)
> +		return ret;
> +
> +scan:
> +	if (presence == OPAL_PCI_SLOT_PRESENT) {
> +		if (rescan) {
> +			pci_lock_rescan_remove();
> +			pci_add_pci_devices(php_slot->bus);
> +			pci_unlock_rescan_remove();
> +		}
> +
> +		/* Rescan for child hotpluggable slots */
> +		php_slot->state = PNV_PHP_STATE_POPULATED;
> +		if (rescan)
> +			pnv_php_register(php_slot->dn);
> +	} else {
> +		php_slot->state = PNV_PHP_STATE_POPULATED;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_php_enable_slot(struct hotplug_slot *slot)
> +{
> +	struct pnv_php_slot *php_slot = container_of(slot,
> +						     struct pnv_php_slot, slot);
> +
> +	return pnv_php_enable(php_slot, true);
> +}
> +
> +static int pnv_php_disable_slot(struct hotplug_slot *slot)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	uint8_t power_state;
> +	int ret;
> +
> +	if (php_slot->state != PNV_PHP_STATE_POPULATED)
> +		return 0;
> +
> +	/* Remove all devices behind the slot */
> +	pci_lock_rescan_remove();
> +	pci_remove_pci_devices(php_slot->bus);
> +	pci_unlock_rescan_remove();
> +
> +	/* Detach the child hotpluggable slots */
> +	pnv_php_unregister(php_slot->dn);
> +
> +	/*
> +	 * Check the power status and turn it off if necessary. If we
> +	 * fail to get the power status, the power will be forced to
> +	 * be off.
> +	 */
> +	ret = pnv_php_get_power_state(slot, &power_state);
> +	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
> +		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
> +		if (ret)
> +			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",


Long line, checkpatch.pl should have warned :)


> +				 ret);
> +	}
> +
> +	/* Update slot state */
> +	php_slot->state = PNV_PHP_STATE_REGISTERED;
> +	return 0;
> +}
> +
> +static struct hotplug_slot_ops php_slot_ops = {
> +	.get_power_status	= pnv_php_get_power_state,
> +	.get_adapter_status	= pnv_php_get_adapter_state,
> +	.set_attention_status	= pnv_php_set_attention_state,
> +	.enable_slot		= pnv_php_enable_slot,
> +	.disable_slot		= pnv_php_disable_slot,
> +};
> +
> +static void pnv_php_release(struct hotplug_slot *slot)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	unsigned long flags;
> +
> +	/* Remove from global or child list */
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	list_del(&php_slot->link);
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	/* Detach from parent */
> +	pnv_php_put_slot(php_slot);
> +	pnv_php_put_slot(php_slot->parent);
> +}
> +
> +static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
> +{
> +	struct device_node *parent = dn;
> +	const __be64 *prop64;
> +	const __be32 *prop32;
> +
> +	/*
> +	 * The hotpluggable slot always has a compound Id, which
> +	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
> +	 * number, and compound indicator
> +	 */
> +	*id = (0x1ul << 63);


Is this bit from the same ID space as the 1<<60 used in pnv_eeh_bridge_reset()?
If so, it would be great to have all these ID bits defined in one place.


> +
> +	/* Bus/Slot/Function number */
> +	prop32 = of_get_property(dn, "reg", NULL);
> +	if (!prop32)
> +		return -ENXIO;
> +	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
> +
> +	/* PHB Id */
> +	while ((parent = of_get_parent(parent))) {
> +		if (!PCI_DN(parent)) {
> +			of_node_put(parent);
> +			break;
> +		}
> +
> +		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
> +		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
> +			of_node_put(parent);
> +			continue;
> +		}
> +
> +		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
> +		if (!prop64) {
> +			of_node_put(parent);
> +			return -ENXIO;
> +		}
> +
> +		*id |= be64_to_cpup(prop64);
> +		of_node_put(parent);
> +		return 0;
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot;
> +	struct pci_bus *bus;
> +	const char *label;
> +	uint64_t id;
> +
> +	label = of_get_property(dn, "ibm,slot-label", NULL);
> +	if (!label)
> +		return NULL;
> +
> +	if (pnv_php_get_slot_id(dn, &id))
> +		return NULL;
> +
> +	bus = pci_find_bus_by_node(dn);
> +	if (!bus)
> +		return NULL;
> +
> +	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
> +	if (!php_slot)
> +		return NULL;
> +
> +	php_slot->name = kstrdup(label, GFP_KERNEL);
> +	if (!php_slot->name) {
> +		kfree(php_slot);
> +		return NULL;
> +	}
> +
> +	if (dn->child && PCI_DN(dn->child))
> +		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
> +	else
> +		php_slot->slot_no = -1;   /* Placeholder slot */
> +
> +	kref_init(&php_slot->kref);
> +	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
> +	php_slot->dn	                = dn;
> +	php_slot->pdev	                = bus->self;
> +	php_slot->bus	                = bus;
> +	php_slot->id	                = id;
> +	php_slot->power_state_check     = false;
> +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> +	php_slot->slot.ops              = &php_slot_ops;
> +	php_slot->slot.info             = &php_slot->slot_info;
> +	php_slot->slot.release          = pnv_php_release;
> +	php_slot->slot.private          = php_slot;
> +
> +	INIT_WORK(&php_slot->work, pnv_php_work);
> +	init_waitqueue_head(&php_slot->queue);
> +	INIT_LIST_HEAD(&php_slot->children);
> +	INIT_LIST_HEAD(&php_slot->link);
> +
> +	return php_slot;
> +}
> +
> +static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
> +{
> +	struct pnv_php_slot *parent;
> +	struct device_node *dn = php_slot->dn;
> +	unsigned long flags;
> +	int ret;
> +
> +	/* Check if the slot is registered or not */
> +	parent = pnv_php_find_slot(php_slot->dn);
> +	if (parent) {
> +		pnv_php_put_slot(parent);
> +		return -EEXIST;
> +	}
> +
> +	/* Register PCI slot */
> +	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
> +			      php_slot->slot_no, php_slot->name);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
> +			 ret);
> +		return ret;
> +	}
> +
> +	/* Attach to the parent's child list or global list */
> +	while ((dn = of_get_parent(dn))) {
> +		if (!PCI_DN(dn)) {
> +			of_node_put(dn);
> +			break;
> +		}
> +
> +		parent = pnv_php_find_slot(dn);
> +		if (parent) {
> +			of_node_put(dn);
> +			break;
> +		}
> +
> +		of_node_put(dn);
> +	}
> +
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	php_slot->parent = parent;
> +	if (parent)
> +		list_add_tail(&php_slot->link, &parent->children);
> +	else
> +		list_add_tail(&php_slot->link, &pnv_php_slot_list);
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	php_slot->state = PNV_PHP_STATE_REGISTERED;
> +	return 0;
> +}
> +
> +static int pnv_php_register_one(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot;
> +	const __be32 *prop32;
> +	int ret;
> +
> +	/* Check if it's hotpluggable slot */
> +	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	php_slot = pnv_php_alloc_slot(dn);
> +	if (!php_slot)
> +		return -ENODEV;
> +
> +	ret = pnv_php_register_slot(php_slot);
> +	if (ret)
> +		goto free_slot;
> +
> +	ret = pnv_php_enable(php_slot, false);
> +	if (ret)
> +		goto unregister_slot;
> +
> +	return 0;
> +
> +unregister_slot:
> +	pnv_php_unregister_one(php_slot->dn);
> +free_slot:
> +	pnv_php_put_slot(php_slot);
> +	return ret;
> +}
> +
> +static void pnv_php_register(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	/*
> +	 * The parent slots should be registered before their
> +	 * child slots.
> +	 */
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_register_one(child);
> +		pnv_php_register(child);
> +	}
> +}
> +
> +static void pnv_php_unregister_one(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot;
> +
> +	php_slot = pnv_php_find_slot(dn);
> +	if (!php_slot)
> +		return;
> +
> +	pnv_php_put_slot(php_slot);
> +	pci_hp_deregister(&php_slot->slot);
> +}
> +
> +static void pnv_php_unregister(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	/* The child slots should go before their parent slots */
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_unregister(child);
> +		pnv_php_unregister_one(child);
> +	}
> +}
> +
> +static struct notifier_block php_msg_nb = {
> +	.notifier_call	= pnv_php_handle_msg,
> +	.next		= NULL,
> +	.priority	= 0,
> +};
> +
> +static int __init pnv_php_init(void)
> +{
> +	struct device_node *dn;
> +	int ret;
> +
> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> +
> +	/* Register hotplug message handler */
> +	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
> +	if (ret) {
> +		pr_warn("%s: Error %d registering hotplug notifier\n",
> +			__func__, ret);
> +		return ret;
> +	}
> +
> +	/* Scan PHB nodes and their children */
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		pnv_php_register(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		pnv_php_register(dn);
> +
> +	return 0;
> +}
> +
> +static void __exit pnv_php_exit(void)
> +{
> +	struct device_node *dn;
> +
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		pnv_php_unregister(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		pnv_php_unregister(dn);
> +
> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
> +}
> +
> +module_init(pnv_php_init);
> +module_exit(pnv_php_exit);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
>


-- 
Alexey


* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
@ 2016-04-19 10:36         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-19 10:36 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev, Alistair Popple
  Cc: linux-pci, devicetree, benh, mpe, dja, bhelgaas, robherring2,
	grant.likely

On 02/17/2016 02:44 PM, Gavin Shan wrote:
> This adds standalone driver to support PCI hotplug for PowerPC PowerNV
> platform that runs on top of skiboot firmware. The firmware identifies
> hotpluggable slots and marked their device tree node with proper
> "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans
> device tree nodes to create/register PCI hotplug slot accordingly.
>
> The PCI slots are organized in fashion of tree, which means one
> PCI slot might have parent PCI slot and parent PCI slot possibly
> contains multiple child PCI slots. At the plugging time, the parent
> PCI slot is populated before its children. The child PCI slots are
> removed before their parent PCI slot can be removed from the system.
>
> If the skiboot firmware doesn't support slot status retrieval, the PCI
> slot device node shouldn't have property "ibm,reset-by-firmware". In
> that case, none of valid PCI slots will be detected from device tree.
> The skiboot firmware doesn't export the capability to access attention
> LEDs yet and it's something for TBD.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   drivers/pci/hotplug/Kconfig   |  12 +
>   drivers/pci/hotplug/Makefile  |   3 +
>   drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 885 insertions(+)
>   create mode 100644 drivers/pci/hotplug/pnv_php.c
>
> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index df8caec..167c8ce 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>
>   	  When in doubt, say N.
>
> +config HOTPLUG_PCI_POWERNV
> +	tristate "PowerPC PowerNV PCI Hotplug driver"
> +	depends on PPC_POWERNV && EEH
> +	help
> +	  Say Y here if you run PowerPC PowerNV platform that supports
> +	  PCI Hotplug
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called pnv-php.
> +
> +	  When in doubt, say N.
> +
>   config HOTPLUG_PCI_RPA
>   	tristate "RPA PCI Hotplug driver"
>   	depends on PPC_PSERIES && EEH
> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> index b616e75..e33cdda 100644
> --- a/drivers/pci/hotplug/Makefile
> +++ b/drivers/pci/hotplug/Makefile
> @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
> +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
> @@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>   acpiphp-objs		:=	acpiphp_core.o	\
>   				acpiphp_glue.o
>
> +pnv-php-objs		:=	pnv_php.o
> +
>   rpaphp-objs		:=	rpaphp_core.o	\
>   				rpaphp_pci.o	\
>   				rpaphp_slot.o
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> new file mode 100644
> index 0000000..364ec36
> --- /dev/null
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -0,0 +1,870 @@
> +/*
> + * PCI Hotplug Driver for PowerPC PowerNV platform.
> + *
> + * Copyright Gavin Shan, IBM Corporation 2015.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/libfdt.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
> +
> +#include <asm/opal.h>
> +#include <asm/pnv-pci.h>
> +#include <asm/ppc-pci.h>
> +
> +#define DRIVER_VERSION	"0.1"
> +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
> +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
> +
> +struct pnv_php_slot {
> +	struct hotplug_slot		slot;
> +	struct hotplug_slot_info	slot_info;
> +	uint64_t			id;
> +	char				*name;
> +	int				slot_no;
> +	struct kref			kref;
> +#define PNV_PHP_STATE_INITIALIZED	0
> +#define PNV_PHP_STATE_REGISTERED	1
> +#define PNV_PHP_STATE_POPULATED		2
> +	int				state;
> +	struct device_node		*dn;
> +	struct pci_dev			*pdev;
> +	struct pci_bus			*bus;
> +	bool				power_state_check;
> +	int				power_state_confirmed;
> +#define PNV_PHP_POWER_CONFIRMED_INVALID	0
> +#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
> +#define PNV_PHP_POWER_CONFIRMED_FAIL	2
> +	struct opal_msg			*msg;
> +	void				*fdt;
> +	void				*dt;
> +	struct of_changeset		ocs;
> +	struct work_struct		work;
> +	wait_queue_head_t		queue;
> +	struct pnv_php_slot		*parent;
> +	struct list_head		children;
> +	struct list_head		link;
> +};
> +
> +static LIST_HEAD(pnv_php_slot_list);
> +static DEFINE_SPINLOCK(pnv_php_lock);
> +
> +static void pnv_php_register(struct device_node *dn);
> +static void pnv_php_unregister_one(struct device_node *dn);
> +static void pnv_php_unregister(struct device_node *dn);


The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(), 
pnv_php_unregister_children() instead.


Alistair, what do you reckon?


> +
> +static void pnv_php_free_slot(struct kref *kref)
> +{
> +	struct pnv_php_slot *php_slot = container_of(kref,
> +						     struct pnv_php_slot,
> +						     kref);
> +
> +	WARN_ON(!list_empty(&php_slot->children));
> +	kfree(php_slot->name);
> +	kfree(php_slot);
> +}
> +
> +static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
> +{
> +	if (!php_slot)


BUG_ON()?

> +		return;
> +
> +	kref_put(&php_slot->kref, pnv_php_free_slot);
> +}
> +
> +static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
> +					  struct pnv_php_slot *php_slot)
> +{
> +	struct pnv_php_slot *target, *tmp;
> +
> +	if (php_slot->dn == dn) {
> +		kref_get(&php_slot->kref);
> +		return php_slot;
> +	}
> +
> +	list_for_each_entry(tmp, &php_slot->children, link) {
> +		target = pnv_php_match(dn, tmp);
> +		if (target)
> +			return target;
> +	}
> +
> +	return NULL;
> +}
> +
> +static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot, *tmp;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
> +		php_slot = pnv_php_match(dn, tmp);
> +		if (php_slot) {
> +			spin_unlock_irqrestore(&pnv_php_lock, flags);
> +			return php_slot;
> +		}
> +	}
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	return NULL;
> +}
> +
> +/*
> + * Remove pdn for all children of the indicated device node.
> + * The function should remove pdn in a depth-first manner.
> + */
> +static void pnv_php_rmv_pdns(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_rmv_pdns(child);
> +
> +		pci_remove_device_node_info(child);
> +	}
> +}
> +
> +/*
> + * Remove all child nodes of the indicated device nodes. The
> + * function should remove device nodes in depth-first manner.
> + */
> +static int pnv_php_rmv_device_nodes(struct device_node *parent)
> +{
> +	struct device_node *dn, *child;
> +	int ret = 0;
> +
> +	for_each_child_of_node(parent, dn) {
> +		ret = pnv_php_rmv_device_nodes(dn);
> +		if (ret)
> +			return ret;
> +
> +		child = of_get_next_child(dn, NULL);
> +		if (child) {
> +			of_node_put(child);
> +			of_node_put(dn);
> +			pr_err("%s: Alive children of node <%s>\n",
> +			       __func__, of_node_full_name(dn));
> +			return -EBUSY;
> +		}
> +
> +		of_detach_node(dn);
> +		of_node_put(dn);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * The function processes the message sent by firmware
> + * to remove all device tree nodes beneath the slot's
> + * nodes and the associated auxiliary data.
> + */
> +static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
> +{
> +	int ret;
> +
> +	pnv_php_rmv_pdns(php_slot->dn);
> +
> +	/*
> +	 * If the device sub-tree was created from OF changeset, simply
> +	 * to revert that. Otherwise, the device nodes in the sub-tree
> +	 * need to be iterated and detached.
> +	 */
> +	if (php_slot->fdt) {
> +		of_changeset_destroy(&php_slot->ocs);
> +		kfree(php_slot->dt);
> +		kfree(php_slot->fdt);
> +		php_slot->dt        = NULL;
> +		php_slot->dn->child = NULL;
> +		php_slot->fdt       = NULL;
> +		php_slot->power_state_confirmed =
> +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +		wake_up_interruptible(&php_slot->queue);
> +		return;
> +	}
> +
> +	ret = pnv_php_rmv_device_nodes(php_slot->dn);
> +	if (!ret) {
> +		php_slot->power_state_confirmed =
> +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +	} else {
> +		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
> +		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
> +	}
> +
> +	wake_up_interruptible(&php_slot->queue);


I liked it better with a single wake_up_interruptible() call...
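
For illustration, the single-wakeup shape could look roughly like the
standalone sketch below. It is a userspace toy, not the driver code: the
toy_* names, the struct layout and the wakeup counter are all hypothetical
stand-ins for php_slot, the OF changeset revert, pnv_php_rmv_device_nodes()
and wake_up_interruptible().

```c
#include <stdbool.h>

enum toy_confirm {
	TOY_CONFIRMED_INVALID,
	TOY_CONFIRMED_SUCCESS,
	TOY_CONFIRMED_FAIL,
};

/* Hypothetical stand-in for struct pnv_php_slot. */
struct toy_slot {
	bool has_fdt;			/* sub-tree came from an OF changeset */
	int rmv_result;			/* result the toy node removal returns */
	enum toy_confirm confirmed;
	int wakeups;			/* counts calls to the toy wakeup */
};

/* Stands in for wake_up_interruptible(&php_slot->queue). */
static void toy_wake_up(struct toy_slot *s)
{
	s->wakeups++;
}

static void toy_handle_poweroff(struct toy_slot *s)
{
	int ret = 0;

	if (s->has_fdt)
		s->has_fdt = false;	/* stands in for reverting the changeset */
	else
		ret = s->rmv_result;	/* stands in for detaching the nodes */

	/* Both branches fall through to one confirmation and one wakeup. */
	s->confirmed = ret ? TOY_CONFIRMED_FAIL : TOY_CONFIRMED_SUCCESS;
	toy_wake_up(s);
}
```

With this shape the changeset branch no longer needs its own early return,
so the waiter is always woken from exactly one place.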



> +}
> +
> +static int pnv_php_populate_changeset(struct of_changeset *ocs,
> +				      struct device_node *dn)
> +{
> +	struct device_node *child;
> +	int ret = 0;
> +
> +	for_each_child_of_node(dn, child) {
> +		ret = of_changeset_attach_node(ocs, child);
> +		if (ret)
> +			break;
> +
> +		ret = pnv_php_populate_changeset(ocs, child);


I asked this in v7: maybe add "if (ret) break;" here?
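
To show why the extra check matters, here is a small standalone sketch of
the same recursion on a toy tree. Everything named toy_* is hypothetical and
only models the control flow; toy_attach() stands in for
of_changeset_attach_node(). Without the second check, an error returned from
a child's sub-tree is overwritten when the loop moves on to the next sibling.

```c
#include <stddef.h>

/* Toy stand-in for struct device_node: first child plus next sibling. */
struct toy_node {
	struct toy_node *child;
	struct toy_node *sibling;
	int attach_err;		/* error the toy attach step returns for this node */
	int attached;		/* set once the node has been attached */
};

/* Stands in for of_changeset_attach_node(); hypothetical helper. */
static int toy_attach(struct toy_node *n)
{
	if (n->attach_err)
		return n->attach_err;
	n->attached = 1;
	return 0;
}

static int toy_populate(struct toy_node *parent)
{
	struct toy_node *child;
	int ret = 0;

	for (child = parent->child; child; child = child->sibling) {
		ret = toy_attach(child);
		if (ret)
			break;

		ret = toy_populate(child);
		if (ret)
			break;	/* otherwise the next sibling overwrites ret */
	}

	return ret;
}
```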


> +	}
> +
> +	return ret;
> +}
> +
> +static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
> +{
> +	struct pci_controller *hose = (struct pci_controller *)data;
> +	struct pci_dn *pdn;
> +
> +	pdn = pci_add_device_node_info(hose, dn);
> +	if (!pdn)
> +		return ERR_PTR(-ENOMEM);
> +
> +	return NULL;
> +}
> +
> +static void pnv_php_add_pdns(struct pnv_php_slot *slot)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(slot->bus);
> +
> +	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
> +}
> +
> +static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
> +{
> +	void *fdt, *fdt1, *dt;
> +	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +	int ret;
> +
> +	/* We don't know the FDT blob size. We try to get it through
> +	 * maximal memory chunk and then copy it to another chunk that
> +	 * fits the real size.
> +	 */
> +	fdt1 = kzalloc(0x10000, GFP_KERNEL);
> +	if (!fdt1)
> +		goto error;
> +
> +	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
> +	if (ret)
> +		goto free_fdt1;
> +
> +	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
> +	if (!fdt)
> +		goto free_fdt1;
> +
> +	/* Unflatten device tree blob */
> +	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
> +	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
> +	if (!dt) {
> +		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
> +		goto free_fdt;
> +	}
> +
> +	/* Initialize and apply the changeset */
> +	of_changeset_init(&php_slot->ocs);
> +	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
> +			 ret);
> +		goto free_dt;
> +	}
> +
> +	php_slot->dn->child = NULL;
> +	ret = of_changeset_apply(&php_slot->ocs);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
> +			 ret);
> +		goto destroy_changeset;
> +	}
> +
> +	/* Add device node firmware data */
> +	pnv_php_add_pdns(php_slot);
> +	php_slot->fdt = fdt;
> +	php_slot->dt  = dt;
> +	goto out;
> +
> +destroy_changeset:
> +	of_changeset_destroy(&php_slot->ocs);
> +free_dt:
> +	kfree(dt);
> +	php_slot->dn->child = NULL;
> +free_fdt:
> +	kfree(fdt);
> +free_fdt1:
> +	kfree(fdt1);
> +error:
> +	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
> +out:
> +	/* Confirm status change */
> +	php_slot->power_state_confirmed = confirm;
> +	wake_up_interruptible(&php_slot->queue);
> +}
> +
> +static void pnv_php_work(struct work_struct *data)
> +{
> +	struct pnv_php_slot *php_slot = container_of(data,
> +						     struct pnv_php_slot,
> +						     work);
> +	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
> +
> +	if (event == OPAL_PCI_SLOT_POWER_OFF)
> +		pnv_php_handle_poweroff(php_slot);
> +	else
> +		pnv_php_handle_poweron(php_slot);
> +
> +	pnv_php_put_slot(php_slot);
> +}
> +
> +static int pnv_php_handle_msg(struct notifier_block *nb,
> +			      unsigned long type,
> +			      void *message)
> +{
> +	phandle h;
> +	struct device_node *dn;
> +	struct pnv_php_slot *php_slot;
> +	struct opal_msg *msg = message;
> +
> +	if (type != OPAL_MSG_PCI_HOTPLUG) {
> +		pr_warn("%s: Invalid message %ld received!\n",
> +			__func__, type);
> +		return NOTIFY_DONE;
> +	}
> +
> +	h = (phandle)be64_to_cpu(msg->params[1]);
> +	dn = of_find_node_by_phandle(h);
> +	if (!dn) {
> +		pr_warn("%s: No device node for phandle 0x%x\n",
> +			__func__, h);
> +		return NOTIFY_DONE;
> +	}
> +
> +	php_slot = pnv_php_find_slot(dn);
> +	if (!php_slot) {
> +		pr_warn("%s: No slot found for node <%s>\n",
> +			__func__, of_node_full_name(dn));
> +		of_node_put(dn);
> +		return NOTIFY_DONE;
> +	}
> +
> +	of_node_put(dn);
> +	php_slot->msg = msg;
> +	schedule_work(&php_slot->work);
> +	return NOTIFY_OK;
> +}
> +
> +static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	int ret;
> +
> +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> +	ret = pnv_pci_set_power_state(php_slot->id, state);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
> +			 ret, state ? "on" : "off");
> +		return ret;
> +	}
> +
> +	/* Continue to PCI probing after finalized device-tree. The
> +	 * device-tree might have been updated completely at this
> +	 * point. Thus we don't have to wait forever.
> +	 */
> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> +		return 0;
> +
> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
> +		return -EBUSY;
> +
> +	/* Wait for firmware to add or remove device sub-tree. When it's done,
> +	 * one signal is received from firmware.
> +	 */
> +	ret = wait_event_timeout(php_slot->queue,
> +				 php_slot->power_state_confirmed, 10 * HZ);
> +	if (!ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
> +			 ret, state ? "on" : "off");
> +		return -EBUSY;
> +	}
> +
> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> +		return 0;
> +
> +	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
> +		 php_slot->power_state_confirmed, state ? "on" : "off");
> +	return -EBUSY;
> +}
> +
> +static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	uint8_t power_state;


Uninitialized variable.


> +	int ret;
> +
> +	/*
> +	 * Retrieve power status from firmware. If we fail
> +	 * getting that, the power status fails back to
> +	 * be on.
> +	 */
> +	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
> +	if (ret) {
> +		*state = OPAL_PCI_SLOT_POWER_ON;
> +		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
> +			 ret);
> +	} else {
> +		*state = power_state;
> +		slot->info->power_status = power_state;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	uint8_t presence;

Uninitialized variable.


> +	int ret;
> +
> +	/*
> +	 * Retrieve presence status from firmware. If we can't
> +	 * get that, it will fail back to be empty.
> +	 */
> +	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
> +	if (ret >= 0) {
> +		*state = presence;
> +		slot->info->adapter_status = presence;
> +		ret = 0;
> +	} else {
> +		*state = OPAL_PCI_SLOT_EMPTY;
> +		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
> +			 ret);
> +	}
> +
> +	return ret;
> +}
> +
> +static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
> +{
> +	/* FIXME: Make it real once firmware supports it */

Does the firmware still not support it?


> +	slot->info->attention_status = state;
> +
> +	return 0;
> +}
> +
> +static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
> +{
> +	struct hotplug_slot *slot = &php_slot->slot;
> +	uint8_t presence, power_status;


Uninitialized variables.


> +	int ret;
> +
> +	/* Check if the slot has been configured */
> +	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
> +		return 0;
> +
> +	/* Retrieve slot presence status */
> +	ret = pnv_php_get_adapter_state(slot, &presence);
> +	if (ret)
> +		return ret;
> +
> +	/* Proceed if there have nothing behind the slot */
> +	if (presence == OPAL_PCI_SLOT_EMPTY)
> +		goto scan;
> +
> +	/*
> +	 * If the power suply to the slot is off, we can't detect

s/suply/supply/


> +	 * adapter presence state. That means we have to turn the
> +	 * slot on before going to probe slot's presence state.
> +	 *
> +	 * On the first time, we don't change the power status to
> +	 * boost system boot with assumption that the firmware
> +	 * supplies consistent slot power status: empty slot always
> +	 * has its power off and non-empty slot has its power on.
> +	 */
> +	if (!php_slot->power_state_check) {
> +		php_slot->power_state_check = true;
> +
> +		ret = pnv_php_get_power_state(slot, &power_status);
> +		if (ret)
> +			return ret;
> +
> +		if (power_status != OPAL_PCI_SLOT_POWER_ON)
> +			return 0;
> +	}
> +
> +	/* Check the power status. Scan the slot if that's already on */


s/that's/it is/


> +	ret = pnv_php_get_power_state(slot, &power_status);
> +	if (ret)
> +		return ret;
> +
> +	if (power_status == OPAL_PCI_SLOT_POWER_ON)
> +		goto scan;
> +
> +	/* Power is off, turn it on and then scan the slot */
> +	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
> +	if (ret)
> +		return ret;
> +
> +scan:
> +	if (presence == OPAL_PCI_SLOT_PRESENT) {
> +		if (rescan) {
> +			pci_lock_rescan_remove();
> +			pci_add_pci_devices(php_slot->bus);
> +			pci_unlock_rescan_remove();
> +		}
> +
> +		/* Rescan for child hotpluggable slots */
> +		php_slot->state = PNV_PHP_STATE_POPULATED;
> +		if (rescan)
> +			pnv_php_register(php_slot->dn);
> +	} else {
> +		php_slot->state = PNV_PHP_STATE_POPULATED;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_php_enable_slot(struct hotplug_slot *slot)
> +{
> +	struct pnv_php_slot *php_slot = container_of(slot,
> +						     struct pnv_php_slot, slot);
> +
> +	return pnv_php_enable(php_slot, true);
> +}
> +
> +static int pnv_php_disable_slot(struct hotplug_slot *slot)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	uint8_t power_state;
> +	int ret;
> +
> +	if (php_slot->state != PNV_PHP_STATE_POPULATED)
> +		return 0;
> +
> +	/* Remove all devices behind the slot */
> +	pci_lock_rescan_remove();
> +	pci_remove_pci_devices(php_slot->bus);
> +	pci_unlock_rescan_remove();
> +
> +	/* Detach the child hotpluggable slots */
> +	pnv_php_unregister(php_slot->dn);
> +
> +	/*
> +	 * Check the power status and turn it off if necessary. If we
> +	 * fail to get the power status, the power will be forced to
> +	 * be off.
> +	 */
> +	ret = pnv_php_get_power_state(slot, &power_state);
> +	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
> +		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
> +		if (ret)
> +			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",


Long line, checkpatch.pl should have warned :)


> +				 ret);
> +	}
> +
> +	/* Update slot state */
> +	php_slot->state = PNV_PHP_STATE_REGISTERED;
> +	return 0;
> +}
> +
> +static struct hotplug_slot_ops php_slot_ops = {
> +	.get_power_status	= pnv_php_get_power_state,
> +	.get_adapter_status	= pnv_php_get_adapter_state,
> +	.set_attention_status	= pnv_php_set_attention_state,
> +	.enable_slot		= pnv_php_enable_slot,
> +	.disable_slot		= pnv_php_disable_slot,
> +};
> +
> +static void pnv_php_release(struct hotplug_slot *slot)
> +{
> +	struct pnv_php_slot *php_slot = slot->private;
> +	unsigned long flags;
> +
> +	/* Remove from global or child list */
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	list_del(&php_slot->link);
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	/* Detach from parent */
> +	pnv_php_put_slot(php_slot);
> +	pnv_php_put_slot(php_slot->parent);
> +}
> +
> +static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
> +{
> +	struct device_node *parent = dn;
> +	const __be64 *prop64;
> +	const __be32 *prop32;
> +
> +	/*
> +	 * The hotpluggable slot always has a compound Id, which
> +	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
> +	 * number, and compound indicator
> +	 */
> +	*id = (0x1ul << 63);


Is this bit from the same ID space as the 1<<60 used in pnv_eeh_bridge_reset()?
If so, it would be great to have all these ID bits defined in one place.
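
As an illustration of "one place", a shared definition could look roughly
like the standalone sketch below. The macro and helper names are made up for
this example. The layout only mirrors what pnv_php_get_slot_id() does in this
patch (bit 63 as compound indicator, (reg & 0x00ffff00) << 8, OR-ed with the
PHB ID); whether the EEH code uses the same space is the open question above.

```c
#include <stdint.h>

/* Hypothetical names; layout copied from pnv_php_get_slot_id() in this patch. */
#define TOY_SLOT_ID_COMPOUND	(UINT64_C(1) << 63)	/* compound indicator */
#define TOY_SLOT_ID_BDFN_MASK	UINT64_C(0x00ffff00)	/* bus/slot/function bits of "reg" */
#define TOY_SLOT_ID_BDFN_SHIFT	8

/* Build the compound slot ID from the PHB ID and the first "reg" cell. */
static uint64_t toy_slot_id(uint64_t phb_id, uint32_t reg)
{
	return TOY_SLOT_ID_COMPOUND |
	       (((uint64_t)reg & TOY_SLOT_ID_BDFN_MASK) << TOY_SLOT_ID_BDFN_SHIFT) |
	       phb_id;
}
```

Using a fixed-width 64-bit constant for the indicator bit also avoids
depending on the width of unsigned long, which the 0x1ul form does.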


> +
> +	/* Bus/Slot/Function number */
> +	prop32 = of_get_property(dn, "reg", NULL);
> +	if (!prop32)
> +		return -ENXIO;
> +	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
> +
> +	/* PHB Id */
> +	while ((parent = of_get_parent(parent))) {
> +		if (!PCI_DN(parent)) {
> +			of_node_put(parent);
> +			break;
> +		}
> +
> +		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
> +		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
> +			of_node_put(parent);
> +			continue;
> +		}
> +
> +		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
> +		if (!prop64) {
> +			of_node_put(parent);
> +			return -ENXIO;
> +		}
> +
> +		*id |= be64_to_cpup(prop64);
> +		of_node_put(parent);
> +		return 0;
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot;
> +	struct pci_bus *bus;
> +	const char *label;
> +	uint64_t id;
> +
> +	label = of_get_property(dn, "ibm,slot-label", NULL);
> +	if (!label)
> +		return NULL;
> +
> +	if (pnv_php_get_slot_id(dn, &id))
> +		return NULL;
> +
> +	bus = pci_find_bus_by_node(dn);
> +	if (!bus)
> +		return NULL;
> +
> +	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
> +	if (!php_slot)
> +		return NULL;
> +
> +	php_slot->name = kstrdup(label, GFP_KERNEL);
> +	if (!php_slot->name) {
> +		kfree(php_slot);
> +		return NULL;
> +	}
> +
> +	if (dn->child && PCI_DN(dn->child))
> +		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
> +	else
> +		php_slot->slot_no = -1;   /* Placeholder slot */
> +
> +	kref_init(&php_slot->kref);
> +	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
> +	php_slot->dn	                = dn;
> +	php_slot->pdev	                = bus->self;
> +	php_slot->bus	                = bus;
> +	php_slot->id	                = id;
> +	php_slot->power_state_check     = false;
> +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> +	php_slot->slot.ops              = &php_slot_ops;
> +	php_slot->slot.info             = &php_slot->slot_info;
> +	php_slot->slot.release          = pnv_php_release;
> +	php_slot->slot.private          = php_slot;
> +
> +	INIT_WORK(&php_slot->work, pnv_php_work);
> +	init_waitqueue_head(&php_slot->queue);
> +	INIT_LIST_HEAD(&php_slot->children);
> +	INIT_LIST_HEAD(&php_slot->link);
> +
> +	return php_slot;
> +}
> +
> +static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
> +{
> +	struct pnv_php_slot *parent;
> +	struct device_node *dn = php_slot->dn;
> +	unsigned long flags;
> +	int ret;
> +
> +	/* Check if the slot is registered or not */
> +	parent = pnv_php_find_slot(php_slot->dn);
> +	if (parent) {
> +		pnv_php_put_slot(parent);
> +		return -EEXIST;
> +	}
> +
> +	/* Register PCI slot */
> +	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
> +			      php_slot->slot_no, php_slot->name);
> +	if (ret) {
> +		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
> +			 ret);
> +		return ret;
> +	}
> +
> +	/* Attach to the parent's child list or global list */
> +	while ((dn = of_get_parent(dn))) {
> +		if (!PCI_DN(dn)) {
> +			of_node_put(dn);
> +			break;
> +		}
> +
> +		parent = pnv_php_find_slot(dn);
> +		if (parent) {
> +			of_node_put(dn);
> +			break;
> +		}
> +
> +		of_node_put(dn);
> +	}
> +
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	php_slot->parent = parent;
> +	if (parent)
> +		list_add_tail(&php_slot->link, &parent->children);
> +	else
> +		list_add_tail(&php_slot->link, &pnv_php_slot_list);
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	php_slot->state = PNV_PHP_STATE_REGISTERED;
> +	return 0;
> +}
> +
> +static int pnv_php_register_one(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot;
> +	const __be32 *prop32;
> +	int ret;
> +
> +	/* Check if it's hotpluggable slot */
> +	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	php_slot = pnv_php_alloc_slot(dn);
> +	if (!php_slot)
> +		return -ENODEV;
> +
> +	ret = pnv_php_register_slot(php_slot);
> +	if (ret)
> +		goto free_slot;
> +
> +	ret = pnv_php_enable(php_slot, false);
> +	if (ret)
> +		goto unregister_slot;
> +
> +	return 0;
> +
> +unregister_slot:
> +	pnv_php_unregister_one(php_slot->dn);
> +free_slot:
> +	pnv_php_put_slot(php_slot);
> +	return ret;
> +}
> +
> +static void pnv_php_register(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	/*
> +	 * The parent slots should be registered before their
> +	 * child slots.
> +	 */
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_register_one(child);
> +		pnv_php_register(child);
> +	}
> +}
> +
> +static void pnv_php_unregister_one(struct device_node *dn)
> +{
> +	struct pnv_php_slot *php_slot;
> +
> +	php_slot = pnv_php_find_slot(dn);
> +	if (!php_slot)
> +		return;
> +
> +	pnv_php_put_slot(php_slot);
> +	pci_hp_deregister(&php_slot->slot);
> +}
> +
> +static void pnv_php_unregister(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	/* The child slots should go before their parent slots */
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_unregister(child);
> +		pnv_php_unregister_one(child);
> +	}
> +}
> +
> +static struct notifier_block php_msg_nb = {
> +	.notifier_call	= pnv_php_handle_msg,
> +	.next		= NULL,
> +	.priority	= 0,
> +};
> +
> +static int __init pnv_php_init(void)
> +{
> +	struct device_node *dn;
> +	int ret;
> +
> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> +
> +	/* Register hotplug message handler */
> +	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
> +	if (ret) {
> +		pr_warn("%s: Error %d registering hotplug notifier\n",
> +			__func__, ret);
> +		return ret;
> +	}
> +
> +	/* Scan PHB nodes and their children */
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		pnv_php_register(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		pnv_php_register(dn);
> +
> +	return 0;
> +}
> +
> +static void __exit pnv_php_exit(void)
> +{
> +	struct device_node *dn;
> +
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		pnv_php_unregister(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		pnv_php_unregister(dn);
> +
> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
> +}
> +
> +module_init(pnv_php_init);
> +module_exit(pnv_php_exit);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
>


-- 
Alexey
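
The registration order in the quoted pnv_php_register() and
pnv_php_unregister() can be illustrated outside the kernel. This is a
minimal sketch, not the driver code: "struct node", "struct visit_log" and
both function names below are hypothetical stand-ins for the device-tree
walk, and the log only records the order in which slots would be handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node with child/sibling links. */
struct node {
	int id;
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/* Records the order in which nodes are visited. */
struct visit_log {
	int ids[16];
	size_t n;
};

static void log_visit(struct visit_log *log, int id)
{
	assert(log->n < 16);
	log->ids[log->n++] = id;
}

/* Pre-order walk: a parent slot is handled before its child slots. */
static void register_tree(const struct node *parent, struct visit_log *log)
{
	const struct node *c;

	for (c = parent->child; c; c = c->sibling) {
		log_visit(log, c->id);
		register_tree(c, log);
	}
}

/* Post-order walk: child slots are handled before their parent slot. */
static void unregister_tree(const struct node *parent, struct visit_log *log)
{
	const struct node *c;

	for (c = parent->child; c; c = c->sibling) {
		unregister_tree(c, log);
		log_visit(log, c->id);
	}
}
```

As in the quoted functions, the node passed in (the PHB) is not visited
itself; only its descendants are.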


* Re: [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops
  2016-04-13  5:52       ` Alexey Kardashevskiy
  (?)
@ 2016-04-19 23:59       ` Gavin Shan
  -1 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-19 23:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 03:52:25PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>Each PHB has one instance of "struct pci_controller_ops", which
>>includes various callbacks called by PCI subsystem. In the definition
>>of this struct, some callbacks have explicit names for its arguments,
>>but the left don't have.
>>
>>This adds all explicit names of the arguments to the callbacks in
>>"struct pci_controller_ops" so that the code looks consistent.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Reviewed-by: Daniel Axtens <dja@axtens.net>
>
>With tiny nit below,
>
>Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>
>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index b688d04..4dd6ef4 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -21,18 +21,19 @@ struct pci_controller_ops {
>>  	void		(*dma_dev_setup)(struct pci_dev *dev);
>>  	void		(*dma_bus_setup)(struct pci_bus *bus);
>>
>>-	int		(*probe_mode)(struct pci_bus *);
>>+	int		(*probe_mode)(struct pci_bus *bus);
>>
>>  	/* Called when pci_enable_device() is called. Returns true to
>>  	 * allow assignment/enabling of the device. */
>>-	bool		(*enable_device_hook)(struct pci_dev *);
>>+	bool		(*enable_device_hook)(struct pci_dev *dev);
>
>
>"pdev" is slightly better as it is of the "pci_dev" type (4130 occurrences of
>"pci_dev *pdev" and just 2833 of "pci_dev *dev" in the current kernel), "dev"
>is for "struct device".
>

Thanks for your review. I'm not sure "dev" is reserved for "struct device"
only. "dev" and "pdev" are commonly used interchangeably for "struct pci_dev",
and older code in particular uses "dev" for "struct pci_dev" heavily.

Yes, I agree "pdev" is better than "dev" in this case and I'm going to
fix this up in the next revision.

>>
>>-	void		(*disable_device)(struct pci_dev *);
>>+	void		(*disable_device)(struct pci_dev *dev);
>>
>>-	void		(*release_device)(struct pci_dev *);
>>+	void		(*release_device)(struct pci_dev *dev);
>>
>>  	/* Called during PCI resource reassignment */
>>-	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
>>+	resource_size_t (*window_alignment)(struct pci_bus *bus,
>>+					    unsigned long type);
>>  	void		(*setup_bridge)(struct pci_bus *bus,
>>  					unsigned long type);
>>  	void		(*reset_secondary_bus)(struct pci_dev *dev);
>>@@ -46,7 +47,7 @@ struct pci_controller_ops {
>>  	int             (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
>>  	u64		(*dma_get_required_mask)(struct pci_dev *dev);
>>
>>-	void		(*shutdown)(struct pci_controller *);
>>+	void		(*shutdown)(struct pci_controller *hose);
>>  };
>>
>>  /*
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2016-04-13  6:45   ` Alexey Kardashevskiy
@ 2016-04-20  0:04     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  0:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 04:45:39PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>The original implementation of pnv_ioda_setup_pe_seg() configures
>>IO and M32 segments by separate logics, which can be merged by
>>by caching @segmap, @seg_size, @win in advance. This shouldn't
>>cause any behavioural changes.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
>>  1 file changed, 28 insertions(+), 34 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 44cc5f3..fd7d382 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2940,8 +2940,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  	struct pnv_phb *phb = hose->private_data;
>>  	struct pci_bus_region region;
>>  	struct resource *res;
>>-	int i, index;
>>-	int rc;
>>+	unsigned int segsize;
>>+	int *segmap, index, i;
>>+	uint16_t win;
>>+	int64_t rc;
>>
>>  	/*
>>  	 * NOTE: We only care PCI bus based PE for now. For PCI
>>@@ -2958,23 +2960,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  		if (res->flags & IORESOURCE_IO) {
>>  			region.start = res->start - phb->ioda.io_pci_base;
>>  			region.end   = res->end - phb->ioda.io_pci_base;
>>-			index = region.start / phb->ioda.io_segsize;
>>-
>>-			while (index < phb->ioda.total_pe_num &&
>>-			       region.start <= region.end) {
>>-				phb->ioda.io_segmap[index] = pe->pe_number;
>>-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>>-				if (rc != OPAL_SUCCESS) {
>>-					pr_err("%s: OPAL error %d when mapping IO "
>>-					       "segment #%d to PE#%d\n",
>>-					       __func__, rc, index, pe->pe_number);
>>-					break;
>>-				}
>>-
>>-				region.start += phb->ioda.io_segsize;
>>-				index++;
>>-			}
>>+			segsize      = phb->ioda.io_segsize;
>>+			segmap       = phb->ioda.io_segmap;
>>+			win          = OPAL_IO_WINDOW_TYPE;
>>  		} else if ((res->flags & IORESOURCE_MEM) &&
>>  			   !pnv_pci_is_mem_pref_64(res->flags)) {
>>  			region.start = res->start -
>>@@ -2983,23 +2971,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  			region.end   = res->end -
>>  				       hose->mem_offset[0] -
>>  				       phb->ioda.m32_pci_base;
>>-			index = region.start / phb->ioda.m32_segsize;
>>-
>>-			while (index < phb->ioda.total_pe_num &&
>>-			       region.start <= region.end) {
>>-				phb->ioda.m32_segmap[index] = pe->pe_number;
>>-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>>-				if (rc != OPAL_SUCCESS) {
>>-					pr_err("%s: OPAL error %d when mapping M32 "
>>-					       "segment#%d to PE#%d",
>>-					       __func__, rc, index, pe->pe_number);
>>-					break;
>>-				}
>>+			segsize      = phb->ioda.m32_segsize;
>>+			segmap       = phb->ioda.m32_segmap;
>>+			win          = OPAL_M32_WINDOW_TYPE;
>>+		} else {
>>+			continue;
>>+		}
>>
>>-				region.start += phb->ioda.m32_segsize;
>>-				index++;
>>+		index = region.start / segsize;
>>+		while (index < phb->ioda.total_pe_num &&
>>+		       region.start <= region.end) {
>>+			segmap[index] = pe->pe_number;
>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+					pe->pe_number, win, 0, index);
>>+			if (rc != OPAL_SUCCESS) {
>>+				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>>+					__func__, rc, win, index,
>>+					pe->phb->hose->global_number,
>>+					pe->pe_number);
>>+				break;
>
>Please move this loop to a helper and stop caching segsize/segmap/win; this
>will make the code easier to read and the next patch will look much cleaner
>as it will not have to move this exact loop.
>

Thanks. It's a good idea and I'll change the code accordingly in the next revision.
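
One possible shape for such a helper, written as a userspace sketch
rather than the eventual kernel code: map_pe_segments() and INVALID_PE are
hypothetical names, and the opal_pci_map_pe_mmio_window() firmware call
from the quoted loop is deliberately left out, so this only shows the
segment walk and the segmap bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_PE	(-1)	/* hypothetical stand-in for IODA_INVALID_PE */

/*
 * Hypothetical helper: mark every segment overlapped by [start, end]
 * as owned by PE number @pe. @nsegs bounds the walk the same way
 * phb->ioda.total_pe_num does in the quoted loop. Returns the number
 * of segments claimed. The real code would also issue the OPAL call
 * for each segment and stop on failure.
 */
static unsigned int map_pe_segments(int *segmap, unsigned int nsegs,
				    unsigned long start, unsigned long end,
				    unsigned long segsize, int pe)
{
	unsigned int index, mapped = 0;

	assert(segmap != NULL && segsize != 0);

	index = start / segsize;
	while (index < nsegs && start <= end) {
		segmap[index] = pe;
		start += segsize;
		index++;
		mapped++;
	}

	return mapped;
}
```

With this shape the caller would pass the IO or M32 segmap and segment
size directly, so nothing needs to be cached in local variables first.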

>>  			}
>>+
>>+			region.start += segsize;
>>+			index++;
>>  		}
>>  	}
>>  }
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption
  2016-04-13  7:09   ` Alexey Kardashevskiy
@ 2016-04-20  0:05     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  0:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 05:09:45PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>When unplugging PCI devices, their parent PEs might be offline.
>>The consumed M64 resource by the PEs should be released at that
>>time. As we track M32 segment consumption, this introduces an
>>array to the PHB to track the mapping between M64 segment and
>>PE number.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>
>Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>but it would not hurt to mention in the commit log why M64 segment is not
>tracked/setup by the existing (at this point, at least)
>pnv_ioda_setup_one_res().
>

Right, I'll add an explanation to the commit log in the next revision, thanks!

>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++--
>>  arch/powerpc/platforms/powernv/pci.h      |  1 +
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 7330a73..fc0374a 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -305,6 +305,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>  		phb->ioda.total_pe_num) {
>>  		pe = &phb->ioda.pe_array[i];
>>
>>+		phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
>>  		if (!master_pe) {
>>  			pe->flags |= PNV_IODA_PE_MASTER;
>>  			INIT_LIST_HEAD(&pe->slaves);
>>@@ -3245,7 +3246,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  {
>>  	struct pci_controller *hose;
>>  	struct pnv_phb *phb;
>>-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>+	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>>  	const __be64 *prop64;
>>  	const __be32 *prop32;
>>  	int i, len;
>>@@ -3332,6 +3333,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>
>>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>  	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>+	m64map_off = size;
>>+	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
>>  	m32map_off = size;
>>  	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
>>  	if (phb->type == PNV_PHB_IODA1) {
>>@@ -3342,9 +3345,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>>  	aux = memblock_virt_alloc(size, 0);
>>  	phb->ioda.pe_alloc = aux;
>>+	phb->ioda.m64_segmap = aux + m64map_off;
>>  	phb->ioda.m32_segmap = aux + m32map_off;
>>-	for (i = 0; i < phb->ioda.total_pe_num; i++)
>>+	for (i = 0; i < phb->ioda.total_pe_num; i++) {
>>+		phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
>>  		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
>>+	}
>>  	if (phb->type == PNV_PHB_IODA1) {
>>  		phb->ioda.io_segmap = aux + iomap_off;
>>  		for (i = 0; i < phb->ioda.total_pe_num; i++)
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 36c4965..866a5ea 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -146,6 +146,7 @@ struct pnv_phb {
>>  		struct pnv_ioda_pe	*pe_array;
>>
>>  		/* M32 & IO segment maps */
>>+		int			*m64_segmap;
>>  		int			*m32_segmap;
>>  		int			*io_segmap;
>>
>>
>
>
>-- 
>Alexey
>
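
The quoted hunk in pnv_pci_init_ioda_phb() carves the PE bitmap and the
segment maps out of one allocation by accumulating byte offsets. A
minimal userspace sketch of that layout follows. It assumes calloc() in
place of memblock_virt_alloc(), and "struct seg_maps", seg_maps_init(),
align_up() and INVALID_PE are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define INVALID_PE	(-1)	/* hypothetical stand-in for IODA_INVALID_PE */

struct seg_maps {
	void *aux;			/* the single backing allocation */
	unsigned long *pe_alloc;	/* PE allocation bitmap */
	int *m64_segmap;		/* M64 segment -> PE number */
	int *m32_segmap;		/* M32 segment -> PE number */
};

static size_t align_up(size_t value, size_t align)
{
	return (value + align - 1) / align * align;
}

/* Lay out bitmap, M64 map and M32 map back to back in one buffer. */
static int seg_maps_init(struct seg_maps *m, unsigned int total_pe)
{
	size_t size, m64_off, m32_off;
	unsigned int i;

	/* One bit per PE. This sketch rounds the byte count up first. */
	size = align_up((total_pe + 7) / 8, sizeof(unsigned long));
	m64_off = size;
	size += total_pe * sizeof(int);
	m32_off = size;
	size += total_pe * sizeof(int);

	m->aux = calloc(1, size);
	if (!m->aux)
		return -1;

	m->pe_alloc = m->aux;
	m->m64_segmap = (int *)((char *)m->aux + m64_off);
	m->m32_segmap = (int *)((char *)m->aux + m32_off);

	/* No segment is owned by any PE yet. */
	for (i = 0; i < total_pe; i++) {
		m->m64_segmap[i] = INVALID_PE;
		m->m32_segmap[i] = INVALID_PE;
	}

	return 0;
}
```

Because each offset is recorded before the size grows, adding the M64 map
ahead of the M32 map only shifts the later offsets; no other code needs
to know the layout.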


* Re: [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC
  2016-04-13  7:47     ` Alexey Kardashevskiy
@ 2016-04-20  0:22       ` Gavin Shan
  2016-04-20  2:55         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  0:22 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 05:47:59PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>Different from PHB3 where 16 M64 BARs are supported and each of
>>them can be owned by one particular PE# exclusively or divided
>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>of them are divided to 8 segments. So every P7IOC PHB supports
>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>M64DT, indicating that one M64 segment can only be pinned to the
>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>of them is pinned to the fixed PE# by bypassing the function of
>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>and PHB3 to support M64.
>
>The comment is not quite correct - in addition to pnv_ioda1_init_m64(), you
>also need to hack pnv_ioda_pick_m64_pe().
>

Right, I'll describe the changes to pnv_ioda_pick_m64_pe() in the
commit log of the next revision.

>
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>  arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>  2 files changed, 86 insertions(+), 3 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 1dc663a..8488238 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>  	}
>>  }
>>
>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>+{
>>+	struct resource *r;
>>+	int index;
>>+
>>+	/*
>>+	 * There are 16 M64 BARs, each of which has 8 segments. So
>>+	 * there are as many M64 segments as the maximum number of
>>+	 * PEs, which is 128.
>>+	 */
>>+	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>+		unsigned long base, segsz = phb->ioda.m64_segsize;
>>+		int64_t rc;
>>+
>>+		base = phb->ioda.m64_base +
>>+		       index * PNV_IODA1_M64_SEGS * segsz;
>>+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>+				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>+				PNV_IODA1_M64_SEGS * segsz);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, index);
>>+			goto fail;
>>+		}
>>+
>>+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>+				OPAL_M64_WINDOW_TYPE, index,
>>+				OPAL_ENABLE_M64_SPLIT);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, index);
>>+			goto fail;
>>+		}
>>+	}
>>+
>>+	/*
>>+	 * Exclude the segment used by the reserved PE, which
>>+	 * is expected to be 0 or last supported PE#.
>>+	 */
>>+	r = &phb->hose->mem_resources[1];
>>+	if (phb->ioda.reserved_pe_idx == 0)
>>+		r->start += phb->ioda.m64_segsize;
>>+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>+		r->end -= phb->ioda.m64_segsize;
>>+	else
>>+		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>+			phb->ioda.reserved_pe_idx);
>>+
>>+	return 0;
>>+
>>+fail:
>>+	for ( ; index >= 0; index--)
>>+		opal_pci_phb_mmio_enable(phb->opal_id,
>>+			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
>>+
>>+	return -EIO;
>>+}
>>+
>>  static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>  				    unsigned long *pe_bitmap,
>>  				    bool all)
>>@@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  			pe->master = master_pe;
>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>  		}
>>+
>>+		/*
>>+		 * P7IOC supports M64DT, which helps mapping M64 segment
>>+		 * to one particular PE#. However, PHB3 has fixed mapping
>>+		 * between M64 segment and PE#. In order to have same logic
>>+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>+		 * segment and PE# on P7IOC.
>>+		 */
>>+		if (phb->type == PNV_PHB_IODA1) {
>>+			int64_t rc;
>>+
>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+					pe->pe_number, OPAL_M64_WINDOW_TYPE,
>>+					pe->pe_number / PNV_IODA1_M64_SEGS,
>>+					pe->pe_number % PNV_IODA1_M64_SEGS);
>>+			if (rc != OPAL_SUCCESS)
>>+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>+					__func__, rc, phb->hose->global_number,
>>+					pe->pe_number);
>>+		}
>
>
>Cannot this go to pnv_ioda1_init_m64()? From the commit log I understood that
>this setup is supposed to be static so it can be done once. Or it is sort of
>enable/disable PE? Then make is a helper and call it ioda1_pe_enable() or
>something.
>

No, we cannot. This associates the M64 segments with the PE#, and it can't be
done in pnv_ioda1_init_m64(), where the PE# is unknown. I don't understand what
you meant by "sort of enable/disable PE". A PE starts its journey when PELTM
has the corresponding mapping, and that doesn't necessarily depend on the M64
mapping.
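
For reference, the fixed mapping enforced in the quoted hunk can be
worked through in plain C. This is only a sketch of the arithmetic:
M64_NUM and M64_SEGS mirror PNV_IODA1_M64_NUM and PNV_IODA1_M64_SEGS from
the patch, while "struct m64_slot" and the three helpers are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

#define M64_NUM		16	/* M64 BARs per P7IOC PHB */
#define M64_SEGS	8	/* segments per M64 BAR   */

/* Which BAR and which segment inside it a PE# is pinned to. */
struct m64_slot {
	unsigned int bar;
	unsigned int seg;
};

static struct m64_slot m64_slot_for_pe(unsigned int pe)
{
	struct m64_slot slot;

	assert(pe < M64_NUM * M64_SEGS);	/* 128 PEs at most */
	slot.bar = pe / M64_SEGS;
	slot.seg = pe % M64_SEGS;
	return slot;
}

/* Start address of one BAR, as computed in pnv_ioda1_init_m64(). */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int bar)
{
	return m64_base + (uint64_t)bar * M64_SEGS * segsz;
}

/* Start address of the segment pinned to @pe. */
static uint64_t m64_seg_addr(uint64_t m64_base, uint64_t segsz,
			     unsigned int pe)
{
	struct m64_slot slot = m64_slot_for_pe(pe);

	return m64_bar_base(m64_base, segsz, slot.bar) + slot.seg * segsz;
}
```

With 16 BARs of 8 segments each, segment N of the whole M64 window always
lands on PE#N, which is the behaviour PHB3 has in hardware.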

>
>>  	}
>>
>>  	kfree(pe_alloc);
>>@@ -329,8 +407,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>  	const u32 *r;
>>  	u64 pci_addr;
>>
>>-	/* FIXME: Support M64 for P7IOC */
>>-	if (phb->type != PNV_PHB_IODA2) {
>>+	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
>>  		pr_info("  Not support M64 window\n");
>>  		return;
>>  	}
>>@@ -364,7 +441,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>
>>  	/* Use last M64 BAR to cover M64 window */
>>  	phb->ioda.m64_bar_idx = 15;
>>-	phb->init_m64 = pnv_ioda2_init_m64;
>>+	if (phb->type == PNV_PHB_IODA1)
>>+		phb->init_m64 = pnv_ioda1_init_m64;
>>+	else
>>+		phb->init_m64 = pnv_ioda2_init_m64;
>>  	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>  	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>>  }
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 866a5ea..00539ff 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -82,6 +82,9 @@ struct pnv_ioda_pe {
>>  	struct list_head	list;
>>  };
>>
>>+#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>+#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>+
>
>Why here, not in the beginning of arch/powerpc/platforms/powernv/pci-ioda.c ?
>It exposes symbols but nothing is using them (except pci-ioda.c) and code
>browsing gets bit more inconvenient.
>

It's a matter of personal taste: those macros are tied to the definition
of "struct pnv_ioda_pe" or "struct pnv_ioda_phb". Also, those macros will
have to be in the header file once we split pci-ioda.c into multiple
source files some day. However, I can move them to pci-ioda.c if you
really want to see them there. Let me know either way.

>
>>  #define PNV_PHB_FLAG_EEH	(1 << 0)
>>
>>  struct pnv_phb {
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE
  2016-04-14  3:36         ` Alexey Kardashevskiy
@ 2016-04-20  0:25           ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  0:25 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Thu, Apr 14, 2016 at 01:36:33PM +1000, Alexey Kardashevskiy wrote:
>On 04/14/2016 09:54 AM, Gavin Shan wrote:
>>On Wed, Apr 13, 2016 at 06:29:42PM +1000, Alexey Kardashevskiy wrote:
>>>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>>>Currently, there is one macro (TCE32_TABLE_SIZE) representing the
>>>>TCE table size for one DMA32 segment. The constant representing
>>>>the DMA32 segment size (1 << 28) is still used in the code.
>>>>
>>>>This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
>>>>segment size. the TCE table size can be calcualted when the page
>>>
>>>s/calcualted/calculated/
>>>
>>>
>>>>has fixed 4KB size. So all the related calculation depends on one
>>>>macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.
>>>
>>>Please move PNV_IODA1_DMA32_SEGSIZE where TCE32_TABLE_SIZE was.
>>>
>>>
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++++++++++++++-------------
>>>>  arch/powerpc/platforms/powernv/pci.h      |  1 +
>>>>  2 files changed, 18 insertions(+), 13 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index d18b95e..e60cff6 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -48,9 +48,6 @@
>>>>  #include "powernv.h"
>>>>  #include "pci.h"
>>>>
>>>>-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>>>-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>>>>-
>>>>  #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>>>>  #define POWERNV_IOMMU_MAX_LEVELS	5
>>>>
>>>>@@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>>
>>>>  	struct page *tce_mem = NULL;
>>>>  	struct iommu_table *tbl;
>>>>-	unsigned int i;
>>>>+	unsigned int tce32_segsz, i;
>>>
>>>
>>>PNV_IODA1_DMA32_SEGSIZE is a segment size in bytes. The name @tce32_segsz
>>>also suggests that it is a segment size in bytes (otherwise it would be
>>>tce32_seg_entries or something like this) but it is not, it is a number of
>>>TCE entries (arch/powerpc/kernel/iommu.c uses "entry" for these). And
>>>tce32_segsz never changes. So:
>>>
>>>const unsigned int entries = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K
>>>- 3);
>>>
>>
>>Are you sure @tce32_segsz and equation you gave are for number of TCE entries,
>>not the size of meory required for the DMA32 segment TCE table?
>
>No, I am not :) "-3" makes it a table size in bytes, so it is rather tablesz
>then.
>

Ok. @tce32_segsz is the size of memory used for the TCE entries of one segment
(256MB), not of a whole TCE table. So I think @tce32_segsz is better than
@tablesz from that perspective.
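
The arithmetic behind that can be checked in a few lines of C. This is a
sketch with local macro names: DMA32_SEGSIZE mirrors
PNV_IODA1_DMA32_SEGSIZE from the patch, PAGE_SHIFT_4K assumes
IOMMU_PAGE_SHIFT_4K is 12, and the two helper functions are hypothetical.

```c
#include <assert.h>

#define DMA32_SEGSIZE	0x10000000UL	/* one DMA32 segment: 256MB */
#define PAGE_SHIFT_4K	12		/* assumed IOMMU_PAGE_SHIFT_4K */
#define TCE_ENTRY_SHIFT	3		/* each TCE entry is 8 bytes */

/* Number of TCE entries needed to cover one DMA32 segment. */
static unsigned long tce_entries_per_seg(void)
{
	return DMA32_SEGSIZE >> PAGE_SHIFT_4K;
}

/* Bytes of TCE memory for one segment: entries * 8. */
static unsigned long tce_bytes_per_seg(void)
{
	unsigned long bytes;

	bytes = DMA32_SEGSIZE >> (PAGE_SHIFT_4K - TCE_ENTRY_SHIFT);
	assert(bytes == tce_entries_per_seg() << TCE_ENTRY_SHIFT);
	return bytes;
}
```

So the shift by (12 - 3) gives 512KB per segment, the same value as the
removed TCE32_TABLE_SIZE, ((0x10000000 / 0x1000) * 8), while the entry
count is 65536.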

>
>>
>>>>  	int64_t rc;
>>>>  	void *addr;
>>>>
>>>>@@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>>  	/* Grab a 32-bit TCE table */
>>>>  	pe->tce32_seg = base;
>>>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>>>-		(base << 28), ((base + segs) << 28) - 1);
>>>>+		base * PNV_IODA1_DMA32_SEGSIZE,
>>>>+		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>>>
>>>>  	/* XXX Currently, we allocate one big contiguous table for the
>>>>  	 * TCEs. We only really need one chunk per 256M of TCE space
>>>>  	 * (ie per segment) but that's an optimization for later, it
>>>>  	 * requires some added smarts with our get/put_tce implementation
>>>>+	 *
>>>>+	 * Each TCE page is 4KB in size and each TCE entry occupies 8
>>>>+	 * bytes
>>>>  	 */
>>>>+	tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);
>>>
>>>>  	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>>>>-				   get_order(TCE32_TABLE_SIZE * segs));
>>>>+				   get_order(tce32_segsz * segs));
>>>>  	if (!tce_mem) {
>>>>  		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
>>>>  		goto fail;
>>>>  	}
>>>>  	addr = page_address(tce_mem);
>>>>-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
>>>>+	memset(addr, 0, tce32_segsz * segs);
>>>>
>>>>  	/* Configure HW */
>>>>  	for (i = 0; i < segs; i++) {
>>>>  		rc = opal_pci_map_pe_dma_window(phb->opal_id,
>>>>  					      pe->pe_number,
>>>>  					      base + i, 1,
>>>>-					      __pa(addr) + TCE32_TABLE_SIZE * i,
>>>>-					      TCE32_TABLE_SIZE, 0x1000);
>>>>+					      __pa(addr) + tce32_segsz * i,
>>>>+					      tce32_segsz, 0x1000);
>>>
>>>
>>>As you started using IOMMU_PAGE_SHIFT_4K and you are also touching this piece
>>>of code -
>>>
>>>s/0x1000/IOMMU_PAGE_SHIFT_4K/
>>>
>>
>>Does 0x1000 is equal to IOMMU_PAGE_SHIFT_4K? I guess you probably suggested
>>to use IOMMU_PAGE_SIZE_4K instead?
>
>
>Ah, my bad, should have been IOMMU_PAGE_SIZE_4K. I'll pay more attention to
>the details, sorry.
>

No worries. Thanks for your review anyway.

>>
>>>>  		if (rc) {
>>>>  			pe_err(pe, " Failed to configure 32-bit TCE table,"
>>>>  			       " err %ld\n", rc);
>>>>@@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>>  	}
>>>>
>>>>  	/* Setup linux iommu table */
>>>>-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
>>>>-				  base << 28, IOMMU_PAGE_SHIFT_4K);
>>>>+	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>>>>+				  base * PNV_IODA1_DMA32_SEGSIZE,
>>>>+				  IOMMU_PAGE_SHIFT_4K);
>>>>
>>>>  	/* OPAL variant of P7IOC SW invalidated TCEs */
>>>>  	if (phb->ioda.tce_inval_reg)
>>>>@@ -2116,7 +2119,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>>  	if (pe->tce32_seg >= 0)
>>>>  		pe->tce32_seg = -1;
>>>>  	if (tce_mem)
>>>>-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>>>+		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>>>>  	if (tbl) {
>>>>  		pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
>>>>  		iommu_free_table(tbl, "pnv");
>>>>@@ -3445,7 +3448,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>  	mutex_init(&phb->ioda.pe_list_mutex);
>>>>
>>>>  	/* Calculate how many 32-bit TCE segments we have */
>>>>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
>>>>+	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
>>>>+				PNV_IODA1_DMA32_SEGSIZE;
>>>>
>>>>  #if 0 /* We should really do that ... */
>>>>  	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>>>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>index 00539ff..1d8e775 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci.h
>>>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>>>@@ -84,6 +84,7 @@ struct pnv_ioda_pe {
>>>>
>>>>  #define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>>>  #define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>>>+#define PNV_IODA1_DMA32_SEGSIZE	0x10000000
>>>>
>>>>  #define PNV_PHB_FLAG_EEH	(1 << 0)
>>>>
>>>>
>
>-- 
>Alexey
>


* Re: [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list
  2016-04-13  8:59   ` Alexey Kardashevskiy
@ 2016-04-20  0:34     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  0:34 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 13, 2016 at 06:59:40PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according
>>to their DMA32 weight. The PEs on the list are iterated to setup
>>their TCE32 tables at system booting time. The list is used for
>>once and there is for keep having it.
>
>"there is no need to keep it" may be?
>

Sorry, I should have fixed it in an earlier revision. Will fix it
up in the next revision.

>>
>>This moves the logic calculating DMA32 weight of PHB and PE to
>>pnv_ioda_setup_dma() to drop PHB's DMA32 list. Also, every PE
>>traces the consumed DMA32 segment by @tce32_seg and @tce32_segcount
>>are useless and they're removed.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>
>Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>with few comments below...
>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 168 +++++++++++++-----------------
>>  arch/powerpc/platforms/powernv/pci.h      |  19 ----
>>  2 files changed, 75 insertions(+), 112 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index e60cff6..0fc2309 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -886,44 +886,6 @@ out:
>>  	return 0;
>>  }
>>
>>-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>>-				       struct pnv_ioda_pe *pe)
>>-{
>>-	struct pnv_ioda_pe *lpe;
>>-
>>-	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
>>-		if (lpe->dma_weight < pe->dma_weight) {
>>-			list_add_tail(&pe->dma_link, &lpe->dma_link);
>>-			return;
>>-		}
>>-	}
>>-	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
>>-}
>>-
>>-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>>-{
>>-	/* This is quite simplistic. The "base" weight of a device
>>-	 * is 10. 0 means no DMA is to be accounted for it.
>>-	 */
>>-
>>-	/* If it's a bridge, no DMA */
>>-	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>>-		return 0;
>>-
>>-	/* Reduce the weight of slow USB controllers */
>>-	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>>-	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>>-	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
>>-		return 3;
>>-
>>-	/* Increase the weight of RAID (includes Obsidian) */
>>-	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
>>-		return 15;
>>-
>>-	/* Default */
>>-	return 10;
>>-}
>>-
>>  #ifdef CONFIG_PCI_IOV
>>  static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>  {
>>@@ -1028,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>>  	pe->flags = PNV_IODA_PE_DEV;
>>  	pe->pdev = dev;
>>  	pe->pbus = NULL;
>>-	pe->tce32_seg = -1;
>>  	pe->mve_number = -1;
>>  	pe->rid = dev->bus->number << 8 | pdn->devfn;
>>
>>@@ -1044,16 +1005,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>>  		return NULL;
>>  	}
>>
>>-	/* Assign a DMA weight to the device */
>>-	pe->dma_weight = pnv_ioda_dma_weight(dev);
>>-	if (pe->dma_weight != 0) {
>>-		phb->ioda.dma_weight += pe->dma_weight;
>>-		phb->ioda.dma_pe_count++;
>>-	}
>>-
>>-	/* Link the PE */
>>-	pnv_ioda_link_pe_by_weight(phb, pe);
>>-
>>  	return pe;
>>  }
>>
>>@@ -1071,7 +1022,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>  		}
>>  		pdn->pcidev = dev;
>>  		pdn->pe_number = pe->pe_number;
>>-		pe->dma_weight += pnv_ioda_dma_weight(dev);
>>  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>  			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>>  	}
>>@@ -1108,10 +1058,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>  	pe->pbus = bus;
>>  	pe->pdev = NULL;
>>-	pe->tce32_seg = -1;
>>  	pe->mve_number = -1;
>>  	pe->rid = bus->busn_res.start << 8;
>>-	pe->dma_weight = 0;
>>
>>  	if (all)
>>  		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>@@ -1133,17 +1081,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>
>>  	/* Put PE to the list */
>>  	list_add_tail(&pe->list, &phb->ioda.pe_list);
>>-
>>-	/* Account for one DMA PE if at least one DMA capable device exist
>>-	 * below the bridge
>>-	 */
>>-	if (pe->dma_weight != 0) {
>>-		phb->ioda.dma_weight += pe->dma_weight;
>>-		phb->ioda.dma_pe_count++;
>>-	}
>>-
>>-	/* Link the PE */
>>-	pnv_ioda_link_pe_by_weight(phb, pe);
>>  }
>>
>>  static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>>@@ -1184,7 +1121,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>>  			rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
>>  			npu_pdn->pcidev = npu_pdev;
>>  			npu_pdn->pe_number = pe_num;
>>-			pe->dma_weight += pnv_ioda_dma_weight(npu_pdev);
>>  			phb->ioda.pe_rmap[rid] = pe->pe_number;
>>
>>  			/* Map the PE to this link */
>>@@ -1532,7 +1468,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>  		pe->flags = PNV_IODA_PE_VF;
>>  		pe->pbus = NULL;
>>  		pe->parent_dev = pdev;
>>-		pe->tce32_seg = -1;
>>  		pe->mve_number = -1;
>>  		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>>  			   pci_iov_virtfn_devfn(pdev, vf_index);
>>@@ -2023,6 +1958,54 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>>  	.free = pnv_ioda2_table_free,
>>  };
>>
>>+static int pnv_pci_ioda_dev_dma_weight(struct pci_dev *dev, void *data)
>>+{
>>+	unsigned int *weight = (unsigned int *)data;
>>+
>>+	/* This is quite simplistic. The "base" weight of a device
>>+	 * is 10. 0 means no DMA is to be accounted for it.
>>+	 */
>>+	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>>+		return 0;
>>+
>>+	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>>+	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>>+	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
>>+		*weight += 3;
>>+	else if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
>>+		*weight += 15;
>>+	else
>>+		*weight += 10;
>>+
>>+	return 0;
>>+}
>>+
>>+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
>>+{
>>+	unsigned int weight = 0;
>>+
>>+	if ((pe->flags & PNV_IODA_PE_DEV) && pe->pdev) {
>>+		pnv_pci_ioda_dev_dma_weight(pe->pdev, &weight);
>>+	} else if ((pe->flags & PNV_IODA_PE_BUS) && pe->pbus) {
>>+		struct pci_dev *pdev;
>>+
>>+		list_for_each_entry(pdev, &pe->pbus->devices, bus_list)
>>+			pnv_pci_ioda_dev_dma_weight(pdev, &weight);
>>+	} else if ((pe->flags & PNV_IODA_PE_BUS_ALL) && pe->pbus) {
>>+		pci_walk_bus(pe->pbus, pnv_pci_ioda_dev_dma_weight, &weight);
>>+	}
>>+
>>+	return weight;
>>+}
>>+
>>+static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
>
>
>s/pnv_pci_ioda_total_dma_weight/pnv_pci_ioda1_phb_dma_weight/ ? "total" does
>not say much. Or just merge it into pnv_pci_ioda1_setup_dma_pe() as it is
>useless for anything but IODA1.
>

Nice suggestion. I will merge it into pnv_pci_ioda1_setup_dma_pe().

>>+{
>>+	unsigned int weight = 0;
>>+
>>+	pci_walk_bus(phb->hose->bus, pnv_pci_ioda_dev_dma_weight, &weight);
>>+	return weight;
>>+}
>>+
>>  static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  				       struct pnv_ioda_pe *pe,
>>  				       unsigned int base,
>>@@ -2039,17 +2022,12 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>>  	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>
>>-	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->tce32_seg >= 0))
>>-		return;
>>-
>>  	tbl = pnv_pci_table_alloc(phb->hose->node);
>>  	iommu_register_group(&pe->table_group, phb->hose->global_number,
>>  			pe->pe_number);
>>  	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>>
>>  	/* Grab a 32-bit TCE table */
>>-	pe->tce32_seg = base;
>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>  		base * PNV_IODA1_DMA32_SEGSIZE,
>>  		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>@@ -2116,8 +2094,6 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	return;
>>   fail:
>>  	/* XXX Failure: Try to fallback to 64-bit only ? */
>>-	if (pe->tce32_seg >= 0)
>>-		pe->tce32_seg = -1;
>>  	if (tce_mem)
>>  		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>>  	if (tbl) {
>>@@ -2528,10 +2504,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  {
>>  	int64_t rc;
>>
>>-	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->tce32_seg >= 0))
>>-		return;
>>-
>>  	/* TVE #1 is selected by PCI address bit 59 */
>>  	pe->tce_bypass_base = 1ull << 59;
>>
>>@@ -2539,7 +2511,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  			pe->pe_number);
>>
>>  	/* The PE will reserve all possible 32-bits space */
>>-	pe->tce32_seg = 0;
>>  	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>>  		phb->ioda.m32_pci_base);
>>
>>@@ -2555,11 +2526,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  #endif
>>
>>  	rc = pnv_pci_ioda2_setup_default_config(pe);
>>-	if (rc) {
>>-		if (pe->tce32_seg >= 0)
>>-			pe->tce32_seg = -1;
>>+	if (rc)
>>  		return;
>>-	}
>>
>>  	if (pe->flags & PNV_IODA_PE_DEV)
>>  		iommu_add_device(&pe->pdev->dev);
>>@@ -2570,24 +2538,32 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>-	unsigned int residual, remaining, segs, tw, base;
>>+	unsigned int weight, total_weight, dma_pe_count;
>>+	unsigned int residual, remaining, segs, base;
>>  	struct pnv_ioda_pe *pe;
>>
>>+	total_weight = pnv_pci_ioda_total_dma_weight(phb);
>>+	dma_pe_count = 0;
>>+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>+		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+		if (weight > 0)
>>+			dma_pe_count++;
>>+	}
>>+
>>  	/* If we have more PE# than segments available, hand out one
>>  	 * per PE until we run out and let the rest fail. If not,
>>  	 * then we assign at least one segment per PE, plus more based
>>  	 * on the amount of devices under that PE
>>  	 */
>>-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
>>+	if (dma_pe_count > phb->ioda.tce32_count)
>>  		residual = 0;
>>  	else
>>-		residual = phb->ioda.tce32_count -
>>-			phb->ioda.dma_pe_count;
>>+		residual = phb->ioda.tce32_count - dma_pe_count;
>>
>>  	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>>  		hose->global_number, phb->ioda.tce32_count);
>>  	pr_info("PCI: %d PE# for a total weight of %d\n",
>>-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
>>+		dma_pe_count, total_weight);
>>
>>  	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>
>>@@ -2596,18 +2572,20 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  	 * weight
>>  	 */
>>  	remaining = phb->ioda.tce32_count;
>>-	tw = phb->ioda.dma_weight;
>>  	base = 0;
>>-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>-		if (!pe->dma_weight)
>>+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>+		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+		if (!weight)
>>  			continue;
>>+
>>  		if (!remaining) {
>>  			pe_warn(pe, "No DMA32 resources available\n");
>>  			continue;
>>  		}
>>  		segs = 1;
>>  		if (residual) {
>>-			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
>>+			segs += ((weight * residual) + (total_weight / 2)) /
>>+				total_weight;
>>  			if (segs > remaining)
>>  				segs = remaining;
>>  		}
>>@@ -2619,7 +2597,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  		 */
>>  		if (phb->type == PNV_PHB_IODA1) {
>>  			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>-				pe->dma_weight, segs);
>>+				weight, segs);
>>  			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>>  		} else if (phb->type == PNV_PHB_IODA2) {
>>  			pe_info(pe, "Assign DMA32 space\n");
>>@@ -3156,13 +3134,18 @@ static void pnv_npu_ioda_fixup(void)
>>  	struct pci_controller *hose, *tmp;
>>  	struct pnv_phb *phb;
>>  	struct pnv_ioda_pe *pe;
>>+	unsigned int weight;
>>
>>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>  		phb = hose->private_data;
>>  		if (phb->type != PNV_PHB_NPU)
>>  			continue;
>>
>>-		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>+		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>+			weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+			if (!weight)
>>+				continue;
>
>Is this even possible for NPU PE to get weight==0? WARN_ON()? BUG_ON()?
>

It's impossible, so it's worth having a WARN_ON() here. Will address it
in the next revision.

>>+
>>  			enable_bypass = dma_get_mask(&pe->pdev->dev) ==
>>  				DMA_BIT_MASK(64);
>>  			pnv_npu_init_dma_pe(pe);
>>@@ -3443,7 +3426,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	phb->ioda.pe_array = aux + pemap_off;
>>  	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>>
>>-	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
>>  	INIT_LIST_HEAD(&phb->ioda.pe_list);
>>  	mutex_init(&phb->ioda.pe_list_mutex);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 1d8e775..e90bcbe 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -53,14 +53,7 @@ struct pnv_ioda_pe {
>>  	/* PE number */
>>  	unsigned int		pe_number;
>>
>>-	/* "Weight" assigned to the PE for the sake of DMA resource
>>-	 * allocations
>>-	 */
>>-	unsigned int		dma_weight;
>>-
>>  	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
>>-	int			tce32_seg;
>>-	int			tce32_segcount;
>>  	struct iommu_table_group table_group;
>>
>>  	/* 64-bit TCE bypass region */
>>@@ -78,7 +71,6 @@ struct pnv_ioda_pe {
>>  	struct list_head	slaves;
>>
>>  	/* Link in list of PE#s */
>>-	struct list_head	dma_link;
>>  	struct list_head	list;
>>  };
>>
>>@@ -173,17 +165,6 @@ struct pnv_phb {
>>  		/* 32-bit TCE tables allocation */
>>  		unsigned long		tce32_count;
>>
>>-		/* Total "weight" for the sake of DMA resources
>>-		 * allocation
>>-		 */
>>-		unsigned int		dma_weight;
>>-		unsigned int		dma_pe_count;
>>-
>>-		/* Sorted list of used PE's, sorted at
>>-		 * boot for resource allocation purposes
>>-		 */
>>-		struct list_head	pe_dma_list;
>>-
>>  		/* TCE cache invalidate registers (physical and
>>  		 * remapped)
>>  		 */
>>
>
>
>-- 
>Alexey
>
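
The weight rule and the per-PE segment split in the patch above can be
sketched in isolation. This is a minimal, user-space illustration, not the
kernel code: `enum dev_kind`, `dev_dma_weight()` and `pe_dma32_segs()` are
hypothetical names, and the enum stands in for the PCI header-type and class
checks. The constants (0, 3, 15, 10) and the rounding formula are taken from
the quoted diff.

```c
#include <assert.h>

/* Hypothetical device kinds standing in for the PCI class checks. */
enum dev_kind { DEV_BRIDGE, DEV_SLOW_USB, DEV_RAID, DEV_OTHER };

/* Per-device DMA32 weight, using the constants from the quoted patch. */
static unsigned int dev_dma_weight(enum dev_kind kind)
{
	switch (kind) {
	case DEV_BRIDGE:   return 0;	/* bridges do no DMA */
	case DEV_SLOW_USB: return 3;	/* UHCI/OHCI/EHCI controllers */
	case DEV_RAID:     return 15;	/* RAID storage */
	default:           return 10;	/* base weight */
	}
}

/*
 * One base segment plus a rounded share of the residual segments,
 * capped by the number of segments still unassigned.
 */
static unsigned int pe_dma32_segs(unsigned int weight,
				  unsigned int total_weight,
				  unsigned int residual,
				  unsigned int remaining)
{
	unsigned int segs = 1;

	assert(weight <= total_weight);
	if (!weight || !remaining)
		return 0;
	if (residual)
		segs += (weight * residual + total_weight / 2) / total_weight;
	return segs > remaining ? remaining : segs;
}
```

For example, with 8 segments and two DMA-capable PEs of weight 10 and 15
(total 25), the residual is 8 - 2 = 6, so the first PE gets
pe_dma32_segs(10, 25, 6, 8) = 3 segments and the second gets
pe_dma32_segs(15, 25, 6, 5) = 5.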

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track
  2016-04-19  1:50     ` Alexey Kardashevskiy
@ 2016-04-20  0:49       ` Gavin Shan
  2016-04-20  5:10         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  0:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 11:50:10AM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>In current implementation, the DMA32 segments required by one specific
>>PE isn't calculated with the information hold in the PE independently.
>>It conflicts with the PCI hotplug design: PE centralized, meaning the
>>PE's DMA32 segments should be calculated from the information hold in
>>the PE independently.
>>
>>This introduces an array (@dma32_segmap) for every PHB to track the
>>DMA32 segmeng usage. Besides, this moves the logic calculating PE's
>>consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's
>>DMA32 segments are calculated/allocated from the information hold in
>>the PE (DMA32 weight). Also the logic is improved: we try to allocate
>>as much DMA32 segments as we can. It's acceptable that number of DMA32
>>segments less than the expected number are allocated.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>
>This DMA segments business was the reason why I have not even tried
>implementing DDW for POWER7 - it is way too different from POWER8 and there
>is no chance that anyone outside Ozlabs will ever try using this in practice;
>the same applies to PCI hotplug on POWER7.
>
>I am suggesting to ditch all IODA1 changes from this patchset as this code
>will hang around (unused) for may be a year or so and then will be gone as
>p5ioc2.
>

As far as I know, some P7 boxes outside Ozlabs run this software stack. At
least, I was relying heavily on a P7 box + PowerNV-based Linux until
September of last year. My original thoughts are as below. If they're
convincing, I can drop some of the IODA1 changes, but obviously not all of them:

- In case customers want to use this combo (P7 box + PowerNV) for any reason.
- In case developers want to use this combo (P7 box + PowerNV) for any reason.
  For example, no P8 box can be found for one particular project, but an
  available P7 box is still fine for it.
- EEH support on P7/P8 needs hotplug in some cases: when hitting excessive
  failures, PCI devices and their platform resources (PE, DMA, M32/M64 mapping
  etc) should be purged.
- The current implementation has P7/P8 mixed up to some extent, which isn't so
  good, as Ben pointed out a long time ago. It's impossible not to affect the
  P7IOC piece if the P8 piece is changed in order to support hotplug.

>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 111 +++++++++++++++++-------------
>>  arch/powerpc/platforms/powernv/pci.h      |   7 +-
>>  2 files changed, 66 insertions(+), 52 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 0fc2309..59782fba 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2007,20 +2007,54 @@ static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
>>  }
>>
>>  static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>-				       struct pnv_ioda_pe *pe,
>>-				       unsigned int base,
>>-				       unsigned int segs)
>>+				       struct pnv_ioda_pe *pe)
>>  {
>>
>>  	struct page *tce_mem = NULL;
>>  	struct iommu_table *tbl;
>>-	unsigned int tce32_segsz, i;
>>+	unsigned int weight, total_weight;
>>+	unsigned int tce32_segsz, base, segs, i;
>>  	int64_t rc;
>>  	void *addr;
>>
>>  	/* XXX FIXME: Handle 64-bit only DMA devices */
>>  	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>>  	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>+	total_weight = pnv_pci_ioda_total_dma_weight(phb);
>>+	weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+
>>+	segs = (weight * phb->ioda.dma32_count) / total_weight;
>>+	if (!segs)
>>+		segs = 1;
>>+
>>+	/*
>>+	 * Allocate contiguous DMA32 segments. We begin with the expected
>>+	 * number of segments. With one more attempt, the number of DMA32
>>+	 * segments to be allocated is decreased by one until one segment
>>+	 * is allocated successfully.
>>+	 */
>>+	while (segs) {
>>+		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
>>+			for (i = base; i < base + segs; i++) {
>>+				if (phb->ioda.dma32_segmap[i] !=
>>+				    IODA_INVALID_PE)
>>+					break;
>>+			}
>>+
>>+			if (i >= base + segs)
>>+				break;
>>+		}
>>+
>>+		if (i >= base + segs)
>>+			break;
>>+
>>+		segs--;
>>+	}
>>+
>>+	if (!segs) {
>>+		pe_warn(pe, "No available DMA32 segments\n");
>>+		return;
>>+	}
>>
>>  	tbl = pnv_pci_table_alloc(phb->hose->node);
>>  	iommu_register_group(&pe->table_group, phb->hose->global_number,
>>@@ -2028,6 +2062,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>>
>>  	/* Grab a 32-bit TCE table */
>>+	pe_info(pe, "DMA weight %d (%d), assigned (%d) %d DMA32 segments\n",
>>+		weight, total_weight, base, segs);
>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>  		base * PNV_IODA1_DMA32_SEGSIZE,
>>  		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>@@ -2064,6 +2100,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  		}
>>  	}
>>
>>+	/* Setup DMA32 segment mapping */
>>+	for (i = base; i < base + segs; i++)
>>+		phb->ioda.dma32_segmap[i] = pe->pe_number;
>>+
>>  	/* Setup linux iommu table */
>>  	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>>  				  base * PNV_IODA1_DMA32_SEGSIZE,
>>@@ -2538,70 +2578,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>-	unsigned int weight, total_weight, dma_pe_count;
>>-	unsigned int residual, remaining, segs, base;
>>  	struct pnv_ioda_pe *pe;
>>-
>>-	total_weight = pnv_pci_ioda_total_dma_weight(phb);
>>-	dma_pe_count = 0;
>>-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>-		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>-		if (weight > 0)
>>-			dma_pe_count++;
>>-	}
>>+	unsigned int weight;
>>
>>  	/* If we have more PE# than segments available, hand out one
>>  	 * per PE until we run out and let the rest fail. If not,
>>  	 * then we assign at least one segment per PE, plus more based
>>  	 * on the amount of devices under that PE
>>  	 */
>>-	if (dma_pe_count > phb->ioda.tce32_count)
>>-		residual = 0;
>>-	else
>>-		residual = phb->ioda.tce32_count - dma_pe_count;
>>-
>>  	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>>-		hose->global_number, phb->ioda.tce32_count);
>>-	pr_info("PCI: %d PE# for a total weight of %d\n",
>>-		dma_pe_count, total_weight);
>>+		hose->global_number, phb->ioda.dma32_count);
>>
>>  	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>
>>-	/* Walk our PE list and configure their DMA segments, hand them
>>-	 * out one base segment plus any residual segments based on
>>-	 * weight
>>-	 */
>>-	remaining = phb->ioda.tce32_count;
>>-	base = 0;
>>+	/* Walk our PE list and configure their DMA segments */
>>  	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>  		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>  		if (!weight)
>>  			continue;
>>
>>-		if (!remaining) {
>>-			pe_warn(pe, "No DMA32 resources available\n");
>>-			continue;
>>-		}
>>-		segs = 1;
>>-		if (residual) {
>>-			segs += ((weight * residual) + (total_weight / 2)) /
>>-				total_weight;
>>-			if (segs > remaining)
>>-				segs = remaining;
>>-		}
>>-
>>  		/*
>>  		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>  		 * The all available 32-bits DMA space will be assigned to
>>  		 * the specific PE.
>>  		 */
>>  		if (phb->type == PNV_PHB_IODA1) {
>>-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>-				weight, segs);
>>-			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>>+			pnv_pci_ioda1_setup_dma_pe(phb, pe);
>>  		} else if (phb->type == PNV_PHB_IODA2) {
>>  			pe_info(pe, "Assign DMA32 space\n");
>>-			segs = 0;
>>  			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>  		} else if (phb->type == PNV_PHB_NPU) {
>>  			/*
>>@@ -2611,9 +2615,6 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  			 * as the PHB3 TVT.
>>  			 */
>>  		}
>>-
>>-		remaining -= segs;
>>-		base += segs;
>>  	}
>>  }
>>
>>@@ -3313,7 +3314,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  {
>>  	struct pci_controller *hose;
>>  	struct pnv_phb *phb;
>>-	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>>+	unsigned long size, m64map_off, m32map_off, pemap_off;
>>+	unsigned long iomap_off = 0, dma32map_off = 0;
>>  	const __be64 *prop64;
>>  	const __be32 *prop32;
>>  	int i, len;
>>@@ -3398,6 +3400,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
>>  	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
>>
>>+	/* Calculate how many 32-bit TCE segments we have */
>>+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
>>+				PNV_IODA1_DMA32_SEGSIZE;
>>+
>>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>  	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>  	m64map_off = size;
>>@@ -3407,6 +3413,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	if (phb->type == PNV_PHB_IODA1) {
>>  		iomap_off = size;
>>  		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
>>+		dma32map_off = size;
>>+		size += phb->ioda.dma32_count *
>>+			sizeof(phb->ioda.dma32_segmap[0]);
>>  	}
>>  	pemap_off = size;
>>  	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>>@@ -3422,6 +3431,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  		phb->ioda.io_segmap = aux + iomap_off;
>>  		for (i = 0; i < phb->ioda.total_pe_num; i++)
>>  			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
>>+
>>+		phb->ioda.dma32_segmap = aux + dma32map_off;
>>+		for (i = 0; i < phb->ioda.dma32_count; i++)
>>+			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>>  	}
>>  	phb->ioda.pe_array = aux + pemap_off;
>>  	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>>@@ -3430,7 +3443,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	mutex_init(&phb->ioda.pe_list_mutex);
>>
>>  	/* Calculate how many 32-bit TCE segments we have */
>>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
>>+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
>>  				PNV_IODA1_DMA32_SEGSIZE;
>>
>>  #if 0 /* We should really do that ... */
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index e90bcbe..350e630 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -146,6 +146,10 @@ struct pnv_phb {
>>  		int			*m32_segmap;
>>  		int			*io_segmap;
>>
>>+		/* DMA32 segment maps - IODA1 only */
>>+		unsigned long		dma32_count;
>>+		int			*dma32_segmap;
>>+
>>  		/* IRQ chip */
>>  		int			irq_chip_init;
>>  		struct irq_chip		irq_chip;
>>@@ -162,9 +166,6 @@ struct pnv_phb {
>>  		 */
>>  		unsigned char		pe_rmap[0x10000];
>>
>>-		/* 32-bit TCE tables allocation */
>>-		unsigned long		tce32_count;
>>-
>>  		/* TCE cache invalidate registers (physical and
>>  		 * remapped)
>>  		 */
>>
>
>
>-- 
>Alexey
>
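
The segment search in the patch above (look for a contiguous free run, and
retry with one segment fewer each time it fails) can be sketched on its own.
This is a simplified user-space illustration, not the kernel code:
`find_free_run()` and `claim_dma32_segs()` are hypothetical names, and
`INVALID_PE` is assumed to be -1 as the marker for a free segment. The sketch
also rejects requests larger than the map up front, which avoids unsigned
wraparound in the loop bound.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_PE (-1)	/* assumed sentinel marking a free segment */

/*
 * Find the first run of `segs` free entries in segmap[0..count).
 * Returns the base index, or -1 if no such run exists.
 */
static int find_free_run(const int *segmap, unsigned int count,
			 unsigned int segs)
{
	unsigned int base, i;

	if (!segs || segs > count)
		return -1;
	for (base = 0; base + segs <= count; base++) {
		for (i = base; i < base + segs; i++)
			if (segmap[i] != INVALID_PE)
				break;
		if (i == base + segs)
			return (int)base;
	}
	return -1;
}

/*
 * Try to claim `want` contiguous segments for PE `pe`, shrinking the
 * request by one after each failed search. Returns the number of
 * segments claimed (0 if none) and stores the base index in *base_out.
 */
static unsigned int claim_dma32_segs(int *segmap, unsigned int count,
				     unsigned int want, int pe,
				     unsigned int *base_out)
{
	unsigned int segs, i;
	int base;

	assert(segmap != NULL && base_out != NULL);
	for (segs = want; segs; segs--) {
		base = find_free_run(segmap, count, segs);
		if (base < 0)
			continue;
		for (i = 0; i < segs; i++)
			segmap[base + i] = pe;
		*base_out = (unsigned int)base;
		return segs;
	}
	return 0;
}
```

Because the search is first-fit and the request only shrinks, a PE may end
up with fewer segments than its weight suggests, which matches the commit
message's statement that allocating fewer segments than expected is
acceptable.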

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity
  2016-04-19  2:02   ` Alexey Kardashevskiy
@ 2016-04-20  0:52     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  0:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 12:02:23PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>Each PHB maintains an array helping to translate 2-bytes Request
>>ID (RID) to PE# with the assumption that PE# takes one byte, meaning
>>that we can't have more than 256 PEs. However, pci_dn->pe_number
>>already had 4-bytes for the PE#.
>>
>>This extends the PE# capacity for every PHB. After that, the PE number
>>is represented by 4-bytes value. Then we can reuse IODA_INVALID_PE to
>>check the PE# in phb->pe_rmap[] is valid or not.
>
>
>This should be merged into "[PATCH v8 21/45] powerpc/powernv: Create PEs at
>PCI hot plugging time" as it does not make sense alone (this patch does the
>initialization but only 3 patches apart this default value is analyzed ->
>hard to review).
>

Indeed, will move it accordingly in the next revision.

>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Reviewed-by: Daniel Axtens <dja@axtens.net>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++-
>>  arch/powerpc/platforms/powernv/pci.h      | 7 ++-----
>>  2 files changed, 7 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 59782fba..7800897 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -757,7 +757,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>
>>  	/* Clear the reverse map */
>>  	for (rid = pe->rid; rid < rid_end; rid++)
>>-		phb->ioda.pe_rmap[rid] = 0;
>>+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>>
>>  	/* Release from all parents PELT-V */
>>  	while (parent) {
>>@@ -3387,6 +3387,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	if (prop32)
>>  		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>>
>>+	/* Invalidate RID to PE# mapping */
>>+	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
>>+		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
>>+
>>  	/* Parse 64-bit MMIO range */
>>  	pnv_ioda_parse_m64_window(phb);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 350e630..928cf81 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -160,11 +160,8 @@ struct pnv_phb {
>>  		struct list_head	pe_list;
>>  		struct mutex            pe_list_mutex;
>>
>>-		/* Reverse map of PEs, will have to extend if
>>-		 * we are to support more than 256 PEs, indexed
>>-		 * bus { bus, devfn }
>>-		 */
>>-		unsigned char		pe_rmap[0x10000];
>>+		/* Reverse map of PEs, indexed by {bus, devfn} */
>>+		int			pe_rmap[0x10000];
>>
>>  		/* TCE cache invalidate registers (physical and
>>  		 * remapped)
>>
>
>
>-- 
>Alexey
>
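
To illustrate why the reverse map entries are widened, here is a small
user-space sketch. It assumes the invalid-PE sentinel is -1 (`INVALID_PE`
below is a stand-in, not the kernel macro), and the helper names are
hypothetical. With one-byte entries, -1 would be stored as 255, which cannot
be told apart from PE#255, and the old "clear to 0" cannot be told apart
from PE#0.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_PE (-1)		/* assumed value of the invalid-PE sentinel */
#define RID_COUNT  0x10000	/* 16-bit RID: bus number << 8 | devfn */

static int pe_rmap[RID_COUNT];	/* int entries, as after the patch */

/* Mark every RID as unmapped. */
static void rmap_init(void)
{
	size_t i;

	for (i = 0; i < RID_COUNT; i++)
		pe_rmap[i] = INVALID_PE;
}

/* Build the Request ID used as the array index. */
static unsigned int rid_of(unsigned int bus, unsigned int devfn)
{
	assert(bus < 256 && devfn < 256);
	return bus << 8 | devfn;
}

static void rmap_set(unsigned int bus, unsigned int devfn, int pe)
{
	pe_rmap[rid_of(bus, devfn)] = pe;
}

/* Returns the PE# for {bus, devfn}, or INVALID_PE if none is mapped. */
static int rmap_lookup(unsigned int bus, unsigned int devfn)
{
	return pe_rmap[rid_of(bus, devfn)];
}
```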

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order
  2016-04-19  3:07   ` Alexey Kardashevskiy
@ 2016-04-20  1:04     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  1:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 01:07:59PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>PE number for one particular PE can be allocated dynamically or
>>reserved according to the consumed M64 (64-bits prefetchable)
>>segments of the PE. The M64 resources, and hence their segments
>>and PE number are assigned/reserved in ascending order. The PE
>>numbers are allocated dynamically in ascending order as well.
>>It's not a problem as the PE numbers are reserved and then
>>allocated all at once in fine order. However, it will introduce
>>conflicts when PCI hotplug is supported: the PE number to be
>>reserved for newly added PE might have been assigned.
>>
>>To resolve above conflicts, this forces the PE number to be
>>allocated dynamically in reverse order. With this patch applied,
>>the PE numbers are reserved in ascending order, but allocated
>>dynamically in reverse order.
>
>
>The patch is probably is ok, the commit log is not - I do not follow it. Some
>PEs are reserved (for what? why does the absolute PE number matter? put it in
>the commit log), that means that the corresponding bits in pe_alloc[] should
>be set so when you will be allocating PEs for a just plugged device, you
>won't pick them and you will pick free ones, and the order should not matter.
>I would think that "reservation" happens once at the boot time so you set
>"used" bits for the reserved PEs then and after that the dynamic allocator
>will skip them.
>

I will enhance the commit log in the next revision, perhaps picking up part
of the words below: On PHB3, there are 16 M64 BARs in hardware. The last one
is split evenly into 256 segments. Each segment is associated with a fixed
PE# (segment#x <-> PE#x), which is how the hardware was designed. If a
plugged PE has M64 (64-bit prefetchable memory) resources, its PE# is equal
to the segment#. Otherwise, if the PE doesn't contain any M64 resource, its
PE# is allocated dynamically.

The M64 resources are assigned from the low end to the high end, meaning the
reserved PE#s (according to the M64 segments) grow from the low end to the
high end. With dynamic allocation in ascending order, it's quite likely to
hand out a PE# that should have been reserved because of an M64 segment.
That is the conflict the patch tries to resolve.

The PE# reservation doesn't happen just once at boot time because it's
unknown how many PEs and how much M64 resource will be hot added.
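
The idea can be sketched with a toy allocator: reservations driven by M64
segments grow up from PE#0, while dynamic allocation scans down from the
highest PE#, so the two only collide when the space is nearly exhausted.
This is a simplified illustration, not the kernel code: `struct pe_alloc`,
`pe_reserve()` and `pe_alloc_reverse()` are hypothetical names, a bool array
stands in for the bitmap, and `TOTAL_PE` of 256 is assumed from the PHB3
description above. The loop index is signed so that the `pe >= 0` test can
actually end the loop.

```c
#include <assert.h>
#include <stdbool.h>

#define TOTAL_PE 256	/* assumed: one PE# per M64 segment, as on PHB3 */

struct pe_alloc {
	bool used[TOTAL_PE];	/* stand-in for the pe_alloc bitmap */
};

/* Reserve the PE# tied to an M64 segment (segment#x <-> PE#x). */
static bool pe_reserve(struct pe_alloc *a, int pe_no)
{
	assert(pe_no >= 0 && pe_no < TOTAL_PE);
	if (a->used[pe_no])
		return false;	/* already handed out: the conflict above */
	a->used[pe_no] = true;
	return true;
}

/* Dynamically allocate a PE#, scanning from the highest number down. */
static int pe_alloc_reverse(struct pe_alloc *a)
{
	int pe;	/* signed, so the pe >= 0 test can terminate the loop */

	for (pe = TOTAL_PE - 1; pe >= 0; pe--) {
		if (!a->used[pe]) {
			a->used[pe] = true;
			return pe;
		}
	}
	return -1;	/* no free PE# left */
}
```

With an ascending allocator, the first dynamic allocation would return PE#0
and a later hot-added device whose M64 BAR lands in segment#0 could no
longer reserve it. With the reverse scan, the first dynamic allocations
return 255, 254, ... and the low PE#s stay available for reservation.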

>
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++--------
>>  1 file changed, 6 insertions(+), 8 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index f182ca7..565725b 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -144,16 +144,14 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>
>>  static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  {
>>-	unsigned long pe;
>>+	unsigned long pe = phb->ioda.total_pe_num - 1;
>>
>>-	do {
>>-		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>-					phb->ioda.total_pe_num, 0);
>>-		if (pe >= phb->ioda.total_pe_num)
>>-			return NULL;
>>-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>+	for (pe = phb->ioda.total_pe_num - 1; pe >= 0; pe--) {
>>+		if (!test_and_set_bit(pe, phb->ioda.pe_alloc))
>>+			return pnv_ioda_init_pe(phb, pe);
>>+	}
>>
>>-	return pnv_ioda_init_pe(phb, pe);
>>+	return NULL;
>>  }
>>
>>  static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>>
>
>
>-- 
>Alexey
>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time
  2016-04-19  4:16   ` Alexey Kardashevskiy
@ 2016-04-20  1:12     ` Gavin Shan
  2016-04-20  3:00       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  1:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 02:16:42PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>Currently, the PEs and their associated resources are assigned
>>in ppc_md.pcibios_fixup() except those used by SRIOV VFs.
>
>But this new code does not affect IOV and VF's PEs will still be created
>somewhere else rather than pnv_pci_setup_bridge()?
>

Correct. VF PEs cannot be created in pnv_pci_setup_bridge() as the PF's
IOV capability isn't enabled at that point.

>
>>The
>>function is called for once after PCI probing and resources
>>assignment is completed. So it isn't hotplug friendly.
>>
>>This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
>>is called on the event during system bootup and PCI hotplug: updating
>>PCI bridge's windows after resource assignment/reassignment are done.
>>For partial hotplug case, where not all PCI devices belonging to the
>>PE are unplugged and plugged again, we just need unbinding/binding
>>the affected PCI devices with the corresponding PE without creating
>>new one.
>>
>>As there is no upstream bridge for root bus that needs to be covered
>>by PE, we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
>>before any other PEs can be created, as PE for root bus is the ancestor
>>to anyone else.
>
>We did not need a root bus PE before? What is the other PE reserved for?
>Comments only say "reserved"...
>

No, a PE for the root bus was needed before as well. The other PEs can be
for the PCI buses originating from the root port and its subordinate domains.
 
>>
>>Also, the windows of root port or the upstream port of PCIe switch behind
>>root port are extended to be PHB's apertures to accommodate the additional
>>resources needed by newly plugged devices based on the fact: hotpluggable
>>slot is behind root port or downstream port of the PCIe switch behind
>>root port. The extension for those PCI brdiges' windows is done in
>>ppc_md.pcibios_setup_bridge() as well.
>
>
>This patch seems to be doing way too many things, hard to follow.
>
>Could you please split the patch into smaller chunks? For example (you can do
>it totally different):
>- move pnv_pci_ioda_setup_opal_tce_kill()
>- move PE creation from pnv_pci_ioda_fixup() to pnv_pci_setup_bridge();
>- add pnv_pci_fixup_bridge_resources()
>- add an extra reserved PE for the root bus (and all this magic with
>root_pe_idx/root_pe_populated)
>- ...
>

I'll evaluate it later. It's always nice to have small patches. Thanks
for the comments.

>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table
  2016-04-19  4:28   ` Alexey Kardashevskiy
@ 2016-04-20  1:15     ` Gavin Shan
  2016-04-20  3:17       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  1:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 02:28:51PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>pnv_pci_ioda_table_free_pages() can be reused to release the IODA1
>>TCE table when releasing IODA1 PE in subsequent patches.
>>
>>This renames the following functions to support releasing IODA1 TCE
>>table: pnv_pci_ioda2_table_free_pages() to pnv_pci_ioda_table_free_pages(),
>>pnv_pci_ioda2_table_do_free_pages() to pnv_pci_ioda_table_do_free_pages().
>>No logical changes introduced.
>
>I can only see renaming here but it seems (from
>IODA_architecture_04-14-2008.pdf) that IODA1 does not support multi-level TCE
>tables in the way IODA2 does.
>

Note that the change was proposed by you in the last round. Yes, the TVE on
P7IOC doesn't support multiple levels of TCE tables. In this case, we will
always have "tbl->it_indirect_levels" set to 1, right?
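
For reference, the recursive release that the renamed helpers perform can be
sketched in user space as below. This is an illustration under assumptions,
not the kernel implementation: ENT_VALID stands in for the TCE read/write
permission bits, calloc()/free() stand in for the page allocator, and
freed_pages exists only so the behaviour can be checked. In this sketch the
"level" argument counts the indirect levels below the current table, so a
flat single-level table is released with level 0 and the loop over entries
is skipped entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENT_VALID 0x3ULL	/* hypothetical read/write permission bits */

static unsigned long freed_pages;	/* bookkeeping for the example only */

/* Allocate one zeroed table level; calloc() keeps the low bits clear. */
static uint64_t *alloc_level(size_t entries)
{
	return calloc(entries, sizeof(uint64_t));
}

/* Free a table and, for indirect levels, every table it points to. */
static void table_do_free(uint64_t *addr, size_t entries, unsigned level)
{
	if (level) {
		for (size_t i = 0; i < entries; i++) {
			uint64_t e = addr[i];

			/* Entries without permission bits point nowhere. */
			if (!(e & ENT_VALID))
				continue;

			table_do_free((uint64_t *)(uintptr_t)(e & ~ENT_VALID),
				      entries, level - 1);
		}
	}

	free(addr);
	freed_pages++;
}
```

Because the single-level case is just the degenerate form of the same walk,
one helper can serve both table layouts.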

>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index d360607..077f9db 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -51,7 +51,7 @@
>>  #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>>  #define POWERNV_IOMMU_MAX_LEVELS	5
>>
>>-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
>>+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl);
>>
>>  static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>>  			    const char *fmt, ...)
>>@@ -1352,7 +1352,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>>  		iommu_group_put(pe->table_group.group);
>>  		BUG_ON(pe->table_group.group);
>>  	}
>>-	pnv_pci_ioda2_table_free_pages(tbl);
>>+	pnv_pci_ioda_table_free_pages(tbl);
>>  	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>  }
>>
>>@@ -1946,7 +1946,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
>>
>>  static void pnv_ioda2_table_free(struct iommu_table *tbl)
>>  {
>>-	pnv_pci_ioda2_table_free_pages(tbl);
>>+	pnv_pci_ioda_table_free_pages(tbl);
>>  	iommu_free_table(tbl, "pnv");
>>  }
>>
>>@@ -2448,7 +2448,7 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift,
>>  	return addr;
>>  }
>>
>>-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>>  		unsigned long size, unsigned level);
>>
>>  static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>@@ -2487,7 +2487,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>  	 * release partially allocated table.
>>  	 */
>>  	if (offset < tce_table_size) {
>>-		pnv_pci_ioda2_table_do_free_pages(addr,
>>+		pnv_pci_ioda_table_do_free_pages(addr,
>>  				1ULL << (level_shift - 3), levels - 1);
>>  		return -ENOMEM;
>>  	}
>>@@ -2505,7 +2505,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>  	return 0;
>>  }
>>
>>-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>>  		unsigned long size, unsigned level)
>>  {
>>  	const unsigned long addr_ul = (unsigned long) addr &
>>@@ -2521,7 +2521,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>  			if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
>>  				continue;
>>
>>-			pnv_pci_ioda2_table_do_free_pages(__va(hpa), size,
>>+			pnv_pci_ioda_table_do_free_pages(__va(hpa), size,
>>  					level - 1);
>>  		}
>>  	}
>>@@ -2529,7 +2529,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>  	free_pages(addr_ul, get_order(size << 3));
>>  }
>>
>>-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>>+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl)
>>  {
>>  	const unsigned long size = tbl->it_indirect_levels ?
>>  			tbl->it_level_size : tbl->it_size;
>>@@ -2537,7 +2537,7 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>>  	if (!tbl->it_size)
>>  		return;
>>
>>-	pnv_pci_ioda2_table_do_free_pages((__be64 *)tbl->it_base, size,
>>+	pnv_pci_ioda_table_do_free_pages((__be64 *)tbl->it_base, size,
>>  			tbl->it_indirect_levels);
>>  }
>>
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  2016-04-19  5:28   ` [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices() Alexey Kardashevskiy
@ 2016-04-20  1:23     ` Gavin Shan
  2016-04-20  3:21       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  1:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 03:28:36PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
>>with names of the weak functions in PCI subsystem, which have the
>>prefix "pcibios". No logical changes introduced.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h |  4 ++--
>>  arch/powerpc/kernel/eeh_driver.c      | 12 ++++++------
>>  arch/powerpc/kernel/pci-hotplug.c     | 15 +++++++--------
>>  drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
>>  drivers/pci/hotplug/rpaphp_core.c     |  4 ++--
>>  drivers/pci/hotplug/rpaphp_pci.c      |  2 +-
>>  6 files changed, 19 insertions(+), 20 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index 4dd6ef4..c817f38 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -263,10 +263,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
>>  extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
>>
>>  /** Remove all of the PCI devices under this bus */
>>-extern void pcibios_remove_pci_devices(struct pci_bus *bus);
>>+extern void pci_remove_pci_devices(struct pci_bus *bus);
>
>
>pci_lala_pci_lala() ("pci" is used twice) looks weird, if the prefix is
>"pci", what other device types can they handle?...
>
>May be pcihp_add_devices(), pcihp_remove_devices() as these as defined in
>pci-hotplug.c?
>

I assume you're talking about drivers/pci/hotplug/pci_hotplug_core.c.
pci_hotplug_core.c uses the pci_hp_ prefix rather than pcihp_. I will
rename them to pci_hp_*() in the next revision.

gwshan@gwshan:~/sandbox/linux$ find . -name pci-hotplug.c
./arch/powerpc/kernel/pci-hotplug.c
gwshan@gwshan:~/sandbox/linux$ grep pci*hp arch/powerpc/kernel/pci-hotplug.c 

>
>>
>>  /** Discover new pci devices under this bus, and add them */
>>-extern void pcibios_add_pci_devices(struct pci_bus *bus);
>>+extern void pci_add_pci_devices(struct pci_bus *bus);
>>
>>
>>  extern void isa_bridge_find_early(struct pci_controller *hose);
>>diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>index fb6207d..59e53fe 100644
>>--- a/arch/powerpc/kernel/eeh_driver.c
>>+++ b/arch/powerpc/kernel/eeh_driver.c
>>@@ -621,7 +621,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>  	 * We don't remove the corresponding PE instances because
>>  	 * we need the information afterwords. The attached EEH
>>  	 * devices are expected to be attached soon when calling
>>-	 * into pcibios_add_pci_devices().
>>+	 * into pci_add_pci_devices().
>>  	 */
>>  	eeh_pe_state_mark(pe, EEH_PE_KEEP);
>>  	if (bus) {
>>@@ -630,7 +630,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>  		} else {
>>  			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
>>  			pci_lock_rescan_remove();
>>-			pcibios_remove_pci_devices(bus);
>>+			pci_remove_pci_devices(bus);
>>  			pci_unlock_rescan_remove();
>>  		}
>>  	} else if (frozen_bus) {
>>@@ -681,7 +681,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>  		if (pe->type & EEH_PE_VF)
>>  			eeh_add_virt_device(edev, NULL);
>>  		else
>>-			pcibios_add_pci_devices(bus);
>>+			pci_add_pci_devices(bus);
>>  	} else if (frozen_bus && rmv_data->removed) {
>>  		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>>  		ssleep(5);
>>@@ -691,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>  		if (pe->type & EEH_PE_VF)
>>  			eeh_add_virt_device(edev, NULL);
>>  		else
>>-			pcibios_add_pci_devices(frozen_bus);
>>+			pci_add_pci_devices(frozen_bus);
>>  	}
>>  	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>>
>>@@ -896,7 +896,7 @@ perm_error:
>>  			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>
>>  			pci_lock_rescan_remove();
>>-			pcibios_remove_pci_devices(frozen_bus);
>>+			pci_remove_pci_devices(frozen_bus);
>>  			pci_unlock_rescan_remove();
>>  		}
>>  	}
>>@@ -981,7 +981,7 @@ static void eeh_handle_special_event(void)
>>  				bus = eeh_pe_bus_get(phb_pe);
>>  				eeh_pe_dev_traverse(pe,
>>  					eeh_report_failure, NULL);
>>-				pcibios_remove_pci_devices(bus);
>>+				pci_remove_pci_devices(bus);
>>  			}
>>  			pci_unlock_rescan_remove();
>>  		}
>>diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>>index 59c4361..78bf2a1 100644
>>--- a/arch/powerpc/kernel/pci-hotplug.c
>>+++ b/arch/powerpc/kernel/pci-hotplug.c
>>@@ -38,20 +38,20 @@ void pcibios_release_device(struct pci_dev *dev)
>>  }
>>
>>  /**
>>- * pcibios_remove_pci_devices - remove all devices under this bus
>>+ * pci_remove_pci_devices - remove all devices under this bus
>>   * @bus: the indicated PCI bus
>>   *
>>   * Remove all of the PCI devices under this bus both from the
>>   * linux pci device tree, and from the powerpc EEH address cache.
>>   */
>>-void pcibios_remove_pci_devices(struct pci_bus *bus)
>>+void pci_remove_pci_devices(struct pci_bus *bus)
>>  {
>>  	struct pci_dev *dev, *tmp;
>>  	struct pci_bus *child_bus;
>>
>>  	/* First go down child busses */
>>  	list_for_each_entry(child_bus, &bus->children, node)
>>-		pcibios_remove_pci_devices(child_bus);
>>+		pci_remove_pci_devices(child_bus);
>>
>>  	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
>>  		 pci_domain_nr(bus),  bus->number);
>>@@ -60,11 +60,10 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
>>  		pci_stop_and_remove_bus_device(dev);
>>  	}
>>  }
>>-
>>-EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
>>+EXPORT_SYMBOL_GPL(pci_remove_pci_devices);
>>
>>  /**
>>- * pcibios_add_pci_devices - adds new pci devices to bus
>>+ * pci_add_pci_devices - adds new pci devices to bus
>>   * @bus: the indicated PCI bus
>>   *
>>   * This routine will find and fixup new pci devices under
>>@@ -74,7 +73,7 @@ EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
>>   * is how this routine differs from other, similar pcibios
>>   * routines.)
>>   */
>>-void pcibios_add_pci_devices(struct pci_bus * bus)
>>+void pci_add_pci_devices(struct pci_bus *bus)
>>  {
>>  	int slotno, mode, pass, max;
>>  	struct pci_dev *dev;
>>@@ -114,4 +113,4 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
>>  	}
>>  	pcibios_finish_adding_to_bus(bus);
>>  }
>>-EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
>>+EXPORT_SYMBOL_GPL(pci_add_pci_devices);
>>diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
>>index b46b57d..730982b 100644
>>--- a/drivers/pci/hotplug/rpadlpar_core.c
>>+++ b/drivers/pci/hotplug/rpadlpar_core.c
>>@@ -380,7 +380,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
>>  	}
>>
>>  	/* Remove all devices below slot */
>>-	pcibios_remove_pci_devices(bus);
>>+	pci_remove_pci_devices(bus);
>>
>>  	/* Unmap PCI IO space */
>>  	if (pcibios_unmap_io_space(bus)) {
>>diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
>>index 611f605..bba07b3 100644
>>--- a/drivers/pci/hotplug/rpaphp_core.c
>>+++ b/drivers/pci/hotplug/rpaphp_core.c
>>@@ -404,7 +404,7 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
>>
>>  	if (state == PRESENT) {
>>  		pci_lock_rescan_remove();
>>-		pcibios_add_pci_devices(slot->bus);
>>+		pci_add_pci_devices(slot->bus);
>>  		pci_unlock_rescan_remove();
>>  		slot->state = CONFIGURED;
>>  	} else if (state == EMPTY) {
>>@@ -426,7 +426,7 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
>>  		return -EINVAL;
>>
>>  	pci_lock_rescan_remove();
>>-	pcibios_remove_pci_devices(slot->bus);
>>+	pci_remove_pci_devices(slot->bus);
>>  	pci_unlock_rescan_remove();
>>  	vm_unmap_aliases();
>>
>>diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
>>index 7836d69..1099b38 100644
>>--- a/drivers/pci/hotplug/rpaphp_pci.c
>>+++ b/drivers/pci/hotplug/rpaphp_pci.c
>>@@ -116,7 +116,7 @@ int rpaphp_enable_slot(struct slot *slot)
>>  		}
>>
>>  		if (list_empty(&bus->devices))
>>-			pcibios_add_pci_devices(bus);
>>+			pci_add_pci_devices(bus);
>>
>>  		if (!list_empty(&bus->devices)) {
>>  			info->adapter_status = CONFIGURED;
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info()
  2016-04-19  5:48   ` Alexey Kardashevskiy
@ 2016-04-20  1:25     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  1:25 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 03:48:26PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This implements and exports pci_remove_device_node_info(). It's
>>used to remove the pdn (struct pci_dn) for the indicated device
>>node. The function is going to be used by PowerNV PCI hotplug
>>driver.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>Kind of strange that there is no such helper for pseries, is there?
>

I didn't find one, actually. If you find one, please let me know, thanks!

>
>Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h |  1 +
>>  arch/powerpc/kernel/pci_dn.c          | 23 +++++++++++++++++++++++
>>  2 files changed, 24 insertions(+)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index 72a9d4e..c6310e2 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -240,6 +240,7 @@ extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
>>  extern void remove_dev_pci_data(struct pci_dev *pdev);
>>  extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>>  					       struct device_node *dn);
>>+extern void pci_remove_device_node_info(struct device_node *dn);
>>
>>  static inline int pci_device_from_OF_node(struct device_node *np,
>>  					  u8 *bus, u8 *devfn)
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index 0a249ff..ce10281 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -331,6 +331,29 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>>  }
>>  EXPORT_SYMBOL_GPL(pci_add_device_node_info);
>>
>>+void pci_remove_device_node_info(struct device_node *dn)
>>+{
>>+	struct pci_dn *pdn = dn ? PCI_DN(dn) : NULL;
>>+#ifdef CONFIG_EEH
>>+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>+
>>+	if (edev)
>>+		edev->pdn = NULL;
>>+#endif
>>+
>>+	if (!pdn)
>>+		return;
>>+
>>+	WARN_ON(!list_empty(&pdn->child_list));
>>+	list_del(&pdn->list);
>>+	if (pdn->parent)
>>+		of_node_put(pdn->parent->node);
>>+
>>+	dn->data = NULL;
>>+	kfree(pdn);
>>+}
>>+EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>>+
>>  /*
>>   * Traverse a device tree stopping each PCI device in the tree.
>>   * This is done depth first.  As each node is processed, a "pre"
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes()
  2016-04-19  5:51       ` Alexey Kardashevskiy
  (?)
@ 2016-04-20  1:27       ` Gavin Shan
  2016-04-20  3:39         ` Alexey Kardashevskiy
  -1 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  1:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 03:51:03PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This renames traverse_pci_devices() to pci_traverse_device_nodes().
>>The function traverses all subordinate device nodes of the specified
>>one. Also, below cleanup applied to the function. No logical changes
>>introduced.
>>
>>    * Rename "pre" to "fn".
>>    * Avoid assignment in if condition reported from checkpatch.pl.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
>>  arch/powerpc/kernel/pci_dn.c         | 15 ++++++++++-----
>>  arch/powerpc/platforms/pseries/msi.c |  4 ++--
>>  3 files changed, 15 insertions(+), 10 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
>>index ca0c5bf..8753e4e 100644
>>--- a/arch/powerpc/include/asm/ppc-pci.h
>>+++ b/arch/powerpc/include/asm/ppc-pci.h
>>@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
>>  struct device_node;
>>  struct pci_dn;
>>
>>-typedef void *(*traverse_func)(struct device_node *me, void *data);
>
>
>
>Why removing this typedef? Typedef's are good.
>
>Anyway,
>

Could you please provide more details on why it's good? I removed it
because it was used only once.
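
To make the two styles under discussion concrete, here is a small
stand-alone sketch. "struct node" is a hypothetical stand-in for struct
device_node, and traverse_a()/traverse_b() are made-up names; the walk
itself is simplified (it visits the start node, its siblings and all their
descendants depth first) and is not the kernel's traversal. The only point
is how the callback parameter is declared in each variant.

```c
#include <assert.h>
#include <stddef.h>

struct node {			/* hypothetical stand-in for device_node */
	int id;
	struct node *child;
	struct node *sibling;
};

/* Style A: name the callback type once and reuse it. */
typedef void *(*traverse_func)(struct node *me, void *data);

static void *traverse_a(struct node *start, traverse_func fn, void *data)
{
	for (struct node *n = start; n; n = n->sibling) {
		void *ret = fn ? fn(n, data) : NULL;

		if (ret)
			return ret;	/* non-NULL stops the walk */

		ret = traverse_a(n->child, fn, data);
		if (ret)
			return ret;
	}
	return NULL;
}

/* Style B: spell out the callback signature in the prototype. */
static void *traverse_b(struct node *start,
			void *(*fn)(struct node *, void *),
			void *data)
{
	return traverse_a(start, fn, data);
}

/* Example callbacks: count every node, or stop at a matching id. */
static void *count_nodes(struct node *me, void *data)
{
	(void)me;
	(*(int *)data)++;
	return NULL;
}

static void *find_id(struct node *me, void *data)
{
	return me->id == *(int *)data ? me : NULL;
}
```

Both forms accept the same callbacks. The typedef mainly pays off when the
signature appears in several prototypes; with a single user, the inline
form keeps the whole contract visible at the declaration.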


>
>Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>
>
>
>>-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>-		void *data);
>>+void *pci_traverse_device_nodes(struct device_node *start,
>>+				void *(*fn)(struct device_node *, void *),
>>+				void *data);
>>  void *traverse_pci_dn(struct pci_dn *root,
>>  		      void *(*fn)(struct pci_dn *, void *),
>>  		      void *data);
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index ce10281..ecdccce 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -372,8 +372,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>>   * one of these nodes we also assume its siblings are non-pci for
>>   * performance.
>>   */
>>-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>-		void *data)
>>+void *pci_traverse_device_nodes(struct device_node *start,
>>+				void *(*fn)(struct device_node *, void *),
>>+				void *data)
>>  {
>>  	struct device_node *dn, *nextdn;
>>  	void *ret;
>>@@ -388,8 +389,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>  		if (classp)
>>  			class = of_read_number(classp, 1);
>>
>>-		if (pre && ((ret = pre(dn, data)) != NULL))
>>-			return ret;
>>+		if (fn) {
>>+			ret = fn(dn, data);
>>+			if (ret)
>>+				return ret;
>>+		}
>>
>>  		/* If we are a PCI bridge, go down */
>>  		if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
>>@@ -411,6 +415,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>  	}
>>  	return NULL;
>>  }
>>+EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);
>>
>>  static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
>>  				      struct pci_dn *pdn)
>>@@ -487,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>>  	}
>>
>>  	/* Update dn->phb ptrs for new phb and children devices */
>>-	traverse_pci_devices(dn, add_pdn, phb);
>>+	pci_traverse_device_nodes(dn, add_pdn, phb);
>>  }
>>
>>  /**
>>diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
>>index 272e9ec..543a638 100644
>>--- a/arch/powerpc/platforms/pseries/msi.c
>>+++ b/arch/powerpc/platforms/pseries/msi.c
>>@@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>>  	memset(&counts, 0, sizeof(struct msi_counts));
>>
>>  	/* Work out how many devices we have below this PE */
>>-	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
>>+	pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
>>
>>  	if (counts.num_devices == 0) {
>>  		pr_err("rtas_msi: found 0 devices under PE for %s\n",
>>@@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>>  	/* else, we have some more calculating to do */
>>  	counts.requestor = pci_device_to_OF_node(dev);
>>  	counts.request = request;
>>-	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
>>+	pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
>>
>>  	/* If the quota isn't an integer multiple of the total, we can
>>  	 * use the remainder as spare MSIs for anyone that wants them. */
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 35/45] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2016-04-19  9:04       ` Alexey Kardashevskiy
  (?)
@ 2016-04-20  1:36       ` Gavin Shan
  -1 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  1:36 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 07:04:19PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>In pnv_pci_reset_secondary_bus(), we should issue fundamental reset
>>if any one subordinate device of the specified bus is requesting that.
>>Otherwise, the device might not come up after the reset.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>
>Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>
>Out of curiosity - what does "fundamental" reset actually do?
>

Please refer to the skiboot patches: a fundamental reset powers the target
slot off and then on again.
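
On the kernel side, the selection logic in this patch boils down to "use a
fundamental reset if any device below the bridge asks for one". A minimal
user-space sketch of that decision follows. The types and names (struct dev,
walk_devs(), pick_reset(), RESET_HOT, RESET_FUNDAMENTAL) are hypothetical
stand-ins, with walk_devs() playing the role of pci_walk_bus(): a non-zero
return from the callback stops the walk early.

```c
#include <assert.h>
#include <stddef.h>

enum reset_option { RESET_HOT, RESET_FUNDAMENTAL };	/* hypothetical */

struct dev {
	int needs_freset;	/* device requests a fundamental reset */
};

/* Callback: accumulate the request; non-zero return ends the walk. */
static int dev_reset_type(const struct dev *d, void *data)
{
	int *freset = data;

	*freset |= d->needs_freset;
	return *freset;
}

/* Visit each device until the callback returns non-zero. */
static void walk_devs(const struct dev *devs, size_t n,
		      int (*cb)(const struct dev *, void *), void *data)
{
	for (size_t i = 0; i < n; i++)
		if (cb(&devs[i], data))
			break;
}

/* Pick the reset type for everything below one bridge. */
static enum reset_option pick_reset(const struct dev *devs, size_t n)
{
	int freset = 0;

	walk_devs(devs, n, dev_reset_type, &freset);
	return freset ? RESET_FUNDAMENTAL : RESET_HOT;
}
```

A bridge with no subordinate devices falls through to the hot reset, which
matches the patch skipping the walk when dev->subordinate is NULL.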

>
>>---
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 21 ++++++++++++++++++++-
>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index 593b8dc..c7454ba 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -866,9 +866,28 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>  	return 0;
>>  }
>>
>>+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
>>+{
>>+	int *freset = data;
>>+
>>+	/*
>>+	 * Stop the iteration immediately if there has any one
>>+	 * PCI device requesting fundamental reset.
>>+	 */
>>+	*freset |= pdev->needs_freset;
>>+	return *freset;
>>+}
>>+
>>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>  {
>>-	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
>>+	int option, freset = 0;
>>+
>>+	if (dev->subordinate)
>>+		pci_walk_bus(dev->subordinate,
>>+			     pnv_pci_dev_reset_type, &freset);
>>+
>>+	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
>>+	pnv_eeh_bridge_reset(dev, option);
>>  	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>>  }
>>
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-04-19 10:36         ` Alexey Kardashevskiy
  (?)
@ 2016-04-20  1:55         ` Alistair Popple
  2016-05-02 23:41           ` Gavin Shan
  -1 siblings, 1 reply; 174+ messages in thread
From: Alistair Popple @ 2016-04-20  1:55 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Gavin Shan, devicetree, linux-pci,
	grant.likely, robherring2, bhelgaas, dja

On Tue, 19 Apr 2016 20:36:48 Alexey Kardashevskiy wrote:
> On 02/17/2016 02:44 PM, Gavin Shan wrote:
> > This adds standalone driver to support PCI hotplug for PowerPC PowerNV
> > platform that runs on top of skiboot firmware. The firmware identifies
> > hotpluggable slots and marked their device tree node with proper
> > "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans
> > device tree nodes to create/register PCI hotplug slot accordingly.
> >
> > The PCI slots are organized in fashion of tree, which means one
> > PCI slot might have parent PCI slot and parent PCI slot possibly
> > contains multiple child PCI slots. At the plugging time, the parent
> > PCI slot is populated before its children. The child PCI slots are
> > removed before their parent PCI slot can be removed from the system.
> >
> > If the skiboot firmware doesn't support slot status retrieval, the PCI
> > slot device node shouldn't have property "ibm,reset-by-firmware". In
> > that case, none of valid PCI slots will be detected from device tree.
> > The skiboot firmware doesn't export the capability to access attention
> > LEDs yet and it's something for TBD.
> >
> > Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > ---
> >   drivers/pci/hotplug/Kconfig   |  12 +
> >   drivers/pci/hotplug/Makefile  |   3 +
> >   drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 885 insertions(+)
> >   create mode 100644 drivers/pci/hotplug/pnv_php.c
> >
> > diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> > index df8caec..167c8ce 100644
> > --- a/drivers/pci/hotplug/Kconfig
> > +++ b/drivers/pci/hotplug/Kconfig
> > @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
> >
> >   	  When in doubt, say N.
> >
> > +config HOTPLUG_PCI_POWERNV
> > +	tristate "PowerPC PowerNV PCI Hotplug driver"
> > +	depends on PPC_POWERNV && EEH
> > +	help
> > +	  Say Y here if you run PowerPC PowerNV platform that supports
> > +	  PCI Hotplug
> > +
> > +	  To compile this driver as a module, choose M here: the
> > +	  module will be called pnv-php.
> > +
> > +	  When in doubt, say N.
> > +
> >   config HOTPLUG_PCI_RPA
> >   	tristate "RPA PCI Hotplug driver"
> >   	depends on PPC_PSERIES && EEH
> > diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> > index b616e75..e33cdda 100644
> > --- a/drivers/pci/hotplug/Makefile
> > +++ b/drivers/pci/hotplug/Makefile
> > @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
> >   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
> >   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
> >   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
> > +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
> >   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
> >   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
> >   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
> > @@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
> >   acpiphp-objs		:=	acpiphp_core.o	\
> >   				acpiphp_glue.o
> >
> > +pnv-php-objs		:=	pnv_php.o
> > +
> >   rpaphp-objs		:=	rpaphp_core.o	\
> >   				rpaphp_pci.o	\
> >   				rpaphp_slot.o
> > diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> > new file mode 100644
> > index 0000000..364ec36
> > --- /dev/null
> > +++ b/drivers/pci/hotplug/pnv_php.c
> > @@ -0,0 +1,870 @@
> > +/*
> > + * PCI Hotplug Driver for PowerPC PowerNV platform.
> > + *
> > + * Copyright Gavin Shan, IBM Corporation 2015.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include <linux/libfdt.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/pci_hotplug.h>
> > +
> > +#include <asm/opal.h>
> > +#include <asm/pnv-pci.h>
> > +#include <asm/ppc-pci.h>
> > +
> > +#define DRIVER_VERSION	"0.1"
> > +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
> > +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
> > +
> > +struct pnv_php_slot {
> > +	struct hotplug_slot		slot;
> > +	struct hotplug_slot_info	slot_info;
> > +	uint64_t			id;
> > +	char				*name;
> > +	int				slot_no;
> > +	struct kref			kref;
> > +#define PNV_PHP_STATE_INITIALIZED	0
> > +#define PNV_PHP_STATE_REGISTERED	1
> > +#define PNV_PHP_STATE_POPULATED		2
> > +	int				state;
> > +	struct device_node		*dn;
> > +	struct pci_dev			*pdev;
> > +	struct pci_bus			*bus;
> > +	bool				power_state_check;
> > +	int				power_state_confirmed;
> > +#define PNV_PHP_POWER_CONFIRMED_INVALID	0
> > +#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
> > +#define PNV_PHP_POWER_CONFIRMED_FAIL	2
> > +	struct opal_msg			*msg;
> > +	void				*fdt;
> > +	void				*dt;
> > +	struct of_changeset		ocs;
> > +	struct work_struct		work;
> > +	wait_queue_head_t		queue;
> > +	struct pnv_php_slot		*parent;
> > +	struct list_head		children;
> > +	struct list_head		link;
> > +};
> > +
> > +static LIST_HEAD(pnv_php_slot_list);
> > +static DEFINE_SPINLOCK(pnv_php_lock);
> > +
> > +static void pnv_php_register(struct device_node *dn);
> > +static void pnv_php_unregister_one(struct device_node *dn);
> > +static void pnv_php_unregister(struct device_node *dn);
> 
> 
> The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(), 
> pnv_php_unregister_children() instead.
> 
> 
> Alistair, what do you reckon?

To be honest, I'm not sure the new names are necessarily any less confusing. I
will admit to having to read that code twice, though, so perhaps a short comment
describing what each of those functions does would be the best way to reduce
confusion.

- Alistair

> > +
> > +static void pnv_php_free_slot(struct kref *kref)
> > +{
> > +	struct pnv_php_slot *php_slot = container_of(kref,
> > +						     struct pnv_php_slot,
> > +						     kref);
> > +
> > +	WARN_ON(!list_empty(&php_slot->children));
> > +	kfree(php_slot->name);
> > +	kfree(php_slot);
> > +}
> > +
> > +static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
> > +{
> > +	if (!php_slot)
> 
> 
> BUG_ON()?
> 
> > +		return;
> > +
> > +	kref_put(&php_slot->kref, pnv_php_free_slot);
> > +}
> > +
> > +static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
> > +					  struct pnv_php_slot *php_slot)
> > +{
> > +	struct pnv_php_slot *target, *tmp;
> > +
> > +	if (php_slot->dn == dn) {
> > +		kref_get(&php_slot->kref);
> > +		return php_slot;
> > +	}
> > +
> > +	list_for_each_entry(tmp, &php_slot->children, link) {
> > +		target = pnv_php_match(dn, tmp);
> > +		if (target)
> > +			return target;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
> > +{
> > +	struct pnv_php_slot *php_slot, *tmp;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&pnv_php_lock, flags);
> > +	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
> > +		php_slot = pnv_php_match(dn, tmp);
> > +		if (php_slot) {
> > +			spin_unlock_irqrestore(&pnv_php_lock, flags);
> > +			return php_slot;
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> > +
> > +	return NULL;
> > +}
> > +
> > +/*
> > + * Remove pdn for all children of the indicated device node.
> > + * The function should remove pdn in a depth-first manner.
> > + */
> > +static void pnv_php_rmv_pdns(struct device_node *dn)
> > +{
> > +	struct device_node *child;
> > +
> > +	for_each_child_of_node(dn, child) {
> > +		pnv_php_rmv_pdns(child);
> > +
> > +		pci_remove_device_node_info(child);
> > +	}
> > +}
> > +
> > +/*
> > + * Remove all child nodes of the indicated device nodes. The
> > + * function should remove device nodes in depth-first manner.
> > + */
> > +static int pnv_php_rmv_device_nodes(struct device_node *parent)
> > +{
> > +	struct device_node *dn, *child;
> > +	int ret = 0;
> > +
> > +	for_each_child_of_node(parent, dn) {
> > +		ret = pnv_php_rmv_device_nodes(dn);
> > +		if (ret)
> > +			return ret;
> > +
> > +		child = of_get_next_child(dn, NULL);
> > +		if (child) {
> > +			of_node_put(child);
> > +			of_node_put(dn);
> > +			pr_err("%s: Alive children of node <%s>\n",
> > +			       __func__, of_node_full_name(dn));
> > +			return -EBUSY;
> > +		}
> > +
> > +		of_detach_node(dn);
> > +		of_node_put(dn);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * The function processes the message sent by firmware
> > + * to remove all device tree nodes beneath the slot's
> > + * nodes and the associated auxiliary data.
> > + */
> > +static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
> > +{
> > +	int ret;
> > +
> > +	pnv_php_rmv_pdns(php_slot->dn);
> > +
> > +	/*
> > +	 * If the device sub-tree was created from OF changeset, simply
> > +	 * to revert that. Otherwise, the device nodes in the sub-tree
> > +	 * need to be iterated and detached.
> > +	 */
> > +	if (php_slot->fdt) {
> > +		of_changeset_destroy(&php_slot->ocs);
> > +		kfree(php_slot->dt);
> > +		kfree(php_slot->fdt);
> > +		php_slot->dt        = NULL;
> > +		php_slot->dn->child = NULL;
> > +		php_slot->fdt       = NULL;
> > +		php_slot->power_state_confirmed =
> > +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
> > +		wake_up_interruptible(&php_slot->queue);
> > +		return;
> > +	}
> > +
> > +	ret = pnv_php_rmv_device_nodes(php_slot->dn);
> > +	if (!ret) {
> > +		php_slot->power_state_confirmed =
> > +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
> > +	} else {
> > +		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
> > +		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
> > +	}
> > +
> > +	wake_up_interruptible(&php_slot->queue);
> 
> 
> I liked one wake_up_interruptible() better...
> 
> 
> 
> > +}
> > +
> > +static int pnv_php_populate_changeset(struct of_changeset *ocs,
> > +				      struct device_node *dn)
> > +{
> > +	struct device_node *child;
> > +	int ret = 0;
> > +
> > +	for_each_child_of_node(dn, child) {
> > +		ret = of_changeset_attach_node(ocs, child);
> > +		if (ret)
> > +			break;
> > +
> > +		ret = pnv_php_populate_changeset(ocs, child);
> 
> 
> I asked in v7 - may be to add here "if (ret) break;"?
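For illustration only (this is not part of the posted patch): without a second
check, an error returned by the recursive call is overwritten by the next
iteration's attach result unless the failing subtree belongs to the last child.
Below is a minimal user-space model of the traversal with both early exits.
struct node, attach() and populate() are hypothetical stand-ins for
struct device_node, of_changeset_attach_node() and
pnv_php_populate_changeset().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device_node. */
struct node {
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
	int fail;		/* non-zero: attaching this node fails */
};

/* Number of nodes on which attach() has been attempted. */
static int visited;

/* Hypothetical stand-in for of_changeset_attach_node(). */
static int attach(struct node *n)
{
	visited++;
	return n->fail ? -1 : 0;
}

/* Attach every descendant of @parent, stopping at the first error. */
static int populate(struct node *parent)
{
	struct node *c;
	int ret = 0;

	for (c = parent->child; c; c = c->sibling) {
		ret = attach(c);
		if (ret)
			break;

		ret = populate(c);
		if (ret)	/* the check being asked for in the review */
			break;
	}

	return ret;
}
```

In the kernel, leaving for_each_child_of_node() early also means dropping the
reference held on the current child with of_node_put(); the model above has no
reference counts, so that step is not shown.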
> 
> 
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
> > +{
> > +	struct pci_controller *hose = (struct pci_controller *)data;
> > +	struct pci_dn *pdn;
> > +
> > +	pdn = pci_add_device_node_info(hose, dn);
> > +	if (!pdn)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	return NULL;
> > +}
> > +
> > +static void pnv_php_add_pdns(struct pnv_php_slot *slot)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(slot->bus);
> > +
> > +	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
> > +}
> > +
> > +static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
> > +{
> > +	void *fdt, *fdt1, *dt;
> > +	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
> > +	int ret;
> > +
> > +	/* We don't know the FDT blob size. We try to get it through
> > +	 * maximal memory chunk and then copy it to another chunk that
> > +	 * fits the real size.
> > +	 */
> > +	fdt1 = kzalloc(0x10000, GFP_KERNEL);
> > +	if (!fdt1)
> > +		goto error;
> > +
> > +	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
> > +	if (ret)
> > +		goto free_fdt1;
> > +
> > +	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
> > +	if (!fdt)
> > +		goto free_fdt1;
> > +
> > +	/* Unflatten device tree blob */
> > +	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
> > +	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
> > +	if (!dt) {
> > +		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
> > +		goto free_fdt;
> > +	}
> > +
> > +	/* Initialize and apply the changeset */
> > +	of_changeset_init(&php_slot->ocs);
> > +	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
> > +	if (ret) {
> > +		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
> > +			 ret);
> > +		goto free_dt;
> > +	}
> > +
> > +	php_slot->dn->child = NULL;
> > +	ret = of_changeset_apply(&php_slot->ocs);
> > +	if (ret) {
> > +		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
> > +			 ret);
> > +		goto destroy_changeset;
> > +	}
> > +
> > +	/* Add device node firmware data */
> > +	pnv_php_add_pdns(php_slot);
> > +	php_slot->fdt = fdt;
> > +	php_slot->dt  = dt;
> > +	goto out;
> > +
> > +destroy_changeset:
> > +	of_changeset_destroy(&php_slot->ocs);
> > +free_dt:
> > +	kfree(dt);
> > +	php_slot->dn->child = NULL;
> > +free_fdt:
> > +	kfree(fdt);
> > +free_fdt1:
> > +	kfree(fdt1);
> > +error:
> > +	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
> > +out:
> > +	/* Confirm status change */
> > +	php_slot->power_state_confirmed = confirm;
> > +	wake_up_interruptible(&php_slot->queue);
> > +}
> > +
> > +static void pnv_php_work(struct work_struct *data)
> > +{
> > +	struct pnv_php_slot *php_slot = container_of(data,
> > +						     struct pnv_php_slot,
> > +						     work);
> > +	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
> > +
> > +	if (event == OPAL_PCI_SLOT_POWER_OFF)
> > +		pnv_php_handle_poweroff(php_slot);
> > +	else
> > +		pnv_php_handle_poweron(php_slot);
> > +
> > +	pnv_php_put_slot(php_slot);
> > +}
> > +
> > +static int pnv_php_handle_msg(struct notifier_block *nb,
> > +			      unsigned long type,
> > +			      void *message)
> > +{
> > +	phandle h;
> > +	struct device_node *dn;
> > +	struct pnv_php_slot *php_slot;
> > +	struct opal_msg *msg = message;
> > +
> > +	if (type != OPAL_MSG_PCI_HOTPLUG) {
> > +		pr_warn("%s: Invalid message %ld received!\n",
> > +			__func__, type);
> > +		return NOTIFY_DONE;
> > +	}
> > +
> > +	h = (phandle)be64_to_cpu(msg->params[1]);
> > +	dn = of_find_node_by_phandle(h);
> > +	if (!dn) {
> > +		pr_warn("%s: No device node for phandle 0x%x\n",
> > +			__func__, h);
> > +		return NOTIFY_DONE;
> > +	}
> > +
> > +	php_slot = pnv_php_find_slot(dn);
> > +	if (!php_slot) {
> > +		pr_warn("%s: No slot found for node <%s>\n",
> > +			__func__, of_node_full_name(dn));
> > +		of_node_put(dn);
> > +		return NOTIFY_DONE;
> > +	}
> > +
> > +	of_node_put(dn);
> > +	php_slot->msg = msg;
> > +	schedule_work(&php_slot->work);
> > +	return NOTIFY_OK;
> > +}
> > +
> > +static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
> > +{
> > +	struct pnv_php_slot *php_slot = slot->private;
> > +	int ret;
> > +
> > +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> > +	ret = pnv_pci_set_power_state(php_slot->id, state);
> > +	if (ret) {
> > +		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
> > +			 ret, state ? "on" : "off");
> > +		return ret;
> > +	}
> > +
> > +	/* Continue to PCI probing after finalized device-tree. The
> > +	 * device-tree might have been updated completely at this
> > +	 * point. Thus we don't have to wait forever.
> > +	 */
> > +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> > +		return 0;
> > +
> > +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
> > +		return -EBUSY;
> > +
> > +	/* Wait for firmware to add or remove device sub-tree. When it's done,
> > +	 * one signal is received from firmware.
> > +	 */
> > +	ret = wait_event_timeout(php_slot->queue,
> > +				 php_slot->power_state_confirmed, 10 * HZ);
> > +	if (!ret) {
> > +		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
> > +			 ret, state ? "on" : "off");
> > +		return -EBUSY;
> > +	}
> > +
> > +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> > +		return 0;
> > +
> > +	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
> > +		 php_slot->power_state_confirmed, state ? "on" : "off");
> > +	return -EBUSY;
> > +}
> > +
> > +static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
> > +{
> > +	struct pnv_php_slot *php_slot = slot->private;
> > +	uint8_t power_state;
> 
> 
> Uninitialized variable.
> 
> 
> > +	int ret;
> > +
> > +	/*
> > +	 * Retrieve power status from firmware. If that
> > +	 * fails, the power status falls back to being
> > +	 * on.
> > +	 */
> > +	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
> > +	if (ret) {
> > +		*state = OPAL_PCI_SLOT_POWER_ON;
> > +		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
> > +			 ret);
> > +	} else {
> > +		*state = power_state;
> > +		slot->info->power_status = power_state;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
> > +{
> > +	struct pnv_php_slot *php_slot = slot->private;
> > +	uint8_t presence;
> 
> Uninitialized variable.
> 
> 
> > +	int ret;
> > +
> > +	/*
> > +	 * Retrieve presence status from firmware. If we can't
> > +	 * get that, it falls back to being empty.
> > +	 */
> > +	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
> > +	if (ret >= 0) {
> > +		*state = presence;
> > +		slot->info->adapter_status = presence;
> > +		ret = 0;
> > +	} else {
> > +		*state = OPAL_PCI_SLOT_EMPTY;
> > +		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
> > +			 ret);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
> > +{
> > +	/* FIXME: Make it real once firmware supports it */
> 
> It still does not?
> 
> 
> > +	slot->info->attention_status = state;
> > +
> > +	return 0;
> > +}
> > +
> > +static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
> > +{
> > +	struct hotplug_slot *slot = &php_slot->slot;
> > +	uint8_t presence, power_status;
> 
> 
> Uninitialized variables.
> 
> 
> > +	int ret;
> > +
> > +	/* Check if the slot has been configured */
> > +	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
> > +		return 0;
> > +
> > +	/* Retrieve slot presence status */
> > +	ret = pnv_php_get_adapter_state(slot, &presence);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Proceed if there is nothing behind the slot */
> > +	if (presence == OPAL_PCI_SLOT_EMPTY)
> > +		goto scan;
> > +
> > +	/*
> > +	 * If the power suply to the slot is off, we can't detect
> 
> s/suply/supply/
> 
> 
> > +	 * adapter presence state. That means we have to turn the
> > +	 * slot on before going to probe slot's presence state.
> > +	 *
> > +	 * On the first time, we don't change the power status to
> > +	 * boost system boot with assumption that the firmware
> > +	 * supplies consistent slot power status: empty slot always
> > +	 * has its power off and non-empty slot has its power on.
> > +	 */
> > +	if (!php_slot->power_state_check) {
> > +		php_slot->power_state_check = true;
> > +
> > +		ret = pnv_php_get_power_state(slot, &power_status);
> > +		if (ret)
> > +			return ret;
> > +
> > +		if (power_status != OPAL_PCI_SLOT_POWER_ON)
> > +			return 0;
> > +	}
> > +
> > +	/* Check the power status. Scan the slot if that's already on */
> 
> 
> s/that's/it is/
> 
> 
> > +	ret = pnv_php_get_power_state(slot, &power_status);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (power_status == OPAL_PCI_SLOT_POWER_ON)
> > +		goto scan;
> > +
> > +	/* Power is off, turn it on and then scan the slot */
> > +	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
> > +	if (ret)
> > +		return ret;
> > +
> > +scan:
> > +	if (presence == OPAL_PCI_SLOT_PRESENT) {
> > +		if (rescan) {
> > +			pci_lock_rescan_remove();
> > +			pci_add_pci_devices(php_slot->bus);
> > +			pci_unlock_rescan_remove();
> > +		}
> > +
> > +		/* Rescan for child hotpluggable slots */
> > +		php_slot->state = PNV_PHP_STATE_POPULATED;
> > +		if (rescan)
> > +			pnv_php_register(php_slot->dn);
> > +	} else {
> > +		php_slot->state = PNV_PHP_STATE_POPULATED;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int pnv_php_enable_slot(struct hotplug_slot *slot)
> > +{
> > +	struct pnv_php_slot *php_slot = container_of(slot,
> > +						     struct pnv_php_slot, slot);
> > +
> > +	return pnv_php_enable(php_slot, true);
> > +}
> > +
> > +static int pnv_php_disable_slot(struct hotplug_slot *slot)
> > +{
> > +	struct pnv_php_slot *php_slot = slot->private;
> > +	uint8_t power_state;
> > +	int ret;
> > +
> > +	if (php_slot->state != PNV_PHP_STATE_POPULATED)
> > +		return 0;
> > +
> > +	/* Remove all devices behind the slot */
> > +	pci_lock_rescan_remove();
> > +	pci_remove_pci_devices(php_slot->bus);
> > +	pci_unlock_rescan_remove();
> > +
> > +	/* Detach the child hotpluggable slots */
> > +	pnv_php_unregister(php_slot->dn);
> > +
> > +	/*
> > +	 * Check the power status and turn it off if necessary. If we
> > +	 * fail to get the power status, the power will be forced to
> > +	 * be off.
> > +	 */
> > +	ret = pnv_php_get_power_state(slot, &power_state);
> > +	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
> > +		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
> > +		if (ret)
> > +			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",
> 
> 
> Long line, checkpatch.pl should have warned :)
> 
> 
> > +				 ret);
> > +	}
> > +
> > +	/* Update slot state */
> > +	php_slot->state = PNV_PHP_STATE_REGISTERED;
> > +	return 0;
> > +}
> > +
> > +static struct hotplug_slot_ops php_slot_ops = {
> > +	.get_power_status	= pnv_php_get_power_state,
> > +	.get_adapter_status	= pnv_php_get_adapter_state,
> > +	.set_attention_status	= pnv_php_set_attention_state,
> > +	.enable_slot		= pnv_php_enable_slot,
> > +	.disable_slot		= pnv_php_disable_slot,
> > +};
> > +
> > +static void pnv_php_release(struct hotplug_slot *slot)
> > +{
> > +	struct pnv_php_slot *php_slot = slot->private;
> > +	unsigned long flags;
> > +
> > +	/* Remove from global or child list */
> > +	spin_lock_irqsave(&pnv_php_lock, flags);
> > +	list_del(&php_slot->link);
> > +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> > +
> > +	/* Detach from parent */
> > +	pnv_php_put_slot(php_slot);
> > +	pnv_php_put_slot(php_slot->parent);
> > +}
> > +
> > +static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
> > +{
> > +	struct device_node *parent = dn;
> > +	const __be64 *prop64;
> > +	const __be32 *prop32;
> > +
> > +	/*
> > +	 * The hotpluggable slot always has a compound Id, which
> > +	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
> > +	 * number, and compound indicator
> > +	 */
> > +	*id = (0x1ul << 63);
> 
> 
> Is this bit from the same space as 1<<60 as in pnv_eeh_bridge_reset()? If 
> so, it would be great to have all these id bits defined in one place.
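For illustration only (this is not part of the posted patch): one way to keep
the ID bits in one place is a small set of macros plus a helper, as sketched
below. The layout is taken from the quoted pnv_php_get_slot_id(): the compound
indicator in bit 63, the bus/slot/function field of "reg" shifted left by 8,
and the PHB ID ORed in. All names here are hypothetical, and whether bit 63
shares a namespace with the 1<<60 used in pnv_eeh_bridge_reset() is left open.

```c
#include <stdint.h>

/* Hypothetical names; the values mirror the quoted pnv_php_get_slot_id(). */
#define PNV_SLOT_ID_COMPOUND	(1ULL << 63)	/* compound ID indicator */
#define PNV_SLOT_ID_BDFN_MASK	0x00ffff00u	/* bus/slot/function bits in "reg" */
#define PNV_SLOT_ID_BDFN_SHIFT	8

/* Build the compound slot ID from the PHB ID and the first "reg" cell. */
static inline uint64_t pnv_slot_id(uint64_t phb_id, uint32_t reg)
{
	return PNV_SLOT_ID_COMPOUND |
	       ((uint64_t)(reg & PNV_SLOT_ID_BDFN_MASK) << PNV_SLOT_ID_BDFN_SHIFT) |
	       phb_id;
}
```

With a PHB ID of 1 and a "reg" cell of 0x00010800 (bus 1, device 1, function
0), the helper yields 0x8000000001080001.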
> 
> 
> > +
> > +	/* Bus/Slot/Function number */
> > +	prop32 = of_get_property(dn, "reg", NULL);
> > +	if (!prop32)
> > +		return -ENXIO;
> > +	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
> > +
> > +	/* PHB Id */
> > +	while ((parent = of_get_parent(parent))) {
> > +		if (!PCI_DN(parent)) {
> > +			of_node_put(parent);
> > +			break;
> > +		}
> > +
> > +		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
> > +		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
> > +			of_node_put(parent);
> > +			continue;
> > +		}
> > +
> > +		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
> > +		if (!prop64) {
> > +			of_node_put(parent);
> > +			return -ENXIO;
> > +		}
> > +
> > +		*id |= be64_to_cpup(prop64);
> > +		of_node_put(parent);
> > +		return 0;
> > +	}
> > +
> > +	return -ENODEV;
> > +}
> > +
> > +static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
> > +{
> > +	struct pnv_php_slot *php_slot;
> > +	struct pci_bus *bus;
> > +	const char *label;
> > +	uint64_t id;
> > +
> > +	label = of_get_property(dn, "ibm,slot-label", NULL);
> > +	if (!label)
> > +		return NULL;
> > +
> > +	if (pnv_php_get_slot_id(dn, &id))
> > +		return NULL;
> > +
> > +	bus = pci_find_bus_by_node(dn);
> > +	if (!bus)
> > +		return NULL;
> > +
> > +	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
> > +	if (!php_slot)
> > +		return NULL;
> > +
> > +	php_slot->name = kstrdup(label, GFP_KERNEL);
> > +	if (!php_slot->name) {
> > +		kfree(php_slot);
> > +		return NULL;
> > +	}
> > +
> > +	if (dn->child && PCI_DN(dn->child))
> > +		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
> > +	else
> > +		php_slot->slot_no = -1;   /* Placeholder slot */
> > +
> > +	kref_init(&php_slot->kref);
> > +	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
> > +	php_slot->dn	                = dn;
> > +	php_slot->pdev	                = bus->self;
> > +	php_slot->bus	                = bus;
> > +	php_slot->id	                = id;
> > +	php_slot->power_state_check     = false;
> > +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> > +	php_slot->slot.ops              = &php_slot_ops;
> > +	php_slot->slot.info             = &php_slot->slot_info;
> > +	php_slot->slot.release          = pnv_php_release;
> > +	php_slot->slot.private          = php_slot;
> > +
> > +	INIT_WORK(&php_slot->work, pnv_php_work);
> > +	init_waitqueue_head(&php_slot->queue);
> > +	INIT_LIST_HEAD(&php_slot->children);
> > +	INIT_LIST_HEAD(&php_slot->link);
> > +
> > +	return php_slot;
> > +}
> > +
> > +static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
> > +{
> > +	struct pnv_php_slot *parent;
> > +	struct device_node *dn = php_slot->dn;
> > +	unsigned long flags;
> > +	int ret;
> > +
> > +	/* Check if the slot is registered or not */
> > +	parent = pnv_php_find_slot(php_slot->dn);
> > +	if (parent) {
> > +		pnv_php_put_slot(parent);
> > +		return -EEXIST;
> > +	}
> > +
> > +	/* Register PCI slot */
> > +	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
> > +			      php_slot->slot_no, php_slot->name);
> > +	if (ret) {
> > +		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
> > +			 ret);
> > +		return ret;
> > +	}
> > +
> > +	/* Attach to the parent's child list or global list */
> > +	while ((dn = of_get_parent(dn))) {
> > +		if (!PCI_DN(dn)) {
> > +			of_node_put(dn);
> > +			break;
> > +		}
> > +
> > +		parent = pnv_php_find_slot(dn);
> > +		if (parent) {
> > +			of_node_put(dn);
> > +			break;
> > +		}
> > +
> > +		of_node_put(dn);
> > +	}
> > +
> > +	spin_lock_irqsave(&pnv_php_lock, flags);
> > +	php_slot->parent = parent;
> > +	if (parent)
> > +		list_add_tail(&php_slot->link, &parent->children);
> > +	else
> > +		list_add_tail(&php_slot->link, &pnv_php_slot_list);
> > +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> > +
> > +	php_slot->state = PNV_PHP_STATE_REGISTERED;
> > +	return 0;
> > +}
> > +
> > +static int pnv_php_register_one(struct device_node *dn)
> > +{
> > +	struct pnv_php_slot *php_slot;
> > +	const __be32 *prop32;
> > +	int ret;
> > +
> > +	/* Check if it's hotpluggable slot */
> > +	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
> > +	if (!prop32 || !of_read_number(prop32, 1))
> > +		return -ENXIO;
> > +
> > +	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
> > +	if (!prop32 || !of_read_number(prop32, 1))
> > +		return -ENXIO;
> > +
> > +	php_slot = pnv_php_alloc_slot(dn);
> > +	if (!php_slot)
> > +		return -ENODEV;
> > +
> > +	ret = pnv_php_register_slot(php_slot);
> > +	if (ret)
> > +		goto free_slot;
> > +
> > +	ret = pnv_php_enable(php_slot, false);
> > +	if (ret)
> > +		goto unregister_slot;
> > +
> > +	return 0;
> > +
> > +unregister_slot:
> > +	pnv_php_unregister_one(php_slot->dn);
> > +free_slot:
> > +	pnv_php_put_slot(php_slot);
> > +	return ret;
> > +}
> > +
> > +static void pnv_php_register(struct device_node *dn)
> > +{
> > +	struct device_node *child;
> > +
> > +	/*
> > +	 * The parent slots should be registered before their
> > +	 * child slots.
> > +	 */
> > +	for_each_child_of_node(dn, child) {
> > +		pnv_php_register_one(child);
> > +		pnv_php_register(child);
> > +	}
> > +}
> > +
> > +static void pnv_php_unregister_one(struct device_node *dn)
> > +{
> > +	struct pnv_php_slot *php_slot;
> > +
> > +	php_slot = pnv_php_find_slot(dn);
> > +	if (!php_slot)
> > +		return;
> > +
> > +	pnv_php_put_slot(php_slot);
> > +	pci_hp_deregister(&php_slot->slot);
> > +}
> > +
> > +static void pnv_php_unregister(struct device_node *dn)
> > +{
> > +	struct device_node *child;
> > +
> > +	/* The child slots should go before their parent slots */
> > +	for_each_child_of_node(dn, child) {
> > +		pnv_php_unregister(child);
> > +		pnv_php_unregister_one(child);
> > +	}
> > +}
> > +
> > +static struct notifier_block php_msg_nb = {
> > +	.notifier_call	= pnv_php_handle_msg,
> > +	.next		= NULL,
> > +	.priority	= 0,
> > +};
> > +
> > +static int __init pnv_php_init(void)
> > +{
> > +	struct device_node *dn;
> > +	int ret;
> > +
> > +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> > +
> > +	/* Register hotplug message handler */
> > +	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
> > +	if (ret) {
> > +		pr_warn("%s: Error %d registering hotplug notifier\n",
> > +			__func__, ret);
> > +		return ret;
> > +	}
> > +
> > +	/* Scan PHB nodes and their children */
> > +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> > +		pnv_php_register(dn);
> > +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> > +		pnv_php_register(dn);
> > +
> > +	return 0;
> > +}
> > +
> > +static void __exit pnv_php_exit(void)
> > +{
> > +	struct device_node *dn;
> > +
> > +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> > +		pnv_php_unregister(dn);
> > +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> > +		pnv_php_unregister(dn);
> > +
> > +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
> > +}
> > +
> > +module_init(pnv_php_init);
> > +module_exit(pnv_php_exit);
> > +
> > +MODULE_VERSION(DRIVER_VERSION);
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_AUTHOR(DRIVER_AUTHOR);
> > +MODULE_DESCRIPTION(DRIVER_DESC);
> >
> 
> 
> 


* Re: [PATCH v8 30/45] powerpc/pci: Delay populating pdn
  2016-04-19  8:19   ` Alexey Kardashevskiy
@ 2016-04-20  2:13     ` Gavin Shan
  2016-04-20  3:54       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  2:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 06:19:20PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>The pdn (struct pci_dn) instances are allocated from memblock or
>>bootmem when creating PCI controllers (hoses) in setup_arch(). PCI
>>hotplug, which will be supported by subsequent patches, releases
>>PCI device nodes and their corresponding pdn on an unplug event.
>>The memory chunks for pdn instances allocated from memblock or
>>bootmem are hard to reuse after being released.
>>
>>This delays creating pdn by pci_devs_phb_init() from setup_arch()
>>to core_initcall() so that they are allocated from slab. The memory
>>consumed by pdn can then be released to the system without problems
>>at PCI unplug time. It also means that pci_dn is unavailable in
>>setup_arch() and the fixup on pdn (like AGP's) can't be carried
>>out at that time. We have to do that in ppc_md.pcibios_root_bridge_prepare()
>>on maple/pasemi/powermac platforms, where the pdn is available.
>>
>>Meanwhile, the EEH device is created when the pdn is populated,
>>meaning the pdn and the EEH device have the same life cycle. In turn,
>>we needn't call eeh_dev_init() to create the EEH device explicitly.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>
>Uff. It would not hurt to mention that  pcibios_root_bridge_prepare is called
>from subsys_initcall() which is executed after core_initcall() so the code
>flow does not change.
>

Yes, will do in next revision.

>Have you checked if there is anything in between
>core_initcall(pci_devs_phb_init) and subsys_initcall(pcibios_init) which
>might need device tree nodes? For example, subsys_initcall(pcibios_init)
>calls (eventually) pnv_pci_ioda_fixup(), if we are unlucky and pcibios_init()
>(and therefore pnv_pci_ioda_fixup() or what pseries/others do) is called
>before pcibios_init() - won't we crash or something?
>

I didn't quite catch what you were asking. Device-tree nodes (struct device_node)
are always there, and this patch doesn't affect them. Perhaps you were talking
about pdn (PCI_DN). If that's the case, this patch delays creating pdn from
setup_arch() to core_initcall(pci_devs_phb_init). I don't see anything that
needs pdn between setup_arch() and core_initcall().

The change introduced to the powermac/pasemi platforms is that fixing up the
child pdns of the specific PHB's pdn moves from setup_arch() to
subsys_initcall(pcibios_init). I don't see anything between them that needs the
fixed-up pdns.

I don't understand how pcibios_init() could be called before pcibios_init() in
your context. Sorry for my bad English. Perhaps you're asking about the calling
order of core_initcall() and subsys_initcall()? If so, they're defined as below:

#define core_initcall(fn)		__define_initcall(fn, 1)
#define subsys_initcall(fn)		__define_initcall(fn, 4)
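
For illustration only: initcalls run in ascending level order, so a level 1
(core) call always runs before a level 4 (subsys) call. The sketch below is a
user-space model of that ordering, not kernel code; struct initcall,
initcall_cmp() and initcall_sort() are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* One registered initcall: the level it was registered at and its name. */
struct initcall {
	int level;		/* 1 = core_initcall, 4 = subsys_initcall */
	const char *name;
};

/* qsort() comparator: lower levels run first. */
static int initcall_cmp(const void *a, const void *b)
{
	const struct initcall *x = a;
	const struct initcall *y = b;

	return x->level - y->level;
}

/* Put the table into the order in which the calls would run. */
static void initcall_sort(struct initcall *calls, size_t n)
{
	qsort(calls, n, sizeof(*calls), initcall_cmp);
}
```

Sorting a table holding { 4, "pcibios_init" } and { 1, "pci_devs_phb_init" }
puts pci_devs_phb_init first, which is the order the patch relies on.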

>
>-- 
>Alexey
>


* Re: [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID
  2016-04-19  9:28       ` Alexey Kardashevskiy
  (?)
@ 2016-04-20  2:28       ` Gavin Shan
  2016-04-20  4:14         ` Alexey Kardashevskiy
  -1 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  2:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 07:28:20PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>PowerNV platforms run on top of skiboot firmware that includes
>>changes to support PCI slots. PCI slots are identified by the PHB's
>>ID or the combination of that and the PCI slot ID.
>>
>>This changes the EEH PowerNV backend to support PCI slots:
>>
>>    * Rename arguments of opal_pci_reset() and opal_pci_poll().
>>    * One more argument (PCI slot's state) added to opal_pci_poll().
>>    * Drop pnv_eeh_phb_poll() and introduce a similar, enhanced
>>      function pnv_pci_poll() that will be used by PowerNV hotplug
>>      backends.
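
For illustration only (this is not part of the posted patch): the polling
pattern described above treats a positive return value from firmware as "wait
this many milliseconds, then ask again" and stops on zero or a negative value.
The sketch below is a user-space model of that loop. fake_fw_poll() and
fake_delay_ms() are hypothetical stand-ins for opal_pci_poll() and
udelay()/msleep(), and mapping a negative value to an error return is an
assumption, since the tail of pnv_pci_poll() is not shown in this excerpt.

```c
#include <stdint.h>

/* Hypothetical firmware model: asks for 3, 2, 1 ms of waiting, then reports done (0). */
static int64_t fake_polls_left = 3;

static int64_t fake_fw_poll(void)
{
	return fake_polls_left > 0 ? fake_polls_left-- : 0;
}

/* Hypothetical delay: only records how long we would have waited. */
static int64_t total_delay_ms;

static void fake_delay_ms(int64_t ms)
{
	total_delay_ms += ms;
}

/*
 * Keep polling while firmware asks us to wait. @rval is the value
 * returned by the call that started the operation.
 */
static int poll_until_done(int64_t rval)
{
	while (rval > 0) {
		fake_delay_ms(rval);
		rval = fake_fw_poll();
	}

	/* Assumption: negative means failure, zero means success. */
	return rval < 0 ? -1 : 0;
}
```

Passing the initial return value in lets the caller skip polling entirely when
the first call has already failed or completed.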
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/opal.h              |  4 +--
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 42 ++++++----------------------
>>  arch/powerpc/platforms/powernv/pci.c         | 21 ++++++++++++++
>>  arch/powerpc/platforms/powernv/pci.h         |  1 +
>>  4 files changed, 32 insertions(+), 36 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>>index 07a99e6..9e0039f 100644
>>--- a/arch/powerpc/include/asm/opal.h
>>+++ b/arch/powerpc/include/asm/opal.h
>>@@ -131,7 +131,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
>>  int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
>>  					uint16_t dma_window_number, uint64_t pci_start_addr,
>>  					uint64_t pci_mem_size);
>>-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
>>+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
>>
>>  int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
>>  				   uint64_t diag_buffer_len);
>>@@ -148,7 +148,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
>>  int64_t opal_set_system_attention_led(uint8_t led_action);
>>  int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
>>  			    __be16 *pci_error_type, __be16 *severity);
>>-int64_t opal_pci_poll(uint64_t phb_id);
>>+int64_t opal_pci_poll(uint64_t id, uint8_t *state);
>>  int64_t opal_return_cpu(void);
>>  int64_t opal_check_token(uint64_t token);
>>  int64_t opal_reinit_cpus(uint64_t flags);
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index c7454ba..e23b063 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -717,28 +717,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
>>  	return ret;
>>  }
>>
>>-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
>>-{
>>-	s64 rc = OPAL_HARDWARE;
>>-
>>-	while (1) {
>>-		rc = opal_pci_poll(phb->opal_id);
>>-		if (rc <= 0)
>>-			break;
>>-
>>-		if (system_state < SYSTEM_RUNNING)
>>-			udelay(1000 * rc);
>>-		else
>>-			msleep(rc);
>>-	}
>>-
>>-	return rc;
>>-}
>>-
>>  int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>>  {
>>  	struct pnv_phb *phb = hose->private_data;
>>  	s64 rc = OPAL_HARDWARE;
>>+	int ret;
>>
>>  	pr_debug("%s: Reset PHB#%x, option=%d\n",
>>  		 __func__, hose->global_number, option);
>>@@ -753,8 +736,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>>  		rc = opal_pci_reset(phb->opal_id,
>>  				    OPAL_RESET_PHB_COMPLETE,
>>  				    OPAL_DEASSERT_RESET);
>>-	if (rc < 0)
>>-		goto out;
>>
>>  	/*
>>  	 * Poll state of the PHB until the request is done
>>@@ -762,24 +743,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>>  	 * reset followed by hot reset on root bus. So we also
>>  	 * need the PCI bus settlement delay.
>>  	 */
>>-	rc = pnv_eeh_phb_poll(phb);
>>-	if (option == EEH_RESET_DEACTIVATE) {
>>+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
>>+	if (option == EEH_RESET_DEACTIVATE && !ret) {
>>  		if (system_state < SYSTEM_RUNNING)
>>  			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
>>  		else
>>  			msleep(EEH_PE_RST_SETTLE_TIME);
>>  	}
>>-out:
>>-	if (rc != OPAL_SUCCESS)
>>-		return -EIO;
>>
>>-	return 0;
>>+	return ret;
>>  }
>>
>>  static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>>  {
>>  	struct pnv_phb *phb = hose->private_data;
>>  	s64 rc = OPAL_HARDWARE;
>>+	int ret;
>>
>>  	pr_debug("%s: Reset PHB#%x, option=%d\n",
>>  		 __func__, hose->global_number, option);
>>@@ -801,18 +780,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>>  		rc = opal_pci_reset(phb->opal_id,
>>  				    OPAL_RESET_PCI_HOT,
>>  				    OPAL_DEASSERT_RESET);
>>-	if (rc < 0)
>>-		goto out;
>>
>>  	/* Poll state of the PHB until the request is done */
>>-	rc = pnv_eeh_phb_poll(phb);
>>-	if (option == EEH_RESET_DEACTIVATE)
>>+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
>>+	if (option == EEH_RESET_DEACTIVATE && !ret)
>>  		msleep(EEH_PE_RST_SETTLE_TIME);
>>-out:
>>-	if (rc != OPAL_SUCCESS)
>>-		return -EIO;
>>
>>-	return 0;
>>+	return ret;
>>  }
>>
>>  static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>index b87a315..a458703 100644
>>--- a/arch/powerpc/platforms/powernv/pci.c
>>+++ b/arch/powerpc/platforms/powernv/pci.c
>>@@ -42,6 +42,27 @@
>>  #define cfg_dbg(fmt...)	do { } while(0)
>>  //#define cfg_dbg(fmt...)	printk(fmt)
>>
>>+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
>>+{
>>+	while (rval > 0) {
>>+		if (system_state < SYSTEM_RUNNING)
>>+			udelay(1000 * rval);
>>+		else
>>+			msleep(rval);
>>+
>>+		rval = opal_pci_poll(id, state);
>>+	}
>>+
>>+	/*
>>+	 * The caller expects to retrieve additional
>>+	 * information if the last argument isn't NULL.
>>+	 */
>>+	if (rval == OPAL_SUCCESS && state)
>>+		rval = opal_pci_poll(id, state);
>
>
>Old OPAL won't touch @state so whatever garbage was there will stay there as
>the only caller which is passing not-NULL there is pnv_php_get_power_state()
>and it does not initialize @power_state (it is in "[PATCH v8 45/45]
>PCI/hotplug: PowerPC PowerNV PCI hotplug driver").
>

Old OPAL doesn't expose hotpluggable slots, so this case can't happen there:
pnv_php_get_power_state() is never called on old OPAL.

>
>btw how will new OPAL react if old kernel is running, i.e. not passing @state
>at all? If it is initialized to NULL somewhere - fine but what exactly does
>this initialization and makes sure that OPAL won't see garbage as a second
>parameter?
>

@state is always NULL for old kernel + new OPAL. @state is only used by the
PCI hotplug functionality in OPAL. As an old kernel doesn't support PCI
hotplug, @state is never used. Does that answer your question?

>When ABI like this changes, I expect to see opal_pci_poll2() or
>opal_pci_poll_ex() rather than just an additional parameter to
>opal_pci_poll()...
>

It's a good suggestion, but it would have been nicer if you had raised it
earlier. One question: the current opal_pci_poll() is enough to cover all
cases, so why do we need to introduce and maintain another similar one?
Sorry, I don't see the reason from your comment. Could you please provide
more details?

>>+
>>+	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
>>+}
>>+
>>  #ifdef CONFIG_PCI_MSI
>>  int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>>  {
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 0cddde3..6857703 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -192,6 +192,7 @@ extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
>>  		unsigned long *hpa, enum dma_data_direction *direction);
>>  extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>>
>>+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state);
>>  void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
>>  				unsigned char *log_buff);
>>  int pnv_pci_cfg_read(struct pci_dn *pdn,
>>
>
>
>-- 
>Alexey
>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure
  2016-04-19  9:34   ` Alexey Kardashevskiy
@ 2016-04-20  2:33     ` Gavin Shan
  2016-04-20  4:17       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  2:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 07:34:55PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>The skiboot firmware might provide the PCI slot reset capability
>>which is identified by property "ibm,reset-by-firmware" on the
>>PCI slot associated device node.
>>
>>This checks the property. If it exists, the reset request is routed
>>to firmware. Otherwise, the reset is done by kernel as before.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++++++++++++++++++++++++++-
>>  1 file changed, 40 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index e23b063..c8a5217 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -789,7 +789,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>>  	return ret;
>>  }
>>
>>-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>  {
>>  	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>@@ -840,6 +840,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>  	return 0;
>>  }
>>
>>+static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
>>+{
>>+	struct pci_controller *hose;
>>+	struct pnv_phb *phb;
>>+	struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
>>+	uint64_t id = (0x1ul << 60);
>
>
>What is this 1<<60 for?
>
>

As you replied in other threads, it's worth having some macros for this.
This bit indicates that the ID is for a slot behind a switch port. If the
bit is cleared, the ID represents a PHB slot.
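
For illustration only, here is a minimal user-space sketch of what such
macros could look like. The macro and function names are hypothetical
(the patch open-codes these values); the bit positions are taken from the
quoted hunk: bit 60 marks a slot behind a switch port, the bus number is
shifted by 24, devfn by 16, and the PHB's OPAL ID sits in the low bits.
Treating a PHB slot ID as the plain PHB OPAL ID is an assumption based on
how the EEH code passes phb->opal_id to opal_pci_reset().

```c
#include <stdint.h>

/* Hypothetical names; the patch itself uses the raw values. */
#define SLOT_ID_SWITCH_PORT	(UINT64_C(1) << 60)	/* slot behind a switch port */
#define SLOT_ID_BUS_SHIFT	24
#define SLOT_ID_DEVFN_SHIFT	16

/* Compose the ID of a slot behind a switch port, mirroring the patch. */
static uint64_t slot_id_switch_port(uint64_t phb_id, uint8_t bus, uint8_t devfn)
{
	return SLOT_ID_SWITCH_PORT |
	       ((uint64_t)bus << SLOT_ID_BUS_SHIFT) |
	       ((uint64_t)devfn << SLOT_ID_DEVFN_SHIFT) |
	       phb_id;
}

/* Assumption: a PHB slot is identified by the PHB ID with bit 60 clear. */
static uint64_t slot_id_phb(uint64_t phb_id)
{
	return phb_id & ~SLOT_ID_SWITCH_PORT;
}

/* Non-zero if the ID names a slot behind a switch port. */
static int slot_id_is_switch_port(uint64_t id)
{
	return (id & SLOT_ID_SWITCH_PORT) != 0;
}
```

With helpers like these, the reset path would build the ID in one call
instead of repeating the shifts, and the casts to uint64_t avoid shifting
a narrower integer type.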

>>+	uint8_t scope;
>>+	int64_t rc;
>>+
>>+	/*
>>+	 * If the firmware can't handle it, we will issue hot reset
>>+	 * on the secondary bus despite the requested reset type.
>>+	 */
>>+	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
>>+		return __pnv_eeh_bridge_reset(pdev, option);
>>+
>>+	/* The firmware can handle the request */
>>+	switch (option) {
>>+	case EEH_RESET_HOT:
>>+		scope = OPAL_RESET_PCI_HOT;
>>+		break;
>>+	case EEH_RESET_FUNDAMENTAL:
>>+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
>>+		break;
>>+	case EEH_RESET_DEACTIVATE:
>>+		return 0;
>>+	default:
>>+		dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
>>+			 __func__, option);
>
>
>Can the userspace trigger this case (via VFIO-EEH) and flood dmesg?
>

It depends on how you define message flooding. It's an abnormal path caused
by an internal program error, not by external users.

>
>
>>+		return -EINVAL;
>>+	}
>>+
>>+	hose = pci_bus_to_host(pdev->bus);
>>+	phb = hose->private_data;
>>+	id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
>>+	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
>>+	return pnv_pci_poll(id, rc, NULL);
>>+}
>>+
>>  static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
>>  {
>>  	int *freset = data;
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status
  2016-04-19  9:39     ` Alexey Kardashevskiy
@ 2016-04-20  2:36       ` Gavin Shan
  2016-04-20  4:25         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  2:36 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 07:39:34PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This exports 4 functins, which base on the corresponding OPAL
>
>
>s/functins/functions/
>

Thanks.

>>APIs to get/set PCI slot status. Those functions are going to
>>be used by PowerNV PCI hotplug driver:
>>
>>    pnv_pci_get_device_tree()    opal_get_device_tree()
>>    pnv_pci_get_presence_state() opal_pci_get_presence_state()
>>    pnv_pci_get_power_state()    opal_pci_get_power_state()
>>    pnv_pci_set_power_state()    opal_pci_set_power_state()
>>
>>Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
>>unregister}() to allow registration and unregistration of PCI hotplug
>>notifier, which will be used to receive PCI hotplug message from
>>skiboot firmware in PowerNV PCI hotplug driver.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/opal-api.h            | 17 ++++++-
>>  arch/powerpc/include/asm/opal.h                |  4 ++
>>  arch/powerpc/include/asm/pnv-pci.h             |  7 +++
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
>>  arch/powerpc/platforms/powernv/pci.c           | 66 ++++++++++++++++++++++++++
>>  5 files changed, 97 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>>index f8faaae..a6af338 100644
>>--- a/arch/powerpc/include/asm/opal-api.h
>>+++ b/arch/powerpc/include/asm/opal-api.h
>>@@ -158,7 +158,11 @@
>>  #define OPAL_LEDS_SET_INDICATOR			115
>>  #define OPAL_CEC_REBOOT2			116
>>  #define OPAL_CONSOLE_FLUSH			117
>>-#define OPAL_LAST				117
>>+#define OPAL_GET_DEVICE_TREE			118
>>+#define OPAL_PCI_GET_PRESENCE_STATE		119
>>+#define OPAL_PCI_GET_POWER_STATE		120
>>+#define OPAL_PCI_SET_POWER_STATE		121
>>+#define OPAL_LAST				121
>>
>>  /* Device tree flags */
>>
>>@@ -344,6 +348,16 @@ enum OpalPciResetState {
>>  	OPAL_ASSERT_RESET   = 1
>>  };
>>
>>+enum OpalPciSlotPresentenceState {
>>+	OPAL_PCI_SLOT_EMPTY	= 0,
>>+	OPAL_PCI_SLOT_PRESENT	= 1
>>+};
>>+
>>+enum OpalPciSlotPowerState {
>>+	OPAL_PCI_SLOT_POWER_OFF	= 0,
>>+	OPAL_PCI_SLOT_POWER_ON	= 1
>>+};
>>+
>>  enum OpalSlotLedType {
>>  	OPAL_SLOT_LED_TYPE_ID = 0,	/* IDENTIFY LED */
>>  	OPAL_SLOT_LED_TYPE_FAULT = 1,	/* FAULT LED */
>>@@ -378,6 +392,7 @@ enum opal_msg_type {
>>  	OPAL_MSG_DPO,
>>  	OPAL_MSG_PRD,
>>  	OPAL_MSG_OCC,
>>+	OPAL_MSG_PCI_HOTPLUG,
>>  	OPAL_MSG_TYPE_MAX,
>>  };
>>
>>diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>>index 9e0039f..899bcb941 100644
>>--- a/arch/powerpc/include/asm/opal.h
>>+++ b/arch/powerpc/include/asm/opal.h
>>@@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
>>  		uint64_t size, uint64_t token);
>>  int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
>>  		uint64_t token);
>>+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
>>+int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
>>+int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
>>+int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);
>>
>>  /* Internal functions */
>>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>>diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
>>index 6f77f71..d9d095b 100644
>>--- a/arch/powerpc/include/asm/pnv-pci.h
>>+++ b/arch/powerpc/include/asm/pnv-pci.h
>>@@ -13,6 +13,13 @@
>>  #include <linux/pci.h>
>>  #include <misc/cxl-base.h>
>>
>>+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
>>+extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
>>+extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
>>+extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
>>+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
>>+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
>>+
>>  int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
>>  int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
>>  			   unsigned int virq);
>>diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
>>index e45b88a..3ea1a855 100644
>>--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
>>+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
>>@@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg,				OPAL_PRD_MSG);
>>  OPAL_CALL(opal_leds_get_ind,			OPAL_LEDS_GET_INDICATOR);
>>  OPAL_CALL(opal_leds_set_ind,			OPAL_LEDS_SET_INDICATOR);
>>  OPAL_CALL(opal_console_flush,			OPAL_CONSOLE_FLUSH);
>>+OPAL_CALL(opal_get_device_tree,			OPAL_GET_DEVICE_TREE);
>>+OPAL_CALL(opal_pci_get_presence_state,		OPAL_PCI_GET_PRESENCE_STATE);
>>+OPAL_CALL(opal_pci_get_power_state,		OPAL_PCI_GET_POWER_STATE);
>>+OPAL_CALL(opal_pci_set_power_state,		OPAL_PCI_SET_POWER_STATE);
>>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>index a458703..206385f 100644
>>--- a/arch/powerpc/platforms/powernv/pci.c
>>+++ b/arch/powerpc/platforms/powernv/pci.c
>>@@ -63,6 +63,72 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
>>  	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
>>  }
>>
>>+int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len)
>>+{
>>+	int64_t rc;
>>+
>>+	if (!opal_check_token(OPAL_GET_DEVICE_TREE))
>>+		return -ENXIO;
>>+
>>+	rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
>>+	if (rc != OPAL_SUCCESS)
>>+		return -EIO;
>>+
>>+	return 0;
>>+}
>>+EXPORT_SYMBOL_GPL(pnv_pci_get_device_tree);
>>+
>>+int pnv_pci_get_presence_state(uint64_t id, uint8_t *state)
>>+{
>>+	int64_t rc;
>>+
>>+	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATE))
>>+		return -ENXIO;
>>+
>>+	rc = opal_pci_get_presence_state(id, state);
>>+	if (rc != OPAL_SUCCESS)
>>+		return -EIO;
>>+
>>+	return 0;
>>+}
>>+EXPORT_SYMBOL_GPL(pnv_pci_get_presence_state);
>>+
>>+int pnv_pci_get_power_state(uint64_t id, uint8_t *state)
>>+{
>>+	int64_t rc;
>>+
>>+	if (!opal_check_token(OPAL_PCI_GET_POWER_STATE))
>>+		return -ENXIO;
>>+
>>+	rc = opal_pci_get_power_state(id, state);
>
>
>Out of curiosity - if rc==OPAL_SUCCESS, @state should already contain the
>correct state and you do not have to call pnv_pci_poll() (which will call
>opal_pci_poll() immediately), is that correct?
>

It's not correct. opal_pci_get_power_state() starts a state machine in the
OPAL firmware and pnv_pci_poll() keeps driving that state machine forward
until it completes.
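
To make the flow concrete, here is a small user-space sketch of the
"start, then poll until done" pattern. Everything prefixed with mock_ is a
stand-in written for this illustration and is not the OPAL API; only the
return convention (positive = retry after that many milliseconds, zero =
success, negative = error) and the extra poll when @state is non-NULL are
taken from the quoted pnv_pci_poll().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FW_SUCCESS	0	/* stands in for OPAL_SUCCESS */

/* Mock firmware state machine, purely illustrative. */
struct mock_fw {
	int polls_left;		/* polls still needed before completion */
	uint8_t power;		/* value reported once the machine is done */
	int poll_calls;		/* number of poll calls made so far */
};

/* Stand-in for the poll call: >0 retry delay, 0 done, <0 error. */
static int64_t mock_poll(struct mock_fw *fw, uint8_t *state)
{
	fw->poll_calls++;
	if (fw->polls_left > 0) {
		fw->polls_left--;
		return 10;
	}
	if (state)
		*state = fw->power;
	return FW_SUCCESS;
}

/* Stand-in for the "get power state" call: it only starts the machine. */
static int64_t mock_start_get_power_state(struct mock_fw *fw)
{
	fw->polls_left = 3;
	return 10;
}

/* Same shape as the quoted pnv_pci_poll(), minus the real delays. */
static int poll_until_done(struct mock_fw *fw, int64_t rval, uint8_t *state)
{
	while (rval > 0) {
		/* The kernel would udelay() or msleep() for rval ms here. */
		rval = mock_poll(fw, state);
	}

	/* One more poll so the caller can retrieve the final state. */
	if (rval == FW_SUCCESS && state)
		rval = mock_poll(fw, state);

	return rval == FW_SUCCESS ? 0 : -EIO;
}
```

A caller would pass the return value of the start call straight into the
poll helper, as pnv_pci_get_power_state() does. Initializing the state
variable to a known value before the call is a cheap guard in case the
firmware never writes it.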

>Anyway, looks correct.
>
>
>Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
>
>
>
>>+	return pnv_pci_poll(id, rc, state);
>>+}
>>+EXPORT_SYMBOL_GPL(pnv_pci_get_power_state);
>>+
>>+int pnv_pci_set_power_state(uint64_t id, uint8_t state)
>>+{
>>+	int64_t rc;
>>+
>>+	if (!opal_check_token(OPAL_PCI_SET_POWER_STATE))
>>+		return -ENXIO;
>>+
>>+	rc = opal_pci_set_power_state(id, state);
>>+	return pnv_pci_poll(id, rc, NULL);
>>+}
>>+EXPORT_SYMBOL_GPL(pnv_pci_set_power_state);
>>+
>>+int pnv_pci_hotplug_notifier_register(struct notifier_block *nb)
>>+{
>>+	return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
>>+}
>>+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_register);
>>+
>>+int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb)
>>+{
>>+	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
>>+}
>>+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_unregister);
>>+
>>  #ifdef CONFIG_PCI_MSI
>>  int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>>  {
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC
  2016-04-19  9:42   ` Alexey Kardashevskiy
@ 2016-04-20  2:38     ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  2:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 07:42:01PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>The device tree will change dynamically in PowerNV PCI hotplug
>>driver. This enables CONFIG_OF_DYNAMIC to support that.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>>diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
>>index 604190c..e7b1ad7 100644
>>--- a/arch/powerpc/platforms/powernv/Kconfig
>>+++ b/arch/powerpc/platforms/powernv/Kconfig
>>@@ -18,6 +18,7 @@ config PPC_POWERNV
>>  	select CPU_FREQ_GOV_ONDEMAND
>>  	select CPU_FREQ_GOV_CONSERVATIVE
>>  	select PPC_DOORBELL
>>+	select OF_DYNAMIC
>
>
>Why not to enable it in 45/45 under config HOTPLUG_PCI_POWERNV? Is there any
>benefit of having it always on if HOTPLUG_PCI_POWERNV is not enabled?
>

Agreed, I will move it accordingly in the next revision. Note that we will
have to move it back here once something else depends on OF_DYNAMIC in the
future.

>>  	default y
>>
>>  config OPAL_PRD
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 40/45] drivers/of: Split unflatten_dt_node()
  2016-02-17 14:30   ` Rob Herring
@ 2016-04-20  2:38     ` Gavin Shan
  2016-05-02  2:02     ` Gavin Shan
  1 sibling, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  2:38 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree,
	Benjamin Herrenschmidt, Michael Ellerman, aik, dja,
	Bjorn Helgaas, Grant Likely

On Wed, Feb 17, 2016 at 08:30:42AM -0600, Rob Herring wrote:
>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> The function unflatten_dt_node() is called recursively to unflatten
>> device nodes and properties in the FDT blob. It looks complicated
>> and is hard to understand.
>>
>> This splits the function into 3 functions: populate_properties(),
>> populate_node() and unflatten_dt_node(). populate_properties(),
>> which is called by populate_node(), creates properties for the
>> indicated device node. The latter creates the device nodes
>> from the FDT blob. unflatten_dt_node() gets the offset in the FDT
>> blob for the next device node and then calls populate_node(). No
>> logical changes introduced.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/fdt.c | 249 ++++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 147 insertions(+), 102 deletions(-)
>
>One nit, otherwise:
>
>Acked-by: Rob Herring <robh@kernel.org>
>
>[...]
>
>> +               /* And we process the "ibm,phandle" property
>> +                * used in pSeries dynamic device tree
>> +                * stuff
>> +                */
>> +               if (!strcmp(pname, "ibm,phandle"))
>> +                       np->phandle = be32_to_cpup(val);
>> +
>> +               pp->name   = (char *)pname;
>> +               pp->length = sz;
>> +               pp->value  = (__be32 *)val;
>
>This cast should not be needed.
>

Rob, very sorry for responding so late. I will fix it up in the next revision.

>> +               *pprev     = pp;
>> +               pprev      = &pp->next;
>> +       }
>


* Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support
  2016-04-15 16:10             ` Rob Herring
@ 2016-04-20  2:40               ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  2:40 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, Alistair Popple, linuxppc-dev, Alexey Kardashevskiy,
	devicetree, Grant Likely, linux-pci, Bjorn Helgaas, dja

On Fri, Apr 15, 2016 at 11:10:21AM -0500, Rob Herring wrote:
>On Wed, Apr 13, 2016 at 8:30 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Thu, Apr 14, 2016 at 09:57:32AM +1000, Alistair Popple wrote:
>>>Hi Gavin,
>>>
>>><snip>
>>>
>>>> >Why exactly cannot EEH reset changes go to a smaller separate patchset
>>>> >(before hotplug)?
>>>> >
>>>>
>>>> As I explained before, the patchset's order is: PCI generic part,
>>>> PowerNV PCI related, EEH related, device-tree part and hotplug driver.
>>>>
>>>> The EEH reset change is included in PATCH[37/45]. There is no point
>>>> to reorder the patches.
>>>
>>>I don't understand all of the dependencies but if possible splitting the
>>>series up into a set of smaller self-contained patch series makes things
>>>easier to review and may make it easier for you to get this functionality
>>>reviewed and accepted into upstream.
>>>
>>
>> Thanks, Alistair. I will move those cleanup/refactor related patches
>> to form a separate series which is expected to be merged first. That
>> will help the reviewers to focus on the patches with complicated
>> changes as you suggested. Alexey, please let me know whether that is
>> what you would like to see.
>
>As I said last cycle, I'll happily take the DT refactoring patches
>separately, but you have to tell me if you want me to apply them and
>it has to be well before the merge window.
>

Thanks, Rob. I hope to post the next revision (v9) soon, and the device-tree
related cleanup patches in it should be ready for the next merge window.

>Rob
>


* Re: [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC
  2016-04-20  0:22       ` Gavin Shan
@ 2016-04-20  2:55         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  2:55 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 10:22 AM, Gavin Shan wrote:
> On Wed, Apr 13, 2016 at 05:47:59PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>> This enables M64 window on P7IOC, which has been enabled on PHB3.
>>> Different from PHB3 where 16 M64 BARs are supported and each of
>>> them can be owned by one particular PE# exclusively or divided
>>> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>> of them are divided to 8 segments. So every P7IOC PHB supports
>>> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>> M64DT, indicating that one M64 segment can only be pinned to the
>>> fixed PE#. In order to have same code to support M64 on P7IOC and
>>> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>> of them is pinned to the fixed PE# by bypassing the function of
>>> M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>> and PHB3 to support M64.
>>
>> The comment is not quite correct - in addition to pnv_ioda1_init_m64(), you
>> also need to hack pnv_ioda_pick_m64_pe().
>>
>
> Right, will talk about the changes to pnv_ioda_pick_m64_pe() in the
> commit log of next revision.
>
>>
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>>   arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>>   2 files changed, 86 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 1dc663a..8488238 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>>   	}
>>>   }
>>>
>>> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>> +{
>>> +	struct resource *r;
>>> +	int index;
>>> +
>>> +	/*
>>> +	 * There are 16 M64 BARs, each of which has 8 segments. So
>>> +	 * there are as many M64 segments as the maximum number of
>>> +	 * PEs, which is 128.
>>> +	 */
>>> +	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>> +		unsigned long base, segsz = phb->ioda.m64_segsize;
>>> +		int64_t rc;
>>> +
>>> +		base = phb->ioda.m64_base +
>>> +		       index * PNV_IODA1_M64_SEGS * segsz;
>>> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>> +				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>> +				PNV_IODA1_M64_SEGS * segsz);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, index);
>>> +			goto fail;
>>> +		}
>>> +
>>> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>> +				OPAL_M64_WINDOW_TYPE, index,
>>> +				OPAL_ENABLE_M64_SPLIT);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, index);
>>> +			goto fail;
>>> +		}
>>> +	}
>>> +
>>> +	/*
>>> +	 * Exclude the segment used by the reserved PE, which
>>> +	 * is expected to be 0 or last supported PE#.
>>> +	 */
>>> +	r = &phb->hose->mem_resources[1];
>>> +	if (phb->ioda.reserved_pe_idx == 0)
>>> +		r->start += phb->ioda.m64_segsize;
>>> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>> +		r->end -= phb->ioda.m64_segsize;
>>> +	else
>>> +		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>> +			phb->ioda.reserved_pe_idx);
>>> +
>>> +	return 0;
>>> +
>>> +fail:
>>> +	for ( ; index >= 0; index--)
>>> +		opal_pci_phb_mmio_enable(phb->opal_id,
>>> +			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
>>> +
>>> +	return -EIO;
>>> +}
>>> +
>>>   static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>>   				    unsigned long *pe_bitmap,
>>>   				    bool all)
>>> @@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   			pe->master = master_pe;
>>>   			list_add_tail(&pe->list, &master_pe->slaves);
>>>   		}
>>> +
>>> +		/*
>>> +		 * P7IOC supports M64DT, which helps mapping M64 segment
>>> +		 * to one particular PE#. However, PHB3 has fixed mapping
>>> +		 * between M64 segment and PE#. In order to have same logic
>>> +		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>> +		 * segment and PE# on P7IOC.
>>> +		 */
>>> +		if (phb->type == PNV_PHB_IODA1) {
>>> +			int64_t rc;
>>> +
>>> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> +					pe->pe_number, OPAL_M64_WINDOW_TYPE,
>>> +					pe->pe_number / PNV_IODA1_M64_SEGS,
>>> +					pe->pe_number % PNV_IODA1_M64_SEGS);
>>> +			if (rc != OPAL_SUCCESS)
>>> +				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>> +					__func__, rc, phb->hose->global_number,
>>> +					pe->pe_number);
>>> +		}
>>
>>
>> Cannot this go to pnv_ioda1_init_m64()? From the commit log I understood that
>> this setup is supposed to be static so it can be done once. Or it is sort of
>> enable/disable PE? Then make it a helper and call it ioda1_pe_enable() or
>> something.
>>
>
> No, we cannot. This associates the M64 segments with the PE# and it can't be
> done in pnv_ioda1_init_m64(), where the PE# is unknown.

Ok.
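
For reference, the fixed segment-to-PE mapping discussed here comes down
to simple arithmetic. The sketch below is a stand-alone illustration, not
kernel code: the function and struct names are made up, while the
constants (16 BARs, 8 segments each) and the divide/modulo split mirror
the quoted hunks. The per-BAR base address follows the loop in
pnv_ioda1_init_m64().

```c
#include <stdint.h>

#define M64_NUM		16	/* M64 BARs per P7IOC PHB (PNV_IODA1_M64_NUM) */
#define M64_SEGS	8	/* segments per M64 BAR (PNV_IODA1_M64_SEGS) */

/* Illustrative type: which BAR and which segment within it serve a PE. */
struct m64_slot {
	unsigned int bar;
	unsigned int seg;
};

/* Fixed mapping: PE# n uses segment (n % 8) of BAR (n / 8). */
static struct m64_slot m64_slot_for_pe(unsigned int pe_number)
{
	struct m64_slot s = { pe_number / M64_SEGS, pe_number % M64_SEGS };

	return s;
}

/* Base address of one M64 BAR, as computed in the init loop. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int bar)
{
	return m64_base + (uint64_t)bar * M64_SEGS * segsz;
}
```

So 16 BARs times 8 segments gives the 128 segments mentioned in the commit
log, and PE#127 lands on segment 7 of BAR 15.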


> I don't understand what
> you meant by "sort of enable/disable PE". PE starts the journey when PELTM
> has the corresponding mapping and it doesn't depend on M64 mapping necessarily.

Ok.


>>
>>>   	}
>>>
>>>   	kfree(pe_alloc);
>>> @@ -329,8 +407,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>   	const u32 *r;
>>>   	u64 pci_addr;
>>>
>>> -	/* FIXME: Support M64 for P7IOC */
>>> -	if (phb->type != PNV_PHB_IODA2) {
>>> +	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
>>>   		pr_info("  Not support M64 window\n");
>>>   		return;
>>>   	}
>>> @@ -364,7 +441,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>
>>>   	/* Use last M64 BAR to cover M64 window */
>>>   	phb->ioda.m64_bar_idx = 15;
>>> -	phb->init_m64 = pnv_ioda2_init_m64;
>>> +	if (phb->type == PNV_PHB_IODA1)
>>> +		phb->init_m64 = pnv_ioda1_init_m64;
>>> +	else
>>> +		phb->init_m64 = pnv_ioda2_init_m64;
>>>   	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>>   	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>>>   }
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 866a5ea..00539ff 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -82,6 +82,9 @@ struct pnv_ioda_pe {
>>>   	struct list_head	list;
>>>   };
>>>
>>> +#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>> +#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>> +
>>
>> Why here, not in the beginning of arch/powerpc/platforms/powernv/pci-ioda.c ?
>> It exposes symbols but nothing is using them (except pci-ioda.c) and code
>> browsing gets a bit more inconvenient.
>>
>
> It's a matter of personal taste: those macros are tied to the definition
> of "struct pnv_ioda_pe" or "struct pnv_ioda_phb".


Neither of these structs uses these macros though.

> On the other hand,
> those macros have to be in the header file once we split pci-ioda.c
> to multiple source files some day.

Why? When/if we do such a split, these macros will simply go to pci-ioda1.c 
and won't pollute global powernv/ioda namespace.


> However, I can move them to
> pci-ioda.c if you really want see them there. Let me know anyway.


Please move. Thanks.


>
>>
>>>   #define PNV_PHB_FLAG_EEH	(1 << 0)
>>>
>>>   struct pnv_phb {
>>>
>>
>>
>> --
>> Alexey
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Alexey


* Re: [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time
  2016-04-20  1:12     ` Gavin Shan
@ 2016-04-20  3:00       ` Alexey Kardashevskiy
  2016-04-20  3:35         ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  3:00 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 11:12 AM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 02:16:42PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> Currently, the PEs and their associated resources are assigned
>>> in ppc_md.pcibios_fixup() except those used by SRIOV VFs.
>>
>> But this new code does not affect IOV and VF's PEs will still be created
>> somewhere else rather than pnv_pci_setup_bridge()?
>>
>
> Correct. VF PEs cannot be created in pnv_pci_setup_bridge() as the PF's
> IOV capability isn't enabled at that point.
>
>>
>>> The
>>> function is called for once after PCI probing and resources
>>> assignment is completed. So it isn't hotplug friendly.
>>>
>>> This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
>>> is called on the event during system bootup and PCI hotplug: updating
>>> PCI bridge's windows after resource assignment/reassignment are done.
>>> For partial hotplug case, where not all PCI devices belonging to the
>>> PE are unplugged and plugged again, we just need unbinding/binding
>>> the affected PCI devices with the corresponding PE without creating
>>> new one.
>>>
>>> As there is no upstream bridge for root bus that needs to be covered
>>> by PE, we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
>>> before any other PEs can be created, as PE for root bus is the ancestor
>>> to anyone else.
>>
>> We did not need a root bus PE before? What is the other PE reserved for?
>> Comments only say "reserved"...
>>
>
> No, A PE for root bus is needed before.

Ok. We needed a PE for the root bus and we need it now. What changed? Why 
do you reserve another PE?


> other PEs can be for the PCI bus
> originated from root port and the subordinate domains.
>
>>>
>>> Also, the windows of root port or the upstream port of PCIe switch behind
>>> root port are extended to be PHB's apertures to accommodate the additional
>>> resources needed by newly plugged devices based on the fact: hotpluggable
>>> slot is behind root port or downstream port of the PCIe switch behind
>>> root port. The extension for those PCI bridges' windows is done in
>>> ppc_md.pcibios_setup_bridge() as well.
>>
>>
>> This patch seems to be doing way too many things, hard to follow.
>>
>> Could you please split the patch into smaller chunks? For example (you can do
>> it totally different):
>> - move pnv_pci_ioda_setup_opal_tce_kill()
>> - move PE creation from pnv_pci_ioda_fixup() to pnv_pci_setup_bridge();
>> - add pnv_pci_fixup_bridge_resources()
>> - add an extra reserved PE for the root bus (and all this magic with
>> root_pe_idx/root_pe_populated)
>> - ...
>>
>
> I'll evaluate it later. It's always nice to have small patches. Thanks
> for the comments.
>
>>
>>
>>
>> --
>> Alexey
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table
  2016-04-20  1:15     ` Gavin Shan
@ 2016-04-20  3:17       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  3:17 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 11:15 AM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 02:28:51PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> pnv_pci_ioda_table_free_pages() can be reused to release the IODA1
>>> TCE table when releasing IODA1 PE in subsequent patches.
>>>
>>> This renames the following functions to support releasing IODA1 TCE
>>> table: pnv_pci_ioda2_table_free_pages() to pnv_pci_ioda_table_free_pages(),
>>> pnv_pci_ioda2_table_do_free_pages() to pnv_pci_ioda_table_do_free_pages().
>>> No logical changes introduced.
>>
>> I can only see renaming here but it seems (from
>> IODA_architecture_04-14-2008.pdf) that IODA1 does not support multi-level TCE
>> tables in the way IODA2 does.
>>
>
> Note that the change was proposed by you in last round.

Hm. I do not recall proposing exactly that :-/

> Yes, TVE on P7IOC
> doesn't support multiple levels of TCE tables.

I thought it supports 2 levels.

> In this case, we will always
> have "tbl->it_indirect_levels" to 1, right?

Nope, it will be 0. But it is still ugly to use the release function without
its allocating counterpart, pnv_pci_ioda2_table_alloc_pages().

I suggest having pnv_pci_ioda1_table_free_pages() which will be just a 
single free_pages() call. If you need some ioda*-common code to free a 
table, then define pnv_ioda1_iommu_ops::free().
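
To illustrate the point, here is a minimal userspace sketch (not kernel code) of
the two shapes of free routine. The names table_do_free(), table_free_single()
and ENTRY_VALID are hypothetical, and calloc()/free() stand in for the page
allocator. With zero indirect levels the recursive walk degenerates to a single
free, which is all a dedicated IODA1 helper would need to do.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the TCE_PCI_READ | TCE_PCI_WRITE valid bits. */
#define ENTRY_VALID 1ULL

/* Counts freed tables so the behaviour can be observed. */
static unsigned long freed_tables;

/*
 * Multi-level free: each valid entry of an indirect level points to a
 * lower-level table, which is released first. Level 0 is the leaf level.
 */
static void table_do_free(uint64_t *tbl, size_t nents, unsigned int level)
{
	size_t i;

	if (level) {
		for (i = 0; i < nents; i++) {
			if (!(tbl[i] & ENTRY_VALID))
				continue;
			table_do_free((uint64_t *)(uintptr_t)(tbl[i] & ~ENTRY_VALID),
				      nents, level - 1);
		}
	}

	free(tbl);
	freed_tables++;
}

/* Single-level free: one call, no walk through indirect levels. */
static void table_free_single(uint64_t *tbl)
{
	free(tbl);
	freed_tables++;
}
```

Calling table_do_free() with level 0 does exactly what table_free_single() does,
so the difference is one of naming and pairing with the allocator rather than of
behaviour.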

>
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++++++++---------
>>>   1 file changed, 9 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index d360607..077f9db 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -51,7 +51,7 @@
>>>   #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>>>   #define POWERNV_IOMMU_MAX_LEVELS	5
>>>
>>> -static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
>>> +static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl);
>>>
>>>   static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>>>   			    const char *fmt, ...)
>>> @@ -1352,7 +1352,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>>>   		iommu_group_put(pe->table_group.group);
>>>   		BUG_ON(pe->table_group.group);
>>>   	}
>>> -	pnv_pci_ioda2_table_free_pages(tbl);
>>> +	pnv_pci_ioda_table_free_pages(tbl);
>>>   	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>>   }
>>>
>>> @@ -1946,7 +1946,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
>>>
>>>   static void pnv_ioda2_table_free(struct iommu_table *tbl)
>>>   {
>>> -	pnv_pci_ioda2_table_free_pages(tbl);
>>> +	pnv_pci_ioda_table_free_pages(tbl);
>>>   	iommu_free_table(tbl, "pnv");
>>>   }
>>>
>>> @@ -2448,7 +2448,7 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift,
>>>   	return addr;
>>>   }
>>>
>>> -static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>> +static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>>>   		unsigned long size, unsigned level);
>>>
>>>   static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>> @@ -2487,7 +2487,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>>   	 * release partially allocated table.
>>>   	 */
>>>   	if (offset < tce_table_size) {
>>> -		pnv_pci_ioda2_table_do_free_pages(addr,
>>> +		pnv_pci_ioda_table_do_free_pages(addr,
>>>   				1ULL << (level_shift - 3), levels - 1);
>>>   		return -ENOMEM;
>>>   	}
>>> @@ -2505,7 +2505,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>>   	return 0;
>>>   }
>>>
>>> -static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>> +static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>>>   		unsigned long size, unsigned level)
>>>   {
>>>   	const unsigned long addr_ul = (unsigned long) addr &
>>> @@ -2521,7 +2521,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>>   			if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
>>>   				continue;
>>>
>>> -			pnv_pci_ioda2_table_do_free_pages(__va(hpa), size,
>>> +			pnv_pci_ioda_table_do_free_pages(__va(hpa), size,
>>>   					level - 1);
>>>   		}
>>>   	}
>>> @@ -2529,7 +2529,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>>   	free_pages(addr_ul, get_order(size << 3));
>>>   }
>>>
>>> -static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>>> +static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl)
>>>   {
>>>   	const unsigned long size = tbl->it_indirect_levels ?
>>>   			tbl->it_level_size : tbl->it_size;
>>> @@ -2537,7 +2537,7 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>>>   	if (!tbl->it_size)
>>>   		return;
>>>
>>> -	pnv_pci_ioda2_table_do_free_pages((__be64 *)tbl->it_base, size,
>>> +	pnv_pci_ioda_table_do_free_pages((__be64 *)tbl->it_base, size,
>>>   			tbl->it_indirect_levels);
>>>   }
>>>
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  2016-04-20  1:23     ` Gavin Shan
@ 2016-04-20  3:21       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  3:21 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 11:23 AM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 03:28:36PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
>>> with names of the weak functions in PCI subsystem, which have the
>>> prefix "pcibios". No logical changes introduced.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/pci-bridge.h |  4 ++--
>>>   arch/powerpc/kernel/eeh_driver.c      | 12 ++++++------
>>>   arch/powerpc/kernel/pci-hotplug.c     | 15 +++++++--------
>>>   drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
>>>   drivers/pci/hotplug/rpaphp_core.c     |  4 ++--
>>>   drivers/pci/hotplug/rpaphp_pci.c      |  2 +-
>>>   6 files changed, 19 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>> index 4dd6ef4..c817f38 100644
>>> --- a/arch/powerpc/include/asm/pci-bridge.h
>>> +++ b/arch/powerpc/include/asm/pci-bridge.h
>>> @@ -263,10 +263,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
>>>   extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
>>>
>>>   /** Remove all of the PCI devices under this bus */
>>> -extern void pcibios_remove_pci_devices(struct pci_bus *bus);
>>> +extern void pci_remove_pci_devices(struct pci_bus *bus);
>>
>>
>> pci_lala_pci_lala() ("pci" is used twice) looks weird, if the prefix is
>> "pci", what other device types can they handle?...
>>
>> May be pcihp_add_devices(), pcihp_remove_devices() as these as defined in
>> pci-hotplug.c?
>>
>
> I assume you're talking about drivers/pci/hotplug/pci_hotplug_core.c.

No, the helpers you are renaming are in pci-hotplug.c which uses "pci_" as 
a prefix even though the file is supposed to be about hotplug.


> pci_hotplug_core.c uses pci_hp_ prefix rather than pcihp_. I will
> rename them to pci_hp_*() in next revision.

Anyway, this will work too.


>
> gwshan@gwshan:~/sandbox/linux$ find . -name pci-hotplug.c
> ./arch/powerpc/kernel/pci-hotplug.c
> gwshan@gwshan:~/sandbox/linux$ grep pci*hp arch/powerpc/kernel/pci-hotplug.c
>
>>
>>>
>>>   /** Discover new pci devices under this bus, and add them */
>>> -extern void pcibios_add_pci_devices(struct pci_bus *bus);
>>> +extern void pci_add_pci_devices(struct pci_bus *bus);
>>>
>>>
>>>   extern void isa_bridge_find_early(struct pci_controller *hose);
>>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>> index fb6207d..59e53fe 100644
>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>> @@ -621,7 +621,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>>   	 * We don't remove the corresponding PE instances because
>>>   	 * we need the information afterwords. The attached EEH
>>>   	 * devices are expected to be attached soon when calling
>>> -	 * into pcibios_add_pci_devices().
>>> +	 * into pci_add_pci_devices().
>>>   	 */
>>>   	eeh_pe_state_mark(pe, EEH_PE_KEEP);
>>>   	if (bus) {
>>> @@ -630,7 +630,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>>   		} else {
>>>   			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
>>>   			pci_lock_rescan_remove();
>>> -			pcibios_remove_pci_devices(bus);
>>> +			pci_remove_pci_devices(bus);
>>>   			pci_unlock_rescan_remove();
>>>   		}
>>>   	} else if (frozen_bus) {
>>> @@ -681,7 +681,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>>   		if (pe->type & EEH_PE_VF)
>>>   			eeh_add_virt_device(edev, NULL);
>>>   		else
>>> -			pcibios_add_pci_devices(bus);
>>> +			pci_add_pci_devices(bus);
>>>   	} else if (frozen_bus && rmv_data->removed) {
>>>   		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>>>   		ssleep(5);
>>> @@ -691,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
>>>   		if (pe->type & EEH_PE_VF)
>>>   			eeh_add_virt_device(edev, NULL);
>>>   		else
>>> -			pcibios_add_pci_devices(frozen_bus);
>>> +			pci_add_pci_devices(frozen_bus);
>>>   	}
>>>   	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>>>
>>> @@ -896,7 +896,7 @@ perm_error:
>>>   			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>>
>>>   			pci_lock_rescan_remove();
>>> -			pcibios_remove_pci_devices(frozen_bus);
>>> +			pci_remove_pci_devices(frozen_bus);
>>>   			pci_unlock_rescan_remove();
>>>   		}
>>>   	}
>>> @@ -981,7 +981,7 @@ static void eeh_handle_special_event(void)
>>>   				bus = eeh_pe_bus_get(phb_pe);
>>>   				eeh_pe_dev_traverse(pe,
>>>   					eeh_report_failure, NULL);
>>> -				pcibios_remove_pci_devices(bus);
>>> +				pci_remove_pci_devices(bus);
>>>   			}
>>>   			pci_unlock_rescan_remove();
>>>   		}
>>> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>>> index 59c4361..78bf2a1 100644
>>> --- a/arch/powerpc/kernel/pci-hotplug.c
>>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>>> @@ -38,20 +38,20 @@ void pcibios_release_device(struct pci_dev *dev)
>>>   }
>>>
>>>   /**
>>> - * pcibios_remove_pci_devices - remove all devices under this bus
>>> + * pci_remove_pci_devices - remove all devices under this bus
>>>    * @bus: the indicated PCI bus
>>>    *
>>>    * Remove all of the PCI devices under this bus both from the
>>>    * linux pci device tree, and from the powerpc EEH address cache.
>>>    */
>>> -void pcibios_remove_pci_devices(struct pci_bus *bus)
>>> +void pci_remove_pci_devices(struct pci_bus *bus)
>>>   {
>>>   	struct pci_dev *dev, *tmp;
>>>   	struct pci_bus *child_bus;
>>>
>>>   	/* First go down child busses */
>>>   	list_for_each_entry(child_bus, &bus->children, node)
>>> -		pcibios_remove_pci_devices(child_bus);
>>> +		pci_remove_pci_devices(child_bus);
>>>
>>>   	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
>>>   		 pci_domain_nr(bus),  bus->number);
>>> @@ -60,11 +60,10 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
>>>   		pci_stop_and_remove_bus_device(dev);
>>>   	}
>>>   }
>>> -
>>> -EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
>>> +EXPORT_SYMBOL_GPL(pci_remove_pci_devices);
>>>
>>>   /**
>>> - * pcibios_add_pci_devices - adds new pci devices to bus
>>> + * pci_add_pci_devices - adds new pci devices to bus
>>>    * @bus: the indicated PCI bus
>>>    *
>>>    * This routine will find and fixup new pci devices under
>>> @@ -74,7 +73,7 @@ EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
>>>    * is how this routine differs from other, similar pcibios
>>>    * routines.)
>>>    */
>>> -void pcibios_add_pci_devices(struct pci_bus * bus)
>>> +void pci_add_pci_devices(struct pci_bus *bus)
>>>   {
>>>   	int slotno, mode, pass, max;
>>>   	struct pci_dev *dev;
>>> @@ -114,4 +113,4 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
>>>   	}
>>>   	pcibios_finish_adding_to_bus(bus);
>>>   }
>>> -EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
>>> +EXPORT_SYMBOL_GPL(pci_add_pci_devices);
>>> diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
>>> index b46b57d..730982b 100644
>>> --- a/drivers/pci/hotplug/rpadlpar_core.c
>>> +++ b/drivers/pci/hotplug/rpadlpar_core.c
>>> @@ -380,7 +380,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
>>>   	}
>>>
>>>   	/* Remove all devices below slot */
>>> -	pcibios_remove_pci_devices(bus);
>>> +	pci_remove_pci_devices(bus);
>>>
>>>   	/* Unmap PCI IO space */
>>>   	if (pcibios_unmap_io_space(bus)) {
>>> diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
>>> index 611f605..bba07b3 100644
>>> --- a/drivers/pci/hotplug/rpaphp_core.c
>>> +++ b/drivers/pci/hotplug/rpaphp_core.c
>>> @@ -404,7 +404,7 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
>>>
>>>   	if (state == PRESENT) {
>>>   		pci_lock_rescan_remove();
>>> -		pcibios_add_pci_devices(slot->bus);
>>> +		pci_add_pci_devices(slot->bus);
>>>   		pci_unlock_rescan_remove();
>>>   		slot->state = CONFIGURED;
>>>   	} else if (state == EMPTY) {
>>> @@ -426,7 +426,7 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
>>>   		return -EINVAL;
>>>
>>>   	pci_lock_rescan_remove();
>>> -	pcibios_remove_pci_devices(slot->bus);
>>> +	pci_remove_pci_devices(slot->bus);
>>>   	pci_unlock_rescan_remove();
>>>   	vm_unmap_aliases();
>>>
>>> diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
>>> index 7836d69..1099b38 100644
>>> --- a/drivers/pci/hotplug/rpaphp_pci.c
>>> +++ b/drivers/pci/hotplug/rpaphp_pci.c
>>> @@ -116,7 +116,7 @@ int rpaphp_enable_slot(struct slot *slot)
>>>   		}
>>>
>>>   		if (list_empty(&bus->devices))
>>> -			pcibios_add_pci_devices(bus);
>>> +			pci_add_pci_devices(bus);
>>>
>>>   		if (!list_empty(&bus->devices)) {
>>>   			info->adapter_status = CONFIGURED;
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time
  2016-04-20  3:00       ` Alexey Kardashevskiy
@ 2016-04-20  3:35         ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-04-20  3:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, dja,
	bhelgaas, robherring2, grant.likely

On Wed, Apr 20, 2016 at 01:00:38PM +1000, Alexey Kardashevskiy wrote:
>On 04/20/2016 11:12 AM, Gavin Shan wrote:
>>On Tue, Apr 19, 2016 at 02:16:42PM +1000, Alexey Kardashevskiy wrote:
>>>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>>>Currently, the PEs and their associated resources are assigned
>>>>in ppc_md.pcibios_fixup() except those used by SRIOV VFs.
>>>
>>>But this new code does not affect IOV and VF's PEs will still be created
>>>somewhere else rather than pnv_pci_setup_bridge()?
>>>
>>
>>Correct. VF PEs cannot be created in pnv_pci_setup_bridge() as the PF's
>>IOV capability isn't enabled at that point.
>>
>>>
>>>>The
>>>>function is called for once after PCI probing and resources
>>>>assignment is completed. So it isn't hotplug friendly.
>>>>
>>>>This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
>>>>is called on the event during system bootup and PCI hotplug: updating
>>>>PCI bridge's windows after resource assignment/reassignment are done.
>>>>For partial hotplug case, where not all PCI devices belonging to the
>>>>PE are unplugged and plugged again, we just need unbinding/binding
>>>>the affected PCI devices with the corresponding PE without creating
>>>>new one.
>>>>
>>>>As there is no upstream bridge for root bus that needs to be covered
>>>>by PE, we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
>>>>before any other PEs can be created, as PE for root bus is the ancestor
>>>>to anyone else.
>>>
>>>We did not need a root bus PE before? What is the other PE reserved for?
>>>Comments only say "reserved"...
>>>
>>
>>No, A PE for root bus is needed before.
>
>Ok. We needed a PE for the root bus and we need it now. What changed? Why do
>you reserve another PE?
>

Originally, all PEs (including the one for the root bus) were created at PHB fixup
time in pnv_pci_ioda_fixup(). With this patch, all PEs are created in
pnv_pci_setup_bridge(), which is called for every PCI bus other than the root bus.
Since pnv_pci_setup_bridge() is never called for the root bus, we have to create the
PE for the root bus before the remaining PEs are created there. The PE# for the root
bus is reserved in advance and used in pnv_pci_setup_bridge() at that point.
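
The flow described above can be sketched roughly as follows. This is a
self-contained userspace model, not the patch itself: struct phb_model,
alloc_pe(), phb_init() and setup_bridge() are hypothetical names, and only
root_pe_idx / root_pe_populated mirror the fields mentioned in the review.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_PES	8
#define PE_NONE	(-1)

/* Hypothetical, heavily reduced model of a PHB's PE bookkeeping. */
struct phb_model {
	bool pe_used[NUM_PES];
	int root_pe_idx;	/* PE# reserved for the root bus */
	bool root_pe_populated;	/* set once the root bus PE is set up */
};

/* Hand out the lowest free PE#, or PE_NONE when all are taken. */
static int alloc_pe(struct phb_model *phb)
{
	int i;

	for (i = 0; i < NUM_PES; i++) {
		if (!phb->pe_used[i]) {
			phb->pe_used[i] = true;
			return i;
		}
	}
	return PE_NONE;
}

/* PHB init: reserve the root bus PE# without populating it yet. */
static void phb_init(struct phb_model *phb)
{
	int i;

	for (i = 0; i < NUM_PES; i++)
		phb->pe_used[i] = false;
	phb->root_pe_idx = alloc_pe(phb);
	phb->root_pe_populated = false;
}

/*
 * Models the per-bridge hook, which only runs for non-root buses.
 * The first call populates the reserved root bus PE before creating
 * the PE for the bus behind the bridge. Returns the PE# for bus_nr.
 */
static int setup_bridge(struct phb_model *phb, int bus_nr)
{
	int pe;

	if (!phb->root_pe_populated) {
		printf("root bus -> PE#%d\n", phb->root_pe_idx);
		phb->root_pe_populated = true;
	}

	pe = alloc_pe(phb);
	printf("bus %02x -> PE#%d\n", bus_nr, pe);
	return pe;
}
```

The reservation guarantees that the root bus PE# is fixed before any child PE is
allocated, even though it is only populated lazily on the first bridge setup.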

>
>>
>>other PEs can be for the PCI bus
>>originated from root port and the subordinate domains.
>>
>>>>
>>>>Also, the windows of root port or the upstream port of PCIe switch behind
>>>>root port are extended to be PHB's apertures to accommodate the additional
>>>>resources needed by newly plugged devices based on the fact: hotpluggable
>>>>slot is behind root port or downstream port of the PCIe switch behind
>>>>root port. The extension for those PCI bridges' windows is done in
>>>>ppc_md.pcibios_setup_bridge() as well.
>>>
>>>
>>>This patch seems to be doing way too many things, hard to follow.
>>>
>>>Could you please split the patch into smaller chunks? For example (you can do
>>>it totally different):
>>>- move pnv_pci_ioda_setup_opal_tce_kill()
>>>- move PE creation from pnv_pci_ioda_fixup() to pnv_pci_setup_bridge();
>>>- add pnv_pci_fixup_bridge_resources()
>>>- add an extra reserved PE for the root bus (and all this magic with
>>>root_pe_idx/root_pe_populated)
>>>- ...
>>>
>>
>>I'll evaluate it later. It's always nice to have small patches. Thanks
>>for the comments.
>>
>>>
>>>
>>>
>>>--
>>>Alexey
>>>
>>
>
>
>-- 
>Alexey
>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes()
  2016-04-20  1:27       ` Gavin Shan
@ 2016-04-20  3:39         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  3:39 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 11:27 AM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 03:51:03PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> This renames traverse_pci_devices() to pci_traverse_device_nodes().
>>> The function traverses all subordinate device nodes of the specified
>>> one. Also, below cleanup applied to the function. No logical changes
>>> introduced.
>>>
>>>     * Rename "pre" to "fn".
>>>     * Avoid assignment in if condition reported from checkpatch.pl.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
>>>   arch/powerpc/kernel/pci_dn.c         | 15 ++++++++++-----
>>>   arch/powerpc/platforms/pseries/msi.c |  4 ++--
>>>   3 files changed, 15 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
>>> index ca0c5bf..8753e4e 100644
>>> --- a/arch/powerpc/include/asm/ppc-pci.h
>>> +++ b/arch/powerpc/include/asm/ppc-pci.h
>>> @@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
>>>   struct device_node;
>>>   struct pci_dn;
>>>
>>> -typedef void *(*traverse_func)(struct device_node *me, void *data);
>>
>>
>>
>> Why removing this typedef? Typedef's are good.
>>
>> Anyway,
>>
>
> Could you please provide more details why it's good? I removed it
> because it was used for only once.


I have some thoughts but never mind, nobody seems to care about this and 
typedefs are considered bad by the CodingStyle.
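
For reference, a simplified userspace sketch of the callback style the patch
ends up with: the function-pointer parameter is spelled out inline instead of
through a typedef, and the callback result is assigned first and tested
afterwards. struct node, traverse_nodes() and match_id() are hypothetical, and
unlike pci_traverse_device_nodes() this model recurses and does not filter on
PCI class.

```c
#include <stddef.h>

/* Hypothetical tree node standing in for struct device_node. */
struct node {
	int id;
	struct node *child;
	struct node *sibling;
};

/*
 * Visit start, its siblings and all their descendants. Traversal stops
 * as soon as fn returns non-NULL, and that value is handed back.
 */
static void *traverse_nodes(struct node *start,
			    void *(*fn)(struct node *, void *),
			    void *data)
{
	struct node *n;
	void *ret;

	for (n = start; n; n = n->sibling) {
		if (fn) {
			ret = fn(n, data);
			if (ret)
				return ret;
		}

		ret = traverse_nodes(n->child, fn, data);
		if (ret)
			return ret;
	}

	return NULL;
}

/* Example callback: return the node whose id matches *(int *)data. */
static void *match_id(struct node *n, void *data)
{
	return n->id == *(int *)data ? n : NULL;
}
```

The compiler sees the same type either way; the typedef only changes how the
prototype reads.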


>
>
>>
>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>>
>>
>>
>>> -void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>> -		void *data);
>>> +void *pci_traverse_device_nodes(struct device_node *start,
>>> +				void *(*fn)(struct device_node *, void *),
>>> +				void *data);
>>>   void *traverse_pci_dn(struct pci_dn *root,
>>>   		      void *(*fn)(struct pci_dn *, void *),
>>>   		      void *data);
>>> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>> index ce10281..ecdccce 100644
>>> --- a/arch/powerpc/kernel/pci_dn.c
>>> +++ b/arch/powerpc/kernel/pci_dn.c
>>> @@ -372,8 +372,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>>>    * one of these nodes we also assume its siblings are non-pci for
>>>    * performance.
>>>    */
>>> -void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>> -		void *data)
>>> +void *pci_traverse_device_nodes(struct device_node *start,
>>> +				void *(*fn)(struct device_node *, void *),
>>> +				void *data)
>>>   {
>>>   	struct device_node *dn, *nextdn;
>>>   	void *ret;
>>> @@ -388,8 +389,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>>   		if (classp)
>>>   			class = of_read_number(classp, 1);
>>>
>>> -		if (pre && ((ret = pre(dn, data)) != NULL))
>>> -			return ret;
>>> +		if (fn) {
>>> +			ret = fn(dn, data);
>>> +			if (ret)
>>> +				return ret;
>>> +		}
>>>
>>>   		/* If we are a PCI bridge, go down */
>>>   		if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
>>> @@ -411,6 +415,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>>   	}
>>>   	return NULL;
>>>   }
>>> +EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);
>>>
>>>   static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
>>>   				      struct pci_dn *pdn)
>>> @@ -487,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>>>   	}
>>>
>>>   	/* Update dn->phb ptrs for new phb and children devices */
>>> -	traverse_pci_devices(dn, add_pdn, phb);
>>> +	pci_traverse_device_nodes(dn, add_pdn, phb);
>>>   }
>>>
>>>   /**
>>> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
>>> index 272e9ec..543a638 100644
>>> --- a/arch/powerpc/platforms/pseries/msi.c
>>> +++ b/arch/powerpc/platforms/pseries/msi.c
>>> @@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>>>   	memset(&counts, 0, sizeof(struct msi_counts));
>>>
>>>   	/* Work out how many devices we have below this PE */
>>> -	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
>>> +	pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
>>>
>>>   	if (counts.num_devices == 0) {
>>>   		pr_err("rtas_msi: found 0 devices under PE for %s\n",
>>> @@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>>>   	/* else, we have some more calculating to do */
>>>   	counts.requestor = pci_device_to_OF_node(dev);
>>>   	counts.request = request;
>>> -	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
>>> +	pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
>>>
>>>   	/* If the quota isn't an integer multiple of the total, we can
>>>   	 * use the remainder as spare MSIs for anyone that wants them. */
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 30/45] powerpc/pci: Delay populating pdn
  2016-04-20  2:13     ` Gavin Shan
@ 2016-04-20  3:54       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  3:54 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 12:13 PM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 06:19:20PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> The pdn (struct pci_dn) instances are allocated from memblock or
>>> bootmem when creating PCI controller (hoses) in setup_arch(). PCI
>>> hotplug, which will be supported by proceeding patches, releases
>>> PCI device nodes and their corresponding pdn on unplugging event.
>>> The memory chunks for pdn instances allocated from memblock or
>>> bootmem are hard to reused after being released.
>>>
>>> This delays creating pdn by pci_devs_phb_init() from setup_arch()
>>> to core_initcall() so that they are allocated from slab. The memory
>>> consumed by pdn can be released to system without problem during
>>> PCI unplugging time. It indicates that pci_dn is unavailable in
>>> setup_arch() and the fixup on pdn (like AGP's) can't be carried
>>> out that time. We have to do that in ppc_md.pcibios_root_bridge_prepare()
>>> on maple/pasemi/powermac platforms where/when the pdn is available.
>>>
>>> At the mean while, the EEH device is created when pdn is populated,
>>> meaning pdn and EEH device have same life cycle. In turn, we needn't
>>> call eeh_dev_init() to create EEH device explicitly.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>
>>
>> Uff. It would not hurt to mention that  pcibios_root_bridge_prepare is called
>> from subsys_initcall() which is executed after core_initcall() so the code
>> flow does not change.
>>
>
> Yes, will do in next revision.
>
>> Have you checked if there is anything in between
>> core_initcall(pci_devs_phb_init) and subsys_initcall(pcibios_init) which
>> might need device tree nodes? For example, subsys_initcall(pcibios_init)
>> calls (eventually) pnv_pci_ioda_fixup(), if we are unlucky and pcibios_init()
>> (and therefore pnv_pci_ioda_fixup() or what pseries/others do) is called
>> before pcibios_init() - won't we crash or something?
>>
>
> I don't catch what you were asking. device-tree nodes (struct device_node)
> are always there. This patch doesn't affect them. Perhaps you were talking
> about pdn (PCI_DN). If it's the case, this patch delays creating pdn from
> setup_arch() to core_initcall(pci_devs_phb_init).


While thinking of explaining what I wanted to ask, I found my answer :)

pcibios_init() calls ppc_md.pcibios_root_bridge_prepare() first, then 
ppc_md.pcibios_fixup() so we are fine here with ordering.
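
A toy model of why the ordering holds, assuming only what this thread states:
core_initcall() is level 1, subsys_initcall() is level 4, pci_devs_phb_init()
is registered with the former and pcibios_init() with the latter. The struct
and run_initcalls() below are hypothetical; the real kernel orders initcalls by
linker section rather than by scanning a table.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of a registered initcall and its level. */
struct initcall_model {
	int level;
	const char *name;
};

/* Deliberately listed out of order to show that the level decides. */
static const struct initcall_model calls[] = {
	{ 4, "pcibios_init" },		/* subsys_initcall() */
	{ 1, "pci_devs_phb_init" },	/* core_initcall() */
};

/*
 * Record the names in ascending level order, which is the order the
 * calls would run in. Returns the number of entries written to order.
 */
static size_t run_initcalls(const char **order, size_t max)
{
	size_t i, n = 0;
	int level;

	for (level = 0; level <= 7; level++) {
		for (i = 0; i < sizeof(calls) / sizeof(calls[0]); i++) {
			if (calls[i].level == level && n < max)
				order[n++] = calls[i].name;
		}
	}

	return n;
}
```

With these levels pci_devs_phb_init() always comes out first, so the pdn
instances exist by the time pcibios_init() invokes the root bridge prepare and
fixup hooks.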


> I don't see anything need pdn between setup_arch() and core_initcall().
> The changes introduced to powermac/pasemi platforms are: move fixing the child
> pdns of the specific PHB's pdn from setup_arch() to subsys_initcall(pcibios_init).
> I don't see anything between them needs the fixed pdns.
>
> I don't understand how pcibios_init() is called before pcibios_init() in your

pcibios_init() is used twice in the sentence above :)

Anyway,


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>




> context. Sorry for my bad English. Perhaps you're asking about the call sequence
> of core_initcall() and subsys_initcall()? If so, they're defined like below:
>
> #define core_initcall(fn)		__define_initcall(fn, 1)
> #define subsys_initcall(fn)		__define_initcall(fn, 4)
>
>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID
  2016-04-20  2:28       ` Gavin Shan
@ 2016-04-20  4:14         ` Alexey Kardashevskiy
  2016-04-22  4:23           ` Alistair Popple
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  4:14 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 12:28 PM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 07:28:20PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> PowerNV platforms runs on top of skiboot firmware that includes
>>> changes to support PCI slots. PCI slots are identified by PHB's
>>> ID or the combo of that and PCI slot ID.
>>>
>>> This changes the EEH PowerNV backend to support PCI slots:
>>>
>>>     * Rename arguments of opal_pci_reset() and opal_pci_poll().
>>>     * One more argument (PCI slot's state) added to opal_pci_poll().
>>>     * Drop pnv_eeh_phb_poll() and introduce a enhanced similar
>>>       function pnv_pci_poll() that will be used by PowerNV hotplug
>>>       backends.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/opal.h              |  4 +--
>>>   arch/powerpc/platforms/powernv/eeh-powernv.c | 42 ++++++----------------------
>>>   arch/powerpc/platforms/powernv/pci.c         | 21 ++++++++++++++
>>>   arch/powerpc/platforms/powernv/pci.h         |  1 +
>>>   4 files changed, 32 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>>> index 07a99e6..9e0039f 100644
>>> --- a/arch/powerpc/include/asm/opal.h
>>> +++ b/arch/powerpc/include/asm/opal.h
>>> @@ -131,7 +131,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
>>>   int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
>>>   					uint16_t dma_window_number, uint64_t pci_start_addr,
>>>   					uint64_t pci_mem_size);
>>> -int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
>>> +int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
>>>
>>>   int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
>>>   				   uint64_t diag_buffer_len);
>>> @@ -148,7 +148,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
>>>   int64_t opal_set_system_attention_led(uint8_t led_action);
>>>   int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
>>>   			    __be16 *pci_error_type, __be16 *severity);
>>> -int64_t opal_pci_poll(uint64_t phb_id);
>>> +int64_t opal_pci_poll(uint64_t id, uint8_t *state);
>>>   int64_t opal_return_cpu(void);
>>>   int64_t opal_check_token(uint64_t token);
>>>   int64_t opal_reinit_cpus(uint64_t flags);
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index c7454ba..e23b063 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -717,28 +717,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
>>>   	return ret;
>>>   }
>>>
>>> -static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
>>> -{
>>> -	s64 rc = OPAL_HARDWARE;
>>> -
>>> -	while (1) {
>>> -		rc = opal_pci_poll(phb->opal_id);
>>> -		if (rc <= 0)
>>> -			break;
>>> -
>>> -		if (system_state < SYSTEM_RUNNING)
>>> -			udelay(1000 * rc);
>>> -		else
>>> -			msleep(rc);
>>> -	}
>>> -
>>> -	return rc;
>>> -}
>>> -
>>>   int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>>>   {
>>>   	struct pnv_phb *phb = hose->private_data;
>>>   	s64 rc = OPAL_HARDWARE;
>>> +	int ret;
>>>
>>>   	pr_debug("%s: Reset PHB#%x, option=%d\n",
>>>   		 __func__, hose->global_number, option);
>>> @@ -753,8 +736,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>>>   		rc = opal_pci_reset(phb->opal_id,
>>>   				    OPAL_RESET_PHB_COMPLETE,
>>>   				    OPAL_DEASSERT_RESET);
>>> -	if (rc < 0)
>>> -		goto out;
>>>
>>>   	/*
>>>   	 * Poll state of the PHB until the request is done
>>> @@ -762,24 +743,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>>>   	 * reset followed by hot reset on root bus. So we also
>>>   	 * need the PCI bus settlement delay.
>>>   	 */
>>> -	rc = pnv_eeh_phb_poll(phb);
>>> -	if (option == EEH_RESET_DEACTIVATE) {
>>> +	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
>>> +	if (option == EEH_RESET_DEACTIVATE && !ret) {
>>>   		if (system_state < SYSTEM_RUNNING)
>>>   			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
>>>   		else
>>>   			msleep(EEH_PE_RST_SETTLE_TIME);
>>>   	}
>>> -out:
>>> -	if (rc != OPAL_SUCCESS)
>>> -		return -EIO;
>>>
>>> -	return 0;
>>> +	return ret;
>>>   }
>>>
>>>   static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>>>   {
>>>   	struct pnv_phb *phb = hose->private_data;
>>>   	s64 rc = OPAL_HARDWARE;
>>> +	int ret;
>>>
>>>   	pr_debug("%s: Reset PHB#%x, option=%d\n",
>>>   		 __func__, hose->global_number, option);
>>> @@ -801,18 +780,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>>>   		rc = opal_pci_reset(phb->opal_id,
>>>   				    OPAL_RESET_PCI_HOT,
>>>   				    OPAL_DEASSERT_RESET);
>>> -	if (rc < 0)
>>> -		goto out;
>>>
>>>   	/* Poll state of the PHB until the request is done */
>>> -	rc = pnv_eeh_phb_poll(phb);
>>> -	if (option == EEH_RESET_DEACTIVATE)
>>> +	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
>>> +	if (option == EEH_RESET_DEACTIVATE && !ret)
>>>   		msleep(EEH_PE_RST_SETTLE_TIME);
>>> -out:
>>> -	if (rc != OPAL_SUCCESS)
>>> -		return -EIO;
>>>
>>> -	return 0;
>>> +	return ret;
>>>   }
>>>
>>>   static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>> index b87a315..a458703 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.c
>>> +++ b/arch/powerpc/platforms/powernv/pci.c
>>> @@ -42,6 +42,27 @@
>>>   #define cfg_dbg(fmt...)	do { } while(0)
>>>   //#define cfg_dbg(fmt...)	printk(fmt)
>>>
>>> +int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
>>> +{
>>> +	while (rval > 0) {
>>> +		if (system_state < SYSTEM_RUNNING)
>>> +			udelay(1000 * rval);
>>> +		else
>>> +			msleep(rval);
>>> +
>>> +		rval = opal_pci_poll(id, state);
>>> +	}
>>> +
>>> +	/*
>>> +	 * The caller expects to retrieve additional
>>> +	 * information if the last argument isn't NULL.
>>> +	 */
>>> +	if (rval == OPAL_SUCCESS && state)
>>> +		rval = opal_pci_poll(id, state);
>>
>>
>> Old OPAL won't touch @state so whatever garbage was there will stay there as
>> the only caller which is passing not-NULL there is pnv_php_get_power_state()
>> and it does not initialize @power_state (it is in "[PATCH v8 45/45]
>> PCI/hotplug: PowerPC PowerNV PCI hotplug driver").
>>
>
> Old OPAL without exposing hotpluggable slots won't have this case. I mean
> pnv_php_get_power_state() won't be called on old OPAL.


What exactly guarantees that hotplug will work with new OPAL only?

>
>>
>> btw how will new OPAL react if old kernel is running, i.e. not passing @state
>> at all? If it is initialized to NULL somewhere - fine but what exactly does
>> this initialization and makes sure that OPAL won't see garbage as a second
>> parameter?
>>
>
> @state is always NULL for old kernel + new OPAL.

What piece of code writes NULL to "state"? An old kernel calls
opal_pci_poll(id) and omits "state", so something has to write NULL where
OPAL expects to see the pointer to "state". That place is some GPR, which
may otherwise hold garbage.

I am looking at
#define OPAL_CALL(name, token)

and cannot see where the second parameter (which old kernel omits) is 
reset, i.e. gpr4 (or gpr5?) is cleared.


> @state is used in
> PCI hotplug functionality in OPAL only. As old kernel doesn't support
> PCI hotplug, @state is never used. I'm not sure it's the answer you
> want?

No, it is not :)

>> When ABI like this changes, I expect to see opal_pci_poll2() or
>> opal_pci_poll_ex() rather than just an additional parameter to
>> opal_pci_poll()...
>>
>
> It's a good suggestion but it would be nicer if you raised this
> early. One question I have: current opal_pci_poll() is enough
> to cover all cases, why we need introduce and maintain another
> similar one? Sorry that I don't see the reason from your context
> and could you please provide more details?
>
>>> +
>>> +	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
>>> +}
>>> +
>>>   #ifdef CONFIG_PCI_MSI
>>>   int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>>>   {
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 0cddde3..6857703 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -192,6 +192,7 @@ extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
>>>   		unsigned long *hpa, enum dma_data_direction *direction);
>>>   extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>>>
>>> +int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state);
>>>   void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
>>>   				unsigned char *log_buff);
>>>   int pnv_pci_cfg_read(struct pci_dn *pdn,
>>>
>>
>>
>> --
>> Alexey
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure
  2016-04-20  2:33     ` Gavin Shan
@ 2016-04-20  4:17       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  4:17 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 12:33 PM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 07:34:55PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> The skiboot firmware might provide the PCI slot reset capability
>>> which is identified by property "ibm,reset-by-firmware" on the
>>> PCI slot associated device node.
>>>
>>> This checks the property. If it exists, the reset request is routed
>>> to firmware. Otherwise, the reset is done by kernel as before.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++++++++++++++++++++++++++-
>>>   1 file changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index e23b063..c8a5217 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -789,7 +789,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>>>   	return ret;
>>>   }
>>>
>>> -static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>> +static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>>   {
>>>   	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>>>   	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>> @@ -840,6 +840,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>>   	return 0;
>>>   }
>>>
>>> +static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
>>> +{
>>> +	struct pci_controller *hose;
>>> +	struct pnv_phb *phb;
>>> +	struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
>>> +	uint64_t id = (0x1ul << 60);
>>
>>
>> What is this 1<<60 for?
>>
>>
>
> As you replied in other threads, it's worth having some macros for this
> piece of business. This bit indicates that the ID is for a slot behind a
> switch port. If this bit is cleared, the ID represents a PHB slot.
>
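
For what it is worth, such macros could look roughly like the sketch below.
The SKETCH_* names and the helper are hypothetical; only the bit positions
come from the expression in this patch, and the sketch assumes (as that
expression does) that the PHB's OPAL ID fits below bit 16. Casting before the
shift also keeps a bus number of 0x80 or above from being shifted as a
promoted int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit layout mirrors the expression in the patch. */
#define SKETCH_SLOT_ID_PREFIX		(UINT64_C(1) << 60) /* slot behind a switch port */
#define SKETCH_SLOT_ID_BUS_SHIFT	24
#define SKETCH_SLOT_ID_DEVFN_SHIFT	16

/* Compose the ID of a slot behind a switch port from PHB ID, bus and devfn. */
static inline uint64_t sketch_slot_id(uint64_t phb_opal_id, uint8_t bus,
				      uint8_t devfn)
{
	return SKETCH_SLOT_ID_PREFIX |
	       ((uint64_t)bus << SKETCH_SLOT_ID_BUS_SHIFT) |
	       ((uint64_t)devfn << SKETCH_SLOT_ID_DEVFN_SHIFT) |
	       phb_opal_id;
}

/* Non-zero if the ID names a slot behind a switch port, zero for a PHB slot. */
static inline int sketch_id_is_switch_slot(uint64_t id)
{
	return (id & SKETCH_SLOT_ID_PREFIX) != 0;
}
```

With something like this, the caller would read
id = sketch_slot_id(phb->opal_id, pdev->bus->number, pdev->devfn) and the
magic 1 << 60 would have a name.
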
>>> +	uint8_t scope;
>>> +	int64_t rc;
>>> +
>>> +	/*
>>> +	 * If the firmware can't handle it, we will issue hot reset
>>> +	 * on the secondary bus despite the requested reset type.
>>> +	 */
>>> +	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
>>> +		return __pnv_eeh_bridge_reset(pdev, option);
>>> +
>>> +	/* The firmware can handle the request */
>>> +	switch (option) {
>>> +	case EEH_RESET_HOT:
>>> +		scope = OPAL_RESET_PCI_HOT;
>>> +		break;
>>> +	case EEH_RESET_FUNDAMENTAL:
>>> +		scope = OPAL_RESET_PCI_FUNDAMENTAL;
>>> +		break;
>>> +	case EEH_RESET_DEACTIVATE:
>>> +		return 0;
>>> +	default:
>>> +		dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
>>> +			 __func__, option);
>>
>>
>> Can the userspace trigger this case (via VFIO-EEH) and flood dmesg?
>>
>
> It depends on how you defined message flooding actually. It's abnormal
> path caused by program internal error, not external users.


Can QEMU be changed to do something special (cause reset with a wrong
option) via the VFIO/EEH interface in a loop to make this message appear? Or
will a call with a wrong option never reach this point?


>
>>
>>
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	hose = pci_bus_to_host(pdev->bus);
>>> +	phb = hose->private_data;
>>> +	id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
>>> +	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
>>> +	return pnv_pci_poll(id, rc, NULL);
>>> +}
>>> +
>>>   static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
>>>   {
>>>   	int *freset = data;
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status
  2016-04-20  2:36       ` Gavin Shan
@ 2016-04-20  4:25         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  4:25 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 12:36 PM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 07:39:34PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> This exports 4 functins, which base on the corresponding OPAL
>>
>>
>> s/functins/functions/
>>
>
> Thanks.
>
>>> APIs to get/set PCI slot status. Those functions are going to
>>> be used by PowerNV PCI hotplug driver:
>>>
>>>     pnv_pci_get_device_tree()    opal_get_device_tree()
>>>     pnv_pci_get_presence_state() opal_pci_get_presence_state()
>>>     pnv_pci_get_power_state()    opal_pci_get_power_state()
>>>     pnv_pci_set_power_state()    opal_pci_set_power_state()
>>>
>>> Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
>>> unregister}() to allow registration and unregistration of PCI hotplug
>>> notifier, which will be used to receive PCI hotplug message from
>>> skiboot firmware in PowerNV PCI hotplug driver.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/opal-api.h            | 17 ++++++-
>>>   arch/powerpc/include/asm/opal.h                |  4 ++
>>>   arch/powerpc/include/asm/pnv-pci.h             |  7 +++
>>>   arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
>>>   arch/powerpc/platforms/powernv/pci.c           | 66 ++++++++++++++++++++++++++
>>>   5 files changed, 97 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>>> index f8faaae..a6af338 100644
>>> --- a/arch/powerpc/include/asm/opal-api.h
>>> +++ b/arch/powerpc/include/asm/opal-api.h
>>> @@ -158,7 +158,11 @@
>>>   #define OPAL_LEDS_SET_INDICATOR			115
>>>   #define OPAL_CEC_REBOOT2			116
>>>   #define OPAL_CONSOLE_FLUSH			117
>>> -#define OPAL_LAST				117
>>> +#define OPAL_GET_DEVICE_TREE			118
>>> +#define OPAL_PCI_GET_PRESENCE_STATE		119
>>> +#define OPAL_PCI_GET_POWER_STATE		120
>>> +#define OPAL_PCI_SET_POWER_STATE		121
>>> +#define OPAL_LAST				121
>>>
>>>   /* Device tree flags */
>>>
>>> @@ -344,6 +348,16 @@ enum OpalPciResetState {
>>>   	OPAL_ASSERT_RESET   = 1
>>>   };
>>>
>>> +enum OpalPciSlotPresentenceState {
>>> +	OPAL_PCI_SLOT_EMPTY	= 0,
>>> +	OPAL_PCI_SLOT_PRESENT	= 1
>>> +};
>>> +
>>> +enum OpalPciSlotPowerState {
>>> +	OPAL_PCI_SLOT_POWER_OFF	= 0,
>>> +	OPAL_PCI_SLOT_POWER_ON	= 1
>>> +};
>>> +
>>>   enum OpalSlotLedType {
>>>   	OPAL_SLOT_LED_TYPE_ID = 0,	/* IDENTIFY LED */
>>>   	OPAL_SLOT_LED_TYPE_FAULT = 1,	/* FAULT LED */
>>> @@ -378,6 +392,7 @@ enum opal_msg_type {
>>>   	OPAL_MSG_DPO,
>>>   	OPAL_MSG_PRD,
>>>   	OPAL_MSG_OCC,
>>> +	OPAL_MSG_PCI_HOTPLUG,
>>>   	OPAL_MSG_TYPE_MAX,
>>>   };
>>>
>>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>>> index 9e0039f..899bcb941 100644
>>> --- a/arch/powerpc/include/asm/opal.h
>>> +++ b/arch/powerpc/include/asm/opal.h
>>> @@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
>>>   		uint64_t size, uint64_t token);
>>>   int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
>>>   		uint64_t token);
>>> +int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
>>> +int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
>>> +int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
>>> +int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);
>>>
>>>   /* Internal functions */
>>>   extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>>> diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
>>> index 6f77f71..d9d095b 100644
>>> --- a/arch/powerpc/include/asm/pnv-pci.h
>>> +++ b/arch/powerpc/include/asm/pnv-pci.h
>>> @@ -13,6 +13,13 @@
>>>   #include <linux/pci.h>
>>>   #include <misc/cxl-base.h>
>>>
>>> +extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
>>> +extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
>>> +extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
>>> +extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
>>> +extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
>>> +extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
>>> +
>>>   int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
>>>   int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
>>>   			   unsigned int virq);
>>> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
>>> index e45b88a..3ea1a855 100644
>>> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
>>> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
>>> @@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg,				OPAL_PRD_MSG);
>>>   OPAL_CALL(opal_leds_get_ind,			OPAL_LEDS_GET_INDICATOR);
>>>   OPAL_CALL(opal_leds_set_ind,			OPAL_LEDS_SET_INDICATOR);
>>>   OPAL_CALL(opal_console_flush,			OPAL_CONSOLE_FLUSH);
>>> +OPAL_CALL(opal_get_device_tree,			OPAL_GET_DEVICE_TREE);
>>> +OPAL_CALL(opal_pci_get_presence_state,		OPAL_PCI_GET_PRESENCE_STATE);
>>> +OPAL_CALL(opal_pci_get_power_state,		OPAL_PCI_GET_POWER_STATE);
>>> +OPAL_CALL(opal_pci_set_power_state,		OPAL_PCI_SET_POWER_STATE);
>>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>> index a458703..206385f 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.c
>>> +++ b/arch/powerpc/platforms/powernv/pci.c
>>> @@ -63,6 +63,72 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
>>>   	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
>>>   }
>>>
>>> +int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len)
>>> +{
>>> +	int64_t rc;
>>> +
>>> +	if (!opal_check_token(OPAL_GET_DEVICE_TREE))
>>> +		return -ENXIO;
>>> +
>>> +	rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
>>> +	if (rc != OPAL_SUCCESS)
>>> +		return -EIO;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnv_pci_get_device_tree);
>>> +
>>> +int pnv_pci_get_presence_state(uint64_t id, uint8_t *state)
>>> +{
>>> +	int64_t rc;
>>> +
>>> +	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATE))
>>> +		return -ENXIO;
>>> +
>>> +	rc = opal_pci_get_presence_state(id, state);
>>> +	if (rc != OPAL_SUCCESS)
>>> +		return -EIO;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnv_pci_get_presence_state);
>>> +
>>> +int pnv_pci_get_power_state(uint64_t id, uint8_t *state)
>>> +{
>>> +	int64_t rc;
>>> +
>>> +	if (!opal_check_token(OPAL_PCI_GET_POWER_STATE))
>>> +		return -ENXIO;
>>> +
>>> +	rc = opal_pci_get_power_state(id, state);
>>
>>
>> Out of curiosity - if rc==OPAL_SUCCESS, @state should already contain the
>> correct state and you do not have to call pnv_pci_poll() (which will call
>> opal_pci_poll() immediately), is that correct?
>>
>
> It's not correct. opal_pci_get_power_state() starts a state machine
> in the OPAL firmware and pnv_pci_poll() keeps pushing the state machine
> forward.


opal_pci_get_power_state() and pnv_pci_poll() both read the state; they do
not push it (they are not xxx_set or xxx_put or xxx_push), and there is no
delay between these calls. Does the state change so fast?

In practice, can this happen?
1. opal_pci_get_power_state() returns OPAL_SUCCESS and some state A;
2. pnv_pci_poll(), called right after that with zero delay (pnv_pci_poll()
does not loop if rval is OPAL_SUCCESS), returns a different state B.

Or do these helpers return different types of states (unlikely though)?

It is not a concern but may help to understand all this OPAL polling magic.
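
To make the scenario concrete, here is a user-space sketch of the control
flow of pnv_pci_poll() as posted in 36/45, with the delays dropped.
fake_opal_pci_poll() and fake_reset() are made-up stand-ins for the firmware;
the canned return codes say nothing about what OPAL really does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define OPAL_SUCCESS 0

/* Made-up firmware stand-in: replays canned return codes and counts calls. */
static const int64_t *fake_rcs;	/* next return code to hand out */
static int fake_calls;		/* number of poll calls seen so far */
static uint8_t fake_state;	/* state the fake firmware reports */

static void fake_reset(const int64_t *rcs, uint8_t state)
{
	fake_rcs = rcs;
	fake_calls = 0;
	fake_state = state;
}

static int64_t fake_opal_pci_poll(uint64_t id, uint8_t *state)
{
	(void)id;
	assert(fake_rcs != NULL);
	fake_calls++;
	if (state)
		*state = fake_state;
	return *fake_rcs++;
}

/* Same control flow as pnv_pci_poll() in the patch, minus udelay()/msleep(). */
static int sketch_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
{
	/* A positive return code means "call me again later". */
	while (rval > 0)
		rval = fake_opal_pci_poll(id, state);

	/* One extra poll when the caller wants the state back. */
	if (rval == OPAL_SUCCESS && state)
		rval = fake_opal_pci_poll(id, state);

	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
}
```

With rval == OPAL_SUCCESS and a non-NULL state the stub is called exactly
once and without any delay, which is case 2 above; with state == NULL it is
not called at all.
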



>
>> Anyway, looks correct.
>>
>>
>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>>
>>
>>
>>> +	return pnv_pci_poll(id, rc, state);
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnv_pci_get_power_state);
>>> +
>>> +int pnv_pci_set_power_state(uint64_t id, uint8_t state)
>>> +{
>>> +	int64_t rc;
>>> +
>>> +	if (!opal_check_token(OPAL_PCI_SET_POWER_STATE))
>>> +		return -ENXIO;
>>> +
>>> +	rc = opal_pci_set_power_state(id, state);
>>> +	return pnv_pci_poll(id, rc, NULL);
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnv_pci_set_power_state);
>>> +
>>> +int pnv_pci_hotplug_notifier_register(struct notifier_block *nb)
>>> +{
>>> +	return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_register);
>>> +
>>> +int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb)
>>> +{
>>> +	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_unregister);
>>> +
>>>   #ifdef CONFIG_PCI_MSI
>>>   int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>>>   {
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track
  2016-04-20  0:49       ` Gavin Shan
@ 2016-04-20  5:10         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-04-20  5:10 UTC (permalink / raw)
  To: Gavin Shan, benh
  Cc: linuxppc-dev, linux-pci, devicetree, mpe, dja, bhelgaas,
	robherring2, grant.likely

On 04/20/2016 10:49 AM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 11:50:10AM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> In the current implementation, the DMA32 segments required by one specific
>>> PE aren't calculated from the information held in the PE independently.
>>> It conflicts with the PCI hotplug design: PE centralized, meaning the
>>> PE's DMA32 segments should be calculated from the information held in
>>> the PE independently.
>>>
>>> This introduces an array (@dma32_segmap) for every PHB to track the
>>> DMA32 segment usage. Besides, this moves the logic calculating PE's
>>> consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's
>>> DMA32 segments are calculated/allocated from the information held in
>>> the PE (DMA32 weight). Also the logic is improved: we try to allocate
>>> as many DMA32 segments as we can. It's acceptable that fewer DMA32
>>> segments than the expected number are allocated.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>
>>
>> This DMA segments business was the reason why I have not even tried
>> implementing DDW for POWER7 - it is way too different from POWER8 and there
>> is no chance that anyone outside Ozlabs will ever try using this in practice;
>> the same applies to PCI hotplug on POWER7.
>>
>> I am suggesting to ditch all IODA1 changes from this patchset as this code
>> will hang around (unused) for maybe a year or so and then will be gone like
>> p5ioc2.
>>
>
> As far as I know, some P7 boxes outside Ozlabs have the software stack. At
> least, I was relying heavily on a P7 box + PowerNV based linux until
> September of last year.

And yet you have not replaced a single physical device on any of our power7 
boxes ;)

> My original thoughts are as below. If they're
> convincing, I can drop some of IODA1 changes, but not all of them obviously:
>
> - In case customer want to use this combo (P7 box + PowerNV) for any reason.

I have serious doubts we have any customer like this. Or a developer who 
would want this. And OPAL on P7 does not support this either.

> - In case developers want to use this combo (P7 box + PowerNV) for any reason.
>    For example, no P8 boxes can be found for one particular project, but available
>    P7 box is still ok for that.

Testing POWER8 PCI hotplug on a POWER7 machine is kind of pointless anyway.


> - EEH supported on P7/P8 needs hotplug some cases: when hitting excessive failures,
>    PCI devices and their platform resources (PE, DMA, M32/M64 mapping etc) should
>    be purged.

EEH recovery should not require resource reallocation, no?

> - Current implementation has P7/P8 mixed up to some extent which isn't so good
>    as Ben pointed long time ago. It's impossible not to affect P7IOC piece if
>    P8 piece is changed in order to support hotplug.

This is understandable.


I'll leave it to Ben.
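
For reference while reading the hunk below, the new allocation boils down to
a first-fit search for a contiguous run of free segments that shrinks the
request by one after every failed pass. A user-space sketch follows; the
sketch_* names are made up, SKETCH_INVALID_PE stands in for IODA_INVALID_PE,
and unlike the patch (which fills dma32_segmap only after the TCE table is
set up) it claims the run immediately.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_INVALID_PE	(-1)	/* stands in for IODA_INVALID_PE */

/* Proportional share of the segments, but never less than one. */
static unsigned int sketch_expected_segs(unsigned int weight,
					 unsigned int total_weight,
					 unsigned int count)
{
	unsigned int segs;

	assert(total_weight > 0);
	segs = (weight * count) / total_weight;
	return segs ? segs : 1;
}

/*
 * Claim the first run of free segments for @pe, starting with @want
 * segments and retrying with one segment less after each failed pass.
 * Returns the number of segments claimed (0 if none are free) and stores
 * the index of the first one in @base.
 */
static unsigned int sketch_alloc_dma32(int *segmap, unsigned int count,
				       unsigned int want, int pe,
				       unsigned int *base)
{
	unsigned int segs, b, i;

	if (want > count)
		want = count;

	for (segs = want; segs; segs--) {
		for (b = 0; b + segs <= count; b++) {
			for (i = b; i < b + segs; i++) {
				if (segmap[i] != SKETCH_INVALID_PE)
					break;
			}
			if (i < b + segs)
				continue;	/* run is interrupted, slide on */

			for (i = b; i < b + segs; i++)
				segmap[i] = pe;
			*base = b;
			return segs;
		}
	}

	return 0;
}
```

With a fragmented map a PE that expects four segments can end up with three
or fewer, which is the "less than the expected number" case from the commit
message.
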


>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 111 +++++++++++++++++-------------
>>>   arch/powerpc/platforms/powernv/pci.h      |   7 +-
>>>   2 files changed, 66 insertions(+), 52 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 0fc2309..59782fba 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -2007,20 +2007,54 @@ static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
>>>   }
>>>
>>>   static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>> -				       struct pnv_ioda_pe *pe,
>>> -				       unsigned int base,
>>> -				       unsigned int segs)
>>> +				       struct pnv_ioda_pe *pe)
>>>   {
>>>
>>>   	struct page *tce_mem = NULL;
>>>   	struct iommu_table *tbl;
>>> -	unsigned int tce32_segsz, i;
>>> +	unsigned int weight, total_weight;
>>> +	unsigned int tce32_segsz, base, segs, i;
>>>   	int64_t rc;
>>>   	void *addr;
>>>
>>>   	/* XXX FIXME: Handle 64-bit only DMA devices */
>>>   	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>>>   	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>> +	total_weight = pnv_pci_ioda_total_dma_weight(phb);
>>> +	weight = pnv_pci_ioda_pe_dma_weight(pe);
>>> +
>>> +	segs = (weight * phb->ioda.dma32_count) / total_weight;
>>> +	if (!segs)
>>> +		segs = 1;
>>> +
>>> +	/*
>>> +	 * Allocate contiguous DMA32 segments. We begin with the expected
>>> +	 * number of segments. With one more attempt, the number of DMA32
>>> +	 * segments to be allocated is decreased by one until one segment
>>> +	 * is allocated successfully.
>>> +	 */
>>> +	while (segs) {
>>> +		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
>>> +			for (i = base; i < base + segs; i++) {
>>> +				if (phb->ioda.dma32_segmap[i] !=
>>> +				    IODA_INVALID_PE)
>>> +					break;
>>> +			}
>>> +
>>> +			if (i >= base + segs)
>>> +				break;
>>> +		}
>>> +
>>> +		if (i >= base + segs)
>>> +			break;
>>> +
>>> +		segs--;
>>> +	}
>>> +
>>> +	if (!segs) {
>>> +		pe_warn(pe, "No available DMA32 segments\n");
>>> +		return;
>>> +	}
>>>
>>>   	tbl = pnv_pci_table_alloc(phb->hose->node);
>>>   	iommu_register_group(&pe->table_group, phb->hose->global_number,
>>> @@ -2028,6 +2062,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>   	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>>>
>>>   	/* Grab a 32-bit TCE table */
>>> +	pe_info(pe, "DMA weight %d (%d), assigned (%d) %d DMA32 segments\n",
>>> +		weight, total_weight, base, segs);
>>>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>>   		base * PNV_IODA1_DMA32_SEGSIZE,
>>>   		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>> @@ -2064,6 +2100,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>>   		}
>>>   	}
>>>
>>> +	/* Setup DMA32 segment mapping */
>>> +	for (i = base; i < base + segs; i++)
>>> +		phb->ioda.dma32_segmap[i] = pe->pe_number;
>>> +
>>>   	/* Setup linux iommu table */
>>>   	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>>>   				  base * PNV_IODA1_DMA32_SEGSIZE,
>>> @@ -2538,70 +2578,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>>   static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>>   {
>>>   	struct pci_controller *hose = phb->hose;
>>> -	unsigned int weight, total_weight, dma_pe_count;
>>> -	unsigned int residual, remaining, segs, base;
>>>   	struct pnv_ioda_pe *pe;
>>> -
>>> -	total_weight = pnv_pci_ioda_total_dma_weight(phb);
>>> -	dma_pe_count = 0;
>>> -	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>> -		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>> -		if (weight > 0)
>>> -			dma_pe_count++;
>>> -	}
>>> +	unsigned int weight;
>>>
>>>   	/* If we have more PE# than segments available, hand out one
>>>   	 * per PE until we run out and let the rest fail. If not,
>>>   	 * then we assign at least one segment per PE, plus more based
>>>   	 * on the amount of devices under that PE
>>>   	 */
>>> -	if (dma_pe_count > phb->ioda.tce32_count)
>>> -		residual = 0;
>>> -	else
>>> -		residual = phb->ioda.tce32_count - dma_pe_count;
>>> -
>>>   	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>>> -		hose->global_number, phb->ioda.tce32_count);
>>> -	pr_info("PCI: %d PE# for a total weight of %d\n",
>>> -		dma_pe_count, total_weight);
>>> +		hose->global_number, phb->ioda.dma32_count);
>>>
>>>   	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>>
>>> -	/* Walk our PE list and configure their DMA segments, hand them
>>> -	 * out one base segment plus any residual segments based on
>>> -	 * weight
>>> -	 */
>>> -	remaining = phb->ioda.tce32_count;
>>> -	base = 0;
>>> +	/* Walk our PE list and configure their DMA segments */
>>>   	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>>   		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>>   		if (!weight)
>>>   			continue;
>>>
>>> -		if (!remaining) {
>>> -			pe_warn(pe, "No DMA32 resources available\n");
>>> -			continue;
>>> -		}
>>> -		segs = 1;
>>> -		if (residual) {
>>> -			segs += ((weight * residual) + (total_weight / 2)) /
>>> -				total_weight;
>>> -			if (segs > remaining)
>>> -				segs = remaining;
>>> -		}
>>> -
>>>   		/*
>>>   		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>>   		 * The all available 32-bits DMA space will be assigned to
>>>   		 * the specific PE.
>>>   		 */
>>>   		if (phb->type == PNV_PHB_IODA1) {
>>> -			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>> -				weight, segs);
>>> -			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>>> +			pnv_pci_ioda1_setup_dma_pe(phb, pe);
>>>   		} else if (phb->type == PNV_PHB_IODA2) {
>>>   			pe_info(pe, "Assign DMA32 space\n");
>>> -			segs = 0;
>>>   			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>>   		} else if (phb->type == PNV_PHB_NPU) {
>>>   			/*
>>> @@ -2611,9 +2615,6 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>>   			 * as the PHB3 TVT.
>>>   			 */
>>>   		}
>>> -
>>> -		remaining -= segs;
>>> -		base += segs;
>>>   	}
>>>   }
>>>
>>> @@ -3313,7 +3314,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   {
>>>   	struct pci_controller *hose;
>>>   	struct pnv_phb *phb;
>>> -	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>>> +	unsigned long size, m64map_off, m32map_off, pemap_off;
>>> +	unsigned long iomap_off = 0, dma32map_off = 0;
>>>   	const __be64 *prop64;
>>>   	const __be32 *prop32;
>>>   	int i, len;
>>> @@ -3398,6 +3400,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
>>>   	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
>>>
>>> +	/* Calculate how many 32-bit TCE segments we have */
>>> +	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
>>> +				PNV_IODA1_DMA32_SEGSIZE;
>>> +
>>>   	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>>   	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>>   	m64map_off = size;
>>> @@ -3407,6 +3413,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	if (phb->type == PNV_PHB_IODA1) {
>>>   		iomap_off = size;
>>>   		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
>>> +		dma32map_off = size;
>>> +		size += phb->ioda.dma32_count *
>>> +			sizeof(phb->ioda.dma32_segmap[0]);
>>>   	}
>>>   	pemap_off = size;
>>>   	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>>> @@ -3422,6 +3431,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   		phb->ioda.io_segmap = aux + iomap_off;
>>>   		for (i = 0; i < phb->ioda.total_pe_num; i++)
>>>   			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
>>> +
>>> +		phb->ioda.dma32_segmap = aux + dma32map_off;
>>> +		for (i = 0; i < phb->ioda.dma32_count; i++)
>>> +			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>>>   	}
>>>   	phb->ioda.pe_array = aux + pemap_off;
>>>   	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>>> @@ -3430,7 +3443,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	mutex_init(&phb->ioda.pe_list_mutex);
>>>
>>>   	/* Calculate how many 32-bit TCE segments we have */
>>> -	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
>>> +	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
>>>   				PNV_IODA1_DMA32_SEGSIZE;
>>>
>>>   #if 0 /* We should really do that ... */
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index e90bcbe..350e630 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -146,6 +146,10 @@ struct pnv_phb {
>>>   		int			*m32_segmap;
>>>   		int			*io_segmap;
>>>
>>> +		/* DMA32 segment maps - IODA1 only */
>>> +		unsigned long		dma32_count;
>>> +		int			*dma32_segmap;
>>> +
>>>   		/* IRQ chip */
>>>   		int			irq_chip_init;
>>>   		struct irq_chip		irq_chip;
>>> @@ -162,9 +166,6 @@ struct pnv_phb {
>>>   		 */
>>>   		unsigned char		pe_rmap[0x10000];
>>>
>>> -		/* 32-bit TCE tables allocation */
>>> -		unsigned long		tce32_count;
>>> -
>>>   		/* TCE cache invalidate registers (physical and
>>>   		 * remapped)
>>>   		 */
>>>
>>
>>
>> --
>> Alexey
>>
>
>


-- 
Alexey


* Re: [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID
  2016-04-20  4:14         ` Alexey Kardashevskiy
@ 2016-04-22  4:23           ` Alistair Popple
  0 siblings, 0 replies; 174+ messages in thread
From: Alistair Popple @ 2016-04-22  4:23 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Gavin Shan, devicetree, linux-pci,
	grant.likely, robherring2, bhelgaas, dja

On Wed, 20 Apr 2016 14:14:13 Alexey Kardashevskiy wrote:

<snip>

> >
> >>
> >> btw how will new OPAL react if old kernel is running, i.e. not passing @state
> >> at all? If it is initialized to NULL somewhere - fine but what exactly does
> >> this initialization and makes sure that OPAL won't see garbage as a second
> >> parameter?
> >>
> >
> > @state is always NULL for old kernel + new OPAL.
> 
> What piece of code writes NULL to "state"? Old kernel will do 
> opal_pci_poll(id) and omit the "state" so something has to take care of it 
> and write NULL where OPAL expects to see a pointer to "state" (which we set 
> to NULL) and that place would be some GPR which may have garbage.
> 
> I am looking at
> #define OPAL_CALL(name, token)
> 
> and cannot see where the second parameter (which old kernel omits) is 
> reset, i.e. gpr4 (or gpr5?) is cleared.

Gavin walked me through the code here and he's right that this won't cause a
problem at the moment. Not because @state is NULL on old kernels (as you point
out, it isn't) but because old kernels call a specific sequence of OPAL calls,
which means that the new Skiboot won't check @state when an old kernel is running.

However, this is more of a side effect and is very brittle. It would need far
more commenting on the OPAL side, and there's still a good chance someone will
break it by accident. It is far easier just to add another OPAL call, e.g.
opal_pci_poll2().

> > @state is used in
> > PCI hotplug functionality in OPAL only. As old kernel doesn't support
> > PCI hotplug, @state is never used. I'm not sure it's the answer you
> > want?
> 
> No, it is not :)
> 
> >> When ABI like this changes, I expect to see opal_pci_poll2() or
> >> opal_pci_poll_ex() rather than just an additional parameter to
> >> opal_pci_poll()...
> >>
> >
> > It's a good suggestion but it would be nicer if you raised this
> > early. One question I have: current opal_pci_poll() is enough
> > to cover all cases, why we need introduce and maintain another
> > similar one? Sorry that I don't see the reason from your context
> > and could you please provide more details?

Because @state will be some arbitrary, possibly non-NULL value when
opal_pci_poll() is called on older kernels, and if OPAL ever dereferences it
you will at best get a machine check and at worst random memory corruption. It
would be far easier to maintain a new OPAL call than to deal with these kinds
of problems in the future.
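
To make the failure mode concrete, here is a rough user-space sketch. It is
not Skiboot or kernel code: struct fake_regs, fw_poll() and fw_poll2() are
hypothetical names, and the struct merely stands in for the argument
registers. It assumes an old caller fills in only the first argument and
leaves whatever was in the second register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the registers that carry call arguments. */
struct fake_regs {
	uint64_t r3;	/* first argument: the id */
	uint64_t r4;	/* second argument: pointer to state, if the caller set it */
};

/* Old-style caller: knows about one argument only, r4 keeps stale data. */
static void old_kernel_prepare(struct fake_regs *regs, uint64_t id)
{
	regs->r3 = id;
}

/* New-style caller: passes both arguments explicitly. */
static void new_kernel_prepare(struct fake_regs *regs, uint64_t id,
			       uint8_t *state)
{
	regs->r3 = id;
	regs->r4 = (uint64_t)(uintptr_t)state;
}

/* Single extended entry point: it can only test r4 against NULL, so stale
 * register contents look exactly like a valid pointer.
 */
static int fw_poll_extended_sees_pointer(const struct fake_regs *regs)
{
	return regs->r4 != 0;
}

/* Separate entry points: the old call never looks at r4 at all. */
static int64_t fw_poll(const struct fake_regs *regs)
{
	(void)regs->r3;
	return 0;
}

/* The new call is only reachable by callers that set r4 on purpose. */
static int64_t fw_poll2(const struct fake_regs *regs)
{
	uint8_t *state = (uint8_t *)(uintptr_t)regs->r4;

	if (state)
		*state = 1;
	return 0;
}
```

With two entry points the firmware never has to guess whether the second
register was set deliberately; with one, a NULL check is the only defence and
stale data passes it.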

Also it is an important suggestion that has potentially saved some difficult
debugging. Whilst it is *preferable* for reviewers to notice things earlier in
the review process it is also *preferable* for developers to not write bugs in
the first place. Unfortunately we all miss things and make mistakes. It's far
more important that we catch and correct these problems even if they are noticed
later than we would desire.

- Alistair

> >>> +
> >>> +	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
> >>> +}
> >>> +
> >>>   #ifdef CONFIG_PCI_MSI
> >>>   int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> >>>   {
> >>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> >>> index 0cddde3..6857703 100644
> >>> --- a/arch/powerpc/platforms/powernv/pci.h
> >>> +++ b/arch/powerpc/platforms/powernv/pci.h
> >>> @@ -192,6 +192,7 @@ extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
> >>>   		unsigned long *hpa, enum dma_data_direction *direction);
> >>>   extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
> >>>
> >>> +int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state);
> >>>   void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
> >>>   				unsigned char *log_buff);
> >>>   int pnv_pci_cfg_read(struct pci_dn *pdn,
> >>>
> >>
> >>
> >> --
> >> Alexey
> >>
> >
> >
> 
> 
> 


* Re: [PATCH v8 40/45] drivers/of: Split unflatten_dt_node()
  2016-02-17 14:30   ` Rob Herring
  2016-04-20  2:38     ` Gavin Shan
@ 2016-05-02  2:02     ` Gavin Shan
  1 sibling, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-05-02  2:02 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree,
	Benjamin Herrenschmidt, Michael Ellerman, aik, dja,
	Bjorn Helgaas, Grant Likely

On Wed, Feb 17, 2016 at 08:30:42AM -0600, Rob Herring wrote:
>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> The function unflatten_dt_node() is called recursively to unflatten
>> device nodes and properties in the FDT blob. It looks complicated
>> and hard to be understood.
>>
>> This splits the function into 3 functions: populate_properties(),
>> populate_node() and unflatten_dt_node(). populate_properties(),
>> which is called by populate_node(), creates properties for the
>> indicated device node. The latter one creates the device nodes
>> from FDT blob. unflatten_dt_node() gets the offset in FDT blob for
>> next device nodes and then calls populate_node(). No logical
>> changes introduced.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/fdt.c | 249 ++++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 147 insertions(+), 102 deletions(-)
>
>One nit, otherwise:
>
>Acked-by: Rob Herring <robh@kernel.org>
>
>[...]
>
>> +               /* And we process the "ibm,phandle" property
>> +                * used in pSeries dynamic device tree
>> +                * stuff
>> +                */
>> +               if (!strcmp(pname, "ibm,phandle"))
>> +                       np->phandle = be32_to_cpup(val);
>> +
>> +               pp->name   = (char *)pname;
>> +               pp->length = sz;
>> +               pp->value  = (__be32 *)val;
>
>This cast should not be needed.
>

It's needed; without it we get the warning shown below, so I will keep it. I
just went through this one for the next revision. Sorry for the late response.

drivers/of/fdt.c:225:14: warning: assignment discards ‘const’ qualifier from pointer target type
   pp->value  = val;
              ^
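
For reference, a minimal standalone sketch of the same situation. The struct
below is a hypothetical, simplified stand-in and not the kernel's definition;
the only assumption is that the destination field is a non-const pointer
while the source pointer is const-qualified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a property record. */
struct fake_property {
	const char *name;
	int length;
	void *value;	/* non-const destination, as assumed above */
};

static void fill_property(struct fake_property *pp, const char *name,
			  const uint32_t *val, int len)
{
	pp->name = name;
	pp->length = len;
	/* Plain "pp->value = val;" would make the compiler warn that the
	 * assignment discards the 'const' qualifier; the cast silences it.
	 */
	pp->value = (uint32_t *)val;
}
```

The cast only hides the qualifier mismatch. Whether the data may really be
written through the non-const pointer is a separate question the sketch does
not answer.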

Thanks,
Gavin

>> +               *pprev     = pp;
>> +               pprev      = &pp->next;
>> +       }
>


* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-04-19 10:36         ` Alexey Kardashevskiy
  (?)
  (?)
@ 2016-05-02  3:44         ` Gavin Shan
  2016-05-02  6:11           ` Alexey Kardashevskiy
  -1 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-05-02  3:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, Alistair Popple, linux-pci, devicetree,
	benh, mpe, dja, bhelgaas, robherring2, grant.likely

On Tue, Apr 19, 2016 at 08:36:48PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This adds standalone driver to support PCI hotplug for PowerPC PowerNV
>>platform that runs on top of skiboot firmware. The firmware identifies
>>hotpluggable slots and marked their device tree node with proper
>>"ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans
>>device tree nodes to create/register PCI hotplug slot accordingly.
>>
>>The PCI slots are organized in fashion of tree, which means one
>>PCI slot might have parent PCI slot and parent PCI slot possibly
>>contains multiple child PCI slots. At the plugging time, the parent
>>PCI slot is populated before its children. The child PCI slots are
>>removed before their parent PCI slot can be removed from the system.
>>
>>If the skiboot firmware doesn't support slot status retrieval, the PCI
>>slot device node shouldn't have property "ibm,reset-by-firmware". In
>>that case, none of valid PCI slots will be detected from device tree.
>>The skiboot firmware doesn't export the capability to access attention
>>LEDs yet and it's something for TBD.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>---
>>  drivers/pci/hotplug/Kconfig   |  12 +
>>  drivers/pci/hotplug/Makefile  |   3 +
>>  drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 885 insertions(+)
>>  create mode 100644 drivers/pci/hotplug/pnv_php.c
>>
>>diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>>index df8caec..167c8ce 100644
>>--- a/drivers/pci/hotplug/Kconfig
>>+++ b/drivers/pci/hotplug/Kconfig
>>@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>>
>>  	  When in doubt, say N.
>>
>>+config HOTPLUG_PCI_POWERNV
>>+	tristate "PowerPC PowerNV PCI Hotplug driver"
>>+	depends on PPC_POWERNV && EEH
>>+	help
>>+	  Say Y here if you run PowerPC PowerNV platform that supports
>>+	  PCI Hotplug
>>+
>>+	  To compile this driver as a module, choose M here: the
>>+	  module will be called pnv-php.
>>+
>>+	  When in doubt, say N.
>>+
>>  config HOTPLUG_PCI_RPA
>>  	tristate "RPA PCI Hotplug driver"
>>  	depends on PPC_PSERIES && EEH
>>diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>index b616e75..e33cdda 100644
>>--- a/drivers/pci/hotplug/Makefile
>>+++ b/drivers/pci/hotplug/Makefile
>>@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>>  obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>>+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>>  obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>>  obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>>  obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>>@@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>>  acpiphp-objs		:=	acpiphp_core.o	\
>>  				acpiphp_glue.o
>>
>>+pnv-php-objs		:=	pnv_php.o
>>+
>>  rpaphp-objs		:=	rpaphp_core.o	\
>>  				rpaphp_pci.o	\
>>  				rpaphp_slot.o
>>diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>>new file mode 100644
>>index 0000000..364ec36
>>--- /dev/null
>>+++ b/drivers/pci/hotplug/pnv_php.c
>>@@ -0,0 +1,870 @@
>>+/*
>>+ * PCI Hotplug Driver for PowerPC PowerNV platform.
>>+ *
>>+ * Copyright Gavin Shan, IBM Corporation 2015.
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#include <linux/libfdt.h>
>>+#include <linux/module.h>
>>+#include <linux/pci.h>
>>+#include <linux/pci_hotplug.h>
>>+
>>+#include <asm/opal.h>
>>+#include <asm/pnv-pci.h>
>>+#include <asm/ppc-pci.h>
>>+
>>+#define DRIVER_VERSION	"0.1"
>>+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>>+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>>+
>>+struct pnv_php_slot {
>>+	struct hotplug_slot		slot;
>>+	struct hotplug_slot_info	slot_info;
>>+	uint64_t			id;
>>+	char				*name;
>>+	int				slot_no;
>>+	struct kref			kref;
>>+#define PNV_PHP_STATE_INITIALIZED	0
>>+#define PNV_PHP_STATE_REGISTERED	1
>>+#define PNV_PHP_STATE_POPULATED		2
>>+	int				state;
>>+	struct device_node		*dn;
>>+	struct pci_dev			*pdev;
>>+	struct pci_bus			*bus;
>>+	bool				power_state_check;
>>+	int				power_state_confirmed;
>>+#define PNV_PHP_POWER_CONFIRMED_INVALID	0
>>+#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
>>+#define PNV_PHP_POWER_CONFIRMED_FAIL	2
>>+	struct opal_msg			*msg;
>>+	void				*fdt;
>>+	void				*dt;
>>+	struct of_changeset		ocs;
>>+	struct work_struct		work;
>>+	wait_queue_head_t		queue;
>>+	struct pnv_php_slot		*parent;
>>+	struct list_head		children;
>>+	struct list_head		link;
>>+};
>>+
>>+static LIST_HEAD(pnv_php_slot_list);
>>+static DEFINE_SPINLOCK(pnv_php_lock);
>>+
>>+static void pnv_php_register(struct device_node *dn);
>>+static void pnv_php_unregister_one(struct device_node *dn);
>>+static void pnv_php_unregister(struct device_node *dn);
>
>
>The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(),
>pnv_php_unregister_children() instead.
>
>
>Alistair, what do you reckon?
>
>
>>+
>>+static void pnv_php_free_slot(struct kref *kref)
>>+{
>>+	struct pnv_php_slot *php_slot = container_of(kref,
>>+						     struct pnv_php_slot,
>>+						     kref);
>>+
>>+	WARN_ON(!list_empty(&php_slot->children));
>>+	kfree(php_slot->name);
>>+	kfree(php_slot);
>>+}
>>+
>>+static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
>>+{
>>+	if (!php_slot)
>
>
>BUG_ON()?
>

checkpatch.pl reports a warning like the one below. Are you sure you need a BUG_ON()?

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
#159: FILE: drivers/pci/hotplug/pnv_php.c:76:
+	BUG_ON(!php_slot);


>>+		return;
>>+
>>+	kref_put(&php_slot->kref, pnv_php_free_slot);
>>+}
>>+
>>+static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
>>+					  struct pnv_php_slot *php_slot)
>>+{
>>+	struct pnv_php_slot *target, *tmp;
>>+
>>+	if (php_slot->dn == dn) {
>>+		kref_get(&php_slot->kref);
>>+		return php_slot;
>>+	}
>>+
>>+	list_for_each_entry(tmp, &php_slot->children, link) {
>>+		target = pnv_php_match(dn, tmp);
>>+		if (target)
>>+			return target;
>>+	}
>>+
>>+	return NULL;
>>+}
>>+
>>+static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *php_slot, *tmp;
>>+	unsigned long flags;
>>+
>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>+	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
>>+		php_slot = pnv_php_match(dn, tmp);
>>+		if (php_slot) {
>>+			spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+			return php_slot;
>>+		}
>>+	}
>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+
>>+	return NULL;
>>+}
>>+
>>+/*
>>+ * Remove pdn for all children of the indicated device node.
>>+ * The function should remove pdn in a depth-first manner.
>>+ */
>>+static void pnv_php_rmv_pdns(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+
>>+	for_each_child_of_node(dn, child) {
>>+		pnv_php_rmv_pdns(child);
>>+
>>+		pci_remove_device_node_info(child);
>>+	}
>>+}
>>+
>>+/*
>>+ * Remove all child nodes of the indicated device nodes. The
>>+ * function should remove device nodes in depth-first manner.
>>+ */
>>+static int pnv_php_rmv_device_nodes(struct device_node *parent)
>>+{
>>+	struct device_node *dn, *child;
>>+	int ret = 0;
>>+
>>+	for_each_child_of_node(parent, dn) {
>>+		ret = pnv_php_rmv_device_nodes(dn);
>>+		if (ret)
>>+			return ret;
>>+
>>+		child = of_get_next_child(dn, NULL);
>>+		if (child) {
>>+			of_node_put(child);
>>+			of_node_put(dn);
>>+			pr_err("%s: Alive children of node <%s>\n",
>>+			       __func__, of_node_full_name(dn));
>>+			return -EBUSY;
>>+		}
>>+
>>+		of_detach_node(dn);
>>+		of_node_put(dn);
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+/*
>>+ * The function processes the message sent by firmware
>>+ * to remove all device tree nodes beneath the slot's
>>+ * nodes and the associated auxiliary data.
>>+ */
>>+static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
>>+{
>>+	int ret;
>>+
>>+	pnv_php_rmv_pdns(php_slot->dn);
>>+
>>+	/*
>>+	 * If the device sub-tree was created from OF changeset, simply
>>+	 * to revert that. Otherwise, the device nodes in the sub-tree
>>+	 * need to be iterated and detached.
>>+	 */
>>+	if (php_slot->fdt) {
>>+		of_changeset_destroy(&php_slot->ocs);
>>+		kfree(php_slot->dt);
>>+		kfree(php_slot->fdt);
>>+		php_slot->dt        = NULL;
>>+		php_slot->dn->child = NULL;
>>+		php_slot->fdt       = NULL;
>>+		php_slot->power_state_confirmed =
>>+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>+		wake_up_interruptible(&php_slot->queue);
>>+		return;
>>+	}
>>+
>>+	ret = pnv_php_rmv_device_nodes(php_slot->dn);
>>+	if (!ret) {
>>+		php_slot->power_state_confirmed =
>>+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>+	} else {
>>+		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
>>+		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
>>+	}
>>+
>>+	wake_up_interruptible(&php_slot->queue);
>
>
>I liked one wake_up_interruptible() better...
>

Will fix in next revision.

>>+}
>>+
>>+static int pnv_php_populate_changeset(struct of_changeset *ocs,
>>+				      struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+	int ret = 0;
>>+
>>+	for_each_child_of_node(dn, child) {
>>+		ret = of_changeset_attach_node(ocs, child);
>>+		if (ret)
>>+			break;
>>+
>>+		ret = pnv_php_populate_changeset(ocs, child);
>
>
>I asked in v7 - may be to add here "if (ret) break;"?
>

Will add it in v9.
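
A small user-space sketch of why the extra check matters. struct fake_node
and fake_attach() are made-up names for illustration, not the OF changeset
API: without the second break, a failure from the recursive call is
overwritten as soon as the loop moves on to the next sibling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; "fail" forces the attach step to fail. */
struct fake_node {
	int fail;
	int attached;
	struct fake_node *child;	/* first child */
	struct fake_node *sibling;	/* next sibling */
};

static int fake_attach(struct fake_node *n)
{
	if (n->fail)
		return -1;
	n->attached = 1;
	return 0;
}

static int populate(struct fake_node *parent)
{
	struct fake_node *child;
	int ret = 0;

	for (child = parent->child; child; child = child->sibling) {
		ret = fake_attach(child);
		if (ret)
			break;

		ret = populate(child);
		if (ret)	/* stop here so a later sibling can't hide it */
			break;
	}

	return ret;
}
```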

>>+	}
>>+
>>+	return ret;
>>+}
>>+
>>+static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
>>+{
>>+	struct pci_controller *hose = (struct pci_controller *)data;
>>+	struct pci_dn *pdn;
>>+
>>+	pdn = pci_add_device_node_info(hose, dn);
>>+	if (!pdn)
>>+		return ERR_PTR(-ENOMEM);
>>+
>>+	return NULL;
>>+}
>>+
>>+static void pnv_php_add_pdns(struct pnv_php_slot *slot)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(slot->bus);
>>+
>>+	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
>>+}
>>+
>>+static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
>>+{
>>+	void *fdt, *fdt1, *dt;
>>+	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>+	int ret;
>>+
>>+	/* We don't know the FDT blob size. We try to get it through
>>+	 * maximal memory chunk and then copy it to another chunk that
>>+	 * fits the real size.
>>+	 */
>>+	fdt1 = kzalloc(0x10000, GFP_KERNEL);
>>+	if (!fdt1)
>>+		goto error;
>>+
>>+	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
>>+	if (ret)
>>+		goto free_fdt1;
>>+
>>+	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
>>+	if (!fdt)
>>+		goto free_fdt1;
>>+
>>+	/* Unflatten device tree blob */
>>+	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
>>+	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
>>+	if (!dt) {
>>+		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
>>+		goto free_fdt;
>>+	}
>>+
>>+	/* Initialize and apply the changeset */
>>+	of_changeset_init(&php_slot->ocs);
>>+	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
>>+	if (ret) {
>>+		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
>>+			 ret);
>>+		goto free_dt;
>>+	}
>>+
>>+	php_slot->dn->child = NULL;
>>+	ret = of_changeset_apply(&php_slot->ocs);
>>+	if (ret) {
>>+		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
>>+			 ret);
>>+		goto destroy_changeset;
>>+	}
>>+
>>+	/* Add device node firmware data */
>>+	pnv_php_add_pdns(php_slot);
>>+	php_slot->fdt = fdt;
>>+	php_slot->dt  = dt;
>>+	goto out;
>>+
>>+destroy_changeset:
>>+	of_changeset_destroy(&php_slot->ocs);
>>+free_dt:
>>+	kfree(dt);
>>+	php_slot->dn->child = NULL;
>>+free_fdt:
>>+	kfree(fdt);
>>+free_fdt1:
>>+	kfree(fdt1);
>>+error:
>>+	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
>>+out:
>>+	/* Confirm status change */
>>+	php_slot->power_state_confirmed = confirm;
>>+	wake_up_interruptible(&php_slot->queue);
>>+}
>>+
>>+static void pnv_php_work(struct work_struct *data)
>>+{
>>+	struct pnv_php_slot *php_slot = container_of(data,
>>+						     struct pnv_php_slot,
>>+						     work);
>>+	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
>>+
>>+	if (event == OPAL_PCI_SLOT_POWER_OFF)
>>+		pnv_php_handle_poweroff(php_slot);
>>+	else
>>+		pnv_php_handle_poweron(php_slot);
>>+
>>+	pnv_php_put_slot(php_slot);
>>+}
>>+
>>+static int pnv_php_handle_msg(struct notifier_block *nb,
>>+			      unsigned long type,
>>+			      void *message)
>>+{
>>+	phandle h;
>>+	struct device_node *dn;
>>+	struct pnv_php_slot *php_slot;
>>+	struct opal_msg *msg = message;
>>+
>>+	if (type != OPAL_MSG_PCI_HOTPLUG) {
>>+		pr_warn("%s: Invalid message %ld received!\n",
>>+			__func__, type);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	h = (phandle)be64_to_cpu(msg->params[1]);
>>+	dn = of_find_node_by_phandle(h);
>>+	if (!dn) {
>>+		pr_warn("%s: No device node for phandle 0x%x\n",
>>+			__func__, h);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	php_slot = pnv_php_find_slot(dn);
>>+	if (!php_slot) {
>>+		pr_warn("%s: No slot found for node <%s>\n",
>>+			__func__, of_node_full_name(dn));
>>+		of_node_put(dn);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	of_node_put(dn);
>>+	php_slot->msg = msg;
>>+	schedule_work(&php_slot->work);
>>+	return NOTIFY_OK;
>>+}
>>+
>>+static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
>>+{
>>+	struct pnv_php_slot *php_slot = slot->private;
>>+	int ret;
>>+
>>+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>+	ret = pnv_pci_set_power_state(php_slot->id, state);
>>+	if (ret) {
>>+		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
>>+			 ret, state ? "on" : "off");
>>+		return ret;
>>+	}
>>+
>>+	/* Continue to PCI probing after finalized device-tree. The
>>+	 * device-tree might have been updated completely at this
>>+	 * point. Thus we don't have to wait forever.
>>+	 */
>>+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>+		return 0;
>>+
>>+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
>>+		return -EBUSY;
>>+
>>+	/* Wait for firmware to add or remove device sub-tree. When it's done,
>>+	 * one signal is received from firmware.
>>+	 */
>>+	ret = wait_event_timeout(php_slot->queue,
>>+				 php_slot->power_state_confirmed, 10 * HZ);
>>+	if (!ret) {
>>+		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
>>+			 ret, state ? "on" : "off");
>>+		return -EBUSY;
>>+	}
>>+
>>+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>+		return 0;
>>+
>>+	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
>>+		 php_slot->power_state_confirmed, state ? "on" : "off");
>>+	return -EBUSY;
>>+}
>>+
>>+static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
>>+{
>>+	struct pnv_php_slot *php_slot = slot->private;
>>+	uint8_t power_state;
>
>
>Uninitialized variable.
>

When pnv_pci_get_power_state() fails to get the power state, *state falls back
to the default one (OPAL_PCI_SLOT_POWER_ON). Otherwise, it is set to the state
returned from pnv_pci_get_power_state(), so power_state is only read after it
has been written. The logic is complete. Also, I don't see any build
warning/error caused by this.
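
The pattern, reduced to a standalone sketch: fake_query_power() and the
FAKE_* constants are hypothetical stand-ins, and the sketch assumes the query
writes its output only on success.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_SLOT_POWER_OFF	0
#define FAKE_SLOT_POWER_ON	1

/* Hypothetical firmware query: writes *state only when it succeeds. */
static int fake_query_power(int simulate_error, uint8_t *state)
{
	if (simulate_error)
		return -5;
	*state = FAKE_SLOT_POWER_OFF;
	return 0;
}

static int get_power_state(int simulate_error, uint8_t *state)
{
	uint8_t power_state;	/* written by the query on success only */
	int ret;

	ret = fake_query_power(simulate_error, &power_state);
	if (ret)
		*state = FAKE_SLOT_POWER_ON;	/* fallback; power_state unread */
	else
		*state = power_state;

	return 0;
}
```

The local is never read on the error path, which is why leaving it
uninitialized is safe here. Initializing it anyway costs nothing and keeps
the function safe if the error handling changes later.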

>
>>+	int ret;
>>+
>>+	/*
>>+	 * Retrieve power status from firmware. If we fail
>>+	 * getting that, the power status fails back to
>>+	 * be on.
>>+	 */
>>+	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
>>+	if (ret) {
>>+		*state = OPAL_PCI_SLOT_POWER_ON;
>>+		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
>>+			 ret);
>>+	} else {
>>+		*state = power_state;
>>+		slot->info->power_status = power_state;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
>>+{
>>+	struct pnv_php_slot *php_slot = slot->private;
>>+	uint8_t presence;
>
>Uninitialized variable.
>

Same as above.

>>+	int ret;
>>+
>>+	/*
>>+	 * Retrieve presence status from firmware. If we can't
>>+	 * get that, it will fail back to be empty.
>>+	 */
>>+	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
>>+	if (ret >= 0) {
>>+		*state = presence;
>>+		slot->info->adapter_status = presence;
>>+		ret = 0;
>>+	} else {
>>+		*state = OPAL_PCI_SLOT_EMPTY;
>>+		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
>>+			 ret);
>>+	}
>>+
>>+	return ret;
>>+}
>>+
>>+static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
>>+{
>>+	/* FIXME: Make it real once firmware supports it */
>
>It still does not?
>
>
>>+	slot->info->attention_status = state;
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>>+{
>>+	struct hotplug_slot *slot = &php_slot->slot;
>>+	uint8_t presence, power_status;
>
>
>Uninitialized variables.
>
>

I will initialize them to default states in next revision.

>>+	int ret;
>>+
>>+	/* Check if the slot has been configured */
>>+	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
>>+		return 0;
>>+
>>+	/* Retrieve slot presence status */
>>+	ret = pnv_php_get_adapter_state(slot, &presence);
>>+	if (ret)
>>+		return ret;
>>+
>>+	/* Proceed if there have nothing behind the slot */
>>+	if (presence == OPAL_PCI_SLOT_EMPTY)
>>+		goto scan;
>>+
>>+	/*
>>+	 * If the power suply to the slot is off, we can't detect
>
>s/suply/supply/
>

Will fix in next revision.

>>+	 * adapter presence state. That means we have to turn the
>>+	 * slot on before going to probe slot's presence state.
>>+	 *
>>+	 * On the first time, we don't change the power status to
>>+	 * boost system boot with assumption that the firmware
>>+	 * supplies consistent slot power status: empty slot always
>>+	 * has its power off and non-empty slot has its power on.
>>+	 */
>>+	if (!php_slot->power_state_check) {
>>+		php_slot->power_state_check = true;
>>+
>>+		ret = pnv_php_get_power_state(slot, &power_status);
>>+		if (ret)
>>+			return ret;
>>+
>>+		if (power_status != OPAL_PCI_SLOT_POWER_ON)
>>+			return 0;
>>+	}
>>+
>>+	/* Check the power status. Scan the slot if that's already on */
>
>
>s/that's/it is/
>

I don't know the difference. Will fix it in next revision anyway.

>
>>+	ret = pnv_php_get_power_state(slot, &power_status);
>>+	if (ret)
>>+		return ret;
>>+
>>+	if (power_status == OPAL_PCI_SLOT_POWER_ON)
>>+		goto scan;
>>+
>>+	/* Power is off, turn it on and then scan the slot */
>>+	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
>>+	if (ret)
>>+		return ret;
>>+
>>+scan:
>>+	if (presence == OPAL_PCI_SLOT_PRESENT) {
>>+		if (rescan) {
>>+			pci_lock_rescan_remove();
>>+			pci_add_pci_devices(php_slot->bus);
>>+			pci_unlock_rescan_remove();
>>+		}
>>+
>>+		/* Rescan for child hotpluggable slots */
>>+		php_slot->state = PNV_PHP_STATE_POPULATED;
>>+		if (rescan)
>>+			pnv_php_register(php_slot->dn);
>>+	} else {
>>+		php_slot->state = PNV_PHP_STATE_POPULATED;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_enable_slot(struct hotplug_slot *slot)
>>+{
>>+	struct pnv_php_slot *php_slot = container_of(slot,
>>+						     struct pnv_php_slot, slot);
>>+
>>+	return pnv_php_enable(php_slot, true);
>>+}
>>+
>>+static int pnv_php_disable_slot(struct hotplug_slot *slot)
>>+{
>>+	struct pnv_php_slot *php_slot = slot->private;
>>+	uint8_t power_state;
>>+	int ret;
>>+
>>+	if (php_slot->state != PNV_PHP_STATE_POPULATED)
>>+		return 0;
>>+
>>+	/* Remove all devices behind the slot */
>>+	pci_lock_rescan_remove();
>>+	pci_remove_pci_devices(php_slot->bus);
>>+	pci_unlock_rescan_remove();
>>+
>>+	/* Detach the child hotpluggable slots */
>>+	pnv_php_unregister(php_slot->dn);
>>+
>>+	/*
>>+	 * Check the power status and turn it off if necessary. If we
>>+	 * fail to get the power status, the power will be forced to
>>+	 * be off.
>>+	 */
>>+	ret = pnv_php_get_power_state(slot, &power_state);
>>+	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
>>+		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
>>+		if (ret)
>>+			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",
>
>
>Long line, checkpatch.pl should have warned :)
>

I didn't see the warning from checkpatch.pl.

>>+				 ret);
>>+	}
>>+
>>+	/* Update slot state */
>>+	php_slot->state = PNV_PHP_STATE_REGISTERED;
>>+	return 0;
>>+}
>>+
>>+static struct hotplug_slot_ops php_slot_ops = {
>>+	.get_power_status	= pnv_php_get_power_state,
>>+	.get_adapter_status	= pnv_php_get_adapter_state,
>>+	.set_attention_status	= pnv_php_set_attention_state,
>>+	.enable_slot		= pnv_php_enable_slot,
>>+	.disable_slot		= pnv_php_disable_slot,
>>+};
>>+
>>+static void pnv_php_release(struct hotplug_slot *slot)
>>+{
>>+	struct pnv_php_slot *php_slot = slot->private;
>>+	unsigned long flags;
>>+
>>+	/* Remove from global or child list */
>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>+	list_del(&php_slot->link);
>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+
>>+	/* Detach from parent */
>>+	pnv_php_put_slot(php_slot);
>>+	pnv_php_put_slot(php_slot->parent);
>>+}
>>+
>>+static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
>>+{
>>+	struct device_node *parent = dn;
>>+	const __be64 *prop64;
>>+	const __be32 *prop32;
>>+
>>+	/*
>>+	 * The hotpluggable slot always has a compound Id, which
>>+	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
>>+	 * number, and compound indicator
>>+	 */
>>+	*id = (0x1ul << 63);
>
>
>Is this bit from the same space as 1<<60 as in pnv_eeh_bridge_reset()? If so,
>it would be great to have all these id bits defined in one place.
>

I will add a macro (PCI_SLOT_ID) to produce the PCI slot ID in the next revision.
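
Roughly what such a macro could look like, as a sketch only. The FAKE_*
names are placeholders and the final macro may differ; the layout is taken
from the code quoted below: bit 63 as the compound indicator, the
bus/device/function number shifted into bits 16-31, and the PHB id in the low
bits.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder names; the layout mirrors pnv_php_get_slot_id() in the patch. */
#define FAKE_PCI_SLOT_ID_PREFIX	(1ULL << 63)
#define FAKE_PCI_SLOT_ID(phb_id, bdfn)	\
	(FAKE_PCI_SLOT_ID_PREFIX | ((uint64_t)(bdfn) << 16) | (uint64_t)(phb_id))

/* Same computation as the open-coded version, starting from the first cell
 * of the "reg" property, where the bus/devfn sits in bits 8-23.
 */
static uint64_t slot_id_from_reg(uint64_t phb_id, uint32_t reg)
{
	uint64_t id = 1ULL << 63;

	id |= ((uint64_t)(reg & 0x00ffff00) << 8);
	id |= phb_id;
	return id;
}
```

For PHB 1 and bus 2, device 3, function 0 (bdfn 0x218, "reg" cell 0x21800),
both forms give 0x8000000002180001.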

>
>>+
>>+	/* Bus/Slot/Function number */
>>+	prop32 = of_get_property(dn, "reg", NULL);
>>+	if (!prop32)
>>+		return -ENXIO;
>>+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
>>+
>>+	/* PHB Id */
>>+	while ((parent = of_get_parent(parent))) {
>>+		if (!PCI_DN(parent)) {
>>+			of_node_put(parent);
>>+			break;
>>+		}
>>+
>>+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
>>+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
>>+			of_node_put(parent);
>>+			continue;
>>+		}
>>+
>>+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
>>+		if (!prop64) {
>>+			of_node_put(parent);
>>+			return -ENXIO;
>>+		}
>>+
>>+		*id |= be64_to_cpup(prop64);
>>+		of_node_put(parent);
>>+		return 0;
>>+	}
>>+
>>+	return -ENODEV;
>>+}
>>+
>>+static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *php_slot;
>>+	struct pci_bus *bus;
>>+	const char *label;
>>+	uint64_t id;
>>+
>>+	label = of_get_property(dn, "ibm,slot-label", NULL);
>>+	if (!label)
>>+		return NULL;
>>+
>>+	if (pnv_php_get_slot_id(dn, &id))
>>+		return NULL;
>>+
>>+	bus = pci_find_bus_by_node(dn);
>>+	if (!bus)
>>+		return NULL;
>>+
>>+	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
>>+	if (!php_slot)
>>+		return NULL;
>>+
>>+	php_slot->name = kstrdup(label, GFP_KERNEL);
>>+	if (!php_slot->name) {
>>+		kfree(php_slot);
>>+		return NULL;
>>+	}
>>+
>>+	if (dn->child && PCI_DN(dn->child))
>>+		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
>>+	else
>>+		php_slot->slot_no = -1;   /* Placeholder slot */
>>+
>>+	kref_init(&php_slot->kref);
>>+	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
>>+	php_slot->dn	                = dn;
>>+	php_slot->pdev	                = bus->self;
>>+	php_slot->bus	                = bus;
>>+	php_slot->id	                = id;
>>+	php_slot->power_state_check     = false;
>>+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>+	php_slot->slot.ops              = &php_slot_ops;
>>+	php_slot->slot.info             = &php_slot->slot_info;
>>+	php_slot->slot.release          = pnv_php_release;
>>+	php_slot->slot.private          = php_slot;
>>+
>>+	INIT_WORK(&php_slot->work, pnv_php_work);
>>+	init_waitqueue_head(&php_slot->queue);
>>+	INIT_LIST_HEAD(&php_slot->children);
>>+	INIT_LIST_HEAD(&php_slot->link);
>>+
>>+	return php_slot;
>>+}
>>+
>>+static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
>>+{
>>+	struct pnv_php_slot *parent;
>>+	struct device_node *dn = php_slot->dn;
>>+	unsigned long flags;
>>+	int ret;
>>+
>>+	/* Check if the slot is registered or not */
>>+	parent = pnv_php_find_slot(php_slot->dn);
>>+	if (parent) {
>>+		pnv_php_put_slot(parent);
>>+		return -EEXIST;
>>+	}
>>+
>>+	/* Register PCI slot */
>>+	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
>>+			      php_slot->slot_no, php_slot->name);
>>+	if (ret) {
>>+		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
>>+			 ret);
>>+		return ret;
>>+	}
>>+
>>+	/* Attach to the parent's child list or global list */
>>+	while ((dn = of_get_parent(dn))) {
>>+		if (!PCI_DN(dn)) {
>>+			of_node_put(dn);
>>+			break;
>>+		}
>>+
>>+		parent = pnv_php_find_slot(dn);
>>+		if (parent) {
>>+			of_node_put(dn);
>>+			break;
>>+		}
>>+
>>+		of_node_put(dn);
>>+	}
>>+
>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>+	php_slot->parent = parent;
>>+	if (parent)
>>+		list_add_tail(&php_slot->link, &parent->children);
>>+	else
>>+		list_add_tail(&php_slot->link, &pnv_php_slot_list);
>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+
>>+	php_slot->state = PNV_PHP_STATE_REGISTERED;
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_register_one(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *php_slot;
>>+	const __be32 *prop32;
>>+	int ret;
>>+
>>+	/* Check if it's hotpluggable slot */
>>+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
>>+	if (!prop32 || !of_read_number(prop32, 1))
>>+		return -ENXIO;
>>+
>>+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
>>+	if (!prop32 || !of_read_number(prop32, 1))
>>+		return -ENXIO;
>>+
>>+	php_slot = pnv_php_alloc_slot(dn);
>>+	if (!php_slot)
>>+		return -ENODEV;
>>+
>>+	ret = pnv_php_register_slot(php_slot);
>>+	if (ret)
>>+		goto free_slot;
>>+
>>+	ret = pnv_php_enable(php_slot, false);
>>+	if (ret)
>>+		goto unregister_slot;
>>+
>>+	return 0;
>>+
>>+unregister_slot:
>>+	pnv_php_unregister_one(php_slot->dn);
>>+free_slot:
>>+	pnv_php_put_slot(php_slot);
>>+	return ret;
>>+}
>>+
>>+static void pnv_php_register(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+
>>+	/*
>>+	 * The parent slots should be registered before their
>>+	 * child slots.
>>+	 */
>>+	for_each_child_of_node(dn, child) {
>>+		pnv_php_register_one(child);
>>+		pnv_php_register(child);
>>+	}
>>+}
>>+
>>+static void pnv_php_unregister_one(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *php_slot;
>>+
>>+	php_slot = pnv_php_find_slot(dn);
>>+	if (!php_slot)
>>+		return;
>>+
>>+	pnv_php_put_slot(php_slot);
>>+	pci_hp_deregister(&php_slot->slot);
>>+}
>>+
>>+static void pnv_php_unregister(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+
>>+	/* The child slots should go before their parent slots */
>>+	for_each_child_of_node(dn, child) {
>>+		pnv_php_unregister(child);
>>+		pnv_php_unregister_one(child);
>>+	}
>>+}
>>+
>>+static struct notifier_block php_msg_nb = {
>>+	.notifier_call	= pnv_php_handle_msg,
>>+	.next		= NULL,
>>+	.priority	= 0,
>>+};
>>+
>>+static int __init pnv_php_init(void)
>>+{
>>+	struct device_node *dn;
>>+	int ret;
>>+
>>+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>>+
>>+	/* Register hotplug message handler */
>>+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
>>+	if (ret) {
>>+		pr_warn("%s: Error %d registering hotplug notifier\n",
>>+			__func__, ret);
>>+		return ret;
>>+	}
>>+
>>+	/* Scan PHB nodes and their children */
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>+		pnv_php_register(dn);
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>+		pnv_php_register(dn);
>>+
>>+	return 0;
>>+}
>>+
>>+static void __exit pnv_php_exit(void)
>>+{
>>+	struct device_node *dn;
>>+
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>+		pnv_php_unregister(dn);
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>+		pnv_php_unregister(dn);
>>+
>>+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
>>+}
>>+
>>+module_init(pnv_php_init);
>>+module_exit(pnv_php_exit);
>>+
>>+MODULE_VERSION(DRIVER_VERSION);
>>+MODULE_LICENSE("GPL v2");
>>+MODULE_AUTHOR(DRIVER_AUTHOR);
>>+MODULE_DESCRIPTION(DRIVER_DESC);
>>
>
>
>-- 
>Alexey
>


* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-05-02  3:44         ` Gavin Shan
@ 2016-05-02  6:11           ` Alexey Kardashevskiy
  2016-05-02 23:38             ` Gavin Shan
  0 siblings, 1 reply; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-05-02  6:11 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, Alistair Popple, linux-pci, devicetree, benh, mpe,
	dja, bhelgaas, robherring2, grant.likely

On 05/02/2016 01:44 PM, Gavin Shan wrote:
> On Tue, Apr 19, 2016 at 08:36:48PM +1000, Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>> This adds standalone driver to support PCI hotplug for PowerPC PowerNV
>>> platform that runs on top of skiboot firmware. The firmware identifies
>>> hotpluggable slots and marked their device tree node with proper
>>> "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans
>>> device tree nodes to create/register PCI hotplug slot accordingly.
>>>
>>> The PCI slots are organized in fashion of tree, which means one
>>> PCI slot might have parent PCI slot and parent PCI slot possibly
>>> contains multiple child PCI slots. At the plugging time, the parent
>>> PCI slot is populated before its children. The child PCI slots are
>>> removed before their parent PCI slot can be removed from the system.
>>>
>>> If the skiboot firmware doesn't support slot status retrieval, the PCI
>>> slot device node shouldn't have property "ibm,reset-by-firmware". In
>>> that case, none of valid PCI slots will be detected from device tree.
>>> The skiboot firmware doesn't export the capability to access attention
>>> LEDs yet and it's something for TBD.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>> ---
>>>  drivers/pci/hotplug/Kconfig   |  12 +
>>>  drivers/pci/hotplug/Makefile  |   3 +
>>>  drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 885 insertions(+)
>>>  create mode 100644 drivers/pci/hotplug/pnv_php.c
>>>
>>> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>>> index df8caec..167c8ce 100644
>>> --- a/drivers/pci/hotplug/Kconfig
>>> +++ b/drivers/pci/hotplug/Kconfig
>>> @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>>>
>>>  	  When in doubt, say N.
>>>
>>> +config HOTPLUG_PCI_POWERNV
>>> +	tristate "PowerPC PowerNV PCI Hotplug driver"
>>> +	depends on PPC_POWERNV && EEH
>>> +	help
>>> +	  Say Y here if you run PowerPC PowerNV platform that supports
>>> +	  PCI Hotplug
>>> +
>>> +	  To compile this driver as a module, choose M here: the
>>> +	  module will be called pnv-php.
>>> +
>>> +	  When in doubt, say N.
>>> +
>>>  config HOTPLUG_PCI_RPA
>>>  	tristate "RPA PCI Hotplug driver"
>>>  	depends on PPC_PSERIES && EEH
>>> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>> index b616e75..e33cdda 100644
>>> --- a/drivers/pci/hotplug/Makefile
>>> +++ b/drivers/pci/hotplug/Makefile
>>> @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>>>  obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>>> +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>>>  obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>>>  obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>>>  obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>>> @@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>>>  acpiphp-objs		:=	acpiphp_core.o	\
>>>  				acpiphp_glue.o
>>>
>>> +pnv-php-objs		:=	pnv_php.o
>>> +
>>>  rpaphp-objs		:=	rpaphp_core.o	\
>>>  				rpaphp_pci.o	\
>>>  				rpaphp_slot.o
>>> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>>> new file mode 100644
>>> index 0000000..364ec36
>>> --- /dev/null
>>> +++ b/drivers/pci/hotplug/pnv_php.c
>>> @@ -0,0 +1,870 @@
>>> +/*
>>> + * PCI Hotplug Driver for PowerPC PowerNV platform.
>>> + *
>>> + * Copyright Gavin Shan, IBM Corporation 2015.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +
>>> +#include <linux/libfdt.h>
>>> +#include <linux/module.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/pci_hotplug.h>
>>> +
>>> +#include <asm/opal.h>
>>> +#include <asm/pnv-pci.h>
>>> +#include <asm/ppc-pci.h>
>>> +
>>> +#define DRIVER_VERSION	"0.1"
>>> +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>>> +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>>> +
>>> +struct pnv_php_slot {
>>> +	struct hotplug_slot		slot;
>>> +	struct hotplug_slot_info	slot_info;
>>> +	uint64_t			id;
>>> +	char				*name;
>>> +	int				slot_no;
>>> +	struct kref			kref;
>>> +#define PNV_PHP_STATE_INITIALIZED	0
>>> +#define PNV_PHP_STATE_REGISTERED	1
>>> +#define PNV_PHP_STATE_POPULATED		2
>>> +	int				state;
>>> +	struct device_node		*dn;
>>> +	struct pci_dev			*pdev;
>>> +	struct pci_bus			*bus;
>>> +	bool				power_state_check;
>>> +	int				power_state_confirmed;
>>> +#define PNV_PHP_POWER_CONFIRMED_INVALID	0
>>> +#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
>>> +#define PNV_PHP_POWER_CONFIRMED_FAIL	2
>>> +	struct opal_msg			*msg;
>>> +	void				*fdt;
>>> +	void				*dt;
>>> +	struct of_changeset		ocs;
>>> +	struct work_struct		work;
>>> +	wait_queue_head_t		queue;
>>> +	struct pnv_php_slot		*parent;
>>> +	struct list_head		children;
>>> +	struct list_head		link;
>>> +};
>>> +
>>> +static LIST_HEAD(pnv_php_slot_list);
>>> +static DEFINE_SPINLOCK(pnv_php_lock);
>>> +
>>> +static void pnv_php_register(struct device_node *dn);
>>> +static void pnv_php_unregister_one(struct device_node *dn);
>>> +static void pnv_php_unregister(struct device_node *dn);
>>
>>
>> The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(),
>> pnv_php_unregister_children() instead.
>>
>>
>> Alistair, what do you reckon?
>>
>>
>>> +
>>> +static void pnv_php_free_slot(struct kref *kref)
>>> +{
>>> +	struct pnv_php_slot *php_slot = container_of(kref,
>>> +						     struct pnv_php_slot,
>>> +						     kref);
>>> +
>>> +	WARN_ON(!list_empty(&php_slot->children));
>>> +	kfree(php_slot->name);
>>> +	kfree(php_slot);
>>> +}
>>> +
>>> +static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
>>> +{
>>> +	if (!php_slot)
>>
>>
>> BUG_ON()?
>>
>
> checkpatch.pl will report warning like below. Are you sure you need a BUG_ON()?


No, I am not - this is why I asked. How likely is it to have
php_slot == NULL here? Can we recover from that? The options are:
1) memory is corrupted (then we cannot recover and it has to be BUG_ON)
2) broken/old OPAL returns an unexpected error (then we can continue, I guess)
3) there are ways (via sysfs from userspace? no idea) to get
pnv_php_put_slot() called with php_slot == NULL.

If only 1) is possible - then BUG_ON, if 2) - WARN_ON, if 3) - should be 
neither BUG_ON nor WARN_ON. You know the code better, you decide.
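
To make the difference between cases 2) and 3) concrete, here is a small
sketch. It is userspace C, not kernel code: WARN_ON() below is a stand-in
macro that only reports and counts, and struct slot, slot_put_quiet() and
slot_put_warn() are made-up names for illustration. The BUG_ON variant for
case 1) is not shown, since it would simply not return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel's WARN_ON(): report, count, and carry on. */
static int warn_count;
#define WARN_ON(cond) \
	((cond) ? (warn_count++, \
		   fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__), 1) \
		: 0)

/* Hypothetical, simplified slot with a bare reference counter. */
struct slot {
	int refcount;
};

/* Case 3: NULL is a legitimate input, so tolerate it silently. */
static void slot_put_quiet(struct slot *s)
{
	if (!s)
		return;
	s->refcount--;
}

/* Case 2: NULL is unexpected but survivable, so warn and recover. */
static void slot_put_warn(struct slot *s)
{
	if (WARN_ON(!s))
		return;
	s->refcount--;
}
```

Which variant fits pnv_php_put_slot() depends on which of the three cases
can actually happen.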


>
> WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
> #159: FILE: drivers/pci/hotplug/pnv_php.c:76:
> +	BUG_ON(!php_slot);
>
>
>>> +		return;
>>> +
>>> +	kref_put(&php_slot->kref, pnv_php_free_slot);
>>> +}
>>> +
>>> +static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
>>> +					  struct pnv_php_slot *php_slot)
>>> +{
>>> +	struct pnv_php_slot *target, *tmp;
>>> +
>>> +	if (php_slot->dn == dn) {
>>> +		kref_get(&php_slot->kref);
>>> +		return php_slot;
>>> +	}
>>> +
>>> +	list_for_each_entry(tmp, &php_slot->children, link) {
>>> +		target = pnv_php_match(dn, tmp);
>>> +		if (target)
>>> +			return target;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
>>> +{
>>> +	struct pnv_php_slot *php_slot, *tmp;
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&pnv_php_lock, flags);
>>> +	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
>>> +		php_slot = pnv_php_match(dn, tmp);
>>> +		if (php_slot) {
>>> +			spin_unlock_irqrestore(&pnv_php_lock, flags);
>>> +			return php_slot;
>>> +		}
>>> +	}
>>> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +/*
>>> + * Remove pdn for all children of the indicated device node.
>>> + * The function should remove pdn in a depth-first manner.
>>> + */
>>> +static void pnv_php_rmv_pdns(struct device_node *dn)
>>> +{
>>> +	struct device_node *child;
>>> +
>>> +	for_each_child_of_node(dn, child) {
>>> +		pnv_php_rmv_pdns(child);
>>> +
>>> +		pci_remove_device_node_info(child);
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * Remove all child nodes of the indicated device nodes. The
>>> + * function should remove device nodes in depth-first manner.
>>> + */
>>> +static int pnv_php_rmv_device_nodes(struct device_node *parent)
>>> +{
>>> +	struct device_node *dn, *child;
>>> +	int ret = 0;
>>> +
>>> +	for_each_child_of_node(parent, dn) {
>>> +		ret = pnv_php_rmv_device_nodes(dn);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>> +		child = of_get_next_child(dn, NULL);
>>> +		if (child) {
>>> +			of_node_put(child);
>>> +			of_node_put(dn);
>>> +			pr_err("%s: Alive children of node <%s>\n",
>>> +			       __func__, of_node_full_name(dn));
>>> +			return -EBUSY;
>>> +		}
>>> +
>>> +		of_detach_node(dn);


While playing with compiler options, I hit this:

   MODPOST 248 modules
ERROR: "of_detach_node" [drivers/pci/hotplug/pnv-php.ko] undefined!
/home/aik/p/kernel-power8hp/scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[2]: *** [__modpost] Error 1


I enabled pnv-php to compile as a module:
CONFIG_HOTPLUG_PCI_POWERNV=m

This is missing:

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index c647bd1..75ce30d 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -311,6 +311,7 @@ int of_detach_node(struct device_node *np)

         return rc;
  }
+EXPORT_SYMBOL_GPL(of_detach_node);




>>> +		of_node_put(dn);
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * The function processes the message sent by firmware
>>> + * to remove all device tree nodes beneath the slot's
>>> + * nodes and the associated auxiliary data.
>>> + */
>>> +static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
>>> +{
>>> +	int ret;
>>> +
>>> +	pnv_php_rmv_pdns(php_slot->dn);
>>> +
>>> +	/*
>>> +	 * If the device sub-tree was created from OF changeset, simply
>>> +	 * to revert that. Otherwise, the device nodes in the sub-tree
>>> +	 * need to be iterated and detached.
>>> +	 */
>>> +	if (php_slot->fdt) {
>>> +		of_changeset_destroy(&php_slot->ocs);
>>> +		kfree(php_slot->dt);
>>> +		kfree(php_slot->fdt);
>>> +		php_slot->dt        = NULL;
>>> +		php_slot->dn->child = NULL;
>>> +		php_slot->fdt       = NULL;
>>> +		php_slot->power_state_confirmed =
>>> +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>> +		wake_up_interruptible(&php_slot->queue);
>>> +		return;
>>> +	}
>>> +
>>> +	ret = pnv_php_rmv_device_nodes(php_slot->dn);
>>> +	if (!ret) {
>>> +		php_slot->power_state_confirmed =
>>> +			PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>> +	} else {
>>> +		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
>>> +	}
>>> +
>>> +	wake_up_interruptible(&php_slot->queue);
>>
>>
>> I liked one wake_up_interruptible() better...
>>
>
> Will fix in next revision.
>
>>> +}
>>> +
>>> +static int pnv_php_populate_changeset(struct of_changeset *ocs,
>>> +				      struct device_node *dn)
>>> +{
>>> +	struct device_node *child;
>>> +	int ret = 0;
>>> +
>>> +	for_each_child_of_node(dn, child) {
>>> +		ret = of_changeset_attach_node(ocs, child);
>>> +		if (ret)
>>> +			break;
>>> +
>>> +		ret = pnv_php_populate_changeset(ocs, child);
>>
>>
>> I asked in v7 - may be to add here "if (ret) break;"?
>>
>
> Will add it in v9.
>
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
>>> +{
>>> +	struct pci_controller *hose = (struct pci_controller *)data;
>>> +	struct pci_dn *pdn;
>>> +
>>> +	pdn = pci_add_device_node_info(hose, dn);
>>> +	if (!pdn)
>>> +		return ERR_PTR(-ENOMEM);
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +static void pnv_php_add_pdns(struct pnv_php_slot *slot)
>>> +{
>>> +	struct pci_controller *hose = pci_bus_to_host(slot->bus);
>>> +
>>> +	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
>>> +}
>>> +
>>> +static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
>>> +{
>>> +	void *fdt, *fdt1, *dt;
>>> +	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>> +	int ret;
>>> +
>>> +	/* We don't know the FDT blob size. We try to get it through
>>> +	 * maximal memory chunk and then copy it to another chunk that
>>> +	 * fits the real size.
>>> +	 */
>>> +	fdt1 = kzalloc(0x10000, GFP_KERNEL);
>>> +	if (!fdt1)
>>> +		goto error;
>>> +
>>> +	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
>>> +	if (ret)
>>> +		goto free_fdt1;
>>> +
>>> +	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
>>> +	if (!fdt)
>>> +		goto free_fdt1;
>>> +
>>> +	/* Unflatten device tree blob */
>>> +	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
>>> +	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
>>> +	if (!dt) {
>>> +		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
>>> +		goto free_fdt;
>>> +	}
>>> +
>>> +	/* Initialize and apply the changeset */
>>> +	of_changeset_init(&php_slot->ocs);
>>> +	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
>>> +	if (ret) {
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
>>> +			 ret);
>>> +		goto free_dt;
>>> +	}
>>> +
>>> +	php_slot->dn->child = NULL;
>>> +	ret = of_changeset_apply(&php_slot->ocs);
>>> +	if (ret) {
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
>>> +			 ret);
>>> +		goto destroy_changeset;
>>> +	}
>>> +
>>> +	/* Add device node firmware data */
>>> +	pnv_php_add_pdns(php_slot);
>>> +	php_slot->fdt = fdt;
>>> +	php_slot->dt  = dt;
>>> +	goto out;
>>> +
>>> +destroy_changeset:
>>> +	of_changeset_destroy(&php_slot->ocs);
>>> +free_dt:
>>> +	kfree(dt);
>>> +	php_slot->dn->child = NULL;
>>> +free_fdt:
>>> +	kfree(fdt);
>>> +free_fdt1:
>>> +	kfree(fdt1);
>>> +error:
>>> +	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
>>> +out:
>>> +	/* Confirm status change */
>>> +	php_slot->power_state_confirmed = confirm;
>>> +	wake_up_interruptible(&php_slot->queue);
>>> +}
>>> +
>>> +static void pnv_php_work(struct work_struct *data)
>>> +{
>>> +	struct pnv_php_slot *php_slot = container_of(data,
>>> +						     struct pnv_php_slot,
>>> +						     work);
>>> +	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
>>> +
>>> +	if (event == OPAL_PCI_SLOT_POWER_OFF)
>>> +		pnv_php_handle_poweroff(php_slot);
>>> +	else
>>> +		pnv_php_handle_poweron(php_slot);
>>> +
>>> +	pnv_php_put_slot(php_slot);
>>> +}
>>> +
>>> +static int pnv_php_handle_msg(struct notifier_block *nb,
>>> +			      unsigned long type,
>>> +			      void *message)
>>> +{
>>> +	phandle h;
>>> +	struct device_node *dn;
>>> +	struct pnv_php_slot *php_slot;
>>> +	struct opal_msg *msg = message;
>>> +
>>> +	if (type != OPAL_MSG_PCI_HOTPLUG) {
>>> +		pr_warn("%s: Invalid message %ld received!\n",
>>> +			__func__, type);
>>> +		return NOTIFY_DONE;
>>> +	}
>>> +
>>> +	h = (phandle)be64_to_cpu(msg->params[1]);
>>> +	dn = of_find_node_by_phandle(h);
>>> +	if (!dn) {
>>> +		pr_warn("%s: No device node for phandle 0x%x\n",
>>> +			__func__, h);
>>> +		return NOTIFY_DONE;
>>> +	}
>>> +
>>> +	php_slot = pnv_php_find_slot(dn);
>>> +	if (!php_slot) {
>>> +		pr_warn("%s: No slot found for node <%s>\n",
>>> +			__func__, of_node_full_name(dn));
>>> +		of_node_put(dn);
>>> +		return NOTIFY_DONE;
>>> +	}
>>> +
>>> +	of_node_put(dn);
>>> +	php_slot->msg = msg;
>>> +	schedule_work(&php_slot->work);
>>> +	return NOTIFY_OK;
>>> +}
>>> +
>>> +static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
>>> +{
>>> +	struct pnv_php_slot *php_slot = slot->private;
>>> +	int ret;
>>> +
>>> +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>> +	ret = pnv_pci_set_power_state(php_slot->id, state);
>>> +	if (ret) {
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
>>> +			 ret, state ? "on" : "off");
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* Continue to PCI probing after finalized device-tree. The
>>> +	 * device-tree might have been updated completely at this
>>> +	 * point. Thus we don't have to wait forever.
>>> +	 */
>>> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>> +		return 0;
>>> +
>>> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
>>> +		return -EBUSY;
>>> +
>>> +	/* Wait for firmware to add or remove device sub-tree. When it's done,
>>> +	 * one signal is received from firmware.
>>> +	 */
>>> +	ret = wait_event_timeout(php_slot->queue,
>>> +				 php_slot->power_state_confirmed, 10 * HZ);
>>> +	if (!ret) {
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
>>> +			 ret, state ? "on" : "off");
>>> +		return -EBUSY;
>>> +	}
>>> +
>>> +	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>> +		return 0;
>>> +
>>> +	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
>>> +		 php_slot->power_state_confirmed, state ? "on" : "off");
>>> +	return -EBUSY;
>>> +}
>>> +
>>> +static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
>>> +{
>>> +	struct pnv_php_slot *php_slot = slot->private;
>>> +	uint8_t power_state;
>>
>>
>> Uninitialized variable.
>>
>

> When pnv_pci_get_power_state() fails to get the power state, it fails back to
> default one (OPAL_PCI_SLOT_POWER_ON). Otherwise, it is set to the state returned
> from pnv_pci_get_power_state(). The logic is complete.

What guarantees that, if the corresponding OPAL call returned success,
all the data pointers you passed to OPAL will point to correct values?
For example, the new pnv_pci_poll() updates the state only in some
cases.


> Also, I don't see building warning/error caused by this.

You do not see them now with your current compiler, which does not mean
you will never see them.
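
A small userspace sketch of the concern, under stated assumptions:
fw_get_state() is a made-up stand-in for a firmware call that reports
success but stores its output only in some cases, and POWER_ON/POWER_OFF
and get_power_state() are likewise illustrative names, not the OPAL or
driver ones. Seeding the local with a known default before the call means
the caller never reads an indeterminate value, whatever the callee does.

```c
#include <assert.h>
#include <stdint.h>

#define POWER_OFF	0	/* illustrative values, not the OPAL ones */
#define POWER_ON	1

/*
 * Hypothetical firmware wrapper: returns 0 (success) in both cases,
 * but only stores to *state when fresh data is available.
 */
static int fw_get_state(int have_data, uint8_t *state)
{
	if (have_data)
		*state = POWER_OFF;
	return 0;
}

/* Caller that seeds the output with the documented default first. */
static uint8_t get_power_state(int have_data)
{
	uint8_t state = POWER_ON;	/* default if firmware stays silent */

	if (fw_get_state(have_data, &state))
		return POWER_ON;	/* error path falls back to "on" */

	return state;
}
```

Without the initializer, the have_data == 0 path would return whatever
happened to be on the stack, and a compiler is not required to warn about it.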


>>
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * Retrieve power status from firmware. If we fail
>>> +	 * getting that, the power status fails back to
>>> +	 * be on.
>>> +	 */
>>> +	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
>>> +	if (ret) {
>>> +		*state = OPAL_PCI_SLOT_POWER_ON;
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
>>> +			 ret);
>>> +	} else {
>>> +		*state = power_state;
>>> +		slot->info->power_status = power_state;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
>>> +{
>>> +	struct pnv_php_slot *php_slot = slot->private;
>>> +	uint8_t presence;
>>
>> Uninitialized variable.
>>
>
> Same as above.
>
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * Retrieve presence status from firmware. If we can't
>>> +	 * get that, it will fail back to be empty.
>>> +	 */
>>> +	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
>>> +	if (ret >= 0) {
>>> +		*state = presence;
>>> +		slot->info->adapter_status = presence;
>>> +		ret = 0;
>>> +	} else {
>>> +		*state = OPAL_PCI_SLOT_EMPTY;
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
>>> +			 ret);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
>>> +{
>>> +	/* FIXME: Make it real once firmware supports it */
>>
>> It still does not?
>>
>>
>>> +	slot->info->attention_status = state;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>>> +{
>>> +	struct hotplug_slot *slot = &php_slot->slot;
>>> +	uint8_t presence, power_status;
>>
>>
>> Uninitialized variables.
>>
>>
>
> I will initialize them to default states in next revision.

Thanks :)


>
>>> +	int ret;
>>> +
>>> +	/* Check if the slot has been configured */
>>> +	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
>>> +		return 0;
>>> +
>>> +	/* Retrieve slot presence status */
>>> +	ret = pnv_php_get_adapter_state(slot, &presence);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/* Proceed if there have nothing behind the slot */
>>> +	if (presence == OPAL_PCI_SLOT_EMPTY)
>>> +		goto scan;
>>> +
>>> +	/*
>>> +	 * If the power suply to the slot is off, we can't detect
>>
>> s/suply/supply/
>>
>
> Will fix in next revision.
>
>>> +	 * adapter presence state. That means we have to turn the
>>> +	 * slot on before going to probe slot's presence state.
>>> +	 *
>>> +	 * On the first time, we don't change the power status to
>>> +	 * boost system boot with assumption that the firmware
>>> +	 * supplies consistent slot power status: empty slot always
>>> +	 * has its power off and non-empty slot has its power on.
>>> +	 */
>>> +	if (!php_slot->power_state_check) {
>>> +		php_slot->power_state_check = true;
>>> +
>>> +		ret = pnv_php_get_power_state(slot, &power_status);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>> +		if (power_status != OPAL_PCI_SLOT_POWER_ON)
>>> +			return 0;
>>> +	}
>>> +
>>> +	/* Check the power status. Scan the slot if that's already on */
>>
>>
>> s/that's/it is/
>>
>
> I don't know the difference. Will fix it in next revision anyway.
>
>>
>>> +	ret = pnv_php_get_power_state(slot, &power_status);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	if (power_status == OPAL_PCI_SLOT_POWER_ON)
>>> +		goto scan;
>>> +
>>> +	/* Power is off, turn it on and then scan the slot */
>>> +	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +scan:
>>> +	if (presence == OPAL_PCI_SLOT_PRESENT) {
>>> +		if (rescan) {
>>> +			pci_lock_rescan_remove();
>>> +			pci_add_pci_devices(php_slot->bus);
>>> +			pci_unlock_rescan_remove();
>>> +		}
>>> +
>>> +		/* Rescan for child hotpluggable slots */
>>> +		php_slot->state = PNV_PHP_STATE_POPULATED;
>>> +		if (rescan)
>>> +			pnv_php_register(php_slot->dn);
>>> +	} else {
>>> +		php_slot->state = PNV_PHP_STATE_POPULATED;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int pnv_php_enable_slot(struct hotplug_slot *slot)
>>> +{
>>> +	struct pnv_php_slot *php_slot = container_of(slot,
>>> +						     struct pnv_php_slot, slot);
>>> +
>>> +	return pnv_php_enable(php_slot, true);
>>> +}
>>> +
>>> +static int pnv_php_disable_slot(struct hotplug_slot *slot)
>>> +{
>>> +	struct pnv_php_slot *php_slot = slot->private;
>>> +	uint8_t power_state;
>>> +	int ret;
>>> +
>>> +	if (php_slot->state != PNV_PHP_STATE_POPULATED)
>>> +		return 0;
>>> +
>>> +	/* Remove all devices behind the slot */
>>> +	pci_lock_rescan_remove();
>>> +	pci_remove_pci_devices(php_slot->bus);
>>> +	pci_unlock_rescan_remove();
>>> +
>>> +	/* Detach the child hotpluggable slots */
>>> +	pnv_php_unregister(php_slot->dn);
>>> +
>>> +	/*
>>> +	 * Check the power status and turn it off if necessary. If we
>>> +	 * fail to get the power status, the power will be forced to
>>> +	 * be off.
>>> +	 */
>>> +	ret = pnv_php_get_power_state(slot, &power_state);
>>> +	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
>>> +		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
>>> +		if (ret)
>>> +			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",
>>
>>
>> Long line, checkpatch.pl should have warned :)
>>
>
> I didn't see the warning from checkpatch.pl.


Cool, then never mind.


>
>>> +				 ret);
>>> +	}
>>> +
>>> +	/* Update slot state */
>>> +	php_slot->state = PNV_PHP_STATE_REGISTERED;
>>> +	return 0;
>>> +}
>>> +
>>> +static struct hotplug_slot_ops php_slot_ops = {
>>> +	.get_power_status	= pnv_php_get_power_state,
>>> +	.get_adapter_status	= pnv_php_get_adapter_state,
>>> +	.set_attention_status	= pnv_php_set_attention_state,
>>> +	.enable_slot		= pnv_php_enable_slot,
>>> +	.disable_slot		= pnv_php_disable_slot,
>>> +};
>>> +
>>> +static void pnv_php_release(struct hotplug_slot *slot)
>>> +{
>>> +	struct pnv_php_slot *php_slot = slot->private;
>>> +	unsigned long flags;
>>> +
>>> +	/* Remove from global or child list */
>>> +	spin_lock_irqsave(&pnv_php_lock, flags);
>>> +	list_del(&php_slot->link);
>>> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>> +
>>> +	/* Detach from parent */
>>> +	pnv_php_put_slot(php_slot);
>>> +	pnv_php_put_slot(php_slot->parent);
>>> +}
>>> +
>>> +static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
>>> +{
>>> +	struct device_node *parent = dn;
>>> +	const __be64 *prop64;
>>> +	const __be32 *prop32;
>>> +
>>> +	/*
>>> +	 * The hotpluggable slot always has a compound Id, which
>>> +	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
>>> +	 * number, and compound indicator
>>> +	 */
>>> +	*id = (0x1ul << 63);
>>
>>
>> Is this bit from the same space as 1<<60 as in pnv_eeh_bridge_reset()? If so,
>> it would be great to have all these id bits defined in one place.
>>
>
> Will have a macro (PCI_SLOT_ID) to produce the PCI slot ID in next revision.
>
>>
>>> +
>>> +	/* Bus/Slot/Function number */
>>> +	prop32 = of_get_property(dn, "reg", NULL);
>>> +	if (!prop32)
>>> +		return -ENXIO;
>>> +	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
>>> +
>>> +	/* PHB Id */
>>> +	while ((parent = of_get_parent(parent))) {
>>> +		if (!PCI_DN(parent)) {
>>> +			of_node_put(parent);
>>> +			break;
>>> +		}
>>> +
>>> +		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
>>> +		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
>>> +			of_node_put(parent);
>>> +			continue;
>>> +		}
>>> +
>>> +		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
>>> +		if (!prop64) {
>>> +			of_node_put(parent);
>>> +			return -ENXIO;
>>> +		}
>>> +
>>> +		*id |= be64_to_cpup(prop64);
>>> +		of_node_put(parent);
>>> +		return 0;
>>> +	}
>>> +
>>> +	return -ENODEV;
>>> +}
>>> +
>>> +static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>>> +{
>>> +	struct pnv_php_slot *php_slot;
>>> +	struct pci_bus *bus;
>>> +	const char *label;
>>> +	uint64_t id;
>>> +
>>> +	label = of_get_property(dn, "ibm,slot-label", NULL);
>>> +	if (!label)
>>> +		return NULL;
>>> +
>>> +	if (pnv_php_get_slot_id(dn, &id))
>>> +		return NULL;
>>> +
>>> +	bus = pci_find_bus_by_node(dn);
>>> +	if (!bus)
>>> +		return NULL;
>>> +
>>> +	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
>>> +	if (!php_slot)
>>> +		return NULL;
>>> +
>>> +	php_slot->name = kstrdup(label, GFP_KERNEL);
>>> +	if (!php_slot->name) {
>>> +		kfree(php_slot);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	if (dn->child && PCI_DN(dn->child))
>>> +		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
>>> +	else
>>> +		php_slot->slot_no = -1;   /* Placeholder slot */
>>> +
>>> +	kref_init(&php_slot->kref);
>>> +	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
>>> +	php_slot->dn	                = dn;
>>> +	php_slot->pdev	                = bus->self;
>>> +	php_slot->bus	                = bus;
>>> +	php_slot->id	                = id;
>>> +	php_slot->power_state_check     = false;
>>> +	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>> +	php_slot->slot.ops              = &php_slot_ops;
>>> +	php_slot->slot.info             = &php_slot->slot_info;
>>> +	php_slot->slot.release          = pnv_php_release;
>>> +	php_slot->slot.private          = php_slot;
>>> +
>>> +	INIT_WORK(&php_slot->work, pnv_php_work);
>>> +	init_waitqueue_head(&php_slot->queue);
>>> +	INIT_LIST_HEAD(&php_slot->children);
>>> +	INIT_LIST_HEAD(&php_slot->link);
>>> +
>>> +	return php_slot;
>>> +}
>>> +
>>> +static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
>>> +{
>>> +	struct pnv_php_slot *parent;
>>> +	struct device_node *dn = php_slot->dn;
>>> +	unsigned long flags;
>>> +	int ret;
>>> +
>>> +	/* Check if the slot is registered or not */
>>> +	parent = pnv_php_find_slot(php_slot->dn);
>>> +	if (parent) {
>>> +		pnv_php_put_slot(parent);
>>> +		return -EEXIST;
>>> +	}
>>> +
>>> +	/* Register PCI slot */
>>> +	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
>>> +			      php_slot->slot_no, php_slot->name);
>>> +	if (ret) {
>>> +		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
>>> +			 ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* Attach to the parent's child list or global list */
>>> +	while ((dn = of_get_parent(dn))) {
>>> +		if (!PCI_DN(dn)) {
>>> +			of_node_put(dn);
>>> +			break;
>>> +		}
>>> +
>>> +		parent = pnv_php_find_slot(dn);
>>> +		if (parent) {
>>> +			of_node_put(dn);
>>> +			break;
>>> +		}
>>> +
>>> +		of_node_put(dn);
>>> +	}
>>> +
>>> +	spin_lock_irqsave(&pnv_php_lock, flags);
>>> +	php_slot->parent = parent;
>>> +	if (parent)
>>> +		list_add_tail(&php_slot->link, &parent->children);
>>> +	else
>>> +		list_add_tail(&php_slot->link, &pnv_php_slot_list);
>>> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>> +
>>> +	php_slot->state = PNV_PHP_STATE_REGISTERED;
>>> +	return 0;
>>> +}
>>> +
>>> +static int pnv_php_register_one(struct device_node *dn)
>>> +{
>>> +	struct pnv_php_slot *php_slot;
>>> +	const __be32 *prop32;
>>> +	int ret;
>>> +
>>> +	/* Check if it's hotpluggable slot */
>>> +	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
>>> +	if (!prop32 || !of_read_number(prop32, 1))
>>> +		return -ENXIO;
>>> +
>>> +	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
>>> +	if (!prop32 || !of_read_number(prop32, 1))
>>> +		return -ENXIO;
>>> +
>>> +	php_slot = pnv_php_alloc_slot(dn);
>>> +	if (!php_slot)
>>> +		return -ENODEV;
>>> +
>>> +	ret = pnv_php_register_slot(php_slot);
>>> +	if (ret)
>>> +		goto free_slot;
>>> +
>>> +	ret = pnv_php_enable(php_slot, false);
>>> +	if (ret)
>>> +		goto unregister_slot;
>>> +
>>> +	return 0;
>>> +
>>> +unregister_slot:
>>> +	pnv_php_unregister_one(php_slot->dn);
>>> +free_slot:
>>> +	pnv_php_put_slot(php_slot);
>>> +	return ret;
>>> +}
>>> +
>>> +static void pnv_php_register(struct device_node *dn)
>>> +{
>>> +	struct device_node *child;
>>> +
>>> +	/*
>>> +	 * The parent slots should be registered before their
>>> +	 * child slots.
>>> +	 */
>>> +	for_each_child_of_node(dn, child) {
>>> +		pnv_php_register_one(child);
>>> +		pnv_php_register(child);
>>> +	}
>>> +}
>>> +
>>> +static void pnv_php_unregister_one(struct device_node *dn)
>>> +{
>>> +	struct pnv_php_slot *php_slot;
>>> +
>>> +	php_slot = pnv_php_find_slot(dn);
>>> +	if (!php_slot)
>>> +		return;
>>> +
>>> +	pnv_php_put_slot(php_slot);
>>> +	pci_hp_deregister(&php_slot->slot);
>>> +}
>>> +
>>> +static void pnv_php_unregister(struct device_node *dn)
>>> +{
>>> +	struct device_node *child;
>>> +
>>> +	/* The child slots should go before their parent slots */
>>> +	for_each_child_of_node(dn, child) {
>>> +		pnv_php_unregister(child);
>>> +		pnv_php_unregister_one(child);
>>> +	}
>>> +}
>>> +
>>> +static struct notifier_block php_msg_nb = {
>>> +	.notifier_call	= pnv_php_handle_msg,
>>> +	.next		= NULL,
>>> +	.priority	= 0,
>>> +};
>>> +
>>> +static int __init pnv_php_init(void)
>>> +{
>>> +	struct device_node *dn;
>>> +	int ret;
>>> +
>>> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>>> +
>>> +	/* Register hotplug message handler */
>>> +	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
>>> +	if (ret) {
>>> +		pr_warn("%s: Error %d registering hotplug notifier\n",
>>> +			__func__, ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* Scan PHB nodes and their children */
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>> +		pnv_php_register(dn);
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>> +		pnv_php_register(dn);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void __exit pnv_php_exit(void)
>>> +{
>>> +	struct device_node *dn;
>>> +
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>> +		pnv_php_unregister(dn);
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>> +		pnv_php_unregister(dn);
>>> +
>>> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
>>> +}
>>> +
>>> +module_init(pnv_php_init);
>>> +module_exit(pnv_php_exit);
>>> +
>>> +MODULE_VERSION(DRIVER_VERSION);
>>> +MODULE_LICENSE("GPL v2");
>>> +MODULE_AUTHOR(DRIVER_AUTHOR);
>>> +MODULE_DESCRIPTION(DRIVER_DESC);
>>>
>>
>>
>> --
>> Alexey
>>
>


-- 
Alexey


* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-05-02  6:11           ` Alexey Kardashevskiy
@ 2016-05-02 23:38             ` Gavin Shan
  0 siblings, 0 replies; 174+ messages in thread
From: Gavin Shan @ 2016-05-02 23:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, Alistair Popple, linux-pci, devicetree,
	benh, mpe, dja, bhelgaas, robherring2, grant.likely

On Mon, May 02, 2016 at 04:11:53PM +1000, Alexey Kardashevskiy wrote:
>On 05/02/2016 01:44 PM, Gavin Shan wrote:
>>On Tue, Apr 19, 2016 at 08:36:48PM +1000, Alexey Kardashevskiy wrote:
>>>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>>>This adds a standalone driver to support PCI hotplug on the PowerPC
>>>>PowerNV platform, which runs on top of skiboot firmware. The firmware
>>>>identifies hotpluggable slots and marks their device tree nodes with the
>>>>"ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The driver
>>>>scans the device tree nodes to create and register PCI hotplug slots
>>>>accordingly.
>>>>
>>>>The PCI slots are organized as a tree: a PCI slot might have a parent
>>>>PCI slot, and a parent PCI slot can contain multiple child PCI slots.
>>>>At plugging time, the parent PCI slot is populated before its children.
>>>>The child PCI slots are removed before their parent PCI slot can be
>>>>removed from the system.
>>>>
>>>>If the skiboot firmware doesn't support slot status retrieval, the PCI
>>>>slot device node shouldn't have the "ibm,reset-by-firmware" property. In
>>>>that case, no valid PCI slots will be detected from the device tree.
>>>>The skiboot firmware doesn't export the capability to access attention
>>>>LEDs yet; that support is still to be done.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>>>---
>>>> drivers/pci/hotplug/Kconfig   |  12 +
>>>> drivers/pci/hotplug/Makefile  |   3 +
>>>> drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
>>>> 3 files changed, 885 insertions(+)
>>>> create mode 100644 drivers/pci/hotplug/pnv_php.c
>>>>
>>>>diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>>>>index df8caec..167c8ce 100644
>>>>--- a/drivers/pci/hotplug/Kconfig
>>>>+++ b/drivers/pci/hotplug/Kconfig
>>>>@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>>>>
>>>> 	  When in doubt, say N.
>>>>
>>>>+config HOTPLUG_PCI_POWERNV
>>>>+	tristate "PowerPC PowerNV PCI Hotplug driver"
>>>>+	depends on PPC_POWERNV && EEH
>>>>+	help
>>>>+	  Say Y here if you run PowerPC PowerNV platform that supports
>>>>+	  PCI Hotplug
>>>>+
>>>>+	  To compile this driver as a module, choose M here: the
>>>>+	  module will be called pnv-php.
>>>>+
>>>>+	  When in doubt, say N.
>>>>+
>>>> config HOTPLUG_PCI_RPA
>>>> 	tristate "RPA PCI Hotplug driver"
>>>> 	depends on PPC_PSERIES && EEH
>>>>diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>>>index b616e75..e33cdda 100644
>>>>--- a/drivers/pci/hotplug/Makefile
>>>>+++ b/drivers/pci/hotplug/Makefile
>>>>@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>>>>+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>>>>@@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>>>> acpiphp-objs		:=	acpiphp_core.o	\
>>>> 				acpiphp_glue.o
>>>>
>>>>+pnv-php-objs		:=	pnv_php.o
>>>>+
>>>> rpaphp-objs		:=	rpaphp_core.o	\
>>>> 				rpaphp_pci.o	\
>>>> 				rpaphp_slot.o
>>>>diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>>>>new file mode 100644
>>>>index 0000000..364ec36
>>>>--- /dev/null
>>>>+++ b/drivers/pci/hotplug/pnv_php.c
>>>>@@ -0,0 +1,870 @@
>>>>+/*
>>>>+ * PCI Hotplug Driver for PowerPC PowerNV platform.
>>>>+ *
>>>>+ * Copyright Gavin Shan, IBM Corporation 2015.
>>>>+ *
>>>>+ * This program is free software; you can redistribute it and/or modify
>>>>+ * it under the terms of the GNU General Public License as published by
>>>>+ * the Free Software Foundation; either version 2 of the License, or
>>>>+ * (at your option) any later version.
>>>>+ */
>>>>+
>>>>+#include <linux/libfdt.h>
>>>>+#include <linux/module.h>
>>>>+#include <linux/pci.h>
>>>>+#include <linux/pci_hotplug.h>
>>>>+
>>>>+#include <asm/opal.h>
>>>>+#include <asm/pnv-pci.h>
>>>>+#include <asm/ppc-pci.h>
>>>>+
>>>>+#define DRIVER_VERSION	"0.1"
>>>>+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>>>>+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>>>>+
>>>>+struct pnv_php_slot {
>>>>+	struct hotplug_slot		slot;
>>>>+	struct hotplug_slot_info	slot_info;
>>>>+	uint64_t			id;
>>>>+	char				*name;
>>>>+	int				slot_no;
>>>>+	struct kref			kref;
>>>>+#define PNV_PHP_STATE_INITIALIZED	0
>>>>+#define PNV_PHP_STATE_REGISTERED	1
>>>>+#define PNV_PHP_STATE_POPULATED		2
>>>>+	int				state;
>>>>+	struct device_node		*dn;
>>>>+	struct pci_dev			*pdev;
>>>>+	struct pci_bus			*bus;
>>>>+	bool				power_state_check;
>>>>+	int				power_state_confirmed;
>>>>+#define PNV_PHP_POWER_CONFIRMED_INVALID	0
>>>>+#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
>>>>+#define PNV_PHP_POWER_CONFIRMED_FAIL	2
>>>>+	struct opal_msg			*msg;
>>>>+	void				*fdt;
>>>>+	void				*dt;
>>>>+	struct of_changeset		ocs;
>>>>+	struct work_struct		work;
>>>>+	wait_queue_head_t		queue;
>>>>+	struct pnv_php_slot		*parent;
>>>>+	struct list_head		children;
>>>>+	struct list_head		link;
>>>>+};
>>>>+
>>>>+static LIST_HEAD(pnv_php_slot_list);
>>>>+static DEFINE_SPINLOCK(pnv_php_lock);
>>>>+
>>>>+static void pnv_php_register(struct device_node *dn);
>>>>+static void pnv_php_unregister_one(struct device_node *dn);
>>>>+static void pnv_php_unregister(struct device_node *dn);
>>>
>>>
>>>The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(),
>>>pnv_php_unregister_children() instead.
>>>
>>>
>>>Alistair, what do you reckon?
>>>
>>>
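
For illustration, the ordering these helpers are meant to implement (parents
first on registration, children first on removal) can be sketched in
userspace. This is only a sketch under assumptions: struct demo_node,
struct demo_log and the demo_* functions are hypothetical stand-ins, not
kernel API, and the traversal uses a plain child/sibling walk instead of
for_each_child_of_node().

```c
#include <stddef.h>

#define DEMO_MAX_EVENTS 16

/* Hypothetical stand-in for struct device_node. */
struct demo_node {
	int id;
	struct demo_node *child;	/* first child */
	struct demo_node *sibling;	/* next sibling */
};

/* Records the order in which nodes were visited. */
struct demo_log {
	int ids[DEMO_MAX_EVENTS];
	size_t count;
};

static void demo_record(struct demo_log *log, int id)
{
	if (log->count < DEMO_MAX_EVENTS)
		log->ids[log->count++] = id;
}

/* Pre-order walk: each child is handled before its own children. */
static void demo_register(struct demo_node *dn, struct demo_log *log)
{
	struct demo_node *child;

	for (child = dn->child; child; child = child->sibling) {
		demo_record(log, child->id);
		demo_register(child, log);
	}
}

/* Post-order walk: each child's subtree is handled before the child. */
static void demo_unregister(struct demo_node *dn, struct demo_log *log)
{
	struct demo_node *child;

	for (child = dn->child; child; child = child->sibling) {
		demo_unregister(child, log);
		demo_record(log, child->id);
	}
}
```

With a root whose children are 1 (with children 2 and 3) and 4, the first
walk records 1, 2, 3, 4 and the second records 2, 3, 1, 4.
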
>>>>+
>>>>+static void pnv_php_free_slot(struct kref *kref)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = container_of(kref,
>>>>+						     struct pnv_php_slot,
>>>>+						     kref);
>>>>+
>>>>+	WARN_ON(!list_empty(&php_slot->children));
>>>>+	kfree(php_slot->name);
>>>>+	kfree(php_slot);
>>>>+}
>>>>+
>>>>+static inline void pnv_php_put_slot(struct pnv_php_slot *php_slot)
>>>>+{
>>>>+	if (!php_slot)
>>>
>>>
>>>BUG_ON()?
>>>
>>
>>checkpatch.pl will report warning like below. Are you sure you need a BUG_ON()?
>
>
>No, I am not - this is why I asked. How possible is it to have
>php_slot == NULL here? Can we recover from that? The options are -
>1) memory is corrupted (then we cannot and it has to be BUG_ON)
>2) broken/old OPAL returns unexpected error (then we can continue, I guess)
>3) there are ways (via sysfs in the userspace? no idea) to get
>pnv_php_put_slot() called with php_slot == NULL.
>
>If only 1) is possible - then BUG_ON, if 2) - WARN_ON, if 3) - should be
>neither BUG_ON nor WARN_ON. You know the code better, you decide.
>

I will use a WARN_ON() instead since it's not harmful.
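
A minimal userspace sketch of the "warn and recover" shape, assuming a plain
integer refcount in place of kref and fprintf() in place of WARN_ON().
struct demo_slot and the demo_* helpers are hypothetical and only mirror the
idea, not the driver code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pnv_php_slot; only the refcount matters. */
struct demo_slot {
	int refcount;
};

/* Allocate a slot holding one reference, as kref_init() would. */
static struct demo_slot *demo_alloc_slot(void)
{
	struct demo_slot *slot = calloc(1, sizeof(*slot));

	if (slot)
		slot->refcount = 1;
	return slot;
}

/*
 * Drop one reference. Returns 1 if the slot was freed, 0 otherwise.
 * A NULL slot is reported and ignored rather than crashing.
 */
static int demo_put_slot(struct demo_slot *slot)
{
	if (!slot) {
		/* Userspace analogue of WARN_ON(): complain, then carry on. */
		fprintf(stderr, "WARNING: %s: NULL slot\n", __func__);
		return 0;
	}

	if (--slot->refcount > 0)
		return 0;

	free(slot);
	return 1;
}
```

Calling demo_put_slot(NULL) only prints a warning; the last real put frees
the slot.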

>>
>>WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
>>#159: FILE: drivers/pci/hotplug/pnv_php.c:76:
>>+	BUG_ON(!php_slot);
>>
>>
>>>>+		return;
>>>>+
>>>>+	kref_put(&php_slot->kref, pnv_php_free_slot);
>>>>+}
>>>>+
>>>>+static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
>>>>+					  struct pnv_php_slot *php_slot)
>>>>+{
>>>>+	struct pnv_php_slot *target, *tmp;
>>>>+
>>>>+	if (php_slot->dn == dn) {
>>>>+		kref_get(&php_slot->kref);
>>>>+		return php_slot;
>>>>+	}
>>>>+
>>>>+	list_for_each_entry(tmp, &php_slot->children, link) {
>>>>+		target = pnv_php_match(dn, tmp);
>>>>+		if (target)
>>>>+			return target;
>>>>+	}
>>>>+
>>>>+	return NULL;
>>>>+}
>>>>+
>>>>+static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot, *tmp;
>>>>+	unsigned long flags;
>>>>+
>>>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>>>+	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
>>>>+		php_slot = pnv_php_match(dn, tmp);
>>>>+		if (php_slot) {
>>>>+			spin_unlock_irqrestore(&pnv_php_lock, flags);
>>>>+			return php_slot;
>>>>+		}
>>>>+	}
>>>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>>>+
>>>>+	return NULL;
>>>>+}
>>>>+
>>>>+/*
>>>>+ * Remove pdn for all children of the indicated device node.
>>>>+ * The function should remove pdn in a depth-first manner.
>>>>+ */
>>>>+static void pnv_php_rmv_pdns(struct device_node *dn)
>>>>+{
>>>>+	struct device_node *child;
>>>>+
>>>>+	for_each_child_of_node(dn, child) {
>>>>+		pnv_php_rmv_pdns(child);
>>>>+
>>>>+		pci_remove_device_node_info(child);
>>>>+	}
>>>>+}
>>>>+
>>>>+/*
>>>>+ * Remove all child nodes of the indicated device nodes. The
>>>>+ * function should remove device nodes in depth-first manner.
>>>>+ */
>>>>+static int pnv_php_rmv_device_nodes(struct device_node *parent)
>>>>+{
>>>>+	struct device_node *dn, *child;
>>>>+	int ret = 0;
>>>>+
>>>>+	for_each_child_of_node(parent, dn) {
>>>>+		ret = pnv_php_rmv_device_nodes(dn);
>>>>+		if (ret)
>>>>+			return ret;
>>>>+
>>>>+		child = of_get_next_child(dn, NULL);
>>>>+		if (child) {
>>>>+			of_node_put(child);
>>>>+			of_node_put(dn);
>>>>+			pr_err("%s: Alive children of node <%s>\n",
>>>>+			       __func__, of_node_full_name(dn));
>>>>+			return -EBUSY;
>>>>+		}
>>>>+
>>>>+		of_detach_node(dn);
>
>
>While playing with compiler options, I hit this:
>
>  MODPOST 248 modules
>ERROR: "of_detach_node" [drivers/pci/hotplug/pnv-php.ko] undefined!
>/home/aik/p/kernel-power8hp/scripts/Makefile.modpost:91: recipe for target
>'__modpost' failed
>make[2]: *** [__modpost] Error 1
>

It's a known issue and the fix is already queued. I will include it in
the next revision.

>
>I enabled pnv-php to compile as a module:
>CONFIG_HOTPLUG_PCI_POWERNV=m
>
>This is missing:
>
>diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>index c647bd1..75ce30d 100644
>--- a/drivers/of/dynamic.c
>+++ b/drivers/of/dynamic.c
>@@ -311,6 +311,7 @@ int of_detach_node(struct device_node *np)
>
>        return rc;
> }
>+EXPORT_SYMBOL_GPL(of_detach_node);
>
>
>
>
>>>>+		of_node_put(dn);
>>>>+	}
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+/*
>>>>+ * The function processes the message sent by firmware
>>>>+ * to remove all device tree nodes beneath the slot's
>>>>+ * nodes and the associated auxiliary data.
>>>>+ */
>>>>+static void pnv_php_handle_poweroff(struct pnv_php_slot *php_slot)
>>>>+{
>>>>+	int ret;
>>>>+
>>>>+	pnv_php_rmv_pdns(php_slot->dn);
>>>>+
>>>>+	/*
>>>>+	 * If the device sub-tree was created from OF changeset, simply
>>>>+	 * to revert that. Otherwise, the device nodes in the sub-tree
>>>>+	 * need to be iterated and detached.
>>>>+	 */
>>>>+	if (php_slot->fdt) {
>>>>+		of_changeset_destroy(&php_slot->ocs);
>>>>+		kfree(php_slot->dt);
>>>>+		kfree(php_slot->fdt);
>>>>+		php_slot->dt        = NULL;
>>>>+		php_slot->dn->child = NULL;
>>>>+		php_slot->fdt       = NULL;
>>>>+		php_slot->power_state_confirmed =
>>>>+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>>>+		wake_up_interruptible(&php_slot->queue);
>>>>+		return;
>>>>+	}
>>>>+
>>>>+	ret = pnv_php_rmv_device_nodes(php_slot->dn);
>>>>+	if (!ret) {
>>>>+		php_slot->power_state_confirmed =
>>>>+			PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>>>+	} else {
>>>>+		php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d freeing nodes\n", ret);
>>>>+	}
>>>>+
>>>>+	wake_up_interruptible(&php_slot->queue);
>>>
>>>
>>>I liked one wake_up_interruptible() better...
>>>
>>
>>Will fix in next revision.
>>
>>>>+}
>>>>+
>>>>+static int pnv_php_populate_changeset(struct of_changeset *ocs,
>>>>+				      struct device_node *dn)
>>>>+{
>>>>+	struct device_node *child;
>>>>+	int ret = 0;
>>>>+
>>>>+	for_each_child_of_node(dn, child) {
>>>>+		ret = of_changeset_attach_node(ocs, child);
>>>>+		if (ret)
>>>>+			break;
>>>>+
>>>>+		ret = pnv_php_populate_changeset(ocs, child);
>>>
>>>
>>>I asked in v7 - may be to add here "if (ret) break;"?
>>>
>>
>>Will add it in v9.
>>
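
To make the effect of the extra check concrete, here is a small userspace
sketch. It assumes a hypothetical struct demo_node and demo_attach() in place
of struct device_node and of_changeset_attach_node(); without the second
break, the error from a nested call would be overwritten when the loop moves
on to the next sibling.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device_node. */
struct demo_node {
	int fail;			/* non-zero: attaching this node fails */
	int attached;			/* set once the node has been attached */
	struct demo_node *child;	/* first child */
	struct demo_node *sibling;	/* next sibling */
};

/* Hypothetical stand-in for of_changeset_attach_node(). */
static int demo_attach(struct demo_node *n)
{
	if (n->fail)
		return -1;
	n->attached = 1;
	return 0;
}

/* Attach every descendant of @parent, stopping at the first error. */
static int demo_populate(struct demo_node *parent)
{
	struct demo_node *child;
	int ret = 0;

	for (child = parent->child; child; child = child->sibling) {
		ret = demo_attach(child);
		if (ret)
			break;

		ret = demo_populate(child);
		if (ret)	/* keep the nested error instead of losing it */
			break;
	}

	return ret;
}
```
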
>>>>+	}
>>>>+
>>>>+	return ret;
>>>>+}
>>>>+
>>>>+static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
>>>>+{
>>>>+	struct pci_controller *hose = (struct pci_controller *)data;
>>>>+	struct pci_dn *pdn;
>>>>+
>>>>+	pdn = pci_add_device_node_info(hose, dn);
>>>>+	if (!pdn)
>>>>+		return ERR_PTR(-ENOMEM);
>>>>+
>>>>+	return NULL;
>>>>+}
>>>>+
>>>>+static void pnv_php_add_pdns(struct pnv_php_slot *slot)
>>>>+{
>>>>+	struct pci_controller *hose = pci_bus_to_host(slot->bus);
>>>>+
>>>>+	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
>>>>+}
>>>>+
>>>>+static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
>>>>+{
>>>>+	void *fdt, *fdt1, *dt;
>>>>+	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>>>+	int ret;
>>>>+
>>>>+	/* We don't know the FDT blob size. We try to get it through
>>>>+	 * maximal memory chunk and then copy it to another chunk that
>>>>+	 * fits the real size.
>>>>+	 */
>>>>+	fdt1 = kzalloc(0x10000, GFP_KERNEL);
>>>>+	if (!fdt1)
>>>>+		goto error;
>>>>+
>>>>+	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
>>>>+	if (ret)
>>>>+		goto free_fdt1;
>>>>+
>>>>+	fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
>>>>+	if (!fdt)
>>>>+		goto free_fdt1;
>>>>+
>>>>+	/* Unflatten device tree blob */
>>>>+	memcpy(fdt, fdt1, fdt_totalsize(fdt1));
>>>>+	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
>>>>+	if (!dt) {
>>>>+		dev_warn(&php_slot->pdev->dev, "Cannot unflatten FDT\n");
>>>>+		goto free_fdt;
>>>>+	}
>>>>+
>>>>+	/* Initialize and apply the changeset */
>>>>+	of_changeset_init(&php_slot->ocs);
>>>>+	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
>>>>+	if (ret) {
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d populating changeset\n",
>>>>+			 ret);
>>>>+		goto free_dt;
>>>>+	}
>>>>+
>>>>+	php_slot->dn->child = NULL;
>>>>+	ret = of_changeset_apply(&php_slot->ocs);
>>>>+	if (ret) {
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d applying changeset\n",
>>>>+			 ret);
>>>>+		goto destroy_changeset;
>>>>+	}
>>>>+
>>>>+	/* Add device node firmware data */
>>>>+	pnv_php_add_pdns(php_slot);
>>>>+	php_slot->fdt = fdt;
>>>>+	php_slot->dt  = dt;
>>>>+	goto out;
>>>>+
>>>>+destroy_changeset:
>>>>+	of_changeset_destroy(&php_slot->ocs);
>>>>+free_dt:
>>>>+	kfree(dt);
>>>>+	php_slot->dn->child = NULL;
>>>>+free_fdt:
>>>>+	kfree(fdt);
>>>>+free_fdt1:
>>>>+	kfree(fdt1);
>>>>+error:
>>>>+	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
>>>>+out:
>>>>+	/* Confirm status change */
>>>>+	php_slot->power_state_confirmed = confirm;
>>>>+	wake_up_interruptible(&php_slot->queue);
>>>>+}
>>>>+
>>>>+static void pnv_php_work(struct work_struct *data)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = container_of(data,
>>>>+						     struct pnv_php_slot,
>>>>+						     work);
>>>>+	uint64_t event = be64_to_cpu(php_slot->msg->params[0]);
>>>>+
>>>>+	if (event == OPAL_PCI_SLOT_POWER_OFF)
>>>>+		pnv_php_handle_poweroff(php_slot);
>>>>+	else
>>>>+		pnv_php_handle_poweron(php_slot);
>>>>+
>>>>+	pnv_php_put_slot(php_slot);
>>>>+}
>>>>+
>>>>+static int pnv_php_handle_msg(struct notifier_block *nb,
>>>>+			      unsigned long type,
>>>>+			      void *message)
>>>>+{
>>>>+	phandle h;
>>>>+	struct device_node *dn;
>>>>+	struct pnv_php_slot *php_slot;
>>>>+	struct opal_msg *msg = message;
>>>>+
>>>>+	if (type != OPAL_MSG_PCI_HOTPLUG) {
>>>>+		pr_warn("%s: Invalid message %ld received!\n",
>>>>+			__func__, type);
>>>>+		return NOTIFY_DONE;
>>>>+	}
>>>>+
>>>>+	h = (phandle)be64_to_cpu(msg->params[1]);
>>>>+	dn = of_find_node_by_phandle(h);
>>>>+	if (!dn) {
>>>>+		pr_warn("%s: No device node for phandle 0x%x\n",
>>>>+			__func__, h);
>>>>+		return NOTIFY_DONE;
>>>>+	}
>>>>+
>>>>+	php_slot = pnv_php_find_slot(dn);
>>>>+	if (!php_slot) {
>>>>+		pr_warn("%s: No slot found for node <%s>\n",
>>>>+			__func__, of_node_full_name(dn));
>>>>+		of_node_put(dn);
>>>>+		return NOTIFY_DONE;
>>>>+	}
>>>>+
>>>>+	of_node_put(dn);
>>>>+	php_slot->msg = msg;
>>>>+	schedule_work(&php_slot->work);
>>>>+	return NOTIFY_OK;
>>>>+}
>>>>+
>>>>+static int pnv_php_set_power_state(struct hotplug_slot *slot, u8 state)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = slot->private;
>>>>+	int ret;
>>>>+
>>>>+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>>>+	ret = pnv_pci_set_power_state(php_slot->id, state);
>>>>+	if (ret) {
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d powering %s slot\n",
>>>>+			 ret, state ? "on" : "off");
>>>>+		return ret;
>>>>+	}
>>>>+
>>>>+	/* Continue to PCI probing after finalized device-tree. The
>>>>+	 * device-tree might have been updated completely at this
>>>>+	 * point. Thus we don't have to wait forever.
>>>>+	 */
>>>>+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>>>+		return 0;
>>>>+
>>>>+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
>>>>+		return -EBUSY;
>>>>+
>>>>+	/* Wait for firmware to add or remove device sub-tree. When it's done,
>>>>+	 * one signal is received from firmware.
>>>>+	 */
>>>>+	ret = wait_event_timeout(php_slot->queue,
>>>>+				 php_slot->power_state_confirmed, 10 * HZ);
>>>>+	if (!ret) {
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d waiting for power-%s\n",
>>>>+			 ret, state ? "on" : "off");
>>>>+		return -EBUSY;
>>>>+	}
>>>>+
>>>>+	if (php_slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>>>+		return 0;
>>>>+
>>>>+	dev_warn(&php_slot->pdev->dev, "Error status %d for power-%s\n",
>>>>+		 php_slot->power_state_confirmed, state ? "on" : "off");
>>>>+	return -EBUSY;
>>>>+}
>>>>+
>>>>+static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = slot->private;
>>>>+	uint8_t power_state;
>>>
>>>
>>>Uninitialized variable.
>>>
>>
>
>>When pnv_pci_get_power_state() fails to get the power state, it falls back to
>>the default one (OPAL_PCI_SLOT_POWER_ON). Otherwise, it is set to the state
>>returned from pnv_pci_get_power_state(). The logic is complete.
>
>What guarantees that, if the corresponding OPAL call returned success,
>all data pointers which you passed to OPAL will point to correct
>values? For example, the new pnv_pci_poll() updates the state only in some
>cases.
>

Note that opal_pci_poll() is going to be replaced by opal_pci_poll2(), as you
suggested in another thread. The latter function accepts two arguments.

When the second argument to opal_pci_poll() isn't null, the result is returned.
Otherwise, the result won't be returned.

>
>> Also, I don't see building warning/error caused by this.
>
>You do not see them now with your current compiler which does not mean you
>will never see them.
>

Hmm, I really don't get the point. Obviously, I can't predict the
unpredictable. I will initialize @power_state to the default (ON) in the
next revision since you insist on it :)
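
A userspace sketch of the case being discussed, under stated assumptions:
demo_query_power() is a hypothetical firmware query that reports success but
writes the result only when it has fresh data, and the DEMO_* constants are
placeholders rather than the OPAL definitions. Initializing the local to the
default keeps the caller's result well defined on every path.

```c
#include <stdint.h>

#define DEMO_SLOT_POWER_OFF	0
#define DEMO_SLOT_POWER_ON	1

/*
 * Hypothetical firmware query: always returns 0 (success), but writes
 * *state only when it has fresh data and the pointer is non-NULL.
 */
static int demo_query_power(int has_fresh_data, uint8_t fresh, uint8_t *state)
{
	if (has_fresh_data && state)
		*state = fresh;
	return 0;
}

/* Returns the slot power state, defaulting to ON when nothing was written. */
static uint8_t demo_get_power_state(int has_fresh_data, uint8_t fresh)
{
	uint8_t power_state = DEMO_SLOT_POWER_ON;	/* default, never garbage */

	if (demo_query_power(has_fresh_data, fresh, &power_state))
		return DEMO_SLOT_POWER_ON;		/* query failed: default */

	return power_state;
}
```

Without the initializer, the "success but nothing written" path would return
an indeterminate value.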

>
>>>
>>>>+	int ret;
>>>>+
>>>>+	/*
>>>>+	 * Retrieve power status from firmware. If we fail
>>>>+	 * getting that, the power status fails back to
>>>>+	 * be on.
>>>>+	 */
>>>>+	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
>>>>+	if (ret) {
>>>>+		*state = OPAL_PCI_SLOT_POWER_ON;
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d getting power status\n",
>>>>+			 ret);
>>>>+	} else {
>>>>+		*state = power_state;
>>>>+		slot->info->power_status = power_state;
>>>>+	}
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = slot->private;
>>>>+	uint8_t presence;
>>>
>>>Uninitialized variable.
>>>
>>
>>Same as above.
>>
>>>>+	int ret;
>>>>+
>>>>+	/*
>>>>+	 * Retrieve presence status from firmware. If we can't
>>>>+	 * get that, it will fail back to be empty.
>>>>+	 */
>>>>+	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
>>>>+	if (ret >= 0) {
>>>>+		*state = presence;
>>>>+		slot->info->adapter_status = presence;
>>>>+		ret = 0;
>>>>+	} else {
>>>>+		*state = OPAL_PCI_SLOT_EMPTY;
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d getting presence\n",
>>>>+			 ret);
>>>>+	}
>>>>+
>>>>+	return ret;
>>>>+}
>>>>+
>>>>+static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
>>>>+{
>>>>+	/* FIXME: Make it real once firmware supports it */
>>>
>>>It still does not?
>>>
>>>
>>>>+	slot->info->attention_status = state;
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>>>>+{
>>>>+	struct hotplug_slot *slot = &php_slot->slot;
>>>>+	uint8_t presence, power_status;
>>>
>>>
>>>Uninitialized variables.
>>>
>>>
>>
>>I will initialize them to default states in next revision.
>
>Thanks :)
>
>
>>
>>>>+	int ret;
>>>>+
>>>>+	/* Check if the slot has been configured */
>>>>+	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
>>>>+		return 0;
>>>>+
>>>>+	/* Retrieve slot presence status */
>>>>+	ret = pnv_php_get_adapter_state(slot, &presence);
>>>>+	if (ret)
>>>>+		return ret;
>>>>+
>>>>+	/* Proceed if there have nothing behind the slot */
>>>>+	if (presence == OPAL_PCI_SLOT_EMPTY)
>>>>+		goto scan;
>>>>+
>>>>+	/*
>>>>+	 * If the power suply to the slot is off, we can't detect
>>>
>>>s/suply/supply/
>>>
>>
>>Will fix in next revision.
>>
>>>>+	 * adapter presence state. That means we have to turn the
>>>>+	 * slot on before going to probe slot's presence state.
>>>>+	 *
>>>>+	 * On the first time, we don't change the power status to
>>>>+	 * boost system boot with assumption that the firmware
>>>>+	 * supplies consistent slot power status: empty slot always
>>>>+	 * has its power off and non-empty slot has its power on.
>>>>+	 */
>>>>+	if (!php_slot->power_state_check) {
>>>>+		php_slot->power_state_check = true;
>>>>+
>>>>+		ret = pnv_php_get_power_state(slot, &power_status);
>>>>+		if (ret)
>>>>+			return ret;
>>>>+
>>>>+		if (power_status != OPAL_PCI_SLOT_POWER_ON)
>>>>+			return 0;
>>>>+	}
>>>>+
>>>>+	/* Check the power status. Scan the slot if that's already on */
>>>
>>>
>>>s/that's/it is/
>>>
>>
>>I don't know the difference. Will fix it in next revision anyway.
>>
>>>
>>>>+	ret = pnv_php_get_power_state(slot, &power_status);
>>>>+	if (ret)
>>>>+		return ret;
>>>>+
>>>>+	if (power_status == OPAL_PCI_SLOT_POWER_ON)
>>>>+		goto scan;
>>>>+
>>>>+	/* Power is off, turn it on and then scan the slot */
>>>>+	ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
>>>>+	if (ret)
>>>>+		return ret;
>>>>+
>>>>+scan:
>>>>+	if (presence == OPAL_PCI_SLOT_PRESENT) {
>>>>+		if (rescan) {
>>>>+			pci_lock_rescan_remove();
>>>>+			pci_add_pci_devices(php_slot->bus);
>>>>+			pci_unlock_rescan_remove();
>>>>+		}
>>>>+
>>>>+		/* Rescan for child hotpluggable slots */
>>>>+		php_slot->state = PNV_PHP_STATE_POPULATED;
>>>>+		if (rescan)
>>>>+			pnv_php_register(php_slot->dn);
>>>>+	} else {
>>>>+		php_slot->state = PNV_PHP_STATE_POPULATED;
>>>>+	}
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static int pnv_php_enable_slot(struct hotplug_slot *slot)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = container_of(slot,
>>>>+						     struct pnv_php_slot, slot);
>>>>+
>>>>+	return pnv_php_enable(php_slot, true);
>>>>+}
>>>>+
>>>>+static int pnv_php_disable_slot(struct hotplug_slot *slot)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = slot->private;
>>>>+	uint8_t power_state;
>>>>+	int ret;
>>>>+
>>>>+	if (php_slot->state != PNV_PHP_STATE_POPULATED)
>>>>+		return 0;
>>>>+
>>>>+	/* Remove all devices behind the slot */
>>>>+	pci_lock_rescan_remove();
>>>>+	pci_remove_pci_devices(php_slot->bus);
>>>>+	pci_unlock_rescan_remove();
>>>>+
>>>>+	/* Detach the child hotpluggable slots */
>>>>+	pnv_php_unregister(php_slot->dn);
>>>>+
>>>>+	/*
>>>>+	 * Check the power status and turn it off if necessary. If we
>>>>+	 * fail to get the power status, the power will be forced to
>>>>+	 * be off.
>>>>+	 */
>>>>+	ret = pnv_php_get_power_state(slot, &power_state);
>>>>+	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
>>>>+		ret = pnv_php_set_power_state(slot, OPAL_PCI_SLOT_POWER_OFF);
>>>>+		if (ret)
>>>>+			dev_warn(&php_slot->pdev->dev, "Error %d powering off\n",
>>>
>>>
>>>Long line, checkpatch.pl should have warned :)
>>>
>>
>>I didn't see the warning from checkpatch.pl.
>
>
>Cool, then never mind.
>
>
>>
>>>>+				 ret);
>>>>+	}
>>>>+
>>>>+	/* Update slot state */
>>>>+	php_slot->state = PNV_PHP_STATE_REGISTERED;
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static struct hotplug_slot_ops php_slot_ops = {
>>>>+	.get_power_status	= pnv_php_get_power_state,
>>>>+	.get_adapter_status	= pnv_php_get_adapter_state,
>>>>+	.set_attention_status	= pnv_php_set_attention_state,
>>>>+	.enable_slot		= pnv_php_enable_slot,
>>>>+	.disable_slot		= pnv_php_disable_slot,
>>>>+};
>>>>+
>>>>+static void pnv_php_release(struct hotplug_slot *slot)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot = slot->private;
>>>>+	unsigned long flags;
>>>>+
>>>>+	/* Remove from global or child list */
>>>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>>>+	list_del(&php_slot->link);
>>>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>>>+
>>>>+	/* Detach from parent */
>>>>+	pnv_php_put_slot(php_slot);
>>>>+	pnv_php_put_slot(php_slot->parent);
>>>>+}
>>>>+
>>>>+static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
>>>>+{
>>>>+	struct device_node *parent = dn;
>>>>+	const __be64 *prop64;
>>>>+	const __be32 *prop32;
>>>>+
>>>>+	/*
>>>>+	 * The hotpluggable slot always has a compound Id, which
>>>>+	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
>>>>+	 * number, and compound indicator
>>>>+	 */
>>>>+	*id = (0x1ul << 63);
>>>
>>>
>>>Is this bit from the same space as 1<<60 as in pnv_eeh_bridge_reset()? If so,
>>>it would be great to have all these id bits defined in one place.
>>>
>>
>>Will have a macro (PCI_SLOT_ID) to produce the PCI slot ID in next revision.
>>
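
As a rough sketch of what such a helper could compute, based only on the
arithmetic in the quoted function: demo_pci_slot_id() is a hypothetical name,
and the bit layout is taken from the code below (compound indicator in bit 63,
bits 8..23 of "reg" shifted up by 8, PHB Id OR-ed into the low bits). The
final macro may look different.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring pnv_php_get_slot_id():
 *   bit 63      compound indicator
 *   bits 16..31 bus/slot/function, taken from bits 8..23 of "reg"
 *   low bits    PHB Id
 */
static uint64_t demo_pci_slot_id(uint64_t phb_id, uint32_t reg)
{
	uint64_t id = UINT64_C(1) << 63;

	id |= ((uint64_t)(reg & 0x00ffff00)) << 8;
	id |= phb_id;

	return id;
}
```

For PHB Id 1 and a "reg" value of 0x00010800, this yields
0x8000000001080001.
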
>>>
>>>>+
>>>>+	/* Bus/Slot/Function number */
>>>>+	prop32 = of_get_property(dn, "reg", NULL);
>>>>+	if (!prop32)
>>>>+		return -ENXIO;
>>>>+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
>>>>+
>>>>+	/* PHB Id */
>>>>+	while ((parent = of_get_parent(parent))) {
>>>>+		if (!PCI_DN(parent)) {
>>>>+			of_node_put(parent);
>>>>+			break;
>>>>+		}
>>>>+
>>>>+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
>>>>+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
>>>>+			of_node_put(parent);
>>>>+			continue;
>>>>+		}
>>>>+
>>>>+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
>>>>+		if (!prop64) {
>>>>+			of_node_put(parent);
>>>>+			return -ENXIO;
>>>>+		}
>>>>+
>>>>+		*id |= be64_to_cpup(prop64);
>>>>+		of_node_put(parent);
>>>>+		return 0;
>>>>+	}
>>>>+
>>>>+	return -ENODEV;
>>>>+}
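
To illustrate the PCI_SLOT_ID macro mentioned in the reply above, here is a minimal user-space sketch of how the compound ID built in pnv_php_get_slot_id() could be defined in one place. Only the bit layout (indicator at bit 63, bus/slot/function at bits 16-31, PHB Id in the low bits) is taken from the quoted code. The names PCI_SLOT_ID_PREFIX, reg_to_bdfn() and slot_id_from_reg(), and the macro's argument order, are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical names: the thread only promises a PCI_SLOT_ID macro. */
#define PCI_SLOT_ID_PREFIX	(UINT64_C(1) << 63)	/* compound-ID indicator */
#define PCI_SLOT_ID(phb_id, bdfn) \
	(PCI_SLOT_ID_PREFIX | ((uint64_t)(bdfn) << 16) | (uint64_t)(phb_id))

/*
 * Extract the 16-bit bus/slot/function number from the first "reg" cell.
 * The cell is assumed to be in CPU byte order already.
 */
static inline uint16_t reg_to_bdfn(uint32_t reg)
{
	return (uint16_t)((reg & 0x00ffff00u) >> 8);
}

/* Same value as the open-coded (1 << 63) | ((reg & 0x00ffff00) << 8) | phbid. */
static inline uint64_t slot_id_from_reg(uint64_t phb_id, uint32_t reg)
{
	return PCI_SLOT_ID(phb_id, reg_to_bdfn(reg));
}
```

With a PHB Id of 1 and a "reg" cell of 0x00010800 (bus 1, device 1, function 0), slot_id_from_reg() returns 0x8000000001080001, which matches what the open-coded shifts in the patch produce.
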
>>>>+
>>>>+static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot;
>>>>+	struct pci_bus *bus;
>>>>+	const char *label;
>>>>+	uint64_t id;
>>>>+
>>>>+	label = of_get_property(dn, "ibm,slot-label", NULL);
>>>>+	if (!label)
>>>>+		return NULL;
>>>>+
>>>>+	if (pnv_php_get_slot_id(dn, &id))
>>>>+		return NULL;
>>>>+
>>>>+	bus = pci_find_bus_by_node(dn);
>>>>+	if (!bus)
>>>>+		return NULL;
>>>>+
>>>>+	php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
>>>>+	if (!php_slot)
>>>>+		return NULL;
>>>>+
>>>>+	php_slot->name = kstrdup(label, GFP_KERNEL);
>>>>+	if (!php_slot->name) {
>>>>+		kfree(php_slot);
>>>>+		return NULL;
>>>>+	}
>>>>+
>>>>+	if (dn->child && PCI_DN(dn->child))
>>>>+		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
>>>>+	else
>>>>+		php_slot->slot_no = -1;   /* Placeholder slot */
>>>>+
>>>>+	kref_init(&php_slot->kref);
>>>>+	php_slot->state	                = PNV_PHP_STATE_INITIALIZED;
>>>>+	php_slot->dn	                = dn;
>>>>+	php_slot->pdev	                = bus->self;
>>>>+	php_slot->bus	                = bus;
>>>>+	php_slot->id	                = id;
>>>>+	php_slot->power_state_check     = false;
>>>>+	php_slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>>>+	php_slot->slot.ops              = &php_slot_ops;
>>>>+	php_slot->slot.info             = &php_slot->slot_info;
>>>>+	php_slot->slot.release          = pnv_php_release;
>>>>+	php_slot->slot.private          = php_slot;
>>>>+
>>>>+	INIT_WORK(&php_slot->work, pnv_php_work);
>>>>+	init_waitqueue_head(&php_slot->queue);
>>>>+	INIT_LIST_HEAD(&php_slot->children);
>>>>+	INIT_LIST_HEAD(&php_slot->link);
>>>>+
>>>>+	return php_slot;
>>>>+}
>>>>+
>>>>+static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
>>>>+{
>>>>+	struct pnv_php_slot *parent;
>>>>+	struct device_node *dn = php_slot->dn;
>>>>+	unsigned long flags;
>>>>+	int ret;
>>>>+
>>>>+	/* Check if the slot is registered or not */
>>>>+	parent = pnv_php_find_slot(php_slot->dn);
>>>>+	if (parent) {
>>>>+		pnv_php_put_slot(parent);
>>>>+		return -EEXIST;
>>>>+	}
>>>>+
>>>>+	/* Register PCI slot */
>>>>+	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
>>>>+			      php_slot->slot_no, php_slot->name);
>>>>+	if (ret) {
>>>>+		dev_warn(&php_slot->pdev->dev, "Error %d registering slot\n",
>>>>+			 ret);
>>>>+		return ret;
>>>>+	}
>>>>+
>>>>+	/* Attach to the parent's child list or global list */
>>>>+	while ((dn = of_get_parent(dn))) {
>>>>+		if (!PCI_DN(dn)) {
>>>>+			of_node_put(dn);
>>>>+			break;
>>>>+		}
>>>>+
>>>>+		parent = pnv_php_find_slot(dn);
>>>>+		if (parent) {
>>>>+			of_node_put(dn);
>>>>+			break;
>>>>+		}
>>>>+
>>>>+		of_node_put(dn);
>>>>+	}
>>>>+
>>>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>>>+	php_slot->parent = parent;
>>>>+	if (parent)
>>>>+		list_add_tail(&php_slot->link, &parent->children);
>>>>+	else
>>>>+		list_add_tail(&php_slot->link, &pnv_php_slot_list);
>>>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>>>+
>>>>+	php_slot->state = PNV_PHP_STATE_REGISTERED;
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static int pnv_php_register_one(struct device_node *dn)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot;
>>>>+	const __be32 *prop32;
>>>>+	int ret;
>>>>+
>>>>+	/* Check if it's hotpluggable slot */
>>>>+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
>>>>+	if (!prop32 || !of_read_number(prop32, 1))
>>>>+		return -ENXIO;
>>>>+
>>>>+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
>>>>+	if (!prop32 || !of_read_number(prop32, 1))
>>>>+		return -ENXIO;
>>>>+
>>>>+	php_slot = pnv_php_alloc_slot(dn);
>>>>+	if (!php_slot)
>>>>+		return -ENODEV;
>>>>+
>>>>+	ret = pnv_php_register_slot(php_slot);
>>>>+	if (ret)
>>>>+		goto free_slot;
>>>>+
>>>>+	ret = pnv_php_enable(php_slot, false);
>>>>+	if (ret)
>>>>+		goto unregister_slot;
>>>>+
>>>>+	return 0;
>>>>+
>>>>+unregister_slot:
>>>>+	pnv_php_unregister_one(php_slot->dn);
>>>>+free_slot:
>>>>+	pnv_php_put_slot(php_slot);
>>>>+	return ret;
>>>>+}
>>>>+
>>>>+static void pnv_php_register(struct device_node *dn)
>>>>+{
>>>>+	struct device_node *child;
>>>>+
>>>>+	/*
>>>>+	 * The parent slots should be registered before their
>>>>+	 * child slots.
>>>>+	 */
>>>>+	for_each_child_of_node(dn, child) {
>>>>+		pnv_php_register_one(child);
>>>>+		pnv_php_register(child);
>>>>+	}
>>>>+}
>>>>+
>>>>+static void pnv_php_unregister_one(struct device_node *dn)
>>>>+{
>>>>+	struct pnv_php_slot *php_slot;
>>>>+
>>>>+	php_slot = pnv_php_find_slot(dn);
>>>>+	if (!php_slot)
>>>>+		return;
>>>>+
>>>>+	pnv_php_put_slot(php_slot);
>>>>+	pci_hp_deregister(&php_slot->slot);
>>>>+}
>>>>+
>>>>+static void pnv_php_unregister(struct device_node *dn)
>>>>+{
>>>>+	struct device_node *child;
>>>>+
>>>>+	/* The child slots should go before their parent slots */
>>>>+	for_each_child_of_node(dn, child) {
>>>>+		pnv_php_unregister(child);
>>>>+		pnv_php_unregister_one(child);
>>>>+	}
>>>>+}
>>>>+
>>>>+static struct notifier_block php_msg_nb = {
>>>>+	.notifier_call	= pnv_php_handle_msg,
>>>>+	.next		= NULL,
>>>>+	.priority	= 0,
>>>>+};
>>>>+
>>>>+static int __init pnv_php_init(void)
>>>>+{
>>>>+	struct device_node *dn;
>>>>+	int ret;
>>>>+
>>>>+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>>>>+
>>>>+	/* Register hotplug message handler */
>>>>+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
>>>>+	if (ret) {
>>>>+		pr_warn("%s: Error %d registering hotplug notifier\n",
>>>>+			__func__, ret);
>>>>+		return ret;
>>>>+	}
>>>>+
>>>>+	/* Scan PHB nodes and their children */
>>>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>>>+		pnv_php_register(dn);
>>>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>>>+		pnv_php_register(dn);
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static void __exit pnv_php_exit(void)
>>>>+{
>>>>+	struct device_node *dn;
>>>>+
>>>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>>>+		pnv_php_unregister(dn);
>>>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>>>+		pnv_php_unregister(dn);
>>>>+
>>>>+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
>>>>+}
>>>>+
>>>>+module_init(pnv_php_init);
>>>>+module_exit(pnv_php_exit);
>>>>+
>>>>+MODULE_VERSION(DRIVER_VERSION);
>>>>+MODULE_LICENSE("GPL v2");
>>>>+MODULE_AUTHOR(DRIVER_AUTHOR);
>>>>+MODULE_DESCRIPTION(DRIVER_DESC);
>>>>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-04-20  1:55         ` Alistair Popple
@ 2016-05-02 23:41           ` Gavin Shan
  2016-05-03  0:44               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 174+ messages in thread
From: Gavin Shan @ 2016-05-02 23:41 UTC (permalink / raw)
  To: Alistair Popple
  Cc: linuxppc-dev, Alexey Kardashevskiy, Gavin Shan, devicetree,
	linux-pci, grant.likely, robherring2, bhelgaas, dja

On Wed, Apr 20, 2016 at 11:55:56AM +1000, Alistair Popple wrote:
>On Tue, 19 Apr 2016 20:36:48 Alexey Kardashevskiy wrote:
>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>> > This adds a standalone driver to support PCI hotplug on the PowerPC
>> > PowerNV platform, which runs on top of skiboot firmware. The firmware
>> > identifies hotpluggable slots and marks their device tree nodes with
>> > the "ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The
>> > driver scans the device tree nodes to create and register PCI hotplug
>> > slots accordingly.
>> >
>> > The PCI slots are organized as a tree: a PCI slot may have a parent
>> > PCI slot, and a parent PCI slot may contain multiple child PCI slots.
>> > At plugging time, the parent PCI slot is populated before its
>> > children. The child PCI slots are removed before their parent PCI
>> > slot can be removed from the system.
>> >
>> > If the skiboot firmware doesn't support slot status retrieval, the PCI
>> > slot device node shouldn't have the "ibm,reset-by-firmware" property.
>> > In that case, no valid PCI slots will be detected from the device
>> > tree. The skiboot firmware doesn't export the capability to access
>> > attention LEDs yet; that is still to be done.
>> >
>> > Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>> > ---
>> >   drivers/pci/hotplug/Kconfig   |  12 +
>> >   drivers/pci/hotplug/Makefile  |   3 +
>> >   drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
>> >   3 files changed, 885 insertions(+)
>> >   create mode 100644 drivers/pci/hotplug/pnv_php.c
>> >
>> > diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>> > index df8caec..167c8ce 100644
>> > --- a/drivers/pci/hotplug/Kconfig
>> > +++ b/drivers/pci/hotplug/Kconfig
>> > @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>> >
>> >   	  When in doubt, say N.
>> >
>> > +config HOTPLUG_PCI_POWERNV
>> > +	tristate "PowerPC PowerNV PCI Hotplug driver"
>> > +	depends on PPC_POWERNV && EEH
>> > +	help
>> > +	  Say Y here if you run PowerPC PowerNV platform that supports
>> > +	  PCI Hotplug
>> > +
>> > +	  To compile this driver as a module, choose M here: the
>> > +	  module will be called pnv-php.
>> > +
>> > +	  When in doubt, say N.
>> > +
>> >   config HOTPLUG_PCI_RPA
>> >   	tristate "RPA PCI Hotplug driver"
>> >   	depends on PPC_PSERIES && EEH
>> > diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>> > index b616e75..e33cdda 100644
>> > --- a/drivers/pci/hotplug/Makefile
>> > +++ b/drivers/pci/hotplug/Makefile
>> > @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>> >   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>> >   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>> >   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>> > +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>> >   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>> >   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>> >   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>> > @@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>> >   acpiphp-objs		:=	acpiphp_core.o	\
>> >   				acpiphp_glue.o
>> >
>> > +pnv-php-objs		:=	pnv_php.o
>> > +
>> >   rpaphp-objs		:=	rpaphp_core.o	\
>> >   				rpaphp_pci.o	\
>> >   				rpaphp_slot.o
>> > diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>> > new file mode 100644
>> > index 0000000..364ec36
>> > --- /dev/null
>> > +++ b/drivers/pci/hotplug/pnv_php.c
>> > @@ -0,0 +1,870 @@
>> > +/*
>> > + * PCI Hotplug Driver for PowerPC PowerNV platform.
>> > + *
>> > + * Copyright Gavin Shan, IBM Corporation 2015.
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License as published by
>> > + * the Free Software Foundation; either version 2 of the License, or
>> > + * (at your option) any later version.
>> > + */
>> > +
>> > +#include <linux/libfdt.h>
>> > +#include <linux/module.h>
>> > +#include <linux/pci.h>
>> > +#include <linux/pci_hotplug.h>
>> > +
>> > +#include <asm/opal.h>
>> > +#include <asm/pnv-pci.h>
>> > +#include <asm/ppc-pci.h>
>> > +
>> > +#define DRIVER_VERSION	"0.1"
>> > +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>> > +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>> > +
>> > +struct pnv_php_slot {
>> > +	struct hotplug_slot		slot;
>> > +	struct hotplug_slot_info	slot_info;
>> > +	uint64_t			id;
>> > +	char				*name;
>> > +	int				slot_no;
>> > +	struct kref			kref;
>> > +#define PNV_PHP_STATE_INITIALIZED	0
>> > +#define PNV_PHP_STATE_REGISTERED	1
>> > +#define PNV_PHP_STATE_POPULATED		2
>> > +	int				state;
>> > +	struct device_node		*dn;
>> > +	struct pci_dev			*pdev;
>> > +	struct pci_bus			*bus;
>> > +	bool				power_state_check;
>> > +	int				power_state_confirmed;
>> > +#define PNV_PHP_POWER_CONFIRMED_INVALID	0
>> > +#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
>> > +#define PNV_PHP_POWER_CONFIRMED_FAIL	2
>> > +	struct opal_msg			*msg;
>> > +	void				*fdt;
>> > +	void				*dt;
>> > +	struct of_changeset		ocs;
>> > +	struct work_struct		work;
>> > +	wait_queue_head_t		queue;
>> > +	struct pnv_php_slot		*parent;
>> > +	struct list_head		children;
>> > +	struct list_head		link;
>> > +};
>> > +
>> > +static LIST_HEAD(pnv_php_slot_list);
>> > +static DEFINE_SPINLOCK(pnv_php_lock);
>> > +
>> > +static void pnv_php_register(struct device_node *dn);
>> > +static void pnv_php_unregister_one(struct device_node *dn);
>> > +static void pnv_php_unregister(struct device_node *dn);
>> 
>> 
>> The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(), 
>> pnv_php_unregister_children() instead.
>> 
>> 
>> Alistair, what do you reckon?
>
>To be honest I'm not sure the new names are necessarily any less confusing. I
>will admit to having to read that code twice though so perhaps a short comment
>describing what each of those functions does would be the best method for
>reducing confusion.

Alexey, please confirm whether I need to rename those functions, though
I don't understand the confusion caused by the function names.

[unrelated content removed]

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2016-05-02 23:41           ` Gavin Shan
@ 2016-05-03  0:44               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 174+ messages in thread
From: Alexey Kardashevskiy @ 2016-05-03  0:44 UTC (permalink / raw)
  To: Gavin Shan, Alistair Popple
  Cc: devicetree, linux-pci, bhelgaas, robherring2, grant.likely,
	linuxppc-dev, dja

On 05/03/2016 09:41 AM, Gavin Shan wrote:
> On Wed, Apr 20, 2016 at 11:55:56AM +1000, Alistair Popple wrote:
>> On Tue, 19 Apr 2016 20:36:48 Alexey Kardashevskiy wrote:
>>> On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>>> This adds a standalone driver to support PCI hotplug on the PowerPC
>>>> PowerNV platform, which runs on top of skiboot firmware. The firmware
>>>> identifies hotpluggable slots and marks their device tree nodes with
>>>> the "ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The
>>>> driver scans the device tree nodes to create and register PCI hotplug
>>>> slots accordingly.
>>>>
>>>> The PCI slots are organized as a tree: a PCI slot may have a parent
>>>> PCI slot, and a parent PCI slot may contain multiple child PCI slots.
>>>> At plugging time, the parent PCI slot is populated before its
>>>> children. The child PCI slots are removed before their parent PCI
>>>> slot can be removed from the system.
>>>>
>>>> If the skiboot firmware doesn't support slot status retrieval, the PCI
>>>> slot device node shouldn't have the "ibm,reset-by-firmware" property.
>>>> In that case, no valid PCI slots will be detected from the device
>>>> tree. The skiboot firmware doesn't export the capability to access
>>>> attention LEDs yet; that is still to be done.
>>>>
>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>>> ---
>>>>   drivers/pci/hotplug/Kconfig   |  12 +
>>>>   drivers/pci/hotplug/Makefile  |   3 +
>>>>   drivers/pci/hotplug/pnv_php.c | 870 ++++++++++++++++++++++++++++++++++++++++++
>>>>   3 files changed, 885 insertions(+)
>>>>   create mode 100644 drivers/pci/hotplug/pnv_php.c
>>>>
>>>> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>>>> index df8caec..167c8ce 100644
>>>> --- a/drivers/pci/hotplug/Kconfig
>>>> +++ b/drivers/pci/hotplug/Kconfig
>>>> @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>>>>
>>>>   	  When in doubt, say N.
>>>>
>>>> +config HOTPLUG_PCI_POWERNV
>>>> +	tristate "PowerPC PowerNV PCI Hotplug driver"
>>>> +	depends on PPC_POWERNV && EEH
>>>> +	help
>>>> +	  Say Y here if you run PowerPC PowerNV platform that supports
>>>> +	  PCI Hotplug
>>>> +
>>>> +	  To compile this driver as a module, choose M here: the
>>>> +	  module will be called pnv-php.
>>>> +
>>>> +	  When in doubt, say N.
>>>> +
>>>>   config HOTPLUG_PCI_RPA
>>>>   	tristate "RPA PCI Hotplug driver"
>>>>   	depends on PPC_PSERIES && EEH
>>>> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>>> index b616e75..e33cdda 100644
>>>> --- a/drivers/pci/hotplug/Makefile
>>>> +++ b/drivers/pci/hotplug/Makefile
>>>> @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>>>>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>>>>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>>>>   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>>>> +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>>>>   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>>>>   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>>>>   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>>>> @@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>>>>   acpiphp-objs		:=	acpiphp_core.o	\
>>>>   				acpiphp_glue.o
>>>>
>>>> +pnv-php-objs		:=	pnv_php.o
>>>> +
>>>>   rpaphp-objs		:=	rpaphp_core.o	\
>>>>   				rpaphp_pci.o	\
>>>>   				rpaphp_slot.o
>>>> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>>>> new file mode 100644
>>>> index 0000000..364ec36
>>>> --- /dev/null
>>>> +++ b/drivers/pci/hotplug/pnv_php.c
>>>> @@ -0,0 +1,870 @@
>>>> +/*
>>>> + * PCI Hotplug Driver for PowerPC PowerNV platform.
>>>> + *
>>>> + * Copyright Gavin Shan, IBM Corporation 2015.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License as published by
>>>> + * the Free Software Foundation; either version 2 of the License, or
>>>> + * (at your option) any later version.
>>>> + */
>>>> +
>>>> +#include <linux/libfdt.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/pci_hotplug.h>
>>>> +
>>>> +#include <asm/opal.h>
>>>> +#include <asm/pnv-pci.h>
>>>> +#include <asm/ppc-pci.h>
>>>> +
>>>> +#define DRIVER_VERSION	"0.1"
>>>> +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>>>> +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>>>> +
>>>> +struct pnv_php_slot {
>>>> +	struct hotplug_slot		slot;
>>>> +	struct hotplug_slot_info	slot_info;
>>>> +	uint64_t			id;
>>>> +	char				*name;
>>>> +	int				slot_no;
>>>> +	struct kref			kref;
>>>> +#define PNV_PHP_STATE_INITIALIZED	0
>>>> +#define PNV_PHP_STATE_REGISTERED	1
>>>> +#define PNV_PHP_STATE_POPULATED		2
>>>> +	int				state;
>>>> +	struct device_node		*dn;
>>>> +	struct pci_dev			*pdev;
>>>> +	struct pci_bus			*bus;
>>>> +	bool				power_state_check;
>>>> +	int				power_state_confirmed;
>>>> +#define PNV_PHP_POWER_CONFIRMED_INVALID	0
>>>> +#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
>>>> +#define PNV_PHP_POWER_CONFIRMED_FAIL	2
>>>> +	struct opal_msg			*msg;
>>>> +	void				*fdt;
>>>> +	void				*dt;
>>>> +	struct of_changeset		ocs;
>>>> +	struct work_struct		work;
>>>> +	wait_queue_head_t		queue;
>>>> +	struct pnv_php_slot		*parent;
>>>> +	struct list_head		children;
>>>> +	struct list_head		link;
>>>> +};
>>>> +
>>>> +static LIST_HEAD(pnv_php_slot_list);
>>>> +static DEFINE_SPINLOCK(pnv_php_lock);
>>>> +
>>>> +static void pnv_php_register(struct device_node *dn);
>>>> +static void pnv_php_unregister_one(struct device_node *dn);
>>>> +static void pnv_php_unregister(struct device_node *dn);
>>>
>>>
>>> The names confused me. I'd suggest pnv_php_scan(), pnv_php_unregister(),
>>> pnv_php_unregister_children() instead.
>>>
>>>
>>> Alistair, what do you reckon?
>>
>> To be honest I'm not sure the new names are necessarily any less confusing. I
>> will admit to having to read that code twice though so perhaps a short comment
>> describing what each of those functions does would be the best method for
>> reducing confusion.
>
> Alexey, please confirm whether I need to rename those functions, though
> I don't understand the confusion caused by the function names.

Just add the comments.

I got confused because:

pnv_php_register() walks through the nodes looking for "ibm,slot-pluggable",
so it is a "scan" rather than a "register" (registration may not happen at
all if the property is not there).

pnv_php_register_one() registers one what? A slot? From the name I conclude
that it is not a slot, as there is pnv_php_register_slot(), which does
register one slot. So I suppose pnv_php_register_one() registers one _node_
(which may have multiple slots? There should be a reason why it is a
separate function). I do not know...
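
For illustration only, here is a minimal user-space sketch of the split being discussed, with the kind of one-line comments that would have removed the confusion. It assumes a toy node type: struct toy_node and every toy_* name are hypothetical, not the driver's API. The ordering (parents registered before children, children unregistered before parents) follows the comments in the quoted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct device_node; NOT the kernel type. */
struct toy_node {
	const char *name;
	bool pluggable;			/* models "ibm,slot-pluggable" */
	bool registered;
	struct toy_node *child;		/* first child */
	struct toy_node *sibling;	/* next sibling */
};

/* Event log so the ordering can be inspected afterwards. */
struct toy_event {
	char op;			/* '+' register, '-' unregister */
	const char *name;
};

#define TOY_LOG_MAX 32
static struct toy_event toy_log[TOY_LOG_MAX];
static size_t toy_log_len;

static void toy_record(char op, const char *name)
{
	if (toy_log_len < TOY_LOG_MAX) {
		toy_log[toy_log_len].op = op;
		toy_log[toy_log_len].name = name;
		toy_log_len++;
	}
}

/* Register the slot behind ONE node; nodes that are not pluggable are skipped. */
static int toy_register_one(struct toy_node *dn)
{
	if (!dn->pluggable)
		return -1;
	dn->registered = true;
	toy_record('+', dn->name);
	return 0;
}

/* Scan every descendant of dn, registering parents before their children. */
static void toy_scan(struct toy_node *dn)
{
	struct toy_node *c;

	for (c = dn->child; c; c = c->sibling) {
		toy_register_one(c);
		toy_scan(c);
	}
}

/* Unregister the slot behind ONE node, if it was registered. */
static void toy_unregister_one(struct toy_node *dn)
{
	if (!dn->registered)
		return;
	dn->registered = false;
	toy_record('-', dn->name);
}

/* Unregister every descendant of dn, children before their parents. */
static void toy_unregister_children(struct toy_node *dn)
{
	struct toy_node *c;

	for (c = dn->child; c; c = c->sibling) {
		toy_unregister_children(c);
		toy_unregister_one(c);
	}
}
```

For a PHB with a pluggable slot A that contains slot B, plus a non-pluggable bridge that contains slot C, toy_scan() logs +A, +B, +C, and toy_unregister_children() then logs -B, -A, -C.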



> [unrelated content removed]



-- 
Alexey

^ permalink raw reply	[flat|nested] 174+ messages in thread

end of thread, other threads:[~2016-05-03  0:44 UTC | newest]

Thread overview: 174+ messages
2016-02-17  3:43 [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Gavin Shan
2016-02-17  3:43 ` [PATCH v8 01/45] PCI: Add pcibios_setup_bridge() Gavin Shan
2016-02-17  3:43 ` [PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
     [not found]   ` <1455680668-23298-3-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-04-13  5:52     ` Alexey Kardashevskiy
2016-04-13  5:52       ` Alexey Kardashevskiy
2016-02-17  3:43 ` [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
2016-02-17  4:18   ` Andrew Donnellan
     [not found]   ` <1455680668-23298-4-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-04-13  5:52     ` Alexey Kardashevskiy
2016-04-13  5:52       ` Alexey Kardashevskiy
2016-04-19 23:59       ` Gavin Shan
2016-02-17  3:43 ` [PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances Gavin Shan
2016-02-17  4:38   ` Andrew Donnellan
2016-02-17  3:43 ` [PATCH v8 06/45] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
2016-04-13  5:56   ` Alexey Kardashevskiy
2016-02-17  3:43 ` [PATCH v8 07/45] powerpc/powernv: Rename PE# " Gavin Shan
2016-04-13  5:57   ` Alexey Kardashevskiy
2016-02-17  3:43 ` [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap Gavin Shan
2016-04-13  6:21   ` Alexey Kardashevskiy
2016-04-13  7:53     ` Gavin Shan
2016-04-13  7:53       ` Gavin Shan
2016-04-13  9:53       ` Alexey Kardashevskiy
2016-02-17  3:43 ` [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
2016-04-13  6:45   ` Alexey Kardashevskiy
2016-04-20  0:04     ` Gavin Shan
2016-02-17  3:43 ` [PATCH v8 10/45] powerpc/powernv: IO and M32 mapping based on PCI device resources Gavin Shan
2016-02-17  3:43 ` [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption Gavin Shan
2016-04-13  7:09   ` Alexey Kardashevskiy
2016-04-20  0:05     ` Gavin Shan
2016-02-17  3:43 ` [PATCH v8 12/45] powerpc/powernv: Rename M64 related functions Gavin Shan
     [not found]   ` <1455680668-23298-13-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-04-13  7:20     ` Alexey Kardashevskiy
2016-04-13  7:20       ` Alexey Kardashevskiy
2016-02-17  3:43 ` [PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe() Gavin Shan
2016-04-13  7:36   ` Alexey Kardashevskiy
2016-02-17  3:43 ` [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list Gavin Shan
2016-04-13  8:59   ` Alexey Kardashevskiy
2016-04-20  0:34     ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity Gavin Shan
2016-04-19  2:02   ` Alexey Kardashevskiy
2016-04-20  0:52     ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 19/45] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
2016-04-19  2:50   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order Gavin Shan
2016-04-19  3:07   ` Alexey Kardashevskiy
2016-04-20  1:04     ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
2016-04-19  4:16   ` Alexey Kardashevskiy
2016-04-20  1:12     ` Gavin Shan
2016-04-20  3:00       ` Alexey Kardashevskiy
2016-04-20  3:35         ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table Gavin Shan
2016-04-19  4:28   ` Alexey Kardashevskiy
2016-04-20  1:15     ` Gavin Shan
2016-04-20  3:17       ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 23/45] powerpc/powernv: Dynamically release PEs Gavin Shan
2016-04-19  5:19   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices() Gavin Shan
2016-02-17  3:44   ` [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
2016-04-19  5:28   ` [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices() Alexey Kardashevskiy
2016-04-20  1:23     ` Gavin Shan
2016-04-20  3:21       ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 25/45] powerpc/pci: Rename pcibios_find_pci_bus() Gavin Shan
2016-04-19  5:31   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 26/45] powerpc/pci: Move pci_find_bus_by_node() around Gavin Shan
2016-02-17  3:44 ` [PATCH v8 27/45] powerpc/pci: Export pci_add_device_node_info() Gavin Shan
2016-04-19  5:35   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info() Gavin Shan
2016-04-19  5:48   ` Alexey Kardashevskiy
2016-04-20  1:25     ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
     [not found]   ` <1455680668-23298-30-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-04-19  5:51     ` Alexey Kardashevskiy
2016-04-19  5:51       ` Alexey Kardashevskiy
2016-04-20  1:27       ` Gavin Shan
2016-04-20  3:39         ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 30/45] powerpc/pci: Delay populating pdn Gavin Shan
2016-04-19  8:19   ` Alexey Kardashevskiy
2016-04-20  2:13     ` Gavin Shan
2016-04-20  3:54       ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 31/45] powerpc/pci: Don't scan empty slot Gavin Shan
2016-04-19  8:19   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 32/45] powerpc/pci: Update bridge windows on PCI plug Gavin Shan
2016-04-19  8:47   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
2016-02-17  4:35   ` Andrew Donnellan
2016-04-19  8:49   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 34/45] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
2016-04-19  8:57   ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 35/45] powerpc/powernv: Fundamental reset " Gavin Shan
     [not found]   ` <1455680668-23298-36-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-04-19  9:04     ` Alexey Kardashevskiy
2016-04-19  9:04       ` Alexey Kardashevskiy
2016-04-20  1:36       ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID Gavin Shan
     [not found]   ` <1455680668-23298-37-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-04-19  9:28     ` Alexey Kardashevskiy
2016-04-19  9:28       ` Alexey Kardashevskiy
2016-04-20  2:28       ` Gavin Shan
2016-04-20  4:14         ` Alexey Kardashevskiy
2016-04-22  4:23           ` Alistair Popple
2016-02-17  3:44 ` [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure Gavin Shan
2016-04-19  9:34   ` Alexey Kardashevskiy
2016-04-20  2:33     ` Gavin Shan
2016-04-20  4:17       ` Alexey Kardashevskiy
2016-02-17  3:44 ` [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
2016-04-19  9:42   ` Alexey Kardashevskiy
2016-04-20  2:38     ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 40/45] drivers/of: Split unflatten_dt_node() Gavin Shan
2016-02-17 14:30   ` Rob Herring
2016-04-20  2:38     ` Gavin Shan
2016-05-02  2:02     ` Gavin Shan
2016-02-17  3:44 ` [PATCH v8 41/45] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
2016-02-17 14:53   ` Rob Herring
2016-02-17 14:53     ` Rob Herring
2016-02-17  3:44 ` [PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
2016-02-17 15:00   ` Rob Herring
2016-02-17 15:58   ` Jyri Sarha
2016-02-17 15:58     ` Jyri Sarha
2016-02-17  3:44 ` [PATCH v8 44/45] drivers/of: Return allocated memory from of_fdt_unflatten_tree() Gavin Shan
     [not found] ` <1455680668-23298-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-02-17  3:43   ` [PATCH v8 05/45] powerpc/powernv: Drop phb->bdfn_to_pe() Gavin Shan
2016-02-17  3:43     ` Gavin Shan
2016-04-13  5:53     ` Alexey Kardashevskiy
2016-02-17  3:43   ` [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC Gavin Shan
2016-02-17  3:43     ` Gavin Shan
2016-04-13  7:47     ` Alexey Kardashevskiy
2016-04-20  0:22       ` Gavin Shan
2016-04-20  2:55         ` Alexey Kardashevskiy
2016-02-17  3:43   ` [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE Gavin Shan
2016-02-17  3:43     ` Gavin Shan
2016-04-13  8:29     ` Alexey Kardashevskiy
2016-04-13 23:54       ` Gavin Shan
2016-04-14  3:36         ` Alexey Kardashevskiy
2016-04-20  0:25           ` Gavin Shan
2016-02-17  3:44   ` [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track Gavin Shan
2016-02-17  3:44     ` Gavin Shan
2016-04-19  1:50     ` Alexey Kardashevskiy
2016-04-20  0:49       ` Gavin Shan
2016-04-20  5:10         ` Alexey Kardashevskiy
2016-02-17  3:44   ` [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status Gavin Shan
2016-02-17  3:44     ` Gavin Shan
2016-04-19  9:39     ` Alexey Kardashevskiy
2016-04-20  2:36       ` Gavin Shan
2016-04-20  4:25         ` Alexey Kardashevskiy
2016-02-17  3:44   ` [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node() Gavin Shan
2016-02-17  3:44     ` Gavin Shan
     [not found]     ` <1455680668-23298-43-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-02-17 14:59       ` Rob Herring
2016-02-17 14:59         ` Rob Herring
2016-02-19  3:16         ` Gavin Shan
2016-03-02  2:40           ` Rob Herring
2016-03-02  2:40             ` Rob Herring
2016-03-08  0:56             ` Gavin Shan
2016-03-17 13:31               ` Rob Herring
2016-03-17 22:44                 ` Gavin Shan
2016-02-17  3:44   ` [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
2016-02-17  3:44     ` Gavin Shan
2016-04-15  0:47     ` Alistair Popple
2016-04-15  1:39       ` Gavin Shan
     [not found]     ` <1455680668-23298-46-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-04-19 10:36       ` Alexey Kardashevskiy
2016-04-19 10:36         ` Alexey Kardashevskiy
2016-04-20  1:55         ` Alistair Popple
2016-05-02 23:41           ` Gavin Shan
2016-05-03  0:44             ` Alexey Kardashevskiy
2016-05-03  0:44               ` Alexey Kardashevskiy
2016-05-02  3:44         ` Gavin Shan
2016-05-02  6:11           ` Alexey Kardashevskiy
2016-05-02 23:38             ` Gavin Shan
2016-04-13  7:28 ` [PATCH v8 00/45] powerpc/powernv: PCI hotplug support Alexey Kardashevskiy
2016-04-13  7:42   ` Gavin Shan
2016-04-13  9:14     ` Alexey Kardashevskiy
2016-04-13  9:14       ` Alexey Kardashevskiy
2016-04-13 23:42       ` Gavin Shan
2016-04-13 23:57         ` Alistair Popple
2016-04-14  1:30           ` Gavin Shan
2016-04-14  3:38             ` Alexey Kardashevskiy
2016-04-15 16:10             ` Rob Herring
2016-04-20  2:40               ` Gavin Shan
2016-04-14  3:26         ` Alexey Kardashevskiy
2016-04-14  5:25           ` Gavin Shan
