This is the linux part of the work to use the PCI hotplug framework to control an opencapi card so that it can be reset and re-read after flashing a new FPGA image. I had posted it earlier as an RFC and this version is mostly similar, with just some minor editing. It needs support in skiboot: http://patchwork.ozlabs.org/project/skiboot/list/?series=129724 On an old skiboot, it will do nothing. A virtual PCI slot is created for the opencapi adapter, and its state can be controlled through the pnv-php hotplug driver: echo 0|1 > /sys/bus/pci/slots/OPENCAPI-<...>/power Note that the power to the card is not really turned off, as the card needs to stay on to be flashed with a new image. Instead the card is in reset. The first part of the series mostly deals with the pci/ioda state, as the opencapi devices can now go away and the state needs to be cleaned up. The second part is modifications to the PCI hotplug driver on powernv, so that a virtual slot is created for the opencapi adapters found in the device tree. Frederic Barrat (11): powerpc/powernv/ioda: Fix ref count for devices with their own PE powerpc/powernv/ioda: Protect PE list powerpc/powernv/ioda: set up PE on opencapi device when enabling powerpc/powernv/ioda: Release opencapi device powerpc/powernv/ioda: Find opencapi slot for a device node pci/hotplug/pnv-php: Remove erroneous warning pci/hotplug/pnv-php: Improve error msg on power state change failure pci/hotplug/pnv-php: Register opencapi slots pci/hotplug/pnv-php: Relax check when disabling slot pci/hotplug/pnv-php: Wrap warnings in macro ocxl: Add PCI hotplug dependency to Kconfig arch/powerpc/include/asm/pnv-pci.h | 1 + arch/powerpc/platforms/powernv/pci-ioda.c | 107 ++++++++++++++-------- arch/powerpc/platforms/powernv/pci.c | 10 +- drivers/misc/ocxl/Kconfig | 1 + drivers/pci/hotplug/pnv_php.c | 82 ++++++++++------- 5 files changed, 125 insertions(+), 76 deletions(-) -- 2.21.0
Taking a reference on the pci_dev structure was required with initial commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under OPAL v2"), where we were storing the pci_dev in the pci_dn structure. However, the pci_dev was later removed from the pci_dn structure, but the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn"). The pnv_ioda_pe structure life cycle is the same as the pci_dev structure, the PE is freed when the device is released. So we don't need a reference for the pci_dev stored in the PE, otherwise the pci_dev will never be released. Which is not really a surprise as the comment (removed here as no longer needed) was stating as much. Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- arch/powerpc/platforms/powernv/pci-ioda.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index d8080558d020..92767f006f20 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) return NULL; } - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the - * pointer in the PE data structure, both should be destroyed at the - * same time. However, this needs to be looked at more closely again - * once we actually start removing things (Hotplug, SR-IOV, ...) - * - * At some point we want to remove the PDN completely anyways - */ - pci_dev_get(dev); pdn->pe_number = pe->pe_number; pe->flags = PNV_IODA_PE_DEV; pe->pdev = dev; @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) pnv_ioda_free_pe(pe); pdn->pe_number = IODA_INVALID_PE; pe->pdev = NULL; - pci_dev_put(dev); return NULL; } @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev) */ dev_info(&npu_pdev->dev, "Associating to existing PE %x\n", pe_num); - pci_dev_get(npu_pdev); + pci_dev_get(npu_pdev); // still needed after 902bdc57451c ? npu_pdn = pci_get_pdn(npu_pdev); rid = npu_pdev->bus->number << 8 | npu_pdn->devfn; npu_pdn->pe_number = pe_num; -- 2.21.0
Protect the PHB's list of PE. Probably not needed as long as it was populated during PHB creation, but it feels right and will become required once we can add/remove opencapi devices on hotplug. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 92767f006f20..3dbbf5365c1c 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1080,8 +1080,9 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) } /* Put PE to the list */ + mutex_lock(&phb->ioda.pe_list_mutex); list_add_tail(&pe->list, &phb->ioda.pe_list); - + mutex_unlock(&phb->ioda.pe_list_mutex); return pe; } @@ -3513,7 +3514,10 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe) struct pnv_phb *phb = pe->phb; struct pnv_ioda_pe *slave, *tmp; + mutex_lock(&phb->ioda.pe_list_mutex); list_del(&pe->list); + mutex_unlock(&phb->ioda.pe_list_mutex); + switch (phb->type) { case PNV_PHB_IODA1: pnv_pci_ioda1_release_pe_dma(pe); -- 2.21.0
The PE for an opencapi device was set as part of a late PHB fixup operation, when creating the PHB. To use the PCI hotplug framework, this is not going to work, as the PHB stays the same, it's only the devices underneath which are updated. For regular PCI devices, it is done as part of the reconfiguration of the bridge, but for opencapi PHBs, we don't have an intermediate bridge. So let's define the PE when the device is enabled. PEs are meaningless for opencapi, the NPU doesn't define them and opal is not doing anything with them. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- arch/powerpc/platforms/powernv/pci-ioda.c | 31 +++++++++++++++++------ 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 3dbbf5365c1c..06ce7ddaa0cf 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1260,8 +1260,6 @@ static void pnv_pci_ioda_setup_PEs(void) { struct pci_controller *hose; struct pnv_phb *phb; - struct pci_bus *bus; - struct pci_dev *pdev; struct pnv_ioda_pe *pe; list_for_each_entry(hose, &hose_list, list_node) { @@ -1273,11 +1271,6 @@ static void pnv_pci_ioda_setup_PEs(void) if (phb->model == PNV_PHB_MODEL_NPU2) WARN_ON_ONCE(pnv_npu2_init(hose)); } - if (phb->type == PNV_PHB_NPU_OCAPI) { - bus = hose->bus; - list_for_each_entry(pdev, &bus->devices, bus_list) - pnv_ioda_setup_dev_PE(pdev); - } } list_for_each_entry(hose, &hose_list, list_node) { phb = hose->private_data; @@ -3385,6 +3378,28 @@ static bool pnv_pci_enable_device_hook(struct pci_dev *dev) return true; } +static bool pnv_ocapi_enable_device_hook(struct pci_dev *dev) +{ + struct pci_controller *hose = pci_bus_to_host(dev->bus); + struct pnv_phb *phb = hose->private_data; + struct pci_dn *pdn; + struct pnv_ioda_pe *pe; + + if (!phb->initialized) + return true; + + pdn = pci_get_pdn(dev); + if (!pdn) + return false; + + if (pdn->pe_number == IODA_INVALID_PE) { + pe = pnv_ioda_setup_dev_PE(dev); + if (!pe) + return false; + } + return true; +} + static long pnv_pci_ioda1_unset_window(struct iommu_table_group *table_group, int num) { @@ -3625,7 +3640,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = { }; static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = { - .enable_device_hook = pnv_pci_enable_device_hook, + .enable_device_hook = pnv_ocapi_enable_device_hook, .window_alignment = pnv_pci_window_alignment, .reset_secondary_bus = pnv_pci_reset_secondary_bus, .shutdown = pnv_pci_ioda_shutdown, -- 2.21.0
With hotplug, an opencapi device can now go away. It needs to be released, mostly to clean up its PE state. We were previously not defining any device callback. We can reuse the standard PCI release callback, it does a bit too much for an opencapi device, but it's harmless, and only needs minor tuning. Also separate the undo of the PELT-V code in a separate function, it is not needed for NPU devices and it improves a bit the readability of the code. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- arch/powerpc/platforms/powernv/pci-ioda.c | 59 +++++++++++++++-------- 1 file changed, 39 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 06ce7ddaa0cf..e5895c05efae 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -188,7 +188,7 @@ static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe) unsigned int pe_num = pe->pe_number; WARN_ON(pe->pdev); - WARN_ON(pe->npucomp); /* NPUs are not supposed to be freed */ + WARN_ON(pe->npucomp); /* NPUs for nvlink are not supposed to be freed */ kfree(pe->npucomp); memset(pe, 0, sizeof(struct pnv_ioda_pe)); clear_bit(pe_num, phb->ioda.pe_alloc); @@ -777,6 +777,34 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, return 0; } +static void pnv_ioda_unset_peltv(struct pnv_phb *phb, + struct pnv_ioda_pe *pe, + struct pci_dev *parent) +{ + int64_t rc; + + while (parent) { + struct pci_dn *pdn = pci_get_pdn(parent); + + if (pdn && pdn->pe_number != IODA_INVALID_PE) { + rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number, + pe->pe_number, + OPAL_REMOVE_PE_FROM_DOMAIN); + /* XXX What to do in case of error ? */ + } + parent = parent->bus->self; + } + + opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number, + OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); + + /* Disassociate PE in PELT */ + rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number, + pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN); + if (rc) + pe_warn(pe, "OPAL error %lld remove self from PELTV\n", rc); +} + static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe) { struct pci_dev *parent; @@ -827,25 +855,13 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe) for (rid = pe->rid; rid < rid_end; rid++) phb->ioda.pe_rmap[rid] = IODA_INVALID_PE; - /* Release from all parents PELT-V */ - while (parent) { - struct pci_dn *pdn = pci_get_pdn(parent); - if (pdn && pdn->pe_number != IODA_INVALID_PE) { - rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number, - pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN); - /* XXX What to do in case of error ? */ - } - parent = parent->bus->self; - } - - opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number, - OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); + /* + * Release from all parents PELT-V. NPUs don't have a PELTV + * table + */ + if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != PNV_PHB_NPU_OCAPI) + pnv_ioda_unset_peltv(phb, pe, parent); - /* Disassociate PE in PELT */ - rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number, - pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN); - if (rc) - pe_warn(pe, "OPAL error %lld remove self from PELTV\n", rc); rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid, bcomp, dcomp, fcomp, OPAL_UNMAP_PE); if (rc) @@ -3540,6 +3556,8 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe) case PNV_PHB_IODA2: pnv_pci_ioda2_release_pe_dma(pe); break; + case PNV_PHB_NPU_OCAPI: + break; default: WARN_ON(1); } @@ -3592,7 +3610,7 @@ static void pnv_pci_release_device(struct pci_dev *pdev) pe = &phb->ioda.pe_array[pdn->pe_number]; pdn->pe_number = IODA_INVALID_PE; - WARN_ON(--pe->device_count < 0); + WARN_ON((pe->flags != PNV_IODA_PE_DEV) && (--pe->device_count < 0)); if (pe->device_count == 0) pnv_ioda_release_pe(pe); } @@ -3641,6 +3659,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = { static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = { .enable_device_hook = pnv_ocapi_enable_device_hook, + .release_device = pnv_pci_release_device, .window_alignment = pnv_pci_window_alignment, .reset_secondary_bus = pnv_pci_reset_secondary_bus, .shutdown = pnv_pci_ioda_shutdown, -- 2.21.0
Unlike real PCI slots, opencapi slots are directly associated to the (virtual) opencapi PHB, there's no intermediate bridge. So when looking for a slot ID, we must start the search from the device node itself and not its parent. Also, the slot ID is not attached to a specific bdfn, so let's build it from the PHB ID, like skiboot. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- arch/powerpc/include/asm/pnv-pci.h | 1 + arch/powerpc/platforms/powernv/pci.c | 10 +++++++--- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h index edcb1fc50aeb..d0ee0ede5767 100644 --- a/arch/powerpc/include/asm/pnv-pci.h +++ b/arch/powerpc/include/asm/pnv-pci.h @@ -15,6 +15,7 @@ #define PCI_SLOT_ID_PREFIX (1UL << 63) #define PCI_SLOT_ID(phb_id, bdfn) \ (PCI_SLOT_ID_PREFIX | ((uint64_t)(bdfn) << 16) | (phb_id)) +#define PCI_PHB_SLOT_ID(phb_id) (phb_id) extern int pnv_pci_get_slot_id(struct device_node *np, uint64_t *id); extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len); diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index 6104418c9ad5..00a79f3c989f 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -48,13 +48,14 @@ int pnv_pci_get_slot_id(struct device_node *np, uint64_t *id) return -ENXIO; bdfn = ((bdfn & 0x00ffff00) >> 8); - while ((parent = of_get_parent(parent))) { + for (parent = np; parent; parent = of_get_parent(parent)) { if (!PCI_DN(parent)) { of_node_put(parent); break; } - if (!of_device_is_compatible(parent, "ibm,ioda2-phb")) { + if (!of_device_is_compatible(parent, "ibm,ioda2-phb") && + !of_device_is_compatible(parent, "ibm,ioda2-npu2-opencapi-phb")) { of_node_put(parent); continue; } @@ -65,7 +66,10 @@ int pnv_pci_get_slot_id(struct device_node *np, uint64_t *id) return -ENXIO; } - *id = PCI_SLOT_ID(phbid, bdfn); + if (of_device_is_compatible(parent, "ibm,ioda2-npu2-opencapi-phb")) + *id = PCI_PHB_SLOT_ID(phbid); + else + *id = PCI_SLOT_ID(phbid, bdfn); return 0; } -- 2.21.0
On powernv, when removing a device through hotplug, the following warning is logged: Invalid refcount <.> on <...> It may be incorrect, the refcount may be set to a higher value than 1 and be valid. of_detach_node() can drop more than one reference. As it doesn't seem trivial to assert the correct value, let's remove the warning. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- drivers/pci/hotplug/pnv_php.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index 6758fd7c382e..5b5cbf1e636d 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -151,17 +151,11 @@ static void pnv_php_rmv_pdns(struct device_node *dn) static void pnv_php_detach_device_nodes(struct device_node *parent) { struct device_node *dn; - int refcount; for_each_child_of_node(parent, dn) { pnv_php_detach_device_nodes(dn); of_node_put(dn); - refcount = kref_read(&dn->kobj.kref); - if (refcount != 1) - pr_warn("Invalid refcount %d on <%pOF>\n", - refcount, dn); - of_detach_node(dn); } } -- 2.21.0
When changing the slot state, if opal hits an error and tells as such in the asynchronous reply, the warning "Wrong msg" is logged, which is rather confusing. Instead we can reuse the better message which is already used when we couldn't submit the asynchronous opal request initially. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- drivers/pci/hotplug/pnv_php.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index 5b5cbf1e636d..304bdbcdb77c 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -336,18 +336,19 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot, ret = pnv_pci_set_power_state(php_slot->id, state, &msg); if (ret > 0) { if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle || - be64_to_cpu(msg.params[2]) != state || - be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) { + be64_to_cpu(msg.params[2]) != state) { pci_warn(php_slot->pdev, "Wrong msg (%lld, %lld, %lld)\n", be64_to_cpu(msg.params[1]), be64_to_cpu(msg.params[2]), be64_to_cpu(msg.params[3])); return -ENOMSG; } + if (be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) { + ret = -ENODEV; + goto error; + } } else if (ret < 0) { - pci_warn(php_slot->pdev, "Error %d powering %s\n", - ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off"); - return ret; + goto error; } if (state == OPAL_PCI_SLOT_POWER_OFF || state == OPAL_PCI_SLOT_OFFLINE) @@ -356,6 +357,11 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot, ret = pnv_php_add_devtree(php_slot); return ret; + +error: + pci_warn(php_slot->pdev, "Error %d powering %s\n", + ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off"); + return ret; } EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state); -- 2.21.0
Add the opencapi PHBs to the list of PHBs being scanned to look for slots. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- drivers/pci/hotplug/pnv_php.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index 304bdbcdb77c..f0a7360154e7 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -954,7 +954,8 @@ static int __init pnv_php_init(void) pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); for_each_compatible_node(dn, NULL, "ibm,ioda2-phb") pnv_php_register(dn); - + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb") + pnv_php_register_one(dn); return 0; } @@ -964,6 +965,8 @@ static void __exit pnv_php_exit(void) for_each_compatible_node(dn, NULL, "ibm,ioda2-phb") pnv_php_unregister(dn); + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb") + pnv_php_unregister(dn); } module_init(pnv_php_init); -- 2.21.0
The driver only allows to disable a slot in the POPULATED state. However, if an error occurs while enabling the slot, say because the link couldn't be trained, then the POPULATED state may not be reached, yet the power state of the slot is on. So allow to disable a slot in the REGISTERED state. Removing the devices will do nothing since it's not populated, and we'll set the power state of the slot back to off. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- drivers/pci/hotplug/pnv_php.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index f0a7360154e7..5ca51d67db4b 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -523,7 +523,13 @@ static int pnv_php_disable_slot(struct hotplug_slot *slot) struct pnv_php_slot *php_slot = to_pnv_php_slot(slot); int ret; - if (php_slot->state != PNV_PHP_STATE_POPULATED) + /* + * Allow to disable a slot already in the registered state to + * cover cases where the slot couldn't be enabled and never + * reached the populated state + */ + if (php_slot->state != PNV_PHP_STATE_POPULATED && + php_slot->state != PNV_PHP_STATE_REGISTERED) return 0; /* Remove all devices behind the slot */ -- 2.21.0
An opencapi slot doesn't have an associated bridge device. It's not needed for operation, but any warning is displayed through pci_warn() which uses the pci_dev struct of the assocated bridge device. So wrap those warning so that a different trace mechanism can be used if it's an opencapi slot. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- drivers/pci/hotplug/pnv_php.c | 51 +++++++++++++++++++---------------- 1 file changed, 28 insertions(+), 23 deletions(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index 5ca51d67db4b..d01a8595bc5c 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -18,6 +18,9 @@ #define DRIVER_AUTHOR "Gavin Shan, IBM Corporation" #define DRIVER_DESC "PowerPC PowerNV PCI Hotplug Driver" +#define SLOT_WARN(sl, x...) \ + ((sl)->pdev ? pci_warn((sl)->pdev, x) : dev_warn(&(sl)->bus->dev, x)) + struct pnv_php_event { bool added; struct pnv_php_slot *php_slot; @@ -265,7 +268,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot) ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000); if (ret) { - pci_warn(php_slot->pdev, "Error %d getting FDT blob\n", ret); + SLOT_WARN(php_slot, "Error %d getting FDT blob\n", ret); goto free_fdt1; } @@ -279,7 +282,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot) dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL); if (!dt) { ret = -EINVAL; - pci_warn(php_slot->pdev, "Cannot unflatten FDT\n"); + SLOT_WARN(php_slot, "Cannot unflatten FDT\n"); goto free_fdt; } @@ -289,15 +292,15 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot) ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn); if (ret) { pnv_php_reverse_nodes(php_slot->dn); - pci_warn(php_slot->pdev, "Error %d populating changeset\n", - ret); + SLOT_WARN(php_slot, "Error %d populating changeset\n", + ret); goto free_dt; } php_slot->dn->child = NULL; ret = of_changeset_apply(&php_slot->ocs); if (ret) { - pci_warn(php_slot->pdev, "Error %d applying changeset\n", ret); + SLOT_WARN(php_slot, "Error %d applying changeset\n", ret); goto destroy_changeset; } @@ -337,10 +340,10 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot, if (ret > 0) { if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle || be64_to_cpu(msg.params[2]) != state) { - pci_warn(php_slot->pdev, "Wrong msg (%lld, %lld, %lld)\n", - be64_to_cpu(msg.params[1]), - be64_to_cpu(msg.params[2]), - be64_to_cpu(msg.params[3])); + SLOT_WARN(php_slot, "Wrong msg (%lld, %lld, %lld)\n", + be64_to_cpu(msg.params[1]), + be64_to_cpu(msg.params[2]), + be64_to_cpu(msg.params[3])); return -ENOMSG; } if (be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) { @@ -359,8 +362,8 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot, return ret; error: - pci_warn(php_slot->pdev, "Error %d powering %s\n", - ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off"); + SLOT_WARN(php_slot, "Error %d powering %s\n", + ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off"); return ret; } EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state); @@ -378,8 +381,8 @@ static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state) */ ret = pnv_pci_get_power_state(php_slot->id, &power_state); if (ret) { - pci_warn(php_slot->pdev, "Error %d getting power status\n", - ret); + SLOT_WARN(php_slot, "Error %d getting power status\n", + ret); } else { *state = power_state; } @@ -402,7 +405,7 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state) *state = presence; ret = 0; } else { - pci_warn(php_slot->pdev, "Error %d getting presence\n", ret); + SLOT_WARN(php_slot, "Error %d getting presence\n", ret); } return ret; @@ -637,7 +640,7 @@ static int pnv_php_register_slot(struct pnv_php_slot *php_slot) ret = pci_hp_register(&php_slot->slot, php_slot->bus, php_slot->slot_no, php_slot->name); if (ret) { - pci_warn(php_slot->pdev, "Error %d registering slot\n", ret); + SLOT_WARN(php_slot, "Error %d registering slot\n", ret); return ret; } @@ -690,7 +693,7 @@ static int pnv_php_enable_msix(struct pnv_php_slot *php_slot) /* Enable MSIx */ ret = pci_enable_msix_exact(pdev, &entry, 1); if (ret) { - pci_warn(pdev, "Error %d enabling MSIx\n", ret); + SLOT_WARN(php_slot, "Error %d enabling MSIx\n", ret); return ret; } @@ -734,8 +737,9 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data) (sts & PCI_EXP_SLTSTA_PDC)) { ret = pnv_pci_get_presence_state(php_slot->id, &presence); if (ret) { - pci_warn(pdev, "PCI slot [%s] error %d getting presence (0x%04x), to retry the operation.\n", - php_slot->name, ret, sts); + SLOT_WARN(php_slot, + "PCI slot [%s] error %d getting presence (0x%04x), to retry the operation.\n", + php_slot->name, ret, sts); return IRQ_HANDLED; } @@ -764,8 +768,9 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data) */ event = kzalloc(sizeof(*event), GFP_ATOMIC); if (!event) { - pci_warn(pdev, "PCI slot [%s] missed hotplug event 0x%04x\n", - php_slot->name, sts); + SLOT_WARN(php_slot, + "PCI slot [%s] missed hotplug event 0x%04x\n", + php_slot->name, sts); return IRQ_HANDLED; } @@ -789,7 +794,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) /* Allocate workqueue */ php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name); if (!php_slot->wq) { - pci_warn(pdev, "Cannot alloc workqueue\n"); + SLOT_WARN(php_slot, "Cannot alloc workqueue\n"); pnv_php_disable_irq(php_slot, true); return; } @@ -813,7 +818,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) php_slot->name, php_slot); if (ret) { pnv_php_disable_irq(php_slot, true); - pci_warn(pdev, "Error %d enabling IRQ %d\n", ret, irq); + SLOT_WARN(php_slot, "Error %d enabling IRQ %d\n", ret, irq); return; } @@ -849,7 +854,7 @@ static void pnv_php_enable_irq(struct pnv_php_slot *php_slot) ret = pci_enable_device(pdev); if (ret) { - pci_warn(pdev, "Error %d enabling device\n", ret); + SLOT_WARN(php_slot, "Error %d enabling device\n", ret); return; } -- 2.21.0
The PCI hotplug framework is used to update the devices when a new image is written to the FPGA. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> --- drivers/misc/ocxl/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig index 1916fa65f2f2..2d2266c1439e 100644 --- a/drivers/misc/ocxl/Kconfig +++ b/drivers/misc/ocxl/Kconfig @@ -11,6 +11,7 @@ config OCXL tristate "OpenCAPI coherent accelerator support" depends on PPC_POWERNV && PCI && EEH select OCXL_BASE + select HOTPLUG_PCI_POWERNV default m help Select this option to enable the ocxl driver for Open -- 2.21.0
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> Taking a reference on the pci_dev structure was required with initial
> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under
> OPAL v2"), where we were storing the pci_dev in the pci_dn structure.
>
> However, the pci_dev was later removed from the pci_dn structure, but
> the reference was kept. See commit 902bdc57451c
> ("powerpc/powernv/idoa:
> Remove unnecessary pcidev from pci_dn").
>
> The pnv_ioda_pe structure life cycle is the same as the pci_dev
> structure, the PE is freed when the device is released. So we don't
> need a reference for the pci_dev stored in the PE, otherwise the
> pci_dev will never be released. Which is not really a surprise as the
> comment (removed here as no longer needed) was stating as much.
>
> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
> from pci_dn")
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 11 +----------
> 1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index d8080558d020..92767f006f20 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe
> *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> return NULL;
> }
>
> - /* NOTE: We get only one ref to the pci_dev for the pdn, not
> for the
> - * pointer in the PE data structure, both should be destroyed
> at the
> - * same time. However, this needs to be looked at more closely
> again
> - * once we actually start removing things (Hotplug, SR-IOV,
> ...)
> - *
> - * At some point we want to remove the PDN completely anyways
> - */
> - pci_dev_get(dev);
> pdn->pe_number = pe->pe_number;
> pe->flags = PNV_IODA_PE_DEV;
> pe->pdev = dev;
> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe
> *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> pnv_ioda_free_pe(pe);
> pdn->pe_number = IODA_INVALID_PE;
> pe->pdev = NULL;
> - pci_dev_put(dev);
> return NULL;
> }
>
> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe
> *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> */
> dev_info(&npu_pdev->dev,
> "Associating to existing PE %x\n",
> pe_num);
> - pci_dev_get(npu_pdev);
> + pci_dev_get(npu_pdev); // still needed after
> 902bdc57451c ?
> npu_pdn = pci_get_pdn(npu_pdev);
> rid = npu_pdev->bus->number << 8 | npu_pdn-
> >devfn;
> npu_pdn->pe_number = pe_num;
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> Protect the PHB's list of PE. Probably not needed as long as it was
> populated during PHB creation, but it feels right and will become
> required once we can add/remove opencapi devices on hotplug.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 92767f006f20..3dbbf5365c1c 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1080,8 +1080,9 @@ static struct pnv_ioda_pe
> *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> }
>
> /* Put PE to the list */
> + mutex_lock(&phb->ioda.pe_list_mutex);
> list_add_tail(&pe->list, &phb->ioda.pe_list);
> -
> + mutex_unlock(&phb->ioda.pe_list_mutex);
> return pe;
> }
>
> @@ -3513,7 +3514,10 @@ static void pnv_ioda_release_pe(struct
> pnv_ioda_pe *pe)
> struct pnv_phb *phb = pe->phb;
> struct pnv_ioda_pe *slave, *tmp;
>
> + mutex_lock(&phb->ioda.pe_list_mutex);
> list_del(&pe->list);
> + mutex_unlock(&phb->ioda.pe_list_mutex);
> +
> switch (phb->type) {
> case PNV_PHB_IODA1:
> pnv_pci_ioda1_release_pe_dma(pe);
Hmm, the ioda.pe_list_mutex muxtex exists, and is inited, but there are
no other users. It's position & naming in the struct suggests it
belongs to ioda.pe_list, rather than pnv_ioda_pe.list (as suggested by
the lock/unlock around the list del).
Do the other accessors of ioda.pe_list also need mutex protection?
pnv_ioda_setup_bus_PE()
pnv_pci_dma_bus_setup()
pnv_pci_init_ioda_phb()
pnv_pci_ioda_setup_PEs()
If not, perhaps the metux should be removed from ioda and replaced with
pe.list_mutex instead.
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> The PE for an opencapi device was set as part of a late PHB fixup
> operation, when creating the PHB. To use the PCI hotplug framework,
> this is not going to work, as the PHB stays the same, it's only the
> devices underneath which are updated. For regular PCI devices, it is
> done as part of the reconfiguration of the bridge, but for opencapi
> PHBs, we don't have an intermediate bridge. So let's define the PE
> when the device is enabled. PEs are meaningless for opencapi, the NPU
> doesn't define them and opal is not doing anything with them.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 31 +++++++++++++++++--
> ----
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 3dbbf5365c1c..06ce7ddaa0cf 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1260,8 +1260,6 @@ static void pnv_pci_ioda_setup_PEs(void)
> {
> struct pci_controller *hose;
> struct pnv_phb *phb;
> - struct pci_bus *bus;
> - struct pci_dev *pdev;
> struct pnv_ioda_pe *pe;
>
> list_for_each_entry(hose, &hose_list, list_node) {
> @@ -1273,11 +1271,6 @@ static void pnv_pci_ioda_setup_PEs(void)
> if (phb->model == PNV_PHB_MODEL_NPU2)
> WARN_ON_ONCE(pnv_npu2_init(hose));
> }
> - if (phb->type == PNV_PHB_NPU_OCAPI) {
> - bus = hose->bus;
> - list_for_each_entry(pdev, &bus->devices,
> bus_list)
> - pnv_ioda_setup_dev_PE(pdev);
> - }
> }
> list_for_each_entry(hose, &hose_list, list_node) {
> phb = hose->private_data;
> @@ -3385,6 +3378,28 @@ static bool pnv_pci_enable_device_hook(struct
> pci_dev *dev)
> return true;
> }
>
> +static bool pnv_ocapi_enable_device_hook(struct pci_dev *dev)
> +{
> + struct pci_controller *hose = pci_bus_to_host(dev->bus);
> + struct pnv_phb *phb = hose->private_data;
> + struct pci_dn *pdn;
> + struct pnv_ioda_pe *pe;
> +
> + if (!phb->initialized)
> + return true;
> +
> + pdn = pci_get_pdn(dev);
> + if (!pdn)
> + return false;
> +
> + if (pdn->pe_number == IODA_INVALID_PE) {
> + pe = pnv_ioda_setup_dev_PE(dev);
> + if (!pe)
> + return false;
> + }
> + return true;
> +}
> +
> static long pnv_pci_ioda1_unset_window(struct iommu_table_group
> *table_group,
> int num)
> {
> @@ -3625,7 +3640,7 @@ static const struct pci_controller_ops
> pnv_npu_ioda_controller_ops = {
> };
>
> static const struct pci_controller_ops
> pnv_npu_ocapi_ioda_controller_ops = {
> - .enable_device_hook = pnv_pci_enable_device_hook,
> + .enable_device_hook = pnv_ocapi_enable_device_hook,
> .window_alignment = pnv_pci_window_alignment,
> .reset_secondary_bus = pnv_pci_reset_secondary_bus,
> .shutdown = pnv_pci_ioda_shutdown,
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote: > With hotplug, an opencapi device can now go away. It needs to be > released, mostly to clean up its PE state. We were previously not > defining any device callback. We can reuse the standard PCI release > callback, it does a bit too much for an opencapi device, but it's > harmless, and only needs minor tuning. > > Also separate the undo of the PELT-V code in a separate function, it > is not needed for NPU devices and it improves a bit the readability > of > the code. > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> > --- > arch/powerpc/platforms/powernv/pci-ioda.c | 59 +++++++++++++++---- > ---- > 1 file changed, 39 insertions(+), 20 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index 06ce7ddaa0cf..e5895c05efae 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -188,7 +188,7 @@ static void pnv_ioda_free_pe(struct pnv_ioda_pe > *pe) > unsigned int pe_num = pe->pe_number; > > WARN_ON(pe->pdev); > - WARN_ON(pe->npucomp); /* NPUs are not supposed to be freed */ > + WARN_ON(pe->npucomp); /* NPUs for nvlink are not supposed to be > freed */ > kfree(pe->npucomp); > memset(pe, 0, sizeof(struct pnv_ioda_pe)); > clear_bit(pe_num, phb->ioda.pe_alloc); > @@ -777,6 +777,34 @@ static int pnv_ioda_set_peltv(struct pnv_phb > *phb, > return 0; > } > > +static void pnv_ioda_unset_peltv(struct pnv_phb *phb, > + struct pnv_ioda_pe *pe, > + struct pci_dev *parent) > +{ > + int64_t rc; > + > + while (parent) { > + struct pci_dn *pdn = pci_get_pdn(parent); > + > + if (pdn && pdn->pe_number != IODA_INVALID_PE) { > + rc = opal_pci_set_peltv(phb->opal_id, pdn- > >pe_number, > + pe->pe_number, > + OPAL_REMOVE_PE_FROM_DOM > AIN); > + /* XXX What to do in case of error ? */ Can we take the opportunity to address this comment? > + } > + parent = parent->bus->self; > + } > + > + opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number, > + OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); > + > + /* Disassociate PE in PELT */ > + rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number, > + pe->pe_number, > OPAL_REMOVE_PE_FROM_DOMAIN); > + if (rc) > + pe_warn(pe, "OPAL error %lld remove self from PELTV\n", > rc); > +} > + > static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct > pnv_ioda_pe *pe) > { > struct pci_dev *parent; > @@ -827,25 +855,13 @@ static int pnv_ioda_deconfigure_pe(struct > pnv_phb *phb, struct pnv_ioda_pe *pe) > for (rid = pe->rid; rid < rid_end; rid++) > phb->ioda.pe_rmap[rid] = IODA_INVALID_PE; > > - /* Release from all parents PELT-V */ > - while (parent) { > - struct pci_dn *pdn = pci_get_pdn(parent); > - if (pdn && pdn->pe_number != IODA_INVALID_PE) { > - rc = opal_pci_set_peltv(phb->opal_id, pdn- > >pe_number, > - pe->pe_number, > OPAL_REMOVE_PE_FROM_DOMAIN); > - /* XXX What to do in case of error ? */ > - } > - parent = parent->bus->self; > - } > - > - opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number, > - OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); > + /* > + * Release from all parents PELT-V. NPUs don't have a PELTV > + * table > + */ > + if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != > PNV_PHB_NPU_OCAPI) > + pnv_ioda_unset_peltv(phb, pe, parent); > > - /* Disassociate PE in PELT */ > - rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number, > - pe->pe_number, > OPAL_REMOVE_PE_FROM_DOMAIN); > - if (rc) > - pe_warn(pe, "OPAL error %lld remove self from PELTV\n", > rc); > rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid, > bcomp, dcomp, fcomp, OPAL_UNMAP_PE); > if (rc) > @@ -3540,6 +3556,8 @@ static void pnv_ioda_release_pe(struct > pnv_ioda_pe *pe) > case PNV_PHB_IODA2: > pnv_pci_ioda2_release_pe_dma(pe); > break; > + case PNV_PHB_NPU_OCAPI: > + break; > default: > WARN_ON(1); > } > @@ -3592,7 +3610,7 @@ static void pnv_pci_release_device(struct > pci_dev *pdev) > pe = &phb->ioda.pe_array[pdn->pe_number]; > pdn->pe_number = IODA_INVALID_PE; > > - WARN_ON(--pe->device_count < 0); > + WARN_ON((pe->flags != PNV_IODA_PE_DEV) && (--pe->device_count < > 0)); > if (pe->device_count == 0) > pnv_ioda_release_pe(pe); > } > @@ -3641,6 +3659,7 @@ static const struct pci_controller_ops > pnv_npu_ioda_controller_ops = { > > static const struct pci_controller_ops > pnv_npu_ocapi_ioda_controller_ops = { > .enable_device_hook = pnv_ocapi_enable_device_hook, > + .release_device = pnv_pci_release_device, > .window_alignment = pnv_pci_window_alignment, > .reset_secondary_bus = pnv_pci_reset_secondary_bus, > .shutdown = pnv_pci_ioda_shutdown, -- Alastair D'Silva Open Source Developer Linux Technology Centre, IBM Australia mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> Unlike real PCI slots, opencapi slots are directly associated to
> the (virtual) opencapi PHB, there's no intermediate bridge. So when
> looking for a slot ID, we must start the search from the device node
> itself and not its parent.
>
> Also, the slot ID is not attached to a specific bdfn, so let's build
> it from the PHB ID, like skiboot.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> arch/powerpc/include/asm/pnv-pci.h | 1 +
> arch/powerpc/platforms/powernv/pci.c | 10 +++++++---
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pnv-pci.h
> b/arch/powerpc/include/asm/pnv-pci.h
> index edcb1fc50aeb..d0ee0ede5767 100644
> --- a/arch/powerpc/include/asm/pnv-pci.h
> +++ b/arch/powerpc/include/asm/pnv-pci.h
> @@ -15,6 +15,7 @@
> #define PCI_SLOT_ID_PREFIX (1UL << 63)
> #define PCI_SLOT_ID(phb_id, bdfn) \
> (PCI_SLOT_ID_PREFIX | ((uint64_t)(bdfn) << 16) | (phb_id))
> +#define PCI_PHB_SLOT_ID(phb_id) (phb_id)
>
> extern int pnv_pci_get_slot_id(struct device_node *np, uint64_t
> *id);
> extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf,
> uint64_t len);
> diff --git a/arch/powerpc/platforms/powernv/pci.c
> b/arch/powerpc/platforms/powernv/pci.c
> index 6104418c9ad5..00a79f3c989f 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -48,13 +48,14 @@ int pnv_pci_get_slot_id(struct device_node *np,
> uint64_t *id)
> return -ENXIO;
>
> bdfn = ((bdfn & 0x00ffff00) >> 8);
> - while ((parent = of_get_parent(parent))) {
> + for (parent = np; parent; parent = of_get_parent(parent)) {
> if (!PCI_DN(parent)) {
> of_node_put(parent);
> break;
> }
>
> - if (!of_device_is_compatible(parent, "ibm,ioda2-phb"))
> {
> + if (!of_device_is_compatible(parent, "ibm,ioda2-phb")
> &&
> + !of_device_is_compatible(parent, "ibm,ioda2-npu2-
> opencapi-phb")) {
> of_node_put(parent);
> continue;
> }
> @@ -65,7 +66,10 @@ int pnv_pci_get_slot_id(struct device_node *np,
> uint64_t *id)
> return -ENXIO;
> }
>
> - *id = PCI_SLOT_ID(phbid, bdfn);
> + if (of_device_is_compatible(parent, "ibm,ioda2-npu2-
> opencapi-phb"))
> + *id = PCI_PHB_SLOT_ID(phbid);
> + else
> + *id = PCI_SLOT_ID(phbid, bdfn);
> return 0;
> }
>
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> On powernv, when removing a device through hotplug, the following
> warning is logged:
>
> Invalid refcount <.> on <...>
>
> It may be incorrect, the refcount may be set to a higher value than 1
> and be valid. of_detach_node() can drop more than one reference. As
> it
> doesn't seem trivial to assert the correct value, let's remove the
> warning.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> drivers/pci/hotplug/pnv_php.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pnv_php.c
> b/drivers/pci/hotplug/pnv_php.c
> index 6758fd7c382e..5b5cbf1e636d 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -151,17 +151,11 @@ static void pnv_php_rmv_pdns(struct device_node
> *dn)
> static void pnv_php_detach_device_nodes(struct device_node *parent)
> {
> struct device_node *dn;
> - int refcount;
>
> for_each_child_of_node(parent, dn) {
> pnv_php_detach_device_nodes(dn);
>
> of_node_put(dn);
> - refcount = kref_read(&dn->kobj.kref);
> - if (refcount != 1)
> - pr_warn("Invalid refcount %d on <%pOF>\n",
> - refcount, dn);
> -
> of_detach_node(dn);
> }
> }
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> When changing the slot state, if opal hits an error and tells as such
> in the asynchronous reply, the warning "Wrong msg" is logged, which
> is
> rather confusing. Instead we can reuse the better message which is
> already used when we couldn't submit the asynchronous opal request
> initially.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> drivers/pci/hotplug/pnv_php.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pnv_php.c
> b/drivers/pci/hotplug/pnv_php.c
> index 5b5cbf1e636d..304bdbcdb77c 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -336,18 +336,19 @@ int pnv_php_set_slot_power_state(struct
> hotplug_slot *slot,
> ret = pnv_pci_set_power_state(php_slot->id, state, &msg);
> if (ret > 0) {
> if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle
> ||
> - be64_to_cpu(msg.params[2]) != state
> ||
> - be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) {
> + be64_to_cpu(msg.params[2]) != state) {
> pci_warn(php_slot->pdev, "Wrong msg (%lld,
> %lld, %lld)\n",
> be64_to_cpu(msg.params[1]),
> be64_to_cpu(msg.params[2]),
> be64_to_cpu(msg.params[3]));
> return -ENOMSG;
> }
> + if (be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) {
> + ret = -ENODEV;
> + goto error;
> + }
> } else if (ret < 0) {
> - pci_warn(php_slot->pdev, "Error %d powering %s\n",
> - ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on"
> : "off");
> - return ret;
> + goto error;
> }
>
> if (state == OPAL_PCI_SLOT_POWER_OFF || state ==
> OPAL_PCI_SLOT_OFFLINE)
> @@ -356,6 +357,11 @@ int pnv_php_set_slot_power_state(struct
> hotplug_slot *slot,
> ret = pnv_php_add_devtree(php_slot);
>
> return ret;
> +
> +error:
> + pci_warn(php_slot->pdev, "Error %d powering %s\n",
> + ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" :
> "off");
> + return ret;
> }
> EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state);
>
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> Add the opencapi PHBs to the list of PHBs being scanned to look for
> slots.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> drivers/pci/hotplug/pnv_php.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/pnv_php.c
> b/drivers/pci/hotplug/pnv_php.c
> index 304bdbcdb77c..f0a7360154e7 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -954,7 +954,8 @@ static int __init pnv_php_init(void)
> pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> pnv_php_register(dn);
> -
> + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-
> phb")
> + pnv_php_register_one(dn);
> return 0;
> }
>
> @@ -964,6 +965,8 @@ static void __exit pnv_php_exit(void)
>
> for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> pnv_php_unregister(dn);
> + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-
> phb")
> + pnv_php_unregister(dn);
> }
>
> module_init(pnv_php_init);
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> The driver only allows to disable a slot in the POPULATED
> state. However, if an error occurs while enabling the slot, say
> because the link couldn't be trained, then the POPULATED state may
> not
> be reached, yet the power state of the slot is on. So allow to
> disable
> a slot in the REGISTERED state. Removing the devices will do nothing
> since it's not populated, and we'll set the power state of the slot
> back to off.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> drivers/pci/hotplug/pnv_php.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/pnv_php.c
> b/drivers/pci/hotplug/pnv_php.c
> index f0a7360154e7..5ca51d67db4b 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -523,7 +523,13 @@ static int pnv_php_disable_slot(struct
> hotplug_slot *slot)
> struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
> int ret;
>
> - if (php_slot->state != PNV_PHP_STATE_POPULATED)
> + /*
> + * Allow to disable a slot already in the registered state to
> + * cover cases where the slot couldn't be enabled and never
> + * reached the populated state
> + */
> + if (php_slot->state != PNV_PHP_STATE_POPULATED &&
> + php_slot->state != PNV_PHP_STATE_REGISTERED)
> return 0;
>
> /* Remove all devices behind the slot */
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote:
> An opencapi slot doesn't have an associated bridge device. It's not
> needed for operation, but any warning is displayed through pci_warn()
> which uses the pci_dev struct of the assocated bridge device. So wrap
> those warning so that a different trace mechanism can be used if it's
> an opencapi slot.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> drivers/pci/hotplug/pnv_php.c | 51 +++++++++++++++++++------------
> ----
> 1 file changed, 28 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pnv_php.c
> b/drivers/pci/hotplug/pnv_php.c
> index 5ca51d67db4b..d01a8595bc5c 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -18,6 +18,9 @@
> #define DRIVER_AUTHOR "Gavin Shan, IBM Corporation"
> #define DRIVER_DESC "PowerPC PowerNV PCI Hotplug Driver"
>
> +#define SLOT_WARN(sl, x...) \
> + ((sl)->pdev ? pci_warn((sl)->pdev, x) : dev_warn(&(sl)->bus-
> >dev, x))
> +
> struct pnv_php_event {
> bool added;
> struct pnv_php_slot *php_slot;
> @@ -265,7 +268,7 @@ static int pnv_php_add_devtree(struct
> pnv_php_slot *php_slot)
>
> ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1,
> 0x10000);
> if (ret) {
> - pci_warn(php_slot->pdev, "Error %d getting FDT blob\n",
> ret);
> + SLOT_WARN(php_slot, "Error %d getting FDT blob\n",
> ret);
> goto free_fdt1;
> }
>
> @@ -279,7 +282,7 @@ static int pnv_php_add_devtree(struct
> pnv_php_slot *php_slot)
> dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
> if (!dt) {
> ret = -EINVAL;
> - pci_warn(php_slot->pdev, "Cannot unflatten FDT\n");
> + SLOT_WARN(php_slot, "Cannot unflatten FDT\n");
> goto free_fdt;
> }
>
> @@ -289,15 +292,15 @@ static int pnv_php_add_devtree(struct
> pnv_php_slot *php_slot)
> ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
> if (ret) {
> pnv_php_reverse_nodes(php_slot->dn);
> - pci_warn(php_slot->pdev, "Error %d populating
> changeset\n",
> - ret);
> + SLOT_WARN(php_slot, "Error %d populating changeset\n",
> + ret);
> goto free_dt;
> }
>
> php_slot->dn->child = NULL;
> ret = of_changeset_apply(&php_slot->ocs);
> if (ret) {
> - pci_warn(php_slot->pdev, "Error %d applying
> changeset\n", ret);
> + SLOT_WARN(php_slot, "Error %d applying changeset\n",
> ret);
> goto destroy_changeset;
> }
>
> @@ -337,10 +340,10 @@ int pnv_php_set_slot_power_state(struct
> hotplug_slot *slot,
> if (ret > 0) {
> if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle
> ||
> be64_to_cpu(msg.params[2]) != state) {
> - pci_warn(php_slot->pdev, "Wrong msg (%lld,
> %lld, %lld)\n",
> - be64_to_cpu(msg.params[1]),
> - be64_to_cpu(msg.params[2]),
> - be64_to_cpu(msg.params[3]));
> + SLOT_WARN(php_slot, "Wrong msg (%lld, %lld,
> %lld)\n",
> + be64_to_cpu(msg.params[1]),
> + be64_to_cpu(msg.params[2]),
> + be64_to_cpu(msg.params[3]));
> return -ENOMSG;
> }
> if (be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) {
> @@ -359,8 +362,8 @@ int pnv_php_set_slot_power_state(struct
> hotplug_slot *slot,
> return ret;
>
> error:
> - pci_warn(php_slot->pdev, "Error %d powering %s\n",
> - ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" :
> "off");
> + SLOT_WARN(php_slot, "Error %d powering %s\n",
> + ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" :
> "off");
> return ret;
> }
> EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state);
> @@ -378,8 +381,8 @@ static int pnv_php_get_power_state(struct
> hotplug_slot *slot, u8 *state)
> */
> ret = pnv_pci_get_power_state(php_slot->id, &power_state);
> if (ret) {
> - pci_warn(php_slot->pdev, "Error %d getting power
> status\n",
> - ret);
> + SLOT_WARN(php_slot, "Error %d getting power status\n",
> + ret);
> } else {
> *state = power_state;
> }
> @@ -402,7 +405,7 @@ static int pnv_php_get_adapter_state(struct
> hotplug_slot *slot, u8 *state)
> *state = presence;
> ret = 0;
> } else {
> - pci_warn(php_slot->pdev, "Error %d getting presence\n",
> ret);
> + SLOT_WARN(php_slot, "Error %d getting presence\n",
> ret);
> }
>
> return ret;
> @@ -637,7 +640,7 @@ static int pnv_php_register_slot(struct
> pnv_php_slot *php_slot)
> ret = pci_hp_register(&php_slot->slot, php_slot->bus,
> php_slot->slot_no, php_slot->name);
> if (ret) {
> - pci_warn(php_slot->pdev, "Error %d registering slot\n",
> ret);
> + SLOT_WARN(php_slot, "Error %d registering slot\n",
> ret);
> return ret;
> }
>
> @@ -690,7 +693,7 @@ static int pnv_php_enable_msix(struct
> pnv_php_slot *php_slot)
> /* Enable MSIx */
> ret = pci_enable_msix_exact(pdev, &entry, 1);
> if (ret) {
> - pci_warn(pdev, "Error %d enabling MSIx\n", ret);
> + SLOT_WARN(php_slot, "Error %d enabling MSIx\n", ret);
> return ret;
> }
>
> @@ -734,8 +737,9 @@ static irqreturn_t pnv_php_interrupt(int irq,
> void *data)
> (sts & PCI_EXP_SLTSTA_PDC)) {
> ret = pnv_pci_get_presence_state(php_slot->id,
> &presence);
> if (ret) {
> - pci_warn(pdev, "PCI slot [%s] error %d getting
> presence (0x%04x), to retry the operation.\n",
> - php_slot->name, ret, sts);
> + SLOT_WARN(php_slot,
> + "PCI slot [%s] error %d getting
> presence (0x%04x), to retry the operation.\n",
> + php_slot->name, ret, sts);
> return IRQ_HANDLED;
> }
>
> @@ -764,8 +768,9 @@ static irqreturn_t pnv_php_interrupt(int irq,
> void *data)
> */
> event = kzalloc(sizeof(*event), GFP_ATOMIC);
> if (!event) {
> - pci_warn(pdev, "PCI slot [%s] missed hotplug event
> 0x%04x\n",
> - php_slot->name, sts);
> + SLOT_WARN(php_slot,
> + "PCI slot [%s] missed hotplug event
> 0x%04x\n",
> + php_slot->name, sts);
> return IRQ_HANDLED;
> }
>
> @@ -789,7 +794,7 @@ static void pnv_php_init_irq(struct pnv_php_slot
> *php_slot, int irq)
> /* Allocate workqueue */
> php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot-
> >name);
> if (!php_slot->wq) {
> - pci_warn(pdev, "Cannot alloc workqueue\n");
> + SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
> pnv_php_disable_irq(php_slot, true);
> return;
> }
> @@ -813,7 +818,7 @@ static void pnv_php_init_irq(struct pnv_php_slot
> *php_slot, int irq)
> php_slot->name, php_slot);
> if (ret) {
> pnv_php_disable_irq(php_slot, true);
> - pci_warn(pdev, "Error %d enabling IRQ %d\n", ret, irq);
> + SLOT_WARN(php_slot, "Error %d enabling IRQ %d\n", ret,
> irq);
> return;
> }
>
> @@ -849,7 +854,7 @@ static void pnv_php_enable_irq(struct
> pnv_php_slot *php_slot)
>
> ret = pci_enable_device(pdev);
> if (ret) {
> - pci_warn(pdev, "Error %d enabling device\n", ret);
> + SLOT_WARN(php_slot, "Error %d enabling device\n", ret);
> return;
> }
>
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
On Mon, 2019-09-09 at 17:46 +0200, Frederic Barrat wrote:
> The PCI hotplug framework is used to update the devices when a new
> image is written to the FPGA.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> drivers/misc/ocxl/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
> index 1916fa65f2f2..2d2266c1439e 100644
> --- a/drivers/misc/ocxl/Kconfig
> +++ b/drivers/misc/ocxl/Kconfig
> @@ -11,6 +11,7 @@ config OCXL
> tristate "OpenCAPI coherent accelerator support"
> depends on PPC_POWERNV && PCI && EEH
> select OCXL_BASE
> + select HOTPLUG_PCI_POWERNV
> default m
> help
> Select this option to enable the ocxl driver for Open
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
Hi Fred,
what is this made against of? It does not apply on the master. Thanks,
On 10/09/2019 01:45, Frederic Barrat wrote:
> This is the linux part of the work to use the PCI hotplug framework to
> control an opencapi card so that it can be reset and re-read after
> flashing a new FPGA image. I had posted it earlier as an RFC and this
> version is mostly similar, with just some minor editing.
>
> It needs support in skiboot:
> http://patchwork.ozlabs.org/project/skiboot/list/?series=129724
> On an old skiboot, it will do nothing.
>
> A virtual PCI slot is created for the opencapi adapter, and its state
> can be controlled through the pnv-php hotplug driver:
>
> echo 0|1 > /sys/bus/pci/slots/OPENCAPI-<...>/power
>
> Note that the power to the card is not really turned off, as the card
> needs to stay on to be flashed with a new image. Instead the card is
> in reset.
>
> The first part of the series mostly deals with the pci/ioda state, as
> the opencapi devices can now go away and the state needs to be cleaned
> up.
>
> The second part is modifications to the PCI hotplug driver on powernv,
> so that a virtual slot is created for the opencapi adapters found in
> the device tree.
>
>
>
> Frederic Barrat (11):
> powerpc/powernv/ioda: Fix ref count for devices with their own PE
> powerpc/powernv/ioda: Protect PE list
> powerpc/powernv/ioda: set up PE on opencapi device when enabling
> powerpc/powernv/ioda: Release opencapi device
> powerpc/powernv/ioda: Find opencapi slot for a device node
> pci/hotplug/pnv-php: Remove erroneous warning
> pci/hotplug/pnv-php: Improve error msg on power state change failure
> pci/hotplug/pnv-php: Register opencapi slots
> pci/hotplug/pnv-php: Relax check when disabling slot
> pci/hotplug/pnv-php: Wrap warnings in macro
> ocxl: Add PCI hotplug dependency to Kconfig
>
> arch/powerpc/include/asm/pnv-pci.h | 1 +
> arch/powerpc/platforms/powernv/pci-ioda.c | 107 ++++++++++++++--------
> arch/powerpc/platforms/powernv/pci.c | 10 +-
> drivers/misc/ocxl/Kconfig | 1 +
> drivers/pci/hotplug/pnv_php.c | 82 ++++++++++-------
> 5 files changed, 125 insertions(+), 76 deletions(-)
>
--
Alexey
Le 24/09/2019 à 06:24, Alexey Kardashevskiy a écrit : > Hi Fred, > > what is this made against of? It does not apply on the master. Thanks, It applies on v5.3. And I can see there's a conflict with the current state in the merge window. I'll resubmit. Fred > On 10/09/2019 01:45, Frederic Barrat wrote: >> This is the linux part of the work to use the PCI hotplug framework to >> control an opencapi card so that it can be reset and re-read after >> flashing a new FPGA image. I had posted it earlier as an RFC and this >> version is mostly similar, with just some minor editing. >> >> It needs support in skiboot: >> http://patchwork.ozlabs.org/project/skiboot/list/?series=129724 >> On an old skiboot, it will do nothing. >> >> A virtual PCI slot is created for the opencapi adapter, and its state >> can be controlled through the pnv-php hotplug driver: >> >> echo 0|1 > /sys/bus/pci/slots/OPENCAPI-<...>/power >> >> Note that the power to the card is not really turned off, as the card >> needs to stay on to be flashed with a new image. Instead the card is >> in reset. >> >> The first part of the series mostly deals with the pci/ioda state, as >> the opencapi devices can now go away and the state needs to be cleaned >> up. >> >> The second part is modifications to the PCI hotplug driver on powernv, >> so that a virtual slot is created for the opencapi adapters found in >> the device tree. >> >> >> >> Frederic Barrat (11): >> powerpc/powernv/ioda: Fix ref count for devices with their own PE >> powerpc/powernv/ioda: Protect PE list >> powerpc/powernv/ioda: set up PE on opencapi device when enabling >> powerpc/powernv/ioda: Release opencapi device >> powerpc/powernv/ioda: Find opencapi slot for a device node >> pci/hotplug/pnv-php: Remove erroneous warning >> pci/hotplug/pnv-php: Improve error msg on power state change failure >> pci/hotplug/pnv-php: Register opencapi slots >> pci/hotplug/pnv-php: Relax check when disabling slot >> pci/hotplug/pnv-php: Wrap warnings in macro >> ocxl: Add PCI hotplug dependency to Kconfig >> >> arch/powerpc/include/asm/pnv-pci.h | 1 + >> arch/powerpc/platforms/powernv/pci-ioda.c | 107 ++++++++++++++-------- >> arch/powerpc/platforms/powernv/pci.c | 10 +- >> drivers/misc/ocxl/Kconfig | 1 + >> drivers/pci/hotplug/pnv_php.c | 82 ++++++++++------- >> 5 files changed, 125 insertions(+), 76 deletions(-) >> >
On 24/09/2019 16:39, Frederic Barrat wrote: > > > Le 24/09/2019 à 06:24, Alexey Kardashevskiy a écrit : >> Hi Fred, >> >> what is this made against of? It does not apply on the master. Thanks, > > It applies on v5.3. And I can see there's a conflict with the current > state in the merge window. I'll resubmit. Not really necessary, just mention the exact sha1 or tag in the cover letter next time. Thanks, > > Fred > > > >> On 10/09/2019 01:45, Frederic Barrat wrote: >>> This is the linux part of the work to use the PCI hotplug framework to >>> control an opencapi card so that it can be reset and re-read after >>> flashing a new FPGA image. I had posted it earlier as an RFC and this >>> version is mostly similar, with just some minor editing. >>> >>> It needs support in skiboot: >>> http://patchwork.ozlabs.org/project/skiboot/list/?series=129724 >>> On an old skiboot, it will do nothing. >>> >>> A virtual PCI slot is created for the opencapi adapter, and its state >>> can be controlled through the pnv-php hotplug driver: >>> >>> echo 0|1 > /sys/bus/pci/slots/OPENCAPI-<...>/power >>> >>> Note that the power to the card is not really turned off, as the card >>> needs to stay on to be flashed with a new image. Instead the card is >>> in reset. >>> >>> The first part of the series mostly deals with the pci/ioda state, as >>> the opencapi devices can now go away and the state needs to be cleaned >>> up. >>> >>> The second part is modifications to the PCI hotplug driver on powernv, >>> so that a virtual slot is created for the opencapi adapters found in >>> the device tree. >>> >>> >>> >>> Frederic Barrat (11): >>> powerpc/powernv/ioda: Fix ref count for devices with their own PE >>> powerpc/powernv/ioda: Protect PE list >>> powerpc/powernv/ioda: set up PE on opencapi device when enabling >>> powerpc/powernv/ioda: Release opencapi device >>> powerpc/powernv/ioda: Find opencapi slot for a device node >>> pci/hotplug/pnv-php: Remove erroneous warning >>> pci/hotplug/pnv-php: Improve error msg on power state change failure >>> pci/hotplug/pnv-php: Register opencapi slots >>> pci/hotplug/pnv-php: Relax check when disabling slot >>> pci/hotplug/pnv-php: Wrap warnings in macro >>> ocxl: Add PCI hotplug dependency to Kconfig >>> >>> arch/powerpc/include/asm/pnv-pci.h | 1 + >>> arch/powerpc/platforms/powernv/pci-ioda.c | 107 ++++++++++++++-------- >>> arch/powerpc/platforms/powernv/pci.c | 10 +- >>> drivers/misc/ocxl/Kconfig | 1 + >>> drivers/pci/hotplug/pnv_php.c | 82 ++++++++++------- >>> 5 files changed, 125 insertions(+), 76 deletions(-) >>> >> > -- Alexey
On 9/9/19 5:45 pm, Frederic Barrat wrote: > Taking a reference on the pci_dev structure was required with initial > commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under > OPAL v2"), where we were storing the pci_dev in the pci_dn structure. > > However, the pci_dev was later removed from the pci_dn structure, but > the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa: > Remove unnecessary pcidev from pci_dn"). > > The pnv_ioda_pe structure life cycle is the same as the pci_dev > structure, the PE is freed when the device is released. So we don't > need a reference for the pci_dev stored in the PE, otherwise the > pci_dev will never be released. Which is not really a surprise as the > comment (removed here as no longer needed) was stating as much. > > Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn") > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> > --- > arch/powerpc/platforms/powernv/pci-ioda.c | 11 +---------- > 1 file changed, 1 insertion(+), 10 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > index d8080558d020..92767f006f20 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) > return NULL; > } > > - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the > - * pointer in the PE data structure, both should be destroyed at the > - * same time. However, this needs to be looked at more closely again > - * once we actually start removing things (Hotplug, SR-IOV, ...) > - * > - * At some point we want to remove the PDN completely anyways > - */ > - pci_dev_get(dev); > pdn->pe_number = pe->pe_number; > pe->flags = PNV_IODA_PE_DEV; > pe->pdev = dev; > @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) > pnv_ioda_free_pe(pe); > pdn->pe_number = IODA_INVALID_PE; > pe->pdev = NULL; > - pci_dev_put(dev); > return NULL; > } > > @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev) > */ > dev_info(&npu_pdev->dev, > "Associating to existing PE %x\n", pe_num); > - pci_dev_get(npu_pdev); > + pci_dev_get(npu_pdev); // still needed after 902bdc57451c ? Did you mean to leave that comment in? > npu_pdn = pci_get_pdn(npu_pdev); > rid = npu_pdev->bus->number << 8 | npu_pdn->devfn; > npu_pdn->pe_number = pe_num; > -- Andrew Donnellan OzLabs, ADL Canberra ajd@linux.ibm.com IBM Australia Limited
On 9/9/19 5:46 pm, Frederic Barrat wrote: > The PCI hotplug framework is used to update the devices when a new > image is written to the FPGA. > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> > --- > drivers/misc/ocxl/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig > index 1916fa65f2f2..2d2266c1439e 100644 > --- a/drivers/misc/ocxl/Kconfig > +++ b/drivers/misc/ocxl/Kconfig > @@ -11,6 +11,7 @@ config OCXL > tristate "OpenCAPI coherent accelerator support" > depends on PPC_POWERNV && PCI && EEH > select OCXL_BASE > + select HOTPLUG_PCI_POWERNV > default m > help > Select this option to enable the ocxl driver for Open > -- Andrew Donnellan OzLabs, ADL Canberra ajd@linux.ibm.com IBM Australia Limited
Le 26/09/2019 à 18:44, Andrew Donnellan a écrit : > On 9/9/19 5:45 pm, Frederic Barrat wrote: >> Taking a reference on the pci_dev structure was required with initial >> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under >> OPAL v2"), where we were storing the pci_dev in the pci_dn structure. >> >> However, the pci_dev was later removed from the pci_dn structure, but >> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa: >> Remove unnecessary pcidev from pci_dn"). >> >> The pnv_ioda_pe structure life cycle is the same as the pci_dev >> structure, the PE is freed when the device is released. So we don't >> need a reference for the pci_dev stored in the PE, otherwise the >> pci_dev will never be released. Which is not really a surprise as the >> comment (removed here as no longer needed) was stating as much. >> >> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev >> from pci_dn") >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> >> --- >> arch/powerpc/platforms/powernv/pci-ioda.c | 11 +---------- >> 1 file changed, 1 insertion(+), 10 deletions(-) >> >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >> b/arch/powerpc/platforms/powernv/pci-ioda.c >> index d8080558d020..92767f006f20 100644 >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe >> *pnv_ioda_setup_dev_PE(struct pci_dev *dev) >> return NULL; >> } >> - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the >> - * pointer in the PE data structure, both should be destroyed at the >> - * same time. However, this needs to be looked at more closely again >> - * once we actually start removing things (Hotplug, SR-IOV, ...) >> - * >> - * At some point we want to remove the PDN completely anyways >> - */ >> - pci_dev_get(dev); >> pdn->pe_number = pe->pe_number; >> pe->flags = PNV_IODA_PE_DEV; >> pe->pdev = dev; >> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe >> *pnv_ioda_setup_dev_PE(struct pci_dev *dev) >> pnv_ioda_free_pe(pe); >> pdn->pe_number = IODA_INVALID_PE; >> pe->pdev = NULL; >> - pci_dev_put(dev); >> return NULL; >> } >> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe >> *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev) >> */ >> dev_info(&npu_pdev->dev, >> "Associating to existing PE %x\n", pe_num); >> - pci_dev_get(npu_pdev); >> + pci_dev_get(npu_pdev); // still needed after 902bdc57451c ? > > Did you mean to leave that comment in? Yes, I assumed the series wouldn't get in on the first try and a nvlink-minded developer would weigh in :) Fred >> npu_pdn = pci_get_pdn(npu_pdev); >> rid = npu_pdev->bus->number << 8 | npu_pdn->devfn; >> npu_pdn->pe_number = pe_num; >> >
On 9/9/19 5:45 pm, Frederic Barrat wrote: > An opencapi slot doesn't have an associated bridge device. It's not > needed for operation, but any warning is displayed through pci_warn() > which uses the pci_dev struct of the assocated bridge device. So wrap > those warning so that a different trace mechanism can be used if it's > an opencapi slot. > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> > --- > drivers/pci/hotplug/pnv_php.c | 51 +++++++++++++++++++---------------- > 1 file changed, 28 insertions(+), 23 deletions(-) > > diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c > index 5ca51d67db4b..d01a8595bc5c 100644 > --- a/drivers/pci/hotplug/pnv_php.c > +++ b/drivers/pci/hotplug/pnv_php.c > @@ -18,6 +18,9 @@ > #define DRIVER_AUTHOR "Gavin Shan, IBM Corporation" > #define DRIVER_DESC "PowerPC PowerNV PCI Hotplug Driver" > > +#define SLOT_WARN(sl, x...) \ > + ((sl)->pdev ? pci_warn((sl)->pdev, x) : dev_warn(&(sl)->bus->dev, x)) > + > struct pnv_php_event { > bool added; > struct pnv_php_slot *php_slot; > @@ -265,7 +268,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot) > > ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000); > if (ret) { > - pci_warn(php_slot->pdev, "Error %d getting FDT blob\n", ret); > + SLOT_WARN(php_slot, "Error %d getting FDT blob\n", ret); > goto free_fdt1; > } > > @@ -279,7 +282,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot) > dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL); > if (!dt) { > ret = -EINVAL; > - pci_warn(php_slot->pdev, "Cannot unflatten FDT\n"); > + SLOT_WARN(php_slot, "Cannot unflatten FDT\n"); > goto free_fdt; > } > > @@ -289,15 +292,15 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot) > ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn); > if (ret) { > pnv_php_reverse_nodes(php_slot->dn); > - pci_warn(php_slot->pdev, "Error %d populating changeset\n", > - ret); > + SLOT_WARN(php_slot, "Error %d populating changeset\n", > + ret); > goto free_dt; > } > > php_slot->dn->child = NULL; > ret = of_changeset_apply(&php_slot->ocs); > if (ret) { > - pci_warn(php_slot->pdev, "Error %d applying changeset\n", ret); > + SLOT_WARN(php_slot, "Error %d applying changeset\n", ret); > goto destroy_changeset; > } > > @@ -337,10 +340,10 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot, > if (ret > 0) { > if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle || > be64_to_cpu(msg.params[2]) != state) { > - pci_warn(php_slot->pdev, "Wrong msg (%lld, %lld, %lld)\n", > - be64_to_cpu(msg.params[1]), > - be64_to_cpu(msg.params[2]), > - be64_to_cpu(msg.params[3])); > + SLOT_WARN(php_slot, "Wrong msg (%lld, %lld, %lld)\n", > + be64_to_cpu(msg.params[1]), > + be64_to_cpu(msg.params[2]), > + be64_to_cpu(msg.params[3])); > return -ENOMSG; > } > if (be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) { > @@ -359,8 +362,8 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot, > return ret; > > error: > - pci_warn(php_slot->pdev, "Error %d powering %s\n", > - ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off"); > + SLOT_WARN(php_slot, "Error %d powering %s\n", > + ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off"); > return ret; > } > EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state); > @@ -378,8 +381,8 @@ static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state) > */ > ret = pnv_pci_get_power_state(php_slot->id, &power_state); > if (ret) { > - pci_warn(php_slot->pdev, "Error %d getting power status\n", > - ret); > + SLOT_WARN(php_slot, "Error %d getting power status\n", > + ret); > } else { > *state = power_state; > } > @@ -402,7 +405,7 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state) > *state = presence; > ret = 0; > } else { > - pci_warn(php_slot->pdev, "Error %d getting presence\n", ret); > + SLOT_WARN(php_slot, "Error %d getting presence\n", ret); > } > > return ret; > @@ -637,7 +640,7 @@ static int pnv_php_register_slot(struct pnv_php_slot *php_slot) > ret = pci_hp_register(&php_slot->slot, php_slot->bus, > php_slot->slot_no, php_slot->name); > if (ret) { > - pci_warn(php_slot->pdev, "Error %d registering slot\n", ret); > + SLOT_WARN(php_slot, "Error %d registering slot\n", ret); > return ret; > } > > @@ -690,7 +693,7 @@ static int pnv_php_enable_msix(struct pnv_php_slot *php_slot) > /* Enable MSIx */ > ret = pci_enable_msix_exact(pdev, &entry, 1); > if (ret) { > - pci_warn(pdev, "Error %d enabling MSIx\n", ret); > + SLOT_WARN(php_slot, "Error %d enabling MSIx\n", ret); > return ret; > } > > @@ -734,8 +737,9 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data) > (sts & PCI_EXP_SLTSTA_PDC)) { > ret = pnv_pci_get_presence_state(php_slot->id, &presence); > if (ret) { > - pci_warn(pdev, "PCI slot [%s] error %d getting presence (0x%04x), to retry the operation.\n", > - php_slot->name, ret, sts); > + SLOT_WARN(php_slot, > + "PCI slot [%s] error %d getting presence (0x%04x), to retry the operation.\n", > + php_slot->name, ret, sts); > return IRQ_HANDLED; > } > > @@ -764,8 +768,9 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data) > */ > event = kzalloc(sizeof(*event), GFP_ATOMIC); > if (!event) { > - pci_warn(pdev, "PCI slot [%s] missed hotplug event 0x%04x\n", > - php_slot->name, sts); > + SLOT_WARN(php_slot, > + "PCI slot [%s] missed hotplug event 0x%04x\n", > + php_slot->name, sts); > return IRQ_HANDLED; > } > > @@ -789,7 +794,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) > /* Allocate workqueue */ > php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name); > if (!php_slot->wq) { > - pci_warn(pdev, "Cannot alloc workqueue\n"); > + SLOT_WARN(php_slot, "Cannot alloc workqueue\n"); > pnv_php_disable_irq(php_slot, true); > return; > } > @@ -813,7 +818,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) > php_slot->name, php_slot); > if (ret) { > pnv_php_disable_irq(php_slot, true); > - pci_warn(pdev, "Error %d enabling IRQ %d\n", ret, irq); > + SLOT_WARN(php_slot, "Error %d enabling IRQ %d\n", ret, irq); > return; > } > > @@ -849,7 +854,7 @@ static void pnv_php_enable_irq(struct pnv_php_slot *php_slot) > > ret = pci_enable_device(pdev); > if (ret) { > - pci_warn(pdev, "Error %d enabling device\n", ret); > + SLOT_WARN(php_slot, "Error %d enabling device\n", ret); > return; > } > > -- Andrew Donnellan OzLabs, ADL Canberra ajd@linux.ibm.com IBM Australia Limited
On 27/09/2019 03:15, Frederic Barrat wrote: > > > Le 26/09/2019 à 18:44, Andrew Donnellan a écrit : >> On 9/9/19 5:45 pm, Frederic Barrat wrote: >>> Taking a reference on the pci_dev structure was required with initial >>> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under >>> OPAL v2"), where we were storing the pci_dev in the pci_dn structure. >>> >>> However, the pci_dev was later removed from the pci_dn structure, but >>> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa: >>> Remove unnecessary pcidev from pci_dn"). >>> >>> The pnv_ioda_pe structure life cycle is the same as the pci_dev >>> structure, the PE is freed when the device is released. So we don't >>> need a reference for the pci_dev stored in the PE, otherwise the >>> pci_dev will never be released. Which is not really a surprise as the >>> comment (removed here as no longer needed) was stating as much. >>> >>> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn") >>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> >>> --- >>> arch/powerpc/platforms/powernv/pci-ioda.c | 11 +---------- >>> 1 file changed, 1 insertion(+), 10 deletions(-) >>> >>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c >>> index d8080558d020..92767f006f20 100644 >>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) >>> return NULL; >>> } >>> - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the >>> - * pointer in the PE data structure, both should be destroyed at the >>> - * same time. However, this needs to be looked at more closely again >>> - * once we actually start removing things (Hotplug, SR-IOV, ...) >>> - * >>> - * At some point we want to remove the PDN completely anyways >>> - */ >>> - pci_dev_get(dev); >>> pdn->pe_number = pe->pe_number; >>> pe->flags = PNV_IODA_PE_DEV; >>> pe->pdev = dev; >>> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) >>> pnv_ioda_free_pe(pe); >>> pdn->pe_number = IODA_INVALID_PE; >>> pe->pdev = NULL; >>> - pci_dev_put(dev); >>> return NULL; >>> } >>> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev) >>> */ >>> dev_info(&npu_pdev->dev, >>> "Associating to existing PE %x\n", pe_num); >>> - pci_dev_get(npu_pdev); >>> + pci_dev_get(npu_pdev); // still needed after 902bdc57451c ? >> >> Did you mean to leave that comment in? > > Yes, I assumed the series wouldn't get in on the first try and a nvlink-minded developer would weigh in :) I am looking at it and I am still not so sure what exactly guarantees that lifetime(@dev) == lifetime(@pe). For example, sriov_disable() removes VFs first, and only then releases PEs so there is a window when we may have a stale pdev. Not sure. > > Fred > > >>> npu_pdn = pci_get_pdn(npu_pdev); >>> rid = npu_pdev->bus->number << 8 | npu_pdn->devfn; >>> npu_pdn->pe_number = pe_num; >>> >> > -- Alexey
On 9/9/19 5:45 pm, Frederic Barrat wrote: > The driver only allows to disable a slot in the POPULATED > state. However, if an error occurs while enabling the slot, say > because the link couldn't be trained, then the POPULATED state may not > be reached, yet the power state of the slot is on. So allow to disable > a slot in the REGISTERED state. Removing the devices will do nothing > since it's not populated, and we'll set the power state of the slot > back to off. > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Makes sense Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> > --- > drivers/pci/hotplug/pnv_php.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c > index f0a7360154e7..5ca51d67db4b 100644 > --- a/drivers/pci/hotplug/pnv_php.c > +++ b/drivers/pci/hotplug/pnv_php.c > @@ -523,7 +523,13 @@ static int pnv_php_disable_slot(struct hotplug_slot *slot) > struct pnv_php_slot *php_slot = to_pnv_php_slot(slot); > int ret; > > - if (php_slot->state != PNV_PHP_STATE_POPULATED) > + /* > + * Allow to disable a slot already in the registered state to > + * cover cases where the slot couldn't be enabled and never > + * reached the populated state > + */ > + if (php_slot->state != PNV_PHP_STATE_POPULATED && > + php_slot->state != PNV_PHP_STATE_REGISTERED) > return 0; > > /* Remove all devices behind the slot */ > -- Andrew Donnellan OzLabs, ADL Canberra ajd@linux.ibm.com IBM Australia Limited
On 9/9/19 5:45 pm, Frederic Barrat wrote: > The PE for an opencapi device was set as part of a late PHB fixup > operation, when creating the PHB. To use the PCI hotplug framework, > this is not going to work, as the PHB stays the same, it's only the > devices underneath which are updated. For regular PCI devices, it is > done as part of the reconfiguration of the bridge, but for opencapi > PHBs, we don't have an intermediate bridge. So let's define the PE > when the device is enabled. PEs are meaningless for opencapi, the NPU > doesn't define them and opal is not doing anything with them. > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> > --- > arch/powerpc/platforms/powernv/pci-ioda.c | 31 +++++++++++++++++------ > 1 file changed, 23 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > index 3dbbf5365c1c..06ce7ddaa0cf 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -1260,8 +1260,6 @@ static void pnv_pci_ioda_setup_PEs(void) > { > struct pci_controller *hose; > struct pnv_phb *phb; > - struct pci_bus *bus; > - struct pci_dev *pdev; > struct pnv_ioda_pe *pe; > > list_for_each_entry(hose, &hose_list, list_node) { > @@ -1273,11 +1271,6 @@ static void pnv_pci_ioda_setup_PEs(void) > if (phb->model == PNV_PHB_MODEL_NPU2) > WARN_ON_ONCE(pnv_npu2_init(hose)); > } > - if (phb->type == PNV_PHB_NPU_OCAPI) { > - bus = hose->bus; > - list_for_each_entry(pdev, &bus->devices, bus_list) > - pnv_ioda_setup_dev_PE(pdev); > - } > } > list_for_each_entry(hose, &hose_list, list_node) { > phb = hose->private_data; > @@ -3385,6 +3378,28 @@ static bool pnv_pci_enable_device_hook(struct pci_dev *dev) > return true; > } > > +static bool pnv_ocapi_enable_device_hook(struct pci_dev *dev) > +{ > + struct pci_controller *hose = pci_bus_to_host(dev->bus); > + struct pnv_phb *phb = hose->private_data; > + struct pci_dn *pdn; > + struct pnv_ioda_pe *pe; > + > + if (!phb->initialized) > + return true; > + > + pdn = pci_get_pdn(dev); > + if (!pdn) > + return false; > + > + if (pdn->pe_number == IODA_INVALID_PE) { > + pe = pnv_ioda_setup_dev_PE(dev); > + if (!pe) > + return false; > + } > + return true; > +} > + > static long pnv_pci_ioda1_unset_window(struct iommu_table_group *table_group, > int num) > { > @@ -3625,7 +3640,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = { > }; > > static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = { > - .enable_device_hook = pnv_pci_enable_device_hook, > + .enable_device_hook = pnv_ocapi_enable_device_hook, > .window_alignment = pnv_pci_window_alignment, > .reset_secondary_bus = pnv_pci_reset_secondary_bus, > .shutdown = pnv_pci_ioda_shutdown, > -- Andrew Donnellan OzLabs, ADL Canberra ajd@linux.ibm.com IBM Australia Limited
Le 27/09/2019 à 08:54, Alexey Kardashevskiy a écrit : > > > On 27/09/2019 03:15, Frederic Barrat wrote: >> >> >> Le 26/09/2019 à 18:44, Andrew Donnellan a écrit : >>> On 9/9/19 5:45 pm, Frederic Barrat wrote: >>>> Taking a reference on the pci_dev structure was required with initial >>>> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under >>>> OPAL v2"), where we were storing the pci_dev in the pci_dn structure. >>>> >>>> However, the pci_dev was later removed from the pci_dn structure, but >>>> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa: >>>> Remove unnecessary pcidev from pci_dn"). >>>> >>>> The pnv_ioda_pe structure life cycle is the same as the pci_dev >>>> structure, the PE is freed when the device is released. So we don't >>>> need a reference for the pci_dev stored in the PE, otherwise the >>>> pci_dev will never be released. Which is not really a surprise as the >>>> comment (removed here as no longer needed) was stating as much. >>>> >>>> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn") >>>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> >>>> --- >>>> arch/powerpc/platforms/powernv/pci-ioda.c | 11 +---------- >>>> 1 file changed, 1 insertion(+), 10 deletions(-) >>>> >>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c >>>> index d8080558d020..92767f006f20 100644 >>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) >>>> return NULL; >>>> } >>>> - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the >>>> - * pointer in the PE data structure, both should be destroyed at the >>>> - * same time. However, this needs to be looked at more closely again >>>> - * once we actually start removing things (Hotplug, SR-IOV, ...) >>>> - * >>>> - * At some point we want to remove the PDN completely anyways >>>> - */ >>>> - pci_dev_get(dev); >>>> pdn->pe_number = pe->pe_number; >>>> pe->flags = PNV_IODA_PE_DEV; >>>> pe->pdev = dev; >>>> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) >>>> pnv_ioda_free_pe(pe); >>>> pdn->pe_number = IODA_INVALID_PE; >>>> pe->pdev = NULL; >>>> - pci_dev_put(dev); >>>> return NULL; >>>> } >>>> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev) >>>> */ >>>> dev_info(&npu_pdev->dev, >>>> "Associating to existing PE %x\n", pe_num); >>>> - pci_dev_get(npu_pdev); >>>> + pci_dev_get(npu_pdev); // still needed after 902bdc57451c ? >>> >>> Did you mean to leave that comment in? >> >> Yes, I assumed the series wouldn't get in on the first try and a nvlink-minded developer would weigh in :) > > I am looking at it and I am still not so sure what exactly guarantees that lifetime(@dev) == lifetime(@pe). For example, > sriov_disable() removes VFs first, and only then releases PEs so there is a window when we may have a stale pdev. Not sure. Indeed, for SRIOV, PE life-cycle is handled differently. And hopefully correctly, but I don’t think this patch alters it. I was discussing with Greg about it, as he had to look in that area in the past. My understanding is that this patch is ok as it follows the initial comment in pnv_ioda_setup_dev_PE() that we don’t need to get a reference for the PE, it makes sense and I could trace the origin of the pci_dev_get() and it seemed like it should have been removed in a previous patch (902bdc57451c). However, one question is whether this patch breaks nvlink and if nvlink assumes the devices won’t go away because we explicitly take a reference forever. In npu_dma.c, there are 2 functions which allow to find the GPU associated to a npu device, and vice-versa. Both functions return a pointer to a struct pci_dev, but they don’t take a reference on the device being returned. So that seems dangerous. I’m probably missing something. Alexey, Alistair: what, if anything, guarantees, that the npu or gpu devices stay valid. Is it because we simply don’t provide any means to get rid of them ? Otherwise, don’t we need the callers of pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference counting ? I’ve started looking into it and the changes are scary, which explains Greg’s related commit 02c5f5394918b. Fred > >> >> Fred >> >> >>>> npu_pdn = pci_get_pdn(npu_pdev); >>>> rid = npu_pdev->bus->number << 8 | npu_pdn->devfn; >>>> npu_pdn->pe_number = pe_num; >>>> >>> >> >
On Wednesday, 13 November 2019 4:38:21 AM AEDT Frederic Barrat wrote: > > Le 27/09/2019 à 08:54, Alexey Kardashevskiy a écrit : > > > > > > On 27/09/2019 03:15, Frederic Barrat wrote: > >> > >> > >> Le 26/09/2019 à 18:44, Andrew Donnellan a écrit : > >>> On 9/9/19 5:45 pm, Frederic Barrat wrote: > >>>> Taking a reference on the pci_dev structure was required with initial > >>>> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under > >>>> OPAL v2"), where we were storing the pci_dev in the pci_dn structure. > >>>> > >>>> However, the pci_dev was later removed from the pci_dn structure, but > >>>> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa: > >>>> Remove unnecessary pcidev from pci_dn"). > >>>> > >>>> The pnv_ioda_pe structure life cycle is the same as the pci_dev > >>>> structure, the PE is freed when the device is released. So we don't > >>>> need a reference for the pci_dev stored in the PE, otherwise the > >>>> pci_dev will never be released. Which is not really a surprise as the > >>>> comment (removed here as no longer needed) was stating as much. > >>>> > >>>> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn") > >>>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> > >>>> --- > >>>> arch/powerpc/platforms/powernv/pci-ioda.c | 11 +---------- > >>>> 1 file changed, 1 insertion(+), 10 deletions(-) > >>>> > >>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/ platforms/powernv/pci-ioda.c > >>>> index d8080558d020..92767f006f20 100644 > >>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c > >>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > >>>> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) > >>>> return NULL; > >>>> } > >>>> - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the > >>>> - * pointer in the PE data structure, both should be destroyed at the > >>>> - * same time. However, this needs to be looked at more closely again > >>>> - * once we actually start removing things (Hotplug, SR-IOV, ...) > >>>> - * > >>>> - * At some point we want to remove the PDN completely anyways > >>>> - */ > >>>> - pci_dev_get(dev); > >>>> pdn->pe_number = pe->pe_number; > >>>> pe->flags = PNV_IODA_PE_DEV; > >>>> pe->pdev = dev; > >>>> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) > >>>> pnv_ioda_free_pe(pe); > >>>> pdn->pe_number = IODA_INVALID_PE; > >>>> pe->pdev = NULL; > >>>> - pci_dev_put(dev); > >>>> return NULL; > >>>> } > >>>> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev) > >>>> */ > >>>> dev_info(&npu_pdev->dev, > >>>> "Associating to existing PE %x\n", pe_num); > >>>> - pci_dev_get(npu_pdev); > >>>> + pci_dev_get(npu_pdev); // still needed after 902bdc57451c ? > >>> > >>> Did you mean to leave that comment in? > >> > >> Yes, I assumed the series wouldn't get in on the first try and a nvlink- minded developer would weigh in :) > > > > I am looking at it and I am still not so sure what exactly guarantees that lifetime(@dev) == lifetime(@pe). For example, > > sriov_disable() removes VFs first, and only then releases PEs so there is a window when we may have a stale pdev. Not sure. > > > Indeed, for SRIOV, PE life-cycle is handled differently. And hopefully > correctly, but I don’t think this patch alters it. > > I was discussing with Greg about it, as he had to look in that area in > the past. My understanding is that this patch is ok as it follows the > initial comment in pnv_ioda_setup_dev_PE() that we don’t need to get a > reference for the PE, it makes sense and I could trace the origin of the > pci_dev_get() and it seemed like it should have been removed in a > previous patch (902bdc57451c). > > However, one question is whether this patch breaks nvlink and if nvlink > assumes the devices won’t go away because we explicitly take a reference > forever. In npu_dma.c, there are 2 functions which allow to find the GPU > associated to a npu device, and vice-versa. Both functions return a > pointer to a struct pci_dev, but they don’t take a reference on the > device being returned. So that seems dangerous. I’m probably missing > something. > > Alexey, Alistair: what, if anything, guarantees, that the npu or gpu > devices stay valid. Is it because we simply don’t provide any means to > get rid of them ? Otherwise, don’t we need the callers of > pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference > counting ? I’ve started looking into it and the changes are scary, which > explains Greg’s related commit 02c5f5394918b. To be honest the reference counting looks like it has evolved into something quite suspect and I don't think you're missing anything. In practice though we likely haven't hit any issues because the original callers didn't store references to the pdev which would make the window quite small (although the pass through work may have changed that). And as you say there simply wasn't any means to get rid of them anyway (EEH, hotplug, etc. has never been implemented or supported for GPUs and all sorts of terrible things happen if you try). The best solution would likely be to review the reference counting and to teach callers of get_*_dev() to call pci_put_dev(), etc. - Alistair > Fred > > > > > > >> > >> Fred > >> > >> > >>>> npu_pdn = pci_get_pdn(npu_pdev); > >>>> rid = npu_pdev->bus->number << 8 | npu_pdn->devfn; > >>>> npu_pdn->pe_number = pe_num; > >>>> > >>> > >> > > > >
On Mon, Nov 18, 2019 at 12:06 PM Alistair Popple <alistair@popple.id.au> wrote: > > On Wednesday, 13 November 2019 4:38:21 AM AEDT Frederic Barrat wrote: > > > > However, one question is whether this patch breaks nvlink and if nvlink > > assumes the devices won’t go away because we explicitly take a reference > > forever. In npu_dma.c, there are 2 functions which allow to find the GPU > > associated to a npu device, and vice-versa. Both functions return a > > pointer to a struct pci_dev, but they don’t take a reference on the > > device being returned. So that seems dangerous. I’m probably missing > > something. > > > > Alexey, Alistair: what, if anything, guarantees, that the npu or gpu > > devices stay valid. Is it because we simply don’t provide any means to > > get rid of them ? Otherwise, don’t we need the callers of > > pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference > > counting ? I’ve started looking into it and the changes are scary, which > > explains Greg’s related commit 02c5f5394918b. > > To be honest the reference counting looks like it has evolved into something > quite suspect and I don't think you're missing anything. In practice though we > likely haven't hit any issues because the original callers didn't store > references to the pdev which would make the window quite small (although the > pass through work may have changed that). And as you say there simply wasn't > any means to get rid of them anyway (EEH, hotplug, etc. has never been > implemented or supported for GPUs and all sorts of terrible things happen if > you try). In other words: leaking a ref is the only safe thing to do. > The best solution would likely be to review the reference counting and to > teach callers of get_*_dev() to call pci_put_dev(), etc. The issue is that the two callers of get_pci_dev() are non-GPL exported symbols so we don't know what's calling them or what their expectations are. We be doing whatever makes sense for OpenCAPI and if that happens to cause a problem for someone else, then they can deal with it. Oliver
On Monday, 18 November 2019 12:24:24 PM AEDT Oliver O'Halloran wrote: > On Mon, Nov 18, 2019 at 12:06 PM Alistair Popple <alistair@popple.id.au> wrote: > > > > On Wednesday, 13 November 2019 4:38:21 AM AEDT Frederic Barrat wrote: > > > > > > However, one question is whether this patch breaks nvlink and if nvlink > > > assumes the devices won’t go away because we explicitly take a reference > > > forever. In npu_dma.c, there are 2 functions which allow to find the GPU > > > associated to a npu device, and vice-versa. Both functions return a > > > pointer to a struct pci_dev, but they don’t take a reference on the > > > device being returned. So that seems dangerous. I’m probably missing > > > something. > > > > > > Alexey, Alistair: what, if anything, guarantees, that the npu or gpu > > > devices stay valid. Is it because we simply don’t provide any means to > > > get rid of them ? Otherwise, don’t we need the callers of > > > pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference > > > counting ? I’ve started looking into it and the changes are scary, which > > > explains Greg’s related commit 02c5f5394918b. > > > > To be honest the reference counting looks like it has evolved into something > > quite suspect and I don't think you're missing anything. In practice though we > > likely haven't hit any issues because the original callers didn't store > > references to the pdev which would make the window quite small (although the > > pass through work may have changed that). And as you say there simply wasn't > > any means to get rid of them anyway (EEH, hotplug, etc. has never been > > implemented or supported for GPUs and all sorts of terrible things happen if > > you try). > > In other words: leaking a ref is the only safe thing to do. Correct. > > The best solution would likely be to review the reference counting and to > > teach callers of get_*_dev() to call pci_put_dev(), etc. > > The issue is that the two callers of get_pci_dev() are non-GPL > exported symbols so we don't know what's calling them or what their > expectations are. We be doing whatever makes sense for OpenCAPI and if > that happens to cause a problem for someone else, then they can deal > with it. The issue isn't that it's exported non-GPL vs. GPL, rather that there are probably out-of-tree modules like the NVIDIA driver using it. However as you say supporting out-of-tree modules is not generally a concern for kernel developers so we certainly shouldn't let that prevent us doing a fix. - Alistair > Oliver
Le 18/11/2019 à 03:36, Alistair Popple a écrit : > On Monday, 18 November 2019 12:24:24 PM AEDT Oliver O'Halloran wrote: >> On Mon, Nov 18, 2019 at 12:06 PM Alistair Popple <alistair@popple.id.au> > wrote: >>> >>> On Wednesday, 13 November 2019 4:38:21 AM AEDT Frederic Barrat wrote: >>>> >>>> However, one question is whether this patch breaks nvlink and if nvlink >>>> assumes the devices won’t go away because we explicitly take a reference >>>> forever. In npu_dma.c, there are 2 functions which allow to find the GPU >>>> associated to a npu device, and vice-versa. Both functions return a >>>> pointer to a struct pci_dev, but they don’t take a reference on the >>>> device being returned. So that seems dangerous. I’m probably missing >>>> something. >>>> >>>> Alexey, Alistair: what, if anything, guarantees, that the npu or gpu >>>> devices stay valid. Is it because we simply don’t provide any means to >>>> get rid of them ? Otherwise, don’t we need the callers of >>>> pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference >>>> counting ? I’ve started looking into it and the changes are scary, which >>>> explains Greg’s related commit 02c5f5394918b. >>> >>> To be honest the reference counting looks like it has evolved into > something >>> quite suspect and I don't think you're missing anything. In practice > though we >>> likely haven't hit any issues because the original callers didn't store >>> references to the pdev which would make the window quite small (although > the >>> pass through work may have changed that). And as you say there simply > wasn't >>> any means to get rid of them anyway (EEH, hotplug, etc. has never been >>> implemented or supported for GPUs and all sorts of terrible things happen > if >>> you try). >> >> In other words: leaking a ref is the only safe thing to do. > > Correct. > >>> The best solution would likely be to review the reference counting and to >>> teach callers of get_*_dev() to call pci_put_dev(), etc. >> >> The issue is that the two callers of get_pci_dev() are non-GPL >> exported symbols so we don't know what's calling them or what their >> expectations are. We be doing whatever makes sense for OpenCAPI and if >> that happens to cause a problem for someone else, then they can deal >> with it. > > The issue isn't that it's exported non-GPL vs. GPL, rather that there are > probably out-of-tree modules like the NVIDIA driver using it. However as you > say supporting out-of-tree modules is not generally a concern for kernel > developers so we certainly shouldn't let that prevent us doing a fix. Thanks Alistair and Oliver. I'll bite the bullet and try do the right thing wrt ref counting in npu-dma.c Fred > - Alistair > >> Oliver > > > >
On 10/9/19 1:45 am, Frederic Barrat wrote:
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 6104418c9ad5..00a79f3c989f 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -48,13 +48,14 @@ int pnv_pci_get_slot_id(struct device_node *np, uint64_t *id)
> return -ENXIO;
>
> bdfn = ((bdfn & 0x00ffff00) >> 8);
> - while ((parent = of_get_parent(parent))) {
> + for (parent = np; parent; parent = of_get_parent(parent)) {
I think we should rename this pointer from "parent" as it no longer
means that.
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
On 10/9/19 1:45 am, Frederic Barrat wrote:
> When changing the slot state, if opal hits an error and tells as such
> in the asynchronous reply, the warning "Wrong msg" is logged, which is
> rather confusing. Instead we can reuse the better message which is
> already used when we couldn't submit the asynchronous opal request
> initially.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
On 10/9/19 1:45 am, Frederic Barrat wrote:
> On powernv, when removing a device through hotplug, the following
> warning is logged:
>
> Invalid refcount <.> on <...>
>
> It may be incorrect, the refcount may be set to a higher value than 1
> and be valid. of_detach_node() can drop more than one reference. As it
> doesn't seem trivial to assert the correct value, let's remove the
> warning.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
On 10/9/19 1:45 am, Frederic Barrat wrote:
> Add the opencapi PHBs to the list of PHBs being scanned to look for
> slots.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> drivers/pci/hotplug/pnv_php.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index 304bdbcdb77c..f0a7360154e7 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -954,7 +954,8 @@ static int __init pnv_php_init(void)
> pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> pnv_php_register(dn);
> -
> + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb")
> + pnv_php_register_one(dn);
> return 0;
> }
>
> @@ -964,6 +965,8 @@ static void __exit pnv_php_exit(void)
>
> for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> pnv_php_unregister(dn);
> + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb")
> + pnv_php_unregister(dn);
> }
Why do we use register_one to register and unregister rather than
unregister_one to unregister?
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
On 10/9/19 1:45 am, Frederic Barrat wrote:
> Protect the PHB's list of PE. Probably not needed as long as it was
> populated during PHB creation, but it feels right and will become
> required once we can add/remove opencapi devices on hotplug.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
On 10/9/19 1:45 am, Frederic Barrat wrote:
> With hotplug, an opencapi device can now go away. It needs to be
> released, mostly to clean up its PE state. We were previously not
> defining any device callback. We can reuse the standard PCI release
> callback, it does a bit too much for an opencapi device, but it's
> harmless, and only needs minor tuning.
>
> Also separate the undo of the PELT-V code in a separate function, it
> is not needed for NPU devices and it improves a bit the readability of
> the code.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
Le 10/09/2019 à 02:34, Alastair D'Silva a écrit : > On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote: >> Protect the PHB's list of PE. Probably not needed as long as it was >> populated during PHB creation, but it feels right and will become >> required once we can add/remove opencapi devices on hotplug. >> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> >> --- >> arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >> b/arch/powerpc/platforms/powernv/pci-ioda.c >> index 92767f006f20..3dbbf5365c1c 100644 >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >> @@ -1080,8 +1080,9 @@ static struct pnv_ioda_pe >> *pnv_ioda_setup_dev_PE(struct pci_dev *dev) >> } >> >> /* Put PE to the list */ >> + mutex_lock(&phb->ioda.pe_list_mutex); >> list_add_tail(&pe->list, &phb->ioda.pe_list); >> - >> + mutex_unlock(&phb->ioda.pe_list_mutex); >> return pe; >> } >> >> @@ -3513,7 +3514,10 @@ static void pnv_ioda_release_pe(struct >> pnv_ioda_pe *pe) >> struct pnv_phb *phb = pe->phb; >> struct pnv_ioda_pe *slave, *tmp; >> >> + mutex_lock(&phb->ioda.pe_list_mutex); >> list_del(&pe->list); >> + mutex_unlock(&phb->ioda.pe_list_mutex); >> + >> switch (phb->type) { >> case PNV_PHB_IODA1: >> pnv_pci_ioda1_release_pe_dma(pe); > > Hmm, the ioda.pe_list_mutex muxtex exists, and is inited, but there are > no other users. It's position & naming in the struct suggests it > belongs to ioda.pe_list, rather than pnv_ioda_pe.list (as suggested by > the lock/unlock around the list del). I don't quite understand the above. The mutex is already in use by the functions handling virtual functions, pnv_ioda_setup_vf_PE() and pnv_ioda_release_vf_PE(). The point of the list is to keep track of all the PEs used by the PHB, so it makes sense to have the mutex at that level. > Do the other accessors of ioda.pe_list also need mutex protection? > pnv_ioda_setup_bus_PE() > pnv_pci_dma_bus_setup() > pnv_pci_init_ioda_phb() > pnv_pci_ioda_setup_PEs() I think we could also use it there, it wouldn't hurt. Those functions are called when the kernel is building part of the PCI topology, and devices are not really active yet, so I don't think it's absolutely required. I'm actually not sure my patch is needed either. With hotplug, the devices can come and go, whereas the PHB remains. So it feels right to start protecting the list when adding/removing a device. But I don't think we can really have concurrency and have 2 different operations adding/removing devices at the same time under the same PHB, at least for opencapi. Maybe for PCI, if we have multiple slots under the same PHB. Not sure. Fred > If not, perhaps the metux should be removed from ioda and replaced with > pe.list_mutex instead. >
On Tue, Nov 19, 2019 at 11:57 PM Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>
> > Do the other accessors of ioda.pe_list also need mutex protection?
> > pnv_ioda_setup_bus_PE()
> > pnv_pci_dma_bus_setup()
> > pnv_pci_init_ioda_phb()
> > pnv_pci_ioda_setup_PEs()
>
>
> I think we could also use it there, it wouldn't hurt. Those functions
> are called when the kernel is building part of the PCI topology, and
> devices are not really active yet, so I don't think it's absolutely
> required.
>
> I'm actually not sure my patch is needed either. With hotplug, the
> devices can come and go, whereas the PHB remains. So it feels right to
> start protecting the list when adding/removing a device. But I don't
> think we can really have concurrency and have 2 different operations
> adding/removing devices at the same time under the same PHB, at least
> for opencapi. Maybe for PCI, if we have multiple slots under the same
> PHB. Not sure.
Creation of new pci_dev's is serialised by the global pci rescan and
remove lock so on the creation side it's not an issue. However, we can
release IODA PEs in the pci_dev's release function which might be
called without that lock held. It's pretty hard to hit that case
though since it require something to be holding a ref to the pci_dev
even after the driver's ->remove() function has been called.
Le 19/11/2019 à 02:26, Andrew Donnellan a écrit :
> On 10/9/19 1:45 am, Frederic Barrat wrote:
>> diff --git a/arch/powerpc/platforms/powernv/pci.c
>> b/arch/powerpc/platforms/powernv/pci.c
>> index 6104418c9ad5..00a79f3c989f 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -48,13 +48,14 @@ int pnv_pci_get_slot_id(struct device_node *np,
>> uint64_t *id)
>> return -ENXIO;
>> bdfn = ((bdfn & 0x00ffff00) >> 8);
>> - while ((parent = of_get_parent(parent))) {
>> + for (parent = np; parent; parent = of_get_parent(parent)) {
>
> I think we should rename this pointer from "parent" as it no longer
> means that.
ok. We still walk the parent list, but we don't start from the first
parent any more. I'll rename with a more generic "node" to avoid confusion.
Fred
Le 19/11/2019 à 14:22, Oliver O'Halloran a écrit :
> On Tue, Nov 19, 2019 at 11:57 PM Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>>
>>> Do the other accessors of ioda.pe_list also need mutex protection?
>>> pnv_ioda_setup_bus_PE()
>>> pnv_pci_dma_bus_setup()
>>> pnv_pci_init_ioda_phb()
>>> pnv_pci_ioda_setup_PEs()
>>
>>
>> I think we could also use it there, it wouldn't hurt. Those functions
>> are called when the kernel is building part of the PCI topology, and
>> devices are not really active yet, so I don't think it's absolutely
>> required.
>>
>> I'm actually not sure my patch is needed either. With hotplug, the
>> devices can come and go, whereas the PHB remains. So it feels right to
>> start protecting the list when adding/removing a device. But I don't
>> think we can really have concurrency and have 2 different operations
>> adding/removing devices at the same time under the same PHB, at least
>> for opencapi. Maybe for PCI, if we have multiple slots under the same
>> PHB. Not sure.
>
> Creation of new pci_dev's is serialised by the global pci rescan and
> remove lock so on the creation side it's not an issue. However, we can
> release IODA PEs in the pci_dev's release function which might be
> called without that lock held. It's pretty hard to hit that case
> though since it require something to be holding a ref to the pci_dev
> even after the driver's ->remove() function has been called.
Thanks for clarifying it! Indeed, the pci_rescan_remove_lock in hotplug
helps for the PCI case. I guess we can keep this patch for that hard to
hit case, just in case it doesn't blow up somewhere else.
Fred
Le 19/11/2019 à 06:18, Andrew Donnellan a écrit :
> On 10/9/19 1:45 am, Frederic Barrat wrote:
>> Add the opencapi PHBs to the list of PHBs being scanned to look for
>> slots.
>>
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>> ---
>> drivers/pci/hotplug/pnv_php.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/hotplug/pnv_php.c
>> b/drivers/pci/hotplug/pnv_php.c
>> index 304bdbcdb77c..f0a7360154e7 100644
>> --- a/drivers/pci/hotplug/pnv_php.c
>> +++ b/drivers/pci/hotplug/pnv_php.c
>> @@ -954,7 +954,8 @@ static int __init pnv_php_init(void)
>> pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>> for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>> pnv_php_register(dn);
>> -
>> + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb")
>> + pnv_php_register_one(dn);
>> return 0;
>> }
>> @@ -964,6 +965,8 @@ static void __exit pnv_php_exit(void)
>> for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>> pnv_php_unregister(dn);
>> + for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb")
>> + pnv_php_unregister(dn);
>> }
>
>
> Why do we use register_one to register and unregister rather than
> unregister_one to unregister?
Good catch! With the above, the slot was not removed.
pnv_php_unregister() looks at the children only and was missing the
opencapi slot, since it's directly under the PHB.
Fred
Le 10/09/2019 à 02:56, Alastair D'Silva a écrit : > On Mon, 2019-09-09 at 17:45 +0200, Frederic Barrat wrote: >> With hotplug, an opencapi device can now go away. It needs to be >> released, mostly to clean up its PE state. We were previously not >> defining any device callback. We can reuse the standard PCI release >> callback, it does a bit too much for an opencapi device, but it's >> harmless, and only needs minor tuning. >> >> Also separate the undo of the PELT-V code in a separate function, it >> is not needed for NPU devices and it improves a bit the readability >> of >> the code. >> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> >> --- >> arch/powerpc/platforms/powernv/pci-ioda.c | 59 +++++++++++++++---- >> ---- >> 1 file changed, 39 insertions(+), 20 deletions(-) >> >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >> b/arch/powerpc/platforms/powernv/pci-ioda.c >> index 06ce7ddaa0cf..e5895c05efae 100644 >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >> @@ -188,7 +188,7 @@ static void pnv_ioda_free_pe(struct pnv_ioda_pe >> *pe) >> unsigned int pe_num = pe->pe_number; >> >> WARN_ON(pe->pdev); >> - WARN_ON(pe->npucomp); /* NPUs are not supposed to be freed */ >> + WARN_ON(pe->npucomp); /* NPUs for nvlink are not supposed to be >> freed */ >> kfree(pe->npucomp); >> memset(pe, 0, sizeof(struct pnv_ioda_pe)); >> clear_bit(pe_num, phb->ioda.pe_alloc); >> @@ -777,6 +777,34 @@ static int pnv_ioda_set_peltv(struct pnv_phb >> *phb, >> return 0; >> } >> >> +static void pnv_ioda_unset_peltv(struct pnv_phb *phb, >> + struct pnv_ioda_pe *pe, >> + struct pci_dev *parent) >> +{ >> + int64_t rc; >> + >> + while (parent) { >> + struct pci_dn *pdn = pci_get_pdn(parent); >> + >> + if (pdn && pdn->pe_number != IODA_INVALID_PE) { >> + rc = opal_pci_set_peltv(phb->opal_id, pdn- >>> pe_number, >> + pe->pe_number, >> + OPAL_REMOVE_PE_FROM_DOM >> AIN); >> + /* XXX What to do in case of error ? */ > > Can we take the opportunity to address this comment? Probably like the original author, I'm not sure what we could do better here. We already log a warning below if opal returns an error when dissociating the PE (admittedly not the same thing as the parent). Note that this code is not executed for an opencapi device, so I'm just keeping thing as is. Fred >> + } >> + parent = parent->bus->self; >> + } >> + >> + opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number, >> + OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); >> + >> + /* Disassociate PE in PELT */ >> + rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number, >> + pe->pe_number, >> OPAL_REMOVE_PE_FROM_DOMAIN); >> + if (rc) >> + pe_warn(pe, "OPAL error %lld remove self from PELTV\n", >> rc); >> +} >> + >> static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct >> pnv_ioda_pe *pe) >> { >> struct pci_dev *parent; >> @@ -827,25 +855,13 @@ static int pnv_ioda_deconfigure_pe(struct >> pnv_phb *phb, struct pnv_ioda_pe *pe) >> for (rid = pe->rid; rid < rid_end; rid++) >> phb->ioda.pe_rmap[rid] = IODA_INVALID_PE; >> >> - /* Release from all parents PELT-V */ >> - while (parent) { >> - struct pci_dn *pdn = pci_get_pdn(parent); >> - if (pdn && pdn->pe_number != IODA_INVALID_PE) { >> - rc = opal_pci_set_peltv(phb->opal_id, pdn- >>> pe_number, >> - pe->pe_number, >> OPAL_REMOVE_PE_FROM_DOMAIN); >> - /* XXX What to do in case of error ? */ >> - } >> - parent = parent->bus->self; >> - } >> - >> - opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number, >> - OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); >> + /* >> + * Release from all parents PELT-V. NPUs don't have a PELTV >> + * table >> + */ >> + if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != >> PNV_PHB_NPU_OCAPI) >> + pnv_ioda_unset_peltv(phb, pe, parent); >> >> - /* Disassociate PE in PELT */ >> - rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number, >> - pe->pe_number, >> OPAL_REMOVE_PE_FROM_DOMAIN); >> - if (rc) >> - pe_warn(pe, "OPAL error %lld remove self from PELTV\n", >> rc); >> rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid, >> bcomp, dcomp, fcomp, OPAL_UNMAP_PE); >> if (rc) >> @@ -3540,6 +3556,8 @@ static void pnv_ioda_release_pe(struct >> pnv_ioda_pe *pe) >> case PNV_PHB_IODA2: >> pnv_pci_ioda2_release_pe_dma(pe); >> break; >> + case PNV_PHB_NPU_OCAPI: >> + break; >> default: >> WARN_ON(1); >> } >> @@ -3592,7 +3610,7 @@ static void pnv_pci_release_device(struct >> pci_dev *pdev) >> pe = &phb->ioda.pe_array[pdn->pe_number]; >> pdn->pe_number = IODA_INVALID_PE; >> >> - WARN_ON(--pe->device_count < 0); >> + WARN_ON((pe->flags != PNV_IODA_PE_DEV) && (--pe->device_count < >> 0)); >> if (pe->device_count == 0) >> pnv_ioda_release_pe(pe); >> } >> @@ -3641,6 +3659,7 @@ static const struct pci_controller_ops >> pnv_npu_ioda_controller_ops = { >> >> static const struct pci_controller_ops >> pnv_npu_ocapi_ioda_controller_ops = { >> .enable_device_hook = pnv_ocapi_enable_device_hook, >> + .release_device = pnv_pci_release_device, >> .window_alignment = pnv_pci_window_alignment, >> .reset_secondary_bus = pnv_pci_reset_secondary_bus, >> .shutdown = pnv_pci_ioda_shutdown,