linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
@ 2020-08-07 10:18 Cédric Le Goater
  2020-09-09  6:16 ` Alexey Kardashevskiy
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Cédric Le Goater @ 2020-08-07 10:18 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Alexey Kardashevskiy, Oliver O'Halloran, linuxppc-dev,
	Cédric Le Goater

When a passthrough IO adapter is removed from a pseries machine using
hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
guest OS to clear all page table entries related to the adapter. If
some are still present, the RTAS call which isolates the PCI slot
returns error 9001 "valid outstanding translations" and the removal of
the IO adapter fails. This is because when the PHBs are scanned, Linux
maps automatically the INTx interrupts in the Linux interrupt number
space but these are never removed.

To solve this problem, we introduce a PPC platform specific
pcibios_remove_bus() routine which clears all interrupt mappings when
the bus is removed. This also clears the associated page table entries
of the ESB pages when using XIVE.

For this purpose, we record the logical interrupt numbers of the
mapped interrupt under the PHB structure and let pcibios_remove_bus()
do the clean up.

Since some PCI adapters, like GPUs, use the "interrupt-map" property
to describe interrupt mappings other than the legacy INTx interrupts,
we can not restrict the size of the mapping array to PCI_NUM_INTX. The
number of interrupt mappings is computed from the "interrupt-map"
property and the mapping array is allocated accordingly.

Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 Changes since v2:

 - merged 2 patches.
 
 arch/powerpc/include/asm/pci-bridge.h |   6 ++
 arch/powerpc/kernel/pci-common.c      | 114 ++++++++++++++++++++++++++
 2 files changed, 120 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b92e81b256e5..ca75cf264ddf 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -48,6 +48,9 @@ struct pci_controller_ops {
 
 /*
  * Structure of a PCI controller (host bridge)
+ *
+ * @irq_count: number of interrupt mappings
+ * @irq_map: interrupt mappings
  */
 struct pci_controller {
 	struct pci_bus *bus;
@@ -127,6 +130,9 @@ struct pci_controller {
 
 	void *private_data;
 	struct npu *npu;
+
+	unsigned int irq_count;
+	unsigned int *irq_map;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index be108616a721..deb831f0ae13 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -353,6 +353,115 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
 	return NULL;
 }
 
+/*
+ * Assumption is made on the interrupt parent. All interrupt-map
+ * entries are considered to have the same parent.
+ */
+static int pcibios_irq_map_count(struct pci_controller *phb)
+{
+	const __be32 *imap;
+	int imaplen;
+	struct device_node *parent;
+	u32 intsize, addrsize, parintsize, paraddrsize;
+
+	if (of_property_read_u32(phb->dn, "#interrupt-cells", &intsize))
+		return 0;
+	if (of_property_read_u32(phb->dn, "#address-cells", &addrsize))
+		return 0;
+
+	imap = of_get_property(phb->dn, "interrupt-map", &imaplen);
+	if (!imap) {
+		pr_debug("%pOF : no interrupt-map\n", phb->dn);
+		return 0;
+	}
+	imaplen /= sizeof(u32);
+	pr_debug("%pOF : imaplen=%d\n", phb->dn, imaplen);
+
+	if (imaplen < (addrsize + intsize + 1))
+		return 0;
+
+	imap += intsize + addrsize;
+	parent = of_find_node_by_phandle(be32_to_cpup(imap));
+	if (!parent) {
+		pr_debug("%pOF : no imap parent found !\n", phb->dn);
+		return 0;
+	}
+
+	if (of_property_read_u32(parent, "#interrupt-cells", &parintsize)) {
+		pr_debug("%pOF : parent lacks #interrupt-cells!\n", phb->dn);
+		return 0;
+	}
+
+	if (of_property_read_u32(parent, "#address-cells", &paraddrsize))
+		paraddrsize = 0;
+
+	return imaplen / (addrsize + intsize + 1 + paraddrsize + parintsize);
+}
+
+static void pcibios_irq_map_init(struct pci_controller *phb)
+{
+	phb->irq_count = pcibios_irq_map_count(phb);
+	if (phb->irq_count < PCI_NUM_INTX)
+		phb->irq_count = PCI_NUM_INTX;
+
+	pr_debug("%pOF : interrupt map #%d\n", phb->dn, phb->irq_count);
+
+	phb->irq_map = kcalloc(phb->irq_count, sizeof(unsigned int),
+			       GFP_KERNEL);
+}
+
+static void pci_irq_map_register(struct pci_dev *pdev, unsigned int virq)
+{
+	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
+	int i;
+
+	if (!phb->irq_map)
+		return;
+
+	for (i = 0; i < phb->irq_count; i++) {
+		/*
+		 * Look for an empty or an equivalent slot, as INTx
+		 * interrupts can be shared between adapters.
+		 */
+		if (phb->irq_map[i] == virq || !phb->irq_map[i]) {
+			phb->irq_map[i] = virq;
+			break;
+		}
+	}
+
+	if (i == phb->irq_count)
+		pr_err("PCI:%s all platform interrupts mapped\n",
+		       pci_name(pdev));
+}
+
+/*
+ * Clearing the mapped interrupts will also clear the underlying
+ * mappings of the ESB pages of the interrupts when under XIVE. It is
+ * a requirement of PowerVM to clear all memory mappings before
+ * removing a PHB.
+ */
+static void pci_irq_map_dispose(struct pci_bus *bus)
+{
+	struct pci_controller *phb = pci_bus_to_host(bus);
+	int i;
+
+	if (!phb->irq_map)
+		return;
+
+	pr_debug("PCI: Clearing interrupt mappings for PHB %04x:%02x...\n",
+		 pci_domain_nr(bus), bus->number);
+	for (i = 0; i < phb->irq_count; i++)
+		irq_dispose_mapping(phb->irq_map[i]);
+
+	kfree(phb->irq_map);
+}
+
+void pcibios_remove_bus(struct pci_bus *bus)
+{
+	pci_irq_map_dispose(bus);
+}
+EXPORT_SYMBOL_GPL(pcibios_remove_bus);
+
 /*
  * Reads the interrupt pin to determine if interrupt is use by card.
  * If the interrupt is used, then gets the interrupt line from the
@@ -401,6 +510,8 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
 
 	pci_dev->irq = virq;
 
+	/* Record all interrut mappings for later removal of a PHB */
+	pci_irq_map_register(pci_dev, virq);
 	return 0;
 }
 
@@ -1554,6 +1665,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
 
 	pr_debug("PCI: Scanning PHB %pOF\n", node);
 
+	/* Allocate interrupt mappings array */
+	pcibios_irq_map_init(hose);
+
 	/* Get some IO space for the new PHB */
 	pcibios_setup_phb_io_space(hose);
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-08-07 10:18 [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed Cédric Le Goater
@ 2020-09-09  6:16 ` Alexey Kardashevskiy
  2020-09-17 11:27 ` Michael Ellerman
  2020-09-23  0:33 ` Qian Cai
  2 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-09  6:16 UTC (permalink / raw)
  To: Cédric Le Goater, Michael Ellerman
  Cc: Oliver O'Halloran, linuxppc-dev



On 07/08/2020 20:18, Cédric Le Goater wrote:
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
> guest OS to clear all page table entries related to the adapter. If
> some are still present, the RTAS call which isolates the PCI slot
> returns error 9001 "valid outstanding translations" and the removal of
> the IO adapter fails. This is because when the PHBs are scanned, Linux
> maps automatically the INTx interrupts in the Linux interrupt number
> space but these are never removed.
> 
> To solve this problem, we introduce a PPC platform specific
> pcibios_remove_bus() routine which clears all interrupt mappings when
> the bus is removed. This also clears the associated page table entries
> of the ESB pages when using XIVE.
> 
> For this purpose, we record the logical interrupt numbers of the
> mapped interrupt under the PHB structure and let pcibios_remove_bus()
> do the clean up.
> 
> Since some PCI adapters, like GPUs, use the "interrupt-map" property
> to describe interrupt mappings other than the legacy INTx interrupts,
> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
> number of interrupt mappings is computed from the "interrupt-map"
> property and the mapping array is allocated accordingly.
> 
> Cc: "Oliver O'Halloran" <oohall@gmail.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

I thought we could reuse some of the common OF code for the DT parsing
but we cannot (easily) so it is good as it is:

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
> 
>  Changes since v2:
> 
>  - merged 2 patches.
>  
>  arch/powerpc/include/asm/pci-bridge.h |   6 ++
>  arch/powerpc/kernel/pci-common.c      | 114 ++++++++++++++++++++++++++
>  2 files changed, 120 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index b92e81b256e5..ca75cf264ddf 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -48,6 +48,9 @@ struct pci_controller_ops {
>  
>  /*
>   * Structure of a PCI controller (host bridge)
> + *
> + * @irq_count: number of interrupt mappings
> + * @irq_map: interrupt mappings
>   */
>  struct pci_controller {
>  	struct pci_bus *bus;
> @@ -127,6 +130,9 @@ struct pci_controller {
>  
>  	void *private_data;
>  	struct npu *npu;
> +
> +	unsigned int irq_count;
> +	unsigned int *irq_map;
>  };
>  
>  /* These are used for config access before all the PCI probing
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index be108616a721..deb831f0ae13 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -353,6 +353,115 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
>  	return NULL;
>  }
>  
> +/*
> + * Assumption is made on the interrupt parent. All interrupt-map
> + * entries are considered to have the same parent.
> + */
> +static int pcibios_irq_map_count(struct pci_controller *phb)
> +{
> +	const __be32 *imap;
> +	int imaplen;
> +	struct device_node *parent;
> +	u32 intsize, addrsize, parintsize, paraddrsize;
> +
> +	if (of_property_read_u32(phb->dn, "#interrupt-cells", &intsize))
> +		return 0;
> +	if (of_property_read_u32(phb->dn, "#address-cells", &addrsize))
> +		return 0;
> +
> +	imap = of_get_property(phb->dn, "interrupt-map", &imaplen);
> +	if (!imap) {
> +		pr_debug("%pOF : no interrupt-map\n", phb->dn);
> +		return 0;
> +	}
> +	imaplen /= sizeof(u32);
> +	pr_debug("%pOF : imaplen=%d\n", phb->dn, imaplen);
> +
> +	if (imaplen < (addrsize + intsize + 1))
> +		return 0;
> +
> +	imap += intsize + addrsize;
> +	parent = of_find_node_by_phandle(be32_to_cpup(imap));
> +	if (!parent) {
> +		pr_debug("%pOF : no imap parent found !\n", phb->dn);
> +		return 0;
> +	}
> +
> +	if (of_property_read_u32(parent, "#interrupt-cells", &parintsize)) {
> +		pr_debug("%pOF : parent lacks #interrupt-cells!\n", phb->dn);
> +		return 0;
> +	}
> +
> +	if (of_property_read_u32(parent, "#address-cells", &paraddrsize))
> +		paraddrsize = 0;
> +
> +	return imaplen / (addrsize + intsize + 1 + paraddrsize + parintsize);
> +}
> +
> +static void pcibios_irq_map_init(struct pci_controller *phb)
> +{
> +	phb->irq_count = pcibios_irq_map_count(phb);
> +	if (phb->irq_count < PCI_NUM_INTX)
> +		phb->irq_count = PCI_NUM_INTX;
> +
> +	pr_debug("%pOF : interrupt map #%d\n", phb->dn, phb->irq_count);
> +
> +	phb->irq_map = kcalloc(phb->irq_count, sizeof(unsigned int),
> +			       GFP_KERNEL);
> +}
> +
> +static void pci_irq_map_register(struct pci_dev *pdev, unsigned int virq)
> +{
> +	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
> +	int i;
> +
> +	if (!phb->irq_map)
> +		return;
> +
> +	for (i = 0; i < phb->irq_count; i++) {
> +		/*
> +		 * Look for an empty or an equivalent slot, as INTx
> +		 * interrupts can be shared between adapters.
> +		 */
> +		if (phb->irq_map[i] == virq || !phb->irq_map[i]) {
> +			phb->irq_map[i] = virq;
> +			break;
> +		}
> +	}
> +
> +	if (i == phb->irq_count)
> +		pr_err("PCI:%s all platform interrupts mapped\n",
> +		       pci_name(pdev));
> +}
> +
> +/*
> + * Clearing the mapped interrupts will also clear the underlying
> + * mappings of the ESB pages of the interrupts when under XIVE. It is
> + * a requirement of PowerVM to clear all memory mappings before
> + * removing a PHB.
> + */
> +static void pci_irq_map_dispose(struct pci_bus *bus)
> +{
> +	struct pci_controller *phb = pci_bus_to_host(bus);
> +	int i;
> +
> +	if (!phb->irq_map)
> +		return;
> +
> +	pr_debug("PCI: Clearing interrupt mappings for PHB %04x:%02x...\n",
> +		 pci_domain_nr(bus), bus->number);
> +	for (i = 0; i < phb->irq_count; i++)
> +		irq_dispose_mapping(phb->irq_map[i]);
> +
> +	kfree(phb->irq_map);
> +}
> +
> +void pcibios_remove_bus(struct pci_bus *bus)
> +{
> +	pci_irq_map_dispose(bus);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_remove_bus);
> +
>  /*
>   * Reads the interrupt pin to determine if interrupt is use by card.
>   * If the interrupt is used, then gets the interrupt line from the
> @@ -401,6 +510,8 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>  
>  	pci_dev->irq = virq;
>  
> +	/* Record all interrut mappings for later removal of a PHB */
> +	pci_irq_map_register(pci_dev, virq);
>  	return 0;
>  }
>  
> @@ -1554,6 +1665,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
>  
>  	pr_debug("PCI: Scanning PHB %pOF\n", node);
>  
> +	/* Allocate interrupt mappings array */
> +	pcibios_irq_map_init(hose);
> +
>  	/* Get some IO space for the new PHB */
>  	pcibios_setup_phb_io_space(hose);
>  
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-08-07 10:18 [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed Cédric Le Goater
  2020-09-09  6:16 ` Alexey Kardashevskiy
@ 2020-09-17 11:27 ` Michael Ellerman
  2020-09-23  0:33 ` Qian Cai
  2 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2020-09-17 11:27 UTC (permalink / raw)
  To: Michael Ellerman, Cédric Le Goater
  Cc: Alexey Kardashevskiy, Oliver O'Halloran, linuxppc-dev

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 793 bytes --]

On Fri, 7 Aug 2020 12:18:54 +0200, Cédric Le Goater wrote:
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
> guest OS to clear all page table entries related to the adapter. If
> some are still present, the RTAS call which isolates the PCI slot
> returns error 9001 "valid outstanding translations" and the removal of
> the IO adapter fails. This is because when the PHBs are scanned, Linux
> maps automatically the INTx interrupts in the Linux interrupt number
> space but these are never removed.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
      https://git.kernel.org/powerpc/c/3a3181e16fbde752007759f8759d25e0ff1fc425

cheers

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-08-07 10:18 [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed Cédric Le Goater
  2020-09-09  6:16 ` Alexey Kardashevskiy
  2020-09-17 11:27 ` Michael Ellerman
@ 2020-09-23  0:33 ` Qian Cai
  2020-09-23  7:06   ` Cédric Le Goater
  2 siblings, 1 reply; 12+ messages in thread
From: Qian Cai @ 2020-09-23  0:33 UTC (permalink / raw)
  To: Cédric Le Goater, Michael Ellerman
  Cc: Stephen Rothwell, Alexey Kardashevskiy, linux-kernel, linux-next,
	Oliver O'Halloran, linuxppc-dev

On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
> guest OS to clear all page table entries related to the adapter. If
> some are still present, the RTAS call which isolates the PCI slot
> returns error 9001 "valid outstanding translations" and the removal of
> the IO adapter fails. This is because when the PHBs are scanned, Linux
> maps automatically the INTx interrupts in the Linux interrupt number
> space but these are never removed.
> 
> To solve this problem, we introduce a PPC platform specific
> pcibios_remove_bus() routine which clears all interrupt mappings when
> the bus is removed. This also clears the associated page table entries
> of the ESB pages when using XIVE.
> 
> For this purpose, we record the logical interrupt numbers of the
> mapped interrupt under the PHB structure and let pcibios_remove_bus()
> do the clean up.
> 
> Since some PCI adapters, like GPUs, use the "interrupt-map" property
> to describe interrupt mappings other than the legacy INTx interrupts,
> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
> number of interrupt mappings is computed from the "interrupt-map"
> property and the mapping array is allocated accordingly.
> 
> Cc: "Oliver O'Halloran" <oohall@gmail.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
this patch.

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

[ 3574.564109][  T965] ata1.00: disabled
[ 3574.580373][T151472] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
[ 3574.581180][T151472] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.581226][T151472] sd 0:0:0:0: [sdb] Stopping disk
[ 3574.581289][T151472] sd 0:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.611424][ T3019] Read-error on swap-device (254:1:849792)
[ 3574.611685][ T3019] Read-error on swap-device (254:1:914944)
[ 3574.611769][ T3019] Read-error on swap-device (254:1:915072)
[ 3574.611838][ T3019] Read-error on swap-device (254:1:915200)
[ 3574.611926][ T3019] Read-error on swap-device (254:1:915328)
[ 3574.612268][ T3084] Read-error on swap-device (254:1:792576)
[ 3574.612342][ T3084] Read-error on swap-device (254:1:792704)
[ 3574.612757][ T2362] Read-error on swap-device (254:1:957440)
[ 3574.612773][ T2905] Read-error on swap-device (254:1:784128)
[ 3574.613015][ T2362] Read-error on swap-device (254:1:957568)
[ 3574.613160][ T2905] Read-error on swap-device (254:1:784256)
[ 3574.613241][ T2362] Read-error on swap-device (254:1:957696)
[ 3574.613342][ T2362] Read-error on swap-device (254:1:957824)
[ 3574.614448][ T3019] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
[ 3574.614663][ T3019] Read-error on swap-device (254:1:961536)
[ 3574.675330][T151844] Read-error on swap-device (254:1:128)
[ 3574.675515][T151844] Read-error on swap-device (254:1:256)
[ 3574.675700][T151844] Read-error on swap-device (254:1:384)
[ 3574.703570][  T971] ata2.00: disabled
[ 3574.710393][T151472] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[ 3574.710864][T151472] sd 1:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.710922][T151472] sd 1:0:0:0: [sda] Stopping disk
[ 3574.711010][T151472] sd 1:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.826569][  T674] dm-0: writeback error on inode 68507862, offset 65536, sector 54281504
[ 3575.117547][ T3366] dm-0: writeback error on inode 68507851, offset 0, sector 54378880
[ 3575.140104][T151472] pci 0004:03:00.0: Removing from iommu group 3
[ 3575.141778][T151472] pci 0004:03     : [PE# fb] Releasing PE
[ 3575.141965][T151472] pci 0004:03     : [PE# fb] Removing DMA window #0
[ 3575.142452][T151472] pci 0004:03     : [PE# fb] Disabling 64-bit DMA bypass
[ 3575.149369][T151472] pci_bus 0004:03: busn_res: [bus 03] is released
[ 3575.150574][T152037] Read-error on swap-device (254:1:35584)
[ 3575.150713][T152037] Read-error on swap-device (254:1:35712)
[ 3575.152632][T152037] Read-error on swap-device (254:1:915584)
[ 3575.152706][T151472] pci_bus 0004:04: busn_res: [bus 04-08] is released
[ 3575.152983][T151472] =============================================================================
[ 3575.153937][T151472] BUG kmalloc-16 (Not tainted): Object already free
[ 3575.153962][T151472] -----------------------------------------------------------------------------
[ 3575.153962][T151472] 
[ 3575.154020][T151472] Disabling lock debugging due to kernel taint
[ 3575.154047][T151472] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=356904 cpu=4 pid=1
[ 3575.154084][T151472] 	__slab_alloc+0xa4/0xf0
[ 3575.154105][T151472] 	__kmalloc+0x294/0x330
[ 3575.154127][T151472] 	pcibios_scan_phb+0x104/0x3e0
[ 3575.154165][T151472] 	pcibios_init+0x84/0x124
[ 3575.154209][T151472] 	do_one_initcall+0xac/0x528
[ 3575.154241][T151472] 	kernel_init_freeable+0x35c/0x3fc
[ 3575.154272][T151472] 	kernel_init+0x24/0x148
[ 3575.154306][T151472] 	ret_from_kernel_thread+0x5c/0x80
[ 3575.154352][T151472] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=0 cpu=7 pid=151472
[ 3575.154387][T151472] 	kfree+0x49c/0x510
[ 3575.154406][T151472] 	pcibios_remove_bus+0x70/0x90
[ 3575.154443][T151472] 	pci_remove_bus+0xe4/0x110
[ 3575.154467][T151472] 	pci_remove_bus_device+0x74/0x170
[ 3575.154503][T151472] 	pci_remove_bus_device+0x4c/0x170
[ 3575.154524][T151472] 	pci_stop_and_remove_bus_device_locked+0x34/0x50
[ 3575.154539][T151472] 	remove_store+0xc0/0xe0
[ 3575.154551][T151472] 	dev_attr_store+0x30/0x50
[ 3575.154573][T151472] 	sysfs_kf_write+0x68/0xb0
[ 3575.154595][T151472] 	kernfs_fop_write+0x114/0x260
[ 3575.154643][T151472] 	vfs_write+0xe4/0x260
[ 3575.154667][T151472] 	ksys_write+0x74/0x130
[ 3575.154692][T151472] 	system_call_exception+0xf8/0x1d0
[ 3575.154728][T151472] 	system_call_common+0xe8/0x218
[ 3575.154788][T151472] INFO: Slab 0x00000000cafdf25c objects=178 used=174 fp=0x0000000020a64b99 flags=0x7fff8000000201
[ 3575.154848][T151472] INFO: Object 0x00000000af116201 @offset=5168 fp=0x0000000000000000
[ 3575.154848][T151472] 
[ 3575.154901][T151472] Redzone 00000000781d3b96: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[ 3575.154968][T151472] Object 00000000af116201: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[ 3575.155012][T151472] Redzone 000000007b8ec00f: bb bb bb bb bb bb bb bb                          ........
[ 3575.155052][T151472] Padding 00000000df7d5e89: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
[ 3575.155107][T151472] CPU: 7 PID: 151472 Comm: trinity-c61 Tainted: G    B             5.9.0-rc6-next-20200922+ #2
[ 3575.155147][T151472] Call Trace:
[ 3575.155171][T151472] [c000001fefdef800] [c000000000644278] dump_stack+0xec/0x144 (unreliable)
[ 3575.155204][T151472] [c000001fefdef840] [c00000000035e9a8] print_trailer+0x278/0x2a0
[ 3575.155220][T151472] [c000001fefdef8d0] [c000000000355d9c] free_debug_processing+0x57c/0x600
[ 3575.155236][T151472] [c000001fefdef9b0] [c000000000356234] __slab_free+0x414/0x5b0
[ 3575.155251][T151472] [c000001fefdefa80] [c00000000035686c] kfree+0x49c/0x510
[ 3575.155267][T151472] [c000001fefdefb10] [c000000000043260] pcibios_remove_bus+0x70/0x90
pci_irq_map_dispose at arch/powerpc/kernel/pci-common.c:456
(inlined by) pcibios_remove_bus at arch/powerpc/kernel/pci-common.c:461
[ 3575.155282][T151472] [c000001fefdefb40] [c00000000066fee4] pci_remove_bus+0xe4/0x110
[ 3575.155309][T151472] [c000001fefdefb70] [c000000000670084] pci_remove_bus_device+0x74/0x170
[ 3575.155348][T151472] [c000001fefdefbb0] [c000000000670070] pci_remove_bus_device+0x60/0x170
[ 3575.155377][T151472] [c000001fefdefbf0] [c0000000006701f4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[ 3575.155395][T151472] [c000001fefdefc20] [c00000000067f5e0] remove_store+0xc0/0xe0
[ 3575.155431][T151472] [c000001fefdefc70] [c0000000006dccd0] dev_attr_store+0x30/0x50
[ 3575.155457][T151472] [c000001fefdefc90] [c00000000049fae8] sysfs_kf_write+0x68/0xb0
[ 3575.155507][T151472] [c000001fefdefcd0] [c00000000049ed14] kernfs_fop_write+0x114/0x260
[ 3575.155553][T151472] [c000001fefdefd20] [c0000000003ab2d4] vfs_write+0xe4/0x260
[ 3575.155592][T151472] [c000001fefdefd70] [c0000000003ab604] ksys_write+0x74/0x130
[ 3575.155630][T151472] [c000001fefdefdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 3575.155671][T151472] [c000001fefdefe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 3575.155715][T151472] FIX kmalloc-16: Object at 0x00000000af116201 not freed
[ 3575.156125][T151472] =============================================================================
[ 3575.156170][T151472] BUG kmalloc-16 (Tainted: G    B            ): Wrong object count. Counter is 174 but counted were 176
[ 3575.156204][T151472] -----------------------------------------------------------------------------
[ 3575.156204][T151472] 
[ 3575.156240][T151472] INFO: Slab 0x00000000cafdf25c objects=178 used=174 fp=0x0000000020a64b99 flags=0x7fff8000000201
[ 3575.156296][T151472] CPU: 7 PID: 151472 Comm: trinity-c61 Tainted: G    B             5.9.0-rc6-next-20200922+ #2
[ 3575.156348][T151472] Call Trace:
[ 3575.156376][T151472] [c000001fefdef6f0] [c000000000644278] dump_stack+0xec/0x144 (unreliable)
[ 3575.156439][T151472] [c000001fefdef730] [c00000000035e688] slab_err+0x78/0xb0
[ 3575.156502][T151472] [c000001fefdef810] [c0000000003552a4] on_freelist+0x364/0x390
[ 3575.156541][T151472] [c000001fefdef8b0] [c000000000355aa8] free_debug_processing+0x288/0x600
[ 3575.156598][T151472] [c000001fefdef990] [c000000000356234] __slab_free+0x414/0x5b0
[ 3575.156644][T151472] [c000001fefdefa60] [c00000000035686c] kfree+0x49c/0x510
[ 3575.156701][T151472] [c000001fefdefaf0] [c0000000002b8a10] kfree_const+0x60/0x80
[ 3575.156738][T151472] [c000001fefdefb10] [c00000000064d3bc] kobject_release+0x7c/0xd0
[ 3575.156784][T151472] [c000001fefdefb50] [c0000000006de0c0] put_device+0x20/0x40
[ 3575.156831][T151472] [c000001fefdefb70] [c00000000067015c] pci_remove_bus_device+0x14c/0x170
[ 3575.156867][T151472] [c000001fefdefbb0] [c000000000670070] pci_remove_bus_device+0x60/0x170
[ 3575.156923][T151472] [c000001fefdefbf0] [c0000000006701f4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[ 3575.156998][T151472] [c000001fefdefc20] [c00000000067f5e0] remove_store+0xc0/0xe0
[ 3575.157058][T151472] [c000001fefdefc70] [c0000000006dccd0] dev_attr_store+0x30/0x50
[ 3575.157114][T151472] [c000001fefdefc90] [c00000000049fae8] sysfs_kf_write+0x68/0xb0
[ 3575.157169][T151472] [c000001fefdefcd0] [c00000000049ed14] kernfs_fop_write+0x114/0x260
[ 3575.157215][T151472] [c000001fefdefd20] [c0000000003ab2d4] vfs_write+0xe4/0x260
[ 3575.157256][T151472] [c000001fefdefd70] [c0000000003ab604] ksys_write+0x74/0x130
[ 3575.157301][T151472] [c000001fefdefdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 3575.157352][T151472] [c000001fefdefe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 3575.157396][T151472] FIX kmalloc-16: Object count adjusted.
[ 3575.157457][T151472] pci_bus 0004:09: busn_res: [bus 09-0d] is released


> ---
> 
>  Changes since v2:
> 
>  - merged 2 patches.
>  
>  arch/powerpc/include/asm/pci-bridge.h |   6 ++
>  arch/powerpc/kernel/pci-common.c      | 114 ++++++++++++++++++++++++++
>  2 files changed, 120 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h
> b/arch/powerpc/include/asm/pci-bridge.h
> index b92e81b256e5..ca75cf264ddf 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -48,6 +48,9 @@ struct pci_controller_ops {
>  
>  /*
>   * Structure of a PCI controller (host bridge)
> + *
> + * @irq_count: number of interrupt mappings
> + * @irq_map: interrupt mappings
>   */
>  struct pci_controller {
>  	struct pci_bus *bus;
> @@ -127,6 +130,9 @@ struct pci_controller {
>  
>  	void *private_data;
>  	struct npu *npu;
> +
> +	unsigned int irq_count;
> +	unsigned int *irq_map;
>  };
>  
>  /* These are used for config access before all the PCI probing
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-
> common.c
> index be108616a721..deb831f0ae13 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -353,6 +353,115 @@ struct pci_controller
> *pci_find_controller_for_domain(int domain_nr)
>  	return NULL;
>  }
>  
> +/*
> + * Assumption is made on the interrupt parent. All interrupt-map
> + * entries are considered to have the same parent.
> + */
> +static int pcibios_irq_map_count(struct pci_controller *phb)
> +{
> +	const __be32 *imap;
> +	int imaplen;
> +	struct device_node *parent;
> +	u32 intsize, addrsize, parintsize, paraddrsize;
> +
> +	if (of_property_read_u32(phb->dn, "#interrupt-cells", &intsize))
> +		return 0;
> +	if (of_property_read_u32(phb->dn, "#address-cells", &addrsize))
> +		return 0;
> +
> +	imap = of_get_property(phb->dn, "interrupt-map", &imaplen);
> +	if (!imap) {
> +		pr_debug("%pOF : no interrupt-map\n", phb->dn);
> +		return 0;
> +	}
> +	imaplen /= sizeof(u32);
> +	pr_debug("%pOF : imaplen=%d\n", phb->dn, imaplen);
> +
> +	if (imaplen < (addrsize + intsize + 1))
> +		return 0;
> +
> +	imap += intsize + addrsize;
> +	parent = of_find_node_by_phandle(be32_to_cpup(imap));
> +	if (!parent) {
> +		pr_debug("%pOF : no imap parent found !\n", phb->dn);
> +		return 0;
> +	}
> +
> +	if (of_property_read_u32(parent, "#interrupt-cells", &parintsize)) {
> +		pr_debug("%pOF : parent lacks #interrupt-cells!\n", phb->dn);
> +		return 0;
> +	}
> +
> +	if (of_property_read_u32(parent, "#address-cells", &paraddrsize))
> +		paraddrsize = 0;
> +
> +	return imaplen / (addrsize + intsize + 1 + paraddrsize + parintsize);
> +}
> +
> +static void pcibios_irq_map_init(struct pci_controller *phb)
> +{
> +	phb->irq_count = pcibios_irq_map_count(phb);
> +	if (phb->irq_count < PCI_NUM_INTX)
> +		phb->irq_count = PCI_NUM_INTX;
> +
> +	pr_debug("%pOF : interrupt map #%d\n", phb->dn, phb->irq_count);
> +
> +	phb->irq_map = kcalloc(phb->irq_count, sizeof(unsigned int),
> +			       GFP_KERNEL);
> +}
> +
> +static void pci_irq_map_register(struct pci_dev *pdev, unsigned int virq)
> +{
> +	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
> +	int i;
> +
> +	if (!phb->irq_map)
> +		return;
> +
> +	for (i = 0; i < phb->irq_count; i++) {
> +		/*
> +		 * Look for an empty or an equivalent slot, as INTx
> +		 * interrupts can be shared between adapters.
> +		 */
> +		if (phb->irq_map[i] == virq || !phb->irq_map[i]) {
> +			phb->irq_map[i] = virq;
> +			break;
> +		}
> +	}
> +
> +	if (i == phb->irq_count)
> +		pr_err("PCI:%s all platform interrupts mapped\n",
> +		       pci_name(pdev));
> +}
> +
> +/*
> + * Clearing the mapped interrupts will also clear the underlying
> + * mappings of the ESB pages of the interrupts when under XIVE. It is
> + * a requirement of PowerVM to clear all memory mappings before
> + * removing a PHB.
> + */
> +static void pci_irq_map_dispose(struct pci_bus *bus)
> +{
> +	struct pci_controller *phb = pci_bus_to_host(bus);
> +	int i;
> +
> +	if (!phb->irq_map)
> +		return;
> +
> +	pr_debug("PCI: Clearing interrupt mappings for PHB %04x:%02x...\n",
> +		 pci_domain_nr(bus), bus->number);
> +	for (i = 0; i < phb->irq_count; i++)
> +		irq_dispose_mapping(phb->irq_map[i]);
> +
> +	kfree(phb->irq_map);
> +}
> +
> +void pcibios_remove_bus(struct pci_bus *bus)
> +{
> +	pci_irq_map_dispose(bus);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_remove_bus);
> +
>  /*
>   * Reads the interrupt pin to determine if interrupt is use by card.
>   * If the interrupt is used, then gets the interrupt line from the
> @@ -401,6 +510,8 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>  
>  	pci_dev->irq = virq;
>  
> +	/* Record all interrut mappings for later removal of a PHB */
> +	pci_irq_map_register(pci_dev, virq);
>  	return 0;
>  }
>  
> @@ -1554,6 +1665,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
>  
>  	pr_debug("PCI: Scanning PHB %pOF\n", node);
>  
> +	/* Allocate interrupt mappings array */
> +	pcibios_irq_map_init(hose);
> +
>  	/* Get some IO space for the new PHB */
>  	pcibios_setup_phb_io_space(hose);
>  


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-09-23  0:33 ` Qian Cai
@ 2020-09-23  7:06   ` Cédric Le Goater
  2020-09-24  5:11     ` Alexey Kardashevskiy
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Cédric Le Goater @ 2020-09-23  7:06 UTC (permalink / raw)
  To: Qian Cai, Michael Ellerman
  Cc: Stephen Rothwell, Alexey Kardashevskiy, linux-kernel, linux-next,
	Oliver O'Halloran, linuxppc-dev

On 9/23/20 2:33 AM, Qian Cai wrote:
> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>> When a passthrough IO adapter is removed from a pseries machine using
>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>> guest OS to clear all page table entries related to the adapter. If
>> some are still present, the RTAS call which isolates the PCI slot
>> returns error 9001 "valid outstanding translations" and the removal of
>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>> maps automatically the INTx interrupts in the Linux interrupt number
>> space but these are never removed.
>>
>> To solve this problem, we introduce a PPC platform specific
>> pcibios_remove_bus() routine which clears all interrupt mappings when
>> the bus is removed. This also clears the associated page table entries
>> of the ESB pages when using XIVE.
>>
>> For this purpose, we record the logical interrupt numbers of the
>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>> do the clean up.
>>
>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>> to describe interrupt mappings other than the legacy INTx interrupts,
>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>> number of interrupt mappings is computed from the "interrupt-map"
>> property and the mapping array is allocated accordingly.
>>
>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
> this patch.
> 
> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

OK. The patch is missing a NULL assignement after kfree() and that
might be the issue. 

I did try PHB removal under PowerNV, so I would like to understand 
how we managed to remove twice the PCI bus and possibly reproduce. 
Any chance we could grab what the syscall fuzzer (syzkaller) did ? 


Thanks,

C. 

> 
> [ 3574.564109][  T965] ata1.00: disabled
> [ 3574.580373][T151472] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
> [ 3574.581180][T151472] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 3574.581226][T151472] sd 0:0:0:0: [sdb] Stopping disk
> [ 3574.581289][T151472] sd 0:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 3574.611424][ T3019] Read-error on swap-device (254:1:849792)
> [ 3574.611685][ T3019] Read-error on swap-device (254:1:914944)
> [ 3574.611769][ T3019] Read-error on swap-device (254:1:915072)
> [ 3574.611838][ T3019] Read-error on swap-device (254:1:915200)
> [ 3574.611926][ T3019] Read-error on swap-device (254:1:915328)
> [ 3574.612268][ T3084] Read-error on swap-device (254:1:792576)
> [ 3574.612342][ T3084] Read-error on swap-device (254:1:792704)
> [ 3574.612757][ T2362] Read-error on swap-device (254:1:957440)
> [ 3574.612773][ T2905] Read-error on swap-device (254:1:784128)
> [ 3574.613015][ T2362] Read-error on swap-device (254:1:957568)
> [ 3574.613160][ T2905] Read-error on swap-device (254:1:784256)
> [ 3574.613241][ T2362] Read-error on swap-device (254:1:957696)
> [ 3574.613342][ T2362] Read-error on swap-device (254:1:957824)
> [ 3574.614448][ T3019] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
> [ 3574.614663][ T3019] Read-error on swap-device (254:1:961536)
> [ 3574.675330][T151844] Read-error on swap-device (254:1:128)
> [ 3574.675515][T151844] Read-error on swap-device (254:1:256)
> [ 3574.675700][T151844] Read-error on swap-device (254:1:384)
> [ 3574.703570][  T971] ata2.00: disabled
> [ 3574.710393][T151472] sd 1:0:0:0: [sda] Synchronizing SCSI cache
> [ 3574.710864][T151472] sd 1:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 3574.710922][T151472] sd 1:0:0:0: [sda] Stopping disk
> [ 3574.711010][T151472] sd 1:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 3574.826569][  T674] dm-0: writeback error on inode 68507862, offset 65536, sector 54281504
> [ 3575.117547][ T3366] dm-0: writeback error on inode 68507851, offset 0, sector 54378880
> [ 3575.140104][T151472] pci 0004:03:00.0: Removing from iommu group 3
> [ 3575.141778][T151472] pci 0004:03     : [PE# fb] Releasing PE
> [ 3575.141965][T151472] pci 0004:03     : [PE# fb] Removing DMA window #0
> [ 3575.142452][T151472] pci 0004:03     : [PE# fb] Disabling 64-bit DMA bypass
> [ 3575.149369][T151472] pci_bus 0004:03: busn_res: [bus 03] is released
> [ 3575.150574][T152037] Read-error on swap-device (254:1:35584)
> [ 3575.150713][T152037] Read-error on swap-device (254:1:35712)
> [ 3575.152632][T152037] Read-error on swap-device (254:1:915584)
> [ 3575.152706][T151472] pci_bus 0004:04: busn_res: [bus 04-08] is released
> [ 3575.152983][T151472] =============================================================================
> [ 3575.153937][T151472] BUG kmalloc-16 (Not tainted): Object already free
> [ 3575.153962][T151472] -----------------------------------------------------------------------------
> [ 3575.153962][T151472] 
> [ 3575.154020][T151472] Disabling lock debugging due to kernel taint
> [ 3575.154047][T151472] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=356904 cpu=4 pid=1
> [ 3575.154084][T151472] 	__slab_alloc+0xa4/0xf0
> [ 3575.154105][T151472] 	__kmalloc+0x294/0x330
> [ 3575.154127][T151472] 	pcibios_scan_phb+0x104/0x3e0
> [ 3575.154165][T151472] 	pcibios_init+0x84/0x124
> [ 3575.154209][T151472] 	do_one_initcall+0xac/0x528
> [ 3575.154241][T151472] 	kernel_init_freeable+0x35c/0x3fc
> [ 3575.154272][T151472] 	kernel_init+0x24/0x148
> [ 3575.154306][T151472] 	ret_from_kernel_thread+0x5c/0x80
> [ 3575.154352][T151472] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=0 cpu=7 pid=151472
> [ 3575.154387][T151472] 	kfree+0x49c/0x510
> [ 3575.154406][T151472] 	pcibios_remove_bus+0x70/0x90
> [ 3575.154443][T151472] 	pci_remove_bus+0xe4/0x110
> [ 3575.154467][T151472] 	pci_remove_bus_device+0x74/0x170
> [ 3575.154503][T151472] 	pci_remove_bus_device+0x4c/0x170
> [ 3575.154524][T151472] 	pci_stop_and_remove_bus_device_locked+0x34/0x50
> [ 3575.154539][T151472] 	remove_store+0xc0/0xe0
> [ 3575.154551][T151472] 	dev_attr_store+0x30/0x50
> [ 3575.154573][T151472] 	sysfs_kf_write+0x68/0xb0
> [ 3575.154595][T151472] 	kernfs_fop_write+0x114/0x260
> [ 3575.154643][T151472] 	vfs_write+0xe4/0x260
> [ 3575.154667][T151472] 	ksys_write+0x74/0x130
> [ 3575.154692][T151472] 	system_call_exception+0xf8/0x1d0
> [ 3575.154728][T151472] 	system_call_common+0xe8/0x218
> [ 3575.154788][T151472] INFO: Slab 0x00000000cafdf25c objects=178 used=174 fp=0x0000000020a64b99 flags=0x7fff8000000201
> [ 3575.154848][T151472] INFO: Object 0x00000000af116201 @offset=5168 fp=0x0000000000000000
> [ 3575.154848][T151472] 
> [ 3575.154901][T151472] Redzone 00000000781d3b96: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
> [ 3575.154968][T151472] Object 00000000af116201: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
> [ 3575.155012][T151472] Redzone 000000007b8ec00f: bb bb bb bb bb bb bb bb                          ........
> [ 3575.155052][T151472] Padding 00000000df7d5e89: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
> [ 3575.155107][T151472] CPU: 7 PID: 151472 Comm: trinity-c61 Tainted: G    B             5.9.0-rc6-next-20200922+ #2
> [ 3575.155147][T151472] Call Trace:
> [ 3575.155171][T151472] [c000001fefdef800] [c000000000644278] dump_stack+0xec/0x144 (unreliable)
> [ 3575.155204][T151472] [c000001fefdef840] [c00000000035e9a8] print_trailer+0x278/0x2a0
> [ 3575.155220][T151472] [c000001fefdef8d0] [c000000000355d9c] free_debug_processing+0x57c/0x600
> [ 3575.155236][T151472] [c000001fefdef9b0] [c000000000356234] __slab_free+0x414/0x5b0
> [ 3575.155251][T151472] [c000001fefdefa80] [c00000000035686c] kfree+0x49c/0x510
> [ 3575.155267][T151472] [c000001fefdefb10] [c000000000043260] pcibios_remove_bus+0x70/0x90
> pci_irq_map_dispose at arch/powerpc/kernel/pci-common.c:456
> (inlined by) pcibios_remove_bus at arch/powerpc/kernel/pci-common.c:461
> [ 3575.155282][T151472] [c000001fefdefb40] [c00000000066fee4] pci_remove_bus+0xe4/0x110
> [ 3575.155309][T151472] [c000001fefdefb70] [c000000000670084] pci_remove_bus_device+0x74/0x170
> [ 3575.155348][T151472] [c000001fefdefbb0] [c000000000670070] pci_remove_bus_device+0x60/0x170
> [ 3575.155377][T151472] [c000001fefdefbf0] [c0000000006701f4] pci_stop_and_remove_bus_device_locked+0x34/0x50
> [ 3575.155395][T151472] [c000001fefdefc20] [c00000000067f5e0] remove_store+0xc0/0xe0
> [ 3575.155431][T151472] [c000001fefdefc70] [c0000000006dccd0] dev_attr_store+0x30/0x50
> [ 3575.155457][T151472] [c000001fefdefc90] [c00000000049fae8] sysfs_kf_write+0x68/0xb0
> [ 3575.155507][T151472] [c000001fefdefcd0] [c00000000049ed14] kernfs_fop_write+0x114/0x260
> [ 3575.155553][T151472] [c000001fefdefd20] [c0000000003ab2d4] vfs_write+0xe4/0x260
> [ 3575.155592][T151472] [c000001fefdefd70] [c0000000003ab604] ksys_write+0x74/0x130
> [ 3575.155630][T151472] [c000001fefdefdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
> [ 3575.155671][T151472] [c000001fefdefe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 3575.155715][T151472] FIX kmalloc-16: Object at 0x00000000af116201 not freed
> [ 3575.156125][T151472] =============================================================================
> [ 3575.156170][T151472] BUG kmalloc-16 (Tainted: G    B            ): Wrong object count. Counter is 174 but counted were 176
> [ 3575.156204][T151472] -----------------------------------------------------------------------------
> [ 3575.156204][T151472] 
> [ 3575.156240][T151472] INFO: Slab 0x00000000cafdf25c objects=178 used=174 fp=0x0000000020a64b99 flags=0x7fff8000000201
> [ 3575.156296][T151472] CPU: 7 PID: 151472 Comm: trinity-c61 Tainted: G    B             5.9.0-rc6-next-20200922+ #2
> [ 3575.156348][T151472] Call Trace:
> [ 3575.156376][T151472] [c000001fefdef6f0] [c000000000644278] dump_stack+0xec/0x144 (unreliable)
> [ 3575.156439][T151472] [c000001fefdef730] [c00000000035e688] slab_err+0x78/0xb0
> [ 3575.156502][T151472] [c000001fefdef810] [c0000000003552a4] on_freelist+0x364/0x390
> [ 3575.156541][T151472] [c000001fefdef8b0] [c000000000355aa8] free_debug_processing+0x288/0x600
> [ 3575.156598][T151472] [c000001fefdef990] [c000000000356234] __slab_free+0x414/0x5b0
> [ 3575.156644][T151472] [c000001fefdefa60] [c00000000035686c] kfree+0x49c/0x510
> [ 3575.156701][T151472] [c000001fefdefaf0] [c0000000002b8a10] kfree_const+0x60/0x80
> [ 3575.156738][T151472] [c000001fefdefb10] [c00000000064d3bc] kobject_release+0x7c/0xd0
> [ 3575.156784][T151472] [c000001fefdefb50] [c0000000006de0c0] put_device+0x20/0x40
> [ 3575.156831][T151472] [c000001fefdefb70] [c00000000067015c] pci_remove_bus_device+0x14c/0x170
> [ 3575.156867][T151472] [c000001fefdefbb0] [c000000000670070] pci_remove_bus_device+0x60/0x170
> [ 3575.156923][T151472] [c000001fefdefbf0] [c0000000006701f4] pci_stop_and_remove_bus_device_locked+0x34/0x50
> [ 3575.156998][T151472] [c000001fefdefc20] [c00000000067f5e0] remove_store+0xc0/0xe0
> [ 3575.157058][T151472] [c000001fefdefc70] [c0000000006dccd0] dev_attr_store+0x30/0x50
> [ 3575.157114][T151472] [c000001fefdefc90] [c00000000049fae8] sysfs_kf_write+0x68/0xb0
> [ 3575.157169][T151472] [c000001fefdefcd0] [c00000000049ed14] kernfs_fop_write+0x114/0x260
> [ 3575.157215][T151472] [c000001fefdefd20] [c0000000003ab2d4] vfs_write+0xe4/0x260
> [ 3575.157256][T151472] [c000001fefdefd70] [c0000000003ab604] ksys_write+0x74/0x130
> [ 3575.157301][T151472] [c000001fefdefdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
> [ 3575.157352][T151472] [c000001fefdefe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 3575.157396][T151472] FIX kmalloc-16: Object count adjusted.
> [ 3575.157457][T151472] pci_bus 0004:09: busn_res: [bus 09-0d] is released
> 
> 
>> ---
>>
>>  Changes since v2:
>>
>>  - merged 2 patches.
>>  
>>  arch/powerpc/include/asm/pci-bridge.h |   6 ++
>>  arch/powerpc/kernel/pci-common.c      | 114 ++++++++++++++++++++++++++
>>  2 files changed, 120 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/pci-bridge.h
>> b/arch/powerpc/include/asm/pci-bridge.h
>> index b92e81b256e5..ca75cf264ddf 100644
>> --- a/arch/powerpc/include/asm/pci-bridge.h
>> +++ b/arch/powerpc/include/asm/pci-bridge.h
>> @@ -48,6 +48,9 @@ struct pci_controller_ops {
>>  
>>  /*
>>   * Structure of a PCI controller (host bridge)
>> + *
>> + * @irq_count: number of interrupt mappings
>> + * @irq_map: interrupt mappings
>>   */
>>  struct pci_controller {
>>  	struct pci_bus *bus;
>> @@ -127,6 +130,9 @@ struct pci_controller {
>>  
>>  	void *private_data;
>>  	struct npu *npu;
>> +
>> +	unsigned int irq_count;
>> +	unsigned int *irq_map;
>>  };
>>  
>>  /* These are used for config access before all the PCI probing
>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-
>> common.c
>> index be108616a721..deb831f0ae13 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -353,6 +353,115 @@ struct pci_controller
>> *pci_find_controller_for_domain(int domain_nr)
>>  	return NULL;
>>  }
>>  
>> +/*
>> + * Assumption is made on the interrupt parent. All interrupt-map
>> + * entries are considered to have the same parent.
>> + */
>> +static int pcibios_irq_map_count(struct pci_controller *phb)
>> +{
>> +	const __be32 *imap;
>> +	int imaplen;
>> +	struct device_node *parent;
>> +	u32 intsize, addrsize, parintsize, paraddrsize;
>> +
>> +	if (of_property_read_u32(phb->dn, "#interrupt-cells", &intsize))
>> +		return 0;
>> +	if (of_property_read_u32(phb->dn, "#address-cells", &addrsize))
>> +		return 0;
>> +
>> +	imap = of_get_property(phb->dn, "interrupt-map", &imaplen);
>> +	if (!imap) {
>> +		pr_debug("%pOF : no interrupt-map\n", phb->dn);
>> +		return 0;
>> +	}
>> +	imaplen /= sizeof(u32);
>> +	pr_debug("%pOF : imaplen=%d\n", phb->dn, imaplen);
>> +
>> +	if (imaplen < (addrsize + intsize + 1))
>> +		return 0;
>> +
>> +	imap += intsize + addrsize;
>> +	parent = of_find_node_by_phandle(be32_to_cpup(imap));
>> +	if (!parent) {
>> +		pr_debug("%pOF : no imap parent found !\n", phb->dn);
>> +		return 0;
>> +	}
>> +
>> +	if (of_property_read_u32(parent, "#interrupt-cells", &parintsize)) {
>> +		pr_debug("%pOF : parent lacks #interrupt-cells!\n", phb->dn);
>> +		return 0;
>> +	}
>> +
>> +	if (of_property_read_u32(parent, "#address-cells", &paraddrsize))
>> +		paraddrsize = 0;
>> +
>> +	return imaplen / (addrsize + intsize + 1 + paraddrsize + parintsize);
>> +}
>> +
>> +static void pcibios_irq_map_init(struct pci_controller *phb)
>> +{
>> +	phb->irq_count = pcibios_irq_map_count(phb);
>> +	if (phb->irq_count < PCI_NUM_INTX)
>> +		phb->irq_count = PCI_NUM_INTX;
>> +
>> +	pr_debug("%pOF : interrupt map #%d\n", phb->dn, phb->irq_count);
>> +
>> +	phb->irq_map = kcalloc(phb->irq_count, sizeof(unsigned int),
>> +			       GFP_KERNEL);
>> +}
>> +
>> +static void pci_irq_map_register(struct pci_dev *pdev, unsigned int virq)
>> +{
>> +	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
>> +	int i;
>> +
>> +	if (!phb->irq_map)
>> +		return;
>> +
>> +	for (i = 0; i < phb->irq_count; i++) {
>> +		/*
>> +		 * Look for an empty or an equivalent slot, as INTx
>> +		 * interrupts can be shared between adapters.
>> +		 */
>> +		if (phb->irq_map[i] == virq || !phb->irq_map[i]) {
>> +			phb->irq_map[i] = virq;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (i == phb->irq_count)
>> +		pr_err("PCI:%s all platform interrupts mapped\n",
>> +		       pci_name(pdev));
>> +}
>> +
>> +/*
>> + * Clearing the mapped interrupts will also clear the underlying
>> + * mappings of the ESB pages of the interrupts when under XIVE. It is
>> + * a requirement of PowerVM to clear all memory mappings before
>> + * removing a PHB.
>> + */
>> +static void pci_irq_map_dispose(struct pci_bus *bus)
>> +{
>> +	struct pci_controller *phb = pci_bus_to_host(bus);
>> +	int i;
>> +
>> +	if (!phb->irq_map)
>> +		return;
>> +
>> +	pr_debug("PCI: Clearing interrupt mappings for PHB %04x:%02x...\n",
>> +		 pci_domain_nr(bus), bus->number);
>> +	for (i = 0; i < phb->irq_count; i++)
>> +		irq_dispose_mapping(phb->irq_map[i]);
>> +
>> +	kfree(phb->irq_map);
>> +}
>> +
>> +void pcibios_remove_bus(struct pci_bus *bus)
>> +{
>> +	pci_irq_map_dispose(bus);
>> +}
>> +EXPORT_SYMBOL_GPL(pcibios_remove_bus);
>> +
>>  /*
>>   * Reads the interrupt pin to determine if interrupt is use by card.
>>   * If the interrupt is used, then gets the interrupt line from the
>> @@ -401,6 +510,8 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>>  
>>  	pci_dev->irq = virq;
>>  
>> +	/* Record all interrut mappings for later removal of a PHB */
>> +	pci_irq_map_register(pci_dev, virq);
>>  	return 0;
>>  }
>>  
>> @@ -1554,6 +1665,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
>>  
>>  	pr_debug("PCI: Scanning PHB %pOF\n", node);
>>  
>> +	/* Allocate interrupt mappings array */
>> +	pcibios_irq_map_init(hose);
>> +
>>  	/* Get some IO space for the new PHB */
>>  	pcibios_setup_phb_io_space(hose);
>>  
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-09-23  7:06   ` Cédric Le Goater
@ 2020-09-24  5:11     ` Alexey Kardashevskiy
  2020-09-24 10:31       ` Cédric Le Goater
  2020-10-13 19:33     ` Qian Cai
  2020-10-14  2:55     ` Alexey Kardashevskiy
  2 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-24  5:11 UTC (permalink / raw)
  To: Cédric Le Goater, Qian Cai, Michael Ellerman
  Cc: linuxppc-dev, linux-next, Oliver O'Halloran, linux-kernel,
	Stephen Rothwell



On 23/09/2020 17:06, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>> When a passthrough IO adapter is removed from a pseries machine using
>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>> guest OS to clear all page table entries related to the adapter. If
>>> some are still present, the RTAS call which isolates the PCI slot
>>> returns error 9001 "valid outstanding translations" and the removal of
>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>> maps automatically the INTx interrupts in the Linux interrupt number
>>> space but these are never removed.
>>>
>>> To solve this problem, we introduce a PPC platform specific
>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>> the bus is removed. This also clears the associated page table entries
>>> of the ESB pages when using XIVE.
>>>
>>> For this purpose, we record the logical interrupt numbers of the
>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>> do the clean up.
>>>
>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>> number of interrupt mappings is computed from the "interrupt-map"
>>> property and the mapping array is allocated accordingly.
>>>
>>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>
>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
>> this patch.
>>
>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> 
> OK. The patch is missing a NULL assignement after kfree() and that
> might be the issue. 
> 
> I did try PHB removal under PowerNV, so I would like to understand 
> how we managed to remove twice the PCI bus and possibly reproduce. 
> Any chance we could grab what the syscall fuzzer (syzkaller) did ? 



My guess would be it is doing this in parallel to provoke races.



-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-09-24  5:11     ` Alexey Kardashevskiy
@ 2020-09-24 10:31       ` Cédric Le Goater
  0 siblings, 0 replies; 12+ messages in thread
From: Cédric Le Goater @ 2020-09-24 10:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Qian Cai, Michael Ellerman
  Cc: linuxppc-dev, linux-next, Oliver O'Halloran, linux-kernel,
	Stephen Rothwell

On 9/24/20 7:11 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 23/09/2020 17:06, Cédric Le Goater wrote:
>> On 9/23/20 2:33 AM, Qian Cai wrote:
>>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>>> When a passthrough IO adapter is removed from a pseries machine using
>>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>>> guest OS to clear all page table entries related to the adapter. If
>>>> some are still present, the RTAS call which isolates the PCI slot
>>>> returns error 9001 "valid outstanding translations" and the removal of
>>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>>> maps automatically the INTx interrupts in the Linux interrupt number
>>>> space but these are never removed.
>>>>
>>>> To solve this problem, we introduce a PPC platform specific
>>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>>> the bus is removed. This also clears the associated page table entries
>>>> of the ESB pages when using XIVE.
>>>>
>>>> For this purpose, we record the logical interrupt numbers of the
>>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>>> do the clean up.
>>>>
>>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>>> number of interrupt mappings is computed from the "interrupt-map"
>>>> property and the mapping array is allocated accordingly.
>>>>
>>>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
>>> this patch.
>>>
>>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>>
>> OK. The patch is missing a NULL assignement after kfree() and that
>> might be the issue. 
>>
>> I did try PHB removal under PowerNV, so I would like to understand 
>> how we managed to remove twice the PCI bus and possibly reproduce. 
>> Any chance we could grab what the syscall fuzzer (syzkaller) did ? 
> 
> 
> 
> My guess would be it is doing this in parallel to provoke races.

Concurrency removal and rescan should be controlled by : 

   pci_stop_and_remove_bus_device_locked()
       pci_lock_rescan_remove() 

And, in the report, the stack traces are on the same CPU and PID. 



What I think is happening is that we did a couple of remove/rescan
of the same PHB. The problem is that ->irq_map is not reallocated
the second time because the PHB is re-scanned differently on the 
PowerNV platform. At the second remove, the ->irq_map being not NULL, 
we try to kfree it again and the allocator warns of a double free :/

This works fine on pseries/KVM because the PHB is never removed and 
on pseries/pHyp, pcibios_scan_phb() is called at PHB hotplug. But on 
PowerNV, pcibios_scan_phb() is only called at probe/boot time and 
not at hotplug time :/

I was a good thing to spot that before merge. But I need to revise 
that patch again.

Thanks,

C.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-09-23  7:06   ` Cédric Le Goater
  2020-09-24  5:11     ` Alexey Kardashevskiy
@ 2020-10-13 19:33     ` Qian Cai
  2020-10-13 23:42       ` Michael Ellerman
  2020-10-14  2:55     ` Alexey Kardashevskiy
  2 siblings, 1 reply; 12+ messages in thread
From: Qian Cai @ 2020-10-13 19:33 UTC (permalink / raw)
  To: Cédric Le Goater, Michael Ellerman
  Cc: Stephen Rothwell, Alexey Kardashevskiy, linux-kernel, linux-next,
	Oliver O'Halloran, linuxppc-dev

On Wed, 2020-09-23 at 09:06 +0200, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
> > On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
> > > When a passthrough IO adapter is removed from a pseries machine using
> > > hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
> > > guest OS to clear all page table entries related to the adapter. If
> > > some are still present, the RTAS call which isolates the PCI slot
> > > returns error 9001 "valid outstanding translations" and the removal of
> > > the IO adapter fails. This is because when the PHBs are scanned, Linux
> > > maps automatically the INTx interrupts in the Linux interrupt number
> > > space but these are never removed.
> > > 
> > > To solve this problem, we introduce a PPC platform specific
> > > pcibios_remove_bus() routine which clears all interrupt mappings when
> > > the bus is removed. This also clears the associated page table entries
> > > of the ESB pages when using XIVE.
> > > 
> > > For this purpose, we record the logical interrupt numbers of the
> > > mapped interrupt under the PHB structure and let pcibios_remove_bus()
> > > do the clean up.
> > > 
> > > Since some PCI adapters, like GPUs, use the "interrupt-map" property
> > > to describe interrupt mappings other than the legacy INTx interrupts,
> > > we can not restrict the size of the mapping array to PCI_NUM_INTX. The
> > > number of interrupt mappings is computed from the "interrupt-map"
> > > property and the mapping array is allocated accordingly.
> > > 
> > > Cc: "Oliver O'Halloran" <oohall@gmail.com>
> > > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > 
> > Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed
> > to
> > this patch.
> > 
> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> 
> OK. The patch is missing a NULL assignement after kfree() and that
> might be the issue. 
> 
> I did try PHB removal under PowerNV, so I would like to understand 
> how we managed to remove twice the PCI bus and possibly reproduce. 
> Any chance we could grab what the syscall fuzzer (syzkaller) did ? 

Any update on this? Maybe Michael or Stephen could drop this for now, so our
fuzzing could continue to find something else new?

It can still be reproduced on today's linux-next. BTW, this is running trinity
from an unprivileged user. This is the snapshot of the each fuzzing thread when
this happens.

http://people.redhat.com/qcai/pcibios_remove_bus/trinity-post-mortem.log

It can be reproduced by simply keep running this for a while:

$ trinity -C <total number of CPUs> --arch 64

[19611.946827][T1717146] pci_bus 0035:03: busn_res: [bus 03-07] is released
[19611.950956][T1717146] pci_bus 0035:08: busn_res: [bus 08-0c] is released
[19611.951260][T1717146] =============================================================================
[19611.952336][T1717146] BUG kmalloc-16 (Tainted: G        W  O     ): Object already free
[19611.952365][T1717146] -----------------------------------------------------------------------------
[19611.952365][T1717146] 
[19611.952411][T1717146] Disabling lock debugging due to kernel taint
[19611.952438][T1717146] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=1960714 cpu=4 pid=1
[19611.952481][T1717146] 	__slab_alloc+0xa4/0xf0
[19611.952500][T1717146] 	__kmalloc+0x294/0x330
[19611.952519][T1717146] 	pcibios_scan_phb+0x104/0x3e0
[19611.952549][T1717146] 	pcibios_init+0x84/0x124
[19611.952578][T1717146] 	do_one_initcall+0xac/0x528
[19611.952599][T1717146] 	kernel_init_freeable+0x35c/0x3fc
[19611.952618][T1717146] 	kernel_init+0x24/0x148
[19611.952646][T1717146] 	ret_from_kernel_thread+0x5c/0x80
[19611.952665][T1717146] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=0 cpu=16 pid=1717146
[19611.952691][T1717146] 	kfree+0x49c/0x510
[19611.952700][T1717146] 	pcibios_remove_bus+0x70/0x90
[19611.952711][T1717146] 	pci_remove_bus+0xe4/0x110
[19611.952730][T1717146] 	pci_remove_bus_device+0x74/0x170
[19611.952749][T1717146] 	pci_remove_bus_device+0x4c/0x170
[19611.952768][T1717146] 	pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.952798][T1717146] 	remove_store+0xc0/0xe0
[19611.952819][T1717146] 	dev_attr_store+0x30/0x50
[19611.952852][T1717146] 	sysfs_kf_write+0x68/0xb0
[19611.952870][T1717146] 	kernfs_fop_write+0x114/0x260
[19611.952904][T1717146] 	vfs_write+0xe4/0x260
[19611.952922][T1717146] 	ksys_write+0x74/0x130
[19611.952951][T1717146] 	system_call_exception+0xf8/0x1d0
[19611.952970][T1717146] 	system_call_common+0xe8/0x218
[19611.952990][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=174 fp=0x00000000006a64b0 flags=0x7fff8000000201
[19611.953004][T1717146] INFO: Object 0x00000000f360132d @offset=30192 fp=0x0000000000000000
[19611.953004][T1717146] 
[19611.953048][T1717146] Redzone 00000000acef7298: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[19611.953080][T1717146] Object 00000000f360132d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[19611.953114][T1717146] Redzone 0000000083758aaa: bb bb bb bb bb bb bb bb                          ........
[19611.953146][T1717146] Padding 00000000cbb228a2: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
[19611.953189][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G    B   W  O      5.9.0-next-20201013 #1
[19611.953223][T1717146] Call Trace:
[19611.953242][T1717146] [c000200022557800] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.953291][T1717146] [c000200022557840] [c000000000363688] print_trailer+0x278/0x2a0
[19611.953323][T1717146] [c0002000225578d0] [c00000000035aa8c] free_debug_processing+0x57c/0x600
[19611.953356][T1717146] [c0002000225579b0] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.953391][T1717146] [c000200022557a80] [c00000000035b55c] kfree+0x49c/0x510
[19611.953423][T1717146] [c000200022557b10] [c0000000000432a0] pcibios_remove_bus+0x70/0x90
pci_irq_map_dispose at arch/powerpc/kernel/pci-common.c:456
(inlined by) pcibios_remove_bus at arch/powerpc/kernel/pci-common.c:461
[19611.953454][T1717146] [c000200022557b40] [c000000000677f94] pci_remove_bus+0xe4/0x110
[19611.953477][T1717146] [c000200022557b70] [c000000000678134] pci_remove_bus_device+0x74/0x170
[19611.953510][T1717146] [c000200022557bb0] [c000000000678120] pci_remove_bus_device+0x60/0x170
[19611.953543][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.953567][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.953599][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.953621][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.953652][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.953684][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.953717][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.953762][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.953795][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.953821][T1717146] FIX kmalloc-16: Object at 0x00000000f360132d not freed
[19611.954111][T1717146] =============================================================================
[19611.954144][T1717146] BUG kmalloc-16 (Tainted: G    B   W  O     ): Wrong object count. Counter is 174 but counted were 176
[19611.954176][T1717146] -----------------------------------------------------------------------------
[19611.954176][T1717146] 
[19611.954221][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=174 fp=0x00000000006a64b0 flags=0x7fff8000000201
[19611.954237][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G    B   W  O      5.9.0-next-20201013 #1
[19611.954269][T1717146] Call Trace:
[19611.954286][T1717146] [c0002000225576f0] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.954329][T1717146] [c000200022557730] [c000000000363368] slab_err+0x78/0xb0
[19611.954364][T1717146] [c000200022557810] [c000000000359f94] on_freelist+0x364/0x390
[19611.954390][T1717146] [c0002000225578b0] [c00000000035a798] free_debug_processing+0x288/0x600
[19611.954428][T1717146] [c000200022557990] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.954459][T1717146] [c000200022557a60] [c00000000035b55c] kfree+0x49c/0x510
[19611.954507][T1717146] [c000200022557af0] [c0000000002bd5a0] kfree_const+0x60/0x80
[19611.954540][T1717146] [c000200022557b10] [c0000000006553ec] kobject_release+0x7c/0xd0
[19611.954562][T1717146] [c000200022557b50] [c0000000006e66c0] put_device+0x20/0x40
[19611.954594][T1717146] [c000200022557b70] [c00000000067820c] pci_remove_bus_device+0x14c/0x170
[19611.954627][T1717146] [c000200022557bb0] [c000000000678120] pci_remove_bus_device+0x60/0x170
[19611.954652][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.954686][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.954717][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.954749][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.954784][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.954884][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.954972][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.955050][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.955144][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.955228][T1717146] FIX kmalloc-16: Object count adjusted.
[19611.955300][T1717146] pci_bus 0035:0d: busn_res: [bus 0d-11] is released
[19611.955394][T1717146] =============================================================================
[19611.955493][T1717146] BUG kmalloc-16 (Tainted: G    B   W  O     ): Object already free
[19611.955572][T1717146] -----------------------------------------------------------------------------
[19611.955572][T1717146] 
[19611.955732][T1717146] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=1960715 cpu=4 pid=1
[19611.955847][T1717146] 	__slab_alloc+0xa4/0xf0
[19611.955902][T1717146] 	__kmalloc+0x294/0x330
[19611.955948][T1717146] 	pcibios_scan_phb+0x104/0x3e0
[19611.955994][T1717146] 	pcibios_init+0x84/0x124
[19611.956064][T1717146] 	do_one_initcall+0xac/0x528
[19611.956101][T1717146] 	kernel_init_freeable+0x35c/0x3fc
[19611.956164][T1717146] 	kernel_init+0x24/0x148
[19611.956215][T1717146] 	ret_from_kernel_thread+0x5c/0x80
[19611.956283][T1717146] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=1 cpu=16 pid=1717146
[19611.956385][T1717146] 	kfree+0x49c/0x510
[19611.956419][T1717146] 	pcibios_remove_bus+0x70/0x90
[19611.956481][T1717146] 	pci_remove_bus+0xe4/0x110
[19611.956532][T1717146] 	pci_remove_bus_device+0x74/0x170
[19611.956608][T1717146] 	pci_remove_bus_device+0x4c/0x170
[19611.956652][T1717146] 	pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.956722][T1717146] 	remove_store+0xc0/0xe0
[19611.956793][T1717146] 	dev_attr_store+0x30/0x50
[19611.956850][T1717146] 	sysfs_kf_write+0x68/0xb0
[19611.956914][T1717146] 	kernfs_fop_write+0x114/0x260
[19611.956964][T1717146] 	vfs_write+0xe4/0x260
[19611.957009][T1717146] 	ksys_write+0x74/0x130
[19611.957055][T1717146] 	system_call_exception+0xf8/0x1d0
[19611.957101][T1717146] 	system_call_common+0xe8/0x218
[19611.957173][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=175 fp=0x00000000f4222fd7 flags=0x7fff8000000201
[19611.957304][T1717146] INFO: Object 0x00000000f360132d @offset=30192 fp=0x0000000000000000
[19611.957304][T1717146] 
[19611.957429][T1717146] Redzone 00000000acef7298: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[19611.957543][T1717146] Object 00000000f360132d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[19611.957684][T1717146] Redzone 0000000083758aaa: bb bb bb bb bb bb bb bb                          ........
[19611.957781][T1717146] Padding 00000000cbb228a2: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
[19611.957912][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G    B   W  O      5.9.0-next-20201013 #1
[19611.958033][T1717146] Call Trace:
[19611.958085][T1717146] [c000200022557800] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.958182][T1717146] [c000200022557840] [c000000000363688] print_trailer+0x278/0x2a0
[19611.958261][T1717146] [c0002000225578d0] [c00000000035aa8c] free_debug_processing+0x57c/0x600
[19611.958385][T1717146] [c0002000225579b0] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.958486][T1717146] [c000200022557a80] [c00000000035b55c] kfree+0x49c/0x510
[19611.958555][T1717146] [c000200022557b10] [c0000000000432a0] pcibios_remove_bus+0x70/0x90
[19611.958665][T1717146] [c000200022557b40] [c000000000677f94] pci_remove_bus+0xe4/0x110
[19611.958745][T1717146] [c000200022557b70] [c000000000678134] pci_remove_bus_device+0x74/0x170
[19611.958842][T1717146] [c000200022557bb0] [c000000000678120] pci_remove_bus_device+0x60/0x170
[19611.958953][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.959062][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.959142][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.959242][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.959323][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.959425][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.959506][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.959613][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.959713][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.959819][T1717146] FIX kmalloc-16: Object at 0x00000000f360132d not freed
[19611.960653][T1717146] pci 0035:02     : [PE# fc] Releasing PE
[19611.960831][T1717146] pci_bus 0035:02: busn_res: [bus 02-11] is released
[19611.960913][T1717146] =============================================================================
[19611.960934][T1717146] BUG kmalloc-16 (Tainted: G    B   W  O     ): Object already free
[19611.960954][T1717146] -----------------------------------------------------------------------------
[19611.960954][T1717146] 
[19611.960991][T1717146] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=1960715 cpu=4 pid=1
[19611.961024][T1717146] 	__slab_alloc+0xa4/0xf0
[19611.961052][T1717146] 	__kmalloc+0x294/0x330
[19611.961070][T1717146] 	pcibios_scan_phb+0x104/0x3e0
[19611.961089][T1717146] 	pcibios_init+0x84/0x124
[19611.961108][T1717146] 	do_one_initcall+0xac/0x528
[19611.961169][T1717146] 	kernel_init_freeable+0x35c/0x3fc
[19611.961213][T1717146] 	kernel_init+0x24/0x148
[19611.961276][T1717146] 	ret_from_kernel_thread+0x5c/0x80
[19611.961321][T1717146] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=1 cpu=16 pid=1717146
[19611.961441][T1717146] 	kfree+0x49c/0x510
[19611.961497][T1717146] 	pcibios_remove_bus+0x70/0x90
[19611.961554][T1717146] 	pci_remove_bus+0xe4/0x110
[19611.961621][T1717146] 	pci_remove_bus_device+0x74/0x170
[19611.961670][T1717146] 	pci_remove_bus_device+0x4c/0x170
[19611.961730][T1717146] 	pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.961810][T1717146] 	remove_store+0xc0/0xe0
[19611.961855][T1717146] 	dev_attr_store+0x30/0x50
[19611.961912][T1717146] 	sysfs_kf_write+0x68/0xb0
[19611.961965][T1717146] 	kernfs_fop_write+0x114/0x260
[19611.962017][T1717146] 	vfs_write+0xe4/0x260
[19611.962071][T1717146] 	ksys_write+0x74/0x130
[19611.962127][T1717146] 	system_call_exception+0xf8/0x1d0
[19611.962194][T1717146] 	system_call_common+0xe8/0x218
[19611.962239][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=174 fp=0x00000000253d72f3 flags=0x7fff8000000201
[19611.962365][T1717146] INFO: Object 0x00000000f360132d @offset=30192 fp=0x0000000000000000
[19611.962365][T1717146] 
[19611.962501][T1717146] Redzone 00000000acef7298: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[19611.962628][T1717146] Object 00000000f360132d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[19611.962729][T1717146] Redzone 0000000083758aaa: bb bb bb bb bb bb bb bb                          ........
[19611.962836][T1717146] Padding 00000000cbb228a2: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
[19611.962962][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G    B   W  O      5.9.0-next-20201013 #1
[19611.963077][T1717146] Call Trace:
[19611.963113][T1717146] [c000200022557840] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.963210][T1717146] [c000200022557880] [c000000000363688] print_trailer+0x278/0x2a0
[19611.963300][T1717146] [c000200022557910] [c00000000035aa8c] free_debug_processing+0x57c/0x600
[19611.963395][T1717146] [c0002000225579f0] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.963490][T1717146] [c000200022557ac0] [c00000000035b55c] kfree+0x49c/0x510
[19611.963585][T1717146] [c000200022557b50] [c0000000000432a0] pcibios_remove_bus+0x70/0x90
[19611.963710][T1717146] [c000200022557b80] [c000000000677f94] pci_remove_bus+0xe4/0x110
[19611.963788][T1717146] [c000200022557bb0] [c000000000678134] pci_remove_bus_device+0x74/0x170
[19611.963883][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.963988][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.964102][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.964189][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.964302][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.964390][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.964493][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.964541][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.964747][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.964851][T1717146] FIX kmalloc-16: Object at 0x00000000f360132d not freed
[19611.966211][T1717146] pci 0035:01     : [PE# fd] Releasing PE


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-10-13 19:33     ` Qian Cai
@ 2020-10-13 23:42       ` Michael Ellerman
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2020-10-13 23:42 UTC (permalink / raw)
  To: Qian Cai, Cédric Le Goater
  Cc: Stephen Rothwell, Alexey Kardashevskiy, linux-kernel, linux-next,
	Oliver O'Halloran, linuxppc-dev

Qian Cai <cai@redhat.com> writes:
> On Wed, 2020-09-23 at 09:06 +0200, Cédric Le Goater wrote:
>> On 9/23/20 2:33 AM, Qian Cai wrote:
>> > On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>> > > When a passthrough IO adapter is removed from a pseries machine using
>> > > hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>> > > guest OS to clear all page table entries related to the adapter. If
>> > > some are still present, the RTAS call which isolates the PCI slot
>> > > returns error 9001 "valid outstanding translations" and the removal of
>> > > the IO adapter fails. This is because when the PHBs are scanned, Linux
>> > > maps automatically the INTx interrupts in the Linux interrupt number
>> > > space but these are never removed.
>> > > 
>> > > To solve this problem, we introduce a PPC platform specific
>> > > pcibios_remove_bus() routine which clears all interrupt mappings when
>> > > the bus is removed. This also clears the associated page table entries
>> > > of the ESB pages when using XIVE.
>> > > 
>> > > For this purpose, we record the logical interrupt numbers of the
>> > > mapped interrupt under the PHB structure and let pcibios_remove_bus()
>> > > do the clean up.
>> > > 
>> > > Since some PCI adapters, like GPUs, use the "interrupt-map" property
>> > > to describe interrupt mappings other than the legacy INTx interrupts,
>> > > we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>> > > number of interrupt mappings is computed from the "interrupt-map"
>> > > property and the mapping array is allocated accordingly.
>> > > 
>> > > Cc: "Oliver O'Halloran" <oohall@gmail.com>
>> > > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> > 
>> > Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed
>> > to
>> > this patch.
>> > 
>> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>> 
>> OK. The patch is missing a NULL assignement after kfree() and that
>> might be the issue. 
>> 
>> I did try PHB removal under PowerNV, so I would like to understand 
>> how we managed to remove twice the PCI bus and possibly reproduce. 
>> Any chance we could grab what the syscall fuzzer (syzkaller) did ? 
>
> Any update on this? Maybe Michael or Stephen could drop this for now, so our
> fuzzing could continue to find something else new?

Someone send me a revert?

cheers

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-09-23  7:06   ` Cédric Le Goater
  2020-09-24  5:11     ` Alexey Kardashevskiy
  2020-10-13 19:33     ` Qian Cai
@ 2020-10-14  2:55     ` Alexey Kardashevskiy
  2020-11-02 14:39       ` Cédric Le Goater
  2 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2020-10-14  2:55 UTC (permalink / raw)
  To: Cédric Le Goater, Qian Cai, Michael Ellerman
  Cc: linuxppc-dev, linux-next, Oliver O'Halloran, linux-kernel,
	Stephen Rothwell



On 23/09/2020 17:06, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>> When a passthrough IO adapter is removed from a pseries machine using
>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>> guest OS to clear all page table entries related to the adapter. If
>>> some are still present, the RTAS call which isolates the PCI slot
>>> returns error 9001 "valid outstanding translations" and the removal of
>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>> maps automatically the INTx interrupts in the Linux interrupt number
>>> space but these are never removed.
>>>
>>> To solve this problem, we introduce a PPC platform specific
>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>> the bus is removed. This also clears the associated page table entries
>>> of the ESB pages when using XIVE.
>>>
>>> For this purpose, we record the logical interrupt numbers of the
>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>> do the clean up.
>>>
>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>> number of interrupt mappings is computed from the "interrupt-map"
>>> property and the mapping array is allocated accordingly.
>>>
>>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>
>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
>> this patch.
>>
>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> 
> OK. The patch is missing a NULL assignement after kfree() and that
> might be the issue.
> 
> I did try PHB removal under PowerNV, so I would like to understand
> how we managed to remove twice the PCI bus and possibly reproduce.
> Any chance we could grab what the syscall fuzzer (syzkaller) did ?


How do you remove PHBs exactly? There is no such thing in the powernv 
platform, I thought someone added this and you are fixing it but no. 
PHBs on powernv are created at the boot time and there is no way to 
remove them, you can only try removing all the bridges.

So what exactly are you doing?


-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-10-14  2:55     ` Alexey Kardashevskiy
@ 2020-11-02 14:39       ` Cédric Le Goater
  2020-11-03  4:40         ` Oliver O'Halloran
  0 siblings, 1 reply; 12+ messages in thread
From: Cédric Le Goater @ 2020-11-02 14:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Qian Cai, Michael Ellerman
  Cc: linuxppc-dev, linux-next, Oliver O'Halloran, linux-kernel,
	Stephen Rothwell

On 10/14/20 4:55 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 23/09/2020 17:06, Cédric Le Goater wrote:
>> On 9/23/20 2:33 AM, Qian Cai wrote:
>>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>>> When a passthrough IO adapter is removed from a pseries machine using
>>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>>> guest OS to clear all page table entries related to the adapter. If
>>>> some are still present, the RTAS call which isolates the PCI slot
>>>> returns error 9001 "valid outstanding translations" and the removal of
>>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>>> maps automatically the INTx interrupts in the Linux interrupt number
>>>> space but these are never removed.
>>>>
>>>> To solve this problem, we introduce a PPC platform specific
>>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>>> the bus is removed. This also clears the associated page table entries
>>>> of the ESB pages when using XIVE.
>>>>
>>>> For this purpose, we record the logical interrupt numbers of the
>>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>>> do the clean up.
>>>>
>>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>>> number of interrupt mappings is computed from the "interrupt-map"
>>>> property and the mapping array is allocated accordingly.
>>>>
>>>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
>>> this patch.
>>>
>>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>>
>> OK. The patch is missing a NULL assignement after kfree() and that
>> might be the issue.
>>
>> I did try PHB removal under PowerNV, so I would like to understand
>> how we managed to remove twice the PCI bus and possibly reproduce.
>> Any chance we could grab what the syscall fuzzer (syzkaller) did ?
> 
> 
> How do you remove PHBs exactly? There is no such thing in the powernv platform, I thought someone added this and you are fixing it but no. PHBs on powernv are created at the boot time and there is no way to remove them, you can only try removing all the bridges.

yes. I noticed that later when proposing the fix for the double 
free.

> So what exactly are you doing?

What you just said above, with the commands : 

  echo 1 >  /sys/devices/pci0031\:00/0031\:00\:00.0/remove
  echo 1 >  /sys/devices/pci0031\:00/pci_bus/0031\:00/rescan


C. 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
  2020-11-02 14:39       ` Cédric Le Goater
@ 2020-11-03  4:40         ` Oliver O'Halloran
  0 siblings, 0 replies; 12+ messages in thread
From: Oliver O'Halloran @ 2020-11-03  4:40 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Stephen Rothwell, Alexey Kardashevskiy,
	Linux Kernel Mailing List, Qian Cai, Linux-Next Mailing List,
	linuxppc-dev

On Tue, Nov 3, 2020 at 1:39 AM Cédric Le Goater <clg@kaod.org> wrote:
>
> On 10/14/20 4:55 AM, Alexey Kardashevskiy wrote:
> >
> > How do you remove PHBs exactly? There is no such thing in the powernv platform, I thought someone added this and you are fixing it but no. PHBs on powernv are created at the boot time and there is no way to remove them, you can only try removing all the bridges.
>
> yes. I noticed that later when proposing the fix for the double
> free.
>
> > So what exactly are you doing?
>
> What you just said above, with the commands :
>
>   echo 1 >  /sys/devices/pci0031\:00/0031\:00\:00.0/remove
>   echo 1 >  /sys/devices/pci0031\:00/pci_bus/0031\:00/rescan

Right, so that'll remove the root port device (and Bus 01 beneath it),
but the PHB itself is still there. If it was removed the root bus
would also disappear.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-11-03  5:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-07 10:18 [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed Cédric Le Goater
2020-09-09  6:16 ` Alexey Kardashevskiy
2020-09-17 11:27 ` Michael Ellerman
2020-09-23  0:33 ` Qian Cai
2020-09-23  7:06   ` Cédric Le Goater
2020-09-24  5:11     ` Alexey Kardashevskiy
2020-09-24 10:31       ` Cédric Le Goater
2020-10-13 19:33     ` Qian Cai
2020-10-13 23:42       ` Michael Ellerman
2020-10-14  2:55     ` Alexey Kardashevskiy
2020-11-02 14:39       ` Cédric Le Goater
2020-11-03  4:40         ` Oliver O'Halloran

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).