linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Qian Cai <cai@redhat.com>
To: "Cédric Le Goater" <clg@kaod.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	Oliver O'Halloran <oohall@gmail.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
Date: Tue, 22 Sep 2020 20:33:00 -0400	[thread overview]
Message-ID: <9c5eca863c63e360662fae7597213e8927c2a885.camel@redhat.com> (raw)
In-Reply-To: <20200807101854.844619-1-clg@kaod.org>

On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
> guest OS to clear all page table entries related to the adapter. If
> some are still present, the RTAS call which isolates the PCI slot
> returns error 9001 "valid outstanding translations" and the removal of
> the IO adapter fails. This is because when the PHBs are scanned, Linux
> maps automatically the INTx interrupts in the Linux interrupt number
> space but these are never removed.
> 
> To solve this problem, we introduce a PPC platform specific
> pcibios_remove_bus() routine which clears all interrupt mappings when
> the bus is removed. This also clears the associated page table entries
> of the ESB pages when using XIVE.
> 
> For this purpose, we record the logical interrupt numbers of the
> mapped interrupt under the PHB structure and let pcibios_remove_bus()
> do the clean up.
> 
> Since some PCI adapters, like GPUs, use the "interrupt-map" property
> to describe interrupt mappings other than the legacy INTx interrupts,
> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
> number of interrupt mappings is computed from the "interrupt-map"
> property and the mapping array is allocated accordingly.
> 
> Cc: "Oliver O'Halloran" <oohall@gmail.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
this patch.

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

[ 3574.564109][  T965] ata1.00: disabled
[ 3574.580373][T151472] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
[ 3574.581180][T151472] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.581226][T151472] sd 0:0:0:0: [sdb] Stopping disk
[ 3574.581289][T151472] sd 0:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.611424][ T3019] Read-error on swap-device (254:1:849792)
[ 3574.611685][ T3019] Read-error on swap-device (254:1:914944)
[ 3574.611769][ T3019] Read-error on swap-device (254:1:915072)
[ 3574.611838][ T3019] Read-error on swap-device (254:1:915200)
[ 3574.611926][ T3019] Read-error on swap-device (254:1:915328)
[ 3574.612268][ T3084] Read-error on swap-device (254:1:792576)
[ 3574.612342][ T3084] Read-error on swap-device (254:1:792704)
[ 3574.612757][ T2362] Read-error on swap-device (254:1:957440)
[ 3574.612773][ T2905] Read-error on swap-device (254:1:784128)
[ 3574.613015][ T2362] Read-error on swap-device (254:1:957568)
[ 3574.613160][ T2905] Read-error on swap-device (254:1:784256)
[ 3574.613241][ T2362] Read-error on swap-device (254:1:957696)
[ 3574.613342][ T2362] Read-error on swap-device (254:1:957824)
[ 3574.614448][ T3019] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
[ 3574.614663][ T3019] Read-error on swap-device (254:1:961536)
[ 3574.675330][T151844] Read-error on swap-device (254:1:128)
[ 3574.675515][T151844] Read-error on swap-device (254:1:256)
[ 3574.675700][T151844] Read-error on swap-device (254:1:384)
[ 3574.703570][  T971] ata2.00: disabled
[ 3574.710393][T151472] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[ 3574.710864][T151472] sd 1:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.710922][T151472] sd 1:0:0:0: [sda] Stopping disk
[ 3574.711010][T151472] sd 1:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[ 3574.826569][  T674] dm-0: writeback error on inode 68507862, offset 65536, sector 54281504
[ 3575.117547][ T3366] dm-0: writeback error on inode 68507851, offset 0, sector 54378880
[ 3575.140104][T151472] pci 0004:03:00.0: Removing from iommu group 3
[ 3575.141778][T151472] pci 0004:03     : [PE# fb] Releasing PE
[ 3575.141965][T151472] pci 0004:03     : [PE# fb] Removing DMA window #0
[ 3575.142452][T151472] pci 0004:03     : [PE# fb] Disabling 64-bit DMA bypass
[ 3575.149369][T151472] pci_bus 0004:03: busn_res: [bus 03] is released
[ 3575.150574][T152037] Read-error on swap-device (254:1:35584)
[ 3575.150713][T152037] Read-error on swap-device (254:1:35712)
[ 3575.152632][T152037] Read-error on swap-device (254:1:915584)
[ 3575.152706][T151472] pci_bus 0004:04: busn_res: [bus 04-08] is released
[ 3575.152983][T151472] =============================================================================
[ 3575.153937][T151472] BUG kmalloc-16 (Not tainted): Object already free
[ 3575.153962][T151472] -----------------------------------------------------------------------------
[ 3575.153962][T151472] 
[ 3575.154020][T151472] Disabling lock debugging due to kernel taint
[ 3575.154047][T151472] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=356904 cpu=4 pid=1
[ 3575.154084][T151472] 	__slab_alloc+0xa4/0xf0
[ 3575.154105][T151472] 	__kmalloc+0x294/0x330
[ 3575.154127][T151472] 	pcibios_scan_phb+0x104/0x3e0
[ 3575.154165][T151472] 	pcibios_init+0x84/0x124
[ 3575.154209][T151472] 	do_one_initcall+0xac/0x528
[ 3575.154241][T151472] 	kernel_init_freeable+0x35c/0x3fc
[ 3575.154272][T151472] 	kernel_init+0x24/0x148
[ 3575.154306][T151472] 	ret_from_kernel_thread+0x5c/0x80
[ 3575.154352][T151472] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=0 cpu=7 pid=151472
[ 3575.154387][T151472] 	kfree+0x49c/0x510
[ 3575.154406][T151472] 	pcibios_remove_bus+0x70/0x90
[ 3575.154443][T151472] 	pci_remove_bus+0xe4/0x110
[ 3575.154467][T151472] 	pci_remove_bus_device+0x74/0x170
[ 3575.154503][T151472] 	pci_remove_bus_device+0x4c/0x170
[ 3575.154524][T151472] 	pci_stop_and_remove_bus_device_locked+0x34/0x50
[ 3575.154539][T151472] 	remove_store+0xc0/0xe0
[ 3575.154551][T151472] 	dev_attr_store+0x30/0x50
[ 3575.154573][T151472] 	sysfs_kf_write+0x68/0xb0
[ 3575.154595][T151472] 	kernfs_fop_write+0x114/0x260
[ 3575.154643][T151472] 	vfs_write+0xe4/0x260
[ 3575.154667][T151472] 	ksys_write+0x74/0x130
[ 3575.154692][T151472] 	system_call_exception+0xf8/0x1d0
[ 3575.154728][T151472] 	system_call_common+0xe8/0x218
[ 3575.154788][T151472] INFO: Slab 0x00000000cafdf25c objects=178 used=174 fp=0x0000000020a64b99 flags=0x7fff8000000201
[ 3575.154848][T151472] INFO: Object 0x00000000af116201 @offset=5168 fp=0x0000000000000000
[ 3575.154848][T151472] 
[ 3575.154901][T151472] Redzone 00000000781d3b96: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[ 3575.154968][T151472] Object 00000000af116201: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[ 3575.155012][T151472] Redzone 000000007b8ec00f: bb bb bb bb bb bb bb bb                          ........
[ 3575.155052][T151472] Padding 00000000df7d5e89: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
[ 3575.155107][T151472] CPU: 7 PID: 151472 Comm: trinity-c61 Tainted: G    B             5.9.0-rc6-next-20200922+ #2
[ 3575.155147][T151472] Call Trace:
[ 3575.155171][T151472] [c000001fefdef800] [c000000000644278] dump_stack+0xec/0x144 (unreliable)
[ 3575.155204][T151472] [c000001fefdef840] [c00000000035e9a8] print_trailer+0x278/0x2a0
[ 3575.155220][T151472] [c000001fefdef8d0] [c000000000355d9c] free_debug_processing+0x57c/0x600
[ 3575.155236][T151472] [c000001fefdef9b0] [c000000000356234] __slab_free+0x414/0x5b0
[ 3575.155251][T151472] [c000001fefdefa80] [c00000000035686c] kfree+0x49c/0x510
[ 3575.155267][T151472] [c000001fefdefb10] [c000000000043260] pcibios_remove_bus+0x70/0x90
pci_irq_map_dispose at arch/powerpc/kernel/pci-common.c:456
(inlined by) pcibios_remove_bus at arch/powerpc/kernel/pci-common.c:461
[ 3575.155282][T151472] [c000001fefdefb40] [c00000000066fee4] pci_remove_bus+0xe4/0x110
[ 3575.155309][T151472] [c000001fefdefb70] [c000000000670084] pci_remove_bus_device+0x74/0x170
[ 3575.155348][T151472] [c000001fefdefbb0] [c000000000670070] pci_remove_bus_device+0x60/0x170
[ 3575.155377][T151472] [c000001fefdefbf0] [c0000000006701f4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[ 3575.155395][T151472] [c000001fefdefc20] [c00000000067f5e0] remove_store+0xc0/0xe0
[ 3575.155431][T151472] [c000001fefdefc70] [c0000000006dccd0] dev_attr_store+0x30/0x50
[ 3575.155457][T151472] [c000001fefdefc90] [c00000000049fae8] sysfs_kf_write+0x68/0xb0
[ 3575.155507][T151472] [c000001fefdefcd0] [c00000000049ed14] kernfs_fop_write+0x114/0x260
[ 3575.155553][T151472] [c000001fefdefd20] [c0000000003ab2d4] vfs_write+0xe4/0x260
[ 3575.155592][T151472] [c000001fefdefd70] [c0000000003ab604] ksys_write+0x74/0x130
[ 3575.155630][T151472] [c000001fefdefdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 3575.155671][T151472] [c000001fefdefe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 3575.155715][T151472] FIX kmalloc-16: Object at 0x00000000af116201 not freed
[ 3575.156125][T151472] =============================================================================
[ 3575.156170][T151472] BUG kmalloc-16 (Tainted: G    B            ): Wrong object count. Counter is 174 but counted were 176
[ 3575.156204][T151472] -----------------------------------------------------------------------------
[ 3575.156204][T151472] 
[ 3575.156240][T151472] INFO: Slab 0x00000000cafdf25c objects=178 used=174 fp=0x0000000020a64b99 flags=0x7fff8000000201
[ 3575.156296][T151472] CPU: 7 PID: 151472 Comm: trinity-c61 Tainted: G    B             5.9.0-rc6-next-20200922+ #2
[ 3575.156348][T151472] Call Trace:
[ 3575.156376][T151472] [c000001fefdef6f0] [c000000000644278] dump_stack+0xec/0x144 (unreliable)
[ 3575.156439][T151472] [c000001fefdef730] [c00000000035e688] slab_err+0x78/0xb0
[ 3575.156502][T151472] [c000001fefdef810] [c0000000003552a4] on_freelist+0x364/0x390
[ 3575.156541][T151472] [c000001fefdef8b0] [c000000000355aa8] free_debug_processing+0x288/0x600
[ 3575.156598][T151472] [c000001fefdef990] [c000000000356234] __slab_free+0x414/0x5b0
[ 3575.156644][T151472] [c000001fefdefa60] [c00000000035686c] kfree+0x49c/0x510
[ 3575.156701][T151472] [c000001fefdefaf0] [c0000000002b8a10] kfree_const+0x60/0x80
[ 3575.156738][T151472] [c000001fefdefb10] [c00000000064d3bc] kobject_release+0x7c/0xd0
[ 3575.156784][T151472] [c000001fefdefb50] [c0000000006de0c0] put_device+0x20/0x40
[ 3575.156831][T151472] [c000001fefdefb70] [c00000000067015c] pci_remove_bus_device+0x14c/0x170
[ 3575.156867][T151472] [c000001fefdefbb0] [c000000000670070] pci_remove_bus_device+0x60/0x170
[ 3575.156923][T151472] [c000001fefdefbf0] [c0000000006701f4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[ 3575.156998][T151472] [c000001fefdefc20] [c00000000067f5e0] remove_store+0xc0/0xe0
[ 3575.157058][T151472] [c000001fefdefc70] [c0000000006dccd0] dev_attr_store+0x30/0x50
[ 3575.157114][T151472] [c000001fefdefc90] [c00000000049fae8] sysfs_kf_write+0x68/0xb0
[ 3575.157169][T151472] [c000001fefdefcd0] [c00000000049ed14] kernfs_fop_write+0x114/0x260
[ 3575.157215][T151472] [c000001fefdefd20] [c0000000003ab2d4] vfs_write+0xe4/0x260
[ 3575.157256][T151472] [c000001fefdefd70] [c0000000003ab604] ksys_write+0x74/0x130
[ 3575.157301][T151472] [c000001fefdefdc0] [c00000000002a458] system_call_exception+0xf8/0x1d0
[ 3575.157352][T151472] [c000001fefdefe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 3575.157396][T151472] FIX kmalloc-16: Object count adjusted.
[ 3575.157457][T151472] pci_bus 0004:09: busn_res: [bus 09-0d] is released


> ---
> 
>  Changes since v2:
> 
>  - merged 2 patches.
>  
>  arch/powerpc/include/asm/pci-bridge.h |   6 ++
>  arch/powerpc/kernel/pci-common.c      | 114 ++++++++++++++++++++++++++
>  2 files changed, 120 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h
> b/arch/powerpc/include/asm/pci-bridge.h
> index b92e81b256e5..ca75cf264ddf 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -48,6 +48,9 @@ struct pci_controller_ops {
>  
>  /*
>   * Structure of a PCI controller (host bridge)
> + *
> + * @irq_count: number of interrupt mappings
> + * @irq_map: interrupt mappings
>   */
>  struct pci_controller {
>  	struct pci_bus *bus;
> @@ -127,6 +130,9 @@ struct pci_controller {
>  
>  	void *private_data;
>  	struct npu *npu;
> +
> +	unsigned int irq_count;
> +	unsigned int *irq_map;
>  };
>  
>  /* These are used for config access before all the PCI probing
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-
> common.c
> index be108616a721..deb831f0ae13 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -353,6 +353,115 @@ struct pci_controller
> *pci_find_controller_for_domain(int domain_nr)
>  	return NULL;
>  }
>  
> +/*
> + * Assumption is made on the interrupt parent. All interrupt-map
> + * entries are considered to have the same parent.
> + */
> +static int pcibios_irq_map_count(struct pci_controller *phb)
> +{
> +	const __be32 *imap;
> +	int imaplen;
> +	struct device_node *parent;
> +	u32 intsize, addrsize, parintsize, paraddrsize;
> +
> +	if (of_property_read_u32(phb->dn, "#interrupt-cells", &intsize))
> +		return 0;
> +	if (of_property_read_u32(phb->dn, "#address-cells", &addrsize))
> +		return 0;
> +
> +	imap = of_get_property(phb->dn, "interrupt-map", &imaplen);
> +	if (!imap) {
> +		pr_debug("%pOF : no interrupt-map\n", phb->dn);
> +		return 0;
> +	}
> +	imaplen /= sizeof(u32);
> +	pr_debug("%pOF : imaplen=%d\n", phb->dn, imaplen);
> +
> +	if (imaplen < (addrsize + intsize + 1))
> +		return 0;
> +
> +	imap += intsize + addrsize;
> +	parent = of_find_node_by_phandle(be32_to_cpup(imap));
> +	if (!parent) {
> +		pr_debug("%pOF : no imap parent found !\n", phb->dn);
> +		return 0;
> +	}
> +
> +	if (of_property_read_u32(parent, "#interrupt-cells", &parintsize)) {
> +		pr_debug("%pOF : parent lacks #interrupt-cells!\n", phb->dn);
> +		return 0;
> +	}
> +
> +	if (of_property_read_u32(parent, "#address-cells", &paraddrsize))
> +		paraddrsize = 0;
> +
> +	return imaplen / (addrsize + intsize + 1 + paraddrsize + parintsize);
> +}
> +
> +static void pcibios_irq_map_init(struct pci_controller *phb)
> +{
> +	phb->irq_count = pcibios_irq_map_count(phb);
> +	if (phb->irq_count < PCI_NUM_INTX)
> +		phb->irq_count = PCI_NUM_INTX;
> +
> +	pr_debug("%pOF : interrupt map #%d\n", phb->dn, phb->irq_count);
> +
> +	phb->irq_map = kcalloc(phb->irq_count, sizeof(unsigned int),
> +			       GFP_KERNEL);
> +}
> +
> +static void pci_irq_map_register(struct pci_dev *pdev, unsigned int virq)
> +{
> +	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
> +	int i;
> +
> +	if (!phb->irq_map)
> +		return;
> +
> +	for (i = 0; i < phb->irq_count; i++) {
> +		/*
> +		 * Look for an empty or an equivalent slot, as INTx
> +		 * interrupts can be shared between adapters.
> +		 */
> +		if (phb->irq_map[i] == virq || !phb->irq_map[i]) {
> +			phb->irq_map[i] = virq;
> +			break;
> +		}
> +	}
> +
> +	if (i == phb->irq_count)
> +		pr_err("PCI:%s all platform interrupts mapped\n",
> +		       pci_name(pdev));
> +}
> +
> +/*
> + * Clearing the mapped interrupts will also clear the underlying
> + * mappings of the ESB pages of the interrupts when under XIVE. It is
> + * a requirement of PowerVM to clear all memory mappings before
> + * removing a PHB.
> + */
> +static void pci_irq_map_dispose(struct pci_bus *bus)
> +{
> +	struct pci_controller *phb = pci_bus_to_host(bus);
> +	int i;
> +
> +	if (!phb->irq_map)
> +		return;
> +
> +	pr_debug("PCI: Clearing interrupt mappings for PHB %04x:%02x...\n",
> +		 pci_domain_nr(bus), bus->number);
> +	for (i = 0; i < phb->irq_count; i++)
> +		irq_dispose_mapping(phb->irq_map[i]);
> +
> +	kfree(phb->irq_map);
> +}
> +
> +void pcibios_remove_bus(struct pci_bus *bus)
> +{
> +	pci_irq_map_dispose(bus);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_remove_bus);
> +
>  /*
>   * Reads the interrupt pin to determine if interrupt is use by card.
>   * If the interrupt is used, then gets the interrupt line from the
> @@ -401,6 +510,8 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>  
>  	pci_dev->irq = virq;
>  
> +	/* Record all interrut mappings for later removal of a PHB */
> +	pci_irq_map_register(pci_dev, virq);
>  	return 0;
>  }
>  
> @@ -1554,6 +1665,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
>  
>  	pr_debug("PCI: Scanning PHB %pOF\n", node);
>  
> +	/* Allocate interrupt mappings array */
> +	pcibios_irq_map_init(hose);
> +
>  	/* Get some IO space for the new PHB */
>  	pcibios_setup_phb_io_space(hose);
>  


  parent reply	other threads:[~2020-09-23  0:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-07 10:18 [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed Cédric Le Goater
2020-09-09  6:16 ` Alexey Kardashevskiy
2020-09-17 11:27 ` Michael Ellerman
2020-09-23  0:33 ` Qian Cai [this message]
2020-09-23  7:06   ` Cédric Le Goater
2020-09-24  5:11     ` Alexey Kardashevskiy
2020-09-24 10:31       ` Cédric Le Goater
2020-10-13 19:33     ` Qian Cai
2020-10-13 23:42       ` Michael Ellerman
2020-10-14  2:55     ` Alexey Kardashevskiy
2020-11-02 14:39       ` Cédric Le Goater
2020-11-03  4:40         ` Oliver O'Halloran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9c5eca863c63e360662fae7597213e8927c2a885.camel@redhat.com \
    --to=cai@redhat.com \
    --cc=aik@ozlabs.ru \
    --cc=clg@kaod.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-next@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=oohall@gmail.com \
    --cc=sfr@canb.auug.org.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).