All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Alistair Popple <alistair@popple.id.au>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Sasha Levin <sashal@kernel.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: [PATCH AUTOSEL 5.3 34/87] powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window
Date: Tue, 24 Sep 2019 12:40:50 -0400	[thread overview]
Message-ID: <20190924164144.25591-34-sashal@kernel.org> (raw)
In-Reply-To: <20190924164144.25591-1-sashal@kernel.org>

From: Alexey Kardashevskiy <aik@ozlabs.ru>

[ Upstream commit c37c792dec0929dbb6360a609fb00fa20bb16fc2 ]

We allocate only the first level of multilevel TCE tables for KVM
already (alloc_userspace_copy==true), and the rest is allocated on demand.
This is not enabled though for bare metal.

This removes the KVM limitation (implicit, via the alloc_userspace_copy
parameter) and always allocates just the first level. The on-demand
allocation of missing levels is already implemented.

As from now on DMA map might happen with disabled interrupts, this
allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors 1].
In practice just a single page is allocated there so chances for failure
are quite low.

To save time when creating a new clean table, this skips non-allocated
indirect TCE entries in pnv_tce_free just like we already do in
the VFIO IOMMU TCE driver.

This changes the default level number from 1 to 2 to reduce the amount
of memory required for the default 32bit DMA window at the boot time.
The default window size is up to 2GB which requires 4MB of TCEs which is
unlikely to be used entirely or at all as most devices these days are
64bit capable so by switching to 2 levels by default we save 4032KB of
RAM per a device.

While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace
can trigger this path via VFIO, see the failure and try creating a table
again with different parameters which might succeed.

[1]:
===
BUG: sleeping function called from invalid context at mm/page_alloc.c:4596
in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1
2 locks held by scsi_eh_1/1038:
 #0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80
 #1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0
irq event stamp: 500
hardirqs last  enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0
hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120
softirqs last  enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634
Call Trace:
[c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable)
[c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310
[c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560
[c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130
[c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0
[c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0
[c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0
[c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550
[c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0
[c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470
[c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0
[c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0
[c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640
[c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0
[c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0
[c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0
[c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0
[c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100
[c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510
[c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0
[c000003d064cfe20] [c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
irq event stamp: 2305

========================================================
hardirqs last  enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34
hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654
WARNING: possible irq lock inversion dependency detected
5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G        W
softirqs last  enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654
softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120
but this lock took another, HARDIRQ-unsafe lock in the past:
 (fs_reclaim){+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               local_irq_disable();
                               lock(&(&host->lock)->rlock);
                               lock(fs_reclaim);
  <Interrupt>
    lock(&(&host->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/0/0.

the shortest dependencies between 2nd lock and 1st lock:
 -> (fs_reclaim){+.+.} ops: 167579 {
    HARDIRQ-ON-W at:
                      lock_acquire+0xf8/0x2a0
                      fs_reclaim_acquire.part.23+0x44/0x60
                      kmem_cache_alloc_node_trace+0x80/0x590
                      alloc_desc+0x64/0x270
                      __irq_alloc_descs+0x2e4/0x3a0
                      irq_domain_alloc_descs+0xb0/0x150
                      irq_create_mapping+0x168/0x2c0
                      xics_smp_probe+0x2c/0x98
                      pnv_smp_probe+0x40/0x9c
                      smp_prepare_cpus+0x524/0x6c4
                      kernel_init_freeable+0x1b4/0x650
                      kernel_init+0x2c/0x148
                      ret_from_kernel_thread+0x5c/0x70
    SOFTIRQ-ON-W at:
                      lock_acquire+0xf8/0x2a0
                      fs_reclaim_acquire.part.23+0x44/0x60
                      kmem_cache_alloc_node_trace+0x80/0x590
                      alloc_desc+0x64/0x270
                      __irq_alloc_descs+0x2e4/0x3a0
                      irq_domain_alloc_descs+0xb0/0x150
                      irq_create_mapping+0x168/0x2c0
                      xics_smp_probe+0x2c/0x98
                      pnv_smp_probe+0x40/0x9c
                      smp_prepare_cpus+0x524/0x6c4
                      kernel_init_freeable+0x1b4/0x650
                      kernel_init+0x2c/0x148
                      ret_from_kernel_thread+0x5c/0x70
    INITIAL USE at:
                     lock_acquire+0xf8/0x2a0
                     fs_reclaim_acquire.part.23+0x44/0x60
                     kmem_cache_alloc_node_trace+0x80/0x590
                     alloc_desc+0x64/0x270
                     __irq_alloc_descs+0x2e4/0x3a0
                     irq_domain_alloc_descs+0xb0/0x150
                     irq_create_mapping+0x168/0x2c0
                     xics_smp_probe+0x2c/0x98
                     pnv_smp_probe+0x40/0x9c
                     smp_prepare_cpus+0x524/0x6c4
                     kernel_init_freeable+0x1b4/0x650
                     kernel_init+0x2c/0x148
                     ret_from_kernel_thread+0x5c/0x70
  }
===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 20 +++++++++----------
 arch/powerpc/platforms/powernv/pci.h          |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5eb..c75ec37bf0cda 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -36,7 +36,8 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
 	struct page *tce_mem = NULL;
 	__be64 *addr;
 
-	tce_mem = alloc_pages_node(nid, GFP_KERNEL, shift - PAGE_SHIFT);
+	tce_mem = alloc_pages_node(nid, GFP_ATOMIC | __GFP_NOWARN,
+			shift - PAGE_SHIFT);
 	if (!tce_mem) {
 		pr_err("Failed to allocate a TCE memory, level shift=%d\n",
 				shift);
@@ -161,6 +162,9 @@ void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
 
 		if (ptce)
 			*ptce = cpu_to_be64(0);
+		else
+			/* Skip the rest of the level */
+			i |= tbl->it_level_size - 1;
 	}
 }
 
@@ -260,7 +264,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	unsigned int table_shift = max_t(unsigned int, entries_shift + 3,
 			PAGE_SHIFT);
 	const unsigned long tce_table_size = 1UL << table_shift;
-	unsigned int tmplevels = levels;
 
 	if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
 		return -EINVAL;
@@ -268,9 +271,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	if (!is_power_of_2(window_size))
 		return -EINVAL;
 
-	if (alloc_userspace_copy && (window_size > (1ULL << 32)))
-		tmplevels = 1;
-
 	/* Adjust direct table size from window_size and levels */
 	entries_shift = (entries_shift + levels - 1) / levels;
 	level_shift = entries_shift + 3;
@@ -281,7 +281,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	/* Allocate TCE table */
 	addr = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-			tmplevels, tce_table_size, &offset, &total_allocated);
+			1, tce_table_size, &offset, &total_allocated);
 
 	/* addr==NULL means that the first level allocation failed */
 	if (!addr)
@@ -292,18 +292,18 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	 * we did not allocate as much as we wanted,
 	 * release partially allocated table.
 	 */
-	if (tmplevels == levels && offset < tce_table_size)
+	if (levels == 1 && offset < tce_table_size)
 		goto free_tces_exit;
 
 	/* Allocate userspace view of the TCE table */
 	if (alloc_userspace_copy) {
 		offset = 0;
 		uas = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-				tmplevels, tce_table_size, &offset,
+				1, tce_table_size, &offset,
 				&total_allocated_uas);
 		if (!uas)
 			goto free_tces_exit;
-		if (tmplevels == levels && (offset < tce_table_size ||
+		if (levels == 1 && (offset < tce_table_size ||
 				total_allocated_uas != total_allocated))
 			goto free_uas_exit;
 	}
@@ -318,7 +318,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	pr_debug("Created TCE table: ws=%08llx ts=%lx @%08llx base=%lx uas=%p levels=%d/%d\n",
 			window_size, tce_table_size, bus_offset, tbl->it_base,
-			tbl->it_userspace, tmplevels, levels);
+			tbl->it_userspace, 1, levels);
 
 	return 0;
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 469c244632477..f914f0b14e4e3 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -219,7 +219,7 @@ extern struct iommu_table_group *pnv_npu_compound_attach(
 		struct pnv_ioda_pe *pe);
 
 /* pci-ioda-tce.c */
-#define POWERNV_IOMMU_DEFAULT_LEVELS	1
+#define POWERNV_IOMMU_DEFAULT_LEVELS	2
 #define POWERNV_IOMMU_MAX_LEVELS	5
 
 extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
-- 
2.20.1


WARNING: multiple messages have this Message-ID (diff)
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Alistair Popple <alistair@popple.id.au>,
	linuxppc-dev@lists.ozlabs.org, Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.3 34/87] powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window
Date: Tue, 24 Sep 2019 12:40:50 -0400	[thread overview]
Message-ID: <20190924164144.25591-34-sashal@kernel.org> (raw)
In-Reply-To: <20190924164144.25591-1-sashal@kernel.org>

From: Alexey Kardashevskiy <aik@ozlabs.ru>

[ Upstream commit c37c792dec0929dbb6360a609fb00fa20bb16fc2 ]

We allocate only the first level of multilevel TCE tables for KVM
already (alloc_userspace_copy==true), and the rest is allocated on demand.
This is not enabled though for bare metal.

This removes the KVM limitation (implicit, via the alloc_userspace_copy
parameter) and always allocates just the first level. The on-demand
allocation of missing levels is already implemented.

As from now on DMA map might happen with disabled interrupts, this
allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors 1].
In practice just a single page is allocated there so chances for failure
are quite low.

To save time when creating a new clean table, this skips non-allocated
indirect TCE entries in pnv_tce_free just like we already do in
the VFIO IOMMU TCE driver.

This changes the default level number from 1 to 2 to reduce the amount
of memory required for the default 32bit DMA window at the boot time.
The default window size is up to 2GB which requires 4MB of TCEs which is
unlikely to be used entirely or at all as most devices these days are
64bit capable so by switching to 2 levels by default we save 4032KB of
RAM per a device.

While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace
can trigger this path via VFIO, see the failure and try creating a table
again with different parameters which might succeed.

[1]:
===
BUG: sleeping function called from invalid context at mm/page_alloc.c:4596
in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1
2 locks held by scsi_eh_1/1038:
 #0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80
 #1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0
irq event stamp: 500
hardirqs last  enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0
hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120
softirqs last  enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634
Call Trace:
[c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable)
[c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310
[c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560
[c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130
[c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0
[c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0
[c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0
[c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550
[c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0
[c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470
[c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0
[c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0
[c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640
[c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0
[c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0
[c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0
[c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0
[c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100
[c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510
[c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0
[c000003d064cfe20] [c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
irq event stamp: 2305

========================================================
hardirqs last  enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34
hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654
WARNING: possible irq lock inversion dependency detected
5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G        W
softirqs last  enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654
softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120
but this lock took another, HARDIRQ-unsafe lock in the past:
 (fs_reclaim){+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               local_irq_disable();
                               lock(&(&host->lock)->rlock);
                               lock(fs_reclaim);
  <Interrupt>
    lock(&(&host->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/0/0.

the shortest dependencies between 2nd lock and 1st lock:
 -> (fs_reclaim){+.+.} ops: 167579 {
    HARDIRQ-ON-W at:
                      lock_acquire+0xf8/0x2a0
                      fs_reclaim_acquire.part.23+0x44/0x60
                      kmem_cache_alloc_node_trace+0x80/0x590
                      alloc_desc+0x64/0x270
                      __irq_alloc_descs+0x2e4/0x3a0
                      irq_domain_alloc_descs+0xb0/0x150
                      irq_create_mapping+0x168/0x2c0
                      xics_smp_probe+0x2c/0x98
                      pnv_smp_probe+0x40/0x9c
                      smp_prepare_cpus+0x524/0x6c4
                      kernel_init_freeable+0x1b4/0x650
                      kernel_init+0x2c/0x148
                      ret_from_kernel_thread+0x5c/0x70
    SOFTIRQ-ON-W at:
                      lock_acquire+0xf8/0x2a0
                      fs_reclaim_acquire.part.23+0x44/0x60
                      kmem_cache_alloc_node_trace+0x80/0x590
                      alloc_desc+0x64/0x270
                      __irq_alloc_descs+0x2e4/0x3a0
                      irq_domain_alloc_descs+0xb0/0x150
                      irq_create_mapping+0x168/0x2c0
                      xics_smp_probe+0x2c/0x98
                      pnv_smp_probe+0x40/0x9c
                      smp_prepare_cpus+0x524/0x6c4
                      kernel_init_freeable+0x1b4/0x650
                      kernel_init+0x2c/0x148
                      ret_from_kernel_thread+0x5c/0x70
    INITIAL USE at:
                     lock_acquire+0xf8/0x2a0
                     fs_reclaim_acquire.part.23+0x44/0x60
                     kmem_cache_alloc_node_trace+0x80/0x590
                     alloc_desc+0x64/0x270
                     __irq_alloc_descs+0x2e4/0x3a0
                     irq_domain_alloc_descs+0xb0/0x150
                     irq_create_mapping+0x168/0x2c0
                     xics_smp_probe+0x2c/0x98
                     pnv_smp_probe+0x40/0x9c
                     smp_prepare_cpus+0x524/0x6c4
                     kernel_init_freeable+0x1b4/0x650
                     kernel_init+0x2c/0x148
                     ret_from_kernel_thread+0x5c/0x70
  }
===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 20 +++++++++----------
 arch/powerpc/platforms/powernv/pci.h          |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5eb..c75ec37bf0cda 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -36,7 +36,8 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
 	struct page *tce_mem = NULL;
 	__be64 *addr;
 
-	tce_mem = alloc_pages_node(nid, GFP_KERNEL, shift - PAGE_SHIFT);
+	tce_mem = alloc_pages_node(nid, GFP_ATOMIC | __GFP_NOWARN,
+			shift - PAGE_SHIFT);
 	if (!tce_mem) {
 		pr_err("Failed to allocate a TCE memory, level shift=%d\n",
 				shift);
@@ -161,6 +162,9 @@ void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
 
 		if (ptce)
 			*ptce = cpu_to_be64(0);
+		else
+			/* Skip the rest of the level */
+			i |= tbl->it_level_size - 1;
 	}
 }
 
@@ -260,7 +264,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	unsigned int table_shift = max_t(unsigned int, entries_shift + 3,
 			PAGE_SHIFT);
 	const unsigned long tce_table_size = 1UL << table_shift;
-	unsigned int tmplevels = levels;
 
 	if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
 		return -EINVAL;
@@ -268,9 +271,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	if (!is_power_of_2(window_size))
 		return -EINVAL;
 
-	if (alloc_userspace_copy && (window_size > (1ULL << 32)))
-		tmplevels = 1;
-
 	/* Adjust direct table size from window_size and levels */
 	entries_shift = (entries_shift + levels - 1) / levels;
 	level_shift = entries_shift + 3;
@@ -281,7 +281,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	/* Allocate TCE table */
 	addr = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-			tmplevels, tce_table_size, &offset, &total_allocated);
+			1, tce_table_size, &offset, &total_allocated);
 
 	/* addr==NULL means that the first level allocation failed */
 	if (!addr)
@@ -292,18 +292,18 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	 * we did not allocate as much as we wanted,
 	 * release partially allocated table.
 	 */
-	if (tmplevels == levels && offset < tce_table_size)
+	if (levels == 1 && offset < tce_table_size)
 		goto free_tces_exit;
 
 	/* Allocate userspace view of the TCE table */
 	if (alloc_userspace_copy) {
 		offset = 0;
 		uas = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-				tmplevels, tce_table_size, &offset,
+				1, tce_table_size, &offset,
 				&total_allocated_uas);
 		if (!uas)
 			goto free_tces_exit;
-		if (tmplevels == levels && (offset < tce_table_size ||
+		if (levels == 1 && (offset < tce_table_size ||
 				total_allocated_uas != total_allocated))
 			goto free_uas_exit;
 	}
@@ -318,7 +318,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	pr_debug("Created TCE table: ws=%08llx ts=%lx @%08llx base=%lx uas=%p levels=%d/%d\n",
 			window_size, tce_table_size, bus_offset, tbl->it_base,
-			tbl->it_userspace, tmplevels, levels);
+			tbl->it_userspace, 1, levels);
 
 	return 0;
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 469c244632477..f914f0b14e4e3 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -219,7 +219,7 @@ extern struct iommu_table_group *pnv_npu_compound_attach(
 		struct pnv_ioda_pe *pe);
 
 /* pci-ioda-tce.c */
-#define POWERNV_IOMMU_DEFAULT_LEVELS	1
+#define POWERNV_IOMMU_DEFAULT_LEVELS	2
 #define POWERNV_IOMMU_MAX_LEVELS	5
 
 extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
-- 
2.20.1


  parent reply	other threads:[~2019-09-24 17:01 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-24 16:40 [PATCH AUTOSEL 5.3 01/87] drm/vkms: Fix crc worker races Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 02/87] drm/mcde: Fix uninitialized variable Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 03/87] drm/bridge: tc358767: Increase AUX transfer length limit Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 04/87] drm/bridge: adv7511: Attach to DSI host at probe time Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 05/87] drm/kms: Catch mode_object lifetime errors Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 06/87] drm/vkms: Avoid assigning 0 for possible_crtc Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 07/87] drm/panel: simple: fix AUO g185han01 horizontal blanking Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 08/87] drm/amd/display: add monitor patch to add T7 delay Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 09/87] drm/amd/display: Power-gate all DSCs at driver init time Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 10/87] drm/amd/display: fix not calling ppsmu to trigger PME Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 11/87] drm/amd/display: Clear FEC_READY shadow register if DPCD write fails Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 12/87] drm/amd/display: Copy GSL groups when committing a new context Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 13/87] video: ssd1307fb: Start page range at page_offset Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 14/87] drm/tinydrm/Kconfig: drivers: Select BACKLIGHT_CLASS_DEVICE Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 15/87] drm/stm: attach gem fence to atomic state Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 16/87] drm/bridge: sii902x: fix missing reference to mclk clock Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 17/87] drm/panel: check failure cases in the probe func Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 18/87] drm/rockchip: Check for fast link training before enabling psr Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 19/87] drm/amdgpu: Fix hard hang for S/G display BOs Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 20/87] drm/amd/display: Use proper enum conversion functions Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 21/87] drm/radeon: Fix EEH during kexec Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 22/87] gpu: drm: radeon: Fix a possible null-pointer dereference in radeon_connector_set_property() Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 23/87] clk: imx8mq: Mark AHB clock as critical Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 24/87] PCI: rpaphp: Avoid a sometimes-uninitialized warning Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 25/87] pinctrl: stmfx: update pinconf settings Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 26/87] ipmi_si: Only schedule continuously in the thread in maintenance mode Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 27/87] clk: qoriq: Fix -Wunused-const-variable Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 28/87] clk: ingenic/jz4740: Fix "pll half" divider not read/written properly Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 29/87] clk: sunxi-ng: v3s: add missing clock slices for MMC2 module clocks Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 30/87] drm/amd/display: fix issue where 252-255 values are clipped Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 31/87] drm/amd/display: Fix frames_to_insert math Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 32/87] drm/amd/display: reprogram VM config when system resume Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 33/87] drm/amd/display: Register VUPDATE_NO_LOCK interrupts for DCN2 Sasha Levin
2019-09-24 16:40 ` Sasha Levin [this message]
2019-09-24 16:40   ` [PATCH AUTOSEL 5.3 34/87] powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 35/87] drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2) Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 36/87] clk: actions: Don't reference clk_init_data after registration Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 37/87] clk: sirf: " Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 38/87] clk: meson: axg-audio: " Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 39/87] clk: sprd: " Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 40/87] clk: zx296718: " Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 41/87] clk: sunxi: Don't call clk_hw_get_name() on a hw that isn't registered Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 42/87] powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:40 ` [PATCH AUTOSEL 5.3 43/87] powerpc/rtas: use device model APIs and serialization during LPM Sasha Levin
2019-09-24 16:40   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 44/87] powerpc/ptdump: fix walk_pagetables() address mismatch Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 45/87] powerpc/futex: Fix warning: 'oldval' may be used uninitialized in this function Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 46/87] powerpc/64s/radix: Remove redundant pfn_pte bitop, add VM_BUG_ON Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 47/87] powerpc/64s/radix: Fix memory hotplug section page table creation Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 48/87] powerpc/pseries/mobility: use cond_resched when updating device tree Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 49/87] powerpc/perf: fix imc allocation failure handling Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 50/87] pinctrl: tegra: Fix write barrier placement in pmx_writel Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 51/87] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 52/87] vfio_pci: Restore original state on release Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 53/87] drm/amdgpu/sdma5: fix number of sdma5 trap irq types for navi1x Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 54/87] drm/nouveau/kms/tu102-: disable input lut when input is already FP16 Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 55/87] drm/nouveau/volt: Fix for some cards having 0 maximum voltage Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 56/87] pinctrl: amd: disable spurious-firing GPIO IRQs Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 57/87] clk: renesas: mstp: Set GENPD_FLAG_ALWAYS_ON for clock domain Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 58/87] clk: renesas: cpg-mssr: " Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 59/87] drm/amd/display: support spdif Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 60/87] drm/amd/powerpaly: fix navi series custom peak level value error Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 61/87] drm/amd/display: fix MPO HUBP underflow with Scatter Gather Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 62/87] drm/amd/display: fix trigger not generated for freesync Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 63/87] selftests/powerpc: Retry on host facility unavailable Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 64/87] kbuild: Do not enable -Wimplicit-fallthrough for clang for now Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 65/87] drm/amdgpu/si: fix ASIC tests Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 66/87] powerpc/64s/exception: machine check use correct cfar for late handler Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 67/87] pstore: fs superblock limits Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 68/87] powerpc/eeh: Clean up EEH PEs after recovery finishes Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 69/87] powerpc/imc: Dont create debugfs files for cpu-less nodes Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 70/87] clk: qcom: gcc-sdm845: Use floor ops for sdcc clks Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 71/87] powerpc/pseries: correctly track irq state in default idle Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 72/87] pinctrl: meson-gxbb: Fix wrong pinning definition for uart_c Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 73/87] mailbox: mediatek: cmdq: clear the event in cmdq initial flow Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 74/87] ARM: dts: dir685: Drop spi-cpol from the display Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 75/87] arm64: fix unreachable code issue with cmpxchg Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 76/87] clk: at91: select parent if main oscillator or bypass is enabled Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 77/87] clk: imx: pll14xx: avoid glitch when set rate Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 78/87] clk: imx: clk-pll14xx: unbypass PLL by default Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 79/87] clk: Make clk_bulk_get_all() return a valid "id" Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 80/87] powerpc: dump kernel log before carrying out fadump or kdump Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 81/87] mbox: qcom: add APCS child device for QCS404 Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 82/87] clk: sprd: add missing kfree Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 83/87] scsi: core: Reduce memory required for SCSI logging Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 84/87] dma-buf/sw_sync: Synchronize signal vs syncpt free Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 85/87] drm: fix module name in edid_firmware log message Sasha Levin
2019-09-24 16:41   ` Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 86/87] f2fs: fix to drop meta/node pages during umount Sasha Levin
2019-09-24 16:41   ` [f2fs-dev] " Sasha Levin
2019-09-24 16:41 ` [PATCH AUTOSEL 5.3 87/87] ext4: fix potential use after free after remounting with noblock_validity Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190924164144.25591-34-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=aik@ozlabs.ru \
    --cc=alistair@popple.id.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.