All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kernel 0/2] powerpc/iommu: Stop crashing the host when VM is terminated
@ 2021-02-16  3:33 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-16  3:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc, David Gibson

Killing a VM on a host under memory pressure kills a host which is
annoying. 1/2 reduces the chances, 2/2 eliminates panic() on
ioda2.


This is based on sha1
f40ddce88593 Linus Torvalds "Linux 5.11".

Please comment. Thanks.



Alexey Kardashevskiy (2):
  powerpc/iommu: Allocate it_map by vmalloc
  powerpc/iommu: Do not immediately panic when failed IOMMU table
    allocation

 arch/powerpc/kernel/iommu.c               | 19 ++++++-------------
 arch/powerpc/platforms/cell/iommu.c       |  3 ++-
 arch/powerpc/platforms/pasemi/iommu.c     |  4 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 15 ++++++++-------
 arch/powerpc/platforms/pseries/iommu.c    | 10 +++++++---
 arch/powerpc/sysdev/dart_iommu.c          |  3 ++-
 6 files changed, 28 insertions(+), 26 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH kernel 0/2] powerpc/iommu: Stop crashing the host when VM is terminated
@ 2021-02-16  3:33 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-16  3:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc, David Gibson

Killing a VM on a host under memory pressure kills a host which is
annoying. 1/2 reduces the chances, 2/2 eliminates panic() on
ioda2.


This is based on sha1
f40ddce88593 Linus Torvalds "Linux 5.11".

Please comment. Thanks.



Alexey Kardashevskiy (2):
  powerpc/iommu: Allocate it_map by vmalloc
  powerpc/iommu: Do not immediately panic when failed IOMMU table
    allocation

 arch/powerpc/kernel/iommu.c               | 19 ++++++-------------
 arch/powerpc/platforms/cell/iommu.c       |  3 ++-
 arch/powerpc/platforms/pasemi/iommu.c     |  4 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 15 ++++++++-------
 arch/powerpc/platforms/pseries/iommu.c    | 10 +++++++---
 arch/powerpc/sysdev/dart_iommu.c          |  3 ++-
 6 files changed, 28 insertions(+), 26 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc
  2021-02-16  3:33 ` Alexey Kardashevskiy
@ 2021-02-16  3:33   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-16  3:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc, David Gibson

The IOMMU table uses the it_map bitmap to keep track of allocated DMA
pages. This has always been a contiguous array allocated at either
the boot time or when a passed through device is returned to the host OS.
The it_map memory is allocated by alloc_pages() which allocates
contiguous physical memory.

Such allocation method occasionally creates a problem when there is
no big chunk of memory available (no free memory or too fragmented).
On powernv/ioda2 the default DMA window requires 16MB for it_map.

This replaces alloc_pages_node() with vzalloc_node() which allocates
contiguous block but in virtual memory. This should reduce changes of
failure but should not cause other behavioral changes as it_map is only
used by the kernel's DMA hooks/api when MMU is on.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/iommu.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index c00214a4355c..8eb6eb0afa97 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 {
 	unsigned long sz;
 	static int welcomed = 0;
-	struct page *page;
 	unsigned int i;
 	struct iommu_pool *p;
 
@@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 	/* number of bytes needed for the bitmap */
 	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
 
-	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
-	if (!page)
+	tbl->it_map = vzalloc_node(sz, nid);
+	if (!tbl->it_map)
 		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
-	tbl->it_map = page_address(page);
-	memset(tbl->it_map, 0, sz);
 
 	iommu_table_reserve_pages(tbl, res_start, res_end);
 
@@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 
 static void iommu_table_free(struct kref *kref)
 {
-	unsigned long bitmap_sz;
-	unsigned int order;
 	struct iommu_table *tbl;
 
 	tbl = container_of(kref, struct iommu_table, it_kref);
@@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
 	if (!bitmap_empty(tbl->it_map, tbl->it_size))
 		pr_warn("%s: Unexpected TCEs\n", __func__);
 
-	/* calculate bitmap size in bytes */
-	bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
-
 	/* free bitmap */
-	order = get_order(bitmap_sz);
-	free_pages((unsigned long) tbl->it_map, order);
+	vfree(tbl->it_map);
 
 	/* free table */
 	kfree(tbl);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc
@ 2021-02-16  3:33   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-16  3:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc, David Gibson

The IOMMU table uses the it_map bitmap to keep track of allocated DMA
pages. This has always been a contiguous array allocated at either
the boot time or when a passed through device is returned to the host OS.
The it_map memory is allocated by alloc_pages() which allocates
contiguous physical memory.

Such allocation method occasionally creates a problem when there is
no big chunk of memory available (no free memory or too fragmented).
On powernv/ioda2 the default DMA window requires 16MB for it_map.

This replaces alloc_pages_node() with vzalloc_node() which allocates
contiguous block but in virtual memory. This should reduce changes of
failure but should not cause other behavioral changes as it_map is only
used by the kernel's DMA hooks/api when MMU is on.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/iommu.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index c00214a4355c..8eb6eb0afa97 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 {
 	unsigned long sz;
 	static int welcomed = 0;
-	struct page *page;
 	unsigned int i;
 	struct iommu_pool *p;
 
@@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 	/* number of bytes needed for the bitmap */
 	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
 
-	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
-	if (!page)
+	tbl->it_map = vzalloc_node(sz, nid);
+	if (!tbl->it_map)
 		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
-	tbl->it_map = page_address(page);
-	memset(tbl->it_map, 0, sz);
 
 	iommu_table_reserve_pages(tbl, res_start, res_end);
 
@@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 
 static void iommu_table_free(struct kref *kref)
 {
-	unsigned long bitmap_sz;
-	unsigned int order;
 	struct iommu_table *tbl;
 
 	tbl = container_of(kref, struct iommu_table, it_kref);
@@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
 	if (!bitmap_empty(tbl->it_map, tbl->it_size))
 		pr_warn("%s: Unexpected TCEs\n", __func__);
 
-	/* calculate bitmap size in bytes */
-	bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
-
 	/* free bitmap */
-	order = get_order(bitmap_sz);
-	free_pages((unsigned long) tbl->it_map, order);
+	vfree(tbl->it_map);
 
 	/* free table */
 	kfree(tbl);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
  2021-02-16  3:33 ` Alexey Kardashevskiy
@ 2021-02-16  3:33   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-16  3:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc, David Gibson

Most platforms allocate IOMMU table structures (specifically it_map)
at the boot time and when this fails - it is a valid reason for panic().

However the powernv platform allocates it_map after a device is returned
to the host OS after being passed through and this happens long after
the host OS booted. It is quite possible to trigger the it_map allocation
panic() and kill the host even though it is not necessary - the host OS
can still use the DMA bypass mode (requires a tiny fraction of it_map's
memory) and even if that fails, the host OS is runnnable as it was without
the device for which allocating it_map causes the panic.

Instead of immediately crashing in a powernv/ioda2 system, this prints
an error and continues. All other platforms still call panic().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/iommu.c               |  6 ++++--
 arch/powerpc/platforms/cell/iommu.c       |  3 ++-
 arch/powerpc/platforms/pasemi/iommu.c     |  4 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 15 ++++++++-------
 arch/powerpc/platforms/pseries/iommu.c    | 10 +++++++---
 arch/powerpc/sysdev/dart_iommu.c          |  3 ++-
 6 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 8eb6eb0afa97..c1a5c366a664 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -728,8 +728,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
 
 	tbl->it_map = vzalloc_node(sz, nid);
-	if (!tbl->it_map)
-		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
+	if (!tbl->it_map) {
+		pr_err("%s: Can't allocate %ld bytes\n", __func__, sz);
+		return NULL;
+	}
 
 	iommu_table_reserve_pages(tbl, res_start, res_end);
 
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 2124831cf57c..fa08699aedeb 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -486,7 +486,8 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
 	window->table.it_size = size >> window->table.it_page_shift;
 	window->table.it_ops = &cell_iommu_ops;
 
-	iommu_init_table(&window->table, iommu->nid, 0, 0);
+	if (!iommu_init_table(&window->table, iommu->nid, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	pr_debug("\tioid      %d\n", window->ioid);
 	pr_debug("\tblocksize %ld\n", window->table.it_blocksize);
diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index b500a6e47e6b..5be7242fbd86 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -146,7 +146,9 @@ static void iommu_table_iobmap_setup(void)
 	 */
 	iommu_table_iobmap.it_blocksize = 4;
 	iommu_table_iobmap.it_ops = &iommu_table_iobmap_ops;
-	iommu_init_table(&iommu_table_iobmap, 0, 0, 0);
+	if (!iommu_init_table(&iommu_table_iobmap, 0, 0, 0))
+		panic("Failed to initialize iommu table");
+
 	pr_debug(" <- %s\n", __func__);
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f0f901683a2f..66c3c3337334 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1762,7 +1762,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	tbl->it_ops = &pnv_ioda1_iommu_ops;
 	pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
 	pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
-	iommu_init_table(tbl, phb->hose->node, 0, 0);
+	if (!iommu_init_table(tbl, phb->hose->node, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	pe->dma_setup_done = true;
 	return;
@@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
 		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
 	}
-	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
 
-	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
+		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+	else
+		rc = -ENOMEM;
 	if (rc) {
-		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
-				rc);
+		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
 		iommu_tce_table_put(tbl);
-		return rc;
+		tbl = NULL; /* This clears iommu_table_base below */
 	}
-
 	if (!pnv_iommu_bypass_disabled)
 		pnv_pci_ioda2_set_bypass(pe, true);
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..4d9ac1f181c2 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -638,7 +638,8 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 
 	iommu_table_setparms(pci->phb, dn, tbl);
 	tbl->it_ops = &iommu_table_pseries_ops;
-	iommu_init_table(tbl, pci->phb->node, 0, 0);
+	if (!iommu_init_table(tbl, pci->phb->node, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	/* Divide the rest (1.75GB) among the children */
 	pci->phb->dma_window_size = 0x80000000ul;
@@ -720,7 +721,8 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
 				ppci->table_group, dma_window);
 		tbl->it_ops = &iommu_table_lpar_multi_ops;
-		iommu_init_table(tbl, ppci->phb->node, 0, 0);
+		if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
+			panic("Failed to initialize iommu table");
 		iommu_register_group(ppci->table_group,
 				pci_domain_nr(bus), 0);
 		pr_debug("  created table: %p\n", ppci->table_group);
@@ -749,7 +751,9 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		tbl = PCI_DN(dn)->table_group->tables[0];
 		iommu_table_setparms(phb, dn, tbl);
 		tbl->it_ops = &iommu_table_pseries_ops;
-		iommu_init_table(tbl, phb->node, 0, 0);
+		if (!iommu_init_table(tbl, phb->node, 0, 0))
+			panic("Failed to initialize iommu table");
+
 		set_iommu_table_base(&dev->dev, tbl);
 		return;
 	}
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 6b4a34b36d98..1d33b7a5ea83 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -344,7 +344,8 @@ static void iommu_table_dart_setup(void)
 	iommu_table_dart.it_index = 0;
 	iommu_table_dart.it_blocksize = 1;
 	iommu_table_dart.it_ops = &iommu_dart_ops;
-	iommu_init_table(&iommu_table_dart, -1, 0, 0);
+	if (!iommu_init_table(&iommu_table_dart, -1, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	/* Reserve the last page of the DART to avoid possible prefetch
 	 * past the DART mapped area
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
@ 2021-02-16  3:33   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-16  3:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc, David Gibson

Most platforms allocate IOMMU table structures (specifically it_map)
at the boot time and when this fails - it is a valid reason for panic().

However the powernv platform allocates it_map after a device is returned
to the host OS after being passed through and this happens long after
the host OS booted. It is quite possible to trigger the it_map allocation
panic() and kill the host even though it is not necessary - the host OS
can still use the DMA bypass mode (requires a tiny fraction of it_map's
memory) and even if that fails, the host OS is runnnable as it was without
the device for which allocating it_map causes the panic.

Instead of immediately crashing in a powernv/ioda2 system, this prints
an error and continues. All other platforms still call panic().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/iommu.c               |  6 ++++--
 arch/powerpc/platforms/cell/iommu.c       |  3 ++-
 arch/powerpc/platforms/pasemi/iommu.c     |  4 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 15 ++++++++-------
 arch/powerpc/platforms/pseries/iommu.c    | 10 +++++++---
 arch/powerpc/sysdev/dart_iommu.c          |  3 ++-
 6 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 8eb6eb0afa97..c1a5c366a664 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -728,8 +728,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
 
 	tbl->it_map = vzalloc_node(sz, nid);
-	if (!tbl->it_map)
-		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
+	if (!tbl->it_map) {
+		pr_err("%s: Can't allocate %ld bytes\n", __func__, sz);
+		return NULL;
+	}
 
 	iommu_table_reserve_pages(tbl, res_start, res_end);
 
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 2124831cf57c..fa08699aedeb 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -486,7 +486,8 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
 	window->table.it_size = size >> window->table.it_page_shift;
 	window->table.it_ops = &cell_iommu_ops;
 
-	iommu_init_table(&window->table, iommu->nid, 0, 0);
+	if (!iommu_init_table(&window->table, iommu->nid, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	pr_debug("\tioid      %d\n", window->ioid);
 	pr_debug("\tblocksize %ld\n", window->table.it_blocksize);
diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index b500a6e47e6b..5be7242fbd86 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -146,7 +146,9 @@ static void iommu_table_iobmap_setup(void)
 	 */
 	iommu_table_iobmap.it_blocksize = 4;
 	iommu_table_iobmap.it_ops = &iommu_table_iobmap_ops;
-	iommu_init_table(&iommu_table_iobmap, 0, 0, 0);
+	if (!iommu_init_table(&iommu_table_iobmap, 0, 0, 0))
+		panic("Failed to initialize iommu table");
+
 	pr_debug(" <- %s\n", __func__);
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f0f901683a2f..66c3c3337334 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1762,7 +1762,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	tbl->it_ops = &pnv_ioda1_iommu_ops;
 	pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
 	pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
-	iommu_init_table(tbl, phb->hose->node, 0, 0);
+	if (!iommu_init_table(tbl, phb->hose->node, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	pe->dma_setup_done = true;
 	return;
@@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
 		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
 	}
-	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
 
-	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
+		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+	else
+		rc = -ENOMEM;
 	if (rc) {
-		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
-				rc);
+		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
 		iommu_tce_table_put(tbl);
-		return rc;
+		tbl = NULL; /* This clears iommu_table_base below */
 	}
-
 	if (!pnv_iommu_bypass_disabled)
 		pnv_pci_ioda2_set_bypass(pe, true);
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..4d9ac1f181c2 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -638,7 +638,8 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 
 	iommu_table_setparms(pci->phb, dn, tbl);
 	tbl->it_ops = &iommu_table_pseries_ops;
-	iommu_init_table(tbl, pci->phb->node, 0, 0);
+	if (!iommu_init_table(tbl, pci->phb->node, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	/* Divide the rest (1.75GB) among the children */
 	pci->phb->dma_window_size = 0x80000000ul;
@@ -720,7 +721,8 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
 				ppci->table_group, dma_window);
 		tbl->it_ops = &iommu_table_lpar_multi_ops;
-		iommu_init_table(tbl, ppci->phb->node, 0, 0);
+		if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
+			panic("Failed to initialize iommu table");
 		iommu_register_group(ppci->table_group,
 				pci_domain_nr(bus), 0);
 		pr_debug("  created table: %p\n", ppci->table_group);
@@ -749,7 +751,9 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		tbl = PCI_DN(dn)->table_group->tables[0];
 		iommu_table_setparms(phb, dn, tbl);
 		tbl->it_ops = &iommu_table_pseries_ops;
-		iommu_init_table(tbl, phb->node, 0, 0);
+		if (!iommu_init_table(tbl, phb->node, 0, 0))
+			panic("Failed to initialize iommu table");
+
 		set_iommu_table_base(&dev->dev, tbl);
 		return;
 	}
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 6b4a34b36d98..1d33b7a5ea83 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -344,7 +344,8 @@ static void iommu_table_dart_setup(void)
 	iommu_table_dart.it_index = 0;
 	iommu_table_dart.it_blocksize = 1;
 	iommu_table_dart.it_ops = &iommu_dart_ops;
-	iommu_init_table(&iommu_table_dart, -1, 0, 0);
+	if (!iommu_init_table(&iommu_table_dart, -1, 0, 0))
+		panic("Failed to initialize iommu table");
 
 	/* Reserve the last page of the DART to avoid possible prefetch
 	 * past the DART mapped area
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc
  2021-02-16  3:33   ` Alexey Kardashevskiy
@ 2021-02-17  0:16     ` David Gibson
  -1 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2021-02-17  0:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 3078 bytes --]

On Tue, Feb 16, 2021 at 02:33:06PM +1100, Alexey Kardashevskiy wrote:
> The IOMMU table uses the it_map bitmap to keep track of allocated DMA
> pages. This has always been a contiguous array allocated at either
> the boot time or when a passed through device is returned to the host OS.
> The it_map memory is allocated by alloc_pages() which allocates
> contiguous physical memory.
> 
> Such allocation method occasionally creates a problem when there is
> no big chunk of memory available (no free memory or too fragmented).
> On powernv/ioda2 the default DMA window requires 16MB for it_map.
> 
> This replaces alloc_pages_node() with vzalloc_node() which allocates
> contiguous block but in virtual memory. This should reduce changes of
> failure but should not cause other behavioral changes as it_map is only
> used by the kernel's DMA hooks/api when MMU is on.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kernel/iommu.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index c00214a4355c..8eb6eb0afa97 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  {
>  	unsigned long sz;
>  	static int welcomed = 0;
> -	struct page *page;
>  	unsigned int i;
>  	struct iommu_pool *p;
>  
> @@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  	/* number of bytes needed for the bitmap */
>  	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
> -	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
> -	if (!page)
> +	tbl->it_map = vzalloc_node(sz, nid);
> +	if (!tbl->it_map)
>  		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> -	tbl->it_map = page_address(page);
> -	memset(tbl->it_map, 0, sz);
>  
>  	iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> @@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  
>  static void iommu_table_free(struct kref *kref)
>  {
> -	unsigned long bitmap_sz;
> -	unsigned int order;
>  	struct iommu_table *tbl;
>  
>  	tbl = container_of(kref, struct iommu_table, it_kref);
> @@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
>  	if (!bitmap_empty(tbl->it_map, tbl->it_size))
>  		pr_warn("%s: Unexpected TCEs\n", __func__);
>  
> -	/* calculate bitmap size in bytes */
> -	bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
> -
>  	/* free bitmap */
> -	order = get_order(bitmap_sz);
> -	free_pages((unsigned long) tbl->it_map, order);
> +	vfree(tbl->it_map);
>  
>  	/* free table */
>  	kfree(tbl);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc
@ 2021-02-17  0:16     ` David Gibson
  0 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2021-02-17  0:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 3078 bytes --]

On Tue, Feb 16, 2021 at 02:33:06PM +1100, Alexey Kardashevskiy wrote:
> The IOMMU table uses the it_map bitmap to keep track of allocated DMA
> pages. This has always been a contiguous array allocated at either
> the boot time or when a passed through device is returned to the host OS.
> The it_map memory is allocated by alloc_pages() which allocates
> contiguous physical memory.
> 
> Such allocation method occasionally creates a problem when there is
> no big chunk of memory available (no free memory or too fragmented).
> On powernv/ioda2 the default DMA window requires 16MB for it_map.
> 
> This replaces alloc_pages_node() with vzalloc_node() which allocates
> contiguous block but in virtual memory. This should reduce changes of
> failure but should not cause other behavioral changes as it_map is only
> used by the kernel's DMA hooks/api when MMU is on.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kernel/iommu.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index c00214a4355c..8eb6eb0afa97 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  {
>  	unsigned long sz;
>  	static int welcomed = 0;
> -	struct page *page;
>  	unsigned int i;
>  	struct iommu_pool *p;
>  
> @@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  	/* number of bytes needed for the bitmap */
>  	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
> -	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
> -	if (!page)
> +	tbl->it_map = vzalloc_node(sz, nid);
> +	if (!tbl->it_map)
>  		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> -	tbl->it_map = page_address(page);
> -	memset(tbl->it_map, 0, sz);
>  
>  	iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> @@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  
>  static void iommu_table_free(struct kref *kref)
>  {
> -	unsigned long bitmap_sz;
> -	unsigned int order;
>  	struct iommu_table *tbl;
>  
>  	tbl = container_of(kref, struct iommu_table, it_kref);
> @@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
>  	if (!bitmap_empty(tbl->it_map, tbl->it_size))
>  		pr_warn("%s: Unexpected TCEs\n", __func__);
>  
> -	/* calculate bitmap size in bytes */
> -	bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
> -
>  	/* free bitmap */
> -	order = get_order(bitmap_sz);
> -	free_pages((unsigned long) tbl->it_map, order);
> +	vfree(tbl->it_map);
>  
>  	/* free table */
>  	kfree(tbl);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
  2021-02-16  3:33   ` Alexey Kardashevskiy
@ 2021-02-17  0:16     ` David Gibson
  -1 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2021-02-17  0:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 7551 bytes --]

On Tue, Feb 16, 2021 at 02:33:07PM +1100, Alexey Kardashevskiy wrote:
> Most platforms allocate IOMMU table structures (specifically it_map)
> at the boot time and when this fails - it is a valid reason for panic().
> 
> However the powernv platform allocates it_map after a device is returned
> to the host OS after being passed through and this happens long after
> the host OS booted. It is quite possible to trigger the it_map allocation
> panic() and kill the host even though it is not necessary - the host OS
> can still use the DMA bypass mode (requires a tiny fraction of it_map's
> memory) and even if that fails, the host OS is runnnable as it was without
> the device for which allocating it_map causes the panic.
> 
> Instead of immediately crashing in a powernv/ioda2 system, this prints
> an error and continues. All other platforms still call panic().
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kernel/iommu.c               |  6 ++++--
>  arch/powerpc/platforms/cell/iommu.c       |  3 ++-
>  arch/powerpc/platforms/pasemi/iommu.c     |  4 +++-
>  arch/powerpc/platforms/powernv/pci-ioda.c | 15 ++++++++-------
>  arch/powerpc/platforms/pseries/iommu.c    | 10 +++++++---
>  arch/powerpc/sysdev/dart_iommu.c          |  3 ++-
>  6 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 8eb6eb0afa97..c1a5c366a664 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -728,8 +728,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
>  	tbl->it_map = vzalloc_node(sz, nid);
> -	if (!tbl->it_map)
> -		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> +	if (!tbl->it_map) {
> +		pr_err("%s: Can't allocate %ld bytes\n", __func__, sz);
> +		return NULL;
> +	}
>  
>  	iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
> index 2124831cf57c..fa08699aedeb 100644
> --- a/arch/powerpc/platforms/cell/iommu.c
> +++ b/arch/powerpc/platforms/cell/iommu.c
> @@ -486,7 +486,8 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
>  	window->table.it_size = size >> window->table.it_page_shift;
>  	window->table.it_ops = &cell_iommu_ops;
>  
> -	iommu_init_table(&window->table, iommu->nid, 0, 0);
> +	if (!iommu_init_table(&window->table, iommu->nid, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	pr_debug("\tioid      %d\n", window->ioid);
>  	pr_debug("\tblocksize %ld\n", window->table.it_blocksize);
> diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
> index b500a6e47e6b..5be7242fbd86 100644
> --- a/arch/powerpc/platforms/pasemi/iommu.c
> +++ b/arch/powerpc/platforms/pasemi/iommu.c
> @@ -146,7 +146,9 @@ static void iommu_table_iobmap_setup(void)
>  	 */
>  	iommu_table_iobmap.it_blocksize = 4;
>  	iommu_table_iobmap.it_ops = &iommu_table_iobmap_ops;
> -	iommu_init_table(&iommu_table_iobmap, 0, 0, 0);
> +	if (!iommu_init_table(&iommu_table_iobmap, 0, 0, 0))
> +		panic("Failed to initialize iommu table");
> +
>  	pr_debug(" <- %s\n", __func__);
>  }
>  
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f0f901683a2f..66c3c3337334 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1762,7 +1762,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>  	tbl->it_ops = &pnv_ioda1_iommu_ops;
>  	pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
>  	pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
> -	iommu_init_table(tbl, phb->hose->node, 0, 0);
> +	if (!iommu_init_table(tbl, phb->hose->node, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	pe->dma_setup_done = true;
>  	return;
> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>  		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>  		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>  	}
> -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
>  
> -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> +	else
> +		rc = -ENOMEM;
>  	if (rc) {
> -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> -				rc);
> +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
>  		iommu_tce_table_put(tbl);
> -		return rc;
> +		tbl = NULL; /* This clears iommu_table_base below */
>  	}
> -
>  	if (!pnv_iommu_bypass_disabled)
>  		pnv_pci_ioda2_set_bypass(pe, true);
>  
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9fc5217f0c8e..4d9ac1f181c2 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -638,7 +638,8 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>  
>  	iommu_table_setparms(pci->phb, dn, tbl);
>  	tbl->it_ops = &iommu_table_pseries_ops;
> -	iommu_init_table(tbl, pci->phb->node, 0, 0);
> +	if (!iommu_init_table(tbl, pci->phb->node, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	/* Divide the rest (1.75GB) among the children */
>  	pci->phb->dma_window_size = 0x80000000ul;
> @@ -720,7 +721,8 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>  		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
>  				ppci->table_group, dma_window);
>  		tbl->it_ops = &iommu_table_lpar_multi_ops;
> -		iommu_init_table(tbl, ppci->phb->node, 0, 0);
> +		if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
> +			panic("Failed to initialize iommu table");
>  		iommu_register_group(ppci->table_group,
>  				pci_domain_nr(bus), 0);
>  		pr_debug("  created table: %p\n", ppci->table_group);
> @@ -749,7 +751,9 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
>  		tbl = PCI_DN(dn)->table_group->tables[0];
>  		iommu_table_setparms(phb, dn, tbl);
>  		tbl->it_ops = &iommu_table_pseries_ops;
> -		iommu_init_table(tbl, phb->node, 0, 0);
> +		if (!iommu_init_table(tbl, phb->node, 0, 0))
> +			panic("Failed to initialize iommu table");
> +
>  		set_iommu_table_base(&dev->dev, tbl);
>  		return;
>  	}
> diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
> index 6b4a34b36d98..1d33b7a5ea83 100644
> --- a/arch/powerpc/sysdev/dart_iommu.c
> +++ b/arch/powerpc/sysdev/dart_iommu.c
> @@ -344,7 +344,8 @@ static void iommu_table_dart_setup(void)
>  	iommu_table_dart.it_index = 0;
>  	iommu_table_dart.it_blocksize = 1;
>  	iommu_table_dart.it_ops = &iommu_dart_ops;
> -	iommu_init_table(&iommu_table_dart, -1, 0, 0);
> +	if (!iommu_init_table(&iommu_table_dart, -1, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	/* Reserve the last page of the DART to avoid possible prefetch
>  	 * past the DART mapped area

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
@ 2021-02-17  0:16     ` David Gibson
  0 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2021-02-17  0:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 7551 bytes --]

On Tue, Feb 16, 2021 at 02:33:07PM +1100, Alexey Kardashevskiy wrote:
> Most platforms allocate IOMMU table structures (specifically it_map)
> at the boot time and when this fails - it is a valid reason for panic().
> 
> However the powernv platform allocates it_map after a device is returned
> to the host OS after being passed through and this happens long after
> the host OS booted. It is quite possible to trigger the it_map allocation
> panic() and kill the host even though it is not necessary - the host OS
> can still use the DMA bypass mode (requires a tiny fraction of it_map's
> memory) and even if that fails, the host OS is runnnable as it was without
> the device for which allocating it_map causes the panic.
> 
> Instead of immediately crashing in a powernv/ioda2 system, this prints
> an error and continues. All other platforms still call panic().
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kernel/iommu.c               |  6 ++++--
>  arch/powerpc/platforms/cell/iommu.c       |  3 ++-
>  arch/powerpc/platforms/pasemi/iommu.c     |  4 +++-
>  arch/powerpc/platforms/powernv/pci-ioda.c | 15 ++++++++-------
>  arch/powerpc/platforms/pseries/iommu.c    | 10 +++++++---
>  arch/powerpc/sysdev/dart_iommu.c          |  3 ++-
>  6 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 8eb6eb0afa97..c1a5c366a664 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -728,8 +728,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
>  	tbl->it_map = vzalloc_node(sz, nid);
> -	if (!tbl->it_map)
> -		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> +	if (!tbl->it_map) {
> +		pr_err("%s: Can't allocate %ld bytes\n", __func__, sz);
> +		return NULL;
> +	}
>  
>  	iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
> index 2124831cf57c..fa08699aedeb 100644
> --- a/arch/powerpc/platforms/cell/iommu.c
> +++ b/arch/powerpc/platforms/cell/iommu.c
> @@ -486,7 +486,8 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
>  	window->table.it_size = size >> window->table.it_page_shift;
>  	window->table.it_ops = &cell_iommu_ops;
>  
> -	iommu_init_table(&window->table, iommu->nid, 0, 0);
> +	if (!iommu_init_table(&window->table, iommu->nid, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	pr_debug("\tioid      %d\n", window->ioid);
>  	pr_debug("\tblocksize %ld\n", window->table.it_blocksize);
> diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
> index b500a6e47e6b..5be7242fbd86 100644
> --- a/arch/powerpc/platforms/pasemi/iommu.c
> +++ b/arch/powerpc/platforms/pasemi/iommu.c
> @@ -146,7 +146,9 @@ static void iommu_table_iobmap_setup(void)
>  	 */
>  	iommu_table_iobmap.it_blocksize = 4;
>  	iommu_table_iobmap.it_ops = &iommu_table_iobmap_ops;
> -	iommu_init_table(&iommu_table_iobmap, 0, 0, 0);
> +	if (!iommu_init_table(&iommu_table_iobmap, 0, 0, 0))
> +		panic("Failed to initialize iommu table");
> +
>  	pr_debug(" <- %s\n", __func__);
>  }
>  
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f0f901683a2f..66c3c3337334 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1762,7 +1762,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>  	tbl->it_ops = &pnv_ioda1_iommu_ops;
>  	pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
>  	pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
> -	iommu_init_table(tbl, phb->hose->node, 0, 0);
> +	if (!iommu_init_table(tbl, phb->hose->node, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	pe->dma_setup_done = true;
>  	return;
> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>  		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>  		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>  	}
> -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
>  
> -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> +	else
> +		rc = -ENOMEM;
>  	if (rc) {
> -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> -				rc);
> +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
>  		iommu_tce_table_put(tbl);
> -		return rc;
> +		tbl = NULL; /* This clears iommu_table_base below */
>  	}
> -
>  	if (!pnv_iommu_bypass_disabled)
>  		pnv_pci_ioda2_set_bypass(pe, true);
>  
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9fc5217f0c8e..4d9ac1f181c2 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -638,7 +638,8 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>  
>  	iommu_table_setparms(pci->phb, dn, tbl);
>  	tbl->it_ops = &iommu_table_pseries_ops;
> -	iommu_init_table(tbl, pci->phb->node, 0, 0);
> +	if (!iommu_init_table(tbl, pci->phb->node, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	/* Divide the rest (1.75GB) among the children */
>  	pci->phb->dma_window_size = 0x80000000ul;
> @@ -720,7 +721,8 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>  		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
>  				ppci->table_group, dma_window);
>  		tbl->it_ops = &iommu_table_lpar_multi_ops;
> -		iommu_init_table(tbl, ppci->phb->node, 0, 0);
> +		if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
> +			panic("Failed to initialize iommu table");
>  		iommu_register_group(ppci->table_group,
>  				pci_domain_nr(bus), 0);
>  		pr_debug("  created table: %p\n", ppci->table_group);
> @@ -749,7 +751,9 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
>  		tbl = PCI_DN(dn)->table_group->tables[0];
>  		iommu_table_setparms(phb, dn, tbl);
>  		tbl->it_ops = &iommu_table_pseries_ops;
> -		iommu_init_table(tbl, phb->node, 0, 0);
> +		if (!iommu_init_table(tbl, phb->node, 0, 0))
> +			panic("Failed to initialize iommu table");
> +
>  		set_iommu_table_base(&dev->dev, tbl);
>  		return;
>  	}
> diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
> index 6b4a34b36d98..1d33b7a5ea83 100644
> --- a/arch/powerpc/sysdev/dart_iommu.c
> +++ b/arch/powerpc/sysdev/dart_iommu.c
> @@ -344,7 +344,8 @@ static void iommu_table_dart_setup(void)
>  	iommu_table_dart.it_index = 0;
>  	iommu_table_dart.it_blocksize = 1;
>  	iommu_table_dart.it_ops = &iommu_dart_ops;
> -	iommu_init_table(&iommu_table_dart, -1, 0, 0);
> +	if (!iommu_init_table(&iommu_table_dart, -1, 0, 0))
> +		panic("Failed to initialize iommu table");
>  
>  	/* Reserve the last page of the DART to avoid possible prefetch
>  	 * past the DART mapped area

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc
  2021-02-16  3:33   ` Alexey Kardashevskiy
@ 2021-02-17 19:11     ` Leonardo Bras
  -1 siblings, 0 replies; 20+ messages in thread
From: Leonardo Bras @ 2021-02-17 19:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc, David Gibson

On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> The IOMMU table uses the it_map bitmap to keep track of allocated DMA
> pages. This has always been a contiguous array allocated at either
> the boot time or when a passed through device is returned to the host OS.
> The it_map memory is allocated by alloc_pages() which allocates
> contiguous physical memory.
> 
> Such allocation method occasionally creates a problem when there is
> no big chunk of memory available (no free memory or too fragmented).
> On powernv/ioda2 the default DMA window requires 16MB for it_map.
> 
> This replaces alloc_pages_node() with vzalloc_node() which allocates
> contiguous block but in virtual memory. This should reduce changes of
> failure but should not cause other behavioral changes as it_map is only
> used by the kernel's DMA hooks/api when MMU is on.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

It looks a very good change, and also makes code much simpler to read.

FWIW:

Reviewed-by: Leonardo Bras <leobras.c@gmail,com>

> ---
>  arch/powerpc/kernel/iommu.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index c00214a4355c..8eb6eb0afa97 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  {
>  	unsigned long sz;
>  	static int welcomed = 0;
> -	struct page *page;
>  	unsigned int i;
>  	struct iommu_pool *p;
>  
> 
> 
> 
> @@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  	/* number of bytes needed for the bitmap */
>  	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
> 
> 
> 
> -	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
> -	if (!page)
> +	tbl->it_map = vzalloc_node(sz, nid);
> +	if (!tbl->it_map)
>  		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> -	tbl->it_map = page_address(page);
> -	memset(tbl->it_map, 0, sz);
>  
> 
> 
> 
>  	iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> 
> 
> 
> @@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  
> 
> 
> 
>  static void iommu_table_free(struct kref *kref)
>  {
> -	unsigned long bitmap_sz;
> -	unsigned int order;
>  	struct iommu_table *tbl;
>  
> 
> 
> 
>  	tbl = container_of(kref, struct iommu_table, it_kref);
> @@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
>  	if (!bitmap_empty(tbl->it_map, tbl->it_size))
>  		pr_warn("%s: Unexpected TCEs\n", __func__);
>  
> 
> 
> 
> -	/* calculate bitmap size in bytes */
> -	bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
> -
>  	/* free bitmap */
> -	order = get_order(bitmap_sz);
> -	free_pages((unsigned long) tbl->it_map, order);
> +	vfree(tbl->it_map);
>  
> 
> 
> 
>  	/* free table */
>  	kfree(tbl);



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc
@ 2021-02-17 19:11     ` Leonardo Bras
  0 siblings, 0 replies; 20+ messages in thread
From: Leonardo Bras @ 2021-02-17 19:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc, David Gibson

On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> The IOMMU table uses the it_map bitmap to keep track of allocated DMA
> pages. This has always been a contiguous array allocated at either
> the boot time or when a passed through device is returned to the host OS.
> The it_map memory is allocated by alloc_pages() which allocates
> contiguous physical memory.
> 
> Such allocation method occasionally creates a problem when there is
> no big chunk of memory available (no free memory or too fragmented).
> On powernv/ioda2 the default DMA window requires 16MB for it_map.
> 
> This replaces alloc_pages_node() with vzalloc_node() which allocates
> contiguous block but in virtual memory. This should reduce changes of
> failure but should not cause other behavioral changes as it_map is only
> used by the kernel's DMA hooks/api when MMU is on.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

It looks a very good change, and also makes code much simpler to read.

FWIW:

Reviewed-by: Leonardo Bras <leobras.c@gmail,com>

> ---
>  arch/powerpc/kernel/iommu.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index c00214a4355c..8eb6eb0afa97 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  {
>  	unsigned long sz;
>  	static int welcomed = 0;
> -	struct page *page;
>  	unsigned int i;
>  	struct iommu_pool *p;
>  
> 
> 
> 
> @@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  	/* number of bytes needed for the bitmap */
>  	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
> 
> 
> 
> -	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
> -	if (!page)
> +	tbl->it_map = vzalloc_node(sz, nid);
> +	if (!tbl->it_map)
>  		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> -	tbl->it_map = page_address(page);
> -	memset(tbl->it_map, 0, sz);
>  
> 
> 
> 
>  	iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> 
> 
> 
> @@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  
> 
> 
> 
>  static void iommu_table_free(struct kref *kref)
>  {
> -	unsigned long bitmap_sz;
> -	unsigned int order;
>  	struct iommu_table *tbl;
>  
> 
> 
> 
>  	tbl = container_of(kref, struct iommu_table, it_kref);
> @@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
>  	if (!bitmap_empty(tbl->it_map, tbl->it_size))
>  		pr_warn("%s: Unexpected TCEs\n", __func__);
>  
> 
> 
> 
> -	/* calculate bitmap size in bytes */
> -	bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
> -
>  	/* free bitmap */
> -	order = get_order(bitmap_sz);
> -	free_pages((unsigned long) tbl->it_map, order);
> +	vfree(tbl->it_map);
>  
> 
> 
> 
>  	/* free table */
>  	kfree(tbl);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
  2021-02-16  3:33   ` Alexey Kardashevskiy
@ 2021-02-17 19:32     ` Leonardo Bras
  -1 siblings, 0 replies; 20+ messages in thread
From: Leonardo Bras @ 2021-02-17 19:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc, David Gibson

On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> Most platforms allocate IOMMU table structures (specifically it_map)
> at the boot time and when this fails - it is a valid reason for panic().
> 
> However the powernv platform allocates it_map after a device is returned
> to the host OS after being passed through and this happens long after
> the host OS booted. It is quite possible to trigger the it_map allocation
> panic() and kill the host even though it is not necessary - the host OS
> can still use the DMA bypass mode (requires a tiny fraction of it_map's
> memory) and even if that fails, the host OS is runnnable as it was without
> the device for which allocating it_map causes the panic.
> 
> Instead of immediately crashing in a powernv/ioda2 system, this prints
> an error and continues. All other platforms still call panic().
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Hello Alexey,

This looks like a good change, that passes panic() decision to platform
code. Everything looks pretty straightforward, but I have a question
regarding this:

> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>  		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>  		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>  	}
> -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
> -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> 
> +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> +	else
> +		rc = -ENOMEM;
>  	if (rc) {
> -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> -				rc);
> +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
>  		iommu_tce_table_put(tbl);
> -		return rc;
> +		tbl = NULL; /* This clears iommu_table_base below */
>  	}
> -
>  	if (!pnv_iommu_bypass_disabled)
>  		pnv_pci_ioda2_set_bypass(pe, true);
>  
> 

If I could understand correctly, previously if iommu_init_table() did
not panic(), and pnv_pci_ioda2_set_window() returned something other
than 0, it would return rc in the if (rc) clause, but now it does not
happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.

Is that desired?

As far as I could see, returning rc there seems a good procedure after
iommu_init_table returning -ENOMEM.

Best regards, 
Leonardo Bras  





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
@ 2021-02-17 19:32     ` Leonardo Bras
  0 siblings, 0 replies; 20+ messages in thread
From: Leonardo Bras @ 2021-02-17 19:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc, David Gibson

On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> Most platforms allocate IOMMU table structures (specifically it_map)
> at the boot time and when this fails - it is a valid reason for panic().
> 
> However the powernv platform allocates it_map after a device is returned
> to the host OS after being passed through and this happens long after
> the host OS booted. It is quite possible to trigger the it_map allocation
> panic() and kill the host even though it is not necessary - the host OS
> can still use the DMA bypass mode (requires a tiny fraction of it_map's
> memory) and even if that fails, the host OS is runnnable as it was without
> the device for which allocating it_map causes the panic.
> 
> Instead of immediately crashing in a powernv/ioda2 system, this prints
> an error and continues. All other platforms still call panic().
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Hello Alexey,

This looks like a good change, that passes panic() decision to platform
code. Everything looks pretty straightforward, but I have a question
regarding this:

> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>  		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>  		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>  	}
> -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
> -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> 
> +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> +	else
> +		rc = -ENOMEM;
>  	if (rc) {
> -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> -				rc);
> +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
>  		iommu_tce_table_put(tbl);
> -		return rc;
> +		tbl = NULL; /* This clears iommu_table_base below */
>  	}
> -
>  	if (!pnv_iommu_bypass_disabled)
>  		pnv_pci_ioda2_set_bypass(pe, true);
>  
> 

If I could understand correctly, previously if iommu_init_table() did
not panic(), and pnv_pci_ioda2_set_window() returned something other
than 0, it would return rc in the if (rc) clause, but now it does not
happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.

Is that desired?

As far as I could see, returning rc there seems a good procedure after
iommu_init_table returning -ENOMEM.

Best regards, 
Leonardo Bras  




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
  2021-02-17 19:32     ` Leonardo Bras
@ 2021-02-22  5:24       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-22  5:24 UTC (permalink / raw)
  To: Leonardo Bras, linuxppc-dev; +Cc: kvm-ppc, David Gibson



On 18/02/2021 06:32, Leonardo Bras wrote:
> On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
>> Most platforms allocate IOMMU table structures (specifically it_map)
>> at the boot time and when this fails - it is a valid reason for panic().
>>
>> However the powernv platform allocates it_map after a device is returned
>> to the host OS after being passed through and this happens long after
>> the host OS booted. It is quite possible to trigger the it_map allocation
>> panic() and kill the host even though it is not necessary - the host OS
>> can still use the DMA bypass mode (requires a tiny fraction of it_map's
>> memory) and even if that fails, the host OS is runnnable as it was without
>> the device for which allocating it_map causes the panic.
>>
>> Instead of immediately crashing in a powernv/ioda2 system, this prints
>> an error and continues. All other platforms still call panic().
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Hello Alexey,
> 
> This looks like a good change, that passes panic() decision to platform
> code. Everything looks pretty straightforward, but I have a question
> regarding this:
> 
>> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>>   		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>>   		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>>   	}
>> -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
>> -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>>
>> +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
>> +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>> +	else
>> +		rc = -ENOMEM;
>>   	if (rc) {
>> -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
>> -				rc);
>> +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
>>   		iommu_tce_table_put(tbl);
>> -		return rc;
>> +		tbl = NULL; /* This clears iommu_table_base below */
>>   	}
>> -
>>   	if (!pnv_iommu_bypass_disabled)
>>   		pnv_pci_ioda2_set_bypass(pe, true);
>>   
>>
> 
> If I could understand correctly, previously if iommu_init_table() did
> not panic(), and pnv_pci_ioda2_set_window() returned something other
> than 0, it would return rc in the if (rc) clause, but now it does not
> happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
> 
> Is that desired?


Yes. A PE (==device, pretty much) has 2 DMA windows:
- the default one which requires some RAM to operate
- a bypass mode which tells the hardware that PCI addresses are 
statically mapped to RAM 1:1.

This bypass mode does not require extra memory to work and is used in 
the most cases on the bare metal as long as the device supports 64bit 
DMA which is everything except GPUs. Since it is cheap to enable and 
this what we prefer anyway, no urge to fail.


> As far as I could see, returning rc there seems a good procedure after
> iommu_init_table returning -ENOMEM.

This change is intentional and yes it could be done by a separate patch 
but I figured there is no that much value in splitting.



-- 
Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
@ 2021-02-22  5:24       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2021-02-22  5:24 UTC (permalink / raw)
  To: Leonardo Bras, linuxppc-dev; +Cc: kvm-ppc, David Gibson



On 18/02/2021 06:32, Leonardo Bras wrote:
> On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
>> Most platforms allocate IOMMU table structures (specifically it_map)
>> at the boot time and when this fails - it is a valid reason for panic().
>>
>> However the powernv platform allocates it_map after a device is returned
>> to the host OS after being passed through and this happens long after
>> the host OS booted. It is quite possible to trigger the it_map allocation
>> panic() and kill the host even though it is not necessary - the host OS
>> can still use the DMA bypass mode (requires a tiny fraction of it_map's
>> memory) and even if that fails, the host OS is runnnable as it was without
>> the device for which allocating it_map causes the panic.
>>
>> Instead of immediately crashing in a powernv/ioda2 system, this prints
>> an error and continues. All other platforms still call panic().
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Hello Alexey,
> 
> This looks like a good change, that passes panic() decision to platform
> code. Everything looks pretty straightforward, but I have a question
> regarding this:
> 
>> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>>   		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>>   		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>>   	}
>> -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
>> -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>>
>> +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
>> +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>> +	else
>> +		rc = -ENOMEM;
>>   	if (rc) {
>> -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
>> -				rc);
>> +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
>>   		iommu_tce_table_put(tbl);
>> -		return rc;
>> +		tbl = NULL; /* This clears iommu_table_base below */
>>   	}
>> -
>>   	if (!pnv_iommu_bypass_disabled)
>>   		pnv_pci_ioda2_set_bypass(pe, true);
>>   
>>
> 
> If I could understand correctly, previously if iommu_init_table() did
> not panic(), and pnv_pci_ioda2_set_window() returned something other
> than 0, it would return rc in the if (rc) clause, but now it does not
> happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
> 
> Is that desired?


Yes. A PE (=device, pretty much) has 2 DMA windows:
- the default one which requires some RAM to operate
- a bypass mode which tells the hardware that PCI addresses are 
statically mapped to RAM 1:1.

This bypass mode does not require extra memory to work and is used in 
the most cases on the bare metal as long as the device supports 64bit 
DMA which is everything except GPUs. Since it is cheap to enable and 
this what we prefer anyway, no urge to fail.


> As far as I could see, returning rc there seems a good procedure after
> iommu_init_table returning -ENOMEM.

This change is intentional and yes it could be done by a separate patch 
but I figured there is no that much value in splitting.



-- 
Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
  2021-02-22  5:24       ` Alexey Kardashevskiy
@ 2021-02-22 18:39         ` Leonardo Bras
  -1 siblings, 0 replies; 20+ messages in thread
From: Leonardo Bras @ 2021-02-22 18:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc, David Gibson

On Mon, 2021-02-22 at 16:24 +1100, Alexey Kardashevskiy wrote:
> 
> On 18/02/2021 06:32, Leonardo Bras wrote:
> > On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> > > Most platforms allocate IOMMU table structures (specifically it_map)
> > > at the boot time and when this fails - it is a valid reason for panic().
> > > 
> > > However the powernv platform allocates it_map after a device is returned
> > > to the host OS after being passed through and this happens long after
> > > the host OS booted. It is quite possible to trigger the it_map allocation
> > > panic() and kill the host even though it is not necessary - the host OS
> > > can still use the DMA bypass mode (requires a tiny fraction of it_map's
> > > memory) and even if that fails, the host OS is runnnable as it was without
> > > the device for which allocating it_map causes the panic.
> > > 
> > > Instead of immediately crashing in a powernv/ioda2 system, this prints
> > > an error and continues. All other platforms still call panic().
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > 
> > Hello Alexey,
> > 
> > This looks like a good change, that passes panic() decision to platform
> > code. Everything looks pretty straightforward, but I have a question
> > regarding this:
> > 
> > > @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
> > >   		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
> > >   		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
> > >   	}
> > > -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
> > > -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > > 
> > > +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> > > +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > > +	else
> > > +		rc = -ENOMEM;
> > >   	if (rc) {
> > > -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> > > -				rc);
> > > +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
> > >   		iommu_tce_table_put(tbl);
> > > -		return rc;
> > > +		tbl = NULL; /* This clears iommu_table_base below */
> > >   	}
> > > -
> > >   	if (!pnv_iommu_bypass_disabled)
> > >   		pnv_pci_ioda2_set_bypass(pe, true);
> > >   
> > > 
> > > 
> > > 
> > > 
> > 
> > If I could understand correctly, previously if iommu_init_table() did
> > not panic(), and pnv_pci_ioda2_set_window() returned something other
> > than 0, it would return rc in the if (rc) clause, but now it does not
> > happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
> > 
> > Is that desired?
> 
> 
> Yes. A PE (==device, pretty much) has 2 DMA windows:
> - the default one which requires some RAM to operate
> - a bypass mode which tells the hardware that PCI addresses are 
> statically mapped to RAM 1:1.
> 
> This bypass mode does not require extra memory to work and is used in 
> the most cases on the bare metal as long as the device supports 64bit 
> DMA which is everything except GPUs. Since it is cheap to enable and 
> this what we prefer anyway, no urge to fail.
> 
> 
> > As far as I could see, returning rc there seems a good procedure after
> > iommu_init_table returning -ENOMEM.
> 
> This change is intentional and yes it could be done by a separate patch 
> but I figured there is no that much value in splitting.

Ok then, thanks for clarifying.
FWIW:

Reviewed-by: Leonardo Bras <leobras.c@gmail.com>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
@ 2021-02-22 18:39         ` Leonardo Bras
  0 siblings, 0 replies; 20+ messages in thread
From: Leonardo Bras @ 2021-02-22 18:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc, David Gibson

On Mon, 2021-02-22 at 16:24 +1100, Alexey Kardashevskiy wrote:
> 
> On 18/02/2021 06:32, Leonardo Bras wrote:
> > On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> > > Most platforms allocate IOMMU table structures (specifically it_map)
> > > at the boot time and when this fails - it is a valid reason for panic().
> > > 
> > > However the powernv platform allocates it_map after a device is returned
> > > to the host OS after being passed through and this happens long after
> > > the host OS booted. It is quite possible to trigger the it_map allocation
> > > panic() and kill the host even though it is not necessary - the host OS
> > > can still use the DMA bypass mode (requires a tiny fraction of it_map's
> > > memory) and even if that fails, the host OS is runnnable as it was without
> > > the device for which allocating it_map causes the panic.
> > > 
> > > Instead of immediately crashing in a powernv/ioda2 system, this prints
> > > an error and continues. All other platforms still call panic().
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > 
> > Hello Alexey,
> > 
> > This looks like a good change, that passes panic() decision to platform
> > code. Everything looks pretty straightforward, but I have a question
> > regarding this:
> > 
> > > @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
> > >   		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
> > >   		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
> > >   	}
> > > -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
> > > -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > > 
> > > +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> > > +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > > +	else
> > > +		rc = -ENOMEM;
> > >   	if (rc) {
> > > -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> > > -				rc);
> > > +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
> > >   		iommu_tce_table_put(tbl);
> > > -		return rc;
> > > +		tbl = NULL; /* This clears iommu_table_base below */
> > >   	}
> > > -
> > >   	if (!pnv_iommu_bypass_disabled)
> > >   		pnv_pci_ioda2_set_bypass(pe, true);
> > >   
> > > 
> > > 
> > > 
> > > 
> > 
> > If I could understand correctly, previously if iommu_init_table() did
> > not panic(), and pnv_pci_ioda2_set_window() returned something other
> > than 0, it would return rc in the if (rc) clause, but now it does not
> > happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
> > 
> > Is that desired?
> 
> 
> Yes. A PE (=device, pretty much) has 2 DMA windows:
> - the default one which requires some RAM to operate
> - a bypass mode which tells the hardware that PCI addresses are 
> statically mapped to RAM 1:1.
> 
> This bypass mode does not require extra memory to work and is used in 
> the most cases on the bare metal as long as the device supports 64bit 
> DMA which is everything except GPUs. Since it is cheap to enable and 
> this what we prefer anyway, no urge to fail.
> 
> 
> > As far as I could see, returning rc there seems a good procedure after
> > iommu_init_table returning -ENOMEM.
> 
> This change is intentional and yes it could be done by a separate patch 
> but I figured there is no that much value in splitting.

Ok then, thanks for clarifying.
FWIW:

Reviewed-by: Leonardo Bras <leobras.c@gmail.com>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 0/2] powerpc/iommu: Stop crashing the host when VM is terminated
  2021-02-16  3:33 ` Alexey Kardashevskiy
@ 2021-04-29 14:01   ` Michael Ellerman
  -1 siblings, 0 replies; 20+ messages in thread
From: Michael Ellerman @ 2021-04-29 14:01 UTC (permalink / raw)
  To: linuxppc-dev, Alexey Kardashevskiy; +Cc: kvm-ppc, David Gibson

On Tue, 16 Feb 2021 14:33:05 +1100, Alexey Kardashevskiy wrote:
> Killing a VM on a host under memory pressure kills a host which is
> annoying. 1/2 reduces the chances, 2/2 eliminates panic() on
> ioda2.
> 
> 
> This is based on sha1
> f40ddce88593 Linus Torvalds "Linux 5.11".
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/iommu: Allocate it_map by vmalloc
      https://git.kernel.org/powerpc/c/7f1fa82d79947dfabb4046e1d787da9db2bc1c20
[2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
      https://git.kernel.org/powerpc/c/4be518d838809e21354f32087aa9c26efc50b410

cheers

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 0/2] powerpc/iommu: Stop crashing the host when VM is terminated
@ 2021-04-29 14:01   ` Michael Ellerman
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Ellerman @ 2021-04-29 14:01 UTC (permalink / raw)
  To: linuxppc-dev, Alexey Kardashevskiy; +Cc: kvm-ppc, David Gibson

On Tue, 16 Feb 2021 14:33:05 +1100, Alexey Kardashevskiy wrote:
> Killing a VM on a host under memory pressure kills a host which is
> annoying. 1/2 reduces the chances, 2/2 eliminates panic() on
> ioda2.
> 
> 
> This is based on sha1
> f40ddce88593 Linus Torvalds "Linux 5.11".
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/iommu: Allocate it_map by vmalloc
      https://git.kernel.org/powerpc/c/7f1fa82d79947dfabb4046e1d787da9db2bc1c20
[2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
      https://git.kernel.org/powerpc/c/4be518d838809e21354f32087aa9c26efc50b410

cheers

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-04-29 14:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-16  3:33 [PATCH kernel 0/2] powerpc/iommu: Stop crashing the host when VM is terminated Alexey Kardashevskiy
2021-02-16  3:33 ` Alexey Kardashevskiy
2021-02-16  3:33 ` [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc Alexey Kardashevskiy
2021-02-16  3:33   ` Alexey Kardashevskiy
2021-02-17  0:16   ` David Gibson
2021-02-17  0:16     ` David Gibson
2021-02-17 19:11   ` Leonardo Bras
2021-02-17 19:11     ` Leonardo Bras
2021-02-16  3:33 ` [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation Alexey Kardashevskiy
2021-02-16  3:33   ` Alexey Kardashevskiy
2021-02-17  0:16   ` David Gibson
2021-02-17  0:16     ` David Gibson
2021-02-17 19:32   ` Leonardo Bras
2021-02-17 19:32     ` Leonardo Bras
2021-02-22  5:24     ` Alexey Kardashevskiy
2021-02-22  5:24       ` Alexey Kardashevskiy
2021-02-22 18:39       ` Leonardo Bras
2021-02-22 18:39         ` Leonardo Bras
2021-04-29 14:01 ` [PATCH kernel 0/2] powerpc/iommu: Stop crashing the host when VM is terminated Michael Ellerman
2021-04-29 14:01   ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.