linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu
@ 2017-08-01 11:37 Baoquan He
  2017-08-01 11:37 ` [PATCH v9 01/13] iommu/amd: Detect pre enabled translation Baoquan He
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

When kernel panicked and jump into the kdump kernel, DMA started by the
1st kernel is not stopped, this is called on-flight DMA. In the current
code it will disable iommu and build new translation table and attach
device to it. This will cause:

 1. IO_PAGE_FAULT warning message can be seen.
 2. transfer data to or from incorrect areas of memory.

Sometime it causes the dump failure or kernel hang.

The principle of the fix is to copy the old device table to let the old-flight
DMA continue looking up to get correct address translation and irq remap result,
meanwhile to defer the assignment of device to domain to device driver initializtion
stage. The old domain ids used in 1st kernel are reserved. And a new call-back
is_attach_deferred() is added to iommu-ops, will check whether we need defer the
domain attach/detach in iommu-core code. If defer is needed, just return directly
from amd iommu attach/detach function. The attachment will be done in device driver
initializaiton stage when calling get_domain().

Change history:
v8->v9:
    Made changes according to Joerg's reviewing comments and suggestions:
    - Check if all IOMMUs are pre-enabled, otherwise do not copy dev table
      and just continue as normal kernel does.

    - Add a new global old_dev_tbl_cpy to point to a newly allocated device
      table. The content of old device table will be copied to the specific
      device table for copying which old_dev_tbl_cpy points at. If copy failed
      we can still use the amd_iommu_dev_table which is allocated in
      early_amd_iommu_init(). This is for better rolling back if copy failed,
      the amd_iommu_dev_table has got necessary initialization since iommu init. 

    - Always allocate device table with GFP_DMA32 flag to make sure that they
      are under 4G. This tries to work around the issue mentioned in patch 10/13.
      Meanwhile double check if the address of device table is above 4G since
      it could be touched accidentally in corrupted 1st kernel and not trustworthy
      any more.

v7->v8:
    Rebase patchset v7 on the latest v4.13-rc1.
    - And re-enable printing IO_PAGE_FAULT message in kdump kernel.

    - Only disable iommu if amd_iommu=off is specified in kdump kernel.


v6->v7:
    Two main changes are made according to Joerg's suggestion:
    - Add is_attach_deferred call-back to iommu-ops. With this domain
      can be deferred to device driver init cleanly.

    - Allocate memory below 4G for dev table if translation pre-enabled.
      AMD engineer pointed out that it's unsafe to update the device-table
      while iommu is enabled. device-table pointer update is split up into
      two 32bit writes in the IOMMU hardware. So updating it while the IOMMU
      is enabled could have some nasty side effects.

v5->v6:
    According to Joerg's comments made several below main changes:
    - Add sanity check when copy old dev tables.

    - If a device is set up with guest translations (DTE.GV=1), then don't
      copy that information but move the device over to an empty guest-cr3
      table and handle the faults in the PPR log (which just answer them
      with INVALID).

v5:
    bnx2 NIC can't reset itself during driver init. Post patch to reset
    it during driver init. IO_PAGE_FAULT can't be seen anymore.

    Below is link of v5 post.
    https://lists.linuxfoundation.org/pipermail/iommu/2016-September/018527.html

Baoquan He (13):
  iommu/amd: Detect pre enabled translation
  iommu/amd: add several helper functions
  Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel"
  iommu/amd: Define bit fields for DTE particularly
  iommu/amd: Add function copy_dev_tables()
  iommu/amd: copy old trans table from old kernel
  iommu/amd: Do sanity check for address translation and irq remap of
    old dev table entry
  iommu: Add is_attach_deferred call-back to iommu-ops
  iommu/amd: Use is_attach_deferred call-back
  iommu/amd: Allocate memory below 4G for dev table if translation
    pre-enabled
  iommu/amd: Don't copy GCR3 table root pointer
  iommu/amd: Clear out the GV flag when handle deferred domain attach
  iommu/amd: Disable iommu only if amd_iommu=off is specified

 drivers/iommu/amd_iommu.c       |  81 ++++++++-------
 drivers/iommu/amd_iommu_init.c  | 222 ++++++++++++++++++++++++++++++++++++----
 drivers/iommu/amd_iommu_proto.h |   2 +
 drivers/iommu/amd_iommu_types.h |  55 +++++++++-
 drivers/iommu/amd_iommu_v2.c    |  18 +++-
 drivers/iommu/iommu.c           |   8 ++
 include/linux/iommu.h           |   1 +
 7 files changed, 323 insertions(+), 64 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v9 01/13] iommu/amd: Detect pre enabled translation
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-01 11:37 ` [PATCH v9 02/13] iommu/amd: add several helper functions Baoquan He
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

Add functions to check whether translation is already enabled in IOMMU.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu_init.c  | 24 ++++++++++++++++++++++++
 drivers/iommu/amd_iommu_proto.h |  1 +
 drivers/iommu/amd_iommu_types.h |  3 +++
 3 files changed, 28 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5cc597b383c7..e39857ce6481 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -258,6 +258,25 @@ static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
 static void init_device_table_dma(void);
 
+bool translation_pre_enabled(struct amd_iommu *iommu)
+{
+	return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
+}
+
+static void clear_translation_pre_enabled(struct amd_iommu *iommu)
+{
+	iommu->flags &= ~AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
+static void init_translation_status(struct amd_iommu *iommu)
+{
+	u32 ctrl;
+
+	ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+	if (ctrl & (1<<CONTROL_IOMMU_EN))
+		iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
 static inline void update_last_devid(u16 devid)
 {
 	if (devid > amd_iommu_last_bdf)
@@ -1399,6 +1418,11 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 
 	iommu->int_enabled = false;
 
+	init_translation_status(iommu);
+
+	if (translation_pre_enabled(iommu))
+		pr_warn("Translation is already enabled - trying to copy translation structures\n");
+
 	ret = init_iommu_from_acpi(iommu, h);
 	if (ret)
 		return ret;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 466260f8a1df..a9666d2005bb 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -87,4 +87,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
 	return !!(iommu->features & f);
 }
 
+extern bool translation_pre_enabled(struct amd_iommu *iommu);
 #endif /* _ASM_X86_AMD_IOMMU_PROTO_H  */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 294a409e283b..0c98b2cf04cc 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -435,6 +435,8 @@ struct iommu_domain;
 struct irq_domain;
 struct amd_irte_ops;
 
+#define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED      (1 << 0)
+
 /*
  * This structure contains generic data for  IOMMU protection domains
  * independent of their use.
@@ -569,6 +571,7 @@ struct amd_iommu {
 	struct amd_irte_ops *irte_ops;
 #endif
 
+	u32 flags;
 	volatile u64 __aligned(8) cmd_sem;
 };
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 02/13] iommu/amd: add several helper functions
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
  2017-08-01 11:37 ` [PATCH v9 01/13] iommu/amd: Detect pre enabled translation Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-01 11:37 ` [PATCH v9 03/13] Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel" Baoquan He
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

Move single iommu enabling codes into a wrapper function early_enable_iommu().
This can make later kdump change easier.

And also add iommu_disable_command_buffer and iommu_disable_event_buffer
for later usage.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu_init.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index e39857ce6481..b36c145c498d 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -634,6 +634,14 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
 	amd_iommu_reset_cmd_buffer(iommu);
 }
 
+/*
+ * This function disables the command buffer
+ */
+static void iommu_disable_command_buffer(struct amd_iommu *iommu)
+{
+	iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
+}
+
 static void __init free_command_buffer(struct amd_iommu *iommu)
 {
 	free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
@@ -666,6 +674,14 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
 	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
 }
 
+/*
+ * This function disables the event log buffer
+ */
+static void iommu_disable_event_buffer(struct amd_iommu *iommu)
+{
+	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 static void __init free_event_buffer(struct amd_iommu *iommu)
 {
 	free_pages((unsigned long)iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
@@ -2046,6 +2062,19 @@ static void iommu_enable_ga(struct amd_iommu *iommu)
 #endif
 }
 
+static void early_enable_iommu(struct amd_iommu *iommu)
+{
+	iommu_disable(iommu);
+	iommu_init_flags(iommu);
+	iommu_set_device_table(iommu);
+	iommu_enable_command_buffer(iommu);
+	iommu_enable_event_buffer(iommu);
+	iommu_set_exclusion_range(iommu);
+	iommu_enable_ga(iommu);
+	iommu_enable(iommu);
+	iommu_flush_all_caches(iommu);
+}
+
 /*
  * This function finally enables all IOMMUs found in the system after
  * they have been initialized
@@ -2054,17 +2083,8 @@ static void early_enable_iommus(void)
 {
 	struct amd_iommu *iommu;
 
-	for_each_iommu(iommu) {
-		iommu_disable(iommu);
-		iommu_init_flags(iommu);
-		iommu_set_device_table(iommu);
-		iommu_enable_command_buffer(iommu);
-		iommu_enable_event_buffer(iommu);
-		iommu_set_exclusion_range(iommu);
-		iommu_enable_ga(iommu);
-		iommu_enable(iommu);
-		iommu_flush_all_caches(iommu);
-	}
+	for_each_iommu(iommu)
+		early_enable_iommu(iommu);
 
 #ifdef CONFIG_IRQ_REMAP
 	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 03/13] Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel"
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
  2017-08-01 11:37 ` [PATCH v9 01/13] iommu/amd: Detect pre enabled translation Baoquan He
  2017-08-01 11:37 ` [PATCH v9 02/13] iommu/amd: add several helper functions Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-01 11:37 ` [PATCH v9 04/13] iommu/amd: Define bit fields for DTE particularly Baoquan He
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

This reverts commit 54bd63570484167cb13edf81e31fff107b879981.

We still need the IO_PAGE_FAULT message to warn error after the
issue of on-flight dma in kdump kernel is fixed.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu.c       | 3 +--
 drivers/iommu/amd_iommu_init.c  | 9 ---------
 drivers/iommu/amd_iommu_types.h | 1 -
 3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 688e77576e5a..e8a6d8109564 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2086,8 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
 		flags    |= tmp;
 	}
 
-
-	flags &= ~(DTE_FLAG_SA | 0xffffULL);
+	flags &= ~(0xffffUL);
 	flags |= domain->id;
 
 	amd_iommu_dev_table[devid].data[1]  = flags;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index b36c145c498d..8896137f17af 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -29,7 +29,6 @@
 #include <linux/export.h>
 #include <linux/iommu.h>
 #include <linux/kmemleak.h>
-#include <linux/crash_dump.h>
 #include <asm/pci-direct.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
@@ -1942,14 +1941,6 @@ static void init_device_table_dma(void)
 	for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
 		set_dev_entry_bit(devid, DEV_ENTRY_VALID);
 		set_dev_entry_bit(devid, DEV_ENTRY_TRANSLATION);
-		/*
-		 * In kdump kernels in-flight DMA from the old kernel might
-		 * cause IO_PAGE_FAULTs. There are no reports that a kdump
-		 * actually failed because of that, so just disable fault
-		 * reporting in the hardware to get rid of the messages
-		 */
-		if (is_kdump_kernel())
-			set_dev_entry_bit(devid, DEV_ENTRY_NO_PAGE_FAULT);
 	}
 }
 
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 0c98b2cf04cc..db7ceb4d0957 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -322,7 +322,6 @@
 #define IOMMU_PTE_IW (1ULL << 62)
 
 #define DTE_FLAG_IOTLB	(1ULL << 32)
-#define DTE_FLAG_SA	(1ULL << 34)
 #define DTE_FLAG_GV	(1ULL << 55)
 #define DTE_FLAG_MASK	(0x3ffULL << 32)
 #define DTE_GLX_SHIFT	(56)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 04/13] iommu/amd: Define bit fields for DTE particularly
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (2 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 03/13] Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel" Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-01 11:37 ` [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables() Baoquan He
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

In AMD-Vi spec several bits of IO PTE fields and DTE fields are similar
so that both of them can share the same MACRO definition. However
defining them respectively can make code more read-able. Do it now.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu.c       |  8 ++++----
 drivers/iommu/amd_iommu_types.h | 18 ++++++++++++++----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e8a6d8109564..e5a03f259986 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1537,9 +1537,9 @@ static int iommu_map_page(struct protection_domain *dom,
 
 	if (count > 1) {
 		__pte = PAGE_SIZE_PTE(phys_addr, page_size);
-		__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
+		__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
 	} else
-		__pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
+		__pte = phys_addr | IOMMU_PTE_PR | IOMMU_PTE_FC;
 
 	if (prot & IOMMU_PROT_IR)
 		__pte |= IOMMU_PTE_IR;
@@ -2053,7 +2053,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
 
 	pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
 		    << DEV_ENTRY_MODE_SHIFT;
-	pte_root |= IOMMU_PTE_IR | IOMMU_PTE_IW | IOMMU_PTE_P | IOMMU_PTE_TV;
+	pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;
 
 	flags = amd_iommu_dev_table[devid].data[1];
 
@@ -2096,7 +2096,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
 static void clear_dte_entry(u16 devid)
 {
 	/* remove entry from the device table seen by the hardware */
-	amd_iommu_dev_table[devid].data[0]  = IOMMU_PTE_P | IOMMU_PTE_TV;
+	amd_iommu_dev_table[devid].data[0]  = DTE_FLAG_V | DTE_FLAG_TV;
 	amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;
 
 	amd_iommu_apply_erratum_63(devid);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index db7ceb4d0957..f88e802481a3 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -265,7 +265,7 @@
 #define PM_LEVEL_INDEX(x, a)	(((a) >> PM_LEVEL_SHIFT((x))) & 0x1ffULL)
 #define PM_LEVEL_ENC(x)		(((x) << 9) & 0xe00ULL)
 #define PM_LEVEL_PDE(x, a)	((a) | PM_LEVEL_ENC((x)) | \
-				 IOMMU_PTE_P | IOMMU_PTE_IR | IOMMU_PTE_IW)
+				 IOMMU_PTE_PR | IOMMU_PTE_IR | IOMMU_PTE_IW)
 #define PM_PTE_LEVEL(pte)	(((pte) >> 9) & 0x7ULL)
 
 #define PM_MAP_4k		0
@@ -314,13 +314,23 @@
 #define PTE_LEVEL_PAGE_SIZE(level)			\
 	(1ULL << (12 + (9 * (level))))
 
-#define IOMMU_PTE_P  (1ULL << 0)
-#define IOMMU_PTE_TV (1ULL << 1)
+/*
+ * Bit value definition for I/O PTE fields
+ */
+#define IOMMU_PTE_PR (1ULL << 0)
 #define IOMMU_PTE_U  (1ULL << 59)
 #define IOMMU_PTE_FC (1ULL << 60)
 #define IOMMU_PTE_IR (1ULL << 61)
 #define IOMMU_PTE_IW (1ULL << 62)
 
+/*
+ * Bit value definition for DTE fields
+ */
+#define DTE_FLAG_V  (1ULL << 0)
+#define DTE_FLAG_TV (1ULL << 1)
+#define DTE_FLAG_IR (1ULL << 61)
+#define DTE_FLAG_IW (1ULL << 62)
+
 #define DTE_FLAG_IOTLB	(1ULL << 32)
 #define DTE_FLAG_GV	(1ULL << 55)
 #define DTE_FLAG_MASK	(0x3ffULL << 32)
@@ -342,7 +352,7 @@
 #define GCR3_VALID		0x01ULL
 
 #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
-#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_P)
+#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
 #define IOMMU_PTE_PAGE(pte) (phys_to_virt((pte) & IOMMU_PAGE_MASK))
 #define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables()
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (3 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 04/13] iommu/amd: Define bit fields for DTE particularly Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-04 12:09   ` Joerg Roedel
  2017-08-01 11:37 ` [PATCH v9 06/13] iommu/amd: copy old trans table from old kernel Baoquan He
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

Add function copy_dev_tables to copy the old DEV table entries of the panicked
kernel to the new allocated device table. Since all iommus share the same device
table the copy only need be done one time. Here add a new global old_dev_tbl_cpy
to point to the newly allocated device table which the content of old device
table will be copied to. Besides, we also need to:

  - Check whether all IOMMUs actually use the same device table with the same size

  - Verify that the size of the old device table is the expected size.

  - Reserve the old domain id occupied in 1st kernel to avoid touching the old
    io-page tables. Then on-flight DMA can continue looking it up.

And also define MACRO DEV_DOMID_MASK to replace magic number 0xffffULL, it can be
reused in copy_dev_tables().

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu.c       |  2 +-
 drivers/iommu/amd_iommu_init.c  | 64 +++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/amd_iommu_types.h |  1 +
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e5a03f259986..4d00f1bda900 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2086,7 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
 		flags    |= tmp;
 	}
 
-	flags &= ~(0xffffUL);
+	flags &= ~DEV_DOMID_MASK;
 	flags |= domain->id;
 
 	amd_iommu_dev_table[devid].data[1]  = flags;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 8896137f17af..ab6794539527 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -195,6 +195,11 @@ spinlock_t amd_iommu_pd_lock;
  * page table root pointer.
  */
 struct dev_table_entry *amd_iommu_dev_table;
+/*
+ * Pointer to a device table which the content of old device table
+ * will be copied to. It's only be used in kdump kernel.
+ */
+static struct dev_table_entry *old_dev_tbl_cpy;
 
 /*
  * The alias table is a driver specific data structure which contains the
@@ -842,6 +847,65 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
 }
 
 
+static int copy_device_table(void)
+{
+	struct dev_table_entry *old_devtb = NULL;
+	u32 lo, hi, devid, old_devtb_size;
+	phys_addr_t old_devtb_phys;
+	u64 entry, last_entry = 0;
+	struct amd_iommu *iommu;
+	u16 dom_id, dte_v;
+	static int copied;
+	gfp_t gfp_flag;
+
+	for_each_iommu(iommu) {
+		/* All IOMMUs should use the same device table with the same size */
+		lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+		hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+		entry = (((u64) hi) << 32) + lo;
+		if (last_entry && last_entry != entry) {
+			pr_err("IOMMU:%d should use the same dev table as others!/n",
+				iommu->index);
+			return -1;
+		}
+		last_entry = entry;
+
+		old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+		if (old_devtb_size != dev_table_size) {
+			pr_err("The device table size of IOMMU:%d is not expected!/n",
+				iommu->index);
+			return -1;
+		}
+
+		if (copied)
+			continue;
+
+		old_devtb_phys = entry & PAGE_MASK;
+		old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+		if (!old_devtb)
+			return -1;
+
+		gfp_flag = GFP_KERNEL | __GFP_ZERO;
+		old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
+					get_order(dev_table_size));
+		if (old_dev_tbl_cpy == NULL) {
+			pr_err("Failed to allocate memory for copying old device table!/n");
+			return -1;
+		}
+
+		for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
+			old_dev_tbl_cpy[devid] = old_devtb[devid];
+			dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
+			dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
+			if (dte_v && dom_id)
+				__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+		}
+		memunmap(old_devtb);
+		copied = 1;
+	}
+	return 0;
+}
+
 void amd_iommu_apply_erratum_63(u16 devid)
 {
 	int sysmgt;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f88e802481a3..a7f6cf8c841e 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -336,6 +336,7 @@
 #define DTE_FLAG_MASK	(0x3ffULL << 32)
 #define DTE_GLX_SHIFT	(56)
 #define DTE_GLX_MASK	(3)
+#define DEV_DOMID_MASK	0xffffULL
 
 #define DTE_GCR3_VAL_A(x)	(((x) >> 12) & 0x00007ULL)
 #define DTE_GCR3_VAL_B(x)	(((x) >> 15) & 0x0ffffULL)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 06/13] iommu/amd: copy old trans table from old kernel
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (4 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables() Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-04 12:21   ` Joerg Roedel
  2017-08-01 11:37 ` [PATCH v9 07/13] iommu/amd: Do sanity check for address translation and irq remap of old dev table entry Baoquan He
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

Here several things need be done:
- If iommu is pre-enabled in a normal kernel, just disable it and print
  warning.

- If any one of IOMMUs is not pre-enabled in kdump kernel, just continue
  as it does in normal kernel.

- If failed to copy dev table of old kernel, continue to proceed as
  it does in normal kernel.

- Only if all IOMMUs are pre-enabled and copy dev table is done well, free
  the dev table allocated in early_amd_iommu_init() and make amd_iommu_dev_table
  point to the copied one.

- Disable and Re-enable event/cmd buffer,  install the copied DTE table
  to reg, and detect and enable guest vapic.

- Flush all caches

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu_init.c | 57 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index ab6794539527..e3d8af9d9636 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -37,6 +37,7 @@
 #include <asm/io_apic.h>
 #include <asm/irq_remapping.h>
 
+#include <linux/crash_dump.h>
 #include "amd_iommu_proto.h"
 #include "amd_iommu_types.h"
 #include "irq_remapping.h"
@@ -1498,9 +1499,12 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 	iommu->int_enabled = false;
 
 	init_translation_status(iommu);
-
-	if (translation_pre_enabled(iommu))
-		pr_warn("Translation is already enabled - trying to copy translation structures\n");
+	if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
+		iommu_disable(iommu);
+		clear_translation_pre_enabled(iommu);
+		pr_warn("Translation was enabled for IOMMU:%d but we are not in kdump mode\n",
+			iommu->index);
+	}
 
 	ret = init_iommu_from_acpi(iommu, h);
 	if (ret)
@@ -1995,8 +1999,7 @@ static int __init init_memory_definitions(struct acpi_table_header *table)
 }
 
 /*
- * Init the device table to not allow DMA access for devices and
- * suppress all page faults
+ * Init the device table to not allow DMA access for devices
  */
 static void init_device_table_dma(void)
 {
@@ -2137,9 +2140,49 @@ static void early_enable_iommu(struct amd_iommu *iommu)
 static void early_enable_iommus(void)
 {
 	struct amd_iommu *iommu;
+	bool is_pre_disabled = false;
 
-	for_each_iommu(iommu)
-		early_enable_iommu(iommu);
+	for_each_iommu(iommu) {
+		if (!translation_pre_enabled(iommu)) {
+			is_pre_disabled = true;
+			break;
+		}
+	}
+
+	if (is_pre_disabled) {
+		for_each_iommu(iommu)
+			early_enable_iommu(iommu);
+	} else {
+		pr_warn("Translation is already enabled - trying to copy translation structures\n");
+		if (copy_device_table()) {
+			pr_err("Failed to copy DEV table from previous kernel.\n");
+			/*
+			 * If failed to copy dev tables from old kernel, continue to proceed
+			 * as it does in normal kernel.
+			 */
+			if(old_dev_tbl_cpy != NULL)
+				free_pages((unsigned long)old_dev_tbl_cpy,
+						get_order(dev_table_size));
+			for_each_iommu(iommu) {
+				clear_translation_pre_enabled(iommu);
+				early_enable_iommu(iommu);
+			}
+		} else {
+			pr_info("Copied DEV table from previous kernel.\n");
+			free_pages((unsigned long)amd_iommu_dev_table,
+					get_order(dev_table_size));
+			amd_iommu_dev_table = old_dev_tbl_cpy;
+			for_each_iommu(iommu) {
+				iommu_disable_command_buffer(iommu);
+				iommu_disable_event_buffer(iommu);
+				iommu_enable_command_buffer(iommu);
+				iommu_enable_event_buffer(iommu);
+				iommu_enable_ga(iommu);
+				iommu_set_device_table(iommu);
+				iommu_flush_all_caches(iommu);
+			}
+		}
+	}
 
 #ifdef CONFIG_IRQ_REMAP
 	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 07/13] iommu/amd: Do sanity check for address translation and irq remap of old dev table entry
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (5 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 06/13] iommu/amd: copy old trans table from old kernel Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-01 11:37 ` [PATCH v9 08/13] iommu: Add is_attach_deferred call-back to iommu-ops Baoquan He
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

Firstly split the dev table entry copy into address translation part
and irq remapping part. Because these two parts could be enabled
independently.

Secondly do sanity check for address translation and irq remap of old
dev table entry separately.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu.c       |  5 -----
 drivers/iommu/amd_iommu_init.c  | 25 ++++++++++++++++++++++---
 drivers/iommu/amd_iommu_types.h |  8 ++++++++
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4d00f1bda900..5508f57a3e4f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3776,11 +3776,6 @@ EXPORT_SYMBOL(amd_iommu_device_info);
 
 static struct irq_chip amd_ir_chip;
 
-#define DTE_IRQ_PHYS_ADDR_MASK	(((1ULL << 45)-1) << 6)
-#define DTE_IRQ_REMAP_INTCTL    (2ULL << 60)
-#define DTE_IRQ_TABLE_LEN       (8ULL << 1)
-#define DTE_IRQ_REMAP_ENABLE    1ULL
-
 static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
 {
 	u64 dte;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index e3d8af9d9636..6a77b99d08e4 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -850,12 +850,12 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
 
 static int copy_device_table(void)
 {
+	u64 int_ctl, int_tab_len, entry, last_entry = 0;
 	struct dev_table_entry *old_devtb = NULL;
 	u32 lo, hi, devid, old_devtb_size;
 	phys_addr_t old_devtb_phys;
-	u64 entry, last_entry = 0;
 	struct amd_iommu *iommu;
-	u16 dom_id, dte_v;
+	u16 dom_id, dte_v, irq_v;
 	static int copied;
 	gfp_t gfp_flag;
 
@@ -898,8 +898,27 @@ static int copy_device_table(void)
 			old_dev_tbl_cpy[devid] = old_devtb[devid];
 			dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
 			dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
-			if (dte_v && dom_id)
+			if (dte_v && dom_id) {
+				old_dev_tbl_cpy[devid].data[0]
+					= old_devtb[devid].data[0];
+				old_dev_tbl_cpy[devid].data[1]
+					= old_devtb[devid].data[1];
 				__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+			}
+
+			irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
+			int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
+			int_tab_len = old_devtb[devid].data[2] & DTE_IRQ_TABLE_LEN_MASK;
+			if (irq_v && (int_ctl || int_tab_len)) {
+				if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
+					 (int_tab_len != DTE_IRQ_TABLE_LEN)) {
+					pr_err("Wrong old irq remapping flag: %#x\n", devid);
+					return -1;
+				}
+
+				old_dev_tbl_cpy[devid].data[2]
+					= old_devtb[devid].data[2];
+			}
 		}
 		memunmap(old_devtb);
 		copied = 1;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index a7f6cf8c841e..f0979183ec9b 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -250,6 +250,14 @@
 
 #define GA_GUEST_NR		0x1
 
+/* Bit value definition for dte irq remapping fields*/
+#define DTE_IRQ_PHYS_ADDR_MASK	(((1ULL << 45)-1) << 6)
+#define DTE_IRQ_REMAP_INTCTL_MASK	(0x3ULL << 60)
+#define DTE_IRQ_TABLE_LEN_MASK	(0xfULL << 1)
+#define DTE_IRQ_REMAP_INTCTL    (2ULL << 60)
+#define DTE_IRQ_TABLE_LEN       (8ULL << 1)
+#define DTE_IRQ_REMAP_ENABLE    1ULL
+
 #define PAGE_MODE_NONE    0x00
 #define PAGE_MODE_1_LEVEL 0x01
 #define PAGE_MODE_2_LEVEL 0x02
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 08/13] iommu: Add is_attach_deferred call-back to iommu-ops
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (6 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 07/13] iommu/amd: Do sanity check for address translation and irq remap of old dev table entry Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-01 11:37 ` [PATCH v9 09/13] iommu/amd: Use is_attach_deferred call-back Baoquan He
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

This new call-back will be used to check if the domain attach need be
deferred for now. If yes, the domain attach/detach will return directly.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/iommu.c | 8 ++++++++
 include/linux/iommu.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea160afed..86581b115b92 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1283,6 +1283,10 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev)
 {
 	int ret;
+	if ((domain->ops->is_attach_deferred != NULL) &&
+	    domain->ops->is_attach_deferred(domain, dev))
+		return 0;
+
 	if (unlikely(domain->ops->attach_dev == NULL))
 		return -ENODEV;
 
@@ -1324,6 +1328,10 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
+	if ((domain->ops->is_attach_deferred != NULL) &&
+	    domain->ops->is_attach_deferred(domain, dev))
+		return;
+
 	if (unlikely(domain->ops->detach_dev == NULL))
 		return;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54adc4a33..63983c9e6c3a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -225,6 +225,7 @@ struct iommu_ops {
 	u32 (*domain_get_windows)(struct iommu_domain *domain);
 
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+	bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
 
 	unsigned long pgsize_bitmap;
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 09/13] iommu/amd: Use is_attach_deferred call-back
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (7 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 08/13] iommu: Add is_attach_deferred call-back to iommu-ops Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-01 11:37 ` [PATCH v9 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled Baoquan He
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

Implement call-back is_attach_deferred and use it to defer the
domain attach from iommu driver init to device driver init when
iommu is pre-enabled in kdump kernel.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 5508f57a3e4f..4ff413b34b51 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -121,6 +121,7 @@ struct iommu_dev_data {
 					     PPR completions */
 	u32 errata;			  /* Bitmap for errata to apply */
 	bool use_vapic;			  /* Enable device to use vapic mode */
+	bool defer_attach;
 
 	struct ratelimit_state rs;	  /* Ratelimit IOPF messages */
 };
@@ -371,12 +372,17 @@ static u16 get_alias(struct device *dev)
 static struct iommu_dev_data *find_dev_data(u16 devid)
 {
 	struct iommu_dev_data *dev_data;
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	dev_data = search_dev_data(devid);
 
-	if (dev_data == NULL)
+	if (dev_data == NULL) {
 		dev_data = alloc_dev_data(devid);
 
+		if (translation_pre_enabled(iommu))
+			dev_data->defer_attach = true;
+	}
+
 	return dev_data;
 }
 
@@ -2477,11 +2483,18 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
 static struct protection_domain *get_domain(struct device *dev)
 {
 	struct protection_domain *domain;
+	struct iommu_domain *io_domain;
 
 	if (!check_device(dev))
 		return ERR_PTR(-EINVAL);
 
 	domain = get_dev_data(dev)->domain;
+	if (domain == NULL && get_dev_data(dev)->defer_attach) {
+		get_dev_data(dev)->defer_attach = false;
+		io_domain = iommu_get_domain_for_dev(dev);
+		domain = to_pdomain(io_domain);
+		attach_device(dev, domain);
+	}
 	if (!dma_ops_domain(domain))
 		return ERR_PTR(-EBUSY);
 
@@ -3372,6 +3385,13 @@ static void amd_iommu_apply_resv_region(struct device *dev,
 	WARN_ON_ONCE(reserve_iova(&dma_dom->iovad, start, end) == NULL);
 }
 
+static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
+					 struct device *dev)
+{
+	struct iommu_dev_data *dev_data = dev->archdata.iommu;
+	return dev_data->defer_attach;
+}
+
 const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.domain_alloc = amd_iommu_domain_alloc,
@@ -3388,6 +3408,7 @@ const struct iommu_ops amd_iommu_ops = {
 	.get_resv_regions = amd_iommu_get_resv_regions,
 	.put_resv_regions = amd_iommu_put_resv_regions,
 	.apply_resv_region = amd_iommu_apply_resv_region,
+	.is_attach_deferred = amd_iommu_is_attach_deferred,
 	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
 };
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (8 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 09/13] iommu/amd: Use is_attach_deferred call-back Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-04 12:25   ` Joerg Roedel
  2017-08-01 11:37 ` [PATCH v9 11/13] iommu/amd: Don't copy GCR3 table root pointer Baoquan He
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

AMD pointed out it's unsafe to update the device-table while iommu
is enabled. It turns out that device-table pointer update is split
up into two 32bit writes in the IOMMU hardware. So updating it while
the IOMMU is enabled could have some nasty side effects.

The safe way to work around this is to always allocate the device-table
below 4G, including the old device-table in normal kernel and the
device-table used for copying the content of the old device-table in kdump
kernel. Meanwhile we need check if the address of old device-table is
above 4G because it might has been touched accidentally in corrupted
1st kernel.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu_init.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6a77b99d08e4..8c6431ac5698 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -882,11 +882,15 @@ static int copy_device_table(void)
 			continue;
 
 		old_devtb_phys = entry & PAGE_MASK;
+		if (old_devtb_phys > 0x100000000ULL) {
+			pr_err("The address of old device table is above 4G, not trustworthy!/n");
+			return -1;
+		}
 		old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
 		if (!old_devtb)
 			return -1;
 
-		gfp_flag = GFP_KERNEL | __GFP_ZERO;
+		gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
 		old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
 					get_order(dev_table_size));
 		if (old_dev_tbl_cpy == NULL) {
@@ -2436,7 +2440,8 @@ static int __init early_amd_iommu_init(void)
 
 	/* Device table - directly used by all IOMMUs */
 	ret = -ENOMEM;
-	amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+	amd_iommu_dev_table = (void *)__get_free_pages(
+				      GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
 				      get_order(dev_table_size));
 	if (amd_iommu_dev_table == NULL)
 		goto out;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 11/13] iommu/amd: Don't copy GCR3 table root pointer
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (9 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-04 12:27   ` Joerg Roedel
  2017-08-01 11:37 ` [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach Baoquan He
  2017-08-01 11:37 ` [PATCH v9 13/13] iommu/amd: Disable iommu only if amd_iommu=off is specified Baoquan He
  12 siblings, 1 reply; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

When iommu is pre_enabled in kdump kernel, if a device is set up with
guest translations (DTE.GV=1), then don't copy GCR3 table root pointer
but move the device over to an empty guest-cr3 table and handle the
faults in the PPR log (which answer them with INVALID). After all these
PPR faults are recoverable for the device and we should not allow the
device to change old-kernels data when we don't have to.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu.c       | 28 +++-------------------------
 drivers/iommu/amd_iommu_init.c  | 11 +++++++++++
 drivers/iommu/amd_iommu_proto.h |  1 +
 drivers/iommu/amd_iommu_types.h | 24 ++++++++++++++++++++++++
 drivers/iommu/amd_iommu_v2.c    | 18 +++++++++++++++++-
 5 files changed, 56 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4ff413b34b51..46d077784da0 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -103,30 +103,6 @@ int amd_iommu_max_glx_val = -1;
 static const struct dma_map_ops amd_iommu_dma_ops;
 
 /*
- * This struct contains device specific data for the IOMMU
- */
-struct iommu_dev_data {
-	struct list_head list;		  /* For domain->dev_list */
-	struct list_head dev_data_list;	  /* For global dev_data_list */
-	struct protection_domain *domain; /* Domain the device is bound to */
-	u16 devid;			  /* PCI Device ID */
-	u16 alias;			  /* Alias Device ID */
-	bool iommu_v2;			  /* Device can make use of IOMMUv2 */
-	bool passthrough;		  /* Device is identity mapped */
-	struct {
-		bool enabled;
-		int qdep;
-	} ats;				  /* ATS state */
-	bool pri_tlp;			  /* PASID TLB required for
-					     PPR completions */
-	u32 errata;			  /* Bitmap for errata to apply */
-	bool use_vapic;			  /* Enable device to use vapic mode */
-	bool defer_attach;
-
-	struct ratelimit_state rs;	  /* Ratelimit IOPF messages */
-};
-
-/*
  * general struct to manage commands send to an IOMMU
  */
 struct iommu_cmd {
@@ -386,10 +362,11 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
 	return dev_data;
 }
 
-static struct iommu_dev_data *get_dev_data(struct device *dev)
+struct iommu_dev_data *get_dev_data(struct device *dev)
 {
 	return dev->archdata.iommu;
 }
+EXPORT_SYMBOL(get_dev_data);
 
 /*
 * Find or create an IOMMU group for a acpihid device.
@@ -2540,6 +2517,7 @@ static int dir2prot(enum dma_data_direction direction)
 	else
 		return 0;
 }
+
 /*
  * This function contains common code for mapping of a physically
  * contiguous memory region into DMA address space. It is used by all
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 8c6431ac5698..cada2e302761 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -214,6 +214,7 @@ u16 *amd_iommu_alias_table;
  * for a specific device. It is also indexed by the PCI device id.
  */
 struct amd_iommu **amd_iommu_rlookup_table;
+EXPORT_SYMBOL(amd_iommu_rlookup_table);
 
 /*
  * This table is used to find the irq remapping table for a given device id
@@ -267,6 +268,7 @@ bool translation_pre_enabled(struct amd_iommu *iommu)
 {
 	return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
 }
+EXPORT_SYMBOL(translation_pre_enabled);
 
 static void clear_translation_pre_enabled(struct amd_iommu *iommu)
 {
@@ -858,6 +860,7 @@ static int copy_device_table(void)
 	u16 dom_id, dte_v, irq_v;
 	static int copied;
 	gfp_t gfp_flag;
+	u64 tmp;
 
 	for_each_iommu(iommu) {
 		/* All IOMMUs should use the same device table with the same size */
@@ -908,6 +911,14 @@ static int copy_device_table(void)
 				old_dev_tbl_cpy[devid].data[1]
 					= old_devtb[devid].data[1];
 				__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+				/* If gcr3 table existed, mask it out */
+				if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
+					tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+					tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+					old_dev_tbl_cpy[devid].data[1] &= ~tmp;
+					tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
+					old_dev_tbl_cpy[devid].data[0] &= ~tmp;
+				}
 			}
 
 			irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index a9666d2005bb..90e62e9b01c5 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -88,4 +88,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
 }
 
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
+extern struct iommu_dev_data *get_dev_data(struct device *dev);
 #endif /* _ASM_X86_AMD_IOMMU_PROTO_H  */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f0979183ec9b..9e5af13be7c5 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -618,6 +618,30 @@ struct devid_map {
 	bool cmd_line;
 };
 
+/*
+ * This struct contains device specific data for the IOMMU
+ */
+struct iommu_dev_data {
+	struct list_head list;		  /* For domain->dev_list */
+	struct list_head dev_data_list;	  /* For global dev_data_list */
+	struct protection_domain *domain; /* Domain the device is bound to */
+	u16 devid;			  /* PCI Device ID */
+	u16 alias;			  /* Alias Device ID */
+	bool iommu_v2;			  /* Device can make use of IOMMUv2 */
+	bool passthrough;		  /* Device is identity mapped */
+	struct {
+		bool enabled;
+		int qdep;
+	} ats;				  /* ATS state */
+	bool pri_tlp;			  /* PASID TLB required for
+					     PPR completions */
+	u32 errata;			  /* Bitmap for errata to apply */
+	bool use_vapic;			  /* Enable device to use vapic mode */
+	bool defer_attach;
+
+	struct ratelimit_state rs;        /* Ratelimit IOPF messages */
+};
+
 /* Map HPET and IOAPIC ids to the devid used by the IOMMU */
 extern struct list_head ioapic_map;
 extern struct list_head hpet_map;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 6629c472eafd..e705fac89cb4 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -562,14 +562,30 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
 	unsigned long flags;
 	struct fault *fault;
 	bool finish;
-	u16 tag;
+	u16 tag, devid;
 	int ret;
+	struct iommu_dev_data *dev_data;
+	struct pci_dev *pdev = NULL;
 
 	iommu_fault = data;
 	tag         = iommu_fault->tag & 0x1ff;
 	finish      = (iommu_fault->tag >> 9) & 1;
 
+	devid = iommu_fault->device_id;
+	pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+	if (!pdev)
+		return -ENODEV;
+	dev_data = get_dev_data(&pdev->dev);
+
+	/* In kdump kernel pci dev is not initialized yet -> send INVALID */
 	ret = NOTIFY_DONE;
+	if (translation_pre_enabled(amd_iommu_rlookup_table[devid])
+		&& dev_data->defer_attach) {
+		amd_iommu_complete_ppr(pdev, iommu_fault->pasid,
+				       PPR_INVALID, tag);
+		goto out;
+	}
+
 	dev_state = get_device_state(iommu_fault->device_id);
 	if (dev_state == NULL)
 		goto out;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (10 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 11/13] iommu/amd: Don't copy GCR3 table root pointer Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  2017-08-04 12:30   ` Joerg Roedel
  2017-08-01 11:37 ` [PATCH v9 13/13] iommu/amd: Disable iommu only if amd_iommu=off is specified Baoquan He
  12 siblings, 1 reply; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

When handle deferred domain attach, we need check if the domain is
v2. If not, should try to clear out the GV flag which could be
copied from the old device table entry.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 46d077784da0..98aaccecbb76 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2085,6 +2085,11 @@ static void clear_dte_entry(u16 devid)
 	amd_iommu_apply_erratum_63(devid);
 }
 
+static void clear_dte_flag_gv(u16 devid)
+{
+	amd_iommu_dev_table[devid].data[0] &= (~DTE_FLAG_GV);
+}
+
 static void do_attach(struct iommu_dev_data *dev_data,
 		      struct protection_domain *domain)
 {
@@ -2459,6 +2464,7 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
  */
 static struct protection_domain *get_domain(struct device *dev)
 {
+	struct iommu_dev_data *dev_data = get_dev_data(dev);
 	struct protection_domain *domain;
 	struct iommu_domain *io_domain;
 
@@ -2466,11 +2472,21 @@ static struct protection_domain *get_domain(struct device *dev)
 		return ERR_PTR(-EINVAL);
 
 	domain = get_dev_data(dev)->domain;
-	if (domain == NULL && get_dev_data(dev)->defer_attach) {
+	if (domain == NULL && dev_data->defer_attach) {
+		u16 alias = amd_iommu_alias_table[dev_data->devid];
 		get_dev_data(dev)->defer_attach = false;
 		io_domain = iommu_get_domain_for_dev(dev);
 		domain = to_pdomain(io_domain);
 		attach_device(dev, domain);
+		/*
+		 * If the deferred attached domain is not v2, should clear out
+		 * the old GV flag.
+		 */
+		if (!(domain->flags & PD_IOMMUV2_MASK)) {
+			clear_dte_flag_gv(dev_data->devid);
+			if (alias != dev_data->devid)
+				clear_dte_flag_gv(dev_data->devid);
+		}
 	}
 	if (!dma_ops_domain(domain))
 		return ERR_PTR(-EBUSY);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v9 13/13] iommu/amd: Disable iommu only if amd_iommu=off is specified
  2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
                   ` (11 preceding siblings ...)
  2017-08-01 11:37 ` [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach Baoquan He
@ 2017-08-01 11:37 ` Baoquan He
  12 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-01 11:37 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, linux-kernel, Baoquan He

It's ok to disable iommu in normal kernel. While there's no need
to disable it in kdump kernel after the on-flight dma issue has
heen fixed.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 drivers/iommu/amd_iommu_init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index cada2e302761..5015d61c1dcf 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2502,7 +2502,8 @@ static int __init early_amd_iommu_init(void)
 		goto out;
 
 	/* Disable any previously enabled IOMMUs */
-	disable_iommus();
+	if (amd_iommu_disabled)
+		disable_iommus();
 
 	if (amd_iommu_irq_remap)
 		amd_iommu_irq_remap = check_ioapic_information();
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables()
  2017-08-01 11:37 ` [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables() Baoquan He
@ 2017-08-04 12:09   ` Joerg Roedel
  2017-08-04 12:21     ` Baoquan He
  2017-08-04 12:30     ` Baoquan He
  0 siblings, 2 replies; 27+ messages in thread
From: Joerg Roedel @ 2017-08-04 12:09 UTC (permalink / raw)
  To: Baoquan He; +Cc: iommu, linux-kernel

Hi Baoquan,

On Tue, Aug 01, 2017 at 07:37:21PM +0800, Baoquan He wrote:
> +	for_each_iommu(iommu) {
> +		/* All IOMMUs should use the same device table with the same size */
> +		lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
> +		hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
> +		entry = (((u64) hi) << 32) + lo;
> +		if (last_entry && last_entry != entry) {
> +			pr_err("IOMMU:%d should use the same dev table as others!/n",
> +				iommu->index);
> +			return -1;
> +		}
> +		last_entry = entry;
> +
> +		old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
> +		if (old_devtb_size != dev_table_size) {
> +			pr_err("The device table size of IOMMU:%d is not expected!/n",
> +				iommu->index);
> +			return -1;
> +		}
> +
> +		if (copied)
> +			continue;
> +
> +		old_devtb_phys = entry & PAGE_MASK;
> +		old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> +		if (!old_devtb)
> +			return -1;

You forgot to check whether the old device table is also below 4GB.

> +
> +		gfp_flag = GFP_KERNEL | __GFP_ZERO;
> +		old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
> +					get_order(dev_table_size));
> +		if (old_dev_tbl_cpy == NULL) {
> +			pr_err("Failed to allocate memory for copying old device table!/n");
> +			return -1;
> +		}
> +
> +		for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
> +			old_dev_tbl_cpy[devid] = old_devtb[devid];
> +			dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
> +			dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
> +			if (dte_v && dom_id)
> +				__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
> +		}
> +		memunmap(old_devtb);
> +		copied = 1;

And this one should be outside of the loop, then you can get rid of the
'copied' variable. Also I don't really understand why you need a
temporary copy of the old device-table. Can't you just smart-copy the
contents of the old table to the real new one?


Regards,

	Joerg
	

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 06/13] iommu/amd: copy old trans table from old kernel
  2017-08-01 11:37 ` [PATCH v9 06/13] iommu/amd: copy old trans table from old kernel Baoquan He
@ 2017-08-04 12:21   ` Joerg Roedel
  2017-08-04 13:09     ` Baoquan He
  0 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2017-08-04 12:21 UTC (permalink / raw)
  To: Baoquan He; +Cc: iommu, linux-kernel

On Tue, Aug 01, 2017 at 07:37:22PM +0800, Baoquan He wrote:
>  static void init_device_table_dma(void)
>  {
> @@ -2137,9 +2140,49 @@ static void early_enable_iommu(struct amd_iommu *iommu)
>  static void early_enable_iommus(void)
>  {
>  	struct amd_iommu *iommu;
> +	bool is_pre_disabled = false;
>  
> -	for_each_iommu(iommu)
> -		early_enable_iommu(iommu);
> +	for_each_iommu(iommu) {
> +		if (!translation_pre_enabled(iommu)) {
> +			is_pre_disabled = true;
> +			break;
> +		}
> +	}

I think this could be easier if you make 'is_pre_disabled' a global state
bool variable with reversed meaning and initialize it with 'true'.
Basically

	static bool __init amd_iommu_pre_enabled = true;

And set this variable to 'false' when you find a already disabled IOMMU
during initialization. Then you can remove the code above.

> +
> +	if (is_pre_disabled) {
> +		for_each_iommu(iommu)
> +			early_enable_iommu(iommu);

		return
	}

> +	} else {

And remove this to get rid of one indendation level for the code below.

> +		pr_warn("Translation is already enabled - trying to copy translation structures\n");
> +		if (copy_device_table()) {

Seeing how the return value of copy_device_table() is used, it makes
sense to change its return-type to bool. This way it is also easier to
see which one is the sucess and which one the failure case.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables()
  2017-08-04 12:09   ` Joerg Roedel
@ 2017-08-04 12:21     ` Baoquan He
  2017-08-04 12:30     ` Baoquan He
  1 sibling, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-04 12:21 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel

Hi Joerg,

Thanks for your reviewing!

On 08/04/17 at 02:09pm, Joerg Roedel wrote:
> Hi Baoquan,
> 
> On Tue, Aug 01, 2017 at 07:37:21PM +0800, Baoquan He wrote:
> > +	for_each_iommu(iommu) {
> > +		/* All IOMMUs should use the same device table with the same size */
> > +		lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
> > +		hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
> > +		entry = (((u64) hi) << 32) + lo;
> > +		if (last_entry && last_entry != entry) {
> > +			pr_err("IOMMU:%d should use the same dev table as others!/n",
> > +				iommu->index);
> > +			return -1;
> > +		}
> > +		last_entry = entry;
> > +
> > +		old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
> > +		if (old_devtb_size != dev_table_size) {
> > +			pr_err("The device table size of IOMMU:%d is not expected!/n",
> > +				iommu->index);
> > +			return -1;
> > +		}
> > +
> > +		if (copied)
> > +			continue;
> > +
> > +		old_devtb_phys = entry & PAGE_MASK;
> > +		old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> > +		if (!old_devtb)
> > +			return -1;
> 
> You forgot to check whether the old device table is also below 4GB.

I did it in patch 10/13. I think it's an sub-issue and can be explained
in a specific patch.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled
  2017-08-01 11:37 ` [PATCH v9 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled Baoquan He
@ 2017-08-04 12:25   ` Joerg Roedel
  2017-08-04 13:01     ` Baoquan He
  0 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2017-08-04 12:25 UTC (permalink / raw)
  To: Baoquan He; +Cc: iommu, linux-kernel

On Tue, Aug 01, 2017 at 07:37:26PM +0800, Baoquan He wrote:
> AMD pointed out it's unsafe to update the device-table while iommu
> is enabled. It turns out that device-table pointer update is split
> up into two 32bit writes in the IOMMU hardware. So updating it while
> the IOMMU is enabled could have some nasty side effects.
> 
> The safe way to work around this is to always allocate the device-table
> below 4G, including the old device-table in normal kernel and the
> device-table used for copying the content of the old device-table in kdump
> kernel. Meanwhile we need check if the address of old device-table is
> above 4G because it might has been touched accidentally in corrupted
> 1st kernel.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  drivers/iommu/amd_iommu_init.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 6a77b99d08e4..8c6431ac5698 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -882,11 +882,15 @@ static int copy_device_table(void)
>  			continue;
>  
>  		old_devtb_phys = entry & PAGE_MASK;
> +		if (old_devtb_phys > 0x100000000ULL) {

Needs to be '>='.

> +			pr_err("The address of old device table is above 4G, not trustworthy!/n");
> +			return -1;
> +		}

Okay, forget my previous comment about it, the check is added here

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 11/13] iommu/amd: Don't copy GCR3 table root pointer
  2017-08-01 11:37 ` [PATCH v9 11/13] iommu/amd: Don't copy GCR3 table root pointer Baoquan He
@ 2017-08-04 12:27   ` Joerg Roedel
  0 siblings, 0 replies; 27+ messages in thread
From: Joerg Roedel @ 2017-08-04 12:27 UTC (permalink / raw)
  To: Baoquan He; +Cc: iommu, linux-kernel

On Tue, Aug 01, 2017 at 07:37:27PM +0800, Baoquan He wrote:
> @@ -908,6 +911,14 @@ static int copy_device_table(void)
>  				old_dev_tbl_cpy[devid].data[1]
>  					= old_devtb[devid].data[1];
>  				__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
> +				/* If gcr3 table existed, mask it out */
> +				if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
> +					tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
> +					tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
> +					old_dev_tbl_cpy[devid].data[1] &= ~tmp;
> +					tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
> +					old_dev_tbl_cpy[devid].data[0] &= ~tmp;

You also need to clear the DTE_FLAG_GV bit.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables()
  2017-08-04 12:09   ` Joerg Roedel
  2017-08-04 12:21     ` Baoquan He
@ 2017-08-04 12:30     ` Baoquan He
  2017-08-04 12:51       ` Joerg Roedel
  1 sibling, 1 reply; 27+ messages in thread
From: Baoquan He @ 2017-08-04 12:30 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel

On 08/04/17 at 02:09pm, Joerg Roedel wrote:
> 
> On Tue, Aug 01, 2017 at 07:37:21PM +0800, Baoquan He wrote:
> > +	for_each_iommu(iommu) {

	......

> > +		if (copied)
> > +			continue;
> > +
> > +		old_devtb_phys = entry & PAGE_MASK;
> > +		old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> > +		if (!old_devtb)
> > +			return -1;
> 
> You forgot to check whether the old device table is also below 4GB.
> 
> > +
> > +		gfp_flag = GFP_KERNEL | __GFP_ZERO;
> > +		old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
> > +					get_order(dev_table_size));
> > +		if (old_dev_tbl_cpy == NULL) {
> > +			pr_err("Failed to allocate memory for copying old device table!/n");
> > +			return -1;
> > +		}
> > +
> > +		for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
> > +			old_dev_tbl_cpy[devid] = old_devtb[devid];
> > +			dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
> > +			dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
> > +			if (dte_v && dom_id)
> > +				__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
> > +		}
> > +		memunmap(old_devtb);
> > +		copied = 1;
> 
> And this one should be outside of the loop, then you can get rid of the
> 'copied' variable. Also I don't really understand why you need a
> temporary copy of the old device-table. Can't you just smart-copy the
> contents of the old table to the real new one?

Sorry, I don't get 'this one' meaning, are you suggesting the copy for
loop should be take out of the iommu for loop? 

About the temporary copy of the old device-table, you can see in patch
7/13, if irq sanity check failed, it return -1. This return could happen
in the middle of copy. So I think we should do a whole successful copy,
or don't copy at all. It might not be good do half copy.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach
  2017-08-01 11:37 ` [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach Baoquan He
@ 2017-08-04 12:30   ` Joerg Roedel
  2017-08-04 13:13     ` Baoquan He
  0 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2017-08-04 12:30 UTC (permalink / raw)
  To: Baoquan He; +Cc: iommu, linux-kernel

On Tue, Aug 01, 2017 at 07:37:28PM +0800, Baoquan He wrote:
> @@ -2466,11 +2472,21 @@ static struct protection_domain *get_domain(struct device *dev)
>  		return ERR_PTR(-EINVAL);
>  
>  	domain = get_dev_data(dev)->domain;
> -	if (domain == NULL && get_dev_data(dev)->defer_attach) {
> +	if (domain == NULL && dev_data->defer_attach) {
> +		u16 alias = amd_iommu_alias_table[dev_data->devid];
>  		get_dev_data(dev)->defer_attach = false;
>  		io_domain = iommu_get_domain_for_dev(dev);
>  		domain = to_pdomain(io_domain);
>  		attach_device(dev, domain);
> +		/*
> +		 * If the deferred attached domain is not v2, should clear out
> +		 * the old GV flag.
> +		 */
> +		if (!(domain->flags & PD_IOMMUV2_MASK)) {
> +			clear_dte_flag_gv(dev_data->devid);
> +			if (alias != dev_data->devid)
> +				clear_dte_flag_gv(dev_data->devid);

Hmm, thinking more about it, I am not sure what the IOMMU responds to
PRI/PASID prefixes if the GV flag is 0.

But until we know it causes problems we should just disable the bit
while doing the copy in the previous patch and avoid any special
handling like done here.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables()
  2017-08-04 12:30     ` Baoquan He
@ 2017-08-04 12:51       ` Joerg Roedel
  2017-08-04 13:00         ` Baoquan He
  0 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2017-08-04 12:51 UTC (permalink / raw)
  To: Baoquan He; +Cc: iommu, linux-kernel

On Fri, Aug 04, 2017 at 08:30:39PM +0800, Baoquan He wrote:
> Sorry, I don't get 'this one' meaning, are you suggesting the copy for
> loop should be take out of the iommu for loop? 
> 
> About the temporary copy of the old device-table, you can see in patch
> 7/13, if irq sanity check failed, it return -1. This return could happen
> in the middle of copy. So I think we should do a whole successful copy,
> or don't copy at all. It might not be good do half copy.

No, I mean that you should move the copy of the complete device-table
out of the for_each_iommu() loop. Currently you make sure you copy only
once with the 'copied' flag, but that is not necessary if you move the
code behind the loop.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables()
  2017-08-04 12:51       ` Joerg Roedel
@ 2017-08-04 13:00         ` Baoquan He
  0 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-04 13:00 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel

On 08/04/17 at 02:51pm, Joerg Roedel wrote:
> On Fri, Aug 04, 2017 at 08:30:39PM +0800, Baoquan He wrote:
> > Sorry, I don't get 'this one' meaning, are you suggesting the copy for
> > loop should be take out of the iommu for loop? 
> > 
> > About the temporary copy of the old device-table, you can see in patch
> > 7/13, if irq sanity check failed, it return -1. This return could happen
> > in the middle of copy. So I think we should do a whole successful copy,
> > or don't copy at all. It might not be good do half copy.
> 
> No, I mean that you should move the copy of the complete device-table
> out of the for_each_iommu() loop. Currently you make sure you copy only
> once with the 'copied' flag, but that is not necessary if you move the
> code behind the loop.

Ok, will do. Thanks!
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled
  2017-08-04 12:25   ` Joerg Roedel
@ 2017-08-04 13:01     ` Baoquan He
  0 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-04 13:01 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel

On 08/04/17 at 02:25pm, Joerg Roedel wrote:
> On Tue, Aug 01, 2017 at 07:37:26PM +0800, Baoquan He wrote:
> > AMD pointed out it's unsafe to update the device-table while iommu
> > is enabled. It turns out that device-table pointer update is split
> > up into two 32bit writes in the IOMMU hardware. So updating it while
> > the IOMMU is enabled could have some nasty side effects.
> > 
> > The safe way to work around this is to always allocate the device-table
> > below 4G, including the old device-table in normal kernel and the
> > device-table used for copying the content of the old device-table in kdump
> > kernel. Meanwhile we need check if the address of old device-table is
> > above 4G because it might has been touched accidentally in corrupted
> > 1st kernel.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  drivers/iommu/amd_iommu_init.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index 6a77b99d08e4..8c6431ac5698 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -882,11 +882,15 @@ static int copy_device_table(void)
> >  			continue;
> >  
> >  		old_devtb_phys = entry & PAGE_MASK;
> > +		if (old_devtb_phys > 0x100000000ULL) {
> 
> Needs to be '>='.

Will change.

> 
> > +			pr_err("The address of old device table is above 4G, not trustworthy!/n");
> > +			return -1;
> > +		}
> 
> Okay, forget my previous comment about it, the check is added here
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 06/13] iommu/amd: copy old trans table from old kernel
  2017-08-04 12:21   ` Joerg Roedel
@ 2017-08-04 13:09     ` Baoquan He
  0 siblings, 0 replies; 27+ messages in thread
From: Baoquan He @ 2017-08-04 13:09 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel

On 08/04/17 at 02:21pm, Joerg Roedel wrote:
> On Tue, Aug 01, 2017 at 07:37:22PM +0800, Baoquan He wrote:
> >  static void init_device_table_dma(void)
> >  {
> > @@ -2137,9 +2140,49 @@ static void early_enable_iommu(struct amd_iommu *iommu)
> >  static void early_enable_iommus(void)
> >  {
> >  	struct amd_iommu *iommu;
> > +	bool is_pre_disabled = false;
> >  
> > -	for_each_iommu(iommu)
> > -		early_enable_iommu(iommu);
> > +	for_each_iommu(iommu) {
> > +		if (!translation_pre_enabled(iommu)) {
> > +			is_pre_disabled = true;
> > +			break;
> > +		}
> > +	}
> 
> I think this could be easier if you make 'is_pre_disabled' a global state
> bool variable with reversed meaning and initialize it with 'true'.
> Basically
> 
> 	static bool __init amd_iommu_pre_enabled = true;
> 
> And set this variable to 'false' when you find a already disabled IOMMU
> during initialization. Then you can remove the code above.

Great idea, will change as you suggested. Thanks.

> 
> > +
> > +	if (is_pre_disabled) {
> > +		for_each_iommu(iommu)
> > +			early_enable_iommu(iommu);
> 
> 		return
> 	}
> 
> > +	} else {
> 
> And remove this to get rid of one indendation level for the code below.
> 
> > +		pr_warn("Translation is already enabled - trying to copy translation structures\n");
> > +		if (copy_device_table()) {
> 
> Seeing how the return value of copy_device_table() is used, it makes
> sense to change its return-type to bool. This way it is also easier to
> see which one is the sucess and which one the failure case.
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach
  2017-08-04 12:30   ` Joerg Roedel
@ 2017-08-04 13:13     ` Baoquan He
  2017-08-04 13:18       ` Joerg Roedel
  0 siblings, 1 reply; 27+ messages in thread
From: Baoquan He @ 2017-08-04 13:13 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel

On 08/04/17 at 02:30pm, Joerg Roedel wrote:
> On Tue, Aug 01, 2017 at 07:37:28PM +0800, Baoquan He wrote:
> > @@ -2466,11 +2472,21 @@ static struct protection_domain *get_domain(struct device *dev)
> >  		return ERR_PTR(-EINVAL);
> >  
> >  	domain = get_dev_data(dev)->domain;
> > -	if (domain == NULL && get_dev_data(dev)->defer_attach) {
> > +	if (domain == NULL && dev_data->defer_attach) {
> > +		u16 alias = amd_iommu_alias_table[dev_data->devid];
> >  		get_dev_data(dev)->defer_attach = false;
> >  		io_domain = iommu_get_domain_for_dev(dev);
> >  		domain = to_pdomain(io_domain);
> >  		attach_device(dev, domain);
> > +		/*
> > +		 * If the deferred attached domain is not v2, should clear out
> > +		 * the old GV flag.
> > +		 */
> > +		if (!(domain->flags & PD_IOMMUV2_MASK)) {
> > +			clear_dte_flag_gv(dev_data->devid);
> > +			if (alias != dev_data->devid)
> > +				clear_dte_flag_gv(dev_data->devid);
> 
> Hmm, thinking more about it, I am not sure what the IOMMU responds to
> PRI/PASID prefixes if the GV flag is 0.
> 
> But until we know it causes problems we should just disable the bit
> while doing the copy in the previous patch and avoid any special
> handling like done here.

So just drop this patch, right? Will do if I got it right. Thanks.

> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach
  2017-08-04 13:13     ` Baoquan He
@ 2017-08-04 13:18       ` Joerg Roedel
  0 siblings, 0 replies; 27+ messages in thread
From: Joerg Roedel @ 2017-08-04 13:18 UTC (permalink / raw)
  To: Baoquan He; +Cc: iommu, linux-kernel

On Fri, Aug 04, 2017 at 09:13:20PM +0800, Baoquan He wrote:
> On 08/04/17 at 02:30pm, Joerg Roedel wrote:
> > On Tue, Aug 01, 2017 at 07:37:28PM +0800, Baoquan He wrote:
> > > @@ -2466,11 +2472,21 @@ static struct protection_domain *get_domain(struct device *dev)
> > >  		return ERR_PTR(-EINVAL);
> > >  
> > >  	domain = get_dev_data(dev)->domain;
> > > -	if (domain == NULL && get_dev_data(dev)->defer_attach) {
> > > +	if (domain == NULL && dev_data->defer_attach) {
> > > +		u16 alias = amd_iommu_alias_table[dev_data->devid];
> > >  		get_dev_data(dev)->defer_attach = false;
> > >  		io_domain = iommu_get_domain_for_dev(dev);
> > >  		domain = to_pdomain(io_domain);
> > >  		attach_device(dev, domain);
> > > +		/*
> > > +		 * If the deferred attached domain is not v2, should clear out
> > > +		 * the old GV flag.
> > > +		 */
> > > +		if (!(domain->flags & PD_IOMMUV2_MASK)) {
> > > +			clear_dte_flag_gv(dev_data->devid);
> > > +			if (alias != dev_data->devid)
> > > +				clear_dte_flag_gv(dev_data->devid);
> > 
> > Hmm, thinking more about it, I am not sure what the IOMMU responds to
> > PRI/PASID prefixes if the GV flag is 0.
> > 
> > But until we know it causes problems we should just disable the bit
> > while doing the copy in the previous patch and avoid any special
> > handling like done here.
> 
> So just drop this patch, right? Will do if I got it right. Thanks.

Yes, and clear the GV flag in the previous patch too.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-08-04 13:18 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-01 11:37 [PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu Baoquan He
2017-08-01 11:37 ` [PATCH v9 01/13] iommu/amd: Detect pre enabled translation Baoquan He
2017-08-01 11:37 ` [PATCH v9 02/13] iommu/amd: add several helper functions Baoquan He
2017-08-01 11:37 ` [PATCH v9 03/13] Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel" Baoquan He
2017-08-01 11:37 ` [PATCH v9 04/13] iommu/amd: Define bit fields for DTE particularly Baoquan He
2017-08-01 11:37 ` [PATCH v9 05/13] iommu/amd: Add function copy_dev_tables() Baoquan He
2017-08-04 12:09   ` Joerg Roedel
2017-08-04 12:21     ` Baoquan He
2017-08-04 12:30     ` Baoquan He
2017-08-04 12:51       ` Joerg Roedel
2017-08-04 13:00         ` Baoquan He
2017-08-01 11:37 ` [PATCH v9 06/13] iommu/amd: copy old trans table from old kernel Baoquan He
2017-08-04 12:21   ` Joerg Roedel
2017-08-04 13:09     ` Baoquan He
2017-08-01 11:37 ` [PATCH v9 07/13] iommu/amd: Do sanity check for address translation and irq remap of old dev table entry Baoquan He
2017-08-01 11:37 ` [PATCH v9 08/13] iommu: Add is_attach_deferred call-back to iommu-ops Baoquan He
2017-08-01 11:37 ` [PATCH v9 09/13] iommu/amd: Use is_attach_deferred call-back Baoquan He
2017-08-01 11:37 ` [PATCH v9 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled Baoquan He
2017-08-04 12:25   ` Joerg Roedel
2017-08-04 13:01     ` Baoquan He
2017-08-01 11:37 ` [PATCH v9 11/13] iommu/amd: Don't copy GCR3 table root pointer Baoquan He
2017-08-04 12:27   ` Joerg Roedel
2017-08-01 11:37 ` [PATCH v9 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach Baoquan He
2017-08-04 12:30   ` Joerg Roedel
2017-08-04 13:13     ` Baoquan He
2017-08-04 13:18       ` Joerg Roedel
2017-08-01 11:37 ` [PATCH v9 13/13] iommu/amd: Disable iommu only if amd_iommu=off is specified Baoquan He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).