All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq
@ 2018-02-12 16:48 Dmitry Safonov
  2018-02-12 16:48   ` Dmitry Safonov via iommu
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Alex Williamson, David Woodhouse,
	Ingo Molnar, Joerg Roedel, Lu Baolu, iommu

Changes to v2:
- Ratelimit printks for dmar faults (6 patch)

First version:
https://lkml.org/lkml/2018/1/24/364

A softlockup-panic fix I've meet on kernel test suite.
While at it, fix a couple of minor issues.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org

Dmitry Safonov (6):
  iommu/intel: Add __init for dmar_register_bus_notifier()
  iommu/intel: Clean/document fault status flags
  iommu/intel: Introduce clear_primary_faults() helper
  iommu/intel: Handle DMAR faults on workqueue
  iommu/intel: Rename dmar_fault() => dmar_serve_faults()
  iommu/intel: Ratelimit each dmar fault printing

 drivers/iommu/dmar.c                | 73 +++++++++++++++++++++++++++++++------
 drivers/iommu/intel-iommu.c         |  8 +++-
 drivers/iommu/intel_irq_remapping.c |  2 +-
 include/linux/dmar.h                |  3 +-
 include/linux/intel-iommu.h         | 12 +++---
 5 files changed, 78 insertions(+), 20 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCHv2 1/6] iommu/intel: Add __init for dmar_register_bus_notifier()
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Alex Williamson, David Woodhouse,
	Ingo Molnar, Joerg Roedel, Lu Baolu, iommu

It's called only from intel_iommu_init(), which is init function.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9a7ffd13c7f0..accf58388bdb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -806,7 +806,7 @@ int __init dmar_dev_scope_init(void)
 	return dmar_dev_scope_status;
 }
 
-void dmar_register_bus_notifier(void)
+void __init dmar_register_bus_notifier(void)
 {
 	bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 1/6] iommu/intel: Add __init for dmar_register_bus_notifier()
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov via iommu @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Dmitry Safonov, 0x7f454c46-Re5JQEeQqe8AvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Woodhouse, Ingo Molnar

It's called only from intel_iommu_init(), which is init function.

Signed-off-by: Dmitry Safonov <dima-nzgTgzXrdUbQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9a7ffd13c7f0..accf58388bdb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -806,7 +806,7 @@ int __init dmar_dev_scope_init(void)
 	return dmar_dev_scope_status;
 }
 
-void dmar_register_bus_notifier(void)
+void __init dmar_register_bus_notifier(void)
 {
 	bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 2/6] iommu/intel: Clean/document fault status flags
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Alex Williamson, David Woodhouse,
	Ingo Molnar, Joerg Roedel, Lu Baolu, iommu

So one could decode them without opening the specification.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/intel-iommu.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8dad3dd26eae..ef169d67df92 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -209,12 +209,12 @@
 #define DMA_FECTL_IM (((u32)1) << 31)
 
 /* FSTS_REG */
-#define DMA_FSTS_PPF ((u32)2)
-#define DMA_FSTS_PFO ((u32)1)
-#define DMA_FSTS_IQE (1 << 4)
-#define DMA_FSTS_ICE (1 << 5)
-#define DMA_FSTS_ITE (1 << 6)
-#define DMA_FSTS_PRO (1 << 7)
+#define DMA_FSTS_PFO (1 << 0) /* Primary Fault Overflow */
+#define DMA_FSTS_PPF (1 << 1) /* Primary Pending Fault */
+#define DMA_FSTS_IQE (1 << 4) /* Invalidation Queue Error */
+#define DMA_FSTS_ICE (1 << 5) /* Invalidation Completion Error */
+#define DMA_FSTS_ITE (1 << 6) /* Invalidation Time-out Error */
+#define DMA_FSTS_PRO (1 << 7) /* Page Request Overflow */
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 2/6] iommu/intel: Clean/document fault status flags
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov via iommu @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Dmitry Safonov, 0x7f454c46-Re5JQEeQqe8AvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Woodhouse, Ingo Molnar

So one could decode them without opening the specification.

Signed-off-by: Dmitry Safonov <dima-nzgTgzXrdUbQT0dZR+AlfA@public.gmane.org>
---
 include/linux/intel-iommu.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8dad3dd26eae..ef169d67df92 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -209,12 +209,12 @@
 #define DMA_FECTL_IM (((u32)1) << 31)
 
 /* FSTS_REG */
-#define DMA_FSTS_PPF ((u32)2)
-#define DMA_FSTS_PFO ((u32)1)
-#define DMA_FSTS_IQE (1 << 4)
-#define DMA_FSTS_ICE (1 << 5)
-#define DMA_FSTS_ITE (1 << 6)
-#define DMA_FSTS_PRO (1 << 7)
+#define DMA_FSTS_PFO (1 << 0) /* Primary Fault Overflow */
+#define DMA_FSTS_PPF (1 << 1) /* Primary Pending Fault */
+#define DMA_FSTS_IQE (1 << 4) /* Invalidation Queue Error */
+#define DMA_FSTS_ICE (1 << 5) /* Invalidation Completion Error */
+#define DMA_FSTS_ITE (1 << 6) /* Invalidation Time-out Error */
+#define DMA_FSTS_PRO (1 << 7) /* Page Request Overflow */
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 3/6] iommu/intel: Introduce clear_primary_faults() helper
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Alex Williamson, David Woodhouse,
	Ingo Molnar, Joerg Roedel, Lu Baolu, iommu

To my mind it's a bit more readable - and I will re-use it in the next
patch.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/iommu/dmar.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index accf58388bdb..33fb4244e438 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1611,6 +1611,15 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 	return 0;
 }
 
+static void clear_primary_faults(struct intel_iommu *iommu)
+{
+	volatile void __iomem *fsts = iommu->reg + DMAR_FSTS_REG;
+
+	assert_raw_spin_locked(&iommu->register_lock);
+
+	writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO, fsts);
+}
+
 #define PRIMARY_FAULT_REG_LEN (16)
 irqreturn_t dmar_fault(int irq, void *dev_id)
 {
@@ -1679,8 +1688,7 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 		raw_spin_lock_irqsave(&iommu->register_lock, flag);
 	}
 
-	writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO,
-	       iommu->reg + DMAR_FSTS_REG);
+	clear_primary_faults(iommu);
 
 unlock_exit:
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 3/6] iommu/intel: Introduce clear_primary_faults() helper
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov via iommu @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Dmitry Safonov, 0x7f454c46-Re5JQEeQqe8AvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Woodhouse, Ingo Molnar

To my mind it's a bit more readable - and I will re-use it in the next
patch.

Signed-off-by: Dmitry Safonov <dima-nzgTgzXrdUbQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/dmar.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index accf58388bdb..33fb4244e438 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1611,6 +1611,15 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 	return 0;
 }
 
+static void clear_primary_faults(struct intel_iommu *iommu)
+{
+	volatile void __iomem *fsts = iommu->reg + DMAR_FSTS_REG;
+
+	assert_raw_spin_locked(&iommu->register_lock);
+
+	writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO, fsts);
+}
+
 #define PRIMARY_FAULT_REG_LEN (16)
 irqreturn_t dmar_fault(int irq, void *dev_id)
 {
@@ -1679,8 +1688,7 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 		raw_spin_lock_irqsave(&iommu->register_lock, flag);
 	}
 
-	writel(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_PRO,
-	       iommu->reg + DMAR_FSTS_REG);
+	clear_primary_faults(iommu);
 
 unlock_exit:
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Alex Williamson, David Woodhouse,
	Ingo Molnar, Joerg Roedel, Lu Baolu, iommu

dmar_fault() reports/handles/cleans DMAR faults in a cycle one-by-one.
The nuisance is that it's set as a irq handler and runs with disabled
interrupts - which works OK if you have only a couple of DMAR faults,
but becomes a problem if your intel iommu has a plenty of mappings.

We have a test that checks flapping of PCI link. Once or twice it had
stuck on soft lockup (which is panic for the typical switch):

 dmar: DMAR:[DMA Write] Request device [4a:00.0] fault addr 5e1a3000
 DMAR:[fault reason 02] Present bit in context entry is clear
 NMI watchdog: BUG: soft lockup - CPU#8 stuck for 21s!
 CPU: 8 PID: 2774 Comm: SuperServer
 Call Trace:
  generic_exec_single+0x11b/0x12d
  ? flush_tlb_func+0x0/0x18a
  smp_call_function_single+0x89/0xb5
  smp_call_function_many+0xef/0x207
  native_flush_tlb_others+0x2e/0x30
  flush_tlb_mm_range+0x139/0x18a
  tlb_flush_mmu_tlbonly+0x35/0x85
  tlb_flush_mmu+0x13/0x1f
  tlb_finish_mmu+0x14/0x39
  unmap_region+0xda/0xec
  do_munmap+0x273/0x2f5
  vm_munmap+0x45/0x5e
  SyS_munmap+0x26/0x2f
  sysenter_dispatch+0x7/0x25

 Kernel panic - not syncing: softlockup: hung tasks
 sending NMI to all CPUs:
 NMI backtrace for cpu 0
 CPU: 0 PID: 0 Comm: swapper/0
 Call Trace:
  <IRQ>
  wait_for_xmitr+0x26/0x8f
  serial8250_console_putchar+0x1c/0x2c
  uart_console_write+0x40/0x4b
  serial8250_console_write+0xe6/0x13f
  call_console_drivers.constprop.13+0xce/0x103
  console_unlock+0x1f8/0x39b
  ? local_clock+0x21/0x23
  vprintk_emit+0x39e/0x3e6
  printk+0x4d/0x4f
  dmar_fault+0x1ab/0x1fd
  handle_irq_event_percpu+0x8a/0x1f5
  handle_irq_event+0x41/0x61
  handle_edge_irq+0xe1/0xfa
  handle_irq+0x155/0x166
  do_IRQ+0x5a/0xea
  ret_from_intr+0x0/0x15
  <EOI>
  arch_cpu_idle+0xf/0x11
  cpu_startup_entry+0x22f/0x3cb
  rest_init+0x80/0x84
  start_kernel+0x470/0x47d
  x86_64_start_reservations+0x2a/0x2c
  x86_64_start_kernel+0x14b/0x15a

Move DMAR faults clearing out of irq-disabled critical section by
proceeding with that in a workqueue thread.
The next patch will correct the definition of dmar_fault().

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/iommu/dmar.c        | 48 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/intel-iommu.c |  6 ++++++
 include/linux/dmar.h        |  1 +
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 33fb4244e438..34a53aeede38 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -71,6 +71,12 @@ struct acpi_table_header * __initdata dmar_tbl;
 static int dmar_dev_scope_status = 1;
 static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 
+static struct workqueue_struct *fault_wq;
+struct fault {
+	struct work_struct work;
+	struct intel_iommu *iommu;
+};
+
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
 
@@ -811,6 +817,14 @@ void __init dmar_register_bus_notifier(void)
 	bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
 
+int __init dmar_init_workqueue(void)
+{
+	fault_wq = alloc_workqueue("dmar_fault", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (fault_wq == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
 
 int __init dmar_table_init(void)
 {
@@ -1695,6 +1709,38 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void dmar_fault_work(struct work_struct *work)
+{
+	struct fault *fault = container_of(work, struct fault, work);
+
+	dmar_fault(-1, (void*)fault->iommu);
+}
+
+static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
+{
+	struct intel_iommu *iommu = dev_id;
+	struct fault *fault;
+
+	fault = kzalloc(sizeof(*fault), GFP_ATOMIC);
+	if (fault == NULL) {
+		unsigned long flag;
+
+		/* During OOM - clear faults and let device re-fault */
+		raw_spin_lock_irqsave(&iommu->register_lock, flag);
+		clear_primary_faults(iommu);
+		raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
+
+		return IRQ_HANDLED;
+	}
+
+	fault->iommu = iommu;
+	INIT_WORK(&fault->work, dmar_fault_work);
+
+	queue_work(fault_wq, &fault->work);
+
+	return IRQ_HANDLED;
+}
+
 int dmar_set_interrupt(struct intel_iommu *iommu)
 {
 	int irq, ret;
@@ -1713,7 +1759,7 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
 		return -EINVAL;
 	}
 
-	ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
+	ret = request_irq(irq, dmar_fault_handler, IRQF_NO_THREAD, iommu->name, iommu);
 	if (ret)
 		pr_err("Can't request irq\n");
 	return ret;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 582fd01cb7d1..64ad9786f5b9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4783,6 +4783,12 @@ int __init intel_iommu_init(void)
 		goto out_free_dmar;
 	}
 
+	if (dmar_init_workqueue()) {
+		if (force_on)
+			panic("tboot: Failed to init workqueue for DMARs\n");
+		goto out_free_dmar;
+	}
+
 	if (list_empty(&dmar_rmrr_units))
 		pr_info("No RMRR found\n");
 
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index e2433bc50210..5de4932a6ad8 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -113,6 +113,7 @@ static inline bool dmar_rcu_check(void)
 extern int dmar_table_init(void);
 extern int dmar_dev_scope_init(void);
 extern void dmar_register_bus_notifier(void);
+extern int dmar_init_workqueue(void);
 extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
 				struct dmar_dev_scope **devices, u16 segment);
 extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov via iommu @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Dmitry Safonov, 0x7f454c46-Re5JQEeQqe8AvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Woodhouse, Ingo Molnar

dmar_fault() reports/handles/cleans DMAR faults in a cycle one-by-one.
The nuisance is that it's set as a irq handler and runs with disabled
interrupts - which works OK if you have only a couple of DMAR faults,
but becomes a problem if your intel iommu has a plenty of mappings.

We have a test that checks flapping of PCI link. Once or twice it had
stuck on soft lockup (which is panic for the typical switch):

 dmar: DMAR:[DMA Write] Request device [4a:00.0] fault addr 5e1a3000
 DMAR:[fault reason 02] Present bit in context entry is clear
 NMI watchdog: BUG: soft lockup - CPU#8 stuck for 21s!
 CPU: 8 PID: 2774 Comm: SuperServer
 Call Trace:
  generic_exec_single+0x11b/0x12d
  ? flush_tlb_func+0x0/0x18a
  smp_call_function_single+0x89/0xb5
  smp_call_function_many+0xef/0x207
  native_flush_tlb_others+0x2e/0x30
  flush_tlb_mm_range+0x139/0x18a
  tlb_flush_mmu_tlbonly+0x35/0x85
  tlb_flush_mmu+0x13/0x1f
  tlb_finish_mmu+0x14/0x39
  unmap_region+0xda/0xec
  do_munmap+0x273/0x2f5
  vm_munmap+0x45/0x5e
  SyS_munmap+0x26/0x2f
  sysenter_dispatch+0x7/0x25

 Kernel panic - not syncing: softlockup: hung tasks
 sending NMI to all CPUs:
 NMI backtrace for cpu 0
 CPU: 0 PID: 0 Comm: swapper/0
 Call Trace:
  <IRQ>
  wait_for_xmitr+0x26/0x8f
  serial8250_console_putchar+0x1c/0x2c
  uart_console_write+0x40/0x4b
  serial8250_console_write+0xe6/0x13f
  call_console_drivers.constprop.13+0xce/0x103
  console_unlock+0x1f8/0x39b
  ? local_clock+0x21/0x23
  vprintk_emit+0x39e/0x3e6
  printk+0x4d/0x4f
  dmar_fault+0x1ab/0x1fd
  handle_irq_event_percpu+0x8a/0x1f5
  handle_irq_event+0x41/0x61
  handle_edge_irq+0xe1/0xfa
  handle_irq+0x155/0x166
  do_IRQ+0x5a/0xea
  ret_from_intr+0x0/0x15
  <EOI>
  arch_cpu_idle+0xf/0x11
  cpu_startup_entry+0x22f/0x3cb
  rest_init+0x80/0x84
  start_kernel+0x470/0x47d
  x86_64_start_reservations+0x2a/0x2c
  x86_64_start_kernel+0x14b/0x15a

Move DMAR faults clearing out of irq-disabled critical section by
proceeding with that in a workqueue thread.
The next patch will correct the definition of dmar_fault().

Signed-off-by: Dmitry Safonov <dima-nzgTgzXrdUbQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/dmar.c        | 48 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/intel-iommu.c |  6 ++++++
 include/linux/dmar.h        |  1 +
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 33fb4244e438..34a53aeede38 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -71,6 +71,12 @@ struct acpi_table_header * __initdata dmar_tbl;
 static int dmar_dev_scope_status = 1;
 static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 
+static struct workqueue_struct *fault_wq;
+struct fault {
+	struct work_struct work;
+	struct intel_iommu *iommu;
+};
+
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
 
@@ -811,6 +817,14 @@ void __init dmar_register_bus_notifier(void)
 	bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
 }
 
+int __init dmar_init_workqueue(void)
+{
+	fault_wq = alloc_workqueue("dmar_fault", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (fault_wq == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
 
 int __init dmar_table_init(void)
 {
@@ -1695,6 +1709,38 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void dmar_fault_work(struct work_struct *work)
+{
+	struct fault *fault = container_of(work, struct fault, work);
+
+	dmar_fault(-1, (void*)fault->iommu);
+}
+
+static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
+{
+	struct intel_iommu *iommu = dev_id;
+	struct fault *fault;
+
+	fault = kzalloc(sizeof(*fault), GFP_ATOMIC);
+	if (fault == NULL) {
+		unsigned long flag;
+
+		/* During OOM - clear faults and let device re-fault */
+		raw_spin_lock_irqsave(&iommu->register_lock, flag);
+		clear_primary_faults(iommu);
+		raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
+
+		return IRQ_HANDLED;
+	}
+
+	fault->iommu = iommu;
+	INIT_WORK(&fault->work, dmar_fault_work);
+
+	queue_work(fault_wq, &fault->work);
+
+	return IRQ_HANDLED;
+}
+
 int dmar_set_interrupt(struct intel_iommu *iommu)
 {
 	int irq, ret;
@@ -1713,7 +1759,7 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
 		return -EINVAL;
 	}
 
-	ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
+	ret = request_irq(irq, dmar_fault_handler, IRQF_NO_THREAD, iommu->name, iommu);
 	if (ret)
 		pr_err("Can't request irq\n");
 	return ret;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 582fd01cb7d1..64ad9786f5b9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4783,6 +4783,12 @@ int __init intel_iommu_init(void)
 		goto out_free_dmar;
 	}
 
+	if (dmar_init_workqueue()) {
+		if (force_on)
+			panic("tboot: Failed to init workqueue for DMARs\n");
+		goto out_free_dmar;
+	}
+
 	if (list_empty(&dmar_rmrr_units))
 		pr_info("No RMRR found\n");
 
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index e2433bc50210..5de4932a6ad8 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -113,6 +113,7 @@ static inline bool dmar_rcu_check(void)
 extern int dmar_table_init(void);
 extern int dmar_dev_scope_init(void);
 extern void dmar_register_bus_notifier(void);
+extern int dmar_init_workqueue(void);
 extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
 				struct dmar_dev_scope **devices, u16 segment);
 extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 5/6] iommu/intel: Rename dmar_fault() => dmar_serve_faults()
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Alex Williamson, David Woodhouse,
	Ingo Molnar, Joerg Roedel, Lu Baolu, iommu

Fix the return value, parameters and a bit better naming.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/iommu/dmar.c                | 8 +++-----
 drivers/iommu/intel-iommu.c         | 2 +-
 drivers/iommu/intel_irq_remapping.c | 2 +-
 include/linux/dmar.h                | 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 34a53aeede38..37a1147a7592 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1635,9 +1635,8 @@ static void clear_primary_faults(struct intel_iommu *iommu)
 }
 
 #define PRIMARY_FAULT_REG_LEN (16)
-irqreturn_t dmar_fault(int irq, void *dev_id)
+void dmar_serve_faults(struct intel_iommu *iommu)
 {
-	struct intel_iommu *iommu = dev_id;
 	int reg, fault_index;
 	u32 fault_status;
 	unsigned long flag;
@@ -1706,14 +1705,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 
 unlock_exit:
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-	return IRQ_HANDLED;
 }
 
 static void dmar_fault_work(struct work_struct *work)
 {
 	struct fault *fault = container_of(work, struct fault, work);
 
-	dmar_fault(-1, (void*)fault->iommu);
+	dmar_serve_faults(fault->iommu);
 }
 
 static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
@@ -1786,7 +1784,7 @@ int __init enable_drhd_fault_handling(void)
 		/*
 		 * Clear any previous faults.
 		 */
-		dmar_fault(iommu->irq, iommu);
+		dmar_serve_faults(iommu);
 		fault_status = readl(iommu->reg + DMAR_FSTS_REG);
 		writel(fault_status, iommu->reg + DMAR_FSTS_REG);
 	}
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 64ad9786f5b9..3b55376b3b46 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3015,7 +3015,7 @@ static void intel_iommu_init_qi(struct intel_iommu *iommu)
 		/*
 		 * Clear any previous faults.
 		 */
-		dmar_fault(-1, iommu);
+		dmar_serve_faults(iommu);
 		/*
 		 * Disable queued invalidation if supported and already enabled
 		 * before OS handover.
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 66f69af2c219..0958fdc76564 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -558,7 +558,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 		/*
 		 * Clear previous faults.
 		 */
-		dmar_fault(-1, iommu);
+		dmar_serve_faults(iommu);
 		dmar_disable_qi(iommu);
 
 		if (dmar_enable_qi(iommu)) {
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 5de4932a6ad8..6e9a5e4ac73a 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -280,7 +280,7 @@ extern void dmar_msi_mask(struct irq_data *data);
 extern void dmar_msi_read(int irq, struct msi_msg *msg);
 extern void dmar_msi_write(int irq, struct msi_msg *msg);
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
-extern irqreturn_t dmar_fault(int irq, void *dev_id);
+extern void dmar_serve_faults(struct intel_iommu *iommu);
 extern int dmar_alloc_hwirq(int id, int node, void *arg);
 extern void dmar_free_hwirq(int irq);
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 5/6] iommu/intel: Rename dmar_fault() => dmar_serve_faults()
@ 2018-02-12 16:48   ` Dmitry Safonov via iommu
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov via iommu @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Dmitry Safonov, 0x7f454c46-Re5JQEeQqe8AvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Woodhouse, Ingo Molnar

Fix the return value, parameters and a bit better naming.

Signed-off-by: Dmitry Safonov <dima-nzgTgzXrdUbQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/dmar.c                | 8 +++-----
 drivers/iommu/intel-iommu.c         | 2 +-
 drivers/iommu/intel_irq_remapping.c | 2 +-
 include/linux/dmar.h                | 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 34a53aeede38..37a1147a7592 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1635,9 +1635,8 @@ static void clear_primary_faults(struct intel_iommu *iommu)
 }
 
 #define PRIMARY_FAULT_REG_LEN (16)
-irqreturn_t dmar_fault(int irq, void *dev_id)
+void dmar_serve_faults(struct intel_iommu *iommu)
 {
-	struct intel_iommu *iommu = dev_id;
 	int reg, fault_index;
 	u32 fault_status;
 	unsigned long flag;
@@ -1706,14 +1705,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 
 unlock_exit:
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-	return IRQ_HANDLED;
 }
 
 static void dmar_fault_work(struct work_struct *work)
 {
 	struct fault *fault = container_of(work, struct fault, work);
 
-	dmar_fault(-1, (void*)fault->iommu);
+	dmar_serve_faults(fault->iommu);
 }
 
 static irqreturn_t dmar_fault_handler(int irq, void *dev_id)
@@ -1786,7 +1784,7 @@ int __init enable_drhd_fault_handling(void)
 		/*
 		 * Clear any previous faults.
 		 */
-		dmar_fault(iommu->irq, iommu);
+		dmar_serve_faults(iommu);
 		fault_status = readl(iommu->reg + DMAR_FSTS_REG);
 		writel(fault_status, iommu->reg + DMAR_FSTS_REG);
 	}
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 64ad9786f5b9..3b55376b3b46 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3015,7 +3015,7 @@ static void intel_iommu_init_qi(struct intel_iommu *iommu)
 		/*
 		 * Clear any previous faults.
 		 */
-		dmar_fault(-1, iommu);
+		dmar_serve_faults(iommu);
 		/*
 		 * Disable queued invalidation if supported and already enabled
 		 * before OS handover.
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 66f69af2c219..0958fdc76564 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -558,7 +558,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 		/*
 		 * Clear previous faults.
 		 */
-		dmar_fault(-1, iommu);
+		dmar_serve_faults(iommu);
 		dmar_disable_qi(iommu);
 
 		if (dmar_enable_qi(iommu)) {
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 5de4932a6ad8..6e9a5e4ac73a 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -280,7 +280,7 @@ extern void dmar_msi_mask(struct irq_data *data);
 extern void dmar_msi_read(int irq, struct msi_msg *msg);
 extern void dmar_msi_write(int irq, struct msi_msg *msg);
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
-extern irqreturn_t dmar_fault(int irq, void *dev_id);
+extern void dmar_serve_faults(struct intel_iommu *iommu);
 extern int dmar_alloc_hwirq(int id, int node, void *arg);
 extern void dmar_free_hwirq(int irq);
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv2 6/6] iommu/intel: Ratelimit each dmar fault printing
  2018-02-12 16:48 [PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq Dmitry Safonov
                   ` (4 preceding siblings ...)
  2018-02-12 16:48   ` Dmitry Safonov via iommu
@ 2018-02-12 16:48 ` Dmitry Safonov
  2018-02-12 17:26   ` [PATCHv3 " Dmitry Safonov
  2018-02-13 16:44 ` [PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq Joerg Roedel
  6 siblings, 1 reply; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Alex Williamson, David Woodhouse,
	Ingo Molnar, Joerg Roedel, Lu Baolu, iommu

There is a ratelimit for printing, but it's incremented each time the
cpu recives dmar fault interrupt. While one interrupt may signal about
*many* faults. And delayed to wq dmar fault work might receive even
more faults to clean, than it was earlier.

Ratelimit each fault printing rather than each time the wq is scheduled.

Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")

BUG: spinlock lockup suspected on CPU#0, CliShell/9903
 lock: 0xffffffff81a47440, .magic: dead4ead, .owner: kworker/u16:2/8915, .owner_cpu: 6
CPU: 0 PID: 9903 Comm: CliShell
Call Trace:$\n'
[..] dump_stack+0x65/0x83$\n'
[..] spin_dump+0x8f/0x94$\n'
[..] do_raw_spin_lock+0x123/0x170$\n'
[..] _raw_spin_lock_irqsave+0x32/0x3a$\n'
[..] uart_chars_in_buffer+0x20/0x4d$\n'
[..] tty_chars_in_buffer+0x18/0x1d$\n'
[..] n_tty_poll+0x1cb/0x1f2$\n'
[..] tty_poll+0x5e/0x76$\n'
[..] do_select+0x363/0x629$\n'
[..] compat_core_sys_select+0x19e/0x239$\n'
[..] compat_SyS_select+0x98/0xc0$\n'
[..] sysenter_dispatch+0x7/0x25$\n'
[..]
NMI backtrace for cpu 6
CPU: 6 PID: 8915 Comm: kworker/u16:2
Workqueue: dmar_fault dmar_fault_work
Call Trace:$\n'
[..] wait_for_xmitr+0x26/0x8f$\n'
[..] serial8250_console_putchar+0x1c/0x2c$\n'
[..] uart_console_write+0x40/0x4b$\n'
[..] serial8250_console_write+0xe6/0x13f$\n'
[..] call_console_drivers.constprop.13+0xce/0x103$\n'
[..] console_unlock+0x1f8/0x39b$\n'
[..] vprintk_emit+0x39e/0x3e6$\n'
[..] printk+0x4d/0x4f$\n'
[..] dmar_fault+0x1a8/0x1fc$\n'
[..] dmar_fault_work+0x15/0x17$\n'
[..] process_one_work+0x1e8/0x3a9$\n'
[..] worker_thread+0x25d/0x345$\n'
[..] kthread+0xea/0xf2$\n'
[..] ret_from_fork+0x58/0x90$\n'

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/iommu/dmar.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 37a1147a7592..7df69f410589 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1640,17 +1640,14 @@ void dmar_serve_faults(struct intel_iommu *iommu)
 	int reg, fault_index;
 	u32 fault_status;
 	unsigned long flag;
-	bool ratelimited;
 	static DEFINE_RATELIMIT_STATE(rs,
 				      DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
-	/* Disable printing, simply clear the fault when ratelimited */
-	ratelimited = !__ratelimit(&rs);
 
 	raw_spin_lock_irqsave(&iommu->register_lock, flag);
 	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-	if (fault_status && !ratelimited)
+	if (fault_status && __ratelimit(&rs))
 		pr_err("DRHD: handling fault status reg %x\n", fault_status);
 
 	/* TBD: ignore advanced fault log currently */
@@ -1660,6 +1657,8 @@ void dmar_serve_faults(struct intel_iommu *iommu)
 	fault_index = dma_fsts_fault_record_index(fault_status);
 	reg = cap_fault_reg_offset(iommu->cap);
 	while (1) {
+		/* Disable printing, simply clear the fault when ratelimited */
+		bool ratelimited = __ratelimit(&rs);
 		u8 fault_reason;
 		u16 source_id;
 		u64 guest_addr;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv3 6/6] iommu/intel: Ratelimit each dmar fault printing
  2018-02-12 16:48 ` [PATCHv2 6/6] iommu/intel: Ratelimit each dmar fault printing Dmitry Safonov
@ 2018-02-12 17:26   ` Dmitry Safonov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-12 17:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: 0x7f454c46, Dmitry Safonov

There is a ratelimit for printing, but it's incremented each time the
cpu recives dmar fault interrupt. While one interrupt may signal about
*many* faults. And delayed to wq dmar fault work might receive even
more faults to clean, than it was earlier.

Ratelimit each fault printing rather than each time the wq is scheduled.

Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")

BUG: spinlock lockup suspected on CPU#0, CliShell/9903
 lock: 0xffffffff81a47440, .magic: dead4ead, .owner: kworker/u16:2/8915, .owner_cpu: 6
CPU: 0 PID: 9903 Comm: CliShell
Call Trace:$\n'
[..] dump_stack+0x65/0x83$\n'
[..] spin_dump+0x8f/0x94$\n'
[..] do_raw_spin_lock+0x123/0x170$\n'
[..] _raw_spin_lock_irqsave+0x32/0x3a$\n'
[..] uart_chars_in_buffer+0x20/0x4d$\n'
[..] tty_chars_in_buffer+0x18/0x1d$\n'
[..] n_tty_poll+0x1cb/0x1f2$\n'
[..] tty_poll+0x5e/0x76$\n'
[..] do_select+0x363/0x629$\n'
[..] compat_core_sys_select+0x19e/0x239$\n'
[..] compat_SyS_select+0x98/0xc0$\n'
[..] sysenter_dispatch+0x7/0x25$\n'
[..]
NMI backtrace for cpu 6
CPU: 6 PID: 8915 Comm: kworker/u16:2
Workqueue: dmar_fault dmar_fault_work
Call Trace:$\n'
[..] wait_for_xmitr+0x26/0x8f$\n'
[..] serial8250_console_putchar+0x1c/0x2c$\n'
[..] uart_console_write+0x40/0x4b$\n'
[..] serial8250_console_write+0xe6/0x13f$\n'
[..] call_console_drivers.constprop.13+0xce/0x103$\n'
[..] console_unlock+0x1f8/0x39b$\n'
[..] vprintk_emit+0x39e/0x3e6$\n'
[..] printk+0x4d/0x4f$\n'
[..] dmar_fault+0x1a8/0x1fc$\n'
[..] dmar_fault_work+0x15/0x17$\n'
[..] process_one_work+0x1e8/0x3a9$\n'
[..] worker_thread+0x25d/0x345$\n'
[..] kthread+0xea/0xf2$\n'
[..] ret_from_fork+0x58/0x90$\n'

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
v3: inverse `ratelimited'

 drivers/iommu/dmar.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 37a1147a7592..3f98a5fd59c0 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1640,17 +1640,14 @@ void dmar_serve_faults(struct intel_iommu *iommu)
 	int reg, fault_index;
 	u32 fault_status;
 	unsigned long flag;
-	bool ratelimited;
 	static DEFINE_RATELIMIT_STATE(rs,
 				      DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
-	/* Disable printing, simply clear the fault when ratelimited */
-	ratelimited = !__ratelimit(&rs);
 
 	raw_spin_lock_irqsave(&iommu->register_lock, flag);
 	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-	if (fault_status && !ratelimited)
+	if (fault_status && __ratelimit(&rs))
 		pr_err("DRHD: handling fault status reg %x\n", fault_status);
 
 	/* TBD: ignore advanced fault log currently */
@@ -1660,6 +1657,8 @@ void dmar_serve_faults(struct intel_iommu *iommu)
 	fault_index = dma_fsts_fault_record_index(fault_status);
 	reg = cap_fault_reg_offset(iommu->cap);
 	while (1) {
+		/* Disable printing, simply clear the fault when ratelimited */
+		bool ratelimited = !__ratelimit(&rs);
 		u8 fault_reason;
 		u16 source_id;
 		u64 guest_addr;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue
  2018-02-12 16:48   ` Dmitry Safonov via iommu
  (?)
@ 2018-02-13 16:35   ` Joerg Roedel
  2018-02-13 17:38     ` Dmitry Safonov
  -1 siblings, 1 reply; 17+ messages in thread
From: Joerg Roedel @ 2018-02-13 16:35 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Alex Williamson, David Woodhouse,
	Ingo Molnar, Lu Baolu, iommu

On Mon, Feb 12, 2018 at 04:48:23PM +0000, Dmitry Safonov wrote:
> dmar_fault() reports/handles/cleans DMAR faults in a cycle one-by-one.
> The nuisance is that it's set as a irq handler and runs with disabled
> interrupts - which works OK if you have only a couple of DMAR faults,
> but becomes a problem if your intel iommu has a plenty of mappings.

I don't think that a work-queue is the right solution here, it adds a
long delay until the log is processed. During that delay, and with high
fault rates the error log will overflow during that delay.

Here is what I think you should do instead to fix the soft-lockups:

First, unmask the fault reporting irq so that you will get subsequent
irqs. Then:

	* For Primary Fault Reporting just cycle once through all
	  supported fault recording registers.

	* For Advanced Fault Reporting, read start and end pointer of
	  the log and process all entries.

After that return from the fault handler and let the next irq handle
additional faults that might have been recorded while the previous
handler was running.

And of course, ratelimiting the fault printouts is always a good idea.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq
  2018-02-12 16:48 [PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq Dmitry Safonov
                   ` (5 preceding siblings ...)
  2018-02-12 16:48 ` [PATCHv2 6/6] iommu/intel: Ratelimit each dmar fault printing Dmitry Safonov
@ 2018-02-13 16:44 ` Joerg Roedel
  6 siblings, 0 replies; 17+ messages in thread
From: Joerg Roedel @ 2018-02-13 16:44 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Alex Williamson, David Woodhouse,
	Ingo Molnar, Lu Baolu, iommu

Hi Dmitry,

On Mon, Feb 12, 2018 at 04:48:19PM +0000, Dmitry Safonov wrote:
> Dmitry Safonov (6):
>   iommu/intel: Add __init for dmar_register_bus_notifier()
>   iommu/intel: Clean/document fault status flags
>   iommu/intel: Introduce clear_primary_faults() helper
>   iommu/intel: Handle DMAR faults on workqueue
>   iommu/intel: Rename dmar_fault() => dmar_serve_faults()
>   iommu/intel: Ratelimit each dmar fault printing

I applied patches 1 and 2 because they are nice cleanups, thanks. For
the rest, I am not convinced that work-queues are a good solution for
the soft lockups you are seeing. Please consider the approach I outlined
in my reply to patch 4.

Also, please prefix your next patches with 'iommu/vt-d' instead of
'iommu/intel'.

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue
  2018-02-13 16:35   ` Joerg Roedel
@ 2018-02-13 17:38     ` Dmitry Safonov
  2018-02-15 19:09       ` Dmitry Safonov
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-13 17:38 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-kernel, 0x7f454c46, Alex Williamson, David Woodhouse,
	Ingo Molnar, Lu Baolu, iommu

On Tue, 2018-02-13 at 17:35 +0100, Joerg Roedel wrote:
> On Mon, Feb 12, 2018 at 04:48:23PM +0000, Dmitry Safonov wrote:
> > dmar_fault() reports/handles/cleans DMAR faults in a cycle one-by-
> > one.
> > The nuisance is that it's set as a irq handler and runs with
> > disabled
> > interrupts - which works OK if you have only a couple of DMAR
> > faults,
> > but becomes a problem if your intel iommu has a plenty of mappings.
> 
> I don't think that a work-queue is the right solution here, it adds a
> long delay until the log is processed. During that delay, and with
> high
> fault rates the error log will overflow during that delay.
> 
> Here is what I think you should do instead to fix the soft-lockups:
> 
> First, unmask the fault reporting irq so that you will get subsequent
> irqs. Then:
> 
> 	* For Primary Fault Reporting just cycle once through all
> 	  supported fault recording registers.
> 
> 	* For Advanced Fault Reporting, read start and end pointer of
> 	  the log and process all entries.
> 
> After that return from the fault handler and let the next irq handle
> additional faults that might have been recorded while the previous
> handler was running.

Ok, will re-do this way, thanks.

> And of course, ratelimiting the fault printouts is always a good
> idea.

-- 
            Dima

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue
  2018-02-13 17:38     ` Dmitry Safonov
@ 2018-02-15 19:09       ` Dmitry Safonov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Safonov @ 2018-02-15 19:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-kernel, 0x7f454c46, Alex Williamson, David Woodhouse,
	Ingo Molnar, Lu Baolu, iommu

On Tue, 2018-02-13 at 17:38 +0000, Dmitry Safonov wrote:
> On Tue, 2018-02-13 at 17:35 +0100, Joerg Roedel wrote:
> > On Mon, Feb 12, 2018 at 04:48:23PM +0000, Dmitry Safonov wrote:
> > > dmar_fault() reports/handles/cleans DMAR faults in a cycle one-
> > > by-
> > > one.
> > > The nuisance is that it's set as a irq handler and runs with
> > > disabled
> > > interrupts - which works OK if you have only a couple of DMAR
> > > faults,
> > > but becomes a problem if your intel iommu has a plenty of
> > > mappings.
> > 
> > I don't think that a work-queue is the right solution here, it adds
> > a
> > long delay until the log is processed. During that delay, and with
> > high
> > fault rates the error log will overflow during that delay.
> > 
> > Here is what I think you should do instead to fix the soft-lockups:
> > 
> > First, unmask the fault reporting irq so that you will get
> > subsequent
> > irqs. Then:
> > 
> > 	* For Primary Fault Reporting just cycle once through all
> > 	  supported fault recording registers.
> > 
> > 	* For Advanced Fault Reporting, read start and end pointer of
> > 	  the log and process all entries.
> > 
> > After that return from the fault handler and let the next irq
> > handle
> > additional faults that might have been recorded while the previous
> > handler was running.
> 
> Ok, will re-do this way, thanks.
> 
> > And of course, ratelimiting the fault printouts is always a good
> > idea.

I've looked at this more,
It turns out that fault handler is used only to printk() errors about
faults happened. So, we shouldn't care about added delay with wq, as
it's just another ratelimit for printing of faults.
I also measured how much time it takes to read/clean fault and how much
time it takes to print info about the fault.

Heh, anyway I'll resend v3 of 6/6 patch, which ratelimits printks by
each fault, rather by each irq - it'll work for me with only that patch
applied.

-- 
            Dima

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-02-15 19:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-12 16:48 [PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq Dmitry Safonov
2018-02-12 16:48 ` [PATCHv2 1/6] iommu/intel: Add __init for dmar_register_bus_notifier() Dmitry Safonov
2018-02-12 16:48   ` Dmitry Safonov via iommu
2018-02-12 16:48 ` [PATCHv2 2/6] iommu/intel: Clean/document fault status flags Dmitry Safonov
2018-02-12 16:48   ` Dmitry Safonov via iommu
2018-02-12 16:48 ` [PATCHv2 3/6] iommu/intel: Introduce clear_primary_faults() helper Dmitry Safonov
2018-02-12 16:48   ` Dmitry Safonov via iommu
2018-02-12 16:48 ` [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue Dmitry Safonov
2018-02-12 16:48   ` Dmitry Safonov via iommu
2018-02-13 16:35   ` Joerg Roedel
2018-02-13 17:38     ` Dmitry Safonov
2018-02-15 19:09       ` Dmitry Safonov
2018-02-12 16:48 ` [PATCHv2 5/6] iommu/intel: Rename dmar_fault() => dmar_serve_faults() Dmitry Safonov
2018-02-12 16:48   ` Dmitry Safonov via iommu
2018-02-12 16:48 ` [PATCHv2 6/6] iommu/intel: Ratelimit each dmar fault printing Dmitry Safonov
2018-02-12 17:26   ` [PATCHv3 " Dmitry Safonov
2018-02-13 16:44 ` [PATCHv2 0/6] iommu/intel: Handle DMAR faults in a wq Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.