linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping
@ 2021-05-04 19:10 Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode Ricardo Neri
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri

Hi IOMMU experts,

I proposed a hardlockup detector driven by the HPET timer [1]. Such
detector is driven by a single timer. The hardlockup detector brings the
extra complexity of having to update the affinity of the interrupt
periodically and initiated from NMI context. The proposed design only
requires updating the affinity every watchdog_thresh (the interval is
between [1, 60] seconds). Also, the affinity update is offloaded to
an irq_work. Handling the HPET interrupt affinity is trivial with
!intremap since the detector composes the MSI message and writes it
directly to the HPET registers.

However, for intremap we must use the existing IOMMU drivers as well as
the kernel's irq plumbing. Thomas Gleixner has imposed two restrictions:
  1) Do not implement an IRQF_NMI flag for x86 as it is not possible to
     determine the source of an NMI [2].
  2) Use the irq subsystem to update the affinity of the HPET
     interrupt [3].

1) implies that the interrupt remapping drivers need to implement a quirk
to identify the HPET interrupt and update its delivery mode to NMI. 2)
means that the hardlockup detector must use request_irq() to allocate the
HPET interrupt.

This patch series attempts to meet the requirements above by
  a) Decoupling the delivery mode of an APIC interrupt from the delivery
     mode of the APIC driver (patch 1)
  b) Implement quirks in the Intel and AMD IOMMU drivers to identify the
     HPET timer and update the delivery mode accordingly (patches 2-5).
  c) Add support for interrupt remapping in the HPET hardlockup detector
     in [1]. This includes the unavoidable eyesore of using request_irq()
     and having a useless regular interrupt handler (patch 6).

I would like to get your feedback on whether the HPET NMI quirk looks
sane to you and whether offloading the affinity setup to an irq_work
could pose issues.

Thanks and BR,
Ricardo

[1]. https://lore.kernel.org/lkml/20210504190526.22347-1-ricardo.neri-calderon@linux.intel.com/T/#mf77988cc98f9ca6988831e17f68394577388959d
[2]. https://lore.kernel.org/lkml/alpine.DEB.2.21.1808021137400.2037@nanos.tec.linutronix.de/
[3]. https://lore.kernel.org/lkml/alpine.DEB.2.21.1906161042080.1760@nanos.tec.linutronix.de/

Changes since v4:
 * With !CONFIG_IRQ_REMAP [1] now disables the HPET channel before changing
   the MSI Destination ID field. This should avoid races between a pending
   interrupt and updating the detector's interrupt affinity. (Ashok)
 * Rebased to use new enumeration apic_delivery_modes.
 * Removed custom functions to allocate an interrupt for the detector
   and instead added support to identify the detector's interrupt and
   change the delivery mode.
 * With interrupt remapping enabled, use request_irq().
 * Added support for AMD IOMMU.

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Introduced support for interrupt remapping

Ricardo Neri (7):
  x86/apic: Add irq_cfg::delivery_mode
  x86/hpet: Introduce function to identify HPET hardlockup detector irq
  iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode
  iommu/amd: Set the IRTE delivery mode from irq_cfg
  iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
  iommu/amd: Fixup delivery mode of the HPET hardlockup interrupt
  x86/watchdog/hardlockup/hpet: Support interrupt remapping

 arch/x86/include/asm/hpet.h         |  5 +++
 arch/x86/include/asm/hw_irq.h       |  1 +
 arch/x86/kernel/apic/vector.c       | 10 ++++++
 arch/x86/kernel/hpet.c              | 36 ++++++++++++++++++++++
 arch/x86/kernel/watchdog_hld_hpet.c | 48 +++++++++++++++++++++++++----
 drivers/iommu/amd/iommu.c           | 11 ++++++-
 drivers/iommu/intel/irq_remapping.c | 20 ++++++++----
 7 files changed, 118 insertions(+), 13 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode
  2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
@ 2021-05-04 19:10 ` Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq Ricardo Neri
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri,
	Andi Kleen, David Woodhouse,
	x86 @ kernel . orgReviewed-by : Ashok Raj

Until now, the delivery mode of APIC interrupts is set to the default
mode set in the APIC driver. However, there are no restrictions in hardware
to configure each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. This new member can
be used to update the configuration of the delivery mode in each interrupt
domain.

Currently, all interrupt domains set the delivery mode of interrupts using
the APIC setting. Interrupt domains use an irq_cfg data structure to
configure their own data structures and hardware resources. Thus, in order
to keep the current behavior, set the delivery mode of the irq
configuration that as the APIC setting. In this manner, irq domains can
obtain the delivery mode from the irq configuration data instead of the
APIC setting, if needed.

Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org> (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com> (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.orgReviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
 * Rebased to use new enumeration apic_delivery_modes.

Changes since v3:
 * None

Changes since v2:
 * Reduced scope to only add the interrupt delivery mode in
   struct irq_alloc_info.

Changes since v1:
 * Introduced this patch.
---
 arch/x86/include/asm/hw_irq.h |  1 +
 arch/x86/kernel/apic/vector.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index d465ece58151..370f4db0372b 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -90,6 +90,7 @@ struct irq_alloc_info {
 struct irq_cfg {
 	unsigned int		dest_apicid;
 	unsigned int		vector;
+	enum apic_delivery_modes	delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6dbdc7c22bb7..d47ed07a56a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -567,6 +567,16 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
 		irqd->chip_data = apicd;
 		irqd->hwirq = virq + i;
 		irqd_set_single_target(irqd);
+
+		/*
+		 * Initialize the delivery mode of this irq to match the
+		 * default delivery mode of the APIC. This is useful for
+		 * children irq domains which want to take the delivery
+		 * mode from the individual irq configuration rather
+		 * than from the APIC.
+		 */
+		 apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode;
+
 		/*
 		 * Prevent that any of these interrupts is invoked in
 		 * non interrupt context via e.g. generic_handle_irq()
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq
  2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode Ricardo Neri
@ 2021-05-04 19:10 ` Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 3/7] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode Ricardo Neri
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri,
	Andi Kleen, David Woodhouse

The HPET hardlockup detector needs to deliver its interrupt as NMI.
In x86 there is not an IRQF_NMI flag that can be used in the irq plumbing
code to tell interrupt remapping drivers to set the interrupt delivery
mode accordingly. Hence, they must fixup the delivery mode internally.

Implement a method to determine if the interrupt being allocated belongs
to the HPET hardlockup detector.

Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org> (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com> (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.org
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
 * Introduced this patch. Previous versions had special functions to
   allocate and set the affinity of a remapped NMI interrupt.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c      | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index df11c7d4af44..5bf675970d4b 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -149,6 +149,7 @@ extern void hardlockup_detector_hpet_stop(void);
 extern void hardlockup_detector_hpet_enable(unsigned int cpu);
 extern void hardlockup_detector_hpet_disable(unsigned int cpu);
 extern void hardlockup_detector_switch_to_perf(void);
+extern bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info);
 #else
 static inline int hardlockup_detector_hpet_init(void)
 { return -ENODEV; }
@@ -156,6 +157,8 @@ static inline void hardlockup_detector_hpet_stop(void) {}
 static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
 static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
 static inline void hardlockup_detector_switch_to_perf(void) {}
+static inline bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info)
+{ return false; }
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 5012590dc1b8..3e43e0f348b8 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1479,6 +1479,39 @@ struct hpet_hld_data *hpet_hld_get_timer(void)
 	hld_data = NULL;
 	return NULL;
 }
+
+/**
+ * is_hpet_irq_hardlockup_detector() - Identify the HPET hld interrupt info
+ * @info:	Interrupt allocation info, with private HPET channel data
+ *
+ * The HPET hardlockup detector is special as it needs its interrupts delivered
+ * as NMI. However, for interrupt remapping we use the existing irq subsystem
+ * to configure and route the HPET interrupt. Unfortunately, there is not a
+ * IRQF_NMI flag for x86. Instead, identify whether the interrupt being
+ * allocated for the HPET channel belongs to the hardlockup detector.
+ *
+ * Returns: True if @info indicates that it belongs to the HPET hardlockup
+ * detector. False otherwise.
+ */
+bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info)
+{
+	struct hpet_channel *hc;
+
+	if (!info)
+		return false;
+
+	if (info->type != X86_IRQ_ALLOC_TYPE_HPET)
+		return false;
+
+	hc = info->data;
+	if (!hc)
+		return false;
+
+	if (hc->mode == HPET_MODE_NMI_WATCHDOG)
+		return true;
+
+	return false;
+}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v5 3/7] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode
  2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq Ricardo Neri
@ 2021-05-04 19:10 ` Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 4/7] iommu/amd: Set the IRTE delivery mode from irq_cfg Ricardo Neri
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri,
	Andi Kleen, David Woodhouse

A previous changeset introduced a new member to struct irq_cfg to specify
the delivery mode of an interrupt. Supporting the configuration of the
delivery mode would require adding a third argument to prepare_irte().
Instead, simply take a pointer to an irq_cfg data structure as the only
argument.

Always configure the delivery mode of the Interrupt Remapping Table
Entry using the values specified in the irq_cfg data structure.

This change does not change the existing behavior, as the delivery mode
of the APIC is used to configure the irq_cfg of each irq.

Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org> (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com> (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.org
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
 * None

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Introduced this patch.
---
 drivers/iommu/intel/irq_remapping.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 611ef5243cb6..daa5df53db59 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1104,7 +1104,7 @@ void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
 	dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));
 }
 
-static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg)
 {
 	memset(irte, 0, sizeof(*irte));
 
@@ -1118,9 +1118,9 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	 * irq migration in the presence of interrupt-remapping.
 	*/
 	irte->trigger_mode = 0;
-	irte->dlvry_mode = apic->delivery_mode;
-	irte->vector = vector;
-	irte->dest_id = IRTE_DEST(dest);
+	irte->dlvry_mode = irq_cfg->delivery_mode;
+	irte->vector = irq_cfg->vector;
+	irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid);
 	irte->redir_hint = 1;
 }
 
@@ -1261,8 +1261,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 {
 	struct irte *irte = &data->irte_entry;
 
-	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
-
+	prepare_irte(irte, irq_cfg);
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Set source-id of interrupt request */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v5 4/7] iommu/amd: Set the IRTE delivery mode from irq_cfg
  2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
                   ` (2 preceding siblings ...)
  2021-05-04 19:10 ` [RFC PATCH v5 3/7] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode Ricardo Neri
@ 2021-05-04 19:10 ` Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt Ricardo Neri
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri,
	Andi Kleen, David Woodhouse

There is not hardware requirement to have a different delivery mode for
each interrupt. Instead of using the delivery mode of the APIC driver, use
the delivery mode of each specific interrupt configuration.

This allows to accommodate interrupts which require a specific delivery
mode, such as the HPET hardlockup detector.

Outside of such case, there are not functional changes since the delivery
mode of an interrupt is initialized with the delivery mode of the APIC
driver.

Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org> (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com> (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.org
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
 * Introduced this patch.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 drivers/iommu/amd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40..e8d9fae0c766 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3122,7 +3122,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	data->irq_2_irte.devid = devid;
 	data->irq_2_irte.index = index + sub_handle;
-	iommu->irte_ops->prepare(data->entry, apic->delivery_mode,
+	iommu->irte_ops->prepare(data->entry, irq_cfg->delivery_mode,
 				 apic->dest_mode_logical, irq_cfg->vector,
 				 irq_cfg->dest_apicid, devid);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
  2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
                   ` (3 preceding siblings ...)
  2021-05-04 19:10 ` [RFC PATCH v5 4/7] iommu/amd: Set the IRTE delivery mode from irq_cfg Ricardo Neri
@ 2021-05-04 19:10 ` Ricardo Neri
  2021-05-04 23:03   ` Thomas Gleixner
  2021-05-14 21:31   ` Thomas Gleixner
  2021-05-04 19:10 ` [RFC PATCH v5 6/7] iommu/amd: " Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping Ricardo Neri
  6 siblings, 2 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri,
	Andi Kleen, David Woodhouse

The HPET hardlockup detector requires that the HPET timer delivers the
interrupt as NMI. When interrupt remapping is disabled, this can be
done by programming the HPET MSI registers directly. With interrupt
remapping, it is necessary to populate an entry in the interrupt
remapping table.

In x86 there is not an IRQF_NMI flag that can be used to indicate the
delivery mode when requesting an interrupt (via request_irq()). Thus,
there is no way for the interrupt remapping driver to know and set
the delivery mode.

Hence, when allocating an interrupt, check if such interrupt belongs to
the HPET hardlockup detector and fixup the delivery mode accordingly.

Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org> (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com> (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.org
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
 * Introduced this patch.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 drivers/iommu/intel/irq_remapping.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index daa5df53db59..b07c68ecac01 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -18,6 +18,7 @@
 #include <asm/apic.h>
 #include <asm/smp.h>
 #include <asm/cpu.h>
+#include <asm/hpet.h>
 #include <asm/irq_remapping.h>
 #include <asm/pci-direct.h>
 
@@ -1376,6 +1377,14 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
 		irq_data->hwirq = (index << 16) + i;
 		irq_data->chip_data = ird;
 		irq_data->chip = &intel_ir_chip;
+
+		/*
+		 * If we find the HPET hardlockup detector irq, fixup the
+		 * delivery mode.
+		 */
+		if (is_hpet_irq_hardlockup_detector(info))
+			irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
+
 		intel_irq_remapping_prepare_irte(ird, irq_cfg, info, index, i);
 		irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v5 6/7] iommu/amd: Fixup delivery mode of the HPET hardlockup interrupt
  2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
                   ` (4 preceding siblings ...)
  2021-05-04 19:10 ` [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt Ricardo Neri
@ 2021-05-04 19:10 ` Ricardo Neri
  2021-05-04 19:10 ` [RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping Ricardo Neri
  6 siblings, 0 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri,
	Ashok Raj, Andi Kleen, David Woodhouse

The HPET hardlockup detector requires that the HPET timer delivers the
interrupt as NMI. When interrupt remapping is disabled, this can be
done by programming the HPET MSI registers directly. With interrupt
remapping, it is necessary to populate an entry in the interrupt
remapping table.

In x86 there is not an IRQF_NMI flag that can be used to indicate the
delivery mode when requesting an interrupt (via request_irq()). Thus,
there is no way for the interrupt remapping driver to know and set
the delivery mode.

Hence, when allocating an interrupt, check if such interrupt belongs to
the HPET hardlockup detector and fixup the delivery mode accordingly.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org> (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com> (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
 * Introduced this patch.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 drivers/iommu/amd/iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e8d9fae0c766..758e08ba42e6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -35,6 +35,7 @@
 #include <asm/irq_remapping.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
+#include <asm/hpet.h>
 #include <asm/hw_irq.h>
 #include <asm/proto.h>
 #include <asm/iommu.h>
@@ -3254,6 +3255,14 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
 		irq_data->hwirq = (devid << 16) + i;
 		irq_data->chip_data = data;
 		irq_data->chip = &amd_ir_chip;
+
+		/*
+		 * If we find the HPET hardlockup detector irq, fixup the
+		 * delivery mode.
+		 */
+		if (is_hpet_irq_hardlockup_detector(info))
+			cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
+
 		irq_remapping_prepare_irte(data, cfg, info, devid, index, i);
 		irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping
  2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
                   ` (5 preceding siblings ...)
  2021-05-04 19:10 ` [RFC PATCH v5 6/7] iommu/amd: " Ricardo Neri
@ 2021-05-04 19:10 ` Ricardo Neri
  6 siblings, 0 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:10 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Ricardo Neri,
	Ashok Raj, Andi Kleen, David Woodhouse

When interrupt remapping is enabled in the system, the MSI interrupt
address and data fields must follow a special format that the IOMMU
defines.

However, the HPET hardlockup detector must rely on the interrupt
subsystem to have the interrupt remapping drivers allocate, activate,
and set the affinity of HPET timer interrupt. Hence, it must use
request_irq() to use such functionality.

In x86 there is not an IRQF_NMI flag to indicate to the interrupt
subsystem the delivery mode of the interrupt. A previous changset added
functionality to detect the interrupt of the HPET hardlockup detector
and fixup the delivery mode accordingly.

Also, since request_irq() is used, a non-NMI interrupt handler must be
defined. Even if it is not needed.

When Interrupt Remapping is enabled, use the new facility to ensure
interrupt is plumbed properly to work with interrupt remapping.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org> (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com> (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
 * Use request_irq() to obtain an IRTE for the HPET hardlockup detector
   instead of the custom interfaces previously implemented in the
   interrupt remapping drivers.
 * Simplified detection of interrupt remapping by checking the parent
   of the HPET irq domain.
 * Stopped using the HPET magic fields of struct irq_alloc_info. They
   were removed in commit 2bf1e7bcedb8 ("x86/msi: Consolidate HPET
   allocation")
 * Rephrased commit message for clarity. (Ashok)
 * Clarified error message of non-NMI handler. (Ashok)

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Introduced this patch. Added custom functions in the Intel IOMMU driver
   to allocate an IRTE for the HPET hardlockup detector.
---
 arch/x86/include/asm/hpet.h         |  2 ++
 arch/x86/kernel/hpet.c              |  3 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 48 +++++++++++++++++++++++++----
 3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 5bf675970d4b..d130285ddc96 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -109,6 +109,7 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
  * @tsc_ticks_per_group:	TSC ticks that must elapse for each group of
  *				monitored CPUs.
  * @irq:			IRQ number assigned to the HPET channel
+ * @int_remap_enabled:		True if interrupt remapping is enabled
  * @handling_cpu:		CPU handling the HPET interrupt
  * @pkgs_per_group:		Number of physical packages in a group of CPUs
  *				receiving an IPI
@@ -133,6 +134,7 @@ struct hpet_hld_data {
 	u64		tsc_next;
 	u64		tsc_ticks_per_group;
 	int		irq;
+	bool		intr_remap_enabled;
 	u32		handling_cpu;
 	u32		pkgs_per_group;
 	u32		nr_groups;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3e43e0f348b8..ff4abdef5e15 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1464,6 +1464,9 @@ struct hpet_hld_data *hpet_hld_get_timer(void)
 	if (!hpet_domain)
 		goto err;
 
+	if (hpet_domain->parent != x86_vector_domain)
+		hld_data->intr_remap_enabled = true;
+
 	hc->mode = HPET_MODE_NMI_WATCHDOG;
 	irq = hpet_assign_irq(hpet_domain, hc, hc->num);
 	if (irq <= 0)
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index 3fd2405b31fa..265641d001ac 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -176,6 +176,14 @@ static int update_msi_destid(struct hpet_hld_data *hdata)
 {
 	u32 destid;
 
+	if (hdata->intr_remap_enabled) {
+		int ret;
+
+		ret = irq_set_affinity(hdata->irq,
+				       cpumask_of(hdata->handling_cpu));
+		return ret;
+	}
+
 	destid = apic->calc_dest_apicid(hdata->handling_cpu);
 	/*
 	 * HPET only supports a 32-bit MSI address register. Thus, only
@@ -393,26 +401,52 @@ static int hardlockup_detector_nmi_handler(unsigned int type,
 	return NMI_DONE;
 }
 
+/*
+ * When interrupt remapping is enabled, we request the irq for the detector
+ * using request_irq() and then we fixup the delivery mode to NMI using
+ * is_hpet_irq_hardlockup_detector(). If the latter fails, we will see a non-
+ * NMI interrupt.
+ *
+ */
+static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
+{
+	pr_err_once("Received a non-NMI interrupt. The HLD detector always uses NMIs!\n");
+	return IRQ_HANDLED;
+}
+
 /**
  * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
  * @data:	Data associated with the instance of the HPET timer to configure
  *
  * Configure the HPET timer to deliver interrupts via the Front-
  * Side Bus.
+ *
+ * Returns:
+ * 0 success. An error code if setup was unsuccessful.
  */
-static void setup_irq_msi_mode(struct hpet_hld_data *hdata)
+static int setup_irq_msi_mode(struct hpet_hld_data *hdata)
 {
+	s32 ret;
 	u32 v;
 
-	compose_msi_msg(hdata);
-	hpet_writel(hdata->msi_msg.data, HPET_Tn_ROUTE(hdata->channel));
-	hpet_writel(hdata->msi_msg.address_lo,
-		    HPET_Tn_ROUTE(hdata->channel) + 4);
+	if (hdata->intr_remap_enabled) {
+		ret = request_irq(hld_data->irq, hardlockup_detector_irq_handler,
+				  IRQF_TIMER, "hpet_hld", hld_data);
+		if (ret)
+			return ret;
+	} else {
+		compose_msi_msg(hdata);
+		hpet_writel(hdata->msi_msg.data, HPET_Tn_ROUTE(hdata->channel));
+		hpet_writel(hdata->msi_msg.address_lo,
+			    HPET_Tn_ROUTE(hdata->channel) + 4);
+	}
 
 	v = hpet_readl(HPET_Tn_CFG(hdata->channel));
 	v |= HPET_TN_FSB;
 
 	hpet_writel(v, HPET_Tn_CFG(hdata->channel));
+
+	return 0;
 }
 
 /**
@@ -430,7 +464,9 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 {
 	int ret;
 
-	setup_irq_msi_mode(hdata);
+	ret = setup_irq_msi_mode(hdata);
+	if (ret)
+		return ret;
 
 	ret = register_nmi_handler(NMI_WATCHDOG,
 				   hardlockup_detector_nmi_handler, 0,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
  2021-05-04 19:10 ` [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt Ricardo Neri
@ 2021-05-04 23:03   ` Thomas Gleixner
  2021-05-14  1:57     ` Ricardo Neri
  2021-05-14 21:31   ` Thomas Gleixner
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-05-04 23:03 UTC (permalink / raw)
  To: Ricardo Neri, Joerg Roedel, Will Deacon
  Cc: woodhouse, Jacob Pan, Lu Baolu, Stephane Eranian, Ingo Molnar,
	Borislav Petkov, iommu, x86, linux-kernel, Ravi V. Shankar,
	Ricardo Neri, Ricardo Neri, Andi Kleen, David Woodhouse

On Tue, May 04 2021 at 12:10, Ricardo Neri wrote:
> In x86 there is not an IRQF_NMI flag that can be used to indicate the

There exists no IRQF_NMI flag at all. No architecture provides that.

> delivery mode when requesting an interrupt (via request_irq()). Thus,
> there is no way for the interrupt remapping driver to know and set
> the delivery mode.

There is no support for this today. So what?

> Hence, when allocating an interrupt, check if such interrupt belongs to
> the HPET hardlockup detector and fixup the delivery mode accordingly.

What?

> +		/*
> +		 * If we find the HPET hardlockup detector irq, fixup the
> +		 * delivery mode.
> +		 */
> +		if (is_hpet_irq_hardlockup_detector(info))
> +			irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;

Again. We are not sticking some random device checks into that
code. It's wrong and I explained it to you before.

  https://lore.kernel.org/lkml/alpine.DEB.2.21.1906161042080.1760@nanos.tec.linutronix.de/

But I'm happy to repeat it again:

  "No. This is horrible hackery violating all the layering which we carefully
   put into place to avoid exactly this kind of sprinkling conditionals into
   all code pathes.

   With some thought the existing irqdomain hierarchy can be used to achieve
   the same thing without tons of extra functions and conditionals."

So the outcome of thought and using the irqdomain hierarchy is:

   Replacing an hpet specific conditional in one place with an hpet
   specific conditional in a different place.

Impressive.

hpet_assign_irq(...., bool nmi)
  init_info(info)
    ...
    if (nmi)
        info.flags |= X86_IRQ_ALLOC_AS_NMI;
  
   irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info)
     intel_irq_remapping_alloc(..., info)
       irq_domain_alloc_irq_parents(..., info)
         x86_vector_alloc_irqs(..., info)
         {   
           if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1)
                  return -EINVAL;

           for (i = 0; i < nr_irqs; i++) {
             ....
             if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
                 irq_cfg_setup_nmi(apicd);
                 continue;
             }
             ...
         }

irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required
and everything else just works. Of course this needs a few other minor
tweaks but none of those introduces random hpet quirks all over the
place. Not convoluted enough, right?

But that solves none of other problems. Let me summarize again which
options or non-options we have:

    1) Selective IPIs from NMI context cannot work

       As explained in the other thread.

    2) Shorthand IPI allbutself from NMI
    
       This should work, but that obviously does not take the watchdog
       cpumask into account.

       Also this only works when IPI shorthand mode is enabled. See
       apic_smt_update() for details.

    3) Sending the IPIs from irq_work

       This would solve the problem, but if the CPU which is the NMI
       target is really stuck in an interrupt disabled region then the
       IPIs won't be sent.

       OTOH, if that's the case then the CPU which was processing the
       NMI will continue to be stuck until the next NMI hits which
       will detect that the CPU is stuck which is a good enough
       reason to send a shorthand IPI to all CPUs ignoring the
       watchdog cpumask.

       Same limitation vs. shorthand mode as #2

    4) Changing affinity of the HPET NMI from NMI

       As we established two years ago that cannot work with interrupt
       remapping

    5) Changing affinity of the HPET NMI from irq_work

       Same issues as #3

Anything else than #2 is just causing more problems than it solves, but
surely the NOHZ_FULL/isolation people might have opinions on this.

OTOH, as this is opt-in, anything which wants a watchdog mask which is
not the full online set, has to accept that HPET has these restrictions.

And that's exactly what I suggested two years ago:

 https://lore.kernel.org/lkml/alpine.DEB.2.21.1906172343120.1963@nanos.tec.linutronix.de/

  "It definitely would be worthwhile to experiment with that. if we
   could use shorthands (also for regular IPIs) that would be a great
   improvement in general and would nicely solve that NMI issue. Beware
   of the dragons though."

As a consequence of this conversation I implemented shorthand IPIs...

But I haven't seen any mentioning that this has been tried, why the
approach was not chosen or any discussion about that matter.

Not that I'm surprised.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
  2021-05-04 23:03   ` Thomas Gleixner
@ 2021-05-14  1:57     ` Ricardo Neri
  0 siblings, 0 replies; 11+ messages in thread
From: Ricardo Neri @ 2021-05-14  1:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joerg Roedel, Will Deacon, woodhouse, Jacob Pan, Lu Baolu,
	Stephane Eranian, Ingo Molnar, Borislav Petkov, iommu, x86,
	linux-kernel, Ravi V. Shankar, Ricardo Neri, Andi Kleen,
	David Woodhouse

On Wed, May 05, 2021 at 01:03:18AM +0200, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 12:10, Ricardo Neri wrote:

Thank you very much for your feedback, Thomas. I am sorry it took me a
while to reply to your email. I needed to digest and research your
comments.

> > In x86 there is not an IRQF_NMI flag that can be used to indicate the
> 
> There exists no IRQF_NMI flag at all. No architecture provides that.

Thank you for the clarification. I think I meant to say that there is a
request_nmi() function but AFAIK it is only used in the ARM PMU and
would not work on x86.

> 
> > delivery mode when requesting an interrupt (via request_irq()). Thus,
> > there is no way for the interrupt remapping driver to know and set
> > the delivery mode.
> 
> There is no support for this today. So what?

Using request_irq() plus a HPET quirk looked to me a reasonable
way to use the irqdomain hierarchy to allocate an interrupt with NMI as
the delivery mode.

> 
> > Hence, when allocating an interrupt, check if such interrupt belongs to
> > the HPET hardlockup detector and fixup the delivery mode accordingly.
> 
> What?
> 
> > +		/*
> > +		 * If we find the HPET hardlockup detector irq, fixup the
> > +		 * delivery mode.
> > +		 */
> > +		if (is_hpet_irq_hardlockup_detector(info))
> > +			irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
> 
> Again. We are not sticking some random device checks into that
> code. It's wrong and I explained it to you before.
> 
>   https://lore.kernel.org/lkml/alpine.DEB.2.21.1906161042080.1760@nanos.tec.linutronix.de/
> 
> But I'm happy to repeat it again:
> 
>   "No. This is horrible hackery violating all the layering which we carefully
>    put into place to avoid exactly this kind of sprinkling conditionals into
>    all code pathes.
> 
>    With some thought the existing irqdomain hierarchy can be used to achieve
>    the same thing without tons of extra functions and conditionals."
> 
> So the outcome of thought and using the irqdomain hierarchy is:
> 
>    Replacing an hpet specific conditional in one place with an hpet
>    specific conditional in a different place.
> 
> Impressive.

I am sorry Thomas, I did try to make the quirk less hacky but I did not
think of the solution you provide below.

> 
> hpet_assign_irq(...., bool nmi)
>   init_info(info)
>     ...
>     if (nmi)
>         info.flags |= X86_IRQ_ALLOC_AS_NMI;
>   
>    irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info)
>      intel_irq_remapping_alloc(..., info)
>        irq_domain_alloc_irq_parents(..., info)
>          x86_vector_alloc_irqs(..., info)
>          {   
>            if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1)
>                   return -EINVAL;
> 
>            for (i = 0; i < nr_irqs; i++) {
>              ....
>              if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
>                  irq_cfg_setup_nmi(apicd);
>                  continue;
>              }
>              ...
>          }
> 
> irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required
> and everything else just works. Of course this needs a few other minor
> tweaks but none of those introduces random hpet quirks all over the
> place. Not convoluted enough, right?

Thanks for the detailed demonstration! It does seem cleaner than what I
implemented.

> 
> But that solves none of other problems. Let me summarize again which
> options or non-options we have:
> 
>     1) Selective IPIs from NMI context cannot work
> 
>        As explained in the other thread.
> 
>     2) Shorthand IPI allbutself from NMI
>     
>        This should work, but that obviously does not take the watchdog
>        cpumask into account.
> 
>        Also this only works when IPI shorthand mode is enabled. See
>        apic_smt_update() for details.
> 
>     3) Sending the IPIs from irq_work
> 
>        This would solve the problem, but if the CPU which is the NMI
>        target is really stuck in an interrupt disabled region then the
>        IPIs won't be sent.
> 
>        OTOH, if that's the case then the CPU which was processing the
>        NMI will continue to be stuck until the next NMI hits which
>        will detect that the CPU is stuck which is a good enough
>        reason to send a shorthand IPI to all CPUs ignoring the
>        watchdog cpumask.
> 
>        Same limitation vs. shorthand mode as #2
> 
>     4) Changing affinity of the HPET NMI from NMI
> 
>        As we established two years ago that cannot work with interrupt
>        remapping
> 
>     5) Changing affinity of the HPET NMI from irq_work
> 
>        Same issues as #3
> 
> Anything else than #2 is just causing more problems than it solves, but
> surely the NOHZ_FULL/isolation people might have opinions on this.
> 
> OTOH, as this is opt-in, anything which wants a watchdog mask which is
> not the full online set, has to accept that HPET has these restrictions.
> 
> And that's exactly what I suggested two years ago:
> 
>  https://lore.kernel.org/lkml/alpine.DEB.2.21.1906172343120.1963@nanos.tec.linutronix.de/
> 
>   "It definitely would be worthwhile to experiment with that. if we
>    could use shorthands (also for regular IPIs) that would be a great
>    improvement in general and would nicely solve that NMI issue. Beware
>    of the dragons though."
> 
> As a consequence of this conversation I implemented shorthand IPIs...
> 
> But I haven't seen any mentioning that this has been tried, why the
> approach was not chosen or any discussion about that matter.

Indeed, I focused on 5) and I overlooked your comment on using your
new support for shortand IPIs.

I'll go back and see to implement option #2, or perhaps the alternative
solution you proposed on a separate thread.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
  2021-05-04 19:10 ` [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt Ricardo Neri
  2021-05-04 23:03   ` Thomas Gleixner
@ 2021-05-14 21:31   ` Thomas Gleixner
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2021-05-14 21:31 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Joerg Roedel, Will Deacon, Jacob Pan, Lu Baolu, Stephane Eranian,
	Ingo Molnar, Borislav Petkov, iommu, x86, linux-kernel,
	Ravi V. Shankar, Ricardo Neri, Ricardo Neri, Andi Kleen,
	David Woodhouse

On Tue, May 04 2021 at 12:10, Ricardo Neri wrote:

Resending as the original one did not make it on the list because of
fatfingers. Sorry for the noise.

> In x86 there is not an IRQF_NMI flag that can be used to indicate the

There exists no IRQF_NMI flag at all. No architecture provides that.

> delivery mode when requesting an interrupt (via request_irq()). Thus,
> there is no way for the interrupt remapping driver to know and set
> the delivery mode.

There is no support for this today. So what?

> Hence, when allocating an interrupt, check if such interrupt belongs to
> the HPET hardlockup detector and fixup the delivery mode accordingly.

What?

> +		/*
> +		 * If we find the HPET hardlockup detector irq, fixup the
> +		 * delivery mode.
> +		 */
> +		if (is_hpet_irq_hardlockup_detector(info))
> +			irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;

Again. We are not sticking some random device checks into that
code. It's wrong and I explained it to you before.

  https://lore.kernel.org/lkml/alpine.DEB.2.21.1906161042080.1760@nanos.tec.linutronix.de/

But I'm happy to repeat it again:

  "No. This is horrible hackery violating all the layering which we carefully
   put into place to avoid exactly this kind of sprinkling conditionals into
   all code pathes.

   With some thought the existing irqdomain hierarchy can be used to achieve
   the same thing without tons of extra functions and conditionals."

So the outcome of thought and using the irqdomain hierarchy is:

   Replacing an hpet specific conditional in one place with an hpet
   specific conditional in a different place.

Impressive.

hpet_assign_irq(...., bool nmi)
  init_info(info)
    ...
    if (nmi)
        info.flags |= X86_IRQ_ALLOC_AS_NMI;

   irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info)
     intel_irq_remapping_alloc(..., info)
       irq_domain_alloc_irq_parents(..., info)
         x86_vector_alloc_irqs(..., info)
         {   
           if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1)
      	    return -EINVAL;

           for (i = 0; i < nr_irqs; i++) {
             ....
             if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
      	   irq_cfg_setup_nmi(apicd);
      	   continue;
             }
             ...
         }

irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required
and everything else just works. Of course this needs a few other minor
tweaks but none of those introduces random hpet quirks all over the
place. Not convoluted enough, right?

But that solves none of other problems. Let me summarize again which
options or non-options we have:

    1) Selective IPIs from NMI context cannot work

       As explained in the other thread.

    2) Shorthand IPI allbutself from NMI

       This should work, but that obviously does not take the watchdog
       cpumask into account.

       Also this only works when IPI shorthand mode is enabled. See
       apic_smt_update() for details.

    3) Sending the IPIs from irq_work

       This would solve the problem, but if the CPU which is the NMI
       target is really stuck in an interrupt disabled region then the
       IPIs won't be sent.

       OTOH, if that's the case then the CPU which was processing the
       NMI will continue to be stuck until the next NMI hits which
       will detect that the CPU is stuck which is a good enough
       reason to send a shorthand IPI to all CPUs ignoring the
       watchdog cpumask.

       Same limitation vs. shorthand mode as #2

    4) Changing affinity of the HPET NMI from NMI

       As we established two years ago that cannot work with interrupt
       remapping

    5) Changing affinity of the HPET NMI from irq_work

       Same issues as #3

Anything else than #2 is just causing more problems than it solves, but
surely the NOHZ_FULL/isolation people might have opinions on this.

OTOH, as this is opt-in, anything which wants a watchdog mask which is
not the full online set, has to accept that HPET has these restrictions.

And that's exactly what I suggested two years ago:

 https://lore.kernel.org/lkml/alpine.DEB.2.21.1906172343120.1963@nanos.tec.linutronix.de/

  "It definitely would be worthwhile to experiment with that. if we
   could use shorthands (also for regular IPIs) that would be a great
   improvement in general and would nicely solve that NMI issue. Beware
   of the dragons though."

As a consequence of this conversation I implemented shorthand IPIs...

But I haven't seen any mentioning that this has been tried, why the
approach was not chosen or any discussion about that matter.

Not that I'm surprised.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-14 21:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 3/7] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 4/7] iommu/amd: Set the IRTE delivery mode from irq_cfg Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt Ricardo Neri
2021-05-04 23:03   ` Thomas Gleixner
2021-05-14  1:57     ` Ricardo Neri
2021-05-14 21:31   ` Thomas Gleixner
2021-05-04 19:10 ` [RFC PATCH v5 6/7] iommu/amd: " Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping Ricardo Neri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).